When choosing a Singapore cloud server, memory performance deserves as much attention as the number of CPU cores and the type of disk, because it is a hidden factor that often determines overall performance. Higher memory bandwidth means the CPU can fetch data faster, which is crucial for databases, scientific computing, virtualization, and big data processing. However, cloud platforms usually publish only the memory capacity; the actual performance behind it, especially bandwidth and latency, remains a black box. Two well-established tools help open that box, one for each major platform: STREAM on Linux and SiSoftware Sandra on Windows. They let you quantify memory performance and avoid application bottlenecks caused by memory limitations.
STREAM: The Memory Bandwidth Benchmark for Linux Platforms
STREAM is not a complex integrated suite but a small, classic benchmark program written in C. Its design goal is narrow and focused: measuring sustainable memory bandwidth. It has become a staple of industry benchmarking and academic research because its test kernels reproduce the simplest and most bandwidth-intensive memory access patterns found in real workloads.
STREAM achieves its goal through four core tests:
1. COPY: `a(i) = b(i)`, copies a large array to another large array, testing bandwidth for simple data movement.
2. SCALE: `a(i) = q * b(i)`, multiplies an array by a constant and writes it to another array, testing bandwidth involving simple arithmetic operations.
3. ADD: `a(i) = b(i) + c(i)`, adds two arrays and writes the result to a third array, testing bandwidth with multiple data sources.
4. TRIAD: `a(i) = b(i) + q * c(i)`, combines the operations of COPY, SCALE, and ADD, and is often treated as the headline STREAM result.
It works by looping these operations across arrays much larger than the CPU's cache levels (L1, L2, L3). Because the data volume is too large to remain entirely in the cache, the CPU must constantly read and write data from main memory (RAM). At this point, the speed bottleneck is no longer the CPU's computing power, but rather the memory subsystem's ability to provide data—that is, the sustainable memory bandwidth we are concerned with.
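The four kernels and the way bandwidth is derived from them can be sketched in a few lines. This is only an illustration of the access pattern and the byte accounting, not a substitute for the C benchmark: pure Python is far too slow to saturate memory, so the numbers it prints say nothing about the hardware. The names `run_kernel` and `bandwidth_mb_per_s` are hypothetical helpers, and the sketch follows the `a(i) = ...` notation used in the list above.

```python
import time

BYTES_PER_DOUBLE = 8

# Number of arrays read or written per element, as STREAM counts traffic.
ARRAYS_TOUCHED = {"COPY": 2, "SCALE": 2, "ADD": 3, "TRIAD": 3}


def run_kernel(name, a, b, c, q=3.0):
    """Run one kernel over the full arrays, writing into `a`; return seconds."""
    n = len(a)
    start = time.perf_counter()
    if name == "COPY":
        for i in range(n):
            a[i] = b[i]
    elif name == "SCALE":
        for i in range(n):
            a[i] = q * b[i]
    elif name == "ADD":
        for i in range(n):
            a[i] = b[i] + c[i]
    elif name == "TRIAD":
        for i in range(n):
            a[i] = b[i] + q * c[i]
    else:
        raise ValueError(f"unknown kernel: {name}")
    return time.perf_counter() - start


def bandwidth_mb_per_s(name, n, seconds):
    """Bytes moved divided by elapsed time, in MB/s (1 MB = 10**6 bytes)."""
    return ARRAYS_TOUCHED[name] * BYTES_PER_DOUBLE * n / seconds / 1e6


if __name__ == "__main__":
    n = 200_000  # tiny compared with a real run; for illustration only
    a, b, c = [0.0] * n, [1.0] * n, [2.0] * n
    for name in ARRAYS_TOUCHED:
        seconds = run_kernel(name, a, b, c)
        print(f"{name:6s} {bandwidth_mb_per_s(name, n, seconds):10.1f} MB/s")
```

Note that COPY and SCALE move two arrays' worth of data per pass while ADD and TRIAD move three, which is why the same elapsed time translates into different bandwidth figures for different kernels.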
Using STREAM on a Linux cloud host typically requires compiling from source code, which also serves as a quick check of the system's development environment. The basic steps are as follows:
1. Download the STREAM source code (stream.c).
2. Compile it with optimizing compiler options:
gcc -O3 -march=native -mtune=native -fopenmp -DSTREAM_ARRAY_SIZE=100000000 stream.c -o stream
The options have the following meaning:
-O3: Aggressive optimization level
-march=native: Generate code for the CPU architecture of the current host
-mtune=native: Tune scheduling for the current CPU (already implied by -march=native)
-fopenmp: Enable OpenMP multi-threading support (for testing multi-core memory bandwidth)
-DSTREAM_ARRAY_SIZE: Set the number of elements in each test array; each array should be at least 4 times the size of the L3 cache.
With 100 million elements the three static arrays total about 2.4 GB. If the link step fails with a "relocation truncated to fit" error, adding -mcmodel=medium is the usual remedy on x86-64.
3. Run the test:
./stream
The key to compilation and execution is the `STREAM_ARRAY_SIZE` macro, set with `-DSTREAM_ARRAY_SIZE`. It defines the number of elements in each array, and each array must be large enough, typically 4-8 times the size of the CPU's L3 cache. For example, if your cloud server has 20MB of L3 cache, each array needs at least 80MB, which is about 10 million double-precision floating-point numbers (8 bytes each). The 100 million elements used in the command above (approximately 800MB per array, or about 2.4GB across the three arrays) leave a wide margin, provided the instance has that much free memory. If the arrays are too small, all the data stays in the cache, and the result is an astonishing cache speed measurement rather than actual memory bandwidth, which renders it meaningless.
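The sizing rule is simple arithmetic and can be checked before compiling. The sketch below assumes 8-byte doubles and STREAM's three arrays; `min_array_elements` and `total_footprint_bytes` are hypothetical helper names, and the 20 MiB cache size is only the example figure from the text. It shows that the 4x rule needs roughly 10.5 million elements per array, so 100 million elements clears the threshold by a wide margin.

```python
BYTES_PER_DOUBLE = 8
NUM_ARRAYS = 3  # STREAM keeps three arrays: a, b and c


def min_array_elements(l3_cache_bytes, factor=4):
    """Smallest per-array element count that is `factor` times the L3 cache."""
    needed_bytes = l3_cache_bytes * factor
    # Round up so the array is never smaller than the requested multiple.
    return (needed_bytes + BYTES_PER_DOUBLE - 1) // BYTES_PER_DOUBLE


def total_footprint_bytes(elements):
    """Memory needed for all three arrays at the given element count."""
    return NUM_ARRAYS * elements * BYTES_PER_DOUBLE


l3_cache = 20 * 1024 * 1024  # assumed example: 20 MiB of L3 cache

print(min_array_elements(l3_cache))            # 10485760 elements (4x rule)
print(min_array_elements(l3_cache, factor=8))  # 20971520 elements (8x rule)
print(total_footprint_bytes(100_000_000))      # 2400000000 bytes, about 2.4 GB
```

Checking the total footprint matters on small instances: a run with 100 million elements will not fit on a host with 2 GB of RAM, in which case a smaller value that still satisfies the 4x rule is the safer choice.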
After running, STREAM prints the best bandwidth achieved (in MB/s) for each test item, along with the average, minimum, and maximum time per run. You should pay close attention to the TRIAD value. Comparing this value with the theoretical memory bandwidth of your cloud server instance (if provided by the vendor) allows you to assess the efficiency of its memory subsystem. For example, on a cloud server with a theoretical bandwidth of 12.8 GB/s, a measured TRIAD value of 11.5 GB/s or higher generally indicates excellent memory performance. A significantly lower value may indicate high virtualization overhead, improper NUMA (Non-Uniform Memory Access) configuration, or other underlying bottlenecks.
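The comparison against the theoretical figure can be automated. The following sketch parses the result table that STREAM prints and computes the ratio of TRIAD to the theoretical bandwidth. The sample text is invented for illustration (it is not a measurement) and simply reuses the 11.5 GB/s and 12.8 GB/s figures from the example; `parse_best_rates` and `efficiency` are hypothetical helper names.

```python
# Row labels as they appear in STREAM's result table.
LABELS = {"Copy:", "Scale:", "Add:", "Triad:"}

# Invented sample in the shape of STREAM's output; not a real measurement.
SAMPLE_OUTPUT = """\
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           12100.0     0.132400     0.132231     0.132600
Scale:          11900.0     0.134600     0.134454     0.134800
Add:            11600.0     0.207100     0.206897     0.207300
Triad:          11500.0     0.208900     0.208696     0.209100
"""


def parse_best_rates(stream_output):
    """Return {kernel name: best rate in MB/s} from STREAM's result table."""
    rates = {}
    for line in stream_output.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0] in LABELS:
            rates[fields[0].rstrip(":")] = float(fields[1])
    return rates


def efficiency(measured_mb_per_s, theoretical_mb_per_s):
    """Fraction of the theoretical bandwidth that was actually achieved."""
    return measured_mb_per_s / theoretical_mb_per_s


rates = parse_best_rates(SAMPLE_OUTPUT)
ratio = efficiency(rates["Triad"], 12_800)  # 12.8 GB/s expressed in MB/s
print(f"TRIAD reaches {ratio:.1%} of theoretical bandwidth")  # 89.8%
```

In practice the text would come from capturing the output of `./stream`, and the threshold for "good" depends on the platform; the 90% region used in the example above is a rule of thumb rather than a standard.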
SiSoftware Sandra: A Comprehensive System Diagnostic Tool for Windows
While testing can be performed using a ported version of STREAM in Windows cloud hosting environments, SiSoftware Sandra is a more mainstream and convenient choice. It's a powerful system information, diagnostic, and benchmarking tool. Unlike STREAM's "single focus," Sandra is comprehensive, with memory bandwidth testing being just one of its many functional modules.
The advantage of using Sandra for memory testing lies in its user-friendliness and rich reference system. You don't need to handle compilation and parameter tuning; simply click on the corresponding test item in the graphical interface. It automatically recognizes the system configuration and runs a series of complex read/write, copy, and latency tests. After the test is complete, it not only provides absolute values (such as bandwidth in GB/s and latency in nanoseconds) but also a chart with multiple comparative references.
This intuitive comparison allows you to see immediately where your cloud host's memory performance stands in the market. For example, test results might show that your Windows cloud host's memory bandwidth is lower than that of a typical desktop with the same CPU. This could be due to overhead introduced by the cloud platform's virtualization layer, or to the virtual machine being allocated fewer memory channels than the physical CPU actually supports.
Beyond bandwidth, Sandra can also test memory latency and cache performance. The latency test measures the time the CPU needs to access random addresses in memory, a crucial metric for latency-sensitive applications such as high-frequency trading and real-time game servers. The cache and memory test displays the gradual decrease in bandwidth and increase in latency from L1 cache to main memory, giving a clear picture of the system's memory hierarchy.
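Latency tests of this kind are commonly built on pointer chasing: each access depends on the result of the previous one, and the addresses follow a random cycle so that hardware prefetchers cannot guess the next location. The sketch below illustrates that general technique only; it is not Sandra's implementation, the helper names are hypothetical, and in Python the time per step is dominated by interpreter overhead, so it shows the structure of the test rather than real memory latency.

```python
import random
import time


def build_random_cycle(n, seed=0):
    """Return next_index so that following it visits all n slots in one cycle.

    Uses Sattolo's algorithm, which always yields a single-cycle permutation.
    """
    rng = random.Random(seed)
    next_index = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        next_index[i], next_index[j] = next_index[j], next_index[i]
    return next_index


def chase(next_index, steps, start=0):
    """Follow the chain for `steps` dependent accesses.

    Returns (final slot, average seconds per step).
    """
    pos = start
    t0 = time.perf_counter()
    for _ in range(steps):
        pos = next_index[pos]  # next address is known only after this load
    elapsed = time.perf_counter() - t0
    return pos, elapsed / steps


chain = build_random_cycle(100_000)
final_slot, seconds_per_step = chase(chain, len(chain))
print(final_slot)  # 0: a full lap around the cycle returns to the start
```

Running the same chase over working sets of increasing size is what produces the staircase of latencies from L1 cache to main memory that such tools plot.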
In short, memory performance isn't as readily apparent as CPU frequency or hard drive capacity, but its impact on modern computing applications is decisive. STREAM and Sandra, one a precise "yardstick" on Linux and the other a versatile "dashboard" on Windows, allow you to see beyond the specifications published by cloud platforms and measure the true capabilities of the memory subsystem directly.