Supercomputing servers represent the pinnacle of server technology in the computing field. They are designed to solve large-scale, complex problems that traditional servers cannot handle by delivering extreme performance. Supercomputing servers play a key role in cutting-edge scientific research and also drive innovation in industries such as manufacturing, healthcare, and meteorology. What, then, are their technical features and their impacts on modern society?
Ultimate performance and heterogeneous computing architecture
The primary feature of a supercomputing server is computing power that far exceeds that of an ordinary server. Take the 2023 TOP500 list as an example: Frontier, the top-ranked supercomputer, delivered 1.194 exaflops (roughly 1.2 quintillion floating-point operations per second), about the combined computing power of 1.5 million high-performance laptops. This level of performance depends on heterogeneous computing architectures:
Multi-core CPU and accelerator collaboration: Frontier pairs AMD EPYC processors with Instinct MI250X GPUs. The CPU handles logical control and task scheduling, while the GPU executes highly concurrent computing tasks (see the code sketch after this list).
Customized chip design: For instance, the SW26010 many-core processor used by China's Sunway TaihuLight integrates 260 computing cores on a single chip, achieving a high performance-to-power ratio.
Three-dimensional stacking technology: Vertically stacking dies shortens data-transmission distances. Japan's Fugaku uses TSMC's 7 nm process and 3D packaging for its A64FX processor, raising memory bandwidth to about 1 TB/s.
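As a rough illustration of the CPU-schedules, GPU-computes split described above, the sketch below uses OpenMP target offload in C. It is a minimal, hypothetical example that assumes an offload-capable compiler; it is not production code from Frontier or any other system mentioned, and the array size and values are arbitrary.

```c
/* Minimal sketch of CPU/GPU division of labor using OpenMP target offload.
 * Illustrative only: the CPU (host) sets up the problem, the accelerator
 * (device) runs the data-parallel loop. Build with an offload-capable
 * compiler, e.g. a recent clang or gcc with -fopenmp plus offload flags. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1 << 24)   /* ~16M elements, an arbitrary problem size */

int main(void) {
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double sum = 0.0;

    /* Host (CPU) side: initialization and task setup. */
    for (long i = 0; i < N; i++) { a[i] = 0.5; b[i] = 2.0; }

    /* Device (GPU) side: the high-concurrency dot product runs on the
     * accelerator if one is present; otherwise it falls back to the host. */
    #pragma omp target teams distribute parallel for reduction(+:sum) \
            map(to: a[0:N], b[0:N]) map(tofrom: sum)
    for (long i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("dot product = %f\n", sum);
    free(a); free(b);
    return 0;
}
```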
High-speed interconnection and low-latency networks
The collaborative operation of tens of thousands of computing nodes inside a supercomputer relies on advanced interconnect technology. Cray's Slingshot network achieves an inter-node transfer rate of 200 Gb/s in Frontier and uses an adaptive routing algorithm to reduce congestion. Optical interconnects are also becoming common: the European LUMI supercomputer uses a silicon-photonics engine that integrates optical modules into the processor package, cutting latency to the nanosecond level. This network design keeps the parallel efficiency of tens of thousands of nodes above 90%, whereas traditional data centers are usually below 70%.
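To make the latency and bandwidth figures above concrete, the following is a minimal MPI ping-pong sketch in C that measures point-to-point round-trip time and throughput between two ranks. It is an illustrative micro-benchmark only; the message size and repetition count are arbitrary assumptions, and it is not tied to any specific interconnect.

```c
/* Minimal MPI ping-pong: measures average round-trip latency and effective
 * bandwidth between two ranks, the kind of metric interconnects such as
 * Slingshot are tuned for. Run with: mpirun -np 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 1000;
    const int bytes = 1 << 20;                        /* 1 MiB per message */
    char *buf = malloc(bytes);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double rtt = (t1 - t0) / reps;                /* average round-trip time */
        double gbs = 2.0 * bytes / rtt / 1e9;         /* two transfers per round trip */
        printf("avg round trip: %.1f us, effective bandwidth: %.2f GB/s\n",
               rtt * 1e6, gbs);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```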
Multi-level storage and data throughput optimization
To handle exabyte-scale data, supercomputers adopt a hierarchical storage system. Each computing node is equipped with local NVMe SSDs that provide microsecond-level cache access. Parallel file systems such as Lustre or GPFS virtualize tens of thousands of disks into a single namespace, delivering aggregate bandwidth of several terabytes per second. Intel Optane Persistent Memory, a non-volatile memory technology used in the Aurora supercomputer in the United States, retains data through power loss and cuts restart recovery time by 80%.
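The aggregated bandwidth described above is typically exercised from application code through collective parallel I/O; the sketch below shows this pattern with MPI-IO in C, where each rank writes a disjoint block of one shared file. The file name, block size, and data are illustrative placeholders, not the configuration of any machine mentioned.

```c
/* Sketch of collective parallel I/O with MPI-IO: each rank writes its own
 * disjoint block of a single shared file, letting a parallel file system
 * such as Lustre or GPFS aggregate bandwidth across many storage targets. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1 << 20;                        /* 1M doubles per rank (8 MiB) */
    double *block = malloc(count * sizeof(double));
    for (int i = 0; i < count; i++) block[i] = rank;  /* dummy data */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes at an offset determined by its rank; the collective
     * call lets the MPI library and file system coordinate the requests. */
    MPI_Offset offset = (MPI_Offset)rank * count * sizeof(double);
    MPI_File_write_at_all(fh, offset, block, count, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(block);
    MPI_Finalize();
    return 0;
}
```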
Energy efficiency management and cooling innovation
The power consumption of a supercomputer can reach the 20 MW level, comparable to the electricity load of a small city, so green computing technology becomes critical. Germany's JUPITER supercomputer adopts immersion phase-change cooling, reducing cooling energy consumption by 40% and bringing the PUE (Power Usage Effectiveness) down to 1.05; a worked example of this metric follows below. Oak Ridge National Laboratory has developed an AI model that dynamically adjusts CPU frequency, saving 15% of power while preserving computational accuracy.
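For clarity, the PUE figure quoted above can be reproduced with a small worked example: PUE is total facility power divided by IT equipment power, so 5% overhead on top of the compute load yields 1.05. The numbers in the snippet below are illustrative assumptions, not measurements from JUPITER or any other machine.

```c
/* Worked example of PUE (Power Usage Effectiveness):
 * PUE = total facility power / IT equipment power.
 * All figures here are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    double it_power_mw = 20.0;   /* power drawn by compute hardware */
    double cooling_mw  = 0.8;    /* cooling overhead */
    double other_mw    = 0.2;    /* power delivery losses, lighting, etc. */

    double total_mw = it_power_mw + cooling_mw + other_mw;
    double pue = total_mw / it_power_mw;

    /* A PUE of 1.05 means only 5% of the energy is overhead beyond computing. */
    printf("PUE = %.2f\n", pue);
    return 0;
}
```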
A multi-disciplinary software ecosystem
The value of a supercomputer is ultimately unlocked by its software stack. In parallel programming models, the combination of OpenMP and MPI achieves parallelism across tens of millions of threads, and NVIDIA CUDA accelerates GPU computing (a hybrid MPI + OpenMP sketch follows this paragraph). For containerized deployment, Singularity containers let scientific software migrate seamlessly across different supercomputing platforms. In quantum-classical hybrid computing, the integration of IBM Qiskit Runtime with supercomputers accelerates quantum algorithms for materials simulation.
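A minimal sketch of the MPI + OpenMP combination mentioned above is shown below in C: MPI distributes work across ranks (typically one or a few per node), while OpenMP threads parallelize within each rank. It is a toy numerical-integration example under those assumptions, not code from any system or project named here.

```c
/* Minimal hybrid MPI + OpenMP sketch: MPI splits work across ranks,
 * OpenMP spawns threads inside each rank. Toy example approximating pi
 * by the midpoint rule. Build e.g. with: mpicc -fopenmp hybrid_pi.c */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    /* No MPI calls happen inside the parallel region, so plain MPI_Init suffices. */
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n = 100000000;          /* total number of integration steps */
    const double h = 1.0 / n;
    double local = 0.0;

    /* Each MPI rank takes a strided share of the steps; OpenMP threads
     * split that share within the node. */
    #pragma omp parallel for reduction(+:local)
    for (long i = rank; i < n; i += size) {
        double x = (i + 0.5) * h;
        local += 4.0 / (1.0 + x * x);
    }
    local *= h;

    double pi = 0.0;
    MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi ~= %.12f (ranks=%d, threads/rank=%d)\n",
               pi, size, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```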
In physics, supercomputers simulate the evolution of the universe: the Summit supercomputer in the United States runs HACC (Hardware/Hybrid Accelerated Cosmology Code) to reproduce the distribution of dark matter at a scale of 3 trillion particles, providing large-scale tests of Big Bang cosmology. In nuclear fusion research, Europe's Marconi-Fusion supercomputer performed full-scale plasma simulations of the ITER tokamak to guide the optimization of magnetic confinement.
During the COVID-19 pandemic, supercomputers demonstrated significant value. In protein structure prediction, the Frontera supercomputer in the United States ran AlphaFold2 and completed 3D structural analysis of the novel coronavirus spike protein within a week, accelerating vaccine design. In virtual drug screening, Japan's Fugaku performed molecular docking simulations on 2,000 existing compounds and identified remdesivir as a potential therapeutic candidate within 48 hours.
In intelligent manufacturing, supercomputing empowers fluid-dynamics simulation. For the Boeing 777X airliner, turbulence simulations with 120 million grid cells were run on Germany's Hawk supercomputer, optimizing the wing design and improving fuel efficiency by 10%. Under the Materials Genome Project, China's Tianhe-3 screened out a new high-temperature alloy formulation that raises the temperature resistance of gas turbine blades to 1500°C and triples their service life.
Many countries have elevated the construction of supercomputers to the level of national strategy. The United States' Exascale Computing Project plans to invest 6 billion US dollars to deploy three exascale systems, and China's 14th Five-Year Plan explicitly sets the goal of developing ten exascale supercomputers. In the next stage, supercomputing will converge with quantum computing and brain science: Europe plans to launch a quantum-classical hybrid exascale supercomputer by 2030. As computing power becomes more accessible, cloud supercomputing platforms such as AWS ParallelCluster are lowering the barrier to entry, allowing small and medium-sized enterprises to obtain petaflops-level computing power for tens of thousands of yuan per year.