Dense Memory Cluster (DMC)
The DMC at the Alabama Supercomputer Center has 3740 CPU cores and 26.1 terabytes of distributed memory. Each compute node has a local disk, of which up to 7.5 terabytes are accessible as /tmp. Also attached to the DMC is a Spectrum Scale storage cluster, which provides 93 terabytes of high performance storage accessible as /scratch from each node. Home directories and third party applications share 750 terabytes of storage on a separate BeeGFS volume.
The machine is physically configured as a cluster of SMP boards with 20, 24, 36, 128, or 192 CPU cores each. Thirty-eight nodes have 2.5 GHz Intel 10-core Xeon Ivy Bridge processors and 128 GB of memory. Twelve nodes have 2.1 GHz 18-core Broadwell processors and 128 GB of memory. One node has 2.1 GHz Skylake-SP processors and 6 TB of memory. Twenty-two nodes have 2.7 GHz 18-core Skylake-SP processors and 96 GB of memory. Thirteen nodes have 2.0 GHz 64-core Milan processors and 1 TB of memory. One node has 2.3 GHz Intel 12-core Haswell processors and 128 GB of memory. One node has 2.2 GHz 12-core Broadwell processors and 128 GB of memory. The three login nodes are 16-core virtual machines emulating Ivy Bridge, but running on Haswell hardware.
The DMC has 14 NVIDIA GPU (Graphics Processing Unit) chips. These comprise one node with two Tesla P100 cards with 16 GB of memory each, one node with four Volta V100 cards with 32 GB of memory each, and two nodes with four Ampere A100 cards with 40 GB of memory each. These multicore GPU chips are similar to those in video cards, but are installed as math coprocessors. This can give significant performance advantages for software that has been adapted to use these processors.
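The GPU count above can be checked with a short tally. This is only an illustrative sketch: the variable and function names are made up for the example, and it assumes the memory figures quoted in the text are per card.

```python
# GPU inventory as described in the text.
# Each entry: (card model, number of nodes, cards per node, GB of memory per card).
# Assumption: the memory figures are per card, not per node.
gpu_inventory = [
    ("Tesla P100", 1, 2, 16),
    ("Volta V100", 1, 4, 32),
    ("Ampere A100", 2, 4, 40),
]


def total_gpus(inventory):
    """Return the total number of GPU cards across all nodes."""
    return sum(nodes * cards for _, nodes, cards, _ in inventory)


def total_gpu_memory_gb(inventory):
    """Return the combined GPU memory in GB across all cards."""
    return sum(nodes * cards * mem_gb for _, nodes, cards, mem_gb in inventory)


gpu_count = total_gpus(gpu_inventory)
gpu_memory_gb = total_gpu_memory_gb(gpu_inventory)
print(f"GPUs: {gpu_count}, combined GPU memory: {gpu_memory_gb} GB")
```

Under these assumptions the tally gives 14 cards, which agrees with the figure quoted in the text.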
Thus the processing capacity of the DMC is:
Conventional processing theoretical peak capacity - 237 TFLOPs
Single precision theoretical peak GPU capacity - 233 TFLOPs
Double precision theoretical peak GPU capacity - 117 TFLOPs
Total DMC capacity (conventional plus single precision GPU) - 470 TFLOPs
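The total appears to be the sum of the conventional figure and the single precision GPU figure. A minimal sketch of that arithmetic follows; the variable names are illustrative only, and the double precision sum is shown purely for comparison, not as a figure from the text.

```python
# Theoretical peak figures quoted in the list above, in TFLOPs.
conventional_tflops = 237
gpu_single_precision_tflops = 233
gpu_double_precision_tflops = 117

# The quoted total matches conventional + single precision GPU capacity.
total_tflops = conventional_tflops + gpu_single_precision_tflops

# For comparison only: the same sum using the double precision GPU figure.
total_double_precision_tflops = conventional_tflops + gpu_double_precision_tflops

print(f"Total (single precision GPU): {total_tflops} TFLOPs")
print(f"Total (double precision GPU): {total_double_precision_tflops} TFLOPs")
```

Which total is relevant depends on whether a given workload runs on the GPUs in single or double precision.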
The DMC has a 10 Gbps connection to the Internet via a firewall. Within the cluster, message passing and traffic to the shared file system servers travel over an FDR InfiniBand network.
Home directory storage, applications, and shared data are on a parallel, shared file system (currently BeeGFS). The home file system currently has 750 TB of usable disk space. A high performance, shared scratch file system (currently Spectrum Scale) has a 92 TB capacity.