New GPU Offering / U-M Information and Technology Services

1/17/2024

By Stephanie Dascola

We are pleased to introduce NVIDIA’s Multi-Instance GPU (MIG) technology to Great Lakes, featuring a total of 16 GPU instances. This adds GPU capacity for jobs that need only a single GPU.

HOW MIG WORKS:

NVIDIA’s Multi-Instance GPU (MIG) technology divides our 8 physical GPUs into 16 isolated instances (two per GPU). Each instance behaves as an independent GPU with its own dedicated compute and memory resources, so GPU resources can be allocated more efficiently.

KEY BENEFITS AND LIMITATIONS:

Efficient Resource Allocation: MIG’s partitioning ensures that your tasks receive dedicated GPU resources, avoiding resource contention and enhancing efficiency.
Enhanced Scalability: Run multiple GPU workloads concurrently without conflicts, simplifying project scaling.
Flexibility: Customize GPU instances to match your application requirements, optimizing performance and cost-effectiveness.
Single-GPU Jobs Only: MIG is intended only for jobs that use a single GPU slice. A single process cannot run across multiple MIG devices, so Slurm will only accept jobs that request one GPU.
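Because each MIG instance appears to a job as an ordinary GPU, a job script can check that it received exactly one device. The sketch below assumes Slurm exposes the allocated GPU through `CUDA_VISIBLE_DEVICES`, as it normally does for GPU GRES. On MIG nodes the value may be an index or a `MIG-` UUID, depending on how the cluster is configured. The function name `count_visible_gpus` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the GPU devices listed in CUDA_VISIBLE_DEVICES.
# Assumption: the variable holds a comma-separated list (indices or UUIDs); unset means none.
count_visible_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    local -a ids
    IFS=',' read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

# Warn if the job sees more than one device; MIG jobs should see exactly one.
if [ "$(count_visible_gpus)" -gt 1 ]; then
    echo "Warning: more than one GPU visible; MIG jobs are limited to one." >&2
fi
```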

GETTING STARTED:

To access the nodes equipped with MIG technology, submit your jobs to the Slurm partition “gpu_mig40”. The partition name reflects the amount of memory available to each GPU instance (40 GB).
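You can inspect the nodes behind the “gpu_mig40” partition and the GPU resources Slurm reports for them by filtering the standard `sinfo` command by partition. The sketch below wraps that command in a small helper. The function name `show_partition` is hypothetical, and the exact GRES strings depend on the cluster's configuration.

```shell
#!/bin/bash
# Hypothetical helper: summarize a Slurm partition (default: gpu_mig40).
# Columns: partition, node count, node state, node list, generic resources (GRES).
show_partition() {
    local partition="${1:-gpu_mig40}"
    sinfo --partition="$partition" --format="%P %D %t %N %G"
}

# Example usage on a login node:
#   show_partition        # summarize gpu_mig40
#   show_partition gpu    # compare with the full-GPU partition
```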

Important Notice: Each job can use only a single GPU. If your job needs more than one GPU, use the “gpu” or “spgpu” partition, depending on your GPU needs.

EXAMPLE SLURM JOB SUBMISSION:

sbatch --partition=gpu_mig40 --gres=gpu:1 your_job_script.sh
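In the command above, `your_job_script.sh` is a placeholder for your own script. The sketch below shows one possible version of that script, with the same partition and GPU request written as `#SBATCH` directives. The job name, account name `your_account`, time limit, and memory values are hypothetical placeholders that you would replace with your own.

```shell
#!/bin/bash
# Minimal sketch of your_job_script.sh for the gpu_mig40 partition.
# Job name, account, time limit, and memory are hypothetical placeholders.
#SBATCH --job-name=mig_example
#SBATCH --account=your_account
#SBATCH --partition=gpu_mig40
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00
#SBATCH --mem=8G

# Slurm sets these variables inside a job; the defaults apply outside Slurm.
job_id="${SLURM_JOB_ID:-none}"
partition="${SLURM_JOB_PARTITION:-unknown}"
echo "Job $job_id running on partition $partition"

# List the GPU device(s) visible to this job, if the NVIDIA tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || echo "nvidia-smi could not list GPUs" >&2
fi

# Replace this comment with your actual GPU workload.
```

With the options stored in the script, `sbatch your_job_script.sh` is enough on its own. Any option given on the command line overrides the matching `#SBATCH` line.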

If your job can run on either the “gpu_mig40” or the “gpu” partition, you can list both, and the scheduler will run your job on whichever partition can start it first.

sbatch --partition=gpu_mig40,gpu --gres=gpu:1 your_job_script.sh
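When a job lists both partitions, you may want to know where it actually ran. The sketch below assumes you submit with `sbatch --parsable`, which prints the job ID (optionally followed by `;cluster`), and then query `squeue`. The helper name `job_partition` is hypothetical. While the job is still pending, `squeue` may show the full partition list. Once the job finishes, it no longer appears in `squeue`, and `sacct` covers that case.

```shell
#!/bin/bash
# Hypothetical helper: print the partition of a queued or running Slurm job.
# Accepts raw `sbatch --parsable` output such as "12345" or "12345;cluster".
job_partition() {
    local job_id="${1%%;*}"   # drop an optional ";cluster" suffix
    squeue --noheader --jobs="$job_id" --format="%P"
}

# Example usage on a login node:
#   jobid=$(sbatch --parsable --partition=gpu_mig40,gpu --gres=gpu:1 your_job_script.sh)
#   job_partition "$jobid"
```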

For questions or support requests, please contact our team at [email protected]