Completed: Quarterly Cluster Maintenance Wednesday January 3
Submitted by nlc60 on Wed, 01/03/2024 - 21:05
Dear CRC User Community,
We have completed our quarterly maintenance and have returned the clusters to production.
Some notable changes to the CRC clusters are the following:
- Enabled cgroup-based resource management for SLURM jobs on all clusters
- Updated SLURM to 22.05.11
- Upgrades on ix and ix1
- GPU cluster a100_multi partition usage policy implementation:
  - Jobs must request at least 2 nodes and no more than 8 nodes
  - Jobs submitted to this partition can no longer undersubscribe the nodes they request; each job must request the full number of GPUs on every node. Attempting to undersubscribe will yield the following message on submission:
    ERROR: Your job is not requesting the full number of GPUs on the a100_multi partition node
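For anyone adjusting job scripts, the a100_multi rules above can be sketched as a small pre-submission check. This is an illustration only, not the check the scheduler itself performs. The function name and the gpus_on_node parameter are hypothetical: the number of GPUs on an a100_multi node is not stated in this announcement, so the caller must supply it.

```python
def check_a100_multi_request(nodes, gpus_per_node, gpus_on_node):
    """Return a list of policy problems; an empty list means none were found.

    nodes         -- number of nodes the job requests
    gpus_per_node -- number of GPUs the job requests on each node
    gpus_on_node  -- ASSUMPTION: GPUs physically present on one a100_multi
                     node (not given in the announcement; supplied by caller)
    """
    problems = []

    # Rule 1: at least 2 nodes and no more than 8 nodes.
    if nodes < 2 or nodes > 8:
        problems.append(
            "Jobs must request at least 2 nodes and no more than 8 nodes"
        )

    # Rule 2: no undersubscription; every GPU on each node must be requested.
    if gpus_per_node != gpus_on_node:
        problems.append(
            "ERROR: Your job is not requesting the full number of GPUs "
            "on the a100_multi partition node"
        )

    return problems


if __name__ == "__main__":
    # Example with a placeholder value of 8 GPUs per node (an assumption).
    for message in check_a100_multi_request(nodes=1, gpus_per_node=4, gpus_on_node=8):
        print(message)
```

A request that satisfies both rules returns an empty list; anything else returns one message per rule it breaks.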
Thank you for your patience during this downtime. As always, please log in and submit a help ticket if you encounter any post-maintenance problems.
Sincerely,
The CRC Team