Announcement: Wynton Reboots on 2024-09-12

Categories: announcement, maintenance

Author: Erik Ellestad
Affiliation: UCSF Wynton HPC Team
Published: September 11, 2024

TLDR

On Thursday, September 12 at 12:30 PT, we will reboot all Wynton interactive nodes to upgrade the kernel and to implement RAM and CPU limits on individual user sessions. To avoid data loss, please save your work and log out before the scheduled reboot.

The Rest of the Story™

To increase the stability of the Wynton dev, login, and data transfer nodes, we are implementing CPU and memory limits for user sessions on those nodes.

On dev and data transfer nodes, each user will be limited to the equivalent of 2 CPUs, and any user session or process that takes more than 96 GiB of the system's available memory will be killed.

On login nodes, each user will be limited to the equivalent of 1 CPU, and any user session or process that takes more than 32 GiB of the system's available memory will be killed.

By introducing these CPU and memory limits, we hope to avoid most of the traffic jams that have occurred on the interactive nodes in the past. Until now, a single user's process could bring these machines to a standstill, leaving them unresponsive or even causing them to crash. The new limits lower the risk of any individual process overconsuming CPU and memory, which we hope will result in a smoother ride for everyone using these shared machines.

To implement these changes and apply a Linux kernel update, we will reboot all interactive nodes on Thursday, September 12: log1, log2, plog1, dt1, dt2, pdt1, pdt2, dev1, dev2, dev3, pdev1, gpudev1, and pgpudev1.

To prevent data loss, please save your work and log out of any interactive sessions before the scheduled reboot.

From the Wynton website:

Although you should always run analyses via the job scheduler, there are times when you may need to develop parts of it interactively at the command-line prompt. For instance, you may need to install some software, a few R packages, or run some quick tests on your new pipeline. Wynton HPC provides development nodes dedicated for such short-term usages and that are configured similarly to the compute nodes. The dev nodes are meant for light interactive work and quick test jobs. Please submit all heavy processing to the cluster via qsub.

Addendum

  • 2024-09-12: Jobs on the scheduler, whether queued or running, are not affected by this reboot of the interactive nodes.