Cluster Maintenance Updates and News

Office Hours

We currently offer drop-in office hours every Wednesday and Friday from 2:00–4:00 p.m. Stop by to meet with our student consultants and ask any questions you have about using HPC resources. Whether you’re just getting started or need help with a specific issue, feel free to bring your laptop to walk us through any problems you're facing. There's no need to create a ticket in advance; if follow-up is needed, the student consultants will open a ticket on your behalf, and you'll receive further instructions.

Consulting Hours

  • Date: Every Wednesday and Friday
  • Location: GITC 2404
  • Time: 2:00 PM - 4:00 PM

HPC Fall Events

ARCS HPC invites you to our upcoming events. Please register for the events you plan to attend.

SLURM Batch System Basics

Save the Date

  • Date: Sep 18th, 2024
  • Location: Virtual
  • Time: 2:30 PM - 3:30 PM

Join us for an informative webinar designed to introduce researchers, scientists, and HPC users to the fundamentals of the SLURM (Simple Linux Utility for Resource Management) workload manager. This virtual session will equip you with essential skills to effectively utilize HPC resources through SLURM.

Registration

Introduction to Containers on Wulver

Save the Date

  • Date: Oct 16th, 2024
  • Location: Virtual
  • Time: 2:30 PM - 3:30 PM

The HPC training event on using Singularity containers provides participants with a comprehensive introduction to container technology and its advantages in high-performance computing environments. Attendees will learn the fundamentals of Singularity, including installation, basic commands, and workflow, as well as how to create and build containers using definition files and existing Docker images.
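
As a rough preview of that workflow, the commands below are a generic illustration rather than the workshop material; the image name and the Docker source are arbitrary examples:

    # Build a Singularity image from an existing Docker image
    singularity build mycontainer.sif docker://ubuntu:22.04

    # Run a command inside the resulting container
    singularity exec mycontainer.sif cat /etc/os-release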

Registration

Job Arrays and Advanced Submission Techniques for HPC

Save the Date

  • Date: Nov 20th, 2024
  • Location: Virtual
  • Time: 2:30 PM - 3:30 PM

Elevate your High-Performance Computing skills with our advanced SLURM webinar! This session is designed for HPC users who are familiar with basic SLURM commands and are ready to dive into more sophisticated job management techniques.
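
If you would like a preview of the core topic, a job array lets a single submission launch many similar tasks. The sketch below is a generic illustration, not part of the webinar material; the job name, input file pattern, and executable are placeholders:

    #!/bin/bash
    #SBATCH --job-name=array_demo
    #SBATCH --time=01:00:00
    # Launch ten tasks, indexed 0 through 9
    #SBATCH --array=0-9

    # Each task receives its own index in the SLURM_ARRAY_TASK_ID variable
    srun ./process_chunk input_${SLURM_ARRAY_TASK_ID}.dat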

Registration

HPC Summer Events

ARCS HPC invites you to our upcoming events. Please register for the events you plan to attend.

NVIDIA Workshop — Fundamentals of Accelerated Data Science

Save the Date

  • Date: July 15, 2024
  • Location: GITC 3700
  • Time: 9 a.m. - 5 p.m.

Learn to use GPU-accelerated resources to analyze data. This is an intermediate-level workshop intended for those who have some familiarity with Python, especially the NumPy and SciPy libraries. See more detail about the workshop here.

Registration

HPC Research Symposium

Save the Date

  • Date: July 16, 2024
  • Location: Student Center Atrium
  • Time: 9 a.m. - 5 p.m.

This past year has been transformative for HPC research at NJIT. The introduction of our new shared HPC cluster, Wulver, has expanded our computational capacity and made research in vital areas more accessible to our faculty. Please join us to highlight the work of researchers using HPC resources and to connect with the NJIT HPC community.

Please register for the symposium here; you can also sign up to present your HPC research as a lightning talk or poster presentation.

SLURM Workload Manager Workshop

Save the Date

  • Date: August 13 & 14, 2024
  • Location: GITC 3700
  • Time: 9 a.m. - 5 p.m.

This immersive two-day experience will take you through comprehensive technical scenarios with lectures, demos, and workshop lab environments. The SLURM trainer will help identify commonalities between previously used resources and schedulers, building understanding and adoption of SLURM job scheduling, resource management, and troubleshooting techniques.

Registration is now closed.

Wulver Maintenance

Wulver Monthly Maintenance

Beginning Feb 1, 2024, ARCS HPC will institute a monthly maintenance downtime on all HPC systems on the second Tuesday of each month from 9 AM to 9 PM. Wulver and the associated GPFS storage will be taken out of service for maintenance, repairs, patches, and upgrades. During the maintenance downtime, logins will be disabled and users will not have access to data stored in /project, /home, and /scratch. Any job that cannot finish before 9 AM will be held by the scheduler until the downtime is complete and the systems are returned to service.

We anticipate that maintenance will be completed by the scheduled time; occasionally, however, it may finish earlier than scheduled or be extended into the following days. A notification will be sent to the user mailing list when the systems are returned to service or the maintenance window is extended. In addition, users logging in to Wulver during maintenance will see a notice about the cluster's service status. Please pay attention to the Message of the Day when logging in, as it serves as a reminder of upcoming downtimes and other important cluster-related information. Users should take the maintenance window into account when scheduling jobs and planning for deadlines. Please do not contact the help desk or HPC staff, or open SNOW tickets, to request access to the cluster or data during the maintenance downtime.
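
For example, one way to avoid having a job held over the downtime is to request a wall time short enough for it to finish before maintenance begins. The following is a minimal sketch of such a batch script; the job name, resource requests, time limit, and executable are placeholders rather than recommended settings:

    #!/bin/bash
    #SBATCH --job-name=short_run
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    # Keep the wall-time limit short enough for the job to finish before the
    # 9 AM maintenance start on the second Tuesday of the month.
    #SBATCH --time=08:00:00

    # Placeholder for the actual workload
    srun ./my_analysis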

Lochness Maintenance Updates

Lochness is Back Online!

Lochness is mostly back up after being moved to a new facility. The move required complete disassembly and reassembly of the entire cluster. Eight nodes are down because they were damaged in the move; repairs are forthcoming. InfiniBand network issues affect 50 nodes, which are in the "drain" state. Currently, 120 nodes are fully functional. You can use sinfo to see the exact states of the nodes accessible to you. Please email hpc@njit.edu for assistance.
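
As a quick illustration, the following standard sinfo invocations show per-node states and filter for problem nodes; the output will depend on the partitions and nodes visible to your account:

    # One line per node, including its current state
    sinfo -N -l

    # Show only nodes that are drained or down
    sinfo -t drain,down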

Wulver Maintenance

GPFS Fileset Changes

Wulver will be out of service on Wednesday, Oct 18th, between 9:00 AM and 11:00 AM for updates and configuration changes. The maintenance will fix the stale file handle error encountered on /scratch when accessing files from the login node.

Maintenance Plans

Recommendation:

  • Each fileset gets its own inode namespace
  • Fileset names automatically inherit pool policies
  • Additional fileset settings so that chmod does not conflict with ACLs

Migration Plan:

  • Create new filesets and link them under /mmfs1/Scratch and /mmfs1/Project
    • New fileset names use the patterns sata1-project_xx and nvme1-scratch_xx (no bearing on the FS path)
    • New filesets have their own inode spaces
  • Rsync data from the old location to the new one (see the sketch after this list)
  • Job outage for the final copy and change
  • Final rsyncs
    • Remove the symlink for /mmfs1/scratch and create /mmfs1/scratch
    • Unlink/relink filesets in the new location
    • Resolve any links remaining on nodes/images
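
For illustration only, a migration of this kind typically relies on an rsync invocation along the following lines; the source and destination paths below are placeholders, and the exact options depend on how ACLs and extended attributes are configured on the filesets:

    # Copy data while preserving hard links, ACLs, extended attributes, and
    # numeric ownership (illustrative source and destination paths)
    rsync -aHAX --numeric-ids /mmfs1/old_scratch/ /mmfs1/Scratch/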

Relocation of Lochness to Databank Datacenter

Dear Lochness Users,

We hope this message finds you well. We want to inform you of a significant upcoming change affecting our HPC (High-Performance Computing) cluster, lochness.njit.edu. The GITC datacenter is scheduled for demolition on November 1, 2023. To keep our computing infrastructure in operation, we will be relocating the cluster to the Databank colocation facility in Piscataway.

Key Details:

  • Cluster Shutdown Date: October 6th, 2023 at Noon
  • Anticipated Duration: Up to Seven Days
  • Operational Continuity: After the move, the cluster will remain operational until the end of the semester.
  • User Migration: We are actively working on migrating all users to the new cluster, wulver.njit.edu, before the end of the semester.

Cluster Relocation Details:

The scheduled shutdown of the lochness.njit.edu cluster will take place on October 6th, 2023. We estimate that the relocation will take no more than five days to complete. During this time, the cluster will not be accessible. We understand the importance of uninterrupted access to computational resources and will make every effort to minimize downtime.

Operational Continuity:

Rest assured that we are committed to maintaining cluster availability for your research and academic needs. The lochness.njit.edu cluster will remain operational until the end of the current semester. This means that you will have access to its computing power throughout your ongoing projects and coursework.

User Migration:

Our team is actively working on the migration process to ensure a smooth transition for all users. We plan to migrate all users to the new cluster, wulver.njit.edu, well before the end of the semester. Detailed instructions and support will be provided to facilitate this transition, and we will keep you updated on the migration progress.

We understand that this relocation may raise questions or concerns, and we are here to address them. Please feel free to reach out to hpc@njit.edu if you have any specific inquiries or require further information.

We appreciate your understanding and cooperation during this transitional period. The relocation of our HPC cluster is aimed at providing you with an improved and more reliable computing environment.

Thank you for your ongoing support and contributions to our research community.