HPC System Administrator

Roush

📍 United States

Job highlights

Summary

Join Roush's team of innovators and work alongside the best and brightest to provide product development solutions in a diverse range of industries. As an HPC System Administrator, you will be responsible for day-to-day operational support of the Roush CAE HPC and VDI hardware and software infrastructure.

Requirements

  • Bachelor's degree in engineering, computer science, or a related field
  • Experience with Red Hat Enterprise Linux or similar Linux distributions (Fedora, CentOS Stream, AlmaLinux, and/or Rocky Linux)
  • Experience with Bash, Python, and/or similar scripting languages
  • Experience with Microsoft Office products (Excel, PowerPoint, SharePoint, Teams, etc.)
  • U.S. citizenship, to allow for International Traffic in Arms Regulations (ITAR) compliance
  • Self-starter who can identify requirements independently and propose solutions, with the flexibility to handle changing priorities and work on several projects simultaneously
  • Excellent documentation skills and the ability to communicate well with people of diverse backgrounds and computer knowledge
  • High level of personal commitment; occasional availability on weekends and outside normal working hours is required to maintain system uptime and support system maintenance schedules
  • Aptitude to learn from others, share knowledge with others, and promote continuous improvement of our processes
  • Ability to work with engineering staff and users, helping and instructing them to use the HPC resources optimally

Responsibilities

  • Responsible for the day-to-day operational support of the Roush CAE HPC clusters, VDI, and backup servers: manage and resolve any hardware and software issues that arise
  • Assist in hardware and software upgrade programs to implement new technologies
  • Write help documents for users, and develop functional and technical designs for automated tools that help users optimize HPC jobs, following the Roush CAE HPC change management guidelines
  • Identify bottlenecks and assist in maximizing performance of our HPC applications
  • Provide advice and support to Roush HPC users
  • Interact confidently and professionally with various audiences and stakeholders at all levels
  • Keep abreast of the latest HPC and industry developments and investigate the suitability of newly available technologies, including but not limited to: new CPU/GPU technologies, HBM, memory and high-speed interconnects, web-based software technologies, and parallel high-performance computing application tuning and optimization

Preferred Qualifications

  • Minimum of 2 years' experience in HPC system administration and supporting CAE users
  • Experience in installation, configuration, administration, and use of CAE software (LS-DYNA, Nastran, STAR-CCM+, Abaqus, Fluent, etc.)
  • Experience in installation, configuration and administration of queue systems such as SLURM/LSF/PBS
  • Experience in installation, configuration and administration of Virtual Desktop Infrastructure (VDI) applications
  • Willingness to try new tools / technologies and improve process and cost effectiveness
  • Knowledge of HPC interconnect technologies (InfiniBand, Omni-Path, MPI etc.)
  • Knowledge and understanding of network technologies such as TCP/IP and networked file systems such as NFS, GFS, Lustre, and GPFS

Benefits

  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Life insurance
  • Earned sick time
  • STD (Short-Term Disability) insurance
  • LTD (Long-Term Disability) insurance
  • 401(k) retirement plan
  • Tuition reimbursement
  • Paid vacation
  • Paid holidays
