Senior GPU Supercomputer Scheduler Engineer

NVIDIA

### Summary Description
NVIDIA has consistently transformed itself over the last two decades, starting with the creation of the GPU in 1999 that ignited the growth of the PC gaming industry while redefining modern computer graphics and altering the landscape of parallel computing. More recently, the advent of GPU deep learning has ushered in a new era of modern AI, showcasing NVIDIA's commitment to innovation. As a "learning machine," NVIDIA continually evolves by tackling complex, impactful challenges that enhance human creativity and intelligence.

Join our GPU/HPC Infrastructure team and play a pivotal role in leading the design and development of advanced GPU compute clusters that run intensive deep learning and high-performance computing (HPC) workloads. This position presents an exciting opportunity to contribute to production-grade solutions, engage with cutting-edge technology, and collaborate with industry leaders to solve pressing challenges in machine learning, cloud computing, and system co-design.

### Compensation and Benefits
- Base Salary Range: $148,000 - $419,750
- Compensation is determined by location, experience, and the pay of similar positions within the company.
- Opportunity for stock equity participation.
- Comprehensive benefits package that includes health, wellness, and financial benefits (detailed at [NVIDIA Benefits](https://www.nvidia.com/en-us/benefits/)).
- Support for continuous learning and personal development opportunities.

### Why You Should Apply for This Position Today
- **Innovation Hub**: Be part of a leading technology company known for pioneering advancements in GPU technology and AI.
- **Career Growth**: Work alongside top professionals in the field, gaining insights and experience in high-demand technologies.
- **Diverse Work Environment**: Join a team that emphasizes diversity and inclusion, cultivating a supportive and collaborative workplace.
- **Impactful Work**: Engage in projects that are not just technically challenging, but also contribute to meaningful advancements in technology and society.

### Skills
- Strong understanding of HPC batch schedulers (e.g., Slurm, RTDA, LSF).
- Proficiency in C/C++ and advanced scripting in Python, Go, and Bash.
- Established experience with Linux operating systems and environments.
- In-depth knowledge of computer architecture and operating systems.
- Strong understanding of networking protocols (InfiniBand, Ethernet).
- Experience with performance analysis and tuning for HPC workloads.
- Familiarity with container technologies like Docker, Singularity, and Podman.
- Excellent communication and interpersonal skills for effective collaboration with teams and customers.

### Responsibilities
- Design and implement enhancements for HPC batch scheduler systems.
- Collaborate extensively with HPC scheduler vendors for bug fixes and feature developments.
- Provide hands-on support for staff and end users to troubleshoot batch scheduler issues.
- Enhance the ecosystem surrounding GPU-accelerated computing.
- Conduct comprehensive performance analysis and optimization of deep learning workflows.
- Develop large-scale automation solutions to streamline operations.
- Perform root cause analysis and suggest actionable solutions for various issues.
- Anticipate and proactively address potential technological problems.

### Qualifications
- Bachelor’s degree in Computer Science, Electrical Engineering, or related discipline; equivalent hands-on experience considered.
- Minimum of 5 years of relevant work experience in high-performance computing and related fields.

### Similar Occupations / Job Titles that Would Be a Great Fit for This Role
- HPC Engineer
- Systems Architect
- Machine Learning Engineer
- Cloud Computing Specialist
- Computational Scientist
- Systems Software Engineer

### Education Requirements
- Bachelor’s Degree in Computer Science, Electrical Engineering, or a related field.

### Education Requirements Credential Category
- Bachelor’s Degree or relevant equivalent experience.

### Experience Requirements
- At least 5 years of relevant experience, particularly in high-performance computing environments, GPU technology, and related programming and scripting languages.

### Why Work in Santa Clara, CA
- **Technology Hub**: As part of Silicon Valley, Santa Clara is home to a thriving technology ecosystem, providing ample networking and career opportunities within various tech industries.
- **Innovation and Culture**: The city is known for its vibrant culture, innovative spirit, and access to world-class amenities, enriching both personal and professional lives.
- **Climate**: Enjoy a Mediterranean climate with warm summers and mild winters, conducive to outdoor activities and a high-quality lifestyle.
- **Diversity**: Experience a diverse community that fosters inclusivity, creative thinking, and collaboration across various disciplines and cultures.

Location: Santa Clara, CA

Posted: Aug. 26, 2024, 2:04 a.m.
