Description
NVIDIA is leading the way in the AI revolution, transforming industries with our cutting-edge GPU technology. Our GPUs fuel groundbreaking innovations in domains such as self-driving cars, computer vision, and speech recognition. As the premier AI computing company, we relentlessly push the boundaries of AI, big data, and deep learning. We are looking for a bold, visionary leader to join us as a Senior SRE Engineering Leader. In this role, you will manage globally distributed clusters, ensuring seamless operations and delivering AI services that drive advancements in life sciences and natural language processing. Your responsibilities will include building and operating large-scale GPU clusters across multiple cloud providers and designing processes that strengthen our operational ecosystem.
Company Culture and Environment
At NVIDIA, we foster a culture of creativity and autonomy, encouraging our engineers to innovate and drive technology forward. Our teams are made up of some of the most experienced and versatile professionals in the industry, contributing to a collaborative and supportive work environment that values diverse perspectives.
Career Growth and Development Opportunities
NVIDIA offers numerous opportunities for professional growth and career advancement. As part of our extraordinary engineering teams, you will have the chance to develop your skills, mentor others, and lead projects that influence the future of AI.
Detailed Benefits and Perks
• Comprehensive benefits package including equity options
• Opportunities for ongoing training and development
• Flexible working conditions and a supportive work environment
• Access to cutting-edge technologies and tools
Compensation and Benefits
The base salary range for this position is 272,000 USD - 419,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits (https://www.nvidia.com/en-us/benefits/).
Why you should apply for this position today
By joining NVIDIA, you will be part of a transformative journey in AI, working on groundbreaking projects and collaborating with talented teams. Your contributions will directly impact the future of technology and innovation in diverse industries.
Skills
• Strong Unix/Linux knowledge and proficiency in at least two programming languages (Perl, Python, Go)
• Expertise in managing large-scale distributed systems and AI/HPC environments
• Experience supporting AI/ML workloads with operational best practices
• Leadership experience with mentoring and coaching skills
• Ability to quickly learn and integrate new technologies
• Strong collaboration skills across engineering, server, storage, and security teams
Responsibilities
• Manage distributed, multi-location GPU clusters for AI research
• Lead a team of SREs, driving cluster operational excellence and efficiency
• Deliver scalable distributed systems and AI services in fast-paced environments
• Build strong, globally distributed teams and drive technical strategy
• Collaborate across the company to enhance the GPU ecosystem for AI use cases
• Address reliability, efficiency, and productivity challenges for GPU infrastructure
• Define strategy, manage projects, and provide technical leadership across multiple areas
• Ensure transparency on budget and operational efficiency with internal collaborators
Qualifications
• 10+ years of experience in engineering management; 3+ years in leadership roles
• Bachelor’s or Master’s in Computer Science or a related field, or equivalent experience
• Proven experience managing large-scale distributed systems and AI/HPC environments
• Familiarity with deep learning frameworks like PyTorch and TensorFlow
Education Requirements
• Bachelor’s degree in Computer Science, Engineering, or a related field
• Master’s degree preferred but not required
Education Requirements Credential Category
• Computer Science
• Engineering
Experience Requirements
• 10+ years of overall engineering management experience
• 3+ years in leadership roles
• Background in supporting AI/ML workloads and driving operational best practices
Why work in Santa Clara, CA
Santa Clara is a vibrant hub for technology and innovation, home to many leading tech companies. The city offers a diverse cultural scene, excellent dining options, and beautiful parks for recreation. Living in Santa Clara provides a unique opportunity to be at the forefront of technological advancements while enjoying a high quality of life.
Location: Santa Clara, CA
Posted: Oct. 11, 2024, 4:36 p.m.