Department Summary
UCLA's Office of Advanced Research Computing (OARC) melds expert staff and technical infrastructure to amplify and accelerate the impact of UCLA research in the age of networked data and computation.
OARC expertise and resources are available to all UCLA researchers who are engaged in digital research and scholarship. We work with faculty, student, and postdoctoral researchers; instructors; and staff and administrators.
OARC is a relationship-building organization. We enable digital scholarship through collaborations, partnerships, and networked communities to advance cutting-edge research capabilities at UCLA and beyond.
OARC supports and enhances the university mission of education, research, and service through the development and execution of innovative and sustainable technology practices, programs, services, infrastructure, policies, and partnerships.
Position Summary
HPC System Administrator
The OARC High Performance Computing (HPC) Systems Research Technology Group (RTG) supports thousands of UCLA researchers and over 300 research groups through consultation and the operation of the Hoffman2 High Performance Research Cluster. More information on the Hoffman2 cluster may be found in the Hoffman2 Cluster Documentation.
The Hoffman2 cluster environment consists of approximately 1,000 compute nodes, GPU nodes, high-speed networking, high-performance storage, backup equipment, and extensive hardware and software support infrastructure, spread across multiple data centers.
The HPC System Administrator, as part of the HPC team, will serve as a technical expert supporting OARC's HPC environment in the areas of systems and application software development, HPC cluster system administration, and management of the backup system environment.
This position requires the ability to work from UCLA's Westwood campus as operational demands dictate. FlexWork / hybrid schedules will be considered based on work demands and operational needs.
Salary & Compensation
• UCLA provides a full pay range. Actual salary offers consider factors including budget, prior experience, skills, knowledge, abilities, education, licensure and certifications, and other business considerations. Salary offers at the top of the range are not common. Visit UC Benefit package to discover benefits that start on day one, and UC Total Compensation Estimator to calculate the total compensation value with benefits.
Qualifications
• 3 years of experience with software and applications development, Linux system administration, and two or more modern programming languages (e.g., Python, C++, Java). (Required)
• Expert knowledge of Python, SQL, bash, git, and associated build systems, libraries, and development tools. Demonstrated knowledge of common programming paradigms (e.g., asynchronous, concurrent, and object-oriented). Demonstrated ability to create high-quality system tools and software. (Required)
• Ability to work independently or in a development team, and effectively estimate time and effort required to complete tasks. Ability to analyze, benchmark, debug, and test software in a technically sound manner and to generate clear, readable reports and summaries. (Required)
• Demonstrated working knowledge of HPC cluster architectures and concepts (e.g., provisioning, benchmarking, scalability, and parallelizing code) and ability to stay current with industry best practices. (Required)
• Detailed knowledge of Red Hat Enterprise Linux and related distributions. Solid system administration skills including scripting, pipelines, and UNIX operating system fundamentals. (Required)
• Working knowledge of protocols, applications, and formats including, but not limited to, TCP/IP, HTTP, DHCP, SSH, NFS, JSON, XML, and HTML. (Required)
• Demonstrated ability to troubleshoot and debug computing problems, including corrupted data, file management, application software, and operating system problems. Ability to respond accurately and independently to production problems in multiple complex operating systems and software components. (Required)
• Knowledge of validation, verification, and disaster recovery capabilities for both hardware and software. (Required)
• Demonstrated skill in writing well-organized, complete, and technically and grammatically correct documents and procedures to be used by technical and non-technical personnel of diverse backgrounds at various levels in the organization, including researchers, peers, and management. (Required)
• Demonstrated oral communication and presentation skills sufficient to effectively obtain and impart technical information and explain concepts on a one-to-one basis as well as in meetings with or presentations to multiple clients. (Required)
Location: Los Angeles, CA
Posted: Aug. 28, 2024, 12:03 a.m.