Overview
Researchers in the Physical and Computational Sciences Directorate (PCSD) lead major R&D efforts in experimental and theoretical interfacial chemistry, chemical analysis, high energy physics, interfacial catalysis, multifunctional materials, and integrated high-performance and data-intensive computing.
PCSD is PNNL’s primary steward for research supported by the Offices of Basic Energy Sciences, Advanced Scientific Computing Research, and Nuclear Physics, all within the Department of Energy’s Office of Science.
Additionally, Directorate staff perform research and development for private industry and other government agencies, such as the Department of Defense and NASA. The Directorate's researchers are members of interdisciplinary teams tackling challenges of national importance that cut across all missions of the Department of Energy.
Responsibilities
The Data Sciences & Machine Intelligence group in the Advanced Computing, Mathematics, and Data Division at PNNL seeks a multifaceted Data Scientist to lead and support scientific research in the broad areas of data science, artificial intelligence (AI), and machine learning (ML), with a focus on advancing methods in natural language processing (NLP) and their applications. This is an excellent opportunity to develop your scientific career in an outstanding research institution by joining an interdisciplinary research team that focuses on accelerating scientific discovery. The primary emphasis of this position will be growing existing capabilities and crafting new ones in AI/ML, NLP, and scientific machine learning to strengthen the group’s leadership position in data science and machine intelligence. A successful candidate will have demonstrated expertise in the broad areas of data science and NLP, such as: (i) training and evaluation of language models; (ii) application of NLP methods to domain problems/tasks; (iii) high performance computing; (iv) high-level languages such as Python and AI/ML libraries such as PyTorch and HuggingFace; and (v) contributions to proposals and white papers. We are looking for a proactive, highly motivated individual with an aptitude for contributing to multidisciplinary teams.
• Applies knowledge of statistics, machine learning, advanced mathematics, simulation, software development, and data modeling to integrate and clean data, recognize patterns, address uncertainty, pose questions, and make discoveries from structured and/or unstructured data.
• Produces solutions driven by exploratory data analysis from complex and high-dimensional datasets.
• Designs, develops, and evaluates predictive models and advanced algorithms that lead to optimal value extraction from the data. Demonstrates ability to transfer skills across application domains.
• Design, develop, and implement methods, processes, and systems to analyze data from several application domains of national interest, including basic science and policy.
• Apply your knowledge of NLP, multimodal representation learning, and data analytics to integrate and clean data, recognize patterns, pose questions, and/or make discoveries from structured and/or unstructured data.
• Lead the development and evaluation of predictive and generative models (language models) for the extraction of maximum value from data and exercise your ability to transfer skills across application domains.
• Develop and maintain high quality software for machine learning/NLP projects.
• Train and deploy models on HPC/cloud infrastructure to support production grade applications.
• Produce solutions driven by exploratory data analysis from complex and high-dimensional datasets.
• Lead the publication and presentation of results in high-impact scientific computing journals, at conferences, and to sponsoring agencies.
• Mentor and train graduate and undergraduate interns.
Qualifications
Minimum Qualifications:
• BS/BA and 2 years of relevant experience -OR-
• MS/MA -OR-
• PhD
Preferred Qualifications:
• PhD in Data Science, Computer Science, Electrical and Computer Engineering, or a closely related field.
• Experience working in Python, including PyTorch, and with libraries and tools commonly used in machine learning for NLP (e.g., HuggingFace, DeepSpeed, Docker).
• Solid knowledge of core skills in data science and machine learning, including data/information curation from large unstructured data in the form of PDFs (text, image).
• Hands-on experience in training, studying, or analyzing large language models for specific tasks or domains.
• Proficiency in computer science and engineering skills. Experience on a production-grade deployment team is a big plus, especially with applications in a cloud environment (AWS, Azure, Vertex AI).
• Publications in top-tier venues (ACL, EMNLP, NAACL, AAAI, NeurIPS, ICLR, etc.) and/or contributions to high-quality software tools.
Hazardous Working Conditions/Environment
Not applicable
Additional Information
Not applicable
Testing
Location: Richland, WA
Posted: Oct. 7, 2024, 9:46 a.m.