Machine Learning Engineer, ML Runtime & Optimization
Pony.ai
Founded in 2016 in Silicon Valley, Pony.ai has quickly become a global leader in autonomous mobility and is a pioneer in extending autonomous mobility technologies and services at a rapidly expanding footprint of sites around the world. Operating Robotaxi, Robotruck and Personally Owned Vehicles (POV) business units, Pony.ai is an industry leader in the commercialization of autonomous driving and is committed to developing the safest autonomous driving capabilities on a global scale. Pony.ai’s leading position has been recognized, with CNBC ranking Pony.ai #10 on its CNBC Disruptor list of the 50 most innovative and disruptive tech companies of 2022. In June 2023, Pony.ai was recognized on the XPRIZE and Bessemer Venture Partners inaugural “XB100” 2023 list of the world’s top 100 private deep tech companies, ranking #12 globally. As of August 2023, Pony.ai has accumulated nearly 21 million miles of autonomous driving globally. Pony.ai went public at NASDAQ in Nov. 2024.
Responsibility
The ML Infrastructure team at Pony.ai provides a set of tools to support and automate the lifecycle of the AI workflow, including model development, evaluation, optimization, deployment, and monitoring.
As a Machine Learning Engineer in ML Runtime & Optimization, you will be developing technologies to accelerate the training and inferences of the AI models in autonomous driving systems.
This includes:
- Identifying key applications for current and future autonomous driving problems and performing in-depth analysis and optimization to ensure the best possible performance on current and next-generation compute architectures.
- Collaborating closely with diverse groups in Pony.ai including both hardware and software to optimize and craft core parallel algorithms as well as to influence the next-generation compute platform architecture design and software infrastructure.
- Apply model optimization and efficient deep learning techniques to models and optimized ML operator libraries.
- Work across the entire ML framework/compiler stack (e.g.Torch, CUDA and TensorRT), and system-efficient deep learning models.