Job Title: Software Engineer IV
Work Arrangement: Remote (US)
Type: W2, Contract, 6 months w/ possible extension
Location: Menlo Park, CA
Pay Range: $105 – $115 per hour w/ optional benefits
Summary
The mission of the AI and Systems Co-Design team is to explore, develop, and help productize high-performance software and hardware technologies for AI. Our team defines and drives the company's AI software and hardware roadmap. We are seeking a candidate to work on a foundational tool for our internal workloads on current and next-generation AI platforms. Specifically, this position focuses on collecting, processing, storing, and analyzing performance data for various operators and workloads.
Responsibilities
- Extract operators (e.g., ATen, Triton) from AI/ML models.
- Run operators on multiple devices and collect performance data.
- Process collected data and store it to a database while maintaining data integrity.
- Implement, improve, and maintain programmatic and web interfaces to query and analyze performance data stored in the database.
- Collaborate as part of a project team to coordinate development and determine project scope and limitations.
- Review project requests to estimate time and cost required to complete the project.
- Maintain the database by ensuring data is properly saved and can be retrieved efficiently.
- Have a solid understanding of how to write data to the database and read data from it.
- Once the tasks above are complete, integrate the database into the automated testing workflow (continuous integration, CI).
Education/Experience
- Bachelor’s Degree
- Minimum of 8 years of experience
Must-Have Skills
- Hands-on experience with product-level Python programming
  - Essential for implementing, improving, and maintaining programmatic and web interfaces, as well as processing and analyzing data.
- Proficiency in PyTorch (including Kineto traces and the dispatcher) and CUDA/Triton kernels
  - Critical for extracting operators from AI/ML models, running them on devices, and collecting performance data.
- Hands-on experience in database management and SQL
  - Necessary for processing collected data, storing it in databases, and maintaining data integrity.
- Machine learning experience
Nice-to-Have Skills
- Experience with large language models (LLMs), especially Llama
  - Valuable for working with advanced AI models and potentially improving performance analysis.
- Knowledge of CI-based testing and automation
  - Helpful for ensuring code quality and automating testing processes.
- Proficiency in Linux and Bash
  - Important for working in the development environment and managing scripts and tools efficiently.