Job Title: Software Engineer IV
Contract Duration: 6 months, possible extension
Location: Menlo Park, CA
Work Arrangement: Remote
Summary
The AI and Systems Co-Design team has a mission to explore, develop, and
help productize high-performance software and hardware technologies for AI.
Our team defines and drives the AI software and hardware roadmap at Meta.
We are seeking a candidate to work on a foundational tool for our internal
workloads on current and next-generation AI platforms. Specifically, this
position focuses on collecting, processing, storing, and analyzing performance
data for various operators and workloads.
Responsibilities
- Extract operators (e.g., ATen, Triton) from AI/ML models.
- Run operators on multiple devices and collect performance data.
- Process collected data and store it in a database while maintaining data
  integrity.
- Implement, improve, and maintain programmatic and web interfaces to query
  and analyze performance data stored in the database.
- Collaborate as part of a project team to coordinate development and
  determine project scope and limitations.
- Review project requests to estimate the time and cost required to complete
  the project.
Skills
- Hands-on experience with production-level Python programming
- Proficiency in PyTorch, including Kineto traces and the dispatcher
- Hands-on experience with CUDA and Triton kernels
- Hands-on experience in database management and SQL
- Proficiency in Linux and Bash
- Ability to work independently
Good-to-Have Skills
- Experience with large language models (LLMs), especially Llama
- Knowledge of CI-based testing and automation
Education/Experience
- At least three years of experience with the above-mentioned skills is
  required for this role.
Must Have Skills
- Hands-on experience with production-level Python programming
  Essential for implementing, improving, and maintaining programmatic and
  web interfaces, as well as processing and analyzing data.
- Proficiency in PyTorch (including Kineto traces and the dispatcher) and
  CUDA/Triton kernels
  Critical for extracting operators from AI/ML models, running them on devices,
  and collecting performance data.
- Hands-on experience in database management and SQL
  Necessary for processing collected data, storing it in databases, and
  maintaining data integrity.
- Machine learning experience
Nice to Have Skills
- Experience with large language models (LLMs), especially Llama
  Valuable for working with advanced AI models and potentially improving
  performance analysis.
- Knowledge of CI-based testing and automation
  Helpful for ensuring code quality and automating testing processes.
- Proficiency in Linux and Bash
  Important for working in the development environment and managing scripts
  and tools efficiently.