Meta
AI Research Scientist, Text Data Research - MSL FAIR
Entry Level, On-site
Location
Bellevue, WA
Salary
$154k–$217k/yr
Experience
Not specified
Posted
1 week ago
Skills
LLM expertise, data curation, synthetic data generation, PyTorch, SQL, large-scale data handling, research publication, collaboration, project management
Job Description
Summary: Meta is seeking AI research scientists to help us build the data foundation for Meta's most advanced Large Language Models. The role involves collaborating with cross-functional teams to develop foundational models and tackling data challenges at scale.
Responsibilities:
- Collaborate with cross-functional teams to develop Meta’s next foundational models
- Advance our understanding of data research, such as how to overcome data walls and how best to create synthetic data
- Architect efficient and scalable data curation systems and pipelines
- Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling
- Execute on high-priority projects in pre-training, mid-training, or post-training data curation
- Apply specialized expertise in agentic data, synthetic data, reasoning data, web parsing, coding data, data scaling laws, or data-mix optimization
- Lead complex technical projects end-to-end
Required Qualifications:
- Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
- PhD in Computer Science or a related technical field
- 1+ year of industry research experience in LLM/NLP or related AI/ML models
- Experience owning and/or driving complex technical projects end to end
- Practical experience with pre-training or mid-training data curation for large foundational models and experience working with organic, synthetic, agentic, or reasoning data for LLMs
- Published research in leading peer-reviewed conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP) and/or demonstrated significant industry influence in the field of AI
Preferred Qualifications:
- Experience working on frontier-quality/state-of-the-art Large Language Models
- Multiple first-author publications in leading peer-reviewed conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP)
- Hands-on experience with modeling frameworks like PyTorch
- Hands-on experience with SQL and large-scale data handling, and familiarity with frameworks like Spark and Hive
Required Skills: LLM expertise, data curation, synthetic data generation
Important Skills: PyTorch, SQL, large-scale data handling
Nice-to-Have Skills: research publication, collaboration, project management
Benefits: Bonus, Equity, Benefits