Staff Research Engineer, Pre-training Data
Reddit (4 days ago)
About this role
The Staff Research Engineer, Pre-training Data at Reddit will define the technical strategy and architecture for the data curriculum pipelines that power Reddit-native foundational LLMs. The role focuses on transforming Reddit’s large multimodal conversational corpus into high-quality training signals and on building scalable infrastructure to feed distributed training clusters. The position supports Reddit’s AI products across Safety, Moderation, Search, Ads, and next-generation user experiences.
Required Skills
- Python
- Distributed Processing
- Ray Data
- Spark
- Data Sampling
- Curriculum Learning
- PII Redaction
- Graph Data
- Multimodal Data
- Rust
About Reddit
redditinc.com
Reddit is a social news and discussion platform where users submit links, posts, and media into topic-based communities called subreddits. Content is surfaced and ranked by user voting and threaded discussions, and the site hosts popular formats like AMAs (Ask Me Anything) and community-driven events. Reddit provides moderation tools, mobile apps, and advertising products for brands and agencies. It’s known for its passionate niche communities and outsized influence on internet culture and trends.