Infrastructure Engineer, Data Acquisition
Company: OpenAI
Location: San Francisco
Posted on: February 2, 2025
Job Description:
The Data Acquisition (DAQ) organization is responsible for
building and operating the data pipelines, crawling, storage
systems, and post-processing platforms that fuel OpenAI's research
and product development. Our Infrastructure team's mission is to
enable DAQ developers to move fast with minimal friction, providing
"just enough" reliability, observability, and tooling in a highly
dynamic environment. We value broad impact and rapid iteration over
polished perfection. As our data needs grow by orders of magnitude,
this small but scrappy team ensures that our foundational services
can keep pace.

We operate across a wide range of infrastructure
concerns, including:
- Scaling our web crawler (from modest usage to many multiples of
that in the coming year)
- Managing storage and compute for large-scale indexing,
embedding, and search workloads
- Evolving observability (metrics, logs, traces) and automation
in a flexible, 80/20 manner
- Adopting best-of-breed tooling and security from other parts of
the organization (e.g., Terraform stacks, cloud platform
practices)

Rather than enforcing rigid SLAs or designing monolithic
infrastructure, we focus on empowering DAQ teams to build, deploy,
and run their own services. Our environment can be fluid and ad
hoc: if something solves the problem quickly and reliably enough,
that is usually the correct approach.

About the Role
We're looking
for a hands-on Infrastructure Engineer with a strong bias toward
action. You'll be part of a small group of generalists responsible
for everything from ad-hoc shell scripts to cluster provisioning
automation. You will both design and implement systems: we do
create architecture, but it's rarely in the form of a lengthy
design doc that goes stale. Rather, we value prototyping, iterating,
and shipping quickly.

In this role, you will:
- Scale and maintain our data pipelines and compute clusters as
DAQ grows by large multiples in the next year
- Build out "just enough" observability (metrics, logs, tracing)
to support developer troubleshooting and performance insights
- Help design on-call processes for the infra we own, balancing
developer velocity with service reliability
- Collaborate directly with DAQ teams on deployment approaches,
ephemeral or ad hoc workloads, network/security integrations, and
more
- Prototype and implement solutions for caching, load balancing,
job scheduling, and cluster scaling, with an emphasis on iteration
speed
- Improve developer productivity by reducing friction through
better tooling, automated provisioning, and simplified environment
setups
- Adopt and integrate Infrastructure-as-Code (IaC), CI, and
security best practices from other teams, tailoring them to DAQ's
dynamic needs

You might thrive in this role if you are a broad
generalist who enjoys a scrappy, results-oriented culture, can dive
into anything from container orchestration to networking, and loves
to unblock fellow engineers by building reliable infrastructure
that supports massive growth. You're comfortable with cloud
infrastructure, automation, and observability, and you are eager to
work in a dynamic and fast-moving team environment.

Qualifications
- Proven experience in an infrastructure or backend role, ideally in
a fast-paced environment
- Comfort with at least one major cloud platform and its
associated tooling
- Familiarity with containerization and orchestration
technologies (Kubernetes or similar)
- Some background with IaC (e.g., Terraform, CloudFormation)
- Hands-on experience with monitoring/observability stacks
- Strong scripting/coding ability to build and automate solutions
(your language of choice)
- Ability to balance scrappiness and speed with robust design
when needed
- Effective communication and collaboration skills; you enjoy
enabling others

We are a small team tackling big scaling challenges
with lean resources, and this role is pivotal to enabling the next wave
of DAQ's growth. If you're excited by a high-impact, hands-on
position where you'll have broad autonomy and creative freedom to
shape infrastructure, we'd love to hear from you!

About OpenAI
OpenAI
is an AI research and deployment company dedicated to ensuring that
general-purpose artificial intelligence benefits all of humanity.
We push the boundaries of the capabilities of AI systems and seek
to safely deploy them to the world through our products. AI is an
extremely powerful tool that must be created with safety and human
needs at its core, and to achieve our mission, we must encompass
and value the many different perspectives, voices, and experiences
that form the full spectrum of humanity.

We are an equal opportunity
employer and do not discriminate on the basis of race, religion,
national origin, gender, sexual orientation, age, veteran status,
disability, or any other legally protected status.

For US-Based
Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we
will consider qualified applicants with arrest and conviction
records.

We are committed to providing reasonable accommodations to
applicants with disabilities, and requests can be made via this
link.

At OpenAI, we believe artificial intelligence has the
potential to help people solve immense global challenges, and we
want the upside of AI to be widely shared. Join us in shaping the
future of technology.