Principal Engineer, AIOps
Company: NVIDIA Corporation
Location: Santa Clara
Posted on: February 1, 2025
Job Description:
We are looking for an AIOps Principal
Engineer who can design, develop, and deploy AI-powered solutions
for IT operations. You will work with a team of engineers, data
scientists, and domain experts to create and implement innovative
applications that leverage NVIDIA's Observability, Infrastructure
and Gen AI platforms. You will also collaborate with internal and
external customers to understand their needs, define requirements,
and deliver high-quality products.
What you'll be doing:
- Lead the design, development, testing, and deployment of the
AIOps platform.
- Apply machine learning, deep learning, natural language
processing, and other AI techniques to solve IT operations
challenges such as anomaly detection, root cause analysis, incident
management, and automation.
- Improve IT Infrastructure and Operations Management by defining
and measuring AIOps metrics such as accuracy, reliability,
scalability, performance, and efficiency.
- Implement observability principles and practices such as
monitoring, logging, tracing, and alerting.
- Apply data science engineering practices such as data
collection, data cleaning, data analysis, data modeling, and data
visualization.
- Integrate AIOps tools with IT operations management (ITOM) and
IT service management (ITSM) systems, including service desk,
change management, and configuration management.
- Demonstrate solid leadership skills and ability to lead and
empower engineers and data scientists.
- Design and communicate the AIOps roadmap, vision, and strategy
to the team and the partners.
- Collaborate effectively with customers, such as IT managers,
business users, vendors, and partners, to ensure alignment and
satisfaction.
- Play a pivotal role in harnessing AI, generative AI, and
machine learning for NVIDIA IT teams.
What we need to see:
- Bachelor's degree or higher in computer science, engineering,
or related field (or equivalent experience).
- 15+ years of industry experience in extensive engineering
projects, with a particular emphasis on infrastructure automation,
distributed systems, and tool development for managing large-scale
private or public cloud systems.
- 5+ years of experience working with, and a solid understanding
of, AIOps technologies and platforms.
- Proficiency in AI frameworks and libraries such as TensorFlow or
PyTorch.
- Proficiency in Python and Go programming; your coding and
debugging expertise is pivotal to your success in this role.
- Demonstrated commitment to sound software engineering
principles and a strong willingness to acquire new skills.
- Experience in working with IT systems, tools, and processes
such as ITSM, ITOM, monitoring, logging, and alerting.
- Ability to work independently and collaboratively in a
fast-paced and dynamic environment.
- Hands-on experience designing and implementing end-to-end
architecture and large-scale rollout of AIOps products.
- Experience developing Gen AI applications using LLMs and RAG for
incident diagnosis, root cause identification, and incident
resolution.
Ways to stand out from the crowd:
- Proficiency in developing and deploying generative AI solutions
such as language models, chatbots, and conversational assistants.
- Hands-on experience integrating workflow automation tools
with AIOps for incident resolution and self-healing.
- Deep background and understanding of Machine Learning:
developing, training, and applying machine learning models across
large operational datasets.
- Experience with pre-training and fine-tuning LLMs and
working with ML frameworks such as scikit-learn, XGBoost, PyTorch,
and TensorFlow.
- Hands-on experience with various AIOps platforms such as
BigPanda, Datadog, Moogsoft, ITOM Health, Splunk, Elastic Stack,
Dynatrace, New Relic, etc.
NVIDIA is widely considered to be one of
the technology world's most desirable employers. We have some of
the most forward-thinking and hardworking people in the world
working for us. If you're a creative individual who thrives on
achieving goals and enjoys a dynamic learning environment, then why
not seize this opportunity? Apply today!
The base salary range is
248,000 USD - 385,250 USD. Your base salary will be determined
based on your location, experience, and the pay of employees in
similar positions.
You will also be eligible for equity and benefits.
NVIDIA accepts applications on an ongoing basis.
NVIDIA is
committed to fostering a diverse work environment and proud to be
an equal opportunity employer. As we highly value diversity in our
current and future employees, we do not discriminate (including in
our hiring and promotion practices) on the basis of race, religion,
color, national origin, gender, gender expression, sexual
orientation, age, marital status, veteran status, disability status
or any other characteristic protected by law.