Nvidia
Senior Manager, Cloud Operations Engineering
Summary
Job Description
For over 25 years, NVIDIA has been revolutionizing computer graphics, PC gaming, and accelerated computing. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people.
At NVIDIA, we are seeking a highly skilled Senior Engineer Operations Manager to join our world-class NGC Cloud team. In this role, you will help drive the efficiency, reliability, and scalability of the systems that power our global business operations. This is an exceptional opportunity to shape how we automate, streamline, and support critical operational workflows across the organization. You will define how we implement innovative automation and support solutions, enabling teams to operate seamlessly and deliver impact at global scale—all within an encouraging and inclusive environment.
What you'll be doing:
Lead, mentor, and develop a team of 4-8 engineers, providing technical guidance, performance feedback, and career development opportunities
Build and implement comprehensive monitoring, alerting, and reporting solutions using industry-standard tools
Develop and maintain automation pipelines to streamline operational workflows and reduce manual overhead
Coordinate incident, problem, and process adjustment procedures in alignment with ITSM guidelines
Collaborate with cross-functional teams to identify operational difficulties and implement solutions
Build and maintain internal operational tools and frameworks that enhance team productivity
Ensure alignment with security and compliance standards across all operational systems and processes
Define key performance indicators and metrics to measure operational health and team performance
What we need to see:
BS/MS in Computer Science or a related technical field, or equivalent experience, combined with 8+ years of overall hands-on experience building, supporting, and managing complex services and infrastructure
Proven track record of 4+ years of leadership/management experience in a technical environment
Strong proficiency in Python for automation, data handling, and tool development
Hands-on experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, CloudWatch, or Splunk
Demonstrated expertise in ITSM practices, including incident, problem, and process improvement
Ability to implement secure and compliant offboarding procedures and manage access-related tasks
Strong understanding of IT operations, system workflows, and operational standards
Core knowledge of Java, including Collections API, Streams API, Concurrency, and I/O
Solid understanding of RDBMS and NoSQL databases, with hands-on experience in Cassandra, DynamoDB, or Redis
Ways to stand out from the crowd:
Experience designing or implementing end-to-end automation pipelines and internal operational tools
Prior experience in security-conscious or compliance-heavy environments (financial services, healthcare, SaaS, etc.)
Expertise in creating comprehensive monitoring solutions, custom dashboards, and automated reporting mechanisms
Track record of success in fast-paced, high-growth environments with constantly evolving operational needs
Strong documentation habits and demonstrated commitment to continuous improvement and knowledge management
Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 425,500 USD. You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until December 6, 2025.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
About Nvidia
NVIDIA provides the computational backbone for numerous mission-critical defense and intelligence operations.
In federal defense, Nvidia’s work is characterized by its dual identity as a critical provider of high-performance AI infrastructure and a central figure in national security policy.