Infrastructure Systems Engineer

US, CA, Santa Clara
Full Time On-site

Job Description

NVIDIA’s Kernel Infrastructure team is looking for a hands-on Systems Engineer to manage the environment readiness, configuration, and long-term health of our next-generation GPU platforms. You will own the key lifecycle phase where early production hardware meets software. Your role ensures our innovative systems are stable, optimized, and continuously maintained for engineering teams.

If you love being hands-on with early-stage computing platforms, debugging complex hardware-to-software environments, and owning the operational stability of fast-evolving infrastructure, join us in Santa Clara, CA.

What you'll be doing:

  • Early Production Bring-up & Tuning: Drive early-stage engineering systems to a performance-ready state. Handle firmware/VBIOS flashing, core clock configurations, power-state enablement, and system tuning.

  • Triage & Cross-Functional Collaboration: Act as the first line of defense for complex system and environment-level issues, coordinating directly with firmware, hardware design, and platform teams to unblock engineering.

  • Fleet Health & Maintenance: Monitor and optimize the ongoing health of the hardware fleet. Implement proactive health checks, diagnose degrading systems, and provide manual recovery when automated workflows fall short.

  • Standardization & Allocation: Establish and detail the "golden" system baselines (drivers, firmware, configurations) required for stable engineering execution as the product evolves. Track hardware inventory and manage demands from engineering teams to improve hardware utilization.

What we need to see:

  • Degree in Computer Engineering, Electrical Engineering, Computer Science, or equivalent experience.

  • 3+ years in systems engineering, infrastructure operations, or hardware validation environments handling early-stage platforms.

  • Deep Linux and Windows system administration with strong debugging capabilities across the hardware-to-software stack.

  • Proficiency in scripting and automation (shell scripting, Python, Ansible, etc.).

  • Hands-on experience with Slurm, Kubernetes, or other cluster management platforms.

  • Strong, clear written and verbal communication skills, including the ability to explain complex technical concepts to non-technical audiences.

  • Strong problem-solving skills and a collaborative approach.

  • Self-motivated individual and a great teammate.

Ways to stand out from the crowd:

  • Experience managing HPC clusters at scale.

  • A proven track record of configuring and maintaining bring-up systems and early hardware prototypes.

  • Demonstrated technical curiosity and a drive to innovate.

  • Mechanically inclined and comfortable with tools and hands-on physical work.

  • Positive and cooperative, with the determination to help us reach the finish line.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 124,000 USD - 195,500 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 28, 2026.

This posting is for an existing vacancy. 

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About NVIDIA

NVIDIA is one of the most influential technology companies in the world, powering the modern era of artificial intelligence, high-performance computing, graphics, and autonomous systems. Originally known for its leadership in gaming GPUs, NVIDIA has evolved into the backbone of AI infrastructure, designing the chips, software, and systems that train and deploy large-scale AI models used across industries from healthcare and robotics to autonomous vehicles and scientific computing.

For job seekers, NVIDIA offers opportunities at the forefront of deep tech, spanning software engineering, AI research, systems engineering, hardware design, networking, robotics, and developer tools. A major focus of its work is the CUDA software platform and AI ecosystem, which enables developers to program GPUs at massive scale and has become foundational to modern machine learning and data center computing. This makes NVIDIA especially attractive to engineers, researchers, and technologists who want to work directly on the infrastructure powering today’s AI revolution.

Unlike traditional hardware companies, NVIDIA operates as a full-stack computing platform company, integrating silicon, systems, and software into a unified ecosystem. Employees may work on everything from GPU architecture and data center systems to AI frameworks, simulation platforms like Omniverse, and autonomous vehicle technology through the DRIVE platform. This breadth allows teams to operate at the intersection of research and production-scale deployment, with direct impact on global computing infrastructure.

As demand for AI, accelerated computing, and autonomous systems continues to grow rapidly, NVIDIA remains one of the most important employers in technology and advanced engineering. For professionals seeking a high-impact career at the center of AI development, where breakthroughs quickly translate into real-world systems at global scale, NVIDIA stands out as one of the most dynamic and sought-after destinations in the industry.
