Copyright SiliconANGLE News

In enterprise R&D environments, teams need self-service infrastructure that scales without friction, enabling thousands of developers to spin up resources, test at velocity and meet deadlines without waiting on tickets or risking downtime.

Nvidia Corp.’s platform engineering teams support chip design, firmware development and AI training workloads across on-premises and cloud Kubernetes clusters, all operating under a container-first philosophy. The company turned to Portworx by Pure Storage Inc. to deliver self-service data management at scale, enabling multi-tenant environments where development teams never collide and platform engineers can maintain availability without sacrificing agility, according to Brian Monroe (pictured, left), senior software engineer at Nvidia.

“We need to be able to … take down a cluster, do maintenance,” Monroe told theCUBE. “We need to be able to shift our workloads. We try to generally do a zero-downtime maintenance, so we basically would take down one node in a cluster, do the upgrades, things like that. We shift the workloads around … the Portworx storage infrastructure with replication spread across multiple nodes [that] allows us to move those workloads around in various locations without having to worry about taking down a specific business process or function.”

Monroe and Venkat Ramakrishnan (right), vice president and general manager of Portworx by Pure Storage, spoke with theCUBE’s Rob Strechay at the KubeCon + CloudNativeCon NA event, during an exclusive on-the-ground broadcast from theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how Nvidia and Portworx enable self-service data management for Kubernetes environments at scale, supporting multi-tenant workloads without compromising availability or agility. (* Disclosure below.)
Building self-service infrastructure for speed and scale

Portworx provides storage provisioning scoped directly to Kubernetes namespaces, allowing teams working on chip design, firmware development or AI training to operate independently without waiting for infrastructure tickets. That self-service infrastructure model means developers can spin up persistent volumes within their own environments while platform engineers maintain control over the underlying systems, according to Ramakrishnan.

“You don’t need to ever file a ticket to get a storage service, like a file, block or an object, because you’re running Portworx, and it’s available scoped to your namespace with multi-tenancy,” he said. “That means you could have a team that’s developing a whole bunch of ASIC firmware, but another team that’s developing AI training can share the same Kubernetes infrastructure without ever stepping on each other’s data. These are teams that support a large number of developers with a few platform engineers. We enable them to operate at scale.”

Nvidia’s infrastructure teams plan for elastic growth, adding nodes and storage capacity as research and development demands shift. The company operates in an environment where resource requests can spike during critical phases, such as chip tape-outs or AI model-training cycles, requiring self-service infrastructure that stretches without redesigns, according to Ramakrishnan.

“Different industries have different requirements, but I think the fundamental problems are kind of similar when you go from one industry to another,” he said. “You’re talking about scale and resiliency right now; in media, especially, there’s a developer platform, a developer experience platform, but let’s look at the fundamental [key performance indicators] and [service-level agreements] we need to drive.
They have thousands of developers, they’re building code constantly, [and] at the time of a release, like a chip tape-out or a new software release, likely you’re going to need more resources. The underlying problems in scale [are] the same; same thing with resiliency.”

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the KubeCon + CloudNativeCon NA event:

(* Disclosure: Pure Storage sponsored this segment of theCUBE. Neither Pure Storage nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE