Copyright International Business Times

It's back! This week is KubeCon North America, one of the most beloved technology conferences in the world, centered around one of the most beloved enterprise technologies: Kubernetes. Born out of Google, Kubernetes has changed the very infrastructure on which modern applications are built. More than half of all enterprises have adopted Kubernetes, and millions of developers deploy their applications on it every day. But like any other technology, it comes with trade-offs, chiefly the complexity of managing performance and debugging issues, which is why hundreds, if not thousands, of software vendors are touting their Kubernetes solutions this week. To help you separate the noise from the value, we want to highlight four companies that truly help deliver value to enterprises using Kubernetes.

Gremlin

We're very excited to see this company raise its public profile again. Gremlin was founded by ex-Netflix and Amazon engineers in 2017 and made a lot of noise around the cutting-edge discipline of Chaos Engineering. For those who are unfamiliar, Gremlin has built a platform that lets companies run experiments on their systems to proactively identify weaknesses. Like getting people to the gym, it isn't always easy to convince people to change their habits, but the enterprises that build Chaos Engineering into their routines benefit from much healthier, more reliable systems. At this KubeCon, Gremlin is announcing a strategic partnership with Dynatrace, a leader in observability and application performance monitoring, to help Kubernetes users keep their applications in a desired state. Kubernetes services are automatically discovered within Gremlin, powered by Dynatrace's AI-driven observability and topology mapping. Health checks are then applied to Kubernetes objects, allowing organizations to efficiently implement standardized reliability testing and gain deeper insights into their environments.

Mezmo

Mezmo is pioneering the concept of active telemetry to provide AI agents with better data and context. The company recently launched a root cause analysis (RCA) agent aimed at Kubernetes users that automatically identifies and fixes common issues such as deployment failures, resource issues, configuration errors, and application-level failures. The proof is in the pudding: their AI agent consistently resolves issues in complex cloud environments faster and more accurately than other AI agents and models. According to the company's blog, "we're entering an era where incidents resolve themselves before engineers even know they exist." By leveraging agentic AI workflows, Mezmo rapidly analyzes telemetry data to pinpoint root causes, eliminate noise, and recommend actionable remediation steps.

Causely

Causely is pioneering the category of AI SRE, a category that will undoubtedly explode in popularity in 2026. Now more than ever, engineering teams are drowning in too much data and too many alerts. Having a system like Causely on the market, founded by a veteran of the industry with two prior startups in IT Operations, is a natural and critical response to the rise of AI code-generation tools that ship code faster than humans can reasonably understand or manage it. At this year's KubeCon, Causely is announcing an MCP Server that integrates into any MCP-compatible IDE and enables developers to automatically diagnose, understand, and remediate complex issues within Kubernetes and application code using natural language prompts. It works in four steps: it analyzes the real-time state of the system, identifies whether the cause of an issue lies in the infrastructure or application layer, recommends the precise code changes, configuration changes, or Helm chart updates, and presents these suggestions inline within the developer's IDE for review, refinement, or approval.

Komodor

It's hard to talk about Kubernetes troubleshooting and not mention Komodor. Founded by an ex-Google engineer, Komodor has been dedicated to the Kubernetes ecosystem since it launched out of stealth in 2021. Its management platform simplifies operations, provides automated troubleshooting, and helps teams manage complex environments. It tracks changes, analyzes their impact, and provides actionable context for issues, which reduces troubleshooting time and improves delivery velocity. Key features include automated drift detection, root cause analysis, and monitors for cluster health and resource optimization.

If you'll be in Atlanta, Georgia, this week for KubeCon 2025, stop by each of these companies' booths. They are adding tremendous value to enterprises looking to maximize the performance and benefits of using Kubernetes.