Copyright SiliconANGLE News

Google LLC has debuted two new open-source tools designed to ease the task of operating artificial intelligence environments. The tools made their debut today at the KubeCon + CloudNativeCon North America conference in Atlanta, where they rolled out alongside GKE Pod Snapshots, a new feature of Google Cloud’s GKE managed Kubernetes service.

AI agents use external applications such as browsers and databases to carry out tasks. Such integrations can pose cybersecurity risks: an AI agent might, for example, use a code editor to write malware. Developers mitigate those risks by deploying agents and the applications they use in an isolated container, or sandbox, that is cordoned off from sensitive systems.

The first open-source tool that Google debuted today, Agent Sandbox, is designed to ease the creation of AI agent sandboxes. It’s implemented as an extension to Kubernetes’ core feature set. According to the search giant, AI applications can use Agent Sandbox to launch thousands of isolated agent environments and delete them when the agents complete their work.

Agent Sandbox is based on an open-source tool called gVisor. The latter technology, which Google released in 2018, isolates a container from sensitive components of the operating system on which it runs. It thereby prevents any AI-written malware running inside the container from making malicious changes.

Google Cloud will provide support for Agent Sandbox in its GKE service, which enables developers to create cloud-based Kubernetes clusters and automates many of the associated infrastructure maintenance tasks. According to Google, GKE now enables developers to create “pre-warmed” Agent Sandbox environments: containers that include the tools an AI agent needs to perform a task and come online before the agent begins its work.
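The pre-warming pattern can be illustrated with a small sketch. This is a generic simulation, not the Agent Sandbox API: the sandbox contents and the startup delay are hypothetical stand-ins for real container initialization.

```python
import queue
import time

WARM_POOL_SIZE = 3

def initialize_sandbox(sandbox_id: str) -> dict:
    """Simulate slow sandbox startup: pulling images, installing tools, etc."""
    time.sleep(0.1)  # stand-in for multi-second container initialization
    return {"id": sandbox_id, "tools": ["browser", "code_editor"], "ready": True}

# Fill the pool ahead of time, before any agent asks for a sandbox.
pool: queue.Queue = queue.Queue()
for i in range(WARM_POOL_SIZE):
    pool.put(initialize_sandbox(f"sandbox-{i}"))

def acquire_sandbox() -> dict:
    """An agent takes a pre-warmed sandbox instantly instead of waiting for init."""
    return pool.get_nowait()

def release_sandbox(sandbox: dict) -> None:
    """After the task completes, discard the sandbox and warm a fresh replacement."""
    pool.put(initialize_sandbox(sandbox["id"]))

sandbox = acquire_sandbox()
print(sandbox["id"], sandbox["ready"])  # → sandbox-0 True
release_sandbox(sandbox)
```

The upfront cost of initialization is paid once per pool slot, before any agent needs a sandbox, so acquisition at task time is effectively instant.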
Launching sandboxes ahead of time removes the need for agents to pause while those sandboxes initialize, which speeds up processing. Google Cloud promises to further improve AI workloads’ performance with a GKE feature called Pod Snapshots that also debuted today. Some large language models can take upwards of 10 minutes to launch. According to Google, Pod Snapshots shortens startup times by 80% in some cases.

One of the main reasons LLMs take a long time to launch is that the containers in which they run have to be initialized from scratch. The task involves deploying the various software components a model requires to operate and then configuring those components, a process usually carried out automatically by a script. Pod Snapshots speeds up the workflow by removing the need for script-driven environment configuration. The feature creates a snapshot, or copy, of a container that includes not only all the software components it contains but also their configuration. Applications can simply load the ready-to-use snapshot from memory instead of using a script to set up each component individually.

“GKE Pod Snapshots supports snapshot and restore of both CPU- and GPU-based workloads, bringing pod start times from minutes down to seconds,” Google senior product manager Brandon Royal wrote in a blog post today. “With Pod Snapshots, any idle sandbox can be snapshotted and suspended, saving significant compute cycles with little to no disruption for end-users.”

Google is rolling out Agent Sandbox and Pod Snapshots alongside a new open-source tool called Multi-Tier Checkpointing, or MTC. It’s designed to streamline large-scale AI training projects. AI models sometimes encounter errors during training that have to be remediated before the project can proceed. The simplest way to recover is to restart the training session from scratch, but that takes a significant amount of time.
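Conceptually, Pod Snapshots swaps a slow setup script for loading a saved copy of an already-configured environment. The sketch below simulates that trade-off in plain Python with made-up component names; it illustrates the idea only, not GKE’s actual snapshot mechanism.

```python
import pickle
import time

def configure_environment() -> dict:
    """Simulate script-driven setup: install and configure each component in turn."""
    env = {}
    for component in ["runtime", "model_server", "tokenizer"]:
        time.sleep(0.05)  # stand-in for slow per-component configuration
        env[component] = {"installed": True, "configured": True}
    return env

# Cold start: run the full setup script.
start = time.perf_counter()
env = configure_environment()
cold_start = time.perf_counter() - start

# Take a snapshot of the fully configured environment once...
snapshot = pickle.dumps(env)

# ...then restore from the snapshot instead of re-running the script.
start = time.perf_counter()
restored = pickle.loads(snapshot)
warm_start = time.perf_counter() - start

assert restored == env
print(f"cold start: {cold_start:.3f}s, restore: {warm_start:.4f}s")
```

Because the snapshot captures the components and their configuration together, restoring it skips every per-component setup step, which is why restore time stays flat no matter how long the setup script would have taken.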
Developers shorten the process by periodically saving a copy of the model, known as a checkpoint, during training and restoring the most recent checkpoint when an error emerges. Rolling an LLM back to a recent checkpoint is faster than restarting the training process, but can still cause significant delays. Google says that its new MTC tool speeds up the workflow, thereby enabling companies to train new AI models more quickly and update existing ones with fresh datasets.
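The checkpoint-and-restore loop that MTC accelerates can be sketched generically. The example below is a toy simulation, not Google’s MTC implementation: the failure steps, checkpoint interval, and model structure are all hypothetical.

```python
import copy

FAIL_AT = {7, 23, 38}    # steps at which we simulate a training failure
CHECKPOINT_EVERY = 10

def train_step(model: dict, step: int) -> None:
    """Stand-in for one training step that updates the model's parameters."""
    model["weights"] += step
    model["step"] = step

model = {"weights": 0, "step": 0}
checkpoint = None
failed_once = set()

step = 1
while step <= 50:
    if step in FAIL_AT and step not in failed_once:
        failed_once.add(step)
        # Error encountered: roll back to the most recent checkpoint
        # instead of restarting the whole run from scratch.
        if checkpoint is not None:
            model = copy.deepcopy(checkpoint)
            step = model["step"] + 1
        else:
            model = {"weights": 0, "step": 0}
            step = 1
        continue
    train_step(model, step)
    if step % CHECKPOINT_EVERY == 0:
        checkpoint = copy.deepcopy(model)  # periodic save of a restorable copy
    step += 1

print(model["step"], model["weights"])  # → 50 1275
```

Each failure costs only the steps since the last checkpoint rather than the whole run, which is the delay MTC aims to shrink further at large scale.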