Learning how to validate systems, especially AI systems, offers strong career prospects and helps society move toward AI safety. In today's column, I examine the importance of performing vital and comprehensive validations of modern-day systems, including the validation of systems based on AI.

Though you might naturally assume that AI developers and AI makers are mindfully validating their cutting-edge systems before placing them into the public marketplace, the sad and altogether disconcerting news is that validation efforts often get short shrift. The mindset of fast deployment tends to downplay the crucial need to undertake thoughtful validation.

Readers know that I have been repeatedly calling for more attention to AI safety (see my coverage at the link here and the link here, for example). As AI systems, such as multi-agent AI conglomerations, are devised and brought to fruition, we are getting closer and closer to costly and potentially harmful in-production failures and breakdowns. Why so? Because AI safety doesn't get the investment it requires, plus many in the AI field are simply unaware of the techniques and technologies that can be used to appropriately validate their budding systems.

Let's talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

AI Safety Gets Deserved Attention

In a posting entitled "Advancing Responsible AI In High-Stakes Environments" (May 9, 2025, Stanford Report), Professor Mykel Kochenderfer of Stanford University, an acclaimed expert on system safety and Associate Professor of Aeronautics and Astronautics, made these insightful remarks about the rising awareness that AI safety is a vital consideration impacting us all (excerpts):

"The AI community is primarily focused on building AI systems, but there's been relatively little focus on how to go about rigorously evaluating those systems before deployment."

"Assessing these systems is becoming very important because industry is making enormous investments, and we want to deploy these systems to reap the economic and other societal benefits."

"What we're interested in doing in our lab – and as part of the Stanford Center for AI Safety, in general – is to develop quantitative tools to assist in system validation."

"Last year, we offered a course on the validation of safety-critical systems. So, if you have an AI system and have to guarantee a certain level of performance and reliability, how do you go about doing that?"

"It's the first course of its kind at Stanford, and we just released a preprint of the new textbook called Algorithms for Validation."

The textbook Algorithms for Validation is currently available for free as a preprint at the link here and will be published next year by MIT Press (the co-authors of the book are Mykel Kochenderfer, Sydney Katz, Anthony Corso, and Robert Moss).

Links on the same website as the preprint provide access to video recordings of the initial course offering, which took place from February 2025 to April 2025; each video is approximately an hour in length. The course, known as AA228V "Validation of Safety-Critical Systems," was ably and proficiently taught by Dr. Sydney Katz, a postdoctoral scholar at Stanford, and accompanied by periodic guest speakers.
For more about the research of Dr. Mykel Kochenderfer, Associate Professor of Aeronautics and Astronautics and, by courtesy, of Computer Science at Stanford University, see the link here. I also previously highlighted some of his leading-edge, innovative work in my discussion at the link here.

AI Safety And Algorithm Validation

I will tap into the Algorithms for Validation textbook and dovetail it with my own experiences regarding the need for suitable systems safety throughout the life cycle of building AI systems. I taught courses on systems safety while a professor at USC and executive director of an AI lab there, and I commercially applied those theories and concepts in real-world practice as a CTO at large and small companies.

The gist is that those who hand-wave away systems validation as somehow arcane or unneeded are in for a rude awakening when their AI ends up making severe mistakes that could have been detected and prevented at the get-go. I've been an expert witness in court cases involving legal liability associated with the lack of validation that AI builders could and should have performed -- but didn't do.

Validating AI is readily feasible if AI makers and AI developers put their minds and their pocketbooks toward giving the matter priority. A validation effort must occur on an A-to-Z basis when devising AI. This means that you don't wait until after the AI has been constructed to start thinking about its validation. From the moment an AI system is even being conceived as a potential AI to be built, validation already needs to be a top-of-mind topic in the room.

Waiting to deal with validation once bugs are found is usually too little, too late. Nonetheless, if you are stuck in that all-too-common circumstance, validation is still worth pursuing. You do what you can to deal with the problems at hand.

The Base Model

Here's how you can macroscopically think about the validation of algorithms.

Suppose that we have an AI agent embedded in a humanoid robot, with the robot moving throughout a house that has adults and kids running around. This combination of AI and a robot is known these days as embodied AI or physical AI; see my explanation at the link here. Such robots are going to be used in all walks of life, including in the home, at schools, in the workplace, etc.

We can readily say that the AI agent and robot in this instance are operating in a particular environment, namely, maneuvering within the confines of the house to perform daily chores. Perhaps the AI agent is devised to pick up items scattered throughout the house and put any soiled clothing into the laundry washer.

The robot has various sensors that aim to detect aspects of the environment. A video camera embedded in the robot captures real-time video of the environment that can be scanned visually. Data from the sensor is fed to the AI agent. The AI agent has been crafted to evaluate the sensory data. Based on that assessment, the AI agent gives commands to the robot to make its way around the house, i.e., move within the environment of interest.

Our system at hand consists of an agent, an environment, and a sensor. At any point in time, the agent is at a particular configuration within the environment, which we'll refer to as state "s". The robot and AI agent might be on one side of the room (our state "s" for the moment), and after the AI agent commands the robot to walk over to the door on the other side of the room, we have moved from state "s" to some later state that we'll refer to as "s′".
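To make that framing a bit more concrete, here is a minimal Julia sketch of a three-part system for a toy version of the room-crossing robot. To be clear, the struct layout, names, and toy models below are my own simplifications for illustration, not the textbook's exact definitions (the book's actual code is discussed later in this column).

struct System
    agent    # maps an observation to an action
    env      # maps the current state and an action to the next state
    sensor   # maps a state to an observation
end

# Toy "cross the room" model: the state is the robot's remaining distance
# (in steps) from the door on the far side of the room.
sensor(s) = s + rand(-1:1)                    # a noisy estimate of the position
agent(o) = o > 0 ? :forward : :stop           # keep walking until the door is reached
env(s, a) = a == :forward ? max(s - 1, 0) : s # each forward command moves one step closer

sys = System(agent, env, sensor)

With a bundle like that in hand, stepping the system forward is just a matter of asking the sensor, the agent, and the environment to each take their turn, which is exactly the pattern in the first code snippet shown later in this column.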
Validating The System

We would want to have already done testing to determine that the algorithms of the AI agent and robot will safely enable a crossing of the room. The AI agent, via the sensory data, ought to avoid a couch and a table that are in the middle of the room. In addition to static objects, there are potentially moving objects, such as a child darting into the derived pathway of the robot. We certainly don't want the robot to bump into the exuberant youngster.

We would have mindfully put together a specification that details the operating requirements associated with the in-home robot. Our validation effort then should have entailed validating the algorithms of the AI agent that have to do with the planning and execution of movement and activities within the home environment, doing so in comparison to the specification (there could be other validation considerations, but I'm simplifying things to focus on the movement of the robot).

Think about the robot as taking a series of steps when it seeks to cross the room. Each step will be considered a state of where the robot is. Before the robot starts to move, the AI agent could explore whether to walk straight across or take a more circuitous route to try to avoid colliding with objects and people in the room.

As noted in the cited textbook on page 8, here's what we are considering: "The state space S represents the set of all possible states. An environment consists of an initial state distribution and a transition model. When the agent takes an action, the state evolves probabilistically according to the transition model. The transition model T(s′ | s, a) denotes the probability of transitioning to state s′ from state s when the agent takes action a."

In brief, the effort of crossing the room will involve making transitions from one state to another state. The goal that is presumably indicated in the specification will be to successfully cross the room and not bump into anyone or anything. A suitable validation effort would be to perform tests to showcase that the AI agent and robot will appropriately achieve that specification. If possible, we would like to have a formal guarantee or proof that the system at hand will never enter a dangerous state. A dangerous state in this context would be the robot knocking over the table or ramming into a person in the room.

Formal Methods Aplenty

The textbook provides a wide variety of mathematical and computational formulations to bring someone up to speed on the validation of systems. It is an impressive and comprehensive approach that includes modeling techniques, temporal logic specifications, Markov chains, reachability analysis, explainability, and more. Examples and scenarios are vividly depicted. My favorites are those involving the complexities associated with airplanes and the avoidance of airborne collisions, along with the use of validation for self-driving cars and autonomous vehicles.
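As a taste of what a specification can look like in that world, consider the room-crossing robot again. One informal requirement might be that the robot never enters a collision state and that it eventually reaches the door. Purely as my own illustrative sketch rather than anything lifted from the textbook, that requirement can be phrased as a few checks over a recorded trajectory (written in Julia, the language used for the book's examples, as discussed below):

# Illustrative only: checking "always no collision, eventually at the door"
# against a recorded sequence of states. The state names are invented for
# this example and are not the textbook's.
never_collides(states) = all(s != :collision for s in states)
eventually_arrives(states) = any(s == :at_door for s in states)
satisfies_spec(states) = never_collides(states) && eventually_arrives(states)

println(satisfies_spec([:start, :mid_room, :mid_room, :at_door]))  # true
println(satisfies_spec([:start, :mid_room, :collision]))           # false

The textbook's treatment is far more rigorous than this, giving temporal logic operators such as "always" and "eventually" precise semantics over trajectories, but the underlying idea is the same: a specification is something you can mechanically check a system's behavior against.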
I admit that I additionally have a preference for books that provide programming code. It is one thing to inspect a mathematical formula, which is absolutely needed, but there is a soft place in my heart for looking at code that reflects the sometimes-obtuse mathematical renderings.

The programming language Julia was selected for showcasing the code regarding validating systems. For those of you who are keen software engineers, I urge you to give Julia a look-see since it has a lot going for it. This science-oriented programming language first got underway nearly fifteen years ago. You will instantly recognize the language since it resembles facets of Python, MATLAB, R, C, and C++. The upsides include parametric polymorphism, just-in-time compilation, LISP-like features, open-source availability, and speedy execution. That being said, the language tends to be used somewhat narrowly by computer scientists, traditionally in academic, commercial, and governmental research settings, and is not widely known by software builders across the board.

Snippets Of Julia Code

Whenever I mention a somewhat less common programming language, this usually piques the interest of readers. What does it look like? Is it hard to grasp? No worries, I'll make use of two snippets of code from the book and briefly explain what the code does.

We want to define a function that can repeatedly be invoked and step through the three elements of my system, namely the sensor that captures data, the AI agent processing the data, and the change in state of the environment once the AI agent directs the robot to move ahead. Here's a snippet of Julia code:

function step(sys::System, s)
    o = sys.sensor(s)
    a = sys.agent(o)
    s′ = sys.env(s, a)
    return (; o, a, s′)
end

The function is defined with the name "step". It gets passed the "sys" or system of interest, along with the present state "s". The state "s" feeds into the sensor, and the sensor returns an observation that is labeled "o". The observation next gets fed into the agent, which provides a stipulated action labeled "a". The state "s" and action "a" are fed into the environment, such as having the robot take one step forward, and the resulting new state is s′. The function returns to its caller the observation, action, and updated state s′. Easy-peasy.

Another function will be named "rollout". It will be used to repeatedly invoke the above "step" function. Each of the series of states will be collected into an array, which serves as the trajectory "τ". The number of steps or iterations to undertake will be determined by passing along the desired count labeled as "d". This is what the Julia code looks like:

function rollout(sys::System; d)
    s = rand(Ps(sys.env))
    τ = []
    for t in 1:d
        o, a, s′ = step(sys, s)
        push!(τ, (; s, o, a))
        s = s′
    end
    return τ
end

That snippet is a bit more complicated than the first snippet. You'll undoubtedly recognize the looping with the use of the "for" statement, and the push! statement entails adding items to the trajectory array. The initial state is drawn at random from the environment's initial state distribution, and at the end of each iteration, the current state is updated to the newly reached state. I'd bet that any versatile programmer recognizes the look of the code, and it rings a bell of overall familiarity.

A GitHub site has been set up with all the code that accompanies the explanations and examples given in the book (see the link here).
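To give a feel for how such building blocks get used in validation, here is a small sketch of my own, not code from the book: it simulates many trajectories of a boiled-down room-crossing scenario and counts how often the robot ends up in a dangerous state, yielding a rough Monte Carlo estimate of the failure probability relative to the specification. The scenario, function names, and numbers are invented for illustration.

# Illustrative only: a rough Monte Carlo check of how often a toy system
# violates its specification.

# Toy dynamics: the robot starts 20 steps from the door. At each step it
# moves one step closer, but with a small chance it drifts into an obstacle.
function cross_room(; steps = 20, p_collide_per_step = 0.005)
    for _ in 1:steps
        if rand() < p_collide_per_step
            return false   # entered a dangerous state (a collision)
        end
    end
    return true            # reached the door without incident
end

# Estimate the failure probability by simulating many trajectories.
function estimate_failure_rate(; trials = 100_000)
    failures = count(!cross_room() for _ in 1:trials)
    return failures / trials
end

println(estimate_failure_rate())   # roughly 1 - 0.995^20 ≈ 0.095

Even a crude estimate like this makes the key point: small per-step risks compound across a trajectory, so a 0.5% chance of trouble on any single step turns into roughly a 10% chance over a 20-step crossing. That is precisely the kind of trajectory-level accounting that the specification-driven and formal techniques in the textbook aim to handle with far more rigor than naive sampling.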
Validation Needs More Proponents

Throughout most of the history of software development, the effort to perform validation has often been hidden in backrooms and has not received the glamour and adulation it richly deserves. Managers who oversee software developers and AI builders are at times pressured to keep their budgets lean, which makes cuts to validation a ready target. Just get the software out the door, and we'll worry about whether it works properly once people start reporting bugs and errors. Exasperatingly, this mindset often prevails, and firms get away with undercutting their validation endeavors.

AI systems are increasingly placed into high-stakes situations. We have AI that drives cars, AI that guides missiles and weapons systems, AI that controls power plants and factories, and so on. The low-stakes AI that got away with minimal, if any, validation has created a false sense of security that AI "just works" and that there's not much need to set aside devoted time, money, and priority for AI safety.

Benjamin Franklin famously remarked that an ounce of prevention is worth a pound of cure. AI makers that do not wake up and realize that validating AI and rigorously embarking on AI safety are a requisite form of prevention are ultimately going to find themselves in deep trouble. Society is going to hammer down on those who gave AI safety short shrift.

Do the right thing beforehand: put validation and AI safety at the forefront of your thinking and actions, and serve as a proponent who inspires similar thinking and actions in those around you. Join in and become a staunch proponent of AI safety.