Copyright SiliconANGLE News

As 2025 draws to a close, the production-grade enterprise AI landscape looks very different from where it began. What started as a rush to train foundation models has evolved into an industry-wide focus on inference at scale — turning data into outcomes with speed, precision and sustainability. WekaIO Inc. is among the companies leading that shift, building data infrastructure designed for AI’s “second wave.”

In Weka’s 2025 outlook, Chief Technology Officer Shimon Ben-David described this pivot as the logical next phase for enterprise adoption. “We’re entering the second wave of AI adoption, where inferencing and fine-tuning pre-trained models will take center stage,” he wrote. “Organizations will increasingly leverage existing models as customizable tools, rather than investing time and resources into building new ones from scratch.”

This shift reflects a pragmatic desire to accelerate return on investment and simplify deployment, according to Ben-David. The emphasis is on “turning vast amounts of raw data into actionable insights quickly and efficiently” while fine-tuning models for domain-specific applications that drive real business value, he added.

That focus is reshaping infrastructure priorities. Instead of designing environments solely for training, enterprises are now optimizing for inference pipelines. “High-performance, scalable systems capable of handling these AI pipelines with low latency will be critical to success,” Ben-David said.

This feature is part of SiliconANGLE Media’s ongoing exploration into enterprise AI infrastructure and sustainable compute design. (* Disclosure below.)

The WAARP blueprint: Making production-grade enterprise AI real

Weka introduced its AI RAG Reference Platform, or WAARP, as a modular blueprint to help customers operationalize inference at scale. Ben-David described it as a way to “make the impossible possible” for production-grade AI environments.
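For context, retrieval-augmented generation, or RAG, grounds a model's answers in documents fetched at query time. A minimal toy sketch of that retrieve-then-prompt pattern (illustrative only; naive keyword overlap stands in for the vector search and validated components a platform like WAARP actually assembles):

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query.
    A stand-in for the vector search a real RAG stack would use."""
    qwords = set(query.lower().split())
    return sorted(
        docs,
        key=lambda d: len(qwords & set(d.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Ground the model's answer in the retrieved context.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Toy corpus; a production environment would index far larger stores.
docs = [
    "Weka builds a data platform for AI workloads.",
    "RAG grounds model answers in retrieved documents.",
    "Inference pipelines need low latency.",
]
prompt = build_prompt("How does RAG ground answers?", docs)
```

Moving from a sketch like this to production is exactly the gap the article describes: retrieval quality, latency, security and multicloud data access all have to hold up under real load.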
“We saw customers had mostly built proof-of-concept-grade RAG platforms that could show their potential, but they lacked the muscle and results of true production RAG environments,” Ben-David said during an interview with SiliconANGLE Media. “Fast-forward to 2025 and we are taking WAARP to a new level, adding more layers to improve visibility and security … including support spanning multiple clouds.”

That modularity has proved essential as enterprises scale up. WAARP combines the Weka Data Platform with ecosystem partners, including Nvidia Corp. and Run:ai Inc., delivering validated, interoperable layers for software, hardware and orchestration.

“The gen AI inference ecosystem is made up of many different components and layers,” Ben-David said. “To make these AI environments successful, organizations need validated solutions where each of these AI components is first optimized for best-of-breed performance in its own individual layer within the WAARP blueprint.”

Power, performance and sustainability

Efficiency has become the new competitive currency in AI. Enterprises are no longer measuring success only by model performance, but also by how effectively they can balance compute power with energy use.

“Power has become the currency of this new era,” Ben-David wrote in Weka’s 2025 predictions, noting that data-center energy constraints will define the next phase of the AI economy. He believes that power efficiency “isn’t only a matter of cost control; it’s a competitive differentiator.”

Ben-David tied that principle directly to Weka’s design philosophy. He emphasized that efficiency is built into the company’s DNA, shaping how its software handles data and compute from the ground up. “Weka’s data platform software is designed to be both highly performant and efficient, including built-in energy efficiency,” he said during an interview with SiliconANGLE.
“On a per-petabyte basis, in terms of throughput and IOPS, we deliver ultra-fast performance, and for gen AI that translates to optimal efficiency per token.”

Weka’s approach dramatically reduces wasted GPU time, Ben-David explained. Idle compute resources can become one of the biggest drains on both cost and sustainability when not properly optimized.

“GPUs in a data center are often idling up to 70% of the time, just waiting for data,” Ben-David said. “The Weka Data Platform accelerates GPUs and ensures GPUs are working on the right valuable information rather than spending cycles recalculating older information.”

Scaling toward exascale and beyond

As Weka expands its footprint, Ben-David has underscored that exascale performance is no longer theoretical — it’s the new threshold for AI and high-performance computing.

“We built Weka to create an environment, a file system at the core that has no compromises,” he told theCUBE. “We created Weka to service the new AI workloads in a very efficient manner to ensure you can get the most utilization out of your high-performance compute GPU environments.”

That architecture functions as “an extension of local GPU memory,” letting AI and HPC environments “maximize their computing power without bottlenecks,” according to Ben-David. It’s a practical reflection of Weka’s long-term vision: a unified data platform that scales seamlessly with the accelerating demands of AI, HPC and analytics workloads.

With SC25 fast approaching, Ben-David’s forecast is unfolding. Inference workloads are dominating enterprise AI investment, sustainable design is emerging as a core requirement and the conversation has moved from proofs of concept to production reality.

“Those who can scale their AI workloads efficiently while minimizing energy use will lead the way in an increasingly power-limited world,” Ben-David predicted.
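The recomputation point Ben-David raises, keeping GPUs busy on new work rather than rederiving results they have already produced, is in the abstract a caching problem. A minimal illustrative sketch (not Weka's implementation; the in-memory store, hash-based key and fake inference function below are all stand-ins):

```python
import hashlib

class InferenceCache:
    """Toy result cache: serve repeated requests from memory so the
    expensive compute path runs only once per unique prompt."""

    def __init__(self):
        self._store = {}
        self.hits = 0    # requests answered without recomputing
        self.misses = 0  # requests that ran the (fake) GPU path

    def _key(self, prompt: str) -> str:
        # Hash the prompt so keys have a fixed size regardless of input length.
        return hashlib.sha256(prompt.encode()).hexdigest()

    def get_or_compute(self, prompt, compute_fn):
        k = self._key(prompt)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        result = compute_fn(prompt)  # the expensive step a GPU would run
        self._store[k] = result
        return result

def fake_gpu_inference(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"answer:{prompt[::-1]}"

cache = InferenceCache()
for p in ["q1", "q2", "q1", "q1"]:
    cache.get_or_compute(p, fake_gpu_inference)
# Two unique prompts were computed; the two repeats were served from cache.
```

In a real deployment that role falls to data and token caches sitting close to the GPUs; the sketch only shows why serving repeats from fast storage converts idle wait time into useful work.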
As enterprises prepare for 2026, Weka’s blueprint — fast, efficient and exascale-ready — illustrates where the next phase of AI infrastructure is headed: toward an era where intelligence, efficiency and sustainability converge.

(* Disclosure: TheCUBE was a paid media partner for SC25. Neither Weka, a sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE