By Jelani Harper and Vicki Walker
Copyright The New Stack
When real-world systems depend on split-second decisions, the database must deliver intelligence, not just storage. Traditional databases are designed to capture and retrieve information quickly, but today’s industrial systems, digital twins and predictive applications need more than speed. They need intelligence built in, so insights emerge as fast as the data itself.
“Most databases ingest data, store it and hope to read it back to you fast,” explained Peter Barnett, lead product manager at InfluxData. “Oftentimes, they’re not doing any of the actual analysis or processing; they’re just storing the data. We wanted to build beyond storage to turn the database into an active intelligence engine where data isn’t just managed: it’s actively processed and analyzed.”
InfluxDB 3 introduces this capability by building intelligence directly into the database. Its Python-powered Processing Engine, flexible deployment design and diskless architecture combine to eliminate the overhead of traditional data pipelines and deliver real-time analytics where action takes place.
Python Processing Engine: AI Where the Data Lives
At the heart of this shift is a Python virtual machine (VM) built directly into the database. “If you want to take advantage of the new AI tools that can write scripts and accelerate development, Python is one of the most well-known and widely adopted languages for development,” Barnett said.
The popular data science language also offers a host of analytics and data processing libraries (including NumPy and Polars) that integrate with contemporary AI libraries. The Processing Engine consists of plugins: bespoke Python scripts that can access any Python library, devised to cover the range of use cases for the engine, including anomaly detection, forecasting and alerting.
“We also built our own line protocol builder on top of it, and that makes interactions with the Processing Engine much simpler,” Barnett added. “These plugins can execute on writes, on schedules and on demand, so you can do data transformations, aggregations and ad hoc actions in real time.”
The plugins help users monitor complex systems, such as digital twins or network activity in real time, while detecting subtle changes to anticipate problems before they occur.
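To make this concrete, here is a minimal sketch of a write-trigger plugin, following the process_writes entry point and LineBuilder API that InfluxDB 3’s plugin documentation describes. The table name, field names and threshold are illustrative assumptions rather than anything from InfluxData’s own examples.

```python
# anomaly_plugin.py: a minimal sketch of a write-trigger plugin.
# process_writes, influxdb3_local and LineBuilder follow InfluxDB 3's
# documented plugin interface; LineBuilder is injected by the plugin
# runtime, so no import is needed. The table, fields and threshold
# below are illustrative assumptions.

THRESHOLD_C = 90.0  # hypothetical temperature limit


def process_writes(influxdb3_local, table_batches, args=None):
    for table_batch in table_batches:
        if table_batch["table_name"] != "machine_sensors":  # assumed table
            continue
        for row in table_batch["rows"]:
            temp = row.get("temperature")
            if temp is not None and temp > THRESHOLD_C:
                # Write a derived alert point back to the database
                # using the built-in line protocol builder.
                line = LineBuilder("alerts")
                line.tag("machine_id", str(row.get("machine_id", "unknown")))
                line.tag("level", "critical")
                line.float64_field("temperature", float(temp))
                influxdb3_local.write(line)
                influxdb3_local.info(f"High temperature detected: {temp}")
```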
Edge Deployments: Intelligence Without the Overhead
The overall gains of implementing intelligence directly within the database are magnified when InfluxDB runs at the edge. The traditional overhead of sending time series data across and between networks evaporates when users install the engine on an edge device. And beyond the inherent costs of setting up and maintaining data pipelines, those pipelines broaden the surface area of “where things can go wrong,” Barnett said.
Conversely, the efficiency of InfluxDB 3 Core, the open source engine optimized for recent data, and of the Enterprise version, which adds edge processing, is difficult to deny. The former can “live on these smaller edge devices for real-time querying, with a significantly lower storage overhead,” Barnett said. It’s optimized for use cases spanning real-time system monitoring, edge data collection and transformation, streaming analytics, sensor alerting and anything else where data must be collected and processed at high rates.
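Schedule triggers are one way that edge pattern plays out. The sketch below, again assuming a hypothetical machine_sensors table, uses the Processing Engine’s documented process_scheduled_call entry point and its in-process query method to downsample the last minute of raw readings before anything has to leave the device.

```python
# downsample_plugin.py: a sketch of a scheduled edge plugin.
# process_scheduled_call and influxdb3_local.query follow InfluxDB 3's
# documented plugin interface; the database contents and SQL are
# illustrative assumptions.

def process_scheduled_call(influxdb3_local, call_time, args=None):
    # Aggregate the last minute of raw readings with in-process SQL;
    # query() returns rows as a list of dictionaries.
    rows = influxdb3_local.query(
        "SELECT machine_id, avg(temperature) AS avg_temp "
        "FROM machine_sensors "
        "WHERE time >= now() - INTERVAL '1 minute' "
        "GROUP BY machine_id"
    )
    for row in rows:
        line = LineBuilder("machine_sensors_1m")  # downsampled table
        line.tag("machine_id", str(row["machine_id"]))
        line.float64_field("avg_temp", float(row["avg_temp"]))
        influxdb3_local.write(line)
    influxdb3_local.info(f"Downsampled {len(rows)} series at {call_time}")
```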
Diskless Architecture: Scale, Resilience and Cost Savings
Part of what makes these edge deployments possible is InfluxDB 3’s diskless architecture, which enables high availability (HA), instant failovers and seamless scalability. Because the engine persists data externally as Parquet files in object storage (including AWS S3 and Azure Blob Storage), local storage overhead stays extremely low.
The cost advantages of this approach are apparent. Instead of storing high-volume time series data on a single machine and replicating it to others after standing up a cluster environment, the data can land in cheap object storage. The architecture also improves performance, reduces complexity and minimizes points of failure.
“We can store all that data in an object store and you can point to it, and within seconds you can start reading that data from different nodes,” Barnett explained. “It’s just a simpler way for you to create a clustered environment that otherwise would take much longer, with greater overhead and probably a Kubernetes implementation.”
These performance gains are critical for time-sensitive use cases, from sudden equipment failures, such as those caused by a lightning strike on an aircraft, to natural disasters, where fast analysis and remediation are crucial.
The FDAP Foundation
Under the hood, this architecture runs on the Apache-backed FDAP stack: Flight SQL, DataFusion, Arrow and Parquet. Apache Arrow Flight SQL delivers high-speed query transport. DataFusion provides a Rust-based SQL optimizer and execution engine that uses Apache Arrow as its in-memory model. Parquet adds the compression and efficiency needed for massive time series workloads.
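That stack is also what a client touches on every query. Below is a minimal sketch using the influxdb3-python client, assuming a local instance and placeholder credentials: SQL goes in, DataFusion plans and executes it, and the results come back as an Apache Arrow table over Flight SQL.

```python
# query_example.py: a minimal sketch of querying InfluxDB 3 over
# Flight SQL with the influxdb3-python client (pip install influxdb3-python).
# The host, token and database values are placeholders.
from influxdb_client_3 import InfluxDBClient3

client = InfluxDBClient3(
    host="http://localhost:8181",  # assumed local InfluxDB 3 instance
    token="my-token",              # placeholder credentials
    database="edge_metrics",       # hypothetical database name
)

# DataFusion plans and executes the SQL; the result arrives as an
# Apache Arrow table, which converts cheaply to a pandas DataFrame.
table = client.query(
    "SELECT time, machine_id, temperature "
    "FROM machine_sensors "
    "WHERE time >= now() - INTERVAL '5 minutes'"
)
print(table.to_pandas().head())
```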
With that format, the database has “a highly optimized storage solution, and the compression ratio on Parquet is phenomenally better than many other storage solutions we’ve seen,” Barnett said. “We’re able to get much higher compression and a much lower storage footprint that leads to much better cost savings and increased efficiency in these high cardinality use cases.”
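Exact ratios depend on the data, but the effect is easy to sanity-check in isolation. The rough, self-contained comparison below (independent of InfluxDB, and not a benchmark) writes the same synthetic time series to CSV and to zstd-compressed Parquet; repetitive tags and slowly varying values are exactly what Parquet’s columnar encodings exploit.

```python
# parquet_compression.py: a rough illustration of Parquet's footprint
# versus CSV on synthetic time series data. Figures will vary with the
# data and compression codec; this is not a benchmark.
import io

import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

n = 1_000_000
df = pd.DataFrame({
    "time": pd.date_range("2025-01-01", periods=n, freq="s"),
    "machine_id": np.random.choice(["m1", "m2", "m3"], size=n),  # repetitive tag
    "temperature": 70 + np.random.randn(n).cumsum() * 0.01,      # slow drift
})

csv_bytes = df.to_csv(index=False).encode()

buf = io.BytesIO()
pq.write_table(pa.Table.from_pandas(df), buf, compression="zstd")

print(f"CSV:     {len(csv_bytes) / 1e6:.1f} MB")
print(f"Parquet: {buf.getbuffer().nbytes / 1e6:.1f} MB")
```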
Continuing the Momentum
InfluxDB 3 is turning the database from a passive store into an active intelligence engine, and the momentum isn’t slowing. InfluxData ships monthly updates across InfluxDB 3 Core and Enterprise; versions 3.2 and 3.3 introduced managed Processing Engine plugins for common time series tasks. The latest release, InfluxDB 3.4, added automated setup and workflow features for both Core and Enterprise. Version 3.5, scheduled for release at the end of September, is set to introduce even more ways to leverage the Processing Engine in daily workflows.
By embedding a Python-powered Processing Engine directly into the database and pairing it with a diskless architecture built on the FDAP stack, InfluxDB 3 collapses the traditional gap between data collection and analysis. For developers, that means less infrastructure to manage, faster paths from raw signals to actionable insight, and a platform that continues to evolve on a predictable cadence.