I replaced my smart speakers with cheaper open-source alternatives, and I’ll never go back

Smart speakers like the Google Nest Hub or Amazon Echo are convenient, but they come with trade-offs: you sacrifice control, flexibility, and often privacy as well. As I’ve gone down the Home Assistant rabbit hole, I’ve begun to play with more open-source alternatives, starting with the Home Assistant Voice Preview Edition. Once I’d used that device, I realized that I didn’t need to rely on commercial hardware anymore. With ESPHome, hardware like the ReSpeaker line, and a separate foray into an ESP32-powered display, I discovered how easy it was to build my own open-source replacements without the compromises.
If you’re looking to unshackle yourself from the chains of Google, Amazon, and other smart home providers, I can’t recommend this enough. You have the power to build whatever you want, including your own voice commands, software integrations, and more. I can’t see myself ever wanting to go back, because why would I? Putting those benefits aside, these devices tend to be cheaper, too. At the time of writing, the ReSpeaker Lite comes in at $30, the XVF3800 at $55, and the Elecrow 7-inch display at $40.
Building my own voice assistant with ESPHome and a local LLM
Nothing has to leave my network
The heart of my setup is ESPHome, which allows me to deploy YAML-based configurations to devices powered by an ESP32. For voice assistants, it started with the ReSpeaker Lite and the ReSpeaker XVF3800, though even the commercially sold Home Assistant Voice Preview Edition is built with ESPHome.
All of these devices work like a Google Nest Mini or Amazon Echo, complete with audio output and integrated speakers of your choosing (if you prefer that), and everything is customizable. They run entirely on my network, and thanks to Whisper, which transcribes speech to text, and the Ollama instance on my Proxmox server, even the responses are generated locally. My primary voice assistant even sounds like GLaDOS from Portal, and responds like her, too.
The other side of all of this is that I love learning, and that’s been a major driver in my obsession with these devices. They’re a great way to get started with all kinds of technology, from networking, to signaling, to wiring, and more.
As an example, back when I tested the ReSpeaker Lite, I wanted to see what I could do with regard to audio. My dad gave me the TDK OutLoud CD Wallet, a speaker that was released in the latter half of 2003 and required batteries to work. It wouldn’t power on, and audio wouldn’t play, but he was happy for me to take it apart and see what I could do with it. A few hours later, I had a portable CD wallet voice assistant. This is still one of my favorite projects I’ve built, and it was my first experience with soldering, too.
ESPHome also makes the configuration fairly simple once you get past the initial hurdles of figuring out how to interface with the hardware. As someone with programming experience, I’ve felt comfortable enough to read the official datasheets and deploy my own software that interfaces with the hardware directly. In most cases, though, people have already done the work for you, and you can find code snippets for most devices on GitHub.
Finally, unlike with proprietary assistants, I can control everything. I can tweak the wake-word sensitivity or even train my own wake word, change how responses are played back and redirected to different devices, and run the entire pipeline locally if I want to. It’s not just a privacy thing; it’s an actual ownership thing as well.
Building a Nest Hub alternative with ESPHome and an ESP32 display
A rewarding challenge
This has been one of the most complex projects I’ve ever worked on, and to be honest, I’ve procrastinated on expanding it because of the sheer amount of work it took to get it working in the first place. Given that I have three devices that serve as Google Nest Mini replacements, the next step was to replace the Google Nest Hub that I’ve used in my kitchen. Thanks to the Elecrow CrowPanel Advance 7-inch HMI ESP32 AI Display, that turned out to be entirely possible.
This particular screen has a microphone built in alongside a JST PH 2.0mm speaker output. Starting with the ESP32-S3-Box-3 ready-made project, I ported its primary functionality to the Elecrow display, providing visual feedback for voice commands. It originally used ESPHome’s standard display drawing libraries, which were slow but could show the text of what I had asked and the response. I have since switched over to LVGL, which is a lot faster, though I never got around to porting the text display as well.
With this as a base, it would be trivial to build additional features, too. One example is a standard always-on display with the time and date, alongside a Google Calendar integration and other services. It’s not fully on par with everything the Google Nest Hub can do (and some things, like video streaming, are likely out of the question), but it’s possible to integrate most features alongside others that could never work on the Nest Hub.
ESPHome’s tight integration with Home Assistant makes it possible to do so much with it, and you’re limited only by your programming ability and the time you’re willing to dedicate to it. Remember how I mentioned that you can find GitHub repositories out there to help you get started? Well, I published that entire project on GitHub, too.
ESPHome, and the freedom of local control
Deceptively simple to start
ESPHome sits at a sweet spot between raw coding and pure plug-and-play. You don’t need to write thousands of lines of C++ or mess with low-level drivers… unless you want to, that is. Typically, you simply describe how your hardware should behave in YAML, like what pins are connected, what sensors exist, how they should display or report data, and ESPHome takes care of the rest.
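To make that description concrete, here’s a minimal sketch of what such a YAML file can look like. Everything specific in it is an assumption for illustration rather than part of any build described here: the device name, the generic esp32dev board, a DHT22 sensor on GPIO4, a push button on GPIO5, and Wi-Fi credentials stored in a separate secrets.yaml file.

```yaml
# Hypothetical example device: the name, board, and pins are illustrative only.
esphome:
  name: kitchen-node

esp32:
  board: esp32dev

# Core services: serial/network logging, the Home Assistant API, and OTA updates.
logger:
api:
ota:
  - platform: esphome

wifi:
  ssid: !secret wifi_ssid          # read from secrets.yaml
  password: !secret wifi_password

# "What sensors exist" and "what pins are connected":
sensor:
  - platform: dht
    pin: GPIO4                     # assumed data pin for the DHT22
    model: DHT22
    temperature:
      name: "Kitchen Temperature"
    humidity:
      name: "Kitchen Humidity"
    update_interval: 60s           # how often readings are reported

binary_sensor:
  - platform: gpio
    name: "Kitchen Button"
    pin:
      number: GPIO5                # assumed button pin, wired to ground
      mode:
        input: true
        pullup: true
      inverted: true               # pressed pulls the pin low
```

Once a file like this is flashed, the named entities should show up in Home Assistant through the API without any hand-written C++.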
As a result, getting started in ESPHome is deceptively simple. As things advance, it can get a lot harder, especially when writing C++ lambdas inside of your YAML, but the results make it incredibly worth it. Maybe I’m just a nerd, but reading datasheets and tinkering with I2S microphones or custom display drivers has been surprisingly fun, and I’ve loved learning, adapting, and porting software to run on different devices in order to make them truly my own. That Elecrow display is a prime example of that.
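For a sense of what a C++ lambda inside YAML looks like, here’s a small sketch of a fragment that derives a Fahrenheit reading from a Celsius sensor. It’s a hedged example, not taken from any of the projects above: the DHT22 on GPIO4 and the kitchen_temp_c ID are hypothetical, and the fragment assumes the usual esphome:, esp32:, wifi:, and api: blocks already exist in the same file.

```yaml
sensor:
  # Source sensor (hypothetical wiring); the id lets the lambda refer to it.
  - platform: dht
    pin: GPIO4
    model: DHT22
    temperature:
      id: kitchen_temp_c
      name: "Kitchen Temperature"
    humidity:
      name: "Kitchen Humidity"
    update_interval: 60s

  # Template sensor whose value is computed by a C++ lambda.
  - platform: template
    name: "Kitchen Temperature (F)"
    unit_of_measurement: "°F"
    device_class: temperature
    accuracy_decimals: 1
    update_interval: 60s
    lambda: |-
      // Publish nothing until the source sensor has a valid reading.
      if (std::isnan(id(kitchen_temp_c).state)) {
        return {};
      }
      // Convert Celsius to Fahrenheit.
      return id(kitchen_temp_c).state * 9.0f / 5.0f + 32.0f;
```

The lambda body is ordinary C++ that ESPHome compiles into the firmware, which is where things can get harder: a typo here shows up as a compiler error rather than a YAML validation message.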
Commercial voice assistants always come with caveats: your data lives in the cloud, your devices get software updates on someone else’s schedule, and if a company decides to shut down a feature, you’re out of luck. Here, I’m still beholden to ESPHome’s updates in a sense, but I don’t need to worry as much, since the entire system runs locally anyway. Everything is mine to tweak, and even when my internet goes down, my assistant works.
These devices don’t track me, there’s no advertising, and they’re truly mine. I control their behavior, and when I’m done using one as a voice assistant, it doesn’t just get chucked into a drawer, never to be used again. Instead, I can turn it into something else that’s useful to me.