Liftbridge: Lightweight, fault-tolerant message streams

Tyler Treat
Published in Real Kinetic Blog
May 4, 2020

We’re big fans of open source. We’re also big fans of architectural patterns that allow you to decouple systems. Pub/sub messaging and other asynchronous processing patterns are things we frequently help our clients implement in order to build more resilient, scalable, and decoupled services. That’s why we’re excited to announce the 1.0 release of Liftbridge, a new open source project that aims to provide a lightweight, fault-tolerant solution for message streaming.

Liftbridge is similar in nature to Apache Kafka, Apache Pulsar, and related systems in that it provides a durable, high-performance, and replicated commit log with a pub/sub API. However, unlike these systems, which tend to be highly complex and nuanced and carry a significant amount of operational overhead, Liftbridge focuses on simplicity and usability. This shows in many of the design and implementation decisions. A few examples include using NATS as the messaging backbone, avoiding heavy dependencies on runtimes like the JVM and external coordination systems like ZooKeeper, compiling down to a small, single static binary, opting for a gRPC-based API, and relying on plain YAML configuration. Liftbridge is written in Go, and the code is structured in the hope that it’s relatively easy for someone to hop in and contribute to the project.

I started Liftbridge back in October 2017 with the goal of bridging the gap between sophisticated but complex log-based messaging systems like Kafka and Pulsar and simpler, cloud-native solutions. It was also largely inspired by my work as a core committer on both NATS and NATS Streaming, and it draws on my experience and the lessons I learned while working on those projects. It’s been nearly two years since I originally open-sourced Liftbridge, so I’m pleased to announce the project has now finally reached a 1.0 release. In practical terms, this means the API has reached a point of stability suitable for production use, and the project commits to backward compatibility going forward. Liftbridge will continue to follow a semantic versioning scheme.

There are a number of features that set Liftbridge apart from similar systems. Unsurprisingly, the one that has resonated most with people I’ve talked to is that it’s written in Go rather than Java or another JVM language. There is something to be said for running a small static binary rather than a JVM. The second is that it doesn’t rely on ZooKeeper, which is the source of a lot of heartburn for operations folks. Liftbridge also supports “wildcard topics”, which is a neat way to join streams of information together. For example, streams can match wildcard topics like stock.nyse.* or stock.nasdaq.* in addition to topic literals like stock.nasdaq.msft. This enables some powerful use cases. Streams can be paused on demand and subsequently resumed when published to, which conserves resources when dealing with large numbers of streams. One of my personal favorite features is the activity stream, which allows you to respond to events such as streams being created, deleted, paused, or resumed. This, for instance, allows you to orchestrate consumers as new streams spin up or spin down. What’s also cool is that several of these features were contributed by the community. A more complete feature comparison is available to see how Liftbridge compares to similar systems.
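To make the wildcard example concrete, here is a minimal sketch in Go of NATS-style subject matching, which is the rule that decides whether a message published to stock.nasdaq.msft would land in a stream attached to stock.nasdaq.*. The function name matchSubject is hypothetical and this is not Liftbridge’s or NATS’s actual implementation. It assumes the usual NATS conventions: subjects are split into tokens on ".", "*" matches exactly one token, and ">" (not mentioned above) matches one or more trailing tokens. It does not validate subjects, for example it does not reject empty tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// matchSubject reports whether a literal subject matches a pattern.
// Hypothetical helper for illustration only. Tokens are separated by ".";
// "*" matches exactly one token and ">" matches one or more trailing tokens.
func matchSubject(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" must be the last pattern token and needs at least
			// one subject token left to consume.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			// The pattern has more tokens than the subject.
			return false
		}
		if tok != "*" && tok != s[i] {
			// Literal tokens must match exactly.
			return false
		}
	}
	// Without a trailing ">", both sides must have the same token count.
	return len(p) == len(s)
}

func main() {
	subject := "stock.nasdaq.msft"
	patterns := []string{"stock.nyse.*", "stock.nasdaq.*", "stock.nasdaq.msft", "stock.>"}
	for _, pattern := range patterns {
		fmt.Printf("%-18s matches %s: %v\n", pattern, subject, matchSubject(pattern, subject))
	}
}
```

Under these rules, a single stream on stock.nasdaq.* would collect messages for every NASDAQ symbol, while a stream on stock.nasdaq.msft would only see that one subject.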

Additionally, there are the usual “table stakes” features like data replication for high availability and durability of messages, partitioning for horizontal scalability of streams, log compaction and retention rules, and metadata support for messages. The roadmap ahead includes some exciting stuff as well, like auto-pausing of sparsely used partitions, durable and fault-tolerant consumer groups, a better stream re-partitioning story, and broader client support.
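As a rough illustration of how partitioning spreads a stream’s load, the sketch below maps a message key to a partition by hashing it. This is a common general technique, shown here under stated assumptions: the helper partitionForKey, the FNV-1a hash, and the sample keys are all illustrative choices, not Liftbridge’s API or its documented partitioning behavior.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionForKey maps a message key to one of n partitions by hashing it.
// Hypothetical helper for illustration only; the choice of FNV-1a is an
// assumption, not something taken from Liftbridge.
func partitionForKey(key []byte, n uint32) uint32 {
	if n == 0 {
		// Avoid a division by zero; treat "no partitions" as partition 0.
		return 0
	}
	h := fnv.New32a()
	h.Write(key) // Write on a hash.Hash never returns an error.
	return h.Sum32() % n
}

func main() {
	const partitions = 3
	for _, key := range []string{"msft", "aapl", "ibm"} {
		fmt.Printf("%s -> partition %d\n", key, partitionForKey([]byte(key), partitions))
	}
}
```

Because the same key always hashes to the same partition, messages for one key stay ordered relative to each other while different keys can be served by different partitions.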

Liftbridge is fully open source. You can use it, fork it, and modify it however you want. Real Kinetic is also providing commercial support and consulting services around Liftbridge. As I mentioned earlier, a big part of our normal consulting is working with our clients to understand and implement asynchronous patterns for improving fault tolerance and scalability as they transition from monolithic architectures to microservices. We often see clients struggling with systems like Kafka, Pulsar, RabbitMQ, and Amazon Kinesis. Liftbridge can provide a compelling alternative.

If you’re already using Liftbridge today or are thinking about using it, I’d love to hear from you. Be sure to follow Liftbridge on Twitter and join the community Slack channel to stay up-to-date on the latest developments.

Managing Partner at Real Kinetic. Interested in distributed systems, messaging infrastructure, and resilience engineering.