5/29/2023

Wikimedia's Modern Event Platform architecture

[Diagram courtesy of Wikimedia: the components of their Modern Event Platform]

In the diagram above (courtesy of Wikimedia) they describe the various components of their event platform. Kafka is at the center of the architecture, and keeps data flowing as events are created by the various Wikimedia properties. Kafka acts as the stateful backbone of their system, allowing the platform to ingest extremely large volumes of events that capture the real-time behavior of users on one of the most trafficked websites on the planet.

Querying in Real-time with Apache Pinot

Apache Pinot was created, as was Kafka, at LinkedIn to power analytics for business metrics and user-facing dashboards. Since then, it has evolved into one of the most performant and scalable analytics platforms for high-throughput event-driven data.

What makes Pinot so powerful is that it plugs right into the kind of system Wikimedia has built on top of Kafka. Pinot scales on the same principles as Kafka when it comes to performance, which makes it a go-to solution for running SQL queries on events stored in Kafka topics. There's no need to mess with custom serializers or to do heavy lifting to support long-running stream processing applications. Pinot is completely self-service for developers and operators, and provides a storage model that makes sense for modern event-driven platforms. Not only does it scale to the demands of high volume, it was built to scale to the organizational demand for fast real-time analytics on things happening right now.

Building the application using Spring Boot

The application framework I chose was Spring Boot, which provides a robust solution for reactive streams. The oldest possible production deployment of a Spring Boot application would be almost a decade old today, and Spring continues to evolve. To keep pace with the demands of modern event-driven applications built on Apache Kafka, the Spring team led the charge back in 2017 and has since introduced a fully end-to-end reactive application framework that is integrated across the Spring ecosystem of libraries. As a result of yet another Spring transformation - this time focused on high-performance event-driven applications - emerging patterns for building reactive applications are continually surfacing. Having used Spring Boot for nearly a decade, I decided to put the new reactive goodies to work analyzing real-time events published by the Wikimedia platform.

The example application's source code that I discuss in this blog post can be found on GitHub. There you will find more specific instructions for setting up the end-to-end example, as well as usage information.

The first thing we'll do is create a reactive stream that processes the recent changes being reported by Wikimedia's event platform, by subscribing to server-sent events (SSE) from the Wikimedia recent change stream API.
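The post's actual listing lives in the GitHub repository; as a rough stand-in, here is a minimal sketch of what such a client can look like with Spring WebFlux's WebClient. The class and method names are illustrative, and the only concrete detail carried over from the post is that the data comes from Wikimedia's public recentchange SSE stream.

```java
import org.springframework.core.ParameterizedTypeReference;
import org.springframework.http.MediaType;
import org.springframework.http.codec.ServerSentEvent;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;

public class RecentChangeStreamClient {

    // Wikimedia's public SSE endpoint for recent changes across all wikis.
    private static final String RECENT_CHANGE_URL =
            "https://stream.wikimedia.org/v2/stream/recentchange";

    /**
     * Returns a reactive stream of server-sent events (SSE) emitted by the
     * Wikimedia recent change stream API. Each event's data field carries a
     * JSON document describing a single change.
     */
    public Flux<ServerSentEvent<String>> recentChanges() {
        return WebClient.create(RECENT_CHANGE_URL)
                .get()
                .accept(MediaType.TEXT_EVENT_STREAM)
                .retrieve()
                .bodyToFlux(new ParameterizedTypeReference<ServerSentEvent<String>>() {});
    }

    public static void main(String[] args) {
        // Subscribe and print the raw JSON payload of each change as it arrives;
        // mapNotNull skips keep-alive events that carry no data.
        new RecentChangeStreamClient().recentChanges()
                .mapNotNull(ServerSentEvent::data)
                .doOnNext(System.out::println)
                .blockLast();
    }
}
```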
Here, I create a stream client that will process each server-sent event emitted by the recent change API. Now I have a way to subscribe to the recent changes as they happen. All changes across Wikipedia go through this pipe, at a rate of about 50 changes per second. The next thing I need to do is create a decoration job that will replicate the server-sent events into a Kafka topic that I control.

As a part of my research for putting together the example application discussed in this blog post, I relied on the help of friends. Xiang Fu, one of the co-authors of Apache Pinot, provided me with an insight that helped me wrap my head around using Kafka for event-driven data analysis.

Xiang pointed out that the best way to query immutable events, which may number in the hundreds of millions and potentially billions, is to not join tables. Tables have always been a pain to deal with in relational databases, and that's nothing new. The reason we still have tables today is that SQL tends to be the most widely used language for querying data. While it's probably not the best way to query raw event streams, it turns out to be the best option for business analysts or developers who need to quickly build reports on top of data sources that were originally shaped to fit in tables.

This problem is famously known as an impedance mismatch, which simply means that the best model for querying data isn't always the best model for storing data, forcing us humans to translate between models while sacrificing things like performance, availability, or consistency. Xiang gave me a new way to think about this: sometimes you're not going to have all the data you need in an event stream, and joining real-time data represents a consistency trade-off.
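To make that idea concrete, here is a hedged sketch of the kind of single-table, join-free query Pinot is built for, issued over HTTP to a Pinot broker. The table name (wikievents) and its columns are hypothetical stand-ins for however the Kafka topic's events end up modeled, and the broker address and /query/sql path reflect Pinot's quickstart defaults rather than anything taken from the post.

```java
import org.springframework.web.reactive.function.client.WebClient;

import java.util.Map;

public class RecentChangeQuery {

    public static void main(String[] args) {
        // A single-table aggregation over raw events: no joins required.
        // Table and column names are hypothetical.
        String sql = """
                SELECT domain, COUNT(*) AS changes
                FROM wikievents
                GROUP BY domain
                ORDER BY changes DESC
                LIMIT 10
                """;

        // Pinot brokers accept SQL over HTTP; 8099 is the default broker port
        // in the quickstart distributions.
        String response = WebClient.create("http://localhost:8099")
                .post()
                .uri("/query/sql")
                .bodyValue(Map.of("sql", sql))
                .retrieve()
                .bodyToMono(String.class)
                .block();

        System.out.println(response);
    }
}
```

Because the events land denormalized in a single table, the aggregation runs without joining anything, which is the point Xiang was making.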