The appeal of processing data in real time is increasing. Historically, organizations that adopted the streaming data paradigm have been driven by use cases such as application monitoring, log aggregation, and data transformation (ETL).
Organizations like Netflix were early adopters of the streaming data paradigm. Today, there are more drivers for growing adoption. According to Lightbend’s 2019 survey, Streaming Data and the Future Tech Stack, new capabilities in artificial intelligence (AI) and machine learning (ML), integration of multiple data streams, and analytics are starting to rival these historic use cases.
The streaming analytics market (which may be just one segment of the streaming data market, depending on definitions) is expected to grow from $15.4 billion in 2021 to $50.1 billion in 2026, at a compound annual growth rate (CAGR) of 26.5% during the forecast period, according to MarketsandMarkets.
Historically, there has been a de facto standard for streaming data: Apache Kafka. Kafka and Confluent, the company that commercializes it, are an ongoing success story, with Confluent confidentially filing for an IPO in 2021.
In 2018, more than 90% of respondents to a Confluent survey viewed Kafka as essential to their data infrastructure, and Stack Overflow searches grew by more than 50% over the course of the year. However successful Confluent may be and however widespread Kafka may be, the fact remains: the foundation of Kafka was laid in 2008.
In recent years, a multitude of streaming data alternatives have emerged, each with a specific focus and approach. One such alternative is Apache Pulsar. In 2021, Pulsar ranked among the top five Apache Software Foundation projects and surpassed Apache Kafka in monthly active contributors.
StreamNative, a company founded by the original developers of Apache Pulsar and Apache BookKeeper, just released a report comparing Apache Pulsar to Apache Kafka in performance benchmarks. StreamNative provides a fully managed Pulsar-as-a-service cloud and enables enterprises to “access data as real-time event streams”.
Pulsar vs. Kafka
StreamNative is not the first company founded around Pulsar. Streamlio, another company founded by Pulsar core committers, was acquired by Splunk in 2019. Two of Streamlio’s founders, Sijie Guo and Matteo Merli, are the CEO and CTO of StreamNative, respectively.
As Addison Higham, StreamNative’s lead architect and head of cloud engineering, shared, the company is focused on a bottom-up, community-driven approach and on aspects such as technical development, documentation and training. Pulsar is used by Tencent, Verizon, Intuit, and Flipkart, among others, with the latter two also being StreamNative clients.
StreamNative grew significantly in 2021. It raised $23.7 million in Series A financing, grew its team from 30 to more than 60 people across North America, EMEA and Asia, and saw 6x revenue growth and 3x growth in adoption, accelerated by AWS Marketplace integration, SQL support, and other updates. The community also doubled in size, and Pulsar surpassed 10,000 stars on GitHub.
Higham said the question of how Pulsar compares to Kafka is one they get a lot. The last widely publicized Pulsar vs. Kafka benchmark was run in 2020, and a lot has changed since then. Therefore, StreamNative’s engineering team conducted a benchmark study using the Linux Foundation’s OpenMessaging Benchmark.
According to StreamNative’s benchmarks, Pulsar can achieve 2.5 times the maximum throughput of Kafka. Pulsar offers consistent single-digit millisecond publish latency that is 100 times lower than Kafka’s at P99.99. Low publish latency is important because it allows systems to quickly hand messages off to a message bus.
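For readers unfamiliar with the notation, P99.99 is the latency below which 99.99% of publish operations complete, so it captures rare slow outliers that an average or even a P99 figure can hide. The following is a minimal sketch of that idea using the nearest-rank method. The sample latencies are made up for illustration and are not taken from the StreamNative report.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical publish latencies in milliseconds:
# 99,980 fast publishes and 20 slow outliers (0.02% of the total).
latencies_ms = [2.0] * 99_980 + [250.0] * 20

p99 = percentile(latencies_ms, 99)        # still looks healthy
p9999 = percentile(latencies_ms, 99.99)   # exposes the slow tail
print(f"P99 = {p99} ms, P99.99 = {p9999} ms")
```

In this made-up data set the P99 value sits with the fast publishes while the P99.99 value lands among the outliers, which is why tail percentiles are the ones usually quoted in latency benchmarks.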
With a historical read rate 1.5 times faster than Kafka’s, applications using Pulsar as their messaging system can catch up in half the time after an unexpected interruption, according to the report. That said, we should keep in mind that the benchmark, like all benchmarks and especially those published by vendors, should be considered indicative.
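A read rate that is 1.5 times faster and a catch-up time that is cut in half can both hold if producers keep publishing while the consumer works through its backlog, because what drains the backlog is the read rate minus the incoming publish rate. The sketch below is our own back-of-the-envelope illustration of that reasoning, not a calculation from the report. The backlog size and all rates are hypothetical.

```python
def catch_up_seconds(backlog_msgs, read_rate, publish_rate):
    """Seconds needed to drain a backlog while producers keep publishing.

    The backlog shrinks at (read_rate - publish_rate) messages per second.
    """
    if read_rate <= publish_rate:
        raise ValueError("the consumer only catches up if read_rate > publish_rate")
    return backlog_msgs / (read_rate - publish_rate)

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
backlog = 36_000_000          # messages accumulated during the interruption
baseline_read = 1_000_000     # messages/second for the slower system
faster_read = 1_500_000       # 1.5x the baseline read rate

# With no new traffic, a 1.5x read rate gives a 1.5x faster catch-up.
idle_speedup = (catch_up_seconds(backlog, baseline_read, 0)
                / catch_up_seconds(backlog, faster_read, 0))

# With producers still publishing 500,000 messages/second,
# the same 1.5x read rate halves the catch-up time.
busy_speedup = (catch_up_seconds(backlog, baseline_read, 500_000)
                / catch_up_seconds(backlog, faster_read, 500_000))

print(idle_speedup, busy_speedup)
```

The closer the ongoing publish rate is to the slower system's read rate, the larger the practical benefit of a faster historical read becomes.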
In addition, as StreamNative also points out, the report focuses purely on comparing technical performance. While clearly important, that’s not all that matters when evaluating alternatives, as Higham also acknowledged. Many third parties have started comparing Pulsar and Kafka.
Higham said Pulsar and Kafka can behave similarly in many situations. Where StreamNative tries to distinguish itself with Pulsar is in manageability and developer experience.
Architecture and Positioning of Pulsar
Higham referred to Pulsar’s legacy as a messaging-oriented platform, which later evolved to address streaming and events as well. This is reflected in Pulsar’s API, and Higham believes it will make Pulsar easier for developers to adopt. While Pulsar does not have direct compatibility with Kafka, a feature called Protocol Handler allows it to interoperate with other systems’ APIs, with the Kafka implementation being the most prominent.
Higham said StreamNative is in regular contact with companies that use Kafka and has found that they often have a large sprawl of hundreds or even thousands of Kafka clusters, almost one per application, which is ultimately not very cost-effective. Pulsar’s built-in multi-tenancy is designed to share workloads securely, which is extremely valuable at scale, Higham added, while also emphasizing features such as geo-replication.
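Pulsar’s multi-tenancy is visible in its topic names, which take the form persistent://tenant/namespace/topic (or non-persistent://...), so several teams can share one cluster while keeping their topics in separate tenants and namespaces. The following minimal sketch parses and builds such names. The TopicName class, the parse_topic helper and the example tenants are hypothetical illustrations, not part of any Pulsar client library.

```python
from dataclasses import dataclass

VALID_DOMAINS = ("persistent", "non-persistent")

@dataclass(frozen=True)
class TopicName:
    """Hypothetical helper mirroring Pulsar's topic naming scheme."""
    domain: str
    tenant: str
    namespace: str
    topic: str

    def __str__(self):
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"

def parse_topic(full_name):
    """Split 'domain://tenant/namespace/topic' into its four parts."""
    domain, sep, rest = full_name.partition("://")
    if not sep or domain not in VALID_DOMAINS:
        raise ValueError(f"unexpected topic domain in {full_name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {full_name!r}")
    return TopicName(domain, *parts)

# Two hypothetical teams sharing one cluster, isolated by tenant.
payments = parse_topic("persistent://payments/fraud/transactions")
marketing = TopicName("persistent", "marketing", "campaigns", "clicks")
print(payments.tenant, str(marketing))
```

Because the tenant and namespace are part of every topic name, access policies and quotas can be attached at those levels rather than by running a separate cluster per application.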
Pulsar also provides SQL access to streaming data via Trino, as well as data transformation through Pulsar Functions, which can be written in languages such as Go, Java and Python. The latest version of Pulsar is 2.9.1, but when version 2.8 was released, the Pulsar team published a technical blog describing the architecture of Pulsar, and we refer interested readers to it.
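To give a flavor of what a Pulsar Function can look like, the sketch below uses the “language-native” Python style described in the Pulsar documentation, in which a plain function named process receives each message payload and its return value is published to the output topic. The e-mail masking logic is a hypothetical example of a single message transformation, and the deployment step (registering the function with input and output topics) is omitted.

```python
import re

# Simplified pattern for illustration; it is not a full e-mail validator.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def process(input):
    """Mask e-mail addresses in a single message payload.

    The parameter is named `input` to follow the convention used for
    language-native Python functions, even though it shadows a builtin.
    """
    return EMAIL.sub("<redacted>", input)

# Local check, runnable without a Pulsar cluster.
print(process("contact alice@example.com now"))
```

Since the function has no Pulsar-specific imports, it can be unit-tested like any other Python code before it is deployed.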
StreamNative claims that its Protocol Handler framework provides not only a clear migration path from Kafka, but also integration with other systems and protocols such as RocketMQ, AMQP, and MQTT. Higham noted that this is coming to StreamNative Cloud soon, with an emphasis on Kafka API support.
StreamNative Cloud is StreamNative’s primary source of revenue. In addition to offering a managed cloud service, StreamNative provides value-added features on top of Apache Pulsar for security and integration, including integration with platforms such as Flink, Spark, and Delta Lake.
As for comparing Pulsar to other offerings in that space, such as Apache Flink or Spark Streaming, Higham said Pulsar isn’t really focused on trying to build something akin to one of those streaming compute engines.
What they are aiming for is “building a great integration story, [the] best of breed connector that is very flexible, ease of use and the simple 80% use cases of single message transformation,” said Higham. Pulsar has more in common with Redpanda in that they focus on solving some of those core pain points, but some of those pain points lie not just in the implementation but also in the underlying protocol, Higham claimed.
This post, “StreamNative publishes report with insights into the data streaming ecosystem,” was originally published at https://venturebeat.com/2022/04/07/streamnative-releases-report-with-insights-into-data-streaming-ecosystem/