How Metronome delivers advanced usage-based billing with Responsive.
Metronome’s platform makes billing fast, flexible, and frictionless, enabling customers like OpenAI, NVIDIA and Databricks to launch products rapidly, adopt any pricing model with ease, and streamline their revenue operations.
To ingest, process, and serve tens of thousands of billing events per second with strict transactional guarantees, Metronome built their event-driven architecture on Confluent Cloud (a cloud-native data streaming platform built by the original creators of Apache Kafka®) and Kafka Streams.
Industry
Fintech
Company Size
51-200
Highlights
Suyog Rao
VP Engineering
Challenge
Metronome’s product requires real-time, transactional processing to keep billing dashboards up to date and to let users manage their spending with spend limits. To build the ingestion pipeline behind these features, they paired Confluent Cloud with Kafka Streams, which they chose for its simplicity, flexibility, and capabilities such as exactly-once semantics.
As their business grew, the massive growth in throughput produced large amounts of Kafka Streams state, which caused reliability and operational issues. For Metronome customers, outages could mean their end customers exceeding spend thresholds, missing billing alerts, and seeing out-of-date information on their dashboards. To prevent this, the Metronome team spent significant engineering effort each quarter planning and executing operational toil such as increasing the number of Kafka partitions for their application. Each new customer and expansion compounded the operational risk to their core pipeline, and Metronome knew they needed to invest proactively in foundational improvements to their Kafka Streams pipeline.
Solution
Metronome turned to Responsive’s Kafka Streams platform to solve their stability challenge and ensure they are set up to scale with their aggressive business goals.
Responsive separates Kafka Streams’ compute from storage, eliminating Metronome’s main stability concerns stemming from rebalancing and state restoration in Kafka Streams. This also allowed them to increase their Kafka partitions from 32 to 96 in under ten minutes, an operational task that previously took multiple weeks to meticulously plan and execute.
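To see why a partition increase demanded weeks of planning in plain Kafka Streams, recall that a keyed record's partition is its hashed key modulo the partition count, and that each Kafka Streams task owns the local state for one input partition. Changing the count remaps keys, leaving their accumulated state with the wrong task. This sketch is illustrative only: it uses a SHA-256-based hash as a stand-in for the murmur2 hash that Kafka's default partitioner applies to serialized keys, and the `customer-N` keys are hypothetical. Because 96 is a multiple of 32, a key keeps its partition only when its hash modulo 96 is below 32, so roughly two thirds of keys move.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition as hash(key bytes), masked non-negative, modulo count.

    Stand-in hash: SHA-256 here, whereas Kafka's default partitioner uses murmur2.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big") & 0x7FFFFFFF  # non-negative, like Kafka's toPositive
    return h % num_partitions


def fraction_remapped(keys, old_count: int, new_count: int) -> float:
    """Share of keys whose partition, and hence owning task's state, would change."""
    moved = sum(
        1 for k in keys if partition_for(k, old_count) != partition_for(k, new_count)
    )
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(10_000)]  # hypothetical customer IDs
    print(f"{fraction_remapped(keys, 32, 96):.1%} of keys change partition")
```

With a roughly uniform hash the printed share should be close to two thirds, which is why every one of those keys needs its state migrated or rebuilt before the application can safely resume.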
With the subsequent release of asynchronous processing, Metronome can now scale to 10x their current throughput without hitting the bottlenecks typically caused by increased per-partition load.
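The case study does not describe how Responsive implements asynchronous processing. As a generic illustration of the idea only, this asyncio sketch processes one partition's events concurrently across keys while keeping events for the same key in offset order, so a slow I/O call for one account does not stall every other account in the partition. `process_partition`, `handle`, `max_in_flight`, and the `acct-*` keys are hypothetical names, not Responsive's API.

```python
import asyncio
from collections import defaultdict


async def process_partition(events, handle, max_in_flight=8):
    """Run handle(key, value) for a partition's events: concurrent across keys,
    sequential (in offset order) within each key. Illustrative sketch only."""
    slots = asyncio.Semaphore(max_in_flight)  # caps total concurrent handler calls
    tails = {}  # key -> task for the most recent event seen with that key

    async def run(prev, key, value):
        if prev is not None:
            await prev  # wait until the previous event for this key has finished
        async with slots:
            await handle(key, value)

    tasks = []
    for key, value in events:  # events arrive in partition offset order
        task = asyncio.create_task(run(tails.get(key), key, value))
        tails[key] = task
        tasks.append(task)
    await asyncio.gather(*tasks)


async def demo():
    seen = defaultdict(list)

    async def handle(key, value):
        # Simulate uneven I/O latency: odd values are slower.
        await asyncio.sleep(0.01 if value % 2 else 0.001)
        seen[key].append(value)

    events = [("acct-a", 1), ("acct-b", 2), ("acct-a", 3), ("acct-b", 4), ("acct-a", 5)]
    await process_partition(events, handle)
    return dict(seen)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A production version would also have to commit offsets only up to the oldest event that has not yet finished, so that a crash never skips unprocessed records.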
Additionally, thanks to Responsive’s autoscaling capabilities, Metronome’s entire pipeline now scales automatically with load, so it can quickly burn through backlogs caused by load spikes without wasting compute resources the rest of the time.
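Responsive's autoscaling policy is not described here. One common way to reason about backlog-driven scaling, shown as a hypothetical sketch, is capacity arithmetic: to absorb an input rate r and drain a backlog L within T seconds, n instances that each process c events per second need n·c ≥ r + L/T. The function name, its parameters, and the cap of 96 (one single-threaded instance per partition) are illustrative assumptions.

```python
import math


def desired_instances(input_rate, backlog, per_instance_rate, drain_seconds,
                      min_instances=1, max_instances=96):
    """Instances needed to keep up with input and clear the backlog in drain_seconds.

    Hypothetical policy for illustration, not Responsive's algorithm.
    Rates are in events/second; backlog is consumer lag in events.
    max_instances=96 assumes one single-threaded instance per partition,
    so instances beyond that point would sit idle.
    """
    required_rate = input_rate + backlog / drain_seconds  # n * c >= r + L / T
    n = math.ceil(required_rate / per_instance_rate)
    return max(min_instances, min(max_instances, n))


if __name__ == "__main__":
    print(desired_instances(20_000, 3_000_000, 5_000, 300))  # spike with backlog
    print(desired_instances(20_000, 0, 5_000, 300))          # steady state
```

For example, with 20,000 events/s arriving, 5,000 events/s per instance, and a 3,000,000-event backlog to clear in 300 seconds, the function asks for ceil(30,000 / 5,000) = 6 instances; once the backlog is gone it falls back to 4, which is the "no wasted compute the rest of the time" half of the trade-off.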
Casey Crites
Founding Engineer
Results
- 99.99% availability on 300% growth: The separation of storage and compute has eliminated Metronome’s biggest source of Kafka Streams outages related to task rebalancing with large amounts of state.
- Set up for near-infinite scale: Metronome has kept up with the immense demand for their popular product, as well as the growth of some of their biggest customers like OpenAI and NVIDIA. With features like autoscaling and asynchronous processing, they can add resources as needed and have room to scale for the foreseeable future.
- Engineers are freed to work on revenue-generating features: The improved foundation for their event-driven applications, paired with curated observability and expert support, means Metronome’s engineering team can work on features that contribute to the business’s top line instead of fighting fires.