Summary
Data streaming enables real-time processing: data flows and is processed continuously as it arrives. This differs from batch processing, which handles data in discrete chunks at scheduled intervals.
The key to data streaming is the ==Publish-Subscribe== (Pub/Sub) model:
- Producers publish messages to specific channels called Topics.
- Consumers subscribe to these Topics to receive the data in real-time.
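The Pub/Sub flow above can be sketched with a minimal in-memory broker. This is an illustrative model only, not Kafka's actual API; the `PubSubBroker` class and topic names are made up for the sketch:

```python
from collections import defaultdict
from typing import Callable

class PubSubBroker:
    """Minimal in-memory broker: producers publish to named topics,
    and consumers subscribe with callbacks to receive messages."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of consumer callbacks

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        # A consumer registers interest in a topic
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        # A producer sends a message; every subscriber of the topic receives it
        for callback in self.subscribers[topic]:
            callback(message)

broker = PubSubBroker()
received = []
broker.subscribe("clicks", received.append)   # consumer subscribes to "clicks"
broker.publish("clicks", "user-42 clicked")   # delivered to the subscriber
broker.publish("orders", "order-7 placed")    # no subscriber for "orders": dropped here
```

Note how producers and consumers never reference each other directly; the topic is the only point of coupling, which is what lets each side scale independently.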
Apache Kafka is a widely-used framework for data streaming. It has several key features:
- High throughput: Kafka can process millions of messages per second.
- Scalability: It scales horizontally by adding brokers (servers) to a cluster, allowing it to handle very large data streams.
- Immutable commit log: Kafka maintains an append-only log of messages, which ensures data integrity and replayability.
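The append-only log and replay behavior can be sketched as follows. This is a simplified model under the assumption of a single in-memory partition, not Kafka's actual storage engine; the `CommitLog` class and method names are illustrative:

```python
class CommitLog:
    """Append-only log sketch: records are never mutated or deleted,
    and consumers can replay by reading from any past offset."""

    def __init__(self):
        self._records = []  # append-only; existing entries are never changed

    def append(self, message: str) -> int:
        self._records.append(message)
        return len(self._records) - 1   # offset assigned to the new record

    def read_from(self, offset: int) -> list:
        # Replay: re-read everything from an earlier offset onward
        return self._records[offset:]

log = CommitLog()
log.append("event-a")
offset_b = log.append("event-b")
log.append("event-c")
replayed = log.read_from(offset_b)  # replays "event-b" and "event-c"
```

Because the log only grows and offsets are stable, a consumer that crashes can resume from its last committed offset, which is the property that gives Kafka its replayability.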
Real-world example:
- Companies like Netflix use Kafka to handle billions of messages daily, powering real-time recommendations, analytics, and user activity tracking.