- Stream processing for unbounded datasets.
- Similar to PubSub
- Kafka Connect
- A tool for scalably and reliably streaming data between Apache Kafka and other systems.
- There are connectors to PubSub/Dataflow/BigQuery
- Supports exactly-once delivery with the Spark direct connector, in addition to at-least-once.
- Only at-least-once with PubSub
- Guaranteed ordering within a partition.
- No ordering guarantee with PubSub
- No max persistence period
- 7 days or until acknowledged by all subscribers for PubSub
- Partitioning under user control
- Partitioning control abstracted away with PubSub
- Cluster Mirroring for disaster recovery
- Automated disaster recovery with PubSub
- 1 MB max size for data blobs
- 10 MB max size for PubSub
- Can change partitioning after setup (does not repartition existing data)
- Not under user control with PubSub
- Pseudo push model supported using Spark.
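
The partitioning and ordering points above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Kafka client code: `ToyTopic`, `pick_partition`, and `add_partitions` are hypothetical names, and CRC32 is an assumption standing in for the murmur2 hash that Kafka's default partitioner uses. It shows that records with the same key land in one partition in send order, and that adding partitions leaves already-written records where they are.

```python
import zlib


def pick_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index.

    Assumption: CRC32 is a stand-in for Kafka's real hash; only the
    'hash(key) mod partition count' idea is being illustrated.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class ToyTopic:
    """Hypothetical in-memory topic: a list of append-only partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key: str, value) -> int:
        # Records are appended, so order within a partition is send order.
        index = pick_partition(key, len(self.partitions))
        self.partitions[index].append((key, value))
        return index

    def add_partitions(self, count: int) -> None:
        # New empty partitions are added; existing records are not moved.
        self.partitions.extend([] for _ in range(count))


if __name__ == "__main__":
    topic = ToyTopic(num_partitions=3)
    for i in range(5):
        topic.send("user-1", i)
    print("before:", topic.partitions)

    topic.add_partitions(2)
    print("new partition for user-1:", topic.send("user-1", 99))
    print("after: ", topic.partitions)
```

In this model, a key may map to a different partition once the partition count changes, so ordering for that key only holds among records written under the same partition count.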