02 Stream Raw Data with ETL
Add Kafka Export to bitcoin-etl
Commit: 42dd40d
PR: #72
Author: @amitrahav
Purpose: Enable streaming of parsed Bitcoin data (blocks and transactions) directly to Apache Kafka topics.
Key Additions
- New CLI Option: --output kafka
  - Extends the existing stream command to support Kafka as a streaming sink.
- Kafka Exporter Integration
  - Introduces a new KafkaItemExporter class to send items to Kafka.
  - Uses confluent_kafka.Producer for high-performance message delivery.
- New Arguments
  - --kafka-bootstrap-servers: Comma-separated list of Kafka brokers (e.g., localhost:9092).
  - --kafka-topic: The Kafka topic to publish messages to.
- Data Format
  - Each message is serialized as a JSON string.
  - Messages include block and transaction data parsed from Bitcoin Core (the example below connects through the node's JSON-RPC endpoint, passed with -p).
- Graceful Shutdown
  - Ensures the Kafka producer flushes all buffered messages before exiting.
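The exporter behaviour described above (JSON serialization, delivery through a Kafka producer, flush on shutdown) can be sketched roughly as follows. This is a minimal illustration, not the code from commit 42dd40d: the class name KafkaItemExporterSketch, the open/export_items/close method names, the InMemoryProducer test double, and the make_producer helper are all assumptions made for the example. Only Producer, produce, poll, and flush come from the confluent_kafka library.

```python
import json


class InMemoryProducer:
    """Hypothetical stand-in for confluent_kafka.Producer, useful for dry runs."""

    def __init__(self):
        self.messages = []
        self.flushed = False

    def produce(self, topic, value=None):
        self.messages.append((topic, value))

    def poll(self, timeout):
        return 0  # no delivery callbacks to serve

    def flush(self):
        self.flushed = True
        return 0  # nothing left in the queue


class KafkaItemExporterSketch:
    """Illustrative exporter; the real KafkaItemExporter may differ."""

    def __init__(self, producer, topic):
        # `producer` is any object exposing produce/poll/flush,
        # for example a confluent_kafka.Producer.
        self.producer = producer
        self.topic = topic

    def open(self):
        pass  # nothing to set up in this sketch

    def export_items(self, items):
        for item in items:
            self.export_item(item)

    def export_item(self, item):
        # Each item (block or transaction dict) becomes one JSON message.
        payload = json.dumps(item).encode("utf-8")
        self.producer.produce(self.topic, value=payload)
        # Serve pending delivery callbacks without blocking.
        self.producer.poll(0)

    def close(self):
        # Graceful shutdown: wait until buffered messages are delivered.
        self.producer.flush()


def make_producer(bootstrap_servers):
    """Build a real producer; requires the confluent-kafka package."""
    from confluent_kafka import Producer

    return Producer({"bootstrap.servers": bootstrap_servers})
```

To try the sketch without a broker, construct the exporter with InMemoryProducer() and inspect its messages list; to send to a real cluster, pass make_producer("localhost:9092") instead.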
Files Modified
- bitcoinetl/streaming/streamer_adapter.py
- bitcoinetl/streaming/streamer.py
- bitcoinetl/jobs/exporters/kafka_item_exporter.py (new)
- cli/stream.py (adds new CLI options)
Example Usage
git clone https://github.com/CoinsGPT/bitcoin-etl.git
cd bitcoin-etl
python3 bitcoinetl.py stream \
-p http://username:password@localhost:8332 \
--output kafka/localhost:9092 \
--period-seconds 0 \
-b 10 \
-B 50 \
--enrich true \
-l last_synced_block.txt
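
Once the stream command is running, one way to check that messages are arriving is to read a few back with a small consumer. The sketch below assumes the messages are UTF-8 JSON, as described under Data Format. The function names, the group id, and the topic argument are hypothetical; the topic name is not given in the example above, so it has to be supplied by the reader.

```python
import json


def decode_message(raw_value):
    """Turn one Kafka message value (UTF-8 JSON bytes) back into a dict."""
    return json.loads(raw_value.decode("utf-8"))


def print_messages(bootstrap_servers, topic, limit=5, timeout_seconds=10.0):
    """Print up to `limit` decoded messages from `topic`, then stop."""
    # Imported lazily so decode_message works without confluent-kafka installed.
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": "bitcoin-etl-smoke-test",  # hypothetical group name
        "auto.offset.reset": "earliest",       # start from the oldest message
    })
    consumer.subscribe([topic])
    seen = 0
    try:
        while seen < limit:
            msg = consumer.poll(timeout_seconds)
            if msg is None:
                break  # nothing arrived within the timeout
            if msg.error():
                print("consumer error:", msg.error())
                continue
            print(decode_message(msg.value()))
            seen += 1
    finally:
        consumer.close()
    return seen
```

Calling print_messages("localhost:9092", topic) with the topic the exporter writes to should print the first few block or transaction records; a return value of 0 suggests nothing has been published yet.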