06 Bitcoin Data Verification
To verify that Bitcoin block and transaction data is complete in ClickHouse, you need to ensure:
- No missing blocks
- No missing transactions
- No duplicate block heights
- Transaction counts match block metadata
1. block height continuityβ
Run this to detect missing block numbers:
WITH seq AS (
SELECT number
FROM numbers(
toUInt64(
ifNull((SELECT max(number) FROM blocks_fat), 0) + 1
)
)
)
SELECT seq.number AS missing_block_number
FROM seq
LEFT ANTI JOIN blocks_fat AS b ON seq.number = b.number
If this query returns rows, those block numbers are missing.
2. Check for duplicate blocksβ
Check for duplicate block numbers (heights):
SELECT number, count() AS cnt
FROM blocks_fat
GROUP BY number
HAVING cnt > 1
You should see zero rows returned.
if the count of one block is bigger than 1, please use final command to deduplicate the same rows
OPTIMIZE TABLE blocks_fat FINAL;
3.Transactions Missingβ
To detect missing transactions from the transactions_fat table that are listed in the blocks_fat.transactions array, and process it partition by partition
- To list all the partition in the blocks_fat table
SELECT DISTINCT toYYYYMM(timestamp_month) AS part
FROM blocks_fat
ORDER BY part ASC
- To find the transactions which are missing from transactions_fat table partition by partition
WITH flattened AS (
SELECT
hash AS block_hash,
arrayJoin(transactions) AS tx_hash
FROM blocks_fat
WHERE toYYYYMM(timestamp_month) = 201304
)
SELECT
flattened.block_hash,
flattened.tx_hash,
t.hash
FROM flattened
LEFT JOIN (
SELECT hash
FROM transactions_fat
WHERE toYYYYMM(block_timestamp_month) = 201304
) AS t
ON flattened.tx_hash = t.hash
WHERE t.hash != flattened.tx_hash;
The missing transaction examples
Block: 364773 Missing TX: 9fdbcf0ef9d8d00f66e47917f67cc5d78aec1ac786e2abb8d2facb4e4790aad6
Block: 364784 Missing TX: 9c667c64fcbb484b44dcce638f69130bbf1a4dd0fbb4423f58ceff92af4219ec
Block: 367885 Missing TX: 30b3b19b4d14fae79b5d55516e93f7399e7eccd87403b8dc048ea4f49130595a
Block: 367906 Missing TX: dd6067e71c04cb62f8e5aa52ecc99b01ffcd551a52727d046a2fabb14eb39b4d
Block: 367904 Missing TX: 740ac533882221099e7202bbdafbb99ec589c6e74fd2fe7ca1274b46ea4f0a96
Block: 367899 Missing TX: f2e197a6d8d088b13afd0f99d4027da36a9413b9f3d7730ba5278132ebc950a7
Block: 367886 Missing TX: cf1032c2213e6faea04f1813aa6890e7f588bb378cb98e7425aec83c11d4457c
Block: 367877 Missing TX: 52539a56b1eb890504b775171923430f0355eb836a57134ba598170a2f8980c1
Block: 367911 Missing TX: 5f4d2593c859833db2e2d25c672a46e98f7f8564b991af9642a8b37e88af62bc
Block: 367897 Missing TX: 8dabbf51f78c1e7286866af1de403118c5ddbe57ca93b54859245916d2bf1063
Block: 367891 Missing TX: c9fe64681c9a12795586a3ae7c5e94b585032f67847c7f9c42e1b979a1e2959b
Block: 367906 Missing TX: dd6067e71c04cb62f8e5aa52ecc99b01ffcd551a52727d046a2fabb14eb39b4d
Block: 367904 Missing TX: 740ac533882221099e7202bbdafbb99ec589c6e74fd2fe7ca1274b46ea4f0a96
Block: 367899 Missing TX: f2e197a6d8d088b13afd0f99d4027da36a9413b9f3d7730ba5278132ebc950a7
Block: 367885 Missing TX: 30b3b19b4d14fae79b5d55516e93f7399e7eccd87403b8dc048ea4f49130595a
Block: 367906 Missing TX: dd6067e71c04cb62f8e5aa52ecc99b01ffcd551a52727d046a2fabb14eb39b4d
Block: 367904 Missing TX: 740ac533882221099e7202bbdafbb99ec589c6e74fd2fe7ca1274b46ea4f0a96
Block: 367899 Missing TX: f2e197a6d8d088b13afd0f99d4027da36a9413b9f3d7730ba5278132ebc950a7
Block: 367886 Missing TX: cf1032c2213e6faea04f1813aa6890e7f588bb378cb98e7425aec83c11d4457c
Block: 367877 Missing TX: 52539a56b1eb890504b775171923430f0355eb836a57134ba598170a2f8980c1
Block: 367911 Missing TX: 5f4d2593c859833db2e2d25c672a46e98f7f8564b991af9642a8b37e88af62bc
Block: 367897 Missing TX: 8dabbf51f78c1e7286866af1de403118c5ddbe57ca93b54859245916d2bf1063
Block: 367891 Missing TX: c9fe64681c9a12795586a3ae7c5e94b585032f67847c7f9c42e1b979a1e2959b
Block: 367885 Missing TX: 30b3b19b4d14fae79b5d55516e93f7399e7eccd87403b8dc048ea4f49130595a
Block: 367911 Missing TX: 5f4d2593c859833db2e2d25c672a46e98f7f8564b991af9642a8b37e88af62bc
Block: 367877 Missing TX: 52539a56b1eb890504b775171923430f0355eb836a57134ba598170a2f8980c1
Block: 367897 Missing TX: 8dabbf51f78c1e7286866af1de403118c5ddbe57ca93b54859245916d2bf1063
Block: 367891 Missing TX: c9fe64681c9a12795586a3ae7c5e94b585032f67847c7f9c42e1b979a1e2959b
Block: 367886 Missing TX: cf1032c2213e6faea04f1813aa6890e7f588bb378cb98e7425aec83c11d4457c
- If you find there are transactions missed, please double check whether it contained in the bitcoind
bitcoin-cli getrawtransaction <txid> true
- Apache Kafka has several key limitations and configurable constraints that impact how much data can be produced, buffered, and consumed. Here's a clear breakdown of the main types of limitations:
- Message Size Limitations
- Buffering & Memory Limits
- Log & Retention Limits
- Consumer Fetch & Processing Limits
- Connection & Network Limits
To enlarge your Kafka topic transactions
to support messages up to 90MB, you must:
- Update the topic config (
max.message.bytes
) - Update the broker config (
message.max.bytes
andreplica.fetch.max.bytes
) - Update your producer config (
max.request.size
,buffer.memory
)
It works after finish the first step to update the transactions topic config.
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics \
--entity-name transactions \
--alter \
--add-config max.message.bytes=94371840
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name transactions --describe
Dynamic configs for topic transactions are:
cleanup.policy=delete sensitive=false synonyms={DYNAMIC_TOPIC_CONFIG:cleanup.policy=delete, DEFAULT_CONFIG:log.cleanup.policy=delete}
max.message.bytes=94371840 sensitive=false synonyms={DYNAMIC_TOPIC_CONFIG:max.message.bytes=94371840, DEFAULT_CONFIG:message.max.bytes=1048588}
retention.bytes=1073741824 sensitive=false synonyms={DYNAMIC_TOPIC_CONFIG:retention.bytes=1073741824, DEFAULT_CONFIG:log.retention.bytes=-1}
retention.ms=86400000 sensitive=false synonyms={DYNAMIC_TOPIC_CONFIG:retention.ms=86400000}
4.Transactions Missing Reverseβ
To find transactions listed in transactions_fat that are not present in blocks_fat.transactions, partition-by-partition (month-by-month)
WITH
transactions_fat_partitioned AS
(
SELECT
hash,
block_hash
FROM transactions_fat
WHERE toYYYYMM(block_timestamp_month) = 201304
),
blocks_fat_partitioned AS
(
SELECT
hash AS block_hash,
arrayJoin(transactions) AS tx_hash
FROM blocks_fat
WHERE toYYYYMM(timestamp_month) = 201304
)
SELECT
t.hash AS missing_transaction_hash,
t.block_hash AS expected_block_hash,
b.tx_hash
FROM transactions_fat_partitioned AS t
LEFT JOIN blocks_fat_partitioned AS b ON (t.hash = b.tx_hash) AND (t.block_hash = b.block_hash)
WHERE b.tx_hash != t.hash
4. duplicate transactionsβ
SELECT hash, count() AS cnt
FROM transactions_fat
GROUP BY hash
HAVING cnt > 1
Again, this should return no rows.
5. Python to Loop Partition by Partitionβ
Here is a Python script that:
- Connects to ClickHouse via
clickhouse-connect
. - Iterates over partition months (you define the range).
- Runs a query per partition to find transactions listed in
blocks_fat
but missing fromtransactions_fat
.
from clickhouse_connect import get_client
from datetime import datetime
from dateutil.relativedelta import relativedelta
# === CONFIGURATION ===
CLICKHOUSE_HOST = 'localhost'
CLICKHOUSE_PORT = 8123
CLICKHOUSE_USER = 'default'
CLICKHOUSE_PASSWORD = 'password'
DATABASE = 'bitcoin'
START_MONTH = '2010-01' # yyyy-mm
END_MONTH = '2013-11'
# === INIT CLIENT ===
client = get_client(
host=CLICKHOUSE_HOST,
port=CLICKHOUSE_PORT,
username=CLICKHOUSE_USER,
password=CLICKHOUSE_PASSWORD,
database=DATABASE,
)
# === GENERATE PARTITIONS ===
def generate_partitions(start, end):
partitions = []
start_date = datetime.strptime(start, '%Y-%m')
end_date = datetime.strptime(end, '%Y-%m')
current = start_date
while current <= end_date:
partitions.append(current.strftime('%Y%m'))
current += relativedelta(months=1)
return partitions
# === MAIN LOOP ===
partitions = generate_partitions(START_MONTH, END_MONTH)
for partition in partitions:
print(f'\nπ§ͺ Querying Partition: {partition}')
query = f"""
WITH flattened AS (
SELECT
hash AS block_hash,
tx_hash
FROM bitcoin.blocks_fat
ARRAY JOIN transactions AS tx_hash
WHERE toYYYYMM(timestamp_month) = {partition}
)
SELECT
flattened.block_hash,
flattened.tx_hash
FROM flattened
LEFT JOIN (
SELECT hash
FROM bitcoin.transactions_fat
WHERE toYYYYMM(block_timestamp_month) = {partition}
) AS txs
ON flattened.tx_hash = txs.hash
WHERE txs.hash != flattened.tx_hash
"""
result = client.query(query)
rows = result.result_rows
if rows:
print(f"β οΈ Missing transactions in partition {partition}: {len(rows)}")
for row in rows:
print(f" Block: {row[0]}, Missing TX: {row[1]}")
else:
print(f"β
All transactions found in partition {partition}")
pip install clickhouse-connect python-dateutil