from_jsonl_to_parquet.py
Purpose: Convert JSONL event files to Parquet format
Use case: Preparing data for analysis with pandas, polars, or other data science tools
Features:
- Merges multiple JSONL files into a single Parquet output
- Adds UNIX timestamp columns in seconds (*_s) alongside microseconds (*_us)
- Adds bme280_valid column flagging rows within BME280 hardware spec
- Configurable compression (snappy, gzip, zstd, none)
- Preserves all dynamic fields from event data
- Output filename is fixed to run.parquet in the input directory
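
The merge and timestamp features above can be sketched with the standard library alone. This is a minimal illustration, not the script's actual code: the helper names `load_events` and `add_second_columns` are hypothetical, and it assumes each `*_s` value is the matching `*_us` value divided by 1,000,000 and stored as a float. Writing the merged rows to Parquet is left out because it needs a third-party library.

```python
import json
from pathlib import Path


def load_events(directory, pattern="events*.jsonl"):
    """Merge every JSONL file matching `pattern` into one list of dicts.

    Hypothetical helper: files are read in sorted name order.
    """
    rows = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines
                    rows.append(json.loads(line))
    return rows


def add_second_columns(row):
    """Return a copy of `row` with a *_s field for every numeric *_us field.

    Assumption: seconds = microseconds / 1_000_000, kept as a float.
    """
    out = dict(row)
    for key, value in row.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if key.endswith("_us") and is_number:
            out[key[:-3] + "_s"] = value / 1_000_000
    return out
```

Fields that do not end in `_us` pass through unchanged, which mirrors the "preserves all dynamic fields" behavior listed above.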
Required input: JSONL event files produced by get_events.py or get_runs.py
Usage:
# Default pattern (events*.jsonl)
uv run examples/from_jsonl_to_parquet.py 20251221_run126/
# Custom pattern
uv run examples/from_jsonl_to_parquet.py 20251221_run126/ --pattern "*.jsonl"
# Higher compression
uv run examples/from_jsonl_to_parquet.py 20251221_run126/ --compression gzip
# With verbose output
uv run examples/from_jsonl_to_parquet.py 20251221_run126/ --verbose
# Overwrite existing output
uv run examples/from_jsonl_to_parquet.py 20251221_run126/ --overwrite
CLI Options:
| Option | Default | Description |
|---|---|---|
| READ_FROM | (required) | Directory containing JSONL files |
| --pattern | events*.jsonl | Glob pattern to filter input files |
| --compression | snappy | Compression codec (snappy, gzip, zstd, none) |
| --add-timestamps / --no-add-timestamps | on | Add *_s columns from *_us fields |
| --overwrite | off | Overwrite existing output file |
| --verbose / --quiet | --quiet | Show or suppress status messages |
| --log-level | error | Log level (debug/info/error) |
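
To see how the `--pattern` option narrows or widens the input set, the sketch below applies shell-style matching to a list of file names. It assumes the script uses ordinary glob semantics; the file names and the `select` helper are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing, for illustration only.
names = ["events_000.jsonl", "events_001.jsonl", "runs.jsonl", "run.parquet"]


def select(names, pattern):
    """Return the names matching a shell-style glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


default_match = select(names, "events*.jsonl")  # the default pattern
custom_match = select(names, "*.jsonl")         # the wider custom pattern
```

With the default pattern only the `events*` files are picked up, while `*.jsonl` would also include `runs.jsonl`.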
Output file: READ_FROM/run.parquet
- All original fields preserved (31+ columns)
- Additional *_s timestamp columns (received_s, sent_s, detected_s, gnss_time_s)
- Additional bme280_valid column (bool)
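
As a rough illustration of what a `bme280_valid` flag could check, the sketch below tests a row against the BME280 operating ranges (temperature -40 to 85 °C, pressure 300 to 1100 hPa, relative humidity 0 to 100 %). The field names `temperature_c`, `pressure_hpa`, and `humidity_pct` are assumptions, as is the rule that a missing or non-numeric reading makes the row invalid; the script's real field names and thresholds may differ.

```python
# BME280 operating ranges; the dictionary keys are hypothetical field names.
BME280_LIMITS = {
    "temperature_c": (-40.0, 85.0),   # degrees Celsius
    "pressure_hpa": (300.0, 1100.0),  # hectopascals
    "humidity_pct": (0.0, 100.0),     # percent relative humidity
}


def bme280_valid(row):
    """Return True only if every BME280 reading is present and in range."""
    for field, (low, high) in BME280_LIMITS.items():
        value = row.get(field)
        # Assumption: missing or non-numeric readings count as invalid.
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            return False
        if not low <= value <= high:  # also rejects NaN
            return False
    return True
```

A boolean column like this makes it easy to filter out-of-spec rows downstream without dropping them from the file.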