v0.11.0 - JSONL/Parquet/ROOT Data Format Conversion Scripts (2025-12-28)¶
What Changed?¶
This release adds three new example scripts for converting OSECHI event data between formats: JSONL → Parquet, Parquet → ROOT, and direct JSONL → ROOT conversion.
The scripts provide flexible data pipeline options for both ROOT physics analysis users and Python data science users (Pandas/Polars).
All scripts follow the PEP 723 inline script metadata specification, so each one runs standalone with uv run.
What's New¶
Feature 1: JSONL → Parquet Conversion¶
What it does: Converts multiple JSONL event files from the OSECHI detector into a single Parquet file for efficient analysis with Pandas, Polars, or other data science tools.
How to use it:
uv run from_jsonl_to_parquet.py 20251221_run126/
Key features:
- Glob pattern support for flexible file selection (default: events*.jsonl)
- Automatic timestamp conversion (adds *_s fields from *_us fields)
- Configurable compression: snappy (default), gzip, zstd, or none
- Progress tracking with --verbose mode
- Schema validation for required fields
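The file selection and timestamp steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the script's actual code: the helper names load_events and add_second_fields are hypothetical, and it assumes each *_us field holds a numeric count of microseconds that is converted to a float number of seconds.

```python
import json
from pathlib import Path


def load_events(run_dir, pattern="events*.jsonl"):
    """Read every JSONL file in run_dir that matches pattern into a list of dicts."""
    events = []
    for path in sorted(Path(run_dir).glob(pattern)):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    events.append(json.loads(line))
    return events


def add_second_fields(event):
    """Return a copy of event with a *_s field added for each numeric *_us field."""
    converted = dict(event)
    for key, value in event.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if key.endswith("_us") and is_number:
            # Assumption: *_us is in microseconds, so divide by 1e6 for seconds.
            converted[key[:-3] + "_s"] = value / 1_000_000
    return converted
```

The original microsecond fields are kept untouched, which matches the release's description of *_s fields being added alongside the *_us originals.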
Feature 2: Parquet → ROOT Conversion¶
What it does: Converts a Parquet file into ROOT TTree format for analysis with the ROOT framework or uproot.
How to use it:
uv run from_parquet_to_root.py 20251221_run126/
Key features:
- Flat TTree structure (one branch per column)
- Optional alphabetical branch sorting
- Compatible with ROOT analysis workflows
- Automatic type mapping (int64 → Long64_t, float64 → Double_t)
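To make the flat layout and type mapping concrete, the sketch below plans one branch per column. It is illustrative only: plan_branches and ROOT_TYPES are hypothetical names, only the two mappings listed above are included, and the script presumably leaves the real mapping to the library that writes the TTree.

```python
# Column dtype name -> ROOT branch type, limited to the two mappings listed above.
ROOT_TYPES = {"int64": "Long64_t", "float64": "Double_t"}


def plan_branches(column_dtypes, sort_branches=False):
    """Return (branch_name, root_type) pairs, one per column.

    column_dtypes maps column name -> dtype name, in file order.
    With sort_branches=True the branches are ordered alphabetically.
    """
    names = sorted(column_dtypes) if sort_branches else list(column_dtypes)
    plan = []
    for name in names:
        dtype = column_dtypes[name]
        if dtype not in ROOT_TYPES:
            raise ValueError(f"no ROOT type known for column {name!r} ({dtype})")
        plan.append((name, ROOT_TYPES[dtype]))
    return plan
```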
Feature 3: Direct JSONL → ROOT Conversion¶
What it does: Converts JSONL event files directly to ROOT TTree format in a single step, combining the two conversions above.
How to use it:
uv run from_jsonl_to_root.py 20251221_run126/
Key features:
- One-command workflow for quick ROOT analysis
- All timestamp and sorting options available
- Performance: ~1-2 seconds for typical detector runs
Installation¶
Quick Start¶
# Get the release
git checkout v0.11.0
# Setup
task env:setup
# Try one of the conversion scripts
uv run examples/from_jsonl_to_parquet.py path/to/events/
# Or run the CLI as usual
uv run kazunoko --help
Script Requirements¶
All three scripts require Python 3.10+ and use PEP 723 for dependency management. Run with uv run:
# JSONL → Parquet (requires: pandas, pyarrow, typer, rich)
uv run examples/from_jsonl_to_parquet.py [OPTIONS]
# Parquet → ROOT (requires: pandas, pyarrow, uproot, typer, rich)
uv run examples/from_parquet_to_root.py [OPTIONS]
# JSONL → ROOT (requires: pandas, uproot, typer, rich)
uv run examples/from_jsonl_to_root.py [OPTIONS]
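For readers new to PEP 723, the dependencies live in a comment block at the top of the script, which uv reads before running it. The header below is a sketch built from the requirements listed above for the JSONL → Parquet script; the real header in from_jsonl_to_parquet.py may pin versions or differ in detail, and the body is only a placeholder.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "pandas",
#     "pyarrow",
#     "typer",
#     "rich",
# ]
# ///
"""Placeholder body: the real script defines its command-line interface here."""


def main():
    # Hypothetical stand-in for the conversion entry point.
    return "dependencies are resolved by uv before this code runs"


if __name__ == "__main__":
    print(main())
```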
What's Different from the Last Version?¶
✅ Added¶
- Three new data format conversion scripts for flexible event analysis workflows:
  - from_jsonl_to_parquet.py: Convert JSONL events to Parquet format
  - from_parquet_to_root.py: Convert Parquet files to ROOT TTree format
  - from_jsonl_to_root.py: Direct JSONL to ROOT conversion in one step
- Comprehensive documentation in examples/README.md with usage patterns and performance metrics
- PEP 723 compliance for all example scripts: run standalone with uv run, without external dependency files
- Timestamp conversion features: Automatic addition of UNIX second timestamps (*_s) alongside microsecond originals (*_us)
- Schema validation: Required field checking (type, received_us, hit_type) across all scripts
- Progress tracking: Rich console output with progress bars in verbose mode
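The required-field check can be pictured as follows. This is a sketch under assumptions: missing_fields is a hypothetical helper, and how the scripts react to an invalid event (abort, skip, or warn) is not stated in these notes.

```python
# Required fields as named in this release's schema validation entry.
REQUIRED_FIELDS = ("type", "received_us", "hit_type")


def missing_fields(event, required=REQUIRED_FIELDS):
    """Return the required field names that are absent from event, in order."""
    return [name for name in required if name not in event]
```

An empty list means the event passes the check; a non-empty list names exactly what to report in an error message.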
🔧 Changed¶
- Updated examples/README.md with comprehensive "Data Format Conversion Scripts" section
- Enhanced documentation with unified CLI interface across all three scripts
🐛 Fixed¶
- No bug fixes in this release
Is It Safe to Upgrade?¶
Backward Compatible: Yes
- This release only adds new example scripts; no changes to core library functionality
- All existing CLI commands work exactly as before
- New scripts are purely additive - existing workflows are unaffected
- No dependencies added to the main kazunoko library
Tests Passed¶
- ✅ Builds without errors
- ✅ All three conversion scripts tested with 1,806 JSONL files (52,432 events, ~31 MB)
- ✅ JSONL → Parquet conversion: successful with configurable compression
- ✅ Parquet → ROOT conversion: TTree creation with proper branch mapping
- ✅ Direct JSONL → ROOT conversion: complete pipeline verified
- ✅ Timestamp conversion: both *_us and *_s fields present and correct
- ✅ Schema validation: required field detection working across all scripts
- ✅ Error handling: proper messages for missing files and invalid schemas
Release Details¶
- Date: 2025-12-28
- Version: v0.11.0
- Files Changed: 5 (3 scripts + examples/README.md + docs/releases/v0.11.0.md)
- Commits: 952c736, d7c1631, 0dd5c6f, 98c6728, 726a11f, 6ffcc3a
Commit Summary:
- 952c736 - Bump version 0.10.3 → 0.11.0
- d7c1631 - Document conversion scripts in examples/README.md
- 0dd5c6f - Refactor scripts with unified interface
- 98c6728 - Add direct JSONL to ROOT conversion script
- 726a11f - Add Parquet to ROOT conversion script
- 6ffcc3a - Add JSONL to Parquet conversion script
Next Steps¶
Future enhancements planned:
- HDF5 export format support for advanced analysis
- Batch processing mode for multi-run conversions
- Progress resumption for interrupted large conversions
- Integration with ROOT's PyROOT once Python environment constraints relax