v0.11.0 - JSONL/Parquet/ROOT Data Format Conversion Scripts (2025-12-28)

What Changed?

This release adds three example scripts for converting OSECHI event data between formats: JSONL → Parquet, Parquet → ROOT, and direct JSONL → ROOT. Together they give both ROOT physics analysis users and Python data science users (Pandas/Polars) a flexible data pipeline. All scripts declare their dependencies inline following the PEP 723 specification, so each one runs standalone with uv run.


What's New

Feature 1: JSONL → Parquet Conversion

What it does: Converts multiple JSONL event files from the OSECHI detector into a single Parquet file for efficient analysis with Pandas, Polars, or other data science tools.

How to use it:

uv run from_jsonl_to_parquet.py 20251221_run126/

Key features:

  • Glob pattern support for flexible file selection (default: events*.jsonl)
  • Automatic timestamp conversion (adds *_s fields from *_us fields)
  • Configurable compression: snappy (default), gzip, zstd, or none
  • Progress tracking with --verbose mode
  • Schema validation for required fields
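
The file selection, timestamp conversion, and required-field check listed above can be sketched with the standard library alone. This is an illustration, not the script's actual implementation: the function names are hypothetical, the required fields are the ones named later in these notes (type, received_us, hit_type), and it assumes the *_s value is the *_us value divided by 1,000,000 as a float.

```python
import json
from pathlib import Path

# Required fields as listed in these release notes.
REQUIRED_FIELDS = ("type", "received_us", "hit_type")


def add_second_fields(event: dict) -> dict:
    """Return a copy of the event with a *_s field for every numeric *_us field."""
    converted = dict(event)
    for key, value in event.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if key.endswith("_us") and is_number:
            # Assumption: seconds are stored as a float (microseconds / 1e6).
            converted[key[:-3] + "_s"] = value / 1_000_000
    return converted


def missing_fields(event: dict, required=REQUIRED_FIELDS) -> list:
    """List the required field names that are absent from the event."""
    return [name for name in required if name not in event]


def load_events(run_dir, pattern="events*.jsonl") -> list:
    """Read every matching JSONL file in run_dir into a list of event dicts."""
    events = []
    for path in sorted(Path(run_dir).glob(pattern)):
        with path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # skip blank lines
                event = json.loads(line)
                missing = missing_fields(event)
                if missing:
                    raise ValueError(
                        f"{path.name}:{line_number}: missing required fields {missing}"
                    )
                events.append(add_second_fields(event))
    return events
```

A list built this way could then be handed to pandas and written out with DataFrame.to_parquet, whose compression argument accepts the codecs listed above.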

Feature 2: Parquet → ROOT Conversion

What it does: Converts a Parquet file into ROOT TTree format for analysis with the ROOT framework or uproot.

How to use it:

uv run from_parquet_to_root.py 20251221_run126/

Key features:

  • Flat TTree structure (one branch per column)
  • Optional alphabetical branch sorting
  • Compatible with ROOT analysis workflows
  • Automatic type mapping (int64 → Long64_t, float64 → Double_t)
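
To make the flat layout and type mapping concrete, the sketch below plans one branch per column and optionally sorts the branches alphabetically. It only covers the two mappings stated above; the function name and the choice to reject any other dtype are assumptions for illustration, not a description of how the script behaves.

```python
# Mappings stated in these release notes; other dtypes are deliberately omitted.
ROOT_TYPE_FOR_DTYPE = {
    "int64": "Long64_t",
    "float64": "Double_t",
}


def plan_branches(column_dtypes: dict, sort_branches: bool = False) -> list:
    """Return (branch_name, root_type) pairs, one per column of a flat TTree."""
    names = sorted(column_dtypes) if sort_branches else list(column_dtypes)
    plan = []
    for name in names:
        dtype = column_dtypes[name]
        if dtype not in ROOT_TYPE_FOR_DTYPE:
            # Assumption: unsupported dtypes are reported rather than guessed.
            raise TypeError(f"no ROOT type mapping for column {name!r} ({dtype})")
        plan.append((name, ROOT_TYPE_FOR_DTYPE[dtype]))
    return plan


if __name__ == "__main__":
    columns = {"received_us": "int64", "received_s": "float64"}
    for branch, root_type in plan_branches(columns, sort_branches=True):
        print(f"{branch}: {root_type}")
```

Without sort_branches the plan keeps the column order of the input, which matches the "optional" sorting described in the list above.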

Feature 3: Direct JSONL → ROOT Conversion

What it does: Single-step conversion from JSONL event files directly to ROOT TTree format, combining the benefits of both previous scripts.

How to use it:

uv run from_jsonl_to_root.py 20251221_run126/

Key features:

  • One-command workflow for quick ROOT analysis
  • All timestamp and sorting options available
  • Performance: ~1-2 seconds for typical detector runs

Installation

Quick Start

# Get the release
git checkout v0.11.0

# Setup
task env:setup

# Try one of the conversion scripts
uv run examples/from_jsonl_to_parquet.py path/to/events/

# Or run the CLI as usual
uv run kazunoko --help

Script Requirements

All three scripts require Python 3.10+ and use PEP 723 for dependency management. Run with uv run:

# JSONL → Parquet (requires: pandas, pyarrow, typer, rich)
uv run examples/from_jsonl_to_parquet.py [OPTIONS]

# Parquet → ROOT (requires: pandas, pyarrow, uproot, typer, rich)
uv run examples/from_parquet_to_root.py [OPTIONS]

# JSONL → ROOT (requires: pandas, uproot, typer, rich)
uv run examples/from_jsonl_to_root.py [OPTIONS]
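
For readers unfamiliar with PEP 723, the dependencies live in a comment block at the top of each script, which uv run reads before execution. The header below is a sketch of what such a block could look like for the JSONL → Parquet script, assembled from the requirement list above; the real scripts may pin versions or order the entries differently, and python_is_supported is a hypothetical helper added only to make the example self-checking.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "pandas",
#     "pyarrow",
#     "typer",
#     "rich",
# ]
# ///
"""Illustrative PEP 723 header; not copied from the actual script."""
import sys

# Minimum interpreter version stated in these release notes.
MINIMUM_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Check an interpreter version against the requires-python floor above."""
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if __name__ == "__main__":
    print("supported" if python_is_supported() else "needs Python 3.10+")
```

Because the metadata is plain comments, the same file still runs under a regular interpreter as long as the listed packages are already installed.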

What's Different from the Last Version?

✅ Added

  • Three new data format conversion scripts for flexible event analysis workflows:
      • from_jsonl_to_parquet.py: Convert JSONL events to Parquet format
      • from_parquet_to_root.py: Convert Parquet files to ROOT TTree format
      • from_jsonl_to_root.py: Direct JSONL to ROOT conversion in one step
  • Comprehensive documentation in examples/README.md with usage patterns and performance metrics
  • PEP 723 compliance for all example scripts - run standalone with uv run without external dependency files
  • Timestamp conversion features: Automatic addition of UNIX second timestamps (*_s) alongside microsecond originals (*_us)
  • Schema validation: Required field checking (type, received_us, hit_type) across all scripts
  • Progress tracking: Rich console output with progress bars in verbose mode

🔧 Changed

  • Updated examples/README.md with comprehensive "Data Format Conversion Scripts" section
  • Enhanced documentation to cover the unified CLI shared by all three scripts

🐛 Fixed

  • No bug fixes in this release

Is It Safe to Upgrade?

Backward Compatible: Yes

  • This release only adds new example scripts; no changes to core library functionality
  • All existing CLI commands work exactly as before
  • New scripts are purely additive - existing workflows are unaffected
  • No dependencies added to the main kazunoko library

Tests Passed

  • ✅ Builds without errors
  • ✅ All three conversion scripts tested with 1,806 JSONL files (52,432 events, ~31 MB)
  • ✅ JSONL → Parquet conversion: successful with configurable compression
  • ✅ Parquet → ROOT conversion: TTree creation with proper branch mapping
  • ✅ Direct JSONL → ROOT conversion: complete pipeline verified
  • ✅ Timestamp conversion: both *_us and *_s fields present and correct
  • ✅ Schema validation: required field detection working across all scripts
  • ✅ Error handling: proper messages for missing files and invalid schemas

Release Details

  • Date: 2025-12-28
  • Version: v0.11.0
  • Files Changed: 5 (3 scripts + examples/README.md + docs/releases/v0.11.0.md)
  • Commits: 952c736, d7c1631, 0dd5c6f, 98c6728, 726a11f, 6ffcc3a

Commit Summary:

  • 952c736 - Bump version 0.10.3 → 0.11.0
  • d7c1631 - Document conversion scripts in examples/README.md
  • 0dd5c6f - Refactor scripts with unified interface
  • 98c6728 - Add direct JSONL to ROOT conversion script
  • 726a11f - Add Parquet to ROOT conversion script
  • 6ffcc3a - Add JSONL to Parquet conversion script

Next Steps

Future enhancements planned:

  • HDF5 export format support for advanced analysis
  • Batch processing mode for multi-run conversions
  • Progress resumption for interrupted large conversions
  • Integration with ROOT's PyROOT when Python environment constraints relax