v0.11.1 - Example Scripts Enhancement & Documentation (2025-12-28)

What Changed?

This patch release enhances the example scripts with improved flexibility and usability. The create_run_summary.py script now supports flexible file pattern matching and overwrite protection. Documentation has been expanded with comprehensive compression format guidance to help users choose the right options for their workflows. All improvements are backward compatible.


What's New

Enhanced create_run_summary.py Script

What it does: The create_run_summary.py script now supports flexible glob pattern matching for event files and overwrite protection for output files, making it more practical for batch processing and preventing accidental overwrites.

How to use it:

# Create summary from directory (default pattern: events*.jsonl)
uv run create_run_summary.py ./20251221_run126

# Use custom glob pattern
uv run create_run_summary.py ./20251221_run126 --pattern "*.jsonl"

# Overwrite existing summary
uv run create_run_summary.py ./20251221_run126 --overwrite

# Show detailed progress
uv run create_run_summary.py ./20251221_run126 --verbose

Key features:

  • --pattern TEXT: Flexible glob pattern matching (default: events*.jsonl)
  • --overwrite: Replace an existing summary file; without this flag the script stops instead of overwriting it
  • --verbose: Detailed progress and statistics
  • Backward compatible with existing workflows
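
The two behaviors above can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual implementation: the helper names find_event_files and check_output are hypothetical, and only the default pattern events*.jsonl and the fail-without-overwrite behavior come from these notes.

```python
from pathlib import Path


def find_event_files(run_dir: Path, pattern: str = "events*.jsonl") -> list[Path]:
    """Return the files in run_dir that match the glob pattern, sorted by name.

    Hypothetical helper: mirrors what --pattern is described as doing.
    """
    return sorted(p for p in run_dir.glob(pattern) if p.is_file())


def check_output(path: Path, overwrite: bool = False) -> Path:
    """Refuse to reuse an existing output path unless overwrite is set.

    Hypothetical helper: mirrors the described --overwrite protection.
    """
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} already exists; pass --overwrite to replace it")
    return path
```

Checking the output path before reading any event files means a run fails fast when the summary already exists, rather than after processing the whole directory.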

Compression Format Documentation

What it does: Comprehensive guidance on selecting compression formats for the Parquet conversion script, helping users understand trade-offs between speed, compression ratio, and compatibility.

Comparison Table:

Format | Speed              | Compression Ratio | Compatibility | Use Case
snappy | Fastest (250 MB/s) | Medium (3-6x)     | Universal     | Recommended (default)
gzip   | Medium             | High (5-10x)      | Universal     | Maximum compatibility needed
zstd   | Fast               | Very High (7-15x) | Modern tools  | Highest compression priority
none   | Instant            | 1x                | N/A           | Testing/debugging only

Decision Guide:

  • Use snappy for OSECHI data (recommended): Fast processing, good compression, universally supported
  • Use gzip if you need maximum compatibility with older systems
  • Use zstd if you need the highest compression ratio
  • Use none only for testing or when raw performance is critical
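
The decision guide can also be written as a small helper, shown here as a sketch. The function name choose_compression and its keyword arguments are hypothetical, and the order in which the conditions are checked is an assumption; the guide itself does not rank the cases against each other.

```python
def choose_compression(
    *,
    testing: bool = False,
    legacy_tools: bool = False,
    max_ratio: bool = False,
) -> str:
    """Pick a compression format name following the decision guide.

    Assumed precedence (not stated in the guide): testing first, then
    compatibility with older systems, then compression ratio.
    """
    if testing:
        return "none"    # testing/debugging only
    if legacy_tools:
        return "gzip"    # maximum compatibility with older systems
    if max_ratio:
        return "zstd"    # highest compression ratio
    return "snappy"      # recommended default


print(choose_compression())                # default case
print(choose_compression(max_ratio=True))  # prioritize file size
```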

Installation

Quick Start

# Get the release
git checkout v0.11.1

# Setup
task env:setup

# Try the enhanced script
uv run examples/create_run_summary.py ./path/to/events/

# Or run the CLI as usual
uv run kazunoko --help

What's Different from the Last Version?

✅ Added

  • Flexible glob pattern matching in create_run_summary.py via --pattern option (default: events*.jsonl)
  • Overwrite protection in create_run_summary.py via --overwrite flag to prevent accidental file replacement
  • Compression format documentation in examples/README.md with detailed comparison table
  • Decision guide for selecting compression formats based on performance and compatibility requirements

🔧 Changed

  • Enhanced create_run_summary.py to match unified interface pattern of other example scripts
  • Improved examples/README.md with comprehensive compression format guidance

🐛 Fixed

  • No bug fixes in this release (patch release focused on documentation and usability)

Is It Safe to Upgrade?

Backward Compatible: Yes

  • Existing create_run_summary.py invocations need no new arguments
  • New --pattern option defaults to events*.jsonl (original behavior)
  • New --overwrite option defaults to false, which is the one behavioral difference: the script now stops if the summary file already exists instead of replacing it (safer than the original behavior)
  • No changes to core library functionality
  • Documentation improvements have no impact on code behavior
  • Existing example scripts continue to work exactly as before

Tests Passed

  • ✅ Builds without errors
  • ✅ Enhanced create_run_summary.py tested with 1,806 JSONL files (52,432 events)
  • ✅ Pattern matching verified with various glob patterns (events*.jsonl, *.jsonl, etc.)
  • ✅ Overwrite protection works correctly (fails without --overwrite flag)
  • ✅ Verbose mode displays detailed progress and statistics
  • ✅ Compression format documentation reviewed and verified for accuracy

Release Details

  • Date: 2025-12-28
  • Version: v0.11.1
  • Files Changed: 3 (create_run_summary.py, examples/README.md, docs/releases/v0.11.1.md)
  • Commits: e0fa988, eba1d42, c3816c4, f7afa38

Commit Summary:

  • e0fa988 - Bump version 0.11.0 → 0.11.1
  • eba1d42 - Add compression formats comparison table to examples/README.md
  • c3816c4 - Update create_run_summary.py documentation with new options
  • f7afa38 - Enhance create_run_summary.py with flexible pattern and overwrite options

Next Steps

Future enhancements planned:

  • Data Format Export: Support for additional export formats (HDF5, NetCDF) for advanced data analysis
  • Batch Processing: Multi-run conversion capabilities for processing multiple measurement directories in one command
  • Progress Resumption: Ability to resume interrupted conversions for very large datasets
  • Interactive Configuration: GUI or terminal-based configuration wizard for compression and output format selection