v0.11.1 - Example Scripts Enhancement & Documentation (2025-12-28)
What Changed?
This patch release enhances the example scripts with improved flexibility and usability.
The create_run_summary.py script now supports flexible file pattern matching and overwrite protection.
Documentation has been expanded with comprehensive compression format guidance to help users choose the right options for their workflows.
All improvements are backward compatible.
What's New
Enhanced create_run_summary.py Script
What it does:
The create_run_summary.py script now supports flexible glob pattern matching for event files and overwrite protection for output files, making it more practical for batch processing and preventing accidental overwrites.
How to use it:
```bash
# Create summary from directory (default pattern: events*.jsonl)
uv run create_run_summary.py ./20251221_run126

# Use custom glob pattern
uv run create_run_summary.py ./20251221_run126 --pattern "*.jsonl"

# Overwrite existing summary
uv run create_run_summary.py ./20251221_run126 --overwrite

# Show detailed progress
uv run create_run_summary.py ./20251221_run126 --verbose
```
Key features:
- `--pattern TEXT`: Glob pattern used to select event files (default: `events*.jsonl`)
- `--overwrite`: Replace an existing summary file; without this flag the script refuses to overwrite it
- `--verbose`: Detailed progress and statistics
- Backward compatible with existing workflows
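The two behaviors described above (glob-based file selection and refusing to overwrite by default) can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation: the function names `find_event_files` and `check_output` and the output file name `summary.json` are hypothetical, chosen only for the example.

```python
import tempfile
from pathlib import Path


def find_event_files(run_dir: Path, pattern: str = "events*.jsonl") -> list[Path]:
    """Return the files in run_dir that match the glob pattern, sorted by name."""
    return sorted(p for p in run_dir.glob(pattern) if p.is_file())


def check_output(path: Path, overwrite: bool = False) -> Path:
    """Refuse to reuse an existing output path unless overwrite is requested."""
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} already exists; pass --overwrite to replace it")
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory; all file names here are illustrative.
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp)
        for name in ("events_001.jsonl", "events_002.jsonl", "notes.txt"):
            (run_dir / name).write_text("{}\n")
        print([p.name for p in find_event_files(run_dir)])

        summary = run_dir / "summary.json"  # hypothetical output name
        summary.write_text("{}")
        try:
            check_output(summary)
        except FileExistsError as err:
            print(f"refused: {err}")
        print(check_output(summary, overwrite=True).name)
```

Keeping the overwrite check separate from the file search means the guard can run before any event files are read, so a batch job fails fast instead of after processing a large run.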
Compression Format Documentation
What it does: Comprehensive guidance on selecting compression formats for the Parquet conversion script, helping users understand trade-offs between speed, compression ratio, and compatibility.
Comparison Table:
| Format | Speed | Compression Ratio | Compatibility | Use Case |
|---|---|---|---|---|
| snappy ⭐ | Fastest (250 MB/s) | Medium (3-6x) | Universal | Recommended (default) |
| gzip | Medium | High (5-10x) | Universal | Maximum compatibility needed |
| zstd | Fast | Very High (7-15x) | Modern tools | Highest compression priority |
| none | Instant | 1x | N/A | Testing/debugging only |
Decision Guide:
- Use snappy for OSECHI data (recommended): Fast processing, good compression, universally supported
- Use gzip if you need maximum compatibility with older systems
- Use zstd if you need the highest compression ratio
- Use none only for testing or when raw performance is critical
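The decision guide above can be expressed as a small lookup, which may be handy when a wrapper script has to pick a format automatically. This is only a sketch: the function name `choose_compression` and the priority labels (`"default"`, `"compatibility"`, `"ratio"`, `"debug"`) are hypothetical and not part of the project; only the returned format names come from the table.

```python
def choose_compression(priority: str = "default") -> str:
    """Map a workflow priority to a compression format name from the table."""
    # Priority labels are illustrative; format names match the comparison table.
    choices = {
        "default": "snappy",       # fast, good compression, universally supported
        "compatibility": "gzip",   # maximum compatibility with older systems
        "ratio": "zstd",           # highest compression ratio
        "debug": "none",           # testing/debugging only
    }
    try:
        return choices[priority]
    except KeyError:
        valid = ", ".join(sorted(choices))
        raise ValueError(f"unknown priority {priority!r}; expected one of: {valid}") from None


if __name__ == "__main__":
    for label in ("default", "compatibility", "ratio", "debug"):
        print(f"{label}: {choose_compression(label)}")
```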
Installation
Quick Start
```bash
# Get the release
git checkout v0.11.1

# Setup
task env:setup

# Try the enhanced script
uv run examples/create_run_summary.py ./path/to/events/

# Or run the CLI as usual
uv run kazunoko --help
```
What's Different from the Last Version?
✅ Added
- Flexible glob pattern matching in `create_run_summary.py` via `--pattern` option (default: `events*.jsonl`)
- Overwrite protection in `create_run_summary.py` via `--overwrite` flag to prevent accidental file replacement
- Compression format documentation in examples/README.md with detailed comparison table
- Decision guide for selecting compression formats based on performance and compatibility requirements
🔧 Changed
- Enhanced `create_run_summary.py` to match the unified interface pattern of the other example scripts
- Improved examples/README.md with comprehensive compression format guidance
🐛 Fixed
- No bug fixes in this release (patch release focused on documentation and usability)
Is It Safe to Upgrade?
Backward Compatible: Yes
- All changes are purely additive: existing `create_run_summary.py` invocations work unchanged
- New `--pattern` option defaults to `events*.jsonl` (original behavior)
- New `--overwrite` option defaults to false (safer than original behavior)
- No changes to core library functionality
- Documentation improvements have no impact on code behavior
- Existing example scripts continue to work exactly as before
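To illustrate why existing invocations keep working, the sketch below parses a command line that uses none of the new options and shows the defaults it falls back to. It assumes `argparse` purely for illustration; the real script may be built on a different CLI library, and the `build_parser` helper and the `run_dir` argument name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="create_run_summary.py")
    parser.add_argument("run_dir", help="directory containing event files")
    parser.add_argument("--pattern", default="events*.jsonl",
                        help="glob pattern for event files")
    parser.add_argument("--overwrite", action="store_true",
                        help="replace an existing summary file")
    parser.add_argument("--verbose", action="store_true",
                        help="show detailed progress and statistics")
    return parser


if __name__ == "__main__":
    # An invocation written before v0.11.1: no new options supplied.
    args = build_parser().parse_args(["./20251221_run126"])
    print(args.pattern, args.overwrite, args.verbose)
```

Because each new option has a default, a command line that omits them parses to the documented default pattern with overwriting and verbose output switched off.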
Tests Passed
- ✅ Builds without errors
- ✅ Enhanced `create_run_summary.py` tested with 1,806 JSONL files (52,432 events)
- ✅ Pattern matching verified with various glob patterns (`events*.jsonl`, `*.jsonl`, etc.)
- ✅ Overwrite protection works correctly (fails without --overwrite flag)
- ✅ Verbose mode displays detailed progress and statistics
- ✅ Compression format documentation reviewed and verified for accuracy
Release Details
- Date: 2025-12-28
- Version: v0.11.1
- Files Changed: 3 (create_run_summary.py, examples/README.md, docs/releases/v0.11.1.md)
- Commits: e0fa988, eba1d42, c3816c4, f7afa38
Commit Summary:
- `e0fa988` - Bump version 0.11.0 → 0.11.1
- `eba1d42` - Add compression formats comparison table to examples/README.md
- `c3816c4` - Update create_run_summary.py documentation with new options
- `f7afa38` - Enhance create_run_summary.py with flexible pattern and overwrite options
Next Steps
Future enhancements planned:
- Data Format Export: Support for additional export formats (HDF5, NetCDF) for advanced data analysis
- Batch Processing: Multi-run conversion capabilities for processing multiple measurement directories in one command
- Progress Resumption: Ability to resume interrupted conversions for very large datasets
- Interactive Configuration: GUI or terminal-based configuration wizard for compression and output format selection