Generated by `scripts/profile_performance.sh`; run the script to populate with actual measurements.

| Operation | Time (s) |
|---|---|
| Project 2 columns | — |
| Filter rows | — |
| Sum column | — |
| Group-by | — |
| Group aggregate (mean) | — |

| Operation | Time (s) |
|---|---|
| Project 2 columns | — |
| Sum column | — |
| Group-by | — |
| Group aggregate (sum) | — |
| sqrt (vectorized) | — |
| abs (vectorized) | — |
| compare scalar | — |

| Operation | Time (s) |
|---|---|
| Project 2 columns | — |
| Sum column | — |
| Mean column | — |
| Group-by | — |
| Group aggregate (mean) | — |
Operations should scale approximately linearly with row count:

- 10x rows → ~10x time for columnar operations
- Group-by scaling depends on group cardinality
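The linear-scaling expectation can be sanity-checked outside the profiling script with a minimal stdlib-only sketch; a plain Python sum stands in for the columnar kernels here, and the `time_sum` helper is made up for illustration (`scripts/profile_performance.sh` does the real runs):

```python
# Sketch: time the same operation at 1x and 10x the row count and compare.
# A plain Python sum stands in for the columnar kernels.
import time

def time_sum(n_rows: int) -> float:
    """Wall-clock time to sum a column of n_rows floats (build excluded)."""
    column = [float(i) for i in range(n_rows)]
    start = time.perf_counter()
    sum(column)
    return time.perf_counter() - start

base = time_sum(100_000)
scaled = time_sum(1_000_000)  # 10x rows
print(f"10x rows -> {scaled / base:.1f}x time (expect roughly 10x)")
```

Group-by is the exception to this check: its cost depends on group cardinality as well as row count, so the ratio there is not expected to be a clean 10x.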
The most time-critical operations for large datasets are:

1. Group-by + aggregation: dominates pipeline execution time for grouped summarizations
2. Filter: boolean mask construction + row extraction
3. CSV reading: I/O bound for large files; the Arrow native reader provides a significant speedup
After each `mutate()`, the native Arrow handle is dropped; lazy evaluation could defer materialization. See docs/performance.md for detailed performance expectations and an overview of the Arrow backend architecture.
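To illustrate the deferral idea only: instead of materializing after every `mutate()`, a lazy wrapper could record each transform and apply them all at a single `collect()` point. The `LazyFrame` class and its methods below are a made-up sketch, not this library's API:

```python
# Illustrative sketch of deferred materialization: mutate() records an
# operation; collect() is the single point where results are computed.
from dataclasses import dataclass, field

@dataclass
class LazyFrame:
    data: dict                               # column name -> list of values
    ops: list = field(default_factory=list)  # deferred (name, fn) transforms

    def mutate(self, name, fn):
        # Record the transform instead of materializing a new table now.
        return LazyFrame(self.data, self.ops + [(name, fn)])

    def collect(self):
        # Single materialization point: apply all deferred ops in one pass.
        out = dict(self.data)
        for name, fn in self.ops:
            out[name] = fn(out)
        return out

lf = (LazyFrame({"x": [1, 2, 3]})
      .mutate("y", lambda t: [v * 2 for v in t["x"]])
      .mutate("z", lambda t: [v + 1 for v in t["y"]]))
result = lf.collect()
print(result)  # {'x': [1, 2, 3], 'y': [2, 4, 6], 'z': [3, 5, 7]}
```

In a real Arrow-backed implementation, the payoff would be keeping one native handle alive across the whole chain and fusing the recorded operations before executing them.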