T Programming Language - performance_analysis

Performance Analysis

Generated by scripts/profile_performance.sh; run the script to populate the tables below with actual measurements.


Test Environment


Timing Results

10k Rows (100 groups)

Operation               Time (s)
----------------------  --------
Project 2 columns
Filter rows
Sum column
Group-by
Group aggregate (mean)

100k Rows (1000 groups)

Operation               Time (s)
----------------------  --------
Project 2 columns
Sum column
Group-by
Group aggregate (sum)
sqrt (vectorized)
abs (vectorized)
compare scalar

1M Rows (10000 groups)

Operation               Time (s)
----------------------  --------
Project 2 columns
Sum column
Mean column
Group-by
Group aggregate (mean)

Analysis

Scaling Behavior

Operations should scale approximately linearly with row count:

- 10x rows → ~10x time for columnar operations
- Group-by scaling depends on group cardinality
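As an illustrative sketch of how linear scaling can be checked (pure Python standing in for the T runtime; the operation and row counts here are assumptions, not the actual benchmark), time the same columnar operation at a 10x row-count step and compare:

```python
import time

def time_sum(n):
    """Time summing a column of n float rows (a stand-in for a columnar op).

    Returns (total, elapsed_seconds)."""
    col = [float(i) for i in range(n)]
    start = time.perf_counter()
    total = sum(col)
    return total, time.perf_counter() - start

# A 10x row increase should cost roughly 10x the time if scaling is linear.
_, t_small = time_sum(100_000)
_, t_large = time_sum(1_000_000)
ratio = t_large / t_small  # near 10 indicates linear scaling
```

In practice each measurement should be repeated and the minimum (or median) taken, since a single wall-clock sample is noisy.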

Hot Paths

The most time-critical operations for large datasets are:

1. Group-by + aggregation: dominates pipeline execution time for grouped summarizations
2. Filter: boolean mask construction + row extraction
3. CSV reading: I/O bound for large files; the Arrow native reader provides a significant speedup
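The first two hot paths can be sketched as follows. This is a hedged, pure-Python illustration of the shape of the work (a dict-of-lists stands in for the columnar table; it is not the actual Arrow-backed implementation):

```python
from collections import defaultdict

# Illustrative columnar table: column name -> list of values.
table = {
    "group": ["a", "b", "a", "b", "a"],
    "value": [10.0, 25.0, 5.0, 40.0, 15.0],
}

# Hot path 2: filter = boolean mask construction + row extraction.
mask = [v > 12.0 for v in table["value"]]
filtered = {
    name: [x for x, keep in zip(col, mask) if keep]
    for name, col in table.items()
}

# Hot path 1: group-by + aggregation (mean per group) in a single pass.
sums = defaultdict(float)
counts = defaultdict(int)
for key, v in zip(table["group"], table["value"]):
    sums[key] += v
    counts[key] += 1
means = {key: sums[key] / counts[key] for key in sums}
# means == {"a": 10.0, "b": 32.5}
```

The single-pass hash aggregation above is why group cardinality, not just row count, drives group-by cost: each distinct key adds hash-table state.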

Optimization Opportunities


Targets

See docs/performance.md for detailed performance expectations and the Arrow backend architecture overview.