Welcome to the T Orchestration Engine FAQ. This guide covers the philosophy, technical architecture, and practical usage of T.
T isn’t just another data analysis language; it’s a reproducibility-first engine. While R and Python rely on external tools for environment management, T integrates Nix at its core. Every workflow is a statically defined directed acyclic graph (DAG) called a Pipeline, ensuring that your analysis is as stable as the hardware it runs on.
T is currently in Beta (v0.51.0). While it is an experimental project, it is already fully capable of performing end-to-end data processing. You can use T’s native data manipulation verbs and Quarto integration to build reports without ever leaving the language. For more complex statistical modeling or advanced visualization, you can easily pull in R or Python nodes.
T uses Apache Arrow as its core data exchange format. When you pass a DataFrame between a T node and an R (rn()), Python (pyn()), or Shell (shn()) node, T handles the interchange using highly efficient Arrow files.

- Hermeticity: Because T runs every node in a hermetic Nix sandbox, data cannot be shared directly in memory.
- Serialization: DataFrames are serialized to Arrow IPC files on disk. This is still significantly faster and more robust than traditional CSV/JSON interchange.
- Fidelity: All level metadata for factors and nested list-columns is preserved through the serialization process.
- Model Interchange: Machine learning models are passed between languages using PMML.
$ prefix?

T uses Non-Standard Evaluation (NSE) to make data manipulation concise. The $ prefix (e.g., filter($age > 30)) identifies column names or variables in the data context, similar to rlang in R but built directly into the language syntax for clarity.
T takes a strict approach to safety. Unlike other languages where
NA might propagate silently, T requires explicit handling.
- Aggregation functions will throw an error if they
encounter an NA unless you pass na_rm = true.
- Native types like na_int(), na_float(), and
na_string() ensure type-safe missingness.
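
The strict behaviour described above can be illustrated with a small Python analogy. This is not T code and makes no claim about how T implements it: the function `strict_mean` and the use of `None` as the missing marker are hypothetical stand-ins chosen for this sketch, and only the `na_rm` flag name is taken from the text.

```python
from typing import Optional, Sequence


def strict_mean(values: Sequence[Optional[float]], na_rm: bool = False) -> float:
    """Mean that refuses to silently skip or propagate missing values.

    None plays the role of NA here (an assumption made for this sketch).
    """
    if any(v is None for v in values):
        if not na_rm:
            # Fail loudly by default, as the text describes for aggregations.
            raise ValueError("NA encountered; pass na_rm=True to drop missing values")
        values = [v for v in values if v is not None]
    if not values:
        raise ValueError("no non-missing values to aggregate")
    return sum(values) / len(values)


ages = [30.0, None, 50.0]

try:
    strict_mean(ages)
except ValueError as err:
    print(f"error: {err}")

print(strict_mean(ages, na_rm=True))  # prints 40.0
```

The point of the design is that dropping missing values becomes a visible decision at the call site rather than a default.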
No. T is a pure functional language.

- Instead of for or while loops, use map(), filter(), or recursion.
- Variables are immutable. This prevents the “spaghetti state” common in long data scripts.
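
For readers coming from loop-based scripts, the style described above can be sketched in Python. This is only an analogy, not T syntax: the data and the helper `count_down` are hypothetical, and Python tuples stand in for immutable values.

```python
from functools import reduce
from typing import Tuple

ages: Tuple[int, ...] = (25, 31, 47, 18)  # a tuple cannot be modified in place

# filter() and map() build new values instead of updating state in a loop.
over_30 = tuple(filter(lambda age: age > 30, ages))
in_months = tuple(map(lambda age: age * 12, over_30))

# reduce() folds a sequence into one value without a reassigned accumulator.
total = reduce(lambda acc, age: acc + age, ages, 0)


def count_down(n: int) -> Tuple[int, ...]:
    """Recursion in place of a while loop: returns (n, n-1, ..., 1)."""
    if n <= 0:
        return ()
    return (n,) + count_down(n - 1)


print(over_30)        # prints (31, 47)
print(in_months)      # prints (372, 564)
print(total)          # prints 121
print(count_down(3))  # prints (3, 2, 1)
```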
For non-interactive work, T enforces a pipeline block. This ensures that every step of your analysis is declared as a node in a graph. This architecture:

1. Prevents order-of-execution bugs (scripts that only work if run in a specific sequence).
2. Enables automatic parallelization of independent nodes.
3. Allows for advanced graph operations like swap(), rewire(), and upstream_of().
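
To make the graph idea concrete, here is a minimal Python sketch using the standard library's `graphlib`. It is an illustration of the general technique (topological scheduling of a DAG), not T's scheduler: the node names, `execution_batches`, and `upstream` are all hypothetical, and `upstream` is only a loose analogue of what a function like upstream_of() might return.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each node maps to the set of nodes it depends on (hypothetical pipeline).
pipeline: Dict[str, Set[str]] = {
    "raw": set(),
    "clean": {"raw"},
    "model": {"clean"},
    "summary": {"clean"},
    "report": {"model", "summary"},
}


def execution_batches(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into batches; nodes within one batch are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies already done
        batches.append(ready)
        sorter.done(*ready)
    return batches


def upstream(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """All direct and indirect dependencies of a node."""
    found: Set[str] = set()
    stack = list(graph[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(graph[current])
    return found


print(execution_batches(pipeline))
# prints [['raw'], ['clean'], ['model', 'summary'], ['report']]
print(sorted(upstream(pipeline, "report")))
# prints ['clean', 'model', 'raw', 'summary']
```

Because the order is derived from declared dependencies rather than from the position of statements in a file, `model` and `summary` land in the same batch and could run in parallel.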
Not for basic work. Running nix develop sets up your entire environment. However, T’s power comes from Nix, which pins your OCaml, R, Python, and system dependencies in a single flake.lock.
The T standard library includes:

- colcraft: A powerful suite of verbs (mutate, summarize, pivot_longer) following tidyverse semantics.
- chrono: Precise date and time manipulation with calendar-aware rounding.
- factors: Native Arrow-backed categorical data handling.
If you’re building a reusable function that takes a column name as an argument, T provides first-class support for Metaprogramming:

- Use enquo(col) to capture the argument.
- Use !! (unquote) to inject it into a verb.
- Use !!name := value for dynamic column naming.
Example:
my_avg = \(df, col, name)
df |> summarize(!!name := mean(!!col))
For simple reports, you can use T’s built-in
colcraft verbs to summarize data and
output it via Quarto. While T does not currently have
its own native plotting library, its high-fidelity interop with R and
Python makes it trivial to define a specialized node for more complex
charts using ggplot2, matplotlib, or
seaborn.
Yes. T’s native Arrow backend allows it to perform select, filter, and sort operations directly on Arrow tables in memory.

- Optimized Compute: T’s built-in compute engine handles millions of rows by interacting directly with Arrow memory buffers.
- Orchestration: For massive datasets, you can leverage T to orchestrate R’s dtplyr or Python’s polars nodes. The results are passed back via Arrow serialization, maintaining high fidelity.
Yes! The T Language Server (t-lsp) provides:

- Autocompletion: For functions, variables, and even DataFrame column names.
- Hover Docs: View docstrings directly in your editor.
- Diagnostics: Real-time syntax and type error reporting.
The T REPL is designed for productivity:

- Ghost Hints: Inline suggestions based on your command history.
- Signal Safety: Hit Ctrl+C to cancel a long-running calculation without crashing the session.
- Multi-line Detection: Automatic detection of nested blocks for easy copy-pasting.
T is built with robustness in mind. If a node fails (e.g., an R script crashes), T captures the error and presents it as a first-class VError value, preventing the whole pipeline engine from crashing.

- Fail-Safe Loading: The read_node() function will never crash your session. Even if an artifact is missing or corrupted, you get a clean error value you can handle with ?|>.
- High-Fidelity Representation: T works hard to provide native representations of list, dict, and DataFrame objects from other languages.
- Generic Fallbacks: If a node produces a complex object that T doesn’t natively understand (like a custom private class in Python), it is safely wrapped as a HostObject. You can still pass this object as a reference to other nodes of the same language, keeping your polyglot workflow intact.
- Native Escape Hatch: If you need to manipulate a complex object that cannot be serialized, you can always read the node’s artifact directly using the native interpreter’s own libraries (e.g., by calling read_node() inside a Python or R script).
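
The "errors as values" idea above can be sketched in Python. This is an analogy only, under stated assumptions: `NodeError`, `run_node`, and `recover` are hypothetical names invented for this example, `NodeError` merely stands in for a value like VError, and `recover` is a loose analogue of handling an error value downstream (the exact semantics of ?|> are not specified here).

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class NodeError:
    """Hypothetical stand-in for a first-class error value."""
    node: str
    message: str


Result = Union[NodeError, Any]


def run_node(name: str, fn: Callable[[], Any]) -> Result:
    """Run a node and convert any exception into an error value."""
    try:
        return fn()
    except Exception as exc:  # the failure is captured, not propagated
        return NodeError(node=name, message=str(exc))


def recover(value: Result, handler: Callable[[NodeError], Any]) -> Any:
    """Call handler only when value is an error; pass other values through."""
    if isinstance(value, NodeError):
        return handler(value)
    return value


ok = run_node("load", lambda: [1, 2, 3])
failed = run_node("fit", lambda: 1 / 0)

print(ok)      # prints [1, 2, 3]
print(failed)  # prints NodeError(node='fit', message='division by zero')
print(recover(failed, lambda err: f"fallback after {err.node} failed"))
# prints fallback after fit failed
```

The failing node does not stop the program: its error is an ordinary value that later steps can inspect, log, or replace with a fallback.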
Absolutely. T integrates with Quarto through a
native extension. You can write .qmd files where code
chunks are written in pure T, allowing you to summarize
data and generate professional reports without ever needing R or Python.
For advanced charting or low-level file processing, you can mix T chunks
with R, Python, or Shell (using the shn()
function) in the same document.
T is an open-source project. You can contribute by:

- Porting R/Python utility functions to native T.
- Improving the t-lsp implementation.
- Reporting bugs or suggesting features on GitHub.
The developer (Bruno Rodrigues) works on T based on community interest and experimental whims. High-priority items include Julia integration and expanding the native Arrow compute engine.
[!TIP] Need help? Check out the Getting Started guide or join the GitHub Discussions.