Frequently Asked Questions (FAQ)

Welcome to the T Orchestration Engine FAQ. This guide covers the philosophy, technical architecture, and practical usage of T.


General Questions

What makes T different?

T isn’t just another data analysis language; it’s a reproducibility-first engine. While R and Python rely on external tools for environment management, T integrates Nix at its core. Every workflow is a statically defined directed acyclic graph (DAG) called a Pipeline. Both the steps of your analysis and the environment they run in are therefore declared explicitly and can be reproduced.

Who should use T?

Is T production-ready?

T is currently in Beta (v0.51.0). While it is an experimental project, it is already fully capable of performing end-to-end data processing. You can use T’s native data manipulation verbs and Quarto integration to build reports without ever leaving the language. For more complex statistical modeling or advanced visualization, you can easily pull in R or Python nodes.


The Technical Core

How does the Polyglot Architecture work?

T uses Apache Arrow as its core data exchange format. When you pass a DataFrame between a T node and an R (rn()), Python (pyn()), or Shell (shn()) node, T handles the interchange through Arrow files:

- Hermeticity: Because T runs every node in a hermetic Nix sandbox, nodes cannot share data directly in memory.
- Serialization: DataFrames are serialized to Arrow IPC files on disk. This is still significantly faster and more robust than traditional CSV/JSON interchange.
- Fidelity: Factor level metadata and nested list-columns are preserved through serialization.
- Model Interchange: Machine learning models are passed between languages using PMML.

What is NSE and why the $ prefix?

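T uses Non-Standard Evaluation (NSE) to make data manipulation concise. The $ prefix marks a name as a column (or variable) that is looked up in the data context rather than in the surrounding environment. For example, filter($age > 30) keeps the rows whose age column is greater than 30. This is similar to tidy evaluation (data masking) with rlang in R, but it is built directly into the language syntax, so column references are always visually explicit.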
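The idea behind the $ prefix is deferred evaluation: the expression is captured first and only later evaluated against each row of the data. The FAQ does not describe T's internal mechanism, so the following is only a Python analogy under that assumption. Col and filter_rows are hypothetical names, and operator overloading stands in for T's built-in syntax.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class Col:
    """A deferred column reference, playing the role of T's $age (analogy only)."""
    name: str

    def __gt__(self, other: Any) -> Callable[[Row], bool]:
        # Nothing is evaluated yet: we return a function that looks the column
        # up in each row (the "data context") when it is eventually called.
        return lambda row: row[self.name] > other

    def __lt__(self, other: Any) -> Callable[[Row], bool]:
        return lambda row: row[self.name] < other


def filter_rows(rows: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Keep the rows for which the captured predicate holds."""
    return [row for row in rows if predicate(row)]


people = [
    {"name": "Ana", "age": 34},
    {"name": "Ben", "age": 27},
    {"name": "Caro", "age": 41},
]

# Rough analogue of T's filter($age > 30)
older = filter_rows(people, Col("age") > 30)
print([p["name"] for p in older])  # ['Ana', 'Caro']
```

Because Col("age") > 30 returns a function rather than a boolean, filter_rows decides when and against which data the comparison runs. That separation is the essence of NSE.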

How are missing values (NA) handled?

T takes a strict approach to missing data. Unlike languages where NA can propagate silently through a computation, T requires you to handle it explicitly:

- Aggregation functions throw an error if they encounter an NA unless you pass na_rm = true.
- Typed constructors such as na_int(), na_float(), and na_string() make missing values type-safe.
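The strict NA rule can be sketched in Python, with None standing in for NA. This assumes T's behaviour is simply "raise unless na_rm is set"; strict_mean and NAError are hypothetical names, not part of T.

```python
from typing import Optional, Sequence


class NAError(ValueError):
    """Raised when an aggregation meets a missing value it was not told to drop."""


def strict_mean(values: Sequence[Optional[float]], na_rm: bool = False) -> float:
    """Mean that refuses to skip missing values (None here) unless na_rm=True."""
    if any(v is None for v in values):
        if not na_rm:
            raise NAError("mean() encountered NA; pass na_rm=True to drop it")
        values = [v for v in values if v is not None]
    if not values:
        raise ValueError("mean() of an empty sequence")
    return sum(values) / len(values)


print(strict_mean([1.0, 2.0, None, 3.0], na_rm=True))  # 2.0

try:
    strict_mean([1.0, None])
except NAError as err:
    print("error:", err)  # error: mean() encountered NA; pass na_rm=True to drop it
```

The failure is loud by default, so a missing value can never quietly turn a summary into NA or into a biased number.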

Does T have loops or mutable state?

No. T is a pure functional language.

- Instead of for or while loops, use map(), filter(), or recursion.
- Variables are immutable. This prevents the “spaghetti state” common in long data scripts.
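As a rough Python analogy (Python itself has loops and mutation, so this only imitates the style), here is a loop-free computation over an immutable tuple using map(), filter(), and recursion. The function names are illustrative and are not T built-ins.

```python
from functools import reduce
from typing import Sequence


def total_recursive(xs: Sequence[int]) -> int:
    """Sum by recursion instead of a for loop; nothing is reassigned."""
    if not xs:
        return 0
    return xs[0] + total_recursive(xs[1:])


values = (3, 8, 1, 12, 5)  # a tuple cannot be modified in place

# "Keep values above 4, then double them" as a chain of map() and filter()
doubled_big = tuple(map(lambda x: x * 2, filter(lambda x: x > 4, values)))

print(doubled_big)                                     # (16, 24, 10)
print(total_recursive(doubled_big))                    # 50
print(reduce(lambda acc, x: acc + x, doubled_big, 0))  # 50, the fold form
```

Python does not optimize tail calls, so recursion like total_recursive is only suitable for small inputs there; the point is the shape of the code, not performance.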


Pipelines & Reproducibility

Why are Pipelines mandatory?

For non-interactive work, T enforces a pipeline block. This ensures that every step of your analysis is declared as a node in a graph. This architecture:

1. Prevents order-of-execution bugs (scripts that only work if run in a specific sequence).
2. Enables automatic parallelization of independent nodes.
3. Allows for advanced graph operations like swap(), rewire(), and upstream_of().
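Points 1 and 2 follow from the graph itself. A topological order guarantees that every node runs after its dependencies, and nodes whose dependencies are all satisfied at the same time can run concurrently. The sketch below uses Python's standard graphlib module to compute those batches for a hypothetical pipeline. It illustrates the scheduling idea and is not T's actual scheduler.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each node maps to the set of nodes it depends on.
pipeline = {
    "raw": set(),
    "clean": {"raw"},
    "summary": {"clean"},
    "model": {"clean"},
    "report": {"summary", "model"},
}


def execution_levels(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into batches. Nodes in the same batch do not depend on each
    other, so an engine could run them in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        levels.append(ready)
        sorter.done(*ready)
    return levels


print(execution_levels(pipeline))
# [['raw'], ['clean'], ['model', 'summary'], ['report']]

try:
    execution_levels({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("rejected: the pipeline contains a cycle")
```

The order is derived from declared dependencies rather than from the position of code in a file, which is why a pipeline cannot silently depend on execution order.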

Do I need to know Nix?

Not for basic work: running nix develop sets up your entire environment. Still, much of T’s power comes from Nix, which manages your OCaml, R, Python, and system dependencies in a single, pinned flake.lock.

What operating systems are supported?


Data Manipulation & Features

What libraries are included?

The T standard library includes:

- colcraft: A powerful suite of verbs (mutate, summarize, pivot_longer) following tidyverse semantics.
- chrono: Precise date and time manipulation with calendar-aware rounding.
- factors: Native Arrow-backed categorical data handling.

How do I program with column names?

If you’re building a reusable function that takes a column name as an argument, T provides first-class support for metaprogramming:

- Use enquo(col) to capture the argument.
- Use !! (unquote) to inject it into a verb.
- Use !!name := value for dynamic column naming.

Example:

my_avg = \(df, col, name)
  df |> summarize(!!name := mean(!!col))

How do I visualize data?

For simple reports, you can use T’s built-in colcraft verbs to summarize data and output it via Quarto. While T does not currently have its own native plotting library, its high-fidelity interop with R and Python makes it straightforward to define a specialized node for more complex charts using ggplot2, matplotlib, or seaborn.

Can T handle large datasets?

Yes. T’s native Arrow backend performs select, filter, and sort operations directly on Arrow tables in memory.

- Optimized Compute: T’s built-in compute engine handles millions of rows by operating directly on Arrow memory buffers.
- Orchestration: For massive datasets, you can use T to orchestrate R (dtplyr) or Python (polars) nodes. The results are passed back via Arrow serialization, so fidelity is preserved.
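Arrow stores each column as a contiguous buffer, so select, filter, and sort can work column-wise. Filter and sort first compute row positions from one column, then gather those positions from every column, similar in spirit to Arrow's filter and take kernels. The Python sketch below uses plain lists in place of Arrow buffers. It shows the access pattern only and says nothing about T's engine internals; all function names are hypothetical.

```python
from typing import Any, Callable

# A table stored column by column, the way Arrow lays out data in memory.
Table = dict[str, list[Any]]


def select(table: Table, *names: str) -> Table:
    """Selecting columns only picks existing column buffers; no row is touched."""
    return {name: table[name] for name in names}


def take(table: Table, indices: list[int]) -> Table:
    """Gather the same row positions from every column."""
    return {name: [col[i] for i in indices] for name, col in table.items()}


def filter_table(table: Table, column: str, keep: Callable[[Any], bool]) -> Table:
    """Evaluate the predicate on one column, then gather the matching rows."""
    indices = [i for i, v in enumerate(table[column]) if keep(v)]
    return take(table, indices)


def sort_table(table: Table, column: str, descending: bool = False) -> Table:
    """Compute a sort permutation from one column and apply it to all columns."""
    order = sorted(range(len(table[column])), key=table[column].__getitem__,
                   reverse=descending)
    return take(table, order)


sales = {
    "region": ["N", "S", "E", "W"],
    "units": [120, 45, 300, 80],
    "price": [9.5, 12.0, 7.25, 10.0],
}

big = sort_table(filter_table(sales, "units", lambda u: u > 50), "units", descending=True)
print(select(big, "region", "units"))
# {'region': ['E', 'N', 'W'], 'units': [300, 120, 80]}
```

Only the predicate or sort-key column is scanned; the other columns are touched once, during the gather.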


Developer Experience

Is there an LSP or VS Code support?

Yes! The T Language Server (t-lsp) provides:

- Autocompletion: For functions, variables, and even DataFrame column names.
- Hover Docs: View docstrings directly in your editor.
- Diagnostics: Real-time syntax and type error reporting.

What about the REPL?

The T REPL is designed for productivity:

- Ghost Hints: Inline suggestions based on your command history.
- Signal Safety: Press Ctrl+C to cancel a long-running calculation without crashing the session.
- Multi-line Detection: Automatic detection of nested blocks, which makes copy-pasting multi-line code easy.
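One common way to implement multi-line detection is to count unclosed brackets and keep reading lines until they balance. The FAQ does not say how the T REPL does this, so the following is a generic Python sketch that assumes double-quoted strings with backslash escapes. needs_more_input is a hypothetical name.

```python
def needs_more_input(source: str) -> bool:
    """Return True while brackets or a string are still open, i.e. the REPL
    should keep reading lines before evaluating. Characters inside string
    literals are ignored, so a "(" in quotes does not count."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack: list[str] = []
    in_string = False
    escaped = False
    for ch in source:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack[-1] != pairs[ch]:
                return False  # unbalanced close: let the parser report the error
            stack.pop()
    return bool(stack) or in_string


buffer = ""
for line in ["df |> filter(", "  $age > 30", ")"]:
    buffer += line + "\n"
    print(line, "->", "wait" if needs_more_input(buffer) else "evaluate")
# df |> filter( -> wait
#   $age > 30 -> wait
# ) -> evaluate
```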

What happens if a node fails or produces a complex object?

T is built with robustness in mind. If a node fails (e.g., an R script crashes), T captures the error and represents it as a first-class VError value instead of letting the failure crash the whole pipeline engine.

- Fail-Safe Loading: The read_node() function never crashes your session. Even if an artifact is missing or corrupted, you get a clean error value that you can handle with ?|>.
- High-Fidelity Representation: T aims to provide native representations of list, dict, and DataFrame objects produced by other languages.
- Generic Fallbacks: If a node produces a complex object that T doesn’t natively understand (such as an instance of a custom private Python class), it is safely wrapped as a HostObject. You can still pass this object as a reference to other nodes of the same language, keeping your polyglot workflow intact.
- Native Escape Hatch: If you need to manipulate a complex object that cannot be serialized, you can always read the node’s artifact directly with the native interpreter’s own libraries (e.g., by calling read_node() inside a Python or R script).
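The errors-as-values design can be sketched in Python. This assumes that T's ordinary pipe |> skips the next function when it receives an error, while ?|> passes the error through so a function can recover from it. VError mirrors the FAQ's name, but pipe, maybe_pipe, load_artifact, and with_default are hypothetical stand-ins, not T's API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class VError:
    """An error carried as an ordinary value instead of a raised exception."""
    message: str


def pipe(value: Any, fn: Callable[[Any], Any]) -> Any:
    """Assumed |> behaviour: an incoming error skips fn and is passed along."""
    return value if isinstance(value, VError) else fn(value)


def maybe_pipe(value: Any, fn: Callable[[Any], Any]) -> Any:
    """Assumed ?|> behaviour: fn always runs, so it can handle an error value."""
    return fn(value)


def load_artifact(name: str, store: dict[str, Any]) -> Any:
    """Hypothetical loader that never raises: a missing artifact becomes a VError."""
    return store[name] if name in store else VError(f"artifact '{name}' not found")


def with_default(value: Any) -> Any:
    """Recovery step: replace an error with an empty list."""
    return [] if isinstance(value, VError) else value


artifacts = {"clean": [1, 2, 3]}

print(pipe(load_artifact("clean", artifacts), len))  # 3
print(pipe(load_artifact("model", artifacts), len))  # VError(message="artifact 'model' not found")
print(pipe(maybe_pipe(load_artifact("model", artifacts), with_default), len))  # 0
```

Nothing in this chain raises. The failure travels as data until a step chooses to deal with it, which keeps one broken node from taking down the whole run.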

Can I write Literate Programming reports?

Absolutely. T integrates with Quarto through a native extension. You can write .qmd files where code chunks are written in pure T, allowing you to summarize data and generate professional reports without ever needing R or Python. For advanced charting or low-level file processing, you can mix T chunks with R, Python, or Shell (using the shn() function) in the same document.


Community & Contributing

How can I help?

T is an open-source project. You can contribute by:

- Porting R/Python utility functions to native T.
- Improving the t-lsp implementation.
- Reporting bugs or suggesting features on GitHub.

What’s next on the roadmap?

The developer (Bruno Rodrigues) works on T based on community interest and experimental whims. High-priority items include Julia integration and expanding the native Arrow compute engine.


Tip: Need help? Check out the Getting Started guide or join the GitHub Discussions.