Comprehensive guide to error handling, recovery patterns, and debugging in T.
T treats errors as first-class values, not exceptions. This design enables:
Key Principle: Errors are data, and data can be transformed, inspected, and recovered from.
T-Lang provides two modes of evaluation, controlled by the “resilience” setting:
With resilience enabled, evaluation continues even when statements or
pipeline nodes result in VError values. This is aligned with the
“Errors are Values” philosophy, allowing you to collect as much
information as possible from a single run. Residual errors are simply
passed to downstream functions (which may short-circuit via |> or
recover via ?|>). This is the recommended mode for complex data
pipelines where you want to observe as many diagnostic outcomes as
possible in a single pass.
In fail-fast mode, evaluation stops immediately upon encountering the
first VError. This is the conventional behaviour for critical scripts
where subsequent steps should run only if every previous step
succeeded.
How to toggle:
- CLI: Use t run --failfast script.t
- REPL: Set t_run(failfast = true, ...)
- Pipelines: Use t_make(failfast = true)
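The difference between the two modes can be modelled outside T. The following is a minimal Python sketch, not T's evaluator: the VError class, safe_div, and run are hypothetical names introduced only for illustration, and the failfast parameter merely mirrors the flag named above.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class VError:
    """Hypothetical stand-in for T's error value."""
    code: str
    message: str


Value = Union[int, float, VError]


def safe_div(a: float, b: float) -> Value:
    """Return an error value instead of raising, as T does for 1 / 0."""
    if b == 0:
        return VError("DivisionByZero", "Division by zero")
    return a / b


def run(steps: List[Callable[[], Value]], failfast: bool = False) -> List[Value]:
    """Evaluate steps in order; with failfast=True, stop after the first VError."""
    results: List[Value] = []
    for step in steps:
        outcome = step()
        results.append(outcome)
        if failfast and isinstance(outcome, VError):
            break
    return results


steps = [lambda: safe_div(1, 0), lambda: safe_div(10, 2), lambda: safe_div(3, 0)]
resilient = run(steps)               # all three steps are evaluated
strict = run(steps, failfast=True)   # evaluation stops at the first error
```

In this model the resilient run reports both failures in one pass, while the fail-fast run never reaches the second and third steps.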
T has several categories of errors:
Occur when operations receive incompatible types:
1 + "hello"
-- Error(TypeError: Cannot add Int and String)
[1, 2] * 3
-- Error(TypeError: Cannot multiply List and Int. Hint: Use map(\(x) x * 3))
Occur when referencing undefined variables or functions:
undefined_var
-- Error(NameError: 'undefined_var' is not defined)
prnt(42)
-- Error(NameError: 'prnt' is not defined. Did you mean 'print'?)
Features:
- Levenshtein distance-based suggestions
- Searches current scope and standard library
- Case-insensitive suggestions
- Reserved Keyword & Built-in Protection: Prevents overwriting core standard library functions or keywords (such as build_log or print), triggering a NameError if assignment (=) or reassignment (:=) is attempted:
print = 42 -- Error(NameError: Cannot overwrite print: it's a reserved keyword!)
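To make the Levenshtein-based, case-insensitive suggestions concrete, here is a small Python sketch of how a "Did you mean" lookup can work. It is an illustration only: the function names, the example scope list, and the max_distance threshold of 2 are assumptions, not T's actual implementation or settings.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(name: str, known: Iterable[str], max_distance: int = 2) -> Optional[str]:
    """Return the closest known name, compared case-insensitively, or None."""
    best, best_distance = None, max_distance + 1
    for candidate in known:
        distance = levenshtein(name.lower(), candidate.lower())
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


# Hypothetical scope contents, used only for this example.
scope = ["print", "filter", "select", "mean"]
hint = suggest("prnt", scope)
```

A name that is too far from everything in scope yields no suggestion, which matches the plain NameError shown for undefined_var.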
Occur when values are invalid for the operation:
sqrt(-1)
-- Error(ValueError: Cannot compute square root of negative number)
quantile([1, 2, 3], 1.5)
-- Error(ValueError: Quantile must be between 0 and 1)
Occur when function calls have wrong number of arguments:
add = \(a, b) a + b
add(1)
-- Error(ArityError: Expected 2 arguments (a, b) but got 1)
add(1, 2, 3)
-- Error(ArityError: Expected 2 arguments (a, b) but got 3)
Features:
- Shows expected parameter names
- Shows actual argument count
- Suggests correct usage
1 / 0
-- Error(DivisionByZero: Division by zero)
10 % 0
-- Error(DivisionByZero: Modulo by zero)
Occur when NA values are encountered without explicit handling:
mean([1, NA, 3])
-- Error(AggregationError: "Function `mean` encountered NA value. Handle missingness explicitly or set `na_rm` to true.")
1 + NA
-- Error(NAPredicateError: Operation on NA: NA values do not propagate implicitly. Handle missingness explicitly.)
Occur when assertions fail:
assert(2 + 2 == 5)
-- Error(AssertionError: Assertion failed)
assert(false, "Custom message")
-- Error(AssertionError: Custom message)
Custom errors for data validation:
validate_age = \(age)
  if (age < 0 or age > 150)
    error("ValidationError", "Age must be between 0 and 150")
  else
    age
validate_age(-5)
-- Error(ValidationError: Age must be between 0 and 150)
-- Generic error
err = error("Something went wrong")
-- Error with code
err = error("NotFoundError", "File not found")
-- Conditional errors
result = if (value < 0)
  error("ValueError", "Value must be non-negative")
else
  value
result = 1 / 0
-- Check if error
is_error(result) -- true
-- Get error code
error_code(result) -- "DivisionByZero"
-- Get error message
error_msg(result) -- "Division by zero"
-- Get additional context
error_context(result) -- Stack trace or additional info
The standard error functions (error_code,
error_msg, and error_context) are
polymorphic and can also be called directly on pipeline
ComputedNode variables (e.g., p.X or
p.combined_df).
When such a node has failed (its value is a VError), these
functions automatically resolve and read the detailed
VError artifact from the Nix store or the _pipeline/ directory.
This allows direct programmatic inspection of pipeline failures without manually opening log files:
-- Direct inspection of a failed node's error code and message
error_code(p.X) -- "RuntimeError"
error_msg(p.X) -- "NameError: name 'dataset_n' is not defined"
-- Inspecting a hard-failed nix-build node
error_msg(p.combined_df) -- Full multi-line Python/Arrow traceback
T provides powerful primitives to analyze exceptions/diagnostics and preserve their provenance:
collect_exceptions(pipeline)
Converts all terminal errors and warning diagnostics from computed
nodes of a built pipeline into a structured four-column
DataFrame (node, status, code, and message), allowing you to inspect
and filter multiple pipeline issues.
To keep the printed output clean and readable:
* Traceback Truncation: collect_exceptions automatically extracts the last non-empty line of multi-line tracebacks (like Python or Arrow error messages) to preserve the actual error class and message, and caps the text length at 100 characters.
* Polars-Style Cell Truncation: When pretty-printing any DataFrame in the REPL, all string cells exceeding 35 characters are truncated to 32 characters followed by ... (Polars-style) to prevent wide/broken columns.
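The two truncation rules are simple enough to sketch. The Python below is an illustration under the limits stated above (100 characters for messages; 35 and 32 for cells); the function names are hypothetical and the real implementation may differ, for example in how it trims whitespace.

```python
def last_nonempty_line(traceback_text: str, limit: int = 100) -> str:
    """Keep the last non-empty line of a traceback, capped at `limit` characters."""
    lines = [line.strip() for line in traceback_text.splitlines() if line.strip()]
    summary = lines[-1] if lines else ""
    return summary[:limit]


def truncate_cell(text: str, max_width: int = 35, keep: int = 32) -> str:
    """Cells longer than `max_width` become the first `keep` characters plus '...'."""
    if len(text) <= max_width:
        return text
    return text[:keep] + "..."


sample = (
    "Traceback (most recent call last):\n"
    '  File "node.py", line 3, in <module>\n'
    "NameError: name 'dataset_n' is not defined\n"
    "\n"
)
summary = last_nonempty_line(sample)   # the error class and message survive
cell = truncate_cell(summary)          # shortened for display in a table
```

Because 32 kept characters plus the three-character ellipsis add up to 35, a truncated cell is never wider than the longest cell that is left untouched.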
[!NOTE] Even if a build fails, the build log is written unconditionally to the
_pipeline/ directory. This guarantees that collect_exceptions(pipeline) can extract the exact traceback and error code of failed nodes, and retrieve warnings for successfully built nodes.
exceptions_df = collect_exceptions(my_pipeline)
-- exceptions_df can now be processed with colcraft verbs:
exceptions_df |> filter($status == "Warning")
T’s built-in explain() function has specialized support
for collect_exceptions DataFrames:
- Direct Explanation (1 exception): If there is exactly one exception row in the DataFrame, calling explain(exceptions_df) will automatically map to that diagnostic exception and return a structured dictionary explaining the exact error or warning details (containing the originating node, diagnostic code, and description message).
- Consolidated Explanation (multiple exceptions): If there are zero or multiple exceptions, calling explain(exceptions_df) returns a structured representation of the exception collection itself (exceptions_list), with a count property and an exceptions list containing the mapped explanation of each individual diagnostic element.
error_chain(err1, err2)
Preserves the causal chain of multiple failures. Chaining sets
err2 as the "cause" in err1’s context, maintaining complete
traceback and causation history:
err_low_level = error("KeyError", "Key 'id' not found")
err_high_level = error("ValueError", "Validation failed")
chained = error_chain(err_high_level, err_low_level)
error_context(chained)$cause -- returns err_low_level
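The chaining behaviour can be pictured as a linked list of error values. The Python sketch below is a model, not T's implementation: VError, its context dictionary, and the causes helper are hypothetical, and the only behaviour taken from the text is that the second error is stored under "cause" in the first error's context.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List


@dataclass(frozen=True)
class VError:
    """Hypothetical stand-in for T's error value."""
    code: str
    message: str
    context: Dict[str, Any] = field(default_factory=dict)


def error_chain(outer: VError, cause: VError) -> VError:
    """Return a copy of `outer` whose context records `cause`."""
    return replace(outer, context={**outer.context, "cause": cause})


def causes(err: VError) -> List[str]:
    """Walk the chain from the outermost error down to the root cause."""
    codes = []
    current = err
    while current is not None:
        codes.append(current.code)
        current = current.context.get("cause")
    return codes


low = VError("KeyError", "Key 'id' not found")
high = VError("ValueError", "Validation failed")
chained = error_chain(high, low)
trail = causes(chained)   # outermost code first, root cause last
```

Returning a new value rather than mutating the outer error keeps the original errors intact, which fits the idea that errors are ordinary data.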
Errors flow through computations until caught:
step1 = 1 / 0 -- Error
step2 = step1 + 10 -- Still error (no computation)
step3 = step2 * 2 -- Still error
is_error(step3) -- true
error_msg(step3) -- "Division by zero"
Standard pipe (|>)
Short-circuits on errors:
-- Success path
5 |> \(x) x * 2 |> \(x) x + 1 -- 11
-- Error short-circuits
error("fail") |> \(x) x * 2 |> \(x) x + 1
-- Error(GenericError: fail)
-- The functions are never called
Use case: Normal data pipelines where errors should stop processing.
df = read_csv("data.csv")
|> filter($age > 0)
|> select($name, $age)
|> arrange($age)
-- If read_csv fails, rest of pipeline is skipped
Maybe-pipe (?|>)
Always forwards values, including errors:
-- Success path (same as |>)
5 ?|> \(x) x * 2 -- 10
-- Error is forwarded to function
error("fail") ?|> \(x)
if (is_error(x)) 0 else x -- 0 (recovered!)
Use case: Error recovery, fallback values, logging.
-- Recovery pattern
result = risky_operation()
  ?|> \(x) if (is_error(x)) {
    print(str_join(["Error occurred: ", error_msg(x)]))
    default_value
  } else {
    x
  }
-- Chain recovery with normal processing
error("fail")
  ?|> \(x) if (is_error(x)) 0 else x -- Recovery
  |> \(x) x + 1 -- Normal processing (1)
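For readers who think in other languages, the two pipes can be modelled as two small higher-order functions. This Python sketch is an analogy, not T's implementation; VError, pipe, and maybe_pipe are hypothetical names, and the calls list exists only to show which functions actually ran.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class VError:
    """Hypothetical stand-in for T's error value."""
    code: str
    message: str


def pipe(value: Any, fn: Callable[[Any], Any]) -> Any:
    """Model of |>: skip fn entirely when value is an error."""
    if isinstance(value, VError):
        return value
    return fn(value)


def maybe_pipe(value: Any, fn: Callable[[Any], Any]) -> Any:
    """Model of ?|>: always call fn, even when value is an error."""
    return fn(value)


calls: List[Any] = []


def double(x: Any) -> Any:
    calls.append(x)  # record every real invocation
    return x * 2


failed = VError("GenericError", "fail")

# |> short-circuits: double is never called for the error.
short_circuited = pipe(pipe(failed, double), lambda x: x + 1)

# ?|> forwards the error so the function can recover, then |> continues.
recovered = pipe(
    maybe_pipe(failed, lambda x: 0 if isinstance(x, VError) else x),
    lambda x: x + 1,
)

# Success path: both pipes behave the same.
success = pipe(pipe(5, double), lambda x: x + 1)
```

The only difference between the two functions is the guard in pipe, which is why a single ?|> step is enough to turn an error back into a value for the rest of a |> chain.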
Pattern matching is the most declarative way to handle first-class errors in T. It allows you to destructure error values and branch logic based on specific outcomes.
Use the Error { msg } pattern to capture the error
message:
result = match(1 / 0) {
  Error { msg: m } => str_sprintf("Caught error: %s", m),
  val => str_sprintf("Result: %f", val)
}
-- "Caught error: Division by zero"
If you don’t explicitly handle Error in a
match expression, the original error value is propagated
automatically. This matches the behavior of the standard pipe
(|>).
-- This match does NOT handle Error
res = match(error("Fail")) {
  [head, ..tail] => head,
  [] => 0
}
-- Result is still Error("Fail")
-- Return default on error using match
safe_divide = \(a, b)
  match(a / b) {
    Error { _ } => 0.0,
    res => res
  }
safe_divide(10, 0) -- 0.0
safe_divide(10, 2) -- 5.0
-- Try multiple data sources cleanly
data = match(read_csv("primary.csv")) {
  Error { _ } => match(read_csv("backup.csv")) {
    Error { _ } => read_csv("fallback.csv"),
    res => res
  },
  res => res
}
-- Or combining match with the maybe-pipe for flat recovery
final_data = read_csv("primary.csv")
  ?|> \(res) match(res) {
    Error { _ } => read_csv("backup.csv"),
    valid_df => valid_df
  }
-- Validate, recover if invalid
process_value = \(v)
  if (v < 0)
    error("ValueError", "Negative value")
  else if (is_na(v))
    error("NAError", "NA value")
  else
    v * 2
-- Recover from validation errors
safe_process = \(v)
  process_value(v) ?|> \(result)
    if (is_error(result)) {
      print(str_sprintf("Invalid value: %s", error_msg(result)))
      0 -- Default
    } else {
      result
    }
safe_process(-5) -- Prints error, returns 0
safe_process(10) -- Returns 20
-- Log errors and continue
logged_operation = risky_function()
  ?|> \(x) if (is_error(x)) {
    write_log(str_sprintf("Error in risky_function: %s", error_msg(x)))
    x -- Pass error along
  } else {
    x
  }
-- Or convert to success after logging
logged_with_default = logged_operation
  ?|> \(x) if (is_error(x)) default_value else x
-- Process list, collecting errors
process_many = \(items)
  map(items, \(item)
    process_item(item) ?|> \(result)
      if (is_error(result))
        {success: false, error: error_msg(result)}
      else
        {success: true, value: result}
  )
results = process_many([1, -2, 3, 0])
-- [
-- {success: true, value: 2},
-- {success: false, error: "Negative value"},
-- {success: true, value: 6},
-- {success: false, error: "Division by zero"}
-- ]
-- Filter successes
successes = filter(results, \(r) r.success)
Pattern matching is excellent for converting low-level errors into domain-specific ones.
-- Transform errors using match
-- Branch on the error code: the message text (e.g. "Division by zero")
-- does not contain the code itself
enhance_error = \(e)
  match(e) {
    Error { msg } =>
      if (error_code(e) == "DivisionByZero")
        error("MathError", str_sprintf("Calculation failed: %s", msg))
      else if (error_code(e) == "NAError")
        error("DataQualityError", str_sprintf("Quality issue: %s", msg))
      else
        e,
    _ => e
  }
risky_calc() ?|> enhance_error
In T-Lang, the materialization of a pipeline is a separate phase from the logic execution. When a node in a pipeline fails, it doesn’t necessarily halt the entire build.
When you run build_pipeline(p) and inspect the build log
using build_log(p) |> build_log_to_frame(), each node in
the pipeline is marked with a distinct execution status.
| Node Status | Did it execute? | Did it fail? | Pipeline Impact | Description & Cause |
|---|---|---|---|---|
| Completed | Yes | No | Success (propagates data) | The node ran successfully and serialized its output artifact. |
| Completed with error (Soft-Fail) | Yes | Yes (Soft) | Continues (propagates error) | The script ran inside the sandbox but raised a user-space exception. T-Lang captures this as a first-class VError value so independent branches can still build. |
| Errored (Hard-Fail) | Yes | Yes (Hard) | Aborts Build | The sandbox execution crashed entirely or exited with a non-zero code (e.g., syntax errors, missing packages, memory exhaustion). |
| Skipped | No | No | Bypassed | The node was never evaluated or executed because an upstream dependency suffered a hard Errored failure. |
Why Completed with error propagates
downstream: Because T-Lang treats errors as first-class values,
a soft-failure is saved as a normal serialized directory outcome.
Downstream nodes are still scheduled to run, receive the
VError as input, and automatically propagate it further
down the pipe unless you explicitly recover from it.
Why Errored halts downstream
execution: When a node suffers a hard Errored
nix-build failure, the Nix daemon immediately stops evaluating that
branch. No output directory is produced, and the entire build process
terminates.
Why subsequent nodes are marked
Skipped: Because the build aborted early, any
downstream nodes that rely on the failed node are never invoked. They do
not contain any errors themselves; they are simply bypassed entirely and
marked as Skipped to provide a clean, accurate picture of
the pipeline state.
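The relationship between the four statuses can be summarised in a few lines of Python. This is only a labelling model, not the Nix scheduler: build_statuses, deps, and outcomes are hypothetical names, the node graph is invented for the example, and two points are assumptions rather than statements from the text above: that a node downstream of a soft failure is itself reported as Completed with error, and that skipping is transitive.

```python
from typing import Dict, List


def build_statuses(deps: Dict[str, List[str]], outcomes: Dict[str, str]) -> Dict[str, str]:
    """Label each node given its upstream nodes and its own outcome.

    deps must list nodes in dependency order (upstream nodes first).
    outcomes maps each node to 'ok', 'soft' (user-space exception), or
    'hard' (sandbox crash or non-zero exit).
    """
    statuses: Dict[str, str] = {}
    for node, upstream in deps.items():
        upstream_statuses = [statuses[u] for u in upstream]
        if any(s in ("Errored", "Skipped") for s in upstream_statuses):
            statuses[node] = "Skipped"               # never invoked
        elif outcomes[node] == "hard":
            statuses[node] = "Errored"               # no output directory
        elif outcomes[node] == "soft" or "Completed with error" in upstream_statuses:
            statuses[node] = "Completed with error"  # VError artifact propagates
        else:
            statuses[node] = "Completed"
    return statuses


# Hypothetical pipeline with one soft failure and one hard failure.
deps = {
    "load": [],
    "clean": ["load"],
    "py_err": ["load"],
    "report": ["py_err"],
    "crash": ["load"],
    "plot": ["crash"],
    "export": ["plot"],
}
outcomes = {name: "ok" for name in deps}
outcomes["py_err"] = "soft"
outcomes["crash"] = "hard"

statuses = build_statuses(deps, outcomes)
```

In this model the soft failure still reaches report as data, whereas nothing below crash is ever invoked.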
After a build, T provides an iconographic summary:
✖ Pipeline build captured node errors [5 succeeded, 2 captured errors, 2 had warnings]
! Captured error in node: r_err
! Captured error in node: py_err
? Warnings in node: r_warn
? Warnings in node: py_warn
- ! (Captured error): The node failed, producing a VError artifact.
- ? (Warnings): The node succeeded, but issued non-terminal diagnostics.
When a node emits a warning, downstream nodes that depend on it automatically inherit that warning. This ensures that warnings are never silently lost in a pipeline chain.
Use warning_msg(node) to inspect a node’s own warnings
and any warnings inherited from ancestor nodes:
warning_msg(p.filtered)
-- "filter() excluded 1 row because the predicate evaluated to NA"
warning_msg(p.count) -- downstream of `filtered`
-- "Ancestor node 'filtered' reported following warning: filter() excluded 1 row because the predicate evaluated to NA"
Use inspect_node(node).warnings for structured
access:
inspect_node(p.count).warnings
-- [
-- { source: "filtered", message: "filter() excluded 1 row because the predicate evaluated to NA" }
-- ]
The per-node diagnostics in read_pipeline(p).nodes carry
full warning lists as well. The aggregate
diagnostics.summary counts only own warnings, so the
pipeline-level summary reflects which nodes originally produced warnings
without double-counting inherited ones.
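Warning inheritance and the "own warnings only" summary can be sketched as follows. The Python below is a model under assumptions: collect_warnings and summary_count are hypothetical names, the source/message record shape mirrors the inspect_node output shown above, tagging a node's own warnings with its own name as source is an assumption, and the summary is taken to count nodes rather than individual warnings.

```python
from typing import Dict, List

Warning = Dict[str, str]


def collect_warnings(deps: Dict[str, List[str]],
                     own: Dict[str, List[str]]) -> Dict[str, List[Warning]]:
    """Return, per node, its own warnings plus those inherited from ancestors.

    deps must list nodes in dependency order (upstream nodes first).
    """
    full: Dict[str, List[Warning]] = {}
    for node, upstream in deps.items():
        entries = [{"source": node, "message": m} for m in own.get(node, [])]
        for parent in upstream:
            for entry in full[parent]:
                if entry not in entries:   # avoid duplicates from diamond-shaped graphs
                    entries.append(entry)
        full[node] = entries
    return full


def summary_count(own: Dict[str, List[str]]) -> int:
    """Count only nodes that produced warnings themselves."""
    return sum(1 for messages in own.values() if messages)


deps = {"raw": [], "filtered": ["raw"], "count": ["filtered"]}
own = {"filtered": ["filter() excluded 1 row because the predicate evaluated to NA"]}

full = collect_warnings(deps, own)
nodes_with_warnings = summary_count(own)
```

Here count carries the inherited entry from filtered, yet the summary still reports a single node with warnings.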
explain()
When you load a node that soft-failed, you receive a T-Lang Error
object. You can use the explain() builtin to see the exact
cause, including tracebacks from other languages.
hu = read_node("py_err")
-- Error(RuntimeError: "Critical error in Python logic")
explain(hu)
-- Output:
-- Context:
-- runtime_traceback: "Traceback (most recent call last): ..."
-- node_name: "py_err"
-- node_status: "errored"
T-Lang provides a “diagnostic bridge” for nodes running in other languages.
- Python errors: captured as a VError artifact.
- Python warnings: captured with warnings.catch_warnings(). These are listed in the build summary but do not cause the node to return an error state.
- R errors: captured with tryCatch. R conditions are mapped to VError codes.
- R warnings: captured with withCallingHandlers. Warnings are collected into the node’s metadata without interrupting the primary data export.
In automated daily pipelines where data is refreshed externally, a three-way branching strategy is often necessary. You can use pattern matching to choose between a full complex model, a simplified fallback model, or a validation report based on the data’s characteristics.
p = pipeline {
  -- 1. Load the latest daily raw data
  raw_data = node(command = read_csv("daily_extract.csv"))
  -- 2. Validate and "tag" the data based on quality metrics
  quality_status = node(command =
    raw_data ?|> \(df) {
      if (nrow(df) < 500)
        error("CriticalError", "Insufficient rows for any modeling.")
      else if (length(levels(df.segment)) < 2)
        [type: "low_diversity", data: df]
      else
        [type: "high_quality", data: df]
    }
  )
  -- 3. Precise branching: Choose the best action for each state
  final_result = node(command =
    quality_status ?|> \(outcome) match(outcome) {
      -- Full Case: Sufficient data and factor levels
      [type: "high_quality", data: df] =>
        lm(df, $yield ~ $temp + $segment),
      -- Fallback Case: Enough data, but not enough factor diversity
      -- We drop the 'segment' predictor to avoid model failure
      [type: "low_diversity", data: df] =>
        lm(df, $yield ~ $temp),
      -- Failure Case: Stop modeling and generate a diagnostic report
      Error { msg } =>
        generate_validation_report(msg, raw_data)
    }
  )
}
Why use this?:
* Adaptive Modeling: Your pipeline automatically scales its complexity to match the quality of the incoming data, avoiding “singular matrix” or “one-level factor” errors.
* Operational Intelligence: Instead of the whole pipeline failing due to a minor data shift (like one category disappearing from today’s extract), the system gracefully degrades its service while still providing a result.
* Auditability: Every run clearly states which path was taken through the use of descriptive tags like low_diversity or high_quality.
T offers several tools for managing errors, each suited for different scenarios:
- Standard pipe (|>): Best for “fail-fast” pipelines where any error should immediately stop further processing. It’s the default for sequential operations.
- Maybe-pipe (?|>): Ideal for scenarios requiring explicit error handling or recovery at each step. It allows functions to receive and act upon error values, enabling logging, fallback logic, or transformation.
- Pattern matching (match): The most declarative and powerful tool for branching logic based on error types or success values. It’s excellent when if (is_error(x)) becomes too nested or less readable.
General Guidance:
* Use |> for the majority of your data pipelines where you expect success and want to stop on the first error.
* Use ?|> when you need to inspect or act on an error at a specific point in a pipeline, often followed by match or an if (is_error(...)) check.
* Use match when your error recovery logic involves different actions for different error types, or when you want a clear, declarative way to distinguish between success and various error states.
Problem: Forgot to handle NA values
mean([1, 2, NA, 4])
-- Error(AggregationError: "Function `mean` encountered NA value. Handle missingness explicitly or set `na_rm` to true.")
Solution: Use na_rm = true
mean([1, 2, NA, 4], na_rm = true) -- 2.33...
Problem: Mixing incompatible types
"Age: " + 25
-- Error(TypeError: Cannot add String and Int)
Solution: Use separate print statements or
to_string() for manual conversion.
print("Age: ")
print(25)
-- Or use R/Python for formatting if needed
Problem: Operating on empty data
mean([])
-- Error(ValueError: Cannot compute mean of empty list)
Solution: Check before operating
safe_mean = \(data)
  if (length(data) == 0)
    error("ValueError", "Empty data")
  else
    mean(data)
-- Or use default
safe_mean_with_default = \(data)
  if (length(data) == 0) 0.0 else mean(data)
Problem: Dividing by zero
revenue_per_customer = total_revenue / customer_count
-- Error if customer_count = 0
Solution: Guard condition
revenue_per_customer = if (customer_count == 0)
  0.0
else
  total_revenue / customer_count
Problem: Referencing non-existent column
df.nonexistent_column
-- Error(NameError: Column 'nonexistent_column' not found)
Solution: Check columns first
cols = colnames(df)
has_column = length(filter(cols, \(c) c == "age")) > 0
if (has_column)
  df.age
else
  error("ColumnError", "Required column 'age' not found")
Good: Validate inputs early
process_data = \(df)
  if (nrow(df) == 0)
    error("ValidationError", "Empty dataset")
  else if (not has_required_columns(df))
    error("ValidationError", "Missing required columns")
  else
    -- Process data
Bad: Silent failures
process_data = \(df)
-- Assumes data is valid, may fail later
df |> filter(...) |> summarize(...)
Good: Clear, actionable messages
if (age < 0 or age > 150)
error("ValidationError", str_sprintf("Age must be between 0 and 150, got: %s", to_string(age)))
Bad: Vague messages
if (age < 0 or age > 150)
error("Invalid age")
Use |> for normal success paths:
df |> filter($active) |> select($name)
Use ?|> for error recovery:
load_data() ?|> \(x) if (is_error(x)) backup_data() else x
-- Computes mean, errors if list is empty or contains NA
-- Use na_rm = true to skip NA values
my_mean = \(values, na_rm)
  if (length(values) == 0)
    error("ValueError", "Empty list")
  else
    -- Implementation
-- Test successful case
assert(safe_divide(10, 2) == 5.0)
-- Test error recovery
assert(safe_divide(10, 0) == 0.0)
assert(is_error(risky_function(-1)))
production_pipeline = pipeline {
  data = read_csv("data.csv") ?|> \(x) if (is_error(x)) {
    write_log(str_sprintf("CRITICAL: Failed to load data: %s", error_msg(x)))
    x
  } else x
  -- Rest of pipeline with error handling
}
Bad: Deeply nested error checks
result = if (is_error(step1))
  handle_step1_error(step1)
else
  if (is_error(step2))
    handle_step2_error(step2)
  else
    if (is_error(step3))
      handle_step3_error(step3)
    else
      success_value
Good: Use maybe-pipe for flat composition
result = step1
  ?|> \(x) if (is_error(x)) handle_step1_error(x) else step2(x)
  ?|> \(x) if (is_error(x)) handle_step2_error(x) else step3(x)
  ?|> \(x) if (is_error(x)) handle_step3_error(x) else x
result = risky_operation()
if (is_error(result)) {
  print(str_sprintf("Error code: %s", error_code(result)))
  print(str_sprintf("Error message: %s", error_msg(result)))
  print(str_sprintf("Error context: %s", error_context(result)))
}
-- Add markers in pipeline
step1 = operation1() ?|> \(x) if (is_error(x)) {
  print("Error at step1")
  x
} else x
step2 = operation2(step1) ?|> \(x) if (is_error(x)) {
  print("Error at step2")
  x
} else x
-- Use assertions to catch logic errors
result = computation()
assert(result > 0, "Result should be positive")
assert(length(result_list) == expected_length, "Length mismatch")
See Also:
- Examples: Error handling patterns in practice
- API Reference: Error-related functions
- Troubleshooting: Common issues and solutions