Version: Alpha 0.1
Status: Syntax and semantics frozen for alpha
T is a functional programming language designed for declarative, tabular data manipulation. It combines the pipeline-driven style of R’s tidyverse with OCaml’s type discipline, producing a small, focused language for data wrangling and basic statistics.
T supports the following value types:
| Type | Example | Description |
|---|---|---|
Int |
42 |
Integer numbers |
Float |
3.14 |
Floating-point numbers |
Bool |
true, false |
Boolean values |
String |
"hello" |
Text strings |
List |
[1, 2, 3] |
Ordered, heterogeneous collections |
Dict |
{x: 1, y: 2} |
Key-value maps with string keys |
Vector |
Column data | Typed arrays (from DataFrames) |
DataFrame |
read_csv("data.csv") |
Tabular data (rows × columns) |
Pipeline |
pipeline { ... } |
DAG-based execution graph |
Function |
\(x) x + 1 |
First-class functions |
NA |
NA |
Explicit missing value |
Error |
error("msg") |
Structured error value |
Null |
null |
Absence of value |
Intent |
intent { ... } |
LLM-friendly metadata block |
x = 42
name = "Alice"
active = true
Variables are bound with =. All values are
immutable.
2 + 3 -- 5
10 - 3 -- 7
4 * 5 -- 20
15 / 3 -- 5
1 + 2.5 -- 3.5 (mixed int/float is promoted)
"hi" + " T" -- "hi T" (string concatenation)
Operator precedence follows standard mathematical conventions. Parentheses override precedence.
5 == 5 -- true
5 != 3 -- true
3 < 5 -- true
5 > 3 -- true
5 >= 5 -- true
2 <= 2 -- true
true and true -- true
true and false -- false
false or true -- true
not true -- false
T supports two equivalent function definition syntaxes:
-- R-style lambda (preferred)
square = \(x) x * x
-- function keyword
square = function(x) x * x
add = \(a, b) a + b
add(3, 7) -- 10
Functions capture their enclosing environment:
make_adder = \(n) \(x) x + n
add5 = make_adder(5)
add5(10) -- 15
numbers = [1, 2, 3, 4, 5]
map(numbers, \(x) x * x) -- [1, 4, 9, 16, 25]
filter(numbers, \(x) x > 3) -- [4, 5]
The pipe operator |> passes the left-hand value as
the first argument to the right-hand function:
5 |> \(x) x * 2 -- 10
5 |> add(3) -- 8 (equivalent to add(5, 3))
[1, 2, 3] |> map(\(x) x * x) |> sum -- 14
Pipes work across lines for readability:
[1, 2, 3, 4, 5]
|> map(\(x) x * x)
|> sum -- 55
result = if (5 > 3) "yes" else "no" -- "yes"
Conditionals are expressions — they return values.
numbers = [1, 2, 3]
length(numbers) -- 3
head(numbers) -- 1
tail(numbers) -- [2, 3]
person = [name: "Alice", age: 30]
person.name -- "Alice"
person.age -- 30
config = {host: "localhost", port: 8080}
config.host -- "localhost"
config.port -- 8080
T uses explicit NA values with type tags. NA does
not propagate implicitly — operations on NA produce
errors:
NA -- untyped NA
na_int() -- typed NA (Int)
na_float() -- typed NA (Float)
na_bool() -- typed NA (Bool)
na_string() -- typed NA (String)
is_na(NA) -- true
is_na(42) -- false
-- NA does not propagate
NA + 1 -- Error(TypeError: "Operation on NA...")
Errors are values, not exceptions. Failed operations return
structured Error values:
result = 1 / 0
is_error(result) -- true
error_code(result) -- "DivisionByZero"
error_message(result) -- "Division by zero"
-- Custom errors
err = error("something broke")
err = error("ValueError", "invalid input")
assert(2 + 2 == 4) -- true
assert(false, "custom message") -- Error(AssertionError: ...)
DataFrames are T’s first-class tabular data type, loaded from CSV:
df = read_csv("data.csv")
nrow(df) -- number of rows
ncol(df) -- number of columns
colnames(df) -- list of column names
df.age -- column as Vector
T provides six core data verbs:
-- Select columns
df |> select("name", "age")
-- Filter rows
df |> filter(\(row) row.age > 25)
-- Add/transform columns
df |> mutate("age_plus_10", \(row) row.age + 10)
-- Sort rows
df |> arrange("age") -- ascending
df |> arrange("age", "desc") -- descending
-- Group and summarize
df |> group_by("dept")
|> summarize("count", \(g) nrow(g))
Pipelines define named computation nodes with automatic dependency resolution:
p = pipeline {
data = read_csv("sales.csv")
filtered = filter(data, \(row) row.amount > 100)
total = filtered |> select("amount") |> \(d) sum(d.amount)
}
p.data -- access node result
p.total -- access computed value
Pipeline features: - Automatic dependency
resolution: Nodes can be declared in any order -
Deterministic execution: Same inputs always produce
same outputs - Cycle detection: Circular dependencies
are caught and reported - Introspection:
pipeline_nodes(), pipeline_deps(),
pipeline_node() - Re-run:
pipeline_run() re-executes the pipeline
All packages are loaded automatically at startup:
| Package | Functions |
|---|---|
core |
print, type, length,
head, tail, map,
filter, sum, seq |
base |
assert, is_na, na,
na_int, na_float, na_bool,
na_string, error, is_error,
error_code, error_message,
error_context |
math |
sqrt, abs, log,
exp, pow |
stats |
mean, sd, quantile,
cor, lm |
dataframe |
read_csv, colnames, nrow,
ncol |
colcraft |
select, filter, mutate,
arrange, group_by, summarize |
pipeline |
pipeline_nodes, pipeline_deps,
pipeline_node, pipeline_run |
explain |
explain, intent_fields,
intent_get |
Pure numerical primitives that work on scalars and vectors:
sqrt(4) -- 2.0
abs(0 - 5) -- 5
log(10) -- 2.30258509299
exp(1) -- 2.71828182846
pow(2, 3) -- 8.0
Statistical summaries that work on lists and vectors:
mean([1, 2, 3, 4, 5]) -- 3.0
sd([2, 4, 4, 4, 5, 5, 7, 9]) -- 2.1380899353
quantile([1, 2, 3, 4, 5], 0.5) -- 3.0 (median)
cor(df.x, df.y) -- correlation coefficient
lm(df, "y", "x") -- linear regression model
type(42) -- "Int"
type(df) -- "DataFrame"
explain(df) -- detailed DataFrame summary
packages() -- list of loaded packages
package_info("stats") -- package details
Intent blocks provide structured metadata for LLM-assisted workflows:
i = intent {
description: "Summarize user activity",
assumes: "Data excludes inactive users"
}
intent_fields(i) -- Dict of all fields
intent_get(i, "description") -- specific field value
-- This is a single-line comment
The type() function returns the type name of any value
as a string:
type(42) -- "Int"
type(3.14) -- "Float"
type(true) -- "Bool"
type("hello") -- "String"
type([1, 2]) -- "List"
type({x: 1}) -- "Dict"
type(NA) -- "NA"
type(1 / 0) -- "Error"
type(null) -- "Null"