T Language Overview

Version: 0.51.2

T is a functional programming language designed for declarative, tabular data manipulation. It combines the pipeline-driven style of R’s tidyverse with OCaml’s type discipline, producing a small, focused language for data wrangling and basic statistics.


Quick Syntax Tour

-- Variable assignment
x = 10
name = "Alice"

-- Variable re-assignment (Shadowing)
x := 20

-- Function definition (lambda style preferred)
add = \(a, b) a + b

-- Conditionals
result = if (x > 5) "high" else "low"

-- Pipe operators
-- |> passes the left-hand value as the first argument to the right-hand
--    function; short-circuits on Error
-- ?|> forwards the left-hand value even when it is an Error (for recovery)
[1, 2, 3] |> map(\(v) v * v) |> sum    -- 14

-- Import a whole package
import mypackage

-- Import specific names from a package or another .t file
import mypackage [fn1, fn2]
import "helpers.t" [clean_data, normalize]

-- Sequence generation
seq(5)                -- [1, 2, 3, 4, 5]
seq(1, 10, by = 2)    -- [1, 3, 5, 7, 9]

Core Concepts

Values and Types

T supports the following value types:

Type        Example                Description
Int         42                     Integer numbers
Float       3.14                   Floating-point numbers
Bool        true, false            Boolean values
String      "hello"                Text strings
List        [1, 2, 3]              Ordered, heterogeneous collections
Dict        {x: 1, y: 2}           Key-value maps with string keys
Vector      column data            Typed arrays (from DataFrames)
DataFrame   read_csv("data.csv")   Tabular data (rows × columns)
Pipeline    pipeline { ... }       DAG-based execution graph
Function    \(x) x + 1             First-class functions
NA          NA                     Explicit missing value
Error       error("msg")           Structured error value
Null        null                   Absence of value
Symbol      $mpg                   Name reference (NSE, DataFrames)
Expression  expr(1 + 2)            Captured code (for metaprogramming)
Intent      intent { ... }         LLM-friendly metadata block

Variables and Assignment

x = 42
name = "Alice"
active = true

Variables are bound with =. All values are immutable by default. To overwrite an existing binding (shadowing), use the := operator:

x = 10
x := 20    -- Overwrites x with 20

To remove a variable from the environment entirely, use the rm() function. It supports bare symbols, strings, and the list parameter:

rm(x)             -- Removes x from the environment
rm("name")        -- Removes name by string
rm(a, b, c)       -- Removes multiple variables

vars = ["x", "y"]
rm(list = vars)   -- Removes variables 'x' and 'y'

Arithmetic and Operators

2 + 3       -- 5
10 - 3      -- 7
4 * 5       -- 20
15 / 3      -- 5
1 + 2.5     -- 3.5 (mixed int/float is promoted)

-- String concatenation is NOT supported with +
-- Use str_join() or paste() instead:
str_join(["hi", " T"], sep = "") -- "hi T"

Operator precedence follows standard mathematical conventions. Parentheses override precedence.
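For example, multiplication binds tighter than addition, and parentheses regroup as expected:

2 + 3 * 4      -- 14
(2 + 3) * 4    -- 20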

Comparisons

5 == 5    -- true
5 != 3    -- true
3 < 5     -- true
5 > 3     -- true
5 >= 5    -- true
2 <= 2    -- true

Boolean Logic

true and true   -- true
true and false  -- false
false or true   -- true
not true        -- false

Tooling

REPL

The T REPL (dune exec src/repl.exe) supports interactive evaluation of T code.

Language Server Protocol (LSP)

T ships a native LSP server (t-lsp) for editor integrations such as VS Code.


Functions

Lambda Syntax

T supports two equivalent function definition syntaxes:

-- R-style lambda (preferred)
square = \(x) x * x

-- function keyword
square = function(x) x * x

Multi-Argument Functions

add = \(a, b) a + b
add(3, 7)  -- 10

Closures

Functions capture their enclosing environment:

make_adder = \(n) \(x) x + n
add5 = make_adder(5)
add5(10)  -- 15

Higher-Order Functions

numbers = [1, 2, 3, 4, 5]
map(numbers, \(x) x * x)     -- [1, 4, 9, 16, 25]
filter(numbers, \(x) x > 3)  -- [4, 5]

Pipe Operators

T provides two pipe operators with different error-handling semantics.

Conditional Pipe (|>)

The standard pipe |> passes the left-hand value as the first argument to the right-hand function. If the left-hand value is an error, the pipeline short-circuits and the error is returned immediately without calling the function:

5 |> \(x) x * 2           -- 10
5 |> add(3)                -- 8 (equivalent to add(5, 3))
[1, 2, 3] |> map(\(x) x * x) |> sum  -- 14

-- Short-circuits on error:
error("boom") |> double    -- Error(GenericError: "boom")

Pipes work across lines for readability:

[1, 2, 3, 4, 5]
  |> map(\(x) x * x)
  |> sum                   -- 55

Maybe-Pipe (?|>)

The maybe-pipe ?|> always forwards the left-hand value to the right-hand function, even if it is an error. This enables explicit error recovery patterns:

-- Forward normal values (same as |>):
5 ?|> double               -- 10

-- Forward errors to recovery functions:
handle = \(x) if (is_error(x)) "recovered" else x
error("boom") ?|> handle   -- "recovered"

-- Chain recovery with normal processing:
recovery = \(x) if (is_error(x)) 0 else x
increment = \(x) x + 1
error("fail") ?|> recovery |> increment  -- 1

The maybe-pipe also works across lines:

error("fail")
  ?|> recovery
  |> increment             -- 1

When to Use Each Pipe

Operator   On Error               Use Case
|>         Short-circuits         Normal data pipelines
?|>        Forwards to function   Error recovery, fallback values

Conditionals

result = if (5 > 3) "yes" else "no"   -- "yes"

Conditionals are expressions — they return values.
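Because if returns a value, conditionals can be chained and used anywhere an expression is expected (a sketch, assuming an if expression may follow else):

score = 87
grade = if (score > 90) "A" else if (score > 80) "B" else "C"   -- "B"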


Pattern Matching (New)

T now supports pattern matching on lists and errors using the match expression. This addition provides a more declarative way to destructure data and handle error states.

-- Matching on lists
x = [1, 2, 3]
result = match(x) {
  [h, ..t] => h,
  [] => 0
} -- 1

-- Matching on errors
res = 1 / 0
msg = match(res) {
  Error { msg: m } => m,
  _ => "not an error"
} -- "Division by zero"

Match arms are checked in order. If no patterns match, a MatchError is returned. Pattern bindings are scoped to the selected arm and do not leak out.

Data Science Examples

Pattern matching is particularly useful for robust data pipelines and statistical reporting:

-- 1. Graceful fallback for data loading
df = read_csv("data.csv")
data = match(df) {
  Error { msg } => {
    print(str_sprintf("Warning: %s. Falling back to backup.", msg));
    read_csv("backup.csv")
  },
  _ => df
}

-- 2. Destructuring calculation results
stats = [mean(x), sd(x), length(x)]
report = match(stats) {
  [m, s, n] => str_sprintf("Average: %f (±%f) from %d samples", m, s, n),
  _ => "Invalid statistics list"
}

-- 3. Handling pipeline validation
p = my_pipeline
status = match(pipeline_validate(p)) {
  [] => "Pipeline is valid and ready to build.",
  [first_err, ..others] => 
    str_join(["Found ", str_string(length(others) + 1), " validation errors."])
}

Integration with Maybe-Pipe and First-Class Errors

Pattern matching is the final piece of T’s “Errors-as-Values” philosophy. While the maybe-pipe (?|>) is used for simple forwarding and recovery, match provides precise control over different types of outcomes.

-- Comprehensive error recovery pattern
read_csv("raw_data.csv")
  ?|> \(v) match(v) {
    Error { msg: m } => 
      if (str_detect(m, "File not found")) read_csv("default.csv")
      else error("DataLoadError", str_sprintf("Critical failure: %s", m)),

    df => df |> clean_data()
  }

Comparison of Error Handling Patterns

Pattern            Behavior                       Best Used For…
|> (pipe)          Short-circuits on error        Standard “Happy Path” data flows
?|> (maybe-pipe)   Forwards errors to functions   Simple fallback values or generic logging
match              Explicit branching             Complex recovery, destructuring messages, or handling multiple success/failure states

For error handling, match expressions automatically propagate unhandled errors. If the scrutinee is an Error and no pattern explicitly handles it (via Error { ... } or a catch-all _), the original error value is returned.
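This fall-through behavior can be seen with the division error from earlier (a sketch):

res = 1 / 0
match(res) {
  [h, ..t] => h,
  [] => 0
}   -- Error(DivisionByZero: "Division by zero"), since no arm handles the error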


Collections

Lists

numbers = [1, 2, 3]
length(numbers)  -- 3
head(numbers)    -- 1
tail(numbers)    -- [2, 3]

Named Lists

person = [name: "Alice", age: 30]
person.name  -- "Alice"
person.age   -- 30

Dictionaries

config = {host: "localhost", port: 8080}
config.host  -- "localhost"
config.port  -- 8080

Symbols ($)

Symbols are name references that are not immediately evaluated. They are primarily used to reference DataFrame columns in verbs like select($mpg) or filter($cyl > 4).

Symbols are created by prefixing an identifier with $:

s = $my_symbol
type(s)  -- "Symbol"

Missing Values (NA)

T uses explicit NA values with type tags. NA does not propagate implicitly — operations on NA produce errors:

NA             -- untyped NA
na_int()       -- typed NA (Int)
na_float()     -- typed NA (Float)
na_bool()      -- typed NA (Bool)
na_string()    -- typed NA (String)

is_na(NA)      -- true
is_na(42)      -- false

-- NA does not propagate
NA + 1         -- Error(TypeError: "Operation on NA...")

NA Handling in Aggregation Functions

By default, aggregation functions error when they encounter NA values. This forces you to handle missingness explicitly rather than silently ignoring it:

mean([1, 2, NA, 4])           -- Error: NA encountered
sum([1, NA, 3])               -- Error: NA encountered
sd([1, NA, 3])                -- Error: NA encountered
quantile([1, NA, 3], 0.5)     -- Error: NA encountered
cor([1, NA, 3], [4, 5, 6])    -- Error: NA encountered

To compute statistics while skipping NA values, use na_rm = true:

mean([1, 2, NA, 4], na_rm = true)           -- 2.33333333333
sum([1, NA, 3], na_rm = true)               -- 4
sd([2, 4, NA, 9], na_rm = true)             -- 3.60555127546
quantile([1, NA, 3, NA, 5], 0.5, na_rm = true) -- 3.
cor([1, NA, 3, 4], [2, 4, NA, 8], na_rm = true) -- 1. (pairwise deletion)

When all values are NA and na_rm = true, mean and sd return NA(Float), while sum returns 0 (the empty-sum identity):

mean([NA, NA, NA], na_rm = true)    -- NA(Float)
sum([NA, NA, NA], na_rm = true)     -- 0
sd([NA, NA, NA], na_rm = true)      -- NA(Float)

Error Handling

Errors are values, not exceptions. Failed operations return structured Error values:

result = 1 / 0
is_error(result)     -- true
error_code(result)   -- "DivisionByZero"
error_message(result) -- "Division by zero"

-- Custom errors
err = error("something broke")
err = error("ValueError", "invalid input")

Actionable Error Messages

T provides helpful error messages with suggestions:

-- Name suggestions for typos (Levenshtein distance):
prnt(42)           -- Error(NameError: "'prnt' is not defined. Did you mean 'print'?")
slect(df, "name")  -- Error(NameError: "'slect' is not defined. Did you mean 'select'?")

-- Type conversion hints:
1 + true           -- Error(TypeError: "Cannot add Int and Bool. Hint: ...")
[1, 2] + 3         -- Error(TypeError: "Cannot add List and Int. Hint: Use map()...")

-- Function signature in arity errors:
f = \(a, b) a + b
f(1)               -- Error(ArityError: "Expected 2 arguments (a, b) but got 1")

Assert

assert(2 + 2 == 4)                -- true
assert(false, "custom message")   -- Error(AssertionError: ...)

DataFrames

DataFrames are T’s first-class tabular data type, loaded from CSV:

df = read_csv("data.csv")
nrow(df)      -- number of rows
ncol(df)      -- number of columns
colnames(df)  -- list of column names
df.age        -- column as Vector

-- Optional arguments
df = read_csv("data.csv", separator = ";")          -- custom separator
df = read_csv("data.csv", skip_lines = 2)           -- skip first N lines
df = read_csv("raw.csv", skip_header = true)        -- no header row
df = read_csv("messy.csv", clean_colnames = true)   -- normalize column names

-- Save DataFrames
write_csv(df, "output.csv")                         -- write to CSV
write_csv(df, "output.tsv", separator = "\t")       -- custom separator

Column Name Cleaning

When clean_colnames = true, column names are normalized into safe, consistent snake_case identifiers:

-- Original headers: "Growth%", "MILLION€", "café"
df = read_csv("data.csv", clean_colnames = true)
colnames(df)  -- ["growth_percent", "million_euro", "cafe"]

-- Standalone function on an existing DataFrame
df2 = clean_colnames(df)

-- Or on a list of strings
clean_colnames(["A.1", "A-1"])  -- ["a_1", "a_1_2"]

The cleaning pipeline applies these transformations in order:

  1. Symbol expansion (% → percent, € → euro, $ → dollar, £ → pound, ¥ → yen): "growth%" → "growthpercent"
  2. Diacritics stripping: "café" → "cafe"
  3. Lowercasing: "MyCol" → "mycol"
  4. Non-alphanumeric → _, collapsing runs, trimming edges: "foo---bar" → "foo_bar"
  5. Prefixing digit-leading names with x_: "1st" → "x_1st"
  6. Empty names → col_N: "" → "col_1"
  7. Collision resolution (first name unchanged, then _2, _3, …): ["A.1", "A-1"] → ["a_1", "a_1_2"]

The transformation is pure, stable across platforms, and idempotent.
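Idempotence means a second cleaning pass is a no-op, so clean_colnames can be applied defensively at any stage:

clean_colnames(["Growth%"])                  -- ["growth_percent"]
clean_colnames(clean_colnames(["Growth%"]))  -- ["growth_percent"] (unchanged)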

Data Manipulation

T provides six core data verbs with dollar-prefix NSE syntax for concise column references:

-- Use $column for column references
df |> select($name, $age)
df |> filter($age > 25)
df |> mutate($age_plus_10 = $age + 10)    -- named-arg style
df |> arrange($age, "desc")

df |>
  group_by($dept) |>
  summarize($count = nrow($dept))         -- named-arg style

df |>
  group_by($dept) |>
  mutate($dept_size, \(g) nrow(g))        -- lambda style

Named-Arg Syntax ($col = expr)

For mutate and summarize, you can use $col = expr to name the result column and provide an NSE expression in one step:

-- mutate: $new_col = NSE expression (auto-wrapped in lambda per row)
df |> mutate($bonus = $salary * 0.1)
df |> mutate($full_name = $first + " " + $last)

-- summarize: $result_col = NSE aggregation (auto-wrapped in lambda per group)
df |> group_by($dept)
   |> summarize($avg_salary = mean($salary), $count = nrow($dept))

Dollar-Prefix vs Dot Accessor

-- Dollar: references columns contextually (no explicit row variable)
df |> filter($age > 30)

-- Dot: accesses fields on an explicit object (in lambda context)
df |> filter(\(row) row.age > 30)

Window Functions

Window functions compute values across a set of rows without collapsing them. All window functions handle NA gracefully.

Ranking Functions

Ranking functions assign ranks to elements. NA values receive NA rank; ranks are computed only among non-NA values:

row_number([10, 30, 20])    -- Vector[1, 3, 2]   (ties broken by position)
min_rank([1, 1, 2, 2, 2])  -- Vector[1, 1, 3, 3, 3]  (gaps after ties)
dense_rank([1, 1, 2, 2])   -- Vector[1, 1, 2, 2]     (no gaps)
cume_dist([1, 2, 3])        -- Vector[0.333..., 0.666..., 1.0]
percent_rank([1, 2, 3])     -- Vector[0.0, 0.5, 1.0]
ntile([1, 2, 3, 4], 2)     -- Vector[1, 1, 2, 2]

-- NA handling: NA positions get NA rank
row_number([3, NA, 1])     -- Vector[2, NA, 1]
min_rank([3, NA, 1, 3])   -- Vector[2, NA, 1, 2]

Offset Functions (lag/lead)

Shift values forward or backward, filling with NA. NA values in the input pass through:

lag([1, 2, 3, 4])          -- Vector[NA, 1, 2, 3]
lag([1, 2, 3, 4], 2)       -- Vector[NA, NA, 1, 2]
lead([1, 2, 3, 4])         -- Vector[2, 3, 4, NA]
lead([1, 2, 3, 4], 2)      -- Vector[3, 4, NA, NA]

-- NA passes through
lag([1, NA, 3])            -- Vector[NA, 1, NA]

Cumulative Functions

Cumulative functions propagate NA: once NA is encountered, all subsequent values become NA (matching R behavior):

cumsum([1, 2, 3, 4])      -- Vector[1, 3, 6, 10]
cummin([3, 1, 4, 1])      -- Vector[3, 1, 1, 1]
cummax([1, 3, 2, 5])      -- Vector[1, 3, 3, 5]
cummean([2, 4, 6])         -- Vector[2.0, 3.0, 4.0]
cumall([true, true, false]) -- Vector[true, true, false]
cumany([false, true, false]) -- Vector[false, true, true]

-- NA propagation
cumsum([1, NA, 3])         -- Vector[1, NA, NA]

Shell Escape (?<{ }>)

The shell escape syntax allows you to execute arbitrary shell commands directly from within T. This is useful for interacting with the filesystem, running git commands, or using other CLI tools.

As a Statement

When used as a standalone statement, the command is executed and its output is printed directly to stdout. The return value of the expression is null.

?<{ls -la}>
?<{git status}>

As an Expression

When used as part of an expression (e.g., in an assignment), the shell command’s stdout is captured and returned as a String.

files = ?<{ls}>
current_user = ?<{whoami}>

Special Case: cd

The cd command is special-cased to change the working directory of the T interpreter itself, rather than just a sub-shell.

?<{cd /tmp}>
?<{pwd}>       -- will show /tmp
?<{cd ~}>      -- expands ~ correctly

Multiline Commands

Shell escapes support multiline commands. Indentation common to all non-empty lines is automatically stripped.

?<{
  git add .
  git commit -m "update"
  git push
}>

Error Handling

If a shell command fails (returns a non-zero exit code), it produces a ShellError containing the stderr output.

result = ?<{ls /nonexistent}>
is_error(result)  -- true
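Because shell failures are ordinary Error values, they compose with the maybe-pipe recovery patterns shown earlier (a sketch):

result = ?<{ls /nonexistent}>
  ?|> \(x) if (is_error(x)) "" else x   -- fall back to an empty string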

Pipelines

Pipelines define named computation nodes with automatic dependency resolution and provide the foundation for reproducible, polyglot workflows. You can see a complete, polyglot version of this example in the examples/polyglot_pipeline.t file.

p = pipeline {
  -- 1. Load data natively in T (CSV backend)
  data = node(
    command = read_csv("examples/sample_data.csv") |> filter($age > 25),
    serializer = "csv"
  )
  
  -- 2. Train a statistical model in R (using the rn() wrapper)
  model_r = rn(
    command = <{ lm(score ~ age, data = data) }>,
    serializer = "pmml",
    deserializer = "csv"
  )
  
  -- 3. Predict natively in T (no R/Python runtime needed for evaluation!)
  predictions = node(
    command = data |> mutate($pred = predict(data, model_r)),
    deserializer = "pmml"
  )

  -- 4. Generate a shell report
  report = shn(command = <{
    printf 'R model results cached at: %s\n' "$T_NODE_model_r/artifact"
  }>)
}

build_pipeline(p)

-- Access pipeline results
p.data           -- Native T object
p.predictions    -- Results with predictions column

Debugging and Logs

When building a pipeline, T tracks the build status and logs for every node. If a node fails, you can inspect its logs directly from the T interpreter:

-- Get help on log inspection
help(read_log)

-- Read the full Nix build log for a specific node
read_log("model_r")

The read_log() function requires a node name to identify which build output to retrieve. It returns the raw build output as a string, which can be printed with cat() to preserve formatting:

cat(read_log("model_r"))


Instead of inlining code with command, nodes can point to an external file using script. command and script are mutually exclusive, and the runtime can be inferred from file extensions such as .R or .py.
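For example, the R node from the pipeline above could reference an external file instead of an inline command (a sketch; the script path is hypothetical):

model_r = rn(
  script = "scripts/train_model.R",   -- runtime inferred from the .R extension
  serializer = "pmml",
  deserializer = "csv"
)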

Pipeline features:

- Automatic dependency resolution: nodes can be declared in any order
- Deterministic execution: the same inputs always produce the same outputs
- Cycle detection: circular dependencies are caught and reported
- Introspection: pipeline_nodes(), pipeline_deps(), pipeline_node()
- Cross-language: node(), py()/pyn(), rn(), and shn() enable execution in T, Python, R, and shell runtimes
- Environment variables: pass custom variables into Nix build sandboxes via env_vars
- Re-run: pipeline_run() re-executes the pipeline

You can also inspect and transform pipelines without rebuilding them from scratch:

p_full = p_etl |> union(p_model)
p_trimmed = p_full |> difference(p_debug)
p_updated = p_prod |> patch(p_overrides)

pipeline_nodes(p_full)
pipeline_deps(p_full)
pipeline_to_frame(p_full)
pipeline_dot(p_full)

pipeline_validate(p_full)

Standard Library

All packages are loaded automatically at startup:

Package    Functions
core       print, type, length, head, tail, map, filter, sum, seq, getwd, file_exists, dir_exists, read_file, list_files, env, path_join, path_basename, path_dirname, path_ext, path_stem, path_abs, rm
base       assert, is_na, na, na_int, na_float, na_bool, na_string, error, is_error, error_code, error_message, error_context
math       sqrt, abs, log, exp, pow
stats      mean, sd, quantile, cor, lm, predict
dataframe  read_csv, write_csv, colnames, nrow, ncol, clean_colnames
colcraft   select, filter, mutate, arrange, group_by, summarize, rename, relocate, distinct, count, slice, pivot_longer, pivot_wider, joins, missing-value helpers, factor helpers, and window functions
pipeline   node constructors, inspection helpers, validation helpers, DAG transformations, and composition tools
explain    explain, intent_fields, intent_get

Selected Signatures

These signatures provide a compact map of the most commonly used functions:

Core, Base, and Math

Stats and DataFrames

Data Manipulation

Pipelines

Math Functions

Pure numerical primitives that work on scalars and vectors:

sqrt(4)        -- 2.0
abs(0 - 5)     -- 5
log(10)        -- 2.30258509299
exp(1)         -- 2.71828182846
pow(2, 3)      -- 8.0

Stats Functions

Statistical summaries that work on lists and vectors. All aggregation functions support an optional na_rm parameter:

mean([1, 2, 3, 4, 5])             -- 3.0
sd([2, 4, 4, 4, 5, 5, 7, 9])     -- 2.1380899353
quantile([1, 2, 3, 4, 5], 0.5)   -- 3.0 (median)
cor(df.x, df.y)                   -- correlation coefficient
lm(data = df, formula = y ~ x)   -- linear regression model

-- With NA handling (na_rm = true skips NA values):
mean([1, NA, 3], na_rm = true)    -- 2.0
sum([1, NA, 3], na_rm = true)     -- 4
sd([2, NA, 4, 9], na_rm = true)   -- 3.60555127546

Introspection

type(42)          -- "Int"
type(df)          -- "DataFrame"
explain(df)       -- detailed DataFrame summary
packages()        -- list of loaded packages
package_info("stats")  -- package details

Filesystem & Path Operations

T provides a suite of functions for interacting with the filesystem and manipulating paths.

Filesystem

getwd()                        -- "/home/user/project"
file_exists("data.csv")        -- true
dir_exists("tests")            -- true
list_files(".", pattern = ".t") -- ["test1.t", "test2.t"]
read_file("config.json")       -- "{\"port\": 8080}"
env("HOME")                    -- "/home/user"

Path Manipulation (Lexical)

These functions are purely text-based and do not check the disk.

path_join("dir", "sub", "f.txt") -- "dir/sub/f.txt"
path_basename("/a/b/c.txt")      -- "c.txt"
path_dirname("/a/b/c.txt")       -- "/a/b"
path_ext("data.csv")             -- ".csv"
path_stem("data.csv")            -- "data"
path_abs("local.txt")            -- "/absolute/path/to/local.txt"

Intent Blocks

Intent blocks provide structured metadata for LLM-assisted workflows:

i = intent {
  description: "Summarize user activity",
  assumes: "Data excludes inactive users"
}

intent_fields(i)            -- Dict of all fields
intent_get(i, "description") -- specific field value

Comments

-- This is a single-line comment

Metaprogramming

T supports Lisp-style quotation and quasiquotation for code generation and DSL building.

x = 10
captured = expr(1 + !!x)
print(captured)  -- expr(1 + 10)
eval(captured)     -- 11
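Because !! unquotes inside expr, captured code can also be assembled programmatically (a sketch built from the primitives above):

make_sum = \(a, b) expr(!!a + !!b)
e = make_sum(2, 3)   -- expr(2 + 3)
eval(e)              -- 5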

See the Quotation Guide for more details.


Type Introspection

The type() function returns the type name of any value as a string:

type(42)             -- "Int"
type(3.14)           -- "Float"
type(true)           -- "Bool"
type("hello")        -- "String"
type([1, 2])         -- "List"
type({x: 1})         -- "Dict"
type(NA)             -- "NA"
type(1 / 0)          -- "Error"
type(null)           -- "Null"

Next Steps

Now that you have a solid grasp of T’s syntax and core features, explore these topics next:

  1. Type System — Understand T’s mode-aware type system and lambda signatures.
  2. Pipeline Tutorial — Learn how to build reproducible, DAG-based data analysis workflows.
  3. Data Manipulation Examples — See T’s data verbs in action with worked examples.

For more hands-on examples and complete project templates, visit the T Demos Repository.