T Programming Language - factors

Factors and fct_* Helpers in T

This guide explains how factors work in T, when to use factor() versus fct(), and how the fct_* family helps you reorder, relabel, and combine categorical data.


The Basic Idea

A factor is categorical data with an explicit list of levels.

That level list matters because operations such as arrange() use the level order instead of alphabetical order.

sizes = factor(["medium", "small", "large"], levels = ["small", "medium", "large"])
levels(sizes)
-- ["small", "medium", "large"]
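Conceptually, a factor pairs its values with a level list, and ordering uses each value's position in that list. The Python sketch below models only that idea. The `Factor` class and its `sort_values` method are hypothetical names, not T's internal representation.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    """Hypothetical model of a factor: raw values plus an explicit level list."""
    values: list[str]
    levels: list[str]

    def sort_values(self) -> list[str]:
        # Sort by each value's position in the level list, not alphabetically.
        position = {level: i for i, level in enumerate(self.levels)}
        return sorted(self.values, key=position.__getitem__)


sizes = Factor(["medium", "small", "large"], levels=["small", "medium", "large"])
print(sizes.sort_values())   # ['small', 'medium', 'large']
print(sorted(sizes.values))  # ['large', 'medium', 'small'] (alphabetical)
```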

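This makes factors useful for categories with a natural order, such as sizes (small, medium, large), priorities (low, medium, high), or ratings (bad, ok, great).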


Creating Factors

factor() — explicit categorical levels

Use factor() when you want to control the level order yourself.

priority = factor(
  ["medium", "low", "high", "medium"],
  levels = ["low", "medium", "high"]
)

If you do not provide levels, factor() derives them from the data.

fct() — levels follow first appearance

Use fct() when you want levels to keep the order in which values first appear.

status = fct(["new", "in_progress", "done", "new"])
levels(status)
-- ["new", "in_progress", "done"]
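The first-appearance rule that fct() follows can be modeled in a couple of lines of Python. This is a sketch of the rule as described here, not T's implementation. `first_appearance_levels` is a hypothetical name.

```python
def first_appearance_levels(values: list[str]) -> list[str]:
    # dict keys keep insertion order, so this yields each distinct value
    # once, in the order it first appears.
    return list(dict.fromkeys(values))


status = ["new", "in_progress", "done", "new"]
print(first_appearance_levels(status))  # ['new', 'in_progress', 'done']
```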

as_factor() — coerce existing values

Use as_factor() to coerce an existing vector or column into a factor.

df |> mutate($segment = as_factor($segment))

ordered() — ordered factors

Use ordered() when the order is meaningful and should be preserved as an ordered factor.

ratings = ordered(
  ["bad", "ok", "great"],
  levels = ["bad", "ok", "great"]
)

Why the fct_* Prefix Exists

The fct_* prefix is used for helpers that manipulate factor levels after creation.

These helpers are analogous to the factor tools popularized by the forcats package in R.

Examples:

fct_infreq(x)
fct_rev(x)
fct_recode(x, LARGE = "large")
fct_reorder(x, scores)
fct_lump_n(x, n = 3)

Core Factor Workflow

Inspect levels

levels(priority)

Reorder levels by frequency

df |> mutate($segment = fct_infreq($segment))
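A Python sketch of frequency ordering, assuming (as in forcats) that the most frequent level comes first. How T breaks ties is not stated here; this sketch keeps tied levels in their existing order. The function name and the sample data are hypothetical.

```python
from collections import Counter


def infreq_levels(values: list[str], levels: list[str]) -> list[str]:
    counts = Counter(values)  # unused levels count as 0
    # Most frequent first; sorted() is stable, so ties keep their current order.
    return sorted(levels, key=lambda level: -counts[level])


segment = ["smb", "enterprise", "smb", "midmarket", "smb", "enterprise"]
print(infreq_levels(segment, ["enterprise", "midmarket", "smb"]))
# ['smb', 'enterprise', 'midmarket']
```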

Reverse the current order

df |> mutate($segment = fct_rev($segment))

Recode level names

df |> mutate($segment = fct_recode($segment, ENTERPRISE = "enterprise", SMB = "small_business"))
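fct_recode() takes new = old pairs. The Python sketch below models that renaming: values and levels are renamed, level order is kept, and unlisted levels pass through unchanged. Merging levels that end up with the same name is an assumption borrowed from forcats, and `recode` is a hypothetical name.

```python
def recode(values: list[str], levels: list[str], **new_to_old: str):
    old_to_new = {old: new for new, old in new_to_old.items()}

    def rename(level: str) -> str:
        # Levels not mentioned in new_to_old keep their name.
        return old_to_new.get(level, level)

    new_values = [rename(v) for v in values]
    # Keep the original level order; dict.fromkeys merges duplicate names.
    new_levels = list(dict.fromkeys(rename(level) for level in levels))
    return new_values, new_levels


values, levels = recode(
    ["enterprise", "small_business", "midmarket"],
    ["enterprise", "midmarket", "small_business"],
    ENTERPRISE="enterprise",
    SMB="small_business",
)
print(values)  # ['ENTERPRISE', 'SMB', 'midmarket']
print(levels)  # ['ENTERPRISE', 'midmarket', 'SMB']
```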

Reorder levels using another variable

df |> mutate($segment = fct_reorder($segment, $revenue))
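How fct_reorder() summarizes the other variable is not specified above. The Python sketch below assumes the forcats default: each level is summarized by the median of its values, and levels are sorted in ascending order. It also assumes every level is observed at least once. `reorder_levels` and the revenue figures are hypothetical.

```python
from statistics import median


def reorder_levels(values: list[str], levels: list[str], by: list[float]) -> list[str]:
    # Group the `by` values under the level of the matching row.
    groups: dict[str, list[float]] = {level: [] for level in levels}
    for level, x in zip(values, by):
        groups[level].append(x)
    # Assumed summary: median, ascending. median() raises on an empty
    # group, so this sketch requires every level to be observed.
    return sorted(levels, key=lambda level: median(groups[level]))


segment = ["smb", "enterprise", "smb", "midmarket"]
revenue = [10, 500, 30, 120]
print(reorder_levels(segment, ["enterprise", "midmarket", "smb"], revenue))
# ['smb', 'midmarket', 'enterprise']
```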

Move selected levels to the front or after a position

df |> mutate($segment = fct_relevel($segment, "enterprise", "midmarket"))
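fct_relevel() as used above moves the named levels to the front. A Python sketch of that case, assuming the remaining levels keep their relative order. It rejects unknown names, which is a choice of this sketch rather than documented T behavior; `relevel` is a hypothetical name, and placement after a position is not modeled.

```python
def relevel(levels: list[str], *front: str) -> list[str]:
    unknown = [level for level in front if level not in levels]
    if unknown:
        # This sketch treats unknown names as an error.
        raise ValueError(f"unknown levels: {unknown}")
    # Named levels first, in the order given; the rest keep their order.
    rest = [level for level in levels if level not in front]
    return list(front) + rest


print(relevel(["small_business", "midmarket", "enterprise"], "enterprise", "midmarket"))
# ['enterprise', 'midmarket', 'small_business']
```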

Collapse several levels into broader groups

df |> mutate($segment = fct_collapse($segment, commercial = ["enterprise", "midmarket"], self_serve = "small_business"))
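fct_collapse() maps several old levels onto one new level. The Python sketch below models that mapping. Placing each new level where its first member used to be is an assumption borrowed from forcats, and `collapse` is a hypothetical name.

```python
def collapse(values: list[str], levels: list[str], **groups):
    old_to_new: dict[str, str] = {}
    for new, olds in groups.items():
        # Each group may name one old level (a string) or several (a list).
        for old in [olds] if isinstance(olds, str) else olds:
            old_to_new[old] = new
    new_values = [old_to_new.get(v, v) for v in values]
    # Each new level takes the position of its first member; duplicates merge.
    new_levels = list(dict.fromkeys(old_to_new.get(level, level) for level in levels))
    return new_values, new_levels


values, levels = collapse(
    ["enterprise", "small_business", "midmarket"],
    ["enterprise", "midmarket", "small_business"],
    commercial=["enterprise", "midmarket"],
    self_serve="small_business",
)
print(values)  # ['commercial', 'self_serve', 'commercial']
print(levels)  # ['commercial', 'self_serve']
```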

Lumping and “Other”

The fct_lump_* helpers keep the most common levels and collapse the rest into a single level, named Other by default.

Keep the top n levels

fct_lump_n($species, n = 2)

Keep levels with at least a minimum count

x = fct_lump_min(fct(["a", "a", "b", "c"]), 2)
levels(x)
-- ["a", "Other"]
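The minimum-count rule can be sketched in Python as follows. It reproduces the example above: a appears twice and stays, while b and c appear once and are lumped. Keeping the surviving levels in their existing order and putting the lumped level last are assumptions. `lump_min` is a hypothetical name, and its `other_level` parameter mirrors the option of the same name.

```python
from collections import Counter


def lump_min(values: list[str], levels: list[str], min_count: int, other_level: str = "Other"):
    counts = Counter(values)
    kept = [level for level in levels if counts[level] >= min_count]
    new_values = [v if counts[v] >= min_count else other_level for v in values]
    # Append the lumped level only if at least one level was lumped.
    new_levels = kept + ([other_level] if len(kept) < len(levels) else [])
    return new_values, new_levels


values, levels = lump_min(["a", "a", "b", "c"], ["a", "b", "c"], 2)
print(values)  # ['a', 'a', 'Other', 'Other']
print(levels)  # ['a', 'Other']
```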

Keep levels above a minimum proportion

fct_lump_prop($segment, 0.10)

You can also set a custom replacement label with other_level = "Misc".


Other Useful Helpers

Keep or drop selected levels with fct_other()

levels(fct_other(fct(["a", "b", "c"]), keep = ["a"]))
-- ["a", "Other"]
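With keep, fct_other() works like a targeted lump: the listed levels survive and everything else becomes Other. A Python sketch of the keep case only (dropping is not modeled). `other` is a hypothetical name, and appending Other last is an assumption that matches the example above.

```python
def other(values: list[str], levels: list[str], keep: list[str], other_level: str = "Other"):
    new_values = [v if v in keep else other_level for v in values]
    new_levels = [level for level in levels if level in keep]
    if len(new_levels) < len(levels):
        # Some levels were replaced, so the catch-all level goes last.
        new_levels.append(other_level)
    return new_values, new_levels


values, levels = other(["a", "b", "c"], ["a", "b", "c"], keep=["a"])
print(values)  # ['a', 'Other', 'Other']
print(levels)  # ['a', 'Other']
```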

Remove unused levels with fct_drop()

levels(fct_drop(factor(["a", "b"], levels = ["a", "b", "c"])))
-- ["a", "b"]

Add levels without changing existing values with fct_expand()

levels(fct_expand(fct(["a"]), "b", "c"))
-- ["a", "b", "c"]

Combine factors with unified levels using fct_c()

levels(fct_c(fct(["a"], levels = ["a", "b"]), fct(["c"])))
-- ["a", "b", "c"]
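fct_c() has to reconcile different level lists. One rule consistent with the example above is to concatenate the values and take the union of the level lists in order of first appearance. The Python sketch below uses that rule, which is an assumption; `combine` is a hypothetical name.

```python
def combine(*factors: tuple[list[str], list[str]]):
    values: list[str] = []
    levels: list[str] = []
    for f_values, f_levels in factors:
        values.extend(f_values)
        levels.extend(f_levels)
    # Union of all level lists; the first appearance of each level wins.
    return values, list(dict.fromkeys(levels))


values, levels = combine((["a"], ["a", "b"]), (["c"], ["c"]))
print(values)  # ['a', 'c']
print(levels)  # ['a', 'b', 'c']
```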

Sorting with Factors

When you sort by a factor, arrange() follows its declared level order rather than alphabetical order.

df = crossing(size = ["medium", "small", "large"], id = [1, 2])

df
  |> mutate($size_fct = factor($size, levels = ["small", "medium", "large"]))
  |> arrange($size_fct)

This sorts rows as small, then medium, then large, even though alphabetical order would give large, medium, small.


Choosing the Right Helper

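Use:

- factor() when you want to set the level order yourself
- fct() when levels should follow first appearance
- as_factor() to coerce an existing vector or column
- ordered() when the order is meaningful and should be kept as an ordered factor
- fct_infreq(), fct_rev(), fct_reorder(), or fct_relevel() to change the level order
- fct_recode() or fct_collapse() to rename or group levels
- fct_lump_n(), fct_lump_min(), fct_lump_prop(), or fct_other() to fold rare or unselected levels into Other
- fct_drop(), fct_expand(), or fct_c() to remove, add, or unify levels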

Package Note

These factor helpers currently live alongside the data-manipulation verbs in T’s colcraft package. The naming still follows the convention you would expect from a dedicated factor toolkit: creation helpers such as factor(), fct(), as_factor(), and ordered(), plus fct_* helpers for manipulating levels.