Factors and `fct_*` Helpers in T

This guide explains how factors work in T, how to use to_factor(), and how the fct_* family helps you reorder, relabel, and combine categorical data.

The Basic Idea

A factor is categorical data with an explicit list of levels.

That level list matters because operations such as arrange() use the level order instead of alphabetical order.

sizes = to_factor(["medium", "small", "large"], levels = ["small", "medium", "large"])
levels(sizes)
-- ["small", "medium", "large"]

This makes factors useful for ordered categories such as:

shirt sizes,
survey responses,
month names,
reporting buckets.

Creating Factors

`to_factor()` — explicit or derived levels

Use to_factor() to create categorical data. If you provide levels, it uses that exact order. If not, it derives unique levels from the data and sorts them alphabetically by default.

priority = to_factor(
  ["medium", "low", "high", "medium"],
  levels = ["low", "medium", "high"]
)

status = to_factor(["new", "in_progress", "done", "new"])
levels(status)
-- ["done", "in_progress", "new"]
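The derived-levels rule can be modeled outside T in a few lines. The sketch below is plain Python, not T's implementation, and `derive_levels` is a hypothetical name used only for illustration. It assumes "unique levels sorted alphabetically" means ordinary string sorting.

```python
def derive_levels(values):
    """Return the distinct values in alphabetical order."""
    return sorted(set(values))


status_levels = derive_levels(["new", "in_progress", "done", "new"])
print(status_levels)  # ['done', 'in_progress', 'new']
```

The result matches the levels(status) output shown above: duplicates collapse to one level each, and the first-seen order of the data is discarded.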

`ordered()` — ordered factors

Use ordered() when the order is meaningful and should be preserved as an ordered factor.

ratings = ordered(
  ["bad", "ok", "great"],
  levels = ["bad", "ok", "great"]
)

Why the `fct_*` Prefix Exists

The fct_* prefix marks helpers that manipulate factor levels after creation.

These helpers are analogous to the factor tools popularized by forcats in R:

they keep the input as factor data,
they operate on levels or factor ordering,
and they make factor-specific intent obvious in a pipeline.

Examples:

fct_infreq(x)
fct_rev(x)
fct_recode(x, LARGE = "large")
fct_reorder(x, scores)
fct_lump_n(x, n = 3)

Core Factor Workflow

Inspect levels

levels(priority)

Reorder levels by frequency

df |> mutate($segment = fct_infreq($segment))
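To make "reorder by frequency" concrete, here is a plain-Python model of the idea, not T code. The function name `infreq_levels` and the sample data are hypothetical. The sketch assumes the most frequent level comes first and that ties keep first-seen order; T's actual tie-breaking may differ.

```python
from collections import Counter


def infreq_levels(values):
    """Order levels from most to least frequent."""
    counts = Counter(values)
    # most_common() sorts by count, descending; equal counts keep first-seen order.
    return [level for level, _ in counts.most_common()]


segments = ["smb", "enterprise", "smb", "midmarket", "smb", "enterprise"]
segment_levels = infreq_levels(segments)
print(segment_levels)  # ['smb', 'enterprise', 'midmarket']
```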

Reverse the current order

df |> mutate($segment = fct_rev($segment))

Recode level names

df |> mutate($segment = fct_recode($segment, ENTERPRISE = "enterprise", SMB = "small_business"))

Reorder levels using another variable

df |> mutate($segment = fct_reorder($segment, $revenue))
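Reordering by another variable means summarizing that variable within each level and sorting the levels by the summary. The plain-Python sketch below illustrates this under two assumptions that this guide does not confirm for T: the summary is the median (the default in R's forcats) and the sort is ascending. `reorder_levels` and the sample numbers are hypothetical.

```python
from statistics import median


def reorder_levels(categories, values, summary=median):
    """Sort levels by a per-level summary of a second variable."""
    grouped = {}
    for category, value in zip(categories, values):
        grouped.setdefault(category, []).append(value)
    # Ascending by summary; the median default is an assumption, not T's documented rule.
    return sorted(grouped, key=lambda level: summary(grouped[level]))


segments = ["enterprise", "smb", "midmarket", "smb", "enterprise"]
revenue = [900, 40, 300, 60, 1100]
reordered = reorder_levels(segments, revenue)
print(reordered)  # ['smb', 'midmarket', 'enterprise']
```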

Move selected levels to the front or after a position

df |> mutate($segment = fct_relevel($segment, "enterprise", "midmarket"))

Collapse several levels into broader groups

df |> mutate($segment = fct_collapse($segment, commercial = ["enterprise", "midmarket"], self_serve = "small_business"))
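Collapsing is essentially a many-to-one relabeling. The sketch below models it in plain Python rather than T. `collapse` is a hypothetical helper, and it assumes that levels not named in any group pass through unchanged, which is how forcats behaves but is not stated here for T.

```python
def collapse(values, groups):
    """Relabel values using a {new_level: old_level_or_list} mapping."""
    lookup = {}
    for new_level, old_levels in groups.items():
        if isinstance(old_levels, str):
            old_levels = [old_levels]  # accept a single level as well as a list
        for old in old_levels:
            lookup[old] = new_level
    # Assumption: values outside every group are kept as they are.
    return [lookup.get(value, value) for value in values]


collapsed = collapse(
    ["enterprise", "small_business", "midmarket", "partner"],
    {"commercial": ["enterprise", "midmarket"], "self_serve": "small_business"},
)
print(collapsed)  # ['commercial', 'self_serve', 'commercial', 'partner']
```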

Lumping and “Other”

The fct_lump_* helpers keep the most important levels and group the rest into an Other bucket by default.

Keep the top `n` levels

fct_lump_n($species, n = 2)

Keep levels with at least a minimum count

fct_lump_min(to_factor(["a", "a", "b", "c"]), 2)
levels(fct_lump_min(to_factor(["a", "a", "b", "c"]), 2))
-- ["a", "Other"]

Keep levels above a minimum proportion

fct_lump_prop($segment, 0.10)

You can also set a custom replacement label with other_level = "Misc".
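The minimum-count rule and the replacement label can be modeled in plain Python as follows. This is an illustration of the semantics, not T's implementation, and `lump_min` is a hypothetical name. It assumes a level is kept when its count is at least the threshold, which is consistent with the fct_lump_min example above.

```python
from collections import Counter


def lump_min(values, min_count, other_level="Other"):
    """Replace values whose level occurs fewer than min_count times."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_level for v in values]


lumped = lump_min(["a", "a", "b", "c"], 2)
print(lumped)  # ['a', 'a', 'Other', 'Other']

relabeled = lump_min(["a", "a", "b", "c"], 2, other_level="Misc")
print(relabeled)  # ['a', 'a', 'Misc', 'Misc']
```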

Other Useful Helpers

Keep or drop selected levels with `fct_other()`

levels(fct_other(to_factor(["a", "b", "c"]), keep = ["a"]))
-- ["a", "Other"]

Remove unused levels with `fct_drop()`

levels(fct_drop(to_factor(["a", "b"], levels = ["a", "b", "c"])))
-- ["a", "b"]

Add levels without changing existing values with `fct_expand()`

levels(fct_expand(to_factor(["a"]), "b", "c"))
-- ["a", "b", "c"]

Combine factors with unified levels using `fct_c()`

levels(fct_c(to_factor(["a"], levels = ["a", "b"]), to_factor(["c"])))
-- ["a", "b", "c"]
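Unifying levels amounts to an order-preserving union of the level lists. The plain-Python sketch below shows one way to get the result above; `combine_levels` is a hypothetical name, and the first-appearance ordering is an assumption that happens to agree with the example.

```python
def combine_levels(*level_lists):
    """Union several level lists, keeping first-appearance order."""
    combined = []
    for levels in level_lists:
        for level in levels:
            if level not in combined:
                combined.append(level)
    return combined


unified = combine_levels(["a", "b"], ["c"])
print(unified)  # ['a', 'b', 'c']
```

Note that "b" survives even though no value uses it, because the union is taken over declared levels rather than over observed values.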

Sorting with Factors

A factor keeps its declared level order during sorting.

df = crossing(size = ["medium", "small", "large"], id = [1, 2])

df
  |> mutate($size_fct = to_factor($size, levels = ["small", "medium", "large"]))
  |> arrange($size_fct)

This sorts rows by small, then medium, then large, even though alphabetical order would be different.
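Sorting by a factor can be thought of as sorting by each value's position in the level list. The sketch below models this in plain Python, not T; the row tuples are a hypothetical stand-in for the data frame built above.

```python
levels = ["small", "medium", "large"]
rank = {level: position for position, level in enumerate(levels)}

rows = [(size, row_id) for size in ["medium", "small", "large"] for row_id in [1, 2]]

# Sort by level position instead of by the string itself.
by_level = sorted(rows, key=lambda row: rank[row[0]])
alphabetical = sorted(rows, key=lambda row: row[0])

print([size for size, _ in by_level])
# ['small', 'small', 'medium', 'medium', 'large', 'large']
print([size for size, _ in alphabetical])
# ['large', 'large', 'medium', 'medium', 'small', 'small']
```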

Choosing the Right Helper

Use:

to_factor() when you want explicit or derived levels,
ordered() when the factor should be marked as ordered,
fct_* helpers when changing levels after creation,
levels() when you need to inspect the current level set.

These factor helpers are currently implemented alongside the data-manipulation verbs in T’s colcraft package, but the naming convention is what you would expect from a dedicated factor toolkit: factor creation helpers plus fct_* level-manipulation helpers.

Next Steps

Now that you can handle categorical data, explore vector operations and statistical modeling in T:

Arrays and Matrices — Vector and matrix operations.
Formulas and Models — Statistical modeling in T.
Pipeline Tutorial — Build reproducible data pipelines.
API Reference — Complete function reference by package.
