fct_* Helpers in T

This guide explains how factors work in T, how to use
to_factor(), and how the fct_* family helps
you reorder, relabel, and combine categorical data.
A factor is categorical data with an explicit list of levels.
That level list matters because operations such as
arrange() use the level order instead of alphabetical
order.
sizes = to_factor(["medium", "small", "large"], levels = ["small", "medium", "large"])
levels(sizes)
-- ["small", "medium", "large"]
This makes factors useful for categories with a natural order, such as the sizes above.
to_factor(): explicit or derived levels

Use to_factor() to create categorical data. If you
provide levels, it uses that exact order. If not, it
derives unique levels from the data and sorts them alphabetically by
default.
priority = to_factor(
["medium", "low", "high", "medium"],
levels = ["low", "medium", "high"]
)
status = to_factor(["new", "in_progress", "done", "new"])
levels(status)
-- ["done", "in_progress", "new"]
ordered(): ordered factors

Use ordered() when the order is meaningful and should be
preserved as an ordered factor.
ratings = ordered(
["bad", "ok", "great"],
levels = ["bad", "ok", "great"]
)
Why the fct_* Prefix Exists

The fct_* prefix is used for helpers that manipulate
factor levels after creation.
These helpers are analogous to the factor tools popularized by
forcats in R.
Examples:
fct_infreq(x)
fct_rev(x)
fct_recode(x, LARGE = "large")
fct_reorder(x, scores)
fct_lump_n(x, n = 3)
levels(priority)
df |> mutate($segment = fct_infreq($segment))
df |> mutate($segment = fct_rev($segment))
df |> mutate($segment = fct_recode($segment, ENTERPRISE = "enterprise", SMB = "small_business"))
df |> mutate($segment = fct_reorder($segment, $revenue))
df |> mutate($segment = fct_relevel($segment, "enterprise", "midmarket"))
df |> mutate($segment = fct_collapse($segment, commercial = ["enterprise", "midmarket"], self_serve = "small_business"))
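To see what these helpers do to the level list, it can help to try them on a small standalone factor and inspect the result with levels(). The sketch below only combines calls already shown in this guide. It assumes that -- starts a comment, as in the output lines above, and that fct_relevel() accepts a single level name. Outputs are deliberately not listed; check them in your own session.

```t
sizes = to_factor(["medium", "small", "large"], levels = ["small", "medium", "large"])

-- Reverse the declared level order
levels(fct_rev(sizes))

-- Move "large" to the front of the level list (single-name call is an assumption)
levels(fct_relevel(sizes, "large"))

-- Rename one level; the new name goes on the left, as in fct_recode(x, LARGE = "large")
levels(fct_recode(sizes, LARGE = "large"))
```

Inspecting levels() after each call is usually quicker than reading the transformed column, because these helpers change the level list rather than the number of rows.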
The fct_lump_* helpers keep the most frequent levels
and group the rest into an Other bucket by default.
Keep the top n levels:
fct_lump_n($species, n = 2)
fct_lump_min(to_factor(["a", "a", "b", "c"]), 2)
levels(fct_lump_min(to_factor(["a", "a", "b", "c"]), 2))
-- ["a", "Other"]
fct_lump_prop($segment, 0.10)
You can also set a custom replacement label with
other_level = "Misc".
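A minimal sketch of the custom label, reusing the fct_lump_min() example above. It assumes other_level is passed as a named argument after the threshold; the variable names x and lumped are only illustrative.

```t
x = to_factor(["a", "a", "b", "c"])

-- Same lumping as before, but with a custom bucket label (argument position is an assumption)
lumped = fct_lump_min(x, 2, other_level = "Misc")

-- Inspect the resulting level list
levels(lumped)
```

If the call behaves like the default case, the levels that fall below the threshold should appear under Misc instead of Other.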
fct_other()
levels(fct_other(to_factor(["a", "b", "c"]), keep = ["a"]))
-- ["a", "Other"]
fct_drop()
levels(fct_drop(to_factor(["a", "b"], levels = ["a", "b", "c"])))
-- ["a", "b"]
fct_expand()
levels(fct_expand(to_factor(["a"]), "b", "c"))
-- ["a", "b", "c"]
fct_c()
levels(fct_c(to_factor(["a"], levels = ["a", "b"]), to_factor(["c"])))
-- ["a", "b", "c"]
A factor keeps its declared level order during sorting.
df = crossing(size = ["medium", "small", "large"], id = [1, 2])
df
|> mutate($size_fct = to_factor($size, levels = ["small", "medium", "large"]))
|> arrange($size_fct)
This sorts rows by small, then medium, then
large, whereas alphabetical order would put large first,
then medium, then small.
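Because arrange() follows the level order, reversing the levels is one way to get the opposite sort. The sketch below is an assumption built from calls shown earlier in this guide (fct_rev() inside mutate(), then arrange()); it has not been checked against a T session.

```t
df = crossing(size = ["medium", "small", "large"], id = [1, 2])
df
|> mutate($size_fct = to_factor($size, levels = ["small", "medium", "large"]))
-- Reverse the declared level order so that large comes first
|> mutate($size_fct = fct_rev($size_fct))
|> arrange($size_fct)
```

If fct_rev() reverses the level list as its name suggests, the rows should come out as large, then medium, then small.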
Use:
- to_factor() when you want explicit or derived levels,
- ordered() when the factor should be marked as ordered,
- fct_* helpers when changing levels after creation,
- levels() when you need to inspect the current level set.

These factor helpers are currently implemented alongside the
data-manipulation verbs in T’s colcraft package, but the
naming convention is the one you would expect from a dedicated
factor toolkit: factor creation helpers plus
fct_* level-manipulation helpers.
Now that you can handle categorical data, explore vector operations and statistical modeling in T.