fct_* Helpers in T
This guide explains how factors work in T, when to use
factor() versus fct(), and how the
fct_* family helps you reorder, relabel, and combine
categorical data.
A factor is categorical data with an explicit list of levels.
That level list matters because operations such as
arrange() use the level order instead of alphabetical
order.
sizes = factor(["medium", "small", "large"], levels = ["small", "medium", "large"])
levels(sizes)
-- ["small", "medium", "large"]
This makes factors useful for ordered categories such as sizes (small, medium, large), priorities (low, medium, high), and ratings (bad, ok, great).
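To make the "values plus an explicit level list" idea concrete, here is a minimal Python model of a factor. It is not T's implementation. The `Factor` class and `make_factor()` function are hypothetical. Falling back to sorted unique values when no levels are given is an assumption borrowed from R's convention, because this guide only says that factor() derives levels from the data.

```python
# Minimal Python model of a factor: integer codes that index into an
# explicit, ordered list of levels. Illustration only, not T's implementation.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Factor:
    codes: List[int]   # position of each value within `levels`
    levels: List[str]  # declared level order

    def values(self) -> List[str]:
        return [self.levels[c] for c in self.codes]


def make_factor(values: List[str], levels: Optional[List[str]] = None) -> Factor:
    # Assumption: with no explicit levels, fall back to sorted unique values
    # (R's convention; this guide does not state T's rule).
    if levels is None:
        levels = sorted(set(values))
    position = {lvl: i for i, lvl in enumerate(levels)}
    # A value that is not a declared level raises KeyError in this sketch.
    codes = [position[v] for v in values]
    return Factor(codes, list(levels))


sizes = make_factor(["medium", "small", "large"], levels=["small", "medium", "large"])
print(sizes.levels)  # ['small', 'medium', 'large']
print(sizes.codes)   # [1, 0, 2]

# Sorting by code follows the declared level order...
print([sizes.levels[c] for c in sorted(sizes.codes)])  # ['small', 'medium', 'large']
# ...while sorting the raw strings is alphabetical.
print(sorted(sizes.values()))  # ['large', 'medium', 'small']
```

Storing codes rather than strings is what lets sorting follow the level list: comparing two values reduces to comparing their positions.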
factor(): explicit categorical levels
Use factor() when you want to control the level order yourself.
priority = factor(
["medium", "low", "high", "medium"],
levels = ["low", "medium", "high"]
)
If you do not provide levels, factor()
derives them from the data.
fct(): levels follow first appearance
Use fct() when you want levels to keep the order in which values first appear.
status = fct(["new", "in_progress", "done", "new"])
levels(status)
-- ["new", "in_progress", "done"]
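First-appearance ordering amounts to deduplicating the values while keeping the first occurrence of each. A short Python sketch of that rule follows. `first_appearance_levels()` is a hypothetical helper, not part of T.

```python
# Sketch of first-appearance level ordering, as fct() is described above.
# Python dicts preserve insertion order (3.7+), so dict.fromkeys() drops
# duplicates while keeping each value's first position.
from typing import List


def first_appearance_levels(values: List[str]) -> List[str]:
    return list(dict.fromkeys(values))


status = ["new", "in_progress", "done", "new"]
print(first_appearance_levels(status))  # ['new', 'in_progress', 'done']

# Sorted unique values, by contrast, give alphabetical order.
print(sorted(set(status)))  # ['done', 'in_progress', 'new']
```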
as_factor(): coerce existing values
as_factor() is a convenience function for converting an existing vector or column into a factor.
df |> mutate($segment = as_factor($segment))
ordered(): ordered factors
Use ordered() when the order of the levels is meaningful and should be preserved as an ordered factor.
ratings = ordered(
["bad", "ok", "great"],
levels = ["bad", "ok", "great"]
)
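In R, an ordered factor compares values by level position, so "ok" < "great" is true even though the strings sort the other way. This guide does not say which comparisons T's ordered() supports, so the Python model below only illustrates the general idea. `OrderedValue` is a hypothetical class.

```python
# Model of ordered-factor comparison: values compare by their position in
# the level list rather than alphabetically. Illustration only.
from functools import total_ordering
from typing import List


@total_ordering
class OrderedValue:
    def __init__(self, value: str, levels: List[str]):
        self.value = value
        self.rank = levels.index(value)  # ValueError if not a declared level

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, OrderedValue):
            return NotImplemented
        return self.rank == other.rank

    def __lt__(self, other: "OrderedValue") -> bool:
        # Assumes both values were built from the same level list.
        return self.rank < other.rank


levels = ["bad", "ok", "great"]
ratings = [OrderedValue(v, levels) for v in ["bad", "ok", "great"]]

print(ratings[1] < ratings[2])  # True: "ok" ranks below "great"
print("ok" < "great")           # False: alphabetically "great" comes first
print(max(ratings).value)       # great
```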
Why the fct_* Prefix Exists
The fct_* prefix marks helpers that manipulate factor levels after a factor has been created. These helpers are analogous to the factor tools popularized by the forcats package in R.
Examples:
fct_infreq(x)
fct_rev(x)
fct_recode(x, LARGE = "large")
fct_reorder(x, scores)
fct_lump_n(x, n = 3)
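Inspect the current levels:
levels(priority)
Order levels by frequency, most frequent first:
df |> mutate($segment = fct_infreq($segment))
Reverse the level order:
df |> mutate($segment = fct_rev($segment))
Rename levels (new name = old name):
df |> mutate($segment = fct_recode($segment, ENTERPRISE = "enterprise", SMB = "small_business"))
Reorder levels by the values of another column:
df |> mutate($segment = fct_reorder($segment, $revenue))
Move specific levels to the front:
df |> mutate($segment = fct_relevel($segment, "enterprise", "midmarket"))
Collapse several levels into named groups:
df |> mutate($segment = fct_collapse($segment, commercial = ["enterprise", "midmarket"], self_serve = "small_business"))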
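The level manipulations above can be modelled as operations on a plain Python list of levels. This is a sketch of the forcats-style behaviour these helpers are compared to, not T's code. All function names are hypothetical, and frequency ties keep their existing level order. Summarising by the median in `reorder_levels()` is an assumption taken from forcats' default, because this guide does not state which summary fct_reorder() uses. The sample data is made up.

```python
# Python sketches of forcats-style level operations. Function names are
# hypothetical; they operate on plain lists, not on T factors.
from collections import Counter
from statistics import median
from typing import Dict, List, Sequence


def infreq_levels(values: List[str], levels: List[str]) -> List[str]:
    # Most frequent level first; sorted() is stable, so ties keep level order.
    counts = Counter(values)
    return sorted(levels, key=lambda lvl: -counts[lvl])


def rev_levels(levels: List[str]) -> List[str]:
    return list(reversed(levels))


def recode_levels(levels: List[str], new_from_old: Dict[str, str]) -> List[str]:
    # Mirrors fct_recode(x, NEW = "old"): keys are new names, values old names.
    old_to_new = {old: new for new, old in new_from_old.items()}
    return [old_to_new.get(lvl, lvl) for lvl in levels]


def reorder_levels(values: List[str], scores: List[float]) -> List[str]:
    # Assumption: levels are ordered by the median score of their rows.
    groups: Dict[str, List[float]] = {}
    for v, s in zip(values, scores):
        groups.setdefault(v, []).append(s)
    return sorted(groups, key=lambda lvl: median(groups[lvl]))


def relevel(levels: List[str], *front: str) -> List[str]:
    # Named levels move to the front in the order given; the rest keep order.
    return list(front) + [lvl for lvl in levels if lvl not in front]


def collapse_values(values: List[str], groups: Dict[str, Sequence[str]]) -> List[str]:
    # groups maps each new level to the old levels it absorbs.
    old_to_new = {old: new for new, olds in groups.items() for old in olds}
    return [old_to_new.get(v, v) for v in values]


# Made-up sample data.
segment = ["enterprise", "small_business", "midmarket", "enterprise", "small_business"]
revenue = [900.0, 40.0, 300.0, 1100.0, 60.0]
levels = ["midmarket", "small_business", "enterprise"]

print(infreq_levels(segment, levels))  # ['small_business', 'enterprise', 'midmarket']
print(rev_levels(levels))              # ['enterprise', 'small_business', 'midmarket']
print(recode_levels(levels, {"ENTERPRISE": "enterprise", "SMB": "small_business"}))
# ['midmarket', 'SMB', 'ENTERPRISE']
print(reorder_levels(segment, revenue))  # ['small_business', 'midmarket', 'enterprise']
print(relevel(levels, "enterprise", "midmarket"))
# ['enterprise', 'midmarket', 'small_business']
print(collapse_values(segment, {"commercial": ["enterprise", "midmarket"],
                                "self_serve": ["small_business"]}))
# ['commercial', 'self_serve', 'commercial', 'commercial', 'self_serve']
```

Note that the reordering helpers only permute the level list, while recoding and collapsing change which labels exist.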
The fct_lump_* helpers keep the most frequent levels and, by default, group the rest into a single level named Other.
Keep the n most frequent levels:
fct_lump_n($species, n = 2)
Keep levels that occur at least a minimum number of times:
fct_lump_min(fct(["a", "a", "b", "c"]), 2)
levels(fct_lump_min(fct(["a", "a", "b", "c"]), 2))
-- ["a", "Other"]
Keep levels that make up at least a given proportion of the values (here 10%):
fct_lump_prop($segment, 0.10)
You can also set a custom replacement label with
other_level = "Misc".
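The three lumping rules can be sketched in Python as follows. Each hypothetical function returns the new values together with the new level list, keeping the surviving levels in their original order and putting the replacement label last. These are modelled on forcats, not taken from T. In particular, lump_n() breaks frequency ties by first appearance here, which may differ from both forcats and T, and this guide does not say where T places the Other level.

```python
# Hypothetical Python sketches of the fct_lump_* rules. Each returns
# (new_values, new_levels); kept levels stay in their original order and
# the replacement label, if used, goes last.
from collections import Counter
from typing import List, Set, Tuple

Lumped = Tuple[List[str], List[str]]


def _lump(values: List[str], levels: List[str], keep: Set[str], other_level: str) -> Lumped:
    new_values = [v if v in keep else other_level for v in values]
    kept = [lvl for lvl in levels if lvl in keep]
    new_levels = kept + ([other_level] if len(kept) < len(levels) else [])
    return new_values, new_levels


def lump_n(values: List[str], levels: List[str], n: int, other_level: str = "Other") -> Lumped:
    # Keep the n most frequent levels; Counter.most_common breaks ties
    # by first appearance in `values`.
    keep = {lvl for lvl, _ in Counter(values).most_common(n)}
    return _lump(values, levels, keep, other_level)


def lump_min(values: List[str], levels: List[str], min_count: int, other_level: str = "Other") -> Lumped:
    # Keep levels that occur at least `min_count` times.
    counts = Counter(values)
    keep = {lvl for lvl in levels if counts[lvl] >= min_count}
    return _lump(values, levels, keep, other_level)


def lump_prop(values: List[str], levels: List[str], prop: float, other_level: str = "Other") -> Lumped:
    # Keep levels whose share of all values is at least `prop`.
    counts = Counter(values)
    keep = {lvl for lvl in levels if counts[lvl] / len(values) >= prop}
    return _lump(values, levels, keep, other_level)


print(lump_min(["a", "a", "b", "c"], ["a", "b", "c"], 2))
# (['a', 'a', 'Other', 'Other'], ['a', 'Other'])
print(lump_n(["x", "x", "x", "y", "y", "z"], ["x", "y", "z"], n=2))
# (['x', 'x', 'x', 'y', 'y', 'Other'], ['x', 'y', 'Other'])
print(lump_prop(["x", "x", "x", "y", "y", "z"], ["x", "y", "z"], 0.4))
# (['x', 'x', 'x', 'Other', 'Other', 'Other'], ['x', 'Other'])
print(lump_min(["a", "a", "b", "c"], ["a", "b", "c"], 2, other_level="Misc"))
# (['a', 'a', 'Misc', 'Misc'], ['a', 'Misc'])
```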
fct_other() keeps the levels listed in keep and replaces every other level with Other:
levels(fct_other(fct(["a", "b", "c"]), keep = ["a"]))
-- ["a", "Other"]
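fct_other() differs from the lumping helpers in that you name the levels to keep instead of deriving them from counts. Below is a hypothetical Python model. Keeping the surviving levels in their original order with Other appended last is an assumption, chosen because it matches the output shown above.

```python
# Hypothetical model of fct_other(): keep the listed levels, replace the rest.
from typing import List, Sequence, Tuple


def other(values: List[str], levels: List[str], keep: Sequence[str],
          other_level: str = "Other") -> Tuple[List[str], List[str]]:
    keep_set = set(keep)
    new_values = [v if v in keep_set else other_level for v in values]
    kept = [lvl for lvl in levels if lvl in keep_set]
    new_levels = kept + ([other_level] if len(kept) < len(levels) else [])
    return new_values, new_levels


print(other(["a", "b", "c"], ["a", "b", "c"], keep=["a"]))
# (['a', 'Other', 'Other'], ['a', 'Other'])
```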
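fct_drop() removes levels that no longer occur in the data:
levels(fct_drop(factor(["a", "b"], levels = ["a", "b", "c"])))
-- ["a", "b"]
fct_expand() adds new levels, even if no value uses them yet:
levels(fct_expand(fct(["a"]), "b", "c"))
-- ["a", "b", "c"]
fct_c() concatenates factors and merges their level sets:
levels(fct_c(fct(["a"], levels = ["a", "b"]), fct(["c"])))
-- ["a", "b", "c"]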
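fct_drop(), fct_expand(), and fct_c() change the level set rather than the values. Here they are as operations on plain Python lists. The helper names are hypothetical, and taking the union of levels in order of first appearance in `combine()` is an assumption that is consistent with the fct_c() output above.

```python
# Hypothetical list-based models of fct_drop(), fct_expand(), and fct_c().
from typing import List, Tuple

Fct = Tuple[List[str], List[str]]  # (values, levels)


def drop_levels(values: List[str], levels: List[str]) -> List[str]:
    # Keep only levels that still occur in the values.
    used = set(values)
    return [lvl for lvl in levels if lvl in used]


def expand_levels(levels: List[str], *extra: str) -> List[str]:
    # Append new levels that are not already present.
    result = list(levels)
    for lvl in extra:
        if lvl not in result:
            result.append(lvl)
    return result


def combine(*factors: Fct) -> Fct:
    # Concatenate values; union the level sets in order of first appearance.
    values: List[str] = []
    levels: List[str] = []
    for vals, lvls in factors:
        values.extend(vals)
        for lvl in lvls:
            if lvl not in levels:
                levels.append(lvl)
    return values, levels


print(drop_levels(["a", "b"], ["a", "b", "c"]))      # ['a', 'b']
print(expand_levels(["a"], "b", "c"))                # ['a', 'b', 'c']
print(combine((["a"], ["a", "b"]), (["c"], ["c"])))  # (['a', 'c'], ['a', 'b', 'c'])
```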
A factor keeps its declared level order during sorting.
df = crossing(size = ["medium", "small", "large"], id = [1, 2])
df
|> mutate($size_fct = factor($size, levels = ["small", "medium", "large"]))
|> arrange($size_fct)
This sorts rows by small, then medium, then large, whereas alphabetical order would put large first, then medium, then small.
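The sort above orders rows by each value's position in the level list. The Python sketch below mimics that pipeline with a list of row dictionaries. `itertools.product` stands in for crossing(), and the row order it produces is not claimed to match T's.

```python
# Mimic the crossing() + mutate() + arrange() pipeline with plain Python.
from itertools import product

sizes = ["medium", "small", "large"]
ids = [1, 2]
level_order = ["small", "medium", "large"]

# Stand-in for crossing(): every (size, id) combination as a row dict.
rows = [{"size": s, "id": i} for s, i in product(sizes, ids)]

# Stand-in for arrange($size_fct): sort by position in the level list.
rank = {lvl: pos for pos, lvl in enumerate(level_order)}
by_factor = sorted(rows, key=lambda r: rank[r["size"]])

# Sorting the raw strings instead gives alphabetical order.
by_string = sorted(rows, key=lambda r: r["size"])

print([r["size"] for r in by_factor])
# ['small', 'small', 'medium', 'medium', 'large', 'large']
print([r["size"] for r in by_string])
# ['large', 'large', 'medium', 'medium', 'small', 'small']
```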
Use:
- factor() when you want explicit levels
- fct() when you want first-appearance order
- as_factor() when coercing an existing column
- ordered() when the factor should be marked as ordered
- fct_* helpers when changing levels after creation
- levels() when you need to inspect the current level set

These factor helpers are currently implemented alongside the data-manipulation verbs in T’s colcraft package, but they follow the convention you would expect from a dedicated factor toolkit: factor-creation helpers plus fct_* level-manipulation helpers.