Formulas provide a declarative way to specify statistical models, inspired by R.
response ~ predictor
The ~ operator creates a Formula object that can be
passed to modeling functions.
-- Simple linear regression
model = lm(data = df, formula = y ~ x)
-- Multiple regression
model = lm(data = df, formula = y ~ x1 + x2 + x3)
-- Interaction terms
model = lm(data = df, formula = y ~ x1 * x2)
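As a rough guide to what the three formulas above mean, they can be written out as model equations. This is a sketch under two assumptions borrowed from R and not confirmed by this page: that an intercept term is included implicitly, and that x1 * x2 expands to both main effects plus their interaction. The symbols \beta_i (coefficients) and \varepsilon (error term) are introduced here for illustration.

```latex
\begin{align*}
\texttt{y \textasciitilde{} x}:\quad
  & y = \beta_0 + \beta_1 x + \varepsilon \\
\texttt{y \textasciitilde{} x1 + x2 + x3}:\quad
  & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \\
\texttt{y \textasciitilde{} x1 * x2}:\quad
  & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon
\end{align*}
```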
lm() - Linear Regression
Fit a linear model using least squares.
lm(data: DataFrame, formula: Formula, ...) -> Dict
Parameters:
- data: DataFrame containing the variables
- formula: Formula specifying the model (e.g., y ~ x)

Returns: Dictionary containing:
- formula: The model formula
- coefficients: Dictionary of term estimates
- std_errors: Dictionary of standard errors
- r_squared: R² statistic
- adj_r_squared: Adjusted R² statistic
- sigma: Residual standard error
- nobs: Number of observations
- _tidy_df: Tidy DataFrame of results
data = read_csv("mtcars.csv")
model = lm(data = data, formula = mpg ~ hp)
print(model.r_squared)
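To make the returned fields more concrete, here is a minimal Python sketch of how these statistics are conventionally computed for a one-predictor least-squares fit with an intercept. It is not T's implementation: the function name simple_lm, the coefficient keys "intercept" and "x", and the sample data are all hypothetical, and the formulas are the standard textbook ones.

```python
import math


def simple_lm(x, y):
    """Least-squares fit of y = b0 + b1 * x (hypothetical helper, not T's lm)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each vary")

    # Closed-form estimates for slope and intercept.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual sum of squares and residual degrees of freedom (n - 2 parameters).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df_resid = n - 2
    sigma = math.sqrt(ss_res / df_resid)
    r_squared = 1 - ss_res / ss_tot

    return {
        # Key names below are illustrative; T's actual term labels may differ.
        "coefficients": {"intercept": intercept, "x": slope},
        "std_errors": {
            "intercept": sigma * math.sqrt(1 / n + x_bar ** 2 / sxx),
            "x": sigma / math.sqrt(sxx),
        },
        "r_squared": r_squared,
        "adj_r_squared": 1 - (1 - r_squared) * (n - 1) / df_resid,
        "sigma": sigma,
        "nobs": n,
    }


# Made-up sample data, for illustration only.
model = simple_lm([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(model["r_squared"])
```

The adjusted R² penalizes the plain R² for the number of fitted parameters, so it is never larger than r_squared for the same fit.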
y ~ x + 1 vs y ~ x - 1
y ~ log(x)
Now that you’ve explored formulas, learn about statistical modeling and pipelines in T: