# A practical guide to moving classical models between R, Python, and T with `^pmml`
PMML is T’s interchange format for many classical statistical and tree-based models. It is useful when you want to train a model in one runtime and score it in another with `predict(data, model)`.

This tutorial focuses on the workflows that T supports today, including the initial T-native `t_write_pmml()` pass-through path for PMML artifacts that were already loaded from disk or a pipeline node.
Use ^pmml when you want a model artifact that can cross
runtime boundaries while still being readable by T’s native model
tooling.
PMML is a good fit for classical statistical and tree-based models, and more generally for anything that converters such as `sklearn2pmml` can represent faithfully.

PMML is not a universal model format. If your model family or preprocessing pipeline cannot be represented faithfully in PMML, prefer `^onnx` or keep scoring in the source runtime.
PMML support is polyglot, so the exact requirements depend on where the model is produced or consumed.
| Workflow | Main requirements |
|---|---|
| R writes PMML | `r2pmml`, `XML`, `jsonlite`, `jre` |
| Python writes PMML | `sklearn2pmml` or `jpmml-statsmodels`, plus `jre` |
| Python reads PMML | JPMML evaluator via wrapper |
| T reads and scores PMML | built-in `t_read_pmml()` + native evaluator |
When PMML is used inside pipelines, these dependencies should be declared explicitly in your project configuration so T can build the correct runtime environment.
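As a hedged sketch, a project configuration for these workflows might look like the following. The section names `[additional-tools]` and `[py-dependencies]` match those referenced later in this guide; the exact keys and version specifiers here are illustrative assumptions, not T's documented schema:

```toml
# Hypothetical project configuration sketch.
# Section names come from this guide; entries are illustrative.
[additional-tools]
jpmml-evaluator = "*"
jre = "*"

[py-dependencies]
pyarrow = "*"
sklearn2pmml = "*"
```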
The simplest workflow is: train in R, serialize the model node with `^pmml`, read it back into T, and call `predict()`.

```
p = pipeline {
    model_r = rn(
        command = <{
            data <- read.csv("mtcars.csv")
            lm(mpg ~ wt + hp, data = data)
        }>,
        serializer = ^pmml
    )
}
build_pipeline(p)
model = read_node("model_r")
```
At this point model is a T model object reconstructed
from the PMML artifact.
You can inspect it with the usual model helpers:
```
summary(model)
fit_stats(model)
coef(model)
```
Once a PMML model is in T, prediction looks the same as prediction for other supported model objects:
```
new_data = read_csv("mtcars_new.csv")
preds = predict(new_data, model)
```
This is the main attraction of PMML in T: training can happen in R or Python, but downstream prediction, summary, and fit-stat extraction can stay inside the T runtime.
For PMML-imported tree models, random forests, and supported boosted ensembles, this is already a strong workflow:
```
forest = t_read_pmml("tests/golden/data/iris_random_forest.pmml")
iris = read_csv("tests/golden/data/iris.csv")
predict(iris, forest)
```
R is the most natural PMML producer for many classical models.
```
p = pipeline {
    train_df = node(
        command = read_csv("mtcars.csv"),
        serializer = ^csv
    )
    model_r = rn(
        command = <{
            glm(am ~ wt + hp, data = train_df, family = binomial())
        }>,
        deserializer = ^csv,
        serializer = ^pmml
    )
    scored = node(
        command = predict(read_csv("mtcars.csv"), model_r),
        deserializer = ^pmml
    )
}
```
The important part is the boundary: the R node declares `serializer = ^pmml`, and the downstream node declares `deserializer = ^pmml`. This keeps the interchange contract explicit.
Python is a strong fit when your model is supported by
sklearn2pmml or the JPMML statsmodels bridge.
```
p = pipeline {
    model_py = pyn(
        command = <{
            import pandas as pd
            from sklearn.ensemble import RandomForestClassifier

            df = pd.read_csv("iris.csv")
            # "species" as the label column is an assumption
            X, y = df.drop(columns = ["species"]), df["species"]
            clf = RandomForestClassifier(random_state = 42)
            clf.fit(X, y)
            clf
        }>,
        serializer = ^pmml
    )
    scored = node(
        command = predict(read_csv("iris.csv"), model_py),
        deserializer = ^pmml
    )
}
```
As with R, the boundary is explicit: the Python node declares `serializer = ^pmml`, and the downstream consumer declares `deserializer = ^pmml`.

If the Python model cannot be represented faithfully in PMML, T should fail explicitly rather than silently switching to a different interchange story.
You can also bypass pipelines and load an existing PMML file manually:
```
model = t_read_pmml("model.pmml")
summary(model)
```
This is useful when the PMML artifact was produced outside of any T pipeline, or when you want to inspect a model without rebuilding its pipeline.

## t_write_pmml()

T now provides an initial native `t_write_pmml()` path, but it is intentionally narrow.

Today, `t_write_pmml()` supports pass-through copying of PMML artifacts that were already loaded into T from:

- `t_read_pmml("path.pmml")`
- `read_node("node_name")` for PMML pipeline outputs

Example:
```
model = t_read_pmml("model.pmml")
t_write_pmml(model, "model-copy.pmml")
```
This is useful when you want to preserve or copy an existing PMML artifact without leaving T.
t_write_pmml() does not yet export
arbitrary native T model objects to fresh PMML XML. If you construct or
fit a model natively in T and then call t_write_pmml(), T
should fail explicitly unless the value still carries an original PMML
source artifact.
That is deliberate: T currently exposes a safe pass-through path, not a pretend PMML writer for unsupported cases.
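A minimal sketch of that contract, using only the functions shown in this guide (the `fit_lm()` call is hypothetical, standing in for any natively fitted T model):

```
# Pass-through: the value carries its source PMML artifact, so copying works.
m1 = t_read_pmml("model.pmml")
t_write_pmml(m1, "model-copy.pmml")

# Natively fitted: no source PMML artifact, so T fails explicitly.
m2 = fit_lm(mpg ~ wt + hp, data = read_csv("mtcars.csv"))  # hypothetical native fit
t_write_pmml(m2, "native.pmml")  # error: value has no source PMML artifact
```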
For now, the safest PMML workflow is:

- produce PMML in R or Python via `serializer = ^pmml`
- read it into T with `t_read_pmml()` or `read_node()`
- use `t_write_pmml()` only to preserve or copy existing PMML artifacts

In other words: treat `t_write_pmml()` as an artifact-preservation tool, not as a general exporter.

## Troubleshooting

### `read_node()` returns an error about missing PMML dependencies

Make sure the runtime that produced or reads the PMML artifact declares the required packages in project configuration. PMML support is not implicit magic.
Check that jpmml-evaluator is available in
[additional-tools] and that pyarrow is
available in [py-dependencies]. Also ensure the JVM-backed
PMML toolchain is correctly provisioned.
### `t_write_pmml()` rejects a model

That usually means the value does not carry a source PMML artifact path. Reload it with `t_read_pmml()` or obtain it from `read_node()` before calling `t_write_pmml()`.
### The model or pipeline cannot be represented faithfully

Verify that the model family and preprocessing steps are truly representable in PMML. The model coefficients are often not the problem; preprocessing fidelity usually is.
## Next steps

Now that you have a PMML workflow in place, continue with:

- the model helpers `summary()`, `fit_stats()`, and `predict()`
- how `^pmml`, `^onnx`, `^arrow`, and `^json` fit into the broader interchange system