Composable Preprocessing Operators, or CPOs, are the central entity provided by the mlrCPO package. A CPO can perform operations on a data.frame or a Task. When operating on a Task, it can also modify target values and convert between different Task types.

CPOs can be “composed” using the %>>% operator, the composeCPO function, or the pipeCPO function, to create new (“compound”) operators that perform multiple operations in a pipeline. While all CPOs have the class “CPO”, primitive (i.e. not compound) CPOs have the additional class “CPOPrimitive”, and compound CPOs have the additional class “CPOPipeline”. A compound CPO can be split into its primitive constituents using as.list.CPO.
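As a minimal sketch of composition, the following builds the same two-step pipeline in three ways and then splits it again. It assumes that pipeCPO accepts a list of CPOs and that as.list.CPO is reached through the generic as.list; the variable names are only illustrative.

```r
library(mlrCPO)

# Three ways of building a pipeline that runs PCA and then scaling.
p1 <- cpoPca() %>>% cpoScale()
p2 <- composeCPO(cpoPca(), cpoScale())
p3 <- pipeCPO(list(cpoPca(), cpoScale()))  # assumption: takes a list of CPOs

# Each result is a compound CPO.
class(p1)

# Splitting the pipeline yields its primitive constituents, in order.
parts <- as.list(p1)
length(parts)
lapply(parts, class)
```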

CPOs can be “attached” to mlr Learner objects to create CPOLearners, using the %>>% operator or the attachCPO function. A CPOLearner first applies the attached CPO to the data and then fits the model specified by the Learner to the result. Several CPOs can be attached to a Learner, either one after another or in the form of a compound CPO.
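Attaching can be sketched as follows. The learner "classif.rpart" is an arbitrary choice for illustration (it needs the rpart package), and the argument order attachCPO(cpo, learner) is assumed here.

```r
library(mlrCPO)

lrn <- makeLearner("classif.rpart")  # illustrative learner, not prescribed by mlrCPO

# Two ways of attaching a CPO to a Learner.
cpo.lrn  <- cpoScale() %>>% lrn
cpo.lrn2 <- attachCPO(cpoScale(), lrn)  # assumption: CPO first, Learner second

# A compound CPO can be attached in one step.
cpo.lrn3 <- (cpoPca() %>>% cpoScale()) %>>% lrn

# The result is used like any other mlr Learner: preprocessing is
# applied before fitting, and again when predicting.
model <- train(cpo.lrn, iris.task)
pred  <- predict(model, iris.task)
head(as.data.frame(pred))
```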

CPOs can be “applied” to a data.frame or a Task using the %>>% operator or the applyCPO function. Applying a CPO performs the operations specified by the (possibly compound) CPO and returns the modified data. The returned data also carries a “retrafo” and an “inverter” tag, which can be accessed using the retrafo and inverter functions to get CPORetrafo and CPOInverter objects, respectively. These objects represent the “trained” CPOs that can be used when performing validation or predictions with new data.
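The following sketch shows the train / re-apply pattern on a plain data.frame. The split of iris into training and new rows is arbitrary, and it assumes that a retrafo object can itself be applied to new data with %>>% and that applyCPO takes the CPO as its first argument.

```r
library(mlrCPO)

# Arbitrary split of the numeric iris columns into training and new data.
train.idx <- c(1:40, 51:90, 101:140)
train.df <- iris[train.idx, 1:4]
new.df   <- iris[-train.idx, 1:4]

# Applying the CPO transforms the training data ...
trained  <- train.df %>>% cpoScale()
trained2 <- applyCPO(cpoScale(), train.df)  # assumption: CPO first, data second

# ... and the result carries the trained state of the CPO.
rt  <- retrafo(trained)
inv <- inverter(trained)  # cpoScale changes features only, so no target change to undo

# Re-apply the trained transformation to new data: the centering and
# scaling values estimated on train.df are reused, not re-estimated.
new.scaled <- new.df %>>% rt
head(new.scaled)
```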

Hyperparameters

CPOs can have hyperparameters that determine how they operate on data. These hyperparameters can be set during construction, as function parameters of the CPOConstructor, or, if they are exported, modified later on the CPO object. Which hyperparameters are exported is controlled using the export parameter of the CPOConstructor when the CPO is created. Exported hyperparameters can be listed using getParamSet, queried using getHyperPars, and set using setHyperPars.
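A short sketch of the hyperparameter workflow follows. The prefixed names (scale.center, scale.scale) are those shown by getHyperPars; the value "export.all" for the export parameter is an assumption about the accepted values and should be checked against the CPOConstructor documentation.

```r
library(mlrCPO)

# Set a hyperparameter at construction time.
cpo <- cpoScale(center = FALSE)

# List and query the exported hyperparameters.
getParamSet(cpo)
getHyperPars(cpo)

# Modify an exported hyperparameter later; a modified copy is returned.
cpo2 <- setHyperPars(cpo, scale.scale = FALSE)
getHyperPars(cpo2)

# Control which hyperparameters are exported at construction.
# Assumption: "export.all" exports every parameter, including ones
# that cpoPca does not export by default (tol, rank).
pca.all <- cpoPca(export = "export.all")
names(getParamSet(pca.all)$pars)
```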

S3 properties

A CPO object should be treated as an opaque object and should only be queried or modified using the given set* and get* functions. A list of them is given below in the section “See Also”, under “cpo-operations”.

Special CPO

A special CPO is NULLCPO, which functions as the neutral element of the %>>% operator and represents the identity operation on data.
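The neutral-element behaviour can be sketched as below. It assumes that NULLCPO can be both composed with other CPOs and applied directly to data with %>>%.

```r
library(mlrCPO)

sc <- cpoScale()

# Composing with NULLCPO on either side leaves a primitive CPO,
# not a two-element pipeline.
left  <- NULLCPO %>>% sc
right <- sc %>>% NULLCPO
class(left)
class(right)

# Applying NULLCPO to data is the identity operation.
res <- iris %>>% NULLCPO
head(res)
```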

See also

Examples

class(cpoPca()) # c("CPOPrimitive", "CPO")
#> [1] "CPOPrimitive" "CPO"
class(cpoPca() %>>% cpoScale()) # c("CPOPipeline", "CPO")
#> [1] "CPOPipeline" "CPO"
print(cpoPca() %>>% cpoScale(), verbose = TRUE)
#> Trafo chain of 2 cpos:
#> pca(center = TRUE, scale = FALSE)[not exp'd: tol = <NULL>, rank = <NULL>]
#> Operating: feature
#> ParamSet:
#>               Type len   Def Constr Req Tunable Trafo
#> pca.center logical   -  TRUE      -   -    TRUE     -
#> pca.scale  logical   - FALSE      -   -    TRUE     -
#>   ====>
#> scale(center = TRUE, scale = TRUE)
#> Operating: feature
#> ParamSet:
#>                 Type len  Def Constr Req Tunable Trafo
#> scale.center logical   - TRUE      -   -    TRUE     -
#> scale.scale  logical   - TRUE      -   -    TRUE     -
getHyperPars(cpoScale(center = FALSE))
#> $scale.center
#> [1] FALSE
#>
#> $scale.scale
#> [1] TRUE
#>
head(getTaskData(iris.task %>>% cpoScale()))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1   -0.8976739  1.01560199    -1.335752   -1.311052  setosa
#> 2   -1.1392005 -0.13153881    -1.335752   -1.311052  setosa
#> 3   -1.3807271  0.32731751    -1.392399   -1.311052  setosa
#> 4   -1.5014904  0.09788935    -1.279104   -1.311052  setosa
#> 5   -1.0184372  1.24503015    -1.335752   -1.311052  setosa
#> 6   -0.5353840  1.93331463    -1.165809   -1.048667  setosa