Get the Retransformation or Inversion Function from a Resulting Object

When a CPO is applied to a data.frame or Task, the data is not only transformed: a retransformation object and an inversion object are also created, and these can be applied to other data of the same kind. This is useful when new data (for prediction or validation) must pass through the same machine learning procedure.

For example, when performing PCA on training data using cpoPca, the rotation matrix is saved and can be used on new (prediction) data. As another example, consider a log-transformation of the target column in a regression problem. When predictions are made with new data, it may be useful to invert the transformation on the predicted values by exponentiating them.

The information created when a CPO is applied is saved in a CPORetrafo object and a CPOInverter object, both of which are stored as attributes of the result. The retrafo and inverter functions retrieve these objects. The attributes can also be set with the retrafo<- and inverter<- functions, using constructs like retrafo(data) <- retr.obj. Either attribute can be reset individually by setting it to NULL, e.g. retrafo(data) <- NULL; the clearRI function clears both.
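
The getter, setter, and reset operations described above can be sketched as follows. This is a minimal illustration, not taken from the package's own examples; it assumes the mlr and mlrCPO packages are installed, and it uses the built-in iris data and cpoScale purely as stand-ins.

```r
library(mlr)
library(mlrCPO)

# Applying a CPO to a data.frame attaches the trained objects as attributes.
scaled <- iris %>>% cpoScale()
retr.obj <- retrafo(scaled)  # the CPORetrafo created by cpoScale()

# Reset only the retrafo attribute; retrafo() then returns NULLCPO.
retrafo(scaled) <- NULL
is.nullcpo(retrafo(scaled))

# Attach the saved object again, then clear retrafo and inverter together.
retrafo(scaled) <- retr.obj
cleared <- clearRI(scaled)
is.nullcpo(retrafo(cleared))
is.nullcpo(inverter(cleared))
```

Note that clearRI returns the cleaned data rather than modifying its argument, so scaled still carries its retrafo afterwards.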

When %>>% is chained on a data object, the retrafo and inverter associated with the result are chained automatically as well. Beware, however, that this only accesses the retrafo attribute internally. Therefore, if you plan to apply multiple transformations with other operations in between, make sure to reset the retrafo attribute by setting it to NULL or by using the clearRI function. See the examples.

retrafo(data)

inverter(data)

retrafo(data) <- value

inverter(data) <- value

Arguments

data	[`data.frame` | `Task` | `WrappedModel`] The result of a `CPO` applied to a data set.
value	[`CPOTrained` | `NULL`] The retrafo or inverter to set. This must either be a `CPORetrafo` for `retrafo<-` or a `CPOInverter` for `inverter<-`, or `NULL` to reset the `retrafo` or `inverter` attributes.

Value

[CPOTrained]. The retransformation or inversion object that can be applied to new data. This is a CPORetrafo object for retrafo or a CPOInverter object for inverter.

`CPORetrafo` and `CPOInverter`

CPORetrafo and CPOInverter objects are members of the CPOTrained class, which can be handled similarly to CPO objects: their hyperparameters can be inspected using getParamSet and getHyperPars, and print.CPOTrained is used for (possibly verbose) printing. To apply the retrafo or inverter transformation represented by the object to data, use the applyCPO or %>>% function.
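
As a rough sketch of this inspection workflow, the snippet below trains cpoScale on iris (an arbitrary choice for illustration), looks at the resulting retrafo, and applies it to a few rows. It assumes mlr and mlrCPO are installed; the exact printed output depends on the package version and is therefore not shown.

```r
library(mlr)
library(mlrCPO)

# Train a scaling CPO with a non-default hyperparameter value.
retr <- retrafo(iris %>>% cpoScale(center = FALSE))

getParamSet(retr)   # the parameters the underlying CPO exposes
getHyperPars(retr)  # the values that were in effect during training
print(retr)         # printing is handled by print.CPOTrained

# Apply the trained transformation to other data of the same kind.
head.scaled <- applyCPO(retr, head(iris))
```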

CPOTrained objects can be chained using %>>% or pipeCPO, and broken into primitives using as.list.CPOTrained. However, since CPOTrained objects represent transformations that relate closely to the data used to train them (and therefore to their position within a CPO pipeline), it is only advisable to chain or break apart CPOTrained pipes for inspection, or if you really know what you are doing.
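
For inspection purposes, breaking a compound retrafo apart and reassembling it in the original order might look like the sketch below. It assumes mlr and mlrCPO are installed and uses iris with a PCA-then-scale pipeline only as an example.

```r
library(mlr)
library(mlrCPO)

# A retrafo that represents two chained steps: PCA, then scaling.
retr <- retrafo(iris %>>% cpoPca() %>>% cpoScale())

# Break the pipeline into its primitive CPOTrained objects.
parts <- as.list(retr)
length(parts)

# Re-chain the primitives in their original order.
rechained <- pipeCPO(parts)

# Both objects should transform new data in the same way.
res.orig <- clearRI(head(iris) %>>% retr)
res.rechained <- clearRI(head(iris) %>>% rechained)
all.equal(res.orig, res.rechained, check.attributes = FALSE)
```

Reordering or dropping elements of parts before re-chaining would produce a pipeline that no longer matches the data each step was trained on, which is the situation the paragraph above warns against.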

(Primitive) CPORetrafo objects can be inspected using getCPOTrainedState, and it is possible to create new CPORetrafo objects from (possibly modified) retrafo state using makeCPOTrainedFromState.
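
A minimal round-trip sketch: extract the state of a primitive retrafo, then rebuild an equivalent retrafo from it. The call assumes that makeCPOTrainedFromState takes the CPO constructor as its first argument and the state as its second; the contents of the state list depend on the CPO, so the sketch only lists its component names rather than relying on particular fields.

```r
library(mlr)
library(mlrCPO)

# A primitive (single-step) retrafo.
retr <- retrafo(iris %>>% cpoScale())

# Extract the trained state; its components depend on the CPO in question.
state <- getCPOTrainedState(retr)
names(state)

# Rebuild a retrafo from the (here unmodified) state.
rebuilt <- makeCPOTrainedFromState(cpoScale, state)

# The rebuilt object should behave like the original one.
res.orig <- clearRI(head(iris) %>>% retr)
res.rebuilt <- clearRI(head(iris) %>>% rebuilt)
all.equal(res.orig, res.rebuilt, check.attributes = FALSE)
```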

Difference between `CPORetrafo` and `CPOInverter`

The fundamental difference between CPORetrafo and CPOInverter is that a CPORetrafo is created only when a CPO is applied to a data set, and is used to perform the same transformation on new (prediction) data. The CPOInverter is created whenever a CPO or CPORetrafo is applied to data (whether training or prediction data). It is in fact used to invert the transformation done to the target column of a Task. Since this operation may depend on the new prediction data, and not only on the training data fed to the CPO when the CPORetrafo was created, the CPOInverter object is more closely bound to the particular data set used to create it.

In some cases a target transformation is independent of the data used to create it (e.g. log-transform of a regression target column); in that case the CPORetrafo can be used with invert. This is the concept of CPOTrainedCapability, which can be queried using getCPOTrainedCapability.
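
To see the capability concept in practice, one can compare a feature-only retrafo with a target-transforming one. The sketch below assumes mlr and mlrCPO are installed and uses cpoLogTrafoRegr (mlrCPO's log-transform of a regression target) together with mlr's bh.task; neither is named in the text above, so treat them as illustrative choices.

```r
library(mlr)
library(mlrCPO)

# Retrafo of a feature-only CPO.
feat.retr <- retrafo(iris %>>% cpoScale())
feat.cap <- getCPOTrainedCapability(feat.retr)
feat.cap

# Retrafo of a data-independent target transformation (log of the target).
targ.retr <- retrafo(bh.task %>>% cpoLogTrafoRegr())
targ.cap <- getCPOTrainedCapability(targ.retr)
targ.cap
```

Each result is a named vector with one entry for retrafo and one for invert, indicating whether the object can be used for the respective operation.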

Using `CPORetrafo`

CPORetrafo objects can be applied to new data sets using the %>>% operator, the applyCPO generic, or the predict generic, all of which perform the same action.
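
The three equivalent ways of applying a retrafo can be written side by side, as in this sketch. It assumes mlr and mlrCPO are installed; the split of iris into training and new rows is arbitrary.

```r
library(mlr)
library(mlrCPO)

train.rows <- iris[1:100, ]
new.rows <- iris[101:150, ]

# Train on one part of the data and keep the retrafo.
retr <- retrafo(train.rows %>>% cpoScale())

# Three ways of applying the same trained transformation to new data.
res.pipe <- new.rows %>>% retr
res.apply <- applyCPO(retr, new.rows)
res.predict <- predict(retr, new.rows)

# Compare the transformed values, ignoring attached attributes.
all.equal(clearRI(res.pipe), clearRI(res.apply), check.attributes = FALSE)
all.equal(clearRI(res.pipe), clearRI(res.predict), check.attributes = FALSE)
```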

Using `CPOInverter`

To use a CPOInverter, use the invert function.
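
A hedged end-to-end sketch of inversion with a log-transformed regression target follows. The task (bh.task), the learner ("regr.lm"), the CPO (cpoLogTrafoRegr), and the row split are all assumptions chosen for illustration; mlr and mlrCPO must be installed.

```r
library(mlr)
library(mlrCPO)

train.task <- subsetTask(bh.task, 1:400)
new.task <- subsetTask(bh.task, 401:506)

# Log-transform the target of the training data and fit a model on it.
log.train <- train.task %>>% cpoLogTrafoRegr()
model <- train(makeLearner("regr.lm"), log.train)

# Transform the new data with the retrafo; this also creates an inverter
# that belongs to this particular data set.
log.new <- new.task %>>% retrafo(log.train)
log.pred <- predict(model, log.new)

# Map the predictions back to the original scale of the target.
pred <- invert(inverter(log.new), log.pred)
head(getPredictionResponse(pred))
```

The inverter is taken from log.new, the transformed prediction data, not from the training result, because a CPOInverter is bound to the data set that created it.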

Examples

traindat = subsetTask(pid.task, 1:400)
preddat = subsetTask(pid.task, 401:768)

trained = traindat %>>% cpoPca()
reFun = retrafo(trained)
predicted = preddat %>>% reFun
head(getTaskData(predicted))
#>     diabetes        PC1        PC2        PC3        PC4        PC5        PC6
#> 401      pos -84.533128 -16.377902   5.953769  15.471857 -2.5141001  6.1500829
#> 402      neg -80.383496  27.275779   9.607740  17.130030 15.8863960 -5.2785909
#> 403      pos   9.548944  13.030095 -17.953331 -17.919489  2.2009380 -4.1929553
#> 404      neg -85.240015 -39.456703 -15.756013   0.639444  9.4488413 -0.4820485
#> 405      pos -77.210758  56.209452  10.437033   7.524593  0.1793082  2.0727927
#> 406      neg  83.877962  -9.720671  19.047282 -13.157908  0.8537302  9.1316199
#>            PC7        PC8
#> 401  0.5578393 0.20910205
#> 402 -1.8236278 0.31616133
#> 403  0.4826818 0.25252337
#> 404  4.1093591 0.07530882
#> 405 -0.4976554 0.34331083
#> 406 -0.9156816 0.07432789

# chaining works
trained = traindat %>>% cpoPca() %>>% cpoScale()
reFun = retrafo(trained)
predicted = preddat %>>% reFun
head(getTaskData(predicted))
#>     diabetes        PC1        PC2        PC3         PC4         PC5
#> 401      pos -0.6907341 -0.5287483  0.3071310  1.11929566 -0.24772162
#> 402      neg -0.6568268  0.8805780  0.4956246  1.23925448  1.56533296
#> 403      pos  0.0780260  0.4206668 -0.9261401 -1.29636705  0.21686484
#> 404      neg -0.6965102 -1.2738300 -0.8127893  0.04625992  0.93102191
#> 405      pos -0.6309018  1.8146799  0.5384045  0.54435893  0.01766776
#> 406      neg  0.6853806 -0.3138246  0.9825726 -0.95189535  0.08412053
#>             PC6        PC7       PC8
#> 401  0.86367029  0.2031152 0.6241936
#> 402 -0.74128466 -0.6640023 0.9437779
#> 403 -0.58882636  0.1757496 0.7538113
#> 404 -0.06769518  1.4962614 0.2248055
#> 405  0.29108705 -0.1812016 1.0248223
#> 406  1.28237437 -0.3334094 0.2218773

# reset the retrafo when doing other steps!

trained.tmp = traindat %>>% cpoPca()
reFun1 = retrafo(trained.tmp)

imp = impute(trained.tmp)
trained.tmp = imp$task  # nonsensical example
retrafo(trained.tmp) = NULL  # NECESSARY HERE

trained = trained.tmp %>>% cpoScale()

reFun2 = retrafo(trained)
predicted = getTaskData(reimpute(preddat %>>% reFun1, imp$desc),
  target.extra = TRUE)$data %>>% reFun2
