R/attributes.R
CPOTrained.Rd
When a CPO is applied to a data.frame or Task, the data is not only changed: a retransformation object and an inversion object are also created, and these can be applied to other data of the same kind. This is useful when new data (for prediction or validation) is to be handled by the same machine learning procedure.
For example, when performing PCA on training data using cpoPca, the rotation matrix is saved and can be used on new (prediction) data. As another example, consider a log-transformation of the target column in a regression problem. When predictions are made with new data, it may be useful to invert the transformation on the predicted values by exponentiating them.
The information created when a CPO is applied is saved in a CPORetrafo object and a CPOInverter object, both of which are stored as attributes of the result. The retrafo and inverter functions retrieve these objects. The attributes can also be set with the retrafo<- and inverter<- functions, using constructs like retrafo(data) <- retr.obj. The retrafo or inverter attribute can be reset individually by setting it to NULL, as in retrafo(data) <- NULL, or both can be reset at once using the clearRI function.
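The accessors described above can be sketched as follows. This is a minimal illustration, assuming the mlrCPO package is installed; the variable names are arbitrary, and is.nullcpo is used on the assumption that retrafo returns NULLCPO when no retrafo is attached.

```r
library("mlrCPO")  # also attaches mlr

# Applying a CPO stores a retrafo as an attribute of the result.
scaled = iris %>>% cpoScale()
ret = retrafo(scaled)

# Reset only the retrafo attribute.
retrafo(scaled) = NULL
was.reset = is.nullcpo(retrafo(scaled))

# Attach the saved retrafo object to the data again.
retrafo(scaled) = ret

# Remove the retrafo and inverter attributes in one call.
cleared = clearRI(scaled)
```

Setting the attribute by hand is mostly useful when a processing step in between has dropped or invalidated it.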
When %>>% is used to chain CPOs on a data object, the retrafo and inverter associated with the result are also chained automatically. Beware, however, that this just accesses the retrafo attribute internally. Therefore, if you plan to apply multiple transformations with other operations in between, make sure to reset the retrafo attribute by setting it to NULL or by using the clearRI function. See examples.
retrafo(data)
inverter(data)
retrafo(data) <- value
inverter(data) <- value
data | [ |
---|---|
value | [ |
[CPOTrained]. The retransformation function that can be applied to new data. This is a CPORetrafo object for retrafo or a CPOInverter object for inverter.
CPORetrafo and CPOInverter
CPORetrafo and CPOInverter objects are members of the CPOTrained class, which can be handled similarly to CPO objects: their hyperparameters can be inspected using getParamSet and getHyperPars, and print.CPOTrained is used for (possibly verbose) printing. To apply the retrafo or inverter transformation represented by the object to data, use the applyCPO function or the %>>% operator.
CPOTrained objects can be chained using %>>% or pipeCPO, and broken into primitives using as.list.CPOTrained. However, since CPOTrained objects represent transformations that relate closely to the data used to train them (and therefore to their position within a CPO pipeline), it is only advisable to chain or break apart CPOTrained pipes for inspection, or if you really know what you are doing.
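For inspection purposes, splitting and re-joining a trained pipeline might look like the sketch below. It assumes mlrCPO is installed and that cpoPca and cpoScale each contribute exactly one primitive to the pipeline; the variable names are illustrative.

```r
library("mlrCPO")

traindat = subsetTask(pid.task, 1:400)
preddat = subsetTask(pid.task, 401:768)

# Retrafo of a two-step pipeline.
ret = retrafo(traindat %>>% cpoPca() %>>% cpoScale())

# Break the pipeline into its primitive CPOTrained objects.
prims = as.list(ret)
n.prims = length(prims)

# Re-join the primitives in their original order.
rejoined = pipeCPO(prims)

# Both objects should transform new data in the same way.
same = all.equal(getTaskData(preddat %>>% ret),
  getTaskData(preddat %>>% rejoined))
```

Re-joining in the original order is the only safe use here; reordering the primitives would apply each step to data it was not trained for.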
(Primitive) CPORetrafo objects can be inspected using getCPOTrainedState, and it is possible to create new CPORetrafo objects from (possibly modified) retrafo state using makeCPOTrainedFromState.
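A round trip through the state could be sketched like this, assuming mlrCPO is installed. The layout of the state list depends on the CPO and is not relied on here; the state is passed back unmodified together with the cpoScale constructor.

```r
library("mlrCPO")

# Train a primitive retrafo on a data.frame.
ret = retrafo(iris %>>% cpoScale())

# Extract the state; its entries depend on the CPO that created it.
state = getCPOTrainedState(ret)
str(state)

# Rebuild a retrafo from the (here unmodified) state.
rebuilt = makeCPOTrainedFromState(cpoScale, state)

# The rebuilt retrafo should act like the original one.
same = all.equal(clearRI(iris %>>% ret), clearRI(iris %>>% rebuilt))
```

Modifying entries of the state before rebuilding gives a retrafo with different behaviour, so any change should be checked against the structure printed by str.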
CPORetrafo and CPOInverter
The fundamental difference between CPORetrafo and CPOInverter is that a CPORetrafo is created only when a CPO is applied to a data set, and is used to perform the same transformation on new (prediction) data. A CPOInverter is created whenever a CPO or CPORetrafo is applied to data (whether training or prediction data), and is used to invert the transformation done to the target column of a Task. Since this inversion may depend on the new prediction data, and not only on the training data fed to the CPO when the CPORetrafo was created, the CPOInverter object is more closely bound to the particular data set used to create it.
In some cases a target transformation is independent of the data used to create it (e.g. log-transform of a regression target column); in that case the CPORetrafo can be used with invert. This is the concept of CPOTrainedCapability, which can be queried using getCPOTrainedCapability.
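The log-transform case can be sketched as follows. This assumes mlrCPO is installed and uses cpoLogTrafoRegr together with the bh.task regression task and the "regr.lm" learner from mlr; these choices are illustrative and not prescribed by this page.

```r
library("mlrCPO")

# Log-transform the target column of a regression task.
logged = bh.task %>>% cpoLogTrafoRegr()
ret = retrafo(logged)
inv = inverter(logged)

# Query what each trained object can be used for.
getCPOTrainedCapability(ret)
getCPOTrainedCapability(inv)

# Fit on the log scale, predict, then map predictions back.
model = train("regr.lm", logged)
pred = predict(model, logged)
pred.orig = invert(inv, pred)

# The log-transform does not depend on the data, so the retrafo
# can be used with invert as well.
pred.orig2 = invert(ret, pred)
```

For a target transformation that does depend on the prediction data, only the inverter obtained from that particular data set would be appropriate.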
CPORetrafo
CPORetrafo objects can be applied to new data sets using the %>>% operator, the applyCPO generic, or the predict generic, all of which perform the same action.
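The three ways of applying a CPORetrafo can be compared side by side. A minimal sketch, assuming mlrCPO is installed and reusing the pid.task split from the examples:

```r
library("mlrCPO")

traindat = subsetTask(pid.task, 1:400)
preddat = subsetTask(pid.task, 401:768)

# Retrafo trained on the first 400 rows.
ret = retrafo(traindat %>>% cpoPca())

# Apply the same retrafo to new data in three ways.
by.pipe = getTaskData(preddat %>>% ret)
by.apply = getTaskData(applyCPO(ret, preddat))
by.predict = getTaskData(predict(ret, preddat))

# All three should give the same transformed data.
same.apply = all.equal(by.pipe, by.apply)
same.predict = all.equal(by.pipe, by.predict)
```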
CPOInverter
To use a CPOInverter, use the invert function.
clearRI, about the need to sometimes reset the retrafo and inverter attributes.
Other CPO lifecycle related: CPOConstructor, CPOLearner, CPO, NULLCPO, %>>%(), attachCPO(), composeCPO(), getCPOClass(), getCPOConstructor(), getCPOTrainedCPO(), identicalCPO(), makeCPO()
Other retrafo related: NULLCPO, %>>%(), applyCPO(), as.list.CPO, clearRI(), getCPOClass(), getCPOName(), getCPOOperatingType(), getCPOPredictType(), getCPOProperties(), getCPOTrainedCPO(), getCPOTrainedCapability(), getCPOTrainedState(), is.retrafo(), makeCPOTrainedFromState(), pipeCPO(), print.CPOConstructor()
Other inverter related: NULLCPO, %>>%(), applyCPO(), as.list.CPO, clearRI(), getCPOClass(), getCPOName(), getCPOOperatingType(), getCPOPredictType(), getCPOProperties(), getCPOTrainedCPO(), getCPOTrainedCapability(), getCPOTrainedState(), is.inverter(), makeCPOTrainedFromState(), pipeCPO(), print.CPOConstructor()
traindat = subsetTask(pid.task, 1:400)
preddat = subsetTask(pid.task, 401:768)

trained = traindat %>>% cpoPca()
reFun = retrafo(trained)
predicted = preddat %>>% reFun
head(getTaskData(predicted))
#>     diabetes        PC1        PC2        PC3        PC4        PC5        PC6
#> 401      pos -84.533128 -16.377902   5.953769  15.471857 -2.5141001  6.1500829
#> 402      neg -80.383496  27.275779   9.607740  17.130030 15.8863960 -5.2785909
#> 403      pos   9.548944  13.030095 -17.953331 -17.919489  2.2009380 -4.1929553
#> 404      neg -85.240015 -39.456703 -15.756013   0.639444  9.4488413 -0.4820485
#> 405      pos -77.210758  56.209452  10.437033   7.524593  0.1793082  2.0727927
#> 406      neg  83.877962  -9.720671  19.047282 -13.157908  0.8537302  9.1316199
#>            PC7        PC8
#> 401  0.5578393 0.20910205
#> 402 -1.8236278 0.31616133
#> 403  0.4826818 0.25252337
#> 404  4.1093591 0.07530882
#> 405 -0.4976554 0.34331083
#> 406 -0.9156816 0.07432789

# chaining works
trained = traindat %>>% cpoPca() %>>% cpoScale()
reFun = retrafo(trained)
predicted = preddat %>>% reFun
head(getTaskData(predicted))
#>     diabetes        PC1        PC2        PC3         PC4         PC5
#> 401      pos -0.6907341 -0.5287483  0.3071310  1.11929566 -0.24772162
#> 402      neg -0.6568268  0.8805780  0.4956246  1.23925448  1.56533296
#> 403      pos  0.0780260  0.4206668 -0.9261401 -1.29636705  0.21686484
#> 404      neg -0.6965102 -1.2738300 -0.8127893  0.04625992  0.93102191
#> 405      pos -0.6309018  1.8146799  0.5384045  0.54435893  0.01766776
#> 406      neg  0.6853806 -0.3138246  0.9825726 -0.95189535  0.08412053
#>             PC6        PC7       PC8
#> 401  0.86367029  0.2031152 0.6241936
#> 402 -0.74128466 -0.6640023 0.9437779
#> 403 -0.58882636  0.1757496 0.7538113
#> 404 -0.06769518  1.4962614 0.2248055
#> 405  0.29108705 -0.1812016 1.0248223
#> 406  1.28237437 -0.3334094 0.2218773

# reset the retrafo when doing other steps!
trained.tmp = traindat %>>% cpoPca()
reFun1 = retrafo(trained.tmp)
imp = impute(trained.tmp)
trained.tmp = imp$task  # nonsensical example
retrafo(trained.tmp) = NULL  # NECESSARY HERE
trained = trained.tmp %>>% cpoScale()
reFun2 = retrafo(trained)
predicted = getTaskData(reimpute(preddat %>>% reFun1, imp$desc), target.extra = TRUE)$data %>>% reFun2