The properties of a CPO object determine what kind of data the CPO can handle and how it transforms that data.
By default, this function returns a list of three values: $handling, $adding, and $needed.
$handling determines what data the CPO handles. If a CPO is applied to a data set (using %>>% or applyCPO, or indirectly when a CPOLearner is trained) that has a property not listed in $handling, an error is raised.
$adding can be one or many of the same values as $handling. These properties get added to a Learner or CPO coming after this CPO. When a CPO imputes missing values, for example, this is “missings”. $adding is always a subset of $handling.
$needed can be one or many of the same values as $handling. These properties are required from a Learner (or CPO) coming after this CPO. E.g., when a CPO converts factors to numerics, this is “numerics” (and $adding would be “factors” in this case).
$adding and $needed never have any value in common.
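The relations between the three property sets can be illustrated with plain set operations. The following is a minimal base-R sketch, not mlrCPO code: the property values for the imputing CPO are made up for illustration, and `check_property_relations` and `check_data_handled` are hypothetical helper names. The final `union()` call is a simplified picture of how $adding extends what a following Learner can cope with.

```r
# Hypothetical property sets for a CPO that imputes missing values.
props <- list(
  handling = c("numerics", "factors", "ordered", "missings"),
  adding   = "missings",
  needed   = character(0)
)

# Illustrative helper (not an mlrCPO function): check the stated relations.
check_property_relations <- function(p) {
  stopifnot(
    all(p$adding %in% p$handling),              # $adding is a subset of $handling
    length(intersect(p$adding, p$needed)) == 0  # $adding and $needed are disjoint
  )
  invisible(TRUE)
}

# Illustrative helper: reject data that has a property not listed in $handling.
check_data_handled <- function(data.props, p) {
  unhandled <- setdiff(data.props, p$handling)
  if (length(unhandled) > 0) {
    stop("CPO cannot handle data with properties: ",
         paste(unhandled, collapse = ", "))
  }
  invisible(TRUE)
}

check_property_relations(props)
check_data_handled(c("numerics", "missings"), props)

# Simplified view: a learner that cannot handle missings on its own
# can be given data with missings once this CPO precedes it.
learner.props <- c("numerics", "factors")
union(learner.props, props$adding)
```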
There are two more properties, mostly for internal use: $adding.min and $needed.max.
These are used for internal checking of trafo / retrafo function return values. If some hyperparameter settings lead to a CPO returning values that do not conform to its properties (e.g. not removing all ‘missings’, or creating ‘missings’ where there were none before), while in other cases the CPO does conform, it is desirable to treat the CPO as if it behaves as in the best case (and rely on the user to make good hyperparameter choices).
The properties discussed so far thus represent the CPO on its ‘best’ behaviour.
Internally, each CPO also has a list of properties that it minimally ‘adds’ to its successors ($adding.min) or maximally ‘needs’ from them ($needed.max) in the worst case.
$adding.min is always a subset of $adding, and $needed.max is always a superset of $needed.
Compliance with these bounds is checked by the CPO framework, so a CPO that does not conform to them fails with an error.
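The subset and superset relations for the internal bounds can be written down directly. This is a base-R sketch under assumptions: the two property lists describe hypothetical CPOs (one that may fail to impute everything, one that may introduce missings), and `check_internal_bounds` is an illustrative helper, not the check that the CPO framework itself performs.

```r
# Hypothetical CPO that removes all missings only for some hyperparameter settings.
maybe.imputer <- list(
  adding     = "missings",    # best case: all missings are removed
  adding.min = character(0),  # worst case: nothing is guaranteed to be added
  needed     = character(0),
  needed.max = character(0)
)

# Hypothetical CPO that creates missings only for some hyperparameter settings.
maybe.creates.na <- list(
  adding     = character(0),
  adding.min = character(0),
  needed     = character(0),  # best case: nothing required from the successor
  needed.max = "missings"     # worst case: the successor must handle missings
)

# Illustrative helper: verify the bounds described above.
check_internal_bounds <- function(p) {
  stopifnot(
    all(p$adding.min %in% p$adding),  # $adding.min is a subset of $adding
    all(p$needed %in% p$needed.max)   # $needed.max is a superset of $needed
  )
  invisible(TRUE)
}

check_internal_bounds(maybe.imputer)
check_internal_bounds(maybe.creates.na)
```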
getCPOProperties(cpo, only.data = FALSE, get.internal = FALSE)

# S3 method for CPOTrained
getCPOProperties(cpo, only.data = FALSE, get.internal = FALSE)
cpo | [ |
---|---|
only.data | [ |
get.internal | [ |
[list]. A list with slots $handling, $adding, and $needed; also $adding.min and $needed.max if get.internal is TRUE.
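A short usage sketch may help to show the shape of the return value. It assumes that the mlr and mlrCPO packages are installed, and uses cpoScale() and mlr's iris.task purely as convenient examples; the printed contents depend on the installed version and are therefore not shown.

```r
library(mlr)     # provides iris.task
library(mlrCPO)

cpo <- cpoScale()

# Default call: the three "best case" property sets.
props <- getCPOProperties(cpo)
str(props)

# Restrict to data properties and also retrieve the internal bounds.
props.internal <- getCPOProperties(cpo, only.data = TRUE, get.internal = TRUE)
names(props.internal)

# Applying a CPO attaches a CPOTrained object to the result; retrafo()
# extracts it, and getCPOProperties() has a method for that class too.
trained <- retrafo(iris.task %>>% cpoScale())
str(getCPOProperties(trained))
```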
“numerics”, “factors”, “ordered”, “missings”:
Whether any data column contains the type in question, or has missings. When only.data is TRUE, only these are returned.
“cluster”, “classif”, “multilabel”, “regr”, “surv”:
The type of the task. data.frame data objects have the implicit property “cluster”.
“oneclass”, “twoclass”, “multiclass”:
Whether the target column of a classif task has one, two, or more classes.
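To make the data-related properties concrete, the sketch below derives them from a data.frame in base R. `data_properties` is a hypothetical helper written for this illustration; mlrCPO's own detection may differ in detail (for instance in how it treats other column types).

```r
# Illustrative helper (not part of mlrCPO): which data properties does df have?
data_properties <- function(df) {
  has <- function(pred) any(vapply(df, pred, logical(1)))
  c(
    if (has(is.numeric)) "numerics",
    # Ordered factors are also factors in R, so exclude them here.
    if (has(function(x) is.factor(x) && !is.ordered(x))) "factors",
    if (has(is.ordered)) "ordered",
    if (has(anyNA)) "missings"
  )
}

df <- data.frame(
  x = c(1.5, NA, 3),
  g = factor(c("a", "b", "a")),
  o = factor(c("lo", "hi", "lo"), levels = c("lo", "hi"), ordered = TRUE)
)
data_properties(df)
```

A CPO whose $handling lacks any of the returned values would reject this data set.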
Other getters and setters:
CPO, getCPOAffect(), getCPOClass(), getCPOConstructor(), getCPOId(), getCPOName(), getCPOOperatingType(), getCPOPredictType(), getCPOTrainedCPO(), getCPOTrainedCapability(), setCPOId()
Other retrafo related:
CPOTrained, NULLCPO, %>>%(), applyCPO(), as.list.CPO, clearRI(), getCPOClass(), getCPOName(), getCPOOperatingType(), getCPOPredictType(), getCPOTrainedCPO(), getCPOTrainedCapability(), getCPOTrainedState(), is.retrafo(), makeCPOTrainedFromState(), pipeCPO(), print.CPOConstructor()
Other inverter related:
CPOTrained, NULLCPO, %>>%(), applyCPO(), as.list.CPO, clearRI(), getCPOClass(), getCPOName(), getCPOOperatingType(), getCPOPredictType(), getCPOTrainedCPO(), getCPOTrainedCapability(), getCPOTrainedState(), is.inverter(), makeCPOTrainedFromState(), pipeCPO(), print.CPOConstructor()