The properties of a CPO object determine what kind of data the CPO can handle and how it transforms that data.

By default, this function returns a list with three elements: $handling, $adding, and $needed.

$handling lists the data properties the CPO can handle. If a CPO is applied to a data set (using %>>% or applyCPO, or indirectly when a CPOLearner is trained) that has a property not listed in $handling, an error is thrown.
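
The check described above is, in effect, a set comparison. The base-R sketch below illustrates it, assuming data properties are given as a character vector; check_handling() is a hypothetical name, not an mlrCPO function, since the framework performs this check internally.

```r
# Hypothetical sketch (not part of mlrCPO): reject data whose properties
# are not all listed in the CPO's $handling.
check_handling <- function(data.props, handling) {
  unsupported <- setdiff(data.props, handling)
  if (length(unsupported) > 0) {
    stop("CPO cannot handle data with properties: ",
         paste(unsupported, collapse = ", "))
  }
  invisible(TRUE)
}

handling <- c("numerics", "factors", "missings")
check_handling(c("numerics", "missings"), handling)    # passes silently
# check_handling(c("numerics", "ordered"), handling)   # would stop(): "ordered"
```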

$adding can contain one or more of the same values as $handling. These properties get added to the capabilities of a Learner or CPO coming after this CPO: the CPO takes care of them, so its successor does not have to handle them itself. When a CPO imputes missing values, for example, $adding is “missings”. $adding is always a subset of $handling.
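
One way to picture $adding is as a set difference: properties in $adding are removed from what the successor has to cope with. The base-R sketch below models only that idea; remaining_props() is a hypothetical helper, and mlrCPO's actual property propagation may involve more than this.

```r
# Hypothetical simplification (not mlrCPO's internal code): properties listed
# in $adding are taken care of by the CPO, so the Learner or CPO behind it
# no longer has to handle them.
remaining_props <- function(data.props, adding) {
  setdiff(data.props, adding)
}

# An imputing CPO has $adding = "missings":
remaining_props(c("numerics", "missings"), adding = "missings")
# returns "numerics"
```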

$needed can contain one or more of the same values as $handling. These properties are required from a Learner (or CPO) coming after this CPO. E.g., when a CPO converts factors to numerics, $needed is “numerics” (and $adding is “factors”). $adding and $needed never have any value in common.
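
The two invariants stated above can be checked mechanically. This base-R sketch assumes a property list shaped like the return value of getCPOProperties(); valid_properties() is hypothetical, and the $handling values of the two example CPOs are made up for illustration.

```r
# Hypothetical validator (not part of mlrCPO) for the documented invariants:
# $adding is a subset of $handling, and $adding and $needed are disjoint.
valid_properties <- function(props) {
  all(props$adding %in% props$handling) &&
    length(intersect(props$adding, props$needed)) == 0
}

# Property lists for the two examples from the text
# ($handling values are illustrative assumptions).
impute.props <- list(handling = c("numerics", "factors", "missings"),
                     adding = "missings", needed = character(0))
dummy.props <- list(handling = c("numerics", "factors"),
                    adding = "factors", needed = "numerics")

valid_properties(impute.props)  # TRUE
valid_properties(dummy.props)   # TRUE
```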

There are two more properties, mostly for internal use: $adding.min and $needed.max. They are used to check the return values of trafo / retrafo functions. Some hyperparameter settings may lead a CPO to return values that do not conform to its properties (e.g. not removing all ‘missings’, or creating ‘missings’ where there were none before), while under other settings the CPO does conform. In that case it is desirable to treat the CPO as it behaves in the best case, and to rely on the user to make good hyperparameter choices. The properties discussed so far therefore describe the CPO on its ‘best’ behaviour. Internally, each CPO also has a list of properties that it minimally ‘adds’ to its successors, and a list of properties that it maximally ‘needs’ from them in the worst case. These are $adding.min and $needed.max. $adding.min is always a subset of $adding, and $needed.max is always a superset of $needed. The CPO framework checks compliance with these, so a CPO whose output violates them causes an error.
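
The worst-case check can be sketched as follows, under one reading of the rules above: the output must not contain any property from $adding.min, and any property that appears in the output but not in the input must be allowed by $needed.max. compliance_violations() is a hypothetical helper, not mlrCPO's internal check.

```r
# Hypothetical sketch (not mlrCPO's internal check): return the properties
# that violate the worst-case contract given by $adding.min and $needed.max.
compliance_violations <- function(input.props, output.props,
                                  adding.min, needed.max) {
  c(intersect(output.props, adding.min),                        # should be gone
    setdiff(setdiff(output.props, input.props), needed.max))    # newly created
}

# An imputing CPO that may leave missings in place for some hyperparameters,
# so its $adding.min is empty: leaving "missings" is not a violation.
compliance_violations(c("numerics", "missings"), c("numerics", "missings"),
                      adding.min = character(0), needed.max = character(0))
# returns character(0)

# Creating "missings" where there were none, not allowed by $needed.max:
compliance_violations(c("numerics"), c("numerics", "missings"),
                      adding.min = character(0), needed.max = character(0))
# returns "missings"
```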

Usage

getCPOProperties(cpo, only.data = FALSE, get.internal = FALSE)

# S3 method for CPOTrained
getCPOProperties(cpo, only.data = FALSE, get.internal = FALSE)

Arguments

cpo

[CPO]
The CPO whose properties are returned.

only.data

[logical(1)]
Only get the CPO data properties (not target or task type properties). Default is FALSE.

get.internal

[logical(1)]
Also retrieve $adding.min and $needed.max. Default is FALSE.

Value

[list]. A list with slots $handling, $adding, and $needed; also $adding.min and $needed.max if get.internal is TRUE.

Possible properties

data properties

“numerics”, “factors”, “ordered”, “missings”: Whether any data column is of the type in question, or whether any column contains missing values. When only.data is TRUE, only these are returned.
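
To make the four data properties concrete, the base-R sketch below derives them from a data.frame. data_properties() is hypothetical, and it assumes that “factors” refers to unordered factors only, with ordered factors counted as “ordered”.

```r
# Hypothetical helper (not part of mlrCPO): derive the data properties of a
# data.frame. Assumption: "factors" means unordered factors only.
data_properties <- function(df) {
  cols <- as.list(df)
  props <- c(
    numerics = any(vapply(cols, is.numeric, logical(1))),
    factors  = any(vapply(cols, function(x) is.factor(x) && !is.ordered(x),
                          logical(1))),
    ordered  = any(vapply(cols, is.ordered, logical(1))),
    missings = any(vapply(cols, anyNA, logical(1)))
  )
  names(props)[props]
}

df <- data.frame(x = c(1.5, NA, 3),
                 g = factor(c("a", "b", "a")),
                 size = factor(c("S", "M", "L"), levels = c("S", "M", "L"),
                               ordered = TRUE))
data_properties(df)
# returns c("numerics", "factors", "ordered", "missings")
```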

task type properties

“cluster”, “classif”, “multilabel”, “regr”, “surv”: The type of the task. data.frame data objects have the implicit property “cluster”.

target properties

“oneclass”, “twoclass”, “multiclass”: Whether the target column of a classif task has one, two, or more classes.
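
The target property of a classification task follows from its number of classes. The sketch assumes the target is a factor and counts its levels; target_property() is a hypothetical helper, not part of mlrCPO or mlr.

```r
# Hypothetical helper (not part of mlrCPO or mlr): map the number of levels
# of a classification target factor to its target property.
target_property <- function(target) {
  n <- nlevels(target)
  if (n == 1) "oneclass" else if (n == 2) "twoclass" else "multiclass"
}

target_property(factor(c("a", "a")))            # "oneclass"
target_property(factor(c("a", "b", "a")))       # "twoclass"
target_property(factor(c("a", "b", "c", "a")))  # "multiclass"
```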

See also