public class

DecisionTree

Extends:

Estimator → Classifier → DecisionTree

Decision tree learner. Builds a decision tree by recursively splitting the samples on one feature at a time, greedily choosing a split at each node.

Constructor Summary

Public Constructor
public

constructor(optionsUser: Object)

Constructor.

Member Summary

Public Members
public

criterion: *

public

maxDepth: *

public

numFeatures: *

public

numFeaturesInt: *

public

tree: *

Method Summary

Public Methods
public

buildTree(XSub: Array<Array<number>>, ySub: Array<mixed>, depth: number): DecisionTreeNode

Build a (sub-)tree from a set of samples.

public

calculateImpurity(groups: Array<Array<mixed>>): number

Calculate the impurity for multiple groups of labels.

public

calculateWeightedImpurity(groups: Array<Array<mixed>>, impurityCallback: function(labels: Array<number>): number): number

Calculate the weighted impurity for multiple groups of labels.

public

entropy(labels: Array<mixed>): number

Calculate the Shannon entropy of a set of labels.

public

findSplit(XSub: Array<Array<number>>, ySub: Array<mixed>, baseImpurity: number): DataSplit

Find the best splitting feature and feature value for a set of data points.

public

gini(labels: Array<mixed>): number

Calculate the Gini impurity of a set of labels.

public

predict(X: *): *

public

predictSample(sampleFeatures: Array<number>): mixed

Make a prediction for a single sample.

public

splitSamples(XSub: Array<number>, ySub: Array<mixed>, fInd: number, splitValue: number): DataSplitGroups

Split a set of samples into two groups by some splitting value for a feature.

public

train(X: *, y: *)

Inherited Summary

From class Estimator
public abstract

predict(X: Array<Array<number>>): Array<mixed>

Make a prediction for a data set.

public abstract

train(X: Array<Array<number>>, y: Array<mixed>)

Train the supervised learning algorithm on a dataset.

Public Constructors

public constructor(optionsUser: Object)

Constructor. Initialize class members and store user-defined options.

Params:

Name Type Attribute Description
optionsUser Object
  • optional

User-defined options for the decision tree

optionsUser.criterion string
  • optional
  • default: 'gini'

Splitting criterion. Either 'gini' for the Gini impurity or 'entropy' for the Shannon entropy

optionsUser.numFeatures number | string
  • optional
  • default: 1.0

Number of features to subsample at each node. Either a number (float), interpreted as the fraction of features to use (e.g., 1.0 for all features), or a string. Supported strings are 'sqrt' and 'log2', which make the algorithm use sqrt(n) and log2(n) features, respectively, where n is the total number of features

optionsUser.maxDepth number
  • optional
  • default: -1

Maximum depth of the tree. The depth of the tree is the number of nodes in the longest path from the decision tree root to a leaf. It is an indicator of the complexity of the tree. Use -1 for no maximum depth
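The numFeatures option can be read as a rule that turns the total feature count n into an integer number of features to sample per node. A minimal sketch of such a rule is shown here; the helper name resolveNumFeatures, the rounding with Math.floor, and the clamp to the range 1..n are assumptions for illustration and may differ from what the class actually does.

```javascript
// Hypothetical helper: map a numFeatures option to an integer feature count.
// Rounding and clamping are assumptions, not confirmed library behaviour.
function resolveNumFeatures(numFeatures, n) {
  let count;
  if (numFeatures === 'sqrt') {
    count = Math.sqrt(n);
  } else if (numFeatures === 'log2') {
    count = Math.log2(n);
  } else if (typeof numFeatures === 'number') {
    count = numFeatures * n; // fraction of all features
  } else {
    throw new Error(`Unsupported numFeatures value: ${numFeatures}`);
  }
  // Use at least one feature and never more than n.
  return Math.min(n, Math.max(1, Math.floor(count)));
}

console.log(resolveNumFeatures(1.0, 20));    // 20
console.log(resolveNumFeatures(0.5, 20));    // 10
console.log(resolveNumFeatures('sqrt', 20)); // 4
console.log(resolveNumFeatures('log2', 20)); // 4
```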

Public Members

public criterion: *

public maxDepth: *

public numFeatures: *

public numFeaturesInt: *

public tree: *

Public Methods

public buildTree(XSub: Array<Array<number>>, ySub: Array<mixed>, depth: number): DecisionTreeNode

Build a (sub-)tree from a set of samples.

Params:

Name Type Attribute Description
XSub Array<Array<number>>

Features of samples to build a tree for

ySub Array<mixed>

Labels of samples

depth number
  • optional
  • default: 0

Current tree depth. 0 indicates the root node

Return:

DecisionTreeNode

Decision tree node

public calculateImpurity(groups: Array<Array<mixed>>): number

Calculate the impurity for multiple groups of labels. The impurity criterion is chosen through the user-defined criterion option ('gini' or 'entropy').

Params:

Name Type Attribute Description
groups Array<Array<mixed>>

Groups of labels. Each group is an array of labels

Return:

number

Impurity for the provided groups

public calculateWeightedImpurity(groups: Array<Array<mixed>>, impurityCallback: function(labels: Array<number>): number): number

Calculate the weighted impurity for multiple groups of labels. The returned impurity is calculated as the weighted sum of the impurities of the individual groups, where the weights are determined by the number of samples in the group.

Params:

Name Type Attribute Description
groups Array<Array<mixed>>

Groups of labels. Each group is an array of labels

impurityCallback function(labels: Array<number>): number

Callback function taking an array of labels as its first and only argument and returning the impurity of those labels

Return:

number

Weighted impurity for the provided groups
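The weighting described above can be sketched as follows. This is an illustration only: it assumes each group's weight is its share of the total number of samples, and the toy callback mismatchRate is a hypothetical stand-in for a real impurity measure.

```javascript
// Sketch: weighted impurity of several label groups.
// Assumption: each group's weight is its share of the total sample count.
function calculateWeightedImpurity(groups, impurityCallback) {
  const total = groups.reduce((sum, group) => sum + group.length, 0);
  if (total === 0) return 0;
  return groups.reduce(
    (sum, group) => sum + (group.length / total) * impurityCallback(group),
    0
  );
}

// Toy impurity (hypothetical): fraction of labels that differ from the first.
const mismatchRate = (labels) =>
  labels.length === 0
    ? 0
    : labels.filter((label) => label !== labels[0]).length / labels.length;

const groups = [['a', 'a', 'b', 'b'], ['a', 'a', 'a', 'a']];
console.log(calculateWeightedImpurity(groups, mismatchRate)); // 0.25
```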

public entropy(labels: Array<mixed>): number

Calculate the Shannon entropy of a set of labels.

Params:

Name Type Attribute Description
labels Array<mixed>

Array of labels

Return:

number

Shannon entropy
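For reference, Shannon entropy is the negative sum of p * log(p) over the class proportions p. The sketch below is a standalone illustration, not the class's source; the use of a base-2 logarithm (entropy in bits) is an assumption.

```javascript
// Sketch: Shannon entropy of a label array, in bits.
// Assumption: base-2 logarithm; the class may use a different base.
function entropy(labels) {
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  let result = 0;
  for (const count of counts.values()) {
    const p = count / labels.length; // proportion of this class
    result -= p * Math.log2(p);
  }
  return result;
}

console.log(entropy(['a', 'a', 'b', 'b'])); // 1
console.log(entropy(['a', 'a', 'a']));      // 0
```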

public findSplit(XSub: Array<Array<number>>, ySub: Array<mixed>, baseImpurity: number): DataSplit

Find the best splitting feature and feature value for a set of data points.

Params:

Name Type Attribute Description
XSub Array<Array<number>>

Features of samples to find the split for

ySub Array<mixed>

Labels of samples

baseImpurity number

Impurity of parent node

Return:

DataSplit
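One way to search for a split is to try every feature and every observed feature value, and keep the candidate that reduces the parent impurity the most. The sketch below illustrates that idea under several assumptions: thresholds are taken from the observed values, samples with a value below the threshold go left, Gini impurity is the criterion, and the returned object shape ({ feature, value, gain }) is hypothetical rather than the actual DataSplit structure.

```javascript
// Gini impurity, used here as the impurity measure.
function gini(labels) {
  if (labels.length === 0) return 0;
  const counts = new Map();
  for (const label of labels) counts.set(label, (counts.get(label) || 0) + 1);
  let sumSquares = 0;
  for (const count of counts.values()) sumSquares += (count / labels.length) ** 2;
  return 1 - sumSquares;
}

// Sketch: exhaustive search for the split with the largest impurity reduction.
// The result shape is hypothetical; feature -1 means no useful split was found.
function findSplit(XSub, ySub, baseImpurity) {
  let best = { feature: -1, value: NaN, gain: 0 };
  const numFeatures = XSub.length > 0 ? XSub[0].length : 0;
  for (let f = 0; f < numFeatures; f += 1) {
    const candidates = [...new Set(XSub.map((row) => row[f]))];
    for (const value of candidates) {
      const left = [];
      const right = [];
      XSub.forEach((row, i) => (row[f] < value ? left : right).push(ySub[i]));
      if (left.length === 0 || right.length === 0) continue;
      const weighted =
        (left.length * gini(left) + right.length * gini(right)) / ySub.length;
      const gain = baseImpurity - weighted;
      if (gain > best.gain) best = { feature: f, value, gain };
    }
  }
  return best;
}

const X = [[1, 7], [2, 3], [8, 6], [9, 4]];
const y = ['a', 'a', 'b', 'b'];
console.log(findSplit(X, y, gini(y))); // { feature: 0, value: 8, gain: 0.5 }
```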

public gini(labels: Array<mixed>): number

Calculate the Gini impurity of a set of labels.

Params:

Name Type Attribute Description
labels Array<mixed>

Array of labels

Return:

number

Gini impurity
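Gini impurity is commonly defined as 1 minus the sum of the squared class proportions. The following standalone sketch computes it that way; it is an illustration and not necessarily how the class implements it.

```javascript
// Sketch: Gini impurity of a label array, 1 - sum(p_k^2).
function gini(labels) {
  if (labels.length === 0) return 0;
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  let sumSquares = 0;
  for (const count of counts.values()) {
    const p = count / labels.length; // proportion of this class
    sumSquares += p * p;
  }
  return 1 - sumSquares;
}

console.log(gini(['a', 'a', 'b', 'b'])); // 0.5
console.log(gini(['a', 'a', 'a', 'a'])); // 0
```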

public predict(X: *): *

Make a prediction for a data set.

Override:

Estimator#predict

Params:

Name Type Attribute Description
X *

Return:

*


public predictSample(sampleFeatures: Array<number>): mixed

Make a prediction for a single sample.

Params:

Name Type Attribute Description
sampleFeatures Array<number>

Data point features

Return:

mixed

Prediction. Label assigned to the sample by traversing the decision tree
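Predicting a single sample with a decision tree generally means walking from the root to a leaf, choosing a branch at each node by comparing one feature against that node's split value. The sketch below shows this traversal; the node shape ({ feature, value, left, right } for internal nodes and { label } for leaves) and the extra node argument are hypothetical and do not reflect the DecisionTreeNode API.

```javascript
// Sketch: walk a decision tree from the root to a leaf.
// The node shape is hypothetical, not the DecisionTreeNode API.
function predictSample(node, sampleFeatures) {
  let current = node;
  // Internal nodes have no label; descend until a leaf is reached.
  while (current.label === undefined) {
    current = sampleFeatures[current.feature] < current.value
      ? current.left
      : current.right;
  }
  return current.label;
}

const tree = {
  feature: 0,
  value: 5,
  left: { label: 'a' },
  right: { feature: 1, value: 2, left: { label: 'b' }, right: { label: 'c' } },
};
console.log(predictSample(tree, [3, 9])); // a
console.log(predictSample(tree, [7, 1])); // b
console.log(predictSample(tree, [7, 4])); // c
```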

public splitSamples(XSub: Array<number>, ySub: Array<mixed>, fInd: number, splitValue: number): DataSplitGroups

Split a set of samples into two groups by some splitting value for a feature. The samples with a feature value lower than the split value go to the left (first) group, and the other samples go to the right (second) group.

Params:

Name Type Attribute Description
XSub Array<number>

Features of samples to split by some feature

ySub Array<mixed>

Labels of samples

fInd number

Index of feature to split by

splitValue number

Value to be used as the splitting point for the feature

Return:

DataSplitGroups

Assigned sample indices, features, and labels for both of the groups
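The splitting rule described above can be sketched as follows. The sketch treats XSub as an array of feature vectors, and the field names of the returned object (indices, features, labels) are hypothetical, since the real DataSplitGroups structure is not shown in this reference.

```javascript
// Sketch: split samples on one feature.
// Field names of the returned object are hypothetical.
function splitSamples(XSub, ySub, fInd, splitValue) {
  const groups = {
    indices: [[], []],
    features: [[], []],
    labels: [[], []],
  };
  XSub.forEach((sample, i) => {
    // Strictly lower values go left (0); all others go right (1).
    const side = sample[fInd] < splitValue ? 0 : 1;
    groups.indices[side].push(i);
    groups.features[side].push(sample);
    groups.labels[side].push(ySub[i]);
  });
  return groups;
}

const X = [[1, 5], [3, 2], [4, 8]];
const y = ['a', 'b', 'b'];
console.log(splitSamples(X, y, 0, 3).labels); // [ [ 'a' ], [ 'b', 'b' ] ]
```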

public train(X: *, y: *)

Train the supervised learning algorithm on a dataset.

Override:

Estimator#train

Params:

Name Type Attribute Description
X *
y *
