import DecisionTree from '@jsmlt/jsmlt/src/supervised/trees/decision-tree.js'

public class | source

DecisionTree

Extends:

Estimator → Classifier → DecisionTree

Decision tree learner. Builds a decision tree by greedily and recursively splitting the samples on one feature at a time.

Constructor Summary

Public Constructor
public	constructor(optionsUser: Object) Constructor.

Member Summary

Public Members
public	criterion: *
public	maxDepth: *
public	numFeatures: *
public	numFeaturesInt: *
public	tree: *

Method Summary

Public Methods
public	buildTree(XSub: Array<Array<number>>, ySub: Array<mixed>, depth: number): DecisionTreeNode Build a (sub-)tree from a set of samples.
public	calculateImpurity(groups: Array<Array<mixed>>): number Calculate the impurity for multiple groups of labels.
public	calculateWeightedImpurity(groups: Array<Array<mixed>>, impurityCallback: function(labels: Array<number>): number): number Calculate the weighted impurity for multiple groups of labels.
public	entropy(labels: Array<mixed>): number Calculate the Shannon entropy of a set of labels.
public	findSplit(XSub: Array<Array<number>>, ySub: Array<mixed>, baseImpurity: number): DataSplit Find the best splitting feature and feature value for a set of data points.
public	gini(labels: Array<mixed>): number Calculate the Gini impurity of a set of labels.
public	predict(X: *): * Make a prediction for a data set.
public	predictSample(sampleFeatures: Array<number>): mixed Make a prediction for a single sample.
public	splitSamples(XSub: Array<number>, ySub: Array<mixed>, fInd: number, splitValue: number): DataSplitGroups Split a set of samples into two groups by some splitting value for a feature.
public	train(X: *, y: *) Train the supervised learning algorithm on a dataset.

Inherited Summary

From class Estimator
public abstract	predict(X: Array<Array<number>>): Array<mixed> Make a prediction for a data set.
public abstract	train(X: Array<Array<number>>, y: Array<mixed>) Train the supervised learning algorithm on a dataset.

Public Constructors

public constructor(optionsUser: Object) source

Constructor. Initialize class members and store user-defined options.

Params:

Name	Type	Attribute	Description
optionsUser	Object	optional	User-defined options for decision tree
optionsUser.criterion	string	optional default: 'gini'	Splitting criterion. Either 'gini' for the Gini impurity, or 'entropy' for the Shannon entropy
optionsUser.numFeatures	number | string	optional default: 1.0	Number of features to subsample at each node. Either a number (float), interpreted as the fraction of features to use (e.g., 1.0 for all features), or a string. The strings 'sqrt' and 'log2' are supported, causing the algorithm to use sqrt(n) and log2(n) features, respectively (where n is the total number of features)
optionsUser.maxDepth	number	optional default: -1	Maximum depth of the tree. The depth of the tree is the number of nodes in the longest path from the decision tree root to a leaf. It is an indicator of the complexity of the tree. Use -1 for no maximum depth
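
The numFeatures option can be read as a rule that turns the total feature count n into an integer number of features per node. The following is a minimal sketch of such a rule, not the library's implementation: the helper name resolveNumFeatures, rounding down, and clamping the result between 1 and n are all assumptions made for illustration.

```javascript
// Hypothetical helper: resolve a numFeatures option into an integer count.
// Rounding down and clamping to [1, n] are assumptions, not documented behaviour.
function resolveNumFeatures(numFeatures, n) {
  let count;
  if (numFeatures === 'sqrt') {
    count = Math.sqrt(n);
  } else if (numFeatures === 'log2') {
    count = Math.log2(n);
  } else if (typeof numFeatures === 'number') {
    count = numFeatures * n; // interpret the number as a fraction of all features
  } else {
    throw new Error(`Unsupported numFeatures value: ${numFeatures}`);
  }
  return Math.min(n, Math.max(1, Math.floor(count)));
}

console.log(resolveNumFeatures(1.0, 10)); // 10
console.log(resolveNumFeatures('sqrt', 10)); // 3
```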

Public Members

public criterion: * source

public maxDepth: * source

public numFeatures: * source

public numFeaturesInt: * source

public tree: * source

Public Methods

public buildTree(XSub: Array<Array<number>>, ySub: Array<mixed>, depth: number): DecisionTreeNode source

Build a (sub-)tree from a set of samples.

Params:

Name	Type	Attribute	Description
XSub	Array<Array<number>>		Features of samples to build a tree for
ySub	Array<mixed>		Labels of samples
depth	number	optional default: 0	Current tree depth. 0 indicates the root node

Return:

DecisionTreeNode

Decision tree node

public calculateImpurity(groups: Array<Array<mixed>>): number source

Calculate the impurity for multiple groups of labels. The impurity criterion used can be specified by the user through the user-defined options.

Params:

Name	Type	Attribute	Description
groups	Array<Array<mixed>>		Groups of labels. Each group is an array of labels

Return:

number

Impurity for the provided groups

public calculateWeightedImpurity(groups: Array<Array<mixed>>, impurityCallback: function(labels: Array<number>): number): number source

Calculate the weighted impurity for multiple groups of labels. The returned impurity is calculated as the weighted sum of the impurities of the individual groups, where the weights are determined by the number of samples in the group.

Params:

Name	Type	Attribute	Description
groups	Array<Array<mixed>>		Groups of labels. Each group is an array of labels
impurityCallback	function(labels: Array<number>): number		Callback function taking an array of labels as its first and only argument

Return:

number

Weighted impurity for the provided groups
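
As an illustration of the weighting described above, the sketch below computes a size-weighted sum of per-group impurities. It is a standalone approximation rather than the library's code: the function names weightedImpurity and misclassification are hypothetical, and the misclassification rate is used only as a simple stand-in callback.

```javascript
// Sketch: weighted sum of per-group impurities, each weighted by its share of samples.
function weightedImpurity(groups, impurityCallback) {
  const total = groups.reduce((sum, group) => sum + group.length, 0);
  if (total === 0) return 0;
  return groups.reduce(
    (sum, group) => sum + (group.length / total) * impurityCallback(group),
    0
  );
}

// Stand-in impurity callback (assumption): share of labels outside the majority class.
function misclassification(labels) {
  if (labels.length === 0) return 0;
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  return 1 - Math.max(...counts.values()) / labels.length;
}

// A pure group of 3 (impurity 0) and a mixed group of 2 (impurity 0.5):
// (3/5) * 0 + (2/5) * 0.5 = 0.2
console.log(weightedImpurity([['a', 'a', 'a'], ['a', 'b']], misclassification));
```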

public entropy(labels: Array<mixed>): number source

Calculate the Shannon entropy of a set of labels.

Params:

Name	Type	Attribute	Description
labels	Array<mixed>		Array of predicted labels

Return:

number

Shannon entropy
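
For reference, the Shannon entropy of a label set is H = -sum_k p_k * log(p_k), where p_k is the relative frequency of class k. The sketch below is a standalone illustration, not the library's code; the function name shannonEntropy and the use of base-2 logarithms are assumptions, since the source does not state the base.

```javascript
// Sketch: Shannon entropy over the relative class frequencies of the labels.
// Base-2 logarithms (entropy in bits) are an assumption.
function shannonEntropy(labels) {
  if (labels.length === 0) return 0;
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / labels.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

console.log(shannonEntropy(['a', 'a', 'b', 'b'])); // 1 (two equally likely classes)
console.log(shannonEntropy(['a', 'a', 'a'])); // 0 (pure group)
```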

public findSplit(XSub: Array<Array<number>>, ySub: Array<mixed>, baseImpurity: number): DataSplit source

Find the best splitting feature and feature value for a set of data points.

Params:

Name	Type	Attribute	Description
XSub	Array<Array<number>>		Features of samples to find the split for
ySub	Array<mixed>		Labels of samples
baseImpurity	number		Impurity of parent node

Return:

DataSplit
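
One way to picture the search is as an exhaustive scan over candidate thresholds, scored by impurity gain (parent impurity minus the weighted impurity of the two child groups). The sketch below follows that idea under stated assumptions: the names gini and findBestSplit, the returned object shape, the use of every observed feature value as a candidate, and the Gini impurity as the criterion are all hypothetical choices, not the library's documented behaviour.

```javascript
// Gini impurity of a label array: 1 - sum of squared class frequencies.
function gini(labels) {
  if (labels.length === 0) return 0;
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  let sumSquares = 0;
  for (const count of counts.values()) {
    sumSquares += (count / labels.length) ** 2;
  }
  return 1 - sumSquares;
}

// Hypothetical exhaustive search: try each observed value of each feature as a threshold.
function findBestSplit(X, y, baseImpurity) {
  let best = { featureIndex: -1, splitValue: null, gain: 0 };
  const numFeatures = X.length > 0 ? X[0].length : 0;
  for (let f = 0; f < numFeatures; f += 1) {
    const candidates = [...new Set(X.map((row) => row[f]))];
    for (const value of candidates) {
      const left = [];
      const right = [];
      for (let i = 0; i < X.length; i += 1) {
        // Values lower than the threshold go left, all others go right.
        if (X[i][f] < value) left.push(y[i]);
        else right.push(y[i]);
      }
      if (left.length === 0 || right.length === 0) continue; // not a real split
      const weighted = (left.length * gini(left) + right.length * gini(right)) / y.length;
      const gain = baseImpurity - weighted;
      if (gain > best.gain) best = { featureIndex: f, splitValue: value, gain };
    }
  }
  return best;
}

const X = [[1, 5], [2, 5], [3, 5], [4, 5]];
const y = ['a', 'a', 'b', 'b'];
console.log(findBestSplit(X, y, gini(y)));
```

In this example only the first feature separates the classes, so the search settles on feature 0 with threshold 3; the second feature is constant and never yields a valid split.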

public gini(labels: Array<mixed>): number source

Calculate the Gini impurity of a set of labels.

Params:

Name	Type	Attribute	Description
labels	Array<mixed>		Array of predicted labels

Return:

number

Gini impurity
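
For reference, the Gini impurity of a label set is G = 1 - sum_k p_k^2, where p_k is the relative frequency of class k. The sketch below is a standalone illustration of that formula, not the library's code; the function name giniImpurity is hypothetical.

```javascript
// Sketch: Gini impurity, 1 minus the sum of squared class frequencies.
function giniImpurity(labels) {
  if (labels.length === 0) return 0;
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  let sumSquares = 0;
  for (const count of counts.values()) {
    const p = count / labels.length;
    sumSquares += p * p;
  }
  return 1 - sumSquares;
}

console.log(giniImpurity(['a', 'a', 'b', 'b'])); // 0.5 (two equally likely classes)
console.log(giniImpurity(['a', 'a', 'a'])); // 0 (pure group)
```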

public predict(X: *): * source

Make a prediction for a data set.

Override:

Estimator#predict

Params:

Name	Type	Attribute	Description
X	*

Return:

*

See:

Classifier#predict

public predictSample(sampleFeatures: Array<number>): mixed source

Make a prediction for a single sample.

Params:

Name	Type	Attribute	Description
sampleFeatures	Array<number>		Data point features

Return:

mixed

Prediction. Label assigned to the sample by traversing the decision tree from the root to a leaf
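
Predicting a single sample with a trained tree typically means walking from the root to a leaf, comparing one feature value against a threshold at each internal node. The sketch below illustrates that walk as a standalone function. The node layout (isLeaf, prediction, featureIndex, splitValue, left, right) and the function name predictWithTree are hypothetical and do not describe the actual DecisionTreeNode class.

```javascript
// Sketch: walk a tree of plain objects until a leaf is reached.
// The node fields used here are assumptions made for illustration.
function predictWithTree(root, sampleFeatures) {
  let node = root;
  while (!node.isLeaf) {
    // Feature values lower than the split value go left, all others go right.
    node = sampleFeatures[node.featureIndex] < node.splitValue ? node.left : node.right;
  }
  return node.prediction;
}

// A hand-built tree with a single split on feature 0 at value 3.
const exampleTree = {
  isLeaf: false,
  featureIndex: 0,
  splitValue: 3,
  left: { isLeaf: true, prediction: 'a' },
  right: { isLeaf: true, prediction: 'b' },
};

console.log(predictWithTree(exampleTree, [1, 7])); // 'a'
console.log(predictWithTree(exampleTree, [3, 7])); // 'b'
```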

public splitSamples(XSub: Array<number>, ySub: Array<mixed>, fInd: number, splitValue: number): DataSplitGroups source

Split a set of samples into two groups by some splitting value for a feature. The samples with a feature value lower than the split value go to the left (first) group, and the other samples go to the right (second) group.

Params:

Name	Type	Attribute	Description
XSub	Array<number>		Features of samples to split by some feature
ySub	Array<mixed>		Labels of samples
fInd	number		Index of feature to split by
splitValue	number		Value to be used as the splitting point for the feature

Return:

DataSplitGroups

Assigned sample indices, features, and labels for both of the groups
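
The splitting rule above can be sketched as a standalone function. This is an illustration only: the function name splitByThreshold and the returned object layout (indices, features, and labels, each holding a left and a right array) are assumptions and may differ from the actual DataSplitGroups type.

```javascript
// Sketch: partition samples by comparing one feature against a threshold.
// Index 0 of each result array is the left group, index 1 the right group.
function splitByThreshold(X, y, fInd, splitValue) {
  const groups = { indices: [[], []], features: [[], []], labels: [[], []] };
  for (let i = 0; i < X.length; i += 1) {
    // Strictly lower values go left (0); equal or higher values go right (1).
    const side = X[i][fInd] < splitValue ? 0 : 1;
    groups.indices[side].push(i);
    groups.features[side].push(X[i]);
    groups.labels[side].push(y[i]);
  }
  return groups;
}

const groups = splitByThreshold([[1, 10], [3, 20], [2, 30]], ['a', 'b', 'a'], 0, 2);
console.log(groups.indices); // [ [ 0 ], [ 1, 2 ] ]
console.log(groups.labels); // [ [ 'a' ], [ 'b', 'a' ] ]
```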

public train(X: *, y: *) source

Train the supervised learning algorithm on a dataset.

Override:

Estimator#train

Params:

Name	Type	Attribute	Description
X	*
y	*

See:

Classifier#train
