While immersing myself in machine learning, I have found that regression and classification problems can be solved in a variety of ways. This week the focus is on:

### What new skills have you learned?

- K Nearest Neighbors
- Decision Trees
- Random Forests

#### K Nearest Neighbors

KNN is a classification algorithm that labels each point in a dataset based on the known labels of the points closest (nearest) to it in feature space. K sets the number of nearest neighbors used to classify a point.

The key components of the classifier are:

- Distance metric
- Number of nearest neighbors (K) to look at
- Optional weighting function
- Method of aggregating the neighboring points; this usually defaults to a simple majority vote
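To see how these components fit together, here is a minimal from-scratch sketch in plain Python (not the code from the notebook). It assumes Euclidean distance as the metric, `k` as the number of neighbors, inverse-distance weighting as the optional weighting function, and a (weighted) vote as the aggregation method. The fruit measurements and the names `euclidean` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Distance metric: straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3, weighted=False):
    """Classify `query` using the labels of its k nearest training points."""
    # Measure the distance from the query to every training point.
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep only the k closest points (the "nearest neighbors").
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]

    votes = Counter()
    for dist, label in neighbors:
        # Optional weighting: closer neighbors get a larger vote (1 / distance).
        # Without weighting, every neighbor counts once: a simple majority vote.
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    # Aggregate: the label with the most (weighted) votes wins.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (mass in grams, width in cm) for a few fruits.
train_X = [(150, 7.0), (170, 7.5), (140, 6.8), (120, 6.0), (115, 5.8), (130, 6.2)]
train_y = ["apple", "apple", "apple", "orange", "orange", "orange"]

print(knn_predict(train_X, train_y, (160, 7.2), k=3))                 # apple
print(knn_predict(train_X, train_y, (118, 5.9), k=3))                 # orange
print(knn_predict(train_X, train_y, (136, 6.5), k=2, weighted=True))  # apple
```

The features here are not scaled, so `mass` dominates the distance; in practice features are usually standardized before running KNN.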

Here is a notebook on fruit classification using K Nearest Neighbors.

#### Decision Trees

Decision trees are widely used models for classification and regression tasks. A set of splitting rules segments the predictor space through a hierarchy of "if-else" questions that leads to a decision.

- Nodes: split on the value of an attribute.
- Edges: the outcomes of a split, leading to the next node.
- Root: the node that makes the first split.
- Leaves: terminal nodes that predict the outcome.

Each node in the tree is either a question or a terminal node (also called a leaf) that contains the answer.

The edges connect the answers to a question with the next question you would ask.
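The node, edge, and leaf vocabulary maps naturally onto a small data structure. Below is a minimal Python sketch, assuming binary splits of the form "is feature <= threshold?". The `Node`/`Leaf` classes, the feature names `mass` and `width`, and the threshold values are hypothetical, chosen only to show how a prediction walks from the root along edges to a leaf.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Terminal node: holds the answer (the predicted class)."""
    prediction: str


@dataclass
class Node:
    """Internal node: asks the question 'is sample[feature] <= threshold?'."""
    feature: str
    threshold: float
    left: "Tree"   # edge followed when the answer is yes
    right: "Tree"  # edge followed when the answer is no


Tree = Union[Node, Leaf]


def predict(tree: Tree, sample: dict) -> str:
    # Start at the root and follow one edge per question until a leaf is reached.
    while isinstance(tree, Node):
        tree = tree.left if sample[tree.feature] <= tree.threshold else tree.right
    return tree.prediction


# Hypothetical tree: the root splits on mass, a second node splits on width.
root = Node("mass", 135.0,
            left=Node("width", 6.5, left=Leaf("orange"), right=Leaf("apple")),
            right=Leaf("apple"))

print(predict(root, {"mass": 120, "width": 6.0}))  # orange
print(predict(root, {"mass": 130, "width": 6.9}))  # apple
print(predict(root, {"mass": 160, "width": 7.2}))  # apple
```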

#### Random Forests

A random forest combines many decision trees, drawing a fresh random sample of features for every tree at every split.

Each time a split in a tree is considered, a random sample of m predictors is chosen as split candidates from the full set of p predictors. The split is allowed to use only one of those m predictors.
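The sampling step itself is small. Here is a minimal sketch, assuming the predictors are identified by name; the helper `split_candidates` and the predictor names `x1` to `x9` are hypothetical. It draws `m` of the `p` predictors without replacement, and a new draw happens for every split.

```python
import random


def split_candidates(predictors, m, rng):
    """Return m predictors drawn at random, without replacement, from all p predictors.

    Only these m candidates may be used for the split currently being considered.
    """
    return rng.sample(predictors, m)


# Hypothetical predictor names, used only for illustration.
predictors = [f"x{i}" for i in range(1, 10)]  # p = 9
rng = random.Random(42)  # seeded so the sketch is reproducible

for split in range(3):
    # A fresh sample is drawn at every split, so different splits see different subsets.
    print(f"split {split}: {split_candidates(predictors, 3, rng)}")
```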

For classification, `m` is typically chosen to be approximately the square root of `p` (`m ≈ √p`); that is, the number of predictors considered at each split is about the square root of the total number of predictors.
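As a quick worked example, assuming `m` is rounded down to a whole number (the exact rounding rule varies between implementations):

$$
\begin{aligned}
m &= \left\lfloor \sqrt{p} \right\rfloor \\
p = 16 &\;\Rightarrow\; m = \sqrt{16} = 4 \\
p = 10 &\;\Rightarrow\; m = \left\lfloor \sqrt{10} \right\rfloor = \lfloor 3.16\ldots \rfloor = 3
\end{aligned}
$$

So with 10 predictors, each split chooses among only 3 randomly selected candidates.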

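By randomly leaving out features, random forests *decorrelate* the trees: a single strong predictor can no longer dominate the top split of every tree, so combining the trees reduces variance more than combining near-identical trees would, which improves on the individual trees.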
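Putting the pieces together, here is a compact random forest sketch in plain Python, not the notebook's code. Assumptions to flag: each tree uses Gini impurity as its split criterion and a small depth limit; each tree is trained on a bootstrap sample of the rows (standard for random forests, though not described above); only `m = int(sqrt(p))` randomly chosen features are considered at each split; and the forest predicts by majority vote. The toy data and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a group of labels: 0.0 when every label is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(X, y, rng, max_depth=3):
    """Grow one tree; every split considers only m = int(sqrt(p)) random features."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority  # leaf: predict the majority class
    p = len(X[0])
    m = max(1, int(math.sqrt(p)))
    best = None
    for f in rng.sample(range(p), m):  # fresh random split candidates per split
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two children; lower is better.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority  # no useful split among the sampled features
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], rng, max_depth - 1),
            build_tree([X[i] for i in right], [y[i] for i in right], rng, max_depth - 1))


def predict_tree(tree, row):
    """Walk from the root: internal nodes are (feature, threshold, left, right) tuples."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def random_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: len(X) rows drawn with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees' predictions."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Hypothetical toy data with p = 4 features, so m = int(sqrt(4)) = 2 per split.
X = [(1, 2, 1, 3), (2, 1, 2, 2), (1, 1, 3, 1), (2, 3, 2, 2),
     (8, 9, 7, 8), (9, 7, 8, 9), (7, 8, 9, 7), (8, 8, 8, 9)]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]

forest = random_forest(X, y)
print(predict_forest(forest, (0, 0, 0, 0)))
print(predict_forest(forest, (10, 10, 10, 10)))
```

Because each tree sees a different bootstrap sample and different feature subsets, the trees disagree on the details, and the vote smooths those differences out.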

Check out this Decision Tree and Random Forests notebook, which works on a sample kyphosis dataset of patients (kyphosis is an excessive outward curvature of the spine).

So that was the seventh week.