### Common Machine Learning Algorithms

## Introduction

> Although Google's self-driving cars and robots receive a lot of attention, the company's real future lies in machine learning, the technology that makes computers smarter and more individualized.
>
> – Google Chairman Eric Schmidt

We are probably living through the most pivotal time in human history: the era in which computing moved from large mainframes to PCs and on to the cloud. However, what makes it unique is not what has occurred, but what lies ahead for us in the years to come.

For someone like me, the democratization of various tools and methods that followed the rise in computing is what makes this period exciting and captivating. The field of data science is waiting for you!

For a few dollars an hour, a data scientist can build machines that crunch data using complex algorithms. But getting here was not easy! I had my dark days and nights.

## Who can benefit the most from this guide?

## Types of Machine Learning

### Supervised Learning

**How it operates:** This algorithm has a target/outcome variable (also known as a dependent variable) that must be predicted from a given set of predictors (independent variables). Using this set of variables, we generate a function that maps inputs to the desired outputs. The model is trained until it reaches the desired level of accuracy on the training data. Examples of supervised learning: KNN, Logistic Regression, Decision Tree, Random Forest, and others.
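The predictor-to-target mapping described above can be sketched with a minimal supervised learner. The toy data and the 1-nearest-neighbour rule below are illustrative assumptions, not part of the guide itself:

```python
# Minimal supervised learning sketch: "train" on labelled
# (predictor, target) pairs, then predict a target for a new input
# by returning the label of the closest training example.
def predict_1nn(train, query):
    """Return the label of the training point nearest to `query`."""
    nearest = min(train, key=lambda pair: abs(pair[0] - query))
    return nearest[1]

# Labelled training data: (independent variable, dependent variable).
train = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]

print(predict_1nn(train, 1.5))  # → low  (closest to the "low" examples)
print(predict_1nn(train, 8.5))  # → high (closest to the "high" examples)
```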

### Unsupervised Learning

**How it operates:** In this algorithm, we do not have any target or outcome variable to estimate or predict. It is used to cluster a population into distinct groups, which is widely applied for segmenting customers for specific interventions. Examples of unsupervised learning: K-means, the Apriori algorithm.
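To make the "no target variable" idea concrete, here is a sketch of the first pass of Apriori-style mining: counting which item pairs co-occur across transactions, with no labels involved. The transactions are made-up illustrative data:

```python
from collections import Counter
from itertools import combinations

# Unsupervised sketch: find item pairs that frequently appear
# together, without any target variable to predict.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs seen in at least 2 transactions count as "frequent" here.
frequent = {pair for pair, n in pair_counts.items() if n >= 2}
print(frequent)  # ('bread', 'milk') and ('eggs', 'milk') qualify
```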

### Reinforcement Learning

**How it operates:** This algorithm trains the machine to make specific decisions. It works like this: the machine is exposed to an environment where it trains itself continually through trial and error; it learns from past experience and tries to capture as much knowledge as possible in order to make accurate decisions. An example of reinforcement learning is the Markov Decision Process.
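The trial-and-error loop described above can be sketched with a two-armed bandit and an epsilon-greedy agent (a simpler setting than a full Markov Decision Process). The arm payouts and exploration rate are illustrative assumptions:

```python
import random

# Trial-and-error sketch: the agent repeatedly pulls one of two
# slot-machine arms, learns each arm's payout from experience,
# and increasingly exploits the better one.
random.seed(0)

true_payout = [0.2, 0.8]  # hidden expected reward of each arm
estimates = [0.0, 0.0]    # agent's running reward estimates
counts = [0, 0]
epsilon = 0.1             # fraction of steps spent exploring

for step in range(2000):
    if random.random() < epsilon:
        arm = random.randrange(2)              # explore at random
    else:
        arm = estimates.index(max(estimates))  # exploit best estimate
    reward = 1.0 if random.random() < true_payout[arm] else 0.0
    counts[arm] += 1
    # Incremental mean update of the chosen arm's estimate.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(estimates)  # the estimate for arm 1 converges toward its payout
```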

## List of Common Machine Learning Algorithms

- Linear Regression
- Logistic Regression
- Decision Tree
- SVM
- Naive Bayes
- kNN
- K-Means
- Random Forest
- Dimensionality Reduction Algorithms
- Gradient Boosting algorithms
  - GBM
  - XGBoost
  - LightGBM
  - CatBoost

### Linear Regression

In linear regression, we fit the best straight line, **Y = a*X + b**, relating the variables, where:

**Y – Dependent Variable**

**a – Slope**

**X – Independent variable**

**b – Intercept**
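The slope a and intercept b of the line Y = a*X + b can be estimated with ordinary least squares. A minimal pure-Python sketch on made-up points that lie exactly on Y = 2X + 1:

```python
# Fit Y = a*X + b by ordinary least squares on toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly Y = 2X + 1

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: covariance(X, Y) / variance(X); intercept from the means.
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

print(a, b)  # → 2.0 1.0
```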

### Logistic Regression

odds = p / (1 - p) = probability of the event occurring / probability of the event not occurring

ln(odds) = ln(p / (1 - p))

logit(p) = ln(p / (1 - p)) = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
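The logit transform above, and its inverse (the sigmoid) that maps a linear score back to a probability, can be sketched directly:

```python
import math

# The log-odds (logit) transform and its inverse, the sigmoid.
def logit(p):
    """Map a probability p in (0, 1) to log-odds."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Map a linear score z back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

p = 0.8
print(logit(p))           # log-odds of an 80% probability
print(sigmoid(logit(p)))  # ≈ 0.8, the sigmoid inverts the logit
```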

### Decision Tree

### SVM (Support Vector Machine)

### Naive Bayes

- P(c|x) is the posterior probability of class (target) given predictor (attribute).
- P(c) is the prior probability of class.
- P(x|c) is the likelihood which is the probability of predictor given class.
- P(x) is the prior probability of the predictor.
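A worked example of the Bayes rule terms listed above, with made-up numbers: classifying whether an email is spam (class c) given that it contains the word "offer" (predictor x):

```python
# Bayes rule with illustrative numbers.
p_c = 0.3          # P(c):   prior probability that an email is spam
p_x_given_c = 0.8  # P(x|c): likelihood that "offer" appears in spam
p_x = 0.4          # P(x):   prior probability "offer" appears at all

# Posterior: P(c|x) = P(x|c) * P(c) / P(x)
p_c_given_x = p_x_given_c * p_c / p_x
print(p_c_given_x)  # ≈ 0.6: seeing "offer" doubles the spam probability
```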

### kNN (k- Nearest Neighbors)

- kNN is computationally expensive.
- Variables should be normalized, otherwise variables with a wider range can bias the distance measure.
- kNN requires more work at the pre-processing stage, such as outlier and noise removal, before it is applied.
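A minimal kNN sketch on 1-D toy data, including the min-max normalization the notes above recommend (the data and k are illustrative assumptions):

```python
from collections import Counter

def normalize(values):
    """Min-max scale values into [0, 1] so no variable dominates."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def knn_predict(xs, labels, query, k):
    """Vote among the k training points closest to the query."""
    ranked = sorted(zip(xs, labels), key=lambda p: abs(p[0] - query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

xs = normalize([10.0, 20.0, 30.0, 80.0, 90.0, 100.0])
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(xs, labels, 0.2, k=3))  # → a
print(knn_predict(xs, labels, 0.9, k=3))  # → b
```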

### K-Means

#### How K-means clustering works:

- K-means picks k points, called centroids, one for each cluster.
- Each data point is assigned to the cluster whose centroid is closest, forming k clusters.
- The centroid of each cluster is recomputed from the cluster's current members, giving new centroids.
- Steps 2 and 3 are repeated with the new centroids: each data point is reassigned to its closest centroid. The procedure stops at convergence, i.e. when the centroids no longer change.
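The steps above can be sketched as a minimal 1-D K-means in pure Python. The toy data and the simple "first k points" initialisation are illustrative assumptions:

```python
# Minimal 1-D K-means: assign points to the nearest centroid,
# recompute centroids from cluster members, repeat until converged.
def kmeans(points, k, iterations=20):
    centroids = points[:k]  # simple (non-random) initialisation
    for _ in range(iterations):
        # Step 2: assign each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid from its members.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids constant
            break
        centroids = new_centroids
    return sorted(centroids)

points = [1.0, 1.2, 0.8, 10.0, 10.2, 9.8]
print(kmeans(points, k=2))  # centroids near 1.0 and 10.0
```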

#### How to figure out K's value:

### Random Forest

- If the number of cases in the training set is N, then a sample of N cases is taken at random but with replacement. This sample will be the training set for growing the tree.
- If there are M input variables, a number m<<M is specified such that at each node, m variables are selected at random out of the M and the best split on this m is used to split the node. The value of m is held constant during the forest growth.
- Each tree is grown to the largest extent possible. There is no pruning.
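The bootstrap step in the first bullet can be sketched directly: from a training set of N cases, draw N cases at random with replacement to grow one tree. The toy set and seed are illustrative:

```python
import random

# Bootstrap sample: N draws with replacement from N training cases.
random.seed(42)

training_set = list(range(10))  # N = 10 toy cases
bootstrap = [random.choice(training_set) for _ in range(len(training_set))]

print(len(bootstrap))          # same size N as the original set
print(sorted(set(bootstrap)))  # duplicates mean some cases are left out
```

In a random forest, each tree gets its own bootstrap sample like this, and the cases left out of a tree's sample ("out-of-bag" cases) can be used to estimate its error.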

### Dimensionality Reduction Algorithms

### Gradient Boosting Algorithms

#### GBM

#### XGBoost

#### LightGBM

- Faster training speed and higher efficiency
- Lower memory usage
- Better accuracy
- Parallel and GPU learning supported
- Capable of handling large-scale data
