Machine Learning Certificate for Andras Kovari
1. Introduction to Machine Learning
● Applications of machine learning
● Supervised Versus Unsupervised Learning
● Machine Learning Algorithms
○ Regression
○ Classification
○ Clustering
○ Recommender Systems
○ Anomaly Detection
○ Reinforcement Learning
2. Toolset - ML libraries and programming languages (Lab)
● Why use a programming language
● Python resources
● Python Libraries for Machine Learning: attrs, graphviz, matplotlib, numpy, mglearn, pandas, scikit-learn, tensorflow, keras, protobuf
● Jupyter notebooks and interactive coding
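To make the toolset concrete, a minimal sketch (assuming the libraries above are installed and run from a Jupyter notebook or a script) that loads one of scikit-learn's bundled datasets into pandas and plots a feature with matplotlib:

```python
# Minimal environment check for the toolset lab:
# load a bundled scikit-learn dataset, wrap it in pandas, plot with matplotlib.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target

print(df.head())          # first rows, confirms pandas works
print(df.describe())      # basic summary statistics

df["sepal length (cm)"].hist(bins=20)
plt.xlabel("sepal length (cm)")
plt.ylabel("count")
plt.show()
```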
3. Data preparation (Lab)
● Data import and storage
● Understanding the data – basic explorations
● Data manipulations with pandas library
● Data transformations – Data wrangling
● Exploratory analysis
● Missing observations – detection and solutions
● Outliers – detection and strategies
● Standardization, normalization, binarization
● Qualitative data recoding
● Examples in Python
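A compact pandas sketch of the preparation steps above; the toy DataFrame, column names, and the outlier threshold are illustrative assumptions, not course data:

```python
import numpy as np
import pandas as pd

# Toy data with a missing value, an outlier, and a qualitative column.
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 29, 95],
    "income": [42_000, 51_000, 38_000, 63_000, 47_000, 1_000_000],
    "city":   ["Budapest", "Vienna", "Budapest", "Prague", "Vienna", "Prague"],
})

# Missing observations: detect, then impute with the median.
print(df.isna().sum())
df["age"] = df["age"].fillna(df["age"].median())

# Outliers: flag values more than 2 standard deviations from the mean
# (a low threshold, since the toy sample is tiny).
z = (df["income"] - df["income"].mean()) / df["income"].std()
df["income_outlier"] = z.abs() > 2

# Standardization (z-scores) and min-max normalization.
df["age_std"]  = (df["age"] - df["age"].mean()) / df["age"].std()
df["age_norm"] = (df["age"] - df["age"].min()) / (df["age"].max() - df["age"].min())

# Qualitative data recoding: one-hot (dummy) encoding.
df = pd.get_dummies(df, columns=["city"])
print(df.head())
```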
4. Regression
● Simple & Multiple Regression
● Simple Linear Regression
● Least Squares Method
● Correlations
● Multiple Linear Regression
● Estimating the Coefficients
● Assessing the Accuracy of the Coefficient Estimates
● Assessing the Accuracy of the Model
● Post-Estimation Analysis
● Other Considerations in the Regression Models
● Qualitative Predictors
● Extensions of the Linear Models
● Potential Problems
● Bias-variance trade-off [under-fitting/over-fitting] for regression models
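As a preview of how this module's topics look in code, a minimal statsmodels sketch on synthetic data with known coefficients, so the least-squares estimates and their accuracy measures can be checked against the truth:

```python
# Multiple linear regression by least squares, using statsmodels so that
# coefficient standard errors, t-statistics, and R^2 come from one summary.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(scale=0.5, size=n)  # known truth

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))  # adds the intercept
model = sm.OLS(y, X).fit()

print(model.params)     # coefficient estimates (should be near 2.0, 1.5, -0.7)
print(model.bse)        # standard errors of the estimates
print(model.rsquared)   # R^2: accuracy of the model
print(model.summary())  # full post-estimation report
```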
5. Model Evaluation and Improvement
● Resampling Methods
○ Cross-Validation
○ The Validation Set Approach
○ Leave-One-Out Cross-Validation
○ k-Fold Cross-Validation
○ Bias-Variance Trade-Off for k-Fold Cross-Validation
○ The Bootstrap
● Model Selection and Regularization
○ Subset Selection [Best Subset Selection, Stepwise Selection, Choosing the Optimal Model]
○ Shrinkage Methods / Regularization [Ridge Regression, Lasso & Elastic Net]
○ Selecting the Tuning Parameter
○ Dimension Reduction Methods
■ Principal Components Regression
■ Partial Least Squares
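A minimal sketch of using k-fold cross-validation to select the shrinkage tuning parameter; the synthetic dataset and the alpha grid are illustrative assumptions:

```python
# k-fold cross-validation used to pick the ridge/lasso tuning parameter (alpha).
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

for alpha in [0.01, 0.1, 1.0, 10.0]:   # illustrative tuning-parameter grid
    ridge = -cross_val_score(Ridge(alpha=alpha), X, y, cv=cv,
                             scoring="neg_mean_squared_error").mean()
    lasso = -cross_val_score(Lasso(alpha=alpha, max_iter=10_000), X, y, cv=cv,
                             scoring="neg_mean_squared_error").mean()
    print(f"alpha={alpha:5.2f}  ridge CV-MSE={ridge:8.1f}  lasso CV-MSE={lasso:8.1f}")
```

scikit-learn's RidgeCV and LassoCV automate the same search over alpha.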
6. Regression Lab
● Simple & Multiple Linear Regression
● Interaction Terms
● Non-linear Transformations
● Dummy variable regression
● Cross-Validation and Bootstrapping
● Subset selection methods
● Penalization [Ridge, Lasso, Elastic Net]
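The lab items above combine naturally in one R-style statsmodels formula; the data and column names in this sketch are made up for illustration:

```python
# Interaction terms, a non-linear transform, and a dummy-coded factor
# in one statsmodels formula.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "group": rng.choice(["a", "b", "c"], size=n),
})
df["y"] = (1.0 + 2.0 * df["x1"] + 0.5 * df["x1"] * df["x2"]
           + 0.3 * df["x1"] ** 2 + (df["group"] == "b") * 1.5
           + rng.normal(scale=0.5, size=n))

# x1 * x2 expands to x1 + x2 + x1:x2 (the interaction term);
# I(x1 ** 2) is a non-linear transformation;
# C(group) dummy-codes the qualitative predictor.
fit = smf.ols("y ~ x1 * x2 + I(x1 ** 2) + C(group)", data=df).fit()
print(fit.summary())
```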
7. Classification
● Logistic Regression
○ The Logistic Model and its Cost Function
○ Estimating the Coefficients
○ Making Predictions
○ Odds Ratio
○ Performance Evaluation Metrics [Sensitivity/Specificity/PPV/NPV, Precision, ROC curve, etc.]
○ Multiple Logistic Regression
○ Logistic Regression for >2 Response Classes
○ Regularized Logistic Regression
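A minimal scikit-learn sketch of regularized logistic regression; the bundled breast-cancer dataset is just a convenient example, and exponentiating the standardized coefficients gives per-standard-deviation odds ratios:

```python
# Regularized logistic regression; np.exp of the coefficients gives odds ratios.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)
clf = LogisticRegression(C=1.0, max_iter=1000)  # C = inverse regularization strength
clf.fit(scaler.transform(X_train), y_train)

print("test accuracy:", clf.score(scaler.transform(X_test), y_test))
print("odds ratios (first 5 features):", np.exp(clf.coef_[0])[:5])
print("predicted probabilities:", clf.predict_proba(scaler.transform(X_test[:3])))
```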
● Linear Discriminant Analysis
○ Using Bayes’ Theorem for Classification
○ Linear Discriminant Analysis for p = 1
○ Linear Discriminant Analysis for p > 1
● Quadratic Discriminant Analysis
● K-Nearest Neighbors
○ The kNN algorithm
○ Calculating distance
○ Choosing an appropriate k
○ Preparing data for use with kNN
○ Why is the kNN algorithm lazy?
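A short sketch of choosing k by cross-validated accuracy, with standardization in a pipeline because kNN is distance-based; the dataset and the grid of k values are illustrative choices:

```python
# kNN is "lazy": fit() just stores the training data, and all the work
# happens at prediction time. Scaling matters because kNN uses distances.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Choose an appropriate k by cross-validated accuracy.
for k in [1, 3, 5, 7, 11]:
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"k={k:2d}  CV accuracy={acc:.3f}")
```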
● Understanding naive Bayes
○ Basic concepts of Bayesian methods
○ Probability
○ Joint probability
○ Conditional probability with Bayes' theorem
○ The naive Bayes algorithm
○ The naive Bayes classification
○ The Laplace estimator
○ Using numeric features with naive Bayes
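A toy naive Bayes sketch on word counts; in scikit-learn the MultinomialNB alpha parameter plays the role of the Laplace estimator, and GaussianNB is the variant for numeric features. The documents and labels are made up:

```python
# Naive Bayes on word-count features; alpha=1.0 is Laplace smoothing.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs   = ["free prize now", "meeting at noon", "win a free prize", "lunch meeting today"]
labels = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(docs)                    # word-count features
clf = MultinomialNB(alpha=1.0).fit(X, labels)  # alpha=1.0 -> Laplace estimator

print(clf.predict(vec.transform(["free lunch today"])))
print(clf.predict_proba(vec.transform(["free lunch today"])))
```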
● Classification with Non-linear Decision Boundaries
● Support Vector Machines
○ Optimization Objective
○ Classification with hyperplanes
○ Finding the maximum margin
○ The case of linearly separable data
○ The case of non-linearly separable data
○ The Maximal Margin Classifier
○ Kernels
○ One-Versus-One Classification
○ One-Versus-All Classification
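A minimal SVM sketch contrasting a linear kernel with an RBF kernel on non-linearly separable data; scikit-learn's SVC is used here, whose built-in multiclass handling is one-versus-one:

```python
# Maximum-margin classification: a linear kernel versus an RBF kernel
# on data with a non-linear decision boundary.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    print(f"{kernel:6s} kernel  test accuracy={clf.score(X_test, y_test):.3f}")
```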
● Comparison of Classification Methods
● Understanding decision trees
○ Divide and conquer
○ The C5.0 decision tree algorithm
○ Choosing the best split
○ Pruning the decision tree
○ Understanding regression trees and model trees
○ Understanding Ensembles, Random Forests and Boosting
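A sketch comparing a pruned tree, a random forest, and boosting. Note that scikit-learn grows CART-style trees rather than C5.0, so this illustrates the same ideas with a different tree algorithm; the dataset and hyperparameters are illustrative:

```python
# Split selection, cost-complexity pruning (ccp_alpha), and ensembling.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "pruned tree":   DecisionTreeClassifier(ccp_alpha=0.01, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "boosting":      GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:13s}  CV accuracy={acc:.3f}")
```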
● Measuring performance for classification
○ Working with classification prediction data
○ A closer look at confusion matrices
○ Using confusion matrices to measure performance
○ Beyond accuracy – other measures of performance
○ The kappa statistic
○ Sensitivity and specificity
○ Precision and recall
○ The F-measure
○ Visualizing performance tradeoffs
○ ROC curves
○ Estimating future performance
○ The holdout method
○ Cross-validation
○ Bootstrap sampling
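The performance measures above, computed in one pass with scikit-learn; the classifier and dataset are arbitrary choices for illustration:

```python
# The confusion matrix and the metrics built on it, plus ROC AUC.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (classification_report, cohen_kappa_score,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]

print(confusion_matrix(y_test, y_pred))             # TP/FP/TN/FN counts
print(classification_report(y_test, y_pred))        # precision, recall, F-measure
print("kappa:", cohen_kappa_score(y_test, y_pred))  # the kappa statistic
print("ROC AUC:", roc_auc_score(y_test, y_prob))    # area under the ROC curve
```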
● Lab:
○ Logistic Regression, LDA, QDA, and KNN
○ Resampling & Regularization
○ Support Vector Machines
8. Data Analysis Workflow - Case Study on Random Forests + Testing ML algorithms (Lab)
● Business Understanding
● Data Understanding
● Data Preparation and Pre-processing
● Creating Different Models
● Model Training and Tuning
● Model Selection
● Feature Extraction
● Performance Evaluation
● Comparing Decision Trees, Random Forests and Boosting
● Generalization and overfitting
● Avoiding overfitting
○ Holdout method
○ Cross-Validation
○ Bootstrapping
● Evaluating numerical predictions
○ Measures of accuracy: ME, MSE, RMSE, MAPE
○ Parameter and prediction stability
● Evaluating classification algorithms
○ Accuracy and its problems
○ The confusion matrix
○ Unbalanced classes problem
● Visualizing model performance
○ Profit curve
○ ROC curve
○ Lift curve
● Model selection
● Model tuning – grid search strategies
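A minimal grid-search sketch matching the workflow above: tune a random forest by cross-validation, then estimate future performance on a holdout set. The parameter grid is an illustrative assumption:

```python
# Grid-search tuning of a random forest, with cross-validation inside the search.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)  # holdout

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
    cv=5,                  # 5-fold cross-validation per candidate
    scoring="accuracy",
)
grid.fit(X_train, y_train)

print("best parameters:", grid.best_params_)
print("best CV accuracy:", grid.best_score_)
print("holdout accuracy:", grid.score(X_test, y_test))  # generalization estimate
```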
9. Understanding classification rules
● Separate and conquer
● The One Rule algorithm
● The RIPPER algorithm
● Rules from decision trees
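A from-scratch sketch of the One Rule (1R) idea, since it is not part of scikit-learn; the toy rows and labels are made up:

```python
# One Rule (1R): for each feature, predict the majority class per feature
# value, then keep the single feature whose rule makes the fewest errors.
from collections import Counter, defaultdict

def one_rule(rows, labels, n_features):
    best = None
    for f in range(n_features):
        by_value = defaultdict(list)
        for row, label in zip(rows, labels):
            by_value[row[f]].append(label)
        rule = {v: Counter(ls).most_common(1)[0][0] for v, ls in by_value.items()}
        errors = sum(rule[row[f]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (f, rule, errors)
    return best  # (feature index, value -> class mapping, training errors)

# Tiny toy data: [outlook, windy] -> play?
rows   = [("sunny", "yes"), ("sunny", "no"), ("rainy", "yes"),
          ("rainy", "no"), ("overcast", "no")]
labels = ["no", "no", "no", "yes", "yes"]

feature, rule, errors = one_rule(rows, labels, n_features=2)
print(f"rule on feature {feature}: {rule}  ({errors} training errors)")
```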
10. Understanding association rules
● The Apriori algorithm for association rule learning
● Measuring rule interest – support and confidence
● Building a set of rules with the Apriori principle
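Support and confidence, the two rule-interest measures Apriori prunes on, computed directly over a made-up transaction list:

```python
# Support and confidence for a candidate association rule.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # Of the transactions containing lhs, the fraction also containing rhs.
    return support(lhs | rhs) / support(lhs)

# Rule: {bread, milk} -> {butter}
print("support:", support({"bread", "milk", "butter"}))         # 1/5 = 0.2
print("confidence:", confidence({"bread", "milk"}, {"butter"}))  # 1/3 ~= 0.33
```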
11. Understanding clustering
● Clustering as a machine learning task
● The k-means algorithm for clustering
● Using distance to assign and update clusters
● Choosing the appropriate number of clusters
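A minimal k-means sketch on synthetic blob data, using inertia across candidate k values as the usual "elbow" heuristic for choosing the number of clusters:

```python
# k-means clustering and the elbow heuristic for choosing k.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}  inertia={km.inertia_:.1f}")  # look for the bend ("elbow")

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])  # cluster assignment for the first ten points
```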
12. Neural Networks & Deep Learning
● Understanding neural networks
○ From biological to artificial neurons
○ Activation functions
○ Network topology
○ The number of layers
○ The direction of information travel
○ The number of nodes in each layer
○ Training neural networks with backpropagation
● ANN Structure
○ Biological neurons and artificial neurons
○ Non-linear Hypothesis
○ Model Representation
○ Examples & Intuitions
○ Transfer Functions / Activation Functions
● Feed-forward ANN
○ Structures of multi-layer feed-forward networks
○ Backpropagation algorithm
○ Backpropagation - training and convergence
○ Functional approximation with backpropagation
○ Practical and design issues of backpropagation learning
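To ground the backpropagation items above, a from-scratch numpy sketch: one hidden layer trained on XOR, the classic non-linearly-separable toy problem. The layer sizes and learning rate are arbitrary choices:

```python
# A single-hidden-layer network trained by backpropagation on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0                                        # arbitrary learning rate

for step in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: chain rule through the squared-error loss.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent updates.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(3).ravel())  # should approach [0, 1, 1, 0]
```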
● Deep Learning
○ Artificial Intelligence & Deep Learning
○ Softmax Regression (Multinomial Logistic Regression)
○ Self-Taught Learning
○ Deep Networks
○ Demos and Applications
● Neural Networks Architectures
○ Perceptron
○ CNN - Convolutional Neural Networks
○ RNNs - Recurrent Neural Networks and their units: LSTMs - Long Short-Term Memory
○ Hopfield Networks
○ Overview of Other Architectures
○ Big Networks (e.g., GoogLeNet)
○ Benefits and Limitations of each Architecture
● Libraries for Deep Learning
○ Scikit-learn and its Limitations
○ TensorFlow with Examples
○ Keras
○ Caffe
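A minimal Keras (TensorFlow backend) sketch of the Sequential API on MNIST; the architecture and epoch count are illustrative, not course-prescribed:

```python
# A small feed-forward classifier on MNIST with the Keras Sequential API.
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),   # softmax output layer
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```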