Data Science

Data Science Training

Introduction: This comprehensive course builds on the knowledge and experience that a business analyst or data scientist will have gained after some years in the role. It takes business analysts and predictive modelers to the next level in delivering effective, realistic solutions to machine learning and big data problems. The course provides techniques for data cleaning, data visualization, predictive modeling, and machine learning.

R, Predictive Modeling, Machine Learning, Python and SAS

Course Duration & Features:

  • The Data Analytics training runs for 150 hours and is available self-paced or instructor-led, online and offline
  • The course is 100% hands-on training
  • Contains real-world business applications and examples, with project work at the end of each module
  • Rich material and handouts for student reference.

After Completion of this training:

  • Gain exposure to key disciplines and skills needed to fulfill the role of a business analyst / predictive modeler / data scientist
  • Build predictive models using linear regression, logistic regression, and decision trees
  • Build machine learning models using neural networks, SVM, and random forest

Prerequisite:

  • Before attending this course, candidates should have experience using applications such as SAS, R, word processors, or spreadsheets
  • No statistical background is necessary
  • Data Analysis and Reporting background is necessary
  • This is 100% hands-on training. Every participant should have access to a computer.

Tools:

  • R, Hadoop, Python and SAS

Course Contents

Week 0:- STATISTICAL ANALYSIS SYSTEM (SAS)

  1. Introduction to SAS
  2. Types of Libraries and Variables
  3. Data – Reading, Writing, Importing and Exporting
  4. Functions and Options
  5. Conditional Statements and Logical Operators
  6. Datasets – Introduction, Appending, Merging and Sorting
  7. Report Generation, Data Set Manipulation
  8. Introduction to Databases, RDBMS Concepts
  9. Structured Query Language

Week 1:- Statistics Theory:

  1. Introduction to Statistics
  2. Graphical and Tabular Descriptive Statistics
  3. Probability
  4. Probability Distribution
  5. Hypothesis Testing
  6. Statistical Tests (Z-Test, Chi-Square, T-Tests, etc.)

Week 2:- R Programming, Data Handling and Basic Statistics

  1. Introduction to the Analytics Tool (R)
  2. Introduction to Data Analysis
  3. Introduction to R programming
  4. R Environment and Basic Commands
  5. Data Handling in R
  6. Importing data
  7. Sampling
  8. Data Exploration
  9. Creating calculated fields
  10. Sorting & removing duplicates
  11. Basic Descriptive Statistics
  12. Population and Sample
  13. Measures of Central tendency
  14. Measures of dispersion
  15. Reporting and Data Validation
  16. Percentiles & Quartiles
  17. Box plots and outlier detection
  18. Creating Graphs and Reporting
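
To illustrate the box plot and outlier detection topic listed above, here is a minimal sketch of the usual box-plot rule. The module itself is taught in R; Python is used here only as a neutral illustration. The 1.5 × IQR threshold is a common convention assumed for this example, and the function name `iqr_outliers` and the sample data are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points Q1, median, Q3
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample with one obviously extreme value
sample = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
outliers = iqr_outliers(sample)
print(outliers)
```

Different quartile definitions give slightly different fences, so results from this sketch may not match the default output of a particular statistics package on small samples.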

Week 3:- Project 1 – Data Exploration, Validation and Cleaning Project

  1. Project on Data handling
  2. Data exploration
  3. Data validation
  4. Missing values identification
  5. Outliers identification
  6. Data Cleaning
  7. Basic Descriptive statistics

Week 4:- Regression Analysis & Logistic Regression Model Building

  1. Regression Analysis
  2. Correlation
  3. Simple Regression models
  4. R-Square
  5. Multiple regression
  6. Multicollinearity
  7. Individual Variable Impact
  8. Logistic Regression
  9. Need for logistic regression
  10. Logistic regression models
  11. Validation of logistic regression models
  12. Multicollinearity in logistic regression
  13. Individual Impact of variables
  14. Confusion Matrix
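
As a small illustration of the confusion matrix topic that closes the list above, the following sketch counts the four cells for a binary classifier and derives accuracy from them. It is written in Python purely for illustration; the function name `confusion_matrix`, the label coding (1 = positive), and the sample predictions are assumptions made for this example.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Hypothetical labels and model predictions
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(cm, accuracy)
```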

Week 5:- Decision Trees & Model Selection

  1. Decision Trees
  2. Segmentation
  3. Entropy
  4. Building Decision Trees
  5. Validation of Trees
  6. Fine tuning and Prediction using Trees

2. Model Selection and Cross Validation

  1. How to validate a model?
  2. What is the best model?
  3. Types of data
  4. Types of errors
  5. The problem of overfitting
  6. The problem of underfitting
  7. Bias-variance tradeoff
  8. Cross validation
  9. Bootstrapping
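
For the cross validation topic listed above, here is a minimal sketch of how k-fold cross validation partitions a dataset. It assumes contiguous, unshuffled folds for simplicity (in practice the rows are usually shuffled first); the generator name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Each row appears in exactly one test fold and in the training set of the others
folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

A model would be fitted on each training set and scored on the matching test fold, and the scores averaged to estimate out-of-sample performance.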

Week 6:- Project 2 – Predictive Modeling Project

  1. Objective
  2. Model building-1
  3. Model building-2
  4. Model validation
  5. Variable selection
  6. Model calibration
  7. Out of time validation

Week 7:- Neural Network, SVM and Random Forest

  1. Neural Networks
  2. Neural network Intuition
  3. Neural network and vocabulary
  4. Neural network algorithm
  5. Math behind neural network algorithm
  6. Building the neural networks
  7. Validating the neural network model
  8. Neural network applications
  9. Image recognition using neural networks
  10. SVM
  11. Introduction
  12. The decision boundary with largest margin
  13. SVM- The large margin classifier
  14. SVM algorithm
  15. The kernel trick
  16. Building SVM model
  17. Conclusion
  18. Random Forest and Boosting
  19. Introduction
  20. Conclusion

Week 9:- Project 3 – Machine Learning Project

  1. Objective
  2. ML Model-1
  3. ML Model-2

Week 10:- Python Introduction & Project

Python Introduction

  1. What is Python & History?
  2. Installing Python & Python Environment
  3. Basic commands in Python
  4. Data Types and Operations
  5. Python packages
  6. Loops
  7. My first Python program
  8. If-then-else statement
  9. Data Handling in Python
  10. Data importing
  11. Working with datasets
  12. Manipulating the datasets
  13. Creating new variables
  14. Exporting the datasets into external files
  15. Data Merging
  16. Python Basic Statistics
  17. Taking a random sample from data
  18. Descriptive statistics
  19. Central Tendency
  20. Variance
  21. Quartiles, Percentiles
  22. Box Plots
  23. Graphs
  24. Python Data Handling project
  25. Project on Data handling
  26. Data exploration
  27. Data validation
  28. Missing values identification
  29. Outliers identification
  30. Data Cleaning
  31. Basic Descriptive statistics

Python Predictive Modeling & Project

Regression Analysis

  1. Correlation
  2. Simple Regression models
  3. R-Square
  4. Multiple regression
  5. Multicollinearity
  6. Individual Variable Impact
  7. Logistic Regression
  8. Need for logistic regression
  9. Logistic regression models
  10. Validation of logistic regression models
  11. Multicollinearity in logistic regression
  12. Individual Impact of variables
  13. Confusion Matrix
  14. Decision Trees
  15. Segmentation
  16. Entropy
  17. Building Decision Trees
  18. Validation of Trees
  19. Fine tuning and Prediction using Trees
  20. Model Selection and Cross Validation
  21. How to validate a model?
  22. What is the best model?
  23. Types of data
  24. Types of errors
  25. The problem of overfitting
  26. The problem of underfitting
  27. Bias-variance tradeoff
  28. Cross validation
  29. Bootstrapping

Week 12:- Python Machine Learning

Neural Network, SVM and Random Forest

  1. Neural Networks
  2. Neural network Intuition
  3. Neural network and vocabulary
  4. Neural network algorithm
  5. Math behind neural network algorithm
  6. Building the neural networks
  7. Validating the neural network model
  8. Neural network applications
  9. Image recognition using neural networks
  10. SVM
  11. Introduction
  12. The decision boundary with largest margin
  13. SVM- The large margin classifier
  14. SVM algorithm
  15. The kernel trick
  16. Building SVM model
  17. Conclusion
  18. Random Forest and Boosting
  19. Introduction
  20. Conclusion

Week 14:- Project 3 – Machine Learning Project

  1. Objective
  2. ML Model-1
  3. ML Model-2

Data Science Hackathon / Competition

Final Project: Enroll in an online data science competition

  • Data exploration
  • Model building
  • Testing the score and rank
  • Variable selection
  • Feature re-engineering
  • Checking the score and rank
  • Final Submission