Overview
This two-day workshop introduces you to data exploration and machine learning techniques. You will learn about the data science workflow and practice exploring and visualising data using Python and its libraries.
You will also explore the differences between supervised and unsupervised learning techniques and practice creating predictive regression models.
Please register at least 72 hours before the course.
Takeaways
After this workshop, you will be able to:
- Collect data from a variety of sources (e.g., Excel files, web scraping, and APIs)
- Explore large data sets
- Clean and "munge" the data to prepare it for analysis
- Apply machine learning algorithms to gain insight from the data
- Visualise the results of your analysis
- Build your own library and Python scripts
Prereqs and preparations
Beginner/Intermediate. This workshop is for analysts, product managers, mathematicians, business managers, or anyone else who wants to learn about machine learning.
A background in computer science, programming, and/or statistics is preferred but not required. You are expected to be somewhat familiar with command-line tools and able to write simple programs.
We recommend taking the Python for Beginners workshop before attending this one.
Programme Details:
Day 1 [Nov 17: 10 AM - 5 PM]
Developing the Fundamentals
Module 1: Introduction to Machine Learning (2.5 hours)
- What is machine learning?
- Installation and update of tools
- Machine learning algorithms
Module 2: Exploring and using data sets (2.5 hours)
- Learn the steps to pre-process a dataset and prepare it for machine learning algorithms
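As a rough illustration of what pre-processing can involve, the sketch below fills in missing values and rescales numeric columns using only the Python standard library. It is not taken from the course materials: the records, the column names (`age`, `income`), and the helper functions `impute_mean` and `standardise` are all hypothetical, and mean imputation is just one of several possible choices.

```python
from statistics import mean, stdev

# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"age": 25, "income": 40000.0},
    {"age": None, "income": 52000.0},
    {"age": 35, "income": None},
    {"age": 45, "income": 61000.0},
]


def impute_mean(rows, column):
    """Replace missing values in `column` with the mean of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


def standardise(rows, column):
    """Rescale `column` to zero mean and unit sample standard deviation."""
    values = [row[column] for row in rows]
    mu, sigma = mean(values), stdev(values)
    return [{**row, column: (row[column] - mu) / sigma} for row in rows]


# Apply both steps to every numeric column, leaving raw_rows untouched.
clean_rows = raw_rows
for col in ("age", "income"):
    clean_rows = impute_mean(clean_rows, col)
    clean_rows = standardise(clean_rows, col)
```

Each helper returns new dictionaries rather than modifying its input, so the raw data stays available for comparison after cleaning.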
Day 2 [Nov 24: 10 AM - 5 PM]
Diving into machine learning
Module 3: Supervised vs. unsupervised learning (2.5 hours)
- Review of machine learning algorithms
- Classification, linear regression, and logistic regression
- Random forests, clustering
- Decision trees
Module 4: Model Evaluation (2.5 hours)
- Feature Engineering and Model Selection
- Model Evaluation Metrics - Accuracy, RMSE, ROC, AUC, Confusion Matrix, Precision, Recall, F1 Score
- Overfitting and Bias-Variance trade-off
- Cross-Validation
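To make the metrics in this module concrete, here is a minimal sketch that computes accuracy, precision, recall, F1 score, and RMSE from first principles, plus a simple unshuffled k-fold split of the kind cross-validation relies on. The function names and toy labels are illustrative assumptions, not course code; in practice a library such as scikit-learn would typically handle this.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


def rmse(y_true, y_pred):
    """Root mean squared error for regression predictions."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Toy labels (hypothetical): 3 true positives, 1 false positive,
# 1 false negative, 3 true negatives.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
folds = list(k_fold_indices(len(y_true), 3))
```

With these toy labels, accuracy, precision, recall, and F1 all come out to 0.75, because the false positives and false negatives happen to balance. On imbalanced data the four numbers can diverge sharply.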
Trainer
Saif Farooqui is a Technical Analytics Lead in the Business Integrity team, which protects users and ensures safe connections between users and businesses. The team's focus on data analysis, machine learning and a robust infrastructure of back-end systems allows it to collaborate effectively with engineering and product teams.
Prior to working on data science at Facebook, Saif bounced around a fair bit, from consulting to economics research to marketing science and then computer vision, before a teaching position at (drum roll) General Assembly led to the job of his dreams. Ask him about Python generators and the temporal considerations of data analysis!