Overview
This two-day workshop will introduce students to data exploration and machine learning techniques. Students will learn about the data science workflow and will practice exploring and visualising data using Python and built-in libraries. Students will also explore the differences between supervised and unsupervised learning techniques and practice creating predictive regression models.
Note: This is a two-day workshop; the second session will be on 18 August (Sat).
Takeaways
After this workshop, you will be able to:
- Collect data from a variety of sources (e.g., Excel, web-scraping, APIs and others)
- Explore large data sets
- Clean and "munge" the data to prepare it for analysis
- Apply machine learning algorithms to gain insight from the data
- Visualize the results of your analysis
- Build your own library and Python scripts
Recommended Prerequisites
Beginner/intermediate. This workshop is for analysts, product managers, mathematicians, business managers or anyone else who wants to learn about machine learning. A background in computer science, programming, and/or statistics is preferred but not required. You are expected to be somewhat familiar with command line tools and to know how to write simple programs. We recommend that you take the "Python for Beginners" workshop before attending this one.
Event Agenda
Day 1 - Developing the Fundamentals (11 August, 10AM - 5PM)
Module 1: Introduction to Machine Learning (2.5 hours)
- What is machine learning?
- Installation and update of tools
- Machine learning algorithms
Module 2: Exploring and using data sets (2.5 hours)
- Learn the steps to pre-process a dataset and prepare it for machine learning algorithms
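As a rough illustration of what pre-processing can involve, the sketch below fills in missing values with the column mean and then standardises the column. It is not taken from the workshop material: the function names and the sample "ages" column are hypothetical, and only Python's standard library is used.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical column with two missing entries.
ages = [22, None, 35, 41, None, 30]
imputed = impute_mean(ages)
clean = standardize(imputed)
```

Mean imputation and standardisation are only two of many possible steps; which ones are appropriate depends on the dataset and the algorithm used afterwards.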
Day 2 - Diving into machine learning (18 August, 10AM - 5PM)
Module 3: Supervised vs. unsupervised learning (2.5 hours)
- Review of machine learning algorithms
- Classification, linear regression and logistic regression
- Random forests, clustering
- Decision trees
Module 4: Model Evaluation (2.5 hours)
- Feature Engineering and Model Selection
- Model Evaluation Metrics - Accuracy, RMSE, ROC, AUC, Confusion Matrix, Precision, Recall, F1 Score
- Overfitting and Bias-Variance trade-off
- Cross Validation
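To make a few of the evaluation metrics listed above concrete, here is a minimal sketch that derives accuracy, precision, recall and F1 score from confusion matrix counts for a binary classifier, plus RMSE for a regression model. The function names and sample labels are hypothetical and are not part of the workshop material.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) counts for a binary classification task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical labels: 1 is the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

In practice a library such as scikit-learn provides these metrics ready-made; writing them out by hand simply shows how each one follows from the four confusion matrix counts.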
Instructors' Biodata
Kwan Chong Tan, Chief Data Scientist, Booz Allen Hamilton
Kwan Chong is currently a Chief Data Scientist at Booz Allen Hamilton and has nearly ten years of experience working with government and commercial organizations internationally in applying data analytics to achieve business outcomes. He previously worked at Palantir Technologies and the Defence Science & Technology Agency, leading projects to leverage analytics across a wide range of domains including cybersecurity, financial crimes, and open source intelligence.