Data science is now recognized as a critical growth area with impact across many sectors, including science, government, finance, health care, telecom, manufacturing, advertising, and retail.

Launch your data science career with this practical workshop. Build a solid foundation in machine learning using R and start exploring data-related careers.

- Understand the art and science of discovering patterns and making intelligent predictions from big data.
- Define machine learning, explain why it matters, and discuss its relationship to analytics, data science, and big data.
- Learn machine learning fundamentals, the importance of algorithms, and machine learning as a service.
- Learn the basics of the R platform, programming language concepts, and common and useful R commands, and apply machine learning methods.
- Understand the steps in the machine learning pipeline, from data acquisition and feature generation to training and model selection.
- Get hands-on practice with the most commonly used machine learning methods, covering both supervised and unsupervised learning.
- Develop an understanding of which algorithm to choose based on the analytics challenge and the data you have.
- Appreciate the trade-offs involved in choosing particular techniques for particular problems.
- Discover how to understand, interpret, and convey the results of the data science life cycle.

24 Hours | 2 Speakers | 11 Labs | 15 Models

The workshop has a strong focus on gaining hands-on experience implementing algorithms and building predictive models on real datasets. By the end of the three days, participants will be ready to apply machine learning algorithms to their own data and generate value immediately.

The workshop will take participants through the conceptual and applied foundations of the subject. Topics covered include:

**R for Statistical Analysis and Machine Learning**

**Machine learning theory, types of learning**

**Techniques, models and methods**

**The labs give participants practical experience using the R programming language and its packages to apply the main concepts and techniques of data science and machine learning.**


A data-driven digital world, introduction to Data Science and component parts of Data Science

Enterprise Big Data platform architectures, Hadoop ecosystem and Apache Spark

Data Science Toolkit and Life Cycle – A strategy to approach data analytics problems

Fundamentals of Machine Learning for Data Science

R for Statistical Analysis and Machine Learning

Often we have far too many features to work with. In this lab, we see how to use pairwise statistical tests to select high-information features and discard low-information features.
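As a rough sketch of the idea, and not the lab's actual material, the snippet below runs one correlation test per feature against the target and keeps the features whose p-value clears a cutoff. The synthetic data, the variable names, and the 0.01 threshold are all assumptions made for illustration.

```r
set.seed(42)
n <- 200

# Synthetic data: x1 and x2 drive the target, x3 to x5 are pure noise.
features <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n), x5 = rnorm(n)
)
y <- 2 * features$x1 - 1.5 * features$x2 + rnorm(n)

# One pairwise test per feature: Pearson correlation against the target.
p_values <- sapply(features, function(col) cor.test(col, y)$p.value)

# Keep features whose association with y is significant at the chosen cutoff.
selected <- names(p_values)[p_values < 0.01]
print(sort(p_values))
print(selected)
```

Because several tests are run at once, a noise feature can occasionally pass the cutoff by chance; a stricter threshold or a multiple-testing correction such as `p.adjust()` reduces that risk.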

Principal component analysis creates a new set of features as linear combinations of the original features. These are ordered by the amount of variance each explains, and selecting a subset of high-variance principal components provides a powerful way both to reduce the number of features used and to ensure that those used have high information content.
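A minimal sketch of this workflow in base R, using the built-in `iris` measurements as a stand-in dataset (an assumption; the lab may use different data). The 95% variance target is likewise an illustrative choice.

```r
# Built-in iris data: four numeric measurements per flower.
X <- iris[, 1:4]

# Centre and scale so no feature dominates purely because of its units.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component, in decreasing order.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Keep the smallest number of components explaining at least 95% of the variance.
k <- which(cumsum(var_explained) >= 0.95)[1]
reduced <- pca$x[, 1:k, drop = FALSE]
print(dim(reduced))
```

The columns of `reduced` are the new features; `pca$rotation` holds the weights that define each one as a linear combination of the original measurements.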

Guest Speaker slot

Model Selection: Training, Validating and Testing

Use ordinary least squares to model how (Y) the prestige in which a vocation is held depends on (X1) the education requirements of the career and (X2) its remuneration. This provides experience with an important foundational algorithm.
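A small sketch of such a fit with `lm()`, which estimates coefficients by ordinary least squares. The data here are synthetic and the column names (`education`, `income`, `prestige`) are hypothetical stand-ins, not the lab's dataset.

```r
set.seed(1)
n <- 100

# Hypothetical synthetic stand-ins for the lab's variables.
education <- runif(n, 6, 16)     # X1: years of education required
income    <- runif(n, 20, 120)   # X2: remuneration, in thousands
prestige  <- 5 + 3 * education + 0.2 * income + rnorm(n, sd = 5)  # Y

careers <- data.frame(education, income, prestige)

# lm() fits the linear model by ordinary least squares.
fit <- lm(prestige ~ education + income, data = careers)
print(coef(fit))
print(summary(fit)$r.squared)
```

Since the data were generated with known coefficients, the estimates can be compared against the values used in the simulation.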

Use Poisson regression to model the relationship between (X1) wind and (Y) ozone levels. This looks at one of the typical cases where linear regression is unsuitable and provides an introduction to the use of generalized linear models in R.
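One way this might look in R, assuming the built-in `airquality` data (which has `Ozone` and `Wind` columns) is a reasonable stand-in; the description above does not name the dataset used in the lab.

```r
# airquality ships with R; glm() drops rows with a missing Ozone value by default.
fit <- glm(Ozone ~ Wind, family = poisson(link = "log"), data = airquality)
print(coef(fit))

# With a log link, exp(coefficient) is the multiplicative change in expected
# ozone for each one-unit increase in wind speed.
print(exp(coef(fit)[["Wind"]]))

# Predicted mean ozone at a few wind speeds, on the response scale.
new_wind <- data.frame(Wind = c(5, 10, 15))
pred <- predict(fit, newdata = new_wind, type = "response")
print(pred)
```

Unlike a straight-line fit, the log link keeps every predicted ozone level positive, which is one reason a generalized linear model suits count-like responses.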

Use polynomial regression to model clearly non-linear synthetic data. This provides a clear example of basis transformation.
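The following sketch illustrates basis transformation on synthetic data invented for this example: the same `lm()` call is used twice, once on the raw input and once on a cubic polynomial basis built with `poly()`.

```r
set.seed(7)
x <- seq(-3, 3, length.out = 120)

# Synthetic cubic signal plus noise.
y <- 1 - 2 * x + 0.5 * x^3 + rnorm(length(x), sd = 1)

# A straight line versus a cubic basis expansion of the same input.
linear_fit <- lm(y ~ x)
cubic_fit  <- lm(y ~ poly(x, 3))

# The model is still linear in its coefficients; only the basis has changed.
r2 <- c(linear = summary(linear_fit)$r.squared,
        cubic  = summary(cubic_fit)$r.squared)
print(r2)
```

Comparing the two R-squared values shows how much of the non-linear structure the transformed basis captures that the straight line misses.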

Use logistic regression to model proportional and binary data. In the first case we model the proportion of women who have reached menarche versus their age in years, and in the second, the probability of a seed germinating versus its age in days. This gives us a second look at how to use generalized linear models in R, and provides experience with another important foundational algorithm.
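A hedged sketch of both cases. For the proportional case it assumes the `menarche` data from the `MASS` package (columns `Age`, `Total`, `Menarche`) matches the lab's data; for the binary case it uses synthetic seeds with an invented germination curve, since no dataset is named.

```r
library(MASS)  # provides the menarche data set

# Proportional data: successes and failures per age group.
prop_fit <- glm(cbind(Menarche, Total - Menarche) ~ Age,
                family = binomial, data = menarche)
print(coef(prop_fit))

# Binary data: hypothetical synthetic seeds, one 0/1 outcome per seed.
set.seed(3)
seed_age   <- runif(300, 0, 60)            # age in days
p_true     <- plogis(3 - 0.1 * seed_age)   # assumed true germination curve
germinated <- rbinom(300, size = 1, prob = p_true)
bin_fit    <- glm(germinated ~ seed_age, family = binomial)

# Estimated germination probability for 10-, 30- and 50-day-old seeds.
print(predict(bin_fit, data.frame(seed_age = c(10, 30, 50)), type = "response"))
```

Both fits use `family = binomial`; the only difference is whether the response is given as a two-column matrix of counts or as a single 0/1 vector.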

Use LDA and QDA to classify synthetic data. These provide a simple introduction to the use of Gaussian distributions, as well as exposure to these surprisingly well-performing techniques.
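A minimal sketch using `lda()` and `qda()` from the `MASS` package. The two Gaussian classes, their means and spreads, and the variable names are assumptions chosen for illustration.

```r
library(MASS)  # lda() and qda()

set.seed(11)
n <- 150

# Two synthetic Gaussian classes with different means and different spreads.
class_a <- data.frame(x1 = rnorm(n, 0, 1),   x2 = rnorm(n, 0, 1),     label = "a")
class_b <- data.frame(x1 = rnorm(n, 2.5, 2), x2 = rnorm(n, 2.5, 0.5), label = "b")
train <- rbind(class_a, class_b)
train$label <- factor(train$label)

lda_fit <- lda(label ~ x1 + x2, data = train)  # shared covariance, linear boundary
qda_fit <- qda(label ~ x1 + x2, data = train)  # per-class covariance, quadratic boundary

# Training accuracy only; a held-out set is needed for an honest estimate.
lda_acc <- mean(predict(lda_fit, train)$class == train$label)
qda_acc <- mean(predict(qda_fit, train)$class == train$label)
print(c(lda = lda_acc, qda = qda_acc))
```

The classes here were given unequal covariances on purpose, which is the situation where QDA's extra flexibility can help over LDA.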

Use tree-based ensemble techniques to model passenger survival in the Titanic disaster. Simple but high-performing, these methods are favorites of data scientists. These exercises provide experience using these important techniques in R.


As a participant in this workshop, you will receive an exclusive copy of the workshop's study guide. The guide provides a deep understanding of the techniques and practices of machine learning and introduces a wide set of resources that data scientists and analysts can use in their work. Readers will find explanations of the theory behind the algorithms and models they encounter, giving them an understanding of the strengths and weaknesses of each. They can use this understanding to reason about suitable approaches to real-life problems and to communicate that reasoning to other stakeholders.
