This guide is for people who regularly work with data, or have access to it, and want to take their career to the next level. You do not need a software engineering or computer science background to follow it.
A good example is Industrial Engineers, who need to make decisions with the support of data and perform statistical simulations, time series forecasting, and optimization tasks.
First, let me explain why I chose R for this instead of Python. In my experience, R has a broader ecosystem for ad hoc data analysis, data manipulation, and statistical analysis. It is well suited to these tasks because it was designed as a language for data exploration and statistical modeling, and it is supported by a strong statistical community with fantastic documentation.
Becoming proficient at exploratory data tasks will help you add more value in your current role, whether you plan on becoming a Data Scientist or not. Plus, the reality is that many companies are not on the cutting edge and are not solving very complex machine learning tasks, which is where a language like Python would be most suitable.
I used R in my postgraduate studies, so it was the programming language that launched my Data Science career. I now have over 5 years of experience using R, and I would like to take you through my guide to getting started with it. It can take you from novice to R ninja, depending, of course, on how much you practice.
What you need to know, the checklist…
I have broken down the R programming roadmap and suggested courses or eBooks in the table below. Note that I have only recommended courses from Coursera and DataCamp, because these are the platforms I have used myself. Alternatively, you can search for courses with similar content on Udemy, Udacity, and Codecademy. I prefer Coursera when I need more detail and a deeper understanding, as its courses include peer-reviewed assessments. DataCamp helps with code practice and implementation.
Note: you do not have to learn everything in this checklist; you can stop at the tidyverse section.
|What you need to know||Learning Material Suggestions|
|Workflow, R scripts, R projects, R markdown, notebooks, Git and version control||Jenny Bryan’s excellent *Happy Git and GitHub for the R user*; more advanced material:|
|Learn about the different data types and data objects (vectors, lists, data frames, matrices) and how to subset each object; learn how to use the R language for programming and writing your own custom functions (for loops, while loops, if statements, etc.)||The basics of R programming; DataCamp introduction to R; DataCamp intermediate R|
|Learn about the R environment and how to install and use R libraries; learn how to read data in varying formats (csv, txt, json); learn about data exploration, data visualisation, analysis, and the tidyverse ecosystem||Data Science: Foundations using R Specialization; DataCamp data analyst with R; E-books: R4DS: R for Data Science (2017), Hadley Wickham and Garrett Grolemund, which is exceptionally good for learning about the tidyverse; TMR: Text Mining with R (2017), Julia Silge and David Robinson|
|Learn how to use R for summary statistics and fitting statistical models (the caret package)||Practical Machine learning; Supervised Learning in R: Classification; Supervised Learning in R: Regression|
|Learn about building R packages and writing documentation||Use R as a language for development|
|Learn how to do parallel computing in R||Parallel Programming in R; Introduction to Spark with sparklyr in R|
|R Shiny and dashboards||Building Dashboards with shinydashboard: https://learn.datacamp.com/courses/building-dashboards-with-shinydashboard|
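As a small taste of the data objects, subsetting, and custom function items in the checklist, here is a minimal sketch in base R. The values, the variable names, and the `count_above()` helper are all made up for illustration; they do not come from any of the courses listed above.

```r
# Base R only; no packages are needed for this sketch.
# All names and values below are hypothetical examples.

# Core data objects
scores  <- c(math = 72, stats = 85, physics = 64)   # named numeric vector
student <- list(name = "Thandi", scores = scores)   # list holding mixed types
grid    <- matrix(1:6, nrow = 2)                    # 2 x 3 matrix, filled by column
df <- data.frame(
  machine = c("A", "B", "C", "D"),
  output  = c(120, 95, 143, 88)
)

# Subsetting each object
scores["stats"]          # vector element by name
scores[scores > 70]      # vector elements matching a logical condition
student$name             # list element by name
grid[2, 3]               # matrix element at row 2, column 3
df[df$output > 100, ]    # data frame rows where output exceeds 100

# A custom function that uses a for loop and an if statement
count_above <- function(values, threshold) {
  count <- 0
  for (value in values) {
    if (value > threshold) {
      count <- count + 1
    }
  }
  count
}

count_above(df$output, 100)
```

In everyday R you would usually write the vectorised form `sum(df$output > 100)` instead of the loop; the loop is shown only because loops and if statements are on the checklist.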
By learning the items in this roadmap, you will be able to build an end-to-end data analysis pipeline and quickly add value. It is easy for aspiring Data Scientists to fall into the trap of wanting to implement the latest state-of-the-art deep learning models. Although I think this is alright, especially if all you want to do is have some fun, a shallow understanding of the basics will often lead to applications that do not work well or fail to solve business problems. This makes it much harder to adopt and sell Data Science in your organization.
Tell me more about tidyverse
I would like to spend a little more time talking about the tidyverse. It is a group of packages that work well together and form part of an end-to-end data analysis pipeline. I would highly recommend getting familiar with the tidyverse and participating in “Tidy Tuesday” as practice. Learn more about “Tidy Tuesday” here: https://github.com/rfordatascience/tidytuesday
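To give a feel for how tidyverse packages chain together, here is a minimal sketch using dplyr on the `mtcars` dataset that ships with R. It assumes dplyr is already installed. The filter threshold, the `kpl` column, and the approximate conversion factor are illustrative choices, not part of any recommended workflow.

```r
# Assumes the dplyr package is installed: install.packages("dplyr")
library(dplyr)

# mtcars is a small built-in dataset of car specifications.
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                      # keep cars with more than 100 horsepower
  mutate(kpl = mpg * 0.425) %>%             # approximate km per litre from US mpg
  group_by(cyl) %>%                         # one group per cylinder count
  summarise(
    n_cars   = n(),                         # number of cars in each group
    mean_kpl = mean(kpl)                    # average fuel economy per group
  ) %>%
  arrange(desc(mean_kpl))                   # most economical group first

summary_by_cyl
```

Each step takes a data frame and returns a data frame, which is what lets the verbs be piped together into a readable pipeline.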
“If you can’t explain it simply, you don’t understand it well enough.” - Albert Einstein