A repository dedicated to providing a roadmap for prospective Data Science practitioners, as well as maintaining Data Science resources for beginner, intermediate, and advanced practitioners.
- Motivation
- Path to becoming a Data Scientist
- Beginner (3-6 Months)
- Intermediate (6-12 Months)
- Advanced (12-18 Months)
- Professional / Job Ready (18-24 Months)
- Bonus (sprinkle in)
- Miscellaneous Resources
The primary motivation behind this repository is to help beginners in the field of Data Science find their way along the path to becoming a Data Scientist. In today's information age, it is very easy to get lost in the vast amount of content available online. This repository helps you navigate that information by providing a structure for learning Data Science.
This section lays out a path to becoming a Data Scientist.
We assume a window of roughly 24 months.
Depending on your starting point, you can choose where to begin. Even if you are an intermediate Data Science practitioner, I would highly recommend going through this path from the start.
The roadmap presented in this repo isn't a one-size-fits-all curriculum. Those of you coming from CS/Math/Statistics backgrounds may choose to cherry-pick some resources while skipping others. This is perfectly okay.
Start here if you are a complete beginner to Data Science, or if you have recently completed an introductory Machine Learning course like the one offered by Microsoft Azure on Udacity.
A very hands-on book, Python Crash Course helps readers get familiar with the language and understand many of the best practices followed when writing Python code.
Learn Python the Hard Way is perhaps the best way to learn Python 3 for someone with no coding experience. The author takes a careful, methodical, and somewhat rigid approach to explaining concepts. This style of teaching helps you build good habits for writing Python from the very beginning.
Automate the Boring Stuff with Python takes a more project-based approach to the language. Readers can expect to learn a variety of ways to use Python to build small automation projects. A really good read!
Khan Academy
Books
This book covers almost every mathematical concept required to understand how Machine Learning algorithms work. A really great book designed specifically for ML.
Books
Written by the creator of pandas himself, Python for Data Analysis is a must-read for Data Scientists. The book delves into the numerous features provided by the NumPy and pandas libraries.
Courses
Python Data Science
These two courses are available for free from freeCodeCamp on YouTube. I found the content really easy to understand for beginners. They are lengthy, but they provide fantastic coverage of performing data analysis with Python.
At this point, which should be around the 2-4 month mark, you should be able to start working on Kaggle. By this, I do not mean competing in competitions.
Kaggle users post datasets on which other users can create exploratory data analysis (EDA) notebooks. With your knowledge of Python, you can start creating these notebooks and build a reputation in the Kaggle community.
You may also start working on EDA projects, where you find a dataset that interests you and analyze it.
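A first EDA pass tends to ask the same questions of any new dataset: how many rows there are, how many values are missing per column, and what range the numeric columns cover. The sketch below answers them with the standard library only, assuming a tiny hypothetical CSV held in a string; `profile` is an illustrative helper, not a Kaggle or pandas API. With pandas you would typically reach for `df.info()`, `df.isna().sum()` and `df.describe()` instead.

```python
import csv
import io
import json
import statistics

# Hypothetical toy dataset; empty fields stand in for missing values.
RAW = """city,population,area_km2
A,1200,30.5
B,,12.0
C,5400,
D,800,8.25
"""


def profile(text):
    """Return the row count plus per-column missing counts and summaries."""
    rows = list(csv.DictReader(io.StringIO(text)))
    report = {"n_rows": len(rows), "columns": {}}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        present = [v for v in values if v != ""]
        summary = {"missing": len(values) - len(present)}
        try:
            nums = [float(v) for v in present]
        except ValueError:
            nums = None  # non-numeric column
        if nums:
            # Numeric column: basic descriptive statistics.
            summary.update(
                mean=statistics.mean(nums),
                median=statistics.median(nums),
                min=min(nums),
                max=max(nums),
            )
        else:
            # Categorical (or fully empty) column: count distinct values.
            summary["distinct"] = len(set(present))
        report["columns"][col] = summary
    return report


if __name__ == "__main__":
    print(json.dumps(profile(RAW), indent=2))
```

The point is the habit, not the helper: before modelling anything, check size, missingness, and value ranges, then dig into whatever looks surprising.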
Courses
- Stanford Coursera Machine Learning (Recommended)
- IBM Coursera Machine Learning with Python (Optional)
- edX Machine Learning (Recommended)
Books
A book that does exactly what its title says, The Hundred-Page Machine Learning Book covers the breadth of Machine Learning in roughly 100 pages! Anybody who wants a high-level understanding of Machine Learning will love this book.
Perhaps the best-known book on practical Machine Learning, Aurélien Géron's masterpiece provides impressive coverage of two of the most beloved Machine Learning libraries: scikit-learn and TensorFlow.
Note that for the above book, it is advisable to read only up to the end of Part 1 initially. This covers the scikit-learn library.
You can move on to Part 2, which covers Deep Learning with TensorFlow, during the next 6 months.
At this stage, you should be comfortable working with Data Science libraries and different datasets. Now, we will move into more advanced topics like Deep Learning.
Courses
- Deep Learning Specialization on Coursera by Andrew Ng
- Deep Learning Nanodegree
- TensorFlow in Practice on Coursera by DeepLearning.AI
- Practical Deep Learning for Coders by fast.ai
- Practical Deep Learning for Coders Part 2 by fast.ai
- PyTorch for Deep Learning by Jovian.ml
Books
This book makes a second appearance here. This time, you will want to read the second part of the book, which focuses on TensorFlow.
Another book written by the creator of a famous library, Deep Learning with Python offers a deep dive into Keras.
Note that a newer edition of the book is set to be released, covering the use of Keras in TensorFlow 2.0.
This section is optional because these books take a lot of effort to read and understand. They involve a lot of math, but that math really helps you understand what is happening underneath the Machine Learning libraries we have covered so far.
The most famous book about Deep Learning, it is a must-read for all Deep Learning enthusiasts. Although it presents significant mathematics, its approach is easy to grasp.
Another very famous book that focuses on the mathematics behind classical machine learning models, The Elements of Statistical Learning is a classic.
Depending on how soon you have completed the material (excluding the optional section), you might want to consider building another project at this point. You can pick either a classical ML project or a Deep Learning one.
At this stage, you should have sufficient knowledge of the breadth of Data Science and Machine Learning. There are a few paths you could take here, depending on your interests and goals.
These projects could be domain-specific, such as:
- Computer Vision
- NLP
- Reinforcement Learning
Look up competitions running on websites like Kaggle, HackerEarth, and DrivenData, and participate! If you are in college, you could also look at college-level Data Science competitions.
This is technically also a project, but in this case you will take a research paper of interest and try to implement the concepts presented in it. This is definitely the hardest of the three tracks mentioned here. However, a research paper implementation on your resume will help you stand out from the crowd.
This section will focus on making you a well-rounded Data Scientist. By now, your resume should have at least 2-4 Data Science projects. This section covers the remaining aspects of landing that Data Science job.
Many roadmaps include this step at the start. However, I include it at the end because I feel the best way to learn SQL is to practice it. After this stage, most of you will be lining up interviews for Data Science positions, and most companies include a round of SQL questions in their interviews. By practicing now, the concepts will stay fresh for those interviews.
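SQL interview questions frequently combine a join, an aggregation, and a filter on that aggregate (`HAVING`). One low-setup way to practice such queries locally is Python's built-in `sqlite3` module with an in-memory database. The schema, data, and the `top_departments` helper below are hypothetical, chosen only to illustrate the pattern.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE employees (
            id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary REAL
        );
        INSERT INTO departments VALUES (1, 'Data'), (2, 'Sales'), (3, 'Legal');
        INSERT INTO employees VALUES
            (1, 'Ana', 1, 120), (2, 'Ben', 1, 100),
            (3, 'Cy', 2, 90), (4, 'Di', 2, 70),
            (5, 'Ed', 3, 150);
    """)
    return conn


def top_departments(conn, min_headcount=2):
    """Departments with at least `min_headcount` employees, by average salary."""
    query = """
        SELECT d.name, COUNT(*) AS headcount, AVG(e.salary) AS avg_salary
        FROM employees AS e
        JOIN departments AS d ON d.id = e.dept_id
        GROUP BY d.name
        HAVING COUNT(*) >= ?          -- filters groups, after aggregation
        ORDER BY avg_salary DESC
    """
    return conn.execute(query, (min_headcount,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in top_departments(conn):
        print(row)
```

`WHERE` filters individual rows before grouping, while `HAVING` filters the groups produced by `GROUP BY`; mixing the two up is a common interview slip worth practicing against.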
Courses
- Intro to SQL by Kaggle (Recommended)
- Advanced SQL by Kaggle (Recommended)
- SQL for Data Science (Optional)
Practice
- SQL Zoo (Recommended)
- HackerRank (Recommended)
- SQL Fiddle (Optional)
Again, this is one of those topics that many roadmaps either include in the initial stages or discard entirely. Either way, having a strong foundation in Data Structures and Algorithms is very important.
Data Science roles may not test these concepts as rigorously as Software Engineering roles. However, most FAANG companies, as well as a lot of startups, do ask a round of coding interview questions even for Data Science roles.
If you are from a CS background, you will most likely be familiar with all the topics presented here.
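As an illustration of why data structures matter even in Data Science coding rounds, consider the classic "two sum" exercise (our choice of example, not one attributed to any particular company): given a list of integers and a target, return the indices of two numbers that add up to the target. A double loop checks every pair in O(n²) time; remembering previously seen values in a dictionary brings it down to a single O(n) pass.

```python
from typing import Dict, List, Optional, Tuple


def two_sum(nums: List[int], target: int) -> Optional[Tuple[int, int]]:
    """Return indices (i, j), i < j, with nums[i] + nums[j] == target, or None."""
    seen: Dict[int, int] = {}  # value -> first index at which it appeared
    for j, x in enumerate(nums):
        complement = target - x
        if complement in seen:
            # An earlier element completes the pair: O(1) dictionary lookup.
            return seen[complement], j
        seen.setdefault(x, j)
    return None


if __name__ == "__main__":
    print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
```

The trade-off is typical of interview questions: O(n) extra memory for the dictionary buys a drop from quadratic to linear time.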
Courses
- Algorithms Part 1 by Princeton University on Coursera (Recommended)
- Algorithms Part 2 by Princeton University on Coursera (Recommended)
- Data Structures and Algorithms Specialization on Coursera (Optional)
- Data Structures and Algorithms in Python by Udacity (Recommended)
Practice
- LeetCode (Recommended)
- HackerRank (Recommended)
- CodeChef (Optional)
- Codeforces (Optional)
More resources for this topic will be available in the resources section.
An essential skill to have. Every company working with software uses version control, so a sound knowledge of Git is a must.
Books
In my opinion, the best book there is to learn Git.
Courses
A really nice 30-minute crash course to understand Git.
Next Steps
This is where most of you can start applying to Data Science jobs. Congratulations!!! 🥳 You have officially completed the journey to becoming a Data Scientist.
If you have more time on your hands, this bonus material may help strengthen your profile even more before you apply for Data Science jobs. This section may also be relevant for more experienced folks in the field.
- AWS Solutions Architect Course by freeCodeCamp
- AWS Certified Cloud Practitioner by freeCodeCamp
- Google Cloud Training
- Microsoft Azure Fundamentals by freeCodeCamp
- Cloud Academy
To be filled soon!