A reader (Ian) recently asked me what I felt was the best way to learn data science in his spare time.
Great question, Ian, especially considering the field is red-hot right now and shows no signs of slowing down!
In 2018, demand for data scientists grew by 29%, part of a 344% increase since 2013, according to reports by Indeed and Dice. The supply of qualified candidates, however, lags far behind.
My first instinct was to refer Ian to the contemporary King of AI, Siraj Raval, and his free curriculum “Learn Data Science in 3 Months”. The video, however, was created six months ago (a lifetime in the ever-changing world of data science) and I saw some opportunities to update it.
Some courses were gone or had changed… so I updated or replaced them.
Some people prefer a written guide… so I typed it out.
Some need to know why a topic is important to understand it… so I added explanations.
Finally, based on personal experience, some can get stuck on any single course no matter how interesting it is. I always like to have an additional course available that may explain something a different way or fill in some knowledge gaps… so I added some alternatives and additions.
Hopefully, this complete curriculum for becoming a data scientist helps Ian and anyone else interested in the field!
1. Learn Python
Tools you’ll use? Python
Massachusetts Institute of Technology (MIT) | Introduction to Computer Science and Programming in Python
Kaggle | Python
Siraj Raval | Learn Python for Data Science
2. Learn Statistics and Probability
Math
As a data scientist, you’ll have to extract useful information from extremely imperfect data. You can’t completely eliminate uncertainty but you can reduce it with a strong grasp of statistics and probability fundamentals.
Khan Academy | Statistics and Probability
UC San Diego | Probability and Statistics in Data Science using Python
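To make the point about reducing uncertainty concrete, here is a minimal sketch using only Python's standard library. It estimates a mean from a small and a large sample and shows how the margin of error shrinks as the sample grows. The simulated data, the `mean_with_margin` helper, and the normal approximation are all assumptions made for illustration, not something taken from the courses above.

```python
import random
import statistics


def mean_with_margin(sample, confidence=0.95):
    """Return (mean, margin of error) using a normal approximation."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error: how much the sample mean itself is expected to vary.
    std_err = statistics.stdev(sample) / n ** 0.5
    # z-score for a two-sided interval at the requested confidence level.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * std_err


# Simulated measurements (assumed true mean 50, standard deviation 10).
rng = random.Random(0)
small = [rng.gauss(50, 10) for _ in range(30)]
large = [rng.gauss(50, 10) for _ in range(3000)]

small_mean, small_margin = mean_with_margin(small)
large_mean, large_margin = mean_with_margin(large)

print(f"n=30:   {small_mean:.2f} +/- {small_margin:.2f}")
print(f"n=3000: {large_mean:.2f} +/- {large_margin:.2f}")
```

The uncertainty never reaches zero, but with 100 times more data the margin is roughly ten times narrower.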
3. Learn Data Analysis
Pandas, R
Data analysis enables you to summarize the characteristics of a data set. This deeper understanding of the data can direct you to the best way to extract useful, actionable conclusions. In short… learn how to understand and clean data. It’s what 90% of your time will be spent doing.
Georgia Tech | Computing for Data Analysis
Kaggle | Pandas
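As a rough illustration of what "understand and clean data" means in practice, the sketch below tidies a few messy records and then summarizes them. It uses plain Python so it runs anywhere; in real work you would more likely reach for Pandas. The rows, column names, and the `clean_row` helper are made up for this example.

```python
import statistics

# Made-up raw records with typical problems: stray whitespace,
# inconsistent capitalization, missing values, and non-numeric text.
raw_rows = [
    {"city": " Austin", "temp_f": "91"},
    {"city": "austin", "temp_f": "n/a"},
    {"city": "Dallas ", "temp_f": "88"},
    {"city": "", "temp_f": "79"},
    {"city": "Dallas", "temp_f": "90"},
]


def clean_row(row):
    """Normalize one record, or return None if it cannot be used."""
    city = row["city"].strip().title()
    try:
        temp = float(row["temp_f"])
    except ValueError:
        return None  # temperature is not a number
    if not city:
        return None  # city is missing
    return {"city": city, "temp_f": temp}


clean_rows = [r for r in map(clean_row, raw_rows) if r is not None]

# Summarize: average temperature per city.
temps_by_city = {}
for r in clean_rows:
    temps_by_city.setdefault(r["city"], []).append(r["temp_f"])
summary = {city: statistics.fmean(temps) for city, temps in temps_by_city.items()}

print(summary)
```

Two of the five rows are dropped before any analysis happens, which is exactly the kind of decision you need to make deliberately rather than by accident.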
4. Learn Algorithms and Machine Learning
Pandas, scikit-learn
This is likely why you got into data science in the first place! Use Skynet to draw conclusions from the data we mere humans never could.
Columbia | Machine Learning for Data Science and Analytics
Kaggle | Machine Learning
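If you are curious what a machine learning algorithm looks like under the hood, here is a from-scratch sketch of one of the simplest: k-nearest neighbors. It is only an illustration with invented data and a hypothetical `knn_predict` function; for real projects you would normally use a library such as scikit-learn instead of writing this yourself.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k closest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Invented training data: (features, label) pairs forming two clusters.
train = [
    ((1.0, 1.1), "small"), ((1.2, 0.9), "small"), ((0.8, 1.0), "small"),
    ((5.0, 5.2), "large"), ((5.1, 4.8), "large"), ((4.9, 5.0), "large"),
]

print(knn_predict(train, (1.1, 1.0)))  # point near the first cluster
print(knn_predict(train, (4.7, 5.1)))  # point near the second cluster
```

The "learning" here is nothing more than remembering labeled examples and comparing new points against them, which is a useful baseline to keep in mind before moving on to fancier models.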
5. Learn Deep Learning
TensorFlow, Keras
Because everyone’s talking about deep learning, so you have to use deep learning always. Alright… not exactly. Good old-fashioned machine learning is still the best option for most data science endeavors. Deep learning, however, is making major breakthroughs in certain fields such as image recognition, automation, and many more.
6. Learn Relational Databases
SQL, DB-API, NoSQL
As a data scientist, chances are good you’ll need to access some data. Equally likely, that data will be stored in databases. It might be a good idea to learn your way around them.
Udacity | Intro to Relational Databases and
Kaggle | SQL
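For a small taste of SQL and the Python DB-API together, the sketch below uses the `sqlite3` module from the standard library with an in-memory database. The `orders` table and its rows are invented for illustration; a real project would connect to an existing database instead.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Parameterized inserts (the ? placeholders) keep data separate from SQL text.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.5), ("ana", 12.5)],
)
conn.commit()

# Total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

print(rows)  # [('ben', 35.5), ('ana', 32.5)]
```

The same connect, execute, fetch pattern carries over to other DB-API drivers, although the placeholder syntax can differ between them.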
7. Learn Distributed Computing for Big Data
Hadoop, MapReduce, Spark
2.5 quintillion bytes of data are created every day. Let me repeat that… 2,500,000,000,000,000,000 bytes. That’s 2,500 with 15 extra zeros. If every byte were a single penny, and we laid them all flat, they’d cover the entire Earth… five times. How does a data scientist actually process that kind of data? By filtering and sorting it, then summarizing the results (MapReduce), and by distributing that work over clusters (Hadoop/Spark).
Udacity | Intro to Hadoop and MapReduce
Stanford | Intro to Apache Spark
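The MapReduce idea is easier to grasp with a toy example. The sketch below counts words on a single machine in plain Python, with the map, shuffle, and reduce steps written as separate functions. It is not Hadoop or Spark code, and the function names and documents are made up; on a real cluster each phase would run in parallel across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's values into a single total."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}


documents = ["big data is big", "data science is fun"]

# Each document could be mapped on a different machine; here we just loop.
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(mapped))

print(counts)
```

Because every document is mapped independently and every key is reduced independently, the work splits cleanly across a cluster, which is what makes the approach scale.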
8. Learn Data Presentation and Storytelling
Matplotlib, Seaborn, Folium, Excel, PowerPoint
If a tree falls in the woods, but no one’s there to hear it, does it make a sound? And if useful insight is extracted from data, but no one understands it well enough to take action, does it serve a purpose? Not really. Data science is useless if the results aren’t actionable. You have to be able to show not just what the data says but why it matters and what should be done about it. An average data scientist with outstanding presentation skills will almost always produce more useful results than the best data scientist who can’t explain them.
IBM | Visualizing Data with Python
Kaggle | Data Visualization