
studying panda

Data Science Curriculum

Overview

The curriculum presented here curates and organizes resources covering all core aspects of data science. It also covers how to learn effectively, points to influencers to follow and conferences to attend, and links to important books, blogposts, videos, and YouTube channels. It encourages you to learn and collaborate with a community of students, either through its own Discord channel or by joining other existing communities.

Visit the Resource Hub for complementary material on all topics.

Learning Path Example

[Infographic: learning path example]

The Curriculum Explained

Data Science Intro & Learning How to Learn

The best way to get into data science is to first learn how to program and to gain some familiarity with computer science foundations. Also, since you are about to engage in a lifetime of learning new concepts and skills, viewing the resources related to becoming a better learner is highly recommended.

Note: ❤️s represent material we particularly enjoyed and recommend.

Topics covered: Introduction to AI, Introduction to computer science and Python, Learning how to learn

Resources | Source | Format
Programming Environment Setup | Le Wagon | Tutorial
CS50 introduction to programming with Python | Harvard | Videos & coding exercises/projects
CS50 introduction to Computer Science | Harvard | Videos & coding exercises/projects
Learning How to Learn ❤️ | Coursera | Videos and quizzes
A Mind for Numbers | Barbara Oakley | Book
Pragmatic Thinking and Learning ❤️ | Andy Hunt | Book

Tips

  • Start by setting up your computer and installing the tools required to do data science effectively. Le Wagon's setup is well maintained and has clear step-by-step instructions.
  • For those new to programming, start with Harvard’s CS50 introduction to programming with Python. David Malan is one of the pioneers of online teaching, and his content and pedagogical approach are always stellar. This class is new: it was first given in 2022.
  • For a real challenge, or for those already somewhat familiar with programming, look at CS50’s introduction to Computer Science from the same professor. The exercises are more involved and there is a focus on the C programming language instead of Python. That said, it covers more material and computer science concepts. If you're not taking this course, at least watch lecture 0 - Scratch for its marvelous introduction to the field of computer science.
  • Since you're about to engage in a long journey of continuous studying, make sure you understand the latest research and science around acquiring knowledge. Taking “Learning how to learn” on Coursera will yield immense benefits in the long term and give you a competitive advantage over your peers. The book on which the course is based, “A Mind for Numbers”, and the brilliant “Pragmatic Thinking and Learning” are good complementary options too. For a quick summary of all three resources, refer to this blogpost, or listen to this podcast by Dr Paul Pen for an excellent overview of the subject.
  • Start your data science journey with Udacity's Data Analyst Nanodegree. This is a great place to start to learn about statistics, probability, data wrangling, data visualization, etc.

Core Data Science

If you can afford a bootcamp, Le Wagon is among the best data science ones out there. There is a heavy emphasis on practical exercises, the curriculum is constantly evolving with the latest libraries and technologies, and the final project involves deploying your own machine learning model.

If a bootcamp is too expensive, you can replace it by first taking Udacity's Data Analyst Nanodegree, followed by Andrew Ng’s machine learning course on Coursera.

Read the indispensable An Introduction to Statistical Learning, 2nd Edition. This book was updated in 2021 and has been a solid reference in the community for years. We suggest doing the proposed exercises in Python rather than in R.

Consolidate all of the above with Andriy Burkov’s famous and succinct The Hundred-Page Machine Learning Book, as well as Python for Data Analysis by Wes McKinney, creator of the pandas library (make sure you're reading the latest version, the 3rd edition, which is free and fully available online). These two books are complementary: one is more focused on theory, while the other is hands-on with Python.

Topics covered: Data wrangling, Data collection with an API, SQL, Statistical tests & experiments, Data visualization, Machine Learning, Deep Learning, Random Forests, Model interpretation techniques

Resources | Source | Format
Data Science Bootcamp ❤️ | Le Wagon | In person / remote lectures - 9 weeks
An Introduction to Statistical Learning 2nd Edition | G. James, D. Witten, T. Hastie, R. Tibshirani | Book
Data Analyst Nanodegree | Udacity | Videos & coding exercises/projects
Machine Learning Course | Coursera | Videos & coding exercises/projects
The Hundred-Page Machine Learning Book ❤️ | Andriy Burkov | Book
Python for Data Analysis, 3rd Edition | Wes McKinney | Book

Tips

  • Gain end-to-end experience in ML with Le Wagon's bootcamp. For a cheaper alternative, look at Udacity's Data Analyst Nanodegree combined with Coursera's Machine Learning Course.
  • An Intro to Statistical Learning is an accessible overview of the field and is widely used in many undergraduate and graduate courses all over the world.
  • Complement your learning with the very well written and concise Hundred Page Machine Learning Book.
  • The Python for Data Analysis book was recently updated to its 3rd edition and is now fully available online. It is a practical, modern introduction to manipulating, processing, cleaning, and visualizing datasets in Python. It is a great way to get better at pandas, NumPy, and matplotlib, which are fundamental to any data scientist.
  • Visit the machine learning Resource Hub section for a list of complementary learning material.

Core Programming

The following resources will help you become a good programmer, understand some core software engineering principles, and give you the tools required to pass the technical tests that most employers send you during recruitment. We suggest reviewing this material early in your education because being a good programmer pays off very quickly. You don’t need to go through all of this material in a linear way. Review it on an as-needed basis, but make sure you come back to it regularly.

Resources | Source | Format
Python with Corey Schafer ❤️ | YouTube | Videos
Intro to Data Structures and Algorithms | Udacity | Videos & coding exercises
Missing Semester | MIT | Videos & coding exercises
Coding Exercises | HackerRank | Coding exercises
Fluent Python ❤️ | Luciano Ramalho | Book

Tips

  • If you're struggling with any programming concept in Python, make sure you search for videos of Corey Schafer explaining the subject. His videos are always well-built, clear and enlightening.
  • Familiarize yourself with common data structures and algorithms in Python with Udacity's course, which features practice exercises.
  • Complete HackerRank exercises to refine your Python skills with interview-style questions.
  • Once you’re comfortable with the basics of Python (that is to say, after at least 1 year of coding experience), you can slowly start reading Fluent Python for a deep dive into Python's core language features and libraries. Note 1: Some of the later chapters are very advanced and optional. Note 2: Keep an eye out for the updated edition of the book, which is coming soon.
  • If you still aren't comfortable with the shell, version control (Git) and debugging, watch the lectures from MIT's Missing Semester and do the exercises.
  • For a short and very well explained primer on the basics of Git, we highly recommend Made with ML's Git guide.
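
To give a feel for the interview-style questions mentioned in the tips above, here is a small sketch of the classic "two sum" exercise. It is our own illustration, not taken from HackerRank or the Udacity course, and the function name `two_sum` is hypothetical. The point it shows is that picking the right data structure (a dictionary) turns a nested-loop search into a single pass.

```python
def two_sum(numbers, target):
    """Return indices (i, j) with i < j and numbers[i] + numbers[j] == target.

    Returns None when no such pair exists.
    """
    seen = {}  # maps a value to the first index where it appeared
    for j, value in enumerate(numbers):
        complement = target - value
        if complement in seen:
            # We already passed the number that completes the pair.
            return seen[complement], j
        if value not in seen:
            seen[value] = j
    return None


if __name__ == "__main__":
    print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
```

A dictionary lookup takes constant time on average, so the whole search is linear in the length of the list, whereas comparing every pair would be quadratic.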

Core Math

Machine learning is a mix of Statistics, Linear Algebra, Probability, and Calculus. Some say that it’s not strictly necessary to go deep into mathematical theory and that it’s better to focus on coding. While there is some truth to this, if your end goal is to read, write, and implement papers, and to be a true expert in data science, then do not neglect math. The following list of resources will help you get started if you’re a beginner, or go deep down the math rabbit hole if you’re advanced. Remember to practice solving exercises if you want what you're learning to stick.

Topics covered: Linear algebra, Statistics, Vector calculus, Probability, and more

Resources | Source | Format
Essence of Linear Algebra ❤️ | YouTube | Videos
Seeing Theory - Visual intro to Prob & Stats ❤️ | Seeing Theory | Interactive Website
StatQuest - Machine Learning ❤️ | YouTube | Videos
Linear Algebra | Khan Academy | Videos & exercises
Linear Algebra 18.06 with Gilbert Strang ❤️ | MIT | Videos & homework
Calculus 1 & 2 | Khan Academy | Videos & Math exercises
The Learning Machine ❤️ | The Learning Machine | Online Book
Intro to Probability, Stats & Random Processes | Probability Course | Online Book

Tips

  • Build an intuition for linear algebra with this fantastic resource: Essence of Linear Algebra by 3Blue1Brown.
  • Seeing Theory is an incredible interactive resource for an introduction to probability and statistics. Start here if you're new to the concepts.
  • Learn all things related to probability, statistics and machine learning with StatQuest. Josh Starmer has a gift for breaking down complex ideas into some of the simplest and best explanations on the Web. He also recently published a book which we encourage you to check out.
  • Delve deep into linear algebra with Prof. Gilbert Strang's amazing lectures, which have been viewed by millions. Complement them with the exercises in his book (which includes solutions). For a less in-depth alternative, refer to Khan Academy.
  • The Learning Machine has great interactive visuals to help build your intuition for a lot of math concepts.
  • Intro to Prob/Stats has everything you'll need on the subject, coupled with exercises and their solutions, all in a very clear interface.
  • Don't forget to actually do the exercises and work on assignments. This is the only way you'll become good at math.
  • Visit the math Resource Hub section for a list of complementary learning material.
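
One good way to make the linear algebra intuition stick is to code a tiny example yourself. The sketch below is our own illustration (the helper names `mat_vec` and `rotation` are hypothetical) of the idea that a matrix describes a linear transformation, and that its columns tell you where the basis vectors land. It uses plain Python lists so nothing needs to be installed; in practice you would use NumPy.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rotation(theta):
    """Return the 2x2 matrix rotating the plane counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


if __name__ == "__main__":
    quarter_turn = rotation(math.pi / 2)
    # The image of each basis vector is the matching column of the matrix.
    print(mat_vec(quarter_turn, [1, 0]))  # approximately [0, 1]
    print(mat_vec(quarter_turn, [0, 1]))  # approximately [-1, 0]
```

The results are only approximately exact because of floating-point rounding, which is itself a useful thing to notice early.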

Deep Learning

After completing the courses in Core Data Science, and with more solid foundations in programming and machine learning theory, you can move on to deep learning if that’s an area that interests you. It can be tempting to jump straight to this section when you're starting because there are really cool applications to work on. Our view is that if this is what really motivates you, you should try it out and see what you can get out of it. Remember to come back to the sections above, however, or you will have gaping holes in your fundamentals which will come back to haunt you down the line.

Topics covered: Loss functions and optimization, Convolutional neural networks, Recurrent neural networks, Deep learning hardware and software, Deep learning for tabular data, NLP, Computer vision, Generative models

Resources | Source | Format
Practical Deep Learning for Coders - Part 1 (v5) ❤️ | U of San Francisco | Videos & coding exercises/projects
Fastai Book | Jeremy Howard, Sylvain Gugger | Book
Deep Learning Specialization ❤️ | Coursera - Andrew Ng | Videos & coding exercises/projects
The spelled-out intro to neural networks and backpropagation: building micrograd | Andrej Karpathy | Video
The Illustrated Transformer | Jay Alammar | Blogpost

Tips

  • Learn how to create state of the art models using the Fastai Library with Part 1 of their course. We suggest taking both the Fastai and Deep Learning Specialization courses together since one is more focused on coding while the other is more focused on the theory and math behind it. While you're at it, follow Fastai's course with their book.
  • Andrej Karpathy's video on micrograd is a great introduction to neural networks and backpropagation.
  • The transformer architecture is widely used these days. To get a solid grasp of how it works, be sure to read Jay Alammar's blogpost on the subject.
  • Visit the deep learning Resource Hub section for a list of complementary learning material.
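
To make the idea of backpropagation mentioned in the tips above more concrete, here is a heavily simplified sketch of a scalar autograd engine. It is written in the spirit of micrograd but is not Karpathy's actual code: the `Value` class below is our own reduced version that supports only addition and multiplication. Each value remembers which values produced it, and `backward()` applies the chain rule from the output back to the inputs.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        """Fill in .grad for every value that contributed to this one."""
        order, visited = [], set()

        def visit(node):
            if node not in visited:
                visited.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # derivative of the output with respect to itself
        for node in reversed(order):
            node._backward()


if __name__ == "__main__":
    x, w, b = Value(2.0), Value(3.0), Value(1.0)
    y = x * w + b
    y.backward()
    print(y.data, x.grad, w.grad, b.grad)  # 7.0 3.0 2.0 1.0
```

Gradients are accumulated with `+=` rather than assigned, so a value that is used more than once (for example in `x * x`) receives the sum of its contributions.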

MLOps & Data Engineering Primer

Working on machine learning for a company is usually a lot more involved than just running models inside a Jupyter notebook.

The resources below will familiarize you with the whole life cycle of a machine learning system, and with the data engineering required to get your data ready. You'll learn about formulating a problem, ingesting, labeling & cleaning data, building reusable pipelines for each step, deploying models online and monitoring them, and much more. You'll gain preliminary notions about what it takes to put a model in production. As the field matures, knowing about these steps is no longer optional for anyone doing machine learning, unless you're only doing R&D.

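Topics covered: Problem formulation, Data ingestion, labeling & cleaning, Reusable pipelines, Model deployment, Model monitoring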

Resources | Source | Format
Full Stack Deep Learning ❤️ | UC Berkeley | Videos & coding exercises/projects
Made With ML | Made With ML | MLOps Course
Machine Learning Engineering for Production (MLOps) Specialization | Deeplearning.ai | Videos & coding exercises/projects
Designing Machine Learning Systems ❤️ | Chip Huyen | Book
SQL | Mode Analytics | Coding environment exercises

Tips

  • Learn about experiment management scripts, unit tests, labelling, linting scripts, continuous integration/continuous delivery (CI/CD) with CircleCI, model versioning, Docker and web deployment with the Full Stack Deep Learning course. The labs walk you through how to build a fully fledged handwritten text recognizer using PyTorch.
  • Design an ML production system end-to-end with Deeplearning.ai's Machine Learning Engineering for Production (MLOps) specialization.
  • Read Goku Mohandas' Made With ML, a course on deploying machine learning models in an automated, reproducible, and auditable manner.
  • Complement this course with Chip Huyen's amazing Designing Machine Learning Systems book, which will teach you about the whole life cycle of a machine learning project.
  • Go back to some of the models you have built for your projects and deploy them!
  • Visit the data engineering Resource Hub section for a list of complementary learning material.
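
As a taste of what "reusable pipeline steps" and "unit tests" look like in practice, here is a minimal sketch. It is not taken from any of the courses above: the function `clean_records`, the field names `text` and `label`, and the cleaning rules are all hypothetical. The idea is that each pipeline step is a small function with a test next to it, so it can be checked automatically before a model is retrained or deployed.

```python
def clean_records(records):
    """Pipeline step: strip whitespace and drop rows that cannot be used.

    A row is dropped when its text is empty or its label is missing.
    The input list is left unmodified.
    """
    cleaned = []
    for row in records:
        text = (row.get("text") or "").strip()
        label = row.get("label")
        if not text or label is None:
            continue
        cleaned.append({"text": text, "label": label})
    return cleaned


def test_clean_records():
    raw = [
        {"text": "  hello  ", "label": 1},
        {"text": "no label", "label": None},
        {"text": "   ", "label": 0},
    ]
    assert clean_records(raw) == [{"text": "hello", "label": 1}]
    # The step must not mutate its input.
    assert raw[0]["text"] == "  hello  "


if __name__ == "__main__":
    test_clean_records()
    print("all checks passed")
```

Because the test uses plain `assert` statements, it can be run directly as a script or picked up by a test runner as part of continuous integration.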

Extras

In addition to all of the above, we suggest doing the following:

  • Subscribe to these newsletters for a constant flow of curated blogposts and data science news: Andriy Burkov's newsletter, Deeplearning.ai's The Batch, and DataScienceWeekly.
  • Subscribe to Yannic Kilcher's Youtube channel for news related to the latest trends in the industry. He also explains and reviews newly published papers.
  • Regularly explore Meetup.com to see if there are meetups on topics you are interested in. We have no specific recommendations since this is location-dependent, but since Covid some meetup groups are 100% online, so you potentially have access to meetups around the world.
  • Attend conferences. One we highly suggest going to is PyCon, even if that means spending a bit of money to attend and travelling to a host city. In our experience, the value you'll get from it is worth it. It is a way for you to be inspired by all that is happening in the world of Python, engineering, and machine learning. Alternatively, you can get a cheaper online-only ticket.
  • Participate in a Kaggle competition. Find a competition you're interested in joining - look for ones that are currently open/soon to open. Find teammates to work with, and start coding! Visit the Code section of the competition to read other competitors' notebooks. There are always people sharing interesting work.
  • Participate in Hackathons. Keep an eye out for these events happening in your city, or look on meetup.com to find them.
  • Actively look for and join communities on Reddit, Discord, and Slack. For instance, subscribe to Reddit's /r/learnmachinelearning and /r/machinelearning subreddits. Join Discord servers to find study groups so that you're not learning alone.
  • Put yourself out there and start writing! Create a personal blog and write articles about what you learned. Your target audience should be people in the same situation you were in 6 months / 1 year ago.
  • Find a meetup group and ask if you can present a subject you've been working on. Do the same within a bunch of different study groups. This will help your oral presentation skills.
  • Keep an eye out for recruiting events in your area. Companies are always participating in these events.
  • Update your CV and LinkedIn profile to match your studies and personal projects.
  • Visit the career Resource Hub section for a list of complementary material.

Final Notes

As a rule of thumb, you can trust the quality of any material you come across from the following sources:

  • All new and old courses from Deeplearning.ai.
  • All computer science / machine learning courses at Stanford Online.
  • All courses from Fastai and Jeremy Howard specifically.
  • Andrew Ng for machine learning.
  • Justin Johnson for computer vision.
  • Chris Manning for NLP.
  • All of Andriy Burkov's content.
  • StatQuest for statistics/ML explanations (check out his new book!).
  • 3Blue1Brown for math.

If you want to help improve the curriculum and think you have a good resource to share, read about how you can contribute here.