This page will collect useful resources for learning data science, machine learning, and related topics. Many of the resources listed here are far beyond the scope of CS 307, so you are not responsible for the material presented. However, if you are interested in learning more about these topics, these resources will help build the foundation of your virtual “data science bookshelf” that you can refer to in the future.
With each resource, we will try to explain why that particular resource could be useful.
Python
- Think Python
- Subtitled “How to Think Like a Computer Scientist,” this book is simultaneously an introduction to Python and to the larger field of computer science. The Python presented is not directly related to data science; instead, it focuses on the fundamentals of the language and of programming itself. These fundamental skills may not be immediately useful, but they will pay long-term dividends. If you plan to use Python for data science, it helps to know Python beyond its common data science usages.
- Python for Data Analysis
- Written by the creator of pandas, this book could alternatively be called “The Pandas Book,” as it is a deep dive into using Python for data analysis with a heavy emphasis on pandas.
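To give a flavor of the pandas-centric style of analysis the book teaches, here is a minimal sketch of the split-apply-combine pattern using `DataFrame.groupby`. The table of scores and the column names are hypothetical, made up purely for illustration.

```python
import pandas as pd

# Hypothetical data for illustration only: one row per student.
scores = pd.DataFrame(
    {
        "student": ["A", "B", "C", "D", "E", "F"],
        "section": ["S1", "S1", "S2", "S2", "S2", "S1"],
        "score": [88, 92, 75, 81, 90, 70],
    }
)

# Split-apply-combine: group rows by section, then summarize each group's scores.
summary = scores.groupby("section")["score"].agg(["count", "mean", "max"])
print(summary)
```

The result is itself a DataFrame indexed by section, so it can be filtered, sorted, or joined like any other table.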
Machine Learning
- An Introduction to Statistical Learning
- A classic introductory machine learning textbook that was originally subtitled “with Applications in R” but now also has a parallel version written in Python. Because it was written by statisticians, the book leans statistical and sometimes includes or emphasizes topics that are less common in the machine learning community.
Scikit-Learn
- scikit-learn Website and Documentation
- If you plan to apply machine learning using Python, scikit-learn is hard to avoid. Because it is a mature and widely used package, its documentation, including the API Reference and User Guide, is an excellent resource. Even if you’ve already been using scikit-learn, be sure to read the Getting Started page, as it provides a succinct but comprehensive overview of the package.
- probabl
- scikit-lego
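Much of scikit-learn is organized around one consistent estimator interface: construct an object, call `fit` on training data, then call `predict` or `score` on new data. The sketch below illustrates that pattern; the choice of the built-in iris dataset, a `StandardScaler` plus `LogisticRegression` pipeline, and a 75/25 split are assumptions for illustration, not an example taken from the documentation itself.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset: X is the feature matrix, y holds the class labels.
X, y = load_iris(return_X_y=True)

# Hold out 25% of rows for evaluation; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Chain preprocessing and a classifier; the pipeline is itself an estimator.
model = make_pipeline(StandardScaler(), LogisticRegression())

# Fit on the training data only...
model.fit(X_train, y_train)

# ...then predict labels and compute accuracy on the held-out data.
y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Because the pipeline exposes the same `fit`/`predict` methods as a single model, it can be swapped into cross-validation or hyperparameter search tools without changing the surrounding code.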
Probability
- Introduction to Probability for Data Science
- A straightforward and well-presented introduction to probability and statistics.
Miscellaneous
- The Markdown Guide
- The web is powered by HTML, but you do not want to write HTML by hand. Instead, Markdown is increasingly becoming the lingua franca for writing on the web. This guide will help you learn the basics of Markdown.