This page is a categorised reading list of useful books for Data Scientists. I’ve read these and recommend them for the challenges a Data Science team will face when operating in dynamic Guerrilla Analytics project environments. If you would like to recommend an addition, please get in touch.
- The basics. Not basic as in ‘easy’ but rather the foundational techniques you need to get up and running as a Guerrilla Analyst.
- Intermediate. Now that you have the basics in place, these books will help you with gentle introductions to machine learning, visualization and general project management.
- Advanced. Sometimes really tough and interesting projects come along. Good times! Here are some advanced techniques to help you deliver.
- Miscellaneous. A variety of other indirectly related books that will help and inspire your Guerrilla Analytics.
It’s a fact of Guerrilla Analytics life: a significant amount of project chaos will disappear if you have some form of version control.
You don’t need to become an enterprise-class DevOps practitioner. You do need to know about versioning, tagging, reverting and other common version control activities. This is the go-to Git reference. It covers everything you need to know about Git and is written from a Git perspective.
To be a true Guerrilla Analyst, you need to be comfortable at the command line. It’s the only way to quickly peek at, summarise, clean and join up the wide variety of data files that you are likely to encounter. It’s also the best way to automate your work for efficiency and reproducibility.
This book will teach you all the tools and tricks you need to get around the most awkward and broken data files that come your way. You’ll learn about chunking files, patching them together, sorting, editing and modifying in ways you probably thought possible only in a ‘real’ analytics environment.
A great introductory book written in a fun and entertaining style and based around analytics done in spreadsheets. Spreadsheets mean trouble for the Guerrilla Analyst but from a beginner’s perspective they are a familiar way to dip a toe in the water.
Sometimes a spreadsheet is the quickest way to get a feel for your data and this book might open your eyes to how much is possible in ubiquitous desktop software.
If you are going to work with data then you really need to understand the many ways it can be flawed. This book is a fun and comprehensive treatment of the flaws to expect and how to detect them in a huge variety of data types. I especially liked the chapter ‘Data Quality Demystified’ which was the foundation for the categorisation of data tests in Guerrilla Analytics: A Practical Approach to Working with Data. You may not have time to implement everything in this book but it never hurts to be aware of problems lurking in your data and what may be causing those strange and unexpected numbers in your report.
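To give a flavour of what detecting flaws can look like in practice, here is a minimal sketch using only the Python standard library. The sample data, the `profile_rows` function and the three checks it runs are my own hypothetical illustrations. They are not taken from the book and are not the categorisation of data tests mentioned above.

```python
import csv
import io

# Hypothetical sample data containing a missing amount, a repeated id
# and a non-numeric amount.
SAMPLE = """id,customer,amount
1,Alice,10.50
2,Bob,
2,Bob,7.25
3,Carol,abc
"""


def profile_rows(rows, key, numeric_field):
    """Count three simple problems: missing values, repeated keys, non-numeric values."""
    seen = set()
    report = {"rows": 0, "missing": 0, "duplicate_keys": 0, "non_numeric": 0}
    for row in rows:
        report["rows"] += 1
        # A key already seen on an earlier row counts as a duplicate.
        if row[key] in seen:
            report["duplicate_keys"] += 1
        seen.add(row[key])
        value = row[numeric_field].strip()
        if value == "":
            report["missing"] += 1
            continue
        try:
            float(value)
        except ValueError:
            report["non_numeric"] += 1
    return report


report = profile_rows(csv.DictReader(io.StringIO(SAMPLE)), key="id", numeric_field="amount")
print(report)
```

Even a crude profile like this, run as soon as data arrives, can flag the kind of problem that later turns up as a strange number in a report.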
Now that you have your Git reference book, you could probably use this shorter pocket guide for most of your day-to-day work.
This book is a really well written and structured introduction to the main machine learning techniques. Every technique is supported by real coded examples on real datasets.
Read this book to whet your appetite for all things machine learning.
Sometimes SQL just isn’t enough. SQL is great for heavy-lifting data preparation, but certain data transformations are difficult in plain old SQL and its ability to summarise data is limited. This book is all about pandas, a Python library for data manipulation, plotting and basic data analysis.
The book is a comprehensive guide to the pandas library and will get you through the most awkward data manipulations you are likely to encounter.
Intermediate knowledge of Python is required.
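As a small illustration of a transformation that is awkward in plain SQL, consider pivoting long data into a wide table. In SQL this typically needs one CASE expression per output column or a vendor-specific PIVOT clause. The sketch below assumes pandas is installed and uses made-up sales data. The column names are hypothetical and the example is mine rather than the book’s.

```python
import pandas as pd

# Hypothetical long-format sales data, as it might come out of a SQL query.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "amount": [10, 20, 5, 7, 9],
})

# Pivot to one row per region and one column per quarter, summing the amounts
# that fall into the same region and quarter.
wide = sales.pivot_table(index="region", columns="quarter", values="amount", aggfunc="sum")
print(wide)
```

The quarters become columns without having to be listed by hand, which is the part that tends to be painful in SQL.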
Some shameless self-promotion by yours truly. Now that you have the basics and some of the intermediates down, you’re ready for some Guerrilla Analytics.
Learn how to organise your projects (data, code, deliverables, testing, processes and team) so you are robust to all the disruptions of high-pressure Data Science projects.
This book is a fun, well written and comprehensive introduction to a wide range of common machine learning algorithms. The author takes you through building up each algorithm step by step and sets the context for why the algorithm does what it does. Intermediate knowledge of Python and programming is required to get the most benefit from this book. For a bonus challenge, work through the exercises using your pandas knowledge from Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython!
So-called unstructured data is where a significant amount of insight lies. But how do you get at it?
This book is a tour de force in natural language processing using the NLTK Python library. Not for the faint-hearted but very well written and comprehensive. Read this book and you have a powerful weapon in your Guerrilla arsenal.
Even the best data science will fail if its benefits cannot be communicated and understood. Regardless of whether your job title is consultant or not, we are all consultants to the extent that we wish to influence others and have our opinions and ideas accepted. Peter Block’s book gives an amazing guide to consulting including how to work with peers, difficult clients and others. The book emphasises the idea that success is based on authentic behaviours that establish trust and great working relationships. This is the consultant’s bible but everybody can learn something from it.