Title: Software Engineering for Data Scientists (MEAP v3)
Author: Andrew Treadway
Publisher: Manning Publications
Year: 2023
Pages: 319
Language: English
Format: pdf, epub
Size: 16.8 MB
These easy-to-learn, easy-to-apply software engineering techniques will radically improve collaboration, scaling, and deployment in your data science projects.
In Software Engineering for Data Scientists you’ll learn to improve performance and efficiency by:
• Using source control
• Handling exceptions and errors in your code
• Improving the design of your tools and applications
• Scaling code to handle large data efficiently
• Testing model and data processing code before deployment
• Scheduling a model to run automatically
• Packaging Python code into reusable libraries
• Generating automated reports for monitoring a model in production
Software Engineering for Data Scientists presents important software engineering principles that will radically improve the performance and efficiency of data science projects. Author and Meta data scientist Andrew Treadway has spent over a decade guiding models and pipelines to production. This practical handbook is full of his sage advice, which will change the way you structure your code, monitor model performance, and work with software engineering teams.
The book is split into four parts:
• Part 1 – Getting started: This part covers source control, exception handling, better structuring of your code, object-oriented programming (OOP) for data science, and monitoring the progress of your code (such as model training or data extraction).
• Part 2 – Scaling: Part 2 covers scaling your code effectively. For example, how do you deal with larger datasets? We’ll cover both the computational and memory components of scaling.
• Part 3 – Scheduling, testing, and deployment into production: Part 3 details how to rigorously test your code, protect your credentials (for example, when connecting to a database to query data), schedule models and data pipelines to run automatically, and package data analytics code into a portable library that can be shared with and downloaded by others.
• Part 4 – Monitoring your data processing and modeling code: Lastly, Part 4 teaches you how to effectively monitor your code in production. This is especially relevant when you deploy a machine learning model to make predictions on a recurring or automated basis. We’ll cover logging, automated reporting, and how to build dashboards with Python.
In addition to the topics covered directly in the book, you’ll also get hands-on experience with the code examples. They are meant to be runnable on your own machine with downloadable datasets, and you’ll find the corresponding files in the GitHub repository. Besides the examples laid out in the book, you’ll also find “Practice on your own” sections at the end of most chapters so that you can delve further into the material in a practical way.
About the technology
Many basic software engineering skills apply directly to data science! As a data scientist, learning the right software engineering techniques can save you a world of time and frustration. Source control simplifies sharing, tracking, and backing up code. Testing helps reduce future errors in your models or pipelines. Exception handling automatically responds to unexpected events as they crop up. Using established engineering conventions makes it easy to collaborate with software developers. This book teaches you to handle these situations and more in your data science projects.