Dataproc Cookbook: Running Spark and Hadoop Workloads in Google Cloud
- Added by: literator
- Date: 4-06-2025, 16:40

Authors: Narasimha Sadineni, Anuyogam Venkataraman
Publisher: O’Reilly Media, Inc.
Year: 2025
Pages: 500
Language: English
Format: epub
Size: 19.1 MB
Want to build Big Data solutions in Google Cloud? Dataproc Cookbook is your hands-on guide to mastering Dataproc and the essential GCP fundamentals (such as networking, security, monitoring, and cost optimization) that apply across Google Cloud services. Learn practical skills that not only fast-track your Dataproc expertise, but also help you succeed with a wide range of GCP technologies.
Written by data experts Narasimha Sadineni and Anu Venkataraman, this cookbook tackles real-world use cases like serverless Spark jobs, Kubernetes-native deployments, and cost-optimized data lake workflows. You'll learn how to create ephemeral and persistent Dataproc clusters, run secure data science workloads, implement monitoring solutions, and plan effective migration and optimization strategies.
The evolution of distributed systems for data processing has progressed from the constraints of single VMs, through the power of specialized Massively Parallel Processing (MPP) systems, to the revolutionary breakthrough of Hadoop utilizing clusters of commodity hardware—a shift that fundamentally redefined the scale of data we could handle. Technologies like Apache Hadoop (MapReduce, HDFS, Hive) allowed us to tackle data problems at a scale previously unimaginable, and to do so within practical time frames. Spark, with its in-memory processing capabilities, pushed the boundaries even further, enabling large-scale data operations in mere seconds.
Google Cloud Dataproc sits right at the heart of this exciting intersection. It provides a managed service designed to let you run your familiar Hadoop and Spark workloads (and other tools like Flink and Presto) seamlessly on GCP’s robust infrastructure. This means you can migrate existing applications with minimal-to-no code changes, shedding the burden of infrastructure management and focusing instead on extracting value from your data. Dataproc makes leveraging the power and flexibility of the cloud for big data workloads incredibly straightforward—and that’s something to be genuinely excited about! Until now, practical, consolidated resources beyond official documentation have been scarce, and this book aims to be your definitive guide. Packed with practical, tested recipes, it’s your go-to guide for exploring the real-world power of Dataproc. While Dataproc is our primary focus, the underlying Google Cloud fundamentals explored here—including resource organization, IAM, logging, monitoring, and security—provide valuable, transferable knowledge applicable across the GCP ecosystem. Let’s dive into harnessing the capabilities of Google Cloud Dataproc for your data.
- Create Dataproc clusters on Compute Engine and Kubernetes Engine
- Run data science workloads on Dataproc
- Execute Spark jobs on Dataproc Serverless
- Optimize Dataproc clusters to be cost effective and performant
- Monitor Spark jobs in various ways
- Orchestrate various workloads and activities
- Use different methods for migrating data and workloads from existing Hadoop clusters to Dataproc
Who Should Read This Book:
This is a handy cookbook on Dataproc that will help you accelerate your Hadoop migration and Dataproc learning journey and optimize your workloads. It is designed for data engineers, data scientists, cloud architects, and more:
Data engineers
Professionals responsible for designing, building, and maintaining data processing pipelines using Dataproc. This book will help you learn about the various features, best practices, and optimization techniques for managing big data workflows.
Data scientists
Researchers and analysts who work with large datasets and need to perform advanced analytics and machine learning tasks. This book will help you understand how to leverage Dataproc’s capabilities to process and analyze data effectively.
Cloud architects
Professionals responsible for designing and implementing data processing solutions on Google Cloud Platform. This book will help you understand how to integrate Dataproc with other services and architectures to create scalable and efficient data processing systems.
Data analysts
Individuals who work with data to derive insights and make informed business decisions. This book will help you learn how to leverage Dataproc’s capabilities to process and transform data for analysis and reporting.
Students and researchers
People studying data engineering, data science, or related fields who want to gain a comprehensive understanding of data processing technologies and how to use Dataproc effectively.
IT managers and decision makers
Executives and managers responsible for making decisions regarding data infrastructure and processing solutions. This book will help you understand the benefits, costs, and use cases of adopting Dataproc for your organization.