Title: Ultimate Big Data Analytics with Apache Hadoop: Master Big Data Analytics with Apache Hadoop Using Apache Spark, Hive, and Python
Author: Simhadri Govindappa
Publisher: Orange Education Pvt Ltd, AVA
Year: 2024
Pages: 787
Language: English
Format: pdf, epub
Size: 25.3 MB
Master the Hadoop Ecosystem and Build Scalable Analytics Systems.

This book is a comprehensive guide to Big Data analytics with Hadoop, offering both foundational knowledge and practical expertise. The Hadoop ecosystem, developed under the Apache Software Foundation (ASF), is a cornerstone of modern data processing infrastructure. Starting with Hadoop's historical roots and evolution, the book examines its pivotal role in managing vast datasets efficiently. It also covers the broader impact of the Apache Software Foundation, highlighting its collaborative ethos and the innovation it drives across the technology sector.

The book begins by laying a strong foundation with an overview of data lakes, data warehouses, and related concepts. It then covers core Hadoop components such as HDFS, YARN, MapReduce, and Apache Tez, blending theory with practical exercises. You will gain hands-on experience with query engines like Apache Hive and Apache Spark, as well as file and table formats such as ORC, Parquet, Avro, Iceberg, Hudi, and Delta. Detailed instructions on installing and configuring clusters with Docker are included, along with Big Data visualization and statistical analysis using Python.

This book is tailored for data engineers, analysts, software developers, data scientists, IT professionals, and engineering students seeking to enhance their skills in big data analytics with Hadoop. Prerequisites include a basic understanding of big data concepts, programming knowledge in Java, Python, or SQL, and basic Linux command line skills.