Title: Building Real-Time Analytics Applications: Operational Workflows with Apache Druid
Author: Darin Briskman
Publisher: O’Reilly Media, Inc.
Year: 2023-02-03
Language: English
Format: pdf, epub, mobi
Size: 10.2 MB
This report will introduce you to the real-time analytics applications that organizations are building to power new operational workflows. You’ll learn about the purposes of these applications, the value they provide, and the technologies needed to create them. Once you’ve completed this report, you’ll know how real-time analytics applications can help your organization, and you’ll know what you’ll need to create a solution that works for you.
What Is a Real-Time Analytics Application?
Every organization needs insight to succeed and excel. While insights can come from many wellsprings, the foundation for insights is data: both internal data from operational systems and external data from partners, vendors, and public sources. For decades, the traditional approach to analytics has focused on data warehousing and business intelligence, where experts query historical data “once in a while” for executive dashboards and reports.
The newer approach is the real-time analytics application. It is formed at the intersection of the analytics and application paradigms, with technical requirements that combine the scale of data warehouses with the speed of transactional databases.
Real-time analytics applications generally share three common technical characteristics:
1) Subsecond performance at scale: This allows humans and, sometimes, machines to quickly and easily see and comprehend complex information and to hold interactive conversations with data, drilling down to deep detail and panning outward to global views. Many real-time analytics solutions support interactive conversations with large data sets, maintaining subsecond performance even with dozens of petabytes of data.
2) High concurrency: This enables large numbers of users to generate multiple queries as they interact with the data. Architectures that support a few dozen concurrent queries aren’t sufficient when thousands of concurrent queries must be executed simultaneously. Of course, this must be done affordably, without requiring large installations of expensive infrastructure.
3) Real-time and historical data: Real-time data is usually delivered in streams, using tools like Apache Kafka, Confluent Cloud, or Amazon Kinesis. Data from past streams and from other sources, such as transactional systems, is delivered as a batch, through extract, load, and transform (ELT) processes. The combination of data types allows both real-time understanding and meaningful comparisons to the past.
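As a rough illustration of the third characteristic, the sketch below compares a live event count against a baseline computed from batch-loaded history. It is a minimal in-memory example, not taken from the report: the function name `ratio_to_baseline`, the `minute` field, and the sample numbers are all hypothetical, and a real application would read from a stream and a database rather than Python lists.

```python
from statistics import mean


def ratio_to_baseline(historical_counts, live_events, current_minute):
    """Return the live event count for one minute divided by the historical mean.

    historical_counts: counts for the same minute on past days (batch-loaded).
    live_events: events that have arrived on the stream so far.
    """
    baseline = mean(historical_counts)
    live_count = sum(1 for e in live_events if e["minute"] == current_minute)
    return live_count / baseline


# Hypothetical data: three past days of history, plus freshly streamed events.
history = [100, 120, 80]
stream = [{"minute": "12:00"}] * 150 + [{"minute": "11:59"}] * 10

ratio = ratio_to_baseline(history, stream, "12:00")
print(ratio)
```

Here the live count for 12:00 is 150 against a historical mean of 100, so the ratio is 1.5. Neither the stream nor the history would give that signal alone.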
Like nearly any application, real-time analytics applications require a persistent service where data can be reliably stored and retrieved. There are hundreds of databases available, with both open source and commercial options, and nearly any database can, in theory, be used to support a real-time analytics application. However, an application that can provide the needed performance, scale, and reliability for real-time analytics will require a database with some specific capabilities.
Speed is critical for real-time analytics applications: delivering data in milliseconds isn’t useful unless data queries also execute in milliseconds. A real-time analytics database must be able to both ingest incoming events and process queries with subsecond performance, even for large data sets of hundreds of terabytes or petabytes.
How quickly can data be added to the database? All databases support moving sets of records from files into the database, usually known as batch ingestion. For many analytics databases, such as Snowflake and Amazon Redshift, this is the only method of data ingestion. Only a few databases designed for real-time data analytics, such as Apache Druid, can perform both batch ingestion and stream ingestion, with each event becoming immediately available for queries as soon as it arrives.
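The difference between the two ingestion styles can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: the class `ToyEventStore` and its methods are hypothetical and are not Apache Druid's API. The sketch shows a batch load of existing records followed by a single streamed event that is visible to the very next query.

```python
class ToyEventStore:
    """Hypothetical in-memory store; a stand-in for a real analytics database."""

    def __init__(self):
        self._rows = []

    def ingest_batch(self, records):
        """Batch ingestion: load a whole set of records at once, as from a file."""
        self._rows.extend(records)

    def ingest_event(self, event):
        """Stream ingestion: append one event; it is queryable once this returns."""
        self._rows.append(event)

    def count(self, **filters):
        """Count rows whose fields equal every given filter value."""
        return sum(
            1
            for row in self._rows
            if all(row.get(key) == value for key, value in filters.items())
        )


store = ToyEventStore()
store.ingest_batch([{"page": "home"}, {"page": "cart"}])  # historical records
before = store.count(page="cart")

store.ingest_event({"page": "cart"})  # one event arriving from a stream
after = store.count(page="cart")

print(before, after)
```

The count for `page="cart"` goes from 1 to 2 without waiting for another batch load, which is the behavior the paragraph above attributes to stream ingestion.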