Amazon Kinesis

What is Amazon Kinesis?

The Simple Explanation

Amazon Kinesis is an Amazon Web Services (AWS) offering designed to process large-scale data streams from many sources in real time. Like Apache Kafka, it can be thought of as a kind of message broker: it sits between data-generating sources and the applications or services that consume their data.

One of the key benefits of Kinesis (and Kafka) is that it lets you process and analyze data almost as soon as it arrives, rather than waiting for a complete dataset to land, processing it, and only then delivering it for analysis. Insights can be derived in minutes rather than hours, days, or weeks. Because Kinesis is delivered as a managed platform, you get this without weeks of complicated setup and without managing any infrastructure.

What Can Kinesis Do? 

In short, Kinesis is designed to ingest, process, and analyze streams of data in real time. Within this core capability, Kinesis offers four key solutions:

Amazon Kinesis Data Streams  

Amazon Kinesis Data Streams (KDS) is a massively scalable and resilient real-time data streaming service. It is used when a large volume of data streams in from many, potentially unconventional, data producers. It can ingest gigabytes of data per second from sources including (but not limited to) website clicks, database event streams, financial transactions, gaming micro-transactions, IoT devices, and location-tracking events.

In other words, if the data you stream needs to go directly to a service or application that acts on it, or needs to drive analysis immediately upon receipt, KDS is the option for you. Collected data is available almost immediately (within 70 milliseconds of being collected) for real-time analytics, enabling real-time dashboards, real-time anomaly detection, dynamic pricing, and more.
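As a rough illustration of what a data producer sends to KDS, the sketch below builds the parameters for a single record write. The stream name clickstream-demo, the event fields, and the helper names are hypothetical. The send function assumes the boto3 library is installed and AWS credentials are configured, and it is deliberately not called here.

```python
import json


def build_put_record_params(stream_name, event, partition_key):
    """Build keyword arguments for one Kinesis Data Streams record write.

    Data must be bytes; the partition key decides which shard receives it.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        "PartitionKey": partition_key,
    }


def send(params):
    """Send the record with boto3 (assumes boto3 and AWS credentials exist)."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("kinesis")
    return client.put_record(**params)


# Hypothetical click event, keyed by user ID so one user's events stay together.
params = build_put_record_params(
    "clickstream-demo", {"user_id": "u-42", "page": "/pricing"}, "u-42"
)
```

Keeping the parameter builder separate from the network call makes it possible to inspect or unit-test what would be sent without touching a real stream.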

Amazon Kinesis Video Streams 

Amazon Kinesis Video Streams is also a data streaming service, but one tailored to video. It allows you to securely stream video from any number of devices and present the data for playback, machine learning, analytics, or other processing. It can ingest data from nearly any video device you can think of: security cameras, smartphones, drones, RADARs, LIDARs, satellites, and more. It can help you build applications with real-time computer vision capabilities via integration with Amazon Rekognition Video, as well as video analytics using popular open-source machine learning frameworks.

Kinesis Video Streams can also help you with streaming live or recorded media to browsers or mobile applications via HTTP Live Streaming (HLS). Using WebRTC, two-way real-time streaming between web browsers, mobile apps, and connected devices is also made possible.  

Amazon Kinesis Data Firehose

Kinesis Data Firehose is used to reliably load large-scale streaming data into data lakes, data stores, and analytics services. Firehose can ingest, process, and deliver streaming data to a range of destinations, including Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, generic HTTP endpoints, and third-party service providers. It supports compression and batching, and it can transform and encrypt data streams prior to loading, which increases security and reduces storage costs. Firehose is used to deliver a deluge of data quickly to a central repository (whatever form that repository might take) for further processing.
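The batching and compression described above can be pictured with a small toy model. This is not the Firehose API: the BatchBuffer class, the 64-byte threshold, and the sample events are all made up for illustration, and a real delivery stream also flushes on a time interval, which is omitted here.

```python
import gzip


class BatchBuffer:
    """Toy stand-in for a delivery buffer: collect records, flush as one gzip blob."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.records = []
        self.size = 0
        self.delivered = []  # stands in for objects written to a destination

    def add(self, record: bytes):
        """Buffer one record and flush once the size threshold is reached."""
        self.records.append(record)
        self.size += len(record)
        if self.size >= self.max_bytes:
            self.flush()

    def flush(self):
        """Compress the buffered records into one payload and reset the buffer."""
        if not self.records:
            return
        payload = b"\n".join(self.records)
        self.delivered.append(gzip.compress(payload))
        self.records = []
        self.size = 0


buf = BatchBuffer(max_bytes=64)
for i in range(10):
    buf.add(f'{{"event": {i}, "type": "click"}}'.encode("utf-8"))
buf.flush()  # deliver whatever is left in the buffer
```

Writing fewer, larger, compressed objects is what cuts both the number of writes to the destination and the storage they consume.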

Amazon Kinesis Data Analytics 

Kinesis Data Analytics is used to transform and analyze streaming data in real time, leveraging the open-source framework and engine of Apache Flink. It is designed to reduce the complexity of building, managing, and integrating Flink applications with other AWS services.

Kinesis Data Analytics supports building applications in commonly used languages, including SQL, Java, Scala, and Python. It also integrates with a number of Amazon Web Services offerings, including Kinesis Data Streams (KDS), Managed Streaming for Apache Kafka (Amazon MSK), Kinesis Firehose, Amazon Elasticsearch Service, and more.
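To give a feel for the kind of analysis a streaming application performs, here is a minimal plain-Python sketch of a tumbling (fixed, non-overlapping) window count, a common streaming pattern. It does not use Flink or any Kinesis API; the function name and the page-view events are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) pairs.
    Returns {(window_start, key): count}.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical page-view events: (seconds since start, page).
events = [(1, "/home"), (4, "/home"), (9, "/pricing"), (12, "/home"), (19, "/pricing")]
result = tumbling_window_counts(events, window_seconds=10)
```

In a real streaming engine the same logic runs continuously and emits each window's result as the window closes, rather than returning everything at the end.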

AWS Kinesis vs Apache Kafka: What’s the difference? 

There are both a number of similarities and a number of differences between Kinesis and Kafka. Both are designed to ingest and process multiple large-scale streams of data with a great deal of flexibility in terms of source. Both replace traditional message brokers in environments that ingest large streams of data that need to be processed and delivered to other applications and services.  

The biggest difference between the two is that Amazon Kinesis is a managed service that requires minimal setup and configuration. Kafka is an open-source solution, requiring significant investment and knowledge to configure, often requiring weeks of setup, rather than hours.  

Kafka and Kinesis serve similar functions and deliver similar outcomes but operate differently. Key concepts used by Kinesis include Data Producers, Data Consumers, Data Streams, Shards, Data Records, Partition Key, and Sequence Number.  

Data Producers are the source devices or applications that emit Data Records. A Data Consumer is the app or service that retrieves Data Records from the shards in a stream and makes use of the data. Each shard holds a sequence of Data Records, and one or more shards make up a Kinesis Data Stream. The partition key is an identifier supplied by the producer, such as a user ID or timestamp, that determines which shard a record is written to. The sequence number, assigned by Kinesis, is unique per partition key within its shard. Together they give records a stable position in the stream, so consumers read the data exactly as it was written.
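Kinesis hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below mimics that routing under a simplifying assumption: the shards split the hash space into equal ranges (real streams can have uneven ranges after resharding). The function name and the sample keys are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces 128-bit values


def shard_index(partition_key, shard_count):
    """Map a partition key to a shard, assuming shards split the hash space evenly."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    # Scale the 128-bit hash down to an index in [0, shard_count).
    return hash_key * shard_count // HASH_SPACE


keys = ["user-1", "user-2", "user-3", "user-1"]
assignments = [shard_index(k, shard_count=4) for k in keys]
```

The same key always lands on the same shard, which is why a poorly chosen partition key (for example, one constant value) can overload a single shard.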

Kafka uses similar concepts but breaks them down a little differently: Records, Topics, Consumers, Producers, Brokers, Logs, Partitions, and Clusters. With Kafka, records are immutable from the outset and are appended sequentially to ensure continuous flow without data degradation. A Topic is essentially a named stream of records and is roughly analogous to a Kinesis Data Stream, while a Kafka Partition plays the role of a Kinesis Shard. Each partition is stored on disk as a log, which is further divided into segments.
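The relationship between topics, partitions, and records can be sketched with a small in-memory model. This is a toy, not the Kafka client API: the ToyTopic class is hypothetical, and crc32 is only a stand-in for the hash Kafka's own partitioner uses.

```python
import zlib


class ToyTopic:
    """In-memory model of a topic: a set of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Append a record and return its (partition, offset)."""
        # crc32 is a stand-in here, not Kafka's actual partitioner hash.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition, offset):
        """Return records from the given offset onward; reading never removes them."""
        return self.partitions[partition][offset:]


topic = ToyTopic("orders", num_partitions=3)
p1, o1 = topic.append("customer-7", "created")
p2, o2 = topic.append("customer-7", "paid")
```

Records with the same key go to the same partition and receive increasing offsets, so a consumer that tracks its offset can re-read or resume without the log changing underneath it.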

Kafka has four key APIs. The Producer API sends streams of data to Topics in the Kafka cluster. The Consumer API reads the streams of data from topics. The Streams API transforms streams of data from input to output Topics. The Connect API implements connectors that pull from source systems and push from Kafka to other systems/services/applications. 

The Broker, mentioned above, could be considered a Kafka Server running in a Kafka Cluster. Multiple Kafka Brokers are part of a given cluster, and a Kafka Cluster can consist of Brokers split among many servers. However, Broker can sometimes refer to Kafka as a whole. Essentially, it is the piece that manages the stream of data, incoming and outgoing.  

In addition to the differences in operation, in terminology, and in structure, there are also different integrations and feature sets. Kafka's official client library targets Java, whereas the AWS SDKs for Kinesis cover Android, Java, Go, and .NET, among others. Because Kafka is open source, however, new integrations are in development every day. But while Kinesis may currently offer more flexibility in integrations, it is less flexible in terms of configuration: it only lets you configure the retention period (in days) and the number of shards, and it writes every record synchronously to three different machines in separate data centers (Availability Zones). This standard configuration can constrain throughput performance. Kafka is more flexible, allowing more control over configuration and letting you set the replication level, and, when tailored properly to a given use case, it can be even more scalable and offer greater throughput.

However, Kinesis' lack of configuration flexibility is by design. Standardized configuration is part of what allows it to be set up in hours as opposed to weeks. This is also why Kinesis offers separate solutions such as Firehose, Video Streams, Data Analytics, and Data Streams. These use case-specific configurations allow Kinesis to be used in more scenarios while maintaining the benefits of a managed, quick-to-configure solution.
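Because the shard count is one of the few settings Kinesis exposes, sizing a stream mostly comes down to arithmetic. The sketch below estimates a shard count from an ingest rate. The per-shard limits it uses (1 MB or 1,000 records per second for writes, 2 MB per second for reads) are the figures AWS has documented for provisioned streams; treat them as assumptions and check the current documentation. The function name and workload numbers are hypothetical.

```python
import math

# Per-shard limits as commonly documented by AWS for provisioned streams;
# treat them as assumptions and verify against current documentation.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(records_per_sec, avg_record_kb, consumers=1):
    """Estimate the shard count from ingest rate, record size, and consumer count."""
    write_mb = records_per_sec * avg_record_kb / 1024
    read_mb = write_mb * consumers  # each shared-throughput consumer re-reads the data
    return max(
        math.ceil(write_mb / WRITE_MB_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(read_mb / READ_MB_PER_SEC),
        1,
    )


# Hypothetical workload: 5,000 records/s of about 2 KB each, read by 3 consumers.
estimate = shards_needed(records_per_sec=5000, avg_record_kb=2, consumers=3)
```

Whichever limit is hit first (write bandwidth, write record rate, or read bandwidth) sets the shard count, so a read-heavy stream can need more shards than its ingest rate alone suggests.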

Additional Resources 

Kinesis Documentation 

AWS Training: 

Introduction to Amazon Kinesis Streams 

Udemy Training: 

Intro to Amazon Kinesis Video 

Amazon Seminar Video:  

High Performance Data Streaming with Amazon Kinesis: Best Practices and Common Pitfalls 

 
