Glossary

Amazon Kinesis

What is Amazon Kinesis? Amazon Kinesis is an Amazon Web Service designed to process large-scale data streams from many services in real time. It can

Apache Kafka

What is Apache Kafka? Apache Kafka is an open-source streaming platform originally developed by LinkedIn. It was developed as a messaging queue but took on a

Apache Spark

Apache Spark is a fast, flexible engine for large-scale data processing. More specifically, Apache Spark is a parallel processing framework that boosts the performance of big-data

Change Data Capture

Change Data Capture is not a new concept and has been part of database and data warehouse management for nearly as long as they have

Data Lake

A data lake is an architectural design pattern in big data. It is not a single product; rather, a data lake is a set of tools

Data Pipeline

A data pipeline is a process for moving data between source and target systems. Data pipelines are used to replicate, move, or transform data, or simply

Data Warehouse

What is a Data Warehouse? A data warehouse is a technology that aggregates data from operational systems and external data sources from anywhere within an organization,

ETL Pipeline

A data pipeline is a process of moving data from one location to another, from source to target. An ETL (extract, transform, load) pipeline is a data

Spark Streaming

What is Spark Streaming? Apache Spark Streaming is an extension of the core Apache Spark API, a distributed general-purpose cluster computing framework that natively supports both