Glossary

Amazon Kinesis

What is Amazon Kinesis? The simple explanation: Amazon Kinesis is an Amazon Web Services (AWS) offering designed to process large-scale data streams from a multitude of services in

Apache Airflow

What is Apache Airflow? Airflow is an open-source workflow management system designed to programmatically author, schedule, and monitor data pipelines and workflows. The open-source distribution is

Apache Airflow DAG

What is an Apache Airflow DAG? DAG stands for Directed Acyclic Graph. DAGs are used to schedule and monitor Airflow tasks. A DAG is a collection of
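To make the "directed acyclic graph" idea concrete, here is a minimal sketch in plain Python. It does not use Airflow itself; it relies only on the standard-library `graphlib` module (Python 3.9+). The task names and the `execution_order` helper are hypothetical and exist only for illustration. The sketch shows how dependencies between tasks determine a valid run order, and that a cycle makes such an order impossible.

```python
from graphlib import TopologicalSorter

# Hypothetical task names (not from any real pipeline).
# Each key maps to the set of tasks that must finish before it can run.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(deps):
    """Return one valid run order for the tasks in `deps`.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is exactly what "acyclic" rules out.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because each task here depends on exactly one predecessor, only one order is possible. With branching dependencies, several orders can be valid, and a scheduler is free to pick any of them or to run independent tasks in parallel.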

Apache Kafka

What is Apache Kafka? Apache Kafka is an open-source streaming platform originally developed by LinkedIn. It was developed as a messaging queue but took on a

Apache Spark

Apache Spark is a fast, flexible engine for large-scale data processing. More specifically, Apache Spark is a parallel processing framework that boosts the performance of big-data

Change Data Capture

Change Data Capture is not a new concept, and has been a part of database and data warehouse management for nearly as long as they have

Data Lake

A data lake is an architectural design pattern in big data. It is not a single product; rather, a data lake is a set of tools

Data Pipeline

A data pipeline is a process for moving data between source and target systems. Data pipelines are used to replicate, move, or transform data, or simply

Data Warehouse

What is a Data Warehouse? A data warehouse is a technology that aggregates data from operational systems and external data sources from anywhere within an organization,

ETL Pipeline

A data pipeline is a process of moving data from one location to another, from source to target. An ETL (extract/transform/load) pipeline is a data
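As a rough illustration of the three stages the acronym names, the sketch below runs a tiny extract/transform/load sequence in plain Python. Everything in it is an assumption made for the example: the inline CSV sample, the function names, and the in-memory list standing in for a target system. A real pipeline would read from and write to actual source and target systems.

```python
import csv
import io

# Hypothetical source data; the second row has a malformed amount.
RAW_CSV = "id,amount\n1,10.5\n2,abc\n3,4.5\n"


def extract(text):
    """Extract: read rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast fields to proper types and drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except ValueError:
            continue  # skip malformed rows
    return clean


def load(rows, target):
    """Load: append rows to the target (a list standing in for a table)."""
    target.extend(rows)
    return len(rows)


warehouse = []  # stand-in for the target system
loaded = load(transform(extract(RAW_CSV)), warehouse)
print(loaded, warehouse)
```

Keeping each stage as a separate function means a stage can be tested or swapped on its own, for example replacing the in-memory target with a database writer.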

Spark Streaming

What is Spark Streaming? Apache Spark Streaming is an extension of the core Apache Spark API, a distributed general-purpose cluster computing framework that natively supports both
