Athena or Redshift? 4 Questions to Decide

Athena or Redshift

Whether you’re using Athena or Redshift, Upsolver is the ultimate Data Lake ETL tool to transform event streams into usable data and unlock the value of your big data architecture. Get a free demo now or check out our architecture overview to learn more.

With the near-ubiquity of Amazon Web Services in the cloud space, it’s easy to forget how new many of these services are. Redshift has only been commercially available since 2013; Amazon Athena has been around less than three years. The rapid release of new databases and analytics tools – both by AWS and newer players such as Snowflake – can leave software architects baffled as to which stack they should turn to in order to solve a specific business scenario.

“Tool X vs Tool Y” comparisons usually start and end with a cop-out stating that everything depends on your particular circumstances, there is no single correct answer, etc. (we’ve done it too when we compared Kafka to Kinesis.) Like most cliches, this one is also true; technology decisions are rarely black or white and if two platforms are commonly used, it’s often because each can be useful for some users in some scenarios.

Nevertheless, there are some high-level factors you can look at to help you gauge which tech could be the more relevant for your situation. In this article, we’ll try to do exactly that – help you identify the use cases where you’d use Redshift for your analytic workloads, versus those where you’d lean more towards Athena. We’ll try to do that with 2 sentences, 4 questions, or a detailed feature-by-feature breakdown.

In this article you will learn:

Redshift and Athena from a Bird’s Eye View 

Once you understand the differences between an enterprise data warehouse such as Redshfit and an interactive query service such as Athena, you can identify the scenarios in which one will be more useful than the other

Let’s give the 30 second overview of each one:

Amazon Athena

  • Serverless query service – AWS manages all infrastructure aspects
  • Works on top of Amazon S3 object storage
  • Typically only read-only and does not alter data on S3; however query results can be written to S3
  • Uses ANSI SQL and Presto to query data
  • Utilizes Amazon Glue as a Hive Metastore

Amazon Redshift

  • Cloud data warehouse based on PostgreSQL
  • Composed of leader nodes and compute nodes. Users can scale by adding compute nodes as needed
  • Coupled architecture (storage and compute); serverless functionality available via Redshift Spectrum

There’s a metric ton more that can be said about each of these tools – the AWS website would be a good place to start if you want to further your research (Redshift, Athena). However, for the purposes of this article, these are the important aspects you need to understand to answer the question of…

Redshift or Athena? The super-short answer

Let’s start with the simplest, most basic way to approach the question of where to analyze your cloud data.

  • Redshift is faster and more robust (as long as you’re willing to add computer power), but also more expensive and complicated to manage
  • Athena makes it is easier to get started and is more flexible in the types of data it can query, but performance is not guaranteed without significant data preparation

From here it’s a question of how much data you have, what your use case is and how much you are willing to invest in money and human hours. 

Let’s go a little bit more in-depth with…

4 Questions to decide where to run your analytic workloads

The caveat from above still applies – your mileage may vary, every organization and every architecture is different. However, here are the key factors to consider when debating between Redshift and Athena:

1. What is your use case?

Redshift is the more natural choice for data warehouse reporting, Athena for ad-hoc queries against S3 storage

Redshift would be the better choice if you have data coming in from diverse sources and you would like to transform that data, enforce consistent schema and formats, and create a single source that feeds a large number of business intelligence and reporting tools across the organization (see the AWS Athena FAQ). 

If you’re running the same query every hour or day to power a BI dashboard, and the structure of the underlying data stays consistent, Redshift might be the better choice – especially if you’re willing to invest in infrastructure to ensure consistent performance come hell or high water. 

Athena, on the other hand, might be your go-to for an ad-hoc query – most queries will ‘just work’, even on semi-structured or unstructured data, and they will return results without extensive tuning. However, if you are running your analytic workflows at higher scales, you will need to apply a comprehensive set.of best practices in order to improve Athena performance.

2. Do you have the human resources to manage infrastructure?

Redshift requires DBA resources to manage and resize clusters, with Athena there is no infrastructure to manage 

While both Athena and Redshift are offered as managed services by AWS, Redshift still requires non-negligible effort to initialize and configure clusters (last year’s release of Elastic Resize is meant to streamline this process). If data volumes and retention policies are in constant flux – which would often be the case if you work with high volumes of streaming data – you will need to invest significant human resources in cluster management. Additional time needs to be spent loading the data into Redshift.

Athena, on the other hand, is by definition serverless – there is no infrastructure to manage except object storage on Amazon S3, which scales virtually infinitely. This removes the reliance on DBA resources, but also means you can’t add compute resources to improve performance when data loads increase.

Redshift Spectrum, released in late 2017, is an attempt to bridge the gap by providing a serverless extension to Redshift that allows users to join data from S3 with Redshift tables, without managing any infrastructure. However, Spectrum is only available if you already have a Redshift cluster, and some consider it to be less mature than Athena (see this related discussion on Stack Overflow).

3. How much are you willing to invest in performance?

Redshift is more expensive as you are paying for both storage and compute, compared to Athena’s decoupled architecture.

A highly optimized Redshift cluster with sufficient compute resources will most likely return faster than the same query in Athena. However, getting to that stage could cost you an arm and a leg.

Since Redshift is a database, you are paying per node, and the size of your cluster will greatly depend on the volume and velocity of the data you’re dealing with. This can get prohibitively expensive at higher scales, forcing you to either devote increasingly large budgets to your cloud infrastructure or playing a game of cat-and-mouse with your data retention policy (a telltale sign that you should move to a big data architecture). This is without even discussing the software and human costs of data transformation, which can be even more signficant.

Athena runs directly on top of S3 and is priced per terabyte of data scanned. This decoupled architecture allows you to leverage inexpensive storage on Amazon S3 which should not make much of a dent in your AWS bill. Querying costs can also be significantly reduced by means of effective data preparation for Athena, which can also have an impact on performance. 

4. How structured is your data?

Redshift is a relational database and best suited for tabular data; Athena is better for semi-structured and unstructured data

As an RDBMS, Redshift stores data in tables and enforces schema-on-write. This makes it better suited for structured data that is ingested already in tabular format, which would often be the case with business application sources such as CRM or HR systems. For these sources, the ETL effort is less significant and there are plenty of tools available to help you get the job done.

However, if you are working with unstructured or semi-structured data – such as raw event data captured from clickstream, IoT sensors or online advertising – this ETL becomes a much more acute problem, requiring complex coding in Spark/Hadoop, workflow orchestration and more. In these cases it might make more sense to store schemaless data on S3 first, and solve the ETL problem once you actually have your analytic use case figured out.

This is not to say you cannot store semi-structured data in Redshift – it’s just a question of whether you are willing to invest this significant ETL effort on ingest. If not, Athena might be the better choice as it queries S3 directly, and can handle complex data types, including nested JSON.

Feature Breakdown

Now that you are hopefully able to place your scenario broadly within the scope of one tool or the other, you can use the following table and resources to continue comparing Redshift and Athena on a feature-by-feature basis.



Need to initialize and config cluster, and load data


Instant – no need to setup infrastucture



Does not support table partitioning; CREATE TABLE is used to definte distribution and optimize processing


Data partitions on Hive metastore correspond to folders on S3, with support for custom partitioning



Users can create user-defined functions using an SQL SELECT clause or Python


Does not support UDFs or UDAFs



– Dense Compute: $0.25 per hour for dc2.large or $4.80 per hour for dc2.8xlarge

– Dense Storage: $0.85 per hour for ds2.xlarge or $6.80 per hour for ds2.8xlarge


– Price per query: $5 per terabyte scanned

– Minimum 10mb scanned per query

Storage on Amazon S3

Primary Keys


No-build in constraint on primary key; Needs to be declared at onset


No indexing since works on top of S3 – no primary keys

Data formats


AVRO, CSV, JSON, Parquet, ORC and TXT.


CSV, TSV, Json or Textfiles; highly recommended to store data in columnar fomrat such as Apache Parquet or Apache ORC



Cluster-level security with support for encryption


Relies on Amazon Identity management (IAM); access to data based on S3 permissions; can read encrypted data from S3

Further Reading




Share with your friends

Don't Stop Here - More to Explore

Explore all Blog Categories

Explore all Blog Categories:

data lake ETL Demo

Let’s get personal:
See Upsolver on your data in a live demo.

Schedule a free, no-strings-attached demo to discover how Upsolver can radically simplify data lake ETL in your organization.