Athena or Redshift? 4 Questions to Decide

Athena or Redshift

The following article is part of our free Amazon Athena resource bundle. Read on for the excerpt, or get the full education pack for FREE right here.

With the near-ubiquity of Amazon Web Services in the cloud space, it’s easy to forget how new many of these services are. Redshift has only been commercially available since 2013; Amazon Athena has been around less than three years. The rapid release of new databases and analytics tools – both by AWS and newer players such as Snowflake – can leave software architects baffled as to which stack they should turn to in order to solve a specific business scenario.

“Tool X vs Tool Y” comparisons usually start and end with a cop-out stating that everything depends on your particular circumstances, there is no single correct answer, etc. (we’ve done it too when we compared Kafka to Kinesis.) Like most cliches, this one is also true; technology decisions are rarely black and white and if two platforms are commonly used, it’s often because each can be useful for some users in some scenarios.

Nevertheless, there are some high-level factors you can look at to help you gauge which tech could be the more relevant for your situation. In this article, we’ll try to do exactly that – help you identify the use cases where you’d use Redshift for your analytic workloads, versus those where you’d lean more towards Athena. We’ll try to do that with 2 sentences, 4 questions, or a detailed feature-by-feature breakdown.

In this article you will learn:

Redshift and Athena from a Bird’s Eye View 

Once you understand the differences between an enterprise data warehouse such as Redshfit and an interactive query service such as Athena, you can identify the scenarios in which one will be more useful than the other

Let’s give the 30 second overview of each one:

Amazon Athena

  • Serverless query service – AWS manages all infrastructure aspects
  • Works on top of Amazon S3 object storage
  • Typically only read-only and does not alter data on S3; however query results can be written to S3
  • Uses ANSI SQL and Presto to query data
  • Utilizes Amazon Glue as a Hive Metastore

Amazon Redshift

  • Cloud data warehouse based on PostgreSQL
  • Composed of leader nodes and compute nodes. Users can scale by adding compute nodes as needed
  • Coupled architecture (storage and compute); serverless functionality available via Redshift Spectrum

There’s a metric ton more that can be said about each of these tools – the AWS website would be a good place to start if you want to further your research (Redshift, Athena). However, for the purposes of this article, these are the important aspects you need to understand to answer the question of…

Redshift or Athena? The super-short answer

Let’s start with the simplest, most basic way to approach the question of where to analyze your cloud data.

  • Redshift is faster and more robust (as long as you’re willing to add computer power), but also more expensive and complicated to manage
  • Athena makes it is easier to get started and is more flexible in the types of data it can query, but performance is not guaranteed without significant data preparation

From here it’s a question of how much data you have, what your use case is and how much you are willing to invest in money and human hours. 

Let’s go a little bit more in-depth with…

4 Questions to decide where to run your analytic workloads

The caveat from above still applies – your mileage may vary, every organization and every architecture is different. However, here are the key factors to consider when deciding between Redshift and Athena:

1. What is your use case?

Redshift is the more natural choice for data warehouse reporting, Athena for ad-hoc queries against S3 storage

Redshift would be the better choice if you have data coming in from diverse sources and you would like to transform that data, enforce consistent schema and formats, and create a single source that feeds a large number of business intelligence and reporting tools across the organization (see the AWS Athena FAQ). 

If you’re running the same query every hour or day to power a BI dashboard, and the structure of the underlying data stays consistent, Redshift might be the better choice – especially if you’re willing to invest in infrastructure to ensure consistent performance come hell or high water. 

Athena, on the other hand, might be your go-to for an ad-hoc query – most queries will just work, even on semi-structured or unstructured data, and they will return results without extensive tuning. However, if you are running your analytic workflows at higher scales, you will need to apply a comprehensive set of best practices to improve Athena performance.

2. Do you have the human resources to manage infrastructure?

Redshift requires DBA resources to manage and resize clusters; with Athena there is no infrastructure to manage 

While both Athena and Redshift are offered as managed services by AWS, Redshift still requires non-negligible effort to initialize and configure clusters (last year’s release of Elastic Resize is meant to streamline this process). If data volumes and retention policies are in constant flux – which would often be the case if you work with high volumes of streaming data – you will need to invest significant human resources in cluster management. Additional time needs to be spent loading the data into Redshift.

Athena, on the other hand, is by definition serverless – there is no infrastructure to manage except object storage on Amazon S3, which scales virtually infinitely. This removes the reliance on DBA resources, but also means you can’t add compute resources to improve performance when data loads increase.

Redshift Spectrum, released in late 2017, is an attempt to bridge the gap by providing a serverless extension to Redshift that allows users to join data from S3 with Redshift tables, without managing any infrastructure. However, Spectrum is only available if you already have a Redshift cluster, and some consider it to be less mature than Athena (see this related discussion on Stack Overflow).

3. How much are you willing to invest in performance?

Redshift is more expensive as you are paying for both storage and compute, compared to Athena’s decoupled architecture.

A highly optimized Redshift cluster with sufficient compute resources will most likely return results faster than the same query in Athena. However, getting to that stage could cost you an arm and a leg.

Since Redshift is a database, you are paying per node, and the size of your cluster will greatly depend on the volume and velocity of the data you’re dealing with. This can get prohibitively expensive at higher scales, forcing you to either devote increasingly large budgets to your cloud infrastructure or playing a game of cat-and-mouse with your data retention policy (a telltale sign that you should move to a big data architecture). This is without even discussing the software and human costs of data transformation, which can be even more significant.

Athena runs directly on top of S3 and is priced per terabyte of data scanned. This decoupled architecture allows you to leverage inexpensive storage on Amazon S3 which should not make much of a dent in your AWS bill. Querying costs can also be significantly reduced by means of effective data preparation for Athena, which can also have an impact on performance. 

4. How structured is your data?

Redshift is a relational database and best suited for tabular data; Athena is better for semi-structured and unstructured data

As an RDBMS, Redshift stores data in tables and enforces schema-on-write. This makes it better suited for structured data that is ingested already in tabular format, which would often be the case with business application sources such as CRM or HR systems. For these sources, the ETL effort is less significant and there are plenty of tools available to help you get the job done.

However, if you are working with unstructured or semi-structured data – such as raw event data captured from clickstreams, IoT sensors, or online advertising – this ETL becomes a much more acute problem, requiring complex coding in Spark/Hadoop, workflow orchestration and more. In these cases it might make more sense to store schemaless data on S3 first, and solve the ETL problem once you actually have your analytic use case figured out.

This is not to say you cannot store semi-structured data in Redshift – it’s just a question of whether you are willing to invest this significant ETL effort on ingest. If not, Athena might be the better choice as it queries S3 directly, and can handle complex data types, including nested JSON.

Feature Breakdown

Now that you are hopefully able to place your scenario broadly within the scope of one tool or the other, you can use the following table and resources to continue comparing Redshift and Athena on a feature-by-feature basis.



Need to initialize and config cluster, and load data


Instant – no need to setup infrastructure



Does not support table partitioning; CREATE TABLE is used to define distribution and optimize processing


Data partitions on Hive metastore correspond to folders on S3, with support for custom partitioning



Users can create user-defined functions using an SQL SELECT clause or Python


Does not support UDFs or UDAFs



– Dense Compute: $0.25 per hour for dc2.large or $4.80 per hour for dc2.8xlarge

– Dense Storage: $0.85 per hour for ds2.xlarge or $6.80 per hour for ds2.8xlarge


– Price per query: $5 per terabyte scanned

– Minimum 10mb scanned per query

Storage on Amazon S3

Primary Keys


No-build in constraint on primary key; needs to be declared at onset


No indexing, as it works on top of S3; no primary keys

Data formats


AVRO, CSV, JSON, Parquet, ORC, and TXT.


CSV, TSV, JSON or text files; highly recommended to store data in columnar format such as Apache Parquet or Apache ORC



Cluster-level security with support for encryption


Relies on Amazon Identity Management (IAM); access to data based on S3 permissions; can read encrypted data from S3

Further Reading

Want to master data engineering for Amazon Athena? Get the free resource bundle:

Learn everything you need to build performant cloud architecture on Amazon S3 with our ultimate Amazon Athena pack, including:

– Ebook: Partitioning data on S3 to improve Athena performance

– Recorded Webinar: Improving Athena + Looker Performance by 380%

– Recorded Webinar: 6 Must-know ETL tips for Amazon Athena

– Athena compared to Google BigQuery + performance benchmarks

And much more. Get the full bundle for FREE right here.

Get email updates

Share with your friends

Learn more about Upsolver

Visit our homepage
data lake ETL Demo

Start for free with the Upsolver Community Edition.

Build working solutions for stream and batch processing on your data lake in minutes.

Get Started Now