VPC Flow Logs Analysis in Athena: A Step-by-Step Guide

How to Analyze VPC Flow Logs for Security

In this article we go through the technical side of analyzing AWS VPC flow logs for security and intrusion detection. We define what VPC flow logs are and list some use cases showing how you can use them to secure your instances.

We cover the data ingestion and preparation process here: Analyzing Amazon VPC Flow Logs Using SQL

We’ll discuss ways to defend instances from malicious software and unauthorized traffic. Then we will explain how to enable VPC flow logs and how to access them through CloudWatch.

Finally, we’ll go through some use cases for analyzing VPC flow logs with Amazon Athena, an interactive query service well suited to analyzing flow logs with standard SQL.

What Are VPC Flow Logs?

Amazon VPC flow logs let you capture and analyze information about the IP traffic going to and from the network interfaces in your VPC, which makes them a primary source of information about the network in your VPC. You can capture traffic flows at the level of a VPC, a subnet, or an individual elastic network interface (ENI). Flow logs are collected outside the path of your network traffic, so they do not affect network performance.

Flow log data can be published to Amazon CloudWatch Logs or Amazon S3, where you can retrieve it to monitor the traffic reaching your instance, determine the direction of that traffic, and decide how to act on it.

Using a flow log, you can see where most of your traffic comes from and detect when specific traffic is not reaching the instance.

Before the introduction of VPC flow logs, customers collected network flow logs by installing agents on their instances, which limited how they could view the network flows.
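To make the record structure concrete, here is a minimal Python sketch that splits one flow log line in the default space-separated format into named fields. The field names mirror the column names of the Athena table defined later in this guide. The sample record, the FlowRecord type, and the parse_flow_record helper are illustrative assumptions, not part of any AWS SDK.

```python
from typing import NamedTuple, Optional


class FlowRecord(NamedTuple):
    """One flow log record in the default 14-field format."""
    version: int
    account: str
    interfaceid: str
    sourceaddress: str
    destinationaddress: str
    sourceport: Optional[int]
    destinationport: Optional[int]
    protocol: Optional[int]
    numpackets: Optional[int]
    numbytes: Optional[int]
    starttime: int
    endtime: int
    action: str
    logstatus: str


def _to_int(value: str) -> Optional[int]:
    # Records with no captured traffic use "-" as a placeholder.
    return None if value == "-" else int(value)


def parse_flow_record(line: str) -> FlowRecord:
    """Parse one space-separated flow log line (hypothetical helper)."""
    parts = line.split()
    if len(parts) != 14:
        raise ValueError(f"expected 14 fields, got {len(parts)}")
    (version, account, interfaceid, src, dst, srcport, dstport,
     protocol, packets, num_bytes, start, end, action, status) = parts
    return FlowRecord(
        int(version), account, interfaceid, src, dst,
        _to_int(srcport), _to_int(dstport), _to_int(protocol),
        _to_int(packets), _to_int(num_bytes),
        int(start), int(end), action, status,
    )


# Illustrative sample: accepted SSH traffic to port 22.
sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
record = parse_flow_record(sample)
print(record.destinationport, record.action)  # 22 ACCEPT
```

Keeping the placeholder handling in one small helper means records without traffic data parse cleanly instead of raising on the "-" fields.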

Benefits of VPC Flow Logs Analysis

Flow logs can help you analyze incoming network requests and decide whether to accept or reject them, which in turn improves your access control list (ACL) rules. Using VPC flow logs, you can create alerts for unauthorized IPs and destination port redirects, which could indicate malicious software trying to access your network.

By analyzing VPC flow logs, we can detect threats such as port scanning. We can also monitor traffic flows to confirm that our ACL rules behave as intended, and use the flow logs to diagnose and troubleshoot connection issues.
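One simple way to look for port scanning is to count how many distinct destination ports each source address was rejected on. The Python sketch below illustrates that idea under stated assumptions: the records are already reduced to (source address, destination port, action) tuples, and the find_port_scanners name and the threshold of 20 ports are arbitrary choices for illustration, not recommended values.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, int, str]  # (sourceaddress, destinationport, action)


def find_port_scanners(records: Iterable[Record],
                       min_ports: int = 20) -> Dict[str, int]:
    """Return sources rejected on at least min_ports distinct ports."""
    ports_by_source: Dict[str, Set[int]] = defaultdict(set)
    for source, port, action in records:
        # Only rejected traffic is counted; accepted traffic is ignored.
        if action == "REJECT":
            ports_by_source[source].add(port)
    return {
        source: len(ports)
        for source, ports in ports_by_source.items()
        if len(ports) >= min_ports
    }


# Hypothetical data: one source probing 25 ports, one normal client.
records = [("198.51.100.7", port, "REJECT") for port in range(1000, 1025)]
records += [("203.0.113.5", 443, "ACCEPT"), ("203.0.113.5", 22, "REJECT")]
print(find_port_scanners(records))  # {'198.51.100.7': 25}
```

A real deployment would likely also bound the check to a time window, since a long enough log will eventually show many ports for almost any busy source.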

It is important to understand the difference between security groups and network ACLs. A security group acts as a firewall at the instance level, controlling the traffic allowed in and out of an instance. A network ACL acts as a firewall at the subnet level, controlling the traffic moving in and out of the subnet.

You can monitor remote logins on ports such as SSH and RDP. These ports should only be accessible from trusted sources, so we can use flow logs to analyze traffic on them, maximize security, and detect suspicious activity.
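The check described above can be sketched in Python with the standard ipaddress module: flag any traffic to port 22 or 3389 whose source falls outside a list of trusted networks. The tuple layout, the untrusted_remote_logins name, and the example CIDR range are assumptions made for illustration.

```python
import ipaddress
from typing import Iterable, List, Tuple

Record = Tuple[str, int, str]  # (sourceaddress, destinationport, action)

REMOTE_LOGIN_PORTS = {22, 3389}  # SSH and RDP


def untrusted_remote_logins(records: Iterable[Record],
                            trusted_cidrs: Iterable[str]) -> List[Record]:
    """Return SSH/RDP records whose source is outside the trusted CIDRs."""
    networks = [ipaddress.ip_network(cidr) for cidr in trusted_cidrs]
    flagged = []
    for record in records:
        source, port, _action = record
        if port not in REMOTE_LOGIN_PORTS:
            continue
        address = ipaddress.ip_address(source)
        # An address is trusted if it belongs to any trusted network.
        if not any(address in network for network in networks):
            flagged.append(record)
    return flagged


# Hypothetical data: the office range 10.0.0.0/8 is trusted.
records = [
    ("10.1.2.3", 22, "ACCEPT"),        # trusted SSH
    ("198.51.100.7", 22, "REJECT"),    # untrusted SSH attempt
    ("198.51.100.7", 443, "ACCEPT"),   # not a remote-login port
    ("203.0.113.9", 3389, "ACCEPT"),   # untrusted RDP, accepted
]
for hit in untrusted_remote_logins(records, ["10.0.0.0/8"]):
    print(hit)
```

Accepted connections from untrusted sources are usually the more urgent finding, since they suggest a security group or ACL rule is wider than intended.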

Analyzing VPC Flow Logs for Security in Athena – Step by Step

Before beginning, you need access to your VPC flow logs. You can view them in the CloudWatch console: select the log group, then the log stream.

After creating the flow logs and gaining access to them, you can analyze them at scale by storing them in Amazon S3, which is where Athena reads its data. Let’s start by configuring Amazon Athena to query the data and try different security scenarios.

Step 1

Copy this DDL code into the Athena console:

CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs (
  version int,
  account string,
  interfaceid string,
  sourceaddress string,
  destinationaddress string,
  sourceport int,
  destinationport int,
  protocol int,
  numpackets int,
  numbytes bigint,
  starttime int,
  endtime int,
  action string,
  logstatus string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://example_bucket/prefix/AWSLogs/{subscribe_account_id}/vpcflowlogs/{region_code}/'
TBLPROPERTIES ("skip.header.line.count"="1");

Step 2

Change the LOCATION value s3://example_bucket/prefix/AWSLogs/{subscribe_account_id}/vpcflowlogs/{region_code}/ to the S3 path of the logs you want to analyze.

Step 3

Now, you need to run the above query in the Athena console, which will register a table called vpc_flow_logs.
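Because the table is declared with PARTITIONED BY (dt string), Athena returns no rows until at least one partition has been registered. The Python sketch below generates an ALTER TABLE ... ADD PARTITION statement for a given day. It assumes that your flow logs sit under YYYY/MM/DD/ subfolders of the table location and that you want dt values written as YYYY-MM-DD. Both are assumptions about your setup, and add_partition_sql is a hypothetical helper name.

```python
from datetime import date


def add_partition_sql(table: str, base_location: str, day: date) -> str:
    """Build an Athena statement registering one daily partition."""
    base = base_location.rstrip("/")
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{day.isoformat()}') "
        # Assumed layout: one folder per day, e.g. .../2023/01/05/
        f"LOCATION '{base}/{day:%Y/%m/%d}/';"
    )


# Placeholder bucket, account ID, and region: replace with your own.
location = "s3://example_bucket/prefix/AWSLogs/123456789012/vpcflowlogs/us-east-1/"
print(add_partition_sql("vpc_flow_logs", location, date(2023, 1, 5)))
```

You would then paste the printed statement into the Athena console, once per day of logs you want to query.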

With the table in place, there are many security use cases for analyzing flow logs.

For example, you can monitor SSH and RDP traffic. RDP is used for Windows instances, while SSH is used for Linux instances. We run the query below to get the activity on the SSH and RDP ports. Here 22 is the SSH port, while 3389 is the RDP port.

SELECT
*
FROM vpc_flow_logs
WHERE
sourceport IN (22, 3389)
OR
destinationport IN (22, 3389)
ORDER BY starttime ASC

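Another scenario is to monitor traffic on web app ports. Let’s assume that your application serves requests on a specific port. The query below returns the top 10 IP addresses by total bytes transferred, counting each address both as a source and as a destination. As written it covers all ports; you can add a WHERE clause on destinationport to restrict it to the port your application uses.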

SELECT
ip,
sum(bytes) as total_bytes
FROM (
SELECT
destinationaddress as ip,
sum(numbytes) as bytes
FROM vpc_flow_logs
GROUP BY 1
UNION ALL
SELECT
sourceaddress as ip,
sum(numbytes) as bytes
FROM vpc_flow_logs
GROUP BY 1
)
GROUP BY ip
ORDER BY total_bytes DESC
LIMIT 10
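The logic of the UNION ALL query above may be easier to follow in procedural form: every record credits its byte count to both the source and the destination address, and the totals are then ranked. Here is a minimal Python sketch of the same aggregation, assuming records reduced to (source, destination, bytes) tuples. The top_talkers name and the sample data are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Record = Tuple[str, str, int]  # (sourceaddress, destinationaddress, numbytes)


def top_talkers(records: Iterable[Record], n: int = 10) -> List[Tuple[str, int]]:
    """Rank addresses by bytes seen as either source or destination."""
    totals: Counter = Counter()
    for source, destination, num_bytes in records:
        totals[source] += num_bytes       # second branch of the UNION ALL
        totals[destination] += num_bytes  # first branch of the UNION ALL
    return totals.most_common(n)


# Hypothetical data for illustration.
records = [
    ("10.0.0.1", "10.0.0.2", 500),
    ("10.0.0.2", "10.0.0.3", 300),
    ("10.0.0.1", "10.0.0.3", 100),
]
print(top_talkers(records, n=2))  # [('10.0.0.2', 800), ('10.0.0.1', 600)]
```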

Now suppose you want to find the servers that receive the most HTTPS traffic. The query below sums the number of packets received on HTTPS port 443 over the last seven days.

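SELECT
SUM(numpackets) AS packetcount,
destinationaddress
FROM vpc_flow_logs
-- dt is the partition column from the table definition; this assumes its values are formatted as YYYY-MM-DD
WHERE destinationport = 443 AND date(dt) > current_date - interval '7' day
GROUP BY destinationaddress
ORDER BY packetcount DESC
LIMIT 10;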

Conclusion

As AWS environments grow in popularity, they also become more complex, which calls for better tools and techniques. In this tutorial, we explained what VPC flow logs are and how to analyze them to track the traffic on your instances. This supports better data security management and helps teams detect, identify, and fix malicious software and events.

For more information, visit Amazon’s official documentation for VPC flow logs, which covers troubleshooting and how to publish to Amazon S3 and CloudWatch. It is a useful starting point for using flow logs in your system.

Published in: Blog , Use Cases
Upsolver Team