最後更新: 2024-03-04
介紹
Athena queries data directly from Amazon S3.
支援 Querying
- SQL
- Apache Spark
應用例子
Analysis aws waflogs from S3
目錄
- Pricing
- Analysis aws waflogs from S3
- 建立 Data Source
- 建立 Workgroup
- Usage
- WAF log 常用的 fields
- Athena create saved query
Pricing
SQL: $5.00 per TB of data scanned
- Rounded up to the nearest megabyte, with a 10 MB minimum per query
- no charges for Data Definition Language (DDL) statements like CREATE, ALTER, or DROP TABLE, or failed queries.
Apache Spark: $0.35 per DPU-hour billed per minute.
Additional costs
You are charged standard S3 rates for storage, requests, and data transfer.
If you use the AWS Glue Data Catalog with Athena, you are charged standard Data Catalog rates. (Data Source)
省錢方式
Partitioning
限制 scan data 的數量
Gzip log file
Athena 支援 scan gz file => size 細左會平左
Parquet format
Parquet is columnar
The query in question only references X column, Athena reads only that column and can avoid reading other file.
Analysis aws waflogs from S3
Create an AWS Athena table to query the AWS WAF logs stored in S3.
This allows you to perform SQL-like queries on the log data.
log file path in S3
S3 > Buckets > "Name-Of-Bucket" > AWSLogs/AWS_AC_ID/WAFLogs/cloudfront/Web-ACLs-Name/YYYY/MM/DD/hh/mm/files
- hh: 00~23
- mm: 00, 05, 10 ...
- files: ?.log.gz # 一行一 json
Data Source
data source = metadata (schema) + dataset (CSV, JSON, Parquet ...)
Athena natively supports the "AWS Glue Data Catalog"
$1.00 per 100,000 objects over a million, per month.
An object in the Data Catalog is a table, table version, partition, partition indexes, statistics or database.
$1.00 per million requests over the first million.
Some of the common requests are CreateTable, CreatePartition, GetTable , GetPartitions, and GetColumnStatisticsForTable.
建立 "Workgroups"
左邊 menu > Administration section > Data sources
- S3 - AWS Glue Data Catalog
建立 Workgroup
By default, each account has a primary workgroup and
the default permissions allow all authenticated users access to this workgroup.
The primary workgroup cannot be deleted.
Each workgroup that you create shows saved queries and query history only for queries that ran in it.
Disabling a workgroup prevents queries from running in it, until you enable it.
Queries sent to a disabled workgroup fail, until you enable it again.
Query result configuration (optional)
將 Run 完的 result save 到 S3
i.e.
建立 "Workgroups"
左邊 menu > Administration section > Workgroups
Analytics engine: Athena SQL / Apache Spark
Data usage controls(Per query data usage control)
If the query exceeds the limit, it will be cancelled (Minimum limit is 10 MB)
Partition projection
Automatically adds new partitions as new data is added.
This removes the need for you to manually add partitions by using ALTER TABLE ADD PARTITION.
Usage
0. 選: us-east-1
1. Create DB
Query editor >
CREATE DATABASE waf_logs
2. Create Tables
Define the table schema based on the log format and structure.
Tables DDL statement
https://docs.aws.amazon.com/athena/latest/ug/waf-logs.html
3. 建立並執行 SQL
行完會有 Result
Completed Time in queue: 60 ms Run time: 3.118 sec Data scanned: 52.11 MB
i.e.
# Return records from the last 24 hours
SELECT to_iso8601(from_unixtime(timestamp/1000)) AS time_ISO_8601, httpsourcename, httpsourceid, httprequest FROM waf_logs WHERE from_unixtime(timestamp/1000) > now() - interval '1' day LIMIT 10;
# Search by URL
SELECT to_iso8601(from_unixtime(timestamp/1000)) AS time_ISO_8601, httpsourceid as "CF Distribution", terminatingruleid as "Last issued rule", action as "Action", httprequest.clientip as "Client IP", httprequest.country as "Country", httprequest.uri as "Request URI", httprequest.httpmethod as "Method", httprequest.headers as "Full request header" FROM waf_logs WHERE httprequest.uri like '/stripe%'
WAF log 常用的 fields
i.e.
- "timestamp": 1700694004928
- "terminatingRuleId": "AWS-AWSManagedRulesAnonymousIpList"
- "terminatingRuleType": "MANAGED_RULE_GROUP"
- "action": "BLOCK"
- "httpSourceId": "CF 的 Distributions 名"
httpRequest # client info 的 json
- clientIp
- country
- headers
- uri
- httpMethod
- requestId # httprequest.requestId
Athena create saved query
* You cannot change the workgroup or database to which the query belongs.
Above the query editor window, on the tab for the query, choose the three vertical dots, and then choose Save as.