A Step-By-Step Guide to Data Management and Analysis Using AWS: S3, Athena, and QuickSight

Varun Kumar Manik
6 min readSep 1, 2023

--

About the Event

This blog post is inspired by my recent participation in the International Symposium on Multidisciplinary Approach for Data Management, organized by the Amity School of Engineering & Technology and Amity Business School, Amity University Uttar Pradesh, Lucknow Campus. The symposium brought together experts from various fields to discuss the challenges and opportunities in data management. I was honored to be invited as a speaker to give a lecture and demo on data management and analysis using AWS services. This blog post is a detailed walkthrough of the demo I presented at the symposium.

Introduction

In the era of digital transformation, businesses are inundated with data, making effective data management and analysis more crucial than ever. Companies need robust and scalable solutions to store, manage, and analyze this data to glean valuable insights. Amazon Web Services (AWS) offers a suite of serverless microservices with modern architecture that provides a comprehensive solution for these tasks.

Data management and analysis are key components in today’s digital world. Companies generate enormous amounts of data daily and require robust tools to store, manage, and analyze this data effectively. Amazon Web Services (AWS) offers a range of powerful tools for these tasks. This blog post aims to walk you through a detailed demo showcasing the seamless integration of Amazon S3, Athena, and QuickSight for an end-to-end data management solution with high availability and scalability.

Overview of AWS Services Covered in the Demo

  • Amazon S3: A highly scalable storage service to store and retrieve any amount of data.
  • Amazon Athena: An interactive query service that enables SQL queries directly on data stored in S3.
  • Amazon QuickSight: A business intelligence tool for creating and publishing interactive dashboards.
aws data management tools

Creating a Sample Data File in Amazon S3 (Data Storage)

First, let’s create a CSV file named sample_data.csv with the following sample data:

customer_id,first_name,last_name,age,country
1,John,Doe,30,USA
2,Jane,Doe,28,Canada
3,Jim,Beam,35,UK
4,Jack,Daniels,40,USA
5,Aarav,Kumar,25,India
6,Aanya,Sharma,32,India
7,Aditi,Patel,28,India
8,Aditya,Chopra,37,India
9,Akshay,Verma,29,India
10,Alisha,Shah,24,India
11,Aman,Agarwal,31,India
12,Amar,Singh,30,India
13,Amrita,Reddy,27,India
14,Anika,Gupta,33,India
15,Anil,Joshi,40,India
16,Anisha,Rao,26,India
17,Anuj,Mishra,29,India
18,Anushka,Desai,28,India
19,Arjun,Patel,35,India
20,Avani,Shah,31,India
21,Chirag,Kumar,34,India
22,Deepa,Singh,29,India
23,Divya,Choudhury,27,India
24,Esha,Mehra,26,India
25,Gaurav,Verma,36,India
26,Harshita,Saxena,28,India
27,Ishaan,Sharma,25,India
28,Jyoti,Agarwal,30,India
29,Kabir,Chopra,32,India
30,Kavya,Singh,23,India
31,Madhav,Reddy,38,India
32,Mahika,Gupta,29,India
33,Manish,Joshi,41,India
34,Meera,Rao,27,India
35,Mohit,Mishra,30,India
36,Neha,Desai,28,India
37,Nikhil,Patel,36,India
38,Nisha,Shah,32,India
39,Nitish,Kumar,29,India
40,Pooja,Singh,26,India
41,Pranav,Choudhury,24,India
42,Priya,Mehra,37,India
43,Rahul,Verma,28,India
44,Riya,Saxena,31,India
45,Rohan,Sharma,33,India
46,Ruchi,Agarwal,30,India
47,Sahil,Chopra,27,India
48,Sakshi,Singh,25,India
49,Samir,Reddy,39,India
50,Sanya,Gupta,29,India
51,Sarika,Joshi,42,India
52,Shreya,Rao,28,India
53,Siddharth,Mishra,31,India
54,Sneha,Desai,29,India
55,Sourav,Patel,37,India
56,Swati,Shah,33,India
57,Tanvi,Kumar,30,India
58,Varun,Choudhury,26,India
59,Vidya,Mehra,24,India
60,Vikas,Verma,38,India
61,Vrinda,Saxena,29,India
62,Yash,Sharma,34,India
63,Yamini,Agarwal,31,India
64,Yuvraj,Chopra,27,India
65,Zara,Singh,25,India
67,Michael,Smith,32,USA
68,Emily,Johnson,28,USA
69,David,Brown,45,USA
70,Lisa,Miller,29,USA
71,James,Anderson,37,USA
72,Sarah,Williams,24,USA
73,Robert,Jones,31,USA
74,Mary,Davis,40,USA
75,John,Wilson,26,USA
76,Jennifer,Moore,35,USA
77,William,Taylor,28,USA
78,Linda,Clark,33,USA
79,Charles,Hall,30,USA
80,Patricia,White,39,USA
81,Matthew,Lee,27,USA
82,Ashley,Harris,36,USA
83,Joseph,Martin,34,USA
84,Jessica,King,25,USA
85,Daniel,Johnson,41,USA
86,Karen,Miller,32,USA
87,Mark,Thompson,29,USA
88,Elizabeth,Thomas,28,Canada
89,Andrew,Lee,35,Canada
90,Kimberly,Wong,31,Canada
91,Joshua,Chan,27,Canada
92,Amanda,Li,24,Canada
93,Christopher,Wilson,39,Canada
94,Michelle,Smith,30,Canada
95,Ryan,Brown,38,Canada
96,Nicole,Johnson,26,Canada
97,Justin,Williams,33,Canada
98,Melissa,Jones,37,Canada
99,Daniel,Nguyen,29,Canada
100,Emily,Tran,34,Canada
101,Brandon,Evans,27,Canada
102,Sophia,Garcia,31,UK
103,Oliver,Smith,28,UK
104,Amelia,Johnson,45,UK
105,Harry,Williams,32,UK
106,AVA,Jones,29,UK

Uploading Data to Amazon S3

  1. Open the AWS Management Console and navigate to the S3 service.
  2. Create a new S3 bucket and note its name for later use.
  3. Upload the sample_data.csv file into the bucket.

Data Analysis with Amazon Athena

Creating a New Table in Athena

To create a data source, database, and table using the Amazon Athena query, you can follow these steps:

  1. Open the Amazon Athena console in the AWS Management Console.
  2. In the query editor, you will first need to create a database. Run the following query to create a database:
CREATE DATABASE IF NOT EXISTS amity_data_symposium;

3. Once the database is created, you can switch to the database by selecting it from the drop-down list in the Athena console or by running the following query:

USE amity_data_symposium;

Now you can create the table by running the query you have. Make sure the S3 bucket and folder path is correct:

CREATE EXTERNAL TABLE IF NOT EXISTS sample_data (
customer_id INT,
first_name STRING,
last_name STRING,
age INT,
country STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://amity-university-data-symposium-2023/'
TBLPROPERTIES (
'skip.header.line.count'='1'
);

That’s it! You have created a data source (Amazon S3 bucket), a database, and a table in Amazon Athena.

Please make sure to replace the s3://amity-university-data-symposium-2023/ with the correct path where your sample_data.csv file is stored. Also, ensure that the AWS account has the necessary permissions to access Athena and the S3 bucket.

Running a Sample Query

Your database and table names would be amity_data_symposium and sample_data respectively.

So your query would look like this:

SELECT * FROM “amity_data_symposium”.”sample_data” LIMIT 10;

This query will retrieve the first 10 rows from your sample_data table in the amity_data_symposium database.

Data Visualization with Amazon QuickSight

  1. Navigate to Amazon QuickSight and create a new analysis.
  2. Choose Athena as your data source and connect to the sample_data table.
  3. Create a bar chart visualization to display the count of customers by country.

After connecting to Amazon QuickSight and selecting your Athena table as the data source, you can perform a variety of analyses and create different types of visualizations based on your data.

Given your sample data, here are a few types of analysis and visualizations you might consider:

  1. Bar Chart: This could be used to visualize the count of customers from each country. For example, how many customers are from the USA, Canada, UK, etc.
  2. Pie Chart: This could also be used to represent the distribution of customers by country. Each slice of the pie would represent a different country.
  3. Line Chart: If your dataset includes a time dimension (e.g., the date when each customer joined), you could use a line chart to show the trend of new customers joining over time.
  4. Table: A table could be used to simply display the data, perhaps sorted by age or another interesting dimension.

Remember that the types of analysis and visualizations you can do depend on your dataset. If your dataset includes more fields, especially numerical or time-based fields, then you could create more complex visualizations like heat maps, scatter plots, or time series analyses.

In QuickSight, you can also perform additional data preparation tasks like filtering, aggregating, and creating calculated fields which can be used to create more advanced visualizations.

Conclusion and Takeaways

This demo showcases the potential of AWS in handling various data management tasks. Amazon S3 allows you to securely store data, Athena offers powerful query capabilities, and QuickSight enables you to visualize your data for better insights. These services are fully scalable and can be combined to build complex data management and analysis solutions.

I would like to extend my sincere thanks to Dr. Anuradha Misra, who was my teacher back in 2014 and is currently serving as an Assistant Professor, at Amity University Lucknow Campus. It’s an honor to contribute to the International Symposium on Multidisciplinary Approach for Data Management under her esteemed guidance.

--

--

Varun Kumar Manik

AWS APN Ambassador | SME of DevOps DevSecOps | Cloud Architect & Trainer | Blogger | Youtuber |Chef