Second semester Altschool examination submission #1

Open
wants to merge 18 commits into
base: main
117 changes: 65 additions & 52 deletions infra_setup/init.sql
@@ -1,60 +1,73 @@

-- Create schema
CREATE SCHEMA IF NOT EXISTS ALT_SCHOOL;


-- create and populate tables
create table if not exists ALT_SCHOOL.PRODUCTS
(
id serial primary key,
name varchar not null,
price numeric(10, 2) not null
);

create table
if not exists ALT_SCHOOL.PRODUCTS (
id serial primary key,
name varchar not null,
price numeric(10, 2) not null
);

COPY ALT_SCHOOL.PRODUCTS (id, name, price)
FROM '/data/products.csv' DELIMITER ',' CSV HEADER;
FROM
'/data/products.csv' DELIMITER ',' CSV HEADER;

-- setup customers table following the example above

-- TODO: Provide the DDL statement to create the ALT_SCHOOL.CUSTOMERS table

-- TODO: provide the command to copy the customers data in the /data folder into ALT_SCHOOL.CUSTOMERS



-- TODO: complete the table DDL statement
create table if not exists ALT_SCHOOL.ORDERS
(
order_id uuid not null primary key,
-- provide the other fields
);


-- provide the command to copy orders data into POSTGRES


create table if not exists ALT_SCHOOL.LINE_ITEMS
(
line_item_id serial primary key,
-- provide the remaining fields
);


-- provide the command to copy ALT_SCHOOL.LINE_ITEMS data into POSTGRES


-- set up the events table following the example provided
create table if not exists ALT_SCHOOL.EVENTS
(
-- TODO: PROVIDE THE FIELDS
);

-- TODO: provide the command to copy ALT_SCHOOL.EVENTS data into POSTGRES







create table
if not exists ALT_SCHOOL.CUSTOMERS (
customer_id uuid primary key,
device_id uuid NOT NULL,
"location" varchar(255) NOT NULL,
currency varchar(10) NULL
);

-- copy the customers data in the /data folder into ALT_SCHOOL.CUSTOMERS
COPY ALT_SCHOOL.CUSTOMERS (customer_id, device_id, "location", currency)
FROM
'/data/customers.csv' DELIMITER ',' CSV HEADER;

create table
if not exists ALT_SCHOOL.ORDERS (
order_id uuid not null primary key,
customer_id uuid not null,
"status" varchar(50) not null,
checked_out_at timestamp not null
);

-- copy orders data into POSTGRES
COPY ALT_SCHOOL.ORDERS (order_id, customer_id, "status", checked_out_at)
FROM
'/data/orders.csv' DELIMITER ',' CSV HEADER;

create table
if not exists ALT_SCHOOL.LINE_ITEMS (
line_item_id serial primary key,
order_id uuid NOT NULL,
item_id int8 NOT NULL,
quantity int8 NOT NULL
);

-- copy ALT_SCHOOL.LINE_ITEMS data into POSTGRES
COPY ALT_SCHOOL.LINE_ITEMS (line_item_id, order_id, item_id, quantity)
FROM
'/data/line_items.csv' DELIMITER ',' CSV HEADER;

-- setup the events table
create table
if not exists ALT_SCHOOL.EVENTS (
event_id serial primary key,
customer_id uuid NOT NULL,
event_data jsonb NOT NULL,
event_timestamp timestamp NOT NULL
);

-- copy ALT_SCHOOL.EVENTS data into POSTGRES
COPY ALT_SCHOOL.EVENTS (
event_id,
customer_id,
event_data,
event_timestamp
)
FROM
'/data/events.csv' DELIMITER ',' CSV HEADER;
239 changes: 239 additions & 0 deletions questions/answers.sql
@@ -0,0 +1,239 @@
-- Question 2a.1
/*
What is the most ordered item, based on the number of times it appears in an order cart that checked out successfully?

To find the most ordered item, two CTEs are used: most_ordered_items and most_ordered_items_rank.
1. most_ordered_items: selects only products in orders that checked out successfully. Note that the sum of 'quantity' per product is used as
the number of times the product appears in a successfully checked-out cart.
2. most_ordered_items_rank: uses rank() to rank the products by that total, in descending order.
The final query keeps the rows where row_rank equals 1.

Note: ORDER BY with LIMIT 1 is avoided in the final query because it returns a single row and would drop products tied for first place.
*/

with
most_ordered_items as (
select
p.id as product_id,
p.name as product_name,
sum(li.quantity) as num_times_in_successful_orders
from
alt_school.orders o
join alt_school.line_items li using (order_id)
join alt_school.products p on li.item_id = p.id
where
o.status = 'success'
-- status is fixed by the filter above, so it is not needed in the grouping
group by
p.id,
p.name
),
most_ordered_items_rank as (
select
*,
rank() over (
order by
num_times_in_successful_orders desc
) row_rank
from
most_ordered_items
)
select
product_id,
product_name,
num_times_in_successful_orders
from
most_ordered_items_rank
where
row_rank = 1;
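
-- Illustrative sketch, not part of the graded answer: why rank() is used above instead of ORDER BY ... LIMIT 1.
-- The product names and totals are made-up assumptions, chosen so that two products tie for first place.
with
demo_totals (product_name, total_quantity) as (
values
('item_a', 10),
('item_b', 10),
('item_c', 7)
),
demo_rank as (
select
product_name,
total_quantity,
rank() over (
order by
total_quantity desc
) as row_rank
from
demo_totals
)
select
product_name,
total_quantity
from
demo_rank
where
row_rank = 1;
-- Both tied rows (item_a and item_b) pass the row_rank = 1 filter,
-- whereas "order by total_quantity desc limit 1" would return only one of them.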

-- Question 2a.2
/*
Without considering currency, and without using the line_items table, find the top 5 spenders.

To find the top 5 spenders, three CTEs are used: order_quantity, spender, and spender_rank.
1. order_quantity: selects the cart events (every event type except 'checkout' and 'visit') of customers who have at least one event with status 'success'.
2. spender: computes each customer's total spend as the sum of quantity multiplied by product price; events without a quantity are skipped.
3. spender_rank: uses rank() to rank the customers by total spend, in descending order.
The final query keeps the rows where row_rank is less than or equal to 5.

Note: ORDER BY with LIMIT 5 is avoided in the final query because it would cut off customers tied for fifth place.
*/
with
order_quantity as (
select
customer_id,
e.event_data ->> 'event_type' as event_type,
e.event_data ->> 'item_id' as item_id,
e.event_data ->> 'quantity' as quantity
from
alt_school.events e
where
customer_id in (
select distinct
customer_id
from
alt_school.events e
where
e.event_data ->> 'status' = 'success'
)
and e.event_data ->> 'event_type' not in ('checkout', 'visit')
),
spender as (
select
customer_id,
sum(cast(quantity as integer) * price) as total_spend
from
order_quantity o
join alt_school.products p on cast(o.item_id as integer) = p.id
where
quantity is not null
group by
customer_id
),
spender_rank as (
select
customer_id,
location,
total_spend,
rank() over (
order by
total_spend desc
) as row_rank
from
spender
join alt_school.customers using (customer_id)
)
select
customer_id,
location,
total_spend
from
spender_rank
where
row_rank <= 5;
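
-- Illustrative sketch, not part of the graded answer: the ->> operator returns text,
-- which is why the spender CTE casts quantity and item_id before using them.
-- The JSON payload and the price 9.99 are made-up assumptions for demonstration only.
with
demo_event (event_data) as (
values
('{"event_type": "add_to_cart", "item_id": 3, "quantity": 2}'::jsonb)
)
select
pg_typeof(event_data ->> 'quantity') as extracted_type,
cast(event_data ->> 'quantity' as integer) * 9.99 as line_total,
event_data ->> 'status' as missing_key
from
demo_event;
-- extracted_type is text and line_total is 19.98.
-- missing_key is NULL because the key is absent from the payload, which is the case
-- the "quantity is not null" filter in the spender CTE guards against.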


-- Question 2b.1
/*
Determine the most common location (country) where successful checkouts occurred.

To find it, two CTEs are used: location_count and location_count_rank.
1. location_count: selects only successful checkout events and counts them per customer location.
2. location_count_rank: uses rank() to rank the locations by number of successful checkouts, in descending order.
The final query keeps the rows where row_rank equals 1.
*/

with
location_count as (
select
location,
count(1) as checkout_count
from
alt_school.events e
join alt_school.customers using (customer_id)
where
e.event_data ->> 'event_type' = 'checkout'
and e.event_data ->> 'status' = 'success'
group by
location
),
location_count_rank as (
select
location,
checkout_count,
rank() over (
order by
checkout_count desc
) row_rank
from
location_count
)
select
location,
checkout_count
from
location_count_rank
where
row_rank = 1;


-- Question 2b.2
/*
Identify the customers who abandoned their carts and count the number of events (excluding visits) that occurred before the abandonment.

One CTE is used, followed by a final query.
1. event_group: counts all events, grouped by customer_id, event_type, and status.
2. Final query: a subquery returns the distinct customers who checked out successfully. Excluding those customers leaves the customers who abandoned their carts.
Events of type 'visit' are also excluded.
The remaining event counts are summed per customer_id to give num_events, the number of events before the abandonment.
*/
with
event_group as (
select
customer_id,
e.event_data ->> 'event_type' as event_type,
e.event_data ->> 'status' as status,
count(1) as event_count
from
alt_school.events e
group by
customer_id,
e.event_data ->> 'event_type',
e.event_data ->> 'status'
)
select
customer_id,
sum(event_count) as num_events
from
event_group
join alt_school.customers c using (customer_id)
where
customer_id not in (
select distinct
customer_id
from
event_group
where
event_type = 'checkout'
and status = 'success'
)
and event_type != 'visit'
group by
customer_id;
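
-- Illustrative sketch, not part of the graded answer: the exclusion logic above on a tiny made-up data set.
-- The customer ids, event types, and the 'failed' status are assumptions. The anti-join is written with
-- NOT EXISTS, an alternative to the NOT IN used above, not the submission's own form.
with
demo_events (customer_id, event_type, status) as (
values
('c1', 'visit', null),
('c1', 'add_to_cart', null),
('c1', 'checkout', 'success'),
('c2', 'visit', null),
('c2', 'add_to_cart', null),
('c2', 'remove_from_cart', null),
('c2', 'checkout', 'failed')
)
select
d.customer_id,
count(*) as num_events
from
demo_events d
where
d.event_type != 'visit'
and not exists (
select
1
from
demo_events s
where
s.customer_id = d.customer_id
and s.event_type = 'checkout'
and s.status = 'success'
)
group by
d.customer_id;
-- Only c2 is returned, with num_events = 3: c1 is excluded by the successful checkout,
-- and c2's unsuccessful checkout is counted as one of its non-visit events.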

-- Question 2b.3
/*
Find the average number of visits per customer, considering only customers who completed a checkout.

One CTE is used, followed by a final query.
1. event_group: counts the distinct event_data 'timestamp' values of 'visit' events per customer_id,
restricted by a subquery to customers who checked out successfully.
The final query returns the average of event_count, rounded to 2 decimal places.
*/
with
event_group as (
select
customer_id,
count(distinct e.event_data ->> 'timestamp') as event_count
from
alt_school.events e
where
customer_id in (
select distinct
customer_id
from
alt_school.events e
where
e.event_data ->> 'status' = 'success'
)
and e.event_data ->> 'event_type' = 'visit'
group by
customer_id
)
select
round(avg(event_count), 2) as average_visits
from
event_group;
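
-- Illustrative sketch, not part of the graded answer: how count(distinct ...) followed by avg() behaves.
-- The customer ids and timestamps are made-up assumptions; c1 has one duplicated timestamp.
with
demo_visits (customer_id, visit_ts) as (
values
('c1', '2024-01-01 10:00'),
('c1', '2024-01-01 10:00'),
('c1', '2024-01-02 09:00'),
('c2', '2024-01-03 12:00')
),
per_customer as (
select
customer_id,
count(distinct visit_ts) as visit_count
from
demo_visits
group by
customer_id
)
select
round(avg(visit_count), 2) as average_visits
from
per_customer;
-- c1 contributes 2 (the duplicate timestamp is counted once) and c2 contributes 1, so average_visits is 1.50.
-- A customer with no visit rows produces no group at all, so such customers do not pull the average down.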