Performance Analysis & Reporting for Adventure Works
The AdventureWorks database is a fictional dataset designed by Microsoft to simulate a real-world business scenario for a global manufacturing company specializing in bicycles, components, and accessories.
It encompasses extensive, interconnected tables spanning sales, production, purchasing, customer information, and human resources, making it ideal for comprehensive data analysis projects.
I designed this project by proposing a meaningful business question, along with several ad-hoc inquiries that fall under its umbrella, in order to:
Demonstrate my skills in SQL querying for data extraction, operating in the SQL Server Management Studio;
Demonstrate my proficiency in data reporting via Power BI, which involves using Power Query for data cleaning and transformation, data modeling, and creating interactive visualizations.
Let’s dive in…
Let’s assume the overarching business question I am tasked to look into is:
“Which product categories and subcategories are driving the highest sales and profit margins across different regions, and how can we optimize inventory and marketing efforts based on historical trends?”
And I am brought in as an outside analytics consultant, provided with access to client’s database, tasked to look into the inquiry.
The database is properly normalized following a snowflake schema and branches out in three major directions:
Human resources & Person;
Production & Purchasing;
Sales;
The scope of the present project fortunately do not require us to leverage all 80+ tables available, but to only focus on the Sales & Products related dimensions and facts.
Relationships between key tables are visualized as above. Let’s briefly walk through them one by one…
-
high-level details for each sales transaction, including order dates, statuses, and financial summaries like total amounts, taxes, and freight charges. It links orders to customers, salespeople, shipping methods, and geographic territories, providing comprehensive tracking and reporting capabilities. Additionally, it includes timestamps and unique identifiers for data integration and auditing purposes.
-
line-item-level details for each sales order, including product IDs, quantities, unit prices, and discounts applied. It links to the Sales Order Header table via a unique identifier, allowing a breakdown of each order into individual products or services. This table is essential for tracking itemized sales data, calculating order totals, and analyzing product-level performance.
-
high-level details for each sales transaction, including order dates, statuses, and financial summaries like total amounts, taxes, and freight charges. It links orders to customers, salespeople, shipping methods, and geographic territories, providing comprehensive tracking and reporting capabilities. Additionally, it includes timestamps and unique identifiers for data integration and auditing purposes.
-
organize products into a hierarchical structure for better classification.
ProductCategory contains broad categories of products, such as "Bikes" or "Clothing," with each category assigned a unique ID and name.
ProductSubcategory provides more granular classification within each category, linking subcategories to their parent categories using a foreign key and including details like subcategory ID and name.
-
Currency: The Currency table stores information about the currencies used in transactions, including currency codes and their corresponding names.
TaxRate: The TaxRate table defines tax percentages applicable to specific regions or products based on tax authorities.
SpecialOffer: The SpecialOffer table contains promotional discounts, including descriptions, discount percentages, and validity dates for marketing campaigns.
Territory: The Territory table organizes sales regions, linking territories to country or region groups and providing details like region codes and sales goals.
Before I can properly tackle the ad hoc queries and overarching business question, three tasks were done to achieve a thorough understanding of the database and its subsidiaries. The tasks were:
Understanding Features Contained By Each Table, achieved by:
Running a few “SELECT * FROM”s;
Inspecting the “Columns” folder under each table in Object Explorer.
3. Ensuring I have the correct formula for calculation of sales profits, achieved by:
Joining relevant tables;
Iteratively experiment with including different variables to determine the correct formula for calculating total sales;
2. Understanding relationships between tables and the appropriate joining keys, achieved by generating the table schema shown previously
I was eventually able to rule out irrelevant variables (e.g. currency exchange rate, volume discount, freight, tax), and conclude that in this specific case:
Contribution Margin =
( Unit Price * (1 - Unit Price Discount) - Unit Cost ) * Order Quantity
With that, I feel comfortable moving forward and build queries to answer specific ad hoc inquiries. Let’s look at a few examples…
Q1: Generate a report that shows the customers who placed orders in consecutive years, as well as their total revenue increase or decrease between those years.
Solution:
Group all individual orders by customer IDs and year of order placement;
Sum up the order sizes (TotalDue) to create a column of total yearly consumption;
Perform a self-join using the above CTE;
Get both the yearly consumption of a customer in any given year, plus their consumption in the corresponding prior year.
Q2: Generate a report showing the product name, year, quantity, and price for the first year of every product sold.
Solution:
Group all order line records by SKU and year of the order date;
Use a window function to assign a ranking to each year that a given SKU was sold in ascending order;
Sum up the quantity ordered and amount paid (line total);
Using the above CTE, filter for rows where year sold has a ranking of one, which equates to the first year when a SKU was sold.
Q3