SQL [Week 5-10]

Advanced SQL Topics [Week 9]

Normalization

Normalization is a process in database design that organizes tables to reduce redundancy and improve data integrity. It ensures that tables are structured to minimize duplication of information and avoid anomalies during data manipulation (e.g., insertions, updates, and deletions). There are several levels of normalization, known as Normal Forms (NFs).

1NF: First Normal Form

  • Eliminate duplicate columns from the same table.
  • Ensure each table cell contains only atomic (indivisible) values, and each record must be unique.

Example:

A non-normalized table might look like this:

IDNamePhone Numbers
1John123-4567, 789-4567
2Alice321-6547, 654-7894

In 1NF, we would split the phone numbers into separate rows:

IDNamePhone Number
1John123-4567
1John789-4567
2Alice321-6547
2Alice654-7894

2NF: Second Normal Form

  • Meet all requirements of 1NF.
  • Remove subsets of data that apply to multiple rows and place them in separate tables.
  • Ensure that every non-key attribute is fully dependent on the entire primary key (in the case of composite keys).

Example:

Consider this table that violates 2NF:

OrderIDCustomerIDCustomerNameProductIDProductName
1101John201Laptop
2102Alice202Tablet

Here, CustomerName depends only on CustomerID, not OrderID. To achieve 2NF, we should split the customer data into a separate table:

Customers Table:

CustomerIDCustomerName
101John
102Alice

Orders Table:

OrderIDCustomerIDProductID
1101201
2102202

Products Table:

ProductIDProductName
201Laptop
202Tablet

3NF: Third Normal Form

  • Meet all requirements of 2NF.
  • Ensure that non-key attributes depend only on the primary key, and no transitive dependencies (i.e., a non-key attribute should not depend on another non-key attribute).

Example:

Consider the following table:

OrderIDProductIDProductNameProductPrice
1201Laptop1000
2202Tablet500

Here, ProductName and ProductPrice depend on ProductID, not OrderID. To achieve 3NF, we should move product details into a separate table:

Products Table:

ProductIDProductNameProductPrice
201Laptop1000
202Tablet500

Subqueries and Common Table Expressions (CTE)

Subquery

A subquery is a query nested inside another SQL query. Subqueries are often used in WHERE or FROM clauses to retrieve data based on the results of another query.

Example:

SELECT * 
FROM orders 
WHERE total_amount = (SELECT MAX(total_amount) FROM orders);

In this example, the subquery retrieves the maximum total_amount, and the main query retrieves all orders that have that value.

Common Table Expressions (CTE)

A CTE is a temporary result set that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs are more readable and reusable compared to subqueries.

Syntax:

`WITH cte_name AS (
    SELECT column1, column2
    FROM table_name
    WHERE condition
)
SELECT * 
FROM cte_name;` 

Example:


`WITH SalesCTE AS (
    SELECT employee_id, SUM(sales_amount) AS total_sales
    FROM sales
    GROUP BY employee_id
)
SELECT employee_id, total_sales
FROM SalesCTE
WHERE total_sales > 10000;` 

This CTE calculates total sales for each employee and then filters employees with sales greater than 10,000.

Window Functions

Window functions perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions, window functions do not collapse the result set.

ROW_NUMBER

The ROW_NUMBER() function assigns a unique sequential integer to rows within a result set.

Syntax:


`SELECT ROW_NUMBER() OVER (ORDER BY column_name) AS row_num, *
FROM table_name;` 

Example:


`SELECT ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank, employee_name, salary
FROM employees;` 

This query assigns a unique rank to employees based on their salary, with the highest salary ranked first.

RANK and DENSE_RANK

  • RANK() assigns a rank to rows, allowing for gaps in the ranking where there are ties.
  • DENSE_RANK() assigns ranks without gaps.

Syntax (RANK):


`SELECT RANK() OVER (ORDER BY salary DESC) AS rank, employee_name, salary
FROM employees;` 

Syntax (DENSE_RANK):


`SELECT DENSE_RANK() OVER (ORDER BY salary DESC) AS rank, employee_name, salary
FROM employees;` 

LEAD and LAG

  • LAG() allows you to access data from the previous row in the result set.
  • LEAD() allows you to access data from the following row in the result set.

Example (LAG):


`SELECT employee_name, salary, 
       LAG(salary, 1) OVER (ORDER BY salary DESC) AS previous_salary
FROM employees;` 

This query shows each employee's salary along with the salary of the employee ranked just before them.

Example (LEAD):


`SELECT employee_name, salary, 
       LEAD(salary, 1) OVER (ORDER BY salary DESC) AS next_salary
FROM employees;` 

This query shows each employee's salary along with the salary of the employee ranked just after them.

Conclusion

This week introduces crucial advanced SQL concepts such as normalization, subqueries, CTEs, and window functions. Mastering these will help you write efficient, maintainable, and optimized SQL queries for handling complex data and performance scenarios.