Category :Data Analytics
Published on: 27-02-2024
In the world of database management, mastering advanced SQL techniques can significantly enhance your ability to manipulate data efficiently. In this guide, we'll explore four powerful SQL techniques - indexing, window functions, common table expressions (CTEs), and conditional aggregation - with practical examples using a sample employee database. Let's dive in!
Let's start by creating a simple table to represent our employee database:
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
department_id INT,
salary DECIMAL(10, 2)
);
Now, let's populate the table with some sample data:
INSERT INTO employees (id, name, department_id, salary) VALUES (1, 'John Doe', 1, 60000.00), (2, 'Jane Smith', 1, 65000.00), (3, 'Alice Johnson', 2, 70000.00), (4, 'Bob Brown', 2, 72000.00), (5, 'Charlie Lee', 3, 55000.00), (6, 'David Wang', 3, 58000.00);
Indexing helps speed up data retrieval by creating a sorted list of values for quick access. Let's create an index on the salary column:
CREATE INDEX idx_salary ON employees (salary);
This query creates an 'index' named 'idx_salary' on the 'salary' column of the employees table, improving the performance of queries involving filtering or sorting by salary.
With this index in place, queries filtering by salary will execute faster.
Window functions allow you to perform calculations across a set of rows related to the current row. Let's use a window function to calculate the average salary for each department:
SELECT name, department_id, salary, AVG(salary) OVER (PARTITION BY department_id) AS avg_salary_by_department FROM employees;
This query selects the 'name', 'department_id, and salary columns from the 'employees' table while utilizing a window function to calculate the average salary for each department. The 'AVG(salary) OVER (PARTITION BY department_id)' clause partitions the data by the department_id', enabling the calculation of the average salary within each department.
Additionally, other important window functions such as ROW_NUMBER(), RANK(), LEAD(), LAG(), SUM(), MIN(), MAX(), COUNT(), PERCENT_RANK(), and CUME_DIST() offer diverse functionalities for analytical tasks within specified partitions of a result set. These window functions enhance the analytical capabilities of SQL queries, allowing for efficient and flexible data analysis.
CTEs allow you to define temporary result sets that can be referenced multiple times within a single query. Let's use a CTE to find departments with an average salary above a certain threshold:
WITH department_avg_salary AS ( SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id )
SELECT department_id FROM department_avg_salary WHERE avg_salary > 60000;
This query utilizes a Common Table Expression (CTE) named 'department_avg_salary' to calculate the average salary for each department. Within the CTE, the 'AVG(salary)' function computes the average salary for each distinct 'department_id' by aggregating salary values from the 'employees' table. The result set is grouped by the 'department_id'. Subsequently, the main query selects the 'department_id' from the 'department_avg_salary CTE where the calculated 'avg_salary is greater than Rs. 60,000.
This query simplifies complex calculations by breaking down the calculation of average salary into a CTE, enhancing the readability and maintainability of the main query.
Conditional aggregation allows you to apply aggregate functions selectively based on specific conditions. Let's calculate the average salary for each department, excluding salaries below certain threshold:
SELECT department_id, AVG(CASE WHEN salary >= 60000 THEN salary END) AS avg_salary FROM employees GROUP BY department_id;
This query calculates the average salary for each department while excluding salaries below Rs.60,000. It utilizes a conditional aggregation technique where the 'AVG()' function is applied conditionally within a 'CASE' statement. The 'CASE WHEN salary >= 60000 THEN salary END' expression evaluates to the salary value if it meets the condition (salary >= Rs.60,000), otherwise it returns NULL. The 'AVG()' function then calculates the average of these non-NULL values for each department. The result set is grouped by the 'department_id'. This approach allows for customizing aggregates based on specific conditions, enabling tailored data analysis. It provides insights into the average salary distribution across different departments, excluding lower salary values, thus facilitating informed decision-making regarding departmental resources and compensation structures.
This query calculates the average salary for each department, excluding salaries below Rs.60,000.
In conclusion, mastering advanced SQL techniques such as indexing, window functions, common table expressions (CTEs), and conditional aggregation is highly beneficial for data analysts. By honing these skills, data analysts can extract valuable insights more efficiently from complex datasets. In real-world scenarios, In22labs extensively utilizes these techniques. We leverage indexing to speed up data retrieval, window functions to analyze data contextually, CTEs to simplify complex queries, and conditional aggregation to tailor aggregations based on specific conditions. These practices enhance decision-making and contribute to business success by facilitating faster and more intelligent data analysis.
Tags
Author
Kaviarasan G