SQL Basics
SQL DISTINCT
Removing Duplicates with DISTINCT
DISTINCT removes duplicate rows, with DISTINCT ON available in PostgreSQL for specific columns.
Introduction to SQL DISTINCT
The SQL DISTINCT keyword is used in a SELECT statement to remove duplicate rows from the result set. This keyword ensures that the data returned is unique across the specified columns. It is particularly useful when you want to see only the different (distinct) values in a column or a combination of columns.
Using DISTINCT in SQL
To use DISTINCT, simply place it before the column names in your SELECT statement. DISTINCT evaluates the columns from left to right and removes any duplicate rows from the results. Here's the basic syntax:
Example: Removing Duplicate Rows
Consider a table named Employees
with the following data:
- id: 1, name: John, department: Sales
- id: 2, name: Jane, department: Marketing
- id: 3, name: John, department: Sales
- id: 4, name: Jake, department: IT
To get a list of unique names from the Employees
table, you would use:
This query returns:
- John
- Jane
- Jake
As you can see, duplicate entries of 'John' are removed from the result set.
PostgreSQL DISTINCT ON Clause
PostgreSQL provides an additional feature called DISTINCT ON, which allows you to select a distinct row based on one or more specified columns. This feature is particularly useful when you need to fetch distinct records with the ability to control which row to pick in case of duplicates based on sorting criteria.
Example: Using DISTINCT ON in PostgreSQL
Consider the same Employees
table. If you want to select the first employee from each department by their name, you can use the following query:
This query returns the first employee (alphabetically by name) from each department:
- id: 4, name: Jake, department: IT
- id: 2, name: Jane, department: Marketing
- id: 1, name: John, department: Sales
Here, DISTINCT ON ensures that each department appears only once in the result set, showing the first alphabetical name within each department.
Conclusion
The DISTINCT keyword is a powerful tool in SQL for reducing redundancy in your result sets by eliminating duplicate rows. With PostgreSQL's DISTINCT ON clause, you gain even more control over how distinct records are selected, making it easier to organize and analyze data efficiently.