SQL Basics

SQL DISTINCT

Removing Duplicates with DISTINCT

DISTINCT removes duplicate rows, with DISTINCT ON available in PostgreSQL for specific columns.

Introduction to SQL DISTINCT

The SQL DISTINCT keyword is used in a SELECT statement to remove duplicate rows from the result set. This keyword ensures that the data returned is unique across the specified columns. It is particularly useful when you want to see only the different (distinct) values in a column or a combination of columns.

Using DISTINCT in SQL

To use DISTINCT, simply place it before the column names in your SELECT statement. DISTINCT evaluates the columns from left to right and removes any duplicate rows from the results. Here's the basic syntax:

Example: Removing Duplicate Rows

Consider a table named Employees with the following data:

  • id: 1, name: John, department: Sales
  • id: 2, name: Jane, department: Marketing
  • id: 3, name: John, department: Sales
  • id: 4, name: Jake, department: IT

To get a list of unique names from the Employees table, you would use:

This query returns:

  • John
  • Jane
  • Jake

As you can see, duplicate entries of 'John' are removed from the result set.

PostgreSQL DISTINCT ON Clause

PostgreSQL provides an additional feature called DISTINCT ON, which allows you to select a distinct row based on one or more specified columns. This feature is particularly useful when you need to fetch distinct records with the ability to control which row to pick in case of duplicates based on sorting criteria.

Example: Using DISTINCT ON in PostgreSQL

Consider the same Employees table. If you want to select the first employee from each department by their name, you can use the following query:

This query returns the first employee (alphabetically by name) from each department:

  • id: 4, name: Jake, department: IT
  • id: 2, name: Jane, department: Marketing
  • id: 1, name: John, department: Sales

Here, DISTINCT ON ensures that each department appears only once in the result set, showing the first alphabetical name within each department.

Conclusion

The DISTINCT keyword is a powerful tool in SQL for reducing redundancy in your result sets by eliminating duplicate rows. With PostgreSQL's DISTINCT ON clause, you gain even more control over how distinct records are selected, making it easier to organize and analyze data efficiently.

Previous
ORDER BY
Next
LIKE