What's faster, SELECT DISTINCT or GROUP BY in MySQL?

Question

If I have a table

CREATE TABLE users (
  id int(10) unsigned NOT NULL auto_increment,
  name varchar(255) NOT NULL,
  profession varchar(255) NOT NULL,
  employer varchar(255) NOT NULL,
  PRIMARY KEY  (id)
)

and I want to get all unique values of profession field, what would be faster (or recommended):

SELECT DISTINCT u.profession FROM users u

or

SELECT u.profession FROM users u GROUP BY u.profession

?

You could test for yourself as quickly as ask the question. Irritatingly, it is almost impossible to construct a scenario in which DISTINCT outperforms GROUP BY - which is annoying because clearly this is not the purpose of GROUP BY. However, GROUP BY can produce misleading results, which I think is reason enough for avoiding it. — Strawberry, Aug 11, 2014 at 22:19
There's another duplicate with a different answer. see MySql - Distinct vs Group By <<< it says GROUP BY is better — Aung Myo Linn, Jun 3, 2016 at 9:27
Please see here if you want to measure the time difference between DISTINCT and GROUP BY running your query. — Aung Myo Linn, Jun 7, 2016 at 8:28
Also, don't use DISTINCT and GROUP BY in the same query. It can cause huge slowness issues for some queries. — akenney, Feb 10, 2023 at 18:49

guest · Accepted Answer · 2014-08-11 22:12:08Z

304

They are essentially equivalent to each other (in fact this is how some databases implement DISTINCT under the hood).

If one of them is faster, it's going to be DISTINCT. This is because, although the two are the same, a query optimizer would have to catch the fact that your GROUP BY is not taking advantage of any group members, just their keys. DISTINCT makes this explicit, so you can get away with a slightly dumber optimizer.

When in doubt, test!

edited Aug 11, 2014 at 22:12

guest

6,63832 silver badges46 bronze badges

answered Feb 24, 2009 at 12:09

SquareCog

19.5k8 gold badges50 silver badges63 bronze badges

89

DISTINCT will be faster only if you DON'T have an index (as it doesn't sort). When you do have an index and it's used, they're synonyms.
– Quassnoi
Feb 27, 2009 at 15:11
14

The definition of DISTINCT and GROUP BY differ in that DISTINCT doesn't have to sort the output, and GROUP BY by default does. However, in MySQL even a DISTINCT+ORDER BY might still be faster than a GROUP BY due to the extra hints for the optimizer as explained by SquareCog.
– rustyx
Jan 25, 2015 at 15:03
1

DISTINCT is much faster with large amount data.
– Pankaj Wanjari
Dec 28, 2015 at 13:19
13

I tested this, and found that on an indexed column, mysql, group by was about 6x slower than distinct with a fairly complicated query. Just adding this as a datapoint. About 100k rows. So test it and see for yourselves.
– Lizardx
Feb 21, 2016 at 22:32
see MySql - Distinct vs Group By <<< it says GROUP BY is better
– Aung Myo Linn
Jun 3, 2016 at 9:25

| Show 2 more comments

Quassnoi · Accepted Answer · 2009-02-27 14:50:08Z

115

If you have an index on profession, these two are synonyms.

If you don't, then use DISTINCT.

GROUP BY in MySQL sorts results. You can even do:

SELECT u.profession FROM users u GROUP BY u.profession DESC

and get your professions sorted in DESC order.

DISTINCT creates a temporary table and uses it for storing duplicates. GROUP BY does the same, but sortes the distinct results afterwards.

So

SELECT DISTINCT u.profession FROM users u

is faster, if you don't have an index on profession.

answered Feb 27, 2009 at 14:50

Quassnoi

420k91 gold badges624 silver badges619 bronze badges

10

You can add ORDER BY NULL to the GROUP BY to avoid the sort.
– Ariel
Aug 20, 2014 at 3:21
Still slower even with grouping by null
– Thanh Trung
Jun 7, 2019 at 11:05
@ThanhTrung: what is slower than what?
– Quassnoi
Jun 7, 2019 at 11:12
@Quassnoi groupby slower than distinct even if avoiding sort
– Thanh Trung
Jun 7, 2019 at 11:47
1

Note: Order qualifiers on GROUP BY were deprecated in MySQL 8.
– Matthew Lenz
Feb 27, 2020 at 13:49

Add a comment |

daniel.gindi · Accepted Answer · 2013-05-16 10:49:53Z

27

All of the answers above are correct, for the case of DISTINCT on a single column vs GROUP BY on a single column. Every db engine has its own implementation and optimizations, and if you care about the very little difference (in most cases) then you have to test against specific server AND specific version! As implementations may change...

BUT, if you select more than one column in the query, then the DISTINCT is essentially different! Because in this case it will compare ALL columns of all rows, instead of just one column.

So if you have something like:

// This will NOT return unique by [id], but unique by (id,name)
SELECT DISTINCT id, name FROM some_query_with_joins

// This will select unique by [id].
SELECT id, name FROM some_query_with_joins GROUP BY id

It is a common mistake to think that DISTINCT keyword distinguishes rows by the first column you specified, but the DISTINCT is a general keyword in this manner.

So people you have to be careful not to take the answers above as correct for all cases... You might get confused and get the wrong results while all you wanted was to optimize!

answered May 16, 2013 at 10:49

daniel.gindi

3,4751 gold badge30 silver badges36 bronze badges

5

Although this question is about MySQL it should be noted that the second query will work only in MySQL. Nearly every other DBMS will reject the second statement because it's an invalid use of the GROUP BY operator.
– user330315
Sep 15, 2013 at 10:44
Well, "nearly" is a problematic definition :-) It would be much more helpful if you state a specific DBMS that you have tested to see that it generates an error for this statement.
– daniel.gindi
Sep 15, 2013 at 11:53
4

Postgres, Oracle, Firebird, DB2, SQL Server for starters. MySQL :sqlfiddle.com/#!2/6897c/1 Postgres: sqlfiddle.com/#!12/6897c/1 Oracle: sqlfiddle.com/#!12/6897c/1 SQL Server: sqlfiddle.com/#!6/6897c/1
– user330315
Sep 15, 2013 at 13:09
And to confuse us :) , mysql allows using select distinct(a), b which means select distinct a, b, which means distinct on the pair.
– Marinos An
Mar 4, 2021 at 13:01

Add a comment |

Raj · Accepted Answer · 2012-07-12 09:56:29Z

17

Go for the simplest and shortest if you can -- DISTINCT seems to be more what you are looking for only because it will give you EXACTLY the answer you need and only that!

edited Jul 12, 2012 at 9:56

Raj

22.6k14 gold badges101 silver badges146 bronze badges

answered Feb 24, 2009 at 13:37

Tim

2723 silver badges9 bronze badges

Add a comment |

Pavel Stehule · Accepted Answer · 2013-09-15 05:59:00Z

9

well distinct can be slower than group by on some occasions in postgres (dont know about other dbs).

tested example:

postgres=# select count(*) from (select distinct i from g) a;

count 

10001
(1 row)

Time: 1563,109 ms

postgres=# select count(*) from (select i from g group by i) a;

count
10001
(1 row)

Time: 594,481 ms

http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I

so be careful ... :)

edited Sep 15, 2013 at 5:59

Pavel Stehule

44.2k6 gold badges95 silver badges98 bronze badges

answered Nov 18, 2011 at 13:56

OptilabWorker

991 silver badge1 bronze badge

GROUP BY is also faster than DISTINCT in AWS Redshift, because GROUP BY uses a XN HashAggregate and DISTINCT uses a XN Unique. It is the same problem that older versions of Postgres have.
– Bernardo Loureiro
Jul 1, 2020 at 18:03

Add a comment |

TheTechGuy · Accepted Answer · 2013-08-26 19:27:20Z

7

Group by is expensive than Distinct since Group by does a sort on the result while distinct avoids it. But if you want to make group by yield the same result as distinct give order by null ..

SELECT DISTINCT u.profession FROM users u

is equal to

SELECT u.profession FROM users u GROUP BY u.profession order by null

edited Aug 26, 2013 at 19:27

TheTechGuy

16.9k16 gold badges116 silver badges140 bronze badges

answered Jun 21, 2013 at 1:09

Ranjith

4756 silver badges17 bronze badges

is equal to SELECT profession FROM users GROUP BY profession
– user5490177
Dec 16, 2019 at 20:05

Add a comment |

Ahmed Ekri · Accepted Answer · 2014-02-11 18:47:45Z

6

In MySQL, "Group By" uses an extra step: filesort. I realize DISTINCT is faster than GROUP BY, and that was a surprise.

edited Feb 11, 2014 at 18:47

Ahmed Ekri

4,6413 gold badges24 silver badges42 bronze badges

answered Feb 11, 2014 at 18:27

Carlos

611 silver badge1 bronze badge

Add a comment |

amartynov · Accepted Answer · 2009-02-24 12:20:18Z

5

It seems that the queries are not exactly the same. At least for MySQL.

Compare:

describe select distinct productname from northwind.products
describe select productname from northwind.products group by productname

The second query gives additionally "Using filesort" in Extra.

answered Feb 24, 2009 at 12:20

amartynov

4,1452 gold badges32 silver badges36 bronze badges

2

They are the same in terms of what they get, not in terms of how they get it. An ideal optimizer would execute them the same way, but MySQL optimizer is not ideal. Based on your evidence, it would seem that DISTINCT would go faster -- O(n) vs O(n*log n).
– SquareCog
Feb 24, 2009 at 15:07
So, "using filesort" is essentially bad thing?
– vava
Feb 25, 2009 at 0:17
In this case it is, because you don't need to sort (you would if you needed the groups). MySQL sorts in order to place the same entries together, and then get groups by scanning the sorted file. You just need distincts, so you just have to hash your keys while doing a single table scan.
– SquareCog
Feb 25, 2009 at 15:36
1

Add ORDER BY NULL to the GROUP BY version and they will be the same.
– Ariel
Aug 20, 2014 at 3:22

Add a comment |

Grumpy · Accepted Answer · 2015-06-09 09:12:37Z

After heavy testing we came to the conclusion that GROUP BY is faster

SELECT sql_no_cache opnamegroep_intern FROM telwerken WHERE opnemergroep IN (7,8,9,10,11,12,13) group by opnamegroep_intern

635 totaal 0.0944 seconds Weergave van records 0 - 29 ( 635 totaal, query duurde 0.0484 sec)

SELECT sql_no_cache distinct (opnamegroep_intern) FROM telwerken WHERE opnemergroep IN (7,8,9,10,11,12,13)

635 totaal 0.2117 seconds ( almost 100% slower ) Weergave van records 0 - 29 ( 635 totaal, query duurde 0.3468 sec)

Ivan Dossev · Accepted Answer · 2012-05-21 17:15:41Z

2

(more of a functional note)

There are cases when you have to use GROUP BY, for example if you wanted to get the number of employees per employer:

SELECT u.employer, COUNT(u.id) AS "total employees" FROM users u GROUP BY u.employer

In such a scenario DISTINCT u.employer doesn't work right. Perhaps there is a way, but I just do not know it. (If someone knows how to make such a query with DISTINCT please add a note!)

answered May 21, 2012 at 17:15

Ivan Dossev

5655 silver badges12 bronze badges

Add a comment |

Aung Myo Linn · Accepted Answer · 2016-06-03 10:08:20Z

Here is a simple approach which will print the 2 different elapsed time for each query.

DECLARE @t1 DATETIME;
DECLARE @t2 DATETIME;

SET @t1 = GETDATE();
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);

SET @t1 = GETDATE();
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);

OR try SET STATISTICS TIME (Transact-SQL)

SET STATISTICS TIME ON;
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET STATISTICS TIME OFF;

It simply displays the number of milliseconds required to parse, compile, and execute each statement as below:

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 2 ms.

Beep beep · Accepted Answer · 2009-02-27 14:55:01Z

1

SELECT DISTINCT will always be the same, or faster, than a GROUP BY. On some systems (i.e. Oracle), it might be optimized to be the same as DISTINCT for most queries. On others (such as SQL Server), it can be considerably faster.

answered Feb 27, 2009 at 14:55

Beep beep

19k12 gold badges66 silver badges78 bronze badges

Add a comment |

ConteudoAnimal · Accepted Answer · 2015-07-16 23:24:28Z

1

This is not a rule

For each query .... try separately distinct and then group by ... compare the time to complete each query and use the faster ....

In my project sometime I use group by and others distinct

answered Jul 16, 2015 at 23:24

ConteudoAnimal

291 gold badge1 silver badge3 bronze badges

Add a comment |

tehvan · Accepted Answer · 2009-02-24 12:09:28Z

0

If you don't have to do any group functions (sum, average etc in case you want to add numeric data to the table), use SELECT DISTINCT. I suspect it's faster, but i have nothing to show for it.

In any case, if you're worried about speed, create an index on the column.

answered Feb 24, 2009 at 12:09

tehvan

10.3k5 gold badges28 silver badges31 bronze badges

Add a comment |

Daniel R · Accepted Answer · 2014-06-09 19:37:04Z

If the problem allows it, try with EXISTS, since it's optimized to end as soon as a result is found (And don't buffer any response), so, if you are just trying to normalize data for a WHERE clause like this

SELECT FROM SOMETHING S WHERE S.ID IN ( SELECT DISTINCT DCR.SOMETHING_ID FROM DIFF_CARDINALITY_RELATIONSHIP DCR ) -- to keep same cardinality

A faster response would be:

SELECT FROM SOMETHING S WHERE EXISTS ( SELECT 1 FROM DIFF_CARDINALITY_RELATIONSHIP DCR WHERE DCR.SOMETHING_ID = S.ID )

This isn't always possible but when available you will see a faster response.

da Bich · Accepted Answer · 2021-10-09 23:06:04Z

0

in mySQL i have found that GROUP BY will treat NULL as distinct, while DISTINCT does not.
Took the exact same DISTINCT query, removed the DISTINCT, and added the selected fields as the GROUP BY, and i got many more rows due to one of the fields being NULL.

So.. I tend to believe that there is more to the DISTINCT in mySQL.

answered Oct 9, 2021 at 23:06

da Bich

5181 gold badge4 silver badges15 bronze badges

Add a comment |

Collectives™ on Stack Overflow

What's faster, SELECT DISTINCT or GROUP BY in MySQL?

16 Answers 16

Not the answer you're looking for? Browse other questions tagged
mysql
sql
database
group-by
distinct
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

16 Answers 16

Not the answer you're looking for? Browse other questions tagged mysqlsqldatabasegroup-bydistinct or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
mysql
sql
database
group-by
distinct
or ask your own question.