What is the difference between UNION and UNION ALL?

This link may help you codefari.com/2015/10/what-is-difference-between-union-and.html


The implication of this is that UNION is much less performant, as it must scan the result for duplicates.

UNION ALL will indeed be more performant, specifically due to the lack of the distinct sort. My general practice is to use UNION ALL unless I specifically want duplicates removed.
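To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables t1 and t2 and their contents are made up for illustration, and other database engines may pick different strategies for removing duplicates.

```python
import sqlite3

# In-memory database with two hypothetical tables that share the value 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (val INTEGER);
    CREATE TABLE t2 (val INTEGER);
    INSERT INTO t1 VALUES (1), (2), (2);
    INSERT INTO t2 VALUES (2), (3);
""")

# UNION removes duplicates, including those that come from a single input.
union_rows = conn.execute(
    "SELECT val FROM t1 UNION SELECT val FROM t2"
).fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute(
    "SELECT val FROM t1 UNION ALL SELECT val FROM t2"
).fetchall()

# Sort on the Python side, since neither query has an ORDER BY.
print(sorted(union_rows))      # three distinct values
print(sorted(union_all_rows))  # all five rows
```

The extra de-duplication step is where the cost difference discussed above comes from.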

Just noticed that there are a lot of good comments/answers here, so I turned on the wiki flag and added a note about performance...

UNION ALL can be slower than UNION in real-world cases where the network, such as the internet, is a bottleneck. The cost of transferring many duplicate rows can exceed the query execution time benefit. This has to be analyzed on a case-by-case basis.

AdamCaviness Your comment doesn't quite make sense.

"all selected columns need to be of the same data type" -- actually, things aren't that strict (not a good thing from a relational model point of view!). The SQL standard says their respective column descriptors must be the same except in name.

"In order to remove duplicates the result set must be sorted" -- maybe you have a particular vendor in mind but there are no vendor-specific tags on the question. Even if there were, could you prove that duplicates cannot be removed without sorting?

DISTINCT will "implicitly" sort the results, because removing duplicates is quicker on a sorted set. This does not mean the returned result set is actually sorted that way, but in most cases DISTINCT (and therefore UNION) will internally sort the set of results.

This will omit rows and therefore fail to produce the expected result if a contains NULL values. Also, it still does not return the same result as a UNION - UNION also removes duplicates that are returned by the subqueries, whereas your approach won't.
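The approach being criticized is not shown in this thread, so the query below is only an assumption about what it may have looked like: a UNION ALL whose second branch filters with NOT IN. The tables t1 and t2 and the column a are hypothetical. The sketch shows both problems named above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (a INTEGER);
    INSERT INTO t1 VALUES (1), (1), (NULL);
    INSERT INTO t2 VALUES (1), (2);
""")

# Reference result: a real UNION.
real_union = conn.execute(
    "SELECT a FROM t1 UNION SELECT a FROM t2"
).fetchall()

# Assumed emulation: UNION ALL plus a NOT IN filter on the second branch.
emulated = conn.execute("""
    SELECT a FROM t1
    UNION ALL
    SELECT a FROM t2 WHERE a NOT IN (SELECT a FROM t1)
""").fetchall()

# Because t1.a contains NULL, "2 NOT IN (1, 1, NULL)" evaluates to NULL
# rather than true, so the row with a = 2 is silently dropped.
print((2,) in real_union, (2,) in emulated)

# The duplicate 1 inside t1 also survives, which UNION would have removed.
print(real_union.count((1,)), emulated.count((1,)))
```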

FrankSchmitt - thanks for this answer; this bit about subqueries is exactly what I wanted to know!

Nitpick: UNION ALL isn't "offered" by T-SQL. UNION ALL is part of the ANSI SQL standard and not specific to MS SQL Server.

The 'Nitpick' comment could imply that you can't use "Union All" in TSQL, but you can. Of course, the comment doesn't say that, but someone reading it may infer it.

There is no SQL Server tag on this question. I think recommending the option that returns duplicates just because it usually performs best is the wrong advice.

onedaywhen I guess the OP used the phrase "SQL Servers" as a synonym for all RDBMSs (e.g. MySQL, PostgreSQL, Oracle, SQL Server). The wording is unfortunate, though (and of course, I might be mistaken).

FrankSchmitt: none of the products you listed are truly RDBMSs :)

onedaywhen care to elaborate? At least en.wikipedia.org/wiki/Relational_database_management_system seems to agree with me - it explicitly mentions Microsoft SQL Server, Oracle Database and MySQL. Or are you nitpicky about the difference between Oracle and Oracle Database e.g. ?

Everything in this answer has been said already, is too confusing to be useful (suggesting joins over unions when they do different things, giving "blocking" as a reason without explaining what you mean by that or which database servers it applies to), or is highly misleading (your percentages in your screenshot are not applicable to real actual use of UNION/UNION ALL).

Blocking Operators are well known operators in TSQL. Everything that blocking operators do can be achieved by Joins but not vice versa. Distinct Sort operation is circled in the picture to show why union all performs better than union and also to show exactly where it exists in the execution plan. Feel free to add more data to the tables T1 and T2 to play around with the percentages!

You technically CAN produce the results of a union using a combination of joins and some really nasty CASE expressions, but it makes the query darn-near impossible to read and maintain, and in my experience it is also terrible for performance. Compare: select foo.bar from foo union select fizz.buzz from fizz against select case when foo.bar is null then fizz.buzz else foo.bar end from foo join fizz where foo.bar is null or fizz.buzz is null

DBA Your answer is only relevant for users of MS SQL Server. The OP never mentioned the RDBMS they're using - they might be using MySQL, PostgreSQL, Oracle, SQLite, ...

Your second picture suggests the two are mutually exclusive when they are not. The picture should rather show the same as the first but with the 'intersection ellipse' shown a second time. Actually, on second thoughts, because the union all result is not a set, you should make no attempt to draw it using a Venn diagram!

"Union Result set is sorted in ascending order" -- Unless there is an ORDER BY, sorted results are not guaranteed. Maybe you have a particular SQL vendor in mind (even then, ascending order of what exactly...?) but this question has no vendor-specific tags.

"merges the contents of two structurally-compatible tables" -- I think you've stated this part really well :)

Really? For a four-row result?! I would think this is a scenario where you would want to use UNION to convey intent (i.e. no duplicates) because UNION ALL is unlikely to give any real-life performance gain in absolute terms.

True! UNION might change the order of the two sub-results.

This is wrong. A UNION will NOT sort the result in ascending order. Any ordering you see in a result without using ORDER BY is pure coincidence. The DBMS is free to use any strategy it thinks is efficient to remove the duplicates. This might be sorting, but it could also be a hashing algorithm or something entirely different, and the strategy will change with the number of rows. A union that appears sorted with 100 rows might not be with 100,000 rows.

Without an ORDER BY clause on the query, the RDBMS is free to return the rows in any sequence. The observation that the result set from a UNION operation is returned "in ascending order" is only a byproduct of a "sort unique" operation performed by the database. The behavior observed is not guaranteed. So don't rely on it. If the specification is to return rows in a particular order, then add an appropriate ORDER BY clause.
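As a small illustration of that last point, here is a sketch using Python's sqlite3 module with made-up tables t1 and t2. The ORDER BY at the end applies to the whole compound query, so the order no longer depends on how the engine happens to remove duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (val INTEGER);
    CREATE TABLE t2 (val INTEGER);
    INSERT INTO t1 VALUES (3), (1);
    INSERT INTO t2 VALUES (2), (3);
""")

# The trailing ORDER BY sorts the combined, de-duplicated result.
ordered_rows = conn.execute("""
    SELECT val FROM t1
    UNION
    SELECT val FROM t2
    ORDER BY val DESC
""").fetchall()

print(ordered_rows)  # [(3,), (2,), (1,)]
```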

But your first chunk of results could be one row duplicated many times: how useful is that?!

How does this add any value compared to the accepted answer?

Nick It is a shorter answer.

Shorter might be an advantage if you had to read a significant part of the accepted answer to get this data. But in this case the accepted answer contains all this information in the very first sentence after which it goes on to discuss the implications of the difference in detail.