Wednesday, March 7, 2018

SQL MAX vs TOP 1: Part 2! The Revenge of IV30500!

By Steve Endow

I just can't let it go.  I need to know.  I need answers.  I need to solve the mystery.  The riddle.  The enigma.

Why does the MAX function sometimes perform very poorly compared to TOP 1?

I really thought that "MAX vs TOP 1" was a simple question. An easy question.  A spend 10 seconds on Stack Overflow and get the answer type of question.

But I just couldn't leave it alone and had to go and test it for myself.  And open a veritable Pandora's Box of SQL riddles.

In Part 1 of this series, I delved into how MAX and TOP 1 behave in several random queries, and I ended on a query that showed MAX performing quite poorly compared to TOP 1.

After that video, I stumbled across an even simpler query that produced an even more dramatic performance difference, where MAX performed miserably.  But I couldn't figure out why.

In this video, I discuss what I learned about the query and the specific Index Scan that is causing the MAX query to performing so poorly. 


Here's the recap:

When querying some fields, such as the IV30500 POSTEDDT, with no WHERE clause, both MAX and TOP 1 perform virtually the same, with both using a very efficient Index Scan.  50% vs 50% relative costs.



But when I query the TRXSOURCE field, with no WHERE clause, MAX shows 100% relative cost, whereas TOP 1 shows 0% relative cost.



What???

The POSTEDDT query uses an Index Scan.  The TRXSOURCE query uses an Index Scan.  But for some reason with the TRXSOURCE query, MAX is much more costly.

I stared at this result for a few hours trying to figure out why.  I eventually found this tiny little detail.


Notice that the Actual Number of Rows for the MAX Index Scan is 147,316.  That's every row in the table.

By contrast, the TOP 1 Index Scan has Actual Number of Rows = 1.

What is going on?

For some reason, the MAX is having to scan the ENTIRE AK1IV30500 index.  It isn't getting much benefit from the index.

But why?

Unfortunately, I don't yet know.

"Maybe SQL is caching the query execution plan?"

Apparently not.  I tried DBCC FREEPROCCACHE, and saw no change.


"Maybe your statistics are stale and need to be updated?"

Nope.  I tried UPDATE STATISTICS WITH FULLSCAN.  No change


"C'mon Steve, clearly you need to re-index!"

Did that.  I tried DBREINDEX on the specific AK1IV30500 index, as well as the entire IV30500 table.  No change in the execution plan.


I didn't find any standard maintenance task that changed the behavior of the MAX Index Scan.

As a last resort, I used the Import/Export wizard to export all of the data from the IV30500 table into a new table that I called IV30500REBUILD.  I then ran scripts to create all of the same indexes on the REBUILD table, making it identical to the original IV30500 table.

I then ran the MAX and TOP 1 queries on the new REBUILD table.

And like magic, the MAX Index Scan returned just one row.


Same table structure.

Same data.

Same indexes.

But the Index Scan on the new REBUILD table behaves properly.

So there is apparently something about my IV30500 that is causing this problem, and rebuilding the entire table resolves it.  But rebuilding a table isn't exactly a typical SQL maintenance task, so it's not really a solution.

But this is way past my SQL skill level, so I don't yet know what conventional maintenance task might be able to achieve the same results.

I've asked for help from a true SQL expert, so I'm hoping that she will assist and help me figure out this mystery.




You can also find him on Twitter, YouTube, and Google+





No comments: