Tuesday, December 8, 2015

T-SQL Tuesday #73: IOPS on Azure VM's Naughty to Nicer

Hello Dear Reader!  Last week I posted the invitation to T-SQL Tuesday #73.  The premise? 

As you work with SQL Server look around you.  Is your environment Naughty or Nice?  If it is Naughty what’s wrong with it?  What would you do to fix it?  Do you have a scrooge that is giving you the Christmas chills?  Perhaps you have servers of past, present, and future haunting you.  Maybe you are looking at SQL Server 2016 like some bright shining star in the east.

I don’t have an environment of my own.  Not one that I get to interact with every day.  Recently I’ve had several experiences where some friends with various companies were having issues and the root cause was all hardware.  Memory, SAN, or CPU one had become a bottleneck.

Several had to do with I/O.  One had issues with their SAN, another had issues with their code (causing unnecessary I/O), and another on their Azure VM.

“So Balls,” you say, “What does this have to do with Naughty & Nice?”

As always Dear Reader, thanks for keeping me on track.  For this week I’m going to tackle how IOp’s can affect your SQL Server.   I’ll be combining this with the pseudo code to mimic the issue my client had.  Then a little on Azure VM’s, and how Premium Storage made a big difference. 

THE SET UP

I’ve got a really bad query.  This is from years past.  Lots of unnecessary row by row logic.  Duplicate logic.  It’s bad. 

Sufficient to say, I’m rewriting it with pseudo code because I don’t want to risk insulting the guilty.  I will post the stats though.  We were on an Azure D13 Series VM, and we moved to a DS 13 Series VM.  This fixes nothing by itself.  Click here to see the machine specs.  See below for a quick table.  We will be using the DS3 VM.  We will move from standard storage to premium storage on the same machine to highlight our fix.


SIZE – AZURE CLASSIC PORTAL\CMDLETS & APIS
CPU CORES
MEM
NICS (MAX)
MAX. DISK SIZES – VIRTUAL MACHINE
MAX. DATA DISKS (1023 GB EACH)
CACHE SIZE (GB)
MAX. DISK IOPS & BANDWIDTH
Standard_DS3\same
4
14
4
OS = 1023 GB
8
172
12,800
OUR DEMO MACHINE
Local SSD disk = 28 GB
128 MB per second
Standard_DS13\same
8
56
8
OS = 1023 GB
16
288
25,600
Local SSD disk = 112 GB
256 MB per second

A DS Series VM gives us the ability to use Premium Storage.  We will use that turn bad code into, faster bad code.  Fixing the code comes later.  The first thing I will do is run CrystalDiskmark against my disk. 

Our G drive is standard storage. 


M drive is our premium storage.



The M drive is a 1 TB Premium Storage Drive.  I have done nothing fancy here.  Just attached and formatted.  As you can see we received a little bit better than 128 MB per second sequential in my baseline.

Our really bad query?  It runs in 19:48 seconds on our regular storage.  When I found it in the wild it actually ran for 4 hours.  I could only replicate so much.  Sufficed to say, for this example it will do.



Without tuning the query, but moving the database to premium storage, dropping all data from memory, and rerunning the query we reduced the time to 5:49 seconds.  A roughly 72% improvement.




Premium storage can go faster.  It is a matter of configuring the VM & the Storage to hit the IOP's limits you need.  For example, this is a screen shot from the DS 13 VM I was talking about earlier.  



With a DS 14 we would have been able to go above 512 MB per second.  Premium storage on DS Series VM's is some serious good stuff.

THE WHOLE TRUTH


The query is bad.  Really bad.  I can fix the whole thing by converting it from row by row logic to set based logic.  I can get it down to 2 seconds using the merge statement.

We did.  It was a big fix.  Originally we were hoping to get the process from 4 hours down to 2 hours.  After I was done we had it down to 5 minutes.

I say this often when teaching.  I can throw super-fast hardware at bad code and make it run faster.  But there is still room for improvement.  Fixing the query is for another day.   Today we keep our IO on the ball.   Why is it so fast?


CPU
Memory
SSD
Spinning Disk
Nanosecond
Nanosecond
Microsecond
Millisecond

You can only get as fast as your slowest component.  Premium storage uses SSD's.  We are hastening our retrieval of data from our disks by a factor of x10.  In this case it made our query 72% faster.  There are other things to fix.  The IO issue is one of them.


“So Balls,” you say, “Keep our IO on the ball.  Really?!”


Sorry Dear Reader, I couldn’t resist.  As always, Thank You for stopping by.

Thanks,

Brad





1 comment: