
Tuesday, December 8, 2015

T-SQL Tuesday #73: IOPS on Azure VM's Naughty to Nicer

Hello Dear Reader!  Last week I posted the invitation to T-SQL Tuesday #73.  The premise? 

As you work with SQL Server look around you.  Is your environment Naughty or Nice?  If it is Naughty what’s wrong with it?  What would you do to fix it?  Do you have a scrooge that is giving you the Christmas chills?  Perhaps you have servers of past, present, and future haunting you.  Maybe you are looking at SQL Server 2016 like some bright shining star in the east.

I don’t have an environment of my own.  Not one that I get to interact with every day.  Recently I’ve had several experiences where some friends at various companies were having issues and the root cause was all hardware.  In each case memory, SAN, or CPU had become the bottleneck.

Several had to do with I/O.  One had issues with their SAN, another had issues with their code (causing unnecessary I/O), and another on their Azure VM.

“So Balls,” you say, “What does this have to do with Naughty & Nice?”

As always Dear Reader, thanks for keeping me on track.  This week I’m going to tackle how IOPS can affect your SQL Server.  I’ll combine that with pseudocode to mimic the issue my client had.  Then a little on Azure VMs, and how Premium Storage made a big difference.

THE SET UP

I’ve got a really bad query.  This is from years past.  Lots of unnecessary row by row logic.  Duplicate logic.  It’s bad. 

Suffice it to say, I’m rewriting it with pseudocode because I don’t want to risk insulting the guilty.  I will post the stats though.  We were on an Azure D13 Series VM, and we moved to a DS13 Series VM.  That fixes nothing by itself.  Click here to see the machine specs, or see below for a quick table.  For the demo we will be using a DS3 VM, moving from standard storage to premium storage on the same machine to highlight our fix.


Size (Azure Classic Portal\cmdlets & APIs): Standard_DS3\same (OUR DEMO MACHINE)
  CPU cores: 4 | Memory: 14 GB | NICs (max): 4
  Max. disk sizes (virtual machine): OS = 1023 GB, local SSD disk = 28 GB
  Max. data disks (1023 GB each): 8 | Cache size: 172 GB
  Max. disk IOPS & bandwidth: 12,800 IOPS, 128 MB per second

Size (Azure Classic Portal\cmdlets & APIs): Standard_DS13\same
  CPU cores: 8 | Memory: 56 GB | NICs (max): 8
  Max. disk sizes (virtual machine): OS = 1023 GB, local SSD disk = 112 GB
  Max. data disks (1023 GB each): 16 | Cache size: 288 GB
  Max. disk IOPS & bandwidth: 25,600 IOPS, 256 MB per second

A DS Series VM gives us the ability to use Premium Storage.  We will use that to turn bad code into faster bad code.  Fixing the code comes later.  The first thing I will do is run CrystalDiskMark against my disks.

Our G drive is standard storage. 


M drive is our premium storage.



The M drive is a 1 TB Premium Storage Drive.  I have done nothing fancy here.  Just attached and formatted.  As you can see we received a little bit better than 128 MB per second sequential in my baseline.

Our really bad query?  It runs in 19 minutes 48 seconds on our standard storage.  When I found it in the wild it actually ran for 4 hours; I could only replicate so much.  Suffice it to say, for this example it will do.



Without tuning the query, just by moving the database to premium storage, dropping all the data from memory, and rerunning the query, we reduced the run time to 5 minutes 49 seconds.  Roughly a 70% improvement.
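
If you want to reproduce this kind of cold-versus-warm comparison, here is a minimal sketch of one way to flush the buffer pool between runs.  The database and procedure names are placeholders, not the client's actual code, and DBCC DROPCLEANBUFFERS should never be pointed at a production instance.

-- Cold-cache timing sketch; YourDatabase and YourReallyBadQuery are placeholders.
USE YourDatabase;
GO
CHECKPOINT;              -- flush dirty pages so the next command can evict everything
DBCC DROPCLEANBUFFERS;   -- empty the buffer pool, forcing the next run to read from disk
GO
SET STATISTICS TIME ON;  -- capture elapsed time
SET STATISTICS IO ON;    -- capture logical and physical reads
EXEC dbo.YourReallyBadQuery;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO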




Premium storage can go faster still.  It is a matter of configuring the VM and the storage to hit the IOPS limits you need.  For example, this is a screenshot from the DS13 VM I was talking about earlier.



With a DS14 we would have been able to go above 512 MB per second.  Premium storage on DS Series VMs is some serious good stuff.

THE WHOLE TRUTH


The query is bad.  Really bad.  I can fix the whole thing by converting it from row-by-row logic to set-based logic.  I can get it down to 2 seconds using the MERGE statement.

We did.  It was a big fix.  Originally we were hoping to get the process from 4 hours down to 2 hours.  After I was done we had it down to 5 minutes.
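
I can’t post the client’s code, but the shape of the fix was roughly this: replace a cursor that inserted or updated one row at a time with a single set-based MERGE.  Every table and column name below is made up for illustration.

-- Hypothetical tables standing in for the client's schema.
MERGE dbo.TargetTable AS tgt
USING dbo.StagingTable AS src
    ON tgt.BusinessKey = src.BusinessKey
WHEN MATCHED AND tgt.SomeValue <> src.SomeValue THEN
    UPDATE SET tgt.SomeValue  = src.SomeValue,
               tgt.ModifiedOn = SYSDATETIME()
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, SomeValue, ModifiedOn)
    VALUES (src.BusinessKey, src.SomeValue, SYSDATETIME());

One pass over the data instead of thousands of singleton statements is where the drop from hours to minutes comes from.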

I say this often when teaching.  I can throw super-fast hardware at bad code and make it run faster.  But there is still room for improvement.  Fixing the query is for another day.   Today we keep our IO on the ball.   Why is it so fast?


Component        Typical access time
CPU              Nanoseconds
Memory           Nanoseconds
SSD              Microseconds
Spinning Disk    Milliseconds

You can only go as fast as your slowest component.  Premium storage uses SSDs.  We are hastening the retrieval of data from our disks by roughly a factor of ten.  In this case it made our query roughly 70% faster.  There are other things to fix; the IO issue is just one of them.
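
If you want to know whether storage is your slowest component, SQL Server keeps score for you.  Here is a quick sketch against sys.dm_io_virtual_file_stats; reads that routinely average tens of milliseconds point squarely at the disks.

-- Average read and write latency per database file since the instance last started.
SELECT  DB_NAME(vfs.database_id)                              AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)   AS avg_read_latency_ms,
        vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0)  AS avg_write_latency_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
    ON  mf.database_id = vfs.database_id
    AND mf.file_id     = vfs.file_id
ORDER BY avg_read_latency_ms DESC;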


“So Balls,” you say, “Keep our IO on the ball.  Really?!”


Sorry Dear Reader, I couldn’t resist.  As always, Thank You for stopping by.

Thanks,

Brad





Tuesday, December 1, 2015

T-SQL Tuesday #73 Invitation – Naughty or Nice?




Hello Dear Reader!  This is the first Tuesday of the month and you know what that means.  It’s time to announce the T-SQL Tuesday Topic of the month!  This is your opportunity to participate in the largest SQL Blog party on the intrawebs. 

T-SQL Tuesday is an event started by Adam Machanic (@AdamMachanic | Blog) back in 2009.  The basic idea is that one blogger hosts the party and others participate.  We announce the topic on the first Tuesday of the month, December 1st 2015 in this case, and everyone posts their blogs on the second Tuesday of the month, December 8th 2015.  This month the host is none other than……….. drum roll………….. ME!

Since it is the holiday season, I wanted to keep this holiday themed.  I couldn’t help but think of the approaching Christmas holiday.  Every year I get my children to make a list of presents that they would like to get from Santa Claus.  If we are out and about and they start to misbehave, I remind them about their list and that only Nice children get presents.

“So Balls,” you say, “What in the ever loving wide wide world of sports does this have to do with SQL Server?!?”

As you work with SQL Server look around you.  Is your environment Naughty or Nice?  If it is Naughty what’s wrong with it?  What would you do to fix it?  Do you have a scrooge that is giving you the Christmas chills?  Perhaps you have servers of past, present, and future haunting you.  Maybe you are looking at SQL Server 2016 like some bright shining star in the east.

First and foremost the rules. 

Rule 1: Don’t get yourself fired.  If your boss is a scrooge don’t name names, and don’t hint too hard.  It’s okay to protect yourself.  You want to write about events we can look back on and reflect over, not events HR would *love* to know about.  Perhaps a reminder of SQL Server's past would be better than that of SQL present.

Rule 2: Sometime next Tuesday, using GMT (here’s a link to a GMT time converter), publish your blog post.  For example, in the US that would cover 8 pm Monday to 8 pm Tuesday.

Rule 3: Make sure that you include the image at the top of the page helping to identify your post as a T-SQL Tuesday blog.  Then come back here and post a link in the comments so I can find them.  Before the end of the week I'll do a round-up of all the blogs.

Extra Credit!

Tweet your blog with the hash tag #tsql2sday, use SQL Server 2016, & go read someone else’s blog on the subject!

As Always Dear Reader, Thanks for stopping by and I’ll see you next Tuesday!

Thanks,


Brad

Tuesday, September 10, 2013

T-SQL Tuesday #46 Rube Goldberg Machine aka Automating Deleting Older Certificates




Hello Dear Reader!  This is the second Tuesday of the month and you know what that means: T-SQL Tuesday, the largest blog party on the Intrawebs.  T-SQL Tuesday is the brainchild of SQL Community member extraordinaire Adam Machanic (@AdamMachanic | Blog), also the inventor of the word “Twote”, as in “To misquote a Tweet”; used in a sentence it sounds like “He Twoted me wrong”.  This month our host is Rick Krueger (@DataOger | Blog).  So Rick, what’s our topic?

“My first exposure to Rube Goldberg Machines was playing the game Mouse Trap as a child. I work almost exclusively on the SQL development side of the house, where we sometimes build crazy creative solutions to solve business problems. We generally know the ‘right’ way to do things, but pesky issues like budgets, personnel, and deadlines get in the way. So, we channel our inner MacGyver, grab a handful of paper clips and some duct tape, and then do things with SQL Server that we know shouldn’t be done (in an ideal world). And we hope nobody ever finds out how we bent the rules, because we know they will judge us (as we would judge them) and call our work a <gasp>HACK</gasp>.
So, if you would please, dust off one of those skeletons and tell us how you got really creative with SQL Server, instead of doing it ‘the right way’. In other words, tell us about your ugly SQL baby. If you’re worried about saving face, feel free to describe how you would have implemented the solution if you lived in that ideal world.”
I love Mouse Trap and MacGyver!  Over the years as a DBA sometimes you have to work with what you’ve got.  Other times your boss says do A, you say the best way to achieve A is by doing B & C, and they say do A.  I’ve got two of these that I can think of off the top of my head.  In one we used Change Data Capture in lieu of Auditing (don’t ask me why, because that version of SQL also had Auditing.  Oh hello A…..).  The other may actually prove useful.  Which one to choose?
“So Balls”, you say, “What’s the useful one?”
Good call Dear Reader, we’ll go with the useful one!

OUT OUT D@MN SPOT CERTIFICATE

When you are using Transparent Data Encryption one of the most important things is the certificate.  Once you enable it on a production database that certificate is just as important as your database backup.  Why?  Because in case of a catastrophic failure that backup is dependent on the certificate.  If you cannot restore the certificate to a new instance your backup is useless.  *There are some workarounds to this using backups of the Master DB, but we’ll save that for another day.*

When you look at setting up maintenance plans for your server you should create a job to back up your certificate daily.  A certificate is only 1 KB in size, a very tiny file.  If you use a private key to encrypt your certificate it is only 1 KB in size as well.  So even if you leave a year of them on your hard drive you haven’t taken up 1 MB.
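
A minimal version of that daily backup looks something like this.  The certificate name, file paths, and password are placeholders for your own, and further down I cover keeping that password out of plain text.

-- Sketch only: certificate name, file paths, and password are placeholders.
USE master;
GO
BACKUP CERTIFICATE TDE_Cert
    TO FILE = 'F:\CertBackups\TDE_Cert_20151208.cer'
    WITH PRIVATE KEY
    (
        FILE = 'F:\CertBackups\TDE_Cert_20151208.key',
        ENCRYPTION BY PASSWORD = '<UseAStrongPasswordHere>'
    );
GO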

As a DBA sometimes you can be anal retentive, er, a neat freak.  I don’t keep a year’s worth of backups on hand, so why would I keep a year’s worth of certificates on hand?  I’d like a process to automatically delete them and only keep the last two weeks, or the last month, on hand: whatever matches up with my backup retention policy.

The problem is the automated cleanup task doesn’t work here.  Sure, you can go into the maintenance plan wizard and make a task that looks in a directory for .CER files, but the true problem lies in the backup history metadata.  You have to custom script out the certificates.  If you didn’t think to add a line to the backup set history table with the .cer and .key extensions and the path to your Private Key or Certificate backups, then the job won’t work.

Inserting records into the MSDB tables could work, but as a DBA new to TDE that thought hadn’t crossed my mind.  I wanted a way to back up my certificates and delete my old ones.  So I built one.

MY RUBE GOLDBERG MACHINE

This is a demo I do in my TDE presentation.  It’s up on my Resource Page and has been for some time.  Today I realized I’d never blogged about it.  My scripts lean heavily on xp_cmdshell.  I had an audit setting in my environment that wouldn’t allow that to be left enabled on my servers, so in this script I turn it on at the beginning and off at the end.  The nice thing about the script is that I unit tested it: even if there is an error in the script, the sp_configure changes are server-level commands that occur outside of transactions, so they run no matter what.  The script runs quickly, but it will make logged entries in the SQL Server Error Log stating that xp_cmdshell was turned on and off.  My audit team could live with that, so I was able to implement it.
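
The wrapper around the script looks roughly like this.  sp_configure with RECONFIGURE takes effect immediately and is not part of any transaction, which is exactly the behavior I’m relying on above.

-- Turn xp_cmdshell on for the duration of the job, then turn it back off.
-- 'show advanced options' has to be on before xp_cmdshell is visible to sp_configure.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;

/* ...certificate backup and cleanup work goes here... */

EXEC sp_configure 'xp_cmdshell', 0;
RECONFIGURE;
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE;
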
I also like to use a private key and a password for my TDE encryption.  I don’t want the password sitting around in plain text in the job either.  So I make a database called TDE.  In it I have one table called tdeKeys.  I put two columns in there: one is the name of the certificate that a private key will be created for, the other is the password to use for that private key.  In secure environments you could set up column-level encryption to ensure the password is not in plain text even in the table field.  The demo scripts I’m going to give you don’t use column-level encryption.  They contain a function that retrieves the password for a given certificate name.
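
Recreated from that description, the supporting objects look something like this.  The column names and the function name are my guesses for illustration, not necessarily the ones in the demo scripts.

-- Illustrative only: object and column names are guesses based on the description above.
CREATE DATABASE TDE;
GO
USE TDE;
GO
CREATE TABLE dbo.tdeKeys
(
    certificateName sysname       NOT NULL PRIMARY KEY,
    keyPassword     nvarchar(128) NOT NULL  -- a candidate for column-level encryption
);
GO
CREATE FUNCTION dbo.fn_GetKeyPassword (@certificateName sysname)
RETURNS nvarchar(128)
AS
BEGIN
    RETURN (SELECT keyPassword
            FROM   dbo.tdeKeys
            WHERE  certificateName = @certificateName);
END;
GO
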
Next we will create the dynamic script to back up the certificate.  Note that I back up the Master Key as well; if you are using column-level encryption you’ll want a copy of the Master Key.  You’ll need to specify the path that you want the certificates backed up to, and the certificate name.
Finally we will create the script that uses xp_cmdshell to traverse directories and delete our old backups.  You will need to edit the file path in this script and insert the Master Key and certificate names on line 74.  Then on line 103 you will need to alter the DATEADD function.  Right now it would only keep 4 days of certificates on hand; edit the DATEADD to match up with your backup retention policy.
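
I’m not reproducing the full script here, but the deletion step boils down to handing xp_cmdshell a command that removes anything older than your retention window.  The real script walks the directories itself and filters with DATEADD; as a simpler sketch of the same idea, assuming all the backups land in one folder, forfiles can do the age filtering for you.

-- Simplified stand-in for the cleanup step: delete certificate and key backups
-- older than 4 days.  Adjust the path and the /d value to your retention policy.
DECLARE @cmd varchar(512);

SET @cmd = 'forfiles /p "F:\CertBackups" /m *.cer /d -4 /c "cmd /c del @path"';
EXEC master..xp_cmdshell @cmd, no_output;

SET @cmd = 'forfiles /p "F:\CertBackups" /m *.key /d -4 /c "cmd /c del @path"';
EXEC master..xp_cmdshell @cmd, no_output;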

Want to see the whole presentation live?  I’ve done this for Pragmatic Works Training on the T’s, click Here to watch.  You’ve got to sign up for a Pragmatic Works account if you don’t already have one, and you’ll get free info on all the free training we do monthly!


OVERKILL

“So Balls”, you say, “Is this over kill?”
<soapbox>
Well Dear Reader it depends on your environment.  You must consider Level of Effort and Level of Response, LOE and LOR.
LOE is one part you, one part the hacker.  The more secure you make something, the less likely a hacker will keep going after it, and the further they will have to go.  On your part it is how far you are willing to go to do your job.  We can all get dissuaded from going the extra mile sometimes.  Your LOE should be governed by your organization’s LOR.
LOR is the response that your organization will have to the event.  One thing I like to tell folks: if you are ever in the position that your security has been breached, and you are then talking to your boss, his/her boss, the CIO, a high-ranking officer in the military, or one or more high-ranking government officials, trust me when I say that you want to be able to say you took every last step possible to protect the data under your care.  The more detail you can provide the better.  So, overkill?  Maybe.  CYA?  Absolutely.  Thankful that no fault on your part was found and you still have a job?  Yep.
Having been in this position, trust me: take the extra couple of steps.  If you ever need them you’ll be glad you did.
</soapbox>
Thanks to Rick for hosting this today, and as always Thank You Dear Reader for stopping by!
Thanks,

Brad

Monday, July 15, 2013

T-SQL Tuesday #44 Wrap UP



Hello Dear Reader, what a busy week we had last week!  I’ve got 3 big Thank You’s that I would like to give.

First off I’d like to say Thank You to Adam Machanic for allowing me to host T-SQL Tuesday #44.  When I first started blogging, I participated in T-SQL Tuesday to keep myself writing at least once a month.  Hosting one seemed like a very far off goal at the time.  Thank you Adam for coming up with the concept and helping to promote growth for all of us bloggers out there!

Secondly my SQL Family, SQL Friends, and fellow Bloggers.  Without you writing there is no content.  You put your hard-earned time into this effort and I Thank you.  Not gonna lie, I’m getting a little verklempt!

Lastly Dear Reader, Thank You.  Without you all we may as well not put words to digital paper.

The subject was Second Chances and I was very impressed with the blogs, all around great job everyone.   18 spectacular blogs all about different types of Second Chances.



Koen Verbeeck (@Ko_Ver | Blog) Wrote about free time on the job, and how he would approach it now vs. years ago.



Jes Schultz Borland (@grrl_geek | Blog) Blogs about her first experience with SQL Server Clustering and the lessons she’s learned along the way.  She also has the funniest picture of the day; I believe people stopped and looked at me as I laughed.


Steve, our resident Aussie/DBA in Exile (Blog), talks to us about some of the many different mistakes he has made.  I believe a groan and the word “ouch” may have audibly escaped my lips, stirring my cube mates, when I read about the SAN bullet point.  Most importantly he reminds us to celebrate failure as well as success, and how to do it with a sense of humor.

Joey D’Antoni (@jdanton | Blog), a mountain of a man with a stare that could straighten out T-SQL at 200 paces, tells us all about the moment you realize that you’ve done something wrong, and a time where a QA task was run, but not against QA.  Probably the best quote of the day goes to him: “the bead of sweat moment”.  A beautiful description of a feeling that all of us have either had, or will have.


Oliver Asmus (@OliverAsmus|Blog) shares with us a story of woe when he was a Junior DBA.  It involves a Delete statement that has a where clause, but only the delete portion was highlighted.  The dreaded where clause, we know thee well.  Fortunately a Sr DBA was there to help out and Oliver adds some nice thoughts on explicit transactions.

The always excellent Robert Pearl (@PearlKnows|Blog) gives us some Pearls of Wisdom on sending emails.  In our youth we tend to fire them off rather quickly, sometimes that leads to trouble.  This is great advice for anyone in the business world with an email account!

Lance England (@LanceEngland|Blog) reminds us of all the things that can go wrong when we leave our cell phone at our desk during lunch….. and kick off a large update transaction on prod…… and do not execute a commit or rollback.  The most important part is once you make the mistake how you learn from it.

Stuart Moore (@napalmgram|Blog) reminds us that it isn’t just mistakes that we want a second shot at.  There are quite a few times where you did a good or even great job, but you know you could have done better with more time.

Martyn Jones (@MartynJones|Blog) Wrote his FIRST T-SQL TUESDAY Blog this week!  First off Thanks for Joining the Party Martyn!  Martyn takes us through an exercise in making sure you’re executing your code in the right environment.  Great Point!  I always like to check SSMS to validate my environment, once bitten twice shy.


The Editor in Chief of SQLServerCentral.com, Steve Jones (@way0utwest | Blog), weighs in on SQL Slammer and the havoc that can be done by not patching.  Ahhh 2002, we all remember you well!



WRAP IT UP 

When you look at the people that participated this past month you see MVPs, an MCM, consultants, and DBAs, and none of us are perfect.  Mistakes are a part of life, and it isn’t the mistake that is important but how you handle it.

The next time you make one I hope you remember that, take a deep breath, and just keep going.  That Second Chance to do things differently will be just around the corner.

As always Thanks for stopping by.

Thanks,


Brad

Tuesday, July 9, 2013

T-SQL Tuesday #44 How to Take Down Prod in 30 Seconds

Hello Dear Reader!  Welcome to my blog on T-SQL Tuesday #44 Second Chances.  I’m hosting this month, and we are writing all about second chances.  My second chance comes from the not too distant past.

One day the DBA team was given a toy frog as part of some swag from a vendor.  We did what any group of grown men would do.  We put a dunce hat on it.  We decided that whoever screwed up next would have it sitting on their cubicle wall, and we would pass it around as the next offender appeared.  A fun little way to pass the time and rib one another.

No sooner had I participated in developing this badge of shame than I earned it.  The title says it all.  How to take down prod in 30 seconds, but I should clarify.  Not some, not half, but allllllll your clustered servers in just 30 seconds.


I have to give a special Thank You to my buddy Dan Taylor (@DBABulldog | Blog).  You see, I remembered I had the frog; I had forgotten what I had done to earn it.  It was sitting on the edge of my mind, but no matter how hard I tried I could not remember it.  It was sitting in a fog just out of reach, an itch that I couldn’t scratch.  A few words out of his mouth and it all came flooding back.  As good friends we’ve swapped many stories over the years, and without his memory (which is better than mine) I would have had to go with a less interesting tale of woe.

“So Balls”, you say, “How did you screw up?”

Well Dear Reader, I had an unfortunate convergence of unexpected anomalies that peaked in a spectacular crescendo of a mistyped password.  Yes, a mistyped password.  My second chance would be typing it in correctly.  The next best thing is explaining it so you hopefully never have to feel the same pain.

I SOLEMNLY SWEAR I AM UP TO NO GOOD


I had a new production SQL 2008 R2 Instance to install.  Things were going pretty smooth.  I got up to the screen where you punch in the password for the service account, and that’s when it all went wrong.  

I mistyped the password.  GASP, SHOCK, AWE, OTHER SUCH EXPRESSIONS!!!!!

Normally I would agree, no big deal, but the next time I punched in the password I didn't get a password error; I got an error informing me that the account was locked.  Enter the series of unfortunate events.

Imagine you live in a world where all of the Prod servers are using the same service account.  Imagine that you've suggested this be changed but it ended up on the “That’s a good idea, we’ll tackle that another day” pile.  Imagine that you are not using Microsoft Clustering for your clustered servers, and that the inventive Server Engineers rolled their own “health check”.  Imagine that your current password policy locks an account out when you mistype the password somewhere between 3 and 8 times.

“But Balls”, you say, “You only typed your password once?  Not 3-8 times!”

Exactly.  There’s a bug in the installer for SQL Server 2008 R2.  When you click the next button after filling out the service account information, you authenticate at least twice for every account you type in.  Not so in SQL 2005 or SQL 2008 (not R2).  But in SQL 2008 R2 one mistyped password counts a whole lot more.  Depending on the services being installed, enough to lock out an account.

Then you are left to watch the manual health checks fail because the account is locked out and attempt a cluster failover, only to fail on the other side because the SQL Service account is locked out there too.

You catch your error quickly.  Run to the Team Lead, report what has happened, hoping this can get fixed before the inevitable outages begin.  Then you race back to your desk.  You have an uncomfortable phone call to place to the help desk.

Imagine that while this unfolds you are waiting on hold for the help desk to open a ticket (you have to follow protocol) that will get assigned to an engineer, who will pass it on to AD Services.  Cue the uncomfortable elevator music.

Co-workers scrambling in the background, like the bullpen of a busy newspaper.  Someone is keeping an active whiteboard of what servers are now down; every minute someone in your cube starts to say “Have you….” only to be cut off by your response, “Still on hold”.  Cue the music.

Other co-workers are fielding calls from App Teams reporting that their applications are offline.  Others are trying to reach managers who can bypass a well-orchestrated bureaucratic separation of duties that results in elevator music while you are still on hold.  Did I mention being on hold?  While on hold, forty-five minutes can feel like weeks.

The saving grace (for my job) was that the bug I found was easy to duplicate.  It was easy to see that this behavior was not in previous versions.  As an added bonus, those service accounts started becoming unique real quick.

DEMO: THE BUG I LEARNED ALL ABOUT

We’ll skip ahead a bit.  Say you are installing SQL Server 2008 R2.  We’ve gotten up to the Server Configuration screen where we are punching in our passwords.  First let’s open up our Event Viewer, click on the Security log, and clear it out.


*If this were anything other than my personal VM I would back up the log so we could restore it; do not clear out a security log on a prod server without proper guidance.*


Now the only event in our log is the event denoting that our log has been cleared.  Back to SQL Server. 
 

We will click on the Use the same account for all SQL Server services button and type in our .\s-sqlsrv service account.  Definitely not following best practices here.  SQL Engine, SQL Agent, and SSIS all getting the same service account. 



Let’s type the password in wrong and see what happens.  Click OK.  Click Next.



SQL reacted just like we thought.  Theoretically we should have 1 bad login check, right?  The same user name was in use; we don’t need to validate it 3 more times, one should do.  Perhaps at most we’ve got three validation checks, right?


Let’s head over to our trusty Security log and see.

We’ve gone from 1 event to 13 in the click of a button.  How many failed logins do we have?  Not 1, 2, 3, 4, 5, 6, 7, but 8 failed logins from one attempt.  You’ll get this whether or not you use the “same account for all services” button.

You may be asking: did this get fixed in SQL 2012?



One look at the installer and you can see the button is gone.  Let’s punch in the same service account name and an incorrect password.



And now on to our Security log.



Wow!  Six entries; now we are looking at 3 entries per account.  Nope, it didn’t get any better.

WRAP IT UP

Long story short, make sure those passwords are correct.  Personally I like to use a utility like KeePass to generate, store, and copy my passwords from.  Anything that keeps me from typing.  Or, as the case may be, mistyping. :)

As always Dear Reader, Thanks for stopping by!

Thanks,

Brad