Thursday, May 22, 2014

Introducing What_To_Compress V2

Hello Dear Reader!  I'm about 32,000 feet in the air leaving St. Louis after a great Performance Tuning Workshop for Pragmatic Works.  While there Jason Strate (@StrateSQL | Blog) and I had a two day class, and I was able meet some great people and speak with the St Louis SSUG about the new features in SQL Server 2014.  It was a good trip, but I'm happy to be on the way home.

"So Balls", you say, "The blog title is Introducing What_To_Compress, what is that?"

Great question Dear Reader!  For quite a few years I've been presenting, blogging, and writing about Compression.  On the Resource Page I have a list of those presentations as well as the scripts I use.  I'd been thinking about putting together a script to give compression recommendations since I first did my deep dive at the PASS Summit on the subject back in 2011.

About a year ago I did just that.  I've tossed this script around to some co-workers, friends, SQL People, and MVP's and asked for feedback.  I'm finally at the point that I'm ready to release the first version.  So without further ado, here it is.

WHAT_TO_COMPRESS V2

First off this is not a stored procedure like Jason's sp_IndexAnalysis script.  I'll probably make it dynamic in the future, but that is a next release.  If you want to jump straight to the download click here to get What_To_CompressV2.sql. 


This takes the Best Practices that I've been preaching about and applies it to telling you what to compress.  It looks at every Index, every Table, by partition, and gathers the In_Row_Data, Row_OverFlow_Data, and Lob_Data counts.  It tells you the Percentage of COMPRESSIBLE and UNCOMPRESSIBLE data per table, what the Scan and Update patterns are for your tables & Indexes, and makes a recommendation on the level of compression you should use.

It also gives you my detailed reasoning behind the recommendation that I've given you.  For example:

"The Percentage of Scan and Seek operations is 0.00% and the average amount of Update operations is 0.00%.  Data that can be compressed makes up 100.00% of this table.  There is no workload for the current table.  Please wait for the usage statistics to become representative of a typical work load.  If this is a typical work load, this is an excellent candidate for Page Compression.  Test with sp_estimate_data_compression_savings.  Remember that it takes 5% of the tables size and moves it to tempDB.  Validate that you have enough room on your server to perform this operation before attempting."

"The Percentage of Scan and Seek operations is 0.60% and the average amount of Update operations is 99.00%.  Data that can be compressed makes up 100.00% of this table.  However Based on the workload of this server this table should not be compressed.  If you apply Row or Page Compression it will have a higher CPU cost because of the low Seek and Scan Ratio.  Test with sp_estimate_data_compression_savings.  Remember that it takes 5% of the tables size and moves it to tempDB.  Validate that you have enough room on your server to perform this operation before attempting."

"The amount of Uncompressible data in this table does not make it a match for compression.  Data that can be compressed makes up 17.12% of this table.  While data that cannot be compressed makes up 82.88% of this table."

There is one parameter within the script that allows you to set the number of Pages in a table that you want to consider for compression.  By default the number is set at 8 pages, but you can increase that if you would like.

"So Balls", you say, "This sounds great but isn't their a built in stored procedure that can estimate size compression already in SQL Server?"

Yes, there is Dear Reader.  The built in stored procedure has a few things that we should discuss.

SP_ESTIMATE_DATA_COMPRESSION_SAVINGS


The first thing you should know before you use a tool is how it works, and what it does.  You wouldn't normally use a nail gun to open a beer.  You could, but it's not the right tool for the job.

The way sp_estimate_data_compression_savings works is that it takes the table that you specify, moves 5% of it into tempdb applies the compression you specify, and then extrapolates that estimate out over the size of your entire table.  It does a nice job of taking fragmentation into account in order not to give you an inaccurate information.  The key phrase that defines my root concern is, *it takes 5% of your table and moves into tempdb*.  For small tables this probably isn't an issue.  For VLDBs that have very large tables, this is a pretty big deal.

There are some well meaning community scripts available on blogs and codeplex that take sp_estimate_data_compression_savings and wrap it in a cursor to estimate the space savings for each table.  They do this estimation for Row and Page compression, for every table in your database.

This step tells us the space savings, but their are other settings we should take into account.  We should look at those before we begin estimating compression savings across the board.  What should we look at first?


  1. Our Allocation Units.  Only IN_ROW_DATA compresses.  Tables with a lot of LOB data types may not see any advantage in compression.  Even if they slightly compress the over head on those tables can make queries less efficient.
  2. Do we read from our tables?  If we do a lot of scans, seeks, and lookups from our tables this could indicate whether Page or Row compression would give us the best performance.
  3. Do we update our tables often?  Notice I said update.  Not delete, not insert, update.  When we apply compression we remove all the extra white space from fixed length data making all data types, that can use compression, in affect variable length fields.  This can lead to increased Page Splits, specifically mid-Page Splits, aka LOP_DELETE_SPLITs.  For more on Page Splits and mid-Page Splits see my blog, How to Find Bad Page Splits.
  4. SP_ESTIMATE_DATA_COMPRESSION_SAVINGS doesn't look at any of these.  Why estimate all the different compression types without first identifying your table that are proper candidates and looking at what the overall size of those tables are.
  5. You have to have Enterprise Edition to run sp_estimate_data_compression_savings.  You can run What_To_Compress on Standard Edition.


I wouldn't avoid using sp_estimate_data_compression_savings.  However, it wouldn't be the first thing that I run when looking at what to compress.

FEEDBACK

Okay Dear Reader, I need your help.  Run this.  Let me know what you think, what it's missing, anything you can think of.  I'll try to get a v3 out in the next couple months based on feedback.  Most of all if it works, drop me a line!  I love hearing success stories on compression.  Send your emails to: bball@pragmaticworks.com.  And as always Dear Reader, Thank you for stopping by.


Thanks,

Brad



2 comments:

  1. Saw this on a tweet. Pretty cool. Feedback, hmmn... several fact tables came back in my results with recommendations to do nothing. Not sure this makes sense, but then again, I need to experiment might be true because only 83mil rows. I'll let you know if I find anything in this effort to suggest, but PDC (pretty dang cool)! Thanks for sharing!

    ReplyDelete
  2. Looking forward to the v3. Any ETA yet?

    ReplyDelete