Tuesday, January 25, 2011

Vote For SQL Rally! And Why You Should Vote for Page and Row Compression.



I received an email on behalf of the Organizers of SQL Rally from Mr. Kendal Van Dyke (Blog|Twitter), and it felt like a call to action.  It asked that those of us who have submitted a topic blog about it and post on Twitter, Facebook, and LinkedIn.  It had some quotes from Andy Warren (Blog | Twitter); here are Andy's words.

"Vote for me" is a start, but why vote for you? Show me why I need to learn the material and why I should pick you to teach it to me. Do you have experience? Credentials? Passion for the topic? Have you been paying your dues in our community and earned the chance to step up? What we’re trying to do here is put more of the marketing work on the speakers. It’s not enough to just submit an idea, you have to help us get people to want to attend, and that starts with convincing enough people that you’ve got the best content out of the options on your track!

This gets a big (…checks that the kids are in bed…) HELL YA from me!  I feel passionate about the topic I'm presenting on, and I would hope so.  I've spent quite a bit of time researching it, figuring out how it works, how and when to apply it, and most importantly WHY you would want to.

So let's start there: the Why.
Page and Row Compression: How, When, and Why

One of the great things about presenting is that you get to meet a lot of brilliant people.  Bill Ramos (Blog | Twitter) shared with me some insight he had received from the ever-brilliant Sunil Agarwal as to why you should use compression.  It was so good that it is my #1 reason.

1st: Let's talk space.  Sure, hard drives aren't as expensive as they used to be, unless we are talking Solid State Drives.  So one question you may have is: why do I care about compressing data and saving (*cheap*) hard drive space?







[Diagram: a single Production Database Server]

Well, let's say you have a Database, and let's say this is your Production Database Server.  "Well, Balls," you say, "I can go out and get a 1 TB hard drive from Office Depot for $100.  Why do I care about compression?"

Well, Dear Reader, I would say you probably didn't buy your company's hard drives at Office Depot during Black Friday.  You may be using a SAN, and the hard drives you put in that SAN may not be the cheapest you could find; you probably wanted a bit of quality at a not-super-expensive price.

I would wager that you probably don't have just one Database Instance at your company.



                        
[Diagram: Production, Staging, Test, and Dev Database Servers]
You could have this Database Instance in multiple Environments.  Don't forget your offsite Disaster Recovery/COOP (Continuity Of Operations Planning) site either.

So: multiple servers, separate hardware costs, perhaps more than one SAN, the time and effort of SAN Administrators or outside vendors like EMC; the hard drive space is starting to add up.  And as businesses look to get control of, or reduce, their data footprint, Compression can help.

2nd: Reduced Memory Footprint.  Compression happens at the Record and Page levels.

The structures that we know and have become familiar with are physically changed.  When compressed pages are read off of the hard drive, they are stored in Memory, the buffer pool, in a compressed state.

This is a very important distinction, so I want to repeat it.  Compressed Records and Pages are still stored in a compressed state in memory.  

So why is this important?  

There are 2 ways to read records in SQL Server.  One is off of the Hard Disk, which is a Physical Read: when a Page is read off of the disk, it is loaded into the buffer pool before SQL can use it.  Compression means we fit more records onto each Page, which translates into being able to load more data into the buffer pool for SQL to read; this would normally only be possible by adding more RAM to a server and increasing the amount of Memory that SQL Server can utilize.  Reading pages that are already in the buffer pool, which lets SQL get to the information quicker, is called a Logical Read.

Think of it like the human memory.  

I have a conversation with you 1st thing in the morning about compression, we stop to go grab some coffee, and then we have meetings and work to do.  5 hours later I start to have the same conversation, and you have to search your mind to pick up where we left off.  That is a Physical Read, and it took a little more effort.

Now let's try again.

I have a conversation with you 1st thing in the morning about compression, we stop to go grab some coffee, and then come back 5 minutes later and keep having the conversation.  The conversation is fresh in your mind, you don't have to search around for it, it is still in your buffer pool.  

By applying compression properly we just expanded SQL Server's Memory without having to change our physical hardware at all.
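If you want to see this for yourself, you can peek at what is sitting in the buffer pool.  Here's a minimal sketch, assuming you run it in the database you are testing (it counts only the in-row and row-overflow allocation units to keep the join simple):

/*
Rough count of pages per object currently
in the buffer pool for the current database
*/
SELECT
    OBJECT_NAME(p.object_id) AS TableName,
    COUNT(*) AS PagesInBufferPool,
    COUNT(*) * 8 / 1024.0 AS BufferPoolMB   -- data pages are 8 KB each
FROM sys.dm_os_buffer_descriptors bd
JOIN sys.allocation_units au
    ON bd.allocation_unit_id = au.allocation_unit_id
JOIN sys.partitions p
    ON au.container_id = p.hobt_id
WHERE bd.database_id = DB_ID()
    AND au.type IN (1, 3)   -- IN_ROW_DATA, ROW_OVERFLOW_DATA
GROUP BY p.object_id
ORDER BY PagesInBufferPool DESC;

Run the same query before and after compressing a table, and the compressed table should be holding fewer pages in memory for the same data.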

3rd: Trading CPU Cycles for Physical and Logical Reads.

In SQL Server you can have hardware bottlenecks of CPU, Memory, IO, and Network.  Most SQL Servers that I have seen in the Public Sector tend to have very little CPU pressure; in the Private Sector it can be a bit more mixed.  But in either setting it is not uncommon to find SQL Servers with low usage of their CPUs.

Just a moment ago I wrote about how the internal structure of records and pages is changed by compression.  You read about how they are still compressed in memory, so you might wonder: how does SQL account for all these different types of pages and records?  The simple answer is it doesn't.  When the storage engine passes Data Pages from the buffer pool to the relational engine, the pages are translated back to their original format.  The additional effort of reading compressed records, and de-compressing them for the relational engine, accounts for some of the CPU overhead that Compression adds to a Server.

This is a balancing act, and SQL Server accounts for it by shifting some of the load from our Memory and Physical Disks to our CPUs, which in the majority of cases can handle it.
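And if CPU headroom is a concern, remember that Page Compression is not the only option; Row Compression is the lighter-weight of the two, trading a smaller CPU hit for smaller space savings.  A quick sketch, using the same demo table as below:

/*
Row Compression: a lighter CPU trade-off
than Page Compression
*/
ALTER TABLE [Sales].[SalesOrderDetail] REBUILD
WITH (DATA_COMPRESSION = ROW);

/*
And if the trade-off does not pay off,
you can always back it out
*/
ALTER TABLE [Sales].[SalesOrderDetail] REBUILD
WITH (DATA_COMPRESSION = NONE);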

DEMO

So let's do a little demo tease to give you an idea of how this can help you.  I'm using the AdventureWorks2008R2 Database for this example.  Here is a quick little script:

SET STATISTICS IO ON
SET STATISTICS TIME ON

select
    *
from Sales.SalesOrderHeader soh
left join Sales.SalesOrderDetail sod
on soh.SalesOrderID = sod.SalesOrderID

Now we will execute this script twice and take the second set of statistics, so that we are only looking at the Logical IOs required for the query.

We give this a run and we see:

[Screenshot: STATISTICS IO and TIME output]

It took 1240 Logical Reads from table Sales.SalesOrderDetail, 686 Logical Reads for table Sales.SalesOrderHeader, and a CPU time of 1014 ms to return 121,317 rows of data in 5105 ms.

Now we will apply Page Compression to the Sales.SalesOrderDetail table and look at the reduction of logical reads for that table, as well as any increase in CPU time.

/*
Rebuild our table with
Page Compression
*/
USE [AdventureWorks2008R2]
ALTER TABLE [Sales].[SalesOrderDetail] REBUILD
WITH (DATA_COMPRESSION = PAGE);

/*
Set Statistics On
and re-execute our
query
*/
SET STATISTICS IO ON
SET STATISTICS TIME ON

select
    *
from Sales.SalesOrderHeader soh
left join Sales.SalesOrderDetail sod
on soh.SalesOrderID = sod.SalesOrderID

Now let's take a look at our results.

[Screenshot: STATISTICS IO and TIME output after Page Compression]

And we see our Logical Reads for table Sales.SalesOrderDetail were reduced from 1240 to 598, a reduction of almost 52%!  We didn't change Sales.SalesOrderHeader, so its Logical Reads remained at 686; our CPU time went from 1014 ms to 1716 ms, and our elapsed time dropped from 5105 ms to 5079 ms.
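One gotcha worth calling out: ALTER TABLE ... REBUILD only compresses the heap or clustered index.  Any Nonclustered Indexes on the table are left uncompressed and have to be rebuilt separately, along these lines:

/*
Nonclustered Indexes are compressed
separately from the table itself
*/
ALTER INDEX ALL ON [Sales].[SalesOrderDetail]
REBUILD WITH (DATA_COMPRESSION = PAGE);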

“We’re What You Call Professionals”






Unlike the Myth Busters, I do want you to try this at home, and probably at work as well.  I've busted my hump to research, learn, and apply this topic: setting up demos showing how you can find your biggest tables; what the allocation unit make-up of the tables and indexes is (*this is important because only In-Row-Data compresses*); how to use DMVs to track your Scan and Update usage per object; and how scans and updates could affect the type of compression you want to use.  Here is a taste of what those queries look like.
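A minimal sketch of both, assuming you run them in the database you are evaluating; the thresholds for "mostly scans" vs. "mostly updates" are a judgment call, not hard numbers:

/*
Allocation unit make-up per table:
only IN_ROW_DATA pages can be compressed
*/
SELECT
    OBJECT_NAME(p.object_id) AS TableName,
    au.type_desc AS AllocationUnitType,
    SUM(au.total_pages) AS TotalPages
FROM sys.partitions p
JOIN sys.allocation_units au
    ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)   -- IN_ROW_DATA, ROW_OVERFLOW_DATA
    OR (au.type = 2 AND au.container_id = p.partition_id)    -- LOB_DATA
GROUP BY p.object_id, au.type_desc
ORDER BY TableName, AllocationUnitType;

/*
Scan vs. Update usage per object: heavy scans tend to
favor Page Compression, heavy updates favor Row Compression
*/
SELECT
    OBJECT_NAME(ios.object_id) AS TableName,
    ios.index_id,
    ios.range_scan_count,
    ios.leaf_update_count
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
ORDER BY ios.range_scan_count DESC;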

So the order I want you to go in is:

1. Gather Baselines.  You need to know what your hardware performance as well as your query performance is before you apply compression!

2. Look at your tables.  Which tables have the largest amount of records?

3. Look at their internal structure.  Are they mainly comprised of In-Row-Data?

4. What is their pattern of usage?  Are they updated frequently, or is the majority of the usage scans because they are historical data?

5. Only once you have found the candidates, and determined the type of compression you want to examine, use sp_estimate_data_compression_savings (see the example after this list).
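Here is what step 5 looks like; a minimal sketch, assuming the AdventureWorks2008R2 demo table and Page Compression as the candidate:

/*
Estimate the Page Compression savings for
Sales.SalesOrderDetail before committing to it
*/
EXEC sp_estimate_data_compression_savings
    @schema_name = 'Sales',
    @object_name = 'SalesOrderDetail',
    @index_id = NULL,           -- NULL = all indexes on the table
    @partition_number = NULL,   -- NULL = all partitions
    @data_compression = 'PAGE';

Compare the size_with_current_compression_setting(KB) and size_with_requested_compression_setting(KB) columns it returns to see whether the savings are worth the CPU trade-off.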

*Keep in mind the overhead!  The way sp_estimate_data_compression_savings works is that it takes a 5% sample of the table you are scanning, moves it to TempDB, performs the actual compression operation, and "estimates" the overall compression savings based on that 5% sample, as well as displaying the sample's compression rate.

This means that if you run it against a 100 GB table, you need to have 5 GB worth of TempDB space that can be used by this operation.  If you have a maintenance window, this is the sort of heavy operation you would want to do in off hours, and be sure that your TempDB can handle the space it will need.
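Checking how much free space TempDB has before you kick it off is easy; a minimal sketch:

/*
Check free space in TempDB before running
sp_estimate_data_compression_savings
*/
USE tempdb;
SELECT
    SUM(unallocated_extent_page_count) * 8 / 1024.0 AS FreeSpaceMB
FROM sys.dm_db_file_space_usage;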

SQL Rally, Go VOTE!

There are a lot of great sessions that have been submitted for SQL Rally.  The event is a great deal for some really great training; the Pre-Cons are taught by top-notch professionals that you will be seeing and/or have seen at the PASS Summit.  There is an absolute plethora of topics being presented, and you, the SQL Community, get to vote for what you want to see.  Go make it great!  Click Here to VOTE, Click Here to see the sessions you are voting on, and Click Here for the SQL RALLY Website.

Thanks,
Brad
