Tuesday, October 11, 2011

PASS Summit Day 2: Monday Pre-Con


Our instructor is Maciej Pilecki (@DrHouseofSQL).  I’m familiar with Maciej because of MagicPASS, the SSUG I call home.  He gave us a presentation on the memory cache back in June.  It was amazing, and I had promised to write a blog on it but have not gotten that one published yet.  However, this one will go first, and what a day it has been.  Maciej is an MCM (Microsoft Certified Master) and a SQL Server MVP.  He has presented at conferences like the PASS Summit, TechEd North America, and many, many more.

We spent the first half of the day discussing the SQL OS and the way that SQL Server uses CPUs.   The second half of the day was all about Memory, and we’ve got some big changes coming in Denali.  So of particular interest is not just how Memory works now, but how it is going to work in the future. 

WHAT IS THE SQL OS
SQL Server has its own Operating System.  WHAT!?!  Yep, you heard me. 

“But Balls”, you say, “You install SQL Server on Windows right?”

Yes Dear Reader, you do!  However, Windows and the way that it handles processing is very different from how we would like processing to occur within SQL Server.   Windows uses Pre-emptive Scheduling.  Each Thread, work that is being done on your CPU/Processor, is given a quantum of time.  Windows as an Operating System must manage that time and make sure everybody gets a turn. 

This is similar to having multiple sugar-infused children, all of them running up to you wanting to tell you something.  You let one go, and depending on how much they are jumping over one another, you may need to stop one before they are done and have another one start telling their story.  You’ll get back to the first one, but your job is to make sure everybody gets a turn.  Windows will stop your Threads as they are working, before they are finished; it will let them have another turn, but it needs to make sure all the kids get a chance to talk.

SQL OS is a Cooperative Scheduling OS; it is a bunch of orderly kiddos lining up to take turns telling you a story.  If one needs to wait and think, it goes to the back of the line and lets the next one speak.  They Cooperate to make sure that things happen as orderly as possible, but most importantly, if you are telling your story, if your Thread is working, no one stops it until it is ready to stop.
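You can actually watch these cooperative schedulers at work.  The query below is my own sketch, not from the pre-con material, against the sys.dm_os_schedulers DMV; runnable_tasks_count shows how many kiddos are lined up waiting for their turn on each scheduler.

```sql
-- One scheduler per visible CPU; requires VIEW SERVER STATE permission.
SELECT scheduler_id,
       [status],
       current_tasks_count,
       runnable_tasks_count,    -- tasks in line waiting for CPU time
       current_workers_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255;       -- exclude hidden/internal schedulers
```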

“But Balls”, you say, “When did this change occur, this is huge?”

Yes Dear Reader, it is huge!  But take heart: we have all been using the SQL OS since version 7.0.  It has been maturing over the years, we see a lot of benefits of that today in 2008 R2, and we will see even more in Denali.   We spent the rest of the morning talking about how CPUs work within the SQL OS, and I learned more than I have time to write.  This was an incredible day of training, and I can’t wait to apply some of what I learned.


LUNCH

Sometimes you just have to smile at how things work out.  At the lunch break I went down a little later than the rest of the group.   As I was walking down I saw my friend Eric Wisdahl (@EricWisdahl | Blog), and I asked if I could join him.  Eric is a veteran of many SQL Saturdays and PASS events in Florida.  He is a great guy and has a very sharp wit.  So thousands of miles from home, I meet up with a friend from just up the road.

As we waited through the line and got our food (great meal, and props to the folks at PASS), we arrived at the end and I struck up a conversation with Brent Ozar (@BrentO | Blog), and along walks Jeremiah Peschka (@Peschkaj | Blog).

If you are reading my blog, more than likely you’ve heard of these guys; they’re kind of a big deal around here.  We all sit down at a table, and as we are eating we are joined by Mike Hilwig (@mikehilwig | Blog).  I met Mike at SQL Saturday 79 in South Florida, and he is awesome.   It was a really nice lunch with great company.  As Stan Lee would say, ‘Nuff Said!

MEMORY, RESOURCE GOVERNOR, AND EXTENDED EVENTS
We spent the second half of the day learning how Memory and Buffers work and are managed within the SQL OS, and how Resource Governor works internally.  We also saw how you can bind SQL Server to multiple ports and affinitize each port to a specific set of CPUs (what Maciej called a poor man’s Resource Governor), looked at Extended Events, and covered some of the new features coming up in Denali.
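To give you a flavor of the Resource Governor side of the afternoon, here is a minimal sketch of setting up a pool, a workload group, and a classifier function.  The names (ReportingPool, ReportingGroup, ReportUser) are made up for this example; this is not Maciej’s demo code.

```sql
-- Resource Governor is an Enterprise feature (SQL 2008+).
-- Cap a hypothetical reporting workload at 25% CPU.
CREATE RESOURCE POOL ReportingPool WITH (MAX_CPU_PERCENT = 25);
GO
CREATE WORKLOAD GROUP ReportingGroup USING ReportingPool;
GO
-- The classifier function must be created in master.
CREATE FUNCTION dbo.fnClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    IF SUSER_SNAME() = N'ReportUser'   -- made-up login name
        RETURN N'ReportingGroup';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
```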

This was an absolutely wonderful day, and I learned a lot about a portion of SQL Server that I had only just started to scratch the surface of.  

The rest of the evening was spent writing in the room.  There is a lot to do this week, and while I had been looking forward to some of the social activities tonight, I wasn’t feeling up to going out.  I got a pizza and sat down to do a lot of writing.

And now on to day 3!

Thanks,

Brad

Lesson 2: Internal Structures, Vardecimal, & Row Compression


SQL University Compression Week: Lesson 1 About Compression

Welcome to Lesson 2 on Compression, and welcome back Dear Reader.   I want to thank the Chancellor of SQL University, Jorge Segarra (@SQLChicken | Blog), for the opportunity to be your Professor of Compression.  Yesterday we discussed what kind of Compression SQL Server uses and why you would want to use Compression.  Today let’s get down and dirty with the first level of Compression.  But first let’s do a history lesson.

Compression was first introduced in SQL 2005 SP2, as Vardecimal Compression.  In SQL 2005 you would have to enable the Database for Compression and then enable the table that you wanted to Compress.  The compression would only work on Numeric or Decimal data types. 

If you enabled Vardecimal Compression it would store Decimal and Numeric data as a new data type, Vardecimal.  But to understand what that data type does we need to talk about the Fixed Length portion of a Data Record.
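For the curious, turning Vardecimal on in 2005 SP2 was a two-step affair: first the database, then each table.  A quick sketch with placeholder names ('MyDb', 'dbo.MyTable'):

```sql
-- Step 1: enable vardecimal storage at the database level.
EXEC sp_db_vardecimal_storage_format 'MyDb', 'ON';

-- Step 2: enable it per table (only affects decimal/numeric columns).
EXEC sp_tableoption 'dbo.MyTable', 'vardecimal storage format', 1;

-- Verify: returns 1 if the table has vardecimal storage turned on.
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.MyTable'),
       'TableHasVarDecimalStorageFormat');
```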

LET’S START AT THE BEGINNING

When we talk about Compression we have to look at our data from the inside out, so I like to make a reverse pyramid.  We understand that when we create a table there is a hierarchy to it: Tables have Columns and Rows, and Columns and Rows have fields.  When you break it down, you would think the smallest unit of storage is a field, but it’s a little bit different.


Instead of Tables, Columns, Rows, and Fields we have IAM Chains and Allocation Units, Allocation Bitmaps, Extents, Pages, and Records.  What we care about with Row Compression are the Records.   If you look up above you’ll see what I like to call my Mall Map of SQL Internal Storage, and you are here in the Records section today.  So let’s take a closer look at what that record looks like.



This is from a slide I have in my Compression talks.  If you look above you’ll find the structure of a regular record as it is stored on a page.   This is courtesy of Paul Randal (@PaulRandal | Blog); go and look at his Data Structures MCM video, it will give you a deeper understanding of how SQL Server storage works internally.    A Record essentially equals a row of data in a table (the deeper you dive the trickier this gets, but stick with me: Row = Record).  The part that we care about the most is the Fixed Length Columns.   If you declare a Decimal or Numeric data type and insert data into it, that data is stored in the Fixed Length Columns portion of the Record.  If you declare a decimal(38,38) and put in a number like 0.12345678900123456789012345678902134567, it takes up the exact same amount of space as if you stored 0.1234.   That is what Fixed Length means: regardless of how much of the space you use, you take up all of it.
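You can see this fixed-length behavior for yourself with DATALENGTH; a decimal with precision 29 through 38 always occupies 17 bytes, no matter how small the value.

```sql
-- Both values report the same storage size, because decimal(38,38)
-- is a fixed-length type: you pay for the full precision every time.
SELECT DATALENGTH(CAST(0.12345678901234567890123456789012345678 AS decimal(38,38))) AS long_value_bytes,
       DATALENGTH(CAST(0.1234 AS decimal(38,38))) AS short_value_bytes;
-- Both columns return 17.
```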

When Vardecimal Compression occurs it changes where the data is stored: it goes into the variable length portion of the record instead of the fixed length portion.   So it is altering how the storage of a particular data type occurs.  Now instead of 0.1234 taking up the full 17 bytes that a fixed decimal(38,38) requires, it takes up only a few bytes; in other words, we need only what we use.

SQL 2008 takes it a step further: not only does it change where the data is stored, it changes the physical storage of Data at a Record level.  With Row Compression we added a lot more data types: smallint, int, bigint, bit, smallmoney, money, float, real, datetime, datetime2, datetimeoffset, char, binary, and timestamp/rowversion.  SQL 2008 R2 added Unicode Compression for nchar and nvarchar.  So if a column is fixed length and we Row Compress it, we remove all the unused space, and here is how we store it.


Paul Randal is a busy man; he has another MCM video on New Database Structures in SQL Server 2008 (go watch it).  Compression wasn’t the only new data structure we got with 2008, so go have a watch and you’ll learn all about it.

So now that we know all about how Row Compression changes our storage, let’s play around with it.

DEMO

First we need to create the database that we will be using.  I’m doing this in SQL 2008 R2, so to make sure your results match up I would suggest 2008 R2 as well; however, if you have 2008, this example will work fine.

USE master;
GO

IF EXISTS(SELECT name FROM sys.databases WHERE name = N'demoCOMPRESSION')
BEGIN
     DROP DATABASE demoCOMPRESSION;
END
GO

CREATE DATABASE demoCOMPRESSION;
GO

USE demoCOMPRESSION;
GO

So we’ve got our database our next step is to create a table and populate it with some records.
CREATE TABLE dbo.myTable1(
          myID int IDENTITY(1,1)
          ,myChar CHAR(8000)
          ,CONSTRAINT PK_myTable1_myID
          PRIMARY KEY CLUSTERED(myID)
          );
GO

/*
Insert Some Records
*/
DECLARE @i INT
SET @i=0

BEGIN TRAN
     WHILE (@i<10000)
          BEGIN
              INSERT INTO dbo.myTable1(mychar)
              VALUES('myData');
              SET @i = @i +1

          END
COMMIT TRAN
Now that we’ve got a table and data, we have something to play with; next we need a baseline.  Remember: KNOW THY DATA!  To get to know our data a little more, I would suggest turning on STATISTICS IO & TIME and then running a select.  Now this is very figurative; if you have a more targeted workload or set of stored procedures, you would want to test those by getting the statistics and query plans so you can see IF compression changes anything for the better or worse.

/*
Do a select to look
at the CPU time and I/Os
Before Compression
*/
SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT * FROM dbo.myTable1

At this point, set those baselines to the side.   Next I would use sp_estimate_data_compression_savings on the table, specifying ROW Compression.  This will take a 5% sample of the table, move it into your tempdb, and apply the Compression setting that you specify, in this case Row.  It will then return an estimate, based on that sample, of what it believes your compression rate will be.

EXEC sp_estimate_data_compression_savings 'dbo', 'myTable1', NULL, NULL, 'ROW';
GO

When I apply compression I like to use sp_spaceused to see the space the table takes up before and after.

/*
How Big is our Clustered Index
*/
EXEC sp_spaceused 'dbo.myTable1';
GO


/*
Rebuild Our Table with Compression
*/
ALTER TABLE dbo.myTable1
REBUILD WITH (DATA_COMPRESSION = ROW);
go

/*
How Big is our Clustered Index
*/
EXEC sp_spaceused 'dbo.myTable1';
GO

As you can see, we reduced the table from around 80 MB to 288 KB.  Now let’s run our query again and compare against our baseline to see how query performance was affected.
/*
Do a select to look
at the CPU time and I/Os
After Compression
*/
SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT * FROM dbo.myTable1

Homework: go watch Paul Randal’s MCM videos.  Then take the Row Compression demo, add some more columns and Fixed Length data types to it, and see what you can Compress.  Happy Compressing!

Thanks,

Brad                                                                     

Monday, October 10, 2011

SQL University Compression Week: Lesson 1 About Compression

Hello Dear Reader!  Welcome to Lesson 1 on Compression.  I want to thank the Chancellor of SQL University, Jorge Segarra (@SQLChicken | Blog), for the opportunity to be your Professor of Compression.  Before we talk about how to turn it on, where we should apply it, or even the types of compression available to us, there are two burning questions that we need to answer: What is Compression, and why should I use it?




A PICTURE IS WORTH 1000 WORDS, BUT I WOULDN’T TRUST IT WITH YOUR DATA
Let’s get a 40,000 foot overview of Compression.  When the good folks at Microsoft were looking at adding Compression into SQL Server the first thing they had to answer was, What kind of Compression will we use?  If you are new to the Compression game then you may not know that there are a lot of different ways to Compress Data.  Sure they all do the same thing, reduce the overall size and foot print of the Data you are Compressing, but we use different types of Compression differently.

On one end of the scale you have Compression that reduces the overall size of the file, but does so by stripping out what it deems unneeded or unnecessary data.  The best example I can think of for this is photography.   A RAW photograph file is huge, but it contains every last bit of detail that your camera can capture.   When you look at some of the breathtaking pictures that line the magazines, scenic landscapes, athletic accomplishments, or news events, those photographs were taken in a format that captures all of the data.  When you browse USA Today on your mobile phone or on your computer, you receive those photographs in JPEG, GIF, or PNG format, not in RAW.  The reason?  They’re smaller.  

“But Balls,” you say, “How did it get smaller?” 

The answer, Dear Reader, is Compression.  JPEG, GIF, and PNG are Compressed file formats for pictures.  Why do we Compress pictures?  Because the human eye can only detect so much detail.  Some people can detect more than others, but eventually we hit a threshold where we cannot detect anything more.  At that point the extra Data becomes unnecessary overhead, so we strip it out.  

Jim Gray went through the trouble of coming up with ACID (We Need to Talk: Your Database is on ACID), and Microsoft adopted it into the core programming of SQL Server, so we can assume they would not want to strip out data.  As Database professionals we would hope that we are preserving our data, and we are.

So we know the kind of Compression we use must preserve our Data; the next question is how hard do we want to work for it?   We could use something like WinZip and Compress our files down to the smallest possible size, preserving our data and getting it really, really small.  However, that only works if you aren’t planning on doing it every time you want to read your data.  Have you ever zipped a 2 GB file?  It takes a little while.  Have you ever unzipped a 2 GB file?  That takes a little while too.  Imagine if every time you wanted to read your multi-GB Database you had to sit through the same scenario.

We are still operating a Database; while reducing our Data footprint is laudable, we wouldn’t want to do it at the expense of performance.  As a matter of fact, if at all possible, we would want performance to increase, not decrease.

I know we are picky and asking a lot: shrink the data, but don’t lose any; make it smaller, but easy to access; and if possible make it work faster (in some cases, not all).   Microsoft listened, and the particular flavor of Compression we got was Page & Row Compression. 

The first stab at Compression actually came in SQL 2005 SP2, and it was called Vardecimal Compression.  It only worked on the Numeric and Decimal Data Types, but it used concepts that we will be reviewing when we discuss Row Compression.   In SQL 2008 we got Page and Row Compression, and Vardecimal Compression was immediately deprecated.  You can still use Vardecimal Compression in 2008, but if you really want to use Compression you will get a lot more bang for your buck with Row Compression; more on that tomorrow.

WHY IN THE WIDE WIDE WORLD OF SPORTS SHOULD I USE COMPRESSION

Now that we understand the basic idea of what Compression is, the most important question you will ask is, why should I use it?  “Disks are cheap.”  I hear it all the time.  You’ve probably seen the ads for 1 TB hard drives on sale on Black Friday for $30.  You are correct, Disks are cheap and they are getting cheaper all the time; however, I bet you don’t go buy the hard drives for your server on Black Friday.

As a matter of fact, if you have a SAN (Storage Area Network), then you need to buy different disks altogether.  They probably need a port in the back that allows for an HBA (Host Bus Adapter) that takes a fiber cable.  The reason?  If you are using a SAN device you want the quickest possible connection for your database server, and those HBA ports raise the price of those hard drives.  If you are using Solid State drives on a SAN, the price goes up even more.  Now we are talking tens of dollars to hundreds of dollars.  And if you have a SAN, you’re not going to have just one hard drive in there; you will have multiple drives striped in some sort of RAID Array.  How cheap does it sound now?

But wait, there’s more.  If you have a lot of SANs, that can get pretty complicated.  You’re probably going to need a SAN administrator, and you may need more than one if you are a big enough shop.  So now we’re not just talking hardware, but people as well.


All this for a database?  One lowly single database?  I would say most of us don’t have just one Database to worry about.   You probably have a Prod (Production) environment, maybe a UAT/Staging, a Test, a Dev (Development), or even a DR/COOP environment.   You may have a Data Warehouse as well, and you could have multiple copies of your database for multiple projects.

And you probably don’t have just one Application you are supporting.  Most places have tens, hundreds, and in some cases even thousands of Databases across just as many instances that they maintain.  When you start thinking about the scope that your data touches, you start to realize just how much you have out there.

As Database professionals we try to normalize our data to prevent data duplication, but by the nature of the change control process we will always have some.  On top of that, there are 3 performance bottlenecks we normally encounter: CPU, Memory, and IO (Input/Output). 

The option to re-write code is always there, but how often have you been in a situation where management decides to throw heavier hardware at a problem because it is cheaper and more cost effective than a re-write?  If you haven’t been there, you will be eventually.  You can toss more CPU on a server to get more processing power, and more RAM to get more memory and speed up operations, because we can hold more in memory.  But with Disk, if I throw more disk at a server I just get more storage room; it doesn’t help me with performance.

Page and Row Compression can do just that.  The cost to the server is CPU cycles, which most database servers have to spare.  (Just a side note: if your server is averaging around 60% CPU utilization or higher, I would not recommend using Compression without thorough testing to ensure it would not reduce existing performance.)  We reduce the size of the structure not only on disk but in memory.  Pages of a database are compressed on DISK and in Memory.  This is such an important concept I’m going to say it one more time: on DISK and in MEMORY!  So not only do you save room by reducing the size of a table or index, and make it quicker to read off of disk because of the reduced size, but you also take up less Page Cache because your Data Pages are compressed in memory.

So you can reduce your data footprint across multiple environments, possibly speed up performance by reducing the number of reads needed to process a query, and possibly use your memory cache more efficiently.   I don’t think the question is why should I use Compression, but rather why am I not using Compression.

WARNING: DANGER WILL ROBINSON


I’m not going to sell you a bill of goods; that would just be snake oil.  There are some serious things that you need to keep in mind when it comes to Compression.
  1. Know Thy DATA!
  2. Take Baselines
  3. Know Thy DATA!

I like to think of Data Compression like Indexes.  You wouldn’t just put an index on a table and never check to see if it is being used.  You would check that your query plans are still running efficiently, and you would monitor your performance.

You need to do the same thing with Compression.  Know how your queries perform beforehand, and know how they perform after Compression.  If your CPU shoots up, you will need to troubleshoot it; we’ll cover this later in the week.  If your data types are not right, don’t compress them in the first place; more on that later in the week as well.
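One easy way to keep track of what you have compressed as part of that monitoring is to query sys.partitions.  A quick sketch:

```sql
-- Compression state (NONE, ROW, or PAGE) for every user table/index.
SELECT OBJECT_NAME(p.[object_id])  AS table_name,
       i.name                      AS index_name,
       p.partition_number,
       p.data_compression_desc
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON  p.[object_id] = i.[object_id]
    AND p.index_id    = i.index_id
WHERE OBJECTPROPERTY(p.[object_id], 'IsMSShipped') = 0
ORDER BY table_name, index_name;
```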

HOME WORK

Your Assignment, should you choose to accept it, is to go read some of the Blogs by the SQL Server Storage Engine Team on the topic of Compression.  Sunil Agarwal is the Program Manager for the Engine Team, and he has written a lot on this topic.

Data Compression: Why Do we need it?

Data compression techniques and trade offs

Thanks,

Brad