
Monday, October 17, 2011

SQL University Lesson 3 Page Compression

Lesson 2: Internal Structures, Vardecimal, & Row Compression



Welcome to Lesson 3 on Compression and welcome back Dear Reader.  I want to once again thank the esteemed Chancellor of SQL University, none other than Mr. Jorge Segarra (@SQLChicken | Blog), for the opportunity to be your Professor of Compression.  Click on the links to the previous lessons to view any of the earlier days in this series.  As a quick recap, we've discussed the different types of Compression that exist in the world, why you would want to use Compression, the internal structure of Data Records, and how they and their storage change when we enable Compression, Vardecimal, and Row Compression.  So now, on to Page Compression.

SQL University Compression Week: Lesson 1 About Compression


“So Balls,” you say, “What is Page Compression and how does it compare to Row Compression?”

Great question Dear Reader!  Page Compression is a superset of Compression that includes Row Compression.  We discussed the data types that Row Compress yesterday; the data types for Page Compression are... all of them!  Page Compression, while still only occurring on IN_ROW_Data pages (more on that later), compresses at the binary level.  For the additional levels of Page Compression beyond Row we need to take a deeper dive again!

STARTING OUT WITH SOME MORE INTERNALS

Compression is a very internal technology.  I hope you're not sick of learning about how things work on the inside, because this is some of the most fun stuff to me (I realize how sad that will sound to my soon-to-be teenager).  To get a really good understanding of anything it helps to know what is happening internally, and Compression is no different.  SQL Server storage is made up of Records and Pages.  There are many types of Pages, but the three groupings we care about are the Allocation Units.  These 3 types of Allocation Units break down into IN_ROW_Data, ROW_OVERFLOW_Data, and LOB_Data.

Out of those three types of pages, only data on IN_ROW_Data pages compresses.  So what kinds of data types are IN_ROW_Data data types?  It is actually easier to list those that are not, as that list is much shorter.  You can compress everything but varchar(max), varbinary(max), nvarchar(max), XML, Text, Image, NTEXT, CLR data types (Spatial (that changes in SQL 2012) and Geography), and Filestream.  Compression is also not compatible with Sparse columns.  Anything else you can Page Compress.
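If you want to see how much of a table actually lives on IN_ROW_Data pages versus ROW_OVERFLOW_Data and LOB_Data pages, a quick look at sys.dm_db_partition_stats will tell you.  This is just an illustrative sketch of mine, using the dbo.heap2 table we build in the demo below; substitute your own table name.

SELECT OBJECT_NAME(object_id) AS table_name,
       in_row_data_page_count,
       row_overflow_used_page_count,
       lob_used_page_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.heap2');  -- dbo.heap2 comes from the demo below
GO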

You’ll remember our Mall Map of internal structures from yesterday’s lesson.  Well today we are moving up a level and we are tackling Pages.


So today you are here on the Pages portion of our Mall Map for the subject at hand, Page Compression.  Page Compression is made up of 3 different components: Row Compression, Column Prefix Compression, and Page Dictionary Compression.  If they are all applied to a page, that is the order in which SQL Server will attempt to apply them.  We discussed Row Compression at length yesterday.  So when you apply Page Compression our record format changes to the Compressed Record Structure, and Row Compression occurs first, removing all unneeded extra bytes.

Now I don't know about you Dear Reader, but reading binary is not a forte of mine.  So in order to explain this I'm going to use some screen shots from one of my presentation decks covering Compression.  For these next couple of images I want to make sure you understand that this is not actually what a Page looks like; right now, as Louis Davidson (@DrSQL | Blog) would say, you're drinking my flavor of Kool-Aid.  Next up is Column Prefix Compression.


We'll start by going down each column, taking the common values for the column, and populating the anchor tag at the top of the page.  Looking at the first column, the longest common pattern is Brad between the first 2 values and Br with the last value.  Column Prefix Compression will take the longest value with the longest matching pattern, move that record to the Anchor Tag, and replace it with 4 bits representing a special NULL that points towards the top of the page.  It will also replace the first [4] characters in Brad22, leaving [4]22, and leave [2]et31 out of Bret31.


Now let's fill out the rest of our columns.  Daniel is our longest value with a matching pattern; we'll move that to the Anchor Tag and leave Dan21 as [3]21 and David33 as [2]vid33.  For our last column we'll take the value Many31 and move that to our header, leaving [2]et31 from Maet31 and [2]ny31 from Many31.



Now we have COMPLETED Row Compression and Column Prefix Compression.  The last portion of Page Compression is Page Dictionary Compression.  For Page Dictionary Compression we look for common patterns across the entire page.  When we find them we move them into a multi-dimensional, zero-based array in the Dictionary portion of the Page.

We will start with [2]et31.  As you look at the page you can see two of these values.  This common value will be placed in the Dictionary portion of the page and a 0 will be entered in its place.  The next value that we find on the page is [3]21; we will move that value into the Dictionary portion of the page and replace it with a 1 entry for both columns.




Now as you can see our page looks very different from where we started.  This is where our CPU overhead starts to come into play.  Not only do we have the Compressed record format, but we also have to uncompress our pages when they are handed from the Storage Engine to the Relational Engine.
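If you want to see a real anchor record and dictionary instead of my Kool-Aid version, you can dump a page-compressed data page with the undocumented DBCC IND and DBCC PAGE commands once you have built the dbo.heap2 table from the demo below and rebuilt it with PAGE compression.  The file and page numbers here are placeholders of mine; take real ones from the DBCC IND output.

DBCC TRACEON(3604);                              -- send DBCC PAGE output to the client
DBCC IND('demoCompression', 'dbo.heap2', -1);    -- list the pages that belong to the table
DBCC PAGE('demoCompression', 1, 200, 3);         -- 1:200 is a placeholder; use a data page from DBCC IND
GO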

DEMO

Here is a full and complete demo.  For more scripts go to my Resources Page and download any of my presentations on Compression; they all contain the scripts I use as demos.  For this one we will create a database and a table, add some data, then apply compression and look at the space saved.  We will be using sp_estimate_data_compression_savings.  One BIG WARNING that I like to give is that sp_estimate_data_compression_savings works by taking a 5% sample of your actual physical table or index, copying it into your TempDB, applying Compression, and then estimating the space savings against the total size of your table.

So MAKE SURE YOU HAVE ENOUGH TEMPDB SPACE before you use this.  Do not use it on your 100 GB table if you do not have 5 GB of free space in TempDB.  For this demo our table is relatively small and you should only need MBs worth of free space vs. GBs.  But before you use this on something at your job, make sure you're not going to adversely affect your server, and test this in a lower life cycle first.
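A quick sanity check I would suggest (this is my own addition, not part of the original demo scripts) is to compare the size of the table you are about to sample against the free space in TempDB before running the estimate.

EXEC sp_spaceused 'dbo.heap2';   -- size of the table that will be sampled

USE tempdb;
GO
SELECT name,
       size / 128.0 AS size_mb,                               -- file size in MB
       FILEPROPERTY(name, 'SpaceUsed') / 128.0 AS used_mb     -- space currently used in MB
FROM sys.database_files;
GO
USE demoCompression;
GO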




USE demoCompression
GO
/*
Set our Statistics ON
*/
SET STATISTICS IO ON
SET STATISTICS TIME ON
/*
Create a Heap Table
*/
CREATE TABLE heap2(myID int IDENTITY(1,1), myChar CHAR(8000));
GO

/*
Insert Some Records
We are going to Fill up a lot more
of the Char(8000) this time
aa= 2 * 1000 = 2000
1234= 4 * 100 = 400
bb= 2 * 1000 = 2000
mydataandOtherStuff1234 = 23 * 347 = 7981
*/
DECLARE @i INT
SET @i=0

BEGIN TRAN
     WHILE (@i<15000)
          BEGIN
              IF (@i<=1000)
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES(REPLICATE('aa', 1000));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>1000) AND (@i<=2000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000) + REPLICATE('1234', 100)));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>2000) AND (@i<=3000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000)+ REPLICATE('bb', 1000)));
                        SET @i = @i +1
                   END 
              ELSE IF ((@i>3000) AND (@i<=4000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES(REPLICATE('aa', 1000));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>4000) AND (@i<=5000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000) + REPLICATE('1234', 100)));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>5000) AND (@i<=6000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000)+ REPLICATE('bb', 1000)));
                        SET @i = @i +1
                   END 
              ELSE IF ((@i>6000) AND (@i<=7000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES(REPLICATE('aa', 1000));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>7000) AND (@i<=8000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000) + REPLICATE('1234', 100)));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>8000) AND (@i<=9000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000)+ REPLICATE('bb', 1000)));
                        SET @i = @i +1
                   END      
              ELSE IF ((@i>9000) AND (@i<=10000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES(REPLICATE('aa', 1000));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>10000) AND (@i<=11000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000) + REPLICATE('1234', 100)));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>11000) AND (@i<=12000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000)+ REPLICATE('bb', 1000)));
                        SET @i = @i +1
                   END      
              ELSE IF ((@i>12000) AND (@i<=13000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES(REPLICATE('aa', 1000));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>13000) AND (@i<=14000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000) + REPLICATE('1234', 100)));
                        SET @i = @i +1
                   END 
              ELSE
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES(REPLICATE('mydataandOtherStuff1234', 347));
                        SET @i = @i +1
                   END
          END
COMMIT TRAN
/*
Quick Select of the data
*/

SELECT * FROM dbo.heap2

/*
How big is our Table
*/
EXEC sp_spaceused 'dbo.heap2';
GO

/*
Estimate our space savings
*/
sp_estimate_data_compression_savings 'dbo', 'heap2', NULL, NULL, 'ROW';
GO

/*
We reduced by around 1/3
Can we do better with Page Compression?
*/
sp_estimate_data_compression_savings 'dbo', 'heap2', NULL, NULL, 'PAGE';
GO

/*
How big is our Table
*/
sp_spaceused 'dbo.heap2';
GO

/*
Rebuild With Compression
*/
ALTER TABLE dbo.heap2
REBUILD WITH(DATA_COMPRESSION=PAGE);
GO


/*
How big is our Table
*/
sp_spaceused 'dbo.heap2';
GO

/*
Do another select
to get the Statistics
*/
SELECT * FROM dbo.heap2

/*
Turn Off Compression
*/
ALTER TABLE dbo.heap2
REBUILD WITH(DATA_COMPRESSION=NONE);
GO
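
If you want to confirm which compression setting is actually in place at any point during the demo, sys.partitions will tell you.  This check is my addition and is not required for the demo to work.

SELECT OBJECT_NAME(object_id) AS table_name,
       index_id,
       partition_number,
       data_compression_desc    -- NONE, ROW, or PAGE
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.heap2');
GO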

Alright Dear Reader!  Your homework: do the demo and review Row Compression, and make sure you know the difference between the two!

Thanks,

Brad

Tuesday, October 11, 2011

Lesson 2: Internal Structures, Vardecimal, & Row Compression


SQL University Compression Week: Lesson 1 About Compression

Welcome to Lesson 2 on Compression and welcome back Dear Reader.  I want to thank the Chancellor of SQL University, Jorge Segarra (@SQLChicken | Blog), for the opportunity to be your Professor of Compression.  Yesterday we discussed what kind of Compression SQL Server uses and why you would want to use Compression.  Today let's get down and dirty with the first level of Compression.  But first let's do a history lesson.

Compression was first introduced in SQL 2005 SP2 as Vardecimal Compression.  In SQL 2005 you had to enable the database for Compression and then enable the table that you wanted to compress.  The compression would only work on Numeric and Decimal data types.
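If you are curious what that two-step enablement looked like, here is a sketch (I'm borrowing the demoCOMPRESSION database and myTable1 table from the demo below purely as an illustration; on SQL 2008 and later these options are deprecated in favor of Row and Page Compression):

EXEC sp_db_vardecimal_storage_format 'demoCOMPRESSION', 'ON';        -- enable the database
EXEC sp_tableoption 'dbo.myTable1', 'vardecimal storage format', 1;  -- enable the table
GO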

If you enabled Vardecimal Compression it would store Decimal and Numeric data as a new data type, Vardecimal.  But to understand what that data type does we need to talk about the Fixed Length portion of a Data Record.

LET’S START AT THE BEGINNING

When we talk about Compression we have to look at our data from the inside out, so I like to make a reverse pyramid.  We understand that when we create a table there is a hierarchy to it: Tables have Columns and Rows, and Columns and Rows have fields.  When you break it down, you would think the smallest unit of storage is a field, but it's a little bit different.


Instead of Tables, Columns, Rows, and Fields we have IAM Chains and Allocation Units, Allocation Bitmaps, Extents, Pages, and Records.  What we care about with Row Compression are the Records.  If you look up above you'll see what I like to call my Mall Map of SQL Internal Storage, and you are here in the Records section today.  So let's take a closer look at what that record looks like.
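If you want to look at pages and records on a real table rather than a Mall Map, sys.dm_db_index_physical_stats in DETAILED mode will show you page counts and record sizes.  This is just an illustrative sketch of mine; dbo.myTable1 is the table we build in the demo further down.

SELECT index_type_desc,
       alloc_unit_type_desc,
       page_count,
       record_count,
       avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.myTable1'), NULL, NULL, 'DETAILED');
GO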



This is from a slide I have in my Compression talks.  If you look above you'll find the structure of a regular record as it is stored on a page.  This is courtesy of Paul Randal (@PaulRandal | Blog); go and look at his Data Structures MCM video, it will give you a deeper understanding of how SQL Server storage works internally.  A Record essentially equals a row of data in a table (the deeper you dive the trickier this gets, but stick with me, Row = Record).  The part that we care about the most is the Fixed Length Columns.  If you declare a Decimal or Numeric data type and insert data into it, that data is stored in the Fixed Length Column portion of the Record.  If you declare a decimal(38,38) and put in a number, 0.12345678900123456789012345678902134567, it takes up the exact same amount of space as if you stored 0.1234.  That is what a Fixed Length column means: regardless of how much of the column you use, you take up all the space.
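You can prove that to yourself with DATALENGTH, which reports the number of bytes a value takes up.  This quick check is my own sketch, not part of the original lesson; both columns report the same size because the declared precision, not the value, drives the storage.

SELECT DATALENGTH(CAST(0.1234 AS decimal(38,38))) AS small_value_bytes,
       DATALENGTH(CAST(0.12345678900123456789012345678902134567 AS decimal(38,38))) AS big_value_bytes;
-- both columns return 17, the fixed storage size of a decimal with precision 29-38
GO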

When Vardecimal Compression occurs it changes where the value is stored: it goes into the variable length portion of the record instead of the fixed length portion.  So it is altering how the storage of a particular data type occurs.  Now, instead of 0.1234 taking up the full 17 bytes of a decimal(38,38), it takes up only the few bytes it actually needs; in other words, we use only what we need.

SQL 2008 takes it a step further: not only does it change where the data is stored, it changes the physical storage of data at the Record level.  With Row Compression we get a lot more data types: smallint, int, bigint, bit, smallmoney, money, float, real, datetime, datetime2, datetimeoffset, char, binary, and timestamp/rowversion.  SQL 2008 R2 added Unicode Compression for nchar and nvarchar.  So if a column is fixed length and we Row Compress it, we remove all the unused space, and here is how we store it.


Paul Randal is a busy man; he has another MCM video on New Database Structures in SQL Server 2008 (go watch it).  Compression wasn't the only new data structure we got with 2008, so go have a watch and you'll learn all about it.

So now that we know all about how Row Compression changes our storage, let’s play around with it.

DEMO

First we need to create the database that we will be using.  I'm doing this in SQL 2008 R2, so to make sure your results match up I would suggest 2008 R2 as well; however, if you have 2008, this example will work fine.

USE master;
Go
IF EXISTS(SELECT name FROM sys.databases WHERE Name=N'demoCOMPRESSION')
     BEGIN
          DROP Database demoCOMPRESSION
     END
    
CREATE DATABASE demoCOMPRESSION
GO

USE demoCompression
GO

So we've got our database; our next step is to create a table and populate it with some records.
CREATE TABLE dbo.myTable1(
          myID int IDENTITY(1,1)
          ,myChar CHAR(8000)
          ,CONSTRAINT PK_myTable1_myID
          PRIMARY KEY CLUSTERED(myID)
          );
GO

/*
Insert Some Records
*/
DECLARE @i INT
SET @i=0

BEGIN TRAN
     WHILE (@i<10000)
          BEGIN
              INSERT INTO dbo.myTable1(mychar)
              VALUES('myData');
              SET @i = @i +1

          END
COMMIT TRAN
Now that we've got a table and data we have something to play with, but first we need a baseline.  Remember: KNOW THY DATA!  To know our data a little more I would suggest turning on STATISTICS IO & TIME and then running a select.  This is a very generic example; if you have a more targeted workload or set of stored procedures you would want to test those, capturing the statistics and query plans so you can see IF compression changes anything for the better or worse.

/*
Do a select to look
at the CPU time and I/Os
Before Compression
*/
SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT * FROM dbo.myTable1

At this point set those baselines to the side.  I would use sp_estimate_data_compression_savings on the table specifying ROW Compression.  This will take a 5% sample of the table, move it into your TempDB, and apply the Compression setting that you specify, in this case ROW.  It will return the results and give you an estimate, based on that sample, of what it believes your compression rate will be.

sp_estimate_data_compression_savings 'dbo', 'myTable1', NULL, NULL, 'ROW';
GO

When I apply compression I like to use sp_spaceused to see the space the table takes up before and after.

/*
How Big is our Clustered Index
*/
sp_spaceused 'dbo.myTable1';
go


/*
Rebuild Our Table with Compression
*/
ALTER TABLE dbo.myTable1
REBUILD WITH (DATA_COMPRESSION = ROW);
go

 /*
How Big is our Clustered Index
*/
sp_spaceused 'dbo.myTable1';
GO

As you can see, we reduced the table from around 80 MB to 288 KB.  Now let's run our query again to compare against our baseline and see how query performance was affected.
/*
Do a select to look
at the CPU time and I/Os
After Compression
*/
SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT * FROM dbo.myTable1

Homework: go watch Paul Randal's MCM videos.  Take the Row Compression demo, add some more columns and fixed length data types to it, and see what you can compress.  Happy Compressing!
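If you want a starting point for that homework, here is a sketch of mine (the table name, columns, and values are made up for illustration) with several fixed length data types you can Row Compress; fill it with a loop like the one above before comparing the sp_spaceused numbers.

CREATE TABLE dbo.myTable2(
          myID int IDENTITY(1,1)
          ,myBigInt bigint
          ,myMoney money
          ,myDate datetime
          ,myChar CHAR(500)
          ,CONSTRAINT PK_myTable2_myID
          PRIMARY KEY CLUSTERED(myID)
          );
GO

INSERT INTO dbo.myTable2(myBigInt, myMoney, myDate, myChar)
VALUES(1, 1.50, GETDATE(), 'myData');
GO

EXEC sp_spaceused 'dbo.myTable2';
GO

ALTER TABLE dbo.myTable2
REBUILD WITH (DATA_COMPRESSION = ROW);
GO

EXEC sp_spaceused 'dbo.myTable2';
GO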

Thanks,

Brad