Showing posts with label SQL University. Show all posts
Showing posts with label SQL University. Show all posts

Monday, October 10, 2011

SQL University Compression Week: Lesson 1 About Compression

Hello Dear Reader!  Welcome to Lesson 1 on Compression.  I want to thank the Chancellor of SQL University Jorge Segarra (@SQLChicken | Blog)  for the opportunity to be your  Professor of Compression.  Before we talk about how to turn it on, where we should apply it, or even the types of compression that there are available to us there are two burning questions that we need to answer.  What is Compression and Why should I use it?

Let’s get a 40,000 foot overview of Compression.  When the good folks at Microsoft were looking at adding Compression into SQL Server the first thing they had to answer was, What kind of Compression will we use?  If you are new to the Compression game then you may not know that there are a lot of different ways to Compress Data.  Sure they all do the same thing, reduce the overall size and foot print of the Data you are Compressing, but we use different types of Compression differently.

On one end of the Scale you have Compression that reduces the overall size of the file but it does so buy stripping out what it deems as unneeded or necessary data.  The best example I can think for this is photography.   A RAW photograph file is huge.   But it contains every last bit of detail that your camera can capture.   When you look at some of the breathtaking pictures that line the magazines of scenic landscapes, athletic accomplishment, or new events, those photographs were taken in a format to capture all of the data.  When you browse USA Today on your mobile phone, or on your computer you receive this photograph in JPEG, GIF, or PNG format not in RAW.  The reason it’s smaller.  

“But Balls,” you say, “How did it get smaller?” 

The answer Dear Reader is Compression.  JPEG, GIF, and PNG are Compressed file formats for pictures.  Why do we Compress pictures?  Because the human eye can only detect so much detail.  Some people can detect more than others, but eventually we hit a threshold were we cannot detect anything more.  So the extra Data becomes unnecessary overhead at that point.  So we strip it out.  

Jim Grey went through the trouble to come up with ACID, We Need to Talk Your Database is on ACID, and Microsoft had adopted it into the core programming of SQL we can assume that they would not want to strip out data.  As Database professionals we would hope that we are preserving our data, and we are.

So we know the kind of Compression that we will use must preserve our Data,  the next question is how hard do we want to work for it?   We could use something like Winzip, and Compress our files down to the lowest possible value, preserving our data and getting it really really small.  However that only works if you aren’t planning on doing that every time you want to read it.  Have you ever zipped a 2 GB file?  It takes a little while.  Have you ever unzipped a 2 GB file, takes a little while too.  Imagine if every time you wanted to read your multi GB Database you had to sit through the same scenario.

We are still operating a Database, while reducing our Data footprint is laudable we wouldn’t want to do it at the expense of performance.  As a matter of fact if at all possible we would want performance to increase not decrease.

I know we are picky and asking a lot, Shrink the data, but don’t loose any, make it smaller but easy to access, and if possible make it work faster (in some not in all cases).   Microsoft listened and the particular flavor of Compression we got was Page & Row Compression. 

The first stab at Compression actually came up in SQL 2005 SP2 and it was called Vardecimal Compression.  It only worked on the Numeric and Decimal Data Types, but it used concepts that we will be reviewing when we discuss Row Compression.   In SQL 2008 we got Page and Row Compression and Vardecimal Compression was immediately deprecated.  You can still use Vardecimal Compression in 2008, but if you really want to use Compression you will get a lot more bang for your buck with Row Compression, more on that tomorrow.


Now that we understand the basic idea of what Compression is, the most important question you will ask is Why should I use it!  Disks are cheap.  I hear it all the time.  You’ve probably seen the ad’s for 1 TB hard drives for sale on Black Friday for $30.  You are correct Disks are cheap and they are getting cheaper all the time, however, I bet you don’t go and buy the Hard drives for your server on Black Friday.

As a matter of fact if you have SAN, Shared Attached Network storage?, then you need to buy different disks altogether.  You probably need to have a port in the back of them that allows for a HBA, Host Bus Adaptor, that takes a fiber cable.  The reason?  If you are using a SAN device you want the quickest possible connection for your database server.   And those HBA ports raise the price of those hard drives.  And if you are using Solid State hard drives on a SAN that price goes up even more.  Now we are talking from 10’s of dollars to 100’s of dollars.  And if you have a SAN your not going to have just one hard drive in there.  You will have multiple drives striped in some sort of RAID Array.  How cheap does it sound now?

But wait there’s more.  If you have a lot of SAN’s that can get pretty complicated.  Your probably going to need a SAN administrator.  And you may need more than one if you are a big enough shop.  So now we’re not just talking hardware but people as well.

All this for a database?  One lowly single database?  I would say most of us don’t have one Database to worry about.   You probably have a Prod (Production) environment, maybe a UAT/Staging, a Test, a Dev (Development) or even a DR/COOP.   You may have a Data warehouse as well.   You could have multiple copies of your database for multiple projects.

And you probably don’t have just one Application you are supporting.  Most places have 10’s, 100’s, and in some cases even 1,000’s of Database across just as many instances that they maintain.  When you start thinking about the scope that your data touches, you start to realize just how much you have out there.

As Database professionals we try to normalize our data to prevent data duplication, but by nature of the change control process we will always have it.  On top of that there are 3 performance bottle necks we normally encounter; CPU, Memory, and IO (Input/Output). 
The option to re-write code is always there, but how often have you been in a situation where management decides to throw heavier hardware at a problem because it is cheaper and more cost effective than a re-write?  If you haven’t been there you will eventually.  You can toss more CPU on a server to get more processing power.  More RAM to get more memory and to speed up operations because we can hold more in memory.  But with Disk, if I throw more disk at a server I just get more storage room, it doesn’t help me with performance.

Page and Row Compression can do just that.  The cost to the server is CPU cycles which most database servers have to spare.  *Just a side note if your server is averaging around 60% CPU utilization or higher I would not recommend using Compression without thorough testing to ensure it would not reduce existing performance.  We reduce the size of the structure not only on disk but in memory.  Pages of a databases are compressed on DISK and in Memory.  This is such an important concept I’m going to say it one more time, on DISK and in MEMORY!  So not only do you save room, reduce the size of a table or index.  Make it quicker to read off of disk because of a reduced size, but you take up less Page Cache because your Data Pages are compressed in memory.

 So  you can reduce your data foot print across multiple environments, possibly speed up performance by reducing the number of reads to process a query, and possibly use your memory cache more efficiently.   I don’t think the question is why should I use Compression, but why am I not using compression.


I’m not going to sell you a bill of goods, that is just snake oil.  There are some serious things that you need to keep in mind when it comes to Compression.
  1. Know Thy DATA!
  2. Take Baselines
  3. Know Thy DATA!

I like to think of Data Compression like Indexes.  You wouldn’t just put an index on a table and never check to see if it is being used.  You would check to see that the query plans you have are still running efficiently, and you would monitor your performance.

You need to do the same thing with Compression.  Know how your queries perform before hand, know how they perform after Compression.  If your CPU shoot’s up you will need to trouble shoot it, we’ll cover this later in the week.  If your data types are not right, don’t compress them in the first place, later in the week as well.


Your Assignment should you choose to accept it is to go read some of the Blogs by the SQL Server Storage Engine Team on the Topic of Compression, Sunil Agwilari is the Project Manager for the Engine Team and he has written a lot on this topic.

Data Compression: Why Do we need it?

Data compression techniques and trade offs



Friday, October 7, 2011

Top 4 Reasons Next Week is a Big Week

Hello Dear Reader, next week is a very big week for me SQL wise.  Family wise I will be missing the heck out of my wife and my kiddos while I’m half a continent away.   Just this last week my Bug lost another tooth, the baby got even cuter (don’t ask me how she does it constantly), my Big Guy got even bigger (seriously my 7 year old has pec’s), and my wife continues to be AWESOME, (the Bug and I had a conference and deliberated on this earlier in the week).  So while I’m missing all the other Balls, I’ll be busy out of my mind. 

“So Balls,” you say, “What’s keeping you so busy?”

No good way to say it other than list it.
1.       My First PASS Summit
2.       I’m sitting at the Bloggers Table at the PASS Summit
3.       I’m Presenting TWO times at the PASS Summit
4.       I’m the Professor of Compression Next Week For SQL University!

Any other week one of those items would keep me fully occupied.   Next week I will be utterly consumed learning, teaching, meeting, and greeting.  So let’s break it down a little.


This needs no introduction.  If you are involved with SQL Server you know that the PASS Summit is the biggest SQL Server Conference in the World.  Not North America, not Europe, not Asia, the WORLD.  The brightest SQL Speakers in the World will be there.  People from all over the Globe, experts from many many nations will be together in one spot.   Top Microsoft Experts, the Cat Team, many Microsoft Developers for the SQL Server Team, you name it the PASS Summit has it.

I’ve wanted to go to this event since I started seeing advertisements for it on, for the last 4 or 5 years.   Getting to watch the streaming keynote last year just made my desire to go grow even more, and now here it is! 

If you have never heard of the PASS Summit then go check out the current website.  As if that isn’t enough I’m attending 2 of the great pre-cons that are being offered.    I will be blogging like a mad man while I’m out there so keep checking back Dear Reader, this should be a fun week!

I’ve blogged previously on The Top 5 Reasons You Should Be On Twitter, but I could add a 6th as well.  Opportunity.  On Twitter it was announced that PASS was accepting applications to sit at the Bloggers table.   I jumped.  Then I emailed the contact listed and WAHOO I was selected!   As a result I’ll have access to blog from a table of Blogger All-STAR’s and SQL MVP’s, and lil ol’ me J!

So not only will I be at the top event for SQL Server, I’ve got a front row seat.  I’ll do live blogging of the Keynotes, and attempt a daily re-cap.  As I snap pictures I’ll make sure to get them uploaded and attached to the blog.


My wife’s best friend had a quote on her Facebook page that I feel describes me to a T, since she told me it I laughed and then I thought, “If I could only get that on a Family Crest”. ….  The quote “I’ve got 2 speeds; Off & M@Ther F#(king ON”.  That has been me my whole life, when I’m down I’m out.  When I’m up and running, there’s no stopping.

During this week I’m presenting a Lightning Round Talk 24-a 5 minute Horror Story, and  the Page & Row Compress Deep Dive.   I’ve been working on a lot of great Demo’s that I can bring to the table.  I presented the Deep Dive recently at SQL Saturday 85, and internally for my DBA team.  I’ve gotten a lot of good feedback, and I’m really excited to give this a 3rd time.   The 5 minute Horror Story is a real life adventure, with appropriate details blurred, from Father’s day 2009.   Show up to the session and you hear all about the wonderful world of a 24-7 on call DBA and why backups are so important.   Not just that you take them, but if something goes wrong that you grab a fresh set.


Jorge Segarra (@SQLChicken | Blog)  is the Chancellor of SQL University.  If you’ve never heard of SQL U then you should head over there and take a look.  It is a web blog that collects the input from the experts in our field and has them write on a new topic a week. 

Think college syllabus but for SQL, with different Topics, and written by MVP’s & Experts.  I’m honored beyond belief to be among them, and starting next week I’ll have a Lead Up to the Deep Dive.  I’m still writing and formulating as we speak and a lot of this will get finished up Sunday to Monday.  But I’ll have a post a day on Compression.  We’ll start up slow and work our way through a set of topics that will help you understand Compression and feel good about taking it to your work and finding ways you can apply it.

Alright Dear Reader the break from blogging is over, time to get to the grind stone and get working!  I hope your week is a good one.