Friday, September 2, 2011

Partitioning a Heap

http://www.flickr.com/photos/wouter28mm/3151414412/

During a busy day I sometimes don't have a chance to check Twitter.  I've written before about what a wonderful place to learn Twitter is, in Top 5 Reasons You Should Be On Twitter, and this time I happened to glance down at just the right moment. 

Grant Fritchey (@GFritchey | Blog) asked a question on #SQLHelp, and I thought I knew the answer, YAY, but it turned out I was wrong.  Grant had asked whether you needed a primary key in order to partition a table.  I originally replied that I thought you had to have a Clustered Index in order to partition a table.

Some of the top minds in the SQL field quickly stepped in to give Grant the answer he needed.  And, as with any time you get something wrong, this was a wonderful opportunity to learn.  The exchange went like this (you have to read it from the bottom up):






WOW!  That is a lot of information in a very short space.  Whenever I've used partitioning, it has always been on a Primary Key Clustered Index. 

It's a technology I've used and am familiar with, but not one I've played with just to see what could happen.  Amit Banerjee (@banerjeeamit | Blog), Aaron Nelson (@SQLVariant | Blog), Paul White (@SQL_Kiwi | Blog), and Dave Levy (@Dave_Levy | Blog) are some pretty ridiculously smart guys.  If they said it could be done, I believed them.  However, that's not enough.  You shouldn't take it on faith alone; you should go out there and do it yourself.



So with that, Dear Reader, I figured I'd work up some code we could both play with, code that demonstrates what the experts were talking about and lets us prove it out at the same time.





CREATE OUR DEMO_PARTITION DATABASE

Let’s Create our Database
/*
Create our Database
That we will use for the
Demo
*/
USE master;
GO
IF EXISTS(SELECT name FROM sys.databases WHERE name=N'demo_Partition')
     BEGIN
          DROP DATABASE demo_Partition;
     END

CREATE DATABASE demo_Partition;
GO
Then we will add some Logical File Groups
/*
Add Filegroups
*/
ALTER DATABASE demo_Partition
ADD FILEGROUP FG2
GO
ALTER DATABASE demo_Partition
ADD FILEGROUP FG3
GO
ALTER DATABASE demo_Partition
ADD FILEGROUP FG4
GO
ALTER DATABASE demo_Partition
ADD FILEGROUP FG5
GO
Then we will associate those Logical File Groups with some Physical Data Files
/*
Add Files and
associate to filegroups
*/
ALTER DATABASE demo_Partition
ADD FILE
(
     NAME=data_FG2,
     FILENAME='<Insert directory Path>\demo_Partition_FG2.ndf'
) TO FILEGROUP FG2;
GO
ALTER DATABASE demo_Partition
ADD FILE
(
     NAME=data_FG3,
     FILENAME='<Insert directory Path>\demo_Partition_FG3.ndf'
) TO FILEGROUP FG3;
GO
ALTER DATABASE demo_Partition
ADD FILE
(
     NAME=data_FG4,
     FILENAME='<Insert directory Path>\demo_Partition_FG4.ndf'
) TO FILEGROUP FG4;
GO
ALTER DATABASE demo_Partition
ADD FILE
(
     NAME=data_FG5,
     FILENAME='<Insert directory Path>\demo_Partition_FG5.ndf'
) TO FILEGROUP FG5;
GO

USE demo_Partition
GO

CREATE OUR PARTITION FUNCTION & SCHEME

Now that we've got our database we need to create our Partition Function and our Partition Scheme.  Our Partition Function defines a set of boundary values, ranging right or left, for a particular data type.  In this example we'll be using an integer data type and a LEFT range with boundary values 2000, 4000, and 6000.

With a LEFT range each boundary value belongs to the partition on its left.  That means anything <=2000 goes to the first partition and the Filegroup we associate with it, anything >2000 and <=4000 goes to the second, anything >4000 and <=6000 goes to the third, and anything >6000 goes to the fourth.

So we should end up with 4 partitions covering 1 to 2000, 2001 to 4000, 4001 to 6000, and 6001 & up.


/*
Create Partition Function
*/
CREATE PARTITION FUNCTION compDemoPartionFunc(INT)
AS RANGE LEFT FOR VALUES(2000, 4000, 6000)
GO
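If you want to see the LEFT boundary behavior for yourself, the $PARTITION function will tell you which partition number a given value would land in.  Here's a quick sanity check you can run once the function above exists:
/*
Quick check of the LEFT range boundaries.
$PARTITION returns the partition number a value maps to.
*/
SELECT
     $PARTITION.compDemoPartionFunc(2000) AS [2000 lands in] --partition 1, the boundary belongs to the left
     ,$PARTITION.compDemoPartionFunc(2001) AS [2001 lands in] --partition 2
     ,$PARTITION.compDemoPartionFunc(6000) AS [6000 lands in] --partition 3
     ,$PARTITION.compDemoPartionFunc(6001) AS [6001 lands in]; --partition 4
GO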
Now we will create our Partition Scheme, which we will use to map the partitions of our Partition Function to our Logical File Groups.
/*
Create Partition Scheme
*/
CREATE PARTITION SCHEME compDemoPartitionPS
AS PARTITION compDemoPartionFunc
TO (FG2, FG3, FG4, FG5);
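If you want to double-check the mapping, the catalog views will show each partition of our function, its boundary value, and the filegroup the scheme tied it to.  Here's a small sketch of that query (for a LEFT range the boundary shown is the upper limit of its partition, and the last partition has no boundary, so it shows NULL):
/*
Show each partition of our function, its upper boundary,
and the filegroup the scheme mapped it to
*/
SELECT
     pf.name AS [partition function]
     ,ps.name AS [partition scheme]
     ,dds.destination_id AS [partition number]
     ,prv.value AS [upper boundary]
     ,fg.name AS [filegroup]
FROM sys.partition_schemes ps
     JOIN sys.partition_functions pf
     ON ps.function_id = pf.function_id
     JOIN sys.destination_data_spaces dds
     ON dds.partition_scheme_id = ps.data_space_id
     JOIN sys.filegroups fg
     ON fg.data_space_id = dds.data_space_id
     LEFT JOIN sys.partition_range_values prv
     ON prv.function_id = pf.function_id
     AND prv.boundary_id = dds.destination_id
WHERE pf.name = N'compDemoPartionFunc'
ORDER BY dds.destination_id;
GO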




PARTITION A PRIMARY KEY CLUSTERED INDEX


http://www.flickr.com/photos/21684795@N05/4406836491/
Now that we’ve done all this, let’s start with the trusted standby of partitioning a clustered Primary Key.  First let's create our table, including our primary key and clustered index.  We’ll drive to the basket and slam dunk this.



IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myTable1')
BEGIN
     DROP TABLE dbo.myTable1
END

CREATE TABLE myTable1(
     myID INT IDENTITY(1,1),
     productName char(800) DEFAULT 'some product',
     productSKU varCHAR(500) DEFAULT 'Product SKU',
     productDescription VARCHAR(max) DEFAULT 'Here is a generic product description',
     Comments TEXT DEFAULT 'here are some generic comments',
     CONSTRAINT PK_myTable1_myID
     PRIMARY KEY CLUSTERED(myID)
) ON compDemoPartitionPS(myID);  


Now let's populate our table with 10,000 rows.

/*
Let's populate our Primary Key
Clustered Index with some data
*/
DECLARE @i INT
SET @i=0

BEGIN TRAN
WHILE (@i<10000)
     BEGIN
          INSERT INTO myTable1 DEFAULT VALUES;
          SET @i = @i +1

     END
COMMIT TRAN

Now let's take a look at our breakdown.  We don't want to just scratch the surface here; we want to dig deep and make sure that all of our rows landed on the Filegroups we selected for them.  To do this we'll be looking at sys.destination_data_spaces, joined to sys.partition_schemes, since that view holds the mapping between our partition scheme and its destination filegroups.

SELECT
          so.name AS [Table Name]
          ,si.name AS [Index Name]
          ,ds.name AS [Filegroup Name]
          ,dds.destination_id AS [partition number]
          ,sp.rows
          ,dds.partition_scheme_id
          ,ps.name AS [partition scheme]
FROM
     sys.data_spaces ds
     LEFT JOIN sys.destination_data_spaces dds
     ON (ds.data_space_id = dds.data_space_id)
     LEFT JOIN sys.partition_schemes ps
     ON (ps.data_space_id = dds.partition_scheme_id)
     LEFT JOIN sys.partitions sp
     ON dds.destination_id = sp.partition_number
     LEFT JOIN sys.objects so
     ON sp.object_id = so.object_id
     LEFT JOIN sys.indexes si
     ON so.object_id = si.object_id AND sp.index_id = si.index_id
WHERE
     so.type = 'U'
ORDER BY
     ds.name
     ,ps.name ASC


As you can tell from the output above, our partitioned table went onto the logical Filegroups we created earlier.  Because we associated those Filegroups with physical data files, we can be confident that our data is going to the correct location.
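As an extra cross-check, here's a small sketch that counts the rows in each partition straight from the table itself by running the partitioning column through $PARTITION.  With 10,000 identity values and our boundaries we'd expect 2,000 rows in each of the first three partitions and 4,000 in the last:
/*
Count the rows in each partition directly from the table
*/
SELECT
     $PARTITION.compDemoPartionFunc(myID) AS [partition number]
     ,COUNT(*) AS [row count]
FROM dbo.myTable1
GROUP BY $PARTITION.compDemoPartionFunc(myID)
ORDER BY [partition number];
GO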



PARTITION A CLUSTERED INDEX (NO PRIMARY KEY)

http://www.flickr.com/photos/39083668@N06/4347173049/

 
So now let’s drive the lane.  We'll do the exact same exercise, but this time with a Clustered Index with no primary key.

Start by creating our table.

/*
Let's create a Heap and then
we will create a Clustered Index
With no Primary Key Constraint
*/
IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myCLX')
BEGIN
     DROP TABLE dbo.myCLX
END

CREATE TABLE myCLX(
     myID INT IDENTITY(1,1),
     productName char(800) DEFAULT 'some product',
     productSKU varCHAR(500) DEFAULT 'Product SKU',
     productDescription VARCHAR(max) DEFAULT 'Here is a generic product description',
     Comments TEXT DEFAULT 'here are some generic comments'
) ;


Now let's create our index and associate it with our partition scheme.

CREATE CLUSTERED INDEX clx_Index1 ON dbo.myCLX(myID)
ON compDemoPartitionPS(myID);
go

Now let’s populate our table and look at the results.

/*
let's insert some data
into our Clustered Index
*/
DECLARE @i INT
SET @i=0

BEGIN TRAN
WHILE (@i<10000)
     BEGIN
          INSERT INTO dbo.myCLX DEFAULT VALUES;
          SET @i = @i +1

     END
COMMIT TRAN

We will use the same query and set of catalog views that we used before. 




 We see that we didn’t need a primary key to achieve the exact same results as above.
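If you want to double-check that this really is a plain clustered index sitting on the partition scheme, here's a quick sketch against sys.indexes; is_primary_key should come back 0 and the data space should be our partition scheme:
/*
Confirm the clustered index on myCLX is partitioned
and is not backing a primary key constraint
*/
SELECT
     i.name AS [index name]
     ,i.type_desc
     ,i.is_primary_key
     ,ds.name AS [data space]
     ,ds.type_desc AS [data space type]
FROM sys.indexes i
     JOIN sys.data_spaces ds
     ON i.data_space_id = ds.data_space_id
WHERE i.object_id = OBJECT_ID(N'dbo.myCLX');
GO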




PARTITION A HEAP (NO CLUSTERED INDEX)
http://www.flickr.com/photos/lscan/2225285352/


Now that we're 2 for 2, let's see if we can go 3 for 3.  It's the same exercise, but this time we are using a Heap.  Remember, a Heap is simply a table with no Clustered Index on it.  We'll start by creating our Heap.



IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myHeapNCI')
BEGIN
     DROP TABLE dbo.myHeapNCI
END

CREATE TABLE dbo.myHeapNCI(
     myID INT IDENTITY(1,1),
     productName char(800) DEFAULT 'some product',
     productSKU varCHAR(500) DEFAULT 'Product SKU',
     productDescription VARCHAR(max) DEFAULT 'Here is a generic product description',
     Comments TEXT DEFAULT 'here are some generic comments'
) ON compDemoPartitionPS(myID);  


Then we will populate it with some data.


/*
let's insert some data
into our Heap
*/
DECLARE @i INT
SET @i=0

BEGIN TRAN
WHILE (@i<10000)
     BEGIN
          INSERT INTO dbo.myHeapNCI DEFAULT VALUES;
          SET @i = @i +1

     END
COMMIT TRAN

Now let’s run the same query as before to see how our data is spread out.



Now this is interesting.  Look at the NULL in the Index Name column.  It makes sense: a Heap has no clustered index, so there is no index name to show.  However, our Heap is still partitioned.  The home team wins!  And the crowd goes wild!
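If you want to see that in the metadata, here's a small sketch: a heap shows up in sys.partitions as index_id 0, and each partition of our partitioned Heap gets its own row:
/*
A heap is tracked in sys.partitions as index_id = 0,
one row per partition
*/
SELECT
     OBJECT_NAME(p.object_id) AS [table name]
     ,p.index_id
     ,p.partition_number
     ,p.rows
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID(N'dbo.myHeapNCI')
ORDER BY p.partition_number;
GO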




WRAP UP


http://www.flickr.com/photos/joao/2198486114/

So once again Twitter found a way to get information from the best and the brightest in the SQL Community.  If you had asked me before all this, I would have told you that you had to have a Clustered Index on a table in order to partition it.  I'm not sure where I even picked up that bit of information, but wherever it came from, it was wrong.  

I'm glad that, during a very busy day, I looked at Twitter just in time to learn something new. 

But with new knowledge come new questions: why would you want to create a partitioned heap?  Why would you create a partitioned clustered index without a primary key? 

I could take a stab at the heap question, but I wouldn't want to do it without proving out a couple of theories first.  So there we go, Dear Reader: more questions that will hopefully lead to more learning.

Thanks Again,

Brad

2 comments:

  1. Great post!!! I had been questioning whether you could partition a heap, and here is the answer. Thank you for sharing. I just wonder how inserts/updates perform on a partitioned heap vs. a partitioned table with a clustered index.

  2. Hi Kevin, I don't think you would want to use a partitioned heap on a table that has insert and update activity on it.

    In an insert/update scenario the impact would be negative. However, I could see partitioning a Heap in a Data Warehouse or some other database if it was large enough and read-only.

    That way you could load data into one partition at a time instead of locking the entire table against read operations while a bulk load occurred, like on a data warehouse.

    I originally wrote this as a way to test and learn what Paul, Grant, Aaron, Amit, and Dave were all talking about. I have yet to actually partition a heap in production. As I get exposed to more data warehousing and BI client operations I expect that will eventually change.
