Friday, September 2, 2011

Partitioning a Heap

http://www.flickr.com/photos/wouter28mm/3151414412/

On a busy day I sometimes don't get a chance to check Twitter. I've written before about what a wonderful place Twitter is to learn, Top 5 Reasons You Should Be On Twitter, and today I happened to glance down at just the right moment.

Grant Fritchey (@GFritchey | Blog) asked a question on #SQLHelp, and I thought I knew the answer, YAY, but it turned out I was wrong. Grant asked whether you needed a primary key in order to partition a table. I originally replied that I thought you had to have a Clustered Index in order to partition a table.

Some of the top minds in the SQL field quickly stepped in to give Grant the answer he needed. And like any time you get something wrong, it was a wonderful opportunity to learn. The exchange went like this (you have to read it from the bottom up):






WOW!  That is a lot of information in a very short space.  Whenever I have used partitioning, I've always used it on a Primary Key Clustered Index.

It's a technology I've used and am familiar with, but not one I've played with just to see what could happen.  Amit Banerjee (@banerjeeamit | Blog), Aaron Nelson (@SQLVariant | Blog), Paul White (@SQL_Kiwi | Blog), and Dave Levy (@Dave_Levy | Blog) are some pretty ridiculously smart guys.  If they said it could be done, I believed them.  However, that's not enough.  You shouldn't take it on faith alone.  You should go out there and do it yourself.



So with that, Dear Reader, I figured I'd work up some code we could both play with that demonstrates what the experts were talking about and lets us prove it out at the same time.





CREATE OUR DEMO_PARTITION DATABASE

Let’s Create our Database
/*
Create our Database
That we will use for the
Demo
*/
USE master;
Go
IF EXISTS(SELECT name FROM sys.databases WHERE Name=N'demo_Partition')
     BEGIN
          DROP Database demo_Partition
     END
    
CREATE DATABASE demo_Partition
GO
Then we will add some Logical File Groups
/*
Add Filegroups
*/
ALTER DATABASE demo_Partition
ADD FILEGROUP FG2
GO
ALTER DATABASE demo_Partition
ADD FILEGROUP FG3
GO
ALTER DATABASE demo_Partition
ADD FILEGROUP FG4
GO
ALTER DATABASE demo_Partition
ADD FILEGROUP FG5
GO
Then we will associate those Logical File Groups with some Physical Data Files
/*
Add Files and
associate to filegroups
*/
ALTER DATABASE demo_Partition
ADD FILE
(
     NAME=data_FG2,
     FILENAME='<Insert directory Path>\demo_Partition_FG2.ndf'
) TO FILEGROUP FG2;
GO
ALTER DATABASE demo_Partition
ADD FILE
(
     NAME=data_FG3,
     FILENAME='<Insert directory Path>\demo_Partition_FG3.ndf'
) TO FILEGROUP FG3;
GO
ALTER DATABASE demo_Partition
ADD FILE
(
     NAME=data_FG4,
     FILENAME='<Insert directory Path>\demo_Partition_FG4.ndf'
) TO FILEGROUP FG4;
GO
ALTER DATABASE demo_Partition
ADD FILE
(
     NAME=data_FG5,
     FILENAME='<Insert directory Path>\demo_Partition_FG5.ndf'
) TO FILEGROUP FG5;
GO

USE demo_Partition
GO

CREATE OUR PARTITION FUNCTION & SCHEME

Now that we've got our Database we need to create our Partition Function and our Partition Scheme.  Our Partition Function defines a set of boundary values, with a RANGE LEFT or RANGE RIGHT direction, for a particular data type.  In this example we'll be using an integer data type and we will be doing a RANGE LEFT for the values 2000, 4000, and 6000.

With a LEFT range each boundary value belongs to the partition on its left.  That means anything <=2000 will go in the Filegroup we associate with the first range of the function, 2001 to 4000 in the second, 4001 to 6000 in the third, and anything above 6000 in the fourth.

So we should end up with 4 partitions: 1 to 2000, 2001 to 4000, 4001 to 6000, and 6001 & up.


/*
Create Partition Function
*/
CREATE PARTITION FUNCTION compDemoPartionFunc(INT)
AS RANGE LEFT FOR VALUES(2000, 4000, 6000)
GO
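
If you want to prove that boundary behavior to yourself before we build anything, the $PARTITION function will tell you which partition number a given value maps to for our function.  Here's a quick sanity check (just a sketch; it assumes the function we created above):
/*
Sanity check: which partition number does
each value map to for our function?
*/
SELECT
     $PARTITION.compDemoPartionFunc(2000) AS [2000 maps to]   --Partition 1
     ,$PARTITION.compDemoPartionFunc(2001) AS [2001 maps to]  --Partition 2
     ,$PARTITION.compDemoPartionFunc(6000) AS [6000 maps to]  --Partition 3
     ,$PARTITION.compDemoPartionFunc(6001) AS [6001 maps to]; --Partition 4
GO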
Now we will create our Partition Scheme, which we will use to associate our Partition Function with our Logical Filegroups.
/*
Create Partition Scheme
*/
CREATE PARTITION SCHEME compDemoPartitionPS
AS PARTITION compDemoPartionFunc
TO(fg2, fg3,fg4, fg5);
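
Before we build any tables, we can peek at the catalog views to confirm the scheme really did map one partition to each of our Filegroups.  This is just a quick sketch against sys.partition_schemes, sys.destination_data_spaces, and sys.filegroups:
/*
Which Filegroup backs each partition
of our new Partition Scheme?
*/
SELECT
     ps.name AS [Partition Scheme]
     ,dds.destination_id AS [Partition Number]
     ,fg.name AS [Filegroup Name]
FROM sys.partition_schemes ps
     JOIN sys.destination_data_spaces dds
     ON dds.partition_scheme_id = ps.data_space_id
     JOIN sys.filegroups fg
     ON fg.data_space_id = dds.data_space_id
WHERE ps.name = N'compDemoPartitionPS'
ORDER BY dds.destination_id;
GO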




PARTITION A PRIMARY KEY CLUSTERED INDEX


http://www.flickr.com/photos/21684795@N05/4406836491/
Now that we've done all this, let's start with the trusty standby of partitioning a clustered Primary Key.  First let's create our table, including our Primary Key Clustered Index.  We'll drive to the basket and slam dunk this one.



IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myTable1')
BEGIN
     DROP TABLE dbo.myTable1
END

CREATE TABLE myTable1(
     myID INT IDENTITY(1,1),
     productName char(800) DEFAULT 'some product',
     productSKU varCHAR(500) DEFAULT 'Product SKU',
     productDescription varCHAR(max) DEFAULT 'Here is a Generic Product Description',
     Comments TEXT DEFAULT 'here are some generic comments',
     CONSTRAINT PK_myTable1_myID
     PRIMARY KEY CLUSTERED(myID)
) ON compDemoPartitionPS(myID);  


Now let's populate our data with 10,000 rows.

/*
Let's populate our Primary Key
Clustered Index with some data
*/
DECLARE @i INT
SET @i=0

BEGIN TRAN
WHILE (@i<10000)
     BEGIN
          INSERT INTO myTable1 DEFAULT VALUES;
          SET @i = @i +1

     END
COMMIT TRAN

Now let's take a look at our breakdown.  We don't want to just scratch the surface here; we want to dig deep and make sure that all of our rows are on the Filegroups that we selected for them.  To do this we'll be looking at sys.destination_data_spaces and joining to sys.partition_schemes, as this view holds the pointers for our Partition Scheme.

SELECT
     so.name AS [Table Name]
     ,si.name AS [Index Name]
     ,ds.name AS [Filegroup Name]
     ,dds.data_space_id AS [partition number]
     ,sp.rows
     ,dds.partition_scheme_id
     ,ps.name AS [partition scheme]
FROM
     sys.data_spaces ds
     LEFT JOIN sys.destination_data_spaces dds
     ON (ds.data_space_id = dds.data_space_id)
     LEFT JOIN sys.partition_schemes ps
     ON (ps.data_space_id = dds.partition_scheme_id)
     LEFT JOIN sys.partitions sp
     ON dds.destination_id = sp.partition_number
     LEFT JOIN sys.objects so
     ON sp.object_id = so.object_id
     LEFT JOIN sys.indexes si
     ON so.object_id = si.object_id AND sp.index_id = si.index_id
WHERE
     so.type = 'U'
ORDER BY
     ds.name
     ,ps.name ASC


As you can tell from above, our partitioned table went onto the Logical Filegroups we created earlier.  Because we associated those Filegroups with physical data files, we can be confident that our data is going to the correct location.



PARTITION A CLUSTERED INDEX (NO PRIMARY KEY)

http://www.flickr.com/photos/39083668@N06/4347173049/

 
So now let’s drive the lane.  We'll do the exact same exercise, but this time with a Clustered Index with no primary key.

Start by creating our table.

/*
Let's create a Heap and then
we will create a Clustered Index
With no Primary Key Constraint
*/
IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myCLX')
BEGIN
     DROP TABLE dbo.myCLX
END

CREATE TABLE myCLX(
     myID INT IDENTITY(1,1),
     productName char(800) DEFAULT 'some product',
     productSKU varCHAR(500) DEFAULT 'Product SKU',
     productDescription varCHAR(max) DEFAULT 'Here is a Generic Product Description',
     Comments TEXT DEFAULT 'here are some generic comments'
) ;


Now let’s create our index and associate it with our partition schema.

CREATE CLUSTERED INDEX clx_Index1 ON dbo.myCLX(myID)
ON compDemoPartitionPS(myID);
go

Now let’s populate our table and look at the results.

/*
let's insert some data
into our Clustered Index
*/
DECLARE @i INT
SET @i=0

BEGIN TRAN
WHILE (@i<10000)
     BEGIN
          INSERT INTO dbo.myCLX DEFAULT VALUES;
          SET @i = @i +1

     END
COMMIT TRAN

We will use the same query and set of DMVs that we used before.




 We see that we didn’t need a primary key to achieve the exact same results as above.
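
If you don't feel like running the big query again, a shorter check against sys.partitions will show the row counts per partition for the new table (a minimal sketch, assuming the myCLX table built above):
/*
Quick check: row counts per partition
for the Clustered Index (no Primary Key) table
*/
SELECT
     partition_number
     ,rows
FROM sys.partitions
WHERE object_id = OBJECT_ID(N'dbo.myCLX')
ORDER BY partition_number;
GO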




PARTITION A HEAP (NO CLUSTERED INDEX)
http://www.flickr.com/photos/lscan/2225285352/


Now that we've hit 2 for 2, let's see if we can go 3 for 3.  Same exercise, but this time we are using a Heap.   Remember, a Heap is simply a table with no Clustered Index on it. We'll start by creating our Heap.



IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myHeapNCI')
BEGIN
     DROP TABLE dbo.myHeapNCI
END

CREATE TABLE dbo.myHeapNCI(
     myID INT IDENTITY(1,1),
     productName char(800) DEFAULT 'some product',
     productSKU varCHAR(500) DEFAULT 'Product SKU',
     productDescription varCHAR(max) DEFAULT 'Here is a Generic Product Description',
     Comments TEXT DEFAULT 'here are some generic comments'
) ON compDemoPartitionPS(myID);  


Then we will populate it with some data.


/*
let's insert some data
into our Heap
*/
DECLARE @i INT
SET @i=0

BEGIN TRAN
WHILE (@i<10000)
     BEGIN
          INSERT INTO dbo.myHeapNCI DEFAULT VALUES;
          SET @i = @i +1

     END
COMMIT TRAN

Now let’s run the same query as before to see how our data is spread out.



Now this is interesting.  Look at the NULL in the Index Name column.  That makes sense; a Heap has no Clustered Index, so there is no index name to show.  However, our Heap is still partitioned.  The home team wins!  And the crowd goes wild!
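
To see that NULL for yourself, you can join sys.partitions to sys.indexes for the Heap.  The Heap shows up with index_id = 0 and no index name, yet it still carries a row count in every partition on our scheme.  A quick sketch, assuming the myHeapNCI table above:
/*
A Heap lives in sys.partitions with index_id = 0
and a NULL name in sys.indexes, but it is still
split across our four partitions.
*/
SELECT
     si.name AS [Index Name]     --NULL for a Heap
     ,sp.index_id
     ,sp.partition_number
     ,sp.rows
FROM sys.partitions sp
     JOIN sys.indexes si
     ON si.object_id = sp.object_id
     AND si.index_id = sp.index_id
WHERE sp.object_id = OBJECT_ID(N'dbo.myHeapNCI')
ORDER BY sp.partition_number;
GO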




WRAP UP


http://www.flickr.com/photos/joao/2198486114/

So once again Twitter found a way to get information from the best and the brightest in the SQL Community.  If you had asked me before all this, I would have told you that you had to have a Clustered Index on a table in order to partition it.  I'm not sure where I even picked up that bit of information, but wherever it came from, it was wrong.

I’m glad during a very busy day that I looked at Twitter just in time to learn something new. 

But with new knowledge come new questions: why would you want to create a partitioned Heap?  Why would you create a partitioned Clustered Index without a Primary Key?

The Heap question I could take a stab at, but I wouldn't want to do it without proving out a couple of theories first.  So there we go, Dear Reader, more questions that will hopefully lead to more learning.

Thanks Again,

Brad

Monday, August 29, 2011

SQL Saturday 85 for a Mid Level Business Intelligence Developer



SQL Saturday 85 is just a little under 4 weeks away, and if you are a Mid Level Business Intelligence Developer you may be wondering what SQL Saturday could offer you.  The answer is PLENTY!


“But Balls,” you say, “You’re not a Business Intelligence guy, how would you know if there is a lot to offer?”


That is an excellent point, Dear Reader.  I'm not a BI Developer.  My work with cubes, SSAS, SSRS, and many things in between is slim.  Don't get me wrong, I've worked with them, but not to the extent that I do pure DBA work on a regular basis.  But one of the great advantages of putting together the schedule is that I am learning A LOT about my speakers.  And let me tell you who we have speaking.

We have two dedicated BI tracks, a Microsoft Certified Master presenting on Parallel Data Warehouse, a SQL SSAS Maestros Instructor, three SQL Server MVPs presenting on BI topics, and one of the original team members who developed the PowerPivot plug-in for Excel and built most of the client.

And that isn't even mentioning the Pre-Cons!


FULL DAY OF BI TRAINING
http://www.flickr.com/photos/headlouse/1484615917/sizes/m/in/photostream/


We have not one, not two, not three, BUT FOUR of the best Business Intelligence Experts in the game today training you in the full Business Intelligence Stack from start to finish.  These four experts are none other than SQL MVP Jorge Segarra (@SQLChicken | Blog), Mike Davis (@MikeDavisSQL | Blog), Bradley Schacht (@BradleySchacht | Blog), and Kyle Walker (Blog).  All four are out in the field working with clients for Pragmatic Works and doing what you do every day.  But don't take my word for it; let's get the word from the guys.  Gentlemen, what will you be covering in your pre-con?

Abstract:
In this full-day workshop, you'll learn from the author team of Jorge Segarra, Mike Davis, Brad Schacht, and Kyle Walker how to build a data warehouse for your company and support it with the Microsoft business intelligence platform. We'll start with how to design and data model a data warehouse including the system preparation. Then, we'll jump into loading a data warehouse with SSIS. After SSIS, you're ready to roll the data up and provide the slice and dice reporting with SSAS. The team will walk through cube development and data enrichment with things like key performance indicators, which are essential for your future dashboards. Lastly, we will cover how to report against the data warehouse with SSRS including a primer in how to write MDX queries against the SSAS cube.
Take Home Skills:
  1. Practical knowledge of building a Dimensional Model
  2. Designing a simple ETL process using SSIS
  3. Designing a Cube
  4. Designing simple SSRS Reports
  5. Building an integrated process that fully leverages the entire MS BI stack to load a Data Warehouse


Now normally, if you went to a pre-con like this at the PASS Summit, you would be paying hundreds of dollars.  But we are offering this, Dear Reader, for $99, and that includes all the tea and coffee you can drink in a day, plus lunch.  As a Mid Level Business Intelligence Developer, come and see the experts up close.  This is a good way to see what you can become if you work hard and keep learning.  So do not delay!  Join us on Friday, September 23rd.  Click Here to go to the Full Day Business Intelligence Pre-Con site!

THE SCHEDULE

So here we are: you went to the Pre-Con, you got to talk to Jorge, Mike, Bradley, and Kyle up close and personal, and you are brimming with new skills.  What should you tackle next?



Michael Antonovich (@MPAntonovich | Blog) – Introduction to PowerPivot for Excel
Eric Wisdahl (@EricWisdah | Blog) – SSIS Data Flow Buffer Breakdown
Phillip E. Rosen (@BIwPhil) – Biz Intel Dashboards W/Excel OLAP formulas
Robert Biddle (@Robert_Biddle | Blog) – Loading a Data Warehouse with SSIS
William E. Pearson III (@Bill_Pearson) – Becoming DAX: An Introduction...
Brian Mitchell (@brianwmitchell | Blog) – Applying FastTrack & PDW Best Practices to your Data Warehouse


I've had the opportunity to see Michael Antonovich present at MagicPASS, the SQL Server User Group we both call home, and I've seen this presentation.  Let me tell you, if you are not using PowerPivot, by the time he is done you will understand its power and have a few great selling points for your boss.  After a great day of Pre-Con training where this topic will be covered, this session is icing on the SQL cake.

Eric Wisdahl is next up, and he is covering a subject that will be advanced enough to challenge you and pertinent enough to make you WANT to listen to everything he has to say.  He will be covering how the buffers work internally in memory and how they work with SSIS.  SSIS likes to use memory, and Eric will give you some tips on how to troubleshoot your systems and make sure that memory is being used properly.

Phillip Rosen will be covering how you utilize Dashboards, and he will cover the different ways you can view them: saving them locally to the PC using Excel, publishing them using SharePoint, and using Excel OLAP formulas.  Before this session is done he will cover the SharePoint 2010 BI Center and give you some references to continue your learning.


Robert Biddle will be covering something that is a must-know in our field: how to load a Data Warehouse using SSIS.  After all, you've got your OLTP database and you need to put the data in a Data Warehouse, but what is the best way to go about it?  Maybe you've done this before, but it never hurts to get the tips and tricks from the experts.  Go check out Robert's session and walk out of it with more confidence for the next time you need to pull some SSIS tricks out of your hat.

William E. Pearson III will give you an introduction to DAX.  Being self-taught, oftentimes I find that I'm missing some fundamentals.  A SQL Saturday is always a great opportunity to find experts in the field and see how they go about doing things.  Whether you already know DAX or you are a beginner, wouldn't you want to get your tips from a SQL Server MVP?  Well, look no further, because William is a SQL Server MVP.  Go to his session and walk out with the tips and tricks of an MVP under your belt.

Brian Mitchell is a Microsoft Certified Master.  Brian Mitchell is a Microsoft employee.  Brian Mitchell is a Microsoft Premier Support Field Engineer who specializes in Parallel Data Warehouse.  Maybe your company isn't looking at PDW, maybe it is.  Either way, it would not hurt to go learn what a Master has to teach.  I saw Brian at SQL Rally earlier this year, and it was one of the best sessions I sat in.  He is smart, he is quick, and he knows his stuff inside and out.  What better way to end a great day of learning than with a Master?


Okay, Dear Reader, GO sign up for the SQL Saturday Full Day BI Pre-Con and then go register for SQL Saturday 85.  I look forward to seeing you on Friday, September 23rd and Saturday, September 24th for two incredible days of learning!

Thanks,

Brad