Tuesday, July 2, 2013

T-SQL Tuesday #44 The Second Chance

Hello Dear Reader!  This is the first Tuesday of the month and you know what that means.  It’s time to announce the T-SQL Tuesday Topic of the month!  This is your opportunity to participate in the largest SQL Blog party on the intrawebs. 

T-SQL Tuesday is an event started by Adam Machanic(@AdamMachanic| Blog) back in 2009.  The basic idea one blogger hosts the party and others participate.  We announce the topic the first Tuesday of the month, July 2nd 2013 for today, and everyone will post their blogs on the second Tuesday of the month, July 9th 2013 for the actual posts.  This month the host is none other than ME!

I love T-SQL Tuesday, there is always so much to write about.  Our world of technology changes so fast.  Each of us has the daily constraints of a life and a job as well.  Sometimes it is great to have a topic to write on so you can express your point of view or get the opportunity to dive a little deeper into an area of SQL that may have piqued your interest.  Equally as wonderful is reading all the other blogs that people have put together on the subject.  Variety is the spice of life, and will we get it in spades.


“So Balls”, you say, “That’s great, but what’s the topic?”

Thanks for keeping me on task Dear Reader!  Without further ado the topic of T-SQL Tuesday #44, Second Chances.

SECOND CHANCES



 As a DBA or a Presenter/Speaker we have all had at least one moment we would like back.  The demo didn't work, you were green and got asked a question you now know in your sleep.  You had a presentation in front of a client, and it all went sideways.  Maybe you logged onto the prod server thinking it was dev and dropped something you shouldn't have.  These moments serve not just as painful reminders, but also as powerful instruments for learning.  Would you like another shot at getting it right?  WELL NOW'S YOUR CHANCE!   Or I guess actually your…. Second…. Chance.  Your missions should you choose to accept it, tell me one of the moments you had, and most importantly what you learned from it!

First and foremost the rules. 

Rule 1: Don’t get yourself fired.  If you almost dropped the prod DB last week, truncated an important table, or took down a prod server during critical business hours, and nobody knows it was you & the people you work for read your blog, you should probably avoid writing about it here.  You want to write about events we can look back on and reflect over, not events HR would *love* to know about.

Rule 2: Some Time next Tuesday using GMT, here’s a link to a GMT time convertor, publish your blog post.  For example in the US that would cover 8 pm Monday to 8 pm Tuesday.

Rule 3: Make sure that you include the Image at the top of the page helping to identify your post as a T-SQL Tuesday blog.  Then come back here and post a link in the comments so I can find them.  Before the end of the week I'll do a round up of all the blogs. 

Extra Credit!

Tweet your blog with the hash tag #tsql2sday & go read someone else’s blog on the subject!
As Always Dear Reader, Thanks for stopping by and I’ll see you next Tuesday!

Thanks,

Brad

Tuesday, June 11, 2013

T-SQL Tuesday #43 Crossing the Streams

Hello Dear Reader!  This post finds me back in Boston getting ready for the Pragmatic Works Performance Tuning Workshop.  But that is not the reason for the blog.  You see today is T-SQL Tuesday!  T-SQL Tuesday is the brain child of Adam Machanic (@AdamMachanic | Blog).  The idea is once a month we the SQL Community have a blog party where someone hosts a topic and we all write.  This month the host is none other than the Rob Farley (@Rob_Farley | Blog).  So without further ado, Rob what is our topic?

The topic is Plan Operators. If you ever write T-SQL, you will almost certainly have looked at execution plans (if you haven’t, go look at some now. I mean really – you should be looking at this stuff). As you look at these things, you will almost certainly have had your interest piqued by some, and tried to figure out a bit more about what’s going on.”

So Plan Operators it is! 

“So Balls”, you say, “What will you write about today?”

Excellent question Dear Reader!  Adam did a fantastic 3 hours presentation on parallelism this past year at the PASS Summit.  One of the biggest things I learned to watch was the Repartition Streams operator to see how rows were distributed across multiple cores.  It was a nice AHA moment for me, and I wanted to pass it along to you!

CROSSING THE STREAMS



The operator itself is purely logical and is only ever used in parallel plans.  When it happens you know that an operation is occurring that requires a Consumer and a Producer Thread.  The producer thread reads input rows from its subtree, assembles the rows into packets, which are then placed in the consumer subtree. The Repartition Streams operator is a push operator that consumes multiple streams of records.  The format of the records is unchanged as well as the contents.  By using this model it allows the producer and consumer threads to execute independently, but worry not there is flow control to prevent a fast producer from flooding a consumer with excessive packets.

In parallelism you use multiple cores to achieve more.  If you notice a bitmap filter after a Repatriation Streams operator, that means the number of rows in the output is going to be reduced.  You will notice changes in the Argument column if your table is partitioned or if the output is ordered.


SQL Server supports 5 different ways that producers stream results to consumers.  Broadcast, Demand, Hash, Range, and Round Robin.  These different streams partitions the data across the threads.

Broadcast sends all rows to all consumer threads. Demand is normally used with partitioned tables and is the only type of exchange that uses a pull rather than a push.  Hash evaluates a hash function to determine where to send rows using one or more columns from a row.  Range uses a range function on one column in the row to determine where to send each packet, this type is rare.  Round Robin this alternates sending packets of rows to the next consumer thread waiting in a sequence.

We’ve discussed it now we’ll actually take a look at a parallel query and how the repartition is being used.  Let’s start out with a query.  I’ve got a large table in my Adventure works that I’ll be blogging about very soon, more on that later.  Today I’m using it as the basis for my query.  We’re going to do a simple aggregate.

set statistics profile on

select
     p.EnglishProductName
     ,sum(salesAmount) as ProductSales
     ,avg(unitprice) as avgUnitPrice
     ,count(OrderQuantity) totalProductsSold
from
     dbo.factinternetsales_big fb
     inner join dbo.dimproduct p
     on fb.ProductKey=p.ProductKey
group by p.EnglishProductName
Option (MAXDOP 4)


The goal here is to select enough data that SQL Server will produce a plan that will push the cost to a parallel plan, and in doing so will read in data using multiple threads.  In this case we will get two repartition streams operators in our statements.  Let’s start off looking at the query plan. We’ll also get the output of the set statistics profile on, but we’ll cover that in a minute.


Great so we’ve got our query plan.  The icon doesn’t tell us much in and of itself.  When you click on it and hover over it we get a little more information.


We see that we pulled back over 10.5 Million rows, We see the Partitioning Type is Hash, and that our estimated subtree cost was high enough for us to get a parallel plan.  The trick is to hit F4 while you have selected to get even more information.


F4 opens the Properties tab.  In my laptop I’ve got one CPU, two cores, and Hyperthreading is turned on. So I should see a total of 5 active Threads.  Threads 0 – 4.  Thread 0 will always have 0 rows associated with it.  The reason for this is because Thread 0 is a control node.  It is overseeing all the work on the individual threads.


By expanding the Actual Number of Rows we can see that this was somewhat evenly distributed.  Thread 0 is our control Thread so it should be sitting at 0 rows.  Thread 1 had only 1.7 million rows, Thread 2 had 3.6 Million, Thread 3 had 2.3 million, and Thread 4 had 2.9 million.  


We can see there was a skew, based on the hash.  Looking further down at our properties we can see that this was a hash partition, confirmed from our graphical portion of the plan, but we can see the hash key was on the ProductKey column.  This was done as we were gathering data to prepare for our join.

Looking at the output from the set statistics profile on we can see the same information to validate the type of repartition that we had.



Examining the output even further you will see a second Repartition of the Streams, also using a Hash Partition based on the EnglishProductName (because that’s what we grouped on in our T-SQL Query).  If you use F4 to look at the properties you’ll see we’ve greatly reduced the number of columns in by the time we get to this part of the plan.


So what does this all mean?  Sometimes Parallelism messes up.  Sometimes we do not get an even breakdown of rows across threads.  When this occurs we can take steps to address it.  Adam cover’s this in depth in his presentation, and I’ll have to cover this another day Dear Reader, because the plane is about to board, and I’m Boston bound!  But now you know where to look to find out how your rows are being partitioned when they are parallelized!

As always Thanks for stopping by!

Thanks,

Brad




Wednesday, May 22, 2013

Speaking At the PASS Summit!


I’m Speaking at The PASS Summit 2013!

Hello Dear Reader!  Big news.  I’ve been selected as a speaker for the SQL PASS Summit 2013 in beautiful Charlotte NC!  I graduated High School and College in North Carolina.  

My roommate and best friend lived there and I made the trip with him to his family’s house on holidays.  I’ve not been back for several years, and I’m really looking forward to being in Charlotte for more than just an airplane stop.

Getting selected for the Summit is a huge honor.  My initial reaction….. 

 
Upon Learning He Was Speaking at the PASS Summit, SQLBalls  had this reaction


Excited, doesn’t begin to describe it Dear Reader.  My session this year will be different for the first time since I started submitting to PASS.  The past two years I did a Deep Dive on Page & Row Compression.  I love this presentation, and it is a lot of fun to give.  This year I got selected for something different. 

“So Balls”, you say, “What did you get selected for, why so excited?”

Excellent question Dear Reader!  The excitement?  This session is not just a session.  It’s a Spotlight Session.  That means 90 minutes and a little more pressure to do a really good job.  The Topic Data Internals Deep Dive.  Here’s the abstract:

Data Internals Deep Dive

As a DBA we learn backwards.  Most of the time you get a DBA job and you have Databases, that have tables, and in those tables you have data. Nobody tells you that we have records, stored on Pages, grouped in Extents, assigned by Allocation Bitmaps, with IAM Pages and Allocation Units that span Partitions and Filegroups.  Today is your lucky day, we will start at the bottom and build our way up.  We'll cover byte swapping, bitmaps, and dive a little into decoding Hex & binary.  Working to give you a full understanding of database internals, and how you use them every day! 


As much as I love data internals, the really cool part?  We have no idea when SQL V-Next is going to land.  If it is before the Summit, then expect we’ll cover some of those internal components as well!   Guess we’ll just have to see. 


As always Thanks for stopping by!

Thanks,

Brad