Tuesday, June 11, 2013

T-SQL Tuesday #43 Crossing the Streams

Hello Dear Reader!  This post finds me back in Boston getting ready for the Pragmatic Works Performance Tuning Workshop.  But that is not the reason for the blog.  You see, today is T-SQL Tuesday!  T-SQL Tuesday is the brainchild of Adam Machanic (@AdamMachanic | Blog).  The idea is that once a month we, the SQL Community, have a blog party where someone hosts a topic and we all write.  This month the host is none other than Rob Farley (@Rob_Farley | Blog).  So without further ado, Rob, what is our topic?

“The topic is Plan Operators. If you ever write T-SQL, you will almost certainly have looked at execution plans (if you haven’t, go look at some now. I mean really – you should be looking at this stuff). As you look at these things, you will almost certainly have had your interest piqued by some, and tried to figure out a bit more about what’s going on.”

So Plan Operators it is! 

“So Balls”, you say, “What will you write about today?”

Excellent question Dear Reader!  Adam did a fantastic three-hour presentation on parallelism this past year at the PASS Summit.  One of the biggest things I learned was to watch the Repartition Streams operator to see how rows were distributed across multiple cores.  It was a nice AHA moment for me, and I wanted to pass it along to you!

CROSSING THE STREAMS



The operator itself is purely logical and only ever appears in parallel plans.  When you see it, you know an exchange is occurring that requires producer and consumer threads.  The producer threads read input rows from their subtree, assemble the rows into packets, and hand them off to the consumer side of the exchange.  Repartition Streams is a push operator that consumes multiple input streams and produces multiple output streams; neither the format nor the contents of the records are changed.  This model lets the producer and consumer threads execute independently, but worry not, there is flow control to prevent a fast producer from flooding a consumer with excessive packets.
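If you want to go hunting for this operator in your own environment, here is a little sketch that digs through the plan cache for plans containing a Repartition Streams exchange.  It assumes you have VIEW SERVER STATE permission; on a busy server you'll want to filter it further:

-- Sketch: find cached plans that contain a Repartition Streams
-- exchange by probing the showplan XML.
with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select top (20)
     cp.usecounts
     ,cp.objtype
     ,qp.query_plan
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
where qp.query_plan.exist('//RelOp[@LogicalOp="Repartition Streams"]') = 1;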

In parallelism you use multiple cores to achieve more.  If you notice a Bitmap filter after a Repartition Streams operator, that means the number of rows in the output is going to be reduced.  You will also notice changes in the Argument column if your table is partitioned or if the output is ordered.


SQL Server supports five different ways for producers to stream results to consumers: Broadcast, Demand, Hash, Range, and Round Robin.  These partitioning types determine how the data is distributed across the threads.

Broadcast sends all rows to all consumer threads.  Demand is normally used with partitioned tables and is the only exchange type that uses a pull rather than a push.  Hash evaluates a hash function on one or more columns in each row to determine which consumer thread receives it.  Range uses a range function on one column in the row to determine where to send each row; this type is rare.  Round Robin alternates, sending each packet of rows to the next consumer thread in a fixed sequence.
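You don't have to take the graphical plan's word for which type you got; the partitioning type is written right into the showplan XML.  Here's a sketch (again assuming VIEW SERVER STATE) that lists the partitioning type of every exchange in your cached plans:

-- Sketch: each Parallelism element in the showplan XML carries a
-- PartitioningType attribute naming one of the five types above.
with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select
     p.n.value('@PartitioningType', 'varchar(20)') as partitioning_type
     ,p.n.value('(../@LogicalOp)[1]', 'varchar(30)') as exchange_type
     ,qp.query_plan
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
cross apply qp.query_plan.nodes('//RelOp/Parallelism') p(n);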

We’ve discussed the theory; now we’ll actually take a look at a parallel query and how the repartitioning is used.  Let’s start out with a query.  I’ve got a large table in my AdventureWorks database that I’ll be blogging about very soon, more on that later.  Today I’m using it as the basis for my query.  We’re going to do a simple aggregate.

set statistics profile on

-- A simple aggregate against the big fact table. The MAXDOP hint
-- caps this query at four parallel worker threads.
select
     p.EnglishProductName
     ,sum(fb.SalesAmount) as ProductSales
     ,avg(fb.UnitPrice) as avgUnitPrice
     ,count(fb.OrderQuantity) as totalProductsSold
from
     dbo.factinternetsales_big fb
     inner join dbo.dimproduct p
     on fb.ProductKey = p.ProductKey
group by p.EnglishProductName
option (MAXDOP 4)
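A quick aside before we look at the plan: the MAXDOP hint only caps the number of threads for this query.  Whether SQL Server even considers a parallel plan is governed by two instance-level settings, and you can check yours with a quick sketch like this (they are advanced options, so show advanced options has to be on first):

-- Check the instance settings that gate parallel plans. The defaults
-- are cost threshold for parallelism = 5 and max degree of
-- parallelism = 0 (meaning use all available cores).
exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'cost threshold for parallelism';
exec sp_configure 'max degree of parallelism';

If the estimated cost of the serial plan stays under that threshold, you'll never see the operators we're about to look at.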


The goal here is to select enough data that SQL Server produces a plan whose cost pushes it parallel, reading the data in on multiple threads.  In this case we get two Repartition Streams operators in our plan.  Let’s start off looking at the query plan.  We’ll also get the output of set statistics profile on, but we’ll cover that in a minute.


Great, so we’ve got our query plan.  The icon doesn’t tell us much in and of itself, but when we hover over it we get a little more information.


We see that we pulled back over 10.5 million rows, that the Partitioning Type is Hash, and that our estimated subtree cost was high enough for us to get a parallel plan.  The trick is to hit F4 while you have the operator selected to get even more information.


F4 opens the Properties tab.  My laptop has one CPU with two cores and Hyper-Threading turned on, which gives me four logical processors.  With MAXDOP 4 that means four worker threads plus the coordinator, so I should see a total of 5 active threads: Threads 0 through 4.  Thread 0 will always have 0 rows associated with it.  The reason for this is that Thread 0 is the control thread; it oversees all the work on the individual worker threads.
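You can watch those same threads from the DMV side while the query is running.  Here's a sketch to run from a second session; the session_id of 55 is just a placeholder for whatever your query's actual spid is:

-- One row per task for the session: exec_context_id 0 is the
-- coordinator (our Thread 0); the rest are the parallel workers.
select session_id
     ,exec_context_id
     ,scheduler_id
     ,task_state
from sys.dm_os_tasks
where session_id = 55  -- placeholder: use your query's session_id
order by exec_context_id;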


Back in the plan properties, by expanding Actual Number of Rows we can see that the rows were only somewhat evenly distributed.  Thread 0 is our control thread, so it sits at 0 rows.  Thread 1 had only 1.7 million rows, Thread 2 had 3.6 million, Thread 3 had 2.3 million, and Thread 4 had 2.9 million.


We can see there was a skew, based on the hash.  Looking further down at our properties we can confirm what the graphical portion of the plan told us: this was a Hash partition, and the hash key was the ProductKey column.  That makes sense, because the rows were being lined up for the join on ProductKey.
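The skew makes sense when you remember what hashing on ProductKey means: every row with the same key has to land on the same thread, so popular products pile up.  CHECKSUM is not the engine's internal hash function, but as a rough stand-in it illustrates the idea:

-- Illustration only: bucket rows into 4 "pseudo threads" by hashing
-- ProductKey. This is NOT SQL Server's internal hash; it just shows
-- how hashing a skewed key can spread rows unevenly.
select abs(checksum(ProductKey)) % 4 as pseudo_thread
     ,count(*) as row_count
from dbo.factinternetsales_big
group by abs(checksum(ProductKey)) % 4
order by pseudo_thread;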

Looking at the output from set statistics profile on, we can see the same information, validating the type of repartitioning we got.



Examining the output even further you will see a second Repartition Streams operator, also using a Hash partition, this time on EnglishProductName (because that’s what we grouped on in our T-SQL query).  If you use F4 to look at its properties you’ll see we’ve greatly reduced the number of columns by the time we get to this part of the plan.
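If you'd rather not eyeball the properties pane for every exchange, the hash key columns live in the showplan XML too.  Extending the earlier plan-cache sketch, this pulls the column(s) each exchange partitions on; for our query it should come back with ProductKey and EnglishProductName:

-- Sketch: list the partition column(s) of each Hash exchange.
with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select
     p.n.value('@PartitioningType', 'varchar(20)') as partitioning_type
     ,c.n.value('@Column', 'nvarchar(128)') as partition_column
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
cross apply qp.query_plan.nodes('//RelOp/Parallelism') p(n)
cross apply p.n.nodes('PartitionColumns/ColumnReference') c(n);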


So what does this all mean?  Sometimes parallelism messes up.  Sometimes we do not get an even breakdown of rows across threads.  When that occurs we can take steps to address it.  Adam covers this in depth in his presentation, and I’ll have to cover it another day, Dear Reader, because the plane is about to board and I’m Boston bound!  But now you know where to look to find out how your rows are being partitioned when they are parallelized!

As always Thanks for stopping by!

Thanks,

Brad




Wednesday, May 22, 2013

Speaking At the PASS Summit!


I’m Speaking at The PASS Summit 2013!

Hello Dear Reader!  Big news.  I’ve been selected as a speaker for the SQL PASS Summit 2013 in beautiful Charlotte, NC!  I graduated from high school and college in North Carolina.

My roommate and best friend lived there and I made the trip with him to his family’s house on holidays.  I’ve not been back for several years, and I’m really looking forward to being in Charlotte for more than just an airplane stop.

Getting selected for the Summit is a huge honor.  My initial reaction…

 
Upon Learning He Was Speaking at the PASS Summit, SQLBalls had this reaction


Excited doesn’t begin to describe it, Dear Reader.  For the first time since I started submitting to PASS, my session this year will be different.  The past two years I did a Deep Dive on Page & Row Compression.  I love that presentation, and it is a lot of fun to give.  This year I got selected for something new.

“So Balls”, you say, “What did you get selected for, why so excited?”

Excellent question Dear Reader!  The excitement?  This session is not just a session.  It’s a Spotlight Session.  That means 90 minutes and a little more pressure to do a really good job.  The topic: Data Internals Deep Dive.  Here’s the abstract:

Data Internals Deep Dive

As DBAs we learn backwards.  Most of the time you get a DBA job and you have databases that have tables, and in those tables you have data.  Nobody tells you that we have records, stored on Pages, grouped in Extents, assigned by Allocation Bitmaps, with IAM Pages and Allocation Units that span Partitions and Filegroups.  Today is your lucky day; we will start at the bottom and build our way up.  We'll cover byte swapping, bitmaps, and dive a little into decoding Hex & binary, working to give you a full understanding of database internals and how you use them every day!


As much as I love data internals, the really cool part?  We have no idea when SQL V-Next is going to land.  If it arrives before the Summit, then expect we’ll cover some of its internal components as well!  Guess we’ll just have to see.


As always Thanks for stopping by!

Thanks,

Brad

Friday, May 3, 2013

SQL Saturday 232 Orlando Call for Speakers!


Hello Dear Reader!  We are at that time of the year again.  SQL Saturdays are popping up all over the country, spring is beginning to give way to summer, and the planning for SQL Saturday Orlando 2013, aka #SQLSAT232, is gearing up.  

This year our team of unbelievable SQL Server professionals, Shawn McGehee (@SQLShawn | Blog), Karla Kay (@karlakay22 | Blog), SQL MVP Kendal Van Dyke (@SQLDBA | Blog), SQL MVP Andy Warren (@SQLAndy | Blog), and Ben Cork, is working to bring you an incredible event.

Leading up the speaker track are myself and SQL MVP Rodney Landrum (@SQLBeat | Blog).  That brings us to you, Dear Reader.  I need your help: I need YOU to submit to be a speaker for SQL Saturday 232.

“So Balls”, you say, “What is SQL Saturday and why should I submit to be a speaker?”

That, my Dear Reader, is the $50,000 question.  The short answer: let’s go make a little history together.

COME MAKE SOME HISTORY


What is SQL Saturday?  From its humble beginnings, SQL Saturday was started as a way to give back to the SQL Community.  The phrase “give back” is overused and often clichéd.  It gets trotted out to lend meaning to deeds when people have trouble explaining why they did something, beyond the truth that it was simply the popular thing to do.  True justification rolls off the tongue like drops of rain water falling off leaves.  In this case, however, I find “give back” fitting.

The professionals who assembled saw value in sharing knowledge, turning hard-learned lessons, production-level victories, and new and unexplored concepts into group discussions and learning.  In the end it created an amazing support group.  We are a group of professionals with a particular job.  Sometimes we are on a team, other times alone, and the people we interact with outside of our field do not often understand our challenges.

The one thing I hear over and over again from first-time SQL Saturday goers is, “I feel like I arrived at a home that I never knew I was missing”.  I share that feeling; it was how I felt on my first day.  My first was SQL Saturday 49, Orlando 2010.  As you can see, I arrived late to the party.

The very first SQL Saturday happened in Orlando on November 10th, 2007.  It was the brainchild of Andy Warren.  There was only one SQL Saturday in 2007.  The concept was simple: get MVPs, authors, and first-time presenters from the SQL Community together and put on a FREE day of training.

Six tracks, seven different time slots, and a total of 37 different sessions throughout the day, and we had our first one in the bag.  The very next year saw SQL Saturday 2 in Tampa, followed by SQL Saturday 3 in Jacksonville, SQL Saturday 4 back home in Orlando, and SQL Saturday 5 (the first one out of Florida) in Olympia, Washington.  Five SQL Saturdays in two years.  Last year, in 2012, there were 50 events in states across the US and 32 in other countries and US territories, for a total of 82 different events.

Now it’s time for SQL Saturday to come back home.  Now it’s time for SQL Saturday Orlando.  Once a year we do this: we come back home to where it all started.  It is fitting that Orlando is associated with attractions that draw people from far across the globe.  There is a magic here that draws us in (no, not just the Mouse).  This is where it all began, and this is your chance to be a part of it.  I wouldn’t want you to miss it!


WHAT DO I NEED TO DO?

If you want to give a talk but don’t know where to begin, read this blog I put together (I WANT YOU to Present at SQL Saturday 85).  In it I discuss abstracts, how to write one, and how to put together an author biography.

Once you’ve done that, all you need to do is click here to submit; the call closes 7/23/2013, so get yours in now!

“So Balls”, you say, “I see some SQL BAD @$$es like Tim Ford, Devin Knight, and Andy Warren have already submitted.  What chance do I have?”

Not just a good chance, but a GREAT one, Dear Reader!  SQL Saturday’s goal is not only to educate people for free, but also to grow local speakers.  I was in charge of speakers for SQL Saturday 85, and my mission was to give everyone a slot, especially first-time speakers.  Rodney and I have talked about this; it took longer to drink the beer than it took to set our guidelines.

I want you to succeed, Dear Reader, and so does the rest of the SQL Saturday team.  If you have the fire in your belly to go out and give a talk to a group of IT professionals, newbie or pro, you cannot find a better venue than SQL Saturday Orlando.

I hope to see you there!  As always Thanks for Reading!

Thanks,

Brad