Monday, December 5, 2016

Optimize For Unknown

Hello Dear Reader!  It's been a while.  I was working with a friend and we came across an interesting problem.  They had a large amount of skewness/data skew, and it led to some performance issues for them.  It manifested itself in a set of queries that normally ran quickly, within seconds, but occasionally ran much longer.  To be precise, about 800 times longer.


As you can imagine this is a less than ideal situation for a production environment.  Their solution was to add OPTION (RECOMPILE) to all of their stored procedures.  This solved the data skew issue, but it brought side effects of its own: increased CPU because every stored procedure now had to recompile before execution, no plan reuse for any stored procedure, and the DMVs they used to track stored procedure utilization and statistics no longer had anything to report.
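
To see that last point in action, here is a minimal sketch of the kind of DMV query that goes quiet once everything recompiles on every call.  Nothing here is specific to their system; it is just the standard procedure stats lookup.

-- The execution history described above comes from views like this one; when
-- every procedure recompiles on each call, there is little here to report.
select
     object_name(ps.object_id, ps.database_id) as procedure_name
     ,ps.execution_count
     ,ps.total_worker_time
     ,ps.total_elapsed_time
     ,ps.last_execution_time
from sys.dm_exec_procedure_stats ps
where ps.database_id = db_id()
order by ps.execution_count desc;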


"So Balls", you say, "What is the alternative?  Is there an alternative?  And what in the name of King and Country is skewness/data skew!"

Ahh great question Dear Reader!  Dear Reader why are you English today?

"Because you watched the Imitation Game last night, and Benedict Cumberbatch's voice is still stuck in your head."

Right as always Dear Reader!  Good point let's explain it and then do a Demo!


SKEWNESS/DATA SKEW


Skewness is a term from statistics and probability theory that refers to the asymmetry of the probability distribution of a real-valued random variable about its mean.  This could get complicated quickly.  In simpler terms that I can understand, it means that when you look at the values a variable actually takes, the distribution of those values can lean away from the nice, even shape of a normal distribution.  Skewness is a measure of that deviation from the normal.

How does this affect our query plans?  With data skew we have an overabundance of data for some values and very little for others.  This means the SQL Server Cardinality Estimator, working from the statistics, may produce very different estimates, and therefore very different plans, depending on which value it is estimating for.

Here's a quick example.  I have a school with 100,000 students, and every student has one of 10 different last names.  On average one could assume about 10,000 students per last name.  If we randomly assign these values there will be a slight skewness, but most of the ranges will be similar.  For this example I'll use my students table from my college database.

       select
              lastname, count(*)
       from
              dbo.students
       group by lastname
       order by count(*) desc;



Now we move a new student to the area.  This one student will give us quite a bit of data skew, and will be extremely asymmetrical to the other results.
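
A minimal sketch of how that new student might be added; the column names and values are my assumptions, not the actual script from the college database.

-- One new student whose last name barely appears in the 100,000 row table.
-- Assumes studentID is an identity column and the rare last name is the one
-- queried later in the demo ('Bradley').
insert into dbo.students (FirstName, LastName)
values ('Patrick', 'Bradley');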


In order to show this in action we'll make a stored procedure that returns the First Name, Last Name, and Course Name of students by last name.  Remember some students will have multiple courses.  This means we will get more rows back than we have students with that last name.


if exists(select name from sys.procedures where name='p_sel_get_stu_name')
begin
       drop procedure p_sel_get_stu_name
end
go
create procedure p_sel_get_stu_name(@lname varchar(50))
as
begin
      
       select
              s.FirstName
              ,s.LastName
              ,c.courseName
       from
              dbo.students s
              left join enrollment e
              on s.studentID=e.studentid
              left join courses c
              on e.courseid = c.courseid
       where
              lastname=@lname

end

So now we will execute the stored procedure for both values, look at the difference in the query plans, and benchmark the performance.

exec p_sel_get_stu_name 'Bradley' with recompile;



exec p_sel_get_stu_name 'Segarra' with recompile;



The first query took a little over a second to return two rows.  The second query was sub-second and returned 13,843 rows.  Each execution plan was different: the first was serial, the second parallel.  That makes sense.  The parallel plan returned over 13,000 rows; the serial plan returned only 2.  The statistical variance is different, so the cardinality estimates, and the plans built from them, were different.

Now let's make this bad.  I'm going to run the first and second queries again, this time removing the WITH RECOMPILE.

exec p_sel_get_stu_name 'Bradley';
exec p_sel_get_stu_name 'Segarra';

The first query did not change.   The second one did.


The second execution reused the cached plan.  Because of the data skew we forced 13,843 rows through a serial execution plan built for 2 rows, and the result was 8 minutes and 42 seconds instead of a sub-second query.
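
If you want to see that reuse for yourself, here is a hedged sketch against the plan cache DMVs; it simply pulls the cached plan and its use count for the procedure.

-- usecounts climbing while the plan stays the same is the reuse described above.
select
     cp.usecounts
     ,cp.objtype
     ,qp.query_plan
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where st.objectid = object_id('dbo.p_sel_get_stu_name');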


This is data skew.  We've shown that recompiling on each execution lets both parameters get their own least-cost plan.  Is there another option?  In this case we could use the query hint OPTIMIZE FOR UNKNOWN.

The benefit of OPTIMIZE FOR UNKNOWN is that we can remove the recompile and still get plan reuse.  Instead of building the plan around whichever parameter value happens to arrive first, the optimizer builds it from the overall density of the statistics, which makes the plan a reasonable compromise across the skewed data.


if exists(select name from sys.procedures where name='p_sel_get_stu_name')
begin
       drop procedure p_sel_get_stu_name
end
go
create procedure p_sel_get_stu_name(@lname varchar(50))
as
begin
      
       select
              s.FirstName
              ,s.LastName
              ,c.courseName
       from
              dbo.students s
              left join enrollment e
              on s.studentID=e.studentid
              left join courses c
              on e.courseid = c.courseid
       where
              lastname=@lname
       option (optimize for unknown)
end

exec p_sel_get_stu_name 'Bradley';
exec p_sel_get_stu_name 'Segarra';

Now we execute the procedure for both parameters and capture the new execution plans.




You'll notice that the execution plan chosen based on the statistical variance was the parallel plan.  Both queries executed sub-second.  It is not the least-cost plan for the first query, but it is a plan both parameters can live with.  In case you were curious, here is a look at the histogram.
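
If you want to pull that histogram up yourself, DBCC SHOW_STATISTICS will do it.  The statistics name below is an assumption on my part; yours will be whichever index or auto-created statistic covers the lastname column.

-- Find the statistics that exist on dbo.students.
select s.name as stats_name, c.name as column_name
from sys.stats s
join sys.stats_columns sc on s.object_id = sc.object_id and s.stats_id = sc.stats_id
join sys.columns c on sc.object_id = c.object_id and sc.column_id = c.column_id
where s.object_id = object_id('dbo.students');

-- Then view the histogram SQL Server uses for cardinality estimates.
-- 'IX_students_lastname' is a hypothetical statistics name.
dbcc show_statistics ('dbo.students', 'IX_students_lastname') with histogram;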


WRAP IT UP

So what does this mean?  For the business purpose of speeding up a query, OPTION (RECOMPILE) is completely valid.

It comes at a cost: recompilations, increased CPU utilization, and you lose the execution history of the stored procedure in the DMVs.

If those costs do not affect you, or hurt the system less than the fluctuating query performance does, then it is a valid choice.

There is another alternative for your tool belt, OPTIMIZE FOR UNKNOWN, which is what we used today.  Like all things in computers, use it judiciously.  Test, test, and retest before deploying into production.  As always Dear Reader, thanks for stopping by.


Thanks,
Brad 

Monday, August 29, 2016

SQL Azure, Hekaton, and Bob Dorr is Awesome

Hello Dear Reader!  I recently wrote a blog for the PFE blog, "How Many Tables Can I Have In SQL Azure DB & SQL Server".  It was a fun post that I wrote during a very long layover on the way home, and it covers the maximum number of objects that you can create in SQL Server.


In that blog I wrote up a demo to create the maximum number of tables a database can hold.  I also explained that, based on the table creation rates I was seeing, it would take around 64 days to reach that limit.  As a point of interest I tried creating In-Memory OLTP/Hekaton tables vs. native T-SQL tables, and noticed something very interesting: creating In-Memory OLTP tables was slower than creating native T-SQL tables.  A lot slower.

To be crystal clear, this is specifically about the creation of tables, nothing more.  It has nothing to do with inserting, deleting, or reading data.  My blog was about how many OBJECTS a database could hold, so I had a tight loop creating tables over and over again and explored different options to get as many tables created as quickly as possible.
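
The full demo is in the PFE post; as a hedged illustration, the comparison boils down to a loop along these lines.  The counts, table shapes, and names are my assumptions, and the In-Memory version assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup.

-- Native (disk-based) tables: a tight loop of plain CREATE TABLE statements.
declare @i int = 1, @sql nvarchar(max);
while @i <= 1000
begin
     set @sql = N'create table dbo.DiskTable_' + cast(@i as nvarchar(10)) +
                N' (id int not null primary key, payload varchar(50));';
     exec sp_executesql @sql;
     set @i += 1;
end;

-- In-Memory OLTP tables: the same loop, but each CREATE also generates and
-- compiles native code, which is where the extra per-table time goes.
set @i = 1;
while @i <= 1000
begin
     set @sql = N'create table dbo.XtpTable_' + cast(@i as nvarchar(10)) +
                N' (id int not null primary key nonclustered, payload varchar(50))' +
                N' with (memory_optimized = on, durability = schema_only);';
     exec sp_executesql @sql;
     set @i += 1;
end;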

"So Balls", you say, "You said this was slow.  How slow?"

Great point Dear Reader, back to the question!

I was able to create 388 native T-SQL tables per second, but only 3 In-Memory OLTP tables per second.  Over on the book of faces someone asked about the results very quickly.



I reached out to my colleague Robert Dorr (Blog) from the SQL Team.  He and Bob Ward (@BobWardMS | Blog), from the SQL Tiger Team, run a great blog called Bob SQL, https://blogs.msdn.microsoft.com/bobsql/ , and you should check it out.

He was kind enough to write up a reply that covered how things work under the covers.  To read his blog click here.


QUICK SUMMARY
Each In-Memory table is backed by compiled code.  Compiling In-Memory objects saves CPU cycles at execution time and skips some of the traditional overhead of native T-SQL objects.

When we issue the DDL to create an In-Memory table we follow the same process as for a native T-SQL table, plus two additional steps: we generate the access DDL code and then we compile it.

These steps are a one-time hit at object creation that pays off with long-term benefits every time we execute against our In-Memory OLTP/Hekaton tables.
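
If you are curious about that compiled code, you can actually see the native DLLs it produces.  This is a hedged aside on my part, not from Robert's write-up.

-- Each memory-optimized table (and natively compiled procedure) gets its own
-- DLL, which shows up in the loaded-modules DMV with an XTP description.
select name, description
from sys.dm_os_loaded_modules
where description = 'XTP Native DLL';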

WRAP UP



There are nothing but great questions out there Dear Reader; the worst kind of question is the one that is not asked.  In this case the question pushed me to go find an answer I already wanted to know.


In this case a demo that I almost didn't write and a question from a place where I do not normally look for questions were all we needed to get an answer.  My thanks again to Robert Dorr for taking the time to give it to us.



I love it when a plan comes together.  Until next time Thanks for stopping by.

Thanks,

Bradley Ball






Friday, May 20, 2016

SQL Server Random Workload Generator

Hello Dear Reader!  Before I start this blog I want to give credit to Jonathan Kehayias (@SQLPoolBoy | Blog).  He’s a Microsoft Data Platform MVP, an MCM, part of the amazing crew over at SQLSkills, and a really nice guy.



I heard Jonathan talk about this at SQL Saturday Tampa years ago.  He joked at the time that, “in an effort to show that he did not have a life,” he made a workload out of all the current queries on BOL for SQL Server 2008 R2.

When I present and show mine, I tell the audience about that and joke that, “In an effort to show I have less of a life than Jonathan, here’s what I’ve done”.  The feedback I receive is often, "That shouldn't be a goal".  Point taken Dear Reader, point taken.  

Needless to say, I loved the script.  I grabbed it, tweaked it, and have used it for years. 

Today I wanted to share my version, which piggybacks heavily on Jonathan’s.  Here are the main differences:

  1.  I tweaked the queries, removing the COMPUTE BY clause (deprecated in SQL Server 2008 R2) and replacing it with CUBE or ROLLUP.
  2.  I turned all of the queries into stored procedures and adjusted the workload to call the stored procedures.
  3.  If the table had a result set that allowed for it, I used a randomizing function and passed in parameters to the stored procedures in order to get different data sets (see the sketch after this list).
  4.  This gives a healthy dose of ad hoc SQL as well as stored procedure use.
  5.  I call Jonathan’s PowerShell from a batch script that allows me to spin up or down a larger set of concurrent connections.  (This makes the process visual, which is very handy when you are presenting to a crowd.)
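
As a hedged illustration of point 3, the pattern looks something like this.  The procedure name is a hypothetical stand-in; the real ones live in the workload scripts against AdventureWorks2014.

-- Pick a random value from the data itself, then call a workload procedure
-- with it so repeated executions hit different slices of the data.
declare @LastName nvarchar(50);

select top (1) @LastName = LastName
from Person.Person
order by newid();   -- NEWID() gives a simple random ordering

-- 'dbo.p_sel_person_orders' is a hypothetical example procedure.
exec dbo.p_sel_person_orders @LastName = @LastName;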


“So Balls”, you say, “That’s great.  Show me the money.”

Ah Dear Reader, as ever you are insightful.  We’ll cover this in 3 steps:  Download, Setup, and Run.


DOWNLOAD

You can Click Here or you can visit my Resources Page.  


At the top of the page you will find the download for the Random SQL Workload Generator.  

Download the ZIP and extract it to your C:\ drive (or anywhere else you would like).

I’m going to use the C:\ root for simplicity in our next step, Setup.






SETUP

Now that you have the folder sitting in your C:\ drive, let’s do the following steps:
  1.  Run Create Stored Procedures for Random SQL Workload Generator.sql.
  2.  Change the SQL Server instance name in RandomSQLWorkloadGenerator.ps1 to your instance name.
  3.  If you changed the path from the C:\ folder, update ClientConnections.cmd and RandomSQLWorkloadGenerator.ps1 so the paths point to the proper file location.




The “Create Stored Procedures for Random SQL Workload Generator.sql” script references the AdventureWorks2014 database.  You can run this against 2012 as well; I have not yet added In-Memory OLTP test queries, but I will eventually.  If you restored AdventureWorks2014 under a different name you will need to change that in RandomSQLWorkloadGenerator.ps1, “Create Stored Procedures for Random SQL Workload Generator.sql”, and the Random_SQL_Workload _AdventureWorks 2014.sql scripts.



RUN



After all of the setup, you should be good to double click on Create5Connections.cmd.  It will spin up 5 command prompt windows.



Just to validate we are up and running, you can run a quick query against sys.dm_exec_requests, sys.dm_exec_sessions, sys.dm_exec_query_plan, and sys.dm_exec_sql_text to show our activity.

select
     der.session_id
     ,der.status
     ,der.command
     ,des.program_name
     ,der.last_wait_type
     ,der.wait_type
     ,st.text
     ,qp.query_plan
from
     sys.dm_exec_requests der
     left join sys.dm_exec_sessions des
     on des.session_id=der.session_id
     cross apply sys.dm_exec_sql_text(der.sql_handle) st
     cross apply sys.dm_exec_query_plan(der.plan_handle) qp
where
     des.is_user_process=1
     and
     des.session_id<>@@spid

I can see that I have 4 connections running and can begin playing around.



WRAP IT UP

Alright Dear Reader, a few final notes.  I don’t have a lot of write operations in this current workload.  You could easily add them yourself to the workload script, and I do plan to add some eventually.  Yes, some of these queries are bad.  No, I’m not going to tune them; that’s part of the fun!

If you find some cool stuff to do with this script, please let me know.  This is a pretty simple framework with a lot of cool potential.

Next up for this script: running it against Azure SQL Database.  It’s not that hard, actually, just a couple of configuration changes in PowerShell.  Plus, I’ve already done it.  More on that next week!

As always Dear Reader, Thanks for stopping by.

Thanks,

Brad

Friday, May 6, 2016

SQL Saturday Jacksonville #552 Pre-Con: A Day of DBA Fundamentals- Install- Maintenance plans-and Security

Hello Dear Reader!  I'm here in beautiful Jacksonville FL for the SQL Saturday 552 events!  Today my friend Dan Taylor (@DbaBulldog | Blog) and I are presenting our Pre-con A Day of DBA Fundamentals, Install, Maintenance Plans, and Security.

We are having a lot of fun so far, and I wanted to share.  If you are in the class or curious you can download the Deck Here.

More updates as we go!

Final Update 4:43 pm

I just wanted to say a big Thank You to all the people who attended our pre-con today and the SQL Saturday Jacksonville team!

We covered the Azure Portal, SQL Azure DB, Elastic Pools, Azure VM's, and Premium Storage.

I'm exhausted.  The class is exhausted.  It's been a great start to the event.  See you all tomorrow!

Here's the Class waving goodbye!

Update 5

We just finished up group activities working on backups and restores.  Getting the group interaction was key.  A very adept pupil found a great way to use Azure for a highly transactional system.

Not gonna lie, I gave them some difficult ones to figure out.  In the process it looks like Dan caught me blogging!


Alright up next Maintenance Plans, Azure, and one last group exercise!


Update 4 2:30 pm

At the beginning of the day we split the class into 5 groups.  I told them not to get comfortable and we just showed why.

In this job communication is key.  We just made all the teams regroup and reintroduce themselves.  Next we started discussing the importance of backups and restores.  Dan led the way, preaching the importance of the right backup strategy for the right business objective.


During lunch we had a good discussion about how to communicate with the business.  I recommended two books that have become a good starting foundation for me, and a mandatory read for everyone on my old team (much love to #TeamBall).

If you're interested you can find them here.

Crucial Conversations Tools for Talking When Stakes Are High, Second Edition

Crucial Accountability: Tools for Resolving Violated Expectations, Broken Commitments, and Bad Behavior, Second Edition

Update 1:30 pm

Great lunch by Panera here at the SQL Saturday Jacksonville Pre-Con!  We are having a nice group exercise.  I like watching the teams interact and discuss security and what permissions should be given per scenario.


Will we use AD or SQL Authentication?


Granular permissions?


Dan never could resist a great debate on security!


Update 2 11:49 am

We've taken a break for lunch!  The class exercises so far have been a lot of fun.  We've got a smart crew here today.  So far we've covered the different versions of SQL Server, how features and licensing go together, and using SSMS to configure basic options and monitor activity.

Great participation so far.  We had a nice side discussion on which services should sit side by side.  Lots of caveats with this one; it will make a nice blog for a later day.

Oh and Devin Knight stopped by!

OK Lunch Time!



Update 1

Scott Gleason (@SQLScottGleason) is helping us kick this off!


Great crowd so far!  I wonder what Scott thinks of my performance?


Tough crowd!




Monday, February 29, 2016

Goodbye Pragmatic Works

Normally when I write a blog, I start out with a Hello to my Dear Reader.  Not today.  This was almost a blog I didn't write, because I didn't know where to start.  So let me start with the people and see if it moves along from there.


To the Consultants

You are an amazing group of women and men.  The amount I have learned from you and with you over the past four years has been staggering.  I say this to those who have left and those who remain.  It was my privilege to work with you every day.

We have too many fun memories for me to count.  I went through my pictures and found hundreds.  I think that speaks well of the times we had together, because it was important enough to take those pictures.

I've said my fair share of mushy goodbyes, listened to Baz Luhrmann's Sunscreen way too many times, and rambled on about coffee filters.  There are some words you say in life whose meaning you know intimately because emotion surrounds them.  When I say brother or sister, or call someone buddy or big guy, those words have far more meaning than their surface value.

Each of them is the embodiment of love and relationships that have been built over time.  When I use them, my heart thinks of the people in my life and the times when those words were made noble to me.

Team Ball will forever be one of those words now.


 To the Management Team

Thank you for the opportunity to be a leader, not just in the community but to a staff that I count as close friends.  There were challenges, struggles, losses, and wins.  The ability to shape the direction of a company is a strange and powerful gift.  So my parting advice to you is much simpler.  Don't mess it up.

I don't say that to be glib or short; you did great before I was there, and I'm confident you will do great after I am gone.  That is my hope for you.  So let me explain.

You've built something special and great.  You have an environment where people come to work with one another as part of their passion.  In my time with you I've learned just how critical proper and constant communication is to everyday success.


There is no book that tells us what to do, or how to lead or grow this company.  There will be trial and error.  Remember to listen to employee concerns, don't change course too quickly or too many times, and most importantly remember Why you do what you do.  If you start with Why, good things will always follow.

You have a fantastic team.  I'm very proud to have counted myself as one of them.



 WHERE ARE YOU GOING

For everyone else reading, the next question is where I am going.  Here's a hint.


 Thank you Dear Friends.  I'm sure I will talk to you soon.

Thanks,

Brad