
Friday, April 12, 2013

DBA Question of The Day: What is SAVE TRANSACTION?

 Hello Dear Reader!  I was teaching a class in Chicago last week and I got a lot of really great questions.  

We only had two days, so time was short.  I got an email reminder today, so this is the first blog post answering some questions that I didn’t get to cover in depth in class.  I thought it would make an excellent Question of the Day!

The Question that I got was “What is the T-SQL SAVE TRANSACTION statement used for?”

The short answer is that it is a way of setting a savepoint inside nested Explicit Transactions.  For the long answer we’ll look a little deeper.

A little over two years ago I wrote a blog post, We Should Have A Talk: Your Database is Using ACID.  If you are unfamiliar with ACID go read up on it, because it is fundamental to what we are doing here.

In that same blog post I talk about Explicit, Implicit, and Nested Transactions.  In case you skipped the ACID reading, let’s cover those.

Every Transaction in SQL Server is an Implicit Transaction, unless otherwise specified.  You can turn off Implicit Transactions but that’s a talk for another time.  Implicit means implied, rather than expressly stated. 
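To see the difference, here is a minimal sketch (the table name dbo.SomeTable is a hypothetical stand-in):

```sql
-- No BEGIN TRANSACTION in sight: in SQL Server's default autocommit mode
-- this statement is wrapped in its own transaction automatically.
-- It begins, does its work, and commits (or rolls back on error) by itself.
INSERT INTO dbo.SomeTable(SomeColumn)
VALUES('this statement commits all on its own');
```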


SQL Server works the same way.  Unless we expressly say otherwise, Transactions are Implicit.


Explicit means fully and clearly expressed or demonstrated; leaving nothing merely implied.

There is no mistake about it.  The following SQL Statement is an Explicit Transaction.
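It looks something like this sketch (using the dbo.myTable1 table we’ll create further down):

```sql
-- An Explicit Transaction: we expressly state where it begins and ends.
BEGIN TRANSACTION
     INSERT INTO dbo.myTable1(mychar, mychar2)
     VALUES('expressly stated', 'nothing merely implied')
COMMIT TRANSACTION
```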




So we’ve covered Implicit and Explicit, now we get to Nested.  Nested Transactions are multiple levels of Explicit Transactions.

For Example:
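A minimal sketch of three levels:

```sql
-- Three levels of nested Explicit Transactions.
-- @@TRANCOUNT climbs with each BEGIN; only the outermost COMMIT
-- actually commits, but a single ROLLBACK anywhere undoes all three.
BEGIN TRANSACTION                  -- level 1, @@TRANCOUNT = 1
     BEGIN TRANSACTION             -- level 2, @@TRANCOUNT = 2
          BEGIN TRANSACTION        -- level 3, @@TRANCOUNT = 3
               -- work happens here
          COMMIT TRANSACTION       -- @@TRANCOUNT = 2, nothing committed yet
     COMMIT TRANSACTION            -- @@TRANCOUNT = 1
COMMIT TRANSACTION                 -- @@TRANCOUNT = 0, now the work commits
```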

Here we have 3 levels.  However, if any one of them fails or issues a ROLLBACK, they all roll back.

Some companies use Explicit Transactions as part of their error handling.  Nothing wrong with that, but wouldn’t it be nice if you could commit part of the work, even if a later part fails?


Saved Transactions allow us to save our progress as we go in a Nested Transaction, so that a ROLLBACK doesn’t wipe out all of it.

Depending on how your code works, you MIGHT be able to use this.  Test the logic very thoroughly.

So let’s look at how this works.  First we’ll create a Table to test out our transaction.

if exists(select name from sys.tables where name='myTable1')
     drop table dbo.myTable1
create table dbo.myTable1(
          myid int identity(1,1)
          ,mychar char(50)
          ,mychar2 char(50)
          ,constraint pk_myid_mytable1 primary key clustered(myid)
          )

Now let’s create a transaction to insert some data and roll it all back.

     BEGIN TRANSACTION
     INSERT INTO dbo.myTable1(mychar, mychar2)
     VALUES('some data 1', 'some more data 1')
          INSERT INTO dbo.myTable1(mychar, mychar2)
          VALUES('some data 2', 'some more data 2')
              UPDATE dbo.myTable1
              set mychar='some new data 1'
              WHERE myid=1
                   SELECT * FROM dbo.myTable1
     ROLLBACK TRANSACTION

Here’s the output from our SSMS window.

SELECT * FROM dbo.myTable1

We had a ROLLBACK at the end, so the work isn’t going to stick.  Sure enough, another SELECT from the table and everything is gone.

Now let’s try the same code again, but this time we’ll create a SAVE TRANSACTION and name it testSavePoint.  Note the syntax change.  *We need to name the savepoint in the ROLLBACK, otherwise the whole thing will ROLLBACK again.*  ACID properties are still in play, so we need a COMMIT AFTER the ROLLBACK to close out the transaction.

Let’s try this again.
     BEGIN TRANSACTION
     INSERT INTO dbo.myTable1(mychar, mychar2)
     VALUES('some data 1', 'some more data 1')
     SAVE TRANSACTION testSavePoint
          INSERT INTO dbo.myTable1(mychar, mychar2)
          VALUES('some data 2', 'some more data 2')
          SAVE TRANSACTION testSavePoint
              UPDATE dbo.myTable1
              set mychar='some new data 1'
              WHERE myid=3
                   SELECT * FROM dbo.myTable1
     ROLLBACK TRANSACTION testSavePoint
     COMMIT TRANSACTION

Here’s our output from SSMS again.  This looks very similar to last time, with only changes in the identity key values.


But did we save anything?
SELECT * FROM dbo.myTable1

This time however, only our Update Statement was rolled back.

“So Balls,” you say, “That’s cool, but will we actually use this?  If we are using Nested Transactions would we only want to save ½ of it?”

Excellent Question Dear Reader!  Let me answer that with a real world scenario.

I was working with some friends earlier this year.  We changed an update job that ran for hours from row-by-row logic to set based logic.  The result was minutes instead of hours.  There were two parts of this job that we worked on. 

It was a four step process.  First a range of data was selected from one table.  Second we looked for missing data and inserted anything that was missing based on matching natural keys.  Third we then looked at everything that matched and updated any rows that needed the information.  Fourth we updated the range selected in another table.  We needed all four steps to run.  A Saved Transaction wouldn’t work for us using their current logic.

However the job failed the first time we ran it, because there was an error on saving the range table.  The Inserts and Updates worked just fine.  The job would look at the range table and try the same range again.  Even though it was failing, the job reported success, because the error handling did not catch the rollback.  The rollback was explicitly called, but executed via a GOTO-on-error statement.

While we would still have had to fix the error on the range table, we could have used SAVE TRANSACTION to save the Insert and Update.  Then we could have moved the range table update to its own Transaction with better error handling.  It also would have saved retry operations on the job by not re-processing expensive Updates, Inserts, and Rollbacks.
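One way to sketch that pattern (all table and column names here are hypothetical stand-ins; the real job’s logic would differ):

```sql
-- Hypothetical stand-ins: dbo.Source, dbo.Target, dbo.RangeTable.
DECLARE @EndOfRange int = 50000;  -- hypothetical range boundary

BEGIN TRANSACTION;
BEGIN TRY
     -- Steps 2 and 3: the expensive insert and update work.
     INSERT INTO dbo.Target(NaturalKey, Payload)
     SELECT s.NaturalKey, s.Payload
     FROM dbo.Source AS s
     WHERE NOT EXISTS (SELECT 1 FROM dbo.Target AS t
                       WHERE t.NaturalKey = s.NaturalKey);

     UPDATE t
     SET    t.Payload = s.Payload
     FROM   dbo.Target AS t
            JOIN dbo.Source AS s ON s.NaturalKey = t.NaturalKey;

     -- Save the progress we don't want a later failure to wipe out.
     SAVE TRANSACTION insertUpdateDone;

     -- Step 4: the range table update that was failing.
     UPDATE dbo.RangeTable SET LastProcessedID = @EndOfRange;
END TRY
BEGIN CATCH
     IF XACT_STATE() = 1
          -- Still committable: undo only step 4 and keep steps 2 and 3.
          ROLLBACK TRANSACTION insertUpdateDone;
     ELSE IF XACT_STATE() = -1
          -- Doomed transaction: nothing can be saved.
          ROLLBACK TRANSACTION;
     -- Real error handling (log the failure, alert the job) goes here.
END CATCH;

IF XACT_STATE() <> 0
     COMMIT TRANSACTION;
```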

Alright Dear Reader, there it is.  SAVE TRANSACTION and how it works with Explicit Transactions.  Thanks for the questions, keep them coming!

As always Thanks for stopping by.



Friday, October 26, 2012

DBA Question of the Day: Statistics
Hello Dear Reader, yesterday I posed the question to you: what are Statistics?  We could get down and dirty with how the internals of SQL Server use Statistics (and we will), but first let’s talk about the concept, because it is at heart a very simple one.

In America we are in my favorite season, Fall.  The weather is cool but not too cold.  The leaves turn beautiful colors, and the smell of wood burning in a fireplace or fire pit, or the general smoky scent that goes with the great outdoors this time of year, always springs to mind.

Unfortunately we are also in my least favorite part of every four years, Election Season.  But Election Season does tie nicely into our subject of the day.  Statistics are nothing but POLLS!  In election season they ask things like, “Do you like this Candidate?  Do you like this particular issue?  Do you believe they kill puppies when people aren’t looking?”, and we get a breakdown of Yes, No, and Undecided.

In SQL Server Statistics are SQL’s way of Polling our Data. 

SQL Server: “I See you are a column Called First Name”

Registered Column: “Yes I am”

SQL Server: “Mind if I ask you a few questions (QUERIES) and create a Poll based off of how you answer?”

Registered Column: “Go Right Ahead”

Breaking News Folks: Registered Column has 18% of his data between Alex and David, only 5% between David and Nick, and a WHOPPING 77% between Opie and Zachary!

Now the next time we ask a question (QUERY) we (THE QUERY OPTIMIZER) have an expected number of people in a particular demographic (VALUE RANGE).  We know that if our candidate wants to know how all the Opies through Zacharys will answer a question (QUERY), we can plan how to best collect that information (AKA HOW THE QUERY OPTIMIZER CREATES A QUERY PLAN).  We can then figure out how many people we need to send out (WORK THAT NEEDS TO BE DONE: SORTS, SPOOLS, HASHES) in order to collect that data.  For example, we need fewer people to collect data from David to Nick (NESTED LOOP JOIN) than we do to collect data from Opie to Zachary (HASH JOIN).

Now that we have our Poll, the next time we have a question (QUERY), the folks on the news screen will say: our expected result was 77% when we selected the range of Opies to Zacharys; however, we found that only 58% actually resided there.

Our population was moving (DATA WAS BEING UPDATED/INSERTED/DELETED) and our Statistics were not up to date.  If we had a plan to collect our data (QUERY PLAN) using a lot of people to go out in the community and collect polls, we may have sent out too many and over-allocated our resources (PICKED A BAD PLAN, I.E. HASH JOIN INSTEAD OF NESTED LOOP).  If we still had 77% of Opies to Zacharys our polling plan (ESTIMATED ROWS RETURNED) would be good, but it wasn’t (ACTUAL ROWS RETURNED).

So our Statistics were out of whack on our poll.  Something was off.  If we had a big plant closure in our town, or a big company laid off a lot of people (PURGE PROCESS ON A TABLE), then we would expect some population shift.  If we knew 20% of people (20% OF ROWS IN A TABLE) were going to be laid off, we could expect some would move in with other family members or move to find new work.  We would probably send people out into the community to get new polls (AUTO UPDATE STATISTICS) and find out what the new data was.  We found 58% between Alex and David, 25% between David and Nick, and 17% between Opie and Zachary.

Regeneration of Statistics causes us to re-think our plan to poll Opies to Zacharys (QUERY RECOMPILATION TO GET A NEW QUERY PLAN) in order to send the right number of people out to ask questions (QUERY) to get our candidate some information (GET OUR DATA).  Now we see that we need far fewer people (NESTED LOOP JOIN) to poll Opie through Zachary than we previously did (MERGE JOIN), and our polling plan (QUERY PLAN) reflects that.

Now that we have a general idea of how things work, let’s spell it out purely in SQL Server language.   Clustered Indexes and Non-Clustered Indexes automatically have statistics generated for their key columns.  However, there are more columns in a table than just indexed columns.

SQL Server places those Statistics in an object named a Histogram.  A Histogram contains entries (it will only ever have a max of 200) that show data values spread over a range.  This allows the Query Optimizer, when constructing a plan, to say, “Statistics, I’m going to run this query on this table; how many rows can I expect to get back?” and then plan accordingly.

We have the following table named Students with columns StudentID, SSN, FirstName, MiddileInitial, LastName, BirthDate, and Gender, and a Clustered Index on StudentID (no debate on indexes right now, this is just a demo).

*All of the code to create a Students table, along with others, and to generate random data was uploaded to my Resources page yesterday as part of my Trimming Indexes Getting Your Database In Shape presentation.  Download that code and play around with it however you like!

                create table students(
                   studentID int identity(100000,1)
                   ,ssn char(9)
                   ,FirstName varchar(50)
                   ,MiddileInitial char(1)
                   ,LastName varchar(100)
                   ,BirthDate datetime
                   ,Gender char(1)
                   ,constraint pk_students_studentID primary key clustered (studentID)
                   )

Now insert a couple of rows into this table (*go get the code!) and then go look in SSMS.

We see that we have statistics created for my primary key.  If you right-click on the Statistics, open them up, and then click on Details, you will see a whole host of information.  You can see when the statistics were generated, when they were last updated, and what the range is.

You can see that my Average Length is 4.  That is because my Primary Key column StudentID is an INT, a 4-byte fixed-length value.  You can see in my range what each RANGE_HI_KEY is and the RANGE_ROWS that go with it.

For my up-to-200 samples, you can see how many distinct rows fall in each step (DISTINCT_RANGE_ROWS).
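You can pull the same details with T-SQL instead of the SSMS dialog; a quick sketch using the objects above:

```sql
-- Header, density vector, and histogram for the PK statistics.
DBCC SHOW_STATISTICS ('dbo.students', pk_students_studentID);

-- Or just the histogram steps: RANGE_HI_KEY, RANGE_ROWS,
-- EQ_ROWS, DISTINCT_RANGE_ROWS, AVG_RANGE_ROWS.
DBCC SHOW_STATISTICS ('dbo.students', pk_students_studentID) WITH HISTOGRAM;
```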

If I sent the Query Optimizer a query like

     SELECT * FROM dbo.students WHERE studentid BETWEEN 104030 AND 108969

I would expect to get back 4940 rows, BUT my statistics are OUT OF DATE and do not reflect that.  So when I execute my query, and include the actual execution plan, this is what I get back.

My options at this point are to update my statistics.

     UPDATE STATISTICS dbo.students pk_students_studentID WITH FULLSCAN

And now my query plan looks like this.

As you can see, the Optimizer expected the number of results it got back.  In my Query Plan (a simple, trivial one), the statistics did not shift my outcome.  But had I joined to the Courses table or the Grades table, it could have completely changed my plan.


So the question on Twitter yesterday that spawned all of this was: should I delete old statistics?  My answer to that is no.  You should update them.  The Histogram is not normally a big, space-consuming object.  Statistics are not like unused Indexes.  Unused indexes incur IO; they must be maintained as the base structure is updated, and this costs your system.  Statistics just offer the query optimizer a path; if the statistics are old and the range is still valid, leave them be.

Whenever a query comes along, you will save the optimizer the trouble of regenerating them.  If they are not there, SQL Server has to create them, but that is an example for another day.
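If you want to see how stale a statistic is before deciding to update it, a quick sketch:

```sql
-- List every statistic on dbo.students and when it was last updated.
SELECT s.name                              AS stat_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM   sys.stats AS s
WHERE  s.object_id = OBJECT_ID('dbo.students');
```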

As always Thanks for stopping by!



Thursday, October 25, 2012

DBA Question of the Day

When I used to work in an office I had a stack of flash cards.  Occasionally I'd grab a few, see if I still knew the answers, and walk around and discuss them with some of the other DBAs.

At Pragmatic Works I do this on our DBA DL, and I've been debating doing an occasional Question of the Day series.  This blog post is inspired by a #SQLHELP conversation that I just saw my friend Mike Fal (@Mike_Fal | Blog) have regarding this very subject.  So here we go; the first topic we will tackle from the old bag of flash cards:

What are Statistics and how are they used?  

Answer tomorrow.