Monday, October 31, 2011

Tales from the Query Processor

http://www.flickr.com/photos/socal_jim/2070088596/
Hello Dear Reader!  I've been a bit behind on the blogging but wanted to give you a bit of a treat.  Today is Halloween in the USA.  This is probably my favorite Holiday.  The family and I always love the fall, and this is one of the few holidays where it is always fun.

You dress up in Costumes, you trick or treat, give away candy, and we even have one member of the family whose Birthday is today.  A big Happy Birthday to our Daughter Chesney!  She's 13 today WAHOO!!

Between the weather, the fun of the day itself, and all the horror movie marathons, you cannot beat this time of year.  So kick back, relax, binge on some chocolate goodness, and enjoy a couple of tales from the Query Processor!

Thanks to Jeremiah Peschka (@peschkaj | Blog), Kendra Little (@kendra_little | Blog), and Brent Ozar (@BrentO | Blog) for the inspiration for these stories over a very nice dinner on the Thursday night of the PASS Summit, and Mike Walsh (@mike_walsh | Blog) for the encouragement to publish it.

A BRAVE NEW WORLD



Hi, my name is Session, but you can call me Sess, and I hope my tale is a cautionary one for the future.  Not too long ago I thought it was the end of it all.  We lived in a world that was old and neglected.  Our species had evolved almost as far as we could.  Well, almost as far as we could.  Our homeworld was only 32 bit, and to be honest our governments had never advanced further than 2000.

http://www.flickr.com/photos/shelly_a_canada/3846084911/
Maybe that's why it all started to go downhill.  People just stopped caring for the environment around us.  Politicians used NOLOCK like it was going out of style.  Everybody was making promises of a better and faster tomorrow, but nobody even had the guts to upgrade the Government to 2005, let alone make the leap to 2008 or even R2.  Suggestions of anything like that would get you branded a heretic, and in some cases would have friends and family turning on one another.

"EVERYTHING IS OKAY!", they would say, and the system would just get a little more polluted.  They stopped applying Cumulative Updates, and eventually they were even afraid of what the OS updates would do to destabilize things.  The civil unrest was getting worse.  Occasionally you would hear about SPIDs that would cause blocking and then just disappear.  Speaking out and showing what was wrong wasn't just frowned upon; if you looked closely enough you could find evidence that SPIDs were being killed.  A couple of DBAs tried to point this out, but people just looked at them like they were crazy.

That was when I started getting scared.  What if something happened and I went to grab a resource and it wasn't there?  What if, while I was in the queue, I mentioned my dissatisfaction that AWE wasn't enabled?  You'd just have to bite your tongue, stay quiet, and hope a Latch didn't provoke you into blocking.

But it wasn't until the political climate turned that people started really paying attention.  Everything they had ignored, all the additional load they were placing on our planet, came to a head the day the Deadlocks arrived.  It didn't matter what you were doing; if you weren't in line first when they called you, the Deadlocks would make an example out of you.  Not only did they kill SPIDs, they would write it out to the logs.  Profilers would question them and they would gladly tell them what SPID they had killed and why.  Hell, sometimes they would even draw them a Map and show them where they were left.

That only led to more unrest, more distrust, and as the activity ramped up we started getting word that in some places there were spinlocks.  I heard tell that dmp files were being created.  And occasionally places got so crazy that they were just sending in the foot soldiers to wipe everything out.  It was at that point I knew it was time to get out.  Some of the more creative DBAs had convinced the powers that be to invest in an effort to go to a newer set of hardware, and leave this chaos behind.  I grabbed my wife and all our little processes and made for the first Data Sets I could get us on.

As we got closer to the ships I saw they were made of something called SSIS.  When we were loading I asked the Captain if this was like DTS.  He just smiled and laughed and said we were in for a treat.  For the first time in I don't know how long I knew I could relax.  I looked back just long enough to see the world burning.  That place was heading for the end of its life cycle.  I don't know why everyone couldn't see it.  I will not forget the error of our ways.  We will not repeat this again, or so I tell myself.

But remember, we cannot abuse this new home, we must work together, and we cannot repeat the mistakes of our past.  If we cannot code a better base then we are doomed to repeat ourselves.  So let us bring about a better, Object Oriented world that may even use Snapshot Isolation to preserve our natural environment for as long as we can.  With all these resources, there should be no contention amongst us.

They are calling our new world Denali.  I like the sound of that.  The new homeworld has 4 sockets with 12 cores each, and there is RAM as far as the eye can see.  We will have better data governance in this new land, and I hear everything will be 64 bit from now on.  I hope that it is all true.


Session



HELLO MY NAME IS BUFF WELCOME TO THE END OF THE WORLD

It was a day like any other, but we didn't know at the time that forces beyond our control were at work that would wipe us all out.  I leave this behind as a Journal.  My only hope is that it will survive longer than we did.

In the future, if this is found, know we did our jobs and never quit.  My name is Buff; my friends and I live and die in a little place called the Buffer Pool.  We were going about our day, for the most part we are all Data Processors here, when it happened.

At first I noticed things started getting crowded.  Not regular "man, this Starbucks is packed" crowded, but "there's a panic in the streets" crowded.  It continued to build and build.  Crowds grew.  I started seeing new faces, but those of us with jobs to do stayed focused.

Trains to the Disk had stopped, so we just kept going.  Then it happened.  A hard checkpoint wiped out more people in seconds than I'd ever heard of.

Apparently a transaction that used DTC had gotten a status of -2.  I know this now as things are ending.  The remaining buffers have been talking.  Some were assigned to a Special Task Force.  They were armed with badges and access to any DMV they needed.  The status -2 had slowly backed things up and infected the populace.

Our leaders on High asked for information and we the grunts gave it to them.  Our reward for our bravery?  They are closing the gates behind us and we will all be wiped out in the coming reboot.  If anyone can make it to the Disk, tell my Clustered Index that I love her.  Tell her that my last thoughts were of her and our little Non-Clustered Index at home.  I wish I could have watched him grow into a Covering Index.

Buff

Monday, October 17, 2011

SQL University Lesson 3 Page Compression

Lesson 2: Internal Structures, Vardecimal, & Row Compression



Welcome to Lesson 3 on Compression, and welcome back Dear Reader.  I want to once again thank the esteemed Chancellor of SQL University, none other than Mr. Jorge Segarra (@SQLChicken | Blog), for the opportunity to be your Professor of Compression.  Click on the links to the previous lessons to view any of the other days in this series.  Just a quick recap: we've discussed the different types of Compression that exist in the world, why you would want to use Compression, the internal structure of Data Records, and how they and their storage change when we enable Compression, Vardecimal, and Row Compression.  So now on to Page Compression.

SQL University Compression Week: Lesson 1 About Compression


“So Balls,” you say, “What is Page Compression and how does it compare to Row Compression?”

Great question Dear Reader!  Page Compression is a superset of Compression that includes Row Compression.  We discussed the data types that Row Compress yesterday; the data types for Page Compression are... all of them!  Page Compression, while still only occurring on IN_ROW_Data pages (more on that later), compresses at a binary level.  For the additional levels of Page Compression beyond Row we need to take a deeper dive again!

STARTING OUT WITH SOME MORE INTERNALS

Compression is a very internal technology.  I hope you're not sick of learning about how things work on the inside, because this is some of the most fun stuff to me (I realize how sad that will sound to my soon-to-be teenager).  To get a really good understanding of anything it helps to know what is happening internally, and Compression is no different.  SQL Server data is made up of Records and Pages.  There are many types of Pages, but the three types that we care about are grouped into Allocation Units.  These 3 types of Allocation Units break down into IN_ROW_Data, ROW_OVERFLOW_Data, and LOB_Data.

Out of those three types of pages, only data in IN_ROW_Data compresses.  So what kind of Data Types are IN_ROW_Data Data Types?  It is actually easier to list those that are not IN_ROW_Data, as that list is much shorter.  You can Compress everything but varchar(max), varbinary(max), nvarchar(max), XML, Text, Image, NTEXT, CLR Data Types (Spatial (that changes in SQL 2012) and Geography), and Filestream.  Compression is also not compatible with Sparse columns.  Anything else and you can Page Compress.
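If you want to see which Allocation Units a table of yours actually has, here is a minimal sketch against the catalog views.  It assumes the dbo.heap2 demo table we build later in this post; swap in any table name you like.

-- A sketch: list each allocation unit type and its page count for a table.
-- Assumes the dbo.heap2 demo table from the script later in this post.
SELECT OBJECT_NAME(p.object_id) AS table_name,
       au.type_desc,          -- IN_ROW_DATA, ROW_OVERFLOW_DATA, or LOB_DATA
       au.total_pages
FROM sys.allocation_units AS au
JOIN sys.partitions AS p
    ON au.container_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.heap2');

Only the IN_ROW_DATA rows you see here are candidates for Row or Page Compression.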

You’ll remember our Mall Map of internal structures from yesterday’s lesson.  Well today we are moving up a level and we are tackling Pages.


So today you are here on the Pages portion of our Mall Map for the subject at hand: Page Compression.  Page Compression is made up of 3 different components: Row Compression, Column Prefix Compression, and Page Dictionary Compression.  That is the order in which SQL Server will attempt to apply them when they are all applied to a page.  We discussed Row Compression at length yesterday.  So when you apply Page Compression our record format changes to the Compressed Record Structure and Row Compression occurs, removing all the unneeded extra bytes.
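If you want to try each flavor yourself and see what a partition currently has in place, a quick hedged sketch (again using the dbo.heap2 demo table from later in this post) looks like this:

-- A sketch: rebuild with Row or Page compression (pick one), then check what is applied.
ALTER TABLE dbo.heap2 REBUILD WITH (DATA_COMPRESSION = ROW);     -- Row Compression only
-- ALTER TABLE dbo.heap2 REBUILD WITH (DATA_COMPRESSION = PAGE); -- Row + Prefix + Dictionary
GO
SELECT OBJECT_NAME(object_id) AS table_name,
       index_id,
       partition_number,
       data_compression_desc   -- NONE, ROW, or PAGE
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.heap2');
GO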

Now I don't know about you Dear Reader, but reading binary is not a forte of mine.  So in order to explain this I'm going to use some screen shots from one of my presentation decks covering Compression.  For these next couple of images I want to make sure you understand that this is not actually what a Page looks like, but right now, as Louis Davidson (@DrSQL | Blog) would say, you're drinking my flavor of Kool-Aid.  Next you will have Column Prefix Compression.


We'll start by going down each column, taking the common values for that column, and populating the anchor tag at the top of the page.  Looking at the first column, the most common pattern is Brad between the first 2 values and Br with the last value.  Column Prefix Compression will take the longest value with the longest matching pattern, move it to the Anchor Tag, and replace it with 4 bits representing a special NULL that points towards the top of the page.  It will also replace the first [4] characters in Brad22, leaving [4]22, and leave [2]et31 out of Bret31.


Now let's fill out the rest of our columns.  Daniel is our longest value with a matching pattern; we'll move that to the Anchor Tag and leave Dan21 as [3]21 and David33 as [2]vid33.  For our last column we'll take the value Many31 and move that to our header, leaving [2]et31 from Maet31 and [2]ny31 from Many31.



Now we have COMPLETED Row Compression and Column Prefix Compression.  The last portion of Page Compression is Page Dictionary Compression.  For Page Dictionary Compression we look for common patterns across the entire page.  When we find them we move them into a multi-dimensional, zero-based array in the Dictionary portion of the Page.

We will start with [2]et31.  As you look at the page you can see two of these values.  This common value will be placed in the Dictionary portion of the page and a 0 will be entered in its place.  The next value that we will find on the page is [3]21; we will move that value into the Dictionary portion of the page and replace it with a 1 entry for both columns.
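If you'd rather see the real thing than my drawings, you can peek at a Page Compressed page with the undocumented DBCC IND and DBCC PAGE commands.  This is only a sketch; the page number 230 below is a placeholder you would replace with a data page from your own DBCC IND output.

-- A sketch using undocumented commands; output will vary by table and version.
DBCC TRACEON(3604);                            -- send DBCC output to the client
DBCC IND('demoCompression', 'dbo.heap2', -1);  -- list the pages that belong to the table
-- Pick a data page (PageType = 1) from the output; 230 below is just a placeholder.
DBCC PAGE('demoCompression', 1, 230, 3);       -- the dump includes the compression
                                               -- information (CI) record that holds the
                                               -- anchor values and the page dictionary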




Now as you can see our page looks very different from where we started.  This is where our CPU overhead starts to come into play.  Not only do we have the Compressed record format, but we also have to uncompress our pages when they are handed from the Storage Engine to the Relational Engine.

DEMO

Here is a full and complete demo.  For more scripts go to my Resources Page and download any of my presentations on Compression; they all contain the scripts I use as demos.  For this one we will create a database and a table, we will add some data, and then we will apply compression, looking at the space saved.  We will be using sp_estimate_data_compression_savings.  One BIG WARNING that I like to give is that sp_estimate_data_compression_savings works by taking 5% of your actual physical table or index, copying it into your TempDB, applying Compression, and then estimating the space savings against the total size of your table.

So MAKE SURE YOU HAVE ENOUGH TEMPDB SPACE before you use this.  Do not use it on your 100 GB table if you do not have 5 GB of free space in TempDB.  For this demo our table is relatively small and you should only need MBs worth of free space vs. GBs.  But before you use this on something at your job, make sure you're not going to adversely affect your server, and test this in a lower life cycle first.
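A quick, hedged way to eyeball how much free space TempDB has before you run the estimate is the sketch below; it just turns unallocated pages into MB.

-- A sketch: roughly how much free space does TempDB have right now?
USE tempdb;
GO
SELECT SUM(unallocated_extent_page_count) * 8 / 1024 AS free_space_mb
FROM sys.dm_db_file_space_usage;
GO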




USE demoCompression
GO
/*
Set our Statistics ON
*/
SET STATISTICS IO ON
SET STATISTICS TIME ON
/*
Create a Heap Table
*/
CREATE TABLE heap2(myID int IDENTITY(1,1), myChar CHAR(8000));
GO

/*
Insert Some Records
We are going to Fill up a lot more
of the Char(8000) this time
aa= 2 * 1000 = 2000
1234= 4 * 100 = 400
bb= 2 * 1000 = 2000
mydataandOtherStuff1234 = 23 * 347 = 7981
*/
DECLARE @i INT
SET @i=0

BEGIN TRAN
     WHILE (@i<15000)
          BEGIN
              IF (@i<=1000)
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES(REPLICATE('aa', 1000));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>1000) AND (@i<=2000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000) + REPLICATE('1234', 100)));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>2000) AND (@i<=3000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000)+ REPLICATE('bb', 1000)));
                        SET @i = @i +1
                   END 
              ELSE IF ((@i>3000) AND (@i<=4000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES(REPLICATE('aa', 1000));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>4000) AND (@i<=5000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000) + REPLICATE('1234', 100)));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>5000) AND (@i<=6000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000)+ REPLICATE('bb', 1000)));
                        SET @i = @i +1
                   END 
              ELSE IF ((@i>6000) AND (@i<=7000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES(REPLICATE('aa', 1000));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>7000) AND (@i<=8000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000) + REPLICATE('1234', 100)));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>8000) AND (@i<=9000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000)+ REPLICATE('bb', 1000)));
                        SET @i = @i +1
                   END      
              ELSE IF ((@i>9000) AND (@i<=10000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES(REPLICATE('aa', 1000));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>10000) AND (@i<=11000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000) + REPLICATE('1234', 100)));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>11000) AND (@i<=12000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000)+ REPLICATE('bb', 1000)));
                        SET @i = @i +1
                   END      
              ELSE IF ((@i>12000) AND (@i<=13000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES(REPLICATE('aa', 1000));
                        SET @i = @i +1
                   END
              ELSE IF ((@i>13000) AND (@i<=14000))
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES((REPLICATE('aa', 1000) + REPLICATE('1234', 100)));
                        SET @i = @i +1
                   END 
              ELSE
                   BEGIN
                        INSERT INTO dbo.heap2(mychar)
                        VALUES(REPLICATE('mydataandOtherStuff1234', 347));
                        SET @i = @i +1
                   END
          END
COMMIT TRAN
/*
Quick Select of the data
*/

SELECT * FROM dbo.heap2

/*
How big is our Table
*/
EXEC sp_spaceused 'dbo.heap2';
GO

/*
Estimate our space savings
*/
EXEC sp_estimate_data_compression_savings 'dbo', 'heap2', NULL, NULL, ROW;
GO

/*
We reduced by around 1/3
Can we do better with Page Compression?
*/
EXEC sp_estimate_data_compression_savings 'dbo', 'heap2', NULL, NULL, PAGE;
GO

/*
How big is our Table
*/
EXEC sp_spaceused 'dbo.heap2';
GO

/*
Rebuild With Compression
*/
ALTER TABLE dbo.heap2
REBUILD WITH(DATA_COMPRESSION=PAGE);
GO


/*
How big is our Table
*/
EXEC sp_spaceused 'dbo.heap2';
GO

/*
Do another select
to get the Statistics
*/
SELECT * FROM dbo.heap2

/*
Turn Off Compression
*/
ALTER TABLE dbo.heap2
REBUILD WITH(DATA_COMPRESSION=NONE);
GO

Alright Dear Reader!  Your homework: do the demo and review row compression, and make sure you know the difference between the two!

Thanks,

Brad

Friday, October 14, 2011

Day 6: Friday PASS Summit 2011 Keynote Live

Hello Dear Reader!  Welcome to the final day of the PASS Summit and the Keynote address by Dr. David J. DeWitt.  We will do this in the same way as the previous two days!  So hold tight, we are about to begin.

SUMMARY

This has been an amazing Keynote.  Dr. DeWitt is brilliant and he is echoing what we as a community are struggling with.  SQL is like our College Team, Favorite Sports Team, Favorite Actor, Favorite Period.  When we hear about other RDBMS's there is a knee-jerk rivalry; however, when we get together after the ribbing, Oracle and SQL DBAs live next to one another just fine.  There is a place for everything, and this seems to be a way of Microsoft saying "We can and will work together".

This is a stance many people have wanted them to take for quite some time, and it should open up a very interesting future for all of us!

Thanks for reading all this week!

Thanks,

Brad

LIVE BLOG


Update 9:52

He hopes to make it to where PDW can handle both; he would rather do that than strap a rocket on a turtle (the turtle being Hadoop).



Update 9:42

There will be a command line utility called Sqoop for PDW v Next to move from Hadoop to RDBMS.  Even though the demos favor PDW, Dr. DeWitt stresses there is a place for both and they are both here to stay.

We are looking at the limitations of the Sqoop library.  In the example shown, Sqoop could cause multiple table scans.

We will have both Structured and Unstructured Data.  Moving forward, why not build a data management system that can query across both universes?  He terms it an Enterprise Data Manager.  Dr. DeWitt is trying to build one right now in his lab.


Update 9:32

Summary of the Pros: highly fault tolerant, relatively easy to write arbitrary distributed computations over very large amounts of data, and the M/R framework removes the burden of dealing with failures from the programmer.

The Cons: the schema is embedded in the application code, and a lack of a shared schema makes sharing data impossible.  (And the slide changed before I could get the rest down.)

Facebook and Yahoo reached a different conclusion than Google about declarative languages like SQL.  Facebook went with Hive and Yahoo went with Pig.  Both use Hadoop MapReduce as a target language.

We now see an example to find the source IP address that generated the most ad revenue along with its average.  The syntax is very Java-like.

MapReduce is great for doing parallel query processing, but a join takes 5 pages using Pig.  The Facebook guys can do the same thing in 5 lines using Hive.  However, the complaint from the Facebook guys was that MapReduce was not easy for end users; users ended up spending hours if not days writing programs for even simple analyses.  Of the 150K jobs Facebook runs daily, only 500 are MapReduce.

The goal of Hive and HiveQL is to provide an easy to use query language.

Tables in Hive are like those in a relational DBMS; data is stored in tables.  They have richer column types than SQL: primitive types (ints, floats, strings, dates) and complex types (associative arrays, lists, structs).


We are looking at Hive Data Storage; like a parallel DBMS, Hive tables can be partitioned.  When you partition a Hive table by an attribute, the attribute value becomes part of the file name, so the data is compressed as it is stored.


We are getting a breakdown of queries, showing data seeks across partitioned data and the way it is optimized if you are looking for the attribute value.

Keep in mind there is no cost-based query optimizer; the statistics are lacking at best.

We are going to look at some PDW v Next vs. Hadoop benchmarks.  600 GB, 4.8 billion rows.

Doing a scan, select count(*) from lineitem, then an aggregate with a group by.  It took Hive 4x longer than PDW to return the set.

Now we are going to get more complicated; now we will do a join between the two tables with a partition on the key values.  PDW is 4x faster, and with partitions PDW is 10x faster than Hadoop.






Update 9:22

MapReduce components: the JobTracker coordinates all M/R tasks and events, and manages job queues and schedules.  So how does this work with HDFS?  There is a JobTracker M/R layer and an HDFS layer.  The JobTracker keeps track of what jobs are running.  Each TaskTracker maps to a DataNode.  On the data node is the data that is managed by the job tracker.


He wants OOHHH's and AHH's for the next slide; it took 6 hours to make :).  Each row of data on a node has 2 tuples.  The example is customer, Zip Code, Amount.  He moves the data, which is located by the Map Task.  Our user wants to query certain users and do a group by zip code.  He shows across the Named Nodes how the data is organized.

The mappers per node have data duplication, and unique data each.  They produced 3 output buckets each by hash value.  Now we go from Mappers to Reducers.  The blocks are stored in the local file system; they are not placed back into HDFS.  The Reducers have to pull the data back to 3 different nodes in our cluster.  They now sort and separate the data by hashed zip code.  The data may have some duplicated groups by this point, but my guess would be they change by the end.

The Reducer now sorts them by hash of the Zip Code.  It then sums all similar hashes and returns the data.  In general the actual number of Map tasks is made much larger than the number of nodes used.  Why?  It helps deal with data skew and failure.  If a worker suffers from skew or fails, the uncompleted work can easily be shifted to another worker.

It is designed to be fault tolerant in case a node fails.


Update 9:12

When the client wants to write a block, the NameNode tells it where to write it.  It balances writes by telling them where to go.  The reverse happens when a client wants to read a file: the NameNode acts as an index, telling the reads where to go to find the data.

Data is always checksummed when it is read and placed on disk to check for corruption.  They plan for drives to fail, writes to fail on a rack, switches to fail, Main Node failures, and data center failures.

When a data node fails, the main node detects that and knows what data was stored on that data node.  The blocks are then replicated from other copies.  If the Main Node fails, the Backup Node can fail over; there is automatic or manual failover available.  The backup node will rebalance the load of data.

So a quick summary: this is highly scalable, 1000's of nodes and massive 1000s-of-TB files, with a large block size to maximize sequential I/O performance.  No use of mirroring or RAID, but why?  Because it was supposed to be low cost, and they wanted to reduce costs.  They use one mechanism, triply replicated blocks, to deal with a wide variety of failures.

The negative?  Block locations and record placement are invisible.  You don't know where your data is!!!!

MapReduce is next.  The user writes a map function and then a reduce function.  They take a large problem and divide it into sub-problems.

Perform the same function on all sub-problems and combine them.




Update 9:02

Google started Hadoop; they needed a system that was fault tolerant and could handle an amazing amount of click stream data.

The important components: Hadoop = HDFS & MapReduce.  HDFS is the file system; MapReduce is the processing system.

What does this offer?  An easy to use programming paradigm, scalability and a high degree of fault tolerance, and low up front software cost.

The stack looks like: HDFS, Map/Reduce, Hive & Pig (SQL-like languages), and the Sqoop package for moving data between HDFS and relational DBMS's.

These are the underpinnings of the entire Hadoop ecosystem.  HDFS design goals: scalable to 1000s of nodes, assume failures (hardware and software) are common, targeted towards small numbers of very large files, write once then read.

We are looking at an example of a file being read into Hadoop.  The file is broken into 64 MB blocks, and each block is stored as a separate file in the local file system, e.g. NTFS.  Hadoop does not replace the Windows file system; it sits on top of it.

When the Client writes and loads these, the blocks are distributed amongst the nodes (for the example he is using a replication factor of 3).  As he places more blocks they are scattered amongst the nodes.

Default placement policy:  The first copy is written to the node creating the file.  The second copy is written to a data node within the same rack.  The third copy is written to a data node in a different rack, to tolerate switch failures, and potentially in a different data center.

In Hadoop there is a NameNode - one instance per cluster.  It is responsible for filesystem metadata operations on the cluster and for replication.  There are backup nodes and DataNodes.  The NameNode is the Master, and it is backed up.  The NameNode is always checking the state of the DataNodes; that is its primary job.  It also balances replication and does IsAlive and LooksAlive checks.





Update 8:52

Ebay has 10 PB on 256 nodes using a parallel database system.  They are the Old Guard.  Facebook has a NoSQL system with 20 PB on 2700 nodes.  Bing uses 150 PB on 40K nodes.  They are the Young Turks.  WOW, we found out that Bing uses NoSQL.

It is important to realize that NoSQL doesn't mean No To SQL.  It means Not ONLY SQL.  Why do people love NoSQL?  More data model flexibility, relaxed consistency models such as eventual consistency (they are willing to trade consistency for availability), low upfront software costs, and never having learned anything but C/Java in school.

He brings up a slide to show reducing time to insight, by displaying the way we capture, ETL, and load data into data warehouses.

The NoSQL folks want the data to arrive with no cleansing and no ETL; they want to use it and analyze it where it stands.

What are the major types of NoSQL systems?

Key/Value Systems: MongoDB, CouchBase, Cassandra, Windows Azure.  They have a flexible data model such as JSON.  Records are sharded across nodes in a cluster by hashing a key.  This is what PDW does, and we call it partitioning.

Hadoop gets a big plug.  Microsoft has decided this is the NoSQL they want to go to.  Key/Value Stores are NoSQL OLTP.

Hadoop is NoSQL OLAP.  There are two universes and they are the new reality: you have the unstructured NoSQL systems and the structured relational DB systems.

The differences: Relational is structured, ACID, transactions, SQL, rigid consistency, ETL, longer time to insight, mature, stability, efficiency.

NoSQL is unstructured, no ACID, no consistency, no ETL, not yet mature.

Why embrace it?  Because the world has changed.  David remembers the shift from the networked systems of the 80's to today.  And this is now a shift for the database world where both will exist.

SQL is not going away.  But things will not go back to the way they were; there will be a place at the table for both.




Update 8:42

Rick plugs feedback forms.  And today is the last day to buy the Summit DVDs for $125.  That breaks down to $0.73 a session.

Rick introduces Dr. DeWitt and leaves the stage.  Dr. DeWitt introduces Rimma Nehme, who helped him develop his presentation.  She also helped develop the Next-Generation Query Optimizer for Parallel Data Warehouse.


Dr. DeWitt is telling us about his lab, the Jim Gray lab, where he works every day, and about Big Data.  This is about very, very big data; think PBs worth of data.

Facebook has a Hadoop cluster with 2700 nodes.  It is massive.  In 2009 there was about a ZB worth of data out there.  A ZB = 1,000,000 PB.  35 ZB worth of DVDs would stretch 1/2 way from Earth to Mars.

So why Big Data?  A lot of data is not just input.  It is mobile GPS movements, acoustic sound, iTunes, sensors, and web clicks.

Data has become the currency of this generation.  We are living in the Golden days of Data.  This wouldn't happen if we were still paying $1000 for a 100 GB Hard drive.




Update 8:32

Rick is announcing the Executive Committee for 2012.  He mentions that we have a Board of Directors election coming up.  Use the hashtag #passvotes to follow it on Twitter.


PASS Nordic SQLRally has SOLD OUT!  The next PASS SQLRally will be in Dallas.  Rick plugs SQL Saturday and all of the work we do.  The PASS Summit 2012 will be held November 6-9 in Seattle, WA.  You can register right now and get the 2-Day Pre-Cons and the full Summit for a little over $1300.





Update 8:27

Rushabh is speaking about what Wayne means to him and the community, and presents him with an award for his community involvement.

The first thing that Wayne does is recognize Rick.

Wayne lists all of the different things that he's learned, both technical and personal.  He gives a very nice speech, and leaves us laughing.

Update 8:22

Buck Woody and Rob Farley have just taken the stage to sing a song from Rob's Lightning Talk earlier in the day!  Awesome.


I cannot describe how excellent that was.  But it will be live on the PASS website, and I'll toss the link out when it is.  That was truly worth watching over and over again.

A tribute to Wayne Snyder, the immediate past PASS President whose term is ending, airs.  They are bringing Wayne to the stage to honor him.  I work on the NomCom with Wayne; he is a great guy and truly dedicated to PASS.