Friday, October 14, 2011

Day 6: Friday PASS Summit 2011 Keynote Live

Hello Dear Reader!  Welcome to the final day of the PASS Summit and the Keynote address by Dr. David J. DeWitt.  We will do this in the same way as the previous two days!  So hold tight, we are about to begin.


This has been an amazing Keynote.   Dr. DeWitt is brilliant, and he is echoing what we as a community are struggling with.  SQL is like our college team, favorite sports team, favorite actor, favorite period.  When we hear about other RDBMSs there is a knee-jerk rivalry; however, when we get together after the ribbing, Oracle and SQL DBAs live next to one another just fine.   There is a place for everything, and this seems to be Microsoft's way of saying "We can and will work together". 

This is a stance many people have wanted them to take for quite some time, and it should open up a very interesting future for all of us!

Thanks for reading all this week!




Update 9:52

He hopes to get PDW to where it can handle both; he would rather do that than strap a rocket on a turtle (the turtle being Hadoop).

Update 9:42

There will be a command line utility called Sqoop for PDW v Next to move data from Hadoop to an RDBMS.   Even though the demos favor PDW, Dr. DeWitt stresses there is a place for both and they are both here to stay.

We are looking at the limitations of the Sqoop library.  In the example shown, Sqoop could cause multiple table scans.
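A quick sketch of where those extra scans come from (hypothetical table and split column, not Sqoop's actual code): a Sqoop-style parallel import carves the source table into key ranges, one range per mapper, and each mapper issues its own range query. Without an index on the split column, every one of those queries can degenerate into a full table scan.

```python
def split_queries(table, split_col, lo, hi, num_mappers):
    """Generate the per-mapper range queries a Sqoop-style import would run."""
    step = (hi - lo) / num_mappers
    queries = []
    for i in range(num_mappers):
        range_lo = lo + i * step
        range_hi = hi if i == num_mappers - 1 else lo + (i + 1) * step
        op = "<=" if i == num_mappers - 1 else "<"
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {split_col} >= {range_lo:g} AND {split_col} {op} {range_hi:g}"
        )
    return queries

# Four mappers means four range queries -- and potentially four table scans
# if l_orderkey is not indexed on the source side.
for q in split_queries("lineitem", "l_orderkey", 0, 1000, 4):
    print(q)
```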

We will have both structured and unstructured data.  Moving into the 21st century, why not build a data management system that can query across both universes?   He terms it an Enterprise Data Manager.  Dr. DeWitt is trying to build one right now in his lab.

Update 9:32

Summary of the pros: highly fault tolerant, relatively easy to write arbitrary distributed computations over very large amounts of data, and the MR framework removes the burden of dealing with failures from the programmer.

Cons: the schema is embedded in the application code, and a lack of a shared schema makes sharing data impossible.  (And the slide changed before I could get that down.)

Facebook and Yahoo reached a different conclusion than Google about declarative languages like SQL.   Facebook went with Hive and Yahoo went with PIG.  Both use Hadoop MapReduce as a target language.

We now see an example to find the source IP address that generated the most ad revenue, along with its average.  The syntax is very Java-like.
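The example's logic, rendered in plain Python with made-up rows (the demo's actual code is the Java-like MapReduce shown on the slide): total the ad revenue per source IP, then report the top IP and its average revenue per visit.

```python
from collections import defaultdict

visits = [  # (source_ip, ad_revenue) -- hypothetical sample rows
    ("10.0.0.1", 0.50), ("10.0.0.2", 1.00),
    ("10.0.0.1", 0.75), ("10.0.0.3", 0.10),
]

# Group revenue by source IP.
totals = defaultdict(list)
for ip, revenue in visits:
    totals[ip].append(revenue)

# The IP with the highest total revenue, plus its average per visit.
top_ip = max(totals, key=lambda ip: sum(totals[ip]))
print(top_ip, sum(totals[top_ip]), sum(totals[top_ip]) / len(totals[top_ip]))
# 10.0.0.1 1.25 0.625
```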

MapReduce is great for doing parallel query processing, but a join takes 5 pages using PIG.  The Facebook guys can do the same thing in 5 lines using Hive.  The complaint from the Facebook guys: MapReduce was not easy for end users, who ended up spending hours if not days to write programs for even simple analyses.  Of the 150K jobs Facebook runs daily, only 500 are MapReduce.

The goal of Hive and HiveQL is to provide an easy-to-use query language.

Tables in Hive are like in a relational DBMS: data is stored in tables.  Hive has richer column types than SQL: primitive types (ints, floats, strings, dates) and complex types (associative arrays, lists, structs).

We are looking at Hive data storage.  Like in a parallel DBMS, Hive tables can be partitioned.  When you partition a Hive table by an attribute, that attribute's value becomes part of the file name, so the data is effectively compressed as it is stored.
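A tiny sketch of the idea (hypothetical paths, not Hive's real storage code): because the partition value lives in the directory name rather than in each row, a filter on that attribute can prune to the matching directories without opening the other files.

```python
files = [  # made-up partitioned layout: the date lives in the path
    "/hive/sales/date=2011-10-12/part-0000",
    "/hive/sales/date=2011-10-13/part-0000",
    "/hive/sales/date=2011-10-14/part-0000",
]

def prune(paths, column, value):
    """Keep only the partition directories matching column=value."""
    return [p for p in paths if f"/{column}={value}/" in p]

# A query filtering on date only has to read one directory of files.
print(prune(files, "date", "2011-10-14"))
# ['/hive/sales/date=2011-10-14/part-0000']
```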

We are getting a breakdown of queries, showing data seeks across partitioned data and the way a query is optimized if you are filtering on the partitioning attribute's value.

Keep in mind there is no cost-based query optimizer; the statistics are lacking at best.

We are going to look at some PDW v Next vs. Hadoop benchmarks: 600 GB, 4.8 billion rows.

Doing a scan, select count(*) from lineitem, then an aggregate with a group by.  It took Hive 4x longer than PDW to return the set.

Now we are going to get more complicated: a join between two tables with a partition on the key values.   PDW is 4x faster, and with partitions PDW is 10x faster than Hadoop.

Update 9:22

MapReduce components: the JobTracker coordinates all M/R tasks and events, and manages job queues and schedules.  So how does this work with HDFS?  There is a JobTracker in the M/R layer and an HDFS layer.  The JobTracker keeps track of what jobs are running.  Each TaskTracker maps to a DataNode, and on the DataNode sits the data those tasks work on.

He wants ooohs and ahhs for the next slide; it took 6 hours to make :).  Each row of data on a node has 2 tuples.  The example is customer, zip code, amount.  Our user wants to query certain users and do a group by zip code.  He shows across the nodes how the data is organized and located by the Map tasks.

The mappers per node each have duplicated data and unique data.  They each produce 3 output buckets by hash value.   Now we go from mappers to reducers.  The blocks are stored in the local file system; they are not placed back into HDFS.  The reducers have to pull the data back to 3 different nodes in our cluster.   They then sort and separate the data by hashed zip code.  The data may have some duplicated groups by this point, but my guess would be they change by the end.

The reducer now sorts them by hash of the zip code.  It then sums all similar hashes and returns the data.  In general the actual number of Map tasks is made much larger than the number of nodes used.   Why?  It helps deal with data skew and failure: if a worker suffers from skew or fails, the uncompleted work can easily be shifted to another worker.
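The walkthrough above fits in a few lines of Python (toy data and a toy in-process shuffle, not Hadoop's implementation): each mapper hashes its records' zip codes into one bucket per reducer; each reducer pulls its bucket from every mapper, sorts, groups by zip, and sums the amounts.

```python
NUM_REDUCERS = 3

node_data = [  # two (customer, zip, amount) tuples per node, as in the demo
    [("alice", "98101", 40.0), ("bob", "53706", 10.0)],
    [("carol", "98101", 25.0), ("dave", "32801", 5.0)],
    [("erin", "53706", 30.0), ("frank", "32801", 20.0)],
]

def map_phase(records):
    """Map task: emit (zip, amount) into a bucket chosen by hashing the zip."""
    buckets = [[] for _ in range(NUM_REDUCERS)]
    for _customer, zip_code, amount in records:
        buckets[hash(zip_code) % NUM_REDUCERS].append((zip_code, amount))
    return buckets

# Each node's map output stays in local buckets (not written back to HDFS).
all_buckets = [map_phase(records) for records in node_data]

def reduce_phase(reducer_id):
    """Reduce task: pull this reducer's bucket from every mapper, sort, sum."""
    pulled = sorted(kv for b in all_buckets for kv in b[reducer_id])
    totals = {}
    for zip_code, amount in pulled:
        totals[zip_code] = totals.get(zip_code, 0.0) + amount
    return totals

# Which reducer owns which zip varies with the hash, but the merged
# totals are always the same.
result = {}
for r in range(NUM_REDUCERS):
    result.update(reduce_phase(r))
print(result)
```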

It is designed to be fault tolerant in case a node fails.

Update 9:12

When the client wants to write a block, the NameNode tells it where to write it; it balances writes by telling clients where to go.  The reverse happens when a client wants to read a file: the NameNode acts as an index, telling readers where to go to find the data.

Data is always checksummed when it is read and when it is placed on disk, to check for corruption.  They plan for drives to fail, writes to fail on a rack, switches to fail, NameNode failures, and data center failures.
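The checksum idea in miniature (a sketch, not HDFS's actual CRC scheme): store a checksum alongside each block at write time, and verify it on every read, so silent disk corruption is detected instead of being returned to the client.

```python
import zlib

store = {}  # block_id -> (data, checksum); stands in for local disk

def write_block(block_id, data):
    """Write a block with its CRC32 checksum."""
    store[block_id] = (data, zlib.crc32(data))

def read_block(block_id):
    """Read a block, verifying the checksum before returning it."""
    data, checksum = store[block_id]
    if zlib.crc32(data) != checksum:
        raise IOError(f"block {block_id} is corrupt; re-read from a replica")
    return data

write_block("blk_0001", b"customer,zip,amount\n")
assert read_block("blk_0001") == b"customer,zip,amount\n"

# Simulate silent on-disk corruption: the next read detects it.
data, checksum = store["blk_0001"]
store["blk_0001"] = (b"customer,zip,AMOUNT\n", checksum)
try:
    read_block("blk_0001")
except IOError as e:
    print(e)
```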

When a DataNode fails, the NameNode detects it and knows what data was stored on that node.  The blocks are then re-replicated from the other copies.   If the NameNode fails, the backup node can take over; there is automatic or manual failover available.   The backup node will rebalance the load of data.

A quick summary: this is highly scalable, to 1000s of nodes and massive files of 1000s of TB, with a large block size to maximize sequential I/O performance.   No use of mirroring or RAID, but why?  Because it was supposed to be low cost, and they wanted to reduce costs.  They use one mechanism, triply replicated blocks, to deal with a wide variety of failures.

The negative?   Block locations and record placement are invisible.  You don't know where your data is!!!!

MapReduce is next.  The user writes a map function and then a reduce function, and the system runs them.  They take a large problem and divide it into sub-problems.

Perform the same function on all sub-problems and combine the results.

Update 9:02

Hadoop's roots are at Google: they needed a system that was fault tolerant and could handle an amazing amount of clickstream data.

The important components: Hadoop = HDFS + MapReduce.  HDFS is the file system; MapReduce is the processing system.

What does this offer?  An easy-to-use programming paradigm, scalability and a high degree of fault tolerance, and low up-front software cost.

The stack looks like: HDFS; Map/Reduce; Hive & Pig, SQL-like languages; and Sqoop, a package for moving data between HDFS and relational DBMSs.

These are the underpinnings of the entire Hadoop ecosystem.  HDFS design goals: scalable to 1000s of nodes, assume failures (hardware and software) are common, targeted towards small numbers of very large files, write once then read.

We are looking at an example of a file being loaded into Hadoop.  The file is split into 64 MB blocks, and each block is stored as a separate file in the local file system, e.g. NTFS.  Hadoop does not replace the Windows file system; it sits on top of it.

When the client writes and loads these, the blocks are distributed among the nodes (for the example he is using a replication factor of 3).  As he places more blocks they are scattered among the nodes.

Default placement policy:  the first copy is written to the node creating the file.  The second copy is written to a DataNode within the same rack.  The third copy is written to a DataNode in a different rack, to tolerate switch failures, and potentially in a different data center.
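The placement policy is simple enough to sketch (made-up two-rack topology, not HDFS's actual code): copy 1 goes to the writer's node, copy 2 to another node in the same rack, copy 3 to a node in a different rack.

```python
import random

cluster = {  # rack -> nodes (hypothetical topology)
    "rack1": ["node1", "node2", "node3"],
    "rack2": ["node4", "node5", "node6"],
}

def place_replicas(writer_node):
    """Pick three nodes for a block per the default placement policy."""
    writer_rack = next(r for r, nodes in cluster.items() if writer_node in nodes)
    same_rack = [n for n in cluster[writer_rack] if n != writer_node]
    other_racks = [n for r, nodes in cluster.items() if r != writer_rack
                   for n in nodes]
    return [writer_node,                 # copy 1: local to the writer
            random.choice(same_rack),    # copy 2: same rack, survives node loss
            random.choice(other_racks)]  # copy 3: other rack, survives switch loss

print(place_replicas("node1"))
```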

In Hadoop there is a NameNode, one instance per cluster, responsible for filesystem metadata operations and cluster replication.   There are backup nodes and DataNodes.    The NameNode is the master, and it is backed up.  The NameNode is always checking the state of the DataNodes; that is its primary job.  It also balances replication and does IsAlive and LooksAlive checks.

Update 8:52

eBay has 10 PB on 256 nodes using a parallel database system.  They are the Old Guard.  Facebook has a NoSQL system with 20 PB on 2700 nodes.  Bing uses 150 PB on 40K nodes.  They are the Young Turks.   Wow, we just found out that Bing uses NoSQL. 

It is important to realize that NoSQL doesn't mean No To SQL.  It means Not Only SQL.   Why do people love NoSQL?   More data model flexibility; relaxed consistency models such as eventual consistency (they are willing to trade consistency for availability); low up-front software costs; and some never learned anything but C/Java in school.

He brings up a slide to show Reducing time to insight, by displaying the way we capture, etl, and load data into data warehouses.

The NoSQL folks want the data to just arrive: no cleansing, no ETL.  They want to use it and analyze it where it stands.

What are the major types of NoSQL systems?

Key/Value systems: MongoDB, CouchBase, Cassandra, Windows Azure.  They have a flexible data model such as JSON.  Records are sharded across nodes in a cluster by hashing a key.  This is what PDW does, and we call it partitioning.
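Hashing a key to pick its node can be sketched like this (hypothetical node list; real systems typically use more robust schemes such as consistent hashing so that adding a node doesn't reshuffle everything):

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3"]

def shard_for(key):
    """Pick the node that owns this key.

    md5 is used so the mapping is stable across runs (Python's built-in
    hash() is salted per process).
    """
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every lookup for the same key lands on the same node.
assert shard_for("customer:42") == shard_for("customer:42")
print(shard_for("customer:42"))
```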

Hadoop gets a big plug.  Microsoft has decided this is the NoSQL they want to go with.   Key/Value stores are NoSQL OLTP.

Hadoop is NoSQL OLAP.  There are two universes, and they are the new reality: the unstructured NoSQL systems, and the structured relational DB systems.

The differences.  Relational: structured, ACID, transactions, SQL, rigid consistency, ETL, longer time to insight, mature, stability, efficiency.

NoSQL: unstructured, no ACID, no rigid consistency, no ETL, not yet mature.

Why embrace it?  Because the world has changed.  David remembers the shift from the networked systems of the '80s to today, and this is now a similar shift for the database world, where both will exist.

SQL is not going away.  But things will not go back to the same, there will be a place at the table for both.

Update 8:42

Rick plugs the feedback forms.  And today is the last day to buy the Summit DVDs for $125.   That breaks down to $0.73 a session.

Rick introduces Dr. DeWitt and leaves the stage.  Dr. DeWitt introduces Rimma Nehme, who helped him develop his presentation.  She also helped develop the next-generation query optimizer for Parallel Data Warehouse.

Dr. DeWitt is telling us about his lab, the Jim Gray Systems Lab, where he works every day, and about Big Data.  This is about very, very big data; think PBs worth of data.

Facebook has a Hadoop cluster with 2700 nodes.  It is massive.  In 2009 there was about a ZB worth of data out there (a ZB is 1,000,000 PB).  35 ZB on DVDs would stretch halfway from Earth to Mars.

So why Big Data?  A lot of data is not just typed input.  It is mobile GPS movements, acoustic sound, iTunes, sensors, web clicks.

Data has become the currency of this generation.  We are living in the golden days of data.  This wouldn't happen if we were still paying $1000 for a 100 GB hard drive.

Update 8:32

Rick is announcing the Executive Committee for 2012.  He mentions that we have a Board of Directors election coming up.  Use the hashtag #passvotes to follow it on Twitter.

PASS Nordic SQLRally has SOLD OUT!  The next PASS SQLRally will be in Dallas.  Rick plugs SQL Saturday and all of the work we do.  The PASS Summit 2012 will be held November 6-9 in Seattle, WA.  You can register right now and get the 2-day pre-cons and the full Summit for a little over $1300.

Update 8:27

Rushabh speaks about what Wayne means to him and the community, and presents him with an award for his community involvement.

The first thing that Wayne does is recognize Rick.

Wayne lists all of the different things that he's learned both Technical and Personal.  He gives a very nice speech, and leaves us laughing. 

Update 8:22

Buck Woody and Rob Farley have just taken the stage to sing a song from Rob's Lightning Talk earlier in the day!  Awesome.

I cannot describe how excellent that was.  But it will be live on the PASS website, and I'll toss the link out when it is.  That was truly worth watching over and over again.

A tribute to Wayne Snyder, the immediate past PASS President, whose term is ending, airs.  They are bringing Wayne to the stage to honor him.  I work on the NomCom with Wayne; he is a great guy and truly dedicated to PASS.

Thursday, October 13, 2011

Day 5: Thursday Summit Keynotes 2011, PRIVATE OLTP Cloud Appliance Announced!

Hello Again Dear Reader.   I'm at the Blogger's Table today!  The Keynote has started and I'll jump right in.

Update 9:55

We are discussing the Hybrid IT view that Microsoft is pushing.  The integration between all of their products and the Cloud is very apparent.  The code bases are merging, and this will only continue as we progress.

They want to streamline the UI tools, make it so all the products "Just Work" together.  This has been a very interesting Keynote.  Not as big as the announcements yesterday, but some subtle announcements that I think will have HUGE impact over the next couple years.

Quentin Thanks us and leaves the stage as we watch a video playing on the Appliances.  Now on to the Sessions!

Update 9:45

Nice demo: the Azure platform is being used to provide content for Samsung TVs, in order to have live web applications pushed down to your web-enabled TV.

Cihan Biyikoglu is taking the stage to talk about Federations for SQL Azure.  This brings sharding patterns to SQL Azure.  It will allow us to access hundreds of nodes and scale out for large-scale applications.

Example: Blogs 'R' Us, with unpredictable traffic and continuously changing hardware requirements.  We can re-partition this on the fly. 

Hope you like the Windows Phone 7.  The Azure Marketplace, Windows 8, and the rest all have a consolidated UI.  You can add capacity and re-align your cloud-based database instances on the fly to support user patterns.  This is some very interesting stuff.

150 GB Azure Databases and Federation before the end of the year! 

Update 9:35

We have a lot of e-books on a drive, with a full-text index set up on them using semantic search.  We are now getting a closer look at how quickly we can retrieve articles and weight the returned terms.  This changes the way we will build internal searches for our intranet applications.  This is a very powerful tool for anyone looking to bring that kind of functionality to their company's internal network.

Next up: Juneau & optimized productivity.  The goal is to unify deployment across Database & BI.  They just announced that there is a plug-in to deploy the SQL Engine of Express Edition with .NET application code, within the same application deployment.

Now we are on to being able to scale on demand.  We are going to get a demo from Nicholas Dritsas, Principal Program Manager for the SQL Server CAT Team, on SQL Azure.   We are using SSMS 2012 to connect and deploy to SQL Azure.

There are differences in Azure between billing, size, and usage for Business and Web Editions.  For the Demo we are using Web Edition.  To Access this we can use the Web interface for Azure Manager, which Nicholas is demoing now.

We just got a cloud database opened in SSMS; the icon is slightly different, very cool!  Backup and restore from the cloud to your datacenter!?  Yep, just announced.

Update 9:25

This is a game changer.  Pragmatic Works, Verizon, and Accenture are all early adopters.  There is a deep dive on this appliance right after the Keynote; I just changed the session I'll be attending.

Why is PDW's performance improving?  Because it used a rules-based optimizer, whereas the rest of SQL uses a cost-based optimizer.  Dr. David DeWitt has been helping them implement a cost-based optimizer.  Big changes are coming.

They just announced Linux driver support for SQL Server 2012.  Change Data Capture for ETL from Oracle to SQL Server 2012, and support for that, is now announced!

We are now getting a Demo for semantic search.

Update 9:12

To discuss SQL Server appliances, Britt from the product team is coming up on stage to talk about figuring out the best black box to create.  We are now taking a look at the Dell Parallel Data Warehouse.  We are discussing the way queries are hashed and sent to compute nodes, across multiple nodes and over 450 cores.

We are also looking at the HP PDW machine.  With multiple racks this system can handle over 700 TB of data.  We are looking at the HP Business Decision Appliance that comes pre-built with SharePoint.  I've seen this first hand; it takes 4 clicks to have it up and running.  It is amazing.

We were just introduced to the HP Consolidation Appliance; it will be available in the next month.  This is the first private cloud appliance available on the market.  400 disk drives, 4 TB of RAM, over 300 cores.  This is a beast!

Update 9:07

We have cleansed our data and pushed the clean data back into the data warehouse.  Now we reload the report and it only takes 1 second.  The difference was amazing.

Now we are discussing Data Alerts in Reporting Services.  We pick our big customers, those with over 1.5 million dollars in gross sales, checked every day, so we can send out thank-yous.

Now we are discussing organizational compliance.   The two bullets are Expanded Audit (user-defined audits and filtering) and user-defined server roles.  This allows you to separate DBA rights from auditing components.

Now we are talking about Peace of Mind, Production-simulated Application Testing, using System Center for monitoring.

Update 8:57

We are about to get a demo of Data Quality Services.  It looks like we may get a ColumnStore demo after all.

We get a view of a web application whose users are complaining about the performance; it took about 30 seconds to load.

Now we are creating a ColumnStore index to fix performance.  But there is still a data issue.  For that we will be using Data Quality Services.  DQS uses Knowledge Bases in order to cleanse your data; you can create your own, or you can go to the Azure Marketplace and get a Knowledge Base to cleanse your records.  A quick example you may want to use is address CASSing.

Update 8:52

Next up is blazing-fast performance.   We have enhancements in the RDBMS, SSAS, SSIS, and ColumnStore indexes.  Quentin is now talking about the use of VertiPaq compression and how it is the backbone of ColumnStore indexes.  You see it in place in PowerPivot, and it will now be in SSAS and in the SQL Engine for use.

ColumnStores will be treated as an additional index type.  A ColumnStore will always be a non-clustered index.

Now we have moved on to rapid data exploration: PowerView + PowerPivot, administration through SharePoint, and reporting alerts in Crescent/PowerView.   Self-service BI and empowering the users are the theme of this.

Now we are on to the BI Semantic Model; this is the model that actually runs PowerView/Crescent and that PowerPivot utilizes.   We are now discussing Data Quality Services; next up, Master Data Services.

Update 8:47

The Availability Group is completed, and the dashboard is pulled up so you can see the way it is managed.  It integrates Policy Based Management to determine and display the health of the AlwaysOn Availability Groups.

Paul is now enabling the 3 active secondaries, enabling read-only secondaries, and showing how in an SSRS report he can set Application Intent so the report automatically goes to the read-only secondaries, effectively offloading the read activity for reporting with a couple of clicks.

Update 8:42

Bob is discussing their mission-critical application that covers all the ins and outs (literally) of their organization, and how essential it is to their company and to governments, because they communicate with the Port Authority for each country; otherwise an outage can back up ports all over the world.

Paul from the Product Management Team for SQL Server is invited to the Stage to tell us about the technical solution that Mediterranean has in Production. 

Paul is showing us a datacenter in NJ, and a particular SQL 2012 instance running multiple databases.  He's setting up an Availability Group between New Jersey and New York.    New Jersey is the primary and New York the secondary (not in real life). 
Update 8:37

Quentin said this is the largest release ever for SQL Server.   He cannot talk about all of the different features so he is picking his favorites. 
  • Required 9s & Protection (Always On)
  • Blazing-Fast Performance (Column Store)
  • Rapid Data Exploration (Data Explorer)
  • Managed Self-Service BI
He is discussing the architecture of AlwaysOn with Availability Groups.  Bob Erickson, Executive VP from Mediterranean Shipping Company: they operate in over 142 countries, with over 184 vessels in their fleet.  They are #1 for import and export in the US and #2 in the world.

Update 8:32

Quentin Clark, Corporate Vice President of SQL Server at Microsoft, takes the stage.   He recaps the way that SQL 2012 fits into the overall data platform vision that Microsoft has, which Ted discussed yesterday.

The Vision
  • Any Data, Any Size, Anywhere
  • Connecting with the World's Data
  • Immersive Experiences Wherever You Are
Foundation for the Future
  • Mission critical Confidence
  • Breakthrough Insight
  • Cloud on Your Terms

Update 8:27

Bill is reviewing the budget numbers for PASS.   They have started a feedback site for PASS and are taking suggestions.  The PASS elections are coming up. 

Bill introduces Quentin Clark, our Microsoft Keynote Speaker.   A video plays of attendees talking about what we've learned and what we are hoping to bring back from the Summit.

Update 8:22

Lori Edwards was just announced as our PASSion Award Winner for 2011.   Great job Lori!!!

Update 8:17

Bill Graziano takes the stage; today is SQL Kilt Day.  Bill says hi to his Mom and Dad, who are watching, and salutes the SQL Kilt wearers.

We are recognizing Outstanding Volunteers.

Tim Radney

Jack Corbett

Both are amazing men and I'll need to come back and write more later about this.

Wednesday, October 12, 2011

Day 4 Summit Keynotes Live, DENALI HAS A NAME! SQL Server 2012!

Hello Dear Reader, this is my first live blogging.  I'm going to do this a little differently: I'll be writing in reverse.  I owe you a blog on Day 3 and another lesson in Compression for SQL U, and they will be coming, but for now the Keynote is about to begin for the opening day.  This will stay at the top, but I'll be blogging in reverse order, so start from the bottom and scroll up to see the updates as they are posted.

Update 10:03

Ted is wrapping this up; he talks about how the community is essential to move this forward, and about how we could take the world's data and use it immediately.

In the first half of the next calendar year we will have SQL Server 2012.  He plugs the next two days of Keynote sessions. 

Thank you to everyone for coming; that wraps it up.

Update 9:58

The Windows 8 tablet is being demoed showing PowerView/Crescent!  It is completely dynamic and interactive.

Ted thanks Amir for taking the stage.

Update 9:53

Amir starts talking about how, if you use your data properly, you can use it to tell a story.  That story is what is going on in your business, and every business would want to know its own story from the data.

Yesterday they confirmed that they would be able to do Export to PowerPoint from Crescent/PowerView!

Now Amir is showing that Crescent/PowerView is going to work on a Windows Phone 7, and it is fully functional! 

Wow, so if your company goes with a Windows 7 phone you can use your self-service BI on the Windows phone.  Now he is demoing a PowerPoint on the iPad 2.  He is now demoing PowerView on the Android Samsung tablet.

Update 9:48

We continue on the demo; there are a lot of genres to hop between, and it is very interesting to see how interactive this data is.

Amir said Samuel L. Jackson is no Tom Hanks, but Samuel L. Jackson has the most gross ever; Ted cringes.  He doesn't know who Alan Rickman is and why he is in so many high-grossing powerhouse movies.  He uses the data to show Alan is in all the Harry Potter movies and the first Die Hard.

Samuel L. Jackson has twice as many movies as most actors.  The only actor with more movies than Samuel L. Jackson: John Wayne.  Ha!  A nice funny demo with good interplay.

Update 9:43

Our demo is going to come from real data with Crescent/PowerView from one of my favorite movie sites, and they are owned by

He is showing in Crescent the outliers for the number of movies and their sales in a scatter chart.   The top 2 outliers: computer-animated kids' movies, and comedy.  He uses the highlighting function in PowerView, and in comedy Meet the Fockers is the #1 comedy of all time according to the data by sales & profit.  He does a breakdown of the evolution of comedies by year and timeline.   He uses a slicer on different sales to give us a card deck view by different profit margins.

He shows how profit margin increased as Hollywood adopted BI and could look at data like this, drawing quite a few laughs.

He shows how, when Toy Story launched in 1995, computer animation started to rule the roost.  The most impressive thing is he did all that without touching the keyboard.  PowerView is very dynamic in what the user can do with the data.

Update 9:38

Because Data Explorer is a service and the data follows us in the Cloud, this data is mobile and available very quickly. 

Tim and Nino leave the stage and Ted returns.  Just an FYI, this is code-named "Data Explorer" in Azure Labs. 

Ted is talking about how this ties into Microsoft's vision of empowering all users through the tools they use every day.   We are discussing self-service BI, how delivering this to the end user empowers them and gives IT a greater role in governance.

He is building towards an announcement, referencing Crescent/PowerView, PowerPivot, and mobile devices.  Ted talks about how he used his Windows Phone to look over his slide deck for today last night while he was out getting a coffee. 

He welcomes Amir Netz, Technical Fellow, to discuss Unlock New Insights, Anywhere.  Ted is discussing what it means to be a Technical Fellow.  Amir has a distinguished career in BI: "He was in BI at Microsoft before we had a BI stack."   With that Ted leaves the stage and turns it over to Amir.

Update 9:31

They are now discussing the datasets that were brought in.  Bing Services will add reference data for phone books; they are overlaying the number of high schools within a 1 mile radius, because teenagers like frozen yogurt.

I get the demographic information; however, people are laughing, because a demo where you track high schoolers and where they are is a little creepy.   I get it professionally, but they should have used a different age segment.

The point is they are showing how actual Service Calls to Microsoft Applications will be able to be used to provide demographic information to provide analysis on the Azure Market Place.

Now they are using PowerPivot to pull down the information and SharePoint to make it available to many different users.  They were able to pull disparate data sources to determine the shopping centers that teenagers are most likely to shop at, which is where the new store should be located.

Update 9:23

The demo is Contoso Frozen Yogurt trying to figure out the best location for their next store.   They have a SQL Azure database with a normalized score of how their stores are performing by profit.

They are extracting the data, and they will use Data Explorer to interface with that data and then return highly relevant recommendations of what they should do next.

He added an Excel spreadsheet with a list of shopping centers for the area.   He hit a button called Mashup that overlays the datasets on top of one another and starts making comparisons.  There is a rank field called relative performance value.

Tim explains to us that overlays are to the business world what joins are to us DBA folk.  The Azure Marketplace has recommended some demographic data that would help with the decision.  Another overlay/join is done, and we now start to get recommendations.

We made a 3-way join across Excel, Azure, and the Marketplace in very little time.

Update 9:18

Ted is talking about how we integrate media from disparate sources.  Data Quality Services, family data, reference data, weather & climate, health and wellness, and much more, all available via the Azure Marketplace.

Microsoft's vision: being able to enrich your data with the world's data using "decision engines", empower developers to build new services and applications, and offer a vibrant marketplace ecosystem for the world's data.

Our next demo is SQL Data Explorer.  He welcomes Tim Mallalieu and Nino Bice to the stage.   We are looking at SQL Azure Labs; it looks a lot like Windows 8 or the current Windows Phone.

Update 9:13


He is a web admin who wants to monitor his traffic.  The solution: a Hadoop on Windows cluster.  He starts showing us HiveQL, a SQL-like query language, running a query against Hadoop on Windows Server.

The Hadoop console is basically the command line.   He's telling us about the millions of rows that his multi-node cluster is processing.   He wants to figure out a better way to see this data.  His connector of choice?  PowerPivot for Excel, using the Hadoop connectors that are available now.

He pulls it all down and shows us one of the workbooks that are coming with PowerPivot Denali.  He's joining this data against SQL Server and Azure Marketplace data.

He's showing us the data about people hitting this website, by language.   He now switches over to SharePoint to show us the server that he is running.   The report refresh took an hour; he is demoing how, when you write a report and post it on SharePoint, it will continue to refresh from disparate data sources and become something a business can rely on.

Ted thanks Denny and Denny leaves the stage.

Update 9:08

Hortonworks has taken the experts that helped Yahoo solve its problems and brought them together.  He believes that Hadoop could be storing half the world's data in 3 years.   He is discussing working with other companies, and Hortonworks will be working to expand this use.

This is an open source community project.  Eric is very excited that the SQL Server community and all of our activity could be providing feedback to this project.

Eric Thanks us all and leaves the stage.

Now Denny Lee, a Principal Program Manager for the SQL Server team, is coming on stage next to discuss Activating New Types of Data. 

Denny takes the stage and asks if we are ready for some demos!  YES! He says not yet.

Update 9:03

He is talking about participating in the Apache program, to make sure that SQL Server interfaces with it in the best way.


  • Apache Hadoop-based distribution for Windows Server and Windows Azure
  • ODBC Driver and Add-in for Excel, both for Apache Hive
  • JavaScript Framework for Hadoop
  • SQL Server and SQL Server Parallel Data Warehouse connectors for Apache Hadoop

The other announcement: they have formed a partnership with Hortonworks.  He just welcomed Eric Baldeschwieler, the CEO of Hortonworks, to the stage to talk about the partnership.

Where would you have seen Hadoop before?  It's the open-source counterpart to the MapReduce technology Google built for its back end.  This is big!

Update 8:58

He is pointing out
  • 40% Data growth rate
  • 15 out of 17 business sectors have more data stored per company than the US Library of Congress
The vision going forward is in 3 points.

Manage and process all types of data, mission-critical scale from on-premises to cloud, and common management and development between SQL Server and SQL Azure.

We are discussing Big Data now.  Here is how he is defining it.

  • Large Data Volumes
  • Traditional and non-traditional data sources
  • New technologies and New Economics
  • New Insights
He points out that because of Search, Email, and all their other offerings, Microsoft has over 700 PB of data! 

Update 8:53

"We believe the Cloud is a hybrid workplace.  There are things you will want to keep in your data center, on the ground, and there are things you will want in the Cloud."   He just announced, very slyly, that they are MERGING the code bases for the Cloud and Denali.  A step forward in that is SSRS in the Cloud, along with features they are delivering in Denali.
They are delivering in 3 areas:
  • Mission Critical Confidence
  • Breakthrough Insight
  • Cloud On Your Terms
Within those 3 areas in Denali: AlwaysOn for high availability; Column Store Indexes, first introduced in PowerPivot; and Crescent, which will be Power View in Denali.  On the tooling side, Juneau will be the SQL Server database tools when Denali is released.
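For a flavor of the Column Store Index feature mentioned above, here is a minimal sketch of what the Denali syntax looks like; the table and column names are hypothetical, not from the keynote:

```sql
-- Hypothetical example: a nonclustered columnstore index on a fact table,
-- aimed at speeding up large scan-and-aggregate warehouse queries
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_FactSales_ColumnStore
ON dbo.FactSales (OrderDateKey, ProductKey, SalesAmount);
```

The column-oriented storage is the same idea that drives PowerPivot's in-memory engine, which is why Dr. DeWitt's team keeps pointing back to it.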

He points to some of the customers that have databases running Denali in production: a Dell pilot, Great Western Bank, and others.


Update 8:48

Ted is talking about what we will cover today, Denali and demos.

He is thanking PASS for all we do to spread the word about SQL Server.  He points out that we offer over 400,000 technical hours, 79,000 members, 300 Microsoft MVPs, and 233 SQL PASS chapters.  This is the largest PASS Summit ever!

"It's been a busy year since we last got together."  Yes it has.  Choice is a big theme; he is pointing out the advances made in hardware appliances like Parallel Data Warehouse, the releases of SQL Server 2008 SP3 and 2008 R2 SP1, the Cloud, Azure Marketplace, and the Management Portal for SQL Azure.

Update 8:43

He is thanking Microsoft, Dell, CA, EMC, Expressor, Fusion-io, and HP for being sponsors of the Summit.  AND A BIG THANK YOU INDEED!  We have the largest collection of vendors ever at a PASS Summit.

We are talking about the SQL Server MVP Deep Dives, Volume 2 book that has been on sale exclusively at the Summit.  Next week it opens up everywhere; over 53 MVPs contributed to it.  One of the great things about this book is that the MVPs do not receive a dime for it.  Every penny spent goes to Operation Smile to provide reconstructive surgery for children in developing countries.  Great cause.

Rushabh's wrapping up now and introducing Ted Kummert, Senior Vice President of the Business Platform Division; he owns everything from the database to the applications and the way they integrate, from the Cloud to the Ground.

Ted said that SQL Server is the most widely adopted database platform in the world, from Ground to Cloud.  A quick dig at Oracle, who just hit the cloud, while Microsoft has been there 18 months already.  Nice.

Update 8:38

A picture of the bloggers at the blogger table today is posted.  A lot of amazing people: Denny Cherry, Brent Ozar, Andy Warren, Jorge Segarra, just an amazing number of people.

I'll try to get the full list out later with links to their blogs and Twitter handles.

We are talking about Twitter, the way we use it so heavily, and some of the # hashtags, or search terms, that we have created.  

Right now he is posting information about the SQL CAT team, the clinic hours they have open, and the different sessions they have.  At PASS this year we have:
  • 93 MVP's
  • 11 MCM's
  • 57 Microsoft Employees
  • 11 SQL CAT Presentations

The importance of community and building connections has been a common theme throughout the Summit.  The First Timers program, and the session with Don Gabor (awesome, and I haven't done the recap yet), had over 800 people attend yesterday evening, running until 8 pm.

Update 8:28

This year PASS has given 430,000 hours of free training; we have 80,000 members and 1 global region.  The goal is to get to 1 million hours of technical training, 250,000 members, and 5 global regions.

To help achieve this goal, 3 international Board of Directors seats were added, to help influence this from Denmark, Germany, and I missed the third.    PASS was a partner in SQLBits in the UK, and they felt that was a very large success.

Update 8:23

Rushabh Mehta, the PASS President, just took the stage.  Apparently Twitter is tipping over!

Rushabh is thanking the Board of Directors and the committee members, and making sure we know these are the people we should approach with ideas.