Hello Dear Reader!
Twitter is a fantastic tool if you use SQL Server. It is a great way to network, find
presentations, interact with experts, and #SQLHelp offers some of the best
technical forum conversation on the subject.
I was on Twitter and my friend Kenneth Fisher asked a question.
Kenneth’s question was interesting. Let’s define a couple of terms up front. HA in this case is High Availability. It is the capability of a database or data
store to maintain availability and connectivity to a Graphical User Interface,
Web Service, or some other data-consuming application despite a localized
outage or failure of the primary system.
High Availability is different from Disaster Recovery, or
DR. Disaster Recovery is required when a
disaster, natural or man-made, prevents a data
center, and the hardware within it, from supporting regular business processes. In this blog, we will be addressing HA only,
as that was what Kenneth had asked about.
When I asked for further explanation, it turned out Kenneth didn’t need HA in his
environment (more on this in a moment); he was looking for a rounded point of
view. I sent a couple of replies and then
started direct messaging (DM’ing) Kenneth to share my insight. This is a conversation you must have with the
business.
A keen understanding
of how business logic translates into technology workflows requires IT and
business leaders to make the right decisions together. Over this series I’m going to give you a
couple of examples of when HA was needed, and some
in which it was not. Later we will discuss different HA technologies and
how to understand when each one is right for you.
First I will define some key things to cover in a High Availability
Conversation between the business and IT.
THE HIGH AVAILABILITY CONVERSATION
There are three types of conversations: technical, business,
and budget. Business comes first. Our first goal is to understand what the
business is facing. Some of this you may
know in advance. If so, review the
information together so that business and IT share the same understanding.
· Describe the business process:
    o What product or service are we providing?
    o What type of employees participate in the process?
    o What does each employee need to get from the process?
· Diagram a workflow of how a customer interaction occurs
· Add the steps that are IT specific
· Identify critical days, weeks, months, holidays, or times of the day for the specific business process
· Understand any current pain points
First, make sure you understand the business process
surrounding any technical system. Draw
shapes on dry erase boards, connect them to the business process, and make sure you
understand who the business owner is for each technical system. Force
self-examination and business awareness.
Add your IT systems to the diagram.
Highlight business and IT dependencies.
Talk to teams to understand what happens when a system goes down.
You can use Visio, PowerPoint, a dry erase board, or whatever
you like. Make a visual representation based on the findings from the
meeting. Confirm that everyone can agree
on it, or have them take it back to their teams for additional vetting. You
may not get it right on the first run.
That is okay. This could turn into an
interview process, depending on the complexity of your IT environment. There are
some standard technical terms that you should use:
· SLA – Service Level Agreement. This is the agreement between the
business and IT on expected overall system availability. It may be defined in terms of business hours. In some cases, Service Providers may have
contractual SLAs. If your business has
an SLA with customers, there could be a financial penalty for not meeting the
SLA.
· RPO – Recovery Point Objective. This defines the amount of data that can be
lost on a system without causing impact or harm to the business. It is the point to which all data or transactions
must be recoverable. The price of any
solution increases as the RPO gets lower.
· RTO – Recovery Time Objective. This is the target amount of time it should take to get a
system back online in the event of an outage.
The time to recover will vary greatly with the size, volume, and type of data
involved. If you seek to reduce
recovery time, there are strategies and technical solutions that can be
utilized. The complexity of these
solutions can add to the overall cost.
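To make these terms concrete in the business conversation, it can help to turn an availability percentage into an actual downtime budget. The sketch below is only an illustration: the function names (allowed_downtime, meets_objectives) are my own, and it assumes the SLA is measured around the clock over a 365-day year rather than over business hours.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float,
                     period: timedelta = timedelta(days=365)) -> timedelta:
    """Return the downtime an availability target permits over `period`.

    Assumption: availability is measured 24x7 across the whole period.
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period * (1 - availability_pct / 100)


def meets_objectives(data_loss: timedelta, outage: timedelta,
                     rpo: timedelta, rto: timedelta) -> bool:
    """True when an incident stayed within both the RPO and the RTO."""
    return data_loss <= rpo and outage <= rto


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% availability allows {allowed_downtime(target)} per year")
```

Under those assumptions, 99.9% works out to roughly 8.76 hours of downtime per year, while five 9's leaves a little over five minutes. Numbers like these tend to focus the budget conversation quickly.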
REFINING HA: USING THE THREE BEARS TECHNIQUE
After the first business meeting the technical team should
meet to discuss their options. I like to
develop three options when working with IT Managers. This is where we, the technical staff, align
a few things:
· Proposed Technical Solution
· Cost of Technical Solution
· Cost to Maintain Technical Solution
· Migration from the Existing Platform to the New One
There are many ways to address HA. Some could be solutions using third-party
vendor products that you may already utilize, others can be hardware or
virtualization options, and others can be database specific. The IT architecture, the size of the corporation,
and the business need for HA will shape these decisions.
Notice I said three options for IT Managers. When you meet with the business you should
have a cohesive vision. When I started
my IT career, finding a business manager with IT experience was extremely rare. It is becoming more common. As IT knowledge permeates the business world,
it will be more common to give the three options to the business for
collaboration. If you are lucky, you are
in an environment like that today.
For this step you need to take all business requirements and
align them to the technical justification for a solution. Producing three options may seem
difficult. Think of this like the old
children’s story Goldilocks and the Three Bears. One of these technical solutions will be too
cold: too much latency and not viable for 100% of business scenarios. One of these solutions will be too hot: this
is the spare-no-expense-throw-everything-you’ve-got-at-it-as-close-to-five-9’s-as-possible
option. One will be just right. It will fit the business needs, not be cost
prohibitive, and be a maintainable solution.
Your natural first reaction will be, “Why three options? Why not just
propose the right one?”
Valid point, Dear Reader.
The reason is simple. This is a
high-stakes mental exercise. The
business, jobs, profitability, security of our data, and many other things may
rest on our solutions. What makes a good
IT person great? Their mind. Our ability to make virtual skyscrapers and
complex virtual super highways out of thin air.
There may be a natural reaction to jump at a ‘best’ quick solution. Take your time and think outside of the
box. You will find that if you push yourself
to consider multiple options, one may supplant the ‘best’ quick solution. While too cold and too hot may not work in
this situation, they may next time.
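The Three Bears comparison can be sketched as a simple check of each option against the RPO, the RTO, and the budget the business gave us. Everything here is hypothetical: the Option class, the classify function, the option names, and every number are made up for illustration and do not describe any real product or price.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Option:
    name: str
    rpo: timedelta       # worst-case data loss the option allows
    rto: timedelta       # worst-case time to bring the system back
    yearly_cost: float   # purchase plus maintenance, in one currency


def classify(option: Option, required_rpo: timedelta,
             required_rto: timedelta, budget: float) -> str:
    """Label an option the Goldilocks way."""
    if option.rpo > required_rpo or option.rto > required_rto:
        return "too cold"    # does not meet the business requirements
    if option.yearly_cost > budget:
        return "too hot"     # meets them, but costs more than we can spend
    return "just right"


# Hypothetical requirements and options, for illustration only.
REQUIRED_RPO = timedelta(minutes=30)
REQUIRED_RTO = timedelta(hours=2)
BUDGET = 60_000.0

options = [
    Option("Nightly backup and restore", timedelta(hours=24), timedelta(hours=8), 5_000.0),
    Option("Warm standby", timedelta(minutes=15), timedelta(hours=1), 40_000.0),
    Option("Spare no expense", timedelta(0), timedelta(minutes=1), 250_000.0),
]

results = {o.name: classify(o, REQUIRED_RPO, REQUIRED_RTO, BUDGET) for o in options}

if __name__ == "__main__":
    for name, label in results.items():
        print(f"{name}: {label}")
```

A real evaluation also weighs maintainability and migration effort, which do not reduce to a single number, so treat this as a way to organize the discussion rather than a way to make the decision.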
We’re done, right?
Wrong.
If we are changing an existing system, how are we going to get there? We need to understand whether there are production outage windows we must avoid. For example, in Retail or Manufacturing there are normally Brown out and Blackout windows for IT. Leading up to Black Friday, a United States shopping holiday, most Retail environments have IT Blackouts. This means no changes can be made in production unless they are to fix an existing bug. Everything else must wait. Leading up to a Blackout window there is a Brown out window. During a Brown out no changes can be made to existing server infrastructure. This can limit the ability to allocate more storage from SANs, change networking equipment, or make networking changes in general.
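When you build a migration timeline, it helps to check each planned change against these windows early. Here is a minimal sketch, assuming the rules described above: a Brown out blocks infrastructure changes, and a Blackout blocks everything except bug fixes. The names (ChangeType, Window, is_allowed) and the dates are hypothetical; your organization's real calendar and rules will differ.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import FrozenSet, Iterable


class ChangeType(Enum):
    BUG_FIX = "bug fix"
    APPLICATION = "application"
    INFRASTRUCTURE = "infrastructure"


@dataclass(frozen=True)
class Window:
    name: str
    start: date                      # first day of the window (inclusive)
    end: date                        # last day of the window (inclusive)
    blocked: FrozenSet[ChangeType]   # change types not allowed in the window


def is_allowed(change: ChangeType, day: date, windows: Iterable[Window]) -> bool:
    """Return False if any window covering `day` blocks this change type."""
    return not any(
        w.start <= day <= w.end and change in w.blocked for w in windows
    )


# Example dates only; not a real retail calendar.
windows = [
    Window("Brown out", date(2024, 11, 1), date(2024, 11, 14),
           frozenset({ChangeType.INFRASTRUCTURE})),
    Window("Blackout", date(2024, 11, 15), date(2024, 12, 1),
           frozenset({ChangeType.APPLICATION, ChangeType.INFRASTRUCTURE})),
]

if __name__ == "__main__":
    for change in ChangeType:
        print(change.value, is_allowed(change, date(2024, 11, 20), windows))
```

Running the planned migration dates through a check like this before the meeting with management makes it obvious when an HA upgrade has to land well ahead of the busy season.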
If we are performing an upgrade to use a new HA technology,
when will that happen? How do we
migrate? What are the steps? Have we coordinated with the application
development teams? They will need to
regression test their applications when a migration occurs. This means changes in the development,
QA/UAT, and eventually the production environments. Additional coordination will be required with the
server and SAN teams. Depending on the
size of your IT organization, this may be a large effort or the act of shouting
over your cubicle wall.
THE HA CONVERSATION II: THE SOLUTION & THE RESPONSE
We’ve spoken with the business. We understand their process. We drew it up, passed it around, and found a
few new things along the way. We met as
an IT team. We produced tentative
solutions. We took the ideas to management
and now we have a winner. We’ve
coordinated with our counterparts to make sure we know what a tentative
timeline would look like. Our next
step? Present the solution to the
business.
After this meeting the goal is to move forward and get the project going. We’ve put a lot of work
into this, and we
want to see it implemented. Sometimes
that happens. Sometimes it doesn’t. There are a lot of factors that can go into
why something doesn’t happen. Because
they can be legion, and it would distract from our overall topic, I’m going to skip
why things do not happen. It is an
intriguing topic that I may blog on later.
When we move forward, the goal is to do so in a collaborative
way. A good solution addresses business
needs, technical requirements, project timelines, and budget. There may be changes, and there may be wrinkles
that occur along the way. What looked
good on paper doesn’t always translate to technical viability. It is important to maintain communication
amongst key stakeholders along the way.
Alright Dear Reader, this has been a good kick off towards
our topic. If you have questions or
comments please sound off below! I look
forward to hearing from you. Next we
will review a couple real-life examples of when HA was needed and when it
wasn’t. As always, Thank You for
stopping by.
Thanks,
Brad