Does managing database server farms differ from database instances?


We all experience how the number of database instances keep growing (especially SQL Servers) in the last few years.

Managing tens of server instances is commonplace, but more and more DBA’s are tasked with managing hundreds or even thousands of instances – and there is no end in sight for the growth of instances.

How does this affect us? Will the same tools and methods we used when we were managing a handful of instances scale up to meet the challenges of managing large database server estates, or do we need to develop new and more efficient methods and tools in order to meet the challenges?

In dbWatch we have spent a lot of time working on this type of questions, working closely with some of our largest customers who typically have estates ranging from a few hundred to thousands of instances, mostly SQL Server, but always including a mix of other platforms such as Oracle, PostgreSQL, MySQL or others, on-premise or in hybrid cloud scenarios.

First, we thought simply making tools that would scale up to handle thousands of instances was enough. It isn’t. We still could not see the forest for all the trees. We needed new functionality that let you analyze and manage the entire estate, or parts of it. So, one of the first conclusion we reached is that we need to differentiate monitoring/managing instances from managing the entire estate. It is simply two different tasks needing separate tools.

Managing and monitoring database instances focus on optimizing performance and uptime of the individual instance. It is usually event-driven – alarms, warnings or complaining users lead you to investigate and take corrective action. This is very often the everyday firefighting so many DBA’s spend their time with, going from one instance and crisis to the next crisis.

Database server estate management is very different. The objective of estate management is to provide overview of all resources, inventory and insight information to aid in capacity and performance planning and information to plan and prioritize DBA tasks and projects.

Overview: The first thing we need in estate management is an overview of all resources. That includes a full inventory of all servers and instances, information on each instance such as version, memory, disk, CPU, and status. Automatic scanning of subnets for new instances will aid in creating and maintaining complete inventories.

Resource utilization: Compare actual allocation of disk, memory and CPU to actual utilization of same resource.

Performance indicators: For the entire estate, see which instances are suffering the most from heavy loads, and which are lightly loaded, idle or simply no longer used.

Based on all this data you will need tools to analyze and utilize the information. You should be able to sort the data on any parameter, like pivot tables in excel, so you can find answers to questions like:

Which servers are the most heavily loaded?
How many SQL Server 2008 do we still have?
Which servers are no longer in use, or idled?
Which servers need more memory the most?
Which servers can we “steal” unused memory or CPU from?

Planning: We can also use this information to monitor long-term growth in resource consumption – disk, CPU or memory and use it to predict future requirements and make plans and budgets accordingly. Short term it will aid in allocating DBAs to the jobs that will be most beneficial to the overall resource utilization and performance of the estate.

Estate management is proactive. It is concerned with optimizing resource usage across the estate. You want to identify where you are short on resources, be it disk, memory, CPU or other, and identify where you have under-utilized resources. Based on this you can identify and plan reallocation of memory or CPU resources, where you can consolidate or decommission servers, where you should split out servers to improve performance.

What we now see on large estates is that we use the global dashboards to analyze the estate and plan work so that we can optimize DBA time to work on individual instances where it will have the most value and effect.

This is a more structured and pro-active workflow that focus on optimizing resource usage and DBA time for the estate.

This is a topic Per Christopher, one of our Sr DBAs, discussed in this webinar held for PASS DBA Fundamentals group. He also shows more examples of estate monitoring and management in these videos on estate capacity and performance management.