Over the years, I’ve watched organizations struggle to understand a few things about their business: their application landscape, their infrastructure, and how both tie back to the business itself. A strong and accurate CMDB (Configuration Management Database) is a key element in understanding your infrastructure and applications. The challenge they all face is knowing exactly how to proceed.
Step 1 – “get your CMDB house in order!”
One thing I do need to say here… I’ve never once seen a manually entered and managed CMDB succeed. Definitely leverage the appropriate discovery tools to load and maintain your CMDB so that the data is usable (focus on the 3 C’s: Completeness, Compliance, and Correctness!). What’s next? Understanding, even at a basic level (used by, owned by, etc.), how the Configuration Items (CIs) in your CMDB relate to one another.
Step 2 – “Dependencies & Relationships”
Getting even to that first level of maturity can be a challenge. I’ve seen many CMDB projects in my career, and they each seem to have their own bumps in the road. Some examples would be nice, right? Okay… so, let’s stay tool agnostic for a second. One company I know turned on a discovery product (something that scans and probes across a given IP range) to load the information into their CMDB. And it did… EVERYTHING got loaded! In other words, they forgot to scope their scan. Instead of having useful data, they were swimming in data and did not know where to start with it. It would have gone a lot better for them if they’d taken a more structured approach and defined clear use cases for data collection, such as data to support the Change, Incident, and Event processes. Starting here keeps the scope tightly focused and lets the implementation team target and hit attainable goals. Other use cases (Enterprise Architecture, for example) can be added to the CMDB over time following the same approach. This ensures that all of the data in the CMDB has a valid business purpose behind it.
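To make the scoping idea concrete, here is a minimal sketch in Python. It is not how any particular discovery product works; the `SCAN_SCOPE` table, the subnets, and the `in_scope` helper are all made up for illustration. The point is only that each use case declares the address ranges it needs, and anything outside those ranges never gets loaded.

```python
import ipaddress

# Hypothetical scope table: each use case lists the subnets it needs data from.
SCAN_SCOPE = {
    "incident": ["10.10.0.0/24"],
    "change": ["10.10.0.0/24", "10.20.0.0/24"],
}

def in_scope(ip: str, use_cases: list[str]) -> bool:
    """Return True if the address sits in a subnet required by any listed use case."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_network(cidr)
        for use_case in use_cases
        for cidr in SCAN_SCOPE.get(use_case, [])
    )

# Addresses a scan might report (illustrative values only).
discovered = ["10.10.0.15", "10.20.0.7", "192.168.1.50"]

# Only load what a defined use case actually asked for.
to_load = [ip for ip in discovered if in_scope(ip, ["incident", "change"])]
print(to_load)
```

Adding a new use case later (Enterprise Architecture, say) then becomes a matter of adding one more entry to the table rather than widening the scan to everything.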
Another customer I worked with had multiple discovery tools feeding into their CMDB (not a bad idea, overall, as each tool typically provides slightly different attribute information). Their challenge was coalescing the data properly and avoiding duplicate CIs in the process. Some of the tools “named” the discovered CIs differently, which resulted in duplicated data and a lot of confusion. Until they updated the loading process (adapting the queries to find the correct CI instead of creating a new one), they had a hard time of it.
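The "find the correct CI instead of creating a new one" fix can be sketched roughly like this. It is a simplified illustration, not that customer's actual loader: the `normalize_name` and `reconcile` functions are hypothetical, and matching on a lowercased short hostname is just one assumed rule (real reconciliation usually also looks at identifiers such as serial numbers).

```python
def normalize_name(name: str) -> str:
    """Reduce a discovered name to a comparable key: lowercased short hostname."""
    return name.strip().lower().split(".")[0]

def reconcile(cmdb: dict[str, dict], record: dict) -> str:
    """Merge a discovered record into the matching CI, creating it only if absent."""
    key = normalize_name(record["name"])
    ci = cmdb.setdefault(key, {"name": key, "sources": []})
    ci["sources"].append(record["source"])
    # Add attributes the CI does not have yet; values already present are kept.
    for attr, value in record.get("attributes", {}).items():
        ci.setdefault(attr, value)
    return key

cmdb: dict[str, dict] = {}

# Two tools report the same server under different names (made-up data).
reconcile(cmdb, {"name": "WEB01.corp.example.com", "source": "tool_a",
                 "attributes": {"os": "Linux"}})
reconcile(cmdb, {"name": "web01", "source": "tool_b",
                 "attributes": {"serial": "ABC123"}})

print(len(cmdb), cmdb["web01"])
```

Both records land on one CI, and that CI ends up with the attributes each tool contributed, which is the whole reason for running more than one tool in the first place.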
Another example. Your discovery tool finds some CIs:
- 2 Load Balancers
- 1 Firewall
- 2 Web Servers
- 3 Application Servers
- 2 Database Servers
At a base level, your discovery probes should uncover that the firewall talks to a load balancer. The load balancer talks to the 2 web servers. Then the 2 web servers talk to a 2nd load balancer, which is connected to the application servers. And, finally, the application servers talk to the database servers. That all makes sense. So, the questions that should be on the mind of everyone who works in Operations and Security are: “What applications are running on those systems? How critical are they to the business? And, for that matter, which Line of Business owns them?” Can you answer those questions about the systems in your environment? If so, you’re in a great spot! If not, then you need to move on to…
Step 3 – “Service Mapping”
Again, staying tool agnostic for the moment… the idea of a service map is simply to identify all of the infrastructure and CI elements related to a defined Business or Application Service. This transforms your CMDB into something more powerful… a “service aware” CMDB! Some easy examples: Corporate E-mail, Payroll Processing, External (Customer-facing) website, Intranet, etc. The key piece in service mapping is that you may have one CI that is actually a part of multiple service maps. So, one web server may actually support many websites, each with a different business focus, or a single database server may host multiple databases tied to many different business services.
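Here is a small Python sketch of that "one CI, many service maps" point. The service names come from the examples above, but the CI names and the `services_by_ci` helper are invented for illustration; this is a toy data structure, not a real CMDB schema.

```python
from collections import defaultdict

# Hypothetical service maps: service name -> the CIs that support it.
SERVICE_MAPS = {
    "Customer Website": {"firewall-1", "web-1", "db-1"},
    "Intranet": {"web-1", "db-2"},
    "Payroll Processing": {"app-1", "db-1"},
}

def services_by_ci(service_maps: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the maps so each CI lists every service it supports."""
    index: dict[str, set[str]] = defaultdict(set)
    for service, cis in service_maps.items():
        for ci in cis:
            index[ci].add(service)
    return index

index = services_by_ci(SERVICE_MAPS)

# One web server and one database server each show up in two service maps.
print(sorted(index["web-1"]))
print(sorted(index["db-1"]))
```

That reverse lookup, from a CI back to every service it touches, is what makes the CMDB "service aware".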
Why is this important? To be short and simple, it is what allows you to prioritize and identify what is actually important to work on. So, say I get an alert that the above-mentioned firewall is down (offline). How important is it that we restore that service? If it’s supporting our customer-facing website, then it’s probably very important! If it’s in front of my development website, however, it may not be as critical, right? By introducing the service map concept, you start to be able to build intelligence into your system to empower your employees (and orchestration) to make better decisions. An additional benefit is the ability to shift work resources from Operations (KTLO – Keeping the lights on) to Strategy. Most organizations estimate that their resources spend about 80% of their time on KTLO and only 20% on strategic efforts to benefit the business. By improving your CMDB and getting accurate Business Service Maps in place, that ratio can be shifted to something more reasonable, like 60:40. Can you imagine if you were able to free up resources to work on project work instead of doing Incident and Problem management? Would be nice, right?
Building business and service maps manually is not easy. I spoke with one prospect company’s DevOps group, and they had estimated that it takes 50-100+ hours per application or business service to get an accurate map built out. And then, it’s out of date as soon as a change is made! So, whichever level of maturity you’re at, you always have to keep in mind the question “how do I keep everything updated and current?” Virtual or cloud-based workloads make this even harder since they increase the rate of change in the environment, leaving traditional methods grossly inadequate for keeping up.
In my new role at ServiceNow, I get to see this in action all the time with our Discovery and ServiceMapping (formerly ServiceWatch) products. There are a few things I really like about them: the flexibility in deployment (agentless, running through a MID server), the configurability (easy to modify patterns, debug them, build custom ones where needed, etc.), and the accuracy of the data. Now, to be fair, I worked many years with ADDM (BMC Software product) and the BMC Atrium CMDB, both of which are good, mature products. That being said, they are separate platforms tied together with an integration. They require different skill sets, infrastructure, upgrade paths, etc. That is a burden on an organization! When I look at the ServiceNow offering, I see a reduction in that overhead… less infrastructure cost (it’s a SaaS solution, so the only pieces on premise are the MID servers, which are Java processes), less cost in training and support (it’s a single platform), and no struggles with maintaining an integration since it’s all on one single platform!
Anyway, I just finished a site visit to a customer that has a very mature Business and Application Mapping process. That’s what got me going on this line of thinking. It reminded me of just how common a struggle this is for organizations to get a grasp on. However, once they do, the benefits are enormous! How many of you have participated in SWAT calls? (Critical Incident calls, where support and operations are trying to figure out what is going on for one or more critical incidents.) I know I’ve been on them. How costly are those calls to an organization? I’ve seen calls with multiple Directors and VPs, not to mention all the technical people from all sorts of teams (Networking, AppDev, Security, DBAs, System Admins, etc.). A single call could be costing a company $25,000-$50,000 or more! That’s no joke! And what is happening on the call? People are trying to figure out what’s broken… who owns it… who is going to fix it… when can it be fixed… how to communicate all of this to the business… etc.
How does ServiceMapping help? Well, when an alert or call comes in that is related to a given CI, the system will automatically make the agent aware of the application or business service impact and even let them know the correct teams and people to get involved. In our example, the Help Desk may get a call stating that the web site is down. When the agent logs the ticket, they can look at the service map and see that the upstream firewall is listed as offline (in a critical state), so instead of routing the ticket to the web application team, they can properly route it to the networking team for resolution, and escalate immediately if the business impact is high enough. No call needed, no finger pointing, etc. Work just gets done.
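The routing decision in that example can be sketched in a few lines of Python. To be clear, this is not ServiceNow code or its API; the `UPSTREAM`, `STATUS`, and `OWNER` tables and both functions are hypothetical stand-ins for what a service map gives you, using the firewall, load balancer, and web servers from the earlier list.

```python
# Hypothetical slice of a service map: each CI -> the CIs directly upstream of it.
UPSTREAM = {
    "lb-1": ["firewall"],
    "web-1": ["lb-1"],
    "web-2": ["lb-1"],
}

# Current alert state; any CI not listed is treated as healthy.
STATUS = {"firewall": "critical"}

# Which team owns which CI (made-up assignments).
OWNER = {
    "firewall": "Networking",
    "lb-1": "Networking",
    "web-1": "Web Application",
    "web-2": "Web Application",
}

def upstream_of(ci: str) -> list[str]:
    """Return every CI upstream of ci, nearest first (breadth-first walk)."""
    seen: list[str] = []
    queue = list(UPSTREAM.get(ci, []))
    while queue:
        node = queue.pop(0)
        if node not in seen:
            seen.append(node)
            queue.extend(UPSTREAM.get(node, []))
    return seen

def route_ticket(reported_ci: str) -> tuple[str, str]:
    """Pick the furthest-upstream critical CI, else the reported CI, and its owner."""
    critical = [ci for ci in upstream_of(reported_ci) if STATUS.get(ci) == "critical"]
    culprit = critical[-1] if critical else reported_ci
    return culprit, OWNER[culprit]

# The caller says "the web site is down", so the ticket is logged against web-1.
print(route_ticket("web-1"))
```

With the firewall marked critical, the ticket goes to Networking rather than the web application team; if nothing upstream is in a bad state, it stays with the owner of the CI that was reported.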
Anyway, that’s the end of my Friday afternoon ramble. I hope someone out there finds the information helpful! If you want to learn more about how ServiceNow approaches this, there is a series of videos posted on YouTube that can give you a great overview. Here are a couple of them: