The Problem of CSFs

If you are unable or unwilling to appoint a Problem Manager, you are not ready for Problem Management.

That’s what I said. Or at least I think that’s what I said.

The venerable and ubiquitous Chris Dancy quoted me in January 2011 on episode 1 of the re-formed Pink Elephant Practitioner Radio podcast. He quoted me as saying “you can’t do Problem Management without a Problem Manager”. I finally listened to it last Friday.

I want to apologize to Chris. First, I apologize that I didn’t listen to his podcast earlier. I am a couple months behind on my podcast queue. Second, I apologize that I didn’t thank him personally at #PINK11 in February for the mention. I love Chris in almost every conceivable way.

I don’t fully agree with the paraphrase. I think a company can successfully implement a Problem Management process without a Problem Manager. What I really wanted to say was this: If you are unable or unwilling to appoint a Problem Manager, you probably haven’t achieved all the critical success factors you need to successfully carry out Problem Management. Unfortunately, this sentence doesn’t tweet well, so I abbreviated.

ITIL v3 Service Operation lists the critical success factors for Service Operation processes:

  1. Management Support
  2. Business Support
  3. Champions
  4. Staffing and retention
  5. Service Management training
  6. Suitable tools
  7. Validity of testing
  8. Measurement and reporting

All of these are necessary to successfully implement Problem Management. Organizations that lack any of these factors won’t appoint a Problem Manager. My advice to organizations, then, is very simple: appoint a Problem Manager. If they cannot do this, they are not ready for Problem Management.

In fairness, a few organizations do meet all the above CSFs yet choose to implement Problem Management without a centralized point of contact; instead, each manager is responsible for performing Problem Management activities within his or her own group. Organizations with the right culture can get away with this. Most organizations cannot.

For that matter, most organizations cannot muster the courage or resources to appoint a Problem Manager.

Rethinking the CMS

I first started this post in response to the IT Skeptic’s “CMDB is crazy talk” post from about two weeks ago. My own position derives from several real-world observations:

  1. Few organizations are willing to perform a full spectrum CMDB implementation, due to cost constraints. (In my opinion few organizations actually require this.)
  2. Observation #1 even includes those organizations that have purchased a commercial CMDB software package. They purchased the software to achieve a more specific objective.
  3. And yet most organizations perform some level of configuration management, even if that is as simple as tracking software licenses on a spreadsheet.
  4. Most organizations would benefit from doing a little more than they are currently doing. The “what” depends on the organization.

The ITSM community needs to stop thinking about the CMS/CMDB as a process (with defined inputs and outputs), a thing, or a database. Instead, we can think of it as a broad collection of activities that maintain information and documentation about the configuration of assets and software inside an organization. This isn’t a black-or-white implementation where you either do it all or do none of it; most organizations span the spectrum of gray.

The trouble with ITIL (as if there were only one) is that the concept of a CMS is so abstract that most people cannot understand it. This is by design; it saves the authors the trouble of actually doing anything. I still have trouble describing ITIL’s take on the CMS, and I have done practical implementations in a dozen organizations.

Let’s help practitioners by enumerating and describing the various CM activities that commonly take place in the real world, and by explaining the benefits and costs associated with each.

For example:

  • Automated discovery of hardware assets
  • Automated discovery of installed software assets
  • Automated discovery of network assets
  • Automated linking of software and hardware assets
  • Automated linking of hardware and network assets
  • Automatic reporting on software compliance and unused licenses
  • Linking Incidents to CI’s
  • Linking Changes to CI’s
  • Linking Problems to CI’s
  • Linking CI’s to end-users
  • Linking CI’s to end-user organizations

This list is VERY incomplete, and there is no out-of-the-box solution for any of the above. CI names, CI types, attributes, and statuses for the items above are expressed in a wide variety of ways, and each activity can be automated to a different degree.
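
To make these activities concrete, here is a minimal sketch (in Python, with invented names, types, and fields) of the CI-and-relationship model that the linking activities above boil down to. This is not any particular product’s schema:

    from dataclasses import dataclass, field

    @dataclass
    class CI:
        """A Configuration Item: any asset under configuration management."""
        ci_id: str
        ci_type: str                  # e.g. "Server", "Application", "License"
        name: str
        status: str = "Active"
        attributes: dict = field(default_factory=dict)

    @dataclass
    class Relationship:
        """A directed link between two CIs, or between a ticket and a CI."""
        source_id: str                # a CI id, or an Incident/Problem/Change id
        target_id: str
        rel_type: str                 # e.g. "runs_on", "affects", "used_by"

    ci_store = {
        "srv-001": CI("srv-001", "Server", "mail01", attributes={"os": "Linux"}),
        "app-042": CI("app-042", "Application", "Exchange"),
    }
    links = [
        Relationship("app-042", "srv-001", "runs_on"),
        Relationship("INC-1234", "app-042", "affects"),   # Incident linked to a CI
    ]

    def cis_affected_by(ticket_id):
        """'Linking Incidents to CIs' reduces to recording relationships."""
        return [ci_store[r.target_id] for r in links
                if r.source_id == ticket_id and r.target_id in ci_store]

    print([ci.name for ci in cis_affected_by("INC-1234")])   # ['Exchange']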

By making a checklist, we can help practitioners and organizations understand what they can do, what other organizations do, and what they should consider in the future. It would be a list of available options rather than a document of the One True Way dictated from on high. We can expand on the IT Skeptic’s checklist concept.

A checklist of practical activities could also feed into a maturity assessment.

We can call it the Management of the Configuration of the Configuration Management Database Database. Or we can call it WikiCMDB for short. Stay tuned here for more details.

I am thinking out loud, so everything in this post may be wrong. I welcome your feedback.

OBASHI and the CMDB

In September 2010 APMG unveiled its new OBASHI Certification, based on the OBASHI methodology developed in 2001 in the oil & gas industry. I won’t go into detail here, but there is at least one book available from APMG-Business Books, though apparently not on Amazon, and certainly not in Kindle format. There is also an OBASHI Explained white paper. Confession: I haven’t yet read the book.

I have only a first impression, and it is this: OBASHI looks a lot like the CMDB analysis I have done several times in the past. Here is a CMDB framework I have commonly seen used in industry.

At the top, you can imagine, are the products your company offers to its customers. Those products are provided by Users (of Services provided by IT), which may be individual users, departments, divisions, teams, and so on. The Users use services, which may be provided by other services or by applications. Applications run on servers and computers, which are connected by networks.

That sounds obvious, but I have found that some people find it a bit abstract until they start laying out practical examples. The important thing to remember is that the objects (rectangles on a diagram) are connected by arrows. In CMDB parlance, the objects are Configuration Items (CI’s) and the arrows are relationships. I typically label the diagrams.
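
As a toy example, the layered model can be expressed as labeled relationships between CIs, with a simple walk answering "what does this product ultimately depend on?" Every name below is invented for illustration:

    # Each edge reads: (source CI, relationship label, target CI). The chain
    # mirrors the layers: Product -> User -> Service -> Application -> Server
    # -> Network.
    edges = [
        ("Online Store",    "provided by",  "Sales Dept"),
        ("Sales Dept",      "uses",         "Order Service"),
        ("Order Service",   "provided by",  "ERP Application"),
        ("ERP Application", "runs on",      "srv-erp-01"),
        ("srv-erp-01",      "connected by", "LAN-Frankfurt"),
    ]

    def dependencies(ci):
        """Follow the arrows downward, collecting everything the CI rests on."""
        found = []
        for src, label, dst in edges:
            if src == ci:
                found.append((label, dst))
                found.extend(dependencies(dst))
        return found

    for label, dst in dependencies("Online Store"):
        print(label, dst)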

The OBASHI framework seems to use the same concepts. When modeling a CMDB I usually allowed the customer a little more flexibility in CI Types and Relationships, depending on their requirements; OBASHI seems a little more rigid in its use of components and data flows.

At first I wondered what the purpose of OBASHI was, but after further thought I like it. Although it describes data flows, it is easy to envision it describing other flows, such as process flows. It is a framework for analysis that effectively communicates complex systems. It doesn’t require the full implementation of an expensive CMDB to achieve its benefits, and the information collected can readily be reused in a later CMDB implementation.

The Problem of Revealed Preferences

Consumers express their rational interests through their behaviors. Economists call these revealed preferences.

IT Service Management trainers and consultants tell other companies how they should run their businesses, based on industry best practice frameworks. We seldom examine revealed preferences, but perhaps we should.

One of my first engagements as an ITSM consultant involved planning and implementing a Problem Management process at an organization that had committed to widespread ITIL adoption. For several years afterward I was an acolyte of Problem Management.

I have implemented Problem Management at a dozen customers and tried to sell it to even more. Among the latter, most resisted. The excuses often included “we are too busy putting out fires”, “we aren’t ready for that”, and “that’s not our culture”. Perhaps I wasn’t selling it well enough.

Most organizations do ad hoc Problem Management, but few have defined processes. There is probably some legitimacy to their reasons.

Organizations do need to be in control of their Incident Management process before they are ready for Problem Management. They need to be in control of putting out fires and fulfilling service requests. Most organizations, even those in control, find themselves with a backlog, and that is all right. A reporting history is a prerequisite.

Organizations must also be in control of their Changes. The resolution of known errors takes place through Change Management, and organizations in control of their Changes are better able to prioritize the resolution of their known errors. In any case, at most organizations an effective Change Management process is more beneficial than Problem Management.

I usually told customers they were ready for Problem Management only if they could devote at minimum one-quarter to one-half of an FTE to the Problem Manager role, and this person would need to have a good overview of the architecture or be well-connected throughout the organization.

In other words, the Problem Manager should be reasonably senior and independent of the Service Desk Manager. Without this the Problems will be starved of resources. Someone needs to liaise with senior management, ascertain business pain points, prioritize tasks, acquire resources, and hold people accountable. In other words, Problem Management requires commitment at senior levels, and it isn’t clear all organizations have this. Many don’t.

There is another, more important reason. Organizations that are focused on executing strategic projects won’t have the resources to execute Problem Management processes consistently. I have seen this occur for several reasons. In one instance, the organization had acquired another company and had dozens of projects to deliver on a tight deadline. In another, the reverse was occurring: the organization was building functions to replace those provided by a former parent company. In yet another, the organization simply put a lot of focus on executing projects in support of its new strategy.

Some organizations plainly require Problem Management processes. I have seen more rigorous adherence among organizations that provide IT services to other technical organizations, such as data center outsourcers, hosting providers, or providers of services like video distribution or mapping to telcos. When services are interrupted, these customers demand root cause analyses.

But it is probably true that many organizations don’t need, or aren’t ready for, Problem Management. Problem Management is an investment, and like all investments it should deliver returns that exceed not only its own costs but also the benefits of similar efforts the organization might undertake instead. So it has been revealed.

Implementing a Service Desk Application, Part 1

In this multi-part series on planning and implementing a Service Desk application, I start by identifying the characteristics of these tools.

Tool Characteristics

Service Desk applications usually support, at a minimum, the ITIL® V3 processes of Service Request Fulfillment, Incident Management, Problem Management, and Change Management. They frequently also support Release Management, Service Catalog Management, and Service Asset and Configuration Management. They less frequently support Capacity Management, Availability Management, and/or Asset Management, but they may support integrations with other products in these areas.

In a nutshell, a Service Desk application is a mechanism for tracking tickets. I am sometimes asked what Service Desk tools do that cannot be done in an open-source bug-tracking application like Bugzilla. Without commenting specifically on Bugzilla or other products, Service Desks usually have a few requirements those tools don’t support. In general, Service Desks require support for multiple processes, each with its own fields, statuses, priorities, workflows, and approvals, and they require integration of ticket flows between those processes. For example, an Incident can initiate a Problem, which can initiate a Change, and the Incident should be held Pending until the Change is complete.
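
Here is a minimal sketch of that cross-process flow in Python; the statuses and linkage are illustrative rather than any particular product’s data model:

    from dataclasses import dataclass, field

    @dataclass
    class Ticket:
        ticket_id: str
        process: str                          # "Incident", "Problem", or "Change"
        status: str = "Open"
        spawned: list = field(default_factory=list)

    def escalate(parent, to_process, new_id):
        """An Incident can initiate a Problem, which can initiate a Change."""
        child = Ticket(new_id, to_process)
        parent.spawned.append(child)
        parent.status = "Pending"             # parent waits on the downstream ticket
        return child

    def close(ticket, parent=None):
        ticket.status = "Closed"
        if parent and all(c.status == "Closed" for c in parent.spawned):
            parent.status = "Open"            # parent may resume toward resolution

    inc = Ticket("INC-1001", "Incident")
    prb = escalate(inc, "Problem", "PRB-2001")
    chg = escalate(prb, "Change", "CHG-3001")
    close(chg, parent=prb)                    # Change complete; Problem resumes
    print(inc.status, prb.status)             # Pending Open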

Service Desk applications usually provide some capability to develop workflows. They are not true workflow engines, but they provide lightweight workflow development consistent with the needs of a Service Desk. They usually provide configurable priorities and statuses, often configurable fields, and usually one or more drop-down dependency trees for categorizing tickets. There are usually workflow rules governing, for example, how long a ticket can spend in a particular status and what should happen when that time is exceeded. They usually provide approval workflows as well.
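
For instance, a status-timeout rule might look like the following sketch, with purely illustrative thresholds:

    from datetime import datetime, timedelta

    # Illustrative limits on how long a ticket may sit in each status.
    STATUS_LIMITS = {
        "Assigned":    timedelta(hours=4),
        "In Progress": timedelta(days=2),
        "Pending":     timedelta(days=14),
    }

    def overdue_action(status, entered_at, now):
        """Return an escalation action if the ticket has overstayed its status."""
        limit = STATUS_LIMITS.get(status)
        if limit and now - entered_at > limit:
            return f"notify manager: ticket stuck in '{status}' beyond {limit}"
        return None

    print(overdue_action("Assigned",
                         datetime(2011, 5, 2, 8, 0),
                         datetime(2011, 5, 2, 14, 30)))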

Service Desk applications also need to provide reporting capabilities, which come in a variety of forms. Some provide built-in reports; some provide built-in report configuration; others provide templates for 3rd-party tools such as Crystal Reports or Microsoft SQL Server Reporting Services. Reporting may also include exporting ticket information for consumption, manipulation, or presentation in a tool like Microsoft Excel or Microsoft Access.
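
The export path is usually the simplest to script. A sketch, with invented field names standing in for an export from the Service Desk database:

    import csv

    tickets = [
        {"id": "INC-1001", "priority": "P2", "status": "Closed", "hours_open": 6.5},
        {"id": "INC-1002", "priority": "P1", "status": "Open",   "hours_open": 1.2},
    ]

    with open("ticket_report.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "priority", "status", "hours_open"])
        writer.writeheader()
        writer.writerows(tickets)   # openable directly in Excel or Access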

It is also important to have methods of interfacing the Service Desk application with other systems. To automate the population of customer information in the ticket, the Service Desk tool needs to import data from a corporate repository such as Active Directory or an HR database. Ideally this data is retrieved dynamically, in real time, when the ticket is created, but some tools replicate the data into their own data store. The most common method is to interface with Active Directory via the LDAP protocol. Service Desk systems might also provide the capability to import other data into tickets, such as asset data from a 3rd-party SQL database. Some tools allow drop-down or multi-select field choices to be imported from a data table. If the application includes a CMDB, it should also provide methods to import or federate data from multiple 3rd-party data sources.
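
For the common LDAP case, here is a sketch using the Python ldap3 library; the directory host, service account, and base DN are placeholders, not a recommendation of any particular setup:

    from ldap3 import Server, Connection, ALL

    # Placeholder connection details; a real deployment would use its own
    # directory host, service account, and base DN.
    server = Server("ldap://dc01.example.com", get_info=ALL)
    conn = Connection(server, user="EXAMPLE\\svc-servicedesk",
                      password="secret", auto_bind=True)

    def lookup_caller(sam_account):
        """Populate ticket customer fields from Active Directory."""
        conn.search("dc=example,dc=com",
                    f"(sAMAccountName={sam_account})",
                    attributes=["displayName", "mail", "department", "telephoneNumber"])
        if not conn.entries:
            return {}
        e = conn.entries[0]
        return {"name": str(e.displayName), "email": str(e.mail),
                "department": str(e.department), "phone": str(e.telephoneNumber)}

    print(lookup_caller("jdoe"))
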
Some Service Desk systems use a “fat client” that is installed on each machine. The issue with this is having to install the client on every machine that will be used by the Service Desk or its customers. Other Service Desk applications support user interaction via a web browser, so there is nothing to deploy on the clients. Still others support a hybrid model, whereby limited or customer-facing interfaces are provided via the web while assignee and/or administration functions are provided via the fat client.

Most tools provide email interfaces through which agents or customers can create or update tickets, and through which ticket updates can be sent to agents and/or customers. In addition, many tools provide API’s and/or web services interfaces for programmatic updates to tickets. Finally, most tools provide role-based access. For example, customers have fewer rights than agents, and depending on the role and the ticket status, some people can approve tickets or enter certain information.
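
A sketch of role-based rights, with invented roles and actions:

    # Illustrative role-based rights: who may perform which ticket actions.
    ROLE_RIGHTS = {
        "customer": {"create", "comment", "view_own"},
        "agent":    {"create", "comment", "view_all", "assign", "resolve"},
        "approver": {"view_all", "approve"},
    }

    def allowed(role, action, status):
        """Some rights also depend on ticket status, e.g. approvals."""
        if action == "approve" and status != "Awaiting Approval":
            return False
        return action in ROLE_RIGHTS.get(role, set())

    print(allowed("approver", "approve", "Awaiting Approval"))  # True
    print(allowed("customer", "assign", "Open"))                # False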

Next week I will outline some steps to prepare and plan for the selection of a tool.

Are Standard Changes “Good”?

ITIL practitioners commonly assume that standard changes are good, and that a high percentage of standard changes vs. normal changes indicates a mature environment. Is this maturity “good” by definition with respect to Change Management? The answer is mostly yes, but not always.

Let’s look at some ITIL definitions. ITIL defines a change as:

The addition, modification, or removal of an authorized, planned, or supported service or service component, and its associated documentation.

This ITIL definition is necessarily broad and not very actionable. Every company must define for itself what is and is not a change. For example, one customer of mine defined their Change Management scope around the financial systems necessary to achieve SOX compliance. They named specific servers, workstations, and applications in their Change Management policy. Although their definition was narrower than most companies’, it nevertheless worked for them.

The execution of an individual Change typically involves the following steps (in varying orders):

  • Recording the Change, along with some basic information about it.
  • Reviewing the Change, including filtering out Changes that are useless, ill-conceived, or otherwise inappropriate.
  • Assessing the Change, including analysis of risk, effort, and resources, and possibly a more detailed feasibility analysis.
  • Authorizing the Change.
  • Implementing the Change.
  • Reviewing the Change after implementation.

The most important differences between normal and standard changes are in the assessment and implementation. ITIL defines a standard change as:

A change to a service or infrastructure for which the approach is pre-authorized by Change Management and that has an accepted and established procedure to provide a specific change requirement.

This definition is strong but incomplete. Standard changes are typically low risk, low impact, highly repeatable, well-documented, and well-understood. Standard changes are pre-authorized by the change board or relevant authorization parties.
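
To make the difference concrete, here is a minimal sketch of the two lifecycles side by side; the state names are illustrative:

    # Illustrative lifecycles: a standard change skips per-change assessment
    # and authorization because the change type is pre-authorized.
    NORMAL_FLOW = ["Recorded", "Reviewed", "Assessed", "Authorized",
                   "Implemented", "Post-Implementation Review"]
    STANDARD_FLOW = ["Recorded", "Implemented", "Post-Implementation Review"]

    def next_state(current, standard=False):
        flow = STANDARD_FLOW if standard else NORMAL_FLOW
        i = flow.index(current)
        return flow[i + 1] if i + 1 < len(flow) else None

    print(next_state("Recorded"))                  # Reviewed
    print(next_state("Recorded", standard=True))   # Implemented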

Typically, when an organization initiates a formal Change Management process, it will have few or no standard (pre-authorized) changes. As the organization learns and grows, it will identify changes that occur frequently and are low risk and low impact, for which further authorization of each individual change would merely burden the change board.

Such changes are candidates to become standard changes. Somebody, typically the person who implements these changes, can raise a change for the purpose of pre-authorizing a specific type of change. If it is authorized through the Change Management process, a new standard change is born. As a consultant, I recommended that these standard changes become quick change templates in the ticketing system.
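
Such a template might look something like the following sketch; every field here is invented:

    # An invented quick-change template: once the change type is
    # pre-authorized, each instance is raised pre-filled and skips the CAB.
    STANDARD_CHANGE_TEMPLATES = {
        "desktop-ram-upgrade": {
            "risk": "Low",
            "impact": "Single user",
            "pre_authorized": True,
            "implementation_plan": [
                "Confirm maintenance window with user",
                "Power down workstation and install RAM",
                "Boot, verify memory, close the change record",
            ],
        },
    }

    def raise_standard_change(template_name, assignee):
        tpl = STANDARD_CHANGE_TEMPLATES[template_name]
        return {"type": "Standard", "assignee": assignee, **tpl}

    print(raise_standard_change("desktop-ram-upgrade", "jdoe")["risk"])  # Low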

In general, the longer an organization has been doing Change Management, the more standard changes it will have defined, and the higher the percentage of all changes that will be standard. This is a good thing, and in general the process of defining the implementation plan (or change model) for standard changes helps build repeatability, organizational learning, and maturity.

Is a high percentage of standard changes ever a bad thing? In general, no, unless the organization is so focused on measuring the percentage of standard changes that technicians stop recording normal changes. This is bad, but rare in practice (at least for this reason).

Standard changes can be difficult to achieve in organizations that are very dynamic or growing rapidly. In these environments it can be difficult to identify classes of normal changes that are candidates for pre-authorization. Growth consumes cash, and growth consumes resources. Highly dynamic environments may simply not have the people and time required to define standard changes, despite the benefits in more static organizations. Standard changes indicate operational efficiency, and not all organizations define their strategic competitive advantage around operational efficiency.

Not a Review: SCSM 2010

At heart I am a hands-on geek. When I learned that Microsoft had finally released its System Center Service Manager 2010, I was excited to try it. I logged into my partner account and signed up for the 90-day trial.

Then reality set in. As so often happens with Microsoft’s business applications, initial enthusiasm gave way to confusion, then to despair. Let’s look at the requirements. Microsoft defines the minimum server roles:

Server 1: Service Manager management server + Service Manager database + Service Manager console

Server 2: Service Manager data warehouse server + Service Manager data warehouse databases

Nominally there is a SQL Server 2008 instance running on each server, but Microsoft kindly allows us to use a single instance:

Server 1: Service Manager management server + Service Manager database + Service Manager console

Server 2: Service Manager data warehouse server

The point is that SCSM doesn’t allow the data warehouse server to run on the same machine as the management server. Now let’s look at the hardware requirements:

  • Service Manager database: Dual Quad-Core 2.66 GHz CPU, 8 GB of RAM, 80 GB of available disk space, RAID Level 1 or Level 10 drive
  • Service Manager management server: Dual Quad-Core 2.66 GHz CPU, 8 GB of RAM, 10 GB of available disk space
  • Service Manager console: Dual-Core 2.0 GHz CPU, 2 GB of RAM, 10 GB of available disk space
  • Data warehouse management server: Dual-Core 2.66 GHz CPU, 8 GB of RAM, 10 GB of available disk space
  • Data warehouse databases: Dual Quad-Core 2.66 GHz CPU, 8 GB of RAM, 400 GB of available disk space
  • Self-Service Portal: Dual-Core 2.66 GHz CPU, 8 GB of RAM, 10 GB of available disk space

These hardware requirements are far higher than those of many competing products, but still modest for an enterprise. On top of that, you also need a domain controller. For all this you get capabilities that are even more modest:

  • Up to 20,000 users, with up to 40–50 IT analysts providing concurrent support.
  • Up to 20,000 supported computers, assuming 10 to 12 configuration items (installed software, software updates, and hardware components) per computer.
  • 5,000 incidents per week with 3 months of retention, for a total of 60,000 incidents in the Service Manager database for the 20,000-computer configuration, and 2.5 times that for the 50,000-computer configuration.
  • 1,000 change requests per week with 3 months of retention, for a total of 12,000 change requests in the Service Manager database for the 20,000-computer configuration, and 2.5 times that for the 50,000-computer configuration.

What perplexes me the most, in this day and age of Web 2.0, is Microsoft’s decision to use a “fat client”, aka the Service Manager console.

To be fair, Microsoft is probably making serious attempts to use the product in-house (a practice Microsoft calls “eating your own dog food”). In fact, they made a video of one of these attempts. What I saw in the video was a CMDB with a little ticketing and assignment routing strapped on the front end, plus some customizations required for notifications. It was an honest video, but not an impressive one, and certainly not an inspirational endorsement of a weak first release of the product.

At the very least I can say that SCSM 2010 is a solid attempt to sell other Microsoft products (Windows Server, SQL Server). Microsoft will probably get SCSM right sooner or later. They can always be counted on to do that, but only after they have exhausted all other possibilities.

W00t: The ITSM podcasts are back

I posted almost two years ago about the dearth of ITSM podcasts. The IT Skeptic blog is alive and well, but the podcasts are dead.

Fortunately, a new batch of podcasts has arisen. Three that appear to be alive and well are:

I think the first organization needs no introduction: Connect, Learn, Grow! The itSMF USA Podcast

A second is ITSM Weekly The Podcast, from the folks at ServiceSphere.

And last, but not least, the ITSM Manager: the IT Service Management Podcast, from ITSM Manager.

All are linked from the iTunes Store.

Changing Priority

Question: An Incident meets the criteria for P1. However, midway through resolution the impact has changed to that of P2. How should we treat the Incident now? How should we measure the SLA, based on P1 or P2?

Answer: I don’t believe ITIL provides much specific advice about this condition. How you want to handle this is really up to your organization and, more specifically, your Incident Policy.

In general, organizations prioritize Incidents based on Impact (i.e., how many people or systems are affected) and Urgency (i.e., how long the organization can function with that service down or degraded).
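
The familiar matrix can be sketched directly; the axis values and resulting priorities below are illustrative and vary by organization:

    # A common 3x3 Impact x Urgency priority matrix (illustrative values).
    PRIORITY_MATRIX = {
        ("High",   "High"):   "P1",
        ("High",   "Medium"): "P2",
        ("Medium", "High"):   "P2",
        ("High",   "Low"):    "P3",
        ("Medium", "Medium"): "P3",
        ("Low",    "High"):   "P3",
        ("Medium", "Low"):    "P4",
        ("Low",    "Medium"): "P4",
        ("Low",    "Low"):    "P5",
    }

    def priority(impact, urgency):
        return PRIORITY_MATRIX[(impact, urgency)]

    print(priority("High", "High"))  # P1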

Was the original assessment of Impact and Urgency incorrect? If so, you should change the prioritization.

Did the impact change because you applied a fix or workaround? If so, you should not change the prioritization.

Your SLA’s usually measure to full restoration. You could also measure to the point a workaround is provided. Your SLA’s won’t usually include fractional measurements based on changes in prioritization. For that matter, most organizations don’t even have formally agreed SLA’s with their customers.

Exploring the Role of Steering Committees in Realizing Value From Project Management

Abstract: The impact of steering committees on project performance and their role in creating value from project management capabilities is not well understood. A case study analysis was chosen to analyze the configurations and specific functions of project steering committees. A measurement model for steering committee configurations was developed to enable further survey-based studies. One of the major insights resulting from the authors’ interviews with project managers and senior managers was that they perceived the existence of a project steering committee only when the context was defined and clarified. Furthermore, a large variety of committee involvements was identified, concluding that steering committees per se are very rare. On the project level, the cases clearly demonstrate that committees with project steering functions play an important role in the selection, initiation, definition, and control of projects. On the organizational level, they are important to implement and maintain project management standards. Finally, the results clearly indicate that steering committees directly support project success and are instrumental for attaining value from an organization’s investments in its project management system.

Reference: Project Management Journal, Vol. 40, No. 1, 42-54.

Authors: Thomas G. Lechler, Stevens Institute of Technology; Martin Cohen, Stevens Institute of Technology

Link: http://www3.interscience.wiley.com/journal/122217097/abstract

Comments: My first reaction was to wonder what relevance this research has to Change Advisory Boards in a Change Management process. Beyond the superficial resemblance, however, they are not very similar. Projects, and hence project steering committees, are more strategic. Project steering committees consist mostly of executive management; the CAB may include representatives of management, users, partners, and vendors, among others.

What surprised me the most about the research is how little work has been done in this area. Lechler and Cohen sought to answer four questions about project steering committees: 1) what are their various configurations and responsibilities, 2) which project decisions do they make, 3) how are they organized, and 4) do they increase project management value? To do this they examined four companies in depth, as part of a larger PMI-sponsored study.

They found wide variety in steering committee composition, which is consistent with my observations of CABs in organizations implementing Change Management. In general, they found that steering committees are perceived to improve project management performance with minimal detrimental effects, such as political infighting or delayed decision-making. I wonder what a study of project-oriented organizations without a structured steering committee would have found: would they perceive that the lack of one hinders project management success?