Q. 203 How do you tackle a situation (as commonly found in IT Infrastructure) where you find that large volumes of IT Tickets point to different isolated root causes most of the time? Do you think that fixing issues quickly as they occur is the best way to handle such a situation? What other similar situations have you faced?

Note for website visitors: Two questions are asked every week on this platform, one on Tuesday and the other on Friday.

All questions so far can be seen here: https://www.benchmarksixsigma.com/forum/lean-six-sigma-business-excellence-questions/
Please visit the forum home page at https://www.benchmarksixsigma.com/forum/ to respond to the latest question, which stays open until the next Tuesday/Friday evening as per Indian Standard Time.
The best answer is always shown at the top among responses and the author finds honorable mention in our Business Excellence dictionary at https://www.benchmarksixsigma.com/forum/business-excellence-dictionary-glossary/ along with the related term.

Failure of 80:20 Rule

The 80:20 rule states that 80% of defects are due to 20% of the causes. This rule is also known as the Pareto Principle or the law of the Vital Few (the 20% of causes are the vital few that need to be focused upon).
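To make the rule concrete, here is a minimal Python sketch of a Pareto analysis, assuming tickets have already been tagged with a root-cause label. The cause names and counts are hypothetical. It also shows the situation in the question: when tickets spread evenly over many isolated causes, the top 20% of causes explain far less than 80% of the tickets, so there is no "vital few" to focus on.

```python
def vital_few(cause_counts, cutoff=0.8):
    """Smallest set of causes (largest first) whose cumulative share reaches `cutoff`."""
    total = sum(cause_counts.values())
    vital, cumulative = [], 0
    # Sort causes from most to least frequent, as on a Pareto chart
    for cause, count in sorted(cause_counts.items(), key=lambda kv: kv[1], reverse=True):
        vital.append(cause)
        cumulative += count
        if cumulative / total >= cutoff:
            break
    return vital


def share_of_top_causes(cause_counts, fraction=0.2):
    """Share of all tickets explained by the top `fraction` of causes."""
    counts = sorted(cause_counts.values(), reverse=True)
    k = max(1, round(len(counts) * fraction))
    return sum(counts[:k]) / sum(counts)


# Hypothetical ticket counts per root cause (100 tickets over 10 causes in each case)
skewed = {"disk full": 50, "expired certificate": 30, "dns": 5, "vpn": 4,
          "password reset": 3, "printer": 3, "backup": 2, "patching": 1,
          "cabling": 1, "licence": 1}
flat = {f"isolated cause {i}": 10 for i in range(1, 11)}

for name, data in (("skewed", skewed), ("flat", flat)):
    print(f"{name}: top 20% of causes explain {share_of_top_causes(data):.0%} of tickets; "
          f"{len(vital_few(data))} of {len(data)} causes are needed to reach 80%")
```

With the skewed data, two causes cover 80% of tickets; with the flat data, eight of the ten causes are needed, which is the "failure" of the 80:20 rule that the question describes.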

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by R Rajesh on 17th October 2019.

Applause for the respondents: R Rajesh and Neeraja Killi


October 15, 2019


Solved by R Rajesh

October 17, 2019


October 16, 2019

While I didn't personally work on a similar improvement project, I did see my organisation deploy a permanent solution to this kind of IT ticketing issue. While I was working with a third-party logistics company, ensuring the availability of all systems was key, because the warehouses were spread across many countries while the key IT administration teams were located at the major offices. Even though someone from the IT admin team was always available, thanks to the team's global presence across time zones, the company adopted a process to analyse the types of tickets, the time taken to resolve them and the trend of these parameters across the company every quarter. This helped it identify the most common causes behind the major tickets, find resolutions for them and implement them immediately, which reduced the total number of IT infrastructure related tickets by a whopping 25% in the next quarter. The approach also led the company to switch to a different service provider, after it found that the root cause of the major issues was the lack of a proper SLA with the previous provider, which also did not agree to the new requirements. Eventually, IT admin related tickets declined steadily each quarter and system availability improved drastically.
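The quarterly review described above can be sketched in Python as a simple aggregation over exported ticket data. This is a minimal illustration, not the company's actual tooling: the record layout (quarter, category, hours to resolve) and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ticket export: (quarter, category, hours taken to resolve)
tickets = [
    ("2019-Q1", "network", 4.0),
    ("2019-Q1", "network", 6.0),
    ("2019-Q1", "network", 5.0),
    ("2019-Q1", "login", 1.0),
    ("2019-Q2", "network", 3.0),
    ("2019-Q2", "login", 1.0),
]


def quarterly_summary(records):
    """Map (quarter, category) -> (ticket count, mean resolution hours)."""
    grouped = defaultdict(list)
    for quarter, category, hours in records:
        grouped[(quarter, category)].append(hours)
    return {key: (len(values), mean(values)) for key, values in grouped.items()}


def top_category(summary, quarter):
    """Most frequent ticket category in a quarter: the first candidate for a fix."""
    counts = {cat: n for (q, cat), (n, _) in summary.items() if q == quarter}
    return max(counts, key=counts.get)


def volume_change_pct(records, q_from, q_to):
    """Percentage change in total ticket volume between two quarters."""
    before = sum(1 for q, _, _ in records if q == q_from)
    after = sum(1 for q, _, _ in records if q == q_to)
    return (after - before) / before * 100


summary = quarterly_summary(tickets)
print(top_category(summary, "2019-Q1"))                  # network
print(volume_change_pct(tickets, "2019-Q1", "2019-Q2"))  # -50.0
```

Running the same summary every quarter gives both the "most common occurrence" to fix next and the trend that shows whether earlier fixes are working.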

October 17, 2019

Solution

Normally, when an issue occurs in IT infrastructure, an assessment is made of the impact it can have on the business and, ultimately, on the customer. The respective teams then put in a fix accordingly.

So even though there can be multiple issues, each with a distinct or isolated root cause, each issue needs to be tackled according to its severity or priority level (decided on the basis of the business impact and the resulting effect on the business's key customers).

Some organisations classify tickets (called Incidents in IT Infrastructure projects) by Priority, and others by Severity.

Sample classification by Priority: P1, P2, P3 and P4, where P1 is Critical, P2 High, P3 Medium and P4 Low; or

Sample classification by Severity: Sev1, Sev2, Sev3 and Sev4, where Sev1 is Critical, Sev2 High, Sev3 Medium and Sev4 Low.

If the issue is critical or high, a quick fix is needed. Often this fix is a workaround, and the permanent fix may come later. If the issue is of a lower priority or severity, those incidents are pushed to 'Problem Records', which are used to provide permanent fixes. This is because the Service Level Agreement (SLA) resolution time for Sev1/Sev2 or P1/P2 is very short (minutes to a few hours), whereas for Sev3/Sev4 or P3/P4 it is usually longer (a single day to a few days).

Therefore, it is the severity/priority of an issue that decides the best way to handle it: either the issue is handled through Incident Management and addressed immediately with some sort of fix, or it is marked as a Problem Record and moved to the 'Problem Management' category for a permanent fix.
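The routing rule above can be expressed as a small Python function. It is a sketch under stated assumptions: the SLA hours are hypothetical placeholders (real targets come from each contract), and Sev1 to Sev4 are mapped one-to-one onto P1 to P4 only for illustration.

```python
from dataclasses import dataclass

# Hypothetical resolution-time SLAs in hours; real values come from each contract
RESOLUTION_SLA_HOURS = {"P1": 1, "P2": 4, "P3": 24, "P4": 72}
# Assumption: Sev1..Sev4 map one-to-one onto P1..P4
SEVERITY_TO_PRIORITY = {"Sev1": "P1", "Sev2": "P2", "Sev3": "P3", "Sev4": "P4"}


@dataclass
class Route:
    process: str    # "Incident Management" or "Problem Management"
    action: str
    sla_hours: int


def route_ticket(level: str) -> Route:
    """Decide how to handle a ticket from its priority (P1-P4) or severity (Sev1-Sev4)."""
    priority = SEVERITY_TO_PRIORITY.get(level, level)
    if priority not in RESOLUTION_SLA_HOURS:
        raise ValueError(f"unknown priority/severity: {level!r}")
    sla = RESOLUTION_SLA_HOURS[priority]
    if priority in ("P1", "P2"):
        # Critical/High: restore service fast, often with a workaround
        return Route("Incident Management", "quick fix or workaround now, permanent fix later", sla)
    # Medium/Low: push to a Problem Record for a permanent fix
    return Route("Problem Management", "raise a Problem Record for a permanent fix", sla)


for level in ("Sev1", "P2", "Sev3", "P4"):
    r = route_ticket(level)
    print(f"{level}: {r.process} (SLA {r.sla_hours} h) -> {r.action}")
```

Accepting either label in one function reflects the fact that some organisations work with Priority and others with Severity.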

However, there are a few more things that can be done:

1. Maintain a Known Error Database (KEDB), which is an industry-standard practice. Since we are talking about issues with isolated root causes, tracking them is difficult because there is no common cause to which a fix can be tied.

So what happens when an issue occurs a second time, say a month after it first occurred? Imagine that the person who provided the fix (say, a Subject Matter Expert) is not available on the day the issue recurs, for whatever reason (assume they have resigned). Whoever works on the issue may have to investigate it all over again, which takes unnecessary time. The knowledge would have been lost.

Now imagine multiple issues with different root causes, and how difficult it would be to address each one. For such scenarios, a KEDB can be used. It primarily serves as a log of issues together with their captured solutions (a workaround or a permanent fix), and it can be a customised tool, an Excel sheet or any other file.

2. When an issue keeps recurring, whether it is of high or low priority (severity), it can be tagged as a 'Problem Record' as part of Problem Management.

3. While many teams move to Problem Management only after seeing repetitive issues (reactive Problem Management), some wise teams look for patterns and trends in their projects/applications and use them for proactive Problem Management: they identify potential problems, create Problem Records for them and deliver permanent fixes in advance, thereby avoiding those potential problems (and even disasters).
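Points 1 and 2 can be combined in a minimal Python sketch: a KEDB keyed by a hypothetical "error signature", plus a counter that raises a Problem Record when the same signature recurs, whatever its priority. The class name, the signature strings and the threshold of two occurrences are all assumptions for illustration; in practice the KEDB would live in an ITSM tool or in the Excel sheet mentioned in point 1.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KnownError:
    workaround: str
    permanent_fix: Optional[str] = None


@dataclass
class ServiceDesk:
    # Hypothetical structure: error signature -> KnownError
    kedb: dict = field(default_factory=dict)
    occurrences: Counter = field(default_factory=Counter)
    problem_records: set = field(default_factory=set)
    recurrence_threshold: int = 2  # assumption: raise a Problem Record on the 2nd occurrence

    def record_fix(self, signature: str, workaround: str,
                   permanent_fix: Optional[str] = None) -> None:
        """Capture the fix in the KEDB so the knowledge survives staff turnover."""
        self.kedb[signature] = KnownError(workaround, permanent_fix)

    def log_incident(self, signature: str) -> Optional[KnownError]:
        """Log an incident and return the known fix, if any."""
        self.occurrences[signature] += 1
        # Reactive Problem Management: a recurring issue becomes a Problem Record
        if self.occurrences[signature] >= self.recurrence_threshold:
            self.problem_records.add(signature)
        return self.kedb.get(signature)


desk = ServiceDesk()
print(desk.log_incident("HTTP 500 on order page"))  # None: nothing known yet
desk.record_fix("HTTP 500 on order page", "restart the application server")
known = desk.log_incident("HTTP 500 on order page")
print(known.workaround)                             # restart the application server
print(desk.problem_records)                         # {'HTTP 500 on order page'}
```

On the second occurrence, whoever picks up the ticket gets the workaround immediately, even if the original expert has left, and the repeat is automatically flagged for a permanent fix.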

Methods such as the Kepner-Tregoe method can also help with Problem Management.

Conclusion:

IT Infrastructure projects will have different types of issues with different priority/severity levels, and each may have a different or distinct root cause. But, as we saw, each issue is dealt with according to its severity/priority. It is always good practice to maintain a KEDB so that workarounds and permanent fixes are readily available.

For instance, an error on a page may be a critical or high issue for which the workaround is, say, restarting a server (a crude example, used here just to drive home the point). This workaround can be recorded in the KEDB against that issue.

Issues can also be raised as Problem Records so that permanent fixes are provided. Such records are created reactively for recurring issues (issues that have already occurred as incidents, whether critical/high or medium/low), and they can also be created proactively. The more potential issues we find proactively and fix upfront, the better the stability of applications in production, the higher the customer satisfaction and the smaller the adverse impact on the business.
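Proactive Problem Management depends on spotting a trend before it turns into a major incident. One simple way to do that, sketched in Python below, is to compare each category's average weekly ticket volume over the latest few weeks with the weeks before. The window size, growth factor and minimum volume are hypothetical tuning parameters, and this moving-average comparison is only one of many possible trend tests.

```python
from statistics import mean


def rising_categories(weekly_counts, window=4, growth=1.5, min_recent=3):
    """Return categories whose recent weekly volume has grown by at least `growth` times.

    weekly_counts maps category -> weekly ticket counts, oldest first.
    Flagged categories are candidates for proactive Problem Records.
    """
    flagged = []
    for category, counts in weekly_counts.items():
        if len(counts) < 2 * window:
            continue  # not enough history to compare two windows
        earlier = mean(counts[-2 * window:-window])
        recent = mean(counts[-window:])
        # Treat a zero baseline as 1 so any non-zero count is not read as unlimited growth
        if recent >= min_recent and recent >= growth * max(earlier, 1):
            flagged.append(category)
    return flagged


# Hypothetical weekly counts over eight weeks
history = {
    "disk space": [1, 1, 2, 1, 3, 4, 5, 6],  # creeping up: a likely capacity problem
    "vpn": [5, 5, 5, 5, 5, 5, 5, 5],          # steady
    "printer": [0, 0, 0, 0, 1, 0, 1, 0],      # rare and low volume
}
print(rising_categories(history))  # ['disk space']
```

None of the "disk space" tickets needs to be critical for the category to be flagged; the rising trend alone is the signal to open a Problem Record before the disks actually fill up.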


October 18, 2019

Author

While Neeraja Killi has provided valuable thoughts on how such a problem can be addressed, the winner for this question is R Rajesh who has comprehensively addressed the modus operandi specifically used for IT Infrastructure. It should be noted that the suggested ideas for IT Infra are useful for several other scenarios of a similar kind.
