Answers from Arunesh Ramalingam - Benchmark Six Sigma Forum

Arunesh Ramalingam's post in Sigma Level was marked as the answer November 9, 2017

In any organization, processes are generally designed for the long term, but most of the time only short-term process data is available.
Also, short-term data mostly contains only common cause variation, while long-term data may contain both common cause and special cause variation.
Collecting data for a long-term Sigma level calculation would be very difficult, as it would have to come from several lots, many shifts, many machines and operators, and so on.
A reliable estimate of the long-term performance of the process can instead be made by expressing the variability expected over the long term as a function of the short-term variability.
This approximation of the drift of the process mean over the long term can be used along with the short-term Sigma level to calculate the long-term Sigma level.

Why 1.5 Sigma Shift?
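As indicated in [1], the normal probability distribution predicts 3.4 defects per million opportunities (DPMO) for a Sigma Level of 4.5 and 0.002 DPMO for a Sigma Level of 6. (The 3.4 DPMO figure is the one-sided tail beyond 4.5 sigma; the 0.002 DPMO figure counts both tails beyond 6 sigma.)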

Motorola determined, through studies of process data collected over years, that the process average (mean) is likely to shift over the long term by +/- 1.5 sigma.
The worst-case performance of a process in the long term can be estimated by shifting the short-term process mean by +/- 1.5 sigma and then estimating the fraction of defects, i.e. output not conforming to the specifications. For a process with a short-term Sigma Level of 6, this shift leaves 4.5 sigma to the nearer specification limit, which gives the familiar 3.4 DPMO. If the business can accept that level of defects, then the process can be considered capable. This is illustrated in [2].
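
The arithmetic behind the shift can be checked with a short sketch. It assumes a normally distributed output with symmetric specification limits; the function names `tail_probability` and `dpmo` are illustrative and do not come from the references.

```python
import math


def tail_probability(z: float) -> float:
    """Area of the standard normal distribution above z (upper tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def dpmo(short_term_sigma_level: float, shift: float = 1.5) -> float:
    """Worst-case long-term DPMO when the mean drifts `shift` sigma toward one limit.

    Assumption: specification limits sit at +/- short_term_sigma_level
    around the unshifted mean.
    """
    near_limit = tail_probability(short_term_sigma_level - shift)
    far_limit = tail_probability(short_term_sigma_level + shift)
    return (near_limit + far_limit) * 1_000_000


print(round(dpmo(6.0), 1))             # 3.4 (mean shifted by 1.5 sigma)
print(round(dpmo(6.0, shift=0.0), 4))  # 0.002 (no shift, both tails)
```

With the shift applied, almost all defects come from the limit the mean has moved toward, which is why the 3.4 DPMO figure matches a one-sided 4.5 sigma tail.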

References:
[1] http://www.six-sigma-material.com/Tables.html
[2] https://www.dmaictools.com/what-is-six-sigma/the-15-sigma-shift

Arunesh Ramalingam's post in Kano Model was marked as the answer October 17, 2017

Kano Model Summary:

The KANO Model is a tool used to identify 5 categories of product features/services from a customer's perspective, to enable manufacturers and service providers to be competitive in the market.

| KANO Model Feature Categories | Presence of the feature (VOC*) | Absence of the feature (VOC*) | Examples |
| Basic | Neutral | Dissatisfaction | 24hr. hot water supply at hotels |
| Performance | Satisfaction | Dissatisfaction | High battery life in mobiles |
| Excitement | Extra Satisfaction | Neutral | Welcome drinks / complimentary chocolates at hotel check-in |
| Indifferent | Neutral | Neutral | Material used in packing juices or milk, if the packets are durable and do not leak |
| Reverse | Dissatisfaction | Satisfaction | Annoying pop-up help features in some software |

(*VOC - Voice of Customer)
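
The five categories above amount to a lookup from the customer's reaction when the feature is present and when it is absent. A minimal sketch, assuming reactions are recorded with exactly the labels used above; `KANO_CATEGORIES` and `classify_feature` are hypothetical names chosen for illustration.

```python
# Map (reaction when present, reaction when absent) to a Kano category.
KANO_CATEGORIES = {
    ("Neutral", "Dissatisfaction"): "Basic",
    ("Satisfaction", "Dissatisfaction"): "Performance",
    ("Extra Satisfaction", "Neutral"): "Excitement",
    ("Neutral", "Neutral"): "Indifferent",
    ("Dissatisfaction", "Satisfaction"): "Reverse",
}


def classify_feature(presence: str, absence: str) -> str:
    """Return the Kano category, or "Unclassified" for an unlisted combination."""
    return KANO_CATEGORIES.get((presence, absence), "Unclassified")


# Hot water at a hotel: nobody is delighted by it, but everyone misses it.
print(classify_feature("Neutral", "Dissatisfaction"))  # Basic
```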

One important point to keep in mind is that over time, as customers get used to an Excitement feature, the feature becomes more of an expectation and moves to become a Basic feature.
In other words, a feature which was earlier not even expected becomes a “must-have”. Earlier its absence would have gone unnoticed, but now its absence causes dissatisfaction among the customers.
Example: power steering in cars, camera feature in mobile phones.

What would be your approach for putting these needs to good use?

First, I would work on developing the identified Basic and Performance features/services, so that they are maintained at a level where they continue to satisfy the customers. There should NOT be any decline in these features/services.

Second, I would focus on developing the identified Excitement features. These would eventually transform into “basic must-haves”, so I would keep innovating new features/services which continue to add the WOW factor.

Third, I would work on cost optimization/cost cutting on the identified Indifferent features/services.

Fourth, and last but not least, I would take the precaution of not overwhelming the customer with product features and services. More is not always great! The product features and services should be in line with the requirements of the target customers.

Arunesh Ramalingam's post in Process Stability, Process Capability was marked as the answer October 12, 2017

Process Stability refers to the consistency with which a process stays within its Upper Control Limit (UCL) and Lower Control Limit (LCL). The outputs of any process have a mean value, and the control limits for the process are defined as follows:
Upper Control Limit (UCL) = Mean + 3 sigma
Lower Control Limit (LCL) = Mean - 3 sigma
If the process behaves consistently over time, i.e. the outputs fall within the range LCL to UCL (called the process width), then the process is said to be stable or in control.
If the outputs spread outside these limits, then the process is unstable or out of control.

Process Capability is a measure of the ability of the process to meet customer specifications. The measure tells how good each individual output is. Estimating the ppm (defective parts per million) is one way to measure process capability.
Capability indices (Cp, Cpk) are metrics used to measure process capability. They indicate how capable the process is of meeting customer requirements.
Customers provide an Upper Specification Limit (USL) and a Lower Specification Limit (LSL) within which they want the product value to lie. This range is called the tolerance or allowed variation. (E.g. a customer of a Building Management System may want the air conditioning to hold room temperature at 20 +/- 2 deg C, i.e. with USL = 22 deg C and LSL = 18 deg C.)
The air-conditioning process may have its own mean (average) temperature, say 21 deg C.
Cp = Tolerance / Process Width = (USL - LSL) / (UCL - LCL)
Cpk = min(Cpu, Cpl)
where Cpu = (USL - Mean) / (3 sigma) measures the closeness between the process mean and the USL, and Cpl = (Mean - LSL) / (3 sigma) measures the closeness between the process mean and the LSL. Cpk accounts for a change in the process mean.
A simplistic example: in an exam where the passing mark is 40 (USL = not specified; LSL = 40), if a student scores 30, 30.5, 31, 32 and 30 in 5 consecutive attempts, then we can say he is stable (consistent) but not capable.
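
To make the indices concrete, here is a small sketch for the air-conditioning example. It takes the limits implied by 20 +/- 2 deg C (LSL = 18, USL = 22) and the stated mean of 21 deg C; the standard deviation of 0.5 deg C is an assumed value, since the post does not give one, and `capability_indices` is an illustrative name.

```python
def capability_indices(mean: float, sigma: float, lsl: float, usl: float):
    """Return (Cp, Cpk) using control limits at mean +/- 3 sigma."""
    ucl = mean + 3 * sigma
    lcl = mean - 3 * sigma
    cp = (usl - lsl) / (ucl - lcl)        # tolerance / process width
    cpu = (usl - mean) / (3 * sigma)      # closeness of the mean to the USL
    cpl = (mean - lsl) / (3 * sigma)      # closeness of the mean to the LSL
    return cp, min(cpu, cpl)


# sigma=0.5 is an assumption made only for this illustration.
cp, cpk = capability_indices(mean=21.0, sigma=0.5, lsl=18.0, usl=22.0)
print(round(cp, 2), round(cpk, 2))  # 1.33 0.67
```

Cp looks acceptable because the tolerance is wider than the process width, but Cpk is much lower because the mean of 21 sits closer to the upper limit than to the lower one.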

Is Process Stability supposed to be a prerequisite for all types of processes?

Any manufacturing/production process should be stable before being released to the production environment. Many customers request that their suppliers submit process capability data to qualify the supplier process. Any estimate of process capability depends entirely on where the process happens to be when the data is collected. For an unstable process, the mean shifts about over time. So, stability is a prerequisite for such processes, as process capability can only be estimated for stable processes.

But I believe processes involved in Research and Development or Innovation are inherently unstable. These processes may not need a capability estimate, so stability is not a prerequisite for them.

Arunesh Ramalingam's post in Why Process Mapping Works in Theory - but Fails in Real Organizations was marked as the answer October 9, 2017

The process mapping techniques in increasing order of difficulty that I would recommend are as follows:

| Difficulty Level | Process Map Technique | Sub-categories within a Technique |
| Level 1 | SIPOC / SIPOC-R | |
| Level 2 | Process Mapping | Top-down Flowchart (or) High Level Process Map; Deployment Flowchart with Relationship Map; Detailed Flowchart |
| Level 3 | Swim Lanes | |
| Level 4 | Value Stream Map or Material and Information Flow Diagrams | Current State Mapping; Future State Mapping; Gap Mapping & Action List |
| Level 5 | Key Process Input/Output (KPIV/KPOV) Mapping | |

A brief explanation of each technique:
1.    SIPOC/SIPOC-R: Supplier Input Process Output Customer - Requirement
A simple layout that shows what the process accomplishes while identifying the key players, providing a starting point for discussion. It shows a few high-level process steps, the required inputs and their providers (Suppliers), and the process outputs and their recipients (Customers). SIPOC-R is a variation on the SIPOC in which the requirements (or specifications) for the inputs and outputs are listed.
2.    Process Mapping
a.       Top-down Flowchart (or) High Level Process Map
It is the expansion of the centre “process” from the SIPOC into six to seven more detailed boxes. It depicts the process in just a few steps, providing quick and easy insight into what the process does (what the major clusters of activity are) without getting into the details of how it is done. It is useful when communicating to leadership who do not need the details.
b.       Deployment Flowchart
The deployment chart shows both "what" the process does and "who" the people involved are. A Relationship Map may be drawn to show the participants and how materials, paper or information flow between them. The deployment flowchart is a combination of the top-down flowchart and the relationship map. It helps answer questions such as whether the right people are involved at the right time, whether there are a lot of people and hand-offs, or whether there are barriers between people who need to collaborate.
c.       Detailed Flowchart
This provides additional detail about the process: the actions in each process step, spelled out to identify redundancies and wasted effort, and the results of non-standard events. This can help in understanding a process that has built up needless complexity, but it is time consuming.
3.    Swim Lanes
A Swim Lane Map is used to better understand a process that crosses organizational or departmental boundaries. It is at the same hierarchical level as the process map. It is used to show the flow of information/material between different organizations. This type of map is essential in showing “handoffs” between organizations, thus helping to understand failure points. It should contain all communication paths, decision trees and handoffs, as these cause disconnects and failures.
4.    Value Stream Map or Material and Information Flow Diagrams
It is used to display the current state of the process, including material flows, information flows and other information. The Value Stream Map is used to better understand the value created, as well as “not created”, in each of the process steps: value added (VA), non-value added (NVA) and essential non-value added (ENVA). It includes performance metrics from the individual processes and steps.
a. Current State Mapping - “AS IS” mapping of the process.
b. Future State Mapping - mapping the future process as per strategic planning.
c. Gap Mapping & Action List - The Gap Map identifies the gap or distance in the performance metrics between the current "AS IS" state and the future planned state, and shows the course of action to be taken to improve the metrics.

5.    Key Process Input/Output (KPIV/KPOV) Mapping
Using the Process Flow Map or Swim Lane Map, the area for improvement is identified per key strategic objectives. Once the target section of the process flow is identified, a standard block flow diagram of the supporting steps in that process is created. The critical inputs (KPIV) needed for, and the critical outputs (KPOV) delivered by, the steps are documented. The KPIV would contain potential root causes of an issue identified through RCA tools such as “5 Why” analysis, Pareto diagrams, Fishbone diagrams and so on.
References:
1.   https://www.processexcellencenetwork.com/lean-six-sigma-business-transformation/articles/process-mapping-with-flowcharts
2.   https://goleansixsigma.com/6-process-maps-know-choose-right-one/

Arunesh Ramalingam's post in Cost of Poor Quality (COPQ) sounds powerful - but is it truly driving improvement, or just measuring failure? was marked as the answer September 29, 2017

Background
Cost of Quality (COQ) is a quality metric that denotes the ratio of the costs involved in guaranteeing high quality products to the total revenue generated by those products.
It can be understood as a combination of two main components:
COQ = COGQ + COPQ
COGQ is the cost of good quality, which denotes the cost incurred to prevent poor quality and assure high quality products.
COPQ is the cost of poor quality, which denotes the cost incurred through failure to produce high quality products, i.e. scrap, defects, rework and rejects.
COPQ can be broken down as follows:
COPQ = IFC + EFC
IFC is Internal Failure Costs, consisting of scrap costs, defect costs and rework costs. Some of the factors that affect IFC are weaknesses in quality resolution (CAPA/FMEA), improper resource and material planning, equipment downtime, re-engineering and re-designing.
EFC is External Failure Costs, consisting of returned product costs, warranty costs and product recall costs. Some of the factors that affect EFC are poor service management, unresolved customer complaints, environmental non-conformances and so on.
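
The two formulas can be combined in a short sketch. All monetary figures below are made-up illustration values, and the `QualityCosts` class is a hypothetical helper, not something taken from the post.

```python
from dataclasses import dataclass


@dataclass
class QualityCosts:
    cogq: float              # cost of good quality (prevention, assurance)
    internal_failure: float  # IFC: scrap, defects, rework
    external_failure: float  # EFC: returns, warranty, recalls

    @property
    def copq(self) -> float:
        """COPQ = IFC + EFC."""
        return self.internal_failure + self.external_failure

    @property
    def coq(self) -> float:
        """COQ = COGQ + COPQ."""
        return self.cogq + self.copq

    def coq_ratio(self, revenue: float) -> float:
        """COQ expressed as a fraction of total revenue."""
        return self.coq / revenue


# Hypothetical figures for one product line.
costs = QualityCosts(cogq=40_000, internal_failure=25_000, external_failure=15_000)
print(costs.copq, costs.coq, costs.coq_ratio(revenue=1_000_000))  # 40000 80000 0.08
```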
Analysis
The COPQ metric can be useful for a product line or smaller systems, but I believe it may not be a beneficial metric to track for complex systems or the organization as a whole. The effective scope of COPQ analysis is limited and pertains mostly to the IFC components. It will be a challenge to accurately measure the EFC (External Failure Costs) component of COPQ.

Measures should definitely be taken to produce high quality products, but a balance should be struck on how much is invested in improving quality. Overdoing quality could actually result in over-processing, which is one of the wastes identified in Lean. E.g. a Six Sigma process is good, but for processes related to aviation and flights it is not good enough, while for processes such as apparel manufacturing it could be overkill and may not be required.

Moreover, investments in innovation, new technologies and better production techniques could have a better ROI than investing in the bottom-up approach of COPQ.

Arunesh Ramalingam's post in Common Cause, Special Cause, Black Noise was marked as the answer September 28, 2017

Variation is the fluctuation in a process’s output. Measured outputs may not all be identical, and we may notice some variation between multiple readings. Statistically it is denoted by the Standard Deviation (σ), which indicates the spread of the data points in the data set around the mean/average value.
Example: Consider a machine producing 3.000 mm diameter bolts. Each bolt may not measure exactly 3.000 mm in diameter. Some can be 2.999 mm, some can be 3.001 mm, and there are endless possibilities. The spread of the various measurements around the mean (3.000 mm) is described by the standard deviation.

A lower standard deviation, i.e. less variation in the diameters, indicates that the data points are closer to the mean and the process is better. The aim of a good process design is to minimise this variation.

There are two types of variation:

| Common Cause Variation | Special Cause Variation |
| This is a random variation and is natural for the process. As the name indicates, it is “common” to the process. Other terms: noise, non-controllable variation, within-group variation, or inherent variation. | This is a non-random variation, or assignable cause, and is not part of the normal process. As the name indicates, it is “special” to the process, and the variation can be assigned to a reason. |
| Though the value of each point cannot be predicted, the range of this variation is predictable. This range is called the process width or the control limits. | This is unanticipated and sporadic. It is completely unpredictable. |
| Common cause variation is introduced by intrinsic variation in the process: by the variation present in People, Information systems, Machines/Equipment, Measurement, Materials and Environment. | Special cause variation is introduced by external factors such as an operator not being available, a computer crash, a power outage or a machine malfunction. |
| Generally, the process remains in control, i.e. within the control limits, and no corrective action may be required. If the process deviates beyond the control limits, then corrective actions are required. | The process goes out of control. The reason for the variation should be identified, analyzed and corrected if possible. If it cannot be corrected, alternate solutions should be implemented. |
| E.g. The average normal body temperature is generally accepted as 98.6°F (37°C). Per some studies, the "normal" body temperature can range from 97°F (36.1°C) to 99°F (37.2°C). So, if the temperature is within this range and the person is otherwise feeling normal, then he may not need any medication. | E.g. If a person’s temperature goes beyond this range, then there is a high chance that he has a fever and might need to take medication. |
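
The idea of a predictable range can be sketched in code using the bolt example. This is a simplified illustration: it estimates sigma with the sample standard deviation of a baseline run, whereas control charts in practice often estimate it from subgroup or moving ranges. The baseline readings and function names are made up for the example.

```python
import statistics


def control_limits(baseline):
    """Return (LCL, UCL) as mean -/+ 3 sigma of the baseline readings."""
    mean = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation (assumption)
    return mean - 3 * sigma, mean + 3 * sigma


def out_of_control(points, lcl, ucl):
    """Return the readings that fall outside the control limits."""
    return [x for x in points if x < lcl or x > ucl]


# Hypothetical bolt diameters (mm) collected while the process ran normally.
baseline = [3.000, 2.999, 3.001, 3.000, 2.998, 3.002, 3.001, 2.999]
lcl, ucl = control_limits(baseline)

# New readings: the first two sit inside the limits, the last one does not.
flagged = out_of_control([3.001, 2.999, 3.010], lcl, ucl)
print(flagged)  # [3.01]
```

Readings inside the limits are treated as common cause variation and left alone; a flagged reading is a prompt to look for an assignable cause.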

Why should they be differentiated, and how can misjudging one of these as the other create problems?

Common Cause Variations may not cause a process to go beyond the control limits, so corrective actions may not be required. If corrections are required, they would be intrinsic to the process, like checking on the Manpower, Material, Method, Measurement, Machine and Environment, shifting the process mean, adjusting the variance and so on. Common cause variation exists even after the "Special Cause" is removed.

Special Cause Variations always cause the process to go out of control. The reason for the variation, or “what has changed?”, should be identified and analysed. If it is possible to rectify, it should be corrected; otherwise an alternate solution should be implemented.

Mistake 1: Attribute a variation to Special Cause, when it is actually a Common Cause.
Impact – Over-adjustment when not required. If deviation from target is considered due to special cause and the mean is adjusted for the deviation, then the adjustment will become a cause for further deviations and will worsen the situation.

Mistake 2: Attribute a variation to Common Cause, when it is actually a Special Cause.
Impact – Ignoring the variation and not doing anything. A special cause actually shifts the process mean, but this was ignored and no action taken to correct it. This further increases the variability.

Example: Consider a pizza delivery joint located in locality A, catering to localities A and B, and running an offer of “delivery in 30 mins or free”.
Pizza production time: 10 mins
Delivery boy travel time to locality A: 10-20 mins (Common Cause Variation)
Delivery boy travel time to locality B: 15-25 mins (Common Cause Variation)
Issue: The total time to locality B is 25-35 mins, so some of the deliveries to locality B are free of cost.

Analysis: This is Common Cause Variation.

If the manager treats this as common cause variation, he can either:
- continue with a few free deliveries (if it is not heavy on the business), or
- try to improve the pizza production cycle time.
But if the manager commits Mistake 1 (i.e. considers it a Special Cause while it is a Common Cause), then he may consider excluding locality B from the offer, which would have a greater impact on the whole business.

Now consider the same scenario but with a new temporary condition:
The road connecting localities A and B is undergoing renovation and there is a frequent traffic delay of 10-15 mins. (Special Cause Variation)
Analysis: This is Special Cause Variation.

If the manager treats this as special cause variation, he can decide to:
- temporarily exclude locality B from the offer, or
- modify the offer for locality B to “delivery in 45 mins or free” and communicate the valid reason.
But if the manager commits Mistake 2 (i.e. considers it a Common Cause while it is a Special Cause), then he may:
- put in efforts to reduce the pizza production time, which would not resolve the issue, or
- end up not taking any corrective action.
This could lead to a significant increase in pizzas being delivered for free in locality B and unsatisfied customers in locality B.
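
The delivery-time arithmetic in both scenarios can be laid out in a few lines. This sketch only works with the minimum and maximum times given in the example; the function names are illustrative, and it assumes that a delivery in exactly 30 minutes still counts as on time.

```python
def delivery_window(production, travel, delay=(0, 0)):
    """Return (fastest, slowest) total delivery time in minutes."""
    fastest = production + travel[0] + delay[0]
    slowest = production + travel[1] + delay[1]
    return fastest, slowest


def offer_risk(window, promise=30):
    """Describe how a delivery window compares with the promised time."""
    fastest, slowest = window
    if slowest <= promise:
        return "always on time"
    if fastest > promise:
        return "always late"
    return "sometimes late"


locality_a = delivery_window(10, (10, 20))                 # (20, 30)
locality_b = delivery_window(10, (15, 25))                 # (25, 35)
locality_b_roadwork = delivery_window(10, (15, 25), delay=(10, 15))  # (35, 50)

print(offer_risk(locality_a))           # always on time
print(offer_risk(locality_b))           # sometimes late
print(offer_risk(locality_b_roadwork))  # always late
```

In the road-work case even the fastest delivery misses the promise, so shaving minutes off production time does not address the cause.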

Arunesh Ramalingam's post in Poka yoke / Mistake Proofing was marked as the answer September 21, 2017

Poka-yoke or Mistake Proofing is about using a process or design feature and control mechanisms to:
- prevent defects,
- detect them if they are not preventable,
- reduce the severity of the defects.
The main motive is to:
- PREVENT a defect from occurring and, if that is not possible,
- DETECT the defect every time it occurs.

It is critical to prevent and detect errors/defects as early as possible in the process, because the later they are found the more expensive they become, i.e. the costs associated with them increase: more materials, labour, overhead and time. While implementing poka-yoke designs, care should also be taken that the implementation does not aggravate other issues or open new issues that may cause defects.

Poka-yoke or Mistake Proofing has varying degrees of effectiveness (Control vs Warning poka-yoke). One must aim for the most effective poka-yoke while keeping in mind the practical and economic feasibility of the solution.

I feel the interpretations provided in the question are all correct and reflect the varying degrees of effectiveness.

1.       The human error will not happen at all.
Example:
- Rectangular design of the 3.5” floppy disc, so that the wrong side cannot be inserted.
- The SIM card slot in cell phones is designed in such a way that the user can insert the SIM card in the correct way only.
There is no chance for the user to make a mistake while putting the SIM card in a cell phone or the floppy in the drive.

2.       Human error may continue to happen but the defect will not happen.
Example:
- Validation check when creating a new password, requiring the combination of upper case, lower case, numeric and special characters to ensure a strong password. The system does not accept a password unless it fulfils the criteria.
- Double Entry Box: On most websites and software where one needs to enter a critical bank account number, or create a password, users are asked to enter the same value twice (with the paste option disabled). This is to ensure people have not made a mistake while entering the value, and that both boxes hold the same value.

3.       Human error may happen, the defect is less likely to happen.
Example:
- Some email software pops up a message like “there is no attachment, do you want to send it anyway?” if it finds the key words “Find attached” (or other variants of the same) and does not see an attachment when the user tries to send the email.
- Some email software pops up a message if the subject is missing when the user tries to send the message.
- The car seat belt warning indicator beeps to warn the user who drives without putting on the seat belt.

4.       Human error may happen, the defect will also happen but will be detected and corrected automatically.
Example:
- Microsoft Word and Google Search automatically correct typographical spelling errors.
- Auto logout functionality in websites (especially banks). When the user forgets to log out before closing the website and then reopens it, he has to provide the credentials and log back in.
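
The password and double-entry examples under interpretation 2 can be sketched as a single validation step. This is a minimal illustration, not how any particular website implements it; `password_problems` and its messages are hypothetical.

```python
import string


def password_problems(password: str, confirmation: str) -> list:
    """Return the reasons a new password is rejected (empty list = accepted)."""
    problems = []
    if password != confirmation:
        problems.append("entries do not match")          # double-entry check
    if not any(c.isupper() for c in password):
        problems.append("no upper-case letter")
    if not any(c.islower() for c in password):
        problems.append("no lower-case letter")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if not any(c in string.punctuation for c in password):
        problems.append("no special character")
    return problems


print(password_problems("Str0ng!pass", "Str0ng!pass"))  # []
print(password_problems("weak", "weak"))
# ['no upper-case letter', 'no digit', 'no special character']
```

The user can still type a weak or mistyped password (the human error), but the system refuses to store it, so the defect does not occur.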

Arunesh Ramalingam's post in False Alert, Missed Alarm was marked as the answer September 19, 2017

Background and Concept:

False Alarm and Missed Alert are better understood with the two types of errors that are possible in statistical Hypothesis testing. Dealing with them with reference to test of hypotheses will provide more insights than otherwise.

Any hypothesis test is begun with the assumption that the null hypothesis is correct. Null hypothesis is the default position and corresponds to the idea that "one is innocent until proven guilty".

False alarm or Type I errors or False Positives (α): They happen when we reject a true null hypothesis.
Missed alert or Type II errors or False Negatives (β): They happen when we accept (fail to reject) a false null hypothesis.
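
The two definitions can be written as a small decision table. The function name and return strings are illustrative only.

```python
def classify_outcome(null_is_true: bool, null_rejected: bool) -> str:
    """Label a test outcome given the truth of the null and the decision made."""
    if null_is_true and null_rejected:
        return "Type I error (false alarm)"
    if not null_is_true and not null_rejected:
        return "Type II error (missed alert)"
    return "correct decision"


# Null hypothesis: "the person does not have the disease".
print(classify_outcome(null_is_true=True, null_rejected=True))    # Type I error (false alarm)
print(classify_outcome(null_is_true=False, null_rejected=False))  # Type II error (missed alert)
print(classify_outcome(null_is_true=False, null_rejected=True))   # correct decision
```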

Which error will you prefer over the other?

The answer to this question depends on the problem and on the worst that could happen if either a Type 1 or a Type 2 error were committed.

Example 1: Person accused of Murder awaiting Death Sentence.

Null Hypothesis: Person did not commit murder.

Type 1 error: Person did not commit murder but is pronounced guilty. (Rejected true Null Hypothesis)
Type 2 error: Person committed murder but is pronounced not guilty. (Accepted false Null Hypothesis)
In this example, though a Type 2 error is not favorable to society, hanging an innocent person is far worse. So a Type 2 error or a Missed alert is preferable.

Example 2: Person being screened for a disease to prescribe further tests.
Null Hypothesis: Person does not have the disease.

Type 1 error: Person does not have the disease but recommended for further tests. (Rejected true Null Hypothesis)
Type 2 error: Person has the disease but recommended for no further tests. (Accepted false Null Hypothesis)
In this example, a Type 1 error might cause the person to undergo further tests, which would finally reveal that he does not have the disease. A Type 2 error would prevent a legitimate patient from undergoing further tests. But a legitimate patient can redo the test if the symptoms persist, and it is fine for a person to do some further tests even if he does not have the disease. So a Type 1 error or a False alarm is preferable.

Example 3: Person being screened for a disease (presence of which has a good rate of survival and normal life) to prescribe a delicate specialised surgery that has poor success rate.
Null Hypothesis: Person does not have the disease.

Type 1 error: Person does not have the disease but recommended for surgery. (Rejected true Null Hypothesis)
Type 2 error: Person has the disease but not recommended for surgery. (Accepted false Null Hypothesis)

In this example, a Type 2 error might cause the legitimate patient not to have the surgery, which is bad, but it is much worse to have a person without the disease undergo the delicate, critical surgery. The legitimate patient may redo the tests if he still feels the symptoms of the disease, and may be re-diagnosed to undergo the surgery. In this case, a Type 2 error or a Missed alert is preferable.

Arunesh Ramalingam's post in Fault Tree Analysis / FTA was marked as the answer September 14, 2017

Fault Tree Analysis (FTA) is a graphical technique for Reliability and Safety Analysis of Systems. It is used:
- to investigate potential faults, their modes and causes, and
- to quantify their contribution to system unreliability in the course of product design.
The basic constructs in a fault tree diagram are gates (conditions) and events (causes leading to failure).

Fault tree diagrams (FTDs) are logic block diagrams that display the state of a system (top event) in terms of the states of its components (basic events).
An FTD is built top-down in terms of events. It begins with the foreseeable, undesirable loss event (or a fault condition). Subsequently, it attempts to determine the specific causes (events) by constructing a logic diagram, using a graphic model of the pathways within a system that can lead to the failure. Each cause is further broken down until a basic fault (human, hardware or software) is reached. The pathways connect contributory events and conditions, using standard logic symbols (AND, OR, etc.).

Example of an FTD – The Root Causes of Hazard to Patients during surgery [1]

The two most commonly used gates in a fault tree are the AND and OR gates.
OR Gate represents Logical Addition.
If even one of the inputs to an OR gate is “1” or “TRUE”, then the output is “1” or “TRUE”. If all the inputs are “0” or “FALSE”, then the output is “0” or “FALSE”.
AND Gate represents Logical Multiplication.
If even one of the inputs to an AND gate is “0” or “FALSE”, then the output is “0” or “FALSE”. If all the inputs are “1” or “TRUE”, then the output is “1” or “TRUE”.
The main purpose of fault tree analysis is to help identify potential causes of system failures before the failures actually occur.
It can also be used to evaluate the probability of the top event using analytical or statistical methods. These calculations involve quantitative system reliability and maintainability information, such as failure probability, failure rate and repair rate.
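
A minimal sketch of the probability calculation, assuming all basic events are statistically independent (the case where fault trees work best). The tree and its probabilities are invented for illustration; `or_gate` and `and_gate` are hypothetical helper names.

```python
from math import prod


def or_gate(probabilities):
    """Output occurs if at least one independent input event occurs."""
    return 1 - prod(1 - p for p in probabilities)


def and_gate(probabilities):
    """Output occurs only if every independent input event occurs."""
    return prod(probabilities)


# Hypothetical tree: the system fails if power is lost
# OR both redundant pumps fail.
p_power_loss = 0.01
p_pump_a = 0.1
p_pump_b = 0.1

p_both_pumps = and_gate([p_pump_a, p_pump_b])      # about 0.01
p_top_event = or_gate([p_power_loss, p_both_pumps])
print(round(p_top_event, 4))  # 0.0199
```

If the events were dependent, for example both pumps sharing one power supply, these simple products would no longer give the right answer.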

After completing an FTA, efforts can be focused on improving system safety and reliability.

Situations where FTD is most effective:

- It works well to identify possible causal relationships in cases where the output has a Boolean (True/False) relation with the inputs, especially in small and medium sized systems where all causes/events can be conceived.
- It can be used in situations where specific data regarding failure rates of components is known.
- It is used to supplement Root Cause Analysis on engineered systems, by reviewing assumptions and design decisions made during initial system design.

Situations where FTD is least useful:

- It is not effective in large complex systems, as it is difficult to conceive all possible scenarios leading to the top event. The construction of fault trees can become very tedious and is prone to errors.
- It does not function well as a Root Cause Analysis tool when some of the causes could be human actions, because the wide variance of possible human failure rates prevents FTD from providing accurate results.
- FTD is not very effective when there is event dependency or load sharing, i.e. when the occurrence of one event (cause) affects the probability of occurrence of the other events.

[1] http://asq.org/quality-progress/2002/03/problem-solving/what-is-a-fault-tree-analysis.html
