# Amlan Dutt

Lean Six Sigma Green Belt

6

1 Average

• Rank
Newbie

## Profile Information

• Name
Amlan Dutta
• Company
Applied Material Pvt. Ltd.
• Designation
Data Scientist-Manager

## Recent Profile Visitors

The recent visitors block is disabled and is not being shown to other users.

1. ## DPMO

Let’s see an example for DPMO calculation for cinder blocks evaluated on length, breadth and height. Item/Criteria Length Breadth Height Defective # of defects cinder block #1 correct incorrect correct yes 1 cinder block #2 correct incorrect incorrect yes 2 cinder block #3 incorrect correct correct yes 1 cinder block #4 correct correct correct no 0 cinder block #5 correct correct correct no 0 Opportunities/Unit 3 Total Units 5 Total Opportunities 15 Total Defects 4 DPO 0.266667 DPMO 266,667 Area to right 0.27 Area to left 0.73 Sigma level (with 1.5 sigma shift) 2.12 The flaws in using DPMO as metric are obvious, and listed below 1. DPMO/Sigma Level are metrics which can theoretically be used to compare unlike products and processes 2. Complexity of defects can’t be represented with DPMO; not all defects are equal sometimes 3. Defect density is not captured by DPMO; i.e. a needle in haystack OR box of needles in haystack 4. Back calculating DPMO from sigma level, if defects doesn’t follow a normal distribution then sigma level will be overestimated 5. DPMO and PPM are not the same, except if # of opportunities for a defect/unit = 1. These are used interchangeable very often 6. To make a jump from 2 to 3 sigma, DPMO has to be reduced by 241,731 while from 5 to 6 sigma is mere 230 (all with 1.5 sigma shifts). This shows that DPMO is sensitive to tails of distribution which is not always a nice thing. How? a Burr distribution with c=4.873717 and k=6.157568 perfectly resembles a standard normal distribution with mean = 0, sigma = 1, skewness = 0 and kurtosis = 3 but are very difference from DPMO standpoint. i.e. our realization of the ‘true’ distribution of a process will never coincide perfectly with the truth 7. Chasing zero defects in accordance with DPMO, a good process can be made better but not perfect. 8. Over relying on DPMO may give inappropriate approximations of Cpk
2. ## Box Plot

Check this one out! A picture worth thousand words. Box plot comic So the benefits are obvious. Box plots are easy to interpret. It is five point summary (Q1, Q2 ,Q3,Upper and Lower Fences) It handles huge uni-variate observations pretty quick; because median doesn't require an analytic solution unlike mean It handles non normal distribution pretty well Let's see at anatomy of box plots. So It helps visualizing 25th, 50th, 75th percentiles along with mean and outliers. Box part of it contains 50% of data while remaining 50% lies with both whiskers collectively (...along with outliers). One question arise that why did John Tukey (inventor of box plot) selected 1.5 times IQR? Well because the tendency really is to consider most distributions to be Gaussian by reflex, and for a Gaussian distribution, it works well. What I mean is, for a nice Gaussian distribution, (with s as Std. Dev and X as Mean) Q1 = -s0.67 + X, and Q3 = s0.67 + X, (from the property of standard scores for a Gaussian distribution) IQR would be Q3-Q1, So IQR = 1.34s (from above) Tukey’s lower bound = Q1-1.5IQR = 1.34s , which is X-2.7s If you chose Q1-1IQR, it would be X-2s (too many outliers), and if you chose Q1-2IQR, it would be 4s (too few outliers) 1.5IQR allows just under 3s, (and is easier to handle than 1.567IQR). In other words, if +-3s gives ~.3% observations as outliers, box plots with +-2.7s will give .7% observations as outliers. It handles mildly non-normal distribution well showing skewness. Below are examples of left skewed, no skew and right skewed data from top to bottom. Notice the differences in respective box plots. Another interesting fact is there are nothing like Q0 and Q4. Why? Try finding 0th or 100th percentile of any uni-variate data. These are undefined! reason being percentiles are calculated from CDF by popular softwares while assuming normal distribution by default. We already know that CDF of normal distribution is unbounded. That's why for highly skewed distributions box plots are not good choice. Some softwares call Q0 and Q4 as minimum and maximum but then they no more remain fences of whiskers because there can be observations as outliers. Notably box plots are poor with representing "holes", below is a box plot with mixture of two normal distributions (mean of 50 and 90; same SD of 1). Although there are no observations between 53 and 88 yet the box misrepresents it (...or it increases chances on misinterpretation). Another variant of box plot is with notches which give 95% CI for median.