Answers from Kaviraj - Benchmark Six Sigma Forum

Kaviraj's post in Rare Event Control Charts was marked as the answer August 19, 2022

A control chart used to study data on rarely occurring incidents/events is known as a “RARE EVENT CONTROL CHART”. Rare event charts provide insight into processes in which events occur too infrequently to be tracked with traditional control charts. There are two types of rare event chart: G Charts and T Charts. They differ in how they measure rare events: the G Chart measures the count of opportunities (units) between incidents, and the T Chart measures the time interval between incidents.

G Charts,
A G Chart measures the number of opportunities between errors or nonconformities that occur rarely. Each point on the chart represents the number of units between successive occurrences. E.g., on a production line where materials are produced daily and an unexpected line shutdown may happen, we can use a G Chart to track the number of units produced between line shutdowns.
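As a rough illustration of the G Chart idea, here is a minimal sketch that computes a center line and 3-sigma limits from counts of units between shutdowns. It assumes the counts follow a geometric distribution (one common textbook formulation; statistical packages may use probability-based limits instead), and the data and the function name g_chart_limits are made up for the example.

```python
import math

def g_chart_limits(counts):
    """Center line and 3-sigma limits for counts between rare events.

    Assumes geometric-distributed counts, so the variance is g_bar * (g_bar + 1).
    """
    g_bar = sum(counts) / len(counts)
    sigma = math.sqrt(g_bar * (g_bar + 1))
    ucl = g_bar + 3 * sigma
    # Under this formulation the lower limit is negative, so it is clamped to 0.
    lcl = max(0.0, g_bar - 3 * sigma)
    return g_bar, lcl, ucl

# Hypothetical data: units produced between successive line shutdowns.
units_between_shutdowns = [1200, 850, 2300, 400, 1750, 960, 3100, 1500]

center, lcl, ucl = g_chart_limits(units_between_shutdowns)
flagged = [g for g in units_between_shutdowns if g > ucl or g < lcl]

print(f"center={center:.1f}, LCL={lcl:.1f}, UCL={ucl:.1f}, flagged={flagged}")
```

A point above the UCL would mean far more units than usual were produced between shutdowns.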

T Charts,
A T Chart measures the time elapsed (interval) since the last event. Each point on the chart represents the amount of time that has passed since the prior occurrence. E.g., on a production line where materials are produced daily and an unexpected line shutdown may happen, we can use a T Chart to track the number of days between line shutdowns. A “T chart” can be used for numeric, nonnegative data, date/time data, and time-between data.
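A similar sketch for the T Chart is given here, assuming the times between events are exponentially distributed (a simplifying assumption; software may fit a Weibull distribution instead). The limits are probability limits matching the tail areas of a conventional 3-sigma chart. The data and the function name t_chart_limits are hypothetical.

```python
import math

def t_chart_limits(times, tail=0.00135):
    """Probability limits for times between events, assuming an exponential model."""
    theta = sum(times) / len(times)          # estimated mean time between events
    lcl = -theta * math.log(1 - tail)        # lower 0.135% quantile
    center = theta * math.log(2)             # median of the exponential model
    ucl = -theta * math.log(tail)            # upper 99.865% quantile
    return lcl, center, ucl

# Hypothetical data: days between successive line shutdowns.
days_between_shutdowns = [12, 30, 7, 45, 21, 60, 15, 33]

t_lcl, t_center, t_ucl = t_chart_limits(days_between_shutdowns)
print(f"LCL={t_lcl:.2f}, center={t_center:.2f}, UCL={t_ucl:.2f}")
```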

When reading a rare event control chart, a point above the UCL indicates that the number of units (G Chart) or the time (T Chart) between events has increased, which is a positive sign. Hence, a point flagged as out of control above the upper limit is usually considered a desired effect when we read G & T charts.

ADVANTAGES OF THE G AND T CHARTS
Besides being easy to use, rare event charts offer better statistical sensitivity for monitoring rare events than the traditional charts (P or U charts). Since rare events occur at very low frequencies, traditional control charts are not effective in detecting changes quickly. They need much more data, which is difficult to collect, and so we have to wait longer to detect a shift in the process.

On the other hand, G / T charts do not require large quantities of data to effectively detect a shift in a rare events process. Another advantage of using the G / T chart to monitor rare events is that it does not require the collection and recording of data on the total number of opportunities.

Therefore, G or T Charts are more effective and quicker at detecting shifts when monitoring rare events than the traditional P or U charts.

Kaviraj's post in Dimension vs Measure was marked as the answer July 22, 2022

Dimensions: These answer the who, what, where, and when of our data
Data that contains qualitative information is categorized as dimensions. These are descriptive attributes, like the category of a product, the address of a customer, or the country of origin. Dimensions can contain numeric characters (like an alphanumeric customer ID) but are not numeric values (it wouldn’t make sense to add up all the ID numbers in a column, for example).
Let us think in this way: if we can’t (or wouldn’t) compute a field, it’s a dimension.
Eg. Title of the product, category of products, vendor list, etc.

Measures: These are the numerical fields that we can compute
Data that can be quantified is categorized as measures, for example the subtotal of an order, the number of items purchased, or the time spent on a specific page. In short, measures are computable. Say we have a measure, quantity of items purchased: we can calculate the average quantity ordered, sort by descending quantity, sum all quantities, and so on.
Eg. Price of a product, customer rating for the product, etc.

Note: Date fields are dimensions too. E.g., the year of production will be a dimension because calculating min/max/sum on it will not help. Instead, we may group the records by the year of manufacturing.

Dimensions | Measures
It is an independent variable. | It is a dependent variable.
It is not dependent on the measure. | It is dependent on the dimension.
Adding it to the filters gives us insight into the data, so it is beneficial to filter on it. | Adding it to the filters will not give us much insight into the data.
We can’t aggregate it. | We can aggregate it.
Min, max, and sum won’t work. | Min, max, and sum will work.
It is used to compare the data. | It is a metric that we use to compare the dimension.
It may contain duplication of data. | It does not contain duplication of data.
Headers are generated when added to the rows or columns. | Axes are generated when added to the rows or columns.
It contains qualitative and categorical information. | It contains quantitative data.
It describes data records. | It cannot describe data records.
It cannot be continuous and discrete. | It can be continuous and discrete.
It cannot give the number of records, because aggregation does not apply to it. | Thanks to aggregation, we can get the number of records present in the database no matter how large the dataset is.
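To make the distinction concrete, here is a small sketch in plain Python: the category field acts as a dimension (we group by it) and the quantity field acts as a measure (we sum it). The order records and field names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical order records: "order_id" and "category" are dimensions,
# "quantity" and "price" are measures.
orders = [
    {"order_id": "A101", "category": "Books", "quantity": 2, "price": 12.50},
    {"order_id": "A102", "category": "Toys", "quantity": 1, "price": 30.00},
    {"order_id": "A103", "category": "Books", "quantity": 3, "price": 8.00},
    {"order_id": "A104", "category": "Toys", "quantity": 4, "price": 5.25},
]

# Group by the dimension and aggregate the measure.
qty_by_category = defaultdict(int)
for row in orders:
    qty_by_category[row["category"]] += row["quantity"]

# Averaging a measure is meaningful; summing order IDs would not be.
average_quantity = sum(row["quantity"] for row in orders) / len(orders)

print(dict(qty_by_category), average_quantity)
```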

Kaviraj's post in KT Analysis was marked as the answer July 5, 2022

The K-T method, or Kepner-Tregoe method, is a systematic way to analyse problems and understand their root causes without making assumptions or jumping to conclusions. Kepner-Tregoe Problem Analysis usually has 5 steps: (1) Define the problem, (2) Describe the problem, (3) Establish possible causes, (4) Test the most probable cause, (5) Verify the true root cause.
1.     Define the Problem,
This is the first step in the process and also the most crucial one. Many people tend to skip it, thinking they already know the problem; such conclusions may lead to a wrong diagnosis and wasted time.

For example, suppose an employee reports “My computer screen is blank” to an IT person. The K-T method guides us to ask a few basic questions that expose far more information about the nature of the problem and help us define the possible causes effectively.

Let’s assume the following answers, which expand the problem definition beyond “My computer screen is blank”:
1.      Who - Mr. Moorthy
2.      Why - He needs to see his screen so he can perform his duties.
3.      What - The computer screen went blank while booting up; nothing appears on the screen, but the start-up sound was heard.
4.      When - In the morning when he came to work; it was working fine until the previous evening.
5.      Where - Mr. Moorthy’s computer.
Let’s restate the problem definition: “Mr. Moorthy is unable to perform his duties because his screen goes blank during boot; it worked well until he shut it down yesterday.”
The above statement is a much better problem description. It lets us understand exactly what the problem is and narrow down the questions that will help us understand the impact as well. A brief round of questioning before finalising the problem statement also helps. For example: why would the screen be blank? The reasons could be,
a)      The graphics card could have failed?
·        Ans: No. If it had, the computer wouldn’t boot; it would give 3 loud beeps and then stop.
b)      The screen could be faulty?
·        Ans: No, the manufacturer’s logo appears at first and then the screen goes blank.
c)      The hard drive could be faulty and the computer doesn’t boot?
·        Ans: No, we hear the Windows start-up music.
d)      The backlight might have failed and the screen is dark?
·        Ans: No, the logo is perfectly visible.
e)      The display could be switching over to an external screen?
·        Ans: No, we can see everything on an external display, but when we switch to the internal one it’s still blank.

Let’s restate the problem definition now, “Mr. Moorthy is unable to perform his duties because his internal screen goes blank when Windows is starting since he shut it down yesterday”

2.      Describe the Problem,
Let’s describe 4 aspects of any problem,
1.      What the problem is,
2.      Where the problem occurs,
3.      When it occurred, and
4.      The extent to which it occurred.
We already have the answers to these questions from fine-tuning our Problem Definition, but the IS and IS NOT method allows us to explore them even further.
For the above-mentioned aspects, let’s describe what the problem IS, and also what the problem COULD BE but IS NOT.

Let’s fill out the table for the problem we have taken (the filled-out table is not reproduced here).

3.     Identify possible causes
The comparison of “what the problem IS and IS NOT” helps us to sensibly inspect what changes could have affected the items in the 1st column but not the items in the 2nd. Experience tells us that the majority of problems are caused by a recent change, so let’s add 2 more columns to the worksheet: the ‘Differences’ column lists the differences between the IS and IS NOT, and the ‘Changes’ column lists the changes to where the problem IS that could account for the differences.

Another important aspect is that effects don’t always follow the action immediately. The most recent changes could have uncovered fundamental problems that were always there, so when considering the list of changes we should not limit ourselves only to recent ones.
4.     Test most probable causes
The list of changes identified in the previous step becomes a list of possible causes. A Subject Matter Expert ranks the possible causes by asking, “If THIS is the root cause of this problem, does it explain everything the problem IS and what the problem COULD BE but IS NOT?”
In our example, the possible causes were ranked in a table (not reproduced here).
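The test in this step can be sketched roughly as follows. The observations come from the blank-screen example; what each candidate cause would predict is my own simplified assumption for illustration, not part of the K-T worksheet itself.

```python
# Observations from the worked example (what the problem IS / IS NOT).
observations = {
    "internal_screen_blank_after_windows_starts": True,
    "manufacturer_logo_visible_at_boot": True,
    "windows_startup_sound_heard": True,
    "external_display_works": True,
}

# Assumed predictions: what we would expect to observe if each cause were true.
candidate_causes = {
    "graphics card failed": {"windows_startup_sound_heard": False},
    "screen faulty": {"manufacturer_logo_visible_at_boot": False},
    "hard drive faulty": {"windows_startup_sound_heard": False},
    "backlight failed": {"manufacturer_logo_visible_at_boot": False},
    "recent display driver update": {
        "internal_screen_blank_after_windows_starts": True,
        "manufacturer_logo_visible_at_boot": True,
        "external_display_works": True,
    },
}

def explains_everything(predictions, observed):
    """A cause survives only if none of its predictions contradict an observation."""
    return all(observed[fact] == expected for fact, expected in predictions.items())

surviving = [
    cause for cause, predictions in candidate_causes.items()
    if explains_everything(predictions, observations)
]
print(surviving)
```

Only causes that are consistent with every IS and IS NOT observation remain as candidates for verification.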
5.     Verify true cause
In this step we compare the probable causes against the Problem Description and check whether each one satisfies all the conditions of the problem. When we find a cause that explains all these conditions, we must test it to confirm that it is correct, using the procedure in the ‘True if’ column and starting with the most probable cause. When we are confident that we have identified all the root causes of the problem, we need to develop a solution and check that it would prevent any recurrence of the problem. If we are satisfied, we implement the solution and test again under the same conditions: does the issue still occur?
In our example, we have determined that the problem with the display is due to a recent driver update which was installed but did not take effect until Mr. Moorthy restarted his computer. As a corrective action we may attach an external screen, uninstall the driver update, and restart the computer. The issue is resolved, but has the root cause been addressed? It is not appropriate to ask all the employees never to update their drivers. Hence, as an immediate action we can ensure Mr. Moorthy does not install this driver again, and as a preventive step we may notify all employees who have a similar type of computer that they should not install this driver until further notice.
Conclusion,
The purpose of this method is to bring situational awareness to the solution/opportunity. The method advocates a balanced and systematic approach to analysing a problem without jumping to conclusions or making assumptions based on experience. Compared to other methods, one of its biggest advantages is the IS and COULD BE but IS NOT procedure, which provides an intuitive way to identify the possible causes of a problem. This process may be faster than other methods; on the other hand, it may be harder, if not impossible, to detect subtle variations in a process. Hence this method needs to be applied sensibly.

Kaviraj's post in Grubbs Test vs Box Plot was marked as the answer June 21, 2022

The Grubbs test is a statistical method used to detect a single outlier in a normally distributed data set. It tests whether the maximum or the minimum value in the given data is an outlier.
Definition - Hypothesis of Grubbs test:
Ho - There are no outliers in the given data set
Ha - There is exactly one outlier in the given data set

Test Statistic for the Grubbs' test -

$$G = \frac{\max_{i} |Y_i - \bar{Y}|}{s}$$

Here $\bar{Y}$ represents the sample mean and $s$ the sample standard deviation, so the Grubbs test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation. This is the 2-sided version of the test. The Grubbs test can also be defined as one of the following one-sided tests,
1.      Test whether the minimum value is an outlier,

$$G = \frac{\bar{Y} - Y_{\min}}{s}$$

2.      Test whether the maximum value is an outlier,

$$G = \frac{Y_{\max} - \bar{Y}}{s}$$
Grubbs Test Example:
Data set - 199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57
First, a normal probability plot was generated.

This plot indicates that the normality assumption is reasonable except for the maximum value. We therefore apply the Grubbs test to find out whether the maximum value, 245.57, is an outlier.

Test Results,
     H0: there are no outliers in the data
     Ha: the maximum value is an outlier
     Test statistic: G = 2.4687
     Significance level: α = 0.05
     Critical value for an upper one-tailed test: 2.032
     Critical region: Reject H0 if G > 2.032

Since G = 2.4687 is greater than the critical value of 2.032, we reject H0 and conclude that the maximum value is in fact an outlier at the 0.05 significance level.
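The test statistic above can be reproduced with a few lines of Python. This is only a sketch using the standard library; the critical value 2.032 is taken from the test results quoted in this post rather than computed, and the variable names are my own.

```python
import statistics

data = [199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57]

mean = statistics.mean(data)
s = statistics.stdev(data)          # sample standard deviation (n - 1 denominator)

# One-sided Grubbs statistic for the maximum value.
g_max = (max(data) - mean) / s

critical_value = 2.032              # quoted above for alpha = 0.05, n = 8
is_outlier = g_max > critical_value

print(f"G = {g_max:.4f}, reject H0: {is_outlier}")
```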

Boxplots graphically display several summary statistics at a glance. Among other things, the median, the interquartile range, and the outliers can be read from a boxplot. The data used must be on a metric scale, such as a person's age, electricity consumption, or temperature.

How to interpret the boxplot?
The box indicates the range in which the middle 50% of all values lie. The lower end of the box is the 1st quartile (Q1) and the upper end is the 3rd quartile (Q3). Below Q1 lie 25% of the data, and above Q3 lie 25% of the data.
In the boxplot, the solid line represents the median whereas the dashed line represents the mean.
The T-shaped whiskers extend from the box to the most extreme data values that lie within 1.5 times the interquartile range (IQR) of the box. The upper whisker therefore ends at the largest value that does not exceed Q3 + 1.5 × IQR. If there is no outlier in the data, it ends at the maximum value; if there is an outlier, it stops at the last point inside that limit. The same applies to the lower whisker, which ends at the smallest value that is not below Q1 − 1.5 × IQR. Points that lie further away are considered outliers.

Box Plot Example: Range - 199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57

From the above example it’s graphically visible that the data value 245.57 does not fall within 1.5 times the interquartile range of the box; hence it’s an outlier.
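The same 1.5 × IQR rule can be checked numerically. This is a minimal sketch using Python's statistics module; note that quartile conventions differ between tools (statistics.quantiles uses its "exclusive" method by default), so Q1 and Q3 may differ slightly from what a given plotting package shows.

```python
import statistics

data = [199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57]

# Three cut points: Q1, median, Q3 (default "exclusive" method).
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are outliers; whiskers end at the most extreme
# points that are still inside the fences.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]
lower_whisker, upper_whisker = min(inside), max(inside)

print(f"Q1={q1:.3f}, Q3={q3:.3f}, IQR={iqr:.3f}")
print(f"whiskers=({lower_whisker}, {upper_whisker}), outliers={outliers}")
```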

Conclusion - I would prefer a box plot to find the outliers in a normally distributed data set, since it’s less complex and easy to understand because of its graphical representation. Thanks.
