
# Validation with Small Samples

Sample

A sample is a subset of observations or items drawn from a larger set (usually referred to as the 'population'). Using samples is common practice in data validation exercises, statistics and quantitative research. Working with a sample is preferred over working with the whole population because of the ever-present constraints of time and money. Every validation test requires a certain minimum number of observations in the sample for the test to be effective.

Control Plan

A control plan is a written document that details the control methods for product and process characteristics. Its purpose is to minimize variation in both product and process characteristics. It is a key deliverable under PPAP requirements as well as in the Control phase of a DMAIC project. A control plan essentially covers the following elements:
1. Characteristics and/or inputs (both product and process) to be controlled
2. Tolerances and trigger points for these inputs
3. Methods or actions required to keep the inputs within control
4. Escalation and response plan in case the inputs go outside the tolerance limits
A control plan is a living document, i.e. it must be revised whenever the product or process characteristics change.
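To make elements 1 to 4 concrete, here is a minimal sketch of one control-plan row as a data structure. The field names, the oven-temperature example and the three-way OK / ACT / ESCALATE rule (trigger points sitting inside the tolerance limits as an early warning) are assumptions for illustration only; a real control plan is a controlled document, not code.

```python
from dataclasses import dataclass


@dataclass
class ControlPlanItem:
    """One row of a control plan (hypothetical field names)."""
    characteristic: str   # element 1: input or characteristic to be controlled
    lsl: float            # element 2: lower tolerance limit
    usl: float            # element 2: upper tolerance limit
    trigger_low: float    # element 2: early-warning trigger points,
    trigger_high: float   #            assumed to sit inside the tolerances
    reaction: str         # element 3: action that keeps the input in control
    escalation: str       # element 4: response when the input leaves tolerance

    def evaluate(self, value: float) -> str:
        # Outside tolerance: follow the escalation and response plan
        if value < self.lsl or value > self.usl:
            return f"ESCALATE: {self.escalation}"
        # Inside tolerance but past a trigger point: apply the control method
        if value < self.trigger_low or value > self.trigger_high:
            return f"ACT: {self.reaction}"
        return "OK"


# Hypothetical example row
oven_temp = ControlPlanItem(
    characteristic="Oven temperature (deg C)",
    lsl=180.0, usl=200.0,
    trigger_low=184.0, trigger_high=196.0,
    reaction="adjust set point and re-check after 15 minutes",
    escalation="stop line and notify process engineer",
)

for reading in (190.0, 197.5, 203.0):
    print(reading, oven_temp.evaluate(reading))
```

Because the plan is a living document, revising it amounts to editing these rows when the product or process changes.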

Applause for all the respondents: Prashanth Datta.

Also review the answer provided by Mr Venugopal R, Benchmark Six Sigma's in-house expert.

## Question

Q. 138 In many industries, it is costly to do trials while establishing a solution for a problem. Verifying improved process capability with very few samples is not easy. What are the approaches for decision making with a few samples?

Note for website visitors: two questions are asked every week on this platform, one on Tuesday and the other on Friday.

## Recommended Posts


Six Sigma gains its edge over other quality management systems because it uses a data-driven approach to problem solving. Statistics forms an integral part of the Six Sigma methodology, as many of its tools rely on statistics to reach logical conclusions.

Statistics essentially has two branches: descriptive and inferential.

• Descriptive statistics deals with collecting, analyzing and presenting information as summary measures such as the mean, standard deviation, variance, percentage and proportion. While descriptive statistics describes the data, it does not by itself support any inferences.
• Inferences about data are very important for decision making, and it is inferential statistics that helps us draw them.

To answer the question above on decision making with few samples: it is inferential statistics that helps us analyze sample data and predict the behavior of the population.
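As a sketch of the difference between the two branches, assume a hypothetical sample of five measurements. The mean and standard deviation describe the sample (descriptive), while the 95% confidence interval makes a statement about the population mean (inferential). The t critical values are hard-coded from a standard t-table, covering only 2 to 10 observations, so the example needs no third-party libraries.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t by degrees of freedom (standard t-table)
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def describe(sample):
    """Descriptive statistics: summarize the sample itself."""
    return mean(sample), stdev(sample)


def mean_ci_95(sample):
    """Inferential statistics: 95% confidence interval for the population mean."""
    n = len(sample)
    if n - 1 not in T_975:
        raise ValueError("the hard-coded table only covers 2 to 10 observations")
    m, s = describe(sample)
    half_width = T_975[n - 1] * s / sqrt(n)
    return m - half_width, m + half_width


sample = [10.9, 11.2, 10.8, 11.0, 11.3]   # hypothetical measurements
print(describe(sample))     # mean 11.04, standard deviation about 0.207
print(mean_ci_95(sample))   # roughly (10.78, 11.30)
```

With only five units the t multiplier is 2.776 rather than the roughly 1.96 used for large samples; the wider interval is how small-sample inference accounts for its extra uncertainty.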

Further, inferential statistics helps us establish the relationship between the independent variables (X, the cause) and the outcome (Y, the effect), and helps identify the critical Xs that need to be focused on to improve Y.

Inferential statistics is closely associated with hypothesis testing. Hypothesis testing is performed on a sample, and whenever we perform a hypothesis test, we ask the following questions about what we observed in the sample:

1. Is it true?
2. Is it common cause?
3. Is it pure chance?

Let us see how to perform a hypothesis test, which is key to inferential statistics.

Step 1. Define the business problem in a data-driven format, i.e. Y = f(X).
Step 2. Select an appropriate hypothesis test for the problem. We will see this in detail in the next section. The choice of test is driven by the type of data defining both X and Y, i.e. whether each is discrete or continuous.
Step 3. State the statistical hypotheses: H0 (null hypothesis) = no change, no impact or no difference; HA (alternate hypothesis) = the new argument, based on the business case, holds good.
Step 4. Run the test on the sample data using a tool such as Minitab.
Step 5. Obtain the p-value, which is an output from the tool.
Step 6. Compare the p-value with alpha. [Alpha is the significance level, i.e. the accepted probability of a Type I error; it is generally set at 5%, or 0.05.]
Step 7. Draw the statistical conclusion: if p is greater than alpha, we fail to reject the null hypothesis (it holds good); otherwise we reject it and the alternate hypothesis holds good.
Step 8. Draw the business inference: if the null hypothesis holds good, the input is treated as a non-critical X. Alternatively, if the alternate hypothesis holds good, we should treat the input as a critical X.
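Steps 3 to 8 can be sketched in code. Instead of the t-test a tool like Minitab would typically run, this sketch swaps in an exact permutation test, which needs no distributional assumptions and is well suited to very small samples; the before/after data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(before, after):
    """Two-sided exact permutation test for a difference in means.

    H0 (Step 3): the change has no effect, so every way of splitting the
    pooled results into groups of the original sizes is equally likely.
    """
    pooled = list(before) + list(after)
    observed = mean(after) - mean(before)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(after)):
        chosen = set(idx)
        group_new = [pooled[i] for i in idx]
        group_old = [x for i, x in enumerate(pooled) if i not in chosen]
        diff = mean(group_new) - mean(group_old)
        # Tiny tolerance so float rounding does not drop ties with the observed split
        if abs(diff) >= abs(observed) - 1e-9:
            extreme += 1
        total += 1
    return extreme / total  # Step 5: the p-value


def decide(p_value, alpha=0.05):
    """Steps 6 to 8: compare p with alpha and translate into a business inference."""
    if p_value > alpha:
        return "fail to reject H0: treat X as non-critical"
    return "reject H0 in favor of HA: treat X as critical"


before = [10.2, 10.5, 9.9, 10.4, 10.1]  # hypothetical results with the old X
after = [10.9, 11.2, 10.8, 11.0, 11.3]   # hypothetical results with the new X
p = permutation_p_value(before, after)
print(round(p, 4), decide(p))  # 0.0079 reject H0 in favor of HA: treat X as critical
```

The enumeration also shows a hard limit of tiny samples: with three units per group there are only 20 possible splits, so the smallest achievable two-sided p-value is 2/20 = 0.1, which can never fall below an alpha of 0.05.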

With respect to Step 2, the following pointers should guide the selection of an appropriate test:

• If output Y is discrete and input X is discrete with 2 categories, use a 2-proportion test.
• If output Y is discrete and input X is discrete with more than 2 categories, use a chi-square test.
• If output Y is continuous and input X is discrete with 2 categories, use a 2-sample t-test.
• If output Y is continuous and input X is discrete with more than 2 categories, use ANOVA.
• If output Y is continuous and input X is continuous, use regression analysis.
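The pointers above amount to a small decision table. A minimal sketch, with a hypothetical function name and string labels; combinations the pointers do not cover (for example a discrete Y with a continuous X) are rejected rather than guessed.

```python
def select_test(y_type: str, x_type: str, x_categories: int = 0) -> str:
    """Map the data types of Y and X to a hypothesis test.

    y_type, x_type: "discrete" or "continuous".
    x_categories: number of categories when X is discrete.
    """
    if x_type == "discrete":
        if x_categories < 2:
            raise ValueError("a discrete X needs at least 2 categories")
        if y_type == "discrete":
            return "2-proportion test" if x_categories == 2 else "chi-square test"
        if y_type == "continuous":
            return "2-sample t-test" if x_categories == 2 else "ANOVA"
    if x_type == "continuous" and y_type == "continuous":
        return "regression analysis"
    raise ValueError("combination not covered by these pointers")


print(select_test("continuous", "discrete", 2))  # 2-sample t-test
```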

In summary, inferential statistics is used to draw conclusions about the larger population from a sample taken from it, and to establish the relationship between the inputs and the output.


Benchmark Six Sigma Expert View by Venugopal R

Apart from cost, it is sometimes impractical to use a large number of samples to make a decision. There have been many situations where relying on a few samples was the only choice.

I would like to share one such case study, which happens to be one of my lingering experiences in solving a very serious field failure. This happened on an IT hardware product, and the failure incidents became a threat to the product's acceptance in the market. The severity of the effect of this failure could be rated 8 to 9. The problem occurred in around 2 to 3 percent of the production volume, and it could occur any time between the 1st and the 30th day of the product's usage. This means that if I dispatched 30,000 units in a month, I could expect around 750 failures (2.5% of 30,000) within one month of usage of that batch, which was a disastrous situation considering the severity of the failure.

The mandate was to get this problem fully resolved in no time: a week at most!

Since this was a reliability-related failure with an unknown cause, it was not easy to find a screening method to identify and contain the potential defectives in-house. Among the suspected causes were a few components that had recently undergone certain changes, including a change of vendor and the elimination of some components based on tests and validations. All changes had undergone the necessary technical evaluation in-house and by third-party regulatory authorities before implementation.

So, the variables that impact the failure incidence rate were:
1. Component (type and presence)
2. Number of hours of operation
3. Volume of production
4. Possible interaction effects (of component combinations)

Every component change had been individually evaluated and certified, so, from a design point of view, the technical team was not willing to accept any component as the cause.

Without knowing the cause, the only way to contain the failure was to subject all 30,000 units to a functional test for 30 days and then dispatch only the products that did not exhibit the failure. This was practically impossible. I had to come up with something better than this.

This situation demanded a quick resolution of a problem that was plaguing the population, yet the decisions had to be based on smaller samples.

After quick deliberations with my teams, we came up with the idea of creating a customized reliability evaluation plan using 100 samples. Why 100? That was the limit of the testing facility!

The test subjected the 100 units to an accelerated burn-in for 24 hours under extreme conditions, which was approximated as equivalent to a normal life period of 30 days. The combinations of components (type / presence) were applied using the factorial principle. With 4 factors at 2 levels, we required 16 trial combinations, with a limit of only 6 samples per combination at a time. To detect a failure occurrence rate of 2%, we had to repeat the entire cycle 8 to 10 times to reproduce the failure and isolate the combinations that gave rise to it.
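The arithmetic behind such a plan can be sketched as follows. The assumptions are ours, not the author's: the four components get hypothetical names, each factor has two illustrative levels, and failures are modelled as independent events at a uniform 2% rate within a combination (a simple binomial model). The sketch lists the 2^4 = 16 runs and shows how the chance of seeing at least one failure in a combination grows as the 6-unit cycle is repeated (roughly 11%, 62% and 70% for 1, 8 and 10 cycles).

```python
from itertools import product

# Hypothetical names: the case study does not identify its four components
FACTORS = ["component_A", "component_B", "component_C", "component_D"]
LEVELS = ("level_1", "level_2")  # e.g. old/new vendor or present/absent


def full_factorial(factors, levels):
    """All level combinations: 2 levels ** 4 factors = 16 runs."""
    return [dict(zip(factors, combo))
            for combo in product(levels, repeat=len(factors))]


def prob_at_least_one_failure(rate, units):
    """Binomial chance of observing >= 1 failure among independent units."""
    return 1 - (1 - rate) ** units


runs = full_factorial(FACTORS, LEVELS)
units_per_run = 6  # per combination per cycle, as in the case study
print(len(runs), len(runs) * units_per_run)  # 16 96 (within the 100-unit limit)

for cycles in (1, 8, 10):
    units = units_per_run * cycles  # units tested per combination so far
    p = prob_at_least_one_failure(0.02, units)
    print(f"{cycles:>2} cycles: {units} units/combination, P(>=1 failure) = {p:.2f}")
```

Under this simplified model a single 6-unit cycle would very likely miss a 2% failure mode, which illustrates why the cycle had to be repeated several times.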

Thus, the whole exercise took at least 10 days, using a “small” sample to help us identify and quarantine the cause, convince the stakeholders and successfully resolve the issue.

This event was one such situation where there was no alternative but to depend on samples to unearth the cause and decide the appropriate action. It also proved that, through thoughtful use of samples, we can identify the right actions successfully and quickly. Very careful and detailed planning, even in such a panicky situation, was essential to get the best from 'small' samples.

Though those were challenging times, I am glad that I have a case study to share with others who could face similar situations.


Every sample will give you some information. Wherever possible, change your measurement from discrete (attribute) data to variable (continuous) data. This lets you see the shape and spread of the data better and hence assess whether capability has improved or not. When one uses continuous data, the risk associated with decision making comes down.
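One way to see why continuous data helps, as a sketch under assumed numbers (a 1% defect-rate target, 95% confidence, hypothetical specification limits of 10.0 to 12.0 and five hypothetical readings). With pass/fail data and zero observed failures, demonstrating the target requires (1 - p)^n <= 1 - confidence; with continuous readings, Cpk = min(USL - mean, mean - LSL) / (3s) already shows how far the process sits from the limits.

```python
from math import ceil, log
from statistics import mean, stdev


def zero_failure_sample_size(max_defect_rate, confidence=0.95):
    """Units that must all pass to claim defect rate < max_defect_rate.

    Solves (1 - p) ** n <= 1 - confidence for n (binomial, zero failures).
    """
    return ceil(log(1 - confidence) / log(1 - max_defect_rate))


def cpk(sample, lsl, usl):
    """Point estimate of Cpk from continuous measurements (assumes rough normality)."""
    m, s = mean(sample), stdev(sample)
    return min(usl - m, m - lsl) / (3 * s)


print(zero_failure_sample_size(0.01))  # 299 pass/fail units, none failing
print(round(cpk([10.9, 11.2, 10.8, 11.0, 11.3], lsl=10.0, usl=12.0), 2))  # 1.54
```

A Cpk from five readings is only a point estimate with wide uncertainty of its own, but each continuous reading carries far more information than a single pass/fail result.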

The approach should be selected carefully based on the objective; one such approach has been described by our expert Venugopal.

Hypothesis testing with carefully selected inputs can also help with validation. For example, do we really need to know whether the breaking strength of a cable has gone up by 1 g/sq cm, or is it good enough to know that it has gone up by 1 kg/sq cm? The latter will need fewer samples. Similarly, consider what values of alpha and beta are acceptable for a reasonable deduction: higher values of alpha and beta need fewer samples.
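To see why both levers reduce the sample size, here is a sketch of the common normal-approximation formula for the units needed per group in a two-sided, two-sample comparison of means. The standard deviation of 500 g/sq cm is a made-up value, and for very small n a t-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, beta=0.10):
    """Approximate units per group to detect a mean shift of `delta`.

    Normal approximation, two-sided, two-sample comparison of means:
    n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sigma / delta) ** 2
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    return ceil(2 * ((z(1 - alpha / 2) + z(1 - beta)) * sigma / delta) ** 2)


sigma = 500.0  # hypothetical standard deviation of breaking strength, g/sq cm
print(n_per_group(delta=1.0, sigma=sigma))     # 1 g/sq cm shift: over five million
print(n_per_group(delta=1000.0, sigma=sigma))  # 1 kg/sq cm shift: 6
print(n_per_group(delta=1000.0, sigma=sigma, alpha=0.10, beta=0.20))  # 4
```

The required n scales with (sigma / delta) squared, so asking for a thousand times finer resolution multiplies the sample size by about a million, while relaxing alpha and beta trims it further.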
