-
What if Paired t Test returns different results from Two-sample t Test?
The paired t-test uses the statistical concept of blocking, i.e. it removes a random contribution from the comparison. Its aim is to decrease the variability in the data, but this decrease comes at the cost of losing degrees of freedom. Here is the example I commonly use: Suppose we measure the height of people a) without their shoes and b) with their shoes. Our hypothesis would surely be that people are taller when they wear their shoes, but because the variation between people's heights is much larger than the effect we are trying to measure, a two-sample comparison will have difficulty giving a clear result. (I tried to include a graph, but I'm not sure if it worked.) The solution is to block the variation between the subjects and to consider only the variation within the subjects. This is what the paired t-test does. However, the blocking costs half the degrees of freedom: with n subjects, the two-sample test has 2n - 2 degrees of freedom, the paired test only n - 1. Thus, the blocking is a trade-off and only works in our favour if the blocked variation is large compared to the loss of resolution. There are formulas describing this ratio.
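To make the shoe example concrete, here is a minimal Python sketch with simulated data. All numbers (20 people, 10 cm spread between people, a shoe effect of about 2.5 cm) are assumptions chosen for illustration, not measurements. It only computes the two t statistics and their degrees of freedom, using the standard library.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

n = 20
# Hypothetical barefoot heights in cm: large spread between people
barefoot = [random.gauss(175.0, 10.0) for _ in range(n)]
# Hypothetical shoe effect: about 2.5 cm, with a small person-to-person spread
with_shoes = [h + random.gauss(2.5, 0.5) for h in barefoot]


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))
    return t, df


def paired_t(a, b):
    """Paired t statistic: a one-sample t-test on the within-subject differences."""
    diffs = [y - x for x, y in zip(a, b)]
    df = len(diffs) - 1
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
    return t, df


t_two, df_two = two_sample_t(barefoot, with_shoes)
t_pair, df_pair = paired_t(barefoot, with_shoes)
print(f"two-sample: t = {t_two:.2f}, df = {df_two}")
print(f"paired:     t = {t_pair:.2f}, df = {df_pair}")
```

With data like this the paired statistic comes out far larger, although it has only half the degrees of freedom, because the between-person spread no longer sits in the denominator. If the pairing did not remove much variation, the lost degrees of freedom would be the dominant effect instead.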
-
Process Capability Cpk and Ppk
I reckon you know about the short-term/long-term difference between Cpk and Ppk. From a customer perspective, the Ppk is therefore much more important, because it reflects the scrap rate in the long run: it describes the actual scrap rate in production, when all variance components (raw material, operators, different machines, etc.) are included.
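A small sketch of the difference, assuming made-up subgroup data and specification limits. Here the short-term sigma is estimated as the pooled within-subgroup standard deviation, which is one common choice (an Rbar/d2 estimate is another); the long-term sigma is the overall standard deviation of all values.

```python
import math
import statistics

# Hypothetical data: four subgroups of three parts each; the subgroup mean drifts
subgroups = [
    [9.9, 10.0, 10.1],
    [10.2, 10.3, 10.4],
    [9.6, 9.7, 9.8],
    [10.0, 10.1, 10.2],
]
LSL, USL = 9.0, 11.0  # assumed specification limits

all_values = [x for group in subgroups for x in group]
mean = statistics.mean(all_values)

# Short-term (within) sigma: pooled standard deviation of the subgroups
within_ss = sum((len(g) - 1) * statistics.variance(g) for g in subgroups)
within_df = sum(len(g) - 1 for g in subgroups)
sigma_within = math.sqrt(within_ss / within_df)

# Long-term (overall) sigma: includes the drift between subgroups
sigma_overall = statistics.stdev(all_values)


def capability(sigma):
    """Distance from the mean to the nearer spec limit, in units of 3 sigma."""
    return min(USL - mean, mean - LSL) / (3 * sigma)


cpk = capability(sigma_within)
ppk = capability(sigma_overall)
print(f"Cpk = {cpk:.2f}, Ppk = {ppk:.2f}")
```

Because the subgroup means move around, the overall sigma is larger than the within sigma and the Ppk comes out lower than the Cpk. That gap is the long-term variation the customer actually sees.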
-
How To Determine Sample Size?
You need to provide more information to determine the sample size. E.g. you could define a confidence level and a max. width of the associated confidence interval of the average value. Or you define a confidence level and a max. allowed relative deviation of the standard deviation. There are many options, but you have to decide what suits your needs. A good reference is Hahn & Meeker: "Statistical Intervals".
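For the first option (confidence level plus max. width of the confidence interval of the average), a rough sketch could look like this. It assumes you already have a planning value for the standard deviation and uses the normal approximation, n = (2 z sigma / W)^2; a t-based calculation gives a slightly larger n for small samples. The function name and the example numbers are made up for illustration.

```python
import math
from statistics import NormalDist


def n_for_mean_ci(sigma, max_width, confidence=0.95):
    """Smallest n so that the two-sided CI of the mean is no wider than max_width.

    Assumes sigma is a known planning value and uses the normal approximation.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((2 * z * sigma / max_width) ** 2)


# Assumed inputs: sigma = 2.0, total interval width of 1.0, 95 % confidence
n_required = n_for_mean_ci(sigma=2.0, max_width=1.0, confidence=0.95)
print(n_required)
```

Note that halving the allowed width roughly quadruples the required sample size, which is why the width requirement has to be fixed before anybody can answer the question.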
-
How to calculate Sigma level in case of rework
The reference you wish to consult is ISO 13053-1.
-
What is the way out
If you have two (or more) collinear input variables, this situation might happen: either one of the input variables explains the output rather well on its own, but since the two are correlated with each other, the linear regression algorithm is unable to decide which one is statistically significant.
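A quick way to see whether this is what is going on is to look at the correlation between the inputs. The sketch below uses simulated data (all values are assumptions) in which x2 is nearly a copy of x1 and the output depends on x1 only. The variance inflation factor, VIF = 1 / (1 - r^2) for the two-predictor case, is a common diagnostic that I add here for illustration.

```python
import math
import random

random.seed(7)  # fixed seed so the illustration is reproducible


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


n = 50
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [v + random.gauss(0.0, 0.05) for v in x1]      # nearly a copy of x1
y = [2.0 * v + random.gauss(0.0, 0.5) for v in x1]  # output depends on x1 only

r12 = pearson_r(x1, x2)
vif = 1.0 / (1.0 - r12 ** 2)  # two-predictor case: same VIF for x1 and x2

print(f"r(x1, y)  = {pearson_r(x1, y):.3f}")
print(f"r(x2, y)  = {pearson_r(x2, y):.3f}")
print(f"r(x1, x2) = {r12:.4f}, VIF = {vif:.0f}")
```

Both inputs correlate strongly with the output when looked at alone, but the very high VIF shows that the regression has almost no information to separate their contributions.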