Jump to content

Prasoon Bhargav

Fraternity Members
  • Content Count

  • Joined

  • Last visited

  • Days Won


Prasoon Bhargav last won the day on May 8 2013

Prasoon Bhargav had the most liked content!

Community Reputation

10 Good

About Prasoon Bhargav

  • Rank
    Advanced Member

Recent Profile Visitors

3,988 profile views
  1. Sir, Thanks giving us a excellent training

  2. Dear All, We have extensively discussed applicability of Six Sigma tools and techniques in our workshops, however, when I discuss Six Sigma application in sports, a significant percentage of participants find it unusual. Here is an example: Author Steven Falk wrote "Six Sigma Tennis" upon returning to Stanford Tennis after recovering from a Traumatic Brain Injury (TBI) to share his successful methodology with tennis enthusiasts, students, and instructors. Please refer to links below to learn more http://www.prnewswir...s-90204382.html http://www.stevenfal...igmaTennis.html http://www.ereleases...ma-tennis-39832 Regards, Shantanu Kumar
  3. Dear All, Please see below a query posted by a past participant. Please feel free to respond to this query. Regards, shantanu kumar --------------x--------------------------x-----------------------x--------------------------x--------------------------x---- Hi Shantanu, Hope you are doing well. I was trying to understand p-charts. Consider a case where in the output is varying and defectives are identified. Data for the same is as follows. When I tried to plot a p-chart for the same in Minitab, the UCL line varies as shown because of varying output, but a value of UCL (0.05439) is highlighted; Similarly for LCL (0.00675). How is this figure arrived at? Defective Volume 15 500 20 528 13 465 12 470 Warm Regards, Krishnan S
  4. I am posting a question asked by Shijoy Varghese Shijoy Varghese asks: 1) When I create the current state value stream how do I eliminate the decision boxes as there should be no decision boxes in the current sate VSM 2) When I calculate the wait time in the current state VSM, lets say I have 100 invoices pending in the first step to process, it takes 1 minute to process one invoice, which means rest 99 are waiting, how to calculate that wait time as all of them wait differently( wait time for 50th and 100th invocie would be different) 3) Do I have to put all the detailed steps in current state VSM ?
  5. Hi All, Good questions & responses by Sujeet and Pankaj. If you wish to evaluate whether download time is affected by shift, you may do the following: 1. CTQ-Download Time/Size of File---suggested by Sujeet---I also feel that it is a better CTQ 2. Compare average download time/size of file across shifts. Please treat data for different shifts as different samples (suggested by Pankaj and I completely agree). If there are 3 shifts, you should use ANOVA to compare average download time/size of file across shifts. Considering the data to be independent, steps for ANOVA are: a. Normality test b. Test for equal variances----Levene/Bartlett c. ANOVA - One way 3. To determine the minimum sample size required for ANOVA, you may choose to use Minitab>STAT>Power and Sample Size>One-Way ANOVA 4. If data are not normally distributed, let me know and I will list down basic steps to deal with Non-Normal data. Regards, shantanu kumar
  6. Dear Arvind, You can use the following functions in excel fdist(x,numerator degrees of freedom,denominator degrees of freedom) chidist(x,degress of freedom) Note: 'x' represents the calculated statistic value, in case of Fdist, it is the F stat value and for Chidist, it is the Chi-stat value. Regards, shantanu kumar
  7. Letting people put easier assignments in DMAIC format so that they atleast practice the tools soon enough.---- I agree. We can ask them to follow the roadmap however for advanced tool practice, they can attend FBB. Not insisting on validation from seniors within company and going by whatever they say.-------This may result in plagiarism and loss of reputation. This may become an easy way out. We may also like to think about Practice Project completion certificate based on a case study examination----6-8 hour examination.
  8. Hi Ravi, You can read up on Non-Parametric tests and Data Transformation. We will discuss Non-Parametric tests and Data Transformation in detail during the BB workshop. To prepare for BB, I would suggest that you revisit the GB material and Basic Statistics. I will suggest that you list down your Six Sigma queries and e-mail me. I will pick up the queries during BB workshop. Regards, shantanu kumar
  9. Dear RaviShankar, We will discuss analysis of non-normal data during the BB program. We can of course pick up your queries during the workshop. Regards, shantanu kumar
  10. Hi All, Sorry. Forgot to mention the references. Here are the references Data Analysis Using Regression and Multilevel/Hierarchical Models by Jennifer Hill, Andrew Gelman Linear least squares computations By R. W. Farebrother Applied regression analysis, linear models, and related methods By John Fox Numerical methods for least squares problems By Åke Björck Weisberg, S. 1985. Applied Linear Regression, 2nd ed. New York: John Wiley and Sons. Regards, shantanu kumar
  11. Dear All, Apologies for the late response. Shalini, Sanjay and Srinivas have provided answers to most questions. It is slightly tough to explain these concepts in a post however I am still trying. To know more about these concepts, you can attend the MBB workshop. Regards, shantanu kumar Q1: What are the assumptions in Ordinary Least Square Regression? Answer: The assumptions in Ordinary Least Square Regression are: Model is linear in parameters The data are a random sample of the population The errors are statistically independent from one another The expected value of the errors is always zero The independent variables are not strongly col linear The residuals have constant variance The errors are normally distributed And of course the independent variables are measured precisely Q2: What does VIF signify? Are high values of VIF desirable? Answer: The easiest way to understand what VIF signifies is to understand how it is calculated. Let's say that we have four independent variables X1, X2, X3, X4 and one dependent variable Y. To calculate VIF of X1, we will first regress independent X1 on the remaining independent variables. We will get a Rsquare value and the use the following formula VIF (X1) = 1/(1- Rsquare of X1) VIF =1 would mean 0% variation explained by remaining independent variables in X1.There is no Multicollinearity - IT IS GOOD VIF=5 or more would mean 80% or more of the variation explained by remaining independent variables in X1. - IT IS NOT GOOD- High Multicollinearity Therefore VIF of X1 will be high when Rsquare of X1 when regressed on X2, X3, and X4 is high. Therefore if remaining variables are strongly correlated with X1, VIF will be high. Multi-Collinearity refers to independent variable (predictor) that is correlated with other Independent variables (predictors). We will follow the same procedure for the remaining independent variables (predictors). High values of VIF are undesirable as it indicates multicollinearity. If independent variables are strongly correlated to each other, the regression coefficients will be imprecise. Q3. What does Durbin Watson statistic help us infer? Answer: As mentioned earlier, in OLS Regression, one of the assumptions is Errors(residuals) are independent of each other. If Errors(Residuals) are correlated, in OLS regression, predictors may appear to be significant when they aren't. Durbin-Watson test for the presence of autocorrelation in Errors (residuals) by determining whether or not the correlation between two adjacent error terms is zero. We compare Durbin-Watson statistic to lower and upper bound in Durbin Watson Table. If Durbin Watson Statistic < lower bound, we infer positive auto-correlation in residuals If Durbin Watson Statistic > upper bound, we infer no positive auto-correlation in residuals If lower bound<Durbin Watson Statistic<upper bound, we consider the test to be inconclusive If 4-Durbin Watson Statistic<lower bound, we infer negative auto-correlation in residuals If 4-Durbin Watson Statistic>upper bound, we infer no negative auto-correlation in residuals If lower bound<(4-Durbin Watson Statistic)<upper bound, we consider the test to be inconclusive Note: Durbin Watson statistic should be used when data is in meaningful order Q4. What is the difference between R-sq, Adjusted R-Sq, and Predicted R-Sq? Answer: R square is the coefficient of determination. It indicates how much variation in the response is explained by the model. Rsquare= 1-(SSError/SSTotal) R-Square Adjusted: In our effort to increase Rsquare value, we add unnecessary predictors in our model. Rsquare adjusted accounts for the number of predictors in the model. We should use it for comparing models with different number of predictors. R square Adjusted= 1-(MS Error/MS Total) Predicted R-Square: It is calculated from PRESS (Predicted Residual Sum of Squares). Larger Values of Predicted R square suggests models of greater predictive ability. Predicted Rsquare = 1- (PRESS/ (SS Total) Predicted R-squared can prevent over-fitting the model and can be more useful than adjusted R-squared for comparing models because it is calculated using observations not included in model estimation. Over-fitting refers to models that appear to explain the relationship between the predictor and response variables for the data set used for model calculation but fail to provide valid predictions for new observations. Q5. What does PRESS assess? Answer: Predicted-Residual-Sum-of squares procedure, or PRESS procedure as described by Weisberg (1985). Models are repeatedly estimated using data sets of n -1 observations, each time omitting a different observation from calibration and using the estimated model to generate a predicted value of the predictand for the deleted observation. Lower values of PRESS are desired. Lower values of PRESS would mean High Predicted Rsquare. Q6. What do we mean by unusual observations? Answer: Unusual Observation is a single observation that can have a large effect on results. Data can be unusual in the following ways: Outlier: observation with a large residual (i.e. far from regression line). Leverage - explained below Q7. What do Leverage values, Cook's Distance and Mahalanobis distance help us identify? Answer: Leverage values are one or more cases whose X values give them more potential to influence the results than the others. Observation with large leverage values may exert considerable influence on the fitted value, and thus the regression model. Leverage values fall between 0 and 1. As a rule of thumb, it is advised that we investigate observations with leverage values greater than 3c/n, where c is the number of model (including the constant) and n is the number of observations. Cook's Distance: A measure of how much the residual of all the cases would change if a particular case were excluded from the calculation of the regression co-efficients. A large Cook's Distance indicates that excluding a case from computation of the regression statistics changes the coefficients substantially. measures the influence of a particular observation on the entire model lowest possible value is 0 higher value means the observation is more influential Mahanlanobis Distance: A measure of how much a case's values on the independent variables differ from the average of all cases. A large Mahalanobis distance identifies a case as having extreme values on one or more of the independent variables Key SSTotal -Sum of Squares Total SSE - Sum of Squares Error MSTotal - Mean Square Total MSE - Mean Square Error PRESS - Predicted Residual Sum of Squares OLS - Ordinary Least Square Regression
  12. Hi Shalini, Good response on Adjusted R-Sq and Predicted R-Sq. I will post answers to all questions next week. Regards, shantanu kumar
  13. Dear All, These are few frequently asked Master Black Belt interview questions on Ordinary Least Square Regression Basics. Let's get answers to these questions What does VIF signify? Are high values of VIF desirable? What does Durbin Watson statistic help us infer? What is the difference between R-sq, Adjusted R-Sq, and Predicted R-Sq? What does PRESS assess? What do we mean by unusual observations? What do Leverage values, Cook's Distance and Mahalanobis distance help us identify? What are the assumptions in Ordinary Least Square Regression? Everybody is invited to post answers to these questions. Regards, shantanu kumar
  14. Why is ANOVA called Analysis of Variances when it compares means of more than two samples? Answer: To determine whether means of several groups are equal or not or whether the samples belong to the same population or not, ANOVA compares the variance of the group means(between groups), with the variance of values within the groups (within group) Mathematically in a balanced ANOVA design, a. Calculate the variance of each group and then calculate the mean of the variances - this will give you the ‘error variance' ---------(MSW) b. Calculate the arithmetic mean of each group and then calculate the variance of the arithmetic means of the groups - this will give you ‘variance of sample means' c. Multiply the ‘variance of sample mean' by sample size of one group-----(MSB) d. F= (sample size*variance of sample mean)/ error variance OR MSB/MSW ANOVA Table What does the Sum of Squares (SS) represent? The sum of squares between groups (BSS) measures the variation between the group means The sum of squares within groups (WSS) measures the variation of values within each group What is MS and what does it represent? Mean Squares Between = Sum of Squares between / (no of groups-1), Mean Squares Within= Sum of Squares within / (total observations - no of groups). How do we get F and what does it mean? It is simply the ratio of two variance estimates. F = (sample size*variance of sample mean)/ error variance OR Mean Squares between groups/ Mean Squares with group. How do we get the p-value? p value in ANOVA is F distribution's probability density function of x= F statistic with numerator df between group and denominator df within group. To find the p-value, you can use MS-Excel by inserting the function ‘fdist (F, df between groups, df within group)'. In Minitab, follow this path - Calc>>Probability distribution>>F distribution. Comparisons or Post Hocs Why do we use multiple comparisons? ANOVA only indicates whether there are differences between one or more pair of treatment means. It doesn't indicate which pairs are different. Multiple Comparisons are used to identify significant differences between specific factor levels What is the difference between Tukey and Hsu's MCB? Tukey's method or the Tukey-Kramer method is used to compare all possible pairs of treatments while controlling the family error rate. HSU's method compares the best treatment (the one with the lowest or highest mean) to all other treatments while controlling the family error rate. What is the difference Tukey and Fisher comparisons? Fisher's method is used to compare all possible pairs of treatments using a specified error rate for individual tests. It doesn't control the family error rate. Therefore it is advised that we use Fisher's method with appropriate Bonferroni corrected alpha if only a subset of all possible comparisons is of interest.
  • Create New...