
How do we compute the p-value for an F test? I came across a situation where I computed the F statistic and the F critical value (from the F table), but was unable to calculate the p-value using the same method used for the normal Z distribution.

If we take the log of the F values, will this turn the F distribution into a lognormal distribution, since the lower bound is capped at zero? If yes, can that distribution give us an accurate p-value?

Kindly let me know how to calculate p-values for the F and chi-squared distributions.

Thanks & Regards,

Arvind N

 


Dear Arvind,

You can use the following functions in Excel:

  • FDIST(x, numerator degrees of freedom, denominator degrees of freedom)
  • CHIDIST(x, degrees of freedom)

Note: 'x' represents the calculated statistic value; for FDIST it is the F statistic, and for CHIDIST it is the chi-squared statistic.
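If Excel is not at hand, the same right-tail probabilities can be computed with SciPy's survival functions. A minimal sketch (the statistic values and degrees of freedom below are made-up placeholders, not values from this thread):

    # Right-tail p-values for F and chi-squared statistics using SciPy.
    # The statistic values and degrees of freedom are illustrative placeholders.
    from scipy import stats

    f_stat, dfn, dfd = 4.2, 3, 36        # hypothetical F statistic and dfs
    p_f = stats.f.sf(f_stat, dfn, dfd)   # equivalent to Excel's FDIST(x, dfn, dfd)

    chi_stat, df = 7.8, 3                # hypothetical chi-squared statistic
    p_chi = stats.chi2.sf(chi_stat, df)  # equivalent to Excel's CHIDIST(x, df)

    print(p_f, p_chi)

This also answers the original question: there is no need to transform the F values toward a lognormal shape; the p-value comes directly from the right tail of the F distribution itself.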

Regards,

shantanu kumar

Guest Pankaj K.

Dear Shantanu,

Kindly look into the situation below:

We have software developed by our internal software team. Users in my department have been complaining of slowness in downloading and uploading each file. (We are a medical transcription company; the business is to listen to the audio and type. Our department does the quality proofing, so we listen and read, with no typing as such.) Downloading a file includes a voice file and a Word document, which takes somewhere around 15-20 seconds, plus around 10-15 seconds to load a demographic pane that holds patient-related information. Once the proofing is done, uploading the file takes around 10-15 seconds. This is based on manual observation. Users have also complained that at times it takes ages to download. So approximately 45 seconds are wasted on each file. Each QA does approximately 50 files, and we have 50 QAs. So 45 seconds x 50 files x 50 QAs = unproductive time per day.
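Just to make the scale concrete, that arithmetic works out as follows (a quick sketch using only the figures above):

    # Unproductive time implied by the figures above.
    seconds_wasted = 45 * 50 * 50   # seconds/file x files/QA x number of QAs
    print(seconds_wasted / 3600)    # 112,500 s, i.e. 31.25 hours per day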

Formally, we have notified the software team, and they are working on it. We have not even given them a target for how many seconds a file should take to download or upload; we just said to optimize it. They are working on that.

Out of personal interest, I have taken the logs of users from different shifts and extracted the file name, the time when the user clicked to download, and when the file got downloaded. Similarly, I have extracted the file name, when the user clicked on upload, and when exactly the file got uploaded. I have added up the total time for download from the time of click, the total time for upload from the time of click, and the grand total of time spent on download and upload. I have taken logs for the first, general,

second, and night shifts, and collected three days as a sample for each QA.

Can anyone please tell me what I should do next?


Hi Pankaj,

I believe I can offer a suggestion for your case. Since you have already collected the user logs, you can do a root cause analysis to identify whether the download time varies with specific factors (like a specific user A, B, etc., or a specific time of day). You can use a Pareto chart to get a percentage-wise breakdown, which you can drill into further or forward to your IT team for detailed technical analysis. I would try mapping the data on a scatter plot to check for outliers and flag them as special causes that trigger longer download times. Further, you can check for normality by plotting the data as a histogram in Excel and fitting a Gaussian curve over it.

I believe you would end up doing hypothesis testing with an F test (for comparing the variances of data from different periods), as we need to estimate whether a significant difference in download time exists across periods. Please check whether you have to use ANOVA or chi-squared for this purpose. Even though you have more than 30 data points, I think an F test or chi-squared test would suit this analysis better than the normal distribution. Once you have identified whether a significant difference in download time exists across periods, you can arrive at a solution by either rescheduling your process flow or restructuring your handling time. I hope I am adding value to your idea.
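To illustrate the F test mentioned above, here is a minimal sketch comparing the download-time variances of two shifts; the sample arrays are invented placeholders, not real log data:

    # Two-sample F test for equality of variances (illustrative only).
    # shift_a / shift_b stand in for download times (seconds) from two shifts.
    import numpy as np
    from scipy import stats

    shift_a = np.array([15.2, 18.4, 16.1, 22.3, 14.8, 19.5])
    shift_b = np.array([25.7, 31.2, 19.8, 42.5, 16.3, 28.9])

    f_stat = shift_a.var(ddof=1) / shift_b.var(ddof=1)  # ratio of sample variances
    dfn, dfd = len(shift_a) - 1, len(shift_b) - 1

    # Two-sided p-value: double the smaller of the two tail probabilities.
    p = 2 * min(stats.f.sf(f_stat, dfn, dfd), stats.f.cdf(f_stat, dfn, dfd))
    print(f_stat, p)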

I would like to know whether Mr. Shantanu agrees with my approach.

Thanks & Regards,

Arvind N


@Pankaj: I am not able to see a baseline target for the process. In my opinion, first of all we need to set a baseline, in conjunction with the software development team, for the time the process should take to download and upload with respect to the size of the file. Afterwards, compare the current performance with the benchmark goals (SD, mean, sigma level) and find the gaps. There can be a situation where you may need to set the benchmark on the basis of past best performance for upload and download; your floor-level executives will be the best source of information here, as they finish this task on a daily basis.

 

Cheers!

Sujeet Singh

 


Dear Pankaj,

I have collated three days' data for each shift; there are 40 data points for each shift. I collated it all in one column and did the following:

I ran Display Descriptive Statistics, and here are the results.

Variable   N     N*   Mean     SE Mean   StDev    Minimum   Q1       Median   Q3       Maximum

C4         360   0    0.3983   0.0205    0.3889   0.1000    0.1525   0.2550   0.4100   2.3800

=========================

Later I did the "Graphical Summary" and got the following (the p-value shown is from the Anderson-Darling normality test that the summary reports):

P-Value: <0.005

Mean: 0.39828

StDev: 0.38891

Variance: 0.15125
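For readers without Minitab, a minimal sketch of the same summary and a normality check in Python; the input file is a hypothetical stand-in for the stacked C4 column:

    # Descriptive summary and normality test, mirroring the Minitab output above.
    # "times.txt" is a hypothetical file holding the 360 stacked values.
    import numpy as np
    from scipy import stats

    times = np.loadtxt("times.txt")
    print(times.mean(), times.std(ddof=1))             # Mean, StDev
    print(np.percentile(times, [0, 25, 50, 75, 100]))  # Min, Q1, Median, Q3, Max

    # Minitab's Graphical Summary reports an Anderson-Darling normality p-value;
    # Shapiro-Wilk is a comparable check available in SciPy.
    w, p = stats.shapiro(times)
    print(p)    # p < 0.05 suggests the data are not normal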

Guest Pankaj K.

Hi Yunus,

There are two questions here:

1. Different shifts should have different means and standard deviations. I am not sure why you merged all shifts into one column.

2. If the p-value is less than 0.05, then the data is non-normal. As I am not very familiar with handling non-normal data, I won't be able to comment on this.


Hi All,

Good questions & responses by Sujeet and Pankaj.

If you wish to evaluate whether download time is affected by shift, you may do the following:

1. CTQ: download time / size of file (suggested by Sujeet; I also feel it is the better CTQ)

2. Compare average download time per file size across shifts. Please treat data for different shifts as different samples (suggested by Pankaj, and I completely agree). If there are 3 shifts, you should use ANOVA to compare average download time per file size across shifts. Considering the data to be independent, the steps for ANOVA are (a sketch of these steps follows the list):

a. Normality test

b. Test for equal variances (Levene/Bartlett)

c. One-way ANOVA

3. To determine the minimum sample size required for ANOVA, you may use Minitab > Stat > Power and Sample Size > One-Way ANOVA.

4. If the data are not normally distributed, let me know and I will list the basic steps for dealing with non-normal data.
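For anyone following along without Minitab, here is a minimal sketch of steps a-c in Python; the three shift samples are invented placeholders, not Yunus's data:

    # Steps a-c: normality, equal variances (Levene/Bartlett), one-way ANOVA.
    from scipy import stats

    shift1 = [14.2, 16.8, 15.1, 18.3, 13.9, 17.2]
    shift2 = [19.5, 22.1, 18.7, 24.6, 20.3, 21.8]
    shift3 = [15.8, 17.4, 16.9, 19.2, 18.1, 16.3]

    # a. Normality test per shift (Shapiro-Wilk; p < 0.05 suggests non-normal)
    for s in (shift1, shift2, shift3):
        w, p = stats.shapiro(s)
        print(p)

    # b. Tests for equal variances (Levene is robust; Bartlett assumes normality)
    print(stats.levene(shift1, shift2, shift3))
    print(stats.bartlett(shift1, shift2, shift3))

    # c. One-way ANOVA: p < 0.05 means at least one shift mean differs
    f_stat, p = stats.f_oneway(shift1, shift2, shift3)
    print(f_stat, p)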

Regards,

shantanu kumar

Guest Pankaj K.

Many thanks, Shantanu.

WISHING EVERYONE IN THIS FORUM A VERY HAPPY NEW YEAR 2011

Hi All,

Below is a new situation:

I am working on a GB project, and here is the situation. The project is to reduce the average closure time for Prio1 and Prio2 tickets received from the customer (Prio1 and Prio2 are severities the customer assigns to tickets after the product is delivered).

Here is what I have followed; I request everyone to confirm whether the steps below are correct or not:

1. Samples were collected separately for Prio1 and Prio2.

2. Each ticket has a start date and an end date, i.e., when you got the ticket and when you closed it.

3. So for Prio1 and Prio2, for each sample, I calculated the difference (end date - start date).

4. I thereby have a big list of Prio1 tickets with closure days (end date - start date), and similarly for Prio2.

Let's take only Prio1 for the time being.

5. I executed a normality test on 60 samples under Prio1 and found that the days to closure (end date - start date) are NOT normal.

6. This is a PASS/FAIL kind of opportunity: either you solve the ticket within the defined period or you don't. So it is a PASS or FAIL situation.

7. I used the Six Sigma calculator to calculate the DPMO as below:

Total size = 60

Number of defects = 45 (tickets that could not be resolved in the prescribed time under Prio1)

Number of opportunities = 1 (since it is either pass or fail)

Through this I got the DPMO and the sigma level.

8. Hence I got the sigma level for both Prio1 and Prio2.

Please let me know whether the above steps are OK or not, and whether there are any deviations.

Now, I am planning to draw the existing process as a flowchart to understand the process flaws in it.

Let me know whether my direction is right. Your response will be highly appreciated.


Dear Shantanu and All,

First, I cannot have file size as part of the CTQ, as it is not possible to collect that data; this information is not stored anywhere. I can, however, get the file length, though it is very difficult to extract that now.

As advised, I have collated three days' data for each shift and put them in three different columns. I tried the Test for Equal Variances (Levene/Bartlett). In the dialogue box, I put the first shift's data in the "Factors" box and the second and third shifts' data in the "Response" column. Levene's p-value is 0.785 and Bartlett's p-value is 0.152. Please let me know whether what I followed is correct.

The Normality Test (Stat > Basic Stats > Normality Test) resulted in p < 0.005 for all three shifts.

I could not work out the minimum sample size required for ANOVA (Stat > Power and Sample Size > One-Way ANOVA); I am not sure what data should go in which field.
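In case it helps map the Minitab fields, here is a sketch of the equivalent calculation with statsmodels; the effect size, alpha, and power below are illustrative assumptions, not values from this thread:

    # Minimum sample size for a one-way ANOVA at a chosen effect size and power.
    from statsmodels.stats.power import FTestAnovaPower

    n_total = FTestAnovaPower().solve_power(
        effect_size=0.25,  # Cohen's f; 0.25 is a conventional "medium" effect
        alpha=0.05,        # significance level
        power=0.8,         # desired power
        k_groups=3,        # one group per shift
    )
    print(n_total)         # total observations needed across all three groups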


Dear Yunus,

Two important points.

  • Before you use the Test for Equal Variances, you need to stack the data. This can be done using Data >> Stack >> Columns. Use subscripts as well, so that you get the shift name in a second column. After stacking, your data will be in two columns, one carrying the shift data and the other carrying the shift names. Use them and you will get the right results.

  • You should have done the normality test first. If the normality test shows a p-value less than 0.05, the data is not normally distributed and you cannot use ANOVA (there is no need to test equality of variances). You need to use a test for non-normal data (comparing medians); see the sketch below.
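A minimal sketch of both points, assuming the three shift columns sit in a CSV file (the file and column names are hypothetical placeholders):

    # Stack the shift columns into (shift-name, value) pairs, mirroring
    # Minitab's Data >> Stack >> Columns, then test variances and medians.
    import pandas as pd
    from scipy import stats

    df = pd.read_csv("shifts.csv")   # columns: shift1, shift2, shift3
    stacked = df.melt(var_name="shift", value_name="time").dropna()
    groups = [g["time"].values for _, g in stacked.groupby("shift")]

    # Equal-variance test on the stacked data
    print(stats.levene(*groups))

    # If normality failed (p < 0.05), compare medians instead of means;
    # Kruskal-Wallis is one common non-parametric alternative to one-way ANOVA.
    print(stats.kruskal(*groups))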

Regards,

VK

Guest Pankaj K.

Hi Vishwadeep,

Based on your last response to use the YIELD method, I have a different viewpoint; can you please check and reply?

Here is the link:

http://forum.benchma...closure+%2Bdays

I have a reason to debate your answer :) The approach I used, i.e., number of units, number of defects, and number of opportunities, gives me the same sigma level as the yield method.

1. My approach. Here are the data:

  • Number of units: 82
  • Number of defects: 45
  • Number of opportunities: 1
  • This gives me "Sigma Level = 1.62"

2. YIELD approach:

I plugged the yield percentage (i.e., 45 out of 82, i.e., 54.87%) into the YIELD section of the Sigma Level Calculator, and it also gives me 1.62.

Hence, the question is: are they really different? If yes, are there any concrete examples?
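For what it is worth, under the usual conventions the two routes are the same formula. A minimal sketch (assuming the customary 1.5-sigma shift; an actual calculator may define its inputs differently):

    # Sigma level via DPMO vs. via yield, using the ticket numbers above.
    from scipy.stats import norm

    units, defects, opportunities = 82, 45, 1

    # Route 1: via DPMO
    dpmo = defects / (units * opportunities) * 1_000_000
    sigma_dpmo = norm.ppf(1 - dpmo / 1_000_000) + 1.5

    # Route 2: via yield (identical algebra, since yield = 1 - DPMO/1e6)
    process_yield = 1 - defects / (units * opportunities)
    sigma_yield = norm.ppf(process_yield) + 1.5

    print(dpmo, sigma_dpmo, sigma_yield)   # the two sigma values always match

Note that with these inputs both routes give roughly 1.38 rather than 1.62; 1.62 is what norm.ppf(45/82) + 1.5 returns, i.e., it corresponds to entering 54.87% as the yield rather than as the defect rate. So the two methods agreeing is expected: they are one formula fed the same number.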

