
Question

Q 36 - Measures of central tendency and spread have their own relevance in most situations. Please explain situations where variation (spread) is of highest importance and central tendency is very low in importance (or is irrelevant.) 

 

This question is a part of the November Episode and can be answered by approved Excellence Ambassadors till 10 PM on November 2, 2017. There are many rewards. Being regular earns you a reward. Even a streak of 3 good answers can get you a reward. Rewards are mentioned here - https://www.benchmarksixsigma.com/forum/excellence-ambassador-rewards/

 

All questions so far can be seen here - https://www.benchmarksixsigma.com/forum/lean-six-sigma-business-excellence-questions/ 


22 answers to this question

Recommended Posts

  • 2

By and large, we come across situations where we want the mean value of a process outcome (central tendency) to be centered on a specified target value with as little variation (dispersion) as possible. There are situations where variation assumes relatively higher importance than central tendency, mostly because high variation is less tolerable than a modest shift in central tendency. Interestingly, there are also situations where variation, or controlled variation, is actually advantageous.

 

Study of Process Potential:

The process potential index Cp is used to study the variation, or spread, of a process with respect to the specification limits. While studying process potential, we are interested in the variation and not in the central tendency. The underlying idea is that if the process can maintain its variation within the specified limits, it is considered to possess the required potential; the centering of the mean can usually be achieved through setting adjustments. In other words, if Cp is not satisfactory, a satisfactory Cpk (process capability) can never be achieved, since Cpk can never exceed Cp; at best it can equal Cp.
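As a rough illustration (the specification limits and sigma below are hypothetical), Cp depends only on the spread, while Cpk also penalises off-centering, which is why Cpk can never exceed Cp:

```python
def cp(usl, lsl, sigma):
    # Process potential: spec width relative to the 6-sigma spread (centering ignored)
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mu, sigma):
    # Process capability: distance from the mean to the nearer spec limit, in 3-sigma units
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Same spread, two different centerings (hypothetical spec 9.5-10.5, sigma = 0.1)
print(round(cp(10.5, 9.5, 0.1), 2))         # Cp is fixed by the spread alone
print(round(cpk(10.5, 9.5, 10.0, 0.1), 2))  # centered: Cpk equals Cp
print(round(cpk(10.5, 9.5, 10.2, 0.1), 2))  # off-center: Cpk drops below Cp
```

Shifting the mean changes Cpk but leaves Cp untouched, which is why a poor Cp cannot be rescued by centering adjustments.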

 

Many examples where the variation is generally considered unfavorable to the outcome:

1. Analysis of Variance

While evaluating whether there is a significant difference between the means (central tendency) of multiple sets of trials, as in ANOVA, the variation between sets and within sets is compared using an F test. Thus, in such situations, the comparison of variation assumes high importance.
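A minimal sketch of that between-versus-within comparison (the three sets of trials are made up; in practice one would use a statistics package):

```python
def f_statistic(groups):
    # One-way ANOVA: between-group variance divided by within-group variance
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)  # between sets
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)    # within sets
    return (ssb / (k - 1)) / (ssw / (n - k))

# Three hypothetical sets of trials: similar within-set spread, shifted means
trials = [[20, 21, 19, 20], [25, 26, 24, 25], [30, 29, 31, 30]]
print(round(f_statistic(trials), 1))  # a large F signals the means differ beyond chance
```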

2. Relative grading systems

For many competitive examinations, the concept of a 'percentile' is used, which is effectively a relative grading system. Here, more than a student's absolute mark, the student's position relative to the rest of the distribution matters, so relative variability becomes the key deciding factor.
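A small sketch of percentile ranking (the marks are made up, and conventions for handling ties vary between exam bodies):

```python
def percentile_rank(score, all_scores):
    # Percentage of candidates scoring at or below this score
    return 100 * sum(1 for s in all_scores if s <= score) / len(all_scores)

marks = [35, 48, 52, 60, 67, 71, 74, 80, 88, 95]
print(percentile_rank(74, marks))  # position in the distribution, not the raw mark
```

The same raw mark of 74 could land at a very different percentile in a different cohort, which is exactly why the relative spread, not the absolute value, drives the grade.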

3. Control chart analysis

While studying a process using a control chart, instability and variation are given importance first. Only when we have control over these can we meaningfully study how far the process is 'off-target', i.e. its central tendency.
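One way to see this: on an individuals control chart, the control limits come entirely from the observed variation (via the average moving range; d2 = 1.128 is the standard constant for subgroups of two). The sample data here is hypothetical:

```python
import statistics

def individuals_limits(samples):
    # Estimate short-term spread from the average moving range, then set 3-sigma limits
    moving_ranges = [abs(b - a) for a, b in zip(samples, samples[1:])]
    sigma_hat = (sum(moving_ranges) / len(moving_ranges)) / 1.128  # d2 for n = 2
    centre = statistics.mean(samples)
    return centre - 3 * sigma_hat, centre, centre + 3 * sigma_hat

lcl, centre, ucl = individuals_limits([10.2, 10.5, 10.1, 10.4, 10.3, 10.6])
print(round(lcl, 2), round(centre, 2), round(ucl, 2))
```

If the spread were out of control, these limits would be meaningless, so the centre line is only interpretable once the variation is stable.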

4. Temperature variation in a mold

In certain compression molding processes, temperature variation across different points on the mold surface does more harm than the mean temperature. The mean temperature is permitted a wider tolerance, but variation across the mold causes warping of the product.

5. Voltage fluctuations

Many electrical appliances get damaged due to high variation (fluctuation) in the voltage, although the mean voltage (central tendency) is maintained.

 

Controlled variation is favorable:

1. Load distribution in a ship

While loading a ship the mean value of the load can vary, but the distribution of the load is more important to maintain the balance of the ship on water.

2. Science of music

Those who understand the science of music would agree that more than the base note, the appropriate variation of the other notes with respect to the base note is extremely important to produce good music.

 

Some examples where variation is favorable:

  • Systematic Investment plans (SIPs) take advantage of the variation in the NAVs to accumulate wealth. Here even an adverse shift of the central tendency is compensated by the variation!
  • A law of physics states that Force = Mass × Acceleration (F = ma). If we consider speed as the variable, acceleration is the rate at which speed changes, so it is the variation of speed over time that determines the force; the mean speed (central tendency) has little relevance.

  • 2

Those who have already heard this very old joke must be old themselves. A tourist passing by a picturesque lake feels like having a refreshing bath in spite of not knowing how to swim. To check, he asks a local relaxing on the bank, “Hi, what is the depth of this lake?”, to which the local casually drawls, “Around three feet”. Reassured, the six-foot tourist happily descends into the lake and is shocked to feel himself sinking slowly in an apparently bottomless lake. With an effort, he screams, “You said three feet, but I am sinking!”. The guy on the bank stops chewing the grass in his hand and says, “The average depth is three feet. At the point you are in, it is 20 feet”.

 

Apart from the above apocryphal situation, any one of the following situations or a combination of two or more of the following situations could also see variation having more relevance than central tendency.

 

1.     Relatively easy target

Sometimes, perhaps due to the technology used, the target centre in a process would be relatively easy to achieve. There would not be any effort required to meet the target. Further improvement in performance is possible only in reducing variation. Therefore, the focus now shifts to minimising the variation.

 

2.     Narrow specification range

For some processes, the target specification range could be very narrow. With such a low tolerance, variation needs to be very low. So the focus changes to minimize variation.

 

3.     Many downstream processes based on the output

If many subsequent processes are to be run based on the output of a process, the focus would be on restricting the variation of that output, since planning and running the downstream processes is easier and cheaper when their inputs are within control. If the output of the process in question were to vary beyond control, there would be rework and/or scrap, both of which are wastes, before the next process could start, and this would repeat for every subsequent process. To reduce these wastes, the only option is to ensure that the process spread is minimal.

 

4.     Use of less robust machines for further processing

If the machines further processing this output are not very robust and require their inputs to be within a small tolerance, then the focus would be on reducing variation rather than on central tendency. If control of variation is ignored, the machines in the next process will not be able to handle the input and either breakdown or produce sub-optimal output resulting in waste.

 

5.     Batch Processing

When the next process is a batch process, its settings apply to the entire batch and require the batch inputs to vary only within certain specifications. The focus is therefore on containing variation: as long as the input variation is within control, the average does not matter, because the settings can be adjusted accordingly. If variation is uncontrolled, the batch may have to be split into two or more batches and processed under different settings, involving additional cost and delays.

  • 1

When a process is highly unpredictable, variation becomes a more relevant measure than central tendency. When special causes have a major influence on the system's variation, variation is again more relevant than central tendency. And the more skewed the data, the more relevant variation becomes compared to central tendency.

  • 0

A particular example of this comes from gauges (mechanical engineering), where the spread drives costing and the selection of the manufacturing method.

Practical example:

1) A shaft/rod of 50 mm diameter (50 mm is the central tendency), with tolerance ranges:

a) ±0.5 mm (the spread becomes 49.5 - 50.5)

b) ±0.2 mm (the spread becomes 49.8 - 50.2)

c) ±0.05 mm (the spread becomes 49.95 - 50.05)

d) ±0.02 mm (the spread becomes 49.98 - 50.02)

When we look at these values, the mean dimension of 50 mm (the central tendency) is almost irrelevant; it is the tolerance that plays the vital role in selecting the manufacturing method (turning, grinding, honing, lapping, etc.) and, likewise, the inspection method and instrument (vernier caliper, micrometer, dial gauge, etc.).

In fact, the tolerance also defines the process chain and the costing: for (a), turning alone is sufficient (or in some cases even an extruded bright bar as-is), whereas for (d), turning, grinding and honing may all be required. This determines the number of operations to be done, and the cost rises steeply as you move to more precise manufacturing methods.
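The a) to d) cases above can be sketched as a simple lookup. The mapping is purely illustrative (real process selection also depends on material, batch size, and demonstrated machine capability):

```python
def candidate_processes(tolerance_mm):
    # Tighter tolerance (smaller allowed spread) -> longer, costlier process chain
    if tolerance_mm >= 0.5:
        return ["turning"]                                # case a)
    if tolerance_mm >= 0.2:
        return ["turning", "grinding"]                    # case b)
    if tolerance_mm >= 0.05:
        return ["turning", "grinding", "honing"]          # case c)
    return ["turning", "grinding", "honing", "lapping"]   # case d)

print(candidate_processes(0.02))  # the 50 mm mean never enters the decision
```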

 

 

  • 0

Data will always show variation. One of the key questions is whether the variation is normal for the process or unexpected, indicating that something out of the ordinary is happening. Thus variation is far more important for understanding process behaviour. Variation gives us meaningful insights about the process over a period of time.

 

If common causes are producing too much variation in the system, then improvement is required. Since common causes create the normal, everyday variation in the system, improving them involves systemic changes. However, if special causes are producing problems in the system, they will be specific events. In such scenarios, one needs to understand the underlying cause, optimize the current conditions of the process, and put controls in place to prevent recurrence.

 

Processes with more variation are more difficult to predict and are unstable.

For example, imagine you are part of the old Indian cricket selection board and you must choose either Rahul Dravid or Virender Sehwag for a crucial match for India. Whom do you choose? Quite obviously, one would select Rahul Dravid, because he is more predictable and stable compared to Sehwag.

In this example, if you look closely at the average scores of Dravid and Sehwag, they are about the same, or Sehwag's average may even be better than Dravid's. Yet the measure of central tendency, Sehwag's average score, does not by itself qualify him for the squad, because of his unpredictability. Hence it is not always the measure of central tendency but the variation that helps us bridge the gap in a process, and understanding the variation of a process is more important when deciding its next course of action.

  • 0

Measures of central tendency like mean, median and mode are typically useful when spread is small. They are not representative of the data if the spread of data is high.

 

Measures of spread or variation help describe the variability in data. One of the main reasons for the importance of measures of spread is their relationship with measures of central tendency: they tell us how well the mean represents the data.

 

The range is the simplest measure of spread. While it is not often used on its own, it helps identify the boundaries of the data and instantly flags any values that may be outliers.

 

Quartiles are a useful measure of spread because they are less affected by outliers or a skewed data set than the equivalent measures of mean and standard deviation; therefore quartiles are often reported along with the median. The interquartile range is a popular measure that describes the difference between the third and first quartiles, helping us understand the range of the middle half of the values in the data. Quartiles are, however, limited in that they do not take every score into account.
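A quick sketch with Python's standard library (made-up data with one outlier), showing how the IQR shrugs off an outlier that blows up the range:

```python
import statistics

data = [1, 3, 4, 5, 5, 6, 7, 40]  # 40 is an outlier
q1, median, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
iqr = q3 - q1

print(max(data) - min(data))  # range: dominated by the outlier
print(iqr)                    # IQR: barely affected by it
```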

 

The absolute deviation, standard deviation and variance address this concern, as they account for the deviation of every score from the mean. A naive approach would be to sum the deviations from the mean and divide by the number of measures, but the negative and positive deviations cancel each other out. The mean absolute deviation addresses this by ignoring the sign of each deviation before averaging. The variance instead squares each deviation before averaging, and the standard deviation is the square root of the variance.
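The three measures side by side, on a small made-up sample:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)

# Mean absolute deviation: ignore the sign of each deviation, then average
mad = sum(abs(x - mean) for x in data) / len(data)

# Variance squares the deviations instead; SD is its square root
variance = statistics.pvariance(data)  # population form, dividing by n
sd = statistics.pstdev(data)

print(mad, variance, sd)
```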

 

Both the mean and the standard deviation are highly influenced by extreme observations, so they are more suitable for numerical data that is roughly symmetric. For asymmetric data, the five-number summary comprising the minimum, Q1, median, Q3 and maximum is a more useful measure.

 

Measures of central tendency and spread are often therefore used in conjunction to provide an overall description of set of data.

  • 0

When the data set is huge and the range is high, central tendency wouldn't be of much use and we would use the spread.

 

One real-life example: when comparing the salaries of state Ranji players and international cricketers, the central tendency would not reveal the true distribution of the data.

 

 

  • 0

Both measures of central tendency and variance can be used for decision making, but the variance, or spread, is relied upon more, especially in life-critical industries like automobile, aerospace and medical equipment. This is because variance takes account of each data point, which is crucial for these industries. Unlike central tendency, it never generalizes a set of data points; in fact, it uses the central tendency as a reference to assess each data point and arrives at the spread that the process or product is capable of.

 

Central tendency can be misleading at times. Let me quote one example I studied during my BB course at Benchmark. A decision has to be taken on whether a newly appointed cricket coach can continue in his job, so a survey is conducted; the coach needs a minimum of 25 percent support to continue. Of 2000 people, 482 support the coach, which works out to 24.10 percent and might go against him. But if we run a one-proportion hypothesis test to check whether the support has truly gone below 25 percent, it sheds more light on the matter and may go in favour of the new coach: the test does not merely go by the division of numbers but takes factors like the confidence level and p-value into consideration.
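A sketch of that check using a normal-approximation one-proportion z-test with the figures from the example (a statistics package would typically offer an exact test as well, so treat the numbers as indicative):

```python
import math

n, x, p0 = 2000, 482, 0.25           # survey size, supporters, required threshold
p_hat = x / n                         # 0.241, the raw "division of numbers"
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# One-sided p-value for H1: true support < 0.25 (normal CDF via erf)
p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(round(p_value, 3))  # well above 0.05: the data do not prove support fell below 25%
```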

 

If we want to regulate something like the speed of a car, we establish a monitoring mechanism where a car can go as slow as x km/hr or as fast as y km/hr. Any deviation beyond these limits by any car is tracked and reported through a device.

 

  • 0

It is true to say that the different measures of central tendency and spread have their own relevance and uses. The measures of central tendency are the mean, median and mode.

 

The mean is very susceptible to outliers, so using it when some values are far removed from the rest of the data would make the analysis futile.

The mode would not be a good measure when the most frequently occurring value is far away from the rest of the data.

 

Variance, on the other hand, helps in understanding whether the data is close to the mean or not, making the analysis more insightful. E.g., in a class of students, the mean for one subject was 75%, and the variation calculated for that subject was very small. This lets the teacher say with confidence that the majority of students scored around 75%. Thus using measures of central tendency together with spread helps in making more informed decisions.

  • 0

Central tendency is usually expressed by a measure of location such as the mean, median or mode. For a normal distribution, or for continuous data generally, we prefer the mean, but the mean is very sensitive to outliers; it shifts drastically with extreme values. For example, a group of five people each earning a salary of one lakh has an average of one lakh, but if someone with a salary of ten lakh joins and we average all six people, the mean jumps from one lakh to 2.5 lakh, which misrepresents the group since five of them still earn only one lakh. So in situations where the data is skewed or has extreme values, we prefer the median, as it is robust to outliers; the median is also used when the distribution is skewed or the number of subjects is small. The average salary of the engineers in a company, for instance, would increase drastically if we included the salaries of the CEO and upper management. The mode is used for measuring the central tendency of discrete data. The median and mean each have only one value, while the mode can have more than one value in a data set.
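The salary example in numbers (salaries in lakhs, illustrative):

```python
import statistics

salaries = [1, 1, 1, 1, 1]           # five people, one lakh each
print(statistics.mean(salaries))      # 1.0

salaries.append(10)                   # one ten-lakh earner joins
print(statistics.mean(salaries))      # jumps to 2.5, though five still earn 1
print(statistics.median(salaries))    # stays at 1.0: robust to the outlier
```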

 

A measure of dispersion is a measure of spread; in other words, variation is the amount of dispersion or spread in a set of data. Three frequently used measures of variation are the range, the variance and the standard deviation. The range is equal to the largest value minus the smallest value. It does not give full information, so we also use the interquartile range, which is the difference between the third and first quartiles; quartiles divide the sample into four equal parts. The range does not consider how the values are distributed around the mean. Two commonly used measures of variation that do take this into account are the variance and the standard deviation, which measure how the values fluctuate around the mean. Consider five readings of an instrument: 0, 0, 0, 0, 25, and another five readings: 5, 5, 5, 5, 5. In both cases the mean is the same, but the variation and spread are much larger in the first case, so it is not enough to consider only the central tendency via the mean; we need to look at the dispersion in the data as well. The range, variance and standard deviation are always greater than or equal to zero: the more spread out the data, the larger they are, and if all the values are identical, the range, variance and standard deviation are all zero.

In a class, we would focus more on the range of the students' grades, the minimum and the maximum, so that extra effort can be made to bring all the students to the same level. We would not consider only the average grade of the class, because then we would leave behind the students with low grades, which is not good for the school's results or for improving student performance.

Another example is the consumption of anti-rust washing oil in manufacturing industries: consumption per part produced is much lower in summer than in winter and the rainy season, when more oil is needed to prevent rust. A yearly average of oil consumption would therefore be misleading; it is better to use the median and look at the range of consumption. Similarly, car sales are higher during festivals than on normal days, ice cream and cold drink sales are higher in summer than in winter, and air-conditioner sales peak in summer. So it depends on the data what we should consider when measuring its variation and central tendency.
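The two sets of instrument readings above, checked directly:

```python
import statistics

first = [0, 0, 0, 0, 25]
second = [5, 5, 5, 5, 5]

print(statistics.mean(first), statistics.mean(second))      # same mean: 5 and 5
print(statistics.pstdev(first), statistics.pstdev(second))  # spreads differ: 10.0 vs 0.0
```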

  • 0

Two kinds of statistics are most often used to describe data. They are measures of central tendency and measures of dispersion (spread). They are also called descriptive statistics because they help describe your data.

Mean, median and mode are all measures of central tendency. They help to summarize and communicate data with a single number and in such a manner that they are easy to understand.

Range, variance and standard deviation are all measures of dispersion. They are used to describe the variability in a sample or population and help us know the spread of scores within a set of scores: whether the scores are close together or far apart.

For example, if we were to record the heights of students in a class and conclude how tall the students are, measures of central tendency like the mean, mode or median are used. But if one wants to know how much the heights vary, or how many students are, say, 5 feet 2 inches tall, then measures of dispersion like the range, variance or standard deviation need to be used. When conducting research, generally only a random sample of the data is examined, as examining all the data would incur heavy cost and time.

Whether to use the median, mean or mode will depend on the type of data (such as nominal or continuous), on whether the data has outliers and/or is skewed, and on what is to be inferred from the data. One should not use the mean where the data is skewed; instead one should normally use the median or mode, with the median usually preferred.

The first step in assessing spread of data is to examine it in either a table or in a graphical form. In a graph one can clearly see symmetry (or lack of it) in the spread of data, whether there are obvious atypical values (outliers) and whether the data is skewed in one direction or the other. It is extremely important to detect outliers within a distribution, because they can alter the results of the data analysis. The most important and useful distribution of data in statistical analysis is the normal distribution which is characterized by a bell-shaped curve when interval data is represented by a histogram or line graph.

When faced with a sample that comprises non-normally distributed (skewed) data, there are two choices: accept the distribution as it is and use statistical methods suited to it, or attempt to transform the data into a normal distribution. Common methods of transforming skewed data into a normal distribution are the logarithmic, square root, and reciprocal transformations.

Two types of standard deviation can be used to describe the variability of data: the population standard deviation and the sample standard deviation. For example, if a researcher has recruited males aged 50 to 65 to investigate risk markers for heart disease (e.g. cholesterol), he would use the sample standard deviation because, though not explicitly stated, he is not concerned only with the participants of the study; he wants to generalise the results to the whole population, in this case males aged 50 to 65. Whereas if a teacher sets an exam for his students, he would use the population standard deviation because he is interested only in the scores of his own students and nobody else.
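The two forms differ only in the divisor (n versus n − 1); a quick check with made-up scores:

```python
import statistics

scores = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]
print(statistics.pstdev(scores))  # population SD: divide by n (the teacher's case)
print(statistics.stdev(scores))   # sample SD: divide by n-1 (the researcher's case)
```

The sample SD is always slightly larger, compensating for the fact that a sample tends to understate the population's spread.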

 

 

  • 0

In defect control: identification of data points above the USL and below the LSL, for example delays in pizza delivery.

Also during process engineering, when the data is skewed.

 

  • 0

Measurement of central tendency (the centre of the data) and spread (the measure of variability) are part of descriptive statistics, presenting the key to understanding data and the process that generated it. The different measurements of central tendency aim to capture what might variously be termed the typical, normal, expected or average value of a set of data; three ways to quantify the central tendency of a data set are the mean, median and mode. The spread of the data can be examined in a table or in graphical form; three ways to quantify the spread of a data set are the range, variance and standard deviation. A graph makes clear any symmetry (or lack of it) in the spread of the data, whether there are obvious atypical values (outliers), and whether the data is skewed in one direction or the other (a tendency for more values to fall in the upper or lower tail of the distribution). For assessing and understanding data with extreme outliers, measures of central tendency have less relevance.

  • 0

Let us see some basic information about Central Tendency and Variation

 

Central Tendency

It is about the tendency of data to cluster around some value. It is used to determine the location and is usually expressed by mean, median or mode.  

 

Variation

It is the amount of spread (dispersion) in a set of data, which can be either a sample or a population. It is usually expressed by the range, standard deviation, variance or interquartile range.

Now let us see two cases.

 

Case 1:

A cricketer’s (sportsperson's) performance is being evaluated after he has returned from injury. The chairman of the cricket board is evaluating the player’s past performance (runs scored) across various countries and cricketing venues, in all the bilateral series (a few cricket matches played between two cricket-playing nations) that the player participated in over the last 2 years. Let us see how the player has performed.

 

The performance measured here is for the Limited Overs international match [which is a form of cricket match – with a limitation on the overs, to be delivered (bowled)]

 

The table below contains the player's batting performance against various opponents for the past 2 years (player: Raj).

 

| Opponent | Match 1 | Match 2 | Match 3 | Match 4 | Match 5 | Total | Average (Mean) |
|---|---|---|---|---|---|---|---|
| Country 1 | 40 | 35 | 70 | 100 | 35 | 280 | 280/5 = 56 |
| Country 2 | 20 | 0 | 0 | 10 | 20 | 50 | 50/5 = 10 |
| Country 3 | 10 | 40 | 60 | 10 | 5 | 125 | 125/5 = 25 |
| Country 4 | 30 | 20 | 56 | 70 | 38 | 214 | 214/5 = 42.8 |
| Country 5 | 40 | 35 | 36 | 80 | 45 | 236 | 236/5 = 47.2 |
| Country 6 | 100 | 95 | 140 | 100 | 50 | 485 | 485/5 = 97 |
| Grand Total |  |  |  |  |  | 1390 | 1390/30 = 46.33 |

 

Observed fact from the table: the player got out in every one of those matches, after scoring those runs.

 

Now a simple look at the table tells us that the player has been inconsistent in each of the bilateral series played against the different countries. There is a lot of variation within each series, and even from an overall perspective. But the mean (average) looks decent by cricketing standards.
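Looking at the same numbers directly, the per-series spread makes the inconsistency obvious even though the overall mean looks respectable:

```python
import statistics

runs = {
    "Country 1": [40, 35, 70, 100, 35],
    "Country 2": [20, 0, 0, 10, 20],
    "Country 3": [10, 40, 60, 10, 5],
    "Country 4": [30, 20, 56, 70, 38],
    "Country 5": [40, 35, 36, 80, 45],
    "Country 6": [100, 95, 140, 100, 50],
}

# Mean and sample standard deviation per bilateral series
for country, scores in runs.items():
    print(country, statistics.mean(scores), round(statistics.stdev(scores), 1))

all_scores = [s for scores in runs.values() for s in scores]
print("overall mean:", round(statistics.mean(all_scores), 2))  # the decent-looking 46.33
```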

 

In this case, would the cricket chairman consider the variation in the player’s performance, or would he take into account the player's average, which seems good? Considering the player's recent past performance and his average, the chairman would select the player for the upcoming cricket series. Therefore, here the mean (average), a measure of central tendency, is given high importance, and the variation, which is quite large in this case, is not considered.

 

Case 2:

A heart patient, hospitalized for a month, is monitored for variations in his blood pressure. For a normal human being, it should be above 90/60 and below 120/80. Here the doctor is keen to find how much the patient’s blood pressure varies throughout the day on a routine basis, and is not bothered about the average (mean) value of the patient’s blood pressure in a day.

 

The doctor is concerned about the variation in the blood pressure, which should not cross the stipulated limits; otherwise it would be a health risk or even life-threatening for the patient. Hence we can infer that in this case variation is given the higher importance, and the mean, a representation of central tendency, the least.

 

Conclusion
Central tendency and variation are two key aspects when we talk about Six Sigma. How much the data is spread out, and how much it is clustered around a particular value, is what we infer from these aspects. As we have seen, in one case importance can be given to central tendency and in another case to variation; it varies with the needs or context in which it is applied.

 

  • 0

A data distribution tends to have a central or most typical value. This concentration of value around a central location is called central tendency and is commonly defined as:

  • Mean (Interval and Ratio Level Data)
  • Median (Ordinal Level Data) and
  • Mode (Nominal Level data)

A variation or spread is a measure of how close to, or far from, the identified central tendency the data points in a data set lie. In most applications, it is desirable to have minimum variation, with data points close to the central tendency.

 

But in some applications, variation (spread) may itself be desirable. A couple of such applications that I could think of are as follows:

1.       Distribution of the Years of Experience /Age of the Workforce in the organization

It is always desirable to have a proper mix of all experience levels (spanning from new entrants to mid-level to senior) in the workforce for a proper balance. The distribution of “Years of Experience” should therefore have variance and should not be concentrated around a single mean value.

Also, if a lot of people in the workforce are around retiring age, then younger people would need to be recruited so that there is no manpower shortage.

 

2.       Population age distribution

A country's population tends to age and become older, and it is necessary that there are enough younger people working, earning money and paying taxes to support the economy. So the age distribution of a country's demographics should ideally be spread out.

  • 0

Q1 - Episode 3 - Measures of central tendency and spread have their own relevance in most situations. Please explain situations where variation (spread) is of highest importance and central tendency is very low in importance (or is irrelevant.) 

 

Measures of central tendency are the arithmetic mean, median and mode, also called measures of central location; they act as summary statistics. The mean is often the best measure to describe a data set, but which measure to use depends on the situation.

 

 

Measures of dispersion, also called statistical dispersion, describe how spread out or stretched a distribution is. They include the variance, range and standard deviation.

 

Measures of central tendency denote the typical values of a dataset, whereas dispersion denotes the variability, scatter and spread of the data around the central value. Dispersion is important because it explains how well the mean represents the dataset collected; in other words, it describes the relationship with the measures of central tendency. When the dispersion is large, the data is spread widely; when it is small, the data is tightly packed. Hence, given a measure of dispersion, one can tell how well the data is spread around the central location.

 

Range: it is a measure of dispersion, which tells the difference between highest and lowest value in the dataset.

Range = Max value – Min value.

 

Variance: It tells us how far the data points are spread out from one another. It is the square of the standard deviation.

 

Standard deviation: It tells us how far the data are spread out from the mean (the central location). It is most useful for data that are approximately normally distributed. The standard deviation is the square root of the variance. The formula for a sample standard deviation (s) is

s = √( Σ(x − x̄)² / (n − 1) )

where x represents each value in the sample, x̄ is the sample mean, Σ denotes summation over all values, and n − 1 is the number of values in the sample minus 1.
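The three measures above can be computed directly from their definitions. Here is a minimal Python sketch (the data values are made up for illustration):

```python
import math

def sample_stats(data):
    """Return the range, sample variance, and sample standard deviation."""
    n = len(data)
    mean = sum(data) / n
    rng = max(data) - min(data)
    # Divide by n - 1 (not n) because this is a sample, not a population.
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    return rng, variance, math.sqrt(variance)

rng, var, sd = sample_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(f"range={rng}, variance={var:.3f}, sd={sd:.3f}")
```

The same results can be obtained from Python's built-in `statistics.variance` and `statistics.stdev`, which also use the n − 1 (sample) convention.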

 

Conclusion:

It is important to measure the spread of the data and reduce the variation around the mean/target of the process, rather than considering only the central location of the collected data set. A well-behaved process should be centred on the target with little variation around the mean. Central tendency alone tells us only where the process is located; the dispersion tells us how the data are spread around that central location.

 

Thanks

Kavitha

 

 

 

 

 

  • 0

It is very important to understand the concept of the 2 measures:

A measure of central tendency is a single value that describes the way in which a group of data cluster around a central value; it is a way of describing the centre of a data set. There are three measures of central tendency: the mean, the median, and the mode. Central tendency can be calculated for a finite set of values or for a theoretical distribution, such as the normal distribution, and is used to identify the location of the centre of various distributions.

A measure of spread (or dispersion), on the other hand, is used to describe the variability in a sample or population. It is mostly used in conjunction with a measure of central tendency, such as the mean or median, to provide an overall description of a set of data.

There are many situations where a measure of spread serves a better purpose than a measure of central tendency. It gives us an idea of how well the mean represents the data. If the spread of values in a data set is large, the mean is not as representative of the data as it would be if the spread were small, because there will probably be large differences between individual data points.

The measure of spread is especially relevant in an unstable process, where the data points are widely scattered around the mean. As noted above, the representativeness of the mean itself is judged through the spread.

Example:

In a QSR (quick-service restaurant) with a VOC process in place, consider the scores from a customer satisfaction report drawn from a large database. If the variation between the scores is very high, the mean will not be representative, and the median may also lie far from the mean. The measure of central tendency then becomes irrelevant, and the measure of spread becomes the more useful metric.
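One common way to check whether the mean is representative is the coefficient of variation (standard deviation divided by the mean). A short Python sketch, using invented satisfaction scores on a 1-10 scale for illustration:

```python
import statistics

# Hypothetical customer-satisfaction scores (1-10 scale); values are illustrative.
scores = [1, 2, 9, 10, 2, 9, 1, 10, 5, 10]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)
cv = sd / mean  # coefficient of variation: spread relative to the mean

# A large CV (here well above 0.5) signals that the mean is a poor
# summary of the data, and the spread carries the real information.
print(f"mean={mean:.2f}, sd={sd:.2f}, cv={cv:.2f}")
```

With this polarized score set the mean sits near the middle of the scale even though almost no customer actually gave a middling score, which is exactly the situation the answer above describes.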

 

 

 

  • 0

Spread shows us how consistent the underlying subject is. Spread becomes more important when we are comparing two or more objects against their own averages, rather than against service levels or targets.

 

For example 

 

A hypothetical scenario..

 

Two people A and B are appearing for a shooting competition. 

 

Person A shoots 10 bullets scattered from the outermost circle to the innermost circle, averaging, say, around 7, with a spread of, say, 4.

 

Person B shoots 10 bullets, all scoring between 3 and 4, with a spread of only 0.5.

 

If we are to train one of these players and nominate them for a tournament, the better bet is Person B. He may have an issue with the basics of aiming, but once that is corrected, he will be more consistent and will achieve better results.
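The comparison can be made numerical with a small Python sketch. The individual shot scores below are invented to roughly match the averages and spreads described above:

```python
import statistics

# Hypothetical ring scores for 10 shots each; values are illustrative.
shooter_a = [3, 10, 5, 9, 4, 8, 6, 10, 7, 8]   # scattered across the target
shooter_b = [3, 4, 3, 4, 3, 4, 3, 4, 4, 3]     # tightly grouped, but off-centre

for name, shots in [("A", shooter_a), ("B", shooter_b)]:
    print(name, statistics.mean(shots), round(statistics.stdev(shots), 2))

# B's small spread means a single correction to his aim should lift
# every shot at once; A's large spread has no such single fix.
```

Here A has the higher mean but the larger standard deviation; B's low spread is the property that coaching can exploit, which is why central tendency alone would point to the wrong trainee.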

 

  • 0

Central Tendency VS Spread

Measure of Central Tendency:

A measure of central tendency is a measurement of central location of set of data. This is generally done by measuring

  • Mean: This is equal to the sum of all the values in the data set divided by the number of values in the data set. Also known as Average value.

  • Median: The median is the middle value of a set of data that has been arranged in ascending order.

  • Mode: The mode is the most frequent score in our data set

 

Measure of Spread:

Measures of spread explain how similar or varied the observed values in a data set are. These include:

  • Range: The range is the difference between the smallest value and the largest value in a dataset.

  • Quartile: Quartiles divide an ordered dataset into four equal parts, and refer to the values of the point between the quarters. A dataset may also be divided into quintiles (five equal parts) or deciles (ten equal parts).

  • Interquartile range: The interquartile range (IQR) is the difference between the upper (Q3) and lower (Q1) quartiles, and describes the middle 50% of values when ordered from lowest to highest.

  • Variance: The variance measures the spread of the data around the mean. Variance is the expectation of the squared deviation of a random variable from its mean.

  • Standard deviation: The standard deviation measures the spread of the data around the mean, i.e. the dispersion of a set of data from its mean. It is calculated as the square root of the variance, which is determined from the deviation of each data point relative to the mean.
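The quartile-based measures in the list above can be computed with Python's standard library (the `statistics.quantiles` function requires Python 3.8+; the data set below is illustrative):

```python
import statistics

# Illustrative ordered data set of 11 values
data = [3, 5, 7, 8, 9, 11, 13, 15, 18, 21, 24]

# n=4 returns the three cut points that divide the data into quartiles;
# n=5 or n=10 would give quintiles or deciles instead.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: the middle 50% of the data

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```

Note that `q2` equals the median, so quartiles connect a measure of spread (the IQR) directly to a measure of central tendency.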

 

Example of situation where variation (spread) is of highest importance and central tendency is very low in importance (or is irrelevant.):

Example 1: Age distribution of a population:

Demography is the statistical study of populations, especially of human beings. If you want to describe the age profile of a population, you should not rely on a measure of central tendency and simply say that the average age is "X". Ideally, you should use measures of spread: give the range of ages, and show how the population is distributed across the different quartiles.

  

Example 2: Height of a door for the data given below:

Data: 5.73 ft, 5.51 ft, 4.37 ft, 4.82 ft, 4.28 ft, 4.91 ft, 4.27 ft, 4.41 ft, 4.97 ft and 4.34 ft

In this example, if you want to determine the height of the door, you should not take the measure of central tendency and say that because the average height in the data is 4.76 ft, the door should be 4.76 ft high. Ideally, you should use the measure of spread and set the door height so that even the 5.73 ft person can pass through easily.
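This can be verified with a few lines of Python using the data above (the 0.5 ft headroom value is an assumption for illustration, not part of the original example):

```python
import statistics

heights = [5.73, 5.51, 4.37, 4.82, 4.28, 4.91, 4.27, 4.41, 4.97, 4.34]

mean_h = statistics.mean(heights)
tallest = max(heights)

# Designing to the mean would block everyone taller than ~4.76 ft;
# the door must clear the upper end of the spread, plus some headroom.
clearance = 0.5  # assumed headroom in feet (illustrative)
door_height = tallest + clearance

print(f"mean={mean_h:.2f} ft, tallest={tallest} ft, door={door_height:.2f} ft")
```

The mean (about 4.76 ft) says nothing about the extremes; the maximum of the spread is the number that actually drives the design decision.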

  • 0

A tough one! However, with a thinking hat on: in some situations, even after segmentation, the spread (within the segmented data) and the way in which the data are spread can hold significance. E.g., if one were to study the spread of the marks of all students in a state/country who took a 12th-class examination, the percentage of students scoring 90-100, 80-90, 70-80, 60-70, etc. could be of "significance" to someone studying the future workforce trends/needs of the country.

 

Such studies hold significance for developed economies, which take proactive steps to ensure stability for generations to come.

 

This (having healthy numbers within all score bands) is important, as having graduates at all levels ensures that all kinds of careers (blue collar, white collar, etc.) are filled with adequate numbers, so that ten years on there is no crisis of one industry or hierarchy lacking the required number of skilled professionals. Scarcity is likely to create imbalance in the societal/national ecosystem.

 

Hope the Example & Imagination took flight !! :-)

 

  • 0

I think that when the spread exceeds the customer's specification limits, controlling the variation becomes more important than the central tendency.

 

  • 0

 

I thought this one was particularly hard, and I must say I am delighted by the top responses.

The three selected answers are by Venugopal R, Mohan PB, and Raghavendra Rao. Study of process potential in my view is the most valid situation where variation rules supreme.

Venu has highlighted ANOVA, relative grading systems, and control chart logic in general, and temperature variation in a mould and the impact of voltage fluctuation on electric appliances in particular. All of these are well thought out and presented. He has also highlighted situations where controlled variation is favourable, and others where variation itself is favourable.

Mohan has presented very strong and valid situations as well. He has highlighted the relative ease of targeting, among others. Mohan's specific example of batch processing was a brilliant piece.

Raghavendra Rao has stated specific scenarios accurately: an unpredictable process, the influence of special causes, and skewness of data. A few examples would have brought his response a few notches up.

While there are several other noteworthy responses too, in my view these three are the better ones, and the one by Venugopal R is the best.

Guest
This topic is now closed to further replies.
