Question 4 in Episode 2:
While continuous data is measured and attribute data is counted, there is sometimes confusion if some specific dataset should be considered continuous or attribute. Provide some examples of confusing datasets and your inference.
Data is defined as a collection of values / useful information that the recipient requires for any analysis. Data is generally used to prove / disprove a hypothesis.
In statistics, data is of two types: quantitative or qualitative.
Qualitative data is descriptive and can be categorized into subgroups for analysis; quantitative data is numerical, which means it is either measurable or countable. Quantitative data is again divided into 2 types: continuous and discrete data.
For example:
Charlie Chaplin is fair, short, has a small mustache, a thin build and wears a black colored jacket. - It is qualitative data.
Charlie Chaplin has one hat, one walking stick and 2 legs. - It is quantitative discrete data.
Charlie Chaplin, aged 45 years, weighs 57.2 kg and is 4.8 feet tall. - It is quantitative continuous data.
4 types of measurement scales:
Measurement scales are divided into four categories: nominal, ordinal, interval and ratio.
- Nominal data: It assigns a label (often a number) as an attribute to any object / animal / person / any non-numerical data. The number only identifies; it carries no order or size.
- Ordinal data: Any data which can be ordered and ranked is called ordinal data. The gaps between the ranks can't be measured.
Eg. 1. The number a horse wears on the race course represents nominal data.
2. The winning horses are ordered and ranked as "1st, 2nd and 3rd place" in the race club, which represents ordinal data. Another good example is a student's progress report.
- Interval: It is a numeric scale where we know the order as well as the differences between values. There is no true zero point.
Eg. Room temperature is considered normal if it is between 25 and 28 degrees C. Clock time is another good example of an interval scale, in which the increments are known, consistent, and measurable.
- Ratio: Ratio scales are the most informative measurement scales because they tell us the order, they tell us the exact difference between values, AND they also have an absolute zero, which allows a wide range of both descriptive and inferential statistics to be applied. Everything above about interval data applies to ratio scales, plus ratio scales have a clear definition of zero.
Good examples of ratio variables include height and weight.
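To see why the missing true zero matters, here is a minimal Python sketch. The temperatures of 10 and 20 degrees C are made-up values and `celsius_to_kelvin` is just a helper name chosen for this illustration. Differences survive a change of scale, but a ratio only makes sense on a scale with an absolute zero, such as Kelvin.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale reading (Celsius) to a ratio-scale one (Kelvin)."""
    return celsius + 273.15

cool, warm = 10.0, 20.0  # hypothetical room temperatures in degrees C

# Differences are meaningful on an interval scale: they match on both scales.
diff_celsius = warm - cool
diff_kelvin = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)

# Ratios are not: "20 C is twice 10 C" disappears once zero means "no heat".
ratio_celsius = warm / cool
ratio_kelvin = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

print(f"difference: {diff_celsius:.2f} C vs {diff_kelvin:.2f} K")
print(f"ratio: {ratio_celsius:.3f} on Celsius vs {ratio_kelvin:.3f} on Kelvin")
```

On the Kelvin scale the warmer room is only about 3.5% "hotter", not twice as hot, which is why ratios should be reserved for ratio-scale variables like height and weight.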
Qualitative data:
It is otherwise called categorical data.
Quantitative data:
It is divided into two types: continuous and discrete data.
Difference between continuous and discrete data:
| Continuous data | Discrete data |
| --- | --- |
| It is measurable on a scale | It is countable |
| The data can take any value within a finite or infinite range | The data takes only distinct, separate values |
| Can be broken into smaller subdivisions | Can't be broken down since it is a whole number |
| The frequency is depicted in a histogram, where skewness is shown clearly | The values are distinct, hence they are represented in a bar diagram, where skewness can't be seen as clearly |
| Values can be grouped within ranges | The values are individual values |
| Eg. Temperature of a person, height, weight, age, time, cycle time taken to complete a task | Eg. No. of computers, no. of students, no. of books, no. of certificates, no. of errors, etc. |
Confusion between continuous and discrete data:
Eg. 1:
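| Person | Age | Weight (kg) | Height (ft) | Color |
| --- | --- | --- | --- | --- |
| Ajay | 34 | 51 | 5.1 | Wheatish |
| Sharma | 35 | 65.5 | 5.2 | Fair |
| Roshini | 23 | 45.5 | 4.8 | Wheatish |
| Gaithri | 53 | 72.5 | 4.8 | Dark |
| Linda | 43 | 46.5 | 5.1 | Fair |
| Tanya | 36 | 43 | 5.3 | Wheatish |
| Balu | 27 | 56 | 5.6 | Fair |
| Vignesh | 32 | 77 | 6.1 | Dark |
| Aarav | 43 | 76 | 5.9 | Wheatish |
| Rithesh | 45 | 64 | 5.3 | Dark |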
Qualitative data / categorical data:
Categorizing the 10 people in the group as wheatish, dark or fair on the basis of color gives categorical data.
Continuous data:
The age, height and weight of the people in the table above are good examples of continuous data, since these values can fall anywhere within a range.
Discrete data:
No. of wheatish – 4
No. of fair – 3
No. of dark – 3
Total no. of people – 10
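As a quick check, the counts above can be reproduced with a few lines of Python. This is only a sketch: the names and colors are copied from the table in Eg. 1, while the variable names `people` and `counts` are my own choice.

```python
from collections import Counter

# (person, color) pairs copied from the table in Eg. 1
people = [
    ("Ajay", "Wheatish"), ("Sharma", "Fair"), ("Roshini", "Wheatish"),
    ("Gaithri", "Dark"), ("Linda", "Fair"), ("Tanya", "Wheatish"),
    ("Balu", "Fair"), ("Vignesh", "Dark"), ("Aarav", "Wheatish"),
    ("Rithesh", "Dark"),
]

# Color itself is categorical; counting each category gives discrete data.
counts = Counter(color for _, color in people)

for color, n in counts.items():
    print(f"No. of {color.lower()}: {n}")
print(f"Total no. of people: {sum(counts.values())}")
```

The category is qualitative, but the tally per category is always a whole number, which is what makes it discrete.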
Conclusion of Eg. 1: Age is a continuous numerical variable. Although the recorded ages have been truncated to whole numbers, the concept of age is continuous. The number of people of a given age is a discrete numerical variable (a count).
Age is usually rounded down to a whole number, and in that form it looks like discrete data. Though it appears discrete (when all values are shown as whole integers), it is actually continuous data because each recorded value stands for a range. Age is not a constant factor, though the DOB is constant.
It depends on the context of the requirement: let's say the exact age is required to fill in a form. In such a case, though age is recorded as a discrete value, it is continuous.
“12 years, 153 days” really means a continuous age that is between 12Y152.5D and 12Y153.5D.
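The age point can be sketched in Python. The date of birth and the two reference dates below are invented for illustration, and `completed_years` / `age_in_years` are hypothetical helper names; dividing by 365.25 is only an approximation of a year.

```python
from datetime import date

def completed_years(dob, today):
    """Age as usually recorded: whole years, rounded down."""
    years = today.year - dob.year
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1  # birthday not reached yet this year
    return years

def age_in_years(dob, today):
    """Age as a continuous quantity (approximate, using 365.25 days per year)."""
    return (today - dob).days / 365.25

dob = date(1990, 3, 15)    # hypothetical date of birth
day_one = date(2024, 6, 1)
day_two = date(2024, 6, 2)

for day in (day_one, day_two):
    print(day, completed_years(dob, day), round(age_in_years(dob, day), 4))
```

The recorded age stays the same on both days while the underlying age keeps moving, so the whole number is only a label for a range of the continuous variable.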
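Eg. 2: Income is another example of continuous data. Strictly it is counted in the smallest currency unit, but the range of values is so wide that it is treated as continuous.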
Eg. 3:
“In practice, percentage data are often treated as continuous because the percentage can take on any value along the continuum from zero to 100%. In addition, dividing a percentage point into two or more parts still makes sense. Discrete data are easy to collect and interpret.”
A percentage is usually considered continuous, but it depends on the context.
If I have to track the error percentage, the right metric is as below:
Error % = No. of errors (Discrete) / Total charts audited (Discrete)
Hence Error % is discrete.
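A small Python sketch makes the point concrete. The chart totals of 4 and 500 are made-up numbers and `possible_error_percentages` is a hypothetical helper name. Because both the numerator and the denominator are counts, the error % can only land on a fixed set of values.

```python
def possible_error_percentages(total_charts):
    """Every error % that can occur when total_charts charts are audited."""
    return [100 * errors / total_charts for errors in range(total_charts + 1)]

small_audit = possible_error_percentages(4)    # hypothetical audit of 4 charts
large_audit = possible_error_percentages(500)  # hypothetical audit of 500 charts

print(small_audit)
print(len(large_audit), "possible values, spaced", large_audit[1], "% apart")
```

With 4 charts there are only 5 possible results (0, 25, 50, 75 and 100%). With 500 charts there are 501 results spaced 0.2% apart, which is why a ratio of large counts starts to behave like continuous data.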
Another example:
If I have to track the availability of the machine, the formula is as follows:
Availability % = Total hours available (Continuous) / Expected hours of production, i.e. 8 hours (Continuous)
Hence Availability % is continuous, since time is continuous.
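For contrast with a count-based percentage, here is a rough Python sketch of the availability formula. The hours in `readings` are invented, `availability_percent` is a hypothetical helper name, and the 8 expected hours come from the formula above.

```python
def availability_percent(hours_available, expected_hours=8.0):
    """Availability as a percentage of the expected production hours."""
    return 100 * hours_available / expected_hours

# Hypothetical measured hours: time can fall anywhere between two readings.
readings = [7.25, 7.3, 7.3125]

for hours in readings:
    print(f"{hours} h available -> {availability_percent(hours):.4f} %")
```

Between any two availability values another one is always possible, because the measured time can be refined further. That is the sense in which this percentage is continuous.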
Conclusion:
It depends. In certain situations, discrete data may take on characteristics of continuous data. If the counts are large, the distribution of values is relatively wide, and the values are spread across that range, you can “pretend” the data is continuous and use the appropriate tools.
Thanks
Kavitha