Vishwadeep Khatri

Continuous Data and Discrete Data


Dear All,

In principle, data is said to be continuous if its values are obtained by measurement. Because continuous data lies on a continuum, there can always be another value between any two given values.

For example, between 2 mm and 3 mm you can have 2.5 mm. Therefore, a linear measurement (length, width, height) obtained with a measuring instrument is continuous data.

In reality, continuous variables can only be measured to a certain accuracy. Consider a bank balance: the degree of accuracy is 1 paisa, so technically the balance is discrete, i.e., the number of possible values is finite or at least countable. However, as in this bank balance example, if the number of possible values is very large, the data is often treated as continuous.
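The point about finite accuracy can be made concrete with a little arithmetic: a reading with resolution r between a and b can take only (b - a)/r + 1 values. The Python sketch below counts them. The 0.01 mm gauge resolution and the 1,00,000 rupee upper limit on the balance are hypothetical figures chosen only for illustration.

```python
from decimal import Decimal

def count_possible_values(low: str, high: str, resolution: str) -> int:
    """Number of distinct readings between low and high (inclusive) that an
    instrument with the given resolution can report.

    Decimal strings keep values such as 0.01 exact; the range is assumed to be
    a whole multiple of the resolution.
    """
    lo, hi, step = Decimal(low), Decimal(high), Decimal(resolution)
    return int((hi - lo) / step) + 1

# Hypothetical: a length between 2 mm and 3 mm read to 0.01 mm
print(count_possible_values("2", "3", "0.01"))        # 101
# Hypothetical: a bank balance from 0 to 1,00,000 rupees, accurate to 1 paisa
print(count_possible_values("0", "100000", "0.01"))   # 10000001
```

Both counts are finite, which is the sense in which measured data is technically discrete; the second is large enough that, as noted above, it is normally treated as continuous.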

There are several situations where there may be a debate on whether data should be treated as continuous. Any comments?


Hi,

I have a couple of questions based on the above views on continuous and discrete data.

1. Pearson's correlation is meant for continuous data only, but I have seen it applied to discrete data. Can we use it for attrition numbers, which are technically discrete?

2. Can the accuracy % based on defectives, taken week on week, be used to create an I-MR control chart, run a 2-sample t-test, etc., which are meant for continuous data? Or should we use the number of errors over the sample size to create a P chart and run a chi-square test?
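For reference, the P-chart route in question 2 works directly on the counts: the centre line is the pooled proportion p_bar = total defectives / total inspected, and each week's 3-sigma limits are p_bar +/- 3 * sqrt(p_bar * (1 - p_bar) / n_i) for that week's sample size n_i. The sketch below uses hypothetical weekly counts; it shows the calculation only and does not settle which chart is the right choice.

```python
import math

def p_chart_limits(defectives, sample_sizes):
    """Centre line and 3-sigma limits for a P chart with varying sample sizes."""
    # Pooled proportion: total defectives over total items inspected
    p_bar = sum(defectives) / sum(sample_sizes)
    limits = []
    for n in sample_sizes:
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - 3 * sigma)  # a proportion cannot fall below 0
        ucl = min(1.0, p_bar + 3 * sigma)  # ...or rise above 1
        limits.append((lcl, ucl))
    return p_bar, limits

# Hypothetical weekly data: defectives found and items inspected each week
defectives = [12, 9, 15, 11]
inspected = [200, 180, 220, 200]

p_bar, limits = p_chart_limits(defectives, inspected)
for week, (d, n, (lcl, ucl)) in enumerate(zip(defectives, inspected, limits), start=1):
    p = d / n
    status = "in control" if lcl <= p <= ucl else "out of control"
    print(f"week {week}: p={p:.3f}  LCL={lcl:.3f}  UCL={ucl:.3f}  {status}")
```

The limits widen for weeks with smaller samples, which is why the P chart keeps the sample size alongside the percentage.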


Hi Rajendra,

Let me pitch in here: it all depends on how you look at the data.

1. If attrition data is taken in isolation, i.e., just the count of people who left, it is discrete. But since that alone doesn't make much sense, we need to look at it relative to something. If it is seen over a period of time, that makes it continuous; but if the same count is seen over the total number of people hired, it is discrete in nature.

VK, please share your comments.


Hi Shalini,

Interesting questions.

1. Notice carefully: the number of rejections per day divided by the number of files processed per day IS THE SAME AS the number of rejections divided by the number of files. So it remains one discrete number divided by another. If both numbers (numerator and denominator) have very few possible values, the ratio or percentage can take only a few possible values, so the data should be treated as discrete.

2. If you are treating the number of rejections per day as continuous, you will need to test for normality before you can use any of the tests that assume normality.

3. Not all continuous data is normally distributed, and not all normally distributed data is continuous at source.
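Point 1 can be checked by brute force. The sketch below enumerates every value a daily rejection rate can take when, hypothetically, 0 to 3 files are rejected out of 10 to 12 processed. Python's `fractions.Fraction` keeps each ratio exact, so equal ratios such as 2/12 and 1/6 are counted once.

```python
from fractions import Fraction

def possible_ratios(numerators, denominators):
    """All distinct values numerator/denominator can take, in ascending order."""
    # Fraction reduces each ratio, so 0/10, 0/11 and 0/12 collapse to one value
    return sorted({Fraction(r, f) for r in numerators for f in denominators})

# Hypothetical: 0-3 rejections per day out of 10-12 files processed per day
ratios = possible_ratios(range(0, 4), range(10, 13))
print(len(ratios))                                   # 10
print([f"{float(r):.1%}" for r in ratios])
```

However finely the percentage is printed, only these ten rates can occur in this hypothetical setup.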

Questions and comments welcome.

Regards,

VK


I am posting below a query from AMSR on differentiating discrete and continuous data.

Written by Arun Maruthi Selvan Ramasamy 

Hi!!

I had a big discussion regarding a project that I'm currently working on. It boils down to the basic question of how to differentiate discrete data from continuous data.

The data collected was the number of errors found in each drawing.

I assumed that X is each drawing and is discrete, and that Y is the number of errors, which is also discrete because it is a count. However, my BB suggested that Y is continuous because it can take any value from 0 to, say, 100 and is totally random.

I tried to fit the data to a normal distribution using software that we use in our company.

When fitted to a normal distribution (Glog), the p-value was 0.125, which indicates that the data is normal. When fitted to a discrete distribution, the p-value was 0.96, which indicates that the data is discrete. I'm confused.
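What "fitting a discrete distribution" involves for count data such as errors per drawing can be sketched as follows. The software's method isn't described, so this is an assumption: the sketch fits a Poisson distribution by setting lambda to the sample mean, compares observed and expected frequencies, and computes Pearson's chi-square statistic, which would then be compared against a chi-square table (the Python standard library has no chi-square CDF). It also prints the variance-to-mean ratio, which is close to 1 for Poisson data. The error counts are made up.

```python
import math
from collections import Counter
from statistics import mean, variance

def poisson_fit(counts):
    """Fit a Poisson distribution to count data and compare frequencies.

    lambda is estimated by the sample mean. The last bin collects the whole
    upper tail (k >= largest observed count) so expected frequencies sum to n.
    """
    n = len(counts)
    lam = mean(counts)
    k_max = max(counts)
    observed = Counter(counts)
    rows, tail = [], 1.0
    for k in range(k_max):
        p = math.exp(-lam) * lam ** k / math.factorial(k)
        tail -= p
        rows.append((k, observed[k], n * p))
    rows.append((k_max, observed[k_max], n * tail))
    chi_sq = sum((o - e) ** 2 / e for _, o, e in rows)
    return lam, rows, chi_sq

# Hypothetical number of errors found in each of 20 drawings
errors = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 2, 1, 4, 1, 0, 2, 1, 3, 0, 1]
lam, rows, chi_sq = poisson_fit(errors)
print(f"lambda = {lam:.2f}, variance/mean = {variance(errors) / lam:.2f}")
for i, (k, o, e) in enumerate(rows):
    label = f">={k}" if i == len(rows) - 1 else f"{k}"
    print(f"errors {label}: observed {o}, expected {e:.2f}")
# One parameter (lambda) was estimated, so df = bins - 1 - 1
print(f"chi-square = {chi_sq:.2f} on {len(rows) - 2} degrees of freedom")
```

With only 20 drawings the upper bins have expected counts below 5, so in practice they would be pooled further before reading the chi-square result.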

It's a fine line between discrete and continuous data, but what type of data am I currently looking at?

Another question: are there different types of normal distributions and discrete distributions?

Looking forward to a response.

Thanks & Regards,

AMS


Dear Arun Maruthi Selvan Ramasamy,

Kindly refer to the response posted by VK on Sep 8th, and the replies after it, on a very similar query.

Although the number of errors per drawing can take "n" values, so treating it as continuous will be OK, the numerator and denominator are both discrete in nature, so their ratio still remains discrete.

Data passing a normality test doesn't necessarily mean it is continuous. The p-value is above 0.05 because the ratio defined above can take n possible values; i.e., the two tests you ran suggest that you are actually treating discrete data as continuous. That's perfectly OK!

On your third query: for continuous data, use parametric tests, and for discrete data, use non-parametric tests.

I hope I'm making sense....

Regards


Dear VK,

Many times during the analysis phase, the data I work on is REJECTIONS/DAY, number of SRs/DAY, or number of queries uploaded to the system/DAY.

It becomes hard to decide whether I should treat these data types as CONTINUOUS or DISCRETE while carrying out the analysis.

For this type of data, when analysing in Minitab, should we go for STAT >> Quality Tools >> Capability Analysis >> Normal or STAT >> Quality Tools >> Capability Analysis >> Poisson/Binomial?
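A rough way to see the difference between the two non-normal options: a binomial view treats each file as either rejected or not, so it summarises the proportion defective, while a Poisson view counts occurrences (rejections, SRs, queries) per unit of opportunity, so it summarises defects per unit. The sketch below computes those summary numbers for hypothetical daily data. It is not Minitab's output, and the "process Z" conversion (the standard normal quantile of 1 - p_bar) is a common convention assumed here rather than taken from Minitab's documentation.

```python
from statistics import NormalDist

def binomial_view(defectives_per_day, units_per_day):
    """Each unit is either good or rejected: summarise the proportion defective."""
    p_bar = sum(defectives_per_day) / sum(units_per_day)
    ppm = p_bar * 1_000_000
    # Assumed convention: process Z is the standard normal quantile of the yield
    process_z = NormalDist().inv_cdf(1 - p_bar)
    return p_bar, ppm, process_z

def poisson_view(defects_per_day, units_per_day):
    """A unit can carry any number of defects: summarise defects per unit (DPU)."""
    return sum(defects_per_day) / sum(units_per_day)

# Hypothetical four days: rejections and files processed per day
rejections = [4, 6, 5, 5]
files = [100, 100, 100, 100]
p_bar, ppm, z = binomial_view(rejections, files)
print(f"binomial: p = {p_bar:.3f}, PPM = {ppm:.0f}, Z = {z:.2f}")

# Hypothetical queries uploaded per day, treated as a rate (one unit = one day)
queries = [18, 22, 20, 25]
print(f"poisson: mean queries per day = {poisson_view(queries, [1] * len(queries)):.2f}")
```

Roughly, rejections out of files processed fit the binomial picture, while SRs or queries per day, with no fixed number of trials, fit the Poisson picture.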

Request you to help me with this.

Thanking you,

Nieranjan Argade


Dear All,

The debate on discrete vs. continuous data is older than Six Sigma.

In most cases, people start getting confused when the data is expressed as a %.

Here is the golden rule for deciding the data type of proportion data:

discrete numerator, continuous denominator = data is continuous

continuous numerator, continuous denominator = data is continuous

discrete numerator, discrete denominator = data is discrete

continuous numerator, discrete denominator = data is continuous

In proportion cases, the numerator decides the fate of the data type.

These are the theoretical explanations of data type.
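Read literally, the four rules above amount to a lookup table, sketched below in Python; in this table the ratio comes out discrete only when both numerator and denominator are discrete. The names `PROPORTION_RULES` and `proportion_data_type` are illustrative only.

```python
# The four rules for proportion data, encoded as a lookup table.
# Keys are (numerator type, denominator type); values are the resulting type.
PROPORTION_RULES = {
    ("discrete", "continuous"): "continuous",
    ("continuous", "continuous"): "continuous",
    ("discrete", "discrete"): "discrete",
    ("continuous", "discrete"): "continuous",
}

def proportion_data_type(numerator: str, denominator: str) -> str:
    """Return the data type of numerator/denominator under the rules above."""
    key = (numerator.lower(), denominator.lower())
    if key not in PROPORTION_RULES:
        raise ValueError("types must be 'discrete' or 'continuous'")
    return PROPORTION_RULES[key]

# Rejections (a count) over files processed (a count)
print(proportion_data_type("discrete", "discrete"))      # discrete
# Rejections (a count) over hours worked (a measurement)
print(proportion_data_type("discrete", "continuous"))    # continuous
```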

Practically,

Discrete data may behave like continuous data as per the Central Limit Theorem (CLT), or due to incorrect data collection or MSA (measurement system analysis) error.

If there is no MSA error, then continuous data may behave like discrete data due to the biological or mechanical life of the data.

Work with the data, and select tools according to its distribution behaviour, not according to theoretical definitions.
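The CLT point can be illustrated with a quick simulation. Die rolls stand in here for any discrete count, an assumption made for simplicity: a single roll has only six possible values, but the average of 30 rolls can take 151 values (multiples of 1/30 from 1 to 6), and those averages cluster in a roughly bell-shaped pattern around 3.5. The seed and sample sizes are arbitrary.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

# A single die roll is discrete: only 6 possible values.
single_rolls = [random.randint(1, 6) for _ in range(1000)]

# Averages of 30 rolls take many closely spaced values and, by the CLT,
# pile up roughly normally around 3.5 with sd close to 1.71 / sqrt(30).
averages = [mean(random.randint(1, 6) for _ in range(30)) for _ in range(1000)]

print(len(set(single_rolls)))
print(len(set(averages)))
print(f"mean of averages = {mean(averages):.2f}, sd = {stdev(averages):.2f}")
```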

I hope it helps.

Navin Rohilla

BB


Dear Niranjan,

First check theoretically whether the data is discrete or continuous.

Then check the behaviour of the data based on its distribution.

If the data is continuous by definition (theoretically), check whether it follows a normal distribution.

If not, use a non-parametric test. There is no need to check other distribution patterns such as the binomial and Poisson distributions (which are exclusively for discrete data).

If the data is theoretically discrete and behaves like discrete data, check for a binomial distribution (if the answer is in yes/no format), and for ordinal data check for a Poisson distribution.

ALWAYS TAKE CARE OF THE CLT BEFORE CONCLUDING WHETHER ORDINAL DATA HAS DISCRETE OR CONTINUOUS BEHAVIOUR.

USE DIFFERENT TOOLS FOR DISCRETE AND CONTINUOUS DATA AS PER THEIR DISTRIBUTION BEHAVIOUR.

Regards

Navin Rohilla

BB


Hi Durga,

We can certainly use a chi-square test for discrete data.

I think I'm too late to reply to the query on whether the data is continuous or discrete.

If we are testing defectives, the data will follow a binomial distribution (Mr. Niranjan's query) and will be discrete. If we are testing defects, the data will follow a Poisson distribution and is also discrete. And if the data consists of whole numbers, it will be discrete only.
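The defectives/defects distinction can be sketched with the two probability mass functions: the binomial gives the chance of k defectives in a sample of n units, each defective with probability p, and the Poisson gives the chance of k defects on a unit when defects occur at an average rate lambda. Both take only whole-number k, which is what makes the data discrete. The sample size, defective rate, and lambda below are hypothetical.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k defectives in a sample of n), each unit defective with probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(k defects on a unit) when defects occur at an average rate lam per unit."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical: samples of 50 files, 4% defective on average
print([round(binomial_pmf(k, 50, 0.04), 3) for k in range(6)])
# Hypothetical: drawings averaging 2 errors each
print([round(poisson_pmf(k, 2.0), 3) for k in range(6)])
```

One practical difference: the binomial count can never exceed n, whereas the Poisson count has no fixed upper limit.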

However, we can convert discrete data into continuous data.

Adding more, as per Mr. Naveen: we need to check theoretically whether the data is discrete or continuous, then check the behaviour of the data based on its distribution, and then run the normality test (to check whether the data is normal).

I HAVE A QUESTION.

If the data is continuous, what are we going to do after identifying its distribution? Suppose we identify that the data follows an exponential distribution, e.g., the standard exponential y = e^-x for x >= 0.

Secondly, what if the distribution is lognormal, i.e., its logarithm is normally distributed?

Do we still continue to test the data for normality?
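These questions are left open in the thread. As background only: by definition, the logarithm of a lognormal variable is normal, so one common option is to log-transform and check normality on the log scale. The standard exponential with density e^-x has mean 1 and median ln 2 (about 0.69), so its mean sits well above its median, a sign of the skew that makes such data look non-normal. The simulation parameters below (mu = 1.0, sigma = 0.5, 5,000 draws, the seed) are arbitrary choices.

```python
import math
import random
from statistics import mean, median, stdev

random.seed(7)  # fixed seed so the run is repeatable

# Lognormal data: by definition log(X) ~ Normal(mu=1.0, sigma=0.5)
data = [random.lognormvariate(1.0, 0.5) for _ in range(5000)]
logs = [math.log(x) for x in data]
# On the log scale the sample should look normal with mean ~1.0 and sd ~0.5,
# so one option is to run the normality test on `logs` rather than on `data`.
print(f"log scale: mean={mean(logs):.2f}, sd={stdev(logs):.2f}")

# Standard exponential (density e^-x for x >= 0): mean 1, median ln 2
expo = [random.expovariate(1.0) for _ in range(5000)]
print(f"exponential: mean={mean(expo):.2f}, median={median(expo):.2f}, ln 2={math.log(2):.2f}")
```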

Warm Regards

Mahi


Hello Vishwadeep,

One of the interesting dumps we have created in the portal is a list of the possible data types (I will download it and share it with all). In tollgate reviews, MBBs specifically focus on the data-type decision.
It is amazing how the question of data type, and your prompting, leads into various topics like correlation, distributions, chi-square, etc.
I agree with you: it is tricky, but a good learning experience. Thank you once again.
Srinivas
