In recent weeks, I have enjoyed interactions on certain concepts through games on problem solving, why-why analysis and the Pareto principle. This week, let us have a contest that focuses on assessing the genuineness of data.
When I used to conduct management system audits, I faced several amusing situations where the owner of the audited process wanted me to believe that the data being shown to me was real, while I strongly felt that it was fudged. Such situations do get tricky sometimes, as I have a tendency to dig deeper to find the truth, and maintaining good relations moves to a slightly lower priority. Here is what happened in one such situation.
In the middle of an audit, I found myself looking at complaints received for the account creation process in a financial services company. According to the process owner, complaints ran at about 1%. He showed me complaints in the system which matched his claim. To be more specific, there were 110 complaints for about 11,000 accounts that were opened. In another location of this company, as per my audit records, the complaints were consistent over time and much higher in terms of percentages. I could not find any specific difference in processes at this location that would keep the number as low as 1%.
Fortunately, each complaint here had an auto-generated serial number. The serial numbers started from 1, and the biggest one was 998. Obviously, hundreds of serial numbers were missing, as the visible complaints were just 110 in number. On inquiring about the reason for the missing serial numbers, I was told that many complaints get wrongly marked as account creation complaints, and they are moved to another queue which is managed by a team overseas. I looked at the complaint serial numbers carefully after arranging them according to their first digit, and I became sure that the books were cooked: the serial numbers themselves indicated that the data was fudged.
To convert the situation into a game, I have created two data sets which are shown in the image below. The question is: which of these is likely to be genuine, and which one is the fudged data set?
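For readers who want to try the first-digit arrangement themselves, here is a minimal sketch in Python. It is only an illustration of the tallying step and of the arithmetic in the story, not the exact procedure used in the audit. The function name `first_digit_counts` and the short `visible` list are made up for this example; the real data sets are the ones in the image. The tally of all issued serial numbers (1 to 998) is included as one possible point of comparison, not necessarily the one used in the audit.

```python
from collections import Counter

def first_digit_counts(serials):
    """Tally how many serial numbers start with each digit 1-9."""
    counts = Counter(int(str(s)[0]) for s in serials)
    return {d: counts.get(d, 0) for d in range(1, 10)}

# Figures quoted in the story.
accounts_opened = 11_000
visible_complaints = 110
highest_serial = 998

complaint_rate = visible_complaints / accounts_opened   # share of accounts with a complaint
missing_serials = highest_serial - visible_complaints   # serial numbers not shown

# Leading digits of every serial number that was issued (1 to 998).
issued = first_digit_counts(range(1, highest_serial + 1))

# Hypothetical sample of visible serial numbers, NOT the audit data.
visible = [3, 17, 45, 102, 118, 256, 731, 884, 905, 990]
seen = first_digit_counts(visible)

print(f"complaint rate: {complaint_rate:.1%}, missing serials: {missing_serials}")
for d in range(1, 10):
    print(f"first digit {d}: issued {issued[d]:3d}, visible {seen[d]}")
```

To try it on either data set from the image, replace the `visible` list with that data set's serial numbers and compare the two columns.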
Question – Which data set is more likely to be genuine?
As always, you may compare your thinking with others by clicking on A or B. Once you have clicked on your preferred response and are ready to compare it with mine, please click here: http://benchmarksixsigma.com/blog/how-to-find-if-a-data-set-is-genuine-part-2/.