Vishwadeep Khatri

Benford's Law

Benford's Law (also known as the first-digit law or the law of anomalous numbers) states that in many real-life data sets the leading digits follow a specific, non-uniform frequency distribution. For example, the digit 1 occurs as the leading digit approximately 30% of the time, while the digit 9 occurs less than 5% of the time.

 

An application-oriented question on the topic, along with the responses, can be seen below. The best answer was provided by Pradeepan Sekar on 19th May 2020.

 

Applause for all the respondents - Mohamed Asif, Selva Mariappan Subramanian, Pradeepan Sekar, Senthilkumar G
 

Also review the answer provided by Mr Venugopal R, Benchmark Six Sigma's in-house expert.

Question

Q 262. According to Benford's Law - "In any collection of statistics, a given statistic has roughly a 30% chance of starting with the digit 1". What are some of the business applications of this law?
Hint: Refer to the link - https://www.benchmarksixsigma.com/how-to-find-if-a-data-set-is-genuine/

 

 

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.


6 answers to this question


  • 0

Benford’s Law:

         Benford’s law is also called the law of anomalous numbers or the first-digit law. It is named after the physicist Frank Benford, who stated it in 1938, and it describes the probability distribution of leading digits observed in many real-life data sets.

         According to Benford, the leading digit in a set of data is most likely to be small. That is, numbers starting with the digit 1 have the highest probability, around 30%; numbers starting with the digit 2 have the second highest, around 17.6%; and so on, down to the digit 9, which has the lowest probability, around 4.6%.

         Benford’s law applies to a data set when the logarithms of the numbers, not the numbers themselves, are spread roughly uniformly; a broad, normal-like spread of the logarithms over several orders of magnitude gives a close approximation.

         Consider a number x constrained to lie between 1 and 10. It starts with the digit 1 when x is between 1 and 2, that is, when log10(x) lies in the interval [log 1, log 2). In the same way, x starts with the digit 2 when log10(x) lies in the interval [log 2, log 3), and so on.

[Image: number line on a logarithmic scale showing the intervals for each leading digit]

 

 

Picking a point uniformly at random on this logarithmic line gives a leading digit of 1 about 30% of the time, because the interval for digit 1 has length log10(2) - log10(1) ≈ 0.301.
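The sampling argument above can be checked with a short simulation. This is only a minimal sketch: the function name, the number of trials and the seed are illustrative choices, not part of the original answer. It draws u uniformly from [0, 1), treats it as log10(x), and tallies the leading digit of x = 10**u.

```python
import math
import random

def simulate_log_uniform(trials=100_000, seed=0):
    """Draw u uniformly in [0, 1), set x = 10**u, and tally leading digits of x."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(trials):
        x = 10 ** rng.random()          # x lies in [1, 10)
        counts[min(9, int(x))] += 1     # int(x) is the leading digit; min() guards rounding
    return [c / trials for c in counts]

freqs = simulate_log_uniform()
for d in range(1, 10):
    # simulated share next to the length of the interval [log10(d), log10(d+1))
    print(d, round(freqs[d], 3), round(math.log10(d + 1) - math.log10(d), 3))
```

The two printed columns should agree closely for every digit, which is the point of the number-line picture.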

 


Benford’s law application:

         This law applies to large data sets spanning multiple orders of magnitude. Examples include populations of countries, account balances, bills, taxes, the heights of the tallest buildings in the world, and COVID-19 spread.

 

         Of course, it will not apply to a data set that has been restricted to a narrow range, for example one where every value lies between 300 and 900 and the values are uniformly or normally distributed.

 

         In business, Benford’s law is widely used in auditing. People who fabricate figures tend to assume, plausibly but wrongly, that the digits should be spread uniformly, and distribute them accordingly. For a large data set spanning multiple orders of magnitude, the data can therefore be audited by simply finding the frequency of each first digit.

 

References:

         In the 2016 movie The Accountant, Benford’s law is used to uncover the theft of funds.

         In the US, evidence based on Benford’s law has been admitted in criminal cases.

 

Benford’s law tested on COVID-19 affected cases:

 

         The first-digit distribution of COVID-19 case counts as of today (19-May-2020) follows Benford’s law, as shown below.

Source: https://www.worldometers.info/coronavirus/#countries

 

[Image: first-digit distribution of COVID-19 cases compared with Benford's law]

 

 

 

  • 1

Benchmark Six Sigma Expert View by Venugopal R

Benford’s law states that for a data set of numbers that represents a random sample from any population, the expected percentage of occurrence of 1 as the first numeral is around 30%. Similarly, an expected percentage is assigned to 2, 3 and so on as the first numeral. As the first digit goes from 1 to 9, the expected percentage keeps decreasing.

 

[Image: expected percentage of occurrence for each first digit, 1 to 9]

 

The above expectation follows a logarithmic equation, p = log10(n+1) - log10(n), where n is the first digit and p is the probability of occurrence. This was first discovered in 1881 by an astronomer, Simon Newcomb. However, the hypothesis was tested and verified by Frank Benford in 1938.
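The equation can be evaluated directly for each first digit. The snippet below is a minimal sketch; the function name benford_probability is an illustrative choice. Note that log10(n+1) - log10(n) is the same quantity as log10(1 + 1/n), and that the nine probabilities add up to 1.

```python
import math

def benford_probability(n):
    """Expected probability that the first digit is n (base 10)."""
    return math.log10(n + 1) - math.log10(n)

expected = {n: benford_probability(n) for n in range(1, 10)}
for n, p in expected.items():
    print(f"{n}: {p:.1%}")

# The terms telescope: the sum is log10(10) - log10(1) = 1.
print("total:", round(sum(expected.values()), 6))
```

This gives about 30.1% for the digit 1 and about 4.6% for the digit 9, matching the percentages quoted in this thread.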

 

In today’s world, we come across numerous situations where big data analysis is used. Hence, it has become important to have a quick and effective method of evaluating the genuineness of data and ruling out fraud or inadvertent lapses. Benford’s law is a useful tool for comparing the pattern of the data with the expected occurrence of starting digits.

 

While this phenomenon has applications in business and practical life, we need to be cautious before drawing conclusions from Benford’s law. Before discussing its application, let’s look a bit more closely at the probability of occurrence of each first digit; some insight into the basis of the law helps us exercise discretion in applying it. The full analysis and scientific proof of Benford’s law are complex, so what follows is only a simplified view of the dynamics of the behavior.

 

Table-1 below gives the actual percentage of occurrence of 1 and of 9 as the starting digit for the sets of numbers up to 100, 1,000 and so on up to 1,000,000. We can observe that the actual percentage of occurrence is almost the same across these sets for both starting digits, 1 and 9.

[Image: Table-1]

 

Now see Table-2, where additional groupings have been inserted and highlighted in yellow, viz. up to 125,000, 250,000, 500,000 and 750,000.

[Image: Table-2]

 

Evidently the percentage of 1 being the first digit goes up in the highlighted sets in the regions between 100,000 and 1,000,000. You may observe the dynamics of the variation in percentages across these sets and how they compare with that of 9 as the first digit.
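The behavior described for the two tables can be reproduced by counting. The sketch below assumes each set is simply the integers from 1 up to the stated limit; the function name leading_digit_share is an illustrative choice.

```python
def leading_digit_share(limit, digit):
    """Share of the integers 1..limit whose first digit equals `digit`."""
    count = 0
    power = 1
    while digit * power <= limit:
        low = digit * power                       # e.g. 100 for digit 1
        high = min(limit, (digit + 1) * power - 1)  # e.g. 199 for digit 1
        count += high - low + 1
        power *= 10
    return count / limit

limits = [100, 1_000, 10_000, 100_000, 125_000, 250_000, 500_000, 750_000, 1_000_000]
for limit in limits:
    share_1 = leading_digit_share(limit, 1)
    share_9 = leading_digit_share(limit, 9)
    print(f"up to {limit:>9,}: starts with 1 = {share_1:.1%}, starts with 9 = {share_9:.1%}")
```

At the round limits (powers of 10) both digits come out near 11% to 12%, while at the intermediate limits the share of 1 rises well above that and the share of 9 falls, which is the variation the tables highlight.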

 

This clearly brings out the fact that, among many other aspects, it is important to examine the data and look for certain characteristics before applying Benford’s law. The law applies to many types of data such as stock prices, tax calculations, electricity bills, census figures, birth rates, bank accounts etc. It finds good application in data science for anomaly and fraud detection.

 

The law does not work well with a small number of data points. Experts suggest at least 500 data points for effective application of Benford’s law.

 

The nature of the data should be such that every digit from 1 to 9 has an opportunity to occur as the first digit. For instance, if the study involves analysis of heights of humans in cm, the data is unlikely to have any starting digit other than 1. Similarly, if we are dealing with invoices whose minimum and maximum values are known to be 45 and 75, we are not going to see any data beginning with 1, 2, 3, 8 or 9.

 

Data has to be distributed across multiple orders of magnitude. For instance, if my data is spread evenly over the range 1 to 1000, the probability of occurrence for each digit is almost the same and Benford’s law would not work. (See the example narrated in the link on the question.)

 

Benford’s law may be used as a tool for screening data, where the nature of the data permits, but it cannot be used as conclusive proof of the credibility of the data. Where data is suspected on the basis of Benford’s law, further investigation will be required to arrive at a final conclusion.

  • 0

Benford's law, also referred to as the first-digit law, says that the leading digits of numbers collected from real-life sources are distributed non-uniformly. In particular, the digit 1 tends to occur with a probability of around 30%, which is much greater than the 11.1% expected if all nine leading digits were equally likely (1 out of 9).

 


 

On the flip side, data that does not occur naturally, such as pre-defined numbers like ZIP codes or Universal Product Codes (barcode symbology), does not follow Benford's law. Neither do computer-generated numbers produced with Rand().
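The point about computer-generated numbers can be illustrated with a small experiment. This is a sketch under stated assumptions: Python's random.random() stands in for a spreadsheet Rand() call (both draw uniformly from 0 to 1), and the sample size and seed are arbitrary.

```python
import random

def first_significant_digit(x):
    """First non-zero digit of a positive float, read from its decimal text."""
    text = str(x).lstrip("0.")   # "0.0042" -> "42", "3.1e-05" -> "3.1e-05"
    return int(text[0])

rng = random.Random(42)
counts = {d: 0 for d in range(1, 10)}
total = 0
while total < 90_000:
    x = rng.random()
    if x == 0.0:                 # zero has no leading digit; skip it
        continue
    counts[first_significant_digit(x)] += 1
    total += 1

uniform_freqs = {d: c / total for d, c in counts.items()}
for d, f in uniform_freqs.items():
    print(d, f"{f:.1%}")
```

Each digit should come out close to 11.1% (1 out of 9) instead of following the 30% to 4.6% Benford pattern.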

 

Mathematically, the law is based on the base-10 logarithm: the probability that the leading digit of a number is n is log10(1 + 1/n).

 

The law is used in business and accounting services to detect fraud, and can be applied in organizational and business environments when dealing with:

General ledgers,

Trial balance reports,

Income statements,

Balance sheets,

Invoice listings,

Inventory listings,

Depreciation schedules,

Investment statements,

Expenses reimbursement,

Accounts payable and Receivable reports,

Timesheet data,

Portfolios,

Expense reports.

 

In Risk based Audits: This law could serve as an early indicator showing abnormality in the data patterns

Forensic Audits: Checking frauds, bypassing threshold limits, improper payments

Financial Statement Audits: Manipulation of checks, cash on hand

Corporate Finance: Examine cash-flow forecast for profit centers

It is widely used by Income tax agencies, auditors and fraud examiners to detect abnormal patterns  

 

It works when:

we have large data sets,

every digit has an equal opportunity to lead in the metrics considered,

we don't have definitive proof and need an indicator of where to look, and

there are no built-in minimum or maximum values in the data set.

 

Insurance Industry:

In the US, the General Accounting Office has estimated that fraud accounts for up to 10% of annual expenditure on health care, or $100 billion. The health insurance industry receives a large amount of claims data submitted by health care providers, and Benford's law can be used to analyze this data and detect abnormalities.

 

We can use Z-statistics to determine the difference between the actual and expected proportions and check for their significance.
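A minimal sketch of such a check is given below. It uses the standard one-proportion z-statistic for each digit, which is an assumption about the exact form of the test; the claim counts are hypothetical numbers made up purely for illustration, and benford_z_scores is an illustrative name.

```python
import math

def benford_z_scores(digit_counts):
    """z-statistic per first digit: (observed - expected) / standard error."""
    n = sum(digit_counts.values())
    scores = {}
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)              # Benford proportion
        observed = digit_counts.get(d, 0) / n         # actual proportion
        std_err = math.sqrt(expected * (1 - expected) / n)
        scores[d] = (observed - expected) / std_err
    return scores

# Hypothetical first-digit counts for 1,000 claims (not real data).
claims = {1: 290, 2: 180, 3: 120, 4: 100, 5: 80, 6: 65, 7: 60, 8: 50, 9: 55}
z = benford_z_scores(claims)
for d, score in z.items():
    flag = "check" if abs(score) > 1.96 else "ok"     # 1.96 ~ 5% two-sided level
    print(d, round(score, 2), flag)
```

A digit whose |z| exceeds 1.96 differs from the Benford proportion at roughly the 5% significance level and would be a candidate for a closer look; with these made-up counts no digit is flagged.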

 

This tool can serve as a method of detecting possibly fraudulent or erroneous claims received by a health insurance company.

 

Closure Points:

Benford's Law is an excellent tool for predicting the distribution of the first digit in a large population of data, provided the data has not been interfered with by human hands.

 

  • 0

Benford’s law:

Benford’s law states that in a naturally occurring data set, the frequency of leading digits does not follow a uniform distribution even if the data set is random. Numbers starting with 1 have the highest probability of occurrence, followed by numbers starting with 2, and so on. The law is, however, relevant only when the data set spans several orders of magnitude. The figure below depicts the proportion of occurrence of the digits 1-9 as the leading digit.

 

[Image: proportion of occurrence of the digits 1-9 as the leading digit]

 

Limitations to the applicability of Benford’s law:

·       Smaller datasets do not follow Benford’s law accurately.

·       Datasets must follow natural order (no human imposed limits are allowed). Example: Population of cities, countries.

 

Application of Benford’s law:

1.  Risk based audits

In audits, the first-digit distribution of a data set such as vendor invoices can be compared with Benford’s distribution to look for manipulation.

 

First Digit    Benford's Set    Data Set X    Deviation
1              30.10%           24.00%         0.06
2              17.61%           18.00%         0.00
3              12.49%           26.00%        -0.14
4               9.69%           11.00%        -0.01
5               7.92%            5.00%         0.03
6               6.70%            7.00%         0.00
7               5.80%            5.00%         0.01
8               5.12%            2.00%         0.03
9               4.58%            2.00%         0.03

 

[Image: chart comparing Benford's Set with Data Set X]

 

Data set X represents the frequency of first digit for 10,000 vendor invoices.
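The comparison in the table can be reproduced with a few lines of code. This is a sketch only: the Data Set X proportions and the 10,000 invoice count are taken from the post above, the deviation is computed as Benford minus Data Set X to match the signs in the table, and the chi-square statistic is an added, commonly used summary that the original answer does not mention.

```python
import math

benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
data_set_x = {1: 0.24, 2: 0.18, 3: 0.26, 4: 0.11, 5: 0.05,
              6: 0.07, 7: 0.05, 8: 0.02, 9: 0.02}
n = 10_000  # number of vendor invoices

# Deviation per digit, with the same sign convention as the table.
deviation = {d: benford[d] - data_set_x[d] for d in benford}
worst_digit = max(deviation, key=lambda d: abs(deviation[d]))

# Chi-square goodness-of-fit statistic across all nine digits (8 degrees of freedom).
chi_square = n * sum((data_set_x[d] - benford[d]) ** 2 / benford[d] for d in benford)

for d in benford:
    print(d, f"{benford[d]:.2%}", f"{data_set_x[d]:.2%}", f"{deviation[d]:+.2f}")
print("largest deviation at digit", worst_digit)
print("chi-square:", round(chi_square, 1))
```

The digit 3 stands out with the largest deviation, and the chi-square value is far above 15.51, the 5% critical value for 8 degrees of freedom, so invoices starting with 3 would be the first place for an auditor to look.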

 

2.       Fraud detection

1.       Forensic audits for detecting fraud, security breaches and irregular payments

2.       Financial Statement audits for manipulation of cash on hand, inventory values etc.

3.       Corporate finance or company valuations in examining free cash flow forecasts.

  • 0


 

Benford’s Law:

 

Benford’s Law, also called the Newcomb-Benford law, the law of anomalous numbers or the first-digit law, is an observation about the frequency distribution of leading digits in many diverse real-life sets of numerical data. It is mainly used for fraud detection, including in scientific and technical publications and in research.

 

When to Use Benford’s Law:

 

Benford’s Law maintains that the numeral 1 will be the leading digit in a data set of numbers 30.1% of the time, and the numeral 2 will be the leading digit 17.6% of the time. Each of the numerals 3 through 9 will be the leading digit with decreasing frequency. These expected frequencies are illustrated in the chart "Benford's Law."

Microsoft Excel can count the leading digits contained in virtually any data set, chart the findings, and compare the respective results to Benford's curve to see if that particular data set obeys the expectations set by Benford's Law.
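The same count-and-compare exercise can be sketched outside Excel. The code below is an illustrative assumption, not the method from the referenced article: the function names are made up, and the first 1,000 powers of 2 are used as sample data because they span many orders of magnitude.

```python
import math
from collections import Counter

def leading_digit(value):
    """First non-zero digit of a number, or None for zero."""
    text = str(abs(value)).lstrip("0.")   # drop sign, leading zeros and decimal point
    return int(text[0]) if text else None

def first_digit_frequencies(values):
    """Proportion of values starting with each digit 1..9."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

sample = [2 ** k for k in range(1000)]    # sample data spanning about 300 orders of magnitude
observed = first_digit_frequencies(sample)
for d in range(1, 10):
    print(d, f"observed {observed[d]:.1%}", f"Benford {math.log10(1 + 1 / d):.1%}")
```

Replacing sample with a column of invoice amounts or ledger entries gives the observed frequencies to chart against Benford's curve.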

 

[Image: chart "Benford's Law"]

 

 

Benford’s Law: Why this works. 

 

In non-technical terms, Benford’s Law works whether you are counting dollars, inventory, populations or acres, because we must count 1 before counting 2, 3 or 4, and so on. Every count starts with the lowest numbers.

This simplified counting exercise shows Benford's Law at work: rather than each digit having an equal chance of being the first digit, lower numerals have a greater chance of being the leading digit than higher numerals.

 

In the following chart, which data set is more likely to be genuine?

 

Scenario: Financial Complaints system (Financial Audit)

 

[Image: Chart A and Chart B]

 

To recall the categorization logic of Charts A and B above: Chart B shows the number of complaints plotted against the first digit of their serial number.

The complaint trend in Chart B shows an unusual pattern and needs a closer look at the specific data set: the number of complaints starting with digit 1 is higher than the number starting with 2, and so on.

 

We wanted to use Benford’s law as a method to check that the data does not look fudged or corrupted. After looking at the data, however, Benford’s law does not apply here, because the Chart B values are not distributed across multiple orders of magnitude.

 

Benford’s law does apply to a large number of different or complex data sets, as listed below:

Larger data sets:

i)                  General Ledgers,

ii)                 Trial Balance Reports,

iii)                Income Statements,

iv)                Balance Sheets,

v)                 Invoice Listings,

vi)                Inventory Listings,

vii)               Depreciation Schedules,

viii)              Investment Statements,

ix)                Account Payable and Receivable Reports,

x)                 Expense Report, Time Sheet Data, Portfolios &

xi)                Electricity Bills, Stock Prices, Tax calculations, Death rates.

 

The law tends to become more accurate as the data spans more orders of magnitude. When simulating this exercise in Excel, we need to avoid repeated numbers and duplicated digits; searching for “random numbers in excel without repeats” on SearchEncrypt.com or Google will turn up a suitable method. We also need to check where Benford’s law can be applied and which data sets will give meaningful results.

 

Equal Opportunity:

Data sets must contain data in which each numeral 1 through 9 has an equal chance of being the leading digit; otherwise Benford’s law does not apply, even to the larger data sets listed above.

 

No Definitive Proof:

A Benford’s law calculation can never definitively prove or disprove that numbers are genuine. Based on the results, colleagues handling the financial portfolio should analyze the data more deeply and proceed accordingly. A simple visual examination (Visual Management) of the resulting chart helps to raise questions or concerns about the data: if the chart doesn't closely follow Benford's curve, the data should be scrutinized more carefully.

 

Benford’s law importance in real time Business and Data Science:

The following simple process flow chart summarizes the importance of Benford’s law in real-time business and data science:

 

[Image: process flow chart]

 

 

 

 

The above article shows how Benford’s law can be used effectively to detect financial statement fraud. Note that Benford’s law should not be used as a final decision-making tool by itself; rather, it may prove to be a useful screening tool indicating that a set of financial statements, reports or portfolios deserves deeper analysis.

 

 

References:

https://www.benchmarksixsigma.com/how-to-find-if-a-data-set-is-genuine/

https://en.wikipedia.org/wiki/Benford's_law

https://www.journalofaccountancy.com/issues/2017/apr/excel-and-benfords-law-to-detect-fraud.html

https://towardsdatascience.com/what-is-benfords-law-and-why-is-it-important-for-data-science-312cb8b61048

 

Thanks and Regards,

Senthilkumar Ganesan.


  • 0

The best answer for this question has been provided by Pradeepan Sekar. 

 

Also review the answer provided by Mr Venugopal R, Benchmark Six Sigma's in-house expert.
