Benchmark Six Sigma Forum

Message added by Mayank Gupta,

Dark Data is the information that an organization acquires during its business-as-usual activities but does not analyze for making any business decisions, e.g. log files, incident reports, old employee data, old financial data, old video files, etc.

 

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Mohamed Asif on 27th Jul 2022.

 

Applause for all the respondents - Sohan Subhash Mirajkar, Chandra Shekar, Rahul Arora, Mohamed Asif, Ravindra Kulkarni, Kiran Kumar Gadhamsetty, Rohit Chaudhary.

Featured Replies

Q 490. As per data analytics industry reports, about 50-65% of data that any company collects is not being used for making business decisions. This is known as Dark Data. What are the challenges associated with its usage and storage? Provide examples of organizations that have been able to successfully tap the dark data for analytics. 

 

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.

Solved by Mohamed Asif Abdul Hameed

Gartner describes dark data as the information assets companies collect, process and store during regular business tasks, but generally forget or refrain from using for other purposes (e.g. analytics, business relationships and direct monetising). Dark data often comprises most of an organisation’s universe of information assets. Companies therefore often keep dark data for compliance purposes only. Storing and securing dark data usually incurs more expense, and sometimes greater risk, than value.

 

Dark Data types mainly include:

  1. Log files (servers, systems, architecture, etc.)
  2. Previous employee data
  3. Financial statements
  4. Geolocation data
  5. Raw survey data
  6. Surveillance video footage
  7. Customer call records
  8. Email correspondences
  9. Notes, presentations, or old documents

Companies like IBM, BMC, Deloitte and GlobalLogic have successfully tapped dark data for analytics.

Dark data is the information that organizations collect, process and store during regular business activities, but generally fail to use for other purposes like analytics, business relationships and direct monetizing. Dark data often comprises most of an organization’s universe of information assets. Thus, organizations often retain dark data only for compliance purposes. Storing and securing this dark data typically incurs more expense & sometimes greater risk than value to the organization.

 

Examples of dark data:

·       Emails

·       Employee records

·       Internal processes

·       Video and sound recording data

·       Log files

·       Geolocation information

 

Organizations may consider this dark data too old to provide value, incomplete or redundant, or limited by a format that can’t be accessed or processed with available tools. All too often, they don’t even know that this data exists. However, dark data may be one of the organization’s biggest untapped resources.

Data is increasingly a major organizational asset, and competitive organizations will need to tap into its full value.

Globally, about 55% of an organization’s data is considered “dark data”. That means this data is unknown, undiscovered, unquantified, underutilized or completely untapped.

 

Challenges with dark data:

·       Storage Costs

·       Regulatory Compliance

·       Data Governance

·       Data Visibility

 

Companies using dark data for analytics:

1.       Netflix uses data to create new hit series. It runs predictive analytics to learn what its customers would be receptive to and interested in watching.

2.       Google uses people analytics to build a better workplace. Its people analytics teams dig deep into their data and analyze employee performance reviews and feedback surveys to better understand how to build a better boss.

3.       Coca-Cola uses image recognition technology and data analytics to target users based on the photos they share on social media. This gives the company insight into the individuals drinking its products, where they are from and how its brand is being mentioned.

Dark Data refers to the data generated by organizations in their daily activities which is not being leveraged to derive insights for decision making. It is estimated that most organizations analyze only 1% of their data; the rest is stored only for record keeping & in some cases for regulatory compliance purposes.
 
A lot of the dark data collected by organizations is unstructured, which makes it difficult to categorize & analyze. Also, most organizations do not have adequate reporting capability to extract insights from the data. Another aspect is that even useful data can become dark data if organizations are not able to extract timely insights from it, as those insights will lose their value. For example, information on customer demographics can help an organization roll out promotional offers to a customer; however, if the data is not processed immediately, it becomes irrelevant and cannot be acted upon later.
 
From a storage perspective, there are opportunity costs from under-utilizing the data & failing to act on it, in addition to the energy costs of storing it. From a value perspective, only approximately 14% of the data stored in data centres is critical to the organization; the rest is dark data & under-utilized data. This ongoing storage of dark data can pose a serious risk to organizations, particularly if the data contains sensitive information, & can result in serious legal & financial repercussions in case of a data breach, which will ultimately tarnish an organization’s reputation.
 
However, many organizations have recently been able to leverage dark data successfully to derive valuable insights. Some examples are:
  • E-commerce giant Amazon harnesses the insights generated from each & every user activity to provide recommendations to its users. In addition, it uses Amazon Web Services (AWS), its cloud computing capability, to store the massive amount of customer data generated on a daily basis & also provides AWS cloud services to other organizations for efficient storage of, as well as insights on, their data.
  • Indiana University Health (IU Health) is exploring ways to leverage dark data in order to personalize health care for its patients.
  • Stitch Fix, an online subscription shopping service, closely monitors its customers’ preferences in order to thoroughly understand each customer’s sense of style & then uses this understanding to create personalized fashion.
 
Thus, with advancements in analytics tools & technologies, it is now very much possible to extract deeper insights from large amounts of data.

“Data is more valuable than oil”; nevertheless, are we leveraging it to its full capacity?
The answer is simply “No”, and the unused data becomes dark data!
 

Gartner coined the term ‘dark data’ and defines it as “the information assets organizations collect, process, and store during regular business activities, but generally fail to use for their analytics, business relationships and direct monetizing”.

 

Dark data can be generated by an organization’s systems, devices and interactions; most of the time it is the CRM, ERP, SCADA, HTTP, IoT and even Wi-Fi systems that collect the data.

 

It can be stored physically, on storage peripherals or in the cloud. Most of the data is unstructured. Examples of dark data include, but are not limited to:

  • Application logs
  • Customer records
  • Geolocation
  • Survey data
  • Financial statements
  • Customer Address 
  • Contact details
  • CCTV footage
  • Emails
  • Chat messages
  • Medical records
  • Zip files 
  • Archived web content 
  • Code snippets 

The biggest challenges with regard to dark data are:

  • Security dangers (hacks)
  • Compliance issues
  • Data authenticity
  • High storage cost
  • Brand reputation
  • Opportunity cost

The risk associated with dark data can be mitigated by adhering to the audit and retention policies defined by the organization. In addition, some best practices can have a high impact on managing that risk.

 

The model below shows how data is typically collected, stored, retained and deleted, using an analyze, categorize and classify approach.

 

[Figure: Model]

 

Model Explained: 

 

Start with data classification (Public, Internal, Restricted).

[Figure: data classification]

 

While we classify, it is vital to bucket the data based on a few critical factors, viz.:
Critical data?
Permanent document?
Proprietary intellectual property?
Does the document/data serve the current needs of operations?

Legal and regulatory requirement? (For instance, HIPAA requires a minimum retention of 6 years. In contrast, GDPR allows data to be stored for an extended period, but only for purposes of public interest, statistical analysis or historical research.)
Hot data or cold data? (Hot data is accessed frequently and used for quick decisions, whereas cold data is old data that is not frequently used.)

[Figure: hot data]

Based on the classification, decide whether to store or delete the data.
If we want to store it, decide what the retention period is and how the data will be useful.
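
The classify-then-decide steps above could be sketched in Python roughly as follows. This is only an illustrative sketch: the `DataAsset` fields, the 90-day hot-data window and the one-year review period are hypothetical choices made for the example, not part of the model described in the answer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

HOT_WINDOW_DAYS = 90      # assumption: data accessed within 90 days counts as "hot"
DEFAULT_REVIEW_YEARS = 1  # assumption: re-review operational data after one year


@dataclass
class DataAsset:
    """Hypothetical record describing one data set in the data pool."""
    name: str
    classification: str                # "public", "internal" or "restricted"
    last_accessed: date
    is_critical: bool = False
    is_permanent: bool = False
    is_intellectual_property: bool = False
    serves_current_operations: bool = False
    legal_retention_years: Optional[int] = None  # e.g. 6 for the HIPAA case above


def temperature(asset: DataAsset, today: date) -> str:
    """Label the asset hot or cold based on how recently it was accessed."""
    age_days = (today - asset.last_accessed).days
    return "hot" if age_days <= HOT_WINDOW_DAYS else "cold"


def retention_decision(asset: DataAsset, today: date) -> Tuple[str, Optional[int]]:
    """Return (action, retention_years); None years means no fixed period."""
    if asset.is_permanent or asset.is_intellectual_property:
        return "store", None
    if asset.legal_retention_years is not None:
        return "store", asset.legal_retention_years
    if asset.is_critical or asset.serves_current_operations:
        return "store", DEFAULT_REVIEW_YEARS
    if temperature(asset, today) == "hot":
        return "store", DEFAULT_REVIEW_YEARS
    # Cold, non-critical data with no legal or operational reason to keep it.
    return "delete", None


today = date(2022, 7, 27)
old_log = DataAsset("app.log", "internal", last_accessed=date(2020, 1, 1))
patient_file = DataAsset("records.db", "restricted", last_accessed=date(2019, 5, 1),
                         legal_retention_years=6)
print(retention_decision(old_log, today))       # cold log with no reason to keep
print(retention_decision(patient_file, today))  # kept for the legal retention period
```

In practice the order of the checks would follow the organization's own retention policy; the ordering here simply puts legal and permanence rules ahead of the hot/cold test.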

 

When we follow this approach, along with regular data audits and internal Data Life Cycle Management (DLCM), we can make maximum use of the data in the data pool.
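
As one possible starting point for a regular data audit, the sketch below walks a directory tree and lists files that have not been modified for a given number of days, together with their sizes. The function name `find_stale_files` and the use of modification time as the staleness signal are assumptions made for illustration; a real audit would likely also consult access logs and ownership metadata.

```python
import os
import time
from pathlib import Path
from typing import List, Optional, Tuple


def find_stale_files(root: str, max_age_days: int,
                     now: Optional[float] = None) -> List[Tuple[str, int]]:
    """Return (path, size_in_bytes) for files not modified in max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = Path(dirpath) / filename
            info = path.stat()
            if info.st_mtime < cutoff:
                stale.append((str(path), info.st_size))
    return sorted(stale)


if __name__ == "__main__":
    # Example: report files untouched for over a year under the current directory.
    for path, size in find_stale_files(".", max_age_days=365):
        print(f"{size:>12} bytes  {path}")
```

The output of such a scan is only a list of candidates; whether each file is stored or deleted would still go through the classification questions above.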
 

Ways to leverage Dark data:

  • Text Mining / Word mining 
  • Data mining methods 
  • Voice to Text analytics 
  • Data analytics 
  • Prescriptive analytics 
  • Behavior analysis, which can be used to train AI models for prediction 
  • Big data analytics and visualization (SAP HANA)
  • Data Forecasting 
  • Trend Analysis 
  • Investigate past complaints 
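
To illustrate the simplest form of text mining from the list above, the following sketch counts the most frequent terms in a few free-text complaints. The sample complaints, the stop-word list and the `top_terms` helper are all made up for the example; real text mining would normally use a dedicated NLP library and far more data.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a tiny hand-picked stop-word list, enough for this example only.
STOP_WORDS = {"the", "a", "an", "is", "was", "to", "and", "of", "for", "no", "in", "it"}


def top_terms(documents: Iterable[str], n: int = 4) -> List[Tuple[str, int]]:
    """Count word frequencies across documents, ignoring stop words."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z']+", doc.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical archived complaints that would otherwise sit unused.
complaints = [
    "Delivery was late and the package was damaged",
    "Late delivery, no refund offered",
    "Refund for the damaged item took weeks",
]
print(top_terms(complaints))
```

Even this crude count surfaces recurring themes (late delivery, damage, refunds) that could guide where to investigate past complaints in more depth.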

Google’s approach to data management:
“Some data you can delete whenever you like, some data is deleted automatically, and some data we retain for longer periods of time when necessary. When you delete data, we follow a deletion policy to make sure that your data is safely and completely removed from our servers or retained only in anonymized form.”

 

Apple’s approach to data storage:
“Apple uses personal data to power our services, to process your transactions, to communicate with you, for security and fraud prevention, and to comply with law. We may also use personal data for other purposes with your consent.”
 

Final say:

Data violations have attracted a lot of attention in recent years as businesses become more dependent on digital data, cloud computing and remote working. As a result, compliance and regulation have emerged as requirements for ensuring information security.


Data analytics application suites can manage unified unstructured data effectively and can intelligently identify data sets in the organization, in line with the industry’s legal and regulatory requirements.
 

Dark Data is defined as the data resources that businesses gather and keep as part of their normal business operations and protocols, but do not use for purposes such as data analytics, decision making or monetizing.

 

 

Challenges associated with Dark Data:

 

1.     Dark data is mostly unstructured. It takes enormous human effort to wrangle the data and make it usable.

2.     High storage cost: if dark data is not actively managed, it can turn into clutter and end up taking valuable space on storage devices, slowing down the servers. As dark data grows, the maintenance and storage costs grow as well.

 

Advantages:

1.     Businesses can use dark data to extract useful insights which would otherwise have remained hidden. These insights help decision making, reduce risk and increase ROI.

2.     Dark data has the potential to create new revenue streams, streamline processes and reduce costs. It helps reveal the relationships between apparently unrelated pieces of information.

 

 

Companies like Google, Amazon and Facebook make extensive use of dark data.

 

Stitch Fix, an online subscription shopping service, uses images from social media and other sources to understand emerging fashion trends and evolving customer needs. It asks clients to answer a detailed questionnaire about their tastes in clothing. With client permission, the company’s data science team augments that information by scanning images of clothes on customers’ social media platforms. From the insights generated, it gains a deeper understanding of each customer’s sense of style and customises its product offerings.

Dark data refers to all the information that is generated by systems & applications but is not being used for any value-adding purpose. It also includes the data that organizations are unaware of.

 

Storage challenges:

  1. The volume of dark data is approximately 50-65% of all data generated. In 2020, 2.5 quintillion bytes of data were generated. An extensive storage network is needed to accommodate the proportionally increasing amount of dark data being generated.
  2. Though data storage has become inexpensive, organizations continue to incur the expense of storing dark data in anticipation of future use.
  3. Security risk grows with the volume of data. Even if it is not used after storage, the data needs to be protected from unauthorized use, and the cost of storage increases to accommodate data security.

Usage challenges:

  1. If the purpose of the data is unknown, or if its availability is unknown, it is much more difficult to use the dark data.
  2. Dark data is highly unstructured. Bespoke applications need to be developed to structure it.
  3. Advanced big data applications are needed to draw inferences from high volumes of dark data.
  4. High data retrieval times can consume a big chunk of the dark data analytics budget.

Organizations are increasingly using dark data analytics to unlock hidden opportunities. Indiana University Health uses dark data analytics to personalize its patient care model. Stitch Fix uses previously unused social media data to customize clothing styles. (Source: Deloitte Insights).

With the advent of data collection techniques, such as data generated by sensors and analog-to-digital converters, organisations are able to capture more data than they can analyse using their current analytical abilities & business intelligence tools to derive insights for decision making.

 

Key challenges with the usage of dark data are:

1.     Limited resources, such as a lack of highly skilled analytics teams or limitations of the BI tools currently available in the organisation to process such a huge amount of data.

2.     Unstructured data, that is, data in formats that are difficult to categorise and hence not feasible to analyse.

3.     The expense of processing dark data, which leads to non-usage.

4.     Delay in processing dark data, which leads to unusable insights. For example, a customer’s geolocation, if not known immediately, might be unusable at a later point.

 

Storage of dark data also poses a few issues for organisations, such as:

1.     Storing dark data, which could be up to 99% of the data collected by some organisations, requires a lot of energy. According to a study by James Glanz published in the New York Times in Sep 2012, 90% of the energy used by data centres is wasted on storing dark data.

2.     A breach of sensitive information, especially that of customers and the organisation, could lead to issues like identity theft and might do more harm than good.

3.     Higher audit & compliance costs for storing dark data.

4.     Losing good data and insights, or needing more effort to obtain them, because of the enormous volume of dark data.

 

However, efforts are being made to reduce the harmful effects of storing dark data, such as discarding unused data through timely audits.

 

Also, to improve usability, organisations are looking to employ higher computing capability, highly skilled MIS teams, AI capabilities & tools like Hadoop or Splunk integrated with big data analytics to process larger amounts of dark data faster. Technological advances by companies such as Veritas & Datumise are also happening at a rapid pace to reduce the cost of analysing dark data.

 

Organisations such as Amazon, Apple, Google, Facebook & Bloomberg are aggressively tapping into dark data using cutting-edge methodologies such as AI-enabled data collection and analysis, advanced big data analytics, etc.

The winning answer to this question was written by Mohamed Asif. His answer highlights techniques to classify the data correctly and then decide whether it needs to be stored or not.
