The truth about unstructured data

  • Publish Date: Posted over 1 year ago
  • Author: Kevin Hall

What is the true cost of unstructured data? Unstructured data makes up around 80% of many organisations' data. Kevin Hall, Delivery Director at ECMS, offers up three compelling reasons why you should know where it is and what to do with it.

There is a sector-wide challenge around data, and more specifically unstructured data, which has traditionally been owned and used by many functions within organisations. The lack of a single point of accountability, however, can create a lack of impetus for solving the challenges around it.

This comes into sharper focus given that data, both structured and unstructured, is being scrutinised more heavily by regulators under the guise of operational resilience, which extends to financial and technical resilience.

Data, data everywhere 

We live in an age of Big Data, where the explosion of information means we have more opportunity for insights at our fingertips than ever before. For the insurance industry, the perennial challenge is how to distil these down into what is material to the organisation, and how to gain access to them - in real time if we can.

But a growing share of organisations' data - as much as 80% to 90% according to Gartner - is held in 'unstructured' formats, such as email messages, PDFs, photos, videos and audio files.

Here are three reasons why your organisation needs to keep tabs on and actively manage its unstructured data.

Cutting cost

With inflation spiralling and recession looming, the unnecessary cost of storing duplicate (or multiple) copies of the same unstructured data files suddenly appears extravagant.

Many organisations promote collaboration portals, such as SharePoint, to encourage information to be stored centrally, but in reality people opt for the most convenient tool - typically email - which means there can be multiple versions of the same files in different repositories.
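As a rough illustration of how duplicate copies across repositories might be found, the sketch below groups files by a hash of their contents, so identical files are matched even when their names differ. It assumes the repositories are reachable as ordinary folders; the function names `file_digest` and `find_duplicates` are hypothetical, and a real tool would also need to handle mailboxes, permissions and near-duplicates.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots: list[Path]) -> dict[str, list[Path]]:
    """Group files under the given folders by content hash.

    Only groups containing two or more files (exact duplicates) are returned.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for root in roots:
        for path in root.rglob("*"):
            if path.is_file():
                groups[file_digest(path)].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates` with, say, a shared drive folder and an export of email attachments would return each set of byte-identical files, which could then be reviewed before anything is deleted.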

This is particularly wasteful when you consider that most unstructured data in the enterprise is either cool data (over 30 days old and infrequently accessed) or cold data (over 90 days old and rarely accessed). It is like clutter in a house, as Larry Ponemon describes it, lying around but not being used.
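The cool and cold thresholds above can be expressed as a simple rule. The sketch below is a simplification: it classifies a file purely by the time since it was last accessed, whereas the definitions above also consider how often it is accessed. The "hot" label for recently used data and the function name `classify_tier` are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Thresholds taken from the definitions in the text.
COOL_AFTER = timedelta(days=30)
COLD_AFTER = timedelta(days=90)


def classify_tier(last_accessed: datetime, now: datetime) -> str:
    """Classify data as 'hot', 'cool' or 'cold' by time since last access."""
    age = now - last_accessed
    if age > COLD_AFTER:
        return "cold"
    if age > COOL_AFTER:
        return "cool"
    return "hot"  # assumed label for data accessed within the last 30 days
```

Run across a file inventory, a rule like this gives a first view of how much data might be a candidate for cheaper storage or deletion.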

Consider that the cost of storing a petabyte of data for five years could be well over £1m (whether on premises or in the cloud) and that some insurers store hundreds of petabytes. The ability to delete unused files and reduce duplication therefore represents a big saving for the company's bottom line.

According to some estimates, we will be creating 463 exabytes (with one exabyte equivalent to 1,000 petabytes) of data globally every day by 2025, much of it generated by IoT technology. Some of this information will be useful and will need to be stored long term, but much of it will not.

If too much of your unstructured data is sitting on expensive primary storage systems, there are obvious cost-cutting benefits to taking a different approach, particularly as the gap between data growth and storage budgets widens.

Preventing leaks

Unchecked data growth, combined with a lack of visibility, is dramatically increasing the risk of breaches, ransomware and compliance violations. It is a trend that has been exacerbated by working from home and hybrid working, which have encouraged the wide use of multiple data repositories.

Cybercriminals know unstructured data is the low-hanging fruit. Most employees create, share and store thousands of documents in a given year in informal repositories such as email and Slack, very few of which are properly classified, stored or secured.

Moreover, unstructured data is an attractive target because it contains a wealth of personally identifiable and sensitive business information that can be leveraged for ransom or identity theft.

If consumer data within unmanaged files is exposed during a data breach, it can result in hefty fines (up to 4% of global annual turnover under GDPR) in addition to significant reputational damage and potential litigation.

And for insurance buyers, cyber premiums are likely to be significantly higher if you cannot demonstrate to your underwriters that you know where all your unstructured data is held and that you have taken proactive steps to manage it.

Extracting value

Determining whether there is any value to the organisation contained within unstructured data is no easy task, but it can offer an important competitive advantage. 

For insurers, the insights contained in some data sources - including satellite and aerial imagery and IoT-connected devices - could offer the opportunity to innovate, using telematics or parametric triggers, for instance. There are use cases for claims processing, underwriting and fraud detection, among other things.

But in order to identify and extract the useful data (and discard content that is lacking in value) unstructured sources of information need to be approached and mined in a different way to conventional data sources.

These days, emerging technologies - including AI and natural language processing (NLP) - are being used to develop scanning tools that can extract value and save time. The hardest part is defining rules for the data you want to find, or having the ability to write your own custom rules so you can tailor the product to your own organisational needs.
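To make the idea of custom rules concrete, here is a minimal sketch of a rule-based scanner using regular expressions rather than AI or NLP. Everything in it is an assumption for illustration: the `Rule` class, the `scan_text` function, the simplified email pattern and the made-up `POL-` policy number format. It is not how any particular product works.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A named pattern to look for in extracted text."""
    name: str
    pattern: re.Pattern


# Illustrative rules only: a simplified email pattern and a hypothetical
# in-house policy number format (POL- followed by six digits).
RULES = [
    Rule("email_address", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    Rule("policy_number", re.compile(r"\bPOL-\d{6}\b")),
]


def scan_text(text: str, rules: list[Rule] = RULES) -> dict[str, list[str]]:
    """Return the matches found for each rule, omitting rules with no hits."""
    hits: dict[str, list[str]] = {}
    for rule in rules:
        matches = rule.pattern.findall(text)
        if matches:
            hits[rule.name] = matches
    return hits
```

An organisation could extend the rule list with its own identifiers, such as claim references or broker codes, which is the kind of tailoring described above.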

Inevitably, many of the early scanning tools are based around data compliance with a focus on identifying PII. The next stage will be the creation of something that is more bespoke and tailored, so organisations can start to properly unlock the unstructured data goldmine.