Open Water: Six key data risks

David Tyler

David Tyler explains why MAP2 is a perfect opportunity to dive deeper into data quality

We’ve written before about the challenges posed by Open Water and how UK water companies can get a head start by working out how to align their data with the requirements of the Market Architecture Plan (MAP).

A new version of the plan – MAP2 – was released at the end of last year. Crucially, MAP2 includes a first draft catalogue of the data flows that water companies will need to be able to support by the “shadow launch” of the Open Water market in October 2016.

Now is the time for action
As the deadline looms ever larger, the release of the draft catalogue is a positive step. Although water companies still can’t count on the final requirements being identical to those listed in MAP2, at least the catalogue provides a rough benchmark that they can measure their existing data against.

At AMT-SYBEX, we’re advising several leading UK water suppliers on how to perform comprehensive appraisals of their data-sets, using a methodology that assesses six key risk factors and provides a score for the likely accuracy of the data.

Six key risks for data quality
So what are these six key risk factors, and why are they so important when deciding how to tackle your data quality issues?

  1. Source suitability
    Is each item of data being taken from the system where its master record resides? Or is it coming from a downstream system, where errors may have been introduced? If the latter, the risk of low data quality is much higher.
  2. Architectural strength
    How did the data get into your systems in the first place? Is the ecosystem engineered to maintain the integrity of data? Or does it depend on ‘leaps of faith’ – for example, relying on the data captured by the technician who installed the meter, without cross-referencing it against the meter manufacturer’s own data-sets? The latter is a warning sign that this data may need to be examined more closely.
  3. Integrity of controls
    How complete is the data-set, and how much duplication does it contain? How well does it conform to a valid set of values? If it doesn’t, that’s another red flag for data quality problems.
  4. Unity of purpose
    Has a given data field always been used for the same purpose, or has it been used to capture different things at different times? Users will often pick a rarely used field and use it to store information for a special project or temporary requirement. This overwrites the existing data, so it’s a potential problem if that particular field needs to be used in an Open Water data flow.
  5. Ease of correction
    If errors in a particular field are easy for users to correct, it’s more likely that users will have fixed them already. On the other hand, if a piece of data can only be verified and updated by visiting a site and checking it manually, it’s more likely that the error will go unnoticed and uncorrected – which means more errors are likely to build up over time.
  6. Ease of transformation
    Some types of data, such as dates and postcodes, are likely to be in the right format already, or at least should be easy to convert into whatever format Open Water finally specifies. But for other types, the transformation process itself is likely to be more difficult and error-prone. For example, the location of a meter on a site may need to be expressed as a code; whereas in many current data-sets, meter location is described in a free text field. Converting the text into codes could perhaps be achieved using automated text analytics – but the results won’t be perfect, and the error rate is likely to be high.
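
To make the third factor, integrity of controls, more concrete, here is a minimal Python sketch of how completeness, duplication and validity might be measured for a single field. It is an illustration only, not our assessment model: the field names (meter_id, location_code) and the set of valid codes are hypothetical and are not taken from the MAP2 catalogue.

```python
from collections import Counter


def integrity_metrics(records, key_field, field, valid_values):
    """Return completeness, duplication and validity ratios for one field."""
    total = len(records)
    if total == 0:
        return {"completeness": 0.0, "duplication": 0.0, "validity": 0.0}

    # Completeness: share of records where the field is populated.
    populated = [r[field] for r in records if r.get(field) not in (None, "")]

    # Duplication: share of records whose key appears more than once.
    key_counts = Counter(r.get(key_field) for r in records)
    duplicated = sum(count for count in key_counts.values() if count > 1)

    # Validity: share of populated values that fall within the allowed set.
    valid = sum(1 for value in populated if value in valid_values)

    return {
        "completeness": len(populated) / total,
        "duplication": duplicated / total,
        "validity": valid / len(populated) if populated else 0.0,
    }


# Hypothetical sample data: field names and codes are assumptions.
records = [
    {"meter_id": "M1", "location_code": "I"},
    {"meter_id": "M2", "location_code": "O"},
    {"meter_id": "M2", "location_code": "X"},  # duplicate key, invalid code
    {"meter_id": "M3", "location_code": ""},   # missing value
]
metrics = integrity_metrics(records, "meter_id", "location_code", {"I", "O"})
```

Low completeness or validity, or high duplication, would each count as a red flag for that field.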

Getting your priorities right
Once we’ve fed these factors into our model, we can generate a set of risk scores that predict which areas of the data are likely to need the most attention. Next, we assess which of those areas are most critical for Open Water compliance, and whether they will be required from day one of Open Water or can be phased in later.

By identifying the data-sets that are most critical and have the highest risk of data quality issues, we can advise on how to proceed with data improvement initiatives. The result is a practical, prioritised programme of work to remedy critical data quality issues well ahead of the Open Water deadlines.
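
As a rough illustration of the prioritisation described above, the Python sketch below combines six factor ratings into a risk score and ranks data items by risk and criticality, with day-one requirements first. The unweighted average, the 0 to 1 scales, the item names and all the ratings are assumptions made for the example; they do not describe the actual model or any real data-set.

```python
from dataclasses import dataclass

FACTORS = (
    "source_suitability",
    "architectural_strength",
    "integrity_of_controls",
    "unity_of_purpose",
    "ease_of_correction",
    "ease_of_transformation",
)


@dataclass
class DataItem:
    name: str
    factor_risk: dict   # factor name -> risk rating, 0 (low) to 1 (high)
    criticality: float  # importance for compliance, 0 to 1 (assumed scale)
    day_one: bool       # required from day one, or can be phased in later


def risk_score(item):
    """Assumed scoring rule: unweighted average of the six factor ratings."""
    return sum(item.factor_risk[f] for f in FACTORS) / len(FACTORS)


def prioritise(items):
    """Day-one items first, then by descending risk x criticality."""
    return sorted(
        items,
        key=lambda i: (not i.day_one, -risk_score(i) * i.criticality),
    )


# Hypothetical items and ratings, for illustration only.
items = [
    DataItem("postcode", dict(zip(FACTORS, [0.0, 0.5, 0.0, 0.0, 0.0, 0.0])), 1.0, True),
    DataItem("site_notes", dict(zip(FACTORS, [1.0] * 6)), 0.5, False),
    DataItem("meter_location", dict(zip(FACTORS, [0.5, 0.5, 0.5, 0.5, 1.0, 1.0])), 1.0, True),
]
ranked = [item.name for item in prioritise(items)]
```

In this example the free-text meter location ranks first: it is needed from day one and scores high on the correction and transformation factors, while the riskier but non-urgent site notes fall to the end of the list.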

Conclusion
A lot of the water industry’s concerns about Open Water’s data requirements relate to the sheer scale of the problem. A very steep rise in data quality seems likely to be required, and there is an ocean of data to deal with.

The beauty of our approach is that you don’t have to boil that ocean. By assessing the risk of poor data quality in each data-set, you can understand where to focus and prioritise. Don’t waste effort on improving things that don’t matter, or that don’t need to be done by day one. Instead, plot a smooth, simple course for Open Water compliance by charting the biggest data risks before you embark – and then you can use this learning as the basis for improving how you manage data in the future.

For more information on our offering for water companies preparing for market opening, visit our water competition page.

AMT-Sybex