
Data: Predictive analysis pays off for Welsh Water

Data scientists at Welsh Water built an analysis tool that can predict when service reservoirs are at risk of bacterial water quality compliance failures, and shaped it into an easy-to-use dashboard for its operational staff.

Kevin Parry with the bacterial compliance predictive tool

The set-up

-Service reservoirs are large tanks, often underground, that store treated drinking water ready for distribution to customers through the pipe network. Since drinking water is expected to meet the highest quality standards it is subject to a weekly regulatory testing regime.

-The presence of bacteria such as E. coli, which may be harmful to human health, or coliforms, which can be an indicator of the conditions for E. coli to develop, leads to a water quality failure. These failures are extremely rare events, but when they occur they can require costly remedial action such as draining the service reservoir and rezoning the network; they can also disrupt service to customers and cause reputational damage.

-For this reason, if a water company can predict where a failure might develop based on the analysis of data showing contributory factors, this will prove an advantage because relatively minor interventions may be able to prevent a failure.

-Welsh Water investigated the association between bacteriological non-compliance and several contributory factors including residual disinfection level (free chlorine), heterotrophic plate counts (colony counts), asset condition, rainfall and the distance between the treatment works and the service reservoir.

-The predictive model has been used to create a dashboard in which Welsh Water’s operational staff can easily see which service reservoirs have the highest risk of failure, using a single colour-coded metric. While more data is available if required, this single figure gives time-pressed operational workers a spur to action and helps them prioritise any remedial work.

-The predictive model is an example of how data can be marshalled into actionable intelligence. Welsh Water’s Information Strategy Enterprise Roadmap (WISER) is a plan to better exploit the not-for-profit company’s data assets and turn them into beneficial action more quickly and effectively.

by James Brockett

The Service Reservoir Bacterial Compliance Model draws on data for the factors that can contribute to water quality failures at service reservoirs – such as the level of chlorine in the water, the bacterial colony count from samples, temperature and asset condition – and calculates the risk of failure from them. The risk is then expressed as a single, colour-coded figure that prompts workers at the utility to carry out remedial action and investigations to protect against such a failure occurring.

Bacterial compliance failures are recorded when bacteria such as E. coli or coliforms are detected in the regular sampling regime overseen by the Drinking Water Inspectorate (DWI). Such incidents usually mean that a service reservoir must be drained in order to investigate the cause, leading to significant costs to the water company, reputational damage and potential disruption to customers’ water supply.

The predictive model can help Welsh Water reduce the risk of non-compliance by prompting the workforce to carry out proactive maintenance, raise the level of disinfection or undertake other interventions.

Dwr Cymru Welsh Water has a data strategy programme called WISER (Welsh Water Information Strategy Enterprise Roadmap) which supports the predictive model and other such data initiatives across the business. This strategy provides a data governance framework that coordinates people, process and technology so that data can be used as an enterprise asset.

A huge amount of data is available to water companies on their operations, and increasingly, the challenge for utilities is how to bring this data to the attention of time-pressed operational staff in the most timely, succinct and digestible fashion so they can act upon it.

If used well, today’s data tools have the power to not only give information about what is happening and to analyse it for patterns, but also to use these patterns to predict the future and to prescribe actions to improve matters – this use of data is integral to the modern vision of a ‘smart network’ for water.

For this reason, data has been described as the water industry’s biggest resource, with much attention being paid to the best way to harness this resource to the benefit of customers, employees and companies.

The Application

The man behind the Bacterial Compliance Model is Kevin Parry, Principal Statistician at Welsh Water, who began work on it in 2015 as part of a master’s degree he was completing in operational research and applied statistics.

The first step in the project was a literature review and research exercise, to establish a list of all possible contributory factors to a bacterial compliance failure in service reservoirs (SRVs). The laboratory testing results themselves provided the most obvious data source, giving information on water temperature, bacterial colony counts and residual chlorine levels. However, other asset data – such as the source of the water used and the distance between the treatment works and the service reservoir – were also used, along with rainfall data.

The further water travels between treatment works and the SRV, the more likely it is that chlorine efficacy diminishes; rainfall is relevant because bacteria may enter the water through structural issues, such as cracks in the roof of the SRV (ingress).

However, one area in which data was insufficient for inclusion was asset condition: while Welsh Water employees were carrying out regular condition checks every three years, the documents that were being filled out were not in a form that could be easily harvested for data analysis in the model. In response to this finding, the company has started using a new e-form with simple drop-down menus for such visits, the information from which is automatically fed into a database.

Before building the model, Parry carried out two stages of statistical analysis: first, univariate testing, which looks for direct relationships between individual factors and failures; and second, multivariate testing, which examines multiple factors together to see how they relate to each other. The findings from these processes were displayed visually and sense-checked by operational colleagues before being taken forward for use in the predictive model.

“This was very much a collaborative effort where we worked closely with our operational teams - they had plenty of buy-in and were very interested in the results as they were coming out,” says Parry. “They were in a position to say ‘that makes sense’ or ‘that’s a bit surprising’ or whatever it may be. They were very involved in that process and I used data visualisation techniques to make the more complex results as understandable as possible for them which benefited both parties.”

The strongest factors that emerged as predictors of failure were the residual amount of free chlorine in the water, and the total colony count of bacteria present. On the other hand, rainfall was found to be less significant.
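As a rough illustration of the univariate stage, the sketch below compares the odds of failure for samples with low free chlorine against those with higher free chlorine. The article does not say which statistical tests Parry used, so the odds ratio, the 0.2 threshold, the field names and the sample records are all assumptions made up for illustration, not Welsh Water data.

```python
def odds_ratio(records, factor, threshold):
    """Odds of failure when `factor` is below `threshold`, relative to
    the odds when it is at or above the threshold."""
    low_fail = low_pass = high_fail = high_pass = 0
    for record in records:
        is_low = record[factor] < threshold
        if is_low and record["failed"]:
            low_fail += 1
        elif is_low:
            low_pass += 1
        elif record["failed"]:
            high_fail += 1
        else:
            high_pass += 1
    # Haldane-Anscombe correction: add 0.5 to every cell so that a zero
    # count (likely when failures are rare) does not cause a division by zero.
    return ((low_fail + 0.5) * (high_pass + 0.5)) / (
        (low_pass + 0.5) * (high_fail + 0.5)
    )


# Invented sample results (free chlorine in mg/l); not real data.
samples = [
    {"free_chlorine": 0.05, "failed": True},
    {"free_chlorine": 0.08, "failed": True},
    {"free_chlorine": 0.10, "failed": False},
    {"free_chlorine": 0.12, "failed": False},
    {"free_chlorine": 0.35, "failed": False},
    {"free_chlorine": 0.40, "failed": False},
    {"free_chlorine": 0.45, "failed": True},
    {"free_chlorine": 0.50, "failed": False},
    {"free_chlorine": 0.55, "failed": False},
    {"free_chlorine": 0.60, "failed": False},
]

chlorine_odds = odds_ratio(samples, "free_chlorine", threshold=0.2)
```

A ratio well above 1 would suggest that low residual chlorine is associated with failure; a ratio near 1 would suggest little direct relationship, which is the kind of result the article reports for rainfall.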

For building the predictive element of the model, Parry had the choice of using any of 12 different machine learning and sampling techniques. The chosen method was eventually dictated by the imbalance of the dataset: Parry was able to draw on six years of data in total, running from 2009 to 2015, but the number of failures in this period was still relatively small. To remedy this imbalance, a method was used which created additional synthetic observations to add to the dataset, before a more standard classifier based on regression analysis was applied.
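The article does not name the method used to create the synthetic observations. One well-known technique of this kind is SMOTE, which generates new minority-class points by interpolating between an existing failure record and one of its nearest neighbours among the other failures. The sketch below is a minimal, SMOTE-style illustration under that assumption; the function name, the feature pairs (free chlorine, colony count) and the values are hypothetical.

```python
import math
import random


def synthesise_minority(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority-class
    points and their nearest minority-class neighbours (SMOTE-style).

    Requires at least two minority points. In practice features would be
    scaled first so that no single feature dominates the distance.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the line segment from base to other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


# Invented failure records: (free chlorine in mg/l, colony count).
failures = [(0.05, 120.0), (0.08, 95.0), (0.10, 150.0), (0.07, 110.0)]
extra_failures = synthesise_minority(failures, n_new=6)
```

The synthetic records would be added to the training data only, so that the classifier sees a more balanced mix of failures and passes; the model would still be evaluated on real observations.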

The model produced gives a single metric for the risk of failure at each SRV, and was found to be able to predict failure with an accuracy of 70%.

The next stage was to present this in an interactive tool which would be a spur to action for operational staff. This took the form of a ‘dashboard’ with the level of risk at each service reservoir represented as red, yellow or green; operational colleagues can see instantly where the points of greatest risk are by glancing at the colours on a map, and can also pull up a list of the five SRVs that present the highest risk in any geographical area.
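The dashboard logic described here, a colour band per reservoir plus a list of the highest-risk sites in an area, can be sketched in a few lines. The thresholds, reservoir names, area labels and scores below are hypothetical; the article does not say how Welsh Water maps risk scores to colours.

```python
def risk_band(score, yellow_from=0.3, red_from=0.6):
    """Map a risk score between 0 and 1 to a dashboard colour.
    The two cut-offs are illustrative assumptions."""
    if score >= red_from:
        return "red"
    if score >= yellow_from:
        return "yellow"
    return "green"


def top_risks(scores, area, n=5):
    """Return up to n (reservoir, score) pairs for one area, highest risk first."""
    in_area = [
        (name, score)
        for name, (site_area, score) in scores.items()
        if site_area == area
    ]
    return sorted(in_area, key=lambda item: item[1], reverse=True)[:n]


# Invented reservoirs: name -> (area, risk score).
scores = {
    "SRV-A": ("North East", 0.72),
    "SRV-B": ("North East", 0.15),
    "SRV-C": ("North East", 0.41),
    "SRV-D": ("South West", 0.66),
}

for name, score in top_risks(scores, "North East"):
    print(name, score, risk_band(score))
```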

“The advantage of using a single score is that our operational colleagues don’t have to look through lots of different reports, graphs and tables to make the decision about where the highest risk is,” concludes Parry. “It means that they can spend less time looking at data analysis and more time carrying out the remedial works that might be required.”

The User

“The bacti-predictor model has allowed my colleagues and I in North East Wales to take a strategic and scientific approach to managing service reservoir actions and to track performance across distribution areas. The tool itself is easy to use and gives us a great overview of data which otherwise would be spread over several different locations. I can compare sites and review several years’ worth of data.” Rosie Winter, Water Quality Scientist, Dwr Cymru Welsh Water

Wide Angle: Natalie Jakomis, Head of Data, Welsh Water

"Welsh Water is driving and funding a long-term permanent change in mindset and behaviour when it comes to data. We treat data as a corporate asset: it’s funded and managed just like all our other assets across the business.

“Historically, we’ve had data challenges from our corporate legacy systems, and we could have been better at getting access to data, extracting that data and generating insight from it in a timely and efficient manner.

“So what we’ve embarked on here is a data strategy journey, which we’ve called WISER - Welsh Water Information Strategy Enterprise Roadmap.

“It’s all about allowing Welsh Water to better exploit its data assets, so the insight that the data science team and the data teams across the business are able to generate can be turned into actionable behaviour more quickly and more effectively.

“We’ve undertaken workshops to look at which business areas require immediate data governance attention to improve our service: assets, customers and water quality came out as the three prioritised ‘data domains’. Within those data domains we are undertaking multiple workstreams: everything from business term definitions to classification standards and data quality key performance indicators.

“Each domain has data owners - who are accountable at a very high level for the trustworthiness and safeguarding of the data – and data stewards, who are the ‘doers’ who are directly involved with it. In the past, members of the data team have spent a lot of time improving the quality of the data, and mapping and aligning data across systems – but that’s not where their time is best spent. So as part of WISER, we’re also looking at improving that data quality automatically, by ensuring the data flows correctly across different systems.

“In a nutshell, WISER is about getting the right people involved, at the right time, in the right way, using the right data, to make the right decision, and ultimately leading to the right solution, to earn the trust of our customers every day.”


© Faversham House Group Ltd 2013. WWT and WET News news articles may be copied or forwarded for individual use only. No other reproduction or distribution is permitted without prior written consent.
