The value of open data in the water sector
Published on: 06 Jun 2023
By: Amy Jones
Together, Spring, United Utilities, and WRc have launched the first of Ofwat’s open data projects, with a free-to-download library of sewer images. This incredible advancement will be used to accelerate the development of AI software to assist in the inspection of sewers, helping prevent sewer flooding and overflow spills.
The launch event for the main project two weeks ago showcased the results of an Ofwat-funded project that sought to remove one of the main obstacles to accelerating the use of AI in assessing sewer condition. The well-attended event also marked the release of a library of classified images, available as an open data source, that can be used to train AI software to recognise features found within sewer pipes. The image library, a global first, is seen as a way of enabling innovation in this sector by supplying the essential ingredient for an accurate and reliable AI solution: data!
This collaborative project between seven water companies (United Utilities, Thames Water, South West Water, Dŵr Cymru, Scottish Water, Severn Trent, and Yorkshire Water) saw WRc use previously coded CCTV survey footage, which the project delivery partners provided and collated in a central storage location. A common standard format for the coded footage was agreed and specified, and accompanying metadata was also provided, which included:
With some companies providing up to a terabyte of CCTV footage, transferring such large quantities of data was challenging. Images were spliced from the coded CCTV survey footage at the points where the metadata indicated a defect, and further images were captured just before and after each coded defect to ensure the clearest image was obtained.
For each defect, five images were uploaded, along with the metadata, to the relevant defect-code folder in the storage area so that their categorisation could be validated. This process produced a total of 726,290 images across a range of defects. (Many defect codes had no images assigned to them because they simply did not appear in the dataset.) From this total, an optimum target of 1000 images per defect code was sought; the target was chosen, after discussion with current AI software providers, as the ideal quantity for training an AI solution. By checking and categorising each image for accuracy and clarity, WRc then developed an image library of 27,262 images to act as the single benchmark dataset.
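As a rough illustration of the splicing step, the sketch below computes five capture times for one coded defect: the coded time itself plus two frames just before and two just after. The helper name `capture_times`, the 0.5-second spacing and the symmetric two-before/two-after split are all hypothetical assumptions; the project does not say exactly how WRc chose its frames.

```python
def capture_times(defect_time_s: float, step_s: float = 0.5, count: int = 5) -> list[float]:
    """Return `count` timestamps (in seconds) centred on a coded defect.

    Hypothetical helper: the spacing and the symmetric split around the
    coded time are illustrative assumptions, not the project's method.
    """
    if count < 1 or count % 2 == 0:
        raise ValueError("count must be a positive odd number")
    half = count // 2
    # Offsets run from -half to +half steps around the coded time.
    times = [defect_time_s + i * step_s for i in range(-half, half + 1)]
    # Clamp to the start of the video so early defects never give negative times
    # (this can repeat 0.0 for defects coded in the first second or so).
    return [max(0.0, t) for t in times]


print(capture_times(12.0))  # five timestamps around a defect coded at 12 s
```

Picking frames either side of the coded moment, rather than the single coded frame, gives the reviewer a choice when the camera is still moving or the view is briefly obscured.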
WRc identified sufficient images for 17 defect codes (1000 images each), and identified and classified images for a further 55 defect codes, which will still be useful in developing AI solutions. The released library therefore contains images for 72 defect codes, well above the 60 defect codes expected at the outset of the project.
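The split between fully populated and partially populated codes can be pictured as a simple tally against the 1000-image target. In the sketch below, only the target comes from the project; the function name `summarise_library`, the one-label-per-image input format, the placeholder codes "A", "B" and "C", and the choice to cap each code at the target are hypothetical assumptions.

```python
from collections import Counter

TARGET_PER_CODE = 1000  # optimum images per defect code, as agreed with AI providers


def summarise_library(labels: list[str], target: int = TARGET_PER_CODE) -> dict:
    """Tally validated images per defect code and split codes by the target.

    `labels` holds one defect-code string per validated image (an assumed
    input format). Each code is capped at `target` images (an assumption).
    """
    counts = Counter(labels)
    kept = {code: min(n, target) for code, n in counts.items()}
    at_target = sorted(code for code, n in kept.items() if n == target)
    below_target = sorted(code for code, n in kept.items() if n < target)
    return {
        "total_images": sum(kept.values()),
        "codes_at_target": at_target,
        "codes_below_target": below_target,
        "total_codes": len(kept),
    }


# Placeholder codes and counts, for illustration only.
sample = ["A"] * 1200 + ["B"] * 1000 + ["C"] * 40
print(summarise_library(sample))
```

Applied to the project's own figures, the 17 codes at 1000 images account for 17,000 of the 27,262 images, leaving 10,262 images spread across the other 55 codes (17 + 55 = 72 codes in total).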
Feedback from the launch has been hugely positive, and ongoing discussions with current AI software providers have shown that their data specialists have already used the library to enhance and improve their solutions. The library is free to download from the Spring website by registering and selecting the Sewer and AI case study. For more information, contact WRc at solutions@wrcgroup.com.