s4h logo

A recurring problem when processing public health information is the vast amount of data in spreadsheets that needs processing. When data is gathered from a number of different locations, data is collected in different formats and there is not necessarily constancy between collection sites. As a result, spreadsheet data collected for public health projects often requires reformatting and manual cleansing, a process which can be extremely time consuming. To address this issue and simplify the process, AoS Health has developed a desktop application, Open Data XLS transformer (or ODX), which can be mapped to automatically cleanse and standardise spreadsheets ahead of loading into an analytics platforms or health information systems. Although this issue is very common, this development initially came from another project, the installation of an AMR One Health Surveillance System in Nepal; the amount of time required to manually cleanse spreadsheets was raised as a major issue in-country, and work on the first version of ODX started as a customised solution to address this issue. The current version of ODX can be manually mapped to automatically standardise selected spreadsheets, however the next steps of this development include a web component to simplify the initial mapping process and ensure that the program can be adaptable and therefore more widely distributed.

Using ODX to reformat, fix and cleanse spreadsheet data

ODX has initially been developed to allow for a clean transfer of information from multiple public health laboratory spreadsheets to DHIS2 specifically for AMR AST/DST testing. The application aims to solve the manual process required to sort data and can make both “corrections” and identify “errors” in spreadsheets uploaded to a data analytics system. It can be modified to map for spreadsheets from different origins and can be programmed to be sent to the desired analytics platform (not exclusively DHIS2). The current version of ODX has been programmed to identify relevant headings and information for data analytics.

The ODX mappings can be set up to ‘correct’ and standardise column headings (e.g. gender / sex), common spelling mistakes or name variations for pathogens and antibiotics and date formats (e.g. Gregorian and Bikram Sambat date formats). It is also able to identify any ‘errors’ in a spreadsheet that would prevent data rows from being sent into an analytics platform, such as missing information or information entered in an unrecognised format. If header information in the public health spreadsheets is not located in the top rows, then the user is able to re-assign the header row; when uploading a spreadsheet into ODX, the user (data manager) will have to enter the row number for where the header. If a laboratory wants to submit a spreadsheet with information for individual sample type, the user must enter the correct sample type before selecting ‘cleanse’, and ODX will automatically include a column with this information for submission.

As no personal or identifiable information is collected as part of the ODX data transmission, ODX can generate a record ID so that the individual record is traceable without it being possible to identify an individual from information contained within the record. This record ID is randomly generated and contains no record information.