Manual versus automated coding of free-text self-reported medication data in the 45 and Up Study: a validation study

Danijela Gnjidic; Sallie-Anne Pearson; Sarah Hilmer; Jim Basilakis; Andrea Schaffer; Fiona Blyth; Emily Banks

doi:10.1071/phrp2521518

RESEARCH ARTICLE (Open Access)

Danijela Gnjidic ^A ^B ^*, Sallie-Anne Pearson ^A ^C, Sarah Hilmer ^B ^D, Jim Basilakis ^E, Andrea Schaffer ^A, Fiona Blyth ^B ^F and Emily Banks ^G

Author Affiliations

^A Faculty of Pharmacy, University of Sydney, NSW, Australia.

^B Sydney Medical School, University of Sydney, NSW, Australia.

^C Sydney School of Public Health, University of Sydney, NSW, Australia.

^D Royal North Shore Hospital and Kolling Institute of Medical Research, Sydney, NSW, Australia.

^E School of Computing, Engineering and Mathematics, University of Western Sydney, NSW, Australia.

^F Centre for Education and Research on Ageing (CERA), Concord Hospital, Sydney, NSW, Australia.

^G Sax Institute, Sydney, NSW, Australia.

^* Correspondence to: danijela.gnjidic@sydney.edu.au

Public Health Research and Practice 25, e2521518 https://doi.org/10.17061/phrp2521518

Published: 30 March 2015

© 2015 Gnjidic et al. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence, which allows others to redistribute, adapt and share this work non-commercially provided they attribute the work and any adapted version of it is distributed under the same Creative Commons licence terms.

Abstract

Background: Automated methods are increasingly used to code free-text medication data, but evidence on the validity of these methods is limited.

Aim: To examine the accuracy of automated coding of previously keyed-in free-text medication data compared with manual coding of the original handwritten free-text responses (the ‘gold standard’).

Methods: A random sample of 500 participants enrolled in the 45 and Up Study was selected (475 with and 25 without medication data in the free-text box). For manual coding, medication experts keyed in the free-text responses and coded them using Anatomical Therapeutic Chemical (ATC) codes at four levels: chemical substance (7-digit), chemical subgroup (5-digit), pharmacological subgroup (4-digit) and therapeutic subgroup (3-digit). The automated approach took free-text responses keyed in by non-experts, coded the entries using the Australian Medicines Terminology database and assigned the corresponding ATC codes.

Results: Manual coding recorded 1377 free-text entries, of which 1282 medications were coded to ATCs. The sensitivity of automated coding compared with manual coding was 79% (n = 1014) for entries coded at the exact (7-digit) ATC level, and 81.6% (n = 1046), 83.0% (n = 1064) and 83.8% (n = 1074) at the 5-, 4- and 3-digit ATC levels, respectively. The sensitivity of automated coding for blank responses was 100% compared with manual coding. Compared with the manual approach, sensitivity of automated coding was highest for prescription medications and lowest for vitamins and supplements. Positive predictive values for automated coding were above 95% for 34 of the 38 individual prescription medications examined.

Conclusions: Automated coding of free-text prescription medication data shows very high to excellent sensitivity and positive predictive values, indicating that automated methods can potentially be useful for large-scale, medication-related research.
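The level-wise comparison reported in the Results can be illustrated with a minimal Python sketch. It assumes that sensitivity is the number of automated codes agreeing with the manual code divided by the number of manually coded medications, and that agreement at a coarser ATC level means the first 5, 4 or 3 characters of the two codes match. The function name, the example code pairs and the data layout are hypothetical and are not taken from the study.

```python
# Characters of an ATC code kept at each level named in the abstract.
ATC_LEVELS = {"7-digit": 7, "5-digit": 5, "4-digit": 4, "3-digit": 3}


def sensitivity_by_level(pairs):
    """Return the share of manual codes matched by the automated code per level.

    pairs: list of (manual_code, automated_code); automated_code is None
    when the automated approach assigned no code (assumed representation).
    """
    total = len(pairs)
    result = {}
    for name, n_chars in ATC_LEVELS.items():
        hits = sum(
            1
            for manual, auto in pairs
            if auto is not None and manual[:n_chars] == auto[:n_chars]
        )
        result[name] = hits / total
    return result


# Hypothetical illustrative pairs (not study data).
example_pairs = [
    ("C10AA05", "C10AA05"),  # agrees at the exact 7-digit level
    ("C10AA05", "C10AA01"),  # agrees from the 5-digit level upwards
    ("N02BE01", "N02BA01"),  # agrees from the 4-digit level upwards
    ("A11CC05", None),       # not coded by the automated approach
]
example_sensitivity = sensitivity_by_level(example_pairs)

# Counts reported in the abstract, re-expressed as percentages of the
# 1282 manually coded medications.
reported_counts = {"7-digit": 1014, "5-digit": 1046, "4-digit": 1064, "3-digit": 1074}
reported_percent = {
    level: round(100 * count / 1282, 1) for level, count in reported_counts.items()
}
```

Under these assumptions, `reported_percent` gives 79.1, 81.6, 83.0 and 83.8 for the four levels, which agrees with the figures quoted in the abstract and suggests that the denominator is the 1282 manually coded medications.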