[Webinar Recording] Automate the Coding of Drug Names With WHODrug Koda

On July 23, 2020, Caz Halsey, Director, Life Sciences, Perficient, and Damon Fahimi, Product Manager, Uppsala Monitoring Centre, hosted a webinar that discussed WHODrug Koda and how it can be integrated with other systems using an application developed by Perficient.

The topics covered include:

  • WHODrug Global and Drug Coding Background
  • An Introduction to WHODrug Koda
  • Why Use Koda
  • Methods of Using Koda
  • Perficient’s Koda Interface – Prototype
  • Use Cases

For more information about WHODrug Koda or Perficient’s solution that is capable of integrating Koda with your systems, please reach out to Caz Halsey at


Webinar Transcript

My name is Caroline Halsey. I am a Director in the Life Sciences Business Unit at Perficient. I have 20 years’ experience in the pharmaceutical industry, including:

  • Clinical data management
  • Project & program management
  • Process design & system implementation
  • Computer system validation

For the past few years, I have been focussed on medical coding processes and systems. Today, I am here to describe how you could integrate the UMC’s WHODrug Koda coding engine into your coding process via its webservice API and I am pleased to be joined by Damon Fahimi from the UMC to discuss this.

Damon will introduce WHODrug coding and WHODrug Koda. Then I will discuss how it could be used and what we are working on at Perficient to enable the integration of Koda into medical coding processes.

Thanks Caz! I’d like to start us off by giving you all some background on who we, UMC, are, and what WHODrug Global is, before talking about the star of today, WHODrug Koda. After that, I’ll hand over to Caz to let her guide you through Perficient’s work and experiences with Koda. After our respective presentations, we’ll of course both be available for questions: I’ll be glad to answer questions related to Koda itself from the UMC side, and Caz can answer questions related to the work done by Perficient.

But now, to get everyone on the same page, I wanted to start us off with some basic information on who we, the UMC, are. So let’s take it from the very beginning.

UMC, or Uppsala Monitoring Centre, was established in 1978 to support the WHO Programme for International Drug Monitoring, and since then we’ve been working as a WHO Collaborating Centre from our office in Uppsala, Sweden, which also happens to be my hometown!

We have a vision of a world where all patients and health care professionals make wise therapeutic decisions in their use of medicines and our mission is basically to achieve this by working with and promoting pharmacovigilance on a global level.

We are doing this in different ways, but given the scope of today’s presentation, I would like to draw your attention to WHODrug Global, which is something that we at the UMC have developed and are maintaining. I think most of you are more or less familiar with WHODrug Global, but basically it is a drug dictionary used for drug coding and mandated by leading regulatory authorities.

Before we start to talk about WHODrug Koda, I think it’s really important that we all understand what the most significant challenges with regard to drug coding are. So on these slides are some of the key challenges, as reported by WHODrug users when we’ve been talking to them.

First of all, drug coding requires highly trained staff, not only within the field of medical coding, but also when it comes to internal coding conventions.

What’s more, the volume of drug data to structure is constantly increasing. Did you know, for example, that the number of unique clinical trial studies increased by nearly 90% between 2010 and 2019? Of course, this leads to more drug data to code. To top it off, we also need to have a scalable operation, so that we can handle temporary peaks of data to code, often with tight deadlines.

To be able to make the best use of the coded data, both coding quality and consistency are of course really important. How do we make sure that everyone within a coding team codes the same way?

Finally, we have come to understand that the available supportive tools and processes, such as synonym lists and existing autoencoders, do not always fulfil user needs and take a lot of time to figure out, maintain and version.

With all of this in our minds, we asked ourselves: could the UMC potentially step in and do something that could facilitate the drug coding process by increasing coding efficiency and consistency and making use of technology and WHODrug know-how?

So, a couple of years ago, we started to ask ourselves: do we, the UMC, have a role to play here?

What if…

Based on all of this, we kind of said that, well, maybe we can’t just be an organisation that provides WHODrug Global and leaves all the challenges pertaining to the drug coding itself to our users. Don’t we have a responsibility to also make sure that we facilitate the drug coding process? We also realised that we actually have unique insights into WHODrug Global, as we are the maintenance organisation and also have connections with expert WHODrug users worldwide, both from the industry and on the regulatory side of things.

Based on this, we initiated a project in which we wanted to explore if we, with the help of machines, could find an intelligent approach to automating drug coding while still using the human brain, when needed.

The project evolved, and I am actually really glad that, after spending very, and I mean VERY, much time discussing this internally and with reference WHODrug users worldwide, we were able to make something available to WHODrug users in March 2019, namely WHODrug Koda.

Koda feeds on raw data that is normally collected in both drug safety and clinical trials, namely drug verbatim, route and indication. None of the information needs to be pre-coded, so you just provide Koda with the information coming directly from the sites.

Drug coding can in fact be divided into two parts: drug NAME coding, and ATC selection if there is more than one ATC code assigned for a specific drug name. What WHODrug Koda spits out is both coded drug names and one selected ATC code, based on the raw data.

Please note that coding is performed even if indication or route, or both, are missing. But naturally, the performance and accuracy are much higher if all three values are present, as they would be for a manual coder.
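To make the input and output described here a little more concrete, the following is a minimal Python sketch of the data shapes. The class and field names (`KodaInput`, `KodaOutput`, `drug_code` and so on) are illustrative assumptions and not the actual Koda schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KodaInput:
    """Raw, uncoded data as collected at the site (hypothetical field names)."""
    verbatim: str                     # drug name as reported, may contain typos
    route: Optional[str] = None       # optional, but improves accuracy
    indication: Optional[str] = None  # optional, but improves accuracy


@dataclass
class KodaOutput:
    """Coding result (hypothetical field names, not the real response schema)."""
    drug_code: Optional[str]   # coded drug name, None if the term stays uncoded
    atc_code: Optional[str]    # the single selected ATC code, if any
    high_certainty: bool       # False marks a lower-certainty prediction


def inputs_present(record: KodaInput) -> int:
    """Count how many of the three input values are actually filled in."""
    values = (record.verbatim, record.route, record.indication)
    return sum(1 for v in values if v and v.strip())
```

A helper like `inputs_present` could be used to flag records where only the verbatim is available, since those are the ones where lower accuracy should be expected.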

This is a screen shot from the WHODrug Koda web application, which we will talk about in a few more minutes. But basically, this slide shows you the same information as on the previous slide.

You have the input data on your left-hand side. Please note that none of the input data needs to be pre-coded or structured, since Koda can, for example, handle spelling errors.

On your right hand side, you can find the output from Koda; coded drug names and selected ATC codes.

WHODrug Koda performs drug coding on two confidence levels. Whenever WHODrug Koda is able to provide high certainty predictions, the output data has normal colours. When WHODrug Koda is providing lower certainty predictions, there will be a yellowish background colour as seen here.
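One possible way to act on the two confidence levels in a downstream process is to accept high-certainty predictions automatically and send everything else to a human coder. The sketch below assumes each prediction is a dictionary with hypothetical `drug_code` and `high_certainty` keys; how lower-certainty predictions are actually handled is a decision for each coding team.

```python
def split_for_review(predictions):
    """Separate predictions into auto-accepted and manual-review lists."""
    accepted, needs_review = [], []
    for p in predictions:
        coded = p.get("drug_code") is not None
        if coded and p.get("high_certainty", False):
            accepted.append(p)        # high-certainty prediction
        else:
            needs_review.append(p)    # lower certainty or uncoded
    return accepted, needs_review
```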

On the next few slides, I will try to describe how WHODrug Koda can do all of this. This is something I could speak on for hours, but I’ve tried to condense it a bit for you today, so please bear with me.

From the very beginning, we were kind of certain that appropriate use of the latest technology would be a crucial factor for us to succeed, and we looked into artificial intelligence and machine learning basically from day 1. In the end, we did make use of a supervised machine learning model via logistic regression. Basically, this is a classification algorithm used to assign observations to a discrete set of classes.

We provided the model with a lot of training data, which I will talk more about in a minute, which is basically teaching the machine learning model how to perform the task we want it to learn. And eventually, the idea is that the model should be able to deal with data it has never seen before, and make correct predictions.

An interesting aspect is that the machine learning model is mostly applied to the ATC selection part of Koda, and the reason for that is that the number of so-called classes available for drug name coding is as large as the number of drug names that exist, that is, over half a million. The number of classes of ATC codes is much more suitable for this machine learning model, with around 1,400 classes. With that said, we did actually use machine learning for the drug name coding as well, but then as one component out of many others, which I will describe in a minute.
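As a rough illustration of what "assigning observations to a discrete set of classes" means, here is a tiny multinomial logistic regression prediction step in plain Python. The features, weights and class labels are invented for the example and have nothing to do with Koda's actual model or training data.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, weights, biases, classes):
    """Linear score per class, then softmax, then pick the most likely class."""
    scores = [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best], probs[best]


# Toy setup: two made-up binary features and two placeholder classes.
CLASSES = ["class_A", "class_B"]
WEIGHTS = [[2.0, -1.0], [-1.0, 2.0]]  # invented weights, one row per class
BIASES = [0.0, 0.0]

label, prob = predict([1.0, 0.0], WEIGHTS, BIASES, CLASSES)
```

In a real supervised setting the weights would be learned from labelled training data rather than written by hand, and the returned probability is the kind of value that could be thresholded into higher and lower certainty.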

So how were we able to teach our friend Koda to perform in a good way? Well, training is absolute key, and we were extremely fortunate in that we could make use of something called VigiBase.

VigiBase is a WHO global database consisting of ICSRs from all over the world that is maintained by the UMC. Within VigiBase, there are millions of unique combinations of verbatim, route and indication. What’s more, the drug data in VigiBase has already been coded by the coding team at UMC, which makes it a perfect training data set for Koda. Also, as VigiBase is basically refilled each and every day, we automatically get new training data which we can use to optimise each and every version of Koda.

To be honest with you, from the very beginning, we thought that technology such as artificial intelligence and machine learning would alone be able to solve a large part of the drug coding dilemma. However, we quickly learned that this was not the case. And I want to stress that one important thing we learnt from this project is that technology itself never solves problems. What solves problems is how technology is applied.

In our experience, this meant that the key success factor for us was to build our, and WHODrug users’, WHODrug expertise into the solution.

One important piece of the puzzle was to incorporate built-in coding rules, based on the latest regulatory expectations and best practices. There are systemic coding rules, which are static rules that can be regarded as foundational for all WHODrug use cases. In addition, there are also so-called dynamic rules, which can be turned on or off depending on internal coding conventions.

We also included spell checks and algorithms to help Koda identify key components of the raw data provided to Koda. As you can see from the examples on this slide, drug verbatims can be quite complex and making sense of the data is of course then very important for Koda to make wise predictions.
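To give a feel for how a spell check can recover a drug name from a misspelled verbatim, here is a small sketch using fuzzy matching from Python's standard `difflib` module. This is a generic technique chosen for illustration, not a description of how Koda does it, and the three-name vocabulary is a stand-in for a real dictionary.

```python
import difflib

# Tiny stand-in vocabulary; a real dictionary would hold far more names.
KNOWN_NAMES = ["paracetamol", "aspirin", "ibuprofen"]


def suggest_name(verbatim, cutoff=0.8):
    """Return the closest known name for a possibly misspelled verbatim, or None."""
    cleaned = verbatim.strip().lower()
    matches = difflib.get_close_matches(cleaned, KNOWN_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this vocabulary, `suggest_name("Asprin")` resolves to `"aspirin"`, while an unrelated string such as `"xyz"` gives `None`. The `cutoff` value controls how tolerant the matching is and would need tuning on real data.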

I would say that maybe the most important piece of Koda’s intelligence originates from WHODrug expertise: from the pharmacists and medical experts at UMC, who work with maintaining and developing WHODrug on a full-time basis, but also from the intelligence of WHODrug users worldwide.

There is also a direct feedback loop which users of Koda can make use of to influence coding predictions. This feedback functionality is available for all WHODrug Koda users and allows users to indicate potential prediction errors. This feedback is then made available to the Koda maintenance team at UMC, which validates the feedback and can retrain Koda accordingly.

So now you know quite a lot about how WHODrug Koda works, but how about performance you might ask? Let’s have a look at some results.

First of all, it’s important to understand that there aren’t any absolute results. Of course, the results will vary according to the data you feed WHODrug Koda with, how many of the three input parameters you have, and so on. But with that said, we have seen quite a few big batches of data, from various external sources, being run through Koda, with more or less consistent results, as per this slide.

We are gonna start with the efficiency, that is the percentage of the terms that WHODrug Koda can actually code.

For the drug name coding, Koda codes around 95% of all drug names: around 80% as high-certainty predictions and an additional 15% as lower-certainty predictions. Around 5% of the data remains uncoded.
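As a quick worked example of what these approximate drug name coding rates (80% high certainty, 15% lower certainty, 5% uncoded) would mean for a batch, here is a small Python helper. The batch size is made up, and as noted above the rates are not absolute and will vary with the data.

```python
def review_workload(total_terms, high_rate=0.80, low_rate=0.15):
    """Split a batch into high-certainty, lower-certainty and uncoded counts."""
    high = round(total_terms * high_rate)
    low = round(total_terms * low_rate)
    uncoded = total_terms - high - low   # whatever is left stays uncoded
    return {"high_certainty": high, "lower_certainty": low, "uncoded": uncoded}
```

For a hypothetical batch of 10,000 verbatims this gives 8,000 high-certainty codes, 1,500 lower-certainty codes and 500 uncoded terms.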

If we then look at the ATC selection, WHODrug Koda can successfully and confidently select ATC codes for around 80% of the coded drug names, with an additional 20% selected with lower certainty. Please note that Koda always provides at least a lower-certainty ATC selection, as long as indication information is provided. If you do not provide indication information, you will of course see lower percentages.

I think that maybe the most important aspect of looking at the results is how often WHODrug Koda is right vs. wrong.

When we have manually evaluated the precision of Koda predictions, we’ve seen that Koda is correct in more than 99% of the cases. We have also seen that Koda actually makes mistakes just as often as, or even more rarely than, human coding teams, which to me is very promising.

If you want to dig into the details of what you can expect with Koda in terms of results, I would finally like to recommend that you follow the link on this slide, describing an evaluation of Koda from Novo Nordisk.

Finally, how can WHODrug Koda be accessed? The service of WHODrug Koda is available as both a web application and as an API service, which can be integrated in your existing coding tool or similar.

Our vision is that the web application might be helpful for evaluations of WHODrug Koda features or for coding concomitant medications and ATC classes in smaller studies.

If you are planning to code large amounts of data, however, UMC recommends implementing the Koda API in your workflow, for example by making use of Koda within your coding tool.

Before handing over to Caz, I just wanted to explain a bit about the collaboration we, UMC, have with Perficient.

This means that they have all information about how the API works, they have validation information and of course, we are also supporting our Perficient colleagues with all the support they need in regards to their work with Koda.

For us, collaborations with Perficient and other vendors and implementation partners are extremely important, and I really want to thank Perficient for a fruitful collaboration so far. And of course, thanks for having me here today on this webinar as well.

With that, we’ve reached the end of my part of the presentation. So Caz, please take it away! And I’ll of course be here for questions at the end of our slot.

As you have heard, Koda can help you with your daily drug coding business, but it can also help to recode data by upgrading your WHODrug version or even your WHODrug format.

For example, when recoding a study from B2 to B3, a large number of terms may no longer code. Submitting those terms to WHODrug Koda could dramatically reduce the manual effort of recoding.

One of the main challenges in the use of AI in medical coding is the significant time, effort and volume of data required to train the AI system. With WHODrug Koda, the training has already been done for you.

As we’ve heard from Damon, one way to use Koda is by manually uploading and downloading data through the Koda UI, but it is also possible to fully integrate it into your systems by using the Koda webservice Application Programming Interface.

Now, using webservice APIs is not always straightforward, as you have to deal with technicalities like transforming data, generating network traffic, security certificates etc.

It would simplify integration if you had an additional interface level available, which hides all the technical bits. Some of my Perficient colleagues have built a prototype of such an interface, and this is what I’d like to talk about today.

I’ll give an overview of the design, usage and possible integration and let you know how to find out more if you are interested.

In order to build such an interface we needed first to decide on a platform, and we have chosen Oracle. This choice was kind of arbitrary, and of course you could build a similar interface on any other platform.

This prototype consists basically of

  1. a table which contains the drug data to be coded and mimics the source application
  2. plus some programming objects in PL/SQL, which interact with the webservice API right out of the database.

The only system requirement is that you have Oracle version 11 or higher, and you need to have the UMC certificates installed and an open network connection from your database to the UMC server. No further infrastructure is needed.

In addition to the interface, a custom adapter is required for each source system.

Here you can see the designed workflow.

The source system or application sends the drugs to be coded via the custom adapter to the interface.

The interface converts the data into the format expected by the Koda application and calls the Koda webservice API.

It then gets back the response from Koda with coding information, parses the response and updates the drug data with the coding information in the source system.

Triggering the interface could of course be an automated process, e.g. a regular batch job or automated execution whenever the table of drugs has been updated.

This design actually hides all the technically complicated bits from the end user or end application.

Now let’s have a look at a real example.

The columns with blue headings here reflect data which is exchanged with Koda.

Initially the drug table contains drug verbatims, some of them with route and / or indication.

Now we trigger the interface.

The interface now

  1. Transforms the data into Koda format
  2. Builds the network connection and sends data to Koda
  3. Gets the response in Koda format
  4. Parses the response
  5. Puts coding information back into the source table
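The five steps above can be sketched in a few lines. The prototype itself is written in PL/SQL; Python is used here only for illustration. The JSON payload shape, the field names and the `send` callable are assumptions, not the real Koda API, and the network call is replaced by a stand-in that returns placeholder codes.

```python
import json


def to_request(rows):
    """Step 1: transform source rows into a request payload (assumed JSON shape)."""
    return json.dumps([
        {"id": r["id"], "verbatim": r["verbatim"],
         "route": r.get("route"), "indication": r.get("indication")}
        for r in rows
    ])


def parse_response(raw):
    """Step 4: parse the raw response into a dict keyed by row id."""
    return {item["id"]: item for item in json.loads(raw)}


def run_interface(rows, send):
    """Run steps 1-5; `send` covers steps 2-3 and returns the raw reply."""
    raw = send(to_request(rows))
    coded = parse_response(raw)
    for row in rows:  # Step 5: put coding information back into the source rows
        result = coded.get(row["id"], {})
        row["drug_code"] = result.get("drug_code")
        row["atc_code"] = result.get("atc_code")
    return rows


def fake_send(payload):
    """Stand-in for the web service call; returns placeholder codes only."""
    return json.dumps([
        {"id": item["id"], "drug_code": "DRUG-PLACEHOLDER",
         "atc_code": "ATC-PLACEHOLDER"}
        for item in json.loads(payload)
    ])
```

Passing the network call in as a function keeps the transformation and parsing logic testable without a connection, which is the same separation of concerns the interface layer is meant to provide for the source system.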

This is the part that nobody wants to deal with as it looks like endless lines of code.

This is what the table of coded data looks like after the interface execution.

The interface has updated the code and ATC code columns with some values.

Please note that here you can only see the encoded information; the comments column shows some information on suggested drugs, if available. Let’s have a look at a few examples in more detail…

Line 1 is paracetamol, which is correctly coded to its drug code and ATC code although no indication was provided. This is because Paracetamol has only one ATC code.

Lines 2 to 6 are some records with Aspirin, which has 4 ATC codes from 4 different classes, so the choice between them is fairly distinctive. For correctly spelled indications, the drug code and ATC code are always correct; for the misspelled indication in line 4, the ATC code is incorrect.

Lines 7 to 11 are some examples where we have tried non-unique drugs, non-existing names, umbrella terms, drugs with multiple ATC codes from the same class etc. All codes which were provided were as expected and correct. Suggested information is also as expected and correct.

Of special interest are lines 12 to 14, which are an example of how you could use Koda for a version or format upgrade. The tradename SEBCUR was non-unique in a previous WHODrug version and became unique in a more recent version. Koda is able to correctly encode or suggest the code from the current version if you pass in the tradename with ingredients attached, as it was stored in the previous version.

Such an Oracle based interface could be potentially integrated with any Oracle based system, for example Argus Safety, where you could use it for coding during case entry, or for coding during case save.

Or you could integrate it with Oracle TMS, where you could for example auto-create non-approved VTAs for coding omissions which you send to Koda and where Koda returns a code.

As already mentioned, this prototype is currently Oracle based, but could be explored for other platforms as well, so please don’t go away with the impression that this would only work if you use Argus and TMS.

This wraps up today’s webinar, thank you all for attending.

Garrett Hill
