The creation, validation and maintenance of the medical knowledge base follow rigorous and well-established procedures. The content development process is divided into stages, as outlined in the diagram below.
The process can be efficiently repeated with automated regression tests to ensure the stability of the system.
Content development begins with defining the scope of the desired changes. With the help of our data scientists, our medical team analyzes statistical information collected by our symptom checkers to discover which areas of our system should be further expanded. This may include adding new pathologies or symptoms or expanding representations of currently available conditions.
Scope definition may also take into account the requirements of or data provided by our customers and partners, specifying their target audience (e.g. geography, age groups, or range of expected conditions).
Once the scope is defined, our medical content editors collect high-quality literature on the newly introduced topic. Metabase allows them to enter evidence-based information in the form of probabilistic characteristics: a given condition’s prevalence, the influence of risk factors, and the sensitivities of symptoms, each of which can be expressed in either categorical or numerical form. Each piece of introduced medical information contains a reference to the original source, such as a book or article.
Each change in the medical content is stored in the Metabase history, including information regarding the time and author. The system makes it possible to track, review and comment on any change.
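As a rough illustration of what such a history could look like, the sketch below keeps an append-only list of changes, each with an author and a timestamp, plus comments attached to individual changes. It is not Metabase's actual data model: the class names, field names and methods are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Change:
    """One recorded edit. All field names are hypothetical."""
    author: str
    timestamp: datetime
    field_name: str
    old_value: str
    new_value: str


@dataclass
class History:
    """Append-only list of changes with review comments per change."""
    changes: list = field(default_factory=list)
    comments: dict = field(default_factory=dict)  # change index -> list of comments

    def record(self, author, field_name, old_value, new_value):
        """Store a change stamped with the current UTC time; return its index."""
        change = Change(author, datetime.now(timezone.utc),
                        field_name, old_value, new_value)
        self.changes.append(change)
        return len(self.changes) - 1

    def comment(self, index, text):
        """Attach a review comment to an existing change."""
        if not 0 <= index < len(self.changes):
            raise IndexError("no such change")
        self.comments.setdefault(index, []).append(text)

    def by_author(self, author):
        """Return all changes made by one author, oldest first."""
        return [c for c in self.changes if c.author == author]
```

Because entries are only ever appended, a reviewer can always see who changed what and when, and discuss a specific change without altering it.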
The next step is peer review of the new content by another medical content editor. The newly provided data and all sources are validated. Possible improvements are discussed and introduced by the editors. Metabase simplifies collaboration between doctors by offering features for exchanging and resolving comments. Doctors can also notify other editors about their progress by using a system of flags and statuses (e.g. condition in progress, ready for peer-review, requires attention, etc.).
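One way to picture the flags and statuses is as a small state machine. The sketch below uses the three statuses named above; the `ACCEPTED` status and the transition table are assumptions added for illustration and do not describe how Metabase actually restricts status changes.

```python
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "condition in progress"
    READY_FOR_PEER_REVIEW = "ready for peer-review"
    REQUIRES_ATTENTION = "requires attention"
    ACCEPTED = "accepted"  # hypothetical status, not named in the text


# Hypothetical transition table: which status may follow which.
ALLOWED = {
    Status.IN_PROGRESS: {Status.READY_FOR_PEER_REVIEW},
    Status.READY_FOR_PEER_REVIEW: {Status.REQUIRES_ATTENTION, Status.ACCEPTED},
    Status.REQUIRES_ATTENTION: {Status.IN_PROGRESS, Status.READY_FOR_PEER_REVIEW},
    Status.ACCEPTED: set(),
}


def move(current, target):
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"cannot move from {current.value!r} to {target.value!r}")
    return target
```

Making the allowed transitions explicit means content cannot skip peer review by accident, for example by jumping straight from "condition in progress" to an accepted state.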
After finalizing the peer-reviewed definition of the new content, our doctors are responsible for finding and entering literature-based clinical cases to validate the performance of the system on the newly provided medical content.
We continually test our medical knowledge base and reasoning algorithm against cases reported in journals, including complex clinicopathological conference (CPC) cases published in The New England Journal of Medicine, to check that it performs well across a wide range of clinical presentations.
To provide a satisfactory level of performance, we have developed a methodology for monitoring and ensuring the quality of the system using real patient clinical cases. In parallel with medical knowledge base development, we assemble case reports, each consisting of a description of the clinical situation and a list of expected outcomes (diagnoses). We collect patient cases from reputable and well-known sources such as BMJ, NEJM, The American Journal of Medicine, Mayo Clinic Proceedings, BioMed Central, Oxford Journals, and many others, including educational literature (e.g. 100 Cases in Clinical Medicine) as well as United States Medical Licensing Examination Step 2 CK tests. In the future we may consider publishing or open-sourcing a complete list of the sources and cases used.
Construction of a clinical test case begins with finding a source report. In the example below we refer to the case of a “33-Year-Old Woman With Epigastric Pain and Hematemesis”, published in Mayo Clinic Proceedings (Mayo Clin Proc. 2012 Feb; 87(2): 194–197).
The source article is then carefully analyzed by a physician, and all clinical features are extracted and entered into a newly created case in Metabase. The clinical features include the age and sex of the patient and any confirmed or excluded findings, such as symptoms, risk factors and lab test results. Every test case must also contain the resulting condition (diagnosis) and the acceptance criterion.
The acceptance criterion applied to each patient case is that the expected condition appears within the top 3 or top 5 positions of the differential diagnosis ranking, depending on the complexity of the case. Since the goal of the system is to identify a group of likely conditions rather than to suggest a single definitive diagnosis, requiring the condition to fall within such an interval is a reasonable approach, and one that has been used in studies of various clinical decision support systems (CDSS).
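The test case structure and the top 3 / top 5 criterion can be sketched in a few lines. The `ClinicalCase` class, its field names and the `meets_criterion` function are hypothetical and chosen only for this example; the ranking passed in would come from the diagnostic engine, which is not modelled here, and the entries other than "Peptic ulcer" are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ClinicalCase:
    """A literature-based test case. All field names are hypothetical."""
    age: int
    sex: str
    expected_condition: str
    present: set = field(default_factory=set)  # confirmed findings
    absent: set = field(default_factory=set)   # excluded findings
    top_k: int = 3                             # 3 or 5, by case complexity


def meets_criterion(case, ranking):
    """Check whether the expected condition is within the top_k positions.

    `ranking` lists condition names from most to least likely.
    """
    if case.top_k not in (3, 5):
        raise ValueError("top_k must be 3 or 5")
    return case.expected_condition in ranking[:case.top_k]


# Usage with the example case from the text; ranking entries after the
# first are placeholders, not real engine output.
case = ClinicalCase(
    age=33,
    sex="female",
    expected_condition="Peptic ulcer",
    present={"epigastric pain", "hematemesis"},
)
ranking = ["Peptic ulcer", "placeholder condition B", "placeholder condition C"]
passed = meets_criterion(case, ranking)
```

Storing `top_k` on the case itself lets simple and complex cases share one evaluation routine while still being held to different thresholds.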
Once all clinical features and test criteria have been provided, Metabase automatically validates the newly created test case against the diagnostic engine. This procedure is repeated in cycles and serves as a regression testing framework for our system. As shown in the figure below, the case of the “33-Year-Old Woman With Epigastric Pain and Hematemesis” meets the acceptance criterion, as “Peptic ulcer” is listed in the first position of the differential diagnosis ranking.
The newly created content is then verified by a doctor from our expert panel who has relevant experience in the given area. Experts can return the process to peer review if any issues are found in either the content definition or the clinical cases. Once the new content is accepted, the technical review begins.
Technical review is performed by a data scientist, who checks whether the content has been developed according to internal guidelines and identifies any potential issues involving the structuring of the medical content (e.g. checking for duplicated symptoms and verifying the hierarchy of symptoms and numerical parameters introduced by the doctors). The technical reviewer works directly with the authors or experts to resolve any doubts that arise.
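Two of the structural checks mentioned above, duplicated symptoms and hierarchy verification, lend themselves to automation. The following is a minimal sketch under simplifying assumptions: symptoms are identified by name, duplicates are detected after normalising case and whitespace only, and the hierarchy is a plain child-to-parent mapping. The function names are hypothetical, and the real internal guidelines may cover far more.

```python
from collections import Counter


def find_duplicates(names):
    """Return names occurring more than once, ignoring case and extra spaces."""
    counts = Counter(" ".join(name.lower().split()) for name in names)
    return sorted(name for name, count in counts.items() if count > 1)


def hierarchy_problems(parent_of):
    """Report unknown parents and cycles in a child -> parent mapping.

    Root symptoms map to None.
    """
    problems = []
    # A parent that is not itself a known symptom is a dangling reference.
    for node, parent in parent_of.items():
        if parent is not None and parent not in parent_of:
            problems.append(f"{node}: unknown parent {parent}")
    # Walk upwards from every node; revisiting a node means there is a cycle.
    for start in parent_of:
        seen = set()
        node = start
        while node is not None and node in parent_of:
            if node in seen:
                problems.append(f"{start}: cycle")
                break
            seen.add(node)
            node = parent_of[node]
    return problems
```

For example, `find_duplicates(["Headache", "headache ", "Fever"])` returns `["headache"]`, and a mapping in which two symptoms name each other as parent is reported as a cycle for both.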
At this point the new diagnostic model is built and all clinical cases are executed. This process is called regression testing. It allows us to test how the newly introduced content affects performance relative to the previous model. This helps ensure the stability of this complex system and allows us to continuously measure its behavior.
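The comparison between the previous and the new model can be sketched as follows. The example assumes that each run produces a mapping from case identifier to a pass/fail flag for the acceptance criterion; that representation, the function names and the case identifiers are assumptions made for illustration.

```python
def pass_rate(results):
    """Fraction of cases that met their acceptance criterion.

    `results` maps a case id to True (passed) or False (failed).
    """
    if not results:
        raise ValueError("no cases were executed")
    return sum(results.values()) / len(results)


def regressions(previous, current):
    """Case ids that passed with the previous model but fail with the new one.

    A case missing from the current run is treated as failing.
    """
    return sorted(case_id for case_id, ok in previous.items()
                  if ok and not current.get(case_id, False))


# Hypothetical results for two model builds.
previous = {"case-1": True, "case-2": True, "case-3": False}
current = {"case-1": True, "case-2": False, "case-3": True, "case-4": True}

broken = regressions(previous, current)  # ["case-2"]
rate = pass_rate(current)                # 0.75
```

Looking at individual regressions as well as the overall pass rate matters, because new content can improve the aggregate figure while still breaking a case that previously passed.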
Clinical test cases vary in complexity and in how rare the represented disease is. The table below summarizes the actual performance of Infermedica’s diagnostic engine when evaluated against a set of several thousand clinical cases.
| Test case set | Performance |
|---|---|
| Excluding complex cases | 88% |
| Common condition cases only | 93% |
Once regression testing is completed and the results have been accepted, our doctors manually test the newly introduced conditions and symptoms. While subjective, this step provides us with a simulation of the real-life experience of using an Infermedica product.
Metabase content is currently available in English, Polish, Russian, Spanish, Portuguese and Chinese. We have extensive experience in preparing localized versions of the system. The content is translated by a native speaker with a medical education, which takes about 120 to 160 hours. With each new release of the content, we make sure all language versions are up to date.
Lastly, the fully tested models are ready to be deployed to the cloud-based API. Once deployment is complete, the new content is available to all users of Infermedica products. With every release of a new model, Infermedica provides a fully transparent changelog of the medical content. This allows all partners to track modifications and the pace of growth of our medical knowledge base.
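A changelog of this kind can be derived by comparing two releases of the content. The sketch below assumes, purely for illustration, that each release is a mapping from an item name to some description of that item; the function name and the placeholder item names are hypothetical and say nothing about how Infermedica actually generates its changelog.

```python
def build_changelog(old, new):
    """Compare two releases given as name -> definition mappings."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # Items present in both releases whose definition differs.
    modified = sorted(name for name in old.keys() & new.keys()
                      if old[name] != new[name])
    return {"added": added, "removed": removed, "modified": modified}


# Placeholder content for two releases.
release_1 = {"condition_a": "v1", "symptom_x": "v1"}
release_2 = {"condition_a": "v2", "symptom_x": "v1", "condition_b": "v1"}

log = build_changelog(release_1, release_2)
```

Here `log` lists `condition_b` as added and `condition_a` as modified, with nothing removed, which is the kind of summary a partner could use to track how the knowledge base grows between releases.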