Covid Research Support with NLP
In this project, a solution was developed to support organization-wide research on SARS-CoV-2 by automating the analysis of documents with NLP.

Challenge
Systematic research on the SARS-CoV-2 virus, for example for the development of vaccines or medications against COVID-19, was one of the central tasks for many companies in the pharmaceutical industry at the time of the project. Because of the worldwide attention and urgency, hundreds of new relevant research and study findings appeared every day. Capturing this information systematically was essential but very challenging because of the sheer volume of unstructured data. The result was either high manual research effort or inefficient duplication of work.
Approach
Together with the client, a deep learning-based solution was developed to reliably identify documents (such as study results) related to COVID-19. A transformer model pre-trained on scientific publications (similar to the model currently used on google.com) was fine-tuned on existing client data. With this transfer learning approach, a prediction accuracy of more than 99% was achieved even with relatively little training data.
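The fine-tuning step described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes the Hugging Face transformers library and PyTorch, and it uses the publicly available allenai/scibert_scivocab_uncased checkpoint as a stand-in, since the text does not name the model that was used. The function names, hyperparameters, and the label convention (1 = COVID-19 related, 0 = not related) are likewise assumptions.

```python
from typing import List, Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def fine_tune(texts: List[str], labels: List[int],
              model_name: str = "allenai/scibert_scivocab_uncased",  # assumed stand-in
              epochs: int = 3, batch_size: int = 8, lr: float = 2e-5):
    """Fine-tune a pre-trained transformer as a binary document classifier."""
    # Imported lazily so the accuracy helper works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # A fresh two-class classification head is placed on the pre-trained encoder.
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for start in range(0, len(texts), batch_size):
            batch = tokenizer(texts[start:start + batch_size], padding=True,
                              truncation=True, max_length=512, return_tensors="pt")
            target = torch.tensor(labels[start:start + batch_size])
            # Passing labels makes the model return the cross-entropy loss.
            loss = model(**batch, labels=target).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return tokenizer, model


def predict(tokenizer, model, texts: List[str]) -> List[int]:
    """Return the predicted class (0 or 1) for each text."""
    import torch

    model.eval()
    with torch.no_grad():
        batch = tokenizer(texts, padding=True, truncation=True,
                          max_length=512, return_tensors="pt")
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()
```

In such a setup, accuracy would be computed on held-out documents that were not used for fine-tuning, for example accuracy(predict(tokenizer, model, test_texts), test_labels), where test_texts and test_labels are hypothetical names for that held-out set.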
Result
The developed model significantly outperformed existing methods, which were based, for example, on predefined search terms. It is currently being integrated into an application and will help scientists search for relevant documents in the client's central knowledge database.
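For comparison, a search-term baseline of the kind mentioned above might look like the sketch below. The term list and the example titles are invented for illustration; the client's actual search terms are not given in the text. The sketch shows one plausible weakness of such methods: a relevant document that uses a name outside the predefined list is missed.

```python
import re
from typing import Iterable

# Hypothetical search terms, chosen only for this illustration.
SEARCH_TERMS = ("covid-19", "sars-cov-2", "coronavirus")


def keyword_match(text: str, terms: Iterable[str] = SEARCH_TERMS) -> bool:
    """Return True if any search term occurs in the text as a whole word."""
    lowered = text.lower()
    return any(
        re.search(r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)", lowered)
        for term in terms
    )


# Invented example titles: the second is relevant but uses an early
# alternative name for the virus that is not in the term list.
titles = [
    "SARS-CoV-2 neutralizing antibodies in vaccinated adults",
    "Spike protein structure of the novel betacoronavirus 2019-nCoV",
    "Seasonal influenza vaccine uptake in older adults",
]
flags = [keyword_match(title) for title in titles]
```

A fine-tuned classifier does not depend on an exact term list in this way, which is one possible explanation for the improvement reported here.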