Dataset: 11.1K articles from the COVID-19 Open Research Dataset (PMC Open Access subset)
All articles are made available under a Creative Commons or similar license. Specific licensing information for individual articles can be found in the PMC source and CORD-19 metadata.

Made by DATEXIS (Data Science and Text-based Information Systems) at Beuth University of Applied Sciences Berlin

Deep Learning Technology: Sebastian Arnold, Betty van Aken, Paul Grundmann, Felix A. Gers and Alexander Löser. Learning Contextualized Document Representations for Healthcare Answer Retrieval. The Web Conference 2020 (WWW'20)

Funded by The Federal Ministry for Economic Affairs and Energy; Grant: 01MD19013D, Smart-MD Project, Digital Technologies

Contextual Discourse Vectors (CDV)

CDV is a distributed document representation for efficient answer retrieval from long documents.

This demonstration shows how the CDV vector space model can be used to retrieve information from a large healthcare dataset. The underlying model was trained on Wikipedia data. See our WWW'20 paper and the GitHub repository for implementation details.

How to use:

  • Search: Enter the name of a disease and, optionally, a specific aspect in the query field. The system retrieves the top 25 passages from the dataset that answer your query.
  • Highlight: Browse through the dataset and see how relevant each sentence in a document is to your query. The shade of blue visualizes the relevance score of the sentence.
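
The two modes above can be illustrated with a minimal sketch. It is not the CDV implementation: it assumes that the query and each sentence are already available as plain numeric vectors, uses cosine similarity as a stand-in relevance score, and introduces the hypothetical helpers `cosine`, `top_k` and `shade` purely for illustration. The actual scoring used by the demo is described in the WWW'20 paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, sentence_vecs, k=25):
    """Search mode (assumed scoring): return (index, score) pairs for the k best sentences."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(sentence_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def shade(score, lo=0.0, hi=1.0):
    """Highlight mode (assumed mapping): clamp a score into a 0..1 colour intensity."""
    if hi <= lo:
        return 0.0
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


# Toy data: a 2-dimensional query and three sentence vectors (hypothetical values).
query = [1.0, 0.0]
sentences = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

ranking = top_k(query, sentences, k=2)
intensities = [shade(score) for _, score in ranking]
```

Here `ranking` lists the two sentences closest to the query, best first, and `intensities` gives the corresponding highlight strength for each. The default `k=25` mirrors the number of passages returned by the search mode.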

Access a different dataset:

  • Wikipedia (Encyclopedia articles about diseases)
  • CORD-19 (COVID-19 Open Research Dataset)