Paper Title

MedICaT: A Dataset of Medical Images, Captions, and Textual References

Authors

Sanjay Subramanian, Lucy Lu Wang, Sachin Mehta, Ben Bogin, Madeleine van Zuylen, Sravanthi Parasa, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi

Abstract

Understanding the relationship between figures and text is key to scientific document understanding. Medical figures in particular are quite complex, often consisting of several subfigures (75% of figures in our dataset), with detailed text describing their content. Previous work studying figures in scientific papers focused on classifying figure content rather than understanding how images relate to the text. To address challenges in figure retrieval and figure-to-text alignment, we introduce MedICaT, a dataset of medical images in context. MedICaT consists of 217K images from 131K open access biomedical papers, and includes captions, inline references for 74% of figures, and manually annotated subfigures and subcaptions for a subset of figures. Using MedICaT, we introduce the task of subfigure to subcaption alignment in compound figures and demonstrate the utility of inline references in image-text matching. Our data and code can be accessed at https://github.com/allenai/medicat.
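To make the dataset statistics mentioned above concrete (compound figures with multiple subfigures, and figures with inline references), here is a minimal sketch of filtering MedICaT-style figure records. The field names used below (`subfigures`, `inline_refs`, `caption`) are illustrative assumptions, not the dataset's official schema; consult the repository linked above for the actual format.

```python
# Sketch: summarizing MedICaT-style figure records.
# NOTE: the record schema here is an assumption for illustration only.

def summarize(records):
    """Return (num_compound_figures, fraction_with_inline_refs).

    A figure is treated as compound if it has more than one subfigure.
    """
    compound = sum(1 for r in records if len(r.get("subfigures", [])) > 1)
    with_refs = sum(1 for r in records if r.get("inline_refs"))
    frac = with_refs / len(records) if records else 0.0
    return compound, frac

# Two toy records mimicking the assumed schema: one compound figure
# with an inline reference, one simple figure without.
demo = [
    {
        "caption": "Figure 1. Axial CT images.",
        "inline_refs": ["As shown in Figure 1, ..."],
        "subfigures": [{"id": "1a"}, {"id": "1b"}],
    },
    {
        "caption": "Figure 2. Chest X-ray.",
        "inline_refs": [],
        "subfigures": [],
    },
]

print(summarize(demo))  # → (1, 0.5)
```

This mirrors the aggregate numbers reported in the abstract (75% compound figures, 74% with inline references), computed here over a two-record toy list rather than the real corpus.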
