Paper title
ACROBAT -- a multi-stain breast cancer histological whole-slide-image data set from routine diagnostics for computational pathology
Paper authors
Paper abstract
The analysis of formalin-fixed paraffin-embedded (FFPE) tissue sections stained with haematoxylin and eosin (H&E) or by immunohistochemistry (IHC) is an essential part of the pathologic assessment of surgically resected breast cancer specimens. IHC staining has been broadly adopted into diagnostic guidelines and routine workflows to manually assess the status and score of several established biomarkers, including ER, PGR, HER2 and KI67. However, this task can also be facilitated by computational pathology image analysis methods. Research in computational pathology has recently made numerous substantial advances, often based on publicly available whole-slide-image (WSI) data sets. However, the field is still considerably limited by the sparsity of public data sets. In particular, there are no large, high-quality, publicly available data sets with WSIs of matching IHC and H&E-stained tissue sections. Here, we publish what is currently the largest publicly available data set of WSIs of tissue sections from surgical resection specimens from female primary breast cancer patients with matched WSIs of corresponding H&E and IHC-stained tissue, consisting of 4,212 WSIs from 1,153 patients. The primary purpose of the data set was to facilitate the ACROBAT WSI registration challenge, which aims at accurately aligning H&E and IHC images. For research in the area of image registration, automatic quantitative feedback on registration algorithm performance remains available through the ACROBAT challenge website, based on more than 37,000 manually annotated landmark pairs from 13 annotators. Beyond registration, this data set has the potential to enable many different avenues of computational pathology research, including stain-guided learning, virtual staining, unsupervised pre-training, artefact detection and stain-independent models.
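To illustrate how annotated landmark pairs can give quantitative feedback on a registration, the sketch below maps landmarks from an IHC image into H&E coordinates and measures the distance to the matching H&E landmarks. The abstract does not specify the metric used by the challenge, so the Euclidean distance and the median summary here are assumptions. The coordinates, the translation-only transform and the function names `apply_affine` and `landmark_errors` are hypothetical and chosen only for illustration.

```python
import math
from statistics import median


def apply_affine(points, matrix, offset):
    """Map (x, y) points through x' = A x + t, a stand-in for any registration."""
    (a, b), (c, d) = matrix
    tx, ty = offset
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def landmark_errors(moved_points, target_points, microns_per_pixel=1.0):
    """Euclidean distance between each registered landmark and its target."""
    if len(moved_points) != len(target_points):
        raise ValueError("landmark lists must have the same length")
    return [
        math.dist(moved, target) * microns_per_pixel
        for moved, target in zip(moved_points, target_points)
    ]


# Hypothetical landmarks (in pixels) in an IHC image and their H&E counterparts.
ihc_landmarks = [(10.0, 20.0), (200.0, 40.0), (75.0, 310.0)]
he_landmarks = [(15.0, 18.0), (205.0, 41.0), (84.0, 311.0)]

# Hypothetical estimated transform: a pure translation by (5, -2) pixels.
registered = apply_affine(ihc_landmarks, ((1.0, 0.0), (0.0, 1.0)), (5.0, -2.0))

errors = landmark_errors(registered, he_landmarks)
summary = median(errors)  # assumed summary statistic, robust to a few bad landmarks
print(errors, summary)
```

A median is used here instead of a mean so that a single badly placed landmark does not dominate the summary. Passing a real `microns_per_pixel` value would report the error in physical units instead of pixels.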