Paper title
Classifying Unidentified X-ray Sources in the Chandra Source Catalog Using a Multiwavelength Machine-learning Approach
Paper authors
Paper abstract
The rapid increase in serendipitous X-ray source detections requires the development of novel approaches to efficiently explore the nature of X-ray sources. If even a fraction of these sources could be reliably classified, it would enable population studies for various astrophysical source types on a much larger scale than currently possible. Classification of large numbers of sources from multiple classes characterized by multiple properties (features) must be done automatically, and supervised machine learning (ML) seems to provide the only feasible approach. We perform classification of Chandra Source Catalog version 2.0 (CSCv2) sources to explore the potential of the ML approach and identify various biases, limitations, and bottlenecks that present themselves in these kinds of studies. We establish the framework and present a flexible and expandable Python pipeline, which can be used and improved by others. We also release the training data set of 2941 X-ray sources with confidently established classes. In addition to providing probabilistic classifications of 66,369 CSCv2 sources (21% of the entire CSCv2 catalog), we perform several more narrowly focused case studies (high-mass X-ray binary candidates and X-ray sources within the extent of the H.E.S.S. TeV sources) to demonstrate some possible applications of our ML approach. We also discuss possible future modifications of the presented pipeline, which are expected to lead to substantial improvements in classification confidences.