Title
Machine Learning for automatic identification of new minor species
Authors
Abstract
One of the main difficulties in analyzing modern spectroscopic datasets is their sheer size. For example, in atmospheric transmittance spectroscopy, the solar occultation (SO) channel of the NOMAD instrument onboard the ESA ExoMars2016 Trace Gas Orbiter (TGO) produced $\sim$10 million spectra in 20,000 acquisition sequences between the beginning of the mission in April 2018 and 15 January 2020. Other datasets are even larger, with $\sim$billions of spectra for OMEGA onboard Mars Express or CRISM onboard Mars Reconnaissance Orbiter. Usually, new lines are discovered after a long iterative process of model fitting and manual residual analysis. Here we propose a new method, based on unsupervised machine learning, to automatically detect new minor species. Although precise quantification is out of scope, this tool can also be used to quickly summarize a dataset by providing a few endmembers ("sources") and their abundances. We approximate the non-linear dataset by a linear mixture of abundances and source spectra (endmembers), and we estimate these quantities with unsupervised source separation in the form of non-negative matrix factorization (NMF). Several methods are tested on synthetic and simulated data. Our approach is designed to detect minor species spectra rather than to quantify them precisely. On a synthetic example, it detects chemical compounds present in only 100 hidden spectra out of $10^4$, at 1.5 times the noise level. Results on simulated NOMAD-SO spectra targeting CH$_{4}$ show that detection limits are in the range of 100-500 ppt under favorable conditions. Results on real Martian data from NOMAD-SO show that CO$_{2}$ and H$_{2}$O are present, as expected, but CH$_{4}$ is absent. Nevertheless, we confirm a set of new, unexpected lines in the database, attributed by the ACS instrument team to the CO$_{2}$ magnetic dipole.
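The linear mixing model above can be written as $X \approx AS$. Here $X$ ($n$ spectra by $m$ spectral channels) is the data, $A$ ($n \times k$) holds the non-negative abundances, and $S$ ($k \times m$) holds the non-negative source spectra (endmembers). The Python sketch below estimates $A$ and $S$ with the Lee-Seung multiplicative updates for the squared Frobenius error. It is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which NMF variant was retained among the several tested. The function names `nmf_multiplicative` and `relative_error` are hypothetical, and so is the toy dataset. The sketch also assumes that spectra have already been converted to a non-negative quantity, for instance absorption depth $1 - T$.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=500, eps=1e-10, seed=0):
    """Approximate a non-negative matrix X (n_spectra x n_channels) as A @ S.

    A (n_spectra x k) plays the role of the abundances and S (k x n_channels)
    the source spectra (endmembers). The solver is the Lee-Seung
    multiplicative update for ||X - A S||_F^2, an assumed choice made for
    illustration; it is not necessarily the variant used in the paper.
    """
    if np.any(X < 0):
        raise ValueError("X must be non-negative")
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = rng.random((n, k)) + eps
    S = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a ratio of non-negative terms, so A and S
        # stay non-negative, and the Frobenius error is non-increasing.
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        A *= (X @ S.T) / (A @ S @ S.T + eps)
    return A, S


def relative_error(X, A, S):
    """Relative Frobenius reconstruction error ||X - A S|| / ||X||."""
    return np.linalg.norm(X - A @ S) / np.linalg.norm(X)


if __name__ == "__main__":
    # Hypothetical toy dataset: one broad "major" band in every spectrum and
    # one narrow "minor" line hidden in only 10 of 1000 spectra.
    rng = np.random.default_rng(1)
    wl = np.linspace(0.0, 1.0, 300)
    major = np.exp(-(((wl - 0.4) / 0.2) ** 2))
    minor = np.exp(-(((wl - 0.7) / 0.005) ** 2))
    abund_major = rng.uniform(0.5, 1.0, size=1000)
    abund_minor = np.zeros(1000)
    abund_minor[rng.choice(1000, size=10, replace=False)] = 0.05
    X = np.outer(abund_major, major) + np.outer(abund_minor, minor)
    X += np.abs(rng.normal(0.0, 0.01, size=X.shape))  # keep X non-negative
    A, S = nmf_multiplicative(X, k=3, n_iter=1000)
    print("relative error:", relative_error(X, A, S))
    # Inspect the sources: a row of S peaking sharply near wl = 0.7 would be
    # a candidate for the hidden minor species.
    print("peak position of each source:", wl[S.argmax(axis=1)])
```

Looking through the rows of `S` for narrow features that the known major species cannot explain is one plausible way to flag candidate minor species. scikit-learn's `sklearn.decomposition.NMF` is a maintained alternative. It offers other solvers and loss functions, which would be the natural place to compare the "several methods" mentioned above.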