Paper Title
AI Discovering a Coordinate System of Chemical Elements: Dual Representation by Variational Autoencoders
Paper Authors
Paper Abstract
The periodic table is a fundamental representation of the chemical elements that plays essential theoretical and practical roles. This research article discusses the experience of unsupervised training of neural networks to represent elements on a 2D latent space based on their electron configurations. To emphasize the chemical properties of the elements, the original electron-configuration data has been realigned toward valence orbitals. Recognizing seven shells and four subshells, the input data has been arranged as 7x4 images. Latent space representation has been performed using a convolutional beta variational autoencoder (beta-VAE). Despite the discrete and sparse input data, the beta-VAE disentangles elements of different periods, blocks, groups, and types. The unsupervised representation of elements on the latent space reveals pairwise symmetries of periods and elements related to the invariance of quantum numbers of the corresponding elements. In addition, it isolates outliers that turn out to be known cases of Madelung's rule violations for lanthanide and actinide elements. Considering the generative capabilities of the beta-VAE, a supervised machine learning task has been set up to find out whether there are insightful patterns distinguishing the electron configurations of real elements from those of decoded artificial ones. The article also addresses the capability of dual representation by autoencoders. Conventionally, autoencoders represent observations of the input data on the latent space. By transposing and duplicating the original input data, it is possible to represent the variables themselves on the latent space, which can lead to the discovery of meaningful patterns among the input variables. Applying this unsupervised learning to the transposed electron-configuration data, the order of input variables arranged by the encoder on the latent space turns out to exactly match the sequence of Madelung's rule.
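The setup described in the abstract (7x4 electron-configuration "images", a convolutional beta-VAE with a 2D latent space, and the transposition trick for dual representation) can be illustrated in code. The following is a minimal PyTorch sketch; the layer sizes, beta value, and placeholder data are illustrative assumptions, not the authors' actual architecture or hyperparameters.

```python
# Minimal sketch of a convolutional beta-VAE for 7x4 electron-configuration
# inputs, assuming PyTorch. Architecture and beta are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    """Maps 1x7x4 'images' (shells x subshells) to a 2D latent space."""
    def __init__(self, latent_dim=2):
        super().__init__()
        # Encoder: 1x7x4 input -> flattened convolutional features
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),  # 32 * 7 * 4 = 896 features
        )
        self.fc_mu = nn.Linear(32 * 7 * 4, latent_dim)
        self.fc_logvar = nn.Linear(32 * 7 * 4, latent_dim)
        # Decoder: 2D latent vector -> reconstructed 7x4 image
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 32 * 7 * 4), nn.ReLU(),
            nn.Unflatten(1, (32, 7, 4)),
            nn.ConvTranspose2d(32, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def beta_vae_loss(x_hat, x, mu, logvar, beta=4.0):
    # Reconstruction term plus beta-weighted KL divergence; beta > 1
    # pressures the latent space toward disentangled factors.
    # Assumes orbital occupancies rescaled to [0, 1].
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# Example usage with hypothetical data: a (118, 1, 7, 4) tensor of
# rescaled occupancies standing in for the real electron configurations.
model = BetaVAE()
x = torch.rand(118, 1, 7, 4)  # placeholder, one row per element
x_hat, mu, logvar = model(x)
beta_vae_loss(x_hat, x, mu, logvar).backward()

# Dual representation: transpose the flat element-by-orbital matrix so
# each orbital (variable) becomes an observation, then encode those rows
# to place the variables themselves on the latent space.
flat = x.view(118, -1)      # (n_elements, n_orbitals)
dual_input = flat.T         # (n_orbitals, n_elements)
```

Plotting `mu` for all elements yields the 2D "coordinate system" of the title, while feeding the transposed matrix through the same kind of pipeline represents the orbitals on the latent space, where, per the abstract, their encoder-induced order exactly matches the Madelung sequence.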