Paper title
The Role of Isomorphism Classes in Multi-Relational Datasets
Paper authors
Paper abstract
Multi-interaction systems abound in nature, from colloidal suspensions to gene regulatory circuits. These systems can produce complex dynamics, and graph neural networks have been proposed as a method to extract the underlying interactions and predict how the systems will evolve. However, the current training and evaluation procedures for these models, which rely on synthetic multi-relational datasets, are agnostic to the isomorphism classes of interaction networks, whose members produce identical dynamics up to initial conditions. We extensively analyse how isomorphism class awareness affects these models, focusing on neural relational inference (NRI) models, which are unique in explicitly inferring interactions to predict dynamics in the unsupervised setting. Specifically, we demonstrate that isomorphism leakage leads to overestimated performance in multi-relational inference and that sampling biases present in the multi-interaction network generation process can impair generalisation. To remedy this, we propose isomorphism-aware synthetic benchmarks for model evaluation. We use these benchmarks to test generalisation abilities and demonstrate the existence of a threshold sampling frequency of isomorphism classes for successful learning. In addition, we demonstrate that isomorphism classes can be exploited through a simple prioritisation scheme to improve model performance and training stability and to reduce training time.
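To make the notion of isomorphism leakage concrete, the following is a minimal sketch, not the authors' implementation. It assumes each interaction network is given as a small square matrix of integer edge types (0 meaning no interaction) and uses a brute-force canonical form, which is only feasible for a handful of nodes. The names `canonical_form` and `leaked_classes` are hypothetical and introduced here purely for illustration.

```python
from itertools import permutations


def canonical_form(adj):
    """Return a canonical key for a small edge-typed interaction graph.

    adj[i][j] is the integer interaction type between nodes i and j
    (0 = no interaction). The key is the lexicographically smallest
    flattened matrix over all node relabellings, so two graphs share a
    key exactly when they are isomorphic. Cost grows as n!, so this is
    an illustrative assumption for small n only.
    """
    n = len(adj)
    best = None
    for perm in permutations(range(n)):
        key = tuple(adj[perm[i]][perm[j]] for i in range(n) for j in range(n))
        if best is None or key < best:
            best = key
    return best


def leaked_classes(train_graphs, test_graphs):
    """Return the isomorphism classes that occur in both splits."""
    train_keys = {canonical_form(g) for g in train_graphs}
    test_keys = {canonical_form(g) for g in test_graphs}
    return train_keys & test_keys


# Three-node examples with two interaction types (1 and 2).
g1 = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]  # node 1 is the hub
g2 = [[0, 2, 1], [2, 0, 0], [1, 0, 0]]  # same structure, node 0 is the hub
g3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # both edges of type 1

# g1 and g2 belong to one class, so placing them in different splits leaks.
leaks = leaked_classes([g1], [g2, g3])
```

Under these assumptions, a non-empty `leaks` set indicates that the test split contains interaction networks whose dynamics, up to initial conditions, were already seen during training. An isomorphism-aware split would assign whole classes, rather than individual graphs, to one split or the other.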