Title
Theta-Resonance: A Single-Step Reinforcement Learning Method for Design Space Exploration
Authors
Abstract
Given an environment (e.g., a simulator) for evaluating samples in a specified design space, together with a set of weighted evaluation metrics, one can use Theta-Resonance, a single-step Markov Decision Process (MDP), to train an intelligent agent that produces progressively more optimal samples. In Theta-Resonance, a neural network consumes a constant input tensor and produces a policy as a set of conditional probability density functions (PDFs) for sampling each design dimension. We specialize existing policy gradient algorithms in deep reinforcement learning (D-RL) to use evaluation feedback (in terms of cost, penalty, or reward) to update our policy network with robust algorithmic stability and minimal design evaluations. We study multiple neural architectures for our policy network in the context of a simple SoC design space, and propose a method of constructing synthetic space-exploration problems to compare and improve design space exploration (DSE) algorithms. Although we only present categorical design spaces, we also outline how to use Theta-Resonance to explore continuous and mixed continuous-discrete design spaces.
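The abstract's core loop can be sketched as a single-step REINFORCE-style bandit: a policy of independent categorical distributions (one per design dimension) is sampled, the sample is scored by the environment, and the advantage over a running-mean baseline updates the distribution parameters. The sketch below is a minimal, hypothetical stand-in, not the paper's actual implementation: the toy design space, hidden optimum, synthetic reward, and learning rate are all assumptions, and the policy network with its constant input tensor is reduced to a directly learnable table of per-dimension logits.

```python
import math
import random

# Hypothetical toy design space: 4 categorical dimensions, 5 choices each.
DIMS, CHOICES = 4, 5
OPTIMUM = [2, 0, 4, 1]  # hidden best design, standing in for a simulator's target

def evaluate(sample):
    # Synthetic reward: number of dimensions matching the hidden optimum.
    return sum(s == o for s, o in zip(sample, OPTIMUM))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

random.seed(0)
logits = [[0.0] * CHOICES for _ in range(DIMS)]  # one categorical PDF per dimension
baseline, lr, rewards = 0.0, 0.2, []

for step in range(3000):
    probs = [softmax(row) for row in logits]
    sample = [random.choices(range(CHOICES), weights=p)[0] for p in probs]
    r = evaluate(sample)                # single-step episode: sample, score, done
    rewards.append(r)
    adv = r - baseline                  # advantage w.r.t. running-mean baseline
    baseline += 0.05 * (r - baseline)
    # REINFORCE for a softmax policy: d log p(a) / d logit_k = 1[k == a] - p_k
    for d in range(DIMS):
        for k in range(CHOICES):
            grad = (1.0 if k == sample[d] else 0.0) - probs[d][k]
            logits[d][k] += lr * adv * grad

best = [max(range(CHOICES), key=row.__getitem__) for row in logits]
print("most probable design:", best, "reward:", evaluate(best))
```

The baseline keeps updates centered, which is one simple way to get the algorithmic stability the abstract alludes to; the paper's method additionally specializes the policy-gradient machinery and uses a real network, neither of which is reproduced here.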