System-Agnostic Meta-Learning for MDP-based Dynamic Scheduling via Descriptive Policy

Paper title

System-Agnostic Meta-Learning for MDP-based Dynamic Scheduling via Descriptive Policy

Author

Lee, Hyun-Suk

Abstract

Dynamic scheduling is an important problem in applications from queuing to wireless networks. It addresses how to choose an item among multiple scheduling items in each timestep to achieve a long-term goal. Conventional approaches for dynamic scheduling find the optimal policy for a given specific system so that the policy from these approaches is usable only for the corresponding system characteristics. Hence, it is hard to use such approaches for a practical system in which system characteristics dynamically change. This paper proposes a novel policy structure for MDP-based dynamic scheduling, a descriptive policy, which has a system-agnostic capability to adapt to unseen system characteristics for an identical task (dynamic scheduling). To this end, the descriptive policy learns a system-agnostic scheduling principle--in a nutshell, "which condition of items should have a higher priority in scheduling". The scheduling principle can be applied to any system so that the descriptive policy learned in one system can be used for another system. Experiments with simple explanatory and realistic application scenarios demonstrate that it enables system-agnostic meta-learning with very little performance degradation compared with the system-specific conventional policies.
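
The abstract's central idea, a priority rule over item conditions rather than over a fixed set of items, can be illustrated with a minimal sketch. This is not the paper's implementation: the condition features (queue length, channel quality), the linear scoring function, and its weights are all hypothetical and chosen only to show why such a rule carries over to a system with a different number of items.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical item condition: (queue_length, channel_quality).
Condition = Tuple[float, float]


def linear_priority(weights: Sequence[float]) -> Callable[[Condition], float]:
    """Return a scoring function mapping one item's condition to a priority.

    A linear score is an assumption made for illustration; a learned
    policy could use any function of the condition.
    """
    def score(condition: Condition) -> float:
        return sum(w * x for w, x in zip(weights, condition))
    return score


def schedule(conditions: Sequence[Condition],
             score: Callable[[Condition], float]) -> int:
    """Pick the index of the item whose condition has the highest priority."""
    return max(range(len(conditions)), key=lambda i: score(conditions[i]))


# The same scoring rule is reused on two systems of different size.
score = linear_priority([1.0, 0.5])  # assumed weights, not learned here
small_system = [(3.0, 0.2), (1.0, 0.9)]
large_system = [(0.0, 1.0), (2.0, 0.1), (5.0, 0.4), (4.0, 0.8)]

choice_small = schedule(small_system, score)
choice_large = schedule(large_system, score)
```

Because the rule scores each item's condition independently, nothing in it depends on how many items the system has, which is the property the abstract calls system-agnostic.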
