随机合金代码和编码分布张量的基本限制

论文标题

随机合金代码和编码分布张量的基本限制

Random Alloy Codes and the Fundamental Limits of Coded Distributed Tensors

论文作者

Soto, Pedro

论文摘要

张量是分布式计算中的基本操作，\ emph {e.g。，}机器学习，通常分布在大型数据集的多个并行任务中。 Stragglers和其他失败会严重影响整体完成时间。编码计算中的最新作品提供了一种新的策略，以减轻编码任务的散乱者，目的是最大程度地减少恢复整体结果所需的任务数量，称为恢复阈值。但是，我们证明了这种严格的组合定义并不能直接优化故障的概率。在本文中，我们专注于最可能的事件，并通过其解码的可能性更直接地衡量编码方案的最佳性。我们的概率方法使我们实现了矩阵乘法的随机代码，即局部随机合金代码，这些代码相对于度量是最佳的。此外，概率方法使我们能够发现关于随机和确定性编码分布量张量的令人惊讶的不可能定理。

Tensors are a fundamental operation in distributed computing, \emph{e.g.,} machine learning, that are commonly distributed into multiple parallel tasks for large datasets. Stragglers and other failures can severely impact the overall completion time. Recent works in coded computing provide a novel strategy to mitigate stragglers with coded tasks, with an objective of minimizing the number of tasks needed to recover the overall result, known as the recovery threshold. However, we demonstrate that this strict combinatorial definition does not directly optimize the probability of failure. In this paper, we focus on the most likely event and measure the optimality of a coding scheme more directly by its probability of decoding. Our probabilistic approach leads us to a practical construction of random codes for matrix multiplication, i.e., locally random alloy codes, which are optimal with respect to the measures. Furthermore, the probabilistic approach allows us to discover a surprising impossibility theorem about both random and deterministic coded distributed tensors.

下载PDF全文

下载文献需遵守相关版权规定

论文标题