Paper Title

A Highly Scalable, Hybrid, Cross-Platform Timing Analysis Framework Providing Accurate Differential Throughput Estimation via Instruction-Level Tracing

Authors

Min-Yih Hsu, Felicitas Hetzelt, David Gens, Michael Maitland, Michael Franz

Abstract

Estimating instruction-level throughput is critical for many applications: multimedia, low-latency networking, medical, automotive, avionic, and industrial control systems all rely on tightly calculable and accurate timing bounds of their software. Unfortunately, how long a program may run - or if it may indeed stop at all - cannot be answered in the general case. This is why state-of-the-art throughput estimation tools usually focus on a subset of operations and make several simplifying assumptions. Correctly identifying these sets of constraints and regions of interest in the program typically requires source code, specialized tools, and dedicated expert knowledge. Whenever a single instruction is modified, this process must be repeated, incurring high costs when iteratively developing timing sensitive code in practice. In this paper, we present MCAD, a novel and lightweight timing analysis framework that can identify the effects of code changes on the microarchitectural level for binary programs. MCAD provides accurate differential throughput estimates by emulating whole program execution using QEMU and forwarding traces to LLVM for instruction-level analysis. This allows developers to iterate quickly, with low overhead, using common tools: identifying execution paths that are less sensitive to changes over timing-critical paths only takes minutes within MCAD. To the best of our knowledge this represents an entirely new capability that reduces turnaround times for differential throughput estimation by several orders of magnitude compared to state-of-the-art tools. Our detailed evaluation shows that MCAD scales to real-world applications like FFmpeg and Clang with millions of instructions, achieving < 3% geo mean error compared to ground truth timings from hardware-performance counters on x86 and ARM machines.
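
The abstract reports accuracy as a geometric-mean error against hardware performance counters and speaks of differential throughput between code versions. The following is a minimal sketch of how such figures could be computed, assuming the error is the relative deviation of an estimated cycle count from a measured one. The function names and this exact definition are illustrative assumptions, not taken from the paper or from MCAD.

```python
import math


def relative_error(estimated, measured):
    """Relative deviation of an estimate from a measured value (assumed metric)."""
    return abs(estimated - measured) / measured


def geo_mean(values):
    """Geometric mean computed via logarithms; every value must be > 0."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geo_mean_error(pairs):
    """Geometric mean of relative errors over (estimated, measured) pairs.

    A pair with zero error makes math.log raise ValueError, so exact
    matches need separate handling in real use.
    """
    return geo_mean([relative_error(e, m) for e, m in pairs])


def differential(cycles_before, cycles_after):
    """Fractional change in cycle count between two versions of the code."""
    return (cycles_after - cycles_before) / cycles_before
```

For example, `differential(200, 150)` yields `-0.25`, meaning the hypothetical modified code needs 25% fewer cycles than the original.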
