Paper Title
Low-Latency, High-Throughput Garbage Collection (Extended Version)
Authors
Abstract
Production garbage collectors make substantial compromises in pursuit of reduced pause times. They require far more CPU cycles and memory than prior, simpler collectors. Concurrent copying collectors (C4, ZGC, and Shenandoah) suffer from the following design limitations.
1) Concurrent copying. They reclaim memory only by copying, which is inherently expensive and places high demands on memory bandwidth. Concurrent copying also requires expensive read and write barriers.
2) Scalability. They depend on tracing, which does not scale, either in the limit or in practice.
3) Immediacy. They do not reclaim older objects promptly, incurring high memory overheads.

We present LXR, which takes a very different approach to optimizing responsiveness and throughput: it minimizes concurrent collection work and overheads.
1) LXR reclaims most memory without any copying by using the Immix heap structure. It then combats fragmentation with limited, judicious stop-the-world copying.
2) LXR uses reference counting to achieve both scalability and immediacy, promptly reclaiming young and old objects. It uses concurrent tracing as needed to identify cyclic garbage.
3) To minimize pause times while allowing judicious copying of mature objects, LXR introduces remembered sets for reference counting and concurrent decrement processing.
4) LXR introduces a novel low-overhead write barrier that combines coalescing reference counting, concurrent tracing, and remembered set maintenance.

The result is a collector with excellent responsiveness and throughput. On the widely used Lucene search engine with a generously sized heap, LXR has 6x higher throughput while delivering 30x lower 99.9th percentile tail latency than the popular Shenandoah production collector in its default configuration.
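To make "coalescing reference counting" more concrete, the following is a minimal single-threaded sketch of the general technique. It is not LXR's implementation. Every name in it (`Obj`, `Heap`, `write`, `collect`, `mod_buf`) is hypothetical, and it assumes a toy heap of Python objects with a per-field "logged" bit. The write barrier records a field's old value only on the first write in an epoch, so repeated writes to the same field cost one increment and one decrement at the next pause. Concurrent tracing for cyclic garbage, remembered sets, and concurrent decrement processing are deliberately left out.

```python
class Obj:
    def __init__(self, name, num_fields):
        self.name = name
        self.fields = [None] * num_fields
        self.logged = [False] * num_fields  # per-field "already logged this epoch" bit
        self.rc = 0


class Heap:
    def __init__(self):
        self.young = []      # objects allocated since the last pause
        self.mod_buf = []    # (object, field index, old referent) per first write
        self.root_decs = []  # root increments to undo at the next pause
        self.freed = []      # names of reclaimed objects, in reclamation order

    def alloc(self, name, num_fields):
        obj = Obj(name, num_fields)
        self.young.append(obj)
        return obj

    def write(self, obj, index, new):
        # Write barrier: the slow path runs only for the first write to a
        # field in an epoch; later writes to that field are coalesced.
        if not obj.logged[index]:
            obj.logged[index] = True
            self.mod_buf.append((obj, index, obj.fields[index]))
        obj.fields[index] = new

    def collect(self, roots):
        # Decrements owed for the root increments taken at the previous pause.
        decs = self.root_decs
        self.root_decs = list(roots)
        for r in roots:
            r.rc += 1
        # One increment for each logged field's final referent,
        # one pending decrement for its referent at the start of the epoch.
        for obj, index, old in self.mod_buf:
            obj.logged[index] = False
            cur = obj.fields[index]
            if cur is not None:
                cur.rc += 1
            if old is not None:
                decs.append(old)
        self.mod_buf = []
        # Young objects that received no increment are unreachable.
        dead = [o for o in self.young if o.rc == 0]
        self.young = []
        while dead or decs:
            if dead:
                o = dead.pop()
            else:
                o = decs.pop()
                o.rc -= 1
                if o.rc > 0:
                    continue
            self.freed.append(o.name)
            # Releasing an object decrements everything it points to.
            decs.extend(c for c in o.fields if c is not None)


heap = Heap()
a = heap.alloc("a", 1)
b = heap.alloc("b", 1)
c = heap.alloc("c", 1)
heap.write(a, 0, b)
heap.write(a, 0, c)            # coalesced: no second log entry for a.fields[0]
logged_entries = len(heap.mod_buf)
heap.collect(roots=[a])        # b never received an increment, so it is reclaimed
heap.write(a, 0, None)
heap.collect(roots=[a])        # c's only reference is gone, so it is reclaimed
```

In this sketch the intermediate referent `b` never has its count touched: only the field's value at the start and end of the epoch matter, which is where the savings of coalescing come from. An old object is reclaimed as soon as its count reaches zero at a pause, without a heap trace. Cycles are never reclaimed here, which is the gap the abstract says LXR fills with concurrent tracing.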