G1: One Garbage Collector To Rule Them All

Many articles describe how a poorly tuned garbage collector can bring an application's Service Level Agreement (SLA) commitments to its knees. For example, an unpredictably protracted garbage collection pause can easily exceed the response-time requirements of an otherwise performant application. Moreover, the irregularity increases when you have a non-compacting Garbage Collector (GC) such as Concurrent Mark and Sweep (CMS) that tries to reclaim its fragmented heap with a serial (single-threaded) full garbage collection that is stop-the-world (STW).
Let us now expand on the above paragraph: Suppose an allocation failure in the young generation triggers a young collection, leading to promotions to the old generation. Further, suppose that the fragmented old generation has insufficient space for the newly promoted objects. Such conditions would trigger a full garbage collection cycle, which will perform compaction of the heap.
With CMS GC, the full collection is serial and STW, hence your application threads are stopped for the entire duration while the heap space is reclaimed and then compacted. The duration for the STW pause depends on your heap size and the surviving objects.

Alternatively, even if you do have parallel (multi-threaded) compaction to combat fragmentation, you still end up with a full garbage collection (that involves all the generations of the Java heap), when it might have been sufficient to just reclaim some of the free space from the old generation.
This is a common scenario with Parallel Old GC, which reclaims the old generation in a parallel STW full garbage collection pause. This full garbage collection is not incremental; it is one big STW pause and does not interleave with the application execution.

With the above information in mind, we would like to consider one solution in the form of the “Garbage First” (G1) collector, HotSpot's latest GC (introduced in JDK 7 update 4).
G1 GC is an incremental parallel compacting GC that provides more predictable pause times compared to CMS GC and Parallel Old GC. By introducing a parallel, concurrent and multi-phased marking cycle, G1 GC can work with much larger heaps while providing reasonable worst-case pause times. The basic idea with G1 GC is to set your heap ranges (using -Xms for min heap size and -Xmx for the max size) and a realistic (soft real time) pause time goal (using -XX:MaxGCPauseMillis) and then let the GC do its job.
With the introduction of G1 GC, HotSpot moves away from its conventional GC layout, in which a contiguous Java heap is split into (contiguous) young and old generations. In G1 GC, HotSpot introduces the concept of “regions”: a single large contiguous Java heap space is divided into multiple fixed-size heap regions, which are tracked on a list of “free” regions. As the need arises, free regions are assigned to either the young or the old generation. Regions range from 1MB to 32MB in size depending on your total Java heap size; the goal is to have around 2048 regions for the total heap. Once a region frees up, it goes back to the “free” regions list. The principle of G1 GC is to reclaim as much of the Java heap as possible (while trying its best to meet the pause time goal) by first collecting the regions with the least amount of live data, i.e. the ones with the most garbage; hence the name Garbage First.
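The sizing rule described above (regions between 1MB and 32MB, aiming for around 2048 regions) can be sketched roughly as follows. This is a simplified illustration and not HotSpot's actual code: the class and method names are hypothetical, and rounding down to a power of two is an assumption of this sketch.

```java
/** Simplified sketch of G1-style region sizing; names are hypothetical, not HotSpot code. */
final class RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION_BYTES = 1 * MB;   // lower bound from the text
    static final long MAX_REGION_BYTES = 32 * MB;  // upper bound from the text
    static final long TARGET_REGION_COUNT = 2048;  // "around 2048 regions"

    /** Picks a region size that yields roughly TARGET_REGION_COUNT regions for the heap. */
    static long regionSizeBytes(long heapBytes) {
        if (heapBytes <= 0) {
            throw new IllegalArgumentException("heapBytes must be positive");
        }
        long ideal = heapBytes / TARGET_REGION_COUNT;
        long clamped = Math.max(MIN_REGION_BYTES, Math.min(MAX_REGION_BYTES, ideal));
        // Assumption of this sketch: round down to a power of two.
        return Long.highestOneBit(clamped);
    }

    public static void main(String[] args) {
        for (long heapGb : new long[] {1, 8, 128}) {
            long size = regionSizeBytes(heapGb * 1024 * MB);
            System.out.println(heapGb + " GB heap -> " + (size / MB) + " MB regions");
        }
    }
}
```

Under this sketch, a 1 GB heap gets 1MB regions (1024 of them), so small heaps end up with fewer than 2048 regions, while very large heaps are capped at 32MB regions and end up with more.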

Fig. 1: Conventional GC Layout

One thing to note is that for G1 GC, neither the young nor the old generation has to be contiguous. This is a handy feature since the sizing of the generation is now more dynamic.
Adaptively sized GC algorithms, such as Parallel Old GC, end up reserving extra space that each generation may require so that the generations fit within their contiguous space constraint. In the case of CMS, a full garbage collection is required to resize the Java heap and the generations.
In contrast, G1 GC uses logical generations (a collection of non-contiguous regions of the young generation and a remainder in the old generation), so there is not much wastage in space or time.
To be sure, the G1 GC algorithm does utilize some of HotSpot’s basic concepts. For example, the concepts of allocation, copying to survivor space and promotion to old generation are similar to previous HotSpot GC implementations. Eden regions and survivor regions still make up the young generation. Most allocations happen in eden, except for “humongous” allocations. (Note: For G1 GC, objects that span more than half a region size are considered “humongous objects” and are directly allocated into “humongous” regions out of the old generation.) G1 GC selects an adaptive young generation size based on your pause time goal. The young generation can range anywhere from the preset min to the preset max sizes, which are a function of the Java heap size. When eden reaches capacity, a “young garbage collection”, also known as an “evacuation pause”, will occur. This is an STW pause that copies (evacuates) the live objects from the regions that make up the eden to the 'to-space' survivor regions.
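As a small illustration of the “humongous” rule in the note above, the check can be sketched as below. The names are hypothetical and the comparison follows this article's wording (“more than half a region”). The helper that counts regions assumes a humongous object occupies whole regions, which is an assumption of the sketch rather than something stated here.

```java
/** Sketch of the "humongous" classification; hypothetical names, not HotSpot code. */
final class HumongousCheck {
    /** True if the object spans more than half of a region, per the article's wording. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Assumption: a humongous object occupies whole regions (ceiling division). */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long region = 1024L * 1024L; // 1MB regions
        long object = 600L * 1024L;  // 600KB object
        System.out.println("humongous? " + isHumongous(object, region));
        System.out.println("regions needed: " + regionsNeeded(object, region));
    }
}
```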


Fig. 2: Garbage First GC Layout
In addition, live objects from the 'from-space' survivor regions will be either copied to the 'to-space' survivor regions or, based on the object's age and the 'tenuring threshold', will be promoted to region(s) from the old generation space.
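The copy-or-promote decision described above can be sketched as a single comparison of an object's age against the tenuring threshold. This is a simplified model with hypothetical names. It assumes that age counts the young collections an object has already survived, and it ignores cases such as survivor space running out of room.

```java
/** Sketch of the survivor/promotion decision during a young collection; hypothetical names. */
final class YoungEvacuationSketch {
    enum Destination { SURVIVOR_TO_SPACE, OLD_GENERATION }

    /**
     * Decides where a live object is copied.
     * Assumption: age is the number of young collections the object has survived so far.
     */
    static Destination destinationFor(int age, int tenuringThreshold) {
        return age >= tenuringThreshold
                ? Destination.OLD_GENERATION
                : Destination.SURVIVOR_TO_SPACE;
    }

    public static void main(String[] args) {
        int tenuringThreshold = 3; // illustrative value only
        for (int age = 0; age <= 4; age++) {
            System.out.println("age " + age + " -> " + destinationFor(age, tenuringThreshold));
        }
    }
}
```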
Every young collection involves parallel worker time and sequential/serial time. To explain this further, I will use a log output from the latest Java 7 update release, which at the time of this publication is 7u25. (We also have an Early Access (EA) for 7u40. Please feel free to try out the EA bundles for your platform. With 7u40 EA, you may see a difference in the log format, but the basic premise remains the same.)
The following command-line options generated the GC log output shown below:
java -Xmx1G -Xms1G -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps GCTestBench
Note: I went with the default pause time goal of 200ms.

0.189: [GC pause (young), 0.00080776 secs]
   [Parallel Time: 0.4 ms]
      [GC Worker Start (ms): 188.7 188.7 188.8 188.8
       Avg: 188.8, Min: 188.7, Max: 188.8, Diff: 0.1]
      [Ext Root Scanning (ms): 0.2 0.2 0.2 0.1
       Avg: 0.2, Min: 0.1, Max: 0.2, Diff: 0.1]
      [Update RS (ms): 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
         [Processed Buffers : 0 0 0 1
          Sum: 1, Avg: 0, Min: 0, Max: 1, Diff: 1]
      [Scan RS (ms): 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
      [Object Copy (ms): 0.2 0.2 0.1 0.2
       Avg: 0.2, Min: 0.1, Max: 0.2, Diff: 0.0]
      [Termination (ms): 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
         [Termination Attempts : 1 2 1 2
          Sum: 6, Avg: 1, Min: 1, Max: 2, Diff: 1]
      [GC Worker End (ms): 189.1 189.1 189.1 189.1
       Avg: 189.1, Min: 189.1, Max: 189.1, Diff: 0.0]
      [GC Worker (ms): 0.4 0.4 0.3 0.3
       Avg: 0.4, Min: 0.3, Max: 0.4, Diff: 0.1]
      [GC Worker Other (ms): 0.0 0.0 0.1 0.1
       Avg: 0.1, Min: 0.0, Max: 0.1, Diff: 0.1]
   [Clear CT: 0.2 ms]
   [Other: 0.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.2 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.0 ms]

The indentation demarcates the parallel and the sequential work groups. The parallel worker time is further split into:
External Root Scanning:
The time spent by the parallel GC worker threads in scanning the external roots, such as registers, thread stacks, etc., that point into the Collection Set.
Update Remembered Sets (RSets):
RSets aid G1 GC in tracking references that point into a region. The time shown here is the amount of time the parallel worker threads spent in updating the RSets.
Processed Buffers:
The count shows how many ‘Update Buffers’ were processed by the worker threads.
Scan RSets:
The time spent in Scanning the RSets for references into a region. This time will depend on the “coarseness” of the RSet data structures.
Object Copy:
During every young collection, the GC copies all live data from the eden and ‘from-space’ survivor, either to the regions in the ‘to-space’ survivor or to the old generation regions. The amount of time it takes the worker threads to complete this task is listed here.
Termination:
After completing their particular work (e.g. object scan and copy), each worker thread enters its ‘termination protocol’. Prior to terminating, the worker thread looks for work from the other threads to steal and terminates when there is none. The time listed here indicates the time spent by the worker threads offering to terminate.
Parallel worker ‘Other’ time:
Time spent by the worker threads that was not accounted for in any of the parallel activities listed above.
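To make the per-worker lines easier to read, here is a small sketch that parses the first line of an entry (the one listing a value per worker thread) and recomputes the summary. The class name is hypothetical, it needs Java 8 or later, and it assumes that “Diff” is the maximum minus the minimum. Because the log prints rounded values, a recomputed summary can differ slightly from the one in the log.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

/** Sketch: recompute Avg/Min/Max/Diff from a per-worker log line; hypothetical helper. */
final class WorkerLineStats {
    /** Parses the values after the first ':' in a line such as "[Object Copy (ms): 0.2 0.2 0.1 0.2". */
    static DoubleSummaryStatistics parse(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no ':' in line: " + line);
        }
        String values = line.substring(colon + 1).trim();
        return Arrays.stream(values.split("\\s+"))
                .mapToDouble(Double::parseDouble)
                .summaryStatistics();
    }

    /** Assumption: Diff in the log is max minus min across the worker threads. */
    static double diff(DoubleSummaryStatistics stats) {
        return stats.getMax() - stats.getMin();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = parse("[Ext Root Scanning (ms): 0.2 0.2 0.2 0.1");
        System.out.println("workers=" + s.getCount() + " avg=" + s.getAverage()
                + " min=" + s.getMin() + " max=" + s.getMax() + " diff=" + diff(s));
    }
}
```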

The sequential work (parts of which could individually be parallelized) is divided into:
Clear CT: Time spent by the GC worker threads in clearing the Card Table of RSet scanning meta-data.
A few others are grouped under the ‘Other’ time, which comprises:

Choose Collection Set (CSet): A garbage collection cycle collects the set of regions in the CSet. The collection pause collects/evacuates all the live data in a particular CSet. The time listed here is the time spent in finalizing the set of regions added to the CSet.

Reference Processing: The time spent in processing the deferred references (soft, weak, final and phantom) from the prior garbage collection phases.

Reference En-queuing: The time spent in placing the references on to the pending list.

Free CSet: Time spent in freeing the just collected set of regions. This includes the time spent in freeing their RSets as well.

I have just skimmed the surface with respect to many things, such as the RSets and their coarsening, the update buffers, and the CSet. In the next few paragraphs there will be a few more, such as the Snapshot-At-The-Beginning (SATB) algorithm and barriers. However, in order to learn more about them, we would have to “deep dive” into the internals of G1 GC, an interesting topic that is outside the scope of this article.
Now that we understand how the young collections start filling up the old generation, we need to introduce (and understand) the concept of a ‘marking threshold’. When the occupancy of the total heap crosses this threshold, G1 GC will trigger a multi-phased concurrent marking cycle. The command line option that sets the threshold is -XX:InitiatingHeapOccupancyPercent and it defaults to 45 percent of the total Java heap size. G1 GC uses a marking algorithm called Snapshot-At-The-Beginning (SATB) that takes a logical snapshot of the set of live objects in the heap at the ‘beginning’ of the marking cycle. This algorithm uses a pre-write barrier to record and mark the objects that are a part of the logical snapshot. Now let us discuss the individual phases of the multi-phased concurrent marking, starting with a look at the output from the GC log:
0.078: [GC pause (young) (initial-mark), 0.00262460 secs]
   [Parallel Time: 2.3 ms]
      [GC Worker Start (ms): 78.1 78.2 78.2 78.2
       Avg: 78.2, Min: 78.1, Max: 78.2, Diff: 0.1]
      [Ext Root Scanning (ms): 0.2 0.1 0.2 0.1
       Avg: 0.2, Min: 0.1, Max: 0.2, Diff: 0.1]
      [Update RS (ms): 0.2 0.2 0.2 0.2
       Avg: 0.2, Min: 0.2, Max: 0.2, Diff: 0.0]
         [Processed Buffers : 2 3 2 2
          Sum: 9, Avg: 2, Min: 2, Max: 3, Diff: 1]
      [Scan RS (ms): 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
      [Object Copy (ms): 1.8 1.8 1.8 1.8
       Avg: 1.8, Min: 1.8, Max: 1.8, Diff: 0.0]
      [Termination (ms): 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
         [Termination Attempts : 1 1 1 1
          Sum: 4, Avg: 1, Min: 1, Max: 1, Diff: 0]
      [GC Worker End (ms): 80.4 80.4 80.4 80.4
       Avg: 80.4, Min: 80.4, Max: 80.4, Diff: 0.0]
      [GC Worker (ms): 2.2 2.2 2.2 2.2
       Avg: 2.2, Min: 2.2, Max: 2.2, Diff: 0.1]
      [GC Worker Other (ms): 0.0 0.1 0.1 0.1
       Avg: 0.1, Min: 0.0, Max: 0.1, Diff: 0.1]
   [Clear CT: 0.2 ms]
   [Other: 0.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.0 ms]
   [Eden: 3072K(5120K)->0B(5120K) Survivors: 1024K->1024K Heap: 16M(32M)->16M(32M)]
 [Times: user=0.06 sys=0.00, real=0.00 secs]
0.081: [GC concurrent-root-region-scan-start]
0.082: [GC concurrent-root-region-scan-end, 0.0009122]
0.082: [GC concurrent-mark-start]
<snip> [Zero or more embedded young garbage collections are possible here, but removed for brevity.]
0.094: [GC concurrent-mark-end, 0.0115579 sec]
0.094: [GC remark 0.094: [GC ref-proc, 0.0000033 secs], 0.0004374 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]
0.094: [GC cleanup 22M->10M(32M), 0.0003031 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]
0.095: [GC concurrent-cleanup-start]
0.095: [GC concurrent-cleanup-end, 0.0000350]
Here are the details of each phase:
The Initial Mark Phase
– G1 GC marks the roots during the initial-mark phase. This is what the first line of output above is telling us. The initial-mark phase is piggybacked (done at the same time) on a normal (STW) young garbage collection. Hence, the output is similar to what you see during a young evacuation pause.
The Root Region Scanning Phase
– During this phase, G1 GC scans survivor regions of the initial mark phase for references into the old generation and marks the referenced objects. This phase runs concurrently (not STW) with the application. It is important that this phase complete before the next young garbage collection happens.
The Concurrent Marking Phase
– During this phase, G1 GC looks for reachable (live) objects across the entire Java heap. This phase happens concurrently with the application and a young garbage collection can interrupt the concurrent marking phase (shown above).
The Remark Phase
– The remark phase helps the completion of marking. During this STW phase, G1 GC drains any remaining SATB buffers and traces any as-yet unvisited live objects. G1 GC also does reference processing during the remark phase.
The Cleanup Phase
– This is the final phase of the multi-phase marking cycle. It is partly STW, when G1 GC does liveness accounting (to identify completely free regions and mixed garbage collection candidate regions) and when G1 GC scrubs the RSets. It is partly concurrent, when G1 GC resets and returns the empty regions to the free list.
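The SATB pre-write barrier mentioned earlier, and the buffer draining done in the remark phase, can be modelled in plain Java as below. This is only a conceptual, single-threaded sketch with hypothetical names. In the real VM the barrier is emitted by the runtime around reference stores rather than written by the application, and the buffering details differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Conceptual model of an SATB pre-write barrier; hypothetical names, not HotSpot code. */
final class SatbBarrierSketch {
    static final class Node {
        final String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    private boolean markingActive;
    private final Deque<Node> satbBuffer = new ArrayDeque<>();

    void startMarking() { markingActive = true; }

    void finishMarking() {
        markingActive = false;
        satbBuffer.clear();
    }

    /** Stores newValue into holder.next, first recording the overwritten value while marking. */
    void writeNext(Node holder, Node newValue) {
        Node previous = holder.next;
        if (markingActive && previous != null) {
            // The old referent was part of the snapshot taken at mark start, so remember it.
            satbBuffer.add(previous);
        }
        holder.next = newValue;
    }

    /** Remark-style drain: returns every recorded reference and empties the buffer. */
    List<Node> drain() {
        List<Node> recorded = new ArrayList<>(satbBuffer);
        satbBuffer.clear();
        return recorded;
    }

    public static void main(String[] args) {
        SatbBarrierSketch heap = new SatbBarrierSketch();
        Node a = new Node("a");
        Node b = new Node("b");
        heap.writeNext(a, b);    // before marking: nothing is recorded
        heap.startMarking();
        heap.writeNext(a, null); // during marking: b is recorded before being unlinked
        System.out.println("recorded during marking: " + heap.drain().size());
    }
}
```

The point of the model is that an object reachable when marking started is still treated as live for this cycle even if the application unlinks it while marking is in progress.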

Once G1 GC successfully completes the concurrent marking cycle, it has the information that it needs to start the old generation collection. Up until now, the collection of the old regions was not possible since G1 GC did not have any marking information associated with those regions. A collection that facilitates the compaction and evacuation of old generation is appropriately called a 'mixed' collection since G1 GC not only collects the eden and the survivor regions, but also (optionally) adds old regions to the mix. Let us now discuss some details that are important to understand a mixed collection.
A mixed collection can (and usually does) happen over multiple mixed garbage collection cycles. When a sufficient number of old regions are collected, G1 GC reverts to performing the young garbage collections until the next marking cycle completes. A number of flags listed and defined here control the exact number of old regions added to the CSets:
-XX:G1MixedGCLiveThresholdPercent: The occupancy threshold of live objects in the old region to be included in the mixed collection.
-XX:G1HeapWastePercent: The threshold of garbage that you can tolerate in the heap.
-XX:G1MixedGCCountTarget: The target number of mixed garbage collections within which the regions with at most G1MixedGCLiveThresholdPercent live data should be collected.
-XX:G1OldCSetRegionThresholdPercent: A limit on the max number of old regions that can be collected during a mixed collection.
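How these four flags might interact can be sketched as follows. This is a deliberately simplified model with hypothetical class and method names. It ignores the pause-time prediction that G1 GC also applies, recomputes the per-pause share from the remaining candidates, and uses illustrative values rather than the VM defaults.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Simplified model of choosing old regions for a mixed collection; not HotSpot code. */
final class MixedCollectionSketch {
    static final class OldRegion {
        final int id;
        final long liveBytes;
        final long capacityBytes;
        OldRegion(int id, long liveBytes, long capacityBytes) {
            this.id = id;
            this.liveBytes = liveBytes;
            this.capacityBytes = capacityBytes;
        }
        long garbageBytes() { return capacityBytes - liveBytes; }
        double livePercent() { return 100.0 * liveBytes / capacityBytes; }
    }

    /** Regions at or below the live threshold (G1MixedGCLiveThresholdPercent), most garbage first. */
    static List<OldRegion> candidates(List<OldRegion> oldRegions, int liveThresholdPercent) {
        List<OldRegion> result = new ArrayList<>();
        for (OldRegion r : oldRegions) {
            if (r.livePercent() <= liveThresholdPercent) {
                result.add(r);
            }
        }
        result.sort(Comparator.comparingLong(OldRegion::garbageBytes).reversed());
        return result;
    }

    /** Old regions to add to the next mixed pause; empty once the remaining garbage is tolerable. */
    static List<OldRegion> nextOldCSet(List<OldRegion> candidates, long heapBytes, int totalHeapRegions,
                                       int heapWastePercent, int mixedGcCountTarget,
                                       int oldCSetRegionThresholdPercent) {
        long reclaimable = 0;
        for (OldRegion r : candidates) {
            reclaimable += r.garbageBytes();
        }
        if (100.0 * reclaimable / heapBytes <= heapWastePercent) {
            return Collections.emptyList(); // G1HeapWastePercent: this much garbage is tolerated
        }
        // G1MixedGCCountTarget: spread the candidates over the target number of pauses.
        int perPause = (candidates.size() + mixedGcCountTarget - 1) / mixedGcCountTarget;
        // G1OldCSetRegionThresholdPercent: upper bound on old regions per pause.
        int cap = Math.max(1, totalHeapRegions * oldCSetRegionThresholdPercent / 100);
        int count = Math.min(candidates.size(), Math.min(perPause, cap));
        return new ArrayList<>(candidates.subList(0, count));
    }

    public static void main(String[] args) {
        List<OldRegion> old = Arrays.asList(new OldRegion(0, 10, 100), new OldRegion(1, 50, 100),
                new OldRegion(2, 90, 100), new OldRegion(3, 30, 100));
        List<OldRegion> cands = candidates(old, 65); // illustrative threshold
        for (OldRegion r : nextOldCSet(cands, 1000, 10, 10, 2, 50)) {
            System.out.println("collect old region " + r.id + " (garbage " + r.garbageBytes() + ")");
        }
    }
}
```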
Let us look at a mixed collection cycle output from a G1 GC log:
1.269: [GC pause (mixed), 0.00373874 secs]
   [Parallel Time: 3.0 ms]
      [GC Worker Start (ms): 1268.9 1268.9 1268.9 1268.9
       Avg: 1268.9, Min: 1268.9, Max: 1268.9, Diff: 0.0]
      [Ext Root Scanning (ms): 0.2 0.2 0.2 0.1
       Avg: 0.2, Min: 0.1, Max: 0.2, Diff: 0.1]
      [Update RS (ms): 0.0 0.0 0.0 0.0
       Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
         [Processed Buffers : 0 0 0 1
          Sum: 1, Avg: 0, Min: 0, Max: 1, Diff: 1]
      [Scan RS (ms): 0.1 0.0 0.0 0.1
       Avg: 0.1, Min: 0.0, Max: 0.1, Diff: 0.1]
      [Object Copy (ms): 2.6 2.7 2.7 2.6
       Avg: 2.7, Min: 2.6, Max: 2.7, Diff: 0.1]
      [Termination (ms): 0.1 0.1 0.0 0.1
       Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1]
         [Termination Attempts : 2 1 2 2
          Sum: 7, Avg: 1, Min: 1, Max: 2, Diff: 1]
      [GC Worker End (ms): 1271.9 1271.9 1271.9 1271.9
       Avg: 1271.9, Min: 1271.9, Max: 1271.9, Diff: 0.0]
      [GC Worker (ms): 3.0 3.0 3.0 2.9
       Avg: 3.0, Min: 2.9, Max: 3.0, Diff: 0.0]
      [GC Worker Other (ms): 0.1 0.1 0.1 0.1
       Avg: 0.1, Min: 0.1, Max: 0.1, Diff: 0.0]
   [Clear CT: 0.1 ms]
   [Other: 0.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 0.3 ms]
In summary, G1 improves upon its predecessor GCs by introducing the concept of regions that make up a logical generation. The regions help provide finer granularity for an incremental collection of the old generation. G1 does most of its reclamation through copying of the live data, thus achieving compaction. This is definitely a step up from in-place de-allocation without compaction, which leaves the old generation looking like Swiss cheese!
The first level of reclamation happens during the Cleanup phase (of the multi-phased marking cycle) when the completely free (i.e. full of garbage) regions are reclaimed and returned to the free list. The next level happens during the incremental mixed garbage collections. If all else fails, the entire Java heap is collected. This is the well-known fail-safe full garbage collection.
All of the above makes the reclamation of the old generation a lot easier and in a way tiered.
I hope this article helped in painting a basic picture of the differences and the makeup of G1 GC. Thank you for tuning in!
Editor's note: Please stay tuned for part 2, coming in September 2013, where Monica will discuss some advanced topics and offer some advice about how to use these metrics to tune your application performance.
