Last time we went through some of the main strategies and algorithms used in garbage collection. Now as we have some ground knowledge we can look into their combinations used in different garbage collectors.
To be able to somehow categorize GCs and find patterns in how they function, we’ll need some basic concepts defining them:
Concurrent — this means that garbage collection is done while an application is still running, concurrently with its execution.
Stop-the-world (STW) — meaning application is stopped while garbage collection is being performed.
Parallel —GC uses multiple threads to perform collection. Note that it has nothing to do with concurrency. GC can be both concurrent and parallel, neither of those, concurrent but not parallel, or vice versa.
Serial — all work is done in one thread, meaning no parallel execution. As in the case with the parallel, we are not talking about concurrency.
Monolithic — all garbage is collected in a single, indivisible step.
Incremental — garbage collection is done in a series of smaller, divisible steps with gaps in between.
Generational — we already discussed that, but just to remind, it means that the heap where objects are stored and from which they are collected is divided into two main spaces — old generation and young generation.
Regionalized — meaning that the heap is divided into a number of regions (they can be of equal or varying size, depending on the implementation) . Upon startup, the JVM sets the region size. The region sizes can vary among different GC, depending on the heap size.
Mostly — it is not my original idea to include this important “term” here, since you’ll see it in most of the talks or papers, but it is very important to keep in mind that nothing is absolute — GCs can be mostly concurrent, mostly parallel and so on, meaning they sometimes stop the world, fallback to monolithic, etc.
The most popular GCs that you’ll find in the listings are:
Epsilon (do nothing)
Concurrent mark sweep collector (CMS)
and the newest ones:
Serial GCs are not widely used as they are single-threaded and designed mainly for small heaps. Meanwhile a Parallel collector is default in applications using all Java versions till 8 (e.g. Wix uses it). Next goes CMS. This type of collector is not default in any of the versions of Java since it requires some additional CPU resources to perform its best and it’s not recommended for heaps bigger than 4Gb. If that’s the case there’s G1 GC, introduced in Java 7, which is default starting Java 9. But if you want something even more exciting, there are two very hot and exciting GCs out there— ZGC and Shenandoah (we’ll talk about them a bit later).
So now that we know basic algorithms, strategies and terms that are used to define GCs, we can summarize some of them in the following table:
Summary of some of the most common GCs
Why do we need so many GCs? Why there’s these new ZGC and Shenandoah to which we event can’t apply previously discussed strategies and algorithms? Why not make everyone use G1 or Parallel since they are default in some Java versions? (and note that not only Oracle is developing GCs — Shenandoah is the product of Red Hat, which implies that after all there must be a good reason to do it)
Main reasons some developers are not happy with existing GCs:
No fully-concurrent GC. Both Shenandoah and ZGC focus on reducing pause-times while still compacting the heap. Main challenge for concurrent compaction is that it requires the GC to move objects around while Java threads are still running, while all references to those objects immediately start accessing new copies. Both ZGC and Shenandoah are developed so they can move objects around the memory, while an application is running.
Aleksey Shipilёv did a great job comparing some of the GCs in the context of concurrency in the presentation “Shenandoah GC Part I: The Garbage Collector That Could” here (slide 7).
Heaps are getting bigger — back when G1 was introduced (2006) the most powerful AWS instance available had 1 vCPU and 1.75GB of RAM. Today you can rent one with 128 vCPU and 3,904GB of RAM. New GCs are being developed for our times, when such capacities are common.
Both ZGC and Shenandoah were developed to meet the following goals:
Pause times do not exceed 10ms
Pause times do not increase with the heap or live-set size
Be able to handle heaps ranging from a few hundred megabytes to multi terabytes in size
If you look back at the summary table above you see that first 4 GCs are fairly simple and all our terms can be applied to them. But it’s not the case with the last two — ZGC and Shenandoah. In order to understand them we’ll need some new terms.
Pointer coloring —in a few words, it’s simply refers to storing additional data in pointers. Pointers used here are 64 bits (ZGC is 64-bit only), meaning there is more available space that can actually be used, so 4 bits are used for additional data — pointer’s “color”. Possible states are finalizable, remap, mark0 and mark1. finalizable bit — the object is only reachable through a finalizer; remap bit — the reference points to the current address of the object; marked0 and marked1 bits — these are used to flag reachable objects.
There’s a nice diagram in the ZGC source for it.
Brooks forwarding pointers — we mentioned above that for a long time there was no fully-concurrent GC. And the reason behind this is that the compacting phase had to be done in STW mode to ensure no one could access an object until it was in it’s new location. Problems could occur if you tried accessing an object, which the GC moved. When objects are moving, an object graph is considered to be inconsistent, so it’s better to prevent access to it. Forwarding pointer is a reference in the old object’s location which is pointing to the new location. So as objects are moving you can still access them with the old reference, since it can still be accessed through the forwarding pointer.
Load barriers (sometimes called read barriers) — it’s a small piece of machine code, injected by the JIN in strategic places, more specifically when loading an object reference from the heap. For instance, in ZGC this piece of code checks if the pointer has the correct color, and if not, corrects it.
Write barriers — almost the same as load barrier, except the aforementioned piece of code is emitted by the compiler immediately before every store (write) operation.
So now as we are loaded with terms, we can actually try to understand how these new GCs work in a very simplified manner.
In order to understand how ZGC works we’ll need some of the terms we just introduced:
As in all GCs, once triggered, ZGC needs to do a marking stage, to see which objects are eligible for garbage collection. ZGC breaks marking into the following phases:
A STW phase of identifying and marking GC roots.
A concurrent phase of tracing objects and marking them. marked0 and marked1 metadata bits are used for marking.
Once marking is done we can move on to another stage — relocation. In ZGC the memory is split into blocks (called pages) and reallocation is done either for all objects on the page or for none. This way memory allocation is faster since there are whole empty blocks in the memory. Reallocation consists of three phases:
A concurrent phase looks for pages that have live objects to be relocated and puts them into a relocation set (set of the pages chosen for evacuation, e.g. those pages with the most amount of garbage).
A STW phase relocates all root references in the previously selected relocation set and updates their references.
A concurrent phase relocates all remaining objects in the relocation set and stores the mapping between old and new addresses in the forward table.
Then, finally, remapping happens. Notice, that the relocation phase didn’t include rewriting references for the objects that were moved (except roots). This is done later, once objects are loaded, using load barrier and coloured pointers to indicate what needs to be done. The remapping stage can be presented as following:
ZGC is a non-generational GC, but as mentioned in the talk given by Erik Österlund, they are looking into possibilities to add Young Generation to it.
In contrast with ZGC, to be able to move objects while an application is running, Shenandoah uses:
Brooks forwarding pointers.
In Shenandoah all reads and writes by an application go through a forwarding pointer.
A STW Initial Marking: scan the root set.
A Concurrent Marking: trace the heap marking live objects, updating any references to regions evacuated in the previous gc cycle.
A STW Final Marking: re-scan the root set, copy and update roots to point to to-region copies. Initiate concurrent compaction. Free any fully evacuated regions from previous compaction.
Concurrent Compaction: Evacuate live objects from targeted regions.
So basically it marks the garbage starting from usual GC roots and relocates live objects from selected regions by copying them to new regions. And only the second time GC is triggered, during a Concurrent Marking phase, is when it updates all references to point to the evacuated (copied) objects. At the end of this phase the evacuated regions may be reclaimed. And how does GC know which references need to be updated and to what? Brooks forwarding pointers.
Each GC cycle consists of two stop-the-world phases and 2 concurrent phases. Illustration of simplified Shenandoah phases can be seen here.
No silver bullet - conclusion
There’s a need for new GCs - since we always want better performance, lower pauses, etc. But nothing’s perfect — neither ZGC nor Shenandoah nor any other GC is the ultimate one everyone should or will use. Because it all depends. They all have their trade offs — e.g. ZGC can’t be used on machines other than x64/Linux, both ZGC and Shenandoah have time overhead because of read and write barriers, there’s maintenance costs for forwarding pointers, CMS is not compacting, etc.
What’s important to understand is that garbage collection matters — although no one might be talking about it, everyone’s still doing it because they have to.
Oh, and don’t worry if you didn’t get how the last or any of GCs I’ve presented here work in actuality — to be honest I didn’t as well. My goal wasn’t to explain to you the very details of how they’re implemented, but rather to provide you, and myself, with some ground knowledge and concepts in order to be able to analyze them, think about, and maybe one day, with some additional reading, tune them or be able to decide which to use for your project.
This post was written by Brygida Ivona Pliška
For more engineering updates and insights: