Introduction to JVM Method Profiling

In my latest blog post, I mentioned that the size of the method compiled by the JIT compiler depends on the amount of profiling data included. But what does that actually mean? In this article, I will answer that question and also explain how the profiling mechanism works in general.
Experiment
Environment
I was conducting some research using a VM on GCP, specifically running Debian GNU/Linux 12 (bookworm) on an amd64 architecture. The Java version installed was Temurin-25+36 (build 25+36-LTS). You can find the complete source code on GitHub. To gain insight into the state of JVM internal objects residing in Metaspace, I utilized the jhsdb utility.
Run the program
The program I used for the experiment is basically a simple loop, as shown below:
static void main(String[] args) throws InterruptedException {
var nrOfIterations = Integer.parseInt(args[0]);
var sleepInMillis = Integer.parseInt(args[1]);
var argumentHolder = new ArgumentsHolder(nrOfIterations, sleepInMillis);
var sum = 0;
for (int i = 1; i <= argumentHolder.getNrOfIterations(); i++) {
IO.println("This is: " + i + " iteration");
Thread.sleep(resolveSleepInMillis(i, argumentHolder.getSleepInMillis()));
sum += resolveNumber(i);
}
IO.println("My fantastic sum is: " + sum);
}The Thread.sleep() is included to allow me to stop the JVM at specific points.
I ran the program with the following command:
java -XX:+PrintCompilation -jar target/app.jar 100000 100
I included the PrintCompilation flag in the run command because it allows me to see when specific methods are compiled by the JIT compiler or decompiled back to interpreter mode.
To attach a command-line debugger to a running JVM, I first needed to know the process ID. Once I had the pid, I could run:
jhsdb clhsdb --pid <pid>
After the debugger was attached I could see the method details, but I needed to perform some initial steps first.
- To find the class memory pointer using the class fully qualified name:
class org.zygiert.Main
- This command returned the pointer to actual
InstanceKlassobject in Metaspace - With this information, I was able to run:
inspect <memory_address>
- This command returned detailed information about the class. The most relevant part was a pointer to the methods array:
Array<Method*>* InstanceKlass::_methods: Array<Method*> @ 0x00007f307c400400
- To inspect actual array content, I ran the following command (I used 5 here to view five memory words):
mem 0x00007f307c400400/5
- The output looked as follows (the first line indicates the array size, followed by four consecutive pointers to the methods of this class):
0x00007f307c400400: 0x0000000000000004
0x00007f307c400408: 0x00007f307c400478
0x00007f307c400410: 0x00007f307c4005c0
0x00007f307c400418: 0x00007f307c400768
0x00007f307c400420: 0x00007f307c400698With this information, I gained interesting insights into how the method metadata changed as the application ran. I used the inspect command, which I previously mentioned. Similar steps can be repeated to examine each class in the codebase. In my case, I was particularly interested in one more class.
Tier 0
The first stop of a program I created was at the 235th iteration. By knowing the memory addresses for each method, I was able to access the method metadata. For the main method, it looked as follows:
Type is Method (size of 88)
ConstMethod* Method::_constMethod: ConstMethod @ 0x00007f307c4004d0
MethodData* Method::_method_data: MethodData @ null
MethodCounters* Method::_method_counters: MethodCounters @ 0x00007f307c4009a0
AccessFlags Method::_access_flags: 8
int Method::_vtable_index: -2
u2 Method::_intrinsic_id: 0
nmethod* Method::_code: nmethod @ null
address Method::_i2i_entry: address @ 0x00007f307c4005f8
address Method::_from_compiled_entry: address @ 0x00007f307c400600
address Method::_from_interpreted_entry: address @ 0x00007f307c400610From a profiling perspective, a few important observations can be made:
- There is no reference to
MethodDataat this point. - There is a
MethodCountersreference, indicating that profiling has already started. - We do not have a reference to the compiled machine code, the
nmethodpointer is null, which means that method is currently in interpreted mode.
Next we can examine the MethodCounters objects. Here is the output for three methods:
- main:
Type is MethodCounters (size of 64)
int MethodCounters::_invoke_mask: 254
int MethodCounters::_backedge_mask: 2046
u2 MethodCounters::_interpreter_throwout_count: 0
u2 MethodCounters::_number_of_breakpoints: 0
InvocationCounter MethodCounters::_invocation_counter: 2
InvocationCounter MethodCounters::_backedge_counter: 468- resolveNumber:
Type is MethodCounters (size of 64)
int MethodCounters::_invoke_mask: 254
int MethodCounters::_backedge_mask: 2046
u2 MethodCounters::_interpreter_throwout_count: 0
u2 MethodCounters::_number_of_breakpoints: 0
InvocationCounter MethodCounters::_invocation_counter: 468
InvocationCounter MethodCounters::_backedge_counter: 0- resolveSleepInMillis:
Type is MethodCounters (size of 64)
int MethodCounters::_invoke_mask: 254
int MethodCounters::_backedge_mask: 2046
u2 MethodCounters::_interpreter_throwout_count: 0
u2 MethodCounters::_number_of_breakpoints: 0
InvocationCounter MethodCounters::_invocation_counter: 470
InvocationCounter MethodCounters::_backedge_counter: 0There is also some interesting information regarding how profiling works. For the main method, we can observe that the invocation counter is 2 and the backedge counter is 468. This indicates that in interpreted mode, the JVM gathers information on:
- How many times a method is invoked - this is the invocation counter
- How many times a loop iterates within that method - this is the backedge counter
The question arises: why, at the 235th iteration, do we see the backedge counter value as 468 instead of 235? This occurs because the JVM increments counters by 2 instead of 1. The value is 468 instead of 470 because we are in the middle of processing an iteration step, which is why the backedge counter has not yet been incremented to 470.
To answer the original question, the JVM increments counters by 2 for performance reasons. The counter value is stored as an integer, which is 32 bits long. The first bit is known as the carry bit. Incrementing by 2 means that the carry bit remains unchanged.
This is significant because, to determine whether a method should be checked for compilation to a higher Tier, the JVM performs a bitwise AND operation on the counter value and the mask, which is more efficient than conducting an arithmetic comparison. The existence of a carry bit facilitates this process.
Moreover, the JVM does not check on each increment whether a given method should be compiled to a higher Tier. Instead, the frequency of these checks is determined using invoke_mask and backedge_mask values. In interpreter mode, these are calculated based on Tier0InvokeNotifyFreqLog (default value: 7) and Tier0BackedgeNotifyFreqLog (default value: 10). The code responsible for doing actual calculations can be seen in the methodCounters.cpp.
In our example, with an invoke_mask 254, checks for compiling to a higher Tier will occur after every 128 invocations of a method. For backedge cases, the checks will happen after 1024 backedges.
Tier 1
In my code, I intentionally introduced the ArgumentsHolder class with two getters to illustrate that there is no profiling data at Tier 1. For both getters, the counters appear as follows:
InvocationCounter MethodCounters::_invocation_counter: 256
InvocationCounter MethodCounters::_backedge_counter: 0Counter bumping concluded after 128 method invocations. At this point, the getters have been compiled by the C1 compiler. We can verify this by inspecting the method metadata. For getNrOfIterations, it appears as follows:
Type is Method (size of 88)
ConstMethod* Method::_constMethod: ConstMethod @ 0x00007f307c400c60
MethodData* Method::_method_data: MethodData @ null
MethodCounters* Method::_method_counters: MethodCounters @ 0x00007f307c400ec8
AccessFlags Method::_access_flags: 2
int Method::_vtable_index: -2
u2 Method::_intrinsic_id: 0
nmethod* Method::_code: nmethod @ 0x00007f30a3ecc788
address Method::_i2i_entry: address @ 0x00007f307c400ce8
address Method::_from_compiled_entry: address @ 0x00007f307c400cf0
address Method::_from_interpreted_entry: address @ 0x00007f307c400d00From this, we can observe the following:
- There is no reference to
MethodData, similar to what we see in interpreter mode. - The
nmethodis not null, indicating that we have machine code for this method stored in the code cache. However, this machine code does not include any additional instrumentation code for incrementing counters.
It is worth mentioning that to obtain disassembled machine code, it is also possible to utilize jhsdb. All we need is the memory address of the method, and then we can run the following command:
disassemble <memory_address>Tier 3
The next stop was at the 314th iteration. At this stage, the resolveNumber method has been compiled by the C1 compiler as well. When inspecting the method metadata, we can see details such as:
Type is Method (size of 88)
ConstMethod* Method::_constMethod: ConstMethod @ 0x00007f307c400638
MethodData* Method::_method_data: MethodData @ 0x00007f307c401928
MethodCounters* Method::_method_counters: MethodCounters @ 0x00007f307c400f88
AccessFlags Method::_access_flags: 10
int Method::_vtable_index: -2
u2 Method::_intrinsic_id: 0
nmethod* Method::_code: nmethod @ 0x00007f309c427588
address Method::_i2i_entry: address @ 0x00007f307c4006d0
address Method::_from_compiled_entry: address @ 0x00007f307c4006d8
address Method::_from_interpreted_entry: address @ 0x00007f307c4006e8The main points to note are:
- A
MethodDataobject has been created, which will gather more detailed profiling information - The
MethodCountersobject is still present but has 'frozen' at this point - Compiled machine code exists, as indicated by the presence of a non-null
nmethod
This is the source code of resolveNumber method:
private static int resolveNumber(int i) {
if (i < 10000) {
return i;
} else {
return i * 2;
}
}
In the generated machine code for this method, there is additional code responsible for loading counters, incrementing, and storing them back again. The term "counters" refers not only to the invocation counter but also to the taken and not taken branch counters stored within the MethodData. This is what is meant by "full profiling data" for this particular method.
From this point onward, the MethodCounters will not be modified further:
InvocationCounter MethodCounters::_invocation_counter: 512
InvocationCounter MethodCounters::_backedge_counter: 0The invocation_counter stopped after 256 iterations. However, there is another invocation_counter within MethodData.
By examining the method's machine code, we can identify the memory address it occupies. The relevant machine code snippet for the invocation counter is as follows:
0x00007f309c42769b: movabs $0x7f307c401928,%rax
0x00007f309c4276a5: mov 0x90(%rax),%edi
0x00007f309c4276ab: add $0x2,%edi
0x00007f309c4276ae: mov %edi,0x90(%rax)In the second line, the counter is loaded from offset 0x90 (hexadecimal, equivalent to 144 in decimal), calculated from the MethodData pointer. Since we have the MethodData address, we can compute the invocation counter's address and then use the following jhsdb command to inspect it:
printas InvocationCounter <memory_address>
The output is:
pointer to unsigned InvocationCounter @ 0x00007f307c4019b8 (size = 4)
unsigned int InvocationCounter::_counter: 114This counter is also incremented by 2 with each invocation. This indicates that the counter has been incremented 57 times since it was created. The counter was established after this method transitioned to a higher Tier, which occurred after 256 invocations. Although my application log showed 314 iterations of the loop, the resolveNumber method has not been invoked yet. This is why we see the counter value as 114 (314 - 256 = 58; since there was no invocation yet, we end up with 57).
We can observe that this counter has been in use since its creation, and at the 1739th iteration, the counter value is:
pointer to unsigned InvocationCounter @ 0x00007f307c4019b8 (size = 4)
unsigned int InvocationCounter::_counter: 2964Tier 4
After 5,120 invocations, the resolveNumber method has been compiled by the C2 compiler. I forced another pause at the 5,513th iteration of the loop. At this point, the invocation counter is frozen at the value that triggered the C2 compilation, as no profiling data is collected for code compiled by C2. Here is the value of the invocation counter:
pointer to unsigned InvocationCounter @ 0x00007f307c4019b8 (size = 4)
unsigned int InvocationCounter::_counter: 10240Deoptimization
The resolveNumber method has been deoptimized after 10,000 invocations. Here is how the method is defined:
private static int resolveNumber(int i) {
if (i < 10000) {
return i;
} else {
return i * 2;
}
}
The JIT compiler initially performed a speculative optimization, as it determined that for values of i less than 10,000, the method consistently returned the value of i. However, after more than 10,000 calls, the code began to execute the else branch, necessitating deoptimization.
Let's further explore what this means in practice. Below is the method's metadata:
Type is Method (size of 88)
ConstMethod* Method::_constMethod: ConstMethod @ 0x00007f307c400638
MethodData* Method::_method_data: MethodData @ 0x00007f307c401928
MethodCounters* Method::_method_counters: MethodCounters @ 0x00007f307c400f88
AccessFlags Method::_access_flags: 10
int Method::_vtable_index: -2
u2 Method::_intrinsic_id: 0
nmethod* Method::_code: nmethod @ null
address Method::_i2i_entry: address @ 0x00007f307c4006d0
address Method::_from_compiled_entry: address @ 0x00007f307c4006d8
address Method::_from_interpreted_entry: address @ 0x00007f307c4006e8- There is non-null reference to
MethodData, indicating that it wasn’t removed after deoptimization - There is no reference to
nmethod, as it has been deoptimized
The invocation counter has started to increase once again:
pointer to unsigned InvocationCounter @ 0x00007f307c4019b8 (size = 4)
unsigned int InvocationCounter::_counter: 10290Furthermore there is an update in MethodData:
uint MethodData::_compiler_counters._nof_decompiles: 1
Missing piece
In previous sections, I covered all the compilation Tiers except for one - Tier 2. This Tier appears to include "limited profiling data". Compiling a method at the Tier 2 level is not straightforward, so I forced the JVM to do so by running the following command:
java -XX:+PrintCompilation -XX:TieredStopAtLevel=2 -jar target/app.jar 1000 100The term "limited profiling" means that at this tier, the JVM does not collect detailed profiling information, such as, e.g., taken branches. Instead, it only increments invocation and backedge counters. Although a MethodData object exists, the data collected remains minimal.
Type is Method (size of 88)
ConstMethod* Method::_constMethod: ConstMethod @ 0x00007fa840400638
MethodData* Method::_method_data: MethodData @ 0x00007fa840401928
MethodCounters* Method::_method_counters: MethodCounters @ 0x00007fa840400f88
AccessFlags Method::_access_flags: 10
int Method::_vtable_index: -2
u2 Method::_intrinsic_id: 0
nmethod* Method::_code: nmethod @ 0x00007fa860546188
address Method::_i2i_entry: address @ 0x00007fa8404006d0
address Method::_from_compiled_entry: address @ 0x00007fa8404006d8
address Method::_from_interpreted_entry: address @ 0x00007fa8404006e8Conclusion
In the JVM, there are two key data structures used to store profiling information:
MethodCounterswhich are utilized at the start of program executionMethodDatathis structure is created once a method is compiled by the JIT compiler. It remains in use even if the method is deoptimized and reverts to interpreter mode. Additionally,MethodDataallows the JVM to gather more detailed information, which can later be leveraged by the C2 compiler to optimize machine code more effectively
For a more in-depth look at my analysis, the source code is available on GitHub. I hope this article provides valuable insights into how method profiling generally works.
Reviewed by: Paweł Stawicki
