Developer's Notebook: Understanding JVM internals

What is the Java Virtual Machine?

The Java Virtual Machine, or JVM, is an abstract computer that runs compiled Java programs.

The JVM is "virtual" because it is generally implemented in software on top of a "real" hardware platform and operating system. All Java programs are compiled for the JVM. Therefore, the JVM must be implemented on a particular platform before compiled Java programs will run on that platform.

How does it differ from C++?

C++ is compiled directly to machine code which is then executed directly by the central processing unit.

Java is compiled to bytecode, which the Java Virtual Machine (JVM) then interprets at runtime. In practice, modern Java implementations also do just-in-time (JIT) compilation of the bytecode to native machine code.
So you could say there are two levels of compilation in Java.
Alternatively, the GNU Compiler for Java (GCJ) can compile Java directly to machine code.

Is Java truly portable?

Portability isn't a black-and-white, yes-or-no kind of thing. Portability is how easily one can take a program and run it on all of the platforms one cares about.

There are a few things that affect this. One is the language itself. The Java language spec generally leaves much less up to "the implementation". For example, "i = i++" is undefined in C (and in C++ before C++17), but has a defined meaning in Java. More practically speaking, types like "int" have a specific size in Java (e.g. int is always 32 bits), while in C and C++ the size varies depending on platform and compiler. These differences alone don't prevent you from writing portable code in C and C++, but you need to be a lot more diligent.
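As a small illustration of the two language-level points above, here is a minimal sketch (the class and method names are made up for this note) showing what Java defines for "i = i++" and for the widths of int and long.

```java
public class DefinedBehaviorDemo {
    // In Java the right-hand side i++ yields the old value of i,
    // i is then incremented, and finally the old value is assigned back to i.
    static int assignPostIncrement(int start) {
        int i = start;
        i = i++;
        return i;
    }

    public static void main(String[] args) {
        System.out.println(assignPostIncrement(5)); // prints 5
        System.out.println(Integer.SIZE);           // prints 32
        System.out.println(Long.SIZE);              // prints 64
    }
}
```

The result is the same on every conforming JVM, which is the point being made about the language spec.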

Another is the libraries. Java has a bunch of standard libraries that C and C++ don't have. For example, threading, networking and GUI libraries. Libraries of these sorts exist for C and C++, but they aren't part of the standard and the corresponding libraries available can vary widely from platform to platform.

Finally, there's the whole question of whether you can just take an executable and drop it on the other platform and have it work there. This generally works with Java, assuming there's a JVM for the target platform (and there are JVMs for most platforms people care about). This is generally not true with C and C++. You're typically going to need at least a recompile, and that's assuming you've already taken care of the previous two points.

Java bytecodes

Java programs are compiled into a form called Java bytecodes. The JVM executes Java bytecodes, so Java bytecodes can be thought of as the machine language of the JVM. The Java compiler reads Java language source (.java) files, translates the source into Java bytecodes, and places the bytecodes into class (.class) files. The compiler generates one class file per class in the source.
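To make the "one class file per class" rule concrete, here is a small sketch with a hypothetical source file that declares a top-level class, a nested class and a nested interface. Compiling it with javac produces three class files, and the binary names printed at runtime match those file names (assuming the class is compiled in the default package).

```java
public class ClassFileDemo {
    // Compiled into its own file: ClassFileDemo$Nested.class
    static class Nested { }

    // Interfaces get a class file too: ClassFileDemo$Callback.class
    interface Callback { void call(); }

    // Returns the binary names the JVM uses for the three types above.
    static String[] binaryNames() {
        return new String[] {
            ClassFileDemo.class.getName(),
            Nested.class.getName(),
            Callback.class.getName()
        };
    }

    public static void main(String[] args) {
        for (String name : binaryNames()) {
            System.out.println(name + ".class");
        }
    }
}
```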

JVM Instructions

To the JVM, a stream of bytecodes is a sequence of instructions. Each instruction consists of a one-byte opcode and zero or more operands. The opcode tells the JVM what action to take. If the JVM requires more information to perform the action than just the opcode, the required information immediately follows the opcode as operands.
A mnemonic is defined for each bytecode instruction. The mnemonics can be thought of as an assembly language for the JVM. For example, there is an instruction that will cause the JVM to push a zero onto the stack. The mnemonic for this instruction is iconst_0, and its bytecode value is 03 hex.
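The opcode-plus-operands layout can be sketched with a toy decoder. This is only an illustration, not how a real JVM or javap is implemented: the class name is made up, and it recognises just four opcodes (iconst_0 = 0x03, bipush = 0x10, istore_1 = 0x3C, return = 0xB1, as listed in the JVM specification). Of these, only bipush carries an operand byte.

```java
import java.util.ArrayList;
import java.util.List;

public class BytecodeDecoder {
    // Walks a byte array and turns each instruction into "offset: mnemonic [operand]".
    static List<String> decode(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF; // opcodes are unsigned bytes
            switch (opcode) {
                case 0x03: out.add(pc + ": iconst_0"); pc += 1; break;
                case 0x10: // one signed operand byte follows the opcode
                    out.add(pc + ": bipush " + code[pc + 1]); pc += 2; break;
                case 0x3C: out.add(pc + ": istore_1"); pc += 1; break;
                case 0xB1: out.add(pc + ": return"); pc += 1; break;
                default:
                    throw new IllegalArgumentException(
                        String.format("unsupported opcode 0x%02X", opcode));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] code = {0x03, 0x3C, 0x10, 42, 0x3C, (byte) 0xB1};
        decode(code).forEach(System.out::println);
    }
}
```

Note how the offsets jump from 2 to 4: the operand of bipush occupies the byte in between.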

Word Size

The basic unit of size for data values in the Java virtual machine is the word--a fixed size chosen by the designer of each Java virtual machine implementation.

One word must be large enough to hold a value of type byte (8 bits), short (16 bits), int (32 bits), char (16 bits), float (32 bits), returnAddress (the address of an opcode within the same method), or reference (a reference to an object on the heap).
Two words must be large enough to hold a value of type long or double (64 bits).

An implementation designer must therefore choose a word size that is at least 32 bits, but otherwise can pick whatever word size will yield the most efficient implementation.
The word size is often chosen to be the size of a native pointer on the host platform.
In practice, pointers are 16 bits on a 16-bit system (if you can find one), 32 bits on a 32-bit system, and 64 bits on a 64-bit system.

Doesn’t this mean that there are no performance or memory gains when using types smaller than 64 bits?

On a 64-bit processor the registers are all 64 bits wide, so if a local variable is assigned to a register, it makes no difference whether it is a boolean, byte, short, char, int, float, double or long: it doesn't use memory, and a smaller type doesn't save any resources. Objects are 8-byte aligned, so they always take up a multiple of 8 bytes in memory. This means that Boolean, Byte, Short, Character, Integer, Long, Float, Double, AtomicBoolean, AtomicInteger, AtomicLong and AtomicReference instances typically all use the same amount of memory.

Another interesting fact is that some integers are more performant on certain platforms.
For example, on a 32-bit computer (referenced by terms like 32-bit platform and Win32) the CPU is optimized to handle one 32-bit value at a time, and the 32 refers to the number of bits that the CPU can consume or produce in a single operation. (This is a really simplistic explanation, but it gets the general idea across.)

In a 64-bit computer (most recent AMD and Intel processors fall into this category), the CPU is optimized to handle 64-bit values at a time.

So, on a 32-bit platform, a 16-bit integer loaded into a 32-bit register would need to have 16 bits zeroed out so that the CPU could operate on it; a 32-bit integer would be immediately usable without any alteration; and a 64-bit integer would need to be operated on in two or more steps (once for the low 32 bits, and then again for the high 32 bits).
Conversely, on a 64-bit platform, 16-bit integers would need to have 48 bits zeroed, 32-bit integers would need to have 32 bits zeroed, and 64-bit integers could be operated on immediately.
Each platform and CPU has a 'native' bit-ness (such as 32 or 64), and this usually limits some of the other resources that the CPU can access (for example, the 3GB/4GB memory limitation of 32-bit processors). The 80386 and later x86 processors made 32-bit the norm; AMD, and then Intel, have since made 64-bit the norm.

The method area, because it contains bytecodes, is aligned on byte boundaries.

The stack and garbage-collected heap are aligned on word (32-bit) boundaries.

So when should we even be using the short or byte types?

In most cases they aren't needed, unless your use case requires them, e.g. you are doing image manipulation or reading/writing binary streams.
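Binary formats are one case where the narrow types earn their keep. The sketch below assumes a made-up 7-byte record layout (a 1-byte type tag, a 2-byte length and a 4-byte id) and uses java.nio.ByteBuffer to write and read it. The layout and names are purely illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BinaryRecordDemo {
    // Hypothetical layout: [type: 1 byte][length: 2 bytes][id: 4 bytes], big-endian.
    static byte[] encode(byte type, short length, int id) {
        ByteBuffer buf = ByteBuffer.allocate(7).order(ByteOrder.BIG_ENDIAN);
        buf.put(type).putShort(length).putInt(id);
        return buf.array();
    }

    // Reads the 2-byte length field back, starting at byte offset 1.
    static short decodeLength(byte[] record) {
        return ByteBuffer.wrap(record).order(ByteOrder.BIG_ENDIAN).getShort(1);
    }

    public static void main(String[] args) {
        byte[] record = encode((byte) 1, (short) 300, 7);
        System.out.println(record.length);        // prints 7
        System.out.println(decodeLength(record)); // prints 300
    }
}
```

Here byte and short matter because they fix the width of each field on the wire, not because they make arithmetic faster.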

JVM Architecture

The "virtual hardware" of the Java Virtual Machine can be divided into six basic parts:
the pc registers,
the JVM stack,
the garbage-collected heap,
the method area,
the runtime constant pool and
the native method stack

These parts are abstract, just like the machine they compose, but they must exist in some form in every JVM implementation.

Let's cover the components that are created for each thread first, then the rest of the components.

Threads

A thread is a thread of execution in a program.

Why Threads?

Threads are lightweight processes. Moore's law observes that the number of transistors on a chip doubles roughly every two years, but nowadays overall speed is gained mainly by adding more cores rather than by making a single core faster. Threads let a program use these cores in parallel.

The JVM allows an application to have multiple threads of execution running concurrently.

In the HotSpot JVM there is a direct mapping between a Java thread and a native operating system thread. After preparing all of the state for a Java thread, such as thread-local storage, allocation buffers, synchronization objects, stacks and the program counter, the native thread is created. The native thread is reclaimed once the Java thread terminates. The operating system is therefore responsible for scheduling all threads and dispatching them to any available CPU. Once the native thread has initialized, it invokes the run() method in the Java thread. When the run() method returns, uncaught exceptions are handled, then the native thread checks whether the JVM needs to be terminated as a result of the thread terminating (i.e. whether it is the last non-daemon thread).
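From the Java side, the lifecycle described above looks roughly like the following sketch. The class name and the thread name "worker-1" are made up. start() asks the JVM to create the underlying thread, which then calls run() (here a lambda), and join() waits until that thread has terminated.

```java
public class ThreadLifecycleDemo {
    // Starts one non-daemon worker, waits for it, and returns what it logged.
    static String runWorker() throws InterruptedException {
        StringBuilder log = new StringBuilder();
        Thread worker = new Thread(
            () -> log.append("run() executed on ").append(Thread.currentThread().getName()),
            "worker-1");
        worker.setDaemon(false); // a non-daemon thread keeps the JVM alive while it runs
        worker.start();          // the new thread invokes run()
        worker.join();           // returns once run() has finished; makes the write to log visible
        return log.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorker()); // prints "run() executed on worker-1"
    }
}
```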

JVM System Threads

If you use jconsole or any debugger, you can see that there are numerous threads running in the background. These background threads run in addition to the main thread, which is created as part of invoking public static void main(String[]), and any threads created by the main thread.

The main background system threads in the HotSpot JVM are:

> VM thread
This thread waits for operations to appear that require the JVM to reach a safe-point. These operations have to happen on a separate thread because they all require the JVM to be at a safe-point, where modifications to the heap cannot occur. The operations performed by this thread include "stop-the-world" garbage collections, thread stack dumps, thread suspension and biased locking revocation.

> Periodic task thread
This thread is responsible for timer events (i.e. interrupts) that are used to schedule the execution of periodic operations.

> GC threads
These threads support the different types of garbage collection activities that occur in the JVM.

> Compiler threads
These threads compile bytecode to native code at runtime.

> Signal dispatcher thread
This thread receives signals sent to the JVM process and handles them inside the JVM by calling the appropriate JVM methods.
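A quick way to see some of these background threads without attaching a tool is to ask the JVM for its live Java-level threads. The sketch below (class name made up) uses Thread.getAllStackTraces(). The names printed vary with the JVM vendor, version and flags, and purely internal threads such as the VM thread or GC threads may not show up in this view at all, so treat the output as a partial picture.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ThreadListDemo {
    // Returns the names of all live threads visible at the Java level,
    // marking daemon threads, sorted alphabetically.
    static List<String> liveThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName() + (t.isDaemon() ? " (daemon)" : ""));
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        liveThreadNames().forEach(System.out::println);
    }
}
```

Even a program that creates no threads of its own will usually list several entries besides main.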

PC Registers

> Each thread of a running program has its own pc register, or program counter, which is created when the thread is started.
> The pc register is one word in size, so it can hold both a native pointer and a returnAddress.
> As a thread executes a Java method, the pc register contains the address of the current instruction being executed by the thread. An "address" can be a native pointer (to the method area) or an offset from the beginning of a method's bytecodes. If a thread is executing a native method, the value of the pc register is undefined.

JVM stack

> Each Java Virtual Machine thread has a private Java Virtual Machine stack, created at the same time as the thread.
> A Java Virtual Machine stack stores frames.

Frame

> A new frame is created and added (pushed) to the top of stack for every method invocation.
> The frame is removed (popped) when the method returns normally or if an uncaught exception is thrown during the method invocation.
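The push-per-invocation behaviour can be observed indirectly from Java code. In this sketch (names are illustrative) a method calls itself a given number of times and then measures the stack depth by capturing a stack trace. Each extra call shows up as one extra frame.

```java
public class FrameDepthDemo {
    // Recurses remainingCalls times, then reports how many frames are on the stack.
    static int depthAt(int remainingCalls) {
        if (remainingCalls == 0) {
            return new Throwable().getStackTrace().length;
        }
        return depthAt(remainingCalls - 1);
    }

    public static void main(String[] args) {
        int base = depthAt(0);
        int deeper = depthAt(10);
        System.out.println(deeper - base); // prints 10: one frame per extra invocation
    }
}
```

Once the recursion unwinds, those frames are popped again, which is why a second call to depthAt(0) reports the original depth.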

Each frame contains:
1. Local variable array
2. Return value
3. Operand stack
4. Reference to runtime constant pool for class of the current method

1. Local Variables Array

> The array of local variables contains all the variables used during the execution of the method, including a reference to this, all method parameters and other locally defined variables.
> For class methods (i.e. static methods) the method parameters start from slot zero; for instance methods, however, slot zero is reserved for this.

A local variable can be:
• boolean
• byte
• char
• long
• short
• int
• float
• double
• reference
• returnAddress

> Each slot of the local variable array is 32 bits wide, so a char, short or byte variable takes one slot. This means that all the smaller data types are internally converted to int.
> long and double take two slots.
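The slot rules above can be turned into a small calculation. The following sketch (all names are made up for illustration) uses reflection to count how many local variable slots the receiver and parameters of a method occupy: one for this in instance methods, two for each long or double, and one for everything else.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class LocalSlotDemo {
    // Static: factor -> slot 0, value -> slots 1-2, ratio -> slots 3-4.
    static long scale(int factor, long value, double ratio) {
        return (long) (factor * value * ratio);
    }

    // Instance: this -> slot 0, delta -> slot 1.
    int offset(short delta) {
        return delta + 1;
    }

    // Counts the slots taken by 'this' (if any) and the declared parameters.
    static int parameterSlots(String methodName, Class<?>... parameterTypes) {
        try {
            Method m = LocalSlotDemo.class.getDeclaredMethod(methodName, parameterTypes);
            int slots = Modifier.isStatic(m.getModifiers()) ? 0 : 1;
            for (Class<?> p : m.getParameterTypes()) {
                slots += (p == long.class || p == double.class) ? 2 : 1;
            }
            return slots;
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parameterSlots("scale", int.class, long.class, double.class)); // prints 5
        System.out.println(parameterSlots("offset", short.class));                        // prints 2
    }
}
```

Locals declared inside the method body take further slots after these.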

2. Return value

This is the return value of the method.

3. Operand stack

Other than the program counter, which can't be directly accessed by instructions, the Java virtual machine has no registers. The Java virtual machine is stack-based rather than register-based because its instructions take their operands from the operand stack rather than from registers.
> The Java virtual machine uses the operand stack as a work space.
> Most JVM byte code spends its time manipulating the operand stack by pushing, popping, duplicating, swapping, or executing operations that produce or consume values.

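For example, a simple variable initialization results in two bytecodes that interact with the operand stack. (A bare declaration such as int i; generates no bytecode at all.)

int i = 0;

Gets compiled to the following bytecode (here the variable occupies local variable slot 1, as it would in an instance method where slot 0 holds this):

0: iconst_0 // Push 0 onto the top of the operand stack
1: istore_1 // Pop the value from the top of the operand stack and store it in local variable 1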
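To see how the operand stack and the local variable array cooperate, here is a hand-rolled simulation, assuming an initialization such as int i = 0 followed by i = i + 1. The instruction sequence in the comments is written by hand for illustration rather than copied from compiler output, and the Deque and int array merely stand in for the real frame structures.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class OperandStackDemo {
    // Simulates: iconst_0, istore_1, iload_1, iconst_1, iadd, istore_1
    static int[] run() {
        Deque<Integer> operandStack = new ArrayDeque<>();
        int[] locals = new int[2];           // slot 0 unused here, slot 1 holds i

        operandStack.push(0);                // iconst_0: push constant 0
        locals[1] = operandStack.pop();      // istore_1: pop into local 1
        operandStack.push(locals[1]);        // iload_1:  push local 1
        operandStack.push(1);                // iconst_1: push constant 1
        int right = operandStack.pop();      // iadd: pop two ints...
        int left = operandStack.pop();
        operandStack.push(left + right);     // ...and push their sum
        locals[1] = operandStack.pop();      // istore_1: pop into local 1
        return locals;
    }

    public static void main(String[] args) {
        System.out.println(run()[1]); // prints 1
    }
}
```

Every instruction either pushes onto or pops from the stack; none of them names a register.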

4. References to runtime constant pool

Each frame contains a reference to the runtime constant pool. The reference points to the constant pool for the class of the method being executed for that frame. This reference helps to support dynamic linking.

C/C++ code is typically compiled to an object file, then multiple object files are linked together to produce a usable artifact such as an executable or DLL. During the linking phase, symbolic references in each object file are replaced with an actual memory address relative to the final executable.

In Java this linking phase is done dynamically at runtime.

When a Java class is compiled, all references to variables and methods are stored in the class's constant pool as symbolic references. A symbolic reference is a logical reference, not a reference that actually points to a physical memory location. The JVM implementation can choose when to resolve symbolic references: either when the class file is verified, after being loaded (called eager or static resolution), or when the symbolic reference is used for the first time (called lazy or late resolution).

However, the JVM has to behave as if the resolution occurred when each reference is first used, and throw any resolution errors at this point. Binding is the process of replacing the field, method or class identified by the symbolic reference with a direct reference. This happens only once, because the symbolic reference is completely replaced. If the symbolic reference refers to a class that has not yet been resolved, then this class will be loaded. Each direct reference is stored as an offset against the storage structure associated with the runtime location of the variable or method.
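Resolution itself is not directly visible from Java code, but a closely related "on first use" effect is. The sketch below (names made up) records when a nested class's static initializer runs. Strictly speaking this demonstrates class initialization rather than symbolic reference resolution, so take it only as an illustration of the JVM deferring work until a class is actually used.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyLinkingDemo {
    static final List<String> events = new ArrayList<>();

    static class Helper {
        // Runs once, when Helper is first actively used.
        static { events.add("Helper initialized"); }

        static int answer() { return 42; }
    }

    // Records an event before and after the first call into Helper.
    static List<String> run() {
        events.add("before first use");
        int value = Helper.answer();
        events.add("after first use: " + value);
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
        // prints:
        // before first use
        // Helper initialized
        // after first use: 42
    }
}
```

The initializer message appears between the two markers, not at program start, even though Helper is referenced in the source of the enclosing class.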

Native Method Stack

When a thread invokes a native method, it enters a new world in which the structures and security restrictions of the Java virtual machine no longer hamper its freedom. A native method can likely access the runtime data areas of the virtual machine (it depends upon the native method interface), but can also do anything else it wants.

Let's now look at the components that are shared across threads.

Heap

> The Heap is used to allocate class instances and arrays at runtime.
> Arrays and objects can never be stored on the stack because a frame is not designed to change in size after it has been created. The frame only stores references that point to objects or arrays on the heap.
> Unlike primitive variables and references in the local variable array (in each frame) objects are always stored on the heap so they are not removed when a method ends. Instead objects are only removed by the garbage collector.

To support garbage collection the heap is divided into three sections:
1. Young Generation (often split between Eden and Survivor spaces)
2. Old Generation (also called Tenured Generation)
3. Permanent Generation (removed in Java 8, where class metadata moved to Metaspace in native memory)
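The memory pools of a running JVM can be listed through the standard java.lang.management API. The following is a minimal sketch (class name made up). The pool names depend on the JVM, its version and the selected garbage collector, so no particular output is assumed here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

public class MemoryPoolDemo {
    // Returns one line per memory pool: its type (heap or non-heap) and its name.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(pool.getType() + ": " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

On a generational collector the heap pools typically correspond to the Eden, Survivor and Old areas listed above.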
