Developer's Notebook: JVM and JRE

What is the Java Virtual Machine?

The Java Virtual Machine, or JVM, is an abstract computer that runs compiled Java programs. The JVM is "virtual" because it is generally implemented in software on top of a "real" hardware platform and operating system. All Java programs are compiled for the JVM. Therefore, the JVM must be implemented on a particular platform before compiled Java programs will run on that platform.

How is Java execution different from C/C++?

In short, C is compiled directly to machine code which is then executed directly by the central processing unit.

Building a C program normally involves four stages, each handled by a different tool: a preprocessor, a compiler, an assembler, and a linker.

1. Preprocessing is the first pass of any C compilation. It processes include-files, conditional compilation instructions and macros.
2. Compilation is the second pass. It takes the preprocessed source code and generates assembly source code.
3. Assembly is the third stage of compilation. It takes the assembly source code and translates it into machine code. The assembler output is stored in an object file.
4. Linking is the final stage of compilation. It takes one or more object files or libraries as input and combines them to produce a single (usually executable) file. In doing so, it resolves references to external symbols, assigns final addresses to procedures/functions and variables, and revises code and data to reflect new addresses (a process called relocation).

Java is compiled to bytecode, which the Java virtual machine (JVM) then interprets at runtime. In practice, most Java implementations also do just-in-time (JIT) compilation to native machine code. Alternatively, the GNU Compiler for Java (GCJ) can compile directly to machine code, although it is no longer maintained.

So what is the difference between an interpreter and a JIT compiler?

The answer below is adapted from Stack Overflow.

With the JVM, both the interpreter and the compiler (the JVM's own compiler, not a source-code compiler like javac) turn bytecode into native code (machine-language code for the underlying physical CPU, such as x86).

What's the difference, then?
The difference is in how they generate the native code, how optimized it is, and how costly the optimization is. Informally, an interpreter converts each bytecode instruction to a corresponding native instruction by looking up a predefined mapping from JVM instructions to machine instructions. A further speedup can be achieved by taking a whole section of bytecode and converting it into machine code, because considering a whole logical section often provides room for optimization that is not available when converting (interpreting) each instruction in isolation. Converting a section of bytecode into (presumably optimized) machine instructions is called compiling in this context. When the compilation is done at run time, the compiler is called a JIT compiler.
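
To observe the effect informally, a hot method can be timed with and without the JIT. The sketch below is only an illustration: the class name, method, and loop counts are made up for this example, and the timing it prints varies from machine to machine. On a HotSpot-based JVM, running it once normally and once with the `-Xint` option (interpreter only) usually shows a clear difference, and `-XX:+PrintCompilation` lists the methods the JIT compiles.

```java
// Hypothetical demo class; names and loop sizes are illustrative only.
class HotLoopDemo {
    // A small method that becomes "hot" because it is called many times.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        long checksum = 0;
        long start = System.nanoTime();
        for (int round = 0; round < 20_000; round++) {
            checksum += sumOfSquares(1_000);
        }
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        // The checksum keeps the loop from being optimized away entirely.
        System.out.println("checksum=" + checksum + " elapsed(us)=" + elapsedMicros);
    }
}
```

This is not a rigorous benchmark; it is only meant to make the interpreter versus JIT distinction visible.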

What makes Java portable?

Portability isn't a black-and-white, yes-or-no kind of thing. Portability is how easily one can take a program and run it on all of the platforms one cares about.
There are a few things that affect this. One is the language itself. The Java language spec generally leaves much less up to "the implementation". For example, "i = i++" is undefined in C (and in C++ before C++17), but has a defined meaning in Java. More practically speaking, types like "int" have a specific size in Java (e.g. int is always 32 bits), while in C and C++ the size varies depending on platform and compiler. These differences alone don't prevent you from writing portable code in C and C++, but you need to be a lot more diligent.
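
Both points can be checked directly in Java. The following is a minimal sketch (the class and method names are invented for this example) that prints the fixed sizes of a few primitive types and shows the defined result of `i = i++`.

```java
// Hypothetical demo class illustrating behavior fixed by the Java language.
class PortableSemantics {
    // In Java, i = i++ is well defined: the old value of i is assigned back to i.
    static int assignPostIncrement(int i) {
        i = i++;
        return i;
    }

    public static void main(String[] args) {
        System.out.println(Integer.SIZE);           // 32 on every platform
        System.out.println(Long.SIZE);              // 64
        System.out.println(Character.SIZE);         // 16
        System.out.println(assignPostIncrement(5)); // 5
    }
}
```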

Another is the libraries. Java has a bunch of standard libraries that C and C++ don't have. For example, threading, networking and GUI libraries. Libraries of these sorts exist for C and C++, but they aren't part of the standard and the corresponding libraries available can vary widely from platform to platform.
Finally, there's the whole question of whether you can just take an executable and drop it on the other platform and have it work there. This generally works with Java, assuming there's a JVM for the target platform (and there are JVMs for most platforms people care about). This is generally not true with C and C++. You're typically going to need at least a recompile, and that's assuming you've already taken care of the previous two points.

Java bytecodes

Java programs are compiled into a form called Java bytecodes. The JVM executes Java bytecodes, so Java bytecodes can be thought of as the machine language of the JVM. The Java compiler reads Java language source (.java) files, translates the source into Java bytecodes, and places the bytecodes into class (.class) files. The compiler generates one class file per class in the source.

To the JVM, a stream of bytecodes is a sequence of instructions. Each instruction consists of a one-byte opcode and zero or more operands. The opcode tells the JVM what action to take. If the JVM requires more information to perform the action than just the opcode, the required information immediately follows the opcode as operands.
A mnemonic is defined for each bytecode instruction. The mnemonics can be thought of as an assembly language for the JVM. For example, there is an instruction that will cause the JVM to push a zero onto the stack. The mnemonic for this instruction is iconst_0, and its bytecode value is 03 hex.
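
The opcode/operand layout can be sketched as a tiny disassembler. This is an illustration only: the class is hypothetical, it knows just a handful of opcodes (values taken from the JVM specification's instruction set), and it does no bounds checking on operands.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: maps a few one-byte opcodes to their mnemonics.
class MiniDisassembler {
    static List<String> disassemble(int[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc++];
            switch (opcode) {
                case 0x03: out.add("iconst_0"); break;
                // bipush is followed by one operand byte: the value to push.
                case 0x10: out.add("bipush " + (byte) code[pc++]); break;
                case 0x1a: out.add("iload_0"); break;
                case 0x1b: out.add("iload_1"); break;
                case 0x3d: out.add("istore_2"); break;
                case 0x60: out.add("iadd"); break;
                default:   out.add(String.format("unknown(0x%02x)", opcode));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // iconst_0, bipush 42, iadd
        for (String line : disassemble(new int[] {0x03, 0x10, 0x2a, 0x60})) {
            System.out.println(line);
        }
    }
}
```

For real class files, the `javap -c` tool that ships with the JDK prints the same kind of mnemonic listing.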

Java Language Specification

Not all major programming languages have specifications, and languages can exist and be popular for decades without a specification. ALGOL 68 was the first (and possibly one of the last) major language for which a full formal definition was made before it was implemented.

When Sun was the steward of Java, they would write specifications for features like Java EE and JPA, giving other vendors the freedom to implement as they wished. Sun would usually have an implementation of their own with which to compete, but their goal was to sell hardware. Encouraging other vendors to provide competing implementations only furthered that aim.

In Java-land, these specifications start out as Java Specification Requests (JSRs), which are described here: https://jcp.org/en/jsr/overview. Once accepted, they become specifications.

Sun made most of the JDK source code (not the JVM) freely available under an open-source license.

Java Virtual Machine Specification

On November 13, 2006, Sun released much of its Java virtual machine (JVM) as free and open-source software, (FOSS), under the terms of the GNU General Public License (GPL). On May 8, 2007, Sun finished the process, making all of its JVM's core code available under free software/open-source distribution terms, aside from a small portion of code to which Sun did not hold the copyright.

https://docs.oracle.com/javase/specs/

The JVM specification dictates how a JVM must respond to bytecode instructions.

Other well-known JVM implementations include OpenJDK, IBM J9, and GCJ.

OpenJDK is the open-source implementation of the Java SE 7 JSR (JSR 336). There is now almost no difference between the Oracle JDK and OpenJDK. Last year, Oracle announced that it was moving to OpenJDK as the official Java SE 7 Reference Implementation.

Word Size

The basic unit of size for data values in the Java virtual machine is the word--a fixed size chosen by the designer of each Java virtual machine implementation.
One word must be large enough to hold a value of type byte (8 bits), short (16 bits), int (32 bits), char (16 bits), float (32 bits), returnAddress (the address of an opcode within the same method), or reference (a reference to an object on the heap).

Two words must be large enough to hold a value of type long or double (64 bits).

On a JVM whose word size is 64 bits, one word is able to hold any of these types.

An implementation designer must therefore choose a word size that is at least 32 bits, but otherwise can pick whatever word size will yield the most efficient implementation.
The word size is often chosen to be the size of a native pointer on the host platform.
In practice, pointers are 16 bits wide on a 16-bit system (if you can find one), 32 bits on a 32-bit system, and 64 bits on a 64-bit system.

Doesn't this mean that there are no performance or memory gains when using types smaller than 64 bits?

In a 64-bit processor, the registers are all 64-bit, so if your local variable is assigned to a register and is a boolean, byte, short, char, int, float, double or long, it uses no memory, and choosing a smaller type saves no resources. Objects are 8-byte aligned, so they always take up a multiple of 8 bytes in memory. This means Boolean, Byte, Short, Character, Integer, Long, Float and Double, AtomicBoolean, AtomicInteger, AtomicLong, and AtomicReference all use the same amount of memory.
Another interesting fact is that some integers are more performant on certain platforms.

For example, in a 32-bit computer (referenced by terms like 32-bit platform and Win32) the CPU is optimized to handle a 32-bit value at a time, and the 32 refers to the number of bits that the CPU can consume or produce in a single cycle. (This is a really simplistic explanation, but it gets the general idea across).
In a 64-bit computer (most recent AMD and Intel processors fall into this category), the CPU is optimized to handle 64-bit values at a time.
So, on a 32-bit platform, a 16-bit integer loaded into a 32-bit address would need to have 16 bits zeroed out so that the CPU could operate on it; a 32-bit integer would be immediately usable without any alteration, and a 64-bit integer would need to be operated on in two or more CPU cycles (once for the low 32-bits, and then again for the high 32-bits).

Conversely, on a 64-bit platform, 16-bit integers would need to have 48 bits zeroed, 32-bit integers would need to have 32 bits zeroed, and 64-bit integers could be operated on immediately.
Each platform and CPU has a 'native' bit-ness (like 32 or 64), and this usually limits some of the other resources that can be accessed by that CPU (for example, the 3GB/4GB memory limitation of 32-bit processors). The 80386 and later x86 processors made 32-bit the norm, and companies such as AMD, followed by Intel, have since made 64-bit the norm.
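
The two-step handling of a 64-bit integer on a 32-bit platform can be illustrated in Java itself. The sketch below (a hypothetical helper, not how any particular JVM is implemented) adds two long values using only 32-bit halves: first the low words, then the high words plus the carry.

```java
// Hypothetical sketch of 64-bit addition done with 32-bit pieces.
class SplitAdd {
    static long addInHalves(long a, long b) {
        int aLo = (int) a, aHi = (int) (a >>> 32);
        int bLo = (int) b, bHi = (int) (b >>> 32);

        // Step 1: add the low 32 bits.
        int lo = aLo + bLo;
        // If the unsigned sum wrapped around, it is smaller than an input: carry 1.
        int carry = Integer.compareUnsigned(lo, aLo) < 0 ? 1 : 0;
        // Step 2: add the high 32 bits plus the carry.
        int hi = aHi + bHi + carry;

        // Reassemble the 64-bit result from the two halves.
        return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        System.out.println(addInHalves(0xFFFFFFFFL, 1L) == 0x100000000L); // true
        System.out.println(addInHalves(-5L, 3L));                         // -2
    }
}
```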

So when should we even use the short or byte types?

In most cases they aren't needed unless your use case requires them, for example when you are working on image manipulation or reading and writing binary streams.
The method area, because it contains bytecodes, is aligned on byte boundaries. The stack and garbage-collected heap are aligned on word (32-bit) boundaries.

JVM Architecture

The "virtual hardware" of the Java Virtual Machine can be divided into 6 basic parts:
• The pc registers
• The Java stack
• The garbage-collected heap
• The method area
• The runtime constant pool
• The native method stack

These parts are abstract, just like the machine they compose, but they must exist in some form in every JVM implementation.

PC Registers

Each thread of a running program has its own pc register, or program counter, which is created when the thread is started.
The pc register is one word in size, so it can hold both a native pointer and a returnAddress.

As a thread executes a Java method, the pc register contains the address of the current instruction being executed by the thread. An "address" can be a native pointer (to the method area) or an offset from the beginning of a method's bytecodes. If a thread is executing a native method, the value of the pc register is undefined.

Java stack

Each Java Virtual Machine thread has a private Java Virtual Machine stack, created at the same time as the thread.
A Java Virtual Machine stack stores frames.

Frame

A frame is used to store data and partial results, as well as to perform dynamic linking, return values for methods, and dispatch exceptions.
A new frame is created each time a method is invoked. A frame is destroyed when its method invocation completes, whether that completion is normal or abrupt (it throws an uncaught exception).
Each frame contains:
• Local variable array
• Operand stack
• Frame data
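
The one-frame-per-invocation rule can be observed from inside a program. The sketch below is an illustration only (the class and method names are invented here); it counts the stack trace elements visible after a given number of nested calls. The Thread.getStackTrace documentation allows a JVM to omit frames in some circumstances, so treat the result as typical rather than guaranteed.

```java
// Hypothetical sketch: each extra invocation adds one frame to the thread's stack.
class FrameDepth {
    // Returns the number of frames visible once `levels` extra nested calls have been made.
    static int depthAfter(int levels) {
        if (levels == 0) {
            return Thread.currentThread().getStackTrace().length;
        }
        return depthAfter(levels - 1);
    }

    public static void main(String[] args) {
        int base = depthAfter(0);
        int deeper = depthAfter(5);
        // Typically prints 5: five more invocations, five more frames.
        System.out.println(deeper - base);
    }
}
```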

Local Variables Array

The array of local variables contains all the variables used during the execution of the method, including a reference to this, all method parameters, and other locally defined variables. For class methods (i.e. static methods), the method parameters start at slot zero; for instance methods, slot zero is reserved for this.
A local variable can be:
• boolean
• byte
• char
• long
• short
• int
• float
• double
• reference
• returnAddress

Each slot of the local variable array is 32 bits wide, so a char, short or byte variable occupies one slot. This means all the smaller data types are internally converted to the int data type.
A long or a double takes 2 slots.
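
The slot rules above can be sketched as a small calculation. The helper below is hypothetical (its name and signature are made up for illustration); it counts how many local variable slots a method's parameters occupy, given whether the method is static.

```java
// Hypothetical sketch: counts the local variable slots used by a method's parameters.
class LocalSlots {
    static int parameterSlots(boolean isStatic, Class<?>... parameterTypes) {
        // Slot 0 holds `this` for instance methods; static methods start at 0.
        int slots = isStatic ? 0 : 1;
        for (Class<?> type : parameterTypes) {
            // long and double take two slots; everything else takes one.
            slots += (type == long.class || type == double.class) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // static void m(int, long, double, byte) -> 1 + 2 + 2 + 1
        System.out.println(parameterSlots(true, int.class, long.class, double.class, byte.class)); // 6
        // void m(String) on an instance -> this + reference
        System.out.println(parameterSlots(false, String.class)); // 2
    }
}
```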

Operand Stack

Like the local variables, the operand stack is organized as an array of words. But unlike the local variables, which are accessed via array indices, the operand stack is accessed by pushing and popping values.
The virtual machine stores the same data types in the operand stack that it stores in the local variables. Other than the program counter, which can't be directly accessed by instructions, the Java virtual machine has no registers. The Java virtual machine is stack-based rather than register-based because its instructions take their operands from the operand stack rather than from registers.
The Java virtual machine uses the operand stack as a work space. Many instructions pop values from the operand stack, operate on them, and push the result. For example, the iadd instruction adds two integers by popping two ints off the top of the operand stack, adding them, and pushing the int result.

Here is how a Java virtual machine would add two local variables that contain ints and store the int result in a third local variable:
iload_0 // push the int in local variable 0
iload_1 // push the int in local variable 1
iadd // pop two ints, add them, push result
istore_2 // pop int, store into local variable 2
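
The same four instructions can be traced with a toy stack machine. This is a minimal sketch for illustration only: the class is hypothetical, it understands just these four mnemonics, and it is not how a real JVM is implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a toy interpreter for four int instructions.
class TinyStackMachine {
    static int[] run(String[] program, int[] locals) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        for (String instruction : program) {
            switch (instruction) {
                case "iload_0": operandStack.push(locals[0]); break;
                case "iload_1": operandStack.push(locals[1]); break;
                case "iadd": {
                    // Pop two ints, add them, push the result.
                    int right = operandStack.pop();
                    int left = operandStack.pop();
                    operandStack.push(left + right);
                    break;
                }
                case "istore_2": locals[2] = operandStack.pop(); break;
                default: throw new IllegalArgumentException("unsupported: " + instruction);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        int[] locals = {7, 35, 0};
        run(new String[] {"iload_0", "iload_1", "iadd", "istore_2"}, locals);
        System.out.println(locals[2]); // 42
    }
}
```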

Frame Data

This is to support constant pool resolution, normal method return, and exception dispatch.

Many instructions in the Java virtual machine's instruction set refer to entries in the constant pool. Some instructions merely push constant values of type int, long, float, double, or String from the constant pool onto the operand stack. Some instructions use constant pool entries to refer to classes or arrays to instantiate, fields to access, or methods to invoke. Other instructions determine whether a particular object is a descendant of a particular class or interface specified by a constant pool entry.

Aside from constant pool resolution, the frame data must assist the virtual machine in processing a normal or abrupt method completion. If a method completes normally (by returning), the virtual machine must restore the stack frame of the invoking method. It must set the pc register to point to the instruction in the invoking method that follows the instruction that invoked the completing method. If the completing method returns a value, the virtual machine must push that value onto the operand stack of the invoking method.

The frame data must also contain some kind of reference to the method's exception table, which the virtual machine uses to process any exceptions thrown during the course of execution of the method.
An exception table defines ranges within the bytecodes of a method that are protected by catch clauses. Each entry in an exception table gives a starting and ending position of the range protected by a catch clause, an index into the constant pool that gives the exception class being caught, and a starting position of the catch clause's code.
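
An exception table lookup can be sketched as follows. The types and names here are invented for illustration. The sketch assumes, as the class file format does, that the start of a range is inclusive, the end is exclusive, and entries are searched in order so that the first match wins.

```java
// Hypothetical sketch of how a handler is chosen from an exception table.
class ExceptionTableDemo {
    static final class Entry {
        final int startPc;   // first protected bytecode offset (inclusive)
        final int endPc;     // end of the protected range (exclusive)
        final int handlerPc; // where the catch clause's code starts
        final Class<? extends Throwable> catchType;

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns the handler offset for an exception thrown at `pc`, or -1 if none applies.
    static int findHandler(Entry[] table, int pc, Throwable thrown) {
        for (Entry entry : table) {
            boolean inRange = pc >= entry.startPc && pc < entry.endPc;
            if (inRange && entry.catchType.isInstance(thrown)) {
                return entry.handlerPc;
            }
        }
        return -1; // no handler here: the exception propagates to the invoking method
    }

    public static void main(String[] args) {
        Entry[] table = {
            new Entry(0, 8, 20, ArithmeticException.class),
            new Entry(0, 8, 30, RuntimeException.class),
        };
        System.out.println(findHandler(table, 4, new ArithmeticException()));   // 20
        System.out.println(findHandler(table, 4, new IllegalStateException())); // 30
        System.out.println(findHandler(table, 8, new ArithmeticException()));   // -1
    }
}
```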

Native Method Stacks

In addition to all the runtime data areas defined by the Java virtual machine specification and described previously, a running Java application may use other data areas created by or for native methods. When a thread invokes a native method, it enters a new world in which the structures and security restrictions of the Java virtual machine no longer hamper its freedom. A native method can likely access the runtime data areas of the virtual machine (it depends upon the native method interface), but can also do anything else it wants. It may use registers inside the native processor, allocate memory on any number of native heaps, or use any kind of stack.

The Heap

Whenever a class instance or array is created in a running Java application, the memory for the new object is allocated from a single heap. As there is only one heap inside a Java virtual machine instance, all threads share it. Because a Java application runs inside its "own" exclusive Java virtual machine instance, there is a separate heap for every individual running application. There is no way two different Java applications could trample on each other's heap data.
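
That all threads share one heap can be seen in a few lines. In this minimal sketch (the class and method names are hypothetical), an array allocated by one thread is filled in by another, and the first thread reads the results after joining.

```java
// Hypothetical sketch: two threads working on the same heap-allocated array.
class SharedHeapDemo {
    static int[] fillOnWorkerThread(int size) {
        int[] shared = new int[size]; // allocated on the single shared heap
        Thread worker = new Thread(() -> {
            for (int i = 0; i < shared.length; i++) {
                shared[i] = i * i; // the worker writes to the same array object
            }
        });
        worker.start();
        try {
            worker.join(); // after join(), the worker's writes are visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return shared;
    }

    public static void main(String[] args) {
        int[] squares = fillOnWorkerThread(5);
        System.out.println(squares[4]); // 16
    }
}
```

Each thread still has its own Java stack; only the objects themselves live in the shared heap.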

Garbage Collection

A garbage collector's primary function is to automatically reclaim the memory used by objects that are no longer referenced by the running application. It may also move objects as the application runs to reduce heap fragmentation.

A garbage collector is not strictly required by the Java virtual machine specification. The specification only requires that an implementation manage its own heap in some manner. For example, an implementation could simply have a fixed amount of heap space available and throw an OutOfMemoryError when that space fills up. While this implementation may not win many prizes, it does qualify as a Java virtual machine.

Object Representation

The Java virtual machine specification is silent on how objects should be represented on the heap.
