1
/
5

Beautiful algorithms used in a compiler and a stack based virtual machine ~Part 5. What Monkey does not have, but Java has~

Photo by Edson Saldaña on Unsplash

This is the December 19 article for Wantedly 2021 Advent Calendar.

This is the fifth and final part of the series "Beautiful algorithms used in a compiler and a stack based virtual machine". In this series, I share algorithms, which I found interesting while implementing a compiler in Go by following two books, "Writing an interpreter in Go" and "Writing a compiler in Go" by Thorsten Ball. The specification of the Monkey language is explained in this page.

In this post, I will compare Monkey with Java. In the first section, I explain three similarities between them. You will notice that they have quite similar mechanisms, which is why I pick Java as a comparison of Monkey in this post. In the second section, I will describe the staff which Java has, but Monkey does not have.

For readers who would like to see the Monkey compiler and the VM, please take a look at my github repository kudojp/MonkeyCompiler-Golang2021. The contents of this post is based on Monkey at the commit 925793d.

§1. Similarities between Monkey and Java

In this section, I will introduce two similarities in the mechanisms of how each of Monkey and Java compiles and executes source code.

The first similarity is that original source code is compiled into the bytecode, and then it is executed in the virtual machine. The VM provides the runtime environment, which is aware of the specific instruction lengths of the processor and other particularities of the platform (Operating System)[1]. This brings the portability of the bytecode, that is, the compiler does not have to take into consideration the differences of the environment where the generated bytecode is executed.

The second similarity is that in the runtime, some values (e.g. string) are represented as the objects and the references to them are stored in the local stack. In the Monkey VM, objects are stored in the constant pool and their pointers are stored in the data stack. In the Java Virtual Machine, objects are stored in the heap memory and the references are stored in the thread stacks.

Since these two languages have quite similar mechanisms, I consider that Java is a suitable language for comparison with Monkey. I will list what Java has, but Monkey does not have in the following section.

§2. Monkey does not have, but Java has

In this section, I will list some of language features and internal mechanisms which are included in Java, but not in Monkey. These include both language features and the internal mechanisms.

Please note that each of these features have the trade off. For example, adding richer language features leads to poor performance in general. Introducing the static typing system will decrease the flexibility of the code. Under the explicit error handling paradigm, the source code may become verbose.

Support for various syntax

Java supports the loop statement, comment out, etc. These are not supported in the current version of Monkey.

Primitive Data Types

I menioned in the previous section that some values are represented as the objects and the references to them are stored in the local stack in Java and Monkey. This data type is called the object data type. In Monkey VM, all values are held in Object data type, while in Java, some are held in Object data types and the others are held in the primitive data type.

For example, an integer value is treated as primitive data type, either int or short in Java. This occupies exactly 16/32 bits of memory in the local stack. (Programmers select which type to use according to the estimated range of the value.) This saves the usage of both time and space, because it is not required to create the object, which takes more space, in the Java Virtual Machine.

In Java, there are six other primitive types: byte (8 bits), long (64 bits), float (32 bits), double (64 bits), char (8 bits), and boolean (1 bit).

Static typing system

In Java, types of variables in the source code are explicitly declared. This enables detecting many programming errors quickly, before the runtime. They are detected at the compile time, or even when programmers writing the code if one is doing it in the IDE (Integrated development environment). Also, type declarations serve as automatically-checked documentation. They make programs easier to understand and maintain[2].

Support for Object Oriented Programming

Java is categorized as an object oriented programming (OOP) language, where objects are instantiated from the classes.

In an OOP paradigm, business logic is defined in the methods of related classes, which brings modularity to the source code. The concept of the inheritance brings about the reusability of the logic by enabling a class to reuse the logic implemented in the parent class. The concept of polymorphism provides flexibility, because objects of multiple classes could be treated in the same way as long as they implement the same interface.

Explicit error handling paradigm

Java has a robust policy on error handlings. Each method explicitly declares which types of errors could be thrown from it, except for errors of java.lang.RuntimeException and its sub classes. These errros have to be caught somewhere in upstream caller methods.

This robust system guarantees that unexpected errors which halt the program do not occur in the runtime.

Support for multithread programming

The Java Virtual Machine can run programs in multithreads, whose each thread corresponds to the different OS thread. In the source code, the initializatino of a new thread corresponding to the construction of a new Thread object.

And even better, the standard library, java.util.concurrent, provides simpler interfaces for multithread programs.

Standard libraries

Java provides standard libraries, which are definitions for commonly used algorithms, data structures, and mechanisms for input and output[3]. Below, I will introduce some libraries which are good to know.

  • java.lang package provides classes that are fundamental to the design of the Java programming language. Classes including String, Exception, Runtime, System, and Thread are included in this package.
  • java.util.concurrent package provides utility classes commonly useful in concurrent programming. As aforementioned, this package provides the framework to write multithread programming simply. Executor interface, Future interface, ThreadFactory interface, etc, are included in this package.
  • java.io package provides for system input and output through data streams, serialization and the file system.
  • java.net package provides the classes for implementing networking applications. ServerSocket class, Socket class (the client socket), URI class, URL class, URLConnection class, etc, are included in this package.

Perfomant garbage collector (in the VM)

Garbage collection is a form of automatic memory management. The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced[4].

The Java Virtual Machine runs the garbage collector as a daemon thread when the memory is low. In Monkey, I have not implemented the garbage collector by myself. Instead, the garbage collector embedded in Go, in which the VM is written, plays that role. Building the garbage collector which is customized for the Monkey VM may increase the performance at the runtime.

Support for the debugger (in the VM)

In Java, programmes can easily debug the program by the Java Virtual Machine working together with the debugger. In Oracle Java, the Java Debugger (jdb), which is a simple command-line debugger, is included in Java Development Kit. The jdb command launches a new Java Virtual Machine with the main class of the application to be debugged, or is attached to a Java VM that is already running[5].

JIT compiler (in the VM)

The Just-In-Time (JIT) compiler is provided as a part of Java Runtime Environment, that is responsible for performance optimization of java based applications at run time[6]. Instead of interpreting bytecode every time a method is invoked, the JIT compiler will compile the bytecode into the native machine language, and then invoke this object code instead[7].

In this process, the JIT compiler does simple optimizations including data flow analysis, and reduction of memory accesses. In the case of Oracle's Java Virtual Machine named HotSpot, it sets only the hotspots, which are executed often or repeatedly, as the target of the optimization. This leads to high-performance execution with a minimum of overhead for less performance-critical code[8].

... And tons of additional refinements and optimizations

§3. Summary

In this post, I compared Monkey with Java. Even though they have similar processes for source code execution, Java is more sophisticated in terms of language features and internal mechanisms. This is because the first version of Java was released in 1995, and the tremendous resource has been taken for the discussion and the developement. If you are intersted in sophisticating Monkey language, please create a PR in kudojp/MonkeyCompiler-Golang2021.

References

[1] "Is JVM platform independent?" http://net-informations.com/java/cjava/independent.htm

[2] "Advantages of Dynamic and Static type checking", an answer in stack overflow https://stackoverflow.com/questions/14368499/advantages-of-dynamic-and-static-type-checking

[3] "Standard library", Wikipedia https://en.wikipedia.org/wiki/Standard_library

[4] "Garbage collection (computer science)", Wikipedia https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)

[5] "jdb - The Java Debugger", Oracle Java SE Edition https://docs.oracle.com/javase/7/docs/technotes/tools/windows/jdb.html

[6] "Just In Time Compiler", GeeksforGeeks https://www.geeksforgeeks.org/just-in-time-compiler

[7] "Just in Time Compilation Explained", freeCodeCamp https://www.freecodecamp.org/news/just-in-time-compilation-explained/

[8] "HotSpot (virtual machine)" https://en.wikipedia.org/wiki/HotSpot_(virtual_machine)

工藤 大暉さんにいいねを伝えよう
工藤 大暉さんや会社があなたに興味を持つかも