Writing a JVM Compiler – Parsing

In this compiler, parsing checks whether a line of source code matches one of a parser's regular expressions. If there is a match, the line is tokenized and the relevant data is extracted and stored in memory.

Example variable definition

example1: int = 10

mut example2: int = 20

The parser for the variable definition ignores the “= 10” and “= 20” parts. Other parsers, such as an “AssignmentParser” and an “IntegerParser”, would handle those. We are going to focus on the “example1: int” and “mut example2: int” sections.

Parser – Abstract class

First is an abstract parser class to define the structure of all parsers. All parsers must inherit from this class.

// Parser.scala
abstract class Parser[T <: Block] {

  /**
    * A list of all regular expressions
    * @return list of regexes
    */
  def getRegexs: List[String]

  /**
    * Whether any of the regexes matches somewhere within the string
    * @return true if a match exists
    */
  def shouldParse(line: String): Boolean = getRegexs.exists(_.r.findFirstIn(line).nonEmpty)

  /**
    * Take the superBlock and the tokenizer for the line and return a block of this parser's type.
    */
  def parse(superBlock: Block, tokenizer: Tokenizer): Block
}

The “shouldParse” method loops through all regular expressions defined by the parser and returns true if any of them matches some part of the string.
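As a quick illustration of that check, here is a minimal standalone sketch. The object name “ShouldParseSketch” and the “regexes” value are only for this example; the regular expression is the one used by the variable definition parser later in this post.

```scala
object ShouldParseSketch {
  // Regex copied from the DefineVariableParser in this post.
  val regexes: List[String] = List(
    "(mut[ ]+)?[a-zA-Z][a-zA-Z0-9]*[ ]*:[ ]*[a-zA-Z][a-zA-Z0-9]*[ ]*"
  )

  // Same logic as Parser.shouldParse: true if any regex matches somewhere in the line.
  def shouldParse(line: String): Boolean =
    regexes.exists(_.r.findFirstIn(line).nonEmpty)

  def main(args: Array[String]): Unit = {
    println(shouldParse("example1: int = 10"))     // the "= 10" tail does not prevent a match
    println(shouldParse("mut example2: int = 20"))
    println(shouldParse("= 10"))                   // no "name: type" part to match
  }
}
```

Note that “findFirstIn” is not anchored, so the pattern may match anywhere in the line. Something like “foo(a: int)” would also pass this check. If that turns out to be a problem, anchoring the pattern with “^” is one possible way to tighten it.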

The “parse” method steps through the tokenized string to extract all required information, e.g. the variable name and type.

The “Block” class that is returned is used to store the extracted information in memory. You can create a stub class for now.
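If you want something to compile against, a stub could look roughly like the sketch below. This is an assumption on my part for illustration only: the “FileBlock” class is hypothetical, and the real “Block” in the project will carry more than this. The constructor arguments of “DefineVariableBlock” follow the order used by the parser further down.

```scala
// Hypothetical stub: a block only knows its parent (super) block.
abstract class Block(val superBlock: Block)

// Hypothetical root block, used only so the example has a parent to pass in.
class FileBlock extends Block(null)

// Holds the three pieces of data the variable definition parser extracts.
class DefineVariableBlock(superBlock: Block,
                          val mutable: Boolean,
                          val name: String,
                          val varType: String) extends Block(superBlock)

object BlockStubDemo {
  def main(args: Array[String]): Unit = {
    val root = new FileBlock
    val block = new DefineVariableBlock(root, true, "example2", "int")
    println(s"${block.name}: ${block.varType}, mutable = ${block.mutable}")
  }
}
```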

Define Variable Parser – Extends Parser.scala

To start, we will create a parser that looks for a variable definition.

// DefineVariableParser.scala
class DefineVariableParser extends Parser[DefineVariableBlock] {

  /**
    * A list of all regular expressions
    * @return list of regexes
    */
  override def getRegexs: List[String] = List(
    "(mut[ ]+)?[a-zA-Z][a-zA-Z0-9]*[ ]*:[ ]*[a-zA-Z][a-zA-Z0-9]*[ ]*"
  )

  def parse(superBlock: Block, tokenizer: Tokenizer): DefineVariableBlock = {

    val nextToken: String = tokenizer.nextToken.token

    val mutable: Boolean = nextToken == "mut" // mutable/immutable
    val name: String = if (mutable) tokenizer.nextToken.token else nextToken // Variable name
    tokenizer.nextToken // skip ":"

    val varType = tokenizer.nextToken.token // Variable Type

    new DefineVariableBlock(superBlock, mutable, name, varType)
  }
}

First there is a check to see whether the “mut” keyword is used. If so, “mutable” is set to true and the variable name is read from the next token. If not, the first token is itself the variable name.

Next the “:” is skipped over.

Lastly the variable type is extracted.

The information is then passed as arguments to the “DefineVariableBlock” constructor.
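To see these steps run on the two example lines, here is a small self-contained sketch. The “Token” and “Tokenizer” classes below are hypothetical stand-ins (the project's real tokenizer is not shown in this post), and “Definition” stands in for the block so the example needs nothing else.

```scala
object ParseWalkthrough {
  // Hypothetical token and tokenizer, just enough to step through a line.
  case class Token(token: String)

  class Tokenizer(line: String) {
    // Identifiers, integer literals, or any single non-whitespace symbol.
    private val tokens = """[a-zA-Z][a-zA-Z0-9]*|[0-9]+|\S""".r.findAllIn(line)

    // Returns an empty token once the line is exhausted.
    def nextToken: Token = Token(if (tokens.hasNext) tokens.next() else "")
  }

  // Extracted data, standing in for DefineVariableBlock.
  case class Definition(mutable: Boolean, name: String, varType: String)

  def parse(tokenizer: Tokenizer): Definition = {
    val first = tokenizer.nextToken.token
    val mutable = first == "mut"                                  // "mut" keyword present?
    val name = if (mutable) tokenizer.nextToken.token else first  // variable name
    tokenizer.nextToken                                           // skip ":"
    val varType = tokenizer.nextToken.token                       // variable type
    Definition(mutable, name, varType)
  }

  def main(args: Array[String]): Unit = {
    println(parse(new Tokenizer("example1: int = 10")))
    println(parse(new Tokenizer("mut example2: int = 20")))
  }
}
```

The “= 10” and “= 20” tokens are simply never consumed here, which matches the idea that other parsers would deal with them.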


We have now defined a parser with a “shouldParse” and a “parse” method. This allows us to check what a string represents and, if it matches, to extract all relevant data.

Source Files

The complete files are in the structure/parser directory within my project.


