Writing a Compiler for Verilog – part 1

by Mithrandir

This year, I took a class named Numerical Computers II (I had Numerical Computers I last year). The course was about Computer Architecture and things like this. However, I ended up with a very fascinating homework: to write a compiler for Verilog in a team of 2 to 4 people. The deadline is coming soon (I have to finish it by Friday) but I need a break so I’ll post here my story.

The first part is always the hardest. We decided to write it in Python because the development speed is high enough and we could avoid possible dead-ends. C wasn’t accepted because, in the end, we would like to create a GUI for this project using GTK. PyGTK is very attractive and weighted in favour of choosing Python.

After this, I started reading the Dragon Book. I learned that there are several stages in the compiling process. A good description of them is here, though I wouldn’t recommend using those articles for their implementation.. Then, I looked on the internet to see where could I find a starting point.

I’ve come to this post which pointed me to use PLY in the project. More posts about PLY exists in various parts of the internet: either for C or Pascal. Someone even used it to parse chemical formulas.

Reading those articles, I realized that PLY is a must to the project and stopped looking for lex and bison alternatives for Python.

Now, the trickier part was coming. I needed a reference to the Verilog grammar, preferable in a BNF format. I had a book from Numerical Computers I which described the grammar but in a non standardized way, thus useless. I came across this link and started implementing after it. When in doubt, I used this Verilog reference. Today, I came across another description of the grammar, more recent. The second one is used whenever the first one is ambiguous (and it is in some places).

Now, I could start the project. Right now, it is divided into several parts, the first two being a lexer and a parser. The lexer is used to read the source file and convert each structure to tokens, depending on their meaning. The parser will read this token list and will generate a tree-representation of the input.

But I will describe them on another post.

About these ads