Quick Course Overview

This course focuses on the theoretical and practical aspects of programming language implementations. Emphases will be placed on the projects to construct an interpreter for a subset of the Postscript page description language, and a compiler for the Trend high level cellular automata programming language. The compiler projects will be targeting the Java Virtual Machine. Important topics that will be covered in this class include lexical, syntactic and semantic analyses; syntax-directed translation; implementation of interpreters; LALR parsing and automatic parser generators; semantic actions, code generation and optimization; runtime support libraries and debugging tools. Similar projects with different difficulty levels will be assigned to 440 and 540 students.

Course outcomes

Upon completion of this course, students will possess the following general skills and knowledge:

  • Thorough understanding of the overall architecture of a modern compiler.
  • Being familiar with both top-down and bottom-up parsing paradigms.
  • Fluent with syntax-directed translation scheme and different compiler-compilers.
  • Knowledgeable with assembly language and code-block based code generation scheme.
  • Knowing the inner details of compilers, libraries, operating systems/platforms, and how they interact with each other to form modern computing environments.
  • The experience and confidence of having developed a major software system with thousands of lines of code based on four years of CS training.

Class place and time

Tu Th 11:00-12:20 Science II 119

Instructor

Hui-Hsien Chou
Email:
Office: 507 Science II, 294-9242
Office hours: Wed 11-12

Teaching assistant

Yaping Jing
Email:
Office: 115 Atanasoff, 294-9757
Office hours: Tue 4-5, Fri 1-2

Class resources

Class information web site: http://www.cs.iastate.edu/~cs440

You should periodically check the class web site for class schedule updates, lecture notes, and project assignments. Most documents are readable directly within a web browser except project assignments, which will be released in Adobe Potable Document Format (PDF).

Class discussion web site: https://webct.its.iastate.edu

Questions related to this class should be posted to WebCT discussion boards. The Instructor and TA will address your questions there. Everybody should look at WebCT discussion frequently to see if some of your questions have already been answered. Please ONLY send emails to the Instructor or TA for personal issues that cannot be discussed on WebCT.

Class sample code and library files at class account ~cs440 which can be accessed through your own CS account.

Textbooks

These two books are required to complete the projects and prepare for the two exams:

Compilers: Principles, Techniques, and Tools (2nd Edition), by Aho, Lam, Sethi and Ullman. Addison-Wesley Publishing Company, 2006. ISBN 0-321-48681-1. Please check the book errata to be used with the book.

Programming for the Java Virtual Machine, by Joshua Engel. Addison-Wesley Publishing Company, 1999. ISBN 0-201-30972-6. Please download the book errata to be used with the book.

References

These other books are optional but can be helpful when you wish to learn more about specific topics:

Lex & Yacc, by Levine, Mason and Brown. O'Reilly & Associates, 1992. ISBN 1-565-92000-7

The Java Virtual Machine Specification, 2nd Edition, by Lindholm and Yellin. Addison-Wesley Publishing Company, 1999. ISBN 0-201-43294-3. Also available from Sun's website.

The Java Language Specification, 3rd Edition (The Java Series), by Gosling, Joy, Steele, and Bracha, 2005. ISBN: 0-321-24678-0. Also available from Sun's website.

The Jikes Java compiler, in case you plan to write your compiler projects in Java, http://jikes.sourceforge.net/. Sometimes, it's both fun and useful to use completely different Java compilers and compare their bytecode output as a learning method.

Gnu Document Online, available from http://www.gnu.org/manual/manual.html. This Internet web site provides documents for many software tools you are most likely going to use for your projects, which include Make, Flex, Bison, GCC, GDB, Gnu C Library, Emacs, among others.

Exams

Midterm: Tuesday, March 10, 11:00-12:30.
Final: May 5,
9:45-11:45.

Disability statement

If you have a documented disability and anticipate needing special accommodations in this course, please contact the instructor. You may also need a Disability Resources Staff send a SAAR form to us verifying your disability and specifying the accommodation you will need.


Detail Course Description

Compilers and interpreters are among the most widely used tools in software development, i.e., they are the foundation tools for building other software ranging from operating systems to application programs. It is essential for a computer scientist to understand the process by which programs written in high-level languages are translated into the machine code and executed. Even though full-fledged compiler writing is no longer the duty for many programmers, compiler techniques are generally applicable in many modern software systems. Understanding the compiling process enables us to write powerful software that conforms to the compiler technology and utilizes the target hardware more efficiently. The main objective of this course is to provide a hands-on understanding of the compilation process through the project implementations of small but fully functional interpreters and compilers.

In this course, we will cover the principles and practice of developing compilers and interpreters. We will begin with an overview of the language translation process including steps such as keyword recognition, syntax analysis and semantic action. At the end of this initial introduction, students are going to implement an interpreter for a subset of the famous Postscript page description language, to visualize pages described in that language.

After the introduction, the various modules of a contemporary compiler will be formally taught in detail , including the following:

  • Lexical analysis (to recognize keywords in the source program),
  • Symbol table (to keep track of information gathered from the source program),
  • Syntax analysis (to understand the source program description),
  • Semantic analysis (to tell if the source program is meaningful),
  • Code generation (to turn the source program into machine code),
  • Code optimization (to generate more efficient machine code), and
  • Runtime library (to support the execution of the compiled code).

A simple but nontrivial compiler will be built incrementally based on what we will learn in the class. Our target machine for this compiler will be the popular Java Virtual Machine (JVM). It has the benefits of being standard, portable, flexible and widely available on most platforms. Our source language of choice is a high level cellular automata programming language Trend. We will introduce this language in the class, so it is ok if you have never heard about it before. Note that the Java Virtual Machine will just be used as a target machine; we are not going to write a compiler for the Java language nor must you use Java to develop your project code. Knowing the Java programming language will help you understand the JVM architecture more easily, but it is not essential. Some issues regarding the design and implementation of runtime support libraries for the Trend language under the JVM will also be addressed in the class.

You may choose C, C#, C++, or Java to implement most of your projects except project 4 which is better implemented in Java directly. We will not accept projects written in the other programming languages. Note that for a project to be accepted for grading you need to make sure that it can run on CS Department lab machines, i.e., one of the lab computers in Pearson Hall. Project 1, the Postscript interpreter, will be a standalone project by itself. All the rest of the projects incrementally build the Trend compiler. Project 2 will create the first module, a lexical analyzer and a symbol table manager. Following that, we will develop modules for each phase of the compilation process as outlined above. Advanced topics such as code optimization, dynamic memory management or error recovery will be discussed in the class, but will not need to be implemented in your projects.

Course benefits

This course will expose you to several useful problem-solving techniques. The field of programming language translation has been at the core of computer science since the advent of modern computers. Having implemented a complete compiler will set you apart from the other software engineers. A modern compiler is a large piece of software with deep theoretical underpinnings coupled with state-of-the-art software engineering practices. When you are building a compiler all by yourself, you will gain the insight of how your previous training in fundamental computer science techniques, such as algorithms and data structures, can really help you put together a complex but elegant piece of software.

As computer scientists, many of you will go on to design and develop large, complex software systems in the future. This course can also serve as a very useful training on how to tackle a large programming project by divide-and-conquer techniques. After this class, you will also have hands-on experience with the Java Virtual Machine. As many of you might know, modern Internet techniques are built on many virtual machines, e.g., Javascript VM, .NET Framework, Flash VM, and Java VM. Knowing how the compiled Java bytecode works inside a Java Virtual Machine will help you understand the other virtual machines, and definitely make you more marketable in the future.

Grading

There will be a midterm and a final exam; each will make up 25% of your semester score. The remaining 50% of your score will be divided among the six programming projects. The weight of each project is determined based on the difficulty level of the project, as given in the Project Page. Homeworks may be assigned in the class but will not be graded or counted toward your final score. This does not mean that homeworks are optional; they will be a significant source of midterm and final exam questions! Therefore, you want to complete the homeworks, even if you don't need to turn them in.

Since your performance on the programming projects will critically determine your final score, you are advised to start working on projects at the earliest possible time after they are assigned. The date each project will be due is clearly marked and will be automatically controlled by our project submission system. Late projects will be penalized at the rate of 5% for the first day and 10% per day thereafter. Because we will be building a compiler incrementally, later projects will depend on earlier ones to function properly. If you omit any project you cannot possibly finish the whole compiler. Therefore, although it may sound pointless to submit a project more than 11 days late, you still need to complete it unless you wish to give up all subsequent projects.

We will take a normalization of everybody's scores at the end of the semester, and adjust the final scores accordingly. Your final grade will then be determined by the following table:

  • A: 90% and above
  • A-: 87% and above
  • B+: 84% and above
  • B: 80% and above
  • B-: 77% and above
  • C+: 74% and above
  • C: 70% and above
  • C-: 67% and above
  • D+: 64% and above
  • D: 60% and above
  • D-: 57% and above
  • F: all other scores, or cheating in anyway

Academic integrity

Any academic dishonesty, including but not limited to, exchange of program code, plagiarism of code, fabrication of results, cheating during exams, sabotaging others' efforts, etc., will all result in a failing F grade. Based on University rules, all cases will be referred to the university academic integrity committees for additional disciplinary actions.

General discussions of the projects at a conceptual level, including for example the use of any particular software tools, help in debugging, clarification of project description, etc., are allowed and encouraged. The preferred forum for this kind of communication is through the public class discussion boards on WebCT.

Additional tips

Do not procrastinate working on projects! Start working on projects as soon as they become available. Do not wait any moment or it may be too late to finish our complex projects. You will also feel that projects are much harder when you are working under the deadline pressure. This cannot be overstated enough, do not postpone working on your projects!

Be in touch with the information. Check the class web site and discuss boards regularly to see if there are updates to the project description or hints about the projects. Also check if others have already asked the same questions you have before you post yours to the discussion boards .

Design before you code. Writing a well thought-out piece of code is easier than starting with scratchy code and "patching while you go". You should have an overall picture of what you are going to do before you start each project. Otherwise, you may end up spending many hours debugging your code or having to rewrite most of your code in a different way.

Follow good programming practices. Trust what you have learned in the class, structure your programs, make good use of subroutines and data structures, learn how to use symbolic debugging tools, be familiar with your chosen programming environment, and never feel embarrassed to ask questions if you are in doubt.

Test thoroughly before you submit your projects. Note that it is how the TA sees your projects that will determine your score. There is no point arguing with the TA if your projects fail his tests. Remember, all projects have to work correctly on CS department machines for grading purposes. So test and test again to make sure your programs function as specified in the project description before you submit them. We strongly suggest that you write your own test cases based on the project descriptions. You can submit your projects as many times as you wish before the due day, and continue to submit thereafter with a penalty.