Phases of a Compiler

  • Compiler
    • A compiler is a program that takes as input a program written in a high-level language (the source language) and produces as output an equivalent program in a low-level language (the object or target language).
  • Phase categories
    • The phases of a compiler can be grouped into two categories:
      1. Analysis phase
      2. Synthesis phase
  • Analysis Phase
    • The analysis phase reads the source program and performs lexical, syntactic, and semantic analysis on it.
    • It consists of the following sub-phases:
      • Lexical analysis
      • Syntax analysis
      • Semantic analysis
  • Lexical Analysis 
    • Breaks the source program, a stream of characters, into small pieces called tokens.
    • Each token is a single atomic unit of the language, such as a keyword, an identifier, or an operator or punctuation symbol.
    • This process is known as lexing or scanning.
    • The component that performs lexical analysis is called a lexical analyzer (lexer) or scanner.
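    • A minimal scanner sketch in Python, assuming a tiny language whose tokens are numbers, identifiers, a few keywords, operators, and punctuation. The token categories and the keyword set are illustrative assumptions, not taken from any particular language. It uses one combined regular expression with named groups and reclassifies identifiers that happen to be keywords:

```python
import re
from typing import List, Tuple

# Illustrative keyword set and token categories (assumptions for this sketch).
KEYWORDS = {"if", "else", "while", "int"}
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=<>]"),
    ("PUNCT", r"[();{}]"),
    ("SKIP", r"\s+"),      # whitespace separates tokens but is discarded
    ("MISMATCH", r"."),    # any other single character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Split source text into (kind, lexeme) pairs."""
    tokens: List[Tuple[str, str]] = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {match.start()}")
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("int x = 42;"))
# [('KEYWORD', 'int'), ('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('PUNCT', ';')]
```

    • Order matters in the combined pattern: `IDENT` is tried before `MISMATCH`, and keywords are recognised after matching as identifiers, so a name like `integer` is not split into `int` + `eger`.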
  • Syntax Analysis 
    • Parses the token sequence to identify the syntactic structure of the program.
    • Replaces the linear sequence of tokens with a tree structure, called a parse tree, built according to the language's grammar.
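    • A minimal recursive-descent parser sketch in Python for an assumed arithmetic-expression grammar (shown in the docstring). For brevity it builds a simplified tree of nested tuples, closer to an abstract syntax tree than a full parse tree, because it omits nodes for the intermediate grammar rules. The grammar and the tuple encoding are assumptions for illustration:

```python
import re
from typing import List, Optional, Tuple


def tokenize(source: str) -> List[str]:
    # Minimal scanner: numbers, identifiers, and the symbols + - * / ( ).
    # re.findall silently skips whitespace and any other characters.
    return re.findall(r"\d+|[A-Za-z_]\w*|[+\-*/()]", source)


class Parser:
    """Recursive-descent parser for the assumed grammar:

        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | IDENT | '(' expr ')'
    """

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self) -> Tuple:
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self) -> Tuple:
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self) -> Tuple:
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return ("num", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return ("id", tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(source: str) -> Tuple:
    parser = Parser(tokenize(source))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("a + 2 * (b - 1)"))
# ('+', ('id', 'a'), ('*', ('num', 2), ('-', ('id', 'b'), ('num', 1))))
```

    • Operator precedence falls out of the grammar itself: `*` and `/` are handled one level deeper than `+` and `-`, so they bind more tightly.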
  • Semantic Analysis 
    • Adds semantic information to the parse tree and builds the symbol table.
    • Performs semantic checks such as:
      • Type checking, to detect type errors
      • Object binding, to associate each variable or function reference with its definition
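    • A minimal sketch in Python of both checks on a tree of nested tuples. The node encoding, the two types `int` and `bool`, and the typing rules are assumptions for illustration. The symbol table is modelled as a plain dictionary from name to declared type; looking a name up in it is the binding step, and comparing operand types is the type check:

```python
from typing import Dict, Tuple

# Symbol table for this sketch: declared name -> declared type.
SymbolTable = Dict[str, str]


def check(node: Tuple, symbols: SymbolTable) -> str:
    """Return the type of `node` ("int" or "bool"), raising on semantic errors."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "bool":
        return "bool"
    if kind == "id":
        # Object binding: resolve the reference to its declaration.
        name = node[1]
        if name not in symbols:
            raise NameError(f"undeclared identifier {name!r}")
        return symbols[name]
    if kind in ("+", "-", "*", "/", "<"):
        left = check(node[1], symbols)
        right = check(node[2], symbols)
        # Type checking: every operator here requires int operands.
        if left != "int" or right != "int":
            raise TypeError(f"operator {kind!r} expects int operands, got {left} and {right}")
        # Arithmetic yields int; comparison yields bool.
        return "bool" if kind == "<" else "int"
    raise ValueError(f"unknown node kind {kind!r}")


symbols = {"x": "int", "flag": "bool"}
print(check(("<", ("id", "x"), ("num", 1)), symbols))  # bool
```

    • With the same table, `check(("+", ("id", "flag"), ("num", 1)), symbols)` raises a `TypeError`, and `check(("id", "y"), symbols)` raises a `NameError` because `y` was never declared.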
  • Synthesis Phase
    • Constructs the desired target program from the output of the analysis phase.
    • It consists of the following sub-phases:
      • Intermediate code generation
      • Code optimization
      • Code generation
  • Intermediate code generation 
    • Generates machine-independent intermediate code that is close to machine language.
    • This intermediate code is also known as the intermediate representation (IR).
    • The purpose of this step is to let compiler writers support different target machines and different source languages with minimal effort: each front end only needs to produce the IR, and each back end only needs to translate it.
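    • A minimal sketch in Python that lowers an expression tree into three-address code, a common form of IR in which each instruction has at most one operator and stores its result in a compiler-generated temporary (`t1`, `t2`, ...). The nested-tuple tree encoding and the textual instruction format are assumptions for illustration:

```python
import itertools
from typing import List, Tuple


def gen_three_address(tree: Tuple) -> Tuple[List[str], str]:
    """Return (instructions, name holding the final result)."""
    code: List[str] = []
    temps = itertools.count(1)  # numbers for fresh temporaries t1, t2, ...

    def emit(node: Tuple) -> str:
        kind = node[0]
        if kind == "num":
            return str(node[1])  # constants are used directly as operands
        if kind == "id":
            return node[1]       # variables are used directly as operands
        # Binary operator: evaluate both operands first, then combine them.
        left = emit(node[1])
        right = emit(node[2])
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left} {kind} {right}")
        return temp

    result = emit(tree)
    return code, result


tree = ("+", ("id", "a"), ("*", ("num", 2), ("-", ("id", "b"), ("num", 1))))
instructions, result = gen_three_address(tree)
for line in instructions:
    print(line)
# t1 = b - 1
# t2 = 2 * t1
# t3 = a + t2
```

    • Nothing in this output depends on the target machine, which is what allows one back end to consume IR from any front end that emits the same form.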
  • Code optimization 
    • An optional phase that transforms the intermediate code into a functionally equivalent but more efficient form, in terms of execution time, memory space, or both.
    • Popular optimization techniques include:
      • Dead code elimination
      • Loop transformation
      • Inline expansion
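    • Dead code elimination can be sketched in Python for a single basic block, assuming straight-line code with no branches and no side effects so that liveness needs only one backward pass (real compilers compute liveness over the whole control-flow graph). An instruction is kept only if the variable it assigns is still needed later; the tuple encoding `(dest, op, arg1, arg2)` and the `live_out` parameter are illustrative assumptions:

```python
from typing import List, Set, Tuple

# One three-address instruction "dest = arg1 op arg2" (encoding assumed for this sketch).
Instr = Tuple[str, str, str, str]


def eliminate_dead_code(block: List[Instr], live_out: Set[str]) -> List[Instr]:
    """Drop instructions whose result is never used, within one basic block.

    `live_out` holds the variables still needed after the block ends.
    """
    live = set(live_out)
    kept: List[Instr] = []
    # Walk backwards so that each instruction sees which variables later code uses.
    for dest, op, arg1, arg2 in reversed(block):
        if dest not in live:
            continue  # result is never read again: dead code
        kept.append((dest, op, arg1, arg2))
        live.discard(dest)  # this definition satisfies the later uses
        # Its operands become live; numeric constants are not variables.
        live.update(arg for arg in (arg1, arg2) if arg.isidentifier())
    kept.reverse()
    return kept


block = [
    ("t1", "+", "a", "b"),
    ("t2", "*", "a", "a"),  # t2 is never used below
    ("t3", "-", "t1", "1"),
]
print(eliminate_dead_code(block, live_out={"t3"}))
# [('t1', '+', 'a', 'b'), ('t3', '-', 't1', '1')]
```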
  • Code generation 
    • Converts the intermediate representation into the output language, such as assembly language (which an assembler then turns into machine code) or machine code that the target machine can execute directly.
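    • A naive code generator sketch in Python that maps each three-address instruction onto a hypothetical two-register machine. The mnemonics `LOAD`, `STORE`, `ADD`, `SUB`, `MUL`, `DIV` and the registers `R1`, `R2` are invented for illustration and do not correspond to any real instruction set; a real back end would also perform instruction selection and register allocation:

```python
from typing import List, Tuple

# Three-address instruction (dest, op, arg1, arg2), as assumed in this sketch.
Instr = Tuple[str, str, str, str]

# Mnemonics of a hypothetical target machine; not a real instruction set.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(code: List[Instr]) -> List[str]:
    """Translate each instruction independently, with no register allocation."""
    asm: List[str] = []
    for dest, op, arg1, arg2 in code:
        asm.append(f"LOAD R1, {arg1}")        # first operand into register R1
        asm.append(f"LOAD R2, {arg2}")        # second operand into register R2
        asm.append(f"{OPCODES[op]} R1, R2")   # R1 = R1 op R2
        asm.append(f"STORE {dest}, R1")       # write the result back to memory
    return asm


for line in generate([("t1", "-", "b", "1"), ("t2", "*", "2", "t1")]):
    print(line)
```

    • Because every instruction reloads its operands from memory, `t1` is stored and then immediately loaded again; removing such redundant memory traffic is the job of register allocation and later optimizations.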
  • Symbol table manager 
    • When the lexical analyzer detects an identifier in the source program, it enters the identifier into the symbol table.
    • The information collected about each named object in the program (for example, its type and scope) is used by all the phases as and when required.
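    • A minimal scoped symbol table sketch in Python, modelled as a stack of dictionaries so that an inner declaration can shadow an outer one. The class and method names and the attribute keys (such as `type`) are hypothetical:

```python
from typing import Any, Dict, List, Optional


class SymbolTable:
    """Scoped symbol table: a stack of dictionaries, innermost scope last."""

    def __init__(self) -> None:
        self.scopes: List[Dict[str, Dict[str, Any]]] = [{}]  # global scope

    def enter_scope(self) -> None:
        self.scopes.append({})

    def exit_scope(self) -> None:
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def insert(self, name: str, **attributes: Any) -> None:
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} already declared in this scope")
        scope[name] = dict(attributes)

    def lookup(self, name: str) -> Optional[Dict[str, Any]]:
        # Search from the innermost scope outwards; None means undeclared.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.insert("x", type="int")          # global x
table.enter_scope()
table.insert("x", type="float")        # inner x shadows the global one
print(table.lookup("x"))               # {'type': 'float'}
table.exit_scope()
print(table.lookup("x"))               # {'type': 'int'}
```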
  • Error handler 
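    • Each phase can encounter errors and passes them to the error handler, which reports them and, where possible, allows compilation to continue.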
    • The syntax and semantic analysis phases usually handle a large fraction of the errors detectable by the compiler.
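    • A minimal sketch in Python of an error handler that collects diagnostics instead of stopping at the first error, shown with a scanner that reports an unexpected character, skips it, and keeps going (a simple recovery strategy). The `ErrorHandler` class, the token pattern, and the message format are assumptions for illustration:

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ErrorHandler:
    """Collects diagnostics from every phase instead of stopping at the first one."""
    messages: List[str] = field(default_factory=list)

    def report(self, phase: str, line: int, message: str) -> None:
        self.messages.append(f"line {line}: {phase} error: {message}")

    def has_errors(self) -> bool:
        return bool(self.messages)


# Hypothetical token pattern for a tiny language; anything else is an error.
TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|[+\-*/=();]")


def scan(lines: List[str], errors: ErrorHandler) -> List[str]:
    tokens: List[str] = []
    for lineno, text in enumerate(lines, start=1):
        pos = 0
        while pos < len(text):
            if text[pos].isspace():
                pos += 1
                continue
            match = TOKEN.match(text, pos)
            if match:
                tokens.append(match.group())
                pos = match.end()
            else:
                # Recovery: report the bad character, skip it, keep scanning.
                errors.report("lexical", lineno, f"unexpected character {text[pos]!r}")
                pos += 1
    return tokens


errors = ErrorHandler()
tokens = scan(["x = 1 $ 2;", "y = x @;"], errors)
for message in errors.messages:
    print(message)
# line 1: lexical error: unexpected character '$'
# line 2: lexical error: unexpected character '@'
```

    • Both errors are reported in a single run, and the surviving tokens can still be handed to the parser so that later phases can detect further errors.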