CB Login

ANTLR - language parser
Written by Edvin Eshagh   

The Definitive ANTLR Reference (ISBN-10: 0-9787392-5-6)
Another Tool for Language Recognition

Another Tool for Language Recognition

p3f
Domain Specific Language (DSL) - very high level language tailored to specific tasks.  Usages include data formats, configuration file formats, network protocols, text-processing languages, protein patterns, gene sequences, space probe control language, and domain-specific programming languages.  For example: NASA uses domain-specific command languages for space missions to improve reliability, reduce risk, reduce cost, and increase the speed of development.

p4
Recognition break into two similar but distinct tasks/phases.
1) Lexical analysis - operates on the incoming character stream, and breaks them up into tokens.
2) Parsing - operates on a stream of vocabulary symbols, called tokens, to recognize sentence structure.

Performing translation often means just embedding actions (code) within a grammar.

Some translation require multiple passes.  Rather than re parse the input characters for each phase, it is more convenient to construct an intermediate form (usually a tree data structure) to pass between phases. 
Abstract Syntax Tree (AST) - the intermediate phase; highly processed, condensed version of the input.
Emiter - Final phase that ultimately emits output using all data structures and computations from previous phases.

p5
Maxe Analogy.  Imagine a maze with a single entrance and single exit that has words written on the floor.  Every path from entrance to exit generates a sentence by "saying" the words in sequence.  In a sense, the maze is analogous to a grammar that defines a language.  In some case you may come to a fork, where the letter at each fork is the same.  Thus, you may have to look a head a few characters before you can make decision on which path to take.

p10
Download ANTLR from http://www.antlr.org/ and update your CLASSPATH environment variable to include antlr-*.jar and stringtemplate*.jar

---

Translators map input sentences to output sentences, and recognize many different sentences.


Lexical analysis consist of reading the input stream, character by character. 
Characters are combined and output as "tokens". 
if ( x > 312) { system.out.println( "hi" ); }
Tokens:  if, (, x, WS, >, WS, 312, ), {,

Tokens can carry additional information.  For example, token 312 will include type int.

Parsing - reading tokens and trying to organize them into a valid sentence in the language.
Parser can generate output based on the sentences it recognizes or preserve the structure in the form of an Abstract Syntax Tree (AST) which can be further processed.

An emitter - take the output of the parser and emit output based on all computations of the previous phases.
Emitter can use templates (document with hoes) that can be filled in.

ANLTRFileStream, ANTLRStream used for lexer (which the whole input stream must fit in memory).

Tokens point directly to character strings in the buffer rather than creating String for every stream

What are the two major parts of a parser?  What is the purpose of each part?
The lexical analyzer reads characters and organizes them into lexeme, producing tokens.  The syntax analyzer reads tokens and creates a parse tree that organizes the tokens and uncovers meaning.


Grammer1 lecture
Alphabet - finit set of symbols.   For example:  A= {a, b, c, d}    B={0,1,2}
String or word = finit set of symbols from alphabet.  i.e.  cab
Language - a set of strings from a language.  i.e.  L={cab, bad, dab, abc}

Language generated by a grammar is a set of all string of terminal symbols that are derived from S in zero or more steps.

Grammar G = {V, Σ, R, S} is a quadruple, where
     V = finite set of non-terminal variables
     Σ = finite set of terminal variables
     R = finite set of Rules.  The left side of each rule is a string of one or
            more elements from V U ∑ and  whose right side is a string of 0
            or more elements from V U ∑
     S = An element of V start symbol

 Consider the following "right hand rule" grammar:
G = {v, Σ , R, S }
V = {S, A}
R = S → aA
      A → b

 aA
aaA
aaab

a is a terminal variable
A is a non-terminal variable 
Grammar ends with terminal

 Grammar maybe either left hand rule or right hand rule (note: lower case letters denote terminals)

Left Hand Rule (RHR) 
or Left Linear
 Right Hand Rule (LHR)
or Right Linear
regular grammar
 A → xB
 A → x

 A → Bx
 A → x

 Either RHR or LHR

Give an example of a left-recursive BNF rule?
         <statements>    <statements> ; <statement>    |  <statement>

Describe the language generated by the following grammar:  <P> →  <P> x   | y
        Each string in the language consists the symbol y followed by zero or more x’s.

Describe the language generated by the following grammar:
     <P> → x <P> y   |  <A>
     <A> → a  | a <A>

Each string in the language consists of n x’s , n >= 0, followed by one or more a’s, followed by n y’s.

Kleen clouser is denoted by Σ*  
where it is a set of all concatenations of zero or more strings in Σ
{01}* = 010101
{0+1}* = set of all possible strings of 0's and 1's (plus sign denotes union)

Grammer2 lecture
Deterministic Finite Automata (DFA) has states, and a function for transition form one state to another.

DFA is a finite state machine where for each pair of state symbol there is one and only one transition to a next state.

DFA recognizes the set of regular languages; thus, it is equivalent to regular grammar.

Regular grammar, is either left or right linear

Then DFA is a 5-tuple {Q, Σ, q0, σ, A}f
Q= finite set
Σ = finite set of symbols
σ = function from Q x Σ to Q
q0 = a state in Q
A = subset of Q - accepting states

We call each element of Q a state,
σ the transition function, q0 the initial state, and A the set of accepting States.

Non-Deterministic finite state machine (NDFA) is a fine state machine where for each
pair of state and input symbol there are possibly two or more transitions to a next state.
Every NDFA can be converted to a DFA.

Requirements for generating complex languages:

  • State machines generate sentences
  • All the sentences generated are valid
  • There are dependencies between  words of a sentence
  • There are order requirements (a[i+3])

DFA can only generate the class of regular languages.

Programming languages belong to Context-Freee grammar.

Define the following terms associated with formal languages.  Construct an concrete example of each term:

  1. Symbol – a single character or glyph
  2. Alphabet – a finite collection of symbols
  3. String or word – a finite sequence of symbols from an alphabet
  4. Language – a set of strings from an alphabet
  5. Grammar - A grammar is a quadruple G = (V, Σ, R, S) where

1)  V is a finite set of variables (non-terminals),

2)  Σ is a finite set of terminals, disjoint from V,

3)  R is a finite set of rules.  The left side of each rule is a string of one or more elements from V U Σ and whose right side is a string of 0 or more elements from V U Σ

4) S is an element of V and is called the start symbol 

f.  Derivation - A derivation is a sequence of replacements , beginning with the start symbol, and replacing a substring matching the left side of a rule with the string on the right side of a rule

g.  Right-linear grammar  - A grammar G = (V,Σ, R, S) is right-linear if all rules are of the form:

                  A ==> xB
                  A ==> x

  where A, B ε V and x ε Σ*

h.  Regular grammar - A regular grammar is one that is either right or left linear.

i.  Deterministic Finite Automaton - Let Q be a finite set and let Σ be a finite set of symbols. Also let δ be a function from Q x Σ to Q,   let q0 be a state in Q and let A be a subset of Q. We call each element of Q a state, δ the transition function, q0 the initial state and A the set of accepting states.
Then a deterministic finite automaton (DFA) is a 5-tuple

      < Q , Σ , q0 , δ , A > .

j. Backus-Naur Form (BNF) - BNF is a metalanguage for computer languages.  It consists of a collection of productions or rules that indicate how a left-hand side non-terminal symbol can be replaced with right-hand side string of terminal and non-terminal symbols.  BNF acts as a language generation tool.  Beginning with a start symbol, a sequence of rule replacements, called a derivation, leads to a sequence of terminal symbols that represent a valid sentence in the grammar described by the BNF.

k.  fContext-free grammar -  A context free grammar is one that is described by BNF rules in which every left-hand side is a single non-terminal symbol.

Pushdown Automaton (informal):

  • Our generator needs more power – we need finite state machine equipped with a stack
  • The pushdown automaton (PDA) can use the top of the stack to decide which transition to take.
  • The pushdown automaton can manipulate the stack as part of performing a transition.

Sentences fall into a Tree Structure.  For example:  A book is not simply a sequence of words.  It has  chapters, sections, paragraphs, sentences, phrases, words.

Finite state machine (FSA) is like a stream of instructions within a single method, unable to make method calls.

Pushdown automaton can freely invoke other parts of the machine to recognize more parts of the language; provides nesting and sequencing

Ambiguous sentence is one that can generate more than one sentence via two or more distinct paths.

Languages are usually described formally with syntax diagrams and informally with descriptive language that disambiguates the sentences.

Symbols have structure:

  • Sentences are sequence of characters but your brain groups them into units.
  • Your brain constructs structure from sequence of symbols
  • The language processors we build will separate

Sentence recognition occurs in reverse. 
Scanning through the words you must reconstruct the structure.
 
Reading and Recognizeing with ANLR

/** Grammar to read the file */
File : . + // consume to eof
/** Grammar to recognize file*/
File:  LETERS DIGITS
LETTERS: ‘a’..’z’*;
DIGITS: ‘0’..’9’*;

ANTLR can target other languages besides JAVA, when generates code.
ANTLR can’t generate code for ALL grammars
ANTLR generates an LL recognizer.  Technically denoted as LL*
LL means recognize the input from Left to Right, tracing out a leftmost derivation.
LL recognizers use one method per grammar rule; which maybe recursive.
LL recognizers restrict the form of acceptable grammars.
For example:  Rules cannot be left-recursive:  

<expr>  : expr ‘++’ ;
Void expr() {
   expr();  //infinite loop
   match( ‘++’);
}
Right recursion is not a problem.  
<expr>  : ‘++’ expr  ;
Void expr() {
   match( ‘++’);
   expr(); 
 }

In LL, lookahead is too weak to handle a give grammar.
Lookahead is the number of tokens ahead in the input stream that the parser can see.
This is similar to the  number of symbols on the maze floor which help you which direction you can go.

Given a language, there are infinitely many grammars that describe it.
Ofter we alter a given grammar so that ATLR has an easier time generating the recognizer.
AN LL(3) grammar might be converted to LL(1) by refactoring

decl : ‘int’ ID (‘=’ INT) ? ‘;’
     | ‘int’ ID ‘;’
int x = 32;  // We need 3 token Look Ahead (LL3) to recognize the rule
int y;

Increasing the lookahead depth increases the power of the recognizer; however some grammars are not LL(k) for any value of K.  For example:

decl: modifier* ‘int’ ID ‘=’ INT ‘;’;
      | modifier* ‘int’ ID ‘;’ ;
modifier:  ‘static’ | ‘register’;

ATLR can implement LL(*) to solve the pervious problem in the grammar.
If LL(*) wont work, you can tell ANTLR to try the rule alternatives in a specified order.
ANLTR uses Memoization, which trades space for speed.
Backtracking over several rule alternatives means repeatedly invoking certain subrules.
The result of sub-rules inovocation are Memonized.
Not every language can be described using ONLY grammar rules.

Context-sensitive grammar - In C++, T(i) is either a function call, or a constructor-style type cast. 
xTy → xby – There exist complicated left side.
We want to keep it simple with left Non-Terminal symbol:
   T →  xT | y

For context-sensitive rules, ANTLR uses “semantic predicates”

Syntactic predicates provide a way to prioritize rule alternatives.  That is, if the current statement looks like a declaration, then it is, even if the matches the second alternative.

With LL(*), semantic predicates, and syntactic predicates, ANTLR can build recognizers for most languages.

We can definie a language via two approaches

  1. Recognition - a compiler - we can build a machine which reads a string and declares if it is or is not a valid string in the language.  A compiler is an example of such a machine
  2. Generation - grammar - a device is created to generate strings that belong to the language (grammar).  The language is effectively every possible string produced by the machine.  BNF is an example of such a machine 

Noam Chomsky Hierarcy (1950's) describes 4 classes of grammars:

1) Type 0 - unrestricted grammar - almost any collection of words
2) Type 1 - Context sensitive grammars - rules whose left hand side has a context on either left or right.
           xAy → A   where x and y are terminals.
3) Type 2 - Context free grammars - good for finding syntax of language.  Each rule or production has a left side consisting of a non-terminal.
4) Type 3 - Regular grammars - Good for defining tokens (say variables)

Context-free grammar - each rule of production has a left side consisting of single non-terminal.

Context-free grammar and regular grammars have applications in computing. 

Backus-Naur form (BNF) describe context free grammars.

metalanguage is a language used to describe another language.
It consists of nonterminal and terminal symbols (lexemes and tokens), and rules of production.
Sample left-deriveration BNF grammar:

<program> → begin <stmt_list> end
<stmt_list> → <stmt>
                   | <stmt> ; <stmt_list>
<stmt> → <var = <expression>
<var> → A | B | C
<expression> → <var> + <var>
                     | <var> - <var>
                     | <var>

Sample program:

<program> → begin <stmt_list> end
→ begin <stm> end
→ begin <var> = <expression> end
→ begin A = <expression> end
→ begin A = <var> + <var> end
→ begin A = B + <var> end
→ begin A = B + C

And Corresponding parse tree:

                                                      program
                                                 /        |           \
                                       begin      stmt_list     end
                                                          |
                                                        stmt
                                                      /    |   \
                                              var       =     expression
                                                |                   /    |    \
                                               A                var  +   var
                                                                  |         |
                                                                  B        C

Grammar lecture3

  • Sentential form - Each of the strings in a derivation of BNF notation.
  • Derivation order has no effect on the language generated by the grammar.
  • Derivation yields to Parse Trees.
  • Every leafe of tree corresponds to a terminal string.
  • Leftmost derivation - The left most non-terminal is always the one selected for replacement.
  • Derivation can be leftmost, rightmost, or neither.

Parse trees describe the hierarchical structure of the sentences (meaning) of the language they define.

Ambiguous grammar exist when two or more parse trees yield to the same sentence.  For example:

<assign> → <id> <expr>
<id> → A | B | C
<expr> → <expr> + <expr>
           | <expr> * <expr>

One tree says, you have to multiply before you add; and the other tree says you have to add before you multiply for the same sentence. 

Parse tree are used to determine the semantics of a sentences.
Ambiguous grammars lead to semantic ambiguity; thus, we can remove the ambiguity of a language. 

Why is ambiguity a problem for grammars that describe programming languages?  The parse tree provides meaning to the token sequence.  In a programming language we want each instruction to represent a distinct idea

Making the above grammar non-ambiguous. 
Here is left associative via left recursive rule of <expr>:

<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> +<term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → (<expr>)  | <id>

Left recursive - BNF rule that has its left hand side appearing at the begining of its right side (Left recursion specifies left association). 

Addition and multiplication associate from left to right (recursive). 
Exponentiation is an example of right association.  For example:

<factor> → <expr> ** <factor> | <expr>
<expr> → (<expr>) | <id>

 Ambiguous if statement (because we don't know which which if the else associates with):

<stmt> → <if_stmt>
<if_stmt> → if <logic_expr> then<stmt>
                |  if <logic_expr> then <stmt> else <stmt>

Here is the sentential form.  Which if does else associate with...
if <logic_expr> then if <logic_expr> then <stmt> else <stmt>

Test with substituting the simple expression first, and the the more complicated one; and vise versa.
Here is a valid if BNF with only one parse tree:

<stmt> <matched> | <unmatched>
<matched> → if <logic_expr> then <matched> else <matched>
                   | any non-if-statement>
<unmatched> → if <logic_expr> then <stmt>
                     | if <logic_expr> then <matched> else <unmatched>

Extended BNF (EBNF) allows:

  • [...] denotes optional part.
    <selection> → if ( <expr> ) <stmt> [else <stmt>]
  • {...} indicate the enclosed part can be repeated indefinitely or left out  (zero or more).
    <ident_list> → <identifier> {, <identifier>
  • (...|...) Multiple choice is put in parentheses and separate by or operator |
    <for_stmt> → for <var> := <expr> (to | downto) <expr> do <stmt>
BNFEBNF

<expr> → <expr> + <term>
           | <expr> - <term>
           | <term>

<term> → <term> * <factor>
            | <term> / </factor>
            | <factor>

<expr> → <term> {(+ | -) <term> }
<term> → <factor {(* | / ) <factor> }

 

Java talk lecture:
Java goals: 
Portability - over network
Reliability - avoid crashes
Safety - environment protected from programming errors/malicious code
Dynamic Linking - distributed in part, loaded by JRE multi-threaded Execution
Simplicity and Familiarity to C++ and Web programmers
Efficiency - secondary goal.

Java design decision:

  • Interpreted - Java bytecode executed on Java Virtual Machine (Portability, Safety).
  • Type safety - 3 levels:  1) compile time checking, 2) type checking of byte code before execution, 3)Run-time checking (array bounds checking).
  • Objects and References - Not everything is an object (Comprise between simplicity and efficiency).
  • Garbage Collection:  Necessary for type safety; Simplify programming (avoid memory leaks); Uses concurrency (runs in background as a thread).
  • Dynamic Linking - Classes can be loaded incrementally as needed.   Shorter wait times for transmitted programs.
  • Concurrency Support - Model based on Threads - Standard concurrency primitives built into language.  Doesn't rely on OS specific concurrency mechanism.

C++ features not included in Java:

  • Structures & Unions (Classes take their places)
  • Functions (Java uses Static methods)
  • Multiple Inheritance (in Java everything is inherited from Object)
  • GoTo (Spaghetti)
  • Operator overloading (small bang for the buck)
  • Automatic coercions - complex and unsafe
  • Pointers - Reference variables are conceptually easier and less prone to programming errors.

All Java objects are explicit heap-dynamic variables (nameless).
Dog d = new Dog ("Fido"); // inside method
Fido object is explicit heap-dynamic
Variable d is stack-dynamic variable.
No destructors - objects are garbage collected when no references are made to them (when GC wakes up).

Static vars are initialized once with initialization expression or static initialization block inside class:
    public class Dog {
        public static int x = 3;
        static {/* code execute once when class is loaded */}
    }

finalize() - method called by Garbage Collection just before space is reclaimed.  Called by virtual machine when the virtual machine exits.

Four visibility distinctions for methods and fields:
1) public: accessible anywhere the class is visible
2) protected:  accessible to methods of the class and any subclasses, as well as other classes in the same package
3) private - accessible only in the class itself
4) package - accessible only to code in the same package.  Members declared without an access modifier have package visibility.

Packages used to organize classes into logical and physical units; a set of classes in a shared name space.
Package names match directory structures.  Package combined with CLASSPATH names define paths to classes.

Methods or classes can be declared final.  Final methods can't be overridden.  Final classes can't be sub-classed.

Object methods:

getClass()
toString()
Equals() - compares to objects
hasCode()
Clone()
wait(), notify(), notifyAll()
Finalize()

Abstract class does not implement all of its methods.  It can't be used to instantiate objects.

abstract class Shape {
   abstract int getSize();
   abstract void doNothing();
}

Java interface is a "pure abstract" class.
All interace members must be constants or abstract methods.
No direct implementation in interface is allowed.
Classes "implement" the interface by agreeing to code every method in the interface.
   public interface Speakable {
       public void speak();
  }

Classes can implement several interfaces.  This takes the place of multiple inheritance used. 
Interface can be used as the type argument for methods.

Java primitives:  boolean, byte, short, int, long, char, float, double
Java Reference types:  Class, Interface, Array

Java has no explicit pointer, rather objects are implicit pointers.

Java Lecture3

Class B extends class A; thus, B objects are subtypes of type of A objects.
Class can implement one or more interfaces.
(Multiple) interface subtyping allows objects to support (multiple) common behaviors without sharing common implementation.

Array Covariance
If B is a subclass of A, then array B[] is subtype of array A[]
Class A{...}
Class B implement A{...}
B[] bArray = new B[6];
A[] aArray = bArray; // OK since A[] <: B[}
aArray[0] = new A();  // allowed but causes run-time error - ArrayStoreException

 Four categories of Exceptions:
1) Code or data errors - bad array index
2) Standard method exception - substring() can generate StringIndexOutOfBoundsException
3) Programmer generated exceptions - build your own
4) JVM exceptions

Java forces you to handle "checked" exceptions.
Certain exceptions don't need to be handled (unchecked exceptions).
Support exceptions via: 1) throw a statement or expression for raising (throwing an exception.  2) try-catch to handle some code to respond to exception.

Checked exception that might occur must be named in a throws clause.
public void foo() throws IOException.

Error and RuntimeExceptions are usually thrown by the operating system and are exempt from being listed in the throws clause.

Java Lecture4

Subtype polymorphism provides a means for writing generic programs.
Every subtype of A "is-an" A object

Generics (parametrized types) is similar to C++ template.

Java Virtual Machine
Java compiler produces "bytecode" in a .class file.
The class file contains bytecode and symbol table
Class loader reads the class file and arranges the bytecode into memory
Class verifier checks that the bytecode is type correct
Linker resolves interfile references
Bytecode interpreter "executes" the bytecode

Java Loader:
1) Loads classes incrementally into memory when needed.
2) Customized ClassLoader objects can be defined

Verifier makes sure:
1) Every instruction has a valid op-code
2) Every branch instruction branches to the start of an instruction
3) Every method has a structurally correct signature
4) Every instruction obeys the Java type discipline
The verifier examines a Java program before it is executed and makes sure that every instruction has a valid op-code,  every branch instruction branches to the start of an instruction,  every method has a structurally correct signature, and every instruction obeys the Java type discipline.  Java was designed to be a language in which programs would be transmitted over the internet and run on many different machines.  The verifier is one mechanism that Java uses to make sure this process is safe

Interpreter:
1) Execute java bytecode
2) Performs run-time tests like index checking on arrays
3) Runtime architectures includes program counter, instruction area, stack and heap
4) Stack contains activation records containing local variables, params, return values, and intermediate calculations for method invocations.

Interpreter:
1) JVM has no registers.  Intermediate values left on stack (not clear if underlying machine has registers)
2) Objects stored on the heap
3) All threads running of the same JVM share the same heap
4) New threads are given a program counter and their own stack.

Activation records have three parts:
1) Local variable area - for local method variables
2) Operand stack (within a stack) for intermediate calculations and passing params to other methods.  Instructions are shorter since they implicitly reference the stack
3) Data area - constant pool resolution, normal method return, exception dispatch

Interpreter: Constant pool
1) Bytecode contains a data structure called Constant pool
2) Symbolic names - fields, classes, methods
3) Each entry is number for ease of reference
4) Bytcode instructions reference constant pool numbers

Interpreter: runtime tests
1) all casts are checked
2) All array references are checked
3) Check references are not null before utilized
4) Garbage collection and absence of pointer arithmetic contributes to type-safe execution

Interpreter: bottleneck
1) Bytecode references to a field or method causes table lookups for addresses that can be a bottleneck for concurrent programs
     getfield #5 <Field Obj var>
2) Bytecode references are modified dynamically during execution with instructions that a have direct addresses.
     getfield .... quick 6
3) Reuse of the instruction is more efficient.

What is the Java heap and how do variables end up there?
The Heap is a Java virtual machine data structure where objects reside.  Objects are created on the heap whenever they are instantiated by coding “new”.  For example, if you code Person p = new Person(“David”), a Person object is created on the heap

Method invocation in JVM
1) Instance is required before method is called; using late binding
2) Static method invocation uses early binding and no instance is required.

There are four bytecodes for invoking methods in JVM
1) Invokevirtual - used when the superclass of an object is known at compile time
2) Invokeinterface - used when only the interface of the object is known at compile time
3) Invokestatic - used to invoke static methods
4) Invokespecial - special cases

 Invokvirtual causes a method to be selected from subclass method table based on the runtime type of the object.
This requires a lookup in the constant pool.
After the first lookup, the bytecode is modified to avoid table lookup by inserting an offset to the method in the bytecode (dynamically changing bytecode for efficiency).

Invokeinterface - similar to Invokevirtual except the methods being invoked on an object declared with an interface name may be in different classes and positions wihtin the class.
Information that was determined during table look-up is preserved, but will not be correct if the next invocation occurs on an object of a different class

 Two main mechnaism for mobile code within java:
1) Sandboxing - running the code in a restricted execution environment
2) Code signing - verifying that the digital signature of the file producer is trusted

The sandbox consists of four Java mechanisms: class loader, verifier, runtime checks of the JVM, and the security manager help prevent Buffer Overflow Attack.

Class Loader separates trusted class libraries from untrusted packages by using different class loaders.
Class loader places code into catagories that let the security manager restrict the actions the code will be allowed to take
Seperate name spaces from classes loaded by different loaders

Security Manager
Single Java object, eeps track of which code cand do which dangerours operation. 
Each JVM has only one security manager at a time.
Security manager can't be uninstalled.  It is there to answer questions regarding permissions.
APIs always check against the security manager to make sure the operation is allowed.
Security manager uses the code signer and URL to determine if the operation is valid; otherwise, SecurityException is thrown.
Java enforces type safety of operations for a memory location.

What is the Java stack and how do variables end up there? T
he Java stack is also a Java virtual machine data structure that is used to store “local variables” .  These are primitive variables and reference variables that created whenever methods are entered.  For example if you code Person p = new Person(“David”),  a local reference variable p is created on the stack.  If you code int x = 5; inside a method, a primitive variable x is created on the stack.

What is the major difference in how classes are supported in Java and C++? 
In Java all classes are derived from a single “granddaddy” class called Object.  In C++, classes completely unrelated by inheritance.  Additionally, Java does not support multiple inheritance like C++.

What is a Java Interface? 
A Java interface is an abstract type consisting of method signatures and possibly some constants.  Classes declare they will implement a given interface.  This effectively establishes a contract that the class must satisfy by defining all the methods that the interface references.  Any object of a class that implements the interface can be referred to using a reference variable of the interface type. 

What is the purpose of an interface?   
Interfaces allow you to organize sets of classes by functionality.  Interfaces obviate the need for multiple inheritance – a class can inherit one class and implement many others.

What is a Java package?  What is the purpose of a package?   
A Java package is an way to physically organize Java classes into groups that logically belong together.  These classes and there physical locations can be made available by importing entire packages into a program.

Why do you think the designer of Java decided to support garbage collection while the designer of C++ did not?   
One of the design goals of Java was “safety”.  By providing garbage collection, Java programs are less likely to crash because of memory leaks than in a language like C++ in which the programmer must remember to reclaim the space for all objects explicitly.  This is not a perfect system as Java programs can leak memory as well.   The design goal in C++ was flexibility and speed.  Garbage collection takes time and often occurs when it is inconvenient.  In C++ the programmer controls when the space for variables is recovered.

 

Language Implementation Methods:
Interpretation:  occurs when the code that represents a program is read and causes an action or change in state of the executing machine.
Interpretation can occur in hardware or software.

Implementation of languages:

  1. compilation (c, c++) - translated into machine code which is interpreted in by the hardware. 
    Characterized by slow transactional and  fast Execution.
  2. Pure Interpretation (vbscript, javascript) -  The soruce code is unchanged.  An interpreter program executes reading your source as data to determine how to "execute" your program. 
    Characterized by slow execution and very flexible.
  3. Hybrid System (java, c#, .Net) - The source code is partially translated to an intermediate format.  This intermediate code is  input to an interpreter. 
    Characterized by small translation cost and medium execution speed.

Describe how a compiler works.  Draw a picture of the major components of a compiler and describe how it is organized?

The lexical analyzer reads character and organizes them into tokens which become an input stream to the syntax analyzer.  The syntax analyzer organizes sequence of tokens into phrases and sentence by constructing parse trees (either physically or logically).  These structures carry meaning.  The intermediate code generator produces a version of the program in an intermediate form between source and machine language.  This code is subject to optimization by the Optimizer.  A code generator takes the intermediate code and generates machine language for a target machine.  The major data structure for this process is the symbol table which contains symbols and information about those symbols (like type, size, location) that are discovered by the lexical and syntax analyzers.  This information is used later by the code generators to produce intermediate and machine

 

Describe how a language can be implemented with pure interpretation?

In pure interpretation the source program remains unchanged.  The interpreter is a program that runs and reads the source as input, interpreting each statement and calling routines to simulate execution of each command.  The source program statements may be read many times in the case of a loop.  This provides great flexibility but at the cost of some speed of execution.

Describe how a language can be implemented with a hybrid implementation system like the one Java uses.  Draw a picture and describe the major components and how they are organized?

The lexical analyzer reads the source program a character at a time, organizing the characters into lexemes and outputting tokens.  The token stream is input for the syntax analyzer which organizes the tokens into a parse tree (either physical or logical).  The intermediate code generator reads the parse tree and generates an intermediate form of the program which is then read as input by the interpreter.

Scheme Lecture
Compiled language flow:
Source Code -> into compiler outputs -> Object Code feeds into-> linker uses object code & other object modules (libraries) -> output the executable codes

Interpreter language flow:
source code -> interpreter reads (over and over) and executes it

Hybrid implementation flow:
Source Code -> goes into translator & generates -> Byte Code -> intrepreted

Dr. Scheme intrepeter:
http://www.drscheme.org

text after semicolons are comments in scheme
Language->Essentials of Programming Languages

Data values are allocated on the heap
Objects are "first class" - can be passed in and out of procedures
Variables are lexical scoped - you can see where variables is used 
      (define x 3)
      (let ((x 2) (y 5)) (+ x y))      ; list of pair assignment where x and y in this statement are within this scope only.

Programs are blocked structures.  Below is the foo function by using lambda expression:
      (define foo
         (lamda (x)
            (+ x x)))

      (foo 4)

Blocks can be nested
      (let ((x 2) (y 5)
            (let ((x 6)) (+ x y ))) ; inner most declaration shadows the outermost declaration

Procedures can be defined at the top level within another block
      (define incr (lambda (a) (+a 1)))  ; call (incr 3) to get 4

      (let ( (incr (lambda (a) (+ a 1) ) ))  ; let starts with an association pairs.  Associate incr with lambda
            (x 3))                                   ; and x with 3
           (incr x ))                                ; value of let is always the last expression.

Procedures are first class objects
      (define square (lambda (x) (* x x )))
      (define double (lambda (x) (+ x x )))
      (define apply (lambda (fn p) (fn p)))
      (apply square 3)  ; equals 9
      (apply double 4)  ; equals 8

Procedures can be unnamed:
      ((lambda (x y) (+ x y )) 3 4)  ; did not name lambda expression
vs
      (define add (lambda (x y) (+ x y ))) ; named expression
      (add 3 4)

Schemes support recursive functions
      (define factorial
            (lambda (x)
                  (cond ((= x 1 ) 1)                    ; collection of predicts; if x==1 return 1
                  (else (* x (factorial (- x 1 ))))))  ; else return x * factorial(x-1)
      (factorial 100)

Scheme programs are made up of keywords, variables, structured forms, constant data (numbers, characters, strings, quated vectors, quoted lists, quoted symbols, etc), whitespace, and comments.

  • Keywords, variables, and symbols are collectivesly called identifiers.
  • Identifiers include:  a-z, A-Z, 0-9, ? ! . + - * / < = > : $ % ^ & _ ~
  • Identifiers normally cannot start with any character that may start a number
  • No inherent limit on the length of a Scheme identifier
  • Strcutured forms and list constants are enclosed within parentheses.  i.e. (a b) or (* (-x 2 ) y )
  • The empty list is written as ()
  • Boolean values representing true and false are written as #t and #f
  • Vectors are preceded by #( and terminated by ).  i.e.  #(a b c d)
  • Strings are enclosed in double quotation marks.  "This is a string"
  • Characters are proceded by #\. #\a.
  • Numbers may be written as integers, ratios, floating-point, or scientific notation, or as complex numbers in rectangular or polar notation.  For example:
    123, 1/2, 1e23, 1.5+2.2i, -3, 4@45 
  • comments appear between semicolon and end of a line
    Predicate names end in a question mark (?).  eq?, zero?, and string=?
    Common numeric comparators =, <, >, <=, and >= are exceptions to the above rule.
  • Most character, string, and vector procedure names start with prefix, char-, string-, and vector-.  For example:  string-append
  • Names of procedures that convert an object of one type into an object of another type are written as type1->type2   For example:  vector->list
  • Scheme uses dynamic runtime typing.
  • Names of procedures and syntactic forms that cause side effects end with an exclamation point (!).  set! and vector-set!
    (define x 5)
    (set! x 4)
    x
    (set! x "abc")