CPSC 411 - Lab Notes - 01-20

Lex

All students need to use lex.py from http://systems.cs.uchicago.edu/ply for their code.

Format of Lex Input

In other implementations, such as the original lex for C, the input to lex is divided into three sections:

  ...definitions...
  %%
  ...rules...
  %%
  ...subroutines...
  

In Python, however, the lexer is structured quite differently. More on that will be explained in lab 2.

In Python, each token is associated with a regular expression, either by direct assignment, as in the simple case of a plus token:

      t_PLUS = r'\+'

or by placing it as the documentation string (docstring) of a function that returns the token, as for a number:

      def t_NUMBER(t):
          r'\d+'
          try:
              # Convert the matched text to an integer value.
              t.value = int(t.value)
          except ValueError:
              print("Integer value too large", t.value)
              t.value = 0
          return t
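
The pairing of token names with regular expressions can also be tried out without lex.py. The following is only a minimal sketch using Python's standard re module, not how lex.py itself is implemented. The names TOKEN_SPEC, MASTER and tokenize are made up for this illustration, and skipping blanks and tabs is an assumption.

```python
import re

# Token name -> regular expression, mirroring t_NUMBER and t_PLUS above.
# SKIP is an assumption for this sketch: blanks and tabs are ignored.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS", r"\+"),
    ("SKIP", r"[ \t]+"),
]

# Combine the patterns into one expression with a named group per token.
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token name, value) pairs for text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError("Illegal character %r at position %d" % (text[pos], pos))
        kind = m.lastgroup  # name of the group that matched
        if kind == "NUMBER":
            tokens.append((kind, int(m.group())))
        elif kind != "SKIP":
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens


print(tokenize("3 + 42"))
```

Each entry plays the role of one t_ rule: the name becomes the token type and the regular expression decides what text belongs to it.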

This lab will look at regular expressions.

Regular Expressions

  Expression  Meaning                                  Example
  ----------  ---------------------------------------  ------------------------------------
  .           Any character except "\n"
  a, b, ...   Non-special characters match themselves  ab;c matches "ab;c"
  []          Any one character in the brackets.       [abz] matches a single a, b or z
              ^ negates the set when it is the first   [^a-z] matches anything except a
              character.                               lowercase letter
              - signifies a range if it is not the
              first character.
  *           0 or more of the preceding pattern       a* matches nothing, a, aa, aaa, ...
  +           1 or more of the preceding pattern
  ?           0 or 1 of the preceding pattern          [0-9]? matches an optional digit
  {n}         n of the preceding pattern
  {n,m}       n to m of the preceding pattern          [a-z]{3,5} matches any group of
                                                       three, four or five lowercase letters
  {name}      Refers to a name defined in the
              definitions section of lex
  \           Escape character                         \* matches an asterisk
  ()          Groups patterns                          ([ab]1?)? matches nothing, a, a1,
                                                       b, b1
  |           Either the pattern before or the         (if)+|5 matches one or more if's
              pattern after                            or a single 5
  "..."       Literally what is in the quotes          "\*" matches a backslash then an
                                                       asterisk
  ^           If the first character, matches the
              beginning of the line
  <>          State in lex

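A more complete reference for the regular expressions accepted by Python is the documentation for the re module on the Python web site. Note that the "...", {name} and <> forms belong to lex and are not part of Python's re syntax.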
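
Several of the examples in the table can be checked directly with Python's re module. This is only a sketch: the helper name matches and the CHECKS list are made up for this illustration, and re.fullmatch is used so that a pattern has to account for the whole string.

```python
import re


def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# (pattern, text, expected result) triples based on rows of the table.
CHECKS = [
    (r"ab;c", "ab;c", True),
    (r".", "\n", False),            # . does not match a newline
    (r"[abz]", "z", True),
    (r"[^a-z]", "q", False),
    (r"[^a-z]", "Q", True),
    (r"a*", "", True),
    (r"a*", "aaa", True),
    (r"[0-9]?", "7", True),
    (r"[a-z]{3,5}", "ab", False),   # too short
    (r"[a-z]{3,5}", "abcd", True),
    (r"\*", "*", True),
    (r"([ab]1?)?", "b1", True),
    (r"([ab]1?)?", "ab", False),
    (r"(if)+|5", "ififif", True),
    (r"(if)+|5", "5", True),
    (r"(if)+|5", "if5", False),     # | separates whole alternatives
]

for pattern, text, expected in CHECKS:
    assert matches(pattern, text) == expected, (pattern, text)
```

If every check holds, the loop finishes silently. Changing an expected value makes the failing pattern and text show up in the assertion message.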

An example of recognizing numbers.

We went through how to build up an example in class. We will arrive at a regular expression that recognizes strings such as 45, -34.928, 7e9 and +23.348E-6.

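  We want to recognize...            Regular Expression
  ---------------------------------  ----------------------------------------------
  A digit                            [0-9]
  Many digits                        [0-9]+
  An optional sign                   [-+]?
  A whole number                     [-+]?[0-9]+
  A fractional number                [-+]?[0-9]*\.[0-9]+
  Both whole and fractional numbers  [-+]?([0-9]*\.[0-9]+|[0-9]+)
  An exponent                        [eE][-+]?[0-9]+
  A number                           [-+]?([0-9]*\.[0-9]+|[0-9]+)([eE][-+]?[0-9]+)?

The parentheses around the two alternatives matter: | binds more loosely than anything else, so without them the sign would apply only to the first alternative and the exponent only to the last. The fractional alternative is listed first because Python's re module tries alternatives from left to right and takes the first one that matches.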
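
The build-up can be tried out with Python's re module. This is a sketch only: the constant names and the helper is_number are made up for this illustration. The whole and fractional alternatives are grouped in parentheses so that the optional sign and the optional exponent apply to both of them.

```python
import re

SIGN = r"[-+]?"                           # an optional sign
MANTISSA = r"(?:[0-9]*\.[0-9]+|[0-9]+)"   # fractional or whole number
EXPONENT = r"(?:[eE][-+]?[0-9]+)?"        # an optional exponent

NUMBER = re.compile(SIGN + MANTISSA + EXPONENT)


def is_number(text):
    """Return True if the whole of text is a number."""
    return NUMBER.fullmatch(text) is not None


# The sample strings from the notes, plus a few that should be rejected.
for sample in ["45", "-34.928", "7e9", "+23.348E-6", "abc", "1.", "e9"]:
    print(sample, is_number(sample))
```

The first four samples are accepted and the last three are rejected: "abc" has no digits, "1." has no digit after the point, and "e9" has an exponent with nothing in front of it.
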
Last modified by Brett Giles
Last modified: Sat Feb 22 15:12:04 MST 2003
