Lexical Conventions

Rui Ventura edited this page Apr 14, 2017 · 8 revisions

For each group of lexical elements (tokens), only the longest sequence of characters that makes up a valid element is considered.
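As a rough illustration of the longest-match rule, the sketch below tries a few patterns at a given position and keeps the longest lexeme. The token names (`BANG`, `BANGBANG`, `NAME`, `INT`) and the `longest_match` helper are hypothetical and are not part of the language definition.

```python
import re

# Hypothetical token table; the order is deliberately "wrong" to show that
# the longest lexeme wins regardless of the order of the patterns.
TOKEN_PATTERNS = [
    ("BANG", re.compile(r"!")),
    ("BANGBANG", re.compile(r"!!")),
    ("NAME", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("INT", re.compile(r"[0-9]+")),
]


def longest_match(text, pos=0):
    """Return (kind, lexeme) for the longest pattern matching at pos, or None."""
    best = None
    for kind, pattern in TOKEN_PATTERNS:
        m = pattern.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (kind, m.group())
    return best
```

With this table, `!!x` yields the two-character element `!!` rather than two `!` elements, and `iffy` is a single name rather than the keyword `if` followed by `fy`.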

Blank characters

The following characters are considered separators and do not represent any lexical element: line feed ASCII LF (0x0A, \n), carriage return ASCII CR (0x0D, \r), space ASCII SP (0x20, ˽), and horizontal tab ASCII HT (0x09, \t).

Comments

There are two types of comments, both of which also act as separators.

  • explicative -- start with // and end at the end of the line;
  • operational -- start with /* and end with */, and can be nested. If these opening sequences appear inside a string literal, they do not start a comment (see string definition).
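The comment rules above can be sketched as a small scanner that tracks a nesting depth. This is a minimal sketch, not the reference implementation: the function name `strip_comments` is hypothetical, and it assumes that // has no special meaning inside an operational comment and that an unterminated operational comment is an error.

```python
def strip_comments(source):
    """Replace each comment with one space, honouring nesting and strings."""
    out = []
    i = 0
    depth = 0  # nesting depth of /* ... */ comments
    n = len(source)
    while i < n:
        two = source[i:i + 2]
        if depth > 0:
            if two == "/*":
                depth += 1
                i += 2
            elif two == "*/":
                depth -= 1
                i += 2
                if depth == 0:
                    out.append(" ")  # a comment acts as a separator
            else:
                i += 1
        elif two == "/*":
            depth = 1
            i += 2
        elif two == "//":
            # Skip up to (but not including) the end of the line.
            while i < n and source[i] != "\n":
                i += 1
            out.append(" ")
        elif source[i] == '"':
            # Copy a string literal verbatim; comment openers inside it
            # do not start a comment.
            j = i + 1
            while j < n and source[j] != '"':
                j += 2 if source[j] == "\\" else 1
            out.append(source[i:j + 1])
            i = j + 1
        else:
            out.append(source[i])
            i += 1
    if depth > 0:
        raise SyntaxError("unterminated comment")  # assumption
    return "".join(out)
```

For example, `a /* x /* y */ z */ b` reduces to `a` and `b` separated by blanks, because the inner `*/` only closes the inner comment.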

Keywords

The following words are reserved and do not constitute identifiers (should be written exactly as indicated):

  • Literals: int, real, string, null
  • Function: procedure (see functions)
  • Scope: public, use
  • Conditional: if, elsif, else
  • Iteration: while, sweep
  • Other control statements: next, stop, return

Although it is not a reserved word, the identifier xpl designates the main function.

Types

The following lexical elements designate types in declarations (see syntax): int (integer), real (real), string (string).

Types that correspond to pointers are delimited by [ and ] (see syntax).

Expression operators

Operators are the lexical elements shown in the definitions of expressions.

Delimiters and terminators

The following elements are considered to be delimiters/terminators: , (comma), ; (semicolon), ! and !! (printing operations), and ( and ) (expression delimiters).

Identifiers (names)

Identifiers start with a letter or _ (underscore), followed by 0 (zero) or more letters, digits or _ (underscore). Names are case-sensitive and their length is unlimited.
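The identifier rule and the reserved-word list can be combined into a small classifier. This is a sketch under one assumption the text does not state: "letter" is taken to mean an ASCII letter. The helper name `classify` is hypothetical.

```python
import re

# Reserved words, exactly as listed in the Keywords section.
KEYWORDS = {
    "int", "real", "string", "null", "procedure", "public", "use",
    "if", "elsif", "else", "while", "sweep", "next", "stop", "return",
}

# Assumption: letters are the ASCII letters a-z and A-Z.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify(word):
    """Return 'keyword', 'identifier' or 'invalid' for a candidate name."""
    if IDENTIFIER.fullmatch(word) is None:
        return "invalid"
    return "keyword" if word in KEYWORDS else "identifier"
```

Because names are case-sensitive, `While` is an ordinary identifier while `while` is reserved, and `xpl` is classified as an identifier since it is not a reserved word.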

Literals

Literals are notations for constant values of some of the language's types. They should not be mistaken for constants, i.e., identifiers that designate elements whose value can not be changed during the execution of a program.

Integers

An integer literal is a non-negative number. Negative numbers are built by applying the unary negation operator (-) to a positive literal.

Decimal integer literals consist of 1 (one) or more digits from 0 to 9. The first digit can not be a 0 (zero), except for the number 0 (zero) itself, which is written as the single digit 0 (zero) in any base.

Hexadecimal integer literals always start with the sequence 0x, followed by one or more digits from 0 to 9 or the letters from a to f (case-insensitive). The letters from a to f represent the values from 10 to 15 respectively. Example: 0x07 (the number 7).

If an integer literal can not be represented because of architecture limitations (i.e., it overflows), a lexical error should be generated.
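The integer rules can be sketched as a validator that converts one literal to its value. The text does not say what the architecture limit is; the 32-bit signed maximum used below is an assumption, as are the names `INT_MAX` and `parse_int_literal`.

```python
import re

DECIMAL = re.compile(r"0|[1-9][0-9]*")     # no leading zero, except "0"
HEXADECIMAL = re.compile(r"0x[0-9a-fA-F]+")  # letters are case-insensitive

INT_MAX = 2**31 - 1  # assumption: 32-bit signed integers


def parse_int_literal(text):
    """Return the value of one integer literal, or raise on bad input."""
    if HEXADECIMAL.fullmatch(text):
        value = int(text[2:], 16)
    elif DECIMAL.fullmatch(text):
        value = int(text, 10)
    else:
        raise ValueError("not an integer literal: " + text)
    if value > INT_MAX:
        # Corresponds to the lexical error required on overflow.
        raise OverflowError("integer literal out of range: " + text)
    return value
```

Here `0x07` and `0xFf` evaluate to 7 and 255, while `007` is rejected as a single literal because a decimal literal other than 0 may not start with 0.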

Floating point reals

Real literals are expressed just like in C.

A literal without a . (decimal point) or exponential part is of type integer.

Examples: 3.14, 1E3 = 1000 (integer represented in floating point), 12.34e-24 = 12.34 × 10⁻²⁴
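The distinction between integer and real literals can be sketched with two regular expressions. The real pattern below approximates C decimal floating constants without suffixes; whether forms such as `3.` or `.5` are accepted by the language is an assumption carried over from C. The helper name `literal_type` is hypothetical.

```python
import re

# Approximation of C decimal floating constants (no suffix):
# a decimal point with digits on at least one side, or a mandatory exponent.
REAL = re.compile(
    r"(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
    r"|[0-9]+[eE][+-]?[0-9]+"
)
INTEGER = re.compile(r"0x[0-9a-fA-F]+|0|[1-9][0-9]*")


def literal_type(text):
    """Return 'real', 'int', or None if text is neither kind of literal."""
    if REAL.fullmatch(text):
        return "real"
    if INTEGER.fullmatch(text):
        return "int"
    return None
```

Under these patterns `1E3` is a real literal (its value is 1000 in floating point) and `42` is an integer, since it has neither a decimal point nor an exponent.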

Strings

Strings are delimited by double quotation marks (") and can contain any characters, except for the ASCII NULL (0x00, \0). Comment delimiters in strings have no meaning. If a string literal is written with a \0, then the string ends at that position. Example: "ab\0cd" is the same as "ab".

It is possible to designate characters by special sequences (starting with \), which is especially useful when no direct graphical representation exists. The special sequences are the characters ASCII LF, CR and HT (\n, \r and \t, respectively), double quote (\"), backward slash (\\), or any other character specified through the use of 1 or 2 hexadecimal digits (e.g. \0a or just \a if the following character is not a hexadecimal digit). Example: "xy\0az" has the same meaning as "xy\az" and "xy\nz".
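The special sequences can be sketched as a decoder for the text between the quotes. The text leaves open how \0 followed by a hexadecimal digit should be read; this sketch assumes hexadecimal digits are consumed greedily (up to two), which matches the "xy\0az" example, and that a resulting value of 0 ends the string. The name `decode_string_body` and the error on unknown sequences are also assumptions.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"
SIMPLE = {"n": "\n", "r": "\r", "t": "\t", '"': '"', "\\": "\\"}


def decode_string_body(body):
    """Decode the special sequences in the text between the double quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i + 1]
        if nxt in SIMPLE:
            out.append(SIMPLE[nxt])
            i += 2
        elif nxt in HEX_DIGITS:
            # Assumption: take a second hex digit whenever one follows.
            j = i + 2
            if j < len(body) and body[j] in HEX_DIGITS:
                j += 1
            code = int(body[i + 1:j], 16)
            if code == 0:
                break  # the string ends at a NULL character
            out.append(chr(code))
            i = j
        else:
            raise ValueError("unknown special sequence")  # assumption
    return "".join(out)
```

With these assumptions the bodies `xy\0az`, `xy\az` and `xy\nz` all decode to the same three characters followed by `z`: x, y and a line feed.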

Two or more consecutive string literals, written as distinct lexical elements, are represented in the language as a single string that is the result of their concatenation. Example: "ab"˽"cd" is the same as "abcd".

Pointers

The only admissible pointer literal is the null pointer, indicated by the keyword null.