Introduction
L is a rust-inspired language written for educational purposes. This document contains a language reference as well as some details regarding implementation.
Lexical Syntax
Keywords
Lexer
fn
struct
enum
if
else
true
false
let
pub
return
continue
break
mut
for
in
Identifiers
Lexer
[_a-zA-Z][a-zA-Z0-9]*
Comments
Lexer
line-comment:
//
block-comment :
/* .. */
L Programs
Syntax
<program> ::= <item>*
An L program consists of a collection of items where one of them is the main function.
Items
Syntax
<item> ::= <vis> <item-kind>
<item-kind>
::= <function>
| <struct-decl>
Functions
Syntax:
<fn> ::=
fn
<ident> <generics>? (<params>?) <return-type> ? <block-expr><params> ::= <param> (
,
<param> )*,
?<return-type> ::=
->
<type>
Examples
fn identity<T>(t: T) -> T {
t
}
fn naive_fib(n: number) -> number {
if n < 2 { n } else { naive_fib(n-1) + naive_fib(n-2) }
}
ADT Declarations
Type parameters
Statements
Syntax:
Let Statement
Syntax
Let statements introduce a new name into the current scope (i.e. in the enclosing block). The type annotation and initializer are syntactically optional.
Expression Statements
Syntax:
<expr-stmt> ::= <expr>
;
An expression statement is an expression which has its value suppressed and always evaluate to a unit type. They are used only for their side effects.
Expressions
Syntax
<expr> ::=
| <literal-expr>
| <path-expr>
| <group-expr>
| <block-expr>
| <operator-expr>
| <tuple-expr>
| <return-expr>
| <closure-expr>
| <call-expr>
| <conditional-expr>
| <struct-expr>
Lvalues and Rvalues
Lvalues
Rvalues
Literal Expressions
Path Expressions
Group Expressions
Syntax
<group-expr> ::=
(
<expr>)
Block Expressions
Syntax
A block expression is a control flow construct for sequencing multiple statements with an optional final expression. A block also introduces a new lexical scope where variable declarations are visible only from after the let statement that declares it until the end of the block.
Each statement is executed sequentially and the type of the entire block
is the type of the final expression, or ()
if the expression is omitted.
Note how the differentiating factor between an expression statement and an expression is the trailing semicolon.
let x: int = { 5 };
let x: () = { 5; };
Operator Expressions
Syntax
<operator-expr> ::=
| <assignment-expr>
Assignment Expressions
Syntax
The expression on the left of the =
must be an lvalue. Following
in the style of
C, the
result of an assignment expression is the rhs.
// assigns `x` to 5
// main also evaluates to 5
fn main() -> int {
let x = 0;
x = 5
}
Tuple Expressions
Syntax
<tuple-expr> ::=
(
<elements>?)
Note to disambiguate a singleton tuple from a grouping, you should add a trailing comma.
Return Expressions
Syntax
<return-expr> ::=
return
<expr> ?
Closure Expressions
Call Expressions
Conditional Expressions
Struct Expressions
Patterns
Syntax
<pattern> ::=
| <identifier-pattern>
| <wildcard-pattern>
Identifier Pattern
Wildcard Pattern
The Type System
Types
Syntax
<type> ::=
| <primitive-type>
| <parenthesized-type>
| <function-type>
Primitive Types
Syntax
<primitive-type> ::=
bool
|number
|char
|!
bool
- true
or false
number
(internally a IEEE 754 double-precision floating point number)
char
- character type, unimplemented
!
- the uninhabited never type
Parenthesized Types
Syntax
<parenthesized-type> ::=
(
<type>)
Function Types
Syntax
<function-type> ::=
fn(
<types>?)
<return-type>?<types> ::= <type> (
,
<type>)<return-type> ::=
->
<type>
Examples
let f: fn();
let g: fn(bool) -> bool;
let h: fn(num) -> fn(bool, bool) -> bool;
Implementation
The implementation of L is heavily inspired and influenced by rustc. Many implementation and design ideas are taken from rust and in particular, the syntax is intentionally nearly identical.
The full source code can be found here
Passes
The L compiler is currently a pass based compiler. The passes are currently approximately as follows:
- Lexical Analysis:
source code -> token stream
- Parsing:
token stream -> abstract syntax tree
- AST Lowering:
AST -> IR
- Typechecking:
IR -> Typed IR (TIR)
- TIR Lowering:
TIR -> Midlevel IR (MIR)
- MIR Lowering:
MIR -> LLVM IR
- Code Generation:
LLVM IR -> Output
Lexical Analysis
The lexer breaks the input source (&str
) into a sequence of tokens.
The lexer uses a simple library that is also utilised in the rustc
compiler which returns an iterator of TokenKind
s. This is then
transformed into my representation of Tokens and returned as a Vector.
Importantly, the lexer maintains the positions of each token in the
source code by keeping track of the Span
. A Span
is nothing more
than the start (inclusive) and end (exclusive) index of the token in the
input string.
The relevant source code can be found in the lexer module.
Syntactic Analysis (Parsing)
Conceptually, the parser takes the stream of Tokens generated by the lexer and constructs an abstract syntax tree.
Relevant source code can be found in the parser module.