Language Reference¶
There are four core parts to the Scheme language:
- The syntax for describing syntax datums, more commonly known as s-expressions. This describes how code and data are written as text.
- The syntax for defining libraries. In essence, a library defines a set of symbols to "import" (bring into its scope from other libraries) and a set to "export" (provide when it is imported by other libraries).
- A set of core libraries that provide syntax and functions allowing one to write programs.
- A dynamic type system that describes the types of Scheme values at runtime.
This system is highly malleable: importing symbols, including the ones most
commonly associated with Scheme's semantics, such as define, set! or even
lambda, is entirely optional. New languages can be defined as libraries,
which allows for a great deal of flexibility.
We will go over each of these core parts briefly. For a more in-depth overview of the precise semantics of the language, please refer to the R6RS specification, of which this document is a highly abridged and paraphrased facsimile.
Syntax¶
Lexical syntax¶
The lexical syntax of Scheme determines how characters are split into a sequence
of tokens, also known as "lexemes". Lexemes are separated by delimiters, which
include any amount of whitespace along with the characters (, ), [, ],
", ;, and #. The following lists all of the possible classifications of
lexemes:
Booleans¶
The boolean values true and false are represented by the lexemes #t (or #T) and
#f (or #F) respectively.
Identifiers¶
Identifiers are strings of characters that must start with a character that
is neither a delimiter nor a digit. Practically, that means that an identifier
starts with a letter or a special character like $ or <.
The following are all examples of identifiers:
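A few illustrative identifiers, bound with define so that they can be evaluated. The names are invented here for illustration and are not taken from any particular program, and the sketch assumes a standard R6RS (rnrs) library is available:

```scheme
(import (rnrs))

;; Each name bound below is a single identifier.
(define item-count 3)                         ; hyphenated (kebab case)
(define $unit-price 5)                        ; starts with a special character
(define <total> (* item-count $unit-price))   ; < and > are ordinary identifier characters
(define in-stock? (> item-count 0))           ; ? conventionally marks a boolean
(define count->label                          ; -> conventionally marks a conversion
  (lambda (n) (string-append "items: " (number->string n))))
```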
Scheme is much more permissive than most languages about which characters may appear in identifiers. Underscores are therefore typically discouraged in multi-word names; words are separated with hyphens instead, a style commonly called kebab case.
Numbers¶
Scheme lets you write a much wider range of numerical literals than most other programming languages:
1234 ; Typical base-ten digit numeral
-53 ; Negative number
+54 ; Positive number
3.1415926 ; Floating-point number
#b0101010 ; Binary number
#o0237777 ; Octal number
#xdeadbeef ; Hexadecimal number
7/22 ; Rational number
+5.2+7.3i ; Complex number
+inf.0 ; Positive infinity
-inf.0 ; Negative infinity
+nan.0 ; Not A Number (NaN)
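The literals above can be checked against ordinary arithmetic. This is a minimal sketch that assumes the standard R6RS (rnrs) predicates behave as specified; the variable names are hypothetical:

```scheme
(import (rnrs))

;; Radix prefixes change only the notation, not the value.
(define binary-same? (= #b0101010 42))
(define octal-same?  (= #o0237777 81919))
(define hex-same?    (= #xdeadbeef 3735928559))

;; 7/22 is an exact rational; 3.1415926 is an inexact real.
(define ratio-exact? (exact? 7/22))
(define float-exact? (exact? 3.1415926))

;; Complex literals expose their parts; infinities and NaN have predicates.
(define real-half     (real-part +5.2+7.3i))
(define inf-infinite? (infinite? +inf.0))
(define nan-is-nan?   (nan? +nan.0))
```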
Characters¶
Unicode code points (also known as characters) are represented by the prefix
#\ followed by the character literal, a special character name, or x
followed by a hexadecimal scalar value. Valid special character names include
nul, alarm, backspace, tab, linefeed, newline, vtab, page,
return, esc, space, and delete.
#\a ; Lower case letter a
#\A ; Upper case letter A
#\( ; Left parenthesis
#\linefeed ; U+000A
#\newline ; Same as #\linefeed but considered deprecated
#\λ ; U+03BB
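The three ways of writing a character (literal, name, and hexadecimal scalar value) can be compared directly. A small sketch, assuming the standard (rnrs) character procedures; the variable names are hypothetical:

```scheme
(import (rnrs))

(define code-of-a        (char->integer #\a))         ; 97
(define code-of-linefeed (char->integer #\linefeed))  ; 10, i.e. U+000A
(define same-newline?    (char=? #\newline #\linefeed))
(define same-lambda?     (char=? #\x3BB #\λ))         ; hex form of U+03BB
```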
Strings¶
Strings are written much as in other programming languages: a sequence of
characters enclosed in double quotes ("). \ is used to escape " and to
introduce other escape sequences such as \n. A \ at the end of a line skips the
line break and any whitespace at the start of the next line:
"hello, world!"
"hello?\nworld!"
"A
bc" ; This is U+0041, U+000A, U+0062 and U+0063
"A\
bc" ; This is U+0041, U+0062 and U+0063
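The effect of these escapes can be checked with string-length and string=?. This sketch assumes standard R6RS string syntax; the variable names are hypothetical:

```scheme
(import (rnrs))

;; \n is one character, so this string has 13 characters in total.
(define escaped-length (string-length "hello?\nworld!"))

;; A literal line break inside the quotes is kept as U+000A ...
(define with-break "A
bc")

;; ... while a trailing \ drops the line break and the indentation after it.
(define joined "A\
                bc")
```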
Comments¶
Comments in Scheme come in three different flavors:
Line comments¶
Single line comments are indicated with the semicolon (;) character. The
comment extends to the end of the line:
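A minimal sketch of line comments (the binding answer is hypothetical):

```scheme
(import (rnrs))

; A comment on a line of its own.
(define answer 42) ; everything after the semicolon is ignored
;; (define answer 0) is never read, so answer stays 42
```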
Block comments¶
Block comments are delimited by pairs of #| and |# characters. They can be
nested:
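A minimal sketch of nesting: the inner |# closes only the inner comment, so the outer comment continues until its own |#. The binding after-comment is hypothetical:

```scheme
(import (rnrs))

#| outer comment
   #| inner comment |#
   still inside the outer comment
|#
(define after-comment 'reached)
```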
Datum comments¶
The #; prefix can be used to comment out whole datums. Here is an
example that shows every type of comment in action:
#|
   The FACT procedure computes the factorial
   of a non-negative integer.
|#
(define fact
  (lambda (n)
    ;; base case
    (if (= n 0)
        #;(= n 1)
        1 ; identity of *
        (* n (fact (- n 1))))))
Datum syntax¶
The datum syntax describes how Scheme s-expressions are represented as a sequence of lexemes. There are four components to the datum syntax:
- Pairs and lists, enclosed by () or []
- Vectors, enclosed by #()
- Bytevectors, enclosed by #vu8()
- Non-standard datums, such as hashtable literals
We will go through the first three of these one by one:
Pairs and lists¶
The most fundamental datums are pairs and lists, the most basic of which is
() which represents the empty list. () only has one value and it is itself.
Pairs can be represented via dot notation, i.e. (⟨datum1⟩ . ⟨datum2⟩). The
first field is called the "car" (more commonly known as the "head") and the
second is called the "cdr" (more commonly known as the "tail").
Lists are constructed from multiple pairs chained recursively through their cdr fields. For example,
(a b c d e) is equivalent to (a . (b . (c . (d . (e . ()))))).
A list is considered "proper" if the final cdr is the empty list. For example,
(a b c d . e) is equivalent to (a . (b . (c . (d . e)))) and is not considered proper since the final cdr is the symbol e rather than
the empty list ().
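The relationship between dot notation and list notation can be checked with equal? and list?. A small sketch assuming the standard (rnrs) procedures; the variable names are hypothetical:

```scheme
(import (rnrs))

(define p '(1 . 2))
(define head (car p))   ; 1
(define tail (cdr p))   ; 2

;; List notation is shorthand for nested pairs ending in ().
(define same-list?
  (equal? '(a b c) '(a . (b . (c . ())))))

;; list? is true only for proper lists.
(define proper-is-list? (list? '(a b c)))
(define dotted-is-list? (list? '(a b . c)))
```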
Vectors¶
Vectors, also known as arrays, are represented with the notation
#(⟨datum⟩ ...). For example, a vector of length four that contains the number
zero at index zero, a pair of two numbers at index one, and a string at index
three could be represented as follows:
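One vector matching this description is sketched below. The element at index two is not specified above, so #f is used there purely as a placeholder assumption, and the string contents are invented. The literal is quoted so that it is treated as data in any R6RS implementation:

```scheme
(import (rnrs))

(define v '#(0 (1 . 2) #f "three"))

(define v-length (vector-length v))  ; 4
(define v0 (vector-ref v 0))         ; the number zero
(define v1 (vector-ref v 1))         ; the pair (1 . 2)
(define v3 (vector-ref v 3))         ; the string
```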
Bytevectors¶
Similar to vectors, bytevectors are arrays, but they can only contain values that can fit in a single unsigned 8-bit byte. For example, a bytevector of length three containing the values 1, 2, and 255 could be represented as follows:
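The bytevector described above can be sketched as follows, assuming the standard (rnrs) bytevector procedures. The variable names are hypothetical, and the literal is quoted so that it is treated as data:

```scheme
(import (rnrs))

(define bv '#vu8(1 2 255))

(define bv-length (bytevector-length bv))    ; 3
(define last-u8   (bytevector-u8-ref bv 2))  ; 255 when read as unsigned
(define last-s8   (bytevector-s8-ref bv 2))  ; the same byte read as signed: -1
```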
Library syntax¶
Libraries provide a syntax for importing and exporting symbols.
Libraries have the following form:
The ⟨name⟩ of the library is a list of symbols, and should match with the
location of the library in the filesystem. For example, a library named
(foo bar baz) should be located at foo/bar/baz.sls.
The optional ⟨version⟩ of a library is either null or a list of integers
that specify the semantic version of the library.
⟨export-spec⟩ is either a symbol specifying a variable to be exported or
a datum of the form (import ⟨import-spec⟩). Exports of the latter form
export all values included in the import.
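As a concrete sketch, here is a small library written in the standard R6RS layout, without a version and with plain symbols as export specs. The library name (example counter), its file location, and both procedures are hypothetical and serve only as illustration:

```scheme
;; Hypothetical file: example/counter.sls
(library (example counter)
  (export make-counter counter-next!)
  (import (rnrs))

  ;; A counter is a one-element vector holding the current count.
  (define (make-counter) (vector 0))

  ;; Increment the count in place and return the new value.
  (define (counter-next! c)
    (vector-set! c 0 (+ 1 (vector-ref c 0)))
    (vector-ref c 0)))
```

Another library or program would gain access to make-counter and counter-next! by importing (example counter).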
An ⟨import-spec⟩ has one of the following forms:
- ⟨library-reference⟩: Imports everything exported by the referenced library.
- (library ⟨library-reference⟩): Allows for importing libraries whose names include the words "only", "except", "prefix", or "rename".
- (only ⟨import-spec⟩ ⟨identifier⟩ ...): Imports only the specified identifiers from the import spec.
- (except ⟨import-spec⟩ ⟨identifier⟩ ...): Imports all of the identifiers from the import spec except for the ones specified.
- (prefix ⟨import-spec⟩ ⟨identifier⟩): Prefixes every identifier in the import spec with the provided identifier.
- (rename ⟨import-spec⟩ (⟨identifier1⟩ ⟨identifier2⟩) ...): Renames each ⟨identifier1⟩ in the import spec to the corresponding ⟨identifier2⟩.
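The import forms can be combined and nested. The following sketch uses only standard R6RS libraries and assumes that (rnrs lists) is available; the prefix l: and the new name keep are arbitrary choices made for illustration:

```scheme
(import (except (rnrs) fold-left)                           ; everything but fold-left
        (prefix (only (rnrs lists) fold-left) l:)           ; bound as l:fold-left
        (rename (only (rnrs lists) filter) (filter keep)))  ; filter bound as keep

(define total (l:fold-left + 0 '(1 2 3)))   ; 6
(define evens (keep even? '(1 2 3 4)))      ; (2 4)
```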
Type system¶
Scheme is dynamically typed, meaning the type of a value is determined at run time and not at compile time.
Each Scheme value has a single type, drawn from the following categories:
- Null: Has exactly one possible value, the empty list (). Commonly known as the unit type.
- Pair: A collection of two values.
- Boolean: Can either be true or false.
- Character: A Unicode code point.
- Number: A numerical value on the numeric tower.
- String: An array of Unicode code points.
- Symbol: Conceptually similar to an immutable string. Symbols are interned so that symbols with the same spelling always satisfy eq?.
- Vector: An array of values.
- Bytevector: An array of bytes.
- Syntax: Value containing a representation of the datum syntax, including source code information.
- Procedure: A Scheme procedure, more commonly known as a closure.
- Record: A record.
- Record Type Descriptor: A description of a record's type.
- Hashtable: A hash table.
- Port: A value that can handle input/output from the outside world.
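Most of these types have a matching predicate in the standard library. The sketch below assumes the R6RS (rnrs) predicates and constructors are available and leaves out syntax objects and records for brevity; the variable names are hypothetical:

```scheme
(import (rnrs))

;; Each value answers #t to the predicate for its own type.
(define checks
  (list (null? '())
        (pair? '(1 . 2))
        (boolean? #f)
        (char? #\a)
        (number? 42)
        (string? "hi")
        (symbol? 'hi)
        (vector? (vector 1 2))
        (bytevector? (make-bytevector 2 0))
        (procedure? car)
        (hashtable? (make-eq-hashtable))
        (port? (current-output-port))))

(define all-match? (for-all (lambda (x) x) checks))

;; The types are disjoint: the empty list is not a pair.
(define null-is-pair? (pair? '()))

;; Symbols with the same spelling are interned, so eq? holds.
(define interned? (eq? 'hello (string->symbol "hello")))
```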
Numeric tower¶
Numbers belong to a series of increasingly larger sets, each contained in the next:
- Integers, which are a subset of
- Rationals, which are a subset of
- Reals, which are a subset of
- Complex numbers, of which all numbers are members.
In scheme-rs, integers are represented as either 64-bit signed integers or big numbers, depending on their size. Rationals are represented as two big numbers. Reals are represented as 64-bit IEEE floating-point numbers. Complex numbers are composed of two simple numbers, each of which can be any of the numeric types previously listed.
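The nesting of the tower can be observed with the standard numeric predicates. This is a sketch that assumes R6RS behavior for (rnrs); the variable names are hypothetical:

```scheme
(import (rnrs))

;; An integer belongs to every level of the tower.
(define five-levels
  (list (integer? 5) (rational? 5) (real? 5) (complex? 5)))

;; 7/22 is rational but not an integer.
(define ratio-levels
  (list (integer? 7/22) (rational? 7/22)))

;; A number with a non-zero imaginary part is complex but not real.
(define complex-levels
  (list (real? +5.2+7.3i) (complex? +5.2+7.3i)))

;; Exact integers keep growing past the 64-bit range.
(define big (expt 2 100))
```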