| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This chapter describes those of Guile's simple data types which are primarily used for their role as items of generic data. By simple we mean data types that are not primarily used as containers to hold other data -- i.e. pairs, lists, vectors and so on. For the documentation of such compound data types, see 22. Compound Data Types.
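One of the great strengths of Scheme is that there is no straightforward distinction between "data" and "functionality". For example, Guile's support for dynamic linking could be described either in a "data-centric" way, as the properties of a dynamically linked object data type and the operations that apply to it, or in a "functionality-centric" way, as the set of procedures that make up Guile's support for dynamic linking.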
The contents of this chapter are, therefore, a matter of judgment. By generic, we mean to select those data types whose typical use as data in a wide variety of programming contexts is more important than their use in the implementation of a particular piece of functionality. The last section of this chapter provides references for all the data types that are documented not here but in a "functionality-centric" way elsewhere in the manual.
21.1 Booleans -- True/false values.
21.2 Numerical data types
21.3 Characters -- New character names.
21.4 Strings -- Special things about strings.
21.5 Regular Expressions -- Pattern matching and substitution.
21.6 Symbols
21.7 Keywords -- Self-quoting, customizable display keywords.
21.8 "Functionality-Centric" Data Types -- "Functionality-centric" data types.
The two boolean values are #t for true and #f for false.
Boolean values are returned by predicate procedures, such as the general
equality predicates eq?, eqv? and equal?
(see section 24.1 Equality) and numerical and string comparison operators like
string=? (see section 21.4.7 String Comparison) and <=
(see section 21.2.8 Comparison Predicates).
(<= 3 8) => #t
(<= 3 -3) => #f
(equal? "house" "houses") => #f
(eq? #f #f) => #t
In test condition contexts like if and cond (see section 26.2 Simple Conditional Evaluation), where a group of subexpressions will be evaluated only if a
condition expression evaluates to "true", "true" means any
value at all except #f.
(if #t "yes" "no") => "yes"
(if 0 "yes" "no") => "yes"
(if #f "yes" "no") => "no"
A result of this asymmetry is that typical Scheme source code more often
uses #f explicitly than #t: #f is necessary to
represent an if or cond false value, whereas #t is
not necessary to represent an if or cond true value.
It is important to note that #f is not equivalent to any
other Scheme value. In particular, #f is not the same as the
number 0 (like in C and C++), and not the same as the "empty list"
(like in some Lisp dialects).
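The rule that only #f counts as false can be checked directly. The following is a minimal sketch; the helper name truthy? is hypothetical and not part of Guile.

```scheme
;; Hypothetical helper: report how a value behaves in a test position.
(define (truthy? x)
  (if x #t #f))

(truthy? #f)    ; => #f, the only false value
(truthy? 0)     ; => #t, unlike C and C++
(truthy? '())   ; => #t, unlike some Lisp dialects
(truthy? "")    ; => #t, an empty string is still a value
```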
The not procedure returns the boolean inverse of its argument:
(not x) returns #t iff x is #f, else #f.
The boolean? procedure is a predicate that returns #t if
its argument is one of the boolean values, otherwise #f.
That is, (boolean? obj) returns #t iff obj is either #t or #f.
Guile supports a rich "tower" of numerical types -- integer, rational, real and complex -- and provides an extensive set of mathematical and scientific functions for operating on numerical data. This section of the manual documents those types and functions.
You may also find it illuminating to read R5RS's presentation of numbers in Scheme, which is particularly clear and accessible: see section 21.2 Numerical data types.
21.2.1 Scheme's Numerical "Tower" -- Scheme's numerical "tower".
21.2.2 Integers -- Whole numbers.
21.2.3 Real and Rational Numbers -- Real and rational numbers.
21.2.4 Complex Numbers -- Complex numbers.
21.2.5 Exact and Inexact Numbers -- Exactness and inexactness.
21.2.6 Read Syntax for Numerical Data -- Read syntax for numerical data.
21.2.7 Operations on Integer Values -- Operations on integer values.
21.2.8 Comparison Predicates -- Comparison predicates.
21.2.9 Converting Numbers To and From Strings -- Converting numbers to and from strings.
21.2.10 Complex Number Operations -- Complex number operations.
21.2.11 Arithmetic Functions -- Arithmetic functions.
21.2.12 Scientific Functions -- Scientific functions.
21.2.13 Primitive Numeric Functions -- Primitive numeric functions.
21.2.14 Bitwise Operations -- Logical AND, OR, NOT, and so on.
21.2.15 Random Number Generation -- Random number generation.
Scheme's numerical "tower" consists of the following categories of numbers: integers (whole numbers), rationals, real numbers, and complex numbers.
It is called a tower because each category "sits on" the one that follows it, in the sense that every integer is also a rational, every rational is also real, and every real number is also a complex number (but with zero imaginary part).
Of these, Guile implements integers, reals and complex numbers as distinct types. Rationals are supported only to the extent that Guile accepts the read syntax for rational numbers specified by R5RS; each such number is immediately converted to the corresponding real number.
The number? predicate may be applied to any Scheme value to
discover whether the value is any of the supported numerical types.
number? returns #t if obj is any kind of number, else #f.
For example:
(number? 3) => #t
(number? "hello there!") => #f
(define pi 3.141592654)
(number? pi) => #t
The next few subsections document each of Guile's numerical data types in detail.
Integers are whole numbers, that is, numbers with no fractional part, such as 2, 83 and -3789.
Integers in Guile can be arbitrarily big, as shown by the following example.
(define (factorial n)
  (let loop ((n n) (product 1))
    (if (= n 0)
        product
        (loop (- n 1) (* product n)))))

(factorial 3)
=>
6

(factorial 20)
=>
2432902008176640000

(- (factorial 45))
=>
-119622220865480194561963161495657715064383733760000000000
Readers whose background is in programming languages where integers are limited by the need to fit into just 4 or 8 bytes of memory may find this surprising, or suspect that Guile's representation of integers is inefficient. In fact, Guile achieves a near optimal balance of convenience and efficiency by using the host computer's native representation of integers where possible, and a more general representation where the required number does not fit in the native form. Conversion between these two representations is automatic and completely invisible to the Scheme level programmer.
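As a small sketch of that invisibility, the value below is far too large for a native machine word, yet it answers the same predicates and arithmetic as a small integer. The variable name big is only illustrative.

```scheme
;; 2^100 cannot fit in a 4- or 8-byte native integer.
(define big (expt 2 100))

(integer? big)                  ; => #t
(exact? big)                    ; => #t
(- (+ big 1) big)               ; => 1, no precision is lost
(quotient big (expt 2 98))      ; => 4, back in native range
```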
integer? returns #t if x is an integer number, else #f.
(integer? 487) => #t
(integer? -3.4) => #f
Mathematically, the real numbers are the set of numbers that describe all possible points along a continuous, infinite, one-dimensional line. The rational numbers are the set of all numbers that can be written as fractions P/Q, where P and Q are integers. All rational numbers are also real, but there are real numbers that are not rational, for example the square root of 2, and pi.
Guile represents both real and rational numbers approximately using a floating point encoding with limited precision. Even though the actual encoding is in binary, it may be helpful to think of it as a decimal number with a limited number of significant figures and a decimal point somewhere, since this corresponds to the standard notation for non-whole numbers. For example:
0.34
-0.00000142857931198
-5648394822220000000000.0
4.0
The limited precision of Guile's encoding means that any "real" number
in Guile can be written in a rational form, by multiplying and then dividing
by sufficient powers of 10 (or in fact, 2). For example,
-0.00000142857931198 is the same as -142857931198 divided by
100000000000000000. In Guile's current incarnation, therefore,
the rational? and real? predicates are equivalent.
Another aspect of this equivalence is that Guile currently does not preserve the exactness that is possible with rational arithmetic. If such exactness is needed, it is of course possible to implement exact rational arithmetic at the Scheme level using Guile's arbitrary size integers.
A planned future revision of Guile's numerical tower will make it possible to implement exact representations and arithmetic for both rational numbers and real irrational numbers such as square roots, and in such a way that the new kinds of number integrate seamlessly with those that are already implemented.
real? returns #t if obj is a real number, else #f.
Note that the sets of integer and rational values form subsets
of the set of real numbers, so the predicate will also be fulfilled
if obj is an integer number or a rational number.
rational? returns #t if x is a rational number, #f
otherwise. Note that the set of integer values forms a subset of
the set of rational numbers, i.e. the predicate will also be
fulfilled if x is an integer number. Real numbers
will also satisfy this predicate, because of their limited
precision.
Complex numbers are the set of numbers that describe all possible points in a two-dimensional space. The two coordinates of a particular point in this space are known as the real and imaginary parts of the complex number that describes that point.
In Guile, complex numbers are written in rectangular form as the sum of
their real and imaginary parts, using the symbol i to indicate
the imaginary part.
3+4i => 3.0+4.0i
(* 3-8i 2.3+0.3i) => 9.3-17.5i
Guile represents a complex number as a pair of numbers both of which are real, so the real and imaginary parts of a complex number have the same properties of inexactness and limited precision as single real numbers.
complex? returns #t if x is a complex number, #f
otherwise. Note that the sets of real, rational and integer
values form subsets of the set of complex numbers, i.e. the
predicate will also be fulfilled if x is a real,
rational or integer number.
R5RS requires that a calculation involving inexact numbers always
produces an inexact result. To meet this requirement, Guile
distinguishes between an exact integer value such as 5 and the
corresponding inexact real value which, to the limited precision
available, has no fractional part, and is printed as 5.0. Guile
will only convert the latter value to the former when forced to do so by
an invocation of the inexact->exact procedure.
exact? returns #t if x is an exact number, #f
otherwise.
inexact? returns #t if x is an inexact number, #f
otherwise.
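The distinction between 5 and 5.0 can be seen at the REPL. This is a short sketch using only standard R5RS procedures.

```scheme
(exact? 5)              ; => #t
(inexact? 5.0)          ; => #t
(inexact? (+ 5 0.5))    ; => #t, one inexact operand makes the result inexact
(= 5 5.0)               ; => #t, numerically equal
(eqv? 5 5.0)            ; => #f, but they differ in exactness
(inexact->exact 5.0)    ; => 5
(exact->inexact 5)      ; => 5.0
```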
The read syntax for integers is a string of digits, optionally preceded by a minus or plus character, a code indicating the base in which the integer is encoded, and a code indicating whether the number is exact or inexact. The supported base codes are:
#b, #B -- the integer is written in binary (base 2)
#o, #O -- the integer is written in octal (base 8)
#d, #D -- the integer is written in decimal (base 10)
#x, #X -- the integer is written in hexadecimal (base 16).
If the base code is omitted, the integer is assumed to be decimal. The following examples show how these base codes are used.
-13 => -13
#d-13 => -13
#x-13 => -19
#b+1101 => 13
#o377 => 255
The codes for indicating exactness (which can, incidentally, be applied to all numerical values) are:
#e, #E -- the number is exact
#i, #I -- the number is inexact.
If the exactness indicator is omitted, the integer is assumed to be exact,
since Guile's internal representation for integers is always exact.
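The base and exactness codes can be tried together in a short sketch. The variable name samples is only illustrative.

```scheme
;; Each literal is read as a number; the list collects the resulting values.
(define samples
  (list #x1F      ; hexadecimal, reads as 31
        #b-101    ; binary, reads as -5
        #o17      ; octal, reads as 15
        #i5       ; forced inexact, reads as 5.0
        #e5.0))   ; forced exact, reads as 5

(inexact? #i5)    ; => #t
(exact? #e5.0)    ; => #t
```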
Real numbers have limited precision similar to the precision of the
double type in C. A consequence of the limited precision is that
all real numbers in Guile are also rational, since any number R with a
limited number of decimal places, say N, can be made into an integer by
multiplying by 10^N.
odd? returns #t if n is an odd number, #f
otherwise.
even? returns #t if n is an even number, #f
otherwise.
(remainder 13 4) => 1
(remainder -13 4) => -1
(modulo 13 4) => 1
(modulo -13 4) => 3
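The two procedures differ only when the operands have different signs: the result of remainder takes the sign of the first argument, while the result of modulo takes the sign of the second. The sketch below adds a hypothetical helper, wrap-index, to show a typical use of modulo.

```scheme
(remainder 13 -4)    ; => 1
(modulo 13 -4)       ; => -3
(remainder -13 -4)   ; => -1
(modulo -13 -4)      ; => -1

;; Hypothetical helper: wrap an index into the range 0 .. size-1,
;; which works for negative indices because modulo follows the divisor.
(define (wrap-index i size)
  (modulo i size))

(wrap-index -1 5)    ; => 4
(wrap-index 7 5)     ; => 2
```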
The = predicate returns #t if all parameters are numerically equal.
The < predicate returns #t if the list of parameters is monotonically
increasing.
The > predicate returns #t if the list of parameters is monotonically
decreasing.
The <= predicate returns #t if the list of parameters is monotonically
non-decreasing.
The >= predicate returns #t if the list of parameters is monotonically
non-increasing.
zero? returns #t if z is an exact or inexact number equal to
zero.
positive? returns #t if x is an exact or inexact number greater than
zero.
negative? returns #t if x is an exact or inexact number less than
zero.
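Because the comparison predicates accept any number of arguments, a range check can be written as a single call. A brief sketch follows; in-range? is a hypothetical helper name.

```scheme
(< 1 2 3)      ; => #t
(< 1 3 2)      ; => #f
(<= 1 1 2)     ; => #t
(= 2 2.0 2)    ; => #t, exactness does not affect numerical equality

;; Hypothetical helper: is x between lo and hi, inclusive?
(define (in-range? lo x hi)
  (<= lo x hi))

(in-range? 0 5 10)    ; => #t
(in-range? 0 11 10)   ; => #f
```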
If the string is not a syntactically valid notation for a number, string->number returns #f.
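A short sketch of conversions in both directions, using the standard R5RS procedures number->string and string->number with their optional radix argument:

```scheme
(number->string 255)        ; => "255"
(number->string 255 16)     ; => "ff"
(number->string 5 2)        ; => "101"

(string->number "255")      ; => 255
(string->number "ff" 16)    ; => 255
(string->number "hello")    ; => #f, not a valid number
```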
magnitude is the same as abs for real arguments, but also allows complex numbers.
For abs, x must be a number with zero imaginary part. To calculate the
magnitude of a complex number, use magnitude instead.
For the truncate and round procedures, the Guile library
exports equivalent C functions, but taking and returning arguments of
type double rather than the usual SCM.
For floor and ceiling, the equivalent C functions are
floor and ceil from the standard mathematics library
(which also take and return double arguments).
The following procedures accept any kind of number as arguments, including complex numbers.
Many of Guile's numeric procedures which accept any kind of numbers as arguments, including complex numbers, are implemented as Scheme procedures that use the following real number-based primitives. These primitives signal an error if they are called with complex arguments.
For the hyperbolic arc-functions, the Guile library exports C functions
corresponding to these Scheme procedures, but taking and returning
arguments of type double rather than the usual SCM.
For all the other Scheme procedures above, except expt and
atan2 (whose entries specifically mention an equivalent C
function), the equivalent C functions are those provided by the standard
mathematics library. The mapping is as follows.
| Scheme Procedure | C Function |
| $abs | fabs |
| $sqrt | sqrt |
| $sin | sin |
| $cos | cos |
| $tan | tan |
| $asin | asin |
| $acos | acos |
| $atan | atan |
| $exp | exp |
| $log | log |
| $sinh | sinh |
| $cosh | cosh |
| $tanh | tanh |
Naturally, these C functions expect and return double arguments.
(logand) => -1
(logand 7) => 7
(logand #b111 #b011 #b001) => 1

(logior) => 0
(logior 7) => 7
(logior #b000 #b001 #b011) => 3

(logxor) => 0
(logxor 7) => 7
(logxor #b000 #b001 #b011) => 2
(logxor #b000 #b001 #b011 #b011) => 1

(number->string (lognot #b10000000) 2) => "-10000001"
(number->string (lognot #b0) 2) => "-1"

(logtest j k) == (not (zero? (logand j k)))
(logtest #b0100 #b1011) => #f
(logtest #b0100 #b0111) => #t

(logbit? index j) == (logtest (integer-expt 2 index) j)
(logbit? 0 #b1101) => #t
(logbit? 1 #b1101) => #f
(logbit? 2 #b1101) => #t
(logbit? 3 #b1101) => #t
(logbit? 4 #b1101) => #f

Formally, ash returns an integer equivalent to
(inexact->exact (floor (* n (expt 2 cnt)))).
(number->string (ash #b1 3) 2) => "1000"
(number->string (ash #b1010 -1) 2) => "101"

(logcount #b10101010) => 4
(logcount 0) => 0
(logcount -2) => 1

(integer-length #b10101010) => 8
(integer-length 0) => 0
(integer-length #b1111) => 4

(integer-expt 2 5) => 32
(integer-expt -3 3) => -27

(number->string (bit-extract #b1101101010 0 4) 2) => "1010"
(number->string (bit-extract #b1101101010 4 9) 2) => "10110"
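A common use of these procedures is to pack several on/off flags into one integer. The following is a minimal sketch; the flag names and the variable perms are hypothetical.

```scheme
;; Hypothetical flags, one bit each.
(define flag-read  (ash 1 0))   ; #b001
(define flag-write (ash 1 1))   ; #b010
(define flag-exec  (ash 1 2))   ; #b100

;; Combine flags with logior.
(define perms (logior flag-read flag-exec))    ; => 5

(logtest perms flag-write)              ; => #f, write bit is not set
(logbit? 2 perms)                       ; => #t, exec bit is set
(logand perms (lognot flag-exec))       ; => 1, exec bit cleared
(logxor perms flag-write)               ; => 7, write bit toggled on
```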
random accepts a positive integer or real n and returns a number of the same type between zero (inclusive) and n (exclusive). The values returned have a uniform distribution.
The optional argument state must be of the type produced
by seed->random-state. It defaults to the value of the
variable *random-state*. This object is used to maintain
the state of the pseudo-random-number generator and is altered
as a side effect of the random operation.
For a normal distribution with mean m and standard deviation d, use
(+ m (* d (random:normal))).
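The sketch below passes an explicit state object so that the sequence is repeatable. The seed 42 and the names state, die, fraction and random-normal are arbitrary choices for illustration; no particular output values are assumed.

```scheme
;; A fixed seed makes the pseudo-random sequence repeatable.
(define state (seed->random-state 42))

(define die (+ 1 (random 6 state)))     ; exact integer in 1..6
(define fraction (random 1.0 state))    ; inexact real in [0, 1)

;; Hypothetical helper: a normal variate with mean m and
;; standard deviation d, using the scaling formula above.
(define (random-normal m d state)
  (+ m (* d (random:normal state))))
```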
Most of the characters in the ASCII character set may be referred to by
name: for example, #\tab, #\esc, #\stx, and so on.
The following table describes the ASCII names for each character.
| 0 = #\nul | 1 = #\soh | 2 = #\stx | 3 = #\etx |
| 4 = #\eot | 5 = #\enq | 6 = #\ack | 7 = #\bel |
| 8 = #\bs | 9 = #\ht | 10 = #\nl | 11 = #\vt |
| 12 = #\np | 13 = #\cr | 14 = #\so | 15 = #\si |
| 16 = #\dle | 17 = #\dc1 | 18 = #\dc2 | 19 = #\dc3 |
| 20 = #\dc4 | 21 = #\nak | 22 = #\syn | 23 = #\etb |
| 24 = #\can | 25 = #\em | 26 = #\sub | 27 = #\esc |
| 28 = #\fs | 29 = #\gs | 30 = #\rs | 31 = #\us |
| 32 = #\sp |
The delete character (octal 177) may be referred to with the name
#\del.
Several characters have more than one name:
#\space, #\sp
#\newline, #\nl
#\tab, #\ht
#\backspace, #\bs
#\return, #\cr
#\page, #\np
#\null, #\nul
char? returns #t iff x is a character, else #f.
char=? returns #t iff x is the same character as y, else #f.
char<? returns #t iff x is less than y in the ASCII sequence,
else #f.
char<=? returns #t iff x is less than or equal to y in the
ASCII sequence, else #f.
char>? returns #t iff x is greater than y in the ASCII
sequence, else #f.
char>=? returns #t iff x is greater than or equal to y in the
ASCII sequence, else #f.
char-ci=? returns #t iff x is the same character as y ignoring
case, else #f.
char-ci<? returns #t iff x is less than y in the ASCII sequence
ignoring case, else #f.
char-ci<=? returns #t iff x is less than or equal to y in the
ASCII sequence ignoring case, else #f.
char-ci>? returns #t iff x is greater than y in the ASCII
sequence ignoring case, else #f.
char-ci>=? returns #t iff x is greater than or equal to y in the
ASCII sequence ignoring case, else #f.
char-alphabetic? returns #t iff chr is alphabetic, else #f.
Alphabetic means the same thing as the isalpha C library function.
char-numeric? returns #t iff chr is numeric, else #f.
Numeric means the same thing as the isdigit C library function.
char-whitespace? returns #t iff chr is whitespace, else #f.
Whitespace means the same thing as the isspace C library function.
char-upper-case? returns #t iff chr is uppercase, else #f.
Uppercase means the same thing as the isupper C library function.
char-lower-case? returns #t iff chr is lowercase, else #f.
Lowercase means the same thing as the islower C library function.
char-is-both? returns #t iff chr is either uppercase or lowercase, else #f.
Uppercase and lowercase are as defined by the isupper and islower
C library functions.
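A few of the character predicates in use, as a short sketch restricted to the standard R5RS procedures:

```scheme
(char<? #\a #\b)             ; => #t
(char<? #\B #\a)             ; => #t, uppercase precedes lowercase in ASCII
(char-ci<? #\B #\a)          ; => #f, compared as b and a
(char-ci=? #\a #\A)          ; => #t

(char-alphabetic? #\x)       ; => #t
(char-numeric? #\7)          ; => #t
(char-whitespace? #\space)   ; => #t
(char-upper-case? #\x)       ; => #f
```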
Strings are fixed-length sequences of characters. They can be created by calling constructor procedures, but they can also be entered literally at the REPL or in Scheme source files.
Guile provides a rich set of string processing procedures, because text handling is very important when Guile is used as a scripting language.
Strings always carry the information about how many characters they are
composed of with them, so there is no special end-of-string character,
like in C. That means that Scheme strings can contain any character,
even the NUL character '\0'. But note: Since most operating
system calls dealing with strings (such as for file operations) expect
strings to be zero-terminated, they might do unexpected things when
called with strings containing unusual characters.
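As a small sketch of this point, a string built with an embedded NUL character keeps its full length. The variable name s is only illustrative.

```scheme
;; Build a three-character string whose middle character is NUL.
(define s (string #\a #\nul #\b))

(string-length s)                  ; => 3, the NUL does not end the string
(char=? (string-ref s 1) #\nul)    ; => #t
(char=? (string-ref s 2) #\b)      ; => #t
```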
21.4.1 String Read Syntax -- Read syntax for strings.
21.4.2 String Predicates -- Testing strings for certain properties.
21.4.3 String Constructors -- Creating new string objects.
21.4.4 List/String conversion -- Converting from/to lists of characters.
21.4.5 String Selection -- Select portions from strings.
21.4.6 String Modification -- Modify parts or whole strings.
21.4.7 String Comparison -- Lexicographic ordering predicates.
21.4.8 String Searching -- Searching in strings.
21.4.9 Alphabetic Case Mapping -- Convert the alphabetic case of strings.
21.4.10 Appending Strings -- Appending strings to form a new string.
The read syntax for strings is an arbitrarily long sequence of
characters enclosed in double quotes ("). If you want to insert a double quote character into a
string literal, it must be prefixed with a backslash \ character
(called an escape character).
The following are examples of string literals:
"foo"
"bar plonk"
"Hello World"
"\"Hi\", he said."
The following procedures can be used to check whether a given string fulfills some specified property.
string? returns #t if obj is a string, else #f.
string-null? returns #t if str's length is zero, and
#f otherwise.
(string-null? "") => #t
y => "foo"
(string-null? y) => #f
The string constructor procedures create new string objects, possibly initializing them with some specified character data.
When processing strings, it is often convenient to first convert them
into a list representation by using the procedure string->list,
work with the resulting list, and then convert it back into a string.
These procedures are useful for similar tasks.
string->list and
list->string are inverses as far as `equal?' is
concerned.
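The convert, process, convert-back pattern described above can be sketched as follows. The helper name string-reverse-via-list is hypothetical.

```scheme
;; Hypothetical helper: reverse a string by going through a list.
(define (string-reverse-via-list s)
  (list->string (reverse (string->list s))))

(string-reverse-via-list "guile")                     ; => "eliug"
(string->list "abc")                                  ; => (#\a #\b #\c)
(equal? (list->string (string->list "abc")) "abc")    ; => #t
```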
(string-split "root:x:0:0:root:/root:/bin/bash" #\:)
=>
("root" "x" "0" "0" "root" "/root" "/bin/bash")
(string-split "::" #\:)
=>
("" "" "")
(string-split "" #\:)
=>
("")
Portions of strings can be extracted by these procedures.
string-ref delivers individual characters whereas
substring can be used to extract substrings from longer strings.
For substring, the start and end indices must satisfy
0 <= start <= end <= (string-length str).
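A brief sketch of both selectors; start is inclusive and end is exclusive. The variable name s is only illustrative.

```scheme
(define s "hello world")

(string-ref s 0)                     ; => #\h
(substring s 0 5)                    ; => "hello"
(substring s 6 (string-length s))    ; => "world"
(substring s 3 3)                    ; => "", start may equal end
```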
These procedures are for modifying strings in-place. This means that the result of the operation is not a new string; instead, the original string's memory representation is modified.
(define y "abcdefg")
(substring-fill! y 1 3 #\r)
y => "arrdefg"
The procedures in this section are similar to the character ordering
predicates (see section 21.3 Characters), but are defined on character sequences.
They all return #t on success and #f on failure. The
predicates ending in -ci ignore the character case when comparing
strings.
string=? returns #t if the two
strings are the same length and contain the same characters in
the same positions, otherwise #f.
The procedure string-ci=? treats upper and lower case
letters as though they were the same character, but
string=? treats upper and lower case as distinct
characters.
string<? returns #t if s1
is lexicographically less than s2.
string<=? returns #t if s1
is lexicographically less than or equal to s2.
string>? returns #t if s1
is lexicographically greater than s2.
string>=? returns #t if s1
is lexicographically greater than or equal to s2.
string-ci=? returns #t if
the two strings are the same length and their component
characters match (ignoring case) at each position; otherwise
it returns #f.
string-ci<? returns #t if s1 is lexicographically less than s2
regardless of case.
string-ci<=? returns #t if s1 is lexicographically less than or equal
to s2 regardless of case.
string-ci>? returns #t if s1 is lexicographically greater than
s2 regardless of case.
string-ci>=? returns #t if s1 is lexicographically greater than or
equal to s2 regardless of case.
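The effect of case on ordering is easiest to see side by side. A short sketch using the standard R5RS predicates:

```scheme
(string=? "Guile" "guile")       ; => #f
(string-ci=? "Guile" "guile")    ; => #t

(string<? "apple" "banana")      ; => #t
(string<? "app" "apple")         ; => #t, a proper prefix sorts first
(string<? "Zebra" "apple")       ; => #t, uppercase precedes lowercase
(string-ci<? "Zebra" "apple")    ; => #f, compared as zebra and apple
```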
When searching for the index of a character in a string, these procedures can be used.
string-index essentially implements the index or
strchr functions from the C library.
(string-index "weiner" #\e) => 1
(string-index "weiner" #\e 2) => 4
(string-index "weiner" #\e 2 4) => #f
string-rindex is like string-index, but searches from the right of the
string rather than from the left. This procedure essentially
implements the rindex or strrchr functions from
the C library.
(string-rindex "weiner" #\e) => 4
(string-rindex "weiner" #\e 2 4) => #f
(string-rindex "weiner" #\e 2 5) => 4
These are procedures for mapping strings to their upper- or lower-case equivalents, respectively, or for capitalizing strings.
y => "arrdefg"
(string-upcase! y) => "ARRDEFG"
y => "ARRDEFG"

y => "ARRDEFG"
(string-downcase! y) => "arrdefg"
y => "arrdefg"

y => "hello world"
(string-capitalize! y) => "Hello World"
y => "Hello World"
The procedure string-append appends several strings together to
form a longer result string.
A regular expression (or regexp) is a pattern that describes a whole class of strings. A full description of regular expressions and their syntax is beyond the scope of this manual; an introduction can be found in the Emacs manual (see section `Syntax of Regular Expressions' in The GNU Emacs Manual), or in many general Unix reference books.
If your system does not include a POSIX regular expression library, and
you have not linked Guile with a third-party regexp library such as Rx,
these functions will not be available. You can tell whether your Guile
installation includes regular expression support by checking whether the
*features* list includes the regex symbol.
21.5.1 Regexp Functions -- Functions that create and match regexps.
21.5.2 Match Structures -- Finding what was matched by a regexp.
21.5.3 Backslash Escapes -- Removing the special meaning of regexp meta-characters.
[FIXME: it may be useful to include an Examples section. Parts of this interface are bewildering on first glance.]
By default, Guile supports POSIX extended regular expressions. That means that the characters `(', `)', `+' and `?' are special, and must be escaped if you wish to match the literal characters.
This regular expression interface was modeled after that implemented by SCSH, the Scheme Shell. It is intended to be upwardly compatible with SCSH regular expressions.
string-match returns a match structure which
describes what, if anything, was matched by the regular
expression. See section 21.5.2 Match Structures. If str does not match
pattern at all, string-match returns #f.
Each time string-match is called, it must compile its
pattern argument into a regular expression structure. This
operation is expensive, which makes string-match inefficient if
the same regular expression is used several times (for example, in a
loop). For better performance, you can compile a regular expression in
advance and then match strings against the compiled regexp.
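The compile-once approach might look like the sketch below. It assumes the (ice-9 regex) module is available for match:substring; the names digits-rx and first-number are hypothetical.

```scheme
(use-modules (ice-9 regex))

;; Compile the pattern once, outside any loop.
(define digits-rx (make-regexp "[0-9]+"))

;; Hypothetical helper: return the first run of digits in str, or #f.
(define (first-number str)
  (let ((m (regexp-exec digits-rx str)))
    (and m (match:substring m))))

;; The compiled regexp is reused for every string.
(map first-number '("abc 42 def" "no digits" "7 of 9"))
;; => ("42" #f "7")
```

Compared with calling string-match inside the loop, this compiles the pattern only once.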
If the pattern string does not describe a legal regular expression, make-regexp throws
a regular-expression-syntax error.
The flags arguments change the behavior of the compiled regular expression. The following flags may be supplied:
regexp/icase
regexp/newline
regexp/basic
regexp/extended
If a call to make-regexp includes
both regexp/basic and regexp/extended flags, the
one which comes last will override the earlier one.
regexp-exec matches a compiled regular expression against
str. If the optional integer start argument is
provided, it begins matching from that position in the string.
It returns a match structure describing the results of the match,
or #f if no match could be found.
The flags arguments change the matching behavior. The following flags may be supplied:
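regexp/notbol -- the beginning-of-line operator `^' does not match the
beginning of str (although it still matches after a newline if regexp/newline
is used). Use this when the beginning of the string should
not be considered the beginning of a line.
regexp/noteol -- the end-of-line operator `$' does not match the
end of str (although it still matches before a newline if regexp/newline
is used). Use this when the end of the string should not be
considered the end of a line.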
regexp? returns #t if obj is a compiled regular expression,
or #f otherwise.
Regular expressions are commonly used to find patterns in one string and replace them with the contents of another string.
port may be #f, in which case nothing is written; instead,
regexp-substitute constructs a string from the specified
items and returns that.
regexp-substitute/global is similar to regexp-substitute, but can be used to perform global
substitutions on str. Instead of taking a match structure as an
argument, regexp-substitute/global takes two string arguments: a
regexp string describing a regular expression, and a target
string which should be matched against this regular expression.
Each item behaves as in regexp-substitute, with the following exceptions:
The symbol post causes regexp-substitute/global to recurse
on the unmatched portion of str. This must be supplied in
order to perform global search-and-replace on str; if it is not
present among the items, then regexp-substitute/global will
return after processing a single match.
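Both substitution procedures can be sketched as follows, assuming the (ice-9 regex) module. Passing #f as the port makes each call return a string; the symbols pre and post stand for the text before and after the match. The variable names are illustrative.

```scheme
(use-modules (ice-9 regex))

;; Global form: replace every run of spaces or tabs with one hyphen.
;; The trailing 'post makes the substitution recurse on the rest.
(define squeezed
  (regexp-substitute/global #f "[ \t]+" "this   is   the text"
                            'pre "-" 'post))
;; squeezed => "this-is-the-text"

;; Single form: works from one existing match structure.
(define m (string-match "[0-9]+" "abc 42 def"))
(define masked (regexp-substitute #f m 'pre "N" 'post))
;; masked => "abc N def"
```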
A match structure is the object returned by string-match and
regexp-exec. It describes which portion of a string, if any,
matched the given regular expression. Match structures include: a
reference to the string that was checked for matches; the starting and
ending positions of the regexp match; and, if the regexp included any
parenthesized subexpressions, the starting and ending positions of each
submatch.
In each of the regexp match functions described below, the match
argument must be a match structure returned by a previous call to
string-match or regexp-exec. Most of these functions
return some information about the original target string that was
matched against a regular expression; we will call that string
target for easy reference.
regexp-match? returns #t if obj is a match structure returned by a
previous call to regexp-exec, or #f otherwise.
#f.
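To make the contents of a match structure concrete, here is a small sketch. It assumes the accessor procedures match:substring, match:start, match:end, match:prefix and match:suffix from the (ice-9 regex) module; the pattern and target string are made up for illustration.

```scheme
(use-modules (ice-9 regex))

;; Two parenthesized subexpressions: a name and a number.
(define m (string-match "([a-z]+)=([0-9]+)" "set width=80 now"))

(match:substring m)      ; => "width=80", the whole match
(match:substring m 1)    ; => "width", first submatch
(match:substring m 2)    ; => "80", second submatch
(match:start m)          ; => 4
(match:end m)            ; => 12
(match:prefix m)         ; => "set "
(match:suffix m)         ; => " now"
```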
Sometimes you will want a regexp to match characters like `*' or `$' exactly. For example, to check whether a particular string represents a menu entry from an Info node, it would be useful to match it against a regexp like `^* [^:]*::'. However, this won't work; because the asterisk is a metacharacter, it won't match the `*' at the beginning of the string. In this case, we want to make the first asterisk un-magic.
You can do this by preceding the metacharacter with a backslash character `\'. (This is also called quoting the metacharacter, and is known as a backslash escape.) When Guile sees a backslash in a regular expression, it considers the following glyph to be an ordinary character, no matter what special meaning it would ordinarily have. Therefore, we can make the above example work by changing the regexp to `^\* [^:]*::'. The `\*' sequence tells the regular expression engine to match only a single asterisk in the target string.
Since the backslash is itself a metacharacter, you may force a regexp to match a backslash in the target string by preceding the backslash with itself. For example, to find variable references in a TeX program, you might want to find occurrences of the string `\let\' followed by any number of alphabetic characters. The regular expression `\\let\\[A-Za-z]*' would do this: the double backslashes in the regexp each match a single backslash in the target string.
Very important: Using backslash escapes in Guile source code (as in Emacs Lisp or C) can be tricky, because the backslash character has special meaning for the Guile reader. For example, if Guile encounters the character sequence `\n' in the middle of a string while processing Scheme code, it replaces those characters with a newline character. Similarly, the character sequence `\t' is replaced by a horizontal tab. Several of these escape sequences are processed by the Guile reader before your code is executed. Unrecognized escape sequences are ignored: if the characters `\*' appear in a string, they will be translated to the single character `*'.
This translation is obviously undesirable for regular expressions, since we want to be able to include backslashes in a string in order to escape regexp metacharacters. Therefore, to make sure that a backslash is preserved in a string in your Guile program, you must use two consecutive backslashes:
(define Info-menu-entry-pattern (make-regexp "^\\* [^:]*")) |
The string in this example is preprocessed by the Guile reader before
any code is executed. The resulting argument to make-regexp is
the string `^\* [^:]*', which is what we really want.
This also means that in order to write a regular expression that matches a single backslash character, the regular expression string in the source code must include four backslashes. Each consecutive pair of backslashes gets translated by the Guile reader to a single backslash, and the resulting double-backslash is interpreted by the regexp engine as matching a single backslash character. Hence:
(define tex-variable-pattern (make-regexp "\\\\let\\\\[A-Za-z]*")) |
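As a quick check of the two levels of escaping, the following sketch compiles both kinds of pattern and runs them against sample strings. It assumes the (ice-9 regex) module for match:substring; the variable names and the sample strings are made up for illustration.

```scheme
(use-modules (ice-9 regex))

;; The reader turns "^\\* [^:]*" into the 9-character string ^\* [^:]*
(define menu-entry-rx (make-regexp "^\\* [^:]*"))

;; Four backslashes in the source become two in the string, which the
;; regexp engine treats as one literal backslash.
(define tex-variable-rx (make-regexp "\\\\let\\\\[A-Za-z]*"))

;; Sample targets (illustrative only).  "x \\let\\foo y" is the text
;; x \let\foo y once the reader has processed it.
(define menu-match (regexp-exec menu-entry-rx "* Symbols:: about symbols"))
(define tex-match (regexp-exec tex-variable-rx "x \\let\\foo y"))

(display (match:substring menu-match)) (newline)   ; prints: * Symbols
(display (match:substring tex-match)) (newline)    ; prints: \let\foo
```

Counting the characters of the string with string-length is a simple way to see what the reader actually handed to make-regexp.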
The reason for the unwieldiness of this syntax is historical. Both regular expression pattern matchers and Unix string processing systems have traditionally used backslashes with the special meanings described above. The POSIX regular expression specification and ANSI C standard both require these semantics. Attempting to abandon either convention would cause other kinds of compatibility problems, possibly more severe ones. Therefore, without extending the Scheme reader to support strings with different quoting conventions (an ungainly and confusing extension when implemented in other languages), we must adhere to this cumbersome escape syntax.
Symbols in Scheme are widely used in three ways: as items of discrete data, as lookup keys for alists and hash tables, and to denote variable references.
A symbol is similar to a string in that it is defined by a sequence of characters. The sequence of characters is known as the symbol's name. In the usual case -- that is, where the symbol's name doesn't include any characters that could be confused with other elements of Scheme syntax -- a symbol is written in a Scheme program by writing the sequence of characters that make up the name, without any quotation marks or other special syntax. For example, the symbol whose name is "multiply-by-2" is written, simply:
multiply-by-2 |
Notice how this differs from a string with contents "multiply-by-2", which is written with double quotation marks, like this:
"multiply-by-2" |
Looking beyond how they are written, symbols are different from strings in two important respects.
The first important difference is uniqueness. If the same-looking string is read twice from two different places in a program, the result is two different string objects whose contents just happen to be the same. If, on the other hand, the same-looking symbol is read twice from two different places in a program, the result is the same symbol object both times.
Given two read symbols, you can use eq? to test whether they are
the same (that is, have the same name). eq? is the most
efficient comparison operator in Scheme, and comparing two symbols like
this is as fast as comparing, for example, two numbers. Given two
strings, on the other hand, you must use equal? or
string=?, which are much slower comparison operators, to
determine whether the strings have the same contents.
(define sym1 (quote hello))
(define sym2 (quote hello))
(eq? sym1 sym2) => #t
(define str1 "hello")
(define str2 "hello")
(eq? str1 str2) => #f
(equal? str1 str2) => #t
|
The second important difference is that symbols, unlike strings, are not
self-evaluating. This is why we need the (quote ...)s in the
example above: (quote hello) evaluates to the symbol named
"hello" itself, whereas an unquoted hello is read as the
symbol named "hello" and evaluated as a variable reference ... about
which more below (see section 21.6.3 Symbols as Denoting Variables).
21.6.1 Symbols as Discrete Data  Symbols as discrete data.
21.6.2 Symbols as Lookup Keys  Symbols as lookup keys.
21.6.3 Symbols as Denoting Variables  Symbols as denoting variables.
21.6.4 Operations Related to Symbols  Operations related to symbols.
21.6.5 Function Slots and Property Lists  Function slots and property lists.
21.6.6 Extended Read Syntax for Symbols  Extended read syntax for symbols.
Numbers and symbols are similar to the extent that they both lend
themselves to eq? comparison. But symbols are more descriptive
than numbers, because a symbol's name can be used directly to describe
the concept for which that symbol stands.
For example, imagine that you need to represent some colours in a computer program. Using numbers, you would have to choose arbitrarily some mapping between numbers and colours, and then take care to use that mapping consistently:
;; 1=red, 2=green, 3=purple
(if (eq? (colour-of car) 1)
...)
|
You can make the mapping more explicit and the code more readable by defining constants:
(define red 1)
(define green 2)
(define purple 3)
(if (eq? (colour-of car) red)
...)
|
But the simplest and clearest approach is not to use numbers at all, but symbols whose names specify the colours that they refer to:
(if (eq? (colour-of car) 'red)
...)
|
The descriptive advantages of symbols over numbers increase as the set of concepts that you want to describe grows. Suppose that a car object can have other properties as well, such as whether it has or uses automatic or manual transmission, leaded or unleaded fuel, and power steering (or not).
Then a car's combined property set could be naturally represented and manipulated as a list of symbols:
(properties-of car1)
=>
(red manual unleaded power-steering)
(if (memq 'power-steering (properties-of car1))
(display "Unfit people can drive this car.\n")
(display "You'll need strong arms to drive this car!\n"))
-|
Unfit people can drive this car.
|
Remember, the fundamental property of symbols that we are relying on
here is that an occurrence of 'red in one part of a program is an
indistinguishable symbol from an occurrence of 'red in
another part of a program; this means that symbols can usefully be
compared using eq?. At the same time, symbols have naturally
descriptive names. This combination of efficiency and descriptive power
makes them ideal for use as discrete data.
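The idea can be tried out with a small self-contained sketch. The property list, the has-property? helper and the colour->rgb mapping below are hypothetical; only memq and case are standard.

```scheme
;; Hypothetical data: a car described by a flat list of symbols.
(define car1-properties '(red manual unleaded power-steering))

;; memq compares with eq?, which is all that symbols need.
(define (has-property? properties property)
  (if (memq property properties) #t #f))

(has-property? car1-properties 'power-steering)   ; => #t
(has-property? car1-properties 'automatic)        ; => #f

;; case compares with eqv?, which agrees with eq? on symbols, so a
;; symbol can select a branch directly.
(define (colour->rgb colour)
  (case colour
    ((red)    '(255 0 0))
    ((green)  '(0 255 0))
    ((purple) '(128 0 128))
    (else     #f)))

(colour->rgb 'green)   ; => (0 255 0)
```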
Given their efficiency and descriptive power, it is natural to use symbols as the keys in an association list or hash table.
To illustrate this, consider a more structured representation of the car properties example from the preceding subsection. Rather than mixing all the properties up together in a flat list, we could use an association list like this:
(define car1-properties '((colour . red)
(transmission . manual)
(fuel . unleaded)
(steering . power-assisted)))
|
Notice how this structure is more explicit and extensible than the flat
list. For example it makes clear that manual refers to the
transmission rather than, say, the windows or the locking of the car.
It also allows further properties to use the same symbols among their
possible values without becoming ambiguous:
(define car1-properties '((colour . red)
(transmission . manual)
(fuel . unleaded)
(steering . power-assisted)
(seat-colour . red)
(locking . manual)))
|
With a representation like this, it is easy to use the efficient
assq-XXX family of procedures (see section 22.7.2 Association Lists) to
extract or change individual pieces of information:
(assq-ref car1-properties 'fuel) => unleaded
(assq-ref car1-properties 'transmission) => manual
(assq-set! car1-properties 'seat-colour 'black)
=>
((colour . red)
 (transmission . manual)
 (fuel . unleaded)
 (steering . power-assisted)
 (seat-colour . black)
 (locking . manual))
|
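When the association list is going to be modified, a cautious variant of the above is to build it from fresh pairs and to keep the value returned by assq-set!. The sketch below assumes a shortened property set; building the list with list and cons rather than a quoted literal is a defensive choice made here, not something the text above requires.

```scheme
;; Build the alist with list and cons so that it is freshly allocated
;; and safe to modify (a quoted literal is a constant).
(define car1-properties
  (list (cons 'colour 'red)
        (cons 'transmission 'manual)
        (cons 'fuel 'unleaded)))

(assq-ref car1-properties 'fuel)        ; => unleaded
(assq-ref car1-properties 'sunroof)     ; => #f, no such key

;; assq-set! returns the updated alist; rebinding the variable keeps
;; the result even when a new key has to be added.
(set! car1-properties (assq-set! car1-properties 'seat-colour 'black))
(set! car1-properties (assq-set! car1-properties 'colour 'blue))

(assq-ref car1-properties 'seat-colour) ; => black
(assq-ref car1-properties 'colour)      ; => blue
```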
Hash tables also have keys, and exactly the same arguments apply to the
use of symbols in hash tables as in association lists. The hash value
that Guile uses to decide where to add a symbol-keyed entry to a hash
table can be obtained by calling the symbol-hash procedure:
See 22.7.3 Hash Tables for information about hash tables in general, and for why you might choose to use a hash table rather than an association list.
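A minimal sketch of the same car properties held in a hash table follows. It uses the hashq- procedures, which compare keys with eq? and so suit symbol keys; the table name and its contents are illustrative.

```scheme
(define car1-table (make-hash-table))

;; hashq-set! and hashq-ref compare keys with eq?.
(hashq-set! car1-table 'colour 'red)
(hashq-set! car1-table 'fuel 'unleaded)

(hashq-ref car1-table 'colour)             ; => red
(hashq-ref car1-table 'sunroof)            ; => #f, no such key
(hashq-ref car1-table 'sunroof 'unknown)   ; => unknown, explicit default

;; symbol-hash yields an integer; its actual value is an
;; implementation detail and is not shown here.
(integer? (symbol-hash 'colour))           ; => #t
```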
When an unquoted symbol in a Scheme program is evaluated, it is interpreted as a variable reference, and the result of the evaluation is the appropriate variable's value.
For example, when the expression (string-length "abcd") is read
and evaluated, the sequence of characters string-length is read
as the symbol whose name is "string-length". This symbol is associated
with a variable whose value is the procedure that implements string
length calculation. Therefore evaluation of the string-length
symbol results in that procedure.
The details of the connection between an unquoted symbol and the variable to which it refers are explained elsewhere. See 25. Definitions and Variable Bindings, for how associations between symbols and variables are created, and 31. Modules, for how those associations are affected by Guile's module system.
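The difference between a quoted symbol and a variable reference can be seen in a short sketch. The variable name answer is made up, and the explicit call to eval with (interaction-environment) is used here only to make the evaluation step visible.

```scheme
(define answer 42)

answer              ; => 42, the value of the variable
'answer             ; => answer, the symbol itself
(symbol? 'answer)   ; => #t

;; Evaluating the symbol string-length explicitly yields the same
;; procedure that the unquoted name refers to.
(define len (eval 'string-length (interaction-environment)))
(len "abcd")               ; => 4
(eq? len string-length)    ; => #t
```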
Given any Scheme value, you can determine whether it is a symbol using
the symbol? primitive:
Scheme Procedure: symbol? obj
Return #t if obj is a symbol, otherwise return #f.
Once you know that you have a symbol, you can obtain its name as a
string by calling symbol->string. Note that Guile differs by
default from R5RS on the details of symbol->string as regards
case-sensitivity:
If Guile is set to read symbols case-insensitively (as specified by
R5RS), and s comes into being as part of a literal expression
(see section `Literal expressions' in The Revised^5 Report on Scheme) or
by a call to the read or string-ci->symbol procedures,
Guile converts any alphabetic characters in the symbol's name to
lower case before creating the symbol object, so the string returned
here will be in lower case.
If s was created by string->symbol, the case of characters
in the string returned will be the same as that in the string that was
passed to string->symbol, regardless of Guile's case-sensitivity
setting at the time s was created.
It is an error to apply mutation procedures like string-set! to
strings returned by this procedure.
Most symbols are created by writing them literally in code. However it
is also possible to create symbols programmatically using the following
string->symbol and string-ci->symbol procedures:
The following examples illustrate Guile's detailed behaviour as regards the case-sensitivity of symbols:
(read-enable 'case-insensitive) ; R5RS compliant behaviour
(symbol->string 'flying-fish) => "flying-fish"
(symbol->string 'Martin) => "martin"
(symbol->string
(string->symbol "Malvina")) => "Malvina"
(eq? 'mISSISSIppi 'mississippi) => #t
(string->symbol "mISSISSIppi") => mISSISSIppi
(eq? 'bitBlt (string->symbol "bitBlt")) => #f
(eq? 'LolliPop
(string->symbol (symbol->string 'LolliPop))) => #t
(string=? "K. Harper, M.D."
(symbol->string
(string->symbol "K. Harper, M.D."))) => #t
(read-disable 'case-insensitive) ; Guile default behaviour
(symbol->string 'flying-fish) => "flying-fish"
(symbol->string 'Martin) => "Martin"
(symbol->string
(string->symbol "Malvina")) => "Malvina"
(eq? 'mISSISSIppi 'mississippi) => #f
(string->symbol "mISSISSIppi") => mISSISSIppi
(eq? 'bitBlt (string->symbol "bitBlt")) => #t
(eq? 'LolliPop
(string->symbol (symbol->string 'LolliPop))) => #t
(string=? "K. Harper, M.D."
(symbol->string
(string->symbol "K. Harper, M.D."))) => #t
|
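Creating symbols programmatically is mostly a matter of building the name as a string first. The sketch below assumes Guile's default case-sensitive reader; accessor-name is a hypothetical helper, and symbol-append is a Guile convenience procedure.

```scheme
;; Hypothetical helper: build the symbol car-FIELD from the symbol FIELD.
(define (accessor-name field)
  (string->symbol (string-append "car-" (symbol->string field))))

(accessor-name 'colour)                      ; => car-colour
(eq? (accessor-name 'colour) 'car-colour)    ; => #t

;; symbol-append joins the names of its arguments into a new symbol.
(symbol-append 'car- 'colour)                ; => car-colour
```

Because symbols with the same name are the same object, the result of accessor-name can be compared with eq? against a symbol written literally in code.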
Finally, some applications, especially those that generate new Scheme
code dynamically, need to generate symbols for use in the generated
code. The gensym primitive meets this need:
The symbols generated by gensym are likely to be unique,
since their names begin with a space and it is only otherwise possible
to generate such symbols if a programmer goes out of their way to do
so. The 1.8 release of Guile will include a way of creating
symbols that are guaranteed to be unique.
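A brief sketch of gensym in use follows. The exact names it produces vary between runs and Guile versions, so none are shown; the optional prefix argument "tmp" is an arbitrary choice.

```scheme
(define g1 (gensym))
(define g2 (gensym))

(symbol? g1)    ; => #t
(eq? g1 g2)     ; => #f, each call produces a different symbol

;; An optional string argument supplies the start of the name.
(define tmp (gensym "tmp"))
(string-prefix? "tmp" (symbol->string tmp))   ; => #t
```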
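In traditional Lisp dialects, symbols are often understood as having three kinds of value at once: a variable value, which is used when the symbol appears in code in a variable reference context; a function value, which is used when the symbol appears in code in a function name position; and a property list value, which is used when the symbol is given as the first argument to Lisp's put or get functions.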
Although Scheme (as one of its simplifications with respect to Lisp) does away with the distinction between variable and function namespaces, Guile currently retains some elements of the traditional structure in case they turn out to be useful when implementing translators for other languages, in particular Emacs Lisp.
Specifically, Guile symbols have two extra slots, one for a symbol's property list and one for its "function value". The following procedures are provided to access these slots.
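Scheme Procedure: symbol-property sym prop
From sym's property list, return the value for property prop. The property list is assumed to be an association list whose keys are distinguished from each other using equal?; prop should be one of the keys in that list. If the property list has no entry for prop, symbol-property returns #f.
Scheme Procedure: set-symbol-property! sym prop val
In sym's property list, set the value for property prop to val, adding a new entry for prop if none already exists. For the structure of the property list, see symbol-property.
Scheme Procedure: symbol-property-remove! sym prop
From sym's property list, remove the entry for property prop, if there is one. For the structure of the property list, see symbol-property.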
Support for these extra slots may be removed in a future release, and it is probably better to avoid using them. (In release 1.6, Guile itself uses the property list slot sparingly, and the function slot not at all.) For a more modern and Schemely approach to properties, see 24.2 Object Properties.
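For completeness, here is a minimal sketch of the property list slot in use, relying on the symbol-property and set-symbol-property! procedures; the symbol apple-for-demo and its properties are invented for the example.

```scheme
;; Attach a property to a symbol, then read it back.
(set-symbol-property! 'apple-for-demo 'colour 'green)

(symbol-property 'apple-for-demo 'colour)   ; => green
(symbol-property 'apple-for-demo 'weight)   ; => #f, no such property
```

Since the property list belongs to the symbol itself, every part of a program that mentions the same symbol sees the same properties, which is one reason the object properties approach is usually preferable.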
The read syntax for a symbol is a sequence of letters, digits, and
extended alphabetic characters, beginning with a character that
cannot begin a number. In addition, the special cases of +,
-, and ... are read as symbols even though numbers can
begin with +, - or ..
Extended alphabetic characters may be used within identifiers as if they were letters. The set of extended alphabetic characters is:
! $ % & * + - . / : < = > ? @ ^ _ ~ |
In addition to the standard read syntax defined above (which is taken from R5RS (see section `Formal syntax' in The Revised^5 Report on Scheme)), Guile provides an extended symbol read syntax that allows the inclusion of unusual characters such as space characters, newlines and parentheses. If (for whatever reason) you need to write a symbol containing characters not mentioned above, you can do so as follows.
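Begin the symbol with the characters #{, write the characters of the symbol's name, and end the symbol with the characters }#.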
Here are a few examples of this form of read syntax. The first symbol needs to use extended syntax because it contains a space character, the second because it contains a line break, and the last because it looks like a number.
#{foo bar}#
#{what
ever}#
#{4242}#
|
Although Guile provides this extended read syntax for symbols, widespread usage of it is discouraged because it is not portable and not very readable.
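The relationship between the extended syntax and ordinary symbol operations can be checked with a few expressions; this is only a sketch, using the same example names as above.

```scheme
;; The name of the symbol is exactly the text between #{ and }#.
(symbol->string '#{foo bar}#)                    ; => "foo bar"

;; The same symbol can be produced portably with string->symbol.
(eq? '#{foo bar}# (string->symbol "foo bar"))    ; => #t

;; A symbol whose name looks like a number is still a symbol.
(symbol? '#{4242}#)                              ; => #t
(number? '#{4242}#)                              ; => #f
```

Where portability matters, string->symbol gives the same result without relying on Guile's reader extension.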
Keywords are self-evaluating objects with a convenient read syntax that makes them easy to type.
Guile's keyword support conforms to R5RS, and adds a (switchable) read
syntax extension to permit keywords to begin with : as well as
#:.
21.7.1 Why Use Keywords?  Motivation for keyword usage.
21.7.2 Coding With Keywords  How to use keywords.
21.7.3 Keyword Read Syntax  Read syntax for keywords.
21.7.4 Keyword Procedures  Procedures for dealing with keywords.
21.7.5 Keyword Primitives  The underlying primitive procedures.
Keywords are useful in contexts where a program or procedure wants to be able to accept a large number of optional arguments without making its interface unmanageable.
To illustrate this, consider a hypothetical make-window
procedure, which creates a new window on the screen for drawing into
using some graphical toolkit. There are many parameters that the caller
might like to specify, but which could also be sensibly defaulted, for
example the colour depth, the background colour, the width and the height of the window.
If make-window did not use keywords, the caller would have to
pass in a value for each possible argument, remembering the correct
argument order and using a special value to indicate the default value
for that argument:
(make-window 'default ;; Color depth
'default ;; Background color
800 ;; Width
100 ;; Height
...) ;; More make-window arguments
|
With keywords, on the other hand, defaulted arguments are omitted, and non-default arguments are clearly tagged by the appropriate keyword. As a result, the invocation becomes much clearer:
(make-window #:width 800 #:height 100) |
On the other hand, for a simpler procedure with few arguments, the use
of keywords would be a hindrance rather than a help. The primitive
procedure cons, for example, would not be improved if it had to
be invoked as
(cons #:car x #:cdr y) |
So the decision whether to use keywords or not is purely pragmatic: use them if they will clarify the procedure invocation at point of call.
If a procedure wants to support keywords, it should take a rest argument and then use whatever means is convenient to extract keywords and their corresponding arguments from the contents of that rest argument.
The following example illustrates the principle: the code for
make-window uses a helper procedure called
get-keyword-value to extract individual keyword arguments from
the rest argument.
(define (get-keyword-value args keyword default)
(let ((kv (memq keyword args)))
(if (and kv (>= (length kv) 2))
(cadr kv)
default)))
(define (make-window . args)
(let ((depth (get-keyword-value args #:depth screen-depth))
(bg (get-keyword-value args #:bg "white"))
(width (get-keyword-value args #:width 800))
(height (get-keyword-value args #:height 100))
...)
...))
|
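To see the helper at work without a graphical toolkit, the following sketch replaces the elided window-creation code with a stand-in that simply returns the resolved settings as a list. That return value, and the omission of the depth argument, are assumptions made only for this illustration.

```scheme
(define (get-keyword-value args keyword default)
  (let ((kv (memq keyword args)))
    (if (and kv (>= (length kv) 2))
        (cadr kv)
        default)))

;; Stand-in for a real window constructor: return the settings
;; that would be used, in the order background, width, height.
(define (make-window . args)
  (list (get-keyword-value args #:bg "white")
        (get-keyword-value args #:width 800)
        (get-keyword-value args #:height 100)))

(make-window)                              ; => ("white" 800 100)
(make-window #:height 600 #:bg "black")    ; => ("black" 800 600)
```

Note that the keyword arguments can be given in any order, and any that are omitted fall back to their defaults.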
But you don't need to write get-keyword-value. The (ice-9
optargs) module provides a set of powerful macros that you can use to
implement keyword-supporting procedures like this:
(use-modules (ice-9 optargs))
(define (make-window . args)
(let-keywords args #f ((depth screen-depth)
(bg "white")
(width 800)
(height 100))
...))
|
Or, even more economically, like this:
(use-modules (ice-9 optargs))
(define* (make-window #:key (depth screen-depth)
(bg "white")
(width 800)
(height 100))
...)
|
For further details on let-keywords, define* and other
facilities provided by the (ice-9 optargs) module, see
23.2 Optional Arguments.
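A runnable sketch of both forms follows. As before, the body just returns the resolved settings as a list, which is an assumption for illustration; the procedure names make-window-a and make-window-b are likewise invented, and the depth argument is left out so that the example does not depend on an undefined screen-depth.

```scheme
(use-modules (ice-9 optargs))

;; define* with #:key declares keyword arguments and their defaults.
(define* (make-window-a #:key (bg "white") (width 800) (height 100))
  (list bg width height))

(make-window-a)                  ; => ("white" 800 100)
(make-window-a #:width 1024)     ; => ("white" 1024 100)

;; let-keywords does the same job on an explicit rest argument.
;; The #f means that unknown keywords are not allowed.
(define (make-window-b . args)
  (let-keywords args #f ((bg "white") (width 800) (height 100))
    (list bg width height)))

(make-window-b #:height 600)     ; => ("white" 800 600)
```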
Guile, by default, only recognizes the keyword syntax specified by R5RS.
A token of the form #:NAME, where NAME has the same syntax
as a Scheme symbol (see section 21.6.6 Extended Read Syntax for Symbols), is the external
representation of the keyword named NAME. Keyword objects print
using this syntax as well, so values containing keyword objects can be
read back into Guile. When used in an expression, keywords are
self-quoting objects.
If the keyword read option is set to 'prefix, Guile also
recognizes the alternative read syntax :NAME. Otherwise, tokens
of the form :NAME are read as symbols, as required by R5RS.
To enable and disable the alternative non-R5RS keyword syntax, you use
the read-options procedure documented in 33.1 General option interface and 33.2 Reader options.
(read-set! keywords 'prefix)
#:type => #:type
:type => #:type
(read-set! keywords #f)
#:type => #:type
:type
-|
ERROR: In expression :type:
ERROR: Unbound variable: :type
ABORT: (unbound-variable)
|
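The effect of the option can also be observed from within a program by reading tokens from a string at run time, which avoids any dependence on when the surrounding source file itself was read. This is only a sketch: read-from-string is a hypothetical helper built on call-with-input-string and read.

```scheme
;; Hypothetical helper: read one datum from the string STR.
(define (read-from-string str)
  (call-with-input-string str read))

(read-set! keywords 'prefix)
(keyword? (read-from-string ":type"))    ; => #t, read as a keyword

(read-set! keywords #f)
(symbol? (read-from-string ":type"))     ; => #t, read as a symbol
(keyword? (read-from-string "#:type"))   ; => #t, always a keyword
```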
The following procedures can be used for converting symbols to keywords and back.
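As a small illustration, and assuming the conversion procedures meant here are Guile's symbol->keyword and keyword->symbol, a round trip looks like this:

```scheme
(symbol->keyword 'width)      ; => #:width
(keyword->symbol #:width)     ; => width

;; Converting there and back gives the original symbol.
(eq? 'width (keyword->symbol (symbol->keyword 'width)))   ; => #t
(keyword? (symbol->keyword 'width))                       ; => #t
```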
Internally, a keyword is implemented as something like a tagged symbol,
where the tag identifies the keyword as being self-evaluating, and the
symbol, known as the keyword's dash symbol, has the same name as
the keyword name but prefixed by a single dash. For example, the
keyword #:name has the corresponding dash symbol -name.
Most keyword objects are constructed automatically by the reader when it
reads a token beginning with #:. However, if you need to
construct a keyword object programmatically, you can do so by calling
make-keyword-from-dash-symbol with the corresponding dash symbol
(as the reader does). The dash symbol for a keyword object can be
retrieved using the keyword-dash-symbol procedure.
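Scheme Procedure: keyword? obj
Return #t if the argument obj is a keyword, else #f.
Scheme Procedure: make-keyword-from-dash-symbol symbol
Make a keyword object from a symbol that starts with a dash.
Scheme Procedure: keyword-dash-symbol keyword
Return the dash symbol for keyword. This is the inverse of make-keyword-from-dash-symbol.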
Procedures and macros are documented in their own chapter: see 23. Procedures and Macros.
Variable objects are documented as part of the description of Guile's module system: see 31.5 Variables.
Asyncs, dynamic roots and fluids are described in the chapter on scheduling: see 32. Threads, Mutexes, Asyncs and Dynamic Roots.
Hooks are documented in the chapter on general utility functions: see 24.6 Hooks.
Ports are described in the chapter on I/O: see 27. Input and Output.