Compellingly Beautiful Software
Products Programming Contact
"Life is like a box of chocolates: you never know
what you're going to get."
- Forrest Gump
Programmers hate Forrest Gump's proverbial box of chocolates: we need to know what we're going to get. One of the first steps toward this goal is to agree on what things mean when we say them to each other.
By junior high school, or sometimes earlier, students learn about the standard operator precedence hierarchy used for arithmetic and algebraic expressions. For example, we all know that
a+bcd
means "Raise b to the c power, multiply the result by d, and add a to that." Your teachers taught you this, and expected you to learn it. If you're a professional programmer, you probably learned this quite well.
One of the standard tricks used to help students is to provide them with mnemonic devices, such as
Please
Excuse
My Dear
Aunt Sally
indicating that things inside Parentheses are done before Exponentiation, which are done before Multiplication and Division, which are done before Addition and Subtraction. Of course, this left you wondering what behavior we are excusing Aunt Sally for, but eventually you encountered Functions, which perhaps gave you some idea, resulting in mnemonics you couldn't say in class.
But there are really two parts to this Aunt Sally business. There is
the Aunt Sally table shown above,
and then
there is the process for using it.
The Aunt Sally process is
usually described something like this:
We need a name for this process.
We might reasonably call it the multi-pass parsing algorithm.
But, for the remainder of this discussion, I'll simply refer to it as
the old way.
As we all know, to really foul things up requires a computer,
which will eventually lead us to
the new way.
People who design computer programming languages discover that there are lots of choices to be made. These languages are usually designed by people with strong mathematics backgrounds. It's not too surprising, then, to see something like Aunt Sally popping up in most of them.
One design constraint of most programming
languages is that one can write them as a linear sequence of
the characters found on standard typewriter keyboards. Another common
feature of these languages is multi-character variable names.
The first of these constraints makes exponents-as-superscripts infeasible,
and the second makes implicit multiplication a really bad idea.
But the resulting scripts resemble standard mathematics closely enough
that one can learn to read most of them fairly quickly. In most languages,
* and / represent multiplication and
division, and in some languages, ^
is used as an explicit exponentiation operator. We will use
these symbols for the remainder of this discussion. If we render the
Aunt Sally table in operator symbols, it looks like this:
()
^
* /
+ -
But what about the process of using the Aunt Sally table?
People who implement programming languages have many choices to make,
also, and one of the first discoveries one makes
is that the algorithmic processing of language is harder than it looks.
For example, implementing the old way,
the multi-pass algorithm described a few paragraphs back,
as a computer program in its own right is a really ugly task.
As far as I am aware, no actual programming language implementation uses the multi-pass parsing algorithm taught in math classes.
But there is another way to use the Aunt Sally table, and it is much easier to implement. This other process is the new way. It is a single-pass parsing algorithm. Rather than scanning the expression many times, the new way only has to read it once.You're probably wondering if there's any difference in the results
of these two algorithms. Before we answer this question, let's take a
brief look at the nature of mathematics and of programming, which are
certainly related, but are not really the same thing.
Most of mathematics is stateless. If, for example, you're given an algebraic statement to solve for x, that statement is a constraint, and your job is to find the numbers you can use in place of x that satisfy that constraint. Solving a math problem may involve many intermediate expressions, but these are incidental to the solution. You may use some definite process (that is, an algorithm) for solving the constraint, but nothing about the constraint is changing as you are solving it, even if you're changing its visible form according to algebraic rules. This is the reason for learning the rules of algebra: so that we can safely transform these constraints into successively simpler but nevertheless equivalent forms.
Much of programming, on the other hand, is about state, and expressing a precise order of changing states is what most programmers do for a living. For this reason, it's important to understand exactly what your programming language implementation is doing with the language you're writing.
Now let's answer your question about the equivalence of the old way and the new way. We actually need two answers, one for mathematics, and one for programming. Do these two evaluation algorithms yield the same results?
In mathematics: always.
In programming: most of the time.
Are you nervous yet?
Because most programming is not stateless, some expressions in computer programs yield different results when these two ways of using the Aunt Sally table are applied. Several times in my long career in software development, colleagues have come to me with program bugs that were incomprehensible to them. They'd been over their code time after time, trying to find whatever was causing results that simply defied their carefully-analyzed expectations. By the time they came to me, they usually had concluded they'd found a bug in the language implementation. But the problem turned out to be their use of the old way of using the Aunt Sally table, which they had learned in junior high school, and which had been restated for them by their college professors in programming classes.
This has to change. Computer programs are not supposed to be like Forrest Gump's box of chocolates. When we write a piece of code, we need to know what we're going to get.
So what is this new way? It's really very simple, although it's different enough from the old way that it may take you a little while to get used to it.
The old way is based on scanning the entire expression several times,
looking for operators with successively lower precedence in each pass.
The new way is focused on the question
"How does the next operator relate to the one I'm looking at right now?"
As an example, let's look at
{ 2 + 3 * 4 - 5 }
Note that I've introduced a couple of symbols you don't usually see
in expressions: the curly braces at the beginning and end. These are
here solely for the purpose of explaining the new way.
Once you understand it, you won't need them anymore. Meanwhile, we'll
add them to the Aunt Sally table:
^
* /
+ -
}
{
In the new algorithm, we're going to step from one operator to the next, starting with the "{", which you can think of as sort-of-an operator. Always keep in mind the notion of where you're standing now. In order to make a step, we have to move up in the precedence table. Let's evaluate {2+3*4-5}.
In the example we've just completed, the order of operations is the same whether we use the new way or the old way. Here's an example where it isn't:
{
2 * 3 + 4 * 5 + 6 * 7
}
A B C D E
These
are labels, so we can identify the
operators and talk about them.
Using the old way, the order of operations is ACEBD. Using the new way, the order of operations is ACBED, but the final result is the same. In stateless mathematics, the final result is always the same. But when you encounter state-dependent expressions in computer programs, the results can be different.
You may have noticed that parentheses didn't appear in the augmented table shown above. Parentheses can be included in such a table, but it becomes a somewhat more complicated table. Once you grasp the idea of always stepping up, it's easier to handle parentheses using these explicit rules:
What does it mean to "evaluate" a right parenthesis? It means removing itself and its corresponding left parenthesis from the expression, leaving the intermediate operand they enclosed in place.
What about functions and unary sign operators? We'll add them to the
table in just the way you would expect:
functions
unary sign operators
^
* /
+ -
Some programming languages have precedence tables that are much deeper
than this, for example, C, C++, and Java. The Smalltalk programming
language has a very shallow precedence hierarchy for its very different
syntax. But the new way presented here works for all of these
languages and a great many more.
There is one remaining question that is unresolved. The meaning of a^b^c has never been entirely clear, depending on how ^ associates. If it associates to the left, then a^b^c means (a^b)^c, which is equivalent to a^(b*c). This is a less strenuous operation than the right association a^(b^c). Some people claim that, if one means the simpler thing, one should write the simpler thing, but there continues to be no general agreement. Because of this, if the language you are using has an operator for exponentiation, you need to ask what the associativity of that operator is for your implementation. If it is right-associative, then stepping from ^ to ^ is always a step up in the new way. For the remainder of this discussion, however, you may assume that exponentiation is nothing special and associates to the left.
Here is a Java applet that demonstrates the new way. Enter an expression in the text field at the top. As you enter it, it will be incrementally evaluated by using the Aunt Sally table the new way. You will also see a record of the expression you have entered, and a display of the parser's postfix output, which shows you the order in which things were actually evaluated. Postfix is a notation similar to the keystroke sequence of a Hewlett-Packard RPN calculator. The usual unary and binary arithmetic operators and parentheses are implemented, but functions, which require more advanced lexical analysis, are not. Integers and decimal fractions are supported, but numeric entry in scientific notation, for example, 6.02E23, is not.
This implementation is intended to be error-proof. If you do something that is syntactically illegal, you'll hear a beep and see an error message.
To end an expression, use the return key, not the = key.
As a convenience,
if you have one or more open subexpressions and are otherwise at the end
of your expression, such as
2^(3*(4+5,
you can automatically close all subexpressions and end the expression
simply by hitting return. The correct number of right parentheses
will be filled in for you.
An interesting feature of this implementation is the invisible space operator. Entering a space character evaluates the rightmost pending operator. It lets you see intermediate operands in an expression such as 2+3*4^5 that would otherwise appear to evaluate all at once. It does not change the meaning of an expression, nor will it let you change the meaning of an expression. For example, if you enter 2+3*4 followed by two space characters, you will see 14 in the display. At this point, the next operator must have a precedence that is no higher than +, which was the last operator evaluated. Allowing a higher-precedence operator at this point would change the meaning of the expression, so you'll see an error message if you try to do this. Under no circumstances will the space operator penetrate an unmatched left parenthesis.
Here's another interesting feature: try hitting the backspace key. This
implementation is completely reversible,
all the way back to the beginning of an expression,
including the effects of the space operator.
The old way of evaluating expressions works well and is probably preferable for stateless mathematics, but it is inadequate for understanding real-world computer programs. If you are a professional programmer, now is a good time to learn the new way. If you are a teacher of programming languages, now is a good time to update your curriculum.
When using the applet, if you see an error message beginning with the word BUG, the error was mine, not yours. Please notify me if you see such a message, or if you see other behavior that you're sure is wrong. Include the expression you were trying to evaluate, and any other pertinent information, such as whether you were going backwards using the backspace key at the time. I would prefer to give you an email link in this page, but spammers love those, and I already receive over 10000 spam messages every day, so you'll have to hand-translate this email address into the header of your message to me: abell at brising dotcom. Use AUNT SALLY as your subject line.
This applet uses IEEE-754 binary arithmetic, which does not always conform to the expectations of classical mathematicians, but painful experience has shown this behavior to be a computational necessity. The kind of arithmetic used is unrelated to the behavior of the parsing process.
This applet was developed and compiled with JDK 1.4.2, but it seems to work with newer JREs, and maybe even backward into 1.1.?. The applet and this page are Copyright (C) 2003 brising.com, All Rights Reserved. You are free to use it in the context of this website.