This may be considered my response to Mitch Bradley's implied challenge to come up with a Forth idea that wasn't tried before (or at least that's how I took it... =) ). I have tried to implement in ANS Forth a condition and restart system in the vein of Common Lisp, Dylan, and Racket. If that seems too theoretical, the C2 wiki also points out that the DOS Abort/Retry/Ignore prompt is unimplementable on top of exception systems in every modern language except those three. The best high-level overview of the idea known to me is in Practical Common Lisp, and its originator Kent Pitman also wrote an in-depth discussion of the issues and design choices. This text will instead proceed from the lowest level up, describing parts of the code in corresponding sections.
There's nothing particularly non-obvious about this code; given an active imagination, one could even claim it does nothing but mix the well-known ideas of error callbacks and non-local exits in a particular way, any obscure Lisp influences notwithstanding. I still think it's remarkable how this particular combination of these ideas achieves a decoupling of error policy and mechanisms in a way that conventional exception systems can't.
These first two parts are essentially workarounds for inflexibilities in ANS Forth. These should be completely straightforward to implement with carnal knowledge of the system, but are rather awkward and inefficient in the portable version I give.
In ANS Forth, it's impossible to access earlier frames on the return
stack except by popping until the desired element. This part implements
what's essentially a parallel return stack, with >F
and F>
in place
of >R
and R>
, but also FP@ ( -- fp )
to save the current frame
pointer and @F ( fp n -- )
to fetch the n
th last cell pushed before
fp
was saved. There is no supposition here that the frame pointer
fp
is in fact an address usable with the usual memory access words---
it could just as well be an offset from FP0
, for example. The client
code is also not supposed to know that the frame stack is distinct from
the return stack, so this can be turned into actually usable code as a
part of a Forth system.
It puzzles me that THROW
and CATCH
, the only non-local control
transfers provided by ANS Forth, as well as both of the early proposals
(ALERT
/EXCEPT
/RESUME
of Guy and Rayburn, which uses a more
convenient syntax, and EXCEPTION
/TRAP
of Roye, which is
essentially THROW
/CATCH
by another name) insist on restoring the
data stack pointer. This is perhaps suitable if they are to be used as
an error-handling construct exactly as envisioned by the authors, but
completely inappropriate if an arbitrary amount of data needs to be
passed during the transfer, as here.
To add insult to injury, ANS-style stack-restoring THROW
and CATCH
are straightforwardly, if a tad inefficiently, implementable in terms of
stack-preserving ones using DEPTH
, while the opposite requires no
small amount of contortions. It is possible, though, using auxiliary
global storage, so that is what this part does. It also takes the
opportunity to save and restore the frame pointer, to make the "parallel
return stack" actually track the real one.
The data stack depth can be saved on the return stack and restored back
manually by MARK
and TRIM
, but only if it has become greater in the
meantime.
A final word of warning: my implementations of THROW ( -* )
and
CATCH ( ... xt -- ... f )
use ANS throw code 1 and pass through any
others. However, this is to be understood more as a debugging aid than
a serious attempt at interoperability: an ANS exception passing through
a condition handling construct will still break things. Actual
interoperability is possible, but would disrupt the logical layering of
code that this presentation uses.
This part and the next one implement the logic for doing something when an unusual situation arises and for unwinding the stack if necessary. The general approach parallels the original 32-bit Windows SEH, though I hope my implementation is not that convoluted.
When a program needs to indicate that something unusual happened, it
puts one or several items describing the situation on the data stack
and calls SIGNAL
. This does not in itself relinquish control;
instead, a callback routine is called normally. That routine can either
perform a non-local exit or return to the signaller (in case this was
only a warning and it got ignored, for example).
The callbacks are called responses and are installed using
RESPOND (... xt response-xt -- ... )
. A stack of responses is
maintained, and a response can give up and call the one up the stack
using PASS
. (In a real implementation PASS
would do a tail call.)
The stack of callbacks is maintained as linked frames on the frame
stack: the response xt is on top, and the pointer to the previous
response frame is below it. The pointer to the top frame is stored in
(would-be user) variable RESPONSE
, saved and restored by every
RESPOND
block. When a response is invoked, the pointer rf
to its
own response frame is put on the data stack on top of the information
provided by the signaller. The response can use it to retrieve
additional data from the frame or pass it to PASS ( rf -* )
.
(The slightly backwards frame structure, with the xt above the link, is
to make SIGNAL
completely oblivious of the links. I don't know if
it's worth it.)
Unlike other systems, this one does not implement the Common Lisp "condition firewall" or give any other special treatment to signalling inside a response. A response just regular code executing in the dynamic scope of the signaller.
A response does not have to perform a non-local exit, but if an error
was signalled, it will probably want to, whether to retry the operation
from some predetermined point, to return a substitute value from it, or
to abort it. The non-local exit provided by THROW
and would be
suitable here, were it not so indiscriminate: THROW
passes control to
the closest CATCH
, and that's it.
Thus, ESCAPE
and RESUME
implement on top of Forth THROW
and
CATCH
a capability that is closer to their Common Lisp and
MACLISP counterparts: an exit point established with
RESUME ( ... xt -- ... f )
is uniquely identified by its tag, which
is passed to xt
on top of the data stack, and ESCAPE ( tag -* )
requires the tag of the exit it should take.
The tag is implemented as the frame pointer at the moment RESUME
is
entered. The actual implementation of ESCAPE
is then simply THROW
,
while RESUME
, after performing a CATCH
, compares the tag on top of
the data stack with the current frame pointer and either exits to user
code or reTHROW
s. Unfortunately, this protocol has to more or less
monopolize THROW
and CATCH
. A notable exception is cleanup code
that must be executed every time control goes out of a protected block
of code: this cannot be implemented with ESCAPE
and RESUME
and must
instead be done using THROW
and CATCH
themselves:
protected-xt CATCH ( cleanup code ) IF THROW THEN
The cleanup code cannot rely on the state of the data stack, of course, but things can be passed to it on the return stack instead.
Obscure prior work note: The Common Lisp implementation of this is
more limited, as described in X3J13 issue EXIT-EXTENT. Let a
RESUME
called A be established first, then inside it a RESUME
called
B, then inside that a cleanup CATCH
as above called C, then let the
code inside that escape to A, so the cleanup code of C is now executing.
The issue is whether that cleanup code can escape to B, which is closer
in the call stack than the original A it is supposed to continue to.
Common Lisp allows systems to prohibit this (proposal MINIMAL in the
issue writeup), but this implementation allows it (proposal MEDIUM). I
think the latter is cleaner, because exit scope is dynamic scope, and
dynamic scope is dynamic scope is dynamic scope, but apparently the
Common Lisp implementers disagreed.
What is implemented up to this point is not a complete condition and restart system, but only a foundation for one: while there is a mechanism to receive data about an unusual situation and accept or decline to handle it, there is no agreed protocol for doing so. Similarly, while there is a way to escape to a chosen resumption point, possibly passing some data on the stack, there is no protocol for performing the choice. This is what remains to be done: in SysV ABI terms, the "personality".
Following common terminology, I call unusual situations that are handled
by the system conditions, the responses to them handlers, and the
ways of recovery from those situations restarts. Condition data is
put on the data stack before calling SIGNAL
, and each handler uses the
contents of its frame to decide whether to handle or PASS
. A restart
is simply a condition whose handler, by convention, immediately escapes
to the restart code, leaving the restart data intact on the data stack.
In both cases, more specific data should be placed below less specific,
so that the handler or restart code can still work even if it is only
prepared to handle a less specific kind of condition or restart.
What we need, then, is a way of organizing the various kinds of conditions and restarts in a hierarchy by specificity. After thinking about this for some time, I couldn't come up with anything better than just doing a simple single-inheritance, prototype-based object system. The way it is used is a bit unusual, though: the objects are statically allocated and represent not the conditions or restarts themselves, but the kinds of condition or restart, so I call it a class system instead of an object system.
To signal a particular condition, put the data corresponding to that
condition on the data stack, topped off by the condition class, then
execute SIGNAL
. The innermost handler for which the signalled
condition class EXTENDS
the handled one will handle it. The protocol
for restarts is similar and will be described later.
We see that very little is required of classes: we must be able to tell
whether one class EXTENDS
another one, and strictly speaking that's
it. I've also included rudimentary support for slots, that is, values
associated with a class that can be overriden by an extending class. A
slot that stores an xt can be used like a method in a traditional object
system. When used in a condition or restart class, methods will want to
inspect the data left below the class itself, but, contrary to the usual
Forth convention, they should not consume it: a method cannot know if
there are more items on the stack than it expects, and the user cannot
know how many items a method expects, unless they know in advance what
the exact class is, which would defeat the point of classes.
(Of course, there could be a standard DUPLICATE
method or a SIZE
slot. I don't think that would improve things compared to just not
consuming the data.)
The protocol for signalling conditions has already been described above:
put the condition data and condition class on the stack, then call
SIGNAL
. To handle subclasses of a class c
using handler-xt
during
execution of xt
, call HANDLE ( ... xt handler-xt c -- ... )
. The
handler receives the pointer to its own frame on top of the class; the
last cell put on the frame stack before HANDLE
is accessible using
4 @F
(and so on for earlier cells). Returning from the handler causes
execution to resume after SIGNAL
.
Slot 1 (or >UNHANDLED
) of every condition specifies the default
handler for that condition, that will be executed if no handler on the
stack accepts responsibility for it. The default handler, unlike the
usual ones, is invoked with the condition class on top of the stack.
Slot 2 (>DISPLAY
) should contain the xt that prints the condition data
on the console in a user-readable form (leaving the stack intact).
Additionally, the description of every condition should specify whether
a handler is allowed to return or whether it has to invoke a restart.
The base class for bug and error conditions is ?
. Any classes
extending it should have names ending with ?
and describing the object
or action that encountered a problem, not the problem itself, because
there is usually less arbitrary choice in naming the former: for
example, I/O?
, not I/O-ERROR?
; NAME?
, not NAME-NOT-FOUND?
.
For now, a handler that wants to escape to code up the call stack can be
established by first establishing a RESUME
, saving the tag to the
handler frame, and using it to ESCAPE
when needed. For handlers that
always want to escape before doing anything else, this operation is
packaged into a single word, RESTART ( ... xt c -- ... f )
. Restarts
are simply conditions that are by convention always handled using
RESTART
. The base class for restarts is RESTART?
. It extends ?
so as to also serve as a bug condition indicating that an unavailable
restart was signalled.
Prior work note: The idea to make a restart a kind of condition is
originally from Dylan; in Common Lisp, restarts are disjoint from
conditions. However, the functionality provided by RESTART
is
desirable in any system, but awkward to implement using restarts as the
sole stack unwinding mechanism, while the reverse is straigtforward.
This implementation uses the simpler Dylan approach.
The example demonstrates a DOS-style Abort/Retry/Ignore
prompt, with even more modularity than in DOS: the simulated SYSTEM
provides the prompt, but has no knowledge of the available recovery
options; the simulated SHELL
and I/O subsystem provide the options,
and the simulated APPLICATION
has no knowledge of either. A serious
attempt at an interactive debugger would probably look a lot like
SYSTEM
, except it could use system-specific knowledge to implement a
full debugging REPL. Alternatively, the list of restarts could be
incorporated into the a return stack trace.
Unlike the rest of the code, the example doesn't limit itself to the ANS CORE wordset; it uses quotations from Forth 200x and INCLUDED from FILE.
This is a proof of concept. The organization and naming of the words should generally be sound, but I can only design a limited part of a system without actually using it first. The following is a list of design issues I can see but have decided not to sidestep in this sketch:
-
The condition hierarchy needs to be worked out. There need to be base classes for warnings, errors, bugs, implementation limitations. The distinction between bugs and (runtime) errors is particularly important, and many languages with exception systems (C++, Java, Python...) get it wrong. It is not clear to me if errors should be a particular case of bugs, as in the example, or if the default handler for an error should signal a bug; similarly for restarts. It seems that a handler for bugs would usually be installed by an interactive toplevel of some sort, so it would make sense for it to also be an absolute barrier for any conditions or restarts. On the other hand, there can also be fatal conditions that are not in any sense bugs, like
SystemExit
in Python orThreadInterruptedException
in Java. -
Restarts being conditions still feels more like an accident than a coherent choice; I can't think of a case where a hierarchy of restarts could possibly be useful.
-
The set of methods on a condition class is a sketch. Unfortunately, no methods can be added to a class once it has been defined, so things like
DISPLAY
will have to be baked in. -
An actual implementation of frames using the return stack will need to carefully specify which words that create frames (like
HANDLE
) make them contiguous with whatever user code put there before. -
Even if frame pointers are not actual addresses, a specification might want to declare that they are disjoint from any addresses pointing to user-created areas. (ANS Forth doesn't do this even for xts, though.)
-
I don't know how to frame pointers should be passed around. Wrapping code like
(HANDLE)
might want to consume the relevant data on the frame and advance the frame pointer it passes to underlying code to a point lower on the stack. This probably composes better, but means that, for example, a handler will not be able to callPASS
. -
Restarts currently don't have user-accessible frames that could be used without passing control to the restart, like
LIST-RESTARTS
does in the example. This may or may not be a good idea. Restart code or methods will also want to access the condition it's handling; the calling convention requires some thought. -
It's not clear to me that the approach the example uses to enumerate available restarts is the best one. A separate stack containing only the restarts is also an option, but then a restart barrier, like a handler for
?
, will need to manipulate it explicitly to ensure introspection works correctly. -
Words are needed for inspecting and walking the response stack (at least equivalent to those used in the example). I'm not aware of any general approaches to looping over arbitrary structures in Forth (unlike its more Lispish cousins like Joy or Factor), so this will have to be postponed for now.
-
The layout of slots and frames is too fragile and requires far too much boilerplate in user code. It might make sense to use some sort of structure lexicon to simplify this, but there is a certain appeal to defining classes in one pass instead of compiling default data first and overwriting it later.
-
The syntax in general is downright miserable to use without quotations and still awkward with them, similar to ANS Forth exceptions. A syntax more in line with usual Forth control structures, like that of the
ALERT
/EXCEPT
/RESUME
proposal, might be better, even if it's unimplementable in terms of ANS words. On the other hand, the requirement to pack things into a separate xt serves nicely to enforce return stack separation that's in any case required by the words. (In this sense, it's the syntax ofDO
/LOOP
that is the mistake.)