Part XIII: Procedures - 27 August 1989
Introduction
At last we get to the good part!
At this point we’ve studied almost all the basic features of compilers and parsing. We have learned how to translate arithmetic expressions, Boolean expressions, control constructs, data declarations, and I/O statements. We have defined a language, TINY 1.3, that embodies all these features, and we have written a rudimentary compiler that can translate them. By adding some file I/O we could indeed have a working compiler that could produce executable object files from programs written in TINY. With such a compiler, we could write simple programs that could read integer data, perform calculations with it, and output the results.
That’s nice, but what we have is still only a toy language. We can’t read or write even a single character of text, and we still don’t have procedures.
It’s the features to be discussed in the next couple of installments that separate the men from the toys, so to speak. “Real” languages have more than one data type, and they support procedure calls. More than any others, it’s these two features that give a language much of its character and personality. Once we have provided for them, our languages, TINY and its successors, will cease to be toys and will take on the character of real languages, suitable for serious programming jobs.
For several installments now, I’ve been promising you sessions on these two important subjects. Each time, other issues came up that required me to digress and deal with them. Finally, we’ve been able to put all those issues to rest and can get on with the mainstream of things. In this installment, I’ll cover procedures. Next time, we’ll talk about the basic data types.
One Last Digression
This has been an extraordinarily difficult installment for me to write. The reason has nothing to do with the subject itself … I’ve known what I wanted to say for some time, and in fact I presented most of this at Software Development ’89, back in February. It has more to do with the approach. Let me explain.
When I first began this series, I told you that we would use several “tricks” to make things easy, and to let us learn the concepts without getting too bogged down in the details. Among these tricks was the idea of looking at individual pieces of a compiler at a time, i.e. performing experiments using the Cradle as a base. When we studied expressions, for example, we dealt with only that part of compiler theory. When we studied control structures, we wrote a different program, still based on the Cradle, to do that part. We only incorporated these concepts into a complete language fairly recently. These techniques have served us very well indeed, and led us to the development of a compiler for TINY version 1.3.
When I first began this session, I tried to build upon what we had already done, and just add the new features to the existing compiler. That turned out to be a little awkward and tricky … much too much to suit me.
I finally figured out why. In this series of experiments, I had abandoned the very useful techniques that had allowed us to get here, and without meaning to I had switched over into a new method of working, that involved incremental changes to the full TINY compiler.
You need to understand that what we are doing here is a little unique. There have been a number of articles, such as the Small C articles by Cain and Hendrix, that presented finished compilers for one language or another. This is different. In this series of tutorials, you are watching me design and implement both a language and a compiler, in real time.
In the experiments that I’ve been doing in preparation for this article, I was trying to inject the changes into the TINY compiler in such a way that, at every step, we still had a real, working compiler. In other words, I was attempting an incremental enhancement of the language and its compiler, while at the same time explaining to you what I was doing.
That’s a tough act to pull off! I finally realized that it was dumb to try. Having gotten this far using the idea of small experiments based on single-character tokens and simple, special-purpose programs, I had abandoned them in favor of working with the full compiler. It wasn’t working.
So we’re going to go back to our roots, so to speak. In this installment and the next, I’ll be using single-character tokens again as we study the concepts of procedures, unfettered by the other baggage that we have accumulated in the previous sessions. As a matter of fact, I won’t even attempt, at the end of this session, to merge the constructs into the TINY compiler. We’ll save that for later.
After all this time, you don’t need more buildup than that, so let’s waste no more time and dive right in.
The Basics
All modern CPUs provide direct support for procedure calls, and the 68000 is no exception. For the 68000, the call is a BSR (PC-relative version) or JSR, and the return is RTS. All we have to do is to arrange for the compiler to issue these commands at the proper place.
Actually, there are really three things we have to address. One of them is the call/return mechanism. The second is the mechanism for defining the procedure in the first place. And, finally, there is the issue of passing parameters to the called procedure. None of these things are really very difficult, and we can of course borrow heavily on what people have done in other languages … there’s no need to reinvent the wheel here. Of the three issues, that of parameter passing will occupy most of our attention, simply because there are so many options available.
A Basis for Experiments
As always, we will need some software to serve as a basis for what we are doing. We don’t need the full TINY compiler, but we do need enough of a program so that some of the other constructs are present. Specifically, we need at least to be able to handle statements of some sort, and data declarations.
The program shown below is that basis. It’s a vestigial form of TINY, with single-character tokens. It has data declarations, but only in their simplest form … no lists or initializers. It has assignment statements, but only of the kind
<ident> = <ident>
In other words, the only legal expression is a single variable name. There are no control constructs … the only legal statement is the assignment.
Most of the program is just the standard Cradle routines. I’ve shown the whole thing here, just to make sure we’re all starting from the same point:
{--------------------------------------------------------------}
program Calls;
{--------------------------------------------------------------}
{ Constant Declarations }
const TAB = ^I;
CR = ^M;
LF = ^J;
{--------------------------------------------------------------}
{ Variable Declarations }
var Look: char; { Lookahead Character }
var ST: Array['A'..'Z'] of char;
{--------------------------------------------------------------}
{ Read New Character From Input Stream }
procedure GetChar;
begin
Read(Look);
end;
{--------------------------------------------------------------}
{ Report an Error }
procedure Error(s: string);
begin
WriteLn;
WriteLn(^G, 'Error: ', s, '.');
end;
{--------------------------------------------------------------}
{ Report Error and Halt }
procedure Abort(s: string);
begin
Error(s);
Halt;
end;
{--------------------------------------------------------------}
{ Report What Was Expected }
procedure Expected(s: string);
begin
Abort(s + ' Expected');
end;
{--------------------------------------------------------------}
{ Report an Undefined Identifier }
procedure Undefined(n: string);
begin
Abort('Undefined Identifier ' + n);
end;
{--------------------------------------------------------------}
{ Report a Duplicate Identifier }
procedure Duplicate(n: string);
begin
Abort('Duplicate Identifier ' + n);
end;
{--------------------------------------------------------------}
{ Get Type of Symbol }
function TypeOf(n: char): char;
begin
TypeOf := ST[n];
end;
{--------------------------------------------------------------}
{ Look for Symbol in Table }
function InTable(n: char): Boolean;
begin
InTable := ST[n] <> ' ';
end;
{--------------------------------------------------------------}
{ Add a New Symbol to Table }
procedure AddEntry(Name, T: char);
begin
if InTable(Name) then Duplicate(Name);
ST[Name] := T;
end;
{--------------------------------------------------------------}
{ Check an Entry to Make Sure It's a Variable }
procedure CheckVar(Name: char);
begin
if not InTable(Name) then Undefined(Name);
if TypeOf(Name) <> 'v' then Abort(Name + ' is not a variable');
end;
{--------------------------------------------------------------}
{ Recognize an Alpha Character }
function IsAlpha(c: char): boolean;
begin
IsAlpha := upcase(c) in ['A'..'Z'];
end;
{--------------------------------------------------------------}
{ Recognize a Decimal Digit }
function IsDigit(c: char): boolean;
begin
IsDigit := c in ['0'..'9'];
end;
{--------------------------------------------------------------}
{ Recognize an AlphaNumeric Character }
function IsAlNum(c: char): boolean;
begin
IsAlNum := IsAlpha(c) or IsDigit(c);
end;
{--------------------------------------------------------------}
{ Recognize an Addop }
function IsAddop(c: char): boolean;
begin
IsAddop := c in ['+', '-'];
end;
{--------------------------------------------------------------}
{ Recognize a Mulop }
function IsMulop(c: char): boolean;
begin
IsMulop := c in ['*', '/'];
end;
{--------------------------------------------------------------}
{ Recognize a Boolean Orop }
function IsOrop(c: char): boolean;
begin
IsOrop := c in ['|', '~'];
end;
{--------------------------------------------------------------}
{ Recognize a Relop }
function IsRelop(c: char): boolean;
begin
IsRelop := c in ['=', '#', '<', '>'];
end;
{--------------------------------------------------------------}
{ Recognize White Space }
function IsWhite(c: char): boolean;
begin
IsWhite := c in [' ', TAB];
end;
{--------------------------------------------------------------}
{ Skip Over Leading White Space }
procedure SkipWhite;
begin
while IsWhite(Look) do
GetChar;
end;
{--------------------------------------------------------------}
{ Skip Over an End-of-Line }
procedure Fin;
begin
if Look = CR then begin
GetChar;
if Look = LF then
GetChar;
end;
end;
{--------------------------------------------------------------}
{ Match a Specific Input Character }
procedure Match(x: char);
begin
if Look = x then GetChar
else Expected('''' + x + '''');
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get an Identifier }
function GetName: char;
begin
if not IsAlpha(Look) then Expected('Name');
GetName := UpCase(Look);
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Get a Number }
function GetNum: char;
begin
if not IsDigit(Look) then Expected('Integer');
GetNum := Look;
GetChar;
SkipWhite;
end;
{--------------------------------------------------------------}
{ Output a String with Tab }
procedure Emit(s: string);
begin
Write(TAB, s);
end;
{--------------------------------------------------------------}
{ Output a String with Tab and CRLF }
procedure EmitLn(s: string);
begin
Emit(s);
WriteLn;
end;
{--------------------------------------------------------------}
{ Post a Label To Output }
procedure PostLabel(L: string);
begin
WriteLn(L, ':');
end;
{--------------------------------------------------------------}
{ Load a Variable to the Primary Register }
procedure LoadVar(Name: char);
begin
CheckVar(Name);
EmitLn('MOVE ' + Name + '(PC),D0');
end;
{--------------------------------------------------------------}
{ Store the Primary Register }
procedure StoreVar(Name: char);
begin
CheckVar(Name);
EmitLn('LEA ' + Name + '(PC),A0');
EmitLn('MOVE D0,(A0)')
end;
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
var i: char;
begin
GetChar;
SkipWhite;
for i := 'A' to 'Z' do
ST[i] := ' ';
end;
{--------------------------------------------------------------}
{ Parse and Translate an Expression }
{ Vestigial Version }
procedure Expression;
begin
LoadVar(GetName);
end;
{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }
procedure Assignment;
var Name: char;
begin
Name := GetName;
Match('=');
Expression;
StoreVar(Name);
end;
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure DoBlock;
begin
while not(Look in ['e']) do begin
Assignment;
Fin;
end;
end;
{--------------------------------------------------------------}
{ Parse and Translate a Begin-Block }
procedure BeginBlock;
begin
Match('b');
Fin;
DoBlock;
Match('e');
Fin;
end;
{--------------------------------------------------------------}
{ Allocate Storage for a Variable }
procedure Alloc(N: char);
begin
if InTable(N) then Duplicate(N);
ST[N] := 'v';
WriteLn(N, ':', TAB, 'DC 0');
end;
{--------------------------------------------------------------}
{ Parse and Translate a Data Declaration }
procedure Decl;
var Name: char;
begin
Match('v');
Alloc(GetName);
end;
{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }
procedure TopDecls;
begin
while Look <> 'b' do begin
case Look of
'v': Decl;
else Abort('Unrecognized Keyword ' + Look);
end;
Fin;
end;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
TopDecls;
BeginBlock;
end.
{--------------------------------------------------------------}
Note that we DO have a symbol table, and there is logic to check a variable name to make sure it’s a legal one. It’s also worth noting that I have included the code you’ve seen before to provide for white space and newlines. Finally, note that the main program is delimited, as usual, by BEGIN-END brackets.
Once you’ve copied the program to Turbo, the first step is to compile it and make sure it works. Give it a few declarations, and then a begin-block. Try something like:
va (for VAR A)
vb (for VAR B)
vc (for VAR C)
b (for BEGIN)
a=b
b=c
e. (for END.)
As usual, you should also make some deliberate errors, and verify that the program catches them correctly.
Declaring a Procedure
If you’re satisfied that our little program works, then it’s time to deal with the procedures. Since we haven’t talked about parameters yet, we’ll begin by considering only procedures that have no parameter lists.
As a start, let’s consider a simple program with a procedure, and think about the code we’d like to see generated for it:
PROGRAM FOO;
.
.
PROCEDURE BAR;                  BAR:
BEGIN                                   .
.                                       .
.                                       .
END;                                    RTS
BEGIN { MAIN PROGRAM }          MAIN:
.                                       .
.                                       .
BAR;                                    BSR BAR
.                                       .
.                                       .
END.                                    END MAIN
Here I’ve shown the high-order language constructs on the left, and the desired assembler code on the right. The first thing to notice is that we certainly don’t have much code to generate here! For the great bulk of both the procedure and the main program, our existing constructs take care of the code to be generated.
The key to dealing with the body of the procedure is to recognize that although a procedure may be quite long, declaring it is really no different than declaring a variable. It’s just one more kind of declaration. We can write the BNF:
<declaration> ::= <data decl> | <procedure>
This means that it should be easy to modify TopDecls to deal with procedures. What about the syntax of a procedure? Well, here’s a suggested syntax, which is essentially that of Pascal:
<procedure> ::= PROCEDURE <ident> <begin-block>
There is practically no code generation required, other than that generated within the begin-block. We need only emit a label at the beginning of the procedure, and an RTS at the end.
Here’s the required code:
{--------------------------------------------------------------}
{ Parse and Translate a Procedure Declaration }
procedure DoProc;
var N: char;
begin
Match('p');
N := GetName;
Fin;
if InTable(N) then Duplicate(N);
ST[N] := 'p';
PostLabel(N);
BeginBlock;
Return;
end;
{--------------------------------------------------------------}
Note that I’ve added a new code generation routine, Return, which merely emits an RTS instruction. The creation of that routine is “left as an exercise for the student.”
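For readers who would rather check their answer than do the exercise, here is a minimal sketch of Return. It is shown as a tiny stand-alone program so that it can be compiled on its own; the EmitLn here is a cut-down copy of the one in the listing above, and the program name ReturnDemo is just a placeholder.

```pascal
program ReturnDemo;

const TAB = ^I;

{ Cut-down copy of EmitLn from the listing: tab, string, newline }
procedure EmitLn(s: string);
begin
   WriteLn(TAB, s);
end;

{ Emit a return-from-subroutine instruction }
procedure Return;
begin
   EmitLn('RTS');
end;

{ Demonstration only: emit a single RTS }
begin
   Return;
end.
```

In the translator itself only the Return procedure is needed. Place it with the other code generation routines, somewhere ahead of DoProc.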
To finish this version, add the following line within the Case statement in TopDecls:
'p': DoProc;
I should mention that this structure for declarations, and the BNF that drives it, differs from standard Pascal. In the Jensen & Wirth definition of Pascal, variable declarations, in fact all kinds of declarations, must appear in a specific sequence, i.e. labels, constants, types, variables, procedures, and main program. To follow such a scheme, we should separate the two declarations, and have code in the main program something like
DoVars;
DoProcs;
DoMain;
However, most implementations of Pascal, including Turbo, don’t require that order and let you freely mix up the various declarations, as long as you still don’t try to refer to something before it’s declared. Although it may be more aesthetically pleasing to declare all the global variables at the top of the program, it certainly doesn’t do any harm to allow them to be sprinkled around. In fact, it may do some good, in the sense that it gives you the opportunity to do a little rudimentary information hiding. Variables that should be accessed only by the main program, for example, can be declared just before it and will thus be inaccessible to the procedures.
OK, try this new version out. Note that we can declare as many procedures as we choose (as long as we don’t run out of single-character names!), and the labels and RTSs all come out in the right places.
It’s worth noting here that I do not allow for nested
procedures. In TINY, all procedures must be declared at the
global level, the same as in C. There has been quite a
discussion about this point in the Computer Language Forum of
CompuServe. It turns out that there is a significant penalty in
complexity that must be paid for the luxury of nested procedures.
What’s more, this penalty gets paid at run time, because extra
code must be added and executed every time a procedure is called.
I also happen to believe that nesting is not a good idea, simply
on the grounds that I have seen too many abuses of the feature.
Before going on to the next step, it’s also worth noting that the “main program” as it stands is incomplete, since it doesn’t have the label and END statement. Let’s fix that little oversight:
{--------------------------------------------------------------}
{ Parse and Translate a Main Program }
procedure DoMain;
begin
Match('b');
Fin;
Prolog;
DoBlock;
Epilog;
end;
{--------------------------------------------------------------}
.
.
.
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
TopDecls;
DoMain;
end.
{--------------------------------------------------------------}
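Prolog and Epilog are not part of the listing given earlier in this installment. Here is a minimal sketch, assuming they need to produce nothing more than the MAIN: label and the END MAIN line shown in the target code above. A real program may well need more than that, depending on your assembler and operating system. EmitLn and PostLabel are cut-down copies of the routines in the listing.

```pascal
program PrologDemo;

const TAB = ^I;

{ Cut-down copies of the output routines from the listing }
procedure EmitLn(s: string);
begin
   WriteLn(TAB, s);
end;

procedure PostLabel(L: string);
begin
   WriteLn(L, ':');
end;

{ Write the Prolog: assumed here to be just the MAIN label }
procedure Prolog;
begin
   PostLabel('MAIN');
end;

{ Write the Epilog: assumed here to be just the assembler END statement }
procedure Epilog;
begin
   EmitLn('END MAIN');
end;

{ Demonstration only: an empty main program }
begin
   Prolog;
   Epilog;
end.
```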
Note that DoProc and DoMain are not quite symmetrical. DoProc uses a call to BeginBlock, whereas DoMain cannot. That’s because a procedure is signaled by the keyword PROCEDURE (abbreviated by a 'p' here), while the main program gets no keyword other than the BEGIN itself.
And that brings up an interesting question: why?
If we look at the structure of C programs, we find that all functions are treated just alike, except that the main program happens to be identified by its name, main. Since C functions can appear in any order, the main program can also be anywhere in the compilation unit.
In Pascal, on the other hand, all variables and procedures must be declared before they’re used, which means that there is no point putting anything after the main program … it could never be accessed. The “main program” is not identified at all, other than being that part of the code that comes after the global BEGIN. In other words, if it ain’t anything else, it must be the main program.
This causes no small amount of confusion for beginning programmers, and for big Pascal programs sometimes it’s difficult to find the beginning of the main program at all. This leads to conventions such as identifying it in comments:
BEGIN { of MAIN }
This has always seemed to me to be a bit of a kludge. The question comes up: Why should the main program be treated so much differently than a procedure? In fact, now that we’ve recognized that procedure declarations are just that … part of the global declarations … isn’t the main program just one more declaration, also?
The answer is yes, and by treating it that way, we can simplify the code and make it considerably more orthogonal. I propose that we use an explicit keyword, PROGRAM, to identify the main program (note that this means that we can’t start the file with it, as in Pascal). In this case, our BNF becomes:
<declaration> ::= <data decl> | <procedure> | <main program>
<procedure> ::= PROCEDURE <ident> <begin-block>
<main program> ::= PROGRAM <ident> <begin-block>
The code also looks much better, at least in the sense that DoMain and DoProc look more alike:
{--------------------------------------------------------------}
{ Parse and Translate a Main Program }
procedure DoMain;
var N: char;
begin
Match('P');
N := GetName;
Fin;
if InTable(N) then Duplicate(N);
Prolog;
BeginBlock;
end;
{--------------------------------------------------------------}
.
.
.
{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }
procedure TopDecls;
begin
while Look <> '.' do begin
case Look of
'v': Decl;
'p': DoProc;
'P': DoMain;
else Abort('Unrecognized Keyword ' + Look);
end;
Fin;
end;
end;
{--------------------------------------------------------------}
{ Main Program }
begin
Init;
TopDecls;
Epilog;
end.
{--------------------------------------------------------------}
Since the declaration of the main program is now within the loop of TopDecls, that does present some difficulties. How do we ensure that it’s the last thing in the file? And how do we ever exit from the loop? My answer for the second question, as you can see, was to bring back our old friend the period. Once the parser sees that, we’re done.
To answer the first question: it depends on how far we’re willing to go to protect the programmer from dumb mistakes. In the code that I’ve shown, there’s nothing to keep the programmer from adding code after the main program … even another main program. The code will just not be accessible. However, we could access it via a FORWARD statement, which we’ll be providing later. As a matter of fact, many assembler language programmers like to use the area just after the program to declare large, uninitialized data blocks, so there may indeed be some value in not requiring the main program to be last. We’ll leave it as it is.
If we decide that we should give the programmer a little more help than that, it’s pretty easy to add some logic to kick us out of the loop once the main program has been processed. Or we could at least flag an error if someone tries to include two mains.
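One way to flag a second main program is a Boolean flag that DoMain sets the first time through. The sketch below is a stripped-down stand-in, not the real DoMain: the flag name MainSeen and the error text are made up for illustration, and all the parsing has been left out so that only the guard logic remains.

```pascal
program MainGuard;

{ Hypothetical flag, not in the original listing: set once a main
  program has been processed }
var MainSeen: Boolean;

{ Cut-down copy of Abort from the listing: report and halt }
procedure Abort(s: string);
begin
   WriteLn('Error: ', s, '.');
   Halt;
end;

{ Stand-in for DoMain: only the duplicate check and the label remain }
procedure DoMain;
begin
   if MainSeen then Abort('Duplicate Main Program');
   MainSeen := True;
   WriteLn('MAIN:');
end;

{ Demonstration only: the second call trips the guard }
begin
   MainSeen := False;
   DoMain;
   DoMain;
end.
```

In the translator, the same two lines would go at the top of DoMain, and Init would be the natural place to clear the flag. To leave the loop in TopDecls as soon as the main program is done instead, the same flag could be tested in the while condition.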
Calling the Procedure
If you’re satisfied that things are working, let’s address the second half of the equation … the call.
Consider the BNF for a procedure call:
<proc_call> ::= <identifier>
For an assignment statement, on the other hand, the BNF is:
<assignment> ::= <identifier> '=' <expression>
At this point we seem to have a problem. The two BNF statements both begin on the right-hand side with the token <identifier>. How are we supposed to know, when we see the identifier, whether we have a procedure call or an assignment statement? This looks like a case where our parser ceases being predictive, and indeed that’s exactly the case. However, it turns out to be an easy problem to fix, since all we have to do is to look at the type of the identifier, as recorded in the symbol table. As we’ve discovered before, a minor local violation of the predictive parsing rule can be easily handled as a special case.
Here’s how to do it:
{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }
procedure Assignment(Name: char);
begin
Match('=');
Expression;
StoreVar(Name);
end;
{--------------------------------------------------------------}
{ Decide if a Statement is an Assignment or Procedure Call }
procedure AssignOrProc;
var Name: char;
begin
Name := GetName;
case TypeOf(Name) of
' ': Undefined(Name);
'v': Assignment(Name);
'p': CallProc(Name);
else Abort('Identifier ' + Name +
' Cannot Be Used Here');
end;
end;
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure DoBlock;
begin
while not(Look in ['e']) do begin
AssignOrProc;
Fin;
end;
end;
{--------------------------------------------------------------}
As you can see, procedure DoBlock now calls AssignOrProc instead of Assignment. The function of this new procedure is to simply read the identifier, determine its type, and then call whichever procedure is appropriate for that type. Since the name has already been read, we must pass it to the two procedures, and modify Assignment to match. Procedure CallProc is a simple code generation routine:
{--------------------------------------------------------------}
{ Call a Procedure }
procedure CallProc(N: char);
begin
EmitLn('BSR ' + N);
end;
{--------------------------------------------------------------}
Well, at this point we have a compiler that can deal with procedures. It’s worth noting that procedures can call procedures to any depth. So even though we don’t allow nested declarations, there is certainly nothing to keep us from nesting calls, just as we would expect to do in any language. We’re getting there, and it wasn’t too hard, was it?
Of course, so far we can only deal with procedures that have no parameters. The procedures can only operate on the global variables by their global names. So at this point we have the equivalent of BASIC’s GOSUB construct. Not too bad … after all, lots of serious programs were written using GOSUBs, but we can do better, and we will. That’s the next step.
Passing Parameters
Again, we all know the basic idea of passed parameters, but let’s review them just to be safe.
In general the procedure is given a parameter list, for example
PROCEDURE FOO(X, Y, Z)
In the declaration of a procedure, the parameters are called formal parameters, and may be referred to in the body of the procedure by those names. The names used for the formal parameters are really arbitrary. Only the position really counts. In the example above, the name 'X' simply means “the first parameter” wherever it is used.
When a procedure is called, the “actual parameters” passed to it are associated with the formal parameters, on a one-for-one basis.
The BNF for the syntax looks something like this:
<procedure> ::= PROCEDURE <ident>
'(' <param-list> ')' <begin-block>
<param-list> ::= <parameter> ( ',' <parameter> )* | null
Similarly, the procedure call looks like:
<proc call> ::= <ident> '(' <param-list> ')'
Note that there is already an implicit decision built into this syntax. Some languages, such as Pascal and Ada, permit parameter lists to be optional. If there are no parameters, you simply leave off the parens completely. Other languages, like C and Modula 2, require the parens even if the list is empty. Clearly, the example we just finished corresponds to the former point of view. But to tell the truth I prefer the latter. For procedures alone, the decision would seem to favor the “listless” approach. The statement
Initialize;
standing alone, can only mean a procedure call. In the parsers we’ve been writing, we’ve made heavy use of parameterless procedures, and it would seem a shame to have to write an empty pair of parens for each case.
But later on we’re going to be using functions, too. And since functions can appear in the same places as simple scalar identifiers, you can’t tell the difference between the two. You have to go back to the declarations to find out. Some folks consider this to be an advantage. Their argument is that an identifier gets replaced by a value, and what do you care whether it’s done by substitution or by a function? But we sometimes do care, because the function may be quite time-consuming. If, by writing a simple identifier into a given expression, we can incur a heavy run-time penalty, it seems to me we ought to be made aware of it.
Anyway, Niklaus Wirth designed both Pascal and Modula 2. I’ll give him the benefit of the doubt and assume that he had a good reason for changing the rules the second time around!
Needless to say, it’s an easy thing to accommodate either point of view as we design a language, so this one is strictly a matter of personal preference. Do it whichever way you like best.
Before we go any further, let’s alter the translator to handle a (possibly empty) parameter list. For now we won’t generate any extra code … just parse the syntax. The code for processing the declaration has very much the same form we’ve seen before when dealing with VAR-lists:
{--------------------------------------------------------------}
{ Process the Formal Parameter List of a Procedure }
procedure FormalList;
begin
Match('(');
if Look <> ')' then begin
FormalParam;
while Look = ',' do begin
Match(',');
FormalParam;
end;
end;
Match(')');
end;
{--------------------------------------------------------------}
Procedure DoProc needs to have a line added to call FormalList:
{--------------------------------------------------------------}
{ Parse and Translate a Procedure Declaration }
procedure DoProc;
var N: char;
begin
Match('p');
N := GetName;
FormalList;
Fin;
if InTable(N) then Duplicate(N);
ST[N] := 'p';
PostLabel(N);
BeginBlock;
Return;
end;
{--------------------------------------------------------------}
For now, the code for FormalParam is just a dummy one that simply skips the parameter name:
{--------------------------------------------------------------}
{ Process a Formal Parameter }
procedure FormalParam;
var Name: char;
begin
Name := GetName;
end;
{--------------------------------------------------------------}
For the actual procedure call, there must be similar code to process the actual parameter list:
{--------------------------------------------------------------}
{ Process an Actual Parameter }
procedure Param;
var Name: char;
begin
Name := GetName;
end;
{--------------------------------------------------------------}
{ Process the Parameter List for a Procedure Call }
procedure ParamList;
begin
Match('(');
if Look <> ')' then begin
Param;
while Look = ',' do begin
Match(',');
Param;
end;
end;
Match(')');
end;
{--------------------------------------------------------------}
{ Process a Procedure Call }
procedure CallProc(Name: char);
begin
ParamList;
Call(Name);
end;
{--------------------------------------------------------------}
Note here that CallProc is no longer just a simple code generation routine. It has some structure to it. To handle this, I’ve renamed the code generation routine to just Call, and called it from within CallProc.
OK, if you’ll add all this code to your translator and try it out, you’ll find that you can indeed parse the syntax properly. I’ll note in passing that there is no checking to make sure that the number (and, later, types) of formal and actual parameters match up. In a production compiler, we must of course do this. We’ll ignore the issue now if for no other reason than that the structure of our symbol table doesn’t currently give us a place to store the necessary information. Later on, we’ll have a place for that data and we can deal with the issue then.
The Semantics of Parameters
So far we’ve dealt with the syntax of parameter passing, and we’ve got the parsing mechanisms in place to handle it. Next, we have to look at the semantics, i.e., the actions to be taken when we encounter parameters. This brings us square up against the issue of the different ways parameters can be passed.
There is more than one way to pass a parameter, and the way we do it can have a profound effect on the character of the language. So this is another of those areas where I can’t just give you my solution. Rather, it’s important that we spend some time looking at the alternatives so that you can go another route if you choose to.
There are two main ways parameters are passed:
- By value
- By reference (address)
The differences are best seen in the light of a little history.
The old FORTRAN compilers passed all parameters by reference. In other words, what was actually passed was the address of the parameter. This meant that the called subroutine was free to either read or write that parameter, as often as it chose to, just as though it were a global variable. This was actually quite an efficient way to do things, and it was pretty simple since the same mechanism was used in all cases, with one exception that I’ll get to shortly.
There were problems, though. Many people felt that this method created entirely too much coupling between the called subroutine and its caller. In effect, it gave the subroutine complete access to all variables that appeared in the parameter list.
Many times, we didn’t want to actually change a parameter, but only use it as an input. For example, we might pass an element count to a subroutine, and wish we could then use that count within a DO-loop. To avoid changing the value in the calling program, we had to make a local copy of the input parameter, and operate only on the copy. Some FORTRAN programmers, in fact, made it a practice to copy ALL parameters except those that were to be used as return values. Needless to say, all this copying defeated a good bit of the efficiency associated with the approach.
There was, however, an even more insidious problem, which was not really just the fault of the “pass by reference” convention, but a bad convergence of several implementation decisions.
Suppose we have a subroutine SUBROUTINE FOO(X, Y, N), where N is some kind of input count or flag. Many times, we’d like to be able to pass a literal or even an expression in place of a variable, such as CALL FOO(A, B, J + 1). Here the third parameter is not a variable, and so it has no address. The earliest FORTRAN compilers did not allow such things, so we had to resort to subterfuges like:
K = J + 1
CALL FOO(A, B, K)
Here again, there was copying required, and the burden was on the programmer to do it. Not good.
Later FORTRAN implementations got rid of this by allowing expressions as parameters. What they did was to allocate a compiler-generated variable, store the value of the expression in that variable, and then pass the address of the variable.
So far, so good. Even if the subroutine mistakenly altered the anonymous variable, who was to know or care? On the next call, it would be recalculated anyway.
The problem arose when someone decided to make things more efficient. They reasoned, rightly enough, that the most common kind of “expression” was a single integer value, as in CALL FOO(A, B, 4).
It seemed inefficient to go to the trouble of “computing” such an integer and storing it in a temporary variable, just to pass it through the calling list. Since we had to pass the address of the thing anyway, it seemed to make lots of sense to just pass the address of the literal integer, 4 in the example above.
To make matters more interesting, most compilers, then and now, identify all literals and store them separately in a “literal pool,” so that we only have to store one value for each unique literal. That combination of design decisions: passing expressions, optimization for literals as a special case, and use of a literal pool, is what led to disaster.
To see how it works, imagine that we call subroutine FOO as in the example above, passing it a literal 4. Actually, what gets passed is the address of the literal 4, which is stored in the literal pool. This address corresponds to the formal parameter, N, in the subroutine itself.
Now suppose that, unbeknownst to the programmer, subroutine FOO actually modifies N to be, say, -7. Suddenly, that literal 4 in the literal pool gets changed to a -7. From then on, every expression that uses a 4 and every subroutine that passes a 4 will be using the value of -7 instead! Needless to say, this can lead to some bizarre and difficult-to-find behavior. The whole thing gave the concept of pass-by-reference a bad name, although as we have seen, it was really a combination of design decisions that led to the problem.
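To make the gotcha concrete, here is a small sketch in Pascal rather than FORTRAN. It is only a model: PoolFour is a hypothetical stand-in for the literal-pool slot holding the constant 4, and Foo is an invented procedure that takes its argument by reference the way the old compilers did.

```pascal
program LiteralPoolDemo;

{ PoolFour is a hypothetical stand-in for the literal-pool slot }
{ that a compiler would share among all uses of the literal 4.  }
var
  PoolFour: integer;
  Total: integer;

{ Like an old FORTRAN subroutine: N arrives by reference }
procedure Foo(var N: integer);
begin
  N := -7;                  { the modification nobody expected }
end;

begin
  PoolFour := 4;
  Foo(PoolFour);            { models CALL FOO(A, B, 4) }
  Total := PoolFour + 1;    { the source text would say 4 + 1 }
  WriteLn('4 + 1 = ', Total);
end.
```

Run as written, this should print 4 + 1 = -6, which is about as bizarre as the real thing was.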
In spite of the problem, the FORTRAN approach had its good points. Chief among them is the fact that we don’t have to support multiple mechanisms. The same scheme, passing the address of the argument, works for every case, including arrays. So the size of the compiler can be reduced.
Partly because of the FORTRAN gotcha, and partly just because of the reduced coupling involved, modern languages like C, Pascal, Ada, and Modula 2 generally pass scalars by value.
This means that the value of the scalar is COPIED into a separate value used only for the call. Since the value passed is a copy, the called procedure can use it as a local variable and modify it any way it likes. The value in the caller will not be changed.
It may seem at first that this is a bit inefficient, because of the need to copy the parameter. But remember that we’re going to have to fetch some value to pass anyway, whether it be the parameter itself or an address for it. Inside the subroutine, using pass-by-value is definitely more efficient, since we eliminate one level of indirection. Finally, we saw earlier that with FORTRAN, it was often necessary to make copies within the subroutine anyway, so pass-by-value reduces the number of local variables. All in all, pass-by-value is better.
Except for one small little detail: if all parameters are passed by value, there is no way for a called procedure to return a result to its caller! The parameter passed is not altered in the caller, only in the called procedure. Clearly, that won’t get the job done.
There have been two answers to this problem, which are equivalent. In Pascal, Wirth provides for VAR parameters, which are passed by reference. What a VAR parameter is, in fact, is none other than our old friend the FORTRAN parameter, with a new name and paint job for disguise. Wirth neatly gets around the “changing a literal” problem as well as the “address of an expression” problem, by the simple expedient of allowing only a variable to be the actual parameter. In other words, it’s the same restriction that the earliest FORTRANs imposed.
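A quick illustration of the two Pascal conventions side by side may help. The procedure and variable names here are invented for the example; only the VAR keyword itself comes from the language.

```pascal
program ValueVsVar;

var
  X, Y: integer;

{ Value parameter: N is a private copy of the argument }
procedure ByValue(N: integer);
begin
  N := N + 1;               { the caller never sees this }
end;

{ VAR parameter: N is another name for the caller's variable }
procedure ByReference(var N: integer);
begin
  N := N + 1;               { changes the caller's variable }
end;

begin
  X := 10;
  Y := 10;
  ByValue(X);
  ByReference(Y);
  WriteLn('X = ', X);       { still 10 }
  WriteLn('Y = ', Y);       { now 11 }
  { ByReference(Y + 1) would not compile: a VAR parameter }
  { must be given a variable, not an expression.           }
end.
```

The commented-out call at the end is the restriction described above: the compiler refuses an expression or literal where a VAR parameter is expected.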
C does the same thing, but explicitly. In C, all parameters are passed by value. One kind of variable that C supports, however, is the pointer. So by passing a pointer by value, you in effect pass what it points to by reference. In some ways this works even better yet, because even though you can change the variable pointed to all you like, you still can’t change the pointer itself. In a function such as strcpy, for example, where the pointers are incremented as the string is copied, we are really only incrementing copies of the pointers, so the values of those pointers in the calling procedure still remain as they were. To modify a pointer, you must pass a pointer to the pointer.
Since we are simply performing experiments here, we’ll look at both pass-by-value and pass-by-reference. That way, we’ll be able to use either one as we need to. It’s worth mentioning that it’s going to be tough to use the C approach to pointers here, since a pointer is a different type and we haven’t studied types yet!
Pass-by-Value
Let’s just try some simple-minded things and see where they lead us. Let’s begin with the pass-by-value case. Consider the procedure call FOO(X, Y).
Almost the only reasonable way to pass the data is through the CPU stack. So the code we’d like to see generated might look something like this:
MOVE X(PC),-(SP) ; Push X
MOVE Y(PC),-(SP) ; Push Y
BSR FOO ; Call FOO
That certainly doesn’t seem too complex!
When the BSR is executed, the CPU pushes the return address onto the stack and jumps to FOO. At this point the stack will look like this:
              .
              .
        Value of X (2 bytes)
        Value of Y (2 bytes)
SP -->  Return Address (4 bytes)
So the values of the parameters have addresses that are fixed offsets from the stack pointer. In this example, the addresses are:
- X: 6(SP)
- Y: 4(SP)
Now consider what the called procedure might look like:
PROCEDURE FOO(A, B)
BEGIN
A = B
END
(Remember, the names of the formal parameters are arbitrary … only the positions count.)
The desired output code might look like:
FOO: MOVE 4(SP),D0
MOVE D0,6(SP)
RTS
Note that, in order to address the formal parameters, we’re going to have to know which position they have in the parameter list. This means some changes to the symbol table stuff. In fact, for our single-character case it’s best to just create a new symbol table for the formal parameters.
Let’s begin by declaring a new table:
var Params: Array['A'..'Z'] of integer;
We also will need to keep track of how many parameters a given procedure has:
var NumParams: integer;
And we need to initialize the new table. Now, remember that the formal parameter list will be different for each procedure that we process, so we’ll need to initialize that table anew for each procedure. Here’s the initializer:
{--------------------------------------------------------------}
{ Initialize Parameter Table to Null }
procedure ClearParams;
var i: char;
begin
for i := 'A' to 'Z' do
Params[i] := 0;
NumParams := 0;
end;
{--------------------------------------------------------------}
We’ll put a call to this procedure in Init, and also at the end of DoProc:
{--------------------------------------------------------------}
{ Initialize }
procedure Init;
var i: char;
begin
GetChar;
SkipWhite;
for i := 'A' to 'Z' do
ST[i] := ' ';
ClearParams;
end;
{--------------------------------------------------------------}
.
.
.
{--------------------------------------------------------------}
{ Parse and Translate a Procedure Declaration }
procedure DoProc;
var N: char;
begin
Match('p');
N := GetName;
FormalList;
Fin;
if InTable(N) then Duplicate(N);
ST[N] := 'p';
PostLabel(N);
BeginBlock;
Return;
ClearParams;
end;
{--------------------------------------------------------------}
Note that the call within DoProc ensures that the table will be clear when we’re in the main program.
OK, now we need a few procedures to work with the table. The next few functions are essentially copies of InTable, TypeOf, etc.:
{--------------------------------------------------------------}
{ Find the Parameter Number }
function ParamNumber(N: char): integer;
begin
ParamNumber := Params[N];
end;
{--------------------------------------------------------------}
{ See if an Identifier is a Parameter }
function IsParam(N: char): boolean;
begin
IsParam := Params[N] <> 0;
end;
{--------------------------------------------------------------}
{ Add a New Parameter to Table }
procedure AddParam(Name: char);
begin
if IsParam(Name) then Duplicate(Name);
Inc(NumParams);
Params[Name] := NumParams;
end;
{--------------------------------------------------------------}
Finally, we need some code generation routines:
{--------------------------------------------------------------}
{ Load a Parameter to the Primary Register }
procedure LoadParam(N: integer);
var Offset: integer;
begin
Offset := 4 + 2 * (NumParams - N);
Emit('MOVE ');
WriteLn(Offset, '(SP),D0');
end;
{--------------------------------------------------------------}
{ Store a Parameter from the Primary Register }
procedure StoreParam(N: integer);
var Offset: integer;
begin
Offset := 4 + 2 * (NumParams - N);
Emit('MOVE D0,');
WriteLn(Offset, '(SP)');
end;
{--------------------------------------------------------------}
{ Push The Primary Register to the Stack }
procedure Push;
begin
EmitLn('MOVE D0,-(SP)');
end;
{--------------------------------------------------------------}
(The last routine is one we’ve seen before, but it wasn’t in this vestigial version of the program.)
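To check the offset arithmetic against the stack picture, here is a small stand-alone sketch. It borrows Params, NumParams, and AddParam from the listings above (minus the duplicate check), while SPOffset is a hypothetical helper that only computes the number LoadParam and StoreParam would print.

```pascal
program ParamOffsets;

var
  Params: array['A'..'Z'] of integer;
  NumParams: integer;
  c: char;

{ Same bookkeeping as AddParam above, without the duplicate check }
procedure AddParam(Name: char);
begin
  Inc(NumParams);
  Params[Name] := NumParams;
end;

{ Hypothetical helper: the SP-relative offset of parameter N.  }
{ The 4 skips the return address; each parameter is 2 bytes.   }
function SPOffset(N: integer): integer;
begin
  SPOffset := 4 + 2 * (NumParams - N);
end;

begin
  for c := 'A' to 'Z' do
    Params[c] := 0;
  NumParams := 0;
  AddParam('A');            { formals of PROCEDURE FOO(A, B) }
  AddParam('B');
  WriteLn('A: ', SPOffset(Params['A']), '(SP)');
  WriteLn('B: ', SPOffset(Params['B']), '(SP)');
end.
```

This should report 6(SP) for A and 4(SP) for B, matching the hand-written code for FOO. Note that the offsets are only meaningful once the whole formal list has been read, since they depend on the final value of NumParams.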
With those preliminaries in place, we’re ready to deal with the semantics of procedures with calling lists (remember, the code to deal with the syntax is already in place).
Let’s begin by processing a formal parameter. All we have to do is to add each parameter to the parameter symbol table:
{--------------------------------------------------------------}
{ Process a Formal Parameter }
procedure FormalParam;
begin
AddParam(GetName);
end;
{--------------------------------------------------------------}
Now, what about dealing with a formal parameter when it appears in the body of the procedure? That takes a little more work. We must first determine that it is a formal parameter. To do this, I’ve written a modified version of TypeOf:
{--------------------------------------------------------------}
{ Get Type of Symbol }
function TypeOf(n: char): char;
begin
if IsParam(n) then
TypeOf := 'f'
else
TypeOf := ST[n];
end;
{--------------------------------------------------------------}
(Note that, since TypeOf now calls IsParam, it may need to be relocated in your source.)
We also must modify AssignOrProc to deal with this new type:
{--------------------------------------------------------------}
{ Decide if a Statement is an Assignment or Procedure Call }
procedure AssignOrProc;
var Name: char;
begin
Name := GetName;
case TypeOf(Name) of
' ': Undefined(Name);
'v', 'f': Assignment(Name);
'p': CallProc(Name);
else Abort('Identifier ' + Name + ' Cannot Be Used Here');
end;
end;
{--------------------------------------------------------------}
Finally, the code to process an assignment statement and an expression must be extended:
{--------------------------------------------------------------}
{ Parse and Translate an Expression }
{ Vestigial Version }
procedure Expression;
var Name: char;
begin
Name := GetName;
if IsParam(Name) then
LoadParam(ParamNumber(Name))
else
LoadVar(Name);
end;
{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }
procedure Assignment(Name: char);
begin
Match('=');
Expression;
if IsParam(Name) then
StoreParam(ParamNumber(Name))
else
StoreVar(Name);
end;
{--------------------------------------------------------------}
As you can see, these procedures will treat every variable name encountered as either a formal parameter or a global variable, depending on whether or not it appears in the parameter symbol table. Remember that we are using only a vestigial form of Expression. In the final program, the change shown here will have to be added to Factor, not Expression.
The rest is easy. We need only add the semantics to the actual procedure call, which we can do with one new line of code:
{--------------------------------------------------------------}
{ Process an Actual Parameter }
procedure Param;
begin
Expression;
Push;
end;
{--------------------------------------------------------------}
That’s it. Add these changes to your program and give it a try. Try declaring one or two procedures, each with a formal parameter list. Then do some assignments, using combinations of global and formal parameters. You can call one procedure from within another, but you cannot declare a nested procedure. You can even pass formal parameters from one procedure to another. If we had the full syntax of the language here, you’d also be able to do things like read or write formal parameters or use them in complicated expressions.
What’s Wrong?
At this point, you might be thinking: Surely there’s more to this than a few pushes and pops. There must be more to passing parameters than this.
You’d be right. As a matter of fact, the code that we’re generating here leaves a lot to be desired in several respects.
The most glaring oversight is that it’s wrong! If you’ll look back at the code for a procedure call, you’ll see that the caller pushes each actual parameter onto the stack before it calls the procedure. The procedure uses that information, but it doesn’t change the stack pointer. That means that the stuff is still there when we return. Somebody needs to clean up the stack, or we’ll soon be in very hot water!
Fortunately, that’s easily fixed. All we have to do is to increment the stack pointer when we’re finished.
Should we do that in the calling program, or the called procedure? Some folks let the called procedure clean up the stack, since that requires less code to be generated per call, and since the procedure, after all, knows how many parameters it’s got. But that means that it must do something with the return address so as not to lose it.
I prefer letting the caller clean up, so that the callee need only execute a return. Also, it seems a bit more balanced, since the caller is the one who “messed up” the stack in the first place. But that means that the caller must remember how many items it pushed. To make things easy, I’ve modified the procedure ParamList to be a function instead of a procedure, returning the number of bytes pushed:
{--------------------------------------------------------------}
{ Process the Parameter List for a Procedure Call }
function ParamList: integer;
var N: integer;
begin
N := 0;
Match('(');
if Look <> ')' then begin
Param;
inc(N);
while Look = ',' do begin
Match(',');
Param;
inc(N);
end;
end;
Match(')');
ParamList := 2 * N;
end;
{--------------------------------------------------------------}
Procedure CallProc then uses this to clean up the stack:
{--------------------------------------------------------------}
{ Process a Procedure Call }
procedure CallProc(Name: char);
var N: integer;
begin
N := ParamList;
Call(Name);
CleanStack(N);
end;
{--------------------------------------------------------------}
Here I’ve created yet another code generation procedure:
{--------------------------------------------------------------}
{ Adjust the Stack Pointer Upwards by N Bytes }
procedure CleanStack(N: integer);
begin
if N > 0 then begin
Emit('ADD #');
WriteLn(N, ',SP');
end;
end;
{--------------------------------------------------------------}
OK, if you’ll add this code to your compiler, I think you’ll find that the stack is now under control.
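For reference, here is roughly what the complete calling sequence looks like once the cleanup is in. This is a stand-alone sketch, not part of the translator: EmitCall is a hypothetical driver that takes the argument names as a string, and it prints the idealized one-instruction push shown earlier rather than the load-then-push pair that Expression and Push actually produce.

```pascal
program CallSequence;

{ Hypothetical driver: emit a by-value call to ProcName, pushing }
{ each single-character variable named in Args, then cleaning up }
procedure EmitCall(ProcName: char; Args: string);
var
  i: integer;
begin
  for i := 1 to Length(Args) do
    WriteLn('MOVE ' + Args[i] + '(PC),-(SP)');   { push one word }
  WriteLn('BSR ' + ProcName);
  if Length(Args) > 0 then
    WriteLn('ADD #', 2 * Length(Args), ',SP');   { 2 bytes each }
end;

begin
  EmitCall('F', 'XY');      { models the call F(X, Y) }
end.
```

For two arguments the last line comes out as ADD #4,SP, which puts the stack pointer back where it was before the first push.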
The next problem has to do with our way of addressing relative to the stack pointer. That works fine in our simple examples, since with our rudimentary form of expressions nobody else is messing with the stack. But consider a different example as simple as:
PROCEDURE FOO(A, B)
BEGIN
A = A + B
END
The code generated by a simple-minded parser might be:
FOO: MOVE 6(SP),D0 ; Fetch A
MOVE D0,-(SP) ; Push it
MOVE 4(SP),D0 ; Fetch B
ADD (SP)+,D0 ; Add A
MOVE D0,6(SP) ; Store A
RTS
This would be wrong. When we push the first argument onto the stack, the offsets for the two formal parameters are no longer 4 and 6, but are 6 and 8. So the second fetch would not fetch B at all: 4(SP) now points into the middle of the return address.
This is not the end of the world. I think you can see that all we really have to do is to alter the offset every time we do a push, and that in fact is what’s done if the CPU has no support for other methods.
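Here is a minimal sketch of that bookkeeping, assuming straight-line code only. StackDepth is a hypothetical counter of the bytes pushed since procedure entry, and PopAdd is an invented name for the routine that emits the add; neither is part of the translator as it stands.

```pascal
program TrackPushes;

var
  NumParams: integer;
  StackDepth: integer;      { hypothetical: bytes pushed since entry }

procedure Push;
begin
  WriteLn('MOVE D0,-(SP)');
  StackDepth := StackDepth + 2;
end;

procedure PopAdd;
begin
  WriteLn('ADD (SP)+,D0');
  StackDepth := StackDepth - 2;
end;

{ Offsets are corrected by whatever is currently on the stack }
procedure LoadParam(N: integer);
begin
  WriteLn('MOVE ', 4 + 2 * (NumParams - N) + StackDepth, '(SP),D0');
end;

procedure StoreParam(N: integer);
begin
  WriteLn('MOVE D0,', 4 + 2 * (NumParams - N) + StackDepth, '(SP)');
end;

begin
  NumParams := 2;
  StackDepth := 0;
  { A = A + B, with A as parameter 1 and B as parameter 2 }
  LoadParam(1);
  Push;
  LoadParam(2);             { now emitted as 6(SP), not 4(SP) }
  PopAdd;
  StoreParam(1);
end.
```

With the correction applied, the fetch of B goes to 6(SP) while the temporary is on the stack, and the final store goes back to 6(SP) after the pop.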
Fortunately, though, the 68000 does have such support. Recognizing that this CPU would be used a lot with high-order language compilers, Motorola decided to add direct support for this kind of thing.
The problem, as you can see, is that as the procedure executes, the stack pointer bounces up and down, and so it becomes an awkward thing to use as a reference to access the formal parameters. The solution is to define some other register, and use it instead. This register is typically set equal to the original stack pointer, and is called the frame pointer.
The 68000 instruction LINK lets you declare such a frame pointer, and sets it equal to the stack pointer, all in one instruction. As a matter of fact, it does even more than that. Since this register may have been in use for something else in the calling procedure, LINK also pushes the current value of that register onto the stack. It can also add a value to the stack pointer, to make room for local variables.
The complement of LINK is UNLK, which simply restores the stack pointer and pops the old value back into the register.
Using these two instructions, the code for the previous example becomes:
FOO: LINK A6,#0
MOVE 10(A6),D0 ; Fetch A
MOVE D0,-(SP) ; Push it
MOVE 8(A6),D0 ; Fetch B
ADD (SP)+,D0 ; Add A
MOVE D0,10(A6) ; Store A
UNLK A6
RTS
Fixing the compiler to generate this code is a lot easier than it is to explain it. All we need to do is to modify the code generation created by DoProc. Since that makes the code a little more than one line, I’ve created new procedures to deal with it, paralleling the Prolog and Epilog procedures called by DoMain:
{--------------------------------------------------------------}
{ Write the Prolog for a Procedure }
procedure ProcProlog(N: char);
begin
PostLabel(N);
EmitLn('LINK A6,#0');
end;
{--------------------------------------------------------------}
{ Write the Epilog for a Procedure }
procedure ProcEpilog;
begin
EmitLn('UNLK A6');
EmitLn('RTS');
end;
{--------------------------------------------------------------}
Procedure DoProc now just calls these:
{--------------------------------------------------------------}
{ Parse and Translate a Procedure Declaration }
procedure DoProc;
var N: char;
begin
Match('p');
N := GetName;
FormalList;
Fin;
if InTable(N) then Duplicate(N);
ST[N] := 'p';
ProcProlog(N);
BeginBlock;
ProcEpilog;
ClearParams;
end;
{--------------------------------------------------------------}
Finally, we need to change the references to SP in procedures LoadParam and StoreParam:
{--------------------------------------------------------------}
{ Load a Parameter to the Primary Register }
procedure LoadParam(N: integer);
var Offset: integer;
begin
Offset := 8 + 2 * (NumParams - N);
Emit('MOVE ');
WriteLn(Offset, '(A6),D0');
end;
{--------------------------------------------------------------}
{ Store a Parameter from the Primary Register }
procedure StoreParam(N: integer);
var Offset: integer;
begin
Offset := 8 + 2 * (NumParams - N);
Emit('MOVE D0,');
WriteLn(Offset, '(A6)');
end;
{--------------------------------------------------------------}
(Note that the Offset computation changes to allow for the extra push of A6.)
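If the new offsets look like magic, it may help to walk the stack by hand. The sketch below simulates the stack pointer and frame pointer as plain integers. The starting address of 1000 is arbitrary, and AddrX and AddrY are hypothetical names that record where each argument landed.

```pascal
program FrameLayout;

var
  SP, A6: integer;          { simulated register contents, in bytes }
  AddrX, AddrY: integer;    { where each argument ended up }

begin
  SP := 1000;               { arbitrary starting stack pointer }

  SP := SP - 2;             { MOVE X(PC),-(SP) : push a word }
  AddrX := SP;
  SP := SP - 2;             { MOVE Y(PC),-(SP) : push a word }
  AddrY := SP;
  SP := SP - 4;             { BSR FOO : 4-byte return address }
  SP := SP - 4;             { LINK A6,#0 : push the old A6 ... }
  A6 := SP;                 { ... then copy SP into A6 }

  WriteLn('X is at ', AddrX - A6, '(A6)');
  WriteLn('Y is at ', AddrY - A6, '(A6)');
end.
```

The 8 in the Offset formula is just the two 4-byte items, saved A6 and return address, that sit between the frame pointer and the last parameter pushed.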
That’s all it takes. Try this out and see how you like it.
At this point we are generating some relatively nice code for procedures and procedure calls. Within the limitation that there are no local variables (yet) and that no procedure nesting is allowed, this code is just what we need.
There is still just one little small problem remaining:
WE HAVE NO WAY TO RETURN RESULTS TO THE CALLER!
But that, of course, is not a limitation of the code we’re generating, but one inherent in the call-by-value protocol. Notice that we can use formal parameters in any way inside the procedure. We can calculate new values for them, use them as loop counters (if we had loops, that is!), etc. So the code is doing what it’s supposed to. To get over this last problem, we need to look at the alternative protocol.
Call-by-Reference
This one is easy, now that we have the mechanisms already in place. We only have to make a few changes to the code generation. Instead of pushing a value onto the stack, we must push an address. As it turns out, the 68000 has an instruction, PEA, that does just that.
We’ll be making a new version of the test program for this. Before we do anything else, MAKE A COPY of the program as it now stands, because we’ll be needing it again later.
Let’s begin by looking at the code we’d like to see generated for the new case. Using the same example as before, we need the call
FOO(X, Y)
to be translated to:
PEA X(PC) ; Push the address of X
PEA Y(PC) ; Push the address of Y
BSR FOO ; Call FOO
That’s a simple matter of a slight change to Param:
{--------------------------------------------------------------}
{ Process an Actual Parameter }
procedure Param;
begin
EmitLn('PEA ' + GetName + '(PC)');
end;
{--------------------------------------------------------------}
(Note that with pass-by-reference, we can’t have expressions in the calling list, so Param can just read the name directly.)
At the other end, the references to the formal parameters must be given one level of indirection:
FOO: LINK A6,#0
MOVE.L 12(A6),A0 ; Fetch the address of A
MOVE (A0),D0 ; Fetch A
MOVE D0,-(SP) ; Push it
MOVE.L 8(A6),A0 ; Fetch the address of B
MOVE (A0),D0 ; Fetch B
ADD (SP)+,D0 ; Add A
MOVE.L 12(A6),A0 ; Fetch the address of A
MOVE D0,(A0) ; Store A
UNLK A6
RTS
All of this can be handled by changes to LoadParam and StoreParam:
{--------------------------------------------------------------}
{ Load a Parameter to the Primary Register }
procedure LoadParam(N: integer);
var Offset: integer;
begin
Offset := 8 + 4 * (NumParams - N);
Emit('MOVE.L ');
WriteLn(Offset, '(A6),A0');
EmitLn('MOVE (A0),D0');
end;
{--------------------------------------------------------------}
{ Store a Parameter from the Primary Register }
procedure StoreParam(N: integer);
var Offset: integer;
begin
Offset := 8 + 4 * (NumParams - N);
Emit('MOVE.L ');
WriteLn(Offset, '(A6),A0');
EmitLn('MOVE D0,(A0)');
end;
{--------------------------------------------------------------}
To get the count right, we must also change one line in ParamList:
ParamList := 4 * N;
That should do it. Give it a try and see if it’s generating reasonable-looking code. As you will see, the code is hardly optimal, since we reload the address register every time a parameter is needed. But that’s consistent with our KISS approach here, of just being sure to generate code that works. We’ll just make a little note here, that here’s yet another candidate for optimization, and press on.
Now we’ve learned to process parameters using pass-by-value and pass-by-reference. In the real world, of course, we’d like to be able to deal with both methods. We can’t do that yet, though, because we have not yet had a session on types, and that has to come first.
If we can only have one method, then of course it has to be the good ol’ FORTRAN method of pass-by-reference, since that’s the only way procedures can ever return values to their caller.
This, in fact, will be one of the differences between TINY and KISS. In the next version of TINY, we’ll use pass-by-reference for all parameters. KISS will support both methods.
Local Variables
So far, we’ve said nothing about local variables, and our definition of procedures doesn’t allow for them. Needless to say, that’s a big gap in our language, and one that needs to be corrected.
Here again we are faced with a choice: Static or dynamic storage?
In those old FORTRAN programs, local variables were given static storage just like global ones. That is, each local variable got a name and allocated address, like any other variable, and was referenced by that name.
That’s easy for us to do, using the allocation mechanisms already in place. Remember, though, that local variables can have the same names as global ones. We need to somehow deal with that by assigning unique names for these variables.
The characteristic of static storage, of course, is that the data survives a procedure call and return. When the procedure is called again, the data will still be there. That can be an advantage in some applications. In the FORTRAN days we used to do tricks like initialize a flag, so that you could tell when you were entering a procedure for the first time and could do any one-time initialization that needed to be done.
Of course, the same “feature” is also what makes recursion impossible with static storage. Any new call to a procedure will overwrite the data already in the local variables.
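The overwrite is easy to demonstrate. In this sketch, the global StaticN plays the part of a statically allocated local, while FactDynamic uses an ordinary stack-based parameter. The names are invented for the example, and Sub is kept as a normal local only so that the damage is confined to StaticN.

```pascal
program StaticVsDynamic;

var
  StaticN: integer;         { stands in for a statically allocated local }

{ Factorial that keeps its working value at one fixed address }
function FactStatic(N: integer): integer;
var
  Sub: integer;
begin
  StaticN := N;
  if StaticN <= 1 then
    FactStatic := 1
  else begin
    Sub := FactStatic(StaticN - 1);   { inner calls overwrite StaticN }
    FactStatic := Sub * StaticN;      { StaticN is 1 by now, not N }
  end;
end;

{ Factorial whose working value lives on the stack, one per call }
function FactDynamic(N: integer): integer;
begin
  if N <= 1 then
    FactDynamic := 1
  else
    FactDynamic := FactDynamic(N - 1) * N;
end;

begin
  WriteLn('static : ', FactStatic(4));
  WriteLn('dynamic: ', FactDynamic(4));
end.
```

The static version comes back with 1 instead of 24, because by the time each level gets around to multiplying, the innermost call has already left a 1 in the shared storage.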
The alternative is dynamic storage, in which storage is allocated on the stack just as for passed parameters. We also have the mechanisms already for doing this. In fact, the same routines that deal with passed (by value) parameters on the stack can easily deal with local variables as well … the code to be generated is the same. The offset in the 68000 LINK instruction is there for just that reason: we can use it to adjust the stack pointer to make room for locals. Dynamic storage, of course, inherently supports recursion.
When I first began planning TINY, I must admit to being prejudiced in favor of static storage. That’s simply because those old FORTRAN programs were pretty darned efficient … the early FORTRAN compilers produced a quality of code that’s still rarely matched by modern compilers. Even today, a given program written in FORTRAN is likely to outperform the same program written in C or Pascal, sometimes by wide margins. (Whew! Am I going to hear about that statement!)
I’ve always supposed that the reason had to do with the two main differences between FORTRAN implementations and the others: static storage and pass-by-reference. I know that dynamic storage supports recursion, but it’s always seemed to me a bit peculiar to be willing to accept slower code in the 95% of cases that don’t need recursion, just to get that feature when you need it. The idea is that, with static storage, you can use absolute addressing rather than indirect addressing, which should result in faster code.
More recently, though, several folks have pointed out to me that there really is no performance penalty associated with dynamic storage. With the 68000, for example, you shouldn’t use absolute addressing anyway … most operating systems require position independent code. And the 68000 instruction MOVE 8(A6),D0 has exactly the same timing as MOVE X(PC),D0.
So I’m convinced, now, that there is no good reason not to use dynamic storage.
Since this use of local variables fits so well into the scheme of pass-by-value parameters, we’ll use that version of the translator to illustrate it. (I sure hope you kept a copy!)
The general idea is to keep track of how many local variables there are. Then we use the integer in the LINK instruction to adjust the stack pointer downward to make room for them. Formal parameters are addressed as positive offsets from the frame pointer, and locals as negative offsets. With a little bit of work, the same procedures we’ve already created can take care of the whole thing.
Let’s start by creating a new variable, Base:
var Base: integer;
We’ll use this variable, instead of NumParams, to compute stack offsets. That means changing the two references to NumParams in LoadParam and StoreParam:
{--------------------------------------------------------------}
{ Load a Parameter to the Primary Register }
procedure LoadParam(N: integer);
var Offset: integer;
begin
Offset := 8 + 2 * (Base - N);
Emit('MOVE ');
WriteLn(Offset, '(A6),D0');
end;
{--------------------------------------------------------------}
{ Store a Parameter from the Primary Register }
procedure StoreParam(N: integer);
var Offset: integer;
begin
Offset := 8 + 2 * (Base - N);
Emit('MOVE D0,');
WriteLn(Offset, '(A6)');
end;
{--------------------------------------------------------------}
The idea is that the value of Base will be frozen after we have processed the formal parameters, and won’t increase further as the new local variables are inserted in the symbol table. This is taken care of at the end of FormalList:
{--------------------------------------------------------------}
{ Process the Formal Parameter List of a Procedure }
procedure FormalList;
begin
Match('(');
if Look <> ')' then begin
FormalParam;
while Look = ',' do begin
Match(',');
FormalParam;
end;
end;
Match(')');
Fin;
Base := NumParams;
NumParams := NumParams + 4;
end;
{--------------------------------------------------------------}
(We add four words to make allowances for the return address and old frame pointer, which end up between the formal parameters and the locals.)
About all we need to do next is to install the semantics for declaring local variables into the parser. The routines are very similar to Decl and TopDecls:
{--------------------------------------------------------------}
{ Parse and Translate a Local Data Declaration }
procedure LocDecl;
var Name: char;
begin
   Match('v');
   AddParam(GetName);
   Fin;
end;
{--------------------------------------------------------------}
{ Parse and Translate Local Declarations }
function LocDecls: integer;
var n: integer;
begin
   n := 0;
   while Look = 'v' do begin
      LocDecl;
      inc(n);
   end;
   LocDecls := n;
end;
{--------------------------------------------------------------}
Note that LocDecls is a FUNCTION, returning the number of locals
to DoProc.
Next, we modify DoProc to use this information:
{--------------------------------------------------------------}
{ Parse and Translate a Procedure Declaration }
procedure DoProc;
var N: char;
    k: integer;
begin
   Match('p');
   N := GetName;
   if InTable(N) then Duplicate(N);
   ST[N] := 'p';
   FormalList;
   k := LocDecls;
   ProcProlog(N, k);
   BeginBlock;
   ProcEpilog;
   ClearParams;
end;
{--------------------------------------------------------------}
(I’ve made a couple of changes here that weren’t really
necessary. Aside from rearranging things a bit, I moved the call
to Fin to within FormalList, and placed one inside LocDecl as
well. Don’t forget to put one at the end of FormalList, so that
we’re together here.)
Note the change in the call to ProcProlog. The new argument is
the number of WORDS (not bytes) to allocate space for. Here’s
the new version of ProcProlog:
{--------------------------------------------------------------}
{ Write the Prolog for a Procedure }
procedure ProcProlog(N: char; k: integer);
begin
   PostLabel(N);
   Emit('LINK A6,#');
   WriteLn(-2 * k)
end;
{--------------------------------------------------------------}
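If you’d like to see what the new prolog produces without running the whole compiler, a stand-alone sketch along these lines should do. Emit, EmitLn, and PostLabel here are simplified stand-ins for the compiler’s output routines, and I’m assuming that ProcEpilog emits UNLK and RTS; the procedure name F and the count of two local words are made up for the example.

```pascal
program PrologDemo;

{ A sketch only. Emit, EmitLn and PostLabel are simplified stand-ins }
{ for the compiler's output routines; ProcEpilog is assumed to emit  }
{ UNLK and RTS.                                                      }

const TAB = #9;

procedure Emit(s: string);
begin
   Write(TAB, s);
end;

procedure EmitLn(s: string);
begin
   Emit(s);
   WriteLn;
end;

procedure PostLabel(L: char);
begin
   WriteLn(L, ':');
end;

procedure ProcProlog(N: char; k: integer);
begin
   PostLabel(N);
   Emit('LINK A6,#');
   WriteLn(-2 * k);    { k words = 2 * k bytes, below the frame pointer }
end;

procedure ProcEpilog;
begin
   EmitLn('UNLK A6');
   EmitLn('RTS');
end;

begin
   ProcProlog('F', 2);    { hypothetical procedure F with two local words }
   ProcEpilog;            { empty body, just to show the frame code }
end.
```

Run as written, this prints the label F: followed by LINK A6,#-4, UNLK A6, and RTS, each on its own tab-indented line. The LINK reserves four bytes below A6 for the two locals, and the UNLK gives them back before the return.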
That should do it. Add these changes and see how they work.
Conclusion
At this point you know how to compile procedure declarations and procedure calls, with parameters passed by reference and by value. You can also handle local variables. As you can see, the hard part is not in providing the mechanisms, but in deciding just which mechanisms to use. Once we make these decisions, the code to translate the constructs is really not that difficult. I didn’t show you how to deal with the combination of local variables and pass-by-reference parameters, but that’s a straightforward extension to what you’ve already seen. It just gets a little more messy, that’s all, since we need to support both mechanisms instead of just one at a time. I’d prefer to save that one until after we’ve dealt with ways to handle different variable types.
That will be the next installment, which will be coming soon to a Forum near you. See you then.