Macro system

I am working on implementing a basic scripting language called YAL. You can check it out here.

I want to implement a macro system for YAL. The motivation for having a macro system in a language is that you can extend the language without modifying it. It is a form of meta-programming. There are various ways to do meta-programming, and macros are one of them. Broadly, by meta-programming, I mean:

WRITING CODE THAT WRITES CODE.

Yes, code that generates more code when evaluated. It is a cool idea. Languages like Lisp, Julia, Elixir, and Racket have it, which makes them easy to extend and more expressive. Normally, if we want to introduce a new feature in our language, we would probably have to modify our lexer, parser and evaluator. With macros, you can just write a macro that, when evaluated, generates code for your new feature using existing language constructs.

We use the term evaluation in a loose sense here. We don’t mean the runtime execution of the code when we say evaluation in this context. We mean, when this macro code is analyzed/parsed/evaluated, it generates more code. In the case of macros, this code generation is called Macro Expansion. We expand the code. You’ll get it soon.

Let’s build an intuition for what a macro might look like in reality. It is something that generates code when evaluated. Generates something when evaluated. Huh, like functions? Okay, let’s say macros syntactically look like functions. You define them and later you call them. But functions, when executed at runtime, return a value and move on. They don’t generate code. So semantically, macros have to be different. Of course, if macros were just functions, why would we even be doing all this? Macros are like functions that take source code as input and give source code as output.

How do you pass source code around though? ASTs! An AST is a representation of code as a data structure, and that is what we pass to macros. So in a way, macros deal with ASTs. Now you can do all sorts of stuff with this CODE AS DATA concept. Pass a macro an AST node and return nothing, and the source code disappears. Return two copies of the AST, and the code is doubled. Or modify the AST, whatever you like. It is code that expands to some different code.
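To make CODE AS DATA a bit more concrete, here is a small sketch in Python. It is not how YAL is implemented; the node classes and the swap_operands helper are made-up names for illustration. It only shows that once code is a data structure, a plain function can take one AST and hand back a different one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    name: str


@dataclass(frozen=True)
class Infix:
    left: object
    operator: str
    right: object


def to_source(node):
    """Render an AST node back to source text."""
    if isinstance(node, Identifier):
        return node.name
    if isinstance(node, Infix):
        return f"({to_source(node.left)} {node.operator} {to_source(node.right)})"
    raise TypeError(f"unknown node: {node!r}")


def swap_operands(node):
    """A macro-like transformation (hypothetical): AST in, different AST out."""
    return Infix(node.right, node.operator, node.left)


code = Infix(Identifier("x"), "-", Identifier("y"))
print(to_source(code))                 # (x - y)
print(to_source(swap_operands(code)))  # (y - x)
```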

Ok, now, how do we go about it? How do we implement macros in the language? We know what they will look like. Here’s a possible definition:

let macro_a = fn (x) {
   // do something with x
};

But wait, this is just a function. We have to use a different keyword to distinguish macros from functions.

let macro_a = macro (x) {
   // do something with x
};

Seems ok. Looks like a function but is different because of the new keyword macro.

Now at the call site, you of course do a simple call:

 macro_a(x) // call macro with argument x

Ok, the looks (syntax) are figured out. Let’s get into semantics.

Our macros contain logic. Inside a macro, we use that logic to play around with the input AST nodes. Without logic, our macros would just be templates. There is a category of such macros called text substitution macros, like the ones the C preprocessor provides. However, these macros are limited in what they can do. We want syntactic macros: macros with logic. And this is the reason why we want to evaluate our macros.

We need to evaluate the macros. Evaluation happens in a context/environment. An environment is basically a store that maps variable names to objects. So, we need to pass macro arguments as objects. But we don’t want to evaluate the arguments, we want to pass each one as the AST node itself. Well, wrappers!

Introducing the Quote object, an object that wraps an AST node.

Now at macro call sites, we can pass each argument of the macro as a quote object wrapping the argument’s AST node. The macro body is evaluated as usual in its own environment extended by the parameters. The evaluation of the macro body has to return an AST node (the expanded code) that will replace the macro call expression node at the macro call site. But evaluation returns objects, not AST nodes. Well, we have a wrapper already. We will return a quote object wrapping the generated AST node. But how do you generate this AST node?
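Here is roughly what that wrapping could look like, sketched in Python. This is an assumption about the shape, not YAL's real code: Quote, Environment and bind_macro_args are hypothetical names. The point is only that each parameter gets bound to a Quote holding the argument's AST node, and the node itself is never evaluated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    name: str


@dataclass(frozen=True)
class Quote:
    """Runtime object whose only job is to carry an unevaluated AST node."""
    node: object


class Environment:
    """Maps names to objects; lookups fall back to the enclosing environment."""

    def __init__(self, outer=None):
        self.store = {}
        self.outer = outer

    def get(self, name):
        if name in self.store:
            return self.store[name]
        if self.outer is not None:
            return self.outer.get(name)
        raise NameError(name)

    def set(self, name, value):
        self.store[name] = value


def bind_macro_args(params, arg_nodes, macro_env):
    """Extend the macro's environment with parameters bound to quoted argument nodes."""
    env = Environment(outer=macro_env)
    for param, node in zip(params, arg_nodes):
        env.set(param, Quote(node))  # wrapped as-is, not evaluated
    return env


# Call site minus(x, y) for a macro declared as macro(a, b) { ... }
env = bind_macro_args(["a", "b"], [Identifier("x"), Identifier("y")], Environment())
print(env.get("a"))  # Quote(node=Identifier(name='x'))
```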

See, we want to evaluate our macros. But what we are dealing with inside macros are AST nodes, which are just unevaluated source code. We don’t want to evaluate those, remember. We want to evaluate the logic that plays with these unevaluated source code nodes. So, we want a way to evaluate some code and treat the rest as data. We want to stop evaluation on demand for certain pieces of code and treat them as ASTs.

Stop evaluation on demand. Introducing the keyword quote. quote is only allowed inside macros, since that is its only use case.

quote takes just one argument (for simplicity) and prevents it from getting evaluated. It treats its argument as an AST node and returns it as is. Since quote is a call expression, evaluating it has to return an object as per our evaluator. That’s why quote just wraps its argument AST node inside a quote object and returns it.
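A minimal sketch of that rule in Python, with a toy evaluator that only knows integers, + and -, and quote. The evaluator and node classes are assumptions for illustration and not YAL's actual evaluator. Notice that 1 + 2 evaluates to 3, while quote(1 + 2) hands back the untouched node inside a Quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegerLiteral:
    value: int


@dataclass(frozen=True)
class Infix:
    left: object
    operator: str
    right: object


@dataclass(frozen=True)
class Call:
    function: str
    arguments: tuple


@dataclass(frozen=True)
class Quote:
    node: object


def evaluate(node):
    """Toy evaluator: integer literals, + and -, and the special form quote."""
    if isinstance(node, IntegerLiteral):
        return node.value
    if isinstance(node, Infix):
        left, right = evaluate(node.left), evaluate(node.right)
        if node.operator == "+":
            return left + right
        if node.operator == "-":
            return left - right
        raise ValueError(f"unknown operator: {node.operator}")
    if isinstance(node, Call) and node.function == "quote":
        if len(node.arguments) != 1:
            raise TypeError("quote takes exactly one argument")
        return Quote(node.arguments[0])  # the argument is NOT evaluated
    raise TypeError(f"cannot evaluate {node!r}")


one_plus_two = Infix(IntegerLiteral(1), "+", IntegerLiteral(2))
print(evaluate(one_plus_two))                     # 3
print(evaluate(Call("quote", (one_plus_two,))))   # a Quote wrapping the 1 + 2 node
```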

Now, consider this macro:

minus(x,y) //this should expand to x-y. Not the best example of expansion, but you get the point

Now, let’s see how we can define such a macro.

Attempt 1: let minus = macro(a,b){a-b};

The parameters a and b here are bound to quote objects whose nodes are the identifiers x and y. Evaluating a - b would mean arithmetic on quote objects, which is not defined. Also, this does not return a quote object wrapping a node, which is the major requirement of a macro.

Attempt 2: let minus = macro(a,b){quote(a-b)}

This returns a quote object. But since quote stops the evaluation of its argument, this macro will just return a quote object whose node is the expression a-b, not x-y.

We want something like,

let minus = macro(a,b){quote(Evaluation of a - Evaluation of b)};

We need evaluation inside quote. quote(), which is meant to protect its argument from evaluation and return it as a wrapped node, now needs evaluation inside it. In other words, the argument of quote() is an AST, and for certain parts of that AST, we need evaluation.

Introducing unquote. Selective evaluation of AST nodes. unquote is only allowed inside quote, since that is its only use case.

Attempt 3: let minus = macro(a,b){quote(unquote(a)-unquote(b))}

Perfect.

Just like quote, unquote takes one argument. It evaluates its argument, and then the unquote call expression node in the quoted AST gets replaced by the evaluated result. But again, evaluation returns an object, and we need a node to replace the unquote call expression node, so we have to turn the evaluated object into an AST node first. We go backwards here, from object to AST. For a macro parameter like a, the evaluated object is a quote object, so turning it into a node just means unwrapping it.
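Here is a sketch of how quote and unquote could work together, again in Python and again with made-up names (expand_unquotes, object_to_node), so treat it as an illustration under those assumptions and not as YAL's implementation. Evaluating quote copies its argument AST, and every unquote(...) found inside is replaced by its evaluated argument, converted back into a node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    name: str


@dataclass(frozen=True)
class IntegerLiteral:
    value: int


@dataclass(frozen=True)
class Infix:
    left: object
    operator: str
    right: object


@dataclass(frozen=True)
class Call:
    function: str
    arguments: tuple


@dataclass(frozen=True)
class Quote:
    node: object


def evaluate(node, env):
    """Toy evaluator: integers, identifiers, + and -, and quote(...)."""
    if isinstance(node, IntegerLiteral):
        return node.value
    if isinstance(node, Identifier):
        return env[node.name]
    if isinstance(node, Infix):
        left, right = evaluate(node.left, env), evaluate(node.right, env)
        if node.operator == "+":
            return left + right
        if node.operator == "-":
            return left - right
        raise ValueError(f"unknown operator: {node.operator}")
    if isinstance(node, Call) and node.function == "quote":
        # Do not evaluate the argument, except for the unquote(...) parts.
        return Quote(expand_unquotes(node.arguments[0], env))
    raise TypeError(f"cannot evaluate {node!r}")


def object_to_node(obj):
    """Go backwards: turn an evaluated object into an AST node."""
    if isinstance(obj, Quote):
        return obj.node  # already wraps a node, so just unwrap it
    if isinstance(obj, int):
        return IntegerLiteral(obj)
    raise TypeError(f"cannot convert {obj!r} to a node")


def expand_unquotes(node, env):
    """Copy a quoted AST, replacing every unquote(...) with its evaluated argument."""
    if isinstance(node, Call) and node.function == "unquote":
        return object_to_node(evaluate(node.arguments[0], env))
    if isinstance(node, Infix):
        return Infix(expand_unquotes(node.left, env), node.operator,
                     expand_unquotes(node.right, env))
    if isinstance(node, Call):
        return Call(node.function,
                    tuple(expand_unquotes(arg, env) for arg in node.arguments))
    return node


def unquote(node):
    """Helper to build an unquote(...) call node."""
    return Call("unquote", (node,))


# Body of macro(a, b) { quote(unquote(a) - unquote(b)) }, called as minus(x, y)
body = Call("quote", (Infix(unquote(Identifier("a")), "-", unquote(Identifier("b"))),))
env = {"a": Quote(Identifier("x")), "b": Quote(Identifier("y"))}
minus_result = evaluate(body, env)
print(minus_result.node)  # the AST for x - y

# unquote around a plain computation: quote(unquote(1 + 2))
sum_result = evaluate(
    Call("quote", (unquote(Infix(IntegerLiteral(1), "+", IntegerLiteral(2))),)), {})
print(sum_result.node)  # IntegerLiteral(value=3)
```

The second example is the object-to-node direction in action: 1 + 2 is evaluated to the integer 3, which then has to become an IntegerLiteral node before it can sit inside the quoted AST.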

We have quote and unquote now. We know how to define macros and call them. Now, when exactly do we evaluate these macros? We have three phases in our current implementation of YAL: lexing, parsing and evaluating. Where does this macro evaluation (macro expansion) fit in? We know that macros deal with ASTs, so macro expansion has to come after the parsing phase. Also, macro expansion (code generation) should happen before runtime evaluation. Runtime evaluation is agnostic of macros. As far as it is concerned, there is no such thing as macros.

So, macro expansion deserves its own phase, after parsing and before evaluation. Let’s call this the Macro Expansion phase. Now here is the weird part. Macro expansion is code generation, and we have logic inside our macros. Remember, macros with logic are more powerful than dumb templating macros. What this means is that we are evaluating macros to do code generation. This evaluation comes before the final runtime evaluation phase. So, in a way, we are evaluating at compile time. It sounds weird, but it is just another phase that involves source code evaluation. It just happens to come before the final runtime evaluation phase, so some languages call it evaluation at compile time.

Ok, let’s implement this phase. After parsing, we have our source code represented as an AST in a nice data structure. We will walk this AST, and as we walk:

If we see a macro definition, we
- Mark this statement, so that at the end of this phase, we can remove all macro definitions from the AST before we pass it to the final evaluation phase.
- Bind the macro to the identifier, i.e., evaluate the macro literal and store it in the macro environment.

If we see a macro call expression, we evaluate the macro and replace the macro call expression node with the node wrapped in the quote object that the evaluation returns. This is the expansion part.

We are doing this in a single pass here. We could also use two passes: one for registering macro definitions in the environment and removing them from the source code, and another for macro expansion using the macro environment.
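The single-pass version could be sketched like this in Python. Big caveats: every name here is hypothetical, the program is just a list of statement nodes, and eval_macro_body is a stand-in that only understands bodies of the form quote(...) with unquote(param) inside, where the real thing would run the full evaluator. The sketch also drops macro definitions right away, not at the end of the phase, and because it is a single pass, a macro has to be defined before it is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    name: str


@dataclass(frozen=True)
class Infix:
    left: object
    operator: str
    right: object


@dataclass(frozen=True)
class Call:
    function: str
    arguments: tuple


@dataclass(frozen=True)
class MacroLiteral:
    parameters: tuple
    body: object


@dataclass(frozen=True)
class Let:
    name: str
    value: object


@dataclass(frozen=True)
class Quote:
    node: object


def rewrite(node, fn):
    """Rebuild the tree bottom-up, giving fn a chance to replace each node."""
    if isinstance(node, Infix):
        node = Infix(rewrite(node.left, fn), node.operator, rewrite(node.right, fn))
    elif isinstance(node, Call):
        node = Call(node.function, tuple(rewrite(a, fn) for a in node.arguments))
    elif isinstance(node, Let):
        node = Let(node.name, rewrite(node.value, fn))
    return fn(node)


def eval_macro_body(body, env):
    """Stand-in evaluator: only handles quote(...) containing unquote(param)."""
    if not (isinstance(body, Call) and body.function == "quote"):
        raise TypeError("macro body must evaluate to a quote")

    def splice(node):
        if isinstance(node, Call) and node.function == "unquote":
            return env[node.arguments[0].name].node  # unwrap the quoted argument
        return node

    return Quote(rewrite(body.arguments[0], splice))


def expand_macros(program):
    """Single pass: register macro definitions, expand macro calls everywhere else."""
    macros = {}
    remaining = []

    def expand(node):
        if isinstance(node, Call) and node.function in macros:
            macro = macros[node.function]
            env = {p: Quote(arg) for p, arg in zip(macro.parameters, node.arguments)}
            return eval_macro_body(macro.body, env).node  # replaces the call node
        return node

    for statement in program:
        if isinstance(statement, Let) and isinstance(statement.value, MacroLiteral):
            macros[statement.name] = statement.value  # bind it and drop the definition
            continue
        remaining.append(rewrite(statement, expand))
    return remaining


def unquote(name):
    return Call("unquote", (Identifier(name),))


# let minus = macro(a, b) { quote(unquote(a) - unquote(b)) };
# puts(minus(x, y))
program = [
    Let("minus", MacroLiteral(("a", "b"),
                              Call("quote", (Infix(unquote("a"), "-", unquote("b")),)))),
    Call("puts", (Call("minus", (Identifier("x"), Identifier("y"))),)),
]
expanded = expand_macros(program)
print(expanded)  # one statement left: puts(x - y)
```

A two-pass variant would run the registration loop over the whole program first and only then call rewrite on the remaining statements, which lifts the define-before-use restriction.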

The final AST with the expanded (generated) code is then passed to our macro-agnostic evaluator. While implementing this macro expansion phase, you will probably need an ASTWalker function that traverses the input AST and replaces some target nodes inside it. Make sure this ASTWalker is non-mutating. I learned this the hard way after hours of painful debugging.
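A non-mutating walker can be as small as the sketch below (Python, hypothetical names, with deliberately mutable node classes so that mutation would be possible). One plausible reason mutation hurts, and this is my guess since the post does not spell it out: a macro's body is shared by every call site, so a walker that splices the first call's arguments into it in place would corrupt every later expansion.

```python
from dataclasses import dataclass, field


@dataclass
class Identifier:
    name: str


@dataclass
class Call:
    function: str
    arguments: list = field(default_factory=list)


def walk(node, fn):
    """Return a transformed copy of the tree; the input tree is left untouched."""
    if isinstance(node, Call):
        # Build a fresh Call instead of assigning into node.arguments.
        node = Call(node.function, [walk(arg, fn) for arg in node.arguments])
    return fn(node)


def substitute_x(node):
    """Replace every unquote(...) call with the identifier x."""
    if isinstance(node, Call) and node.function == "unquote":
        return Identifier("x")
    return node


macro_body = Call("quote", [Call("unquote", [Identifier("a")])])
expanded = walk(macro_body, substitute_x)

print(expanded)    # quote(x)
print(macro_body)  # still quote(unquote(a)), ready for the next call site
```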

We are done. We now have macros in YAL. You might ask, what was all this for? Remember the motivation? Something about extending the language. Well, now we can extend our language. Lemme show you some examples.

Now, I can evaluate a program like this in YAL.

puts(ternary(true, 10, 20))
puts(ternary(false, 10, 20))
unless(1 > 10, puts("1 is not greater than 10"))
unless(20 > 10, puts("20 is greater than 10"))
puts(not(1))

The cool part is that the interpreter for YAL does not know anything about these operators - ternary, unless and not. I have not implemented any support for these anywhere, not in the lexer, not in the parser and not in the evaluator. All this is possible because we defined these macros in our program.

let ternary = macro(condition, trueExpr, falseExpr) {
   quote(if (unquote(condition)) { unquote(trueExpr) } else { unquote(falseExpr) })
};

let unless = macro(cond, body) {
   quote(if (!(unquote(cond))) { unquote(body) })
};

let not = macro(expr) {
   quote(!(unquote(expr)))
};

In the macro expansion phase, the macro call expressions got expanded to already known language constructs which the evaluator then evaluated. And just like that, we extended YAL with no modification to the language.

Isn’t it cool?

Jatin Malik

2025/02/18