Dumping a C program’s AST with Psyche-C
Given that the Psyche-C compiler frontend is being renewed, I decided to start a blog about the project. This is my first post.
Without further ado, consider this C program (assume that its name is test.c):
int v, f(int d);
int f(int p)
{
    if (p == v)
        return(p - 1);
    return 0;
}
To produce and dump an AST with Psyche-C, we invoke the cnippet driver adaptor with the command line option --C-dump-AST. The output is as follows:
Now, I’ll briefly compare the above AST with the one produced by Clang for the same test.c program. Let’s take a look at it:
Apart from the colors and extra decoration, what is the main difference between the two?
- Psyche-C’s AST is a bit closer to the grammar of C. For instance, you can’t see declarators in Clang’s AST; they are absorbed by their containing declarations.
Whether this characteristic is good or bad depends on the application. For the purpose of implementing static analyses, I consider the finer syntactic detail of Psyche-C’s AST an advantage.
I’m not saying that every node in an AST should correspond to a terminal of the grammar. If that were the case, we’d end up with a rather concrete syntax (or parse) tree, instead of an abstract syntax tree. A comprehensible AST has just enough of a language’s meaning embedded into it. (The AST produced by the old parser of Psyche-C, for example, was too grammar-oriented.)
Consider this snippet from our test.c program:
int v, f(int d);
In Clang’s AST, it’s represented through a VarDecl and a FunctionDecl; both these nodes inherit from DeclaratorDecl (i.e., a declaration with a declarator). But, in pedantic terms, that representation isn’t quite correct, given that a single declaration (one starting at int and ending at ;) exists in that snippet.
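To make the "single declaration" point concrete, here is a small illustration of my own (it is not taken from Psyche-C or Clang, and the names x, p, a, fp, twice, and use_all are made up for the example). One declaration can carry several declarators, each of which shapes the shared int specifier into a different type:

```c
#include <assert.h>

/* One declaration: the specifier `int` is written once and is shared
   by four declarators. */
int x, *p, a[3], (*fp)(int);

/* Hypothetical helper, only here to give `fp` something to point at. */
int twice(int n) { return 2 * n; }

/* Uses each declared entity according to the type its declarator gives it.
   Returns 2 * 3 + 3 + 3, i.e. 12. */
int use_all(void)
{
    assert(sizeof a == 3 * sizeof x); /* `a` is an array of three ints */
    x = 3;       /* `x` is an int */
    p = &x;      /* `*p` is an int, so `p` is a pointer to int */
    a[0] = *p;   /* `a[i]` is an int, so `a` is an array of int */
    fp = twice;  /* `(*fp)(n)` is an int, so `fp` points to a function */
    return fp(a[0]) + *p + x;
}
```

A tree that keeps the declarators as separate children of one declaration mirrors this source form directly.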
In Psyche-C, I opted for a more rigorous representation in the AST:
- A VariableAndOrFunctionDeclaration is created (as in Clang, this node inherits from a DeclaratorDeclaration) with two child nodes: an IdentifierDeclarator, designating an object of type int, and a FunctionDeclarator, designating a function whose return type is int. This choice makes the AST rewritable with accuracy as well.
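As a rough sketch of why this shape helps with rewriting (this is not Psyche-C's actual API; the struct layout, the params field, and the unparse function are assumptions made for illustration, with only the node names borrowed from the text above), a single declaration node that owns its declarators can be printed back as the one declaration it came from:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds, named after the declarators mentioned above. */
enum DeclaratorKind { IdentifierDeclarator, FunctionDeclarator };

struct Declarator {
    enum DeclaratorKind kind;
    const char *name;
    const char *params; /* parameter list text; used for functions only */
};

/* One declaration node: a shared specifier plus its child declarators. */
struct Declaration {
    const char *specifier;
    const struct Declarator *declarators;
    size_t count;
};

/* Appends src to the string in dst (capacity cap), truncating if needed. */
static void append(char *dst, size_t cap, const char *src)
{
    size_t used = strlen(dst);
    if (used + 1 < cap)
        strncat(dst, src, cap - used - 1);
}

/* Writes the declaration back as source text into buf. */
void unparse(const struct Declaration *decl, char *buf, size_t cap)
{
    if (cap == 0)
        return;
    buf[0] = '\0';
    append(buf, cap, decl->specifier);
    for (size_t i = 0; i < decl->count; ++i) {
        const struct Declarator *d = &decl->declarators[i];
        append(buf, cap, i == 0 ? " " : ", ");
        append(buf, cap, d->name);
        if (d->kind == FunctionDeclarator) {
            append(buf, cap, "(");
            append(buf, cap, d->params);
            append(buf, cap, ")");
        }
    }
    append(buf, cap, ";");
}
```

Building a Declaration with the specifier int and the two declarators v and f (with parameter text int d) and passing it to unparse gives back the original single declaration. With two separate declaration nodes, a rewriter would first have to work out that they share one specifier.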
In general, the design “spirit” of Psyche-C’s AST is more aligned with that of Roslyn, the .NET compiler platform. Still, because Psyche-C is a frontend for C, its AST will, of course, resemble that of Clang.
To wrap up, I’ll leave a C# version of test.c (well, test.cs), together with its AST.
Screenshot from SharpLab.