natechoe.dev

How the C/C++ preprocessor works

I also need to question the complexity of this code. This is professional level coding far above what would be expected of students. Please explain why I am seeing this.

- My actual CS professor referring to my code.

Thanks prof! Allow me to explain!

In Java, if I want to make a class comparable, I just implement the Comparable interface and write a single compareTo method. For something like the Integer wrapper class, that might look like this:

public class Integer implements Comparable<Integer> {
	private int value;

	// NOTE: This code breaks in cases of integer overflow. I like this code
	// because it makes the logic a lot clearer, but if you were actually
	// writing this class you should absolutely take that into
	// consideration.
	public int compareTo(Integer other) {
		return this.value - other.value;
	}
}

Just one function to do this one operation of "comparison".

In C++, on the other hand, you have to overload six separate operators, one for each of the possible comparison operations (at least before C++20 added operator<=>):

class Integer {
private:
	int value;
public:
	bool operator==(Integer& other) {
		return value == other.value;
	}
	bool operator!=(Integer& other) {
		return value != other.value;
	}
	bool operator<(Integer& other) {
		return value < other.value;
	}
	bool operator>(Integer& other) {
		return value > other.value;
	}
	bool operator<=(Integer& other) {
		return value <= other.value;
	}
	bool operator>=(Integer& other) {
		return value >= other.value;
	}
};

This sucks. I don't want to have to write out six different functions for one operation. Luckily, we have the C preprocessor:

class Integer {
private:
	int value;
public:
#define OPERATION(op) \
	bool operator op (Integer& other) { \
		return value op other.value; \
	}
	OPERATION(==)
	OPERATION(!=)
	OPERATION(<)
	OPERATION(>)
	OPERATION(<=)
	OPERATION(>=)
#undef OPERATION
};

Now, I can write out the logic of a single comparator in a macro and have the preprocessor do the busy work of copying it over six times. I can go even farther by putting those macro expansions into their own macro, like this:

#define OPERATIONS OPERATION(==) OPERATION(!=) OPERATION(<) OPERATION(>) OPERATION(<=) OPERATION(>=)
class Integer {
private:
	int value;
public:
#define OPERATION(op) \
	bool operator op (Integer& other) { \
		return value op other.value; \
	}
	OPERATIONS
#undef OPERATION
};

There we go, we've reinvented X macros.

That is not the proper way to overloaded operators. I'm expecting that you use C++ rather than C. That is one of the goals of this class is for you to learn C++

- My CS professor, after I explained this to him (albeit in much less detail).

Even though it comes from C, the C preprocessor is a part of C++. It's an older part of C++ sure, but so are pointers. Still, to understand the preprocessor, you have to understand that it does not come from C++. In fact, it doesn't even come from C. Code preprocessors originally came from assemblers. The preprocessor has absolutely no understanding of the logic of your code. It's basically just copy-paste on steroids.

The clearest example of this is with the #include directive. #include literally just takes one file and pastes it into another. That's why this code works:

// main.cpp

#include <iostream>

int main() {
#include "main-body.cpp"
}

// main-body.cpp

std::cout << "Hello world!" << std::endl;
return 0;

The preprocessor sees #include "main-body.cpp", and copies that file into that location, giving us this:

// main.cpp

#include <iostream>

int main() {
// main-body.cpp

std::cout << "Hello world!" << std::endl;
return 0;
}

iostream is another file on disk. On my system, expanding that out completely gives me 33,732 lines of code, so I've elected not to do that in this small example.

$ echo '#include <iostream>' > nonce.cpp
$ c++ -E nonce.cpp | wc -l
33732

#include is really just the tip of the iceberg. By far the most powerful directive in the entire preprocessor is #define.

#define defines a text replacement macro. That might look like this:

#define msg "Hello world!"
int main() {
	std::cout << msg << std::endl;
	return 0;
}

The macro in line one tells the preprocessor to replace every instance of msg with "Hello world!", resulting in this expansion:

int main() {
	std::cout << "Hello world!" << std::endl;
	return 0;
}

That's literally it. The preprocessor does not understand anything beyond this basic copy-paste. In C, the NULL pointer isn't a variable or a keyword. On my system (the exact definition is implementation-defined), it's actually defined using this macro:

#define NULL ((void *)0)

These sorts of basic macros can also be used for header guards, like this:

#ifndef SOMECLASS_H
#define SOMECLASS_H

class SomeClass {
	// ...
};

#endif

By sandwiching your header files in those three magic lines, you prevent mistakes like this from breaking your code:

#include "SomeClass.h"
#include "SomeClass.h"

The first time you include the file, SOMECLASS_H isn't defined yet, so the preprocessor defines it and keeps the contents. On every later include, the #ifndef check fails and everything up to the #endif gets skipped. Keep in mind that this is not some built-in feature of C++ like if/else, but a pattern that comes as a side effect of several existing features. Most compilers have added a sort of built-in header guard with the #pragma once directive, but this has yet to be standardized.

#pragma once

class SomeClass {
	// ...
};

This code will work almost everywhere, but only almost.

Macros can also have arguments. Here's a super classic example of that:

#define SQUARE(x) x*x
int main() {
	std::cout << SQUARE(5) << std::endl;
	return 0;
}

The preprocessor sees SQUARE, understands that that's a macro expansion, and does a text replacement where x is 5.

int main() {
	std::cout << 5*5 << std::endl;
	return 0;
}

This specific macro is a classic example because it has a fatal flaw:

#define SQUARE(x) x*x
int main() {
	std::cout << SQUARE(5+3) << std::endl;
	return 0;
}

This code should print out 64, but instead it prints out 23. This is because the expansion of SQUARE(5+3) creates this:

int main() {
	std::cout << 5+3*5+3 << std::endl;
	return 0;
}

Now, because multiplication has higher precedence than addition, we actually calculate 5+15+3 = 23.

For this reason, it's recommended that macros with arguments wrap each argument, and the expansion as a whole, in parentheses:

#define SQUARE(x) ((x)*(x))

This still breaks with code like SQUARE(++i) (which causes undefined behavior), but it's definitely a lot better.

Macros can also have variadic arguments.

#include <stdio.h>

#define IGNORE_FIRST_ARG(first_arg, ...) __VA_ARGS__

int main() {
	printf("%d\n", IGNORE_FIRST_ARG("this is ignored", 5));
	return 0;
}

The IGNORE_FIRST_ARG macro takes one named argument followed by any number of extra arguments. Anything beyond the first argument is substituted into __VA_ARGS__.

This is usually used when you want to call an actual function with variadic arguments, like printf.

int fprintf(FILE *stream, const char *pattern, ...);
#define printf(...) fprintf(stdout, __VA_ARGS__)

If we're implementing the C standard library, we can just implement fprintf and get printf for free.

There are some other really cool things we can do though, like argument counting:

#include <stdio.h>

#define FIFTH(a, b, c, d, e, ...) e
#define COUNT_ARGS(...) FIFTH(__VA_ARGS__, 4, 3, 2, 1, 0)

int main() {
	printf("%d\n", COUNT_ARGS(a, b));
	printf("%d\n", COUNT_ARGS(a, b, c));
	return 0;
}

Or even macro overloading:

#include <stdio.h>

// a##b concatenates two arguments, so if a is "OVERLOAD_" and b is "1", then
// a##b is "OVERLOAD_1"
#define CAT_RAW(a, b) a##b

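// we need this extra layer so that the arguments get fully expanded before
// ## pastes them together (## doesn't expand its operands on its own)
// see also blue paint: https://en.wikipedia.org/wiki/Painted_blue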
#define CAT(a, b) CAT_RAW(a, b)

#define FIFTH(a, b, c, d, e, ...) e
#define OVERLOAD(...) CAT(OVERLOAD_, FIFTH(__VA_ARGS__, 4, 3, 2, 1, 0))(__VA_ARGS__)

// #a turns an argument into a string, and C allows you to implicitly
// concatenate strings
#define OVERLOAD_2(a, b) "arg 1: " #a ", arg 2: " #b
#define OVERLOAD_3(a, b, c) "arg 1: " #a ", arg 2: " #b ", arg 3: " #c

int main() {
	printf("%s\n", OVERLOAD(a, b));
	printf("%s\n", OVERLOAD(a, b, c));
	return 0;
}

Don't actually use this trick. It's cool, but it's also very dirty. X macros are already pushing the bounds of what I'd call "clean code", and this goes way too far.

That's about as far as I'm willing to write for this bit. If you really want to learn more about the preprocessor, check out the fantastic preprocessor iceberg meme, which goes into a lot more detail than even this.