Design of Programming Languages
Passion project by Leif Barton
Make sure to visit all the following links to view the whole project
Background
Since elementary school, I have been programming. Sometimes it has been for more useful purposes, such as this website. Sometimes it's just for fun, like this pong game I created in 5th grade. Other times, it’s for utility, such as this HTML compressor that I have recently worked on. I have moved on to other projects such as working with microcontrollers and creating games with the Unity game engine. Even the website you are looking at right now was created by me, using a few languages called HTML, CSS, and JavaScript. Admittedly, I am not programming the nitty-gritty of the web server or the browser's JavaScript engine and HTML renderer, but ultimately the code is mine. To see what I mean, you can right-click anywhere on this page and hit Inspect.
This will show you the HTML of the page you are looking at, and different tabs may show the JavaScript or CSS. Over time, I have used many different programming languages.
What is a Programming Language?
First, let’s take a step back. Those not versed in computer science or another form of computer education may not know what a programming language is, or why they should even care. In essence, a programming language attempts to take a messy, real-world problem and distill it into something a computer can understand (Morffis). The programming language is responsible for turning a standardized bit of text into something that a computer can execute.
In the 1940s, programmers primarily wrote raw machine code, in the form of zeros and ones, to give instructions to a computer. Early pioneers like Konrad Zuse and William Schmitt created lower-level languages that made repetitive tasks much easier (Hierso). These advances, and the advances after them, ensured that any person with a programming language installed could quickly and easily, for instance, print something to the screen, and with a little bit of time, make a complex program. Here are a few examples of printing to the terminal:
- Python:
print('Hello, world!')
- C++:
cout << "Hello, world!" << endl;
- Rust:
println!("Hello, world!");
Core Principles
"If a programming language is regarded as a tool to aid the programmer, it should give him the greatest assistance in the most difficult aspects of his art, namely program design, documentation, and debugging."
C. A. R. Hoare
The bulk of formulating a programming language can be broken down into three main principles: program design, debugging, and documentation.
Program design refers to the structure and soundness of a program written in that programming language. Does the language use classes, structures, traits, or a combination of the three? Does it use something else entirely?
Debugging is easier to understand: can the language give a clear reason as to why something fails? Debugging is important because any program will eventually run into non-trivial mistakes and will require easy viewing of whatever is causing the problem or bug.
Documentation is the ease of adding explanations to code. Imagine a developer who is trying to develop a library to send networking requests. They may add a function called send_request to that library. However, the end user may have no idea how to use this function in their own code. This is why documentation is necessary: it allows developers to explain otherwise obtuse or obfuscated code.
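As a rough illustration of the send_request scenario above, here is a minimal Python sketch of what built-in documentation support can look like. The parameters, defaults, and return value are assumptions made up for this example, and the function is only a stub that describes a request rather than sending one.

```python
def send_request(url, method="GET", timeout_seconds=10.0):
    """Describe an HTTP request that would be sent to `url`.

    This hypothetical stub only builds a description of the request;
    a real library would open a network connection here.

    Args:
        url: Address to contact, for example "https://example.com/api".
        method: HTTP verb to use, such as "GET" or "POST".
        timeout_seconds: How long to wait before giving up.

    Returns:
        A dict with the keys "url", "method", and "timeout_seconds".

    Raises:
        ValueError: If `url` is empty.
    """
    if not url:
        raise ValueError("url must not be empty")
    return {"url": url, "method": method.upper(), "timeout_seconds": timeout_seconds}


# The explanation travels with the function, so a user can read it
# without opening the library's source code.
print(send_request.__doc__)
```

Because the docstring is attached to the function itself, editors and documentation tools can show it to the end user at the point where send_request is called.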
How to Start
To start building a programming language, the programmer must first identify pain points in existing methods of production. These can be gripes about a slow compiler, a language that isn’t fast enough or easy enough to use, or something else entirely. The first programming languages were created by people who were fed up with writing binary code all day, and therefore created a low-level programming language to make designing a program less repetitive (Hierso). In modern language design, Go is a shining example of this principle. Rob Pike, a developer of Go, highlighted the fact that specific pain points were the origin of some of Go's features and design principles: "the properties Go does have address the issues that make large-scale software development difficult" (Pike). Next, new features must be implemented to solve these issues. To combat the complexity of outdated C and C++ systems, the Go language attempted to be as simple as possible to avoid confusion. Python attempted to solve the obfuscation of older languages by making everything readable and intuitive. After that, you will have to figure out whether the language is functional or object-oriented and whether it is statically or dynamically typed.
What People Look For
Adoption of a programming language is critical to its success. Developer surveys, such as the 2025 Stack Overflow Developer Survey, provide insights into what features in programming languages are most sought after. A good index to look for is the admired/desired section, where developers from all over the world indicate which programming languages they would like to use in the future, and which programming languages they use now and like. Here's a short excerpt of the survey, ranked by the admired statistic, that is, the number of developers who admire a given programming language.
- Rust: Type safe, semi-functional programming language that tries to balance safety and efficiency
- Gleam: Type safe, functional programming language that emphasizes safety and runs on multiple platforms
- Elixir: Functional programming language for building fault tolerant systems that are also scalable
- Zig: Focuses on simplicity and interoperability with C, an existing programming language
Some of the terms in the descriptions of these programming languages are quite cryptic. What does being functional mean? What do simplicity, safety, and efficiency mean in the context of programming languages? What is interoperability?
Functional vs. Object Oriented
Functional programming languages can be likened to functions in mathematics. Input in, answer out, no two outputs for one identical input. In essence, this is what functional programming languages attempt to provide. They try to guarantee that for every function call in a program, nothing changes state unless explicitly told to do so. Object-oriented languages, on the other hand, are somewhat different. Functions on objects can mutate, or change, them. Instead of preventing side effects, object-oriented languages focus on ease (Attila Fejér). To illustrate, here’s an example of making an existing string lowercase in two languages, Ruby and Gleam.
# Ruby example
example_string = 'HELlo, WORLd!'
example_string.downcase!
puts example_string
# Expected output: "hello, world!"

// Gleam example
import gleam/io
import gleam/string

pub fn main() {
  let example_string = "HEllo, WORlD!"
  let downcased_string = example_string |> string.lowercase
  io.println(downcased_string)
  // Expected output: "hello, world!"
}

In the Ruby example, you can see that a single string is created and its value is then modified in place. The modified value is printed out. In the Gleam example, which is written in a functional language, the original string cannot be changed, so another variable must be created to store the lowercase variant. Then, the new variable must be printed. Although it may seem somewhat odd and counterintuitive, functional programming languages, as stated above, can prevent the side effects that occur when programmers call a function without knowing how it behaves.
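The same contrast between changing a value in place and returning a new one can be sketched in a single language. The following Python example is only an illustration; the function names lowercase_in_place and lowercased are hypothetical and chosen for this sketch.

```python
def lowercase_in_place(words):
    """Mutating style: changes the list the caller passed in."""
    for index, word in enumerate(words):
        words[index] = word.lower()


def lowercased(words):
    """Functional style: leaves the input alone and returns a new list."""
    return [word.lower() for word in words]


original = ["HELlo", "WORLd"]
lowered = lowercased(original)
print(original)  # ['HELlo', 'WORLd'], the input is untouched
print(lowered)   # ['hello', 'world']

lowercase_in_place(original)
print(original)  # ['hello', 'world'], the caller's list has changed
```

A caller of lowercased never has to wonder whether their data was altered, which is the kind of guarantee functional languages try to make the default.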
Although object-oriented languages like C++, Python, and Ruby are very popular, the Stack Overflow survey suggests that functional programming is more sought-after in recent times.
Dynamically vs Statically Typed
When creating a new language, typing is an essential part. A type describes what kind of data a certain value represents. For example, a variable could be a number, a string (text), a list, or even a custom data structure. What static typing entails is that each variable can only have one type, and that type must be known before the program runs. This is the traditional path, and many languages take this route. However, a different kind of typing exists, called dynamic typing (a style often associated with duck typing). This means that types are only checked while the program is running, so you don’t necessarily have to know what type anything is ahead of time. Again, for illustrative purposes, here is an example, with Python as the dynamically typed language and Rust as the statically typed language.
// Rust example
fn main() {
    let x: String = "Hi! ".into(); // x is now a string
    let y: i32 = 42; // y is now an integer
    // let z = x + y; // ERROR! You cannot add two separate types together.
    let z: String = x + &y.to_string(); // Works because of explicit conversion.
    println!("{z}");
    // Expected output: "Hi! 42"
}

# Python example
x = "Hi! " # x is now a string
y = 42 # y is an integer
# print(x + y) would raise a TypeError, but only once that line runs.
y = str(y) # The same name can be rebound to a value of another type.
print(x + y)
# Expected output: "Hi! 42"

The types used in a language can be indicative of more than just its typing system: they can also hint at whether a language is interpreted or compiled. Interpreted languages operate, roughly speaking, by going through a file line by line and interpreting each line. Compiled languages are translated to machine code ahead of time, meaning the resulting program can run on systems that don’t have the language installed (thus making it easier to distribute). A generally assumed relationship (which is not always true) between a language’s typing and whether it is compiled or interpreted is this: programs written in static languages are compiled to machine code, while programs that are written in dynamic languages run in an interpreter (Tourville).
Building the Language
So, you know what you want and are ready to start building your language. The first thing to decide is which existing language to implement your language in, at least to start out with. Even if your language will compile and could theoretically be self-hosted in the future, the initial version will have to be built with tools from an existing language. For my testing language, TestLang, I used Rust. Even Rust’s first compiler had to be written in another language, OCaml.
Now, the main burden is implementation. A language implementation is built on three things: a lexer, a parser, and an interpreter or compiler (Tourville). There are many different ways to go about creating each of these things, but I will simplify for the sake of brevity. First up is the lexer. This segment of the language is responsible for translating the text in the program into lots of different tokens.
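As a rough sketch of the lexing step described above, the following Python example splits source text into (kind, text) tokens using regular expressions. The token kinds and the keywords let and print belong to a made-up language invented for this illustration, not to any real implementation.

```python
import re

# Hypothetical token kinds for a tiny made-up language.
TOKEN_PATTERNS = [
    ("NUMBER", r"\d+"),
    ("KEYWORD", r"\b(?:let|print)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER_PATTERN = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_PATTERNS)
)


def tokenize(source):
    """Turn source text into a list of (kind, text) tokens."""
    tokens = []
    position = 0
    while position < len(source):
        match = MASTER_PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r} at {position}")
        if match.lastgroup != "SKIP":  # whitespace is recognized but dropped
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


print(tokenize("let total = price + 42"))
```

Listing KEYWORD before IDENTIFIER matters here: both patterns match the text let, and the first alternative that matches wins.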
These tokens are easier to pass around in the implementation than strings, and they can also separate things such as identifiers and keywords. Next up is the parser. The parser takes the tokens generated by the lexer and organizes them logically so that the interpreter or compiler can compute each step of the way without much trouble. The interpreter or compiler, then, uses the output from the parser and acts on it. Of course, these steps are all just examples of how the actual lexer, parser, and interpreter could function. In this example, the whole file is processed before any action is taken. However, implementations of some languages offer something called a REPL (read-eval-print loop), which reads, evaluates, and prints the result of one line at a time.
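The whole pipeline described above can be sketched for a very small case: arithmetic on whole numbers. This Python example is a minimal illustration under that assumption, with a simplified tokenizer that silently ignores characters it does not recognize. The nested-tuple tree format and all function names are choices made for this sketch.

```python
import operator
import re

OPERATIONS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


def tokenize(source):
    """Lexer: split the text into number and symbol tokens."""
    return re.findall(r"\d+|[+\-*/()]", source)


def parse(tokens):
    """Parser: build a nested tuple tree, e.g. ("+", 1, ("*", 2, 3))."""
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def advance():
        nonlocal position
        token = peek()
        position += 1
        return token

    def parse_atom():
        token = advance()
        if token == "(":
            node = parse_sum()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token)

    def parse_product():
        node = parse_atom()
        while peek() in ("*", "/"):
            symbol = advance()
            node = (symbol, node, parse_atom())
        return node

    def parse_sum():
        node = parse_product()
        while peek() in ("+", "-"):
            symbol = advance()
            node = (symbol, node, parse_product())
        return node

    tree = parse_sum()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def evaluate(node):
    """Interpreter: walk the tree and compute its value."""
    if isinstance(node, int):
        return node
    symbol, left, right = node
    return OPERATIONS[symbol](evaluate(left), evaluate(right))


def run(source):
    return evaluate(parse(tokenize(source)))


print(parse(tokenize("1 + 2 * 3")))  # ('+', 1, ('*', 2, 3))
print(run("1 + 2 * 3"))              # 7
print(run("(1 + 2) * 3"))            # 9
```

Calling run on each line a user types, and printing the result, would turn this sketch into a very small REPL.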
Attempt, Process, & Difficulties
After researching what it takes to create a programming language, I attempted to create one myself. I called it TestLang, a portmanteau of Test and (Programming) Language. This is how I went beyond just the research and learned the real-life difficulties in creating a programming language.