~/andi.dev

Write a domain-specific language in JavaScript in a weekend

1 September 2024

A domain-specific language (DSL) is a small language made for a specific purpose. It's the opposite of general-purpose languages like JavaScript, Python, or Go, which are designed to handle any kind of task.

Think of SQL or HTML. SQL is made to more easily query structured data, and HTML is made to more easily represent the structure of a document/website.

We'll be making a DSL called jayson that lets us query JSON objects with a more readable syntax than with regular JS.

Who is this for?

This article is aimed at beginner to intermediate programmers, and at anyone who wants to get their feet wet with making programming languages.

It's a light introduction to parsing and transforming text.

The rabbit hole of making your own languages runs deep. I recommend Crafting Interpreters as a more in-depth resource.

Why make your own DSL?

Because it's cool. It's also much easier to do than making a general language. And it might even be a useful tool to be able to whip out in certain cases.

Why in a weekend?

I really liked the approachable nature of Ray Tracing in One Weekend. It condensed a complex subject into a project that can be finished in a few days.

This will take less than that, a few hours most likely.

What does it look like?

This is how you would query a list of objects in jayson:

scores
  score > 1000
  desc name.firstName
  grab name.lastName

instead of:

scores
  .filter(s => s.score > 1000)
  .sort((a, b) => -a.name.firstName.localeCompare(b.name.firstName))
  .map(s => s.name.lastName)
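
To see the plain-JS version in action, here's a small runnable sketch. The scores array and the names in it are made up for illustration; only the property names (score, name.firstName, name.lastName) come from the queries above. The comparison uses the standard String.prototype.localeCompare.

```javascript
// Hypothetical sample data: only the property names come from the article.
const scores = [
  { score: 1500, name: { firstName: "Ana", lastName: "Silva" } },
  { score: 900, name: { firstName: "Ben", lastName: "Okoro" } },
  { score: 2000, name: { firstName: "Cleo", lastName: "Marsh" } },
];

const lastNames = scores
  .filter((s) => s.score > 1000) // score > 1000
  .sort((a, b) => -a.name.firstName.localeCompare(b.name.firstName)) // desc name.firstName
  .map((s) => s.name.lastName); // grab name.lastName

console.log(lastNames); // ["Marsh", "Silva"]
```

Each line of the jayson query maps to one array method, which is the readability gain we're after.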

Overview

It's probably best to start a DSL by planning out exactly what it will do and how that will be accomplished.

We'll do the opposite of that for this project. We'll start by implementing a super-basic version of the language, and build up more features on top of it as we go along.

Here's a short list of what you should figure out when you start your next DSL project:

  1. Determine scope
    • What features are supported?
    • What features are explicitly not supported?
  2. Design the syntax
    • What will the DSL look like?
    • Does it have JS-style curly braces or significant whitespace like Python?
    • What are the keywords we're supporting?
  3. Implementation
    • Which language will it be implemented in? (also called the "host language")
    • How do we transform the DSL into host language instructions?
    • Will the DSL features require run-time functionality?

From text to code

There's a very simple pipeline that all DSLs have to go through. The details can be complicated but at a high level it's always just text -> code.

You write some text somewhere, maybe a file, maybe a string. Some code then picks that up, extracts all of the information from the text and turns it into more code that does something.

The first step of building jayson is to set up that pipeline.

To keep things easy in the beginning, we'll be writing the jayson code directly as JS strings.

What's our pipeline?

We need two things to get started: a JSON object to query, and a jayson string.

Fire up your editor and create a jayson.js file:

jayson.js
const o = {
  name: "My cool object",
  isItCool: true,
}

const jayson = "grab name"

o will be our queryable object, and we'll store the code for the jayson language in the jayson variable.

grab is the first bit of functionality that we'll implement.

Given the keyword grab and a property name, it will select that property from the object.

So the code grab name should return "My cool object".

In the same file, after the declarations, we'll create a function that can take the input and return the right output:

jayson.js
const o = {
  name: "My cool object",
  isItCool: true,
};
const jayson = "grab name";
const query = (json, input) => {
  // logic will go here
}
query(o, jayson); // will return "My cool object"

Parsing the jayson input

The query function should make sure that the jayson code is in the right format, and then extract the relevant information from it so that we can use it.

jayson.js
const query = (json, input) => {
  const [keyword, property] = input.split(" ");
  if(keyword !== "grab") 
    throw Error("Invalid jayson syntax.");
  return json[property];
}

Look at that, your first DSL! And it only took 5 lines of code.

If you run console.log(query(o, jayson)) you should see "My cool object" being logged.

Now try to replace the jayson string to grab the other property.

What happens if you do grab someRandomProp?
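
If you want to check your answers, the sketch below repeats the query function from above and tries a few inputs. The keyword "fetch" is a made-up example of an unsupported command, not something jayson defines.

```javascript
const o = {
  name: "My cool object",
  isItCool: true,
};

// Same query function as above
const query = (json, input) => {
  const [keyword, property] = input.split(" ");
  if (keyword !== "grab") throw Error("Invalid jayson syntax.");
  return json[property];
};

console.log(query(o, "grab isItCool")); // true
console.log(query(o, "grab someRandomProp")); // undefined: the property doesn't exist

// Anything other than "grab" is rejected. "fetch" is just a made-up keyword.
let message;
try {
  query(o, "fetch name");
} catch (error) {
  message = error.message;
}
console.log(message); // "Invalid jayson syntax."
```

Note that a missing property is not treated as an error here: JavaScript returns undefined for unknown keys, and query passes that straight through.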

Nested properties

Modify the object we're querying, o, so it has a nested property:

jayson.js
const o = {
  name: "My cool object",
  isItCool: true,
  metadata: {
    coolnessScore: 100,
  }
};

Running grab metadata.coolnessScore will return undefined.

The query function takes the whole of metadata.coolnessScore as one string and uses that as a property, which doesn't exist.
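
A quick way to convince yourself: bracket access treats the string as a single key, dots included. The snippet below is a small standalone check, not part of jayson.js.

```javascript
const o = {
  name: "My cool object",
  isItCool: true,
  metadata: {
    coolnessScore: 100,
  },
};

// One lookup with the whole string: there is no key literally named "metadata.coolnessScore"
const oneLookup = o["metadata.coolnessScore"];

// Two lookups, one per path segment
const twoLookups = o["metadata"]["coolnessScore"];

console.log(oneLookup); // undefined
console.log(twoLookups); // 100
```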

Instead, it should split the property path on '.' and perform one lookup per segment:

jayson.js
const query = (json, input) => {
  const [keyword, property] = input.split(" ");
  if(keyword !== "grab") 
    throw Error("Invalid jayson syntax.");

  // Updated code
  const nested = property.split(".");

  let result = json; // result starts out as the full object

  for (const property of nested) {
    // as we loop through the properties, first "metadata", then "coolnessScore",
    // we set the result as each nested query that we performed
    result = result[property];
  }
  return result;
}

The for loop is doing the heavy lifting for handling the nested props. Here it is explained visually:


Before the loop starts, result is assigned the full object:

                          ┌───────────────────┐
                     ┌──► │ Object            │
let result = json;   │    │┌─────────────────┐│
                     │    ││ metadata        ││
result ──────────────┘    ││┌───────────────┐││
                          │││ coolnessScore │││
                          ││└───────────────┘││
                          │└─────────────────┘│
                          └───────────────────┘

Each iteration then moves result one level deeper: first to metadata, then to coolnessScore.
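
One thing the loop doesn't handle is a path whose middle part is missing. A query like grab metadata.author.name would try to read a property from undefined and throw a TypeError. Below is a minimal sketch of a more forgiving lookup. getPath is a hypothetical helper name, not part of the article's code, and returning undefined for a missing path is an assumption about the behavior you'd want.

```javascript
// Hypothetical helper: walks a dotted path, stopping early if a step is missing.
const getPath = (object, path) => {
  let result = object;
  for (const property of path.split(".")) {
    // null and undefined have no properties, so stop here instead of throwing
    if (result === null || result === undefined) return undefined;
    result = result[property];
  }
  return result;
};

const o = {
  name: "My cool object",
  isItCool: true,
  metadata: {
    coolnessScore: 100,
  },
};

console.log(getPath(o, "metadata.coolnessScore")); // 100
console.log(getPath(o, "metadata.author.name")); // undefined instead of a TypeError
```

Whether a missing path should quietly return undefined or raise a jayson-specific error is a design choice. The rest of this article keeps the simpler loop.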

Multiple properties

How do we grab multiple properties from one object? We can't. Not yet at least.

The next extension to the grab feature is adjusting the syntax so that running grab isItCool, metadata.coolnessScore returns both values.

But grabbing two or more properties from an object produces two or more values, and a JavaScript function can only return one.

We'll have to wrap the values in either an array or an object:

[true, 100] or {isItCool: true, "metadata.coolnessScore": 100}

The object wrapping is more readable so we'll go with that.

For consistency, we'll change the single property grab to return an object too.

grab isItCool will return {isItCool: true} instead of true.

jayson.js
...
const query = (json, input) => {
  const [keyword, ...properties] = input.split(/, |,| /);
  if (keyword !== "grab") throw Error("Invalid jayson syntax.");

  const aggregate = {};

  for (const propertyPath of properties) {
    const nested = propertyPath.split(".");
    let result = json;
    for (const property of nested) {
      result = result[property];
    }
    aggregate[propertyPath] = result;
  }

  return aggregate;
};
query(o, "grab isItCool, metadata.coolnessScore")

There have been a few changes. Here they are, explained line by line:


const [keyword, ...properties] = input.split(/, |,| /);

We are now splitting the input by a regex (/, |,| /) that matches: a comma followed by a space, a comma, or a space.

Given an input "grab isItCool, name, otherProp", it will split it into ["grab", "isItCool", "name", "otherProp"].
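
To get a feel for the regex, here's what it produces for a few inputs. The last part shows one possible alternative separator, /[\s,]+/, which is my own assumption and not what the article's code uses; it treats any run of whitespace and commas as a single separator.

```javascript
const separator = /, |,| /;

const spaced = "grab isItCool, name, otherProp".split(separator);
console.log(spaced); // ["grab", "isItCool", "name", "otherProp"]

const compact = "grab isItCool,name".split(separator);
console.log(compact); // ["grab", "isItCool", "name"]

// Caveat: a space before the comma leaves an empty string in the result,
// because the space and the ", " are matched as two separate separators.
const stray = "grab isItCool , name".split(separator);
console.log(stray); // ["grab", "isItCool", "", "name"]

// Assumed alternative (not from the article): collapse runs of whitespace and commas.
const forgiving = "grab isItCool , name".trim().split(/[\s,]+/);
console.log(forgiving); // ["grab", "isItCool", "name"]
```

For the inputs used in this article, the original regex is enough.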

We still get the first item in the list and assign it to keyword, but we gather the rest of the items into a properties array.


const aggregate = {};

This is the object that will hold the final result, no matter if only one or multiple properties are grabbed.


for (const propertyPath of properties)

We're now looping through the possibly multiple properties we extracted.

For each one of them, we perform the nesting logic that was implemented before.


aggregate[propertyPath] = result;

Previously, we would return the result. Since we're dealing with multiple properties now, instead of returning it we add it to the aggregate object, keyed by the full propertyPath.

The propertyPath is the un-split string.

In grab isItCool, metadata.coolnessScore the propertyPath will become each of: "isItCool" and "metadata.coolnessScore" in the loop.
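
Because the keys are the full, un-split paths, a nested value has to be read back with bracket notation. The sketch below repeats the query function from this section to show what the returned object looks like.

```javascript
const o = {
  name: "My cool object",
  isItCool: true,
  metadata: {
    coolnessScore: 100,
  },
};

// Same multi-property query function as above
const query = (json, input) => {
  const [keyword, ...properties] = input.split(/, |,| /);
  if (keyword !== "grab") throw Error("Invalid jayson syntax.");

  const aggregate = {};
  for (const propertyPath of properties) {
    let result = json;
    for (const property of propertyPath.split(".")) {
      result = result[property];
    }
    aggregate[propertyPath] = result;
  }
  return aggregate;
};

const result = query(o, "grab isItCool, metadata.coolnessScore");

console.log(result.isItCool); // true
console.log(result["metadata.coolnessScore"]); // 100
console.log(result.metadata); // undefined: the result is flat, not nested
```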


Running queries on lists of objects

The code we've written so far assumes that the json input will be an object.

We should support running queries on lists as well.

query(
  [
    {label: "object 1", value: 23},
    {label: "object 2", value: 57},
  ],
  "grab value"
)

This should return [{value: 23}, {value: 57}].

We'll add an extra variable in the query function:

jayson.js
...
const list = Array.isArray(json);
const aggregate = {};
...

We can use list to check if we should handle the querying as an object or as an array.

Next, we'll move the code that queries the object into a new function. This new function still lives inside the query function.

jayson.js
...
const list = Array.isArray(json);

// From declaring aggregate to the return, it's all in here now
const queryObject = (toQuery) => {
  const aggregate = {};
  for (const propertyPath of properties) {
    const nested = propertyPath.split(".");

    // This has changed from `result = json;`
    // we're referencing the argument of the `queryObject` function now
    let result = toQuery;
    for (const property of nested) {
      result = result[property];
    }
    aggregate[propertyPath] = result;
  }
  return aggregate;
}

We've moved the aggregate logic inside the queryObject function, and there's no more return in the query function.

jayson.js
const list = Array.isArray(json);
const queryObject = (toQuery) => {
  ...
}

// New code:
if (list)
  return json.map(queryObject);
return queryObject(json);

We're using list again to check if we should handle the json as a list or as an object. If it's a list, we map through all of the objects inside the list, and we call queryObject for each one.

Otherwise, when json is an object, we call queryObject directly on it.

This is the full code up until now:

jayson.js
const query = (json, input) => {
  const [keyword, ...properties] = input.split(/, |,| /);
  if (keyword !== "grab") throw Error("Invalid jayson syntax.");
  
  const list = Array.isArray(json);

  const queryObject = (toQuery) => {
    const aggregate = {};
    for (const propertyPath of properties) {
      const nested = propertyPath.split(".");
      let result = toQuery;
      for (const property of nested) {
        result = result[property];
      }
      aggregate[propertyPath] = result;
    }
    return aggregate;
  }

  if (list)
    return json.map(queryObject);
  return queryObject(json);
};
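
Here's a short usage sketch for the full function, covering both a list and a single object. The items array is made-up sample data that extends the earlier list example with a nested metadata property.

```javascript
const query = (json, input) => {
  const [keyword, ...properties] = input.split(/, |,| /);
  if (keyword !== "grab") throw Error("Invalid jayson syntax.");

  const list = Array.isArray(json);

  const queryObject = (toQuery) => {
    const aggregate = {};
    for (const propertyPath of properties) {
      const nested = propertyPath.split(".");
      let result = toQuery;
      for (const property of nested) {
        result = result[property];
      }
      aggregate[propertyPath] = result;
    }
    return aggregate;
  };

  if (list) return json.map(queryObject);
  return queryObject(json);
};

// Hypothetical sample data
const items = [
  { label: "object 1", value: 23, metadata: { coolnessScore: 100 } },
  { label: "object 2", value: 57, metadata: { coolnessScore: 42 } },
];

// A list in, a list out: one result object per item
const fromList = query(items, "grab label, metadata.coolnessScore");
console.log(fromList);
// [
//   { label: "object 1", "metadata.coolnessScore": 100 },
//   { label: "object 2", "metadata.coolnessScore": 42 }
// ]

// A single object in, a single result object out
const fromObject = query(items[0], "grab value");
console.log(fromObject); // { value: 23 }
```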

Implementing more commands

This article is still in progress...

We'll add filter and order commands to the jayson language next time.

Check back in the near future or subscribe to get an email when the post is updated. I don't send spam.

If you spot any issues or want to leave a comment, shoot me an email at hi@andi.dev. I'll add interesting comments to the bottom of this article.

Design inspired by (and copied from): owickstrom.github.io/the-monospace-web