Creating Test Files from Markdown Documentation Strings

This article is written for someone who has some familiarity with ReasonML.

I have been writing documentation for Relude, a standard library replacement ("prelude") written in ReasonML, targeting compilation to JavaScript. I’m using Markdown to document the library functions. Here’s what the documentation looks like for the length() function for arrays.

/**
  `length(xs)` returns the number of items in `xs`.

  ## Example
  ```re
  length([|"a", "b", "c"|]) == 3;
  length([| |]) == 0;
  ```
*/
let length: array('a) => int = Belt.Array.length;

To make sure that my example code was correct, I was copying and pasting the example code (between the lines with ```re and ```) into a ReasonML file, surrounding it with Js.log() calls, and running it:

  Js.log(length([|"a", "b", "c"|]) == 3);
  Js.log(length([| |]) == 0);

This soon became tiresome, it was error-prone, and the result was a list of true (or false), with no real clue of which test was which. I decided to write a program to extract the examples for me and add the console logging, which included the test itself:

Js.log("================");
Js.log2("length([|\"a\", \"b\", \"c\"|]) == 3; ",
  length([|"a", "b", "c"|]) == 3);
Js.log2("length([| |]) == 0; ",
  length([| |]) == 0);

I’ve written a command-line program to do this. The module name must be passed as the last argument so that the program can create an open for the module at the beginning of the test file. The program is run like this:

node MakeTests.bs.js path/to/Module.re Module > Tests.re

The Main Program

Here’s the main program code, which gets the command-line arguments, checks them for validity, and calls a function to process the file:

let nodeArg = Belt.Array.getExn(Node.Process.argv, 0);
let progArg = Belt.Array.getExn(Node.Process.argv, 1);
let fileOpt = Belt.Array.get(Node.Process.argv, 2);
let moduleNameOpt = Belt.Array.get(Node.Process.argv, 3);

switch (fileOpt, moduleNameOpt) {
  | (Some(inFileName), Some(moduleName)) => processFile(inFileName, moduleName)
  | ( _, _) =>
    Js.log("Usage: " ++ nodeArg ++ " " ++ progArg ++ " ModuleName.re ModuleName")
};

The first two lines get the first two command-line arguments: the path to node and the name of our program. The Belt.Array.getExn() function throws an exception if the item in the array doesn’t exist, but I know that can’t happen here: you must have specified node and the program name, or the program wouldn’t be running at all!

The next two lines are a different story, as the person running the program might leave out the file and module name. In this case, we don’t want to throw an exception or (possibly worse) get a null or undefined to haunt us. Instead, we use Belt.Array.get(), which returns an option type result: Some(value) if the array item exists, None if it doesn’t. When you have an option type, ReasonML makes sure you handle both the Some() and None values. That’s what the switch statement does. If the user provided both arguments, it extracts them from their Some() wrappers and passes them on to the processFile() function. Otherwise, it prints a usage message telling people how to run the program.
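As a minimal sketch of how Belt.Array.get() behaves, here is the same pattern applied to a made-up array that stands in for Node.Process.argv. The names args and describe are hypothetical and exist only for this illustration:

```reason
/* A stand-in for Node.Process.argv; the values are made up. */
let args = [|"node", "program", "input.re"|];

/* Belt.Array.get returns an option, so a missing item becomes None
   instead of an exception or undefined. */
let describe = (index: int): string =>
  switch (Belt.Array.get(args, index)) {
  | Some(value) => "argument " ++ string_of_int(index) ++ " is " ++ value
  | None => "argument " ++ string_of_int(index) ++ " is missing"
  };

Js.log(describe(2)); /* prints: argument 2 is input.re */
Js.log(describe(3)); /* prints: argument 3 is missing */
```

If the None branch were left out of the switch, the compiler would warn that the pattern match is not exhaustive.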

Processing the Source File

Now is where I have to step back and plan what it means to “process the file.” While I am scanning the source file, I need to keep track of three things:

Whether I am in example code or not.
All of the lines in the current example.
The resulting output string.

Here are the types I defined for that purpose:

type scanStateType =
  | Scanning
  | InExample;
  
type stateType = {
  scanState: scanStateType,
  exampleLines: array(string),
  result: string
};

Why not use a bool variable instead of creating a type for the scanning state? At this point in my planning (and, it turned out, in the final program), I had only two states, but I had the nagging feeling that I might need more than two. I might still need more than two if I want to add capabilities to the program.

The stateType is a record type, which is the ideal choice for a collection of heterogeneous data.

Now that I have my types, I can write the code to process a file: read in the file as one large string, split into an array of lines, set up an initial state, and then use reduce() to convert the array of strings into the result string. The initial result starts out with the open for the module:

 1 let processFile = (inFileName:string, moduleName: string): unit => {
 2   let fileContents = Node.Fs.readFileAsUtf8Sync(inFileName);
 3   let lines = Js.String.split("\n", fileContents);
 4   let init = {
 5     scanState: Scanning,
 6     exampleLines: [| |],
 7     result: "open " ++ moduleName ++ ";\n\n"
 8   };
 9
10   let finalResult =
11     Belt.Array.reduce(lines, init, lineReducer);
12
13   Js.log(finalResult.result);
14 };

The initial state (lines 4-8) is a variable of type stateType. ReasonML’s inference engine figured that out without me having to declare it. I could have made it explicit by changing line 4 to:

let init: stateType = {

Processing a Line

On to the lineReducer() function, whose logic is as follows: If I’m scanning and encounter a ```re, I switch my scan state to be “in an example” and set the lines of the example to the empty array (as I haven’t gotten to those lines yet).

 1 let lineReducer = (acc: stateType, line: string): stateType => {
 2   switch (acc.scanState) {
 3     | Scanning => {
 4         if (Js.Re.test_([%re "/```re/"], line)) {
 5           {...acc,
 6             scanState: InExample,
 7             exampleLines: [| |]
 8           }
 9         } else {
10           acc;
11         }
12       }

Line 4 shows how to create a regular expression in ReasonML. In lines 5-8, I set the new accumulator value. The code ...acc uses the spread operator, which means “use all the fields in acc as they currently are.” I then set new values for only the changed fields. Without the spread operator, I would have had to update all the fields in the accumulator:

  {
    result: acc.result, /* keep result field unchanged */
    scanState: InExample, /* update other fields */
    exampleLines: [| |]
  }

In this program, spread doesn’t save me a lot of effort, as the stateType record has only three fields. If I had a record with many more fields, though, this would have been an indispensable shortcut.
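One more property of spread is worth a quick sketch: it builds a new record and leaves the original untouched. The point3d type and the values below are hypothetical, chosen only to keep the example self-contained:

```reason
/* A hypothetical record type, not part of the program. */
type point3d = {
  x: int,
  y: int,
  z: int,
};

let origin = {x: 0, y: 0, z: 0};

/* Copy every field from origin, then override only z. */
let raised = {...origin, z: 5};

Js.log(raised.z); /* prints: 5 */
Js.log(origin.z); /* prints: 0, because origin itself is unchanged */
```

This is what makes spread a good fit for a reducer: each step returns a fresh accumulator rather than modifying the previous one.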

If I’m in an example and encounter ```, I process the example lines (by adding the Js.log and other cleanup) and return to scanning for another example (lines 3-9). Otherwise, I trim off leading and trailing spaces from the line and add it to the example lines (lines 11-14).

 1     | InExample => {
 2         if (Js.Re.test_([%re "/```/"], line)) {
 3           {...acc,
 4             result: acc.result
 5               ++ "Js.log(\"================\");\n"
 6               ++ processExampleLines(acc.exampleLines)
 7               ++ "\n",
 8             scanState: Scanning
 9           }
10         } else {
11           {...acc,
12              exampleLines: Belt.Array.concat(acc.exampleLines,
13                [| Js.String.trim(line) |])
14            }
15          }
16       } 
17     }
18   };

Processing an Example

I can’t just blindly surround each line of the example with a Js.log2() call, for two reasons:

Some of the examples take up more than one line.
Some of the examples may contain a double quote mark, which must be escaped.

Let’s tackle the second problem first. For readability, I created a function to escape quote marks:

let escapeQuotes = (s: string): string => {
  Js.String.replaceByRe([%re "/\"/g"],"\\\"", s);
};
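A quick check of what the function produces, as a sketch with a made-up input string (the names original and escaped are hypothetical; the function is repeated so the snippet stands alone):

```reason
let escapeQuotes = (s: string): string => {
  Js.String.replaceByRe([%re "/\"/g"], "\\\"", s);
};

/* A made-up example line containing two double quotes. */
let original = "length([|\"a\"|]) == 1;";
let escaped = escapeQuotes(original);

/* Each double quote now has a backslash in front of it. */
Js.log(escaped); /* prints: length([|\"a\"|]) == 1; */
```

Note that this only escapes double quotes. An example that contains a backslash inside a string would need additional handling, which this sketch does not attempt.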

The first problem is solved by accumulating all the lines until I find one with an ending semicolon. (I have been very careful to make sure that all statements in the examples end with semicolons.) Once I have a “complete line,” I put it into Js.log2() twice: once as a label string with its quotes escaped, and once as the expression to evaluate, with the trailing semicolon removed. This doesn’t apply to let statements, which I use to set up functions that the example code will use; those are copied to the output unchanged.

If the code line isn’t complete—I haven’t encountered the semicolon—it’s a multi-line statement, so I add it to the statement I’m accumulating.

This is, again, a job for reduce, but this time I have two things to keep accumulating: the result of processing all the example lines, and the statement that is currently being built. Rather than create a new type, my accumulator is a two-tuple: (result, stmt).
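The two-tuple accumulator idea can be sketched in miniature, apart from the program itself. In this hypothetical example, words are collected into sentences, and a word ending in a period plays the role of the semicolon. The names step, sentences, and leftover are made up for the illustration:

```reason
/* Accumulator: (finished sentences, sentence currently being built). */
let step = ((finished, current), word) => {
  let candidate = current ++ word;
  if (Js.String.endsWith(".", candidate)) {
    /* Sentence complete: store it and start a new, empty one. */
    (Belt.Array.concat(finished, [|candidate|]), "");
  } else {
    /* Not complete yet: keep building. */
    (finished, candidate ++ " ");
  };
};

let (sentences, leftover) =
  Belt.Array.reduce([|"Hello", "world.", "Bye."|], ([||], ""), step);

Js.log(sentences); /* prints the two sentences "Hello world." and "Bye." */
Js.log(leftover == ""); /* prints: true */
```

A tuple works well here because the accumulator is small and used in only one place; a record would be worth the extra type definition if it grew more fields.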

Here’s the finished code to process the example lines:

 1 let processExampleLines = (arr: array(string)): string => {
 2   let endStmtPattern = [%re "/;\\s*$/"];
 3
 4   let helper = ((result, stmt), item) => {
 5     let full_stmt = stmt ++ item;
 6     if (Js.Re.test_(endStmtPattern, full_stmt)) {
 7       if (Js.Re.test_([%re "/^let\\s+/"], full_stmt)) {
 8         (result ++ full_stmt ++ "\n", "")
 9       } else {
10         (result
11           ++ "Js.log2(\""
12           ++ escapeQuotes(full_stmt)
13           ++ " \",\n  "
14           ++ Js.String.replaceByRe(endStmtPattern, "", full_stmt)
15           ++ ");\n",
16           "")
17       }
18     } else {
19       (result, stmt ++ item ++ "\n  ")
20     }
21   };
22   let (result, _) = Belt.Array.reduce(arr, ("", ""), helper);
23   result;
24 };

A few things to note:

The pattern for detecting an end of statement (line 2) allows blanks after the semicolon.
Line 4 uses destructuring to bind the variable names result and stmt to the elements of the tuple.
let statements aren’t enclosed in Js.log2() (lines 7-8).
The tuple that is returned if I detect the end of a statement sets the “complete line accumulator” (the second element of the tuple in lines 8 and 16) to the empty string—I am starting a new statement.
In line 22, I bind the result tuple from reduce to (result, _). I am interested in the final result, but don’t care about the last statement that was processed. The underscore is ReasonML’s way of saying, “ignore this.”

Conclusion

And there you have it: a short but useful ReasonML program that will save me a lot of work. You can see the code on GitHub.