Linguistic Syntax in L4

How L4 borrows from natural language linguistics to make legal code readable.


Overview

Most programming languages borrow their syntax from mathematics and earlier programming languages. L4 takes a different path: it borrows heavily from natural language linguistics. This is a deliberate design choice rooted in L4's core principle of legal isomorphism -- the idea that L4 code should structurally mirror the legal text it encodes.

Legal text is written in natural language. If L4 is to mirror that structure, its syntax must be able to express the same patterns that natural language uses: possessives, ellipsis, coordination without conjunctions, and grammatical scaffolding that aids comprehension without changing meaning.

This page explains five linguistic features that set L4 apart from conventional programming languages, and shows how they combine to let L4 encode statutes and contracts in a form that legal professionals can read and verify.


The Saxon Genitive ('s)

In English, we say "the person's age" -- not "the age of the person" and certainly not "person.age". The possessive clitic 's (known in linguistics as the Saxon genitive or genitive clitic) is one of the most natural ways to express that something belongs to or is an attribute of something else.

L4 adopts this directly for record field access:

person's age
contract's buyer's name

In most programming languages, the equivalent would be dot notation: person.age or contract.buyer.name. The dot is a convention inherited from mathematics and early computing. It is concise but opaque to anyone outside the programming world.

The Saxon genitive, by contrast, is immediately legible to anyone who reads English. A lawyer reviewing L4 code sees "the contract's buyer's name" and understands it without explanation. This is not a cosmetic choice -- it directly serves the goal of making L4 code reviewable by domain experts who are not programmers.

Technical detail: The 's operator chains naturally, just as it does in English. company's ceo's name accesses the name field of the ceo field of the company record. When used as an argument to a function, wrap in parentheses: f (record's field).

For the full syntax specification, see the Genitive reference.


The Caret Operator (^) and Gapping

Consider a decision table that maps scores to grades:

grade MEANS
    BRANCH
        IF score >= 90 THEN "A"
        ^  ^     >= 80 ^    "B"
        ^  ^     >= 70 ^    "C"
        ^  ^     >= 60 ^    "D"
        ^  ^     >= 50 ^    "E"
        OTHERWISE           "F"

Each ^ stands for the token that appeared in the same position on the line above. The second line reads as IF score >= 80 THEN "B", the third as IF score >= 70 THEN "C", and so on.

This is a computational implementation of gapping -- a well-studied phenomenon in linguistics where repeated words are omitted in parallel structures. In the sentence "Alice ordered coffee, and Bob tea," the verb "ordered" is gapped in the second clause because it would be redundant. The listener fills it in automatically.

L4's caret operator makes gapping explicit. Rather than leaving gaps unmarked (which would be ambiguous in a programming language), the ^ character serves as a visible placeholder that says "same as above."

The effect is that decision tables become visually clean. The eye can scan down the columns to see what varies (the threshold and the grade) without being distracted by what stays the same (the keywords IF, THEN, and the variable score). In Edward Tufte's terms, this improves the data-ink ratio -- the proportion of visual elements that carry information versus those that are merely structural repetition.

For more examples, see the Ditto reference.


Asyndetic Conjunction and Disjunction (... and ..)

In linguistics, syndetic coordination joins clauses with an explicit conjunction: "A and B and C." Asyndetic coordination joins them without one: "A, B, C" -- the conjunction is implied by context.

L4 supports both styles. The keywords AND and OR provide syndetic coordination:

IF age >= 18 AND hasValidID

The operators ... (three dots) and .. (two dots) provide asyndetic coordination -- implicit AND and implicit OR, respectively:

DECIDE `commits cheating` IF
        `by deceiving any person`
   ...  p1   ...
                  `fraudulently`
              OR  `dishonestly`
              ..  "induces the person so deceived"
         ...    "to" ...     `deliver`
                         OR  `cause the delivery`
                         ..  "of any property to any person"

Here, ... joins elements with AND and .. joins elements with OR, but without spelling out the conjunction keyword each time.

Why does this matter? Because legislation itself uses asyndetic coordination extensively. A statute might read:

...fraudulently or dishonestly induces the person so deceived to deliver, or cause the delivery of, any property to any person...

The nested structure of conjunctions and disjunctions is implicit in the legal text. L4's ellipsis operators let the formalization preserve that same implicit structure, rather than forcing it into a fully explicit form that would look nothing like the original. This is statutory isomorphism in practice: the L4 encoding mirrors the structure of the statute it encodes.

The choice of three dots for AND and two dots for OR is a convention: the more common operation (conjunction) gets the longer symbol.


Inert Elements: Bare Strings as Scaffolding

Legal text is full of words and phrases that carry grammatical meaning for human readers but have no effect on the logical outcome. Connecting phrases like "induces the person so deceived" or "to any person" are essential for comprehension but are not boolean conditions.

L4 handles this with inert elements -- bare string literals that appear in boolean expressions:

         ...    "to" ...     `deliver`
                         OR  `cause the delivery`
                         ..  "of any property to any person"

The strings "to" and "of any property to any person" are inert. They appear in the code, they are preserved in the AST for visualization and explanation traces, but they do not affect the boolean result. During evaluation, an inert element becomes the identity value for its containing operator: TRUE in an AND context (because TRUE AND x = x), FALSE in an OR context (because FALSE OR x = x).

This is a precise semantic choice, not a hack. The identity value is the unique value that, when combined with the operator, leaves the other operand unchanged. Inert elements are mathematically invisible while remaining linguistically present.

The practical effect is significant: L4 code can include the connecting tissue of legal language -- the prepositions, the relative clauses, the explanatory phrases -- so that reading the code feels like reading the statute. The visualization and explanation tools can then present these phrases to end users, producing output that reads naturally rather than as a bare logical formula.

Inert elements are distinguished from active boolean conditions by their quoting style. Backtick identifiers like `deliver` are resolved to boolean variables. Double-quoted strings like "of any property to any person" are treated as inert scaffolding.


Backtick Identifiers

Most programming languages restrict identifiers to single words composed of letters, digits, and underscores: isPersonEligible, has_valid_id, calculateTotalAmount. These conventions are efficient for programmers but opaque to everyone else.

L4 allows multi-word identifiers enclosed in backticks:

DECIDE `the person is eligible for benefits` IF
    `the person is a citizen`
    AND `the person has resided for at least 5 years`
    AND NOT `the person has been disqualified`

This reads like structured English. A legal professional reviewing this code can understand what each condition means without needing to decode camelCase or learn naming conventions.

Backtick identifiers serve a deeper purpose than readability alone. They enable mixfix syntax, where function names and parameters can be interspersed:

GIVEN mom IS A STRING
      dad IS A STRING
      kid IS A STRING
mom and dad `have a baby named` kid MEANS
    CONCAT "mother: ", mom, ", father: ", dad, ", child: ", kid

Here, the backtick identifier `have a baby named` is a function name with parameters appearing on both sides, reading as a natural English sentence. This is a direct borrowing from how legal rules are phrased: "the buyer shall pay the seller" has the verb phrase embedded between the parties.


These linguistic features are not independent conveniences. They are designed to work together, enabling L4 to achieve legal isomorphism -- a structural correspondence between the L4 code and the legal text it encodes.

Consider Section 415 of the Singapore Penal Code, which defines cheating. The statute reads (in part):

Whoever, by deceiving any person, whether or not such deception was the sole or main inducement, fraudulently or dishonestly induces the person so deceived to deliver or cause the delivery of any property to any person, or to consent that any person shall retain any property...

Here is the corresponding L4 encoding:

DECIDE `commits cheating` IF
        `by deceiving any person`
   ...  p1   ...
                      `fraudulently`
                  OR  `dishonestly`
                  ..  "induces the person so deceived"
             ...    "to" ...     `deliver`
                             OR  `cause the delivery`
                             ..  "of any property to any person"
                 OR "to" ... `consent that any person shall retain any property`

         OR  `intentionally` .. "induces the person so deceived"
             ... "to" ...    `do`
                          OR `omit to do`
                          ..      "anything which he would not"
                              ...     "do"
                                  OR "omit to do"
                                  .. "if he were not so deceived"
             AND         "which act or omission"
                     ..  `causes`
                     OR  `is likely to cause`
                 ...     `damage`
                     OR  `harm`
                     ..  "to any person"
                 ...     "in"
                     ..  `body`
                     ..  `mind`
                     ..  `reputation`
                     OR  `property`

Every linguistic feature appears in this example:

  • Backtick identifiers (`by deceiving any person`, `fraudulently`, `deliver`) mark the boolean conditions that correspond to factual questions.
  • Inert elements ("induces the person so deceived", "of any property to any person", "to any person") preserve the statutory language as scaffolding.
  • Asyndetic operators (... and ..) join clauses implicitly, mirroring the implicit conjunction structure of the statute.
  • Explicit OR marks the disjunctive alternatives that the statute presents.
  • Indentation reflects the nesting structure of the legal provisions.

The result is L4 code that a lawyer can place side-by-side with the statute and verify: every clause is accounted for, every disjunction is correctly represented, every condition is identified. The code is simultaneously a formal specification that a machine can evaluate and an annotated rendering of the statute that a human can review.

This is what legal isomorphism means in practice. It is not about making code "look nice." It is about making the relationship between law and formalization transparent, auditable, and trustworthy.


Summary

Feature Linguistic Concept Programming Equivalent Why L4 Chose the Linguistic Form
's Saxon genitive Dot notation (record.field) Reads like natural English possessives
^ Gapping / ellipsis No equivalent Improves data-ink ratio in parallel structures
... Asyndetic conjunction && or AND Mirrors implicit conjunction in statutes
.. Asyndetic disjunction || or OR Mirrors implicit disjunction in statutes
"string" in boolean Inert element / scaffolding No equivalent Preserves statutory language without affecting logic
`multi word` Natural language phrasing camelCase identifiers Readable by legal professionals

Further Reading