Project: CatScript - Concatenative JavaScript
After I explored Forth, I wondered if I could apply some of what I learned to my comfort zone of JavaScript in a browser environment.
The TypeScript source code for the language lives here.
REPL
Tests
Click here to toggle the visibility of this section
If you see ❌ or 🚧, then that test has failed.
This paragraph contains some tests which don't make much sense in the table below. First test: 🚧. Leading and extra whitespace don't change anything: 🚧. Look in the console to see if this logs a checkmark.
Word(s)/feature | ✅ / ❌ / 🚧 | Code |
---|---|---|
Emoji | 🚧 | |
' ✅' me >text | ||
swap | 🚧 | |
' ✅' ' ❌' swap me >text | ||
dup | 🚧 | |
' ✅' dup me >text me >text | ||
drop | 🚧 | |
' ✅' ' ❌' drop me >text | ||
over | 🚧 | |
' ✅' ' ❌' over me >text | ||
rot | 🚧 | |
' ✅' ' ❌' ' ❌' rot me >text | ||
-rot | 🚧 | |
' ❌' ' ✅' ' ❌' -rot me >text | ||
&& | 🚧 | |
true ' ✅' && me >text | ||
|| | 🚧 | |
false ' ✅' || me >text | ||
: and ; | 🚧 | |
: single ' ✅' ; single me >text | ||
Multi-word : | 🚧 | |
: multi 4 ' ❌' ' ❌' ' ✅' ; multi me >text | ||
Multi-level : | 🚧 | |
: levels 1 2 3 multi ; levels me >text | ||
immediate , , , tick , lit | 🚧 | |
: check immediate tick lit , ' ✅' , ; : pushCheck ' ❌' check ; pushCheck me >text | ||
typeof | 🚧 | |
' test' ' string' typeof ' ✅' && me >text | ||
now | 🚧 | |
now ' number' typeof ' ✅' && me >text | ||
@ and ! | 🚧 | |
variable foo 5 ' ✅' foo ! foo @ me >text | ||
numeric == | 🚧 | |
5 5 == dup ' boolean' typeof && ' ✅' && me >text | ||
numeric === | 🚧 | |
5 5 === dup ' boolean' typeof && ' ✅' && me >text | ||
+ and - | 🚧 | |
5 4 3 + - -2 == ' ✅' && me >text | ||
2dup | 🚧 | |
8 7 2dup + + + 30 === ' ✅' && me >text | ||
Floats + like JS | 🚧 | |
0.1 0.2 + dup 0.3 > swap 0.31 < && ' ✅' && me >text | ||
\* and / | 🚧 | |
10 1.5 * 5 / 3 == ' ✅' && me >text | ||
Floats * like JS | 🚧 | |
0.1 0.2 * dup 0.02 > swap 0.021 < && ' ✅' && me >text | ||
> and < | 🚧 | |
4 5 < 5 4 > && ' ✅' && me >text | ||
>= and <= | 🚧 | |
4 5 <= 4 4 <= && 5 4 >= 5 5 >= && && ' ✅' && me >text | ||
sleep | 🚧 | |
now 750 sleep now swap - 749 > ' ✅' && me >text | ||
After sleep | 🚧 | |
' ✅' me >text | ||
: and sleep | 🚧 | |
: sleepier now 750 sleep now swap - 749 > ' ✅' && ; sleepier me >text | ||
branch | 🚧 | |
: 3, immediate 3 , ; : branchy ' ✅' branch 3, ' ❌1' ' ❌2' drop ; branchy me >text | ||
0 0branch | 🚧 | |
: branchy ' ✅' 0 0branch 3, ' ❌1' ' ❌2' drop ; branchy me >text | ||
1 0branch | 🚧 | |
: branchy ' ❌1' 1 0branch 3, ' ✅' ' ❌2' drop ; branchy me >text | ||
Falsy falsyBranch | 🚧 | |
: branchy ' ✅' ' ' falsyBranch 3, ' ❌1' ; branchy me >text | ||
Truthy falsyBranch | 🚧 | |
: branchy ' ❌1' true falsyBranch 3, ' ✅' ; branchy me >text | ||
true if | 🚧 | |
: iffy ' ❌' true if ' ✅' endif ; iffy me >text | ||
false if | 🚧 | |
: iffy ' ✅' false if ' ❌' endif ; iffy me >text | ||
true if/else | 🚧 | |
: iffy true if ' ✅' else ' ❌' endif ; iffy me >text | ||
false if/else | 🚧 | |
: iffy false if ' ❌' else ' ✅' endif ; iffy me >text | ||
begin/until | 🚧 | |
: countDown begin 1 - dup 1 < until ; 5 countDown 0 === ' ✅' && me >text | ||
Endless loop | 🚧 | |
: flicker begin 750 sleep dup me >text 250 sleep over me >text again ; ' ✅' ' 🚧' flicker | ||
me . innerHTML | 🚧 | |
me . innerHTML ' 🚧' === ' ✅' && me >text | ||
select | 🚧 | |
' span:nth-child(2)' me select first ' ✅' swap >text | ||
select' | 🚧 | |
me select' span:nth-child(2)' first ' ✅' swap >text | ||
select in : | 🚧 | |
: selecty ' span:nth-child(2)' me select first ; selecty ' ✅' swap >text | ||
select' in : | 🚧 | |
: selecty me select' span:nth-child(2)' first ; selecty ' ✅' swap >text | ||
next | 🚧 | |
me select' span' first ' span' swap next ' ✅' swap >text | ||
previous | 🚧 | |
me select' span:nth-child(2)' first ' span' swap previous ' ✅' swap >text | ||
addClass & removeClass | 🚧✅ | |
' hidden' me select' span' first addClass ' hidden' me select' span:nth-child(2)' first removeClass | ||
toggleClass | 🚧✅ | |
' hidden' me select' span' first toggleClass ' hidden' me select' span:nth-child(2)' first toggleClass | ||
nth | 🚧✅ | |
' hidden' me select' span' first toggleClass ' hidden' me select' span' 1 nth toggleClass | ||
on click , closest , & emit | 🚧 | |
on click ' ✅' me >text ; | ||
( comments ) | 🚧 | |
: checky ( comment in definition ) ' ✅' ; checky ( this is a comment, it can contain anything ✅ except a closing paren ) me >text | ||
match/ .../ | 🚧 | |
' te123st' match/ e\\d+/ first ' e123' === ' ✅' && me >text | ||
match/ .../ within : | 🚧 | |
: matchy match/ e\\d+/ first ; ' te123st' matchy ' e123' === ' ✅' && me >text | ||
C and . operator | 🚧 | |
5 C . parameterStack first === ' ✅' && me >text | ||
.! | 🚧 | |
' ✅' me >text true C .! halted ' ❌' me >text |
Logbook
Tue Feb 06 21:53:03 PST 2024
I reflected on Hyperscript and how it added a convenient layer on top of JavaScript. In my experience with Hyperscript, I was frustrated with extending the language. And that was one of the main strengths of what I'd learned of Forth. So I started this experiment as a way to explore some universe where a library as convenient as Hyperscript was as extensible as Forth.
I started with a script on this page.
Wed Feb 7 10:46:08 AM PST 2024
I continued making progress in the time normally allotted for creative coding. Today's creative coding prompt was "some of my favorite things." JavaScript, exploring programming languages, and putting energy towards creative interfaces felt aligned with the prompt.
I succeeded in getting some small code on an HTML attribute to work how I expected! It was so fun to use JS's step debugger to walk through my code and see it work. I continued to make some more basic words, like dup
, swap
, and drop
.
Thu Feb 22 08:49:18 AM PST 2024
There were two different things I wanted to implement in my language. Well, re-implement. I'd struggled thinking of things I'd like to use my language for as I was writing it, because I invested most of my interest and drive in that moment into the language itself.
Future: I wanted to reimplement some hyperscript I'd written recently in my typing game. The hyperscript was relatively simple, but there was a lot that this language was missing to get there.
Future: I wanted to reimplement a 5 minute timer I'd written in Forth.
Both of these ideas were pressing today since I wanted to give a presentation on this concept. I signed up to give the presentation to give me an accountability target for the day, and I had to admit that was motivating me!
I started taking incremental steps towards those things, to see how far I could get before the presentation. Since my hyperscript example relied heavily on variables, and I wasn't immediately sure how I'd implement the same thing in my language, I started with the forth example. I implemented now
to get the current milliseconds and typeof
to quickly test if the result was a number. Then, of course, I noticed that my Forth code also uses variables. But these are Forth-type variables, which are much more straightforward (now that I've read Starting Forth) than Hyperscript variables. So I pushed forward.
I got an implementation of variables that matched the most basic semantics from Forth. That is, my new words @
and !
could act successfully on the result of a variable. But pointer arithmetic would not work, since there was no linear memory behind it. In short, I faked it.
The next thing I had to tackle was waiting. Dot dot dot. Silence. Do nothing. Up to this point, my interpreter would execute every single bit of code as quickly as it could without pause or interruption. To do so, I'd have to hook my language in to JavaScript's event loop somehow. I wasn't sure how, but I created a dictionary word sleep
to stare at where the implementation should go.
I got my basic test passing, and in the process I found a lot of inadequacies in my interpreter implementation. I had made many assumptions in order to move quickly towards the next incremental step. I felt good about the decisions I made up to this point, but also daunted by the challenges. And there was still yet more features to add! I still had my failing click event test, which now wasn't being executed at all.
Up to this point, I used the same interpreter for every distinct piece of code on the page. This worked until I introduced asynchronicity. The problem was that the interpreter was stateful, and each piece of code on the page (i.e. each c=
HTML attribute) needed to set a different state. Previously this worked smoothly because each script would block until completion, and the next one could set the internal variables for itself without any conflict.
The solution I had in mind was to use a separate instance of each interpreter for each distinct script. And a pattern I'd already been using without understanding why became clear - I'd been passing around a "context object", ctx
, where I was storing all the state. Well, not every piece of state, I hadn't been perfectly consistent. But if I refactored to make my use of the context object consistently, and make every other function entirely pure, then I could simply switch to a separate ctx
for each script, and isolate all state therein. That worked!
I was missing a lot of the functionality still to replicate my timer in Forth, but I realized I could do a different version of the same thing with separate scripts in separate HTML elements.
I made a rough-and-tumble, ad-hoc area to present this at RC:
4 1 + . ." languages"
In my rush to prepare the above presentation, I accidentally made the timers all 10 times the duration, so one minute actually took 10 minutes to trigger. Whoops! Anyway, the presentation went very well.
Sun Feb 25 09:14:19 PM PST 2024
I wanted control flow via if
and else
, and loops like Forth's begin until
. I struggled to understand how one does this, but this reddit comment unlocked it for me. The missing link for me was that Forth uses the parameter stack during compilation to save, recall, and point back to memory locations earlier in the definition. So during compilation, when if
runs, it compiles some code that will jump to a place, but it also leaves an empty spot in the compiled code and puts the address of that empty space on the data stack. When later endif
is compiled, it grabs the address off the data stack and sets the current compilation address back into it. Thus, if
now has the address it needs to jump to if the condition is false. With that level of understanding, I tried to make it work.
My attempt at this revealed a handful of missing pieces in my design. For example, I had no way to run user-defined words containing multiple words. That would be a prerequisite to jumping around in a multi-word definition! So I had to implement that first.
As an aside, I found it very fun to implement a word debugger
which had the same semantic effect as the debugger
statement in JavaScript. I popped this word into the middle of a failing test and went exploring! And it helped me catch a significant bug as well. My previous isolation of the ctx
object had not been as thorough as I'd imagined! I was still sharing a lot of data across different words. I also found a couple bugs where I was pushing data to the wrong ctx
's parameter stack because of a typo. So fun to dive into my own code and watch it work!
Back to the main event, multi-word definitions were simple to implement in the case of synchronous code. In fact, I just had to loop through each definition. Asynchronous code was trickier, in the same way as I'd found when I implemented sleep
in isolation, within the definition of a word it had to pause that loop and resume it later.
Unfortunately, I couldn't determine how to write a test which would fail accurately for my asynchronous code. That is, even though I knew the semantics of sleep
inside a :
definition were nonsense in my current implementation, the test I wrote passed! And I wasn't sure how to write a better test. The best I came up with was : sleepier ' ❌' 750 sleep ' ✅' ; sleepier me >text
. But this passed incorrectly because the checkmark was immediately pushed to the stack! How could I design a test which would show the current flaw in my design? I considered that I could do a little math with now
, so assert that I was sleeping for 750 milliseconds, and then test that now
was different by at least 750 milliseconds! That should work!
Thu Feb 29 06:18:25 PM PST 2024
I wrote the test I described above and it failed successfully. That is, now 750 sleep now swap - 749 >
was true for my sleep
test at the top level, but it failed when sleep
was used in a colon definition. Again, I expected this because I knew my implementation was incomplete.
The solution I imagined was for my colon definition implementation to change the state of the interpreter. As it was, the colon definition implementation simply looped synchronously. So I added a return stack and changed the interpreter as a state machine. I was utterly surprised at how effective it was!
Fri Mar 1 10:49:31 AM PST 2024
I traded the Markdown table for a HTML table, because a huge markdown table makes for really terrible git diffs - When you reformat to make the columns wider or skinnier, every line changes! Very confusing when mixed with semantic changes as well.
Sun Mar 3 03:53:28 PM PST 2024
I finally felt ready to implement if
and endif
. There were still some decisions to make about memory representation, but I felt I understood the problem enough to make those decisions as I went.
Future: I had an idea about compiling words. Perhaps I could append vanilla JS to a string to build a new function. So every time I run :
, a global function is actually constructed. I'm not sure how I could do that with JavaScript return statements. So I think I'd have to do some careful management of return statements. However it works, it will be different from Forth because it has to compile JavaScript instead of assembly, and JS and assembly have very different operations.
Mon Mar 4 07:44:28 AM PST 2024
In Forth, the return stack and parameter stack hold arbitrary data of any type. In this language, I wanted similar semantics. And by now my parameter stack already worked this way. But the return stack was different, because it had one 'normal' kind of data for which the return type was consistent, but if users could push and pop any data from it, then those structured chunks would be interspersed with anything. Obviously, this made Typescript freak out. But that's the correct response to interspersing structured data with arbitrary data. One example of a nasty issue which might arise from this is that users could push things which kind of look like the structured return pointers but not really, and so bugs could get extremely confusing.
Fri Mar 29 02:39:42 PM EDT 2024
I started reading Jonesforth Implemented lit
and compiled numeric primitives using it as Jonesforth does. I also renamed my internal dictionary variable to latest
. Jonesforth is already an invaluable resource and I feel like I've only just begun to study it.
Wed Apr 24 10:29:00 AM PDT 2024
I've made a lot more progress I never described. I got control structures working and an idea for how to work with click handlers and much more.
I wanted to play with the library for a creative coding exercise, but I realized I know how to quickly include it on another page. It would be easy to include on another page on my website, but maybe not on the web-at-large? Or maybe I could just include the compiled JavaScript straight from my website? I tried this on a codepen. I noted a couple issues. First, the classic jQuery ready
problem. I added an event listener for DOMContentLoaded
to fix that. Then, because I was using a type="module"
script tag, I didn't expose anything on the global object. That made it really hard to detect if it was loaded at all in codepen. So I added some of the utilities to the global scope. Then I pushed to deploy and test again on codepen.
After I pushed, it still didn't work. I checked how I was using codepen, and I was using it wrong! I never properly included my script in the first place. When I properly included it, it worked! I didn't know if it previously would have worked without my improvements, but I liked the improvements either way, so I didn't revert and check.
Tue Jun 25 08:10:05 PM PDT 2024
Saw the Cognate Language, and I enjoyed a lot of its key insights. One insight was using words starting with uppercase letters as commands and all lowercase words as comments. Another was the prescript notation. I had thought it would be fun to make a toggle to a line-oriented, prescript variant, so one could go back and forth when one deemed appropriate. Confusing, but fun.
Added comments.
I started translating the exact code for execute
from JavaScript into my language. A big part of what interested me about Forth was the exposed internals of the interpreter, and right now my interpreter was wrapped up in JavaScript. I wanted to move as much as possible into my language, so that I could mess with those internals from the inside as well as the outside.
Wed Jun 26 09:11:13 AM PDT 2024
On the way to implementing the interpreter internals inside the language, I implemented regular expression matching. I didn't yet make any way to create a JS RegExp object on the stack, because I couldn't think of a way to test it given the other current faculties of the language. I needed to make a way to call arbitrary functions on objects on my stacks, so I could call the equivalent of "abcd".match(/bc/)
. Currently, I had no way to do that.
Future: I explored how to make arbitrary JS function calls on objects in my language. I wondered if this was related to the issue of evaluating arbitrary JavaScript. I also wondered if it was related to calling the new
operator in JavaScript on arbitrary objects.
I continued translating code from JS to my language, now onto the direct manipulation of the textual input stream (the string of actual code being interpreted)
I got tired of saying "my language" in here. I was surprised I'd never written down the name I was calling the project in my head. I guess I didn't want to commit. Maybe I was afraid of picking the wrong name and having to do find and replace a bit if I changed hte name later. Anyway, a while ago, Elias M helped me come up with the name "CatScript". I checked the name against Wikipedia's list of programming languages and found no matches. I did find references to CATScript, which was (is) apparently a variant of Visual Basic for Applications for use in a CAD program called CATIA. But I thought hopefully there was minimal crossover. CatScript merges "conCatenative" languages, a category of languages of which Forth is traditionally a member, and JavaScript. And it's cute. So why not? So from now on I'll call this CatScript.
As I continued to work, I remembered how much fun this project was to work on. Continual progress. Once I commit to doing a certain thing, it's far easier and more straightforward to implement than I imagine in my head. That encourages me to keep trying things out. I remember how this matched my drive to do more than think, and the feel the results of my doing over imagining the results of my thinking.
I implemented a few things where I wasn't sure if that's how I wanted the language to work. Like property access and mutation via .
and .!
. They looked odd. But they were so fast to implement, and I was moving forward. It would be easy to change later! I had to keep reminding myself of this. Maybe if I kept on working on this project for long enough, I'd internalize that lesson.
Future: I made a <script type="application/catscript">
so I didn't have to make empty divs just to write some code on the page.
I finally finished implementing foreach
to allow me to iterate over an array easily. And with that, I was able to replace all the Hyperscript on my page with CatScript!
Thu Jun 27 08:53:33 AM PDT 2024
After my success with the complicated foreach
loop, I turned back to my main current goal. I wanted CatScript to have all the internals of the interpreter written in CatScript. The point is that they could be replaceable that way, modifiable. And that's what I keep coming back to about CatScript in my mind. It's malleable. It's open and reworkable. That's what makes it attractive as a quick-n-dirty tool.
I was using the strategy "burn the candle at both ends," which I think I'd heard from a lecture by Gerald Sussman (it could also have been co-author Hal Abelson) from the MIT course on Scheme on YouTube. Or maybe I read it in SICP years ago. Anyways, the strategy is to write high-level code that doesn't work yet, and switch back and forth to low-level code that definitely does work, but doesn't do a complex, useful thing on its own. My high level code replaced the core functions which drove the interpreter that I'd previously written as JavaScript with CatScript versions. And the low level code amounted to all the improvements to the language I'd made in the past couple days.
I only had a few high-level concepts left to implement, the different modes of the interpreter, I called them "queryWord", "compileWord," and "executeCompiledWord". Forth would call them the binary interpreter "states".
Fri Jun 28 12:06:49 AM PDT 2024
More bits of work on the high level portion of "burning the candle at both ends".
Future: One of the big questions which kept coming up was "how am I going to encode calling JavaScript functions?" It came up a lot because of laziness - I didn't want to implement every single convenience function in CatScript. But even if I did go that route, I still want a seamless way to call JavaScript functions. There were a few options I could imagine. One involved eval, like evalJs' func()'
to call a JS function named func
. That seemed like the most straight forward path and probably a good place to start. I had to remind myself yet again to just try the most obvious thing instead of ruminating on possibilities.
I wrote a small REPL to play with. I knew it was difficult to work with if you didn't know exactly what was going on, but it was a fun diversion from some bigger questions.
Tue Jul 2 10:12:31 AM PDT 2024
I began reading Jonesforth from the beginning again, now that I had some practice implementing my own version, to compare and contrast. I found Jonesforth still the pinacle of concise and clear explanation and code. My implementation was all over the place and verbose in comparison. A few more things clicked for me on this reading, and I wanted to try implementing them in CatScript. The use of QUIT
, INTERPRET
, and codewords began to make a lot more sense to me. I wanted to test my understanding.
First, I tried implementing QUIT
as the starting point for all CatScript execution, as it is in Jonesforth. Immediately I realized this didn't make sense, because sometimes CatScript pauses for async timers or events and then resumes later. If every time it resumed, it cleared the call stack, that would make async code almost useless. Still, QUIT
was a useful faculty to have in a REPL, so I implemented it anyways, unused for now. QUIT
called INTERPRET
, so I implemented that next. In Jonesforth's assembly, QUIT
loops on INTERPRET
indefinitely. I wasn't sure if my implementation should have the call to interpret in a literal loop?
I realized while implementing this that I had a kind of redundency. I could define words in JavaScript, and increasingly I could equally define them in CatScript. The shapes would be different, though. In JavaScript, I could put a sequence of calls to other JavaScript functions in the body of a function. In CatScript, that would be a less performant loop through an array of "compiled" code. That compiled code was just the same implementation though. Was it less performant? I wasn't sure, because in my JavaScript function bodies, I was always using findDictionaryEntry
, which might search through lots of dictionary code again and again. As I considered this, I realized that my constant use of findDictionaryEntry
was incorrect, or at least non-standard Forth. In Forth's compiled code, it's always compiling pointers to the actual implementation of whatever word is meant. That means that if a new word of the same name is recompiled, the old pointers don't change, and nothing uses the new word. In my code, on the contrary, if a new word was compiled into the dictionary, all my code which constantly used findDictionaryEntry
would suddenly run the new code.
Running new code like this wasn't impossible in Forth, but it wasn't the default way things happened. In Forth, to get this "automatic updating" behavior, you'd have to either call findDictonaryEntry
explicitly (usually called find
in Forth, I think) in your code, or put the pointer to the code you want in a variable you manually update, or even tediously edit the pointer in some compiled code. So, there are ways to do what I was doing, but they all take extra effort.
If I wanted to compile my code how Forth normally does it, I'd do that by calling findDictionaryEntry
once, and using a closure reference to that original value in my code. Jonesforth achieves something similar by duplicating the name of the Dictionary values as Assembly labels, and then using the Assembly labels.
Future: I was using different names for similar concepts in Jonesforth. I was using a variable called "interpreter" to hold the current state of the interpreter. Forth calls this "state". I find this pretty confusing in a modern sense. "State" is one of those words which is used in so many different contexts to be utterly meaningless as a name. But "interpreter" was also confusing, so I changed it to "isCompiling".
In Forth, "state" is a binary value which represents whether the interpreter is running in "immediate mode" or "compiler mode". My equivalent "interpreter" variable had 3 values. The same two as Forth (though I called them "queryWord" and "compileWord" respectively, names I wanted to change to align more with Jonesforth), plus a third one for "executeColonDefinition". Reading Jonesforth helped me understand the difference. In Forth, the equivalent ot my "executeColonDefinition" is "DOCOL", which is a codeword on each word. I didn't have codewords, I was using this extra bit of state instead. I wondered if I could align my implementation better to Jonesforth by implementing codewords. I tried adding a separate codeword
field, along with my JavaScript impl
and compiled array of functions called compiled
.
I ran into a lot of issues and that exposed a pretty interesting fundamental difference between the JavaScript environment which CatScript existed within and the assembly environment which Jonesforth lived in. In Jonesforth, those codewords were addresses which never changed after they were defined, which meant they could be written as numeric offsets or addresses directly into the dictionary definition. I feel like I didn't say that correctly, but either way, JavaScript's different. In JavaScript, we'd have to somehow point to the implementation function for JS-defined words or point to the compiled array of words. Either way, the codeword
field could not be static. It would have to be dynamic and be connected to the dictionary entry. All that hastily said, I decided to simply always call the existing impl
field, and in the case of compiled words, use the impl
function as the codeword
. The rambling nature of this paragraph might show my lack of comprehension, but I felt like I came to a good place that matched Jonesforth.
Jonesforth names all code (core) words with assembly labels so that other core words can call them by label instead of looking them up constantly. Looking them up in the dictionary isn't just a (slight?) performance hit. If we looked up the definition of core words by name, we might get words of the same name defined by users later. We don't want that for core functionality. So I made a little map to store the implementations of core words by name.