Here's an idea. It's kind of crazy but it's also fun to think about.
Imagine a language which has every piece of its state stored in the filesystem.
Yes, I know it sounds weird but do read on!
Datatypes
Let's say we do this:
a = 1
What will happen under the covers is that file "a" will be created that contains nothing but "1".
Lists could be represented as files where each line corresponds to an item.
b = [1, 2, 3]
The above would simply create file "b" with the following content:
1
2
3
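To make this concrete, here's a minimal Python sketch of what an interpreter might do under the covers (the `assign` helper and the frame directory are my own illustrative names, not part of any real implementation):

```python
import os
import tempfile

# Sketch: an assignment writes the value to a file named after the variable.
def assign(frame_dir, name, value):
    path = os.path.join(frame_dir, name)
    if isinstance(value, list):
        # A list becomes a file with one line per item.
        content = "\n".join(str(item) for item in value)
    else:
        content = str(value)
    with open(path, "w") as f:
        f.write(content)

frame = tempfile.mkdtemp()     # stands in for the current stack frame
assign(frame, "a", 1)          # a = 1
assign(frame, "b", [1, 2, 3])  # b = [1, 2, 3]
print(open(os.path.join(frame, "a")).read())  # prints "1"
```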
The filesystem also provides hash tables out of the box. They are called directories.
c = {
foo: 1,
bar: {
quux: 3
}
}
This would create a directory "c" with file "foo" (containing "1") and directory "bar" containing file "quux", which would in turn contain "3".
Fun idea: for map indexing, use the operator / instead of []. So, to access "quux" you would write "c/bar/quux", which happens to be the path to the file in question.
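Here's a hedged Python sketch of how such a map literal could be laid out on disk (`assign_map` is an illustrative name; nested maps become subdirectories):

```python
import os
import tempfile

# Sketch: a map becomes a directory, nested maps become subdirectories,
# and leaf values become files.
def assign_map(parent_dir, name, value):
    path = os.path.join(parent_dir, name)
    if isinstance(value, dict):
        os.makedirs(path, exist_ok=True)
        for key, sub in value.items():
            assign_map(path, key, sub)
    else:
        with open(path, "w") as f:
            f.write(str(value))

frame = tempfile.mkdtemp()
assign_map(frame, "c", {"foo": 1, "bar": {"quux": 3}})

# The "/" indexing operator is then just path joining.
print(open(os.path.join(frame, "c", "bar", "quux")).read())  # prints "3"
```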
Call stack
Of course, the state of the program is not just about variables. We would also have to represent the call stack in some way.
Easy to do! For each stack frame create a directory called "nextframe". The call stack would then look like a sequence of embedded directories:
main
|-- a # contains 1
|-- b # contains 1\n2\n3
`-- nextframe
|-- c
| |-- foo # contains 1
| `-- bar
| `-- quux # contains 2
`-- nextframe
`-- a # contains 2
In such a model, scoping is easy. If you want to access "b" in the innermost frame, just walk up the directory tree until you find a file (or directory) named "b".
Also, shadowing works as expected. If you ask for "a" in the innermost frame, you'll get "2", not "1".
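The lookup rule can be sketched in a few lines of Python (the `lookup` helper is illustrative; a real interpreter would stop at the outermost frame rather than walking all the way to the filesystem root):

```python
import os
import tempfile

def write(path, text):
    with open(path, "w") as f:
        f.write(text)

# Sketch: start in the innermost frame and walk up the directory tree
# until a file (or directory) with the requested name appears.
def lookup(frame_dir, name):
    d = frame_dir
    while True:
        path = os.path.join(d, name)
        if os.path.exists(path):
            with open(path) as f:
                return f.read()
        parent = os.path.dirname(d)
        if parent == d:              # hit the filesystem root: not found
            raise NameError(name)
        d = parent

# Rebuild the call stack from the diagram above (minus "c").
main = tempfile.mkdtemp()
inner = os.path.join(main, "nextframe", "nextframe")
os.makedirs(inner)
write(os.path.join(main, "a"), "1")
write(os.path.join(main, "b"), "1\n2\n3")
write(os.path.join(inner, "a"), "2")

print(lookup(inner, "b"))  # found two frames up
print(lookup(inner, "a"))  # the innermost "a" shadows the outer one: "2"
```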
Instruction pointer
Now you are probably wondering how one would persist the instruction pointer. After all, the language needs to know which command to execute next, and not making this information persistent would defeat the entire point of the language.
Let's make it work this way:
When the program is started, a copy of the source file is created. We'll call it a "todo" file.
Then the first line of the "todo" file is executed and removed from the file.
We'll do the same with the second and every subsequent line.
That way the "todo" file always contains a list of commands that are yet to be executed.
But hey, you cry, if there's a loop in the source code we can't just delete a command within the loop! We will still need it in the next iteration!
Easy to fix! Let's make each iteration of the loop create a new stack frame (we would want that anyway, to get proper variable scoping). The body of the loop would then be copied to the "todo" file of the new frame and executed. Once there's nothing left in the inner "todo" file, the scope would be exited, the stack frame deleted, and the parent could start a new iteration of the loop with a fresh copy of the loop body. Finally, when there are no more iterations to do, the parent will delete the entire loop construct, including the loop body, and move on.
Let's have a look at an example:
b = [1, 2, 3]
for i in b:
echo i
Once the first line is executed we end up with the following state on the disk:
main
|-- b # contains "1\n2\n3"
`-- todo # contains "for i in b:\n echo i"
First iteration of the loop is started. The interpreter creates a new stack frame and populates both the "i" variable and the local "todo" file:
main
|-- b # contains "1\n2\n3"
|-- todo # contains "for i in b:\n echo i"
`-- nextframe
|-- i # contains "1"
`-- todo # contains "echo i"
Now the echo command can be executed which will print "1" and remove the statement from the "todo" file:
main
|-- b # contains "1\n2\n3"
|-- todo # contains "for i in b:\n echo i"
`-- nextframe
|-- i # contains "1"
`-- todo # contains ""
"todo" file is empty now so we can exit and delete the scope:
main
|-- b # contains "2\n3"
`-- todo # contains "for i in b:\n echo i"
Note how the "for" construct deleted the first element of "b". This may or may not be a good idea, but it's simple, so let's go with it for now.
At this point we can do the entire dance described above for element "2", then for element "3".
Finally, there are no more elements in "b", so the "for" construct is considered done and can be deleted from the "todo" file.
main
|-- b # contains ""
`-- todo # contains ""
As there's nothing more to do, the entire program can now exit.
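The core of the "todo" mechanism described above can be sketched in Python (the `step` helper is my own name, and a real interpreter would also copy loop bodies into child frames as described; here a plain list stands in for the evaluator):

```python
import os
import tempfile

# Sketch: execute the first line of the "todo" file, then rewrite the
# file without it, so the on-disk state always shows what remains.
def step(frame_dir, execute):
    todo_path = os.path.join(frame_dir, "todo")
    with open(todo_path) as f:
        lines = f.read().splitlines()
    if not lines:
        return False              # nothing left; the frame can be exited
    execute(lines[0])
    with open(todo_path, "w") as f:
        f.write("\n".join(lines[1:]))
    return True

frame = tempfile.mkdtemp()
with open(os.path.join(frame, "todo"), "w") as f:
    f.write("b = [1, 2, 3]\necho b")

executed = []                     # a stand-in for a real evaluator
while step(frame, executed.append):
    pass
print(executed)  # prints "['b = [1, 2, 3]', 'echo b']"
```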
Why the hell?
I mean, it's a fun exercise, but why would anyone want to use such a language?
Well, once you start playing with the concept a little, you'll find out that it has some interesting properties.
Debugging
For starters, it kind of feels like being in a debugger although there's no debugger anywhere in sight.
It's entirely possible to execute just one statement at a time. The entire state of the program is on the disk, so once you've executed one command you can execute the next one and so on.
In the process you can inspect all the variables: Remember? They are just files in the filesystem.
$ hull next
done: b = [1, 2, 3]
$ cat b
1
2
3
Time travel
Given that the entire state of the program resides in a directory, you can backup the directory and proceed with the debugging. Then, when you want to return to the previous point, you just restore it from the backup and you are set to go.
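In Python terms, the whole trick amounts to copying a directory (the paths and file names here are illustrative):

```python
import os
import shutil
import tempfile

# Sketch: since the whole program state lives in one directory,
# a checkpoint is just a copy of that directory.
state = tempfile.mkdtemp()
with open(os.path.join(state, "a"), "w") as f:
    f.write("1")

backup = state + ".backup"
shutil.copytree(state, backup)        # take a snapshot

with open(os.path.join(state, "a"), "w") as f:
    f.write("2")                      # the program keeps running

shutil.rmtree(state)                  # travel back in time:
shutil.copytree(backup, state)        # restore the snapshot
print(open(os.path.join(state, "a")).read())  # prints "1" again
```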
Laziness
Once again: Given that the entire state of the program is stored on the disk, it means that you can execute it partially, then do something else, then resume the execution.
For example, a parent program may be interested only in the first line of your program's output. It can run the interpreter until the first line is produced, then suspend it and go on with its own stuff. When it needs the second line, it would just resume your program, and so on.
In this way it works very much like Python generators. Or, for that matter, like Unix pipes.
And it's also a bit like Haskell: No code has to be executed unless you actually need the results.
Concurrency
Given that programs are interruptible, you can launch many of them and then schedule them as you wish. The scheduling can be driven by commands such as "give me the next line of output".
To put it a different way: forking is easy. Just copy the directory and run a second instance of the interpreter.
Here we are getting into the area of CSP, goroutines and channels.
Remote execution
Want to move the program to a different machine?
Easy! Just scp the directory to the machine in question. Then ask the interpreter to resume the execution where it left off.
April 28th, 2019
EDIT May 4th 2019: Typesystem based on file extensions, including polymorphism: frobnicate(foo.json) can have different implementation than frobnicate(foo.xml)
The idea is not bad, but please have a look at Ruby.
The grammar is not useful. Please change it to something similar to Ruby:
object.method()
[1, 2, 3].each { |e| puts e } or similar
I think Ruby syntax is ugly and hard to read
Indeed. Ruby is pretty awful
Username checks out!
I dunno if it'd be _useful_ but it sure sounds intriguing.
Given that the entire state of the program is stored on the disk
Of course, the disk in question might actually be a ramdisk.
Or the filesystem might be distributed across a network, like in Plan 9. ;)
Haha, this is a fun idea. I would like to try implementing it when I have the time, it's a good and fun practice project.
Go for it!
I have been toying with the same idea recently… so I'll give some input to further the reflection:
- The files-and-directories hierarchy is very well understood by the masses, so I see such a language as a very good introduction to programming for total beginners. With IoT around the corner it seems a good fit, if it eventually becomes actually useful.
- There is a possibility here to do something right, as in, mix the shell concept and the programming language concept; they should be one and the same (the shell is just a repl).
- Category theory seems a good theoretical framework for such a language… it's all about structure and composition. So I personally would have a more functional approach to this topic…
- Data in the files could also be some kind of hierarchy or even a graph to some level, not just lines. Have a look at https://ogdl.org/
Maybe you won't have time to implement it alone, but with a few more people, we just might be able to.
Don't hesitate to send me an email if you want to talk it out further ;-)
I also have been toying with an idea along these lines… basically a programming language whose objects/structs/records could (at least potentially) be directories in the filesystem and bytes of data are the files.
I've been toying around with how one might implement a type system (beyond directories vs files). For example, a C-style struct is a bit like a directory with known filenames each with known size, or whatever.
I agree that being filesystem compatible is extremely good for the masses and great for playing and debugging. I'll also note the success of systems like git which are very filesystem based but let you store the data in another way. Though I think sometimes you need to go beyond a "traditional" filesystem - to execute bare-metal code over stack-allocated structs, or to access a distributed database - so I'm wondering if there are abstractions to be made so that you can use the same programming language (haha… shell script!) to interact with any of these?
(As an aside, I'd love to see an operating system that supports the concept of arrays in the filesystem, in addition to files and directories. The example of text separated by newlines stashed inside a file seems specific to certain types of data - elements which are text of unknown length and containing no newlines, for example - it would seem better if the filesystem provided the abstraction instead of everyone coming up with a different encoding).
Mainframes used to have that kind of thing: https://en.wikipedia.org/wiki/Virtual_Storage_Access_Method
Making all your state files and directories is pure UNIX brain damage. This isn't only terrible from a design perspective, but also because UNIX treats all file systems as a tape drive. You can kiss good performance goodbye.
Yes, you'll need some kind of intermediate in-memory caching layer to make it work at reasonable speed.
That's just a marvelous idea. Considering that the underlying filesystem is RAM based, a good and smart implementation could be a real success. Thanks for this idea.
While the idea of persisting data objects certainly sounds intriguing, I am not entirely convinced that, for example, arrays and newline-separated lines map well to each other. For the record, I am a Unix zealot, so I see how the idea works; it is just that I work too much with binary data and more complex data these days.
Cool idea, and could be expanded into other areas like first-class functions or continuations. Unfortunately, not all state can be saved into the todo / nextframe files, so you have to make some (considerable) sacrifices. It starts getting more complicated when you think about executing other code (e.g. native binaries), which for a shell is a pretty common task.
I imagine it wouldn't be difficult to create a pointer into the list in scope of the for-loop. This pointer could, for example, be a file containing the name of the sequence and an index, e.g. b and 1. This way you could preserve b in its original form and use it in the loop. But then we run into other sorts of complications, and after giving it some thought it seems that maybe the right way to do it would be to translate the Pythonic for i in b into a more low-level while-loop with an incrementing index, and access the elements of b using this index.
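A quick sketch of that pointer idea (the "i.ptr" file name and the `deref` helper are my own illustrative conventions):

```python
import os
import tempfile

# Sketch: a pointer file holds the sequence name and the current index,
# so "b" itself is never consumed by the loop.
frame = tempfile.mkdtemp()
with open(os.path.join(frame, "b"), "w") as f:
    f.write("1\n2\n3")
with open(os.path.join(frame, "i.ptr"), "w") as f:
    f.write("b 0")

def deref(frame_dir, ptr_name):
    with open(os.path.join(frame_dir, ptr_name)) as f:
        name, index = f.read().split()
    with open(os.path.join(frame_dir, name)) as f:
        return f.read().splitlines()[int(index)]

print(deref(frame, "i.ptr"))  # prints "1"; "b" is left untouched
```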
This is where it gets interesting. If the underlying system supports sockets or named pipes, this could help with IPC (shared memory / mutual exclusion / synchronization, select and so on), but again, as with executing native binaries or other programs, you cannot properly preserve the state if you are using sockets or pipes. And more importantly, you will not be able to debug correctly — once you read the data from the shared object manually (during debugging), it's gone and will not be available to the program. All in all, I think you not only will not have time to implement this shell, but if you want it to do anything meaningful, you most likely won't be able to do it at all, at least not properly. But I suppose you know all the caveats…
There are definitely challenges and integrating with existing pipe-based programs seems to be the most daunting of them.
That being said, the hardware has changed since 1970, we have much more memory and much more disk nowadays and thus a large subset of the use cases can be solved by simply getting the entire output of the binary and saving it to a file (ls, grep and such).
What remains is a.) handling very large datasets b.) handling binaries with infinite output (e.g. yes).
As for a.), I am not sure that a shell-like language is a good option for this kind of stuff in the first place.
As for b.), these can be dealt with as a process running in the background and a named pipe. That will, of course, break once you restart the system, but until then it'll work.
It reminds me of a different idea implemented at an Australian university in the early 1990s.
They couldn't afford an OODBMS - they were still expensive at the time, so they implemented their own where each row was a Unix text file, each table was a directory, and queries were done with Unix shell commands.
Unbearably slow, missing key concepts like transactions and security, but sufficient to allow them to try out experimental ideas.