Mounting Python objects as a file system

Benedikt Jenik - January 6, 2019

Warning: Don’t try this at home. The whole thing is kind of a dumb idea, super hacky, and could make your machine hang or worse.

Have you ever wondered, when debugging some deeply nested Python object chain, if there was a better way to explore it, for example with your favorite file manager? Me neither - but let’s try anyway.

So what are we doing exactly? We are going to turn this

class LinkedList:
    def __init__(self, count):
        self.data = count
        if count > 0:
            self.next = LinkedList(count-1)

into this

[Image: some nested files]

in a way that’s fully dynamic and works on other objects as well, and all of that in less than 50 lines of code.

How?

FUSE and some Python internals. But let’s take a step back to think about what we are doing a little more generally: we have some Python object with attributes which contain further objects with more attributes and so on - and we want to represent that hierarchy as a folder structure to explore. There’s going to be a folder for each attribute, which contains further folders if that object has attributes of its own, and a txt file in each folder to look at the actual object value. We could obviously just create all of that stuff and be done with it - but building our own file system sounds like more fun, right? Also: actually creating all those files and folders on our main disk would mean we would have to clean up afterwards - and who likes doing that? Thinking about this further, if our object is big, creating all those files and folders could also take a lot of time - and ideally we would only create the parts someone actually looks at - but can this work?
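
To make the folder idea a bit more concrete, here is a rough sketch of the layout such a view could have, produced lazily so that only the parts someone asks for are ever computed. The function name `layout` and the file name `value.txt` are made up for illustration; the text above only says there is "a txt file" per folder.

```python
from itertools import islice


class LinkedList:
    def __init__(self, count):
        self.data = count
        if count > 0:
            self.next = LinkedList(count - 1)


def layout(obj, prefix=""):
    """Lazily yield the paths a folder view of `obj` would contain."""
    # Hypothetical file holding the string representation of the object.
    yield prefix + "/value.txt"
    # Objects without a __dict__ (ints, strings, ...) have no sub-folders.
    for name, child in getattr(obj, "__dict__", {}).items():
        yield prefix + "/" + name + "/"
        yield from layout(child, prefix + "/" + name)


# Only the first five entries are computed, however long the list is.
first_entries = list(islice(layout(LinkedList(100)), 5))
for entry in first_entries:
    print(entry)
```

Because `layout` is a generator, nothing past the requested entries is visited, which is the same property we want from the file system itself.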

This is where our own file system comes in: everything is going to be fake - there aren’t going to be any real files on any real disk - we are just going to trick the operating system into believing there are. To make this all happen we are going to rely on one of the - in my opinion - most powerful, most under-appreciated, and at the same time most dangerous CS concepts - and I’m not talking about some fancy algorithm - I’m talking about abstraction.

The magic of abstraction

Abstraction is what makes something like map-reduce awesome, allows us to build deep neural networks in just a few lines of code, makes the internet work, and so much more. In fact, I believe having the right abstraction, one that is immensely powerful while also being super easy to understand, is the main reason for commercial success and mainstream adoption of any technology. It is what turns a bunch of recursive references on chunks of data, and people computing checksums on them, into a widely popular concept like blockchain (even though the question of whether that was a good idea or just a lot of hot air in China is still open for some people).

But how does all of that help us here? The operating system of most computers also deals with actual file storage using an abstraction, because it couldn’t possibly know all potential formats of storing data and all underlying hardware details. What it does instead is rely on different levels of drivers and interfaces, meaning the concept of dealing with files and folders boils down to having access to a number of functions that provide the capabilities of listing files and folders, getting details about them, reading and writing their contents, and a few tiny extras. Anybody who can answer those function calls can pretend to be a real file system - how you are actually dealing with the data is your problem then - the OS doesn’t care - it could even be all fake, without any files ever touching any disks. Or looking at it from the other side: what makes a real file is not that there are some bits sitting on some disk, but that there is someone telling you it exists who is also able to give you the data if you ask - that’s it - it can be as made up as they like, as long as they are able to do that it counts as real.

Implementation details

We are going to do exactly that. For this we are going to use FUSE, a library that helps us with some of the low level nastiness of talking to operating systems, and a Python wrapper (fusepy) that makes it even easier. The minimal number of functions we have to implement for a read-only file system that doesn’t blow up too badly is just three: a function readdir that lists the contents of a directory at some path, a function read that reads the contents of a file, and a function getattr that provides some metadata for a path to either a file or a folder - that’s all.
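
As a rough illustration of those three functions, here is a sketch of a tiny read-only file system over a hard-coded dictionary. It is not the object file system from this post, just the bare interface. The method signatures follow fusepy's `Operations` class as I understand it; the names `TREE` and `TinyFS` are made up, and the import is wrapped in a fallback so the callbacks can be called directly even on a machine without FUSE.

```python
import errno
import stat

try:
    # fusepy exposes these names from a module called `fuse`.
    from fuse import FUSE, FuseOSError, Operations
except Exception:
    # Fallbacks so the callbacks below can be tried without FUSE installed.
    FUSE, FuseOSError, Operations = None, OSError, object

# Hypothetical fake tree: dicts are folders, bytes are file contents.
TREE = {"hello.txt": b"hi there\n", "sub": {"deep.txt": b"nested\n"}}


class TinyFS(Operations):
    """Read-only file system backed by the in-memory TREE above."""

    def _lookup(self, path):
        node = TREE
        for part in path.strip("/").split("/"):
            if not part:
                continue  # the root path "/" has no segments
            if not isinstance(node, dict) or part not in node:
                raise FuseOSError(errno.ENOENT)
            node = node[part]
        return node

    def getattr(self, path, fh=None):
        node = self._lookup(path)
        if isinstance(node, dict):
            return {"st_mode": stat.S_IFDIR | 0o555, "st_nlink": 2}
        return {"st_mode": stat.S_IFREG | 0o444, "st_nlink": 1,
                "st_size": len(node)}

    def readdir(self, path, fh):
        node = self._lookup(path)
        if not isinstance(node, dict):
            raise FuseOSError(errno.ENOTDIR)
        return [".", ".."] + list(node)

    def read(self, path, size, offset, fh):
        node = self._lookup(path)
        if isinstance(node, dict):
            raise FuseOSError(errno.EISDIR)
        return node[offset:offset + size]
```

With fusepy installed, mounting should then be a matter of something like `FUSE(TinyFS(), mountpoint, foreground=True)`, where `mountpoint` is an existing empty folder; without it you can still call `TinyFS().readdir("/", None)` by hand to see what the OS would be told.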

The Python object hierarchy side of things is just as easy: each Python object with attributes also has another hidden attribute called __dict__ that contains a dictionary listing all of them. What we are going to do is take each file path we get asked about and split it into segments. We then use those segments, one after the other, to look things up in the __dict__ of whatever object we have reached so far. When we run out of segments we return whatever we ended up at, either as a string representation of the object for the read function or as a list of the attributes it has for the readdir function that lists directory contents. If the path ends up somewhere that doesn’t exist on our side we just tell the OS that we can’t help it there. But this is most likely not going to happen, because the OS is only supposed to ask us about contents of things we have told it about before, apart from the main folder / where it starts out.
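
The lookup described above can be sketched in a few lines, independent of FUSE. The helper names `resolve`, `list_attributes` and `read_value` are made up for this illustration, and raising `KeyError` stands in for telling the OS that the path doesn't exist.

```python
class LinkedList:
    def __init__(self, count):
        self.data = count
        if count > 0:
            self.next = LinkedList(count - 1)


def resolve(root, path):
    """Follow the path segments through nested __dict__ lookups."""
    obj = root
    for segment in path.strip("/").split("/"):
        if not segment:
            continue  # "/" itself resolves to the root object
        attrs = getattr(obj, "__dict__", {})
        if segment not in attrs:
            raise KeyError(path)  # stand-in for "no such file or directory"
        obj = attrs[segment]
    return obj


def list_attributes(root, path):
    """What a readdir-style call could report for `path`."""
    return sorted(getattr(resolve(root, path), "__dict__", {}))


def read_value(root, path):
    """What a read-style call could return for `path`."""
    return str(resolve(root, path))


root = LinkedList(3)
print(list_attributes(root, "/"))            # attribute names of the root
print(read_value(root, "/next/next/data"))   # value two hops down the list
```

Plain values such as integers have no __dict__, so they show up as folders without any sub-folders.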

Show me the code

So how does all of this look in code? Here it is:

Making it run

This code is focussed on brevity, not on solid engineering, so things may blow up badly - you have been warned.

You need some system that supports FUSE - so either Linux or macOS (or some more obscure unix variations - but all bets are off there) - sorry - no Windows.

You first need to install libfuse or osxfuse if you don’t already have it, and after that run pip install fusepy. If all of that works you should be able to run the code from above by just running python3 objectfs.py or ./objectfs.py - the file system will be mounted in the folder you are running this from.

Instead of the custom made LinkedList(100) you can dump any object in there to explore it. You can even look at your favorite library - e.g. numpy by replacing the last few lines with this:

if __name__ == '__main__':
    import numpy as np
    ObjectFS(np)
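
The reason a module can be dropped in like this is that modules keep their attributes in a __dict__ as well, so the same lookup applies. A quick way to check that, sketched here with the standard library's json module instead of numpy so it runs without extra installs:

```python
import json
import types

# Module attributes live in __dict__, just like instance attributes.
top_level = vars(json)
print(sorted(name for name in top_level if not name.startswith("_")))

# Submodules are attributes too, so they would show up as nested folders.
decoder = top_level["decoder"]
print(isinstance(decoder, types.ModuleType), "JSONDecoder" in vars(decoder))
```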

Mounting all of the internet can be just as easy - but that’s maybe for another time.