Over the last semester I created a rudimentary filesystem for Linux to access files on Amazon S3. The original goal was to create a system that was transparent to the end user and provided a form of semantic caching optimized for applications like browsing photos or music, and for network-connected devices like a mobile phone or netbook. The motivation was that your files could reside in the cloud, where they are safe, synchronized, and have unlimited room for growth. Because writing the initial filesystem proved harder than I expected, I did not have time to work on caching.
The original plan was to develop this completely in Linux kernel space so that it would be simple and compact. However, to simplify development and to use libcurl for HTTP requests, I had to split the project into kernel space and userspace components. This follows a similar structure to projects like FUSE and Coda.
In this structure, requests from the kernel module have to pass through a virtual character device to reach the userspace program. This is not ideal, since the userspace side has to poll the device, but it seems to work fairly well. The filesystem itself leverages standard Linux system calls.
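To make the polling step concrete, here is a minimal sketch of what the userspace side of such a design could look like. It is not the code from this project: the function name `wait_for_request` and the idea that each read returns one request are assumptions for illustration. It only uses the standard `poll(2)` and `read(2)` calls on an already-open file descriptor.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for data to become readable
 * on fd (for example, an open character device), then read it into buf.
 *
 * Returns the number of bytes read (> 0), 0 on timeout or end-of-file,
 * and -1 on error.
 */
ssize_t wait_for_request(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    /* Retry if a signal interrupts the wait. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;              /* poll itself failed */
    if (ready == 0)
        return 0;               /* nothing arrived before the timeout */
    if (!(pfd.revents & POLLIN))
        return -1;              /* error or hang-up with no data to read */

    return read(fd, buf, len);
}
```

A daemon built this way would open the device node once, call this helper in a loop, and hand each request it reads to the code that talks to S3. Using `poll` with a timeout rather than a busy loop keeps the process asleep while the device is idle.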
Here are a few resources I found very helpful while developing this. They serve as great jumping-off points for developing Linux filesystems or (virtual) device drivers.
Current Status and Future
Since this is a very rough implementation there are still a number of limitations that need to be addressed before this module is really usable:
- Currently the system only works for files less than a block in size. This means that small text files are fine, but anything larger than that doesn't work.
- The system can currently only be used as superuser. I attempted to address this with udev rules, but still ran into permission denied errors when looking at files.
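One possible way to lift the single-block limit from the first item would be to fetch larger files one block range at a time using HTTP range requests, which S3 supports for object downloads. The sketch below is not part of the project: the block size constant and the function name `format_block_range` are assumptions for illustration. It turns a run of blocks into an inclusive byte range in the `start-end` form.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed block size for illustration; not taken from the project. */
#define S3FS_BLOCK_SIZE 4096ULL

/*
 * Hypothetical helper: write the inclusive byte range "start-end" that
 * covers `count` blocks beginning at block index `first` into `out`.
 *
 * Returns 0 on success, -1 if count is zero or the buffer is too small.
 */
int format_block_range(char *out, size_t outlen,
                       unsigned long long first, unsigned long long count)
{
    unsigned long long start, end;
    int n;

    if (count == 0)
        return -1;

    start = first * S3FS_BLOCK_SIZE;
    end = start + count * S3FS_BLOCK_SIZE - 1;   /* HTTP ranges are inclusive */

    n = snprintf(out, outlen, "%llu-%llu", start, end);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With libcurl, a string in this form can be passed through the `CURLOPT_RANGE` option so that only the requested bytes are transferred. Whether this fits the way the module currently hands blocks to the userspace program is something I have not verified.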
For the future, I still think it would be nice to move the whole module into kernel space and avoid having to run an extra program to make requests. This would make the module much simpler to use. I also think it would be exciting to add support for some sort of semantic caching, along with an API that lets programs define the type of prefetching to use.
At this point I am hosting the code and build instructions on my GitHub account as s3simple. I will not have as much time to work on this project over the next little while, but I will likely push small changes whenever I have some time and interest.