
A rethink of EasyOS architecture

May 23, 2022 — BarryK

As I posted a couple of days ago, I am taking some time out from getting the bugs out of Easy Bookworm, to think about some fundamental issues and how they might be fixed.

A big concern is running Easy on a cheap flash drive that does not have wear-leveling. The problem becomes more severe if the working-partition is ext4 with a journal, as the journal writes a lot to the drive.

Another problem, that has also plagued the pups, is that if the working-partition does not have a journal, the filesystem is marked as "not clean" at shutdown.

The latter problem is caused by the aufs layered filesystem being at "/", so we are unable to unmount it at shutdown. Nor are we able to remount the working-partition as read-only, as it is busy.

Being "not clean" is not actually a problem: there is no serious error, and Easy still boots up and runs OK. However, I received an email a while ago reporting that the boot manager the person is using sees the filesystem as "not clean" and performs a filesystem check at every bootup.

So, two problems: we want to minimize writes to the flash drive, and we want the filesystem to be marked as "clean" at shutdown. I have been playing with a possible solution...

The scenario at bootup is that the initrd sets up the layered filesystem, basically consisting of a read-write folder on top (/mnt/${WKG_DEV}/${WKG_DIR}.session) and read-only easy.sfs on the bottom (/mnt/easy_ro). A switch_root is performed onto the top of the layers, and Easy is off and running.

If the flash drive is /dev/sdb, WKG_DEV=sdb2 and WKG_DIR=easyos/, then that rw folder is /mnt/sdb2/easyos/.session -- sdb2 is the ext4 working-partition. So, anything we do under "/", install a package, whatever, actually goes into that .session folder.
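The way that path is put together can be shown with a tiny sketch. The function name session_dir is hypothetical, for illustration only; WKG_DEV and WKG_DIR are the variables described above, and WKG_DIR is assumed to carry its trailing slash.

```shell
#!/bin/sh
# Illustrative only: session_dir is a hypothetical helper, not part of EasyOS.
# $1 = WKG_DEV (ex: sdb2), $2 = WKG_DIR with trailing slash (ex: easyos/)
session_dir() {
 wkg_dev="$1"
 wkg_dir="$2"
 echo "/mnt/${wkg_dev}/${wkg_dir}.session"
}

# Prints the rw folder for the example values used in the text.
session_dir sdb2 easyos/
```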

So there are lots of writes happening to that '.session' folder. And if there is an ext4 journal, there will be lots of journal writes to /mnt/sdb2. If there is no wear-leveling, the same locations in the flash drive may get hammered. Even if there is wear-leveling, flash drives have a very limited number of writes.

The solution that I am playing with, actually is kind-of going back to the Puppy idea, where the top layer is a tmpfs in RAM, and 'snapmergepuppy' script is used to flush the tmpfs layer down to the .session layer.

The Puppy way of doing it only works with aufs; overlayfs is not capable of flushing down to a lower layer.

The solution that I am exploring will work with aufs, and potentially with overlayfs, and will have the layers like this:

/mnt/easy_rw                          Read-write top layer in zram
/mnt/${WKG_DEV}/${WKG_DIR}.session    Session folder mounted read-only
/mnt/easy_ro                          easy.sfs mounted read-only

At bootup, the initrd creates /dev/zram1 with a size of about 120-150% of available RAM and with an ext2 filesystem. This gets mounted on /mnt/easy_rw.
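As a rough sketch of the sizing arithmetic, assuming "available RAM" means the MemAvailable figure in /proc/meminfo and taking 125% as a value inside the stated range. The helper zram_size_kb is hypothetical, and this is not the actual initrd code. The commands that need root are only printed, not run.

```shell
#!/bin/sh
# Sketch only: zram_size_kb is a hypothetical helper, not the EasyOS initrd code.
# $1 = RAM in KiB, $2 = percentage of that RAM to use as the zram disksize.
zram_size_kb() {
 echo $(( $1 * $2 / 100 ))
}

# Assumption: "available RAM" is taken from MemAvailable in /proc/meminfo.
RAM_KB="$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)"
[ -z "$RAM_KB" ] && RAM_KB=4194304 #fall back to 4GiB for the demonstration
SIZE_KB="$(zram_size_kb "$RAM_KB" 125)"

# Printed rather than executed, as these need root and a zram1 device
# (for example after "modprobe zram num_devices=2"):
echo "echo ${SIZE_KB}K > /sys/block/zram1/disksize"
echo "mkfs.ext2 /dev/zram1"
echo "mount -t ext2 /dev/zram1 /mnt/easy_rw"
```

With 4GiB (4194304 KiB) and 125%, the helper gives 5242880 KiB, that is 5GiB.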

/dev/${WKG_DEV} is mounted on /mnt/${WKG_DEV}, but as the .session folder is read-only in the layered filesystem, there are no writes to the working-partition. This also means no journal writes.
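To make the layer order concrete, here is a sketch that only builds the mount option strings for the three layers, for aufs and for overlayfs. The helper names, the /mnt/newroot mount point and the upper/work subfolders are assumptions for illustration; none of this is taken from the EasyOS initrd.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that build option strings, nothing is mounted.
RW=/mnt/easy_rw
SESSION=/mnt/sdb2/easyos/.session #assumes WKG_DEV=sdb2, WKG_DIR=easyos/
RO=/mnt/easy_ro

# aufs: branches are listed top to bottom. "ro+wh" marks a read-only branch
# that may contain whiteout files, which suits a previously-saved session.
aufs_opts() {
 echo "br=${1}=rw:${2}=ro+wh:${3}=ro"
}

# overlayfs: one upperdir plus a colon-separated lowerdir list, top-most first.
# workdir must be an empty folder on the same filesystem as upperdir, so both
# are shown as subfolders of the zram mount (an assumption of this sketch).
overlay_opts() {
 echo "lowerdir=${2}:${3},upperdir=${1}/upper,workdir=${1}/work"
}

# /mnt/newroot is a made-up mount point for the example.
echo "mount -t aufs -o $(aufs_opts "$RW" "$SESSION" "$RO") aufs /mnt/newroot"
echo "mount -t overlay -o $(overlay_opts "$RW" "$SESSION" "$RO") overlay /mnt/newroot"
```

One thing to keep in mind with the overlayfs variant is that overlayfs records deletions differently from the '.wh.*' files of aufs, so a session folder written under one would need converting before being used as a lower layer under the other.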

The working-partition

Thinking a bit more about what might be keeping the working-partition "busy" at shutdown, here is a snapshot of the folders in the working-partition, in this case WKG_DEV=sdb2 and WKG_DIR=easyos/:

[image: snapshot of the folders in the working-partition]

...the '.session' folder has already been explained, it is mounted read-only in the aufs layered filesystem. Folder 'files' is mounted by /etc/rc.d/rc.sysinit at bootup, like this:

busybox mount --bind /mnt/${WKG_DEV}/${WKG_DIR}files /files 

...so, '/files' is actually a kind of redirection, out of the layered "/" hierarchy, to an external 'files' folder in the working-partition. '/files' is the default open/save/download path for most apps. It is for keeping your personal files.

Anyway, 'files' is a folder that might potentially be keeping sdb2, the working-partition, busy at shutdown.
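A quick way to see whether '/files' is still mounted, and so possibly still holding the partition, is to look it up in the mounts table. The helper is_mounted below is hypothetical and only a sketch; the optional second argument exists so that it can be tried against a sample file.

```shell
#!/bin/sh
# Sketch only: is_mounted is a hypothetical helper. It scans a mounts table
# (default /proc/mounts) for an exact match on the mount-point field.
is_mounted() {
 mnt_point="$1"
 mounts_file="${2:-/proc/mounts}"
 awk -v m="$mnt_point" '$2 == m {found=1} END {exit !found}' "$mounts_file" 2>/dev/null
}

if is_mounted /files; then
 echo "/files is still mounted"
 # To list the processes using that filesystem (busybox and psmisc both have this):
 #  fuser -m /files
else
 echo "/files is not mounted"
fi
```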

Saving the session

This design only saves the session at shutdown, not during the session as the pups can do with the 'snapmergepuppy' script. The main shutdown script, /etc/rc.d/rc.shutdown, has this in it:

#20220522
if [ "$TOP_LEVEL_ZRAM" == "1" ];then
 sync
 echo "Saving session..." >/dev/console
 #attempt to remove .session layer...
 busybox mount -o remount,del:/mnt/${WKG_DEV}/${WKG_DIR}.session / 2>/dev/console
 busybox umount /files 2>/dev/console
 #...even if these failed, merge rw layer to .session ...
 /etc/rc.d/rw-merge 2>/dev/console
 #attempt unmount, so f.s. without journal marked clean...
 busybox umount /mnt/${WKG_DEV} 2>/dev/console
fi

The first really important part of the above code is that, as the '.session' folder is only mounted read-only in the aufs layers, it is easy to take it out, which is what the "busybox mount -o remount,del:" line does.

What that means is that we can now copy the top-level rw layer, in zram, to the .session folder, saving the current session.

The second important point about the above code is that it then goes ahead and unmounts the working-partition, achieving a "clean" filesystem.
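Whether the unmount really left the filesystem marked clean can be checked afterwards, from another system or at next bootup before mounting, by reading the superblock with dumpe2fs. The helper fs_state is hypothetical and just picks out the relevant header line; sdb2 is only the example device.

```shell
#!/bin/sh
# Sketch only: fs_state is a hypothetical helper. It reads "dumpe2fs -h"
# output on stdin and prints the value of the "Filesystem state:" line.
fs_state() {
 sed -n 's/^Filesystem state:[[:space:]]*//p'
}

# Typical use, as root, with the partition not mounted:
#  dumpe2fs -h /dev/sdb2 2>/dev/null | fs_state
# Demonstration with a canned header line:
printf 'Filesystem state:         clean\n' | fs_state
```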

There are caveats to the above operations. There are possibly going to be processes that might block the above unmounts. There is code in rc.shutdown that kills processes; however, currently, in this first iteration of the code, I am having a problem with 'files' sometimes not unmounting.
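One possible way to cope with an unmount that only fails because a process has not quite exited yet is to retry it a few times. This is only a sketch of that idea; the retry helper is hypothetical and is not something that is in rc.shutdown.

```shell
#!/bin/sh
# Sketch only: retry is a hypothetical helper, not part of rc.shutdown.
# Usage: retry ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is used up, pausing 1 second
# between attempts. Returns 0 on success, 1 if every attempt failed.
retry() {
 attempts="$1"
 shift
 n=1
 while [ "$n" -le "$attempts" ]; do
  "$@" && return 0
  n=$((n + 1))
  [ "$n" -le "$attempts" ] && sleep 1
 done
 return 1
}

# In rc.shutdown it might be used like this (not run here, it needs root):
#  retry 3 busybox umount /files 2>/dev/console
retry 2 true && echo "succeeded"
```

If the unmount still fails after the retries, the merge and the final unmount attempt can go ahead anyway, as the existing code already does.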

As for saving the current session, see the /etc/rc.d/rw-merge line in the above code. Here is /etc/rc.d/rw-merge:

#!/bin/ash
#20220522 save rw layer to permanent .session folder at shutdown.
#when TOP_LEVEL_ZRAM='1' in PUPSTATE file, then /dev/zram1 is mounted on
# /mnt/easy_rw which is the rw top aufs layer.
# underneath is /mnt/${WKG_DEV}/${WKG_DIR}.session mounted ro.
# rc.shutdown will remove .session layer, then call this script.

export LANG=C
. /etc/rc.d/PUPSTATE
DEST="/mnt/${WKG_DEV}/${WKG_DIR}.session"

###whiteouts###
###############
cd /mnt/easy_rw
#find all files and folders in easy_rw, if a matching wh in .session, then delete wh...
#find . -mindepth 2 -mount |
while read F
do
 [ "$F" == "" ] && continue
 pathF="${F%/*}" #ex: ./usr/share/doc
 pathF="${pathF#./}" #ex: usr/share/doc
 nameF="${F##*/}" #ex: zarfy.txt
 if [ -e "${DEST}/${pathF}/.wh.${nameF}" ];then
  rm -f "${DEST}/${pathF}/.wh.${nameF}"
 fi
done <<_END1
$(find . -mindepth 2 -mount)
_END1

#find wh in easy_rw, fix in .session...
#find . -mindepth 2 -mount -type f -name '.wh.*' |
while read WH
do
 [ "$WH" == "" ] && continue
 pathWH="${WH%/*}" #ex: ./lib/firmware
 pathWH="${pathWH#./}" #ex: lib/firmware
 nameWH="${WH##*/}" #ex: .wh..wh..opq

 if [ "$nameWH" == ".wh..wh..opq" ];then
  if [ -h "${DEST}/${pathWH}" -o -f "${DEST}/${pathWH}" ];then
   rm -f "${DEST}/${pathWH}"
  elif [ -d "${DEST}/${pathWH}" ];then
   rm -rf "${DEST}/${pathWH}"
  fi
  continue
 fi

 delF="${nameWH#.wh.}" #ex: .wh.ycalc.txt becomes ycalc.txt
 if [ -h "${DEST}/${pathWH}/${delF}" -o -f "${DEST}/${pathWH}/${delF}" ];then
  rm -f "${DEST}/${pathWH}/${delF}"
 elif [ -d "${DEST}/${pathWH}/${delF}" ];then
  rm -rf "${DEST}/${pathWH}/${delF}"
 fi
done <<_END2
$(find . -mindepth 2 -mount -type f -name '.wh.*')
_END2

###merge###
###########
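#separate --exclude options, as {a,b} brace expansion is a bash feature that ash may not perform...
tar -cpf - --exclude=./files --exclude=./dev --exclude=./mnt --exclude=./var --exclude=./run \
 --exclude='./.*' --exclude='*/.cache' --exclude=./proc --exclude=./sys --exclude=./tmp \
 --one-file-system . | tar -xf - -C "${DEST}" --overwrite --warning=none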
sync
###end###

...completely different from 'snapmergepuppy' script in the pups. It seems sane. First test, works. Note, 'tar' automatically excludes socket files, which is very good.
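For anyone wanting to see the whiteout handling in action without touching a real session, here is a scratch-directory rehearsal of the simple case, where a file saved in an earlier session has been deleted in the current one. It is only a sketch: the temporary folders stand in for /mnt/easy_rw and the '.session' folder, the file name is the one from the script's own comments, and the opaque-folder case is left out.

```shell
#!/bin/sh
# Sketch only: rehearses the whiteout step of rw-merge in throwaway folders.
TMP="$(mktemp -d)"
RW="${TMP}/easy_rw"   #stands in for /mnt/easy_rw
DEST="${TMP}/session" #stands in for the .session folder
mkdir -p "${RW}/usr/share/doc" "${DEST}/usr/share/doc"

# Earlier session: ycalc.txt was saved. Current session: it was deleted,
# so aufs left a whiteout file for it in the rw layer.
echo "old" > "${DEST}/usr/share/doc/ycalc.txt"
: > "${RW}/usr/share/doc/.wh.ycalc.txt"

cd "$RW" || exit 1
find . -mindepth 2 -type f -name '.wh.*' > "${TMP}/whiteouts"
while read -r WH
do
 [ "$WH" = "" ] && continue
 pathWH="${WH%/*}"     #./usr/share/doc
 pathWH="${pathWH#./}" #usr/share/doc
 nameWH="${WH##*/}"    #.wh.ycalc.txt
 delF="${nameWH#.wh.}" #ycalc.txt
 echo "removing from session copy: ${pathWH}/${delF}"
 rm -rf "${DEST:?}/${pathWH}/${delF}"
done < "${TMP}/whiteouts"
cd /

if [ -e "${DEST}/usr/share/doc/ycalc.txt" ]; then
 RESULT="still there"
else
 RESULT="removed"
fi
rm -rf "$TMP"
echo "ycalc.txt in session copy: ${RESULT}"
```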

Summary

With this design, after bootup you can be doing the usual operations, such as surfing the web, downloading a video, playing a video, etc., and there will only be writes to the working-partition when you save a file in '/files'. Just surfing the web, there will be no writes.

Even when you download and install a package, there will be no writes to the drive. It all happens in the zram.

There will be writes if you download an SFS file, as that will go into the 'sfs' folder in the working-partition. However, running the SFS on the main desktop causes no writes to the drive -- as long as the SFS is mounted with "noatime".

No writes, whether the ext4 filesystem has a journal or not.

There is a limitation here, that the free space is only that of the zram device, which is in RAM. At every bootup, that free space will be about 120-150% of the available RAM, about 5GB in a computer with 4GB RAM. This high figure is due to zram compression.

As you download packages, use the web browser (which puts stuff into the browser cache), etc., that free space will get used up. The free-space icon in the tray will show you how much space is left.
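The same figure can be read from a terminal with df. A minimal sketch follows; the helper free_kb is hypothetical, and on a system running this design the path of interest would be /mnt/easy_rw.

```shell
#!/bin/sh
# Sketch only: free_kb is a hypothetical helper that prints the available
# space, in KiB, of the filesystem holding the given path.
free_kb() {
 df -Pk "$1" | awk 'NR == 2 {print $4}'
}

# "/" is used here so that the example runs anywhere; with this design the
# zram layer would be queried with: free_kb /mnt/easy_rw
echo "Free space: $(free_kb /) KiB"
```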

Then at shutdown, everything in zram gets merged into the '.session' folder, and that folder has the entire free space of the working-partition to use. For, say, a 16GB flash-stick, the working-partition will be about 15GB.

At next bootup, zram is completely empty.

Hardly any writes to the drive, yippy, it will last forever!

The other important feature of this design is being able to unmount the working-partition, achieving a clean filesystem even when there is no journal.

There is another interesting outcome with this design; at shutdown you could choose to either save the session or not save. After a session of web-browsing, shutting down without saving might be a desirable choice. However, remember that anything you save into '/files' is already saved.

Of course there are gotchas -- for example, after a crash or power-failure you will lose the current session.

Thinking ahead, it will work with overlayfs, which I might be forced to move to one day.

The above design is just one way of reaching those two goals: minimal writes to the flash drive, and a clean shutdown. I will keep playing with it, to see if any road-blocks appear, or whether it turns out to be a clear road ahead.

Hmm, I can already think of a couple of road-blocks. I will think about whether I can get around them...
