
ZIL? L2ARC? How TF does cache work in ZFS?

A simple explanation of how caching works in ZFS

ZFS is a free copy-on-write filesystem originally developed by Sun Microsystems (now Oracle). Two of its most interesting features are the ways it can use fast SSDs to speed things up: the ZIL (ZFS Intent Log) and the L2ARC (Level 2 Adaptive Replacement Cache). The ZIL acts as a write cache and the L2ARC as a read cache. Why do I need those? Well, I’m currently looking into moving Shitty Services onto ZFS because my database is getting rather large. Only a small part of it needs to be on SSD storage; the rest can live on cheaper spinny disks. I'd like to create a read/write cache for it, similar to how an SSHD works.

The old model

The old model of file servers uses the machine's RAM as a read cache, and the rest of the data lives on normal, spinning disks.
The old model with RAM and disks

The new model

Now, how can we be more efficient? The problem with disks is that they are slooooow, really slow. The problem with RAM is that you never have enough of it. The solution is to insert another layer into the storage hierarchy: an SSD layer. The fast SSDs act as a cache, much faster than spinning disks and with a lot more storage capacity than RAM.
The new model with SSDs


ZFS uses the new model, but with a “twist”. As I mentioned before, there are two kinds of SSD cache in ZFS: the ZIL and the L2ARC.

The ZIL, or ZFS Intent Log, is the ZFS write cache. Many applications, like databases, need to do synchronous writes to disk to ensure that the data is safely secured in storage. This tends to be a problem, since sync writes are really slow. Normally ZFS collects writes into transaction groups, which are flushed out to disk every few seconds. Does the database want to wait that long? Probably not, so an intent-log record that says “I’m about to write baladibla to block bla bla” is written to disk immediately instead: painfully slow, but at least the data won’t be gone in case of a power failure. This works pretty much like the log in a normal database. What a dedicated ZIL device does is take these log records and, instead of writing them to slow spinning disks, store them on a fast SSD, so the sync writes can be acknowledged much faster.
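On a real system, a dedicated ZIL device (often called a SLOG) can be added to an existing pool with a single command. A minimal sketch, assuming a pool named tank and spare SSDs at /dev/sdb and /dev/sdc (all hypothetical names, pick your own):

```shell
# Add a dedicated intent-log (SLOG) device to the pool "tank".
# /dev/sdb is a placeholder for your fast SSD.
zpool add tank log /dev/sdb

# Or mirror the log devices, so in-flight sync writes survive
# the death of a single SLOG SSD:
zpool add tank log mirror /dev/sdb /dev/sdc
```

Note that only synchronous writes go through the ZIL, so a SLOG helps workloads like databases and NFS, not ordinary async writes.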

The L2ARC, on the other hand, is totally different: this is the ZFS read cache. In the old model, requested data would first be read from the cache in RAM; if it’s missing there, it has to be read from disk. Disk reads are slow, can we please avoid them? Yes, we can. We insert another layer between RAM and the spinning disks, consisting of much faster SSDs. They work as an extension of the normal cache in RAM (called the ARC, hence L2ARC). This cache is filled according to basic rules like “most frequently used” and “most recently used”. When you read data, the system first checks the ARC, then the L2ARC, and finally the spinning disks. This means much faster reads, especially random reads, which tend to be extremely slow on spinning disks.
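An L2ARC device can be attached in much the same way, and you can watch how it is being used afterwards. Again a sketch with hypothetical pool and device names:

```shell
# Add an SSD as an L2ARC (read cache) device to the pool "tank".
# Cache devices cannot be mirrored; losing one is harmless, since
# it only holds copies of data already on the main disks.
zpool add tank cache /dev/sdd

# Show per-device I/O statistics, including the log and cache devices:
zpool iostat -v tank
```

Unlike the ZIL, the L2ARC only pays off once it has warmed up with your working set, so give it some time before judging the hit rate.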
ZFS uses ZIL and L2ARC on SSD

Final thoughts

Unfortunately, automatic migration of hot data onto low-latency vdevs (and of cooled-down data back off them) is not really possible in ZFS without solving the infamous block-pointer-rewrite problem. So ZFS won't solve my particular problem after all.