
I don't get it. Many people in this thread claim that VM base image deduplication is a great use case for this. So let's assume there are a couple of hundred VMs on a ZFS dataset with dedupe on, each of them run by different people for entirely different purposes: some databases, some web frontends/backends, minio S3 storage, backups, etc. This might save you the measly few hundred megabytes of Linux system files those VMs have in common (and even that is unlikely, given how many Linux versions with different patch levels are out there). It still won't be worth it, because ZFS will also keep track of every user's individual files (databases, backup files and whatnot), data which is almost guaranteed to be unique between users, so it completely misses the point of ZFS deduplication. What am I missing?


It largely depends on how you set up your environment. On my home server, most VMs consist of a few gigabytes of a base Linux system and then a couple of hundred megabytes of application code. Some of those VMs also store large amounts of data, but most of that data could be stored in something like a dedicated minio server and maybe a dedicated database server. I could probably get rid of a huge chunk of my used storage if I switched to a deduplicating system (but I have plenty of storage so I don't really need to).

If you're selling VMs to customers then there's probably no advantage in using deduplication.
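To put rough numbers on the two cases, here is a back-of-the-envelope sketch. It assumes an idealized model where shared base-image blocks are stored exactly once and everything else is stored per VM. The function name, the VM counts, and the sizes are all made up for illustration, and it ignores the memory and I/O cost of the dedup table entirely, so treat the output as an upper bound on the savings rather than a prediction.

```python
def dedup_savings(num_vms, base_gib, unique_gib, shared_fraction=1.0):
    """Idealized block-dedup estimate (hypothetical model, not a ZFS API).

    num_vms         -- number of VMs on the dataset
    base_gib        -- size of the base system in each VM
    unique_gib      -- per-VM data that nobody else has
    shared_fraction -- share of the base system that is identical across VMs
    Returns (logical_gib, physical_gib, dedup_ratio).
    """
    shared = base_gib * shared_fraction
    logical = num_vms * (base_gib + unique_gib)
    # Shared blocks are stored once; the rest is stored once per VM.
    physical = shared + num_vms * (base_gib - shared + unique_gib)
    return logical, physical, logical / physical


# Assumed home server: 10 VMs, identical 3 GiB base, 0.25 GiB of app code each.
home = dedup_savings(num_vms=10, base_gib=3.0, unique_gib=0.25)

# Assumed hosting setup: 200 customer VMs, only half of the base matches,
# and 100 GiB of unique customer data per VM.
hosting = dedup_savings(num_vms=200, base_gib=3.0, unique_gib=100.0,
                        shared_fraction=0.5)

for name, (logical, physical, ratio) in (("home", home), ("hosting", hosting)):
    print(f"{name}: {logical:.1f} GiB logical, {physical:.1f} GiB physical, "
          f"ratio {ratio:.2f}x")
```

With these made-up inputs the home server shrinks to a fraction of its logical size, while the hosting case barely moves because the unique customer data dominates.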


In such a scenario you'd probably have several partitions, with dedupe activated on the root filesystem (/bin, /lib, etc.) but not on /home and /var.



