Proxmox drive configuration

dimitrisgr

New Member
Jun 8, 2021
Hi there, I am about to build my first server and I will use it for production / website hosting.

I have decided to buy either a Dell R630 or a Dell R730. For the drives I am thinking of using an NVMe adapter card with 2 Samsung 970 Pro in RAID 1. All VMs will live on the NVMe drives, and I will add at least 2 NAS HDDs, also in RAID 1. On the HDDs I will have my FreeNAS and also the VM snapshots.
So I am wondering where to put Proxmox itself: HDD or SSD?
I read somewhere that Proxmox writes a lot and that this will decrease the life of the NVMe drives. The speed of the VMs is what really interests me. Also, I am not sure if I should use ZFS. Last thing: I will have 2 Xeon E5-2680 v4 with a total of 28 cores.
How much RAM do you think I will need?
The server will be used mostly as a web server with cPanel, plus a few cores for FreeNAS.
 
Hi there, I am about to build my first server and I will use it for production / website hosting.

I have decided to buy either a Dell R630 or a Dell R730. For the drives I am thinking of using an NVMe adapter card with 2 Samsung 970 Pro in RAID 1.
Do yourself a favor and don't buy consumer SSDs. Enterprise SSDs will be much faster for server workloads, have power-loss protection for better data safety and less write amplification on sync writes (which your MySQL databases will do), and they will last much longer. To quote the ZFS NVMe Benchmark Paper:
Can I use consumer or pro-sumer SSDs, as these are much cheaper than enterprise-class SSD?
No. Never. These SSDs won't provide the required performance, reliability or endurance. See the fio results from before and/or run your own fio tests.

All VMs will live on the NVMe drives, and I will add at least 2 NAS HDDs, also in RAID 1. On the HDDs I will have my FreeNAS and also the VM snapshots.
It isn't that easy to use TrueNAS. TrueNAS itself will use ZFS, and you shouldn't run ZFS on top of ZFS because that amplifies the write overhead. ZFS also needs direct access to the drives, without any RAID controller in between. The best way would be to buy a PCIe HBA card, attach the two HDDs to it, and use PCI passthrough to bring the HBA with all attached drives directly into your TrueNAS VM. That way your TrueNAS VM can access the physical drives directly, without any virtualization or another filesystem below it.
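
As a rough sketch of what that looks like on the Proxmox side (the VM ID 100 and the PCI address 0000:03:00.0 are only placeholders; replace them with your own values):

Code:
# enable IOMMU on an Intel host, then reboot
#   /etc/default/grub: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
update-grub

# find the PCI address of the HBA
lspci | grep -i -e sas -e hba

# pass the whole HBA (and every disk attached to it) into the TrueNAS VM
qm set 100 --hostpci0 0000:03:00.0
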
How much RAM do you think I will need?
The more the better. RAM overprovisioning doesn't really work, and the more RAM you allow ZFS to use as cache, the faster your pools will be. By default ZFS will use up to 50% of your host's RAM for the ARC (but you can limit that).
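
If you do want to cap the ARC, a minimal sketch would be a ZFS module option (the 8 GiB value is only an example; pick whatever fits your RAM):

Code:
# /etc/modprobe.d/zfs.conf
# limit the ZFS ARC to 8 GiB (value is given in bytes)
options zfs zfs_arc_max=8589934592

# afterwards, rebuild the initramfs and reboot so the option takes effect
update-initramfs -u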
 
I'm running a single-node Proxmox box. Proxmox and the VMs are on an NVMe SSD (1 TB WD SN550). The onboard SATA controller is passed through to TrueNAS for direct access to the drives. The board (ASUS X570-E) has 8 onboard SATA ports (4 in use now). So far this has been working out relatively well.

There are 4 active VMs at this time:
* Sophos UTM
* Ubuntu w/ Nextcloud
* Ubuntu w/ FreePBX and Asterisk
* TrueNAS

Monitoring the NVMe, I see typical writes of about 30 GB/day between Proxmox and the VMs. This seems greater than what I saw under ESXi, quite a bit more in fact: running ESXi with the same VMs resulted in a total of about 2 TB written over a 4-month period, which comes out to ~17 GB/day. I'm not overly concerned. The drive is rated at 600 TBW, so at 30, let's say even 50 GB/day, that's 18.25 TB/yr and under 200 TB written in a decade. The server will probably get upgraded in some manner in the next 5 years.
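
A quick back-of-the-envelope sketch of that endurance math (the figures are the ones from this post, not universal values):

Code:
# Rough SSD endurance estimate from a measured daily write volume.
# The numbers below are the ones quoted in this post.
daily_writes_gb = 50        # pessimistic daily host writes in GB
rated_endurance_tbw = 600   # drive endurance rating in TB written

yearly_tb = daily_writes_gb * 365 / 1000           # ~18.25 TB/year
decade_tb = yearly_tb * 10                         # ~182.5 TB per decade
years_to_rating = rated_endurance_tbw / yearly_tb  # ~33 years to hit the rating

print(f"{yearly_tb:.2f} TB/year, {decade_tb:.1f} TB per decade")
print(f"rating reached after ~{years_to_rating:.0f} years")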
 
A big problem is write amplification. My home server right now is using 400 GB of storage but is writing around 1 TB per day to the NAND cells of the SSDs while idling. Idling means I don't download or copy anything to the server; 1 TB of writes per day is just stuff the VMs are creating by themselves, like logs, metrics, backups, snapshots, journaling and so on.

The big contributors here are virtualization, sync writes, ZFS and the way SSDs work. Every layer creates overhead, and write amplification doesn't add up, it multiplies. Let's say for example (these are not real numbers) that VirtIO SCSI virtualization causes a write amplification of 4x, sync writes 2x, ZFS 2x, and the SSDs themselves also have an internal write amplification of factor 3x. You don't get "4 + 2 + 2 + 3 = 11x" but "4 * 2 * 2 * 3 = 48x" write amplification. So for every 1 GB of data you write inside the VM, 48 GB are written to the SSDs.
TrueNAS on top of ZFS would be a bad idea because it would look like this:
"3x SSD * 2x ZFS * 2x sync writes * 4x VirtIO SCSI * 2x ZFS * 2x sync writes = 192x write amplification".
So you basically want to skip any unnecessary virtualization, abstraction or filesystem layer.

How high your write amplification will be depends on your hardware, your storage setup and VM configuration.
I see a total write amplification from VM to NAND of around factor 20 for async writes and factor 40 for sync writes (factor 7 from guest to ZFS on the host * factor 3 inside the SSD * factor 2 for sync writes).
So the 1 TB of writes every day is caused by only 25 to 50 GB of real data written inside the VMs per day. And that is easy to reach: I have around 20 VMs, so if every VM is writing continuously at just 15 to 30 kB/s, that already causes 1 TB of writes per day.
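
A small sketch of that arithmetic (the per-layer factors are the illustrative ones from above, not measured values):

Code:
# Write amplification multiplies across layers instead of adding.
# These per-layer factors are illustrative, not measurements.
layers = {"virtio scsi": 4.0, "sync writes": 2.0, "zfs": 2.0, "ssd internal": 3.0}

total_wa = 1.0
for factor in layers.values():
    total_wa *= factor
print(f"total write amplification: {total_wa:.0f}x")  # 48x, not 4 + 2 + 2 + 3 = 11x

# Working backwards: 1 TB/day of NAND writes at ~40x amplification,
# spread over 20 VMs, is only a trickle of real guest data per VM.
nand_gb_per_day = 1000
measured_wa = 40
vm_count = 20

guest_gb_per_day = nand_gb_per_day / measured_wa             # ~25 GB/day of real data
per_vm_kb_per_s = guest_gb_per_day * 1e6 / vm_count / 86400  # ~14 kB/s per VM
print(f"{guest_gb_per_day:.0f} GB/day inside the VMs, ~{per_vm_kb_per_s:.0f} kB/s per VM")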
 
Do yourself a favor and don't buy consumer SSDs. Enterprise SSDs will be much faster for server workloads, have power-loss protection for better data safety and less write amplification on sync writes (which your MySQL databases will do), and they will last much longer. To quote the ZFS NVMe Benchmark Paper:



It isn't that easy to use TrueNAS. TrueNAS itself will use ZFS, and you shouldn't run ZFS on top of ZFS because that amplifies the write overhead. ZFS also needs direct access to the drives, without any RAID controller in between. The best way would be to buy a PCIe HBA card, attach the two HDDs to it, and use PCI passthrough to bring the HBA with all attached drives directly into your TrueNAS VM. That way your TrueNAS VM can access the physical drives directly, without any virtualization or another filesystem below it.

The more the better. RAM overprovisioning doesn't really work, and the more RAM you allow ZFS to use as cache, the faster your pools will be. By default ZFS will use up to 50% of your host's RAM for the ARC (but you can limit that).
So the next option I found is the Kingston Data Centre DC1000B. What is your opinion about this one?
Also, what about not using ZFS and only using hardware RAID 1 for the SSD and HDD setups? ZFS, I believe, is pretty complex, and if it's done wrong everything goes wrong, plus it uses resources.
 
The Kingston doesn't deserve the name "datacenter". The cheapest you can go is probably Samsung's PM series, although the SM series can endure much higher write loads. Anything below that will work for some time, but you'll end up buying new disks pretty soon.
 
I'm running a single-node Proxmox box. Proxmox and the VMs are on an NVMe SSD (1 TB WD SN550). The onboard SATA controller is passed through to TrueNAS for direct access to the drives. The board (ASUS X570-E) has 8 onboard SATA ports (4 in use now). So far this has been working out relatively well.

There are 4 active VMs at this time:
* Sophos UTM
* Ubuntu w/ Nextcloud
* Ubuntu w/ FreePBX and Asterisk
* TrueNAS

Monitoring the NVMe, I see typical writes of about 30 GB/day between Proxmox and the VMs. This seems greater than what I saw under ESXi, quite a bit more in fact: running ESXi with the same VMs resulted in a total of about 2 TB written over a 4-month period, which comes out to ~17 GB/day. I'm not overly concerned. The drive is rated at 600 TBW, so at 30, let's say even 50 GB/day, that's 18.25 TB/yr and under 200 TB written in a decade. The server will probably get upgraded in some manner in the next 5 years.
I think I will try to replicate your setup! How is the WD SN550 working out? I am interested in buying this one or the Kingston Data Centre DC1000B, not sure which one to choose. Is the WD an enterprise SSD?
 
The Kingston doesn't deserve the name "datacenter". The cheapest you can go is probably Samsung's PM series, although the SM series can endure much higher write loads. Anything below that will work for some time, but you'll end up buying new disks pretty soon.
What about the Samsung PM981 or Samsung 980 Pro? Or the WD SN550? I am trying to buy from amazon.de and they do not have much server hardware. Also, I cannot find any recommendations anywhere!
 
A big problem is write amplification. My home server right now is using 400 GB of storage but is writing around 1 TB per day to the NAND cells of the SSDs while idling. Idling means I don't download or copy anything to the server; 1 TB of writes per day is just stuff the VMs are creating by themselves, like logs, metrics, backups, snapshots, journaling and so on.

The big contributors here are virtualization, sync writes, ZFS and the way SSDs work. Every layer creates overhead, and write amplification doesn't add up, it multiplies. Let's say for example (these are not real numbers) that VirtIO SCSI virtualization causes a write amplification of 4x, sync writes 2x, ZFS 2x, and the SSDs themselves also have an internal write amplification of factor 3x. You don't get "4 + 2 + 2 + 3 = 11x" but "4 * 2 * 2 * 3 = 48x" write amplification. So for every 1 GB of data you write inside the VM, 48 GB are written to the SSDs.
TrueNAS on top of ZFS would be a bad idea because it would look like this:
"3x SSD * 2x ZFS * 2x sync writes * 4x VirtIO SCSI * 2x ZFS * 2x sync writes = 192x write amplification".
So you basically want to skip any unnecessary virtualization, abstraction or filesystem layer.

How high your write amplification will be depends on your hardware, your storage setup and VM configuration.
I see a total write amplification from VM to NAND of around factor 20 for async writes and factor 40 for sync writes (factor 7 from guest to ZFS on the host * factor 3 inside the SSD * factor 2 for sync writes).
So the 1 TB of writes every day is caused by only 25 to 50 GB of real data written inside the VMs per day. And that is easy to reach: I have around 20 VMs, so if every VM is writing continuously at just 15 to 30 kB/s, that already causes 1 TB of writes per day.
Very helpful, I honestly had no idea about this!
 
The WD SN550 is basic consumer-grade storage. It's just what I happened to have on hand when I built this thing. My usage isn't too heavy here; most (all?) of those writes are from logs.

The two Ubuntu VMs are ext4, while TrueNAS uses ZFS for its boot drive (which resides on the NVMe). Sophos UTM is also ext4. Power loss isn't a big deal here with the UPS connected; the UPS is good for at least 45 minutes on battery, but everything is set to shut down after 5 minutes on battery power.

I think you need to evaluate what your daily writes will actually be to determine which type of SSD is best for your use case and budget.
 
Power loss isn't a big deal here with the UPS connected; the UPS is good for at least 45 minutes on battery, but everything is set to shut down after 5 minutes on battery power.
Having a UPS doesn't mean you don't need SSDs with power-loss protection. If your SSD has no power-loss protection, it can't cache sync writes. If it can't cache sync writes, it will be slow and your write amplification will "explode", because it can't use the cache to optimize writes. Let's say you want to sync write 100x 4K, but your SSD can only erase NAND cells in 128K blocks. With power-loss protection it will cache the 100x 4K and write them as 4x 128K operations. Without power-loss protection it can't use the cache (because the SSD's firmware doesn't know that there is a UPS, and can't assume that everything in its RAM cache will survive a power outage), so it will write 100x 128K to store the 100x 4K.

So SSDs without built-in power-loss protection are fine for async writes, but as soon as you are using sync writes (like most databases do) you really want SSDs with power-loss protection. And 99% of the SSDs with power-loss protection are enterprise/datacenter SSDs.
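
A small sketch of that example calculation (block sizes and counts are the illustrative ones from above):

Code:
import math

# Sync-write amplification with and without power-loss protection (PLP),
# using the illustrative numbers from this post.
KB = 1024
write_count = 100
write_size  = 4 * KB      # each sync write is 4K
erase_block = 128 * KB    # NAND can only be rewritten in 128K blocks

logical = write_count * write_size                         # 400K of real data

# With PLP the drive can safely batch cached sync writes into full blocks.
with_plp = math.ceil(logical / erase_block) * erase_block  # 4 x 128K = 512K

# Without PLP every sync write has to hit NAND immediately: one block per write.
without_plp = write_count * erase_block                    # 100 x 128K = 12800K

print(f"with PLP:    {with_plp // KB}K written, ~{with_plp / logical:.1f}x amplification")
print(f"without PLP: {without_plp // KB}K written, ~{without_plp / logical:.0f}x amplification")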

Samsung 980 Pro? Or WD SN550
Like the other people already said, these are prosumer SSDs and not enterprise SSDs. Look for a good TBW/DWPD rating and power-loss protection.

It's hard to find M.2 enterprise SSDs because the footprint is usually just too small to fit the NAND chips and capacitors for the power-loss protection. Therefore most NVMe enterprise SSDs use the U.2 form factor, which can also be used with an M.2 slot if you buy an M.2 to U.2 cable.

The DC1000M for example would be the better U.2 version of the M.2 DC1000B SSD.
 
It's hard to find M.2 enterprise SSDs because the footprint is usually just too small to fit the NAND chips and capacitors for the power-loss protection. Therefore most NVMe enterprise SSDs use the U.2 form factor, which can also be used with an M.2 slot if you buy an M.2 to U.2 cable.
The Dell R630 which I intend to buy has a SAS backplane. Will the U.2 drives be compatible with it?