Expanding a ZFS RAID-Z2 by switching to bigger disks
ZFS saves the day again
Introduction
Let's say you have a small NAS appliance running FreeBSD and you want to replace all your 4TB disks with 10TB disks.
# uname -a
FreeBSD server.example.com 13.0-RELEASE-p7 FreeBSD 13.0-RELEASE-p7 #0: Mon Jan 31 18:24:03 UTC 2022     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
Let's have a look at the pool:
# zpool status zdata
  pool: zdata
 state: ONLINE
config:

        NAME            STATE     READ WRITE CKSUM
        zdata           ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            ada0p1.eli  ONLINE       0     0     0
            ada1p1.eli  ONLINE       0     0     0
            ada2p1.eli  ONLINE       0     0     0
            ada3p1.eli  ONLINE       0     0     0
It's a nice raidz2 pool with 4 disks.
Let's have a look at the disks:
# gpart show
[...]
=>        40  7814037088  ada0  GPT  (3.6T)
          40  7814037088     1  freebsd-zfs  (3.6T)

=>        40  7814037088  ada1  GPT  (3.6T)
          40  7814037088     1  freebsd-zfs  (3.6T)

=>        40  7814037088  ada2  GPT  (3.6T)
          40  7814037088     1  freebsd-zfs  (3.6T)

=>        40  7814037088  ada3  GPT  (3.6T)
          40  7814037088     1  freebsd-zfs  (3.6T)
Yep, 4 disks.
Replace the disks
Let's take the 4th one out of the pool:
# zpool offline zdata ada3p1.eli
zpool confirms the disk is out:
# zpool status zdata
  pool: zdata
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
config:

        NAME            STATE     READ WRITE CKSUM
        zdata           DEGRADED     0     0     0
          raidz2-0      DEGRADED     0     0     0
            ada0p1.eli  ONLINE       0     0     0
            ada1p1.eli  ONLINE       0     0     0
            ada2p1.eli  ONLINE       0     0     0
            ada3p1.eli  OFFLINE      0     0     0

errors: No known data errors
Let's identify the disk:
# geom disk list ada3
Geom name: ada3
Providers:
1. Name: ada3
   Mediasize: 4000787030016 (3.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e2
   descr: ST4000NM002A-2HZ101
   lunid: 5000c500d0240667
   ident: WJG1PN8W
   rotationrate: 7200
   fwsectors: 63
   fwheads: 16
Here the disk's serial number is `WJG1PN8W`.
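Before powering off, the serial can also be cross-checked straight from the base system with camcontrol (a quick sanity check; the exact output layout may vary between drives):

# camcontrol identify ada3 | grep -i 'serial number'
serial number         WJG1PN8W

It's also a good moment to note where that drive physically sits in the chassis so the right one gets pulled.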
# halt -p
Physically replace the disk in the server.
Start the server.
# zpool status
  pool: zdata
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
config:

        NAME            STATE     READ WRITE CKSUM
        zdata           DEGRADED     0     0     0
          raidz2-0      DEGRADED     0     0     0
            ada0p1.eli  ONLINE       0     0     0
            ada1p1.eli  ONLINE       0     0     0
            ada2p1.eli  ONLINE       0     0     0
            ada3p1.eli  OFFLINE      0     0     0

errors: No known data errors
The pool is still in the same state, i.e. we replaced the correct disk.
# diskinfo -v ada3
ada3
        512             # sectorsize
        10000831348736  # mediasize in bytes (9.1T)
        19532873728     # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        19377850        # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        TOSHIBA MG06ACA10TEY    # Disk descr.
        51S0A88FGMRT    # Disk ident.
        ahcich3         # Attachment
        id1,enc@n3061686369656d30/type@0/slot@4/elmdesc@Slot_03        # Physical path
        No              # TRIM/UNMAP support
        7200            # Rotation rate in RPM
        Not_Zoned       # Zone Mode
The new disk inherited the device name of the old one.
Create a partition table on the disk:
# gpart create -s gpt ada3
ada3 created
Create a ZFS partition on the disk:
# gpart add -a 4k -t freebsd-zfs ada3
ada3p1 added
Look at the partition table:
# gpart show ada3
=>         40  19532873648  ada3  GPT  (9.1T)
           40  19532873648     1  freebsd-zfs  (9.1T)
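Optionally, the partition could also get a GPT label so it can be referenced by a stable name under /dev/gpt/ no matter which adaX number the disk ends up with; data3 below is just an example label, and the rest of this article keeps using the plain adaXp1 names:

# gpart modify -i 1 -l data3 ada3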
Configure encryption on the new partition:
# geli init -l 256 -J /etc/geli/ada3p1_passphrase ada3p1

Metadata backup for provider ada3p1 can be found in /var/backups/ada3p1.eli
and can be restored with the following command:

        # geli restore /var/backups/ada3p1.eli ada3p1
Attach the encrypted partition:
# geli attach -j /etc/geli/ada3p1_passphrase ada3p1
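If the existing encrypted providers get attached automatically at boot via rc.conf, the new one needs the same treatment; assuming that kind of setup, the entries would look something like this in /etc/rc.conf (adjust to match however the other disks are already configured):

geli_devices="ada0p1 ada1p1 ada2p1 ada3p1"
geli_ada3p1_flags="-j /etc/geli/ada3p1_passphrase"

Otherwise a manual geli attach after each reboot works too, it's just easy to forget.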
Tell zpool it can replace the offline disk with the new one at the same path:
# zpool replace zdata ada3p1.eli
Check that the disk is resilvering:
# zpool status zdata
  pool: zdata
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Apr  2 11:27:05 2022
        662G scanned at 8.17G/s, 943M issued at 11.6M/s, 8.02T total
        0B resilvered, 0.01% done, 8 days 08:38:56 to go
config:

        NAME                  STATE     READ WRITE CKSUM
        zdata                 DEGRADED     0     0     0
          raidz2-0            DEGRADED     0     0     0
            ada0p1.eli        ONLINE       0     0     0
            ada1p1.eli        ONLINE       0     0     0
            ada2p1.eli        ONLINE       0     0     0
            replacing-3       DEGRADED     0     0     0
              ada3p1.eli/old  OFFLINE      0     0     0
              ada3p1.eli      ONLINE       0     0     0

errors: No known data errors
Wait.
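If you'd rather not keep re-running zpool status by hand, a small loop can print the progress line every ten minutes (a rough sketch):

# while zpool status zdata | grep -q 'resilver in progress'; do zpool status zdata | grep 'done,'; sleep 600; done

Don't panic about the initial "8 days" estimate either; it's computed from the first minutes of the resilver and shrinks quickly once the pool gets going (the resilver shown further down finished in under seven hours).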
Repeat with the other disks.
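For reference, here is the whole per-disk sequence condensed, with X standing for the disk number; do one disk at a time and wait for each resilver to finish before starting the next:

# zpool offline zdata adaXp1.eli
# halt -p
(swap the disk, power the server back on)
# gpart create -s gpt adaX
# gpart add -a 4k -t freebsd-zfs adaX
# geli init -l 256 -J /etc/geli/adaXp1_passphrase adaXp1
# geli attach -j /etc/geli/adaXp1_passphrase adaXp1
# zpool replace zdata adaXp1.eli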
Expand the pool
Check that all the disks are back online and included in the pool:
# gpart show
=>         40  19532873648  ada1  GPT  (9.1T)
           40  19532873648     1  freebsd-zfs  (9.1T)

=>         40  19532873648  ada2  GPT  (9.1T)
           40  19532873648     1  freebsd-zfs  (9.1T)

=>         40  19532873648  ada3  GPT  (9.1T)
           40  19532873648     1  freebsd-zfs  (9.1T)

=>         40  19532873648  ada0  GPT  (9.1T)
           40  19532873648     1  freebsd-zfs  (9.1T)
# zpool status zdata
  pool: zdata
 state: ONLINE
  scan: resilvered 1.98T in 06:52:44 with 0 errors on Mon Apr  4 16:03:35 2022
config:

        NAME            STATE     READ WRITE CKSUM
        zdata           ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            ada0p1.eli  ONLINE       0     0     0
            ada1p1.eli  ONLINE       0     0     0
            ada2p1.eli  ONLINE       0     0     0
            ada3p1.eli  ONLINE       0     0     0

errors: No known data errors
Look at the pool:
# zpool list zdata
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zdata  14.5T  8.10T  6.40T        -     21.8T     7%    55%  1.00x    ONLINE  -
We can see the pool has detected the additional space and is ready to expand onto it:
# zpool get expandsize zdata
NAME   PROPERTY    VALUE   SOURCE
zdata  expandsize  21.8T   -
The man page explains how to do that:
     zpool online [-e] pool device...
             Brings the specified physical device online.  This command is
             not applicable to spares.

             -e      Expand the device to use all available space.  If the
                     device is part of a mirror or raidz then all devices
                     must be expanded before the new space will become
                     available to the pool.
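Another option worth knowing about is the pool's autoexpand property, which makes ZFS pick up the extra space automatically when a larger device comes online; it's off by default, which is why the manual zpool online -e route is used here. For a future upgrade it could be enabled up front:

# zpool set autoexpand=on zdata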
Expand all the disks:
# zpool online -e zdata ada0p1.eli
# zpool online -e zdata ada1p1.eli
# zpool online -e zdata ada2p1.eli
# zpool online -e zdata ada3p1.eli
Look at the pool again:
# zpool list zdata
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zdata  36.4T  8.10T  28.3T        -         -     3%    22%  1.00x    ONLINE  -
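The numbers add up: zpool list reports the raw pool size, and 4 × 10 TB is roughly 36.4 TiB, of which about half is usable once double parity is accounted for. A quick check with bc:

# echo 'scale=1; 4 * 10 * 10^12 / 2^40' | bc
36.3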
All good!
All the disks were replaced and the pool now has more space, without ever recreating the pool or restoring from a backup; the only interruptions were the short power-offs needed to swap each disk.