Expanding a ZFS RAID-Z2 by switching to bigger disks


ZFS saves the day again

April 2022.

Introduction

Let's say you have a small NAS appliance running FreeBSD and you want to replace all your 4TB disks with 10TB disks.

# uname -a
FreeBSD server.example.com 13.0-RELEASE-p7 FreeBSD 13.0-RELEASE-p7 #0: Mon Jan 31 18:24:03 UTC 2022     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

Let's have a look at the pool:

# zpool status zdata
  pool: zdata
 state: ONLINE
config:

        NAME            STATE     READ WRITE CKSUM
        zdata           ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            ada0p1.eli  ONLINE       0     0     0
            ada1p1.eli  ONLINE       0     0     0
            ada2p1.eli  ONLINE       0     0     0
            ada3p1.eli  ONLINE       0     0     0

It's a nice raidz2 pool with 4 disks.
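
Each member of the pool is a GELI provider (the .eli suffix) layered on top of a GPT partition. To double-check that layer, geli status lists every attached provider together with the partition backing it:

# geli status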

Let's have a look at the disks:

# gpart show
[...]

=>        40  7814037088  ada0  GPT  (3.6T)
          40  7814037088     1  freebsd-zfs  (3.6T)

=>        40  7814037088  ada1  GPT  (3.6T)
          40  7814037088     1  freebsd-zfs  (3.6T)

=>        40  7814037088  ada2  GPT  (3.6T)
          40  7814037088     1  freebsd-zfs  (3.6T)

=>        40  7814037088  ada3  GPT  (3.6T)
          40  7814037088     1  freebsd-zfs  (3.6T)

Yep, 4 disks.

Replace the disks

Let's take the 4th one out of the pool:

# zpool offline zdata ada3p1.eli

zpool confirms the disk is out:

# zpool status zdata
  pool: zdata
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
config:

        NAME            STATE     READ WRITE CKSUM
        zdata           DEGRADED     0     0     0
          raidz2-0      DEGRADED     0     0     0
            ada0p1.eli  ONLINE       0     0     0
            ada1p1.eli  ONLINE       0     0     0
            ada2p1.eli  ONLINE       0     0     0
            ada3p1.eli  OFFLINE      0     0     0

errors: No known data errors

Let's identify the disk:

# geom disk list ada3
Geom name: ada3
Providers:
1. Name: ada3
   Mediasize: 4000787030016 (3.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e2
   descr: ST4000NM002A-2HZ101
   lunid: 5000c500d0240667
   ident: WJG1PN8W
   rotationrate: 7200
   fwsectors: 63
   fwheads: 16

Here the disk's serial number is `WJG1PN8W`.
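
If one serial number is not enough to locate the right drive bay, the same mapping can be pulled for all the disks at once with a quick one-liner:

# geom disk list | grep -E 'Geom name|ident'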

Power off the server:

# halt -p

Replace the disk physically on the server.

Start the server.

# zpool status
  pool: zdata
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
config:

        NAME            STATE     READ WRITE CKSUM
        zdata           DEGRADED     0     0     0
          raidz2-0      DEGRADED     0     0     0
            ada0p1.eli  ONLINE       0     0     0
            ada1p1.eli  ONLINE       0     0     0
            ada2p1.eli  ONLINE       0     0     0
            ada3p1.eli  OFFLINE      0     0     0

errors: No known data errors

The pool is still in the same state, with only ada3p1.eli offline, so we pulled the correct disk.

# diskinfo -v ada3
ada3
        512             # sectorsize
        10000831348736  # mediasize in bytes (9.1T)
        19532873728     # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        19377850        # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        TOSHIBA MG06ACA10TEY    # Disk descr.
        51S0A88FGMRT    # Disk ident.
        ahcich3         # Attachment
        id1,enc@n3061686369656d30/type@0/slot@4/elmdesc@Slot_03 # Physical path
        No              # TRIM/UNMAP support
        7200            # Rotation rate in RPM
        Not_Zoned       # Zone Mode

The new disk inherited the device name of the old disk.

Create a partition table on the disk:

# gpart create -s gpt ada3
ada3 created

Create a ZFS partition on the disk:

# gpart add -a 4k -t freebsd-zfs ada3
ada3p1 added

Look at the partition table:

# gpart show ada3
=>         40  19532873648  ada3  GPT  (9.1T)
           40  19532873648     1  freebsd-zfs  (9.1T)
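
As an aside, the partition could also be given a GPT label with -l, which makes it show up under a stable /dev/gpt/<label> name even if the disk moves to another port. A hypothetical variant of the command above, with a made-up label data3:

# gpart add -a 4k -t freebsd-zfs -l data3 ada3

This article sticks to the plain adaXpY device names of the original setup.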

Configure encryption on the disk:

# geli init -l 256 -J /etc/geli/ada3p1_passphrase ada3p1

Metadata backup for provider ada3p1 can be found in /var/backups/ada3p1.eli
and can be restored with the following command:

        # geli restore /var/backups/ada3p1.eli ada3p1
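
It is worth copying that metadata backup (and the passphrase file) somewhere off the machine: if the on-disk GELI metadata ever gets damaged, the provider cannot be attached without it. A fresh copy of the metadata can also be written on demand (the destination path here is only an example):

# geli backup ada3p1 /root/geli-backups/ada3p1.backup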

Attach the encrypted disk:

# geli attach -j /etc/geli/ada3p1_passphrase ada3p1

Tell zpool it can replace the offline disk with the new one at the same path:

# zpool replace zdata ada3p1.eli
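
The single-argument form works because the new partition came up under exactly the same name as the old one. Had the replacement appeared under a different device name, the two-argument form would be needed instead (ada4p1.eli is purely a hypothetical name here):

# zpool replace zdata ada3p1.eli ada4p1.eli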

Check that the disk is resilvering:

# zpool status zdata
  pool: zdata
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Apr  2 11:27:05 2022
        662G scanned at 8.17G/s, 943M issued at 11.6M/s, 8.02T total
        0B resilvered, 0.01% done, 8 days 08:38:56 to go
config:

        NAME                  STATE     READ WRITE CKSUM
        zdata                 DEGRADED     0     0     0
          raidz2-0            DEGRADED     0     0     0
            ada0p1.eli        ONLINE       0     0     0
            ada1p1.eli        ONLINE       0     0     0
            ada2p1.eli        ONLINE       0     0     0
            replacing-3       DEGRADED     0     0     0
              ada3p1.eli/old  OFFLINE      0     0     0
              ada3p1.eli      ONLINE       0     0     0

errors: No known data errors

Wait.

Repeat with the other disks.
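
For reference, the whole per-disk cycle condenses to the following sequence, with N standing in for the disk being swapped and the passphrase file following the same naming scheme as above. Between halt -p and gpart create, the old disk is physically swapped for the new one and the server is powered back on:

# zpool offline zdata adaNp1.eli
# halt -p
# gpart create -s gpt adaN
# gpart add -a 4k -t freebsd-zfs adaN
# geli init -l 256 -J /etc/geli/adaNp1_passphrase adaNp1
# geli attach -j /etc/geli/adaNp1_passphrase adaNp1
# zpool replace zdata adaNp1.eli

On OpenZFS 2.x (the version FreeBSD 13 ships), zpool wait can block until the resilver finishes instead of polling zpool status by hand:

# zpool wait -t resilver zdata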

Expand the pool

Check that all the disks are back online and included in the pool:

# gpart show
=>         40  19532873648  ada1  GPT  (9.1T)
           40  19532873648     1  freebsd-zfs  (9.1T)

=>         40  19532873648  ada2  GPT  (9.1T)
           40  19532873648     1  freebsd-zfs  (9.1T)

=>         40  19532873648  ada3  GPT  (9.1T)
           40  19532873648     1  freebsd-zfs  (9.1T)

=>         40  19532873648  ada0  GPT  (9.1T)
           40  19532873648     1  freebsd-zfs  (9.1T)

# zpool status zdata
  pool: zdata
 state: ONLINE
  scan: resilvered 1.98T in 06:52:44 with 0 errors on Mon Apr  4 16:03:35 2022
config:

        NAME            STATE     READ WRITE CKSUM
        zdata           ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            ada0p1.eli  ONLINE       0     0     0
            ada1p1.eli  ONLINE       0     0     0
            ada2p1.eli  ONLINE       0     0     0
            ada3p1.eli  ONLINE       0     0     0

errors: No known data errors

Look at the pool:

# zpool list zdata
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zdata  14.5T  8.10T  6.40T        -     21.8T     7%    55%  1.00x    ONLINE  -

We can see the pool has detected the additional space and is ready to expand onto it:

# zpool get expandsize zdata
NAME   PROPERTY    VALUE     SOURCE
zdata  expandsize  21.8T     -
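
As an aside, pools have an autoexpand property, which is off by default. Had it been turned on before the replacements, the pool should have grown onto the new space by itself, and turning it on now would save this step on the next round of upgrades:

# zpool get autoexpand zdata
# zpool set autoexpand=on zdata

With autoexpand left off, the devices have to be expanded by hand.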

The man page explains how to do that:

     zpool online [-e] pool device...
             Brings the specified physical device online.  This command is not
             applicable to spares.

             -e      Expand the device to use all available space.  If the
                     device is part of a mirror or raidz then all devices must
                     be expanded before the new space will become available to
                     the pool.

Expand all the disks:

# zpool online -e zdata ada0p1.eli
# zpool online -e zdata ada1p1.eli
# zpool online -e zdata ada2p1.eli
# zpool online -e zdata ada3p1.eli

Look at the pool again:

# zpool list zdata
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zdata  36.4T  8.10T  28.3T        -         -     3%    22%  1.00x    ONLINE  -
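
The extra room shows up on the dataset side as well:

# zfs list -o name,used,avail zdata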

All good!

All disks were replaced and the pool now has more space, all without ever rebuilding the pool or restoring from backup; the only interruptions were the brief shutdowns needed to physically swap each disk.