
Mitigating NFS noisy neighbors with nconnect

nconnect is a very useful mount option for NFS volumes and is fully supported by ANF. It allows the NFS client to set up multiple TCP connections to the storage endpoint, potentially increasing performance. It is available for Linux kernels version 5.3 and higher. More information can be found on the official Docs page.
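Since nconnect requires kernel 5.3 or newer, you can quickly confirm which kernel your VM is running before relying on the option:

uname -r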

nconnect is not only useful when trying to maximize volume performance; it can also mitigate the noisy neighbor effect when multiple volumes terminating on the same IP address are accessed from the same VM.

Let me first demonstrate the noisy neighbor effect for volumes without nconnect. I’ve deployed a single VM (Standard_D4s_v4 running Ubuntu 20.04) and mounted two 1 TB Premium ANF volumes.

ANF volumes
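For reference, the volumes were mounted roughly as follows. This is a sketch: the export paths and mount points match the mount output below, but the exact mount commands used are not shown in this post.

sudo mkdir -p /volume01-1tb-premium /volume02-1tb-premium
sudo mount -t nfs -o rw,hard,rsize=1048576,wsize=1048576,vers=3,tcp 172.23.1.4:/volume01-1tb-premium /volume01-1tb-premium
sudo mount -t nfs -o rw,hard,rsize=1048576,wsize=1048576,vers=3,tcp 172.23.1.4:/volume02-1tb-premium /volume02-1tb-premium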

Observe the output from the mount command below to confirm there is no nconnect option.

172.23.1.4:/volume01-1tb-premium on /volume01-1tb-premium type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.23.1.4,mountvers=3,mountport=635,mountproto=tcp,local_lock=none,addr=172.23.1.4)

172.23.1.4:/volume02-1tb-premium on /volume02-1tb-premium type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.23.1.4,mountvers=3,mountport=635,mountproto=tcp,local_lock=none,addr=172.23.1.4)

We’ll now run a specific fio read operation on volume01-1tb-premium. We’re not trying to max out the volume’s performance (around 65 MiB/s); we just want to put some load on it.

rutger@rutger02-NE:/volume01-1tb-premium$ fio --name=8krandomreads --rw=randread --direct=1 --ioengine=libaio --bs=8k --numjobs=1 --iodepth=2 --size=4G --runtime=60
8krandomreads: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=2
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=25.6MiB/s][r=3281 IOPS][eta 00m:00s]
8krandomreads: (groupid=0, jobs=1): err= 0: pid=1334: Tue Sep 28 13:08:13 2021
  read: IOPS=3648, BW=28.5MiB/s (29.9MB/s)(1710MiB/60001msec)
    slat (nsec): min=1900, max=48900, avg=7750.79, stdev=2194.12
    clat (usec): min=428, max=5627, avg=539.24, stdev=60.81
     lat (usec): min=435, max=5634, avg=547.16, stdev=60.82
    clat percentiles (usec):
     |  1.00th=[  490],  5.00th=[  510], 10.00th=[  515], 20.00th=[  519],
     | 30.00th=[  523], 40.00th=[  529], 50.00th=[  529], 60.00th=[  529],
     | 70.00th=[  537], 80.00th=[  545], 90.00th=[  553], 95.00th=[  685],
     | 99.00th=[  742], 99.50th=[  758], 99.90th=[  889], 99.95th=[  947],
     | 99.99th=[ 2769]
   bw (  KiB/s): min=26032, max=29968, per=100.00%, avg=29204.67, stdev=949.06, samples=119
   iops        : min= 3254, max= 3746, avg=3650.55, stdev=118.63, samples=119
  lat (usec)   : 500=1.80%, 750=97.50%, 1000=0.66%
  lat (msec)   : 2=0.03%, 4=0.01%, 10=0.01%
  cpu          : usr=1.57%, sys=5.76%, ctx=199103, majf=0, minf=13
  IO depths    : 1=0.1%, 2=100.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=218884,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=2

Run status group 0 (all jobs):
   READ: bw=28.5MiB/s (29.9MB/s), 28.5MiB/s-28.5MiB/s (29.9MB/s-29.9MB/s), io=1710MiB (1793MB), run=60001-60001msec

Note that this workload achieves 28.5 MiB/s.

Next, we will repeat the same workload on volume01-1tb-premium and, in parallel, run a fio workload on volume02-1tb-premium with a very large iodepth. An iodepth of 6 would already max out the volume’s performance for this specific fio workload; we’re going to increase it even further to 128, so we can be fairly certain IOs will be queued.
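If you want to launch both workloads in parallel from a single shell, something along these lines works (an illustrative sketch; the runs in this post were started from separate sessions):

(cd /volume02-1tb-premium && fio --name=8krandomreads --rw=randread --direct=1 --ioengine=libaio --bs=8k --numjobs=1 --iodepth=128 --size=4G --runtime=60) &
(cd /volume01-1tb-premium && fio --name=8krandomreads --rw=randread --direct=1 --ioengine=libaio --bs=8k --numjobs=1 --iodepth=2 --size=4G --runtime=60)
wait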

fio output for volume02-1tb-premium:

rutger@rutger02-NE:/volume02-1tb-premium$ fio --name=8krandomreads --rw=randread --direct=1 --ioengine=libaio --bs=8k --numjobs=1 --iodepth=128 --size=4G --runtime=60
8krandomreads: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=128
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=65.4MiB/s][r=8374 IOPS][eta 00m:00s]
8krandomreads: (groupid=0, jobs=1): err= 0: pid=1437: Tue Sep 28 13:23:45 2021
  read: IOPS=8459, BW=66.1MiB/s (69.3MB/s)(3967MiB/60020msec)
    slat (nsec): min=1700, max=739700, avg=5598.24, stdev=6322.33
    clat (usec): min=682, max=54545, avg=15123.65, stdev=6050.45
     lat (usec): min=687, max=54548, avg=15129.42, stdev=6050.41
    clat percentiles (usec):
     |  1.00th=[ 1237],  5.00th=[ 2147], 10.00th=[10683], 20.00th=[11207],
     | 30.00th=[11600], 40.00th=[11863], 50.00th=[12125], 60.00th=[20317],
     | 70.00th=[20579], 80.00th=[21365], 90.00th=[22152], 95.00th=[22414],
     | 99.00th=[23200], 99.50th=[32375], 99.90th=[33424], 99.95th=[33817],
     | 99.99th=[44303]
   bw (  KiB/s): min=66208, max=132790, per=99.98%, avg=67661.08, stdev=6011.70, samples=120
   iops        : min= 8276, max=16598, avg=8457.60, stdev=751.40, samples=120
  lat (usec)   : 750=0.01%, 1000=0.08%
  lat (msec)   : 2=4.86%, 4=1.08%, 10=0.09%, 20=52.55%, 50=41.34%
  lat (msec)   : 100=0.01%
  cpu          : usr=2.16%, sys=5.37%, ctx=96054, majf=0, minf=265
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=507737,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=66.1MiB/s (69.3MB/s), 66.1MiB/s-66.1MiB/s (69.3MB/s-69.3MB/s), io=3967MiB (4159MB), run=60020-60020msec

fio output for volume01-1tb-premium:

rutger@rutger02-NE:/volume01-1tb-premium$ fio --name=8krandomreads --rw=randread --direct=1 --ioengine=libaio --bs=8k --numjobs=1 --iodepth=2 --size=4G --runtime=60
8krandomreads: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=2
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=16.7MiB/s][r=2139 IOPS][eta 00m:00s]
8krandomreads: (groupid=0, jobs=1): err= 0: pid=1447: Tue Sep 28 13:23:45 2021
  read: IOPS=220, BW=1766KiB/s (1808kB/s)(103MiB/60001msec)
    slat (usec): min=2, max=232, avg= 8.49, stdev= 6.57
    clat (usec): min=436, max=13220, avg=9047.09, stdev=4149.89
     lat (usec): min=443, max=13267, avg=9055.87, stdev=4150.22
    clat percentiles (usec):
     |  1.00th=[  502],  5.00th=[  515], 10.00th=[  529], 20.00th=[ 9896],
     | 30.00th=[10552], 40.00th=[10814], 50.00th=[10945], 60.00th=[11076],
     | 70.00th=[11207], 80.00th=[11338], 90.00th=[11469], 95.00th=[11600],
     | 99.00th=[11994], 99.50th=[11994], 99.90th=[12256], 99.95th=[12387],
     | 99.99th=[12911]
   bw (  KiB/s): min= 1408, max=11168, per=87.14%, avg=1538.03, stdev=890.58, samples=119
   iops        : min=  176, max= 1396, avg=192.20, stdev=111.33, samples=119
  lat (usec)   : 500=0.87%, 750=17.83%, 1000=0.35%
  lat (msec)   : 2=0.06%, 4=0.01%, 10=1.25%, 20=79.63%
  cpu          : usr=0.16%, sys=0.28%, ctx=10012, majf=0, minf=14
  IO depths    : 1=0.1%, 2=100.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=13245,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=2

Run status group 0 (all jobs):
   READ: bw=1766KiB/s (1808kB/s), 1766KiB/s-1766KiB/s (1808kB/s-1808kB/s), io=103MiB (109MB), run=60001-60001msec

As you can see, running the intensive workload on volume02-1tb-premium interferes with the performance of volume01-1tb-premium. Volume01-1tb-premium delivered 28.5 MiB/s during the first run, but dropped to 1766 KiB/s during the parallel run.

If we were to access volume01-1tb-premium and volume02-1tb-premium in parallel from different VMs, there would be no performance issues at all. So we’re running into a client/OS limitation here.

We’ll now unmount the 2 volumes and re-mount them with nconnect set to 8. Observe the mount output and confirm nconnect is configured.
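A sketch of the remount, again assuming the same export paths and mount points; the only change is the added nconnect=8 option:

sudo umount /volume01-1tb-premium /volume02-1tb-premium
sudo mount -t nfs -o rw,hard,rsize=1048576,wsize=1048576,vers=3,tcp,nconnect=8 172.23.1.4:/volume01-1tb-premium /volume01-1tb-premium
sudo mount -t nfs -o rw,hard,rsize=1048576,wsize=1048576,vers=3,tcp,nconnect=8 172.23.1.4:/volume02-1tb-premium /volume02-1tb-premium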

172.23.1.4:/volume02-1tb-premium on /volume02-1tb-premium type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,nconnect=8,timeo=600,retrans=2,sec=sys,mountaddr=172.23.1.4,mountvers=3,mountport=635,mountproto=tcp,local_lock=none,addr=172.23.1.4)

172.23.1.4:/volume01-1tb-premium on /volume01-1tb-premium type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,nconnect=8,timeo=600,retrans=2,sec=sys,mountaddr=172.23.1.4,mountvers=3,mountport=635,mountproto=tcp,local_lock=none,addr=172.23.1.4)

We’ll start with the single run on volume01-1tb-premium:

rutger@rutger02-NE:/volume01-1tb-premium$ fio --name=8krandomreads --rw=randread --direct=1 --ioengine=libaio --bs=8k --numjobs=1 --iodepth=2 --size=4G --runtime=60
8krandomreads: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=2
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=27.0MiB/s][r=3458 IOPS][eta 00m:00s]
8krandomreads: (groupid=0, jobs=1): err= 0: pid=1601: Tue Sep 28 13:47:02 2021
  read: IOPS=4129, BW=32.3MiB/s (33.8MB/s)(1936MiB/60001msec)
    slat (nsec): min=1900, max=47200, avg=7560.79, stdev=2592.63
    clat (usec): min=407, max=5221, avg=475.35, stdev=57.42
     lat (usec): min=414, max=5231, avg=483.09, stdev=57.64
    clat percentiles (usec):
     |  1.00th=[  429],  5.00th=[  437], 10.00th=[  441], 20.00th=[  449],
     | 30.00th=[  453], 40.00th=[  457], 50.00th=[  465], 60.00th=[  469],
     | 70.00th=[  478], 80.00th=[  486], 90.00th=[  515], 95.00th=[  537],
     | 99.00th=[  668], 99.50th=[  701], 99.90th=[  840], 99.95th=[  938],
     | 99.99th=[ 2278]
   bw (  KiB/s): min=25072, max=34144, per=100.00%, avg=33082.89, stdev=1379.05, samples=119
   iops        : min= 3134, max= 4268, avg=4135.36, stdev=172.38, samples=119
  lat (usec)   : 500=86.03%, 750=13.76%, 1000=0.17%
  lat (msec)   : 2=0.03%, 4=0.01%, 10=0.01%
  cpu          : usr=2.06%, sys=5.23%, ctx=180130, majf=0, minf=13
  IO depths    : 1=0.1%, 2=100.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=247773,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=2

Run status group 0 (all jobs):
   READ: bw=32.3MiB/s (33.8MB/s), 32.3MiB/s-32.3MiB/s (33.8MB/s-33.8MB/s), io=1936MiB (2030MB), run=60001-60001msec

As you can see, performance has increased slightly, from 28.5 MiB/s without nconnect to 32.3 MiB/s with nconnect (the impact of nconnect becomes clearer for heavier workloads).

We’ll now repeat the parallel run for both volumes.

fio output for volume02-1tb-premium:

rutger@rutger02-NE:/volume02-1tb-premium$ fio --name=8krandomreads --rw=randread --direct=1 --ioengine=libaio --bs=8k --numjobs=1 --iodepth=128 --size=4G --runtime=60
8krandomreads: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=128
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=65.2MiB/s][r=8341 IOPS][eta 00m:00s]
8krandomreads: (groupid=0, jobs=1): err= 0: pid=1646: Tue Sep 28 13:54:29 2021
  read: IOPS=8460, BW=66.1MiB/s (69.3MB/s)(3967MiB/60011msec)
    slat (nsec): min=1700, max=593700, avg=5993.80, stdev=9796.92
    clat (usec): min=261, max=71625, avg=15121.45, stdev=10698.18
     lat (usec): min=416, max=71629, avg=15127.61, stdev=10697.65
    clat percentiles (usec):
     |  1.00th=[  441],  5.00th=[  486], 10.00th=[  586], 20.00th=[  947],
     | 30.00th=[10421], 40.00th=[10814], 50.00th=[11207], 60.00th=[20579],
     | 70.00th=[20841], 80.00th=[21103], 90.00th=[30540], 95.00th=[31065],
     | 99.00th=[41157], 99.50th=[41157], 99.90th=[51643], 99.95th=[60556],
     | 99.99th=[61080]
   bw (  KiB/s): min=65884, max=132998, per=99.97%, avg=67663.79, stdev=6031.57, samples=120
   iops        : min= 8235, max=16624, avg=8457.95, stdev=753.88, samples=120
  lat (usec)   : 500=5.65%, 750=11.29%, 1000=3.59%
  lat (msec)   : 2=1.62%, 4=0.04%, 10=1.24%, 20=29.91%, 50=46.28%
  lat (msec)   : 100=0.39%
  cpu          : usr=2.11%, sys=5.50%, ctx=85124, majf=0, minf=266
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=507724,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=66.1MiB/s (69.3MB/s), 66.1MiB/s-66.1MiB/s (69.3MB/s-69.3MB/s), io=3967MiB (4159MB), run=60011-60011msec

fio output for volume01-1tb-premium:

rutger@rutger02-NE:/volume01-1tb-premium$ fio --name=8krandomreads --rw=randread --direct=1 --ioengine=libaio --bs=8k --numjobs=1 --iodepth=2 --size=4G --runtime=60
8krandomreads: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=2
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=31.2MiB/s][r=3998 IOPS][eta 00m:00s]
8krandomreads: (groupid=0, jobs=1): err= 0: pid=1659: Tue Sep 28 13:54:29 2021
  read: IOPS=3924, BW=30.7MiB/s (32.1MB/s)(1840MiB/60001msec)
    slat (nsec): min=2000, max=880200, avg=7874.83, stdev=5084.93
    clat (usec): min=9, max=4849, avg=500.27, stdev=119.78
     lat (usec): min=418, max=4857, avg=508.34, stdev=119.93
    clat percentiles (usec):
     |  1.00th=[  429],  5.00th=[  437], 10.00th=[  437], 20.00th=[  445],
     | 30.00th=[  449], 40.00th=[  453], 50.00th=[  457], 60.00th=[  465],
     | 70.00th=[  482], 80.00th=[  523], 90.00th=[  619], 95.00th=[  717],
     | 99.00th=[ 1057], 99.50th=[ 1172], 99.90th=[ 1434], 99.95th=[ 1549],
     | 99.99th=[ 1778]
   bw (  KiB/s): min=30768, max=31872, per=99.95%, avg=31380.34, stdev=235.39, samples=119
   iops        : min= 3846, max= 3984, avg=3922.54, stdev=29.43, samples=119
  lat (usec)   : 10=0.01%, 20=0.01%, 50=0.01%, 500=75.09%, 750=20.54%
  lat (usec)   : 1000=3.02%
  lat (msec)   : 2=1.35%, 4=0.01%, 10=0.01%
  cpu          : usr=2.25%, sys=5.13%, ctx=204326, majf=0, minf=14
  IO depths    : 1=0.1%, 2=100.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=235473,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=2

Run status group 0 (all jobs):
   READ: bw=30.7MiB/s (32.1MB/s), 30.7MiB/s-30.7MiB/s (32.1MB/s-32.1MB/s), io=1840MiB (1929MB), run=60001-60001msec

As you can see, the performance of volume01-1tb-premium is now nearly unaffected. During the single run volume01-1tb-premium delivered 32.3 MiB/s; during the parallel run it delivered 30.7 MiB/s.

Conclusion
Configuring nconnect can prevent the noisy neighbor effect that may occur when multiple NFS volumes terminating on the same IP address are accessed from a single VM.
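To make the setting persistent across reboots, the mounts can be added to /etc/fstab. A sketch using the export paths and mount points from this post:

172.23.1.4:/volume01-1tb-premium /volume01-1tb-premium nfs rw,hard,rsize=1048576,wsize=1048576,vers=3,tcp,nconnect=8 0 0
172.23.1.4:/volume02-1tb-premium /volume02-1tb-premium nfs rw,hard,rsize=1048576,wsize=1048576,vers=3,tcp,nconnect=8 0 0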
