gve: use max allowed ring size for ZC page_pools

NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
current gve devmem tcp configuration.

Root-causing showed that this particular workload produces a very
bursty pattern of devmem allocations and frees, which exhausts the
page_pool ring buffer. As a result, sock_devmem_dontneed takes up to 5ms
to free a batch of 128 netmems, because each free finds no available
entry in the pp->ring and goes all the way down to the (slow) gen_pool.
gve_alloc_buffer then runs into a burst of successive allocations which
also find no entries in the pp->ring (presumably because they have not
been dontneed'd yet), with each allocation taking up to 100us and
slowing down the napi poll loop.
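
As a rough illustration of the ring exhaustion described above, here is
a minimal userspace model, not kernel code. The toy_* names are
hypothetical, and the model assumes only that a free which finds the
ring full, or an allocation which finds it empty, falls through to a
slower allocator:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a bounded recycle ring; not the kernel's pp->ring. */
struct toy_ring {
	size_t capacity;    /* number of slots in the ring */
	size_t used;        /* slots currently holding a recycled buffer */
	size_t slow_frees;  /* frees that found the ring full */
	size_t slow_allocs; /* allocations that found the ring empty */
};

/* Return a buffer: use the ring if it has room, else count a slow-path free. */
static void toy_free(struct toy_ring *r)
{
	if (r->used < r->capacity)
		r->used++;
	else
		r->slow_frees++;
}

/* Take a buffer: use the ring if it has one, else count a slow-path alloc. */
static void toy_alloc(struct toy_ring *r)
{
	if (r->used > 0)
		r->used--;
	else
		r->slow_allocs++;
}

/* Free `burst` buffers back to back, then allocate `burst` back to back. */
static void toy_burst(struct toy_ring *r, size_t burst)
{
	size_t i;

	assert(r != NULL);
	for (i = 0; i < burst; i++)
		toy_free(r);
	for (i = 0; i < burst; i++)
		toy_alloc(r);
}
```

In this model, a burst larger than the ring capacity pushes the excess
frees and the matching allocations onto the slow path, while a ring at
least as large as the burst absorbs all of them.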

From there, I suspect the slow napi poll loop means the rx buffers are
not processed in time, which leads to the packet drops detected by
tcpdump. Taken together, this leaves the workload running at around
0.5 GB/s, when the expected perf is around 12 GB/s.

This entire behavior can be avoided by increasing the pp->ring size to the
max allowed 32768. This makes the pp able to handle the bursty
alloc/frees of this particular workload. AFAICT there should be no
negative side effect of arbitrarily increasing the pp->ring size in this
manner for ZC configs - the memory is prealloced and pinned by the
memory provider anyway.
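
The sizing policy can be sketched as follows. This is a standalone
illustration, not the actual diff: toy_pick_pool_size, the zero_copy
flag and the multiplier of 4 for the non-ZC case are assumptions made
for the example; only the 32768 maximum comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Max ring size quoted in this commit message. */
#define TOY_PP_RING_MAX 32768u
/* Hypothetical multiplier for the non-ZC default; an assumption. */
#define TOY_POOL_SIZE_MULTIPLIER 4u

/*
 * Pick a ring size: ZC pools always get the maximum, other pools scale
 * with the rx descriptor count and are capped at the maximum.
 */
static uint32_t toy_pick_pool_size(uint32_t rx_desc_cnt, bool zero_copy)
{
	uint32_t size;

	assert(rx_desc_cnt > 0);
	if (zero_copy)
		return TOY_PP_RING_MAX;

	size = TOY_POOL_SIZE_MULTIPLIER * rx_desc_cnt;
	return size > TOY_PP_RING_MAX ? TOY_PP_RING_MAX : size;
}
```

Under these assumptions a ZC queue with 1024 descriptors gets a 32768
entry ring, while the same queue without ZC keeps a 4096 entry ring.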

Tested by running AllToAll PXN=2 workload. Before:

Avg bus bandwidth    : 0.434191

After:

Avg bus bandwidth    : 12.5494

Note that there is more we can do to optimize this path, such as bulk
netmem dontneeds, bulk netmem pp refills, and possibly taking a page
from the io_uring zcrx playbook and replacing the gen_pool with a
simpler fixed-size array based allocator, but this seems sufficient to
fix these critical workloads.

With thanks to Willem and Eric for helping root cause this.

cos-patch: bug
Fixes: 62d7f40503bc ("gve: support unreadable netmem")
Reported-by: Vedant Mathur <vedantmathur@google.com>
Change-Id: Ic597a356d347a2e61da214b1702090306057c44e
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-on: https://cos-review.googlesource.com/c/third_party/kernel/+/116948
Reviewed-by: Eric Dumazet <edumazet@google.com>
Main-Branch-Verified: Cusky Presubmit Bot <presubmit@cos-infra-prod.iam.gserviceaccount.com>
Tested-by: Cusky Presubmit Bot <presubmit@cos-infra-prod.iam.gserviceaccount.com>
Reviewed-by: Robert Kolchmeyer <rkolchmeyer@google.com>
Reviewed-on: https://cos-review.googlesource.com/c/third_party/kernel/+/118006
1 file changed