Making NFS Suck Faster

A Presentation by Greg Banks

Linux makes a great little NFS server, the emphasis being on "little".
SGI's experience is that the Linux NFS server scales to small machines
with 4 or fewer processors, but not to larger machines. To make Linux
scale to 8 processors and 8 Gigabit Ethernet network cards, an important
market segment for SGI, we have had to do a lot of performance and
scaling work.

After protracted internal development, this work is now (July 2006)
just starting to be pushed into the Linux mainstream, as a series of
over 40 kernel and nfs-utils patches. See the Linux NFS mailing
list at nfs.sourceforge.net.

The talk will start with a brief introduction to the theory of operation
of the Linux NFS server, explaining how the nfsd threads work and
sketching the lifetime of an NFS call as seen by the server.

Tools and techniques for identifying and measuring software bottlenecks
on large machines will be explored. These include profilers, statistics
tools, and network sniffers. The relative merits of various kinds of
load generators will also be compared.

Various specific bottlenecks in the NFS server (discovered using these
tools) will be covered, along with how they affect performance on real
workloads and the approach SGI have taken in fixing each. Some of the
bottlenecks are specific to the NUMA nature of SGI's architecture, but
most are generic to any multi-processor platform.

The problems associated with serving large numbers (thousands) of clients,
rather than large amounts of traffic from mere tens of clients, will be
discussed, together with how SGI have solved them.

Finally, there will be a brief mention of work remaining for the future.

Some knowledge of Linux kernel internals and TCP/IP networking will
be assumed, but not of NFS. The talk will appeal to kernel programmers,
people interested in NFS and network performance, and to programmers
interested in improving Linux kernel performance.
