Writing a 9P server from scratch, pt 3: server plumbing
In my previous post, I covered the implementation of a decoder and encoder for 9P messages. However, if that was all it took to implement a 9P server, 9P would not be a protocol; it would be a file format. In this post, I will cover the implementation of the net/styx package, which provides plumbing for writing a 9P server. This is not a hypothetical package; it is used to implement jsonfs, a 9P file server that serves a JSON-formatted file as a file system.
9P transactions
A typical 9P session consists of pairs of T-messages and R-messages that represent requests from the client and their responses from the server, respectively. Here is one such transaction, used to open a file:
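size[4] Topen tag[2] fid[4] mode[1]
size[4] Ropen tag[2] qid[13] iounit[4]
(Here name[n] denotes an n-byte field, following the notation of the 9P manual pages.)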
The first three fields (size, type, and tag) are purely bookkeeping used to identify and associate a request with its response. Ignoring these fields, we have what looks very much like a remote procedure call:
open(fid, flag) -> (qid, iounit)
This simple example exposes a few important details that we will need to keep in mind:
- Ordered: 9P requires a transport that guarantees in-order transmission of messages. Because each session is essentially a sequence of RPCs, re-arranging said calls could have unexpected results. Also, because future RPCs can depend on the return value of prior RPCs, parallelizing or pipelining a large number of requests can prove challenging (but not impossible!).
- Synchronous: While it is valid in 9P to have multiple requests "in flight", the fact that there is a 1-1 mapping between every T-message and an R-message means each individual request blocks until the server sends a response, and there are guarantees about the state of the world when a client receives a response. For instance, when an Rclunk message is received, the client is free to re-use the file handle in question.
Notes on identifiers
There are 3 types of identifiers in the 9P protocol.
- Tag: 16-bit identifiers for each transaction. Chosen by the client. No two unanswered T-messages may have the same tag.
- Fid: 32-bit identifier for a file pointer, akin to Unix file descriptors. Chosen by the client. Not unique; two fids may point to the same file.
- Qid: 104-bit identifier for a file, analogous to (but not the same as) an inode number in a Unix filesystem. Chosen by the server. No two files may have the same Qid, even if they have the same name and one has been deleted.
To ensure our server is suitable for public, anonymous use, we should pay special attention to the identifiers that are chosen by the client, as their values affect how we store them. If we had control over fids, for instance, we could have a lookup table of open files that used the fid as an index:
var openFiles []*file
f := openFiles[m.Fid()]
However, because the server does not control fid (or tag) values, a client could choose values that are more difficult for the server to handle. For instance, in the above example, a client could use a fid of 4 billion. This would either cause a run-time panic, or cause 4GB of memory to be allocated for the connection, making denial-of-service type attacks trivial.
With this in mind, we must store and look up these identifiers in a data structure that uses the same amount of resources for any possible value. There are plenty of candidates, such as a sorted slice or a balanced tree, but right at the top of the list is a simple map. Note that Go's implementation of maps is resistant to collision-based attacks.
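For example, a map-backed version of the lookup above uses the same amount of memory per entry regardless of which fid values a client chooses. This is a sketch; the variable names are mine, not the package's:
// A map costs the same per entry no matter which 32-bit fid
// values the client picks.
files := make(map[uint32]*file)
files[m.Fid()] = f // register an open file under its fid

// An unknown fid is caught explicitly, instead of causing a
// panic or a huge allocation.
if f, ok := files[m.Fid()]; ok {
	_ = f // use the file
}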
The main loop
Here is the main loop of our server. This is very similar to any other Listen/Accept server you would write in Go.
for {
rwc, err := l.Accept()
if err != nil {
return err
}
conn := newConn(srv, rwc)
go conn.serve()
}
We spawn one goroutine per connection. Here is what a connection looks like:
type conn struct {
*styxproto.Decoder
*styxproto.Encoder
msize int64
sessionFid map[uint32]*Session
qidpool *qidpool.Pool
pendingReq map[uint16]context.CancelFunc
}
In the actual implementation, there are a few more fields, and the two maps are replaced with thread-safe wrappers. The embedded Decoder and Encoder read and write 9P messages on the underlying connection. The two maps are used to look up the session a file is associated with (more on sessions later) and to look up in-flight requests that may be cancelled via a Tflush request. Unique qids are retrieved for files on-demand.
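Such a wrapper can be as simple as a mutex-guarded map. Here is a sketch; it is illustrative only, and the real wrapper differs in its details:
// Illustrative only: a thread-safe map from fids to sessions.
type fidMap struct {
	mu sync.RWMutex
	m  map[uint32]*Session
}

func (f *fidMap) Get(fid uint32) (*Session, bool) {
	f.mu.RLock()
	defer f.mu.RUnlock()
	s, ok := f.m[fid]
	return s, ok
}

func (f *fidMap) Put(fid uint32, s *Session) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.m == nil {
		f.m = make(map[uint32]*Session)
	}
	f.m[fid] = s
}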
The main loop of a connection looks something like this:
func (c *conn) serve() {
defer c.close()
if !c.acceptTversion() {
return
}
Serve:
	for c.Next() && c.Encoder.Err() == nil {
		for _, m := range c.Messages() {
			if !c.handleMessage(m) {
				break Serve
			}
		}
	}
if err := c.Encoder.Err(); err != nil {
c.srv.logf("write error: %s", err)
}
if err := c.Decoder.Err(); err != nil {
c.srv.logf("read error: %s", err)
}
c.srv.logf("closed connection from %s", c.remoteAddr())
}
Version negotiation must be the first transaction made, and looks like this:
for c.Next() {
for _, m := range c.Messages() {
tver, ok := m.(styxproto.Tversion)
if !ok {
c.Rerror(m.Tag(), "need Tversion")
return false
}
msize := tver.Msize()
if msize < styxproto.MinBufSize {
c.Rerror(m.Tag(), "buffer too small")
return false
}
if msize < c.msize {
c.msize = msize
c.Encoder.MaxSize = msize
c.Decoder.MaxSize = msize
}
		if !bytes.HasPrefix(tver.Version(), []byte("9P2000")) {
			c.Rversion(uint32(c.msize), "unknown")
			continue
		}
c.Rversion(uint32(c.msize), "9P2000")
return true
}
}
In plain English: the client proposes a version, and the server responds with "unknown" until the client proposes a 9P2000 variant, at which point the server, which has the final say, forces the client to use 9P2000. Here is the handler for all other messages:
func (c *conn) handleMessage(m styxproto.Msg) bool {
if _, ok := c.pendingReq[m.Tag()]; ok {
c.Rerror(m.Tag(), "%s", errTagInUse)
return false
}
cx, cancel := context.WithCancel(c.cx)
c.pendingReq[m.Tag()] = cancel
switch m := m.(type) {
case styxproto.Tauth:
return c.handleTauth(cx, m)
case styxproto.Tattach:
return c.handleTattach(cx, m)
case styxproto.Tflush:
return c.handleTflush(cx, m)
case fcall:
return c.handleFcall(cx, m)
case styxproto.BadMessage:
c.clearTag(m.Tag())
c.Rerror(m.Tag(), "bad message: %s", m.Err)
return true
default:
c.Rerror(m.Tag(), "unexpected %T message", m)
return false
}
return true
}
Each request type is passed to its own handler that is specific to that type of transaction. You may be wondering what an fcall is:
type fcall interface {
styxproto.Msg
Fid() uint32
}
Remember from the previous post that our representation of 9P messages in the styxproto package is just a slice of bytes with methods for the fields, with a 1-1 mapping from a message to a Go type. Using that knowledge, we can create interfaces that select common classes of 9P messages. In this case, an fcall is a 9P message that operates on a file, pointed to by a fid. An fcall is special, because it is part of a session.
9P Sessions
9P allows for multiple sessions to be multiplexed over a single connection. A session is established with an attach call:
attach(fid, afid, user, aname) -> qid
This call establishes fid as a file handle for the root of a filesystem tree aname (usually the empty string). After an attach, all operations on fid will be associated with the user named in the attach. The afid argument has to do with authentication and will be explained later.
It may be somewhat surprising to learn that there is no "session ID" in 9P, especially coming from an HTTP mindset, where session cookies abound. There is no need; in 9P, sessions are implicit, not explicit. Any fid can be traced back to the attach call that established its session. This is due to a transaction we haven't covered yet, walk:
walk(fid, newfid, path ...) -> qid ...
The walk transaction is used to move around a directory hierarchy; think of it as a more general version of what happens when you do cd path/to/dir on a normal file system. Note that the first argument to walk is an already established fid; the walk is relative to that file. Other than attach, walk is the only way to establish a new fid. This is why we can always use the ancestry of a fid to determine the session it is associated with. In practical terms, this is what the sessionFid map in the conn structure is for.
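For example, a Twalk handler can propagate session membership from the old fid to the new one. Here is a sketch, using a hypothetical helper and assuming a Newfid accessor on styxproto.Twalk:
// Hypothetical sketch: after a successful walk, newfid belongs to
// the same session as the fid it was derived from.
func (c *conn) adoptFid(m styxproto.Twalk) {
	if s, ok := c.sessionFid[m.Fid()]; ok {
		c.sessionFid[m.Newfid()] = s
		s.IncRef() // one more fid now references this session
	}
}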
In the styx package, there will be one managing goroutine per session. This goroutine is created when the server handles a Tattach message.
func (c *conn) handleTattach(ctx context.Context, m styxproto.Tattach) bool {
defer c.Flush()
s := newSession(c, m)
go func() {
c.srv.Handler.Serve9P(s)
s.cleanupHandler()
}()
c.sessionFid[m.Fid()] = s
s.IncRef()
s.files.Put(m.Fid(), file{name: ".", rwc: nil})
c.clearTag(m.Tag())
c.Rattach(m.Tag(), c.qid(".", styxproto.QTDIR))
return true
}
All sessions must have at least one file associated with them. After an attach transaction, that file is the root directory. Reference counting is then used to detect when a session is finished and notify the session handler. The cleanupHandler method closes all open files on a connection if its handler exits prematurely.
I/O
It is reasonable to say that the primary goal of a 9P session is to read from and write to files. Here are the read and write transactions:
read(fid, offset, count) -> (count, data)
write(fid, offset, count, data) -> count
Note the offset field: in 9P, it is not the server's responsibility to keep track of the current position in the file; a client must specify the offset for read and write operations every time. In Go, this is very similar to the io.ReaderAt and io.WriterAt interfaces. One solution, then, is to define an interface that all files must meet in order to be served by the styx package. The styxfile package does just that, with styxfile.Interface:
package styxfile
type Interface interface {
ReadAt(p []byte, offset int64) (int, error)
WriteAt(p []byte, offset int64) (int, error)
Close() error
}
The actual type just names io.ReaderAt & co rather than copying their definitions. Here is how we store these files, in a map keyed by their fids:
type file struct {
rwc styxfile.Interface
name string
}
Here is one way we can handle the read transaction:
func (s *Session) handleTread(msg styxproto.Tread, f file) bool {
if f.rwc == nil {
s.conn.clearTag(msg.Tag())
s.conn.Rerror(msg.Tag(), "fid %d not open for I/O", msg.Fid())
return false
}
count := min(
int(msg.Count()),
		int(s.conn.msize - styxproto.IOHeaderSize))
buf := make([]byte, count)
n, err := f.rwc.ReadAt(buf, msg.Offset())
s.conn.clearTag(msg.Tag())
switch err {
	case nil, io.EOF, io.ErrUnexpectedEOF:
s.conn.Rread(msg.Tag(), buf[:n])
default:
s.conn.Rerror(msg.Tag(), "%s", err)
}
return true
}
There's a lot here, so I'll step through it piece by piece.
if f.rwc == nil {
s.conn.clearTag(msg.Tag())
s.conn.Rerror(msg.Tag(), "fid %d not open for I/O", msg.Fid())
return false
}
Before a file may be read from, it must be prepared for I/O using the open transaction:
open(fid, flags) -> (qid, iounit)
Our message handler for this transaction sets the file's rwc field appropriately. All the message handlers in the styx package return true if the session can continue, or false if it should be ended.
count := min(
int(msg.Count()),
	int(s.conn.msize - styxproto.IOHeaderSize))
buf := make([]byte, count)
Here we allocate a buffer, taking care not to let the client DoS us by requesting a large buffer. This buffer will hold the results of reading from the file.
I would have liked to avoid using a temporary buffer here. However, there is very little we can do to avoid this, because we must know how much data is available before writing an Rread message. One possible solution would be to introduce buffering into the styxfile.Encoder and implement a method that takes an io.WriterTo, or implement a wrapper type that implements io.ReaderFrom. However, this is a non-trivial amount of work and must be justified by measurement.
n, err := f.rwc.ReadAt(buf, msg.Offset())
Here, we're finally reading the data from the file.
s.conn.clearTag(msg.Tag())
The server should always clear a tag before sending its response: once a client sees a response to a given message, it is allowed to re-use that tag immediately for new transactions. See this commit, where I fixed an issue that arose with v9fs; the client was fast enough to send a new request after the server had responded to the old one, but before the server had cleared the tag for re-use.
The following lines were put together after some trial and error on my part.
switch err {
case nil, io.EOF, io.ErrUnexpectedEOF:
s.conn.Rread(msg.Tag(), buf[:n])
default:
s.conn.Rerror(msg.Tag(), "%s", err)
}
return true
In a traditional Unix filesystem, when you read from a file and you've reached the end, further reads will return an error, usually called EOF (end-of-file). If we were to copy this behavior in our 9P server, we would do something like this:
if err == nil || n > 0 {
s.conn.Rread(msg.Tag(), buf[:n])
} else {
s.conn.Rerror(msg.Tag(), "%s", err)
}
However, this is not what is done in practice. Instead, the way to signal an end-of-file condition in 9P is to send a 0-length Rread response. The client will note that nothing was returned and discern EOF that way. This behavior is not explicit in the documentation for 9P, but was observed after testing several clients against my server. See this commit for more details. In retrospect, this behavior is cleaner than using a sentinel error value; it has always felt kind of wrong to me that EOF is considered an "error" given its inevitability (many files have a finite length), and doing it this way means clients do not have to parse error strings to discern EOF from other errors.
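To make the convention concrete, here is a sketch of a client-side read loop. The read callback stands in for a single 9P read transaction; it is a hypothetical stand-in, not part of the styx package:
// readAll drains a file until the server signals EOF with a
// zero-length Rread.
func readAll(read func(offset int64, count uint32) ([]byte, error)) ([]byte, error) {
	var result []byte
	var offset int64
	for {
		data, err := read(offset, 8192)
		if err != nil {
			return result, err
		}
		if len(data) == 0 {
			return result, nil // zero-length Rread: end of file
		}
		result = append(result, data...)
		offset += int64(len(data))
	}
}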
Using more than just ReadAt/WriteAt
While *os.File implements styxfile.Interface, many very useful types do not implement ReadAt or WriteAt methods. Sometimes they are not implemented because it was not deemed necessary. Sometimes it is impossible; when you consider that these methods are essentially a more flexible, client-side seek, it does not really make sense to call seek on sockets, fifos, pipes, and other pseudo-files. We really want server authors to have the flexibility to provide any kind of byte stream as a file. With this in mind, I created the styxfile.New function, which takes an interface{} and picks a wrapper type to fill in missing functionality:
func New(rwc interface{}) (Interface, error) {
switch rwc := rwc.(type) {
case Interface:
return rwc, nil
case interfaceWithoutClose:
return nopCloser{rwc}, nil
case io.Seeker:
return &seekerAt{rwc: rwc}, nil
case io.Reader:
return &dumbPipe{rwc: rwc}, nil
case io.Writer:
return &dumbPipe{rwc: rwc}, nil
}
return nil, fmt.Errorf(
"Cannot convert %T to styxfile.Interface", rwc)
}
For types that implement Seek, we can implement ReaderAt and WriterAt by seeking to the offset before calling Read or Write. This must be protected by a mutex.
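Here is a sketch of what such a seeker-based wrapper might look like; it is illustrative, reusing the package's ErrNotSupported value, and the real styxfile implementation may differ:
// Serialize Seek+Read pairs behind a mutex so concurrent requests
// cannot interleave and corrupt the file offset.
type seekerAt struct {
	rwc io.Seeker
	sync.Mutex
}

func (s *seekerAt) ReadAt(p []byte, offset int64) (int, error) {
	r, ok := s.rwc.(io.Reader)
	if !ok {
		return 0, ErrNotSupported
	}
	s.Lock()
	defer s.Unlock()
	// Seek to the client-provided offset before reading.
	if _, err := s.rwc.Seek(offset, io.SeekStart); err != nil {
		return 0, err
	}
	return r.Read(p)
}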
For types that only allow Read or Write, we can track the current offset in the stream and return an error if a client attempts to write or read elsewhere:
type dumbPipe struct {
rwc interface{}
offset int64
sync.Mutex
}
func (dp *dumbPipe) ReadAt(p []byte, offset int64) (int, error) {
r, ok := dp.rwc.(io.Reader)
if !ok {
return 0, ErrNotSupported
}
dp.Lock()
defer dp.Unlock()
if dp.offset != offset {
return 0, ErrNoSeek
}
n, err := io.ReadFull(r, p)
dp.offset += int64(n)
return n, err
}
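The write half mirrors ReadAt; a sketch:
func (dp *dumbPipe) WriteAt(p []byte, offset int64) (int, error) {
	w, ok := dp.rwc.(io.Writer)
	if !ok {
		return 0, ErrNotSupported
	}
	dp.Lock()
	defer dp.Unlock()
	// Writes must arrive in order; anything else is a seek, which
	// a plain byte stream cannot honor.
	if dp.offset != offset {
		return 0, ErrNoSeek
	}
	n, err := w.Write(p)
	dp.offset += int64(n)
	return n, err
}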
By limiting the amount of tedious work server code has to do, we also reduce the chance of mistakes; it is not entirely trivial to implement ReadAt and WriteAt, and it is better to do it once than force users to repeat it over and over again.
Directory listings
In 9P, directories are not special; they are simply files full of Stat structures, which are documented here. A client can list the contents of a directory by reading it, as if it were any other file.
We could stipulate that server programs must marshal the contents of directories into styxproto.Stat structures. But this is usually too much work. Instead, we can lift the Readdir method from *os.File and use it to define a new interface:
type Directory interface {
Readdir(n int) ([]os.FileInfo, error)
}
Then, we provide the styxfile.NewDir function that turns any type that meets the above interface into a styxfile.Interface, automatically translating os.FileInfo values into styxproto.Stat structures. Here are the important bits of that wrapper type:
type dirReader struct {
Directory
offset int64
sync.Mutex
pool *qidpool.Pool
path string
}
func (d *dirReader) ReadAt(p []byte, offset int64) (int, error) {
d.Lock()
defer d.Unlock()
if offset != d.offset {
return 0, ErrNoSeek
}
nstats := len(p) / styxproto.MaxStatLen
if nstats == 0 {
return 0, ErrSmallRead
}
fi, err := d.Readdir(nstats)
n, marshalErr := marshalStats(p, fi, d.path, d.pool)
d.offset += int64(n)
if marshalErr != nil {
return n, marshalErr
}
return n, err
}
Translating an os.FileInfo value into a styxproto.Stat structure can be somewhat OS-specific when dealing with real files, especially when determining ownership of a file. 9P uses string identifiers for user and group names, not the uid/gid numbers that most people are used to. While I think this was a good design choice, having had to deal with several id/name mismatches on NFSv3 exports in my career, it also means we have to resolve the UID/GID of the syscall.Stat_t structure returned by FileInfo.Sys() for real files on unix systems. The sys package contains the non-portable code required to determine ownership of a file.
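On unix systems, that resolution might look something like the following sketch; fileOwner is a made-up helper, and error handling is simplified:
import (
	"os"
	"os/user"
	"strconv"
	"syscall"
)

// fileOwner returns the owner's user name for a real file, falling
// back to the numeric uid if it cannot be resolved. Unix-only.
func fileOwner(fi os.FileInfo) string {
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		return "none"
	}
	uid := strconv.FormatUint(uint64(st.Uid), 10)
	if u, err := user.LookupId(uid); err == nil {
		return u.Username
	}
	return uid
}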
Authentication
The 9P protocol does not prescribe an authentication method. Instead, client and server communicate by reading from and writing to a special file. The handle to this file is established with the auth transaction:
auth(afid, uname, aname) -> qid
Client and server may then carry out nearly any authentication protocol. This helps 9P stay current, as it can adopt new authentication methods as they arise. Once the authentication protocol is complete, a client can use the afid as an authentication token in subsequent attach transactions.
To implement authentication, a server must provide the following function:
type AuthFunc func(rwc io.ReadWriteCloser, user, aname string) error
From the server side, a read on rwc blocks until a Twrite request comes from the client, and a write on rwc blocks until a Tread request comes, then the data is passed to the client. In this way, we hide the details of the marshalling and processing of 9P messages from the authentication function, and in effect give it a private bi-directional channel with the client that is tunnelled over 9P. This was surprisingly easy to implement using Go's net.Pipe function. The implementation is here. The styxauth package provides a few authentication functions. Note that the actual styx.AuthFunc has access to the underlying network connection, allowing it to use transport-based authentication, as demonstrated by the SocketPeerID and TLSSubjectCN functions.
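As an illustration, here is a toy AuthFunc that expects the client to write a shared secret over the tunnelled channel. It is insecure and purely hypothetical:
// Toy example: not a real authentication protocol.
func passwordAuth(rwc io.ReadWriteCloser, user, aname string) error {
	buf := make([]byte, 64)
	n, err := rwc.Read(buf)
	if err != nil {
		return err
	}
	if string(bytes.TrimSpace(buf[:n])) != "opensesame" {
		return errors.New("authentication failed")
	}
	return nil
}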
Tracing 9P requests
Being able to see the messages as they come in and go out is invaluable when troubleshooting errors. When testing the styx package, I found I had made a number of incorrect assumptions or misread the documentation in a few key places. It would have taken me ages to find the problems if I could not see the messages coming in to the server and going out.
I wrote the internal tracing package for this purpose. It provides wrappers for styxproto.Decoder and styxproto.Encoder that allow us to peek at messages as they pass through. Because of our choice not to unmarshal messages, but leave them as they are, this was surprisingly easy and, at a glance, should not incur an unreasonable performance penalty. I added a Write function to the styxproto package:
func Write(w io.Writer, m Msg) (written int64, err error) {
n, err := w.Write(m.bytes())
if r, ok := m.(io.Reader); ok {
written, err = io.Copy(w, r)
return written + int64(n), err
}
return int64(n), err
}
Then, implementing tracing was simply a matter of stacking encoders/decoders together with io.Pipe():
func Decoder(r io.Reader, fn TraceFn) *styxproto.Decoder {
rd, wr := io.Pipe()
decoderInput := styxproto.NewDecoderSize(r, 8*kilobyte)
decoderTrace := styxproto.NewDecoderSize(rd, 8*kilobyte)
go func() {
for decoderInput.Next() {
for _, m := range decoderInput.Messages() {
fn(m)
styxproto.Write(wr, m)
}
}
wr.Close()
}()
return decoderTrace
}
func Encoder(w io.Writer, fn TraceFn) *styxproto.Encoder {
rd, wr := io.Pipe()
encoder := styxproto.NewEncoder(wr)
decoder := styxproto.NewDecoderSize(rd, 8*kilobyte)
go func() {
for decoder.Next() {
for _, m := range decoder.Messages() {
fn(m)
styxproto.Write(w, m)
}
}
}()
return encoder
}
In the styx package, tracing is accessed by setting the TraceLog member on the Server structure. The output looks like this:
→ 65535 Tversion msize=8192 version="9P2000"
← 65535 Rversion msize=8192 version="9P2000"
→ 000 Tattach fid=1 afid=NOFID uname="droyo" aname=""
← 000 Rattach qid="type=128 ver=0 path=1"
→ 000 Twalk fid=1 newfid=2 "apiVersion"
← 000 Rwalk wqid="type=0 ver=0 path=2"
→ 000 Topen fid=2 mode=0
← 000 Ropen qid="type=0 ver=0 path=2" iounit=0
→ 000 Tread fid=2 offset=0 count=8168
← 000 Rread count=3
→ 000 Tread fid=2 offset=3 count=8168
← 000 Rread count=0
Interfacing with user code
So far, we have covered a lot of plumbing that shuffles messages along to their appropriate handlers. We are able to handle many bookkeeping transactions like flush, clunk, and attach without asking the user code for help. However, when it comes to the important stuff, file I/O and directory walking, we need the user's code to weigh in. Exactly how we do that will be the topic of the next post in this series, as this post has already gotten too long. We will cover the server API of the styx package using a toy server, jsonfs.