XML Schema and Go
Like it or hate it, XML is a reality that many of us have to deal
with on a daily basis at the workplace. The encoding/xml package
in Go's standard library provides a convenient, data-driven approach
to parsing XML documents that is sufficient for most use
cases. When you are dealing with a massive API, however, it quickly
grows tedious translating XML structures to Go types.
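As a reminder of what that data-driven approach looks like, here is a minimal sketch using only the standard library. The Device type, its field names, and the sample document are made up for illustration; they are not taken from any real API.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Device is a hypothetical, hand-written type. The struct tags tell
// encoding/xml which element feeds which field.
type Device struct {
	Hostname   string   `xml:"hostname"`
	ID         int      `xml:"id"`
	Interfaces []string `xml:"interfaces>name"` // every <name> inside <interfaces>
}

// parseDevice decodes a single XML document into a Device.
func parseDevice(doc string) (Device, error) {
	var d Device
	err := xml.Unmarshal([]byte(doc), &d)
	return d, err
}

func main() {
	const doc = `<device>
  <hostname>sw01</hostname>
  <id>42</id>
  <interfaces><name>eth0</name><name>eth1</name></interfaces>
</device>`

	d, err := parseDevice(doc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", d)
}
```

Writing one such type by hand is painless. Writing a few hundred of them from a vendor's schema is where the tedium starts.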
Most large SOAP-based web services provide a formal description of the XML structures they use in the form of XML Schema. The XML Schema standard is very large, and its two-part specification is written in highly abstract, difficult language. I find it amusing that both the specification and XML Schema documents themselves are full of boilerplate.
Languages with strong support for XML-based services, such as C# and Java, have very rich code-generation tools that let you generate source code for working with the XML elements described in an XML Schema. The xsdgen package is my attempt to add Go to that list. With the go generate feature, added in Go 1.4, code generation is easier than ever.
Generating Go types from XML Schema
We have IPAM software at my workplace that provides a SOAP API. It has the following schema (anonymized) in its WSDL file:
<schema targetNamespace="http://example.com/"
    xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="http://example.com/"
    xmlns="http://www.w3.org/2001/XMLSchema">
  <complexType name="WSDevice">
    <sequence>
      <element name="addressType" nillable="true" type="soapenc:string"/>
      <element name="description" nillable="true" type="soapenc:string"/>
      <element name="deviceType" nillable="true" type="soapenc:string"/>
      <element name="domainName" nillable="true" type="soapenc:string"/>
      <element name="hostname" nillable="true" type="soapenc:string"/>
      <element name="id" nillable="true" type="soapenc:int"/>
      <element maxOccurs="unbounded" name="interfaces" nillable="true" type="tns:WSInterface"/>
      <element name="ipAddress" nillable="true" type="soapenc:string"/>
    </sequence>
  </complexType>
  <complexType name="WSInterface">
    <sequence>
      <element name="id" nillable="true" type="soapenc:int"/>
      <element name="ipAddress" nillable="true" type="tns:ArrayOf_soapenc_string"/>
      <element name="macAddress" nillable="true" type="soapenc:string"/>
      <element name="name" nillable="true" type="soapenc:string"/>
      <element name="sequence" nillable="true" type="soapenc:int"/>
      <element name="virtual" nillable="true" type="soapenc:boolean"/>
    </sequence>
  </complexType>
  <complexType name="ArrayOf_soapenc_string">
    <complexContent>
      <restriction base="soapenc:Array">
        <attribute ref="soapenc:arrayType" wsdl:arrayType="soapenc:string[]"/>
      </restriction>
    </complexContent>
  </complexType>
</schema>
The xsdgen command is suitable for use with go generate. In
my workspace, I saved the WSDL file as "schema.xml" and created the
file gen.go with the following lines:
package ipam
//go:generate xsdgen -ns http://example.com/ -pkg ipam schema.xml
Running "go generate" produces the file "xsdgen_output.go":
package ipam

import "encoding/xml"

type ArrayOfsoapencstring []string

func (a *ArrayOfsoapencstring) MarshalXML(e *xml.Encoder, start xml.StartElement) error {
	tag := xml.StartElement{Name: xml.Name{"", "item"}}
	for _, elt := range *a {
		if err := e.EncodeElement(elt, tag); err != nil {
			return err
		}
	}
	return nil
}

func (a *ArrayOfsoapencstring) UnmarshalXML(d *xml.Decoder, start xml.StartElement) (err error) {
	var tok xml.Token
	var itemTag = xml.Name{"", ",any"}
	for tok, err = d.Token(); err == nil; tok, err = d.Token() {
		if tok, ok := tok.(xml.StartElement); ok {
			var item string
			if itemTag.Local != ",any" && itemTag != tok.Name {
				err = d.Skip()
				continue
			}
			if err = d.DecodeElement(&item, &tok); err == nil {
				*a = append(*a, item)
			}
		}
		if _, ok := tok.(xml.EndElement); ok {
			break
		}
	}
	return err
}

type WSDevice struct {
	AddressType string        `xml:"http://example.com/ addressType"`
	Description string        `xml:"http://example.com/ description"`
	DeviceType  string        `xml:"http://example.com/ deviceType"`
	DomainName  string        `xml:"http://example.com/ domainName"`
	Hostname    string        `xml:"http://example.com/ hostname"`
	Id          int           `xml:"http://example.com/ id"`
	Interfaces  []WSInterface `xml:"http://example.com/ interfaces"`
	IpAddress   string        `xml:"http://example.com/ ipAddress"`
}

type WSInterface struct {
	Id         int                  `xml:"http://example.com/ id"`
	IpAddress  ArrayOfsoapencstring `xml:"http://example.com/ ipAddress"`
	MacAddress string               `xml:"http://example.com/ macAddress"`
	Name       string               `xml:"http://example.com/ name"`
	Sequence   int                  `xml:"http://example.com/ sequence"`
	Virtual    bool                 `xml:"http://example.com/ virtual"`
}
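To show how namespace-qualified struct tags like these behave when decoding, here is a small sketch. The Interface type is a trimmed, hand-copied subset of the generated WSInterface (the slice-valued ipAddress field is left out for brevity), and the sample document is a made-up fragment, not a real response from the IPAM API.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Interface is a trimmed copy of the generated WSInterface type.
// Each tag has the form "namespace local-name".
type Interface struct {
	Id         int    `xml:"http://example.com/ id"`
	MacAddress string `xml:"http://example.com/ macAddress"`
	Name       string `xml:"http://example.com/ name"`
	Virtual    bool   `xml:"http://example.com/ virtual"`
}

// decodeInterface decodes one document into an Interface.
func decodeInterface(doc string) (Interface, error) {
	var iface Interface
	err := xml.Unmarshal([]byte(doc), &iface)
	return iface, err
}

func main() {
	// Hypothetical fragment. The default namespace declaration places
	// every child element in http://example.com/, matching the tags.
	const doc = `<interface xmlns="http://example.com/">
  <id>3</id>
  <macAddress>00:11:22:33:44:55</macAddress>
  <name>eth0</name>
  <virtual>false</virtual>
</interface>`

	iface, err := decodeInterface(doc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", iface)
}
```

Elements that carry the right local name but sit in a different namespace (or none at all) are not matched by these tags, so the corresponding fields keep their zero values.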
I can replace ugly names by modifying my xsdgen command:
//go:generate xsdgen -ns http://example.com/ -r "^WS -> " -r "ArrayOf_soapenc_string -> Strings" -pkg ipam schema.xml
This will produce types named Device, Strings, and Interface. Note
that while the replacement supports regular expressions and
subexpression substitution, the "go generate" command expands
environment variables in the directive, so it clobbers
subexpression references such as $1.
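To make the "pattern -> replacement" notation concrete, here is a rough sketch of how such a rule can be applied to a type name with the standard regexp package. The applyRule helper is hypothetical and written only for this illustration; it is not how xsdgen itself parses or applies the -r flag.

```go
package main

import (
	"fmt"
	"log"
	"regexp"
	"strings"
)

// applyRule applies one rule of the form "pattern -> replacement" to a
// type name. It is an illustrative helper, not part of xsdgen.
func applyRule(rule, name string) (string, error) {
	pattern, repl, ok := strings.Cut(rule, "->")
	if !ok {
		return "", fmt.Errorf("rule %q has no \"->\" separator", rule)
	}
	re, err := regexp.Compile(strings.TrimSpace(pattern))
	if err != nil {
		return "", err
	}
	// ReplaceAllString expands $1, $2, ... in the replacement text.
	return re.ReplaceAllString(name, strings.TrimSpace(repl)), nil
}

func main() {
	rules := []string{"^WS -> ", "ArrayOf_soapenc_string -> Strings"}
	for _, name := range []string{"WSDevice", "WSInterface", "ArrayOf_soapenc_string"} {
		out := name
		for _, rule := range rules {
			var err error
			if out, err = applyRule(rule, out); err != nil {
				log.Fatal(err)
			}
		}
		fmt.Printf("%s -> %s\n", name, out)
	}
}
```

A rule such as "^WS(.+)$ -> IPAM$1" works when called from Go code like this, which is exactly the kind of rule that does not survive a go:generate directive.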
The xsdgen package respects XML namespaces and inheritance; it knows
that a soapenc:string is derived from an xsd:string, for instance.
Rather than preserving this hierarchy in the generated Go source,
the xsdgen package "squashes" all inheritance, and tries to minimize
the levels of indirection between any given type and the builtin
types defined in the XML Schema specification. This is done to
reduce the amount of code generated and provide a more pleasant
experience for the user of the generated library. While writing
these packages I became acutely aware of how heavily XML Schema was
influenced by inheritance-ridden OOP languages such as Java.
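The idea of squashing can be sketched in a few lines: follow the chain of base types until a builtin is reached, and use the Go type for that builtin directly. The tables and the squash function below are assumptions made for this illustration (including the made-up tns:hostname type); the real package works on a much richer model of the schema.

```go
package main

import "fmt"

// base records, for each derived XML type, the type it derives from.
// The entries are illustrative, not read from a real schema.
var base = map[string]string{
	"tns:hostname":    "soapenc:string", // hypothetical user-defined type
	"soapenc:string":  "xsd:string",
	"soapenc:int":     "xsd:int",
	"soapenc:boolean": "xsd:boolean",
}

// builtin maps a few XML Schema builtin types to Go types, mirroring
// the field types seen in the generated output above.
var builtin = map[string]string{
	"xsd:string":  "string",
	"xsd:int":     "int",
	"xsd:boolean": "bool",
}

// squash walks the derivation chain starting at name and returns the
// Go type of the first builtin it reaches. The step limit guards
// against a cycle in the base table.
func squash(name string) (string, bool) {
	for steps := 0; steps <= len(base); steps++ {
		if goType, ok := builtin[name]; ok {
			return goType, true
		}
		parent, ok := base[name]
		if !ok {
			return "", false
		}
		name = parent
	}
	return "", false
}

func main() {
	for _, name := range []string{"tns:hostname", "soapenc:int", "soapenc:boolean"} {
		goType, ok := squash(name)
		fmt.Println(name, goType, ok)
	}
}
```

With this approach a field of type tns:hostname is emitted as a plain Go string rather than as a chain of named wrapper types.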
Customizing the behavior of xsdgen
You may need to customize the code generation process more than the command-line flags to xsdgen allow. For instance, say that you do not care about the "sequence" or "virtual" elements defined in the schema above.
Create the file _gencfg/cfg.go. The name is not important. I prefix
the directory with an underscore so that commands such as go build ./...
ignore it. The file contains something like this:
package main

import (
	"log"
	"os"

	"aqwari.net/xml/xsdgen"
)

func main() {
	var cfg xsdgen.Config

	cfg.Option(xsdgen.DefaultOptions...)
	cfg.Option(
		xsdgen.LogOutput(log.New(os.Stderr, "", 0)),
		xsdgen.IgnoreElements("virtual", "sequence"))
	if err := cfg.GenCLI(os.Args[1:]...); err != nil {
		log.Fatal(err)
	}
}
The full set of Options available can be found in the documentation
for the xsdgen package. Some Options are pretty advanced,
providing a shim for manipulating types and Go syntax trees with
arbitrary code. Once this file is created, update the gen.go file:
//go:generate go run _gencfg/cfg.go -ns http://example.com/ -r "^WS ->" -r "ArrayOfsoapencstring -> Strings" -pkg ipam schema.xml
The declaration of Interface then becomes
type Interface struct {
	Id         int     `xml:"http://example.com/ id"`
	IpAddress  Strings `xml:"http://example.com/ ipAddress"`
	MacAddress string  `xml:"http://example.com/ macAddress"`
	Name       string  `xml:"http://example.com/ name"`
}
This was my first time using the go/ast package in the Go standard
library. I recommend that anyone doing non-trivial code generation
look at using go/ast instead of text/template; being able to
manipulate expressions as data structures is very powerful. For instance,
a SOAP array is naively mapped to the structure
type Array struct {
	Items []T `xml:",any"`
}
As a post-processing step, the xsdgen package looks for any structures that contain a single slice element, and changes the type expression to
type Array []T
Because it uses an *ast.StructType instead of opaque text, the code
can reach in and access information such as struct tags for use in
marshal/unmarshal methods.
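A post-processing step of this kind can be sketched with the standard go/parser, go/ast and go/format packages. The flattenSliceStructs function below is a simplified illustration written for this post, not the code xsdgen actually runs: it parses source text, finds struct types whose only field is a slice, and replaces the struct with the slice type.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
	"log"
)

// flattenSliceStructs rewrites every named struct type that has exactly
// one field, of slice type, into a plain slice type.
func flattenSliceStructs(src string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return "", err
	}
	ast.Inspect(file, func(n ast.Node) bool {
		spec, ok := n.(*ast.TypeSpec)
		if !ok {
			return true
		}
		st, ok := spec.Type.(*ast.StructType)
		if !ok || len(st.Fields.List) != 1 {
			return true
		}
		field := st.Fields.List[0]
		// A slice is an *ast.ArrayType with no length expression.
		// field.Tag still holds the struct tag at this point, so a
		// fuller version could use it before discarding the struct.
		if arr, ok := field.Type.(*ast.ArrayType); ok && arr.Len == nil && len(field.Names) == 1 {
			spec.Type = arr
		}
		return true
	})
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	const src = "package p\n\ntype Array struct {\n\tItems []T `xml:\",any\"`\n}\n"
	out, err := flattenSliceStructs(src)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```

Doing the same thing with text/template or string manipulation would mean re-parsing the generated text by hand; here the struct, its field, and its tag are ordinary values that can be inspected and replaced.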
Making it better
The code for the xsdgen and related packages is on GitHub.