API v3 of the yaml package for Go is available

Gustavo Niemeyer

on 5 April 2019

Tags: golang, yaml

API v3 of the yaml package for Go is out, and it brings comment handling, intermediate node representations, and much more.

The initial sketch for v3 of the yaml package for Go was drafted almost exactly a year ago, at the end of March 2018. If this package doesn’t sound familiar, it’s the underlying code that reads and writes configuration files for many of the popular Go projects that we benefit from, including snaps, Kubernetes, juju, and many others.

The package was born in late 2010, and its v2 API was stabilized by 2014, so between then and now there were about 4 years of small paper cuts to address in its behavior. But paper cuts are rarely a good motivation to introduce a breaking change that forces people to perform manual actions.

This time, what finally broke the stability barrier was the desire for an intermediate representation that allows YAML to be understood and described in a structure closer to the text format. This unlocks a number of interesting features that were long-standing requests in the issue tracker, such as the ability to manipulate comments, line and column locations for data, better handling of style, encoding and decoding of anchors and aliases, and so on.

Although fairly distinct features, there’s a common theme uniting these: automated manipulation. API v3 offers a better ability to read and write documents while preserving much of the original form of the text. But there’s much more packaged into v3 as well. Here are the key changes to be aware of between v2 and v3:

The new Node type

While YAML documents are typically encoded and decoded into higher level types, such as structs and maps, Node is an intermediate representation that allows detailed control over the content being decoded or encoded.

Values that use the new type interact with the yaml package in the typical way. For example, this would decode a document with personal details into a struct:

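type Person struct {
    Name    string
    Address yaml.Node // kept as a Node so it can be inspected or decoded later
}

// data holds the YAML document as a []byte.
var person Person
err := yaml.Unmarshal(data, &person)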

This should work as expected. That Address field, though, will have a very detailed view into the YAML text decoded for that particular field, including line and column numbers, style, and even comments surrounding the node. The Node.Decode method then allows lazily decoding that individual part of the document further into any other type.

When it’s time to encode this type back into YAML text, the behavior should also be unsurprising. The Node data is taken into account and preserved to a large degree.

This feature alone unlocks many of the following changes.

Comment decoding and encoding!

This was probably the most requested feature since the package was conceived. The goal is clear: YAML files are handled by humans, and humans need to document things to keep their sanity. If we can’t preserve comments, we can’t modify the files without destroying them as far as people are concerned.

But just reading and writing comments isn’t enough. Most parsers can easily do that by just generating a new token at the appropriate time. The problem to solve is making it comfortable to manipulate the comments and the association between them and the values they are documenting.

With v3, we can now achieve that with three fields in the new Node type:

type Node struct {
        // ... other fields omitted ...
        HeadComment string
        LineComment string
        FootComment string
        // ... other fields omitted ...
}

These fields contain the comments immediately preceding a value, on the same line as the value, and immediately following it. Line breaks are taken into account when deciding whether to associate a comment with the preceding value or the following one.

The way this works means values may be moved around or removed and their comments will go along with them, and values may be easily documented directly in their respective Node.

Support for anchors and aliases

Anchor and alias manipulation is supported via the new Node API as well, which has fields that allow fine-grained control over them.

The Unmarshaler interface now uses Node

The Node API is a much more powerful way to do customized decoding, so types implementing the new yaml.Unmarshaler interface will take a Node as a parameter, which allows them to introspect the value at will, and then lazily decode the value further if desired, via the Node.Decode method.

With that said, we’re preserving backwards compatibility with v2-style unmarshalers. If a type implements the old method, it will be called.

MapSlice is gone

MapSlice was a simplistic way to customize ordering in mappings that would otherwise be sorted. Node gives us that and much more, so MapSlice is gone from the API altogether.

Decoded values are not zeroed anymore

Up to v2, the yaml package would zero out the value being decoded into so that decoding inside iterations is safe from left over data from previous iterations. As a side effect, though, it becomes harder to deal with types that are pre-initialized for good reasons.

So in v3 this behavior has changed, which means pre-existing data will be left alone on any fields that were not decoded into. The resulting value is then a combination of the previous data and the successfully decoded fields.

When porting, particular care needs to be taken in the aforementioned cases: loops that decode into the same value repeatedly. If the variable decoded into is declared in a scope outside the loop block, later iterations will observe data decoded in previous iterations. To fix that, move the variable declaration inside the loop.

Booleans are now from YAML 1.2, mostly

In YAML 1.2 the yes/no and on/off booleans are gone. These are just strings, and all that remains as typed booleans are the classical true/false.

To preserve some compatibility with YAML 1.1, the old strings will still decode correctly into a typed boolean value (a Go bool). But when the target type leaves it free to choose, as with interface{}, the package will unmarshal them as strings.

String keyed maps are easier

Improving the situation here was another frequent request, but nothing could happen without breaking compatibility, so it needed to wait.

The perceived problem was that YAML supports non-string keys in its mappings, which the Go yaml package handles as map[interface{}]interface{} when given a choice, but developers don’t like using that very often.

To be clear, the Go package always supported decoding into map[string]interface{} (or map[string]anything really), and that continues to work well, but if there is a map inside a map, that second map would be decoded in the most general form for lack of typing information.

This has now changed. When v3 finds a map where all the keys are strings, which is the most common scenario by far, it will automatically decode it into a map[string]interface{} type.

Hopefully this change will make the most common cases easier, while still respecting the ability to have maps with non-string keys.

Custom indentation via Encoder.SetIndent

We can now change the default indentation. To be fair, this was pretty easy to do internally before as well, but there was a known mishandling of the indentation in some cases that needed to be addressed. This is now fixed, and we can custom-indent at will.

Default indentation is now 4 spaces

Now that we have that problem addressed, the default has also changed to something more reasonable. Unless customized, everything is encoded with 4 space indents now.

time.Duration will reject plain ints

We’ve supported nicely formatted duration decoding for a long time now, but unfortunately the original logic allowed people to use naked ints too, which meant nanoseconds. That was almost always a mistake, but we couldn’t fix the bug without breaking code that might depend on the feature.

Now this is gone. Durations are strings.

Unique keys by default

The specification says that mapping keys are unique, but unless the strict flag was specified the decoding was lenient. Again, backwards compatibility. But no more: keys now need to be unique.

Strict mode is now Decoder.KnownFields

The general “strict” term is gone from the API, and has been replaced by the more specific Decoder.KnownFields method which forces the decoder to ensure that any mapping keys being decoded from the YAML text are present in the struct being decoded into.

The other behavior change enabled by the strict mode was forcing unique keys, which is now the default.

Octals from YAML 1.2 are supported

But we’re not dropping the YAML 1.1 ones. In fact, they are still the default. The new octals from YAML 1.2 look like 0o777, which is cute, but pretty much 100% of the code out there still uses the old school 0777. Not only that but the code handling them doesn’t recognize the new format. So we won’t change that behavior in v3. This is something to reevaluate for v4, in 10 years (?). Meanwhile, we support both the new format and the old one, and encode in the old one.

Bug fixes and improvements

There were other minor bug fixes and small features that are not worth calling out here, but it may be worth looking into v3 if you are unhappy with something in v2.

Happy hacking!

(Gopher Artwork by Ashley McNamara)
