
New tricks for old MUDs: using JSON for flexible player data

In the beginning (1990), the creators of Diku had many programming challenges.  One of the very least interesting of these was how player data would be written to and read from permanent disk storage.  This was before the days of structured data formats and object-oriented databases – some subset of Staerfeldt, Madsen, Nyboe, Seifert, and Hammer had to roll their own format.  Sometimes I wonder if they thought that their work would still be around nearly 30 years later.

As it turns out, the original method they came up with for storing players was a little terrible.  Diku stored all players in one binary file, and all of their possessions in another binary file.  Altering either of these would corrupt your entire player base. A year later Merc 1.0 would take a big leap to fix this, by storing players (with their stuff) in individual binary files.  You still couldn’t modify them unless you were handy with a hex editor, but at least only one player could be messed up at a time.

The Almighty pFile

It was Merc 2.0 (1992) that would introduce the format that most of the Merc derivatives still use: the human readable “pfile.”  For the first time, game administrators could search and modify the files as needed.  Here’s an example, straight from a fresh save on Merc:
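
The sketch below is abridged and uses illustrative values; the exact key names vary a little between Merc derivatives.

    #PLAYER
    Name         Montrey~
    Description  ~
    Sex          1
    Class        3
    Race         1
    Level        5
    HpManaMove   20 20 100 100 100 100
    End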

What we see here are strings terminated by a tilde character, simple integers for Sex, Class, Race, and Level, and an array of integers for HpManaMove.  The reading behavior is defined for each key – whether to expect a string, int, array, or something else.

This was a huge improvement over previous games – and not just because the admin could tinker with it.  Merc 2 introduced treating the file as a key-value store, where reading the key (such as “Name”) could be mapped to the correct location to store the value (“Montrey”).  Keys could come in any order as well, to prevent additions or removals from making files unreadable.  This was accomplished with a nifty macro to hide a huge ugly bunch of “if” conditions:
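
The macro in question looks essentially like this in Merc-derived save.c files; the usage line underneath is a representative example rather than a verbatim excerpt:

    #define KEY( literal, field, value )                \
                    if ( !str_cmp( word, literal ) )    \
                    {                                   \
                        field  = value;                 \
                        fMatch = TRUE;                  \
                        break;                          \
                    }

    /* inside fread_char()'s switch on the first letter of the key */
    case 'L':
        KEY( "Level", ch->level, fread_number( fp ) );
        break;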

The inflexible pFile

If you’ve ever worked on a Merc derivative (like ROM), you’ve seen all this in save.c.  It works well, to an extent.  Why would anyone want to change it?

Rather than list all the abstract reasons why a portable, standardized structured file format is better than the Merc standard, I’ll describe the use case that led me to the decision to change.  We at Legacy are planning an equipment enhancement system based on placing “gem” objects in a piece of equipment to add stat boosts.  Simple enough, I thought.  I’ll just make any piece of equipment able to contain a list of “gems”, add up and cache their effects, and done.  It was all clear-cut… until I faced save.c.

In Merc muds, objects (of the container type) can hold a list of other objects.  When the game saves the player’s inventory, it writes the first object, followed by the first thing (if any) that it contains, recording a nest level for each object so that containment can be reconstructed when the file is read back.  (Actually, because of how singly linked lists work, it uses a recursive function to write each list in reverse order, so that it can be loaded in the correct order without having to iterate to the tail of the list.  But that’s not important.)  The pfile ends up looking like this:
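
For example, a backpack holding a potion might come out something like this (vnums invented, most keys trimmed for brevity):

    #O
    Vnum 3010
    Nest 0
    Name a backpack~
    End

    #O
    Vnum 3011
    Nest 1
    Name a potion~
    End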

So how to add a new savable list of objects that can be “contained” in another object? What if you put gems in a bag?  How about a bag encrusted with gems?  Hrm.  You could write one list after another… but then you’d have to remember the nest level for “gems” and for “contents.”  You could come up with all kinds of convoluted ways that would make the code harder to read, harder to maintain, and easier to break.  The limitations of the pfile format were starting to become obvious.  The format was meant to store sets of values, but only barely supports nested objects.  I wanted a more elegant solution.

Enter JSON

JSON (JavaScript Object Notation) is a fairly modern structured storage format, similar to XML without all the <> nonsense.  A JSON file can store 3 types of things:

  • values (strings, integers, floats, etc)
  • lists (of values, like a C array), separated by commas, enclosed within square brackets ([])
  • maps (a set of named values, like a C struct), with key : value pairs separated by commas and enclosed within braces ({})

Most importantly for my purposes, these can be nested – you can store, say, an array of maps.  Some examples can be found at json.org, but it might be easier to just show a Character expressed in JSON:
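
Something like the following (the field values are illustrative):

    {
        "Name": "Montrey",
        "Description": "A grizzled old administrator.",
        "Sex": 1,
        "Class": 3,
        "Race": 1,
        "Level": 60,
        "HpManaMove": [100, 100, 100, 100, 100, 100],
        "Inventory": [
            {
                "Vnum": 3010,
                "Name": "a backpack",
                "Contents": [
                    { "Vnum": 3011, "Name": "a potion" }
                ],
                "Gems": [
                    { "Vnum": 3020, "Name": "a small ruby" }
                ]
            }
        ]
    }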

The root of the JSON file is a map, with named values contained in it.  Name, Description, Sex, Class, Race, and Level are string or integer values.  HpManaMove is a list of integer values.  Inventory is a nested list of maps, one for each top-level inventory object, with Contents and Gems as optional lists of further objects.  Obviously, you probably don’t want to write complex JSON files by hand, but it is still pretty straightforward if you need to occasionally grep, sed, or fix something in a text editor.

Serializing MUD objects with cJSON

So how do we make it happen?  I won’t lie, it did take a while to convert Legacy to use JSON for pfiles, but once I muddled my way through it, it turned out to be pretty simple.  I decided to use the cJSON project, because it is lightweight (just a single include and source file) and has the reasonably simple semantics I wanted.  Here’s a shortened example of writing a character to file:
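
A shortened sketch: the function and helper names (save_char_json, jwrite_char, jwrite_obj_list) are mine, and the field names assume a typical ROM-style codebase with error handling omitted.

    #include <stdio.h>
    #include <stdlib.h>
    #include "merc.h"      /* CHAR_DATA, OBJ_DATA, str_dup(), etc. */
    #include "cJSON.h"

    cJSON *jwrite_obj_list( OBJ_DATA *list );          /* sketched below */
    void   jwrite_char    ( cJSON *json, CHAR_DATA *ch );

    void save_char_json( CHAR_DATA *ch, const char *filename )
    {
        cJSON *root = cJSON_CreateObject();
        char *text;
        FILE *fp;

        jwrite_char( root, ch );                       /* character attributes */
        cJSON_AddItemToObject( root, "Inventory",
            jwrite_obj_list( ch->carrying ) );         /* carried objects      */

        /* let cJSON render the whole tree as text, then dump it to disk */
        text = cJSON_Print( root );

        if ( ( fp = fopen( filename, "w" ) ) != NULL )
        {
            fputs( text, fp );
            fclose( fp );
        }

        free( text );
        cJSON_Delete( root );
    }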

In this code, we simply create a root object (which is a JSON map) and add some named objects to it.  Then we let cJSON convert the whole thing to a string and write it to the file.  cJSON takes care of the parsing for us too, so at the high level, reading a pfile is essentially the reverse:
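
Continuing in the same file, a minimal reader might look like this (jread_char and jread_inventory are my own names, sketched further down):

    void jread_char     ( CHAR_DATA *ch, cJSON *json );
    void jread_inventory( CHAR_DATA *ch, cJSON *array, OBJ_DATA *to_obj );

    void load_char_json( CHAR_DATA *ch, const char *filename )
    {
        FILE *fp;
        long size;
        char *text;
        cJSON *root;

        if ( ( fp = fopen( filename, "r" ) ) == NULL )
            return;

        /* slurp the whole file into a buffer for cJSON_Parse() */
        fseek( fp, 0, SEEK_END );
        size = ftell( fp );
        fseek( fp, 0, SEEK_SET );
        text = malloc( size + 1 );
        if ( fread( text, 1, size, fp ) != (size_t) size )
            size = 0;
        text[size] = '\0';
        fclose( fp );

        /* cJSON does all the parsing; we just walk the resulting tree */
        if ( ( root = cJSON_Parse( text ) ) != NULL )
        {
            jread_char( ch, root );
            jread_inventory( ch, cJSON_GetObjectItem( root, "Inventory" ), NULL );
            cJSON_Delete( root );
        }

        free( text );
    }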

Constructing and iterating through cJSON objects

Of course, the real meat of it is in the fread/write_char and inventory functions.  Here is how the “character” section is written – actually not that different from the default fprintf statements:
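
A hedged sketch of that section; the function name is mine, and the fields assume a stock ROM-style CHAR_DATA:

    /* write the character attributes into the root map */
    void jwrite_char( cJSON *json, CHAR_DATA *ch )
    {
        cJSON *stats;

        cJSON_AddStringToObject( json, "Name",        ch->name );
        cJSON_AddStringToObject( json, "Description", ch->description );
        cJSON_AddNumberToObject( json, "Sex",         ch->sex );
        cJSON_AddNumberToObject( json, "Class",       ch->class );
        cJSON_AddNumberToObject( json, "Race",        ch->race );
        cJSON_AddNumberToObject( json, "Level",       ch->level );

        /* HpManaMove goes in as a single list of integers */
        stats = cJSON_CreateArray();
        cJSON_AddItemToArray( stats, cJSON_CreateNumber( ch->hit ) );
        cJSON_AddItemToArray( stats, cJSON_CreateNumber( ch->max_hit ) );
        cJSON_AddItemToArray( stats, cJSON_CreateNumber( ch->mana ) );
        cJSON_AddItemToArray( stats, cJSON_CreateNumber( ch->max_mana ) );
        cJSON_AddItemToArray( stats, cJSON_CreateNumber( ch->move ) );
        cJSON_AddItemToArray( stats, cJSON_CreateNumber( ch->max_move ) );
        cJSON_AddItemToObject( json, "HpManaMove", stats );
    }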

And next is the corresponding read section. This is a little more involved because of how ROM uses a memory pool for strings. We also use a variety of integer lengths and some strings that convert to bitpacked integers, so I defined some functions to handle the conversions:
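
The helper names below are my own, not necessarily Legacy's; they just wrap cJSON lookups and hand strings back through ROM's str_dup() so they land in the string pool.  A flag-string helper would wrap whatever bit-unpacking routine your codebase already has in the same way.

    /* fetch a string value by key, copying it into the MUD's string space */
    char *jread_string( cJSON *json, const char *key, const char *def )
    {
        cJSON *item = cJSON_GetObjectItem( json, key );

        return str_dup( item != NULL && item->valuestring != NULL
            ? item->valuestring : def );
    }

    /* fetch an integer by key, falling back to a default if it is missing */
    int jread_int( cJSON *json, const char *key, int def )
    {
        cJSON *item = cJSON_GetObjectItem( json, key );

        return item != NULL ? item->valueint : def;
    }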

With that out of the way, reading data isn’t much different from the original Merc 2 way:
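
Reading the character attributes back out then looks something like this (again, ROM-style field names; a real version should also free any strings it is replacing):

    void jread_char( CHAR_DATA *ch, cJSON *json )
    {
        cJSON *stats;

        ch->name        = jread_string( json, "Name", "" );
        ch->description = jread_string( json, "Description", "" );
        ch->sex         = jread_int( json, "Sex", 0 );
        ch->class       = jread_int( json, "Class", 0 );
        ch->race        = jread_int( json, "Race", 0 );
        ch->level       = jread_int( json, "Level", 1 );

        if ( ( stats = cJSON_GetObjectItem( json, "HpManaMove" ) ) != NULL
        &&   cJSON_GetArraySize( stats ) >= 6 )
        {
            ch->hit      = cJSON_GetArrayItem( stats, 0 )->valueint;
            ch->max_hit  = cJSON_GetArrayItem( stats, 1 )->valueint;
            ch->mana     = cJSON_GetArrayItem( stats, 2 )->valueint;
            ch->max_mana = cJSON_GetArrayItem( stats, 3 )->valueint;
            ch->move     = cJSON_GetArrayItem( stats, 4 )->valueint;
            ch->max_move = cJSON_GetArrayItem( stats, 5 )->valueint;
        }
    }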

Wasn’t this about nested objects?

The original problem was that I was having trouble storing different types of nested lists in the Merc format. This turns out to be trivial with JSON, because the structure is already similar to the inventory hierarchy of the player. An object can contain other objects, or other named lists (“gems”, etc) as necessary. We simply create those lists within the cJSON object representing the game object.

What about that spiffy recursion to write the list in reverse order? Turns out we don’t even need it – since cJSON stores children of a cJSON node as a linked list, we can efficiently insert new children at the head (index 0). This way, the list is written backwards, so we can naturally load it forwards. Next up is writing the object itself, which can write its own lists of contents as necessary:
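
A sketch of the two writers (function names are mine; only a couple of object fields are shown):

    cJSON *jwrite_obj( OBJ_DATA *obj );    /* forward declaration */

    /* write a linked list of objects; inserting each new child at index 0
       is cheap for cJSON and leaves the JSON array in reverse order, so it
       loads back in the original order with no extra work                 */
    cJSON *jwrite_obj_list( OBJ_DATA *list )
    {
        cJSON *array = cJSON_CreateArray();
        OBJ_DATA *obj;

        for ( obj = list; obj != NULL; obj = obj->next_content )
            cJSON_InsertItemInArray( array, 0, jwrite_obj( obj ) );

        return array;
    }

    /* write one object, including any nested lists it carries */
    cJSON *jwrite_obj( OBJ_DATA *obj )
    {
        cJSON *json = cJSON_CreateObject();

        cJSON_AddNumberToObject( json, "Vnum", obj->pIndexData->vnum );
        /* wear location, values, affects and so on would be written here */

        if ( obj->contains != NULL )
            cJSON_AddItemToObject( json, "Contents",
                jwrite_obj_list( obj->contains ) );

        /* a "Gems" list would hang off the object in exactly the same way */

        return json;
    }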

We’re in the home stretch! All that’s left to implement is reading those objects back into the game. Here things get a little weird – we check whether each object loaded correctly. What if the object was removed from the game since the last time the player logged on? Rather than blow away its contents, we put them into the player’s inventory.
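
Here is how that check might look; jread_obj() (sketched next) returns NULL when an object's vnum no longer exists, and the loop salvages that object's contents into the player's inventory:

    OBJ_DATA *jread_obj( cJSON *json );    /* sketched below */

    /* walk a JSON object list; to_obj is NULL for the top-level inventory */
    void jread_inventory( CHAR_DATA *ch, cJSON *array, OBJ_DATA *to_obj )
    {
        cJSON *item;
        OBJ_DATA *obj;

        for ( item = array != NULL ? array->child : NULL;
              item != NULL; item = item->next )
        {
            if ( ( obj = jread_obj( item ) ) == NULL )
            {
                /* the object was removed from the game: keep its contents */
                jread_inventory( ch, cJSON_GetObjectItem( item, "Contents" ), NULL );
                continue;
            }

            if ( to_obj != NULL )
                obj_to_obj( obj, to_obj );
            else
                obj_to_char( obj, ch );

            jread_inventory( ch, cJSON_GetObjectItem( item, "Contents" ), obj );
            /* a "Gems" list would be read the same way */
        }
    }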

And finally, the code to load one game object:
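
A minimal version of that loader; ROM's get_obj_index() and create_object() do the heavy lifting, and everything beyond the vnum is left as an exercise:

    /* load a single object from its JSON map; NULL means the vnum is gone */
    OBJ_DATA *jread_obj( cJSON *json )
    {
        OBJ_INDEX_DATA *index;
        OBJ_DATA *obj;
        int vnum = jread_int( json, "Vnum", 0 );

        if ( ( index = get_obj_index( vnum ) ) == NULL )
            return NULL;

        obj = create_object( index, 0 );
        /* read the remaining fields (values, wear location, affects,
           cached gem bonuses) here with the same helpers               */

        return obj;
    }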

What’s the downside?

“Wow!” you might say.  “Why not put all the game’s data in JSON?”

Why not?  JSON is easy to read and easy to write, and players are probably the most complex thing you would ever need it for in a MUD.  Databases have their purposes (we use sqlite3 for some data), but flat files are good for human-accessible things.

I would say that the one thing I would not consider storing in JSON is the area files, unless you are wholly committed to OLC.  JSON is human readable, human changeable… but actually writing complex data files by hand would be tedious at best, and error-prone at worst.

You also might think that performance is an issue, because a JSON file is more verbose than the Merc format.  In the early 1990s, this was probably true.  I may have been a little hard on the developers of Merc in the introduction – they faced resource constraints and performance challenges that only embedded systems programmers think about now.  But in 2017, the extra 20-30% in size of a JSON player file is going to amount to zero measurable difference.

Putting it all together

By now, I’ll bet you’re just dying to crack open your ROM source code for a marathon coding session, chugging Mountain Dew and pretending it’s 1994 again, dreaming up the new features made possible by the slick JSON player files. I tried not to make it too exciting, but it’s exciting stuff after all.

But wait. How do you get those old player files into the new format?

You could implement the new system in two stages, since your game already reads the old files.  Create the writer, and then come up with some way to batch load and save all of your current files.  I did that once to build a MySQL database pfile implementation, and it worked fine.  The parsing, anyway – SQL storage for complex game objects is a terrible idea.  Or, you could take the conversion script I wrote in Python and spend a few minutes adapting it to your particular flavor of pfiles.  I’ve included it in the source linked below.

I hope you’ve enjoyed this article, and for those of you still running old Merc/ROM derivatives, maybe it will be helpful.  You can find a copy of Legacy’s save.c here, and if you’d like to see this and our other projects in action, please visit us at legacy.xenith.org:3000.

Dockered DPDK: packaging Open vSwitch

I recently attended the NFV World Congress in San Jose, and had a great time talking to vendors about their solutions and current trends toward widespread NFV adoption. Intel’s hot new(ish) packet processing framework – the Data Plane Development Kit, or DPDK – was part of the marketing spiel of almost everyone even remotely invested in the NFVI.  The main interest is in the poll mode driver, which dedicates a CPU core to polling devices rather than waiting for interrupts to signal when a packet has arrived.  This has resulted in some amazing packet processing rates, such as a DPDK-accelerated Open vSwitch switching 64-byte packets at 14.88 Mpps – 10GbE line rate.

Since I’ve been working with Docker lately, I naturally started imagining what could be done by combining crazy fast DPDK applications with the lightweight virtualization and deployment flexibility of Docker.  Many DPDK applications – such as Open vSwitch – have some requirements in the DPDK build that may break other applications if they rely on the same libraries.  This makes it a great candidate for containerization, since we can give the application its very own tested build and run environment.

I was not, of course, the first to think of this – some Googling will turn up quite a few bits and pieces that have been helpful in writing this post.  My goal here is to bring that information into a consolidated tutorial and to explain the containerized DPDK framework that I have published to Dockerhub.

DPDK Framework in a Container

DPDK applications need to access a set of headers and libraries for compilation, so I decided to create a base container (Github, Dockerhub) with those resources.  Here’s the Dockerfile:
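
The real Dockerfile lives in the Github repo; a trimmed sketch of its shape is below.  The base image, package list, source URL, and the exact config options toggled are illustrative – the RTE_SDK variable and the ONBUILD deferral are the parts that matter:

    FROM ubuntu:14.04

    # build tools for DPDK and whatever application gets layered on top
    RUN apt-get update && apt-get install -y build-essential python wget git

    # the all-important RTE_SDK variable, pointing at the DPDK source tree
    ENV RTE_SDK /usr/src/dpdk
    RUN git clone http://dpdk.org/git/dpdk $RTE_SDK

    # Compilation is deferred to the application image, which must provide
    # dpdk_env.sh (RTE_TARGET and friends) and dpdk_config.sh (pre-build
    # tweaks).  Kernel module options are switched off so we never need
    # kernel headers inside the container.
    ONBUILD COPY dpdk_env.sh dpdk_config.sh $RTE_SDK/
    ONBUILD RUN . $RTE_SDK/dpdk_env.sh && \
                sh $RTE_SDK/dpdk_config.sh && \
                cd $RTE_SDK && \
                sed -i 's/CONFIG_RTE_EAL_IGB_UIO=y/CONFIG_RTE_EAL_IGB_UIO=n/' config/common_linuxapp && \
                sed -i 's/CONFIG_RTE_KNI_KMOD=y/CONFIG_RTE_KNI_KMOD=n/' config/common_linuxapp && \
                make install T=$RTE_TARGET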

Pretty basic stuff at first – get some packages, set the all-important RTE_SDK environment variable, grab the source.  One important detail is not to rely on kernel headers; doing so would be seriously non-portable.  The uio and igb_uio kernel modules have to be built and installed by the host that will run the DPDK container, so we configure the SDK to skip the kernel modules, which means kernel headers never need to be installed on the build system.

The key part of this build script is the deferment of compilation to when the application is built, so that the application can specify its requirements. This is done by requiring the DPDK application provide dpdk_env.sh and dpdk_config.sh, which provide environment variables (such as RTE_TARGET) and a set of commands to run before compilation occurs. For example, Open vSwitch requires that DPDK be compiled with CONFIG_RTE_BUILD_COMBINE_LIBS=y in its configuration, which would be inserted in dpdk_config.sh.

DPDK Application in a Container

Now that the framework is there, time to use it in an application.  In this post I will demonstrate Open vSwitch in a container (Github, Dockerhub), which could be plenty useful.  To begin, here’s the dpdk_env.sh and dpdk_config.sh files:
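
Roughly, something like this – the target is the usual x86_64 one, and the combined-libs requirement becomes a sed over the DPDK config (exact option spellings depend on the DPDK release):

    # dpdk_env.sh -- environment the DPDK build expects
    export RTE_TARGET=x86_64-native-linuxapp-gcc

    # dpdk_config.sh -- configuration OVS needs before DPDK is compiled
    sed -i 's/CONFIG_RTE_BUILD_COMBINE_LIBS=n/CONFIG_RTE_BUILD_COMBINE_LIBS=y/' \
        $RTE_SDK/config/common_linuxapp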

OVS has some special requirements for DPDK, which is kind of the point of putting it in a container, right? Here’s the Dockerfile to build it:
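
Again, the published Dockerfile is on Github; a sketch of the important parts looks like this (package names, the OVS checkout, and the start-ovs.sh script name are illustrative):

    FROM rakurai/dpdk

    # dpdk_env.sh / dpdk_config.sh sit next to this Dockerfile; the base
    # image's ONBUILD steps copy them in and compile DPDK before this line
    RUN apt-get update && apt-get install -y autoconf automake libtool

    # fetch OVS and build it against the DPDK tree we just compiled
    RUN git clone https://github.com/openvswitch/ovs.git /usr/src/ovs
    WORKDIR /usr/src/ovs
    RUN . $RTE_SDK/dpdk_env.sh && \
        ./boot.sh && \
        ./configure --with-dpdk=$RTE_SDK/$RTE_TARGET && \
        make && make install

    # the startup script brings up ovsdb-server and ovs-vswitchd (below)
    COPY start-ovs.sh /usr/local/bin/start-ovs.sh
    CMD ["/usr/local/bin/start-ovs.sh"]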

The ONBUILD instructions in the DPDK Dockerfile will be executed first, of course, which will compile the DPDK framework. Then we install more packages for OVS, get the source, and compile with DPDK options. In the last few lines, we move the final script into the container, which is all the stuff OVS needs running:
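
In outline, the script does something like the following.  The paths are the stock install-from-source defaults, the EAL arguments (core mask, memory channels, socket memory) are illustrative, and this is the pre-2.6 OVS command line, where DPDK options were still passed straight to ovs-vswitchd:

    #!/bin/bash
    # start-ovs.sh -- minimal sketch of bringing up DPDK-enabled OVS
    DB_SOCK=/usr/local/var/run/openvswitch/db.sock

    # create the OVS database (first run only) and start ovsdb-server
    mkdir -p /usr/local/etc/openvswitch /usr/local/var/run/openvswitch
    ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
        /usr/local/share/openvswitch/vswitch.ovsschema
    ovsdb-server --remote=punix:$DB_SOCK \
        --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
        --pidfile --detach
    ovs-vsctl --no-wait init

    # the main switching daemon, with the DPDK EAL: one core (-c 0x1),
    # 4 memory channels (-n 4), and 1GB of huge page memory per socket
    ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024 \
        -- unix:$DB_SOCK --pidfile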

Now, here you could go a bit differently, and the repository I linked to may change somewhat. It could be said that it is more Dockerish to put the ovsdb-server in its own container, and then link them. However, this is a self-contained example, so we’ll just go with this.

Running Open vSwitch

Before we start it up, we need to fulfill some prerequisites. I won’t go into details on the how and why, but please see the DPDK Getting Started Guide and the OVS-DPDK installation guide.  OVS requires 1GB huge pages, so you need your /etc/default/grub to have at least these options:
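
Something along these lines; the number of pages is illustrative, so size it for however much socket memory you plan to hand to OVS and your VNFs:

    GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=1G hugepagesz=1G hugepages=8"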

followed by an update-grub and reboot. You also need to mount them with this or the /etc/fstab equivalent:
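
For example (the mount point just has to match whatever you later pass into the container):

    mkdir -p /mnt/huge
    mount -t hugetlbfs -o pagesize=1G nodev /mnt/huge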

Compile the kernel module on the host system and insert it. Download DPDK, extract, and run the dpdk/tools/setup.sh script. Choose to build to the x86_64-native-linuxapp-gcc target, currently option 9, and then insert the UIO module, currently option 12. Finally, bind one of your interfaces with option 18, though you’ll have to bring that interface down first.
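
If you prefer to see what the menu is doing, the manual equivalent is roughly this.  The paths follow the DPDK 2.x layout (the bind script moved and was renamed in later releases), and the interface name is just an example:

    # from the extracted DPDK tree on the host
    sudo modprobe uio
    sudo insmod x86_64-native-linuxapp-gcc/kmod/igb_uio.ko

    # the interface must be down before it can be rebound
    sudo ifconfig eth2 down
    sudo ./tools/dpdk_nic_bind.py --bind=igb_uio eth2
    sudo ./tools/dpdk_nic_bind.py --status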

Now you can start the container. Here’s what I used:
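
Approximately the following; the huge page mount matches the one created above, and /dev/uio0 is the device the bind step created:

    docker run -it --privileged --name=ovs-dpdk \
        -v /mnt/huge:/mnt/huge \
        --device=/dev/uio0:/dev/uio0 \
        rakurai/ovs-dpdk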

This gives the container access to the huge page mount, and to the uio0 device that you just bound to the UIO driver. I found that I needed to run the container as --privileged to access the /dev/uio0 device and its associated sysfs entries, though it appears that some people are able to get around this. I will update this post if I find out how to run the container without --privileged.

If all goes well, you now have DPDK-accelerated OVS running in a container, and you can go about adding interfaces to the container, adding them to OVS, and setting up rules for forwarding packets at ludicrous speeds. Good luck, and please let me know how it works out for you!

Links

DPDK base Docker container – rakurai/dpdk (Github, Dockerhub)
Open vSwitch Docker container – rakurai/ovs-dpdk (Github, Dockerhub)
DPDK Getting Started Guide
OVS-DPDK installation guide

Exposing Docker containers with SR-IOV

In some of my recent research in NFV, I’ve needed to expose Docker containers to the host’s network, treating them like fully functional virtual machines with their own interfaces and routable IP addresses.  This type of exposure is overkill for many applications, but necessary for user space packet processing such as that required for NFV.  An example use case might be if you want to give your containerized virtual firewall direct access to a physical interface, bypassing the OVS/bridge and the associated overhead, but without the security vulnerabilities of --net=host.

You have a few options in this kind of situation.  You could directly assign the interface, giving the container exclusive access.  The number of available physical NICs is limited, though, so a more realistic option for most of us is to virtualize the interface, and let the container think it has a real NIC.  Jérôme Petazzoni’s pipework script makes it a breeze to do this; by default, if you assign a physical interface to a container with pipework, it will create a virtual interface, place it under a fast macvlan L2 switch, and assign the virtual interface to the container’s network namespace. This comes with a cost, of course: macvlan is still a software switch, another layer between your container and the NIC.

A third option is to use Single Root IO Virtualization (SR-IOV), which allows a PCI device to present itself as multiple virtual functions to the OS.  Each virtual function acts as its own device and gets its own MAC, and the NIC can then use its built-in hardware classifier to place incoming packets in separate RX queues based on the destination MAC.  Scott Lowe has a great intro to SR-IOV here.

There are a few reasons that you might want to use SR-IOV rather than a macvlan bridge.  There are architectural benefits to moving the packet processing from the kernel to user space – it can improve cache locality and reduce context switches (Rathore et al., 2013).  It can also make it easier to quantify and control the CPU usage of the user space application, which is critical in multi-tenant environments.

On the other hand, there are situations where you would definitely not want to use SR-IOV – primarily when you have containers on the same host that need to communicate with each other. In this case, packets must be copied over the PCIe bus to the NIC just to be switched back to the host, which carries a pretty dismal performance penalty.  I published a paper recently that covers this and other performance issues concerning Docker networking with SR-IOV, macvlan, and OVS; take a look at the results of chaining multiple containers and the increase in latency and jitter (Anderson et al., 2016).

So, let’s dig in and make it happen.  You’ll want to make sure that the NIC you intend to virtualize actually supports SR-IOV, and check how many virtual functions it supports.  I’m working with some Dell C8220s with Intel 82599 10G NICs, which support up to 63 virtual functions.  Here’s a list of Intel devices with support; other manufacturers should have their own lists.

Creating Virtual NICs

First, get a list of your available NICs.  Here’s a handy one-liner:
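
One way to do it (the -D flag makes lspci print the full PCI domain, which you'll want for the sysfs path in a moment):

    lspci -D | grep -i ethernet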

This gives you the PCI slot, class, and other useful information of your Ethernet devices, like this:
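
Output along these lines – the onboard NIC line is illustrative, and the 10G card is the one I care about:

    0000:02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
    0000:82:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)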

In this case, I’m going to be virtualizing the 10G NIC, so I note the slot: 0000:82:00.0.  Next, decide how many virtual functions you’re going to need, in addition to the physical device.  I’m going to be assigning interfaces to 2 Docker containers, so I’ll create 2 VFs.  Next, we’ll just write that number into the sriov_numvfs file for the device:
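
As root (or via sudo tee):

    echo 2 > /sys/bus/pci/devices/0000:82:00.0/sriov_numvfs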

Now, check ifconfig -a. You should see a number of new interfaces created, starting with “eth” and numbered after your existing interfaces. They’ve been assigned random MACs, and are ready for you to use. Here’s mine:

Plumb It

My preferred tool for adding interfaces to Docker containers is pipework, but in this case, pipework would layer a macvlan bridge on top of our already-virtual interface, which defeats the purpose.  As a workaround, I forked the pipework repository and made it accept --direct-phys as the first argument, to force it to skip the macvlan and bring the interface directly into the container’s network namespace. I’ve submitted the change upstream, and if it makes its way into the original project, I’ll update this post.

First, I’ll make a container for testing:
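
Any long-running container will do; something like:

    docker run -itd --name=test ubuntu:14.04 bash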

Now, let’s give that container a new virtual NIC, with the modified pipework:
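
The host-side interface name and the address below are examples – substitute one of the new VF interfaces from ifconfig -a and an address on the right subnet:

    sudo ./pipework --direct-phys eth4 test 10.10.0.10/24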

By default, pipework will name the new interface eth1 inside the container (Note: see bottom of post for one caveat).  Just to double check:
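
Using the test container from above:

    docker exec test ifconfig eth1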

Note that the MAC is the same one we saw in ifconfig on the host earlier, and also that the interface is no longer visible in the host’s ifconfig: this is because the interface is now in the container’s network namespace. Now, to try it out with another physical machine on that interface’s network:
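
With a neighbor at 10.10.0.20 (again, an example address):

    docker exec test ping -c 3 10.10.0.20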

If you like doing things the hard way, here’s the steps to mimic how pipework put the interface in the container’s network namespace:
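
Roughly the following, using the same example names as above – expose the container's network namespace to ip netns, move the VF in, rename it, and configure it:

    # find the container's PID and expose its netns to the ip tool
    PID=$(docker inspect --format '{{ .State.Pid }}' test)
    sudo mkdir -p /var/run/netns
    sudo ln -s /proc/$PID/ns/net /var/run/netns/$PID

    # move the VF into the container, rename it, and bring it up
    sudo ip link set eth4 netns $PID
    sudo ip netns exec $PID ip link set eth4 name eth1
    sudo ip netns exec $PID ip addr add 10.10.0.10/24 dev eth1
    sudo ip netns exec $PID ip link set eth1 up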

EDIT:

One issue that you may have with this approach happens when you stop the container.  If you’ve renamed the interface to something else, like pipework and my above example do, there will be a conflict when the kernel tries to move the interface back to the host’s namespace.  The simplest solution would just be to avoid renaming the interface, unless it’s critical that the interface be named something specific within the container.  This is pretty easy with pipework, just specify the container interface name:
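
For example, keeping the VF's host name inside the container (same example names as before):

    sudo ./pipework --direct-phys eth4 -i eth4 test 10.10.0.10/24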

Let me know how it works out for you in the comments below.

Links:
Jérôme Petazzoni’s pipework
Modified pipework with --direct-phys option
Scott Lowe on SR-IOV
Intel devices that support SR-IOV
Intro to Linux namespaces