XML Compression

XML is widely used on the Internet for a variety of tasks, including configuration files, protocols, and web services, but its syntax is very verbose. Even simple messages can be quite large, considering how little information is actually conveyed.

To address this issue, a number of people have investigated various compression methods. Perhaps the two most interesting options to consider when compressing XML are zlib, a general-purpose compression library, and Efficient XML Interchange (EXI), which is a special-purpose XML encoding method that produces a binary encoding.

Zlib's primary strength is that it can produce an extremely compact message. It is also relatively small and widely deployed on virtually every hardware platform. Most importantly, it has been exhaustively tested over a period of years and is a proven technology.
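As a rough illustration of how little code zlib requires, here is a minimal sketch using Python's standard `zlib` module. The sample message and the variable names are hypothetical and chosen only for illustration; the repeated elements simply make the redundancy of XML easy to see.

```python
import zlib

# Hypothetical sample message: 50 near-identical elements.
xml_message = (
    "<?xml version='1.0' encoding='UTF-8'?>\n<readings>\n"
    + "".join(
        f"  <reading sensor='s{i}'><value>{20 + i}</value></reading>\n"
        for i in range(50)
    )
    + "</readings>\n"
).encode("utf-8")

# Compress at the highest level, then restore the original bytes.
compressed = zlib.compress(xml_message, 9)
restored = zlib.decompress(compressed)

print(f"original:   {len(xml_message)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"round trip intact: {restored == xml_message}")
```

The exact sizes depend on the message content, so no particular ratio should be read into this sketch.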

EXI is a relatively new technology. However, it offers the potential to compress XML and preserve the "streamability" of the XML stream. It would be possible, for example, to use a SAX-style decoder to receive an EXI message and decode it without any additional steps. Using zlib, by comparison, would typically require one to receive the entire binary message, decompress it, and then parse the resulting XML message.
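The zlib workflow described above (receive, decompress, then parse) can be sketched with Python's standard `zlib` and `xml.sax` modules. The `ElementCounter` handler and the `decode_zlib_xml` helper are hypothetical names used only for this illustration, and the payload is compressed locally to stand in for a message received from the network.

```python
import xml.sax
import zlib


class ElementCounter(xml.sax.ContentHandler):
    """SAX handler that counts how often each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def decode_zlib_xml(payload: bytes) -> dict:
    """Decompress a complete zlib payload, then parse the XML with SAX."""
    xml_bytes = zlib.decompress(payload)      # step 2: decompress the whole message
    handler = ElementCounter()
    xml.sax.parseString(xml_bytes, handler)   # step 3: parse the resulting XML
    return handler.counts


# Step 1 (receiving the message) is simulated by compressing a sample locally.
payload = zlib.compress(b"<log><entry/><entry/><entry/></log>")
counts = decode_zlib_xml(payload)
print(counts)
```

No EXI decoder is shown, because this sketch does not assume any particular EXI library.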

On the surface, it would seem that zlib would be more processor intensive. However, we are lacking concrete information on computational requirements for both EXI and zlib.
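One way to start gathering such information for zlib is a simple timing harness. The sketch below is only a starting point: the `time_zlib` helper, the sample document, and the repeat count are assumptions made for illustration, and it measures zlib alone, since no EXI implementation is assumed here.

```python
import time
import zlib


def time_zlib(data: bytes, level: int, repeats: int = 20) -> tuple[float, float]:
    """Return average seconds per compress call and per decompress call."""
    compressed = zlib.compress(data, level)

    start = time.perf_counter()
    for _ in range(repeats):
        zlib.compress(data, level)
    compress_s = (time.perf_counter() - start) / repeats

    start = time.perf_counter()
    for _ in range(repeats):
        zlib.decompress(compressed)
    decompress_s = (time.perf_counter() - start) / repeats

    return compress_s, decompress_s


# Hypothetical sample document; real measurements should use real messages.
sample = b"<items>" + b"<item id='1'><name>example</name></item>" * 500 + b"</items>"

timings = {level: time_zlib(sample, level) for level in (1, 6, 9)}
for level, (c, d) in timings.items():
    print(f"level {level}: compress {c * 1e6:.1f} us, decompress {d * 1e6:.1f} us")
```

Results will vary by machine and by message content, so any comparison with EXI would need to run both codecs over the same inputs on the same hardware.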

EXI offers a number of different operating modes, including bit-aligned (the default), byte-aligned, schema-informed, schemaless, and compression. The various options are defined in the EXI specification.

We set out to understand just how much compression we could achieve with EXI and compared that to zlib. The results appear in the chart below.

XML Compression with EXI and zlib

It is interesting to note that for slightly larger files (those greater than 1 KB in size), almost all of the modes converge to produce files that are roughly the same size. One has to question whether choosing a computationally more expensive mode has any real benefit.

What is also interesting is that zlib almost always produces a smaller file than EXI without compression once files are larger than 1 KB. However, EXI with its internal compression capability produces the most compact encoding of all.
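The effect of file size on zlib's output can be explored with a short script. The following is a minimal sketch, assuming a hypothetical `build_xml` generator that produces documents of increasing size; it covers zlib only, and it does not reproduce the measurements in the chart.

```python
import zlib


def build_xml(records: int) -> bytes:
    """Build a hypothetical XML document with the given number of records."""
    body = "".join(
        f"<record id='{i}'><status>ok</status></record>" for i in range(records)
    )
    return f"<records>{body}</records>".encode("utf-8")


# Map record count -> (raw size, compressed size) in bytes.
sizes = {}
for records in (1, 5, 25, 100, 400):
    doc = build_xml(records)
    sizes[records] = (len(doc), len(zlib.compress(doc, 9)))

for records, (raw, packed) in sizes.items():
    print(f"{records:4d} records: {raw:6d} bytes raw, "
          f"{packed:5d} bytes compressed, ratio {packed / raw:.2f}")
```

Very small documents give the compressor little redundancy to exploit, which is consistent with zlib becoming more competitive once files pass roughly 1 KB.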