Variable Length Integer EncodingAugust 29, 2023 In recent years, I’ve seen several methods for variable-length encoding of integers for transmission over a network. The most recent one I encountered was defined in section 16 of the QUIC protocol specification. In that document, 64-bit integers are encoded as 1, 2, 4, or 8 octets based on the value of the integer. Another format is specified in Appendix B of the CBOR specification, which specifies how to encode any one of the typical integer sizes commonly used on modern computers today. Using a rigid encoding like these offers a means of quickly serializing data, but at a cost of increasing the number of octets required. In the case of CBOR, it perhaps isn’t accurate to refer to it as a variable-length encoding, but rather an encoding for each possible signed and unsigned integer type. However, the effect is more-or-less similar, though larger integers are encoded with less space efficiency. When considering space-efficiency, another encoding approach is to encode integers so that the MSb of each serialized octet indicates whether this is the final octet or whether there is another octet to consume. The following illustrates that idea: 10000011 11111111 01111111 ^ ^ ^—- 0 == final octet Here, if a 1 is present as the MSb, it means the next octet is a part of the integer. If the MSb is a 0, it indicates this octet is the last octet in the serialized sequence of octets. Following this method, a value between 0 and 127 can be serialized into a single octet, while incrementally larger integers consume additional octets. This method can be used for both unsigned and signed integers, where the signed integers are stored in twos-complement format. Serializing and deserializing data can be efficient. Below, I present functions that will perform those operations on a buffer given an unsigned or signed 64-bit integer. This code uses C++ constexpr functions, though these could easily be transformed into C macros if one prefers those. // Example of space-efficient encoding of variable length integers // Copyright (C) 2023 // Paul E. Jones <paulej@packetizer.com> #include <cstdint> #include <cstddef> #include <span> // Find the most significant bit position for unsigned integers constexpr std::size_t FindMSb(std::uint64_t v) { std::size_t p = 0; if (v >= std::uint64_t(1) << 32) p += 32, v >>= 32; if (v >= std::uint64_t(1) << 16) p += 16, v >>= 16; if (v >= std::uint64_t(1) << 8) p += 8, v >>= 8; if (v >= std::uint64_t(1) << 4) p += 4, v >>= 4; if (v >= std::uint64_t(1) << 2) p += 2, v >>= 2; if (v >= std::uint64_t(1) << 1) p += 1, v >>= 1; return p; } // Find the most significant bit position for signed integers constexpr std::size_t FindMSb(std::int64_t v) { return ((v >= 0) ? FindMSb(static_cast<std::uint64_t>(v)) : FindMSb(static_cast<std::uint64_t>(~v))); } // Return number of octets required to encode given std::uint64_t value constexpr std::size_t VarUintSize(const std::uint64_t value) { return FindMSb(value) / 7 + 1; } // Return number of octets required to encode given std::int64_t value constexpr std::size_t VarIntSize(const std::int64_t &value) { return (FindMSb(value) + 1) / 7 + 1; } // Serialize the unsigned integer into the given buffer, returning 0 if the // buffer is too short to hold the serialized integer std::size_t Serialize(std::span<std::uint8_t> buffer, std::uint64_t value) { // Determine space requirements for the variable-width integer const std::size_t octets_required = VarUintSize(value); // Ensure the buffer is of sufficient length if (buffer.size() < octets_required) return 0; // Write octets from right to left (reverse order) for (std::size_t i = octets_required; i > 0; i—) { // Get the group of 7 bits std::uint8_t octet = value & 0x7f; // Shift the data bits vector by 7 bits value >>= 7; // If this is not the last octet, set the MSb to 1 if (i != octets_required) octet |= 0x80; // Write the value into the buffer buffer[i - 1] = octet; } return octets_required; } // Deserialize the unsigned integer from the given buffer, returning number of // octets deserialized or zero if there was an error std::size_t Deserialize(const std::span<std::uint8_t> buffer, std::uint64_t &value) { std::uint8_t octet{0x80}; std::size_t total_octets{0}; // Initialize the integer value value = 0; // Read octets until we find the last one having a 0 MSb while (octet & 0x80) { // A 64-bits value should never require more than 10 octets if (++total_octets == 11) return 0; // Ensure we do not read beyond the buffer if (total_octets > buffer.size()) return 0; // Get the target octet octet = buffer[total_octets - 1]; // Add these bits to the returned value value = (value << 7) | (octet & 0x7f); } // If the total length is 10 octets, initial octet must be 0x81 if ((total_octets == 10) && (buffer[0] != 0x81)) return 0; return total_octets; } // Serialize the signed integer into the given buffer, returning 0 if the // buffer is too short to hold the serialized integer std::size_t Serialize(std::span<std::uint8_t> buffer, std::int64_t value) { // Determine space requirements for the variable-width integer std::size_t octets_required = VarIntSize(value); // Ensure there is sufficient space in the buffer if (octets_required > buffer.size()) return 0; // Write octets from right to left (reverse order) for (std::size_t i = octets_required; i > 0; i—) { // Get the group of 7 bits std::uint8_t octet = value & 0x7f; // Shift the data bits vector by 7 bits value >>= 7; // If this is not the last octet, set the MSb to 1 if (i != octets_required) octet |= 0x80; // Write the value into the buffer buffer[i - 1] = octet; } return octets_required; } // Deserialize the signed integer from the given buffer, returning number of // octets deserialized or zero if there was an error std::size_t Deserialize(const std::span<std::uint8_t> buffer, std::int64_t &value) { std::uint8_t octet{0x80}; std::size_t total_octets{0}; // Ensure we do not read beyond the buffer if (buffer.empty()) return 0; // Determine the sign of the number by inspecting the leading sign bit value = (buffer[0] & 0x40) ? -1 : 0; // Read octets until we find the last one having a 0 MSb while (octet & 0x80) { // A 64-bits value should never require more than 10 octets if (++total_octets == 11) return 0; // Ensure we do not read beyond the buffer if ((total_octets) > buffer.size()) return 0; // Get the target octet octet = buffer[total_octets - 1]; // Add these bits to the returned value value = (value << 7) | (octet & 0x7f); } // If the total length is 10 octets, ensure the initial octet is one // of the only two valid values if ((total_octets == 10) && (buffer[0] != 0x80) && (buffer[0] != 0xff)) { return 0; } return total_octets; }
|
![]() Contact Social Media
Gab |