Binaries in ABC (or UTF-8)

Awelon Bytecode (ABC) doesn’t have good direct support for binary data. Binaries are not uncommon: cipher texts, secure hash values, compressed visual or audio data, raw sensory data, and so on. Efficient storage and communication of binaries is a desirable feature.

This morning, I had a spark of an idea: a compression algorithm can be specialized to recognize a large sequence of base64 data and simply re-encode it as a run length of binary. This simple technique cleanly separates the concerns of simple representation at the language layer vs. efficient storage and transmission of binary data.

But base64 is hardly relevant in this case, is it? We could use the simpler base16 and reap similar benefits.

Awelon Bytecode is encoded in UTF-8, and may use the full range of UTF-8 via embedded text. A useful property of UTF-8 is that it has thirteen unused bytes: 0xC0, 0xC1, 0xF5..0xFF. It is feasible to simply usurp one of these bytes as an escape marker for a compression pass specialized for UTF-8 text that is expected to embed base16. Proposed format:

  • header byte: 0xF8
  • single length byte L in 0x00..0xFD, denoting L+3 raw bytes, i.e. 3..256 bytes (0xFE, 0xFF not used)
  • followed by that many raw bytes, thus encoding 6..512 base16 characters (always an even number)
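
To make the framing concrete, here is a minimal sketch of such a compression pass in Python. It is only an illustration under stated assumptions: the base16 alphabet is taken to be conventional lowercase hex, the length byte is taken to store the byte count minus 3, and the names `compress` and `decompress` are hypothetical.

```python
HEADER = 0xF8                   # header byte from the proposed format; never occurs in valid UTF-8
ALPHABET = b"0123456789abcdef"  # assumption: conventional lowercase hex digits
VALUE = {c: i for i, c in enumerate(ALPHABET)}
MIN_BYTES, MAX_BYTES = 3, 256   # a chunk carries 3..256 raw bytes


def compress(data: bytes) -> bytes:
    """Replace runs of base16 characters with header + length + raw bytes."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Look ahead at most 512 bytes for base16 characters.
        limit = min(n, i + 2 * MAX_BYTES)
        j = i
        while j < limit and data[j] in VALUE:
            j += 1
        count = (j - i) // 2            # whole bytes available in this run
        if count >= MIN_BYTES:
            out += bytes((HEADER, count - MIN_BYTES))
            for k in range(i, i + 2 * count, 2):
                out.append(VALUE[data[k]] << 4 | VALUE[data[k + 1]])
            i += 2 * count
        else:
            out.append(data[i])         # run too short to frame; copy one byte
            i += 1
    return bytes(out)


def decompress(data: bytes) -> bytes:
    """Expand framed chunks back into base16 characters, emitting as we go."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        if data[i] != HEADER:
            out.append(data[i])
            i += 1
            continue
        if i + 1 >= n or data[i + 1] > 0xFD:
            raise ValueError("missing or invalid length byte")
        count = data[i + 1] + MIN_BYTES
        chunk = data[i + 2 : i + 2 + count]
        if len(chunk) != count:
            raise ValueError("truncated binary chunk")
        for raw in chunk:
            out += bytes((ALPHABET[raw >> 4], ALPHABET[raw & 0x0F]))
        i += 2 + count
    return bytes(out)


text = ("hash: " + "0123456789abcdef" * 4 + " end").encode("utf-8")
packed = compress(text)
assert decompress(packed) == text
```

The raw payload may itself contain 0xF8, but the decoder skips over it by length, so the stream stays unambiguous. Runs shorter than six characters, and a trailing odd character, are simply left as text.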

Thus, we have UTF-8 with efficiently embedded base16 binaries. The overhead for large binaries is less than 1% (two framing bytes per 256 bytes of payload). The break-even point with base64 is 6 bytes, where both forms cost 8 bytes. The encoder requires up to 512 bytes of lookahead, and the decoder requires no buffer. I can easily imagine applying this idea to HTML+JavaScript. Further compression remains quite viable.
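
The size figures can be checked with a little arithmetic. The sketch below assumes each chunk costs two framing bytes (header plus length) and compares against base64 without `=` padding; the function names are hypothetical.

```python
HEADER_COST = 2   # header byte 0xF8 plus one length byte
MAX_CHUNK = 256   # raw bytes per chunk


def framed_size(n: int) -> int:
    """Bytes needed for n >= 3 raw bytes, split into chunks of 3..256 bytes."""
    chunks = (n + MAX_CHUNK - 1) // MAX_CHUNK
    return n + HEADER_COST * chunks


def base64_size(n: int) -> int:
    """Characters needed by base64 without '=' padding (an assumption)."""
    return (4 * n + 2) // 3


def base16_size(n: int) -> int:
    """Characters needed by plain base16 text."""
    return 2 * n


for n in (3, 6, 7, 256, 1 << 20):
    overhead = (framed_size(n) - n) / n
    print(n, framed_size(n), base64_size(n), base16_size(n), f"{overhead:.2%}")
```

In the continuous approximation the break-even solves 4n/3 = n + 2, giving n = 6. For large inputs the overhead approaches 2/256, roughly 0.78%.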

For ABC in particular, I’m considering use of a specialized base16 alphabet `bdfghjkmnpqstxyz`. This is mostly to avoid spelling offensive words (no vowels) and avoid interference with numbers (0-9) or ABC data plumbing (mostly `vrwlc`).
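
As a rough illustration of that alphabet, the sketch below converts between raw bytes and the consonant-only base16 text. The post does not say which letter stands for which nibble, so the alphabetical assignment here (`b` = 0 through `z` = 15) is an assumption, and the function names are hypothetical.

```python
HEX = "0123456789abcdef"
ABC16 = "bdfghjkmnpqstxyz"   # alphabet from the post; nibble order is an assumption

TO_ABC16 = str.maketrans(HEX, ABC16)
FROM_ABC16 = str.maketrans(ABC16, HEX)


def to_abc16(data: bytes) -> str:
    """Encode raw bytes as two consonants per byte."""
    return data.hex().translate(TO_ABC16)


def from_abc16(text: str) -> bytes:
    """Decode consonant-only base16 text back into raw bytes."""
    return bytes.fromhex(text.translate(FROM_ABC16))


digest = bytes([0x00, 0x1A, 0xFF])
encoded = to_abc16(digest)
assert from_abc16(encoded) == digest
```

Because the alphabet shares no characters with the digits, the vowels, or `vrwlc`, a run of these letters is easy for a compression pass to spot.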

This entry was posted in Distributed Programming, Language Design.
