Awelon Bytecode (ABC) doesn’t have good direct support for binary data. Binaries are not uncommon: cipher texts, secure hash values, compressed visual or audio data, raw sensory data, and so on. Efficient storage and communication of binaries is a desirable feature.
This morning, I had a spark of an idea: a compression algorithm can be specialized to recognize a large sequence of base64 data and simply re-encode it as a run length of binary. This simple technique cleanly separates the concerns of simple representation at the language layer vs. efficient storage and transmission of binary data.
But base64 is hardly essential in this case, is it? We could use the simpler base16 and reap similar benefits.
Awelon Bytecode is encoded in UTF-8, and may use the full range of UTF-8 via embedded text. A useful property of UTF-8 is that it has thirteen bytes that never appear in valid text: 0xC0, 0xC1, 0xF5..0xFF. It is feasible to simply usurp one of these bytes as a marker for a compression pass specialized for UTF-8 that is expected to embed base16. Proposed format:
- header byte: 0xF8
- single length byte L in 0x00..0xFD, encoding L+3 = 3..256 bytes (0xFE, 0xFF not used)
- thus encoding 6..512 base16 characters (always an even number)
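
To make the format concrete, here is a minimal Python sketch of such a compression pass and its inverse. Several things here are my assumptions rather than part of the proposal: the function names, the high-nibble-first byte order, the reading of the length byte as L = count - 3, the greedy splitting of long runs into chunks of at most 256 bytes, and the default alphabet (plain lowercase hex, passed as a parameter so any 16-character alphabet can be swapped in).

```python
import re

HEADER = 0xF8                    # header byte from the proposed format
MIN_BYTES, MAX_BYTES = 3, 256    # a chunk carries 3..256 raw bytes

def compress(utf8: bytes, alphabet: bytes = b"0123456789abcdef") -> bytes:
    """Replace base16 runs of 6+ characters with header + length + raw bytes."""
    assert len(alphabet) == 16
    value = {c: i for i, c in enumerate(alphabet)}
    # A run is at least 6 consecutive characters drawn from the alphabet.
    run = re.compile(b"[" + re.escape(alphabet) + b"]{%d,}" % (2 * MIN_BYTES))
    out = bytearray()
    pos = 0
    for m in run.finditer(utf8):
        out += utf8[pos:m.start()]           # copy text before the run
        chars = m.group()
        even = len(chars) - (len(chars) % 2)  # only whole bytes are packed
        i = 0
        while even - i >= 2 * MIN_BYTES:
            n = min((even - i) // 2, MAX_BYTES)
            out.append(HEADER)
            out.append(n - MIN_BYTES)         # assumed: L = byte count - 3
            for j in range(i, i + 2 * n, 2):  # assumed: high nibble first
                out.append(value[chars[j]] * 16 + value[chars[j + 1]])
            i += 2 * n
        out += chars[i:]                      # leftover characters stay as text
        pos = m.end()
    out += utf8[pos:]
    return bytes(out)

def decompress(data: bytes, alphabet: bytes = b"0123456789abcdef") -> bytes:
    """Expand each header + length + raw bytes chunk back into base16 text."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == HEADER:
            n = data[i + 1] + MIN_BYTES
            for b in data[i + 2:i + 2 + n]:
                out.append(alphabet[b >> 4])
                out.append(alphabet[b & 0x0F])
            i += 2 + n                        # skip the raw bytes by length
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

Because 0xF8 never occurs in valid UTF-8, the decompressor can treat it as an unambiguous marker, and it skips over the raw payload by length, so payload bytes that happen to equal 0xF8 cause no trouble.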
For ABC in particular, I’m considering use of a specialized base16 alphabet `bdfghjkmnpqstxyz`. This is mostly to avoid spelling offensive words (no vowels) and avoid interference with numbers (0-9) or ABC data plumbing (mostly `vrwlc`).
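
As a rough illustration of that alphabet, here is a small Python sketch of encoding and decoding. The function names are hypothetical, and two details are my assumptions: that the characters map to nibble values 0..15 in the order listed, and that the high nibble of each byte is written first.

```python
ALPHABET = "bdfghjkmnpqstxyz"   # assumed order: b = 0, d = 1, ..., z = 15
INDEX = {c: i for i, c in enumerate(ALPHABET)}

def encode_base16(data: bytes) -> str:
    """Encode bytes as two alphabet characters each (high nibble first)."""
    return "".join(ALPHABET[b >> 4] + ALPHABET[b & 0x0F] for b in data)

def decode_base16(text: str) -> bytes:
    """Decode text produced by encode_base16; rejects odd-length input."""
    if len(text) % 2 != 0:
        raise ValueError("base16 text must have an even number of characters")
    return bytes(
        INDEX[text[i]] * 16 + INDEX[text[i + 1]]
        for i in range(0, len(text), 2)
    )
```

For instance, under these assumptions the two bytes 0x00 0xFF encode as `bbzz`.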