5 Oct 2009

Programmer Friendly Text Exchange (ProgFTE)

(This is edited version #7 of this post.)

There are 2 versions of the ProgFTE specification. As of January 2013 the most up-to-date version is ProgFTE_v1. The rest of the text in this blog post describes ProgFTE_v0, which originates from 2009.




ProgFTE Specification Version 0 (ProgFTE_v0), which is superseded by ProgFTE_v1



In short, each key-value pair of a hash table is encoded as:

keyAsText|||valueAsText|||

The text versions of the key-value pairs are concatenated. The value part can hold JSON, YAML, XML or almost any other text, as long as neither the key nor the value contains the literal “|”. That restriction can be overcome by replacing every “|” within the keys and values with some string (hereafter: pillarSubstString) that does not occur within the keys and values.
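The encoding of a single pair can be sketched in Python as follows. This is only a minimal illustration under my own assumptions: the function name encode_pair, the sample substitute string "PILLAR" and the decision to raise an error for an unusable substitute string are not part of the specification.

```python
def encode_pair(key: str, value: str, pillar_subst: str) -> str:
    """Encode one key-value pair as keyAsText|||valueAsText|||."""
    # Assumption: an empty substitute, or one that contains "|", is rejected.
    if not pillar_subst or "|" in pillar_subst:
        raise ValueError("substitute string must be non-empty and free of '|'")
    for text in (key, value):
        # The substitute string must not occur in the raw keys and values.
        if pillar_subst in text:
            raise ValueError("substitute string occurs in a key or value")
    safe_key = key.replace("|", pillar_subst)
    safe_value = value.replace("|", pillar_subst)
    return safe_key + "|||" + safe_value + "|||"


print(encode_pair("shell", "ls | wc", "PILLAR"))  # shell|||ls PILLAR wc|||
```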

For decoding, one has to write a function that BISECTS a string at the first occurrence of a search-string. In this case, the search-string is “|||”. For example, bisect(“simpler|||than|||XML|||”, “|||”) would output a PAIR that consists of the string “simpler” and the string “than|||XML|||”.
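A possible Python sketch of such a bisecting function is given below. What should happen when the search-string is missing is not defined in this post; raising an error is my assumption.

```python
from typing import Tuple


def bisect(text: str, search_string: str) -> Tuple[str, str]:
    """Split text at the first occurrence of search_string."""
    left, found, right = text.partition(search_string)
    if not found:
        # Assumption: a missing search-string is treated as an error.
        raise ValueError("search-string not found: " + repr(search_string))
    return left, right


print(bisect("simpler|||than|||XML|||", "|||"))  # ('simpler', 'than|||XML|||')
```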

The reason why "|" is replaced instead of "|||" is that if a key or value has "|" or "||" as its suffix (the prefix case is left unanalyzed for now), one ends up with <key or value without suffix>|||| or <key or value without suffix>|||||, which makes finding the "|||" problematic.
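The suffix problem can be reproduced with a few lines of Python. The sketch assumes a hypothetical, naive encoder that would substitute only whole "|||" occurrences and leave single pillars alone, so the value "a|" passes through unchanged.

```python
raw_value = "a|"  # contains no "|||", so the naive encoder leaves it unchanged

# The naive encoding of the pair ("key", "a|") is "key|||a||||".
naive = "key" + "|||" + raw_value + "|||"

# Decode by splitting at the first "|||" each time.
key, rest = naive.split("|||", 1)
value, leftover = rest.split("|||", 1)

print(repr(value), repr(leftover))  # 'a' '|'
```

The decoded value has lost its trailing pillar and a stray "|" is left over in the remaining text.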

So, all in all, the format is:

NumberOfKeyValuePairs|||pillarSubstString|||key1AsText|||value1AsText|||key2AsText|||value2AsText|||etcOtherKeyValuePairs

The number of key-value pairs is prefixed so that the deserializer does not have to count the occurrences of “|||”.
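To make the whole format concrete, here is a hedged Python sketch of a ProgFTE_v0 serializer and deserializer. It is not a reference implementation: the names serialize_v0 and deserialize_v0, the restriction to string keys and values, and the error handling are my assumptions. It follows the examples in this post in ending every field, including the last one, with "|||".

```python
def serialize_v0(table: dict, pillar_subst: str) -> str:
    """Serialize a dict of strings to a ProgFTE_v0 string."""
    if not pillar_subst or "|" in pillar_subst:
        raise ValueError("substitute string must be non-empty and free of '|'")
    fields = [str(len(table)), pillar_subst]
    for key, value in table.items():
        if pillar_subst in key or pillar_subst in value:
            raise ValueError("substitute string occurs in a key or value")
        fields.append(key.replace("|", pillar_subst))
        fields.append(value.replace("|", pillar_subst))
    return "".join(field + "|||" for field in fields)


def deserialize_v0(text: str) -> dict:
    """Parse a ProgFTE_v0 string back into a dict of strings."""

    def bisect(s: str) -> tuple:
        left, found, right = s.partition("|||")
        if not found:
            raise ValueError("missing '|||' separator")
        return left, right

    count_text, rest = bisect(text)
    pillar_subst, rest = bisect(rest)
    table = {}
    # The prefixed count tells how many pairs to read.
    for _ in range(int(count_text)):
        key, rest = bisect(rest)
        value, rest = bisect(rest)
        table[key.replace(pillar_subst, "|")] = value.replace(pillar_subst, "|")
    return table


sample = {"name": "a|b", "json": '{"x": 1}'}
encoded = serialize_v0(sample, "PILLAR")
print(encoded)
print(deserialize_v0(encoded) == sample)  # True
```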

The reason why the separator string is "|||" instead of the more space/traffic efficient "|" is that "|||" is easier to read during debugging, and "|" and "||" already have historic meanings in software development.

The main benefit of this format is that one can implement it in different languages with relatively little work. That includes exotic languages and self-made, domain-specific languages that do not have extensive XML, JSON or other "mainstream" format libraries available. An example application is a website where the server side has been written in PHP or Java and the client side has been written in JavaScript (keyword: AJAX) or some JavaScript-based Scheme dialect.

The secondary benefit comes from the comfort of using hashtables.

--------

Update on 22 December 2011

Actually, the format described in this post is in use in the real world and has worked without problems, but unfortunately that's pure luck, because the format (ProgFTE_v0) is flawed and I have a new, improved specification in the works. (Update on 03 January 2013: the new specification is called ProgFTE_v1.)

If (Hash.new)["nice_key"]="Cariba|" and the pillarSubstString=="baba", then the ProgFTE string is

1|||baba|||nice_key|||Caribababa|||

The issue is how to reverse-translate the "bababa" part of "Caribababa": should the result be "Cari|ba" or "Cariba|"? Both values produce the same encoded text, even though "baba" occurs in neither of them.
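The ambiguity can be checked with a short Python sketch. It assumes that the substitution is a plain left-to-right string replacement, which is how I read the v0 description.

```python
pillar_subst = "baba"

# Neither raw value contains "baba", so both are allowed by the v0 rules.
encoded_a = "Cariba|".replace("|", pillar_subst)
encoded_b = "Cari|ba".replace("|", pillar_subst)
print(encoded_a, encoded_b, encoded_a == encoded_b)  # Caribababa Caribababa True

# Reversing the substitution from left to right recovers only one of them.
decoded = encoded_a.replace(pillar_subst, "|")
print(decoded)  # Cari|ba
```

Here the original value "Cariba|" comes back as "Cari|ba", so the round trip silently changes the data.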

The good news is that one can distinguish the old version of ProgFTE, the one described in this blog post, from the new one and simply improve the ProgFTE libraries of the real-world application, without any need to convert saved data. The old version, the one in this blog post, always starts with a number, but the new version always starts with the letter "v", like "v<format_version>". I'll update this blog post after I have shipped the new version.
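A library could tell the two versions apart along these lines. The sketch is in Python, the function name and return labels are hypothetical, and it looks only at the first character, because the rest of the new layout is not described in this post.

```python
def progfte_family(text: str) -> str:
    """Guess the ProgFTE version family from the first character."""
    first = text[:1]
    if first == "v":
        return "v1-or-later"  # new style: starts with "v<format_version>"
    if first.isdigit():
        return "v0"  # old style: starts with the number of key-value pairs
    raise ValueError("not a recognizable ProgFTE string")


print(progfte_family("1|||baba|||nice_key|||Caribababa|||"))  # v0
print(progfte_family("v1"))  # v1-or-later
```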