21 Sep 2010

About Domain Specific Algorithms and Software Development Labor

The point of this blog post is to claim that by inventing new programming languages it is not possible to eliminate the labour of describing domain specific algorithms.

For example, an assertion that a file that does not exist, can not be written to, holds regardless of programming language. It is possible for the language designer or library author to make sure that the file is automatically created, if it does not exist, but this is already a labour of describing a domain specific algorithm.

Another example is matrix multiplication. A programming language with its semantics determines the notation, i.e. multiply(a,b) or aTIMESb or a.times(b) or whatever, but someone, be it the hardware designer, programming language designer or a library designer, has to describe the essence of the matrix multiplication at least somehow. The various kinds of implementation details and optimizations do not change the fact that someone has to do actual development labour to get the multiplication to work.

In practice it entails that whenever a new programming language is designed, a new "standard library" is created, the very same work of describing the domain specific algorithms has to be done all over again, unless the previous descriptions are reused.

From intuitive point of view, and yes, according to my personal subjective taste and impressions, software development productivity depends a lot on the notation and semantics of the programming language. As of 2010 I believe that the preferences for a programming language (read: preferences for notation and semantics) is subjective and depends on the chooser's background.

By noticing that most programming languages that are in use in 2010, share a subset of common basic data types, namely arrays/vectors, hashtables, whole numbers, rational numbers, boolean values, I have a hypothesis that a probably withstandable, but imperfect, solution to the reuse and the notation-switch problem is to share memory space (the common data structures) between different programming language implementations. A program would start in one language, then, without exiting, switch to another programming language and then switch to whatever other programming language. The programming language switch regions might, probably would, contain data structure mapping code.

A vague, raw, preliminary, syntax example, where a hashtable named "symbolspace" is reserved in all language implementations:

# We'll start in Ruby
s=(4+5).to_s

LANGSWITCH to PHP

$s_output='The answer is'.$symbolspace['s']
echo $s_output

LANGSWITCH to JavaScript

s3="From PHP we've got:"+symbolspace('s_output')
document.write(s3)


I believe that the .NET and Java virtual machines allow that sort of functionality, but it won't work out socially, because Java got acquired by Oracle, .NET runs practically only on Windows (no, in 2010 the Mono won't do) and in the end many projects, like Haskell, Python, Perl, etc., have their mainstream implementations on "bare C or C++", not to mention the extra work that it would take to port them and how a single virtual machine implementation would limit developer creative freedom. May be a solution is a set of automatically inserted, language specific, library calls that dump and load the values of the common data structures to and from something like the Memcached.

I'll update/modify this article, if I have changed my mind on this or made a working implementation of the Memcached biased solution. So, this blog post will probably be modified.

No comments: