7 Apr 2011

Ahead-of-time Computing

If a problem is not parallelizable, but the computation is extensive enough for the hardware to fail to complete it on time in a single thread, then probably the only way to implement a soft-real-time system is to calculate the answer before it is needed.

The time spent on computation, data transfer, etc., introduces a latency between an input and the output.

In general, whenever there is a latency between an input data and the output data and the parallelization of the computation does not decrease the latency enough, the next thing to try, probably in conjunction with the parallelization, is the ahead-of-time computing.

The idea is that one calculates a whole bunch of answers in parallel and later, when the actual input data is known, picks the answer, the output, that has been derived from the input that is closest to the real input. The rest of the answers are discharged, thrown away.

As of April 2011 I haven't tried it yet, but may be the ahead-of-time computing could be used for speeding applications up even on single-CPU systems. On single-CPU systems one would perform the ahea-of-time computation in the background, in a low priority thread. Tasks that can be run in the background, for example some of the game artificial intelligence and just-in-time compilation, seems to be good candidates for the ahead-of-time computing, because they are not that time critical.

One might even offload some of the background computation to computers on a local area network. If most of the computation results get thrown away anyway and there's not even a guarantee that the bunch of the ahead-of-time calculated answers contains a usable answer, then it does not matter, if some of the answers in the bunch are not found, completed, calculated, on time. That entails that even old-ish and slow-ish computers can contribute to the speed of an application that runs on a modern computer, provided that their computational power to electric power ratio is acceptable.

11 Feb 2011

How to Bootstrap a Threading API

This article is not intended to be a full-blown tutorial. It is meant to be a relatively concise set of notes, quick reference, with some rehearsal to get the reader's mind to the "track".

Just like any Boolean logic operation (read: digital circuit) can be assembled from Boolean inversion and conjunction, all of the threading API, except the pausing and awakening of sleeping threads, can be assembled if a thread-blocking sleep function exists and a mechanism for forking threads exists.

Once the sleep and fork exist, the rest is all subject to subjective taste, except the pausing and awakening of sleeping threads. The remaining part of this article is just one, somewhat inefficient and largely improvable, version of "the rest".

To create a threading API that does not support the pausing and awakening of sleeping threads I add 2 things:

1) critical sections
2) thread collision detection

The critical sections are used for maintaining consistency of the values of a set of variables.

For the sake of rehearsal, or getting "in the mood": if the set of variables consists of an organism type and its coating, let's say, organism==giraffe, coating==fur, then if the set gets updated to a parrot, i.e. first organism:=parrot, and after that coating:=feathers, then before the value of the coating is updated, there's a situation, where organism==parrot, but the coating==fur. It's better that no thread reads the inconsistent values of the pair, (organism,coating), before the whole pair has been updated and that's where the critical sections become useful.

Thread collision detection is for handling edge-situations, where no thread has yet locked the critical section, but more than one thread wants to enter it.

To allow the capabilities of multi-CPU hardware to be leveraged as much as possible, one should avoid putting the threads to a wait, idle, state. To minimize the waiting time of threads, one wants to maximize the number of threads that can concurrently access the shared set of variables, i.e. one wants to allow as many threads to concurrently enter the critical sections as possible. One way to do that is to use 2 types of critical sections:

A) critical section for only reading
B) critical section for reading and writing


The idea is that it's OK for more than one thread to concurrently read the same set of shared variables, as long as there's no thread writing to the set of shared variables at that very same moment. On the other hand, no thread should be reading the set of shared variables while some other thread is writing to them. Neither should there be more than one thread concurrently writing to the set of shared variables.

At the implementation level it means that each set of the shared variables has 2 locks: one set and released by the read-only critical sections and the other set and released by the read-write critical sections. Both of the critical section implementations read both of the locks.

One possible way to implement the thread collision detection is like this:

b_done=false
while(!b_done)
        i_lock++
        if(i_lock==1){
                enter critical section
                i_lock--
                b_done=true
        } else {
                i_lock--
                sleep some semi-random time
        } // else
} // while

The sleeping time of different blocked threads has to differ, preferably change at every iteration, because otherwise they might collide with each other at every iteration, resulting an infinite loop. One way to assure the sleeping time difference is to use some semi-random numbers. This entails that a programming language can have guite functional threading support without having a statistically proper rand function.

11 Oct 2010

Programming Language Support to Succinct Code

In general, a programming language determines a general idea (semantics), how input data is transformed to output data, and a syntax, the rules how the the contemplation that adheres to the general idea, is written down. APIs don't usually determine syntax, but they do determine semantics.

Leaving a long introduction to aside, the point of this blog post is to point out that in addition to the number of compulsory semicolons, commas, braces, keywords, etc. the succinctness depends heavily on the statistical profile of the code. The number of characters that a construct, let's say, loop clause, function declaration keyword, call to some API subroutine or operator, etc. contributes to an application code depends on the number of times the construct is used in the application.

For example, a code that is written in Maple or Mathematica probably has considerably more calls to various symbolic calculation routines and statistics routines than software that controls industrial robots or runs a customer self service web site. This means that an inefficient, clumsy, API of more "complex" math routines does not have much of an impact on the succinctness of the code of the web application, but it has heavy impact on math intensive data analysis related code.

The conclusion is that one can not assess the succinctness support of a programming language or API without actually measuring the usage frequencies of its different parts in an "average" real life application.

Secondary conclusion is that API-s have to be updated, changed, after they have been in use for a little while and more is known about their "real life usage".

Third conclusion is that if some party comes up with some new technology, programming language, API, etc., then the aiding capability of that technology probably depends on the application domain. It's a bit like comparing antibiotics with a pain killer: taking a pain killer to treat infection might not give as good results as the antibiotics would give and vice versa.

25 Sept 2010

How to Differentiate Oneself in Business

Actually, as of September 2010 I don't know a firm answer to that question, but I think that the contemplation that leads to the hypothesis is probably inspiring.

Suppose we compare passenger cars that fit 2 to 5 passengers. Almost all of them have a staring wheel, 4 tyres, they're more or less of the same size and get the passengers from point A to point B. Usually they all get the job done, technically speaking, they're practically all the same, just switch the car on and drive. In an average non-highway commuting route the cars have roughly the same speed. Probably the skills of their development teams are also equal: an engineer from one of the companies can probably move to another company and be able to handle the job after about a year of learning.

However, obviously, there is a very big difference between Fiat, Mercedes, BMV, Toyota and Ferrari, despite the fact that they're all just cars that take roughly the same amount of fuel and get their passengers from point A to point B.

My hypothesis is that one way to differentiate something is to differentiate it in style, bias, not functionality or technically fundamental properties(i.e. a flying car or an amphibious car). There are different ways of achieving the same goal and there's always the matter of taste. It doesn't make sense to argue that apple juice is somehow better, more innovative, more effective, more functional, etc., than pear juice, but the two definitely differ and people have their preferences.

If the hypothesis held, the conclusion for me would be that even though I'm just an "average" software engineer/developer/whatever-nice-name and even though it probably is possible to find some other person that has at least as good or much "better" (whatever that means) skill set than I have, I can differentiate myself by the style of my software. For example, I can create software that solves a common problem that is already solved by my "competitors", but I can target people with certain habits or preferences, just like BMV and Toyota do it with their products.

Actually, it is also extremely simple to come up with totally UNIQUE products, ideas, by taking something that already exists and giving it A BIAS.  On can just pick anything one likes and quite randomly pick some field, let's say, gardening, military, medical aid, etc., and combine the two. Probability that a thing like this already exists on the market, is pretty low or if it's not low enough, combine something existent with 2 biases in stead of 1, etc.

Examples:
scissors -> gardening scissors,
tongs -> nut crackers,
cup -> coffee cup.

What regards to the business success of unique things, then that's probably subject to marketing and trial and error. After all, even the usefulness of cellphones was considered questionable at their early days. At early days of computing very few could imagine, how a house-wife or a teenager uses a computer, a personal one even, or that a granny needs a strong, military grade, cryptography for paying her utility bills from her living room, over a computer network.

For a halve joke I point out that what I've been describing in my current post is in line with the saying that every great scientist stands on the shoulders of giants. (I'm not a scientist, but development work is what software developers do.)


++++++++++++++++++++++++++

Added in May 2013: "Everything has been Done!

21 Sept 2010

About Domain Specific Algorithms and Software Development Labor

The point of this blog post is to claim that by inventing new programming languages it is not possible to eliminate the labour of describing domain specific algorithms.

For example, an assertion that a file that does not exist, can not be written to, holds regardless of programming language. It is possible for the language designer or library author to make sure that the file is automatically created, if it does not exist, but this is already a labour of describing a domain specific algorithm.

Another example is matrix multiplication. A programming language with its semantics determines the notation, i.e. multiply(a,b) or aTIMESb or a.times(b) or whatever, but someone, be it the hardware designer, programming language designer or a library designer, has to describe the essence of the matrix multiplication at least somehow. The various kinds of implementation details and optimizations do not change the fact that someone has to do actual development labour to get the multiplication to work.

In practice it entails that whenever a new programming language is designed, a new "standard library" is created, the very same work of describing the domain specific algorithms has to be done all over again, unless the previous descriptions are reused.

From intuitive point of view, and yes, according to my personal subjective taste and impressions, software development productivity depends a lot on the notation and semantics of the programming language. As of 2010 I believe that the preferences for a programming language (read: preferences for notation and semantics) is subjective and depends on the chooser's background.

By noticing that most programming languages that are in use in 2010, share a subset of common basic data types, namely arrays/vectors, hashtables, whole numbers, rational numbers, boolean values, I have a hypothesis that a probably withstandable, but imperfect, solution to the reuse and the notation-switch problem is to share memory space (the common data structures) between different programming language implementations. A program would start in one language, then, without exiting, switch to another programming language and then switch to whatever other programming language. The programming language switch regions might, probably would, contain data structure mapping code.

A vague, raw, preliminary, syntax example, where a hashtable named "symbolspace" is reserved in all language implementations:

# We'll start in Ruby
s=(4+5).to_s

LANGSWITCH to PHP

$s_output='The answer is'.$symbolspace['s']
echo $s_output

LANGSWITCH to JavaScript

s3="From PHP we've got:"+symbolspace('s_output')
document.write(s3)


I believe that the .NET and Java virtual machines allow that sort of functionality, but it won't work out socially, because Java got acquired by Oracle, .NET runs practically only on Windows (no, in 2010 the Mono won't do) and in the end many projects, like Haskell, Python, Perl, etc., have their mainstream implementations on "bare C or C++", not to mention the extra work that it would take to port them and how a single virtual machine implementation would limit developer creative freedom. May be a solution is a set of automatically inserted, language specific, library calls that dump and load the values of the common data structures to and from something like the Memcached.

I'll update/modify this article, if I have changed my mind on this or made a working implementation of the Memcached biased solution. So, this blog post will probably be modified.