If a problem is not parallelizable, but the computation is extensive enough for the hardware to fail to complete it on time in a single thread, then probably the only way to implement a soft-real-time system is to calculate the answer before it is needed.
The time spent on computation, data transfer, etc., introduces a latency between an input and the output.
In general, whenever there is a latency between an input data and the output data and the parallelization of the computation does not decrease the latency enough, the next thing to try, probably in conjunction with the parallelization, is the ahead-of-time computing.
The idea is that one calculates a whole bunch of answers in parallel and later, when the actual input data is known, picks the answer, the output, that has been derived from the input that is closest to the real input. The rest of the answers are discharged, thrown away.
As of April 2011 I haven't tried it yet, but may be the ahead-of-time computing could be used for speeding applications up even on single-CPU systems. On single-CPU systems one would perform the ahea-of-time computation in the background, in a low priority thread. Tasks that can be run in the background, for example some of the game artificial intelligence and just-in-time compilation, seems to be good candidates for the ahead-of-time computing, because they are not that time critical.
One might even offload some of the background computation to computers on a local area network. If most of the computation results get thrown away anyway and there's not even a guarantee that the bunch of the ahead-of-time calculated answers contains a usable answer, then it does not matter, if some of the answers in the bunch are not found, completed, calculated, on time. That entails that even old-ish and slow-ish computers can contribute to the speed of an application that runs on a modern computer, provided that their computational power to electric power ratio is acceptable.