Archive for the ‘Computer Science’ Category
February 9th, 2008
As mentioned, some spare time at the moment is being spent investigating the mistakes other people make using the W3C DOM API. Having now looked at literally hundreds of separate bugs, I’ve begun to construct a coarse categorization for each. I don’t claim there’s anything novel here – in fact, most of us have these internal categories well defined. All I’m doing here is spitting out a semi-minimal collection of them, and giving each a title.
Some types of errors can be argued to fall into multiple groups – this is no bad thing. The idea, eventually, is to speak of how effective certain techniques are at preventing certain types of error.
Errors of ignorance
When you don’t know what you’re doing
A remarkable amount of bugs just boil down to raw incompetence. Harsh words, but in reality, we’ve all been there. Using languages and frameworks before we’ve actually performed some experiments, being thrown into fixing a problem in a system we’ve never seen before, being forced to render comment on a project before we’ve even seen a spec – all old stories for even the most talented developers. Errors of ignorance capture the problems you introduce in this state – using the wrong constructs, not knowing what frameworks actually do, and generalised language idiom failures.
Errors of expectation
When the outside world intrudes.
Most of the time, we’re dealing in some form with input in our programs. If we expect input in a certain format, and receive it in another, this is our fault – we need to be more robust. We might also think we’ll get input in a given format, but it’s subtly different.
We can also have expectations on what other people will provide for us, or the side effects they will have on the world. Any problems induced by our expectations differing from reality, I’m currently calling “Errors of expectation”.
Errors of omission
When you forget to handle something
Often, we know what we’re doing, we intend to do it, and then just forget – perhaps to handle an edge case, perhaps to release a resource. Our natural forgetfulness leads us into “Errors of omission”.
NB: This is currently my least favorite category, which I hope to kill off, but some bugs just fall into this group very naturally.
Errors of intention
When you just got it wrong.
These are what most people would call bugs – when the we, the programmer or designer, just get it wrong. Perhaps we thought the algorithm worked when it didn’t, perhaps our mathematics is just wrong, perhaps our expectations are mutually inconsistent. The code is doing exactly what we wanted – it is, in it’s way, flawless work. It just doesn’t do the right thing because, even having missed all the above classes of errors, we still didn’t know what the right thing was – we’ve committed an “Error of intention”.
This is my first attempt at a concise set of names to concepts I’ve been dealing with for years – it will be interesting to see how they evolve as I try and put it to use.
January 30th, 2008
At the moment, I’m spending spare moments collecting and collating idiomatic failures in Javascript (specifically, failures of DOM manipulation in Javascript). This is, alas, proving harder than I expected – bug database trawling is about as fun as it sounds and, more importantly, the bug databases don’t keep track of developmental bugs.
By idiomatic failures, I’m referring to common mistakes and patterns that cause errors. Suitable examples from other languages would be
- In C, failing to account for the null string terminator
- Buffer overruns, in C and C++
- Off by one errors in pretty much every language
- With C++, delete’ing a new[]’d resource
- In PHP and many other scripting languages, failing to account for the weakly typed comparisons (so you forget that 0, null and the empty string are treated as semantically the same under comparison, but are distinct in your program logic).
These are the building blocks of certain types of program failure –often, when debugging, one of these type of error is the cause.
Of course, I’ve run into these many times myself, but that doesn’t really say much about the commonality of the problem; I may be a particularly poor programmer. So, I’ve been spelunking around bug databases for long running open source projects (i.e. the Javascript libraries behind Ruby on Rails); this is proving somewhat fruitful, but it’s a statistically invalid sample. As mentioned, the bug databases only tend to get populated by bugs in release versions, and then only the bugs that people have bothered to report. If a developer finds a bug whilst he’s developing, these are often not added – even if it took days to track down. Moreover, if they find a bug in a release version themselves, they often quietly fix it. Bugs that only occur rarely and are not repeatable are added almost never.
In a way, this is quite good – the failures I’m seeing are the ones that escaped testing, escaped the eyes of the developers, yet blew up in a users face. These are the hard to see bugs – the most annoying of the species. On the other side, this is quite bad – I miss a huge subset of bugs that fall into my class of interest.
Of course, these types of problems could be considered language defects – stupid mis-features that bite people again and again specifically because they are so artificial. After I have a relatively complete map of these failures, I’m going to move onto logical failures and failures of design that are common within Javascript/DOM – i.e. conceptual black holes that people are repeatedly sucked into, the actual failures of programmers.
August 8th, 2006
OK, so we’ve all been busily tapping away at our keyboards trying to implement the rough design we came up with last time. Let’s have a look at one way of doing it (this is just one way; there are obviously many different methods).
public class CallByFuture<T>
{
public delegate T CallResult();
public static implicit operator T(CallByFuture<T> instance)
{
return instance.Result();
}
public static T operator ~(CallByFuture<T> instance)
{
return instance.Result();
}
private System.Threading.Thread workerThread;
private T result;
public CallByFuture(CallResult func)
{
workerThread = new System.Threading.Thread(delegate(object state) {
((CallByFuture<T>)state).result = func();
});
workerThread.Start(this);
}
public T Result()
{
workerThread.Join();
return result;
}
}
The implementation has tried to stay as close to the design goals as possible; we have a generic class, parameterised by the return type of a delegate which the user will create (either implicitly through an existing function, or explicitly with an anonymous method). The construction spins up a thread which evaluates this delegate, and gives the result back to the enclosing CallByFuture class to be made public via a Result method which will block until the answer is available. For syntactic sugar, we allow the unary bitwise NOT operator as a shortcut to the Result method (so rather than a call to the result type T expressed by the CallByFuture being t.Result().Foo, we have simply (~t).Blah.
The use of operator~ as an “easy conversion” might seem rather strange – after all, the language supports explicit type conversion operators natively. It does not, however, support having both an implicit and explicit conversion operators for the same type; the people who actually read the code will have noticed the implicit conversion snuck in at the top of the class.
The implicit conversion is very useful – it allows you to use a CallByFuture in most cases you want a T without any additional syntax. However. I would personally probably flag its inclusion as “controversial ” in a code review mainly due to the excessive “magic” behind the conversion of complex types – an innocuous usage of a variable might entail a complex evaluation behind the scene (rather than the virtual no-op of just reference passing you were expecting) – it might even throw an exception. In our case, the worst that can happen is that it must wait until the result is evaluated (any exceptions will be thrown on the CallByFuture thread, and we’ll discuss them later). Total worst case, the evaluation locks because the CallByFuture delegate never completes.
Our implementation rules stated we wanted something we could “drop in” to existing programs, and the implicit conversion certainly aids in that so next time, we’ll look at mitigating the problems it raises as well as some usage examples to prove this really does work.
[Edit: Woops - forgot to actually include operator~]
August 3rd, 2006
Last time, we outlined an evaluation strategy known as “Call-by-future” and determined that it might (just might!) be useful in adding an easy to use concurrency feature to mainstream languages. Now we’re resolved to implement it, what features should we be aiming for in our implementation?
- Easy to understand – the solution we come up with must be understandable by as many people as possible. This goes without saying for all code – we don’t want to create an un-maintainable monster that no one touches in their fear.
- Type safety – We don’t want to throw away existing safeguards just to get a bit of convenience, as we’ll likely end up introducing bugs when using it.
- Easy to begin using it in existing codebases – The point of this exercise was to come up with something people can use here and now, and just jump in with. If it requires a whole program redesign or an entirely new way of thinking, we’ve probably not done all that well.
So, how do we envisage this working? To comply with rule number 3 above, we want something that behaves as closely to normal parameter passing as it can – only we control the execution. Ideally, we’d be able to declare Foo using something like void Foo(callbyfuture int a) or similar, to indicate that the parameter would be evaluate using our call by future-by-future evaluation. Alas, we can’t add keywords to the complier – but – we can add types. How about void Foo(CallByFuture a)? We’re using the generics system to construct a CallByFuture type, indicating that the parameter evaluates to an int; this fulfils our 2nd requirement (type safety).
The other thing we need to do is find some method of “taking control” of parameter evaluation. That is, if I use the expression Foo(Bar()) in a C# program, the compiler will invoke it’s normal evaluation strategy and emit code that will first evaluate bar, then pass the result either by value or reference to Foo. We need to be able to step in and say “Whoa, don’t evaluate Bar() just yet – hold off!” so that we can evaluate it in the way we want without confusing the compiler.
Whilst we can’t control expression evaluation in general, we can control the evaluation of a group of expressions via a .net delegate (effectively a function pointer) – any list of expressions wrapped in a delegate, we can call “at will”. Now, manually creating a new delegate for every single parameter we wish to evaluate using call-by-future would be tedious in the extreme, and hence violate both rules 1 and 3 above. Luckily, in C# 2.0, a new feature was added – anonymous delegates. Using these, creating an “expression” whose evaluation we can control becomes as simple as delegate { return (expr); }. Not ideal, but fairly terse (and flexible – as we’ll soon see).
Having discussed how we see this all working, let’s head off and have the boring keyboard bashing session. Return soon for part 3, when we’ll have finished the implementation and be ready to take it for a road test!
July 31st, 2006
Writing code that makes good use of parallel execution techniques in most mainstream languages is not an easy task. Not only must you specifically write code that is safe for such execution, but then you have to manually deal with the laborious task of making it parallel – spinning up threads, waiting for them to finish, synchronising your results and generally making sure that thread A doesn’t tread on thread Bs toes, or that A and B don’t end up waiting for each other.
The upshot of this is that most people just don’t bother, an attitude that whilst understandable will shortly become untenable. The year on year advance in the speed of linear processing is dropping off, and most CPU architectures are leaning towards so called “multi-core” designs (i.e. cramming several processors on one chip). If you want your code to actually run any faster on these processors it darn well be better threaded, or you’re throwing away a large speed increase.
The change in attitudes will take time – oddly for a high tech field, most programmers have great inertia and many are reluctant to absorb new techniques. Inevitably, we’ll need gradual changes to help the transition so how can we make the tedious task of writing multithreaded code easier?
We could change language – systems like Erlang are naturally concurrent, and after a mindset change, your code suddenly takes advantage of parallel execution where possible without the labour involved in manually demanding it. We can also try and retrofit this sort of ability into existing languages through pre-processors and extra libraries – OpenMP is an example that’s making the rounds at the moment.
Alas, these techniques do require a mindset or architecture change (which we already know will take years). Designing for parallelism is not easy, and a good linear program rarely becomes a good parallel program just by wedging in a new library and crossing your fingers. This doesn’t mean we’re defeated though – surely we can think up a few tricks to get some easy freebies in our current systems. So in the spirit of finding one of these tricks, let’s diverge for a moment and consider what happens when you pass a parameter to a function.
“Mainstream languages” (by which I mean C(++), C#, VB.net, Java) provide direct support two parameter evaluation strategies – Call by reference, and Call by value. You know them both – if I call a function int a = 5; Foo(a);, by default the value of 5 is passed to the function as the first parameter – int is a so called “value” type. If I say String a = “Hello World”; Foo(a); then a reference to the string is passed in, rather than the value of the string itself. Whether reference or value types are employed depends on the types being used and hints from programmer but in both cases, the parameters are evaluated before the function is called. So, if I say Foo(ComputeSomething()), the result of the expression ComputeSomething() will first be calculated, then Foo invoked.
If we head off to less popular (but just as valid, and indeed arguably “better”) languages like Haskell, we gain Call by need. In this case, parameters are only evaluated when they are needed. If I said “Foo(ComputeSomething())”, then never used the parameter, “ComputeSomething()” would never be called. This is not only an optimisation (no wasted evaluation), but a significant property. I could pass in expressions that can never successfully evaluate and as long as they are never used, no problem will arise.
How does this fit into a little method for making parallel execution more accessible? Well, let’s introduce another strategy – Call by future. What if, rather than parameters being evaluated before they were called, they were evaluated whilst the call was going on? Of course, when the result of the evaluation was needed the function would have to sit and wait for it to finish, but if the function was working on something else before it needed the parameter, then the evaluation can continue in the background whilst the function is working. Graphically, we can view the execution sequences as follows.

It’s not a panacea – one still has to design for some of the problems concurrency introduces. However, it is easy to add to existing systems, and can be used safely without much effort.
So, how can we introduce this to our mainstream languages? After all, there doesn’t seem to be anything too special going on – the evaluation of the parameter is just run in a separate thread, and then the result is retrieved and used when needed. If we can claim control of parameter evaluation and then find a syntactically friendly way to wrap this concept, we’ll have a useful tool in the arsenal of weapons helping us fight the “cheap route to currency” battle. Tune in next time to see how we’ll add a version of Call-By-Future support to .Net 2.0.