Saturday, June 13, 2009

Evaluation and testing

If your contribution to knowledge is a better way of doing something, coming up with the idea (and maybe implementing it somehow) is only half the battle. The real work will come with evaluation, and this will need a methodology all on its own.
Most new algorithms are tested against data sets found in the literature, and the result is interesting if your idea in some sense performs better than its predecessors on these tests. But there are often problems with this approach, and I have occasionally seen a conspiracy of silence where everyone can see that the comparison is not entirely fair. After all, if your new algorithm is suited to a particular class of problem not previously addressed, then why test it in some previous or different scenario? There are difficulties in using data from a different problem scenario, or in comparing with an algorithm that was actually tackling a different issue. This sort of test is rarely more than the researcher’s equivalent of “Hello world”. The conspiracy of silence arises because the test is standard in the literature, and is used for convenience even though everyone knows that the data is unrealistically simplistic, or has been cleaned to remove any real-world difficulties.
On the other hand, artificial data, designed to exhibit the sort of issue that your idea helps to solve, has real value. It suits the sort of hypothesis that starts “Problems reported with this aircraft control system may be associated with …”. Your investigation is then as much about exploring some peculiar feature that might occasionally occur in the data, and how the existing algorithms and yours respond to such a feature, as it is about raw performance. The aim is to see whether the hypothesised feature was the source of the actual difficulty.
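As a rough illustration of this kind of experiment, here is a minimal Python sketch. Everything in it is an assumption made for the example: the "peculiar feature" is taken to be occasional large spikes in otherwise well-behaved readings, the "existing algorithm" is a plain mean, and the "new idea" is a median. The names `make_data`, `mean_abs_error` and `run_experiment` are hypothetical. The point is only the shape of the test: generate data with and without the feature, and see how each method responds.

```python
import random
import statistics


def make_data(n, spike_rate, rng, true_value=10.0):
    """Noisy readings around true_value; a fraction spike_rate become large spikes."""
    data = []
    for _ in range(n):
        if rng.random() < spike_rate:
            # The hypothesised feature: an occasional wildly wrong reading.
            data.append(true_value + 100.0)
        else:
            data.append(true_value + rng.gauss(0.0, 1.0))
    return data


def mean_abs_error(estimator, spike_rate, trials=200, n=101, seed=0, true_value=10.0):
    """Average absolute error of an estimator over many artificial data sets."""
    rng = random.Random(seed)  # fixed seed so both methods see the same data
    errors = []
    for _ in range(trials):
        data = make_data(n, spike_rate, rng, true_value)
        errors.append(abs(estimator(data) - true_value))
    return statistics.fmean(errors)


def run_experiment(spike_rates=(0.0, 0.05)):
    """Compare the baseline (mean) and the candidate (median) with and without spikes."""
    results = {}
    for rate in spike_rates:
        results[rate] = {
            "baseline": mean_abs_error(statistics.fmean, rate),
            "candidate": mean_abs_error(statistics.median, rate),
        }
    return results


if __name__ == "__main__":
    for rate, row in run_experiment().items():
        print(f"spike rate {rate}: {row}")
```

If the two methods behave much the same on clean data but part company once the spikes are injected, that is some evidence that the feature, rather than something else, explains the difficulty. It says nothing yet about whether real data actually contains such spikes.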
If your contribution arises from solving a real-world problem, it will probably need a lot more work to collect real-world data and draw real-world conclusions. Space and time considerations limit most PhD theses, and all conference papers, to artificial, toy examples. Maybe, though, some preliminary data can be analysed within the PhD (and lead to a job with the company with the problem so you can work on the real data), and the evaluation part can be beefed up with actual comments about the value of the contribution from those more familiar with the real-world problem.
In addition, the implementation of the idea in your PhD will probably have the nature of a prototype, which will need to be re-implemented within a real-world control system. Again, space and time considerations make it unlikely that real production software will be used in your PhD or any publication resulting from it. But actual adoption of the approach within the industrial process will count as a complete proof of the value of your contribution, and any progress in this direction should go in your thesis.
