Saturday, June 27, 2009

Supervision

Whose research is it? Yours. The basic idea will be something that the supervisor is an expert in (or you need another supervisor), but since you are making a new contribution to knowledge, at the end of the process you will be the world expert in your subject.
Your supervisor nevertheless plays hugely important roles throughout the process. At the start, they will help you with the relevant literature and established approaches. As the research progresses, they will help with methodology, with planning the research, and with phrasing your research questions. Once you have parts of your thesis in draft, they will provide an invaluable critique of the flow of argument and of the construction of your thesis as a piece of rationally argued writing. Your supervisor will also play a crucial role in selecting your external examiners, and will be your supporter, and your eyes and ears, during the viva.
Above all, throughout the process, they are following your journey, engaging in the discussions, playing the part of a reader of your thesis and papers, reacting as your audience and your examiners might to the parts of your work that are new and surprising, so that you can fine-tune your arguments and make sure there are no loose ends.
The relationship between student and supervisor can sometimes be stormy. It is always a two-way process, and a second supervisor can sometimes play a useful role in getting things back on track. It can and should be inspiring.

Saturday, June 20, 2009

All models are wrong

… but some models are useful (Box et al., 2009, p. 61). What makes a model useful? Some theories of science have offered grand descriptions in terms of prediction, explanation and so on, but it really comes down to consensus. Today (June 20th) it is reported that the British Government has decided that the spelling rule “i before e except after c” should no longer be taught in schools because the large number of exceptions made it useless. Such a rule is of course one of observation rather than a law of nature, but on close inspection it is easy to find hidden qualifications to any law you care to mention.
Until very recently, many in the scientific community imagined that they were discovering the truth about how the physical world works. Whewell (1833, p. 256) quotes Lagrange’s opinion “that Newton was fortunate in having the system of the world for his problem, since its theory could be discovered once only”. Now a lake can be discovered only once, but systems are merely constructed, and many refinements and re-interpretations will be possible. Twentieth-century physics revealed unimaginable strangeness, needing many alternative and conflicting models for quantum mechanics, diffraction, cosmology and so on, and there was some useful criticism of old notions such as “final causes” (basically, boundary conditions at infinity).
Many researchers in the late nineteenth and early twentieth century searched only for natural laws expressible in terms of differential equations. Since this search followed so closely after the development of the calculus, it appears with hindsight that these men with a new hammer suddenly saw nails everywhere.
The same hindsight opens our eyes to the serious untruths in their “natural laws”: on close inspection a natural law does not actually apply everywhere, but only (um…) where it applies (e.g. in the absence of discontinuities, or in a neighbourhood of the origin). To be fair, talk of truth or laws was mostly a habit of speech: the models described in these laws are useful in telling us what to expect in the sort of situation for which the model was designed. In other situations, or on close inspection, we might need a different or more refined model.
As with final causes, or the ether, models can be useful even when they conflict with other models (seem counter-intuitive) or don’t fit with current ideas of causality. For example, classical field theory remains useful, even though we know that action at a distance is impossible, and that there is a better model based on radiation. Non-existent lakes will eventually be removed from atlases, but models will continue as long as somebody finds them useful.

References:
Box, G. E. P.; Luceño, A.; Paniagua-Quiñones, M. d. C. (2009): Statistical Control by Monitoring and Feedback Adjustment, 2nd ed. (Wiley). ISBN 0470148322
Whewell, W. (1833): Astronomy and General Physics, Considered with Reference to Natural Theology

Saturday, June 13, 2009

Evaluation and testing

If your contribution to knowledge is a better way of doing something, coming up with the idea (and maybe implementing it somehow) is only half the battle. The real work will come with evaluation, and this will need a methodology all of its own.
Most new algorithms are tested against data sets found in the literature, and things get interesting if, in some sense, your idea performs better than its predecessors in these tests. But there are often problems with this approach, and I have occasionally seen a conspiracy of silence where anyone can see that the comparison is not entirely fair. After all, if your new algorithm is suited to a particular class of problem not previously addressed, why test it in some previous or different scenario? There are difficulties in using data from a different problem scenario, or in comparing with an algorithm that was actually tackling a different issue. This sort of test is rarely more than the researcher’s equivalent of “Hello world”. The conspiracy of silence arises because the test is standard in the literature and is used for convenience, even though everyone knows that the data is unrealistically simplistic, or has been cleaned to remove any real-world difficulties.
On the other hand, artificial data, designed to exhibit the sort of issue that your idea helps to solve, has real value. It addresses the sort of hypothesis that starts “Problems reported with this aircraft control system may be associated with …”: your investigation explores some peculiar feature that might occasionally occur in the data, and the responses of the existing algorithms and of yours to such a feature, to see whether the hypothesised feature was the source of the actual difficulty.
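To make the idea concrete, here is a minimal sketch (in Python, using NumPy) of the kind of artificial-data experiment meant here. Everything in it is a hypothetical stand-in: the “existing” and “new” detectors are deliberately simple, and the planted level shift merely represents whatever peculiar feature your hypothesis names. The point is the experimental design, not the algorithms: the same code is run with and without the planted feature, so any difference in behaviour can be attributed to that feature.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_series(n=500, with_step=False):
    """Artificial sensor signal: white noise, optionally with a sudden
    level shift planted halfway through (the hypothesised 'peculiar feature')."""
    y = rng.normal(0.0, 1.0, n)
    if with_step:
        y[n // 2:] += 4.0  # the planted feature
    return y

def global_detector(y, z=3.0):
    """Stand-in for an existing algorithm: flag points more than z
    global standard deviations from the global mean."""
    return np.abs(y - y.mean()) > z * y.std()

def windowed_detector(y, window=50, z=3.0):
    """Stand-in for the 'new' algorithm: compare each point with a
    trailing window, so a sudden shift shows up as a short run of
    flags just after it occurs."""
    flags = np.zeros(len(y), dtype=bool)
    for i in range(window, len(y)):
        w = y[i - window:i]
        flags[i] = abs(y[i] - w.mean()) > z * w.std()
    return flags

# Run both algorithms on data with and without the planted feature,
# so that any difference in behaviour can be traced to that feature.
for label, with_step in [("clean data", False), ("planted shift", True)]:
    y = make_series(with_step=with_step)
    print(f"{label:13s}  existing flags: {global_detector(y).sum():3d}"
          f"  new flags: {windowed_detector(y).sum():3d}")
```

In a real thesis the stand-ins would be replaced by the actual algorithms under comparison, and the data generator would be documented carefully, since the conclusions are only as good as the realism of the planted feature.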
If your contribution arises from solving a real-world problem, it will probably need a lot more work to collect real-world data and draw real-world conclusions. Space and time considerations limit most PhD theses, and all conference papers, to artificial, toy examples. Maybe, though, some preliminary data can be analysed within the PhD (and lead to a job with the company that has the problem, so that you can work on the real data), and the evaluation part can be beefed up with actual comments about the value of the contribution from those more familiar with the real-world problem.
In addition, the implementation of the idea in your PhD will probably have the nature of a prototype, which will need to be re-implemented within a real-world control system. Again, space and time considerations make it unlikely that real production software will be used in your PhD or any publication resulting from it. But actual adoption of the approach within the industrial process will count as a complete proof of the value of your contribution, and any progress in this direction should go in your thesis.

Saturday, June 6, 2009

Ethics and bad research

Like “health and safety”, the phrase “research ethics” tends to elicit weary groans from many researchers. A full discussion is obviously out of place in this blog, but it seems obvious that research should not do actual harm without a very convincing argument. What I would like to focus on is whether bad research is ever ethical.
By bad research I mean research that is poorly thought out, where the data cannot reliably support the sort of investigation for which they were collected. People have used up time, and incurred costs, for no benefit. In my view such research is always unethical, since its value (roughly zero) does not justify the trouble it has taken. It may cause actual harm, possibly even to the whole process of research, if enough people find it ridiculous. Future funding, or the cooperation of potential subjects, may be affected if research does not seem to be useful.
You therefore need to explain why your research really is useful, and why your subjects have to answer a long list of strange-looking questions. This explanation is for when you approach potential subjects, supervisors and sources of funding. You need to be open about what your research is about and what its expected benefits are. You must not use any deception in your approach to any of these people. Nor can you say (yet) what the conclusions will be. Sometimes (rarely?) it will not be possible to tell your subjects what the hypothesis is without compromising the research outcomes, but you must be able to explain the expected area of benefit of the research and why they have been approached. Also, you should never collect data without discussing these aspects first.
An anecdote may help explain this point. Suppose you are at a management training course and you are given a set of objectives to prioritise. It is late in the day, and they all look important, so you just take the first six and make up some spurious reasons for your choice. You could well be irritated, in two ways, if these priorities are fed back to senior management in your company. First, your careless reasoning may be subjected to more scrutiny than you would like, and this reflects badly on you. More importantly, you fear that these may be the wrong priorities; if you had known they would be used, you would have taken more care over them. Because of the careless way the data has been collected, you do not know whether your performance will be unfairly judged, or whether the organisation will now change its behaviour as a result of bad data.
The selection of sources of data, whether from human subjects or more generally, requires the greatest care, and has been discussed in an earlier entry in this blog, as the pattern of selection will have a crucial bearing on the scope of validity of your conclusions. What data you collect, and how, will limit its interpretation; this too has been discussed in the entry on research methodology. You probably won’t need to get ethical clearance unless your research involves living subjects, but the application you make before you start will provide a concise overview of the plan for your research and of how the conclusions will be drawn. Even if you don’t need ethical clearance, you should protect your research by thinking these things out.