October 11, 2010

Visualising the scientific state

Information is Beautiful has one of the most promising visualisations about health I know of. Basically, it shows various nutritional supplements and how much scientific evidence there is that they actually work, as a cloud of bubbles. By clicking a filter one can see supplements against a certain condition, and by clicking on a bubble one can find abstracts about the supplement. It is pretty self-explanatory and gives a good intuitive feel.

There are some problems of course. The size of the circles denotes popularity on the internet, which is distracting if you are more interested in what you should consider taking than in the extent of snake oil use. I would prefer to see something like the number of studies or effect sizes instead. Maybe colour would be better used to denote popularity. Ideally the measure of effectiveness would be some automatically generated meta-study statistic, like an estimated effect size, but different studies likely use different methods. I would probably have used some simple measure as the default and then added an option for showing other data.

Generally speaking, what number of studies is needed to rationally decide to take a supplement (or eat/not eat a certain kind of food, use a technology, etc.)? Imagine that the real effect size is X. A study of size N will produce an estimated effect size Y = X + e, where e is noise of amplitude ~k/sqrt(N) (where k depends on how noisy this kind of study is). Let's assume no bias and that everything is Gaussian (in reality the file drawer problem will introduce bias). As more studies of size Ni and effect size Yi arrive, we can average them together to estimate X (in practice meta-analyses use more sophisticated statistical methods). The average of all data points would be distributed as N(Ymean, k^2/Ntotal), where Ymean is the weighted mean and Ntotal the total number of data points. If Ymean*sqrt(Ntotal)/k is large then we ought to do something.
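A minimal sketch of this toy model (function name and parameters are my own, and it uses simple sample-size weighting, not the inverse-variance weighting a real meta-analysis would use):

```python
import math
import random

def pooled_estimate(study_sizes, true_effect=0.2, k=1.0, seed=1):
    """Simulate the toy model above: each study of size N reports
    Y = X + e, with Gaussian noise of amplitude k/sqrt(N), and the
    studies are pooled by sample-size weighting.

    Returns (Ymean, z) where z = Ymean*sqrt(Ntotal)/k; a large |z|
    suggests there is a real effect worth acting on."""
    rng = random.Random(seed)
    ys = [true_effect + rng.gauss(0, k / math.sqrt(n)) for n in study_sizes]
    n_total = sum(study_sizes)
    y_mean = sum(n * y for n, y in zip(study_sizes, ys)) / n_total
    z = y_mean * math.sqrt(n_total) / k
    return y_mean, z
```

With fifty studies of 1000 participants each, the pooled estimate pins down a modest true effect quite tightly and z is huge; with a handful of small pilot studies, z stays in the noise even when some effect exists.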

So, where does this leave us? First, it would obviously be useful to have a tool that collated all relevant studies and calculated things like the above number. In practice this is hard since many studies use different effect size measures, and to properly do meta-analysis you need to check other aspects of study quality.

But as a rule of thumb, if the claimed effect sizes are large when compiled in a meta-analysis you should sit up and take notice. Both because there might be something there, and because it actually matters - taking a nutritional supplement with a proven but minimal effect is a waste of effort.

The number of studies done gives some indication of how large Ntotal is. If we assume a field starts out with a few small studies and then moves to large studies, the quality of data should be expected to rise sharply only after a while (and then we just get a slow increase, since the benefit of super-large studies gets counteracted by the concavity of the square root). A pilot study will have on the order of ~10-100 participants, while a big study will have ~1000+ participants, so it will pin down the estimate of X only about 2 or 3 times more tightly than the small study (sqrt(1000/100) ≈ 3). So this leads to another heuristic: the longer a field has been gathering data, the more reliable it is. This is not just because of the accumulation of studies, but also because disconfirming evidence and new measurement methods will have arrived, increasing robustness. So if a modern meta-analysis agrees with older ones, then we have reason to think the conclusion is pretty reliable (or has gotten stuck in a dogma).
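To put numbers on that diminishing return (k = 1 is an arbitrary choice here; only the ratios matter):

```python
import math

k = 1.0  # study-type noise constant; arbitrary, since only ratios matter
for n in (10, 100, 1000, 10000):
    se = k / math.sqrt(n)  # noise amplitude of a single study of size n
    print(f"N={n:>6}: noise amplitude ~ {se:.4f}")
# Going from N=100 to N=1000 shrinks the noise only by sqrt(10) ~ 3.2x,
# even though the study is ten times as expensive to run.
```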

From a practical standpoint this is still pretty useless. I cannot easily call up all studies on the health benefits of cashew nuts, vitamin D or yoga and get their N, effect sizes and historical distribution compared... yet. I think that will change soon, and that may lead to a much stronger evidence base for a lot of things. Just being able to get funnel plots of studies would be helpful for seeing which way things are leaning. Plenty of potential for misunderstanding and bias too - statistics is a subtle science, some biasing effects can be pretty insidious and many studies compare apples and oranges. But our ability to collate and visualise scientific data efficiently is increasing. The snake-oil visualisation is an early, promising step in bringing it to the public.

Posted by Anders3 at October 11, 2010 10:16 AM