Gwern Branwen has a great essay, The Existential Risk of Mathematical Error, that deals with the problem of mathematical errors.

As we argued in our probing the improbable paper, complex scientific arguments are untrustworthy because there is a finite probability of mistakes at every step. There are estimates that at least 1% of all scientific papers ought to be retracted. When I explained this to a mathematician friend, he responded that he thought a far smaller fraction of papers in math were this flawed. It seems that he should not have been so optimistic.
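The compounding effect of per-step errors can be sketched as follows (a minimal illustration; the per-step error rates here are assumptions for the example, not figures from the paper):

```python
# Sketch: if each step of an argument independently holds with
# probability 1 - p_error, a chain of n steps is fully sound with
# probability (1 - p_error)**n. The rates below are illustrative.

def p_sound(n_steps: int, p_error: float) -> float:
    """Probability that an n-step argument contains no erroneous step,
    assuming an independent per-step error probability p_error."""
    return (1.0 - p_error) ** n_steps

# Even a tiny per-step error rate compounds quickly as arguments grow:
for n in (1, 10, 100, 1000):
    print(n, p_sound(n, 0.001))
```

The point of the sketch is only that soundness decays geometrically in argument length, which is why long, complex arguments deserve less trust than their individual steps suggest.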

But Gwern points out that type 1 errors, where the claim is true but the proof doesn't work, seem to be more common than type 2 errors, where the claim is false and the proof is erroneous. This is good news, although I am not entirely convinced that we have sufficiently long and unbiased data to make a firm judgement. Another interesting aspect is that type 2 errors have often precipitated mathematical revolutions by showing that intuitive insights are wrong and the world is stranger than we expect.

Unfortunately, I suspect that the connoisseurship of mathematicians for truth might be local to their domain. I have discussed with friends how "brittle" different mathematical domains are, and our consensus is that there are definite differences between logic, geometry and calculus. Philosophers also seem to have a good nose for what works or doesn't in their own domain, but it doesn't seem to carry over to other domains. Moving outside to applied domains, things get even trickier. There doesn't seem to be the same "nose for truth" in risk assessment, perhaps because it is an interdisciplinary, messy domain. The cognitive abilities that help detect correct decisions are likely local to particular domains, trained through experience and perhaps talent (i.e. some conformity between neural pathways and deep properties of the domain). The only thing that remains is general-purpose intelligence, and that has its own limitations.

Maybe a sufficiently smart mind would indeed be able to learn the patterns in such messy domains and develop a nose for them (this might be less a case of superintelligence and more a need for very large amounts of data and learning, but it could also be that very clever representations are needed to handle and learn this much data).

In the meantime, we should learn how to sniff out type 2 errors. They are the most valuable discoveries we can make, because they make us change direction.
