Trusting the world of floating point

A lot of the everyday calculations we rely on depend on “floating point arithmetic” – in other words, an approximation of accurate mathematics rather than actual accurate mathematics.
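A classic way to see this for yourself – a minimal sketch, using Python’s standard library rather than anything Riscyforth-specific – is to add two decimal fractions and inspect what the hardware actually stored:

```python
from decimal import Decimal

# In IEEE 754 double precision, neither 0.1 nor 0.2 is stored
# exactly, so their sum is not exactly 0.3
print(0.1 + 0.2 == 0.3)   # False
print(0.1 + 0.2)          # 0.30000000000000004

# Decimal(float) reveals the exact value the double really holds
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
```

The last line shows the true binary value behind the innocent-looking `0.1` – slightly more than one tenth.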

For the last six weeks or so I have been working on bringing floating point arithmetic to “Riscyforth” – my assembly-based implementation of Forth for Risc-V “single board computers”.

I could have just used software written by other people and piped my calls through to that – but this is as much a learning process and a challenge for me as it is ever going to be about production code. And it turns out floating point is far from simple.

Typically floating point numbers are represented as though in scientific notation, but using powers of 2 rather than powers of 10. So to go from the easy-to-write base 10 number $10^{-1}$ to a binary representation there is a fair amount of long (binary) division required – and in this case, because $\frac{1}{10}$ can only be represented by an infinite series of fractions with powers of two as denominators, an approximation is required:

$\frac{1}{10} = \frac{1}{16} + \frac{1}{32} + \frac{1}{256} + \frac{1}{512} + \frac{1}{4096} + \frac{1}{8192}+...$

The error in that partial sum – which requires 13 bits to store – is “just” 0.024% (the sum comes to exactly 0.0999755859375 – exact, because a finite sum of binary fractions has a terminating decimal expansion). But that would still be enough to miss a target at the distance of the Moon by about 94km.
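That partial sum and its error can be checked with exact rational arithmetic – a sketch using Python’s `fractions` module, taking roughly 384,400 km as the Earth–Moon distance (an assumed figure for illustration):

```python
from fractions import Fraction

# The six-term partial sum for 1/10 in negative powers of two
terms = [16, 32, 256, 512, 4096, 8192]
partial = sum(Fraction(1, d) for d in terms)
print(partial)            # 819/8192, i.e. exactly 0.0999755859375

# Relative error of the truncated expansion
rel_error = (Fraction(1, 10) - partial) / Fraction(1, 10)
print(rel_error)          # 1/4096, about 0.024%

# Scaled to the Moon's distance (~384,400 km, assumed)
print(float(rel_error) * 384_400)   # ~93.8 km off target
```

Pleasingly, the relative error comes out as exactly $2^{-12}$: the binary expansion of $\frac{1}{10}$ repeats with period 4, and truncating at the 13th bit cuts it at a period boundary.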

Modern engineering is typically guided by floating point approximations rather more accurate than this – but they are still approximations.