Merge pull request #273 from daemanos/new-overflow

Add new overflow-checking primitives

Motivation

MLton currently implements checked arithmetic with specialized primitives. These primitives are exposed to the Basis Library implementation as functions that implicitly raise a primitive exception:

    val +! = _prim "WordS8_addCheck": int * int -> int;

In the XML IR, special care is taken to "remember" the primitive exception associated with these primitives in order to implement exceptions correctly. In the SSA, SSA2, RSSA, and Machine IRs, these primitives are treated as transfers, rather than statements.

This pull request provides a possibly better implementation of these operations as simple primitives that return a boolean:

    val +! = _prim "Word8_add": int * int -> int;
    val +? = _prim "WordS8_addCheckP": int * int -> bool;
    val +$ = fn (x, y) => let val r = +! (x, y)
                          in if +? (x, y) then raise Overflow else r
                          end

This would eliminate the special cases in the XML IR and in the SSA, SSA2, RSSA, and Machine IRs, where the primitives would be treated as statements.

Other compilers provide overflow checking via boolean-returning functions:

* https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
* https://llvm.org/docs/LangRef.html#arithmetic-with-overflow-intrinsics

Implementation

This patch refactors the primitive checked-arithmetic operations such that a suffix `?` represents a new overflow-checking predicate, a suffix `!` represents the non-overflow-checking variant, and a suffix `$` represents the overflow-checking variant (mnemonic: `$` for "safe" or "expensive").

The behavior of the new `$`-operations is controlled by a compile-time constant, `MLton.newOverflow`. When set to false (the default), the `$`-operations make use of the old-style implicit-`Overflow` primitives.
When set to true, the `$`-operations are implemented as an `if`-expression that branches on the result of the corresponding `?`-operation and either raises the `Overflow` exception or returns the result of the corresponding `!`-operation. Finally, the bare operation is aliased to either the `$`-form (with overflow detection enabled) or the `!`-form (with overflow detection disabled). Essentially:

    val +! = _prim "Word8_add": int * int -> int;
    val +? = _prim "WordS8_addCheckP": int * int -> bool;
    val +$ = if MLton.newOverflow
                then fn (x, y) => let val r = +! (x, y)
                                  in if +? (x, y) then raise Overflow else r
                                  end
                else fn (x, y) => ((_prim "WordS8_addCheck": int * int -> int;) (x, y))
                                  handle PrimOverflow => raise Overflow
    val + = if MLton.detectOverflow then +$ else +!

Note that checked arithmetic built from the `!`- and `?`-operations is careful to perform the `!`-operation before the `?`-operation.

With the native-codegens, a new peephole optimization combines the separate unchecked-arithmetic operation and checked-arithmetic-predicate operation into a single instruction. For the C-codegen, the new checked-arithmetic-predicate primitives are translated to uses of the `__builtin_{add,sub,mul}_overflow` intrinsics (which improves upon the previous explicit conditional checking, but requires gcc 5 or greater). Similarly, for the LLVM-codegen, the new checked-arithmetic-predicate primitives are translated to uses of the `{sadd,uadd,smul,umul,ssub,usub}.with.overflow` intrinsics. For both the C- and LLVM-codegens, it is expected that these intrinsics will be combined with the separate unchecked-arithmetic operation.

In addition, the `RedundantTests` optimization has been extended to eliminate the overflow test when adding or subtracting 1 with the new primitives.
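
As a rough illustration of the split described above, the following C sketch models the `!`-, `?`-, and `$`-forms for 8-bit signed addition using `__builtin_add_overflow` (GCC 5 or later, or Clang). The names `add_unchecked`, `add_overflows`, `add_checked`, and the `checked_i8` struct are hypothetical and chosen for this example only; this is not the code that MLton's C-codegen emits. An `overflow` flag stands in for raising `Overflow`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: `overflow` stands in for raising Overflow. */
typedef struct {
    bool overflow;
    int8_t value;
} checked_i8;

/* "!"-form: wrapping addition. Computed on unsigned values because signed
   overflow is undefined in C. The final narrowing to int8_t is
   implementation-defined in ISO C; GCC and Clang reduce modulo 2^8. */
static int8_t add_unchecked(int8_t x, int8_t y) {
    return (int8_t)(uint8_t)((uint8_t)x + (uint8_t)y);
}

/* "?"-form: predicate only. The wrapped result written by the builtin
   is discarded here. */
static bool add_overflows(int8_t x, int8_t y) {
    int8_t ignored;
    return __builtin_add_overflow(x, y, &ignored);
}

/* "$"-form: perform the "!"-operation first, then the "?"-operation,
   mirroring the order used in the SML definition of +$. */
static checked_i8 add_checked(int8_t x, int8_t y) {
    int8_t r = add_unchecked(x, y);
    checked_i8 out = { add_overflows(x, y), r };
    return out;
}
```

With this shape, a C compiler is free to fuse the separate addition and predicate into one add-and-test-flag sequence, which is the combination the description above expects from the C- and LLVM-codegens.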

Performance

The native-codegen peephole optimization and `RedundantTests` have been mostly sufficient to keep performance on par with the older checked-arithmetic primitives, and in some cases performance has even significantly improved. Below is a summary of the exceptional runtime ratios in the different codegens (both positive and negative):

| Benchmark       | Native | LLVM | C    |
|-----------------|--------|------|------|
| even-odd        | 1.00   | 1.00 | 1.09 |
| hamlet          | 0.98   | 0.90 | 0.93 |
| imp-for         | 0.99   | 1.50 | 0.46 |
| lexgen          | 0.92   | 1.31 | 1.24 |
| matrix-multiply | 0.99   | 1.00 | 0.87 |
| md5             | 1.06   | 1.01 | 0.97 |
| tensor          | 1.01   | 1.00 | 0.57 |

No benchmarks were consistently worse or better on all codegens. For example, the `imp-for` benchmark performed exceptionally badly with the LLVM codegen, but was much faster with the C codegen and about even with the native codegen. For this particular benchmark, the cause of the slowdown with the LLVM codegen has yet to be discovered. Similarly, the cause of the slowdown in `lexgen` with the C- and LLVM-codegens is unknown. For the `md5` benchmark, on the other hand, the cause of the slowdown with the native codegen seems to be a failure to eliminate common subexpressions in certain circumstances, which can potentially be improved in the future. Improvements in the C-codegen are likely to be due to the better overflow checking with builtins.

Future work

The `CommonSubexp` optimization currently handles `Arith` transfers specially; in particular, the result of an `Arith` transfer can be used in common `Arith` transfers that it dominates. This was done so that code like:

    (n + m) + (n + m)

can be transformed to

    let val r = n + m in r + r end

With the new checked-arithmetic-predicate primitives, the computation of the boolean value may be common-subexpression eliminated, but the `Case` discrimination will not be.
This forces the boolean value to be reified and to be discriminated multiple times (though perhaps `KnownCase` could eliminate subsequent discriminations). Extending `CommonSubexp` to propagate flow-sensitive relationships at `Case` transfers to the dominated targets could improve the performance of `md5` with `MLton.newOverflow true` and potentially improve performance elsewhere as well (e.g., by propagating the results of comparisons as well).

Once all performance slowdowns with `MLton.newOverflow true` have been eliminated, it would be desirable to remove the old-style implicit-`Overflow` primitives and `Arith` transfers. This would eliminate many previous instances of special-case code to handle these primitives and transfers.

Finally, it may be worth investigating an implementation of the checked-operations via

    val +$ = fn (x, y) => let val b = +? (x, y)
                              val r = +! (x, y)
                          in if b then raise Overflow else r
                          end

rather than

    val +$ = fn (x, y) => let val r = +! (x, y)
                          in if +? (x, y) then raise Overflow else r
                          end

The advantage of calculating the boolean first is that when `x` (or `y`) is a loop variable and `r` will be passed as the value for the next iteration, then both `x` and `r` could be assigned the same location (pseudo-register or stack slot). `x` and `r` cannot share the same location when the boolean is calculated second, because `x` and `r` are both live at the calculation of the boolean. See #218. This would require reworking the native-codegen peephole optimization.
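
To make the `CommonSubexp` concern above more concrete, here is a hand-written C sketch of the `(n + m) + (n + m)` example using `__builtin_add_overflow`. The names `checked_i32`, `double_sum_naive`, and `double_sum_shared` are hypothetical; the code only illustrates the difference between testing the overflow flag of `n + m` twice and testing it once, and is not output from any MLton pass. An `overflow` flag stands in for raising `Overflow`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: `overflow` stands in for raising Overflow. */
typedef struct {
    bool overflow;
    int32_t value;
} checked_i32;

/* Naive shape: n + m is computed twice and its overflow flag is
   discriminated twice before the outer addition. */
static checked_i32 double_sum_naive(int32_t n, int32_t m) {
    int32_t a, b, s;
    if (__builtin_add_overflow(n, m, &a)) return (checked_i32){ true, 0 };
    if (__builtin_add_overflow(n, m, &b)) return (checked_i32){ true, 0 };
    bool ovf = __builtin_add_overflow(a, b, &s);
    return (checked_i32){ ovf, ovf ? 0 : s };
}

/* Shared shape, corresponding to `let val r = n + m in r + r end`:
   one computation and one discrimination for n + m. */
static checked_i32 double_sum_shared(int32_t n, int32_t m) {
    int32_t r, s;
    if (__builtin_add_overflow(n, m, &r)) return (checked_i32){ true, 0 };
    bool ovf = __builtin_add_overflow(r, r, &s);
    return (checked_i32){ ovf, ovf ? 0 : s };
}
```

Both functions return the same results; the second is the shape that flow-sensitive propagation at `Case` transfers would aim to recover from the first.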