Overflow behaviour of Double64 does not match Float64 #151
A fix could look like so: in
This is a deep, heavily called function. I need to see if there is a way to resolve the issue that is less impactful.
Fortunately, this anomaly is not pervasive.
If the behavior is limited to one or just a few functions, it makes more sense to trap this case within
This issue arises from a corner case in multiplication or squaring, when the magnitudes get huge:
The next step is looking into this more deeply.
I think the issue is with all base operations (+, -, *, /), which all return
julia> a + a
julia> -a - a
julia> a * a
julia> a / Double64(0.0)
julia> a / (1/a)
julia> 1/a
That is clear and helpful.
I found the problem: when the result is Inf, the Double64 computation can produce HILO(result) == (Inf, -Inf) (or (-Inf, Inf)), which prints as NaN when it should print as HI(result). I need to see whether this is enough, or whether the result must be remade into the general form used in the package, (Inf, NaN) or (-Inf, NaN), so that it carries through subsequent calculations.
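To see where the (Inf, -Inf) pair comes from, here is a minimal sketch assuming the low part of the product is computed with fma, as error-free products usually are (the package's exact code may differ). When a * b overflows, the exact product is still finite, so fma(a, b, -Inf) returns -Inf:

```julia
# Leading part of the product overflows to Inf.
a = floatmax(Float64)
hi = a * 2.0
# Low part: the exact (finite) product a*2.0 minus Inf is -Inf.
lo = fma(a, 2.0, -hi)
println((hi, lo))   # (Inf, -Inf)
# Recombining the pair gives NaN rather than Inf.
println(hi + lo)    # NaN
```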
(This is incorrect, see next.) The results need to be remade, or something else modified.
Ignore the last comment.
Still, there is the problem you highlighted, and unfortunately it is not a string display issue.
I think we must first make the base operations correct. The
It appears that I have to check for
I agree. The question is where to insert the checks to avoid runtime regressions as far as possible.
The obvious place is to change (and similarly for sub_, mul_, dvi_)
into
It would be better if there were a less pervasive way, though.
Consequently all other
But as I see, that
All other calls of
Hmm, good thought.
E.g. replacing (the | should be || in any event)
with
or
That would work at the cost of one additional
(Actually, I did not understand why the original test for infinity was based on the
The encoding of (+/-Inf, NaN) for +/-Inf and (NaN, NaN) for NaN was chosen so that
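One way to remake a nonfinite result into that encoding is a small guard on the leading part. This is a minimal sketch; `fix_nonfinite` is a hypothetical name, not a package function. A single `!isfinite(hi)` test covers +Inf, -Inf and NaN at once, and `||` short-circuits, whereas `|` on Bools always evaluates both sides:

```julia
# Hypothetical helper: remake a (hi, lo) pair into the (+/-Inf, NaN) / (NaN, NaN)
# encoding whenever hi is not finite; finite pairs pass through unchanged.
@inline function fix_nonfinite(hi::Float64, lo::Float64)
    # One branch handles Inf, -Inf and NaN.
    isfinite(hi) || return (hi, NaN)
    return (hi, lo)
end

fix_nonfinite(Inf, -Inf)   # (Inf, NaN)
fix_nonfinite(-Inf, Inf)   # (-Inf, NaN)
fix_nonfinite(NaN, 1.0)    # (NaN, NaN)
fix_nonfinite(1.0, 1e-20)  # (1.0, 1.0e-20)
```

Because the NaN sits in the low part, it survives later arithmetic on the pair, while the sign of the infinity stays readable from hi.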
Your last proposal looks more obvious to me. (I mean
Agreed.
The same corrections need to be done to the DoubleT op FloatT (and FloatT op DoubleT) user-facing routines.
I will implement these changes on a new branch (with tests) and post here when done.
I saw that also
OK, let me know if you notice others.
I understand that the ops in op_dddd_dd.jl, op_ddfp_dd.jl, op_fpdd_dd.jl and op_dd_dd.jl, and many or all of those in op_fp_dd.jl and op_ddsi_dd.jl, need this adjustment.
Also returning
I am not sure about that, as those functions are not exported (as far as I know). Modifying only the exported functions (with sufficient test cases for them) would have less impact on performance, I guess.
Sorry, I wasn't aware of the normality concept. How is it defined? Do the user functions always return normalized DoubleFloats? |
Yes. Great care is taken to ensure that the user-facing functions consistently provide normalized results. A normalized Double64 is a pair (HI::Float64, LO::Float64) where
More to follow.
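The rest of the definition is not preserved here. For orientation only, a minimal sketch of the common double-double convention (an assumption, not necessarily the package's exact rule): a pair counts as normalized when adding LO to HI in Float64 leaves HI unchanged, and a pair can be renormalized with Dekker's fast two-sum when abs(HI) >= abs(LO). All names below are hypothetical:

```julia
# Assumed convention: (hi, lo) is normalized when lo is negligible next to hi
# at Float64 precision, i.e. fl(hi + lo) == hi.
isnormalized(hi::Float64, lo::Float64) = hi + lo == hi

# Dekker's fast two-sum: s + e == a + b exactly, provided abs(a) >= abs(b).
function fast_two_sum(a::Float64, b::Float64)
    s = a + b
    e = b - (s - a)
    return s, e
end

# Renormalize a pair whose low part is too large (assumes abs(hi) >= abs(lo)).
normalize_pair(hi::Float64, lo::Float64) = fast_two_sum(hi, lo)

isnormalized(1.0, 1.0)                      # false
isnormalized(normalize_pair(1.0, 1.0)...)   # true, the pair becomes (2.0, 0.0)
isnormalized(1.0, 2.0^-60)                  # true
```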
Any error-free transformation takes a mathematical (usually arithmetic or related) function of a typed variable and generates both the commonly resolved result and a very good approximation to the error in that result. Another way to look at this is with two functions that are operationally the same, one working at twice the precision of the other:
A Double64 is the same (hi, lo). Arithmetic and elementary functions of Double64s maintain and develop
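As a concrete error-free transformation, here is a minimal sketch of Knuth's two-sum (the textbook algorithm; `two_sum` is a hypothetical name here, not copied from the package). The returned pair represents a + b exactly, which a BigFloat check confirms:

```julia
# Knuth's two-sum: s = fl(a + b) and e = (a + b) - s exactly, for finite a, b
# whose sum does not overflow. No ordering of abs(a), abs(b) is required.
function two_sum(a::Float64, b::Float64)
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e
end

a, b = 0.1, 0.2
s, e = two_sum(a, b)
# (s, e) carries the exact sum; BigFloat (256 bits by default) can verify it.
big(s) + big(e) == big(a) + big(b)   # true
```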
Here is the error-free product of two Float64 values:
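The posted snippet is not preserved here. A minimal sketch of the standard fma-based formulation (`two_prod` is a hypothetical name, and this may differ from the posted code): for finite products that neither overflow nor underflow, p + e equals a * b exactly.

```julia
# Error-free product: p = fl(a*b) and e = a*b - p.
# fma computes a*b - p with a single rounding, and that difference is
# exactly representable, so e is exact.
function two_prod(a::Float64, b::Float64)
    p = a * b
    e = fma(a, b, -p)
    return p, e
end

a, b = 0.1, 3.0
p, e = two_prod(a, b)
# The exact product needs at most 106 bits, so a 256-bit BigFloat check is exact.
big(p) + big(e) == big(a) * big(b)   # true
```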
Another failing test case:
As a consequence of
The first part of the condition is simply wrong! What was the purpose of introducing it?
It was a mistake I made last night. I have the same setup in all the dvi_.._db functions; I will remove them.
Then we also have to special-case
I am still working to get
This may help with log and exp.
Here are the first 30 inverse factorials as Double64s.
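The table itself did not survive. As a hedged sketch, such pairs can be generated independently of the package by computing 1/n! in BigFloat and splitting it into the nearest Float64 (hi) and the nearest Float64 to the remainder (lo). `inv_factorial_pairs` is a hypothetical helper:

```julia
# Build (hi, lo) Float64 pairs approximating 1/n! for n = 1:N.
function inv_factorial_pairs(N::Int)
    out = Vector{Tuple{Float64,Float64}}(undef, N)
    setprecision(BigFloat, 256) do
        for n in 1:N
            x = one(BigFloat) / factorial(big(n))  # 1/n! rounded to 256 bits
            hi = Float64(x)                          # nearest Float64 to 1/n!
            lo = Float64(x - hi)                     # nearest Float64 to the remainder
            out[n] = (hi, lo)
        end
    end
    return out
end

tbl = inv_factorial_pairs(30)
tbl[2]   # (0.5, 0.0), since 1/2! is exactly representable
```

Each pair produced this way is normalized in the sense that hi + lo rounds back to hi.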
Corrected the divide tests and covered
One more subcase
Bring me up to speed on
Precision failures for corner argument values.
So it makes sense to proceed in the order
Calculated relative errors, for example:
Sounds reasonable. Maybe we need to check for magnitudes >= ldexp(1.0, 996) (textbook) or ldexp(1.0, 995) (safer) and, when found, use magnitude-specific handling (an alternative implementation of two_sum). Those checks are quite costly, though. Would a more general approach be to let the user choose between absolute error management (keeping any uncertain bits in the least significant third or quarter of the LO bits) and throughput when working within more usual numeric spreads?
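For reference, ldexp(1.0, 996) is also the classic limit for Veltkamp (Dekker) splitting, which error-free products use when no fma is available: the split multiplies by 2^27 + 1, so the intermediate overflows once the magnitude approaches 2^997. Whether that is the origin of the bound quoted here is an assumption. A minimal sketch with a hypothetical `veltkamp_split`:

```julia
# Veltkamp splitting: a == hi + lo, each part fitting in 26 significant bits.
# c = C * a overflows when abs(a) is near floatmax / 2^27, i.e. about 2^997.
function veltkamp_split(a::Float64)
    C = 134217729.0          # 2^27 + 1
    c = C * a
    hi = c - (c - a)
    lo = a - hi
    return hi, lo
end

veltkamp_split(ldexp(1.0, 996))   # (2^996, 0.0): still finite
veltkamp_split(ldexp(1.0, 997))   # (NaN, NaN): C * a overflowed to Inf
```

Libraries that need the full range typically scale such inputs down by a power of two before splitting and scale the parts back up afterwards, at the cost of the extra magnitude check mentioned above.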
Forget what I said about
With the given implementation, it is simply not possible to represent their tiny return values with higher precision than we observe.
It would be fair to have
I have now finished a version of
Some tests to be added to
Excellent.
Unfortunately there are still
At first glance, this appears to happen when the product should be >= floatmax(Double64) (e.g. prevfloat(x)*y and x*prevfloat(y) work as expected).
The current value of
I don't think it is a good idea to shrink floatmax.
I am going through the arithmetic.
(Note) At least part of the problem is in the way that fma handles
if
I will post here after (a) determining where best to pretest for a nonfinite product, (b) checking whether that fixes the known problems, and (c) adding tests to cover them.
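A minimal sketch of where such a pretest could sit, assuming Double64 values as plain (hi, lo) tuples and the (+/-Inf, NaN) encoding mentioned earlier in the thread. `mul_dd` is hypothetical and drops the xlo*ylo term. One pretest on the leading product is not enough: when that product is exactly floatmax, the renormalizing sum can still overflow, which fits the observation that failures appear when the true product is >= floatmax(Double64).

```julia
# Hypothetical double-double multiply on (hi, lo) tuples with nonfinite pretests.
function mul_dd(x::Tuple{Float64,Float64}, y::Tuple{Float64,Float64})
    xhi, xlo = x
    yhi, ylo = y
    p = xhi * yhi
    # Pretest: if the leading product is Inf or NaN, return the nonfinite encoding
    # instead of letting fma produce an (Inf, -Inf) pair.
    isfinite(p) || return (p, NaN)
    e = fma(xhi, yhi, -p)          # exact error of the leading product
    e += xhi * ylo + xlo * yhi     # first-order cross terms (xlo*ylo is dropped)
    # Renormalize with fast two-sum; the sum itself can overflow near floatmax.
    hi = p + e
    isfinite(hi) || return (hi, NaN)
    lo = e - (hi - p)
    return (hi, lo)
end

m = floatmax(Float64)
mul_dd((m, 0.0), (2.0, 0.0))         # (Inf, NaN): caught by the first check
mul_dd((m, 0.0), (1.0, 2.0^-53))     # (Inf, NaN): caught by the second check
```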
My experience: it is good to start with (c), in order to keep the once-broken corner cases as regression tests.
That is good advice. I implemented an almost correct adjustment; the
These two should be zero (and were zero before the mul_ changes):
So either it is a subtle aspect of the reworked mul_ (I swapped in a one-flop-faster implementation to help cover the conditional), or those routines (or functions they call) are slightly inexact; to be determined. For the additional tests, they need to cover more
Do you have other value pairs, or predicates on results to test, already in mind?
At the moment I have nothing new; the following cases have already been reported:
I have had to spend time elsewhere, and need to do that for a short while longer. |
Please re-open this issue:
The root cause seems to be in
DoubleFloats
Originally posted by @KlausC in #149 (comment)