Failing tests for mlx > 0.19.0 #93

filipstrand · 2024-11-09T19:57:05Z

The test test_image_generation_schnell passes for mlx versions 0.19.0 and below, but from 0.19.3 onwards it fails with the following error:

       [[ 94,  64,  37],
        [ 99,  65,  38],
        [113,  77,  52],
        ...,
        [ 88,  61,  48],
        [ 94,  66,  53],
        [ 98,  73,  60]],

       [[104,  77,  58],
        [113,  80,  62],
        [114,  81,  64],
        ...,
        [ 87,  66,  51],
        [ 95,  73,  58],
        [104,  84,  75]]], dtype=uint8)
percent_mismatch = np.float64(34.19195911871693)
precision  = 6
reduced    = array([ True, False,  True, ..., False, False, False])
reduced_error = array([1, 1, 1, ..., 3, 2, 1], dtype=uint8)
remarks    = ['Mismatched elements: 264695 / 774144 (34.2%)',
 'Max absolute difference among violations: 51',
 'Max relative difference among violations: 14.']
strict     = False
val        = array([ True, False,  True, ..., False, False, False])
verbose    = True
x          = array([ 45,  25,  14, ..., 107,  86,  76], dtype=uint8)
y          = array([ 45,  24,  14, ..., 104,  84,  75], dtype=uint8)

I have not investigated this yet, but it looks like a small diff. Right now our assertion is pretty strict: it does an array_equal comparison of the final image against the fixed reference one:

np.testing.assert_array_equal(
    np.array(Image.open(output_image_path)),
    np.array(Image.open(reference_image_path)),
    err_msg=f"Generated image doesn't match reference image. Check {output_image_path} vs {reference_image_path}",
)

When looking at the output of the generated image, it is visually identical to the reference image.

My initial guess is that this is caused by a bug fix in later mlx versions and we should update our reference image to reflect this.
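To quantify how far the two images actually differ before deciding whether to regenerate the reference, something like the following could help. This is only a minimal sketch: image_diff_stats is a hypothetical helper name (not part of the test suite), and it assumes both images are uint8 arrays of the same shape, as returned by np.array(Image.open(...)).

```python
import numpy as np


def image_diff_stats(generated: np.ndarray, reference: np.ndarray) -> dict:
    """Summarize the element-wise difference between two uint8 images.

    Hypothetical helper for illustration; not part of the existing tests.
    """
    if generated.shape != reference.shape:
        raise ValueError(f"Shape mismatch: {generated.shape} vs {reference.shape}")
    # Cast to a signed type first: subtracting uint8 arrays wraps modulo 256.
    diff = np.abs(generated.astype(np.int16) - reference.astype(np.int16))
    return {
        "max_abs_diff": int(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "percent_mismatch": float(100.0 * np.count_nonzero(diff) / diff.size),
    }
```

A large percent_mismatch combined with a small mean_abs_diff would suggest a broad but shallow numerical drift, while a high max_abs_diff concentrated in a few elements would point to localized differences.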


anthonywu · 2024-11-09T20:57:55Z

Are the results close enough that one of the tolerance-based assertions in https://numpy.org/doc/2.1/reference/routines.testing.html could be used to work around this?

filipstrand · 2024-11-09T21:57:06Z

I did some very quick tests just now, and running assert_allclose with an absolute tolerance of 55 worked:

np.testing.assert_allclose(
    np.array(Image.open(output_image_path)),
    np.array(Image.open(reference_image_path)),
    atol=55,
    err_msg=f"Generated image doesn't match reference image. Check {output_image_path} vs {reference_image_path}",
)

When I tried a smaller tolerance (atol=50 and below), it did not pass, which is consistent with the maximum absolute difference of 51 reported in the error above. That is a pretty large difference to allow in this case, considering that the pixel values of the arrays we compare in np.array(Image.open(output_image_path)) are between 0 and 255. I have not tried the other comparison methods yet.

I'll try to look into other alternatives for this tomorrow.
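One possible alternative to a single large atol is to bound both the worst-case difference and the average difference, so that a few outlier pixels are tolerated while a global shift in the image is still caught. The sketch below is an assumption-laden illustration, not the project's test code: assert_images_close is a hypothetical helper, and the default thresholds (55 from the experiment above, 2.0 chosen arbitrarily) would need tuning against real outputs.

```python
import numpy as np


def assert_images_close(generated, reference, max_abs_diff=55, max_mean_abs_diff=2.0):
    """Fail if two uint8 images differ too much per element or on average.

    Hypothetical helper; both thresholds are illustrative assumptions.
    """
    # Cast to a signed type: differences of uint8 arrays would wrap modulo 256.
    gen = np.asarray(generated).astype(np.int16)
    ref = np.asarray(reference).astype(np.int16)
    if gen.shape != ref.shape:
        raise AssertionError(f"Shape mismatch: {gen.shape} vs {ref.shape}")

    diff = np.abs(gen - ref)
    worst = int(diff.max())
    mean = float(diff.mean())

    # Tolerate isolated outliers up to max_abs_diff ...
    if worst > max_abs_diff:
        raise AssertionError(f"Max absolute difference {worst} exceeds {max_abs_diff}")
    # ... but reject an overall drift across the whole image.
    if mean > max_mean_abs_diff:
        raise AssertionError(f"Mean absolute difference {mean:.3f} exceeds {max_mean_abs_diff}")
```

In the test it would be called with the same arrays as before, for example assert_images_close(np.array(Image.open(output_image_path)), np.array(Image.open(reference_image_path))).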

filipstrand self-assigned this Nov 9, 2024
