Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Casting to money is lossy and too eager #4276

Open
seancolsen opened this issue Feb 20, 2025 · 3 comments
Open

Casting to money is lossy and too eager #4276

seancolsen opened this issue Feb 20, 2025 · 3 comments
Labels
ready Ready for implementation restricted: maintainers Only maintainers can resolve this issue type: bug work: backend Related to Python, Django, and simple SQL
Milestone

Comments

@seancolsen
Copy link
Contributor

  1. Create table via import

  2. Copy and paste text, choosing any of the following text snippets:

    Birthday
    February 2
    December 10
    January 7
    
    Star
    Gliese 581
    LHS 1140
    Kepler-22
    
    Gene
    TP53
    BRCA1
    Pax6
    
  3. Expect the column to be text.

  4. Instead, observe that it's inferred as Money.

This is bad because information is lost when the import is confirmed.

I was able to reproduce this on 0.2.0 and 0.1.7, so I don't think it's a regression

@seancolsen seancolsen added ready Ready for implementation restricted: maintainers Only maintainers can resolve this issue type: bug work: backend Related to Python, Django, and simple SQL labels Feb 20, 2025
@seancolsen seancolsen added this to the High priority milestone Feb 20, 2025
@seancolsen
Copy link
Contributor Author

I think we should only be casting values to money which have an actual dollar sign and no other letters, i.e. something like this:

^\s*(-\s*)?\$\s*(-\s*)?(\d,?)+(\.\d*)?$

@mathemancer
Copy link
Contributor

Hmm. The lossiness is definitely a problem, and some time in the dim past I argued against this money type at all based on that. Even if the column is intended to be money, you lose the currency information by casting it to the money type. If the column has more than one currency, that information gets lost. I deal with accounting forms in my own life with multiple currencies in a column, and casting such a column to Mathesar's money type would be disastrous, if I didn't have back ups for some reason.

The lossiness involved in casting to that type, or inferring a column to be of that type is due to a combination of these factors:

  • Our money type is intended to be international. It should be able to handle things besides U.S. dollars. I think this is quite important.
  • We don't do "global inference" That is, our inference algorithm only considers one value at a time. This means we can't (for example) check that all of the currency symbols in a column are the same before inferring that it is, in fact, currency. This is something I'd like to improve if (when) we overhaul our type casting and inference.
  • When we came up with this type, we were dealing with SQLAlchemy, which (as I recall) made composite type handling a pain. This means we can't just store the currency symbol separately from the number, meaning conversion to this type means just stripping off non-numeric symbols, leading to lossiness.

I'd be okay with just removing the type from the inference algorithm, meaning a user would have to manually set any column to the money type. This would allow a user working in HKD (for example) to cast their column to the money type manually if they so choose, but they'd have to be explicit about it.

@seancolsen
Copy link
Contributor Author

removing the type from the inference algorithm

Yeah, that sounds like a good short-term compromise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready Ready for implementation restricted: maintainers Only maintainers can resolve this issue type: bug work: backend Related to Python, Django, and simple SQL
Projects
None yet
Development

No branches or pull requests

2 participants