Feature request: sniff csv quoting and lineterminator #129374

Open
Nikilauz opened this issue Jan 27, 2025 · 4 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@Nikilauz
Nikilauz commented Jan 27, 2025

When you read a CSV file, alter the data, and write it back, it would be very nice to reproduce the original quoting behaviour as well. For example, if the file is quoted in the csv.QUOTE_NONNUMERIC style, it would be very useful for the dialect returned by csv.Sniffer().sniff to have its quoting attribute set accordingly.
Also, I just noticed that the sniffed dialect's line terminator is hard-coded to "\r\n". It would be nice if that were detected from the input as well.

@Nikilauz Nikilauz changed the title Feature request: sniff csv quoting Feature request: sniff csv quoting and lineterminator Jan 27, 2025
@tomasr8 tomasr8 added the type-feature A feature request or enhancement label Jan 27, 2025
@encukou encukou added the stdlib Python modules in the Lib dir label Jan 28, 2025
@ericvsmith
Member

This probably belongs on discuss.python.org. But in any event, you'll need to specify more information. What specific changes would you like to see?

@Nikilauz
Author

Nikilauz commented Feb 3, 2025

In general, I would expect the Dialect returned by csv.Sniffer().sniff to have its attributes set such that a writer initialized with that dialect reproduces a CSV string similar to the input. To me this looks like a clear correctness constraint on the dialect functionality, so is it really something for a discussion forum? I'm well aware that the cost-benefit ratio of this change may be questionable, though...
Anyhow, I'm not familiar with the feedback/issue conventions of the Python project and may have misunderstood the purpose of dialects, so feel free to correct me (:

@ericvsmith
Member

ericvsmith commented Feb 4, 2025

Could you provide sample code that shows what results you are getting, and explain how that differs from what you expect to get?

@Nikilauz
Author

Nikilauz commented Feb 5, 2025

Sure:

import csv

s = '42,"hello","world",-1\n'    # e.g. via readline() from some csv file
d = csv.Sniffer().sniff(s)
d.quoting    # 0 (csv.QUOTE_MINIMAL)
d.lineterminator    # '\r\n'

x = list(csv.reader([s], d))    # [['42', 'hello', 'world', '-1']]

# newline="" keeps Python from translating the writer's line terminator
with open("test.csv", "w", newline="") as f:
    csv.writer(f, d).writerows(x)

with open("test.csv", "rb") as f:
    f.read()    # b'42,hello,world,-1\r\n'

So instead of b'42,hello,world,-1\r\n' I would expect to get b'42,"hello","world",-1\n'.
More precisely, I would expect d.quoting to be 2 (i.e. csv.QUOTE_NONNUMERIC) and d.lineterminator to be '\n'.

Thanks for your time!
