M::I doesn't read utf8-encoded files correctly #37
Comments
I think the patch as-is is going to break a lot of stuff. The premise behind M::I's read/write functions has always been "I read/write bytes, encoding is your own business". If anything, the issue is that the author_from extractor does not return bytes.
BTW, if you add |
"use utf8" in the M::I code is totally unnecessary. That's only used to indicate that there are utf8-encoded bytes in the physical file currently being parsed.
I don't think it will break any more things than the other mojibake'd META files we've been dealing with for the past few years as we sort out encoding issues elsewhere in the toolchain.
I disagree. I think it's closer to "unicode -- what's that?"
This is wrong. Interfaces not dealing with the edges (writing to/from disk or the network) should always deal in (decoded) characters, not (encoded) octets.
@sjn Could you PR your test case as a *.t file? As a/the Designated Unicode Victim, I'll have a go at it.
On 10/20/2014 06:22 PM, Karen Etheridge wrote:
I much rather have a mojibaked meta rather than an "I have a new M::I on
For greenfield code - you are absolutely correct. M::I is a tool with a |
Of course. :) But I don't think we were far enough through evaluating a patch to assess whether lack of backwards compatibility was even a possibility here. Let's fix the code as we would like it to be, and then be aggressive about testing various install combinations to see what the potential ramifications are. (And, of course, an alpha release is necessary before anything goes stable.)
perhaps "all_from" need to take arguments?
Or perhaps "all_from" should look for "use utf8;" or a POD "=encoding" section and DWIM? Assuming the author actually went to the trouble of telling Perl or perldoc about encodings, it would be a shame for M::I to ignore it. |
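A minimal sketch of the kind of detection suggested here, assuming a POD "=encoding" line takes precedence over "use utf8;". The helpers guess_source_encoding and decode_source are hypothetical names for illustration only and are not part of Module::Install; the only real API used is find_encoding and the encoding object's decode method from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper (not Module::Install API): look for an encoding
# declaration in the raw bytes of a source file.
sub guess_source_encoding {
    my ($bytes) = @_;
    # A POD "=encoding NAME" line is the most explicit declaration.
    return $1 if $bytes =~ /^=encoding[ \t]+(\S+)/m;
    # "use utf8;" says the source text itself is UTF-8 encoded.
    return 'UTF-8' if $bytes =~ /^[ \t]*use[ \t]+utf8[ \t]*;/m;
    return;    # nothing declared: caller keeps treating the data as bytes
}

# Hypothetical helper: decode only when a known encoding was declared.
sub decode_source {
    my ($bytes) = @_;
    my $name = guess_source_encoding($bytes);
    return $bytes unless defined $name;
    my $enc = find_encoding($name);    # undef for unknown names
    return $bytes unless $enc;
    return $enc->decode($bytes);
}

# "É" is stored on disk as the two bytes C3 89.
my $src  = "use utf8;\n\n=head1 AUTHOR\n\n\xC3\x89ric Cholet\n";
my $text = decode_source($src);

printf "declared: %s\n", guess_source_encoding($src);
printf "bytes in: %d, characters out: %d\n", length($src), length($text);
```

Falling back to the undecoded bytes when nothing is declared keeps the existing "bytes in, bytes out" behaviour for distributions that never opted in, which is the backwards-compatibility concern raised in this thread.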
@dagolden agreed, an extra flag (and perhaps some internal guesswork) is the way to go. To clarify: I would be delighted to get this fixed (and not just for UTF8). My initial answer #37 (comment) was only pointing out the incorrectness of the proposed one-line patch. The second paragraph of #37 (comment) was specifically to counter the dangerous attitude "let's rewrite everything the modern way, redefining all interfaces in the process, and then see what the fallout is".
On Mon, Oct 20, 2014 at 11:58:43PM -0700, Peter Rabbitson wrote:
No one said that, so I think you're inferring too much here. However, it's perfectly reasonable to say "please don't merge this yet - |
Indeed. It was written during "the back compatibility target doesn't
#55 should fix this issue (??)
... since Module::Install is b0rken. Perl-Toolchain-Gang/Module-Install#37
Module/Install.pm's _read() and _write() functions don't set the encoding correctly, leading to utf8-bugs in the author field in META.yml. Here's how to reproduce this error:
Create two files, Makefile.PL and lib/Broken.pm.
Make sure the AUTHORS field is saved correctly as utf8. You can check this with xxd -g1 -s 179 -l 70 lib/Broken.pm, which should give the following output:

Then, run perl Makefile.PL, using Module::Install version 1.12. The following META.yml will then be generated:

The error is in the author field, specifically �\x89ric Cholet.

The following patch to Module::Install will fix the issue:

This patch will make one test in Module::Install's test suite fail. Not sure what's going on there.
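To illustrate the general mechanism, here is a small standalone sketch using only the core Encode module. It does not call Module::Install; it assumes the name reaches the META writer as undecoded bytes, which would be consistent with the \x89 seen above but is not confirmed here. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Éric Cholet" as UTF-8 bytes on disk: É is the two bytes C3 89.
my $bytes = "\xC3\x89ric Cholet";

# Without decoding, every byte counts as one character, so the
# string starts with U+00C3 followed by the control character U+0089.
my $undecoded = $bytes;

# Decoding first turns C3 89 into the single character U+00C9.
my $decoded = decode('UTF-8', $bytes);

printf "undecoded: %d chars, first is U+%04X, second is U+%04X\n",
    length($undecoded), ord($undecoded), ord(substr($undecoded, 1, 1));
printf "decoded:   %d chars, first is U+%04X\n",
    length($decoded), ord($decoded);

# Encoding the undecoded string again treats each byte as a Latin-1
# character and double-encodes it.
my $double = encode('UTF-8', $undecoded);
printf "double-encoded: %s\n",
    join ' ', map { sprintf '%02x', ord($_) } split //, $double;
```

In the undecoded case the second character is U+0089, which a YAML writer would plausibly escape as \x89, while the leading C3 is left as a lone byte that a UTF-8 viewer shows as a replacement character.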