-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use more appropriate types for cross-platform unicode support #19
Conversation
list_missing_volumes fails on Windows, because it expects '/' as the path separator.
Switch over to widestring and W/Ex variants of the unrar_sys functions everywhere, fixing unicode issues at the cost of using u16/u32 instead of u8 types for strings. Though these are what the underlying unrar library is also using. Changed the public Archive API: - More permissive with the types accepted for archive filename, destination and password. - Return PathBufs rather than Strings when representing filenames. Also tried to avoid taking ownership of filename inside Archive. But despite the efforts, all the input strings/paths still end up getting copied in the conversion to (Wide)CString. Passwords are still Strings, as the old RARSetPassword interface only supports char. It will be necessary to switch to using UCM_NEEDPASSWORDW first. Apparently this is also required for archives with encrypted headers. Fixes muja#11, and partially addresses muja#4.
Thank you very much for this PR! I believe this should also fix #4. Some things to note:
|
#[test] | ||
fn unicode_list() { | ||
let mut entries = Archive::new("data/unicode.rar").list().unwrap(); | ||
assert_eq!(entries.next().unwrap().unwrap().filename, PathBuf::from("te…―st✌")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this string non-utf8? How would this work? Aren't &str
s UTF-8? Wouldn't you need to use a Vec<u8>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's UTF-8, I haven't yet come up with any good way of crafting a suitable UTF-16 test case.
The reason why I opted to borrowing over owning in To go in more detail, the options I considered were:
For So the decisions here boil down to whether to accept the cost of extra copy or the need to pass owned by reference. Personally, I still prefer the latter but the final call is yours. |
I think it's okay to get this merged for now. However, I'd like to wait before doing a 0.5 release as I've been planning to restructure the library for a while now but have never gotten around to it. I'm not sure if you're aware, but I've talked about it in PR #10, too. My idea is to give the user more control over what to do with each entry by separating the Maybe you have some thoughts about that, too. I opened an issue: #20 |
However, would it be possible to not |
Also (sorry for spamming), would you like to be included in the authors file? :) I think you more than deserve to be mentioned. |
I can have it not Also inclusion in the authors file is fine with me. |
I see what you're getting at, with the
Options 1) or 2) seem like good solutions and are more honest with the caller. 3) seems like black magic and would possibly contain some element of surprise. The same I would do for the We could also 1) Sorry for bringing this up only after you refactored your PR to remove the panics. |
No problem. Doing 1) for I suppose it should be 1) for |
Well if it really only |
I'd be fine with that, but it would leave some room for mistakes (as in nul getting inserted in There is |
- Add NulError, so that we can combine ffi::NulError and widestring::NulError. - Check filename and password for nul values on Archive creation.
Well, I went ahead and gave it a go, but I think it might be better to just revert the last 2 commits and panic on all cases for now. Because doing the conversions in |
I think for now this looks fine, especially since we explicitly document where the function might panic. #22 should be implemented though in the future to be able to handle these higher level errors gracefully. Thank you for your time! :) |
@vjoki unrelated to this but do you prefer your full name or your alias for the authors list? |
Alias would be fine. |
Interally things are switched over to using widestring and
W
/Ex
variants of theunrar_sys
functions, as these seem to be preferred (unrar
internally converts non-wchar
strings). This is also required for unicode support on Windows (UTF-16), but also works on other platforms (by using UTF-32).Breaking changes to public API:
destination and password. (TBH could be more permissive if I didn't (pointlessly) try to avoid ownership inside
Archive
.)PathBuf
s rather thanString
s when representing filenames.One thing still missing is that passwords are still
String
s, as the oldRARSetPassword
interface onlysupports
char
s. It's going to necessary to add support forUCM_NEEDPASSWORDW
first, which requires some more changes to the callback.