What is the current status of locale support (particularly `en_US.UTF-8` and `C`)?
#23010
Unanswered
akinomyoga asked this question in Q&A
Although the default locale in Termux seems to be `en_US.UTF-8`, the locale support of Termux appears to be incomplete. Here, I'd like to know the latest information about how it is incomplete (what's available and what's not) and how we could better work around issues related to incomplete locale support. There doesn't seem to be any official documentation about the locale support.

Original issue
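The incompleteness in question concerns how the shell measures strings under the `C` locale. The following is a minimal sketch, not the original example from this post: the helper name `byte_length` and the sample string are hypothetical. It sets `LC_ALL` instead of `LC_CTYPE` on the assumption that `LC_ALL` might already be exported in the environment, in which case it would override `LC_CTYPE`.

```bash
#!/usr/bin/env bash
# Hypothetical sample: U+3042 encoded in UTF-8 (3 bytes, 1 character).
# Octal escapes in printf are used so the bytes are explicit.
data=$(printf '\343\201\202')

# byte_length is a hypothetical helper, not part of Bash or Termux.
# It forces the C locale inside the function so that ${#1} is expected
# to count bytes instead of multibyte characters.
byte_length() {
  local LC_ALL=C
  printf '%s\n' "${#1}"
}

byte_length "$data"
```

On a system with a working `C` locale, this is expected to print the byte count, `3`. A result of `1` would mean the string is still being measured in characters.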
Suppose one wants to manipulate bytes in binary data (which does not contain NUL) stored in a shell variable within a Bash script. Usually, one can achieve this by setting `LC_CTYPE=C` and then counting the number of bytes with `${#data}` or accessing a byte with `${data:index:1}`. However, this doesn't seem to work in Termux. For example, you can see the issue with the following example:

Although we expect `3` as the result, Bash returns `1` in Termux. In all the other environments, `3` is obtained as expected.

Past discussions
There is an old discussion from 2020:

The issue asked whether the locale `C` is available. The answer was that Termux doesn't support locales. However, it is unclear what happens when no locales are supported. If locales were literally not supported at all, many of the basic C APIs would be unavailable (e.g., `printf`, `isalpha`, `tolower`, and `strftime` all depend on the current locale). Thus, it is reasonable to think that something is assumed for the results of operations that rely on the locale. What is that assumption?

A StackOverflow question from 2021 states that

which implies that `en_US.UTF-8` would have been introduced between 2020 and 2021.

There is also a comment in a discussion from 2022:
The comment says

which contradicts the first piece of information from 2020. Does this mean that Termux/Bionic introduced some support for the locales `en_US.UTF-8` and `C` between 2020 and 2022?

However, as of 2025, the locale `C` is incomplete, as illustrated in the first example. Even for `en_US.UTF-8`, another issue from 2023 reports that `en_US.UTF-8` is unsupported (or not complete enough to pass the tests):

Those four statements in past discussions don't seem to be consistent with each other, so I think some (or all) of them are untrustworthy. If all of them are somewhat correct, I guess it would mean that Termux supports neither `en_US.UTF-8` nor `C`, but an unspecified amalgam of `en_US.UTF-8` and `C`. Or it might be switching back and forth between `en_US.UTF-8` and `C` every single year.

Bionic libc
The third item mentioned Bionic libc, so can I assume that Termux packages adopt Bionic as the C standard library? I also tried to look up information in Bionic. However, Bionic doesn't seem to have a place to report an issue or ask questions. Instead, I found the following comment in `/libc/bionic/locale.cpp` of the Bionic codebase:

This seems to imply that Bionic supports both `C` and `en_US.UTF-8` (a synonym of `C.UTF-8`) separately. This comment has existed at least since 2016, which is inconsistent with the observation above.

I also found a mention of locales in the documentation (boldfaced by me):
This part of the documentation seems to have been introduced by commit aosp-mirror/platform_bionic@046fe15, whose commit message says
So it seems to imply that Bionic actually supports only `en_US.UTF-8` (a synonym of `C.UTF-8`). If this is true, it seems to me that the first piece of information, in the code comment in Bionic's `locale.cpp`, would be wrong. Or the support for the `C` locale might have been dropped at some point between 2015 and 2022.

I'm not sure which information I should believe. In either case, the behavior is not consistent with the past reports for Termux. Another possibility would be that the upstream Bionic and the Bionic used by Termux are actually different versions. Yet another possibility would be that Termux uses Bionic only partially, and the locale part might have extensions or modifications.
Timeline
To summarize the timeline, we could make the following table for the locale support:
`C` and `en_US.UTF-8`
`en_US.UTF-8`
`C` or `en_US.UTF-8`
`en_US.UTF-8`
(broken) `en_US.UTF-8`
(broken) `C`
The pieces of information are mutually inconsistent, so I'm confused about which of them is really trustworthy, and what the relationship is between the C library used in Termux packages and the upstream Bionic.
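Since the reports disagree, one way to check what a given environment actually does is to probe each locale name directly. The following is a minimal sketch; the helper name `probe_locale`, the sample string, and the list of locale names to try are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sample: U+3042 encoded in UTF-8 (3 bytes, 1 character).
data=$(printf '\343\201\202')

# probe_locale is a hypothetical helper, not an existing tool.
# It reports ${#data} as measured with LC_ALL set to the name in $1.
probe_locale() {
  local LC_ALL=$1
  printf '%s\n' "${#data}"
}

for name in C POSIX C.UTF-8 en_US.UTF-8; do
  # The shell may warn on stderr when a locale name is unavailable;
  # that warning is discarded here to keep the output compact.
  printf '%-12s length=%s\n' "$name" "$(probe_locale "$name" 2>/dev/null)"
done
```

A length of `3` means the string was measured in bytes, and `1` means it was measured in characters. If a locale name is not provided by the system, the shell may keep its previous setting, so the result for that name is inconclusive on its own.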
Questions
`C` locale (which is separate from `C.UTF-8`/`en_US.UTF-8`)? If not, would it be supported in the future?