Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Cymraeg language (Welsh) #140

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

fin-w
Copy link

@fin-w fin-w commented Oct 8, 2024

This adds support for Welsh, based on trigrams generated from the 11+ million Welsh words contained in the Corpws Cenedlaethol Cymraeg Cyfoes https://corcencc.org/, which is licensed under Creative Commons Attribution Non Commercial Share Alike 4.0 International.

Citation:
Knight, D., Morris, S., Fitzpatrick, T., Rayson, P., Spasić, I., Thomas, E-M., Lovell, A., Morris, J., Evas, J., Stonelake, M., Arman, L., Davies, J., Ezeani, I., Neale, S., Needs, J., Piao, S., Rees, M., Watkins, G., Williams, L., Muralidaran, V., Tovey-Walsh, B., Anthony, L., Cobb, T., Deuchar, M., Donnelly, K., McCarthy, M. and Scannell, K. (2020). CorCenCC: Corpws Cenedlaethol Cymraeg Cyfoes – the National Corpus of Contemporary Welsh. Cardiff University, http://doi.org/10.17035/d.2020.0119878310

@fin-w
Copy link
Author

fin-w commented Oct 8, 2024

One test is failing when running cargo test and make test :

---- core::detect::tests::test_detect_with_options_with_filter_list_except stdout ----
thread 'core::detect::tests::test_detect_with_options_with_filter_list_except' panicked at src/core/detect.rs:148:9:
assertion `left == right` failed
  left: Cym
 right: Eng

I'm not sure what the correct fix is for this. I'll have a look at the test again and try to correct things though.

@fin-w fin-w force-pushed the support_cymraeg_welsh branch from 81a9361 to 09635c0 Compare October 8, 2024 17:54
@fin-w
Copy link
Author

fin-w commented Oct 8, 2024

It seems like I just had to filter out Welsh in the test to make it pass, so I've fixed that. Hopefully this is ready to merge now?

@fin-w fin-w force-pushed the support_cymraeg_welsh branch from af65af5 to fda82f3 Compare October 8, 2024 18:28
@fin-w fin-w force-pushed the support_cymraeg_welsh branch from fda82f3 to 1f9cf0c Compare October 9, 2024 23:58
@fin-w fin-w force-pushed the support_cymraeg_welsh branch from 1f9cf0c to 574b33a Compare October 10, 2024 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant