Hello! Tantivy is used by multiple projects; I found it as a dependency of ParadeDB. I would love it if it supported stemming for the Polish language, as that would enable Polish language support downstream.
I'm not sure whether this is currently possible, though. I saw that `tantivy` uses `rust_stemmers` as a dependency for multi-language stemming. However, the `rust_stemmers` repository appears to be unmaintained, with 13 open PRs and the last commit dating from 2021. It also appears that a language needs to be added to Snowball before it can be added to `rust_stemmers`, and Snowball has had an open PR for the Polish language since 2021: snowballstem/snowball#159
Sadly, I don't know NLP well enough to contribute, but I hope this write-up comes in handy for someone :)
Thanks for maintaining this project!