This repository was archived by the owner on Oct 30, 2018. It is now read-only.


Automatic detection of the char encoding using the juniversalchardet #39

Open
chechu wants to merge 2 commits into GravityLabs:master from chechu:c6ccc8adde01b8832324aab2e54d0e605cb65488


chechu commented Feb 29, 2012

First, thank you for your work!

I have added automatic detection of the character encoding used by the web page, based on the juniversalchardet library. Feel free to merge it into the master branch or discard it :-)
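A minimal sketch of how a detected encoding might be used when decoding a fetched page, in Scala like the rest of the project. The names `EncodingSketch`, `resolveCharset`, and `decodePage`, and the fallback order (detected charset, then the declared one, then UTF-8), are assumptions for illustration, not the code in this pull request. The detector is passed in as a function so the sketch compiles without juniversalchardet on the classpath; the comment shows how it could wrap juniversalchardet's `UniversalDetector`.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import scala.util.Try

object EncodingSketch {
  // Hypothetical helper: pick the charset used to decode a fetched page.
  // Assumed preference order (not taken from the PR):
  //   1. the charset guessed from the raw bytes (e.g. by juniversalchardet),
  //   2. the charset declared by the server or the page itself,
  //   3. UTF-8 as a last resort.
  def resolveCharset(detected: Option[String], declared: Option[String]): Charset = {
    // Unknown or malformed names make Charset.forName throw; treat them as "no answer".
    def lookup(name: String): Option[Charset] =
      Try(Charset.forName(name.trim)).toOption
    detected.flatMap(lookup)
      .orElse(declared.flatMap(lookup))
      .getOrElse(StandardCharsets.UTF_8)
  }

  // With juniversalchardet available, `detect` could be (not compiled here):
  //   bytes => {
  //     val d = new org.mozilla.universalchardet.UniversalDetector(null)
  //     d.handleData(bytes, 0, bytes.length)
  //     d.dataEnd()
  //     Option(d.getDetectedCharset) // null when nothing was detected
  //   }
  def decodePage(bytes: Array[Byte],
                 detect: Array[Byte] => Option[String],
                 declared: Option[String]): String =
    new String(bytes, resolveCharset(detect(bytes), declared))
}
```

Injecting the detector keeps the decoding logic testable and lets a different detection library be swapped in without touching the fallback rules.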

I have also added some code to StopWords.scala that makes it easy to integrate a language-detection library (such as jlangdetect or lingpipe). Currently I am using my own private language identifier, but it would be easy to plug in another library. Maybe in the future :-)
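One way such a hook could look, as a hypothetical sketch: `LanguageDetector`, `StopWordsSketch`, and the tiny per-language word lists below are illustrative names and data, not the actual contents of StopWords.scala. A wrapper around jlangdetect, LingPipe, or a private identifier would implement the trait; when detection fails, or returns a language without a stop-word list, the count falls back to English.

```scala
import java.util.Locale

// Hypothetical pluggable detector; a real one could wrap jlangdetect or LingPipe.
trait LanguageDetector {
  /** ISO 639-1 code such as "en" or "es", or None if the text is too ambiguous. */
  def detect(text: String): Option[String]
}

object StopWordsSketch {
  // Tiny illustrative stop-word lists; real lists are much longer.
  private val stopWordsByLang: Map[String, Set[String]] = Map(
    "en" -> Set("the", "and", "of", "a", "to", "is", "in"),
    "es" -> Set("el", "la", "de", "y", "que", "en", "los")
  )
  private val defaultLang = "en"

  // Lower-case and split on anything that is not a letter.
  private def tokenize(text: String): Seq[String] =
    text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+").toSeq.filter(_.nonEmpty)

  /** Counts stop words in one fragment, using the list for its detected language. */
  def stopWordCount(fragment: String, detector: LanguageDetector): Int = {
    val lang = detector.detect(fragment)
      .filter(stopWordsByLang.contains) // unknown language: fall back below
      .getOrElse(defaultLang)
    val stopWords = stopWordsByLang(lang)
    tokenize(fragment).count(stopWords.contains)
  }
}
```

Keeping the detector behind a trait means the stop-word counting never depends on a specific library; switching identifiers only changes which implementation is passed in.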

Thank you again, and good luck

Jesus Lanchas added 2 commits February 29, 2012 14:18
…library.

Moreover, the system is now prepared to run language detection before counting the stop words in each fragment.