This repository was archived by the owner on Oct 30, 2018. It is now read-only.
Automatic detection of the char encoding using the juniversalchardet #39
Open
chechu wants to merge 2 commits into GravityLabs:master
added 2 commits
February 29, 2012 14:18
…library. Moreover, the system is now prepared to run language detection before counting the stop words in each fragment.
First, thank you for your work!
I have added automatic detection of the character encoding used by the web page, using the juniversalchardet library. Feel free to merge it into the master branch or discard it :-)
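For readers unfamiliar with juniversalchardet, the following is a minimal sketch of how charset detection with that library typically looks from Scala. It is not the code from this pull request: the object name `EncodingSniffer`, the method `detectCharset`, the 4096-byte buffer, and the fallback parameter are all assumptions made for illustration. It needs the juniversalchardet jar on the classpath.

```scala
import java.io.InputStream
import org.mozilla.universalchardet.UniversalDetector

// Hypothetical helper, not part of the project: guesses the charset of a byte stream.
object EncodingSniffer {

  // Returns the charset name reported by juniversalchardet,
  // or `fallback` when the detector cannot decide.
  def detectCharset(in: InputStream, fallback: String = "UTF-8"): String = {
    val detector = new UniversalDetector(null) // no listener; the result is polled below
    val buf = new Array[Byte](4096)

    // Feed chunks until the stream ends or the detector is confident.
    var n = in.read(buf)
    while (n > 0 && !detector.isDone) {
      detector.handleData(buf, 0, n)
      n = in.read(buf)
    }
    detector.dataEnd() // tell the detector that no more data is coming

    val detected = detector.getDetectedCharset // null when nothing was detected
    detector.reset()
    Option(detected).getOrElse(fallback)
  }
}
```

Because the detector consumes the stream, a caller would probably buffer the page bytes first and then decode them with the returned charset name.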
Moreover, I have added some code to make it easy to integrate a language autodetection library (such as jlangdetect or lingpipe) in StopWords.scala. At the moment I am using my own private language identifier, but it would be easy to include some other library. Maybe in the future :-)
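To illustrate the kind of hook described here, below is a rough sketch of stop-word counting with a pluggable language detector. Everything in it is hypothetical: the `LanguageDetector` trait, `FixedLanguageDetector`, `StopWordCounter`, and the tiny word lists are invented for this example and are not the actual contents of StopWords.scala.

```scala
// Hypothetical extension point: any language identifier can sit behind this trait.
trait LanguageDetector {
  // Returns a language code such as "en" or "es" for the given text.
  def detect(text: String): String
}

// Trivial stand-in detector that always reports the same language.
class FixedLanguageDetector(lang: String) extends LanguageDetector {
  def detect(text: String): String = lang
}

object StopWordCounter {
  // Tiny illustrative lists; a real system would load a full list per language.
  private val stopWords: Map[String, Set[String]] = Map(
    "en" -> Set("the", "a", "of", "and", "is", "in", "to"),
    "es" -> Set("el", "la", "de", "y", "es", "en", "que")
  )

  // Detects the fragment's language first, then counts that language's stop words.
  // Unknown languages yield a count of zero.
  def count(fragment: String, detector: LanguageDetector): Int = {
    val lang = detector.detect(fragment)
    val words = stopWords.getOrElse(lang, Set.empty[String])
    fragment.toLowerCase.split("[^\\p{L}]+").count(words.contains)
  }
}
```

Swapping in jlangdetect, lingpipe, or a private identifier would then only mean writing another `LanguageDetector` implementation; the counting code stays unchanged.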
Thank you again, and good luck