Adds the link_filter.yml file to the default configuration when running ACHE as a REST server#175
Adds the link_filter.yml file to the default configuration when running ACHE as a REST server#175jpmantuano wants to merge 10 commits intoVIDA-NYU:masterfrom
Conversation
… ACHE as a REST server
|
Thanks @jpmantuano ! I'll review it as soon as possible. |
aecio
left a comment
There was a problem hiding this comment.
I left some comments. Let me know what if you think and can continue working on that.
| if (pattern.whitelist != null) | ||
| this.whitelist = WildcardMatcher.fromWhitelist(pattern.whitelist); | ||
| if (pattern.blacklist != null) | ||
| pattern.blacklist.add("tech"); |
There was a problem hiding this comment.
Why is "tech" hard-coded? Maybe you forget to remove it from the commit?
| @@ -0,0 +1,845 @@ | |||
| package focusedCrawler.util.readability; | |||
There was a problem hiding this comment.
Is there any difference from this code to the version at https://github.com/wuman/JReadability/blob/master/src/main/java/com/wuman/jreadability/Readability.java ?
Maybe you could add this in a separate pull request.
| return this; | ||
| } | ||
|
|
||
| public Builder fromYamlFile(String file, List<String> whitelist, List<String> blacklist) { |
There was a problem hiding this comment.
This method looks too similar to fromYamlFile(String file). Maybe fromYamlFile(String file) could call this method passing empty whitelists and black lists?
| if (configPath != null && !configPath.isEmpty()) { | ||
| baseConfig = new Configuration(configPath); | ||
| linkFilterConfig = new LinkFilterConfig(configPath); | ||
| linkFilterConfig.setFileLocation(configPath); |
There was a problem hiding this comment.
This setter doesn't seem necessary since configPath is already passed in the constructor.
| public boolean accept(URL link) { | ||
|
|
||
| String url = link.toString(); | ||
| String url = link.toString().toLowerCase(); |
There was a problem hiding this comment.
I don't think that lowercase should be always be called by default. URL paths are not case sensitive. Maybe we could add a config option to make link filters case insensitive.
Can you have a look at my work trying to add the link_filters.yml to the default configuration of ACHE when running it as a REST Server. Right now I'm just copying the file, but my goal is copy how you load the configurations. But I am failing to map the contents of the link_filters.yml loaded as json format to the LinkFilterConfig class I added.