This repository was archived by the owner on Oct 30, 2018. It is now read-only.

Treat content as HTML even if it has junk before the start of the HTML #98

Open
gnp wants to merge 2 commits into GravityLabs:master from gnp:master
gnp commented Aug 2, 2015

I've seen pages like this in the wild, for example with <script> blocks before the HTML doctype. This fallback let me still run Goose on those pages.
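
For illustration only, here is a minimal Scala sketch of this kind of fallback. It is not the diff from this pull request; the object name `HtmlStart`, the method `stripLeadingJunk`, and the choice of markers are all assumptions made for the example. The idea is to look for the first doctype or <html> tag and treat everything from that point on as the document.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not taken from the Goose source or from this PR.
object HtmlStart {
  // Case-insensitive match for the usual start of an HTML document:
  // either a doctype declaration or an opening <html> tag.
  private val HtmlMarker: Regex = """(?i)(?:<!doctype\s+html|<html[\s>])""".r

  /** Returns the content from the first HTML marker onward,
    * or None when no marker is present at all. */
  def stripLeadingJunk(content: String): Option[String] =
    HtmlMarker.findFirstMatchIn(content).map(m => content.substring(m.start))
}
```

A caller could fall back to this only after the normal detection fails, so well-formed pages take the usual path. One known weakness of the sketch: a leading script that happens to contain the literal text "<html" would be matched too early.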

gnp commented Aug 2, 2015

I have JDK 8 installed, so I had to point JAVA_HOME at a JDK 7 install when invoking Maven:

JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/ mvn clean package

Otherwise the compiler complained about JDK class files being "broken".

gnp commented Aug 2, 2015

With some refactoring to split out the parsing from the fetching in HtmlFetcher.scala, it would be possible to write unit tests that exercise the different cases, including the one I built this for.
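
As a rough sketch of what such a split might look like (not the current structure of HtmlFetcher.scala; `RawFetcher`, `HtmlParsing`, `HtmlSource`, and `StubFetcher` are hypothetical names), the network call can sit behind a small trait while the "is this HTML?" decision lives in a pure function. Tests can then feed in canned bodies without touching the network.

```scala
// Hypothetical names throughout; an illustration of the refactoring idea only.

// The only part that would perform I/O in a real implementation.
trait RawFetcher {
  def fetch(url: String): Option[String]
}

// Pure logic: no network access, so each case can be unit tested directly.
object HtmlParsing {
  private val HtmlTag = """(?i)<html[\s>]""".r

  // True when an opening <html> tag appears anywhere in the body,
  // even if other content precedes it.
  def looksLikeHtml(body: String): Boolean =
    HtmlTag.findFirstIn(body).isDefined
}

// Glue: fetch first, then keep the body only if it parses as HTML.
class HtmlSource(fetcher: RawFetcher) {
  def getHtml(url: String): Option[String] =
    fetcher.fetch(url).filter(HtmlParsing.looksLikeHtml)
}

// Test double that returns a fixed body instead of making a request.
class StubFetcher(body: Option[String]) extends RawFetcher {
  def fetch(url: String): Option[String] = body
}
```

With this shape, a test for the junk-before-HTML case only needs `new HtmlSource(new StubFetcher(Some(...)))` and an assertion on the result.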
