Merged all changes.#1

Open
raisercostin wants to merge 360 commits into theodoreLee:master from raisercostin:master

@raisercostin

Merged various forks to bring as many as possible of the improvements made to goose into the main trunk.

marcosinger and others added 30 commits July 14, 2012 15:23
warrd and others added 30 commits October 13, 2014 18:17
Goose uses a HashSet for iterating topNode candidates.
But HashSet doesn't guarantee iteration order, so when two candidates have
the same score, the choice is effectively arbitrary. This is not acceptable.
Now, by using LinkedHashSet, we make sure that in case of a tie we choose
the first tag that was found in the DOM tree.
Using LinkedHashSet to avoid inconsistency
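
The tie-breaking behaviour described in the commit message above can be illustrated with a minimal sketch. It is not the code from this PR: `Candidate`, `pickTop`, and the tag names are hypothetical stand-ins, and the only assumption about goose is that it picks the highest-scoring candidate while iterating a set. Because `java.util.LinkedHashSet` iterates in insertion order, the first candidate added wins a tie as long as the current best is replaced only on a strictly greater score.

```scala
import java.util.{LinkedHashSet => JLinkedHashSet}

object TieBreakSketch {
  // Hypothetical stand-in for a scored DOM node; not goose's actual type.
  final case class Candidate(tag: String, score: Int)

  // Returns the highest-scoring candidate. The current best is replaced only
  // when a strictly greater score is seen, so on a tie the candidate that
  // comes first in the set's iteration order is kept.
  def pickTop(candidates: java.util.Set[Candidate]): Option[Candidate] = {
    var best: Option[Candidate] = None
    val it = candidates.iterator()
    while (it.hasNext) {
      val c = it.next()
      if (best.forall(_.score < c.score)) best = Some(c)
    }
    best
  }

  def main(args: Array[String]): Unit = {
    // LinkedHashSet iterates in insertion order, i.e. the order in which
    // the candidates were found in this sketch.
    val found = new JLinkedHashSet[Candidate]()
    found.add(Candidate("div#article", 10))
    found.add(Candidate("div#sidebar", 10)) // same score as the first one
    found.add(Candidate("p.footer", 3))
    println(pickTop(found))
  }
}
```

With a plain `java.util.HashSet`, the same `pickTop` would still return a candidate with score 10, but which of the two tied candidates it returns would depend on hash-based iteration order rather than on the order of discovery.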
Accept cookies from web sites that put all their cookies into one request
header.
Conflicts:
	build.sbt
	src/main/scala/com/gravity/goose/Configuration.scala
Conflicts:
	README.md
	build.sbt
	pom.xml
	src/main/scala/com/gravity/goose/Article.scala
	src/main/scala/com/gravity/goose/Configuration.scala
	src/main/scala/com/gravity/goose/opengraph/OpenGraphData.scala
	src/test/scala/com/gravity/goose/GooseTest.scala
Conflicts:
	pom.xml
	src/main/scala/com/gravity/goose/Article.scala
	src/main/scala/com/gravity/goose/Configuration.scala
	src/main/scala/com/gravity/goose/Crawler.scala
	src/main/scala/com/gravity/goose/images/ImageExtractor.scala
	src/main/scala/com/gravity/goose/images/StandardImageExtractor.scala
	src/main/scala/com/gravity/goose/images/UpgradedImageIExtractor.scala
	src/main/scala/com/gravity/goose/network/HtmlFetcher.scala
	src/test/scala/com/gravity/goose/TestUtils.scala
	src/test/scala/com/gravity/goose/TextExtractionsTest.scala