Best way to filter initial site and then crawl on those urls #1318
Unanswered
SCantergiani asked this question in Forums - Q&A
Replies: 0 comments
Hey guys,
I need some advice. I have a crawler that starts on a page with a bunch of list items (like a news portal). It runs with a depth of 1, so it goes into each item and scrapes the relevant content using a JsonCssStrategy. I want it to follow only the links inside the table and then apply the JsonCssStrategy to those pages, instead of checking every link on the start page. I've tried css_selector, but that seems to cancel out my JsonCssStrategy. I also tried target_elements and filter chains, with no luck. Any tips?
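To make the goal concrete, here is a minimal sketch of the two-phase flow described above: first collect only the links that sit inside the table on the start page, then hand that list to the crawler. It uses only the Python standard library and is not the crawler library's API. The `TableLinkCollector` class, the `table_links` helper, the sample HTML, and the example.com URLs are all hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TableLinkCollector(HTMLParser):
    """Collect href values of <a> tags that appear inside a <table>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.table_depth = 0  # > 0 while the parser is inside a table
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            href = dict(attrs).get("href")
            if href:
                url = urljoin(self.base_url, href)  # resolve relative links
                if url not in self.links:  # keep order, drop duplicates
                    self.links.append(url)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def table_links(html, base_url):
    """Return absolute URLs of links found inside tables, in page order."""
    parser = TableLinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical start page: only the two /news/ links are inside the table.
SAMPLE_HTML = """
<nav><a href="/about">About</a></nav>
<table>
  <tr><td><a href="/news/1">First item</a></td></tr>
  <tr><td><a href="/news/2">Second item</a></td></tr>
</table>
<footer><a href="/contact">Contact</a></footer>
"""

urls = table_links(SAMPLE_HTML, "https://example.com/")
print(urls)
```

The resulting list is what would then be crawled page by page with the JsonCssStrategy attached, so the extraction schema only ever runs on the item pages and the start page is used purely for link discovery.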