
Help w/ Errors #6

@senrabdet

Description


Hi there:

Am hoping this post is still being monitored even if it's a bit old...:). I'm trying to use this project and am having some issues that I suspect are related to using Python 3.x instead of 2.x, but I'm not sure. It looks pretty elegant and powerful, so am hoping for help :), but am also wondering whether I need an approach that takes Google fighting automated searches like this into account.

E.g., the print statement in searchEngines.py throws an error that I think requires adding parentheses, so it becomes print("total page:{0}".format(self.totalPage)). I also changed the import in searchResultPages.py to "from seCrawler.common.searchEngines import SearchEngines".
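For reference, this is the Python 2 → 3 print change in question; a minimal sketch, using a stand-in variable since self.totalPage lives inside the crawler class:

```python
totalPage = 50  # stand-in for self.totalPage in searchEngines.py

# Python 2 style print statement -- a SyntaxError under Python 3:
#   print "total page:%d" % totalPage

# Python 3: print is a function, so parentheses are required
print("total page:{0}".format(totalPage))
```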

Starting the crawler as shown in the readme, scrapy crawl keywordSpider -a keyword=Spider-Man -a se=google -a pages=50 (with or without the quotes, so same result with scrapy crawl keywordSpider -a keyword=Spider-Man -a se=google -a pages=50), throws an error:

File "/spiders/keywordSpider.py", line 21, in __init__
    for url in pageUrls:
TypeError: iter() returned non-iterator of type 'searResultPages'

I'm seeing some posts around saying that with Python 3 you need def __next__(self): instead of def next(self):, but I don't believe there is a next() method anywhere in the code. The __init__.py file in the spiders folder is essentially commented out/blank.

My guess is this has something to do with turning the number of pages into an integer, and then iterating through that value (50 as the default).
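The TypeError above is exactly what Python 3 raises when iter() is called on a class that defines the Python 2 next() method but not __next__(). A minimal sketch of the fix is below; the class name searResultPages comes from the traceback, but its constructor arguments, internals, and URL format are assumptions, so the real code in seCrawler will differ:

```python
class searResultPages:
    """Sketch of a Python 3-compatible results-page iterator."""

    def __init__(self, keyword, se, pages):
        self.keyword = keyword
        self.se = se
        self.pages = int(pages)  # -a pages=50 arrives as a string
        self.current = 0

    def __iter__(self):
        # Returning self marks this object as its own iterator
        return self

    def __next__(self):  # Python 3 iterator protocol (was next() in Py2)
        if self.current >= self.pages:
            raise StopIteration
        url = "https://www.google.com/search?q={0}&start={1}".format(
            self.keyword, self.current * 10)
        self.current += 1
        return url

    next = __next__  # optional alias so the class still works on Python 2
```

With __next__ defined, for url in pageUrls: in keywordSpider.py's __init__ should iterate normally, e.g. list(searResultPages("Spider-Man", "google", "3")) yields three URLs.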

Q: if anyone notices this, any suggestions on how to tweak the code to make it work? Or am I behind the times, and all search engines' robots.txt now forbid what I'm trying to do?

FYI, I don't think this is related to which search engine is being used, as the crawl command throws the same error whether using google, bing, or baidu.

Thx
