Description
After a while (maybe half an hour), ACHE stops crawling and prints a stream of "Still waiting to process downloaded pages..." messages. I checked the load on all CPUs with htop and found no busy worker.
I'm experimenting with ACHE. I set up the configuration as described in the guide and use the config file in ./config/config__website_crawl/ache.yml. I changed only two properties:
target_storage.visited_page_limit: 50
crawler_manager.downloader.download_thread_pool_size: 4
I've also played around with -XX:+UseG1GC and -Xmx4g to give the JVM enough capacity for my project. The running environment is a Unix server that limits each user to a maximum of 20 processes.
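In case it helps with diagnosis, here is a rough sketch of how the process limit and the JVM's threads could be inspected while the crawler is stuck. It assumes a Linux host (where each JVM thread counts against the per-user process limit), a procps-style ps, and the JDK's jstack on the PATH. The crawler's PID is passed as the first argument, and the script and output file names are made up for illustration.

```shell
#!/bin/bash
# Usage: ./check-limits.sh <crawler-pid>   (script name is hypothetical)
pid="${1:-}"

# Per-user cap on processes; on Linux every JVM thread counts against it.
limit=$(ulimit -u)
echo "max user processes: $limit"

if [ -n "$pid" ]; then
  # Number of threads (lightweight processes) in the crawler JVM.
  echo "threads in JVM: $(ps -o nlwp= -p "$pid")"

  # Thread dump, to see what the downloader threads are blocked on.
  # The output file name is an assumption for illustration.
  if command -v jstack >/dev/null 2>&1; then
    jstack "$pid" > "threads-$pid.txt"
  fi
fi
```

If the thread count is close to the limit, that could explain both the stall and the OutOfMemoryError, but this is only a guess until a thread dump confirms it.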
My JDK version:
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
JVM:
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
So after about half an hour (I think; I only notice when I switch back to the screen session) I see lots of messages (pasted below), and it seems to be trapped in an infinite loop.
[2021-04-11 04:03:02,565] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:03:07,613] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:03:12,661] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:03:17,708] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:03:22,757] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:03:27,805] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:03:32,853] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:03:37,901] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:03:42,948] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:03:47,997] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:03:53,045] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:03:58,092] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:04:03,139] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:04:08,187] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:04:13,236] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:04:18,284] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:04:23,332] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:04:28,379] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:04:33,426] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:04:38,474] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:04:43,522] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:04:48,568] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:04:53,616] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:04:58,664] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:05:03,711] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:05:08,759] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:05:13,806] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:05:18,854] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:05:23,902] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:05:28,949] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:05:33,997] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:05:39,043] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:05:44,091] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:05:49,139] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:05:54,186] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
I tried to use Ctrl+C to send SIGINT but got an OOM error:
^C^CJava HotSpot(TM) 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGINT to handler- the VM may need to be forcibly terminated
^CJava HotSpot(TM) 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGINT to handler- the VM may need to be forcibly terminated
[2021-04-11 04:06:04,281] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
^CJava HotSpot(TM) 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGINT to handler- the VM may need to be forcibly terminated
^CJava HotSpot(TM) 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGINT to handler- the VM may need to be forcibly terminated
[2021-04-11 04:06:09,329] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
[2021-04-11 04:06:14,376] INFO [AsyncCrawler] (HttpDownloader.java:232) - Still waiting to process downloaded pages...
Has anyone seen this before?
Thanks,
Stan