@essiembre Hi Pascal, I have run into a couple of issues with the Elasticsearch Importer. First, the commit count is way off, as seen in the following log segment. The site was crawled previously and committed to the file system. Later I added the Elasticsearch committer and ran the crawl again after removing the output folder.
- What is the relationship between the reference count and the actual commit count? Shouldn't they match?
- It seems the previous crawl was cached somewhere. How can I clean it up?
big-site: 2018-02-07 03:39:29 INFO - big-site: Crawler finishing: committing documents.
big-site: 2018-02-07 03:39:29 INFO - Committing 181 files
big-site: 2018-02-07 03:39:29 INFO - Sending 50 commit operations to Elasticsearch.
big-site: 2018-02-07 03:39:29 INFO - Done sending commit operations to Elasticsearch.
big-site: 2018-02-07 03:39:29 INFO - Sending 50 commit operations to Elasticsearch.
big-site: 2018-02-07 03:39:29 INFO - Done sending commit operations to Elasticsearch.
big-site: 2018-02-07 03:39:29 INFO - Sending 50 commit operations to Elasticsearch.
big-site: 2018-02-07 03:39:29 INFO - Done sending commit operations to Elasticsearch.
big-site: 2018-02-07 03:39:29 INFO - Sending 31 commit operations to Elasticsearch.
big-site: 2018-02-07 03:39:29 INFO - Done sending commit operations to Elasticsearch.
big-site: 2018-02-07 03:39:29 INFO - Elasticsearch RestClient closed.
big-site: 2018-02-07 03:39:29 INFO - big-site: 10195 reference(s) processed.
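As a sanity check on the log, the four batches (50 + 50 + 50 + 31) add up to the 181 files reported, which is what I would expect with a batch size of 50. Here is a minimal Python sketch of that arithmetic. `batch_sizes` is a hypothetical helper written only for illustration, not part of the Norconex committer API, and it does not explain the gap between 181 commits and 10195 references.

```python
def batch_sizes(total_docs: int, batch_size: int) -> list[int]:
    """Return the sizes of consecutive batches needed to send total_docs items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Full batches first, then one smaller trailing batch if anything is left over.
    full_batches, remainder = divmod(total_docs, batch_size)
    return [batch_size] * full_batches + ([remainder] if remainder else [])


# Values taken from the log: 181 queued files, batch size 50.
sizes = batch_sizes(181, 50)
print(sizes, sum(sizes))  # [50, 50, 50, 31] 181
```

So the batching itself looks consistent. The open question is only why 181 files were queued when 10195 references were processed.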
The committer config:
<committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
  <nodes>somewhere in the jungle</nodes>
  <indexName>big-site-index</indexName>
  <queueDir>$workdir/commit</queueDir>
  <connectionTimeout>5 minutes</connectionTimeout>
  <socketTimeout>5 minutes</socketTimeout>
  <typeName>Documents</typeName>
  <commitBatchSize>50</commitBatchSize>
  <maxRetries>1</maxRetries>
</committer>