Push index to AWS OpenSearch through tunneling #117

@caoyang1211

I created an OpenSearch domain on AWS inside a VPC. To read data from or write data to the domain from my laptop, I run SecureCRT, open a session to a bastion server that runs on AWS and has access to the VPC, and set up port forwarding that redirects traffic from https://127.0.0.1:60443 to the OpenSearch domain at https://vpc-*****-us-east-1-1-1-yzdjblkpyhbyaytxcnbzrxavpm.us-east-1.es.amazonaws.com.

I verified that port forwarding works by running a curl command that bulk-indexes JSON data into the OpenSearch domain, using the --insecure option to skip certificate checks:

        curl -H "Content-Type:application/json" --insecure -XPOST "https://127.0.0.1:60443/_bulk" --data-binary @TutorialVideoDbRecords.json
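For comparison, here is a minimal Python sketch of the same bulk request using only the standard library. It assumes the tunnel described above is listening on 127.0.0.1:60443; the function names are hypothetical. `make_ssl_context(insecure=True)` mirrors what curl's `--insecure` does: it turns off both hostname checking and certificate-chain verification.

```python
import ssl
import urllib.request


def make_ssl_context(insecure: bool = False) -> ssl.SSLContext:
    """Return a TLS context; insecure=True mimics curl's --insecure."""
    ctx = ssl.create_default_context()  # verifies chain and hostname by default
    if insecure:
        # check_hostname must be turned off before verify_mode can be
        # set to CERT_NONE, otherwise Python raises ValueError.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def build_bulk_request(base_url: str, body: bytes) -> urllib.request.Request:
    """POST <base_url>/_bulk with a JSON content type, like the curl command."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_bulk",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bulk_index(base_url: str, path: str, insecure: bool = False,
               timeout: float = 30.0):
    """Send a bulk file through the tunnel; return (HTTP status, body)."""
    with open(path, "rb") as f:  # same role as curl's --data-binary @file
        body = f.read()
    request = build_bulk_request(base_url, body)
    context = make_ssl_context(insecure)
    with urllib.request.urlopen(request, context=context, timeout=timeout) as resp:
        return resp.status, resp.read()
```

Calling `bulk_index("https://127.0.0.1:60443", "TutorialVideoDbRecords.json", insecure=True)` should behave like the curl command. With `insecure=False` you would expect a certificate verification error instead, because the certificate is not issued for 127.0.0.1.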

To index web pages into the OpenSearch domain with the Norconex crawler, I set https://localhost:60443 as the node in the Norconex config file and ran the crawler, which reported two errors: Failure occured on node: "null", and Host name 'localhost' does not match the certificate subject provided by the peer (CN=*.us-east-1.es.amazonaws.com).
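The second error is a hostname-verification failure rather than a connection failure: the TLS client compares the host name it connected to (`localhost`) with the name in the server certificate (`*.us-east-1.es.amazonaws.com`). The simplified Python sketch below illustrates the wildcard rule described in RFC 6125, where `*` stands for exactly one left-most DNS label. It is not the code used by the Elasticsearch REST client or Apache HttpClient, it skips subjectAltName, IP-address and partial-wildcard handling, and `vpc-example-abc123...` is a hypothetical stand-in for the real (elided) endpoint name.

```python
def hostname_matches(hostname: str, pattern: str) -> bool:
    """Simplified wildcard check: '*' may replace exactly one left-most label.

    Illustration only; real TLS libraries also check subjectAltName entries,
    IP addresses, IDNA encoding and partial wildcards such as 'w*.example.com'.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    # A wildcard never spans dots, so the label counts must be equal.
    if len(host_labels) != len(pattern_labels):
        return False
    # Every label to the right of the first one must match exactly.
    if host_labels[1:] != pattern_labels[1:]:
        return False
    if pattern_labels[0] == "*":
        # Refuse overly broad patterns such as '*' or '*.com'.
        return len(pattern_labels) >= 3
    return host_labels[0] == pattern_labels[0]


CERT_NAME = "*.us-east-1.es.amazonaws.com"
for name in ("localhost", "127.0.0.1",
             "vpc-example-abc123.us-east-1.es.amazonaws.com"):  # hypothetical endpoint
    print(f"{name!r:52} -> {hostname_matches(name, CERT_NAME)}")
```

Only the third name matches the certificate, which is why connecting through the tunnel as `localhost` (or `127.0.0.1`) fails unless verification is relaxed or the client verifies against the real endpoint name.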

So the problem appears to be a certificate validation failure. Is there an option in the config file to skip certificate checks, like the --insecure option in the curl command? My configuration is as follows (no user credentials are required to access the OpenSearch domain inside the VPC):

        <committer class="ElasticsearchCommitter">
            <nodes>https://localhost:60443</nodes>
            <indexName>tutorials_videos</indexName>
        </committer>
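With `https://localhost:60443` in `<nodes>`, the client verifies the certificate against `localhost`. One way to keep certificate checks on while still going through the tunnel is to connect to the local port but verify against the real endpoint name. The Python sketch below shows that idea at the TLS level only; it is not a Norconex configuration. `probe_through_tunnel` and `build_head_request` are hypothetical helpers, and `real_host` must be the full (elided here) OpenSearch endpoint host name.

```python
import socket
import ssl


def build_head_request(host: str, path: str = "/") -> bytes:
    """Minimal HTTP/1.1 HEAD request; the Host header carries the real name."""
    return (
        f"HEAD {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def probe_through_tunnel(real_host: str, local_port: int = 60443,
                         timeout: float = 10.0) -> str:
    """Connect to the local end of the tunnel but verify TLS against real_host.

    Returns the HTTP status line. Raises ssl.SSLCertVerificationError if the
    certificate presented by the server does not match real_host.
    """
    context = ssl.create_default_context()  # chain and hostname checks stay ON
    with socket.create_connection(("127.0.0.1", local_port), timeout=timeout) as raw:
        # server_hostname sets the SNI value and the name used for hostname
        # verification, so the certificate is checked against the AWS name
        # instead of 127.0.0.1/localhost.
        with context.wrap_socket(raw, server_hostname=real_host) as tls:
            tls.sendall(build_head_request(real_host))
            first_chunk = tls.recv(4096)
    return first_chunk.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
```

For clients that only accept a URL, a related approach may be to map the endpoint name to 127.0.0.1 in the local hosts file and use that name with port 60443 in the URL, so that the name the client verifies matches the wildcard certificate.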

After running the crawler, I got the following errors:
10:42:22.020 [tutorial video#1] ERROR ElasticsearchCommitter - Failure occured on node: "null". Check node logs.
10:42:22.022 [tutorial video#1] ERROR COMMITTER_BATCH_ERROR - CommitterEvent[connectionTimeout=1000,credentials=Credentials[username=,password=,passwordKey=],discoverNodes=false,dotReplacement=,fixBadIds=false,ignoreResponseErrors=false,indexName=tutorials_videos,jsonFieldsPattern=,socketTimeout=30000,sourceIdField=,targetContentField=content,typeName=,queue=FSQueue[batchSize=20,commitLeftoversOnInit=false,ignoreErrors=false,maxPerFolder=500,retrier=Retrier[exceptionFilter=,maxCauses=10,maxRetries=0,retryDelay=0],splitBatch=OFF],committerContext=CommitterContext[eventManager=com.norconex.commons.lang.event.EventManager@6a5e167a,streamFactory=com.norconex.commons.lang.io.CachedStreamFactory@60e06f7d,workDir=.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0],fieldMappings=com.norconex.committer.core3.CommitterException: Could not commit JSON batch to Elasticsearch.,restrictions=[],request=]
10:42:22.127 [tutorial video#2] INFO DOCUMENT_COMMITTED_UPSERT - https://kidshealth.org/en/teens/center/concussions-ctr.html - Committers: ElasticsearchCommitter
10:42:22.129 [tutorial video#1] ERROR COMMITTER_UPSERT_ERROR - CommitterEvent[connectionTimeout=1000,credentials=Credentials[username=,password=,passwordKey=],discoverNodes=false,dotReplacement=,fixBadIds=false,ignoreResponseErrors=false,indexName=tutorials_videos,jsonFieldsPattern=,socketTimeout=30000,sourceIdField=,targetContentField=content,typeName=,queue=FSQueue[batchSize=20,commitLeftoversOnInit=false,ignoreErrors=false,maxPerFolder=500,retrier=Retrier[exceptionFilter=,maxCauses=10,maxRetries=0,retryDelay=0],splitBatch=OFF],committerContext=CommitterContext[eventManager=com.norconex.commons.lang.event.EventManager@6a5e167a,streamFactory=com.norconex.commons.lang.io.CachedStreamFactory@60e06f7d,workDir=.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0],fieldMappings=com.norconex.committer.core3.batch.queue.CommitterQueueException: Could not process one or more files form committer batch located at C:\Users\pantr\eclipse-workspace\medlineplus-crawler-http.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0\queue\batch-1656600137136000000. Moved them to error directory: C:\Users\pantr\eclipse-workspace\medlineplus-crawler-http.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0\error,restrictions=[],request=UpsertRequest[reference=https://kidshealth.org/en/parents/pilonidal_gips_animation.html]]
10:42:22.129 [tutorial video#1] ERROR CrawlerCommitterService - Could not execute "upsert" on committer: ElasticsearchCommitter[connectionTimeout=1000,credentials=Credentials[username=,password=********,passwordKey=],discoverNodes=false,dotReplacement=,fixBadIds=false,ignoreResponseErrors=false,indexName=tutorials_videos,jsonFieldsPattern=,socketTimeout=30000,sourceIdField=,targetContentField=content,typeName=,queue=FSQueue[batchSize=20,commitLeftoversOnInit=false,ignoreErrors=false,maxPerFolder=500,retrier=Retrier[exceptionFilter=,maxCauses=10,maxRetries=0,retryDelay=0],splitBatch=OFF],committerContext=CommitterContext[eventManager=com.norconex.commons.lang.event.EventManager@6a5e167a,streamFactory=com.norconex.commons.lang.io.CachedStreamFactory@60e06f7d,workDir=.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0],fieldMappings={},restrictions=[]]
com.norconex.committer.core3.batch.queue.CommitterQueueException: Could not process one or more files form committer batch located at C:\Users\pantr\eclipse-workspace\medlineplus-crawler-http.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0\queue\batch-1656600137136000000. Moved them to error directory: C:\Users\pantr\eclipse-workspace\medlineplus-crawler-http.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0\error
at com.norconex.committer.core3.batch.queue.impl.FSQueue.moveUnrecoverableBatchError(FSQueue.java:429) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:364) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeBatchDirectory(FSQueue.java:338) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.queue(FSQueue.java:331) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.AbstractBatchCommitter.doUpsert(AbstractBatchCommitter.java:87) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.AbstractCommitter.upsert(AbstractCommitter.java:215) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.collector.core.crawler.CrawlerCommitterService.lambda$upsert$1(CrawlerCommitterService.java:84) ~[norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.CrawlerCommitterService.executeAll(CrawlerCommitterService.java:129) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.CrawlerCommitterService.upsert(CrawlerCommitterService.java:80) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:30) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:24) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91) [norconex-commons-lang-2.0.0.jar:2.0.0]
at com.norconex.collector.http.crawler.HttpCrawler.executeCommitterPipeline(HttpCrawler.java:388) [norconex-collector-http-3.0.0.jar:3.0.0]
at com.norconex.collector.core.crawler.Crawler.processImportResponse(Crawler.java:681) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.Crawler.processNextQueuedCrawlData(Crawler.java:614) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.Crawler.processNextReference(Crawler.java:556) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.Crawler$ProcessReferencesRunnable.run(Crawler.java:923) [norconex-collector-core-2.0.0.jar:2.0.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: com.norconex.committer.core3.batch.queue.CommitterQueueException: Could not consume batch. Number of attempts: 1
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeRetriableBatch(FSQueue.java:407) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:356) ~[norconex-committer-core-3.0.0.jar:3.0.0]
... 18 more
Caused by: com.norconex.commons.lang.exec.RetriableException: Execution failed, maximum number of retries reached.
at com.norconex.commons.lang.exec.Retrier.execute(Retrier.java:204) ~[norconex-commons-lang-2.0.0.jar:2.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeRetriableBatch(FSQueue.java:395) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:356) ~[norconex-committer-core-3.0.0.jar:3.0.0]
... 18 more
Caused by: com.norconex.committer.core3.CommitterException: Could not commit JSON batch to Elasticsearch.
at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:543) ~[norconex-committer-elasticsearch-5.0.0.jar:5.0.0]
at com.norconex.committer.core3.batch.AbstractBatchCommitter.consume(AbstractBatchCommitter.java:112) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.lambda$consumeRetriableBatch$1(FSQueue.java:398) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.commons.lang.exec.Retrier.execute(Retrier.java:177) ~[norconex-commons-lang-2.0.0.jar:2.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeRetriableBatch(FSQueue.java:395) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:356) ~[norconex-committer-core-3.0.0.jar:3.0.0]
... 18 more
Caused by: java.io.IOException: Host name 'localhost' does not match the certificate subject provided by the peer (CN=*.us-east-1.es.amazonaws.com)
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:901) ~[elasticsearch-rest-client-7.16.2.jar:7.16.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288) ~[elasticsearch-rest-client-7.16.2.jar:7.16.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:276) ~[elasticsearch-rest-client-7.16.2.jar:7.16.2]
at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:537) ~[norconex-committer-elasticsearch-5.0.0.jar:5.0.0]
at com.norconex.committer.core3.batch.AbstractBatchCommitter.consume(AbstractBatchCommitter.java:112) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.lambda$consumeRetriableBatch$1(FSQueue.java:398) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.commons.lang.exec.Retrier.execute(Retrier.java:177) ~[norconex-commons-lang-2.0.0.jar:2.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeRetriableBatch(FSQueue.java:395) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:356) ~[norconex-committer-core-3.0.0.jar:3.0.0]
... 18 more
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Host name 'localhost' does not match the certificate subject provided by the peer (CN=*.us-east-1.es.amazonaws.com)
at org.apache.http.nio.conn.ssl.SSLIOSessionStrategy.verifySession(SSLIOSessionStrategy.java:209) ~[httpasyncclient-4.1.4.jar:4.1.4]
at org.apache.http.nio.conn.ssl.SSLIOSessionStrategy$1.verify(SSLIOSessionStrategy.java:188) ~[httpasyncclient-4.1.4.jar:4.1.4]
at org.apache.http.nio.reactor.ssl.SSLIOSession.doHandshake(SSLIOSession.java:360) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.nio.reactor.ssl.SSLIOSession.isAppInputReady(SSLIOSession.java:523) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:120) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.12.jar:4.4.12]
... 1 more
10:42:22.149 [tutorial video#1] INFO Crawler - Could not process document: https://kidshealth.org/en/parents/pilonidal_gips_animation.html (Could not execute "upsert" on 1 committer(s): "ElasticsearchCommitter". Check the logs for more details.)
