
Socket timeout when querying solr #346

Open
TreyRhodes1 opened this issue Mar 16, 2022 · 2 comments
Comments


TreyRhodes1 commented Mar 16, 2022

We recently upgraded from the spark-solr jar 3.2.3 to 4.0.0 and upgraded our Spark version to 3.1.2. Our application now runs on Kubernetes instead of YARN. Our workloads complete successfully most of the time; when they do fail, it's because our application has exhausted all retries to query Solr. This is the full error:

22/03/16 16:46:58 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) (10.180.38.5 executor 1): java.lang.RuntimeException: org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: https://solrserv:9985/solr/collection_shard1_replica5
	at com.lucidworks.spark.query.StreamingResultsIterator.hasNext(StreamingResultsIterator.java:87)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:43)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:511)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
	at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
	at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.agg_doAggregateWithKeys_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: https://solrserv:9985/solr/collection_shard1_replica5
	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:692)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231)
	at com.lucidworks.spark.util.SolrQuerySupport$.queryAndStreamResponsePost(SolrQuerySupport.scala:180)
	at com.lucidworks.spark.util.SolrQuerySupport$.querySolr(SolrQuerySupport.scala:209)
	at com.lucidworks.spark.util.SolrQuerySupport.querySolr(SolrQuerySupport.scala)
	at com.lucidworks.spark.query.StreamingResultsIterator.fetchNextPage(StreamingResultsIterator.java:107)
	at com.lucidworks.spark.query.StreamingResultsIterator.hasNext(StreamingResultsIterator.java:80)
	... 35 more
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
	at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
	at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
	at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
	at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
	at shaded.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
	at shaded.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
	at shaded.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
	at shaded.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
	at shaded.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
	at shaded.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
	at shaded.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
	at shaded.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
	at shaded.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
	at shaded.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
	at shaded.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
	at shaded.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	at shaded.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at shaded.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at shaded.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at shaded.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at shaded.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:571)
	... 44 more

Once this error has shown up enough times for one query, it causes the entire application to fail.

This error only seems to happen when our Solr queries take longer than a threshold, which appears to be 120000 ms (two minutes). We've determined that ZooKeeper and Solr do nothing to sever the connection between Solr and our client application, so I suspect a timeout is set somewhere in our application or its dependent jars, specifically spark-solr. Is there a configuration somewhere that times out Solr queries that take too long?
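For context on what the trace means: a `java.net.SocketTimeoutException: Read timed out` is raised by the client when its own configured read (socket) timeout expires while waiting for a response, not by the server closing the connection, which is consistent with the observation that neither ZooKeeper nor Solr severs the connection. A minimal, self-contained sketch of the mechanism (the class and method names here are hypothetical, not spark-solr code):

```java
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutDemo {
    // Connects to a local server that never responds and returns the
    // exception message seen when the client-side read timeout fires.
    static String readWithTimeout(int timeoutMs) throws Exception {
        try (ServerSocket server = new ServerSocket(0);        // silent server
             Socket client = new Socket("localhost", server.getLocalPort())) {
            client.setSoTimeout(timeoutMs);                    // client-side read timeout, ms
            try {
                client.getInputStream().read();                // blocks until timeout
                return "no timeout";
            } catch (SocketTimeoutException e) {
                return e.getMessage();                         // typically "Read timed out"
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readWithTimeout(200));
    }
}
```

The server is never asked to do anything; the exception is produced entirely by the client's `setSoTimeout` value, which is why raising the client-side timeout (as in the fix below in this thread) changes the behavior even though the server is unchanged.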

TreyRhodes1 (Author) commented

I was able to resolve this issue by updating line 177 of src/main/scala/com/lucidworks/spark/util/SolrSupport.scala with the following code:

new HttpSolrClient.Builder()
      .withBaseSolrUrl(shardUrl)
      .withHttpClient(getCachedCloudClient(zkHost).getHttpClient)
      .withConnectionTimeout(240000) // new
      .withSocketTimeout(600000)     // new
      .build()

Is there a way to set these configurations without modifying the source code?


4d1in3 commented Dec 12, 2023

We are also facing the same issue while updating records in a huge collection. Is there any workaround other than changing the code?
