
Socket timeout when querying solr #346

Open
TreyRhodes1 opened this issue Mar 16, 2022 · 2 comments
Comments


TreyRhodes1 commented Mar 16, 2022

We recently upgraded from the spark-solr jar 3.2.3 to 4.0.0 and upgraded our Spark version to 3.1.2. Our application now runs on Kubernetes instead of YARN. Our workloads complete successfully most of the time; when they do fail, it's because our application has exhausted all retries to query Solr. This is the full error:

22/03/16 16:46:58 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) (10.180.38.5 executor 1): java.lang.RuntimeException: org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: https://solrserv:9985/solr/collection_shard1_replica5
	at com.lucidworks.spark.query.StreamingResultsIterator.hasNext(StreamingResultsIterator.java:87)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:43)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:511)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
	at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
	at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.agg_doAggregateWithKeys_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: https://solrserv:9985/solr/collection_shard1_replica5
	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:692)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231)
	at com.lucidworks.spark.util.SolrQuerySupport$.queryAndStreamResponsePost(SolrQuerySupport.scala:180)
	at com.lucidworks.spark.util.SolrQuerySupport$.querySolr(SolrQuerySupport.scala:209)
	at com.lucidworks.spark.util.SolrQuerySupport.querySolr(SolrQuerySupport.scala)
	at com.lucidworks.spark.query.StreamingResultsIterator.fetchNextPage(StreamingResultsIterator.java:107)
	at com.lucidworks.spark.query.StreamingResultsIterator.hasNext(StreamingResultsIterator.java:80)
	... 35 more
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
	at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
	at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
	at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
	at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
	at shaded.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
	at shaded.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
	at shaded.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
	at shaded.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
	at shaded.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
	at shaded.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
	at shaded.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
	at shaded.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
	at shaded.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
	at shaded.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
	at shaded.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
	at shaded.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	at shaded.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at shaded.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at shaded.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at shaded.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at shaded.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:571)
	... 44 more

Once this error has shown up enough times for one query, it causes the entire application to fail.

This error only seems to happen when our Solr queries take longer than a threshold, which appears to be 120000 ms (two minutes). We've determined that ZooKeeper and Solr do nothing to sever the connection between Solr and our client application, so I suspect a timeout is set somewhere in our application or its dependent jars, specifically spark-solr. Is there a configuration somewhere that times out Solr queries that take too long?
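For context on what the trace means: a `java.net.SocketTimeoutException: Read timed out` is raised by the client when its own configured read (socket) timeout expires while waiting for a response, not by the server closing the connection, which is consistent with the observation that neither ZooKeeper nor Solr severs the connection. A minimal, self-contained sketch of the mechanism (the class and method names here are hypothetical, not spark-solr code):

```java
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutDemo {
    // Connects to a local server that never responds and returns the
    // exception message seen when the client-side read timeout fires.
    static String readWithTimeout(int timeoutMs) throws Exception {
        try (ServerSocket server = new ServerSocket(0);        // silent server
             Socket client = new Socket("localhost", server.getLocalPort())) {
            client.setSoTimeout(timeoutMs);                    // client-side read timeout, ms
            try {
                client.getInputStream().read();                // blocks until timeout
                return "no timeout";
            } catch (SocketTimeoutException e) {
                return e.getMessage();                         // typically "Read timed out"
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readWithTimeout(200));
    }
}
```

The server is never asked to do anything; the exception is produced entirely by the client's `setSoTimeout` value, which is why raising the client-side timeout (as in the fix below in this thread) changes the behavior even though the server is unchanged.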

TreyRhodes1 (Author) commented

I was able to resolve this issue by updating line 177 of src/main/scala/com/lucidworks/spark/util/SolrSupport.scala with the following code:

new HttpSolrClient.Builder()
      .withBaseSolrUrl(shardUrl)
      .withHttpClient(getCachedCloudClient(zkHost).getHttpClient)
      .withConnectionTimeout(240000) // new
      .withSocketTimeout(600000)     // new
      .build()

Is there a way to set these configurations without modifying the source code?


4d1in3 commented Dec 12, 2023

We are also facing the same issue while updating records in a huge collection. Is there any workaround other than changing the code?
