We recently upgraded the spark-solr jar from 3.2.3 to 4.0.0 and our Spark version to 3.1.2. Our application now runs on Kubernetes instead of YARN. Our workloads complete successfully most of the time; when they do fail, it is because the application has exhausted all of its retries querying Solr. This is the full error:
22/03/16 16:46:58 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) (10.180.38.5 executor 1): java.lang.RuntimeException: org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: https://solrserv:9985/solr/collection_shard1_replica5
at com.lucidworks.spark.query.StreamingResultsIterator.hasNext(StreamingResultsIterator.java:87)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:43)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:511)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
at scala.collection.Iterator$ConcatIterator.hasNext(Iterator.scala:222)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.agg_doAggregateWithKeys_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: https://solrserv:9985/solr/collection_shard1_replica5
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:692)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231)
at com.lucidworks.spark.util.SolrQuerySupport$.queryAndStreamResponsePost(SolrQuerySupport.scala:180)
at com.lucidworks.spark.util.SolrQuerySupport$.querySolr(SolrQuerySupport.scala:209)
at com.lucidworks.spark.util.SolrQuerySupport.querySolr(SolrQuerySupport.scala)
at com.lucidworks.spark.query.StreamingResultsIterator.fetchNextPage(StreamingResultsIterator.java:107)
at com.lucidworks.spark.query.StreamingResultsIterator.hasNext(StreamingResultsIterator.java:80)
... 35 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
at shaded.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at shaded.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at shaded.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
at shaded.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at shaded.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at shaded.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at shaded.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at shaded.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
at shaded.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at shaded.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at shaded.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at shaded.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at shaded.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at shaded.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at shaded.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at shaded.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at shaded.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:571)
... 44 more
Once this error has appeared enough times for a single query, the entire application fails.
The error only seems to occur when our Solr queries run longer than a threshold, which appears to be 120000 ms (120 seconds). We've confirmed that neither ZooKeeper nor Solr does anything to sever the connection between Solr and our client application, so I suspect a timeout is set somewhere in our application or its dependent jars, most likely spark-solr. Is there a configuration somewhere that times out Solr queries that take too long?
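For reference, the innermost cause in the trace, java.net.SocketTimeoutException: Read timed out, is the JVM-level socket read timeout (SO_TIMEOUT) firing while the HTTP client is blocked waiting for Solr's response bytes, which is consistent with some layer (presumably the SolrJ HttpSolrClient that spark-solr builds internally) having a fixed read timeout near 120000 ms. The mechanism can be reproduced with nothing but the JDK; this sketch uses an arbitrary 500 ms timeout and a dummy local server that accepts a connection but never responds, standing in for a Solr node still working on a long query:

```java
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    public static void main(String[] args) throws Exception {
        // A listening socket that never writes a response; the TCP handshake
        // still completes via the listen backlog, so the client's read blocks.
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort())) {
            // SO_TIMEOUT bounds how long a single read() may block waiting
            // for bytes; this is the setting whose ~120000 ms value we
            // suspect is buried in the client stack.
            client.setSoTimeout(500); // 500 ms for the demo
            try {
                client.getInputStream().read(); // blocks until the timeout fires
                System.out.println("unexpected: got data");
            } catch (SocketTimeoutException e) {
                // The JDK raises this with the message "Read timed out",
                // matching the bottom of the stack trace above.
                System.out.println("caught SocketTimeoutException");
            }
        }
    }
}
```

If spark-solr exposes (or could be made to honor) the equivalent of SolrJ's socket/connection timeout settings, raising that value above our longest query time would presumably avoid these failures.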