Spark-Solr can't load non-stored multivalued fields with docValues=true and useDocValuesAsStored=true #307

Open
uyilmaz opened this issue Dec 12, 2020 · 1 comment

uyilmaz commented Dec 12, 2020

Environment: Solr 8.4.0, Spark-Solr 3.6.1, Spark: 2.11

When a field is configured with:

stored="false" docValues="true" useDocValuesAsStored="true"

in Solr, you can retrieve it in query results even though it is not stored; docValues are used instead. This also works in spark-solr, except for fields with multiValued=true.

SolrJ and the regular Solr API can return such fields, but when we read them with spark-solr:

val s1 = Map(
  "zkHost" -> "myZK",
  "collection" -> "myCollection",
  "query" -> "multivaluedField:[* TO *]",
  "fields" -> "multivaluedField",
  "max_rows" -> "100000",
  "flatten_multivalued" -> "false"
)

val data = spark.read.format("solr").options(s1).load

data.createOrReplaceTempView("myTable")

This results in:
data: org.apache.spark.sql.DataFrame = [id: string]
Notice that multivaluedField is not resolved: only id appears in the schema.
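
To make the symptom easy to check programmatically, here is a minimal sketch that compares the requested "fields" option against the columns the DataFrame actually resolved. FieldCheck, missingFields, and parseFields are hypothetical helper names introduced only for illustration; they are not part of spark-solr.

```scala
// Hypothetical helper (not part of spark-solr): reports which requested
// fields are absent from the columns resolved by the DataFrame.
object FieldCheck {
  // Splits a comma-separated "fields" option into trimmed, non-empty names.
  def parseFields(option: String): Seq[String] =
    option.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // Returns the requested fields that do not appear among the resolved columns.
  def missingFields(requested: Seq[String], resolved: Seq[String]): Seq[String] = {
    val present = resolved.toSet
    requested.filterNot(present.contains)
  }
}

// Usage against the DataFrame above (requires a live Solr collection):
//   val missing = FieldCheck.missingFields(
//     FieldCheck.parseFields(s1("fields")), data.columns.toSeq)
//   if (missing.nonEmpty) println(s"Unresolved fields: ${missing.mkString(", ")}")
```

With the schema reported above, this check would flag multivaluedField as unresolved.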

This is a serious issue in my opinion, because it prevents you from using the streaming method when you need multiValued fields in an RDD.

uyilmaz commented Dec 18, 2020

In addition to the above, when you specify a streaming expression instead of a query, like this:

val s1 = Map(
  "zkHost" -> "myZK",
  "collection" -> "myCollection",
  // Triple quotes keep the inner double quotes of the expression intact
  "expr" -> """search(myCollection,q="multivaluedField:[* TO *]",qt="/export",fl="multivaluedField,id",sort="id asc")""",
  "max_rows" -> "100000",
  "flatten_multivalued" -> "false"
)

the "flatten_multivalued" parameter has no effect: multivalued fields always get flattened.
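
One way to confirm whether a field was flattened is to inspect the column type in the resulting schema. The sketch below assumes that an unflattened multivalued field shows up as an ArrayType column, which I have not verified against the spark-solr source. MultiValuedCheck and isArrayColumn are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.types.{ArrayType, StructType}

// Hypothetical helper (not part of spark-solr).
object MultiValuedCheck {
  // True if the named column exists and has an array type.
  // Assumption: an unflattened multivalued Solr field maps to ArrayType.
  def isArrayColumn(schema: StructType, name: String): Boolean =
    schema.fields.exists(f => f.name == name && f.dataType.isInstanceOf[ArrayType])
}

// Usage with a DataFrame loaded from the options above (requires a live Solr collection):
//   val data = spark.read.format("solr").options(s1).load
//   println(MultiValuedCheck.isArrayColumn(data.schema, "multivaluedField"))
```

If this returns false even though "flatten_multivalued" is set to "false", the option was not honored for the expression-based read.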
