18 search improvements #27

Draft
wants to merge 89 commits into
base: master
9ff8e4f
Bump luceneVersion from 5.3.0 to 8.6.0
dependabot[bot] Aug 11, 2020
48433e5
Update imports and replace deprecated functionality
patrick-austin Jan 11, 2022
60b659b
Enable basic sorted set facets #19
patrick-austin Jan 12, 2022
f3b1dff
Update pom.xml with Facets #19
patrick-austin Jan 12, 2022
fc0d2d3
Merge pull request #20 from icatproject/dependabot/maven/luceneVersio…
patrick-austin Jan 24, 2022
1229a9a
Query on datafile date property. Fixes #8
stuartpullinger Jun 5, 2020
290ad81
Update release notes for 1.1.1 release
MRichards99 Aug 24, 2021
d80515e
[maven-release-plugin] prepare release v1.1.1
MRichards99 Aug 24, 2021
7bf3459
[maven-release-plugin] prepare for next development iteration
MRichards99 Aug 24, 2021
a44db65
[maven-release-plugin] prepare release v1.1.1
stuartpullinger Aug 27, 2021
32f9fbe
Converted setup to python 3
stuartpullinger Jan 14, 2020
b902385
Update icat.utils version
MRichards99 Sep 16, 2021
4416be4
Update version and release notes
MRichards99 Sep 16, 2021
8a36fb0
Add snapshot to version
MRichards99 Sep 16, 2021
f0be663
[maven-release-plugin] prepare release v1.1.2
MRichards99 Sep 16, 2021
0ea7709
Replace travis.yml with ci-build.yml #13
patrick-austin Jan 11, 2022
df3f18a
Update CI status badge for GHA #13
patrick-austin Jan 11, 2022
093a0ff
Move strategy matrix inside build #13
patrick-austin Jan 11, 2022
e81defd
Remove redundant inclue #13
patrick-austin Jan 21, 2022
aec760f
Change OpenJDK distribution #13
patrick-austin Jan 21, 2022
3a4c301
Change Maven command to "mvn test -B" #13
patrick-austin Jan 24, 2022
b5b5d2d
Avoid index error for maxScore
patrick-austin Feb 2, 2022
3ecdaac
Add synonym injection on search #16
patrick-austin Jan 11, 2022
bcf46af
Avoid index error for maxScore
patrick-austin Feb 2, 2022
2046da5
Handle facet exceptions from server tests #19
patrick-austin Feb 10, 2022
7c12768
Add script to generate synonyms from csv #16
patrick-austin Feb 11, 2022
b32f3aa
Take equivalent labels into account #16
patrick-austin Feb 12, 2022
3b5fd8c
Change order of terms in tests #16
patrick-austin Feb 12, 2022
fea2d47
Replace searcherManager with readerManager #19
patrick-austin Mar 9, 2022
fee6356
Merge branch 'master' into dependabot/maven/luceneVersion-8.6.0
patrick-austin Mar 23, 2022
a4a822b
Enable sorting of string fields #25
patrick-austin Mar 24, 2022
8eda4ca
Add support for fields and searchAfter #25
patrick-austin Mar 26, 2022
851cedb
Implement incremental sharding #26
patrick-austin Apr 4, 2022
bb53a1c
Merge branch '18_search_improvements' into 25_enable_field_sorting
patrick-austin Apr 4, 2022
cf74dc8
Merge pull request #28 from icatproject/25_enable_field_sorting
patrick-austin Apr 4, 2022
04ae002
Merge branch '18_search_improvements' into 26_multireader_subindices
patrick-austin Apr 4, 2022
e7b47db
Merge pull request #29 from icatproject/26_multireader_subindices
patrick-austin Apr 4, 2022
9477ea8
Rename JSON keys for clarity over id #18
patrick-austin Apr 6, 2022
434b66b
Text fields and related entities #30
patrick-austin Apr 8, 2022
41daae5
Merge branch '18_search_improvements' into 19_enable_facets
patrick-austin Apr 8, 2022
f1801b0
Merge pull request #31 from icatproject/30_encode_related_ids
patrick-austin Apr 12, 2022
fbc99e6
Enable generic String and range facets #19
patrick-austin Apr 13, 2022
8907a7c
Basic unit conversion #19
patrick-austin Apr 14, 2022
8438e1f
Add unit conversion dependencies #19
patrick-austin Apr 14, 2022
45a3948
Refactor unit conversion to utils #19
patrick-austin Apr 30, 2022
8856738
Use mapping for parseSearchAfter types #19
patrick-austin May 25, 2022
008c68a
WIP sharding changes from stash #19
patrick-austin Jun 1, 2022
49373f5
Add fields needed for DGS component #19
patrick-austin Jun 8, 2022
2fc0f8e
Use .keyword for string facets #19
patrick-austin Jun 10, 2022
757da57
Filters and aborted search support #19
patrick-austin Jun 16, 2022
973d31c
Allow searchAfter for uneven shards #19
patrick-austin Jun 16, 2022
b3d4c52
Sparse string faceting fix #19
patrick-austin Jun 15, 2022
663ea42
Enable parsing of multivalued filters #19
patrick-austin Jun 17, 2022
eaafc89
Refactors and Javadoc comments #19
patrick-austin Jun 20, 2022
4913230
Support for searching on sample name #19
patrick-austin Jun 22, 2022
338dda3
SampleParameter, fileCount, value in range #19
patrick-austin Jul 22, 2022
ce51e33
Add utility to lock #19
patrick-austin Aug 2, 2022
5f59e1d
Formatting changes #19
patrick-austin Jul 24, 2022
902654b
Improved timeout and search syntax errors #19
patrick-austin Aug 5, 2022
1eac7e0
Error handling fix and range check for lock #19
patrick-austin Aug 9, 2022
182b5e5
Fix shardList not accepting new shards #19
patrick-austin Aug 17, 2022
cd37717
Merge pull request #22 from icatproject/19_enable_facets
patrick-austin Aug 17, 2022
2a24bf7
Merge branch '18_search_improvements' into 16_enable_synonyms
patrick-austin Aug 17, 2022
eabef14
Merge branch '18_search_improvements' into 16_enable_synonyms
patrick-austin Aug 17, 2022
d8d1e76
Move synonym analyzer to DocumentMapping #16
patrick-austin Aug 17, 2022
32c2f33
Add support for faceting DatasetTechnique #18
patrick-austin Sep 7, 2022
d051925
Update version #18
patrick-austin Sep 9, 2022
2e359ee
Refactor Field and large Lucene functions #18
patrick-austin Sep 29, 2022
4a7e9db
run.properties settings updates #18
patrick-austin Oct 12, 2022
deceb46
Merge branch '18_search_improvements' into 16_enable_synonyms
patrick-austin Oct 17, 2022
7e53648
parse_synonyms clean up and check for null synonyms #16
patrick-austin Oct 17, 2022
c790b5d
Remove returns from Field.java #18
patrick-austin Oct 21, 2022
8662e05
Update Lucene to 8.11.2 and remove search caching #18
patrick-austin Nov 24, 2022
885b876
Replace numRamDocs with hasUncommittedChanges #18
patrick-austin Nov 24, 2022
ee9da02
Cache state for facets #18
patrick-austin Jan 20, 2023
421020b
InvestigationFacilityCycle support
patrick-austin Jan 23, 2023
0a2f653
Merge pull request #34 from icatproject/18_memory_leaks
patrick-austin Sep 6, 2023
1e8ea2b
Merge pull request #38 from icatproject/18b_store_state
patrick-austin Sep 6, 2023
4a511a9
Merge pull request #21 from icatproject/16_enable_synonyms
patrick-austin Sep 6, 2023
65a1c44
Merge branch 'master' into 18_search_improvements
patrick-austin Sep 6, 2023
3ce34c6
Replace javax with jakarta in new files
patrick-austin Sep 6, 2023
453a725
3.0.0 release notes
patrick-austin Sep 8, 2023
d31e5b7
Index id as long instead of String #18
patrick-austin Sep 26, 2023
3dc957a
Refactor facetable fields into run.properties #18
patrick-austin Sep 28, 2023
c9f2154
Add short explanations of new properties #18
patrick-austin Oct 5, 2023
b6d3e60
Add special handling for InvestigationInstrument filters #18
patrick-austin Oct 6, 2023
61301a2
Fix for Investigation Sample filtering #18
patrick-austin Oct 10, 2023
e3f393e
Account for IcatUnits refactors
patrick-austin Mar 22, 2024
bcbe497
Add new properties to init logging
patrick-austin Apr 8, 2024
193 changes: 193 additions & 0 deletions src/main/java/org/icatproject/lucene/Field.java
@@ -0,0 +1,193 @@
package org.icatproject.lucene;

import javax.json.JsonObject;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoublePoint;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetField;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;

/**
 * Wrapper for the name, value and type (String/Text, long, double) of a field
 * to be added to a Lucene Document.
 */
class Field {

    private abstract class InnerField {

        public abstract Document addSortable(Document document) throws NumberFormatException;

        public abstract Document addToDocument(Document document) throws NumberFormatException;

    }

    private class InnerStringField extends InnerField {

        private String value;

        public InnerStringField(String value) {
            this.value = value;
        }

        @Override
        public Document addSortable(Document document) throws NumberFormatException {
            if (DocumentMapping.sortFields.contains(name)) {
                if (name.equals("id")) {
                    // Id is a special case, as we need it to be SORTED as a byte ref to allow joins
                    // but also SORTED_NUMERIC to ensure a deterministic order to results
                    // Long.parseLong replaces the deprecated Long(String) constructor
                    long longValue = Long.parseLong(value);
                    document.add(new NumericDocValuesField("id.long", longValue));
                    document.add(new StoredField("id.long", longValue));
                    document.add(new LongPoint("id.long", longValue));
                }
                document.add(new SortedDocValuesField(name, new BytesRef(value)));
            }
            return document;
        }

        @Override
        public Document addToDocument(Document document) throws NumberFormatException {
            addSortable(document);

            if (DocumentMapping.facetFields.contains(name)) {
                document.add(new SortedSetDocValuesFacetField(name + ".keyword", value));
                document.add(new StringField(name + ".keyword", value, Store.NO));
            }

            if (DocumentMapping.textFields.contains(name)) {
                document.add(new TextField(name, value, Store.YES));
            } else {
                document.add(new StringField(name, value, Store.YES));
            }

            return document;
        }

    }

    private class InnerLongField extends InnerField {

        private long value;

        public InnerLongField(long value) {
            this.value = value;
        }

        @Override
        public Document addSortable(Document document) throws NumberFormatException {
            if (DocumentMapping.sortFields.contains(name)) {
                document.add(new NumericDocValuesField(name, value));
            }
            return document;
        }

        @Override
        public Document addToDocument(Document document) throws NumberFormatException {
            addSortable(document);
            document.add(new LongPoint(name, value));
            document.add(new StoredField(name, value));
            return document;
        }

    }

    private class InnerDoubleField extends InnerField {

        private double value;

        public InnerDoubleField(double value) {
            this.value = value;
        }

        @Override
        public Document addSortable(Document document) throws NumberFormatException {
            if (DocumentMapping.sortFields.contains(name)) {
                long sortableLong = NumericUtils.doubleToSortableLong(value);
                document.add(new NumericDocValuesField(name, sortableLong));
            }
            return document;
        }

        @Override
        public Document addToDocument(Document document) throws NumberFormatException {
            addSortable(document);
            document.add(new DoublePoint(name, value));
            document.add(new StoredField(name, value));
            return document;
        }

    }

    private String name;
    private InnerField innerField;

    /**
     * Creates a wrapper for a Field.
     *
     * @param object JsonObject containing representations of multiple fields
     * @param key    Key of a specific field in object
     */
    public Field(JsonObject object, String key) {
        name = key;
        if (DocumentMapping.doubleFields.contains(name)) {
            innerField = new InnerDoubleField(object.getJsonNumber(name).doubleValue());
        } else if (DocumentMapping.longFields.contains(name)) {
            innerField = new InnerLongField(object.getJsonNumber(name).longValueExact());
        } else {
            innerField = new InnerStringField(object.getString(name));
        }
    }

    /**
     * Creates a wrapper for a Field.
     *
     * @param indexableField A Lucene IndexableField
     */
    public Field(IndexableField indexableField) {
        name = indexableField.name();
        if (DocumentMapping.doubleFields.contains(name)) {
            innerField = new InnerDoubleField(indexableField.numericValue().doubleValue());
        } else if (DocumentMapping.longFields.contains(name)) {
            innerField = new InnerLongField(indexableField.numericValue().longValue());
        } else {
            innerField = new InnerStringField(indexableField.stringValue());
        }
    }

    /**
     * Adds a sortable field to the passed document. This only accounts for
     * sorting; if storage and searchability are also needed, see
     * {@link #addToDocument}. The exact implementation depends on whether this is
     * a String, long or double field.
     *
     * @param document The document to add to
     * @return The original document with this field added to it
     * @throws NumberFormatException if the id value cannot be parsed as a long
     */
    public Document addSortable(Document document) throws NumberFormatException {
        return innerField.addSortable(document);
    }

    /**
     * Adds this field to the passed document. This accounts for sortable and
     * facetable fields. The exact implementation depends on whether this is a
     * String, long or double field.
     *
     * @param document The document to add to
     * @return The original document with this field added to it
     * @throws NumberFormatException if the id value cannot be parsed as a long
     */
    public Document addToDocument(Document document) throws NumberFormatException {
        return innerField.addToDocument(document);
    }

}
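
Note on the two sort encodings in Field.java: the `id` special case and the double handling both exist because the value as stored does not sort the way callers expect. The sketch below is a stdlib-only illustration, not code from this PR. The class and method names are hypothetical, the string comparison stands in for Lucene's byte-wise ordering of a BytesRef (equivalent for ASCII digits), and the body of `doubleToSortableLong` reflects my understanding of what `NumericUtils.doubleToSortableLong` does rather than a copy of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; nothing here is part of icat.lucene.
public class SortableEncodingSketch {

    // Assumed equivalent of NumericUtils.doubleToSortableLong: for negative
    // doubles, flip the 63 non-sign bits so that signed long comparison
    // agrees with double comparison.
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    // Ids ordered as text, as a sorted byte ref of digits would be.
    static List<String> sortAsStrings(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Ids ordered numerically, as the extra "id.long" field allows.
    // Throws NumberFormatException if an id is not a valid long.
    static List<String> sortAsLongs(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.comparingLong((String id) -> Long.parseLong(id)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("9", "10", "100");
        System.out.println(sortAsStrings(ids)); // prints [10, 100, 9]
        System.out.println(sortAsLongs(ids));   // prints [9, 10, 100]

        // Raw IEEE 754 bits order negative doubles backwards; the encoded
        // longs compare in the same order as the doubles themselves.
        System.out.println(doubleToSortableLong(-2.5) < doubleToSortableLong(-1.0)); // prints true
        System.out.println(doubleToSortableLong(-1.0) < doubleToSortableLong(3.0));  // prints true
    }
}
```

The text ordering puts "10" before "9", which is why a numeric copy of the id is useful for result ordering even though the byte ref form is kept for joins.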
13 changes: 0 additions & 13 deletions src/main/java/org/icatproject/lucene/IcatAnalyzer.java
@@ -8,21 +8,8 @@
 import org.apache.lucene.analysis.en.EnglishAnalyzer;
 import org.apache.lucene.analysis.en.EnglishPossessiveFilter;
 import org.apache.lucene.analysis.en.PorterStemFilter;
-// import org.apache.lucene.analysis.standard.StandardAnalyzer ;
 import org.apache.lucene.analysis.standard.StandardTokenizer;
 
-// public class IcatAnalyzer extends Analyzer {
-
-// @Override
-// protected TokenStreamComponents createComponents(String fieldName) {
-// StandardAnalyzer analyzer = new StandardAnalyzer(EnglishAnalyzer.ENGLISH_STOP_WORDS_SET);
-// Analyzer.TokenStreamComponents stream = analyzer.createComponents(fieldName);
-// sink = new EnglishPossessiveFilter(stream.getTokenStream());
-// sink = new PorterStemFilter(sink);
-// return new TokenStreamComponents(source, sink);
-// }
-// }
-
 public class IcatAnalyzer extends Analyzer {
 
 @Override