Features/5905 optimze spatial cross land #141

Merged · 18 commits · Sep 23, 2024

Changes from 12 commits
8 changes: 8 additions & 0 deletions README.md
@@ -51,6 +51,14 @@ Start a local instance of indexer
$ docker-compose -f docker-compose-dev.yaml up # [-d: in daemon mode | --build: to see the console logs]
```

### Notes on calculation - Centroid
We pre-calculate the centroid point of each spatial extent using the following method (a sketch follows this list):
1. Load a shapefile that contains the land-only area, simplified to reduce line complexity.
2. Subtract the land area from the spatial extent.
3. Cut the remaining spatial extent into a grid, which may produce multiple polygons.
4. For each grid cell, calculate the centroid of its polygon; if the centroid falls outside the polygon (e.g. a U shape), use an interior point instead.
5. Store the resulting points in the centroid attribute.
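
A minimal sketch of step 4, assuming the JTS geometry API that GeoTools builds on; the helper name is hypothetical and the actual implementation in this repository may differ:

```java
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.geom.Point;

public final class CentroidSketch {
    // Prefer the true centroid, but fall back to a guaranteed-interior point
    // when the centroid falls outside the polygon (e.g. a U-shaped extent).
    public static Point centroidOrInteriorPoint(Geometry polygon) {
        Point centroid = polygon.getCentroid();
        return polygon.contains(centroid) ? centroid : polygon.getInteriorPoint();
    }
}
```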

### Endpoints:

| Description | Endpoints | Environment |
4 changes: 4 additions & 0 deletions indexer/pom.xml
@@ -145,6 +145,10 @@
<groupId>org.geotools</groupId>
<artifactId>gt-referencing</artifactId>
</dependency>
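<!-- gt-shapefile: presumably added to read the land shapefile used for centroid pre-calculation (see README notes above) -->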
<dependency>
<groupId>org.geotools</groupId>
<artifactId>gt-shapefile</artifactId>
</dependency>
<!-- https://mvnrepository.com/artifact/org.geotools/gt-epsg-hsql -->
<dependency>
<groupId>org.geotools</groupId>
au/org/aodn/esindexer/configuration/IndexerConfig.java
@@ -1,5 +1,6 @@
package au.org.aodn.esindexer.configuration;

import au.org.aodn.esindexer.utils.GeometryUtils;
import au.org.aodn.esindexer.utils.VocabsIndexUtils;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
@@ -12,10 +13,19 @@
import java.util.concurrent.*;
import org.springframework.beans.factory.annotation.Value;

import javax.annotation.PostConstruct;

@Configuration
@EnableRetry
@EnableAsync
public class IndexerConfig {
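// Grid cell size used by GeometryUtils when gridding spatial extents for
// centroid pre-calculation; configurable via app.geometry.gridLandSize.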
@Value("${app.geometry.gridLandSize:10.0}")
protected double cellSize;

@PostConstruct
public void init() {
GeometryUtils.setCellSize(cellSize);
}
/**
* We need to create the component here because we do not want to run tests with a real http connection
* that depends on a remote site. The test config needs to create an instance of the bean for testing
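
The `GeometryUtils.setCellSize` call above implies a static field on the utility class; a plausible sketch of just the cell-size wiring (the real class also hosts `createGeometryItems` and the centroid helpers shown later in this diff, and may differ):

```java
public final class GeometryUtils {
    // Static so utility code outside the Spring context can share the
    // configured grid size; 10.0 mirrors the property default.
    private static double cellSize = 10.0;

    public static void setCellSize(double size) { cellSize = size; }
    public static double getCellSize() { return cellSize; }
}
```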
@@ -28,9 +28,9 @@ protected void deleteIndexStore(String indexName) {
try {
BooleanResponse response = portalElasticsearchClient.indices().exists(b -> b.index(indexName));
if (response.value()) {
log.info("Deleting index: " + indexName);
log.info("Deleting index: {}", indexName);
portalElasticsearchClient.indices().delete(b -> b.index(indexName));
log.info("Index: " + indexName + " deleted");
log.info("Index: {} deleted", indexName);
}
} catch (ElasticsearchException | IOException e) {
throw new DeleteIndexException("Failed to delete index: " + indexName + " | " + e.getMessage());
@@ -504,6 +504,7 @@ private String getUUID(int index) {
};
}
else {
logger.warn("Query return empty: {}", response.toString());
throw new MetadataNotFoundException("Unable to find any metadata records in GeoNetwork");
}
}
@@ -299,80 +299,92 @@ public List<BulkResponse> indexAllMetadataRecordsFromGeoNetwork(String beginWith

long dataSize = 0;
long total = 0;
for (String metadataRecord : geoNetworkResourceService.getAllMetadataRecords(beginWithUuid)) {
if(metadataRecord != null) {
try {
// get mapped metadata values from GeoNetwork to STAC collection schema
final StacCollectionModel mappedMetadataValues = this.getMappedMetadataValues(metadataRecord);
int size = indexerObjectMapper.writeValueAsBytes(mappedMetadataValues).length;

// We need to split the batch into smaller size to avoid data too large error in ElasticSearch,
// the limit is 10mb, so to make check before document add and push batch if size is too big
//
// dataSize = 0 is init case, just in case we have a very big doc that exceed the limit
// and we have not add it to the bulkRequest, hardcode to 5M which should be safe,
// usually it is 5M - 15M
//
if(dataSize + size > 5242880 && dataSize != 0) {
if(callback != null) {
callback.onProgress(String.format("Execute batch as bulk request is big enough %s", dataSize + size));
}

results.add(self.executeBulk(bulkRequest, callback));

dataSize = 0;
bulkRequest = new BulkRequest.Builder();
}
// Add item to bulk request to Elasticsearch
bulkRequest.operations(op -> op
.index(idx -> idx
.id(mappedMetadataValues.getUuid())
.index(indexName)
.document(mappedMetadataValues)
)
);
dataSize += size;
total++;

if(callback != null) {
callback.onProgress(
String.format(
"Add uuid %s to batch, batch size is %s, total is %s",
mappedMetadataValues.getUuid(),
dataSize,
total)
try {
for (String metadataRecord : geoNetworkResourceService.getAllMetadataRecords(beginWithUuid)) {
if (metadataRecord != null) {
try {
// get mapped metadata values from GeoNetwork to STAC collection schema
final StacCollectionModel mappedMetadataValues = this.getMappedMetadataValues(metadataRecord);
int size = indexerObjectMapper.writeValueAsBytes(mappedMetadataValues).length;

// We split the batch into smaller pieces to avoid a "data too large" error
// in Elasticsearch; the limit is 10MB, so we check before adding each
// document and flush the batch first if it would grow too big. For example,
// if dataSize is already 4.8MB and the next document is 0.5MB, the batch
// is flushed before the document is added.
//
// dataSize == 0 is the initial case: a single very large document that
// exceeds the limit is still added to the bulkRequest on its own. The
// threshold is hardcoded to 5MB, which should be safe; sizes usually run
// 5MB to 15MB.
//
if (dataSize + size > 5242880 && dataSize != 0) {
if (callback != null) {
callback.onProgress(String.format("Execute batch as bulk request is big enough %s", dataSize + size));
}

results.add(self.executeBulk(bulkRequest, callback));

dataSize = 0;
bulkRequest = new BulkRequest.Builder();
}
// Add item to bulk request to Elasticsearch
bulkRequest.operations(op -> op
.index(idx -> idx
.id(mappedMetadataValues.getUuid())
.index(indexName)
.document(mappedMetadataValues)
)
);
}
dataSize += size;
total++;

if (callback != null) {
callback.onProgress(
String.format(
"Add uuid %s to batch, batch size is %s, total is %s",
mappedMetadataValues.getUuid(),
dataSize,
total)
);
}

} catch (FactoryException | JAXBException | TransformException | NullPointerException e) {
/*
* it will reach here if cannot extract values of all the keys in GeoNetwork metadata JSON
* or ID is not found, which is fatal.
*/
log.error("Error extracting values from GeoNetwork metadata JSON: {}", metadataRecord);
if(callback != null) {
callback.onProgress(
String.format(
"WARNING - Skip %s due to transform error -> %s",
metadataRecord,
e.getMessage()
));
} catch (FactoryException | JAXBException | TransformException | NullPointerException e) {
/*
* Execution reaches here if values cannot be extracted for all the keys in
* the GeoNetwork metadata JSON, or the ID is not found, which is fatal.
*/
log.error("Error extracting values from GeoNetwork metadata JSON: {}", metadataRecord);
if (callback != null) {
callback.onProgress(
String.format(
"WARNING - Skip %s due to transform error -> %s",
metadataRecord,
e.getMessage()
));
}
}
}
}
}
// In case there are residual
BulkResponse temp = self.executeBulk(bulkRequest, callback);
results.add(temp);

// In case there are residual documents in the final batch
BulkResponse temp = self.executeBulk(bulkRequest, callback);
results.add(temp);
if(callback != null) {
callback.onComplete(temp);
}

if(callback != null) {
callback.onComplete(temp);
// TODO now processing for record_suggestions index
log.info("Finished execute bulk indexing records to index: {}",indexName);
}
catch(Exception e) {
log.error("Failed", e);

if (callback != null) {
callback.onComplete(
String.format(
"WARNING - Cannot process due to error -> %s, need to run 'Delete index and reindex in geonetwork?'",
e.getMessage()
));
}
}

// TODO now processing for record_suggestions index
log.info("Finished execute bulk indexing records to index: {}",indexName);

return results;
}
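
The flush rule in the indexing loop above can be distilled into a small predicate; a sketch for illustration (names hypothetical, not part of this PR):

```java
// Flush the pending bulk request before adding the next document when the
// combined payload would exceed the 5MB threshold. dataSize == 0 is the
// initial case: an oversized single document is still accepted on its own.
static boolean shouldFlushBeforeAdd(long dataSize, int nextDocSize) {
    final long THRESHOLD_BYTES = 5242880; // 5MB
    return dataSize != 0 && dataSize + nextDocSize > THRESHOLD_BYTES;
}
```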
/**
StacCollectionMapperService.java
@@ -54,6 +54,7 @@ public abstract class StacCollectionMapperService {
@Mapping(target="license", source = "source", qualifiedByName = "mapLicense")
@Mapping(target="providers", source = "source", qualifiedByName = "mapProviders")
@Mapping(target="citation", source="source", qualifiedByName = "mapCitation")
@Mapping(target="summaries.centroid", source = "source", qualifiedByName = "mapSummaries.centroid")
@Mapping(target="summaries.status", source = "source", qualifiedByName = "mapSummaries.status")
@Mapping(target="summaries.scope", source = "source", qualifiedByName = "mapSummaries.scope")
@Mapping(target="summaries.credits", source = "source", qualifiedByName = "mapSummaries.credits")
@@ -67,7 +68,6 @@ public abstract class StacCollectionMapperService {
@Mapping(target="summaries.revision", source = "source", qualifiedByName = "mapSummaries.revision")
public abstract StacCollectionModel mapToSTACCollection(MDMetadataType source);


private static final Logger logger = LogManager.getLogger(StacCollectionMapperService.class);

@Value("${spring.jpa.properties.hibernate.jdbc.time_zone}")
@@ -89,7 +89,7 @@ String mapUUID(MDMetadataType source) {
*/
@Named("mapExtentBbox")
List<List<BigDecimal>> mapExtentBbox(MDMetadataType source) {
return createGeometryItems(
return GeometryUtils.createGeometryItems(
source,
BBoxUtils::createBBoxFrom
);
@@ -344,9 +344,17 @@ private HashMap<GeoNetworkField, String> getMetadataDateInfoFrom(List<AbstractTy
return dateMap;
}

@Named("mapSummaries.centroid")
List<List<BigDecimal>> mapGeometryCentroid(MDMetadataType source) {
return GeometryUtils.createGeometryItems(
source,
GeometryUtils::createCentroidFrom
);
}
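// Illustrative note, not from this PR: the result is one coordinate pair per
// polygon produced by the grid cut, e.g. [[115.5, -32.1], [116.0, -33.4]],
// assuming [longitude, latitude] order.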

@Named("mapSummaries.geometry")
Map<?,?> mapSummariesGeometry(MDMetadataType source) {
return createGeometryItems(
return GeometryUtils.createGeometryItems(
source,
GeometryUtils::createGeometryFrom
);
@@ -955,59 +963,6 @@ protected List<LanguageModel> mapLanguages(MDMetadataType source) {
return results;
}

protected <R> R createGeometryItems(
MDMetadataType source,
Function<List<List<AbstractEXGeographicExtentType>>, R> handler) {

List<MDDataIdentificationType> items = MapperUtils.findMDDataIdentificationType(source);
if(!items.isEmpty()) {
if(items.size() > 1) {
logger.warn("!! More than 1 block of MDDataIdentificationType, data will be missed !!");
}
// Assume only 1 block of <mri:MD_DataIdentification>
// We only concern geographicElement here
List<EXExtentType> ext = items.get(0)
.getExtent()
.stream()
.filter(f -> f.getAbstractExtent() != null)
.filter(f -> f.getAbstractExtent().getValue() != null)
.filter(f -> f.getAbstractExtent().getValue() instanceof EXExtentType)
.map(f -> (EXExtentType) f.getAbstractExtent().getValue())
.filter(f -> f.getGeographicElement() != null)
.toList();

// We want to get a list of item where each item contains multiple, (aka list) of
// (EXGeographicBoundingBoxType or EXBoundingPolygonType)
List<List<AbstractEXGeographicExtentType>> rawInput = ext.stream()
.map(EXExtentType::getGeographicElement)
.map(l ->
/*
l = List<AbstractEXGeographicExtentPropertyType>
For each AbstractEXGeographicExtentPropertyType, we get the tag that store the
coordinate, it is either a EXBoundingPolygonType or EXGeographicBoundingBoxType
*/
l.stream()
.map(AbstractEXGeographicExtentPropertyType::getAbstractEXGeographicExtent)
.filter(Objects::nonNull)
.filter(m -> (m.getValue() instanceof EXBoundingPolygonType || m.getValue() instanceof EXGeographicBoundingBoxType))
.map(m -> {
if (m.getValue() instanceof EXBoundingPolygonType exBoundingPolygonType) {
if (!exBoundingPolygonType.getPolygon().isEmpty() && exBoundingPolygonType.getPolygon().get(0).getAbstractGeometry() != null) {
return exBoundingPolygonType;
}
} else if (m.getValue() instanceof EXGeographicBoundingBoxType) {
return m.getValue();
}
return null; // Handle other cases or return appropriate default value
})
.filter(Objects::nonNull) // Filter out null values if any
.toList()
)
.toList();
return handler.apply(rawInput);
}
return null;
}
/**
* Special handle for MimeFileType object.
* @param onlineResource