The Data Service accepts client requests to retrieve resources that have been registered on their behalf. These resources are the result of the initial request sent to the Palisade Service: they have been collected, filtered, and checked for conformance to the defined rules and to the context of the request. The client is expected to send a request containing the token and resource id that together uniquely identify the resource request. The response is an output stream holding the data resources. For more information on client requests, see the Palisade Client's library.
The key components of the service are the implementation of the DataService interface and its supporting services. The DataService implementation, SimpleDataService, uses the Palisade Reader library to provide a database-backed solution. It is then wrapped in a class, AuditableDataService, which provides the data in a form that can be used both to construct the response sent back to the client and to build the audit message sent to the Audit Service via the AuditMessageService.
The Data Service is implemented as a RESTful service using the ChunkedHttpWriter. A request starts with authorisation (DataService.authoriseRequest). If successful, the next step is to read the data (DataReader.read), which returns a stream of the data to the client. Upon completion, an audit success message (AuditSuccessMessage) is sent to the Audit Service indicating a successful transfer of the data. If an error occurs during authorisation or at any point in the read, the request is stopped and an error message (AuditErrorMessage) is sent to the Audit Service.
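The authorise-then-read-then-audit sequence above can be sketched as follows. This is a simplified stand-in, not the real DataService API: the method bodies and the `audit` helper are hypothetical, and the real service works asynchronously.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Optional;

// Hypothetical sketch of the authorise -> read -> audit flow described above.
class ReadFlowSketch {
    /** Returns the data stream on success, or empty if authorisation fails. */
    static Optional<InputStream> handleRead(String token, String leafResourceId) {
        if (!authorise(token, leafResourceId)) {
            // Any failure stops the request and audits an error message.
            audit("AuditErrorMessage: unauthorised request for " + leafResourceId);
            return Optional.empty();
        }
        InputStream data = read(leafResourceId);
        // On completion, a success message is sent to the Audit Service.
        audit("AuditSuccessMessage: read " + leafResourceId);
        return Optional.of(data);
    }

    // Stand-ins for DataService.authoriseRequest and DataReader.read.
    static boolean authorise(String token, String resource) { return "test-token".equals(token); }
    static InputStream read(String resource) { return new ByteArrayInputStream("data".getBytes()); }
    static void audit(String message) { System.out.println(message); }
}
```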
Kafka streaming is only used for the processing of audit messages.
This is exposed in the functionality provided by the AuditMessageService.
It takes both audit success and audit error messages and processes them using the functionality provided by the AkkaRunnableGraph's runner method. In this method, the message types are separated into success and error messages, which are then used to construct messages for the respective Kafka success and error topics.
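That separation by message type can be illustrated with a small sketch. The topic names and message shapes here are assumptions for illustration, not the service's actual Kafka configuration or the real AkkaRunnableGraph:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch: audit messages are partitioned by type and routed
// to the matching (hypothetical) Kafka topic name.
class AuditRoutingSketch {
    interface AuditMessage {}
    record AuditSuccessMessage(String token) implements AuditMessage {}
    record AuditErrorMessage(String token, String error) implements AuditMessage {}

    /** Groups messages under the assumed "success" and "error" topic names. */
    static Map<String, List<AuditMessage>> routeToTopics(List<AuditMessage> messages) {
        return messages.stream().collect(Collectors.groupingBy(
                m -> m instanceof AuditSuccessMessage ? "success" : "error"));
    }
}
```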
| DataRequest | (response to client) | AuditSuccessMessage | AuditErrorMessage |
|---|---|---|---|
| *token | InputStream | *token | *token |
| leafResourceId | | leafResourceId | leafResourceId |
| | | userId | ***userId |
| | | resourceId | ***resourceId |
| | | context | ***context |
| | | **attributes | ***attributes |
| | | serverMetadata | error |
| | | | serverMetadata |
*token comes in the body of the request from the client (DataRequest) and is stored in the header metadata for the audit messages
**attributes will include the numbers for records processed and records returned
***data that may not be available depending on when the error occurred
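For instance, the attributes map on a success message might carry the two record counts mentioned above. The key names below are illustrative only, not the exact constants used by the service:

```java
import java.util.Map;

// Sketch of the **attributes map on an AuditSuccessMessage;
// key names are hypothetical placeholders.
class AttributesSketch {
    static Map<String, Object> successAttributes(long processed, long returned) {
        return Map.of(
                "RECORDS_PROCESSED", processed, // total records read from the resource
                "RECORDS_RETURNED", returned);  // records remaining after rules applied
    }
}
```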
The Data Service should be extensible like the User, Resource and Policy Services. This means being generic over all of the following:
- Readers - different data sources and protocols to read from them, such as `HDFS`, `S3`
- Serialisers - different formats for serialised data, such as `JSON`, `AVRO`
- Record Types - different record objects to which rules apply, such as `Employee`
- Writers - different clients can transform how Palisade is used, but this may require different protocols with the Data Service, such as `/read/chunked`
As such, for a given request, the API aims to be as flexible as possible depending on what is required in a given deployment.
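As one example of this genericity, a serialiser can be written against a type parameter so the same interface covers JSON, Avro, or any other format. The interface below is a simplified sketch, not the actual Palisade Reader API (which operates over streams rather than byte arrays):

```java
import java.nio.charset.StandardCharsets;

// Minimal sketch of a serialiser that is generic over the record type.
interface SerialiserSketch<T> {
    byte[] serialise(T record);
}

// A toy line-based serialiser for String records, standing in for a
// real JSON or Avro implementation.
class LineSerialiser implements SerialiserSketch<String> {
    @Override
    public byte[] serialise(String record) {
        return (record + "\n").getBytes(StandardCharsets.UTF_8);
    }
}
```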
The AbstractResponseWriter provides a default flow of data that is likely followed by all implementations.
All `@Bean`-annotated objects for a `ResponseWriter` will be collected by the AkkaHttpServer and their `Route` will be added to the server.
Then, the following are used to deduce which implementation of each component to use:
- Http endpoint decides the `ResponseWriter` based on its `Route` as mentioned, e.g. a request to `/read/chunked` would use the `ChunkedHttpWriter`, writing data back as a chunked Http response.
- Resource URI decides the `DataReader` based on its scheme, e.g. a request for `hdfs:/some/file` would use the `HadoopDataReader`.
- Resource Serialised-Format decides the `Serialiser`, using the Resource Type to initialise the serialiser with a `domainClass`.
- Resource Type dictates the Record Type, although no explicit action or decision is taken to this end.
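The scheme-based reader selection above can be illustrated with a simple lookup. Only HadoopDataReader is named in this document, so the other reader names in the map are hypothetical, and the real service resolves actual reader instances rather than names:

```java
import java.net.URI;
import java.util.Map;

// Sketch of choosing a DataReader implementation by URI scheme.
class ReaderSelectionSketch {
    static final Map<String, String> READERS = Map.of(
            "hdfs", "HadoopDataReader",   // named in the text above
            "s3", "S3DataReader",         // hypothetical name for an S3 reader
            "file", "SimpleDataReader");  // hypothetical name for a local reader

    /** Returns the reader name for a resource URI, or "unsupported". */
    static String readerFor(String resourceUri) {
        String scheme = URI.create(resourceUri).getScheme();
        return READERS.getOrDefault(scheme, "unsupported");
    }
}
```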
The application exposes one endpoint to the client for retrieving the resources. This will be the data that has previously been requested and prepared in the initial request to the Palisade services.
`POST data-service/read/chunked` - returns a `200 OK` and a streamed HTTP response body which will provide the resource(s).
```bash
curl -X POST data-service/read/chunked -H "content-type: application/json" --data \
'{
  "token": "test-token",
  "leafResourceId": "file:/user.json"
}'
```
The response body will be an octet-stream of data from the requested resource with policy rules applied, no matter the file type.
For example, a `user.json` resource might be:

```json
{
  "username": "alice",
  "password": null,
  "postcode": "SW1 XXX"
}
```
but a `user.avro` resource will return a non-human-readable binary blob.