
A Tool for Complex and Scalable Data Access Policy Enforcement

Data Service

The Data Service accepts client requests to retrieve resources that have been registered on their behalf. These resources are the response to the initial request sent to the Palisade Service: they have been collected, filtered, and brought into conformance with the defined rules and the context of the request. The client is expected to send a request containing the token and resource id that together uniquely identify the resource request. The response is an output stream holding the data resources. For more information on client requests, see the Palisade Clients library.

The key components of the service are an implementation of the DataService interface and its supporting services. SimpleDataService implements the DataService interface, using the Palisade Reader library to provide a database-backed solution. This is wrapped in the AuditableDataService class, which provides the data in a form that can be used both to construct the response sent back to the client and to build the audit message sent to the Audit Service via the AuditMessageService.

Flow of Control

Data Service diagram

The Data Service is implemented as a RESTful service using the ChunkedHttpWriter. A request starts with authorisation (DataService.authoriseRequest). If successful, the next step is to read the data (DataReader.read), which returns a stream of the data to the client. Upon completion, an audit success message (AuditSuccessMessage) is sent to the Audit Service indicating a successful transfer of the data. If an error occurs during authorisation or at any point in the read, the request is stopped and an error message (AuditErrorMessage) is sent to the Audit Service.
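
As a rough sketch of that sequence: the method names authoriseRequest and read come from this README, while the surrounding types, signatures and audit calls are illustrative assumptions, not the real Palisade API.

import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedReadSketch {
    // Stand-ins for the real Palisade types; only the two method names
    // (authoriseRequest, read) are taken from this README.
    interface DataService { AuthorisedRequest authoriseRequest(String token, String leafResourceId); }
    interface DataReader { InputStream read(AuthorisedRequest request); }
    interface AuditClient {
        void success(String token, long bytesReturned);   // -> AuditSuccessMessage
        void error(String token, Exception cause);        // -> AuditErrorMessage
    }
    record AuthorisedRequest(String token, String leafResourceId) {}

    static void readChunked(String token, String leafResourceId,
                            DataService service, DataReader reader,
                            AuditClient audit, OutputStream client) throws Exception {
        try {
            // 1. Authorise the request against the rules prepared by Palisade.
            AuthorisedRequest authorised = service.authoriseRequest(token, leafResourceId);

            // 2. Stream the (rule-filtered) data back to the client.
            long bytes;
            try (InputStream data = reader.read(authorised)) {
                bytes = data.transferTo(client);
            }

            // 3. On completion, audit the successful transfer.
            audit.success(token, bytes);
        } catch (Exception e) {
            // Any failure during authorisation or reading stops the request
            // and is reported to the Audit Service instead.
            audit.error(token, e);
            throw e;
        }
    }
}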

Kafka streaming is only used for the processing of audit messages. This is exposed in the functionality provided by the AuditMessageService, which takes both audit success and audit error messages and processes them using the runner method of AkkaRunnableGraph. In this method, the message types are separated into success and error messages, which are then used to construct messages for the respective Kafka success and error topics.
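
A minimal, self-contained sketch of that success/error split using the Akka Streams Java DSL; the message classes and sinks here are stand-ins (printing to stdout instead of producing to Kafka topics), and only the partitioning idea comes from this README.

import akka.actor.ActorSystem;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;
import java.util.List;

public class AuditGraphSketch {
    // Illustrative stand-ins for the real Palisade audit message classes.
    interface AuditMessage {}
    record AuditSuccessMessage(String token) implements AuditMessage {}
    record AuditErrorMessage(String token, String error) implements AuditMessage {}

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("audit-sketch");

        List<AuditMessage> messages = List.of(
                new AuditSuccessMessage("token-1"),
                new AuditErrorMessage("token-2", "read failed"));

        // Split one stream of audit messages: errors are diverted to the
        // (stand-in) error-topic sink, everything else flows on to the
        // (stand-in) success-topic sink. In the real service these sinks
        // would be Kafka producers for the two topics.
        Source.from(messages)
                .divertTo(
                        Sink.<AuditMessage>foreach(m -> System.out.println("error-topic   <- " + m)),
                        m -> m instanceof AuditErrorMessage)
                .runWith(
                        Sink.<AuditMessage>foreach(m -> System.out.println("success-topic <- " + m)),
                        system)
                .thenRun(system::terminate);
    }
}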

Message Model and Database Domain

| DataRequest    | (response to client) | AuditSuccessMessage | AuditErrorMessage |
|----------------|----------------------|---------------------|-------------------|
| *token         | InputStream          | *token              | *token            |
| leafResourceId |                      | leafResourceId      | leafResourceId    |
|                |                      | userId              | ***userId         |
|                |                      | resourceId          | ***resourceId     |
|                |                      | context             | ***context        |
|                |                      | **attributes        | ***attributes     |
|                |                      | serverMetadata      | error             |
|                |                      |                     | serverMetadata    |

*token comes in the body of the request from the client (DataRequest) and is stored in the header metadata of the audit messages
**attributes includes counts such as the number of records processed and the number of records returned
***data that may not be available, depending on when the error occurred

Flow of Data

The Data Service should be extensible, like the User, Resource and Policy Services. This means being generic over all of the following (a hypothetical sketch of these extension points follows the list):

  • Readers - different data sources and protocols to read from them such as HDFS, S3
  • Serialisers - different formats for serialised data such as JSON, AVRO
  • Record Types - different record objects to which rules apply such as Employee
  • Writers - different clients can transform how Palisade is used, but this may require different protocols with the Data Service such as /read/chunked
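
Purely as a hypothetical sketch of those extension points; the real Palisade interfaces differ, and these shapes only illustrate the separation of concerns:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.stream.Stream;

// Hypothetical shapes, one per extension point above.
interface DataReader {
    boolean accepts(URI resourceUri);             // e.g. matches the "hdfs" or "s3" scheme
    InputStream read(URI resourceUri);
}

interface Serialiser<RECORD> {                    // RECORD is the rule-bearing type, e.g. Employee
    Stream<RECORD> deserialise(InputStream raw);  // e.g. JSON or AVRO bytes in
    InputStream serialise(Stream<RECORD> records);
}

interface ResponseWriter {
    String route();                               // e.g. "/read/chunked"
    void write(InputStream data, OutputStream httpResponse);
}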

Data Streaming Pathway diagram

As such, for a given request, the API aims to be as flexible as possible about what a given deployment requires. The AbstractResponseWriter provides a default flow of data that is likely to be followed by all implementations. All @Bean-annotated ResponseWriter objects are collected by the AkkaHttpServer and their Routes are added to the server. The following are then used to deduce which implementation of each component to use:

  • Http endpoint decides the ResponseWriter based on its Route, as mentioned above; e.g. a request to /read/chunked would use the ChunkedHttpWriter, writing data back as a chunked Http response.
  • Resource URI decides the DataReader based on its scheme; e.g. a request for hdfs:/some/file would use the HadoopDataReader (a hypothetical scheme lookup is sketched after this list).
  • Resource Serialised-Format decides the Serialiser, using the Resource Type to initialise the serialiser with a domainClass.
  • Resource Type dictates the Record Type, although no explicit action or decision is taken to this end.
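
For example, the scheme-based lookup in the second bullet could hypothetically amount to the following; the selection logic in the real service may differ.

import java.net.URI;
import java.util.Collection;
import java.util.NoSuchElementException;

class ReaderSelector {
    // Repeats the hypothetical DataReader shape from the earlier sketch.
    interface DataReader {
        boolean accepts(URI resourceUri);
    }

    // Pick the first registered reader whose scheme matches the resource URI,
    // e.g. hdfs:/some/file -> HadoopDataReader.
    static DataReader readerFor(URI resourceUri, Collection<DataReader> readers) {
        return readers.stream()
                .filter(reader -> reader.accepts(resourceUri))
                .findFirst()
                .orElseThrow(() -> new NoSuchElementException(
                        "no DataReader registered for scheme " + resourceUri.getScheme()));
    }
}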

REST Interface

The application exposes one endpoint to the client for retrieving resources: the data that was previously requested and prepared in the initial request to the Palisade services.

  • POST data-service/read/chunked
    • returns a 200 OK and a streamed HTTP response body which will provide the resource(s).

Example JSON Request

curl -X POST data-service/read/chunked  -H "content-type: application/json" --data \
'{
   "token": "test-token",
   "leafResourceId": "file:/user.json"
 }'
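
To consume the same endpoint from code rather than curl, any HTTP client that exposes the response body as a stream will do. A minimal sketch using the JDK's built-in client, assuming the service is reachable at http://localhost:8080 (the host and port depend on your deployment):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ReadChunkedClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/read/chunked"))
                .header("content-type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"token\": \"test-token\", \"leafResourceId\": \"file:/user.json\"}"))
                .build();

        // The body is streamed back in chunks; read it as an InputStream
        // rather than buffering the whole response in memory.
        HttpResponse<java.io.InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (var body = response.body()) {
            body.transferTo(System.out);
        }
    }
}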

Octet-Stream Response

The response body will be an octet-stream of data from the requested resource with policy rules applied, no matter the file type. For example, a user.json resource might be:

{
  "username": "alice",
  "password": null,
  "postcode": "SW1 XXX"
}

but a user.avro resource will return a non-human-readable binary blob.