# iot
### Conceptual model
Load Balancer 1 <-> Receive Cluster -> Broker/streaming -> Consumers Cluster -> Storage

Load Balancer 2 <-> Reading Cluster <-> Storage

### Implementation
*receive-api*, *statistics-api*, and *stream-consumer* are Java applications. The first two are web applications for creating metrics and getting statistics. Packaged in Docker, they can be scaled in production, for instance in AWS ECS.

The *data* module encapsulates reading from and writing to storage. It is used by *stream-consumer* to write metrics and by *statistics-api* to query statistics.

The broker is *Kafka* and the storage is *Cassandra*. I chose both for their scalability, high availability, performance, and fault tolerance. Both can be run and scaled, for example, in AWS ECS.

Note that nothing above is carved in stone. For example, receiving/reading can be switched to AWS API Gateway, the broker/streaming can be AWS Kinesis, the reading and stream-consumer parts might be AWS Lambdas built on my *data* module, etc.

The implementation does not include a load balancer setup or choice. It could be a cloud-based approach like AWS Application/Elastic Load Balancer or Google Cloud Load Balancer. Alternatively, it could be self-managed with Nginx, HAProxy, etc.
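
As an illustration only, a minimal self-managed option could look like the following nginx configuration, written out here from bash; the upstream host names `receive-api-1`/`receive-api-2` are hypothetical placeholders, not services defined in this repository:
```bash
# Hypothetical nginx round-robin config over two receive-api instances;
# the upstream host names are placeholders, not part of this repo.
cat > receive-lb.conf <<'EOF'
upstream receive_api {
    server receive-api-1:8080;
    server receive-api-2:8080;
}
server {
    listen 80;
    location / {
        proxy_pass http://receive_api;
    }
}
EOF
```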

### How to run
Requirements: Java 8, Docker, Bash, open ports 8080 and 8081.

Execute from the root folder to build the modules and start everything in Docker:
```bash
./run.sh
```

### How to access the service
##### 1. Create metric

Perform a POST request to localhost:8080/metric with a JSON body like
```json
{"sensorId":"sensor123", "type":"thermostat", "when":1538139260752, "value":1.1}
```
- **sensorId** is a string value
- **type** is a string value
- **when** is a long value, milliseconds since the epoch
- **value** is a float value

All fields are required. The content type has to be application/json.

The request can be made with curl:
```bash
curl -XPOST -d '{"sensorId":"s2", "type":"t1", "when":1538139260752, "value":1.1}' -H "Content-Type: application/json" localhost:8080/metric
```
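
To send a metric stamped with the current time, **when** can be generated on the fly; a small sketch assuming GNU `date` (its `%3N` specifier provides millisecond precision):
```bash
# Post a metric with "when" set to the current epoch time in milliseconds
# (GNU date assumed; %3N truncates nanoseconds to milliseconds).
curl -XPOST -H "Content-Type: application/json" \
  -d "{\"sensorId\":\"s2\", \"type\":\"t1\", \"when\":$(date +%s%3N), \"value\":1.1}" \
  localhost:8080/metric
```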

##### 2. Get statistics

Perform a GET request to localhost:8081/statistics with the username **getStatisticsUser**, the password **statistics123**, and the following query parameters:

- **aggregator** is required; supported values: min, max, avg
- **type** is optional; specify it to aggregate by type
- **sensorId** is optional; specify it to aggregate by sensorId
- **from** is required; timeframe start in milliseconds, must be less than **to**
- **to** is required; timeframe end in milliseconds, must be greater than **from**

Either *type* or *sensorId* must be specified, but only one of them and with only one value; multiple types/sensorIds are not considered by the API.

The request can be made with curl:
```bash
curl -u getStatisticsUser:statistics123 'localhost:8081/statistics?aggregator=min&type=t1&from=1538139260752&to=1538139260753'
curl -u getStatisticsUser:statistics123 'localhost:8081/statistics?aggregator=min&sensorId=mySensor&from=123&to=456'
```
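
For a human-friendly timeframe, the millisecond bounds can be computed in the shell; a sketch (again assuming GNU `date`) querying the average over the last hour:
```bash
# Query avg for type t1 over the last hour; bounds are epoch milliseconds.
now=$(date +%s%3N)
hour_ago=$((now - 3600000))
curl -u getStatisticsUser:statistics123 \
  "localhost:8081/statistics?aggregator=avg&type=t1&from=$hour_ago&to=$now"
```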

##### 3. Simulate at least 3 IoT devices sending data every second
Note that I use the term *metric* loosely here, since one IoT device might send multiple metrics. This test therefore simulates 3 metrics arriving simultaneously every second.
Run from root folder:
```bash
./gradlew :qa:load test -Pqa-tests -Dsimultaneous.metrics=3 -Dduration=60
```
- **simultaneous.metrics** is the number of metrics to send each second; 3 is the default
- **duration** determines how long to send data, in seconds; 60 is the default

The test will fail if, at the end, the count of stored metrics does not equal **simultaneous.metrics** * **duration**.
The test does not check the reading API; feel free to query it manually.
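
For a quick manual check without the test harness, something like the following bash sketch could stand in for three devices (the sensor ids and values are arbitrary, and GNU `date` is assumed):
```bash
# Simulate 3 "devices" each posting one metric per second for 10 seconds.
for i in $(seq 1 10); do
  for s in s1 s2 s3; do
    curl -s -XPOST -H "Content-Type: application/json" \
      -d "{\"sensorId\":\"$s\", \"type\":\"t1\", \"when\":$(date +%s%3N), \"value\":$i.0}" \
      localhost:8080/metric &
  done
  sleep 1
done
wait
```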

### Limitations
**Receiving format**
It is JSON now, but that is subject to change depending on real production cases. For example, different IoT devices might send metrics in different formats, and some devices might send measurements in bulk. **receive-api** is straightforward to enhance to meet both cases on demand.

**Float value**
The current metric value is hardcoded as a float. A float might not be sufficient in some cases. IoT devices might also send not just a single value but more complex data, for example coordinates like latitude/longitude; the value might not even be a number at all. Both Cassandra and my implementation are fine with tuning the type, or even supporting multiple types, if required.

**Readings**
Only min, max, and avg are implemented. Median or other percentile statistics could be implemented with [custom aggregate functions](https://stackoverflow.com/questions/52528838/how-to-get-x-percentile-in-cassandra).

Readings are only provided by either one *type* or one *sensorId*. Also, querying statistics by type is limited to a one-week range, and by sensorId to a one-day range. These limitations are subject to change depending on production cases.

*Milliseconds* may be fine for machines, but they are not a human-readable or convenient format. I would change this to a formatted date string like *yyyy-MM-dd HH:mm:ss.SSSZ*, or even support multiple formats.
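
For reference, an epoch-milliseconds timestamp can be converted to a readable date in the shell (GNU `date` assumed):
```bash
# Convert the sample timestamp 1538139260752 to a readable date
# (output depends on the local time zone).
date -d @$((1538139260752 / 1000)) '+%Y-%m-%d %H:%M:%S'
```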

The *statistics-api* and *data* modules are flexible enough to be enhanced to support more reading types and multiple readings at a time.

**Scalability, high availability, performance, fault-tolerance**
These will depend on the particular infrastructure implementation, e.g. cluster setup, number of nodes, auto scaling, cross-datacenter replication, etc.

**Secure Web Service**
The implementation contains only three things regarding this topic:

a) *statistics-api* requires basic authentication. Even though there is only one in-memory user, it could be switched to real DB storage

b) I created two roles in Cassandra called *iot_write_role* and *iot_statistics_role*, for writing (*stream-consumer*) and selecting (*statistics-api*) respectively (see the sketch after this list)

c) I was not able to perform CQL injection via my reading API, probably because it uses the DataStax Cassandra driver library and its query builder. So I claim here that it is CQL-injection free.
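
A hedged sketch of how such roles could be created via `cqlsh`; the keyspace name `iot` and the passwords are placeholders, not values from this repository:
```bash
# Hypothetical role setup; keyspace name and passwords are placeholders.
cqlsh -e "
CREATE ROLE iot_write_role WITH LOGIN = true AND PASSWORD = 'changeme1';
CREATE ROLE iot_statistics_role WITH LOGIN = true AND PASSWORD = 'changeme2';
GRANT MODIFY ON KEYSPACE iot TO iot_write_role;
GRANT SELECT ON KEYSPACE iot TO iot_statistics_role;
"
```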

If in-transit security is important, it can be achieved with SSL certificates and proper configuration for Kafka, Cassandra, and the *statistics-api* module.

It seems to me that Cassandra does not provide at-rest encryption out of the box, but [people say it can be done one way or another](https://stackoverflow.com/questions/47046285/encrypting-the-database-at-rest-without-paying).

Also, in an AWS-based solution, IAM roles could be configured to grant (or deny) access to different parts of the system.
