This project estimates delivery fares from GPS data logs. The program reads GPS data for deliveries, filters out invalid points, calculates distances using the Haversine formula, and generates fare estimates. It processes the data concurrently, which makes it suitable for large datasets.
The input_dataset/ folder contains the input CSV files with the GPS data for deliveries. Each input CSV file should have the following format:
id_delivery,lat,lng,timestamp
- id_delivery: A unique identifier for each delivery.
- lat: The latitude coordinate of the delivery point.
- lng: The longitude coordinate of the delivery point.
- timestamp: A UNIX timestamp representing the time of the delivery point.
Example:
1,35.706552,51.412262,1723697700
1,35.702591,51.412704,1723697730
The output_dataset/ folder will contain the output files generated by the program:
- filtered_data.csv: Contains the valid delivery points after filtering.
- fares.csv: Contains the fare estimates for each delivery, in the format id_delivery, fare_estimate.
Make sure you have Go installed on your system. You can verify by running:
go version
- Place your input data file (e.g., sample_data.csv) inside the input_dataset/ folder.
- Run the program using the following command:
go run main.go
This will process the input data, filter out invalid points, calculate delivery fares, and generate two output files:
output_dataset/filtered_data.csv
output_dataset/fares.csv
If you want to use your own dataset, follow these steps:
- Ensure your input CSV file follows the required format: id_delivery,lat,lng,timestamp.
- Place the file in the input_dataset/ folder.
- Update the path in the main.go file to reference your dataset:
chunks, err := readDataChunks("input_dataset/your_custom_data.csv")
- Run the program:
go run main.go
The program will generate the filtered data and fare estimates for your dataset in the output_dataset/ folder.
The unit tests validate individual functions like:
- haversine(): For distance calculation between two geographical points.
- filterInvalidPoints(): To ensure points that exceed the speed threshold are filtered.
- calculateFare(): For correct fare estimation based on filtered points.
Run the unit tests with:
go test -v
The E2E tests cover the entire flow, from reading raw data, processing it, and generating output files. The E2E tests simulate how the program will behave in a real-world scenario and ensure the correct integration of all functions.
To run the end-to-end tests, use:
go test -v -run TestEndToEndFlow
This will check:
- Reading the input dataset.
- Filtering invalid points.
- Calculating fares.
- Writing the final output (fares.csv and filtered_data.csv).
- Data Ingestion: The program reads GPS data in chunks from the input file.
- Concurrency: Each chunk is processed concurrently using Go's goroutines, speeding up the filtering and fare calculation process.
- Filtering: The program uses the Haversine formula to calculate the distance between consecutive points and filters out any points where the speed exceeds 100 km/h.
- Fare Calculation: Fares are calculated based on the distance, time of day (daytime or nighttime rates), and idle time. The minimum fare for any delivery is set to 3.47.
- Output: The program writes filtered data to filtered_data.csv and fare estimates to fares.csv.
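The filtering step described above can be sketched roughly as follows. The function names `haversine` and `filterInvalidPoints` come from this README, but their signatures here are assumptions, as are the `Point` type, the Earth radius constant, the choice to compare each point against the last accepted point, and the handling of non-increasing timestamps. Fare calculation is left out because the rates are not given here.

```go
package main

import (
	"fmt"
	"math"
)

const (
	earthRadiusKm = 6371.0 // mean Earth radius in km (assumed constant)
	maxSpeedKmh   = 100.0  // speed threshold from the description above
)

// Point is a hypothetical GPS sample for one delivery.
type Point struct {
	Lat, Lng  float64 // degrees
	Timestamp int64   // UNIX seconds
}

// haversine returns the great-circle distance in km between two coordinates.
func haversine(lat1, lng1, lat2, lng2 float64) float64 {
	toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLng := toRad(lng2 - lng1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLng/2)*math.Sin(dLng/2)
	// Clamp to 1 to guard against floating-point rounding before Asin.
	return 2 * earthRadiusKm * math.Asin(math.Sqrt(math.Min(1, a)))
}

// filterInvalidPoints keeps the first point, then keeps each later point only
// if the implied speed from the last kept point is at most maxSpeedKmh.
func filterInvalidPoints(points []Point) []Point {
	if len(points) == 0 {
		return nil
	}
	valid := []Point{points[0]}
	for _, p := range points[1:] {
		last := valid[len(valid)-1]
		hours := float64(p.Timestamp-last.Timestamp) / 3600
		if hours <= 0 {
			continue // assumption: drop points whose timestamp does not advance
		}
		speed := haversine(last.Lat, last.Lng, p.Lat, p.Lng) / hours
		if speed <= maxSpeedKmh {
			valid = append(valid, p)
		}
	}
	return valid
}

func main() {
	points := []Point{
		{35.706552, 51.412262, 1723697700},
		{35.702591, 51.412704, 1723697730},
		{35.800000, 51.412704, 1723697760}, // made-up outlier: a jump of several km in 30 s
	}
	valid := filterInvalidPoints(points)
	fmt.Printf("kept %d of %d points\n", len(valid), len(points))
}
```

Comparing against the last accepted point, rather than the previous raw point, keeps a single outlier from also invalidating the point that follows it.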
- The program is designed to handle large datasets efficiently.
- File writing operations are synchronized using a mutex to avoid race conditions in concurrent environments.
- The project includes comprehensive unit tests and end-to-end tests to ensure correctness and robustness.
This project is open-source and available for use under the MIT License.
Feel free to ask me about this project! 😊