See the following publication for more details on the data collection and processing. Please cite this paper when using the dataset. Sample training scripts are provided to show how to use the dataset, along with the model architectures.
@inproceedings{xing2018enabling,
title={Enabling Edge Devices that Learn from Each Other: Cross Modal Training for Activity Recognition},
author={Xing, Tianwei and Sandha, Sandeep Singh and Balaji, Bharathan and Chakraborty, Supriyo and Srivastava, Mani},
booktitle={Proceedings of the 1st International Workshop on Edge Systems, Analytics and Networking},
pages={37--42},
year={2018},
organization={ACM}
}
The CMActivities dataset contains video, audio, and IMU modalities collected with two smartphones while users performed activities. We refer to the users performing the activities as performers hereafter. An observer held the first smartphone, which recorded and timestamped video (along with audio) of the performers; in this way the first smartphone acted as an ambient sensor capturing the performer. The second smartphone was kept in the performer's front trouser pocket and was used to timestamp the IMU data captured from sensors worn on the performer's left and right wrists. Both smartphones were synchronized using NTP.
The CMActivities dataset is collected from two performers doing seven activities (upstairs, downstairs, walk, run, jump, wash hand, and jumping jack). Each data collection session lasted roughly 10 seconds, during which the performer performed a single activity. The training split is generated from 624 training sessions and the test split from 71 test sessions; each split contains data from both performers. The validation split is generated from a portion of the training sessions.
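For orientation, here is a minimal sketch of how the seven activity classes could be encoded as integer labels when loading a split. The file name and `.npz` layout below are assumptions for illustration and do not describe the released file format.

```python
import numpy as np

# Hypothetical label mapping for the seven CMActivities classes.
# The encoding used in the released samples may differ.
ACTIVITIES = ["upstairs", "downstairs", "walk", "run", "jump", "wash hand", "jumping jack"]
LABEL_TO_ID = {name: i for i, name in enumerate(ACTIVITIES)}

def load_split(path):
    """Load one split; assumes a .npz archive with 'audio', 'imu', and 'labels' arrays."""
    data = np.load(path, allow_pickle=True)
    return data["audio"], data["imu"], data["labels"]

# Example (hypothetical file name):
# X_audio, X_imu, y = load_split("train_samples.npz")
```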
- Windowed audio and IMU samples are released. The audio samples are processed and 193 features are extracted; the IMU samples are provided in raw form (a hedged sketch of a matching feature-extraction recipe appears after this list).
- The training, validation, and test samples are available below.
- Training samples download
- Validation samples download
- Test samples download
- Other samples from CMActivities dataset: Transfer samples download, Limited train download, Personalization download
- More data processing details are in the publication.
- We plan to release the video samples using the intermediate representation soon.
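The exact 193-feature audio pipeline is described in the publication; as a point of reference, one common recipe that yields exactly 193 features per window is 40 MFCCs, 12 chroma bins, 128 mel-spectrogram bands, 7 spectral-contrast bands, and 6 tonnetz features, each averaged over time. The sketch below implements that recipe with librosa; it is an assumption about the processing, not a copy of the released pipeline.

```python
import librosa
import numpy as np

def extract_audio_features(y, sr):
    """Extract a 193-dimensional feature vector from one audio window.

    Assumes the common 40 MFCC + 12 chroma + 128 mel + 7 contrast + 6 tonnetz
    recipe; the released samples may have been produced differently.
    """
    stft = np.abs(librosa.stft(y))
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)              # 40
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr), axis=1)              # 12
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)                 # 128
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sr), axis=1)      # 7
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr), axis=1)  # 6
    return np.hstack([mfcc, chroma, mel, contrast, tonnetz])  # shape: (193,)

# Example (hypothetical path):
# y, sr = librosa.load("session_clip.wav")
# features = extract_audio_features(y, sr)
```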
- Audio Model Training Notebook (illustrative model sketches for both notebooks follow the results below)
- Training accuracy: 99.9%
- Validation accuracy: 99.5%
- Test accuracy: 90.9%
- IMU Model Training Notebook
- Training accuracy: 99.8%
- Validation accuracy: 95%
- Test accuracy: 90.5%
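For orientation, here is a minimal Keras sketch of what the two classifiers in the notebooks above could look like: a dense network over the 193 audio features and a 1D-CNN over raw IMU windows. The layer sizes, IMU window shape (100 timesteps x 12 channels), and training settings are assumptions for illustration, not the architectures used in the notebooks.

```python
import tensorflow as tf

NUM_CLASSES = 7  # upstairs, downstairs, walk, run, jump, wash hand, jumping jack

# Illustrative dense classifier over the 193-dimensional audio feature vectors.
audio_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(193,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Illustrative 1D-CNN over raw IMU windows. The window length (100 timesteps)
# and channel count (12, e.g. 3-axis accel + gyro for both wrists) are assumptions.
imu_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 12)),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

for model in (audio_model, imu_model):
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
# e.g. audio_model.fit(X_audio_train, y_train, validation_data=(X_audio_val, y_val))
```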
The fusion and time-shift augmentation notebooks below accompany the following publication.
@inproceedings{sandha2020time,
title={Time awareness in deep learning-based multimodal fusion across smartphone platforms},
author={Sandha, Sandeep Singh and Noor, Joseph and Anwar, Fatima M and Srivastava, Mani},
booktitle={2020 IEEE/ACM Fifth International Conference on Internet-of-Things Design and Implementation (IoTDI)},
pages={149--156},
year={2020},
organization={IEEE}
}
- Fusion_Training_Vanilla.ipynb: Vanilla fusion training code with the audio and IMU modalities time-synced (an illustrative fusion-model sketch appears after the note below).
- Augmentation_Data_generation.ipynb: Creates augmented training data by introducing artificial timing errors between the audio and IMU modalities (a minimal sketch of the time-shift idea also follows the note below).
- Fusion_Training_Augmented_1000ms.ipynb: Trains the fusion model with 1000 ms time-shift augmentation.
- Testing_Time_Shifting_1000ms.ipynb: Tests the vanilla and augmented models by introducing timing errors into the test data.
Note: Notebook-1 (Fusion_Training_Vanilla.ipynb) directly uses the data samples available for download. Notebook-2 (Augmentation_Data_generation.ipynb) creates the augmented samples used by Notebook-3 (Fusion_Training_Augmented_1000ms.ipynb). Notebook-4 (Testing_Time_Shifting_1000ms.ipynb) uses the models trained by Notebook-1 and Notebook-3, along with the available test data samples.
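For reference, below is a minimal sketch of the kind of late-fusion model that Fusion_Training_Vanilla.ipynb trains: separate audio and IMU branches whose representations are concatenated before a shared classifier. The input shapes and layer sizes are assumptions and do not copy the notebook.

```python
import tensorflow as tf

NUM_CLASSES = 7

# Illustrative fusion model: the 193-D audio feature vector and a raw IMU window
# (assumed 100 timesteps x 12 channels) are encoded separately, concatenated,
# and classified jointly.
audio_in = tf.keras.Input(shape=(193,), name="audio_features")
audio_branch = tf.keras.layers.Dense(128, activation="relu")(audio_in)

imu_in = tf.keras.Input(shape=(100, 12), name="imu_window")
x = tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu")(imu_in)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
imu_branch = tf.keras.layers.Dense(128, activation="relu")(x)

fused = tf.keras.layers.Concatenate()([audio_branch, imu_branch])
fused = tf.keras.layers.Dense(128, activation="relu")(fused)
out = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(fused)

fusion_model = tf.keras.Model(inputs=[audio_in, imu_in], outputs=out)
fusion_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])
# fusion_model.fit([X_audio_train, X_imu_train], y_train, ...)
```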
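And here is a minimal sketch of the time-shift idea behind Augmentation_Data_generation.ipynb and Testing_Time_Shifting_1000ms.ipynb: the IMU window is deliberately misaligned relative to the audio-aligned start by a random offset of up to +/- 1000 ms. The IMU sampling rate, window length, and function names are assumptions and do not mirror the notebook code.

```python
import numpy as np

IMU_RATE_HZ = 50   # assumed IMU sampling rate
WINDOW_LEN = 100   # assumed window length in IMU samples (2 s at 50 Hz)

def shifted_imu_window(imu_stream, start_idx, max_shift_ms=1000, rng=None):
    """Return an IMU window whose start is offset from the audio-aligned start
    by a random shift of up to +/- max_shift_ms, clipped to the stream bounds."""
    if rng is None:
        rng = np.random.default_rng()
    max_shift_samples = int(max_shift_ms / 1000.0 * IMU_RATE_HZ)
    shift = rng.integers(-max_shift_samples, max_shift_samples + 1)
    start = int(np.clip(start_idx + shift, 0, len(imu_stream) - WINDOW_LEN))
    return imu_stream[start:start + WINDOW_LEN]

# Example: pair each audio window with a deliberately misaligned IMU window,
# then train (or test) the fusion model on the resulting
# (audio_features, shifted_imu, label) triples.
```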