Data description
Version 1.2.2 of the original dataset (Zheng et al. 2011) was downloaded on 19 November 2024 and processed as follows:
- Merged the data of all users into a single dataset.
- Added transport mode labels and removed all trajectories without a transport mode label.
- Split the trajectories into segments based on the user id, transportation mode and time difference between consecutive points. A new segment is created if the time difference is larger than 10 minutes.
- Split the segments (from the previous step) further based on the distance between consecutive points. A new segment is created if the distance is larger than 100 meters. The created segment ids are unique across all users.
- Removed all segments with fewer than 100 points.
- Projected the data into UTM zone 50N (EPSG:32650).
- Removed all segments that move outside the bounding box of Beijing (406993, 487551, 4387642, 4463488 in EPSG:32650).
- Split the data into 4 sets of training, testing and validation data.
The full process is documented in this GitHub Repository.
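The segmentation and filtering rules listed above can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the function names, the dictionary keys (`user`, `mode`, `time`, `lat`, `lon`), the use of the haversine distance on unprojected coordinates, and the bounding-box order (x_min, x_max, y_min, y_max) are all assumptions made for the example. The time and distance splits are combined into a single pass here, which yields the same segment boundaries as applying them one after the other.

```python
import math
from collections import Counter
from datetime import timedelta

# Thresholds taken from the processing description above.
MAX_GAP = timedelta(minutes=10)
MAX_JUMP_M = 100.0
MIN_POINTS = 100
# Assumed order: x_min, x_max, y_min, y_max (EPSG:32650).
BBOX = (406993, 487551, 4387642, 4463488)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (assumed distance measure)."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def assign_segment_ids(points):
    """Return one segment id per point.

    `points` is a list of dicts sorted by user and time. A new segment
    starts when the user or transport mode changes, when the time gap
    exceeds MAX_GAP, or when the spatial jump exceeds MAX_JUMP_M.
    A single running counter keeps the ids unique across all users.
    """
    ids = []
    seg = 0
    prev = None
    for p in points:
        if prev is not None and (
            p["user"] != prev["user"]
            or p["mode"] != prev["mode"]
            or p["time"] - prev["time"] > MAX_GAP
            or haversine_m(prev["lat"], prev["lon"], p["lat"], p["lon"]) > MAX_JUMP_M
        ):
            seg += 1
        ids.append(seg)
        prev = p
    return ids


def drop_short_segments(points, ids, min_points=MIN_POINTS):
    """Keep only (point, segment id) pairs from segments with enough points."""
    counts = Counter(ids)
    return [(p, s) for p, s in zip(points, ids) if counts[s] >= min_points]


def segment_inside_bbox(xy, bbox=BBOX):
    """True if every projected (x, y) point of a segment lies inside the box."""
    x_min, x_max, y_min, y_max = bbox
    return all(x_min <= x <= x_max and y_min <= y <= y_max for x, y in xy)
```

The projection step itself is left out of the sketch; `segment_inside_bbox` expects coordinates that have already been projected to EPSG:32650.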
References
Zheng, Yu, Hao Fu, Xing Xie, Wei-Ying Ma, and Quannan Li. 2011. Geolife GPS Trajectory Dataset - User Guide. Geolife GPS trajectories 1.1. https://www.microsoft.com/en-us/research/publication/geolife-gps-trajectory-dataset-user-guide/.