Training Data Sets

Please consider joining the mailing list to directly receive updates related to the competition.
GIS Cup Training Data Sets: Download
The training data set provided for the GIS Cup contains 20 files: 10 input files and 10 output files. The format of each file type is explained below.
Input Files
A single input file contains a GPS trace route of an individual trip. Each row of an input file represents a single GPS reading in the form of: <Time>,<Latitude>,<Longitude>

  • <Time>: Represents the number of seconds since the start of the trip.
  • <Latitude>: Represents the latitudinal location of the GPS reading in degrees.
  • <Longitude>: Represents the longitudinal location of the GPS reading in degrees.
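
As a rough illustration of the row format above, the following Python sketch reads an input file into a list of readings. The names GpsReading and parse_trace are hypothetical, the sample coordinates are made up, and parsing <Time> as a floating-point number is an assumption (the format description does not say whether it is an integer).

```python
import csv
import io
from typing import NamedTuple


class GpsReading(NamedTuple):
    """One row of an input file (hypothetical helper type)."""
    time: float       # seconds since the start of the trip
    latitude: float   # degrees
    longitude: float  # degrees


def parse_trace(lines):
    """Parse rows of the form <Time>,<Latitude>,<Longitude>."""
    readings = []
    for row in csv.reader(lines):
        if not row:  # csv.reader yields an empty list for an empty line
            continue
        time, lat, lon = (float(field) for field in row)
        readings.append(GpsReading(time, lat, lon))
    return readings


# Made-up sample rows, used only to exercise the parser.
sample = io.StringIO("0,47.6675,-122.1070\n1,47.6676,-122.1072\n")
trace = parse_trace(sample)
```

To read a real input file, an open file object can be passed instead of the in-memory sample, for example parse_trace(open(path, newline="")).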

Output Files
The output files are provided to allow contestants to train and test their submissions. They also serve as an example of the output file format required of the submitted executable. Please see the Submission and Evaluation section for additional details. Each row of an output file represents a single map-matched GPS reading in the form of: <Time>,<EdgeId>,<Confidence>

  • <Time>: Represents the time of the original GPS reading as given in the input file.
  • <EdgeId>: The identifier of the edge that the GPS reading matches to. Note that the value of <EdgeId> must be one of the <EdgeId> values in the WA_Edges.txt and WA_EdgeGeometry.txt files.
  • <Confidence>: A real number between 0.00 and 1.00 that indicates how confident the map matching algorithm is that the map-matched GPS reading is correct. 1.00 means that the algorithm is 100% confident that the output result is correct. 0.00 means that the algorithm is totally uncertain about the correctness of its output result. 0.70, for example, means that the algorithm is 70% confident that the output reading is correct. In practice, the confidence value is important because applications may reason about it before making decisions based on the map-matched result.
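
The constraints on an output row can be sketched in Python as follows. This is only an illustration: format_output_row and known_edge_ids are hypothetical names, the set of valid edge identifiers is assumed to have been loaded already from WA_Edges.txt (whose layout is not described here), and writing the confidence with two decimal places is an assumption based on the 0.00 to 1.00 examples above.

```python
def format_output_row(time, edge_id, confidence, known_edge_ids):
    """Build one <Time>,<EdgeId>,<Confidence> row, checking the stated constraints."""
    # The edge must be one of the identifiers present in the map files.
    if edge_id not in known_edge_ids:
        raise ValueError(f"unknown EdgeId: {edge_id}")
    # The confidence must lie between 0.00 and 1.00 inclusive.
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    # Two decimal places is an assumed convention, not a documented requirement.
    return f"{time},{edge_id},{confidence:.2f}"


# Made-up values: the reading at time 0 matched to edge 1234 with 70% confidence.
row = format_output_row(0, 1234, 0.7, known_edge_ids={1234, 5678})
```

The <Time> value is written back exactly as it was passed in, so that it matches the corresponding row of the input file.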