Special Topics in GIS Module 3: Data Quality - Assessments


This week we studied how to measure the accuracy and completeness of map data. We were given two shapefiles to compare: county-maintained street centerline data and US Census Bureau TIGER/Line street data. We overlaid a grid and compared the total length of the road polylines within each grid square.

Haklay (2010) defines completeness as a measure of the lack of data, or how comprehensively the data represents the real world. Following our lab instructions, we can infer that a greater total road length indicates more complete data. Of the two road layers, the TIGER Road shapefile had the greater total length and is therefore the more complete.

I clipped both layers to the grid extent using the Clip geoprocessing tool. Next, I needed the total length of the polylines in each layer within each grid square, so I ran the Pairwise Intersect tool on the County layer and the grid, and then on the TIGER Line layer and the grid. The attribute table of each output layer included the Grid Code and the Shape Length. To each attribute table I added a new field and used the Calculate Geometry tool to populate it with the length in kilometers. Doing this taught me that a new field's data type must be set to “Double” for the results to show decimal values. I could then use the Statistics tool to review the sum of the road length for each layer.
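The summing step can be illustrated outside of ArcGIS Pro with a small Python sketch. This is not the Statistics tool itself, only the same arithmetic: lengths are converted to kilometers and summed per grid code. The function name, the sample rows, and the assumption that Shape Length is stored in meters are all hypothetical (the actual unit depends on the layer's coordinate system).

```python
from collections import defaultdict

METERS_PER_KM = 1000.0  # assumption: Shape Length is stored in meters


def length_km_by_grid(segments):
    """Sum segment lengths (in meters) per grid code and return kilometers."""
    totals = defaultdict(float)
    for grid_code, length_m in segments:
        totals[grid_code] += length_m / METERS_PER_KM
    return dict(totals)


# Hypothetical rows standing in for an intersect output's attribute table:
# (grid code, Shape Length in meters)
sample_rows = [(1, 1500.0), (1, 500.0), (2, 250.0)]

per_grid = length_km_by_grid(sample_rows)
layer_total_km = sum(per_grid.values())
print(per_grid, layer_total_km)
```

Running the same function on each layer's intersect output would give one total per layer, which is the figure compared in this post.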

Using the Statistics tool, I summed the road length across all grid squares for each layer. The results are in the table below.

I also ran the Tabulate Intersection geoprocessing tool, which added a column giving a percentage for each polyline segment in each grid square. As a cross-reference, I examined the sum of these percentages, with the idea of using them to determine which shapefile covered more of each grid square.

 

Layer                      Total Length (km)   Sum of total % per Grid Square   Estimated Percent of Grid Square Cover (Total Length / Sum of total %)
County Street Centerline   10,671.1            36,089.4                         29.5%
TIGER Road                 11,253.4            29,500.1                         38.15%

% Difference = (10,671.1 - 11,253.4) / 10,671.1 × 100 = -5.46%

The percent difference, using the County centerline as a base, is -5.46%. The negative sign means the County layer has less total road length than the TIGER layer.
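As a quick check on the arithmetic, the percent difference can be reproduced in a few lines of Python. The function and variable names are my own for illustration; the two totals are the ones reported in this post.

```python
def percent_difference(base, other):
    """Percent difference of `other` relative to `base`; negative when base is smaller."""
    return (base - other) / base * 100.0


county_km = 10671.1  # County Street Centerline total length
tiger_km = 11253.4   # TIGER Road total length

diff = percent_difference(county_km, tiger_km)
print(f"{diff:.2f}%")  # prints "-5.46%"
```

Swapping the arguments would use the TIGER layer as the base instead, which changes both the sign and the magnitude of the result.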

The TIGER Road data had the greater total length and, by my estimate, covered a greater percentage of the grid squares. I therefore conclude that the TIGER Road data demonstrates a higher level of completeness than the County road data.


References:

Haklay, M. (2010). How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environment and Planning B: Planning and Design, 37(4), 682-703. doi:10.1068/b35097

