Thursday, April 7, 2016

Geocoding: Sand Mine Locations in Wisconsin

Background

The objective of this exercise was to correctly locate roughly sixteen sand mines across the state of Wisconsin by geocoding. The data for this exercise came from the Wisconsin Department of Natural Resources (WDNR) and were not initially in the correct format for geocoding. The lab introduced us to "imperfect data" and the errors that can occur when data are not properly formatted.

Geocoding: the process of assigning a geographic location (coordinates) to an object such as an address, or correcting a previously assigned location. This is useful for analyzing data, and specifically for network analysis from one place to another.


Methods

As stated above, the data provided to us by the Wisconsin DNR were not formatted in a way that allowed the sand mine locations to be geocoded readily. The table stored each mine's whole address in a single field, which the geocoding program cannot read. It was therefore necessary to split the address into separate fields for street, city, state, county, etc. In addition to combining the whole address into one field, the WDNR sometimes provided only the Public Land Survey System (PLSS) description of the property. In those cases a separate field was needed to hold the description.

In the table I normalized, I split the original address field into ADDRESS_STREET, ADDRESS_CITY, ADDRESS_STATE, ADDRESS_ZIP, and PLSS. This way each part of the address was stored separately and could be used to find that specific location on the map. Table 1 shows the resulting normalized table.
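
The splitting step can be sketched in a few lines of Python. This is only an illustration, not how the WDNR table was actually processed: it assumes (hypothetically) that a combined address looks like "street, city, ST 12345", and that anything not matching that pattern is a PLSS description. The example addresses in the usage note are made up.

```python
import re
from typing import Dict

# Hypothetical layout of a combined address: "street, city, ST 12345".
ADDRESS_RE = re.compile(
    r"^\s*(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*"
    r"(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})\s*$"
)

# Output fields named after the normalized table described above.
FIELDS = ("ADDRESS_STREET", "ADDRESS_CITY", "ADDRESS_STATE", "ADDRESS_ZIP", "PLSS")


def normalize_address(raw: str) -> Dict[str, str]:
    """Split one combined address string into separate fields.

    Strings that do not match the street/city/state/ZIP pattern are assumed
    to be PLSS descriptions and are stored whole in the PLSS field.
    """
    record = dict.fromkeys(FIELDS, "")
    match = ADDRESS_RE.match(raw)
    if match:
        record["ADDRESS_STREET"] = match["street"].strip()
        record["ADDRESS_CITY"] = match["city"].strip()
        record["ADDRESS_STATE"] = match["state"]
        record["ADDRESS_ZIP"] = match["zip"]
    else:
        record["PLSS"] = raw.strip()
    return record
```

For example, normalize_address("123 Main St, Chippewa Falls, WI 54729") fills the four address fields, while normalize_address("NE 1/4 SEC 12 T28N R9W") leaves them empty and stores the whole string in PLSS. Real addresses are messier, so a rule like this still needs a manual check of every row.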

Table 1. Normalized table used to geocode the addresses of sand mines in Wisconsin.

Once the table was normalized, it could be added to ArcMap. The Geocoding toolbar was turned on and connected to ___________________. The interface then automatically attempted to match each address in the table to a location on the map.

Merge tool
I found the four other students who were assigned the same mines and merged their shapefiles with mine.
An error occurred during the merge because the ZIP code field type differed in one of the students' tables: the field type was double, and that student had changed it to text. Since the field was not needed, I deleted it.
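
The fix above (find the field whose type differs between tables, then drop it before appending) can be sketched in plain Python. This does not reproduce how ArcGIS's Merge tool works internally; tables are modeled as a schema dictionary plus a list of rows, and the field name MINE_ID and the type labels "LONG", "DOUBLE", and "TEXT" are hypothetical stand-ins.

```python
from typing import Dict, List, Set, Tuple

Schema = Dict[str, str]    # field name -> field type label, e.g. "DOUBLE" or "TEXT"
Row = Dict[str, object]
Table = Tuple[Schema, List[Row]]


def conflicting_fields(schemas: List[Schema]) -> Set[str]:
    """Names of fields that appear with different types in different tables."""
    first_type: Dict[str, str] = {}
    conflicts: Set[str] = set()
    for schema in schemas:
        for name, ftype in schema.items():
            if name in first_type and first_type[name] != ftype:
                conflicts.add(name)
            first_type.setdefault(name, ftype)
    return conflicts


def merge_tables(tables: List[Table]) -> List[Row]:
    """Append all rows into one list, dropping any field whose type conflicts."""
    drop = conflicting_fields([schema for schema, _ in tables])
    return [
        {name: value for name, value in row.items() if name not in drop}
        for _, rows in tables
        for row in rows
    ]
```

Dropping the field is the simplest choice when it is not needed, as here. If the ZIP code had been needed, converting every table's copy of the field to one common type before merging would keep the data.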

Project
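I projected the selected mines and my own mines to the Wisconsin State Plane coordinate system, Central zone: NAD_1983_2011_StatePlane_Wisconsin_Central_FIPS_4802. Working in this projected coordinate system means the distances calculated later are in meters.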

Select by attributes
To ensure that each of my geocoded mines was measured against classmates' mines with the same mine ID, I created a new feature class for each of the sixteen mines my classmates geocoded. So there were sixteen different feature classes, each representing one unique mine ID. Some feature classes contained only one mine, while others contained three or four, depending on how the students geocoded and whether they actually completed the assignment.

To compare my mines with the actual mine locations, I likewise created a separate feature class of actual locations for each unique mine ID.
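
Splitting the merged data into one group per mine ID works like a group-by. The sketch below mimics running Select By Attributes once per ID and exporting each selection. It is not the ArcMap workflow itself, and the field name MINE_ID is a hypothetical placeholder for whatever ID field the table actually used.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def split_by_mine_id(rows: List[Row], id_field: str = "MINE_ID") -> Dict[object, List[Row]]:
    """Group merged rows so that each mine ID gets its own collection.

    Similar in spirit to running Select By Attributes once per ID
    (e.g. MINE_ID = 235) and exporting each selection as a feature class.
    """
    groups: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[id_field]].append(row)
    return dict(groups)
```

A group can hold one row or several, which matches the feature classes above that contained one mine or three to four.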

Near tool
To get the distance to the closest mine, the Near tool was used. Some mines were far from their actual location and closer to a different mine, which is why the sixteen feature classes were created: they ensure that the closest mine found for each mine has the same mine ID. When the Near tool is run, the distance appears in the attribute table of the input features. However, because each mine was run separately with this tool, only the distance for that specific mine ID is correct. Tables 2 and 3 below show the distances produced by the Near tool. The input features were my selected mines, and the near features were either the projected student mines or the actual mine locations.

Table 2. Distance from my geocoded mines to the closest geocoded mine of my classmates. Unit: meters.

Table 3. Distance from my geocoded mines to the actual locations of the mines. Unit: meters.

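
The per-ID Near step amounts to a nearest-neighbor search limited to points that share a mine ID. The Python sketch below is a simplified stand-in for the Near tool, not its implementation. It assumes the points are already in a projected coordinate system with meter units, so plain Euclidean distance applies, and the coordinates in the usage note are invented.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in a projected system, units in meters


def near_same_id(mine: Dict[int, Point],
                 others: Dict[int, List[Point]]) -> Dict[int, float]:
    """Distance from each of my mines to the closest other point with the same ID."""
    result: Dict[int, float] = {}
    for mine_id, (x, y) in mine.items():
        candidates = others.get(mine_id, [])
        if candidates:  # skip IDs that nobody else geocoded
            result[mine_id] = min(math.hypot(x - ox, y - oy) for ox, oy in candidates)
    return result


def near_any_id(point: Point, others: Dict[int, List[Point]]) -> Tuple[int, float]:
    """Closest point regardless of ID, returned as (mine_id, distance)."""
    x, y = point
    return min(
        ((mine_id, math.hypot(x - ox, y - oy))
         for mine_id, pts in others.items() for ox, oy in pts),
        key=lambda pair: pair[1],
    )
```

With my mine 269 at (1000, 2000), a classmate's 269 at (1300, 2400), and a 270 at (1000, 2100), near_same_id reports 500 m for mine 269, while near_any_id returns mine 270 at 100 m. That unrestricted result is the kind of mismatch the per-ID feature classes were meant to avoid.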
Results
Figure 1. Map comparing the mines I geocoded to the mines geocoded by classmates, and the actual locations of the mines.
From the map, my geocoded mines appear not to be too far from the actual mine locations; however, even the closest of my mines was 260 meters from its actual location. In fact, three mines (235, 237, and 238) were geocoded to the same location northeast of their actual locations. The geocoding toolbar automatically assigned the mines to this location, and I made the mistake of not correcting them. I also mixed up mines 269 and 270. I noticed this while running the Near tool, when the table produced a shorter distance to another mine than to the one I was looking for. This could be attributed to misreading the addresses, a mistake made easier by the fact that the two mines were close to each other.

In reviewing the locations where my mines were far from the actual locations, I noticed that some of the mines were classified as inactive, which could be one reason why the locations did not match. The mines that did not appear on the basemap or in Google Earth were the hardest to identify, because there was no way of telling where the mine's driveway was.


Conclusion

This exercise clearly demonstrated the importance of normalizing data and of being aware of the errors possible throughout the whole geocoding process. According to Lo ___________, there are three types of error: gross, systematic, and random. Many errors affected how the mines were geocoded during this exercise.

Gross error: the user makes a mistake in managing the data. A good example is mines 235, 237, and 238, described above, which were all geocoded to the same location. It was an oversight on my part not to go in and manually move them to their correct locations.

Systematic error: errors caused by, among other things, bias in the user's measurements and faulty equipment. In this exercise, one example of systematic error was that my mine locations differed (in some cases by a lot) from those of my classmates. Each of us had our own bias about where the coordinate of a mine should be placed, so our data varied.

Random error: errors due to the limitations of the equipment used to take measurements during data collection.
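
One common way to tell the last two error types apart numerically (not necessarily the method Lo describes) is to treat the average offset of a set of geocoded points from their true locations as the systematic part, and the scatter around that average as the random part. The sketch below assumes the offsets (dx, dy) have already been measured in meters in the projected coordinate system. Gross errors such as mines 235, 237, and 238 would normally be removed first, since one large blunder distorts both numbers.

```python
import math
from statistics import mean
from typing import List, Tuple


def error_components(offsets: List[Tuple[float, float]]) -> Tuple[Tuple[float, float], float]:
    """Split positional errors into a systematic part and a random part.

    offsets: (dx, dy) of each geocoded point minus its true location, in meters.
    Returns the mean offset (systematic bias) and the root-mean-square
    distance of the offsets from that mean (random scatter).
    """
    bias_x = mean(dx for dx, _ in offsets)
    bias_y = mean(dy for _, dy in offsets)
    scatter = math.sqrt(mean((dx - bias_x) ** 2 + (dy - bias_y) ** 2 for dx, dy in offsets))
    return (bias_x, bias_y), scatter
```

If every point is shifted 100 m east, the bias is (100, 0) and the scatter is 0, which is a purely systematic error. If the points are spread evenly on both sides of the truth, the bias is near zero and all of the error shows up as scatter.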

Some of the errors in this exercise could easily have been avoided. First, if all of the students had been given normalized addresses in the attribute table, we would have avoided the mistakes made while transforming the table to work with the data. Second, more guidelines on where the coordinate should be placed in relation to the mine would help eliminate some human bias. In conclusion, this exercise demonstrated the value of preparing data ahead of time, especially when the data will be distributed to a team to work on.

