Placing an address on a map, whether to find a location or to provide business context, is becoming vitally important in our society. Location matters.
In most cases, address information stored in a database is not regularly maintained. It is often captured in free-text fields, which leads to irregularities and inconsistencies in the data.
The purpose of this article is to provide some insight into how to better manage an address dataset that may be batch geocoded, and how to optimise the way these addresses are captured for geocoding in ArcGIS Online.
A number of variables can affect the final outcome of a geocoding exercise, the most pivotal being the quality and accuracy of the reference data you are matching against. It is never as simple as receiving an address dataset and geocoding it: clients often want quantifiable measures of accuracy for the geocoded dataset, and the GIS personnel working on the project are often expected to clean and normalise addresses in order to improve match rates.
Here are a few tips to help ensure accurate geocodes when using the World geocoder in ArcGIS Online.
- Use single-line addresses
Geocoding single-line addresses is both faster and often more accurate than feeding the address records to the geocoder field by field. There are a number of reasons for this, the most obvious being that information is often captured in the wrong field.
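Where address components were captured in separate fields, they can be concatenated into one line before geocoding. The minimal sketch below does this in plain Python; the function name and parameter order are illustrative assumptions, not part of any ArcGIS API.

```python
def to_single_line(street, suburb=None, city=None, province=None, postal_code=None):
    """Join separately captured address fields into one single-line address.

    Hypothetical helper: blank or missing fields are skipped so that no empty
    ", ," segments end up in the string sent to the geocoder.
    """
    parts = [street, suburb, city, province, postal_code]
    # Collapse internal runs of whitespace in each component.
    cleaned = [" ".join(str(p).split()) for p in parts if p is not None]
    # Drop components that are empty after cleaning, then comma-separate the rest.
    return ", ".join(p for p in cleaned if p)


print(to_single_line("12  Main Road", "", "Johannesburg", "Gauteng", "2001"))
# → 12 Main Road, Johannesburg, Gauteng, 2001
```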
- An address should look like an address
The ArcGIS Online geocoder uses a form of programmatic pattern matching. If an address does not match the patterns in the locator, your geocodes suffer.
Best practice is to ensure your addresses look as follows:
[HOUSE NUMBER] [ ] [STREET NAME] [ ] [STREET TYPE] [, ] [SUBURB] [, ] [CITY] [, ] [PROVINCE] [, ] [POSTAL CODE]
[CORNER OF] [ ] [STREET NAME] [ ] [STREET TYPE] [ ] [AND] [ ] [STREET NAME] [ ] [STREET TYPE] [, ] [SUBURB] [, ] [CITY] [, ] [PROVINCE] [, ] [POSTAL CODE]
[POI] [, ] [SUBURB] [, ] [CITY] [, ] [PROVINCE] [, ] [POSTAL CODE]
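One way to act on these templates before geocoding is to flag records that do not fit the first two. The sketch below is a rough pre-check built on Python's `re` module, assuming a short hypothetical list of street types; it is not how the ArcGIS locator matches internally, and POI-style addresses (the third template) are simply reported as unrecognised.

```python
import re

# Hypothetical, deliberately short list of street types; extend it for your data.
STREET_TYPES = r"(?:Street|Road|Avenue|Drive|Lane|Crescent|Close|Way)"
# Any number of trailing ", SUBURB, CITY, PROVINCE, POSTAL CODE" components.
TAIL = r"(?:,\s*[^,]+)*"

# [HOUSE NUMBER] [STREET NAME] [STREET TYPE], ...
STREET_ADDRESS = re.compile(
    rf"^\d+[A-Za-z]?\s+[^,]+?\s+{STREET_TYPES}{TAIL}$", re.IGNORECASE
)
# Corner of [STREET NAME] [STREET TYPE] and [STREET NAME] [STREET TYPE], ...
INTERSECTION = re.compile(
    rf"^Corner of\s+[^,]+?\s+{STREET_TYPES}\s+and\s+[^,]+?\s+{STREET_TYPES}{TAIL}$",
    re.IGNORECASE,
)


def address_shape(address):
    """Return 'street', 'intersection', or None if neither template matches."""
    text = " ".join(address.split())  # normalise whitespace first
    if INTERSECTION.match(text):
        return "intersection"
    if STREET_ADDRESS.match(text):
        return "street"
    return None


print(address_shape("12 Main Road, Rosebank, Johannesburg, Gauteng, 2196"))  # street
print(address_shape("Main Road 12 Johannesburg"))  # None
```

Records that come back as `None` are candidates for manual review or for the kind of text replacements discussed later in this article.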
- A city is more important than a suburb
Suburbs in South Africa remain loosely defined and differ from dataset to dataset. Township extensions (for example "Ext 2") create a host of additional problems, suburb names change, and residents may say their street falls in a neighbouring suburb for various reasons. You are more likely to get an accurate geocode using the city alone than using a suburb that does not match the suburb in the reference data you are matching against.
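A sketch of this rule, assuming you have some reference list of suburbs per city (the `KNOWN_SUBURBS` table below is a hypothetical stand-in for it): keep the suburb only when it is recognised for that city, and otherwise fall back to the city alone.

```python
# Hypothetical reference lookup: suburbs known to the reference data, per city.
KNOWN_SUBURBS = {
    "johannesburg": {"rosebank", "parktown", "braamfontein"},
}


def choose_locality(suburb, city, known_suburbs=KNOWN_SUBURBS):
    """Return the locality components to include in a single-line address."""
    suburb = (suburb or "").strip()
    city = (city or "").strip()
    if not city:
        # Nothing better to fall back on, so keep whatever suburb was captured.
        return [suburb] if suburb else []
    if suburb.lower() in known_suburbs.get(city.lower(), set()):
        # Recognised suburb: include it ahead of the city.
        return [suburb, city]
    # Unknown or missing suburb: the city alone is the safer bet.
    return [city]


print(choose_locality("Rosebank", "Johannesburg"))        # ['Rosebank', 'Johannesburg']
print(choose_locality("Rosebank Ext 3", "Johannesburg"))  # ['Johannesburg']
```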
- Never trust a postal code
Many people do not even know their postal code, and including an incorrect postal code in an address for geocoding in ArcGIS Online does more harm than good, because the match will be scored down. What makes things even more confusing is that a particular street may have a "box" code and a "street" code that differ, and either of them may not be accurately represented in the reference data being matched against. If you are going to include postal codes in the addresses you geocode, please ensure they all have four digits; otherwise ArcGIS Online will not recognise the postal code for what it is.
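A minimal sketch of the four-digit rule. It assumes that codes shorter than four digits lost their leading zeros when stored as numbers (for example, 0002 becoming 2 in a spreadsheet), and it drops anything that cannot be turned into four digits, since an incorrect postal code is worse than none. The function name is hypothetical.

```python
import re


def clean_postal_code(value):
    """Return a four-digit postal code string, or None if it should be dropped."""
    if value is None:
        return None
    text = str(value).strip()
    # Numeric columns can turn "0002" into 2.0; strip the float suffix.
    if re.fullmatch(r"\d+\.0", text):
        text = text[:-2]
    if re.fullmatch(r"\d{1,4}", text):
        # Assumption: short numeric codes lost their leading zeros.
        return text.zfill(4)
    # Letters, five or more digits, or blanks: safer to leave the code out.
    return None


print(clean_postal_code(2))        # 0002
print(clean_postal_code("21965"))  # None
```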
Clicking the image above will download an archive containing a toolbox with a simple Python script that uses a lookup table of freely available data from Statistics South Africa and the South African Post Office to attempt to normalise and clean address datasets prior to geocoding, particularly in ArcGIS Online. You can use it in the same way you would use any other tool in ArcMap. Applying the 80/20 principle, we have attempted to use the minimum amount of code to clean and normalise the majority of addresses; however, each dataset is going to have its own nuances, so it will be up to you to modify the script to optimise it for each of your use cases.
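The downloadable script itself is not reproduced here. As a hypothetical illustration of the lookup-table idea only, the sketch below fills blank city, province and postal code fields from a suburb lookup using Python's `csv` module. The column names (SUBURB, CITY, PROVINCE, POSTAL_CODE) and the one-row sample are assumptions, not the actual layout of the Statistics South Africa or Post Office data.

```python
import csv
import io


def load_lookup(csv_file):
    """Build {suburb (lower case): row} from a lookup CSV.

    Assumes hypothetical columns SUBURB, CITY, PROVINCE and POSTAL_CODE.
    """
    return {row["SUBURB"].strip().lower(): row for row in csv.DictReader(csv_file)}


def fill_from_lookup(record, lookup):
    """Return a copy of record with blank CITY/PROVINCE/POSTAL_CODE filled in."""
    filled = dict(record)
    match = lookup.get((record.get("SUBURB") or "").strip().lower())
    if match is None:
        return filled  # unknown suburb: leave the record untouched
    for field in ("CITY", "PROVINCE", "POSTAL_CODE"):
        # Only fill fields that are missing or blank; never overwrite data.
        if not (filled.get(field) or "").strip():
            filled[field] = match[field]
    return filled


# Tiny in-memory sample standing in for the real lookup file.
SAMPLE = io.StringIO(
    "SUBURB,CITY,PROVINCE,POSTAL_CODE\n"
    "Rosebank,Johannesburg,Gauteng,2196\n"
)
lookup = load_lookup(SAMPLE)
print(fill_from_lookup({"STREET": "12 Main Road", "SUBURB": "rosebank", "CITY": ""}, lookup))
```

Filling only blank fields is a deliberate choice here: a lookup keyed on suburb alone can be ambiguous, so it should not silently overwrite values that were actually captured.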
If you’ve never used Python, don’t despair: the tool already does most of the heavy lifting for you, and there is still much to be gained by adding text replacements and additional street types to the portions of the code indicated below. Simply navigate to the toolbox in an ArcCatalog window, right-click the script and select “Edit…” to incorporate additional records as and when required. If you would like to add further functionality, some Python scripting knowledge will be advantageous.
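As a hypothetical example of the kind of text replacement described above (not the tool's own code), the sketch below expands common abbreviations using whole-word, case-insensitive matching. The `REPLACEMENTS` entries are illustrative assumptions; build your own table from the abbreviations that actually appear in your data.

```python
import re

# Hypothetical replacement table: abbreviation (lower case) -> preferred form.
REPLACEMENTS = {
    "st": "Street",
    "str": "Street",
    "rd": "Road",
    "ave": "Avenue",
    "cnr": "Corner of",
    "jhb": "Johannesburg",
}

# Whole-word alternation (longest keys first), plus an optional trailing full stop.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(REPLACEMENTS, key=len, reverse=True))
    + r")\b\.?",
    re.IGNORECASE,
)


def apply_replacements(address):
    """Expand abbreviations in an address using whole-word matches."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(1).lower()], address)


print(apply_replacements("cnr Jan Smuts Ave and Bolton Rd., Jhb"))
# → Corner of Jan Smuts Avenue and Bolton Road, Johannesburg
```

Because matching is on whole words, "1st Avenue" is left alone. "St" remains ambiguous, though (Street or Saint, as in "St Johns Road"), which is exactly the kind of dataset-specific nuance that needs tuning. Inside an ArcMap script tool, a function like this could be applied row by row, for example with `arcpy.da.UpdateCursor`.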
Ultimately, the expectations for any geocoding exercise need to be realistically aligned with the quality of the input address data. Many datasets in South Africa still have a long way to go, and given the dynamic nature of road networks there will always be gaps in the reference data used for geocoding, even in ArcGIS Online. It is up to us as GIS users to prepare our data correctly prior to geocoding in order to achieve the favourable results we seek.