8th August 2009
This article discusses extensions and modifications to the data models used in geocoding databases in three areas: statoid levels, localization (multilingual support) and diachronic support.
Several geocoding services on the web allow you to map a string representing a location to
a longitude/latitude pair (a point) or to a set of such pairs defining a boundary (a region).
One example is the Yahoo Geocoding API.
Typically, the supported strings represent a location in a hierarchical ontology of subdivisions of the world.
This helps disambiguate locations that have the same name, for instance 'London, UK' versus 'London, Ontario'.
I discuss several ways in which the hierarchical ontology supported by these systems can be improved upon.
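The disambiguation described above can be illustrated with a minimal sketch. The gazetteer contents, the `geocode` function and the approximate coordinates are assumptions made up for this illustration; they do not reflect the API of any real geocoding service.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy gazetteer. Keys are hierarchical paths, most specific
# part first; values are approximate (latitude, longitude) pairs.
GAZETTEER: Dict[Tuple[str, ...], Tuple[float, float]] = {
    ("london", "uk"): (51.51, -0.13),
    ("london", "ontario"): (42.98, -81.25),
}


def geocode(query: str) -> Optional[Tuple[float, float]]:
    """Resolve a 'Name, Parent' string to a point.

    Returns None when the query is unknown or matches more than one entry.
    """
    parts = tuple(p.strip().lower() for p in query.split(","))
    matches = [
        coords
        for path, coords in GAZETTEER.items()
        if path[: len(parts)] == parts
    ]
    return matches[0] if len(matches) == 1 else None


print(geocode("London, UK"))       # resolves to the UK entry
print(geocode("London, Ontario"))  # resolves to the Ontario entry
print(geocode("London"))           # ambiguous without a parent: None
```

The bare name 'London' matches two entries, so the sketch refuses to guess; only the parent in the hierarchy makes the lookup unique.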
To a large extent, the world is divided into a hierarchy of
administrative divisions.
Different countries have different levels of administrative subdivisions. I will use
the term 'statoid', as proposed by Gwillim Law, to refer to administrative divisions
of the world at any level. Most web services support a subset of these subdivisions, for example:
country > state/province > municipality > town
For instance, GeoNames includes
the concepts of 'first order administrative subdivision',
'second order administrative subdivision' and 'administrative subdivision' and provides a
tree structure representation of the subdivisions.
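A tree of statoids like the one described above could be modelled roughly as follows. This is only a sketch: the `Statoid` class and its fields are hypothetical and are not the GeoNames data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Statoid:
    """One administrative division at any level (hypothetical model)."""

    name: str
    level: str  # free-form level name, e.g. "country" or "province"
    parent: Optional["Statoid"] = field(default=None, repr=False)
    children: List["Statoid"] = field(default_factory=list, repr=False)

    def add_child(self, name: str, level: str) -> "Statoid":
        """Create a subdivision of this statoid and return it."""
        child = Statoid(name, level, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Return the chain of names from the root down to this statoid."""
        parts = []
        node: Optional[Statoid] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " > ".join(reversed(parts))


netherlands = Statoid("Netherlands", "country")
brabant = netherlands.add_child("Noord-Brabant", "province")
eindhoven = brabant.add_child("Eindhoven", "municipality")
print(eindhoven.path())  # Netherlands > Noord-Brabant > Eindhoven
```

Storing the level as a free-form string rather than a fixed 'first order' or 'second order' slot is one way to leave room for countries whose levels do not fit a fixed depth.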
Many countries have exceptions and extensions to such strictly hierarchical levels:
Locations have different names in different languages:
Note that locations do not necessarily have one 'official' name, especially if a country is officially multi-lingual (e.g. Brussels).
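One possible way to store names per language without singling out one 'official' name is sketched below. The `LocalizedNames` class is hypothetical, and ISO 639-1 language codes are assumed as keys.

```python
from typing import Dict, List, Optional, Set


class LocalizedNames:
    """Names of one location keyed by language code (hypothetical model)."""

    def __init__(self, names: Dict[str, str], official: Set[str]) -> None:
        self.names = names
        # Languages that are official for this location; may be more than one.
        self.official = official

    def name(self, lang: str, fallback: str = "en") -> Optional[str]:
        """Return the name in `lang`, or in the fallback language if missing."""
        return self.names.get(lang, self.names.get(fallback))

    def official_names(self) -> List[str]:
        """Return all official names, ordered by language code."""
        return [self.names[lang] for lang in sorted(self.official)]


brussels = LocalizedNames(
    names={"nl": "Brussel", "fr": "Bruxelles", "en": "Brussels", "de": "Brüssel"},
    official={"nl", "fr"},
)
print(brussels.name("de"))        # Brüssel
print(brussels.name("it"))        # no Italian entry, falls back to English
print(brussels.official_names())  # ['Bruxelles', 'Brussel']
```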
Another area of localization is the name of the statoid level itself. If we were to display the difference between the city of Luxembourg and
the region of Luxembourg, we would have to show the level names 'city' and 'region'. It would be helpful in a worldwide
information system to have translations for these levels available as well. Examples:
The division of the world into statoids has changed over time and continues to change:
provinces get redefined, towns are annexed by neighbors, regions are redivided between countries, etc.
Most web services attempt to provide
the most up-to-date data, but this is a problem if you want to find information on statoids that no longer exist, e.g. for
enriching or presenting historical or genealogical data. It also means that these services cannot give information about changes
to a statoid over time. Several sources have data about changes to statoids, e.g.
provinciale herindelingen (changes to provinces) for the Netherlands.
It would be desirable to have such information and the subdivision data itself in a unified format and system.
This would allow you to get 'snapshots' of the statoid ontology for different times, and track changes to statoids over time.
For genealogy, where many records concern old countries such as Prussia or New Holland and their respective subdivisions, it would provide a way of associating records with statoids by querying the statoid set as it existed
at the time of the record. A special case of this is support for older names and spelling variations of locations.
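The 'snapshot' idea can be sketched by giving every statoid record a validity interval. Everything in this example is an assumption for illustration: the `StatoidVersion` class, the `snapshot` function, and the records 'Oldtown' and 'Newcity', which are invented and not historical data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class StatoidVersion:
    """A statoid as it existed during one period (hypothetical model)."""

    name: str
    level: str
    valid_from: date
    valid_to: Optional[date] = None  # None means it still exists

    def exists_on(self, day: date) -> bool:
        """True if the statoid existed on `day` (end date is exclusive)."""
        return self.valid_from <= day and (
            self.valid_to is None or day < self.valid_to
        )


def snapshot(versions: List[StatoidVersion], day: date) -> List[StatoidVersion]:
    """Return the statoids that existed on the given day."""
    return [v for v in versions if v.exists_on(day)]


# Invented records: 'Oldtown' is annexed by 'Newcity' at the start of 1970.
VERSIONS = [
    StatoidVersion("Oldtown", "municipality", date(1850, 1, 1), date(1970, 1, 1)),
    StatoidVersion("Newcity", "municipality", date(1900, 1, 1)),
]

print([v.name for v in snapshot(VERSIONS, date(1920, 5, 1))])  # both exist
print([v.name for v in snapshot(VERSIONS, date(1980, 1, 1))])  # only Newcity
```

A genealogical record dated 1920 would then be matched against the first snapshot, so it can still be attached to 'Oldtown' even though that statoid no longer exists. Older names and spelling variants could be handled the same way, as name records with their own validity intervals.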