Previous work examined placename ambiguity. Geo/non-geo ambiguity (a placename that is also an ordinary word) seemed to cause more problems than geo/geo ambiguity (two places sharing a name). A stop word list was created by comparing counts of occurrences in a corpus of everyday English with counts in a geographical corpus (derived from geograph.org). Words were ranked by count in each corpus; a word that ranked higher in the non-geo corpus was assumed to be, on average, a non-geo word, and vice versa. This rough and ready technique was used to prune placenames in an experiment that estimated web coverage from various sources. It improved the results, making our estimates of web coverage correlate better.
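The rank comparison can be sketched roughly as follows. This is a minimal illustration rather than the code used in the experiment: the function names, the tokenised-list inputs, and the decision to consider only words that appear in both corpora are all assumptions.

```python
from collections import Counter


def rank_by_count(tokens):
    """Map each word to its rank (1 = most frequent) in a list of tokens.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda word: (-counts[word], word))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def build_stop_words(general_tokens, geo_tokens):
    """Return words that rank higher in the general corpus than in the geo corpus.

    Assumption: only words present in both corpora are compared, since a
    word absent from the geo corpus is not a candidate placename anyway.
    """
    general_rank = rank_by_count(general_tokens)
    geo_rank = rank_by_count(geo_tokens)
    shared = general_rank.keys() & geo_rank.keys()
    return {word for word in shared if general_rank[word] < geo_rank[word]}
```

Placenames in the returned set would then be pruned before estimating coverage. Comparing ranks rather than raw counts avoids having to normalise for the different sizes of the two corpora.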
There may be other ways to do the same thing, perhaps based on spatial measures. I am looking into this, and hope to show that placenames that often resolve to somewhere far from the region in question are often stop words (links, login, etc.).
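One possible spatial measure, sketched here only as an assumption about how such a test might look, is the median great-circle distance between a name's gazetteer location and the regions of the pages that mention it. The `gazetteer` and `mentions` structures, the function names, and the 100 km threshold are all hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distant_names(gazetteer, mentions, threshold_km=100.0):
    """Flag names whose median distance from the mentioning region is large.

    gazetteer: hypothetical dict of name -> (lat, lon) for the place.
    mentions:  hypothetical iterable of (name, region_centre) pairs, one
               per page that mentions the name.
    threshold_km: arbitrary cut-off chosen for illustration only.
    """
    distances = {}
    for name, region_centre in mentions:
        if name in gazetteer:
            distance = haversine_km(gazetteer[name], region_centre)
            distances.setdefault(name, []).append(distance)
    return {name for name, ds in distances.items() if median(ds) > threshold_km}
```

The median is used so that a few pages from distant regions do not on their own push a genuine placename over the threshold.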