rob's PhD: stop words

Tuesday, 24 March 2009

stop words

In previous work ambiguity was examined. Geo-nongeo ambiguity (a placename that is also an ordinary word) seemed to cause more problems than geo-geo ambiguity (several places sharing one name). A stop word list was created by comparing counts of occurrences in a corpus of everyday English with counts in a geographical corpus (derived from geograph.org). When words were ranked by count, those that ranked higher in the non-geo corpus were assumed to be non-geo names on average, and vice versa. This rough and ready technique was used to prune placenames in an experiment that estimated web coverage from various sources. It improved the results, making our estimates of web coverage correlate better.
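As a rough illustration of the rank comparison described above, here is a minimal Python sketch. It is not the code used in the experiment: the function names, the tie-breaking rule, the handling of words missing from a corpus, and the toy token lists are all assumptions made for the example.

```python
from collections import Counter


def rank_by_count(tokens):
    """Map each word to its frequency rank (1 = most frequent).

    Ties are broken alphabetically; that choice is an assumption.
    """
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ordered)}


def build_stop_words(candidates, general_tokens, geo_tokens):
    """Return the candidate names that rank higher in the general corpus.

    A name missing from a corpus is given a rank just below that
    corpus's last word (also an assumption).
    """
    general_rank = rank_by_count(general_tokens)
    geo_rank = rank_by_count(geo_tokens)
    general_missing = len(general_rank) + 1
    geo_missing = len(geo_rank) + 1

    stop_words = set()
    for name in candidates:
        g = general_rank.get(name, general_missing)
        p = geo_rank.get(name, geo_missing)
        if g < p:  # smaller rank number = more prominent in everyday English
            stop_words.add(name)
    return stop_words


# Toy data standing in for the two corpora (hypothetical).
general = ["login", "login", "login", "links", "links", "bath"]
geo = ["bath", "bath", "bath", "bristol", "bristol", "links"]
names = ["login", "links", "bath", "bristol"]

print(sorted(build_stop_words(names, general, geo)))  # ['links', 'login']
```

Comparing raw ranks is crude when the two corpora differ a lot in size or vocabulary, which fits the "rough and ready" description; a relative-frequency ratio would be one alternative.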

There may be other ways to do the same thing, perhaps based on spatial measures. I am looking at this, and hope to show that placenames whose locations are often far away from the region in question are often stop words (links, login, etc.).
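One possible form such a spatial measure could take is sketched below in Python. This is only a guess at the idea, not a tested method: the median-distance criterion, the 100 km threshold, the function names, and the approximate coordinates are all hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distant_names(mentions, gazetteer, threshold_km=100.0):
    """Flag names whose median distance from the mentioning region is large.

    mentions:  iterable of (name, region_centre) pairs, one per occurrence
    gazetteer: dict mapping name -> (lat, lon) of the place with that name
    """
    distances = {}
    for name, region_centre in mentions:
        if name in gazetteer:
            d = haversine_km(gazetteer[name], region_centre)
            distances.setdefault(name, []).append(d)
    return {name for name, ds in distances.items() if median(ds) > threshold_km}


# Illustrative, approximate coordinates (assumptions for the example).
gazetteer = {"Login": (51.88, -4.66), "Bath": (51.38, -2.36)}
mentions = [
    ("Login", (51.45, -2.59)),  # page about the Bristol area
    ("Login", (53.48, -2.24)),  # page about the Manchester area
    ("Bath", (51.45, -2.59)),   # page about the Bristol area
]

print(distant_names(mentions, gazetteer))
```

Here "Login" is mentioned on pages about regions far from the place of that name, so it is flagged, while "Bath" is mentioned close to its own location and is kept.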

