Friday, 27 March 2009

I have been looking for patterns. Does anything in the stats I have collected correlate with better definition of regions? I cannot find anything yet; everything is surprisingly randomly distributed.

So I am going back a step. In Geo-Tagging For Imprecise Regions of Different Sizes we found that the resources from which georeferences came varied depending on the size of the region being searched for. We did this for a very small sample of a short list of region names. All the same, it was a reasonable effort. The sample was so small because manual geo-tagging was employed to provide a ground truth, which made it possible to say where the error was. I now have a list of regions (NOT imprecise) and the boundaries for them. I am going to count the resources for each region and see whether the counts change depending on the size of the region.
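A rough sketch of that count, in Python. Everything named here is an assumption for illustration: the region names, the resource names (`gazetteer_x`, `gazetteer_y`), the areas, and the idea that each georeference can be reduced to a (region, resource) pair. It tallies resources per region, then lists each region's resource shares from smallest region to largest so any shift with size is easy to see.

```python
from collections import Counter, defaultdict


def count_resources_by_region(georefs):
    """Tally how often each resource supplied a georeference, per region.

    georefs: iterable of (region_name, resource_name) pairs.
    Returns a dict mapping region_name -> Counter of resource_name.
    """
    counts = defaultdict(Counter)
    for region, resource in georefs:
        counts[region][resource] += 1
    return counts


def resource_shares_by_size(counts, areas_km2):
    """Return (region, area, shares) tuples ordered smallest region first.

    shares maps each resource to its fraction of that region's georeferences.
    """
    rows = []
    for region in sorted(counts, key=lambda r: areas_km2[r]):
        total = sum(counts[region].values())
        shares = {res: n / total for res, n in counts[region].items()}
        rows.append((region, areas_km2[region], shares))
    return rows


# Hypothetical data: not taken from the real stats.
georefs = [
    ("Region A", "gazetteer_x"),
    ("Region A", "gazetteer_x"),
    ("Region A", "gazetteer_y"),
    ("Region B", "gazetteer_y"),
    ("Region B", "gazetteer_y"),
    ("Region B", "gazetteer_y"),
    ("Region B", "gazetteer_x"),
]
areas_km2 = {"Region A": 50.0, "Region B": 5000.0}

counts = count_resources_by_region(georefs)
rows = resource_shares_by_size(counts, areas_km2)

for region, area, shares in rows:
    print(region, area, shares)
```

Using shares rather than raw counts keeps regions with many georeferences from swamping those with few, which seems the fairer comparison across sizes.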

Additionally, the resource rows are of various sizes within the resources (and they overlap in size). I wonder if there is a better way to characterise the sizes of the resource items? In Mapping Geographic Coverage of the Web we found Yahoo! document count to be a good surrogate (though certain places were very ambiguous and needed to be excluded). Maybe that will work?
