I am now indexing the first 3 directories of os50kcorpus. I am still collecting another 2, but I am too impatient to wait before seeing what exactly I have in the first 3, which hold pages 1-150 for each settlement name. Will the corpus be skewed by the way it was created?
The other thing I can do with this corpus is search a random selection of pages for the 2500 regions that I can geocode against: are there any differences in scope, etc.? Since this process is slow, I might choose a smaller set of "regions". The middle set is UK settlements, which are bounded by rural areas anyway and seem different to snis and counties. There are also many more of them, and it seems to me there are too many. It would probably have been better to select about 100-500 of them and put more effort into finding neighbourhood data (like snis) for different cities. Probably a bit late now.
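As a rough sketch of what that could look like, here is one way to draw a reproducible random subset of regions (or pages) and count how many pages mention each region. This is not the code used for os50kcorpus: the function names are hypothetical, pages are assumed to be plain strings already loaded into memory, and the matching is a crude case-insensitive substring test, so short names would over-match.

```python
import random


def sample_without_replacement(items, k, seed=0):
    """Return k distinct items chosen uniformly at random (all of them if k is too large)."""
    rng = random.Random(seed)  # fixed seed so the same sample can be drawn again
    items = list(items)
    return rng.sample(items, min(k, len(items)))


def count_region_mentions(pages, regions):
    """For each region name, count the pages whose text contains it (case-insensitive)."""
    counts = {name: 0 for name in regions}
    for text in pages:
        lowered = text.lower()
        for name in regions:
            # Crude substring match; assumed good enough for a first look only.
            if name.lower() in lowered:
                counts[name] += 1
    return counts
```

Sampling, say, 300 of the 2500 regions and a few thousand pages with these helpers would give a quick first impression before committing to the slow full run.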