Wednesday, 14 October 2009
issues
There were some issues with the last post. The number of pages for each region name differed between the sources, and I counted the total number rather than the mean per file. I have now created a new set from the corpus that mirrors the number of files from the web. The index is still different because I use Lucene, and who knows how Yahoo! does it. Thus, even when a region name is in the settlement set, the pages retrieved by my index differ from those retrieved from the web.
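To make the total-versus-mean problem concrete, here is a rough Python sketch. It assumes each source gives a list of per-file counts for a region name; the numbers, the function names and the random subsampling step are all made up for illustration and are not the actual data or procedure used here.

```python
import random
from statistics import mean


def total_and_mean(counts_per_file):
    """Return (total, mean per file) for a list of per-file counts."""
    if not counts_per_file:
        return 0, 0.0
    return sum(counts_per_file), mean(counts_per_file)


def match_file_count(counts_per_file, n_files, seed=0):
    """Pick n_files entries at random so both sources have the same number of files.

    Random subsampling is an assumption; any selection rule could be used instead.
    """
    rng = random.Random(seed)
    return rng.sample(counts_per_file, n_files)


# Hypothetical per-file counts for one region name in each source.
web_counts = [3, 1, 2]
corpus_counts = [2, 2, 1, 3, 2, 2]

web_total, web_mean = total_and_mean(web_counts)
corpus_total, corpus_mean = total_and_mean(corpus_counts)

# The totals differ only because the corpus has twice as many files;
# the mean per file is the same in both sources.
print("web:", web_total, web_mean)
print("corpus:", corpus_total, corpus_mean)

# A corpus subset with the same number of files as the web set.
corpus_subset = match_file_count(corpus_counts, len(web_counts))
print("subset size:", len(corpus_subset))
```

With different file counts per source, the totals are not comparable but the means are, which is why the new set mirrors the number of files from the web.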