I have been sorting out the graphs for the document. It is not as easy as it could be, using Excel and knowing it will have to be b&w for the thesis. Anyway, as I have been outputting the results, they have looked different from before, with lovely smooth curves from the web and knobbly ones from the corpus, and also far fewer georefs in the corpus. I was going to output all the graphs and then try to understand what had happened.
Turns out I got them round the wrong way. An early error, and the fact that I reference the data sets by a code rather than anything descriptive, meant I have been disproving my theory all week. When I saw the smooth graphs I could not believe they were my corpus ones, because that would suggest that my corpus is better than the web, not just the same as it. I also used the wrong corpus set: I used the first 100 docs, whereas I have another set that matches the file quantities for the web crawl. The correct set is building now; it is taking an age because, as I first thought, it has 5x the georefs in it (about 5 million). Someone recently called me "the stupidest clever person she knows"; Mmmmm.
I think the results will be quite good, when I get them.
I am going to have to work most of the weekend, because I spent more time than I should have going to meetings that did not happen because the person who called them did not turn up. If they had only told me, I would have saved 5 hours of travel and sitting about (no fun when you are not paid bth and are already working every Saturday). Oh, and apparently I didn't need to be there anyway.