OK, the assignment description asks us to:
"implement tf-idf as described in class slides"
So I've been wondering:
1. Are we really supposed to return the vector-space similarity between the query (treated as a document)
and the current document as the "relevance score"?
2. Since the normalizer is defined as sqrt((tf-idf)^2), shouldn't each individual
term have its own normalizer, instead of the whole document sharing one?
3. If the query were treated as a document, tf would be 1 for each
term. And since idf/normalizer is a constant, wouldn't the whole "vector-space similarity"
business just amount to changing the weight formula to weight = tf * (idf/normalizer)^2?
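To make questions 1 and 3 concrete, here's a minimal Python sketch of how I currently read the scheme. Everything in it is an assumption rather than what the slides necessarily define: tf is the raw term count, idf(t) = log(N/df(t)), the query is treated as a tiny document, and each document gets a single normalizer equal to the square root of the sum of all its squared tf-idf weights. The function names (`idf_table`, `tfidf_vector`, `norm`, `cosine`, `expanded_score`) are just made up for illustration.

```python
import math
from collections import Counter

def idf_table(docs):
    """idf(t) = log(N / df(t)) over tokenized documents (one common variant; assumed)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {t: math.log(n / count) for t, count in df.items()}

def tfidf_vector(tokens, idf):
    """Sparse tf-idf vector: raw term count times idf; terms not in idf are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def norm(vec):
    """One normalizer per document (assumed): sqrt of the sum of ALL its squared weights."""
    return math.sqrt(sum(w * w for w in vec.values()))

def cosine(q_vec, d_vec):
    """Vector-space (cosine) similarity of two sparse weight vectors."""
    nq, nd = norm(q_vec), norm(d_vec)
    if nq == 0 or nd == 0:
        return 0.0
    dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
    return dot / (nq * nd)

def expanded_score(query_tokens, doc_tokens, idf):
    """The same cosine written out term by term, assuming each query term occurs once (tf_q = 1)."""
    nq = norm(tfidf_vector(query_tokens, idf))
    nd = norm(tfidf_vector(doc_tokens, idf))
    if nq == 0 or nd == 0:
        return 0.0
    d_tf = Counter(doc_tokens)
    shared = set(query_tokens) & set(d_tf) & set(idf)
    # query weight = 1 * idf, doc weight = tf_d * idf, so each shared term adds tf_d * idf^2
    return sum(d_tf[t] * idf[t] ** 2 for t in shared) / (nq * nd)

if __name__ == "__main__":
    docs = [["apple", "banana", "apple"], ["banana", "cherry"], ["cherry", "cherry", "date"]]
    idf = idf_table(docs)
    query = ["apple", "banana"]  # the query treated as a tiny document
    q_vec = tfidf_vector(query, idf)
    for i, doc in enumerate(docs):
        print(i, cosine(q_vec, tfidf_vector(doc, idf)), expanded_score(query, doc, idf))
```

Running it prints one line per toy document; the two columns agree, and the first document (which contains the rarer term `apple`) scores highest. Under these assumptions the written-out form divides by the query's normalizer times the document's normalizer, which are two different numbers, so it's worth checking against the slides which normalizer they actually mean.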
Sorry for rambling on like this. A lot of my questions probably sound silly, but I'm just really confused…
Any explanation is appreciated.
Cheers, jiatao