InfoVis Online - Lecture 6

Vector Space Retrieval

Document = Set of Words

Each Word = Dimension in Vector

- After removing very common and rare words

- Stemming

→ (retriev*, inform*, visual*, interact*) = 4D vector
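
A minimal sketch of the document-to-vector step, assuming the four stems from the example above are the only dimensions left after filtering. The prefix match stands in for a real stemmer, and the names STEMS and to_vector are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Assumption: these four stems (from the slide's example) are the dimensions
# that remain after very common and rare words have been removed.
STEMS = ["retriev", "inform", "visual", "interact"]


def to_vector(text):
    """Count, per stem, how many words in `text` start with that stem.

    Prefix matching is a crude stand-in for a real stemming algorithm.
    Words that match no stem are ignored (treated as filtered out).
    """
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    counts = Counter()
    for word in words:
        for stem in STEMS:
            if word.startswith(stem):
                counts[stem] += 1
                break
    return [counts[stem] for stem in STEMS]


doc = "Interactive visualization supports information retrieval and retrieving information."
print(to_vector(doc))  # prints [2, 2, 1, 1]
```

Each position in the resulting list is one dimension of the 4D vector, in the order given by STEMS.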

Each Word / Dimension Weighted based on Frequency

- “Inverse” = 1 / Frequency

→ The less frequent, the greater the weight
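
The weighting can be sketched as follows, assuming "frequency" means the number of documents in the collection that contain the word (the slide does not say which frequency is meant). Practical systems often use a logarithmic variant such as log(N / document frequency) rather than the plain reciprocal; the function names here are hypothetical.

```python
def inverse_weights(doc_vectors):
    """Return one weight per dimension: 1 / (number of documents containing the term).

    Assumption: "frequency" is interpreted as document frequency.
    A term that occurs in no document gets weight 0.0.
    """
    n_dims = len(doc_vectors[0])
    weights = []
    for d in range(n_dims):
        doc_freq = sum(1 for vec in doc_vectors if vec[d] > 0)
        weights.append(1.0 / doc_freq if doc_freq else 0.0)
    return weights


def apply_weights(vec, weights):
    """Scale each raw count by the weight of its dimension."""
    return [count * w for count, w in zip(vec, weights)]


# Three toy documents over the dimensions (retriev*, inform*, visual*, interact*).
docs = [
    [2, 2, 1, 1],
    [0, 3, 0, 0],
    [1, 1, 0, 0],
]
weights = inverse_weights(docs)
weighted_docs = [apply_weights(vec, weights) for vec in docs]
```

In this toy collection inform* occurs in all three documents and gets the smallest weight (1/3), while visual* and interact* occur in only one document and keep the full weight of 1.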

Similarity of Documents = Angle between Vectors

- Two text passages are similar if their vectors point in a similar direction
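
As a sketch of how the angle can be measured, the cosine of the angle between two vectors is their dot product divided by the product of their lengths. The function names below are hypothetical, and returning 0.0 for an all-zero vector is a convention chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an empty document is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def angle_degrees(a, b):
    """Angle between a and b in degrees; smaller means more similar."""
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cos))
```

Because only the direction matters, a short passage and a long passage with the same proportions of words, for example [1, 2] and [2, 4], are treated as maximally similar.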