Not all Words are Created Equal
Weighting
- Content-word weighting formalizes two common-sense insights:
- Content words that appear several times in a transcript
are probably more meaningful than content words that appear just once. (local weighting)
- Infrequently used words are likely to be more interesting than common
words. (global weighting)
- The product of the local and global weighting functions is applied to each non-zero element
of A: a_ij = L(i,j) * C(i), where L(i,j) is
the local weighting function for content-word i in transcript
j and C(i) is the global weighting function
for content-word i.
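
The weighting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which functions are used for L and C, so the sketch assumes a logarithmic local weight, L(i,j) = log(1 + count), and an inverse-document-frequency global weight, C(i) = log(N / n_i), where N is the number of transcripts and n_i is the number of transcripts containing content-word i. The function name `weight_matrix` and the sample counts are hypothetical.

```python
import math

def weight_matrix(A):
    """Apply a_ij = L(i,j) * C(i) to every non-zero entry of A.

    A is a list of rows; row i holds the raw counts of content-word i
    in each transcript j.
    """
    n_transcripts = len(A[0])
    weighted = []
    for row in A:
        # Assumed global weight C(i): inverse document frequency.
        doc_freq = sum(1 for count in row if count > 0)
        c_i = math.log(n_transcripts / doc_freq) if doc_freq else 0.0
        # Assumed local weight L(i,j): log(1 + count). Zero entries stay zero.
        weighted.append(
            [math.log(1 + count) * c_i if count > 0 else 0.0 for count in row]
        )
    return weighted

# Hypothetical counts: rows are content words, columns are transcripts.
counts = [
    [3, 0, 0],  # several occurrences, but only in one transcript
    [1, 1, 1],  # one occurrence in every transcript
    [1, 0, 1],  # one occurrence in two of three transcripts
]
weighted = weight_matrix(counts)
```

With these assumed functions, the word that occurs in every transcript receives a global weight of zero, while the word concentrated in a single transcript receives the largest weight, which matches the two insights above.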