ContentWord x Transcript Representation
- In the LSI model, content-words and transcripts are represented by an m x n incidence matrix A.
- Each of the m unique content-words in the transcript collection is assigned a row in the matrix.
- Each of the n transcripts in the collection is assigned a column in the matrix.
- A non-zero element aij, where A = [aij], indicates not only that content-word i occurs in transcript j, but also the number of times the content-word appears in that transcript.
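
The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the original system's implementation: the function name build_count_matrix, the whitespace tokenizer, and the three toy transcripts are all assumptions made for the example, and no stop-word removal or term weighting is applied.

```python
from collections import Counter


def build_count_matrix(transcripts):
    """Return (vocab, A) where A[i][j] counts word i in transcript j.

    Assumption: words are obtained by lowercasing and splitting on
    whitespace; a real system would filter for content-words first.
    """
    tokenized = [t.lower().split() for t in transcripts]

    # One row per unique word (m rows), in sorted order for reproducibility.
    vocab = sorted({w for doc in tokenized for w in doc})
    row_of = {w: i for i, w in enumerate(vocab)}

    # m x n matrix of zeros: m words, n transcripts.
    A = [[0] * len(tokenized) for _ in vocab]

    # Fill column j with the word counts of transcript j.
    for j, doc in enumerate(tokenized):
        for w, count in Counter(doc).items():
            A[row_of[w]][j] = count
    return vocab, A


# Hypothetical toy collection of n = 3 transcripts.
transcripts = [
    "budget vote budget",
    "vote delayed",
    "budget delayed delayed delayed",
]
vocab, A = build_count_matrix(transcripts)
```

Here vocab has m = 3 entries, and the row for "budget" is [2, 0, 1]: the word occurs twice in the first transcript, not at all in the second, and once in the third.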