Content-Word x Transcript Representation

In the LSI model, content-words and transcripts are represented by an m x n incidence matrix A.

Each of the m unique content-words in the transcript collection is assigned a row in the matrix.

Each of the n transcripts in the collection is assigned a column in the matrix.

A non-zero element a_ij, where A = [a_ij], indicates not only that content-word i occurs in transcript j, but also the number of times it appears in that transcript.
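
As an illustration of the representation described above, the following is a minimal sketch of how such a matrix might be assembled. The function name build_count_matrix, the whitespace tokenisation, and the two toy transcripts are assumptions made for this example only; the original text does not specify how content-words are extracted.

```python
from collections import Counter


def build_count_matrix(transcripts):
    """Return (vocabulary, A), where A[i][j] is the number of times
    word i occurs in transcript j (hypothetical helper)."""
    # Assumption: naive lower-casing and whitespace tokenisation.
    # A real system would also filter out non-content words.
    tokenised = [t.lower().split() for t in transcripts]

    # One row per unique word, in sorted order so the layout is reproducible.
    vocabulary = sorted({w for tokens in tokenised for w in tokens})
    row_of = {w: i for i, w in enumerate(vocabulary)}

    m, n = len(vocabulary), len(tokenised)
    A = [[0] * n for _ in range(m)]  # m x n matrix of zeros

    # One column per transcript; each entry holds the occurrence count.
    for j, tokens in enumerate(tokenised):
        for word, count in Counter(tokens).items():
            A[row_of[word]][j] = count
    return vocabulary, A


# Toy collection (made up for illustration): n = 2 transcripts.
transcripts = ["budget meeting budget", "meeting agenda"]
vocabulary, A = build_count_matrix(transcripts)

for word, row in zip(vocabulary, A):
    print(word, row)
```

In this toy collection the row for "budget" is [2, 0]: the word occurs twice in the first transcript and not at all in the second, so a zero entry marks absence and a non-zero entry records the count.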