RIMainApplication.java
Main startup class that provides the user interface to the application. It contains two input screens: RIXMLConverterFrame and TextDemo.
RIXMLConverterFrame.java
Used by the admin/teacher. GUI class for browsing to and selecting a text-based VC transcript. It uses the convert function of RIDelegate to convert the transcript to XML format, create the indexes, and store them.
TextDemo.java
Used by all users. This is the search screen, which searches by content word and by author and returns the output in XML format.
RIDelegate.java
Delegate class that sits between the user interface and the core application logic. It acts as a Facade, providing a single point of access to the application, and is used by RIXMLConverterFrame and TextDemo.
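The Facade role can be sketched as follows. This is a minimal illustration, not the real RIDelegate: the interfaces `Indexer` and `IndexStore` (standing in for RIIndexer and RIPersistenceManager) and the search method names are hypothetical; only the `convert` function is named in this document.

```java
import java.util.List;

// Hypothetical stand-in for RIIndexer; the method name is an assumption.
interface Indexer {
    void createIndex(String transcriptPath);
}

// Hypothetical stand-in for the query side of RIPersistenceManager.
interface IndexStore {
    List<String> findByContentWord(String word);
    List<String> findByAuthor(String author);
}

// Facade: the GUI screens (RIXMLConverterFrame, TextDemo) call only this class,
// so they never depend on the indexing or persistence classes directly.
class RIDelegateSketch {
    private final Indexer indexer;
    private final IndexStore store;

    RIDelegateSketch(Indexer indexer, IndexStore store) {
        this.indexer = indexer;
        this.store = store;
    }

    // Used by the admin screen: convert, index and store one transcript.
    void convert(String transcriptPath) {
        indexer.createIndex(transcriptPath);
    }

    // Used by the search screen.
    List<String> searchByContentWord(String word) {
        return store.findByContentWord(word);
    }

    List<String> searchByAuthor(String author) {
        return store.findByAuthor(author);
    }
}
```

Because both screens depend only on the delegate, the core classes behind it can change without touching the GUI code.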
ContentWord.java
JavaBean class that represents the indexed data: each object is one record of the indexed data in the table RIIndex. Its attributes include documentID, authorID, statementID and RIWord.
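A minimal sketch of such a bean, assuming `int` IDs and a plain `String` in place of the RIWord object; the real field types are not stated here.

```java
// Sketch of the ContentWord bean. The int IDs and the String stand-in for RIWord are assumptions.
class ContentWord {
    private int documentID;
    private int authorID;
    private int statementID;
    private String riWord; // the document calls this attribute RIWord; a plain String is assumed here

    // JavaBeans convention: a no-argument constructor plus getters and setters.
    public ContentWord() { }

    public ContentWord(int documentID, int authorID, int statementID, String riWord) {
        this.documentID = documentID;
        this.authorID = authorID;
        this.statementID = statementID;
        this.riWord = riWord;
    }

    public int getDocumentID() { return documentID; }
    public void setDocumentID(int documentID) { this.documentID = documentID; }
    public int getAuthorID() { return authorID; }
    public void setAuthorID(int authorID) { this.authorID = authorID; }
    public int getStatementID() { return statementID; }
    public void setStatementID(int statementID) { this.statementID = statementID; }
    public String getRiWord() { return riWord; }
    public void setRiWord(String riWord) { this.riWord = riWord; }

    @Override
    public String toString() {
        return "ContentWord[document=" + documentID + ", author=" + authorID
                + ", statement=" + statementID + ", word=" + riWord + "]";
    }
}
```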
RIContentWord.java
Singleton class that maintains the list of approved content words. When a new content word is found in a transcript, it is added with an autogenerated number. When the instance is created (typically at server startup), the content words are parsed from an XML file and a map of content words is built in memory.
After the new index has been created from the saved transcript, the recently added content words are saved back to the XML file.
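The in-memory side of this singleton might look like the sketch below. The method names `load`, `getOrAdd` and `drainRecentlyAdded` are hypothetical, and the rule "next ID = highest loaded ID + 1" is an assumption about how the autogenerated number is chosen.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Sketch of the RIContentWord singleton; all method names are hypothetical.
final class RIContentWordSketch {
    // Eagerly created, so the JVM's class initialization makes it thread-safe.
    private static final RIContentWordSketch INSTANCE = new RIContentWordSketch();

    private final Map<String, Integer> wordToId = new LinkedHashMap<>();
    private final Map<String, Integer> recentlyAdded = new LinkedHashMap<>();
    private int nextId = 1;

    private RIContentWordSketch() { }

    static RIContentWordSketch getInstance() {
        return INSTANCE;
    }

    // Called for each entry read from content.xml at startup.
    synchronized void load(String word, int id) {
        wordToId.put(word.toLowerCase(Locale.ROOT), id);
        nextId = Math.max(nextId, id + 1); // assumption: new IDs continue after the highest loaded one
    }

    // Returns the ID of an approved word, adding it with an autogenerated ID if it is new.
    synchronized int getOrAdd(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        Integer id = wordToId.get(key);
        if (id != null) {
            return id;
        }
        int newId = nextId++;
        wordToId.put(key, newId);
        recentlyAdded.put(key, newId);
        return newId;
    }

    // Words added since the last call, to be written back to content.xml after indexing.
    synchronized Map<String, Integer> drainRecentlyAdded() {
        Map<String, Integer> copy = new LinkedHashMap<>(recentlyAdded);
        recentlyAdded.clear();
        return copy;
    }
}
```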
Sample content of content.xml:
<CONTENTWORDS>
<CONTENTWORD>
<WORDID>122</WORDID>
<WORDTEXT>accomplished</WORDTEXT>
</CONTENTWORD>
</CONTENTWORDS>
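Reading this file into the word map can be done with the standard DOM parser, roughly as below. The class `ContentXmlLoader` and its `childText` helper are hypothetical; `childText` only plays a role similar to the getTextNodeValue utility mentioned under XMLRI.java.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class ContentXmlLoader {
    // Reads every <CONTENTWORD> entry into a word -> ID map, keeping file order.
    static Map<String, Integer> load(InputStream in) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(in);
            NodeList entries = doc.getElementsByTagName("CONTENTWORD");
            Map<String, Integer> words = new LinkedHashMap<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                int id = Integer.parseInt(childText(entry, "WORDID"));
                words.put(childText(entry, "WORDTEXT"), id);
            }
            return words;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse content.xml", e);
        }
    }

    // Hypothetical helper: trimmed text of the first descendant element with the given tag.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```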
RIIndexer.java
Core class that creates the index for a transcript. The createIndex method calls the RITranscript class to parse the saved transcript (which I assume to be in the XML format shown below):
<TRANSCRIPT>
<SESSIONNAME>CIS510</SESSIONNAME>
<SESSIONNUMBER>1</SESSIONNUMBER>
<SL>
<NO>1</NO>
<NAME>suresh rangan</NAME>
<COMMENT>Hai Everyone</COMMENT>
</SL>
<SL>
<NO>2</NO>
<NAME>john menke</NAME>
<COMMENT>in first case maintainability we were working on a J2EE application using struts and elected to use a pattern given in many books where a single Struts action is used to control many views (JSP's)...this proved to be hard to maintain</COMMENT>
</SL>
<SL>
<NO>5</NO>
<NAME>john menke</NAME>
<COMMENT>it seemed like a good idea but in practice figuring out what code in the action was for what particular JSP caused maintenance problems we discovered it's much better to write more code (JSP's) and have one Struts Action per JSP</COMMENT>
</SL>
</TRANSCRIPT>
The transcript file is parsed into a collection of statements, each encapsulated in an RIStatement object. The parsed output is a TranscriptSession, which holds the collection of statements together with the session name and session ID. The ID is unique and is generated every time the class advisor opens a chat session.
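A possible shape for this parsing step is sketched below. `StatementSketch` and `SessionSketch` are simplified, hypothetical stand-ins for RIStatement and TranscriptSession, and the sketch treats SESSIONNUMBER as the session ID, which is an assumption since the document says the ID is generated when the chat session is opened. Runs of whitespace inside a comment are collapsed to single spaces.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Simplified, hypothetical stand-in for RIStatement: one <SL> entry.
final class StatementSketch {
    final int number;
    final String author;
    final String comment;

    StatementSketch(int number, String author, String comment) {
        this.number = number;
        this.author = author;
        this.comment = comment;
    }
}

// Simplified, hypothetical stand-in for TranscriptSession.
final class SessionSketch {
    final String sessionName;
    final String sessionId; // assumption: taken from <SESSIONNUMBER>
    final List<StatementSketch> statements;

    SessionSketch(String sessionName, String sessionId, List<StatementSketch> statements) {
        this.sessionName = sessionName;
        this.sessionId = sessionId;
        this.statements = statements;
    }
}

final class TranscriptParserSketch {
    static SessionSketch parse(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            Element root = doc.getDocumentElement();
            List<StatementSketch> statements = new ArrayList<>();
            NodeList lines = root.getElementsByTagName("SL");
            for (int i = 0; i < lines.getLength(); i++) {
                Element sl = (Element) lines.item(i);
                statements.add(new StatementSketch(
                        Integer.parseInt(text(sl, "NO")),
                        text(sl, "NAME"),
                        text(sl, "COMMENT").replaceAll("\\s+", " "))); // collapse wrapped lines
            }
            return new SessionSketch(text(root, "SESSIONNAME"), text(root, "SESSIONNUMBER"),
                    Collections.unmodifiableList(statements));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse transcript", e);
        }
    }

    // Trimmed text of the first descendant element with the given tag, or "" if absent.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```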
The content words are then extracted from all the statements collected from the different authors in the VC session.
How are the content words filtered from the statements?
NonSchematicFilter.xml contains a collection of frequently used words: verbs, prepositions and adjectives. The list can be refined with further additions. I use this list to remove the matching words from each statement. Removal does not require an exact match: because “give” is listed as a verb, a word like “given” is also removed from the statement.
All the login names from the list of user profiles are added to this list as well, since these words should also be eliminated (e.g. “Dear Dr Chang” in a statement will be eliminated).
The nonschematic words are parsed from the XML file NonSchematicFilter.xml by the singleton class RINonSchematic.java.
Login names are parsed from the XML file UserProfile.xml by the singleton class RIUserProfile.java.
Once the list of noncontent words has been built, each statement is broken into individual words and the noncontent words are removed; the remaining words are the real content words eligible for indexing. If a word satisfies the matching rule for a content word, a ContentWord object is created with attributes such as statementID, contentWordID, authorID and documentID. This establishes the relation between the statement, content word, author and document.
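The filtering step can be sketched as below, assuming the NonSchematicFilter.xml words and the login names have already been loaded as lower-case single words. The non-exact match is modeled as a prefix match, and the 4-character cut-off `MIN_PREFIX_LENGTH` is a hypothetical choice that keeps a short listed word such as "a" from removing "action". `IndexEntry` is a simplified, hypothetical stand-in for ContentWord, and the additional check against the approved content-word list is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class ContentWordFilterSketch {
    // Hypothetical cut-off: listed words of at least this length also match as prefixes,
    // so "give" removes "given" while a short word such as "a" does not remove "action".
    static final int MIN_PREFIX_LENGTH = 4;

    // One index row: a content word in a given statement, by a given author, in a given document.
    static final class IndexEntry {
        final int documentId;
        final int authorId;
        final int statementId;
        final String word;

        IndexEntry(int documentId, int authorId, int statementId, String word) {
            this.documentId = documentId;
            this.authorId = authorId;
            this.statementId = statementId;
            this.word = word;
        }
    }

    // Lower-case words from NonSchematicFilter.xml plus the login names from UserProfile.xml.
    private final Set<String> nonContentWords;

    ContentWordFilterSketch(Set<String> nonContentWords) {
        this.nonContentWords = nonContentWords;
    }

    boolean isNonContent(String word) {
        if (nonContentWords.contains(word)) {
            return true;
        }
        for (String listed : nonContentWords) {
            if (listed.length() >= MIN_PREFIX_LENGTH && word.startsWith(listed)) {
                return true;
            }
        }
        return false;
    }

    // Breaks a statement into lower-case words and keeps only the content words.
    List<String> contentWords(String statement) {
        List<String> result = new ArrayList<>();
        for (String token : statement.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!token.isEmpty() && !isNonContent(token)) {
                result.add(token);
            }
        }
        return result;
    }

    // Creates one index entry per remaining content word of the statement.
    List<IndexEntry> index(int documentId, int authorId, int statementId, String statement) {
        List<IndexEntry> entries = new ArrayList<>();
        for (String word : contentWords(statement)) {
            entries.add(new IndexEntry(documentId, authorId, statementId, word));
        }
        return entries;
    }
}
```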
Once all the content words have been created with their relations, the persistence manager RIPersistenceManager.java is called to persist the index.
RINonSchematic.java
The responsibility of this singleton class is discussed above. It is essentially a parser that provides the list of all noncontent words to be eliminated from statements.
RIPersistenceManager.java
Singleton class that inserts the indexed data and queries it by contentID, authorID, etc., using the simple JDBC-ODBC bridge. The database is the Microsoft Access file RIIndex.mdb.
RIStatement.java
The responsibility of this class is discussed above.
RITranscript.java
The responsibility of this singleton class is discussed above.
RIUserProfile.java
This singleton class is responsible for parsing UserProfile.xml into a collection of UserProfile objects. Sample UserProfile.xml data:
<USERPROFILE>
<LOGIN>
<FNAME>SK</FNAME>
<LNAME>Chang</LNAME>
<ID>1</ID>
</LOGIN>
<LOGIN>
<FNAME>Suresh</FNAME>
<LNAME>Rangan</LNAME>
<ID>2</ID>
</LOGIN>
</USERPROFILE>
The data is parsed into a name-ID map, so a profile can be retrieved either by login name or by login ID.
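Lookup in both directions can be supported with one map per direction, as in this sketch. The class name is hypothetical, and forming the login name as lower-case "FNAME LNAME" is an assumption; it happens to match the NAME values in the sample transcript (e.g. "suresh rangan").

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Sketch of RIUserProfile's lookup: one map per direction.
final class UserProfileIndexSketch {
    private final Map<String, Integer> idByLogin = new HashMap<>();
    private final Map<Integer, String> loginById = new HashMap<>();

    // Assumption: the login name is "FNAME LNAME" in lower case.
    void add(String firstName, String lastName, int id) {
        String login = (firstName + " " + lastName).toLowerCase(Locale.ROOT);
        idByLogin.put(login, id);
        loginById.put(id, login);
    }

    // Lookup by login name (case-insensitive).
    Optional<Integer> idFor(String loginName) {
        return Optional.ofNullable(idByLogin.get(loginName.toLowerCase(Locale.ROOT)));
    }

    // Lookup by login ID.
    Optional<String> loginFor(int id) {
        return Optional.ofNullable(loginById.get(id));
    }
}
```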
RIWord.java
This class implements Comparable and holds the word information. The equalsTo implementation is overridden to provide partial pattern matching, so that subtle variations of a word are also handled. More sophisticated pattern matching could be provided by overriding this method, but for simplicity I treat a match on 50% of the word length as a word match.
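One way to read "50% of length matching" is sketched below: two words match when their shared prefix covers at least half of the longer word. This rule is an interpretation, not necessarily the one RIWord uses, and the class name is hypothetical. Since any two matching non-empty words share their first character, hashing on that character keeps `hashCode` consistent with `equals`.

```java
import java.util.Locale;

// Sketch of RIWord; the matching rule below is one reading of "50% of length matching".
final class RIWordSketch implements Comparable<RIWordSketch> {
    private final String text;

    RIWordSketch(String text) {
        this.text = text.toLowerCase(Locale.ROOT);
    }

    String getText() {
        return text;
    }

    // Number of leading characters the two words share.
    private static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Partial match: the shared prefix must cover at least half of the longer word.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof RIWordSketch)) {
            return false;
        }
        String other = ((RIWordSketch) o).text;
        int longer = Math.max(text.length(), other.length());
        return 2 * commonPrefixLength(text, other) >= longer;
    }

    // Matching non-empty words always share their first character.
    @Override
    public int hashCode() {
        return text.isEmpty() ? 0 : text.charAt(0);
    }

    // Plain alphabetical order, independent of the partial-match rule.
    @Override
    public int compareTo(RIWordSketch o) {
        return text.compareTo(o.text);
    }

    @Override
    public String toString() {
        return text;
    }
}
```

Note that a rule like this is not transitive: under it "give" matches "given" and "given" matches "givenness", but "give" does not match "givenness". Such objects therefore need care when used as keys in hash-based collections.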
TranscriptSession.java
This class represents a transcript session, which contains the parsed statements.
XMLRI.java
Superclass of RIContentWord.java, RINonSchematic.java, RITranscript.java and RIUserProfile.java.
It performs the generic parsing of XML with a DocumentBuilder and provides the other generic utility methods, such as getTextNodeValue.
XMLWriter.java
Utility class that writes the content words to XML.
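Writing the content words back in the content.xml layout can be done with the JDK's DOM and `javax.xml.transform` APIs, as in this sketch. The class and method names are hypothetical; the element names follow the content.xml sample.

```java
import java.io.Writer;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class ContentXmlWriterSketch {
    // Writes a word -> ID map as <CONTENTWORDS><CONTENTWORD><WORDID/><WORDTEXT/></CONTENTWORD>...
    static void write(Map<String, Integer> words, Writer out) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("CONTENTWORDS");
            doc.appendChild(root);
            for (Map.Entry<String, Integer> e : words.entrySet()) {
                Element word = doc.createElement("CONTENTWORD");
                appendText(doc, word, "WORDID", String.valueOf(e.getValue()));
                appendText(doc, word, "WORDTEXT", e.getKey());
                root.appendChild(word);
            }
            // An identity transform serializes the DOM tree to the writer.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.transform(new DOMSource(doc), new StreamResult(out));
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not write content words", e);
        }
    }

    // Adds <tag>value</tag> as a child of parent.
    private static void appendText(Document doc, Element parent, String tag, String value) {
        Element child = doc.createElement(tag);
        child.setTextContent(value);
        parent.appendChild(child);
    }
}
```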