Languages that let users create custom icons and iconic/visual sentences are receiving increased attention as multimedia applications become more prevalent. Visual language systems let the user introduce new icons, and create iconic/visual sentences with different meanings and the ability to exhibit dynamic behavior. Furthermore, visual programming systems support problem solving and software development through the composition of basic software components using spatial operators such as "connect port #1 of component A to port #2 of component B".
We will first introduce the elements of visual languages, then describe how visual languages can be extended to deal with multimedia. We will discuss visual programming languages both for general purpose problem solving and for special application to database querying. Finally, we provide on-line bibliographies for further reference and some thoughts concerning the future of visual languages and visual programming languages.
Operators can be visible or invisible. Most system-defined spatial/temporal operators are
invisible, whereas all user-defined operators are visible for the convenience of the user.
For example, excluding the dialog box, the visual sentence in Figure 1(a) is the horizontal
combination of three icons. Therefore, it can be expressed as:
( CHILDREN hor SCHOOL_HOUSE ) hor SUNRISE
where hor is an invisible operator denoting a horizontal combination. But if we look at Figure 2, the cat is a visible operator denoting a process to be applied to the fish in the fish tank. An operation icon can be regarded as a visible operator.
The four most useful domain-independent spatial icon operators are ver, for vertical
composition; hor, for horizontal composition; ovl for overlay; and con, for connect.
The operators ver, hor and ovl are usually invisible (see Figure 1 for an example, where
the hor operator is invisible).
On the other hand, the operator con is usually visible as a connecting line
(see Figure 3 for an example, where the connecting lines among the icons
called places and transitions are visible). This operator con is very
useful in composing visual programs (see Section 3).
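As a minimal sketch of the operator notation above, a visual sentence can be held as a small expression tree. The class names `Icon` and `Compose`, the helper functions, and the `USUALLY_VISIBLE` table are illustrative assumptions, not part of any system described here; only the operator names and the example sentence come from the text.

```python
from dataclasses import dataclass
from typing import List, Union

# Operator names are from the text; the flag records whether each is usually drawn.
USUALLY_VISIBLE = {"ver": False, "hor": False, "ovl": False, "con": True}


@dataclass(frozen=True)
class Icon:
    name: str


@dataclass(frozen=True)
class Compose:
    op: str
    left: "Sentence"
    right: "Sentence"

    def __post_init__(self) -> None:
        if self.op not in USUALLY_VISIBLE:
            raise ValueError(f"unknown spatial operator: {self.op}")


Sentence = Union[Icon, Compose]


def linearize(sentence: Sentence) -> str:
    """Write a visual sentence as a linear expression, parenthesizing nested compositions."""
    if isinstance(sentence, Icon):
        return sentence.name

    def wrap(part: Sentence) -> str:
        text = linearize(part)
        return f"( {text} )" if isinstance(part, Compose) else text

    return f"{wrap(sentence.left)} {sentence.op} {wrap(sentence.right)}"


def visible_operators(sentence: Sentence) -> List[str]:
    """List the operators in the sentence that are usually drawn, such as con."""
    if isinstance(sentence, Icon):
        return []
    own = [sentence.op] if USUALLY_VISIBLE[sentence.op] else []
    return visible_operators(sentence.left) + own + visible_operators(sentence.right)


# The sentence of Figure 1(a) without the dialog box.
fig_1a = Compose("hor", Compose("hor", Icon("CHILDREN"), Icon("SCHOOL_HOUSE")), Icon("SUNRISE"))
print(linearize(fig_1a))
```

Because every operator in this sentence is hor, `visible_operators(fig_1a)` is empty, which matches the observation that only the icons themselves appear in the figure.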
Icon SCHOOL_HOUSE
  WHO: nil
  DO: study
  WHERE: school
  WHEN: nil
In other words, the SCHOOL_HOUSE icon has the meaning "study" if it is in the DO location, or the meaning "school" in the WHERE location. Its meaning is "nil" if it is in the WHO or WHEN location. An equivalent linearized conceptual graph is as follows:
[Icon = SCHOOL_HOUSE] --(sub)--> [WHO = nil] --(verb)--> [DO = study] --(loc)--> [WHERE = school] --(time)--> [WHEN = nil]
The meaning of a composite icon can be derived from the constituent icons, if we have the appropriate inference rules to combine the meanings of the constituent icons. Conceptual dependency theory can be applied to develop inference rules to combine frames (Ref 4). Conceptual operators can be used to combine conceptual graphs (Ref 5). As a simple example, the merging of the frames for the icons in the visual sentence shown in Figure 1(a) will yield the frame:
Visual_Sentence vs1
  WHO: children
  DO: study
  WHERE: nil
  WHEN: morning
We can derive this frame by merging the frames of the four icons using the following rule:
At the University of Pittsburgh and the Knowledge Systems Institute, we have developed a
formal framework for visual language semantics that is based on the notion of an icon
algebra and have designed several visual languages for the speech impaired.
We have since
extended the framework to include the design of multidimensional languages, that is, visual
languages that capture the dynamic nature of multimedia objects through icons, earcons
(sound), micons (motion icons), and vicons (video icons). The user can create a
multidimensional language by combining these icons and have direct access to multimedia
information, including animation.
We have successfully implemented this framework in developing BookMan,
a virtual library used by the students and faculty of the Knowledge Systems Institute.
As part of this work, we extended the visual language concepts to develop teleaction
objects, objects that automatically respond to some events or messages to perform certain
tasks (Ref 7). We applied this approach
to emergency management, where the information system must react to flood
warnings, fire warnings, and so on, to present multimedia information and to take actions.
An Active Medical Information System was also developed based upon
this approach (Ref 9).
Figure 4 shows the search and query options available with BookMan.
Users can perform a range of tasks, including finding related books, finding
books containing documents similar to documents contained in the current book,
receiving alert messages when related books or books containing similar documents have
been prefetched by BookMan, finding other users with similar interests or receiving alert
messages about such users (the last function requires mutual consent among the users).
Much of this power stems from the use of Teleaction Objects (TAOs).
TAOs can accommodate a wide range of functions. For example, when the
user clicks on a particular book, it can automatically access information about related
books and create a multimedia presentation from all the books.
The drawback of TAOs is that they are complex objects, and therefore the end user cannot
easily manipulate them with traditional define, insert, delete, modify, and update
commands. Instead, TAOs require direct manipulation, which we provided through a
The physical appearance of a TAO is described by a multidimensional sentence. The
syntactic structure derived from this multidimensional sentence controls its dynamic
multimedia presentation. The TAO also has a knowledge structure called the active index
that controls its event-driven or message-driven behavior. The multidimensional sentence
may be location-sensitive, time-sensitive or content-sensitive. Thus, an incremental
change in the TAO’s external appearance is an event that causes the active index to react.
As we will describe later, the active index itself can be designed using a visual-language
Section 1 described the icons and operators in a visual
(not multidimensional) language. In a multidimensional language, we need not only
icons that represent objects by images, but also icons that represent the different types of
media. We call such primitives generalized icons and define them as x = (xm, xp) where
xm is the meaning and xp is the physical appearance. To represent TAOs, we replace the xp
with other expressions that depend on the media type:
o Icon: (xm, xi) where xi is an image
o Earcon: (xm, xe) where xe is sound
o Micon: (xm, xs) where xs is a sequence of icon images (motion icon)
o Ticon: (xm, xt) where xt is text (ticon can be regarded as a subtype of icon)
o Vicon: (xm, xv) where xv is a video clip (video icon)
The combination of an icon and an earcon/micon/ticon/vicon is a multidimensional sentence.
For multimedia TAOs, we define operators as
o Icon operator op = (opm, opi), such as ver (vertical composition), hor (horizontal composition), ovl (overlay), con (connect), surround, edge_to_edge, etc.
o Earcon operator op = (opm, ope), such as fade_in, fade_out, etc.
o Micon operator op = (opm, ops), such as zoom_in, zoom_out, etc.
o Ticon operator op = (opm, opt), such as text_merge, text_collate, etc.
o Vicon operator op = (opm, opv), such as montage, cut, etc.
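The generalized icons and their media-specific operators can be sketched as plain data. This is only an illustration: the `Media` enumeration, the `GeneralizedIcon` class, the `operator_applies` helper, and the sample file name are assumptions introduced here, while the operator names are the ones listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Media(Enum):
    ICON = "image"
    EARCON = "sound"
    MICON = "sequence of icon images"
    TICON = "text"
    VICON = "video clip"


@dataclass(frozen=True)
class GeneralizedIcon:
    meaning: str      # xm
    appearance: Any   # xi, xe, xs, xt or xv, depending on the media type
    media: Media


# Operators named in the text, grouped by the media type they apply to.
OPERATORS = {
    Media.ICON: {"ver", "hor", "ovl", "con", "surround", "edge_to_edge"},
    Media.EARCON: {"fade_in", "fade_out"},
    Media.MICON: {"zoom_in", "zoom_out"},
    Media.TICON: {"text_merge", "text_collate"},
    Media.VICON: {"montage", "cut"},
}


def operator_applies(op: str, icon: GeneralizedIcon) -> bool:
    """Check whether an operator is listed for the icon's media type."""
    return op in OPERATORS[icon.media]


# Hypothetical earcon; the file name is a placeholder.
school_bell = GeneralizedIcon(meaning="school", appearance="bell.wav", media=Media.EARCON)
print(operator_applies("fade_in", school_bell), operator_applies("hor", school_bell))
```

Keeping the meaning and the physical part in separate fields mirrors the pair notation (xm, xp) and lets the same meaning be attached to different media.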
Two classes of operators are possible in constructing a multimedia object.
As we described in Section 1, spatial operators are operators that involve
spatial relations among image, text or other spatial objects. A multimedia object can also
be constructed using operators that consider the passage of time. Temporal operators,
which apply to earcons, micons, and vicons, make it possible to define the temporal
relation (Ref 10) among generalized icons. For example, if one wants to watch a
video clip and at the same time listen to the audio, one can request that the video co_start
with the audio. Temporal operators for earcons, micons, ticons and vicons include
co_start, co_end, overlap, equal, before, meet, and during and are usually treated as
invisible operators because they are not visible in the multidimensional sentence.
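One way to make the temporal operators concrete is to treat each generalized icon as occupying a time interval and each operator as a test on two intervals. The definitions below are an assumption in the style of standard interval relations; the text names the operators but does not define them formally, and the `Interval` class and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    start: float
    end: float


# Assumed interval semantics for the temporal operators named in the text.
RELATIONS = {
    "co_start": lambda a, b: a.start == b.start,
    "co_end": lambda a, b: a.end == b.end,
    "equal": lambda a, b: a.start == b.start and a.end == b.end,
    "before": lambda a, b: a.end < b.start,
    "meet": lambda a, b: a.end == b.start,
    "overlap": lambda a, b: a.start < b.start < a.end < b.end,
    "during": lambda a, b: b.start < a.start and a.end < b.end,
}


def holds(op: str, a: Interval, b: Interval) -> bool:
    """Test whether temporal operator `op` holds between intervals a and b."""
    return RELATIONS[op](a, b)


def relations_between(a: Interval, b: Interval) -> List[str]:
    """Return, in alphabetical order, every named relation that holds from a to b."""
    return sorted(name for name, test in RELATIONS.items() if test(a, b))


video = Interval(0.0, 30.0)
audio = Interval(0.0, 12.0)
print(holds("co_start", video, audio))
```

Under these assumed definitions, a video and an audio track that begin together satisfy co_start even when they end at different times.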
When temporal operators are used to combine generalized icons, their types may change.
For example, a micon followed in time by another icon is still a micon, but the temporal
composition of micon and earcon yields a vicon. Media type changes are useful in
adaptive multimedia so that one type of media may be replaced/combined/augmented by
another type of media (or a mixture of media) for people with different sensory
We can add still more restrictions to create subsets of rules for icons, earcons, micons
and vicons that involve special operators:
o For earcons, special operators include fade_in, fade_out,
o For micons, special operators include zoom_in, zoom_out,
o For ticons, special operators include text_collate, text_merge,
o For vicons, special operators include montage, cut.
These special operators support the combination of various types of generalized icons
so that the multidimensional language can fully reflect all multimedia types.
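The media type changes under temporal composition mentioned above can be sketched as a small lookup table. Only the two combinations stated in the text are encoded. Treating the rule as independent of operand order, and returning `None` for unlisted pairs, are assumptions of this sketch, as is the function name.

```python
from typing import Optional

# Result type of temporally composing two generalized icons.
# Only the two cases given in the text are listed; all other pairs are left undefined.
TYPE_RULES = {
    frozenset({"micon", "icon"}): "micon",
    frozenset({"micon", "earcon"}): "vicon",
}


def temporal_result_type(first: str, second: str) -> Optional[str]:
    """Return the media type of the composition, or None if the text gives no rule."""
    return TYPE_RULES.get(frozenset({first, second}))


print(temporal_result_type("micon", "earcon"))
```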
Figure 1(b) without the
dialog box illustrates a simple visual sentence, which describes the
to-do item for TimeMan. With the dialog box, the figure becomes
a multidimensional sentence used by TimeMan to generate "The children drive to school
in the morning" in synthesized speech. The multidimensional sentence has the syntactic
structure:
(DIALOG_BOX co_start SPEECH) ver (((CHILDREN hor CAR) hor
SCHOOL_HOUSE) hor SUNRISE)
Figure 5 is a hypergraph of the syntactic structure. The syntactic structure is essentially a
tree, but it has additional temporal operators (such as co_start) and spatial operators
(such as hor and ver) indicated by dotted lines. Some operators may have more than two
operands (for example, the co_start of audio, image, and text), which is why the structure
is called a hypergraph. The syntactic structure controls the multimedia presentation of the
For World Wide Web applications, the HTML language can be extended to TAOML (Teleaction Object Markup Language) so that
teleaction objects can be specified using HTML enhanced by a multidimensional language
and realized as Web pages. For example, the TAOML pages can serve as the interface to an
Active Medical Information System (Ref 9).
Multidimensional languages must also account for multimedia dynamics because many media types vary with time. This means that a dynamic multidimensional sentence changes over time. Transformation rules for spatial and temporal operators can be defined to transform the hypergraph in Figure 5 to a Petri net that controls the multimedia presentation. Figure 3 represents the Petri net of the sentence in Figure 1(b). As such, it is also a representation of the dynamics of the multidimensional sentence in Figure 1(b). The multimedia presentation manager can execute this Petri net dynamically to create a multimedia presentation (Ref 11). For example, the presentation manager will produce the visual sentence in Figure 1(b) as well as the synthesized speech.
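To illustrate how a presentation manager might execute such a net, the following sketch implements the standard Petri net firing rule: a transition is enabled when each of its input places holds a token, and firing moves tokens from inputs to outputs. The two-transition net is a hypothetical stand-in for a co_start of the dialog box and the speech. It is not the net of Figure 3, and the place and transition names are invented for the example.

```python
from typing import Dict, List, Tuple

Transition = Tuple[List[str], List[str]]  # (input places, output places)


def enabled(marking: Dict[str, int], transition: Transition) -> bool:
    """A transition is enabled when every input place holds at least one token."""
    inputs, _ = transition
    return all(marking.get(place, 0) >= 1 for place in inputs)


def fire(marking: Dict[str, int], transition: Transition) -> Dict[str, int]:
    """Consume one token from each input place and add one to each output place."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    inputs, outputs = transition
    new_marking = dict(marking)
    for place in inputs:
        new_marking[place] -= 1
    for place in outputs:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


def run(marking: Dict[str, int], transitions: Dict[str, Transition], max_steps: int = 100):
    """Fire the first enabled transition repeatedly until none is enabled."""
    order: List[str] = []
    for _ in range(max_steps):
        ready = [name for name, t in transitions.items() if enabled(marking, t)]
        if not ready:
            break
        marking = fire(marking, transitions[ready[0]])
        order.append(ready[0])
    return order, marking


# Hypothetical net: one transition starts both media together, another joins them.
NET = {
    "co_start": (["ready"], ["box_shown", "speech_playing"]),
    "finish": (["box_shown", "speech_playing"], ["done"]),
}
order, final = run({"ready": 1}, NET)
print(order, final["done"])
```

A single transition with two output places is what expresses the simultaneous start: both media receive their token in the same firing.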
The basic software components can be defined by the programmer/user or obtained from a predefined software component library. Each software component has a visual representation for ease of comprehension by the user. Therefore, software components are generalized icons, and a visual program is a visual sentence composed from generalized icons that are software components. Since the software components are connected together to form a visual program, a visual program can be represented by a graph in which the basic components may have multiple attachment points. Examples of commercially available visual programming systems include Prograph, which is an object-oriented programming language with dataflow diagrams as its visualization (Ref 12), and LabVIEW, which supports the interconnection of boxes representing software/hardware components (Ref 13).
Visual programming is appealing because the programmer or end user can easily manipulate the basic software components and interactively compose visual programs with the help of visual programming tools. Some would claim that visual programming is more intuitive and therefore simpler than traditional programming. Some would further claim that even untrained people can learn visual programming with little effort. However, such claims remain to be proven, especially for large-scale software development (Ref 14).
As described in the previous two sections, visual languages and multidimensional languages are useful in specifying the syntactic structure, knowledge structure and dynamic behavior of complex multimedia objects such as TAOs (teleaction objects). We can also construct visual programs using active index cells, which are the key elements of TAOs (Ref 15). Without the active index cell, a TAO would not be able to react to events or messages, and the dynamic visual language would lose its power. As an example of visual programming, we can specify index cells using a visual programming tool to be described in Section 3.2. The index cells can thus be connected together as a visual program to accomplish a given task.
An index cell can be either live or dead, depending on its internal state. The cell is live if
the internal state is anything but the dead state. If the internal state is the dead state, the
cell is dead. The entire collection of index cells, either live or dead, forms the index cell
base. The set of live cells in the index cell base forms the active index.
Each cell has a built-in timer that tells it to wait a certain time before deactivating (dead
internal state). The timer is reinitialized each time the cell receives a new message and
once again becomes active (live). When an index cell posts an output message to a group
of output index cells, the output index cells become active. If an output index cell is in a
dead state, the posting of the message will change it to the initial state, making it a live
cell, and will initialize its timer. On the other hand, if the output index cell is already a
live cell, the posting of the message will not affect its current state but will only
reinitialize its timer.
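The live/dead rule and the timer behavior can be sketched as follows. The state codes 0 (initial, live) and -1 (dead) follow the prefetch example later in this section. The `IndexCell` class, its method names, and the explicit `now` clock argument are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

DEAD = -1     # the dead state
INITIAL = 0   # the initial (live) state


@dataclass
class IndexCell:
    name: str
    timeout: float          # how long the cell waits before deactivating
    state: int = DEAD
    deadline: float = 0.0

    @property
    def live(self) -> bool:
        # A cell is live if its internal state is anything but the dead state.
        return self.state != DEAD

    def post(self, now: float) -> None:
        """A message is posted to this cell at time `now`."""
        if not self.live:
            self.state = INITIAL            # a dead cell comes back in the initial state
        # A live cell keeps its current state; in both cases the timer restarts.
        self.deadline = now + self.timeout

    def tick(self, now: float) -> None:
        """Deactivate the cell once its timer has run out."""
        if self.live and now >= self.deadline:
            self.state = DEAD


def active_index(cells: List[IndexCell]) -> List[IndexCell]:
    """The active index is the set of live cells in the index cell base."""
    return [cell for cell in cells if cell.live]


cell = IndexCell(name="prefetch", timeout=10.0)
cell.post(now=0.0)
print(cell.live, cell.deadline)
```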
Active output index cells may or may not accept the posted message. The first output
index cell that accepts the output message will remove this message from the output list
of the current cell. (In a race, the outcome is nondeterministic.) If no output index cell
accepts the posted output message, the message will stay indefinitely in the output list of
the current cell. For example, if no index cells can provide the BookMan user with
information about nuclear winter, the requesting message from the nuclear winter index
cell will still be with this cell indefinitely.
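The acceptance rule can be sketched with an output list that keeps each message until some output cell takes it. The `Cell` class, the `deliver` function, and the message text are hypothetical. The sketch also tries output cells in list order, whereas the text leaves the outcome of a race nondeterministic.

```python
from typing import Callable, List


class Cell:
    def __init__(self, name: str, accepts: Callable[[str], bool] = lambda message: False):
        self.name = name
        self.accepts = accepts              # decides whether this cell takes a message
        self.output_list: List[str] = []    # messages posted but not yet accepted
        self.received: List[str] = []


def deliver(sender: Cell, output_cells: List[Cell]) -> None:
    """Hand each pending message to the first output cell that accepts it."""
    still_pending = []
    for message in sender.output_list:
        taker = next((cell for cell in output_cells if cell.accepts(message)), None)
        if taker is None:
            still_pending.append(message)   # nobody accepted: the message stays
        else:
            taker.received.append(message)  # accepted: removed from the output list
    sender.output_list = still_pending


nuclear_winter = Cell("nuclear_winter")
nuclear_winter.output_list.append("request_info")
deliver(nuclear_winter, [Cell("unrelated")])
print(nuclear_winter.output_list)
```

In this run no output cell accepts the request, so it remains in the output list, as in the nuclear winter example.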
After its computation, the index cell may remain active (live) or deactivate (die). An
index cell may also die if no other index cells (including itself) post messages to it. Thus
the nuclear winter index cell in BookMan will die if not used for a long time, but will be
reinitialized if someone actually wants such information and sends a message to it.
Occasionally many index cells may be similar. For example, a user may want to attach an
index cell to a document that upon detecting a certain feature sends a message to another
index cell to prefetch other documents. If there are 10,000 such documents, there can be
10,000 similar index cells. The user can group these cells into an index cell type,
with the individual cells as instances of that type. Therefore, although many index cells
may be created, only a few index cell types need to be designed for a given application,
thus simplifying the application designer’s task.
Figure 6(a) shows the construction of the state-transition diagram. The prefetch index cell
has two states: state 0, the initial and live state, and state -1, the dead state. The designer
draws the state-transition diagram by clicking on the appropriate icons. In this example,
the designer has clicked on the fourth vertical icon (zigzag line) to draw a transition from
state 0 to state 0. Although the figure shows only two transition lines, the designer can
specify as many transitions as necessary from state 0 to state 0. Each transition could
generate a different output message and invoke different actions. For example, the
designer can represent different prefetching priority levels in BookMan by drawing
The designer wants to specify details about transition2 and so has highlighted it.
Figure 6(b) shows the result of clicking on the input message icon (top icon to the right of the
State Transition Specification Dialog box). IC Builder brings up the Input Message
Specification Dialog box so that the designer can specify the input messages. Here the
designer specifies message 1 (start_prefetch) as the input message. The designer could also
specify a predicate, in which case the input message is accepted only if the predicate evaluates
to true. Here there is no predicate, so the input message is always accepted.
Figure 6(c) shows what happens if the designer clicks on the output message icon in Figure 6(a) (bottom icon to the right of the State Transition Specification Dialog box). IC Builder brings up the Output Message Specification Dialog box so that the designer can specify actions, output messages, and output index cells. In this example, the designer has specified three actions: compute_schedule (determine the priority of prefetching information), issue_prefetch_proc (initiate a prefetch process), and store_pid (once a prefetch process is issued, its process id or pid is saved so that the process can be killed later if necessary). In the figure there is no output message, but both input and output messages can have parameters. The index cell derives the output parameters from the input parameters.
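The information entered through these dialog boxes amounts to a transition record. The sketch below shows one possible textual form of it: the `Transition` class and the `step` function are assumptions and do not describe IC Builder's actual storage format, while the message name, the state numbers, and the three action names are taken from the example. The actions are kept as names only, since their implementations are not given.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Transition:
    name: str
    source: int
    target: int
    input_message: str
    predicate: Optional[Callable[[Dict], bool]] = None   # None: always accept
    actions: List[str] = field(default_factory=list)
    output_message: Optional[str] = None                 # None: no output message

    def accepts(self, state: int, message: str, params: Dict) -> bool:
        """Accept only the expected message in the source state, subject to the predicate."""
        if state != self.source or message != self.input_message:
            return False
        return self.predicate is None or self.predicate(params)


# The highlighted transition of the prefetch index cell: state 0 back to state 0.
transition2 = Transition(
    name="transition2",
    source=0,
    target=0,
    input_message="start_prefetch",
    actions=["compute_schedule", "issue_prefetch_proc", "store_pid"],
)


def step(state: int, message: str, params: Dict,
         transitions: List[Transition]) -> Tuple[int, List[str]]:
    """Return the next state and the actions of the first transition that accepts."""
    for transition in transitions:
        if transition.accepts(state, message, params):
            return transition.target, list(transition.actions)
    return state, []


print(step(0, "start_prefetch", {}, [transition2]))
```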
The construction of an active index from index cells is an example of visual programming for general-purpose problem solving: with appropriate customization, the active index can do almost anything. In the following, we describe a special application of visual programming to database querying.
A visual sentence or multidimensional sentence can also be either location-sensitive,
time-sensitive, or content-sensitive. In Section 1 we gave examples of different types of
visual sentences. The resulting language is a dynamic visual language or dynamic
A dynamic visual language for virtual reality (VR) serves as a new paradigm in a querying
system with multiple paradigms (form-based queries, diagram-based queries and so on)
because it lets the user freely switch paradigms (Ref 16). When the user initially
browses the virtual library, the VR query may be more natural; but when the user wants
to find out more details, the form-based query may be more suitable. This freedom to
switch back and forth among query paradigms gives the user the best of all worlds, and
dynamic querying can be accomplished with greater flexibility.
From the viewpoint of dynamic languages, a VR query is a location-sensitive
multidimensional sentence. As Figure 4(b) shows, BookMan indicates the physical
locations of books by marked icons in a graphical presentation of the book stacks of the
library. What users see is similar (with some simplification) to what they would
experience in a real library. That is, the user selects a book by picking it from the shelf,
inspects its contents and browses adjacent books on the shelf.
In Figure 4(a), initially the user is given the choice of query paradigms: search by title,
author, ISBN, or keyword(s). If the user selects the virtual library search, the user can then
navigate in the virtual library, and as shown in Figure 4(b), the result is a marked object. If
the user switches to a form-based representation by clicking the DetailedRecord button,
the result is a form as shown in Figure 4(c). The user can now use the form to find
books of interest, and switch back to the VR query paradigm by clicking the VL
location button in Figure 4(c).
Essentially, the figure illustrates how the user can switch between a VR paradigm (such
as the virtual library) and a logical paradigm (such as the form).
There are certain admissibility conditions for this switch. For a query in the logical
paradigm to be admissible to the VR paradigm, the retrieval target object should also be
an object in VR. For example, the virtual reality in the BookMan library is stacks of
books, and an admissible query would be a query about books, because the result of that
query can be indicated by marked book icons in the virtual library.
Conversely, for a query in the VR paradigm to be admissible to the logical paradigm,
there should be a single marked VR object that is also a database object, and the marking
is achieved by an operation icon such as similar_to (find objects similar to this object),
near (find objects near this object), above (find objects above this object), below (find
objects below this object), and other spatial operators. For example, in the VR for the
virtual library, a book marked by the operation icon similar_to is admissible and can be
translated into the logical query "find all books similar to this book".
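The two conditions for switching paradigms can be written as simple checks. This is a sketch under assumptions: the `VRObject` class, the function names, and the pluralization by appending "s" are invented for illustration, while the operation icon names and the translated query follow the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Operation icons named in the text, with the phrase used in the logical query.
SPATIAL_OPERATIONS = {
    "similar_to": "similar to",
    "near": "near",
    "above": "above",
    "below": "below",
}


@dataclass
class VRObject:
    kind: str                        # for example "book"
    in_database: bool                # is it also a database object?
    marked_by: Optional[str] = None  # operation icon marking it, if any


def logical_to_vr_admissible(target_kind: str, vr_kinds: Set[str]) -> bool:
    """A logical query is admissible to VR if its retrieval target is also an object in VR."""
    return target_kind in vr_kinds


def vr_to_logical(objects: List[VRObject]) -> Optional[str]:
    """Translate a VR query into a logical query, or return None if it is not admissible."""
    marked = [obj for obj in objects if obj.marked_by is not None]
    if len(marked) != 1:
        return None                  # exactly one marked object is required
    obj = marked[0]
    if not obj.in_database or obj.marked_by not in SPATIAL_OPERATIONS:
        return None
    return f"find all {obj.kind}s {SPATIAL_OPERATIONS[obj.marked_by]} this {obj.kind}"


shelf = [VRObject("book", True), VRObject("book", True, marked_by="similar_to")]
print(vr_to_logical(shelf))
```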
Visual query systems for multimedia databases, like BookMan, are under active
investigation at many universities as well as industrial laboratories (Ref 17). These systems are
very flexible. For example, a user can easily and quickly ask for any engineering
drawing that contains a part that looks like the part in another drawing and that has a
signature in the lower right corner that looks like John Doe’s signature. In
BookMan we have a mechanism that lets users create similarity
retrieval requests that prompt BookMan to look for books similar to the book being
and then perform searches on the World Wide
Web using a Web browser enhanced with an active index (Ref 18).
The average programmer and end user are used to a hybrid mode of human-computer
interaction, involving text, graphics, sound and the like. Thus, "pure"
visual programming languages are sometimes hard to justify. On the
other hand, languages allowing hybrid modes of interaction are already
unavoidable due to the explosion of multimedia computing and network
As multimedia applications become even more widespread, we
expect to see more special-purpose or general-purpose visual language systems
and visual programming systems in which visual and multidimensional languages play
an important role, both as a theoretical foundation and as a means to explore new
Figure 2. Content-sensitive visual sentences (a) and (b) show the fish tank and cat
metaphor for the time management personal digital assistant TimeMan. Each fish
represents a to-do item. When the to-do list grows too long, the fish tank is
overpopulated and the cat appears. The fish tank icon and cat operation icon have
corresponding index cells receiving messages from these icons when they are changed by
Figure 3. A time-sensitive visual sentence for the Petri net controlling the presentation of
the visual sentence shown in Figure 1(b).
Figure 4. The virtual library BookMan lets the user (a) select different
search modes, (b) browse the virtual library and select a desired book for further
inspection, and (c) switch to a traditional form-based query mode.
Figure 5. The syntactic structure of the multidimensional sentence shown in Figure 1(b).
This structure is a hypergraph because some relational operators may correspond to lines
with more than two end points.
Figure 6. The visual specification for an active index cell of the virtual library
BookMan: (a) the state transitions, (b) input message, (c) output message and