Let's return to studying abstract data types by examining the List. We can define this in various ways - by its name alone it is perhaps only vaguely specified. Let's look at how the text defines it:
See Chapter 12 for detailed specifications. See ListInterface.java
Note that indexing for this ADT starts at 1, not 0. This may seem odd, but this is how the author defined it.
We will look at a few of these operations and see the similarities to and differences from our Bag ADT.
Recall that at this point we are looking at a List from a user's point of view. So we aren't interested in how it's implemented, we're interested in how it can be used. A List is a very general and useful structure. It can be used for:
How about using it as a Bag? We could, but would need to add the Bag methods.
Note that it may not be the ideal ADT for some of these behaviors. We will look at how some of these operations are done and their efficiencies soon. However, we may choose to use it because it can do all of these things. See ListExample.java.
Standard Java has a List interface. It is a superset of the operations in the author's ListInterface. Additionally, some operations have different names and special cases may be handled differently. Note also that indexing starts at 0 for Java's list interface. Despite these differences, the idea is the same. For example, compare ListExample.java (uses the author's List Interface) with ListExampleJava.java (which uses Java's List Interface).
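As a rough illustration of the 0-based indexing in Java's own List interface, here is a minimal sketch using java.util.List with an ArrayList behind it. The class name JavaListIndexDemo and the sample names are made up for this example; only the java.util calls are standard.

```java
import java.util.ArrayList;
import java.util.List;

class JavaListIndexDemo {
    // Builds a small list with java.util.List, whose positions start at 0.
    static List<String> build() {
        List<String> names = new ArrayList<>();
        names.add("Amy");      // appended at index 0
        names.add("Carl");     // appended at index 1
        names.add(1, "Bob");   // inserted at index 1; "Carl" moves to index 2
        return names;
    }

    public static void main(String[] args) {
        List<String> names = build();
        // In the author's ListInterface the first entry would be at position 1;
        // with java.util.List it is at index 0.
        System.out.println(names.get(0));
        System.out.println(names);
    }
}
```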
In the introduction, we looked at Lists from a user's perspective. Now, we'll look at the ADT from an implementer's perspective. With any ADT, we need to answer these two questions:
There are a couple of ways to represent the data. Let's take a look at some of them.
Let's first consider using an array. This makes sense since it can store multiple values and allow them to be manipulated in various ways.
We also need to keep track of the logical size.
To allow for an arbitrary number of items, we will dynamically resize when needed. This is the same idea as for our Bag and Stack.
Let's start with an add method. Unlike for Bag, with our List we can add at an arbitrary index.
How would we add newEntry to newPosition in the List? Remember that the author will not use index 0; the items will be in list[1] to list[numberOfEntries].
In the author's implementation, there is a makeRoom() method. It's used to make room for the item being added (if the new value is not at the end of the values in the array). How does that work? It's a basic shifting algorithm. However, when implementing it, we must be careful to shift from the correct side. If you start on the wrong side, you will copy, not shift!
What happens if we start on the wrong side?
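The effect of the shifting direction can be seen in a small sketch. This is not the author's makeRoom(); it is a simplified version on a plain int array, assuming index 0 is unused and that the array already has a free slot at the end. The class ShiftDemo and the method makeRoomWrongSide are hypothetical names used only here.

```java
import java.util.Arrays;

class ShiftDemo {
    // Shifts entries at positions newPosition..numberOfEntries up by one slot,
    // starting from the LAST entry and working backward.
    static void makeRoom(int[] list, int numberOfEntries, int newPosition) {
        for (int index = numberOfEntries; index >= newPosition; index--) {
            list[index + 1] = list[index];
        }
    }

    // Wrong side: starting at newPosition overwrites each entry
    // before it has been moved, so one value is copied down the array.
    static void makeRoomWrongSide(int[] list, int numberOfEntries, int newPosition) {
        for (int index = newPosition; index <= numberOfEntries; index++) {
            list[index + 1] = list[index];
        }
    }

    public static void main(String[] args) {
        int[] good = {0, 10, 20, 30, 40, 0};   // index 0 unused, 4 entries
        makeRoom(good, 4, 2);
        System.out.println(Arrays.toString(good)); // [0, 10, 20, 20, 30, 40]

        int[] bad = {0, 10, 20, 30, 40, 0};
        makeRoomWrongSide(bad, 4, 2);
        System.out.println(Arrays.toString(bad));  // [0, 10, 20, 20, 20, 20]
    }
}
```

After the correct shift, position 2 still holds its old value, but it is now free to be overwritten with the new entry.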
What about removing data?
Since the data must stay contiguous, we are basically doing the opposite of what we did to add: we shift the entries after the removed one down to close the gap, and then decrement numberOfEntries.
From the textbook, we have:
How do you think removeGap is implemented?
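One possible answer, as a sketch rather than the textbook's code: shift every entry after the removed position down by one, starting next to the gap. The version below works on a plain int array with index 0 unused; the class name RemoveGapDemo and the exact signature are assumptions made for this example.

```java
import java.util.Arrays;

class RemoveGapDemo {
    // Closes the gap at givenPosition by moving each later entry down one slot.
    // Here we start at the gap and move forward, the opposite of makeRoom.
    static void removeGap(int[] list, int numberOfEntries, int givenPosition) {
        for (int index = givenPosition; index < numberOfEntries; index++) {
            list[index] = list[index + 1];
        }
    }

    public static void main(String[] args) {
        int[] list = {0, 10, 20, 30, 40};  // index 0 unused, 4 entries
        int numberOfEntries = 4;

        int removed = list[2];             // save the entry at position 2
        removeGap(list, numberOfEntries, 2);
        list[numberOfEntries] = 0;         // clear the now-unused last slot
        numberOfEntries--;

        System.out.println(removed);
        System.out.println(Arrays.toString(list));
    }
}
```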
The approach to implementing the other methods should be the same:
See the text for discussions of more operations and AList.java for the entire implementation.
We mentioned previously that in standard Java there is a List interface similar to the author's ListInterface. Let's now take a look at the List implementation. Recall that for now we are considering only array-based implementations. In this case, Java provides two implementations.
The ArrayList is a class developed as part of the standard Java Collections Framework. It is built from scratch to implement the List interface and uses a dynamically expanding array (similar to what we discussed but with a slightly different size increase factor). In real applications where a List is needed you will likely use this class.
The Vector is a class created before the Java Collections Framework was developed. It was designed to be a dynamically expanding collection. When the Collections Framework was developed, Vector was retrofitted into it through the addition of the standard List methods. However, previous methods were also kept, so for a lot of operations there are two (almost) equivalent methods in the Vector class, e.g.:
public E remove(int index)
public void removeElementAt(int index)
Note that the only major difference is the return type.
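A short sketch of the two methods side by side, using java.util.Vector directly. The class name VectorRemoveDemo and the sample data are made up for the example.

```java
import java.util.Vector;

class VectorRemoveDemo {
    static Vector<String> sample() {
        Vector<String> v = new Vector<>();
        v.add("a");
        v.add("b");
        v.add("c");
        return v;
    }

    public static void main(String[] args) {
        Vector<String> v = sample();

        String removed = v.remove(1);  // List-style method: returns the removed element
        System.out.println(removed);

        v.removeElementAt(0);          // older Vector method: same effect, returns nothing
        System.out.println(v);
    }
}
```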
There is one other interesting difference between Vector and ArrayList. Vector is synchronized and ArrayList is not. Being synchronized means that if multiple Threads attempt to modify a Vector "at the same time", only one will be allowed to do so (and the others will have to wait their turn). The idea is that the data remains consistent when used with multiple Threads. The ArrayList makes no such guarantee though. Threads are objects that allow parts of programs to execute in "pseudo-parallel". We won't discuss them in this course, but they might be discussed in Operating System or Computer Architecture courses.
Now let's implement our ListInterface using a linked data structure. Much of the implementation is identical to our LinkedBag:
However, there are some important differences between the two. The List interface requires data to be kept in positional order. Thus, we cannot arbitrarily move data around. The Bag always removed Nodes from the front and moved data to allow arbitrary delete; lists will not allow that. Lists also require that we allow the user to insert and remove in a given position, which means that we will need to add and remove Nodes from the middle of the list. This was not needed for LinkedBag since Bags had no position.
Let's focus on the parts of the LinkedList that differ from the LinkedBag. We'll begin with the remove method:
What do we need to do here? We must first get to the node at givenPosition. Then, we must "remove" it in such a way that the rest of the list stays connected, i.e. we must connect the node before it to the node after it. We do this by setting the "before" node's next field to the "after" node (which we can get from the node-to-delete's next field).
The Linked List implementation we will look at later has a private method for getting a reference to the node. It walks along the linked list, counting the nodes as it goes. Once it gets to the given position, it returns that reference.
Why do we start counting at one? How can we use this to get the locations of all three nodes we need?
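The walk described above might look roughly like the following. This is a simplified sketch, not the code from LList.java: the class NodeWalkDemo, its addToFront helper, and the String data type are assumptions for the example, and the helper is package-private here (rather than private) only so that it can be exercised from outside the class. It is called getAtNode to match the name used in these notes.

```java
class NodeWalkDemo {
    static class Node {
        String data;
        Node next;
        Node(String data, Node next) { this.data = data; this.next = next; }
    }

    private Node firstNode;
    private int numberOfEntries;

    void addToFront(String entry) {
        firstNode = new Node(entry, firstNode);
        numberOfEntries++;
    }

    // Walks from firstNode, counting nodes, until it reaches givenPosition.
    // The counter starts at 1 because firstNode is already the node at position 1.
    Node getAtNode(int givenPosition) {
        assert firstNode != null && 1 <= givenPosition && givenPosition <= numberOfEntries;
        Node currentNode = firstNode;
        for (int counter = 1; counter < givenPosition; counter++) {
            currentNode = currentNode.next;
        }
        return currentNode;
    }

    public static void main(String[] args) {
        NodeWalkDemo list = new NodeWalkDemo();
        list.addToFront("c");
        list.addToFront("b");
        list.addToFront("a");

        // One walk to the node before position 2 gives access to all three nodes.
        Node nodeBefore = list.getAtNode(1);
        Node nodeToRemove = nodeBefore.next;
        Node nodeAfter = nodeToRemove.next;
        System.out.println(nodeBefore.data + " " + nodeToRemove.data + " " + nodeAfter.data);
    }
}
```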
Notice that if givenPosition > numberOfEntries, the assertion fails (crashing the program). Why don't we handle this problem? The method is private. The idea is that, as class designers, we make sure the error cannot occur, which is why the check is an assert. Users of the class cannot call this method, so there is no problem for them. The index test is done before getAtNode is called (e.g. see getEntry(int givenPosition)).
Is there anything else we should be concerned with when trying to delete a node? Let's discuss.
Let's take a look at the implementation:
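What follows is a minimal sketch of a position-based remove on a singly linked chain, written for these notes rather than copied from LList.java. The class name LinkedRemoveSketch and its small supporting methods are assumptions; the point is the special case for position 1 and the relinking step for every other position.

```java
class LinkedRemoveSketch<T> {
    private class Node {
        T data;
        Node next;
        Node(T data) { this.data = data; }
    }

    private Node firstNode;
    private int numberOfEntries;

    // Appends by walking to the current last node (no lastNode reference here).
    public void add(T newEntry) {
        Node newNode = new Node(newEntry);
        if (firstNode == null) {
            firstNode = newNode;
        } else {
            getAtNode(numberOfEntries).next = newNode;
        }
        numberOfEntries++;
    }

    public T remove(int givenPosition) {
        if (givenPosition < 1 || givenPosition > numberOfEntries) {
            throw new IndexOutOfBoundsException("Illegal position: " + givenPosition);
        }
        T result;
        if (givenPosition == 1) {
            // Special case: there is no node before the first one.
            result = firstNode.data;
            firstNode = firstNode.next;
        } else {
            Node nodeBefore = getAtNode(givenPosition - 1);
            Node nodeToRemove = nodeBefore.next;
            result = nodeToRemove.data;
            nodeBefore.next = nodeToRemove.next;  // link "before" to "after"
        }
        numberOfEntries--;
        return result;
    }

    public T getEntry(int givenPosition) {
        if (givenPosition < 1 || givenPosition > numberOfEntries) {
            throw new IndexOutOfBoundsException("Illegal position: " + givenPosition);
        }
        return getAtNode(givenPosition).data;
    }

    public int getLength() {
        return numberOfEntries;
    }

    private Node getAtNode(int givenPosition) {
        Node currentNode = firstNode;
        for (int counter = 1; counter < givenPosition; counter++) {
            currentNode = currentNode.next;
        }
        return currentNode;
    }
}
```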
A complete implementation of the Linked List can be found here: LList.java.
We discussed before that if we are inserting a node at the end of the list, we must traverse the entire list first to find the last node. This is inefficient if we do a lot of adds to the end of the list (we'll discuss the particulars later). We could save time if we kept an additional instance variable (lastNode) that always refers to the end of the list. Now, adding to the end of the list is easy. However, this has some side-effects. What are those side-effects?
Thus, adding an extra instance variable to save time with one operation can increase the complexity of other operations. The complexity increases only by a small amount, but we still need to consider it. Let's look at an operation both without and with the lastNode reference. The text looks at add() methods so let's look at a different one: remove(int givenPosition).
When, if at all, will we need to worry about the lastNode reference for the remove method? As with all methods, we want to think about:
For the normal case, we remove a node from the "middle" of the list and the lastNode reference does not change at all. What are the special cases? There are two somewhat-related special cases.
Below is the remove(int givenPosition) method, modified to handle the lastNode variable. The red and blue code marks the changes from the remove method shown above.
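As a sketch of how those changes might look (not the author's listing), the version below keeps a lastNode reference and marks the two special cases with comments instead of colors. The class name TailListSketch and its supporting methods are assumptions for this example.

```java
class TailListSketch<T> {
    private class Node {
        T data;
        Node next;
        Node(T data) { this.data = data; }
    }

    private Node firstNode;
    private Node lastNode;        // always refers to the end of the list
    private int numberOfEntries;

    // Adding at the end no longer needs a traversal.
    public void add(T newEntry) {
        Node newNode = new Node(newEntry);
        if (firstNode == null) {
            firstNode = newNode;
        } else {
            lastNode.next = newNode;
        }
        lastNode = newNode;
        numberOfEntries++;
    }

    public T remove(int givenPosition) {
        if (givenPosition < 1 || givenPosition > numberOfEntries) {
            throw new IndexOutOfBoundsException("Illegal position: " + givenPosition);
        }
        T result;
        if (givenPosition == 1) {
            result = firstNode.data;
            firstNode = firstNode.next;
            if (numberOfEntries == 1) {
                lastNode = null;           // special case 1: removed the only node
            }
        } else {
            Node nodeBefore = getAtNode(givenPosition - 1);
            Node nodeToRemove = nodeBefore.next;
            result = nodeToRemove.data;
            nodeBefore.next = nodeToRemove.next;
            if (givenPosition == numberOfEntries) {
                lastNode = nodeBefore;     // special case 2: removed the last node
            }
        }
        numberOfEntries--;
        return result;
    }

    public T getEntry(int givenPosition) {
        if (givenPosition < 1 || givenPosition > numberOfEntries) {
            throw new IndexOutOfBoundsException("Illegal position: " + givenPosition);
        }
        return getAtNode(givenPosition).data;
    }

    public int getLength() {
        return numberOfEntries;
    }

    private Node getAtNode(int givenPosition) {
        Node currentNode = firstNode;
        for (int counter = 1; counter < givenPosition; counter++) {
            currentNode = currentNode.next;
        }
        return currentNode;
    }
}
```

If either special case is missed, the next add() would attach the new node to a node that is no longer in the list.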
Now instead of null, the last node has a reference to the front node. Why might you want this? Think about adding/removing to the beginning or ending of the linked list. This can be very useful for implementing Queues (we'll see more about this later). What node(s) do you think we should keep track of?
Each node has a link to the one before and the one after it. We call these references "previous" and "next". With links in both directions, we can easily traverse the list in either direction. This gives us more general access, and it is most beneficial if we have a reference to the end of the list as well as the beginning, or if we make the list circular. This is used in the standard JDK LinkedList implementation and in the author's Deque. Some operations may be somewhat faster (which operations do you think?), but there is more overhead involved (what do you think that overhead is?). How do you think the runtime of an operation like binary search on a doubly linked list compares to binary search on a singly linked list (or on an array)?
What are the Big-O complexities for our List implementations? We saw for the Bag that it did not matter (much) whether we used an array or a linked list. Can we say the same for the ListInterface? Let's look at one operation in particular to highlight the difference: getEntry(int i)
. This accesses an arbitrary location in the list (See LinkedVSArrayList.java). Let's compare our AList and LList implementations with regard to this operation.
For the AList, we simply index our array, so what runtime do you think it has?
What about our LList? Now, it depends on the index. Sequential access requires us to traverse the list i Nodes to get to Node i. So, what's the worst-case runtime? But, it could be less, depending on where the object is located. So maybe we should also consider the average case here, just to be thorough.
To do this, we need to make an assumption about the index chosen. Let's assume that all index values are equally likely (i.e. uniformly distributed). If this is not the case, we can still do the analysis, if we know the actual probability distribution for the index choice. Let i be the index we want to access (in probability, we would call this a Random Variable) and let P(i) be the probability of choosing i. Since we assume a uniform distribution, P(i) = 1/N for all i. Let's define our key operation to be "looking at" a node in the list. So for a given index i, we will require i operations; let's call this value Ops(i).
Now define the average number of operations to be:
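Using the definitions above (positions 1 through N, P(i) = 1/N, and Ops(i) = i), the expected number of operations can be written out as follows. The name Avg is just a label chosen here.

```latex
\mathrm{Avg}
  = \sum_{i=1}^{N} P(i)\,\mathrm{Ops}(i)
  = \sum_{i=1}^{N} \frac{1}{N}\cdot i
  = \frac{1}{N}\cdot\frac{N(N+1)}{2}
  = \frac{N+1}{2}
```

So on average we look at about half of the nodes.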
In an absolute sense, this is better than the worst case, but asymptotically it is the same (why?). So, in this case the worst and average cases for getEntry on a linked list are the same: O(N).
Recall that the ListInterface is just a set of methods that indicate the behaviors of classes that implement the interface. As we saw, classes could use arrays or linked lists (or possibly another data structure) to store the data. So, how could a user of any List implementation access the data in a sequential way? Using what we saw above, we could:
Use the toArray() method and step through the result like you would with any other array
Use the getEntry() method to directly step through the List
What are the downsides to those approaches?
An iterator is an object that allows us to iterate through a list in a sequential way, regardless of how the list is implemented. The details of how we progress are left up to the implementer. The user of the interface just knows that it goes through the List in order.
Why do we need these? What good are they? We will see that the implementation can be a bit convoluted, leading to questions like "are these things really worthwhile?" They are worthwhile for two main reasons:
For the first reason (multiple co-existing iterations), consider the following situation. We have a set of data and we want to find the mode of that set (the most frequent value). How can we do this? Start at the first value and count how many times it occurs through the rest of the list. Then proceed to the next value and count how many times it occurs. Repeat for each value in the List, keeping track of the value with the highest count. See the in-class example with this List:
49 | 16 | 49 | 55 | 24 | 16 | 33 | 16 | 24 | 55 | 16 | 24 | 49 |
Note that we have two separate "iterations" through the list being accessed in the same code. One is going through the list, identifying each item. The other is counting the occurrences of that item. Logically, they are separate, even though they are progressing through the same List. For a List, we can also do this with nested for loops and the getEntry() method. However, the implementation of getEntry() is very inefficient for a linked list (as discussed above).
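The two co-existing iterations can be sketched with the standard java.util.Iterator. This is an illustration only, not the in-class example: the class name ModeDemo is made up, and for simplicity the inner iteration counts over the whole list rather than just the rest of it, which gives the same mode.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ModeDemo {
    // Returns the most frequent value; ties go to the value seen first.
    static int mode(List<Integer> data) {
        if (data.isEmpty()) {
            throw new IllegalArgumentException("empty list has no mode");
        }
        int best = 0;
        int bestCount = 0;
        Iterator<Integer> outer = data.iterator();      // identifies each item
        while (outer.hasNext()) {
            int candidate = outer.next();
            int count = 0;
            Iterator<Integer> inner = data.iterator();  // counts its occurrences
            while (inner.hasNext()) {
                if (inner.next() == candidate) {
                    count++;
                }
            }
            if (count > bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(49, 16, 49, 55, 24, 16, 33, 16, 24, 55, 16, 24, 49);
        System.out.println(mode(data)); // 16 occurs four times
    }
}
```

Each iterator keeps its own position, so the inner one can restart without disturbing the outer one.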
This brings us to the second main reason iterators are worthwhile: You can tailor the implementation to the data structure. Consider again LinkedVSArrayList.java where we were using getEntry() to get each item in a List. For the AList, this was fine since we have direct access to the locations. However, for LList this had terrible runtime, giving us O(N^2) to get all of the values.
The Linked List implementation had such poor runtime because each getEntry() method call restarts at the beginning of the linked list. What if we could "remember" where we stopped the last time and resume from there the next time? An iterator tailored to a linked list can do this for us, thereby saving a lot of time. Let's take a look at how this would work on the board.
Iterators tend to have these three operations:
public boolean hasNext()
public T next()
public void remove()
Many different data structures can be iterated over, so these can't be specific to the List ADT. Additionally, these are separate from any other functionality that a given class might have. Therefore, they are specified as an interface. In fact, that's what Java's Iterator interface looks like:
This is a simple iterator that can be used with most Collections, but how is this interface implemented? Also, where is it implemented? We want it to be part of a List, but how can that be done, since List is itself an interface? This is a bit convoluted, so we need to consider this carefully.
There are two ways we can implement this interface:
We can combine the ideas of an internally-implemented and externally-implemented iterator to produce a useful iterator. We write our list classes so that each has the ability to generate an iterator object that allows sequential access to its elements, without violating data abstraction. Thus, the iterator object is separate from (but related to) the underlying list that it iterates over. Multiple iterator objects can be created for a given list, each with its own current "state". This is actually an external implementation and it's the technique used in standard Java. Since this is the preferable implementation, we'll look at it in some detail. Note that this will heavily rely on object-oriented ideas and coding, so keep that in mind.
We can implement this with only a single extra method added to our List:
This will return an iterator built on top of the current list, but with its own "state" so that multiple iterators can be used on one list. Let's look at that method for the linked list implementation:
So, the method is a little anti-climactic. It turns out that almost all of the work is done in the IteratorForLinkedList class. This class will be built on the current list and will simply have the ability to go through all of the data in the list in an efficient way. Since it is tailored to the linked list, we can make it a private (inner) class and it can directly access our linked list instance variables. Let's look at the details in LinkedListIteratorExample.java and LinkedListWithIterator.java (and ListWithIteratorInterface.java).
Let's now focus on the implementation. Recall that we said the iterator could be tailored to the underlying list. The interface is the same, but the way it is done depends on whether the list is implemented with an array or a linked list. The Linked List implementation uses a Node reference as the sole instance variable for the iterator. It is initialized to firstNode when the iterator is created and it progresses down the list with each call to next(). Note that with a single Node reference, remove() is not possible. Why?
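The inner-class idea can be sketched in a few lines. This is a cut-down illustration, not LinkedListWithIterator.java: the outer class name LinkedListWithIteratorSketch and its add method are assumptions, while getIterator() and IteratorForLinkedList follow the names used in these notes.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class LinkedListWithIteratorSketch<T> {
    private class Node {
        T data;
        Node next;
        Node(T data) { this.data = data; }
    }

    private Node firstNode;

    // Appends to the end by walking the chain (kept simple for the sketch).
    public void add(T newEntry) {
        Node newNode = new Node(newEntry);
        if (firstNode == null) {
            firstNode = newNode;
        } else {
            Node current = firstNode;
            while (current.next != null) {
                current = current.next;
            }
            current.next = newNode;
        }
    }

    // Each call creates a new iterator with its own state.
    public Iterator<T> getIterator() {
        return new IteratorForLinkedList();
    }

    // Inner class: it can read firstNode and Node fields directly.
    private class IteratorForLinkedList implements Iterator<T> {
        private Node nextNode = firstNode;  // the sole piece of iterator state

        public boolean hasNext() {
            return nextNode != null;
        }

        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            T result = nextNode.data;
            nextNode = nextNode.next;       // remember where we stopped
            return result;
        }

        public void remove() {
            // Unlinking the node just returned would require a reference to the
            // node before it, which this single-reference iterator does not keep.
            throw new UnsupportedOperationException("remove() is not supported");
        }
    }
}
```

Because next() resumes from the saved node, stepping through all N entries takes N node visits in total instead of restarting from the front each time.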
So what would we need for the array implementation?
The Iterator interface can be used for any Java Collection. This includes our List<T> interface, but also others. For a List, we can add more functionality to our iterator, such as traversing in both directions rather than one direction only. However, we must consider whether these additional functionalities have any implications on our existing implementations.
Note that this iterator is bidirectional, and it allows objects to be added or removed. What does that mean for a linked list iterator?
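From the user's side, the extra operations look like this with the standard java.util.ListIterator. The class name ListIteratorDemo and the sample data are made up; the comments trace the list contents after each call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    static List<String> run() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        ListIterator<String> it = list.listIterator();

        it.next();        // returns "a"
        it.next();        // returns "b"
        it.set("B");      // replaces the entry last returned: [a, B, c]
        it.add("x");      // inserts at the cursor:            [a, B, x, c]
        it.previous();    // moves backward, returns "x"
        it.remove();      // removes the entry last returned:  [a, B, c]
        return list;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that set() and remove() act on whichever entry was returned most recently, regardless of whether it came from next() or previous().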
As we discussed previously for Iterator, the best way to implement a ListIterator is to implement it "externally", meaning that the methods are not part of the class being iterated upon. We build a ListIterator object on top of our list so we can have multiple iterations at once. We also want to make the class that implements the ListIterator an inner class so that it has access to the list details. This allows us to tailor our ListIterator to the underlying data structure in the most efficient way. However, we need a bit more logic to handle traversals in both directions, as well as the methods that modify the values or the List.
Regarding the logic, it is explained in great detail in the text and you should read it over (see Chapter 15). Let's look briefly at the standard Java ArrayList. The iterators are actually implemented in the parent class, AbstractList (source code). Note that we need to keep track of the current "direction" to allow the remove() and set() methods to work correctly. For example, in order to implement remove() in an array, we need to keep track of the index of the last item that was returned. See the AbstractList source code, in particular the private classes Itr and ListItr.
Another interesting issue is that the structure of iterators allows for multiple iterations on the same underlying list. However, if we start modifying the underlying list, we can get into a lot of problems. If one iterator modifies the list it will affect the other, and it could lead to an exception. Because of this, the Standard Java iterators do not allow "concurrent modification". If one iterator modifies the list, other current iterators are invalidated, and will generate an exception if used. How do you think that is implemented?
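The invalidation is easy to observe with two iterators over one ArrayList. In the JDK's list classes this is done with a modification counter on the list (the modCount field of AbstractList) that each iterator compares against the value it last saw; the sketch below only demonstrates the visible behavior. The class name FailFastDemo is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class FailFastDemo {
    // Returns true if the second iterator is rejected after the first one modifies the list.
    static boolean secondIteratorFails() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        Iterator<Integer> first = list.iterator();
        Iterator<Integer> second = list.iterator();

        first.next();
        first.remove();   // structural change made through the first iterator

        try {
            second.next(); // the second iterator notices the list has changed
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondIteratorFails());
    }
}
```

The iterator that made the change stays valid; only the other iterators over the same list are invalidated.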
With JDK 1.5, the Iterable interface was introduced. It is simply:
So any class with an iterator can also implement Iterable.
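A small sketch of what implementing Iterable buys us: any object whose class provides an iterator() method can be used in a for-each loop. The Countdown class below is a made-up example, not part of the course code.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class Countdown implements Iterable<Integer> {
    private final int start;

    Countdown(int start) {
        this.start = start;
    }

    // The single method required by Iterable.
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int current = start;

            public boolean hasNext() {
                return current > 0;
            }

            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current--;
            }
        };
    }

    // The for-each loop calls iterator(), hasNext() and next() behind the scenes.
    static int sum(Countdown countdown) {
        int total = 0;
        for (int value : countdown) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int value : new Countdown(3)) {
            System.out.println(value);
        }
    }
}
```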