Queues and Priority Queues

Introduction to Queues

Data is added to the rear and removed from the front. Logically, the items other than the front item cannot be accessed. Some examples of real-life queues are:

  1. A checkout line at a grocery store
  2. Cars waiting at a toll booth
  3. Print jobs waiting for a shared printer

The fundamental operations of a queue are:

  1. enqueue: add an item to the rear
  2. dequeue: remove and return the item at the front
  3. getFront (peek): return the front item without removing it
  4. isEmpty: determine whether the queue contains any items

A Queue organizes data by First In First Out, or FIFO (also called LILO: Last In, Last Out). Like a Stack, a Queue is a simple but powerful data structure. Queues are used extensively for simulations: many real-life situations are organized in a FIFO manner, and Queues can be used to model them. (Simulations are useful because they allow problems to be developed and analyzed on the computer, saving time and money.)

For example, a bank wants to determine how best to set up its lines to the tellers:

  1. Option 1: Have a separate line for each teller
  2. Option 2: Have a single line, with the customer at the front going to the next available teller

How can we determine which will have better results? We could try each one for a while and measure throughput, customer satisfaction, etc. Obviously this will take time and may create some upset customers. An alternative would be to simulate each one using reasonable data and compare the results. This is just a simple example, but other (often more complex) problems can also be solved through simulation.
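The single-line option above can be sketched as a small simulation built on a FIFO queue. This is a minimal, hypothetical sketch (the method name, arrival times, and fixed service time are mine, not from the text): customers join the line in arrival order, and the front customer is always served next.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BankLineSim {
    // Average time customers spend waiting in a single FIFO line with one teller.
    static double averageWait(int[] arrivalTimes, int serviceTime) {
        Queue<Integer> line = new ArrayDeque<>();
        for (int t : arrivalTimes) line.add(t);  // customers join in arrival order

        int clock = 0, totalWait = 0;
        while (!line.isEmpty()) {
            int arrived = line.remove();         // serve the customer at the front (FIFO)
            clock = Math.max(clock, arrived);    // teller may sit idle until they arrive
            totalWait += clock - arrived;        // time this customer spent in line
            clock += serviceTime;                // teller is busy with this customer
        }
        return totalWait / (double) arrivalTimes.length;
    }

    public static void main(String[] args) {
        // Hypothetical arrival times (in minutes) for five customers, 3-minute service each.
        System.out.println("Average wait: " + averageWait(new int[]{0, 1, 2, 8, 9}, 3));
    }
}
```

A real simulation would draw arrival and service times from probability distributions and track more statistics, but the queue is the core of the model either way.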

Queue Implementation

Some properties we'd like:

  1. enqueue and dequeue should both run in O(1) time
  2. the queue should not have a fixed capacity

We've seen two basic data storage techniques that allow many values of the same type to be stored:

  1. Arrays
  2. Linked lists

Let's take a look at both to see whether/how each could be used.

Linked-List Implementation

This implementation is fairly straightforward as long as we have access to both the front and the rear of the list (a singly linked list with a rear reference suffices; Java's doubly linked LinkedList also provides both). Enqueue simply adds a new object to the end of the list. Dequeue simply removes an object from the front of the list. Other operations are also simple. We can build our Queue from a LinkedList object, making the implementation even simpler. This is basically what Java does with the Queue interface and the LinkedList class.

Note that Java's Queue is an interface. The LinkedList class implements Queue (among other interfaces). Even though LinkedList can do a lot more than just the Queue operations, if we use a Queue reference to the object, we restrict it to the Queue operations. Compare this to the Stack, which was implemented as a class. However, the textbook author also uses an interface, but implements the Queue from scratch. See LinkedQueue.java, where the Queue is implemented as a linked list with front and rear references.
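The restriction described above falls out of the reference type: declare the variable as a Queue and only the Queue methods are visible, even though the object underneath is a full LinkedList. A short sketch (variable names are mine):

```java
import java.util.LinkedList;
import java.util.Queue;

public class QueueDemo {
    public static void main(String[] args) {
        // The object is a LinkedList, but the Queue reference
        // restricts us to the queue operations.
        Queue<String> line = new LinkedList<>();
        line.add("first");    // enqueue at the rear
        line.add("second");
        line.add("third");

        assert line.peek().equals("first");    // front is the oldest entry
        assert line.remove().equals("first");  // dequeue from the front (FIFO)
        assert line.remove().equals("second");
        assert line.size() == 1;
        // line.get(1) would not compile: get() is a List method, not a Queue method.
    }
}
```

Run with `java -ea QueueDemo` so the assertions are checked.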

Can we use another linked list implementation instead? How would we use it to implement:

  1. enqueue()
  2. dequeue()

The text takes this notion one step further. The logic of enqueue and dequeue is the same. However, when we dequeue, rather than unlinking the node (and allowing it to be garbage collected), we leave it in the chain and treat it as free for reuse. This way we save the overhead of creating new nodes on every enqueue. So, we keep two references: queueNode and freeNode. queueNode references the front of the queue and will be the next node dequeued. freeNode references the first free node just past the rear, where the next entry will be enqueued (if there are no free nodes left, then we create a new node). Let's take a look at how this works on the board. See TwoPartCircularLinkedQueue.java.

Array Implementation

Arrays that we have seen so far can easily add at the end, so enqueue is not a problem. What is the runtime of an enqueue operation? Since queues don't place a limit on the number of elements in the queue, what if we need to resize the array? Removing from the front is trickier. In the ArrayList, removing from the front causes the remaining objects to be shifted forward. This gives a runtime of O(N), not O(1) as we would like. So, we will not use an ArrayList. Instead, we will work directly with an array to implement our Queue.

How can we make dequeue an O(1) operation? Is it even possible? What if the front of the Queue could "move" (so, the front may not always be at index 0). We would then keep a head index to tell us where the front is (and a tail index to tell where the end is).

Given these ideas, we can now enqueue at the rear by incrementing the tail index and storing the new object at that location, and dequeue at the front by returning the value at the head index and then incrementing the head index.

This implementation will definitely work, but it has an important drawback. Both enqueue and dequeue increment index values. Once we increment front past a location, we never use that location again. Thus, as the queue is used, the data migrates toward the end of the array. Clearly this is wasteful in terms of memory. What can we do to fix this problem? We need a way to reclaim the locations at the front of the array without spending too much time (so, shifting is not a good idea). Any ideas?

When we increment the front and rear index values we do so mod the array length, that is:

backIndex = (backIndex + 1) % queue.length;
queue[backIndex] = newEntry;

As long as backIndex+1 is less than queue.length, the result is a normal increment. However, once backIndex+1 == queue.length, taking the mod will result in 0, returning us to the beginning of the array.

Now, how do we know if the queue is empty or full? Both indexes move throughout the array. So, when front == (back+1) % queue.length, the array is either full or empty. One easy solution is to keep track of the size with an extra instance variable. The text doesn't want to do that (even though the size of a queue is often needed). Instead, they keep one location in the array empty, even if the queue is full. So, the array is full when front == (back + 2) % queue.length and is empty when front == (back + 1) % queue.length. See ArrayQueue.java.
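The circular-array scheme above, including the "one empty slot" full/empty convention, can be sketched as a small fixed-capacity class. This is a simplified stand-in for ArrayQueue.java, not the textbook's code: the class name is mine and it throws rather than resizing when full.

```java
public class CircularQueue<T> {
    private final T[] queue;
    private int front;   // index of the front entry
    private int back;    // index of the rear entry

    @SuppressWarnings("unchecked")
    public CircularQueue(int capacity) {
        queue = (T[]) new Object[capacity + 1];  // one slot always stays empty
        front = 0;
        back = capacity;                         // so that the queue starts empty
    }

    // Empty and full checks follow the text's convention.
    public boolean isEmpty() { return front == (back + 1) % queue.length; }
    public boolean isFull()  { return front == (back + 2) % queue.length; }

    public void enqueue(T newEntry) {            // O(1)
        if (isFull()) throw new IllegalStateException("queue is full");
        back = (back + 1) % queue.length;        // wraps to 0 at the end of the array
        queue[back] = newEntry;
    }

    public T dequeue() {                         // O(1): no shifting needed
        if (isEmpty()) throw new IllegalStateException("queue is empty");
        T result = queue[front];
        queue[front] = null;                     // let the entry be garbage collected
        front = (front + 1) % queue.length;      // reclaimed slot can be reused later
        return result;
    }
}
```

Note that a capacity-3 queue allocates an array of length 4, since one slot is sacrificed to distinguish full from empty.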

Interlude: Array vs. Linked List Implementations

So far we have discussed both array- and linked list-based data structures:

So is it better to use an array or a linked list?

Consider the Stack and Queue ADTs. As long as resizing is done in an intelligent way, the array versions of these tend to be a bit faster than the linked-list versions: push()/pop() and enqueue()/dequeue() are O(1) amortized time in both implementations, but the array versions are faster by a constant factor in normal use. Note, however, that the ArrayList does not automatically shrink when items are deleted, so an ArrayList-based Stack will not either. This could waste memory if the stack previously held many entries.

In general, you need to decide for a given application which implementation is more appropriate. However, in most programming situations, these data structures are already defined for you in a standard library, such as:

  1. java.util.Queue (implemented by LinkedList and ArrayDeque, among others)
  2. java.util.Stack (or a Deque used as a stack)
  3. java.util.PriorityQueue

It's still good to understand how they are implemented, but more often than not we just use the standard version, due to convenience.

Priority Queues

The priority queue is similar to queues and stacks in that you cannot access an arbitrary element in the data structure, but rather must access only one position. However, whereas Queues have FIFO ordering and Stacks have LIFO ordering, priority queues use an ordering determined by a priority (e.g. alphabetical order). In CS/COE 1501, you will see situations where priority queues are essential to how certain algorithms work (e.g. Dijkstra's algorithm, Huffman coding, and Prim's algorithm).

The methods for a priority queue are similar in nature to regular queues and stacks, but different in implementation:

  1. add(): insert a new entry according to its priority
  2. remove(): remove and return the highest-priority entry
  3. peek(): return the highest-priority entry without removing it
  4. isEmpty(), getSize(), clear()

The big difference in implementations is in how the items are stored and removed in the correct order. See PriorityQueueInterface.java. Why do you think the generic type must be Comparable?

Implementation

Let's consider different ways of implementing a priority queue and the space and time efficiency of those implementations.

First, an unsorted array. How might we implement:

  1. add()
  2. remove()

What would the runtimes of those implementations be?

Now what about a sorted array. How might we implement:

  1. add()
  2. remove()

What would the runtimes of those implementations be?

We could consider a sorted and unsorted linked list, but the unsorted linked list is similar to an unsorted array. The sorted linked list would be worse than a sorted array (why?).
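The unsorted-array approach can be sketched briefly (class and method names are mine, not the textbook's): add() just appends, so it is O(1) amortized, while remove() must scan the whole array for the highest-priority entry, which is O(N).

```java
import java.util.Arrays;

// Sketch of a max-priority queue backed by an unsorted array.
public class UnsortedArrayPQ<T extends Comparable<? super T>> {
    private T[] entries;
    private int size;

    @SuppressWarnings("unchecked")
    public UnsortedArrayPQ(int initialCapacity) {
        entries = (T[]) new Comparable[initialCapacity];
    }

    public void add(T newEntry) {                 // O(1) amortized: append at the end
        if (size == entries.length) entries = Arrays.copyOf(entries, 2 * size);
        entries[size++] = newEntry;
    }

    public T remove() {                           // O(N): scan for the highest priority
        if (size == 0) throw new IllegalStateException("empty");
        int best = 0;
        for (int i = 1; i < size; i++)
            if (entries[i].compareTo(entries[best]) > 0) best = i;
        T result = entries[best];
        entries[best] = entries[--size];          // fill the hole with the last entry;
        entries[size] = null;                     // order doesn't matter in an unsorted array
        return result;
    }

    public boolean isEmpty() { return size == 0; }
}
```

A sorted array simply flips the costs: remove() becomes O(1) (take the last entry), but add() must shift entries to keep the array sorted, which is O(N).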

For any of the above implementations, consider a sequence of N adds followed by N removes. In all cases with a simple array or linked list, one of the operations (either add or remove) is linear. Thus, for N adds followed by N removes, the total runtime will be O(N²) by the following logic (we consider the case of the unsorted array; the other cases are similar):

  1. Each add() will be O(1) for a total of N*O(1) = O(N)
  2. The removes:
    1. First remove will require N comparisons to find the highest priority item
    2. Second remove will require N-1 comparisons (why only N-1?)
    3. Third remove will require N-2 comparisons (why only N-2?)
    4. ...
    Adding up those operations, we get the sum N + (N-1) + … + 1, which we know evaluates to N(N+1)/2 = O(N²).

Thus, we get O(N) + O(N²) = O(N²) total runtime. In the amortized case, we have O(N²)/N = O(N) per operation, which is more time than we'd like for these operations.

Why bother looking at adds and removes? Why not only look at one?

We can do better than O(N) amortized time per operation. We can use the Heap data structure. The basic idea of a heap is to partially order the data in a logical complete binary tree. That is, for each node in the tree (T):

  1. T.data has a priority greater than or equal to T.leftChild.data (if the left child exists)
  2. T.data has a priority greater than or equal to T.rightChild.data (if the right child exists)

Note that nothing is said about how T.leftChild.data and T.rightChild.data compare to each other. This is why it is a partial ordering.

Higher priority here can mean either greater than or less than in terms of the value. A min heap is one where the highest-priority value is the smallest (e.g. rankings in a race). A max heap is one where the highest-priority value is the largest (e.g. goals in a game). The logic is the same for both. See HeapPriorityQueue.java and MaxHeapInterface.java. Let's take a look at an example on the board.
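Java's standard library illustrates both conventions. PriorityQueue is a min heap by default; passing a reversed Comparator turns it into a max heap with no other changes, which shows that the logic really is the same for both:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class HeapDemo {
    public static void main(String[] args) {
        // Default ordering: min heap, smallest value = highest priority.
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        minHeap.add(5); minHeap.add(1); minHeap.add(3);
        assert minHeap.remove() == 1;   // smallest comes out first

        // Reversed ordering: max heap, largest value = highest priority.
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        maxHeap.add(5); maxHeap.add(1); maxHeap.add(3);
        assert maxHeap.remove() == 5;   // largest comes out first
    }
}
```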

How do we implement the Priority Queue (Max Heap) operations:

  1. add()
  2. remove()

For both add and remove, we are altering the tree, so we must ensure that the heap property is reestablished. We need to carefully consider where / how to add and remove to keep the tree valid but also not cost too much work.

For add, add a new node at the next available leaf. Then, push the node "up" the tree until it reaches its appropriate spot. We'll call this upHeap since the node is being pushed up the heap. Let's see how this would work.

For remove, we must be careful since the root may have two children. In fact, we're dealing with a problem similar to deleting a node with two children from a Binary Search Tree. Deleting the root node outright would require a major reworking of the tree. So, instead of deleting the root node, we can just overwrite its value with that of the last leaf. Then we delete the last leaf, since it's easy to delete leaf nodes (especially the last one). This guarantees that the tree is still complete. Now, the problem is that the new root value may not be the max. So, we must push the node "down" the tree until it reaches its appropriate spot. We'll call this downHeap. Let's see an example on the board.

So what's the runtime of add and remove from a heap? Recall that a complete binary tree has height O(log(N)). In the worst case, both upHeap and downHeap traverse the height of the tree. Thus, add() and remove() are always O(log(N)) in the worst case.

Repeat the analysis from above (where we looked at N adds and N removes) for the heap implementation of a Priority Queue:

  1. Each add() will be O(log(N)) for a total of N*O(log(N)) = O(N*log(N))
  2. Each remove() will also be O(log(N)) for a total of N*O(log(N)) = O(N*log(N))

This gives a total of 2*O(N*log(N)) = O(N*log(N)). Therefore, the amortized operations are O(N*log(N)) / N = O(log(N)) each. This is definitely superior to either the array or the linked list implementation.

Implementing a Heap

To implement a heap, we could use a linked binary tree, similar to that used for a Binary Search Tree. This will work, but we have overhead associated with dynamic memory allocation and access to elements:

  1. each node needs extra memory for its child references
  2. finding the next open leaf position (for add) and the last leaf (for remove) requires traversing the tree

But note that we are maintaining a complete binary tree for our heap. It turns out that we can easily represent a complete binary tree using an array.

The idea behind an array-implemented complete binary tree is that we number the nodes row by row, starting at 1 for the root. We can then use these numbers as index values into the array. Thus, for the node at index i:

  1. its parent is at index i/2 (using integer division)
  2. its left child is at index 2i
  3. its right child is at index 2i + 1

Now we have the benefit of a tree structure with the speed of an array implementation. See MaxHeap.java.
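The index arithmetic plus the upHeap/downHeap operations fit together as shown in this sketch. It is a simplified stand-in for MaxHeap.java (the class name is mine, and it stores int values rather than generic Comparable entries); index 0 is unused so the parent/child formulas above apply directly.

```java
import java.util.Arrays;

// Minimal array-backed max heap with 1-based indexing:
// parent(i) = i / 2, leftChild(i) = 2 * i, rightChild(i) = 2 * i + 1.
public class ArrayMaxHeap {
    private int[] heap = new int[16];   // index 0 is deliberately unused
    private int size;

    public void add(int value) {        // place at the next open leaf, then upHeap
        if (size + 1 == heap.length) heap = Arrays.copyOf(heap, 2 * heap.length);
        heap[++size] = value;
        int i = size;
        while (i > 1 && heap[i] > heap[i / 2]) {      // push up while larger than parent
            int tmp = heap[i]; heap[i] = heap[i / 2]; heap[i / 2] = tmp;
            i /= 2;
        }
    }

    public int removeMax() {            // overwrite root with last leaf, then downHeap
        int max = heap[1];
        heap[1] = heap[size--];
        int i = 1;
        while (2 * i <= size) {
            int child = 2 * i;                         // pick the larger child
            if (child < size && heap[child + 1] > heap[child]) child++;
            if (heap[i] >= heap[child]) break;         // heap property restored
            int tmp = heap[i]; heap[i] = heap[child]; heap[child] = tmp;
            i = child;
        }
        return max;
    }

    public int size() { return size; }
}
```

Both loops traverse at most the height of the complete tree, so add() and removeMax() are O(log(N)), matching the analysis above.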
