You should all be familiar with the fundamental performance equation for a CPU (ignoring I/O).
Program execution time = (number of instructions executed) * (average clock cycles per instruction) * (clock period time)
In your multi-cycle CPU, clocks_per_instruction varied from 3 to 6 (plus
memory wait states) depending on the length of the instruction. In this part of
your design you will attempt to improve performance by minimizing the average
number of clock cycles per instruction. We would like to make the average close
to 1. To do this, the system will be designed to do each of the individual
clock cycles that make up an instruction in parallel. Thus, a single clock
cycle (phase) from each of several instructions will be done at the same time. Ideally,
one new instruction will start and one will finish in each clock cycle.
This way the average clock cycles per instruction will tend to be one. In reality, we won't always achieve this but at least we have a goal.
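To see why the average tends toward one, here is a minimal sketch in Python. It assumes an ideal pipeline with no stalls, and the 5-stage depth used in the usage lines is an assumption for illustration only (your design may differ). The function names are hypothetical and not part of the assignment.

```python
def ideal_pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed to run num_instructions through a stall-free pipeline."""
    if num_instructions == 0:
        return 0
    # The first instruction takes num_stages cycles to pass through the pipe;
    # every later instruction finishes one cycle after the one before it.
    return num_stages + num_instructions - 1


def average_cpi(num_instructions: int, num_stages: int) -> float:
    """Average clock cycles per instruction for the ideal pipeline."""
    return ideal_pipeline_cycles(num_instructions, num_stages) / num_instructions


def execution_time(num_instructions: int, cpi: float, clock_period_s: float) -> float:
    """The performance equation: instructions * CPI * clock period."""
    return num_instructions * cpi * clock_period_s


if __name__ == "__main__":
    # Assumed 5-stage pipeline; the average approaches 1 as the program grows.
    for n in (1, 10, 1000):
        print(n, average_cpi(n, 5))
```

With only one instruction the pipeline gives no benefit (the average equals the number of stages), but for long instruction streams the fill time is amortized and the average approaches one. Stalls caused by hazards add cycles on top of this ideal count.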
The figure below is taken from the Hennessy and Patterson text. If you do not understand pipelining, you should review this chapter.
To implement your pipeline you will need to modify your state controller and add some pipeline registers between pipeline stages. This is one of the possible ways:
There will be situations that have to be taken care of, such as data and control hazards. For instance, an instruction may attempt to read a register right after the previous instruction changes its value, before that previous instruction has finished (a RAW, or read-after-write, hazard).
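The RAW condition can be sketched as a simple check between two instructions. This is an illustrative software model only, not the hardware you are asked to build. The `Instr` record, its field names, and the rule that register 0 never causes a hazard (as on MIPS-style machines, where it is hardwired to zero) are assumptions made for the example.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: register numbers only."""
    text: str                 # assembly text, for display
    dest: Optional[int]       # register written, or None if none is written
    srcs: Tuple[int, ...]     # registers read


def has_raw_hazard(producer: Instr, consumer: Instr) -> bool:
    """True if consumer reads a register that producer writes."""
    # Assumption: register 0 is hardwired to zero, so writing it is harmless.
    if producer.dest is None or producer.dest == 0:
        return False
    return producer.dest in consumer.srcs


if __name__ == "__main__":
    add = Instr("add $1, $2, $3", dest=1, srcs=(2, 3))
    sub = Instr("sub $4, $1, $5", dest=4, srcs=(1, 5))
    print(has_raw_hazard(add, sub))  # sub reads $1, which add writes
    print(has_raw_hazard(sub, add))  # add does not read $4
```

In a pipeline, a check of this kind is applied between the instruction being decoded and the older instructions that are still in flight and have not yet written their results back.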
You will have to solve the following hazards by the following methods:
You do NOT have to implement exception instructions, including syscall and move from/to coprocessors.
You are encouraged to read the textbook sections on pipelined processors and hazards, and to apply this knowledge to your design.