Fault Tolerant Parallel and Distributed Systems
Mondays and Wednesdays from 10:00AM to 11:20AM
Rami Melhem (firstname.lastname@example.org)
219 Mineral Industries Building,
Mondays from 1:00 to 3:00
Thursdays from 1:00 to 3:00
This seminar course covers the principles of fault tolerant
computing. Topics include error detecting and correcting codes;
hardware, software and time redundancy techniques; fault tolerant
multiprocessors; system diagnosis and fault tolerance software;
fault tolerance in distributed and real-time systems; and
performance and reliability evaluation techniques.
Reference textbooks (not required):
Fault-Tolerant Computer System Design, by Dhiraj Pradhan - Prentice Hall.
Fault Tolerance in Distributed Systems, by Pankaj Jalote - Prentice Hall.
Requirements and grading:
One exam in October (15%).
Paper presentations and class participation (25%).
Class project and report (60%).
Presentations by the instructor:
Simple error detecting/correcting codes.
Types of redundancy (hardware, software and time redundancies).
Performance and reliability evaluation techniques.
Fault tolerance in distributed systems.
Fault tolerance in real-time systems.
Fault tolerance in multiprocessor systems.
Presentations by students - possible topics:
Presentations and Demonstrations of Class Projects