Programming Assignment 3

Caching: TLBs and Virtual Memory


Introduction

In this assignment we use caching for two purposes. First, we use a software-managed translation lookaside buffer (TLB) as a cache for page tables to provide the illusion of fast access to virtual page translation over a large address space. Second, we use memory as a cache for disk, to provide the abstraction of an (almost) unlimited virtual memory size, with performance close to that provided by physical memory. We provide no new code for this assignment (the only change is that you need to compile with the VM and USE_TLB flags); your job is to write the code to manage the TLB and to implement virtual memory.

The assumption is that the hardware knows nothing about page tables. Instead, it deals only with a software-loaded cache of page table entries, called the TLB. On almost all modern processor architectures, a TLB is used to speed up address translation. Given a memory address (an instruction to fetch, or data to load or store), the processor first looks in the TLB to determine whether the mapping of virtual page to physical page is already known. If so (a TLB "hit"), the translation can be done quickly. But if the mapping is not in the TLB (a TLB "miss"), page tables and/or segment tables are used to determine the correct translation. On several architectures, including Nachos, the DEC MIPS, and the HP Snakes, a TLB miss simply causes a trap to the OS kernel, which does the translation, loads the mapping into the TLB, and restarts the program. This allows the OS kernel to choose whatever combination of page table, segment table, inverted page table, etc., it needs to do the translation. On systems without software-managed TLBs, the hardware does the same thing as the software, but in this case the hardware must specify the exact format for page and segment tables. Thus, software-managed TLBs are more flexible, at the cost of being somewhat slower at handling TLB misses. If TLB misses are very infrequent, the performance impact of software-managed TLBs can be minimal.

The illusion of unlimited memory is provided by the operating system by using main memory as a cache for the disk. For this assignment, page translation allows us the flexibility to get pages from disk as they are needed. Each entry in the TLB has a valid bit: if the valid bit is set, the virtual page is in memory. If the valid bit is clear, or if the virtual page is not found in the TLB, a software page table is needed to tell whether the page is in memory (in which case the TLB is loaded with the translation) or must be brought in from disk. In addition, the hardware sets the use bit in the TLB entry whenever a page is referenced and the dirty bit whenever the page is modified.
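
The lookup described above can be sketched in a few lines. This is only an illustration of what the simulated hardware does on each memory reference, not the Nachos source: the struct TlbEntry, the function Translate, and the 128-byte page size are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>

constexpr unsigned kPageSize = 128;  // assumed page size in bytes

// Hypothetical TLB entry carrying the bits described in the text.
struct TlbEntry {
    unsigned virtualPage = 0;
    unsigned physicalPage = 0;
    bool valid = false;  // translation is usable
    bool use = false;    // set on any reference
    bool dirty = false;  // set on any write
};

// Mimics the hardware lookup. On a hit, stores the physical address in
// *paddr, updates the use/dirty bits, and returns true. On a miss it
// returns false; real hardware would trap to the kernel instead.
bool Translate(TlbEntry *tlb, std::size_t tlbSize, unsigned vaddr,
               bool writing, unsigned *paddr) {
    assert(tlb != nullptr && paddr != nullptr);
    const unsigned vpn = vaddr / kPageSize;
    const unsigned offset = vaddr % kPageSize;
    for (std::size_t i = 0; i < tlbSize; ++i) {
        TlbEntry &e = tlb[i];
        if (e.valid && e.virtualPage == vpn) {
            e.use = true;
            if (writing) e.dirty = true;
            *paddr = e.physicalPage * kPageSize + offset;
            return true;
        }
    }
    return false;
}
```

A false return corresponds to the case where the kernel must consult its own page table, which is the part you write.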

When a program references a page that is not in the TLB, the hardware generates a TLB exception, trapping to the kernel. The operating system kernel then checks its own page table. If the page is not in memory, it reads the page in from disk, sets the page table entry to point to the new page, and then resumes the execution of the user program. Of course, the kernel must first find space in memory for the incoming page, potentially writing some other page back to disk, if it has been modified.
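
As a rough sketch of the kernel side of this sequence, the code below handles a TLB miss against a software page table. Everything here is hypothetical scaffolding: the Entry struct only resembles a translation entry, the TLB size of 4 is assumed, PageIn is a placeholder for the disk read you implement in Problem 2, and round-robin is just one possible way to choose the TLB slot to overwrite.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Simplified stand-in for a TLB / page-table entry (hypothetical).
struct Entry {
    int virtualPage = -1;
    int physicalPage = -1;
    bool valid = false;
    bool use = false;
    bool dirty = false;
};

constexpr std::size_t kTlbSize = 4;  // assumed TLB size for this sketch

struct Mmu {
    Entry tlb[kTlbSize];
    std::unordered_map<int, Entry> pageTable;  // vpn -> kernel's entry
    std::size_t nextVictim = 0;                // round-robin pointer
    int pageFaults = 0;

    // Placeholder for Problem 2: pretend to read the page from disk.
    void PageIn(int vpn) {
        Entry &pte = pageTable[vpn];
        pte.virtualPage = vpn;
        pte.physicalPage = vpn;  // identity mapping, only for the sketch
        pte.valid = true;
        ++pageFaults;
    }

    // Called on a TLB miss for page vpn; returns the TLB slot loaded.
    std::size_t HandleTlbMiss(int vpn) {
        auto it = pageTable.find(vpn);
        if (it == pageTable.end() || !it->second.valid) PageIn(vpn);

        // Prefer an invalid slot; otherwise evict round-robin.
        std::size_t slot = kTlbSize;
        for (std::size_t i = 0; i < kTlbSize; ++i) {
            if (!tlb[i].valid) { slot = i; break; }
        }
        if (slot == kTlbSize) {
            slot = nextVictim;
            nextVictim = (nextVictim + 1) % kTlbSize;
            // Save the hardware-set bits before overwriting the entry.
            Entry &old = pageTable[tlb[slot].virtualPage];
            old.use = old.use || tlb[slot].use;
            old.dirty = old.dirty || tlb[slot].dirty;
        }
        tlb[slot] = pageTable[vpn];
        assert(tlb[slot].valid);
        return slot;
    }
};
```

After the handler returns, the user program is resumed at the faulting instruction, which now hits in the TLB.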

As with any caching system, performance depends on the policy used to decide which things are kept in memory and which are only stored on disk. On a page fault, the kernel must decide which page to replace; ideally, it will throw out a page that will not be referenced for a long time, keeping in memory those pages that are soon to be referenced. Another consideration is that if the replaced page has been modified, it must first be saved to disk before the needed page can be brought in; many virtual memory systems (such as UNIX) avoid this extra overhead by writing modified pages to disk in advance, so that any subsequent page faults can be completed more quickly.
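
The assignment does not prescribe a replacement policy. As one illustration, here is a minimal sketch of the well-known clock (second-chance) algorithm, which approximates "not referenced for a long time" using only the use bit, and reports whether the victim must be written back. The class name ClockReplacer and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Frame {
    bool use = false;    // referenced since the hand last passed
    bool dirty = false;  // modified since it was loaded
};

class ClockReplacer {
public:
    explicit ClockReplacer(std::size_t numFrames) : frames_(numFrames) {
        assert(numFrames > 0);
    }

    // Record a reference to frame f (what the use/dirty bits capture).
    void Touch(std::size_t f, bool write) {
        frames_[f].use = true;
        if (write) frames_[f].dirty = true;
    }

    // Sweep the hand until a frame with a clear use bit is found.
    // Frames with the bit set get a second chance: the bit is cleared
    // and the hand moves on. Terminates within two sweeps.
    std::size_t PickVictim(bool *needsWriteback) {
        for (;;) {
            const std::size_t current = hand_;
            Frame &f = frames_[current];
            hand_ = (hand_ + 1) % frames_.size();
            if (f.use) {
                f.use = false;
                continue;
            }
            *needsWriteback = f.dirty;  // caller saves the page first
            f.dirty = false;
            return current;
        }
    }

private:
    std::vector<Frame> frames_;
    std::size_t hand_ = 0;
};
```

A cleaning policy that writes dirty pages in advance would reduce how often needsWriteback comes back true at fault time.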


Problem 1

Implement software-management of the TLB. For this, you will need to implement some kind of software page translation, for handling TLB misses. Note that with the compile time flag -DUSE_TLB, the hardware no longer deals with page tables; thus, you need to do something about making sure the TLB state is set up properly on a context switch. Most systems simply invalidate all the TLB entries on a context switch; the entries get re-loaded as the pages are referenced. For Problem 2, your page translation scheme should keep track of the dirty and use flags for each page set by hardware in the TLB entry.
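
A minimal sketch of the "invalidate everything on a context switch" approach follows, including the step of copying the hardware-set use and dirty bits back into the software page table so they are not lost. The Entry struct and FlushTlb are hypothetical names, and the page table is assumed to be a simple array indexed by virtual page number.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for a translation entry (hypothetical).
struct Entry {
    unsigned virtualPage = 0;
    unsigned physicalPage = 0;
    bool valid = false;
    bool use = false;
    bool dirty = false;
};

// Saves the use/dirty bits of every valid TLB entry into the outgoing
// process's page table, then invalidates the whole TLB.
void FlushTlb(Entry *tlb, std::size_t tlbSize,
              std::vector<Entry> &pageTable) {
    for (std::size_t i = 0; i < tlbSize; ++i) {
        Entry &e = tlb[i];
        if (e.valid) {
            assert(e.virtualPage < pageTable.size());
            Entry &pte = pageTable[e.virtualPage];
            pte.use = pte.use || e.use;
            pte.dirty = pte.dirty || e.dirty;
        }
        e.valid = false;
    }
}
```

In a real kernel this would run while saving the outgoing address space's state; the incoming process then reloads entries one TLB miss at a time.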


Problem 2

Implement virtual memory. For this, you will need routines to move a page from disk to memory and from memory to disk. We recommend that you use the Nachos file system as backing store -- this way, when we implement the file system in Assignment 4, we'll be able to use the virtual memory system as a test case. In order to find unreferenced pages to throw out on page faults, you will need to keep track of all of the pages in the system which are currently in use. A simple way to do this is to keep a "core map", which is basically a reverse page table -- instead of translating virtual page numbers to physical pages, a core map translates physical page numbers to the virtual pages that are stored there.
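
A core map can be as simple as an array with one slot per physical frame. The sketch below is one possible shape, not a required design; CoreMap, CoreMapEntry, and the ownerId field (an identifier for the owning address space) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct CoreMapEntry {
    bool inUse = false;
    int ownerId = -1;      // hypothetical address-space identifier
    int virtualPage = -1;  // virtual page stored in this frame
};

class CoreMap {
public:
    explicit CoreMap(std::size_t numFrames) : frames_(numFrames) {}

    // Claims a free frame for (ownerId, virtualPage). Returns the frame
    // number, or -1 if memory is full and a victim must be chosen.
    int Allocate(int ownerId, int virtualPage) {
        for (std::size_t f = 0; f < frames_.size(); ++f) {
            if (!frames_[f].inUse) {
                frames_[f].inUse = true;
                frames_[f].ownerId = ownerId;
                frames_[f].virtualPage = virtualPage;
                return static_cast<int>(f);
            }
        }
        return -1;
    }

    // Marks a frame free again, e.g. after its page is evicted.
    void Release(int frame) {
        assert(IsValidFrame(frame));
        frames_[static_cast<std::size_t>(frame)] = CoreMapEntry();
    }

    // Physical frame number -> which virtual page lives there.
    const CoreMapEntry &Lookup(int frame) const {
        assert(IsValidFrame(frame));
        return frames_[static_cast<std::size_t>(frame)];
    }

private:
    bool IsValidFrame(int frame) const {
        return frame >= 0 &&
               static_cast<std::size_t>(frame) < frames_.size();
    }
    std::vector<CoreMapEntry> frames_;
};
```

When Allocate returns -1, the replacement policy picks a frame, and Lookup tells the kernel whose page table entry must be invalidated before the frame is reused.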


Problem 3 (Extra Credit)

Implement memory-mapped files. The traditional way to access the file system is via Read and Write system calls, but that requires an extra level of copying between the kernel and the user level (since Read and Write transfer data through a temporary kernel buffer). A different interface is simply to "map" the file into the virtual address space; the program can then use load and store instructions directly on the file data. (An alternative way of viewing the file system is as "durable memory"; files just store data structures. If you access data structures in memory using load and store instructions, why not access data structures in files the same way?)

In order to provide memory-mapped files to user programs, you need to implement an "mmap" system call that takes an open file descriptor and a location in the address space at which to put the file:

char* mmap(OpenFileId id, char* requestedAddress, int* length);

Here, id is the OpenFileId of the file to mmap; requestedAddress points to the virtual address at which the program would like to map the file. If this is NULL (i.e. 0), the kernel may either (a) choose the virtual address on its own (through complicated dynamic virtual address space allocation that most of you will probably not want to do), or (b) fail the call. If the program picks the virtual address, it must be aligned on a page boundary, and all the pages that would be needed by the mmap must not overlap any other allocation (e.g. the program's code, data, and stack, or another mmap'ed region). The kernel does not need to mmap more pages as the file grows.

length points to a 32-bit integer that the kernel should fill with the length of the region mapped (beware endianness!).

mmap returns the virtual address of the beginning of the mapped region. If requestedAddress was non-NULL, mmap must only try to use this address. Otherwise, mmap is free to allocate any virtual address region that does not overlap with any other allocation.
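
The address checks described above (page alignment and no overlap with existing allocations) can be sketched as follows. This is only an illustration under stated assumptions: the 128-byte page size, the Region struct, and the function names are hypothetical, and the sketch takes option (b), failing the call when the requested address is NULL.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kPageSize = 128;  // assumed page size in bytes

// An existing allocation: code, data, stack, or another mapped file.
struct Region {
    std::uint32_t start;   // page-aligned start address
    std::uint32_t length;  // length in bytes, a multiple of kPageSize
};

// Rounds a byte count up to a whole number of pages.
std::uint64_t RoundUpToPage(std::uint64_t bytes) {
    return (bytes + kPageSize - 1) / kPageSize * kPageSize;
}

// Returns true if a file of fileLength bytes can be mapped at addr.
bool CanMapAt(std::uint32_t addr, std::uint32_t fileLength,
              const std::vector<Region> &regions) {
    if (addr == 0) return false;              // option (b): fail on NULL
    if (addr % kPageSize != 0) return false;  // must be page-aligned
    const std::uint64_t end = addr + RoundUpToPage(fileLength);
    for (const Region &r : regions) {
        assert(r.start % kPageSize == 0);
        const std::uint64_t rEnd = std::uint64_t(r.start) + r.length;
        // Half-open intervals [addr, end) and [r.start, rEnd) overlap.
        if (addr < rEnd && r.start < end) return false;
    }
    return true;
}
```

On success, the kernel would record the new region and store the mapped length (here, the rounded-up size is one candidate) through the length pointer.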

From Problem 2, you already have the mechanism to bring missing pages of the mapped file into memory -- in fact, you can use the same mechanism to bring in code and data pages from the executable on program startup.

Note that the consequence of supporting mmap is that address spaces will be sparsely populated with (potentially lots of) segments, one per open file. For Problems 1 and 2, you should design your software translation mechanisms with this in mind.

Implement a utility program (such as UNIX "cp") that exploits the new interface.

Modifications to the file are committed to disk when the file is closed. Note that you may have consistency problems if you call Read or Write on a mmap'ed file. We do not require that you deal with this, but you should know how you could deal with it.


Evaluation

Evaluate the performance of your system. Cache misses (in this case, TLB misses and page faults) can be divided into three categories.

1. Compulsory misses are those due to the first reference to a cached item; no matter what, you have to pull each referenced page off disk and put it into memory and into the TLB.

2. Capacity misses are those due to the size of the cache; if the "working set" of the program is larger than main memory or the number of TLB entries, the program will incur misses. Capacity misses are those that would not occur in an infinite-sized cache.

3. Conflict misses are those due to the replacement policy of the cache. These would not occur if the cache used an "optimal" replacement policy, for the same program running on the same size cache.

Write a set of "useful" user programs that demonstrate both a small and a large number of each kind of miss, for both the TLB and paging from disk. In other words, write one test program that demonstrates a small number of capacity TLB misses, then one that demonstrates a small number of capacity page faults, then one that demonstrates a large number of capacity TLB misses, etc.
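
One way to make the three categories concrete is a small offline simulation over a page-reference trace. The sketch below is not part of Nachos and its names are hypothetical. It assumes LRU as the stand-in for the policy under test and Belady's farthest-next-use rule as the "optimal" policy, and it follows the definitions above: compulsory misses are the distinct pages referenced, capacity misses are the additional misses the optimal policy still takes at this cache size, and conflict misses are whatever the real policy adds on top of that.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <set>
#include <vector>

struct MissBreakdown {
    int compulsory;
    int capacity;
    int conflict;
};

// Misses under LRU (front of the list is the most recently used page).
int CountLruMisses(const std::vector<int> &refs, std::size_t frames) {
    std::list<int> lru;
    int misses = 0;
    for (int page : refs) {
        auto it = std::find(lru.begin(), lru.end(), page);
        if (it != lru.end()) {
            lru.erase(it);
        } else {
            ++misses;
            if (lru.size() == frames) lru.pop_back();
        }
        lru.push_front(page);
    }
    return misses;
}

// Misses under Belady's rule: evict the resident page whose next
// reference lies farthest in the future (or never occurs).
int CountOptimalMisses(const std::vector<int> &refs, std::size_t frames) {
    std::vector<int> resident;
    int misses = 0;
    for (std::size_t i = 0; i < refs.size(); ++i) {
        if (std::find(resident.begin(), resident.end(), refs[i]) !=
            resident.end()) {
            continue;  // hit
        }
        ++misses;
        if (resident.size() < frames) {
            resident.push_back(refs[i]);
            continue;
        }
        std::size_t victim = 0;
        std::size_t farthest = 0;
        for (std::size_t r = 0; r < resident.size(); ++r) {
            auto nextIt = std::find(refs.begin() + i + 1, refs.end(),
                                    resident[r]);
            std::size_t next =
                static_cast<std::size_t>(nextIt - refs.begin());
            if (next > farthest) {
                farthest = next;
                victim = r;
            }
        }
        resident[victim] = refs[i];
    }
    return misses;
}

MissBreakdown Classify(const std::vector<int> &refs, std::size_t frames) {
    assert(frames > 0);
    const int compulsory =
        static_cast<int>(std::set<int>(refs.begin(), refs.end()).size());
    const int optimal = CountOptimalMisses(refs, frames);
    const int lru = CountLruMisses(refs, frames);
    return {compulsory, optimal - compulsory, lru - optimal};
}
```

For example, the cyclic trace 0, 1, 2, 0, 1, 2 with two frames gives 3 compulsory, 1 capacity, and 2 conflict misses under these assumptions, while with three frames only the 3 compulsory misses remain.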

As an example, both sort.c and matmult.c in the "test" directory demonstrate a large number of conflict misses for most standard paging policies.

For each test case, explain its performance on your system, and say how you might improve the performance of your system. You will probably find it useful to reduce the size of main memory (in machine.h), to more quickly incur paging behavior.


Starting and using the Code

As mentioned above, there is no new code. You need to run a Makefile with different flags set.

It is probably helpful to start with the code you have implemented so far, so begin in your current nachos project directory.

1. Go to the code directory and make sure the /vm subdirectory exists.

2. Use the Makefile under /code/Makefile. You should now be ready.

3. Type make in the code directory and check that you get a nachos executable in the directory vm/.

4. If you run ./nachos -x ../test/halt you should get an assertion failure saying that the TLB is uninitialized.


Grading

Although there is no formal walkthrough for this assignment, you might want to answer the following questions to help yourself learn the code.

1. Explain how address translation works in this assignment. Be specific! As you are reading the code, note that the flags "VM" and "USE_TLB" are now defined. This means that any code between an #ifdef VM and its matching #endif will be included by the preprocessor and compiled.

2. Where is the TLB defined? How big is it?

3. When does the TLB get updated? Is there more than one place?

4. What happens on a TLB miss?

5. Do you need a page table? What data structures does the OS need to keep to handle memory references?

6. What page replacement policy will you use? Discuss its implementation.

7. What prefetching policy will you use? Discuss its implementation.

8. What cleaning policy will you use? Discuss its implementation.

 

Due date: Friday July 28 by 11:59pm.


What to turn in:
1. Short description (less than a page) of: the main challenge of this project and what was most beneficial to your learning.
2. Pack all the project files to a FirstName_LastNameProj3.zip file, using WinZip or a similar application.
3. Submit the zipped package by using the computer science department's anonymous FTP server
(cs.pitt.edu/incoming/CS1550/khalifa/
or ftp://ftp.cs.pitt.edu/incoming/CS1550/khalifa/). Please make sure to submit to the project number directory (project3).

Submissions must be made before the due date; otherwise, they will be considered late. Once you submit, you can't take the file back! Therefore, please submit only your final program, and submit it only once.

Good luck!