一、介绍


这个实验的目的是向 JOS 中添加内存管理,包括两部分内容:

  1. physical memory allocator: for the kernel,粒度为 4096B,一页(Page)的大小,选用适当的数据结构来记录 Physical Page 的分配情况和有多少进程在共享这些 Physical Page。

  2. virtual memory: which maps the virtual addresses used by kernel and user software to addresses in physical memory,映射通过 MMUPage table 共同完成。

 

二、准备


在实验之前,先看下面五个文件:

  1. inc/memlayout.h
  2. kern/pmap.c
  3. kern/pmap.h
  4. kern/kclock.h
  5. kern/kclock.c

memlayout.h describes the layout of the virtual address space that you must implement by modifying pmap.c. The code in pmap.c needs to read this device hardware in order to figure out how much physical memory there is, but that part of the code is done for you: you do not need to know the details of how the CMOS hardware works.

memlayout.h and pmap.h define the PageInfo structure that you’ll use to keep track of which pages of physical memory are free.

kclock.c and kclock.h manipulate the PC’s battery-backed clock and CMOS RAM hardware, in which the BIOS records the amount of physical memory the PC contains, among other things.

Pay particular attention to memlayout.h and pmap.h, since this lab requires you to use and understand many of the definitions they contain. You may want to review inc/mmu.h, too, as it also contains a number of definitions that will be useful for this lab.

下面是 inc/memlayout.h 中设置的虚拟内存布局图例:

                                                Permissions
                                                kernel/user
        4 Gig -> +------------------------------+
                 |                              | RW/--
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                 :            ...               :
                 |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
                 |                              | RW/--
                 |   Remapped Physical Memory   | RW/--
                 |                              | RW/--
    KERNBASE, -> +------------------------------+ 0xf0000000  -----------------+
    KSTACKTOP    |     CPU0's Kernel Stack      | RW/-- KSTKSIZE(8*PGSIZE)     |
                 | - - - - - - - - - - - - - - -|                              |
                 |      Invalid Memory (*)      | --/-- KSTKGAP(8*PGSIZE)      |
                 +------------------------------+                              |
                 |     CPU1's Kernel Stack      | RW/-- KSTKSIZE               |
                 | - - - - - - - - - - - - - - -|                         PTSIZE(1024*4096B
                 |      Invalid Memory (*)      | --/-- KSTKGAP                |
                 +------------------------------+                              |
                 :              .               :                              |
      MMIOLIM -> +------------------------------+ 0xefc00000  -----------------+
                 |       Memory-mapped I/O      | RW/--  PTSIZE
ULIM,MMIOBASE -> +------------------------------+ 0xef800000      
                 |  Cur. Page Table (User R-)   | R-/R-  PTSIZE    +-------+-----+-----+
         UVPT -> +------------------------------+ 0xef400000 ----> | 0x3BD | 0x0 | 0x0 |
                 |          RO PAGES            | R-/R-  PTSIZE    +-------+-----+-----+
       UPAGES -> +------------------------------+ 0xef000000
                 |           RO ENVS            | R-/R-  PTSIZE
   UTOP,UENVS -> +------------------------------+ 0xeec00000
   UXSTACKTOP -> |     User Exception Stack     | RW/RW  PGSIZE(4096B)
                 +------------------------------+ 0xeebff000
                 |       Empty Memory (*)       | --/--  PGSIZE
    USTACKTOP -> +------------------------------+ 0xeebfe000
                 |      Normal User Stack       | RW/RW  PGSIZE
                 +------------------------------+ 0xeebfd000
                 |                              |
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                 .                              .
                 |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
                 |     Program Data & Heap      |
        UTEXT -> +------------------------------+ 0x00800000  ------+
                 |                              |                   |
       PFTEMP -> |       Empty Memory (*)       |                 PTSIZE
                 |                              |                   |
        UTEMP -> +------------------------------+ 0x00400000  ------+
                 |       Empty Memory (*)       |                   |
                 | - - - - - - - - - - - - - - -|                   |
                 |  User STAB Data (optional)   |                 PTSIZE
    USTABDATA -> +------------------------------+ 0x00200000        |
                 |       Empty Memory (*)       |                   |
            0 -> +------------------------------+             ------+

 (*) Note: The kernel ensures that "Invalid Memory" is *never* mapped.
     "Empty Memory" is normally unmapped, but user programs may map pages
     there if desired.  JOS user programs map pages temporarily at UTEMP.

为了方便找到映射关系,下面贴上物理内存布局:

 +---------------+ <- oxFFFFFFFF (4GB)
 |    32-bit     |
 | memory mapped |
 |    devices    |
 +---------------+
 |               |
 |    unused     |
 |               |
 +---------------+ <- depends on amont of RAM
 |    exetend    |
 |    memory     |
 +---------------+ <- 0x00100000 (1MB EXTPHYSMEM) Begin of kernel
 |    BIOS ROM   |
 +---------------+ <- 0x000F0000 (960KB)
 | 16-bot devices|
 +---------------+ <- 0x000C0000 (768KB)
 |  VGA display  |
 +---------------+ <- 0x000A0000 (640KB IOPHYSMEM)  -------------+
 |               |                                               |
 +---------------+ <- 0x00007c00 begin of bootloader        Low Memory 
 |    stack      |                                               |
 +---------------+ <- 0x00000000   ------------------------------+

这两张图结合 kern/pmap.c 文件,来完成 lab2 的练习。

 

三、Physical Page Management


在 xv6 中,使用 struct run 结构体来描述物理页,而在 JOS 中,则使用 struct PageInfo 来描述,并且与 xv6 不同的是,struct PageInfo 没有内嵌到 free page 中。

通过操作 struct PageInfo 这个数据结构,来完成 Physical Page Allocator

需要实现以下几个函数:

  1. boot_alloc()
  2. mem_init()
  3. page_init()
  4. page_alloc()
  5. page_free()

在开始之前,我们需要关注一下 pmap.c/i386_detect_memory() 这个函数,它通过调用 nvram_read() 读取 NVRAM 来获取物理内存的相关信息:

Ext16mem: 114688K
npages: 32768, npages_basemem = 160, 
Physical memory: 131072K available, base = 640K, extended = 130432K

只有知道了物理内存的大小,才能完成对物理内存的管理。

3.1 boot_alloc()

This simple physical memory allocator is used only while JOS is setting up its virtual memory system.
page_alloc() is the real allocator. If n>0, allocates enough pages of contiguous physical memory to hold ‘n’ bytes. Doesn’t initialize the memory. Returns a kernel virtual address. If n==0, returns the address of the next free page without allocating anything. If we’re out of memory, boot_alloc should panic. This function may ONLY be used during initialization, before the page_free_list list has been set up.

static void *
boot_alloc(uint32_t n)
{
	static char *nextfree;	// virtual address of next byte of free memory
	char *result;

	if (!nextfree) {
		extern char end[]; /*这个值在kernel.ld中定义,是虚拟地址,kernel img的结束地址*/
		nextfree = ROUNDUP((char *) end, PGSIZE); // 
	}

	if (n > 0) {
		result = nextfree; 
		nextfree = ROUNDUP(nextfree + n, PGSIZE);
		if ((uint32_t)nextfree / 1024 / PGSIZE > npages)
			panic("boot_alloc out of memory");
		return result; 
	}
	if (n == 0)
		return nextfree;
	return NULL;
}

注意虽然分配了物理页,但是应该返回分配前的地址。且需返回虚拟地址。

在 kernel.ld 中定义了 end 的值,这个值是一个虚拟地址,而 nextfree 的初始值又是根据 end 得来的,所以 nextfree 也是一个虚拟地址,也可以这么说,虽然我们是在分配物理内存,但是操作的地址只能是虚拟地址,这是因为 kernel 本质上也是一个软件(特殊的软件),在计算机中软件使用的地址都是虚拟地址。只不过在 Kernel 中可以简单的使用 KERNELRBASE 这个偏移来完成虚拟地址与物理地址之间的转换而已。

到目前为止,唯一需要用到物理地址的地方就是设置 PDEPTE 的时候,他们的前20位(PPN)只能存放物理地址(重要!)。

3.2 mem_init()

Set up a two-level page table: kern_pgdir is its linear (virtual) address of the root This function only sets up the kernel part of the address space(ie. addresses >= UTOP). The user part of the address space will be set up later. From UTOP to ULIM, the user is allowed to read but not write. Above ULIM the user cannot read or write.

在 xv6 中,KERNBASE 以上都是属于内核地址空间,而在 JOS 中则不是这样。

在 xv6 中已经学到,每个进程中的 kernel 部分的映射都是一样的,所以这里先设置 kernel 部分的映射,即设置 kernel 部分的 Page Directory 和 Page Table。

//////////////////////////////////////////////////////////////////////
// create initial page directory.

kern_pgdir = (pde_t *) boot_alloc(PGSIZE); /*分配一页物理内存存放Kernel Page Directory*/
memset(kern_pgdir, 0, PGSIZE);

这是第一次使用 boot_alloc() 来分配物理内存,所以 Page Directory 在物理内存上紧跟在 kernel 镜像的后面。

//////////////////////////////////////////////////////////////////////
// Allocate an array of npages 'struct PageInfo's and store it in 'pages'.
// The kernel uses this array to keep track of physical pages: for
// each physical page, there is a corresponding struct PageInfo in this
// array.  'npages' is the number of physical pages in memory.  Use memset
// to initialize all fields of each struct PageInfo to 0.
// Your code goes here:

pages = (struct PageInfo *)boot_alloc(npages * sizeof(struct PageInfo));
memset(pages, 0, npages * sizeof(struct PageInfo));

这是第二次使用 boot_alloc(),紧跟在 Page Directory 后面分配了 npages*6B 大小的物理内存,用来存放物理页的分配情况 PageInfo,注意这个函数还有一部分会在后面完成。

3.3 page_init()

Tracking of physical pages. The ‘pages’ array has one ‘struct PageInfo’ entry per physical page. Pages are reference counted, and free pages are kept on a linked list(page_free_list). Initialize page structure and memory free list. After this is done, NEVER use boot_alloc again. ONLY use the page allocator functions below to allocate and deallocate physical memory via the page_free_list.

按照下面的步骤完成函数:

  1. Mark physical page 0 as in use.This way we preserve the real-mode IDT and BIOS structures in case we ever need them. (Currently we don’t, but…)
  2. The rest of base memory, [PGSIZE, npages_basemem * PGSIZE) is free.
  3. Then comes the IO hole [IOPHYSMEM, EXTPHYSMEM), which must never be allocated.
  4. Then extended memory [EXTPHYSMEM, …). Some of it is in use, some is free. Where is the kernel in physical memory? Which pages are already in use for page tables and other data structures?

NB: DO NOT actually touch the physical memory corresponding to free pages!

根据上面的步骤,我画了一个简易的物理内存的使用图(until now):

+------------------+
|        ...       |
|    not allocat   |
+------------------+ <- boot_alloc(0) - KERNBASE
| npage * PageInfo | 
+------------------+ <- pages,这段内存会映射到虚拟内存[UPAGES, UVPT]
|    kern_pgdir    |
+------------------+ <- end - KERNBASE
|      kernel      |
+------------------+ <- EXTPHYSMEM
|     I/O hole     |
+------------------+ <- IOPHYSMEM
|   Base memory    |
+------------------+

pages 这块内存里的 PageInfo 数组,描述了整个物理内存的 Page 分配情况。

npage * PageInfo =32768 * 6B=196608B,而[UPAGES, UVPT]大小为 1024 * 4096B=4194304B,4194304B»>196608B,可见[UPAGES, UVPT]用来映射全部物理页面是足够的。

根据上面的内存使用情况,对pages数组分段初始化,代码如下:

void page_init(void)
{
	size_t i;
	pages[0].pp_ref = 1; /*将第0页标记为使用*/
	for (i = 1; i < npages_basemem; i++)
	{
		pages[i].pp_ref = 0;
		pages[i].pp_link = page_free_list;
		page_free_list = &pages[i];
	}
	for (i = EXTPHYSMEM / PGSIZE; i < (uint32_t)(boot_alloc(0) - KERNBASE) / PGSIZE; i++)
		pages[i].pp_ref = 1;
	for (i = PADDR(boot_alloc(0)) / PGSIZE; i < npages; i++)
	{
		pages[i].pp_ref = 0;
		pages[i].pp_link = page_free_list;
		page_free_list = &pages[i];		
	}
}

屏幕快照 2019-09-17 上午10.49.17.png

注意通过 boot_alloc(0) 得到的是虚拟地址,需要将其转换成物理地址使用。

3.4 page_alloc()

Allocates a physical page. If (alloc_flags & ALLOC_ZERO), fills the entire returned physical page with ‘\0’ bytes. Does NOT increment the reference count of the page - the caller must do these if necessary (either explicitly or via page_insert). Be sure to set the pp_link field of the allocated page to NULL so page_free can check for double-free bugs. Returns NULL if out of free memory. Hint: use page2kva and memset

为了能更加直观看出内存的分布情况,将内存中 pages 部分放大来看,就是下面这样:

+------------------+
|      ...         |
|  not allocated   |
+------------------+ --------------+
|      ......      |               |
+------------------+ ----+         |
|     pp_link      |     |         |
+------------------+     1         |
|  pp_ref |  ...   |     |         |-> 这段内存映射到虚拟内存[UPAGES, UVPT]
+------------------+ ----+         |
|     pp_link      |     |         |
+------------------+     0         |
|  pp_ref |  ...   |     |         |
+------------------+ <---+-pages---+
|    kern_pgdir    |
+------------------+ <- end - KERNBASE
|      kernel      |
+------------------+ <- EXTPHYSMEM(1MB)
|       ...        |
+------------------+
|  Page 1 for free |
+------------------+
|  Page 0 in use   |
+------------------+

标记 0 处的 PageInfo 描述了 Page 0 的使用情况,标记 1 处的 PageInfo 描述了 Page 1 的使用情况,以此类推…

struct PageInfo * page_alloc(int alloc_flags)
{
	if (!page_free_list)
		return NULL; /*空闲列表为空的话就不能分配*/
	struct PageInfo *return_page = page_free_list; /*返回链表头*/
	/*分配一个物理页*/
	page_free_list = return_page->pp_link; /**/
	return_page->pp_link = NULL; /*从空闲链表中移除*/
	if (alloc_flags & ALLOC_ZERO)
		/*memset为内核函数,需要操作虚拟地址*/
		memset(page2kva(return_page), 0, PGSIZE);
	return return_page;
}

这里理解一下 page2kva() 这个函数:

static inline physaddr_t page2pa(struct PageInfo *pp)
{
    return (pp - pages) << PGSHIFT; 
}
static inline void* page2kva(struct PageInfo *pp)
{
	return KADDR(page2pa(pp));
}

+----------+
|   ...    |       
+----------+
| PageInfo |
+----------+
| PageInfo |
+----------+ <- pp
|   ...    |
+----------+
| PageInfo |
+----------+
| PageInfo |
+----------+
| PageInfo |
+----------+ <- pages 
|   ...    |
+----------+

这里有个指针方面很基础的知识,由于 PageInfo 与其描述的物理内存中相同偏移量的物理页相对应,所以 pp - pages 结果的含义为 pp 所指的 PageInfo 在 pages 中的偏移量,那么 (pp - pages) << PGSHIFT 就表示这个物理页的物理地址,接着再使用 KADDR() 将物理地址转换为虚拟地址。

3.5 page_free()

Return a page to the free list. This function should only be called when pp->pp_ref reaches 0. You may want to panic if pp->pp_ref is nonzero or pp->pp_link is not NULL.

void page_free(struct PageInfo *pp)
{
    if (pp->pp_ref)
        panic("pp->pp_ref != 0");
    if (pp->pp_link)
        panic("pp->pp_link != NULL");
    pp->pp_link = page_free_list;
    page_free_list = pp;
}

第二个 if 语句保证了只会释放一个物理页。

 

四、Virtual Memory


在开始这部分之前,阅读 Intel 80386 Reference Manual 第五六章。

4.1 Virtual,Linear and Physical Address

A virtual address consists of a segment selector and an offset within the segment.

A linear address is what you get after segment translation but before page translation.

A physical address is what you finally get after both segment and page translation and what ultimately goes out on the hardware bus to your RAM.

         Selector  +--------------+      +-----------+
        ---------->|              |      |           |
                   | Segmentation |      |  Paging   |
Software           |              |----->|           |----->  RAM
       Offset(eip) |  Mechanism   |      | Mechanism |
        ---------->|              |      |           |
                   +--------------+      +-----------+
           VA                        LA                 PA

在 boot/boot.S 中,我们设置了如下的 GDT

# Bootstrap GDT
.p2align 2                           # force 4 byte alignment
gdt:
  SEG_NULL				       # null seg
  SEG(STA_X|STA_R, 0x0, 0xffffffff)  # code seg
  SEG(STA_W, 0x0, 0xffffffff)	       # data seg

这个 GDT 表设置了 3 个 segment descriptor,其中 code seg 和 data seg 的 base address=0 且 limit=0xFFFFFFFF,这就意味着在这个 GDT 的映射下,virtual address 与 linear address 在数值上是相等的。我们将利用 GDT 来设置特权级相关的内容。

所以 GDT 不会在地址转换上做什么变化。一般来说,GDT 对我们程序有影响的就是其 descriptor 中的一些标志位。

回想一下 lab1,我们在 entrypgdir.c 设置过一个简陋的页目录和页表:

pde_t entry_pgdir[NPDENTRIES] = {
    // Map VA's [0, 4MB) to PA's [0, 4MB)
    [0] = ((uintptr_t)entry_pgtable - KERNBASE) + PTE_P,
    // Map VA's [KERNBASE, KERNBASE+4MB) to PA's [0, 4MB)
    [KERNBASE>>PDXSHIFT] = ((uintptr_t)entry_pgtable - KERNBASE) + PTE_P + PTE_W
};
pte_t entry_pgtable[NPTENTRIES] = {
	0x000000 | PTE_P | PTE_W,
	0x001000 | PTE_P | PTE_W,
	0x002000 | PTE_P | PTE_W,
	0x003000 | PTE_P | PTE_W,
	...
  0x3fe000 | PTE_P | PTE_W,
	0x3ff000 | PTE_P | PTE_W,  /*the last PTE*/
};

顺便说一句,上面的 Page Directory/Page Table,GDT 都被编译进了 kernel 里,所以在加载完 kernel 后,这 3 张表的内容自然就存放在了物理内存 kernel 的范围内。

4.1.1 Exercise 2

Use the xp command in the QEMU monitor and the x command in GDB to inspect memory at corresponding physical and virtual addresses and make sure you see the same data.

(gdb) x /10x 0xf0100000
0xf0100000 <_start+4026531828>: 0x1badb002      0x00000000      0xe4524ffe      0x7205c766
0xf0100010 <entry+4>:   0x34000004      0x3000b812      0x220f0011      0xc0200fd8
0xf0100020 <entry+20>:  0x0100010d      0xc0220f80
(gdb) x /10x *0xf0100000
0x1badb002:     Cannot access memory at address 0x1badb002
-------------------------------------------------
(qemu) xp /10x 0x100000
0000000000100000: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
0000000000100010: 0x34000004 0x3000b812 0x220f0011 0xc0200fd8
0000000000100020: 0x0100010d 0xc0220f80

可以看到,通过 VA 或是 PA 查询,内容都是相同的,前提是要执行到设置 entry_pgdir 的后面再查询,页表没设置好是不能正常映射的。

大致瞄了一眼 QEMU monitor 的文档,发现里面还有许多强大的功能,以后再慢慢发掘吧!

在前面学到,启动保护模式后,任何使用到的地址都是虚拟地址(VA),即使是 kernel 也不例外,但由于 kernel 是最接近底层硬件的软件,有时候还是会使用到物理地址(例如在建立 PDE 和 PTE 的时候,存放在 entry 中的 PPN 就是物理地址)。为了在 JOS kernel 中对二者加以区分:uintptr_t 表示虚拟地址,physaddr_t 表示物理地址,它们都是定义自 uint32_t

如果我们在内核中遵守了这个约定,就不能对 physaddr_t 解引用(解引用就意味着要访问内存单元,也就需要 MMU 进行地址转换),因为凡是我们在软件(包括kernel)中使用到的地址,MMU 都会将其当做 VA,即使我们在编程时将它当做物理地址(physaddr_t),但是 MMU 还是会将其当做 VA,并将其翻译成 PA,这样一来就会解析到错误的物理内存处。

在 JOS kernel 中我们经常需要在 VA 与 PA 间转换,由于 entry_pgdirentry_pgtable 的设置,我们只需 VA - KERNBASE 就能得到 PA,同样 PA + KERNBASE 就能得到 VA。

假设下面的一段 kernel 中的代码是正确的:

mystery_t x;
char* value = return_a_pointer();
*value = 10;
x = (mystery_t) value;

因为在第 3 行对 value 进行了解引用,说明 value 指针里是一个虚拟地址,那么在第 4 行里只能将这个指针转换成 uintptr_t 类型,即 mystery_t 只能是 uintptr_t 类型。

4.1.2 Reference counting

在操作系统中,由于对多进程的支持,同一个 Physical Page 经常会经由不同进程的 Page Table 映射到不同位置的虚拟地址空间,我们使用 struct PageInfopp_ref 域来说明目前有多少个进程在引用这个 Page。

一般来说,因为 UTOP 上面的空间是在系统启动时就已经映射好了,并且不能被释放,所以 pp_ref 的值应该等于所有进程中在 UTOP 之下的 Page Tables 中 Physical Page 映射的次数。

We’ll also use it to keep track of the number of pointers we keep to the page directory pages and, in turn, of the number of references the page directories have to page table pages.

通过 page_alloc() 返回的页的 pp_ref 值为 0,一旦我们对这个返回页做了某些操作(例如将其插入进页表),就应该将 pp_ref 的值增加 1。有时会有函数来做这个操作(例如 page_insert()),sometimes the function calling page_alloc() must do it directly.

4.1.3 Page Table Management

完成下面几个操作页表的函数:

  1. pgdir_walk()
  2. boot_map_region()
  3. page_lookup()
  4. page_remove()
  5. page_insert()
(1) pgdir_walk():

Given ‘pgdir’, a pointer to a page directory, pgdir_walk returns a pointer to the page table entry (PTE) for linear address ‘va’. This requires walking the two-level page table structure.

The relevant page table page might not exist yet.If this is true, and create == false, then pgdir_walk returns NULL.Otherwise, pgdir_walk allocates a new page table page with page_alloc.

  1. Hint 1: you can turn a PageInfo * into the physical address of the page it refers to with page2pa() from kern/pmap.h.
  2. Hint 2: the x86 MMU checks permission bits in both the page directory and the page table, so it’s safe to leave permissions in the page directory more permissive than strictly necessary.
  3. Hint 3: look at inc/mmu.h for useful macros that manipulate page table and page directory entries.

结合上面的提示和参考 xv6 中 walkpgdir() 函数的实现,完成代码如下:

pte_t * pgdir_walk(pde_t *pgdir, const void *va, int create)
{
	pde_t *pde;
  pte_t *pgtab;
	struct PageInfo *pp;
	pde = &pgdir[PDX(va)];
	
	if (*pde & PTE_P)
	{
		pgtab = KADDR(PTE_ADDR(*pde)); /*存在相应的page table,那就直接返回相应的PTE*/
		return &pgtab[PTX(va)]; /*返回VA对应的PTE*/
	} else if (!create) {
		return NULL;
	} else {
		pp = page_alloc(1);  /*这里一定要填1,对分配的物理页面置零,不然分配出来的PTE上可能有脏值*/
		if (!pp) return NULL;
		pp->pp_ref++;
		*pde = page2pa(pp) | PTE_P | PTE_W | PTE_U; /*由于是新加的page table,所以要在PDE中标记其存在*/
		pgtab = KADDR(page2pa(pp));
		return &pgtab[PTX(va)]; /*返回VA对应的PTE*/
	}
}
(2) boot_map_region():
Map [va, va+size) of virtual address space to physical [pa, pa+size) in the page table rooted at pgdir. Size is a multiple of PGSIZE, and va and pa are both page-aligned. Use permission bits perm PTE_P for the entries. This function is only intended to set up the ``static’‘ mappings above UTOP. As such, it should not change the pp_ref field on the mapped pages.

结合上面的提示和参考 xv6 中 mappages() 函数的实现,完成代码如下:

static void boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
  char *a, *last;
  pte_t *pte;
  a = (char*)PGROUNDDOWN(va);
  last = (char*)PGROUNDDOWN((va) + size - 1);
  for(;;){
    if((pte = pgdir_walk(pgdir, a, 1)) == 0)
      return -1;
    if(*pte & PTE_P)
      panic("remap");
    /*
     * pte = &pgtab[PTX(va)]; 向虚拟地址中间10位指定的PTE中填充数据
     */
    *pte = pa | perm | PTE_P; 
    if(a == last)
      break;
    /*接着处理下一页*/
    a += PGSIZE;
    pa += PGSIZE;
  }
  return 0;
}

这个函数的目的就是添加 VA 与 PA 间的映射,总结一下就是:先通过 VA 的前 20 位找到 PTE,然后在此 PTE 的后 12 位加入 PA 和标志位相关信息。

为了计算方便,将 xv6 中的两个宏添加到 JOS 的 mmu.h 中:

#define PGROUNDUP(sz)  (((sz)+PGSIZE-1) & ~(PGSIZE-1))
#define PGROUNDDOWN(a) (((a)) & ~(PGSIZE-1))
(3) pgdir_lookup():

Return the page mapped at virtual address ‘va’. If pte_store is not zero, then we store in it the address of the pte for this page. This is used by page_remove and can be used to verify page permissions for syscall arguments, but should not be used by most callers. Return NULL if there is no page mapped at va. Hint: the TA solution uses pgdir_walk and pa2page.

page_lookup(pde_t *pgdir, void *va, pte_t **pte_store)
{
	pte_t *pte;
	physaddr_t pa;
	/*只是查找,不应该建立Page Table,所以第三个参数为0*/
	pte = pgdir_walk(pgdir, va, 0);
	if (!pte || !(*pte & PTE_P)) return NULL;
	if (pte_store)
    	*pte_store = pte;
	pa = PTE_ADDR(*pte);
	return pa2page(pa); /*返回对应的物理页state*/
}
(4) page_remove():

Unmaps the physical page at virtual address ‘va’. If there is no physical page at that address, silently does nothing.

Details:

  1. The ref count on the physical page should decrement.
  2. The physical page should be freed if the refcount reaches 0.
  3. The pg table entry corresponding to ‘va’ should be set to 0.(if such a PTE exists)
  4. The TLB must be invalidated if you remove an entry from the page table.

Hint: The TA solution is implemented using page_lookup, tlb_invalidate, and page_decref.

void page_remove(pde_t *pgdir, void *va)
{
	pte_t *pte = NULL;
	struct PageInfo *pp = page_lookup(pgdir, va, &pte);
	if (!pp)
		return;
	*pte = 0;
	page_decref(pp); /*引用减一*/
	tlb_invalidate(pgdir, va);
}

(5) page_insert():

Map the physical page ‘pp’ at virtual address ‘va’. The permissions (the low 12 bits) of the page table entry should be set to ‘perm PTE_P’.

Requirements

  1. If there is already a page mapped at ‘va’, it should be page_remove().
  2. If necessary, on demand, a page table should be allocated and inserted into ‘pgdir’.
  3. pp->pp_ref should be incremented if the insertion succeeds.
  4. The TLB must be invalidated if a page was formerly present at ‘va’.

Corner-case hint: Make sure to consider what happens when the same pp is re-inserted at the same virtual address in the same pgdir. However, try not to distinguish this case in your code, as this frequently leads to subtle bugs; there’s an elegant way to handle everything in one code path.

RETURNS:

  1. 0 on success
  2. E_NO_MEM, if page table couldn’t be allocated

Hint: The TA solution is implemented using pgdir_walk, page_remove, and page2pa.

int page_insert(pde_t *pgdir, struct PageInfo *pp, void *va, int perm)
{
	pte_t *pte = pgdir_walk(pgdir, va, 1);
	if ( !pte ) return -E_NO_MEM;
	pp->pp_ref++;
	if (*pte & PTE_P)
		page_remove(pgdir, va);
	*pte = page2pa(pp) | perm | PTE_P; /*删除掉原来的映射后,建立新的映射*/
	tlb_invalidate(pgdir, va);
	return 0;
}

 

五、Kernel Address Space


JOS divides the processor’s 32-bit linear address space into two parts. User environments (processes), will have control over the layout and contents of the lower part, while the kernel always maintains complete control over the upper part. The dividing line is defined somewhat arbitrarily by the symbol ULIM in inc/memlayout.h, reserving approximately 256MB of virtual address space for the kernel.

5.1 Permissions and Fault Isolation

Since kernel and user memory are both present in each environment’s address space, we will have to use permission bits in our x86 page tables to allow user code access only to the user part of the address space. Note that the writable permission bit (PTE_W) affects both user and kernel code!

The user environment will have no permission to any of the memory above ULIM, while the kernel will be able to read and write this memory. For the address range [UTOP,ULIM), both the kernel and the user environment have the same permission: read-only. This range of address is used to expose certain kernel data structures read-only to the user environment. Lastly, the address space below UTOP is for the user environment to use; the user environment will set permissions for accessing this memory.

再次留意这句话:This range of address([UTOP,ULIM)) is used to expose certain kernel data structures read-only to the user environment.

5.2 Initializing the Kernel Address Space

Now you’ll set up the address space above UTOP: the kernel part of the address space. inc/memlayout.h shows the layout you should use. You’ll use the functions you just wrote to set up the appropriate linear to physical mappings.

从 xv6 的学习中可以知道,设置 kernel 部分的地址空间,也就是建立 kernel 部分的页表相关内容。在 xv6 中对于 kernel 部分,一致的使用 kmap 这个数组来描述 xv6 kernel 部分虚拟内存和物理内存的映射关系:

static struct kmap {
  void *virt; /*e.g. KERNBASE*/
  uint phys_start; /*e.g. 0*/
  uint phys_end; /*e.g. EXTMEM*/
  int perm; /*e.g. PTE_W*/
} kmap[] = {
 { (void*)KERNBASE, 0,             EXTMEM,    PTE_W}, // I/O space
 { (void*)KERNLINK, V2P(KERNLINK), V2P(data), 0},     // kern text+rodata
 { (void*)data,     V2P(data),     PHYSTOP,   PTE_W}, // kern data+memory
 { (void*)DEVSPACE, DEVSPACE,      0,         PTE_W}, // more devices
};

xv6 和 JOS 的虚拟内存布局有挺大区别的,这里不过多描述,参照二者的 mmu.h 即可。

5.4 Exercise 5

这个练习要求完成 mem_init() 中空缺的内容,由于只是映射 kernel 部分,下面只贴出 UTOP 以上的布局:

                                                Permissions
                                                kernel/user
        4 Gig -> +------------------------------+
                 |                              | RW/--
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                 :            ...               :
                 |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
                 |                              | RW/--
                 |   Remapped Physical Memory   | RW/--
                 |                              | RW/--
    KERNBASE, -> +------------------------------+ 0xf0000000  -----------------+
    KSTACKTOP    |     CPU0's Kernel Stack      | RW/-- KSTKSIZE(8*PGSIZE)     |
                 | - - - - - - - - - - - - - - -|                              |
                 |      Invalid Memory (*)      | --/-- KSTKGAP(8*PGSIZE)      |
                 +------------------------------+                              |
                 |     CPU1's Kernel Stack      | RW/-- KSTKSIZE               |
                 | - - - - - - - - - - - - - - -|                         PTSIZE(1024*4096B
                 |      Invalid Memory (*)      | --/-- KSTKGAP                |
                 +------------------------------+                              |
                 :              .               :                              |
      MMIOLIM -> +------------------------------+ 0xefc00000  -----------------+
                 |       Memory-mapped I/O      | RW/--  PTSIZE
ULIM,MMIOBASE -> +------------------------------+ 0xef800000      
                 |  Cur. Page Table (User R-)   | R-/R-  PTSIZE    +-------+-----+-----+
         UVPT -> +------------------------------+ 0xef400000 ----> | 0x3BD | 0x0 | 0x0 |
                 |          RO PAGES            | R-/R-  PTSIZE    +-------+-----+-----+
       UPAGES -> +------------------------------+ 0xef000000
                 |           RO ENVS            | R-/R-  PTSIZE
   UTOP,UENVS -> +------------------------------+ 0xeec00000

下面根据提示依次完成映射:

[UPAGES, UVPT):

//////////////////////////////////////////////////////////////////////
// Map 'pages' read-only by the user at linear address UPAGES
// Permissions:
//    - the new image at UPAGES -- kernel R, user R
//      (ie. perm = PTE_U | PTE_P)
//    - pages itself -- kernel RW, user NONE
// Your code goes here:
boot_map_region(kern_pgdir, UPAGES, PTSIZE,  PADDR(pages),  PTE_U)

[KSTACKTOP-KSTKSIZE, KSTACKTOP):

//////////////////////////////////////////////////////////////////////
// Use the physical memory that 'bootstack' refers to as the kernel
// stack.  The kernel stack grows down from virtual address KSTACKTOP.
// We consider the entire range from [KSTACKTOP-PTSIZE, KSTACKTOP)
// to be the kernel stack, but break this into two pieces:
//     * [KSTACKTOP-KSTKSIZE, KSTACKTOP) -- backed by physical memory
//     * [KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE) -- not backed; so if
//       the kernel overflows its stack, it will fault rather than
//       overwrite memory.  Known as a "guard page".
//     Permissions: kernel RW, user NONE
// Your code goes here:
boot_map_region(kern_pgdir, KSTACKTOP-KSTKSIZE, KSTKSIZE, PADDR(bootstack), PTE_W);

从上面的注释可知,kernel stack 映射到了物理内存 kernel 部分 data section 的 bootstacktop 标记处,通过 .space 指令为内核栈预留了 KSTKSIZE 大小的物理内存空间,从 data section 的 0 地址偏移开始。通过 objdump -h kernel 能看到 data section 的起始地址。

TODO:这里有个疑惑,既然只使用到了 [KSTACKTOP-KSTKSIZE, KSTACKTOP) 部分空间,为什么还要区分 CPU0 与 CPU1 的内核栈呢?

.data  # 切换到 data section
#####################################################
# boot stack
#####################################################
	.p2align	PGSHIFT		# force page alignment
	.globl		bootstack
bootstack:
	.space		KSTKSIZE  # 预留 KSTKSIZE 大小的物理内存
	.globl		bootstacktop   
bootstacktop:

[KERNBASE, 2^32):

//////////////////////////////////////////////////////////////////////
// Map all of physical memory at KERNBASE.
// Ie.  the VA range [KERNBASE, 2^32) should map to
//      the PA range [0, 2^32 - KERNBASE)
// We might not have 2^32 - KERNBASE bytes of physical memory, but
// we just set up the mapping anyway.
// Permissions: kernel RW, user NONE
// Your code goes here:
boot_map_region(kern_pgdir, KERNBASE, 0xFFFFFFFF - KERNBASE, 0, PTE_W);

some questions:

What entries (rows) in the page directory have been filled in at this point? What addresses do they map and where do they point? In other words, fill out this table as much as possible:

要回答这个问题,就是要弄明白截止现在有哪些地方设置了 PDE,这一点可以通过 boot_map_region() 函数的调用情况知晓。

第三列的 point to,实质上就是弄清楚 PDE 里前 20bit(PPN) 的值,它是一个物理地址,指向内存。

+-----+--------------------+--------------------+
|Entry|Base Virtual Address|Point to(logically) |
+-----+--------------------+--------------------+
|1023 |          ?         |         1.         |
+-----+--------------------+--------------------+
|1024 |          ?         |         ?          |
+-----+--------------------+--------------------+
|  .  |          ?         |                    |
+-----+--------------------+--------------------+
|0x3C0| KERNBASE=0xF0000000|                    |
+-----+--------------------+--------------------+
|0x3BF| MMIOLIM=0xEFC00000 |                    |
+-----+--------------------+--------------------+
|  .  |          ?         |         ?          |
+-----+--------------------+--------------------+
|0x3BD|   UVPT=0xEF400000  |all PTEs(内存位置暂定)|
+-----+--------------------+--------------------+
|0x3BE|  UPAGES=0xef000000 |   npages           |
+-----+--------------------+--------------------+
|  .  |          ?         |         ?          |
+-----+--------------------+--------------------+
|  2  |     0x00800000     |         ?          |
+-----+--------------------+--------------------+
|  1  |     0x00400000     |         ?          |
+-----+--------------------+--------------------+
|  0  |     0x00000000     |[see next question] |
+-----+--------------------+--------------------+

We have placed the kernel and user environment in the same address space. Why will user programs not be able to read or write the kernel’s memory? What specific mechanisms protect the kernel memory?

PTE 和 PDE 分为高 20bits 和低 12bits,其中高 20bits 指向一个物理页(PDE的高20bits指向一个Page Table物理页面,PTE的高20bits指向保存数据的物理页面),低 12bits 用来描述当前指令对这个物理页的操作许可。

What is the maximum amount of physical memory that this operating system can support? Why?

+------------------+
|        ...       |
|    not allocat   |
+------------------+ <- boot_alloc(0) - KERNBASE
| npage * PageInfo | 
+------------------+ <- pages,这段内存会映射到虚拟内存[UPAGES, UVPT]
|    kern_pgdir    |
+------------------+ <- end - KERNBASE
|      kernel      |
+------------------+ <- EXTPHYSMEM
|     I/O hole     |
+------------------+ <- IOPHYSMEM
|   Base memory    |
+------------------+

上图中,虚拟内存 [UPAGES, UVPT) 大小 4MB,所以物理内存对应 npage×PageInfo 也最多 4MB,因为 PageInfo 大小为 8B,所以 Physical Memory=((4MB/8B)*4096)B=2GB

How much space overhead is there for managing memory, if we actually had the maximum amount of physical memory? How is this overhead broken down?

如果要管理 2GB 内存的话,Page Table 需要 2MB,需要一张 Page Directory 4KB,最后还需要管理物理内存的 struct PageInfo 4MB。

Revisit the page table setup in kern/entry.S and kern/entrypgdir.c. Immediately after we turn on paging, EIP is still a low number (a little over 1MB). At what point do we transition to running at an EIP above KERNBASE? What makes it possible for us to continue executing at a low EIP between when we enable paging and when we begin running at an EIP above KERNBASE? Why is this transition necessary?

entry.S 中执行完 call i386_init 后,EIP 指向 KERNBASE 以上的空间。

在启动分页后,进入 i386_init() 函数这段时间里,EIP之所以还能在低地址执行,是因为 entry_pgdir 中第 0 号 entry 设置了 [0, 4MB) -> [0, 4MB) 映射。

TODO:这个实验还提供了一些 challenge 题目,抽空再做!

 

总结


assert() 和 cprintf() 函数配合使用调试。

最后注意的一点是在 pgdir_walk() 中 page_alloc() 的时候参数一定要填 1,这样分配出来的页才不会有脏值,这里比较坑,调试了好久才找出来。

执行 ./grade-lab2:

得分