/* $Id: Docs-RawMode.cpp $ */
/** @file
* This file contains the documentation of the raw-mode execution.
*/
/*
* Copyright (C) 2006-2020 Oracle Corporation
*
* This file is part of VirtualBox Open Source Edition (OSE), as
* available from http://www.virtualbox.org. This file is free software;
* you can redistribute it and/or modify it under the terms of the GNU
* General Public License (GPL) as published by the Free Software
* Foundation, in version 2 as it comes in the "COPYING" file of the
* VirtualBox OSE distribution. VirtualBox OSE is distributed in the
* hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
*/
/** @page pg_raw Raw-mode Code Execution
*
* VirtualBox 0.0 thru 6.0 implemented a mode of guest code execution that
* allowed executing mostly raw guest code directly on the host CPU but without
* any support from VT-x or AMD-V. It was implemented before AMD64, AMD-V and
* VT-x were available (former) or even specified (latter two). This mode was
* removed in 6.1 (code ripped out) as it was mostly unused by that point and
* not worth the effort of maintaining.
*
* A future VirtualBox version may reintroduce a new kind of raw-mode for
* emulating non-x86 architectures, making use of the host MMU to efficiently
* emulate the target MMU. This is just a wild idea at this point.
*
*
* @section sec_old_rawmode Old Raw-mode
*
* Running guest code unmodified on the host CPU is reasonably unproblematic for
* ring-3 code when it runs without IOPL=3. There will be some information
* leaks thru CPUID, a bunch of 286 era unprivileged instructions revealing
* privileged information (like SGDT, SIDT, SLDT, STR, SMSW), and hypervisor
* selectors can probably be identified using VERR, VERW and such instructions.
* However, it generally works fine for half-friendly software when the CPUID
* difference between the target and host isn't too big.
*
* Kernel code can be executed on the host CPU too, however it needs to be
* pushed up a ring (guest ring-0 to ring-1, guest ring-1 to ring-2) to let the
* hypervisor (VMMRC.rc) be in charge of ring-0. Ring compression causes
* issues when CS or SS are pushed and inspected by the guest, since the values
* will have bit 0 set whereas the guest expects that bit to be cleared. In
* addition there are problematic instructions like POPF and IRET that the guest
* code uses to restore/modify EFLAGS.IF state, however the CPU just silently
* ignores changes to EFLAGS.IF when it isn't running in ring-0 (or with an
* appropriate IOPL), which causes major headaches. The SIDT, SGDT, STR, SLDT
* and SMSW instructions also cause problems since they will return information
* about the hypervisor rather than the guest state and cannot be trapped.
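*
* The two ring compression problems described above can be modelled in a few
* lines of C++. This is only an illustrative sketch and not code from
* VirtualBox: all names in it (selectorRpl, compressRing0Selector,
* popfProtectedBits, popfProtMode) are made up here, the selector helper only
* covers the ring-0 to ring-1 case, and the POPF model only covers the IF and
* IOPL bits of protected-mode POPF.
*
```cpp
#include <cstdint>

// Illustrative constants; the names are assumptions made for this sketch.
constexpr uint32_t kEflIf        = UINT32_C(1) << 9;   // EFLAGS.IF is bit 9
constexpr uint32_t kEflIoplShift = 12;                 // EFLAGS.IOPL is bits 12-13
constexpr uint32_t kEflIoplMask  = UINT32_C(3) << kEflIoplShift;

// Privilege level held in the low two bits of a selector value.
constexpr unsigned selectorRpl(uint16_t uSel)
{
    return uSel & 3u;
}

// Selector value the guest sees on its stack when its ring-0 code really
// runs in ring-1: bit 0 is set although the guest expects it to be clear.
constexpr uint16_t compressRing0Selector(uint16_t uGuestSel)
{
    return selectorRpl(uGuestSel) == 0 ? uint16_t(uGuestSel | 1u) : uGuestSel;
}

// EFLAGS bits that protected-mode POPF leaves untouched at the given CPL:
// IOPL can only be changed at CPL 0, IF only when CPL <= IOPL.
constexpr uint32_t popfProtectedBits(uint32_t fOld, unsigned uCpl)
{
    return (uCpl != 0 ? kEflIoplMask : 0u)
         | (uCpl > ((fOld & kEflIoplMask) >> kEflIoplShift) ? kEflIf : 0u);
}

// Simplified model of protected-mode POPF returning the new EFLAGS value.
// Protected bits keep their old value and no exception is raised.
constexpr uint32_t popfProtMode(uint32_t fOld, uint32_t fPopped, unsigned uCpl)
{
    return (fPopped & ~popfProtectedBits(fOld, uCpl))
         | (fOld    &  popfProtectedBits(fOld, uCpl));
}
```
*
* In this model, guest kernel code running in ring-1 with IOPL=0 that pops a
* flags image with IF clear still ends up with IF set, and nothing traps to
* tell the hypervisor that the guest wanted interrupts disabled.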
*
* So, guest kernel code needed to be scanned (by CSAM) and problematic
* instructions or sequences patched or recompiled (by PATM).
*
* The raw-mode execution operated in a slightly modified guest memory context,
* so memory accesses could be done directly without any checking or masking.
* The modification was to insert the hypervisor in an unused portion of the
* page tables, making it float around and requiring it to be relocated when
* the guest mapped code into the area it was occupying.
*
* The old raw-mode code was 32-bit only because its inception predates the
* availability of the AMD64 architecture and the promise of AMD-V and VT-x made
* it unnecessary to do a 64-bit version of the mode. (A long-mode port of the
* raw-mode execution hypervisor could in theory have been used for both 32-bit
* and 64-bit guests, making the relocating unnecessary for 32-bit guests,
* however the fact that v8086 mode does not work when the CPU is operating in
* long-mode made it a little less attractive.)
*
*
* @section sec_rawmode_v2 Raw-mode v2
*
* The vision for the reinvention of raw-mode execution is to put it inside
* VT-x/AMD-V and run non-native instruction sets via a recompiler.
*
* The main motivation is TLB emulation using the host MMU. An added benefit
* would be that the non-native instruction sets would be add-ons put on top of
* the existing x86/AMD64 virtualization product and therefore not require a
* completely separate product build.
*
*
* Outline:
*
* - Plug-in based, so the target architecture specific stuff is mostly in
* separate modules (ring-3, ring-0 (optional) and raw-mode images).
*
* - Only 64-bit mode code (no problem since VirtualBox requires a 64-bit host
* since 6.0). So, not reintroducing structure alignment pain from old RC.
*
* - Map the RC-hypervisor modules as ROM, using the shadowing feature for the
* data sections.
*
* - Use MMIO2-like regions for all the memory that the RC-hypervisor needs,
* all shared with the associated host side plug-in components.
*
* - The ROM and MMIO2 regions do not directly end up in the saved state, the
* state is instead saved by the ring-3 architecture module.
*
* - Device access thru MMIO mappings could be done transparently thru to the
* x86/AMD64 core VMM. It would however be possible to reintroduce the RC
* side device handling, as that will not be removed in the old-RC cleanup.
*
* - Virtual memory managed by the RC-hypervisor, optionally with help of the
* ring-3 and/or ring-0 architecture modules.
*
* - The mapping of the RC modules and memory will probably have to be runtime
* relocatable again, like it was in the old RC. Though initially and for
* 32-bit target architectures, we will probably use a fixed mapping.
*
* - Memory accesses must unfortunately be range checked before being issued,
* in order to prevent the guest code from accessing the hypervisor. The
* recompiled code must be able to run, modify state, call ROM code, update
* statistics and such, so we cannot use page table stuff to protect the
* hypervisor code & data. (If long mode implemented segment limits, we
* could've used that, but it doesn't.)
*
* - The RC-hypervisor will make hypercalls to communicate with the ring-0 and
* ring-3 host code.
*
* - The host side should be able to dig out the current guest state from
* information (think AMD64 unwinding) stored in translation blocks.
*
* - Non-atomic state updates outside TBs could be flagged so the host knows
* how to roll them back.
*
* - SMP must be taken into account early on.
*
* - As must existing IEM-based recompiler ideas, preferably sharing code
* (basically compiling IEM targeting the other architecture).
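*
* The range check called for in the outline could look roughly like the
* following. It is a host-side model of the test the recompiler would have to
* emit before each guest memory access, written under assumptions: the
* HyperWindow structure and isGuestAccessAllowed function are hypothetical
* names, and the RC-hypervisor is assumed to occupy a single contiguous window
* that does not wrap around the end of the address space.
*
```cpp
#include <cstdint>

// Hypothetical description of the address window occupied by the
// RC-hypervisor modules and their memory.
struct HyperWindow
{
    uint64_t uBase; // address of the first byte of the window
    uint64_t cb;    // size of the window in bytes
};

// Returns true when the guest access [uAddr, uAddr + cbAccess) is non-empty,
// does not wrap around, and stays completely clear of the hypervisor window.
constexpr bool isGuestAccessAllowed(HyperWindow const &Wnd, uint64_t uAddr, uint64_t cbAccess)
{
    return cbAccess != 0
        && cbAccess - 1 <= UINT64_MAX - uAddr                       // no wrap-around
        && (   uAddr + (cbAccess - 1) < Wnd.uBase                   // entirely below
            || (uAddr >= Wnd.uBase && uAddr - Wnd.uBase >= Wnd.cb)); // entirely above
}
```
*
* Checking the last byte of the access rather than only its start address
* matters here, since an access starting just below the window could otherwise
* straddle into hypervisor memory.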
*
* The actual implementation will depend a lot on which architectures are
* targeted and how they can be mapped onto AMD64/x86. It is even possible that
* there are some significant roadblocks preventing us from using the host MMU
* efficiently. AMD64 is for instance rather low on virtual address space
* compared to several other 64-bit architectures, which means we'll generate a
* lot of \#GPs when the guest tries to access space reserved on AMD64. The
* proposed 5-level page tables will help with this, of course, but that needs
* to get into silicon and into user computers for it to be really helpful.
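*
* To make the address space limitation concrete, here is a small sketch of the
* AMD64 canonical address rule. The function name isCanonical is made up for
* this illustration. The bit counts are 48 implemented virtual address bits
* with 4-level page tables and 57 with 5-level page tables.
*
```cpp
#include <cstdint>

// True when uAddr is canonical for a CPU implementing cVirtBits bits of
// virtual address: bits 63 down to cVirtBits - 1 must be all zeros or all
// ones. Expects 1 <= cVirtBits <= 64.
constexpr bool isCanonical(uint64_t uAddr, unsigned cVirtBits)
{
    return (uAddr >> (cVirtBits - 1)) == 0
        || (uAddr >> (cVirtBits - 1)) == (UINT64_MAX >> (cVirtBits - 1));
}
```
*
* For example, 0x0000800000000000 is not canonical with 48 bits but is with
* 57 bits, so a guest access there would only fault in the former case.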
*
* One thing that helps a lot is that we don't have to consider 32-bit x86 any
* more, meaning that the recompiler only needs to generate 64-bit code and can
* assume having 15-16 GPRs at its disposal.
*
*/