[Repost] Virtualization for System Programmers

Source: http://www.codeproject.com/Articles/215458/Virtualization-for-System-Programmers
Please credit the source when reposting!


Virtualization for System Programmers
Michael Chourdakis, 3 Jul 2011

Introduction
This article targets the user who has first read my ASM tutorial (http://www.codeproject.com/KB/system/asm.aspx ) and wants to learn how Virtualization works.  You want to create your own VMWare workstation? Let's go!

Background
Required items:
  • A complete understanding of how the CPU works in protected and long mode - read my article:  http://www.codeproject.com/KB/system/asm.aspx.
  • Bochs Source - recompilation with VMX extensions. VMWare (or other virtualizers) won't work - at least, as far as I know. Also chances are that my code could use features not found in your cpu version - but you can try.  Oh, and you can test it on a raw DOS PC if you are really brave.
  • Very good Assembly knowledge
  • Flat Assembler (http://flatassembler.net/)
  • FreeDos (or any other DOS you might have licensed)
  • LOTS OF PATIENCE

If you are a beginner programmer, quit right now.

If you are an advanced programmer, quit right now.

If you are an expert programmer, quit right now. When you start reading this article you will feel like a beginner anyway.

BUT, since I am a beginner too, you will be eventually able to read what I have to say because I felt the same after I read the virtualization manuals. So keep reading!

Startup
We will create an application that prepares the CPU for virtualization, creates a guest, enters it and exits.  All this will be done in x64 mode for simplicity. In x86 it is also possible, but we will focus on the x64 architecture to avoid unnecessary overhead in the code.

The code demonstrates only the basic VMX features and it might not work in your own CPU. However you can use bochs with virtualization enabled and then you will be able to test my code.

Terminology
  • VMM (Virtual Machine Monitor)
    The hosting application
  • VM (Virtual Machine)
    The guest application
  • Root Operation
    The code/context the VMM runs
  • Non Root Operation
    The code/context the VM runs
  • VMX Transition
    Going from host to guest (VMEntry) or from guest to host (VMExit)
  • VMCS
    A structure to control a VM and VMX transitions.
  • VM Entry
    A transition from the host application to the guest.
  • VM Exit
    A transition from the guest to the host due to some reason.


Life cycle of VMX operations
  • VMM checks for CPU virtualization (CPUID) and enables it (CR4 and VMXON)
  • VMM initializes a control structure, called VMCS, for each VM. It clears the structure with VMCLEAR, tells the CPU where it is with VMPTRLD (VMPTRST reads the current pointer back), and reads/writes its fields with VMREAD and VMWRITE.
  • VMM enters a VM using VMLAUNCH or VMRESUME
  • The VM returns to the VMM through a VM exit (an event, not an instruction)
  • Do all the above over and over again
  • VMM eventually shuts itself down with VMXOFF

Does my CPU have Virtualization Support?
Yes (or you wouldn't be reading this by now anyway), but if you still want to verify, check bit 5 of ECX after a CPUID with EAX = 1:
    mov eax,1
    cpuid
    bt ecx,5
    jc VMX_Supported
    jmp VMX_NotSupported
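The same test can be sketched in C for readers who want to check the bit arithmetic outside ring 0. This is only an illustration: `vmx_supported` is a hypothetical helper name, and it takes the ECX value that CPUID returned for EAX = 1 as a plain argument instead of executing CPUID itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.1:ECX bit 5 is the VMX feature flag tested by the assembly above. */
#define CPUID_1_ECX_VMX (UINT32_C(1) << 5)

/* Hypothetical helper: cpuid_1_ecx is the ECX value returned by CPUID with EAX = 1. */
bool vmx_supported(uint32_t cpuid_1_ecx)
{
    return (cpuid_1_ecx & CPUID_1_ECX_VMX) != 0;
}
```

Passing the register value in keeps the function usable with any CPUID wrapper you already have.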
Reading implementation specific parameters
After you know your CPU supports VMX operations, you should check the IA32_VMX_BASIC MSR (index 0x480) to check implementation-specific information for your CPU:
    mov ecx, 0480h
    rdmsr
This 64-bit MSR has a lot of information, but at the moment we are interested in 2 fields:
  • Bits 0 - 30   : VMX Revision Number (bit 31 always reads as 0, so the low dword can be stored as it is)
  • Bits 32 - 44 : Number of bytes (up to 4096) that a VMXON region or a VMCS should be.
The VMX revision (4 bytes) should be put in every VMCS/VMXON structure so the processor knows the format that should be used to store data in it. Each VMCS/VMXON structure size should be exactly the number of bytes indicated by bits 32-44 (max 4096).
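As a small illustration of the two fields, here is a C sketch that extracts them from the 64-bit MSR value (EDX:EAX after rdmsr). The function names are hypothetical and not part of my assembly code.

```c
#include <stdint.h>

/* Revision identifier: the low dword of IA32_VMX_BASIC. Bit 31 always reads
 * as 0, so masking 31 bits gives the same value as taking the whole dword. */
uint32_t vmx_basic_revision(uint64_t vmx_basic)
{
    return (uint32_t)(vmx_basic & UINT64_C(0x7FFFFFFF));
}

/* Bits 32-44: size in bytes of a VMXON region or VMCS (at most 4096). */
uint32_t vmx_basic_region_size(uint64_t vmx_basic)
{
    return (uint32_t)((vmx_basic >> 32) & UINT64_C(0x1FFF));
}
```

With rdmsr, the value to pass in is `((uint64_t)edx << 32) | eax`.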

Enabling VMX operations
  • Enter Long Mode.
  • Set CR4's bit 13 to 1. This bit enables the VMX operations.
  • Set CR0's bit 5 to 1 (NE)  - this is required for the VMXON to succeed.
  • Initialize a VMXON region.
  • Make sure the A20 line is enabled, that is, that A20 masking (A20M) is off. VMXON raises a general protection fault if the processor is in A20M mode.
  • Execute the VMXON instruction.
A VMCS is a 4-KB aligned memory area used to support VM operations. It consists of 3 parts: 4 bytes that hold the revision number (the value returned by MSR 0x480), 4 bytes that are used for VMX Abort data (more on this later), and the rest, which is a collection of six groups of fields that control the VM operations.
A VMXON region is a memory area with the same size and alignment as a VMCS, in which you only need to initialize the revision number: put the revision returned by MSR 0x480 above into its first 4 bytes.
The VMXON instruction requires a memory operand (e.g. VMXON [rdi]). The memory at that address should contain the 64-bit physical address of the VMXON region (4-KB aligned), and the first 4 bytes of that region should contain the VMX revision.
File: VMX.ASM
Func: VMX_Enable
CR4 bit set for VMX operations
    mov rax,cr4
    bts rax,13
    mov cr4,rax
Enable VMX
    mov [rdi],ebx ; Put the revision in the first 4 bytes. rdi points to the VMXON region and ebx holds the revision
    VMXON [rsi]   ; rsi points to a qword in memory that holds the physical address of the VMXON region
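The region setup that comes before VMXON can be sketched in C as follows. This is a minimal sketch under assumptions: `vmx_region_init` is a hypothetical name, the alignment check is done on the pointer value (which equals the physical address only in an identity-mapped setup like the one used here), and the size is the one reported by bits 32-44 of MSR 0x480.

```c
#include <stdint.h>
#include <string.h>

#define VMX_REGION_ALIGN 4096u

/* Zero a VMXON/VMCS region and store the revision in its first 4 bytes.
 * Returns 0 on success, -1 if the region is not 4KB aligned or is too
 * small to hold the revision dword. */
int vmx_region_init(void *region, uint32_t size, uint32_t revision)
{
    if (((uintptr_t)region % VMX_REGION_ALIGN) != 0 || size < sizeof revision)
        return -1;
    memset(region, 0, size);
    memcpy(region, &revision, sizeof revision); /* first 4 bytes = revision */
    return 0;
}
```

The same helper works for a VMCS, since both structures start with the revision dword.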
The VMCS Groups
It was easy so far, but here starts your hell. The rest of the VMCS (that is, after the first 8 bytes: revision + VMX Abort) is divided into 6 subgroups:
  • Guest State
  • Host State
  • Non root controls
  • VMExit controls
  • VMEntry controls
  • VMExit information
Each of the above fields contains important information about how the VM starts (State after a VMEntry), what is the host state after a VMExit, when a VMExit will occur and others.
File: VMX.ASM
Func: VMX_TryGuest and VMX_TryGuest2
The Guest State
This contains the following information (in parentheses, the field size in bits):
  • CR0,CR3,CR4,DR7,RSP,RIP,RFLAGS, (64 each)
  • For each of CS,SS,DS,ES,FS,GS,LDTR,TR:
    • Selector (16)
    • Base address (64)
    • Segment limits (32)
    • Access rights (32)
  • For GDTR and IDTR:
    • Base address (64)
    • Limit (32)
  • IA32_DEBUGCTL (64)
  • IA32_SYSENTER_CS (32)
  • IA32_SYSENTER_ESP (64)
  • IA32_SYSENTER_EIP (64)
  • IA32_PERF_GLOBAL_CTRL (64)
  • IA32_PAT (64)
  • IA32_EFER (64)
  • SMBASE (32)
  • Activity State (32) - 0 Active, 1 Inactive (HLT executed), 2 Triple fault occurred, 3 waiting for startup IPI (SIPI).
  • Interruptibility state (32) - a state that defines some features that should be blocked in the VM - more on that later.
  • Pending debug exceptions (64) - to facilitate hardware breakpoints with DR7 - more on that later.
  • VMCS Link pointer (64) - reserved, set to 0xFFFFFFFFFFFFFFFF.
  • *VMX Preemption timer value (32) - more on this later.
  • *Page Directory pointer table entries (4x64) - pointers to pages - more on this later.
The guest state describes the values of the registers that the cpu has after a VMEntry. Because you can totally control the registers, you can start a VM in any mode (real, protected, long etc). But even if you are to start a real mode VM (as my code does) you have to initialize the segment registers as normal p-mode selectors, with proper limits, access rights etc.
The values that are used for the segment registers (limits, base address, selector, access rights and flags) are the same as those used in ordinary protected mode, so for example you will see my code using a 0x92 access byte for a DS read/write data segment.
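To show how a protected-mode descriptor maps onto the 32-bit access rights field, here is a hedged C sketch. The helper name is hypothetical. It assumes the usual VMCS layout in which bits 0-7 are the GDT access byte, bits 12-15 are the descriptor flags (AVL, L, D/B, G) and bit 16 marks a segment as unusable.

```c
#include <stdint.h>

/* Bit 16 of the access rights field: the segment is unusable. */
#define VMX_SEGMENT_UNUSABLE (UINT32_C(1) << 16)

/*
 * access: GDT access byte (type, S, DPL, P), e.g. 0x92 for read/write data.
 * flags:  GDT flags nibble: bit 0 = AVL, bit 1 = L, bit 2 = D/B, bit 3 = G.
 */
uint32_t vmx_access_rights(uint8_t access, uint8_t flags)
{
    return (uint32_t)access | ((uint32_t)(flags & 0x0F) << 12);
}
```

For the 0x92 data segment mentioned above with no flags set, the field value is simply 0x92.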
The Host State
This contains the following information (in parentheses, the field size in bits):
  • CR0,CR3,CR4,RSP,RIP (64 each)
  • CS,SS,DS,ES,FS,GS,TR selectors (16 each)
  • FS,GS,TR,GDTR,IDTR base addresses (64 each)
  • IA32_SYSENTER_CS (32)
  • IA32_SYSENTER_ESP (64)
  • IA32_SYSENTER_EIP (64)
  • *IA32_PERF_GLOBAL_CTRL (64)
  • *IA32_PAT (64)
  • *IA32_EFER (64)
The host state tells the cpu how to return to the VMM after a VMExit.
Execution Control Fields
These fields essentially tell the CPU what is allowed to be executed in the VM and what is not. Everything not allowed causes a VMExit. The sections are:
  • Pin-Based (32b) : Interrupts
  • Processor-Based (2x32b)
    • Primary : Single Step, TSC HLT INVLPG MWAIT CR3 CR8 DR0 I/O Bitmaps
    • Secondary:   EPT , Descriptor Table Change, Unrestricted Guest and others.
  • Exception bitmap (32b) : One bit for each exception. If bit is 1, the exception causes a VMExit.
  • I/O bitmap addresses (2x64b) : Controls when IN/OUT cause VMExit.
  • Time Stamp Counter offset
  • CR0/CR4 guest/host masks
  • CR3 Targets
  • APIC Access
  • MSR Bitmaps
My code only uses the pin-based and the processor based for simplicity, but these fields are your real swiss army knife; you can control entirely what the VM is and is not allowed to perform.
VM-Exit Control Fields
These fields tell the CPU what to load and what to discard in case of a VMExit:
  • VMExit Controls (32b)
  • VMExit Controls for MSRs
VM-Entry Control Fields
  • VMEntry Controls (32b)
  • VMEntry Controls for MSRs
  • VMEntry Controls for event injection
This event injection is your second weapon. When a VM exits you can inject an event so the VM believes that the exception was generated by its code.  Yes, a VMM can become really mighty.
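Event injection is driven by the 32-bit VM-entry interruption-information field. The C sketch below builds that value. The function and constant names are my own for illustration, and the layout assumed is: bits 0-7 vector, bits 8-10 event type, bit 11 "deliver error code", bit 31 valid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Event types for bits 8-10 of the interruption-information field. */
enum {
    VMX_EVENT_EXTERNAL_INTERRUPT = 0,
    VMX_EVENT_NMI                = 2,
    VMX_EVENT_HW_EXCEPTION       = 3,
    VMX_EVENT_SW_INTERRUPT       = 4
};

/* Build the value to write into the VM-entry interruption-information field. */
uint32_t vmx_entry_intr_info(uint8_t vector, unsigned type, bool has_error_code)
{
    uint32_t info = vector;
    info |= (uint32_t)(type & 0x7u) << 8;
    if (has_error_code)
        info |= UINT32_C(1) << 11;  /* also fill the error-code field */
    info |= UINT32_C(1) << 31;      /* valid bit: inject on the next entry */
    return info;
}
```

For example, injecting a general protection fault (vector 13, hardware exception, with error code) gives 0x80000B0D.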
VM-Exit Information field (read only)
  • Basic information
    • Exit Reason (32)
    • Exit Qualification (64)
    • Guest Linear Address (64)
    • Guest Physical Address (64)
  • Vectored exit information
  • Event delivery exits
  • Instruction execution exits
  • Error field


The VMCS Initialization
To mark a VMCS for further reading/writing with VMREAD or VMWRITE, you would first initialize its first 4 bytes to the revision (as with the VMXON structure above), then execute a VMCLEAR and a VMPTRLD with its address.
Appendix H of the 3B Intel Manual has a list of all indices.
For example, the index of the RIP of the guest is 0x681e. To write the value of 0 to that field we would use:
    mov rax,0681eh
    mov rbx,0
    vmwrite rax,rbx
This means that, after a successful  VM Entry, the guest will start with RIP set to 0.
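The indices are not random numbers: each one encodes the field's width and group. The following C sketch decodes an index. The struct and function names are hypothetical. It assumes the layout from the Intel manual: bit 0 access type, bits 1-9 index, bits 10-11 field type, bits 13-14 width.

```c
#include <stdint.h>

struct vmcs_field {
    unsigned high_half; /* bit 0: 1 = upper 32 bits of a 64-bit field */
    unsigned index;     /* bits 1-9: position inside the group */
    unsigned type;      /* bits 10-11: 0 control, 1 read-only exit info,
                           2 guest state, 3 host state */
    unsigned width;     /* bits 13-14: 0 = 16-bit, 1 = 64-bit, 2 = 32-bit,
                           3 = natural width */
};

/* Split a VMCS field index (as passed to VMREAD/VMWRITE) into its parts. */
struct vmcs_field vmcs_decode(uint32_t encoding)
{
    struct vmcs_field f;
    f.high_half = encoding & 1u;
    f.index     = (encoding >> 1) & 0x1FFu;
    f.type      = (encoding >> 10) & 0x3u;
    f.width     = (encoding >> 13) & 0x3u;
    return f;
}
```

Decoding 0x681e this way gives a natural-width guest-state field with index 15, which is the guest RIP used above.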

The Extended Page Table Mechanism (EPT)  - Part 1
You would think you were finished? Hahahah. Not so fast.  You have to give your new Virtual Machine some memory to work with, and you have to configure the EPT. EPT is a mechanism that translates guest physical addresses to host physical addresses.  And because the structure is similar to Paging structures, we will first review how paging works in x86 and in x64 mode.

Paging
In my previous article I noted that you need Paging and PAE style paging tables for long mode, but I didn't explain how to create such tables. Since they are required here as you see, here is a quick introduction to the paging mechanism.
Paging is the mechanism that maps a virtual address to an actual physical address. Each process can have a full virtual address space without the need for the physical ram to be mapped to the entire range or even installed. The OS maps the same code (e.g. the kernel modules, or the API) to different address spaces for each process. When a process tries to access memory that doesn't exist, a "Page Fault" is generated.
Because paging has so many benefits, it is the only model used in practice in x86 mode (all modern OSes implement paging on top of a "flat" segment model) and it is the only one available in x64.
There are 2 tables that are used for paging: The Paging Directory and the Paging Table. Each one of these tables has a size of 4KB, containing 1024 dword entries. Their addresses must be aligned on a 4KB boundary as well. In the page directory, each entry points to a page table. In the page table, each entry points to a physical address which is mapped to a virtual address. The virtual address is calculated using the offset in the paging directory and the offset in the paging table.
A See Through system (the style we used in the previous article) is a paging system in which each physical address is mapped to the same virtual address.
After you have created the page tables, load the Paging Directory address into CR3, then enable paging by setting CR0 bit 31.
Each dword entry of the 1024 in the Paging Directory has the following format:
0 1 2 3 4 5 6 7 8   9-11   12-31
P R U W D A 0 S G    AV    Address
  • P - Page is present in memory. (This flag allows the OS to swap pages out to disk, clear P, and reload them when a page fault is generated because software attempts to access the page.)
  • R - If set, the page is Read/Write, else Read only.
  • U - If set, anyone can access this page. If not set, only ring 0 can access it.
  • W - If set, write-through caching is enabled.
  • D - If set, the page will not be cached.
  • A - Set if the page has been accessed. As with GDT segments, the CPU sets this flag; clearing it is left to the OS.
  • S - If set, then the pages are 4MB in size. If not set, then pages are 4 KB in size. For 4MB pages the PSE must also be enabled (CR4 bit 4)
  • G - Ignored at the moment
  • AV - Ignored at the moment
  • Address - The upper 20 bits (the lower 12 are ignored since it must be 4KB aligned) of the physical address of the page table that manages this page directory entry. If S is set, then the pages must be 4MB aligned and bits 12 to 21 are reserved.
Each dword entry of the 1024 in the Page Table also has the above format. Here the address field is the physical address of the 4KB page which is mapped (when the S flag is set in the page directory, the directory entry itself maps a 4MB page and no page table is used). The virtual address to which this physical address is mapped is formed by concatenating the page directory index (bits 22-31), the page table index (bits 12-21) and the offset inside the page (bits 0-11). Because each page table maps 1024 * 4KB = 4MB and the page directory has 1024 entries, a maximum of 4MB * 1024 = 4GB (the entire 32-bit space) can be mapped. The G flag, if set, marks the page as global: its translation stays in the TLB when CR3 is reloaded (this requires CR4 bit 7, PGE, to be set).
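The way a 32-bit address selects its entries can be sketched in C. This is an illustration only, not code from my project: the names are hypothetical, and the identity ("see through") table assumes 4KB pages with only the P and R flags set.

```c
#include <stdint.h>

#define PAGE_PRESENT  (UINT32_C(1) << 0)  /* P flag */
#define PAGE_WRITABLE (UINT32_C(1) << 1)  /* R flag */

/* A 32-bit linear address is split 10 / 10 / 12 bits. */
uint32_t pd_index(uint32_t linear)    { return linear >> 22; }
uint32_t pt_index(uint32_t linear)    { return (linear >> 12) & 0x3FFu; }
uint32_t page_offset(uint32_t linear) { return linear & 0xFFFu; }

/* Fill one page table so that it identity-maps 4MB block number `block`,
 * i.e. the block selected by page directory entry `block`. */
void fill_identity_page_table(uint32_t table[1024], uint32_t block)
{
    for (uint32_t i = 0; i < 1024; i++)
        table[i] = ((block << 22) | (i << 12)) | PAGE_PRESENT | PAGE_WRITABLE;
}
```

The matching page directory entry would hold the physical address of this table plus the same two flags.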

Physical Address Extensions (PAE)

PAE is a paging system to allow more than 4GB to be accessed in x86. PAE is mandatory for long mode as we saw. To enable PAE, set the CR4 bit 5 and after that, the CR3 points to a top level PAE table which consists of 4 64-bit entries.
So now there are 3 tables: the PDPT (Page Directory Pointer Table) (4 64-bit entries), the PDT (Page Directory Table) and the PT (Page Table).
But now the "S" bit in the PDT has a different meaning: If not set, it means that the page entry is 4KB but if set, it means that this entry does not point to a PT entry, but it describes itself a 2MB page. So you can have different levels of paging traversal depending on the S bit.
Each of the PDPT entries points to a Page Directory of 4KB (like in normal paging). Each entry in the new Page Directory is now 64 bits long (so there are 512 entries).  Each entry in the new Page Directory points to a Page Table of 4KB (like in normal paging), and each entry in the new Page Table is now 64 bits long, so there are 512 entries. One directory now covers only 512 * 512 * 4KB = 1GB, a quarter of the original mapping, and that's why 4 PDPT entries are supported. The first entry maps the first 1GB, the 2nd the 2nd GB, the 3rd the 3rd GB and finally, the 4th entry maps the 4th GB.
There is a new flag in the Page Directory entry as well, the NX bit (Bit 63) which, if set, prevents  code execution in that page.
This system allows the OS to handle memory over 4GB, but since the address space is still 4GB, each process is still limited to 4GB. The memory can be up to 64GB but a process cannot see the entire memory.

Long Mode Physical Address Extensions

In long mode the PAE system adds a new top level structure, the PML4T, which has 512 64-bit entries, each of which points to a PDPT, and now the PDPT has 512 entries as well (instead of 4 in the x86 mode). So now you can have 512 PDPTs, which means that one PT entry manages 4KB, one PDT entry manages 2MB (4KB * 512 PT entries), one PDPT entry manages 1GB (2MB * 512 PDT entries), and one PML4T entry manages 512 GB (1GB * 512 PDPT entries). Since there are 512 PML4T entries, a total of 256TB (512GB * 512 PML4T entries) can be addressed.
Each of the "S" bits in the PDPT/PDT can be 0 to indicate that there is a lower level structure below, or 1 to indicate that the traversal ends here.  Note that some CPUs do not support 1GB page entries (i.e. the "S" bit set in a PDPT), and neither does bochs by default. My code contains 2 functions for EPT initialization, one for 1GB support and one for PDT-level "S" support, which is most common. If you would like to test the 1GB pages, you must enable it in bochs using a CPUID flag in bochs' settings.
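The four-level traversal uses 9 bits of the address per level. A C sketch of the split follows. The struct and function names are made up for the example, and it assumes the 48-bit layout described above with 4KB pages at the lowest level.

```c
#include <stdint.h>

/* Table indices selected by a 48-bit long mode virtual address. */
struct lm_indices {
    unsigned pml4;   /* bits 39-47 */
    unsigned pdpt;   /* bits 30-38 */
    unsigned pd;     /* bits 21-29 */
    unsigned pt;     /* bits 12-20 */
    unsigned offset; /* bits 0-11  */
};

struct lm_indices lm_split(uint64_t va)
{
    struct lm_indices ix;
    ix.pml4   = (unsigned)((va >> 39) & 0x1FF);
    ix.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    ix.pd     = (unsigned)((va >> 21) & 0x1FF);
    ix.pt     = (unsigned)((va >> 12) & 0x1FF);
    ix.offset = (unsigned)(va & 0xFFF);
    return ix;
}
```

When the "S" bit ends the traversal early, the unused lower indices become part of the offset (21 bits for a 2MB page, 30 bits for a 1GB page).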

The Extended Page Table Mechanism (EPT) - Part 2

Now that we have reviewed how paging works, we are ready to configure the EPT - that is, to give some memory to our Virtual Machine.
Originally the VMX capabilities of the cpu required guests to start in paged protected mode, and VMM applications usually put the virtual cpu into VM86 mode, to allow OSes (which expect a clean real mode boot) to work. Later Intel introduced the "Unrestricted Guest" flag (bit 7 in the secondary processor-based execution controls) that allows a guest to start in real mode, and that is what we will be using here for simplicity. However, putting the virtual cpu in real mode means we have to map the lower 640 KB, so we have to use EPT.
If your CPU doesn't allow the "unrestricted guest" mode, then you can setup a protected mode guest using similar code, because my code creates protected mode style segments anyway.
Of course, depending on your guest's initial state (for example, if you'd want to start a guest in long mode) you would also need to configure Guest PAE, paging, proper CR4 and stuff. But our little application will configure a real mode guest, so it needs to map a region of host memory to guest physical addresses. EPT translation uses the lower 48 bits of the address (as today's CPUs actually do for ordinary paging; the entire 64-bit range is not used).
The fortunate thing is that the EPT table is like the Page table and directories we have seen in our previous article.  It consists of a top level PML4T which consists of 512 64-bit entries. And these entries either reference directly a memory area, or they reference a lower page table (PDPT, PDT or PT). The format of each PML4T entry is:
012   3-7  8-11 12:N-1 N:51 52:63
RWE   R      I       A     R      I
  • RWE -  Read,Write,Execute bits.
  • R - Reserved (should be set to 0)
  • I - Ignored.
  • A - The physical address of the PDPT referenced by this entry.
N is the physical address width supported by the processor and we must execute CPUID with EAX = 0x80000008 to get the physical address width (returned in bits 0-7 of EAX).
The PDPT entry is as follows:
012  3-5   6   7  8-11 12:29  30:N-1  N:51 52:63
RWE  MT PAT  S    I       R        A        R       I
  • RWE -  Read,Write,Execute bits.
  • MT - Memory type (to be discussed later)
  • PAT - PAT Memory Type (to be discussed later)
  • S - 1 to reference a 1GB page, or 0 to reference a PDT
  • I - Ignored.
  • A - The physical address of the PDT referenced by this entry if S is 0, or the physical address of the 1GB page if S is 1.
The PDT entry is as follows:
012  3-5   6   7  8-11 12:20  21:N-1  N:51 52:63
RWE  MT PAT  S    I       R        A        R       I
  • RWE -  Read,Write,Execute bits.
  • MT - Memory type (to be discussed later)
  • PAT - PAT Memory Type (to be discussed later)
  • S - 1 to reference a 2MB page, or 0 to reference a PT
  • I - Ignored.
  • A - The physical address of the PT referenced by this entry if S is 0, or the physical address of the 2MB page if S is 1.
The PT entry is as follows:
012  3-5   6  7-11 12:N-1  N:51 52:63
RWE  MT PAT     I       A        R       I
  • RWE -  Read,Write,Execute bits.
  • MT - Memory type (to be discussed later)
  • PAT - PAT Memory Type (to be discussed later)
  • I - Ignored.
  • A - The physical address of the 4KB page.
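To make the entry formats concrete, here is a hedged C sketch that builds a see-through EPT for the first 1GB using 2MB pages (the PDT-level "S" variant). It is not a copy of my assembly routine: the names are hypothetical, the table physical addresses are passed in by the caller, the memory type is assumed to be 6 (write-back), and the address width N is assumed to be smaller than 64.

```c
#include <stdint.h>

#define EPT_READ  (UINT64_C(1) << 0)
#define EPT_WRITE (UINT64_C(1) << 1)
#define EPT_EXEC  (UINT64_C(1) << 2)
#define EPT_RWX   (EPT_READ | EPT_WRITE | EPT_EXEC)
#define EPT_MT_WB (UINT64_C(6) << 3)  /* MT field, leaf entries only */
#define EPT_LARGE (UINT64_C(1) << 7)  /* the "S" bit */

/* Mask selecting address bits 12..N-1; N comes from CPUID 0x80000008. */
uint64_t ept_addr_mask(unsigned phys_width)
{
    return ((UINT64_C(1) << phys_width) - 1) & ~UINT64_C(0xFFF);
}

/* Identity-map guest physical 0..1GB with 512 pages of 2MB each. */
void ept_identity_map_1gb(uint64_t pml4[512], uint64_t pdpt[512],
                          uint64_t pd[512], uint64_t pdpt_phys,
                          uint64_t pd_phys, unsigned phys_width)
{
    uint64_t mask = ept_addr_mask(phys_width);
    for (unsigned i = 0; i < 512; i++) {
        pml4[i] = 0;  /* not present: R, W and E all clear */
        pdpt[i] = 0;
        pd[i] = (((uint64_t)i << 21) & mask) | EPT_MT_WB | EPT_LARGE | EPT_RWX;
    }
    pml4[0] = (pdpt_phys & mask) | EPT_RWX;  /* PML4T entry -> PDPT */
    pdpt[0] = (pd_phys & mask) | EPT_RWX;    /* PDPT entry -> PDT   */
}
```

The physical address of the PML4T itself then goes into the EPT pointer field of the VMCS.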

Paging configurations in code
Because all paging structures must be aligned to 4096, I've put hard-coded addresses (0x40000 , 0x70000 , 0x160000 etc) and I create the page tables there.

Virtual Machine 1: Real Mode
The initialization code for this VM is in VMX_TryGuest. It is setup so guest CR0 is set to real mode. VENTRY16.ASM has the entry point for our real mode guest, which does little but to set a flag and return to the VMM with VMCall.
Note that chances are that your cpu does not support the "Unrestricted guest" flag, so it cannot start a Virtual Machine in real mode. If so, try the VM2 discussed below. To test if the cpu supports the unrestricted guest, check bit 5 of the IA32_VMX_MISC MSR (index 0x485), which reads as 1 on every cpu that supports the flag (the capability itself is reported in bit 39 of IA32_VMX_PROCBASED_CTLS2, index 0x48B):
    mov rcx,0x485
    rdmsr
    bt rax,5
    jc UnrestrictedGuestSupported
Note that because the VMM has configured protected-mode style selectors, a real mode guest has to execute a JMP to a real-mode style CS (as if we were returning from protected mode)

Virtual Machine 2: Paged Protected Mode
Initialization for this VM is in VMX_TryGuest2. This time CR0 is set to be in protected paged mode, CR3 is loaded with the page directory (the very same used for normal protected mode since our EPT is a see-through). VENTRY32.ASM has the entry point for our protected mode guest and this time the selectors are ready to go. VENTRY32.ASM merely sets a flag and exits to the VMM with VMCall.

Launching the VM
Having initialized the VMCS properly (ok, that's a joke, but I have to say "properly" anyway - prepare for LOTS of failures here), the VMLAUNCH opcode will start the execution of the virtual machine (from the CS:XIP set in the VMCS guest state). If the entry fails, either the C flag (there is no current VMCS) or the Z flag (an error number is stored in the VM-instruction error field of the VMCS) will be set immediately after execution of VMLAUNCH.
This is where BOCHS will help you. After VMLAUNCH fails, the bochs debugger window will show you a message depending on what went wrong, so you will get an idea what to fix in the VMCS.
If VMLAUNCH succeeds, control will not return to the host until a VM Exit occurs. When a VM exit occurs, control is transferred to the VMM's exit routine (as configured in the VMCS host state fields). In my code, the exit routine merely checks the flags set by the guest entry code to know if that code was successfully executed.
Note that, even if VMLAUNCH succeeds, starting the VM might immediately cause a VMExit due to any fault (page faults, EPT misconfigurations etc).  That way VMLAUNCH will succeed but control will immediately return to your exit routine without the VMEntry code to be executed.
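The flag check after a VMX instruction can be sketched in C as below. The names are my own. The sketch assumes the convention from the Intel manual: the C flag means there is no current VMCS to report an error in, and the Z flag means an error number was stored in the VM-instruction error field (index 0x4400).

```c
#include <stdint.h>

enum vmx_result {
    VMX_SUCCEED,       /* CF = 0 and ZF = 0 */
    VMX_FAIL_INVALID,  /* CF = 1: no current VMCS */
    VMX_FAIL_VALID     /* ZF = 1: read the error from VMCS field 0x4400 */
};

#define RFLAGS_CF (UINT64_C(1) << 0)
#define RFLAGS_ZF (UINT64_C(1) << 6)

/* Classify the outcome of a VMX instruction from the RFLAGS value
 * captured right after it (e.g. with pushfq / pop rax). */
enum vmx_result vmx_classify(uint64_t rflags)
{
    if (rflags & RFLAGS_CF)
        return VMX_FAIL_INVALID;
    if (rflags & RFLAGS_ZF)
        return VMX_FAIL_VALID;
    return VMX_SUCCEED;
}
```

A successful VMLAUNCH never reaches the code after it, so in practice this check only runs on the failure path.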

The VM is launched, now what ?
Nothing. The VM executes as if nothing is present, unless you make something present. You now need to implement your own BIOS, copy it into the virtual memory at the proper address (so execution starts from the reset vector at F000:FFF0) and write your drivers to transfer data between the actual hardware and memory and the virtual hardware and memory you may have allowed within the VM. Yup, that's why VMWare Workstation is some 500 MB in size; it contains bios and drivers and communication protocols to allow e.g. a virtual screen (which is seen as an actual device by the guest) to be shown on your actual screen within a window. The same goes for USB hardware, which is duplicated from the actual system to the virtual system.
For a simple test, one might think that it should be easy to copy the actual bios to the virtual memory so, for example, DOS can boot. Right, and from what device will DOS boot since there is none in the VM? That's why you have to duplicate an actual device into the VM using your custom BIOS in order to communicate with the host with a specific protocol, then emulate the allowed devices in order for the VM to function properly.
My application simply forwards memory in a see-through style, so calling BIOS and DOS from the VM is possible. But in real life you don't want to do that, as then the VM can ruin the VMM because they share the same memory.
In real life also, in case the "unrestricted guest" isn't allowed, you have to start the guest in VM86 mode under paged protected mode, and if the guest wants to switch to protected/long mode itself (like an OS does) you must catch the VMExit (which would occur when the guest software attempts to execute LGDT) and emulate all the calls that would otherwise fail (LGDT, LIDT, CR0, Paging initialization etc) so the guest can assume that its operations were successful.

VM Exits
A VMExit can occur for various reasons, either because you had specified a VMExit reason in VMCS control/exit fields, or if the VM actually entered a shutdown state (for example, a ring 0 crash) that would reset the CPU if it would be run in an actual, non virtualized state, or anything else. Execution resumes at the VMCS host state saved (CS:XIP), and you can read the VMCS exit information (read only) to detect the reasons of the exit.
Use VMRESUME to resume the VM after an exit.
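What an exit routine does first is read the Exit Reason field and branch on it. A minimal C sketch of that decoding follows. The function names are hypothetical and only a handful of the basic reasons from the Intel manual are listed; bits 0-15 hold the basic reason and bit 31 is set when the exit was caused by a failed VM entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 0-15 of the Exit Reason field. */
uint16_t vmx_basic_exit_reason(uint32_t reason)
{
    return (uint16_t)(reason & 0xFFFFu);
}

/* Bit 31: the exit was caused by a VM-entry failure. */
bool vmx_entry_failed(uint32_t reason)
{
    return (reason >> 31) != 0;
}

/* A few basic exit reasons; everything else is reported as "other". */
const char *vmx_exit_reason_name(uint32_t reason)
{
    switch (vmx_basic_exit_reason(reason)) {
    case 0:  return "exception or NMI";
    case 1:  return "external interrupt";
    case 2:  return "triple fault";
    case 10: return "CPUID";
    case 12: return "HLT";
    case 18: return "VMCALL";
    case 28: return "control register access";
    case 30: return "I/O instruction";
    case 33: return "VM entry failure: invalid guest state";
    case 48: return "EPT violation";
    case 49: return "EPT misconfiguration";
    default: return "other";
    }
}
```

A guest that leaves with VMCall, as both of my guests do, shows up as reason 18.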

VMCall
Some systems know that they run under virtualization (for example, VMWare drivers) and they do want to jump back to their host in order to exchange information. The VMCall opcode causes a VMExit to the host, and the virtualized system can exchange information with the host. My code also uses VMCall to exit to the host.
Of course, if VMCall is executed outside VMX operation (that is, when no VMM is running), an invalid opcode exception (#UD) is raised.

Control MSRs
For simplicity, my code doesn't check for all features (that's the most probable reason it won't work in your raw DOS), but you should check the VMX MSRs for available features before using them. Intel's 3B Appendix G contains all these MSRs. To read an MSR, you put its number into ECX and execute the rdmsr opcode. The result is returned in EDX:EAX (high dword in EDX, low dword in EAX).
  • IA32_VMX_BASIC (0x480) : Basic VMX information including revision,  VMCS size, memory types and others.
  • IA32_VMX_PINBASED_CTLS (0x481) : Allowed settings for pin-based VM execution controls.
  • IA32_VMX_PROCBASED_CTLS (0x482) : Allowed settings for processor based VM execution controls.
  • IA32_VMX_PROCBASED_CTLS2 (0x48B) : Allowed settings for secondary processor based VM execution controls.
  • IA32_VMX_EXIT_CTLS  (0x483) : Allowed settings for VM Exit controls.
  • IA32_VMX_ENTRY_CTLS  (0x484) : Allowed settings for VM Entry controls.
  • IA32_VMX_MISC MSR (0x485) : Allowed settings for miscellaneous data, such as RDTSC options, unrestricted guest availability, activity state and others.
  • IA32_VMX_CR0_FIXED0  (0x486) and IA32_VMX_CR0_FIXED1 (0x487) : Indicate the bits that must be 1 (FIXED0) and the bits that are allowed to be 1 (FIXED1) in CR0 in the VMX operation.
  • IA32_VMX_CR4_FIXED0 (0x488) and IA32_VMX_CR4_FIXED1 (0x489) : Same for CR4.
  • IA32_VMX_VMCS_ENUM (0x48A) : enumerator helper for VMCS.
  • IA32_VMX_EPT_VPID_CAP (0x48C) : provides information for capabilities regarding VPIDs and EPT.
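The "allowed settings" MSRs (0x481 to 0x484 and 0x48B) share one format, so one helper can adjust any control dword before it is written to the VMCS. The C sketch below is an illustration with hypothetical names. It assumes the format from the Intel manual: a 1 in the low dword means the control bit must be 1, and a 0 in the high dword means the control bit must be 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Force a desired control value into what the capability MSR allows. */
uint32_t vmx_adjust_controls(uint32_t desired, uint64_t cap_msr)
{
    uint32_t must_be_one = (uint32_t)cap_msr;          /* low dword  */
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32);  /* high dword */
    return (desired | must_be_one) & may_be_one;
}

/* True if control bit `bit` (0-31) is allowed to be set to 1. */
bool vmx_control_available(unsigned bit, uint64_t cap_msr)
{
    return ((cap_msr >> (32 + bit)) & 1) != 0;
}
```

If the adjusted value lost a bit you asked for, the feature is not available and the VMM should fall back or refuse to start.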

Creating the Hypervisor Virus
So far we have been interested in the VM science all right, but the programmer's soul will always contain notorious feelings like killing, revenge, cheating, cracking and all that sort of stuff.
Now we'll take into account the fact that you are evil (or you wouldn't have made it up to here) and discuss Blue Pill. Blue Pill is a hypervisor virus that controls the entire OS. For this to work you would simply map the entire memory as a see-through and start the VM with Windows in it, while configuring almost anything to cause a VMExit. Now whatever Windows tries will be reported to your hypervisor via VMExits, and using the injection technology you can fake any response - and since Intel doesn't have a (known) way to detect if an application is running in Virtualization, you will never get caught. Never? Who knows - but if you ever get caught let me assure you that I know nothing about it.
But wait! CR4's 13th bit should be 1 inside a Virtual Machine so if that bit is 0 you definitely know you are not virtualized! But if this bit is 1, do you really know if you are under a VMM? Who knows. If anybody gets the Windows Loader source and finds out that a mov eax,cr4 - test eax,0x2000 - jnz WE_ARE_OWNED sequence is there, let me know.
Another possible option to test if you are owned is the VMCall, which would raise an exception and you can catch it. However, did anyone assure you that there wasn't an exit and your host injected an exception for you to catch and assume you are free?
Another possible option is to test if the CPU does not support unrestricted guests and you started in VM86 mode. If you see that you are running in VM86 mode then chances are that you are virtualized. But whoops - did we forget EMM386.EXE? But Windows NT-based OSes do not load any DOS drivers, so if the NT loader checks for VM86 and finds it enabled, it may assume it is under virtualization.

Where should I try it?
There are a number of options that work and others that won't work:
  • BOCHS: The safest. Download bochs source, recompile it with VMX extensions and debugging support, create a virtual hard disk, install FreeDos there, write my asm stuff into the virtual hard disk by using WinImage (can read/write .IMG files used by BOCHS) and you are fine.
  • VMWare: It will work until it reaches the Virtualization functions. VMWare itself runs a VM, and a VMM cannot run if the application is already inside a VM.
  • Raw DOS: It will work, provided that:
    • Your CPU actually has the features I use. Add MSR checks to my code to ensure proper functionality.
    • The cpu is in real mode. If you have loaded EMM386 or Soft Ice, it won't work.


Code
  • MAIN.ASM , main startup and includes everything else.
  • REAL.ASM , code to manipulate the memory structures , print messages, use real mode and prepare protected mode.
  • PMODE.ASM , code to manipulate protected mode, try its features and prepare long mode.
  • LONG.ASM , code to manipulate long mode, enter compatibility and x64 mode, and prepare virtualization.
  • VMX.ASM , code to prepare, enter and exit virtualization
  • VENTRY16.ASM and VENTRY32.ASM , entries to the real/protected mode guests.


Conclusion
As you saw, virtualization is not initially a very complex subject, but to make something that really works you need to implement a BIOS, drivers etc. That's why not many programmers really try such a thing and that's why only a few applications support virtualization. VMWare has completed a great deal of work to make their Workstation actually do the job.
The code is imported from my previous article and organized in 6 files. It is rather dirty, but it works.
Try it and tell me. If it doesn't work, tell me and help me to improve it. Either way, the fact that you are reading up to here is appreciated.
If you aren't disappointed by now, I urge you to apply to a Virtualization Software company like VMWare for a job - you will do very well. And tell them you have read my article, they might hire me as well.
GOOD LUCK.


History
  • 02 - 07 -2012 : Added protected mode guest and fixed some minimal bugs.
  • 26 - 06 -2011 : First Release


License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author
Michael Chourdakis
Engineer
Greece

I'm working in C++, PHP, Flash and DSP Programming, currently experimenting with Windows 7 technologies and professional audio applications.
I have a PhD in Digital Signal Processing.
My home page: http://www.michaelchourdakis.com