Thursday, December 29, 2011

Malware Analysis Tutorial 9: Encoded Export Table

 Learning Goals:

  1. Practice reverse engineering techniques.
  2. Understand basic checksum functions.
Applicable to:
  1. Operating Systems.
  2. Computer Security.
  3. Assembly Language

1. Introduction
This tutorial answers the challenges in Tutorial 8. We explain the operations performed by Max++ that are related to the export table. In this tutorial, we will practice analyzing functions that do not follow the C language parameter-passing conventions, and examine checksum and encoding functions.

2. Lab Configuration
You can either continue from Tutorial 8, or follow the instructions below to set up the lab. Refer to Tutorial 1 and Tutorial 4 for setting up VBOX instances and WinDbg.
(1) In the code pane, right click and go to expression "0x40105c".
(2) Right click and then select "breakpoints -> hardware, on execution".
(3) Press F9 to run to 0x40105c.
(4) If you see a lot of DB instructions, select them and right click -> "During next analysis treat them as Command".
(5) Exit IMM, restart it, and run to 0x40105c again. Select the instructions (about 1 screen) below 0x40105c, right click -> Analysis -> Analyze Code. You should now be able to see all the loops identified by IMM.

3. Analysis of Code from 0x40108C to 0x4010C4
We now continue the analysis of Tutorial 8, which stops at 0x401087.

Figure 1. Code from 0x40108C to 0x4010C4
Recall that EAX at this moment points to the PE header of ntdll.dll (0x7C9000E0). From Figure 1 of Tutorial 8, we can infer that offset 120 (0x78) holds the beginning address of the EXPORT TABLE, and offset 124 (0x7C) holds the EXPORT TABLE size. So the instruction CMP DWORD DS:[EAX+7C], 0 compares the size of the export table with 0. Since the export table of ntdll.dll is not empty, the JE instruction at 0x401091 will not take the jump, and the control flow will continue to 0x40109F.

Now let's observe the instruction MOV EDX, DWORD PTR DS:[ESI+18].  Note that ESI, after being set by instruction at 0x40108A, is now pointing at the export table. It contains value 0X7C903400 (which is the starting address of the export table). In Section 3 of Tutorial 8, we have given the data structure of the export table. From it, we can infer that offset 0x18 is the number of names. Thus, after the instruction is executed, EDX has the number of names (0x523) exported from ntdll.dll.

Using the same technique, we can infer that the subsequent instructions (0x4010AD to 0x4010C4) assign the registers EAX, EBX, and EDI. We have:

  EAX <-- offset (relative to 0x7C900000, the start of the ntdll base) of the beginning of the array that stores function entry addresses
  EBX <-- offset of the beginning of the array that stores the names of the functions
  EDI <-- offset of the array that stores the name ordinals (as we mentioned earlier, to find the entry address of a function, we have to find the index of its name in the function name array, then use the index to find the ordinal, and then use the ordinal to locate the entry address in the array of function entry addresses)

We can verify the above analysis. Take EBX as one example, its value is 0x7C9048B4. The following is the dump of the memory starting from that address. Note that each element is the "offset of the name". So the first element is 0x00006790 (it actually means that the string is located at 0x7C906790), and similarly, the second string is located at 0x7C9067A9. Figure 3 now displays the contents at 0x7C906790 (you can see that the first function name is CsrAllocateCaptureBuffer).


Figure 2: Array of Function Names


Figure 3: The Strings

4. Analysis of Function 0x004138A8
Now we proceed to the instruction at 0x004010C6 (see Figure 1). It calls the function located at 0x004138A8. Figure 4 shows a part of the function.

Figure 4: Function 0x004138A8
Notice that malware authors will not simply follow C language calling conventions (i.e., to push parameters into stack). Instead, they may use registers directly to pass information between function calls. To analyze the functionality/purpose of a function, we need to figure out: (1) what are the inputs and outputs? (2) and then the logic of the function.

To figure out the input parameters of the function, we look at those registers that are READ before assigned. Looking at the instructions beginning at 0x004138A8, we soon identify that EAX is the input parameter, at this moment its value is 0x00002924. Recall that in Section 3, it contains the offset of the beginning of the array that contains function entry addresses, i.e., 0x7C902924 is the beginning of the array that contains function entry addresses.

Then, starting from 0x004138A9, the next few arithmetic instructions seem to round the value of ECX based on EAX. Figure 5 shows the second half of the function. There are two interesting instructions: the instruction at 0x0041389A, which reduces the value of EAX by 0x1000 inside a loop, and the instruction at 0x413893, which exchanges the values of EAX and ESP.

So the eventual output is the ESP register, which is reduced by a multiple of 0x1000 bytes (in our case 0x2000 bytes) compared with its original value.

In short, the function at 0x004138A8 expands the stack frame (recall that the stack grows from higher addresses to lower addresses) by 0x2000 bytes. Why? It is used to hold the new export table, which is encoded!
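
The effect described above can be sketched roughly as follows. This is only a model of the page-sized loop, not a transcription of the Max++ code: the name expand_stack and the starting ESP value are hypothetical, and whatever the real function does with the remainder below one page is not modeled.

```python
PAGE = 0x1000  # size of one step in the loop described above

def expand_stack(esp, requested):
    """Lower esp by one page for each full page contained in `requested`.

    Returns (new_esp, bytes_reserved). The remainder below one page is
    deliberately ignored; the text only describes the page-sized steps.
    """
    reserved = 0
    while requested >= PAGE:
        esp -= PAGE          # the stack grows toward lower addresses
        requested -= PAGE
        reserved += PAGE
    return esp, reserved

start_esp = 0x00130000       # hypothetical starting value
new_esp, reserved = expand_stack(start_esp, 0x2924)
print(hex(new_esp), hex(reserved))
```

With the input value 0x2924 seen in EAX, the loop runs twice, which matches the 0x2000 bytes mentioned above.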

Challenge 1 of the Day: where does the new export table start?


Figure 5: Second Half of Function 0x004138A8
5. Rest of Encoding Function
We now proceed to analyze the code between 0x4010CB to 0x40113B. Note that the expanded stack now includes around 0x2000 bytes from 0x0012D66C. This is going to hold encoded export table. This encoded table (its format) is defined by the malware author himself/herself. Each entry has two elements and each element 4 bytes. The first element is the checksum of the function name, and the second is the function address. Later, the Max++ malware is able to invoke the system functions in ntdll.dll without resolving them using the export table of ntdll.dll, but using its own encoded table.
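
To make the entry format concrete, here is a minimal sketch of such a table in Python. The checksum routine toy_checksum is a placeholder invented for this illustration (the real routine is analyzed below and is not reproduced here), and the helper names are hypothetical. The sample offsets come from the export table dumps in Tutorial 8.

```python
import struct

def toy_checksum(name: bytes) -> int:
    """Placeholder 32-bit checksum; NOT the routine used by Max++."""
    acc = 0
    for b in name:
        acc = (acc * 31 + b) & 0xFFFFFFFF
    return acc

def build_encoded_table(names, ordinals, functions, base):
    """Pack one (checksum, address) pair of little-endian DWORDs per name."""
    table = bytearray()
    for i, name in enumerate(names):
        address = base + functions[ordinals[i]]
        table += struct.pack("<II", toy_checksum(name), address)
    return bytes(table)

def lookup(table, checksum):
    """Scan the 8-byte entries for a checksum; return the address or None."""
    for csum, address in struct.iter_unpack("<II", table):
        if csum == checksum:
            return address
    return None

names = [b"CsrAllocateCaptureBuffer", b"CsrAllocateMessagePointer"]
ordinals = [7, 8]
functions = [0x57EFB, 0x57E63, 0x57DC5, 0x2AD0, 0x2B30,
             0x2B40, 0x2B20, 0x1EB58, 0x1EBB9]
table = build_encoded_table(names, ordinals, functions, 0x7C900000)
print(hex(lookup(table, toy_checksum(b"CsrAllocateCaptureBuffer"))))
```

Once such a table exists, a caller only needs the checksum of a function name to find its address; no name strings have to appear in the malware body.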



Figure 6. Code from 0x4010CB to 0x40113B
Most of the program logic is pretty clear. Instructions from 0x4010D0 to 0x4010DF save some important information to the stack. Now [EBP-C] and [EBP-14] both have the value 0x0012D66C (which is ESP+C). This is going to be the beginning of the encoded table, which will be demonstrated by the code later.

Then there is a 2-layer nested loop from 0x004010F9 to 0x00401136. Let's first look at the inner loop from 0x401103 to 0x401110. Clearly, EAX is the input for this inner loop and note that EAX is pointing to the array of chars that represent the function name (at 0x004010FB). At 0x401110, EAX is incremented by one in each iteration of the inner loop. Clearly, the inner loop is doing some checksum computation of the function name. In the checksum loop, there are two registers being written by the code: EDX and ECX. If you read the code carefully, you will note that the value of EDX is overwritten completely in each iteration (note instruction at 0x401103). Only ECX's previous value affects its next value. Thus ECX must be the output and it saves the checksum!

Then the instructions from 0x401117 to 0x401133 are to set up the entry for the function. The instruction at 0x40111D (MOV DS:[EAX], ECX) is to save the checksum of the function to the first element, and then the instruction at 0x0040112B saves the function address as the second element.

Challenge 2 of the Day: Explain the logic of EDX+ECX*4 of the instruction at 0x00401122. Hint: study the use of ordinal numbers in export table.



6. Conclusion
Our conclusion is: Max++ reads the export table of ntdll.dll and builds an encoded export table for itself. We will later see its use.

7. Challenge of the Day
At 0x0040113E, Max++ calls 0x0040165E. Analyze the functionality of 0x0040165E.

Sunday, December 25, 2011

Malware Analysis Tutorial 8: PE Header and Export Table

Learning Goals:

  1. Understand the portable executable (PE) header of binary executables.
  2. Understand the EXPORT TABLE.
  3. Practice disassemble and reverse engineering techniques.
Applicable to:
  1. Operating Systems.
  2. Computer Security.

1. Introduction

In this tutorial, we will analyze the first harmful operation performed by Max++. It changes the structure of the export table of ntdll.dll. Recall the analysis of Max++ in Tutorial 7, the malware reads the information in TIB and PEB, and examines the loaded modules one by one, until it encounters "ntdll.dll" (this is accomplished using a checksum function inside a two layer nested loop).

In this tutorial, we will reverse engineer the code starting at 0x40105C.

2. Background Information of PE Header
Any binary executable file (no matter on Unix or Windows) has to include a header to describe its structure: e.g., the base address of its code section, data section, and the list of functions that can be exported from the executable, etc. When the file is executed by the operating system, the OS simply reads this header information first, and then loads the binary data from the file to populate the contents of the code/data segments of the address space for the corresponding process. When the file is dynamically linked (i.e., the system calls it relies on are not statically linked in the executable), the OS has to rely on its import table to determine where to find the entry addresses of these system functions.

Most binary executable files on Windows follow this structure: DOS Header (64 bytes), PE Header, sections (code and data). For a complete survey and introduction of the executable file format, we recommend Goppit's "Portable Executable File Format - A Reverse Engineering View" [1].

The DOS Header starts with the magic number 4D 5A ("MZ"), and its last 4 bytes are the location of the PE header in the binary executable file. Other fields are not that interesting. The PE header contains significantly more, and more interesting, information. Figure 1 shows the structure of the PE Header. We only list the information that is interesting to us in this tutorial. For a complete walk-through, please refer to Goppit's work [1].

Figure 1. Structure of PE Header
 At run time of a binary executable, Windows loader actually loads the PE header into a process's address space. There are some well defined data structures defined in winnt.h for each of the major part of the PE header.

As shown in Figure 1, the PE header consists of three parts: (1) a 4-byte magic code, (2) a 20-byte file header whose data type is IMAGE_FILE_HEADER, and (3) a 224-byte optional header (type: IMAGE_OPTIONAL_HEADER32). The optional header itself has two parts: the first 96 bytes contain information such as the major operating system version, the entry point, etc. The second part is a data directory of 128 bytes. It consists of 16 entries, and each entry has 8 bytes (address, size).
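
The sizes above are enough to compute where each data directory entry sits relative to the start of the PE header. The following short sketch does that arithmetic; the function name data_directory_offset is ours, not part of any Windows API.

```python
# Sizes of the PE header parts as described above (32-bit PE).
SIGNATURE_SIZE = 4
FILE_HEADER_SIZE = 20
OPTIONAL_FIXED_SIZE = 96
DATA_DIRECTORY_ENTRIES = 16
DATA_DIRECTORY_ENTRY_SIZE = 8

def data_directory_offset(index):
    """Offset, from the start of the PE header, of data directory entry `index`."""
    if not 0 <= index < DATA_DIRECTORY_ENTRIES:
        raise IndexError("the data directory has 16 entries")
    start = SIGNATURE_SIZE + FILE_HEADER_SIZE + OPTIONAL_FIXED_SIZE
    return start + index * DATA_DIRECTORY_ENTRY_SIZE

optional_header_size = (OPTIONAL_FIXED_SIZE
                        + DATA_DIRECTORY_ENTRIES * DATA_DIRECTORY_ENTRY_SIZE)

print(hex(data_directory_offset(0)))      # export entry, address field
print(hex(data_directory_offset(0) + 4))  # export entry, size field
print(hex(data_directory_offset(1)))      # import entry, address field
print(optional_header_size)
```

The first two values are the offsets 0x78 and 0x7C that the analysis of Max++ relies on later.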

We are interested in the first two entries: one has the pointer to the beginning of the export table, and the other points to the import table.


2.1 Debugging Tool Support (Small Lab Experiments)
Modern binary debuggers have provided sufficient support for examining PE headers. We discuss the use of WinDbg and Immunity Debugger.

(1) WinDbg. Assume that we know that the PE structure of ntdll.dll is located at memory address 0x7C9000E0. We can display the second part: file header using the following.

dt nt!_IMAGE_FILE_HEADER 0x7c9000e4
   +0x000 Machine          : 0x14c
   +0x002 NumberOfSections : 4
   +0x004 TimeDateStamp    : 0x4802a12c
   +0x008 PointerToSymbolTable : 0
   +0x00c NumberOfSymbols  : 0
   +0x010 SizeOfOptionalHeader : 0xe0
   +0x012 Characteristics  : 0x210e


Then we can calculate the starting address of the optional header: 0x7C9000E4 + 0x14 (20 bytes) = 0x7C9000F8. The attributes of the optional header are displayed below. For example, the major linker version is 7 and the address of the entry point is 0x12c28 (relative to the base address 0x7c900000).


kd> dt _IMAGE_OPTIONAL_HEADER 0x7c9000F8
nt!_IMAGE_OPTIONAL_HEADER
   +0x000 Magic            : 0x10b
   +0x002 MajorLinkerVersion : 0x7 ''
   +0x003 MinorLinkerVersion : 0xa ''
   +0x004 SizeOfCode       : 0x7a000
   +0x008 SizeOfInitializedData : 0x33a00
   +0x00c SizeOfUninitializedData : 0
   +0x010 AddressOfEntryPoint : 0x12c28
   +0x014 BaseOfCode       : 0x1000
   +0x018 BaseOfData       : 0x76000
   +0x01c ImageBase        : 0x7c900000
   +0x020 SectionAlignment : 0x1000
   +0x024 FileAlignment    : 0x200
   +0x028 MajorOperatingSystemVersion : 5
   +0x02a MinorOperatingSystemVersion : 1
   +0x02c MajorImageVersion : 5
   +0x02e MinorImageVersion : 1
   +0x030 MajorSubsystemVersion : 4
   +0x032 MinorSubsystemVersion : 0xa
   +0x034 Win32VersionValue : 0
   +0x038 SizeOfImage      : 0xaf000
   +0x03c SizeOfHeaders    : 0x400
   +0x040 CheckSum         : 0xb62bc
   +0x044 Subsystem        : 3
   +0x046 DllCharacteristics : 0
   +0x048 SizeOfStackReserve : 0x40000
   +0x04c SizeOfStackCommit : 0x1000
   +0x050 SizeOfHeapReserve : 0x100000
   +0x054 SizeOfHeapCommit : 0x1000
   +0x058 LoaderFlags      : 0
   +0x05c NumberOfRvaAndSizes : 0x10
   +0x060 DataDirectory    : [16] _IMAGE_DATA_DIRECTORY
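
The addresses passed to the two dt commands above follow from simple additions. A small sketch of that arithmetic, using only values that appear in the dumps (the variable names are ours):

```python
DLL_BASE = 0x7C900000      # ImageBase from the dump
PE_OFFSET = 0xE0           # offset of the PE header inside ntdll.dll

pe_header = DLL_BASE + PE_OFFSET
file_header = pe_header + 4             # skip the 4-byte magic code
optional_header = file_header + 0x14    # skip the 20-byte file header
data_directory = optional_header + 0x60 # DataDirectory field of the optional header
entry_point = DLL_BASE + 0x12C28        # AddressOfEntryPoint is relative to the base

for label, value in [("file header", file_header),
                     ("optional header", optional_header),
                     ("data directory", data_directory),
                     ("entry point", entry_point)]:
    print(f"{label:16s} {value:#010x}")
```

Note that the data directory lands at PE header + 0x78, which agrees with the layout in Figure 1.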

As shown by Goppit [1], OllyDbg can display PE structure nicely. Since the Immunity Debugger is based on OllyDbg, we can achieve the same effect. In IMM View -> Memory, we can easily locate the starting address of each module (e.g., see Figure 2).


Figure 2. Getting PE Header Address
Then in the memory dump window, jump to the starting address of the PE of ntdll.dll. Then right click in the dump pane, and select special -> PE, we can have all information nicely presented by IMM.




3. Export Table

Recall that the first entry of IMAGE_DATA_DIRECTORY of the optional header field contains information about the export table. By Figure 1, you can soon infer that the 4 bytes located at PE + 0x78 (i.e., offset 120 bytes) are the relative address (relative to the DLL base address) of the export table, and the next 4 bytes (at offset 0x7C) are the size of the export table.

The data type for the export table is  IMAGE_EXPORT_DIRECTORY. Unfortunately, the WinDbg symbol set does not include the definition of this data structure, but you can easily find it in winnt.h through a google search (e.g., from [2]). The following is the definition of IMAGE_EXPORT_DIRECTORY from [2].

typedef struct _IMAGE_EXPORT_DIRECTORY {
  DWORD Characteristics; //offset 0x0
  DWORD TimeDateStamp; //offset 0x4
  WORD MajorVersion;  //offset 0x8
  WORD MinorVersion; //offset 0xa
  DWORD Name; //offset 0xc
  DWORD Base; //offset 0x10
  DWORD NumberOfFunctions;  //offset 0x14
  DWORD NumberOfNames;  //offset 0x18
  DWORD AddressOfFunctions; //offset 0x1c
  DWORD AddressOfNames; //offset 0x20
  DWORD AddressOfNameOrdinals; //offset 0x24
 } IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

Here, we need to calculate the address of each attribute manually for our later analysis. In the above definition, WORD is a computer word of 16 bits (2 bytes), and DWORD is 4 bytes. We can easily infer that MajorVersion is located at offset 0x8, and AddressOfFunctions is located at offset 0x1c.
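
If you prefer not to add the sizes by hand, the same offsets can be cross-checked with a few lines of Python. This is just a sketch that mirrors the definition above with the standard ctypes module; the class name ImageExportDirectory and the helper field_offsets are ours.

```python
import ctypes

class ImageExportDirectory(ctypes.LittleEndianStructure):
    """Mirror of IMAGE_EXPORT_DIRECTORY: DWORD = c_uint32, WORD = c_uint16."""
    _fields_ = [
        ("Characteristics", ctypes.c_uint32),
        ("TimeDateStamp", ctypes.c_uint32),
        ("MajorVersion", ctypes.c_uint16),
        ("MinorVersion", ctypes.c_uint16),
        ("Name", ctypes.c_uint32),
        ("Base", ctypes.c_uint32),
        ("NumberOfFunctions", ctypes.c_uint32),
        ("NumberOfNames", ctypes.c_uint32),
        ("AddressOfFunctions", ctypes.c_uint32),
        ("AddressOfNames", ctypes.c_uint32),
        ("AddressOfNameOrdinals", ctypes.c_uint32),
    ]

def field_offsets(struct_type):
    """Map each field name to its byte offset inside the structure."""
    return {name: getattr(struct_type, name).offset
            for name, _ in struct_type._fields_}

offsets = field_offsets(ImageExportDirectory)
for name, offset in offsets.items():
    print(f"{name:24s} {offset:#04x}")
```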

Now assume that IMAGE_EXPORT_DIRECTORY is located at 0x7C903400, the following is the dump from WinDbg (here "dd" is to display memory):

kd> dd 7c903400
7c903400  00000000 48025c72 00000000 00006786
7c903410  00000001 00000523 00000523 00003428
7c903420  000048b4 00005d40 00057efb 00057e63
7c903430  00057dc5 00002ad0 00002b30 00002b40
7c903440  00002b20 0001eb58 0001ebb9 0001e3af
7c903450  0002062d 000206ee 0004fe3a 00012d71
7c903460  000211e7 0001eaff 0004fe2f 0004fdaa
7c903470  0001b08a 0004febb 0004fe6d 0004fde6

We can soon infer that there are 0x523 functions exposed in the export table, and there are 0x523 names exposed. Why? Because NumberOfFunctions is located at offset 0x14 (thus its address is 0x7c903400+0x14 = 0x7c903414), and NumberOfNames is located at offset 0x18 (address 0x7c903418). For another example, look at the attribute "Name", which is located at offset 0xc (i.e., its address is 0x7c90340c): we have the number 0x00006786. This is the address relative to the DLL base address (assume it is 0x7c900000), so the name of the module is located at 0x7c906786. Using the "db" command in WinDbg (display memory contents as bytes), you can verify that the module name is indeed ntdll.dll.


kd> db 7c906786
7c906786  6e 74 64 6c 6c 2e 64 6c-6c 00 43 73 72 41 6c 6c  ntdll.dll.CsrAll
7c906796  6f 63 61 74 65 43 61 70-74 75 72 65 42 75 66 66  ocateCaptureBuff
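
The manual reading of the dd dump above can also be reproduced with a short script. This is a sketch under the assumption that the dump is available as text; the helper names dump_to_bytes and parse_export_directory are hypothetical.

```python
import struct

DUMP = """
7c903400  00000000 48025c72 00000000 00006786
7c903410  00000001 00000523 00000523 00003428
7c903420  000048b4 00005d40 00057efb 00057e63
"""

FIELDS = ("Characteristics", "TimeDateStamp", "MajorVersion", "MinorVersion",
          "Name", "Base", "NumberOfFunctions", "NumberOfNames",
          "AddressOfFunctions", "AddressOfNames", "AddressOfNameOrdinals")

def dump_to_bytes(text):
    """Turn WinDbg `dd` lines back into the little-endian bytes in memory."""
    raw = bytearray()
    for line in text.strip().splitlines():
        _address, *dwords = line.split()
        for dword in dwords:
            raw += struct.pack("<I", int(dword, 16))
    return bytes(raw)

def parse_export_directory(raw):
    """Unpack the 40-byte IMAGE_EXPORT_DIRECTORY at the start of `raw`."""
    values = struct.unpack_from("<IIHHIIIIIII", raw, 0)
    return dict(zip(FIELDS, values))

directory = parse_export_directory(dump_to_bytes(DUMP))
for name in FIELDS:
    print(f"{name:24s} {directory[name]:#x}")
print("module name at", hex(0x7C900000 + directory["Name"]))
```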


Read page 26 of [1], and you will find that "AddressOfFunctions", "AddressOfNames", and "AddressOfNameOrdinals" are the most important attributes. There are three arrays (shown below), and each of the above attributes contains the starting address of the corresponding array.

PVOID Functions[0x523]; //each element locates a function entry (stored as an offset from the DLL base)
char * Names[0x523]; //each element locates a name string (stored as an offset from the DLL base)
short int Ordinal[0x523]; //each element is a 16 bit integer

For example, by manual calculation we know that the Names array starts at 7C9048B4 (given the 0x48B4 located at offset 0x20, for attribute AddressOfNames, and assuming the base address is 0x7C900000). We know that each element of the Names array is 4 bytes. Here is the dump of the first 8 elements:
kd> dd 7c9048b4
7c9048b4  00006790 000067a9 000067c3 000067db
7c9048c4  00006807 0000681f 00006831 00006845


We can verify the first name (00006790): It's CsrAllocateCaptureBuffer. Note that a "0" byte is used to terminate a string.
kd> db 7c906790
7c906790  43 73 72 41 6c 6c 6f 63-61 74 65 43 61 70 74 75  CsrAllocateCaptu
7c9067a0  72 65 42 75 66 66 65 72-00 43 73 72 41 6c 6c 6f  reBuffer.CsrAllo


We can also verify the second name (000067a9): It's CsrAllocateMessagePointer.
kd> db 7c9067a9
7c9067a9  43 73 72 41 6c 6c 6f 63-61 74 65 4d 65 73 73 61  CsrAllocateMessa
7c9067b9  67 65 50 6f 69 6e 74 65-72 00 43 73 72 43 61 70  gePointer.CsrCap
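
The name strings are packed back to back, each terminated by a "0" byte, so the end of one name gives the start of the next. A minimal sketch using the bytes of the db 7c906790 dump above (the helper read_c_string is ours):

```python
def read_c_string(memory, base_address, address):
    """Read a NUL-terminated ASCII string stored at `address`."""
    start = address - base_address
    end = memory.index(b"\x00", start)
    return memory[start:end].decode("ascii")

# Bytes from the `db 7c906790` dump above.
DUMP_ADDRESS = 0x7C906790
DUMP_BYTES = bytes.fromhex(
    "43 73 72 41 6c 6c 6f 63 61 74 65 43 61 70 74 75 "
    "72 65 42 75 66 66 65 72 00 43 73 72 41 6c 6c 6f"
)

first_name = read_c_string(DUMP_BYTES, DUMP_ADDRESS, 0x7C906790)
# The next string starts right after the terminating 0 byte.
next_address = 0x7C906790 + len(first_name) + 1
print(first_name, hex(next_address))
```

The computed next address agrees with the second element of the Names array (0x67a9 relative to the base).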


Now, given a Function name, how do we find its entry address? The following is the formula:
Note that array index starts from 0.
Assume Names[x].equals(FunctionName)
Function address is Functions[Ordinal[x]]

4. Challenge 1 of the Day
The first sixteen elements of the Ordinal array are shown below:
kd> dd 7c905d40
7c905d40  00080007 000a0009 000c000b 000e000d
7c905d50  0010000f 00120011 00140013 00160015


The first eight elements of the Functions array are shown below:
kd> dd 7c903428
7c903428  00057efb 00057e63 00057dc5 00002ad0
7c903438  00002b30 00002b40 00002b20 0001eb58



What is the entry address of function CsrAllocateCaptureBuffer? The answer is 0x7C91EB58. Think about why. (Pay special attention to the byte order of integers.)
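
For readers who want to check their reasoning, the lookup can be sketched in a few lines of Python. The two lists are the DWORDs printed by the dd commands above; the helper names are ours. The key point is that dd prints 32-bit values, while the ordinals are 16-bit values stored in little-endian order.

```python
import struct

DLL_BASE = 0x7C900000

# DWORDs as printed by `dd 7c905d40` (ordinals) and `dd 7c903428` (functions).
ORDINAL_DWORDS = [0x00080007, 0x000A0009, 0x000C000B, 0x000E000D,
                  0x0010000F, 0x00120011, 0x00140013, 0x00160015]
FUNCTION_DWORDS = [0x00057EFB, 0x00057E63, 0x00057DC5, 0x00002AD0,
                   0x00002B30, 0x00002B40, 0x00002B20, 0x0001EB58]

def dwords_to_words(dwords):
    """Re-read little-endian DWORDs as the 16-bit values they contain."""
    raw = struct.pack("<%dI" % len(dwords), *dwords)
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def resolve(name_index, ordinals, functions, base):
    """Function address = base + Functions[Ordinal[name_index]]."""
    return base + functions[ordinals[name_index]]

ordinals = dwords_to_words(ORDINAL_DWORDS)
# CsrAllocateCaptureBuffer is Names[0], so we look up Ordinal[0].
address = resolve(0, ordinals, FUNCTION_DWORDS, DLL_BASE)
print(ordinals[:4], hex(address))
```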


5. Analysis of Code
We now start to analyze the code, starting at 0x40105C. Set a hardware breakpoint at 0x40105C (in code pane, right click -> Go To Expression (0x40105c) and then right click -> breakpoints -> hardware, on execution). Press F9 to run to the point. The first instruction should be PUSH DS:[EAX+8]. If you see a bunch of BYTE DATA instructions, that's caused by the byte scission of the code. Highlight all these BYTE DATA instructions, right click -> Treat as Command during next analysis and we should have the correct disassembly displayed in IMM.

Figure 4: Accessing Export Table


Now let us analyze the first couple of instructions starting at 0x40105C (in Figure 4). Continuing the analysis of Tutorial 7, we know that after reading the module information (one by one), the code jumps out of the loop when it encounters ntdll.dll. At this moment, EAX+8 is the address of offset 0x18 of LDR_DATA_TABLE_ENTRY. In other words, EAX+8 points to the attribute "DllBase". Thus, the instruction at 0x40105C, i.e., PUSH DWORD DS:[EAX+8], pushes the DllBase onto the stack. After executing this instruction, you will find 0x7C900000 on top of the stack.

The control flow soon jumps to 0x401077 and 0x401078. It soon follows that at 0x401070, ECX now has the value 0x7C900000 (the DLL base). Now consider the instruction at 0x40107D:
       MOV EAX, DWORD PTR DS:[ECX+3C]

Recall that the beginning of a PE file is the DOS header (which is 64 bytes), and the last 4 bytes of the DOS header contain the location of the PE header (offset relative to the DLL base) [see Section 2]. Hex value 0x3C is decimal value 60! So EAX now contains the PE offset. Observing the register pane, we have EAX = 0xE0. We then infer that the PE header is located at 0x7C9000E0 (which is the 0x7C900000 base + offset 0xE0).

Now observe instruction at 0x401087:
    MOV ESI, DWORD PTR DS:[EAX+78]

Note the offset 0x78: its decimal value is 120. From Figure 1, we can soon infer that offset 0x78 is the address of the EXPORT table data entry in the IMAGE_DATA_DIRECTORY of the optional header. Thus, ESI now contains the entry address of the export table (offset relative to the DLL base)! After the instruction at 0x40108c, its value is 0x7C903400 (the starting address of the EXPORT TABLE).
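
The whole header walk of this section (DOS header -> PE header -> export table entry) can be replayed on a synthetic image. In the sketch below the image is a made-up buffer in which only the fields discussed above are filled in; the function name find_export_directory and the export table size value are assumptions for illustration.

```python
import struct

def find_export_directory(image, base):
    """Follow DOS header -> PE header -> first data directory entry."""
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)   # last DWORD of DOS header
    export_rva, export_size = struct.unpack_from("<II", image, pe_offset + 0x78)
    return base + pe_offset, base + export_rva, export_size

# Synthetic image: only the fields used by the walk are populated.
image = bytearray(0x200)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0xE0)                    # PE header offset
image[0xE0:0xE4] = b"PE\x00\x00"
struct.pack_into("<II", image, 0xE0 + 0x78, 0x3400, 0x1000)  # size is made up

pe_header, export_table, export_size = find_export_directory(image, 0x7C900000)
print(hex(pe_header), hex(export_table), hex(export_size))
```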


6. Challenge of the Day
We have demonstrated to you some basic analysis techniques to reverse engineer the malicious logic. Now your job is to continue our analysis and explain what the Max++ malware is trying to do. Specifically, you can follow the road-map below:
(1) What does function 0x004138A8 do? What are its input parameters?
(2) Which data fields of the export table are the instructions between 0x4010AD and 0x4010C4 accessing?
(3) Explain the meaning of EDX*+C in the instruction at 0x4010BB.
(4) Explain the logic of 0x4010CB to 0x4010F6.
(5) What is the purpose of the loop from 0x401103 to 0x401115?
(6) What does function 0x40165E do? What are its input parameters?
(7) Explain the code between 0x401117 and 0x40113E.


References
1. Goppit, "Portable Executable File Format - A Reverse Engineering View", v1(2), Code Breakers Magazine, January 2006.
2. An online copy of winnt.h, Available at http://source.winehq.org/source/include/winnt.h

Wednesday, December 14, 2011

Malware Analysis Tutorial 7: Exploring Kernel Data Structure

Learning Goals:

  1. Explore kernel data structures effectively, e.g., using WinDbg.
  2. Understand the important kernel structures of Windows to maintain live information about processes and threads.
  3. Know the difference between hard and soft breakpoints and can use them effectively during debugging.
  4. Practice code reverse engineering to understand assembly code.
Applicable to:
  1. Computer Architecture
  2. Operating Systems Security
  3. Assembly Language
  4. Operating Systems
1. Introduction

This tutorial shows you how to explore kernel data structures of Windows using WinDbg. It is very beneficial for understanding the infection techniques employed by Max++. We will look at some interesting data structures such as the TIB (Thread Information Block), the PEB (Process Environment Block), and the loaded modules/DLLs of a process. We will examine what Max++ does to some important kernel DLL files.

1.1 Lab Setup
If you have not installed WinDbg on your host machine (note: not the VM instance), please follow Tutorial 1 first to install the VirtualBox platform (a small LAN consisting of one Linux gateway and one Windows instance infected with Max++). Then please follow Tutorial 4 to install WinDbg on the host machine (note: not the VM instances) and configure the piped COM port for the VM instance to be debugged. The following are the steps for launching the VM instance and WinDbg:

  1. Launch the Windows guest OS in VirtualBox first. Boot it in the "Debugged" mode. (Follow Tutorial 4 for how to include the "Debugged" boot option).
  2. On your Host machine, start a command window and change directory to "c:\Program Files\Debugging Tools for Windows(x86)" and type the following.
    windbg -b -k com:pipe,port=\\.\pipe\com_11
  3.  You should see in the WinDbg window the following "Breakpoint on INT 3". It means that it currently stops at a software breakpoint (INT 3). Type "g" (means "go") to let it continue. If necessary, "g" it a second time.
  4. Occasionally you might find that your windows guest OS is frozen. Simply in the WinDbg window (at the host) type "g".
  5. Now start the Immunity Debugger in the windows guest OS, and load the Max++ (see Tutorial 1 for where to get Max++ binary).
  6. In the Code Pane of IMM, right click to go to "0x401018" and then set a HARD BREAKPOINT (right click and select "Breakpoint->Hardware, on Execution") at it. This is where we stopped at in Tutorial 6. As you see right now, the instruction at 0x401018 is "DEC DWORD [EAX+20]". Later, when we stop at this address, the instruction will be overwritten, due to the self-extracting feature of Max++, see details in Tutorial 6.  Then Press F9 (continue) to run to 0x00401018.

 1.1.1 Why Hardware Breakpoint?
  Notice that you have to use a hardware breakpoint in Step 6. Why not a software breakpoint? Think about how a software breakpoint is implemented. When you set a software breakpoint in a debugger, the debugger actually modifies the first byte of the instruction at that location to "INT 3". When the execution gets to the "INT 3", the Windows kernel calls the debugger to handle the interrupt (the debugger then stops and highlights the instruction in its window, and when you resume the execution or cancel the breakpoint, the debugger writes the original opcode back).

 Recall that the malware does self-extraction (see Tutorial 6). It overwrites the "INT 3", and you will never be able to stop at the desired location 0x401018! That's the reason we use a hardware breakpoint. When a hardware breakpoint is set, the address is recorded in one of the four hardware breakpoint registers provided by an Intel CPU. The CPU checks these registers every time an instruction is executed and stops when the address matches. The only drawback is that you can set at most 4 hardware breakpoints at any time.
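
The difference between the two kinds of breakpoints can be illustrated with a toy model. This is not how a real debugger is written; the class ToyDebugger, its methods, and the placeholder code bytes are all invented for this sketch. It only shows why a patched byte is lost when the code is overwritten, while an address kept in a register is not.

```python
INT3 = 0xCC  # opcode of the one-byte INT 3 instruction

class ToyDebugger:
    """Toy model: memory is a bytearray, hardware breakpoints live in 4 slots."""

    def __init__(self, memory):
        self.memory = memory
        self.saved = {}        # address -> original byte (software breakpoints)
        self.debug_regs = []   # at most four addresses (hardware breakpoints)

    def set_software_bp(self, address):
        self.saved[address] = self.memory[address]
        self.memory[address] = INT3      # patch the first byte of the instruction

    def set_hardware_bp(self, address):
        if len(self.debug_regs) >= 4:
            raise RuntimeError("only four hardware breakpoints are available")
        self.debug_regs.append(address)  # memory is left untouched

    def stops_at(self, address):
        """True if executing the instruction at `address` would stop."""
        return self.memory[address] == INT3 or address in self.debug_regs

code = bytearray(b"\x90\x90\x90\x90")    # placeholder bytes, not real Max++ code
dbg = ToyDebugger(code)
dbg.set_software_bp(0)
code[0:4] = b"\x41\x42\x43\x44"          # self-extraction overwrites the region
soft_hit = dbg.stops_at(0)
dbg.set_hardware_bp(0)
hard_hit = dbg.stops_at(0)
print(soft_hit, hard_hit)
```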

1.2 Analysis Objective
We will analyze around 20 instructions, from 0x00401018 to 0x0040105B. The assembly code is shown in Figure 1.


Figure 1. Code Segment to Analyze (0x401018 to 0x40105B)



2. FS Register, TIB, and PEB


As shown in Figure 1, instruction 0x00401018 (MOV EAX, DWORD FS:[18]) performs an important trick. It reads the memory word located at FS:[18] into EAX. Here FS, like SS and DS, is one of the segment registers provided in the Intel x86 register file. FS:[18] is an address specified using the displacement addressing mode. The address is calculated as [base address of the segment selected by FS] + 0x18.

Whenever you see some code accessing the FS register, you should pay special attention! FS points to the most important Windows kernel data structure related to the current process/thread. Check out reference [1] for details and you will see that FS:[18] stores the entry address of TIB (Thread Information Block) - also called TEB.

Then the instruction at 0x40101E (MOV EAX, [EAX+30]) takes the word located at EAX+0X30. What does this mean? Since now EAX has the entry address of the TIB, it is now taking some data field which is 0x30 bytes away from the beginning of the TIB record.

We need to figure out the internal data structure of the TIB. There are two ways: (1) the MSDN documentation, and (2) the WinDbg kernel debugger. For the most well-known data structures like the TIB, people have already done the address calculation for you. For example, by reading [1], you would know that offset 0x30 stores the entry address of the PEB (Process Environment Block). But in most cases, for a kernel data structure, you'll have to calculate the offset manually (i.e., figure out the sizes of all the previous attributes in the structure and sum them up).
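
The manual calculation amounts to a running sum of field sizes. The sketch below does this for the beginning of the 32-bit TEB. The field names and sizes are not listed in this tutorial; they are taken from the publicly known 32-bit layout and should be cross-checked against the dt output on your own system.

```python
# (field name, size in bytes) for the start of the 32-bit TEB.
# The first seven fields form the embedded NT_TIB.
TEB_FIELDS = [
    ("ExceptionList", 4),
    ("StackBase", 4),
    ("StackLimit", 4),
    ("SubSystemTib", 4),
    ("FiberData", 4),
    ("ArbitraryUserPointer", 4),
    ("Self", 4),
    ("EnvironmentPointer", 4),
    ("ClientId", 8),
    ("ActiveRpcHandle", 4),
    ("ThreadLocalStoragePointer", 4),
    ("ProcessEnvironmentBlock", 4),
]

def offsets(fields):
    """Sum the sizes of all previous fields to get each field's offset."""
    result, position = {}, 0
    for name, size in fields:
        result[name] = position
        position += size
    return result

teb_offsets = offsets(TEB_FIELDS)
print(hex(teb_offsets["Self"]), hex(teb_offsets["ProcessEnvironmentBlock"]))
```

The two printed offsets are the ones used by the instructions at 0x00401018 and 0x40101E.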

The most convenient way would be using WinDbg. Now come back to our WinDbg window in the host machine and type the following: (Ctrl+Break). This is to interrupt the running of the guest windows and get the control back to WinDbg. Then type the following:

dt nt!_TEB

This is to say, display the data type of "_TEB" located in the nt module. If you need information of the "nt" module, you can type

lm

This displays the loaded modules, and you can see that "nt" is the module name for "ntoskrnl.exe".

WinDbg is actually very powerful. By appending "-r n" to the dt command, you can display the data types recursively, i.e., when a data field itself is a complex data type, you can display its contents. For example, dt nt!_TEB -r 2 displays the contents recursively with a recursion depth of 2.

From the WinDbg dt dump, you can immediately infer that 0x30 of TEB is the entry address of PEB.

3. Loaded Module List
We now proceed to the next few instructions.  Using the technique introduced in Section 2, we can infer that instruction at 0x401021 (MOV ECX, [EAX+C]) loads into the ECX the pointer to LDR (loaded module list). The information of PEB structure can be found on MSDN [2], however, you will find that WinDbg actually can provide more detailed information, including many undocumented attributes.

Now we need to look at the structure of LDR (_PEB_LDR_DATA) and of the _LIST_ENTRY it contains. Executing dt _PEB_LDR_DATA and dt _LIST_ENTRY in WinDbg, we have the following dump:

kd> dt _PEB_LDR_DATA
nt!_PEB_LDR_DATA
   +0x000 Length           : Uint4B
   +0x004 Initialized      : UChar
   +0x008 SsHandle         : Ptr32 Void
   +0x00c InLoadOrderModuleList : _LIST_ENTRY
   +0x014 InMemoryOrderModuleList : _LIST_ENTRY
   +0x01c InInitializationOrderModuleList : _LIST_ENTRY
   +0x024 EntryInProgress  : Ptr32 Void
kd> dt _LIST_ENTRY
nt!_LIST_ENTRY
   +0x000 Flink            : Ptr32 _LIST_ENTRY
   +0x004 Blink            : Ptr32 _LIST_ENTRY


Notice that ECX now contains the address of offset 0xC of the _PEB_LDR_DATA. Starting at this address is a _LIST_ENTRY structure, which contains two computer words (each word is 4 bytes long). The first four bytes are the Flink, which points to the next _LIST_ENTRY, and the next four bytes are the Blink, which points to the previous _LIST_ENTRY. So this is exactly a doubly linked list structure! More details of the PEB_LDR_DATA structure can be found in the MSDN document [4]. However, again, notice that the documentation in [4] is not complete and is NOT accurate! The most authoritative information should be from WinDbg.
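
A small sketch of how such a list is walked may help. The memory below is a toy dictionary with made-up addresses, and the function walk_list is ours. It assumes the list is circular (the last entry's Flink points back to the list head), which is how Windows _LIST_ENTRY lists are normally organized.

```python
def walk_list(read_dword, head):
    """Follow Flink pointers from `head` until the list wraps around."""
    nodes = []
    current = read_dword(head)          # head.Flink (offset 0)
    while current != head:
        nodes.append(current)
        current = read_dword(current)   # node.Flink is at offset 0
    return nodes

# Toy memory: address -> DWORD. All addresses are made up.
HEAD, A, B = 0x1000, 0x2000, 0x3000
memory = {
    HEAD: A, HEAD + 4: B,   # head: Flink -> A,    Blink -> B
    A: B,    A + 4: HEAD,   # A:    Flink -> B,    Blink -> head
    B: HEAD, B + 4: A,      # B:    Flink -> head, Blink -> A
}

order = walk_list(memory.__getitem__, HEAD)
print([hex(node) for node in order])
```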


Now let us proceed to instruction 00401029 (MOV EAX, DWORD [ECX]). This essentially moves the contents of the Flink into EAX. According to [4], EAX now has the entry address of the _LDR_DATA_TABLE_ENTRY for the next module. However, this is WRONG! The correct information is that EAX now contains the address of offset 0x8 of the _LDR_DATA_TABLE_ENTRY (i.e., the address of the data field "InMemoryOrderLinks").

Now comes the interesting part. Look at instruction 0x0040102D (MOV EDX, DWORD [EAX+20]), what does this mean? Let's examine the data structure LDR_DATA_TABLE_ENTRY first.

kd> dt _LDR_DATA_TABLE_ENTRY -r2
nt!_LDR_DATA_TABLE_ENTRY
   +0x000 InLoadOrderLinks : _LIST_ENTRY
      +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
      +0x004 Blink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
   +0x008 InMemoryOrderLinks : _LIST_ENTRY
      +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
      +0x004 Blink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
   +0x010 InInitializationOrderLinks : _LIST_ENTRY
      +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
      +0x004 Blink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
   +0x018 DllBase          : Ptr32 Void
   +0x01c EntryPoint       : Ptr32 Void
   +0x020 SizeOfImage      : Uint4B
   +0x024 FullDllName      : _UNICODE_STRING
      +0x000 Length           : Uint2B
      +0x002 MaximumLength    : Uint2B
      +0x004 Buffer           : Ptr32 Uint2B
   +0x02c BaseDllName      : _UNICODE_STRING
      +0x000 Length           : Uint2B
      +0x002 MaximumLength    : Uint2B
      +0x004 Buffer           : Ptr32 Uint2B
   +0x034 Flags            : Uint4B
   +0x038 LoadCount        : Uint2B
   +0x03a TlsIndex         : Uint2B
   +0x03c HashLinks        : _LIST_ENTRY
      +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
      +0x004 Blink            : Ptr32 _LIST_ENTRY
         +0x000 Flink            : Ptr32 _LIST_ENTRY
         +0x004 Blink            : Ptr32 _LIST_ENTRY
   +0x03c SectionPointer   : Ptr32 Void
   +0x040 CheckSum         : Uint4B
   +0x044 TimeDateStamp    : Uint4B
   +0x044 LoadedImports    : Ptr32 Void
   +0x048 EntryPointActivationContext : Ptr32 Void
   +0x04c PatchInformation : Ptr32 Void

We know that the instruction MOV EDX, DWORD [EAX+20] loads the contents of the word located at EAX+0x20. But where is EAX pointing? It is pointing at offset 0x8 of the _LDR_DATA_TABLE_ENTRY. Thus EAX+0x20 points at offset 0x28 (see the FullDllName entry at +0x024 in the dump above: its Buffer field sits at +0x004 inside it, i.e., 0x24 + 0x4 = 0x28), which is the "Buffer" field of the FullDllName.
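The offset arithmetic above can be checked with a small lookup table. The following is only a sketch: the offsets are copied from the WinDbg dump above (32-bit XP), and the helper name field_at is made up for illustration.

```python
# Field offsets copied from the "dt _LDR_DATA_TABLE_ENTRY" dump above (32-bit XP).
LDR_ENTRY_OFFSETS = {
    "InLoadOrderLinks": 0x00,
    "InMemoryOrderLinks": 0x08,
    "InInitializationOrderLinks": 0x10,
    "DllBase": 0x18,
    "EntryPoint": 0x1C,
    "SizeOfImage": 0x20,
    "FullDllName.Length": 0x24,
    "FullDllName.MaximumLength": 0x26,
    "FullDllName.Buffer": 0x28,
    "BaseDllName.Length": 0x2C,
    "BaseDllName.MaximumLength": 0x2E,
    "BaseDllName.Buffer": 0x30,
}


def field_at(link_field, displacement):
    """Hypothetical helper: name the field reached by [reg + displacement]
    when reg points at `link_field` inside an _LDR_DATA_TABLE_ENTRY."""
    target = LDR_ENTRY_OFFSETS[link_field] + displacement
    for name, offset in LDR_ENTRY_OFFSETS.items():
        if offset == target:
            return name
    return None


# EAX points at InMemoryOrderLinks (offset 0x8), so [EAX+0x20] is offset 0x28.
print(field_at("InMemoryOrderLinks", 0x20))
# If EAX pointed at the start of the entry instead, the same displacement
# would land on a different field.
print(field_at("InLoadOrderLinks", 0x20))
```

The second call shows why the starting point of the pointer matters: the same displacement lands on a different field when the pointer is assumed to be at the start of the entry.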

In Windows, _UNICODE_STRING is Microsoft's effort to cope with the multi-cultural/language needs for localization of Windows in different parts of the world. It consists of two parts: (1) the length information of the string (the Length and MaximumLength fields in the dump above), and (2) a pointer to the real raw data of the string. So the "Buffer" field points to the full DLL name encoded in Unicode!
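To make the layout concrete, here is a minimal Python sketch that decodes a 32-bit _UNICODE_STRING from a byte buffer. The toy memory image, the base address 0x1000 and the function name read_unicode_string are assumptions for illustration only; the field layout (Uint2B, Uint2B, Ptr32) follows the dump above.

```python
import struct


def read_unicode_string(memory, base, address):
    """Decode a 32-bit _UNICODE_STRING stored at `address`.

    `memory` is a bytes object modelling RAM that starts at `base`
    (a toy model, not real debugger memory)."""
    off = address - base
    # Length and MaximumLength are 2 bytes each, Buffer is a 4-byte pointer.
    length, max_length, buffer_ptr = struct.unpack_from("<HHI", memory, off)
    start = buffer_ptr - base
    # Length is a byte count, so it can be used directly as a slice size.
    return memory[start:start + length].decode("utf-16-le")


# Build a toy memory image: an 8-byte header followed by the UTF-16 data.
BASE = 0x1000
raw_name = "ntdll.dll".encode("utf-16-le")
header = struct.pack("<HHI", len(raw_name), len(raw_name) + 2, BASE + 8)
memory = header + raw_name + b"\x00\x00"

print(read_unicode_string(memory, BASE, BASE))
```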

What it essentially means is that the code at 0x0040102D is starting to process/read the DLL name! To verify our conjecture, look at the register EDX in the Immunity Debugger (Figure 3). You can see that the first module name we are looking at is "ntdll.dll".


Figure 3: EDX points to DLL Name

4. Challenges of the Day
Now let us try to get the whole picture of the code from 0x00401018 to 0x00401054. You might notice that we have actually a nested 2-layer loop here.

The outer loop is from 0x40102E to 0x401054, this is essentially a do-while loop. The inner loop is from 0x401036 to 0x401046. Our challenges today are:
(1) What does the inner loop from 0x401036 to 0x401046 do?
(2) What does the outer loop do?

A hint here: the code we discussed today searches for a module and does some bad things to that module (these malicious operations will start at 0x40105C). Use the Immunity Debugger to find out which one. We will show you these malicious operations in the next tutorial.

References
1. Wiki, "Windows Thread Information Block", Available at http://en.wikipedia.org/wiki/Win32_Thread_Information_Block
2. Microsoft, "PEB Structure", Available at http://msdn.microsoft.com/en-us/library/windows/desktop/aa813706(v=vs.85).aspx
3. Microsoft, "PEB_LDR_DATA structure", Available at http://msdn.microsoft.com/en-us/library/windows/desktop/aa813708(v=vs.85).aspx


Tuesday, December 6, 2011

Malware Analysis Tutorial 6: Analyzing Self-Extraction and Decoding Functions

Learning Goals:

  1. Use Immunity Debugger to Analyze and Annotate Binary Code.
  2. Understand the Techniques for Self-Extraction in Code Segment.
Applicable to:
  1. Computer Architecture
  2. Operating Systems Security
1. Introduction


In this tutorial, we discuss several interesting techniques to analyze decoding/self-extraction functions, which are frequently used by malware to avoid static analysis. The basic approach we use here is to execute the malware step by step and annotate the code.

1.1 Goals
We will examine the following functions in Max++ (simply set a breakpoint at each of the following addresses):
  • 0x00413BC2
  • 0x00413BDD
  • 0x00413A2B
  • 0x00410000
  • 0x00413BF2

1.2 General Techniques
 We recommend that you try your best to analyze the aforementioned functions first, before proceeding to section 2. In the following please find several useful IMM tricks:
  • Annotating code: this is the most frequently used approach during a reverse engineering effort. Simply right click in the IMM code pane and select "Edit Comment", or press the ";" key.
  • Labeling code: you could set a label at an address (applicable to both code and data segments). When this address is used in JUMP and memory loading instructions, its label will show up in the disassembly. You can use this to assign mnemonics to functions and variables. To label an address, right click in IMM code pane and select "Label".
  • Breakpoints: to set up software breakpoints press F2. To set up hardware breakpoints, right click in code pane, and select Breakpoints->Hardware Breakpoint on Execution. At this moment, set soft breakpoints only.
  • Jump in Code Pane: you can easily jump to any address in the code segment by right clicking in the code pane and entering the destination address.


2. Analysis of Code Beginning at 0x00413BC2

As shown in Figure 1, there are four related instructions, POP ESI (located at 0x00413BC1), SUB ESI, 9 (located at 0x00413BC2), and the POP EBP and RETN instructions.

Figure 1. Code Starting at 0x00413BC2

 As discussed in Tutorial 5, the RETN instruction (at 0x00413BC0) is skipped by the system when returning from INT 2D (at 0x00413BBE). Although it looks like the POP ESI (at 0x413BC1) is skipped as well, it is actually executed. As a result, ESI now contains the value 0x00413BB9 (which was pushed by the instruction CALL 0x00413BB9 at 0x00413BB4). The SUB ESI, 9 instruction at 0x00413BC2 then updates ESI to 0x00413BB0 (0x00413BB9 - 9). The next instruction, LODS, loads the memory word located at 0x00413BB0 into EAX (you can verify that the value of EAX is now 0). The code then pops the top element of the stack into EBP and returns. The purpose of this POP is simply to force the execution to return (2 layers) back to 0x413BDD.

Note that if the INT 2D had not caused any byte scission, the RETN instruction at 0x00413BD7 would lead the execution to 0x413A40 (the IRETD instruction). IRETD is the interrupt return instruction and cannot be run in ring 3 mode (thus causing trouble in user level debuggers such as IMM). From this you can see the purpose of the POP EBP instruction at 0x413BC6.

Conclusion: the 4 instructions at 0x00413BC2 are responsible for directing the execution back to 0x00413BDD. This completes the INT 2D anti-debugging trick.

3. Analysis of Function 0x00413BDD


Figure 2: Function 0x00413BDD


As shown in Figure 2, this function clears registers and calls three functions: 0x413A2B (decoding function), 0x00401000 (another INT 2D trick), and call EBP (where EBP is set up by the function 0x00401000 properly). We will go through the analysis of these functions one by one.


4. Analysis of Function 0x00413A2B.

Figure 3: Function 0x00413A2B


 Function 0x00413A2B has six instructions and the first five form a loop (from 0x00413A2B to 0x00413A33), as shown in Figure 3. Consult the Intel instruction manual first, and read about the LODS and STOS instructions before proceeding to the analysis below.

  Essentially the LODS instruction at 0x00413A2B loads a double word (4 bytes) from the memory location pointed to by ESI into EAX, and STOS does the inverse: it stores EAX to the memory location pointed to by EDI. After each transfer, the LODS (STOS) instruction adjusts the ESI (EDI) register by 4, in the direction selected by the direction flag. The two instructions following the LODS instruction perform a very simple decoding operation: they use EDX as the decoding key and apply XOR and SUB operations to decode the data.

  The loop ends when the EDI register is equal to the value of EBP. If you observe the values of EBP and EDI registers in the register pane, you will find that this decoding function is essentially decoding the region from 0x00413A40 to 0x00413BAC.
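The decoding loop can be approximated in Python. This is only a sketch under stated assumptions: the text tells us that the loop applies XOR and SUB with EDX as the key, but the exact operand order and any key update are not given here, so the order used below (XOR first, then SUB) and the function names are hypothetical. The matching encoder is included only to show that such a transformation is reversible.

```python
import struct

MASK = 0xFFFFFFFF  # keep results within 32 bits, like a 32-bit register


def decode_dwords(data, key):
    """Decode a buffer one double word at a time (LODS / XOR / SUB / STOS).

    Assumption: XOR with the key first, then subtract the key."""
    out = bytearray()
    for (value,) in struct.iter_unpack("<I", data):  # LODS: load a dword
        value ^= key                                  # XOR with the key
        value = (value - key) & MASK                  # SUB the key
        out += struct.pack("<I", value)               # STOS: store the dword
    return bytes(out)


def encode_dwords(data, key):
    """Inverse of decode_dwords: add the key, then XOR with it."""
    out = bytearray()
    for (value,) in struct.iter_unpack("<I", data):
        value = (value + key) & MASK
        value ^= key
        out += struct.pack("<I", value)
    return bytes(out)


key = 0x11223344                      # made-up key value
plain = bytes([0xCD, 0x2D, 0xC3, 0x90, 0x33, 0xC0, 0x90, 0x90])
hidden = encode_dwords(plain, key)
print(hidden != plain, decode_dwords(hidden, key) == plain)
```

A static disassembler that looks at the encoded bytes sees unrelated instructions, which is consistent with the difference between Figure 4 and Figure 5.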

  Set a breakpoint at 0x00413A35 (or press F4 to run to it) to complete the loop and step out of it. To view the effects of this decoding function, compare Figure 4 and Figure 5. You can see that before decoding, the instruction at 0x00413A40 is an IRET (interrupt return) instruction and after the decoding, it becomes the INT 2D instruction!

 Figure 4: Region 0x00413A40 to 0x00413BAC (before decoding)



 Figure 5: Region 0x00413A40 to 0x00413BAC (after decoding)



 Now let's right click on 0x00413A2B and select "Label" to mark the function as "BasicEncoding". (This essentially declares 0x00413A2B as the entry address of function "BasicEncoding".) Later, whenever this address shows up in the code pane, we will see this mnemonic for it. This will facilitate our analysis work greatly.

5. Analysis of Code Beginning at 0x00410000

Function 0x00410000 first clears the ESI/EDI growing direction and immediately calls function 0x00413A18. At 0x00413A18, it again plays the trick of INT 2D. If the malware analyzer or binary debugger does not handle the byte scission properly, the stack contents will not be right and the control flow will not be right (see Tutorials 3, 4, 5 for more details of INT 2D).

In summary, when the function returns to 0x00413BED, EBP should have been set up properly. Its value should be 0x00413A40.

6. Analysis of Code Beginning at 0x00413A40

 We now delve into the instruction CALL EBP (0x00413A40). Figure 6 shows the function body of 0x00413A40. It begins with an INT 2D instruction (which is followed by a RET instruction). Clearly, in a regular/non-debugged setting, when EAX=1 (see Tutorial 4), the one-byte RET instruction should be skipped and the execution should continue.

Figure 6: Another Decoding Function

Challenge of the Day

 The bulk of the function is a multiple level nested loop which decodes and overwrites (a part of) the code segment. Now here comes our challenge of the day.

(1) How do you get out of the loop? [hint: the IMM debugger has generously plotted the loop structure (each loop is denoted using the solid lines on the left). Place a breakpoint at the first instruction out of the loop - look at 0x00413B1C]

(2) Which part of the code/stack has been modified? What are the starting and ending addresses? [Hint: look at the instructions that modify RAM, e.g., the instructions at 0x00413A6F, 0x00413A8D, 0x00413B0E.]

Friday, October 21, 2011

Malware Analysis Tutorial 5: Int2d Anti-Debugging Trick (Part III)

Learning Goals:

  1. Apply the techniques presented in Tutorials 3 and 4 to analyzing Max++ anti-debugging trick.
  2. Practice reverse engineering/interpretation of Intel x86 assembly.
Applicable to:
  1. Computer Architecture
  2. Operating Systems Security
  3. Software Engineering

Challenge of the Day:
  1. Write a Python snippet for Immunity Debugger that executes Max++ and generates a log message for each INT 2D instruction executed.
1. Introduction

[Lab Configuration: we assume that you are running the VM instance using NON-DEBUG mode. We will use the Immunity Debugger in this tutorial.]
 
We now revisit Max++ and apply the knowledge we have obtained in Tutorial 3 and Tutorial 4. Figure 1 presents the disassembly of the first 20 instructions of Max++. The entry point is 0x00413BC8. Execute the code step by step until you reach 0x413BD5. Now you are facing your first challenge: how do you deal with the INT 2D instruction?

There are several choices you could take: (1) Simply press F8, and IMM will SKIP the RETN instruction and directly jump to 0x413BD8 (executing the CALL instruction located there, right after the RETN instruction); (2) Execute the RETN instruction by readjusting the EIP register to enforce its execution. In IMM, you can readjust the value of EIP by launching the Python window (clicking the 2nd button on the toolbar, on the right of the "open_file" button), and then executing the following Python command: "imm.setReg("EIP", 0x00413BD7);".

Figure 1. Entry Point of Max++

Which action to take will depend on the behavior of IMM -- if we press F8, will its behavior be the same as running the program without any debuggers attached? Following a similar approach taken in Tutorial 4 we can do an experiment for the case of EAX=1 (i.e., calling the debug print service of INT 2D). The conclusion is:

When the DEBUG-MODE is NOT enabled at booting, the behavior of IMM is the same as regular execution, given EAX=1 (note that in tutorial 4, our experiments explore the case when EAX=0). Hence, we can feel safe about stepping over (and skipping the RETN instruction) in IMM!

2. Diverting Control Flow using Int 2D

As shown in Figure 2, now we are at 0x00413A38. In this function, we only have four instructions: STD, MOV EDI, EDI, CALL 0x00413BB4, and IRETD.

The purpose of the STD instruction is to set the direction flag (DF=1), so that string instructions decrement EDI/ESI instead of incrementing them. EDI/ESI registers are frequently used in RAM copy instructions such as "REP MOVSB" (which repeatedly copies from the memory address pointed to by ESI to the destination address pointed to by EDI). Later we'll see the use of these instructions in the decoding of encrypted malicious code in Max++.
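The effect of the direction flag on a copy instruction can be modelled in a few lines of Python. This is a toy model of REP MOVSB on a bytearray, not debugger code; the function name and the sample buffer are made up for illustration.

```python
def rep_movsb(memory, esi, edi, ecx, df):
    """Toy model of REP MOVSB: copy ECX bytes from [ESI] to [EDI].

    df=0 (after CLD) increments both pointers after each byte,
    df=1 (after STD) decrements them."""
    step = -1 if df else 1
    while ecx:
        memory[edi] = memory[esi]
        esi += step
        edi += step
        ecx -= 1
    return esi, edi


# Backward copy (DF=1): start at the LAST byte of source and destination.
mem = bytearray(b"ABCD....")
print(rep_movsb(mem, esi=3, edi=7, ecx=4, df=1), mem)

# Forward copy (DF=0): start at the FIRST byte of source and destination.
mem2 = bytearray(b"ABCD....")
print(rep_movsb(mem2, esi=0, edi=4, ecx=4, df=0), mem2)
```

Both calls produce the same buffer contents; what differs is where the pointers start and where they end up, which is exactly what the direction flag controls.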

The MOV EDI,EDI instruction does nothing (no impacts on any flag registers) and then we are calling the function at 0x00413BB4.


Note that it looks as if, once we are back from 0x00413BB4, the next immediate instruction is to return (IRETD). However, this is not the case. Function 0x413BB4 will retrieve a section of encrypted code, decrypt it, and deploy it starting from the location of IRETD. So if a static analysis tool is used to analyze the program, e.g., to draw the control flow graph of Max++, it will mislead the malware analysts. We'll get to the decoding function in the next tutorial.

Figure 2. Function 0x413A38

 Press "F7" to step into Function 0x00413BB4. Now we are getting to the interesting point. Look at instruction CALL 0x00413BB9 at 0x413BB4 in Figure 3!

  The CALL instruction basically does two things: (1) it pushes the address of the next instruction onto the stack (so when the callee returns, the execution will resume at the next instruction). If you observe the stack content (the pane at the bottom right of IMM), you will notice that 0x413BB9 is pushed onto the stack. (2) It then jumps to the entry address of the function, which is 0x00413BB9.

 Now the next two instructions call the INT 2D service. Notice that the input parameter EAX is 3 (standing for the load image service). Using an approach similar to Tutorial 4, you can design an experiment to tell what the next action should be. The conclusion is: when EAX is 3, in the non-kernel-debug mode, the IMM behavior is the same as normal execution, which is: the next immediate byte instruction after INT 2D will be skipped.

  Now, what if the RETN instruction is executed (i.e., the byte instruction is not skipped, assuming that an automatic analyzer does not do a good job at handling INT 2D)? You will jump directly and return. The trick is as follows: recall that RETN takes the top element off the stack and jumps to that address. The top element of the stack is now 0x00413BB9. So what happens is that the execution comes back to 0x00413BB9 again. Then doing the INT 2D again and RETN again will force the execution to 0x00413A40 (the IRETD instruction, which is right after the CALL 0x00413BB4 in function 0x00413A38 (see Figure 2)). It then returns to the main program and exits. So the other malicious activities will not be performed in this scenario. At this point, you can see the purpose of the INT 2D trick: the malware author is trying to evade automatic analysis tools (if they do not handle INT 2D well) and certain kernel debuggers such as WinDbg.
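The two possible control flows can be traced with a tiny stack model in Python. The addresses are the ones quoted in the text; the function name trace and the label string are made up, and the model deliberately ignores everything except the two return addresses on the stack.

```python
IRETD_ADDR = 0x00413A40    # instruction right after CALL 0x00413BB4 (Figure 2)
INT2D_BLOCK = 0x00413BB9   # target of CALL 0x00413BB9, and its pushed return address
FALL_THROUGH = "code after RETN"  # label for the path taken when RETN is skipped


def trace(retn_skipped):
    """Return the list of places visited, starting at 0x00413BB9."""
    stack = [IRETD_ADDR]          # pushed by CALL 0x00413BB4
    stack.append(INT2D_BLOCK)     # pushed by CALL 0x00413BB9 at 0x00413BB4
    eip = INT2D_BLOCK
    path = [eip]
    while eip == INT2D_BLOCK:
        # INT 2D is executed here; the next byte is the RETN instruction.
        if retn_skipped:
            path.append(FALL_THROUGH)   # byte scission: RETN never runs
            break
        eip = stack.pop()               # RETN: pop the return address and jump
        path.append(eip)
    return path


def show(path):
    return [hex(p) if isinstance(p, int) else p for p in path]


print(show(trace(retn_skipped=True)))
print(show(trace(retn_skipped=False)))
```

When the RETN is executed, the trace visits 0x00413BB9 twice and then lands on the IRETD address, which matches the description above.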

Challenge of the day: use WinDbg instead of Immunity Debugger to debug through Max++ (with DEBUG-MODE enabled at booting). What is your observation?


Figure 3. Trick: Infinite Loop of Call


3. Conclusion
  We have shown you several examples of the use of INT 2D in Max++ to detect the existence of a debugger and change the malware behavior to avoid being analyzed by it. For a debugger to cope with INT 2D automatically will not be an easy job. First, there are many scenarios to deal with (affected by the type of debugger, existence of kernel debugger, and the booting options). Second, don't expect to catch all INT 2D instructions when the program is loaded, because a program can be self-extracting (modifying its code segment at run time).

4. Challenge of the Day
  It is beneficial to write a Python script that drives the Immunity Debugger to cope with INT 2D automatically. We provide some basic ideas here:

In IMM, there is a global variable "imm" for you to drive the debugger. You can use "imm" to inspect all register values, set all register values (thus including modifying EIP to change control flow), examine and modify RAM. Your program will be a simple loop, which executes instructions one by one (to do this, you can take advantage of breakpoint functions available in the Python API of IMM). Before executing an instruction, you can examine its opcode (using libanalyze.opcode, check the IMM documentation), and take proper actions for INT 2D (skipping the next byte based on the value of EAX, to simulate a normal non-debug environment).
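As a starting point that does not depend on any debugger API, the core decision can be prototyped in plain Python. INT 2D is encoded as the two bytes CD 2D, so the sketch below scans a byte buffer for that pair and computes where execution should resume. It is a naive linear scan, not a disassembler (a CD 2D pair inside another instruction's operand would be a false hit), and the sample bytes, base address and function names are made up for illustration. Wiring this logic to the "imm" object is left to the IMM documentation.

```python
INT_OPCODE = 0xCD     # x86 opcode of "INT imm8"
INT2D_VECTOR = 0x2D   # the interrupt number


def find_int2d(code, base):
    """Return the address of every CD 2D byte pair in `code`,
    where `base` is the address of code[0]."""
    hits = []
    for i in range(len(code) - 1):
        if code[i] == INT_OPCODE and code[i + 1] == INT2D_VECTOR:
            hits.append(base + i)
    return hits


def resume_address(int2d_addr, skip_next_byte):
    """Address where execution continues after a 2-byte INT 2D,
    optionally skipping the byte that follows it."""
    return int2d_addr + 2 + (1 if skip_next_byte else 0)


# Made-up sample: xor eax,eax / int 2d / retn / nop.
# The base is chosen so that INT 2D lands at 0x00413BD5.
sample = bytes([0x33, 0xC0, 0xCD, 0x2D, 0xC3, 0x90])
for addr in find_int2d(sample, base=0x00413BD3):
    print(hex(addr), hex(resume_address(addr, skip_next_byte=True)))
```

With the byte after INT 2D skipped, the computed resume address is 0x413BD8, and without skipping it is 0x413BD7 (the RETN), which matches the two choices discussed in Section 1.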

Thursday, October 13, 2011

Malware Analysis Tutorial 4: Int2dh Anti-Debugging (Part II)

Learning Goals:
  1. Explore the behavior difference of debuggers on int 2dh.
  2. Debugging and modification of binary executable programs.
  3. Basic control flow constructs in x86 assembly.
Applicable to:
  1. Computer Architecture
  2. Operating Systems
  3. Operating Systems Security
  4. Software Engineering

Challenge of the Day:
  1. Find out as many ways as possible to make a program run differently in a debugged environment from a regular execution (using int 2d)?
1. Introduction

The behavior of int 2d instructions may be affected by many factors, e.g., the SEH handler installed by the program itself, whether the program is running under a ring 3 debugger, whether the OS is running in the debugged mode, the program logic of the OS exception handler (KiDispatch), the value of registers when int 2d is requested (determining the service that is requested). In the following, we use an experimental approach to explore the possible ways to make a program behave differently when running in a virtual machine and debugged environment.

2. Lab Configuration

In addition to the Immunity Debugger, we are going to use WinDbg in this tutorial. Before we proceed, we need to configure it properly on the host machine and the guest XP.

If you have not installed the guest VM, please follow the instructions of Tutorial 1. Pay special attention to Section 3.1 (how to set up the serial port of the XP Guest). In the following we assume that the pipe path on the host machine is \\.\pipe\com_11 and the guest OS is using COM1. The installation of WinDbg on the host machine can follow the instructions on MSDN.

We need to further configure the XP guest to make it work.

(1) Revision of c:\boot.ini. This is to set up a second booting option for the debug mode. The file is shown below; you can modify yours correspondingly. Note that we set COM1 as the debug port.

-------------------------------------
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /noexecute=optin /fastdetect
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="DEBUGGED VERSION" /noexecute=optin /fastdetect /debug /debugport=com1 /baudrate=115200

------------------------------------

(2) Manual configuration of COM ports. In some versions of XP, COM ports have to be manually configured. You can follow jorgensen's tutorial on "How to Add a Serial Port in Windows XP and 7 Guest" (follow the XP part). It consists of two steps: (1) manually add a COM port in Control Panel and (2) manually configure COM1 as the port number.

(3) Test run of WinDbg.
Start your XP guest in the debug mode (2nd option).

Now in the host machine, launch the "Windows SDK 7.1" command window coming with WinDbg. Change directory to "c:\Program Files\Debugging Tools for Windows(x86)" and type the following. You should be able to get a window as shown in Figure 1.

windbg -b -k com:pipe,port=\\.\pipe\com_11

You might notice that currently you are not able to access your XP Guest. This is because WinDbg has stopped it. Simply type "g" (standing for "go") in the WinDbg window, and let the XP Guest continue.
Figure 1: Screenshot of WinDbg


3. Experiment 1: Int 2d on Cygwin

In the following, we demonstrate some of the interesting behaviors of Int 2d using a simple program Int2dPrint.exe. The C++ source of the program is shown below. The output of the program should be "AAAABBBB". We added an fflush(stdout) to force the output to appear eagerly, and before each printf() statement there are five integer operations to allow us to insert additional machine code later.

----------------------------------------------
#include <stdio.h>

int main(){
  int a = 0;
  int b = 0;
  int c = 0;
  int d = 0;
  int e = 0;
  printf("AAAA");
  fflush(stdout);

  a = 0; b = 0; c = 0; d = 0; e = 0;
  printf("BBBB");
  fflush(stdout);
}
--------------------------------------------
                     Source code of Int2dPrint.exe

Figure 2 shows the assembly of the compiled code. Clearly, the 'MOV [EBP-xx], 0' instructions between 0x4010BA and 0x4010D6 correspond to the integer assignments "int a=0" etc. in the source program. The "MOV [ESP], 0X402020" at 0x4010DD places the parameter (the starting address of the constant string "AAAA") on the stack for the printf() call. Also note that before the fflush call at 0x4010F4, the program calls cygwin.__getreent. It gets the thread specific re-entrant structure so that the stdout (file descriptor of the standard output) can be retrieved. In fact, you can infer that the stdout is located at offset 0x8 of the reentrant structure.

Figure 2. Compiled Binary of Int2dPrint.cc
3.1 Patching Binary Code

Now let us roll up our sleeves and prepare to patch Int2dPrint.exe. The binary program is compiled using g++ under Cygwin. To run it, you need the cygwin1.dll in Cygwin's bin folder. You can choose to compile it by yourself, or use the one provided in the zipped project folder.

Make sure that your XP guest is running in NON-DEBUG mode!

We now add the following assembly code at location 0x4010F9 of Int2dPrint.exe (the first "int a=0" before the printf("BBBB")). Intuitively, the code tests the value of EAX after the int 2d call. If EAX is 0 (here "JZ" means Jump if Zero), the program will jump to 0x401138, which skips the printf("BBBB"). Notice that this occurs only when the instruction "inc EAX" is skipped.

------------------------------------
 
xor EAX, EAX       # set EAX=0
int 2d             # invoke the exception handler
inc EAX            # if executed, will set EAX=1
cmp EAX, 0
JZ 0x401138        # if EAX=0, will skip printf("BBBB")
-----------------------------------
     The Assembly Code to Insert

The following shows you how to patch the code using IMM:
(1) Right click at 0x4010F9 in the CPU pane, and choose "Assemble". (Or simply press Spacebar at the location.) Enter the code as above.
(2) Right click in the CPU pane, choose "Copy to Executable" --> "All Modified", then click "Copy All". A window of modified instructions will show up. Close that window and click "Yes" to save. Save the file as Int2dPrint_EAX_0_JZ0.exe. The name suggests that the EAX input parameter to the int 2d service is 0, and we expect it to skip the printf("BBBB") if EAX=0, i.e., the output of the program should be "AAAA". (this, of course, depends on whether the "inc EAX" instruction is executed or not).

In Figure 3, you can find the disassembly of Int2dPrint_EAX_0_JZ0.exe. Setting a breakpoint at 0x004010BA, you can execute the program step by step in IMM. You might find that the output is "AAAA" (i.e., "BBBB" is skipped). It seems to confirm the conclusion of byte scission of int 2d. You can also run the program in a command window; the output is the same.

Figure 3. Disassembly of Int2dPrint_EAX_0_JZ0.exe

But wait, how about another experiment? Let's modify the instruction at 0x401101 and make it "JNZ 0x401138" (name it Int2dPrint_EAX_0_JNZ0.exe). What is the expected output? "AAAABBBB"? You might find that in IMM, the program outputs "AAAABBBB"; but if run in a command window, it generates "AAAA" only!!! (Notice that we have ruled out the possibility that the I/O output was lost in a buffer, because we call fflush(stdout) to force all output immediately.) What does this mean? There could be two possibilities:

  (1). Somehow, the instruction "INC EAX" is mysteriously executed (in the regular execution of Int2dPrint_EAX_0_JNZ0.exe). This makes no sense, because prior to 0x401101, the program is exactly the same as Int2dPrint_EAX_0_JZ0.exe.

 (2). There is something tricky in the exception handler code (it could be the SEH of the program itself, or the KiDispatch in the kernel).

 We will later come back to this strange behavior, and provide an explanation.

 3.2 Experiments with Kernel-Debugging Mode

  Now let's reboot the guest OS into the DEBUG mode (but without launching WinDbg on the host machine). Re-run the two programs and you might make an interesting finding: both programs hang the guest OS!

  Now let's reboot the guest OS again into the DEBUG mode and launch WinDbg on the host machine (press "g" twice to let it continue). Now start Int2dPrint_EAX_0_JNZ0.exe in a command window. What is your observation? Figure 4 displays the result: the debugger stops at 0x4010fd (the "inc EAX" instruction) on exception 80000003 (the exception code "BREAKPOINT" in Windows)! If you type "g", the program will proceed and produce "AAAA"! (In the non-debugged Windows mode, the same program produces "AAAABBBB" when run in IMM!)

Figure 4: Running Result of Int2dPrint_EAX_0_JNZ0.exe

 3.2 Discussion

 Now let us summarize our observations so far in Table 1 (I did not discuss some of the experiments here but you can repeat them using the files provided).

Table 1: Summary of Experiment 1

    Simply put: int 2dh is a much more powerful technique to examine the existence of debuggers than people previously thought (see the reference list of Tutorial 3). It can be used to detect the existence of both ring 3 (user level) and ring 0 (kernel level) debuggers. For example, using Table 1, we can easily tell if Windows is running in DEBUG mode (i.e., kernel debugger enabled) or not, and if a kernel debugger like WinDbg is hooked to the debug COM port or not. We can also tell the existence of a user level debugger such as IMM, whether Windows is running in non-debug or debug mode. The subtlety is that the final outcome of the int 2dh instruction is affected by many factors, and experiment 1 only covers a subset of them. The following is a recap of some of the important facts:
  1. EAX, ECX, EDX are the parameters to the int 2d service. The EAX values (1,2,3,4) represent printing, interactive prompt, load image, and unload image. See Almeida's tutorial for more details. Notice that we are supplying an EAX value of 0, which is not expected by the service! (Normal values should be from 1 to 4.)
  2. Once the int 2d instruction is executed, CPU locates the interrupt vector and jumps to the handler routine, which is the part of OS.
  3. OS wraps the details of hardware exception, and generates kernel data structures such as Exception_Record, which contains Exception Code: 80000003 (represents a breakpoint exception).
  4. Then control is forwarded to the kernel call KiDispatchException, which, depending on whether Windows is running in debug mode, exhibits very sophisticated behavior. See details in G. Nebbett, "Windows NT/2000 Native API Reference" (pp 441 gives pseudo code of KiDispatchException). For example, in Windows debug mode, this generally involves forwarding the exception to the debugger first (calling DbgkForwardException), then the invocation of SEH handlers installed by the user program, and then forwarding the exception to the debugger a second time.


We now proceed to briefly explain all the behaviors that we have observed.

  Case 1. Non-Debug Mode and Command Window (column 2 in Table 1): this is the only case in which Int2dPrint_EAX_0_JZ0.exe and Int2dPrint_EAX_0_JNZ0.exe behave the same way. There is only one explanation: the inc EAX is not executed - not because the exception handling behaves differently in a debugged environment, but because the entire process is terminated. To illustrate the point, observe the two screenshots in Figure 5, which are generated by the IMM debugger via (View->SEH Chain). Diagram (a) shows the SEH chain when the program is just started: you can see the default handler kernel32.7C839AC0 (meaning that the entry address of the handler is 7c839ac0 and it is located in kernel32). If you set a breakpoint right before the printf(), you might notice that the SEH chain now includes another handler from cygwin (in Fig 5(b))! It is the cygwin handler which directly terminates the process (without throwing any error messages); if it were the kernel32 handler, it would pop up a standard Windows error dialog.

Figure 5: SEH Chain of Int2dPrint_EAX_0_JZ0.exe before and after reaching the main()


  Case 2. Non-Debug Mode and IMM Debugger (column 3 in Table 1): Based on the logic of the two programs, you can soon reach the conclusion that the byte instruction right after int 2dh is skipped! There are two observations here: (1) the Cygwin handler is NEVER executed! This is because the Immunity Debugger takes control first (recall the logic of KiDispatchException and the DbgkForwardException call that forwards the exception to the debugger port). (2) Immunity Debugger modifies the value of the EIP register, because the exception is a breakpoint. See the discussion in Ferrie's article about IMM's behavior [1]. The result of shifting one byte, however, is also affected by the kernel behavior (look at the EIP-- operation in KiDispatchException; see pp. 439 of Nebbett's book [2]). The combined effect is to shift one byte. Note that if you replace IMM with another user level debugger such as IDA, you might get a different result.

 Case 3. Debug Mode without WinDbg Attached and CMD shell (column 4 in Table 1): Windows freezes! The reason is clear: no debugger is listening on the debug port and the breakpoint exception is not handled (no one advances the EIP register).

Case 4. Debug Mode without WinDbg Attached and Run in IMM (column 5 in Table 1): This is similar to case 2. If you F9 (and run the program) in IMM, you might notice that IMM stops at the SECOND instruction  right after int 2dh (i.e, "CMP EAX,0") first (because it's a breakpoint exception, but the kernel debugging service is actually not triggered). If you F9 (continue) the program, it continues and exhibits the same behavior as Case 2. Again, the byte scission is the combined result of IMM and the kernel behavior (on int exceptions).

Case 5. Debug Mode with WinDbg Attached and Run in CMD shell (column 6 in Table 1): In this case, WinDbg stops at the instruction right after int 2dh (i.e., "inc EAX") and, if continued, executes the "inc EAX" instruction.

Case 6. Debug Mode with WinDbg Attached and Run in IMM (column 7 in Table 1): In this case, WinDbg never gets the breakpoint exception. It is the user level debugger IMM that gets the breakpoint exception first and, like case 4, IMM readjusts the EIP register so that it stops at the SECOND INSTRUCTION after int 2d. It is interesting to note that, even when WinDbg is initiated, if you start a user debugger, it overrides WinDbg on the processing of breakpoints. This is of course understandable -- think about using Visual Studio in the debugged mode for debugging a program, it is natural to pass the breakpoint event to Visual Studio first. Once the user level debugger declares that the exception has been handled, there is no need to pass it to the kernel debugger for handling.

Clearly, the IMM debugger has a "defect" in its implementation. First, it blindly processes a breakpoint exception even if the exception does not correspond to an entry in its breakpoint list. Second, the kernel service handles the readjustment of EIP differently for int 3 and int 2d (even though both of them are wrapped as the 80000003 exception in Windows). Because IMM does not differentiate the two cases, the combined effect is that the readjustment of EIP is "over-cooked" and we see the byte scission.
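To make the byte scission concrete, here is a minimal Python sketch of the Section 3.1 snippet. It is only a toy model: the instruction lengths are the usual 32-bit x86 encodings (an assumption), and the helper name eax_after_resume is hypothetical. It shows that resuming one byte past the end of int 2d is enough to skip the one-byte "inc EAX".

```python
# Byte lengths are the usual 32-bit x86 encodings (an assumption; the
# tutorial's figures are not reproduced here).
SNIPPET = [
    ("xor eax, eax", 2),
    ("int 2d", 2),
    ("inc eax", 1),
    ("cmp eax, 0", 3),
]


def eax_after_resume(extra_skip):
    """Return EAX at the cmp when execution resumes `extra_skip` bytes
    past the end of int 2d (hypothetical helper)."""
    # Map each instruction's start offset to its mnemonic.
    starts = {}
    offset = 0
    for mnemonic, length in SNIPPET:
        starts[offset] = mnemonic
        offset += length

    # First byte after int 2d, plus whatever the debugger/kernel add on top.
    resume = SNIPPET[0][1] + SNIPPET[1][1] + extra_skip
    if resume not in starts:
        raise ValueError("resume point falls inside an instruction")

    eax = 0  # xor eax, eax already ran before the interrupt
    for start in sorted(starts):
        if start >= resume and starts[start] == "inc eax":
            eax += 1
    return eax


print(eax_after_resume(0))  # no extra skip: inc eax runs
print(eax_after_resume(1))  # one-byte scission: inc eax is skipped
```

With no extra skip the model leaves EAX at 1, and with a one-byte skip it leaves EAX at 0, which is the difference the JZ/JNZ test in the sample programs observes.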

3.3 Challenges of the Day

All of the above discussion is based on the assumption that EAX is 0 when calling the int 2d service. Notice that this is a value unexpected by the Windows kernel: the legal values are 1, 2, 3, and 4 (debug print, interactive, load image, unload image). Your challenge today is to find out what happens when EAX is set to 1, 2, 3, 4, and other unexpected values, and to assess the system behavior. You will make interesting discoveries.
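When planning these experiments, it may help to tabulate the EAX values first. The sketch below simply encodes the four legal service codes listed above; the enum and its member names are hypothetical labels chosen for this tutorial, not identifiers from any Windows header.

```python
from enum import IntEnum


class Int2dService(IntEnum):
    """Service codes expected in EAX (names are illustrative only)."""
    DEBUG_PRINT = 1
    INTERACTIVE = 2
    LOAD_IMAGE = 3
    UNLOAD_IMAGE = 4


def describe_eax(eax):
    """Label an EAX value as a known service or as unexpected."""
    try:
        return Int2dService(eax).name
    except ValueError:
        return "UNEXPECTED"


# A small test matrix for the challenge: legal values plus a few others.
for value in (0, 1, 2, 3, 4, 5, -1):
    print(value, describe_eax(value))
```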




4. Experiment 2: notepad


There is another interesting area we have not explored: the user-installed SEH. The Int2d programs are good examples: the preamble code before the main function installs an SEH handler offered by Cygwin, which immediately leads to the termination of the process. It is also interesting to observe the behavior of the default kernel32 handler. The following experiment sheds some light.


4.1 Experiment Design
When we use the File->Open menu of notepad, a dialog always pops up. Our plan is to insert the code from Section 3.1 before the call that pops up the dialog, and observe whether there is any byte scission.


The first question is how to locate the code in notepad.exe that launches the file open dialog. We will again use some Immunity Debugger tricks. It is widely known that user32.dll provides important system functions related to the graphical user interface. We can examine the functions exported by user32.dll using the following approach.
  1. Open notepad.exe (in c:\windows) using the Immunity Debugger.
  2. View -> Executable Modules.
  3. Right click on "user32.dll" and select "View->Names". This exposes the entry addresses of all externally visible functions of the DLL. Browsing the list, we find a collection of functions such as CreateDialogIndirectParamA and CreateDialogIndirectParamW. Press "F2" to set a software breakpoint on each of them.
  4. Now press F9 to run the notepad program. Click File->Open and IMM stops at 7E4EF01F. Go back to the View->Names window, and you will find that it is the entry address of CreateDialogIndirectParamW.
  5. Now remove all other breakpoints (except the one on CreateDialogIndirectParamW), so that we are not distracted by them. You can remove the ones you don't want in the View->Breakpoints window.
  6. Restart the program (make sure that your BP is set) and click File->Open; you are now stopped at CreateDialogIndirectParamW. We will now take advantage of one nice feature in IMM. Click Debug->Execute Till User Code (which lets us get to the notepad.exe code directly!). Note that since the dialog is a modal dialog (which stays there until you respond), you have to go back to the running notepad and cancel the dialog. Then IMM stops at instruction 0x01002D89 of notepad.exe! This is right after the call of GetOpenFileNameW, from which we have just returned.

Figure 6. Disassembly of notepad.exe


The disassembly of notepad.exe is quite straightforward. At 0x01002D27, it sets up the dialog file filter "*.txt", and then at 0x01002D3D, it calls the GetOpenFileNameW function. The return value is stored in EAX. At 0x01002D89, it tests the value of EAX. If it is 0 (meaning the file dialog was canceled), the program control jumps to 0x01002DE0 (which directly exits the File->Open processing).

We can now insert our instructions (mostly from Section 3.1) at 0x01002D27 (the side effect is that the dialog file filter is broken, but this is ok). The code is shown below. We call the patched program notepad_EAX_0_JZ0.exe; similarly, we can generate notepad_EAX_0_JNZ0.exe:

------------------------------------
 
xor EAX, EAX          ; set EAX=0
int 2d                ; raise the int 2d (debug service) exception
inc EAX               ; if executed, sets EAX=1
cmp EAX, 0
JZ 0x01002D89         ; if EAX=0, jump to 0x01002D89, bypassing the dialog call
-----------------------------------
 
Run notepad_EAX_0_JZ0.exe in a command window (i.e., not under a debugger) and you will get the standard exception window thrown by Windows. If you click the "details" link of the error dialog, you will be able to see the detailed information: note the error code 0x80000003 and the address of the exception (0x01002D2B!). I believe you can now easily draw the conclusion about the exception handler of kernel32.dll.
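The reported address can be cross-checked with a little address arithmetic. The following sketch lays out the patch starting at 0x01002D27, assuming the usual 32-bit encodings (2 bytes for xor, 2 for int, 1 for inc, 3 for cmp with an 8-bit immediate, 2 for a short JZ); an assembler may pick different encodings, so treat the lengths as assumptions.

```python
BASE = 0x01002D27  # patch location given in the text

# (mnemonic, assumed encoded length in bytes)
PATCH = [
    ("xor eax, eax", 2),
    ("int 2d", 2),
    ("inc eax", 1),
    ("cmp eax, 0", 3),
    ("jz 0x01002D89", 2),
]


def layout(base, patch):
    """Return a dict mapping each mnemonic to its start address."""
    addresses = {}
    addr = base
    for mnemonic, length in patch:
        addresses[mnemonic] = addr
        addr += length
    return addresses


addrs = layout(BASE, PATCH)
for mnemonic, addr in addrs.items():
    print(f"{addr:#010x}  {mnemonic}")

# Displacement of the short jump, measured from the end of the JZ.
jz_end = addrs["jz 0x01002D89"] + 2
print("jz displacement:", hex(0x01002D89 - jz_end))
```

Under these assumptions the address in the error report, 0x01002D2B, is the address of "inc EAX", i.e., the byte right after int 2d rather than the address of int 2d itself.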



Figure 7: Error Report


4.2 Challenge of the Day

Our question is: are you sure that the error dialog is thrown by the handler of kernel32.7C839AC0? Prove your argument.


5. Experiment 3: SEH Handler Technique

Recall that the SEH handler installed by the user program itself can also affect the running behavior of int 2d. For example, Int2dPrint_EAX_0_JZ0.exe installs a handler located in Cygwin1.dll, which leads to the termination of the process immediately, while the default kernel32.dll handler throws an exception dialog that displays debugging information. In this experiment, we repeat Ferrie's example in [3] and explore further possibilities for anti-debugging.

Figures 8 and 9 present our slightly adapted version of Ferrie's example in [3]. The program is modified from Int2dPrint.exe. The first part of the code is displayed in Figure 8, starting from 0x004010F9 and ending at 0x0040110E. We now briefly explain the logic.

Basically, the code is to install a new exception handler registration record (recall that SEH is a linked list and each registration record has two elements: prev pointer and the entry address of handler). So instruction at 0x004010FB is to set up the handler address attribute to 0x004016E8 (we'll explain later), and at 0x00401100 it is to set the prev pointer attribute. Then the instruction at 0x00401103 resets FS:[0], which always points to the first element in the SEH chain. The rest of the code does the old trick: it puts an "INC EAX" instruction right after the int 2d instruction and depending on whether the instruction is skipped, it is able to tell the existence of debugger.
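The registration step can be pictured as pushing a node onto the front of a singly linked list. The Python sketch below is only a model of that bookkeeping: the class and method names are hypothetical, and the address 0x7C839AC0 is reused from Section 4.2 purely as a stand-in for whatever handler was already on the chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistrationRecord:
    """One SEH registration record: prev pointer plus handler address."""
    prev: Optional["RegistrationRecord"]
    handler: int


class ThreadModel:
    """Toy model of a thread whose FS:[0] points at the head of the chain."""

    def __init__(self):
        self.fs0 = None

    def install(self, handler):
        # The new record's prev is the old head; FS:[0] then points to it.
        record = RegistrationRecord(prev=self.fs0, handler=handler)
        self.fs0 = record
        return record

    def chain(self):
        """Return handler addresses from the head of the chain onward."""
        handlers = []
        record = self.fs0
        while record is not None:
            handlers.append(record.handler)
            record = record.prev
        return handlers


thread = ThreadModel()
thread.install(0x7C839AC0)  # stand-in for a previously installed handler
thread.install(0x004016E8)  # the handler installed by the code in Figure 8
print([hex(h) for h in thread.chain()])
```

Because the new record becomes the head, the handler at 0x004016E8 is the first one consulted when an exception is dispatched.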

Figure 8. Part I of Ferrie's Code


We now examine the exception handler code at 0x004016E8. It is shown in Figure 9, starting at 0x004016E8 and ending at 0x004016F4. It has three instructions. At 0x004016E8, it writes the double word 0x43434343 to address 0x00402025. If you study the instruction at 0x0040111c (in Figure 8), you might notice that the string "BBBB" is stored at 0x00402025. So this instruction essentially overwrites it with "CCCC" (0x43 is the ASCII code of 'C'). If the SEH handler is executed and the second printf() statement is executed, you should see "AAAACCCC" in the output, instead of "AAAABBBB". You might wonder, why not just change the value of a register (e.g., EBX) in the handler to indicate that the SEH handler was executed? Recall that the interrupt handler of the OS restores the values of the registers from the kernel stack: no matter what value you set in a register (except for EAX), it will be wiped out by the OS after return.
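The effect of that store is easy to check outside the debugger. The sketch below uses Python's struct module to write 0x43434343 as a little-endian double word over a buffer holding "BBBB"; the buffer is a stand-in for the bytes at 0x00402025 and is not read from the actual program.

```python
import struct

# Stand-in for the NUL-terminated string stored at 0x00402025.
buffer = bytearray(b"BBBB\x00")

# Write the 32-bit value at offset 0, little-endian as on x86.
struct.pack_into("<I", buffer, 0, 0x43434343)

print(buffer)                           # the string now reads "CCCC"
print(struct.pack("<I", 0x43434343))    # same four bytes on their own
```

Since all four bytes are identical, the byte order does not change the result here; it would matter for a value such as 0x41424344.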

The last two instructions of the SEH handler simply return 0. Notice that, as shown by Pietrek in [1], "0" means ExceptionContinueExecution, i.e., the exception has been handled and the interrupted process should resume. There are other values that you can play with, e.g., "1" means ExceptionContinueSearch, i.e., this handler could not solve the problem and the search has to continue along the SEH chain to find the next handler. Note that these values are defined in EXCPT.H.
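The meaning of the two return values can be sketched as a walk over the handler chain. This is a simplified model, not the actual dispatcher in the OS: the function name dispatch is hypothetical, handlers are plain Python callables, and unwinding and the other disposition values are ignored.

```python
from enum import IntEnum


class Disposition(IntEnum):
    """The two handler return values discussed in the text."""
    ExceptionContinueExecution = 0
    ExceptionContinueSearch = 1


def dispatch(handlers):
    """Call handlers from the head of the chain; return the index of the
    first one that reports the exception as handled, or None if every
    handler asks to continue the search."""
    for index, handler in enumerate(handlers):
        if handler() == Disposition.ExceptionContinueExecution:
            return index
    return None


def user_handler():
    # Models the handler at 0x004016E8, which returns 0.
    return Disposition.ExceptionContinueExecution


def declining_handler():
    # Models a handler that cannot deal with the exception.
    return Disposition.ExceptionContinueSearch


print(dispatch([user_handler, declining_handler]))       # handled by the head
print(dispatch([declining_handler, user_handler]))       # handled by the next
print(dispatch([declining_handler, declining_handler]))  # nobody handled it
```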


Figure 9. Part II of Ferrie's Code

There could be another factor that affects your experimental results. The Immunity Debugger can be configured to pass or not to pass exceptions to the user program. Click the "Debugger Options" menu in IMM, and then the "Exceptions" tab (shown in Figure 10). You can choose to pass all exceptions to user programs (by clicking the "add range" button and selecting all exceptions). After the configuration is done, running the program using "Shift + F9" (as opposed to F9) will pass the exceptions to the user-installed SEH.

Figure 10. Configuration of Exception Handling of IMM

Similar to Section 4, we can run our program (Int2dprint_EAX0_RET0_JZ0.exe, meaning setting EAX to 0 when calling int 2d, and returning 0 in the SEH handler), under different environments, with debugging mode turned on or not. The results are displayed in Figure 11.

Non-debug mode: when running in command window, the output is "AAAACCCC". Clearly, the user installed SEH is executed and the byte scission did not occur (i.e., the "inc EAX" instruction is indeed executed). Compare it with the similar running environment in Table 1, you can immediately understand the effect of returning 0 in SEH: it tells the OS: "everything is fine. Don't kill the process!".

If you run the program in IMM using F9 (without passing exceptions to the user program), the result is "AAAA": the "inc EAX" is skipped by IMM (similar to Table 1) and the user-installed SEH is never executed. However, if you choose Shift+F9 to pass exceptions to the user program, the SEH is executed and the "inc EAX" is executed! It seems that in the "Shift+F9" mode, IMM does not readjust the EIP (as stated in Ferrie's article).

Debug-Mode with WinDbg Attached: Now when WinDbg is attached, the command line running of the program yields "AAAABBBB". This means that "inc EAX" is executed but the SEH is not executed! I believe, similarly, you can explain the IMM running result.

Now, the conclusion is: the use of user installed SEH enables more possibilities to detect the existence of debuggers and how they are configured!
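The three observed outputs can be reproduced with a small model of the program's control flow. The sketch below is an assumption-laden summary of the discussion above (the function names are hypothetical): the output depends only on whether "inc EAX" ran and whether the SEH handler overwrote "BBBB" with "CCCC".

```python
from itertools import product


def program_output(inc_executed, seh_executed):
    """Model the JZ0 variant: print "AAAA", then the second string
    unless EAX is still 0 at the cmp."""
    second = "CCCC" if seh_executed else "BBBB"
    eax = 1 if inc_executed else 0
    output = "AAAA"
    if eax != 0:  # JZ not taken, so the second printf() runs
        output += second
    return output


def explain(output):
    """List the (inc_executed, seh_executed) pairs that yield `output`."""
    return [
        (inc, seh)
        for inc, seh in product((True, False), repeat=2)
        if program_output(inc, seh) == output
    ]


for observed in ("AAAACCCC", "AAAABBBB", "AAAA"):
    print(observed, explain(observed))
```

Note that "AAAA" alone cannot tell whether the handler ran, since the second string is never printed once "inc EAX" is skipped; in the IMM F9 case the text establishes separately that the SEH is not executed.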

Figure 11. Experimental Results of Ferrie's Example

5.1 Challenges of the Day

Play with the return value of your SEH handler: set it to 1, 2, and other values such as negative integers. What do you observe?


6. Conclusion

The int 2d anti-debugging technique is essentially an application of OS fingerprinting, i.e., inferring the version and configuration of a system from its behavior. From the point of view of a program analysis researcher, it could be a very exciting problem to automatically generate such anti-debugging techniques, given the source/binary code of an operating system.


References

[1] M. Pietrek, "A Crash Course on the Depths of Win32™ Structured Exception Handling," Microsoft Systems Journal, 1997/01. Available at http://www.microsoft.com/msj/0197/exception/exception.aspx.

[2] G. Nebbett, "Windows NT/2000 Native API Reference", pp. 439-441, ISBN: 1578701996.

[3] P. Ferrie, "Anti-Unpacker Tricks - Part Three", Virus Bulletin Feb 2009. Available at http://pferrie.tripod.com/papers/unpackers23.pdf, Retrieved 09/07/2011.