Thursday, 31 July 2014

Squirrelling Away Plists

Just grabbin some acorns ...

Plists are Apple's way of retaining configuration information. They're scattered throughout OS X and iOS like acorns and come in 2 main types - XML and binary. Due to their scattered nature and the potential for containing juicy artefacts, monkey thought a script to read plists and extract their data into an SQLite database might prove useful. The idea being analysts run the script (plist2db.py) against a directory of plists and then browse the resultant table for any interesting squirrels. Analysts could also execute the same queries against different sets of data to find common items of interest (eg email addresses, filenames, usernames).
Similar in concept to SquirrelGripper which extracted exiftool data to a DB, the tool will only be as good as the data fields extracted and the analyst's queries. At the very least, it allows analysts to view the contents of multiple plists at the same time. Plus we get to try out Python 3.4's newly revised native "plistlib" which now parses BOTH binary and XML plists. Exciting times!

Not having easy access to an OS X or iOS system, monkey is going to have to improvise a bit for this post and also rely upon the kindness of plist donaters. Special Thanks to Sarah Edwards (@iamevltwin) and Mari DeGrazia (@maridegrazia) for sharing some sample plists used for testing.

XML based plists are text files which can be read using a text editor. Binary plists follow a different file format and typically require a dedicated reader (eg plist Editor Pro) or conversion to XML to make it human readable.
Both types of plist support the following data types:

CFString = Used to store text strings. In XML, these fields are denoted by the <string> tag.
CFNumber = Used to store numbers. In XML, the <real> tag is used for floats (eg 1.0) and the <integer> tag is used for whole numbers (eg 1).
CFDate = Used to store dates. In XML, the <date> tag is used to mark ISO formatted dates (eg 2013-11-17T20:10:06Z).
CFBoolean = Used to store true/false values. In XML, these correspond to <true/> or <false/> tags.
CFData = Used to store binary data. In XML, the <data> tag marks base64 encoded binary data.
CFArray = Used to group a list of values. In XML, the <array> tag is used to mark the grouping.
CFDictionary = Used to store sets of data values keyed by name. Typically data is grouped into dictionaries with <key> and <value> elements.  The <key> fields use name strings. The <value> elements are typically one of the following - <string>, <real>, <float>, <date>, <true/>, <false/>, <data>. The order of key declaration is not significant. In XML, the <dict> tag is used to mark the dictionary boundaries.

To show how it all fits together, let's take a look an XML plist example featuring everyone's favourite TV squirrel ...


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Name</key>
    <string>Rocket J. Squirrel</string>
    <key>Aliases</key>
    <array>
        <string>Rocky the Flying Squirrel</string>
        <string>Rocky</string>
    </array>
    <key>City of Birth</key>
    <string>Frostbite Falls</string>
    <key>DNA</key>
    <data>
    cm9ja3ktZG5hCg==
    </data>
    <key>Year Of Birth</key>
    <integer>1959</integer>
    <key>Weight</key>
    <real>2.5</real>
    <key>Flight Capable</key>
    <true/>
</dict>
</plist>

Note: The DNA <data> field "cm9ja3ktZG5hCg==" is the base64 encoding of "rocky-dna".

We can cut and paste the above XML into plist Editor Pro and save it as a binary plist.
We can also open a new text file and paste the above XML into it to create an XML plist.

Further Resources

The Mac Developer Library documentation describes Plists here and the Apple manual page describes XML plists here.

Michael Harrington has a great working example / explanation of the binary plist file format here and here.

Setting Up

Using the binary capable "plistlib" requires Python v3.4+. So if you don't have it installed, you're gonna be disappointed. Note: Ubuntu 14.04 has Python 3.4 already installed so if you're already running that, you don't have to worry about all this setup stuff.

To install Python 3.4 on Ubuntu 12.04 LTS (eg like SANS SIFT 3), there's a couple of methods.
I used this guide from James Nicholson to install the 3.4.0beta source onto my development VM.
FYI 3.4.1 is currently the latest stable release and should be able to be installed in a similar manner.

There's also this method that uses an Ubuntu Personal Package Archive from Felix Krull.
But Felix makes no guarantees, so I thought it'd be better to install from source.

Alternatively, you can install Python 3.4.1 on Windows (or for OS X) from here.

Not having a Mac or iPhone, monkey created his own binary and XML plist files. First, we define/save the new binary plist file using plist Editor Pro (v2.1 for Windows), then we copy/paste the XML into new text file on our Ubuntu development VM and save it. This way we can have both binary and XML versions of our plist information. Note: Binary plists created by plist Editor Pro in Windows were read OK by the script in Ubuntu. However, Windows created XML plists proved troublesome (possibly due to Windows carriage returns/linefeeds?) - hence the cut and paste from the XML in plist Editor Pro to the Ubuntu text editor for saving.

For squirrels and giggles, we'll continue to base our test data on characters from the Rocky and Bullwinkle Show. For those that aren't familiar with the squirrel and moose, commiserations and see here.

The Script

For each file in the specified input directory (or just for an individual file), the script calls the "plistlib.load()" function.
This does the heavy lifting and returns the "unpacked root object" (usually a dictionary).
The script then calls a recursive "print_object" function (modified/re-used from here) to go into each/any sub-object of the root object and store the filename, plist path and plist value in the "rowdata" list variable.

Once all plist objects have been processed, the script creates a new database using the specified output filename and SQL "replaces" the extracted "rowdata" into a "plists" table. We use SQL "replace" instead of SQL "insert" so we don't get "insert" errors when running the script multiple times using the same source data and target database file. Although to be prudent, it's just as easy to define a different output database name each time ... meh.
The "plists" table schema looks like:

CREATE TABLE plists(filename TEXT NOT NULL, name TEXT NOT NULL, value TEXT NOT NULL, PRIMARY KEY (filename, name, value) )

Note: The "plists" table uses the combination of filename + name + value as a Primary Key. This should make it impossible to have duplicate entries.

See comments in code for further details.

Testing

To run the script we just point it at a directory or individual plist and give it a filename for the output SQLite database.
Here we are using the python3.4 beta exe from my Ubuntu development VM's locally installed directory ...

cheeky@ubuntu:~/python3.4b/bin$ ./python3.4 /home/cheeky/plist2db.py
Running  plist2db.py v2014-07-24

Usage: plist2db.py -f plist -d database

Options:
  -h, --help   show this help message and exit
  -f FILENAME  XML/Binary Plist file or directory containing Plists
  -d DBASE     SQLite database to extract Plist data to
cheeky@ubuntu:~/python3.4b/bin$

Here's how the test data was stored ...

cheeky@ubuntu:~/python3.4b/bin$ tree /home/cheeky/test-plists/
/home/cheeky/test-plists/
+-- bin-plists
¦   +-- boris.plist
¦   +-- bullwinkle.plist
¦   +-- natasha.plist
¦   +-- rocky.plist
+-- Red-Herring.txt
+-- xml-plists
    +-- boris-xml.plist
    +-- bullwinkle-xml.plist
    +-- natasha-xml.plist
    +-- rocky-xml.plist

2 directories, 9 files
cheeky@ubuntu:~/python3.4b/bin$

Note: "Red-Herring.txt" is text file included to show how non-plist files are handled by the script.

Now we can try our extraction script with our test data ...

cheeky@ubuntu:~/python3.4b/bin$ ./python3.4 /home/cheeky/plist2db.py -f /home/cheeky/test-plists/ -d /home/cheeky/bullwinkles.sqlite
Running  plist2db.py v2014-07-24

*** WARNING /home/cheeky/test-plists/Red-Herring.txt is not a valid Plist!

cheeky@ubuntu:~/python3.4b/bin$

Here is a screenshot of the resultant "bullwinkles.sqlite" database ...

Test Data Output Database

Note: The XML plist DNA <data> fields shown have been extracted and base64 *decoded* automatically by the "libplist" library. Our test data binary plists store the raw ASCII values we entered and the XML plists store the base64 encoded values. Being text based, I can understand why XML encodes binary data as base64 (so its printable). But binary plists don't have the printable requirement so there's no base64 encoding/decoding step and the raw binary values are written directly to the binary plist file.

By having the raw hexadecimal values from the <data> fields in the DB, we can cut and paste these <data> fields into a hex editor to see if there's any printable characters ...



Binary rocky.plist's DNA data value

From the previous 2 pictures, we can see that the "DNA" value from our binary "rocky.plist" is actually UTF-8/ASCII for "rocky-dna".

One nifty feature of plist Editor Pro is that from the "List view" tab, you can double click on a binary value represented by a "..." and it opens the data in a hex editor window. This binary inspection would be handy when looking at proprietary encoded data fields (eg MS Office FileAlias values). Or we could just run our script as above and cut and paste any <data> fields to a hex editor ...

From our results above, we can also see that the "Red-Herring.txt" file was correctly ignored by the script and that a total of 66 fields were extracted from our binary and XML plists (as expected).

Now we have a database, we can start SQL querying it for values ...
As the "name" and "value" columns are currently defined as text types, limited sorting functionality is available.

Here are a few simple queries for our test data scenario. Because our test plists are not actual OS X / iOS plists, you'll have to use your imagination/your own test data to come up with other queries that you might find useful/practical. More info on forming SQLite queries is available here.

Find distinct "Aliases"
SELECT DISTINCT value FROM plists
WHERE name LIKE '%Alias%';


Find all the values from the "rocky-xml.plist"
SELECT * FROM plists
WHERE filename LIKE '%rocky-xml.plist';


Find/sort records based on "Weight" value
SELECT * FROM plists
WHERE name LIKE '%Weight' ORDER BY value;

Note: Sort is performed textually as the value column is TEXT.
So the results will be ordered like 125.0, 125.0, 2.5, 2.5, 53.5, 53.5, 65.7, 65.7.

Find/sort records by "Info Expiry Date" value
SELECT * FROM plists
WHERE name LIKE '%Info Expiry Date' ORDER BY value;

Note: This works as expected as the date text is an ISO formatted text string.

The script has been developed/tested on Ubuntu 12.04 LTS (64bit) with Python 3.4.0beta.
It was also tested (not shown) with OS X MS Office binary plists, a Time Machine binary backup plist and a cups.printer XML plist.

Additionally, the script has been run with the bullwinkle test data on Win7 Pro (64 bit) with Python 3.4.1 and on a Win 8.1 Enterprise Evaluation (64 bit) VM with Python 3.4.1

Final Words

The idea was to write a script that grabs as much plist data as it can and leave it to the analyst to formulate their own queries for finding the data they consider important.
The script also allowed monkey to sharpen his knowledge on how plists are structured and granted some valuable Python wrestling time (no, not like that!).
By re-using a bunch of existing Python libraries/code, the script didn't take much time (or lines of code) to put together.
The native Python "plistlib" also allows us to execute on any system installed with Python 3.4 (OS X, Windows, Linux) without having to install any 3rd party libraries/packages.
I have not been able to run/test it on a complete OS X system (or on iOS plist files) but in theory it *should* work (wink, wink). I am kinda curious to see how many plists/directories it can process and how long it takes. The bullwinkle test data took less than a second to execute.

Depending on what artefacts you're looking for, you can use the script as an application artefact browsing tool or by using the same queries on data from different sources, you could use it to detect known keywords/values (eg IP theft email addresses, app configuration). Or perhaps you have a bunch of application directories from an iOS device that you're curious about. Rather than having to inspect each plist individually, you can run this script once and snoop away.

The sorting could be made more comprehensive if each data type was extracted to it's own table (ie floats in one table, ints in another). However, given that sorting by time currently works already, that additional functionality might not be much use?

If anyone uses this script, I'd appreciate hearing any feedback either via the comments section below or via email. It'd be nice to know if this script is actually of some use and not just another theoretical tool LOL.


Thursday, 17 July 2014

Android Has Some Words With Monkey


Be advised ... Here thar be Squirrels!

The recent NIST Mobile Forensics Webcast and SANS FOR585 poster got monkey thinking about using the Android emulator for application artefact research. By using an emulator, we don't need to "root" an Android device in order to access artefacts from the protected data storage area (eg "/data/data/"). As an added bonus, the emulator comes as part of the FREE Android Software Development Kit (SDK). Hopefully, this post will help encourage further forensic research/scripts for Android based apps.
So now we just need a target app to investigate ... "Words With Friends" (WWF) is a popular scrabble type game with chat functionality. For this post, we'll be focusing on using an Android emulator to retrieve in-game chat artefacts and then create a script to parse them ("wwf-chat-parser.py"). It's a fairly long post so you might want to take that potty break now before we begin ;)


0. Installation / Setup

On an Ubuntu 12.04 LTS (32 bit) Virtual Machine (using VMware Player), I installed the following:
- Android SDK bundle including the "eclipse" IDE (from http://developer.android.com/sdk/index.html)
- dex2jar tool to convert .dex byte code into the .jar Java archive format (from https://code.google.com/p/dex2jar/)
- JD-GUI Java decompiler to display the source code from a .jar file (from http://jd.benow.ca/)

Android has an official SDK install guide here which also continues on here.
Installation was as simple as unzipping the downloaded archives and launching the relevant executable.
Lazy monkey just unzipped the archives to his home directory (ie "/home/cheeky/").
Here's a quick guide:
- Go to http://developer.android.com/sdk/index.html and download the 32 bit linux ADT bundle (includes both the eclipse IDE and Android SDK tools)
- Double click the zip file and use the Ubuntu Archive Manager to unzip the bundle to "/home/*username*" (eg unzips to "/home/cheeky/adt-bundle-linux-x86-20140702/")
- Use the Nautilus File Exporer GUI to navigate to the eclipse sub-directory (eg "/home/*username*/eclipse/")
- Double click on "eclipse" icon to launch it (or you could launch it from the command line via "/home/*username*/eclipse/eclipse")
- Go to the "Window" ... "Android SDK Manager" drop menu item and launch it. Some packages are installed by default but if you want to run an emulator with a specific/previous version of Android you need to download/install that specific SDK Platform (eg 4.2.2 SDK platform) and a corresponding Hardware System Image (eg ARM for a Nexus 7 tablet).
- Unzip the downloaded dex2jar zip file contents to "/home/*username*" (eg "/home/cheeky/dex2jar-0.0.9.15/")
- Unzip the downloaded jd-gui zip file to "/home/*username*" (eg "/home/cheeky/"). Note: we only need to extract the "jd-gui" exe.

Also installed was the Bless Hex Editor (via Ubuntu Software Center) and the Firefox SQLite Manager extension (via the Firefox Add-ons Manager).

To make things a bit easier, I also setup a soft link (ie alias) so we can just type "adb" without the preceding path info to launch the Android Debug Bridge.
cheeky@ubuntu:~$ sudo ln -s /home/cheeky/adt-bundle-linux-x86-20140702/sdk/platform-tools/adb /usr/bin/adb

1. Getting the .apk app install file

Android .apk install files are zip archives. You can download them from the GooglePlay store by using a Chrome plugin or via the apk-downloader website. For this experiment however, I wanted to test the specific version from my Nexus 7 tablet (WWF 7.1.4), so I decided to use the Android Debug Bridge (adb) method.
Excellent adb instructions are available from the official Android Dev site here.

To prepare my Nexus 7 (1st gen c.2012) for the .apk file transfer, I attached it to my PC via USB cable. I then enabled the tablet's "Developer options" from the "Settings" menu by tapping the "About tablet" ... "Build number" several times. Next, I went into "Developer options" and enabled the "USB Debugging" and "Stay awake" options.

From our Ubuntu VM we can now check for connected devices/emulators ...
cheeky@ubuntu:~$ adb devices
List of devices attached
*serialnumber_of_device*    device

cheeky@ubuntu:~$


Note: I have redacted the serial number of my Nexus 7. Just imagine a 16 digit hex value in place of *serialnumber_of_device" ...

So now we know that the adb has recognized our physical device, let's try connecting to it ...
cheeky@ubuntu:~$ adb -s *serialnumber_of_device* shell
shell@grouper:/


For squirrels and giggles, lets try to list the files in the protected "/data/data/" directory ...
127|shell@grouper:/ $ ls /data/data
opendir failed, Permission denied
1|shell@grouper:/ $

We also can't do a directory listing of "/data/app/" (where the .apk install files are located) ...
shell@grouper:/ $ ls /data/app
opendir failed, Permission denied
1|shell@grouper:/ $


Fortunately, we CAN list the installed 3rd party packages and associated .apk file by typing "pm list packages -f -3"
1|shell@grouper:/ $ pm list packages -f -3
package:/data/app/com.zynga.words-1.apk=com.zynga.words
package:/data/app/org.malwarebytes.antimalware-2.apk=org.malwarebytes.antimalware
package:/data/app/com.accuweather.android-1.apk=com.accuweather.android
package:/data/app/com.farproc.wifi.analyzer-2.apk=com.farproc.wifi.analyzer
package:/data/app/com.modaco.cameralauncher-2.apk=com.modaco.cameralauncher
package:/data/app/org.mozilla.firefox-2.apk=org.mozilla.firefox
package:/data/app/com.adobe.reader-2.apk=com.adobe.reader
package:/data/app/com.avast.android.mobilesecurity-2.apk=com.avast.android.mobilesecurity
package:/data/app/com.skype.raider-1.apk=com.skype.raider
package:/data/app/com.evernote-2.apk=com.evernote
package:/data/app/com.google.android.apps.translate-1.apk=com.google.android.apps.translate
package:/data/app/com.evernote.skitch-2.apk=com.evernote.skitch
shell@grouper:/ $


At this point, we type "exit" to logout from the physical device.
From the output of the "pm list packages -f -3" command, we know that the WWF .apk file is "/data/app/com.zynga.words-1.apk".
So we can use the "adb pull" command to copy it to our local Ubuntu VM.
cheeky@ubuntu:~$ adb -s *serialnumber_of_device* pull /data/app/com.zynga.words-1.apk wwf.apk
1252 KB/s (23648401 bytes in 18.442s)
cheeky@ubuntu:~$

The above command copies "/data/app/com.zynga.words-1.apk" to the current directory and names it as "wwf.apk".
I didn't feel like typing the whole long .apk filename each time (lazy monkey!) so just called it "wwf.apk".
Anyhow, a copy of the WWF apk is now stored as "/home/cheeky/wwf.apk".

2. Create/Launch emulator

Now we create an Android emulator and fire it up ...
The Android website has some detailed instructions about creating/running the emulator here, here and here.
Here's the quick version ...
- Assuming you still have the eclipse IDE open, go to the "Window" ... "Android Virtual Device (AVD) Manager" menu item and create a new device similar to the following:

Test emulator device specs
Note: Be sure to tick the "Use Host GPU" checkbox to improve emulator speed. It also helps to ensure your VM has plenty of RAM.
- Start the device emulator by selecting the "testtab" device AVD and clicking "start". Alternatively, you can launch the AVD Manager GUI from the command line instead of via eclipse ...
cheeky@ubuntu:~$ /home/cheeky/adt-bundle-linux-x86-20140702/sdk/tools/android avd

The emulator can take a minute to boot but eventually you should see something like this:

Emulator at startup

Now we can see if our emulator is recognized by typing "adb devices". Note: Our physical Nexus 7 device has been disconnected from the PC so it doesn't appear now.
cheeky@ubuntu:~$ adb devices
List of devices attached
emulator-5554    device

cheeky@ubuntu:~$


Update 2014-07-26:
To research Google product artefacts such as GoogleMaps and Hangouts, you can use an emulator with a Google APIs target set (eg "Google APIs - API Level 19") instead of an Android target as shown previously.

According to the official documentation, Google Play services can only be installed on an emulator with an AVD that runs a Google APIs platform based on Android 4.2.2 or higher. To be able to use a Google API in the emulator, you must also first install the target Google API system image (eg "Google APIs - API Level 19") from the SDK Manager.

3. Installing an .apk on the emulator

To install the "wwf.apk" we previously pulled off our Nexus 7 device, we type the following:
cheeky@ubuntu:~$ adb -s emulator-5554 install wwf.apk
2929 KB/s (23648401 bytes in 7.883s)
    pkg: /data/local/tmp/wwf.apk
Success
cheeky@ubuntu:~$


There should now be a WWF icon in the emulator's "App" screen which we can launch by clicking on it.


WWF is now installed!

We should now see a login screen for WWF where we can provide an email address and start playing games / chatting with others (ie create lots of juicy artefacts!).
By default, the emulator retains data between emulator launches so you shouldn't lose much/any app data if/when the emulator closes (eg after a crash).

4. Capture artefact data (via adb pull and DDMS)

For squirrels and giggles, let's connect to the emulator and see what our access privileges are (remembering that they were limited on the Nexus 7 device) ...
cheeky@ubuntu:~$ adb -s emulator-5554 shell
root@android:/ #


Now lets try viewing the "/data/data/" directory ...
root@android:/ # ls /data/data
com.android.backupconfirm
com.android.browser
com.android.calculator2
com.android.calendar
com.android.camera
com.android.certinstaller
com.android.contacts
com.android.customlocale2
com.android.defcontainer
com.android.deskclock
com.android.development
com.android.development_settings
com.android.dreams.basic
com.android.emulator.connectivity.test
com.android.emulator.gps.test
com.android.exchange
com.android.fallback
com.android.gallery
com.android.gesture.builder
com.android.htmlviewer
com.android.inputdevices
com.android.inputmethod.latin
com.android.inputmethod.pinyin
com.android.keychain
com.android.launcher
com.android.location.fused
com.android.mms
com.android.music
com.android.netspeed
com.android.packageinstaller
com.android.phone
com.android.protips
com.android.providers.applications
com.android.providers.calendar
com.android.providers.contacts
com.android.providers.downloads
com.android.providers.downloads.ui
com.android.providers.drm
com.android.providers.media
com.android.providers.settings
com.android.providers.telephony
com.android.providers.userdictionary
com.android.quicksearchbox
com.android.sdksetup
com.android.settings
com.android.sharedstoragebackup
com.android.smoketest
com.android.smoketest.tests
com.android.soundrecorder
com.android.speechrecorder
com.android.systemui
com.android.vpndialogs
com.android.wallpaper.livepicker
com.android.widgetpreview
com.example.android.apis
com.example.android.livecubes
com.example.android.softkeyboard
com.svox.pico
com.zynga.words
jp.co.omronsoft.openwnn
root@android:/ #


Note: We are logged into our emulator as "root" so that's why we can now see the contents of "/data/data/" :)
Let's double-check that WWF was installed OK ...

root@android:/ # pm list packages -f -3
package:/data/app/GestureBuilder.apk=com.android.gesture.builder
package:/data/app/SmokeTestApp.apk=com.android.smoketest
package:/data/app/SmokeTest.apk=com.android.smoketest.tests
package:/data/app/WidgetPreview.apk=com.android.widgetpreview
package:/data/app/ApiDemos.apk=com.example.android.apis
package:/data/app/CubeLiveWallpapers.apk=com.example.android.livecubes
package:/data/app/SoftKeyboard.apk=com.example.android.softkeyboard
package:/data/app/com.zynga.words-1.apk=com.zynga.words
root@android:/ #


Now we can search for WWF chat artefacts in "/data/data/" ...
Note: Because the Nexus 7 doesn't have a removable SD card, we don't have to worry about checking "/mnt/sdcard/" for app artefacts. But just keep it in mind for other devices which may support app data storage on the SD card.

Let's do a file listing of the WWF directory ...
root@android:/ # ls /data/data/com.zynga.words/                               
app_storage
cache
databases
files
lib
shared_prefs
root@android:/ #


Looking closer at the "databases" directory ...
root@android:/ # ls /data/data/com.zynga.words/databases/                     
WordsFramework
WordsFramework-journal
cookiedb
cookiedb-journal
google_analytics_v2.db
google_analytics_v2.db-journal
mobileads.sqlite
mobileads.sqlite-journal
msc_profiles.db
msc_profiles.db-journal
webview.db
webview.db-journal
webviewCookiesChromium.db
webviewCookiesChromiumPrivate.db
wwf.sqlite
wwf.sqlite-journal
root@android:/ #


Hmmm ... it looks like this could contain some interesting info.

Update 2014-07-26:
By using the "lsof" command on the emulator whilst our target app is running, we can see a list of currently open files .
This can then be used to locate application artefact files (eg databases). For example, we can type:
lsof | grep com.zynga.wordswhich should also lead us to various files open in "/data/data/com.zynga.words/databases/".

We type "exit" to logout for now.
Now we can "pull" these files of interest from the emulator for further analysis. For example, I'm now going to skip ahead and pull the file where I eventually found the WWF chat artefacts ...
cheeky@ubuntu:~$ adb -s emulator-5554 pull /data/data/com.zynga.words/databases/WordsFramework
1111 KB/s (159744 bytes in 0.140s)
cheeky@ubuntu:~$


Alternatively, we can use the eclipse IDE and the Dalvik Debug Monitor Server (DDMS) tool to "pull" files. If our emulator is running (first, you better go catch it!), DDMS should connect to it automagically.
To launch the DDMS tool, use the following eclipse drop down menu - "Window" ... "Open Perspective" ... "DDMS"
The DDMS tool allows us to a bunch of cool things such as:
- pull/push files to the emulator
- dump process heap memory
- spoof phone calls (logs connections only / not capable of voice transmission/reception)
- send SMS text messages to the emulator
- set the GPS location of the phone
More information on DDMS is available here.

OK you should see the emulator on the LHS under "Devices" and the "File Explorer" tab on the RHS.

Dalvik Debug Monitor Server running in eclipse


Under the "File Explorer" Tab, browse to "/data/data/com.zynga.words/databases"
Then select the "WordsFramework" file and click the floppy disk icon to "pull" the file onto your Ubuntu box.

For squirrels and giggles, we can also dump the WWF process heap memory and later search it for interesting strings.
To do this, on the LHS, select the "com.zynga.words" process, toggle the "Update Heap" button (making DDMS continuously monitor the heap) and then click on the "Dump HPROF file" button (looks like cylinder with red arrow pointing down).

Dumping the process heap memory via DDMS


Next select the "Leak Suspects Report" and wait ... FYI we're not interested in the report as much as the accompanying HPROF dump.
After a while, the title bar changes to reflect that the .hprof is stored with a ridiculously long numeric filename in "/tmp/".

Obtaining the HPROF file

We can now run "strings" against this file and review ...
For example, I checked the validity of the word "twerk" using the emulator's WWF "Word Check" and then dumped the process heap. I then ran the "strings" command and piped the output to a separate text file ("8058l.txt") for easier analysis/searching.
cheeky@ubuntu:~$ strings --encoding=l /tmp/android1924397721207258058.hprof > 8058l.txt
From the text file output, I was able to observe a bunch of little endian UTF-16 strings (possibly a dictionary update) contained in memory. Opening the .hprof file separately in a hex editor also confirms this.

Viewing the HPROF in a Hex Editor shows something squirrelly ...

Entering some selected words from this list into the WWF "Word Check" confirmed that they are acceptable words.
I think we just found us a squirrel! Who's a good monkey eh? Purr ...

DDMS also has a "Network Statistics" tab view available but this was not working with my emulator. It apparently can be hit and miss according to this Google Android Issue Tracker notice.
Anyway, there's more than one way to get network stats ...
First, we login to the emulator (without WWF running) and take a baseline list of network connections via "netstat"
cheeky@ubuntu:~$ adb -s emulator-5554 shell
root@android:/ #
root@android:/ # netstat
Proto Recv-Q Send-Q Local Address          Foreign Address        State
 tcp       0      0 127.0.0.1:5037         0.0.0.0:*              LISTEN
 tcp       0      0 0.0.0.0:5555           0.0.0.0:*              LISTEN
 tcp       0      0 10.0.2.15:5555         10.0.2.2:52782         ESTABLISHED
tcp6       0      1 ::ffff:10.0.2.15:39776 ::ffff:74.125.237.134:80 CLOSE_WAIT
root@android:/ # 


According to our emulator's "Settings" ... "About Phone" ... "Status" our emulator's IP address is 10.0.2.15 which we can see under the "Local Address" columns.
Now we launch WWF and then take another "netstat" ...
root@android:/ # netstat                                                      
Proto Recv-Q Send-Q Local Address          Foreign Address        State
 tcp       0      0 127.0.0.1:5037         0.0.0.0:*              LISTEN
 tcp       0      0 0.0.0.0:5555           0.0.0.0:*              LISTEN
 tcp       0      0 10.0.2.15:45370        74.125.237.153:80      ESTABLISHED
 tcp       0      0 10.0.2.15:38258        162.217.102.145:80     ESTABLISHED
 tcp       0      0 10.0.2.15:38261        162.217.102.145:80     ESTABLISHED
 tcp       0      0 10.0.2.15:37321        117.18.237.203:80      ESTABLISHED
 tcp       0      0 10.0.2.15:45369        74.125.237.153:80      ESTABLISHED
 tcp       0      0 10.0.2.15:58077        75.101.157.204:80      ESTABLISHED
 tcp       0      0 10.0.2.15:45377        74.125.237.153:80      ESTABLISHED
 tcp       0      0 10.0.2.15:52200        74.125.237.155:80      ESTABLISHED
 tcp       0      0 10.0.2.15:58075        75.101.157.204:80      ESTABLISHED
 tcp     256      0 10.0.2.15:5555         10.0.2.2:52782         ESTABLISHED
 udp       0      0 0.0.0.0:43392          0.0.0.0:*              CLOSE
tcp6       0      0 ::ffff:10.0.2.15:50755 ::ffff:184.75.170.203:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:42147 ::ffff:54.200.105.113:80 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:57205 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:60082 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:37329 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:32939 ::ffff:184.75.170.203:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:38420 ::ffff:74.114.12.129:80 TIME_WAIT
tcp6       0      1 ::ffff:10.0.2.15:39776 ::ffff:74.125.237.134:80 CLOSE_WAIT
tcp6       0      0 ::ffff:10.0.2.15:56398 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:53006 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:55032 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:35837 ::ffff:184.75.170.203:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:57751 ::ffff:54.200.105.113:80 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:57661 ::ffff:184.75.170.203:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:41296 ::ffff:74.114.12.129:80 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:42812 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:42356 ::ffff:74.114.12.129:80 TIME_WAIT
root@android:/ # 


Lastly, we start an in-game chat, send a message and take a "netstat" ...
root@android:/ # netstat                                                      
Proto Recv-Q Send-Q Local Address          Foreign Address        State
 tcp       0      0 127.0.0.1:5037         0.0.0.0:*              LISTEN
 tcp       0      0 0.0.0.0:5555           0.0.0.0:*              LISTEN
 tcp       0      0 10.0.2.15:45370        74.125.237.153:80      ESTABLISHED
 tcp       0      0 10.0.2.15:38258        162.217.102.145:80     ESTABLISHED
 tcp       0      0 10.0.2.15:47694        162.217.102.126:80     ESTABLISHED
 tcp       0      0 10.0.2.15:53239        106.10.198.32:80       TIME_WAIT
 tcp       0      0 10.0.2.15:45369        74.125.237.153:80      ESTABLISHED
 tcp       0      0 10.0.2.15:37868        74.125.237.218:80      ESTABLISHED
 tcp       0      0 10.0.2.15:58077        75.101.157.204:80      TIME_WAIT
 tcp       0      0 10.0.2.15:45377        74.125.237.153:80      ESTABLISHED
 tcp       0      0 10.0.2.15:37867        74.125.237.218:80      ESTABLISHED
 tcp       0      0 10.0.2.15:52200        74.125.237.155:80      ESTABLISHED
 tcp       0      0 10.0.2.15:45211        31.13.70.17:80         ESTABLISHED
 tcp       0      0 10.0.2.15:51946        50.97.236.98:80        ESTABLISHED
 tcp       0      0 10.0.2.15:58075        75.101.157.204:80      ESTABLISHED
 tcp     103      0 10.0.2.15:5555         10.0.2.2:52782         ESTABLISHED
 tcp       0      0 10.0.2.15:36626        125.56.204.112:80      ESTABLISHED
 udp       0      0 0.0.0.0:43392          0.0.0.0:*              CLOSE
tcp6       0      0 ::ffff:10.0.2.15:42147 ::ffff:54.200.105.113:80 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:57205 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:37329 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:47440 ::ffff:54.186.69.248:80 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:38420 ::ffff:74.114.12.129:80 TIME_WAIT
tcp6       0      1 ::ffff:10.0.2.15:39776 ::ffff:74.125.237.134:80 CLOSE_WAIT
tcp6       0      0 ::ffff:10.0.2.15:56398 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:53006 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:45434 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:55032 ::ffff:74.114.12.129:443 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:58791 ::ffff:54.186.69.248:80 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:57751 ::ffff:54.200.105.113:80 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:41296 ::ffff:74.114.12.129:80 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:42356 ::ffff:74.114.12.129:80 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:53812 ::ffff:54.187.30.152:80 TIME_WAIT
tcp6       0      0 ::ffff:10.0.2.15:58248 ::ffff:74.114.12.129:443 TIME_WAIT
root@android:/ #


Obviously ESTABLISHED means theres a connection between our local host 10.0.2.15 and a remote host.
TIME_WAIT means there WAS a connection but it is in the process of being closed. Thus you can also sometimes see previous TCP connections.
These netstat listings are snapshots at a particular time so you have to be quick and/or take several samples.

Now we can do a "whois" lookup on these IP addresses and find out a bit more info. Admittedly, these connections could be from ANY process on the emulator but as WWF is the only installed app, it's likely that it's the process responsible for these communications.
Without getting into too much detail, some of these remote IP addresses resolved back to companies such as Google, Voxel.net / Twitter, Yahoo, Amazon, Facebook, Softlayer Technologies and Akamai.

You could also fire up Wireshark or tcpdump and capture packets for analysis but depending on your jurisdiction this may be illegal (ie intercepting communications without consent). Anyhow, monkey won't be doing that - his delicate simian features wouldn't survive prison!

To use the DDMS Call / SMS / GPS functionality ...
Select "Window" ... "Show View" ... "Other" ... and then start typing "Emulator Control".
This should bring up a new tab where you can send SMS / simulate a voice call / set the emulator's GPS location.
Making voice call logs and sending SMS worked well with my emulator. However, I have not been able to verify that the GPS functionality works. Launching Googlemaps in the emulator's web browser and clicking on the crosshairs icon results in Googlemaps reporting "Your location could not be determined" ... *shrug*
FYI A couple of times, using the emulator control functionality also made the emulator laggy / caused a crash.

DDMS also allows users to grab screenshots from the emulator via the camera button.
However, we can also do screenshots from our emulator window via the Alt-PrintScreen button combo in a VMware environment.

5. Viewing the source code

- Open the .apk file using Ubuntu Archive Manager (or similar unzip app)
- Extract "classes.dex" (to minimize confusion I extracted it as "words-classes.dex" to "/home/cheeky/")
- Run "d2j-dex2jar.jar" from the install directory (eg "/home/cheeky/dex2jar-0.0.9.15/")
cheeky@ubuntu:~/dex2jar-0.0.9.15$ ./d2j-dex2jar.sh /home/cheeky/words-classes.dex
dex2jar /home/cheeky/words-classes.dex -> words-classes-dex2jar.jar
cheeky@ubuntu:~/dex2jar-0.0.9.15$


There will now be a "words-classes-dex2jar.jar" in the current direcotry (eg "/home/cheeky/dex2jar-0.0.9.15/")
- Run JD-GUI (either from command line or by double-clicking the extracted exe) and open the "words-classes-dex2jar.jar" file.
Be aware that not all of the code may be easily understandable due to obfuscation (eg class "a" has non-descriptive method/function names such as "a", "b" etc)
Anyhoo, now you can have fun basking in all things Java ... smells wonderful huh? ;)

6. Other .apk activities

We can use the "aapt" SDK tool to determine the Android permissions for the "wwf.apk" we pulled earlier ...
Note: the "aapt" tool is installed by default with the bundle and located in the latest android version sub-directory (android-4.4W in our case).
cheeky@ubuntu:~$ /home/cheeky/adt-bundle-linux-x86-20140702/sdk/build-tools/android-4.4W/aapt dump permissions wwf.apk
package: com.zynga.words
uses-permission: android.permission.INTERNET
uses-permission: android.permission.ACCESS_WIFI_STATE
uses-permission: android.permission.ACCESS_NETWORK_STATE
uses-permission: android.permission.READ_PHONE_STATE
uses-permission: android.permission.READ_CONTACTS
uses-permission: android.permission.SEND_SMS
uses-permission: android.permission.WRITE_EXTERNAL_STORAGE
uses-permission: android.permission.VIBRATE
uses-permission: android.permission.RECEIVE_BOOT_COMPLETED
uses-permission: com.android.vending.BILLING
uses-permission: com.android.launcher.permission.INSTALL_SHORTCUT
uses-permission: com.android.launcher.permission.UNINSTALL_SHORTCUT
uses-permission: android.permission.GET_ACCOUNTS
uses-permission: android.permission.AUTHENTICATE_ACCOUNTS
uses-permission: android.permission.USE_CREDENTIALS
uses-permission: android.permission.NFC
uses-permission: android.permission.ACCESS_FINE_LOCATION
uses-permission: com.google.android.c2dm.permission.RECEIVE
permission: com.zynga.words.permission.C2D_MESSAGE
uses-permission: com.zynga.words.permission.C2D_MESSAGE
uses-permission: com.sec.android.provider.badge.permission.READ
uses-permission: com.sec.android.provider.badge.permission.WRITE
cheeky@ubuntu:~$

For a listing of possible Android permissions, see here.
For more details on the "aapt" tool, see here.

It can also be handy to explore what other files are included in an .apk.
For example, the "res" directory holds resources such as pics, sounds.
See here and here for further details on the apk archive structure and the apk building process.

Opening the "wwf.apk" file using Ubuntu Archive Manager, we also note that under the "res/raw/" directory there exists the "dict" file - that sounds pretty squirrelly eh?
Unfortunately, opening it in a Hex editor shows that it's encoded somehow :'(
So no free WWF word list for you! Hey, it was worth a shot ...

7. Creating a Chat Extraction script

OK we're both losing the will to go on, so I'll start finishing up by mentioning the WWF chat artefacts and describing the accompanying extraction script.

Where are the chat artefacts stored?
Under "/data/data/com.zynga.words/databases/" there is a "WordsFramework" file.

I discovered this by "pulling" all the files from the "databases" directory and looking at them using the Firefox SQLite Manager or the Bless hex editor.
We can also use the Linux "file" command to figure out what type of file it is.
cheeky@ubuntu:~$ file WordsFramework
wwf/WordsFramework: SQLite 3.x database, user version 220
cheeky@ubuntu:~$


Opening it up in Firefox SQLite Manager, we can see that theres 2 tables of interest - "users" and "chat_messages"
Here's a diagram showing how the 2 tables go together ... well, it's been abbreviated down to what monkey considers the important fields anyway ...

WWF chat schema

From our previous adventures in Python SQLite (eg Facebook Messenger post), we know how to query/extract this kind of stuff. Here's the query that gets us the chat artefacts ...
SELECT chat.chat_message_id, chat.game_id, chat.created_at, users.name, chat.message, chat.user_id, users.email_address, users.phone_number, users.facebook_id, users.facebook_name, users.zynga_account_id
FROM chat_messages as chat, users
WHERE users.user_id = chat.user_id ORDER BY chat.created_at;


The script ("wwf-chat-parser.py") opens the specified "WordsFramework" file, runs the above query and prints out the chat messages in chronological order.
If there's multiple game conversations going on, the analyst can (manually) use the "game_id" to filter out conversations from the TSV output.

It has been developed/tested for Python 2.7.3 on a 32-bit Ubuntu 12.04 LTS VM.
It can be downloaded from Github here.

Making the script executable (via "sudo chmod a+x wwf-char-parser.py") and running it with no arguments shows the help text.
cheeky@ubuntu:~/wwf$ ./wwf-chat-parser.py
Running wwf-chat-parser v2014-07-11
Usage: wwf-chat-parser.py -d wordsframework_db -o chat_output.tsv

Options:
  -h, --help    show this help message and exit
  -d FRAMEWKDB  WordsFramework database input file
  -o OUTPUTTSV  Chat output in Tab Separated format
cheeky@ubuntu:~/wwf$


Here's what the command line output looks like when using the "WordsFramework" file from our emulated chat ...
cheeky@ubuntu:~/wwf$ ./wwf-chat-parser.py -d /home/cheeky/WordsFramework -o wwf-output.tsv
Running wwf-chat-parser v2014-07-11

Extracted 9 chat records

cheeky@ubuntu:~/wwf$


With that many columns returned by the query, outputting to the command line just looked too crappy/confusing.
So instead, the script creates a Tab Separated (TSV) file for the output instead.
Here's what the TSV output file would look like if imported into an LibreOffice Calc spreadsheet ...

TSV output of "wwf-chat-parser.py"

Some comments/observations:
- Data is fictional and is included just to illustrate which fields are extracted. Any id numbers and names have been changed to protect the Simians.
- The "created_at" times appear to be referenced to GMT (Don't trust me ... verify it for yourself).
- The "zynga_account" and "email_address" fields are only populated for the device owner (ie "emulator-monkey"). The opponent's corresponding details aren't populated.
- The "phone_number" field does not appear to be populated at all (FYI the emulator's "Settings" ... "About phone" ... "My phone number" is 1-555-521-5554).
- The highlighted rows 5-6 are from a chat between "emulator-monkey" and "3rd-party-monkey" and show how it is possible to use the "game_id" value to determine conversation threads.
- Monkey does not use Facebook so the Facebook ID and name are included for completeness but not tested. It is apparently possible to use your Facebook login to login in to WWF.
- This script relies on allocated chats (ie chats from active games). There might be chat strings still present in the "WordsFramework" database from previous completed/expired games but I haven't had time to research this area. Running "strings" on the WordsFramework file and/or opening it in a Hex editor might help in that regard.

8. Resources

Here are some resources that I found useful while researching for this post ...
The Official Android SDK documentation (eg how to install the SDK, run the emulator, install the app etc.)
http://developer.android.com/sdk/index.html

Cindy Murphy's (@CindyMurph) webcast/slides on reversing Android malware
https://www.sans.org/webcasts/mobile-malware-spyware-working-bugs-97790
http://www.nist.gov/forensics/upload/8-Murphy-NIST-Mobile-Malware-normal.pdf


Pau Oliva Fora's (@pof) RSA presentation on reverse engineering Android Apps
http://www.rsaconference.com/writable/presentations/file_upload/stu-w02b-beginners-guide-to-reverse-engineering-android-apps.pdf

Thomas Cannon's (@thomas_cannon) blog post on reverse engineering Android
http://thomascannon.net/projects/android-reversing/
and his "Gaining access to Android" presentation from DEFCON 20
http://www.youtube.com/watch?v=MOYqgIhQ3y0

Lee Reiber's (@Celldet) Forensic Focus webcast on malware detection
http://www.forensicfocus.com/c/aid=57/webinars/2013/mobile-forensics-mpe-android-malware-detection/

9. Final Words

We have been able to use the increased privileges of the Android emulator to uncover and harvest Android application artefacts.
This can be used by researchers to develop forensic extraction scripts without requiring actual rooted physical devices (or expensive commercial forensic tools).

Hopefully, this post will help to address the mobile device "app gap" that currently exists between commercial forensic tools and the sheer number of apps available on the GooglePlay market.
But if not, at least we got to chase some squirrels and learn some new things about WWF chats.

Some initial research shows that Microsoft's Windows Phone emulator cannot currently be used in a similar manner because the MS emulator does not allow you to load Windows marketplace apps onto it. It seems like it's mainly for testing apps that you write yourself. It has trust issues apparently.
As for iOS, monkey doesn't have access to OS X or any Apple devices (shocking!) so can't say whether the iOS emulator supports loading apps from the store and/or is "jailbroken" by default.

OK due to time/space constraints and a tired monkey ("What do you mean Red Bull doesn't come in Banana flavour?!"), we're gonna stop here ... There are probably a few squirrels that escaped but at this point, something is better than nothing eh?

Friday, 13 June 2014

Monkeying around with Windows Phone 8.0

Ah, the wonders of Windows Phone 8.0 ... Failing eyesight, Frustration and Squirrel chasing

Currently, there is not much freely available documentation on how Windows Phone 8.0 stores data so it is hoped that the information provided in this post can be used as a stepping stone for further research / possible scripting. Hopefully, analysts will also be able to use this post to help validate any future tool results.

Special Thanks to Detective Cindy Murphy (@CindyMurph), Lieutenant Jennifer Krueger Favour (@rednogn) and the Madison Police Department ("Forensicate Like A Champion!") for providing the opportunity and encouragement for this research.
Unfortunately, due to time contraints and a limited test data set, I wasn't able to write an all-singing/all-dancing script. Instead, some one-off scripts were created to extract/sort the relevant data a lot quicker than it would have taken to do manually. Rather than releasing scripts that are customized for a limited set of test data (which I don't have easy access to any more) - this post will be limited to documenting the data sources/structures.

OK, so no free tool and you're still here reading huh? In Yoda voice: "The nerd runs strong in this one" ;)

Thanks to Maggie Gaffney from Massachusetts State Patrol / Teel Technologies, the initial test data (.bin file) was sourced via JTAG from a Nokia 520 Windows 8.0 phone - a "cheap" smart phone common to prepaid plans. The .bin file was then opened in X-Ways Forensics to parse the 28(!) file system partitions and to export out files of interest. The exported files were then viewed in hex view using Cellebrite Physical Analyzer (love the data interpretation and colour coded bookmarking!). Later, we were also able to get our paws on some test data from a HTC PM23300 Windows Phone 8.0 phone courtesy of JoAnn Gibb from the Ohio Attorney Generals Office. UPDATE: Thanks also to Brian McGarry (Garda) for his testing feedback and help with the SMS and Call Logs. It's awesome knowing people that know people!

Note: The Nokia 520 does not display the full SMS timestamp info (threaded messages display date only).
So while we can potentially re-create the order of threaded messages as per the test phone, we can't easily validate the exact time an SMS message was sent/received. There's a good chance that other Windows Phone 8.0 phones will use the same timestamp mechanism and hopefully they will display the full timestamp.

So where's the data?!

The SMS content, MMS file attachment info and Contacts information are stored (via the 28th Partition) in:

\Users\WPCOMMSSERVICES\APPDATA\Local\Unistore\store.vol

Various .dat files containing MMS content are also stored in sub-directories of:

\SharedData\Comms\Unistore\data

The Call log is stored in:

\Users\WPCOMMSSERVICES\APPDATA\Local\UserData\Phone

The "store.vol" and "Phone" files seem to be ESE Databases (see explanantions here and here) with the magic number of "xEF xCD xAB x89" present at bytes 4-8. Consequently, we tried opening "store.vol" using Nirsoft's ESE Database viewer but had limited success - the SMS message texts were not viewable however other data was. This suggests that maybe the "store.vol" file differs in some way from the ESE specification and/or the tool had issues reading the file.
Joachim Metz has also both documented (here  and here) and written a C library "libesedb" to extract ESE databases. Unfortunately, I didn't discover Joachim's library until after we started poking around .. Anyway, it was a pretty masochistic interesting exercise trying to reverse engineer the "store.vol" file. One possible benefit of this data diving is that it *might* also reveal unallocated/partially overwritten data records that might be ignored by libraries which read the amount of data declared (vs reading all the data present). This is pure speculation though as I don't know if old records are overwritten or just marked as invalid.

Viewing "store.vol" using Cellebrite Physical Analyzer, relevant data was observed for text strings (eg phone numbers, SMS text strings) encoded in UTF-16 LE throughout the file.
As a database file there will be tables. Each table will have columns of values (eg time, text content, flags). A single (table row) record will thus have data stored for each column.
Table data will be organized within the file somehow (eg multiple SMS records organized into page blocks). So it is likely that finding a hit for a specific SMS will lead you to the contents of other SMS messages (potentially around the same timeframe).

The Nokia 520 was actually locked with a 4 digit PIN when we started investigating. Without access to the phone, any manual inspection/validation would have been impossible. It was unknown if the phone would have been wiped if too many incorrect PINs were entered. So any guesses would have to be documented and carefully chosen. It wasn't looking good ... until a combination of thinking outside the box and a touch of luck lead us to an SMS text message (in "store.vol") with the required 4 digit code. Open sesame!

Some things we tried with the data ...

To find specific SMS records we searched for unique/known strings from the SMS text (eg "Look! A Squirrel!"). A single record was found per SMS in "store.vol" and each record also contained a UTF-16-LE string set to "SMStext".

To find contact information, we searched for known phone number strings (eg +16085551234, 123456, 1234). Some numbers were observed in "store.vol" in close proximity to "SMStext" strings while other instances were located close to what appeared to be contact information (eg contact names).

To search for field markers and flags, we compared separate SMS text records and looked for patterns/commonalities in the hex. Sometimes the pattern was obvious (eg "SMStext" occurs in each SMS message) and sometimes it wasn't so obvious (sometimes there is no discernible pattern!).

Figuring out the timestamp format being used was HUGE. Without it, we could not have figured out the order messages were sent/received. Using Cellebrite Physical Analyzer to view the "store.vol" hex, Eagle-eyed Cindy noticed that there were 8 byte groupings occurring before/after the SMS text content. These 8 bytes were usually around the same value range (eg in LE xFF03D2315FE1C701). Which is what you'd expect within a single message. Subsequent messages usually had larger values - which corresponds to messages sent/received at a later time.
Like most hex viewers, Cellebrite Physical Analyzer can interpret a predefined number of bytes from the current cursor position and print a human friendly version. Using this, Calculon Cindy showed an otherwise oblivious monkey that these 8 byte groupings could be interpreted as MS FILETIME timestamps! To be honest, I was expecting smaller 4 byte timestamps - Silly monkey!
By comparing the 8 byte values surrounding a specific SMS text message (eg "Look! A Squirrel!") with the date displayed on the phone for that message, we theorized that our mysterious timestamps were *probably* MS FILETIME timestamps (No. of 100 ns increments since 1 January 1601 in UTC). For example, xFF03D2315FE1C701 = Sat, 18 August 2007 06:15:37 UTC. As the phone did not display the exact time for each SMS, we could only use the order of threaded messages and the date displayed to somewhat confirm our theory. Various SMS sent/received dates on the phone were spot checked against a corresponding "store.vol" entry timestamp date and the date values consistently matched.
UPDATE: FTK 5.4 can also be used to view the database tables in the store.vol and Phone files. Thanks to JoAnn for the tip!

What the data looks like

After some hex ray vision induced cross-eyedness (who knew that looking at hex is almost like a curse!), we think we've figured out some general data structures for SMS, MMS, Contacts and Call log records. There's still some unknowns/grey areas but it's a start.

- On the data structure diagrams below, "?" is used to denote varying/unknown number of bytes.
- FILETIMEs are LE 8 byte integers representing the number of 100 ns intervals since 1 JAN 1601.
- In general, strings are null terminated and UTF-16-LE encoded (ie 2 bytes per character).

Sent / Received SMS records

There are two types of SMS data structures which are mixed together. Each type of SMS structure contains a UTF-16-LE encoded string for "IPM.SMStext". However, one type contains phone number strings and the other does not.
For later ease of understanding, we'll say these "SMStext" records occur in "Area 1". UPDATE: Area 1 corresponds to the "Message" table.
Initially, monkey was confused about why some SMS records had phone numbers and some didn't. However, by inspecting the unlocked phone, we were able to confirm that the SMS message records with no number corresponded to sent SMS.

Sent "SMStext" record (from Area 1 in "store.vol")

Note 1: Note the lack of Phone number information. From test data, FILETIME values (in red and pink) seemed a little inconsistent. Sometimes FILETIMEs within the same record matched each other and other times they varied by seconds/minutes.
Note 2: The Sent Text string (in yellow) is null terminated and encoded in UTF-16-LE.


Received "SMStext" record (from Area 1 in "store.vol")


Note 1: Received SMS have multiple source phone number strings listed (in orange). These seem to remain constant within a given record (eg PHONE0 = PHONE1 = PHONE2 = PHONE3)
Note 2: Similar to Sent "SMStext" records, the FILETIMEs (in red and pink) within a record might/might not vary.
Note 3: The Received Text string (in yellow) is null terminated and encoded in UTF-16-LE.

To find out the destination phone number for a sent SMS we can make use of the factoid observed by searching "store.vol" for the FILETIMEs from a specific Sent "SMStext" record.
It appears that FILETIMEs 1, 3 & 4 (in pink) from a given Sent "SMStext" record usually occur once in the entire "store.vol". The FILETIME2 value (in red) however, also appears in a second area ("Area 2"). UPDATE: Area 2 corresponds to the "Recipient" table. This area has a bunch of different looking data records each containing the null terminated UTF-16-LE encoded string for "SMS". Also contained in each data record is a phone number string. The "Area 2" SMS records look like:

"SMS" record (from Area 2 in "store.vol")


Note 1: Each "SMS" record contains a UTF-16-LE encoded string for "SMS".
Note 2: From both sets of test data, there seems to be a consistent number of bytes between:
- The FILETIMEX (in red) and "SMS" string (in kermit green) and
- The "SMS" string (in kermit green) and the Phone number string (in orange).

So, each sent "SMStext" FILETIME2 value (from Area 1) should have a corresponding match with an "SMS" record's FILETIMEX value (in Area 2). In this way, we can match a sent "SMStext" message with the destination phone number via the FILETIME2 value. Sounds a little crazy right? But the test data seems to confirm this. Purrr!

Contacts

Contact information is also located in "store.vol". UPDATE: This area corresponds to the "Contact" table. There were 2 main observed data structure types - both contained phone number and name information however, one data type had an extra 19 digit number string. It was later discovered via phone inspection that the records with the extra digit strings corresponded with "Hotmail" address book entries. It would be interesting to see if the 19 digit number corresponded to a unique hotmail user ID of some kind.
The second type of contacts structure was a "Phonebook" entry - presumably these contact types were entered into the phone by the user rather than slurped up from a Hotmail account.
Common to both contact records were multiple occurrences of the same contact name and phone number. OCD phonebook, OCD phonebook, OCD phone book ... ;)

"Hotmail" Contacts record (from "store.vol")

"Phonebook" Contacts record (from "store.vol")

Note 1: The flag value (in red) which can be used to determine if the contact record is a "Hotmail" or "Phonebook" entry.
Note 2: The potential 6 byte magic number (0xFFFFFF2A2A00) for Contact records should make it easier to find each entry. This was discovered by Sharp-eyed Cindy on the last day (by which time monkey had lost the will to live).
Note 3: There is also an "End Marker" which has the following value in hex: [01 04 00 00 00 82 00 E0 00 74 C5 B7 10 1A 82 E0 08]. This value lead to a couple of extra contact records which did not have the 6 byte magic number at the beginning.
Note 4: The 19 digit string (in pink) could be a potential Hotmail ID.

UPDATE: Since this was originally written, new Contact test data has been observed. These have slightly different record structures but all records seem to have the same "End Marker" and the last 3 Unicode string fields. The last and 3rd last strings can thus be extracted for name/phone (and possibly email) information.

MMS data

Further research is required for MMS records (eg linking timestamps and phone numbers to sent files). But here's what we've learned so far ...
Various .dat files containing MMS content (eg there was a .dat file containing a sent JPEG and another .dat file containing the accompanying text) are stored in:

\SharedData\Comms\Unistore\data

under 3 sub-directories: "0", "2" and "7". These folders might correspond to Sent, Received and Draft???
There were multiple .dat files with similar names each seemingly containing info for different parts of the same MMS.

In "store.vol", there are records containing the UTF-16-LE encoded string for "MMS". These records also contain 3 filename strings and a filetype string (possibly the MIME type eg "image/jpeg"). From my jet-lagged memory, I want to say that the filename strings were pointing to the same filename and there were multiple "MMS" entries for a single MMS message (ie each MMS message has three separate files associated with it). But you should probably should check it out for yourself ...
UPDATE: These MMS records correspond to the "Attachment" table.

MMS record (from "store.vol")

Call log

The Call log information is located in the "Phone" file. Each Call log record contains a flag (in blue) to mark whether a call record is Missed / Incoming / Outgoing. The flag values were confirmed via inspection of the phone and corresponding Call log record. There's also Start and Stop FILETIMEs, repeated contact names and repeated phone numbers.
Of potential interest is a 10 digit ASCII encoded string (in grey) and what looks to be a GUID (in light purple). Each call record had the same GUID string value enclosed by "{}".
UPDATE: The GUID appears to be consistent between 3 phones (2 x Nokia Lumia 520 and HTC PM23300). The ASCII ID string has also been observed to be greater/less than 10 digits.

Revised Call log diag (from "Phone")


Summary

So there you have it - we started off knowing very little about Windows Phone 8.0 data storage and now we know a considerable amount more especially regarding SMS records.
Due to time constraints, it was not possible to investigate the non-SMS related data areas (ie MMS, Call log, Contacts) with the same level of detail. However, it's probably better to share what we've discovered now as I don't know when I'll be able to perform further research.
The observations in this post may not be consistent for Windows 8.1 and/or on other models of Windows phones but hopefully this post can still serve as a starting point. As always, check that the underlying data matches your expectations!

It was really awesome having someone else to bounce ideas off when hex-diving. I'm pretty sure I would have missed some important details (eg the FILETIME timestamp) had it not been for another set of eyes. Of course, that's not always going to be possible so I also appreciated the other opportunities to work automonously / with minimal supervision. Someday monkey might have to do this on his lonesome! :o
Initially, it was easy to tie my idea of success with the "I have to code a solution for every scenario/data set". It would have been awesome if I could have done that but the fact was - we didn't have any SMS messages from "store.vol" at the start and after running the one-off SMS script, we had 5000+ messages sorted in chronological order with their associated phone numbers. Success doesn't have to be black and white. It sounds cliche but focusing on little wins each day made it easier to start eating the metaphorical elephant. Now please excuse me, while I adjust my pants ...


Thursday, 23 January 2014

Android SMS script update and a bit of light housekeeping

Knock, Knock ...

During recent research into Android SQLite databases (eg sms), Mari DeGrazia discovered a bug in the sms-grep.pl script.
Mari's test data was from a Samsung Galaxy S II. It turns out the script wasn't handling Cell Header "Serial Type" values of 8 or 9.
These Cell Header values are respectively used to represent "0" and "1" integer constants and eliminate the need for a corresponding 0x0/0x1 byte value in the Cell Data field section.
So this meant that some fields were being interpreted as "0" when they were actually set to "1". DOH!

The previous Android test data I used did not utilize these particular cell header values which is why it escaped my monkey like attention to detail. Banana? Where?!

Anyway, there's an updated version of the sms-grep.pl script available from GitHub here.

Pictures speak louder than words so lets look at a simplified example of an SQLite cell record:

SQLite Cell Record Structure


From the diagram above, we can see the usual SQLite record format. A Cell Size, Rowid and Cell Header Size followed by the rest of the Cell Header and the Cell Data sections.
Notice how HeaderField-B = 0x8? This means there will be no corresponding value written in the Cell Data section (ie there is no DataField-B).
When read, the extracted value of DataField-B will be set (to 0) based on the HeaderField-B type (0x8).
Alternatively, if the HeaderField-B type value was 0x9, the extracted value of DataField-B would be set to 1.
Simples!

Additionally, since the previous sms-grep.pl post here - both Mari and I have used sms-grep.pl to carve sms messages from a cellphone's free space.
Here's how it played out:
- Cellebrite UFED was used to generate the .bin physical image file(s) from an Android phone.
- Then the .bin file(s) were added to a new X-Ways Forensics case.
- A keyword search for various phone numbers turned up multiple hits in the 1 GB+ "Free Space" file (ie unallocated space) which was then exported/copied to SIFT v2.14.
- The script's schema config file was adjusted to match the sms table schema.
- After trying the script with a 1GB+ file, we were consistently getting out of memory errors (even after increasing the SIFT VM RAM to 3 GB).
So the Linux "split" command was used to split the 1GB+ file into 3 smaller 500 MB files.
This ran error free although it meant running the script a few more times. Meh, still better than doing it by hand!
As mentioned in a previous post, this script can potentially be used with non-sms SQLite databases especially if the search term field appears near the start of the cell data section.

From now on, all of my scripts will be hosted at GitHub. I'm not sure how much longer GoogleCode will keep my existing scripts so I have also transferred most of those to GitHub.
Because I can no longer update sms-grep.pl on GoogleCode, I have removed the previous version to minimize further confusion.

Apologies for any inconvenience caused by this script oversight and Special Thanks to Mari for both spotting and letting me know about the error!