Thursday 23 January 2014

Android SMS script update and a bit of light housekeeping

Knock, Knock ...

During recent research into Android SQLite databases (eg sms), Mari DeGrazia discovered a bug in the script.
Mari's test data was from a Samsung Galaxy S II. It turns out the script wasn't handling Cell Header "Serial Type" values of 8 or 9.
These Cell Header values are respectively used to represent "0" and "1" integer constants and eliminate the need for a corresponding 0x0/0x1 byte value in the Cell Data field section.
So this meant that some fields were being interpreted as "0" when they were actually set to "1". DOH!

The previous Android test data I used did not utilize these particular cell header values which is why it escaped my monkey like attention to detail. Banana? Where?!

Anyway, there's an updated version of the script available from GitHub here.

Pictures speak louder than words so lets look at a simplified example of an SQLite cell record:

SQLite Cell Record Structure

From the diagram above, we can see the usual SQLite record format. A Cell Size, Rowid and Cell Header Size followed by the rest of the Cell Header and the Cell Data sections.
Notice how HeaderField-B = 0x8? This means there will be no corresponding value written in the Cell Data section (ie there is no DataField-B).
When read, the extracted value of DataField-B will be set (to 0) based on the HeaderField-B type (0x8).
Alternatively, if the HeaderField-B type value was 0x9, the extracted value of DataField-B would be set to 1.

Additionally, since the previous post here - both Mari and I have used to carve sms messages from a cellphone's free space.
Here's how it played out:
- Cellebrite UFED was used to generate the .bin physical image file(s) from an Android phone.
- Then the .bin file(s) were added to a new X-Ways Forensics case.
- A keyword search for various phone numbers turned up multiple hits in the 1 GB+ "Free Space" file (ie unallocated space) which was then exported/copied to SIFT v2.14.
- The script's schema config file was adjusted to match the sms table schema.
- After trying the script with a 1GB+ file, we were consistently getting out of memory errors (even after increasing the SIFT VM RAM to 3 GB).
So the Linux "split" command was used to split the 1GB+ file into 3 smaller 500 MB files.
This ran error free although it meant running the script a few more times. Meh, still better than doing it by hand!
As mentioned in a previous post, this script can potentially be used with non-sms SQLite databases especially if the search term field appears near the start of the cell data section.

From now on, all of my scripts will be hosted at GitHub. I'm not sure how much longer GoogleCode will keep my existing scripts so I have also transferred most of those to GitHub.
Because I can no longer update on GoogleCode, I have removed the previous version to minimize further confusion.

Apologies for any inconvenience caused by this script oversight and Special Thanks to Mari for both spotting and letting me know about the error!

Monday 13 January 2014

Facebook / Facebook Messenger Android App Parser Script

Poorly drawn parody of the Faceoff movie poster

Not satisfied with how your forensic tools are currently presenting Facebook (v3.3 for Android) / Facebook Messenger (v2.5.3 for Android) messages and contacts?
Would you also like a GoogleMaps URL that plots each message using available geographic metadata?
Or maybe you're just curious about how Facebook / Facebook Messenger stores contacts/messages on Android devices?
If so, read on! If not, then there's nothing to see here ... move along.

This Python script is the brainchild of Shafik Punja (@qubytelogic).  I recently contacted him regarding mobile device script ideas and he quickly got back to me with sample Android test databases and cell phone screenshots (for validation). Without his assistance and feedback, this script would not be here today. So, Thankyou Shafik! The Commonwealth of Forensic Monkeys salutes you!

It's also pretty fortunate timing because Heather Mahalik (@heathermahalik) recently gave an awesomely detailed SANS webcast about Data Retention on Android/iOS devices. In her talk she covered where to look for various application data artefacts and also mentioned a few fun Facebook facts.

BTW I don't use Facebook / Facebook Messenger (monkey has no social life) and no one in their right mind would volunteer their personal data for this blog. So just for you ingrates, I made up a test scenario involving 3 muppets and several Facebook messages.
Due to time contraints, I have only fabricated the script relevant data fields just so we had something to print out.
Any id's I use (hopefully) do not correspond to valid Facebook accounts.
Your own data will probably have more populated fields/longer field lengths. Meh.

The script has been developed/tested on SANS SIFT v2.14 running Python 2.6.4. It has also been run successfully on Win7x64 running Python 2.7.6. You can download it from my Google Code page here.

UPDATE 2014-02-05 Two potential issues have arisen with this code.
1. emoji/non-printable characters in the message text may cause the script to crash. Currently, I do not have the Android test data to verify this.
2. If the keywords "to" or "from" are in the message text and it is subsequently used in a GoogleMaps URL, the URL will not plot properly. This is because GoogleMaps interprets these keywords as routing instructions. I can't think of a way around this without changing the text for the GoogleMaps plot.

Data, Data ... where's the data?

For sanity's sake, I am limiting the scope of this post to actual message content and contacts information. There's a crapload of databases/tables that Facebook uses so I had to draw the line somewhere (even if it's in crayon!). From Shafik's test data, there are 3 tables ("contacts", "threads" and "messages") that we are going to extract data from. These tables are stored in 2 separate SQLite database files ("contacts_db2" and "threads_db2").

The "contacts" table
The Facebook app for Android (katana) stores it's "contacts" table data in:

Notice there's no file extension but it's actually SQLite.

Similarly, the Facebook Messenger app for Android (orca) stores it's "contacts" table data in:

Notice how the filenames are the same?
If you compare their table schemas, you will find that they are identical.
Using the SQLite Manager plugin for Firefox on SIFT v2.14, I opened both "contacts_db2" files and checked the "contacts" table schema (found under the Structure tab).

Facebook App (katana) / Facebook Messenger App (orca) "contacts" table schema:

The "data" column is actually a JSON encoded series of key/value pairs. JSON (JavaScript Object Notation) is just another way of exchanging information. See here for further details.

Using the SQLite Manager Firefox plugin, our fictional muppet test scenario "contacts" table looks like:

Muppet test data "contacts" table

Note: If you hover your cursor over the data cell you're interested in, it brings up the neat yellow box displaying the whole string. So you don't have to waste time re-sizing/scrolling.
If it makes it easier, you can also copy the data cell string and use the JSON validator here to pretty print/validate the string.

In addition to the "contact_id" column, the script extracts/outputs the following JSON fields from the "data" column:

The "PictureUrl" values were usually observed to be based at the "" domain.
The "timelineCoverPhoto" values were usually observed to be based at the "" domain.
For the muppet test scenario data, I've just used picture URLs from wikipedia.

The "threads" table

The Facebook app for Android (katana) stores it's "messages" and "threads" table data in:

Similarly, the Facebook Messenger app for Android (orca) stores it's "messages" and "threads" table data in:

For the "threads" table, the Facebook / Facebook Messenger schemas are identical.

Facebook App (katana) / Facebook Messenger App (orca) "threads" table schema :
CREATE TABLE threads (thread_id TEXT PRIMARY KEY, thread_fbid TEXT, action_id INTEGER, refetch_action_id INTEGER, last_visible_action_id INTEGER, name TEXT, participants TEXT, former_participants TEXT, object_participants TEXT, senders TEXT, single_recipient_thread INTEGER, single_recipient_user_key TEXT, snippet TEXT, snippet_sender TEXT, admin_snippet TEXT, timestamp_ms INTEGER, last_fetch_time_ms INTEGER, unread INTEGER, pic_hash TEXT, pic TEXT, can_reply_to INTEGER, mute_until INTEGER, is_subscribed INTEGER, folder TEXT, draft TEXT )

For the "threads" table, we are only interested in the "thread_id" and "participants" columns.
The "thread_id" can be used to group all the messages from a particular conversation.
Later, we will use the "thread_id" to link "messages" table entries with the "participants" of that thread.
The "participants" column is formatted in JSON and looks something like:
[{"email":"","user_key":"FACEBOOK:100000987654321","name":"Kermit The Frog","mute":0,"lastReadReceiptTimestampMs":0},{"email":"","user_key":"FACEBOOK:1087654322","name":"Rowlf","mute":0,"lastReadReceiptTimestampMs":0}]

The script currently only extracts/prints the "name" data field. It is currently left to the analyst to match these "name" values with the "displayName" fields extracted from the "contacts" table mentioned previously.

Here's a screenshot of our fictional muppet "threads" table:

Muppet test data"threads" table

The "messages" table
The Facebook / Facebook Messenger apps' "messages" table schemas differ by one column - Facebook Messenger's "messages" table has an extra column called "auto_retry_count". We're not going to extract this field anyway so our idea of using one extraction script for both apps is still viable. Phew!

Facebook App (katana) "messages" table schema:
CREATE TABLE messages (msg_id TEXT PRIMARY KEY, thread_id TEXT, action_id INTEGER, subject TEXT, text TEXT, sender TEXT, timestamp_ms INTEGER, timestamp_sent_ms INTEGER, mms_attachments TEXT, attachments TEXT, shares TEXT, msg_type INTEGER, affected_users TEXT, coordinates TEXT, offline_threading_id TEXT, source TEXT, is_non_authoritative INTEGER, pending_send_media_attachment STRING, handled_internally_time INTEGER, pending_shares STRING, pending_attachment_fbid STRING, client_tags TEXT, send_error STRING )

Facebook Messenger App (orca) "messages" table schema:
CREATE TABLE messages (msg_id TEXT PRIMARY KEY, thread_id TEXT, action_id INTEGER, subject TEXT, text TEXT, sender TEXT, timestamp_ms INTEGER, timestamp_sent_ms INTEGER, mms_attachments TEXT, attachments TEXT, shares TEXT, msg_type INTEGER, affected_users TEXT, coordinates TEXT, offline_threading_id TEXT, source TEXT, is_non_authoritative INTEGER, pending_send_media_attachment STRING, handled_internally_time INTEGER, pending_shares STRING, pending_attachment_fbid STRING, client_tags TEXT, send_error STRING, auto_retry_count INTEGER )

For our test scenario, we will be using the Facebook Messenger App schema (orca) for the "messages" table. It should not matter either way.
Our fictional muppet test scenario "messages" table looks like this:

Muppet test data "messages" table

Note: This screenshot does not show all of the column values, just the script relevant ones (ie "msg_id", "thread_id", "text", "sender" (JSON formatted), "timestamp_ms", "coordinates" (JSON formatted), "source").

The "msg_id" is a unique identifier for each message stored in the table.
The "thread_id" is used to group messages from the same conversation thread.
The "text" string stores the message's text. Note: For formatting reasons, the script converts any "/r/n" and "/n" to spaces.
The "sender" column JSON looks like:
{"email":"","user_key":"FACEBOOK:100000987654321","name":"Kermit The Frog"}
From testing observations, this "name" field should correspond to a "displayName" JSON field from the "contacts" table.

The "timestamp_ms" column seems to be the ms since 1JAN1970 in UTC/GMT. It was verified by comparing the message timestamps with screenshots taken from the test Android phone. The test phone displayed the local time of this timestamp.

The "coordinates" column JSON looks like:
Sometimes this column was blank, while other times there were only values defined for latitude/longitude/accuracy.

The "source" column values have been observed to be "messenger", "chat", "web", "mobile".
At this time, I don't know what all of the values indicate. Further testing is required as the Messaging Help from Facebook does not mention this "source" field. Boo!
It's probably safe to say that "messenger" indicates the Facebook Messenger app (not sure if this includes the mobile versions).
The "chat" probably indicates the source being the chat sidebar from within a browser.
The "mobile" possibly indicates the source is a Facebook app running on mobile device (eg Android/iPhone).
The "web" could mean a "Facebook message" was sent from a browser logged into Facebook?
There is also a Firefox add-on for sending Facebook messages but it's unknown which category this would fall under.
BTW if you know what all these values stand for please let us know via the comments section!

So putting it all together ... here's our script relevant data all nicely mapped out for you :)

Facebook messaging schema

Note: The JSON encoded data fields are highlighted in blue ("Columbia Blue" according to Wikipedia for all you interior decorator types :). The remaining uncoloured fields are either text or numeric in nature.

From the diagram above, we can use the "thread_id" to match "participants" (senders/receivers) to a particular thread (collection of messages). See the red link in the diagram.
As mentioned earlier, we can also link the "messages" table's "sender" column ("name") back to an entry in the "contacts" table's "data" column ("displayName"). See the yellowy-orange link in the diagram above.

Due to "sender" columns sometimes being blank, the script does not currently do this automagically (Drats!). Instead, it is suggested that the analyst manually matches each participant "name" from the extracted contacts output using the contacts "displayName" field.
From the test data supplied, these two fields seem to correspond. Future versions of the script could also print the "user_key" field in case there are multiple contacts with the same "displayName"s.

How the script works ...

OK, enough about the data. Let's see what the script does eh?

The script connects to the given "threads_db2" and "contacts_db2" SQLite files and runs queries to extract the stored contacts and messages.
It then sorts/outputs these values to the command line and optionally to the nominated Tab Separated Variable files.
The script converts the "timestamp_ms" column values into the form YYYY-MM-DDThh:mm:ss.
If a message has latitude/longitude data, it will also provide a plot of the position via a GoogleMaps URL. The message text and timestamp are also included on the plot.

In case you were wondering about the SQLite query the script uses to extract the messages ...
select messages.msg_id, messages.thread_id, messages.text, messages.sender, threads.participants, messages.timestamp_ms, messages.source, messages.coordinates from messages, threads where messages.thread_id=threads.thread_id order by messages.thread_id, messages.timestamp_ms;

And for the contacts ...
select contact_id, data from contacts;

To make things easier, Python has some existing libraries we can use:
sqlite3 (for querying the SQLite files)
json (for converting the JSON strings to a Python object we can parse)
datetime (for converting "the timestamp_ms" field into a readable date/time string)
urllib (used to ensure our GoogleMaps URL doesn't contain any illegal characters ... it makes 'em an offer they can't refuse!)

On SIFT v2.14, I've used the following command to make the script directly executable (ie no need to type "python" before the script name):
sudo chmod a+x

Here's the help text ...

sansforensics@SIFT-Workstation:~$ ./
Running fbmsg-extractor v2014-01-08 Initial Version
Usage: -t threads_db -c contacts_db -x contacts.tsv -z messages.tsv

  -h, --help      show this help message and exit
  -t THREADSDB    threads_db2 input file
  -c CONTACTSDB   contacts_db2 input file
  -x CONTACTSTSV  (Optional) Contacts Tab Separated Output Filename
  -z MESSAGESTSV  (Optional) Messages Tab Separated Output Filename

And here's what happens when we run it with our muppet test data ...

sansforensics@SIFT-Workstation:~$ ./ -t facebook/test/threads_db2 -c facebook/test/contacts_db2 -x muppet-contacts.txt -z muppet-messages.txt
Running fbmsg-extractor v2014-01-08 Initial Version

Extracted CONTACTS Data

contact_id    profileFbid    displayName    displayNumber    universalNumber    smallPictureUrl    bigPictureUrl    hugePictureUrl    timelineCoverPhoto

Y29udGFjdDoxMDAwMDA5ODc2NTQzMjE6MTA4NzY1NDMyMwo=    1087654323    Fozzie Bear    (555) 555-0003    +15555550003

Y29udGFjdDoxMDAwMDA5ODc2NTQzMjE6MTAwMDAwOTg3NjU0MzIxCg==    100000987654321    Kermit The Frog    NA    NA

Y29udGFjdDoxMDAwMDA5ODc2NTQzMjE6MTA4NzY1NDMyMQo=    1087654321    Miss Piggy    (555) 555-0001    +15555550001

Y29udGFjdDoxMDAwMDA5ODc2NTQzMjE6MTA4NzY1NDMyMgo=    1087654322    Rowlf    (555) 555-0002    +15555550002

Extracted MESSAGES Data

msg_id    thread_id    text    sender    participants    timestamp_ms    source    latitude    longitude    accuracy    heading    speed    altitude    googlemaps
m_id.123456789012345678    t_1234567890abcdefghijk1    Hi-ho! You coming to the show?    Kermit The Frog    Kermit The Frog, Miss Piggy    2014-01-03T23:45:03    messenger    33.807958    -117.918157    15.0    0.0    0.0    0.0,+-117.918157+%28Hi-ho%21+You+coming+to+the+show%3F+%402014-01-03T23%3A45%3A03%29&iwloc=A&hl=en

m_id.123456789012345679    t_1234567890abcdefghijk1    Yes Kermie! Just powdering my nose ...    Miss Piggy    Kermit The Frog, Miss Piggy    2014-01-03T23:49:05    mobile    33.802399    -117.914954    1500.0    NA    NA    NA,+-117.914954+%28Yes+Kermie%21+Just+powdering+my+nose+...+%402014-01-03T23%3A49%3A05%29&iwloc=A&hl=en

m_id.123456789012345680    t_1234567890abcdefghijk1    So ... At IHOP again huh?    Kermit The Frog    Kermit The Frog, Miss Piggy    2014-01-03T23:50:05    messenger    33.807958    -117.918157    15.0    0.0    0.0    0.0,+-117.918157+%28So+...+At+IHOP+again+huh%3F+%402014-01-03T23%3A50%3A05%29&iwloc=A&hl=en

m_id.123456789012345683    t_1234567890abcdefghijk1    More Pork Rolls for you to love!    Miss Piggy    Kermit The Frog, Miss Piggy    2014-01-03T23:50:45    mobile    33.802399    -117.914954    1500.0    NA    NA    NA,+-117.914954+%28More+Pork+Rolls+for+you+to+love%21+%402014-01-03T23%3A50%3A45%29&iwloc=A&hl=en

m_id.123456789012345689    t_1234567890abcdefghijk2    Yo Fozzie! Where u at?    Kermit The Frog    Kermit The Frog, Fozzie Bear    2014-01-03T23:47:13    messenger    33.807958    -117.918157    15.0    0.0    0.0    0.0,+-117.918157+%28Yo+Fozzie%21+Where+u+at%3F+%402014-01-03T23%3A47%3A13%29&iwloc=A&hl=en

m_id.123456789012345690    t_1234567890abcdefghijk2    Hey Kermie! I'm almost BEAR ! Wokka!Wokka!Wokka!    Fozzie Bear    Kermit The Frog, Fozzie Bear    2014-01-03T23:47:43    mobile    33.808227    -117.918948    12.0    90.0    1.0    0.0,+-117.918948+%28Hey+Kermie%21+I%27m+almost+BEAR+%21+Wokka%21Wokka%21Wokka%21+%402014-01-03T23%3A47%3A43%29&iwloc=A&hl=en

4 contacts were processed

6 messages were processed


As you can see, the script prints out the contacts information first followed by the message data. Columns are tab separated but when dealing with large numbers of contacts/messages, the command line quickly becomes unreadable. It is HIGHLY recommended that analysts utilize the output to TSV functionality.

Here's what the outputted TSV data looks like after being imported into a spreadsheet program:

Script's Output TSV for Muppet test data contacts

Script's Output TSV for Muppet test data messages

Contacts are sorted alphabetically by "displayName".
Messages are sorted first by thread, then in chronological order (using the "timestamp_ms" value).

Not all messages will have defined geodata. Some may be blank or only have lat/long/accuracy with no speed/heading/altitude.
CAUTION: Not sure what the units are for the accuracy/speed/heading/altitude
In general, the script outputs the string "NA" if there is no defined value.

Just for shiggles, lets plot Miss Piggy's position for her first reply back to Kermit (ie "Yes Kermie! Just powdering my nose ...") using the GoogleMaps URL from the messages TSV.

Where's Piggy?

From the example screenshot, we can see the message text and timestamp plotted along with her position in GoogleMaps. Somebody's telling porkies eh?

Some Other Trivia ...

Format of "contact_id"
The funky looking "contact_id" (eg "Y29udGFjdDoxMDAwMDA5ODc2NTQzMjE6MTA4NzY1NDMyMQo=") from the "contacts" table is actually base64 encoded. Looking closer at the letters and numbers comprising the "contact_id", we can see an "=" character.
I remembered seeing similar strings in base64 encoded emails ... so just for shiggles, I tried decoding it via the "base64" command.
Here's a fictional demo example:

sansforensics@SIFT-Workstation:~$ echo 'Y29udGFjdDoxMDAwMDA5ODc2NTQzMjE6MTA4NzY1NDMyMQo=' | base64 --decode

The decoded format appears to be "contact:XXX:YYY"
XXX = remains constant for each "contact_id" and corresponds to /data/data/com.facebook.orca/shared_prefs/com.facebook.orca_preferences.xml's "/auth/user_data/fb_uid" value. It's believed to be used as a unique user id for the Facebook account user.
YYY = Seems to be a user id field for the stored contact (ie equals their "profileFbid" value).

Don't believe everything in the "contacts" database however. Heather Mahalik mentioned in her SANS webcast that Facebook can add contact entries when the app suggests potential friends. Consequently, stored messages should be used to indicate whether a contact entry is someone the account holder has communicated with.

XML files
Be sure to also check out the various XML files containing potentially relevant info (such as username, times, account info). You can find these under:

Even though they share the same filename, /data/data/com.facebook.orca/shared_prefs/com.facebook.orca_preferences.xml differs from the /data/data/com.facebook.katana/shared_prefs/com.facebook.orca_preferences.xml.
In addition to having a different order of declarations, the katana version mentions what appears to be the user's email address.

Other databases
Also check out the "prefs" database tables for username, times, account info. This can be found under /data/data/com.facebook.katana/databases/prefs_db and /data/data/com.facebook.orca/databases/prefs_db.

Facebook Messaging Documentation
Just in case you're as semi-oblivious to Facebook messaging as I am, here's some messaging help I found from the Facebook website.

When you send someone a message, it gets delivered to the person’s Facebook Messages.
If the person you messaged has turned chat on, your message will appear as a chat. If they have chat off, the message will appear in their message inbox and they will receive a notification

Chat and message histories are threaded together — you can think of them as one and the same. When you open a conversation, you’ll see a conversation that includes all your messages along with your entire chat history. If you send a chat message to a friend who has turned chat off, the chat message will be sent directly to their message inbox.

Can I message my mobile contacts if we’re not Facebook friends?
Yes. Confirming your phone number when you first sign in helps ensure that your contacts will be able to find you. Messenger works similar to texting or other mobile messaging apps, and you can add people to your Messenger contacts by entering their phone number.
To allow people who have your phone number to reach you in Messenger, the app will ask you to set the control called "Who can look you up by the phone number you provided?" to Public.

Can I message friends who aren’t using the Facebook Messenger mobile app?
Yes. People who don't have the Facebook Messenger app on their phone will receive chats and messages you send whenever they log into Facebook.

How does location work with the Messenger mobile app?
When you send a message from the Messenger app, your location is included by default. You can turn this feature off by tapping before you send a message, which turns the arrow from blue (on) to gray (off). Location remains off for that conversation until you tap the arrow again.
In order to share your location in messages, you'll need to turn on location services in the main settings of your smartphone.

Who can see my location when I share it in a conversation in Messenger?
Your location is only visible to the people in that conversation.

Does Facebook store my location when I include it in a message?
When you add your location to a message, the location becomes a permanent part of the message history.
When you send a message to a friend with your location, that friend can see it as a pin on a map when they tap on your message. Your location won't appear anywhere outside of the message.

Final Thoughts

This script has been tested with a limited amount of test data. It is possible that some non-defined/blank fields might cause the script to fall over in a screaming heap. If so, let me know and I will try to fix it although some test data may be required to locate the bug(s) quicker. Knowing this monkey's luck, Facebook will probably change their database schema next week anyway LOL.

In the end, it took just as long to write this blog article as it did to write the script. Once you familarize yourself with the relevant libraries / Google for what you want the code to do (Thankyou Stack Overflow !) - it's pretty straight forward. Even a monkey can do it! And while this functionality is possibly already implemented in a number of forensic tools, writing this script provided me with a deeper understanding of the data involved whilst also allowing me to improve my Python programming skills. So I'd say it was well worth the effort.

It seems GoogleCode is stopping uploads from mid-January so this will probably be the last script I put on there. I'll have to find a new (free) home for future scripts. Anyone have suggestions? I am thinking of sharing a GoogleDrive / DropBox folder but that might not allow for easily viewed release notes. Not that anyone reads the release notes anyway LOL.

As usual, please feel free to leave comments/suggestions in the comments section below.