|
Poorly drawn parody of the Faceoff movie poster |
Not satisfied with how your forensic tools are currently presenting
Facebook (v3.3 for Android) /
Facebook Messenger (v2.5.3 for Android) messages and contacts?
Would you also like a GoogleMaps URL that plots each message using available geographic metadata?
Or maybe you're just curious about how Facebook / Facebook Messenger stores contacts/messages on Android devices?
If so, read on! If not, then there's nothing to see here ... move along.
This Python script is the brainchild of
Shafik Punja (
@qubytelogic). I recently contacted him regarding mobile device script ideas and he quickly got back to me with sample Android test databases and cell phone screenshots (for validation). Without his assistance and feedback, this script would not be here today. So, Thankyou Shafik! The Commonwealth of Forensic Monkeys salutes you!
It's also pretty fortunate timing because
Heather Mahalik (
@heathermahalik) recently gave an awesomely detailed
SANS webcast about Data Retention on Android/iOS devices. In her talk she covered where to look for various application data artefacts and also mentioned a few fun Facebook facts.
BTW I don't use Facebook / Facebook Messenger (monkey has no social life) and no one in their right mind would volunteer their personal data for this blog. So just for you ingrates, I made up a test scenario involving 3 muppets and several Facebook messages.
Due to time contraints, I have only fabricated the script relevant data fields just so we had something to print out.
Any id's I use (hopefully) do not correspond to valid Facebook accounts.
Your own data will probably have more populated fields/longer field lengths. Meh.
The
fbmsg-extractor.py script has been developed/tested on
SANS SIFT v2.14 running Python 2.6.4. It has also been run successfully on Win7x64 running Python 2.7.6. You can download it from my Google Code page
here.
UPDATE 2014-02-05 Two potential issues have arisen with this code.
1. emoji/non-printable characters in the message text may cause the script to crash. Currently, I do not have the Android test data to verify this.
2. If the keywords "to" or "from" are in the message text and it is subsequently used in a GoogleMaps URL, the URL will not plot properly. This is because GoogleMaps interprets these keywords as routing instructions. I can't think of a way around this without changing the text for the GoogleMaps plot.
Data, Data ... where's the data?
For sanity's sake, I am limiting the scope of this post to actual message content and contacts information. There's a crapload of databases/tables that Facebook uses so I had to draw the line somewhere (even if it's in crayon!). From Shafik's test data, there are 3 tables ("contacts", "threads" and "messages") that we are going to extract data from. These tables are stored in 2 separate SQLite database files ("contacts_db2" and "threads_db2").
The "contacts" table
The Facebook app for Android (katana) stores it's "contacts" table data in:
/data/data/com.facebook.katana/databases/contacts_db2
Notice there's no file extension but it's actually SQLite.
Similarly, the Facebook Messenger app for Android (orca) stores it's "contacts" table data in:
/data/data/com.facebook.orca/databases/contacts_db2
Notice how the filenames are the same?
If you compare their table schemas, you will find that they are identical.
Using the
SQLite Manager plugin for Firefox on SIFT v2.14, I opened both "contacts_db2" files and checked the "contacts" table schema (found under the Structure tab).
Facebook App (katana) / Facebook Messenger App (orca) "contacts" table schema:
CREATE TABLE contacts (internal_id INTEGER PRIMARY KEY AUTOINCREMENT, contact_id TEXT UNIQUE, data TEXT )
The "data" column is actually a
JSON encoded series of key/value pairs. JSON (JavaScript Object Notation) is just another way of exchanging information. See
here for further details.
Using the SQLite Manager Firefox plugin, our fictional muppet test scenario "contacts" table looks like:
|
Muppet test data "contacts" table |
Note: If you hover your cursor over the data cell you're interested in, it brings up the neat yellow box displaying the whole string. So you don't have to waste time re-sizing/scrolling.
If it makes it easier, you can also copy the data cell string and use the JSON validator
here to pretty print/validate the string.
In addition to the "contact_id" column, the script extracts/outputs the following JSON fields from the "data" column:
profileFbid
displayName
displayNumber
universalNumber
smallPictureUrl
bigPictureUrl
hugePictureUrl
timelineCoverPhoto
The "PictureUrl" values were usually observed to be based at the "fbcdn-profile-a.akamaihd.net" domain.
The "timelineCoverPhoto" values were usually observed to be based at the "fbcdn-sphotos-f-a.akamaihd.net" domain.
For the muppet test scenario data, I've just used picture URLs from wikipedia.
The "threads" table
The Facebook app for Android (katana) stores it's "messages" and "threads" table data in:
/data/data/com.facebook.katana/databases/threads_db2
Similarly, the Facebook Messenger app for Android (orca) stores it's "messages" and "threads" table data in:
/data/data/com.facebook.orca/databases/threads_db2
For the "threads" table, the Facebook / Facebook Messenger schemas are identical.
Facebook App (katana) / Facebook Messenger App (orca) "threads" table schema :
CREATE TABLE threads (thread_id TEXT PRIMARY KEY, thread_fbid TEXT, action_id INTEGER, refetch_action_id INTEGER, last_visible_action_id INTEGER, name TEXT, participants TEXT, former_participants TEXT, object_participants TEXT, senders TEXT, single_recipient_thread INTEGER, single_recipient_user_key TEXT, snippet TEXT, snippet_sender TEXT, admin_snippet TEXT, timestamp_ms INTEGER, last_fetch_time_ms INTEGER, unread INTEGER, pic_hash TEXT, pic TEXT, can_reply_to INTEGER, mute_until INTEGER, is_subscribed INTEGER, folder TEXT, draft TEXT )
For the "threads" table, we are only interested in the "thread_id" and "participants" columns.
The "thread_id" can be used to group all the messages from a particular conversation.
Later, we will use the "thread_id" to link "messages" table entries with the "participants" of that thread.
The "participants" column is formatted in JSON and looks something like:
[{"email":"100000987654321@facebook.com","user_key":"FACEBOOK:100000987654321","name":"Kermit The Frog","mute":0,"lastReadReceiptTimestampMs":0},{"email":"1087654322@facebook.com","user_key":"FACEBOOK:1087654322","name":"Rowlf","mute":0,"lastReadReceiptTimestampMs":0}]
The script currently only extracts/prints the "name" data field. It is currently left to the analyst to match these "name" values with the "displayName" fields extracted from the "contacts" table mentioned previously.
Here's a screenshot of our fictional muppet "threads" table:
|
Muppet test data"threads" table |
The "messages" table
The Facebook / Facebook Messenger apps' "messages" table schemas differ by one column - Facebook Messenger's "messages" table has an extra column called "auto_retry_count". We're not going to extract this field anyway so our idea of using one extraction script for both apps is still viable. Phew!
Facebook App (katana) "messages" table schema:
CREATE TABLE messages (msg_id TEXT PRIMARY KEY, thread_id TEXT, action_id INTEGER, subject TEXT, text TEXT, sender TEXT, timestamp_ms INTEGER, timestamp_sent_ms INTEGER, mms_attachments TEXT, attachments TEXT, shares TEXT, msg_type INTEGER, affected_users TEXT, coordinates TEXT, offline_threading_id TEXT, source TEXT, is_non_authoritative INTEGER, pending_send_media_attachment STRING, handled_internally_time INTEGER, pending_shares STRING, pending_attachment_fbid STRING, client_tags TEXT, send_error STRING )
Facebook Messenger App (orca) "messages" table schema:
CREATE TABLE messages (msg_id TEXT PRIMARY KEY, thread_id TEXT, action_id INTEGER, subject TEXT, text TEXT, sender TEXT, timestamp_ms INTEGER, timestamp_sent_ms INTEGER, mms_attachments TEXT, attachments TEXT, shares TEXT, msg_type INTEGER, affected_users TEXT, coordinates TEXT, offline_threading_id TEXT, source TEXT, is_non_authoritative INTEGER, pending_send_media_attachment STRING, handled_internally_time INTEGER, pending_shares STRING, pending_attachment_fbid STRING, client_tags TEXT, send_error STRING, auto_retry_count INTEGER )
For our test scenario, we will be using the Facebook Messenger App schema (orca) for the "messages" table. It should not matter either way.
Our fictional muppet test scenario "messages" table looks like this:
|
Muppet test data "messages" table |
Note: This screenshot does not show all of the column values, just the script relevant ones (ie "msg_id", "thread_id", "text", "sender" (JSON formatted), "timestamp_ms", "coordinates" (JSON formatted), "source").
The "msg_id" is a unique identifier for each message stored in the table.
The "thread_id" is used to group messages from the same conversation thread.
The "text" string stores the message's text. Note: For formatting reasons, the script converts any "/r/n" and "/n" to spaces.
The "sender" column JSON looks like:
{"email":"100000987654321@facebook.com","user_key":"FACEBOOK:100000987654321","name":"Kermit The Frog"}
From testing observations, this "name" field should correspond to a "displayName" JSON field from the "contacts" table.
The "timestamp_ms" column seems to be the ms since 1JAN1970 in UTC/GMT. It was verified by comparing the message timestamps with screenshots taken from the test Android phone. The test phone displayed the local time of this timestamp.
The "coordinates" column JSON looks like:
{"latitude":33.808227,"longitude":-117.918948,"accuracy":12.0,"altitude":0.0,"heading":90.0,"speed":1.0}
Sometimes this column was blank, while other times there were only values defined for latitude/longitude/accuracy.
The "source" column values have been observed to be "messenger", "chat", "web", "mobile".
At this time, I don't know what all of the values indicate. Further testing is required as the
Messaging Help from Facebook does not mention this "source" field. Boo!
It's probably safe to say that "messenger" indicates the Facebook Messenger app (not sure if this includes the mobile versions).
The "chat" probably indicates the source being the chat sidebar from within a browser.
The "mobile" possibly indicates the source is a Facebook app running on mobile device (eg Android/iPhone).
The "web" could mean a "Facebook message" was sent from a browser logged into Facebook?
There is also a Firefox add-on for sending Facebook messages but it's unknown which category this would fall under.
BTW if you know what all these values stand for please let us know via the comments section!
So putting it all together ... here's our script relevant data all nicely mapped out for you :)
|
Facebook messaging schema |
Note: The JSON encoded data fields are highlighted in blue ("Columbia Blue" according to Wikipedia for all you interior decorator types :). The remaining uncoloured fields are either text or numeric in nature.
From the diagram above, we can use the "thread_id" to match "participants" (senders/receivers) to a particular thread (collection of messages). See the red link in the diagram.
As mentioned earlier, we can also link the "messages" table's "sender" column ("name") back to an entry in the "contacts" table's "data" column ("displayName"). See the yellowy-orange link in the diagram above.
Due to "sender" columns sometimes being blank, the script does not currently do this automagically (Drats!). Instead, it is suggested that the analyst manually matches each participant "name" from the extracted contacts output using the contacts "displayName" field.
From the test data supplied, these two fields seem to correspond. Future versions of the script could also print the "user_key" field in case there are multiple contacts with the same "displayName"s.
How the script works ...
OK, enough about the data. Let's see what the script does eh?
The script connects to the given "threads_db2" and "contacts_db2" SQLite files and runs queries to extract the stored contacts and messages.
It then sorts/outputs these values to the command line and optionally to the nominated Tab Separated Variable files.
The script converts the "timestamp_ms" column values into the form
YYYY-MM-DDThh:mm:ss.
If a message has latitude/longitude data, it will also provide a plot of the position via a GoogleMaps URL. The message text and timestamp are also included on the plot.
In case you were wondering about the SQLite query the script uses to extract the messages ...
select messages.msg_id, messages.thread_id, messages.text, messages.sender, threads.participants, messages.timestamp_ms, messages.source, messages.coordinates from messages, threads where messages.thread_id=threads.thread_id order by messages.thread_id, messages.timestamp_ms;
And for the contacts ...
select contact_id, data from contacts;
To make things easier, Python has some existing libraries we can use:
sqlite3 (for querying the SQLite files)
json (for converting the JSON strings to a Python object we can parse)
datetime (for converting "the timestamp_ms" field into a readable date/time string)
urllib (used to ensure our GoogleMaps URL doesn't contain any illegal characters ... it makes 'em an offer they can't refuse!)
On SIFT v2.14, I've used the following command to make the script directly executable (ie no need to type "python" before the script name):
sudo chmod a+x fbmsg-extractor.py
Here's the help text ...
sansforensics@SIFT-Workstation:~$ ./fbmsg-extractor.py
Running fbmsg-extractor v2014-01-08 Initial Version
Usage: fbmsg-extractor.py -t threads_db -c contacts_db -x contacts.tsv -z messages.tsv
Options:
-h, --help show this help message and exit
-t THREADSDB threads_db2 input file
-c CONTACTSDB contacts_db2 input file
-x CONTACTSTSV (Optional) Contacts Tab Separated Output Filename
-z MESSAGESTSV (Optional) Messages Tab Separated Output Filename
sansforensics@SIFT-Workstation:~$
And here's what happens when we run it with our muppet test data ...
sansforensics@SIFT-Workstation:~$ ./fbmsg-extractor.py -t facebook/test/threads_db2 -c facebook/test/contacts_db2 -x muppet-contacts.txt -z muppet-messages.txt
Running fbmsg-extractor v2014-01-08 Initial Version
========================
Extracted CONTACTS Data
========================
contact_id profileFbid displayName displayNumber universalNumber smallPictureUrl bigPictureUrl hugePictureUrl timelineCoverPhoto
=======================================================================================================================================
Y29udGFjdDoxMDAwMDA5ODc2NTQzMjE6MTA4NzY1NDMyMwo= 1087654323 Fozzie Bear (555) 555-0003 +15555550003 http://upload.wikimedia.org/wikipedia/en/5/51/Fozzie_Bear.jpg http://upload.wikimedia.org/wikipedia/en/5/51/Fozzie_Bear.jpg http://upload.wikimedia.org/wikipedia/en/5/51/Fozzie_Bear.jpg http://upload.wikimedia.org/wikipedia/en/5/51/Fozzie_Bear.jpg
Y29udGFjdDoxMDAwMDA5ODc2NTQzMjE6MTAwMDAwOTg3NjU0MzIxCg== 100000987654321 Kermit The Frog NA NA http://upload.wikimedia.org/wikipedia/en/6/62/Kermit_the_Frog.jpg http://upload.wikimedia.org/wikipedia/en/6/62/Kermit_the_Frog.jpg http://upload.wikimedia.org/wikipedia/en/6/62/Kermit_the_Frog.jpg http://upload.wikimedia.org/wikipedia/en/6/62/Kermit_the_Frog.jpg
Y29udGFjdDoxMDAwMDA5ODc2NTQzMjE6MTA4NzY1NDMyMQo= 1087654321 Miss Piggy (555) 555-0001 +15555550001 http://upload.wikimedia.org/wikipedia/en/2/22/MissPiggy.jpg http://upload.wikimedia.org/wikipedia/en/2/22/MissPiggy.jpg http://upload.wikimedia.org/wikipedia/en/2/22/MissPiggy.jpg http://upload.wikimedia.org/wikipedia/en/2/22/MissPiggy.jpg
Y29udGFjdDoxMDAwMDA5ODc2NTQzMjE6MTA4NzY1NDMyMgo= 1087654322 Rowlf (555) 555-0002 +15555550002 http://upload.wikimedia.org/wikipedia/en/b/b5/Rowlf_the_Dog.jpg http://upload.wikimedia.org/wikipedia/en/b/b5/Rowlf_the_Dog.jpg http://upload.wikimedia.org/wikipedia/en/b/b5/Rowlf_the_Dog.jpg http://upload.wikimedia.org/wikipedia/en/b/b5/Rowlf_the_Dog.jpg
========================
Extracted MESSAGES Data
========================
msg_id thread_id text sender participants timestamp_ms source latitude longitude accuracy heading speed altitude googlemaps
=======================================================================================================================================================
m_id.123456789012345678 t_1234567890abcdefghijk1 Hi-ho! You coming to the show? Kermit The Frog Kermit The Frog, Miss Piggy 2014-01-03T23:45:03 messenger 33.807958 -117.918157 15.0 0.0 0.0 0.0 http://maps.google.com/maps?q=33.807958,+-117.918157+%28Hi-ho%21+You+coming+to+the+show%3F+%402014-01-03T23%3A45%3A03%29&iwloc=A&hl=en
m_id.123456789012345679 t_1234567890abcdefghijk1 Yes Kermie! Just powdering my nose ... Miss Piggy Kermit The Frog, Miss Piggy 2014-01-03T23:49:05 mobile 33.802399 -117.914954 1500.0 NA NA NA http://maps.google.com/maps?q=33.802399,+-117.914954+%28Yes+Kermie%21+Just+powdering+my+nose+...+%402014-01-03T23%3A49%3A05%29&iwloc=A&hl=en
m_id.123456789012345680 t_1234567890abcdefghijk1 So ... At IHOP again huh? Kermit The Frog Kermit The Frog, Miss Piggy 2014-01-03T23:50:05 messenger 33.807958 -117.918157 15.0 0.0 0.0 0.0 http://maps.google.com/maps?q=33.807958,+-117.918157+%28So+...+At+IHOP+again+huh%3F+%402014-01-03T23%3A50%3A05%29&iwloc=A&hl=en
m_id.123456789012345683 t_1234567890abcdefghijk1 More Pork Rolls for you to love! Miss Piggy Kermit The Frog, Miss Piggy 2014-01-03T23:50:45 mobile 33.802399 -117.914954 1500.0 NA NA NA http://maps.google.com/maps?q=33.802399,+-117.914954+%28More+Pork+Rolls+for+you+to+love%21+%402014-01-03T23%3A50%3A45%29&iwloc=A&hl=en
m_id.123456789012345689 t_1234567890abcdefghijk2 Yo Fozzie! Where u at? Kermit The Frog Kermit The Frog, Fozzie Bear 2014-01-03T23:47:13 messenger 33.807958 -117.918157 15.0 0.0 0.0 0.0 http://maps.google.com/maps?q=33.807958,+-117.918157+%28Yo+Fozzie%21+Where+u+at%3F+%402014-01-03T23%3A47%3A13%29&iwloc=A&hl=en
m_id.123456789012345690 t_1234567890abcdefghijk2 Hey Kermie! I'm almost BEAR ! Wokka!Wokka!Wokka! Fozzie Bear Kermit The Frog, Fozzie Bear 2014-01-03T23:47:43 mobile 33.808227 -117.918948 12.0 90.0 1.0 0.0 http://maps.google.com/maps?q=33.808227,+-117.918948+%28Hey+Kermie%21+I%27m+almost+BEAR+%21+Wokka%21Wokka%21Wokka%21+%402014-01-03T23%3A47%3A43%29&iwloc=A&hl=en
4 contacts were processed
6 messages were processed
Exiting...
sansforensics@SIFT-Workstation:~$
As you can see, the script prints out the contacts information first followed by the message data. Columns are tab separated but when dealing with large numbers of contacts/messages, the command line quickly becomes unreadable. It is HIGHLY recommended that analysts utilize the output to TSV functionality.
Here's what the outputted TSV data looks like after being imported into a spreadsheet program:
|
Script's Output TSV for Muppet test data contacts |
|
Script's Output TSV for Muppet test data messages |
Contacts are sorted alphabetically by "displayName".
Messages are sorted first by thread, then in chronological order (using the "timestamp_ms" value).
Not all messages will have defined geodata. Some may be blank or only have lat/long/accuracy with no speed/heading/altitude.
CAUTION: Not sure what the units are for the accuracy/speed/heading/altitude
In general, the script outputs the string "NA" if there is no defined value.
Just for shiggles, lets plot Miss Piggy's position for her first reply back to Kermit (ie "Yes Kermie! Just powdering my nose ...") using the GoogleMaps URL from the messages TSV.
|
Where's Piggy? |
From the example screenshot, we can see the message text and timestamp plotted along with her position in GoogleMaps. Somebody's telling porkies eh?
Some Other Trivia ...
Format of "contact_id"
The funky looking "contact_id" (eg "Y29udGFjdDoxMDAwMDA5ODc2NTQzMjE6MTA4NzY1NDMyMQo=") from the "contacts" table is actually base64 encoded. Looking closer at the letters and numbers comprising the "contact_id", we can see an "=" character.
I remembered seeing similar strings in base64 encoded emails ... so just for shiggles, I tried decoding it via the "base64" command.
Here's a fictional demo example:
sansforensics@SIFT-Workstation:~$ echo 'Y29udGFjdDoxMDAwMDA5ODc2NTQzMjE6MTA4NzY1NDMyMQo=' | base64 --decode
contact:100000987654321:1087654321
The decoded format appears to be "
contact:XXX:YYY"
Where
XXX = remains constant for each "contact_id" and corresponds to
/data/data/com.facebook.orca/shared_prefs/com.facebook.orca_preferences.xml's "/auth/user_data/fb_uid" value. It's believed to be used as a unique user id for the Facebook account user.
YYY = Seems to be a user id field for the stored contact (ie equals their "profileFbid" value).
Don't believe everything in the "contacts" database however.
Heather Mahalik mentioned in her SANS webcast that Facebook can add contact entries when the app suggests potential friends. Consequently, stored messages should be used to indicate whether a contact entry is someone the account holder has communicated with.
XML files
Be sure to also check out the various XML files containing potentially relevant info (such as username, times, account info). You can find these under:
/data/data/com.facebook.katana/shared_prefs/
and
/data/data/com.facebook.orca/shared_prefs
Even though they share the same filename,
/data/data/com.facebook.orca/shared_prefs/com.facebook.orca_preferences.xml differs from the
/data/data/com.facebook.katana/shared_prefs/com.facebook.orca_preferences.xml.
In addition to having a different order of declarations, the katana version mentions what appears to be the user's email address.
Other databases
Also check out the "prefs" database tables for username, times, account info. This can be found under
/data/data/com.facebook.katana/databases/prefs_db and
/data/data/com.facebook.orca/databases/prefs_db.
Facebook Messaging Documentation
Just in case you're as semi-oblivious to Facebook messaging as I am, here's some messaging help I found from the Facebook website.
From
https://www.facebook.com/help/326534794098501
When you send someone a message, it gets delivered to the person’s Facebook Messages.
If the person you messaged has turned chat on, your message will appear as a chat. If they have chat off, the message will appear in their message inbox and they will receive a notification
From
https://www.facebook.com/help/326534794098501#!/help/336759363070078/
Chat and message histories are threaded together — you can think of them as one and the same. When you open a conversation, you’ll see a conversation that includes all your messages along with your entire chat history. If you send a chat message to a friend who has turned chat off, the chat message will be sent directly to their message inbox.
From
https://www.facebook.com/help/151024075021791/
Can I message my mobile contacts if we’re not Facebook friends?
Yes. Confirming your phone number when you first sign in helps ensure that your contacts will be able to find you. Messenger works similar to texting or other mobile messaging apps, and you can add people to your Messenger contacts by entering their phone number.
To allow people who have your phone number to reach you in Messenger, the app will ask you to set the control called "Who can look you up by the phone number you provided?" to Public.
Can I message friends who aren’t using the Facebook Messenger mobile app?
Yes. People who don't have the Facebook Messenger app on their phone will receive chats and messages you send whenever they log into Facebook.
How does location work with the Messenger mobile app?
When you send a message from the Messenger app, your location is included by default. You can turn this feature off by tapping before you send a message, which turns the arrow from blue (on) to gray (off). Location remains off for that conversation until you tap the arrow again.
In order to share your location in messages, you'll need to turn on location services in the main settings of your smartphone.
Who can see my location when I share it in a conversation in Messenger?
Your location is only visible to the people in that conversation.
Does Facebook store my location when I include it in a message?
When you add your location to a message, the location becomes a permanent part of the message history.
When you send a message to a friend with your location, that friend can see it as a pin on a map when they tap on your message. Your location won't appear anywhere outside of the message.
Final Thoughts
This script has been tested with a limited amount of test data. It is possible that some non-defined/blank fields might cause the script to fall over in a screaming heap. If so, let me know and I will try to fix it although some test data may be required to locate the bug(s) quicker. Knowing this monkey's luck, Facebook will probably change their database schema next week anyway LOL.
In the end, it took just as long to write this blog article as it did to write the script. Once you familarize yourself with the relevant libraries / Google for what you want the code to do (Thankyou
Stack Overflow !) - it's pretty straight forward. Even a monkey can do it! And while this functionality is possibly already implemented in a number of forensic tools, writing this script provided me with a deeper understanding of the data involved whilst also allowing me to improve my Python programming skills. So I'd say it was well worth the effort.
It seems GoogleCode is stopping uploads from mid-January so this will probably be the last script I put on there. I'll have to find a new (free) home for future scripts. Anyone have suggestions? I am thinking of sharing a GoogleDrive / DropBox folder but that might not allow for easily viewed release notes. Not that anyone reads the release notes anyway LOL.
As usual, please feel free to leave comments/suggestions in the comments section below.