Apparently, you can't trust any old monkey with your Windows Phone ...
Following on from our previous Windows Phone post and after some excellent testing feedback, it's time to release some Windows Phone 8.0 scripts for extracting SMS, Call History and Contacts. How much would you expect to pay for these marvellous feats of monkey code? 3 bananas? 2 bananas? How about for FREE :)
But wait .. there's more! As an added bonus we'll throw in a Facebook message JSON extraction script.
Special Thanks to Cindy Murphy (@cindymurph) and the Madison, WI Police Department (MPD) for the initial test data and encouragement.
Thanks also to Brian McGarry (Garda) and JoAnn Gibb (Ohio Attorney General's Office) for providing further testing data/feedback.
The scripts are available from my GitHub page and have been developed/tested on Windows 7 running Python 2.7 against data from Nokia Lumia 520s running Windows Phone 8.0.
UPDATE (12/7/15):
Have now updated the "wp8-sms.py", "wp8-callhistory.py" and "wp8-contacts.py" scripts to read large files in chunks. This has resulted in a quicker processing time for large files (ie whole image files). Updated code is now available from my GitHub page. See this post for more details.
SMS Script
The wp8-sms.py script initially searches a given store.vol for "SMS" strings and stores the associated time and phone number information for each corresponding "SMS" record. Next it searches for "SMStext" strings and extracts the FILETIME2, the sent/received text and any associated phone numbers. If a phone number is not found in the "SMStext" record (ie sent SMS), the script uses the FILETIME2 value to lookup the corresponding "SMS" record's phone number field. For ease of display and documentation, the script outputs this data sorted by FILETIME2 in Tab Separated Values (TSV) format.
This script has also been used to parse the pagefile.sys and various store.vol .log files for SMS records which were not present in the store.vol.
Usage:
python wp8-sms.py -f store.vol -o output-sms.tsv
Output format:
Text_Offset UTC_Time2 Direction Phone_No Text
0xabcd 2014-10-01T19:34:57 Sent 1115551234 This is a sent SMS
0xabc1 2014-10-01T19:37:07 Recvd 1115574321 Here is a received SMS
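As a rough illustration of how a UTC_Time2 string like the ones above can be produced, here is a minimal Python 3 sketch. It assumes FILETIME2 is a standard 64-bit little-endian Windows FILETIME (a count of 100-nanosecond intervals since 1601-01-01 UTC). The helper name `filetime_to_iso` is hypothetical and this is not the code from wp8-sms.py.

```python
import struct
from datetime import datetime, timedelta

# Windows FILETIME values count 100 ns ticks from this date (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1)

def filetime_to_iso(raw8):
    """Convert 8 little-endian bytes (assumed to be a FILETIME) to a UTC string."""
    (ticks,) = struct.unpack("<Q", raw8)
    # 10 ticks per microsecond; integer division avoids float rounding.
    dt = FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
    return dt.strftime("%Y-%m-%dT%H:%M:%S")
```

In practice the 8 bytes would be sliced out of the store.vol data at whatever offset the record layout dictates, which this sketch does not attempt to model.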
UPDATE (7/7/15):
We have run the "wp8-sms.py" script on a complete 7 GB .bin image from a Windows Phone 8 device.
It processed 6000+ SMS hits in 290 seconds.
The system was a Xeon 6 core 3.5 GHz (circa 2011) with 12 GB RAM and a 160 GB SSD (which contained the .bin image). The OS was Windows 7 x64 and the version of Python used was 2.7.5.
According to Python's cProfile monitoring module, most of the time (~250 seconds) was spent in the "read" call (line 270). In order to reduce the read time, the script could read the .bin file in smaller chunks using multiple threads.
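To give a feel for chunked reading, here is a minimal single-threaded Python 3 sketch (no threading attempted). It scans a file in fixed-size chunks and keeps a small overlap so that a search string straddling a chunk boundary is not missed. The function name `find_all_chunked`, the default chunk size and the overlap handling are illustrative assumptions, not the actual code in wp8-sms.py.

```python
def find_all_chunked(path, pattern, chunk_size=64 * 1024 * 1024):
    """Yield the absolute file offset of every occurrence of pattern (bytes)."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1  # bytes carried over between chunks
    with open(path, "rb") as f:
        tail = b""  # leftover bytes from the previous chunk
        base = 0    # file offset of the first byte of tail + chunk
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            pos = buf.find(pattern)
            while pos != -1:
                yield base + pos
                pos = buf.find(pattern, pos + 1)
            # The tail is shorter than the pattern, so a hit is never reported twice.
            tail = buf[-overlap:] if overlap else b""
            base += len(buf) - len(tail)
```

Something like `for off in find_all_chunked("image.bin", b"SMS"): print(hex(off))` would then list hit offsets without ever holding the whole image in memory (the file name here is just a placeholder).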
Call History Script
The wp8-callhistory.py script searches a given Phone file for the GUID "{B1776703-738E-437D-B891-44555CEB6669}" which occurs at the end of each call history record. It then works backwards to read the Phone/Name/ID/FILETIME/Flag fields for that record. Finally, it outputs the extracted records sorted by Start_Time in Tab Separated Values (TSV) format.
Usage:
python wp8-callhistory.py -f Phone -o output-callhistory.tsv
Output format:
GUID_Offset Flag Start_Time Stop_Time ID Phone_1 Name_1 Name_2 Phone_2
0x3c5ee 0 2014-10-01T03:06:04 2014-10-01T03:06:37 4321555111 (111) 555-1234 BananaMan BananaMan (111) 555-1234
0x3c123 1 2014-10-01T03:16:04 2014-10-01T03:18:07 4321555111 (111) 555-1234 BananaMan BananaMan (111) 555-1234
Note 1: Flag value: 0 = Outgoing, 1 = Incoming, 2 = Missed
Note 2: ID appears to be the reverse of Phone_1 and Phone_2.
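Note 2 is easy to sanity check. The Python 3 sketch below assumes, based only on the sample rows above, that ID is the phone number with its formatting characters stripped and its digits reversed. `id_matches_phone` is a hypothetical helper and not part of wp8-callhistory.py.

```python
def id_matches_phone(record_id, phone):
    """Return True if record_id equals the digits of phone in reverse order."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return record_id == digits[::-1]
```

For the sample record, `id_matches_phone("4321555111", "(111) 555-1234")` holds, since the digits 1115551234 reversed give 4321555111.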
Contacts Script
The wp8-contacts.py script searches a given store.vol for instances of the hex code [01 04 00 00 00 82 00 E0 00 74 C5 B7 10 1A 82 E0 08] which occurs at the end of each contact record. It then tries reading the previous Unicode string fields in reverse order. The last field should contain the Name but can also hold Email for an MPD Hotmail entry. The 3rd last field should contain the Phone number but can also hold Name for MPD Hotmail/other Garda type entries. The contact records are then sorted by the last field (Name) and output in Tab Separated Values (TSV) format.
Usage:
python wp8-contacts.py -f store.vol -o output-contacts.tsv
Output format:
Offset Last_Field(Name) Third_Last_Field(Phone)
0x711a0 BananaMan (111) 555-1234
0x727bd PooFlinger (111) 555-4321
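The first step (locating the end-of-record hex code) can be sketched in a few lines of Python 3. This only finds the marker offsets in data that has already been read into memory. Walking backwards through the preceding Unicode string fields depends on the record layout and is not shown. The names `CONTACT_MARKER` and `find_marker_offsets` are made up for this example and are not taken from wp8-contacts.py.

```python
import re

# The 17-byte end-of-record hex code quoted in the script description.
CONTACT_MARKER = bytes.fromhex("01 04 00 00 00 82 00 E0 00 74 C5 B7 10 1A 82 E0 08")

def find_marker_offsets(data, marker=CONTACT_MARKER):
    """Return the offset of every occurrence of marker within data (bytes)."""
    # re.escape stops marker bytes being treated as regex metacharacters.
    return [m.start() for m in re.finditer(re.escape(marker), data)]
```

Each returned offset would correspond to the Offset column in the output above, assuming the script reports the position of the marker itself.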
Facebook Messages Script
The wp8-fb-msg.py script parses selected Facebook JSON fields from ASCII & Unicode file dumps. It should also handle escaped (ie backslashed) fields. It was suggested by Brian McGarry after he observed various JSON encoded messages in a Windows Phone 8.0 pagefile.sys. So while it's intended to be used against pagefile.sys, it can also be used against any file containing these JSON encoded messages (there's probably an input file size limit though).
The script extracts the author_fbid, author_name, message and timestamp_src fields and outputs the records sorted by timestamp_src in Tab Separated Values (TSV) format. It also prints the timestamp in a human readable format.
Here's a simple JSON encoded Facebook message example (in reality there's a LOT more fields than this):
{[{"author_fbid":123456789,"author_name":"Monkey", "message":"Where's my Bananas?!", "timestamp":1392430316355}]}
For more information on JSON and Facebook messages see this somewhat related previous post.
Usage:
python wp8-fb-msg.py -f pagefile.sys -o output-facebook.tsv -u
Note: the -u flag tells the script to search for Unicode/UTF16LE encoded messages. The default (ie no -u flag) is to search for ASCII/UTF8 encoded messages.
Output format:
author_fbid_Offset author_fbid author_name message timestamp_src timestamp_str
0xae 123456789 "Monkey" "Where's my Bananas?!" 1392430316355 2014-02-15T02:11:56
0x1e 123456780 "BananaMan" "Chill out Monkey boy. Magilla Gorilla says they're on the way." 1392430323543 2014-02-15T02:12:03
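Two details of the output above lend themselves to a quick Python 3 sketch: how the byte pattern for a field name differs between the ASCII and UTF16LE searches, and how timestamp_src maps to timestamp_str. The sketch assumes timestamp_src is milliseconds since the Unix epoch in UTC, which is consistent with the sample rows. `search_key` and `ms_to_iso` are hypothetical helpers, not functions from wp8-fb-msg.py.

```python
from datetime import datetime, timedelta

def search_key(field, unicode_mode=False):
    """Return the bytes to search for a quoted JSON field name."""
    text = '"%s"' % field
    # UTF-16LE stores each ASCII character as two bytes (char, 0x00).
    return text.encode("utf-16-le" if unicode_mode else "ascii")

def ms_to_iso(timestamp_ms):
    """Convert milliseconds since 1970-01-01 UTC to a readable string."""
    dt = datetime(1970, 1, 1) + timedelta(milliseconds=timestamp_ms)
    return dt.strftime("%Y-%m-%dT%H:%M:%S")
```

With these, `ms_to_iso(1392430316355)` gives the 2014-02-15T02:11:56 shown in the first sample row (the fractional seconds are simply dropped).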
Final Thoughts
These scripts have been tested mostly against datasets from JTAG'd Nokia Lumia 520s. We can't guarantee they will work for other phones or for Windows Phone 8.1 but it's a good starting point considering the currently limited open source alternatives.
Anyhoo, it is suspected that other Windows Phone data will only require minor tweaks to the existing code rather than a complete rewrite. I'm pretty sure we're in the ballpark *famous last words* :)
As Windows Phones are a market minority and extracting the data out of them typically requires JTAG'ing, these scripts are aimed at a very small audience. Having said that, if they do help you out, it'd be great to hear about it in the comments section ...