Sunday, 6 December 2015

Windows Phone 8.10 MMS (for Lumia 530) ...

Now with attachment info! Catch the excitement!

We recently noticed that while some commercial forensic tools show Windows Phone 8.10 MMS transaction information (eg Date, Phone number), they do not show or list the accompanying MMS file attachments. Welcome masochistic script monkey! May I take your "safe word"?
In Windows Phone 8.10, a user can send an MMS containing a picture (eg camera JPEG, PNG screen capture), a video (eg camera captured MP4), a Contact (via VCARD text) and/or a VoiceNote (via .AMR audio). These attachments are recopied/renamed/stored as weirdly named .dat files in various sub-directories (named "A" to "P") under Data:\SharedData\Comms\Unistore\data\7.
Received MMS are also stored as .dat files under the same location so it isn't immediately obvious what attachments go together and/or which were sent/received.

The main issue is finding a link between the MMS database transaction entries and the actual stored .dat file(s) which were sent/received.
There may be a more comprehensive link yet to be discovered, but the best link we've found so far is via the filesystem "Last Modified" timestamp of the .dat files and a timestamp found in the "Message" table of the store.vol database.
However, because file system modified times can vary by 1-2 seconds when compared to what is listed in the MMS database, we wrote two scripts - one to sort/print the filesystem's .dat attachment files in chronological order (wp8-1-mms-filesort.py) and a separate script (wp8-1-mms.py) to print the store.vol database MMS records in chronological order. The discerning simian analyst can then make the decision about which attachments go with which MMS based on the timestamp information, MMS total size, file types and individual file attachment sizes. Additionally, they can use the calculated SHA256 hash to locate any sent MMS files (received MMS files are initially stored only as .dat files).

The two scripts are available from my GitHub site and have been created/tested on a Windows 7 PC running Python 2.7 using test data from a Nokia Lumia 530 running Windows Phone 8.10.

Background

The "store.vol" database is an ESE database located in Data:/USERS/WPCOMMSERVICES/APPDATA/Local/Unistore/ that contains 3 tables of interest for MMS:
- Message (contains metadata of sent/received MMS)
- Recipient (contains sent MMS phone numbers)
- Attachment (contains metadata about sent/received MMS attachments)

These tables have weird combination numeric/letter strings used for their column/field names (eg "0037001f").
Therefore, throughout this blog post we will use alternative monkey monikers (eg Size, Flag, Filename0) to keep things sane.

Here's an overview of how the various tables and .dat files fit together.  
Note: Due to space constraints, this relationship diagram does NOT include every field from each table.

So, are you seeing anyone right now? Oh ... it's complicated?

Every MMS will have a "Message" table entry. If it's a sent MMS, there will be a corresponding "Recipient" table entry (containing the Destination Phone number and Timestamp3).
Every sent/received MMS Message will have at least 2 "Attachment" table entries - one for a "smil" message layout XML file and one for each attached file (eg one for each picture).
Filename aliases (eg "FOT1234.jpg") for attached Images/Videos/Text/Voicenotes/VCARDs will appear in both the "smil" .dat file and in the corresponding "Attachment" table entry.
Each attachment's content will be stored in a separate .dat file with a similar filesystem "Last Modified" timestamp to the "smil" layout .dat file. This "Last Modified" timestamp will approximately equal the relevant MMS's Timestamp2 value in the "Message" table.

We recommend using OSForensics "ESEDB Viewer" to view "store.vol" as it seems to have more reliable search functionality (and possibly shows more table columns) than NirSoft's ESEDatabaseView.
It's one thing to view "store.vol" using a dedicated viewer but to script a solution (independent of third party libraries) we have to look at how each field of interest is stored in the raw hex.

For the "Message" table there are 2 variations of MMS record - one for Sent MMS and one for Received MMS.
Each Received MMS "Message" record looks like:

Received MMS format
Where X represents a number of bytes that we don't really know/care about. All strings are null-terminated UTF-16-LE encoded. Timestamps are LE 8 byte integers representing the number of 100 ns intervals since 1 JAN 1601 (MS FILETIME). The string "IPM.MMS" is actually "\x49\x00\x50\x00\x4D\x00\x2E\x00\x4D\x00\x4D\x00\x53\x00\x00\x00" in hex.
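To make these format notes concrete, here is a minimal Python 3 sketch (not code from the author's scripts) that locates the "IPM.MMS" marker in a raw buffer and converts an 8-byte FILETIME into a UTC datetime. The helper names are hypothetical. The offsets from the marker to each field come from the record diagrams and are not reproduced here.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME epoch is 1 Jan 1601 (UTC); one tick is 100 ns
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# "IPM.MMS" as a null-terminated UTF-16-LE string (same bytes as quoted above)
IPM_MMS = "IPM.MMS\x00".encode("utf-16-le")


def filetime_to_datetime(raw8):
    """Convert 8 little-endian bytes (MS FILETIME) to a UTC datetime."""
    (ticks,) = struct.unpack("<Q", raw8)
    # 10 ticks of 100 ns make one microsecond
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def find_markers(data, marker=IPM_MMS):
    """Return every offset at which marker occurs in data."""
    offsets, start = [], 0
    while True:
        pos = data.find(marker, start)
        if pos == -1:
            return offsets
        offsets.append(pos)
        start = pos + 1
```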

Each Sent MMS "Message" record (contains no phone numbers) looks like:

Sent MMS format

The important "Message" fields are colour highlighted in the diagrams:
- Msgid (Unique id number for each "Message" entry.)
- Flag (Sent (33 decimal) / Unread(0) / Read(1). Draft MMS are stored using a different Message record format which contains "IPM.MSG" (instead of "IPM.MMS") and their Flag is set to 41 decimal. They will not be discussed further in this post.)
- Size (Total MMS size in bytes. This will help when there are multiple file attachments per MMS.)
- Timestamp2 (File system Last Modified time (approximate). For sent MMS, this corresponds to the time of creation/last update and not the time actually sent.)
- Timestamp3 (Sent/Received time common to both "Message" / "Recipient" tables.)
- Phone0/Phone1/Phone2/Phone3 (Phone number for received MMS. These tend to be set to the same value when present. Sent message phone numbers have to be obtained from the "Recipient" table.)
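The Flag values listed above can be captured in a small lookup. This Python sketch uses hypothetical names and only encodes the observations from this post (33 Sent, 0 Unread, 1 Read, 41 Draft). Other values or other devices may behave differently.

```python
# Flag values as observed in the "Message" table (Lumia 530, WP 8.10)
FLAG_NAMES = {33: "Sent", 0: "Unread", 1: "Read", 41: "Draft"}


def describe_message(flag, phone):
    """Return a (status, direction) tuple for a Message record.

    phone is the decoded Phone0 string, or None/"" when the record has none.
    """
    status = FLAG_NAMES.get(flag, "Unknown (%d)" % flag)
    if flag == 33:
        # Sent records carry no phone number in the Message table
        direction = "sent (look up DestPhone in Recipient table)"
    elif phone:
        direction = "received from " + phone
    else:
        direction = "unknown"
    return status, direction
```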

Each "Recipient" table record looks like:

Recipient format

Where X represents a number of bytes that we don't really know/care about. All strings are null-terminated UTF-16-LE encoded. Timestamps are LE 8 byte integers representing the number of 100 ns intervals since 1 JAN 1601 (MS FILETIME). The string "@.SMS" is actually "\x40\x01\x53\x00\x4d\x00\x53\x00\x00\x00" in hex (a hex viewer displays the first two bytes, 0x40 0x01, as "@.").
Each "Recipient" table record should correspond to a sent SMS/MMS. Our ass-umption here is that it's impossible to simultaneously send both SMS and MMS with the same Timestamp3 value. So if we find a "Recipient" table entry which has a Timestamp3 value that also occurs in a "Message" table "Sent Record", we can ass-ume that it is a sent MMS. Sent SMS have an "IPM.SMStext" value set instead of "IPM.MMS" in the "Recipient" table.

The important "Recipient" fields are:
- Msgid (Unique id number for each "Message" entry.)
- Timestamp3 (Sent/Received time common to both "Message" / "Recipient" tables.)
- DestPhone (Destination Phone number string.)
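The Timestamp3 assumption above can be expressed as a simple join. The following Python sketch is hypothetical and assumes the "Message" and "Recipient" rows have already been parsed into dictionaries keyed by Msgid. In the sample output later in this post, the same Msgid (48) also appears in both tables.

```python
def find_sent_mms(sent_message_ts3, recipients):
    """Match sent "Message" records to "Recipient" rows via Timestamp3.

    sent_message_ts3: dict msgid -> Timestamp3 for Message records with Flag 33
    recipients: dict msgid -> (Timestamp3, DestPhone) from the Recipient table
    Returns dict msgid -> DestPhone for Recipient rows whose Timestamp3 also
    appears in a sent Message record (assumed to be sent MMS rather than SMS).
    """
    sent_times = set(sent_message_ts3.values())
    return {
        msgid: phone
        for msgid, (ts3, phone) in recipients.items()
        if ts3 in sent_times
    }
```

Using the redacted values from the sample output, only msgid 48 has a matching sent "Message" record.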

Each "Attachment" table record looks like:

Attachment format

Where X represents a number of bytes that we don't really know/care about. All strings are null-terminated UTF-16-LE encoded.
Filename0 has been observed to start with "<cidText" or "<cidSmil" or "<cidImage" or "<cidVideo" or "<cidAudio" or "<cidVCard". It has also been seen in the form "<123>" - where "<123>" represents a 3 digit number that increases with each "Rowid". MMS using the "<123>" alias format also had the fixed alias value of "<0000>" for the "smil" layout files.
Draft SMS/MMS do not use the "<>" method of enclosing aliases so cannot be found in the same manner.
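The Filename0 forms described above can be sorted out with two regular expressions. This Python sketch (hypothetical helper name) covers only the alias shapes seen in this post. Any other alias shape returns None.

```python
import re

# Alias forms described above: "<cidImage_FOT1234.JPG>", "<cidSmil>", "<123>", "<0000>"
CID_RE = re.compile(r"^<cid(Text|Smil|Image|Video|Audio|VCard)(?:_(.*))?>$")
NUM_RE = re.compile(r"^<(\d+)>$")


def classify_alias(filename0):
    """Return (kind, remainder) for a Filename0 value, or None if it is
    not enclosed in "<" and ">" (eg Draft SMS/MMS aliases)."""
    m = CID_RE.match(filename0)
    if m:
        return m.group(1), m.group(2)
    m = NUM_RE.match(filename0)
    if m:
        # "<0000>" was observed as the fixed alias for smil layout files
        kind = "Smil" if m.group(1) == "0000" else "Numbered"
        return kind, m.group(1)
    return None
```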

The important "Attachment" fields are:
- Msgid (Unique id number for each Message entry.)
- Size (Attached file's size in bytes. There can be multiple file attachments per MMS. Adding all the "Attachment" sizes for a given "Msgid" should equal the "Message" table's size for that same "Msgid".)
- Filename0 (Alias enclosed by "<" and ">" characters. We can use it to identify MMS attachment entries (eg "<cidImage_FOT1234.JPG>").)
- Filename1 (Alias used in "smil" files. eg "FOT1234.JPG".)
- Filename2 (If it's a sent MMS, this will be the actual source Filename (eg WP_20151203_001.JPG). If it's a received MMS, this is typically set to the same value as Filename1.)
- Filetype (Description of the type of file.) 
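The Size relationship noted above (the "Attachment" sizes for a Msgid should add up to the "Message" Size) makes a handy sanity check. This is a hypothetical Python sketch working on already-parsed values.

```python
from collections import defaultdict


def check_sizes(message_sizes, attachments):
    """Compare each Message Size with the sum of its Attachment sizes.

    message_sizes: dict msgid -> Size from the Message table
    attachments: iterable of (msgid, size) pairs from the Attachment table
    Returns dict msgid -> (message_size, attachment_total, matches)
    """
    totals = defaultdict(int)
    for msgid, size in attachments:
        totals[msgid] += size
    return {
        msgid: (size, totals[msgid], size == totals[msgid])
        for msgid, size in message_sizes.items()
    }
```

With the msgid 48 values from the sample output below, check_sizes({48: 72720}, [(48, 52), (48, 742), (48, 37397), (48, 34529)]) returns {48: (72720, 72720, True)}.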

The Filetype values have been observed as:
- "application/smil" (for the MMS layout)
- "text/plain" (for the MMS text content)
- "image/jpeg" and "image/png" (for attached camera/screenshot images)
- "audio/amr" (for attached VoiceNotes)
- "video/mp4" (for attached camera videos)
- "text/x-vcard" (for attached Contacts)

It's been briefly mentioned above but the last piece of the banana is that MMS .dat files are stored under "Data:\SharedData\Comms\Unistore\data\7". One .dat file is required for each sent/received file. Additionally, every MMS has a "smil" XML layout file which lists an alias to the attached file(s) (eg <img src="FOT1234.JPG" region="Image"/>). We can also find that alias mentioned in a corresponding "Attachment" table entry for that MMS.

Here's an example "smil" .dat file:
<smil><head><layout><region id="Text" height="50%" width="100%" left="0" top="50%" fit="scroll"/><region id="Image" height="50%" width="100%" left="0" top="0" fit="meet"/></layout></head><body><par dur="5000ms"><img src="FOT1234.JPG" region="Image"/><text src="Text_0.txt" region="Text"/></par></body></smil>

The "text src" consistently seems to be set to "Text_0.txt". During testing we did not try sending multiple images in the same MMS with each image having its own accompanying text. However, we suspect that each text entity would then get its own "text src" element and unique alias (eg "Text_0.txt", "Text_1.txt"). With our current understanding, determining the order of these multiple texts would be difficult but we can still retrieve the text content.
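Pulling the aliases out of a "smil" layout only needs an XML parser. This Python sketch (hypothetical helper name) assumes the .dat content has already been read into a string. The encoding of the smil .dat files isn't stated in this post. The sketch records which <par> element each alias sits in. That position might hint at ordering, but this is untested with multiple texts.

```python
import xml.etree.ElementTree as ET


def smil_aliases(smil_text):
    """Return (tag, src, par_index) for each media element in a smil layout."""
    root = ET.fromstring(smil_text)
    results = []
    # Each <par> groups the elements shown together; children reference attachments by alias
    for par_index, par in enumerate(root.iter("par")):
        for element in par:
            src = element.get("src")
            if src is not None:
                results.append((element.tag, src, par_index))
    return results
```

For the two-picture smil shown later in this post, this yields the img aliases FOT66CB.jpg and FOT9241.jpg plus the text alias Text_0.txt.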

Note: "Data:\SharedData\Comms\Unistore\data\7" has been observed to also contain Draft MMS (text content) and Received Email attachment .dat files.
Attached files for Draft MMS (eg pics) do not appear to be stored in .dat files under that directory. Editing a Draft MMS should update the "Last Modified" filesystem time on the .dat files.

You might also have noticed that the constant 0x07000000 value appears in each of the above records. Coincidentally(?), 7 is also the rowid corresponding to the "SMS" row in the "Store" table. This kinda makes sense as both SMS and MMS are grouped together under the same Messaging menu on a Windows Phone 8 device.
It seems there is a "Store" table row for each potential store location on the phone (eg Outlook, SMS, ExternalStore, OneDrive etc).
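The "0x07000000" constant is presumably the four bytes 07 00 00 00 as they appear in a hex view. Read as a little-endian integer, like the other fields in these records, they give 7. A quick Python check with a hypothetical helper:

```python
import struct


def store_id(raw4):
    """Read a 4-byte little-endian unsigned integer (eg the bytes 07 00 00 00)."""
    (value,) = struct.unpack("<I", raw4)
    return value
```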

Scripting and Testing

wp8-1-mms-filesort.py
Assuming the analyst has already exported the contents of "Data:\SharedData\Comms\Unistore\data\7" (eg using AccessData FTK Imager), we can write a Python script to create a clickable HTML table (sorted by "Last Modified" time) of .dat attachments. The script can also output the same data to a Tab Separated Values (TSV) file for importing into an analysis spreadsheet.

Here's the help for wp8-1-mms-filesort.py:
c:\Python27\python.exe wp8-1-mms-filesort.py
Running wp8-1-mms-filesort.py v2015-11-24

Usage:  wp8-1-mms-filesort.py -i inputfiledir -t output.tsv (Optional) -o output.html (Optional)

Options:
  -h, --help     show this help message and exit
  -i DIRNAME     Input Directory To Be Processed
  -t OUTPUTTSV   Output Tab Separated Variable (TSV) filename (Optional)
  -o OUTPUTHTML  Output HTML filename (Optional)

The script walks through each input directory and looks for filenames ending in "73701.dat". It then retrieves that file's "Last Modified" time and stores the filename and timestamp for later display.
For each of the stored filenames, the script reads the file contents and attempts to determine the file type, file size and SHA256 hash and, if it's a "smil" file, lists any file aliases used. It then prints the file information, sorted chronologically by "Last Modified" time, to the command line (SHA256 hashes are NOT printed to the command line) and/or HTML and/or TSV.
By grouping the .dat files by "Last Modified" time, it should make it easier to decide which .dat files belong together.
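The walk/hash/sort steps described above can be sketched in a few lines of Python 3. This is not the author's wp8-1-mms-filesort.py. The magic-byte signatures are assumptions (for example, the sketch assumes smil files begin with plain "<smil" bytes), and times are reported in UTC, whereas the original script's time zone handling isn't stated here.

```python
import hashlib
import os
from datetime import datetime, timezone

# Assumed magic-byte signatures; the original script's rules may differ
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"#!AMR", "AMR"),
    (b"BEGIN:VCARD", "VCARD"),
    (b"<smil", "smil"),
]


def guess_type(data):
    """Guess a .dat file's type from its leading bytes."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # MP4 files normally have an "ftyp" box starting at offset 4
    if data[4:8] == b"ftyp":
        return "MP4"
    return "Unknown"  # possibly MMS text content


def list_dat_files(root_dir):
    """Return .dat file details sorted by Last Modified time (reported in UTC)."""
    rows = []
    for dirpath, _dirs, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.endswith("73701.dat"):
                continue
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            with open(path, "rb") as f:
                data = f.read()
            rows.append({
                "mtime": mtime,
                "modified": datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"),
                "path": path,
                "size": len(data),
                "type": guess_type(data),
                "sha256": hashlib.sha256(data).hexdigest().upper(),
            })
    # Sort by Last Modified time so files from one MMS tend to sit together
    rows.sort(key=lambda r: (r["mtime"], r["path"]))
    return rows
```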

Here is some redacted example output:

c:\python27\python.exe wp8-1-mms-filesort.py -i 7 -t fsop.tsv -o fsop.html
Running wp8-1-mms-filesort.py v2015-11-24

Parsed 57 files

Mod. Timestamp  Filename        Size(bytes)     Type    Comments
... [REDACTED]
2015-12-01T04:02:08     7\c\40000002000000073701.dat    7302    AMR
2015-12-01T04:02:08     7\d\40000003000000073701.dat    186     <smil>  aud = P__4987.amr
2015-12-01T04:03:08     7\f\40000005000000073701.dat    893237  MP4
2015-12-01T04:03:08     7\g\40000006000000073701.dat    366     <smil>  video =P__CD04.mp4
2015-12-01T04:05:04     7\h\40000007000000073701.dat    104     VCARD
2015-12-01T04:05:04     7\i\40000008000000073701.dat    386     <smil>  VCARD present
2015-12-01T04:07:04     7\j\40000009000000073701.dat    366     <smil>  video =P__BC3E.mp4
2015-12-01T04:07:04     7\k\4000000a000000073701.dat    899422  MP4
2015-12-01T04:08:20     7\l\4000000b000000073701.dat    186     <smil>  aud = P__7C25.amr
2015-12-01T04:08:20     7\m\4000000c000000073701.dat    5702    AMR
2015-12-02T20:30:20     7\n\4000000d000000073701.dat    616     <smil>  img = FOTDAD6.JPG, text = Text_0.txt
2015-12-02T20:30:20     7\o\4000000e000000073701.dat    624866  JPEG
2015-12-02T20:30:20     7\p\4000000f000000073701.dat    50      Unknown
2015-12-02T22:26:42     7\b\50000001000000073701.dat    52      Unknown
2015-12-02T22:26:42     7\c\50000002000000073701.dat    616     <smil>  img = FOTEB94.jpg, text = Text_0.txt
2015-12-02T22:26:44     7\a\50000000000000073701.dat    571938  JPEG
2015-12-03T02:57:10     7\f\50000005000000073701.dat    38303   PNG
2015-12-03T02:57:10     7\g\50000006000000073701.dat    22      Unknown
2015-12-03T02:57:10     7\h\50000007000000073701.dat    616     <smil>  img = FOT348F.png, text = Text_0.txt
2015-12-03T02:59:48     7\i\50000008000000073701.dat    616     <smil>  img = FOT3070.png, text = Text_0.txt
2015-12-03T02:59:48     7\j\50000009000000073701.dat    48870   PNG
2015-12-03T02:59:48     7\k\5000000a000000073701.dat    18      Unknown

Note1: Type "Unknown" Types possibly indicate MMS message text files

Note2: Not all .dat files may belong to an MMS message (eg Received Email Attachments, Drafts)

Finished processing MMS .dat files ... Exiting ...

The corresponding output TSV looks like:

Mod. Timestamp    Filename    Size(bytes)    Type    SHA256 Hash    Comments
2015-12-01T04:02:08    7\c\40000002000000073701.dat    7302    AMR    CAE79982FCAD00B09707FB24FAB0D226E54E4B2DB85F926B34166B5DC7D3DBDD   
2015-12-01T04:02:08    7\d\40000003000000073701.dat    186    smil    30614CF3267AD6C53B714E528F474B61C47CD10058CCB5838ACA55BA7E69C5C0    aud = P__4987.amr
2015-12-01T04:03:08    7\f\40000005000000073701.dat    893237    MP4    DC8200239F601FA8B14783BD97DDCE55E094CEA5E23D01DD3F4832A6F87A7CE2   
2015-12-01T04:03:08    7\g\40000006000000073701.dat    366    smil    68A4069FB3AF78B3C045D830FCF03E964AE043E2276AD059C00A9BBFF30CEB76    video = P__CD04.mp4
2015-12-01T04:05:04    7\h\40000007000000073701.dat    104    VCARD    C1DF9E80CDCB9140D2FFC427B6896BB2E67EF8F989068E6F9CCAF367D947D29B   
2015-12-01T04:05:04    7\i\40000008000000073701.dat    386    smil    C56C8A12E8E5E2536161B6AF52C3A20F244603F31CDF637B86B6F4AA74087A91    VCARD present
2015-12-01T04:07:04    7\j\40000009000000073701.dat    366    smil    77AEC99255C3234D2B0A1EAA5DD935D4C694255C6F4A4C821FCF0C62AC43DCE9    video = P__BC3E.mp4
2015-12-01T04:07:04    7\k\4000000a000000073701.dat    899422    MP4    D2DA77B80DC689460829C6B0F554BC0778348EA149DED1242E80787F005958D3   
2015-12-01T04:08:20    7\l\4000000b000000073701.dat    186    smil    2EF5F8BEBC9C03665DB666539F1F9CB595CF543C541537E8EB104A81C60659AE    aud = P__7C25.amr
2015-12-01T04:08:20    7\m\4000000c000000073701.dat    5702    AMR    50AD1A3004E4C1541F507B38DB867A9186237612FF312FDF7E2BE4370617F3E6   
2015-12-02T20:30:20    7\n\4000000d000000073701.dat    616    smil    AA2382E8BC8DEA7A7176F21C5FD58C4E52E54608404DE437B1C3299AF0206192    img = FOTDAD6.JPG, text = Text_0.txt
2015-12-02T20:30:20    7\o\4000000e000000073701.dat    624866    JPEG    4BA420B740F5C8DEF862690566195CE620C2495406CDD8383362713A2A38E562   
2015-12-02T20:30:20    7\p\4000000f000000073701.dat    50    Unknown    E2E39EA85A25B1C02FB287049F1179B51DED480B99896B4E8E70F773BD2D7158   
2015-12-02T22:26:42    7\b\50000001000000073701.dat    52    Unknown    7DE34A123DA5F220CD6346F8CE852036C255A4FE4C81C130EB465AC6378193C0   
2015-12-02T22:26:42    7\c\50000002000000073701.dat    616    smil    144753D485039158AB10A06D86A6D421983EC12D708E992A8DA408B079917020    img = FOTEB94.jpg, text = Text_0.txt
2015-12-02T22:26:44    7\a\50000000000000073701.dat    571938    JPEG    139A1E26E6D317F1BBC1C36A7B98AFD3875351B6254B4AEB336F684841F23759   
2015-12-03T02:57:10    7\f\50000005000000073701.dat    38303    PNG    D4DAA620FCF32FA0198CAE17AD55B3F847243617F034C4A0BB88356853C77662   
2015-12-03T02:57:10    7\g\50000006000000073701.dat    22    Unknown    1BDA3B8EF697CC23E5299068695114F67F7B03C0DD7A4BFEB72F553088AB4009   
2015-12-03T02:57:10    7\h\50000007000000073701.dat    616    smil    D28199571A7C838C52206E4DA69D4C2A02A44B851B57DF739DBE2E35F8B6C696    img = FOT348F.png, text = Text_0.txt
2015-12-03T02:59:48    7\i\50000008000000073701.dat    616    smil    4E2D452A0190125D1D44062DEB300B3E00FC2FD0F33E55392A12CCFC3925188E    img = FOT3070.png, text = Text_0.txt
2015-12-03T02:59:48    7\j\50000009000000073701.dat    48870    PNG    3360CD0D57820536D86841DCA94453049B6C4B3D95A3D3B25B91B4CD5ABD50E7   
2015-12-03T02:59:48    7\k\5000000a000000073701.dat    18    Unknown    094D1EA6F14A289EC08AF1F85CE1B55A2D95937915A67061FFA85DFCD4284B37   

Note1: "Unknown" Types possibly indicate MMS message text files
Note2: Not all .dat files may belong to an MMS message (eg Received Email Attachments, Drafts)

Apologies for the Blogger formatting - this is why we didn't print the SHA256 hash to the command line!
And here's what the corresponding HTML output looks like in a web browser:

HTML Output from "wp8-1-mms-filesort.py"

When we click on an HTML file link, we can view that attachment more readily. However, for the HTML links to work, the HTML file must be in the same directory as the extracted "7" directory.
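Relative links are why the placement matters. This minimal Python sketch (hypothetical helper, not the author's code) writes links exactly as the stored relative paths, so the HTML only works from the directory that contains "7".

```python
import html


def write_html(rows, html_path):
    """Write a minimal HTML table of .dat files with clickable links.

    Each row's "path" is assumed to be relative to the HTML file's directory
    (eg "7\\c\\40000002000000073701.dat").
    """
    out = ['<html><body><table border="1">',
           "<tr><th>Mod. Timestamp</th><th>Filename</th><th>Size(bytes)</th><th>Type</th></tr>"]
    for r in rows:
        href = r["path"].replace("\\", "/")  # URLs use forward slashes
        out.append('<tr><td>%s</td><td><a href="%s">%s</a></td><td>%d</td><td>%s</td></tr>' % (
            html.escape(r["modified"]), html.escape(href), html.escape(r["path"]),
            r["size"], html.escape(r["type"])))
    out.append("</table></body></html>")
    with open(html_path, "w", encoding="utf-8") as f:
        f.write("\n".join(out))
```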

In this example HTML output, we have the .dat files from a sent MMS containing two pictures (saved from the web) and a text set to "Funny? Don't remember that".

Sent MMS with text and 2 pictures
Note the different timestamp values for .dat files from the same MMS.

When we click on the first row's file link (not circled) and open it via Notepad we see the text:
Funny? Don't remember that

Clicking on the "smil" file's link (circled in orange) and opening it via Notepad looks like:
<smil><head><layout><region id="Text" height="50%" width="100%" left="0" top="50%" fit="scroll"/><region id="Image" height="50%" width="100%" left="0" top="0" fit="meet"/></layout></head><body><par dur="5000ms"><img src="FOT66CB.jpg" region="Image"/></par><par dur="5000ms"><img src="FOT9241.jpg" region="Image"/><text src="Text_0.txt" region="Text"/></par></body></smil>

This tells us that there are 2 images (FOT66CB.jpg, FOT9241.jpg) and a text message associated with this MMS. To determine if this MMS is sent/received and/or the sent/received time and/or the phone number, we can process "store.vol" using the second script (wp8-1-mms.py).

Clicking on the JPEG link (circled in green) displays this picture in the browser:

Indeed!

Clicking on the JPEG link (circled in red) displays this picture in the browser:

Do you Mind?!

Ah, what's the point in blogging if we can't share pictures of toilet trained monkeys?!
Ahem, continuing on ...

wp8-1-mms.py
Assuming the analyst has already exported "Data:/USERS/WPCOMMSERVICES/APPDATA/Local/Unistore/store.vol", we can write another script (wp8-1-mms.py) to list MMS information for correlation with the output of wp8-1-mms-filesort.py. Essentially, wp8-1-mms.py prints out the information for every MMS attachment (sorted by Timestamp2).

Here's the help for wp8-1-mms.py:

c:\Python27\python.exe wp8-1-mms.py
Running wp8-1-mms.py v2015-11-14

Usage:  wp8-1-mms.py -s store.vol -o output.tsv(Optional)

Options:
  -h, --help         show this help message and exit
  -s STOREFILE       store.vol file
  -o OUTPUTFILENAME  Output Tab Separated Variable filename (Optional)

Finding all MMS attachments requires searching "store.vol" for a slight variation of the Filename1 alias used in the "smil" file (eg "<cidImage_FOT1234.jpg>" instead of "FOT1234.jpg", "<123>" instead of "cid:123").
Note: It is thought that the "cid:123" style of alias operates in a similar manner to the "FOT1234.jpg" alias but we have not been able to generate the "cid:123" alias during testing.
Fortunately, we can generalise this to a search for a group of letters starting with "<cid" and/or a group of digits enclosed by "<" and ">". Once we have found a list of these hits, we can retrieve that "Attachment" row's values (Filename0, Filename1, Filename2, Type, Size) and store them in a dictionary (creatively called "attachments") keyed by "Msgid" for later use.
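Here is a sketch of that generalised search on the raw bytes. Because the strings are UTF-16-LE, each ASCII character in the pattern is followed by a 0x00 byte. The helper names are hypothetical. Reading Msgid/Size relative to each hit depends on the record offsets in the diagrams, so this sketch stops at the alias strings.

```python
import re

# "<cid" + ASCII letters, and "<" + digits + ">", both as UTF-16-LE byte patterns
CID_HIT = re.compile(rb"<\x00c\x00i\x00d\x00(?:[A-Za-z]\x00)+")
NUM_HIT = re.compile(rb"<\x00(?:[0-9]\x00)+>\x00")


def read_utf16z(data, offset):
    """Decode a null-terminated UTF-16-LE string starting at offset."""
    end = offset
    while end + 1 < len(data) and data[end:end + 2] != b"\x00\x00":
        end += 2
    return data[offset:end].decode("utf-16-le", errors="replace")


def attachment_alias_hits(data):
    """Return sorted (offset, alias) pairs for candidate Filename0 values."""
    hits = {m.start() for m in CID_HIT.finditer(data)}
    hits.update(m.start() for m in NUM_HIT.finditer(data))
    return [(off, read_utf16z(data, off)) for off in sorted(hits)]
```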

Similarly, we can find all "Recipient" table entries by searching for the "@.SMS" pattern mentioned earlier. We store the retrieved row data (Timestamp3, DestPhone) in another dictionary (called "recipients") keyed by "Msgid" for later use.
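The "@.SMS" search is a plain byte search. This Python sketch (hypothetical names) also shows why a hex viewer renders the first two bytes as "@.": decoded as UTF-16-LE they form the single character U+0140 rather than "@".

```python
import re

# The "@.SMS" marker bytes quoted earlier. Decoded as UTF-16-LE they are
# U+0140 followed by "SMS" and a null terminator.
RECIPIENT_MARKER = b"\x40\x01\x53\x00\x4d\x00\x53\x00\x00\x00"


def recipient_hits(data):
    """Return the offset of every Recipient-table marker in a raw store.vol buffer."""
    return [m.start() for m in re.finditer(re.escape(RECIPIENT_MARKER), data)]
```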

Next, we search for "Message" table entries by searching for the "IPM.MMS" pattern and store the retrieved data (Timestamp3, Timestamp2, Phone, Flag, Size) in a dictionary (called "mmsdict") keyed by "Msgid". If we don't find a Phone number in the "Message" record (ie Sent MMS), we use the retrieved "Msgid" to obtain the "DestPhone" number from the "recipients" dictionary populated earlier.
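The phone number fallback for sent MMS boils down to a dictionary lookup. This hypothetical Python sketch assumes each "IPM.MMS" hit has already been parsed into a (Msgid, Timestamp3, Timestamp2, Phone, Flag, Size) tuple.

```python
def build_mmsdict(message_hits, recipients):
    """Build msgid -> (ts3, ts2, phone, flag, size), filling in phones for sent MMS.

    message_hits: iterable of (msgid, ts3, ts2, phone, flag, size) from "IPM.MMS" hits
    recipients: dict msgid -> (ts3, dest_phone) from the Recipient table
    """
    mmsdict = {}
    for msgid, ts3, ts2, phone, flag, size in message_hits:
        if not phone and msgid in recipients:
            # Sent MMS records carry no phone number, so borrow DestPhone
            phone = recipients[msgid][1]
        mmsdict[msgid] = (ts3, ts2, phone, flag, size)
    return mmsdict
```

Fed with the msgid 48 values from the sample output, this reproduces the "MMS sorted by msgid" tuple shown there.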

Finally, we can iterate through the "mmsdict" for each "Msgid" and retrieve the corresponding "attachments" info - there will usually be multiple attachments for each MMS dict entry.
The output will be printed to the command line (looks a bit cramped) and/or TSV file (optional).
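The final join and TSV output could look like this minimal Python 3 sketch (hypothetical helpers, columns as in the output below). It flattens each MMS into one row per attachment and sorts by Timestamp2. Because the timestamps are ISO 8601 strings, a plain string sort is chronological.

```python
import csv

HEADER = ["Timestamp2", "Msgid", "Timestamp3", "Phone", "Flag", "TotalSize",
          "Type", "Filesize", "Filename0", "Filename1", "Filename2"]


def final_rows(mmsdict, attachments):
    """mmsdict: msgid -> (ts3, ts2, phone, flag, size)
    attachments: msgid -> list of (filename0, filename1, filename2, type, size)"""
    rows = []
    for msgid, (ts3, ts2, phone, flag, total) in mmsdict.items():
        for f0, f1, f2, ftype, fsize in attachments.get(msgid, []):
            rows.append([ts2, msgid, ts3, phone, flag, total, ftype, fsize, f0, f1, f2])
    # Stable sort: attachments of one MMS keep their stored order
    rows.sort(key=lambda r: (r[0], r[1]))
    return rows


def write_tsv(rows, path):
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(HEADER)
        writer.writerows(rows)
```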

Here's a redacted sample of the wp8-1-mms.py command line output. For brevity, we've edited the output to highlight the corporate fatcat/toilet monkey MMS message sent earlier. The command line output is pretty cramped so we recommend outputting to TSV.

c:\python27\python.exe wp8-1-mms.py -s store.vol -o store-op.tsv
Running wp8-1-mms.py v2015-11-14

Opening store.vol...

Processing Attachment table ...
57 Attachment "<cid" hits found in store.vol

0 Attachment "<\d+>" hits found in store.vol

57 Total Attachment hits found in store.vol

Attachment ASIZE ERROR! at offset 0x12004c ... skipping this hit

Attachments sorted by msgid ...
===================================
[REDACTED]
No. Attachments = 4
msgid = 48 : (u'<cidText_0>', u'Text_0.txt', u'Text_0.txt', u'text/plain', 52)
msgid = 48 : (u'<cidSmil>', u'Smil.txt', u'Smil.txt', u'application/smil', 742)
msgid = 48 : (u'<cidImage_FOT66CB.jpg>', u'FOT66CB.jpg', u'Monkey throwing poo by SheriffBean on DeviantArt.jpg', u'image/jpeg', 37397)
msgid = 48 : (u'<cidImage_FOT9241.jpg>', u'FOT9241.jpg', u'Lolcats Funny Pictures Of Cats With Captions.jpg', u'image/jpeg', 34529)

[REDACTED]

Processed/Stored 56 out of 57 Attachment hits

Processing Recipient table ...
55 Recipient hits found in store.vol

[REDACTED]

Recipients sorted by msgid ...
===================================
[REDACTED]
msgid = 42 : ('2015-11-11T19:36:25', u'+12345678900')
msgid = 48 : ('2015-11-15T21:47:53', u'+12345678900')
msgid = 49 : ('2015-11-15T21:51:32', u'+12345678900')

[REDACTED]

Processed/Stored 54 out of 55 Recipient hits

Processing Message table ...
19 IPM.MMS Message hits found in store.vol

MMS sorted by msgid ...
===================================
[REDACTED]
msgid = 48 : ('2015-11-15T21:47:53', '2015-11-15T21:47:52', u'+12345678900', 33, 72720)

[REDACTED]

Processed/Stored 19 out of 19 Message hits

Printing finalized table sorted by Timestamp2 ...
===================================================
Timestamp2  Msgid   Timestamp3  Phone   Flag    TotalSize   Type    Filesize    Filename0   Filename1   Filename2
[REDACTED]
2015-11-15T21:47:52 48  2015-11-15T21:47:53 +12345678900    33  72720   text/plain  52  <cidText_0> Text_0.txt  Text_0.txt

2015-11-15T21:47:52 48  2015-11-15T21:47:53 +12345678900    33  72720   application/smil    742 <cidSmil>   Smil.txt    Smil.txt

2015-11-15T21:47:52 48  2015-11-15T21:47:53 +12345678900    33  72720   image/jpeg  37397   <cidImage_FOT66CB.jpg>  FOT66CB.jpg Monkey throwing poo by SheriffBean on DeviantArt.jpg

2015-11-15T21:47:52 48  2015-11-15T21:47:53 +12345678900    33  72720   image/jpeg  34529   <cidImage_FOT9241.jpg>  FOT9241.jpg Lolcats Funny Pictures Of Cats With Captions.jpg

[REDACTED]
Finished processing store.vol ... Exiting ...

Looking at the "finalized table" results at the end, we can see that each attachment's Timestamp2 values are all equal to 2015-11-15T21:47:52. This does not correspond with our earlier results from wp8-1-mms-filesort.py (where the JPEG files had different "Last Modified" times to the "smil" and text files). Perhaps the Timestamp2 values in store.vol were written before the actual .dat files were written around the 52-53 second boundary?
Anyway, this shows that there can be some discrepancy between the .dat files "Last Modified" times and the Timestamp2 values recorded in "store.vol". Hence the need for a meat based decision maker!
We can also see that we can use the Msgid (eg 48) to group common MMS attachments together.
Timestamp3 is a common value to both "Message" and "Recipient" tables and seems to be the timestamp quoted for sent/received MMS by commercial forensic tools.
The Flag value of 33 indicates that this is a Sent MMS.
The TotalSize figure of 72720 equals the sum of each attachment's Filesize (52 + 742 + 37397 + 34529 = 72720).
Filename1 is the alias used for each attached file in the "smil" layout file (eg FOT66CB.jpg, FOT9241.jpg).
We can see that for saved pictures which were then sent via MMS, the original filename appears in Filename2 (eg "Monkey throwing poo by SheriffBean on DeviantArt.jpg").
By using the Filename1 alias, Timestamps and individual file sizes, we should be able to match these store.vol results with the previous .dat results.
That is, there are 4 files listed from store.vol (Text_0.txt, Smil.txt, FOT66CB.jpg, FOT9241.jpg) which have corresponding matches in the HTML output table produced by wp8-1-mms-filesort.py.
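The matching step could be partly automated as a suggestion list for the analyst. This hypothetical Python sketch pairs each store.vol attachment with .dat files of the same size whose Last Modified time is within a tolerance (default 2 seconds, per the 1-2 second drift mentioned earlier) of Timestamp2. It assumes both sets of timestamps are in the same time zone. It also assumes sizes match exactly, which may not hold for every file type. The .dat paths in the usage test are invented.

```python
from datetime import datetime


def match_attachments(attachment_rows, dat_rows, tolerance_s=2):
    """Suggest .dat candidates for each store.vol attachment.

    attachment_rows: list of (msgid, timestamp2, filename1, size) tuples
    dat_rows: list of (modified, path, size) tuples from the .dat listing
    Timestamps are "YYYY-MM-DDTHH:MM:SS" strings in the same time zone.
    """
    fmt = "%Y-%m-%dT%H:%M:%S"
    suggestions = {}
    for msgid, ts2, filename1, size in attachment_rows:
        t2 = datetime.strptime(ts2, fmt)
        suggestions[(msgid, filename1)] = [
            path for modified, path, dsize in dat_rows
            if dsize == size
            and abs((datetime.strptime(modified, fmt) - t2).total_seconds()) <= tolerance_s
        ]
    return suggestions
```

The result is only a shortlist. The meat based decision maker still makes the call.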

For a given attachment, you might not find a file with the Filename2 name on a Windows Phone 8.10 device/SD card as it could be a received file and the Filename2 string was actually sourced from the sender.
In this case, the file will only exist as a .dat file (assuming it was not re-saved locally after reception).
Alternatively, after sending an MMS attachment, the user may delete it from the phone or it might have been sourced from an SD card. For a file stored on an SD card/another device, we should be able to use the SHA256 hash calculated from the relevant .dat file to help confirm the external file's identity.
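Comparing an external file against a .dat file by hash is straightforward. This hypothetical Python sketch hashes in chunks so that large videos don't have to fit in memory. It outputs uppercase hex to match the TSV output above.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Return the uppercase SHA256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def same_content(dat_path, candidate_path):
    """True if the two files have identical SHA256 hashes."""
    return sha256_file(dat_path) == sha256_file(candidate_path)
```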

And here is the more conveniently formatted TSV output file contents from wp8-1-mms.py ...

Timestamp2    Msgid    Timestamp3    Phone    Flag    TotalSize    Type    Filesize    Filename0    Filename1    Filename2
[REDACTED]
2015-11-15T21:47:52    48    2015-11-15T21:47:53    +12345678900    33    72720    text/plain    52    <cidText_0>    Text_0.txt    Text_0.txt
2015-11-15T21:47:52    48    2015-11-15T21:47:53    +12345678900    33    72720    application/smil    742    <cidSmil>    Smil.txt    Smil.txt
2015-11-15T21:47:52    48    2015-11-15T21:47:53    +12345678900    33    72720    image/jpeg    37397    <cidImage_FOT66CB.jpg>    FOT66CB.jpg    Monkey throwing poo by SheriffBean on DeviantArt.jpg
2015-11-15T21:47:52    48    2015-11-15T21:47:53    +12345678900    33    72720    image/jpeg    34529    <cidImage_FOT9241.jpg>    FOT9241.jpg    Lolcats Funny Pictures Of Cats With Captions.jpg

Blogger formatting strikes again! Basically the TSV output is the same as the command line output but it's easier to import into a spreadsheet.

For shiggles, the wp8-1-mms.py script was run against a Lumia 520 / Windows Phone 8.0 store.vol but it seems that the store.vol offsets used are different and so the script did not work as intended.
However, it is ass-umed that this script will work with other Windows Phone 8.10 devices, because Monkey has to believe he didn't waste his time on some one-off scripts ... grrr!

Also please note that on Windows Phone 8 devices, the displayed Messaging timestamps do not list the seconds (only hours/minutes). So while we can say that Timestamp3 is accurate to the minute, we cannot definitively claim its accuracy in seconds.

Some Final Thoughts

The wp8-1-mms.py store.vol script relies on searching the table records for unique markers (eg "IPM.MMS", "@.SMS", "<cid") and then reading/storing the surrounding field values. That's why we can extract data without knowing the entire structure of each database record.
We could have used a third party library to query the store.vol database directly but this means users would have to install that library on their analysis PCs (which are usually isolated from the Internet). For those interested, Jon Glass has blogged about Python coding using Joachim Metz's libesedb and Alberto Solino's Impacket ESE libraries here.

Be aware there may be more .dat files than MMS attachment entries in "Data:\SharedData\Comms\Unistore\data\7" (eg Received email attachments and/or Draft text). Not every .dat file may have a corresponding MMS.

GPS and other EXIF metadata can be present on sent/received images/videos (depending on the sending phone's settings and/or if the original file had embedded metadata). This can help an analyst decide if a picture was originally taken with the target device and/or the time/location of the picture.

This post only looked at allocated MMS - deleted MMS is an area for further research. Upon MMS deletion, "Message" / "Attachment" / "Recipient" entries should be deleted/overwritten from "store.vol". However, ESE .log transaction files and/or pagefile.sys may still contain enough information from deleted MMS to recreate the "store.vol" records. Recovering the .dat file content and linking it to an MMS transaction would be more complicated, however.

Sending a Location (via Messaging) does not utilize the "Attachment" table so these cannot be retrieved by the scripts in this post.
For Location messaging, an "IPM.SMStext" record entry is made in the "Message" table and it has a Windows Phone URL string in the content column ("0037001f"). Analysts can browse to that URL and view a map centered on the sent Location. This map also has a timestamp and what appears to be an accuracy radius.
The URL format looks like:
http://www.windowsphone.com/l1/ZZZ

Where ZZZ = 13 character random code. Both of our test samples had 13 character codes starting with "CI" but your mileage may vary.
Theoretically, it should be possible to scan a store.vol (or pagefile.sys) for these types of URL and if found, they can tie a device to a specific location and time (assuming you can prove they were sent SMS).
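A scan for these URLs might look like the following Python sketch (hypothetical helper name). The code is treated as 13 alphanumeric characters. That is an assumption based on only two samples, so the character class may need widening. The sketch checks ASCII/UTF-8 text and UTF-16-LE text at both byte alignments.

```python
import re

# Assumption: the 13-character code is alphanumeric (only two samples seen)
URL_RE = re.compile(r"http://www\.windowsphone\.com/l1/[A-Za-z0-9]{13}")


def find_location_urls(data):
    """Return candidate Location URLs stored as ASCII/UTF-8 or UTF-16-LE text."""
    # latin-1 maps every byte to one character, so ASCII text survives intact
    found = set(URL_RE.findall(data.decode("latin-1")))
    # UTF-16-LE text may start on an even or odd offset, so try both alignments
    for start in (0, 1):
        chunk = data[start:]
        chunk = chunk[: len(chunk) - len(chunk) % 2]
        found.update(URL_RE.findall(chunk.decode("utf-16-le", errors="replace")))
    return found
```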

And so finishes another glorious Windows Phone 8.10 post! For some reason, the phrases "One trick monkey" and "One off scripts" seem to be reverberating in monkey's caffeine addled brain ... Good! ;)



Monday, 5 October 2015

Finding Geo

Monkey, just keep swimming through the WinPhone data ... ya clown!

UPDATE 6OCT2015: Edited FindMyPhone and Multimedia sections + added suspected main Location setting Registry location.

A couple of recent cases had this monkey investigating how Windows Phone 8.10 stores geolocation data on a Nokia Lumia 530.
There does not appear to be much forensic documentation regarding this, so this post is going to be a pretty voluminous / potentially narcoleptic episode of squirrel chasing (without any neat scripts to run at the end either). Despite the length of this post, monkey gets the sneaking suspicion that there is more to be discovered. I guess we have to start somewhere ...

Carrier-locked versions of the 530 can be picked up for as little as $50 in Monkeytown so they could be more popular than you'd initially suspect. They also come bundled with Windows Phone 8.10 by default.
Downloading of the 4 GB capacity devices was done via eMMC read using the Z3X-Pro flasher box and took approximately 90 minutes per device. Note: The soldering points for these are not for the banana fisted. The points are so tiny that this monkey needed his big boy soldering pants AND special adult assistance (Thanks Boss Rob!).

For the 530, there are 3 potential storage areas for geolocation data – the Partition 26 (P26) "MainOS" NTFS partition, the Partition 27 (P27) "Data" NTFS partition and the removable SD Card.
The test data came from 2 well used 530 devices (Devices A and B), a factory fresh 530 (Device C) and a simulated test device (Device D). While all four were 8.10 (MajorVersion.MinorVersion) devices, their SYSTEM hive's \Versions\BuildNumber values were not all the same (probably due to being configured for different service providers).
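An X-Ways Forensics (XWF) "simultaneous search" for likely geolocation keywords was initially performed (eg "latitude", "lat", "GPS", "GNSS", "degree"). Subsequently, a regex search was also performed using some likely latitude regular expressions (eg "-1[2-5]." for "-12." to "-15." and "-1[2-5]°" for "-12°" to "-15°"). Note that an unescaped "." matches any character, so the first expression will also hit strings such as "-12,"; use "-1[2-5]\." if only a literal decimal point is wanted.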
Tip: To type the degree symbol, you hold down the Alt key as you type 248 on the numeric keypad (with numlock ON).
For accuracy purposes, it might also help to know that 1 nautical mile (1852 m) is approximately 1 minute of latitude (there are 60 minutes to a degree).
Thanks to a Brian Moran (@brianjmoran) tip, we found some information on commonly used latitude/longitude formats here.
To reduce the number of searches, it was also assumed that any textual latitudes will have the corresponding longitude close by. Ideally, there will also be an associated timestamp so we can say that at a certain time, the device was at location X. The XWF regex search technique should find any plain text (UTF8, UTF16-LE) strings which contain relevant latitude information. However, it will not find any latitudes stored as binary data (ie floats/doubles).
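For anyone wanting to try the same idea outside XWF, the Python 3 sketch below searches raw bytes for the example latitude range (-12.x to -15.x) as both UTF-8/ASCII and UTF-16LE text. The UTF-16LE pattern simply places a 0x00 byte after each ASCII character, and the dot is escaped so it only matches a literal ".". The latitude range and the minimum of 3 decimal places are assumptions for illustration, and like the XWF search it will not find binary floats/doubles.

```python
import re

# Latitudes -12.xxx to -15.xxx stored as UTF-8/ASCII text (at least 3 decimals assumed).
UTF8_LAT = re.compile(rb"-1[2-5]\.\d{3,}")
# The same text in UTF-16LE: every ASCII character is followed by a 0x00 byte.
UTF16_LAT = re.compile(rb"-\x001\x00[2-5]\x00\.\x00(?:\d\x00){3,}")


def find_latitudes(data):
    """Return sorted (offset, encoding, text) hits for the latitude patterns."""
    hits = []
    for encoding, pattern in (("utf-8", UTF8_LAT), ("utf-16-le", UTF16_LAT)):
        for match in pattern.finditer(data):
            hits.append((match.start(), encoding, match.group().decode(encoding)))
    return sorted(hits)
```

Each hit's offset can then be checked for a nearby longitude and timestamp, per the assumption above.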

The main Location setting (for this phone model) is suspected to be at:
P26:\Windows\system32\config\SOFTWARE\OEM\Nokia\GPS\LocationService which was set to 1 when the main Location setting was enabled and set to 0 when disabled. We're not sure if this is the direct cause or a secondary result of the Location setting. Presumably, this location will vary with non-Nokia devices.

The proliferation of web based location/map services and the availability/storage capacity of P27:\pagefile.sys (which contains the swapped out contents of RAM) has resulted in a large number of readable lat/long pairs. Connecting a lat/long pair to specific phone functionality and/or proving a user’s direct knowledge was/remains a challenge.

To assist with this we have organized the lat/long information in this post into the following categories:
- ObservationLogWP8 (crowd sourced location logs)
- FindMyPhone (location tracking of device)
- Default Internet Explorer Browser
- Cortana (personal assistant)
- Multimedia metadata (from device pictures/video)
- WP8 Application data
- Registry 

ObservationLogWP8 (crowd sourced location logs)

This appears to be related to Microsoft’s crowd-sourcing efforts to survey/report WiFi and cell tower information for device location (see here and here). Various UTF8 and UTF16-LE encoded XML fields belonging to a parent "ObservationLogWP8" XML element were found in Device A’s P27:/pagefile.sys. These fields included a timestamp, location, WiFi and Cell Tower signal information. Not all instances of these had latitude/longitude information, some only had timestamped WiFi and cell information (they may have been related to requests for location).
The string "ObservationLogWP8" was also found in LocationCrowdsource.dll, which also existed on a Nokia 520 running Windows Phone 8.0, suggesting this functionality is not new to Windows Phone 8.10.
From the dates found in test data, it is suspected the initial phone setup is one event that triggers this data being recorded. According to the FAQ mentioned previously, enabled apps can also request the device location which could result in ObservationLogWP8 data being generated.

One of the more complete "ObservationLogWP8" XML elements (from P27:\pagefile.sys) looked like:
<Env Version="1.0"><Body Type="ObservationLogWP8"><LocationData><RequestHeader><Timestamp>2015-12-31T01:23:45.678+12:34</Timestamp><Authorization /><TrackingId>378130cb-e97f-4558-a3ac-123456789ABC </TrackingId><ApplicationId>d002970e-345b-409f-9e22-b360eb83f641</ApplicationId><DeviceProfile ExtendedDeviceInfo="NOKIA/Lumia 530" OSVersion="8.10.14234.WPB_CXE_R1(wpbldlab).20150123-1722" LFVersion="2.0" Platform="" ClientGuid="00000000-0000-0000-0000-000000000000" DeviceType="WP8" DeviceId="d002970e-345b-409f-9e22-123456789ABC" /></RequestHeader><LocationStamps><LocationStamp ts="2015-12-31T01:23:45.678+12:34"><Loc la="-XX.12345" lo="YYY.12345" al="12.00000" spd="5.25000" hed="180.50000" hacc="3" hdop="0.80000" vdop="0.80000" herralong="2" haxis="2" /><CellTowers ts="2015-12-31T01:23:45.678+12:34"><Umts7 mcc="505" mnc="1" lac="12345" ucid="123456789" uarfcn="1234" psc="12" rscp="-100" ecno="-11" /></CellTowers><WifiPoints ts="2015-12-31T01:23:45.678+12:34"><Wifi7 bssid="AA:BB:CC:11:22:33" rssi="-95" /><Wifi7 bssid="AA:BB:CC:11:22:33" rssi="-86" /><Wifi7 bssid="AA:BB:CC:11:22:33" rssi="-95" /> /></WifiPoints></LocationStamp></LocationStamps></LocationData></Body></Env>

Note1: mcc = Mobile Country Code (ie country) and mnc = Mobile Network Code (ie carrier).
Note2: Note the ApplicationId GUID field in the data which could potentially connect an app to this location data.

For privacy reasons, the above example had the following information modified:
-    Timestamps
-    Various GUIDs (except ApplicationId and ClientGuid)
-    bssid (MAC address of WiFi access points)
-    CellTowers
-    Location/Movement information
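If a complete ObservationLogWP8 element can be carved out (and decoded to text first if it was UTF16-LE), a few lines of Python 3 can tabulate it. The sketch below uses the standard library's xml.etree.ElementTree. The sample string is a trimmed, hypothetical version of the example above with made-up numbers in place of the redacted values, and a truncated carve will raise ET.ParseError.

```python
import xml.etree.ElementTree as ET


def parse_observation_log(xml_text):
    """Return one dict per LocationStamp in an ObservationLogWP8 <Env> element."""
    root = ET.fromstring(xml_text)
    app_id = root.findtext(".//ApplicationId")  # may link the data to an app
    rows = []
    for stamp in root.iter("LocationStamp"):
        row = {"ts": stamp.get("ts"), "app_id": app_id, "lat": None, "lon": None,
               "cells": [], "bssids": []}
        loc = stamp.find("Loc")
        if loc is not None:  # some stamps only hold WiFi/cell observations
            row["lat"] = float(loc.get("la"))
            row["lon"] = float(loc.get("lo"))
        cells = stamp.find("CellTowers")
        if cells is not None:
            row["cells"] = [dict(cell.attrib) for cell in cells]  # eg mcc, mnc, lac
        wifi = stamp.find("WifiPoints")
        if wifi is not None:
            row["bssids"] = [point.get("bssid") for point in wifi]
        rows.append(row)
    return rows


# Hypothetical, trimmed sample (coordinates invented for illustration).
SAMPLE = (
    '<Env Version="1.0"><Body Type="ObservationLogWP8"><LocationData>'
    '<RequestHeader><ApplicationId>d002970e-345b-409f-9e22-b360eb83f641</ApplicationId>'
    '</RequestHeader><LocationStamps><LocationStamp ts="2015-12-31T01:23:45.678+12:34">'
    '<Loc la="-12.34567" lo="123.45678" hacc="3" />'
    '<CellTowers><Umts7 mcc="505" mnc="1" lac="12345" /></CellTowers>'
    '<WifiPoints><Wifi7 bssid="AA:BB:CC:11:22:33" rssi="-95" /></WifiPoints>'
    '</LocationStamp></LocationStamps></LocationData></Body></Env>'
)
```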

If the main Location setting is OFF, ObservationLogWP8 data should not be written (in theory). This was possibly observed when we found that some devices contained hits for "ObservationLogWP8" in LocationCrowdsource.dll but no ObservationLogWP8 XML elements. Also, if an individual application does not have its Location capability enabled, the ObservationLogWP8 data should not be present for that app.
From the Microsoft "Personal Wi-Fi Access Point Opt-Out" section here:
"If you have a Wi-Fi access point or router and you wish to exclude it from Microsoft’s location positioning database ...  you can submit the MAC address to Microsoft’s block list"
This means that by default (on a Location enabled device), the device gets to snoop around and map any/all WiFi access points it can. So the longer a phone is used (assuming the phone moves around), the more of these ObservationLogWP8 XML instances you'd expect to see.

FindMyPhone (location tracking of device)

Windows Phone 8.10 has the capability to ring/lock/erase/locate a registered device from this website. This capability is not enabled by default and requires the user to register their device and phone number with their Microsoft account. The FindMyPhone settings menu has a checkbox for "Save my phone’s location periodically and before the battery runs out to make it easier to find". This is OFF by default. According to the (March 2014) Windows Phone 8 Privacy Statement:
"the location of your phone will be sent periodically to your online account at the My Windows Phone page". It "only stores the last known location of your phone. When a new location is sent, it replaces the previously stored location".
There were various FindMyPhone P26:\Windows\system32\config\SOFTWARE registry entries, FindMyPhone DLL libraries and a FindMyPhone executable present in our devices. The FindMyPhoneRuntimeDll.dll contains what appears to be a print format string for a "MyPhoneOperation" data element. Searching for "MyPhoneOperation" returned hits in Device A at
P27:\Users\WPNETWORK\APPDATA\Local\dcpsvc\StagingFiles\CssV1_1\*FOLDER_GUID*\*FILENAME_GUID*.csd, a .csd file which contained a timestamp, latitude, longitude and possibly altitude information. We are unsure of the significance of the GUIDs in the directory and file names as they varied between phones. For Device A (with FindMyPhone enabled), there were multiple directories and .csd files, with two .csd files containing geolocation information. For Device D (with FindMyPhone enabled), there were multiple directories and .csd files but only one .csd contained geolocation information.
Devices B and C did not store this information potentially indicating that the FindMyPhone functionality was not enabled on those devices. The found XML string looked like:
<MyPhoneOperation>    <UpdateLocation>    <Location>(2015-12-31 01:23:45Z) -XX.123456,YYY.1234,ZZ.000000</Location>    </UpdateLocation>    </MyPhoneOperation>

Where XX is latitude and YYY is longitude. ZZ is suspected to be altitude. For Device A, the lat/long listed was about 400m W of the suspected location so the third parameter is probably not accuracy. Note the differences in precision (decimal places) between latitude and longitude. So if your device does contain a FindMyPhone location, the accuracy of the position can vary.
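To record both the values and their precision when parsing these strings, something like the Python 3 sketch below could be used. The regex assumes the exact "(YYYY-MM-DD HH:MM:SSZ) lat,long,third" layout shown above, and the third value is returned untouched because its meaning (suspected altitude) is unconfirmed. The resolution helper relies on 1 minute of latitude being roughly 1 nautical mile (1852 m), ie roughly 111 km per degree; a degree of longitude shrinks by the cosine of the latitude, so the helper scales for that when given a latitude. All function names are hypothetical.

```python
import math
import re
from datetime import datetime, timezone

LOCATION_RE = re.compile(
    r"<Location>\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z\)\s*"
    r"(-?\d+\.\d+),(-?\d+\.\d+),(-?\d+\.\d+)</Location>"
)
METRES_PER_DEGREE_LAT = 60 * 1852  # 60 nautical miles per degree (approximate)


def decimals(number_text):
    """Number of digits after the decimal point in a numeric string."""
    return len(number_text.split(".")[1])


def resolution_m(places, latitude=None):
    """Approximate ground distance represented by the last decimal place.

    Without a latitude this is the north-south (latitude) resolution; with one,
    it is scaled by cos(latitude) to give the east-west (longitude) resolution.
    """
    metres = METRES_PER_DEGREE_LAT / 10 ** places
    if latitude is not None:
        metres *= math.cos(math.radians(latitude))
    return metres


def parse_update_location(text):
    match = LOCATION_RE.search(text)
    if match is None:
        return None
    when, lat, lon, third = match.groups()
    return {
        "utc": datetime.strptime(when, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc),
        "lat": float(lat),
        "lon": float(lon),
        "third_value": float(third),  # suspected altitude, unconfirmed
        "lat_resolution_m": resolution_m(decimals(lat)),
        "lon_resolution_m": resolution_m(decimals(lon), float(lat)),
    }
```

For example, a 6 decimal latitude resolves to about 0.1 m, whereas a 4 decimal longitude at around 12 degrees South only resolves to roughly 11 m.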

Two configurable FindMyPhone Registry settings are located at:
P26:\Windows\system32\config\SOFTWARE\Microsoft\Settings\FindMyPhone\LocationSyncEnabled
(which was set to 1 when "Save my phone's location periodically and before the battery runs out to make it easier to find" is checked. 0 if unchecked (by default)) and
P26:\Windows\system32\config\SOFTWARE\Microsoft\Settings\FindMyPhone\MpnEnabled
(which was set to 1 when "Always use push notifications (not SMS) to send commands and apps to my phone" is checked. 0 if unchecked (by default)).
There were also multiple hits for "FindMyPhone" in various GUID named sub-keys under:
P26:\Windows\system32\config\SOFTWARE\Microsoft\WPTaskScheduler\{*GUID1*}
which suggests the regular running of a process/processes to report back the device’s current location. For example, the "Schedule" entry value under the GUID sub-key contains the strings "ProcessFindMyPhoneCommand", "c:\Programs\FindMyPhone\ShellCommandDispatcher.exe ProcessFindMyPhoneCommand". This was also present on a factory fresh install.
There was also a "Schedule" entry under P26:\Windows\system32\config\SOFTWARE\Microsoft\WPTaskScheduler\{*GUID2*} which contained the strings "FMPLastLocationSyncSchedule", "c:\Programs\FindMyPhone\ShellCommandDispatcher.exe SyncLocation". This one was NOT present on a factory fresh install with FindMyPhone disabled.

Default Internet Explorer Browser

The default Internet Explorer browser provides some potential geolocation data via cookies, webcache and the "GetLocationUsingFingerprintResponse" browser function. The P26:\Windows\system32\config\SOFTWARE\Microsoft\Internet Explorer\Version registry entry value was set to 11.0.0.0 for all test devices.
The browser also has an "Allow access to my location" setting which should affect how much location information is stored/accessed.
Cookies which contained (URL/percent encoded) textual latitude and longitude information were located in randomly named .txt files in P27:\Users\DefApps\APPDATA\INTERNETEXPLORER\INetCookies. The availability and format will vary depending on the website requesting the user’s location and the device's Location settings. Using NirSoft's ESEDatabaseView to view the P27:\Users\DefApps\APPDATA\Local\Microsoft\Windows\WebCache\WebCacheV01.dat tables (eg table "Container_21") can help link a URL to the randomly named cookie file.
URLs with latitude/longitude information were also found in WebCache log files such as P27:\Users\DefApps\APPDATA\Local\Microsoft\Windows\WebCache\V01tmp.log
For example, a (redacted) GoogleMaps directions URL:
http://maps.googleapis.com/maps/api/directions/json?origin=-XX.1234567891234,YYY.123456789123&waypoints=&destination=-XX.1234,YYY.123&mode=d&units=metric&language=en&sensor=true

Note1: XX = latitude and YYY = longitude.
Note2: Notice the precision (number of decimal places) of the latitude differs from the longitude. Similar entries were also observed in WebcacheV01.dat which makes sense if the .log files are being used as transaction logs for the WebcacheV01.dat ESE database.
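Pulling the origin/destination pairs out of such URLs (including percent-encoded ones like those seen in the cookies) is straightforward with Python 3's urllib.parse, as sketched below. The function name is hypothetical, it assumes "lat,long" values as in the example URL, and the coordinates in the test are made up.

```python
from urllib.parse import parse_qs, urlsplit


def directions_endpoints(url):
    """Return {"origin": (lat, lon), "destination": (lat, lon)} from a directions URL."""
    # parse_qs also undoes percent-encoding, eg "%2C" becomes ","
    query = parse_qs(urlsplit(url).query)
    endpoints = {}
    for key in ("origin", "destination"):
        if key in query:
            lat, lon = query[key][0].split(",")
            endpoints[key] = (float(lat), float(lon))
    return endpoints
```

Keeping the original strings alongside the floats is worthwhile, since (as noted above) the number of decimal places differs between the two fields.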

The "GetLocationUsingFingerprintResponse" browser function appears to be used by Internet Explorer to submit various values (eg cell tower/WiFi signal strengths) to a web based service. The service then sends back the (estimated) device location. Various hits for "GetLocationUsingFingerprintResponse" with close by Latitude, Longitude, Altitude, ServerUtcTime and RadialUncertainty were found in P27:\pagefile.sys.
For example:
<GetLocationUsingFingerprintResponse xmlns="http://inference.location.live.com" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><GetLocationUsingFingerprintResult><ResponseStatus>Success</ResponseStatus><LocationResult><ResolverStatus Status="Success" Source="Internal"/><ResolvedPosition Latitude="-XX.123456" Longitude="YYY.123456" Altitude="0"/><RadialUncertainty>2246</RadialUncertainty><TileResult/><TrackingId>7e8381e9-106c-45f2-8af9-123456789ABC</TrackingId></LocationResult><ExtendedV21Result CrowdSourcingLevel="High" ServerUtcTime="2015-12-31T01:23:45.1234567Z" CollectionType="Wifi|Cell" InferenceType="Wifi|Cell"/></GetLocationUsingFingerprintResult></GetLocationUsingFingerprintResponse>
For further information on "GetLocationUsingFingerprintResponse", see the "Post Script" section in Rudy Huyn’s blog post (in French but Chrome translates it OK-ish).
Also see Chad Tilbury’s (@chadtilbury) series of posts on browser geolocation forensics here.
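Because the GetLocationUsingFingerprintResponse XML declares a default namespace, ElementTree lookups need a namespace map. A minimal Python 3 sketch, assuming a complete (non-truncated) element and using a trimmed copy of the example above with made-up coordinates. The units of RadialUncertainty are not stated in the data, so it is returned as a bare number, and ServerUtcTime is left as a string because its 7 fractional digits are more than Python's datetime holds.

```python
import xml.etree.ElementTree as ET

NS = {"inf": "http://inference.location.live.com"}


def parse_fingerprint_response(xml_text):
    root = ET.fromstring(xml_text)
    position = root.find(".//inf:ResolvedPosition", NS)
    extended = root.find(".//inf:ExtendedV21Result", NS)
    return {
        # Attributes are un-prefixed, so they are not in the default namespace.
        "lat": float(position.get("Latitude")),
        "lon": float(position.get("Longitude")),
        "radial_uncertainty": int(root.findtext(".//inf:RadialUncertainty", namespaces=NS)),
        "server_utc": extended.get("ServerUtcTime"),
        "inference_type": extended.get("InferenceType"),
    }


SAMPLE = (
    '<GetLocationUsingFingerprintResponse xmlns="http://inference.location.live.com">'
    "<GetLocationUsingFingerprintResult><ResponseStatus>Success</ResponseStatus><LocationResult>"
    '<ResolvedPosition Latitude="-12.123456" Longitude="123.123456" Altitude="0"/>'
    "<RadialUncertainty>2246</RadialUncertainty></LocationResult>"
    '<ExtendedV21Result CrowdSourcingLevel="High" ServerUtcTime="2015-12-31T01:23:45.1234567Z" '
    'CollectionType="Wifi|Cell" InferenceType="Wifi|Cell"/>'
    "</GetLocationUsingFingerprintResult></GetLocationUsingFingerprintResponse>"
)
```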

Cortana (personal assistant)

The Alpha version of Cortana comes installed with the 530. Search for "What does the fox say?" or "Who is Siri?" and prepare to be semi-amused. It appears Cortana requires an active Internet connection so it may not always be enabled by the user if they're on a cheap pre-paid plan. If not enabled, searches are supposed to use Bing instead. Cortana can be voice activated and location aware so you can set it to remind you "When I get home, remind me to spank the monkey". Such behaviour (the reminding, not the spanking) means there could be a register of locations which may help prove the user had prior knowledge of a location.

Most of the information in this section was sourced from Brent Muir's (@bsmuir) recent Windows 10 forensics post here. Not all of the functionality mentioned in that post was applicable to Windows Phone 8.10 but it was still a great resource to have (Thanks to Brent for sharing!).

P27:\Users\WPNETWORK\APPDATA\Local\Graph\WP8LoggedInUser\Me\00000000.ttl
contained various Latitude and Longitude strings (preceded by the string "place."). According to Brent, these types of file contain searched/favourite locations. Two other data sets had .ttl files but they did not contain any latitude/longitude data. It is likely that the recorded position data is dependent on the main Location setting being ON.
An example 00000000.ttl entry looked like:
p:me/place.-XX.123456_YYY.123456 <http://platform.bing.com/dateAccessed> "1601-01-01T00:00:00.000Z"^^s:Date ;
  a s:Place ;
  b:preferences/favorites.businessType "" ;
  b:preferences/favorites.entityId "-XX.123456_YYY.123456" ;
  b:preferences/favorites.entityType "http://schema.org/Place" ;
  b:preferences/favorites.entityUrl "local_vdpid:\"-7995539123\"" ;
  b:preferences/favorites.isBusiness false ;
  b:preferences/favorites.originalName "Some Place Banana-ish" ;
  s:ShowOnMap true ;
  s:address p:me/postalAddress.-XX.123456_YYY.123456 ;
  s:dateModified "2015-12-31T01:23:45.123Z"^^s:Date ;
  s:displayCoordinates "Point(-XX.123456 YYY.123456)"^^geo:wktLiteral ;
  s:geo "Point(0.000000 0.000000)"^^geo:wktLiteral ;
  s:mapZoomLevel 18 ;
  s:name "Some Place Banana-ish" ;
  s:phone "" .
Note: The "dateAccessed" value above has NOT been modified from the original value.

There were other instances of latitude/longitude recorded in this .ttl file but they did not have a timestamp. For example:
p:me/list.recent. -XX.123456_YYY.123456 a rdf:List ;
  rdf:first p:me/place.-XX.123456_YYY.123456 ;
  s:displayOrder 25 ;
  s:memberOf p:me/geoAnnotationCollection.recent .
which appears to be a recent search location.

Similarly, there were also latitude/longitude pairs observed in P27:\pagefile.sys preceded by "place.". For example:
<http://platform.bing.com/persons/me/place. -XX.123456 YYY.123456>
It is unknown if/how this example can be correlated to the previous Cortana .ttl examples but they do look similar.
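Both the .ttl "place.-XX_YYY" form and the pagefile "place. -XX YYY" form (which also appears in the grammar files below) can be caught by one regular expression. A minimal Python 3 sketch; the separator handling is based only on the examples in this post and the function name is hypothetical.

```python
import re

# "place." optionally followed by a space, then lat and long separated by "_" or " "
PLACE_RE = re.compile(r"place\.\s?(-?\d+\.\d+)[_ ](-?\d+\.\d+)")


def find_places(text):
    """Return (offset, lat, lon) for each "place." coordinate pair in the text."""
    return [(m.start(), float(m.group(1)), float(m.group(2))) for m in PLACE_RE.finditer(text)]
```

Matching pairs from the .ttl file and pagefile.sys could then be compared directly, which may help with the correlation question above.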

Grammar files are used by speech recognition to define input speech words/phrases. They could also be used to indicate prior knowledge of a location. For more information, refer to the MSDN reference on "Grammars for Windows Phone 8" here.
Location data was found in Device A grammar files located at P27:\SharedData\Speech\Grammars\0809\PointsOfInterestGrammar.cfp.txt and P27:\SharedData\Speech\Grammars\0809\PointsOfInterest2Grammar.cfp.txt
These grammar files did not exist in Device C (factory fresh phone).
The grammar files contained entries like:
"Monkeytown, BananaState":, http://platform.bing.com/persons/me/place. -XX.123456 YYY.123456”
"home":, http://platform.bing.com/persons/me/place. -XX.123456 YYY.123456
which was followed by a timestamp in the format:
31/12/2015 1:05:45 PM
This timestamp matched the file system File Created date (in UTC) for the parent .txt file.
Unlike Windows 10 for PC, there were no IndexedDB.edb or CortanaCoreDb.dat (geofence) databases found in our test data. However, this may be due to our test data not being set up with any geofences.

Multimedia metadata (from device pictures/video)

The file location, naming convention and metadata content of the multimedia created on our devices were consistent with Det. Cindy Murphy et al's SANS whitepaper on Windows Phone 8. That is, camera created media was found in the phone’s internal memory at P27:\Users\Public\Pictures\Camera Roll.
Additionally, under Device A’s SD Card’s \Pictures\Camera Roll\ directory there were various .mp4 videos whose filename started with WP and a datestamp (eg WP_20150101_001.mp4). Upon further inspection in XWF, there was "Xtra" metadata embedded in each video file which included an ISO timestamp (eg 2015-01-01T01:02:03Z) and location (eg -XX.3456+YYY.4567). This information was also visible using Phil Harvey’s ExifTool.
Also stored under the SD card's \Pictures\Camera Roll\ were various .jpg files whose filename started with WP and a datestamp (eg WP_20150101_002.jpg). Upon further inspection in XWF, there was "EXIF" metadata embedded in each file which included the Phone model (eg "Lumia 530"), Date Original & Date Digitized timestamps (eg 2015:01:01 01:02:03), Latitude (eg 12° 34’ 5.678” S) and Longitude (eg 123° 45’ 5.678” E).
Photos from a different 530 device (media stored in internal phone memory) had similar EXIF data but did not contain GPS Latitude/Longitude information. This is possibly because the user did not enable the main Location setting and/or did not enable "Use location info" in the "Photos" settings.
The following SOFTWARE hive entry value is suspected of configuring camera pictures/video with embedded location data:
P26:\Windows\system32\config\SOFTWARE\Microsoft\Photos\Shared\CameraSettings\EmbedLocation
This entry value was set to 2 for device multimedia files with embedded GPS metadata and it was set to 1 when the "Photos" ... "Use location info" setting was unchecked (no embedded GPS metadata). Presumably, the GPS data in EXIF also requires the main Location setting being enabled.
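Converting the two location formats above into decimal degrees makes them easier to compare with the other artifacts in this post. A minimal Python 3 sketch, assuming the video's "Xtra" location is a decimal-degree ISO 6709 style string like "-XX.3456+YYY.4567" (ISO 6709 also allows degree/minute forms, which this does not handle) and that the EXIF values have already been read out as degrees, minutes, seconds and a hemisphere letter (eg via ExifTool). The function names are hypothetical.

```python
import re

ISO6709_DECIMAL = re.compile(r"([+-]\d+(?:\.\d+)?)([+-]\d+(?:\.\d+)?)")


def iso6709_to_decimal(value):
    """'-12.3456+123.4567' -> (-12.3456, 123.4567); trailing altitude etc. ignored."""
    match = ISO6709_DECIMAL.match(value)
    if match is None:
        raise ValueError("unrecognised location string: %r" % value)
    return float(match.group(1)), float(match.group(2))


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """12, 34, 5.678, 'S' -> decimal degrees, negative for S and W."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value
```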

WP8 Application data

According to Microsoft, there are two location API libraries that Windows Phone 8.1 developers can use – the .NET Location API (for Windows Phone 7.1 and 8) and the Windows Phone Runtime Location API (new to Windows Phone 8 and 8.1).
The .NET Location API uses the System.Device.dll so whenever you see "System.Device.Location" you know the app was using the .NET Location API. See here for further details.
Alternatively, the Windows Phone Runtime Location API (see MSDN) uses the "Windows.Devices.Geolocation" namespace, so if you see that string near latitude/longitude information, the Windows Phone Runtime Location API is potentially being used.
Both of these libraries are designed so the app can call a "Get Location" function and let the library calculate a position from the available resources (eg Cell, WiFi, GPS). As the library and results of the call are loaded into RAM, pagefile.sys can also potentially contain these app location artifacts.

The Facebook application comes bundled with the 530. Device A was the only data set that had Facebook user data. Various Facebook related location data was found in
P27:\Users\DefApps\APPDATA\Local\Packages\Microsoft.MSFacebook_8wekyb3d8bbwe\LocalState\Log.txt

This file did not exist in Device C (factory fresh) which suggests that Facebook was not used on that device.
Anyway, here's what the Facebook location data looked like:
[Type: Miscellaneous]    [Severity: Unspecified] 2015-12-31T01:23:45: POST: places.setLastLocation?coords{"accuracy":0.0,"altitude":0.0,"altitudeAccuracy":0.0,"heading":0.0,"latitude":-XX.12345,"longitude":YYY.12345,"speed":0.0}
[Type: Navigation]       [Severity: Low]         2015-12-31T01:23:45: Navigated to: Facebook.Views.Places
[Type: Miscellaneous]    [Severity: Unspecified] 2015-12-31T01:23:48: MULTIQUERY GET: Places:SELECT page_id, name, description, latitude, longitude, checkin_count, display_subtext, pic_square, distance(latitude, longitude, "-XX.12345", "YYY.12345") FROM place WHERE distance(latitude, longitude, "-XX.12345", "YYY.12345") < 750 LIMIT 25|Pages:SELECT page_id, name, fan_count, were_here_count, location FROM page WHERE page_id IN (SELECT page_id from #Places)
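The setLastLocation entries are easy to tabulate because the coordinates are a JSON object. A minimal Python 3 sketch, assuming one log entry per line in the layout shown above (the function name is hypothetical). Note the log timestamps carry no timezone, so how they relate to UTC still needs to be tested.

```python
import json
import re

SET_LOCATION_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}): POST: places\.setLastLocation\?coords(\{.*\})"
)


def parse_facebook_log(lines):
    """Return (timestamp_text, latitude, longitude) for each setLastLocation entry."""
    results = []
    for line in lines:
        match = SET_LOCATION_RE.search(line)
        if match:
            coords = json.loads(match.group(2))
            results.append((match.group(1), coords["latitude"], coords["longitude"]))
    return results
```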
A potential Facebook "Last login" location was also found at:
P27:\Users\DefApps\APPDATA\Local\Packages\Microsoft.MSFacebook_8wekyb3d8bbwe\Settings\settings.dat and P27:\Users\DefApps\APPDATA\Local\Packages\Microsoft.MSFacebook_8wekyb3d8bbwe\Settings\settings.dat.LOG1
Both setting files contained a string similar to:
"AllowLocationAccess":true,"CacheExpirationInMinutes":10,"ConfirmExit":true,"CurrentUserName":"Randy Monkey","CurrentUserProfilePicUri":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xfa1/v/t1.0-1/p200x200/12345678_123456789ABCDEF12_123456789ABCDEF1234_n.jpg?oh=9b1356725bdbc1f30cb193068343123&oe=564DE360&__gda__=1438678992_c2802dc1e45c7998ca39c6f5dcf67484","GetHereNowEnabled":true,"IsVerfified":false,"HasLoggedIn":true,"LastLatitude":-XX.123456789012345,"LastLocationDate":"\/Date(1430562061234)\/","LastLongitude":YYY.12345678901234,"LockScreenState":1,
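The "\/Date(...)\/" value looks like the JSON date style produced by .NET serializers, ie milliseconds since 1 January 1970 UTC. Assuming that is the case here (we have not verified it against a known event), the Python 3 sketch below pulls the last location out of the settings string once it has been extracted as text, and converts the date. The function name is hypothetical, and any "+hhmm" offset inside the brackets is ignored.

```python
import re
from datetime import datetime, timedelta, timezone

LAT_RE = re.compile(r'"LastLatitude":(-?\d+(?:\.\d+)?)')
LON_RE = re.compile(r'"LastLongitude":(-?\d+(?:\.\d+)?)')
# Matches "\/Date(1430562061234)\/" (an optional +hhmm/-hhmm offset is ignored)
DATE_RE = re.compile(r'"LastLocationDate":"\\/Date\((-?\d+)(?:[+-]\d{4})?\)\\/"')
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_last_location(settings_text):
    """Return (lat, lon, utc_datetime) or None if any field is missing."""
    lat = LAT_RE.search(settings_text)
    lon = LON_RE.search(settings_text)
    date = DATE_RE.search(settings_text)
    if not (lat and lon and date):
        return None
    when = EPOCH + timedelta(milliseconds=int(date.group(1)))
    return float(lat.group(1)), float(lon.group(1)), when
```

Under that assumption, the (possibly modified) 1430562061234 value shown above converts to 2015-05-02 10:21:01.234 UTC.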
For Device A, it also appears that the Windows Store app "GPS Voice Navigation" was installed, run and then uninstalled. There were location artifacts left in P27:\pagefile.sys such as:
app://8E116909-CF87-4176-894F-F54434EFB01E/_default#/Protocol?encodedLaunchUri=ms-drive-to%3A%3Fdestination.latitude%3D-XX.12345%26destination.longitude%3DYYY.123456%26destination.name%3DMonkey%2520Central%2520Headquarters
The GUID 8E116909-CF87-4176-894F-F54434EFB01E listed above is the app ID for the GPS Voice Navigation app. This can be confirmed by visiting:
windowsphone.com/s?appId=8E116909-CF87-4176-894F-F54434EFB01E

Searching for "ms-drive-to" (as above) and "ms-walk-to" keywords can lead to further location data. According to this, these keywords can be used by an app to request walking/driving directions. So while it may not confirm a device's actual location, it can prove a user looked up directions from/to specific places.
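The launch URI in the pagefile example above is percent-encoded twice (note the "%2520" for a space), so it needs two rounds of decoding. A minimal Python 3 sketch, assuming the encodedLaunchUri parameter layout shown above; the function name is hypothetical and the coordinates in the test are made up.

```python
import re
from urllib.parse import parse_qs, unquote

ENCODED_RE = re.compile(r"encodedLaunchUri=([^&\s]+)")


def decode_launch_uri(text):
    """Return (scheme, params) for an encoded ms-drive-to/ms-walk-to launch URI."""
    match = ENCODED_RE.search(text)
    if match is None:
        return None
    launch_uri = unquote(match.group(1))  # first decode: ms-drive-to:?destination.latitude=...
    scheme, _, query = launch_uri.partition(":?")
    # parse_qs performs the second decode (eg "%20" becomes a space)
    params = {key: values[0] for key, values in parse_qs(query).items()}
    return scheme, params
```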

For the "GPS Voice Navigation" app, various location data was found with references to the System.Device.Location namespace. As mentioned previously, this is a .NET Framework library for calculating location via GPS, WiFi, Cell Tower triangulation. Any location strings found near a "System.Device.Location" string could be potential device locations.
So in P27:\pagefile.sys, we found:
<Latitude xmlns="http://schemas.datacontract.org/2004/07/System.Device.Location">-XX.1234</Latitude><Longitude xmlns="http://schemas.datacontract.org/2004/07/System.Device.Location">YYY.123</Longitude> <Speed xmlns="http://schemas.datacontract.org/2004/07/System.Device.Location">NaN</Speed>
Also found in Device A P27:\pagefile.sys was this more comprehensive gem:
<KeyValueOfstringanyType><Key>CurrentPosition</Key><Value xmlns:d3p1="http://schemas.datacontract.org/2004/07/System.Device.Location" i:type="d3p1:GeoPositionOfGeoCoordinate8rbsckdZ"><d3p1:Location><d3p1:Altitude>45</d3p1:Altitude><d3p1:Course>188.3</d3p1:Course><d3p1:HorizontalAccuracy>3</d3p1:HorizontalAccuracy><d3p1:Latitude>-XX.123456789012345</d3p1:Latitude><d3p1:Longitude>YYY.1234567890123</d3p1:Longitude><d3p1:Speed>3.72</d3p1:Speed><d3p1:VerticalAccuracy>3</d3p1:VerticalAccuracy></d3p1:Location><d3p1:Timestamp xmlns:d4p1="http://schemas.datacontract.org/2004/07/System"><d4p1:DateTime>2015-12-31T01:23:45.1234567Z</d4p1:DateTime><d4p1:OffsetMinutes>600</d4p1:OffsetMinutes></d3p1:Timestamp></Value></KeyValueOfstringanyType>
As this only appeared in Device A, it is probably related to the "GPS Voice Navigation" app. This is potentially confirmed by nearby XML elements such as:
 <Value xmlns:d3p1="http://schemas.datacontract.org/2004/07/GPS_VN_BL_WP8" i:type="d3p1:Locales">English</Value>
Also observed in Device A's P27:\pagefile.sys was an "EndLocation" keyword used by the "GPS Voice Navigation" app which showed the destination address details. So searching for that keyword could also reveal destinations entered into the app.
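Since carved pagefile fragments are often incomplete (and the "CurrentPosition" example uses an "i:" prefix declared outside the fragment, which an XML parser would reject), a regex is probably more forgiving than a parser for the System.Device.Location data above. A minimal Python 3 sketch that only reports fields when the System.Device.Location namespace string is present in the same fragment; the field list is taken from the examples above and the function name is hypothetical.

```python
import re

NAMESPACE = "System.Device.Location"
FIELD_RE = re.compile(
    r"<(?:\w+:)?(Latitude|Longitude|Altitude|Speed|Course|HorizontalAccuracy|VerticalAccuracy)\b[^>]*>"
    r"([^<]*)</(?:\w+:)?\1>"
)


def extract_geo_fields(fragment):
    """Return (field, value) pairs in order of appearance, as strings (eg "NaN")."""
    if NAMESPACE not in fragment:
        return []
    return FIELD_RE.findall(fragment)
```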

A basic app permission analysis was also performed on selected devices using the WP8_AppPerms.py script (see this previous post).
In Device A, there were 7 applications which required the ID_CAP_LOCATION capability (which provides access to device location):
WhatsApp
Facebook Messenger
*Facebook
*Skype
*XBox Video
*LumiaHelpTips_4?
*Nokia Music WP8 client


The apps that we preceded with an asterisk seem to be default apps that are pre-installed with the phone. Notice how even chat apps can require access to the device location. For example, Skype allows the user to "share location" from a chat. Similarly, WhatsApp can also send its location (as an attachment) in a chat.
Other permissions which a location aware app might need include:
ID_CAP_NETWORKING which would be required for accessing network services that can provide location information (eg “GetLocationUsingFingerprintResponse” data for Internet Explorer location queries).
ID_CAP_MAP which allows an app to display a map.
ID_CAP_CELL_API_LOCATION which seems to allow for/help location via Cellular position information.
ID_CAP_SENSORS which provides access to the phone’s accelerometer, compass, gyroscope.

Registry

According to XWF, Device A’s "Free space" (unallocated space) contained some UTF16-LE lat/long hits and a timestamp close to a "LastFoundAt" (ASCII) string.
For example, in close proximity to:
“lat=-XX.123456###long=YYY.123456###time=22/12/2015 01:23:45” 
was an "hbin" string and a "LastFoundAt" string. The "hbin" signature suggests that the surrounding data belongs to a registry hive cell. Searching the P26:\Windows\system32\config\SOFTWARE hive resulted in a hit at \OEM\Nokia\NokiaAccessories\Devices\*FF:FF:FF:FF:FF:FF* (Hex string has been redacted) which contained an entry for "LastFoundAt" equal to the lat/long/time string observed.
However, this subkey did not exist in all test devices so this registry value may not always be available for providing a position.
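The LastFoundAt value is simple to split apart. A minimal Python 3 sketch, assuming the "###" separated layout above and a day-first date (22/12/2015 can only be day-first). The timezone of the time field is unknown, so the result is left as a naive datetime, and the function name is hypothetical.

```python
from datetime import datetime


def parse_last_found_at(value):
    """'lat=..###long=..###time=dd/mm/yyyy HH:MM:SS' -> (lat, lon, naive datetime)."""
    fields = dict(part.split("=", 1) for part in value.strip().split("###"))
    when = datetime.strptime(fields["time"], "%d/%m/%Y %H:%M:%S")
    return float(fields["lat"]), float(fields["long"]), when
```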

P26:\Windows\system32\config\SOFTWARE\Microsoft\BingSuggests can have PreviousLatitude and PreviousLongitude entry values along with QueryAttemptTime, SuccessfulQueryTime and CacheUpdateTime entry values. The significance of this position information relative to Bing is unclear. The times appear to be little-endian 64-bit counts of 100 ns intervals since 1 January 1601 (the same format as a Windows FILETIME). The CacheUpdateTime was slightly later than the QueryAttemptTime (which equalled the SuccessfulQueryTime). The Registry key's "LastWrittenTime" occurred AFTER the various timestamp values. The timestamp values were zeroes for a new phone.
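If these values are indeed FILETIME style counts, they can be converted as in the Python 3 sketch below. It accepts either the raw 8 bytes (eg REG_BINARY data) or an already decoded integer (eg REG_QWORD); which data type the BingSuggests values actually use has not been confirmed here. A zero value converts to 1601-01-01 00:00:00 UTC, which is what the zeroed timestamps on a new phone would look like.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(value):
    """Convert a FILETIME (8 raw little-endian bytes or an int) to a UTC datetime."""
    if isinstance(value, (bytes, bytearray)):
        (value,) = struct.unpack("<Q", bytes(value))
    # 10 x 100 ns ticks per microsecond; sub-microsecond precision is dropped
    return FILETIME_EPOCH + timedelta(microseconds=value // 10)
```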

Miscellaneous Squirrel Chasing

While squirrel chasing through the data, we noticed a few non-location related but interesting nuggets ... (as if this post wasn't already long enough!)

P27:\SharedData\Store\PurchaseHistory.xml
contains the purchased App GUID, the app name and the purchase date. If entering the app GUID into the "windowsphone.com/s?appId=" URL does not reveal the app name, try checking this file.

Calendar Reminders are stored in P26:\Windows\system32\config\SOFTWARE\Microsoft\WPTaskScheduler\{*GUID*} entry values. For example, SOFTWARE\Microsoft\WPTaskScheduler\{*GUID*}
could contain a "Schedule" entry with value string containing the strings: "Reminder", "Monkey's New Year" and "All day event 01/01/2015".

As mentioned by Brent Muir, notification messages are archived in a .dat file. For our test devices, this appears to be located at:
P27:\Users\DefApps\APPDATA\Local\Microsoft\Windows\Notifications\appdb.dat
This file was 24 MB in size but sparsely populated.
An example data string from the file looks something like:
<toast launch="app://5B04B775-356B-4AA0-AAF8-6491FFEA5610/Chat?EntryId=00000000213E0000060000000700000000000000&amp;MessageId=000000004A4E0000020000000800000000000001"><visual><binding template="ToastText02"><text id="1">Voicemail Access</text><text id="2">Pls call 321, You have 5 New VoiceMail messages</text></binding></visual><audio src="ms-winsoundevent:legacy-notification.sms"/><backgroundColor>#0</backgroundColor></toast>
Unlike Windows 10 for PC, there was no "TimestampWhenSeen" stored in P26:\Windows\system32\config\SOFTWARE (ie when Notification Center was viewed).
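As appdb.dat was mostly empty, carving out the toast XML and parsing each hit may be quicker than scrolling through it. A minimal Python 3 sketch, assuming the toast strings are stored as UTF-8/ASCII text (if they turn out to be UTF16-LE, decode the file first) and skipping any damaged fragments. The function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TOAST_RE = re.compile(rb"<toast\b.*?</toast>", re.DOTALL)


def carve_toasts(data):
    """Return offset, launch URI and text lines for each parseable <toast> element."""
    toasts = []
    for match in TOAST_RE.finditer(data):
        try:
            elem = ET.fromstring(match.group())
        except ET.ParseError:
            continue  # damaged or partially overwritten fragment
        toasts.append({
            "offset": match.start(),
            "launch": elem.get("launch"),  # "&amp;" is decoded to "&" by the parser
            "texts": [t.text for t in elem.iter("text")],
        })
    return toasts
```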

Final Thoughts

Thanks to Boss Rob for patiently helping/waiting for this monkey to obtain/analyse the various phone dumps and allowing us to share our findings with the forensic community.

There is a lot of potential geolocation data stored on a Windows Phone 8.10 device but it will depend on the main Location setting being ON and, in certain situations, on the app’s required capabilities.
The P27:\pagefile.sys is probably the best place to look for textually encoded latitude/longitude data. However, Internet Explorer cache, app logs, device camera picture/video files and the Registry can also store location data.
It is suspected that other Windows Phone 8.1 makes/models will contain the same geolocation artifacts but this should be tested/verified by the analyst.
For future testing purposes, it is noteworthy that the 530 can be upgraded to Windows 10 for Mobile devices (whenever it officially comes out, under whatever name they decide on).

If you would like to share your thoughts/suggestions or any geolocation artifacts, please leave a comment.


Sunday, 12 July 2015

Chunky4n6Monkey!

With some substantial assistance from Boss Rob and inspired by Mission Impossible  ... Enter the Chunky4n6Monkey!

This post is targeted at those particularly interested in Python programming. If you are looking for a forensic wonder-tool post, you could be bitterly disappointed (yet again!).
Special Thanks to Rob (my boss and Hex Ninja Sensei) for kindly sharing his work which was the basis for this post.

After experiencing a few reversing/carving jobs, it seems that there's a common theme.
Usually, the analyst wants to search a file for a given set of values (eg magic hex number) and then process the surrounding data accordingly. Complicating matters is that as storage media grow in size (especially in mobile devices), it is not always possible or timely to read the whole contents of a file into memory for searching. While Python does allow you to read files line by line, this is not really conducive to searching for (long) strings that cross line boundaries.

Rockin' Rob's method for handling these large files is to break them up into chunks but read slightly more than a chunk size (ie chunk size + delta). This way if a hit starts at the end of one chunk and crosses the chunk boundary, we can still find it/log it for later.
Note: The delta must be at least as large as the largest record being searched for. In the worst case, the very last byte of the chunk holds the first byte of the search term, so delta has to be big enough to hold the rest of that record.
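For readers who want to try the chunk + delta idea themselves, here is a minimal Python sketch. It is not the code from chunkymonkey.py, and the generator name read_chunks is hypothetical. It assumes a binary file object that supports seek() and read(), reads chunksize + delta bytes each round, but only advances by chunksize, so consecutive reads overlap by delta bytes.

```python
import io

def read_chunks(f, chunksize, delta):
    """Yield (file_offset, data) pairs where data holds up to chunksize + delta bytes.

    Consecutive reads overlap by delta bytes, so anything starting near the end
    of one chunk is still fully visible in that read.
    """
    if chunksize <= 0 or delta < 0:
        raise ValueError("chunksize must be positive and delta non-negative")
    offset = 0
    while True:
        f.seek(offset)
        data = f.read(chunksize + delta)
        if not data:
            break
        yield offset, data
        if len(data) <= chunksize:
            # End of file reached: this read already covered everything that was left
            break
        offset += chunksize

# Demo on a 40 byte in-memory "file" with 16 byte chunks and a 3 byte delta
demo = io.BytesIO(bytes(bytearray(range(40))))
for off, data in read_chunks(demo, 16, 3):
    print("offset %d: %d bytes" % (off, len(data)))
```

Because each round only advances by chunksize, the last delta bytes of one read are read again at the start of the next. That small amount of re-reading is the price paid for not missing hits that cross a chunk boundary.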

OK, first we are going to look at a theoretical chunky situation, then we will look at developing a utility (chunkymonkey.py) that can help us select one of two search algorithms and also help us to optimize our chunk size. Finally, we will implement our newly selected search algorithm and chunk size in an existing Windows Phone 8 script (wp8-sms.py) and compare it with the previous un-chunkified version. You can grab the chunkymonkey.py script (and the updated wp8-sms.py script) from my GitHub page.
Hehe, "Chunkymonkey" - That's gotta be one of my all time favourite tool names :)

So now let's take a look at a chunky example:

16 Byte Chunky Example

Here we have a file which is divided into theoretical 16 byte chunks plus some extra bytes. The search term we are after is the three bytes 0x010203. They occur three times - once entirely before a chunk boundary at offset 12, once where it overlaps a chunk boundary at offset 31 and once right at the start of a chunk boundary at offset 48.
These three cases cover the possible chunk boundary situations. Our new chunkymonkey.py script reads the file chunk by chunk and, if a search hit starts before the end of the current chunk, logs its file offset for later processing. If a hit starts after the chunk boundary (ie within the extra delta bytes of the chunk size + delta read), we ignore it, as the next round of chunk processing will pick it up.
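Here is a rough sketch of that boundary rule, run against a reconstruction of the 16 byte example above (0x010203 at offsets 12, 31 and 48). The zero filler bytes and the 70 byte file length are assumptions, since the original example is only shown as a picture, and chunky_find is a hypothetical name rather than a chunkymonkey.py function. The delta check only covers the minimum needed to find a hit (search term length - 1); as noted earlier, processing the full record needs a delta at least as big as the record.

```python
import io

def chunky_find(f, term, chunksize, delta):
    """Return the file offsets of every occurrence of term (bytes) in file object f,
    reading chunksize + delta bytes at a time."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    if delta < len(term) - 1:
        # Minimum overlap needed to see a term that starts on the chunk's last byte
        raise ValueError("delta must be at least len(term) - 1")
    hits = []
    offset = 0
    while True:
        f.seek(offset)
        data = f.read(chunksize + delta)
        if not data:
            break
        # find() returns hits in increasing order; only keep those starting inside
        # the chunk proper. Hits starting in the delta region are found again
        # (at a smaller relative offset) by the next round.
        i = data.find(term)
        while 0 <= i < chunksize:
            hits.append(offset + i)  # chunk-relative offset -> file offset
            i = data.find(term, i + 1)
        if len(data) <= chunksize:
            break  # end of file reached within this chunk
        offset += chunksize
    return hits

# Rebuild the 16 byte chunk example: zero filler with 0x010203 at 12, 31 and 48
buf = bytearray(70)
term = b"\x01\x02\x03"
for pos in (12, 31, 48):
    buf[pos:pos + len(term)] = term
print(chunky_find(io.BytesIO(bytes(buf)), term, 16, 3))
```

Each file offset belongs to exactly one chunk proper, so every hit is logged once: the hit at 48 is seen in the delta region of the chunk starting at 32 but is only logged by the chunk starting at 48.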
We are also going to evaluate a couple of different search methods to see if we can speed our chunk searches up. The first one "all_indices" relies on the string.find method for finding substrings (think of the contents of a file as one big hex string). This was *ahem* "re-used" *ahem* from a recipe listed on code.activestate.com. The second method uses a compiled regular expression pattern. For more on Python regular expressions, you can read the documentation HOWTO.

Here's the code for each:


# Find all indices of a substring in a given string (using string.find)
# Adapted from http://code.activestate.com/recipes/499314-find-all-indices-of-a-substring-in-a-given-string/
def all_indices(bigstring, substring, listindex=None, offset=0):
    # Create a fresh list per call; a mutable default ([]) would be shared
    # between calls and keep accumulating hits from previous searches
    if listindex is None:
        listindex = []
    i = bigstring.find(substring, offset)
    while i >= 0:
        listindex.append(i)
        # Resume one byte past the last hit so overlapping hits are also found
        i = bigstring.find(substring, i + 1)
    return listindex

# Find all indices of the "pattern" regular expression in a given string (using regex)
# Where pattern is a compiled Python re pattern object (ie the output of "re.compile")
def regsearch(bigstring, pattern, listindex=None):
    if listindex is None:
        listindex = []
    # finditer returns a one-shot iterator so we capture the start offsets to a list
    # (note: finditer resumes after each match, so overlapping hits are not reported)
    for hit in pattern.finditer(bigstring):
        listindex.append(hit.start())
    return listindex

For benchmarking purposes, we are going to call these two functions from sliceNsearch (for "all_indices") and sliceNsearchRE (for "regsearch").
These slice functions are going to read the specified file chunk by chunk and then call their respective search function. If the file size is less than one chunk size, the entire file will be read and searched in one go.
Once the search function returns a list of hit offsets (relative to the current chunk), these offsets will be converted to the equivalent file offsets for later processing.
For comparison, our new script will then also do a full file.read (ie no chunks) and process the resultant file string using the "all_indices" and "regsearch" functions. These wholeread functions can take a while to run so we can comment out those calls (to "wholereadRE" and "wholeread") later.

The goal is to compare the time taken to search for hits chunk by chunk vs searching for hits via full file reads.
The secondary aim is to figure out which search function is quicker ie "regsearch" or "all_indices".
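Before comparing speeds, one behavioural difference between the two approaches is worth knowing. all_indices restarts its search one byte after each hit, so it reports overlapping occurrences, whereas re.finditer resumes after the end of each match and skips them. For a term like UTF-16LE "SMStext", which cannot overlap itself, both return the same offsets, but short or repetitive terms (eg 0000) can give different counts. The sketch below uses hypothetical helper names (make_pattern, find_all, regex_all) and also shows one way of turning a hex search term into a compiled bytes pattern, using re.escape so any bytes that happen to be regex metacharacters are matched literally.

```python
import binascii
import re

def make_pattern(hexterm):
    """Compile a hex search term (eg '53004d00') into a literal bytes regex."""
    return re.compile(re.escape(binascii.unhexlify(hexterm)))

def find_all(data, term):
    """find-based search: restarts one byte after each hit, so overlaps are reported."""
    hits = []
    i = data.find(term)
    while i >= 0:
        hits.append(i)
        i = data.find(term, i + 1)
    return hits

def regex_all(data, pattern):
    """finditer-based search: resumes after each match, so overlaps are skipped."""
    return [m.start() for m in pattern.finditer(data)]

data = b"\x00\x00\x00\x00"
print(find_all(data, binascii.unhexlify("0000")))
print(regex_all(data, make_pattern("0000")))
```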

Here's the help text for chunkymonkey.py:


c:\Python27\python.exe chunkymonkey.py -h

Running chunkymonkey.py v2015-08-19

usage: chunkymonkey.py [-h] inputfile term chunksize delta

Helps find optimal chunk sizes when searching large binary files for a known
hex string

positional arguments:
  inputfile   File to be searched
  term        Hex Search string eg 53004d00
  chunksize   Size of each chunk (in decimal bytes)
  delta       Size of the extra read buffer (in decimal bytes)

optional arguments:
  -h, --help  show this help message and exit
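For those curious about how such a command line might be put together, here is a hypothetical argparse setup that reproduces the positional arguments in the help text above. The actual chunkymonkey.py source may well differ; the hex_term converter in particular is an assumption, added so that an invalid hex search term is rejected at argument parsing time rather than later.

```python
import argparse
import binascii

def hex_term(value):
    """argparse 'type' helper (hypothetical): convert a hex string such as '53004d00' to bytes."""
    try:
        return binascii.unhexlify(value)
    except (TypeError, ValueError, binascii.Error):
        # Python 2 raises TypeError for bad hex; Python 3 raises binascii.Error
        raise argparse.ArgumentTypeError("%r is not a valid hex string" % value)

def build_parser():
    parser = argparse.ArgumentParser(
        description="Helps find optimal chunk sizes when searching large binary "
                    "files for a known hex string")
    parser.add_argument("inputfile", help="File to be searched")
    parser.add_argument("term", type=hex_term, help="Hex Search string eg 53004d00")
    parser.add_argument("chunksize", type=int, help="Size of each chunk (in decimal bytes)")
    parser.add_argument("delta", type=int, help="Size of the extra read buffer (in decimal bytes)")
    return parser

# Parse an explicit argument list (mirrors the 16 byte example run)
args = build_parser().parse_args(["16byte-chunk-with-3byte-delta.bin", "010203", "16", "3"])
print(args.inputfile, args.chunksize, args.delta)
```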

Now we'll call our new script (chunkymonkey.py) with our 16 byte chunk boundary hex file pictured earlier:

c:\Python27\python.exe chunkymonkey.py 16byte-chunk-with-3byte-delta.bin 010203 16 3
Running chunkymonkey.py v2015-08-19

Search term is: 010203
Chunky sliceNsearch hits = 3, Chunky sliceNsearchRE hits = 3
Wholeread all_indices hits = 3, Wholeread regsearch hits = 3

Both the sliceNsearch and sliceNsearchRE chunky functions found the same hit offsets. To save space, I have commented out the part which prints each hit offset but rest assured that all hits listed were the same.
The wholeread (one big file.read) function calls to the "all_indices" and "regsearch" functions also found the same hits.
This shows that, at least for this test file, our new chunky functions find the same search hits as the file.read (wholeread) functions.

So now what?
Python has an inbuilt profiling module (cProfile) which provides timing information for each function call. By using this, we can see which search method and chunk size is the most time efficient.
However, as the example bin file is not very large, let's try finding the optimum search algorithm/chunk size for a 7 GB Windows Phone 8.0 image instead. The test system is an i7 (3.4-3.9 GHz) with 16 GB RAM and a 256 GB SSD running Win7 Pro x64 and Python 2.7.6.

Note: For Python 2, there appears to be a size limitation on (chunksize + delta). It must be less than ~2147483647.
This is probably because a Python 2 int is implemented via a C long, which on 64-bit Windows is only 32 bits wide (ie its maximum value is 2^31 - 1 = 2147483647). See also here for further details. Python 3 apparently does not have this limitation. So that kinda limits us to 2 GB chunk sizes at this point :'(
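If you want the script to fail early rather than part-way through a 7 GB read, a simple guard like the one below can check chunksize + delta up front. It assumes (as suggested above, not confirmed) that the limit is the maximum value of a 32-bit signed C long; check_chunk_args is a hypothetical name. The ctypes call just reports how wide a C long is on the current platform.

```python
import ctypes

# Assumed limit: the largest value a 32-bit signed C long can hold (2**31 - 1)
MAX_32BIT_LONG = 2 ** 31 - 1

def check_chunk_args(chunksize, delta, limit=MAX_32BIT_LONG):
    """Return chunksize + delta, or raise ValueError if it exceeds the assumed limit."""
    if chunksize <= 0 or delta < 0:
        raise ValueError("chunksize must be positive and delta non-negative")
    total = chunksize + delta
    if total > limit:
        raise ValueError("chunksize + delta = %d exceeds %d" % (total, limit))
    return total

# A C long is 4 bytes on 64-bit Windows but usually 8 bytes on 64-bit Linux/macOS
print("C long is %d bytes here" % ctypes.sizeof(ctypes.c_long))
print(check_chunk_args(2000000000, 1000))
```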

OK so let's try running chunkymonkey.py with a 2000000000 byte (~2 GB) chunk size and a 1000 byte delta. The search term is "53004d00530074006500780074000000" ie "SMStext" in UTF-16LE, followed by a UTF-16 null terminator.
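A quick way to sanity-check a hex search term before a long run is to decode it. This sketch (not part of the original scripts) confirms that the term above is "SMStext" in UTF-16LE plus a two byte null terminator, and rebuilds the same hex string from the text.

```python
import binascii

term_hex = "53004d00530074006500780074000000"

# Hex -> raw bytes -> text, assuming UTF-16LE
decoded = binascii.unhexlify(term_hex).decode("utf-16-le")
print(repr(decoded))

# And back again: "SMStext" plus a UTF-16 null character, encoded as UTF-16LE
rebuilt = binascii.hexlify(u"SMStext\x00".encode("utf-16-le")).decode("ascii")
print(rebuilt == term_hex)
```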

c:\Python27\python.exe -m cProfile chunkymonkey.py 7GBtestbin.bin 53004d00530074006500780074000000 2000000000 1000
Running chunkymonkey.py v2015-08-19

Search term is: 53004d00530074006500780074000000
Chunky sliceNsearch hits = 21, Chunky sliceNsearchRE hits = 21
Wholeread all_indices hits = 21, Wholeread regsearch hits = 21
         2290 function calls (2229 primitive calls) in 133.211 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 argparse.py:1023(_SubParsersAction)
        1    0.000    0.000    0.000    0.000 argparse.py:1025(_ChoicesPseudoAction)
        1    0.000    0.000    0.000    0.000 argparse.py:1100(FileType)
        1    0.000    0.000    0.000    0.000 argparse.py:112(_AttributeHolder)
        1    0.000    0.000    0.000    0.000 argparse.py:1144(Namespace)
        1    0.000    0.000    0.000    0.000 argparse.py:1151(__init__)
        1    0.000    0.000    0.000    0.000 argparse.py:1167(_ActionsContainer)
        3    0.000    0.000    0.000    0.000 argparse.py:1169(__init__)
       34    0.000    0.000    0.000    0.000 argparse.py:1221(register)
       14    0.000    0.000    0.000    0.000 argparse.py:1225(_registry_get)
        5    0.000    0.000    0.000    0.000 argparse.py:1250(add_argument)
        2    0.000    0.000    0.000    0.000 argparse.py:1297(add_argument_group)
        5    0.000    0.000    0.000    0.000 argparse.py:1307(_add_action)
        4    0.000    0.000    0.000    0.000 argparse.py:1371(_get_positional_kwargs)
        1    0.000    0.000    0.000    0.000 argparse.py:1387(_get_optional_kwargs)
        5    0.000    0.000    0.000    0.000 argparse.py:1422(_pop_action_class)
        3    0.000    0.000    0.000    0.000 argparse.py:1426(_get_handler)
        5    0.000    0.000    0.000    0.000 argparse.py:1435(_check_conflict)
        1    0.000    0.000    0.000    0.000 argparse.py:147(HelpFormatter)
        1    0.000    0.000    0.000    0.000 argparse.py:1471(_ArgumentGroup)
        2    0.000    0.000    0.000    0.000 argparse.py:1473(__init__)
        5    0.000    0.000    0.000    0.000 argparse.py:1495(_add_action)
        1    0.000    0.000    0.000    0.000 argparse.py:1505(_MutuallyExclusiveGroup)
        1    0.000    0.000    0.000    0.000 argparse.py:1525(ArgumentParser)
        5    0.000    0.000    0.000    0.000 argparse.py:154(__init__)
        1    0.000    0.000    0.001    0.001 argparse.py:1543(__init__)
        2    0.000    0.000    0.000    0.000 argparse.py:1589(identity)
        5    0.000    0.000    0.000    0.000 argparse.py:1667(_add_action)
        1    0.000    0.000    0.000    0.000 argparse.py:1679(_get_positional_actions)
        1    0.000    0.000    0.000    0.000 argparse.py:1687(parse_args)
        1    0.000    0.000    0.000    0.000 argparse.py:1694(parse_known_args)
        1    0.000    0.000    0.000    0.000 argparse.py:1729(_parse_known_args)
        4    0.000    0.000    0.000    0.000 argparse.py:1776(take_action)
        1    0.000    0.000    0.000    0.000 argparse.py:1874(consume_positionals)
        1    0.000    0.000    0.000    0.000 argparse.py:195(_Section)
        5    0.000    0.000    0.000    0.000 argparse.py:197(__init__)
        1    0.000    0.000    0.000    0.000 argparse.py:2026(_match_arguments_partial)
        4    0.000    0.000    0.000    0.000 argparse.py:2042(_parse_optional)
        4    0.000    0.000    0.000    0.000 argparse.py:2143(_get_nargs_pattern)
        4    0.000    0.000    0.000    0.000 argparse.py:2187(_get_values)
        4    0.000    0.000    0.000    0.000 argparse.py:2239(_get_value)
        4    0.000    0.000    0.000    0.000 argparse.py:2264(_check_value)
        5    0.000    0.000    0.000    0.000 argparse.py:2313(_get_formatter)
        5    0.000    0.000    0.000    0.000 argparse.py:555(_metavar_formatter)
        5    0.000    0.000    0.000    0.000 argparse.py:564(format)
        5    0.000    0.000    0.000    0.000 argparse.py:571(_format_args)
        1    0.001    0.001    0.002    0.002 argparse.py:62(<module>)
        1    0.000    0.000    0.000    0.000 argparse.py:627(RawDescriptionHelpFormatter)
        1    0.000    0.000    0.000    0.000 argparse.py:638(RawTextHelpFormatter)
        1    0.000    0.000    0.000    0.000 argparse.py:649(ArgumentDefaultsHelpFormatter)
        1    0.000    0.000    0.000    0.000 argparse.py:683(ArgumentError)
        1    0.000    0.000    0.000    0.000 argparse.py:703(ArgumentTypeError)
        1    0.000    0.000    0.000    0.000 argparse.py:712(Action)
        5    0.000    0.000    0.000    0.000 argparse.py:763(__init__)
        1    0.000    0.000    0.000    0.000 argparse.py:803(_StoreAction)
        4    0.000    0.000    0.000    0.000 argparse.py:805(__init__)
        4    0.000    0.000    0.000    0.000 argparse.py:834(__call__)
        1    0.000    0.000    0.000    0.000 argparse.py:838(_StoreConstAction)
        1    0.000    0.000    0.000    0.000 argparse.py:861(_StoreTrueAction)
        1    0.000    0.000    0.000    0.000 argparse.py:878(_StoreFalseAction)
        1    0.000    0.000    0.000    0.000 argparse.py:895(_AppendAction)
        1    0.000    0.000    0.000    0.000 argparse.py:932(_AppendConstAction)
       14    0.000    0.000    0.000    0.000 argparse.py:95(_callable)
        1    0.000    0.000    0.000    0.000 argparse.py:958(_CountAction)
        1    0.000    0.000    0.000    0.000 argparse.py:979(_HelpAction)
        1    0.000    0.000    0.000    0.000 argparse.py:981(__init__)
        1    0.000    0.000    0.000    0.000 argparse.py:998(_VersionAction)
        1    0.310    0.310   11.327   11.327 chunkymonkey.py:115(sliceNsearchRE)
        1    0.000    0.000   44.736   44.736 chunkymonkey.py:167(wholeread)
        1    0.000    0.000   47.431   47.431 chunkymonkey.py:182(wholereadRE)
        1    1.023    1.023  133.211  133.211 chunkymonkey.py:34(<module>)
        5    0.000    0.000   22.229    4.446 chunkymonkey.py:44(all_indices)
        5   16.226    3.245   16.227    3.245 chunkymonkey.py:53(regsearch)
        1    0.296    0.296   28.691   28.691 chunkymonkey.py:63(sliceNsearch)
        1    0.001    0.001    0.001    0.001 collections.py:1(<module>)
        1    0.000    0.000    0.000    0.000 collections.py:26(OrderedDict)
        1    0.000    0.000    0.000    0.000 collections.py:387(Counter)
        3    0.000    0.000    0.000    0.000 gettext.py:130(_expand_lang)
        3    0.000    0.000    0.000    0.000 gettext.py:421(find)
        3    0.000    0.000    0.000    0.000 gettext.py:461(translation)
        3    0.000    0.000    0.000    0.000 gettext.py:527(dgettext)
        3    0.000    0.000    0.000    0.000 gettext.py:565(gettext)
        1    0.000    0.000    0.000    0.000 heapq.py:31(<module>)
        1    0.000    0.000    0.000    0.000 keyword.py:11(<module>)
        3    0.000    0.000    0.000    0.000 locale.py:347(normalize)
        1    0.000    0.000    0.000    0.000 ntpath.py:122(splitdrive)
        1    0.000    0.000    0.000    0.000 ntpath.py:164(split)
        1    0.000    0.000    0.000    0.000 ntpath.py:196(basename)
        5    0.000    0.000    0.000    0.000 os.py:422(__getitem__)
       12    0.000    0.000    0.000    0.000 os.py:444(get)
        1    0.000    0.000    0.000    0.000 re.py:134(match)
       15    0.000    0.000    0.001    0.000 re.py:188(compile)
       16    0.000    0.000    0.001    0.000 re.py:226(_compile)
        4    0.000    0.000    0.000    0.000 sre_compile.py:178(_compile_charset)
        4    0.000    0.000    0.000    0.000 sre_compile.py:207(_optimize_charset)
     24/5    0.000    0.000    0.000    0.000 sre_compile.py:32(_compile)
       13    0.000    0.000    0.000    0.000 sre_compile.py:354(_simple)
        5    0.000    0.000    0.000    0.000 sre_compile.py:359(_compile_info)
       10    0.000    0.000    0.000    0.000 sre_compile.py:472(isstring)
        5    0.000    0.000    0.000    0.000 sre_compile.py:478(_code)
        5    0.000    0.000    0.001    0.000 sre_compile.py:493(compile)
       61    0.000    0.000    0.000    0.000 sre_parse.py:126(__len__)
        4    0.000    0.000    0.000    0.000 sre_parse.py:128(__delitem__)
      109    0.000    0.000    0.000    0.000 sre_parse.py:130(__getitem__)
       13    0.000    0.000    0.000    0.000 sre_parse.py:134(__setitem__)
       49    0.000    0.000    0.000    0.000 sre_parse.py:138(append)
    37/18    0.000    0.000    0.000    0.000 sre_parse.py:140(getwidth)
        5    0.000    0.000    0.000    0.000 sre_parse.py:178(__init__)
       79    0.000    0.000    0.000    0.000 sre_parse.py:182(__next)
       35    0.000    0.000    0.000    0.000 sre_parse.py:195(match)
       69    0.000    0.000    0.000    0.000 sre_parse.py:201(get)
        8    0.000    0.000    0.000    0.000 sre_parse.py:257(_escape)
      9/5    0.000    0.000    0.000    0.000 sre_parse.py:301(_parse_sub)
     10/6    0.000    0.000    0.000    0.000 sre_parse.py:379(_parse)
        5    0.000    0.000    0.000    0.000 sre_parse.py:67(__init__)
        5    0.000    0.000    0.000    0.000 sre_parse.py:675(parse)
        4    0.000    0.000    0.000    0.000 sre_parse.py:72(opengroup)
        4    0.000    0.000    0.000    0.000 sre_parse.py:83(closegroup)
       24    0.000    0.000    0.000    0.000 sre_parse.py:90(__init__)
        5    0.000    0.000    0.000    0.000 {_sre.compile}
        1    0.000    0.000    0.000    0.000 {binascii.hexlify}
        1    0.000    0.000    0.000    0.000 {binascii.unhexlify}
        3    0.000    0.000    0.000    0.000 {getattr}
       26    0.000    0.000    0.000    0.000 {hasattr}
      133    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {iter}
  342/327    0.000    0.000    0.000    0.000 {len}
        2    0.000    0.000    0.000    0.000 {math.ceil}
        2    0.000    0.000    0.000    0.000 {max}
        8    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
      452    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        4    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        6    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
       56   22.229    0.397   22.229    0.397 {method 'find' of 'str' objects}
        5    0.000    0.000    0.000    0.000 {method 'finditer' of '_sre.SRE_Pattern' objects}
       78    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'groups' of '_sre.SRE_Match' objects}
        5    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        3    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'lstrip' of 'str' objects}
        3    0.000    0.000    0.000    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
        6    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
       10   93.117    9.312   93.117    9.312 {method 'read' of 'file' objects}
        8    0.000    0.000    0.000    0.000 {method 'remove' of 'list' objects}
        7    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
        3    0.000    0.000    0.000    0.000 {method 'reverse' of 'list' objects}
        8    0.000    0.000    0.000    0.000 {method 'seek' of 'file' objects}
       40    0.000    0.000    0.000    0.000 {method 'setdefault' of 'dict' objects}
       42    0.000    0.000    0.000    0.000 {method 'start' of '_sre.SRE_Match' objects}
        3    0.000    0.000    0.000    0.000 {method 'translate' of 'str' objects}
       17    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}
       50    0.000    0.000    0.000    0.000 {min}
        2    0.001    0.000    0.001    0.000 {nt.stat}
        4    0.005    0.001    0.005    0.001 {open}
       31    0.000    0.000    0.000    0.000 {ord}
        9    0.000    0.000    0.000    0.000 {range}
        8    0.000    0.000    0.000    0.000 {setattr}
        1    0.000    0.000    0.000    0.000 {zip}

That's a LOT of output eh? Don't worry, we're only interested in the lines for the sliceNsearch, sliceNsearchRE, wholeread and wholereadRE functions. We're also going to focus on the "cumtime" column. This is the cumulative time spent in the function, including any functions it calls (not what your twisted mind first thought eh?), and that's the figure we will use to compare the various runs with different chunk sizes.
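Rather than scrolling through the whole listing, the standard pstats module can sort by cumulative time and print only the rows you care about; print_stats() treats a string argument as a regular expression matched against each row's filename:lineno(function) text. The sketch below profiles a hypothetical stand-in function (wholeread_demo) rather than the real chunkymonkey.py calls. From the command line, python -m cProfile -s cumulative chunkymonkey.py ... sorts the same report by cumulative time.

```python
import cProfile
import pstats

def slow_search(data, term):
    """Hypothetical stand-in workload: naive scan for every start offset of term."""
    return [i for i in range(len(data)) if data.startswith(term, i)]

def wholeread_demo():
    # Hypothetical stand-in for a wholeread-style call
    return slow_search(b"ab" * 50000, b"ba")

profiler = cProfile.Profile()
profiler.enable()
result = wholeread_demo()
profiler.disable()

stats = pstats.Stats(profiler)
# Sort by the "cumtime" column, then print only rows whose
# "filename:lineno(function)" text matches the regular expression
stats.sort_stats("cumulative").print_stats("wholeread|slice")
```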

To save space, here's a table detailing runs with different chunk sizes (rounded to nearest second):

Function processing times by chunksize

Delta was consistently set at 1000 bytes.
From the results above, we can see that the best "cumtime" is for a chunksize of 2000000000 bytes and using the chunky sliceNsearchRE function (which calls the "regsearch" function for each chunk).
Note how the wholeread times are much larger than either the sliceNsearch or sliceNsearchRE times.
Anyhoo, that's all well and good but how much difference can it make to an actual script which also has to process the hits and not just find them?

We modified our previous Windows Phone 8 SMS script (wp8-sms.py) to read 2000000000 byte chunks (with a 1000 byte delta) using the sliceNsearchRE function. We then captured the cProfile stats.

First, we ran the previous unchunkified version of wp8-sms.py (saved here as wp8-sms-orig.py), which yielded these times:

c:\Python27\python.exe -m cProfile wp8-sms-orig.py -f 7GBtestbin.bin -o 7GBtestop.tsv
Running wp8-sms.py v2014-10-05

Skipping hit at 0x2dace9b0 - cannot find next field after SMStext
Skipping hit at 0x2dad6000 - cannot find next field after SMStext
Skipping hit at 0x31611c30 - cannot find next field after SMStext
Skipping hit at 0x4ce99bc0 - cannot find next field after SMStext
Skipping hit at 0x4ce99c00 - cannot find next field after SMStext
Skipping hit at 0x4ce9bf7c - cannot find next field after SMStext
Skipping hit at 0x66947c30 - cannot find next field after SMStext
Skipping hit at 0x6694ebc0 - cannot find next field after SMStext
Skipping hit at 0x6694ec00 - cannot find next field after SMStext
String substitution(s) due to unrecognized/unprintable characters at 0xccf26379

Processed 21 SMStext hits


Finished writing out 12 TSV entries

         21672 function calls in 55.624 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    0.000    0.000 __init__.py:49(normalize_encoding)
        2    0.000    0.000    0.001    0.001 __init__.py:71(search_function)
        1    0.000    0.000    0.000    0.000 ascii.py:13(Codec)
        1    0.000    0.000    0.000    0.000 ascii.py:20(IncrementalEncoder)
        1    0.000    0.000    0.000    0.000 ascii.py:24(IncrementalDecoder)
        1    0.000    0.000    0.000    0.000 ascii.py:28(StreamWriter)
        1    0.000    0.000    0.000    0.000 ascii.py:31(StreamReader)
        1    0.000    0.000    0.000    0.000 ascii.py:34(StreamConverter)
        1    0.000    0.000    0.000    0.000 ascii.py:41(getregentry)
        1    0.000    0.000    0.000    0.000 ascii.py:8(<module>)
        1    0.000    0.000    0.000    0.000 codecs.py:322(__init__)
        1    0.000    0.000    0.000    0.000 codecs.py:395(__init__)
     1288    0.004    0.000    0.008    0.000 codecs.py:424(read)
       72    0.000    0.000    0.000    0.000 codecs.py:591(reset)
        1    0.000    0.000    0.000    0.000 codecs.py:651(__init__)
     1288    0.001    0.000    0.008    0.000 codecs.py:669(read)
       72    0.000    0.000    0.000    0.000 codecs.py:702(seek)
      106    0.000    0.000    0.000    0.000 codecs.py:708(__getattr__)
        2    0.000    0.000    0.000    0.000 codecs.py:77(__new__)
        1    0.000    0.000    0.000    0.000 codecs.py:841(open)
        1    0.000    0.000    0.000    0.000 gettext.py:130(_expand_lang)
        1    0.000    0.000    0.000    0.000 gettext.py:421(find)
        1    0.000    0.000    0.000    0.000 gettext.py:461(translation)
        1    0.000    0.000    0.000    0.000 gettext.py:527(dgettext)
        1    0.000    0.000    0.000    0.000 gettext.py:565(gettext)
        1    0.000    0.000    0.000    0.000 locale.py:347(normalize)
        3    0.000    0.000    0.000    0.000 optparse.py:1007(add_option)
        1    0.000    0.000    0.000    0.000 optparse.py:1190(__init__)
        1    0.000    0.000    0.000    0.000 optparse.py:1242(_create_option_list)
        1    0.000    0.000    0.000    0.000 optparse.py:1247(_add_help_option)
        1    0.000    0.000    0.000    0.000 optparse.py:1257(_populate_option_list)
        1    0.000    0.000    0.000    0.000 optparse.py:1267(_init_parsing_state)
        1    0.000    0.000    0.000    0.000 optparse.py:1276(set_usage)
        1    0.000    0.000    0.000    0.000 optparse.py:1312(_get_all_options)
        1    0.000    0.000    0.000    0.000 optparse.py:1318(get_default_values)
        1    0.000    0.000    0.000    0.000 optparse.py:1361(_get_args)
        1    0.000    0.000    0.000    0.000 optparse.py:1367(parse_args)
        1    0.000    0.000    0.000    0.000 optparse.py:1406(check_values)
        1    0.000    0.000    0.000    0.000 optparse.py:1419(_process_args)
        2    0.000    0.000    0.000    0.000 optparse.py:1516(_process_short_opts)
        1    0.000    0.000    0.000    0.000 optparse.py:200(__init__)
        1    0.000    0.000    0.000    0.000 optparse.py:224(set_parser)
        1    0.000    0.000    0.000    0.000 optparse.py:365(__init__)
        3    0.000    0.000    0.000    0.000 optparse.py:560(__init__)
        3    0.000    0.000    0.000    0.000 optparse.py:579(_check_opt_strings)
        3    0.000    0.000    0.000    0.000 optparse.py:588(_set_opt_strings)
        3    0.000    0.000    0.000    0.000 optparse.py:609(_set_attrs)
        3    0.000    0.000    0.000    0.000 optparse.py:629(_check_action)
        3    0.000    0.000    0.000    0.000 optparse.py:635(_check_type)
        3    0.000    0.000    0.000    0.000 optparse.py:665(_check_choice)
        3    0.000    0.000    0.000    0.000 optparse.py:678(_check_dest)
        3    0.000    0.000    0.000    0.000 optparse.py:693(_check_const)
        3    0.000    0.000    0.000    0.000 optparse.py:699(_check_nargs)
        3    0.000    0.000    0.000    0.000 optparse.py:708(_check_callback)
        2    0.000    0.000    0.000    0.000 optparse.py:752(takes_value)
        2    0.000    0.000    0.000    0.000 optparse.py:764(check_value)
        2    0.000    0.000    0.000    0.000 optparse.py:771(convert_value)
        2    0.000    0.000    0.000    0.000 optparse.py:778(process)
        2    0.000    0.000    0.000    0.000 optparse.py:790(take_action)
        3    0.000    0.000    0.000    0.000 optparse.py:832(isbasestring)
        1    0.000    0.000    0.000    0.000 optparse.py:837(__init__)
        1    0.000    0.000    0.000    0.000 optparse.py:932(__init__)
        1    0.000    0.000    0.000    0.000 optparse.py:943(_create_option_mappings)
        1    0.000    0.000    0.000    0.000 optparse.py:959(set_conflict_handler)
        1    0.000    0.000    0.000    0.000 optparse.py:964(set_description)
        3    0.000    0.000    0.000    0.000 optparse.py:980(_check_conflict)
        1    0.000    0.000    0.000    0.000 os.py:422(__getitem__)
        4    0.000    0.000    0.000    0.000 os.py:444(get)
        1    0.000    0.000    0.000    0.000 utf_16_le.py:18(IncrementalEncoder)
        1    0.000    0.000    0.000    0.000 utf_16_le.py:22(IncrementalDecoder)
        1    0.000    0.000    0.000    0.000 utf_16_le.py:25(StreamWriter)
        1    0.000    0.000    0.000    0.000 utf_16_le.py:28(StreamReader)
        1    0.000    0.000    0.000    0.000 utf_16_le.py:33(getregentry)
        1    0.000    0.000    0.000    0.000 utf_16_le.py:8(<module>)
        1    0.006    0.006   55.624   55.624 wp8-sms-orig.py:101(<module>)
       72    0.003    0.000    0.012    0.000 wp8-sms-orig.py:114(read_nullterm_unistring)
      831    0.001    0.000    0.004    0.000 wp8-sms-orig.py:148(read_filetime)
        2    0.001    0.000   22.113   11.057 wp8-sms-orig.py:174(all_indices)
       21    0.000    0.000    0.003    0.000 wp8-sms-orig.py:184(find_flag)
       12    0.001    0.000    0.004    0.000 wp8-sms-orig.py:200(find_timestamp)
       33    0.000    0.000    0.000    0.000 wp8-sms-orig.py:218(goto_next_field)
       12    0.000    0.000    0.000    0.000 wp8-sms-orig.py:489(<lambda>)
        2    0.001    0.001    0.001    0.001 {__import__}
        1    0.000    0.000    0.000    0.000 {_codecs.lookup}
     2576    0.002    0.000    0.002    0.000 {_codecs.utf_16_le_decode}
     1547    0.001    0.000    0.001    0.000 {_struct.unpack}
        2    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x000000001E29D0F0}
       38    0.002    0.000    0.002    0.000 {built-in method utcfromtimestamp}
        3    0.000    0.000    0.000    0.000 {filter}
      106    0.000    0.000    0.000    0.000 {getattr}
        4    0.000    0.000    0.000    0.000 {hasattr}
       22    0.000    0.000    0.000    0.000 {hex}
        8    0.000    0.000    0.000    0.000 {isinstance}
     3881    0.001    0.000    0.001    0.000 {len}
      639    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        3    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      634   22.113    0.035   22.113    0.035 {method 'find' of 'str' objects}
       21    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
       38    0.000    0.000    0.000    0.000 {method 'isoformat' of 'datetime.datetime' objects}
        1    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
       35    0.000    0.000    0.000    0.000 {method 'keys' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        4    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
     4124   33.484    0.008   33.484    0.008 {method 'read' of 'file' objects}
        4    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'reverse' of 'list' objects}
       13    0.000    0.000    0.000    0.000 {method 'rstrip' of 'str' objects}
     1646    0.002    0.000    0.002    0.000 {method 'seek' of 'file' objects}
        2    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
      993    0.001    0.000    0.001    0.000 {method 'tell' of 'file' objects}
        3    0.000    0.000    0.000    0.000 {method 'translate' of 'str' objects}
        5    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}
       13    0.000    0.000    0.001    0.000 {method 'write' of 'file' objects}
        3    0.001    0.000    0.001    0.000 {open}
       46    0.000    0.000    0.000    0.000 {range}
       40    0.000    0.000    0.000    0.000 {setattr}
        1    0.000    0.000    0.000    0.000 {sorted}
     1288    0.000    0.000    0.000    0.000 {unichr}

Note: Total time was 56 seconds, 33 seconds of which were spent in the file.read function.

And after chunkification, it ran a lot quicker (37 seconds vs 56 seconds)!

c:\Python27\python.exe -m cProfile wp8-sms.py -f 7GBtestbin.bin -o 7GBtestop-chunk.tsv
Running wp8-sms.py v2015-08-19

Skipping hit at 0x2dace9b0 - cannot find next field after SMStext
Skipping hit at 0x2dad6000 - cannot find next field after SMStext
Skipping hit at 0x31611c30 - cannot find next field after SMStext
Skipping hit at 0x4ce99bc0 - cannot find next field after SMStext
Skipping hit at 0x4ce99c00 - cannot find next field after SMStext
Skipping hit at 0x4ce9bf7c - cannot find next field after SMStext
Skipping hit at 0x66947c30 - cannot find next field after SMStext
Skipping hit at 0x6694ebc0 - cannot find next field after SMStext
Skipping hit at 0x6694ec00 - cannot find next field after SMStext
String substitution(s) due to unrecognized/unprintable characters at 0xccf26379

Processed 21 SMStext hits


Finished writing out 12 TSV entries

         22706 function calls in 37.401 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    0.000    0.000 __init__.py:49(normalize_encoding)
        2    0.000    0.000    0.003    0.001 __init__.py:71(search_function)
        1    0.000    0.000    0.000    0.000 ascii.py:13(Codec)
        1    0.000    0.000    0.000    0.000 ascii.py:20(IncrementalEncoder)
        1    0.000    0.000    0.000    0.000 ascii.py:24(IncrementalDecoder)
        1    0.000    0.000    0.000    0.000 ascii.py:28(StreamWriter)
        1    0.000    0.000    0.000    0.000 ascii.py:31(StreamReader)
        1    0.000    0.000    0.000    0.000 ascii.py:34(StreamConverter)
        1    0.000    0.000    0.000    0.000 ascii.py:41(getregentry)
        1    0.000    0.000    0.000    0.000 ascii.py:8(<module>)
        1    0.000    0.000    0.000    0.000 codecs.py:322(__init__)
        1    0.000    0.000    0.000    0.000 codecs.py:395(__init__)
     1288    0.004    0.000    0.008    0.000 codecs.py:424(read)
       72    0.000    0.000    0.000    0.000 codecs.py:591(reset)
        1    0.000    0.000    0.000    0.000 codecs.py:651(__init__)
     1288    0.001    0.000    0.008    0.000 codecs.py:669(read)
       72    0.000    0.000    0.000    0.000 codecs.py:702(seek)
      106    0.000    0.000    0.000    0.000 codecs.py:708(__getattr__)
        2    0.000    0.000    0.000    0.000 codecs.py:77(__new__)
        1    0.000    0.000    0.002    0.002 codecs.py:841(open)
        1    0.000    0.000    0.000    0.000 gettext.py:130(_expand_lang)
        1    0.000    0.000    0.000    0.000 gettext.py:421(find)
        1    0.000    0.000    0.000    0.000 gettext.py:461(translation)
        1    0.000    0.000    0.000    0.000 gettext.py:527(dgettext)
        1    0.000    0.000    0.000    0.000 gettext.py:565(gettext)
        1    0.000    0.000    0.000    0.000 locale.py:347(normalize)
        3    0.000    0.000    0.000    0.000 optparse.py:1007(add_option)
        1    0.000    0.000    0.000    0.000 optparse.py:1190(__init__)
        1    0.000    0.000    0.000    0.000 optparse.py:1242(_create_option_list)
        1    0.000    0.000    0.000    0.000 optparse.py:1247(_add_help_option)
        1    0.000    0.000    0.000    0.000 optparse.py:1257(_populate_option_list)
        1    0.000    0.000    0.000    0.000 optparse.py:1267(_init_parsing_state)
        1    0.000    0.000    0.000    0.000 optparse.py:1276(set_usage)
        1    0.000    0.000    0.000    0.000 optparse.py:1312(_get_all_options)
        1    0.000    0.000    0.000    0.000 optparse.py:1318(get_default_values)
        1    0.000    0.000    0.000    0.000 optparse.py:1361(_get_args)
        1    0.000    0.000    0.000    0.000 optparse.py:1367(parse_args)
        1    0.000    0.000    0.000    0.000 optparse.py:1406(check_values)
        1    0.000    0.000    0.000    0.000 optparse.py:1419(_process_args)
        2    0.000    0.000    0.000    0.000 optparse.py:1516(_process_short_opts)
        1    0.000    0.000    0.000    0.000 optparse.py:200(__init__)
        1    0.000    0.000    0.000    0.000 optparse.py:224(set_parser)
        1    0.000    0.000    0.000    0.000 optparse.py:365(__init__)
        3    0.000    0.000    0.000    0.000 optparse.py:560(__init__)
        3    0.000    0.000    0.000    0.000 optparse.py:579(_check_opt_strings)
        3    0.000    0.000    0.000    0.000 optparse.py:588(_set_opt_strings)
        3    0.000    0.000    0.000    0.000 optparse.py:609(_set_attrs)
        3    0.000    0.000    0.000    0.000 optparse.py:629(_check_action)
        3    0.000    0.000    0.000    0.000 optparse.py:635(_check_type)
        3    0.000    0.000    0.000    0.000 optparse.py:665(_check_choice)
        3    0.000    0.000    0.000    0.000 optparse.py:678(_check_dest)
        3    0.000    0.000    0.000    0.000 optparse.py:693(_check_const)
        3    0.000    0.000    0.000    0.000 optparse.py:699(_check_nargs)
        3    0.000    0.000    0.000    0.000 optparse.py:708(_check_callback)
        2    0.000    0.000    0.000    0.000 optparse.py:752(takes_value)
        2    0.000    0.000    0.000    0.000 optparse.py:764(check_value)
        2    0.000    0.000    0.000    0.000 optparse.py:771(convert_value)
        2    0.000    0.000    0.000    0.000 optparse.py:778(process)
        2    0.000    0.000    0.000    0.000 optparse.py:790(take_action)
        3    0.000    0.000    0.000    0.000 optparse.py:832(isbasestring)
        1    0.000    0.000    0.000    0.000 optparse.py:837(__init__)
        1    0.000    0.000    0.000    0.000 optparse.py:932(__init__)
        1    0.000    0.000    0.000    0.000 optparse.py:943(_create_option_mappings)
        1    0.000    0.000    0.000    0.000 optparse.py:959(set_conflict_handler)
        1    0.000    0.000    0.000    0.000 optparse.py:964(set_description)
        3    0.000    0.000    0.000    0.000 optparse.py:980(_check_conflict)
        1    0.000    0.000    0.000    0.000 os.py:422(__getitem__)
        4    0.000    0.000    0.000    0.000 os.py:444(get)
        2    0.000    0.000    0.000    0.000 re.py:188(compile)
        2    0.000    0.000    0.000    0.000 re.py:226(_compile)
        2    0.000    0.000    0.000    0.000 sre_compile.py:32(_compile)
        2    0.000    0.000    0.000    0.000 sre_compile.py:359(_compile_info)
        4    0.000    0.000    0.000    0.000 sre_compile.py:472(isstring)
        2    0.000    0.000    0.000    0.000 sre_compile.py:478(_code)
        2    0.000    0.000    0.000    0.000 sre_compile.py:493(compile)
       24    0.000    0.000    0.000    0.000 sre_parse.py:138(append)
        2    0.000    0.000    0.000    0.000 sre_parse.py:140(getwidth)
        2    0.000    0.000    0.000    0.000 sre_parse.py:178(__init__)
       30    0.000    0.000    0.000    0.000 sre_parse.py:182(__next)
        2    0.000    0.000    0.000    0.000 sre_parse.py:195(match)
       28    0.000    0.000    0.000    0.000 sre_parse.py:201(get)
        2    0.000    0.000    0.000    0.000 sre_parse.py:301(_parse_sub)
        2    0.000    0.000    0.000    0.000 sre_parse.py:379(_parse)
        2    0.000    0.000    0.000    0.000 sre_parse.py:67(__init__)
        2    0.000    0.000    0.000    0.000 sre_parse.py:675(parse)
        2    0.000    0.000    0.000    0.000 sre_parse.py:90(__init__)
        1    0.000    0.000    0.000    0.000 utf_16_le.py:18(IncrementalEncoder)
        1    0.000    0.000    0.000    0.000 utf_16_le.py:22(IncrementalDecoder)
        1    0.000    0.000    0.000    0.000 utf_16_le.py:25(StreamWriter)
        1    0.000    0.000    0.000    0.000 utf_16_le.py:28(StreamReader)
        1    0.000    0.000    0.000    0.000 utf_16_le.py:33(getregentry)
        1    0.000    0.000    0.000    0.000 utf_16_le.py:8(<module>)
        1    0.209    0.209   37.401   37.401 wp8-sms.py:113(<module>)
       72    0.002    0.000    0.011    0.000 wp8-sms.py:131(read_nullterm_unistring)
      831    0.001    0.000    0.003    0.000 wp8-sms.py:165(read_filetime)
        8   16.194    2.024   16.194    2.024 wp8-sms.py:191(regsearch)
       21    0.000    0.000    0.000    0.000 wp8-sms.py:200(find_flag)
       12    0.001    0.000    0.004    0.000 wp8-sms.py:216(find_timestamp)
       33    0.000    0.000    0.000    0.000 wp8-sms.py:234(goto_next_field)
        2    0.697    0.348   37.168   18.584 wp8-sms.py:246(sliceNsearchRE)
       12    0.000    0.000    0.000    0.000 wp8-sms.py:548(<lambda>)
        2    0.003    0.001    0.003    0.001 {__import__}
        1    0.000    0.000    0.002    0.002 {_codecs.lookup}
     2576    0.002    0.000    0.002    0.000 {_codecs.utf_16_le_decode}
        2    0.000    0.000    0.000    0.000 {_sre.compile}
     1547    0.002    0.000    0.002    0.000 {_struct.unpack}
        2    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x000000001E29D0F0}
       38    0.000    0.000    0.000    0.000 {built-in method utcfromtimestamp}
        3    0.000    0.000    0.000    0.000 {filter}
      106    0.000    0.000    0.000    0.000 {getattr}
        4    0.000    0.000    0.000    0.000 {hasattr}
       22    0.000    0.000    0.000    0.000 {hex}
       14    0.000    0.000    0.000    0.000 {isinstance}
     3977    0.001    0.000    0.001    0.000 {len}
        2    0.000    0.000    0.000    0.000 {math.ceil}
     1382    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        3    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        4    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
        2    0.000    0.000    0.000    0.000 {method 'fileno' of 'file' objects}
        3    0.000    0.000    0.000    0.000 {method 'find' of 'str' objects}
        8    0.000    0.000    0.000    0.000 {method 'finditer' of '_sre.SRE_Pattern' objects}
       23    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
       38    0.000    0.000    0.000    0.000 {method 'isoformat' of 'datetime.datetime' objects}
        3    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
       35    0.000    0.000    0.000    0.000 {method 'keys' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        4    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
     4131   20.281    0.005   20.281    0.005 {method 'read' of 'file' objects}
        4    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'reverse' of 'list' objects}
       13    0.000    0.000    0.000    0.000 {method 'rstrip' of 'str' objects}
     1654    0.001    0.000    0.001    0.000 {method 'seek' of 'file' objects}
        2    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
      629    0.000    0.000    0.000    0.000 {method 'start' of '_sre.SRE_Match' objects}
        1    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
      993    0.001    0.000    0.001    0.000 {method 'tell' of 'file' objects}
        3    0.000    0.000    0.000    0.000 {method 'translate' of 'str' objects}
        5    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}
       13    0.000    0.000    0.001    0.000 {method 'write' of 'file' objects}
        4    0.000    0.000    0.000    0.000 {min}
        2    0.000    0.000    0.000    0.000 {nt.fstat}
        3    0.001    0.000    0.001    0.000 {open}
       24    0.000    0.000    0.000    0.000 {ord}
       48    0.000    0.000    0.000    0.000 {range}
       40    0.000    0.000    0.000    0.000 {setattr}
        1    0.000    0.000    0.000    0.000 {sorted}
     1288    0.000    0.000    0.000    0.000 {unichr}

Note: The number of SMS extracted by the original and chunkified versions of the wp8-sms.py script was consistent.
As a result of this additional testing, I have also adjusted the chunkified version to search further for received SMS timestamps, so it should now detect them more reliably.

Exciting times! It took 37 seconds, much less than the previous version's 57 seconds.
Most of that time (37.2 of the 37.4 seconds) was spent inside sliceNsearchRE, mainly in file reads (~20.3 s) and in the 8 regsearch calls (~16.2 s), rather than in processing the hits.
Given the same input file and search/chunk parameters (ie ~2GB chunk with 1k delta), wp8-sms.py spent more time in sliceNsearchRE (37 s) than chunkymonkey.py did previously (11 s), because wp8-sms.py calls sliceNsearchRE twice whereas chunkymonkey.py calls it only once. Not sure why sliceNsearchRE takes over 3 times as long rather than closer to twice as long ...
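
The chunk-plus-delta idea can be sketched roughly as follows. This is not the actual sliceNsearchRE code, just a minimal illustration under stated assumptions: the function name `chunked_search` is hypothetical, the pattern is a bytes regex, `delta` is at least as long as the longest possible match, and the 64 MB default chunk size is an arbitrary illustrative value (the runs above used ~2GB with a 1k delta). Each read overlaps the next chunk by `delta` bytes, and hits starting in that overlap are left for the next chunk so nothing is reported twice.

```python
import re


def chunked_search(path, pattern, chunk_size=64 * 1024 * 1024, delta=1024):
    """Yield absolute file offsets where the bytes regex `pattern` matches.

    Each read is chunk_size + delta bytes, so it overlaps the start of the next
    chunk by delta bytes. delta must be >= the longest possible match, or hits
    straddling a chunk boundary can be missed.
    """
    regex = re.compile(pattern)
    with open(path, "rb") as f:
        offset = 0
        while True:
            f.seek(offset)
            data = f.read(chunk_size + delta)
            if not data:
                break
            # A short read means end of file: no chunk follows this one
            is_last = len(data) < chunk_size + delta
            for m in regex.finditer(data):
                # Hits starting in the overlap are left for the next chunk,
                # unless there is no next chunk
                if is_last or m.start() < chunk_size:
                    yield offset + m.start()
            if is_last:
                break
            offset += chunk_size
```

In this sketch every call re-reads the whole file, so running two separate searches costs two full passes of file.read.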

The revised wp8-sms.py script was also run on a single store.vol file (18 MB) and its output matched the previous version's output. Both versions processed almost 6000 SMS in ~7 s.

Final Thoughts

A new tool (chunkymonkey.py) was written to help determine the optimum chunk size and search algorithm for finding a hex string in large binary files. Due to Python 2 limitations, the maximum chunk size has to be less than 2147483647 (2^31 - 1) bytes minus the delta size.
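
The chunk size limit can be made explicit with a little offset arithmetic. Here is a minimal sketch using a hypothetical `plan_chunks` helper (not taken from chunkymonkey.py). It rejects any chunk size that is not less than 2147483647 minus delta, as stated above, and uses ceiling division so the final partial chunk is never dropped:

```python
PY2_MAX_READ = 2147483647  # 2**31 - 1, the largest signed 32-bit integer


def plan_chunks(file_size, chunk_size, delta):
    """Return a list of (offset, read_size) tuples covering file_size bytes.

    Each read is chunk_size + delta bytes (fewer at the end of the file), so
    consecutive reads overlap by delta bytes.
    """
    if chunk_size <= 0 or delta < 0:
        raise ValueError("chunk_size must be positive and delta non-negative")
    if chunk_size >= PY2_MAX_READ - delta:
        raise ValueError("chunk_size must be less than 2147483647 - delta")
    # Ceiling division: a trailing partial chunk still gets its own read
    num_chunks = -(-file_size // chunk_size)
    plan = []
    for i in range(num_chunks):
        offset = i * chunk_size
        read_size = min(chunk_size + delta, file_size - offset)
        plan.append((offset, read_size))
    return plan
```

For example, `plan_chunks(100, 30, 4)` gives four reads, the last one covering only the final 10 bytes.
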
Code from the tool was then re-used in an existing Windows Phone 8 SMS script (wp8-sms.py), significantly reducing its processing time.
Further reductions in processing time may be possible in the future by using threads (a way of running multiple functions concurrently); however, this could make some already hack-tacular code even more complicated.
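
Threads could be tried along these lines. This is a speculative sketch only. It assumes Python 3's concurrent.futures (which is not in the Python 2.7 standard library), and the helper names are hypothetical. Each worker opens its own file handle and searches one chunk. Note that CPython's `re` module holds the global interpreter lock while matching, so threads would mainly overlap the disk reads rather than the regex searches. Each worker also holds a full chunk in memory, so with ~2GB chunks, 4 workers could need ~8GB of RAM.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor


def _search_chunk(path, regex, offset, chunk_size, delta):
    """Search one chunk with its own file handle; a shared handle's seek/read
    position would be clobbered by other threads."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(chunk_size + delta)
    # Hits starting in the overlap belong to the next chunk (avoids duplicates).
    # The final chunk is never longer than chunk_size here, so nothing is lost.
    return [offset + m.start() for m in regex.finditer(data)
            if m.start() < chunk_size]


def threaded_search(path, pattern, chunk_size, delta, workers=4):
    """Return sorted absolute offsets of bytes regex `pattern` in `path`."""
    regex = re.compile(pattern)
    offsets = range(0, os.path.getsize(path), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = pool.map(
            lambda off: _search_chunk(path, regex, off, chunk_size, delta),
            offsets)
        return sorted(hit for hits in per_chunk for hit in hits)
```

A process pool could sidestep the interpreter lock for the regex work, at the cost of even more complexity.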

Over the next few hours/days, I plan to update/chunkify other selected Windows Phone scripts (ie wp8-callhistory.py, wp8-contacts.py) so that they can also run faster against whole .bin files. I won't bother creating a new post for that though, as the principles are already outlined here.

UPDATE (12JUL2015):
Have now updated the "wp8-sms.py", "wp8-callhistory.py" and "wp8-contacts.py" scripts to read large files in chunks.
Updated code is now available from my GitHub page.

UPDATE (20AUG2015):
Fixed a bug in the latest chunkymonkey.py and chunkified wp8-sms.py, wp8-callhistory.py, wp8-contacts.py scripts.
The bug prevented the last chunk from being parsed properly. I have also updated the results in this post to reflect the revised code's performance.
Warning: The offsets for Windows Phone 8.10 appear to be slightly different to those for 8.0. Consequently, the updated wp8 scripts may have issues parsing WinPhone 8.10 images.
Stay tuned for further developments regarding WinPhone 8.10 versions of the scripts ...

Thanks again to Boss Rob for sharing his work!

Now where's that sundae?!