Friday, 26 March 2021

Monkey Test Drives a Honda Accord


"The red ones go faster!" - original picture sourced from

Monkey recently "test drove" ("test-parsed"?) a data dump from a 2016 Honda Accord (USA).
This post will describe that wonderful journey.

Special Thanks to Manny Fuentes who generously shared his Honda data. Without this data, this post and associated scripts would not exist.

The scripts are available from GitHub.

Parsing the dump with X-Ways Forensics showed 7 partitions, 6 of which contained the EXT4 filesystem. The first partition was not recognized by X-Ways.

Here is the breakdown according to X-Ways:

Partition1 = 251 MB Unknown
Partition2 = 879 MB EXT4 (System part1, ~8332 files)
Partition3 = 251 MB EXT4 (System part2, ~1275 files)
Partition4 = 1.7 GB EXT4 (User data, ~15115 files)
Partition5 = 251 MB EXT4 (21 files, contained timestamped logs)
Partition6 = 1.1 GB EXT4 (49 files, appears to contain Speech related data)
Partition7 = 125 MB EXT4 (966 files, mostly stored in "data_org.tar.gz")

Note: Two partitions (Partition2 and Partition3) contained /system directories.

Based on strings found in Partition2:\system\build.prop
The system was running Android 4.2.2 ( and it seems to be made by Clarion (ro.product.manufacturer=Clarion, ro.board.platform=r8a7791 4.2.2 9TXX9211 211 release-keys).
The build date was 6AUG2015 16:57:16 UTC (
Partition2:\system\app also contained various .apk and .odex files.

Partition3:\system\build.prop similarly confirmed the previous Android properties

Partition3:\system\alps\evolution\paired_device_list.txt contained ASCII text listing various BT addresses and their device names. Was consistent with data found in Partition4's bluetoothsettings.db.

Partition4 contained Android /user data (mainly under, com.clarion, directories)
Also found on Partition4:

\property\persist.sys.timezone [contained ASCII text set to "US/Central"]

\system\usagestats\usage-history.xml [contained a log of various timestamped Android Activitys. Not validated]

\data\\shared_prefs\bluetooth_settings.xml [contained a timestamp string value for "last_discovering_time". Appears to be millisecs since 1JAN1970. Not validated]

[contained string values for "currentTimeZoneIdNotDaylight" (e.g. "US/Central") and "currentTimeZoneName" (e.g. "CST UTC-6")]

[contained the string "DEVICENAME" and what appears to be a MAC address - possibly the most used device?]

\data\\Garmin\sqlite\quick_search_list.db [contained a "quick_search_list" table which was empty]

Partition5 contained various timestamped error logs (e.g. ErrorLevelPower.log, ErrorLevelSoft.log, ErrorLevelHard.log).

Partition6 appears to contain various Text To Speech related files.

Partition7 contained most of its files in "data_org.tar.gz". This appears to be a restoration backup of Partition4:/data.

The most interesting user related information was found in various SQLite databases under Partition4:/data.
For our data dump, there were 4 x SQLite databases which were of interest:


Consequently, four Python3 parsing scripts were written/tested on Win10x64 running Python 3.9.
The four scripts work a similar manner - they take an input argument to the respective SQLite database and an output argument for the output TSV filename. They then run SQLite queries for the relevant data and output selected fields to the TSV file.

For our data, in addition to phonedb.db, there was a Write-Ahead-Log (phonedb.db-wal) in the same directory.
It is recommended to run the script twice in this type of scenario and compare the two outputs:
1. Run the script WITHOUT the phonedb.db-wal file present in the same directory as the specified phonedb.db
2. Run the script WITH the phonedb.db-wal file present in the same directory as the specified phonedb.db

Users do not have to specify the -wal file at the command line as SQLite will auto-magically incorporate the -wal file if present.
For our data, including the phonedb.db-wal file resulted in an extra 10 calls being found/output.

On to the scripts ... - reads RecentStops.db "history" table and outputs details to TSV file.
This table appears to document timestamped lat/long coordinates. We're not sure what triggers an entry.

RecentStops.db can be found at: \data\\Garmin\sqlite\RecentStops.db

SQLite query used: "SELECT time, lat, lon, name FROM history ORDER BY time ASC;"
Usage example:

c:\Python39\python.exe -d RecentStops.db -o rsop.txt
Running v2021-03-22

Processed/Wrote 26 entries to: rsop.txt

Exiting ...

Output TSV format:

time    lat    lon    name

time is displayed as ISO formatted string interpreted as Garmin secs since 31DEC1989 (UTC).
lat & lon has the scaling factor applied = 180 / 2^31 to convert to degrees.
name can include cross-street or location strings which can help confirm the calculated lat/long. - reads crm.db "eco_logs" table and outputs details to TSV file.
This table appears to log various timestamped journey legs (timestamped odometer / trip range measurements).

crm.db can be found at: \data\\databases\crm.db

SQLite query used: "SELECT _id, trip_date, trip_id, mileage, start_pos_time, start_pos_odo, finish_pos_time, finish_pos_odo, fuel_used, driving_range FROM eco_logs ORDER BY _id ASC;"
Usage example:

c:\Python39\python.exe -d crm.db -o crmop.txt
Running v2021-03-22

Processed/Wrote 300 entries to: crmop.txt

Exiting ...

Output TSV format:

_id    trip_date    trip_id    mileage    start_pos_time    start_pos_odo    finish_pos_time    finish_pos_odo    fuel_used    driving_range

trip_date, start_pos_time, finish_pos_time are displayed as ISO formatted strings interpreted as millisecs since 1JAN1970 (UTC).
mileage, start_pos_odo, finish_pos_odo, fuel_used, driving_range are measured in unknown units but can be used for trending. - reads phonedb.db "callhistory", "contact", "contactnumber" tables and outputs details to TSV file.
This database appears to log call history and contacts information.

phonedb.db can be found at: \data\com.clarion.bluetooth\databases\phonedb.db

Call History SQLite query: "SELECT _id, address, phonenum, calldate, calltype FROM call_history ORDER BY calldate ASC;"

Contacts SQLite query: "SELECT contact._id, contact.address, contact.firstName, contact.lastName, contact.phonename, contactnumber.number, contactnumber.numbertype FROM contact JOIN contactnumber ON contactnumber.contact_id = contact._id ORDER BY contact._id ASC;"

Usage example:

c:\Python39\python.exe -d phonedb.db -o op
Running v2021-03-22

Processed/Wrote 98 CALL entries to: op_CALLS.txt

Processed/Wrote 41 CONTACT entries to: op_CONTACTS.txt

Exiting ...

CALLS output TSV format:

_id    address    phonenum    calldate    calltype

address appears to be a MAC address.
calldate is displayed as ISO formatted string interpreted as millisecs since 1JAN1970 (UTC). 
calltype appears to be 1,2 or 3. By using Call Charge Records or examining the devices, it should be possible to determine the significance of each value (e.g. missed, incoming, outgoing).

CONTACTS output TSV format:

_id    address    firstname    lastname    phonename    contactnum    contacttype

address appears to be a MAC address.
contacttype was consistently set to 3 for all of our contacts. - reads bluetoothsettings.db "bluetooth_device" table and outputs details to TSV file.
This table appears to log Bluetooth device names and MAC addresses.
bluetoothsettings.db can be found at: \data\com.clarion.bluetooth\databases\bluetoothsettings.db

Note: There was also a "speed_dial" table but it was empty so we're not sure about how this table is populated

SQLite query used: "SELECT device_bank, device_addr, device_name FROM bluetooth_device ORDER BY device_bank ASC;"
Usage example:

c:\Python39\python.exe -d bluetoothsettings.db -o btop.txt
Running v2021-03-23

Processed/Wrote 6 entries to: btop.txt

Exiting ...

Output TSV format:

device_bank    device_addr    device_name

device_bank appears to be an index for each device entry. 
device_addr appears to be a MAC address.

Final Thoughts

If you have a Honda dump of similar vintage, we'd appreciate if you could run the scripts and let us know how it goes.
Obviously, as the scripts were written using one set of data, there may be bugs / mis-ass-umptions.

Or if you can shed any more light on a Honda Android dump, we'd appreciate hearing about your findings.

Finally, if you can share a dump for any another vehicle and would like us to write some parsing scripts, please let us know.

Comments and Suggestions are also welcome in the comments section below ...

Saturday, 28 November 2020

iOS14 Maps History BLOB Script


Another BLOBBY SQL (Sequel)!

A quick post to introduce a new iOS 14 Apple Maps History helper script ...
Thanks to Heather Mahalik for sharing her research and for both her and her associate Sahil's testing.
You can read about Heather's iOS14 research at her blog here.

The script ( focuses on the Apple Maps app's MapsSync_0.0.1 SQLite database which can contain the last 3-5 directions/searches.
There are 32 tables in the database but as we see in Heather's query below - most of the history info is stored in a table called ZHISTORYITEM and in the ZMIXINMAPITEM table. Both tables can contain protobuf BLOBs which are extracted by the script for further processing by the user along with an HTML summary report.

ZHISTORYITEM.z_pk AS 'Item Number',
when ZHISTORYITEM.z_ent = 14 then 'coordinates of search'
when ZHISTORYITEM.z_ent = 16 then 'location search'
when ZHISTORYITEM.z_ent = 12 then 'navigation journey'
end AS 'Type',
datetime(ZHISTORYITEM.ZCREATETIME+978307200,'UNIXEPOCH','localtime') AS 'Time Created',
datetime(ZHISTORYITEM.ZMODIFICATIONTIME+978307200,'UNIXEPOCH','localtime') AS 'Time Modified',
From the query we can see there 3 types of entry:
- "Location search"
- "Coordinates of search" (usually has a "Map Item Storage" BLOB in ZMAPITEMSTORAGE column)
- "Navigation journey" (usually has a "Journey" BLOB in ZROUTEREQUESTSTORAGE column)

Please note as per Heather's blog - "Time Created" (and presumably "Time Modified") are NOT accurate records of when the search was executed.

"Location search" entries (i.e. search location text) seem to be followed by "Coordinates of search" entries (containing the latitude/longitude of the search location).

When directions are requested, a "Navigation journey" entry is created with a "Journey BLOB" which contains the start/end locations.
However, "Navigation journey" entries also seem to be generated even if the user does not explicitly ask for a journey to be calculated. There were 2 such entries in Sahil's data despite him not navigating with the device.

Further research is required into these BLOBs - hence the script :)

The runs Heather's query and creates an HTML report (called iOS14-MapsReport.html) with links to the extracted BLOB files.
Each BLOB is extracted from the database and stored in the user's nominated output directory.
The script has been tested with Python3 on Ubuntu 20 and Win10x64.

Usage example for Ubuntu 20.04 LTS with Python 3.8.2:
python3 -d MapsSync_0.0.1 -o optest
This will output Heather's query to an HTML table with hyperlinks to each extracted BLOB file.
All files will be created in the user nominated "optest" directory.

The corresponding command line output looks like:

Running v2020-09-19

Processed 15 entries

Please refer to iOS14-MapsReport.html in "optest" directory
Exiting ...

Usage example for Win10 with Python 3.6:
c:\Python36\python.exe -d MapsSync_0.0.1 -o opdir
Here's what the output HTML table looks like:
Note: lat/longs were redacted to protect the not-so-innocent ;)

The outputted BLOBs can be researched further with tools like protobuf_inspector.
This is a very funky tool which pretty prints a protobuf and also interprets 64bit fields in multiple ways (useful for finding potential lat/longs).
Its also "pipping" easy to install ... 
pip install protobuf-inspector

Monkey has used protobuf_inspector on extracted Apple Maps protobufs to find some destination Yelp reviews and what appears to be epoch millisecond timestamps (ref. 1JAN1970) which occur just after a GUID.
However, because its not my test data and its a small sample size, I can't confirm if its the time of search ...

Anyway, this is where this post ends ... Good luck to those about to dive further into the protobuf swamp!
If you happen to use this script to find something interesting, please leave a comment and share the knowledge :)

Monday, 25 May 2020

Recovering and Replaying Garmin Voice Instructions

Wait a minute monkey, did you say Carmen or Garmin?

We had a damaged Garmin nuvi 56LM GPS unit from which we recovered a text file containing a voice log.
It was a bit of an unusual process so we thought it might be interesting to share the story.
As a result of this effort, monkey wrote a Python3 script ( that uses the free espeak-ng library to convert the Garmin 56LM voice log text to WAV files for better place name recognition. The script is available from GitHub.

Special Thanks to the following people for their assistance:
Sasha Sheremetov from Rusolut for his advice with the data recovery
Ken Case from Berla for his advice regarding GPS data logs
Benjamin "BJ" Duncan for sharing his findings regarding the voice log
Katie Russ for creating her helpful website

Our story begins with a damaged Garmin nuvi 56LM which "provides easy-to-follow, spoken turn-by-turn directions with street names". Well, it used to!
Wikipedia states it was released in 2014.

Google found an interesting paper from around that time - "Garmin satnavs forensic methods and artefacts: An exploratory study" by Alexandre Arbelet (August 2014).
And while it did not cover our model, it mentioned GPX files as a potential source of GPS tracklogs.
Sounds promising eh?

Unfortunately, our device was damaged beyond repair so chipoff was our only option.
The chip was a SanDisk 8 GB eMMC chip. Multiple reads of the chip produced the same hash so the chip seemed pretty stable.
X-Ways, Autopsy, Oxygen Forensic Detective and FTK Imager did not recognize partitions from the dump.
Cellebrite Physical Analyzer's Garmin Legacy chain also did not not extract any information from the dump.
ASCII plaintext was visible in dump though so not all hope was lost.

Ken from Berla advised that Garmin usually use a FAT32 partition which contains the GPX tracklogs.
Unfortunately, the first 512 byte sector of the dump did not end with the usual 55AA for an MBR so we needed to find a way to extract the filesystems.

Sasha from Rusolut suggested using R-studio to recover/extract the filesystems from the dump.
Great Success!
There were 2 extracted partitions - FAT16 (128 MB) and FAT32 (3.3 GB - a little smaller than expected. Maybe Garmin used 8 GB chips for commonality/ease of upgrade reasons?)

Some noteworthy files were found while looking for timestamped latitude/longitude coordinates ...
Note: This was based on the contents of ONE device, other devices/models probably store their data differently.

Contained a "history" table with scaled latitude/longitude numbers. By multiplying these raw numbers by 180/2^31, we were able to obtain plottable latitude/longitude coordinates.

The "route_segment" table contained timestamp and latitude/longitude route info (unsure if these were travelled).
The "history" table had start latitude/longitude, end latitude/longitude and start times.
Note: Garmin timestamps measure seconds since 31 DEC 1989 (Garmin launch date) - see here for further details.
By adding 631065600 seconds to the Garmin numeric timestamp, you have the number of seconds since the 1970 Unix epoch which then makes it easier to find the human readable time (there are more tools supporting Unix epoch time than for Garmin time).

Contained possible latitude/longitude coordinates with timestamps. However, the file also contained non-ASCII/binary bytes.


Contained potential timestamped search information.

Contained ASCII latitude/longitude strings (search for "GPS main")

Unknown file format which possibly contains the current trip log?

Contained various settings, version and model info.
It also mentioned a GPX directory with references to .gpx files (which did not exist in our device).
Sasha from Rusolut confirmed that his test unit had these files.
I'm not sure why there was a discrepancy - perhaps it was due to regional differences or the user settings?
Looking up the file format for GPX logs showed that they are XML text files which use certain keywords/field names to record the latitude and longitude.
Searching the entire dump for the "trkpt" and  "trkType" keywords did not find any GPX formatted data though.

All semi-interesting info so far ...
However, BJ also brought our attention to an interesting artifact that spawned this post ...

This text file appears to chronologically log GPS spoken instructions. Due to our lack of test devices, we can't guarantee the vehicle was at the exact location spoken but it might be used to show the user was in or aware of the area.

As an example, the voice log contained lines like:
D[2019/06/12 07:55:03] {22ce6e88} [vpm_tts_parse.c:vpm_tts_parse:3770] Navigation phrase selected: Keep right $USR_TO_NEXT_ROAD. (229)
D[2019/06/12 07:55:03] {22ce6e88} [vpm_tts_log.c:vpm_tts_log_phonetics:277] Map Phonetics: "nju "IN|gl@nd *"haI|%we (MDB Lang: 23)

The spoken voice section is the string occurring between "Map Phonetics: " and "(MDB Lang: 23)".
e.g. "nju "IN|gl@nd *"haI|%we
Another example could be:
"wE|st@n *"mo|t@|%we

The line format can be generalized as:
D[YYYY/MM/DD HH:MM:SS] {4byte hex id? process/thread id?} [vpm_tts_log.c:vpm_tts_log_phonetics:277] Map Phonetics: VOICE_STRING (MDB Lang: 23)

Where VOICE_STRING looks like a localized pronunication guide for the system.

Trivia note:
There were also various lines with the string "Voice Language: Australian English-Karen (TTS)" which seems to indicate which voice the user heard.
A bit of Googling found the voice of Karen who has also done other recognisable voice over work (e.g. Garmin GPS – Australian Karen, Navman GPS – Australian Karen, Apple iPhone 4s & 5 – Australian Voice of Siri).

"OK, Karen", there must be a system involved with the pronunciation of the string but at the time, monkey thought it was probably proprietary to Garmin.
Fast forward to a couple of weeks ago and monkey had a mini-breakthrough.
There's a system for pronunciation called the International Phonetic Alphabet (IPA).
If you've read Wikipedia or a dictionary you've probably seen these weird pronunciation symbols and sarcastically thought "Yeah that helps".
For example, the Wikipedia page for "Cooking banana" uses IPA to descibe how to pronounce "plantain".
See the highlighted text containing the weird symbols in the picture below:

Example of IPA pronunciation (highlighted text)

However, if we are limited to the 95 character printable English ASCII set, we need an additional method of encoding those weird IPA symbols. 
There are a several methods available but for our purposes, we will limit the discussion to the Kirshenbaum and X-SAMPA systems.

Looking again at the voice strings from our log - it appears they are using the X_SAMPA system. e.g. double quotes for emphasis, use of { symbols etc.

Conveniently, Katie Russ has created a website using Amazon's Polly Speech API that can take (X-SAMPA) IPA strings and convert them into sound. And with configurable voices/accents too!
You can find her website here:

If you're so inclined, try copying and pasting the following text:
"nju "IN|gl@nd *"haI|%we
into the website and you should hear the equivalent "New England Highway" pronounced. Very cool!

IPA-Reader Website Example

This got monkey thinking - a website is good for one or two strings but copying and pasting hundreds of entries from a device log is not practical.
Amazon Polly isn't free either so that prompted a search for a free alternative.

We found an open source C library called espeak-ng.
While you can compile/build it from scratch, Ubuntu also has it as an installable debian package. Much easier!
Note: We had issues trying to install it on Ubuntu 16.04 (probably because its no longer supported) but had no issues with Ubuntu 20.04.

Here is the espeak-ng help page for Ubuntu 20.04.
It allows users to input a (Kirshenbaum) IPA string and then hear/record the corresponding audio to a WAV file.

To install it on Ubuntu 18+, type:
sudo apt-get install espeak-ng

Then you can use it like:
espeak-ng "[[Hello w3:ld]]"
Note: string is enclosed in double quotes

Its not as polished as Amazon Polly and it can get a bit confused by some strings but it works reasonably well (if you don't mind some Steeeephen Haaaawking like ASMR)
Also note it uses Kirshenbaum strings as input and not X-SAMPA strings (like in the voice log) so some conversion is necessary.
Note: We found that the comparison/conversion chart listed in Wikipedia did not quite translate for all of our data so our script had to do some customized conversions. These conversions may not sound correct to other users depending on the language.

Here's a summary of the conversion process we figured out:
- Enclose the input string between '[[' and ']]' chracters to have the symbols interpreted rather than spelled out  
- Replace '{' characters with 'a'
- Replace double quotes " with single quotes ' for primary stress indication
- Replace '%' with ',' for secondary stress indication
- Replace 'A' with 'a'

For more details on the Kirshenbaum system see here.

For example, the voice log string is:
"h{|m@nd *"{|v@n|ju
which converts to:
[['ha|m@nd *'a|v@n|ju]]

And to hear it spoken as "Hammond Avenue" type:
espeak-ng "[['ha|m@nd *'a|v@n|ju]]"
Note: enclosing double quotes when entering via command line.

To save it as a WAV file you can use:
espeak-ng "[['ha|m@nd *'a|v@n|ju]]" -w output.WAV
Some converted strings may not sound right / recognizable so for those (hopefully rare) occasions, you can enter the voice log string into the website to hear the spoken phrase. It may also help to adjust the espeak-ng playback speed using the -s argument.

The Script

Now that we know how to get an audio file for one voice string, lets try automating the string extractions from the entire voice log.

Input file: vpm_log_all.txt
Output files: An HTML report table containing the original line text and line number, the converted text file for input into espeak-ng and a link to the output WAV file.

Here's the generalized script logic:
Open/Read vpm_log_all.txt
For each line:
    Extract the voice string from the line text
    Convert and write the voice string to LINENUMBER.txt
    Call "espeak-ng -s 100 -w LINENUMBER.WAV -f LINENUMBER.txt" to generate the WAV file
    Store (LINENUMBER, line text, converted voice text, WAV filename) in list

Read list
Print HTML table from list ("Log Line No.", "Log Line Text", "Processed espeak-ng string" (linked to text file), "Audio File" link)

And here is how to run it - this outputs .WAV, .txt and Report.html files to the given "op" output directory:
python3 -f vpm_log_all.log -o op 2020-05-17 Initial
Directory  op  Created
185.WAV = [[*'maks|,wEl 'strit]]
2225.WAV = [['ha|m@nd *'a|v@n|ju]]

Processed 875 voice entries. Exiting ...
Here's the contents of the "op" output directory:

And here's what the "Report.html" looks like (with redacted timestamps):

From the above screenshot, you can see the table is pretty simple - click on the links to view either the input text file or open the WAV audio file.

Note: We tried calling "espeak-ng" from the script using the converted voice string (instead of a text file containing the string) but the generated WAV file kept getting truncated for an unknown reason i.e. words were missing. Using the text file input seemed to avoid this issue.

The script was written/tested with Python3 on Ubuntu 20.04 LTS but we only had one set of test data so it probably needs some tweaking.

Final Thoughts

We have succesfully written a script to extract and convert IPA strings from a Garmin nuvi 56LM (2014) voice log into WAV files.

The script's code (see "process_voicestring" function) for converting the voice log string to the espeak-ng input string may require some adjustment depending on the user data/language settings.

The script may also work for other models of Garmin GPS but this has not been tested.

If you see/have seen similar voice logs in other devices, it would be great to hear from you in the comments section.

Tuesday, 10 March 2020

A Monkey Forays Into USB Flashdrives

What a Feeling Indeed!

Recently monkey was tasked with extracting data from a broken USB flash drive that had previously been "repaired" by another party. It still did not work however.
The following post details the journey to getting the device working again.
It also shows the power of reaching out to more experienced experts. You never know where/when you might find that missing piece of the puzzle!

Special Thankyous to the following monkey-enablers for their assistance/advice/enduring my endless emails:
- Sasha Sheremetov from Rusolut
- Jeremy Brock from RecoverMyFlashDrive
- Maggie Gaffney from Teel Technologies USA
- Cory Stenzel (Twitter - Cory also encouraged me to write this post)
- Ryan Olson

We started with a non-functioning 16 GB USB flash drive with no case or obvious branding. The original USB connector had broken off and was replaced with a different one provided by the repairer.
The drive looked similar to this example posted by atomcrusher on Reddit:

Example of a repaired connector for a USB Flash Drive (Note: this is NOT our repair device)

Initial Observations

- Plugging the device in to a USB power source via a USB Volt/Current meter showed it was not consuming much/if any current. We did not plug it into a PC initially because we did not know what data was stored on the device (e.g. USB Rubber Ducky ). We later used an older sacrificial standalone PC during subsequent testing.

- The USB drive only used 4 pads for the connector (GND, 5V, D+, D-) so it was probably USB2. USB3 uses 9 pins (USB3 uses more data channels). The website has a good introduction to USB devices here.

- There appeared to be a missing component near the activity LED.

- There was a Phison PS2251-68-5 NAND controller on one side of the circuit board and a Toshiba 16 GB Embedded MultiMediaCard (e.MMC) chip on the other side.
For older USB flash drives, the NAND controller is usually a squared shaped chip (e.g. LQFP48 = Low Profile Quad Flat Pack with 48 pins) similar to the one shown in the Reddit example pic.
The controller chip is responsible for translating the host device's (ie PC's) USB read/write instructions into commands that the memory chip can understand. The controller also looks after wear levelling of memory and determines how each write is stored (physical location/any error correction/data deletion).
The presence of an e.MMC memory chip was somewhat unexpected. Older USB flash drives typically use a single NAND controller chip and separate NAND memory chips (usually TSOP48 chips where TSOP48 = Thin Small Outline Package with 48 pins.). Here's an example diagram of a TSOP48 package from the Elnec chip reader website.
An e.MMC chip is different because it combines its own onboard memory controller with some NAND memory. The e.MMC chip package is usually BGA (BGA = Ball Grid Array). ie the signals travel via solder balls on underside of the chip. There are no other external pins like LQFP48. See Wikipedia for futher details .
A quick Google of the Toshiba part number written on the BGA confirmed that it was a 16 GB e.MMC chip.
Hmmm ... e.MMC chips are usually more expensive than regular NAND memory. Why would a USB flash drive manufacturer use e.MMC chips when there's already a dedicated Phison NAND controller chip on the board?
Sasha from Rusolut mentioned this possibility during his very informative Visual NAND Reconstructor course. Typically, e.MMC chips used in this type of arrangement are discounted factory seconds - they have a faulty/disabled internal controller but the NAND memory is OK so they're sold as cheaper NAND chips.
Interestingly, Rusolut also have a solution to read e.MMC chips via NAND interface points (ie it accesses the NAND memory directly and avoids talking to the internal controller). Unfortunately, I don't have access to that wonderful tool and NAND reconstruction into a filesystem can be very complicated and heavily device dependent. For example, two flash drives which use the same controller but run different firmwares can organize the NAND in different ways. 
Normal functioning e.MMC chips can also be removed and read via adapters (e.g. USB-eMMC adapters) that talk to the internal e.MMC controller. However, because our device's e.MMC probably has a faulty/disabled on-board controller, such a read would not be accurate/reliable.

So our initial strategy was to try to repair the flash drive, get it recognized by a PC and then image it via FTK Imager.

The Journey begins!

We started by gathering as much information about the repair drive as possible.
We Googled for any datasheets for:
- the Phison controller model
- the Toshiba memory chip model
We didn't have a lot of success finding datasheets however, we also noticed there was a serial number (POVK568FS1400311) printed on the PCB.
Googling for the serial number led us to this post by Jeremy from RecoverMyFlashDrive.
From the post's pictures, we could see the same circuit design layout, the same component labels and the same controller chip as the repair job but a different e.MMC chip (still a Toshiba e.MMC though).
To better conceptualize the layout, I recommend you check out the board pictures via the RecoverMyFlashDrive link above and have them open in a separate tab to follow along.
After finding Jeremy's post, I wrote him an email (not really expecting a reply) but Jeremy ended up becoming an incredible help.

Maggie Gaffney from Teel Technologies is a great friend always willing to help and she also teaches a course in "Board Level Repair for Digital Forensic Examiners".  So naturally, she was one of the first people I reached out to for advice.
Maggie suggested soldering to the exposed TSOP48 pads and then trying to get a read from the e.MMC chip. Unfortunately, this would also require some NAND reconstruction which I was trying to avoid.
Maggie also provided some tips about using hot air to reflow the e.MMC BGA chip. Reflowing / heating up a BGA chip with a hot air rework station can help re-connect any solder balls which have become disconnected. However, if there's any epoxy/filler between the chip and the PCB, the reflow may not work. Because there appeared to be some sort of translucent silicon filler used, I held off doing any reflow on the e.MMC chip.

I recently saw a Rusolut training testimonial Youtube video by Cory Stenzel and recognized his name from his posts on various digital forensics Google groups. Hoping that he might also be able to help, I sent him an email and was pleasantly surprised when he replied that he was familar with this monkey from attending both SANS FOR585 and Maggie's board repair course.
Cory also had some great ideas/research to share. For starters, Cory sent me this link about "Test mode" for USB Controllers.
Using the power of GoogleTranslate baked into Chrome, it seemed like a promising lead however, Cory noted that he hadn't been able to use it on an exhibit yet.
So I checked with Sasha from Rusolut who added that test mode might help troubleshoot if the (Phison) controller is working but it will not allow for data transfer from the NAND. D'OH!
However, Cory's link also had a diagram listing the pinouts for the Phison USB Controller PS2251-67. This was helpful because I was unable to find a pinout diagram for the PS2251-68-5 controller on our repair device. Ass-uming the pinouts would not have changed much/at all for such a revision, this diagram could help us determine if the controller was being supplied with the correct voltage.
Here's the pinout pic with the relevent pinout on the right hand side:

PS2251-67 Controller pinout from (use Right Hand Side)

Cory had some other good suggestions such as:
- Apply a little bit of pressure to each of the (Phison) controller pins to check that they are soldered correctly to the board. I took Cory's advice a fraction further and used a soldering iron and some flux gel to ensure each Phison pin was electrically connected to its corresponding pad. I then used a multimeter to verify the soldering reflow did not introduce any shorts between adjacent pins.
- Using a good thermal imaging camera to look for hotspots (ie potential short circuits) when the device was plugged in. Unfortunately, I don't have a thermal camera but it does seems like a worthwhile future investment.

I also asked for advice from the Teel Technologies "Physical Mobile Forensics" Google Group and Ryan Olson replied that most of the success stories he's seen involved transplanting the memory chip onto an identical donor. Unfortunately, we probably would not be able to confidently source an identical donor and  transplanting a BGA chip is not as easy as a TSOP48 (at least not for this monkey). Additionally, Rusolut support mentioned that if the controller hardware and firmware is not an exact match, the controller on the new board may erase/format data.

So to summarize our progress so far - we have a USB2 flash drive that does not draw current and is not recognized by an isolated Windows PC when its plugged in.
Plugging in a known working USB2 flash drive into the same port using the same cabling works OK before/after the damaged device. So there's something wrong with our repair device and not our test PC.

Looking at the pictures from Jeremy's post we can see that besides the controller and the e.MMC components, most of the other components are either resistors (small black rectangular components) or capacitors (varying size brown rectangular components).

After unsuccessfully plugging it into a PC, we noticed there was a missing component (labelled "BC9") next to the LED (labelled "D1"). Sourcing a replacement component was somewhat tricky because we did not know the exact make/model of device. However, by opening up a bunch of test USB2 flash drives with the same approximate age/capacity, we eventually found a similar looking LED arrangement on a Verbatim Store N Go flash drive. This reference drive used a different Phison controller but had a similar LED layout configuration. The missing component "BC9" was a capacitor connected to the LED. This was also confirmed later when we saw Jeremy's post pictures.
So using a pair of soldering tweezers and some flux, we transplanted the LED and capacitor from our test Verbatim Store N Go drive to our repair drive.
We tried our repair device again but it still could not connect. Perhaps this was all a "LED herring" after all? *crickets*

After double checking the repaired USB connector against the following pic from

USB A Connector Pinouts

it appeared the previous repairer had soldered on the connector in REVERSE (ie 5 V was connected to Ground and vice versa). Some choice adult phrases may have been uttered. To quote Blackadder: "I think the phrase rhymes with clucking bell".
Silly monkey was also angry that he did not check the connector orientation first.

Hint: The outer metal of the USB connector is typically connected to Ground. By using a multimeter (on the continuity setting) with one probe on the USB ground pin and the other probe on the USB connector should indicate a connection.
Another helpful hint is to cut off the PC end (male) plug of a USB extension cable and then connect the other end of the cable (female) to your test device. This will make it easier to probe the various USB signals later. Rather than trying to fit a multimeter probe into the flash drive's USB connector (or on the pads only located on one side of the PCB), you can now connect a probe to the exposed wires of the extension cable and probe easier. Here's pic of a similarly modified cable from a post here:

Use a modified USB extension cable as a tester cable
So we now know the USB Ground and 5V pins were reversed - what about the other USB pins, D- and D+? Were they reversed as well?
Fortunately, we had another test device which used the same Phison controller so we traced the USB D- / D+ pins to the test Phison controller and connected our repair device accordingly.

We plugged in our newly corrected device and still nothing ... Mother of a baboon!
Perhaps some damage was done by the reversed voltages?
Using the multimeter, we checked all capacitors and resistors for continuity and there were no shorts detected on any capacitors. Some resistors were reading 0 Ohms but apparently this is not uncommon when a manufacturer wishes to either future proof a design or use the 0 Ohm resistor as a kind of overcurrent component that blows before whatever else is downstream.
While the device was plugged into the PC, I started measuring some voltages across some capacitors and was surprised when Windows played the USB connection sound and briefly allowed File Explorer to view some directory contents. This only lasted about 30-60 seconds (long enough to grab a screenshot of a directory) but it was a good sign - the drive wasn't complete cactus. Interestingly, the volume name displayed was "Verbatim16". Unfortunately, I couldn't reconnect any further despite many subsequent attempts.

I shared this development with Jeremy from RecoverMyFlashDrive and he helpfully found a similar drive and shared some capacitor voltages that he saw on his device. One voltage in particular was different to our repair device. Jeremy observed 5 V across a capacitor "C7" connected to pin 47 of the Phison controller. On our repair device, the voltage to pin 47 was 3.7 V.
Looking at the pinout showed pin 47 was the controller's 5 V supply pin. So if we could provide it with 5 V, it *might* just work.

For completeness, the controller was also getting its expected 3.3V on various controller pins - it just seemed to be the 5 V pin that was undervolted. Interestingly, the TSOP48 pins for Vcc were also getting 3.3 V so a read via the TSOP pins and reconstruction via Rusolut VNR probably would have worked as well but it would have taken me a lot more time.

My first thought was to solder a copper jumper wire from the 5 V USB pin direct to pin 47 but after cross checking with Jeremy, it was decided not to do this in case it bypassed some internal controller safety mechanisms.
After unplugging the repair device from the PC, I measured the resistance between the USB 5 V pin and pin 47. On the repaired device, the resistance was 170 Ohms. On a reference device with the same controller (but different PCB layout), the resistance between USB 5V and pin 47 was ~2 Ohms. Bit of a difference!
So on the repair device, I traced the path between the USB 5V and pin 47 and found most of the resistance seemed to be coming from "R1".
If "R1" had failed and had increased its nominal resistance, then there would be less available current to pin 47. Remember: Voltage (V) = Current (I) x Resistance (Ohms).
So I decided to replace "R1" with a 1 Ohm resistor instead. This would make the USB 5 V to pin 47 resistance on the repair device comparable in value to the reference device.

I plugged our newly modified test device into the test PC and  ... BINGO!

The drive connected / was recognized and stayed connected. I was able to grab some screenshots of each directory and then finished imaging it via FTK Imager and a software write blocker program.

Bananas all round!

Further Thoughts

Reversing the voltage on a USB flash drive isn't necessarily a permanent drive killer.

When performing data recovery, don't reach for the nuclear option first (eg chipoff / NAND reconstruction)- it might just be one or two components that require replacement.

Don't be afraid to reach out to others for advice.

With the increasing levels of on device encryption, there will be a corresponding demand for repairing damaged devices instead of removing the memory and reading off-device.
Consequently, having basic hardware troubleshooting skills will be increasingly useful.

If anyone is interested in repair courses, Maggie teaches a "Board Level Repair for Digital Forensic Examiners" course. I've been wanting to attend for a while now and hope to experience it soon.

For further repair tips/techniques, check out these Youtubers:

HDD Recovery Services

Justin from The Art Of Repair

Jessa Jones ipadrehab

Louis Rossman

Finally, if you have any comments, suggestions or resources that can help others troubleshoot USB Flash devices, please leave a comment below.

PS Please don't ask me to recover your personal Flash Drives. Ask Google instead :)

Saturday, 14 October 2017

Monkey takes a .heic

The hills are alive ... with the compression of H.265!

With iOS 11 and macOS High Sierra (10.13), Apple has introduced a file container format called High Efficiency Image File Format (aka HEIF - apparently its pronounced "heef"). Apple is using HEIF to store camera/video/Apple "Live Photos". HEIF is based on multiple standards such as:
- ISO Base Media File Format ISO (14496-12) for structuring data sections within the file container
- ISO/IEC 23008-12 MPEG-H Part 2 / ITU H.265 for compressing the actual still picture and video data. Also referred to as High Efficiency Video Coding (HEVC). Theoretically, HEIF could use other compression algorithms but Apple is using it exclusively with HEVC / H.265.

Some benefits of HEIF are:
- It approximately halves the file size for a given image/video quality.
- It allows for a single file to contain multiple media (eg multiple animated still pictures AND sound e.g. an Apple "Live Photo").

Apple HEIF images will have a .heic file extension. Apple HEVC encoded movies will have the familar Quicktime .mov extension but internally they will use HEVC / H.265 compression. The ISO Base Media File Format ISO (14496-12) is based upon the Quicktime file structure and so it will apply to both .heic images and HEVC .mov files. 

Because it uses a more complex compression algorithm than previous standards (eg H.264 and JPEG), only recent model Apple devices have the required hardware to create HEVC content.
According to Apple's 2017 WWDC presentation 503 "Introducing HEIF and HEVC", to create HEVC pictures/video you need (at least) an iPhone 7 / iPad Pro (A10 Fusion chip) running iOS 11 or a 6th generation Intel Core processor running macOS 10.13 High Sierra.
Software decoding support is apparently available for all Apple devices (presumably running iOS 11 / High Sierra) but playback performance will probably suffer on older hardware.

For the rest of this post we will discuss:
- how to view .heic and HEVC .mov files
- the file format for .heic files
- the file format for HEVC .mov files

We won't be discussing how HEVC / H.265 compression works. For a quick overview on some basic concepts and the difference between H.264 and H.265, please watch this video.

And before we dive any deeper ...
Special Thanks to Maggie Gaffney from Teeltech USA for providing us with iPhone 8 Plus test media files.
We also used sample .heic files from an Ars Technica review (iPad Pro) and sample files provided to the FFmpeg forum (iPhone 7 Plus).

Viewing & Compatibility Issues

Here is an article showing how to set up an iOS 11 device to save/transfer .heic files in their original format (Camera set to "High Efficiency" and "Photos - Transfer to Mac or PC" set to "Keep Originals"). Apple can auto-magically convert .heic files to .jpg files (and h.265 .mov to h.264 .mov) when transferring to non-compatible devices/destinations (eg PC or emails). So if you're not receiving .heic files, check those iOS settings.

Apart from viewing them natively on iOS or High Sierra (eg using Apple Photos or Preview), we found the easiest way to view .heic files was using this free Windows HEIF utility by @liuziangexit.
Note: there are two versions - Chinese and English. Being the uncultured lapdog monkey that we are, we downloaded the English version. Be sure to read the readme file included. Running it on Windows 10 (VM) also required installing the signed Microsoft C++ Redistributable package which was conveniently included in the download zip file.

There is also a website that converts .heic to .jpeg but this may not be appropriate for sensitive photos.

For playing HEVC/H.265 encoded .mov files, we found that IrfanView and VLC player worked OK (IrfanView seemed to have better performance than VLC when viewing high resolution videos).

FFmpeg (v3.3.3) can also be used to screen grab frames  (1 per sec) from a H.265 .mov. The command is:
ffmpeg.exe -i -vf fps=1 outputframe_%d.png
This will result in "outputframe_1.png", "outputframe_2.png" etc. being generated to the current directory.

For more compatible playback, we can convert an H.265 .mov into an H.264 .mov. The command is:
ffmpeg.exe -i -map 0 -c copy -c:v libx264
This copies all other streams (eg audio, subtitles) to the new output file and re-encodes/outputs the source video stream to H.264. See here for details on using the FFmpeg map argument.

We found the easiest way to send a test .heic from an iPhone to a PC was to upload it to Dropbox which has been updated to support .heic and H.265 encoded .mov files. You can view both .heic and .mov files from the website. Unfortunately, it appears that Dropbox might rename the files upon upload. We were expecting to see something like "IMG_4479.heic" but the filename on Dropbox was something like "Photo Oct 08, 10 20 05.heic". Consequently, a hash compare of the source/destination files may be required to verify exact copies.

Exiftool (v10.63) added improved support for HEIF and it will display the EXIF data from an Apple generated .heic or H.265 .mov file. It has not been confirmed if iOS created .heic / HEVC .mov files will retain ALL of their original EXIF metadata after being auto-magically converted to .jpg / H.264 files.

We have not been able to find a non-Apple viewer for HEVC encoded "Live Photos". Trying to transfer them via Dropbox resulted in a "Live Photo" .heic file containing a single image (no sound or other animation). Sorry, no "Live Photos" for you!

File Structures

Now that we know how we can view iOS created HEIF images and videos, lets take a closer look at the actual file formats.
This will be a (reasonably?) short overview - we aren't going to become "data masochists" and delve into every field or the compression side of things. Maybe in a future post (especially if you've been a bad, bad, dirty, dirty monkey and feel the need to be punished LOL) ...

Apple created .heic and .mov files are BIG Endian.

Both .heic and .mov files are based on the ISO Base Media File Format. This means a .heic or .mov file container is divided into dozens of functional "boxes" of data. The start of each box will be marked with a 4 byte box size (typically) and a four byte box type string (eg. 'ftyp', 'mdat', 'meta'). Within the box, there will be other data fields which may consist of other boxes and/or a structured pattern of bytes. So there is a complicated hierarchy of boxes within boxes thing happening which makes it difficult to quickly understand every detail. The majority of the bytes (ie compressed data) will be stored in an "mdat" box. Other boxes will be used to store meta data about how to access/treat the data in those "mdat" boxes.
For further details on how these boxes are structured, please refer to the ISO Base Media File Format standard. Both it and the Quicktime movie format document will be your best friends for this section. FYI the ISO Base Media File Format is also used for .mp4 and .3gp files - so learning about this format will aid in understanding multiple types of media files.

Other handy references include:
- Chapter 3 of Lasse Heikkila's HEIF implementation thesis
- the Nokiatech HEIF Github site
- the 2017 Apple WWDC HEIF presentations (follow the transcript and slide PDF links) for the HEIC File Format and the Intro to HEIF amd HEVC.

For an Apple iPhone 8 Plus .heic file (containing a single 4032 x 3024 image) the file structure can look like this:

ftyp (size=0x18, majorbrand = 'heic', minorversion = 0, compatiblebrands = mif1, heic)
meta (size = 0xF74)
    hdlr (size = 0x22, handler_type is "pict" i.e. file is an image)
    dinf (size = 0x24)
    pitm (size = 0xE, item_ID = 0x31 = primary item)
    iinf (size 0x43D, entry_count = 0x33 = number of items stored)
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x1, item_type = hvc1, item_name = ""  [Tile 1]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x2, item_type = hvc1, item_name = ""  [Tile 2]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x30, item_type = hvc1, item_name = "" [Tile 30]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x31, item_type = grid, item_name = ""  [derived image from all tiles]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x32, item_type = hvc1, item_name = ""  [thumbnail]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x33, item_type = Exif, item_name = ""  [EXIF]

    iref (size = 0x94, version = 0, contains array of SingleItemTypeReferenceBox)
        dimg (size = 0x6C, from_item_ID = 0x31, reference_count = 0x30, to_item_ID = 0x1, 0x2 ... 0x30) [derived image]
        thmb (size = 0xE, from_item_ID = 0x32, reference_count = 0x1, to_item_ID = 0x31) [thumbnail]
        cdsc (size = 0xE, from_item_ID = 0x33, reference_count = 0x1, to_item_ID = 0x31) [content description ref / exif]

    iprp (size = 0x6F3)
        ipco (size = 0x5AD) = ItemPropertyContainerBox = property data*
            colr (size = 0x230) = Colour Information 1
            hvcC (size = 0x70) = decoder configuration 1
            ispe (size = 0x14) = spatial extent 1-1
            ispe (size = 0x14) = spatial extent 1-2
            irot (size = 0x9) = Image rotation 1
            pixi (size = 0x10) = Pixel information 1
            colr (size = 0x230) = Colour Information 2
            hvcC (size = 0x70) = decoder configuration 2
            ispe (size = 0x14) = spatial extent 2-1
            pixi (size = 0x10) = Pixel information 2
        ipma (size = 0x13E) = Item Property Association = connects property data in ipco to item numbers*

            List of 0x32 items. Each item has the structure [item number(2 bytes), size (1 byte), data (size bytes)]
    idat (size = 0x10)
    iloc (size = 0x340, version = 1, offset_size = 4, length_size = 4, base_offset_size = 0, index_size = 0, item_count = 0x33 )
        [item_id = 0x1, file offsets used, base_offset = 0, extent_count = 0x1, extent_offset = X1, extent_length = Y1]
        [item_id = 0x2, file offsets used, base_offset = 0, extent_count = 0x1, extent_offset = X2, extent_length = Y2]
        [item_id = 0x33, file offsets used, base_offset = 0, extent_count = 0x1, extent_offset = X33, extent_length = Y33]

mdat (size = variable, contains data on EXIF / thumbnail / image data)

It looks a little daunting (and this doesn't even show all of the boxes/fields!) but once you figure out which fields are relevant, its not too bad.  We've color coded some sections to make it more followable/wake up those weary eyes.

The ftyp section declares the 'majorbrand' (i.e. file type) as "heic".
The meta section declares how to interpret the raw data stored in the mdat section. Notable sub-boxes include:
    hdlr = The 'handler_type' is set to "pict" which means this is an image (as opposed to a video).
    pitm = Specifies the Primary Item number (eg item_ID 0x31)
    iinf = Contains a list of ItemInfoEntrys. The number of items and sizes will change with resolution/shape (eg camera specs, square photo).  From the 2017 WWDC 513 presentation and actual iOS samples we've observed, images are divided/stored as tiles.
            For a 4032 x 3024 resolution image, there were 0x33 items declared in each .heic file. These consisted of:
            0x30 items with each item_type = 'hvc1'. Each item corresponds to a 512x512 tile.
            1 'grid' item represents the full derived image
            1 'hvc1' item is used for the 320x240 thumbnail
            1 'Exif' item is used for storing EXIF data
    iref = contains array of SingleItemTypeReferenceBox items. From this section we can see that item_ID = 31 is a derived image ('dimg') that refers to item_IDs 0x1 to 0x30 (tiles). There are also references to the thumbnail and exif items.
    iprp = connects item_IDs in the 'ipma' sub-section to properties in the 'ipco' sub-section. *We were unable to find much public documentation on how this is implemented (apart from the Nokia HEIF Github source code).
    iloc = contains file offsets for each item_ID section. e.g. For EXIF (item_ID = 0x33), the extent_offset = 0x000043DB,  extent_length = 0x000007F8. So if we go to the file offset at 0x000043DB, we will see the EXIF item data.
The mdat section contains the raw image data, thumbnail and Exif information.

Due to the tiling, full file recovery could be a bastard a lot more complicated compared to recovering a jpeg (where you can carve everything between the 0xFFD8 and 0xFFD9 markers).
As iOS 11 also uses file based encryption, it *should* be impossible to carve & recover .heic files anyway.
However, if those .heic files were also copied to a separate non-encrypted device (eg PC) and then corrupted/deleted, it *may* be possible to repair or recover some/all of the tiles (theoretically!).

OK, suck it up buttercups ...because there's more!

Here's the file structure for a 6.73 second 7.3 MB Apple HEVC / H.265 encoded .mov taken with an iPhone 8 Plus:

ftyp (size = 0x14, majorbrand = 'qt  ')

wide (size = 0x8)
mdat (size = 0x00746120, contains HEVC / H.265 video data)
moov (size = 0x0028FA)
    mvhd (size = 0x6C, version = 0, creation_time = 0xD5FFE81E (secs since 1JAN1904), modification_time = 0xD5FFE825, ­

                timescale = 0x0000258 = 600 dec. units per sec, duration = 0x00000FC7 = 4039 dec units => 4039/600 = 6.73 secs, 
                next_track_ID = 0x5)
    trak (size = 0x0FE6)
        tkhd (size = 0x5C, version = 0, creation_time =
0xD5FFE81E, modification_time = 0xD5FFE825, track_ID = 0x1, 
                  duration = 0xFC7, width = 0x07800000 => 0x780 = 1920 decimal, height = 0x04380000 => 0x438 = 1080 decimal)
        tapt (size = 0x44)
        edts (size = 0x24)
        mdia (size = 0xF1A) = media box
            mdhd (size = 0x20, version = 0, creation_time =
0xD5FFE81E, modification_time = 0xD5FFE825, timescale = 0x258, 
                        duration = 0xFC7)
            hdlr (size = 0x31, component type = mhlr = media handler, component subtype = vide, component manufacturer = appl,

                     component name = "Core Media Video"
            minf (size = 0xEC1) = contains file offsets to samples/chunks of samples
    trak (size = 0x07B4)
        tkhd (size = 0x5C, version = 0, creation_time =
0xD5FFE81E, modification_time = 0xD5FFE825, track_ID = 0x2, 
                  duration = 0xFC7, width = 0, height = 0)
        edts (size = 0x24)
        mdia (size = 0x72C)
            mdhd (size = 0x20, version = 0, creation_time =
0xD5FFE81E, modification_time = 0xD5FFE825, timescale = 0x000AC44 =
                        44100 samples/sec, duration = 0x00049000 = 299008 samples = 6.78 sec)
            hdlr (size = 0x31, component type = mhlr = media handler, component subtype = soun, component manufacturer = appl,

                     component name = "Core Media Audio"
            minf (size = 0x6D3) = contains file offsets to samples/chunks of samples
    trak (size = 0x042E)
        tkhd (size = 0x5C, version = 0, creation_time =
0xD5FFE81E, modification_time = 0xD5FFE825, track_ID = 0x3, 
                  duration = 0xFC7, width = 0, height = 0)
        edts (size = 0x24)
        tref (size = 0x20)
        mdia (size = 0x386)
            mdhd (size = 0x20, version = 0, creation_time =
0xD5FFE81E, modification_time = D5FFE825, timescale = 0x258, 
                        duration = 0xFC7)
            hdlr (size = 0x34, component type = mhlr = media handler, component subtype = meta, component manufacturer = appl,

                     component name = "Core Media Metadata"
            minf (size = 0x32A) = contains file offsets to samples/chunks of samples
    trak (size = 0x0271)
        tkhd (size = 0x5C, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, track_ID = 0x4, 

                  duration = 0xFC7, width = 0, height = 0)
        edts (size = 0x24)
        tref (size = 0x20)
        mdia (size = 0x1C9)
            mdhd (size = 0x20, version = 0, creation_time =
0xD5FFE81E, modification_time = 0xD5FFE825, timescale = 0x258, 
                        duration = 0xFC7)
            hdlr (size = 0x34, component type = mhlr = media handler, component subtype = meta, component manufacturer = appl,

                     component name = "Core Media Metadata"
            minf (size = 0x16D) => contains file offsets to samples/chunks of samples

udta (size= 0x08)
free (size = 0x400)
meta (size = 0x5BD)
    hdlr (size = 0x22, component subtype = mdta)
    keys (size = 0xC9) => contains various metadata field names
    ilst (size = 0xCA) => contains various metadata field values
    free (size = 0x400)

free (size = 0x88)

We can see some familiar 4 letter strings (reckon you might be spouting some others of your own by now ...) and the offset information is now contained in the 'moov' section (recall that offset info is stored in the 'meta' / 'iloc' section for a .heic).
Also, instead of utilising "items" like .heic, the movie is organised into traks (eg video trak, sound trak). These 'trak' boxes include file offsets to the 'mdat' section (via 'trak' / 'mdia' / 'minf').
The 'moov' box has a movie header atom labelled 'mvhd'. This shows the length of the movie and creation/modified dates (amongst other things).
There were 4 traks recorded - one for video (track_ID=1), one for sound (track_ID=2) and two for meta data (track_ID=3 and 4). The second (smaller) metadata trak (track_ID=4) may be extending the first (track_ID=3) metadata trak (due to space limitations?) as the metadata strings are different but seem related.
'free' marks boxes that can be ignored/skipped
'meta' marks a box containing metadata however, the 'meta' structure from a .mov *will not* match the 'meta' data structure from a .heic image. Presumably Exiftool will grab metadata from both 'trak' and 'meta' boxes.

In other observed H.265 .mov files (both smaller and larger), multiple 'mdat' sections were observed. This may be related to the existence of a 'hoov' box which we couldn't find any documentation on. The 'hoov' box appeared at an lower (earlier) file offset than the 'moov' and also contained 'mvhd' and 'trak' boxes etc.
Perhaps the 'hoov' box was a previous 'moov' box that had its name modified so its data can be overwritten? eg as file grows in size, data gets re-written? #SpeculatorMonkey

Final Thoughts

Oh, my aching lederhosen!
And this post has only just scratched the arse surface of the HEIF-y beast.
There are a lot more possibilities with HEIF than what Apple has currently implemented. The Nokiatech Github site demonstrates a bunch of different image file possibilities (eg single images, sequences of images, HD movies, combined images/video).

Although we weren't able to capture a native Apple "Live Photo" for examination, we *suspect* it will use a sequences .heics file and have a 'moov' box in addition to 'ftyp', 'mdat' and 'meta' boxes. This was kinda shown on slide 60 of the 2017 Apple WWDC slides for High Efficiency Image File Format.

This post by states that Apple "Live Photos" initially consisted of a 12 MP jpeg with 45 frames of H.264 .mov at 15 fps (i.e. 3 secs video = 1.5 secs before/after button press).
This anandtech forum article states that "Live Photos" are:
"1440x1080 HEVC on certain devices, albeit paired with HEIC images now instead of JPEG. There is also the option of leaving it has 1440x1080 h.264 with JPEG though."

Anyhoo, if you are able to catch a "Live Photos" unicorn file, we'd be very interested to hear about its file structure (leave a comment?).

For "Live Photos" we tried directly connecting an iPhone 8 running iOS 11.0.3 to a Windows 7 PC and was able to see the DCIM folders. The iPhone 8 was set to the default "High Efficiency" Photos with Mac/PC transfer set to "Keep Originals". However, after copying the files over, when we looked at the transferred .heic file structures on the PC they were single images.
There were no 'moov' or 'trak' items. So it looks like iOS is not openly exposing their "Live Photo" file structure to non-Apple devices. Boo! :'(

Finally, if you know of any other easy to install/use .heic viewers or have any thoughts/suggestions, please leave a note in the comments.

Monday, 21 August 2017

Monkey Unpacks Some Python

UNPACK-ing Python .. Now with added monkey!
Some forensic folks have suggested that a Python tutorial on how to read/print binary data types might be helpful to budding Python programmers in the community.
So in this post, we will simulate reverse engineering a fictional contact file format and then write a Python script to extract/print out the values.
For brevity, this post ass-umes the reader has a basic knowledge of Python (i.e. they can launch a script and know about functions/assigning variables etc.). There are plenty of introductory tutorials online - if you are a beginner, you might want to check out the Google Developer Python course before proceeding.

The script ( has been tested with both Python v2.7.12 and Python 3.4.1 on a Win7x64 PC.
Historically, Python 2 had more supported 3rd party libraries. Consequently, it was the first version of Python that this monkey learned and we are actually more familiar with Python 2. Python 2's End Of Life is currently scheduled for April 2020 so there's a few years left. However, as this script does not rely on 3rd party libraries, we have adapted it to run on both Python 2 and 3.
The main difference affecting this script was that Python 3 treats strings as Unicode by default so we had to add an encode('utf-8') call when searching through our data file.

There is more than one way to code a solution. We have tried to make this code easy to follow instead of making it "Pythonic" (whatever that even means) or by adding lots of error checking code (if you write a script, you should know how it to use it!).
The Python script ( and sample binary file (testctx.bin) will be posted to my brand-monkey-spanking-new GitHub Python Tutorials folder.

So, here's a screenshot of the "testctx.bin" file we want to read:
Screenshot of "testctx.bin" (brought to us courtesy of WinHex!)

Note: The first contact record is highlighted and offsets are listed in decimal. Curious George is ... curious?

Using some reverse engineering strategies that we previously wrote about here we can make a few observations regarding the structure of each Contact record ...

  • We can see there's a repeated "ctx!" string before each Contact record.
  • After each record's "ctx!" field, there is a Little Endian 2 byte field that seems to increase with each subsequent record (eg 0x0100 at decimal offset 68, 0x02FF at decimal offset 516, 0xFFFF at decimal offset 804). For initial classification, we will say its an index record number.
  • Each record has a UTF16LE (ie 2 bytes per character) string that contains a name (eg George).
  • Each record has a UTF8/ASCII (ie 1 byte per character) string that contains a phone number (eg 5551234).
  • Before each of the strings, there is a one byte integer corresponding to the string size in bytes.
  • The last field seems to be a 4 byte field. By observing which bytes vary and which bytes remain constant(ie the left most bytes change more rapidly than the rightmost bytes), we suspect the last field is a Little Endian timestamp field. Feeding in the first record's last 4 bytes (ie 0x26CDDB56) into DCode results in a valid date/time for a Unix 32 bit Little Endian timestamp
DCoding the Contact timestamp

So here's our contact record format:
Contact record data structure

And here's a summary of what we want the script to do:

1. Open "testctx.bin" file (read only)
2. Store file contents
3. Search file contents for ctx! markers
4. For each hit:
    4a. Print hit offset
    4b. Extract Index Number field and print
    4c. Extract Name Length field and print
    4d. Extract Name String (UTF16LE) field and print
    4e. Extract Phone Length field and print
    4f. Extract Phone String (UTF8) field and print
    4g. Extract Unix Timestamp field and print (in ISO format)

5. Close file


The Script

OK now that we know what we want to do, here's how we implement each step in code ...

Steps 1 & 2 Open file and store file contents (See "" lines 25-33):
1. Open "testctx.bin" file (read only)
2. Store file contents

For step 1, we open the "textctx.bin" file in read-only binary mode (what the "rb" stands for):
    fb = open(filename, "rb")

We chose read-only mode because we don't want to change the file contents and we chose binary mode because we are interpreting the file as raw bytes (not text).
Then to read/store the file contents, we call:
    filecontent =

So the "filecontent" variable will now contain every byte from the "testctx.bin" file and individual bytes can be accessed directly using the "slice" notation.
For example, filecontent[0:3] is 3 bytes long and includes the bytes at offsets 0, 1 and 2. It does NOT include the byte at offset 3.
If we replace the start/end locations of our slice example with a variable called startoffset, we get:
This will include the 3 bytes at start, start+1, start+2 only.
The reader might want to remember that little notation nugget as monkey has the feeling it will be popping up again later ... (Hehe, Poo jokes are still floating around in 2017!)

Step 3: Search file contents for "ctx!" markers (See "" lines 35-49):
Knowing that "ctx!" encoded in ASCII/UTF8 is x63 x74 x78 x21, we can use a variable "searchstring" to represent our search term in hex:
    searchstring = "\x63\x74\x78\x21"

We now consider the "filecontent" variable as one big string of bytes ...
Python string types have a find() method which searches the parent string for a substring. The find() method returns -1 if the substring is not found otherwise, it returns the first offset where it found the substring. The find() method can also take an starting offset argument so we can use a while loop to repeatedly call find() with an incrementing starting offset until we get no more hits. Thus we can find an offset for each substring hit in the parent string, which we can then store in a Python list called "hitlist".
Here's the code:
    nexthit = filecontent.find(searchstring.encode('utf-8'), 0)
    hitlist = []
    while nexthit >= 0:
        nexthit = filecontent.find(searchstring.encode(), nexthit + 1)

We use searchstring.encode('utf-8') because of Python 3 compatibility issues. Python 3 treats all strings as Unicode by default, where as we need to search in UTF8 (ie byte by byte). So we have to encode the searchstring as UTF8 before running the search.
Default Python 2 strings are represented as sequences of raw bytes so calling searchstring.encode('utf-8') in Python 2 has no real effect - we could have used Python 2 lines such as:
    nexthit = filecontent.find(searchstring, 0)
    nexthit = filecontent.find(searchstring, nexthit + 1)
This was the only major script change required for Python2 and Python 3 compatibility.

Step 4: Looping through each hit (See "" lines 50-88):
Now we have our hitlist of offsets to "ctx!" markers and we know how each contact record is structured, so we can iterate through the filecontent variable using a for loop and extract/print the data we need using the slice notation we discussed previously.

4a. We print out each hit offset in both decimal and hexadecimal.
    print("\nHit found at offset: " + str(hit) + " decimal = " + hex(hit) + " hex")

We use the str() function to convert the "hit" offset variable into a decimal string for printing and the hex() function to convert the hit offset variable into a hexadecimal string.

4b. The first field ("Index Number") after the "ctx!" marker will start 4 bytes after the hit offset. To calculate the offset, we can use code like:
    indexnum_offset = hit + 4 
As we have already read the entire file into filecontent, we can access the 2 byte "Index Number" field and interpret it as a Little Endian 2 byte integer as follows:
    indexnum = struct.unpack("<H", filecontent[indexnum_offset:(indexnum_offset+2)])[0]

We are using the struct module's "unpack" function on the given filecontent slice to interpret the slice as a LE 2 byte integer and store it in the "indexnum" variable.
The "<H" argument tells unpack how to interpret the raw bytes i.e. "<" for Little Endian, "H" for unsigned 2 byte integer.
The unpack function returns a tuple (kinda like a sequence of variables) so we specify the "[0]" at the end to retrieve the first converted value. It seems a bit weird until you find out that you can chain types together in the same unpack call. For example, "<HH" specifies 2 consecutive LE unsigned 2 byte integers. Unfortunately, we cannot use chaining here due to the variable length of Name/Phone strings in the contact record.
There's a bunch of other unpack types defined in the Python help documentation (search for "pack unpack").

We can now print out our interpreted "indexnum" value but we need to use the str() function to convert our Index Number integer into a printable string. We can use code such as:
    print("indexnum = " + str(indexnum))

We can re-use a similar pattern of code for the remaining fields in the record.
That is, we calculate the offset of field X, interpret those slice bytes and then print.
Because we know the record field sizes (or can read them e.g. via "Name Length" size byte), calculating the offsets becomes an exercise in adding field sizes to previous field offsets to get to the next offset address.

4c. So for the second field ("Name Length") we can use:
    namelength_offset = indexnum_offset + 2
    print("namelength_offset = " + str(namelength_offset))
    namelength = struct.unpack("B", filecontent[namelength_offset:(namelength_offset+1)])[0]
    print("namelength = " + str(namelength))

For the "Name Length" field (one byte long), we use a starting offset ("namelength_offset") which is 2 bytes past the "Index Number" offset ("Index Number" field is 2 bytes long).
We use unpack with the "B" argument as we are interpreting the 1 byte at filecontent[name_length_offset:(namelength_offset+1)] as an unsigned 1 byte integer and storing it in the "namelength" variable.

4d. For the third field ("Name String") we can use:
    namestring_offset = namelength_offset + 1
    namestring = filecontent[namestring_offset:(namestring_offset+namelength)].decode('utf-16-le')
    print("namestring = " + namestring)

After calculating the "Name String" field offset (should be one byte past the "Name Length" field), we can use the string.decode('utf-16-le') method to interpret the filecontent[namestring_offset:(namestring_offset+namelength)] slice as a UTF16LE string and store it in the "namestring" variable.

4e. For the fourth field ("Phone Length") we can use:
    phonelength_offset = namestring_offset + namelength
    phonelength = struct.unpack("B", filecontent[phonelength_offset:(phonelength_offset+1)])[0]
    print("phonelength = " + str(phonelength))

After calculating the "Phone Length" field offset (should be "Name Length" bytes past the "Name String" offset), we use unpack with the "B" argument as we are interpreting the 1 byte at filecontent[phonelength_offset:(phonelength_offset+1)] as an unsigned 1 byte integer and storing it in the "phonelength" variable.

4f. For the fifth field ("Phone String") we can use:
    phonestring_offset = phonelength_offset + 1
    print("phonestring_offset = " + str(phonestring_offset))
    phonestring = filecontent[phonestring_offset:(phonestring_offset+phonelength)].decode('utf-8')
    print("phonestring = " + phonestring)

After calculating the "Phone String" field offset (should be one byte past the "Phone Length" field), we can use the string.decode('utf-8') method to interpret the filecontent[phonestring_offset:(phonestring_offset+phonelength)] slice as a UTF8 string and store it in the "phonestring" variable.

4g. For the sixth and last field ("Unix Timestamp") we can use:
    timestamp_offset = phonestring_offset + phonelength
    print("timestamp_offset = " + str(timestamp_offset))
    timestamp = struct.unpack("<I", filecontent[timestamp_offset:(timestamp_offset+4)])[0]
    print("raw timestamp decimal value = " + str(timestamp))
    timestring = datetime.datetime.utcfromtimestamp(timestamp).strftime("%Y-%m-%dT%H:%M:%S")
    print("timestring = " + timestring)

We calculate the timestamp offset as being "Phone Length" bytes past the "Phone String" field and print the timestamp offset to help with debugging.
We use unpack with the "<I" argument to interpret the 4 byte filecontent[timestamp_offset:(timestamp_offset+4)] slice as a LE unsigned 4 byte integer and then store the integer value in the "timestamp" variable.
eg interprets 0x26CDDB56 LE as 0x56DBCD26 BE = 1457245478 decimal = number of seconds since 1JAN1970.
We then call the datetime.datetime.utcfromtimestamp() method to create a Python "datetime" object using the number of seconds since 1JAN1970. The returned datetime object has a "strftime" method we can call to obtain a human readable ISO format string. The "%Y-%m-%dT%H:%M:%S" argument to strftime() specifies that we want a datetime string formatted as Year-Month-DayTHour:Minute:Second.

Step 5: After we process all of the "ctx!" hits, we close the file (See "" line 89):

For shiggles, we also print out the number of hits in the hitlist on line 91 before the script finishes.
    print("\nProcessed " + str(len(hitlist)) + " ctx! hits. Exiting ...\n")

Running the script

For Python v2.7.12:
In a Win7x64 command terminal window with "" and "testctx.bin" copied to "c:\":

Running v2017-08-19

Hit found at offset: 64 decimal = 0x40 hex
indexnum = 1
namelength_offset = 70
namelength = 12
namestring = George
phonelength = 7
phonestring_offset = 84
phonestring = 5551234
timestamp_offset = 91
raw timestamp decimal value = 1457245478
timestring = 2016-03-06T06:24:38

Hit found at offset: 512 decimal = 0x200 hex
indexnum = 65282
namelength_offset = 518
namelength = 18
namestring = King Kong
phonelength = 9
phonestring_offset = 538
phonestring = +15554321
timestamp_offset = 547
raw timestamp decimal value = 1457245695
timestring = 2016-03-06T06:28:15

Hit found at offset: 800 decimal = 0x320 hex
indexnum = 65535
namelength_offset = 806
namelength = 30
namestring = Magilla Gorilla
phonelength = 10
phonestring_offset = 838
phonestring = +445552468
timestamp_offset = 848
raw timestamp decimal value = 1457258495
timestring = 2016-03-06T10:01:35

Processed 3 ctx! hits. Exiting ...


For Python 3.4.1:
 In a Win7x64 command terminal window with "" and "testctx.bin" copied to "c:\":

Running v2017-08-19

Hit found at offset: 64 decimal = 0x40 hex
indexnum = 1
namelength_offset = 70
namelength = 12
namestring = George
phonelength = 7
phonestring_offset = 84
phonestring = 5551234
timestamp_offset = 91
raw timestamp decimal value = 1457245478
timestring = 2016-03-06T06:24:38

Hit found at offset: 512 decimal = 0x200 hex
indexnum = 65282
namelength_offset = 518
namelength = 18
namestring = King Kong
phonelength = 9
phonestring_offset = 538
phonestring = +15554321
timestamp_offset = 547
raw timestamp decimal value = 1457245695
timestring = 2016-03-06T06:28:15

Hit found at offset: 800 decimal = 0x320 hex
indexnum = 65535
namelength_offset = 806
namelength = 30
namestring = Magilla Gorilla
phonelength = 10
phonestring_offset = 838
phonestring = +445552468
timestamp_offset = 848
raw timestamp decimal value = 1457258495
timestring = 2016-03-06T10:01:35

Processed 3 ctx! hits. Exiting ...


We can see that all of the name and phone strings are complete / as shown in the Hex view picture.
We also verified that each "timestring" value corresponded to it's raw LE hex value using Dcode.

Final Thoughts

After you know the basics of a language, programming is a skill best sharpened by working on actual projects (not reading books or blog posts).
Google and StackOverflow are your friends when researching how to code common tasks in Python.
Which makes print statements your No-BS-tell-it-like-it-is best friend when debugging (e.g. print offset addresses and/or values to debug). A well placed print statement can be the easiest way of finding out that your fifth cola/coffee didn't do you any favours.

The code in this script is intended for use with files that can fit into memory (ie 0 MB to *maybe* hundreds of MB).
Larger files may require breaking up the file into chunks before reading/processing.

In writing this script, we used Notepad++ (v6.7.9.2) with the Language set to Python to get the funky syntax highlighting (eg comments in green, auto-indenting). The TAB size was set to 4 spaces via the Settings, Preferences, Tab Settings menu. We disabled "Word Wrap" (under View menu) and enabled line numbers (under Settings, Preferences, Editing menu) so if/when you get a runtime error, you can find the relevant line more readily.

If you are in the forensic community and found this post helpful or you're in the forensic community and had some questions/thoughts about the code, please leave a comment or send me an email (No, I will not do your homework/assignment! But if its for a new artifact for a case, monkey might be convinced ;).