Monday 25 May 2020

Recovering and Replaying Garmin Voice Instructions

Wait a minute monkey, did you say Carmen or Garmin?

We had a damaged Garmin nuvi 56LM GPS unit from which we recovered a text file containing a voice log.
It was a bit of an unusual process so we thought it might be interesting to share the story.
As a result of this effort, monkey wrote a Python3 script ( that uses the free espeak-ng library to convert the Garmin 56LM voice log text to WAV files for better place name recognition. The script is available from GitHub.

Special Thanks to the following people for their assistance:
Sasha Sheremetov from Rusolut for his advice with the data recovery
Ken Case from Berla for his advice regarding GPS data logs
Benjamin "BJ" Duncan for sharing his findings regarding the voice log
Katie Russ for creating her helpful website

Our story begins with a damaged Garmin nuvi 56LM which "provides easy-to-follow, spoken turn-by-turn directions with street names". Well, it used to!
Wikipedia states it was released in 2014.

Google found an interesting paper from around that time - "Garmin satnavs forensic methods and artefacts: An exploratory study" by Alexandre Arbelet (August 2014).
And while it did not cover our model, it mentioned GPX files as a potential source of GPS tracklogs.
Sounds promising eh?

Unfortunately, our device was damaged beyond repair so chipoff was our only option.
The chip was a SanDisk 8 GB eMMC chip. Multiple reads of the chip produced the same hash so the chip seemed pretty stable.
X-Ways, Autopsy, Oxygen Forensic Detective and FTK Imager did not recognize partitions from the dump.
Cellebrite Physical Analyzer's Garmin Legacy chain also did not not extract any information from the dump.
ASCII plaintext was visible in dump though so not all hope was lost.

Ken from Berla advised that Garmin usually use a FAT32 partition which contains the GPX tracklogs.
Unfortunately, the first 512 byte sector of the dump did not end with the usual 55AA for an MBR so we needed to find a way to extract the filesystems.

Sasha from Rusolut suggested using R-studio to recover/extract the filesystems from the dump.
Great Success!
There were 2 extracted partitions - FAT16 (128 MB) and FAT32 (3.3 GB - a little smaller than expected. Maybe Garmin used 8 GB chips for commonality/ease of upgrade reasons?)

Some noteworthy files were found while looking for timestamped latitude/longitude coordinates ...
Note: This was based on the contents of ONE device, other devices/models probably store their data differently.

Contained a "history" table with scaled latitude/longitude numbers. By multiplying these raw numbers by 180/2^31, we were able to obtain plottable latitude/longitude coordinates.

The "route_segment" table contained timestamp and latitude/longitude route info (unsure if these were travelled).
The "history" table had start latitude/longitude, end latitude/longitude and start times.
Note: Garmin timestamps measure seconds since 31 DEC 1989 (Garmin launch date) - see here for further details.
By adding 631065600 seconds to the Garmin numeric timestamp, you have the number of seconds since the 1970 Unix epoch which then makes it easier to find the human readable time (there are more tools supporting Unix epoch time than for Garmin time).

Contained possible latitude/longitude coordinates with timestamps. However, the file also contained non-ASCII/binary bytes.


Contained potential timestamped search information.

Contained ASCII latitude/longitude strings (search for "GPS main")

Unknown file format which possibly contains the current trip log?

Contained various settings, version and model info.
It also mentioned a GPX directory with references to .gpx files (which did not exist in our device).
Sasha from Rusolut confirmed that his test unit had these files.
I'm not sure why there was a discrepancy - perhaps it was due to regional differences or the user settings?
Looking up the file format for GPX logs showed that they are XML text files which use certain keywords/field names to record the latitude and longitude.
Searching the entire dump for the "trkpt" and  "trkType" keywords did not find any GPX formatted data though.

All semi-interesting info so far ...
However, BJ also brought our attention to an interesting artifact that spawned this post ...

This text file appears to chronologically log GPS spoken instructions. Due to our lack of test devices, we can't guarantee the vehicle was at the exact location spoken but it might be used to show the user was in or aware of the area.

As an example, the voice log contained lines like:
D[2019/06/12 07:55:03] {22ce6e88} [vpm_tts_parse.c:vpm_tts_parse:3770] Navigation phrase selected: Keep right $USR_TO_NEXT_ROAD. (229)
D[2019/06/12 07:55:03] {22ce6e88} [vpm_tts_log.c:vpm_tts_log_phonetics:277] Map Phonetics: "nju "IN|gl@nd *"haI|%we (MDB Lang: 23)

The spoken voice section is the string occurring between "Map Phonetics: " and "(MDB Lang: 23)".
e.g. "nju "IN|gl@nd *"haI|%we
Another example could be:
"wE|st@n *"mo|t@|%we

The line format can be generalized as:
D[YYYY/MM/DD HH:MM:SS] {4byte hex id? process/thread id?} [vpm_tts_log.c:vpm_tts_log_phonetics:277] Map Phonetics: VOICE_STRING (MDB Lang: 23)

Where VOICE_STRING looks like a localized pronunication guide for the system.

Trivia note:
There were also various lines with the string "Voice Language: Australian English-Karen (TTS)" which seems to indicate which voice the user heard.
A bit of Googling found the voice of Karen who has also done other recognisable voice over work (e.g. Garmin GPS – Australian Karen, Navman GPS – Australian Karen, Apple iPhone 4s & 5 – Australian Voice of Siri).

"OK, Karen", there must be a system involved with the pronunciation of the string but at the time, monkey thought it was probably proprietary to Garmin.
Fast forward to a couple of weeks ago and monkey had a mini-breakthrough.
There's a system for pronunciation called the International Phonetic Alphabet (IPA).
If you've read Wikipedia or a dictionary you've probably seen these weird pronunciation symbols and sarcastically thought "Yeah that helps".
For example, the Wikipedia page for "Cooking banana" uses IPA to descibe how to pronounce "plantain".
See the highlighted text containing the weird symbols in the picture below:

Example of IPA pronunciation (highlighted text)

However, if we are limited to the 95 character printable English ASCII set, we need an additional method of encoding those weird IPA symbols. 
There are a several methods available but for our purposes, we will limit the discussion to the Kirshenbaum and X-SAMPA systems.

Looking again at the voice strings from our log - it appears they are using the X_SAMPA system. e.g. double quotes for emphasis, use of { symbols etc.

Conveniently, Katie Russ has created a website using Amazon's Polly Speech API that can take (X-SAMPA) IPA strings and convert them into sound. And with configurable voices/accents too!
You can find her website here:

If you're so inclined, try copying and pasting the following text:
"nju "IN|gl@nd *"haI|%we
into the website and you should hear the equivalent "New England Highway" pronounced. Very cool!

IPA-Reader Website Example

This got monkey thinking - a website is good for one or two strings but copying and pasting hundreds of entries from a device log is not practical.
Amazon Polly isn't free either so that prompted a search for a free alternative.

We found an open source C library called espeak-ng.
While you can compile/build it from scratch, Ubuntu also has it as an installable debian package. Much easier!
Note: We had issues trying to install it on Ubuntu 16.04 (probably because its no longer supported) but had no issues with Ubuntu 20.04.

Here is the espeak-ng help page for Ubuntu 20.04.
It allows users to input a (Kirshenbaum) IPA string and then hear/record the corresponding audio to a WAV file.

To install it on Ubuntu 18+, type:
sudo apt-get install espeak-ng

Then you can use it like:
espeak-ng "[[Hello w3:ld]]"
Note: string is enclosed in double quotes

Its not as polished as Amazon Polly and it can get a bit confused by some strings but it works reasonably well (if you don't mind some Steeeephen Haaaawking like ASMR)
Also note it uses Kirshenbaum strings as input and not X-SAMPA strings (like in the voice log) so some conversion is necessary.
Note: We found that the comparison/conversion chart listed in Wikipedia did not quite translate for all of our data so our script had to do some customized conversions. These conversions may not sound correct to other users depending on the language.

Here's a summary of the conversion process we figured out:
- Enclose the input string between '[[' and ']]' chracters to have the symbols interpreted rather than spelled out  
- Replace '{' characters with 'a'
- Replace double quotes " with single quotes ' for primary stress indication
- Replace '%' with ',' for secondary stress indication
- Replace 'A' with 'a'

For more details on the Kirshenbaum system see here.

For example, the voice log string is:
"h{|m@nd *"{|v@n|ju
which converts to:
[['ha|m@nd *'a|v@n|ju]]

And to hear it spoken as "Hammond Avenue" type:
espeak-ng "[['ha|m@nd *'a|v@n|ju]]"
Note: enclosing double quotes when entering via command line.

To save it as a WAV file you can use:
espeak-ng "[['ha|m@nd *'a|v@n|ju]]" -w output.WAV
Some converted strings may not sound right / recognizable so for those (hopefully rare) occasions, you can enter the voice log string into the website to hear the spoken phrase. It may also help to adjust the espeak-ng playback speed using the -s argument.

The Script

Now that we know how to get an audio file for one voice string, lets try automating the string extractions from the entire voice log.

Input file: vpm_log_all.txt
Output files: An HTML report table containing the original line text and line number, the converted text file for input into espeak-ng and a link to the output WAV file.

Here's the generalized script logic:
Open/Read vpm_log_all.txt
For each line:
    Extract the voice string from the line text
    Convert and write the voice string to LINENUMBER.txt
    Call "espeak-ng -s 100 -w LINENUMBER.WAV -f LINENUMBER.txt" to generate the WAV file
    Store (LINENUMBER, line text, converted voice text, WAV filename) in list

Read list
Print HTML table from list ("Log Line No.", "Log Line Text", "Processed espeak-ng string" (linked to text file), "Audio File" link)

And here is how to run it - this outputs .WAV, .txt and Report.html files to the given "op" output directory:
python3 -f vpm_log_all.log -o op 2020-05-17 Initial
Directory  op  Created
185.WAV = [[*'maks|,wEl 'strit]]
2225.WAV = [['ha|m@nd *'a|v@n|ju]]

Processed 875 voice entries. Exiting ...
Here's the contents of the "op" output directory:

And here's what the "Report.html" looks like (with redacted timestamps):

From the above screenshot, you can see the table is pretty simple - click on the links to view either the input text file or open the WAV audio file.

Note: We tried calling "espeak-ng" from the script using the converted voice string (instead of a text file containing the string) but the generated WAV file kept getting truncated for an unknown reason i.e. words were missing. Using the text file input seemed to avoid this issue.

The script was written/tested with Python3 on Ubuntu 20.04 LTS but we only had one set of test data so it probably needs some tweaking.

Final Thoughts

We have succesfully written a script to extract and convert IPA strings from a Garmin nuvi 56LM (2014) voice log into WAV files.

The script's code (see "process_voicestring" function) for converting the voice log string to the espeak-ng input string may require some adjustment depending on the user data/language settings.

The script may also work for other models of Garmin GPS but this has not been tested.

If you see/have seen similar voice logs in other devices, it would be great to hear from you in the comments section.