Friday 3 July 2015

Extracting Ones BLOBs From The Clutches Of SQLite

SQLite BLOB work used to be an adventure ... Not anymore!

Did you know that SQLite databases can also hold binary data? BLOB fields can contain pictures, audio, base64 encoded data and any other binary data you care to wobble your gelatinous finger at.
Monkey's recent foray into SQLite led him to the magic of the "pragma table_info" SQLite query (which returns a table's column configuration). This means that we don't have to hard code our queries and/or know the database schema before we query a table. Consequently, given a table name, we can now query that table for any BLOBs and dump them to file for further analysis. This could come in handy when analysing a browser cache history database (for images) or a voicemail database (for recorded messages) or a contact database (for images) or any other stored binary BLOB (eg binary .plists). So it also sounds like a good fit for mobile device forensic analysis ... awww, my favorite - how did you know? :)

Special Thanks to Alex Caithness of CCLForensics for inspiring/laying the groundwork for this script idea via his 2013 post. Monkey salutes you Alex (in the nice way)!

You can download the Python script ( from my GitHub page.

The Script

Here's the script's help text:

cheeky@ubuntu:~$ python ./ -h
Running v2015-07-03

usage: [-h] db table outputdir

Extracts BLOB fields from a given SQLite Database

positional arguments:
  db          SQLite DB filename
  table       SQLite DB table name containing BLOB(s)
  outputdir   Output directory for storing extracted BLOBs

optional arguments:
  -h, --help  show this help message and exit

Given an SQLite database filename, the table name containing BLOBs and an output directory, this script will query the table for any BLOB columns, process the BLOB content for file type and then dump the contents to a file in the specified output directory. The file type processing is currently limited to some common mobile device file types but it should be easy to modify for other types in the future (depending on their file signatures). All the processing really does is determine the output filename extension. All BLOBs are extracted - some just have a more user friendly file extension (vs. the default .blob extension).
And before you get on Monkey's back about the lack of service, have you seen how many different file formats there are these days?! ;)

Back to the script ... There's a bit of housekeeping stuff at the beginning of the script eg checking the database file exists, creating the output directory if required.
The real fun begins when we do the "pragma table_info" query. This returns a row entry for each column type in the specified table. Each row entry (ie column type) has the following fields:
ColumnID, Name, Type, NotNull, DefaultValue, PrimaryKey

The Type field should be set to "BLOB" for BLOBs. The Name field is set to the column name.
We can also figure out which column is being used for the Primary Key (ie unique key index for each record) by looking at the PrimaryKey value (which should be set to 1 for the Primary Key).

Now that we know which columns contain BLOBs, we can formulate a SELECT query which looks something like:
SELECT primarykeyname, columnname(s) FROM tablename ORDER BY primarykeyname;
Then we iterate through each of the returned table rows and for each BLOB column, we call the user defined "calculate_filename" function to construct an output filename using the table name, the primary key value (rowid), the BLOB column name and first several bytes of the BLOB.
From the first several bytes of each BLOB, we look for certain file signatures (.jpg, .png, .zip, .bplist, .3gp, .3g2, .amr) and name the output file's extension accordingly. If the BLOB was not one of the previously mentioned types, it is given the default file extension of ".blob".
The file naming convention is:

Where .ext can be: .jpg, .png, .zip, .blob (default), .bplist (untested), .3gp (untested), .3g2 (untested) or .amr (untested).

For Base64 encoded BLOBs - unfortunately, there does not appear to be a reliable way of determining if a field is Base64 encoded unless you actually try to base64 decode it and the output "looks valid". Counting the Base64 encoded bytes and monitoring the characters used might find some Base64 encodings but it could also catch some strings which are not necessarily Base64 encoded. So, any Base64 BLOBs will end up with a .blob extension.

For further information on file signatures, check out Gary Kessler's huge compendium of file signatures.


On Ubuntu 14.04 LTS x64 (running Python 2.7.6), we used the Firefox SQLite Manager plugin to create 3 test databases. One had no BLOB data (testnoblob.sqlite), one had one BLOB column (testblob.sqlite) and the last had three BLOB columns (testblobs.sqlite). Due to lack of test data, we only tested the script with .jpg, .png and .zip BLOBs.
Fun fact: You can also use SQLite Manager to insert existing files as BLOBs into a table.

Here's what the database which does not contain any BLOBs (testnoblob.sqlite) looks like:

Now we run the script against the database:

cheeky@ubuntu:~$ python ./ testnoblob.sqlite main noblobop
Running v2015-07-03

Creating outputdir directory ...
Primary Key name is: id
No BLOB columns detected ... Exiting

Here's the contents of the "noblobop" directory ... NOTHING! Because there's no BLOBs in the database, silly!

Here's what the second database containing one BLOB column (testblob.sqlite) looks like:

Now we run the script against the database containing one BLOB column:

cheeky@ubuntu:~$ python ./ testblob.sqlite main blobop
Running v2015-07-03

Creating outputdir directory ...
Primary Key name is: id
Detecting BLOB columns = blobby
Extracting ... blobop/main_row_1_blobby.jpg
Extracting ... blobop/
Extracting ... blobop/main_row_3_blobby.blob

Exiting ...

Here's the contents of the "blobop" output directory:

We can see the the .jpg, .zip / office document and base64 text BLOBs have all been extracted successfully.
This was also confirmed by checking the file sizes of each output file against its BLOB size in the table.

Finally, here's the database containing three BLOB columns (testblobs.sqlite):

Now we run the script against the three BLOB database:

cheeky@ubuntu:~$ python ./ testblobs.sqlite main blobops
Running v2015-07-03

Creating outputdir directory ...
Primary Key name is: id
Detecting BLOB columns = blobby, blobby2, blobby3
Extracting ... blobops/main_row_1_blobby.jpg
Extracting ... blobops/main_row_1_blobby2.jpg
Extracting ... blobops/main_row_1_blobby3.png
Extracting ... blobops/
Extracting ... blobops/
Extracting ... blobops/
Extracting ... blobops/main_row_3_blobby.blob
Extracting ... blobops/main_row_3_blobby2.blob
Extracting ... blobops/main_row_3_blobby3.blob

Exiting ...

Here's the contents of the "blobops" output directory:

Note: The .png BLOB (for row id=1, column = blobby3) has also been successfully extracted along with .jpg, .zip and base64 text BLOBs.

We have also run the script on Windows 7 x64 with Python 2.7.6 with the same results.

So there you have it, repeat after me - "Everything seems to be in order ... I think I'll go across the street for some orange sherbet ..."

Final Thoughts

It is hoped that this script ( can be used during an SQLite database / mobile forensics analysis to quickly retrieve embedded binary data such as pictures, voicemail recordings, video, binary .plist and .zip BLOBs.

As mentioned in the last post, to avoid changing data, analysts should only interface with copies of an SQLite database and not the original file.

This script does not handle deleted BLOB data.

And ... I'm spent! (For those too young, that's another classic Austin Powers movie reference)