Sunday, 28 June 2015

How u like Base(64)?




Monkey was having such a great time, no one had the heart to tell him he had the wrong type of base ...


A recent blog post by Heather Mahalik (@HeatherMahalik) mentioned that a tool for performing multiple Base64 decodes would be useful for mobile application analysis. What is Base64? Basically, it converts binary data (bytes) into text built from a set of 64 printable characters. This encoding is typically used when sending email and when transferring or obfuscating data. Check out the Wikipedia page for more gory details.
There are already several tools we can use to perform Base64 decoding. For example, *nix systems have the "base64" command, and Monkey recently found that Notepad++ (v6.7.9.2) will handle multiple Base64 encodes/decodes.
However, as most mobile apps use SQLite databases for storage, it would be pretty painful to first query the database and then manually Base64 decode each value, especially if the field was Base64 encoded multiple times ... Unless, of course, you had your own army of monkey interns!

Thankfully, we have previously used Python to interface with SQLite databases, and after some quick Googling, we also found that Python has built-in Base64 encode/decode functionality (the "base64" module).
So a scripted solution seems like the way to go (Sorry, intern monkey army!).
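For readers on Python 3, here is a minimal sketch of that built-in module. The script itself targets Python 2.7 and calls base64.decodestring, which Python 3 deprecated and removed in 3.9; base64.b64decode is the current equivalent. Python 3's functions work on bytes, so text is encoded first (UTF-8 here is an assumption about how an app stores its text).

```python
import base64

# Python 3's base64 functions take bytes, not str, so encode the text first
# (UTF-8 is an assumption about the app's text encoding)
plain = "Hey Monkey!".encode("utf-8")

encoded = base64.b64encode(plain)    # raw bytes -> Base64 bytes
decoded = base64.b64decode(encoded)  # Base64 bytes -> original raw bytes

print(encoded.decode("ascii"))  # SGV5IE1vbmtleSE=
print(decoded.decode("utf-8"))  # Hey Monkey!
```

The encoded value matches the output of the Ubuntu "base64" command shown later in this post.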

You can download the script (sqlite-base64-decode.py) from my GitHub page.

The Script

The user has to provide the script with the database filename, the table name, the name of the Base64 encoded field and the number of times to run the Base64 decode.
The script then queries the database and prints each row's column values along with the respective Base64 decode result in tab-separated format.

Each app's database will have its own schema, so we first need to run a "pragma table_info" query to find out how the table is laid out.
Specifically, we want to find out:
- the table's Primary Key name (for ordering the main query by),
- the table column names (for printing) and
- the index (column number) of the Base64 encoded column (the user provided the encoded field's name but we also need to know the index)
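The schema discovery step above can be sketched in Python 3 as follows. This is a hypothetical reconstruction rather than the script's actual code: the helper name describe_table and the column types of the stand-in "sms" table are assumptions. "PRAGMA table_info" returns one row per column as (cid, name, type, notnull, dflt_value, pk), where a non-zero pk marks a primary key column.

```python
import sqlite3

def describe_table(conn, table, b64field):
    """Return (primary key name, column names, index of b64field).

    Hypothetical helper. Like the original script, it assumes a
    single-column primary key; pkname is None if none is declared.
    """
    # PRAGMA arguments can't be bound as SQL parameters, so the table name
    # is double-quoted by hand (assumes it contains no double quotes)
    rows = conn.execute('PRAGMA table_info("%s")' % table).fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk)
    colnames = [row[1] for row in rows]
    pkname = next((row[1] for row in rows if row[5]), None)
    b64index = colnames.index(b64field)  # raises ValueError if missing
    return pkname, colnames, b64index

# Hypothetical stand-in for the test "sms" table (column types are guesses)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sms (id INTEGER PRIMARY KEY, phone TEXT, message TEXT, "
             "seen INTEGER, sent INTEGER, date INTEGER, base64enc TEXT)")
pkname, colnames, b64index = describe_table(conn, "sms", "base64enc")
print(pkname, b64index)  # id 6
```

The printed values line up with the "Primary Key name is: id" and "Base64 Fieldname index is: 6" lines in the test runs below.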

Once we have this info, we can then run our main query which will be the equivalent of:
SELECT * FROM tablename ORDER BY primarykeyname;
We then iterate through each returned row, run the base64.decodestring function the requested number of times and print both the returned row data and the decoded result.
On a decode error, the script prints "*** UNKNOWN ***" for the decoded value.
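A minimal Python 3 sketch of that main loop, under stated assumptions: it is not the author's code, decode_n_times and dump_table are hypothetical names, b64decode stands in for Python 2's decodestring, and the final plaintext is assumed to be UTF-8.

```python
import base64
import sqlite3

UNKNOWN = "*** UNKNOWN ***"

def decode_n_times(value, count):
    """Base64-decode value count times; return UNKNOWN if any pass fails.

    Hypothetical helper; assumes the final plaintext is UTF-8 text.
    """
    if not isinstance(value, (str, bytes)):
        return UNKNOWN  # NULL or numeric column values can't be Base64 text
    try:
        data = value.encode("ascii") if isinstance(value, str) else value
        for _ in range(count):
            data = base64.b64decode(data)
        return data.decode("utf-8")
    except ValueError:
        # binascii.Error, UnicodeEncodeError and UnicodeDecodeError all
        # subclass ValueError
        return UNKNOWN

def dump_table(conn, table, pkname, colnames, b64index, count):
    """Print each row plus its decoded value, tab separated."""
    print("\t".join(colnames + ["B64Decoded"]))
    # Identifiers can't be bound as parameters, so they are quoted by hand
    query = 'SELECT * FROM "%s" ORDER BY "%s"' % (table, pkname)
    for row in conn.execute(query):
        decoded = decode_n_times(row[b64index], count)
        print("\t".join(str(v) for v in row) + "\t" + decoded)

# Usage against a tiny hypothetical table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sms (id INTEGER PRIMARY KEY, message TEXT, base64enc TEXT)")
conn.execute("INSERT INTO sms VALUES (1, 'Hey Monkey!', 'U0dWNUlFMXZibXRsZVNFPQ==')")
dump_table(conn, "sms", "id", ["id", "message", "base64enc"], 2, 2)
```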

Here's the help text:

cheeky@ubuntu:~$ python ./sqlite-base64-decode.py -h
Running sqlite-base64-decode v2015-06-27
usage: sqlite-base64-decode.py [-h] db table b64field b64count

Extracts/decodes a base64 field from a SQLite DB

positional arguments:
  db          Sqlite DB filename
  table       Sqlite DB table name containing b64field
  b64field    Suspected Sqlite Base64 encoded column name
  b64count    Number of times to run base64 decoding on b64field

optional arguments:
  -h, --help  show this help message and exit
cheeky@ubuntu:~$
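An argparse parser that would produce a help screen like the one above might look like this. It is a hypothetical reconstruction; the script's real code may differ, and treating b64count as an integer is an assumption. On Python 3.10 and later, argparse labels the section "options:" instead of "optional arguments:".

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the help text above."""
    parser = argparse.ArgumentParser(
        prog="sqlite-base64-decode.py",
        description="Extracts/decodes a base64 field from a SQLite DB")
    parser.add_argument("db", help="Sqlite DB filename")
    parser.add_argument("table", help="Sqlite DB table name containing b64field")
    parser.add_argument("b64field", help="Suspected Sqlite Base64 encoded column name")
    # type=int is an assumption: the count is used as a loop bound
    parser.add_argument("b64count", type=int,
                        help="Number of times to run base64 decoding on b64field")
    return parser

args = build_parser().parse_args(["testsms.sqlite", "sms", "base64enc", "2"])
print(args.table, args.b64field, args.b64count)  # sms base64enc 2
```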


Future work might have the script sample each column's data to figure out which is Base64 encoded.
Base64 encoded data is typically limited to the following characters:
A-Z
a-z
0-9
+
/
=


Because the = sign is used for padding, it is usually a good indicator of Base64 encoding (especially at the end of the encoded string).
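Base64 encoding takes each group of 3 binary bytes (24 bits) and splits it into four 6-bit values, each represented by one printable character, so 3 bytes become 4 characters (4 bytes, or 32 bits, when stored as ASCII). With "=" padding, the final encoding should therefore be a multiple of 4 characters.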
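To make the 3-bytes-to-4-characters step concrete, here is a small sketch that encodes one 3-byte group by hand and checks the padded length formula 4 × ceil(n / 3) against Python's encoder. The helper names encode_group and encoded_length are illustrative only.

```python
import base64

# Standard Base64 alphabet (RFC 4648): 64 printable characters
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_group(chunk):
    """Encode exactly 3 bytes (24 bits) as 4 characters of 6 bits each."""
    bits = int.from_bytes(chunk, "big")  # one 24-bit integer
    return "".join(ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0))

def encoded_length(n):
    """Padded Base64 length for n input bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)

print(encode_group(b"Hey"))                   # SGV5
print(encoded_length(len(b"Hey Monkey!")))    # 16
print(len(base64.b64encode(b"Hey Monkey!")))  # 16
```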
Additionally, the more times you encode in Base64, the longer the resultant string becomes (each pass grows it by roughly a third).
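The "future work" idea of sampling each column could start from a heuristic built on the clues above: only alphabet characters, at most two "=" padding characters at the very end, and a length that is a multiple of 4. This is a hypothetical sketch (looks_like_base64 and score_columns are illustrative names), and it is only a heuristic: any 4-letter word such as "Test" also passes.

```python
import re

# Alphabet characters followed by at most two "=" padding characters
B64_RE = re.compile(r"[A-Za-z0-9+/]*={0,2}")

def looks_like_base64(value):
    """Heuristic only: plenty of ordinary strings (e.g. 'Test') also pass."""
    if not isinstance(value, str) or not value:
        return False
    return len(value) % 4 == 0 and B64_RE.fullmatch(value) is not None

def score_columns(rows, colnames):
    """Fraction of sampled values in each column that look like Base64."""
    scores = {}
    for i, name in enumerate(colnames):
        values = [row[i] for row in rows]
        hits = sum(looks_like_base64(v) for v in values)
        scores[name] = hits / len(values) if values else 0.0
    return scores

# Sample rows taken from the test "sms" table used in this post
colnames = ["id", "phone", "message", "seen", "sent", "date", "base64enc"]
rows = [
    (1, "555-1234", "Hey Monkey!", 1, 0, None, "U0dWNUlFMXZibXRsZVNFPQ=="),
    (2, "555-4321", "Hey Stranger!", 0, 1, None, "U0dWNUlGTjBjbUZ1WjJWeUlRPT0="),
    (3, "555-4321", "P is for PAGEDUMP!", 0, 1, None, "VUNCcGN5Qm1iM0lnVUVGSFJVUlZUVkFo"),
]
scores = score_columns(rows, colnames)
print(scores["base64enc"], scores["message"])  # 1.0 0.0
```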

Testing

For testing, we added the "base64enc" column to our previous post's testsms.sqlite database (specifically, the "sms" table). The test data looked like this:

Modified "sms" table with "base64enc" column added

The values for "base64enc" correspond to 2 x Base64 encoding the "message" value.
To obtain the 2 x Base64 encoded value, on Ubuntu we can do this:

cheeky@ubuntu:~$ echo -n 'Hey Monkey!' | base64
SGV5IE1vbmtleSE=
cheeky@ubuntu:~$

cheeky@ubuntu:~$ echo -n 'SGV5IE1vbmtleSE=' | base64
U0dWNUlFMXZibXRsZVNFPQ==
cheeky@ubuntu:~$ 


Note: The "-n" option stops "echo" from appending a trailing newline, which would otherwise be encoded as well.

So we can see our last encoding result corresponds to the "sms" table pic above, i.e. the 2 x Base64 encoding of 'Hey Monkey!' is U0dWNUlFMXZibXRsZVNFPQ==
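The same repeated encoding can be done in Python 3 instead of piping "echo -n" into "base64" twice. A minimal sketch; encode_n_times is an illustrative name, and UTF-8 is assumed for the plaintext.

```python
import base64

def encode_n_times(text, count):
    """Base64-encode text count times (UTF-8 assumed for the plaintext)."""
    data = text.encode("utf-8")  # no trailing newline, like "echo -n"
    for _ in range(count):
        data = base64.b64encode(data)
    return data.decode("ascii")

print(encode_n_times("Hey Monkey!", 1))  # SGV5IE1vbmtleSE=
print(encode_n_times("Hey Monkey!", 2))  # U0dWNUlFMXZibXRsZVNFPQ==
```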

Similarly, we can also use Notepad++ to do the encoding via "Plugins ... MIME Tools ... Base64 Encode".



As we see in the pic above, I used Notepad++ to 2 x Base64 encode the various "message" values and then inserted those values into the "sms" table's "base64enc" field using the SQLite Manager Firefox Plugin.
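As an alternative to the Notepad++ and SQLite Manager steps (this is not what the post did), Python's sqlite3 module can register a Python function with SQLite and fill the new column in a single UPDATE. A sketch under assumptions: the in-memory table is a hypothetical stand-in for testsms.sqlite, and b64_twice is an illustrative name.

```python
import base64
import sqlite3

def b64_twice(text):
    """Two rounds of Base64, matching the post's test data (UTF-8 assumed)."""
    if text is None:
        return None  # leave NULL messages as NULL
    return base64.b64encode(base64.b64encode(text.encode("utf-8"))).decode("ascii")

# In-memory stand-in for the "sms" table; column types are guesses
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sms (id INTEGER PRIMARY KEY, phone TEXT, message TEXT)")
conn.executemany("INSERT INTO sms (phone, message) VALUES (?, ?)",
                 [("555-1234", "Hey Monkey!"), ("555-4321", "Hey Stranger!")])

# Add the new column, then let SQLite call the Python function for every row
conn.execute("ALTER TABLE sms ADD COLUMN base64enc TEXT")
conn.create_function("b64_twice", 1, b64_twice)
conn.execute("UPDATE sms SET base64enc = b64_twice(message)")
conn.commit()

for row in conn.execute("SELECT id, message, base64enc FROM sms ORDER BY id"):
    print(row)
```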

Now we run our script on our newly modified testsms.sqlite file ...
For shiggles, let's initially specify a 1 x Base64 decode:

cheeky@ubuntu:~$ python ./sqlite-base64-decode.py testsms.sqlite sms base64enc 1
Running sqlite-base64-decode v2015-06-27
Primary Key name is: id
Base64 Fieldname index is: 6
id    phone    message    seen    sent    date    base64enc    B64Decoded
=======================================================================================
1    555-1234    Hey Monkey!    1    0    None    U0dWNUlFMXZibXRsZVNFPQ==    SGV5IE1vbmtleSE=
2    555-4321    Hey Stranger!    0    1    None    U0dWNUlGTjBjbUZ1WjJWeUlRPT0=    SGV5IFN0cmFuZ2VyIQ==
3    555-4321    P is for PAGEDUMP!    0    1    None    VUNCcGN5Qm1iM0lnVUVGSFJVUlZUVkFo    UCBpcyBmb3IgUEFHRURVTVAh
4    555-4321    I wonder what people with a life are doing right now ...    0    1    None    U1NCM2IyNWtaWElnZDJoaGRDQndaVzl3YkdVZ2QybDBhQ0JoSUd4cFptVWdZWEpsSUdSdmFXNW5JSEpwWjJoMElHNXZkeUF1TGk0PQ==    SSB3b25kZXIgd2hhdCBwZW9wbGUgd2l0aCBhIGxpZmUgYXJlIGRvaW5nIHJpZ2h0IG5vdyAuLi4=
5    555-4321    This is so exciting! It reminds me of one time ... at Band Camp ...    0    1    None    VkdocGN5QnBjeUJ6YnlCbGVHTnBkR2x1WnlFZ1NYUWdjbVZ0YVc1a2N5QnRaU0J2WmlCdmJtVWdkR2x0WlNBdUxpNGdZWFFnUW1GdVpDQkRZVzF3SUM0dUxnPT0=    VGhpcyBpcyBzbyBleGNpdGluZyEgSXQgcmVtaW5kcyBtZSBvZiBvbmUgdGltZSAuLi4gYXQgQmFuZCBDYW1wIC4uLg==

Exiting ...
cheeky@ubuntu:~$


No real surprises here. We can see the "B64Decoded" fields are still Base64 encoded. Also, apologies for the crappy layout ...
Now let's try a 2 x Base64 decode:

cheeky@ubuntu:~$ python ./sqlite-base64-decode.py testsms.sqlite sms base64enc 2
Running sqlite-base64-decode v2015-06-27
Primary Key name is: id
Base64 Fieldname index is: 6
id    phone    message    seen    sent    date    base64enc    B64Decoded
=======================================================================================
1    555-1234    Hey Monkey!    1    0    None    U0dWNUlFMXZibXRsZVNFPQ==    Hey Monkey!
2    555-4321    Hey Stranger!    0    1    None    U0dWNUlGTjBjbUZ1WjJWeUlRPT0=    Hey Stranger!
3    555-4321    P is for PAGEDUMP!    0    1    None    VUNCcGN5Qm1iM0lnVUVGSFJVUlZUVkFo    P is for PAGEDUMP!
4    555-4321    I wonder what people with a life are doing right now ...    0    1    None    U1NCM2IyNWtaWElnZDJoaGRDQndaVzl3YkdVZ2QybDBhQ0JoSUd4cFptVWdZWEpsSUdSdmFXNW5JSEpwWjJoMElHNXZkeUF1TGk0PQ==    I wonder what people with a life are doing right now ...
5    555-4321    This is so exciting! It reminds me of one time ... at Band Camp ...    0    1    None    VkdocGN5QnBjeUJ6YnlCbGVHTnBkR2x1WnlFZ1NYUWdjbVZ0YVc1a2N5QnRaU0J2WmlCdmJtVWdkR2x0WlNBdUxpNGdZWFFnUW1GdVpDQkRZVzF3SUM0dUxnPT0=    This is so exciting! It reminds me of one time ... at Band Camp ...

Exiting ...
cheeky@ubuntu:~$


Note: The "message" and "B64Decoded" fields are the same - we have found our original message! :)
Finally, let's try a 3 x Base64 decode to see if the script falls into a screaming heap:

cheeky@ubuntu:~$ python ./sqlite-base64-decode.py testsms.sqlite sms base64enc 3
Running sqlite-base64-decode v2015-06-27
Primary Key name is: id
Base64 Fieldname index is: 6
id    phone    message    seen    sent    date    base64enc    B64Decoded
=======================================================================================
1    555-1234    Hey Monkey!    1    0    None    U0dWNUlFMXZibXRsZVNFPQ==    *** UNKNOWN ***
2    555-4321    Hey Stranger!    0    1    None    U0dWNUlGTjBjbUZ1WjJWeUlRPT0=    *** UNKNOWN ***
3    555-4321    P is for PAGEDUMP!    0    1    None    VUNCcGN5Qm1iM0lnVUVGSFJVUlZUVkFo    *** UNKNOWN ***
4    555-4321    I wonder what people with a life are doing right now ...    0    1    None    U1NCM2IyNWtaWElnZDJoaGRDQndaVzl3YkdVZ2QybDBhQ0JoSUd4cFptVWdZWEpsSUdSdmFXNW5JSEpwWjJoMElHNXZkeUF1TGk0PQ==    *** UNKNOWN ***
5    555-4321    This is so exciting! It reminds me of one time ... at Band Camp ...    0    1    None    VkdocGN5QnBjeUJ6YnlCbGVHTnBkR2x1WnlFZ1NYUWdjbVZ0YVc1a2N5QnRaU0J2WmlCdmJtVWdkR2x0WlNBdUxpNGdZWFFnUW1GdVpDQkRZVzF3SUM0dUxnPT0=    *** UNKNOWN ***

Exiting ...
cheeky@ubuntu:~$ 


Note: The "*** UNKNOWN ***" values indicate that a decoding error has occurred (in our testing, this was usually due to a padding error).
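Why does the third pass fail? By then the value is plaintext again. By default, Python's b64decode quietly discards characters outside the Base64 alphabet (spaces, "!") and then checks what is left: 'Hey Monkey!' leaves the 9 characters 'HeyMonkey', which cannot form complete 4-character groups, so binascii.Error is raised. The flip side is that plaintext whose letters happen to total a multiple of 4 may decode without error into garbage bytes, so the absence of "*** UNKNOWN ***" is not proof of a correct decode. Passing validate=True turns stray characters into errors. A small sketch (try_decode is a hypothetical helper):

```python
import base64
import binascii

def try_decode(text, strict=False):
    """Return decoded bytes, or None if Python rejects the input as Base64."""
    try:
        # validate=True rejects non-alphabet characters instead of skipping them
        return base64.b64decode(text, validate=strict)
    except binascii.Error:
        return None

print(try_decode("Hey Monkey!"))                      # None ('HeyMonkey' is 9 chars)
print(try_decode("SGV5IE1vbmtleSE="))                 # b'Hey Monkey!'
print(try_decode("SGV5 IE1vbmtleSE="))                # b'Hey Monkey!' (space skipped)
print(try_decode("SGV5 IE1vbmtleSE=", strict=True))   # None
```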

We also ran these tests on a Windows 7 x64 PC running Python 2.7.6 with the same results.

Final Thoughts

Special thanks to Heather Mahalik for mentioning the need for the script. One of the great things about getting script ideas from rockstar practitioners in the field is that it's not going to be some banana-in-the-sky idea that no one uses. This script might actually be useful LOL.

The script ass-umes only one field is Base64 encoded and that the Primary Key only uses one field.
The script has only been tested with Monkey's own funky data - it will be interesting to see how it goes against some real life user data.

The "pragma table_info" query is something Monkey will probably re-use in the future because it allows us to discover a database table's schema rather than hard-coding a bunch of assumptions about the table.

Deleted table data is not addressed by this script.

Monkey's recent blue period of posts might be drawing to a close. Oh well, it was fun while it lasted. Maybe I can now get a life ... yeah, right ;)