Friday 10 October 2014

Google-ei'd ?!


Hmmm ... I seem to be having some trouble focusing after this latest post

Ever looked closely at a Google search URL and seen a weird "ei" parameter in there?
While it doesn't seem to occur for every search, when it does, that "ei" parameter contains an encoded Unix UTC timestamp (and other things only Google knows). Interpreting this artifact can thus allow forensic analysts to date a particular search session.

This artifact has been observed at various times while testing (on Windows 8.1) in Firefox (v32.0.3), Chrome (v38.0.2125.101) and IE (v11.0.9600.16384). As it seems to be initiated by Google's servers, this browser independence makes sense.

Special Thanks to Phillip Moore (@phillmoore) who suggested this script idea and also helped test it.

The Python script (google-ei-time.py) is available from my GitHub page and is based on the following 2013 reference written by Kevin Jones for the "Deed Poll Office Blog". This article also lists a PHP conversion script but more importantly, it shows an "ei" value conversion example which we can use to initially validate our script.

When does "ei" happen?


Whenever donkeys vote! Eee-ore! Eee-ore!

*DFIRcricket chirps* ... Ahem, moving along  ...

According to this discussion forum, the "ei" parameter was noticed around 31 August 2013, but Phillip had some test data with dates going back to 2011.

It does not seem to matter if you use a google.com country specific address (eg google.com.au) or the non-redirecting Google web address of www.google.com/ncr. The "ei" parameter occurs with both URLs.

Using Firefox on Windows 8.1, I went to www.google.com/ncr and searched for "bananas".
The resultant URL displayed was "https://www.google.com/?gws_rd=ssl#q=bananas".

I then clicked on the "Images" search category and got the following URL:
https://www.google.com/search?q=bananas&biw=1920&bih=988&source=lnms&tbm=isch&sa=X&ei=t7I2VLP0OYWJ8QWMrIGIAQ&ved=0CAYQ_AUoAQ

Subsequent sub-category clicks result in different "ei" parameters being returned.

I then cleared the Firefox history, went to www.google.com.au, searched for "bananas" and got the following URL:
https://www.google.com.au/search?q=bananas&sa=G&gbv=1&sei=BrU2VKfrB9Xz8gX2iILoBA

I then clicked on the "Images" search category and got the following URL:
https://www.google.com.au/search?q=bananas&gbv=1&prmd=ivnse&source=lnms&tbm=isch&sa=X&ei=BrU2VOLVNIPo8gXptIGoBg&ved=0CAUQ_AU

Note: It seems that the "sei" parameter seen initially uses a similar timestamp mechanism to the "ei" parameter.

Similarly, using the Google search box in Firefox to search for "yellow bananas" resulted in:
https://www.google.com.au/search?q=yellow+bananas&client=firefox-a&hs=gjx&rls=org.mozilla:en-US:official&channel=sb&gbv=1&sei=i7c2VJm4I43_8QXrtIKYAw

Clicking on the subsequent "Images" search category returned:
https://www.google.com.au/search?q=yellow+bananas&client=firefox-a&hs=hjx&rls=org.mozilla:en-US:official&channel=sb&gbv=1&prmd=ivns&source=lnms&tbm=isch&sa=X&ei=jLc2VLfHEY2B8gX3vYGgDQ&ved=0CAUQ_AU

The "ei" parameter is also returned in Firefox's Private Browser mode.

Writing the Script


The first thing to note is that the "ei" parameter is unpadded and URL safe base64 encoded.
Base64 encoding is a way of writing (binary) data using the ASCII alphabet (see here).
There should be 4 output bytes produced for every 3 input bytes. Therefore, the output string size should be a multiple of 4.
However, if the input size is not a multiple of 3 bytes, padding (ie adding "=" characters) is usually added after encoding to make the final size a multiple of 4.
Google apparently does not feel like providing this padding so we'll have to handle it using this algorithm ...

padlength = 4 - the remainder of ("ei"s size in bytes divided by 4)

or in Python-ese,
padlength = 4 - (len(ei) % 4)

So if "ei" is 23 bytes long, the extra padding required is 4 - (23 % 4) = 4 - 3 = 1
This makes the total size = 23 + 1 = 24 (which is a multiple of 4).

Note: Typically, "ei" is 22 bytes long (ie 2 bytes of padding is required) but it can be longer/shorter.
If "ei" is a multiple of 4 (ie remainder is 0), then padlength should be set to 0. For example, a 24 byte long "ei" does not require padding.
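The padding rule, including the multiple-of-4 special case, can be sketched in Python like this. It is a minimal illustration and not the code from google-ei-time.py; the helper name pad_ei is made up for this example.

```python
def pad_ei(ei):
    """Return ei with enough "=" characters appended to make its length a multiple of 4."""
    # (-len(ei)) % 4 is 0 when the length is already a multiple of 4,
    # so the "no padding needed" case does not require a separate if statement.
    padlength = (-len(ei)) % 4
    return ei + "=" * padlength


print(pad_ei("tci4UszSJeLN7Ab9xYD4CQ"))  # 22 characters in, 24 characters out
```

Writing the calculation as (-len(ei)) % 4 gives the same answers as 4 - (len(ei) % 4) for remainders of 1, 2 and 3, and gives 0 (instead of 4) when the remainder is 0.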

URL safe base64 encoding means substituting "-" instead of "+" and "_" instead of "/" after the base64 encoding has been performed. This is because "+" and "/" are reserved characters within URLs.
Conveniently, Python provides a library function to handle both the reverse substitution and base64 decoding - base64.urlsafe_b64decode.
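As a rough sketch of that decoding step (written for Python 3, which is an assumption on my part since the script itself was developed on Python 2.7), here is the Deed Poll article's example value being padded and decoded. The variable names are just for illustration.

```python
import base64

ei = "tci4UszSJeLN7Ab9xYD4CQ"          # example value from the Deed Poll article
padded = ei + "=" * ((-len(ei)) % 4)   # restore the padding that Google leaves off
decoded = base64.urlsafe_b64decode(padded)

print(len(decoded))       # 22 base64 characters decode to 16 bytes
print(decoded[:4].hex())  # the first 4 bytes, shown as hex: b5c8b852
```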

So now we have our base64 decoded string, we can read the first 4 bytes and calculate the timestamp.
To do this requires a bit of background maths. Given a Little Endian 4 byte integer like this:

[Byte0 Byte1 Byte2 Byte3]

Byte0 is least significant. Byte3 is most significant. To make things easier to follow, we'll do some re-arranging ...

[Byte3 Byte2 Byte1 Byte0]

Each byte range is 256 times the previous byte's range.
For example:
0xFF = 255 decimal, 0xFF00 = 255 * 256 = 65280 decimal, 0xFF0000 = 255 * 256 * 256 = 16711680 decimal

So our final 4 byte integer value can be calculated using an algorithm like:
Byte0 + Byte1*256 + Byte2*256*256 + Byte3*256*256*256

We can then call Python's datetime's utcfromtimestamp and strftime methods to convert/print out our human readable string.
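Putting the byte maths and the date conversion together might look something like the sketch below. It assumes Python 3, the function name ei_to_timestamp is hypothetical, and it swaps in the timezone-aware datetime.fromtimestamp call in place of utcfromtimestamp (both give a UTC time here). The struct.unpack line is only there as a cross-check of the manual sum.

```python
import base64
import struct
from datetime import datetime, timezone


def ei_to_timestamp(ei):
    """Decode an "ei" value and return its first 4 bytes as a little-endian integer."""
    padded = ei + "=" * ((-len(ei)) % 4)
    raw = base64.urlsafe_b64decode(padded)
    # Weighted sum: each byte is worth 256 times the previous one.
    manual = raw[0] + raw[1] * 256 + raw[2] * 256 ** 2 + raw[3] * 256 ** 3
    # "<I" means little-endian unsigned 32-bit, which should agree with the sum.
    assert manual == struct.unpack("<I", raw[:4])[0]
    return manual


ts = ei_to_timestamp("tci4UszSJeLN7Ab9xYD4CQ")
stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
print(ts, stamp)  # 1387841717 2013-12-23T23:35:17
```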

Testing the Script


Here's the help usage text for the script.
c:\Python27>python google-ei-time.py
Running google-ei-time.py v2014-10-10

Usage: google-ei-time.py -e EITERM -q OR google-ei-time.py -u URL -q

Options:
  -h, --help  show this help message and exit
  -e EITERM   Google search URLs EI parameter value
  -u URL      Complete Google search URL
  -q          (Optional) Quiet output (only outputs timestamp string)

c:\Python27>

The script takes either the "ei" term manually extracted from a URL (-e) OR the whole URL (-u) and returns a human readable timestamp string.
If you think the default output is too chatty and just want the answer (for scripting or just because you're a barbarian), you can use the -q argument.
It was developed and initially tested using Python 2.7 on a Windows 7 PC. It has also been tested on SANS SIFT v3.

Here's an "ei" usage example:
c:\Python27>python google-ei-time.py -e tci4UszSJeLN7Ab9xYD4CQ
Running google-ei-time.py v2014-10-10

Input ei term = tci4UszSJeLN7Ab9xYD4CQ
Padded base64 string = tci4UszSJeLN7Ab9xYD4CQ==
Extracted timestamp = 1387841717
Human readable timestamp (UTC) = 2013-12-23T23:35:17

c:\Python27>

This example "ei" value was taken from the Deed Poll blog article and the script output matches their result.
And here's the "quiet" version equivalent of the above ...
c:\Python27>python google-ei-time.py -e tci4UszSJeLN7Ab9xYD4CQ -q
2013-12-23T23:35:17

c:\Python27>

Here's a complete URL parsing example:
c:\Python27>python google-ei-time.py -u "http://www.google.com.au/?gfe_rd=cr&ei=tci4UszSJeLN7Ab9xYD4CQ"
Running google-ei-time.py v2014-10-10

URL's ei term = tci4UszSJeLN7Ab9xYD4CQ
Padded base64 string = tci4UszSJeLN7Ab9xYD4CQ==
Extracted timestamp = 1387841717
Human readable timestamp (UTC) = 2013-12-23T23:35:17

c:\Python27>

Note: I have enclosed the URL in quotes (") because the Windows command prompt treats an unquoted "&" as a command separator, which would cut the URL off just before the "ei" parameter.

And here's the quiet version for the previous complete URL parsing example ...
c:\Python27>python google-ei-time.py -u "http://www.google.com.au/?gfe_rd=cr&ei=tci4UszSJeLN7Ab9xYD4CQ" -q
2013-12-23T23:35:17

c:\Python27>
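For anyone curious how the "ei" term could be pulled out of a complete URL, here is one possible approach using Python 3's urllib.parse. This is a guess at a reasonable method and not necessarily how google-ei-time.py does it; the extract_ei name and the fallback to "sei" are my own assumptions.

```python
from urllib.parse import urlparse, parse_qs


def extract_ei(url):
    """Return the first "ei" (or, failing that, "sei") value found in url, else None."""
    parts = urlparse(url)
    # Google sometimes puts search parameters after "#", so check both places.
    for component in (parts.query, parts.fragment):
        params = parse_qs(component)
        for name in ("ei", "sei"):
            if name in params:
                return params[name][0]
    return None


print(extract_ei("http://www.google.com.au/?gfe_rd=cr&ei=tci4UszSJeLN7Ab9xYD4CQ"))
```

A URL without either parameter (like the first "bananas" search result URL) simply returns None.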

For shiggles, let's try our script with the "sei" parameter we noticed earlier ...
Using Firefox's Google search box, I typed "bananas gone wild" (pervert!) and got the following URL:
https://www.google.com.au/search?q=bananas+gone+wild&client=firefox-a&hs=iJz&rls=org.mozilla:en-US:official&channel=sb&gbv=1&sei=Sc82VJGYBsT58QXBgYLYAw

c:\Python27>python google-ei-time.py -e Sc82VJGYBsT58QXBgYLYAw
Running google-ei-time.py v2014-10-10

Input ei term = Sc82VJGYBsT58QXBgYLYAw
Padded base64 string = Sc82VJGYBsT58QXBgYLYAw==
Extracted timestamp = 1412878153
Human readable timestamp (UTC) = 2014-10-09T18:09:13

c:\Python27>

The output seems correct after taking into account the timezone and daylight savings time difference.
I've also tested it using other "ei" values from various searches I've done locally but there's not much point boring you any further with those.

Discrepancy Issue:
Phillip got the Deed Poll Office's PHP script function working, however its output differed from our script's output for some of the same input test data. Uh-oh!
Specifically, there was a discrepancy in the extracted timestamp values whenever there's a "-" or "_" around the start of the input "ei" value.
I'm a complete novice to PHP but it looks like they might have their URL-friendly substitutions around the wrong way?
According to W3schools, the syntax for PHP's "str_replace" is
str_replace(find, replace, string, count)

Where:
find =  the value to find,
replace = the value to replace the value in find
string = the string to be searched
count = Optional. A variable that counts the number of replacements

And according to the PHP website entry for str_replace():
If find and replace are arrays, then str_replace() takes a value from each array and uses them to search and replace.

So it looks like this line at the start of the Deed Poll Office function:
 $ei = base64_decode(str_replace(array('_', '-'), array('+', '/'), $ei));
is replacing "_" with "+" and "-" with "/" before calling base64_decode.

According to Wikipedia's entry on base64 encoding mentioned earlier, modified Base64 for URL variants exist where:
 the '+' and '/' characters of standard Base64 are respectively replaced by '-' and '_'

So following that logic, decoding URL safe base64 (containing "-" and "_") would involve substituting "+" for "-" and "/" for "_". Which is not what the Deed Poll function seems to be doing ... Let us know in the comments if you disagree?

As our script uses Python's base64.urlsafe_b64decode function to perform the substitution and base64 decode, I'm quietly confident in its output.

Anyhoo, be wary that any "ei" value containing "_" or "-" at the start of the string will result in that discrepancy.
When the "-" and/or "_" characters occur towards the end of the "ei" string, they don't seem to affect the timestamp (which occurs at the beginning of the string) and so the PHP script output seems to match our script's output.
When there's no "-" or "_" characters in the input "ei" string, the two script outputs also seem to match OK.
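To see how the two substitution orders can diverge, here is a small Python 3 sketch that decodes the same string both ways. The "ei" value is a made-up one built for this demo (it was not captured from Google), and the function and variable names are hypothetical. It only shows that the mappings give different numbers; it can't tell us what Google's servers actually intended.

```python
import base64
import struct


def first_uint32(ei, table):
    """Apply a character substitution table, base64 decode, and read 4 little-endian bytes."""
    padded = ei + "=" * ((-len(ei)) % 4)
    raw = base64.b64decode(padded.translate(table))
    return struct.unpack("<I", raw[:4])[0]


# "-" -> "+" and "_" -> "/": the mapping described in the Wikipedia quote.
standard = str.maketrans("-_", "+/")
# "_" -> "+" and "-" -> "/": the mapping the PHP line appears to apply.
swapped = str.maketrans("_-", "+/")

# Hypothetical demo value: the 4 bytes FB CF 36 54 encode to "-882VA"
# in URL safe base64 (with the "==" padding dropped).
demo_ei = "-882VA"

print(first_uint32(demo_ei, standard))  # 1412878331
print(first_uint32(demo_ei, swapped))   # 1412878335
```

In this made-up case the "-" lands in the first character, so the two results differ by 4 seconds. A "-" or "_" that falls in a character covering higher bytes of the timestamp would produce a bigger gap.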

Because the "ei" value is based on a Google server's time, there doesn't appear to be an easy way to confirm which script is correct (from this end anyway).

Final Thoughts


Thanks to Phillip Moore's suggestion and testing, we now have a Python script that can take a Google search URL with an "ei" parameter and return a human readable timestamp of when that search occurred. The script also seems to extract valid timestamp values for Google "sei" parameters. More research about when the "ei" parameter occurs would be nice but just finding that "ei" parameter should allow you to date that search session.
There are also some discrepancies between what the Deed Poll Office PHP script outputs and what our script outputs whenever "_" and "-" characters are contained at the start of the "ei" input string.

Whew! Three blog posts in a week - a new personal best. Now if I could only stop going cross eyed ...