Cheeky4n6Monkey - Learning About Digital Forensics: April 2012

Tuesday, 10 April 2012

I Thought I Saw A Twittering TweetDeck Monkey

Girl, Unallocated recently asked why I wasn't on Twitter. I'll admit that when Twitter first came out I was pretty skeptical - "Who would want to read about my over-ripe bananas?" and "140 characters isn't nearly enough to insult someone (creatively)". Not having a Twitter capable mobile device was just another nail in the coffin. However, after GU's recent questioning, I've since discovered that a lot of forensicky update notifications occur on Twitter and it's also a great way to ~~stalk~~ meet new people (I am obviously joking ... I don't really care about forensicky updates ;).
Anyhoo, I started looking into a couple of PC Twitter clients called TweetDeck and MetroTwit. A recent Youtube demo video showed MetroTwit using 600 Mb RAM just in normal ops. When you consider that the ancient WinXP PC I'm using has 2 GB RAM and I also like to run SIFT on VMWare, that made the decision pretty easy. Hello, TweetDeck! *I also noticed that Corey Harrell has used it - so I know at least one other person might find this post useful/semi-interesting ;)

I did a brief check after installation, and TweetDeck "only" uses ~100 Mb. A bit steep for 140 character messages, but methinks my trusty old PC can live with that. Also, as TweetDeck has been bought by Twitter, you kind of hope they have their functionality and security all sorted. Except ... recently there was an issue with TweetDeck granting a user access to other people's TweetDeck/Facebook profiles. This user wasn't actively trying to subvert the system, he just opened up TweetDeck and there it all was. So TweetDeck shut down for a while and they (presumably) fixed the issue(s). Not particularly inspiring stuff eh?

Which brings me to the topic of this post - I was initially interested to see what artifacts (if any) TweetDeck leaves behind. And it was while monkeying around with TweetDeck that I discovered that the Twitter web search functions also return some interesting information.

TweetDeck v1.3 (on WinXP)

The first thing TweetDeck wants you to do after installing, is register an email address and password for the TweetDeck interface. Chances are, most people will use the same login/password as their Twitter account. Tsk, tsk.

After launching TweetDeck and adding my Twitter profile, I ran SysInternals TCPView and found that TweetDeck connects via TCP over port 443 (HTTPS) to both "twitter.com" and "amazon.com" domains. Presumably, Twitter has outsourced their storage of tweets/images to Amazon's cloud storage. These connections seemed to timeout if nothing is browsed.
I wasn't going to go hardcore and network capture the traffic because a shortish time ago, in a University not so far away, Mike Wilkinson told my Network Forensics class that intercepting live network traffic is best left to the cats with warrants (in this part of The Empire anyway).

So then I ran SysInternals ProcMon (v2.96) and tried to see if TweetDeck was reading/writing any account info to the Windows Registry ... No such luck! Either I missed the entries (not uncommon) or they just weren't there. I did however, notice that the following file got accessed quite a lot:

C:\Documents and Settings\USER\Local Settings\Application Data\twitter\TweetDeck\localStorage\qrc__0.localstorage

I have obviously changed the profile name to protect the guilty ;)

So firing up our trusty WinHex hex editor revealed the following file header snippet in "qrc__0.localstorage":

SQLite3 DB

Look familiar? Those sneaky TweetDeck Developers!
So I copied the "qrc__0.localstorage" over to SANS SIFT to play around with it a little more (using the SQLite Manager Firefox Plugin) and here's what I found:
- The file contains one SQLite table called "ItemTable"
- "ItemTable" consists of a "rowid", "key" and "value" column fields
- "rowid" seems to increase with use - presumably as key/value fields are updated

Most of the "value" fields appear to be JSON encoded. More information about JSON is available here. But the 1 minute summary is:

- JSON is an acronym for "JavaScript Object Notation"
- JSON is a text based encoding system based on name/value pairs
- Possible value types are: strings (delimited by ""), numbers (base 10, can be negative/decimal fractions/base 10 exponentials), objects, arrays, true, false, null
- An object is defined using { } to enclose a name separated by a colon and then the value.
eg {object_name: object_value}
- An array is defined using [ ] to enclose a series of comma separated values.
eg [array_value1, array_value2]
- It is possible to nest values. So as we will see later, you can have an array of tweet objects each of which has an array of key/value objects. This can make it difficult to sort out all the different brackets and commas.

So what are the actual key/value fields in the "ItemTable" database?

Normally, there are 7 key/value pairs in the "ItemTable" database table:
"tweetdeck_account" = An "email" object representing the TweetDeck user's login (eg "user@email.com").
"columns" = Suspect it represents user column arrangement preferences for the GUI.
"hoard" = An object with it's name derived from the "email" object value (eg "user@email.com"). Used to store a 312 character alphanumeric string which I suspect is a hash/encrypted string (or maybe a group of such strings).
"clients" = Several objects representing account information plus the last 4 TweetDeck search terms.
"__version__" = 2.
"_session" = A 44 character alphanumeric string which probably represents a session key. It seems to change with each launch.
"feeds" = Suspect it has to do with the user defined columns each having its own data connection.

Additional testing revealed that if "Keep Me Signed In" is ticked on the TweetDeck login page and then TweetDeck is closed, these artifacts remain readable in "ItemTable" (using SQLite Manager).
If TweetDeck is closed and "Keep Me Signed In" is NOT ticked, then "ItemTable" appears empty (in SQLite Manager) . However, the data is still appears to be (partially) viewable with WinHex (looking at "qrc__0.localstorage").

By chance, I copied an active/open "qrc__0.localstorage" to SIFT and noticed that there were 2 extra key/value fields in "ItemTable" when TweetDeck is running. These field names are prepended with a ":" (indicating they're in shared memory) and are called:
":CACHED_HASHTAGS" = Had no values
":CACHED_TWITTER_USERS_V2" = Appears to be a cache of Twitter user profiles that the TweetDeck user is "following". Example format follows:
{"lowercase_handle1":["123456781", "CapitalizedHandle1", "Proper Name1", "http link to profile1 pic"], "lowercase_handle2":["123456782", "CapitalizedHandle2", "Proper Name2", "http link to profile2 pic"]}

Note: The numerical fields turn out to be Twitter user ID numbers.

I was only able to see these additional fields after copying an active "qrc__0.localstorage". The weird thing was, after closing/reopening/closing (staying logged in) TweetDeck, I was still able to see these fields in both SQLite Manager and WinHex. After I subsequently opened/logged out/closed TweetDeck, these fields were no longer visible in SQLite Manager but some of the data still appears viewable with WinHex.
Perhaps copying the active file meant the contents of shared memory were also auto-magically written to the new file copy. The subsequent opening(s) of this database file then showed the previously recorded ":" fields in the database.
And perhaps TweetDeck/SQLite doesn't delete/overwrite all the data in a table when "deleting" a table's contents (eg after logging out). Perhaps it just deletes the addressing mechanism? Hence, while you can't see the table in SQLite Manager, you can still see some data using WinHex? To quote Doris Day et al: "Perhaps, perhaps, perhaps".

OK, so we've taken a look at "ItemTable" - now what?

One of the more interesting things I noticed in "ItemTable" was that the "columns", "clients" and "feeds" fields mention a recurring multiple digit number sequence throughout each. There were also similar looking numbers present in ":CACHED_TWITTER_USERS_V2".
Suspecting this number to be a Twitter user ID number, I went to the Twitter Developer site and found this article regarding the Twitter REST API (API = Application Programming Interface). It lists lots of searches that you can perform from the comfort of your favourite browser. These search queries are run against the Twitter database via a Twitter web interface. The number of searches may be limited per hour and may not be all-inclusive.
So to confirm that the multi-digit number I was seeing in my "ItemTable" table was a valid Twitter user ID number, I typed the following into Firefox on SIFT.

https://api.twitter.com/1/users/lookup.json?user_id=XXXXXXXXX
where XXXXXXXXX = Twitter user ID number

SIFT then auto-magically launched a save dialog for the returned JSON file. I then used gedit to display it and saw a bunch of JSON text. The first key/value field was "id":XXXXXXXXX followed later by the corresponding "name" and "screen_name" values of the TweetDeck user. The JSON also showed various other profile info which is described later in the testing section below.

So, given a viable TweetDeck "qrc__0.localstorage" database file, we can derive a TweetDeck user's Twitter user ID number. We can then use the Twitter REST API with that user ID number to determine that user's handle name. Subsequently, I also ran the lookup query using the numbers found in the ":CACHED_TWITTER_USERS_V2" user profiles. These numbers were found to correspond to their surrounding handle name.

More Twitter REST API Searches

We can also use the Twitter user ID number/handle to perform other queries with the Twitter REST API:

For retrieving a Twitter user's Profile info we can use:
https://api.twitter.com/1/users/show.json?user_id=XXXXXXXXX
OR
https://api.twitter.com/1/users/show.json?screen_name=TwitterHandle
Note: the "screen_name" argument is not case sensitive
For more info see here.

To list a Twitter user's "follower" ID numbers (eg who is following user_id=XXXXXXXXX?) we can use:
https://api.twitter.com/1/followers/ids.json?user_id=XXXXXXXXX
OR
https://api.twitter.com/1/followers/ids.json?screen_name=TwitterHandle
For more info see here.

To list a Twitter user's friend ID numbers (eg who is user_id=XXXXXXXXX following?) we can use:
https://api.twitter.com/1/friends/ids.json?user_id=XXXXXXXXX
OR
https://api.twitter.com/1/friends/ids.json?screen_name=TwitterHandle
For more info see here.

Given a list of Twitter user ID numbers (eg from a friend/follower lookup), you can use a comma separated list to retrieve each Twitter user's profile info via:
https://api.twitter.com/1/users/lookup.json?user_id=XXXXXXXXX, YYYYYYYYY, ZZZZZZZZZ
For more info see here.

You can also search Twitter by term:
http://search.twitter.com/search.json?q=TwitterHandle
This will return the last weeks worth of tweets containing the given TwitterHandle (eg tweets at/from/mentioning). It might not return 100% of all possible results. For more info see here.
Alternatively, you can perform a conventional web search at:
http://twitter.com/#!/search
This will return the results within the browser. If you need to record these results, having a JSON file (via the first search method) might make documentation easier rather than a screenshot/printout for the second method.

Additional Resources (Stuff I didn't know where else to put)

This article describes how each tweet has its own unique ID number.
This website allows you to retrieve a Twitter user's handle (eg Batman) from their Twitter user ID number or vice-versa. The website also states that the numeric user ID is like a primary key (unique number) that does not change. So if a user changes their Twitter handle, they can still be found via their unique ID number.
I also found out about these similar sites (here and here) which translate from a Twitter user's handle to a Twitter user's ID number (one way only).
From my limited testing, the results from these sites appear to be consistent with each other. Given a choice however, I would trust the Twitter REST API over these webpages. All that stuff about about straight from the horses mouth etc.
Whilst doing a search for TweetDeck artifacts, I came across this since superseded article. It describes the database schema for a previous version of TweetDeck. It's interesting that the developers have since changed to a more generic (and harder to understand) schema of key/JSON value pairs. An attempt at extra security through obscurity?

Enter the Twit-Monkey!

Manually parsing through all this JSON text was making me cross(eyed) and I vaguely remember Harlan Carvey mentioning he had written a Perl script that read JSONs. Just FYI, I'm going for a name drop record in this post :)
Anyhoo, I thought it would take me as long to read through some of these search results as it would to write a Perl script to print out JSONs in a more readable format (how's that for cocky?). So then I *strutted* to CPAN and found the aptly named JSON Perl package. This package has functions to both encode/decode JSONs and best of all, it has a function to "pretty print" JSONs in a more human readable format.

To install it on SIFT I typed:
"sudo cpan JSON"
and then said yes to any dependency installs.

Now we should be ready to write the script. As usual, my code is pretty hack-tacular but it seems to work OK for each of the search types listed above.

The Code

#CODE BEGINS ON LINE BELOW
#!/usr/bin/perl -w

# Perl script to pretty print JSON files

use strict;

use Getopt::Long;
use JSON;

my $version = "json-printer.pl v2012-04-08";
my $help = 0;
my $filename="";
my @jsonlist;

GetOptions('help|h' => \$help,
            'f=s' => \$filename);

if ($help || $filename eq "")
{
    print("\nHelp for $version\n\n");
    print("Perl script to pretty print JSON files.\n");
    print("Example: json-printer.pl -f search-5.json\n");
    exit;
}

open(my $tfile, "<", $filename) or die "Can't Open $filename JSON File!";
@jsonlist = <$tfile>; # extract each line to a list element
chomp(@jsonlist); # get rid of newlines
close($tfile);

my $json_string = join(' ', @jsonlist); # join list elements into one big happy scalar string

my $json = JSON->new->allow_nonref; # create a new JSON object which converts non-references into their values for encoding
my $perl_scalar = $json->decode($json_string); # converts JSON string into Perl hash(es)

# at this point (if you wanted to) you can add code to iterate thru the hash(es) to extract/use values.

my $pretty_printed = $json->pretty->encode( $perl_scalar ); # re-encode the hash just so we can then pretty print it (hack-tacular!)
print $pretty_printed;

#CODE ENDS ON LINE ABOVE

Code Summary

This short script ("json-printer.pl") lets the JSON Perl package do all the heavy lifting. First, it reads the user specified JSON file into a scalar string ("$json_string") and then it calls "$json->decode" to convert the string into a Perl hash data structure. The script then calls "$json->pretty->encode" to convert that hash data structure into a more human readable string format. It's a round-about way of doing things but it seems to work.

Note: "$perl_scalar" is a reference to a hash data structure. If you wanted to, you could use it to iterate through the hash structure and retrieve any of the individual values.

Code Testing

You'll just have to trust me on this and/or try out the code yourself. The script above is able to read the JSON data returned by the various Twitter REST API searches listed previously.
I don't feel comfortable displaying other people's tweeting information here (eg their user id, what method they use to tweet). Its probably no more detailed than you could get by searching www.twitter.com/search but I just don't think it's my place.

Anyhoo, as some of the REST API searches mentioned previously return information about individual tweets (eg in a user profile's "status" field), I thought it would be interesting to have a look at these returned tweet fields (without divulging real data). Each tweet in a JSON appears to have the following format:

"to_user_id" = Destination Twitter user ID number (in unquoted numeric format).
"source" = Source URL which can show how tweet was posted (eg via twitter.com, tweetdeck, tweethopper etc). Some URL arguments may also list the device (eg iPad, Android).
"profile_image_url" = HTTP URL showing the source's avatar filename.
"profile_image_url_https" = HTTPS URL showing the source's avatar filename.
"created_at" = Date and Time of tweet creation.
"text" = Tweet text including @ addressing (eg "@Superman How do you keep your tights so clean?").
"id" = Unique tweet ID number (in "" quoted string format).
"in_reply_to_status_id" = When available, shows the tweet ID number this tweet is replying to (in "" quoted string format).
"from_user_id_str" = Source's Twitter user ID number (in "" quoted string format).
"from_user" = Source's Twitter user handle (eg "Batman").
"geo" = Source's(?) location information.
"from_user_id" = Source's Twitter user ID number (in unquoted numeric format).
"id_str" = Unique tweet ID number (in "" quoted string format).
"iso_language_code" = Presumably the language used to tweet (eg "en" = English).
"to_user_id_str" = Destination's Twitter user ID number (in "" quoted string format). Can also be null.
"to_user" = Destination's Twitter user handle string (eg "Superman"). Can also be null.
"from_user_name" = Source's Twitter user personal name string (eg "Bruce Wayne")
"to_user_name" = Destination's Twitter user personal name string (eg "Clark Kent"). Can also be null.

Note: This list is not all inclusive (eg there's some optional reply fields I've left out).

Twitter help uses the term "Username" to denote the addressing mechanism name (eg @Batman). It then uses "Name" to represent your personal name ID (eg "Bruce Wayne"). Got it? Now here's where it might get a little confusing ...
In the above tweet message, the "to_user_name" and "from_user_name" contain the personal ID Names (eg "Bruce Wayne"). The "to_user" and "from_user" contain the addressing mechanism names (eg "Batman").

Thanks to Frank McClain for sending me a "Mice" tweet to analyse/help confirm my findings. Quack-tacular!

Some of the REST API searches mentioned previously may return a Twitter user's Profile information. Here are some selected Profile fields I found interesting:

"friends_count" = How many Twitter users this user is following (in unquoted numeric format).
"profile_image_url" = HTTP URL string showing user's avatar image filename.
"profile_background_image_url_https" = HTTPS URL string showing user's profile background image filename.
"url" = User's website URL string (eg "www.blog.com").
"id" = User's Twitter user ID number (in unquoted numeric format).
"screen_name" = User's handle/screen name (eg "Batman").
"location" = User's location string (eg "Gotham").
"lang" = Presumably the user's tweet language (eg "en" for English).
"followers_count" = How many Twitter users are following this user (in unquoted numeric format).
"name" = User's personal name ID (eg "Bruce Wayne").
"description" = User's witty description of themselves (eg "The Dark Knight").
"status" = Object containing the user's most recent (unprotected) sent tweet (see above tweet format for more details).
"profile_image_url_https" = HTTPS URL string showing user's avatar image filename.
"created_at" = Date and Time when the user's account was created.
"utc_offset" = Appears to be the numeric timezone offset in seconds from GMT (eg -21600).
"profile_background_image_url" = HTTP URL string showing user's profile background image filename.
"protected" = Boolean presumably representing whether the user's tweets are protected (eg true/false).
"id_str" = User's Twitter user ID number (in "" quoted string format).
"time_zone" = Appears to be string based representation of time zone (eg "Central Time (US & Canada)").
"geo_enabled" = Boolean presumably representing if the user's location is disclosed.
"profile_use_background_image" = Boolean representing if the user is using their background profile picture.

Adding to the name confusion mentioned earlier, we now see the "screen_name" and "name" keys. In this case, "screen_name" seems to refer to the addressing mechanism name (eg "Batman") and "name" refers to the personal ID name (eg "Bruce Wayne"). And while not explicitly present in these fields, there's also the commonly used "handle" term (eg "Batman"). Also note that "id" in a tweet message refers to the unique tweet ID number where as "id" in a profile represents a Twitter user's ID number ... Consistency? Pffft! We don't need no stinkin' consistency! Everyone's a critic huh?

Summary

We've installed TweetDeck and seen that it can store/leave behind the last 4 user search terms and multiple Twitter user ID number artifacts. These user ID numbers can then be used to find corresponding Twitter handles/addressing mechanisms via a Twitter REST API web search. The Twitter REST API can also be used to retrieve other information about the user's followers/leaders/profile.
We have also coded a small Perl script ("json-printer.pl") to make it easier to read the JSON formatted results returned from the Twitter REST API.
Any comments/suggestions/requests for further name droppage (the record now stands at 5 / 6 if you count Doris) are welcome.
FYI This will probably be my last post for a few weeks (month?) as I have to swap my PC for some renovation work boots. I will still be contactable via email / Twitter though. Hopefully, I'll be back with some new ideas and all of my digits in the near future ;)

Tuesday, 3 April 2012

Creating a RegRipper Plugins Maintenance Perl Script

I recently asked Corey Harrell about any unfulfilled programming ideas he might have and he told me about his idea for a RegRipper Plugins maintenance Perl script.

He was after a script that would go into the plugins directory and then tell the user what plugins were missing from the various "Plugins" text files. Note: I am using "Plugins" to refer to the "ntuser", "software", "sam", "system", "security" and "all" text files. These contain a list of Perl (*.pl) plugins that a user can use to batch run against a particular hive.
He also mentioned that it might be handy to be able to exclude certain files (eg malware plugins) from being considered so if a user has deleted certain entries, they aren't constantly being reminded "Hey! There's a difference between what's in the "Plugins" file and what's actually available (ie the *.pl plugins)".

UPDATE: This script was not intended to be used to blindly load every possible *.pl plugin into their relevant "Plugins" text file. It was intended for use as a diagnostic tool "What (*.pl) plugins are missing from my current "Plugins" file?" (eg "ntuser"). Users can then look at the missing plugins list and determine if they should/should not include them in their "Plugins" text file.

UPDATE: Script ("PrintMissingPlugins" function) has been updated so commented-out entries (eg #ccleaner) in "Plugins" text files are not used in comparisons for missing *.pl plugins. Thanks Francesco!

UPDATE: Script ("PrintMissingPlugins" function) has been updated so blank lines in "Plugins" text files are not used in comparisons for missing *.pl plugins Have also updated the Code and Testing sections.

So let's see how we can implement Corey's idea ...

The Code

# CODE STARTS ON LINE BELOW
#!/usr/bin/perl
# Note: I removed the "-w" above because the eval/require/getHive section was generating numerous variable masking warnings
# This shouldn't affect us as we only call the plugin's getHive method.

# Perl script to list updates for RegRipper plugin files (ntuser, software, sam, system, security, all)
# Original Idea by Corey Harrell
# Coded by cheeky4n6monkey@gmail.com
# Note: Probably best if you do NOT run this script from the RegRipper Plugins directory
# as it could affect the *.pl plugin processing.

# Created (2012-04-02)
# Modified (2012-04-03) for handling "#" commented out lines in "Plugins" text files (Thanks Francesco!)
# Modified (2012-04-04) for handling blank lines in "Plugins" text files (Thanks Francesco!)
# Modified (2012-04-04) re-instated "No missing plugins message"

use strict;

use Getopt::Long;
use File::Spec::Functions qw/catfile/;

my $version = "regripper-maint.pl v2012-04-04";
my $help = 0;
my $plugindir = "";
my $excludefile = "";
my @pluginslist;

GetOptions('help|h' => \$help,
    'dir=s' => \$plugindir,
    'x=s' => \$excludefile,
    'p=s@' => \@pluginslist);

if ($help || @pluginslist == 0 || $plugindir eq "")
{
    print("\nHelp for $version\n\n");
    print("Perl script to list discrepancies for RegRipper Plugin files (ntuser, software, sam, system, security, all)\n");
    print("\nUsage: regripper-maint.pl [-h|help] [-dir plugins_dir] [-x exclude_files] [-p plugins_list]\n");
    print("-h|help ............ Help (print this information). Does not run anything else.\n");
    print("-dir plugins_dir ... Plugins directory.\n");
    print("-x exclude_file .... Exclude text file which lists plugins to exclude from any checks.\n");
    print("-p plugins_list .... Plugins file(s) to be checked (eg all, ntuser).\n");
    print("\nExamples: \n");
    print("regripper-maint.pl -dir /usr/local/src/regripper/plugins/ -p ntuser -p sam\n");
    print("regripper-maint.pl -dir /usr/local/src/regripper/plugins/ -x regripper-maint-exclude -p all\n\n");
    exit;
}

print "\nRunning $version\n\n";

# List of plugins derived from the default (ntuser, software, sam, system, security, all) plugins text files
my @ntuser_plugins;
my @software_plugins;
my @sam_plugins;
my @system_plugins;
my @security_plugins;
my @all_plugins;

# Directory file listing of *.pl files in user's plugins directory
my @readinplugins;
# @readinplugins list broken up by hive
my @ntuser_actual;
my @software_actual;
my @sam_actual;
my @system_actual;
my @security_actual;
my @all_actual;

my @excludelist;

# Extract entries from user nominated exclude file
if (not $excludefile eq "")
{
    open(my $xfile, "<", $excludefile) or die "Can't Open $excludefile Exclude File!";
    @excludelist = <$xfile>; # extract each line to a list element
    chomp(@excludelist); # get rid of newlines
    close($xfile);
    foreach my $ig (@excludelist)
    {
        print "Ignoring the $ig plugin for any comparisons\n";
    }
    print "\n";
}

# Read in the entries in the default Plugins text file(s)
# Plugin files have lowercase names
foreach my $plugin (@pluginslist)
{
    open(my $pluginsfile, "<", catfile($plugindir,$plugin) ) or die "Can't Open $plugin Plugins File!";
    if ($plugin =~ /ntuser/)
    {
        print "Reading the ntuser Plugins File\n";
        @ntuser_plugins = <$pluginsfile>; # extract each line to a list element
        chomp(@ntuser_plugins); # get rid of newlines
    }
    if ($plugin =~ /software/)
    {
        print "Reading the software Plugins File\n";
        @software_plugins = <$pluginsfile>;
        chomp(@software_plugins);
    }
    if ($plugin =~ /sam/)
    {
        print "Reading the sam Plugins File\n";
        @sam_plugins = <$pluginsfile>;
        chomp(@sam_plugins);
    }
    if ($plugin =~ /system/)
    {
        print "Reading the system Plugins File\n";
        @system_plugins = <$pluginsfile>;
        chomp(@system_plugins);
    }
    if ($plugin =~ /security/)
    {
        print "Reading the security Plugins File\n";
        @security_plugins = <$pluginsfile>;
        chomp(@security_plugins);
    }
    if ($plugin =~ /all/)
    {
        print "Reading the all Plugins File\n";
        @all_plugins = <$pluginsfile>;
        chomp(@all_plugins);
    }
    close $pluginsfile;
}

# This code for determining a package's hive was cut / pasted / edited from "rip.pl" lines 42-67.
# Reads in the *.pl plugin files from the plugins directory and store plugin names in hive related "actual" lists
# Note: the "all_actual" list will later be a concatenation of all of the "actual" lists populated below
opendir(DIR, $plugindir) || die "Could Not Open $plugindir: $!\n";
@readinplugins = readdir(DIR);
closedir(DIR);

foreach my $p (@readinplugins)
{
    my $hive;
    next unless ($p =~ m/\.pl$/); # gonna skip any files which don't end in .pl
    my $pkg = (split(/\./,$p,2))[0]; # extract the package name (by removing the .pl)
    $p = catfile($plugindir, $p); # catfile is used to create absolute path filename (from File::Spec::Functions)
    eval
    {
        require $p; # "require" needs to be inside an eval in order to import package functions ?
        $hive = $pkg->getHive(); # hive name should be UPPERCASE but could be mixed *sigh*
    };
    print "Error: $@\n" if ($@);

    if ($hive =~ /NTUSER/i )
    {
        push(@ntuser_actual, $pkg);
    }
    elsif ($hive =~ /SOFTWARE/i )
    {
        push(@software_actual, $pkg);
    }
    elsif ($hive =~ /SAM/i )
    {
        push(@sam_actual, $pkg);
    }
    elsif ($hive =~ /SYSTEM/i )
    {
        push(@system_actual, $pkg);
    }
    elsif ($hive =~ /SECURITY/i )
    {
        push(@security_actual, $pkg);
    }
    elsif ($hive =~ /ALL/i )
    {
        push(@all_actual, $pkg); # some .pl plugins have "All" listed as their hive
    }
}

# Calls PrintMissingPlugins to compare a Plugins text file list with an "actual" *.pl plugins list
foreach my $plugin (@pluginslist)
{
    if ($plugin =~ /ntuser/)
    {
        PrintMissingPlugins("NTUSER", "ntuser", \@ntuser_actual, \@ntuser_plugins);
    }
    if ($plugin =~ /software/)
    {
        PrintMissingPlugins("SOFTWARE", "software", \@software_actual, \@software_plugins);
    }
    if ($plugin =~ /sam/)
    {
        PrintMissingPlugins("SAM", "sam", \@sam_actual, \@sam_plugins);
    }
    if ($plugin =~ /system/)
    {
        PrintMissingPlugins("SYSTEM", "system", \@system_actual, \@system_plugins);
    }
    if ($plugin =~ /security/)
    {
        PrintMissingPlugins("SECURITY", "security", \@security_actual, \@security_plugins);
    }
    if ($plugin =~ /all/)
    {
        PrintMissingPlugins("ALL", "all", \@all_actual, \@all_plugins);
    }
}

# End Main

sub PrintMissingPlugins
{
    my $hive = shift; # hive name
    my $name = shift; # Plugins text file name
    my $actual_plugins = shift; # reference to list of plugins derived from *.pl files
    my $listed_plugins = shift; # reference to list of plugins derived from Plugins text file

    my @missing_plugins; # list stores *.pl files which are NOT declared in a given Plugins text file
    my @missing_pl; # list stores Plugin entries which do NOT have a corresponding .pl file

    print "\nThere are ".scalar(@$actual_plugins)." $hive plugins in $plugindir\n";
    # We say "lines" because there can accidental multiple declarations *sigh*
    print "There are ".scalar(@$listed_plugins)." plugin lines listed in the $name Plugins file\n";
    print scalar(@excludelist)." plugins are being ignored\n";

    # Handle the "all" Plugin case discrepancy
    # There's a large mixture of different hive plugins listed in the "all" Plugins text file
    # and only a handful of plugins who actually return "All" as their hive.
    # At this point, @all_actual should only contain plugins which list "All" as their hive
    # In a fit of hacktacular-ness, we'll now also add the contents from the other "actual" arrays to @all_actual.
    # Otherwise, when we compare the list of "All" hive plugins (@$actual_plugins) with
    # the lines in the "all" Plugin (@$listed_plugins), there will be a lot of differences reported.
    if ($hive eq "ALL")
    {
        push(@$actual_plugins, @ntuser_actual);
        push(@$actual_plugins, @software_actual);
        push(@$actual_plugins, @sam_actual);
        push(@$actual_plugins, @system_actual);
        push(@$actual_plugins, @security_actual);
    }
    # From here on, @all_actual / @$actual_plugins will contain a list of every processed type of .pl plugin file

    # For each *.pl plugin file, check that it has a corresponding entry in the given Plugins text file
    foreach my $pkg (@$actual_plugins)
    {
        my $res = scalar(grep(/^($pkg)$/, @$listed_plugins)); # is this .pl listed in Plugins file ?
        my $ignore = scalar(grep(/^($pkg)$/, @excludelist)); # is this .pl being ignored ?
        if ( ($res eq 0) and ($ignore eq 0) )
        {
            push(@missing_plugins, $pkg);
        }
    }
    if (@missing_plugins)
    {
        print "\nThere are ".scalar(@missing_plugins)." plugins missing from the $name Plugins file:\n";
        foreach my $miss (@missing_plugins)
        {
            print $miss."\n";
        }
    }

    # For each line in the Plugins text file, check that it has a corresponding *.pl file
    foreach my $plug (@$listed_plugins)
    {
        # If this Plugins entry has been commented out (by a preceding "#") OR if it starts with a whitespace (newline),
        # skip to next entry so we don't get told there's no corresponding .pl file
        if ( ($plug =~ /^#/) or ( not $plug =~ /^\S/ ) )
        {
            next;
        }
        my $res = scalar (grep(/^($plug)$/, @$actual_plugins)); # does this Plugin entry have a corresponding .pl file ?
        my $ignore = scalar(grep(/^($plug)$/, @excludelist)); # is this Plugin entry being ignored ?
        if ( ($res eq 0) and ($ignore eq 0) )
        {
            push(@missing_pl, $plug);
        }
    }
    if (@missing_pl)
    {
        print "\nThere are ".scalar(@missing_pl)." plugins declared in the $name Plugins file with no corresponding .pl file:\n";
        foreach my $miss (@missing_pl)
        {
            print $miss."\n";
        }
    }

    if ( (@missing_plugins eq 0) and (@missing_pl eq 0) )
    {
        print "No missing plugins detected\n";
    }

}

# CODE ENDS HERE

Code Summary

Here's a high level overview of the code:
- Parses command line arguments using "GetOptions"
- Prints out Help message (if required)
- Extract entries from user nominated exclude text file (if one is specified)
- Read in the entries in the specified "Plugins" text file(s)
- Reads in the *.pl plugin names from the plugins directory and store them in hive related "actual" lists
- Calls "PrintMissingPlugins" subroutine to compare the differences between a "Plugins" text file list with an "actual" *.pl plugins list. Function also allows for commented out lines (eg "# COMMENT") and blank lines in "Plugin" files.

If you have RegRipper installed, you should already have the necessary Perl packages installed (eg File::Spec::Functions).

It's not the most concise/efficient code. If in doubt, I tried to make it more readable (at least for me!). I also made more comments in the code so I wouldn't have to write a lot in this section. I think that might prove more convenient than scrolling up/down between the summary and the code.

Finally, I should mention that I kept my "regripper-maint.pl" script (and my exclusion text file "regripper-maint-exclude") in "/home/sansforensics/". I didn't want the script to parse itself when looking for .pl plugin files. I suspect I could have just as easily called "regripper-maint.pl" from "/usr/local/src/regripper/". Meh.

Testing

We'll start testing (on SIFT v2.12 using the default RegRipper install) with the 2 examples given in the script's Help message.

sansforensics@SIFT-Workstation:~$ ./regripper-maint.pl -dir /usr/local/src/regripper/plugins/ -p ntuser -p sam

Running regripper-maint.pl v2012-04-04

Reading the ntuser Plugins File
Reading the sam Plugins File

There are 98 NTUSER plugins in /usr/local/src/regripper/plugins/
There are 97 plugin lines listed in the ntuser Plugins file
0 plugins are being ignored

There are 1 plugins missing from the ntuser Plugins file:
ccleaner

There are 1 SAM plugins in /usr/local/src/regripper/plugins/
There are 1 plugin lines listed in the sam Plugins file
0 plugins are being ignored
No missing plugins detected
sansforensics@SIFT-Workstation:~$

Looks OK! I have not added my "ccleaner.pl" script to the "ntuser" Plugins file so the results make sense. We can check the number of lines in the "ntuser" Plugins file using the following:

sansforensics@SIFT-Workstation:~$ wc -l /usr/local/src/regripper/plugins/ntuser
97 /usr/local/src/regripper/plugins/ntuser
sansforensics@SIFT-Workstation:~$

Just as reported, there's 97 lines in the "ntuser" Plugins file. As for the SAM results:

sansforensics@SIFT-Workstation:~$ wc -l /usr/local/src/regripper/plugins/sam
1 /usr/local/src/regripper/plugins/sam
sansforensics@SIFT-Workstation:~$

Which seems OK as there's only one "sam" Plugins entry ("samparse").
Now let's try the second help example ...

sansforensics@SIFT-Workstation:~$ ./regripper-maint.pl -dir /usr/local/src/regripper/plugins/ -x regripper-maint-exclude -p all

Running regripper-maint.pl v2012-04-04

Ignoring the ccleaner plugin for any comparisons

Reading the all Plugins File

There are 3 ALL plugins in /usr/local/src/regripper/plugins/
There are 204 plugin lines listed in the all Plugins file
1 plugins are being ignored

There are 2 plugins missing from the all Plugins file:
winlivemail
winlivemsn

There are 2 plugins declared in the all Plugins file with no corresponding .pl file:
port_dev
wlm_cu
sansforensics@SIFT-Workstation:~$

So this example shows the script referring to our "regripper-maint-exclude" exclusion file (which has one line containing "ccleaner"). Hence, it ignores any "ccleaner" plugin comparisons.
We can also see that while only 3 "ALL" *.pl plugins were found, there are 204 lines declared in the "all" Plugins file. The "all" Plugins file is a special case in that it can contain plugins which refer to more than one type of hive. See the code comments for more info on how I dealt with this (it ain't particularly pretty).
Anyhoo, we can also see that there are 2 .pl plugins which are NOT declared in the "all" Plugins file. Open the "all" file and verify it for yourself - there is no "winlivemail" or "winlivemsn" entry.
There are also 2 entries in the "all" Plugins file which don't have corresponding .pl files (ie "port_dev.pl" and "wlm_cu.pl" do not exist). That's gonna make it a bit hard to call those plugins eh?

And here is the same example test except WITHOUT using the "regripper-maint-exclude" exception file:

sansforensics@SIFT-Workstation:~$ ./regripper-maint.pl -dir /usr/local/src/regripper/plugins/ -p all

Running regripper-maint.pl v2012-04-04

Reading the all Plugins File

There are 3 ALL plugins in /usr/local/src/regripper/plugins/
There are 204 plugin lines listed in the all Plugins file
0 plugins are being ignored

There are 3 plugins missing from the all Plugins file:
winlivemail
winlivemsn
ccleaner

There are 2 plugins declared in the all Plugins file with no corresponding .pl file:
port_dev
wlm_cu
sansforensics@SIFT-Workstation:~$

You can see that the "ccleaner" plugin is now included in the missing plugins.
Just to prove this isn't all smoke and mirrors, here are the results for all 6 hives (with no exclusions):

sansforensics@SIFT-Workstation:~$ ./regripper-maint.pl -dir /usr/local/src/regripper/plugins/ -p ntuser -p software -p sam -p system -p security -p all

Running regripper-maint.pl v2012-04-04

Reading the ntuser Plugins File
Reading the software Plugins File
Reading the sam Plugins File
Reading the system Plugins File
Reading the security Plugins File
Reading the all Plugins File

There are 98 NTUSER plugins in /usr/local/src/regripper/plugins/
There are 97 plugin lines listed in the ntuser Plugins file
0 plugins are being ignored

There are 1 plugins missing from the ntuser Plugins file:
ccleaner

There are 54 SOFTWARE plugins in /usr/local/src/regripper/plugins/
There are 54 plugin lines listed in the software Plugins file
0 plugins are being ignored
No missing plugins detected

There are 1 SAM plugins in /usr/local/src/regripper/plugins/
There are 1 plugin lines listed in the sam Plugins file
0 plugins are being ignored
No missing plugins detected

There are 44 SYSTEM plugins in /usr/local/src/regripper/plugins/
There are 44 plugin lines listed in the system Plugins file
0 plugins are being ignored
No missing plugins detected

There are 3 SECURITY plugins in /usr/local/src/regripper/plugins/
There are 3 plugin lines listed in the security Plugins file
0 plugins are being ignored
No missing plugins detected

There are 3 ALL plugins in /usr/local/src/regripper/plugins/
There are 204 plugin lines listed in the all Plugins file
0 plugins are being ignored

There are 3 plugins missing from the all Plugins file:
winlivemail
winlivemsn
ccleaner

There are 2 plugins declared in the all Plugins file with no corresponding .pl file:
port_dev
wlm_cu
sansforensics@SIFT-Workstation:~$

I subsequently viewed the relevant Plugin files and confirmed the number of plugin lines declared matched what was printed above. I could also have used the "wc -l" trick mentioned previously as well. Meh.

Summary

Using Corey Harrell's idea, we've coded a RegRipper maintenance script which can detect discrepancies between whats declared in Plugin text files and what .pl plugins actually exist.
While this script doesn't help you if you don't use the Plugin text files, it has still been an interesting programming exercise. I can feel my Perl-fu growing stronger by the day ... muhahaha!

And for those that don't know, Cheeky4n6Monkey is now (intermittently) on Twitter (@Cheeky4n6Monkey) ... and co-incidentally(?) getting a lot less work done! Thankyou to all those who have already extended their welcome.

As always, please leave a comment if you found this post helpful / interesting / snake oil. PS I will send a copy of this script to the RegRipper folks just in case they find it useful (Thanks Brett and Francesco!).