When reverse engineering stuff you often get a directory tree with a whole bunch of files (both binaries and text files) and you want to quickly find all occurrences of the keywords you are interested in. Typical examples are program directories of applications, extracted apps or the root filesystem of an embedded system.
When reverse engineering Linux-based firmware images you typically start by extracting the root filesystem (or initrd) so that you can analyze the userspace programs, scripts and configuration files. There are already some good tutorials and tools like binwalk and firmware-mod-kit which automate many steps of finding and extracting the root filesystem from a binary firmware image. However, once you've got the root filesystem, you are often faced with a whole bunch of files and it can be quite difficult to find the interesting stuff to analyze. For instance, you may find a juicy configuration variable in /etc and want to find all references to it in the firmware. The standard grep utility does a good job on text files, but it isn't nearly as useful for binary files, which may still contain the keyword you are looking for. By default grep only says whether the keyword is present in a binary file; it doesn't display the context around the match (as it does for text files). Forcing grep to treat binaries as text files using the -a option doesn't solve the problem either, since grep will then output a whole bunch of binary data before and after the match (up to the next newline), and you probably don't want to see that binary data in your terminal.
But luckily there are a lot of useful standard tools available on a Linux system and you can cleverly combine them to overcome this limitation. I've come up with the following command for grepping through directory trees:
find . -type f -print0|xargs -0 strings -a --print-file-name|grep -i -E ':.*your_keyword_here'|less -S
The find command searches the current directory for files and prints the filenames to standard output, separated by null bytes. Using a null byte instead of a newline makes sure the pipeline doesn't fail if filenames in the tree contain special characters such as spaces or newlines. The filter "-type f" makes sure that it only finds regular files and not directories, symlinks, devices or unix domain sockets, which may exist in your directory tree as well and would cause problems with the following tools.
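A tiny demonstration of why the null separator matters; the scratch path /tmp/grep_demo is made up for this example:

```shell
# A filename containing a space would be split into two bogus arguments
# if find and xargs exchanged newline-separated names
mkdir -p /tmp/grep_demo/sub
printf 'secret_token=abc123\n' > '/tmp/grep_demo/sub/config file.txt'

# -print0 / -0 pass the name through as a single argument
find /tmp/grep_demo -type f -print0 | xargs -0 grep -l 'secret_token'
# prints: /tmp/grep_demo/sub/config file.txt
```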
The output of find is piped to xargs, which will call strings for all files found by the find command. The -0 option tells xargs that the input is separated by null bytes instead of newlines. The strings program looks through each file and outputs all sequences of at least 4 printable characters. Since grep processes the output of strings and not the actual files, grep can't show the filename of a match (as it does when using grep to recursively search a directory). Since you typically want to know which files your search results came from, you can use the --print-file-name option of strings so that the output contains the filename as well. The -a option of strings tells it to scan the whole file and not only certain sections of ELF files.
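Here is what the combined options produce for a small hand-made binary (the file path is invented for the demo):

```shell
# A keyword surrounded by raw bytes, as you would find it inside a binary
printf '\000\001\002admin_password\003\004' > /tmp/strings_demo.bin

# -a scans the whole file, --print-file-name prefixes every result with its origin
strings -a --print-file-name /tmp/strings_demo.bin
# prints: /tmp/strings_demo.bin: admin_password
```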
The next step is to use grep to filter the output of strings in order to search for a specific keyword. If you don't want to search case-insensitively, you'll have to remove the -i option of grep. Using the pattern ':.*' before the actual keyword makes sure that grep won't flood your search results with all strings of a file whose filename (which strings prepends) already contains the keyword you are searching for.
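The effect of the ':.*' anchor can be checked with two throwaway files (the names are invented):

```shell
# One file carries the keyword in its *name*, the other in its *content*
printf 'nothing interesting\n' > /tmp/password_list.txt
printf 'the password is hunter2\n' > /tmp/notes.txt

# Without ':.*' every string from password_list.txt would match because of its name;
# with the anchor, only lines whose string part contains the keyword survive
strings -a --print-file-name /tmp/password_list.txt /tmp/notes.txt | grep -i -E ':.*password'
# prints: /tmp/notes.txt: the password is hunter2
```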
Last but not least I recommend piping the results to less -S so that less will only use one line of the screen per result. This makes the results easier to interpret, especially if you have really long lines in the results (which occasionally happens with firmware images) and you don't want to have a hundred lines of wrapped text for one single search result. You can still see the full output lines by scrolling horizontally in less (or just use the search function of less to navigate to the actual keyword).
The search can take some time, especially for large directory trees. In that case you can easily speed up the process by saving the output of strings to a file:
find . -type f -print0|xargs -0 strings -a --print-file-name > /tmp/strings.txt
This intermediate result can then be used for many searches:
cat /tmp/strings.txt|grep -i -E ':.*your_keyword_here'|less -S
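Putting the caching variant together on a tiny made-up tree (/tmp/fw_root and the file names are invented for the demo):

```shell
mkdir -p /tmp/fw_root/etc
printf 'admin_user=root\n' > /tmp/fw_root/etc/cfg
printf '\000\000check_admin_user\000' > /tmp/fw_root/etc/helper.bin

# Extract all strings once ...
find /tmp/fw_root -type f -print0 | xargs -0 strings -a --print-file-name > /tmp/fw_strings.txt

# ... then run as many keyword searches against the cache as you like
grep -i -E ':.*admin_user' /tmp/fw_strings.txt
```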
A test with the 2.1 GB /usr/lib/ directory on my notebook created a 1.2 GB strings.txt, and searching this file takes about 10 seconds, provided the file is still cached in memory.
The same commands can also be used for other reversing targets such as program directories, extracted apps or even web applications (which may also include binary files like sqlite databases).
If you expect other character encodings such as UTF-16 (which is quite common for Windows applications), you will need to use the -e option of strings. The following command tries single-byte (ASCII/UTF-8), 16-bit little-endian (UTF-16) and 32-bit little-endian (UTF-32) encodings:
for enc in S l L;do find . -type f -print0|xargs -0 strings -e $enc --print-file-name;done > /tmp/strings.txt
cat /tmp/strings.txt|grep -i -E ':.*your_keyword_here'|less -S
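To see why the extra pass is needed, you can fabricate a UTF-16 sample by hand (no iconv required; the file name is made up):

```shell
# UTF-16LE stores ASCII characters as "byte, 0x00", so a plain strings run
# never sees four printable bytes in a row
printf 'h\000e\000l\000l\000o\000' > /tmp/utf16_demo.bin

strings /tmp/utf16_demo.bin        # finds nothing
strings -e l /tmp/utf16_demo.bin   # -e l = 16-bit little-endian, prints: hello
```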