How I Use the grep Command

What Is grep

grep stands for “global regular expression print”.
From the docs:

The grep utility searches any given input files, selecting lines that match one or more patterns.

To me, grep is basically a tool for filtering the output of another command. But as you will see later in this post, grep can also search the contents of files on its own.

Regular expressions (regex) can be scary to write, because the syntax is dense and easy to forget soon after you write it down. In most cases, though, you will filter for a plain term rather than an actual regex. That could be “cat”, “purr”, “.txt”, etc.

Basics

Basic syntax is:

grep [OPTIONS] PATTERN [FILE...]

grep can be used standalone or with another command piped into it. Below is an example that pipes the output of ls into grep. Note the $: lines that start with it are commands I typed, and everything else is the output of those commands.

$ ls -laFh
total 0
drwxr-xr-x  8 milosgarunovic  staff   256B Mar  2 20:08 ./
drwxr-xr-x  8 milosgarunovic  staff   256B Mar  2 19:23 ../
-rw-r--r--  1 milosgarunovic  staff     0B Mar  2 20:08 1.json
-rw-r--r--  1 milosgarunovic  staff     0B Mar  2 20:08 1.txt
-rw-r--r--  1 milosgarunovic  staff     0B Mar  2 20:08 1.txt2
-rw-r--r--  1 milosgarunovic  staff     0B Mar  2 20:08 1.xml
$ ls -laFh | grep .txt
-rw-r--r--  1 milosgarunovic  staff     0B Mar  2 20:08 1.txt
-rw-r--r--  1 milosgarunovic  staff     0B Mar  2 20:08 1.txt2

In the example above, grep reads its input (which is the output of the ls command), keeps only the lines that contain the search term .txt, and prints those lines in full. You can filter the output of any command this way.
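
One detail worth knowing about the example above: grep reads the pattern as a regular expression, so the dot in .txt matches any single character, not only a literal dot. Here is a minimal sketch of the difference, using made-up file names in place of real ls output, that compares the plain pattern with a -F (fixed string) search:

```shell
# Sample input standing in for `ls` output; the names are made up for illustration.
names='1.txt
1.txt2
atxt
notes.json'

# As a regex, "." matches any single character, so "atxt" matches as well.
regex_matches=$(printf '%s\n' "$names" | grep '.txt')

# -F treats the pattern as a literal string, so only names with a real ".txt" match.
fixed_matches=$(printf '%s\n' "$names" | grep -F '.txt')

printf 'regex:\n%s\n\nfixed:\n%s\n' "$regex_matches" "$fixed_matches"
```

For quick filtering the difference rarely matters, but -F is a safe habit when the search term contains dots or other regex characters.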

Customization

I’ve created an alias for grep: alias grep="grep --color=auto -i". So when I type grep, I’m actually running the alias. To check that, type type grep and you will get grep is aliased to `grep --color=auto -i'. I know that I’m now calling the alias every time instead of the command itself, and I’m OK with that.

--color=auto highlights the term you searched for (in red, in my case) when the output goes to a terminal.

-i, --ignore-case makes the search case insensitive. Just be aware that a case-insensitive search can be slower than a case-sensitive one. But it was never unbearably slow for me. Even when I searched about 300,000 files (which comes next in this post), I got results in under 45 seconds.

This works for me for now; I have never had to drop those two arguments to make something work.

Also, this whole post uses ls -laFh, which I’ve aliased to ll with alias ll="ls -laFh". I always type ll, but since it’s not a standard command, I’m writing out the full ls command with flags so it’s easier to follow.
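
To make the effect of -i concrete, here is a small sketch with made-up sample words. It runs the same search with and without the flag:

```shell
# Sample lines; the words are made up for illustration.
lines='Cat
cat
CATALOG
dog'

# Without -i, only the line with the exact lowercase "cat" matches.
sensitive=$(printf '%s\n' "$lines" | grep 'cat')

# With -i, "Cat", "cat" and "CATALOG" all match.
insensitive=$(printf '%s\n' "$lines" | grep -i 'cat')

printf 'case sensitive:\n%s\n\ncase insensitive:\n%s\n' "$sensitive" "$insensitive"
```

If you ever do need the plain command in an interactive shell, command grep or \grep skips the alias for a single call.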

Search File Content

At the company I work for, we have an integration with a third-party system. The flow is that we pick up files from them, process them, put the data into the database, and keep the files even though they have been processed. The files follow a standard naming format, and the important data is inside the file, most importantly the IDs (of the user, patient, encounter, clinic, etc.). One day, some data seemed to be missing from the system. We had to figure out whether we had received the files in the first place, where the bug was, and so on.

Since the data wasn’t in our database (yet), the next step was to search the files. We had the IDs to search for, but those IDs aren’t in the file names, only inside the files. This was the first time I needed to go through about 300,000 files to find information that was lost, if it was even there.

I spent quite some time on this problem (to be honest, more than 4 hours of experimenting with the command line and grep, since I was still new then), but I learned a lot, so I don’t regret it.

To figure out which files contain the data, I searched through them with grep -Rl $SEARCH_TERM $PATH. Here $PATH is just a placeholder for the directory to search, not the shell’s PATH variable. From man grep:

-l, --files-with-matches
Only the names of files containing selected lines are written to standard output. grep will only search a file until a match has been found, making searches potentially less expensive. Pathnames are listed once per file searched. If the standard input is searched, the string “(standard input)” is written.

-R, -r, --recursive
Recursively search subdirectories listed.

So -R searches recursively: it descends into subdirectories and looks inside the files it finds. Use recursion with caution, since it can touch a lot of files.

-l prints the name (path) of each file that contains a match, instead of the matching lines. I combined it with -R here, but -l does not depend on -R; it works with a plain list of files and with other flags too.

According to the docs, I can use -R, -r, or --recursive for a recursive search. I’m used to -R, but on the grep I use they all do the same thing. (In GNU grep, -r and -R differ in how they handle symbolic links.) For the example, I’ve made files that each contain a single number from 1 to 3, just to keep it simple. So if I want to find the number 3, or any other search term, in this directory, I can do something like in the example below:
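
As a quick illustration of what -l changes, here is a minimal sketch that works without -R. The file names a.txt and b.txt and their contents are made up, and everything happens in a temporary directory:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
printf '3\n3\n' > "$workdir/a.txt"   # two matching lines
printf '1\n'    > "$workdir/b.txt"   # no matching line

# Default output: one "file:line" entry per matching line.
all_matches=$(cd "$workdir" && grep 3 a.txt b.txt)

# With -l: each matching file is named once, and the matching lines are not shown.
files_only=$(cd "$workdir" && grep -l 3 a.txt b.txt)

printf 'default:\n%s\n\nwith -l:\n%s\n' "$all_matches" "$files_only"
rm -rf "$workdir"
```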

$ ls -laFh
total 72
drwxr-xr-x  11 milosgarunovic  staff   352B Mar  2 19:26 ./
drwxr-xr-x   8 milosgarunovic  staff   256B Mar  2 19:23 ../
-rw-r--r--   1 milosgarunovic  staff     2B Mar  2 19:26 1.txt
-rw-r--r--   1 milosgarunovic  staff     2B Mar  2 19:26 2.txt
-rw-r--r--   1 milosgarunovic  staff     2B Mar  2 19:26 3.txt
-rw-r--r--   1 milosgarunovic  staff     2B Mar  2 19:26 4.txt
-rw-r--r--   1 milosgarunovic  staff     2B Mar  2 19:27 5.txt
-rw-r--r--   1 milosgarunovic  staff     2B Mar  2 19:27 6.txt
-rw-r--r--   1 milosgarunovic  staff     2B Mar  2 19:26 7.txt
-rw-r--r--   1 milosgarunovic  staff     2B Mar  2 19:27 8.txt
-rw-r--r--   1 milosgarunovic  staff     2B Mar  2 19:27 9.txt
$
$ grep -Rl 3 .
./9.txt
./8.txt
$
$ grep -rl 3 .
./9.txt
./8.txt
$
$ grep -l --recursive 3 .
./9.txt
./8.txt

So what if I have subdirectories? In production we have PATH/success and PATH/error. Here’s an example of what happens:

$ tree .
.
├── error
│   ├── 5.txt
│   ├── 6.txt
│   ├── 7.txt
│   ├── 8.txt
│   └── 9.txt
└── success
    ├── 1.txt
    ├── 2.txt
    ├── 3.txt
    └── 4.txt

2 directories, 9 files
$ grep -Rl 3 .
./error/9.txt
./error/8.txt
$ grep -Rl 1 .
./success/2.txt
./success/1.txt
./error/5.txt
./error/7.txt

grep Chaining

grep commands can be chained to filter more than once. An example that combines the previous ones is to search for files containing something and then filter by file name:

$ grep -Rl 1 .
./success/2.txt
./success/1.txt
./error/5.txt
./error/7.txt
$ grep -Rl 1 . | grep success
./success/2.txt
./success/1.txt
$ grep -Rl 1 . | grep error
./error/5.txt
./error/7.txt

Of course, this isn’t optimal when you have subdirectories, since grep searches both directories and only then do you filter down to one of them. To optimize this example, you can use --include-dir or --exclude-dir to include or exclude the directories you want to search; you would usually use the latter. Note that --include-dir is available in the BSD grep that ships with macOS, while GNU grep only has --exclude-dir.

$ grep -Rl --exclude-dir success 1 .
./error/5.txt
./error/7.txt
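
Another way to narrow the search is to hand grep the directory you care about as the starting point. The sketch below rebuilds a small success/error layout in a temporary directory (the file contents are made up) and shows that both approaches list the same file:

```shell
# Throwaway directory tree that mimics the success/error layout.
workdir=$(mktemp -d)
mkdir "$workdir/success" "$workdir/error"
printf '1\n' > "$workdir/success/1.txt"
printf '2\n' > "$workdir/success/3.txt"
printf '1\n' > "$workdir/error/5.txt"
printf '3\n' > "$workdir/error/8.txt"

# Recurse from the top, but skip the success directory.
excluded=$(cd "$workdir" && grep -Rl --exclude-dir=success 1 .)

# Or start the search directly in the error directory.
direct=$(cd "$workdir" && grep -Rl 1 ./error)

printf 'with --exclude-dir: %s\nstarting in ./error: %s\n' "$excluded" "$direct"
rm -rf "$workdir"
```

Starting in the directory is simpler when you want exactly one subtree; --exclude-dir is handier when you want everything except a few directories.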

You can also pipe the output of grep into something else, which I do from time to time, for example to count the number of files.

$ grep -Rl 1 . | wc -l
       4 # this is the response of `wc' command

wc stands for “word count”, and the -l flag makes it count lines instead, so here it prints the number of files grep listed.

The next example is something I use when I forget a longer command that I don’t type very often and haven’t created an alias or script for. This is usually some docker command, or an ssh to a remote server whose IP address or hostname I’ve forgotten.
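
Counting files and counting matching lines are two different questions, so here is a small sketch of both. The three files and their contents are made up and live in a temporary directory:

```shell
# Throwaway directory with three small files.
workdir=$(mktemp -d)
printf '1\n'    > "$workdir/a.txt"
printf '1\n1\n' > "$workdir/b.txt"
printf '2\n'    > "$workdir/c.txt"

# Number of files that contain a match: list them with -l, then count the lines.
# tr strips the padding that some wc implementations put in front of the number.
file_count=$(grep -Rl 1 "$workdir" | wc -l | tr -d ' ')

# Number of matching lines in a single file: -c counts them, no wc needed.
line_count=$(grep -c 1 "$workdir/b.txt")

echo "files with a match: $file_count"
echo "matching lines in b.txt: $line_count"
rm -rf "$workdir"
```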

$ history | grep ssh
 6650  ssh -f tunel@tunel-something.com -L 1443:something.com:443 -N
 6683  ssh -f tunel@tunel-something.com -L 1443:something.com:443 -N
 7365  ssh -f tunel@tunel-something.com -L 1443:something.com:443 -N

Tip: use !6650 to execute the first command from the previous example (of course this exact number won’t work on your machine, but it will work with the numbers you get next to your own commands). You can also use !! to execute the last command you typed, so you don’t have to press the up arrow and Enter. This is especially useful when you forget to type sudo, so you can just do sudo !!.

Getting Help

On the Mac and Linux command line (I haven’t used it on Windows, so I can’t speak for that), you will find yourself forgetting a lot of flags, and for that you should use the man (short for manual) pages. You can usually type man grep to get the grep manual. Once you are in the manual, you can search it by typing / followed by the search term.

For example, while writing this post I made a mistake with the --exclude-dir flag: I typed --exclude-dirs first (notice the ’s’ at the end, it shouldn’t be there), which grep didn’t recognize. So I checked the manual by running man grep, then hitting / and typing exclude. In the man page (which usually opens in less), navigate with n for the next match and N for the previous one. Hit q to exit.

You can also use the manual online, which you can find here.
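
Since man writes plain text when its output is piped, grep itself can also do the lookup. The catch is that a pattern starting with a dash looks like an option, so it needs -- in front of it. A minimal sketch, using a few abbreviated option lines as a stand-in for the real manual text:

```shell
# A few option lines standing in for `man grep` output (abbreviated for the sketch).
manual='--exclude=GLOB
--exclude-dir=GLOB
--include=GLOB'

# "--" ends option parsing, so "--exclude-dir" is read as the pattern, not as a flag.
found=$(printf '%s\n' "$manual" | grep -- '--exclude-dir')
echo "$found"

# Against the real manual this would be: man grep | grep -- '--exclude-dir'
```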

Invert Match - Exclude

If you want to check whether some process is running, you can type ps aux | grep $TERM, where $TERM stands for your search term. This will give you:

$ ps aux | grep postgres
milosgarunovic   73808   0.9  0.0  4268036    820 s003  S+    2:09PM   0:00.00 grep --color=auto -i postgres
milosgarunovic     888   0.0  0.0  4343260   2144   ??  Ss   21Jan19  22:07.11 postgres: stats collector process
milosgarunovic     887   0.0  0.0  4488256   3720   ??  Ss   21Jan19   5:46.78 postgres: autovacuum launcher process
# truncated output

Look at the first line of the output: it contains the grep command itself. If we don’t want that, we can chain greps, which is what I usually do, and exclude it like this:

$ ps aux | grep -v grep | grep postgres
milosgarunovic     888   0.0  0.0  4343260   2144   ??  Ss   21Jan19  22:07.11 postgres: stats collector process
milosgarunovic     887   0.0  0.0  4488256   3720   ??  Ss   21Jan19   5:46.78 postgres: autovacuum launcher process
# truncated output

You can put grep -v grep anywhere after the first pipe. The catch is that if you have --color=auto set for grep, like I do, your search term won’t be highlighted when the exclusion comes last, as in ps aux | grep postgres | grep -v grep. The last grep only excludes the term grep, so nothing highlights postgres. So my practice is to exclude things right away and keep the most important search term in the last pipe.
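
A common alternative that avoids the second grep is to wrap one letter of the search term in square brackets. This is not what I described above, just a sketch of the trick, using two made-up lines in place of real ps aux output:

```shell
# Two lines standing in for `ps aux` output; the grep process line shows the
# pattern exactly as it was typed on the command line.
ps_output='milos  888  postgres: stats collector process
milos  999  grep [p]ostgres'

# [p] is a character class that contains only "p", so the regex still means
# "postgres", but the literal text "[p]ostgres" in the grep line does not match it.
matches=$(printf '%s\n' "$ps_output" | grep '[p]ostgres')
echo "$matches"

# Real usage would be: ps aux | grep '[p]ostgres'
```

Because the search term stays in the only grep of the pipeline, the highlighting from --color=auto keeps working.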

Practical Example I Made While Writing This Post

In the middle of writing this post, I found one more use for grep: listing every post that is a draft. With Hugo, which this website is built on, you have a content directory that holds all the posts to be rendered. Every post has some metadata, including draft: true/false. So, using the example from Search File Content, I made a bash script:

#!/usr/bin/env bash

grep -Rl 'draft: true' ./content

Notice that if your search term has more than one word, you should wrap it in single or double quotes.
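
To try the draft search without a real Hugo site, here is a minimal sketch that builds a throwaway content directory. The two posts and their front matter are made up for illustration:

```shell
# Throwaway content directory with two made-up posts.
content=$(mktemp -d)
printf -- '---\ntitle: "One"\ndraft: true\n---\n'  > "$content/one.md"
printf -- '---\ntitle: "Two"\ndraft: false\n---\n' > "$content/two.md"

# The quotes keep "draft: true" together as one pattern. Without them, grep
# would search for "draft:" and treat "true" as a file name.
drafts=$(grep -Rl 'draft: true' "$content")
echo "$drafts"
rm -rf "$content"
```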