What Is grep
grep stands for “global regular expression print”.
From the docs:
The grep utility searches any given input files, selecting lines that match one or more patterns.
To me, grep is basically a tool for filtering the output of some command. But as you will see later in this post, you can also use grep on its own to search through the contents of files.
Regular expressions (regex) can be scary to write, though, since there’s a lot of syntax for expressing things, and we forget it soon after writing it down. But in most cases you will just filter for a plain term, not an actual regex: “cat”, “purr”, “.txt”, etc.
Basics
Basic syntax is:
grep [OPTIONS] PATTERN [FILE...]
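To make the syntax concrete, here is a minimal sketch of both ways to give grep its input. The file notes.txt and its contents are made up for illustration; it lives in a temporary directory that is removed when the script exits.

```bash
#!/usr/bin/env bash
# Throwaway sample file (hypothetical name and contents) in a temp dir.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'cat\ndog\npurr\n' > "$tmp/notes.txt"

# Standalone: grep PATTERN FILE reads the file(s) named after the pattern.
grep purr "$tmp/notes.txt"

# Piped: with no FILE argument, grep reads standard input instead.
cat "$tmp/notes.txt" | grep purr
```

Both commands print the single line purr.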
grep can be used on its own or with the output of other commands piped into it. Below is an example that pipes the output of the ls command into it. Note the $: a line that starts with it contains a command I typed, and everything else is the output of that command.
$ ls -laFh
total 0
drwxr-xr-x 8 milosgarunovic staff 256B Mar 2 20:08 ./
drwxr-xr-x 8 milosgarunovic staff 256B Mar 2 19:23 ../
-rw-r--r-- 1 milosgarunovic staff 0B Mar 2 20:08 1.json
-rw-r--r-- 1 milosgarunovic staff 0B Mar 2 20:08 1.txt
-rw-r--r-- 1 milosgarunovic staff 0B Mar 2 20:08 1.txt2
-rw-r--r-- 1 milosgarunovic staff 0B Mar 2 20:08 1.xml
$ ls -laFh | grep .txt
-rw-r--r-- 1 milosgarunovic staff 0B Mar 2 20:08 1.txt
-rw-r--r-- 1 milosgarunovic staff 0B Mar 2 20:08 1.txt2
In the example above, grep filters its input (the output of the ls command), keeps only the lines that contain the search term .txt, and prints those whole lines. You can pipe the output of any command into grep to filter it.
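One detail worth knowing: even a plain-looking term like .txt is treated as a regular expression, where . matches any single character. The sketch below runs on three made-up names instead of real ls output. The last pattern also uses $ to anchor the match to the end of the line, which in the ls example above would have left out 1.txt2.

```bash
#!/usr/bin/env bash
# Made-up sample input standing in for ls output.
sample() { printf '1.txt\n1.txt2\n1_txt\n'; }

# '.' matches any character, so all three lines match (including 1_txt).
sample | grep .txt

# Escape the dot (quoted, so the shell keeps the backslash) to match it
# literally: prints 1.txt and 1.txt2.
sample | grep '\.txt'

# -F treats the pattern as a fixed string instead of a regex: same two lines.
sample | grep -F .txt

# '$' anchors the match to the end of the line: prints only 1.txt.
sample | grep '\.txt$'
```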
Customization
I’ve created an alias for grep: alias grep="grep --color=auto -i". So when I type grep, I’m actually running that alias. To check this, run type grep and you will get grep is aliased to `grep --color=auto -i'. I know that I’m now calling the alias every time instead of the command itself, and I’m OK with that.
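If you ever do need the plain command, the alias is easy to sidestep. Below is a small sketch, assuming bash: aliases are normally only expanded in interactive shells, so the script turns expansion on with shopt -s expand_aliases to mimic a terminal session.

```bash
#!/usr/bin/env bash
# Aliases are off in non-interactive bash; enable them so this script
# behaves like an interactive terminal. Must run before the alias is used.
shopt -s expand_aliases
alias grep='grep --color=auto -i'

type -t grep                                          # prints: alias
printf 'Cat\n' | grep cat                             # alias adds -i: prints Cat
printf 'Cat\n' | \grep cat || echo "no match"         # backslash skips the alias
printf 'Cat\n' | command grep cat || echo "no match"  # so does `command`
```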
--color=auto highlights the term you searched for (in red, in my case).
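The auto part matters: grep only adds color when its output goes straight to a terminal. The sketch below makes that visible by piping grep into od -c, which prints every byte; the escape character shows up as 033 only when color is forced with --color=always.

```bash
#!/usr/bin/env bash
# grep's stdout is a pipe here, so --color=auto emits plain text.
printf 'cat\n' | grep --color=auto cat | od -c

# --color=always adds the escape codes even when writing into a pipe.
printf 'cat\n' | grep --color=always cat | od -c
```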
-i, --ignore-case makes the search case insensitive. Be aware that a case-insensitive search can be slower than a case-sensitive one, but it was never unbearably slow for me: even when searching through about 300,000 files (more on that later in this post), I got results in under 45 seconds.
This setup works for me for now; I have never had to drop those two flags to get something to work.
Also, ls -laFh shows up throughout this post. I’ve aliased that one too, to ll, with alias ll="ls -laFh". I always type ll, but since it’s not a standard command, I write out the full ls command with its flags to make the examples easier to follow.
Search File Content
At the company I work for, we have an integration with a third-party system: we pick up files from them, process them, put the data into the database, and keep the files (even after they are processed). The files follow a standard naming format, while the important data is inside them, most importantly the IDs (of the user, patient, encounter, clinic, etc.). One day, some data seemed to be missing from the system. We had to figure out whether we had received the files in the first place, where the bug was, and so on.
Since the data wasn’t in our database (yet), the next step was to search the files. We had the IDs to search for, but those IDs aren’t in the file names; they are inside the files. This was the first time I needed to go through about 300,000 files to find lost information, if it was even there.
I spent quite some time on this problem (to be honest, more than 4 hours of experimenting with the command line and grep; I was still new to it then), but I learned a lot, so I don’t regret it.
To figure out which files contain the data, I searched through them with grep -Rl $SEARCH_TERM $PATH, where $SEARCH_TERM is the ID to look for and $PATH is the directory holding the files.
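Here is a minimal, self-contained sketch of that search with the placeholders filled in. The names search_term and search_dir, and the sample files, are hypothetical. I’d avoid actually naming a variable PATH: the shell uses PATH to find commands, grep included, so overwriting it breaks every command that follows.

```bash
#!/usr/bin/env bash
# Hypothetical sample data: two files, only one containing the ID we want.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/files"
printf 'patient_id=123\n' > "$tmp/files/a.txt"
printf 'patient_id=456\n' > "$tmp/files/b.txt"

# Hypothetical variable names; lowercase so nothing clobbers PATH.
search_term='patient_id=123'
search_dir="$tmp/files"

# -R: recurse into search_dir; -l: print only names of matching files.
# Quotes keep each value a single argument; -- stops option parsing,
# so a term that starts with "-" is not mistaken for a flag.
grep -Rl -- "$search_term" "$search_dir"
```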
From man grep:
-l, --files-with-matches
Only the names of files containing selected lines are written to standard output. grep will only search a file until a match has been found, making searches potentially less expensive. Pathnames are listed once per file searched. If the standard input is searched, the string “(standard input)” is written.
-R, -r, --recursive
Recursively search subdirectories listed.
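So -R searches recursively: it walks through the subdirectories and looks inside every file it finds. Use recursion with caution, since on a big directory tree it has to read a lot of files.
-l makes grep print only the names of the files that contain a match, instead of the matching lines. It doesn’t require -R, since it also works on files you list yourself, but the two work well in combination: together they give you a list of every matching file in a whole directory tree.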
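To illustrate that -l doesn’t need -R, here is a small sketch on a few made-up files passed in directly with a glob. It also shows -L (--files-without-match), which lists the opposite set: the files with no match.

```bash
#!/usr/bin/env bash
# Made-up sample files in a temp dir; no subdirectories, no -R.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '1\n' > "$tmp/a.txt"
printf '3\n' > "$tmp/b.txt"
printf '3\n' > "$tmp/c.txt"

# -l: names of the files whose content contains 3 (b.txt and c.txt).
grep -l 3 "$tmp"/*.txt

# -L: names of the files whose content does NOT contain 3 (a.txt).
grep -L 3 "$tmp"/*.txt
```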
Looking at the docs for recursive search, I can use -R, -r, or --recursive. For some reason I’m used to -R, but you can use any of them. (In GNU grep, the version found on most Linux systems, there is one difference: -R follows all symbolic links, while -r only follows those given on the command line.) For the example, I’ve made files that contain only the numbers 1, 2 and 3, just to keep it simple. So if I want to find the number 3, or any other search term, in any of these files, I can do something like this:
$ ls -laFh
total 72
drwxr-xr-x 11 milosgarunovic staff 352B Mar 2 19:26 ./
drwxr-xr-x 8 milosgarunovic staff 256B Mar 2 19:23 ../
-rw-r--r-- 1 milosgarunovic staff 2B Mar 2 19:26 1.txt
-rw-r--r-- 1 milosgarunovic staff 2B Mar 2 19:26 2.txt
-rw-r--r-- 1 milosgarunovic staff 2B Mar 2 19:26 3.txt
-rw-r--r-- 1 milosgarunovic staff 2B Mar 2 19:26 4.txt
-rw-r--r-- 1 milosgarunovic staff 2B Mar 2 19:27 5.txt
-rw-r--r-- 1 milosgarunovic staff 2B Mar 2 19:27 6.txt
-rw-r--r-- 1 milosgarunovic staff 2B Mar 2 19:26 7.txt
-rw-r--r-- 1 milosgarunovic staff 2B Mar 2 19:27 8.txt
-rw-r--r-- 1 milosgarunovic staff 2B Mar 2 19:27 9.txt
$
$ grep -Rl 3 .
./9.txt
./8.txt
$
$ grep -rl 3 .
./9.txt
./8.txt
$
$ grep -l --recursive 3 .
./9.txt
./8.txt
So what if there are subdirectories, like the PATH/success and PATH/error directories we have in production? Here’s an example of what happens:
$ tree .
.
├── error
│ ├── 5.txt
│ ├── 6.txt
│ ├── 7.txt
│ ├── 8.txt
│ └── 9.txt
└── success
├── 1.txt
├── 2.txt
├── 3.txt
└── 4.txt
2 directories, 9 files
$ grep -Rl 3 .
./error/9.txt
./error/8.txt
$ grep -Rl 1 .
./success/2.txt
./success/1.txt
./error/5.txt
./error/7.txt
grep Chaining
grep can be chained to filter the output several times. An example combining the previous ones is to search for files containing something and then filter the results by file name:
$ grep -Rl 1 .
./success/2.txt
./success/1.txt
./error/5.txt
./error/7.txt
$ grep -Rl 1 . | grep success
./success/2.txt
./success/1.txt
$ grep -Rl 1 . | grep error
./error/5.txt
./error/7.txt
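Of course, this isn’t optimal when you have subdirectories: grep searches both directories, and only afterwards do you keep the results from one of them. To avoid that, you can use --exclude-dir to skip the directories you don’t want searched, or --include-dir to search only the ones you do. Note that --include-dir exists in the BSD grep that ships with macOS, but not in GNU grep, which only has --exclude-dir. In practice, you will usually reach for --exclude-dir.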
$ grep -Rl --exclude-dir success 1 .
./error/5.txt
./error/7.txt
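Here is a sketch of two ways to restrict the search to one subdirectory, on a made-up copy of the success/error layout above with one file in each. Passing the subdirectory directly as the path is the simplest; --exclude-dir, shown here with the = form, works in both GNU and BSD grep.

```bash
#!/usr/bin/env bash
# Made-up copy of the success/error layout, one file in each directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/success" "$tmp/error"
printf '1\n' > "$tmp/success/1.txt"
printf '1\n' > "$tmp/error/5.txt"
cd "$tmp" || exit 1

# Simplest: name the subdirectory as the path to search. Prints ./error/5.txt
grep -Rl 1 ./error

# Or search from the top and skip any directory named success.
# Also prints ./error/5.txt
grep -Rl --exclude-dir=success 1 .
```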
You can also pipe the output of grep into other commands, which I do from time to time, for example to count the matching files:
$ grep -Rl 1 . | wc -l
4 # this is the response of `wc' command
wc is short for “word count”; with -l it counts lines instead of words. Since grep -l prints one file name per line, that gives you the number of matching files.
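For counting matching lines rather than files, grep can do it without wc: -c prints the number of matching lines. A sketch on made-up input, where both commands report 2:

```bash
#!/usr/bin/env bash
# Made-up input: two of the three lines mention postgres.
sample() { printf 'postgres\nnginx\npostgres\n'; }

# -c: grep counts the matching lines itself.
sample | grep -c postgres

# Same number via wc -l, which counts the lines grep prints.
sample | grep postgres | wc -l
```

When grep searches several files, -c prints one count per file (as file:count), so for counting matching files, grep -l piped into wc -l is still the right tool.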
The next example is something I use when I forget a longer command that I don’t type very often and haven’t made an alias or script for. This is usually some docker command, or an ssh command to a remote server whose IP address or hostname I’ve forgotten.
$ history | grep ssh
6650 ssh -f tunel@tunel-something.com -L 1443:something.com:443 -N
6683 ssh -f tunel@tunel-something.com -L 1443:something.com:443 -N
7365 ssh -f tunel@tunel-something.com -L 1443:something.com:443 -N
Tip: type !6650 to run the first command from the previous example again. Of course, that exact number won’t work on your machine, but it works with the numbers your own history shows next to each command. You can also type !! to run the last command again, instead of pressing the up arrow and Enter. This is especially useful when you forget to type sudo: just run sudo !!.
Getting help
When using the command line on macOS and Linux (I haven’t used it on Windows, so I can’t speak for that), you will find yourself forgetting a lot of flags and the like, and for that you should use the man (short for manual) pages. You can usually type man grep to get the grep manual. Once you are in the manual, you can search through it by typing / followed by the search term.
For example, while writing this post I made a mistake with the --exclude-dir flag: I first typed --exclude-dirs (notice the “s” at the end, which shouldn’t be there), and grep didn’t recognize it. So I checked the manual: man grep, then / and exclude. In a man page (or in less, which man usually uses to display it), press n to jump to the next match and N to the previous one. Press q to exit.
You can also read the manual online.
Invert Match - Exclude
If you want to find out whether some process is running, you can type ps aux | grep $TERM, where $TERM stands for the process name. This gives you:
$ ps aux | grep postgres
milosgarunovic 73808 0.9 0.0 4268036 820 s003 S+ 2:09PM 0:00.00 grep --color=auto -i postgres
milosgarunovic 888 0.0 0.0 4343260 2144 ?? Ss 21Jan19 22:07.11 postgres: stats collector process
milosgarunovic 887 0.0 0.0 4488256 3720 ?? Ss 21Jan19 5:46.78 postgres: autovacuum launcher process
# truncated output
Look at the first line of the output: it contains the grep command itself. If we don’t want that, we can exclude it by chaining greps, which is what I usually do. The -v (--invert-match) flag keeps only the lines that don’t match:
$ ps aux | grep -v grep | grep postgres
milosgarunovic 888 0.0 0.0 4343260 2144 ?? Ss 21Jan19 22:07.11 postgres: stats collector process
milosgarunovic 887 0.0 0.0 4488256 3720 ?? Ss 21Jan19 5:46.78 postgres: autovacuum launcher process
# truncated output
You can put grep -v grep anywhere after the first pipe. The catch is that if you have --color=auto set for grep, as I do, your search term won’t be highlighted if you put the exclusion at the end, as in ps aux | grep postgres | grep -v grep. With --color=auto, grep only adds color when its output goes straight to the terminal; here the output of grep postgres goes into another pipe, so it comes out uncolored, and the final grep -v grep has nothing to highlight, because it prints the lines that don’t match. So my practice is to exclude things right away and to keep the most important search term in the last pipe.
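Another common way to keep grep out of its own results is to wrap one letter of the term in brackets. The sketch below simulates ps aux output with printf (the lines are made up), since real process lists differ from machine to machine; in practice you would run ps aux | grep '[p]ostgres'. Since there is only one grep in that pipeline, --color=auto still highlights the match.

```bash
#!/usr/bin/env bash
# Simulated ps aux output (made-up lines). The first line of each stands in
# for the grep process itself, as it would appear for each approach.
ps_plain() {
  printf '%s\n' 'me 73808 grep postgres' 'me 888 postgres: stats collector process'
}
ps_bracket() {
  printf '%s\n' 'me 73809 grep [p]ostgres' 'me 888 postgres: stats collector process'
}

# 1. Exclude grep first, then search (the approach above).
ps_plain | grep -v grep | grep postgres

# 2. Bracket trick: the regex [p]ostgres matches "postgres", but not the
#    literal text "[p]ostgres" in grep's own command line. Quote it so the
#    shell doesn't treat the brackets as a filename glob.
ps_bracket | grep '[p]ostgres'
```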
Practical example I made while writing this post
In the middle of writing this post, I found one more use for grep: listing every post that is still a draft. This website is built with Hugo, where the content directory holds all the posts that will be rendered. Every post has some metadata (front matter), one of which is draft: true/false. So, using the example from Search File Content, I made a bash script:
#!/usr/bin/env bash
grep -Rl 'draft: true' ./content
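Notice that when the search term contains more than one word, you should wrap it in single or double quotes. Otherwise the shell splits it into separate arguments, and grep treats everything after the first word as a file name to search.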
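If some posts use TOML front matter (draft = true, between +++ lines) instead of YAML (draft: true, between --- lines), one pattern can cover both. Below is a minimal sketch with made-up posts in a temporary content directory; -E enables extended regular expressions, and JSON front matter would need a different pattern.

```bash
#!/usr/bin/env bash
# Made-up posts in a temp content directory: YAML draft, TOML draft, published.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/content/posts"
printf '%s\n' '---' 'title: "A"' 'draft: true' '---' > "$tmp/content/posts/a.md"
printf '%s\n' '+++' 'title = "B"' 'draft = true' '+++' > "$tmp/content/posts/b.md"
printf '%s\n' '---' 'title: "C"' 'draft: false' '---' > "$tmp/content/posts/c.md"

# -E: extended regex. ^ anchors to the start of the line, [[:space:]]*
# allows optional spaces, and [:=] accepts either the YAML or TOML separator.
# Prints a.md and b.md (order may vary), but not c.md.
grep -RlE '^draft[[:space:]]*[:=][[:space:]]*true' "$tmp/content"
```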