grep - Terminal Tooling
Overview
Grep is a fundamental part of the command-line toolbelt. True to the Unix philosophy, grep does one thing, and does it well: it searches its input for a given pattern and prints every line that matches. Think of it like Command-F for the terminal.
Its origins date back to the ed text editor. A common practice within ed was to run the command g/re/p, meaning globally search for lines that match the given regular expression (re) and print them. This was so useful that the name "grep" became synonymous with searching files and printing matches, and a standalone grep program was born.1
There are two ways to have grep search text. One is to give it a list of files, which it searches line by line for your pattern, printing each matching line. The other is to have it search text arriving on stdin and print the matching lines.
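Here is a minimal sketch of both modes. It assumes a POSIX shell. The temporary directory and the file notes.txt are hypothetical stand-ins created on the fly:

```shell
# Throwaway directory holding a hypothetical file to search.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
printf 'alpha\ntrace this line\nomega\n' > "$demo_dir/notes.txt"

# Mode 1: name the file(s) to search; grep reads them line by line.
grep trace "$demo_dir/notes.txt"

# Mode 2: give no file operand; grep reads stdin instead.
printf 'one\ntrace two\nthree\n' | grep trace
```

Each command prints only its matching line. When there is a single input, grep doesn't prefix the output with a file name.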
Examples - Searching files
Basics
Let's look at some examples in the same k8s-app project you've seen in all the previous examples. We'll focus first on the backend directory.
echo "Current directory: $(pwd)"
echo "----------"
ls
Current directory: /Users/jharder/Code/k8s-app/backend
----------
Dockerfile app.go db.go go.mod go.sum main.go tracing.go
Let's say you're interested in finding every line that contains the string trace.
grep trace *
app.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
app.go:func newApp(ctx context.Context, tp *sdktrace.TracerProvider) (*App, error) {
db.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
db.go:func dbConn(ctx context.Context, tp *sdktrace.TracerProvider) (*mongo.Client, error) {
go.mod: go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0
go.mod: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0
go.mod: go.opentelemetry.io/otel/trace v1.14.0
go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0 h1:TKf2uAs2ueguzLaxOCBXNpHxfO/aC7PAdDsSH0IbeRQ=
go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0/go.mod h1:HrbCVv40OOLTABmOn1ZWty6CHXkU8DK/Urc43tHug70=
go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0 h1:3jAYbRHQAqzLjd9I4tzxwJ8Pk/N6AqBcF6m1ZHrxG94=
go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0/go.mod h1:+N7zNjIJv4K+DeX67XXET0P+eIciESgaFDBqh+ZJFS4=
go.sum:go.opentelemetry.io/otel/trace v1.14.0 h1:wp2Mmvj41tDsyAJXiWDWpfNsOiIyd38fy85pyKcFq/M=
go.sum:go.opentelemetry.io/otel/trace v1.14.0/go.mod h1:8avnQLK+CG77yNLUae4ea2JDQ6iT+gozhnZjy/rw9G8=
main.go: "go.opentelemetry.io/otel/trace"
main.go: span := trace.SpanFromContext(ctx)
main.go: log.Printf("Error shutting down tracer provider: %v", err)
tracing.go: "go.opentelemetry.io/otel/exporters/otlp/otlptrace"
tracing.go: "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
tracing.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
tracing.go:func initExporter(ctx context.Context) (*otlptrace.Exporter, error) {
tracing.go: client := otlptracehttp.NewClient()
tracing.go: exporter, err := otlptrace.New(ctx, client)
tracing.go:func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
tracing.go: // https://github.com/open-telemetry/opentelemetry-go/blob/main/exporters/otlp/otlptrace/otlptracehttp/example_test.go#L68
tracing.go: // exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
tracing.go: trace_provider := sdktrace.NewTracerProvider(
tracing.go: sdktrace.WithSampler(sdktrace.AlwaysSample()),
tracing.go: sdktrace.WithBatcher(exporter),
tracing.go: sdktrace.WithResource(resource.NewWithAttributes(semconv.SchemaURL, semconv.ServiceName(name))),
tracing.go: otel.SetTracerProvider(trace_provider)
tracing.go: return trace_provider, nil
That's a lot of results, but we probably don't care about go.mod and go.sum. We can trim the results down by using a shell glob so that only specific files are passed to grep.2
Let's search only the Go files (those with the .go suffix). If we run grep trace *.go instead, our shell expands *.go to app.go db.go main.go tracing.go,3 and then passes those as arguments to grep. Let's see the results:4
grep trace *.go
app.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
app.go:func newApp(ctx context.Context, tp *sdktrace.TracerProvider) (*App, error) {
db.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
db.go:func dbConn(ctx context.Context, tp *sdktrace.TracerProvider) (*mongo.Client, error) {
main.go: "go.opentelemetry.io/otel/trace"
main.go: span := trace.SpanFromContext(ctx)
main.go: log.Printf("Error shutting down tracer provider: %v", err)
tracing.go: "go.opentelemetry.io/otel/exporters/otlp/otlptrace"
tracing.go: "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
tracing.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
tracing.go:func initExporter(ctx context.Context) (*otlptrace.Exporter, error) {
tracing.go: client := otlptracehttp.NewClient()
tracing.go: exporter, err := otlptrace.New(ctx, client)
tracing.go:func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
tracing.go: // https://github.com/open-telemetry/opentelemetry-go/blob/main/exporters/otlp/otlptrace/otlptracehttp/example_test.go#L68
tracing.go: // exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
tracing.go: trace_provider := sdktrace.NewTracerProvider(
tracing.go: sdktrace.WithSampler(sdktrace.AlwaysSample()),
tracing.go: sdktrace.WithBatcher(exporter),
tracing.go: sdktrace.WithResource(resource.NewWithAttributes(semconv.SchemaURL, semconv.ServiceName(name))),
tracing.go: otel.SetTracerProvider(trace_provider)
tracing.go: return trace_provider, nil
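To see exactly what grep receives, you can have the shell print a glob's expansion instead of running grep. This sketch recreates the backend file names as empty placeholder files in a temporary directory. That setup is an assumption for illustration, not the real project. It then compares the glob *.go with a roughly equivalent regular expression, \.go$, applied to a directory listing:

```shell
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
# Hypothetical empty files standing in for the backend directory.
touch Dockerfile app.go db.go go.mod go.sum main.go tracing.go

# The shell expands *.go before any command runs; printf just shows the result.
printf '%s\n' *.go

# The same selection as a regular expression: a literal dot, "go", end of line.
ls | grep '\.go$'
```

Both commands list app.go, db.go, main.go and tracing.go. go.mod and go.sum are left out because they end in mod and sum, not go.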
Excluding files
It's a bit annoying to have tracing.go polluting our results. It's obviously going to have a bunch of hits we don't care about. Plain shell globs have no quick and easy way to exclude it, but grep itself lets us skip certain files. --exclude takes a pattern that grep matches against the name of each file it would process. If the pattern matches, grep skips that file. The opposite behavior, searching only files whose names match, is available with --include.
grep --exclude 'tracing.go' trace *.go
app.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
app.go:func newApp(ctx context.Context, tp *sdktrace.TracerProvider) (*App, error) {
db.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
db.go:func dbConn(ctx context.Context, tp *sdktrace.TracerProvider) (*mongo.Client, error) {
main.go: "go.opentelemetry.io/otel/trace"
main.go: span := trace.SpanFromContext(ctx)
main.go: log.Printf("Error shutting down tracer provider: %v", err)
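Here is a sketch of both options, using hypothetical files in a temporary directory. It uses -r (covered later in this post) so the options are applied during a recursive search. The output is piped through sort because the order in which grep visits files isn't guaranteed:

```shell
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
# Hypothetical files, each with one line mentioning "trace".
for f in app.go tracing.go go.mod; do
  printf 'trace in %s\n' "$f" > "$f"
done

# --exclude: skip any file whose name matches the pattern.
grep -r --exclude 'tracing.go' trace . | sort

# --include: search only files whose names match the pattern.
grep -r --include '*.go' trace . | sort
```

The first command reports app.go and go.mod. The second reports app.go and tracing.go.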
Line numbers
Having these results is handy, but it would be better to know where in each file the matches are. Passing the -n (or --line-number) flag prefixes each hit with its line number.
grep -n trace *.go
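Here is a self-contained sketch of the output format, using two hypothetical files in a temporary directory. When grep searches more than one file, it prints each hit as file:line-number:line:

```shell
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
# Two hypothetical files, so grep prints both the file name and line number.
printf 'package main\nimport "trace"\n' > main.go
printf 'package main\n\nvar t = trace.New()\n' > app.go

# Prints app.go:3:var t = trace.New() and then main.go:2:import "trace".
grep -n trace app.go main.go
```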
Context
This is nicer, but it would be nicer still to have a bit of context for each match. What is going on around the lines where these results were found?
Pass -C (or --context) with a number N, and grep also prints the N lines before and after each match.
grep -n --context=3 trace main.go
8-
9- "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
10- "go.opentelemetry.io/otel"
11: "go.opentelemetry.io/otel/trace"
12-)
13-
14-const name = "recipe-service"
--
23- return func(w http.ResponseWriter, r *http.Request) {
24- ctx := r.Context()
25- // grabs the current span
26: span := trace.SpanFromContext(ctx)
27- span.AddEvent("GetRecipes")
28- log.Println("GET /recipes")
29- recipes, err := app.getRecipes(ctx)
--
74- }
75- defer func() {
76- if err := tp.Shutdown(context.Background()); err != nil {
77: log.Printf("Error shutting down tracer provider: %v", err)
78- }
79- }()
80-
A line number followed by - marks a line printed for context, and a line number followed by : marks a matching line. A -- line separates groups of matches (and their context) that aren't adjacent in the file.
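To see these markers in isolation, here is a small sketch that feeds nine made-up words to grep on stdin. It matches two lines that are far enough apart that their context doesn't overlap:

```shell
# Match "three" and "seven" with one line of context on each side.
printf '%s\n' one two three four five six seven eight nine |
  grep -n --context=1 -e three -e seven
```

This prints 2-two, 3:three and 4-four, then a -- separator, then 6-six, 7:seven and 8-eight, each on its own line.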
You can also set the number of lines to show before and after a match separately, using -B (or --before-context) and -A (or --after-context) respectively.
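Here is a sketch with asymmetric context, again on made-up stdin input. It uses the long option forms because both GNU grep and the grep shipped with macOS accept them:

```shell
# Two lines of context before the match and one after it.
printf '%s\n' one two three four five six |
  grep -n --before-context=2 --after-context=1 five
```

This prints 3-three, 4-four, 5:five and 6-six.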
Count
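Sometimes it helps to know how often a search term appears in your files. For this, you can use -c (or --count), which prints the number of matching lines in each file. It counts lines, not occurrences: a line that contains trace twice counts once.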
grep -c trace *.go
app.go:2
db.go:2
main.go:3
tracing.go:15
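Here is a sketch of that difference on a hypothetical file. It uses -o (covered below) to print each occurrence on its own line, so that wc -l counts every occurrence instead:

```shell
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
# Hypothetical file: "trace" appears three times, but on only two lines.
printf 'trace trace\nnothing here\ntrace\n' > "$demo_dir/log.txt"

# -c counts matching lines: prints 2.
grep -c trace "$demo_dir/log.txt"

# One output line per occurrence, then count them: prints 3.
grep -o trace "$demo_dir/log.txt" | wc -l
```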
Matching files only
If you don't care which lines matched your pattern, only which files contain a match, use the -l (or --files-with-matches) flag to print just the file names.
grep -l --exclude '*.sum' --exclude '*.mod' trace *
app.go
db.go
main.go
tracing.go
This also shows that --exclude (and its --include counterpart) takes a wildcard pattern as its argument, not just an exact file name. The patterns are quoted so the shell passes them to grep unexpanded, and grep does the matching itself. Here we use this to exclude all files ending in .sum or .mod.
Printing matched content only
Perhaps the inverse of showing only the files that match is showing only the match itself, not the rest of the line. You do this with the -o (or --only-matching) flag. It's probably most handy when paired with the -E (or --extended-regexp) flag. That flag switches grep to extended regular expression syntax, where operators such as +, ?, | and () work without backslashes.
The regular expression below matches the text With followed by any number of characters other than (. In effect, it finds function definitions and calls whose names contain With. It prints the name from With up to the opening parenthesis, which is why resource.NewWithAttributes shows up as WithAttributes. Using -o means all we get back is the text that actually matched, plus the file it came from. If we also wanted to know where in the file each match was, we could add the -n flag.
grep --exclude '*.sum' --exclude '*.mod' -o -E 'With[^(]*' *
db.go:WithTracerProvider
tracing.go:WithPrettyPrint
tracing.go:WithSampler
tracing.go:WithBatcher
tracing.go:WithResource
tracing.go:WithAttributes
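Here is a sketch on made-up input lines that combines -o, -E and -n. The second line shows how the pattern can match in the middle of a longer identifier:

```shell
# Hypothetical Go-like lines; only the first two contain "With".
printf '%s\n' 'tp := New(WithSampler(s))' 'r := NewWithAttributes(url)' 'x := 1' |
  grep -n -o -E 'With[^(]*'
```

This prints 1:WithSampler and 2:WithAttributes.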
Searching recursively
Now suppose you don't know which folder contains the files you're looking for. You can run grep from any directory with the -r (or --recursive) flag. This changes grep from searching only the files it was given to searching every file in the given directories and all of their subdirectories.
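Here is a minimal sketch of -r on a small hypothetical tree built in a temporary directory. The output is sorted because grep's traversal order isn't guaranteed:

```shell
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
# One file at the top level and one nested two directories down.
mkdir -p backend/internal
echo 'trace at top' > notes.txt
echo 'trace nested' > backend/internal/app.go

# -r descends into every subdirectory of the directory operand (here ".").
grep -r trace . | sort
```

This prints ./backend/internal/app.go:trace nested and ./notes.txt:trace at top.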
Let's go up one directory from our Go app example. The parent directory holds directories for multiple applications, plus the Kubernetes infrastructure-as-code (IaC) that deploys each of them:
echo "Current Directory: $(pwd)"
echo "---------------"
ls -F
Current Directory: /Users/jharder/Code/k8s-app
---------------
backend/ docker-compose.yml files/ frontend/ kubernetes/
We may not want to restrict our search to just the backend, so let's search for trace from here using the recursive flag.
When searching recursively, you can name the directory to start from in place of the files to search. If you don't provide one, grep searches the current working directory.
grep -r trace | head -n 10
./frontend/node_modules/@types/express-serve-static-core/index.d.ts: trace: IRouterMatcher<this>;
./frontend/node_modules/@types/express-serve-static-core/index.d.ts: trace: IRouterHandler<this, Route>;
./frontend/node_modules/@types/node/globals.d.ts: * Optional override for formatting stack traces
./frontend/node_modules/@types/node/globals.d.ts: * @see https://v8.dev/docs/stack-trace-api#customizing-stack-traces
./frontend/node_modules/@types/node/ts4.8/globals.d.ts: * Optional override for formatting stack traces
./frontend/node_modules/@types/node/ts4.8/globals.d.ts: * @see https://v8.dev/docs/stack-trace-api#customizing-stack-traces
./frontend/node_modules/@types/node/ts4.8/tls.d.ts: * When enabled, TLS packet trace information is written to `stderr`. This can be
./frontend/node_modules/@types/node/ts4.8/tls.d.ts: * Note: The format of the output is identical to the output of `openssl s_client -trace` or `openssl s_server -trace`. While it is produced by OpenSSL's`SSL_trace()` function, the format is
./frontend/node_modules/@types/node/ts4.8/tls.d.ts: * When enabled, TLS packet trace information is written to `stderr`. This can be
./frontend/node_modules/@types/node/ts4.8/trace_events.d.ts: * The `trace_events` module provides a mechanism to centralize tracing information
I'll save you from scrolling through the thousands of lines of results from ./frontend/node_modules/ by displaying only the first 10 lines using head.
Grep can save us from this annoyance as well. When searching recursively, you can exclude whole directories from the search in the same way you can exclude individual files. Instead of --exclude, use --exclude-dir:
grep -r --exclude-dir 'node_modules' trace
./backend/go.mod: go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0
./backend/go.mod: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0
./backend/go.mod: go.opentelemetry.io/otel/trace v1.14.0
./backend/db.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
./backend/db.go:func dbConn(ctx context.Context, tp *sdktrace.TracerProvider) (*mongo.Client, error) {
./backend/tracing.go: "go.opentelemetry.io/otel/exporters/otlp/otlptrace"
./backend/tracing.go: "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
./backend/tracing.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
./backend/tracing.go:func initExporter(ctx context.Context) (*otlptrace.Exporter, error) {
./backend/tracing.go: client := otlptracehttp.NewClient()
./backend/tracing.go: exporter, err := otlptrace.New(ctx, client)
./backend/tracing.go:func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
./backend/tracing.go: // https://github.com/open-telemetry/opentelemetry-go/blob/main/exporters/otlp/otlptrace/otlptracehttp/example_test.go#L68
./backend/tracing.go: // exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
./backend/tracing.go: trace_provider := sdktrace.NewTracerProvider(
./backend/tracing.go: sdktrace.WithSampler(sdktrace.AlwaysSample()),
./backend/tracing.go: sdktrace.WithBatcher(exporter),
./backend/tracing.go: sdktrace.WithResource(resource.NewWithAttributes(semconv.SchemaURL, semconv.ServiceName(name))),
./backend/tracing.go: otel.SetTracerProvider(trace_provider)
./backend/tracing.go: return trace_provider, nil
./backend/go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0 h1:TKf2uAs2ueguzLaxOCBXNpHxfO/aC7PAdDsSH0IbeRQ=
./backend/go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0/go.mod h1:HrbCVv40OOLTABmOn1ZWty6CHXkU8DK/Urc43tHug70=
./backend/go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0 h1:3jAYbRHQAqzLjd9I4tzxwJ8Pk/N6AqBcF6m1ZHrxG94=
./backend/go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0/go.mod h1:+N7zNjIJv4K+DeX67XXET0P+eIciESgaFDBqh+ZJFS4=
./backend/go.sum:go.opentelemetry.io/otel/trace v1.14.0 h1:wp2Mmvj41tDsyAJXiWDWpfNsOiIyd38fy85pyKcFq/M=
./backend/go.sum:go.opentelemetry.io/otel/trace v1.14.0/go.mod h1:8avnQLK+CG77yNLUae4ea2JDQ6iT+gozhnZjy/rw9G8=
./backend/app.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
./backend/app.go:func newApp(ctx context.Context, tp *sdktrace.TracerProvider) (*App, error) {
./backend/main.go: "go.opentelemetry.io/otel/trace"
./backend/main.go: span := trace.SpanFromContext(ctx)
./backend/main.go: log.Printf("Error shutting down tracer provider: %v", err)
./files/config.yml: traces:
./kubernetes/otelcollector-config.yml: traces:
--exclude and --exclude-dir can be combined as well. Let's also remove ./backend/tracing.go, go.mod, and go.sum from our results using --exclude.
grep -r --exclude-dir 'node_modules' --exclude 'tracing.go' --exclude '*.sum' --exclude '*.mod' trace
./backend/db.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
./backend/db.go:func dbConn(ctx context.Context, tp *sdktrace.TracerProvider) (*mongo.Client, error) {
./backend/app.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
./backend/app.go:func newApp(ctx context.Context, tp *sdktrace.TracerProvider) (*App, error) {
./backend/main.go: "go.opentelemetry.io/otel/trace"
./backend/main.go: span := trace.SpanFromContext(ctx)
./backend/main.go: log.Printf("Error shutting down tracer provider: %v", err)
./files/config.yml: traces:
./kubernetes/otelcollector-config.yml: traces:
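Here is a self-contained sketch of the same combination on a tiny hypothetical tree, where only one file should survive the filters:

```shell
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
# One file we want, plus noise in node_modules and in a .sum file.
mkdir -p backend frontend/node_modules/dep
echo 'trace wanted' > backend/main.go
echo 'trace noise' > backend/go.sum
echo 'trace noise' > frontend/node_modules/dep/index.d.ts

# --exclude-dir prunes whole directories; --exclude skips matching files.
grep -r --exclude-dir 'node_modules' --exclude '*.sum' trace .
```

Only ./backend/main.go:trace wanted is printed.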
Hopefully this gives you a feel for how you can combine grep's flags and features to pinpoint what you're looking for. But that's only half of the story for grep. In addition to searching files, it is just as capable of searching anything you send to it on STDIN.
Examples - Searching STDIN
Filtering command output
When you use the pipe operator |, the output (stdout) of the previous command is fed as input (stdin) to the next command. For grep, this means you can search the output of any command.
One example I've used recently is figuring out which Docker images I have downloaded on my system. docker image ls shows me the complete list, but with hundreds of images downloaded it can be quite a bear to look through them all to find the one I want. Grep comes to the rescue! If I'm only looking for which versions of Python I have available, I can filter the output of docker image ls down using grep.5
docker image ls | grep -h python | cut -w -f 1,2
python 3.7
python latest
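If you want to try this kind of filtering without Docker, here's a sketch that swaps in a hypothetical function printing made-up docker image ls-style output. It uses awk to pick the first two columns instead of cut -w, since not every cut implementation has -w:

```shell
# Hypothetical stand-in for `docker image ls` (made-up rows, columns aligned).
fake_image_ls() {
  printf '%s\n' \
    'REPOSITORY   TAG      IMAGE ID' \
    'python       3.7      0123456789ab' \
    'node         18       123456789abc' \
    'python       latest   23456789abcd'
}

# grep keeps the python rows; awk prints their first two whitespace-separated fields.
fake_image_ls | grep python | awk '{ print $1, $2 }'
```

This prints python 3.7 and python latest.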
Filtering streams
Thanks to the wonderful magic of streams combined with pipelines, you can grep the output of a command that runs for a long time (or forever). Say you're monitoring a log file that's continuously being written to. You can use tail -f to follow the file and display each new line as soon as it's written, but there might be a lot of noise in this particular log. You can filter the contents mid-stream by piping tail into grep.
tail -f tmp.log | grep -h -E 'WARNING|ERROR|FATAL'
[2023-07-15T19:23:31.968712+00:00] app.ERROR: Something went wrong.
[2023-07-15T19:23:32.205448+00:00] app.ERROR: Something slightly worse went wrong.
[2023-07-15T19:23:34.422216+00:00] app.WARNING: Could not find file, I'm freaking out man.
[2023-07-15T19:23:35.382214+00:00] app.FATAL: Something really bad happened, crashing now.
With the power of imagination, you can see each line appearing as it's written to the log.
Here we're using the -E flag to enable extended regular expressions, in which | means "or". As a result, only logs with severity WARNING, ERROR, or FATAL are printed.
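One caveat applies when the filtered stream feeds yet another command instead of the terminal. In that case grep may buffer its output, so matches arrive in bursts. Both GNU grep and the grep shipped with macOS accept --line-buffered, which flushes each line as it's printed. Here is a sketch, with a hypothetical function standing in for tail -f tmp.log:

```shell
# Hypothetical log producer standing in for `tail -f tmp.log`.
produce_logs() {
  for level in INFO DEBUG ERROR INFO WARNING FATAL; do
    printf 'app.%s: sample message\n' "$level"
  done
}

# --line-buffered flushes each matching line immediately, which matters
# when grep's output goes to another pipe rather than the terminal.
produce_logs |
  grep --line-buffered -E 'WARNING|ERROR|FATAL' |
  while IFS= read -r line; do
    printf 'alert: %s\n' "$line"
  done
```

Only the ERROR, WARNING and FATAL lines reach the loop, in that order.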
With this, we can filter the stream down to warning or higher-severity logs. But sometimes we want to see more context before a fatal log occurs, and simply filtering out info and debug logs might have concealed the problem. Let's use --before-context to see what happened before that fatal event.
grep --before-context=4 FATAL tmp.log
[2023-07-15T19:23:33.542697+00:00] app.DEBUG: I'm just going to delete this file, this is fine.
[2023-07-15T19:23:34.101258+00:00] app.DEBUG: Buisiness as usual.
[2023-07-15T19:23:34.223435+00:00] app.DEBUG: Trying to open file.
[2023-07-15T19:23:34.422216+00:00] app.WARNING: Could not find file, I'm freaking out man.
[2023-07-15T19:23:35.382214+00:00] app.FATAL: Something really bad happened, crashing now.
Conclusion
Because grep can operate on any input sent to it, the possible ways to use it are endless. If you ever find yourself wishing you could filter some data, you should probably reach for grep first and see if it meets your needs. There are other, more specialized searching and filtering tools out there, but grep works, and is installed, on virtually every Unix-like system.
I would encourage you to read the grep man page (man grep) in full. I covered most of the common use cases, but there are many more flags and features hidden away in grep for those willing to search for them.
Notably, this article did not cover any of grep's relatives, like pgrep, rgrep, egrep, fgrep, etc. egrep and fgrep are described in the grep manual as equivalents of grep -E and grep -F. pgrep, which searches running processes, has its own man page.
Footnotes:
https://en.wikipedia.org/wiki/Ed_(text_editor)
Despite being notoriously difficult to understand, ed went on to influence and spawn ex, then vi, leading to vim and neovim.
Note that the shell performs glob expansion before grep ever runs, so grep only receives the resulting list of file names. This doesn't matter in this case, but it's something to keep in mind.
Another thing to note is that shell filename expansion (globbing) does not use regular expressions. As a regular expression, *.go doesn't mean what it means as a glob. In a regex, * is a quantifier that repeats the item before it (like .), so a leading * is either treated as a literal character or rejected, depending on the regex flavor. In shell filename expansion, * matches any string of characters. On its own, * matches every (non-hidden) filename, and *.go matches every file that ends with .go (note the . is not a special character like it is in regex). ? matches any single character (much like . in regular expressions).
I'm using the -h flag for grep, which suppresses the filename prefix in its output. By default, grep only adds that prefix when it searches more than one input, so for a single stream from stdin -h is mostly a safeguard. When a prefix is shown for stdin, it reads '(standard input):', which is not super helpful.
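I'm also using cut to show just the columns I care about. The -w option, which splits on runs of whitespace, comes from the BSD cut that ships with macOS, and other cut implementations may not support it. This is mostly so that the output shows up nicely in this blog; it's not strictly necessary.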