Grep

Oct 27, 2023    #posix   #how-to  

Overview

Grep is a fundamental part of the command line toolbelt. True to the Unix philosophy, grep does one thing, and does it well. Its job is to search input for a given pattern and print all lines that match. Think of it like Command-F for the terminal.

Its origins date back to the ed text editor. It was common practice within ed to execute the command g/re/p (g/regular expression/p), meaning: globally search for lines that match the given regular expression and print them. This functionality was so useful that the name "grep" became synonymous with the act of searching files and printing matches, and a standalone grep program was born.1

There are two ways to have grep search text. One is to provide a list of files as the target of the search; grep searches each file line by line for your pattern and prints the matching lines. The other is to search incoming text from stdin and print the matching lines.
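As a minimal sketch of those two modes, assuming a throwaway file called notes.txt (a name made up for this illustration) in a temporary directory:

```shell
# Scratch directory and file; the name notes.txt is made up for this sketch.
demo_dir=$(mktemp -d)
printf 'first line\na trace of something\nlast line\n' > "$demo_dir/notes.txt"

# Mode 1: grep opens the file named on the command line.
from_file=$(grep trace "$demo_dir/notes.txt")
echo "$from_file"

# Mode 2: no file argument, so grep reads whatever arrives on stdin.
from_stdin=$(cat "$demo_dir/notes.txt" | grep trace)
echo "$from_stdin"

rm -r "$demo_dir"
```

Both commands print the same single matching line; only the way the text reaches grep differs.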

Examples - Searching files

Basics

Let’s look at some examples, using the same k8s-app project you’ve seen in all previous examples. We’ll focus first on the backend directory.

echo "Current directory: $(pwd)"
echo "----------"
ls
Current directory: /Users/jharder/Code/k8s-app/backend
----------
Dockerfile
app.go
db.go
go.mod
go.sum
main.go
tracing.go

Let’s say you’re interested in finding all occurrences of the phrase trace.

grep trace *
app.go:   sdktrace "go.opentelemetry.io/otel/sdk/trace"
app.go:func newApp(ctx context.Context, tp *sdktrace.TracerProvider) (*App, error) {
db.go:    sdktrace "go.opentelemetry.io/otel/sdk/trace"
db.go:func dbConn(ctx context.Context, tp *sdktrace.TracerProvider) (*mongo.Client, error) {
go.mod:   go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0
go.mod:   go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0
go.mod:   go.opentelemetry.io/otel/trace v1.14.0
go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0 h1:TKf2uAs2ueguzLaxOCBXNpHxfO/aC7PAdDsSH0IbeRQ=
go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0/go.mod h1:HrbCVv40OOLTABmOn1ZWty6CHXkU8DK/Urc43tHug70=
go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0 h1:3jAYbRHQAqzLjd9I4tzxwJ8Pk/N6AqBcF6m1ZHrxG94=
go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0/go.mod h1:+N7zNjIJv4K+DeX67XXET0P+eIciESgaFDBqh+ZJFS4=
go.sum:go.opentelemetry.io/otel/trace v1.14.0 h1:wp2Mmvj41tDsyAJXiWDWpfNsOiIyd38fy85pyKcFq/M=
go.sum:go.opentelemetry.io/otel/trace v1.14.0/go.mod h1:8avnQLK+CG77yNLUae4ea2JDQ6iT+gozhnZjy/rw9G8=
main.go:  "go.opentelemetry.io/otel/trace"
main.go:      span := trace.SpanFromContext(ctx)
main.go:          log.Printf("Error shutting down tracer provider: %v", err)
tracing.go:   "go.opentelemetry.io/otel/exporters/otlp/otlptrace"
tracing.go:   "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
tracing.go:   sdktrace "go.opentelemetry.io/otel/sdk/trace"
tracing.go:func initExporter(ctx context.Context) (*otlptrace.Exporter, error) {
tracing.go:   client := otlptracehttp.NewClient()
tracing.go:   exporter, err := otlptrace.New(ctx, client)
tracing.go:func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
tracing.go:   // https://github.com/open-telemetry/opentelemetry-go/blob/main/exporters/otlp/otlptrace/otlptracehttp/example_test.go#L68
tracing.go:   // exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
tracing.go:   trace_provider := sdktrace.NewTracerProvider(
tracing.go:       sdktrace.WithSampler(sdktrace.AlwaysSample()),
tracing.go:       sdktrace.WithBatcher(exporter),
tracing.go:       sdktrace.WithResource(resource.NewWithAttributes(semconv.SchemaURL, semconv.ServiceName(name))),
tracing.go:   otel.SetTracerProvider(trace_provider)
tracing.go:   return trace_provider, nil

That’s a lot of results, but we probably don’t care about go.mod and go.sum. We can trim the results down by specifying a glob that feeds only specific files to grep to search through.2 Let’s search for only Go files (with the .go suffix).

If we instead run grep trace *.go, our shell will expand *.go to app.go db.go main.go tracing.go3, and then feed those names as arguments to grep. Let’s see the results:4

grep trace *.go
app.go:   sdktrace "go.opentelemetry.io/otel/sdk/trace"
app.go:func newApp(ctx context.Context, tp *sdktrace.TracerProvider) (*App, error) {
db.go:    sdktrace "go.opentelemetry.io/otel/sdk/trace"
db.go:func dbConn(ctx context.Context, tp *sdktrace.TracerProvider) (*mongo.Client, error) {
main.go:  "go.opentelemetry.io/otel/trace"
main.go:      span := trace.SpanFromContext(ctx)
main.go:          log.Printf("Error shutting down tracer provider: %v", err)
tracing.go:   "go.opentelemetry.io/otel/exporters/otlp/otlptrace"
tracing.go:   "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
tracing.go:   sdktrace "go.opentelemetry.io/otel/sdk/trace"
tracing.go:func initExporter(ctx context.Context) (*otlptrace.Exporter, error) {
tracing.go:   client := otlptracehttp.NewClient()
tracing.go:   exporter, err := otlptrace.New(ctx, client)
tracing.go:func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
tracing.go:   // https://github.com/open-telemetry/opentelemetry-go/blob/main/exporters/otlp/otlptrace/otlptracehttp/example_test.go#L68
tracing.go:   // exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
tracing.go:   trace_provider := sdktrace.NewTracerProvider(
tracing.go:       sdktrace.WithSampler(sdktrace.AlwaysSample()),
tracing.go:       sdktrace.WithBatcher(exporter),
tracing.go:       sdktrace.WithResource(resource.NewWithAttributes(semconv.SchemaURL, semconv.ServiceName(name))),
tracing.go:   otel.SetTracerProvider(trace_provider)
tracing.go:   return trace_provider, nil
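The glob expansion described above can be checked without grep at all. Here is a small sketch that recreates empty stand-ins for the backend files in a scratch directory (the directory is an assumption for the demo, not the real project) and asks the shell what *.go expands to:

```shell
# Scratch directory with empty stand-ins for the files listed earlier.
demo_dir=$(mktemp -d)
(cd "$demo_dir" && touch Dockerfile app.go db.go go.mod go.sum main.go tracing.go)

# The shell expands the glob before any command runs; echo shows the result.
expanded=$(cd "$demo_dir" && echo *.go)
echo "$expanded"   # prints: app.go db.go main.go tracing.go

rm -r "$demo_dir"
```

Whatever echo prints here is exactly the argument list grep would receive in place of *.go.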

Excluding files

It’s a bit annoying to have tracing.go polluting our results, as it’s obviously going to have a bunch of hits that we don’t care about. There’s no quick and easy way to exclude it from the shell glob, but grep itself gives us a way to exclude certain files from being processed. --exclude takes a pattern that grep checks against the name of each file it is about to process; if the pattern matches, grep skips that file. The opposite behavior can be achieved with --include.

grep --exclude 'tracing.go' trace *.go
app.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
app.go:func newApp(ctx context.Context, tp *sdktrace.TracerProvider) (*App, error) {
db.go:  sdktrace "go.opentelemetry.io/otel/sdk/trace"
db.go:func dbConn(ctx context.Context, tp *sdktrace.TracerProvider) (*mongo.Client, error) {
main.go:    "go.opentelemetry.io/otel/trace"
main.go:        span := trace.SpanFromContext(ctx)
main.go:            log.Printf("Error shutting down tracer provider: %v", err)

Line numbers

Having these results is handy, but it would be better to know where in the file these matches can be found. Passing the -n or --line-number flag provides the line number of each hit.

grep -n trace main.go
11: "go.opentelemetry.io/otel/trace"
26:     span := trace.SpanFromContext(ctx)
77:         log.Printf("Error shutting down tracer provider: %v", err)
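When more than one file is searched, grep puts the file name in front of the line number. A small sketch with two made-up files (a.go and b.go are invented for this illustration):

```shell
# Two tiny made-up files in a scratch directory.
demo_dir=$(mktemp -d)
printf 'package main\nimport "trace"\n' > "$demo_dir/a.go"
printf 'package main\n\n// trace here\n' > "$demo_dir/b.go"

# With several files, each hit is reported as file:line:content.
numbered=$(cd "$demo_dir" && grep -n trace a.go b.go)
echo "$numbered"

rm -r "$demo_dir"
```

The output has the form a.go:2:... and b.go:3:..., which is handy for jumping straight to a location in an editor.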

Context

This is indeed nicer, but it would be nicer still if we had a bit of context for each of the matches. What is going on around the lines where these results were found?

You can use the -C or --context argument with a positive number and that number of lines before and after the match will be printed to stdout as well.

grep -n --context=3 trace main.go
8-
9-    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
10-   "go.opentelemetry.io/otel"
11:   "go.opentelemetry.io/otel/trace"
12-)
13-
14-const name = "recipe-service"
--
23-   return func(w http.ResponseWriter, r *http.Request) {
24-       ctx := r.Context()
25-       // grabs the current span
26:       span := trace.SpanFromContext(ctx)
27-       span.AddEvent("GetRecipes")
28-       log.Println("GET /recipes")
29-       recipes, err := app.getRecipes(ctx)
--
74-   }
75-   defer func() {
76-       if err := tp.Shutdown(context.Background()); err != nil {
77:           log.Printf("Error shutting down tracer provider: %v", err)
78-       }
79-   }()
80-

In the output, a line number followed by - marks a line added for context, a line number followed by : marks a matched line, and -- separates each match (and its context) from the next.

You can also specify the number of lines to display before and after separately using -B or --before-context and -A or --after-context respectively.
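A quick sketch of asymmetric context, using a made-up six-line file (lines.txt is invented for this illustration):

```shell
# A six-line scratch file with one line we want to find.
demo_dir=$(mktemp -d)
printf 'one\ntwo\nthree\nMATCH\nfive\nsix\n' > "$demo_dir/lines.txt"

# Two lines before the match, one line after it.
context=$(grep -n -B 2 -A 1 MATCH "$demo_dir/lines.txt")
echo "$context"

rm -r "$demo_dir"
```

This prints lines 2 through 5: two context lines, the match on line 4, and one trailing context line.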

Count

Sometimes it can be helpful to understand how often a particular search term appears in your searched files. For this, you can use -c or --count, which prints the number of matching lines in each file.

grep -c trace *.go
app.go:2
db.go:2
main.go:3
tracing.go:15
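One subtlety worth a sketch: -c counts matching lines, not individual occurrences, so a line with three hits still counts once. If you want occurrences, one common approach is to combine -o (covered further down) with wc -l. The file sample.txt below is made up for the demo:

```shell
# Scratch file: "trace" appears four times, spread over two lines.
demo_dir=$(mktemp -d)
printf 'trace trace trace\nno match here\ntrace\n' > "$demo_dir/sample.txt"

# -c counts matching lines: two lines contain "trace".
line_count=$(grep -c trace "$demo_dir/sample.txt")

# -o prints each match on its own line, so counting those lines
# counts occurrences. tr strips the padding some wc versions add.
match_count=$(grep -o trace "$demo_dir/sample.txt" | wc -l | tr -d ' ')

echo "lines: $line_count, occurrences: $match_count"

rm -r "$demo_dir"
```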

Matching files only

If you don’t care which lines specifically matched your pattern, but only which files contain a match, you can use the -l or --files-with-matches flag to show only the file names instead.

grep -l --exclude '*.sum' --exclude '*.mod' trace *
app.go
db.go
main.go
tracing.go

You can also see here that --exclude (and its --include counterpart) actually takes a shell glob as its argument, not just an exact file name. Here we use this to exclude all files ending in .sum or .mod.

Printing matched content only

Perhaps the inverse of showing the files that match is showing only the match, not the rest of the line. This is accomplished using the -o or --only-matching flag. It is likely most handy when paired with the -E or --extended-regexp flag, which enables extended regular expression syntax.

The regular expression below searches for anything that starts with With, followed by any number of characters that aren't the ( symbol. This has the effect of finding any function definition or call starting with With. Using -o means all we get back is what actually matched the regular expression, and which file it came from. If we were interested in knowing where in the file the match was, we could add the -n flag as well.

grep --exclude '*.sum' --exclude '*.mod' -o -E 'With[^(]*' *
db.go:WithTracerProvider
tracing.go:WithPrettyPrint
tracing.go:WithSampler
tracing.go:WithBatcher
tracing.go:WithResource
tracing.go:WithAttributes
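To illustrate the -n combination mentioned above, here is a minimal sketch against a made-up file (demo.go and its contents are invented, not taken from the project):

```shell
# A small made-up Go-like file with two With... calls.
demo_dir=$(mktemp -d)
printf 'a := 1\nx := sdk.WithSampler(s)\ny := sdk.WithBatcher(e)\n' > "$demo_dir/demo.go"

# -o keeps only the matched text, -n adds the line it was found on.
located=$(cd "$demo_dir" && grep -n -o -E 'With[^(]*' demo.go)
echo "$located"

rm -r "$demo_dir"
```

Because only one file is searched, the output is just line:match, for example 2:WithSampler.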

Searching recursively

Now suppose you don’t know which folder contains the files you’re looking for. You can run grep from any directory and pass the -r or --recursive flag, which changes grep from searching only the files explicitly given to searching every file in the provided directory and all of its subdirectories.

Let’s go up one directory from our Go app example. In the parent directory we have directories for multiple applications and the IaC Kubernetes code to deploy each of them:

echo "Current Directory: $(pwd)"
echo "---------------"
ls -F
Current Directory: /Users/jharder/Code/k8s-app
---------------
backend/
docker-compose.yml
files/
frontend/
kubernetes/

We may not want to restrict our search to just the backend; let’s search for trace from here using the recursive flag.

When searching recursively, you can specify the directory to start the search from in place of the files to search in. If you don't provide a directory, the current working directory will be used instead.

grep -r trace | head -n 10
./frontend/node_modules/@types/express-serve-static-core/index.d.ts:    trace: IRouterMatcher<this>;
./frontend/node_modules/@types/express-serve-static-core/index.d.ts:    trace: IRouterHandler<this, Route>;
./frontend/node_modules/@types/node/globals.d.ts:     * Optional override for formatting stack traces
./frontend/node_modules/@types/node/globals.d.ts:     * @see https://v8.dev/docs/stack-trace-api#customizing-stack-traces
./frontend/node_modules/@types/node/ts4.8/globals.d.ts:     * Optional override for formatting stack traces
./frontend/node_modules/@types/node/ts4.8/globals.d.ts:     * @see https://v8.dev/docs/stack-trace-api#customizing-stack-traces
./frontend/node_modules/@types/node/ts4.8/tls.d.ts:         * When enabled, TLS packet trace information is written to `stderr`. This can be
./frontend/node_modules/@types/node/ts4.8/tls.d.ts:         * Note: The format of the output is identical to the output of `openssl s_client -trace` or `openssl s_server -trace`. While it is produced by OpenSSL's`SSL_trace()` function, the format is
./frontend/node_modules/@types/node/ts4.8/tls.d.ts:         * When enabled, TLS packet trace information is written to `stderr`. This can be
./frontend/node_modules/@types/node/ts4.8/trace_events.d.ts: * The `trace_events` module provides a mechanism to centralize tracing information

I’ll save you from scrolling through the thousands of lines of results from ./frontend/node_modules/ by just displaying the first 10 lines using head.

Grep can save us from this annoyance as well; when searching recursively, you can exclude whole directories from its search in the same way that you could exclude individual files. Rather than using --exclude, we can use --exclude-dir.

grep -r --exclude-dir 'node_modules' trace
./backend/go.mod:   go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0
./backend/go.mod:   go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0
./backend/go.mod:   go.opentelemetry.io/otel/trace v1.14.0
./backend/db.go:    sdktrace "go.opentelemetry.io/otel/sdk/trace"
./backend/db.go:func dbConn(ctx context.Context, tp *sdktrace.TracerProvider) (*mongo.Client, error) {
./backend/tracing.go:   "go.opentelemetry.io/otel/exporters/otlp/otlptrace"
./backend/tracing.go:   "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
./backend/tracing.go:   sdktrace "go.opentelemetry.io/otel/sdk/trace"
./backend/tracing.go:func initExporter(ctx context.Context) (*otlptrace.Exporter, error) {
./backend/tracing.go:   client := otlptracehttp.NewClient()
./backend/tracing.go:   exporter, err := otlptrace.New(ctx, client)
./backend/tracing.go:func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
./backend/tracing.go:   // https://github.com/open-telemetry/opentelemetry-go/blob/main/exporters/otlp/otlptrace/otlptracehttp/example_test.go#L68
./backend/tracing.go:   // exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
./backend/tracing.go:   trace_provider := sdktrace.NewTracerProvider(
./backend/tracing.go:       sdktrace.WithSampler(sdktrace.AlwaysSample()),
./backend/tracing.go:       sdktrace.WithBatcher(exporter),
./backend/tracing.go:       sdktrace.WithResource(resource.NewWithAttributes(semconv.SchemaURL, semconv.ServiceName(name))),
./backend/tracing.go:   otel.SetTracerProvider(trace_provider)
./backend/tracing.go:   return trace_provider, nil
./backend/go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0 h1:TKf2uAs2ueguzLaxOCBXNpHxfO/aC7PAdDsSH0IbeRQ=
./backend/go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0/go.mod h1:HrbCVv40OOLTABmOn1ZWty6CHXkU8DK/Urc43tHug70=
./backend/go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0 h1:3jAYbRHQAqzLjd9I4tzxwJ8Pk/N6AqBcF6m1ZHrxG94=
./backend/go.sum:go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0/go.mod h1:+N7zNjIJv4K+DeX67XXET0P+eIciESgaFDBqh+ZJFS4=
./backend/go.sum:go.opentelemetry.io/otel/trace v1.14.0 h1:wp2Mmvj41tDsyAJXiWDWpfNsOiIyd38fy85pyKcFq/M=
./backend/go.sum:go.opentelemetry.io/otel/trace v1.14.0/go.mod h1:8avnQLK+CG77yNLUae4ea2JDQ6iT+gozhnZjy/rw9G8=
./backend/app.go:   sdktrace "go.opentelemetry.io/otel/sdk/trace"
./backend/app.go:func newApp(ctx context.Context, tp *sdktrace.TracerProvider) (*App, error) {
./backend/main.go:  "go.opentelemetry.io/otel/trace"
./backend/main.go:      span := trace.SpanFromContext(ctx)
./backend/main.go:          log.Printf("Error shutting down tracer provider: %v", err)
./files/config.yml:    traces:
./kubernetes/otelcollector-config.yml:        traces:

--exclude and --exclude-dir can be combined as well. Let’s remove ./backend/tracing.go, go.mod, and go.sum from our results using --exclude.

grep -r --exclude-dir 'node_modules' --exclude 'tracing.go' --exclude '*.sum' --exclude '*.mod' trace
./backend/db.go:  sdktrace "go.opentelemetry.io/otel/sdk/trace"
./backend/db.go:func dbConn(ctx context.Context, tp *sdktrace.TracerProvider) (*mongo.Client, error) {
./backend/app.go: sdktrace "go.opentelemetry.io/otel/sdk/trace"
./backend/app.go:func newApp(ctx context.Context, tp *sdktrace.TracerProvider) (*App, error) {
./backend/main.go:    "go.opentelemetry.io/otel/trace"
./backend/main.go:        span := trace.SpanFromContext(ctx)
./backend/main.go:            log.Printf("Error shutting down tracer provider: %v", err)
./files/config.yml:    traces:
./kubernetes/otelcollector-config.yml:        traces:
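The same search can be approached from the other side with --include, and with an explicit starting directory instead of the current one. The sketch below builds a tiny stand-in tree in a scratch directory; the file contents are invented and only loosely mirror the project layout:

```shell
# A tiny stand-in for the project tree; contents are made up.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/backend" "$demo_dir/frontend/node_modules/pkg"
printf 'import "trace"\n' > "$demo_dir/backend/main.go"
printf 'otel/trace v1.14.0\n' > "$demo_dir/backend/go.mod"
printf 'trace: true\n' > "$demo_dir/frontend/node_modules/pkg/index.d.ts"

# Start at backend/ and only look inside files whose name matches *.go.
go_only=$(cd "$demo_dir" && grep -r --include '*.go' trace backend)
echo "$go_only"

rm -r "$demo_dir"
```

Only backend/main.go is reported: go.mod is filtered out by --include, and frontend/ is never visited because the search starts at backend.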

Hopefully this gives you a feel for how you can combine some of the different flags and features of grep to pinpoint what you’re looking for. But that’s just half of the story for grep. In addition to searching files, it is just as capable of searching anything you throw at it through STDIN.

Examples - Searching STDIN

Filtering command output

When you use the pipeline operator |, the output of the previous command is fed as input to the next command. For grep, this means that you can search the results of anything that produces output.

One example I’ve used recently is when trying to figure out which Docker images I have downloaded on my system. docker image ls just shows me the complete list, but if I have hundreds of images downloaded it can be quite a bear to look through them all to find the one I’m looking for. Grep comes to the rescue! If I’m looking only for which versions of python I have available, I can filter the results of docker image ls down using grep.5

docker image ls | grep -h python | cut -w -f 1,2
python    3.7
python    latest
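If you don't have Docker handy, or your cut lacks the -w option (it is not available in every implementation), the same idea can be sketched with awk doing the column selection. The image list here is invented sample data produced by a hypothetical helper function, not real docker output:

```shell
# Invented sample data standing in for the output of `docker image ls`.
image_list() {
  printf 'REPOSITORY   TAG      IMAGE ID\n'
  printf 'python       3.7      aaaaaaaaaaaa\n'
  printf 'golang       1.20     bbbbbbbbbbbb\n'
  printf 'python       latest   cccccccccccc\n'
}

# Keep only the python rows, then print the first two columns.
python_images=$(image_list | grep python | awk '{ print $1, $2 }')
echo "$python_images"
```

awk splits on runs of whitespace by default, which is the job -w does for cut in the command above.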

Filtering streams

Because of the wonderful magic of streams combined with pipelines, you can grep the output of a command that may run for a long time (or forever). Say you are trying to monitor a log file that's being continuously written to. You can use tail -f to follow the contents and display each new line as soon as it’s written to the file, but there might be a lot of noise in this particular log file. You can filter the contents midstream by piping tail to grep.

tail -f tmp.log | grep -h -E 'WARNING|ERROR|FATAL'
[2023-07-15T19:23:31.968712+00:00] app.ERROR: Something went wrong.
[2023-07-15T19:23:32.205448+00:00] app.ERROR: Something slightly worse went wrong.
[2023-07-15T19:23:34.422216+00:00] app.WARNING: Could not find file, I'm freaking out man.
[2023-07-15T19:23:35.382214+00:00] app.FATAL: Something really bad happened, crashing now.

With the power of imagination you can see each line appearing as it is written to the log. Here we’re using the -E flag, which enables extended regular expressions; the | in the pattern means "or", so only logs with severity ‘WARNING’, ‘ERROR’, or ‘FATAL’ are printed.

With this, we can filter results as they’re streamed to only warning or higher severity logs. But sometimes we want to see more context before a fatal log occurs. Simply filtering out info and debug logs might have concealed the problem. Let’s use --before-context to see what happened before that fatal event.
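Since tail -f never finishes, here is a finite sketch of the same filter that you can run anywhere. The log lines come from a hypothetical fake_log function and are invented for the demo:

```shell
# Invented log lines, produced by a function instead of a real log file.
fake_log() {
  printf 'app.INFO: starting up\n'
  printf 'app.WARNING: disk is getting full\n'
  printf 'app.DEBUG: still running\n'
  printf 'app.ERROR: write failed\n'
}

# The | inside the pattern means "or" in extended regular expressions.
important=$(fake_log | grep -E 'WARNING|ERROR|FATAL')
echo "$important"
```

Only the WARNING and ERROR lines survive; nothing in the sample matches FATAL, which is fine, since each alternative is optional.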

grep --before-context=4 FATAL tmp.log
[2023-07-15T19:23:33.542697+00:00] app.DEBUG: I'm just going to delete this file, this is fine.
[2023-07-15T19:23:34.101258+00:00] app.DEBUG: Buisiness as usual.
[2023-07-15T19:23:34.223435+00:00] app.DEBUG: Trying to open file.
[2023-07-15T19:23:34.422216+00:00] app.WARNING: Could not find file, I'm freaking out man.
[2023-07-15T19:23:35.382214+00:00] app.FATAL: Something really bad happened, crashing now.

Conclusion

Because grep can operate on any input sent to it, the possible ways to use it are endless. If you ever find yourself in a situation where you wish you could filter some data, you should probably reach for grep first and see if it meets your needs. There are other searching and filtering tools out there that are more specialized, but grep works everywhere, and it’s installed everywhere.

I would encourage you to read the whole grep man page (run man grep). I covered most of the common use cases, but there are many more flags and features hidden away in grep for those willing to search for them.

Notably, this article did not cover any of the relatives of grep like egrep, fgrep, rgrep, or pgrep. You can read more about them in their respective manual pages.

Footnotes


  1. https://en.wikipedia.org/wiki/Ed_(text_editor)

    Despite being notoriously difficult to understand, ed went on to influence and spawn ex, then vi, leading to vim and neovim. ↩︎

  2. One thing to note is that file expansion done through globbing is done prior to feeding any arguments to grep; this doesn't matter in this case but it's something to keep in mind. ↩︎

  3. Another thing to note is that shell filename expansion (globbing) does not use regular expressions. *.go would not mean the same thing as a regular expression, because there * is a quantifier that applies to whatever expression comes before it (like .). In shell filename expansion, * on its own matches any string of characters, so *.go matches every file name that ends with .go (note the . is not a special character like it is in regex). ? matches any one character (much like . in regular expressions). ↩︎

  4. If you're ever unsure of the result of a shell glob, you can run echo or ls first to see which files will be selected. ↩︎

  5. I'm using the -h flag for grep, which suppresses the file name prefix in its output. By default grep only prints that prefix when it searches more than one file, but if the prefix is forced on (for example by -H in an alias), lines read from a pipeline are prefixed with '(standard input):', which is not super helpful.

    I'm also using cut to just show the columns I care about. This is mostly so that the output shows up nicely in this blog; it's not strictly necessary. ↩︎