Find

Nov 3, 2023    #posix   #how-to  

Overview

Find is a utility to find files on your computer given a wide range of search parameters. It recursively walks your filesystem (depth-first) and, by default, prints the path of every file that matches your search terms. Think of it like grep but for file names rather than file contents.

You can search on a variety of metadata on files, ranging from name, creation time, update time, owner, group, size, and type (file, directory, link, socket, etc.).

Additionally you can execute a variety of commands on the files find finds.

Terminology

If you read the docs, which I recommend that you do, you’ll see references to a handful of terms which may be unfamiliar.

Expression

The docs state that an expression is composed of the “primaries” and “operands”. Put less opaquely, it is one or more search terms (with any accompanying arguments) plus, optionally, something to do to each of the results found.

Primary

A primary is a search term used to limit the list of files returned by find OR an action to perform on the list of files returned.

Operand

Operands (or operators, as they’re sometimes called) are ways to combine expressions, like -or or -and.

Primaries - Finding Files

Enough theory, let’s see some examples. For these examples, we will be investigating a Ruby library meant to query Jira’s API.

By file names

find . -name '*.yml'
./.rubocop.yml
./.travis.yml

This executes a search starting in the current directory (.) for all files whose name (-name) matches the shell glob pattern *.yml. The pattern is quoted so that the shell passes it to find untouched instead of expanding it first. The same search can be done in a case-insensitive fashion using -iname instead of -name.
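To see the difference between -name and -iname without touching a real project, here is a minimal sketch that builds a throwaway directory first. The directory and file names are made up for illustration.

```shell
# Hypothetical scratch directory so the commands are safe to run anywhere.
demo=$(mktemp -d)
touch "$demo/config.yml" "$demo/Settings.YML" "$demo/notes.txt"

# -name is case-sensitive: only the lowercase extension matches.
exact=$(cd "$demo" && find . -name '*.yml' | LC_ALL=C sort)

# -iname ignores case: both YAML files match.
loose=$(cd "$demo" && find . -iname '*.yml' | LC_ALL=C sort)

echo "case-sensitive:"
echo "$exact"
echo "case-insensitive:"
echo "$loose"

rm -rf "$demo"
```

The sort is only there to make the output order predictable; find itself reports results in whatever order it reads them from the directory.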

Truth be told, this pattern of searching for files by some name glob accounts for 95% of my use of find. But there’s so much more it can do.

By access time

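Consider the scenario where you were recently working on a project and wanted to see which files you have opened in the last 30 minutes. (If you specifically care about files whose contents were edited, the modification-time counterpart -mmin works the same way.)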

find . -amin -30 -name '*.rb'
./lib/jira.rb

The -amin primary matches files based on how many minutes ago they were last accessed. We specified -30 and not 30 to mean “less than 30 minutes ago”; a bare 30 would mean exactly 30. If we wanted to only find files that were last accessed more than 30 minutes ago, we could specify the primary with a + (-amin +30). This use of + and - applies to any primary that takes a numeric argument.
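A small sketch of the - and + prefixes in action. It assumes a scratch directory with two made-up files, one of which has its access time pushed back to the year 2000 with touch -a -t.

```shell
# Hypothetical scratch directory and file names, for illustration only.
demo=$(mktemp -d)
touch "$demo/fresh.rb"                      # access time: right now
touch "$demo/stale.rb"
touch -a -t 200001010000 "$demo/stale.rb"   # set access time to 1 Jan 2000

# -30: last accessed less than 30 minutes ago.
recent=$(cd "$demo" && find . -name '*.rb' -amin -30)

# +30: last accessed more than 30 minutes ago.
older=$(cd "$demo" && find . -name '*.rb' -amin +30)

echo "recent: $recent"
echo "older:  $older"

rm -rf "$demo"
```

Running find does not itself count as accessing the files, since it only reads their metadata, so the search does not disturb the timestamps it is inspecting.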

If you wanted to look at the files accessed over a longer period of time without having to pull in a command substitution (-amin -$(expr 60 \* 24 \* 15)), you can instead use -atime, which accepts a number and a time unit: s for seconds, m for minutes, h for hours, d for days, and w for weeks. These unit suffixes are a feature of the BSD find that ships with macOS; other implementations may only accept a plain number of days.
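If your find does not understand the unit suffixes, two portable spellings of “accessed within the last 15 days” are sketched below. This assumes a POSIX-style shell for the $(( )) arithmetic; the scratch directory and file name are invented for the example.

```shell
# Hypothetical scratch directory, for illustration only.
demo=$(mktemp -d)
touch "$demo/recent.rb"

# Shell arithmetic replaces the expr command substitution: 15 days in minutes.
minutes=$((60 * 24 * 15))
by_minutes=$(cd "$demo" && find . -name '*.rb' -amin "-$minutes")

# With no unit suffix, -atime counts in 24-hour periods.
by_days=$(cd "$demo" && find . -name '*.rb' -atime -15)

echo "$minutes minutes: $by_minutes"
echo "15 days: $by_days"

rm -rf "$demo"
```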

By creation time

If you were instead interested in the file creation time rather than access time, you could use -Bmin and its counterpart -Btime, which perform the same searches as -amin and -atime but for file creation time.

find . -Btime -3d
./lib/jira/api/just_created.rb

By owner and group

In some instances it can be handy to find files owned by a particular user, or particular group. Find has you covered.

find . -user root
./bin/stray_file

This finds all the files owned by root. Similarly, -group GROUP finds all the files whose group is GROUP.
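The same idea works for your own account. As a rough sketch, the snippet below looks up the current user’s numeric ID with id -u and passes it to -user, which accepts either a user name or a numeric ID. The scratch directory and file name are hypothetical.

```shell
# Hypothetical scratch directory, for illustration only.
demo=$(mktemp -d)
touch "$demo/mine.txt"

# id -u prints the numeric ID of whoever is running the script.
me=$(id -u)

# Files created a moment ago by this user should all match.
owned=$(cd "$demo" && find . -type f -user "$me")

echo "$owned"

rm -rf "$demo"
```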

I rarely use these flags as I rarely work on machines with multiple users or exotic group membership, but if you’re a system administrator, these could be very handy. Regardless, learning more about the UNIX file permission model is very helpful so I would recommend reading up more on it if you’re curious.1

By type and size

Shifting gears, let’s say you’re trying to track down files that are hogging your hard drive space. -size helps you find files larger or smaller than your given size.

find . -size +512k
./coverage/index.html

This will find all files larger than 512 kilobytes. You can also search in units of megabytes with M, gigabytes with G, terabytes with T and if you’re working with truly titanic files, petabytes using P.
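Here is a quick way to try the size comparison on files of known size. It is only a sketch: the file names are invented, and dd is used to manufacture a 600 KB file of zeros.

```shell
# Hypothetical scratch directory, for illustration only.
demo=$(mktemp -d)

# 600 blocks of 1024 bytes: comfortably over the 512k threshold.
dd if=/dev/zero of="$demo/big.bin" bs=1024 count=600 2>/dev/null
printf 'tiny' > "$demo/small.txt"

# +512k: larger than 512 kilobytes.
large=$(cd "$demo" && find . -type f -size +512k)

# -512k: smaller than 512 kilobytes.
small=$(cd "$demo" && find . -type f -size -512k)

echo "large: $large"
echo "small: $small"

rm -rf "$demo"
```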

I need to come clean about something. I’ve used the term files throughout this article to refer to the results that find returns from its searches, but that’s not exactly true. If you’ve experimented with find yourself, you may have even noticed that find doesn’t just return files. It returns everything on the filesystem: files, directories, symbolic links, sockets, and more.

You can tell find to only return results of a given type with the -type primary.

find . -type d ! -path '*git*' ! -path '*coverage*'
.
./bin
./spec
./spec/issue
./spec/api
./spec/report
./.yardoc
./.yardoc/objects
./lib
./lib/jira
./lib/jira/issue
./lib/jira/api
./lib/jira/report
./doc
./doc/css
./doc/js
./doc/Jira
./doc/Jira/Reporting
./doc/Jira/Issue
./doc/Jira/Api
./doc/Jira/Report
./doc/Jira/Client
./.idea

This returns only the directories in the current directory and its children. The -path primary matches against the whole path of the file, not just the file name, and the ! will be covered later when we talk about operands.
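A scaled-down version of the same search, run against a made-up directory tree so the effect of each ! -path filter is easy to see:

```shell
# Hypothetical project layout, for illustration only.
demo=$(mktemp -d)
mkdir -p "$demo/lib/jira" "$demo/coverage/assets" "$demo/.git/objects"
touch "$demo/lib/jira.rb"

# Directories only, skipping any path that contains "git" or "coverage".
dirs=$(cd "$demo" && find . -type d ! -path '*git*' ! -path '*coverage*' | LC_ALL=C sort)

echo "$dirs"

rm -rf "$demo"
```

Note that jira.rb is left out because it is a regular file, while the .git and coverage trees are left out because their paths match the excluded patterns.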

Primaries - Acting on files

So far find has only been printing the results that it finds, which is implicitly the same as using the -print primary, but you can take other actions on the matched files as well. Similar to -print is -ls, which formats its output much as though you had run ls -l on each result, with the inode number and block count added at the front.

find . -type f -name '*.md' -ls
30288144        8 -rw-r--r--    1 jharder          staff                 622 Feb  8  2023 ./CHANGELOG.md
35271368       24 -rw-r--r--    1 jharder          staff               11914 Apr 13  2023 ./README.md

If instead you wanted to clean up a bunch of files, you can use find with the -delete primary to delete all the files find finds. No reaching for xargs necessary.
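Because -delete is irreversible, it is worth running the search on its own first and only then adding -delete to the end. A minimal sketch, using a scratch directory and invented file names:

```shell
# Hypothetical scratch directory, for illustration only.
demo=$(mktemp -d)
touch "$demo/a.tmp" "$demo/b.tmp" "$demo/keep.rb"

# Dry run: the same search without -delete shows what would be removed.
doomed=$(cd "$demo" && find . -type f -name '*.tmp' | LC_ALL=C sort)
echo "would delete:"
echo "$doomed"

# The real thing. -delete goes last so the filters before it apply first.
(cd "$demo" && find . -type f -name '*.tmp' -delete)

left=$(cd "$demo" && find . -type f)
echo "left behind: $left"

rm -rf "$demo"
```

The position matters because find evaluates the expression left to right: a -delete placed before -name would act on every file before the name filter was ever consulted.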

You can run any utility you want on the list of files using the -exec primary, though there are two important things to keep in mind:

  1. You must terminate the command with an escaped ; in order to tell find where your exec command ends and the next primary begins.
  2. To reference the filename in the utility, use {}.

find . -type f -name '*.md' -exec echo {} \;
./CHANGELOG.md
./README.md
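The sketch below runs -exec against two made-up Markdown files. It also shows a second terminator that the text above does not cover: ending the command with + instead of \; asks find to pass many file names to a single invocation of the utility rather than running it once per file.

```shell
# Hypothetical scratch directory and contents, for illustration only.
demo=$(mktemp -d)
printf 'one\n' > "$demo/CHANGELOG.md"
printf 'one\ntwo\n' > "$demo/README.md"

# \; runs basename once for each matching file.
per_file=$(cd "$demo" && find . -type f -name '*.md' -exec basename {} \; | LC_ALL=C sort)

# + hands all matching files to one cat, whose combined output is then counted.
total_lines=$(cd "$demo" && find . -type f -name '*.md' -exec cat {} + | wc -l)

echo "$per_file"
echo "total lines: $total_lines"

rm -rf "$demo"
```

For a cheap command on a handful of files the two forms behave much the same; the + form mostly pays off when the search matches thousands of files.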

You can replicate the -delete primary by using -exec rm {} \; but I might wonder why you’re doing that. A helpful variation of -exec is -ok, which is identical to -exec but asks you for permission first. If you wanted to delete a handful of files but wanted to confirm before each one gets deleted, -ok is a great option.

find . -empty -ok rm {} \;
"rm ./lib/empty_file_2"? no
"rm ./lib/jira/api/just_created.rb"? no
"rm ./lib/empty_file_1"? yes

Operands

Operands are quite simple. In fact, most of these examples have been using them! The man page lists ( ), !, -and, and -or. When multiple primaries are provided to find, -and is implicitly used to link them together. find . -empty -atime +1w is the same as typing find . -empty -and -atime +1w, meaning: find all the files that are empty AND haven’t been accessed in more than a week.

( expression ) sets an order of precedence similar to how parentheses work in math. (1+3) * 4 would evaluate 1+3 before * 4 because of the parentheses. On the command line the parentheses need to be escaped or quoted so the shell doesn’t interpret them itself. -or and ! (also expressed as -not) work the way you would expect from boolean logic.
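Precedence is easiest to see side by side. In this sketch (hypothetical directory and file names), the implicit -and binds more tightly than -or, so leaving out the parentheses changes which results come back.

```shell
# Hypothetical scratch directory, for illustration only.
demo=$(mktemp -d)
mkdir "$demo/fixtures.yml"          # a directory whose name happens to end in .yml
touch "$demo/jira.rb" "$demo/.travis.yml" "$demo/notes.txt"

# Grouped: regular files that are either Ruby or YAML.
grouped=$(cd "$demo" && find . -type f \( -name '*.rb' -or -name '*.yml' \) | LC_ALL=C sort)

# Ungrouped: read as (-type f -and -name '*.rb') -or (-name '*.yml'),
# so the fixtures.yml directory slips through.
ungrouped=$(cd "$demo" && find . -type f -name '*.rb' -or -name '*.yml' | LC_ALL=C sort)

# Negation: regular files that are not YAML.
negated=$(cd "$demo" && find . -type f ! -name '*.yml' | LC_ALL=C sort)

echo "grouped:";   echo "$grouped"
echo "ungrouped:"; echo "$ungrouped"
echo "negated:";   echo "$negated"

rm -rf "$demo"
```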

Conclusion

Find is a humble utility that seems simple on its face, but wields great power in the hands of those who know how to handle it. I’ve used find for years mostly to locate files with a certain name or file extension, but in researching this article I’ve found numerous other use cases.

I encourage you to poke around and consider find when you’re looking for some files and don’t know where they’re stored. Or you could use it to find files bigger than 1G and consider cleaning them up. Or to find empty files that have been sitting around for a long time.

Footnotes


  1. https://mason.gmu.edu/~montecin/UNIXpermiss.htm  ↩︎