Find and Remove Duplicate Files in Linux

It might seem unnecessary to worry about duplicate files when you have terabytes of storage. However, if you care about file organization, you’ll want to avoid duplicates on your Linux system. You can find and remove duplicate files either via the command line or with a specialized desktop app.

Use the “Find” Command

In case you’re not familiar with this powerful command, you can learn about it in our guide. By combining find with other essential Linux commands, like xargs, we can get a list of duplicate files in a folder (and all its subfolders). The command first groups files by size, then compares the MD5 hashes of files that share a size. An MD5 hash is a 32-character fingerprint calculated from a file’s contents, so files with identical contents produce identical hashes. To scan for duplicate files, open your console, navigate to the desired folder and type:

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

This one-liner does the following:

find -not -empty -type f -printf "%s\n" – looks for regular files which are not empty and prints the size of each one in bytes, one per line.

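sort -rn – sorts the file sizes numerically in reverse order (largest first), so identical sizes end up on adjacent lines.

uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 – uniq -d keeps only the sizes that occur more than once. For each of those sizes, find prints the names of all files that are exactly that many bytes long. These are the duplicate candidates.

xargs -0 md5sum | sort – calculates the MD5 hash of every candidate file and sorts the results so that identical hashes sit next to each other.

uniq -w32 --all-repeated=separate – compares the first 32 characters of each line (the MD5 hash) and prints every line whose hash is repeated, with a blank line between groups of duplicates.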
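The steps above can also be wrapped in a small shell function so the folder to scan is passed as an argument. This is only a sketch: the name list_dupes is made up for illustration, and it assumes GNU findutils and coreutils. The -r flag is an addition that stops xargs from running a command when there is nothing to process.

```shell
# Hypothetical helper (not part of any package): list groups of files with
# identical contents under a folder. Assumes GNU find, xargs, md5sum and uniq.
list_dupes() {
    dir="${1:-.}"                                    # folder to scan, defaults to the current one
    find "$dir" -not -empty -type f -printf '%s\n' | # size of every non-empty regular file
        sort -rn |                                   # put equal sizes on adjacent lines
        uniq -d |                                    # keep sizes that occur more than once
        xargs -r -I{} find "$dir" -type f -size {}c -print0 | # names of files with those sizes
        xargs -0 -r md5sum |                         # hash each candidate file
        sort |                                       # put equal hashes on adjacent lines
        uniq -w32 --all-repeated=separate            # print repeated hashes, grouped
}
```

Calling list_dupes ~/Pictures would scan that folder and its subfolders. Like the one-liner, it only prints a list.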

Note that this command doesn’t automatically remove duplicates – it only outputs a list, and you can delete files manually if you want. If you prefer to manage your files in an application that offers more options at once, the next solution might suit you.
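If the list is long, it may help to see only the extra copies, that is, every file in a group except the first one. The sketch below reads the grouped output on standard input and prints those paths for review; it deletes nothing. The function name extra_copies and the file name dupes.txt are hypothetical, and it assumes the usual md5sum line layout (a 32-character hash, two separator characters, then the path) with no newlines or backslashes in file names.

```shell
# Hypothetical helper: from blank-line separated md5sum groups on stdin,
# print every path except the first one in each group. Nothing is deleted.
extra_copies() {
    awk '
        /^$/ { seen = 0; next }        # a blank line ends the current group
        {
            path = substr($0, 35)      # skip the 32-char hash and 2 separator chars
            if (seen) print path       # every file after the first is an extra copy
            seen = 1
        }
    '
}
```

For example, after saving the output of the one-liner to dupes.txt, running extra_copies < dupes.txt prints the candidates. Check that list by hand before removing anything, since which copy counts as "first" depends only on sort order.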

Employ dupeGuru

DupeGuru is a cross-platform application that comes in three editions: Standard (SE), Music and Picture. It’s designed to find duplicate files based on multiple criteria (file names, file size, MD5 hashes) and uses fuzzy-matching to detect similar files. Windows and OS X users can download the installation files from the official website, and Ubuntu users can pull dupeGuru from the repository:

sudo add-apt-repository ppa:hsoft/ppa
sudo apt-get update
sudo apt-get install dupeguru

To search for duplicates, first add some folders by pressing the “+” button. Setting a folder state to “Reference” means that other folders’ contents are compared to it. Before clicking “Scan,” check the “View -> Preferences” dialog to ensure that everything is properly set up.

“Scan Type” varies across dupeGuru editions; in Standard, you can compare files and folders by contents and filename. Picture edition offers comparison by EXIF timestamp and “Picture blocks” – a time-consuming option that divides each picture into a grid and calculates the average color for every tile. In Music edition, you can analyze “Fields,” “Tags” and “Audio content.” Some settings depend on the scan type: “Word weighting” and “Match similar words” work only when you search for file names. Conversely, “Filter Hardness” doesn’t apply when you perform a “Contents” scan.

DupeGuru can ignore small files and links (shortcuts) to a file, and lets you use regular expressions to further customize your query. You can also save search results to work on them later. Apple fans will love the fact that dupeGuru supports iPhoto and Aperture libraries and can manage iTunes libraries.

When dupeGuru finds duplicates, a new window opens with reference files colored in blue and their duplicates listed below. The toolbar displays basic information, and you can see more about every file if you select it and click the “Details” button.

You can manage duplicate files directly from dupeGuru – the “Actions” menu shows everything you can do. Select files by ticking the checkbox or clicking their name; you can select all or multiple files using keyboard shortcuts (hold Shift/Ctrl and click on desired files). If you’re interested in differences between duplicate files, toggle Delta Values. The results can be re-prioritized (so the files listed as dupes become references) and sorted according to various criteria like modification date and size. The official dupeGuru user guide is helpful and clearly written, so you can rely on it if you ever get stuck.

Naturally, it would be more practical if dupeGuru weren’t split into three editions – after all, most users love one-stop solutions. Still, if you don’t want to use the find command, dupeGuru provides a neat and quick way to eradicate dupes from your filesystem. Can you recommend some other tools for removing duplicate files? Do you prefer the command line for this task? Tell us in the comments.

The post Find and Remove Duplicate Files in Linux appeared first on Make Tech Easier.


