Finding and Removing Unused Files Through Command Line - shell

My websites file structure has gotten very messy over the years from uploading random files to test different things out. I have a list of all my files such as this:
file1.html
another.html
otherstuff.php
cool.jpg
whatsthisdo.js
hmmmm.js
Is there any way I can input my list of files via command line and search the contents of all the other files on my website and output a list of the files that aren't mentioned anywhere on my other files?
For example, if cool.jpg and hmmmm.js weren't mentioned in any of my other files then it could output them in a list like this:
cool.jpg
hmmmm.js
And then any of those other files mentioned above aren't listed because they are mentioned somewhere in another file. Note: I don't want it to just automatically delete the unused files, I'll do that manually.
Also, of course I have multiple folders so it will need to search recursively from my current location and output all the unused (unreferenced) files.
I'm thinking command line would be the fastest/easiest way, unless someone knows of another. Thanks in advance for any help that you guys can be!

Yep! This is pretty easy to do with grep. In this case, you would run a command like:
$ for orphan in `cat orphans.txt`; do \
echo "Checking for presence of ${orphan} in present directory..." ;
grep -rl $orphan . ; done
And orphans.txt would look like your list of files above, one file per line. You can add -i to the grep above if you want to grep case-insensitively. And you would want to run that command in /var/www or wherever your distribution keeps its webroots. If, after you see the above "Checking for..." and no matches below, you haven't got any files matching that name.

Related

Combine CSV files with condition

I need to combine all the csv files in some directory (.csv), provided that there are other files with the same name in this directory, but with different expansion (.csv.done).
If a csv file doesn't have .done in this extension then I don't need it for combine process.
What is the best way to do it using Bash ?
This approach is a solution to your problem. I see you've commented that it "didn't work", but whatever the reason is for it not working, it's likely simple to fix e.g. if you forgot to include key details, or failed to adapt it appropriately to suit your specific situation. If you need further help troubleshooting, add more info to your question.
The approach:
for f in *.csv.done
do
cat "${f%.*}" >> combined_file.csv
done
How it works:
In your example, you have 3 files named 1.csv 2.csv 3.csv and two 'done' files named 1.csv.done 2.csv.done.
This script begins by making a list of all files that end in .csv.done (two files: 1.csv.done 2.csv.done).
It then uses a parameter expansion, specifically ${parameter%word}, to 'shorten' the name of the two files in the list to .csv (instead of .csv.done).
Then it 'prints' the content of the two 'shortened' filenames (1.csv and 2.csv) into a 'combined' file.
It doesn't 'print' the content of 1.csv.done or 2.csv.done, or 3.csv, because these files weren't in the original 'list'.
If you run this script multiple times, it will keep adding the contents of files 1.csv and 2.csv to the 'combined' file (only run it once, or delete the 'combined' file before running it again)

Bash get all specific files in specific directory

I have a script that takes as an argument a path to a file upon which it performs certain operations. These files are stored in directories with path storage///_id/files (so in 2016 July 22 it would be storage/2016/Jul/22_1/files for the first set of files, .../Jul/22_2/files for second one etc.). The problem is each directory stores files with two extensions (say file.doc, file.txt) and I want to perform operations only on .txt files. I've tested earlier something like
for file in "/home/gonczor/temp/"*/*".txt"; do
echo "$file"
done
And it worked perfectly given that names in directories don't change. When I move one step further and add this 22_1, 22_2, 23_1 directories something strange happens.
This is my script (simplified):
for file in "$FILE_PATH/""$YEAR/""$MONTH/""$DAY"*/*".txt"; do
my_program ${report}
done
And instead of finding .../2016/Jul/22_1/file.txt it finds /2016/Jul/22*/*.txt
How can I make it work? The solution I've tried to make up is from here

Extracting contents of many zipped folders into a single directory

Kind of easy question, but I can't find the answer. I want to extract the contents of multiple zipped folders into a single directory. I am using the bash console, which is the only tool available on the particular website I am using.
For example, I have two folders: a.zip (which contains a1.txt and a2.txt) and b.zip (which contains b1.txt and b2.txt). I want to get extract all four text files into a single directory.
I have tried
unzip \*.zip -d \newdirectory
But it creates two directories (a and b) with two text files in each.
I also tried concatenating the two zipped folders into one big folder and extracting it, but it still creates two directories, even when I specify a new directory.
I can't figure what I am doing wrong. Any help?
Thanks in advance!
Use the -j parameter to ignore any directory structure.
unzip -j -d /path/to/your/directory '*.zip*'

Append part of folder name to all .gz within

I have a folder of data folders with the following structure:
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data2.gz
sampleName2-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
I want to modify all the data.gz within each sample folder by appending the sample name but not the random numbers to get:
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName1_data1.gz
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName1_data2.gz
sampleName2-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName2_data1.gz
It seems like this should be a simple mv for loop but I haven't been able to figure out how to pull part of a folder name using basename.
for i in */Data/Intensities/BaseCalls/*.gz; do mv $i "fastq""/"${i%%-*}"."`basename $i`; done
I couldn't figure out how to make the files stay in their original folder but for my purposes it works to have all the files go to a new folder ("fastq")
I suppose the "sampleName" part doesn't include dashes. In that case, use the standard pattern removal expansion: %%. That is, suppose your full path (relative to directory root) is stored in $path, just do ${path%%-*} to extract the "sampleName" part. Search for %% in the Bash Reference Manual for more details. As a simple example:
> path=sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
> echo ${path%%-*}
sampleName1
Otherwise, you could also use more advanced substring extraction based on regex. See BashFAQ/100 or Manipulating Strings from the TLDP Advanced Bash Scripting Guide.
Update. Here's the full command to perform the job described, and it is entirely native to the shell:
for file in */Data/Intensities/BaseCalls/*.gz; do
mv "$file" "${file%/*}/${file%%-*}_${file##*/}"
done

How to exclude files using ls?

I'm using a python script I wrote to take some standard input from ls and load the data in the files described by that path. It looks something like this:
ls -d /path/to/files/* | python read_files.py
The files have a certain name structure based on what data they have in them but are all stored in the same directory. The files I want to use have the name structure A<fileID>_###.txt (where ### is always some 3 digit number). I can accomplish getting only the files that start with A by just changing what I have above slightly to ls -d /path/to/files/A*. HOWEVER, some files have a suffix flag called B (so the file looks like A<fileID>_###B.txt) and I DO NOT want to include those.
So, my question is, is there a way to exclude those files that end in ...B.txt (or a way to only include files that end in a number)? I thought about something to the effect of:
ls -d /path/to/file/R*%d.txt
to only include files that end in a number followed by the file extension, but couldn't find any documentation on anything of the sort.
You could try this : ls A*[^B].txt
With extended globbing.
shopt -s extglob
ls R*!(B).txt

Resources