How to exclude files using ls? - bash

I'm using a Python script I wrote to take some standard input from ls and load the data from the files at those paths. It looks something like this:
ls -d /path/to/files/* | python read_files.py
The files have a certain name structure based on what data they have in them but are all stored in the same directory. The files I want to use have the name structure A<fileID>_###.txt (where ### is always some 3 digit number). I can accomplish getting only the files that start with A by just changing what I have above slightly to ls -d /path/to/files/A*. HOWEVER, some files have a suffix flag called B (so the file looks like A<fileID>_###B.txt) and I DO NOT want to include those.
So, my question is, is there a way to exclude those files that end in ...B.txt (or a way to only include files that end in a number)? I thought about something to the effect of:
ls -d /path/to/files/A*%d.txt
to only include files that end in a number followed by the file extension, but couldn't find any documentation on anything of the sort.

You could try this: ls A*[^B].txt. The [^B] class ensures the character just before .txt is not a B.

With extended globbing:
shopt -s extglob
ls A!(*B).txt
Note that A*!(B).txt would not work here: the * can absorb the B, leaving !(B) to match the empty string. A!(*B).txt matches anything after the A that does not end in B.
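If you'd rather match only names that end in a digit (the %d idea from the question), a plain glob is enough, no extglob needed. A sketch, assuming the ### part is always numeric:
# keep names whose last character before .txt is a digit
ls -d /path/to/files/A*[0-9].txt
# or, stricter, require exactly three digits after the underscore
ls -d /path/to/files/A*_[0-9][0-9][0-9].txt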

Related

Shell, copy files with similar names

I would like to copy a series of similarly named files from the current directory to a target directory. The files in the current directory are:
prod07_sim0500-W31-0.2_velocity-models-2D_t80_f0001_ux.hst
prod07_sim0500-W31-0.2_velocity-models-2D_t80_f0001_uz.hst
prod07_sim0500-W31-0.2_velocity-models-2D_t80_f0002_ux.hst
prod07_sim0500-W31-0.2_velocity-models-2D_t80_f0002_uz.hst
prod07_sim0500-W31-0.2_velocity-models-2D_t80_f0003_ux.hst
prod07_sim0500-W31-0.2_velocity-models-2D_t80_f0003_uz.hst
Where sim is from sim0001 to sim0500 and f is from f0001 to f0009. I only need f0002, f0005 and f0008. I wrote the following code:
target_dir="projects/data"
for i in {0001..0500}; do
    for s in f000{2,5,8}; do
        files="[*]$i[*]$s[*]"
        cp $files target_dir
    done
done
I am very new to shell scripting, and am wondering how to write the files="[*]$i[*]$s[*]" pattern so that it matches only f0002, f0005 and f0008. The reason I also use for i in {0001..0500}; do is that the files are large, and I would like to have some complete sets (for example, everything for sim0001) available early on.
Edit: changed for s in f0002 f0005 f0008; do to f000{2,5,8}.
What you need is globbing and a bit different quoting:
cp *"$i"*"$s"* "$target_dir"
Not storing the pattern in a variable is intentional: it's faster and it's safe. If you end up with such a large list of files that you start running into system limits, you'll have to look into xargs.
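Putting the fix into the loop from the question, a full sketch (same target_dir and ranges as the question; anchoring on sim"$i" is an extra tightening so the number cannot accidentally match inside the f-part):
target_dir="projects/data"
for i in {0001..0500}; do
    for s in f000{2,5,8}; do
        # bare * still globs; the quoted "$i" and "$s" stay literal
        cp *sim"$i"*"$s"* "$target_dir"
    done
done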

Append part of folder name to all .gz within

I have a folder of data folders with the following structure:
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data2.gz
sampleName2-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
I want to rename all the data .gz files within each sample folder, prefixing each filename with the sample name but not the random numbers, to get:
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName1_data1.gz
sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName1_data2.gz
sampleName2-randomNumbers/subfolder1/subfolder2/subfolder3/sampleName2_data1.gz
It seems like this should be a simple mv for loop but I haven't been able to figure out how to pull part of a folder name using basename.
for i in */Data/Intensities/BaseCalls/*.gz; do mv "$i" "fastq/${i%%-*}.$(basename "$i")"; done
I couldn't figure out how to make the files stay in their original folders, but for my purposes it works to have all the files go to a new folder ("fastq").
I suppose the "sampleName" part doesn't include dashes. In that case, use the standard pattern-removal expansion %%: if your full path (relative to the directory root) is stored in $path, then ${path%%-*} extracts the "sampleName" part. Search for %% in the Bash Reference Manual for more details. As a simple example:
> path=sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
> echo ${path%%-*}
sampleName1
Otherwise, you could also use more advanced substring extraction based on regex. See BashFAQ/100 or Manipulating Strings from the TLDP Advanced Bash Scripting Guide.
Update. Here's the full command to perform the job described, and it is entirely native to the shell:
for file in */Data/Intensities/BaseCalls/*.gz; do
    mv "$file" "${file%/*}/${file%%-*}_${file##*/}"
done
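To see what each expansion contributes, here is the breakdown for the first example path:
path=sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3/data1.gz
echo "${path%/*}"   # sampleName1-randomNumbers/subfolder1/subfolder2/subfolder3 (directory part)
echo "${path%%-*}"  # sampleName1 (sample name)
echo "${path##*/}"  # data1.gz (filename)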

need to get recursive list of files using mac command line

I need to get the contents of a folder via the Mac console window
and put it into a text file via > output.txt.
existing structure looks like:
folder/index.html
folder/images/backpack.png
folder/shared/bootstrap/fonts/helvertica.eot
folder/css/fonts/helverticabold.eot
folder/shared/css/astyle.css
folder/js/libs/jquery-ui-1.10.4/jquery-ui.min.js
folder/js/libs/jquery.tipsy.js
folder/js/libs/raphael.js
what I want would look like this (with the leading folder removed):
index.html
images/backpack.png
shared/bootstrap/fonts/helvertica.eot
css/fonts/helverticabold.eot
shared/css/astyle.css
js/libs/jquery-ui-1.10.4/jquery-ui.min.js
js/libs/jquery.tipsy.js
js/libs/raphael.js
No css/fonts or js/libs or css folders listed on their own,
i.e. no directories, and no formatting like
/folder/shared/css/astyle.css or
./folder/shared/css/astyle.css
even better would be with quotes and commas:
"index.html",
"images/backpack.png",
"shared/bootstrap/fonts/helvertica.eot",
"css/fonts/helverticabold.eot",
"shared/css/astyle.css",
"js/libs/jquery-ui-1.10.4/jquery-ui.min.js",
"js/libs/jquery.tipsy.js",
"js/libs/raphael.js"
as I want to make a JSON document. Is this possible?
Thanks.
This is the sort of task that find is good at:
% find folder -type f | sed -e 's,folder/,",' -e 's/$/",/'
You might be able to avoid hard-coding the folder/ prefix in the substitution by, say:
(cd folder && find . -type f) | sed -e 's,^\./,,' -e 's/.*/"&",/'
The first expression strips the leading ./ that find . produces; the second wraps each line in quotes and appends a comma.
Further refinements are an exercise for the reader!
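One such refinement: the output above still ends with a trailing comma on the last line, which strict JSON does not allow. A sketch of a pipeline that emits a complete JSON array into output.txt (assuming the same folder layout; the $ address works in both GNU and BSD sed):
(cd folder && find . -type f) \
    | sed -e 's,^\./,,' -e 's/.*/  "&",/' -e '$ s/,$//' \
    | { echo '['; cat; echo ']'; } > output.txt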

Create new files from existing ones but change their extension

In shell, what is a good way to duplicate files in an existing directory so that each copy keeps the same name but gets a different extension? So taking something like:
path/view/blah.html.erb
And adding:
path/view/blah.mobile.erb
So that in the path/view directory, there would be:
path/view/blah.html.erb
path/view/blah.mobile.erb
I'd ideally like to perform this at a directory level, and skip the copy if the file already exists with both extensions, but that isn't necessary.
You can do:
cd /path/view/
for f in *.html.erb; do
    cp "$f" "${f/.html./.mobile.}"
done
P.S. This replaces the first instance of .html. with .mobile.; the ${f/pattern/replacement} substitution syntax is Bash-specific (let me know if you're not using Bash).
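To also skip files that already have a .mobile.erb counterpart, as the question mentions, a minimal guard works; a sketch using a plain existence test:
cd /path/view/
for f in *.html.erb; do
    m="${f/.html./.mobile.}"
    # only copy when the .mobile.erb version does not exist yet
    [ -e "$m" ] || cp "$f" "$m"
done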

Finding and Removing Unused Files Through Command Line

My website's file structure has gotten very messy over the years from uploading random files to test different things out. I have a list of all my files, such as this:
file1.html
another.html
otherstuff.php
cool.jpg
whatsthisdo.js
hmmmm.js
Is there any way I can feed in my list of files via the command line, search the contents of all the other files on my website, and output a list of the files that aren't mentioned anywhere in the other files?
For example, if cool.jpg and hmmmm.js weren't mentioned in any of my other files then it could output them in a list like this:
cool.jpg
hmmmm.js
The other files mentioned above wouldn't be listed, because they are referenced somewhere in another file. Note: I don't want it to automatically delete the unused files; I'll do that manually.
Also, of course I have multiple folders so it will need to search recursively from my current location and output all the unused (unreferenced) files.
I'm thinking the command line would be the fastest/easiest way, unless someone knows of another. Thanks in advance for any help!
Yep! This is pretty easy to do with grep. In this case, you would run a command like:
$ for orphan in $(cat orphans.txt); do
    echo "Checking for presence of ${orphan} in the current directory..."
    grep -rl "$orphan" .
done
And orphans.txt would look like your list of files above, one file per line. You can add -i to the grep if you want to match case-insensitively, and you would want to run the command in /var/www or wherever your distribution keeps its webroots. If a "Checking for..." line is followed by no matches, no other file references that name.
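If you want the output to be just the unreferenced names, as in the question, a small variation does it. A sketch, assuming one filename per line in orphans.txt and a grep that supports --exclude (GNU and modern BSD grep both do):
while read -r orphan; do
    # -q only tests for a match; --exclude keeps the list file itself out of the search
    grep -rq --exclude=orphans.txt "$orphan" . || echo "$orphan"
done < orphans.txt > unused.txt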
