Recursively dumping the content of a file located in different folders - bash

Still being a newbie at bash programming, I am struggling with another task I got. A specific file called ".dump" (yes, with a leading dot) is located in each folder and always contains three numbers. I need to store the third number in a variable if it is greater than 1000, and then print it together with the name of the folder that contains it. So the outcome should look like this:
"/dir1/ 1245"
"/dir1/subdir1/ 3434"
"/dir1/subdir2/ 10003"
"/dir1/subdir2/subsubdir3/ 4123"
"/dir2/ 45440"
(without the quotes, and each entry on its own line)
I was playing around with awk, find and while, but the results are so bad that I do not want to post them here, which I hope is understood. So any helpful code snippet is appreciated.

This could be cleaned up, but should work:
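find /dir1 /dir2 -name .dump -exec sh -c 'k=$(awk "\$3 > 1000 {print \$3; exit 1}" "$0") ||
echo "${0%.dump} $k"' {} \;
awk exits with status 1 as soon as it finds a third field greater than 1000, so the echo after || runs only for those files. The quotes around $0 keep paths containing spaces intact.
(I'm assuming that all three numbers in your .dump files appear on one line. The awk will need to be modified if the input is in a different format.)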
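A more readable variant of the same idea, as a rough sketch under the same assumption (the three numbers sit on the first line of each .dump file). The function name print_big_dumps and its threshold argument are made up for illustration and are not part of the answer above.

```bash
# print_big_dumps THRESHOLD DIR...
# Print "<directory>/ <third number>" for every .dump file whose third
# field on the first line is greater than THRESHOLD.
# (print_big_dumps is a hypothetical helper name.)
print_big_dumps() {
    threshold=$1
    shift
    find "$@" -type f -name .dump -exec awk -v min="$threshold" '
        # FNR == 1: only the first line of each file is inspected
        FNR == 1 && $3 + 0 > min {
            dir = FILENAME
            sub(/\.dump$/, "", dir)   # strip the trailing ".dump"
            print dir, $3
        }' {} +
}

# Example call: print_big_dumps 1000 /dir1 /dir2
```

Passing the files to a single awk with -exec ... {} + avoids starting one shell per file, and FILENAME gives the path without any extra bookkeeping.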

Related

Bash: Identifying file based on part of filename

I have a folder containing paired files with names that look like this:
PB3999_Tail_XYZ_1234.bam
PB3999_PB_YWZ_5524.bam
I want to pass the files into a for loop as such:
for input in `ls PB*_Tail_.bam`; do tumor=${input%_Tail_*.bam}; $gatk Mutect2 -I $input -I$tumor${*}; done
The issue is, I can't seem to get the syntax right for the tumor input. I want it to recognise the paired file by the first part of the name PB3999_PB while ignoring the second half of the file name _YWZ_5524 that does not match.
Thank you for any help!
I took the script from the question, replaced ${*} with *, appended _PB_ to the prefix, and renamed the variables:
for tailfname in PB*_Tail_*.bam; do
    pairprefix="${tailfname%_Tail_*.bam}"
    echo command with "${tailfname}" "${pairprefix}"_PB_*.bam
done
Hope this helps. The name tumor sounds scary. Hope the right files are paired.
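If you want the loop to refuse to run when a pair is missing or ambiguous, something along these lines might do. It is only a sketch: find_pair is a hypothetical helper name, and the gatk command is merely echoed, not executed.

```bash
# find_pair TAIL_FILE
# Print the matching *_PB_*.bam file from the current directory, or fail
# if there is not exactly one candidate. (Hypothetical helper.)
find_pair() {
    prefix=${1%_Tail_*.bam}        # PB3999_Tail_XYZ_1234.bam -> PB3999
    set -- "${prefix}"_PB_*.bam    # expand the glob into the positional parameters
    if [ "$#" -eq 1 ] && [ -e "$1" ]; then
        printf '%s\n' "$1"
    else
        echo "no unique pair for prefix $prefix" >&2
        return 1
    fi
}

for tail_file in PB*_Tail_*.bam; do
    [ -e "$tail_file" ] || continue     # the glob matched nothing
    if pair_file=$(find_pair "$tail_file"); then
        echo "would run: gatk Mutect2 -I $tail_file -I $pair_file"
    fi
done
```

The [ -e ... ] checks matter because a glob that matches nothing is left as the literal pattern.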
I'm trying to fully understand what you want to do here.
If you want to extract just the first two parts, this should do:
echo "PB3999_Tail_XYZ_1234.bam" | cut -d '_' -f 1-2
That returns just the "PB3999_Tail" part.

How to batch replace part of filenames with the name of their parent directory in a Bash script?

All of my file names follow this pattern:
abc_001.jpg
def_002.jpg
ghi_003.jpg
I want to replace the characters before the numbers and the underscore (not necessarily letters) with the name of the directory in which those files are located. Let's say this directory is called 'Pictures'. So, it would be:
Pictures_001.jpg
Pictures_002.jpg
Pictures_003.jpg
Normally, the way this website works, is that you show what you have done, what problem you have, and we give you a hint on how to solve it. You didn't show us anything, so I will give you a starting point, but not the complete solution.
You need to know what to replace. You gave the examples abc_001 and def_002; are you sure that the part to be replaced is always three characters long? If so, you could use the basic cut command to delete it. Otherwise, you could use the position of the '_' character, or you could use grep -o, as in this simple example:
ls -ltra | grep -o "_[0-9][0-9][0-9]\.jpg"
As for the current directory, you can get it from the environment variable $PWD (if Pictures is the deepest subdirectory, you could use cut with '/' as the separator and take the last entry).
You can see the current directory with pwd, but also with echo "${PWD}".
With ${x#something} you can delete something from the beginning of the variable. something can contain wildcards, in which case # deletes the shortest match and ## the longest.
Try the following command first to see how this works:
echo "The last part of the current directory `pwd` is ${PWD##*/}"
The same construction can be used for cutting the filename, so you can do
for f in *_*.jpg; do
    mv "$f" "${PWD##*/}_${f#*_}"
done
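To check the result before renaming anything, you could print the new names first. A small sketch, where new_name is a hypothetical helper and the paths are just examples:

```bash
# new_name DIR FILE
# Print the name FILE would get, without renaming anything.
# (new_name is a hypothetical helper.)
new_name() {
    dir=${1%/}     # directory path, minus one trailing slash if present
    file=$2        # file name, e.g. abc_001.jpg
    printf '%s_%s\n' "${dir##*/}" "${file#*_}"
}

new_name /home/user/Pictures abc_001.jpg    # Pictures_001.jpg

# The difference between # and ## shows up with more than one underscore:
name=a_b_001.jpg
echo "${name#*_}"     # b_001.jpg (shortest match removed)
echo "${name##*_}"    # 001.jpg   (longest match removed)
```

Note that ${f#*_} keeps everything after the first underscore, so a file called a_b_001.jpg would become Pictures_b_001.jpg.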

output from while read loop using iterative filename/timestamp

I am attempting to create a script that will iteratively run a command against a variable (which is a fully qualified filename) and output the results of that command to an individually named/timestamped file (to %S accuracy). I'm not great with this stuff at all.
here is what I do:
find /vmfs/volumes/unlistedpathname/unlistedfoldername |
while read list;do
vmkfstools -D "$list" >> duringmigration_10mins_"$list".$(date +"%Y.%m.%d.%H.%M.%S");
done
The output I'm hoping for is something like
duringmigration_10mins_blahblahblah.vmx.2016.09.25.21.26.35
Of course it doesn't work, and I'm not exactly sure how to solve it. I know the immediate problem is $list: used as the filename variable, it reprints the full path, so I need some way to tell the loop "hey, just use the filename as the variable, NOT the full path", but I'm not sure how to do that in this case. I'm also hoping to be able to run this from any location, not from a specific path.
There are two problems preventing the behaviour you are looking for:
1. As you saw, the filenames returned by find include the full path.
2. Your find command returns the directories as well as the files.
We solve #1 by calling basename on $list when building the output filename.
We solve #2 by adding -type f to the find command so that it returns only regular files and not directories.
find /vmfs/volumes/unlistedpathname/unlistedfoldername -type f |
while IFS= read -r list ; do
    # IFS= and -r keep leading blanks and backslashes in the names intact
    vmkfstools -D "${list}" >> "duringmigration_10mins_$(basename "${list}").$(date +"%Y.%m.%d.%H.%M.%S")"
done
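The same idea can be wrapped in two small functions, which makes the name-building part easy to try on its own. This is just a sketch: output_name and run_on_each_file are hypothetical names, ${path##*/} is used instead of basename, and the command to run is passed in as arguments (vmkfstools -D in the question).

```bash
# output_name PATH
# Build the log file name for one input path:
# fixed prefix + file name without directories + timestamp.
output_name() {
    path=$1
    stamp=$(date +"%Y.%m.%d.%H.%M.%S")
    printf 'duringmigration_10mins_%s.%s\n' "${path##*/}" "$stamp"
}

# run_on_each_file DIR COMMAND [ARGS...]
# Run COMMAND on every regular file below DIR and append its output to a
# per-file log in the current directory.
run_on_each_file() {
    dir=$1
    shift
    find "$dir" -type f |
    while IFS= read -r path; do
        # </dev/null stops COMMAND from swallowing the file list on stdin
        "$@" "$path" >> "$(output_name "$path")" < /dev/null
    done
}

# Example call with a stand-in command: run_on_each_file /some/folder wc -c
```

The logs end up in whatever directory you run it from, so you can cd to the place where you want them first.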

How to find a directory in Unix, whose ordinary files have the greatest line count together

Ok, to be clear, this is a school assignment, and I don't need the entire code. The problem is this: I use
set subory = ("$subory:q" `sh -c "find '$cesta' -type f 2> /dev/null"`)
to fill the variable subory with all ordinary files in the specified path. Then I have a foreach where I count the lines of all files in a directory; that's not the problem. The problem is that when this script is tested, some big directories are used as the path. What happens is that the script doesn't finish, but gives the error message "word too long". That word is subory. This is a real problem, because $cesta can be an element of a long list of paths. I tried, but I cannot solve this problem. Any ideas? I'm a bit lost.
EDIT: To be clear, the task is to assign each directory a number that represents the total line count of all its files, and then pick the directory with the greatest number.
You need to reorganize your code. For example:
find "$cesta" -type f -execdir wc -l {} +
This will run wc on all the files found, without ever running afoul of command line-length limitations, "invalid" characters like newlines in filenames, etc. And you don't need to spawn a new shell to do it.
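Building on that, here is one way the per-directory totals could be computed in an sh-style shell without ever holding the file list in a variable. It is only a sketch, and it assumes that "its files" means the regular files directly inside each directory, not those in subdirectories. busiest_dir is a hypothetical name.

```bash
# busiest_dir ROOT
# Print "<line count> <directory>" for the directory under ROOT whose own
# regular files contain the most lines in total. (Hypothetical helper.)
busiest_dir() {
    find "$1" -type d |
    while IFS= read -r dir; do
        # -maxdepth is a GNU/BSD extension, not POSIX.
        # tr strips the padding some wc implementations add.
        lines=$(find "$dir" -maxdepth 1 -type f -exec cat {} + | wc -l | tr -d ' ')
        printf '%s %s\n' "$lines" "$dir"
    done |
    sort -n | tail -n 1
}

# Example call: busiest_dir "$cesta"
```

Directory names containing newlines would confuse the read loop, and ties are resolved arbitrarily by the sort.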

Bash scripting print list of files

It's my first time using Bash scripting. I have been looking at some tutorials but can't figure out some of the code. I just want to list all the files in a folder, but I can't do it.
Here's my code so far.
#!/bin/bash
# My first script
echo "Printing files..."
FILES="/Bash/sample/*"
for f in $FILES
do
echo "this is $f"
done
and here is my output..
Printing files...
this is /Bash/sample/*
What is wrong with my code?
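You misunderstood what bash means by the word "in". The statement for f in $FILES simply iterates over the (whitespace-delimited) words in the string $FILES, whose value is "/Bash/sample/*" (one word). Because $FILES is unquoted, bash does try to expand that word as a filename pattern, but a pattern that matches nothing is left unchanged, which is consistent with the output you got (for example, if /Bash/sample does not exist or is empty). You seemingly want the files that are "in" the named directory, a spatial metaphor that bash's syntax doesn't assume, so you would have to explicitly tell it to list the files.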
for f in `ls $FILES` # illustrates the problem - but don't actually do this (see below)
...
might do it. This converts the output of the ls command into a string, "in" which there will be one word per file.
NB: this example is to help understand what "in" means but is not a good general solution. It will run into trouble as soon as one of the files has a space in its name: such files will contribute two or more words to the list, each of which taken alone may not be a valid filename. This highlights (a) that you should always take extra steps to program around the whitespace problem in bash and similar shells, and (b) that you should avoid spaces in your own file and directory names, because you'll come across plenty of otherwise useful third-party scripts and utilities that have not made the effort to comply with (a). Unfortunately, proper compliance can often lead to quite obfuscated syntax in bash.
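To see the whitespace problem for yourself, you could count the loop iterations both ways in a scratch directory. A minimal sketch; the function names and the file names are made up for the demonstration.

```bash
# Count loop iterations two ways to show the effect of word splitting.
count_with_ls() {
    n=0
    for f in $(ls "$1"); do      # splits on whitespace: do not do this
        n=$((n + 1))
    done
    echo "$n"
}

count_with_glob() {
    n=0
    for f in "$1"/*; do
        [ -e "$f" ] || continue  # the pattern matched nothing
        n=$((n + 1))
    done
    echo "$n"
}

demo=$(mktemp -d)
touch "$demo/a b.txt" "$demo/c.txt"
count_with_ls "$demo"      # prints 3: "a", "b.txt" and "c.txt"
count_with_glob "$demo"    # prints 2: one iteration per file
rm -rf "$demo"
```

The glob version hands each file name to the loop as a single word, spaces included, so there is nothing to program around.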
I think the problem is in the path "/Bash/sample/*".
You need to change this location to the full path of a directory that actually exists, for example:
/home/username/Bash/sample/*
Or use a path starting from your home directory, for example:
~/Bash/sample/*
(leave the ~ outside the quotes, since bash does not expand it inside them). On most systems this is fully equivalent to:
/home/username/Bash/sample/*
where username is your current username; use whoami to see it.
Best place for learning Bash: http://www.tldp.org/LDP/abs/html/index.html
This should work:
echo "Printing files..."
FILES=(/Bash/sample/*)        # create an array.
                              # Works with filenames containing spaces.
                              # A string variable does not work for that case.
for f in "${FILES[@]}"        # iterate over the array.
do
    echo "this is $f"
done
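One more detail: if the pattern matches nothing, the array above still contains the literal string /Bash/sample/*, which is exactly the output shown in the question. In bash you can turn on the nullglob option so that an unmatched pattern expands to nothing instead. A sketch, with list_dir as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# list_dir DIR
# Print one line per entry in DIR, and nothing at all if DIR is empty or
# does not exist. (list_dir is a hypothetical helper.)
list_dir() {
    local dir=$1 f
    local -a files
    shopt -s nullglob          # unmatched patterns expand to nothing
    files=("$dir"/*)
    shopt -u nullglob          # switch back to the default behaviour
    for f in "${files[@]}"; do
        echo "this is $f"
    done
}

# Example call: list_dir /Bash/sample
```

This needs bash (arrays and shopt are not available in plain sh).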
Also, you should not parse ls output.
If you want to list your files and see them:
ls        # lists the files
ls -sh    # lists the files with their sizes
...
If you want to send the list of files to a file, to read and check it later:
ls > FileName.Format        # writes the list to a file
ls -sh > FileName.Format    # writes the list with file sizes to a file
