Bash script, reading from set of files in a directory - bash

I have a set of files placed in a directory, and I want to read the second line of each file, extract the first substring enclosed in parentheses "()", and rename the file to that substring.
I'm not looking for a full bash code, I just need some hints and commands to use for each step
Example:
a file has these lines:
/* USER: 202166 (just_yousef) */
/* PROBLEM: 2954 (11854 - Egypt) */
/* SUBMISSION: 11071978 */
/* SUBMISSION TIME: 2012-12-25 15:49:25 */
/* LANGUAGE: 3 */
I need to take the substring "11854 - Egypt", rename the file with it, and proceed to the next file.

from each file
for f in /the/directory/*; do
# ...
done
read the second line [and] extract the first substring which is placed between [parentheses]
v=$(sed -n '2 { s/[^(]*(\([^)]*\)).*/\1/; p; q }' < "${f}")
rename that file
Use the mv command.
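Putting those hints together, one possible sketch (the directory path is a placeholder; try it on copies of your files first):

```shell
for f in /the/directory/*; do
    # take the first "(...)" substring on line 2
    v=$(sed -n '2 { s/[^(]*(\([^)]*\)).*/\1/; p; q }' "$f")
    # only rename when something was extracted and the target is free
    if [ -n "$v" ] && [ ! -e "$v" ]; then
        mv -v "$f" "$v"
    fi
done
```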

For each file in the directory, use find (or a glob) piped to xargs to filter the files you want to read, and pass each file name, with its path, to a script that does the reading of the second line.
In that script, open the file and extract the string as per the logic.
Then close the file and rename it (e.g. with mv), using the file name that was passed to the script as $1.
If the directory structure is recursive, extract the part of the name after the last / first.

Looks like sed may well be the tool for the job:
for i in *
do
j=`sed -e '1d;2{;s/.*(\(.*\)).*/\1/;q;}' "$i"`
test -z "$j" || test -e "$j" || mv -v "$i" "$j"
done
I put the test for $j empty or already existing as safeguards; you might think of others. I also gave the -v flag to mv so you can see what it is doing.
You may prefer to use sed -n and just act on lines matching PROBLEM: if that's more reliable than always using the second line.
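For instance, keying on the PROBLEM: marker rather than the line number (a sketch; adjust the pattern to your files):

```shell
# print only the first parenthesized substring on the first PROBLEM: line
j=$(sed -n '/PROBLEM:/ { s/.*(\(.*\)).*/\1/; p; q }' "$i")
```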


Globbing a filename and then saving to a variable

I've got some files in a directory with a standard format. I'm looking to use a txt file containing part of each filename, expand it to the full name with *, and finally add a .gz tag to form the output name.
For example, for a file called 1.SNV.111-T.vcf in my directory, I have 111-T in my txt file.
#!/bin/bash
while getopts f: flag
do
case "${flag}" in
f) file=${OPTARG};;
esac
done
while IFS="" read -r p || [ -n "$p" ]
do
vcf="*${p}.vcf"
bgzip -c ${vcf} > ${vcf}.gz
done < $file
This will successfully run bgzip but actually save the output to be:
'*111-T.vcf.gz'
So adding .gz at the end has "deactivated" the * character. As Barmar pointed out, this is because there isn't a file in my directory called 1.SNV.111-T.vcf.gz, so the wildcard is not expanded. Please can anyone help?
I'm new to bash scripting, but I assume there must be some way to save the "absolute" value of my vcf variable, so that once it has found a match the first time, it's a plain string that can be used downstream? I really can't find anything online.
The problem is that wildcards are only expanded when they match an existing file. You can't use a wildcard in the filename you're trying to create.
You need to get the expanded filename into the vcf variable. You can do it this way:
vcf=$(echo *"$p.vcf")
bgzip -c "$vcf" > "$vcf.gz"
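An alternative, assuming each pattern matches exactly one file, is to let the glob expand into an array at assignment time:

```shell
vcf=( *"$p".vcf )                    # the glob expands here, at assignment
bgzip -c "${vcf[0]}" > "${vcf[0]}.gz"
```

With shopt -s nullglob the array stays empty when nothing matches, which you can check with "${#vcf[@]}" before calling bgzip.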

Bash File names will not append to file from script

Hello, I am trying to get all files with Jane's name into a separate file called oldFiles.txt. In a directory called "data" I read a list of file names from a file called list.txt, and put all the file names containing the name Jane into the files variable. Then I test the files variable against the files in list.txt to ensure they are in the file system, and append every file containing jane to the oldFiles.txt file (which will be in the scripts directory), after each item in the files variable passes the test.
#!/bin/bash
> oldFiles.txt
files= grep " jane " ../data/list.txt | cut -d' ' -f 3
if test -e ~data/$files; then
for file in $files; do
if test -e ~/scripts/$file; then
echo $file>> oldFiles.txt
else
echo "no files"
fi
done
fi
The above code gets the desired files and displays them correctly, and creates the oldFiles.txt file, but when I open the file after running the script I find that nothing was appended to it. I tried changing the assignment files= grep " jane " ../data/list.txt | cut -d' ' -f 3 to a command substitution, files=$(grep " jane " ../data/list.txt), to see if capturing the raw data to write to the file would help, but then the error "too many arguments on line 5" comes up, which is the first if test statement. The only way I get the script to work semi-properly is when I do ./findJane.sh > oldFiles.txt on the shell command line, which is essentially me creating the file manually. How would I go about creating oldFiles.txt and appending to it entirely within the script?
The biggest problem you have is matching names like "jane" or "Jane's", etc. while not matching "Janes". grep provides the options -i (case-insensitive match) and -w (whole-word match), which can tailor your search to what you appear to want without the kludge of appending spaces before and after your search term (" jane "). (To do that properly you would use [[:space:]]jane[[:space:]].)
You also have the problem of what your "script dir" is if you call your script from a directory other than the one containing it, such as calling it from your $HOME directory with bash script/findJane.sh. In that case your script will attempt to append to $HOME/oldFiles.txt. The special parameter $0 holds the pathname used to invoke the currently running script, so you can capture the script directory no matter where you call the script from with:
dirname "$0"
You are using bash, so store all the filenames resulting from your grep command in an array, not a plain scalar variable (especially since your use of " jane " suggests that your filenames contain whitespace).
You can make your script much more flexible if you take the name of your input file (e.g. list.txt), the term to search for (e.g. "jane"), the location where to check for existence of the files (e.g. $HOME/data), and the output filename to append the names to (e.g. "oldFiles.txt") as positional command-line parameters. You can give each a default value so the script behaves as it currently does when no arguments are provided.
Even with the additional flexibility of taking command-line arguments, the script actually has fewer lines: simply fill an array using mapfile (synonymous with readarray) and then loop over the contents of the array. You can also avoid the additional subshell for dirname with a simple parameter expansion, testing whether the path component is empty and replacing it with '.' if so -- up to you.
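That parameter-expansion replacement for dirname could look like this (just a sketch):

```shell
script="${0%/*}"                      # drop everything after the last '/'
[ "$script" = "$0" ] && script='.'    # no '/' at all -> script is in '.'
```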
If I've understood your goal correctly, you can put all the pieces together with:
#!/bin/bash
# positional parameters
src="${1:-../data/list.txt}" # 1st param - input (default: ../data/list.txt)
term="${2:-jane}" # 2nd param - search term (default: jane)
data="${3:-$HOME/data}" # 3rd param - file location (default: $HOME/data)
outfn="${4:-oldFiles.txt}" # 4th param - output (default: oldFiles.txt)
# save the path to the current script in script
script="$(dirname "$0")"
# if outfn not given, prepend path to script to outfn to output
# in script directory (if script called from elsewhere)
[ -z "$4" ] && outfn="$script/$outfn"
# split names w/term into array
# using the -iw option for case-insensitive whole-word match
mapfile -t files < <(grep -iw "$term" "$src" | cut -d' ' -f 3)
# loop over files array
for ((i=0; i<${#files[@]}; i++)); do
# test existence of file in data directory, redirect name to outfn
[ -e "$data/${files[i]}" ] && printf "%s\n" "${files[i]}" >> "$outfn"
done
(note: test expression and [ expression ] are synonymous, use what you like, though you may find [ expression ] a bit more readable)
(further note: "Janes" being plural is not considered the same as the singular -- adjust the grep expression as desired)
Example Use/Output
As was pointed out in the comment, without a sample of your input file, we cannot provide an exact test to confirm your desired behavior.
Let me know if you have questions.
As far as I can tell, this is what you're going for. This is totally a community effort based on the comments, catching your bugs. Obviously credit to Mark and Jetchisel for finding most of the issues. Notable changes:
Fixed $files to use command substitution
Fixed path to data/$file, assuming you have a directory at ~/data full of files
Fixed the test to not test for a string of files, but just the single file (also using -f to make sure it's a regular file)
Using double brackets — you could also use double quotes instead, but you explicitly have a Bash shebang so there's no harm in using Bash syntax
Adding a second message about not matching files, because there are two possible cases there; you may need to adapt depending on the output you're looking for
Removed the initial empty redirection — if you need to ensure that the file is clear before the rest of the script, then it should be added back, but if not, it's not doing any useful work
Changed the shebang to make sure you're using the user's preferred Bash, and added set -e because you should always add set -e
#!/usr/bin/env bash
set -e
files=$(grep " jane " ../data/list.txt | cut -d' ' -f 3)
for file in $files; do
if [[ -f $HOME/data/$file ]]; then
if [[ -f $HOME/scripts/$file ]]; then
echo "$file" >> oldFiles.txt
else
echo "no matching file"
fi
else
echo "no files"
fi
done

rename all files of a specific type in a directory

I am trying to use bash to rename all .txt files in a directory that match a specific pattern. My two attempts below removed the files from the directory and threw an error, respectively. Thank you :)
input
16-0000_File-A_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import.txt
16-0002_File-B_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import.txt
desired output
16-0000_File-A_multianno.txt
16-0002_File-B_multianno.txt
Bash attempt 1: this removes the files from the directory
for f in /home/cmccabe/Desktop/test/vcf/overall/annovar/*_classify.txt ; do
# Grab file prefix.
p=${f%%_*_}
bname=`basename $f`
pref=${bname%%.txt}
mv "$f" ${p}_multianno.txt
done
Bash attempt 2: this throws "Substitution replacement not terminated at (eval 1) line 1."
for f in /home/cmccabe/Desktop/test/vcf/overall/annovar/*_classify.txt ; do
# Grab file prefix.
p=${f%%_*_}
bname=`basename $f`
pref=${bname%%.txt}
rename -n 's/^$f/' *${p}_multianno.txt
done
You don't need a loop. rename alone can do this:
rename -n 's/(.*?_[^_]+).*/${1}_multianno.txt/g' /home/cmccabe/Desktop/test/vcf/overall/annovar/*_classify.txt
The meaning of the regular expression is roughly,
capture everything from the start until the 2nd _,
match the rest,
and replace with the captured prefix and append _multianno.txt
With the -n flag, this command will print what it would do without actually doing it.
When the output looks good, remove the -n and rerun.
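If the Perl rename is unavailable, the same renaming can be sketched with plain bash parameter expansion (echo prints the commands for review; replace it with a bare mv once the output looks right):

```shell
for f in /home/cmccabe/Desktop/test/vcf/overall/annovar/*_classify.txt; do
    base=${f##*/}                    # strip the directory part
    rest=${base#*_}                  # drop up to and including the 1st _
    prefix=${base%%_*}_${rest%%_*}   # everything before the 2nd _
    echo mv "$f" "${f%/*}/${prefix}_multianno.txt"
done
```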

Catenate files with blank lines between them [duplicate]

This question already has answers here:
Concatenating Files And Insert New Line In Between Files
(8 answers)
Closed 7 years ago.
How can we copy all the contents of all the files in a given directory into a file so that there are two empty lines between contents of each files?
Needless to say, I am new to bash scripting, and I know this is not particularly complicated code!
Any help will be greatly appreciated.
Related links are following:
* How do I compare the contents of all the files in a directory against another directory?
* Append contents of one file into another
* BASH: Copy all files and directories into another directory in the same parent directory
After reading comments, my initial attempt is this:
cat * > newfile.txt
But this does not create two empty lines between the contents of each file.
Try this.
awk 'FNR==1 && NR>1 { printf("\n\n") }1' * >newfile.txt
The variable FNR is the line number within the current file and NR is the line number overall.
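A quick demonstration with two throwaway files (hypothetical names a.txt and b.txt):

```shell
printf 'one\n' > a.txt
printf 'two\n' > b.txt
# FNR resets to 1 at each new file; NR>1 skips the very first file,
# so the two blank lines go only *between* files
awk 'FNR==1 && NR>1 { printf("\n\n") }1' a.txt b.txt
# prints "one", two blank lines, then "two"
```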
One way:
(
files=(*)
cat "${files[0]}"
for (( i = 1; i < "${#files[@]}" ; ++i )) ; do
echo
echo
cat "${files[i]}"
done
) > newfile.txt
Example of file organization:
I have a directory ~/Pictures/Temp
If I wanted to move PNG's from that directory to another directory I would first want to set a variable for my file names:
# This could be other file types as well
file=$(find ~/Pictures/Temp/*.png)
Of course there are many ways to view this check out:
$ man find
$ man ls
Then I would want to set a directory variable (especially if this directory is going to be something like a date):
dir=$(some defining command here perhaps an awk of an ls -lt)
# Then we want to check for that directories existence and make it if
# it doesn't exist
[[ -e $dir ]] || mkdir "$dir"
# [[ -d $dir ]] will work here as well
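As a side note, mkdir -p folds the existence test into the command itself and also creates any missing parent directories:

```shell
mkdir -p "$dir"    # succeeds whether or not $dir already exists
```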
You could write a for loop:
# t is time with the sleep this will be in seconds
# super useful with an every minute crontab
for ((t=1; t<59; t++));
do
# see above
file=$(blah blah)
# do nothing if there is no file to move
[[ -z $file ]] || mv "$file" "$dir/$file"
sleep 1
done
Please Google this if any of it seems unclear; here are some useful links:
https://www.gnu.org/software/gawk/manual/html_node/For-Statement.html
http://www.gnu.org/software/gawk/manual/gawk.html
Best link on this page is below:
http://mywiki.wooledge.org/BashFAQ/031
Edit:
Anyhow, where I was going with that whole answer is that you could easily write a script that organizes certain files on your system for 60 seconds, and a crontab entry to do the organizing for you automatically:
crontab -e
Here is an example
$ crontab -l
* * * * * ~/Applications/Startup/Desktop-Cleanup.sh
# where ~/Applications/Startup/Desktop-Cleanup.sh is a custom application that I wrote

Shell script: get file path till some desired text

I need a way to fetch the desired file path as mentioned below:
example:
file paths: (Desired input)
/root/tmp/uname/abc.txt
/root/tmp/uname/abc/abc.txt
/root/uname/abc.txt
Now, I want to print the path up to the uname/ directory.
like: (expected output)
/root/tmp/uname
/root/tmp/uname
/root/uname
I need to extract the path up to any desired directory.
The variable substitution ${var%pattern} produces the value of var with any suffix matching pattern removed.
for p in /root/tmp/uname/abc.txt /root/tmp/uname/abc/abc.txt /root/uname/abc.txt
do
echo "${p%/uname*}/uname"
done
There is also ${var#pattern} to remove any matching prefix.
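For example, ${var#pattern} strips from the front instead:

```shell
p=/root/tmp/uname/abc/abc.txt
echo "${p#*/uname/}"    # abc/abc.txt  (shortest prefix through /uname/ removed)
echo "${p##*/}"         # abc.txt      (longest prefix through the last /)
```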
If the paths are in a file, use a while read instead of a for loop.
while read -r p; do
echo "${p%/uname*}/uname"
done <file
... though in that case, sed 's%\(/uname\)/.*%\1%' file will be simpler and faster.
Suppose the file temp1.txt contains
[root@nviewsrvr ~/Competitive]$ cat temp1.txt
/root/tmp/uname/abc.txt
/root/tmp/uname/abc/abc.txt
/root/uname/abc.txt
The following script takes its input from this file and gives you the desired result.
#!/bin/bash
while read -r LINE
do
echo "${LINE%/uname*}/uname"
done < temp1.txt
Results:
[xvishuk#ecamolx1820 ~/Competitive]$ ./forUname.sh
/root/tmp/uname
/root/tmp/uname
/root/uname
I have hardcoded the file name as temp1.txt; you could instead pass it as an argument to the script.
