Why is this bash loop failing to concatenate the files? - bash

I am at my wit's end as to why this loop is failing to concatenate the files the way I need it to. Basically, let's say we have the following files:
AB124661.lane3.R1.fastq.gz
AB124661.lane4.R1.fastq.gz
AB124661.lane3.R2.fastq.gz
AB124661.lane4.R2.fastq.gz
What we want is:
cat AB124661.lane3.R1.fastq.gz AB124661.lane4.R1.fastq.gz > AB124661.R1.fastq.gz
cat AB124661.lane3.R2.fastq.gz AB124661.lane4.R2.fastq.gz > AB124661.R2.fastq.gz
What I tried (and didn't work):
Create and save the sample names (AB124661) to an ID file:
ls -1 *R1*.gz | awk -F '.' '{print $1}' | sort | uniq > ID
This creates an ID file that stores the sample/file names.
Run the following loop:
for i in `cat ./ID`; do cat $i\.lane3.R1.fastq.gz $i\.lane4.R1.fastq.gz \> out/$i\.R1.fastq.gz; done
for i in `cat ./ID`; do cat $i\.lane3.R2.fastq.gz $i\.lane4.R2.fastq.gz \> out/$i\.R2.fastq.gz; done
The loop fails and concatenates into empty files.
Things I tried:
Yes, the ID file is definitely in the folder
When I run it with echo, the cat command it prints looks correct
Any help will be very much appreciated,
Best,
AC

Why are you escaping the > ? That's going to result in cat: '>': No such file or directory instead of a redirection.
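To see the difference concretely (a minimal sketch; file1 and out are placeholder names):
cat file1 \> out    # '>' and 'out' become arguments to cat: cat: '>': No such file or directory
cat file1 > out     # the shell performs the redirection before cat even runs
This is also why the echo test looked correct: echo simply prints > as an ordinary argument instead of performing a redirection.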
Don't read lines with for
while IFS= read -r id; do
    cat "${id}.lane3.R1.fastq.gz" "${id}.lane4.R1.fastq.gz" > "out/${id}.R1.fastq.gz"
    cat "${id}.lane3.R2.fastq.gz" "${id}.lane4.R2.fastq.gz" > "out/${id}.R2.fastq.gz"
done < ./ID
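For extra safety, here is the same loop with a couple of guards added (the mkdir -p and the existence tests are additions, not part of the answer above):
mkdir -p out    # the redirections fail if out/ does not exist
while IFS= read -r id; do
    for r in R1 R2; do
        if [ -e "${id}.lane3.${r}.fastq.gz" ] && [ -e "${id}.lane4.${r}.fastq.gz" ]; then
            cat "${id}.lane3.${r}.fastq.gz" "${id}.lane4.${r}.fastq.gz" > "out/${id}.${r}.fastq.gz"
        else
            printf 'skipping %s: missing lane file(s)\n' "$id" >&2
        fi
    done
done < ./ID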

Let's say you have the IDs stored one per line in the file ./ID:
while read -r line; do
    cat "$line".lane3.R1.fastq.gz "$line".lane4.R1.fastq.gz > "$line".R1.fastq.gz
    cat "$line".lane3.R2.fastq.gz "$line".lane4.R2.fastq.gz > "$line".R2.fastq.gz
done < ./ID

A pure shell solution could look like this (the [ -e ] guard ensures each ID is handled only once, even though the loop visits every lane file):
for file in *.fastq.gz; do
    id=${file%%.*}
    [ -e "$id".R1.fastq.gz ] || cat "$id".*.R1.fastq.gz > "$id".R1.fastq.gz
    [ -e "$id".R2.fastq.gz ] || cat "$id".*.R2.fastq.gz > "$id".R2.fastq.gz
done
Alternatively:
printf '%s\n' *.fastq.gz | cut -d. -f1 | sort -u |
while IFS= read -r id; do
    cat "$id".*.R1.fastq.gz > "$id".R1.fastq.gz
    cat "$id".*.R2.fastq.gz > "$id".R2.fastq.gz
done
This solution assumes filenames of interest don't contain newline characters.
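As an aside, concatenating .gz files with plain cat is legitimate here: the gzip format allows multiple compressed members in one stream, and decompressors read them all. A quick sanity check after merging (file names from the example above; zcat can be replaced with gzip -dc):
zcat AB124661.lane3.R1.fastq.gz AB124661.lane4.R1.fastq.gz | wc -l
zcat AB124661.R1.fastq.gz | wc -l    # the two counts should match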

Related

Displaying file contents issue using while loop

#!/bin/bash
{ cat sample.txt; echo; } | while read -r -a A_Name; do
    if [ ! -z "${A_Name[0]}" ]; then
        echo " ${A_Name[0]%.isx} "
    fi
done
I am trying to display the contents of a text file (which includes .isx files) using a while loop, but when I try to eliminate the extension with %, it doesn't work.
Output
.isx is still appearing for the first two values:
./test.sh
abc.isx
def.isx
ghi
Input
sample.txt file:
abc.isx
def.isx
ghi.isx
Please assist. Thank you.
why so complicated?
#!/bin/bash
while IFS= read -r line; do
    echo "${line%.isx}"
done < sample.txt
or with sed (anchored to the end of the line, so a .isx elsewhere in a name is left alone):
sed 's/\.isx$//' sample.txt > output.txt
or with sed and in-place replacement:
sed -i 's/\.isx$//' sample.txt
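One common cause of %.isx silently stripping nothing, consistent with the output shown above, is a sample.txt with Windows CRLF line endings: the lines then really end in .isx followed by a carriage return, which the pattern .isx does not match. That is a guess about the input, not something stated in the question, but it is cheap to defend against:
while IFS= read -r line; do
    line=${line%$'\r'}    # drop a trailing carriage return, if any (bash syntax)
    echo "${line%.isx}"
done < sample.txt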

How to write a command line script that will loop through every line in a text file and append a string at the end of each? [duplicate]

How do I add a string after each line in a file using bash? Can it be done using the sed command, if so how?
If your sed allows in-place editing via the -i parameter:
sed -e 's/$/string after each line/' -i filename
If not, you have to make a temporary file:
typeset TMP_FILE=$( mktemp )
touch "${TMP_FILE}"
cp -p filename "${TMP_FILE}"
sed -e 's/$/string after each line/' "${TMP_FILE}" > filename
I prefer echo, using pure bash:
while IFS= read -r line; do echo "${line}${string}"; done < file
I prefer using awk.
If there is only one column, use $0, else replace it with the last column.
One way,
awk '{print $0, "string to append after each line"}' file > new_file
or this, which relies on the assignment's value (the new, non-empty $0) being true, so awk's default print action fires:
awk '$0=$0"string to append after each line"' file > new_file
If you have it, the lam (laminate) utility can do it, for example:
$ lam filename -s "string after each line"
Pure POSIX shell and sponge:
suffix=foobar
while IFS= read -r l ; do printf '%s%s\n' "$l" "${suffix}" ; done < file |
sponge file
xargs and printf:
suffix=foobar
xargs -L 1 printf "%s${suffix}\n" < file | sponge file
Using join (the file is joined with itself; the requested field 2.99999 never exists, so -e fills it with the suffix):
suffix=foobar
join file file -e "${suffix}" -o 1.1,2.99999 | sponge file
Shell tools using paste, yes, head & wc:
suffix=foobar
paste file <(yes "${suffix}" | head -$(wc -l < file) ) | sponge file
Note that paste inserts a Tab char before $suffix; pass -d '\0' to paste to get no delimiter at all.
Of course sponge can be replaced with a temp file, afterwards mv'd over the original filename, as with some other answers...
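For reference, the temp-file pattern that replaces sponge looks like this (a sketch; the sed command stands in for any of the filters above and assumes $suffix contains no sed metacharacters):
suffix=foobar
tmp=$(mktemp) &&
sed "s/\$/${suffix}/" file > "$tmp" &&
mv "$tmp" file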
This is just to add on, using the echo command to add a string at the end of each line in a file:
while IFS= read -r line; do echo "${line}string to add" >> output-file; done < input-file
The >> appends our changes to the output file.
Sed is a little ugly; you could do it like so:
hendry@i7 tmp$ cat foo
bar
candy
car
hendry@i7 tmp$ for i in `cat foo`; do echo ${i}bar; done
barbar
candybar
carbar

Read multiple variables from file

I need to read a file that has lines like
user=username1
pass=password1
How can I read multiple lines like this into separate variables like username and password?
Would I use awk or grep? I have found ways to read lines into variables with grep but would I need to read the file for each individual item?
The end result is to use these variables to access a database via the command line. So I need to be able to read, store and use these values in other commands.
If the process which generates the file is trusted and the file uses shell syntax, just source it:
. ./file
Otherwise the file can be processed first to add quotes:
perl -ne 'if (/^([A-Za-z_]\w*)=(.*)/) {$k=$1;$v=$2;$v=~s/\x27/\x27\\\x27\x27/g;print "$k=\x27$v\x27\n";}' <file >file2
. ./file2
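Either way, after sourcing, user and pass are ordinary shell variables (a minimal usage sketch; the printf line is illustrative, not from the answer):
. ./file2
printf 'connecting as %s\n' "$user"    # $user and $pass are now set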
If you want to use awk then
Input
$ cat file
user=username1
pass=password1
Reading
$ user=$(awk -F= '$1=="user"{print $2;exit}' file)
$ pass=$(awk -F= '$1=="pass"{print $2;exit}' file)
Output
$ echo $user
username1
$ echo $pass
password1
You could use a loop for your file perhaps, but this is probably the functionality you're looking for.
$ echo 'user=username1' | awk -F= '{print $2}'
username1
Using the -F flag sets the delimiter to = and we select the 2nd item from the row.
file.txt:
user=username1
pass=password1
user=username2
pass=password2
user=username3
pass=password3
To avoid reading the file file.txt several times:
#!/usr/bin/env bash
func () {
    echo "user:$1 pass:$2"
}
i=0
while IFS='' read -r line; do
    if [ $i -eq 0 ]; then
        i=1
        user=$(echo "${line}" | cut -f2 -d'=')
    else
        i=0
        pass=$(echo "${line}" | cut -f2 -d'=')
        func "$user" "$pass"
    fi
done < file.txt
Output:
user:username1 pass:password1
user:username2 pass:password2
user:username3 pass:password3
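A shorter one-pass variant, under the same assumption that user= and pass= lines strictly alternate (a sketch, not from the answers above):
while IFS='=' read -r _ user && IFS='=' read -r _ pass; do
    echo "user:$user pass:$pass"
done < file.txt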

Shell Script Retokenize Property Values to Keys For All Files In a Directory

Previously, I wrote a small shell script to "retokenize" a file (useful for sanity-check comparisons). I currently need to do something similar for a folder instead of just one file.
I'm curious whether there is an easy way to rework the following into a function, and how to recursively pass all files in a folder to it, so that every file in the folder ends up "retokenized". I've been doing some googling and playing around, but want to see if anyone here has a quick, easy, clean solution.
Working version for one file:
#!/bin/bash
date
outputDump="output.txt"
prodPropsFile="input.properties"
prodPropsSortedFile="sorted.properties"
tempPropsFile="temp.properties"
echo "Removing comments and empty lines from prod properties file"
sed '/^#/d' < $prodPropsFile > $tempPropsFile
sed '/^[[:space:]]*$/d' < $tempPropsFile > $prodPropsSortedFile
cp $prodPropsSortedFile $tempPropsFile
echo "Sorting prod properties by value length, so we don't do double tokenization"
awk -F"=" '{ st = index($0,"="); print length(substr($0,st+1)),$0 }' $tempPropsFile | sort -rn | cut -d" " -f2- > $prodPropsSortedFile
echo "Retokenizing."
while IFS='=' read -r k v;
do
    # Sed-escape /, \, and &. Needed for URLs like JDBC connection strings, etc.
    escapedV=$(echo "$v" | sed -e 's/\\/\\\\/g; s/\//\\\//g; s/&/\\\&/g')
    # The /gI flags replace the tokens globally and case-insensitively; this matters in case someone writes "http://..." versus "HTTP://...".
    sed -i -- "s/$escapedV/$k/gI" "$outputDump";
done < "$prodPropsSortedFile"
Example property file:
%%token1%%=value1
%%token2%%=value2
Example input file:
This is a file that has value1 and value2.
Example output file:
This is a file that has %%token1%% and %%token2%%.
Updated Script that Works for All Files in a Folder on my Mac:
#!/bin/bash
date
retokenize()
{
    echo "Retokenizing $1"
    while IFS='=' read -r k v;
    do
        # Sed-escape /, \, and &. Needed for URLs like JDBC connection strings, etc.
        escapedV=$(echo "$v" | sed -e 's/\\/\\\\/g; s/\//\\\//g; s/&/\\\&/g')
        sed -i '' "s/$escapedV/$k/g" "$1";
    done < "$prodPropsSortedFile"
}
# Copy our input to a output file that we will modify, so we don't affect the original.
inputDump="IIQExports"
prodPropsFile="input.properties"
prodPropsSortedFile="sorted.properties"
tempPropsFile="temp.properties"
echo "Removing comments and empty lines from prod properties file"
sed '/^#/d' < $prodPropsFile > $tempPropsFile
sed '/^[[:space:]]*$/d' < $tempPropsFile > $prodPropsSortedFile
cp $prodPropsSortedFile $tempPropsFile
echo "Sorting prod properties by length."
awk -F"=" '{ st = index($0,"="); print length(substr($0,st+1)),$0 }' $tempPropsFile | sort -rn | cut -d" " -f2- > $prodPropsSortedFile
echo "Retokenizing."
find "./$inputDump/" -type f > foo.txt
while IFS= read -r file; do
    retokenize "$file"
done < foo.txt
echo "Done."
date
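If the foo.txt indirection is unwanted, bash can export the function to the child shells that find spawns (a sketch; export -f is bash-specific, and prodPropsSortedFile must be exported too so the subshells can see it):
export prodPropsSortedFile
export -f retokenize
find "./$inputDump" -type f -exec bash -c '
    for f; do retokenize "$f"; done
' _ {} +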

How to remove a filename from the list of path in Shell

I would like to remove just the file name from each line of the following configuration file.
Configuration File -- test.conf
knowledgebase/arun/test.rf
knowledgebase/arunraj/tester/test.drl
knowledgebase/arunraj2/arun/test/tester.drl
The above file should be read, and the lines with the file names removed should go to another file called output.txt.
Here is my attempt. It is not working for me at all; I am getting empty files only.
#!/bin/bash
file=test.conf
while IFS= read -r line
do
    # grep --exclude=*.drl line
    # awk 'BEGIN {getline line ; gsub("*.drl","", line) ; print line}'
    # awk '{ gsub("/",".drl",$NF); print line }' arun.conf
    # awk 'NF{NF--};1' line arun.conf
    echo $line | rev | cut -d'/' -f 1 | rev >> output.txt
done < "$file"
Expected Output :
knowledgebase/arun
knowledgebase/arunraj/tester
knowledgebase/arunraj2/arun/test
There's the dirname command to make it easy and reliable:
#!/bin/bash
file=test.conf
while IFS= read -r line
do
    dirname "$line"
done < "$file" > output.txt
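With GNU coreutils, dirname accepts multiple operands, so for simple names like these (no spaces, quotes, or backslashes, which xargs would mangle) the loop collapses to a one-liner:
xargs dirname < test.conf > output.txt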
There are Bash shell parameter expansions that will work OK with the list of names given but won't work reliably for some names:
file=test.conf
while IFS= read -r line
do
    echo "${line%/*}"
done < "$file" > output.txt
There's sed to do the job — easily with the given set of names:
sed 's%/[^/]*$%%' test.conf > output.txt
It's harder if you have to deal with names like /plain.file (or plain.file — the same sorts of edge cases that trip up the shell expansion).
You could add Perl, Python, Awk variants to the list of ways of doing the job.
You can get the path like this:
path=${fullpath%/*}
It cuts away the last / and everything after it.
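For example, with a name from the question's test.conf:
fullpath=knowledgebase/arunraj/tester/test.drl
path=${fullpath%/*}
echo "$path"    # knowledgebase/arunraj/tester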
Using an awk one-liner you can do this:
awk 'BEGIN{FS=OFS="/"} {NF--} 1' test.conf
Output:
knowledgebase/arun
knowledgebase/arunraj/tester
knowledgebase/arunraj2/arun/test
