bash: process substitution, paste and echo

I'm trying out process substitution and this is just a fun exercise.
I want to append the string "XXX" to all the values of 'ls':
paste -d ' ' <(ls -1) <(echo "XXX")
How come this does not work? XXX is not appended. However if I want to append the file name to itself such as
paste -d ' ' <(ls -1) <(ls -1)
it works.
I do not understand the behavior. Both echo and ls -1 write to stdout but echo's output isn't read by paste.

Try this, using a printf hack: the %.0s conversion prints each file name with zero width, so only XXX is output, once per file.
paste -d ' ' <(ls -1) <(printf "%.0sXXX\n" * )
Demo :
$ ls -1
filename1
filename10
filename2
filename3
filename4
filename5
filename6
filename7
filename8
filename9
Output :
filename1 XXX
filename10 XXX
filename2 XXX
filename3 XXX
filename4 XXX
filename5 XXX
filename6 XXX
filename7 XXX
filename8 XXX
filename9 XXX
If you just want to append XXX, this one will be simpler :
printf "%sXXX\n" *

If you want XXX after every line of the ls -1 output, you need a second command that outputs the string once per line. You are echoing it just once, so it is paired only with the first line of the ls output.
If you are looking for a short command line to achieve the task, you may use sed:
ls -1 | sed -n 's/\(^.*\)$/\1 XXX/p'
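The pairing behaviour described above can be checked with a small sketch. It assumes hypothetical sample names written to a temporary file in place of the ls -1 output, and it uses plain files rather than process substitution so that it also runs under a POSIX sh. The file names names, once and each are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the output of `ls -1`.
tmp=$(mktemp -d)
printf '%s\n' file-a file-b file-c > "$tmp/names"

# Second input has a single line: only the first row gets a partner.
echo XXX > "$tmp/once"
short=$(paste -d ' ' "$tmp/names" "$tmp/once")

# Second input has one XXX per name: every row gets a partner.
sed 's/.*/XXX/' "$tmp/names" > "$tmp/each"
full=$(paste -d ' ' "$tmp/names" "$tmp/each")

printf '%s\n---\n%s\n' "$short" "$full"
rm -r "$tmp"
```

In the first result only file-a is followed by XXX; in the second every name is.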

And here's a funny one, not using any external command except the legendary yes command!
while read -u 4 head && read -u 5 tail ; do echo "$head $tail"; done 4< <(ls -1) 5< <(yes XXX)
(I'm only posting this because it's funny and it's actually not 100% off topic since it uses file descriptors and process substitutions)

... you have to:
for i in $( ls -1 ); do echo "$i XXXX"; done
Never use for i in $(command). See this answer for more details.
So, to answer the original question, you could simply use something like this :
for file in *; do echo "$file XXXX"; done
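A rough sketch of why the glob is safer than looping over $(ls -1): the unquoted command substitution is split on whitespace, so a name containing a space becomes two loop items. The directory and file names below are hypothetical and created only for the demonstration.

```shell
#!/bin/sh
# Hypothetical test directory with one name that contains a space.
tmp=$(mktemp -d)
touch "$tmp/one file.txt" "$tmp/two.txt"

# Word splitting: "one file.txt" is seen as "one" and "file.txt".
split_count=0
for i in $(ls -1 "$tmp"); do
    split_count=$((split_count + 1))
done

# Globbing: each file is exactly one loop item, whatever its name.
glob_count=0
for f in "$tmp"/*; do
    glob_count=$((glob_count + 1))
done

echo "items via \$(ls): $split_count, items via glob: $glob_count"
rm -r "$tmp"
```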
Another solution with awk :
ls -1|awk '{print $0" XXXX"}'
awk '{print $0" XXXX"}' <(ls -1) # with process substitution
Another solution with sed :
ls -1|sed "s/\(.*\)/\1 XXXX/g"
sed "s/\(.*\)/\1 XXXX/g" <(ls -1) # with process substitution
And useless solutions, just for fun :
while read; do echo "$REPLY XXXX"; done <<< "$(ls -1)"
ls -1|while read; do echo "$REPLY XXXX"; done

It does it only for the first line, since it groups the first line from parameter 1 with the first line from parameter 2:
paste -d ' ' <(ls -1) <(echo "XXX")
... outputs:
/dir/file-a XXX
/dir/file-b
/dir/file-c
... you have to:
for i in $( ls -1 ); do echo "$i XXXX"; done

You can use xargs for the same effect:
ls -1 | xargs -I{} echo {} XXX

Related

Why is this bash loop failing to concatenate the files?

I am at my wits' end as to why this loop is failing to concatenate the files the way I need it to. Basically, let's say we have the following files:
AB124661.lane3.R1.fastq.gz
AB124661.lane4.R1.fastq.gz
AB124661.lane3.R2.fastq.gz
AB124661.lane4.R2.fastq.gz
What we want is:
cat AB124661.lane3.R1.fastq.gz AB124661.lane4.R1.fastq.gz > AB124661.R1.fastq.gz
cat AB124661.lane3.R2.fastq.gz AB124661.lane4.R2.fastq.gz > AB124661.R2.fastq.gz
What I tried (and didn't work):
Create and save file names (AB124661) to a ID file:
ls -1 R1.gz | awk -F '.' '{print $1}' | sort | uniq > ID
This creates an ID file that stores the samples/files name.
Run the following loop:
for i in `cat ./ID`; do cat $i\.lane3.R1.fastq.gz $i\.lane4.R1.fastq.gz \> out/$i\.R1.fastq.gz; done
for i in `cat ./ID`; do cat $i\.lane3.R2.fastq.gz $i\.lane4.R2.fastq.gz \> out/$i\.R2.fastq.gz; done
The loop fails and concatenates into empty files.
Things I tried:
Yes, the ID file is definitely in the folder
When I run with echo it shows the cat command correct
Any help will be very much appreciated,
Best,
AC
Why are you escaping the > as \>? That's going to result in a cat: '>': No such file or directory instead of a redirection.
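The effect of the backslash can be seen in a minimal sketch. count_args is a hypothetical helper that only reports how many arguments it received; it is not part of the original commands.

```shell
#!/bin/sh
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

tmp=$(mktemp)

# Escaped: \> is passed to the command as a literal argument ">".
escaped=$(count_args a \> b)

# Unescaped: > is a redirection, so the command sees only "a"
# and its output lands in the file.
count_args a > "$tmp"
redirected=$(cat "$tmp")

rm -f "$tmp"
echo "escaped: $escaped arguments, redirected: $redirected argument"
```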
Don't read lines with for
while IFS= read -r id; do
cat "${id}.lane3.R1.fastq.gz" "${id}.lane4.R1.fastq.gz" > "out/${id}.R1.fastq.gz"
cat "${id}.lane3.R2.fastq.gz" "${id}.lane4.R2.fastq.gz" > "out/${id}.R2.fastq.gz"
done < ./ID
Let's say the IDs are stored one per line in the file ./ID:
while read -r line; do
cat "$line".lane3.R1.fastq.gz "$line".lane4.R1.fastq.gz > "$line".R1.fastq.gz
cat "$line".lane3.R2.fastq.gz "$line".lane4.R2.fastq.gz > "$line".R2.fastq.gz
done < ./ID
A pure shell solution could be like that:
for file in *.fastq.gz; do
id=${file%%.*}
[ -e "$id".R1.fastq.gz ] || cat "$id".*.R1.fastq.gz > "$id".R1.fastq.gz
[ -e "$id".R2.fastq.gz ] || cat "$id".*.R2.fastq.gz > "$id".R2.fastq.gz
done
Alternatively:
printf '%s\n' *.fastq.gz | cut -d. -f1 | sort -u |
while IFS= read -r id; do
cat "$id".*.R1.fastq.gz > "$id".R1.fastq.gz
cat "$id".*.R2.fastq.gz > "$id".R2.fastq.gz
done
This solution assumes filenames of interest don't contain newline characters.
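The two ways of deriving the ID used above can be compared in a short sketch. The second sample ID, CD000001, is hypothetical and only there to show the de-duplication step.

```shell
#!/bin/sh
file=AB124661.lane3.R1.fastq.gz

# %% removes the longest suffix matching ".*", leaving the sample ID.
id=${file%%.*}
# % removes only the shortest suffix matching ".*" (here ".gz").
stem=${file%.*}
echo "id=$id stem=$stem"

# The cut/sort pipeline yields each ID once, however many lanes exist.
ids=$(printf '%s\n' \
    AB124661.lane3.R1.fastq.gz \
    AB124661.lane4.R1.fastq.gz \
    CD000001.lane3.R1.fastq.gz | cut -d. -f1 | sort -u)
echo "$ids"
```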

Concatenate the output of 2 commands in the same line in Unix

I have a command like below
md5sum test1.txt | cut -f 1 -d " " >> test.txt
I want output of the above result prefixed with File_CheckSum:
Expected output: File_CheckSum: <checksumvalue>
I tried as follows
echo 'File_Checksum:' >> test.txt | md5sum test.txt | cut -f 1 -d " " >> test.txt
but getting result as
File_Checksum:
adbch345wjlfjsafhals
I want the entire output in 1 line
File_Checksum: adbch345wjlfjsafhals
echo writes a newline after it finishes writing its arguments. Some versions of echo allow a -n option to suppress this, but it's better to use printf instead.
You can use a command group to concatenate the standard output of your two commands:
{ printf 'File_Checksum: '; md5sum test.txt | cut -f 1 -d " "; } >> test.txt
Note that there is a race condition here: you can theoretically write to test.txt before md5sum is done reading from it, causing you to checksum more data than you intended. (Your original command mentions test1.txt and test.txt as separate files, so it's not clear if you are really reading from and writing to the same file.)
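A small sketch of the difference between echo and printf inside a command group. The string abc123 is a placeholder standing in for the real md5sum output, so the result does not depend on any file.

```shell
#!/bin/sh
# echo ends its output with a newline, so the label and the
# value end up on two separate lines.
with_echo=$( { echo 'File_Checksum:'; echo abc123; } )

# printf writes no newline unless asked, so the value follows
# the label on the same line.
with_printf=$( { printf 'File_Checksum: '; echo abc123; } )

printf '%s\n---\n%s\n' "$with_echo" "$with_printf"
```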
You can use command grouping to have a list of commands executed as a unit and redirect the output of the group at once:
{ printf 'File_Checksum: '; md5sum test1.txt | cut -f 1 -d " "; } >> test.txt
printf "%s: %s\n" "File_Checksum" "$(md5sum < test1.txt | cut ...)" > test.txt
Note that if you are trying to compute the hash of test.txt (the same file you are trying to write to), this changes things significantly.
Another option is:
{
printf "File_Checksum: "
md5sum ...
} > test.txt
Or:
exec > test.txt
printf "File_Checksum: "
md5sum ...
but be aware that all subsequent commands will also write their output to test.txt. The typical way to restore stdout is:
exec 3>&1
exec > test.txt # Redirect all subsequent commands to `test.txt`
printf "File_Checksum: "
md5sum ...
exec >&3 # Restore original stdout
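The save-and-restore pattern can be tried end to end with the sketch below. It assumes file descriptor 3 is free, writes to a temporary file instead of test.txt, and uses the placeholder abc123 in place of real md5sum output; closing descriptor 3 at the end is an addition not shown in the original commands.

```shell
#!/bin/sh
tmp=$(mktemp)

exec 3>&1            # save the current stdout on descriptor 3
exec > "$tmp"        # from here on, stdout goes to the file
printf 'File_Checksum: '
echo abc123          # placeholder for: md5sum ... | cut -d ' ' -f 1
exec >&3 3>&-        # restore stdout, then close descriptor 3

result=$(cat "$tmp")
rm -f "$tmp"
echo "file contained: $result"
```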
Operator &&
e.g. mkdir example && cd example

Bash Shellscript Column Check Error Handling

I am writing a Bash shell script. I need to check whether column $value1 of a file contains $value2. $value1 is the column number (for example 1, 4 or 5) and $value2 is the string I am looking for (for example '03', '04' or '09'). If the column contains $value2, the file should be moved to an error directory. What is the best approach? I was thinking of awk, or is there another way?
$value1 and $value2 are stored in a config file. I have control over what format I can use. Here's an example. The field separator is octal \036; I have depicted it as | below.
Example
$value1=5
$value2=04
Input example1.txt
example|42|udajha|llama|04
example|22|udajha|llama|02
Input example2.txt
example|22|udajha|llama|02
Result
move example1.txt to /home/user/error_directory and example2.txt stays in current directory (nothing happens)
awk can report out which files meet this condition:
awk -F"|" -v columnToSearch="$value1" -v valueToFind="$value2" '$columnToSearch==valueToFind{print FILENAME}' example1.txt example2.txt
Then you can do your mv based on that.
Example using a pipe to xargs (with smaller variable names since you get the idea by now):
awk -F"|" -v c="$value1" -v v="$value2" '$c==v{print FILENAME}' example1.txt example2.txt | xargs -I{} mv -i {} /home/user/error_directory
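One possible way to put the check and the move together, using the real \036 separator from the question, is sketched below. column_has_value is a hypothetical helper, the data is written to a temporary directory rather than the real paths, and the comparison is forced to be a string comparison so that 04 does not match 4 (an assumption about what is wanted).

```shell
#!/bin/sh
sep=$(printf '\036')                 # the octal \036 field separator
tmp=$(mktemp -d)
mkdir "$tmp/error_directory"

# Sample files from the question, with | translated to \036.
tr '|' '\036' > "$tmp/example1.txt" <<'EOF'
example|42|udajha|llama|04
example|22|udajha|llama|02
EOF
tr '|' '\036' > "$tmp/example2.txt" <<'EOF'
example|22|udajha|llama|02
EOF

# Hypothetical helper: succeed if any line of FILE has VALUE in COLUMN.
column_has_value() {
    awk -v FS="$sep" -v col="$2" -v val="$3" '
        ($col "") == (val "") { found = 1; exit }
        END { exit !found }' "$1"
}

for f in "$tmp"/example*.txt; do
    if column_has_value "$f" 5 04; then
        mv "$f" "$tmp/error_directory/"
    fi
done

moved=$(ls "$tmp/error_directory")
kept=$(basename "$tmp"/example*.txt)
rm -r "$tmp"
echo "moved: $moved, kept: $kept"
```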
If you're writing a bash shell script then you can break it down by column using cut.
There are really so many options that it depends on what you want to get done.
In my experience with data I'd use a colon rather than pipe because it allows me to avoid the escape with the 'cut' command.
Changing the data files to:
cat example1.txt
example:42:udajha:llama:04
example:22:udajha:llama:02
I'd write it like this: (adding -x so that you can see the processing, but in your code you'd not need to do that.)
[root#]# cat mysript.sh
#!/bin/sh -x
one=`cat example1.txt | cut -d: -f5`
two=`cat example2.txt | cut -d: -f5`
for i in $one
do
if [ $i -eq $two ]
then
movethis=`grep $two example1.txt`
echo $movethis >> /home/me/error.txt
fi
done
cat /home/me/error.txt
[root#]# ./mysript.sh
++ cat example1.txt
++ cut -d: -f5
+ one='04
02 '
++ cat example2.txt
++ cut -d: -f5
+ two=02
+ for i in '$one'
+ '[' 04 -eq 02 ']'
+ for i in '$one'
+ '[' 02 -eq 02 ']'
++ grep 02 example1.txt
+ movethis='example:22:udajha:llama:02 '
+ echo example:22:udajha:llama:02
+ cat /home/me/error.txt
example:22:udajha:llama:02
You can use any command you like to move your content: touch, cp, mv, whatever you want to use there.

How to remove a filename from the list of path in Shell

I would like to remove a file name only from the following configuration file.
Configuration File -- test.conf
knowledgebase/arun/test.rf
knowledgebase/arunraj/tester/test.drl
knowledgebase/arunraj2/arun/test/tester.drl
The above file should be read, and the paths with the file names removed should go to another file called output.txt.
The following is my attempt. It is not working for me at all; I am getting empty files only.
#!/bin/bash
file=test.conf
while IFS= read -r line
do
# grep --exclude=*.drl line
# awk 'BEGIN {getline line ; gsub("*.drl","", line) ; print line}'
# awk '{ gsub("/",".drl",$NF); print line }' arun.conf
# awk 'NF{NF--};1' line arun.conf
echo $line | rev | cut -d'/' -f 1 | rev >> output.txt
done < "$file"
Expected Output :
knowledgebase/arun
knowledgebase/arunraj/tester
knowledgebase/arunraj2/arun/test
There's the dirname command to make it easy and reliable:
#!/bin/bash
file=test.conf
while IFS= read -r line
do
dirname "$line"
done < "$file" > output.txt
There are Bash shell parameter expansions that will work OK with the list of names given but won't work reliably for some names:
file=test.conf
while IFS= read -r line
do
echo "${line%/*}"
done < "$file" > output.txt
There's sed to do the job — easily with the given set of names:
sed 's%/[^/]*$%%' test.conf > output.txt
It's harder if you have to deal with names like /plain.file (or plain.file — the same sorts of edge cases that trip up the shell expansion).
You could add Perl, Python, Awk variants to the list of ways of doing the job.
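The edge cases mentioned above can be made visible with a short comparison of dirname and the ${line%/*} expansion. The three sample paths are taken from this answer; nothing is read from disk.

```shell
#!/bin/sh
# Compare dirname with the parameter expansion on a normal path
# and on the two awkward ones.
for p in knowledgebase/arun/test.rf /plain.file plain.file; do
    printf '%-28s dirname=[%s] expansion=[%s]\n' \
        "$p" "$(dirname "$p")" "${p%/*}"
done
```

For /plain.file the expansion yields an empty string where dirname yields /, and for plain.file the expansion leaves the name unchanged where dirname yields a single dot.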
You can get the path like this:
path=${fullpath%/*}
It cuts away the string after the last /
Using awk one liner you can do this:
awk 'BEGIN{FS=OFS="/"} {NF--} 1' test.conf
Output:
knowledgebase/arun
knowledgebase/arunraj/tester
knowledgebase/arunraj2/arun/test

Shell script: count the copies of each line from a txt

I would like to count the copies of each line in a txt file. I have tried many things until now, but none worked well. In my case the text has just one word on each line.
This was my last try
echo -n 'enter file for edit: '
read file
for line in $file ; do
echo 'grep -w $line $file'
done; <$file
For example:
input file
a
a
a
c
c
Output file
a 3
c 2
Thanks in advance.
$ sort < "$file" | uniq -c | awk '{print $2 " " $1}'
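A sketch of the same pipeline wrapped in a function and fed the sample input, deliberately out of order. The sort matters because uniq -c only merges adjacent identical lines. count_lines is a hypothetical name, and the awk step assumes one word per line, as in the question.

```shell
#!/bin/sh
# Hypothetical wrapper: read lines on stdin, print "word count".
count_lines() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

result=$(printf '%s\n' a c a a c | count_lines)
echo "$result"
```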
