Is it possible to generate a checksum (MD5) of a string in a shell? - bash

I would like to have a unique ID for filenames so I can iterate over the IDs and compare the checksums of the files.
Is it possible to have a checksum for the name of the file so I can have a unique ID per filename?
I would welcome other ideas.

Is this what you want?
Plain string:
serce#unit:~$ echo "Hello, checksum!" | md5sum
9f898618b071286a14d1937f9db13b8f -
And file content:
serce#unit:~$ md5sum agent.yml
3ed53c48f073bd321339cd6a4c716c17 agent.yml

Yes, it is possible using md5sum, and basename "$0" gives the name of the current script.
Assuming I have the script below, named md5Gen.sh:
#!/bin/bash
md5string=$(basename "$0" | md5sum)
echo "$(basename "$0")" $md5string
Running the script gives:
md5Gen.sh 911949bd2ab8467162e27c1b6b5633c0 -
Note that basename ends its output with a newline, so this hash covers the name plus that newline.

Yes, it is possible to obtain the MD5 of a string:
$ printf '%s' "This-Filename" | md5sum
dd829ba5a7ba7bdf7a391f2e0bd7cd1f -
It is important to understand that there is no newline at the end of the printed string. An equivalent in bash would be to use echo -n:
$ echo -n "This-Filename" | md5sum
dd829ba5a7ba7bdf7a391f2e0bd7cd1f -
The -n (valid in bash) is important because otherwise your hash would change with the inclusion of a newline that is not part of the text:
$ echo "This-Filename" | md5sum
7ccba9dffa4baf9ca0e56c078aa09a07 -
The same applies to file contents:
$ echo -n "This-Filename" > infile
$ md5sum infile
dd829ba5a7ba7bdf7a391f2e0bd7cd1f infile
$ echo "This-Filename" > infile
$ md5sum infile
7ccba9dffa4baf9ca0e56c078aa09a07 infile
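Building on the printf approach, here is a minimal sketch for the original use case: a small helper (the name str_md5 is hypothetical, not a standard command) that prints only the hex digest of a string, used as a per-filename ID next to the checksum of the file's contents. It assumes GNU coreutils md5sum is available.

```shell
#!/bin/bash
# str_md5 (hypothetical helper): print only the hex digest of a string.
# printf '%s' ensures no trailing newline is included in the hash.
str_md5() {
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# For every regular file in the current directory, print:
#   <ID derived from the filename>  <checksum of the contents>  <filename>
for f in *; do
    [ -f "$f" ] || continue
    id=$(str_md5 "$f")
    content=$(md5sum < "$f" | cut -d' ' -f1)
    printf '%s  %s  %s\n' "$id" "$content" "$f"
done
```

Reading the file through stdin (md5sum < "$f") makes md5sum print - instead of the name, which the cut then discards either way.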


Why is this bash loop failing to concatenate the files?

I am at my wits' end as to why this loop is failing to concatenate the files the way I need it to. Basically, let's say we have the following files:
AB124661.lane3.R1.fastq.gz
AB124661.lane4.R1.fastq.gz
AB124661.lane3.R2.fastq.gz
AB124661.lane4.R2.fastq.gz
What we want is:
cat AB124661.lane3.R1.fastq.gz AB124661.lane4.R1.fastq.gz > AB124661.R1.fastq.gz
cat AB124661.lane3.R2.fastq.gz AB124661.lane4.R2.fastq.gz > AB124661.R2.fastq.gz
What I tried (and didn't work):
Create and save file names (AB124661) to a ID file:
ls -1 R1.gz | awk -F '.' '{print $1}' | sort | uniq > ID
This creates an ID file that stores the sample/file names.
Run the following loop:
for i in `cat ./ID`; do cat $i\.lane3.R1.fastq.gz $i\.lane4.R1.fastq.gz \> out/$i\.R1.fastq.gz; done
for i in `cat ./ID`; do cat $i\.lane3.R2.fastq.gz $i\.lane4.R2.fastq.gz \> out/$i\.R2.fastq.gz; done
The loop fails and concatenates into empty files.
Things I tried:
Yes, the ID file is definitely in the folder
When I run with echo it shows the cat command correct
Any help will be very much appreciated,
Best,
AC
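Why are you escaping the \>? The backslash turns > into a literal argument to cat instead of a redirection, so cat tries to read a file named > and you get cat: '>': No such file or directory.
Also, don't read lines with for; use a while read loop instead: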
while IFS= read -r id; do
cat "${id}.lane3.R1.fastq.gz" "${id}.lane4.R1.fastq.gz" > "out/${id}.R1.fastq.gz"
cat "${id}.lane3.R2.fastq.gz" "${id}.lane4.R2.fastq.gz" > "out/${id}.R2.fastq.gz"
done < ./ID
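To see why for is a poor fit for reading lines, here is a small self-contained demonstration. The temporary file and its two sample lines are made up for illustration, and the default IFS is assumed: a for loop over $(cat ...) splits on any whitespace, while while IFS= read -r yields exactly one iteration per line.

```shell
#!/bin/bash
# Hypothetical input: two lines, one of which contains a space.
tmp=$(mktemp)
printf '%s\n' 'AB124661' 'sample two' > "$tmp"

# for splits the command substitution on spaces, tabs and newlines.
for_count=0
for id in $(cat "$tmp"); do
    for_count=$((for_count + 1))
done

# while read consumes exactly one line per iteration.
while_count=0
while IFS= read -r id; do
    while_count=$((while_count + 1))
done < "$tmp"

echo "for loop iterations:   $for_count"    # 3
echo "while loop iterations: $while_count"  # 2
rm -f "$tmp"
```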
Let's say you have one ID per line in the file ./ID:
while IFS= read -r line; do
cat "$line".lane3.R1.fastq.gz "$line".lane4.R1.fastq.gz > "$line".R1.fastq.gz
cat "$line".lane3.R2.fastq.gz "$line".lane4.R2.fastq.gz > "$line".R2.fastq.gz
done < ./ID
A pure shell solution could look like this:
for file in *.fastq.gz; do
id=${file%%.*}
[ -e "$id".R1.fastq.gz ] || cat "$id".*.R1.fastq.gz > "$id".R1.fastq.gz
[ -e "$id".R2.fastq.gz ] || cat "$id".*.R2.fastq.gz > "$id".R2.fastq.gz
done
Alternatively:
printf '%s\n' *.fastq.gz | cut -d. -f1 | sort -u |
while IFS= read -r id; do
cat "$id".*.R1.fastq.gz > "$id".R1.fastq.gz
cat "$id".*.R2.fastq.gz > "$id".R2.fastq.gz
done
This solution assumes filenames of interest don't contain newline characters.
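All of these answers rely on the fact that concatenating gzip files with cat produces a valid gzip file whose decompressed content is the concatenation of the inputs (the gzip format allows several members in one file). A quick sanity check, using hypothetical sample files in a throwaway directory:

```shell
#!/bin/bash
# Work in a throwaway directory; the sample names and reads are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'read1\n' | gzip > AB124661.lane3.R1.fastq.gz
printf 'read2\n' | gzip > AB124661.lane4.R1.fastq.gz

# Concatenate the compressed files directly; no decompression needed.
cat AB124661.lane3.R1.fastq.gz AB124661.lane4.R1.fastq.gz > AB124661.R1.fastq.gz

# Decompressing the result yields both inputs, in order.
merged=$(gzip -dc AB124661.R1.fastq.gz)
printf '%s\n' "$merged"   # read1 and read2 on separate lines

cd / && rm -rf "$dir"
```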

How to write a command line script that will loop through every line in a text file and append a string at the end of each? [duplicate]

How do I add a string after each line in a file using bash? Can it be done using the sed command, if so how?
If your sed allows in place editing via the -i parameter:
sed -e 's/$/string after each line/' -i filename
If not, you have to use a temporary file:
typeset TMP_FILE=$( mktemp )
cp -p filename "${TMP_FILE}"
sed -e 's/$/string after each line/' "${TMP_FILE}" > filename
rm -f "${TMP_FILE}"
I prefer echo, using pure bash:
while IFS= read -r line; do echo "${line}${string}"; done < file
I prefer using awk.
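$0 is the whole line, so this works regardless of how many columns there are. In the first form, the comma in print inserts a space (the output field separator) before the appended string; the second form appends it directly.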
One way,
awk '{print $0, "string to append after each line"}' file > new_file
or this,
awk '$0=$0"string to append after each line"' file > new_file
If you have it, the lam (laminate) utility can do it, for example:
$ lam filename -s "string after each line"
Pure POSIX shell and sponge:
suffix=foobar
while IFS= read -r l ; do printf '%s%s\n' "$l" "${suffix}" ; done < file |
sponge file
xargs and printf:
suffix=foobar
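xargs -L 1 printf "%s${suffix}\n" < file | sponge file
This only works for lines without blanks or quotes (xargs splits and unquotes its input), and the suffix must not contain % or \.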
Using join:
suffix=foobar
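join file file -e "${suffix}" -o 1.1,2.99999 | sponge file
Note that join expects input sorted on the join field, outputs only the first field of each line (1.1), and puts a space before the suffix.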
Shell tools using paste, yes, head & wc:
suffix=foobar
paste file <(yes "${suffix}" | head -n "$(wc -l < file)") | sponge file
Note that paste inserts a tab character before $suffix.
Of course, sponge can be replaced with a temporary file that is afterwards mv'd over the original file, as in some other answers.
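For systems without sponge (it comes from the moreutils package), a minimal sketch of the temporary-file variant. It assumes the input is named file, uses hypothetical sample lines, and assumes the suffix contains no backslashes (awk -v interprets escape sequences in the assigned value).

```shell
#!/bin/bash
# Throwaway directory and hypothetical sample input.
cd "$(mktemp -d)" || exit 1
suffix=foobar
printf '%s\n' 'first line' 'second line' > file

tmp=$(mktemp) || exit 1
# Write to the temp file first so the shell does not truncate "file"
# before awk has read it, then replace the original.
awk -v s="$suffix" '{ print $0 s }' file > "$tmp" &&
    mv "$tmp" file
cat file
```

Because mv replaces the original, the file ends up with the temporary file's permissions (typically 0600); the cp -p plus redirection approach above keeps the original file and its permissions.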
This is just to add to the answers using the echo command to add a string at the end of each line in a file:
while IFS= read -r line; do echo "${line}string to add" >> output-file; done < input-file
Using >> appends each modified line to output-file, so remove any existing output-file first.
Sed is a little ugly; if your lines contain no whitespace, you could do it like so:
hendry#i7 tmp$ cat foo
bar
candy
car
hendry#i7 tmp$ for i in `cat foo`; do echo ${i}bar; done
barbar
candybar
carbar

extracting a variable's value from text file using bash

I am using Linux and bash.
I have a simple text file like below:
VAR1=100
VAR2=5
VAR3=0
VAR4=99
I want to extract by means of bash the value of VAR2, that is 5.
How could I do that?
Assuming the file is called vars.txt
sed -n 's/^VAR2=\(.*\)/\1/p' < vars.txt
You can use the value elsewhere via command substitution, like this:
echo VAR2=$(sed -n 's/^VAR2=\(.*\)/\1/p' < vars.txt)
The simplest way might be to use source or simply . to read and execute the file. This would work with your example, because there are no spaces in the variable values. Otherwise you need to use grep + cut or awk, as stated in other answers.
. /path/to/your/file
echo $VAR2
[edit]
As stated by dawg, this would make the other variables available in your script too, and possibly overwrite existing variables.
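If you want the value without executing the file (and without touching any other variables), a small function along these lines reads just one key. get_var is a hypothetical name, and the sketch assumes KEY=value lines with no spaces around the =.

```shell
#!/bin/bash
# get_var KEY FILE (hypothetical helper): print the value of the first
# KEY=value line in FILE.
get_var() {
    # The line must start with "KEY=" (so VAR2 does not match VAR20);
    # everything after that first "=" is printed, even if it contains "=".
    awk -v k="$1" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$2"
}

# Sample data from the question, in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' VAR1=100 VAR2=5 VAR3=0 VAR4=99 > vars.txt
VAR2=$(get_var VAR2 vars.txt)
echo "$VAR2"   # 5
```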
Given:
$ echo "$txt"
VAR1=100
VAR2=5
VAR3=0
VAR4=99
You can use awk:
$ echo "$txt" | awk -F= '/^VAR2=/ { print $2 }'
5
Or grep and cut:
$ echo "$txt" | grep -E '^VAR2=[0-9]+' | cut -d = -f 2
5
In bash, you can import those assignments into the current shell with source, after filtering the lines you wish to use. In this case, only the line VAR2=5 will be used. Write that line to a temporary file and then source that file:
$ echo "$txt" | grep '^VAR2' > tmp && source tmp && rm tmp
$ echo $VAR2
5
For files as described, you can just source the file as a bash script, which runs its contents and updates your current shell environment. For example:
source file.txt
echo $VAR2
Assume this is your text file, named test.txt:
VAR2 = 5
VAR3 = 0
VAR4 = 99
you can run cat test.txt | grep '^VAR2 ' | awk '{print $3}'
and your output will be: 5
Here, cat test.txt outputs the contents of test.txt, grep '^VAR2 ' keeps only the line starting with 'VAR2 ', and awk '{print $3}' prints the third whitespace-separated field, which is the value of the variable.
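A pure-bash sketch that handles both layouts shown on this page (VAR2=5 and VAR2 = 5) by splitting each line on the first = and trimming the surrounding spaces. read_var is a hypothetical helper name, and the sample files are made up to match the examples above.

```shell
#!/bin/bash
# read_var KEY FILE (hypothetical helper): print the value for KEY,
# accepting both "KEY=value" and "KEY = value" lines.
read_var() {
    local key value
    # IFS='=' splits on the first "="; the rest of the line goes to value.
    # "|| [ -n "$key" ]" also handles a last line without a newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        key=${key%"${key##*[! ]}"}         # drop trailing spaces from key
        value=${value#"${value%%[! ]*}"}   # drop leading spaces from value
        if [ "$key" = "$1" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "$2"
    return 1
}

# Sample files in both layouts, in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'VAR2 = 5' 'VAR3 = 0' 'VAR4 = 99' > test.txt
printf '%s\n' VAR1=100 VAR2=5 VAR3=0 VAR4=99 > vars.txt
read_var VAR2 test.txt   # 5
read_var VAR2 vars.txt   # 5
```

Unlike source, this never executes the file, and the non-zero return status lets callers detect a missing key.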

Grep-ing a list of filename against a csv list of names

I have a CSV file containing a list of IDs (numbers), one per row. Let's call that file ids.csv.
In a directory I have a large number of files named "file_123456_smth.csv", where 123456 is an ID that could be found in ids.csv.
Now, what I am trying to achieve: compare the names of the files with the IDs stored in ids.csv. If 123456 is found in ids.csv, then the filename should be displayed.
What I've tried:
ls -a | xargs grep -L cat ../../ids.csv
Of course, this does not work, but gives an idea of my direction.
Let's see if I understood you correctly:
$ cat ids.csv
123
456
789
$ ls *.csv
file_123_smth.csv file_321_smth.csv file_789_smth.csv ids.csv
$ ./c.sh
123 found in file_123_smth.csv
789 found in file_789_smth.csv
where c.sh looks like this:
#!/bin/bash
ID="ids.csv"
for file in *.csv
do
if [[ $file =~ file ]] # just do the filtering on files
then # containing the actual string "file"
id=$(cut -d_ -f2 <<< "$file")
grep -qxF "$id" "$ID" && echo "$id found in $file" # whole-line literal match, so 12 does not match 123
fi
done
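With a large number of files, a single awk pass avoids running grep once per file. A sketch, assuming the file names follow the file_<id>_smth.csv pattern, ids.csv holds one ID per line with Unix line endings and is not empty (the NR == FNR idiom relies on that), and no file name contains a newline. The sample data mirrors the answer above.

```shell
#!/bin/bash
# Sample data mirroring the answer above, in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 123 456 789 > ids.csv
touch file_123_smth.csv file_321_smth.csv file_789_smth.csv

# ids.csv (first input): store each ID as an array key.
# stdin ("-", second input): one file name per line; with "_" as the
# field separator, $2 of file_123_smth.csv is 123.
matches=$(printf '%s\n' file_*_smth.csv |
    awk -F_ 'NR == FNR { ids[$0]; next } $2 in ids' ids.csv -)
printf '%s\n' "$matches"   # file_123_smth.csv and file_789_smth.csv
```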

Extract directory path and filename

I have a variable which has the directory path, along with the file name. I want to extract the filename alone from the Unix directory path and store it in a variable.
fspec="/exp/home1/abc.txt"
Use the basename command to extract the filename from the path:
[/tmp]$ export fspec=/exp/home1/abc.txt
[/tmp]$ fname=$(basename "$fspec")
[/tmp]$ echo $fname
abc.txt
In bash, to get the file name:
fspec="/exp/home1/abc.txt"
filename="${fspec##*/}" # get filename
dirname="${fspec%/*}" # get directory/path name
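The same # and % operators can also split off the extension. A short sketch using the path from the question; note that ${fspec%/*} returns the string unchanged when it contains no slash, whereas dirname would print a dot in that case.

```shell
#!/bin/bash
fspec="/exp/home1/abc.txt"

filename="${fspec##*/}"   # abc.txt    (remove longest prefix ending in /)
dirname="${fspec%/*}"     # /exp/home1 (remove shortest suffix starting with /)
stem="${filename%.*}"     # abc        (remove shortest suffix starting with .)
ext="${filename##*.}"     # txt        (remove longest prefix ending in .)

printf '%s\n' "$filename" "$dirname" "$stem" "$ext"
```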
Other ways:
awk
$ echo $fspec | awk -F"/" '{print $NF}'
abc.txt
sed
$ echo $fspec | sed 's/.*\///'
abc.txt
using IFS
$ IFS="/"
$ set -- $fspec
$ eval echo "\${$#}"
abc.txt
$ unset IFS
You can simply do:
base=$(basename "$fspec")
dirname "/usr/home/theconjuring/music/song.mp3"
will yield
/usr/home/theconjuring/music
bash:
fspec="/exp/home1/abc.txt"
fname="${fspec##*/}"
echo $fspec | tr "/" "\n"|tail -1
Using bash "here string":
$ fspec="/exp/home1/abc.txt"
$ tr "/" "\n" <<< $fspec | tail -1
abc.txt
$ filename=$(tr "/" "\n" <<< $fspec | tail -1)
$ echo $filename
abc.txt
The benefit of the "here string" is that it avoids the pipeline, and with it the extra process that runs echo; the "here string" is handled by the shell itself. That is:
$ tr "/" "\n" <<< $fspec
as opposed to:
$ echo $fspec | tr "/" "\n"
