Only get hash value using md5sum (without filename) - bash

I use md5sum to generate a hash value for a file.
But I only need to receive the hash value, not the file name.
md5=`md5sum ${my_iso_file}`
echo ${md5}
Output:
3abb17b66815bc7946cefe727737d295 ./iso/somefile.iso
How can I 'strip' the file name and only retain the value?

A simple array assignment works. Note that the first element of a Bash array can be addressed by just the name without the [0] index, i.e., $md5 contains only the 32-character hash from md5sum's output.
md5=($(md5sum file))
echo $md5
# 53c8fdfcbb60cf8e1a1ee90601cc8fe2
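The rest of md5sum's output lands in the following array elements; a quick illustration (assuming the same file argument as above):
$ echo ${md5[1]}
file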

Using AWK:
md5=`md5sum ${my_iso_file} | awk '{ print $1 }'`

You can use cut to split the line on spaces and return only the first such field:
md5=$(md5sum "$my_iso_file" | cut -d ' ' -f 1)

On Mac OS X:
md5 -q file

md5="$(md5sum "${my_iso_file}")"
md5="${md5%% *}" # remove the first space and everything after it
echo "${md5}"

Another way is to do:
md5sum filename | cut -f 1 -d " "
cut splits the line on each space and returns only the first field.

By leaning on head:
md5_for_file=`md5sum ${my_iso_file}|head -c 32`

One way:
set -- $(md5sum $file)
md5=$1
Another way:
md5=$(md5sum $file | while read sum file; do echo $sum; done)
Another way:
md5=$(set -- $(md5sum $file); echo $1)
(Do not try that with backticks unless you're very brave and very good with backslashes.)
The advantage of these solutions over the others is that they only invoke md5sum and the shell, rather than other programs such as awk or sed. Whether that actually matters is then a separate question; you'd probably be hard pressed to notice the difference.
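In the same spirit, reading into two variables avoids even the subshell loop (a minimal sketch; the process substitution requires bash):
read -r md5 _ < <(md5sum "$file")
echo "$md5"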

If you need to print it and don't need a trailing newline, you can use:
printf $(md5sum filename)
(The unquoted substitution is split into two words; the hash becomes printf's format string and is printed literally, since a hex hash contains no % or backslash sequences, and bash's printf then effectively ignores the leftover filename argument.)

md5=$(md5sum < $file | tr -d ' -')   # with stdin input the output is "hash  -"; tr deletes the spaces and the dash

md5=`md5sum ${my_iso_file} | cut -b-32`

md5sum puts a backslash before the hash if there is a backslash in the file name. The first 32 characters or anything before the first space may not be a proper hash.
It will not happen when using standard input (file name will be just -), so pixelbeat's answer will work, but many others will require adding something like | tail -c 32.
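A quick illustration of this caveat (assuming GNU coreutils, which escapes such names; the hash shown is the MD5 of an empty file):
$ touch 'foo\bar'
$ md5sum 'foo\bar'
\d41d8cd98f00b204e9800998ecf8427e  foo\\bar
$ md5sum < 'foo\bar'
d41d8cd98f00b204e9800998ecf8427e  -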

If you're concerned about screwy filenames:
md5sum < "${file_name}" | awk NF=1
f244e67ca3e71fff91cdf9b8bd3aa7a5
Other, messier ways to deal with this:
md5sum "${file_name}" | awk NF=NF OFS= FS=' .*$'
or
| awk '_{ exit }++_' RS=' '
f244e67ca3e71fff91cdf9b8bd3aa7a5
To do it entirely inside awk:
mawk 'BEGIN {
__ = ARGV[ --ARGC ]
_ = sprintf("%c",(_+=(_^=_<_)+_)^_+_*++_)
RS = FS
gsub(_,"&\\\\&",__)
( _=" md5sum < "((_)(__)_) ) | getline
print $(_*close(_)) }' "${file_name}"
f244e67ca3e71fff91cdf9b8bd3aa7a5

Well, I had the same problem today, but I was trying to get the file MD5 hash when running the find command.
I took the most voted answer and wrapped it in a function called md5 to run in the find command. The mission for me was to calculate the hash for all files in a folder and output it as hash:filename.
md5() { md5sum "$1" | awk '{ printf "%s",$1 }'; }
export -f md5
find -type f -exec bash -c 'md5 "$0"' {} \; -exec echo -n ':' \; -print
So I got some pieces from here and also from "'find -exec' a shell function in Linux".
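A single-pass variant that batches files into as few md5sum invocations as possible (a sketch, assuming GNU find and the default two-space output format of GNU md5sum, where the 32-character hash is followed by two spaces and the name):
find . -type f -exec md5sum {} + | awk '{ print substr($0,1,32) ":" substr($0,35) }'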

For the sake of completeness, a way with sed using a regular expression and a capture group:
md5=$(md5sum "${my_iso_file}" | sed -r 's:\\*([^ ]*).*:\1:')
The regular expression captures everything in a group until a space is reached. For the capture group to be the only output, the expression has to match the entire line, which is why it ends with .*.
(More about sed and capture groups here: How can I output only captured groups with sed?)
As delimiter in sed, I use colons because they are not valid in file paths and I don't have to escape the slashes in the filepath.

Another way:
md5=$(md5sum "${my_iso_file}" | sed 's/ .*//')

md5=$(md5sum < index.html | head -c -4)   # GNU head: drop the trailing "  -" and newline (4 bytes)

Related

Getting last X fields from a specific line in a CSV file using bash

I'm trying to get as bash variable list of users which are in my csv file. Problem is that number of users is random and can be from 1-5.
Example CSV file:
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
I would like to get something like
list_of_users="cat file.csv | grep "record2_data2" | <something> "
echo $list_of_users
user1,user2,user3,user4
I'm trying this:
cat file.csv | grep "record2_data2" | awk -F, -v OFS=',' '{print $4,$5,$6,$7,$8 }' | sed 's/"//g'
My result is:
user2,user3,user4,,
Question:
How to remove all "," from the end of my result? Sometimes it is just one but sometimes can be user1,,,,
Can I do it in better way? Users always starts after 3rd column in my file.
This will do what your code seems to be trying to do (print the users for a given string record2_data2 which only exists in the 2nd field):
$ awk -F',' '{gsub(/"/,"")} $2=="record2_data2"{sub(/([^,]*,){3}/,""); print}' file.csv
user1,user2,user3,user4
but I don't see how that's related to your question subject of Getting last X records from CSV file using bash so idk if it's what you really want or not.
Better to use a bash array, and join it into a CSV string when needed:
#!/usr/bin/env bash
readarray -t listofusers < <(cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u)
IFS=,
printf "%s\n" "${listofusers[*]}"
cut -d, -f4- file.csv | tr -d '"' | tr ',' $'\n' | sort -u is the important bit - it first only prints out the fourth and following fields of the CSV input file, removes quotes, turns commas into newlines, and then sorts the resulting usernames, removing duplicates. That output is then read into an array with the readarray builtin, and you can manipulate it and the individual elements however you need.
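With the sample file.csv above, the script prints:
user1,user2,user3,user4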
GNU sed solution, let file.csv content be
"record1_data1","record1_data2","record1_data3","user1","user2"
"record2_data1","record2_data2","record2_data3","user1","user2","user3","user4"
"record3_data1","record3_data2","record3_data3","user1"
then
sed -n -e 's/"//g' -e '/record2_data/ s/[^,]*,[^,]*,[^,]*,// p' file.csv
gives output
user1,user2,user3,user4
Explanation: -n turns off automatic printing. The expressions work as follows: the 1st substitutes (s) all " characters globally with the empty string, i.e., deletes them; the 2nd, for lines containing record2_data, substitutes everything up to and including the 3rd , with the empty string, i.e., deletes it, and prints (p) the changed line.
(tested in GNU sed 4.2.2)
awk -F',' '
/record2_data2/{
for(i=4;i<=NF;i++) o=sprintf("%s%s,",o,$i);
gsub(/"|,$/,"",o);
print o
}' file.csv
user1,user2,user3,user4
This might work for you (GNU sed):
sed -E '/record2_data/!d;s/"([^"]*)"(,)?/\1\2/4g;s///g' file
Delete all records except for that containing record2_data.
Remove double quotes from the fourth field onward.
Remove any double quoted fields.

Get the last field of a path using awk

I am trying to use awk to get the name of a file given the absolute path to the file.
For example, when given the input path /home/parent/child/filename I would like to get filename
I have tried:
awk -F "/" '{print $5}' input
which works perfectly.
However, I am hard coding $5 which would be incorrect if my input has the following structure:
/home/parent/child1/child2/filename
So a generic solution requires always taking the last field (which will be the filename).
Is there a simple way to do this with the awk substr function?
Use the fact that awk splits lines into fields based on a field separator that you can define. Hence, setting the field separator to /, you can say:
awk -F "/" '{print $NF}' input
as NF refers to the number of fields of the current record, printing $NF means printing the last one.
So given a file like this:
/home/parent/child1/child2/child3/filename
/home/parent/child1/child2/filename
/home/parent/child1/filename
This would be the output:
$ awk -F"/" '{print $NF}' file
filename
filename
filename
In this case it is better to use basename instead of awk:
$ basename /home/parent/child1/child2/filename
filename
If you're open to a Perl solution, here's one similar to fedorqui's awk solution:
perl -F/ -lane 'print $F[-1]' input
-F/ specifies / as the field separator
$F[-1] is the last element in the #F autosplit array
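For example:
$ echo '/home/parent/child1/child2/filename' | perl -F/ -lane 'print $F[-1]'
filename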
Another option is to use bash parameter substitution.
$ foo="/home/parent/child/filename"
$ echo ${foo##*/}
filename
$ foo="/home/parent/child/child2/filename"
$ echo ${foo##*/}
filename
Like 5 years late, I know, thanks for all the proposals, I used to do this the following way:
$ echo /home/parent/child1/child2/filename | rev | cut -d '/' -f1 | rev
filename
Glad to see there are better ways.
This should be a comment on the basename answer, but I don't have enough reputation points.
If you do not use double quotes, basename will not work with paths that contain a space character:
$ basename /home/foo/bar foo/bar.png
bar
ok with quotes " "
$ basename "/home/foo/bar foo/bar.png"
bar.png
file example
$ cat a
/home/parent/child 1/child 2/child 3/filename1
/home/parent/child 1/child2/filename2
/home/parent/child1/filename3
$ while read b ; do basename "$b" ; done < a
filename1
filename2
filename3
I know I'm like 3 years late on this but....
you should consider parameter expansion, it's built-in and faster.
if your input is in a var, let's say, $var1, just do ${var1##*/}. Look below
$ var1='/home/parent/child1/filename'
$ echo ${var1##*/}
filename
$ var1='/home/parent/child1/child2/filename'
$ echo ${var1##*/}
filename
$ var1='/home/parent/child1/child2/child3/filename'
$ echo ${var1##*/}
filename
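The complementary expansion, ${var1%/*}, trims from the end instead and yields the directory part:
$ echo ${var1%/*}
/home/parent/child1/child2/child3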
You can skip all of that complex regex:
echo '/home/parent/child1/child2/filename' |
mawk '$!_=$-_=$NF' FS='[/]'
filename
2nd to last :
mawk '$!--NF=$NF' FS='/'
child2
3rd last field :
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$--NF' FS='[/]'
child1
4th-last :
echo '/home/parent/child000/child00/child0/child1/child2/filename' |
mawk '$!--NF=$(--NF-!-FS)' FS='/'
child0
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$(--NF-!-FS)' FS='/'
parent
Major caveat: gawk/nawk have a slight discrepancy with mawk regarding how they track multiple, and potentially conflicting, decrements to NF, so other than the 1st solution for the last field, the rest are for now only applicable to mawk-1/2.
Just realized it's much much cleaner this way in mawk/gawk/nawk:
echo '/home/parent/child1/child2/filename' |
awk ++NF FS='.+/' OFS=   # updated such that root "/" still gets printed
filename
You can also use:
sed -n 's/.*\/\([^\/]\{1,\}\)$/\1/p'
or
sed -n 's/.*\/\([^\/]*\)$/\1/p'
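For example:
$ echo '/home/parent/child/filename' | sed -n 's/.*\/\([^\/]*\)$/\1/p'
filename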

Grep to exclude comment like # and -- with trailing spaces and within line

I am trying to grep for a word inside files that use # and -- as comments. The command I used is
grep "^[^#]" -H -R -I "pathtofile" | grep "^[^--]" | grep -in ${1} | awk -F : ' { print $2 } ' | uniq
which prints the names of the files containing a specific word. However, if there is a line like this
--test_specific_word_test test
the command above does not skip it. The same applies where the comment is inline with the code, like var=1 --comment.
Should I use sed to delete the comment lines first, or can I do this with grep alone?
The downside is that I have a significant number of files to search, GNU grep is version 2.0, and I can't upgrade it because I don't have permission.
The command you've provided uses grep 4 times. You can skip commented lines with a single grep command:
grep -v "^ *\(--\|#\)" "pathtofile"
To print the filenames containing word1 use cut like so:
grep -Hv "^ *\(--\|#\)" filenames | grep "word1" | cut -d: -f1
To skip inline comments use sed:
sed "s/\(.*\)\(--\|#\).*/\1/g" inputfile
Sample input:
word1
word2
-word3 # inline comment
#comment1
--comment2
#comment3
output:
word1
word2
-word3
If in fact you are attempting to parse a programming language's source files, you are probably better off using a proper parser. Here is an attempt at refactoring your code into an Awk script, with several guesses as to what exactly the script should actually do.
find pathtofile -type f -exec awk -v word="$1" -F : '
# this doesn't reimplement grep -I though
{ sub("(#|--).*", "") } # remove comments
tolower($0) ~ tolower(word) && !($2 in a) { print FILENAME ":" FNR ":" $2; a[$2] }' {} +
This has the obvious flaw that if the programming language allows for # or -- in quoted strings and doesn't regard those as comments, the script will do the wrong thing.
There are no word boundaries in your script, so I didn't put any in mine either. This means if word="dog" then it will print any string which contains the three adjacent letters d-o-g in this order, even in substring matches like "doggone" or "endogenous". If that's not what you want, you can add word boundary markers -- if you have GNU Awk, you can say BEGIN { word = "\\<" word "\\>" } at the beginning of the script; or see here.
The technique to add the key to an array and only print the key if it wasn't already in the array is a common way to implement uniq. This will fail if find returns so many files that it will end up running more than one instance of awk -- this will be controlled by the value of ARG_MAX of your kernel.
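The seen-array idiom for emulating uniq looks like this on its own (a minimal sketch):
$ printf '%s\n' a b a c b | awk '!($0 in seen) { print; seen[$0] }'
a
b
c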

Ignoring lines from grep matching any element in a bash array

I have an array (superStringIgnoreArray) containing superstrings like "formula", "forest", "foreign", "fortify", and I am running the following grep lines:
eval 'find "$SEARCH_DIR" -type f -print0 | xargs -0 grep -HniI "$hitWord" >> "$OUTPUT_FILE"'
eval 'find "$SEARCH_DIR" -type f -print0 | xargs -0 grep -HniI --color=always "$hitWord" | more'
Where hitWord in this instance would be "for".
How can I return all hits that do not match any entry in my superStringIgnoreArray? (so lines containing "for", "form", "fort" "fork" "forming" would be returned, but "fortify", "forest", etc would not).
Example output:
srcToSearch/open_source_licenses.txt:12:source software packages. One or more such open_source_licenses.txt files may there**for**e
srcToSearch/open_source_licenses.txt:19:-- **For** vCenter Server 5.5u2 GA, the license in**for**mation listed in Parts 2,
srcToSearch/open_source_licenses.txt:22:-- **For** vCenter Server on Linux Virtual Appliance 5.5u2 GA, the license
srcToSearch/open_source_licenses.txt:23:in**for**mation listed in Parts 1, 2, 3, 4, 5 and 6 are applicable.
srcToSearch/open_source_licenses.txt:29:document. This list is provided **for** your convenience; please read further if
grep + bash solution:
superStringIgnoreArray=("formula" "forest" "foreign" "fortify")
grep -HniIr "$hitWord" "$SEARCH_DIR"/* \
| grep -v -f <(printf '%s\n' "${superStringIgnoreArray[@]}") | tee "$OUTPUT_FILE"
Since you're outputting the filenames, chaining another grep won't be trivial, but you can achieve the same with awk:
$ grep -HniIFr "$hitWord" "$SEARCH_DIR" |
awk 'BEGIN {OFS=FS=":"}
NR==FNR {a[tolower($0)]; next}
{f=$1;n=$2;$1=$2="";
for(k in a) if(tolower($0)~k) next}
{$1=f;$2=n;print}' blacklist -
Here awk limits the matches to the text after the filename, using : as the delimiter. If your "hitWord" is a literal, adding -F will help. awk is still doing pattern matching, though. tolower() makes the second step case-insensitive too.
Since the delimiter ":" can appear within the body, we can't depend on $3 in awk; instead, store $1 and $2, remove them from the line, match, and add them back before printing. I guess at this point you could fold the first grep's functionality into this awk as well.
However, I think that without the -o flag, this and other line-based solutions will fail when there is an actual match and an unwanted match on the same line. If the unwanted superstrings are few, perhaps a negative lookbehind/lookahead pattern is a better solution.
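For instance, with the superstrings from this question, a single grep with a PCRE negative lookahead skips them per occurrence rather than per line (a sketch; requires GNU grep built with -P support, and note that -i makes the lookahead case-insensitive as well):
grep -HniIrP 'for(?!mula|est|eign|tify)' "$SEARCH_DIR"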
If your blacklist is not a file but an array, you can do file substitution as in the other answer, replace with
... | awk '...' <(printf '%s\n' "${superStringIgnoreArray[@]}") -

How to truncate trailing space in xargs

I would like to use xargs to list the contents of some files based on the output of command A. The xargs replace-str seems to be adding a space to the end, causing the command to fail. Any suggestions? I know this can be worked around using a for loop, but I'm curious how to do it using xargs.
lsscsi |awk -F\/ '/ATA/ {print $NF}' | xargs -L 1 -I % cat /sys/block/%/queue/scheduler
cat: /sys/block/sda /queue/scheduler: No such file or directory
The problem is not with xargs -I, which does not append a space to each argument, which can be verified as follows:
$ echo 'sda' | xargs -I % echo '[%]'
[sda]
Incidentally, specifying -L 1 in addition to -I is pointless: -I implies line-by-line processing.
Therefore, it must be the output from the command that provides input to xargs that contains the trailing space.
You can adapt your awk command to fix that:
lsscsi |
awk -F/ '/ATA/ {sub(/ $/,"", $NF); print $NF}' |
xargs -I % cat '/sys/block/%/queue/scheduler'
sub(/ $/,"", $NF) replaces a trailing space in field $NF with the empty string, thereby effectively removing it.
Note how I've (single-)quoted cat's argument so as to make it work even with filenames with spaces.
lsscsi |awk -F\/ '/ATA/ {print $NF}'| awk '{print $NF}' | xargs -L 1 -I % cat /sys/block/%/queue/scheduler
The first awk statement splits on "/", so everything after the last slash is one field; in this case the field is "sda ", including the trailing space. By default, awk uses whitespace as the field delimiter, so after the pipe the second awk prints $NF (the last word of the line) and drops the trailing space. awk '{ print $1 }' would do the same, because there is only one word, "sda", which is both the first and the last.
