using awk within loop to replace field - bash

I have written a script finding the hash value from a dictionary and outputting it in the form "word:md5sum" for each word. I then have a file of names which I would like to use to place each name followed by every hash value i.e.
tom:word1hash
tom:word2hash
.
.
bob:word1hash
and so on. Everything works fine but I can not figure out the substitution. Here is my script.
$#!/bin/bash
#/etc/dictionaries-common/words
cat words.txt | while read line; do echo -n "$line:" >> dbHashFile.txt
echo "$line" | md5sum | sed 's/[ ]-//g' >> dbHashFile.txt; done
cat users.txt | while read name
do
cat dbHashFile.txt >> nameHash.txt;
awk '{$1="$name"}' nameHash.txt;
cat nameHash.txt >> dbHash.txt;
done
the line
$awk '{$1="$name"}' nameHash.txt;
is where I attempt to do the substitution.
thank you for your help

Try replacing the entire contents of the last loop (both cats and the awk) with:
awk -v name="$name" -F ':' '{ print name ":" $2 }' dbHashFile.txt >>dbHash.txt

Related

Bash Shell: Infinite Loop

The problem is the following I have a file that each line has this form:
id|lastName|firstName|gender|birthday|joinDate|IP|browser
i want to sort alphabetically all the firstnames in that file and print them one on each line but each name only once
i have created the following program but for some reason it creates an infinite loop:
array1=()
while read LINE
do
if [ ${LINE:0:1} != '#' ]
then
IFS="|"
array=($LINE)
if [[ "${array1[#]}" != "${array[2]}" ]]
then
array1+=("${array[2]}")
fi
fi
done < $3
echo ${array1[#]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
NOTES
if [ ${LINE:0:1} != '#' ] : this command is used because there are comments in the file that i dont want to print
$3 : filename
array1 : is used for all the seperate names
Wow, there's a MUCH simpler and cleaner way to achieve this, without having to mess with the IFS variable or using arrays. You can use "for" to do this:
First I created a file with the same structure as yours:
$ cat file
id|lastName|Douglas|gender|birthday|joinDate|IP|browser
id|lastName|Tim|gender|birthday|joinDate|IP|browser
id|lastName|Andrew|gender|birthday|joinDate|IP|browser
id|lastName|Sasha|gender|birthday|joinDate|IP|browser
#id|lastName|Carly|gender|birthday|joinDate|IP|browser
id|lastName|Madson|gender|birthday|joinDate|IP|browser
Here's the script I wrote using "for":
#!/bin/bash
for LINE in `cat file | grep -v "^#" | awk -F'|' '{print$3}' | sort -u`
do
echo $LINE
done
And here's the output of this script:
$ ./script.sh
Andrew
Douglas
Madson
Sasha
Tim
Explanation:
for LINE in `cat file`
Creates a loop that reads each line of "file". The commands between ` are run by linux, for example, if you wanted to store the date inside of a variable you could use "VARDATE=`date`".
grep -v "^#"
The option -v is used to exclude results matching the pattern, in this case the pattern is "^#". The "^" character means "line begins with". So grep -v "^#" means "exclude lines beginning with #".
awk -F'|' '{print$3}'
The -F option switches the column delimiter from the default (the default is a space) to whatever you put between ' after it, in this case the "|" character.
The '{print$3}' prints the 3rd column.
sort -u
And the "sort -u" command to sort the names alphabetically.

Unix file pattern issue: append changing value of variable pattern to copies of matching line

I have a file with contents:
abc|r=1,f=2,c=2
abc|r=1,f=2,c=2;r=3,f=4,c=8
I want a result like below:
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|3
The third column value is r value. A new line would be inserted for each occurrence.
I have tried with:
for i in `cat $xxxx.txt`
do
#echo $i
live=$(echo $i | awk -F " " '{print $1}')
home=$(echo $i | awk -F " " '{print $2}')
echo $live
done
but is not working properly. I am a beginner to sed/awk and not sure how can I use them. Can someone please help on this?
awk to the rescue!
$ awk -F'[,;|]' '{c=0;
for(i=2;i<=NF;i++)
if(match($i,/^r=/)) a[c++]=substr($i,RSTART+2);
delim=substr($0,length($0))=="|"?"":"|";
for(i=0;i<c;i++) print $0 delim a[i]}' file
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|3
Use an inner routine (made up of GNU grep, sed, and tr) to compile a second more elaborate sed command, the output of which needs further cleanup with more sed. Call the input file "foo".
sed -n $(grep -no 'r=[0-9]*' foo | \
sed 's/^[0-9]*/&s#.*#\&/;s/:r=/|/;s/.*/&#p;/' | \
tr -d '\n') foo | \
sed 's/|[0-9|]*|/|/'
Output:
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|3
Looking at the inner sed code:
grep -no 'r=[0-9]*' foo | \
sed 's/^[0-9]*/&s#.*#\&/;s/:r=/|/;s/.*/&#p;/' | \
tr -d '\n'
It's purpose is to parse foo on-the-fly (when foo changes, so will the output), and in this instance come up with:
1s#.*#&|1#p;2s#.*#&|1#p;2s#.*#&|3#p;
Which is almost perfect, but it leaves in old data on the last line:
sed -n '1s#.*#&|1#p;2s#.*#&|1#p;2s#.*#&|3#p;' foo
abc|r=1,f=2,c=2|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1
abc|r=1,f=2,c=2;r=3,f=4,c=8|1|3
...which old data |1 is what the final sed 's/|[0-9|]*|/|/' removes.
Here is a pure bash solution. I wouldn't recommend actually using this, but it might help you understand better how to work with files in bash.
# Iterate over each line, splitting into three fields
# using | as the delimiter. (f3 is only there to make
# sure a trailing | is not included in the value of f2)
while IFS="|" read -r f1 f2 f3; do
# Create an array of variable groups from $f2, using ;
# as the delimiter
IFS=";" read -a groups <<< "$f2"
for group in "${groups[#]}"; do
# Get each variable from the group separately
# by splitting on ,
IFS=, read -a vars <<< "$group"
for var in "${vars[#]}"; do
# Split each assignment on =, create
# the variable for real, and quit once we
# have found r
IFS== read name value <<< "$var"
declare "$name=$value"
[[ $name == r ]] && break
done
# Output the desired line for the current value of r
printf '%s|%s|%s\n' "$f1" "$f2" "$r"
done
done < $xxxx.txt
Changes for ksh:
read -A instead of read -a.
typeset instead of declare.
If <<< is a problem, you can use a here document instead. For example:
IFS=";" read -A groups <<EOF
$f2
EOF

How to remove a filename from the list of path in Shell

I would like to remove a file name only from the following configuration file.
Configuration File -- test.conf
knowledgebase/arun/test.rf
knowledgebase/arunraj/tester/test.drl
knowledgebase/arunraj2/arun/test/tester.drl
The above file should be read. And removed contents should went to another file called output.txt
Following are my try. It is not working to me at all. I am getting empty files only.
#!/bin/bash
file=test.conf
while IFS= read -r line
do
# grep --exclude=*.drl line
# awk 'BEGIN {getline line ; gsub("*.drl","", line) ; print line}'
# awk '{ gsub("/",".drl",$NF); print line }' arun.conf
# awk 'NF{NF--};1' line arun.conf
echo $line | rev | cut -d'/' -f 1 | rev >> output.txt
done < "$file"
Expected Output :
knowledgebase/arun
knowledgebase/arunraj/tester
knowledgebase/arunraj2/arun/test
There's the dirname command to make it easy and reliable:
#!/bin/bash
file=test.conf
while IFS= read -r line
do
dirname "$line"
done < "$file" > output.txt
There are Bash shell parameter expansions that will work OK with the list of names given but won't work reliably for some names:
file=test.conf
while IFS= read -r line
do
echo "${line%/*}"
done < "$file" > output.txt
There's sed to do the job — easily with the given set of names:
sed 's%/[^/]*$%%' test.conf > output.txt
It's harder if you have to deal with names like /plain.file (or plain.file — the same sorts of edge cases that trip up the shell expansion).
You could add Perl, Python, Awk variants to the list of ways of doing the job.
You can get the path like this:
path=${fullpath%/*}
It cuts away the string after the last /
Using awk one liner you can do this:
awk 'BEGIN{FS=OFS="/"} {NF--} 1' test.conf
Output:
knowledgebase/arun
knowledgebase/arunraj/tester
knowledgebase/arunraj2/arun/test

I want to re-arrange a file in an order in shell

I have a file test.txt like below spaces in between each record
service[1.1],parttion, service[1.2],parttion, service[1.3],parttion, service[2.1],parttion, service2[2.2],parttion,
Now I want to rearrange it as below into a output.txt
COMPOSITES=parttion/service/1.1,parttion/service/1.2,parttion/service/1.3,parttion/service/2.1,parttion/service/2.2
I've tried:
final_str=''
COMPOSITES=''
# Re-arranging the composites and preparing the composite property file
while read line; do
partition_val="$(echo $line | cut -d ',' -f 2)"
composite_temp1_val="$(echo $line | cut -d ',' -f 1)"
composite_val="$(echo $composite_temp1_val | cut -d '[' -f 1)"
version_temp1_val="$(echo $composite_temp1_val | cut -d '[' -f 2)"
version_val="$(echo $version_temp1_val | cut -d ']' -f 1)"
final_str="$partition_val/$composite_val/$version_val,"
COMPOSITES=$COMPOSITES$final_str
done <./temp/test.txt
We start with the file:
$ cat test.txt
service[1.1],parttion, service[1.2],parttion, service[1.3],parttion, service[2.1],parttion, service2[2.2],parttion,
We can rearrange that file as follows:
$ awk -F, -v RS=" " 'BEGIN{printf "COMPOSITES=";} {gsub(/[[]/, "/"); gsub(/[]]/, ""); if (NF>1) printf "%s%s/%s",NR==1?"":",",$2,$1;}' test.txt
COMPOSITES=parttion/service/1.1,parttion/service/1.2,parttion/service/1.3,parttion/service/2.1,parttion/service2/2.2
The same command split over multiple lines is:
awk -F, -v RS=" " '
BEGIN{
printf "COMPOSITES=";
}
{
gsub(/[[]/, "/")
gsub(/[]]/, "")
if (NF>1) printf "%s%s/%s",NR==1?"":",",$2,$1
}
' test.txt
Here's what I came up with.
awk -F '[],[]' -v RS=" " 'BEGIN{printf("COMPOSITES=")}/../{printf("%s/%s/%s,",$4,$1,$2);}' test.txt
Broken out for easier reading:
awk -F '[],[]' -v RS=" " '
BEGIN {
printf("COMPOSITES=");
}
/../ {
printf("%s/%s/%s,",$4,$1,$2);
}' test.txt
More detailed explanation of the script:
-F '[],[]' - use commas or square brackets as field separators
-v RS=" " - use just the space as a record separator
'BEGIN{printf("COMPOSITES=")} - starts your line
/../ - run the following code on any line that has at least two characters. This avoids the empty field at the end of a line terminating with a space.
printf("%s/%s/%s,",$4,$1,$2); - print the elements using a printf() format string that matches the output you specified.
As concise as this is, the format string does leave a trailing comma at the end of the line. If this is a problem, it can be avoided with a bit of extra code.
You could also do this in sed, if you like writing code in line noise.
sed -e 's:\([^[]*\).\([^]]*\).,\([^,]*\), :\3/\1/\2,:g;s/^/COMPOSITES=/;s/,$//' test.txt
Finally, if you want to avoid external tools like sed and awk, you can do this in bash alone:
a=($(<test.txt))
echo -n "COMPOSITES="
for i in "${a[#]}"; do
i="${i%,}"
t="${i%]*}"
printf "%s/%s/%s," "${i#*,}" "${i%[*}" "${t#*[}"
done
echo ""
This slurps the contents of test.txt into an array, which means your input data must be separated by whitespace, per your example. It then adds the prefix, then steps through the array, using Parameter Expansion to massage the data into the fields you need. The last line (echo "") is helpful for testing; you may want to eliminate it in practice.

Bash script read specifc value from files of an entire folder

I have a problem creating a script that reads specific value from all the files of an entire folder
I have a number of email files in a directory and I need to extract from each file, 2 specific values.
After that I have to put them into a new file that looks like that:
--------------
To: value1
value2
--------------
This is what I want to do, but I don't know how to create the script:
# I am putting the name of the files into a temp file
`ls -l | awk '{print $9 }' >tmpfile`
# use for the name of a file
`date=`date +"%T"
# The first specific value from file (phone number)
var1=`cat tmpfile | grep "To: 0" | awk '{print $2 }' | cut -b -10 `
# The second specific value from file(subject)
var2=cat file | grep Subject | awk '{print $2$3$4$5$6$7$8$9$10 }'
# Put the first value in a new file on the first row
echo "To: 4"$var1"" > sms-$date
# Put the second value in the same file on the second row
echo ""$var2"" >>sms-$date
.......
and do the same for every file in the directory
I tried using while and for functions but I couldn't finalize the script
Thank You
I've made a few changes to your script, hopefully they will be useful to you:
#!/bin/bash
for file in *; do
var1=$(awk '/To: 0/ {print substr($2,0,10)}' "$file")
var2=$(awk '/Subject/ {for (i=2; i<=10; ++i) s=s$i; print s}' "$file")
outfile="sms-"$(date +"%T")
i=0
while [ -f "$outfile" ]; do outfile="sms-$date-"$((i++)); done
echo "To: 4$var1" > "$outfile"
echo "$var2" >> "$outfile"
done
The for loop just goes through every file in the folder that you run the script from.
I have added added an additional suffix $i to the end of the file name. If no file with the same date already exists, then the file will be created without the suffix. Otherwise the value of $i will keep increasing until there is no file with the same name.
I'm using $( ) rather than backticks, this is just a personal preference but it can be clearer in my opinion, especially when there are other quotes about.
There's not usually any need to pipe the output of grep to awk. You can do the search in awk using the / / syntax.
I have removed the cut -b -10 and replaced it with substr($2, 0, 10), which prints the first 10 characters from column 2.
It's not much shorter but I used a loop rather than the $2$3..., I think it looks a bit neater.
There's no need for all the extra " in the two output lines.
I sugest to try the following:
#!/bin/sh
RESULT_FILE=sms-`date +"%T"`
DIR=.
fgrep -l 'To: 0' "$DIR" | while read FILE; do
var1=`fgrep 'To: 0' "$FILE" | awk '{print $2 }' | cut -b -10`
var2=`fgrep 'Subject' "$FILE" | awk '{print $2$3$4$5$6$7$8$9$10 }'`
echo "To: 4$var1" >>"$RESULT_FIL"
echo "$var2" >>"$RESULT_FIL"
done

Resources