I have a file for example with the name file.csv and content
adult,REZ
man,BRB
women,SYO
animal,HIJ
and a line that is nor a directory nor a file
file.csv BRB1 REZ3 SYO2
And what I want to do is change the content of the file with the words that are on the line and then get the nth letter of that word with the number at the end of the those words in capital
and the output should then be
umo
I know that I can get over the line with
for i in "${#:2}"
do
words+=$(echo "$i ")
done
and then the output is
REZ3 BRB1 SYO2
Using awk:
Pass the string of values as an awk variable and then split them into an array a. For each record in file.csv, iterate this array and if the second field of current record matches the first three characters of the current array value, then strip the target character from the first field of the current record and append it to a variable. Print the value of the aggregated variable.
awk -v arr="BRB1 REZ3 SYO2" -F, 'BEGIN{split(arr,a," ")} {for (v in a) { if ($2 == substr(a[v],0,3)) {n=substr(a[v],length(a[v]),1); w=w""substr($1,n,1) }}} END{print w}' file.csv
umo
You can also put this into a script:
#!/bin/bash
words="${2}"
src_file="${1}"
awk -v arr="$words" -F, 'BEGIN{split(arr,a," ")} \
{for (v in a) { \
if ($2 == substr(a[v],0,3)) { \
n=substr(a[v],length(a[v]),1); \
w=w""substr($1,n,1);
}
}
} END{print w}' "$src_file"
Script execution:
./script file.csv "BRB1 REZ3 SYO2"
umo
This is a way using sed.
Create a pattern string from command arguments and convert lines with sed.
#!/bin/bash
file="$1"
pat='s/^/ /;Te;'
for i in ${#:2}; do
pat+=$(echo $i | sed 's#^\([^0-9]*\)\([0-9]*\)$#s/.\\{\2\\}\\(.\\).*,\1$/\\1/;#')
done
pat+='Te;H;:e;${x;s/\n//g;p}'
eval "sed -n '$pat' $file"
Try this code:
#!/bin/bash
declare -A idx_dic
filename="$1"
pattern_string=""
for i in "${#:2}";
do
pattern_words=$(echo "$i" | grep -oE '[A-Z]+')
index=$(echo "$i" | grep -oE '[0-9]+')
pattern_string+=$(echo "$pattern_words|")
idx_dic["$pattern_words"]="$index"
done
pattern_string=${pattern_string%|*}
while IFS= read -r line
do
line_pattern=$(echo $line | grep -oE $pattern_string)
[[ -n $line_pattern ]] && line_index="${idx_dic[$line_pattern]}" && echo $line | awk -v i="$line_index" '{split($0, chars, ""); printf("%s", chars[i]);}'
done < $filename
first find the capital words pattern and catch the index corresponding
then construct the hole pattern words string which connect with |
at last, iterate the every line according to the pattern string, and find the letter by the index
Execute this script.sh like:
bash script.sh file.csv BRB1 REZ3 SYO2
I have to optimize a shell script, but after one week, i didn't succeed to optimize it enough.
I have to search recursively for .c .h and .cpp file in a directory, and check if word like this exist:
"float short unsigned continue for signed void default goto sizeof volatile do if static while"
words=$(echo $# | sed 's/ /\\|/g')
files=$(find $dir -name '*.cpp' -o -name '*.c' -o -name '*.h' )
for file in $files; do
(
test=$(grep -woh "$words" "$file" | sort -u | awk '{print}' ORS=' ')
if [ "$test" != "" ] ; then
echo "$(realpath $file) contains : $test"
fi
)&
done
wait
I have tried with xargs and -exec, but no result, i have to keep this result format:
/usr/include/c++/6/bits/stl_set.h contains : default for if void
Maybe you can help me (to optimize it)..
EDIT: I have to keep one occurence of each word
YES: while, for, volatile...
NOPE: while, for, for, volatile...
If you are interested in finding all files that have at least one match of any of your patterns, you can use globstar:
shopt -s globstar
oldIFS=$IFS; IFS='|'; patterns="$*"; IFS=$oldIFS # make a | delimited string from arguments
grep -lwE "$patterns" **/*.c **/*.h **/*.cpp # list files with matching patterns
globstar
If set, the pattern ‘**’ used in a filename expansion context
will match all files and zero or more directories and subdirectories.
If the pattern is followed by a ‘/’, only directories and
subdirectories match.
Here is an approach that keeps your desired format while eliminating the use of find and bash looping:
words='float|short|unsigned|continue|for|signed|void|default|goto|sizeof|volatile|do|if|static|while'
grep -rwoE --include '*.[ch]' --include '*.cpp' "$words" path | awk -F: '$1!=last{printf "%s%s: contains %s",r,$1,$2; last=$1; r=ORS; delete a; a[$2]} $1==last && !($2 in a){printf " %s",$2; a[$2]} END{print""}'
How it works
grep -rwoE --include '*.[ch]' --include '*.cpp' "$words" path
This searches recursively through directories starting with path looking only in files whose names match the globs *.[ch] or *.cpp.
awk -F: '$1!=last{printf "%s%s: contains %s",r,$1,$2; last=$1; r=ORS; delete a; a[$2]} $1==last{printf " %s",$2} END{print""}'
This awk command reformats the output of grep to match your desired output. The script uses a variable last and array a. last keeps track of which file we are on and a contains the list of words seen so far. In more detail:
-F:
This tells awk to use : as the field separator. In this way, the first field is the file name and the second is the word that is found. (limitation: file names that include : are not supported.)
'$1!=last{printf "%s%s: contains %s",r,$1,$2; last=$1; r=ORS; delete a; a[$2]}
Every time that the file name, $1, does not match the variable last, we start the output for a new file. Then, we update last to contain the name of this new file. We then delete array a and then assign key $2 to a new array a.
$1==last && !($2 in a){printf " %s",$2; a[$2]}
If the current file name is the same as the previous and the current word has not been seen before, we print out the new word found. We also add this word, $2 as a key to array a.
END{print""}
This prints out a final newline (record separator) character.
Multiline version of code
For those who prefer their code spread out over multiple lines:
grep -rwoE \
--include '*.[ch]' \
--include '*.cpp' \
"$words" path |
awk -F: '
$1!=last{
printf "%s%s: contains %s",r,$1,$2
last=$1
r=ORS
delete a
a[$2]
}
$1==last && !($2 in a){
printf " %s",$2; a[$2]
}
END{
print""
}'
You should be able to do most of this with a single grep command:
grep -Rw $dir --include \*.c --include \*.h --include \*.cpp -oe "$words"
This will put it in file:word format, so all that's left is to change it to produce the output that you want.
echo $output | sed 's/:/ /g' | awk '{print $1 " contains : " $2}'
Then you can add | sort -u to get only single occurrences for each word in each file.
#!/bin/bash
#dir=.
words=$(echo $# | sed 's/ /\\|/g')
grep -Rw $dir --include \*.c --include \*.h --include \*.cpp -oe "$words" \
| sort -u \
| sed 's/:/ /g' \
| awk '{print $1 " contains : " $2}'
I have a file test.txt like below spaces in between each record
service[1.1],parttion, service[1.2],parttion, service[1.3],parttion, service[2.1],parttion, service2[2.2],parttion,
Now I want to rearrange it as below into a output.txt
COMPOSITES=parttion/service/1.1,parttion/service/1.2,parttion/service/1.3,parttion/service/2.1,parttion/service/2.2
I've tried:
final_str=''
COMPOSITES=''
# Re-arranging the composites and preparing the composite property file
while read line; do
partition_val="$(echo $line | cut -d ',' -f 2)"
composite_temp1_val="$(echo $line | cut -d ',' -f 1)"
composite_val="$(echo $composite_temp1_val | cut -d '[' -f 1)"
version_temp1_val="$(echo $composite_temp1_val | cut -d '[' -f 2)"
version_val="$(echo $version_temp1_val | cut -d ']' -f 1)"
final_str="$partition_val/$composite_val/$version_val,"
COMPOSITES=$COMPOSITES$final_str
done <./temp/test.txt
We start with the file:
$ cat test.txt
service[1.1],parttion, service[1.2],parttion, service[1.3],parttion, service[2.1],parttion, service2[2.2],parttion,
We can rearrange that file as follows:
$ awk -F, -v RS=" " 'BEGIN{printf "COMPOSITES=";} {gsub(/[[]/, "/"); gsub(/[]]/, ""); if (NF>1) printf "%s%s/%s",NR==1?"":",",$2,$1;}' test.txt
COMPOSITES=parttion/service/1.1,parttion/service/1.2,parttion/service/1.3,parttion/service/2.1,parttion/service2/2.2
The same command split over multiple lines is:
awk -F, -v RS=" " '
BEGIN{
printf "COMPOSITES=";
}
{
gsub(/[[]/, "/")
gsub(/[]]/, "")
if (NF>1) printf "%s%s/%s",NR==1?"":",",$2,$1
}
' test.txt
Here's what I came up with.
awk -F '[],[]' -v RS=" " 'BEGIN{printf("COMPOSITES=")}/../{printf("%s/%s/%s,",$4,$1,$2);}' test.txt
Broken out for easier reading:
awk -F '[],[]' -v RS=" " '
BEGIN {
printf("COMPOSITES=");
}
/../ {
printf("%s/%s/%s,",$4,$1,$2);
}' test.txt
More detailed explanation of the script:
-F '[],[]' - use commas or square brackets as field separators
-v RS=" " - use just the space as a record separator
'BEGIN{printf("COMPOSITES=")} - starts your line
/../ - run the following code on any line that has at least two characters. This avoids the empty field at the end of a line terminating with a space.
printf("%s/%s/%s,",$4,$1,$2); - print the elements using a printf() format string that matches the output you specified.
As concise as this is, the format string does leave a trailing comma at the end of the line. If this is a problem, it can be avoided with a bit of extra code.
You could also do this in sed, if you like writing code in line noise.
sed -e 's:\([^[]*\).\([^]]*\).,\([^,]*\), :\3/\1/\2,:g;s/^/COMPOSITES=/;s/,$//' test.txt
Finally, if you want to avoid external tools like sed and awk, you can do this in bash alone:
a=($(<test.txt))
echo -n "COMPOSITES="
for i in "${a[#]}"; do
i="${i%,}"
t="${i%]*}"
printf "%s/%s/%s," "${i#*,}" "${i%[*}" "${t#*[}"
done
echo ""
This slurps the contents of test.txt into an array, which means your input data must be separated by whitespace, per your example. It then adds the prefix, then steps through the array, using Parameter Expansion to massage the data into the fields you need. The last line (echo "") is helpful for testing; you may want to eliminate it in practice.
My file looks like
//
[297]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,(((23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0468499,4:0.0855423):0.0451632,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.123648,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,(47:0.0275497,39:0.0275497):0.0125652):0.106275,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0320701):0.0345064):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.0884852):0.0576977):0.0378275):0.552713);
[2271]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,(((23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0468499,4:0.0855423):0.0451632,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((47:0.0363305,(((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.0118116):0.111837,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,39:0.0401149):0.106275,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0320701):0.0345064):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.0884852):0.0576977):0.0378275):0.552713);
[687]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,((4:0.128716,(23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0900232):0.0019898,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((47:0.0363305,(((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.0118116):0.111837,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,39:0.0401149):0.106275,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0320701):0.0345064):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.0884852):0.0576977):0.0378275):0.552713);
[186]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,((4:0.128716,(23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0900232):0.0019898,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((47:0.0363305,(((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.0118116):0.111837,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0665766):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,39:0.0401149):0.0339623,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.00916857):0.0793167):0.0576977):0.0378275):0.552713);
So after the first line every line starts with a number in brackets. I would need to grep the number in brackets and output it into a new file (without [) ..how would that be done>
grep -Po '(?<=\[)\d+(?=\])' file > new_file
-P for Perl regexs so it is possible to use:
\d for a digit
positive lookbehind and positive lookahead ((?<=\[) and (?=\]))
-o for only matching
Another possibility if your grep doesn't support the -P option but awk is available could be this:
awk -F '[][]' '{ if ($2 != "") print $2 }' file > new_file
-F tells awk to accept both ] and [ as a field delimiter, $2 then contains the number you want and is printed.
In three steps using simple commands:
grep -v "//" inputfile | cut -d"[" -f2 | cut -d"]" -f1
In sed can you remove everything outside the []:
grep -v "//" inputfile | sed 's/.*\[\(.*\)].*/\1/'