Switch column if value is found in an array - bash

If the table contains a string from the file, I need to replace the match with a '-' and then change column four to whatever column two had.
I have the following .txt file:
0
1
2
and I have a csv:
carrot, 0, cat, r
orange, 2, cat, m
banana, 4, robin, d
output:
carrot, -, cat, 0
orange, -, cat, 2
banana, 4, robin, d
What I've currently got is a for loop that goes through the csv file line by line and uses grep to check whether it contains the word; if so, I replace it with a dash. I think this method is very inefficient and was wondering if there is a better one.
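For reference, the kind of grep-per-line loop described above might look like the sketch below (the original script isn't shown, so the details are illustrative); it forks several processes per csv line, which is what makes it slow compared to a single awk pass:

while IFS= read -r line; do
    # hypothetical reconstruction of the loop-and-grep approach
    col2=$(printf '%s\n' "$line" | cut -d, -f2 | tr -d ' ')
    if grep -qx "$col2" file.txt; then
        printf '%s\n' "$line" | awk 'BEGIN{FS=OFS=", "}{ $4 = $2; $2 = "-" }1'
    else
        printf '%s\n' "$line"
    fi
done < file.csv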

This is a classic case for the awk tool:
awk 'BEGIN{ FS = OFS = ", " }                 # fields are separated by comma + space
NR == FNR{ a[$1]; next }                      # 1st file (file.txt): store its values as array keys
{ if ($2 in a) { $4 = $2; $2 = "-" } }1' file.txt file.csv
The output:
carrot, -, cat, 0
orange, -, cat, 2
banana, 4, robin, d
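Note that awk does not edit the file in place by default; to save the result back into file.csv, redirect to a temporary file and move it over (GNU awk 4.1+ alternatively offers -i inplace). A minimal sketch:

awk 'BEGIN{ FS = OFS = ", " }
NR == FNR{ a[$1]; next }
{ if ($2 in a) { $4 = $2; $2 = "-" } }1' file.txt file.csv > tmp.csv && mv tmp.csv file.csv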

Related

Check if there is a space in a specific range of indexes in a line in a file and replace the space with 0

I have the following issue which I am facing: I have several lines in a file, an example of which is shown below.
The values between positions 4-10 can contain any combination of values and spaces, and a position has to be replaced with 0 only if it is an empty space.
#input.txt
014       789000
234455    899800
1213839489404040
In the example lines above, we can see that we have empty spaces between positions 4-10. These positions to check are fixed. I want to be able to check whether every line in my file has empty spaces between positions 4-10 and, if there is an empty space, replace it with 0. I have provided below the desired output in the file.
#desired output in the input.txt
0140000000789000 # note that 0s have been added in positions 4-10
2344550000899800 # note that 0s have been added in positions 7-10
1213839489404040
I was able to do the following in my code below. The code is only able to add values at a specified position using the sed command, and I want to modify it so that it can do the task mentioned above.
Could someone please help me?
My Code:
#append script
function append_vals_to_line {
    insert_pos=$(($1 - 1))
    sed -i 's/\(.\{'$insert_pos'\}\)/\1'$2'/' "$3"
}
column_num_to_insert="$1"
append_value="$2"
input_file_to_append_vals_in="$3"
append_vals_to_line "$column_num_to_insert" "$append_value" "$input_file_to_append_vals_in"
I suggest an awk solution for this:
awk '{s = substr($0, 4, 7); gsub(/ /, 0, s); print substr($0, 1, 3) s substr($0, 11)}' file
#output
0140000000789000
2344550000899800
1213839489404040
A more readable version:
awk '{
s = substr($0, 4, 7)
gsub(/ /, 0, s)
print substr($0, 1, 3) s substr($0, 11)
}' file
This command first extracts the substring from the 4th to the 10th position into variable s. Then, using gsub, we replace each space with a 0, and finally we recompose and print the whole line.
Consider this awk command that takes the start and end positions as arguments:
awk -v p1=4 -v p2=10 '{
s = substr($0, p1, p2-p1+1)
gsub(/ /, 0, s)
print substr($0, 1, p1-1) s substr($0, p2+1)
}' file
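To test it quickly without a file (assuming the first sample line really has seven spaces in positions 4-10, as the desired output implies):

$ printf '014       789000\n' | awk -v p1=4 -v p2=10 '{
s = substr($0, p1, p2-p1+1)
gsub(/ /, 0, s)
print substr($0, 1, p1-1) s substr($0, p2+1)
}'
0140000000789000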
You can do:
awk '
/[[:blank:]]/{
sub(/[[:blank:]]*$/,"") # remove trailing spaces if any
gsub(/[[:blank:]]/,"0") # replace each /[[:blank:]]/ with "0"
} 1' file
Alternatively, you can do:
awk '
length($2) {
$0=sprintf("%s%0*d",$1,16-length($1),$2)
} 1' file
Either prints:
0140000000789000
2344550000899800
1213839489404040
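The second variant relies on printf's * width specifier: %0*d zero-pads the numeric field $2 to a computed width, here 16-length($1), so that the first field plus the padded number is always 16 characters. A quick check of just that mechanism:

$ awk 'BEGIN{ printf "%s%0*d\n", "014", 16 - length("014"), 789000 }'
0140000000789000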

awk: math operations on multi-column data in multiple CSV files

I am working on a bash script that loops over multi-column data files and executes embedded AWK code to operate on the multi-column data.
#!/bin/bash
home="$PWD"
# folder with the outputs
rescore="${home}"/rescore
# folder with the folders to analyse
storage="${home}"/results
while read -r d; do
    awk -F ", *" ' # set field separator to comma, followed by 0 or more whitespaces
    FNR==1 {
        if (n) {                  # calculate the results of the previous file
            f=                    # apply this equation to rescore data using values of $3 and $2
            f[suffix] = f         # store the results in the array
            n=$1                  # take ID of the column
        }
        prefix=suffix=FILENAME
        sub(/_.*/, "", prefix)
        sub(/\/[^\/]+$/, "", suffix)
        sub(/^.*_/, "", suffix)
        n = 1                     # count of samples
        min = 0                   # lowest value of $3 (assuming all $3 < 0)
    }
    FNR > 1 {
        s += $3
        s2 += $3 * $3
        ++n
        if ($3 < min) min = $3    # update the lowest value
    }
    END {
        print "ID" prefix, rescoring
        for (i in n)
            printf "%s %.2f\n", i, f[i]
    }' "${d}_"*/input.csv > "${rescore}/"${d%%_*}".csv"
done < <(find . -maxdepth 1 -type d -name '*_*_*' | awk -F '[_/]' '!seen[$2]++ {print $2}')
Briefly, the workflow should process each line of the input.csv located inside the ${d} folder, which has been correctly identified by my bash script:
# input.csv located in the folder 10V1_cne_lig12
ID, POP, dG
1, 142, -5.6500 # this is dG(min)
2, 10, -5.5000
3, 2, -4.9500
4, 150, -4.1200
My AWK script is expected to process each line of each CSV file, reducing it to two columns and keeping in the output: i) the number from the first column of input.csv (the ID of the processed line) plus the name of the folder ($d) containing the CSV file, and ii) the result of the math operation (f) applied to the numbers in the POP and dG columns of input.csv:
f(ID) = sqrt(((dG(ID) + 10)/10)^2 + ((POP(ID) - 240)/240)^2)
where dG(ID) is the dG value ($3) of the "rescored" line of input.csv, and POP(ID) is its POP value ($2). Eventually, the output.csv containing the information for one input.csv should be in the following format:
# output.csv
ID, rescore value
1 10V1_cne_lig12, f(ID1)
2 10V1_cne_lig12, f(ID2)
3 10V1_cne_lig12, f(ID3)
4 10V1_cne_lig12, f(ID4)
While the bash part of my code (dealing with the looping over the CSVs in the distinct directories) works correctly, I am stuck with the AWK code, which does not correctly assign the ID of each line, so I cannot apply the demonstrated math operation to the $2 and $3 columns of the line with the specified ID.
Given the input file folder/file:
ID, POP, dG
1, 142, -5.6500
2, 10, -5.5000
3, 2, -4.9500
4, 150, -4.1200
this script
$ awk -F', *' -v OFS=', ' '
FNR==1 {path=FILENAME; sub(/\/[^/]+$/,"",path); print $1,"rescore value"; next}
{print $1" "path, sqrt((($3+10)/10)^2+(($2-240)/240)^2)}' folder/file
will produce
ID, rescore value
1 folder, 0.596625
2 folder, 1.05873
3 folder, 1.11285
4 folder, 0.697402
Not sure what the rest of your code does, but I guess you can integrate it in.
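If it helps, here is a sketch of how this snippet might slot back into the bash loop from the question (assuming the same directory layout and variables as above; untested against the real tree):

while read -r d; do
    awk -F', *' -v OFS=', ' '
    FNR==1 {path=FILENAME; sub(/\/[^/]+$/,"",path); print $1, "rescore value"; next}
    {print $1" "path, sqrt((($3+10)/10)^2+(($2-240)/240)^2)}
    ' "${d}_"*/input.csv > "${rescore}/${d%%_*}.csv"
done < <(find . -maxdepth 1 -type d -name '*_*_*' | awk -F '[_/]' '!seen[$2]++ {print $2}')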

Modify each line's characters between two patterns

I need to modify certain characters between two patterns in each line.
Example (file content saved as myfile.txt):
abc, def, 1, {,jsdfsd,kfgdsf,lgfgd}, 2, pqr, stu
abc, def, 1, {,jsdfsqwe,k,fdfsfl}, 2, pqr, stu
abc, def, 1, {,asdasdj,kgfdgdf,ldsfsdf}, 2, pqr, stu
abc, def, 1, {,jsds,kfdsf,fdsl}, 2, pqr, stu
I want to edit & save myfile.txt as shown below:
abc, def, 1, {jsdfsd kfgdsf lgfgd}, 2, pqr, stu
abc, def, 1, {jsdfsqwe k fdfsfl}, 2, pqr, stu
abc, def, 1, {asdasdj kgfdgdf ldsfsdf}, 2, pqr, stu
abc, def, 1, {jsds kfdsf fdsl}, 2, pqr, stu
I've used the following command to edit & save myfile.txt:
sed '/1,/,/,2/{/1,/n;/,2/!{s/,/ /g}}' myfile.txt
This command did not help me achieve my goal. Please help me fix this issue.
awk would be more suitable in such a case:
awk 'BEGIN{ FS=OFS=", " }{ gsub(/,/, " ", $4); sub(/\{ /, "{", $4) }1' file
The output:
abc, def, 1, {jsdfsd kfgdsf lgfgd}, 2, pqr, stu
abc, def, 1, {jsdfsqwe k fdfsfl}, 2, pqr, stu
abc, def, 1, {asdasdj kgfdgdf ldsfsdf}, 2, pqr, stu
abc, def, 1, {jsds kfdsf fdsl}, 2, pqr, stu
In vim there is also the possibility of lookaheads and lookbehinds:
%s/\v(\{.*)@<=,(.*})@=/ /g
This matches every , between a { and a } and replaces it with a space.
If a , directly after a { should be deleted rather than replaced with a space, it is possible to run this slightly modified version first:
%s/\v(\{)@<=,(.*})@=//g
Since you also have the tag vim, you can do it in vim via:
:%normal 0f{vi{:s/\%V,/ /g^M
Where the last two characters are actually Ctrl+V followed by Ctrl+M
One option is to use a sub-replace-expression in your :substitute command.
:%s/{\zs[^}]*\ze}/\=substitute(submatch(0)[1:], ',', ' ', 'g')
This matches the text between your curly braces and then replaces each , with a space, while the [1:] sublist drops the leading comma.
For more help see:
:h sub-replace-expression
:h /\zs
:h submatch()
:h sublist
:h substitute()
Could you please try the following awk and let me know if this helps you.
awk 'match($0,/{.*}/){val=substr($0,RSTART,RLENGTH);gsub(/,/," ",val);gsub(/{ /,"{",val);gsub(/} /,"}",val);print substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH);next} 1' Input_file
Adding a non-one-liner form of the solution too:
awk '
match($0,/{.*}/){
  val=substr($0,RSTART,RLENGTH);   # grab the {...} block
  gsub(/,/," ",val);               # turn every comma inside it into a space
  gsub(/{ /,"{",val);              # drop the space left after the opening brace
  gsub(/} /,"}",val);
  print substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH)
  next
}
1
' Input_file
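Here match() sets the built-in variables RSTART (the position where the regexp matched) and RLENGTH (the length of the match), which the substr() calls then use to split the line into a before-part, the braces block, and an after-part. A quick illustration:

$ echo 'ab {x,y} cd' | awk 'match($0, /{.*}/){ print RSTART, RLENGTH }'
4 5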
Using vim:
:%normal! 4f,xf,r f,r 
: ........... command mode
% ........... in the whole file
normal! ..... normal mode
4f, ......... jump to the first comma inside the { block
x ........... delete the first comma
f, .......... jump to the next comma
r<Space> .... replace comma with space (the f, and r<Space> pair is repeated for the last comma; note that the command ends with a trailing space)
Some more awk:
awk 'NR>1{gsub(/,/," ",$1); sub(/^ /,"",$1); $0=RS $0}1' FS=} OFS=} RS={ ORS= file
or
awk '{gsub(/,/, " ", $2); sub(/^ /, "", $2); $2="{" $2 "}"}1' FS='[{}]' OFS= file

how can I select columns (not lines) containing at least 5 same patterns

how can I select columns (not lines) containing at least 5 same patterns?
I mean something like the following, and select the column with 3 chars 'a':
a dog c d
a dog c d
1 dog dog 4
a a dog a
z z dog z
and get the output as the full columns, like
dog c
dog c
dog dog
a dog
z dog
I'm looking for a vertical version of the grep command, if you prefer... :)
I've been trying to work it out with awk, but without success.
The pattern can be in any column, on any line.
I want to print out each full column having at least 3 similar patterns.
Here in the example it is both columns, but it could just as well be the 2nd & 3rd, or the 21st and 102nd columns...
awk to the rescue!
$ awk -v v='a' -v n=3 'NR==FNR {for(i=1;i<=NF;i++) if($i==v) c[i]++; next}
{for(i=1;i<=NF;i++)
if(c[i]>=n) printf "%s", $i OFS;
print ""}' file{,}
a
a
1
a
z
Specify the value and count; the script double-scans the file (hence file{,}), counts the occurrences on the first pass, and prints the columns that satisfy the criteria on the second.
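With the question's sample data this selects column 1 (three 'a's). To reproduce the desired two-column output from the question, point the same script at the value that repeats in columns 2 and 3:

$ awk -v v='dog' -v n=3 'NR==FNR {for(i=1;i<=NF;i++) if($i==v) c[i]++; next}
{for(i=1;i<=NF;i++)
if(c[i]>=n) printf "%s", $i OFS;
print ""}' file{,}
dog c
dog c
dog dog
a dog
z dog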

Split column using awk or sed

I have a file containing the following text.
dog
aa 6469
bb 5946
cc 715
cat
aa 5692
Bird
aa 3056
bb 2893
cc 1399
dd 33
I need the following output:
A-Z, aa, bb, cc, dd
dog, 6469, 5946, 715, 0
cat, 5692, 0, 0, 0
Bird, 3056, 2893, 1399, 33
I tried:
awk '{$1=$1}1' OFS="," RS=
But it is not giving the format I need.
Thanks in advance for your help.
Cris
With Perl
perl -00 -nE'
($t, %p) = split /[\n\s]/; $h{$t} = {%p}; # Top line, Pairs on lines
$o{$t} = ++$c; # remember Order
%k = (%k, map { $_, 1 } keys %p); # accumulate the full set of subKeys
}{ # END block starts
say join ",", "A-Z", sort keys %k;
for $t (sort { $o{$a} <=> $o{$b} } keys %h) {
say join ",", $k, map { ($h{$k}{$_} // 0) } sort keys %k;
}
' data.txt
prints, in the original order
A-Z,aa,bb,cc,dd
dog,6469,5946,715,0
cat,5692,0,0,0
Bird,3056,2893,1399,33
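The -00 switch is what makes this work: it puts perl into paragraph mode, so each blank-line-separated block (animal name plus its key/value lines) arrives as one record. A quick way to see that record splitting (illustrative only):

$ perl -00 -nE'say "record $.: ", join "|", split /\n/' data.txt
record 1: dog|aa 6469|bb 5946|cc 715
record 2: cat|aa 5692
record 3: Bird|aa 3056|bb 2893|cc 1399|dd 33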
Here's a sed solution, which works on your input, but requires that you know the column names in advance and that the column names are given as sorted full ranges starting with the first column name (so nothing like aa, cc or bb, aa or bb, cc) and that every paragraph is followed by one empty line. You would also need to adjust the script if you don't have exactly four numeric columns:
echo 'A-Z, aa, bb, cc, dd';sed -e '/./{s/.* //;H;d};x;s/\n/, /g;s/, //;s/$/, 0, 0, 0, 0/;:a;s/,[^,]*//5;ta' file
If you need to look up the sed commands, you can look at info sed, especially 3.5 Less Frequently-Used Commands.
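Here is a commented breakdown of the same program, assuming GNU sed (which accepts the multi-line form with comments; the commands are identical to the one-liner):

echo 'A-Z, aa, bb, cc, dd'
sed -e '
/./ {                 # on every non-empty line:
  s/.* //             #   keep only the last word (the count, or the bare name)
  H                   #   append it to the hold space
  d                   #   and suppress printing for now
}
x                     # on the empty line after a paragraph: fetch the hold space
s/\n/, /g             # join the collected words with ", "
s/, //                # drop the leading separator
s/$/, 0, 0, 0, 0/     # pad with four zero columns
:a                    # loop:
s/,[^,]*//5           #   delete the 5th comma-field
ta                    #   until only four numeric columns remain
' file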
awk to the rescue!
awk -v OFS=, 'NF==1 {h[++c]=$1}
NF==2 {v[c,$1]=$2; ks[$1]}
END {printf "%s", "A-Z";
for(k in ks) printf "%s", OFS k;
print "";
for(i=1;i<=c;i++)
{printf "%s", h[i];
for(k in ks) printf "%s", OFS v[i,k]+0;
print ""}}' file'
The order of the columns will be random, since for (k in ks) visits the array keys in an unspecified order.
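If GNU awk is available, the traversal order can be pinned with PROCINFO["sorted_in"] (a gawk extension, not POSIX). A self-contained illustration of the mechanism:

gawk 'BEGIN {
  ks["bb"]; ks["aa"]; ks["dd"]; ks["cc"]   # keys arrive in no particular order
  PROCINFO["sorted_in"] = "@ind_str_asc"   # iterate keys in ascending string order
  for (k in ks) printf "%s ", k
  print ""
}'
# prints: aa bb cc dd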
