Find and update (append) CSV with shell script - shell

Input file:
ID,Name,Values
1,A,vA|A2
2,B,VB
Expected output:
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
Search the file for a given ID and then append a given value to the Values field.
Use case: append 'testvalue' to the Values field of the row with ID = 1.
The problem is: how to cache the line found?
sed's s command can be used for substitution; I tried sed's p (print) command, but it was of no use.

Just set n to the ID of the row you want to update and x to the value:
# vA3 to entry with ID==1
$ awk -F, '$1==n{$0=$0"|"x}1' n=1 x="vA3" file
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
# TEST_VALUE to entry with ID==2
$ awk -F, '$1==x{$0=$0"|"v}1' x=2 v="TEST_VALUE" file
ID,Name,Values
1,A,vA|A2
2,B,VB|TEST_VALUE
Explanation:
-F, sets the field separator to be a comma.
$1==x checks if the line we are looking at contains the ID we want to change, where $1 is the first field on each line and x is the variable we define.
If the previous condition is true then the following block gets executed: {$0=$0"|"v}, where $0 is the variable containing the whole line, so we are just appending the string "|" and the value of the variable v to the end of the line.
The trailing 1 is just a shortcut in awk to say "print the line". The 1 is the condition for the block, which evaluates to true, and since no block is provided awk executes the default block {print $0}. Explicitly, the script would be awk -F, '$1==n{$0=$0"|"x}{print $0}' n=1 x="vA3" file.
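Note that awk writes the result to standard output rather than modifying file; a minimal sketch of the usual update idiom, reusing the command above:
# write the edited output to a temp file, then replace the original
awk -F, '$1==n{$0=$0"|"x}1' n=1 x="vA3" file > file.tmp && mv file.tmp file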

The following script does something similar to what you need. It is in pure bash.
#!/usr/bin/bash
[ $# -ne 2 ] && echo "Arg missing" && exit 1
while IFS= read -r l; do
    [ "${l%%,*}" = "$1" ] && l="$l|$2"
    echo "$l"
done <infile
You can use it as: script <ID> <VALUE>. Example:
$ ./script 1 va3
ID,Name,Values
1,A,vA|A2|va3
2,B,VB
$ cat infile
ID,Name,Values
1,A,vA|A2
2,B,VB

or maybe this?
awk '/vA/ { $NF=$NF"|VA2" } 1' FS=, OFS=,
$ echo "1,A,vA
2,B,VB" | awk '/vA/ { $NF=$NF"|VA2" } 1' FS=, OFS=,
1,A,vA|VA2
2,B,VB
Edit 1: awk recently gained support for in-place editing of files (see the sketch after the demo below). But with your requirement it is best to go with the sed solution that Kent has posted.
$ cat file
ID,Name,Values
1,A,vA|A2
2,B,VB
$ awk '$1==1 { $NF=$NF"|vA3" } 1' FS=, OFS=, file
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
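Regarding the in-place editing mentioned in Edit 1, a minimal sketch, assuming GNU awk 4.1 or later (which ships the inplace extension):
$ gawk -i inplace 'BEGIN{FS=OFS=","} $1==1 { $NF=$NF"|vA3" } 1' file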

are you looking for this?
kent$ echo "1,A,vA
2,B,VB"|sed '/vA/s/$/|VA2/'
1,A,vA|VA2
2,B,VB
EDIT: check the ID, then replace:
kent$ echo "ID,Name,Values
1,A,vA|A2
2,B,VB"|sed 's/^1,.*/&|vA3/'
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
& means the matched part; that would be what you meant by "cache".
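If you need to reuse only part of the match rather than the whole of it, sed's capture groups generalize &; a small sketch on the sample row:
$ echo "1,A,vA|A2" | sed 's/^\(1\),\(.*\)/\1,\2|vA3/'
1,A,vA|A2|vA3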

sed '1 a\ |VA2|vA3' file1.txt
(note: this appends |VA2|vA3 as a new line after line 1 of file1.txt, rather than appending to the Values field)

Related

How can we use the '~|~' delimiter to split records using a scripting command?

Please suggest how I can split the columns separated with the ~|~ delimiter (file: abc.dat):
a~|~1~|~x
b~|~1~|~y
c~|~2~|~z
I am trying the awk command below, but I get a count of 0.
awk -F'~|~' '$2 == 1' ${file} | wc -l
With your shown samples, please try the following. We need not use the wc command along with awk; it can be done within awk itself.
awk -F'~\\|~' '$2 == 1{count++} END{print count}' "$file"
Explanation: set the field separator to ~\|~ (the | is escaped here). Then, if the 2nd field is 1, increment the variable count. In the END block of the program, print its value.
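To see why the unescaped -F'~|~' matches nothing: as a regex, ~|~ means "~ or ~", i.e. the separator is effectively just ~, which makes $2 the literal | character:
$ echo 'a~|~1~|~x' | awk -F'~|~' '{print $2}'
|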
To save the value into a shell variable, use:
var=$(awk -F'~\\|~' '$2 == 1{count++} END{print count}' "$file")
You can also use ~[|]~ as the FS value, since a pipe char inside a bracket expression always matches itself:
counter=$(awk 'BEGIN{FS="~[|]~"} $2==1{cnt++} END{print cnt}' file)
See the online awk demo:
s='a~|~1~|~x
b~|~1~|~y
c~|~2~|~z'
counter=$(awk 'BEGIN{FS="~[|]~"} $2==1{cnt++} END{print cnt}' <<< "$s")
echo $counter
# => 2

Read value from csv

For a csv file appearing as:
variables = cl, cd, clp, clv, cdp, cdv, ...
-0.00000002, 0.01023266, -0.00000002, 0.00000000, 0.00985099, 0.00038167, ...
-0.00000000, 0.01023305, -0.00000000, 0.00000000, 0.00985080, 0.00038225,
-0.00000002, 0.01023390, -0.00000002, 0.00000000, 0.00985075, 0.00038315,
0.00000002, 0.01023482, 0.00000002, 0.00000000, 0.00985070, 0.00038412,
-0.00000004, 0.01023574, -0.00000004, 0.00000000, 0.00985065, 0.00038509,
...
I have a short script to read values from a csv file, but as it is, it returns the entire column. I need it to assign a single value to each variable.
export IFS=","
cat file.csv | while read a b c d e f; do echo "$b"; done
This returns:
cd
0.01023266
0.01023305
0.01023390
0.01023482
0.01023574
How do I make this command return only:
0.01023482
Edit:
The following line is supposed to return a single value from my file:
row=4
col=2
str=$(awk 'BEGIN{FS=OFS=","} FNR==$row' history.out | awk '{print substr($col, 1, length($col)-1)}')
echo $str
It works when I use $4, but not $row. What's the final fix? As it is, $str is returned empty.
Thanks in advance.
EDIT: Since the OP has now shown samples, adding a solution to match.
row=4
col=2
awk -v R="$row" -v C="$col" 'BEGIN{FS=","} FNR==R{print $C}' Input_file
Change FS="," to FS=", " in case you have space in your comma delimiters too.
This can easily be done in awk; the following prints the whole 4th line (combine it with a column index, as above, to get a single value):
awk 'BEGIN{FS=OFS=","} FNR==4' Input_file

Add length of following line to current line in bash

I have a small sample data set, test1.faa:
>PROKKA_00001_A1#hypothetical#protein
MTIALHLTAVLAFAALAGCGANDSDPGPGGVTVSEARALDQAAEMLEKRGRSPADENAEQAERLRREQAQARTPGQPPEQALQQDGASAPE
>PROKKA_00002_A1#Cystathionine#beta-lyase
MHRFGGMVTAILKGGLDDARRFLERCELFALAESLGGVESLIEHPAIMTHASVPREIREALGISDGLVRLSVGIEDADDLLAELETALA
>PROKKA_00003_A1#hypothetical#protein
MVPIVSAAPVFTLLLTVAVFRRERLTAGRIAAVAVVVPSVILIALGH
and I would like to append the length of the following line to the header line, followed by the line itself, such as:
>PROKKA_00001_A1#hypothetical#protein_92
MTIALHLTAVLAFAALAGCGANDSDPGPGGVTVSEARALDQAAEMLEKRGRSPADENAEQAERLRREQAQARTPGQPPEQALQQDGASAPE
I tried to do this with awk, but it returns the following error:
awk: >PROKKA_00001_A1#hypothetical#protein: No such file or directory
I assume it is related to the > in the beginning? But I need it in the output file.
This is the code I tried:
#!/bin/bash
cat test1.faa | while read line
do
headerline=$(awk '/>/{print $0}' $line)
echo -e "this is the headerline \n ${headerline}"
fastaline=$(awk '!/>/{print $0}' $line)
echo -e "this is the fastaline \n ${fastaline}"
fastaline_length=$(awk -v linelength=$fastaline '{print length(linelength)}')
echo -e "this is length of fastaline \n ${fastaline_length}"
echo "${headerline}_${fastaline_length}"
echo $fastaline
done
Any suggestions on how to do this?
Could you please try the following (considering that your actual Input_file is the same as the shown sample). It remembers each header line in value, then prints it with _length of the sequence appended, followed by the sequence itself:
awk '/^>/{value=$0;next} {print value"_"length($0) ORS $0;value=""}' Input_file
This awk command would do what you want: on each header line, getline reads the following sequence line into next_line, the header is printed with the length appended, and then the sequence line is printed:
awk '
/^>/ {
getline next_line
print $0 "_" length(next_line)
print next_line
}
' test1.faa
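For completeness, the original error occurs because awk '...' $line treats the contents of $line as a file name to open, not as text to process. If you would rather keep a pure-bash loop, bash's ${#line} gives a line's length directly; a minimal sketch under that assumption:
#!/bin/bash
# remember each header line, then print it with the next line's length appended
while IFS= read -r line; do
  if [[ $line == '>'* ]]; then
    header=$line
  else
    echo "${header}_${#line}"
    echo "$line"
  fi
done < test1.faa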

How to write a bash script that dumps itself out to stdout (for use as a help file)?

Sometimes I want a bash script that's mostly a help file. There are probably better ways to do things, but sometimes I want to just have a file called "awk_help" that I run, and it dumps my awk notes to the terminal.
How can I do this easily?
Another idea: use #!/bin/cat. This will literally answer the title of your question, since the shebang line will be displayed as well.
Turns out it can be done as pretty much a one-liner, thanks to @CharlesDuffy for the suggestions!
Just put the following at the top of the file, and you're done. The line filters itself out of the dump because it contains the marker string EZREMOVEHEADER; the trailing exit stops bash from trying to execute the notes below it:
cat "$BASH_SOURCE" | grep -v EZREMOVEHEADER; exit
So for my awk_help example, it'd be:
cat "$BASH_SOURCE" | grep -v EZREMOVEHEADER
# Basic form of all awk commands
awk search pattern { program actions }
# advanced awk
awk 'BEGIN {init} search1 {actions} search2 {actions} END { final actions }' file
# awk boolean example for matching "(me OR you) OR (john AND ! doe)"
awk '( /me|you/ ) || (/john/ && ! /doe/ )' /path/to/file
# awk - print # of lines in file
awk 'END {print NR,"coins"}' coins.txt
# Sum up gold ounces in column 2, and find out value at $425/ounce
awk '/gold/ {ounces += $2} END {print "value = $" 425*ounces}' coins.txt
# Print the last column of each line in a file, using a comma (instead of space) as a field separator:
awk -F ',' '{print $NF}' filename
# Sum the values in the first column and pretty-print the values and then the total:
awk '{s+=$1; print $1} END {print "--------"; print s}' filename
# functions available
length($0) > 72, toupper,tolower
# count the # of times the word PASSED shows up in the file /tmp/out
cat /tmp/out | awk 'BEGIN {X=0} /PASSED/{X+=1; print $1 X}'
# awk regex operators
https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operators.html
I found another solution that works on Mac/Linux and works exactly as one would hope.
Just use the following as your "shebang" line, and it'll output everything from line 2 on down:
test.sh
#!/usr/bin/tail -n+2
hi there
how are you
Running this gives you what you'd expect:
$ ./test.sh
hi there
how are you
and another possible solution: just use less, and that way your file will open in a searchable pager
#!/usr/bin/less
and this way you can grep it for something too, e.g.
$ ./test.sh | grep something

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns I want and append them to a new file
It works, but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array. Using command line arguments doesn't seem like the way to go. Any comments are welcome.
# Takes :: separated file as 1st parameters
SOURCE=$1
# create csv target file
TARGET=${SOURCE/dat/csv}
touch $TARGET
echo "#userId,itemId" > $TARGET
IFS=","
while read -r LINE
do
# Replaces all matches of :: with a ,
CSV_LINE=${LINE//::/,}
set -- $CSV_LINE
echo "$1,$2" >> $TARGET
done < $SOURCE
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
The following would print columns 1 and 2 from infile.dat; replace them with a comma-separated list of the numbered columns you do want. (Note awk's field separator variable is FS, not IFS.)
awk 'BEGIN { FS="::"; OFS="," } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a one-liner to do it.
Awk can probably do it easily too.
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or to avoid a UUOC award (and prolong the life of your keyboard by 3 characters)
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
awk is indeed the right tool for the job here; it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
$
