Update csv data in bash

I am trying to create a bash script that looks through a CSV file and, given some input matching the first value in a row, increases the second value in that row by 10.
e.g. I have the file:
Jim,0,
Henry,12,
Louise,6
And with the input "Henry", I want to change the 12 to a 22 and have the file keep the new value.
I have tried using sed with this:
while read type time amount
do
    i=$(($i+1))
    if [ $i -eq 2 ]; then
        x=$time
    fi
done < $INPUT
sed -i '/^Henry,/s/[^,]*/$(($x+10))/1' test.csv
so that it searches through the csv until it gets to the second row (array indexing seems to start from 1 in my tests), saves the time (the second value in the row) to a variable, and then uses sed as I found here, but it doesn't seem to work; whether it is failing to save or failing to edit I don't know.
I tried to do it with awk, but I don't understand it enough, so if any of you would be kind enough to enlighten me I would be very grateful.
Thanks in advance!

awk doesn't support an "in place" replacement. You should use gawk instead, which does via its -i inplace extension. For example:
gawk -F ',' -v OFS=',' -i inplace '{ $2 = ($1 == "Henry") ? $2+10 : $2 } 1' file.csv
-F sets the input field delimiter; OFS sets the output field delimiter, which matters here because assigning to $2 makes awk rebuild the record using OFS (a space by default).
{ $2 = ($1 == "Henry") ? $2+10 : $2 }
is a ternary expression:
condition ? value_if_true : value_if_false
https://www.tutorialspoint.com/awk/awk_ternary_operators.htm
So if the condition is met ($1 is "Henry"), we add 10 to the second field; if not, we rewrite it as it was before.
And here is the meaning of the 1 at the end:
https://unix.stackexchange.com/questions/63891/what-is-the-meaning-of-1-at-the-end-of-an-awk-script
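If the name to match should come from the script's input rather than being hard-coded, it can be passed in with -v. A minimal sketch, assuming the name arrives as the script's first argument (the awk variable name is illustrative):
gawk -F ',' -v OFS=',' -v name="$1" -i inplace '{ $2 = ($1 == name) ? $2+10 : $2 } 1' file.csv
Note that "$1" outside the single quotes is the shell's first argument, while $1 inside the awk program is the first field of each row.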

Related

regex to print lines if value between patterns is greater than number - solution which is independent of column position

2001-06-30T11:33:33,543 DEBUG (Bss-Thread-948:[]) SUNCA#44#77#CALMED#OK#58#NARDE#4356#68654768961#BHR#TST#DEV
2001-06-30T11:33:33,543 DEBUG (Bss-Thread-948:[]) SUNCA#44#77#CALMED#OK#58#NARDE#89034#1234567#BHR#TST#DEV
2001-06-30T11:33:33,543 DEBUG (Bss-Thread-948:[]) SUNCA#44#77#OK#58#BHREDD#234586#4254567#BHR#TST#DEV
2001-06-30T11:33:33,543 DEBUG (Bss-Thread-948:[]) SUNCA#44#77#OK#58#NARDE#89034#1034567#BHR#TST#DEV
I have the log file shown above. I would like to print lines only if the value between the patterns # and #BHR is greater than 1100000.
In my log file I can see lines with the values 68654768961, 1234567, 4254567, and 1034567. Per the requirement, the output should contain only the first 3 lines.
I am looking for a regex to get the desired output.
One question: should the #58#BHR in the third line be ignored? If yes, I will take the value between the patterns # and #BHR# instead.
Normally a question like this would be solved by writing a script according to the business logic. But you could try this one-line awk command:
awk '{if (0 == system("[ $(echo \"" $0 "\"" " | grep -oP \"" "(?<=#)\\d+(?=#BHR#)\" || echo 0) -gt 1100000 ]")) {print $0}}' log_file
Mainly, it uses system() to extract the value with grep:
# if grep can't find the pattern, the value falls back to 0
echo $one_line | grep -oP "(?<=#)\d+(?=#BHR#)" || echo 0
and compares that value to 1100000 with [ "$value" -gt 1100000 ] inside awk.
FYI: if the value is greater than 1100000, the test exits with status 0 (success), which is why the awk code checks for 0.
system(cmd): executes cmd and returns its exit status
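Spawning a shell and a grep per line is expensive; the same filter can be written in awk alone. A sketch, assuming the number of interest always sits in the field immediately before an exact BHR field:
awk -F'#' '{ for (i = 2; i <= NF; i++) if ($i == "BHR" && $(i-1)+0 > 1100000) { print; break } }' log_file
Splitting on # and requiring the field to be exactly BHR also avoids a false match on the BHREDD field in the third line.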

bash, using awk command for printing lines characters if condition

Before starting to explain my issue, I have to say that this is the first time I'm using bash and the awk command.
I have a file containing a lot of lines, and I am interested in printing some of these lines if certain characters of the line satisfy a condition. I already have a simple method that works, but I want to try awk to see if it can be faster. The command I'm trying was inspired by a colleague at work, but I don't fully understand it.
My file looks like :
# 15247.479
1 23775U 96005A 18088.90328565 -.00000293 +00000-0 +00000-0 0 9992
2 23775 014.2616 019.1859 0018427 174.9850 255.8427 00.99889926081074
# 15250.479
1 23775U 96005A 18088.35358271 -.00000295 +00000-0 +00000-0 0 9990
2 23775 014.2614 019.1913 0018425 174.9634 058.1812 00.99890136081067
The 4th field (e.g. 18088.90328565) refers to a date, and I want to print the lines starting with 1 and 2 if that number is greater than startDate and less than endDate.
I am trying with :
< $file awk ' BEGIN {ok=0}
{date=substring($0,19,10) if ($date>='$firstTime' && $date<= '$lastTime' ) {print; ok=1} else ok=0;next}
{if (ok) print}'
This returns a syntax error but I fear it is not the only problem. I don't really understand what the $0 in substring refers to.
Thanks everyone for the help !
Per the question about $0:
Awk is a language built for processing tables and has language features specific to both filtering and manipulating tabular data. One language feature is automatic field splitting.
If you see a $ in front of a variable or constant, it is referring to a "field." When awk sees $field_number being used in a variable context, awk splits the current record buffer based upon what is in the FS variable and allows you to work on that just as you would any other variable -- just that the backing store for that variable is the record buffer.
$0 is a special field referring to the whole of the record buffer. There are some interesting notes in the awk documentation about the side effects on $0 of assigning $field_number variables, FS and OFS that are worth an in depth read.
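A quick made-up illustration of the difference between $0 and $field_number:
echo 'alpha beta gamma' | awk '{ print $0 }'    # whole record: alpha beta gamma
echo 'alpha beta gamma' | awk '{ print $2 }'    # second field: beta
echo 'a,b,c' | awk -F',' '{ print $2 }'         # with FS set to a comma: b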
Here is my answer to your application:
(1) First, LC_ALL may help us for speed. I'm using ll/ul for lower and upper limits -- the reason for which will be apparent later. Specifying them as variables outside the script helps our readability. It is good practice to properly quote shell variables.
(2) It is good practice to use BEGIN { ... }, as you did in your attempt, to formally initialize variables. If using gawk, we can use LINT = 1 to test things like this.
(3) /^#/ is probably the simplest (and fastest) pattern for our reset. We use next because we never want to apply the limits to this line and we never want to see this line in our output (even if ll = ul = "").
(4) It is surprisingly easy to make a mistake on limits. Implement limits consistently one way, and our readers will thank us. We remember to check corner cases where ll and/or ul are blank. One corner case is where we have already triggered our limits and we are waiting for /^#/ -- we don't want to rescan the limits again while ok.
(5) The default action of a pattern is to print.
(6) Remembering to quote our filename variable will save us someday when we inevitably encounter the stray "$file" with spaces in the name.
LC_ALL=C awk -v ll="$firstTime" -v ul="$lastTime" ' # (1)
BEGIN { ok = 0 } # (2)
/^#/ { ok = 0; next } # (3)
!ok { ok = (ll == "" || ll <= $4) && (ul == "" || $4 <= ul) } # (4)
ok # <- print if ok # (5)
' "$file" # (6)
You're missing a ; between the variable assignment and the if. And instead of splicing shell variables into the script, assign them to awk variables. There's no need to initialize ok=0; uninitialized variables are automatically treated as false. And if you want to access a field of the input, use $n, where n is the field number, rather than substr().
You need to set ok=0 when you get to the next line beginning with #, otherwise you'll just keep printing the rest of the file.
awk -v firstTime="$firstTime" -v lastTime="$lastTime" '
NF > 3 && $4 > firstTime && $4 <= lastTime { print; ok=1 }
$1 == "#" { ok = 0 }
ok { print }' "$file"
This answer is based upon my original, but takes into account some new information that @clem sent us in a comment -- namely that the line we need to test always immediately follows the line matching /^#/. Therefore, when we match in this new solution, we immediately do a getline to grab the next line and set ok based upon that next line's data. We now check the limits only on the line subsequent to our match, and we do not check them on lines where we shouldn't.
LC_ALL=C awk -v ll="$firstTime" -v ul="$lastTime" '
BEGIN { ok = 0 }
/^#/ {
getline
ok = (ll == "" || ll <= $4) && (ul == "" || $4 <= ul)
}
ok # <- print if ok
' "$file"

appending text to specific line in file bash

So I have a file that contains some lines of text separated by ','. I want to create a script that counts how many parts a line has and, if the line contains 16 parts, adds a new one. So far it's working great. The only thing that is not working is appending the ',xx' at the end. See my example below:
Original file:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
Expected result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
This is my code:
while read p; do
    if [[ $p == "HEA"* ]]
    then
        IFS=',' read -ra ADDR <<< "$p"
        echo ${#ADDR[@]}
        arrayCount=${#ADDR[@]}
        if [ "${arrayCount}" -eq 16 ];
        then
            sed -i "/$p/ s/\$/,xx/g" $f
        fi
    fi
done <$f
Result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
What am I doing wrong? I'm sure it's something small but I can't find it.
It can be done using awk:
awk -F, 'NF==16{$0 = $0 FS "xx"} 1' file
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
-F, sets the input field separator to a comma
NF==16 is the condition: the block inside { and } runs only if the number of fields is 16
$0 = $0 FS "xx" appends the field separator and xx (i.e. ,xx) at the end of the line
1 is an always-true condition whose default action is to print the line
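Plain awk has no in-place editing option, so to save the result back, redirect to a temporary file and move it over the original (file names here are illustrative):
awk -F, 'NF==16{$0 = $0 FS "xx"} 1' file > file.tmp && mv file.tmp file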
For sed, the answer would be along the following lines:
Use the ${line_number} s/.../.../ form -- to target a specific line, you need to find out its line number first.
Use the special character & to refer to the matched string.
The sed statement should look like the following:
sed -i "${line_number}s/.*/&xx/"
I would prefer to leave it to you to play around with it, but if you prefer, I can give you a full working sample.
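For instance, the line numbers could come from awk and feed the sed edit. A sketch, not a full solution ($f is the file variable from the question):
while read -r line_number; do
    sed -i "${line_number}s/\$/,xx/" "$f"
done < <(awk -F, 'NF==16 {print NR}' "$f")
Appending to a line does not shift line numbers, so computing them all up front stays safe here.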

How to combine two lines that share the same keyword?

Let's say I have a file looking somewhat like this:
X NeedThis1 KEYWORD
.
.
NeedThis2 X KEYWORD
And I need to combine the two lines into one like this:
NeedThis2 NeedThis1 KEYWORD
It needs to be done for every pair of lines in that file that share the same KEYWORD, but it must not combine two lines that have the X in the same position, like these:
X NeedThis1 KEYWORD
X NeedThis2 KEYWORD
I consider myself a bash noob, so any advice on whether it can be done with something like awk or sed would be appreciated.
awk '
{if ($1 == "X") end[$3] = $2; else start[$3] = $1}
END {for (kw in start) if (kw in end) print start[kw], end[kw], kw}
' file
Try this:
awk '
$1=="X" {key = $NF; value = $2; next}
$2=="X" && $NF==key {print value, $1, key}' file
Explanation:
On a line whose first field is X, store the last field as the key and the second field as the value.
Look for a subsequent line whose second field is X and whose last field matches the stored key.
When found, print the stored value, the first field of the current line, and the key.
This will most definitely break if your data does not match the sample you have shown (if it has more spaces or fields in between), so feel free to adjust as per your needs.
I won't give you the full answer, but if you have some way to identify "KEYWORD" (not in your problem statement), then use a BASH associative array:
declare -A keys                       # associative array keyed by the keyword
while IFS= read -u3 -r line           # read from file descriptor 3
do
    set -- $line                      # split the line into positional parameters
    eval keyword=\$$#                 # the last positional parameter is the keyword
    keys[$keyword]+=${line%$keyword}  # accumulate the line minus its trailing keyword
done 3< "$file"                       # attach your input file to fd 3
you'll certainly have to do some more fiddling, but your problem statement is incomplete and some of the work needs to be an exercise for the reader.

Processing a tab delimited file with shell script processing

Normally I would use Python/Perl for this procedure, but I find myself (for political reasons) having to pull this off using a bash shell.
I have a large tab-delimited file that contains six columns, and the second column is integers. I need to shell-script a solution that verifies the file indeed has six columns and that the second column indeed contains integers. I am assuming I need sed/awk here somewhere. Problem is that I'm not that familiar with sed/awk. Any advice would be appreciated.
Many thanks!
Lilly
gawk:
BEGIN {
    FS = "\t"
}
(NF != 6) || ($2 != int($2)) {
    exit 1
}
Invoke as follows:
if awk -f colcheck.awk somefile
then
# is valid
else
# is not valid
fi
Well you can directly tell awk what the field delimiter is (the -F option). Inside your awk script you can tell how many fields are present in each record with the NF variable.
Oh, and you can check the second field with a regex. The whole thing might look something like this:
awk < thefile -F\\t '
{ if (NF != 6 || $2 ~ /[^0123456789]/) print "Format error, line " NR; }
'
That's probably close, but I need to check the regex because Linux regex syntax variation is so insane.
Here's how to do it with awk:
awk 'NF!=6||$2+0!=$2{print "error"}' file
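The $2+0 != $2 comparison flags any second field that is not purely numeric. A quick made-up demonstration (note this relies on awk's default whitespace splitting, so it assumes fields contain no spaces; also, a value like 1.5 would still pass, since the test checks numeric-ness, not integer-ness):
printf 'a\t1\tb\tc\td\te\na\tx\tb\tc\td\te\n' | awk 'NF!=6||$2+0!=$2{print "error in line " NR}'
# prints: error in line 2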
Pure Bash:
infile='column6.dat'
lno=0
while read -a line ; do
    ((lno++))
    if [ ${#line[@]} -ne 6 ] ; then
        echo -e "line $lno has ${#line[@]} elements"
    fi
    if ! [[ ${line[1]} =~ ^[0-9]+$ ]] ; then
        echo -e "line $lno column 2 : not an integer"
    fi
done < "$infile"
Possible output:
line 19 has 5 elements
line 36 column 2 : not an integer
line 38 column 2 : not an integer
line 51 has 3 elements
