awk regex and add 1 - bash

I have an HTML file with some variable arrays that I need to increment. I have been doing it by hand, but now it is taking up too much of my time. I have been searching and trying to find the correct tool/syntax to do exactly what I want.
say I have:
file[0]=["blah0 blah0", "file0.jpg"]
file[1]=["blah1 blah1", "file1.jpg"]
file[2]=["blah2 blah2", "file2.jpg"]
What I would like to do is have the script add a given offset to each index number, giving me room to add more entries earlier. For example, I could specify 5 and have the result be
file[5]=["blah0 blah0", "file0.jpg"]
file[6]=["blah1 blah1", "file1.jpg"]
file[7]=["blah2 blah2", "file2.jpg"]
This is what I have tried so far - but not much luck, as it removes all the square brackets:
awk -F [\]\[] '/^file\[[0-9]+\]=/ {$2="["$2+'$userinput'"]";}1' ${workdirect}/index.html > text.txt
Any advice???

#!/bin/bash
# Usage: foo.sh <file> <increment>
awk '/^file/ {
    # find the bracketed index, e.g. "[12]"
    if (match($0, /\[[0-9]+\]/)) {
        printf("%s%d%s\n",
               substr($0, 1, RSTART),                      # everything up to and including "["
               INC + substr($0, RSTART + 1, RLENGTH - 2),  # the index, plus the increment
               substr($0, RSTART + RLENGTH - 1))           # from "]" to the end of the line
    }
}' INC="$2" "$1"
$ foo.sh tmphtml 5
file[5]=["blah0 blah0", "file0.jpg"]
file[6]=["blah1 blah1", "file1.jpg"]
file[7]=["blah2 blah2", "file2.jpg"]

Let the script be:
#!/bin/bash
inc=$1
while IFS= read -r line; do
    p1=${line%%[*}      # everything before the first "["
    p3=${line#*]}       # everything after the first "]"
    p2=${line#*[}       # strip through the first "["
    p2=${p2%%]*}        # keep just the index digits
    p2=$(( p2 + inc ))
    echo "${p1}[${p2}]${p3}"
done
Call the script:
$ script offset < inputfile
This is just bash, with no overhead from external commands.
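For example, with the question's sample lines saved in index.html and an offset of 5, a run should look like this:
$ ./script 5 < index.html
file[5]=["blah0 blah0", "file0.jpg"]
file[6]=["blah1 blah1", "file1.jpg"]
file[7]=["blah2 blah2", "file2.jpg"]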

Easier with perl:
perl -nle '$n = 5; print $1, $2 + $n, $3 if /(file\[)([[:digit:]]+)(\]=\[.*\])/'

If the first line really begins file[0]= and everything is sequential, then this might work:
awk 'sub(/[0-9]+/,'$userinput'+t) {print; t=t+1}' ${workdirect}/index.html > text.txt

Related

Listing skipped numbers in a large txt file using bash

I need to find a way to display the missing numbers from a large txt file. It's a web graph that has 875,713 vertices. However, when I sort the file, the largest number displayed at the end is 916,427, so there are some numbers not being used as vertex indices. Is there a bash command I could use to do this?
I found this after searching around some other threads but I'm not entirely sure if it's correct:
awk 'NR != $1 { for (i = prev + 1; i < $1; i++) {print i} } { prev = $1 + 1 }' file
Assuming the 'number' of each vertex is in the first column, you can use:
awk '{a[$1]} END{for(i = 1; i <= 916427; i++){if(!(i in a)){print i}}}' file
E.g.
# create some example data and remove "10"
seq 916427 | sed '10d' > test.txt
head test.txt
1
2
3
4
5
6
7
8
9
11
awk '{a[$1]} END { for (i = 1; i <= 916427; i++) { if (!(i in a)) {print i}}}' test.txt
10
If you don't want to store the array in memory (otherwise jared_mamrot's solution would work), you can use
awk 'NR==1 {p=$1; next} {for (i=p+1; i<$1; i++) {print i}; p=$1}' < <( sort -n file)
which sorts the file first.
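For example, against the test.txt generated above (assuming the vertex number is the first column), it prints the missing 10:
$ awk 'NR==1 {p=$1; next} {for (i=p+1; i<$1; i++) {print i}; p=$1}' < <(sort -n test.txt)
10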
Just because you tagged your question bash, I'll provide a bash solution. :)
# sample data as jared suggested, with 10 removed...
seq 916427 | sed '10d' > test.txt
# read sample data into an array...
mapfile -t a < test.txt
# reverse the $a array into $b
for i in "${a[@]}"; do b[$i]=1; done
# step through list of possible numbers, testing if each one is an index of $b
for ((i=1; i<${a[((${#a[@]}-1))]}; i++)); do [[ -z ${b[i]} ]] && echo $i; done
The line noise in the last line (${a[((${#a[@]}-1))]}) simply means "the value of the last array element", and the -1 is there because, without instructions otherwise, mapfile starts numbering things at zero.
This takes a little longer to run than awk, because awk is awesome. But it runs in bash without calling any external tools. Aside from the ones generating our sample data, of course!
Note that the last line verifies $b array membership with a string comparison. You might get a very slight performance increase by doing a math comparison instead ((( ${b[i]} )) || echo $i), but the improvement would be so small that it's not even worth mentioning.
Note also that both this and the awk solution involve creating very large arrays in memory, then stepping through those arrays. Be careful of your memory, and don't waste array space with unnecessary data. You will probably want to pull just your indices out of your original dataset for this comparison, rather than loading everything into a bash or awk array.
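If each line of your graph file carries more than just the vertex number, a minimal sketch of that pre-filtering could look like this (graph.txt and indices.txt are made-up names, and the vertex index is assumed to be the first whitespace-separated column):
# keep only the (assumed) first column and de-duplicate it before building any in-memory array
awk '{print $1}' graph.txt | sort -n -u > indices.txt
mapfile -t a < indices.txt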

How to update minor version number in bash?

Currently, I want to update the minor version in a text file using a bash command. This is the format I am dealing with: MAJOR.Minor.BugFix. I am able to increment the BugFix version number but have been unable to increment just the minor version.
I.e
01.01.00 -> 01.02.00
01.99.00 -> 02.00.00
This is the code snippet I found online and was trying to tweak to update the minor instead of the bug fix
echo 01.00.1 | awk -F. -v OFS=. 'NF==1{print ++$NF}; NF>1{if(length($NF+1)>length($NF))$(NF-1)++; $NF=sprintf("%0*d", length($NF), ($NF+1)%(10^length($NF))); print}'
Note that -F takes a regular expression; although most awks treat a single-character -F. as a literal period, writing -F"[.]" makes the literal-dot intent explicit, and then you can just split the fields on the periods without any of the length() stuff.
larsks idea of splitting into multiple lines is a good one:
echo $a | awk -F'[.]' '{
major=$1;
minor=$2;
patch=$3;
minor += 1;
major += minor / 100;
minor = minor % 100;
printf( "%02d.%02d.%02d\n", major, minor, patch );
}'
You don't need awk for this; a read with IFS=. will do.
Though in Bash, leading zeroes indicate octal, so you'll need to guard against them.
IFS=. read -r major minor bugfix <<< "$1"
# Specify base 10 in case of leading zeroes (octal)
((major=10#$major, minor=10#$minor, bugfix=10#$bugfix))
if [[ $minor -eq 99 ]]; then
((major++, minor=0))
else
((minor++))
fi
printf '%02d.%02d.%02d\n' "$major" "$minor" "$bugfix"
Test run:
$ ./test.sh 01.01.00
01.02.00
$ ./test.sh 01.99.09
02.00.09
$ ./test.sh 1.1.1
01.02.01
Quick answer:
version=01.02.00
newversion="$(printf "%06d" "$(expr "$(echo $version | sed 's/\.//g')" + 100)")"
echo "${newversion:0:2}.${newversion:2:2}.${newversion:4:2}"
Full explanation:
version=01.02.00
# get the number without decimals
rawnumber="$(echo $version | sed 's/\.//g')"
# add 100 to number (to increment minor version)
sum="$(expr "$rawnumber" + 100)"
# make number 6 digits
newnumber="$(printf "%06d" "$sum")"
# add decimals back to number
newversion="${newnumber:0:2}.${newnumber:2:2}.${newnumber:4:2}"
echo "$newversion"
awk provides a simple and efficient way to update the minor version (incrementing the major version and zeroing the minor version when the minor version is 99), e.g.
awk -F'.' '{
if ($2 == 99) {
$1++
$2=0
}
else
$2++
printf "%02d.%02d.%02d\n", $1, $2 ,$3
}' minorver
Above, the leading zeros are ignored when the fields are treated as numbers, and then it is just a simple comparison of the minor version to determine whether to increment the major version and zero the minor version, or simply to increment the minor version. The printf is used to provide the zero-padded, formatted output:
Example Use/Output
With your data in the file minorver, you can do:
$ awk -F'.' '{
> if ($2 == 99) {
> $1++
> $2=0
> }
> else
> $2++
> printf "%02d.%02d.%02d\n", $1, $2 ,$3
> }' minorver
01.02.00
02.00.00
Let me know if you have further questions.

How can I get the column in BASH?

I have a large .xml file like this:
c1="a1" c2="b1" c3="cccc1"
c1="aa2" c2="bbbb2" c3="cc2"
c1="aaaaaa3" c2="bb3" c3="cc3"
...
I need the result like the following:
a1 b1 cccc1
aa2 bbbb2 cc2
aaaaaa3 bb3 cc3
...
How can I get the column in BASH?
I have the following method in PL/SQL, but it's very inconvenient:
SELECT C1,
TRIM(BOTH '"' FROM REGEXP_SUBSTR(C1, '"[^"]+"', 1, 1)) c1,
TRIM(BOTH '"' FROM REGEXP_SUBSTR(C1, '"[^"]+"', 1, 2)) c2,
TRIM(BOTH '"' FROM REGEXP_SUBSTR(C1, '"[^"]+"', 1, 3)) c3
FROM TEST;
Use cut:
cut -d'"' -f2,4,6 --output-delimiter=" " test.txt
Or you can use sed if the number of columns is not known:
sed 's/[a-z][a-z0-9]\+="\([^"]\+\)"/\1/g' < test.txt
Explanation:
[a-z][a-z0-9]\+ - matches a string starting with an alpha char followed by any number of alphanumeric chars
"\([^"]\+\)" - captures any string inside the quotes
\1 - represents the captured string that in this case is used to replace the entire match
A perl approach (based on the awk answer by A-Ray)
perl -F'"' -ane 'print join(" ",@F[ map { 2 * $_ + 1} (0 .. $#F) ]),"\n";' < test.txt
Explanation:
-F'"' set input separator to "
-a turn autosplit on - this results in @F being filled with the content of the fields in the input
-n iterate through all lines but don't print them by default
-e execute code following
map { 2 * $_ + 1} (0 .. $#F) generates a list of indexes (1,3,5 ...)
@F[map { 2 * $_ + 1} (0 .. $#F)] takes a slice from the array, selecting only the odd fields
join - joins the slice with spaces
NOTE: I would not use this approach without a good reason; the first two are easier.
Some benchmarking (on a Raspberry Pi, with a 60000-line input file and output thrown away to /dev/null):
cut - 0m0.135s no surprise there
sed - 0m5.864s
perl - 0m8.218s - I guess regenerating the index list every line isn't that fast (with a hard coded slice list it goes to half, but that would defeat the purpose)
the read based solution - 0m52.027s
You can also look at the built-in substring replacement/removal bash offers, either in a short script or a one-liner:
#!/bin/bash
while read -r line; do
new=${line//c[0-9]=/} ## remove 'cX=', where X is '0-9'
new=${new//\"/} ## remove all '"' (double-quotes)
echo "$new"
done <"$1"
exit 0
Input
$ cat dat/stuff.xml
c1="a1" c2="b1" c3="cccc1"
c1="aa2" c2="bbbb2" c3="cc2"
c1="aaaaaa3" c2="bb3" c3="cc3"
Output
$ bash parsexmlcx.sh dat/stuff.xml
a1 b1 cccc1
aa2 bbbb2 cc2
aaaaaa3 bb3 cc3
As a One-Liner
while read -r line; do new=${line//c[0-9]=/}; new=${new//\"/}; echo "$new"; done <dat/stuff.xml
awk -F '"' '{ for(i=2; i<=NF; i+=2) { printf $i" " } print "" }'
Explanation
-F '"' makes Awk treat quotation marks (") as field delimiters. For example, Awk will split the line...
c1="a1" c2="b1" c3="cccc1"
...into fields numbered as...
1: 'c1='
2: 'a1'
3: ' c2='
4: 'b1'
5: ' c3='
6: 'cccc1'
7: ''
for(i=2; i<=NF; i+=2) { printf "%s ", $i } starts at field 2, prints the value of the field, skips a field, and continues. In this case, fields 2, 4, and 6 will be printed.
print outputs a string followed by a newline. printf also outputs a string, but doesn't append a newline. Therefore...
printf "%s ", $i
...outputs the value of field $i followed by a space.
print ""
...simply outputs a newline.
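Run against the sample file from the earlier answer, the one-liner produces the desired columns (each output line ends with a trailing space, since every printed field gets one):
$ awk -F '"' '{ for(i=2; i<=NF; i+=2) { printf "%s ", $i } print "" }' dat/stuff.xml
a1 b1 cccc1
aa2 bbbb2 cc2
aaaaaa3 bb3 cc3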

Find nth row using AWK and assign them to a variable

Okay, I have two files: one is a baseline and the other is a generated report. I have to validate that a specific string in both files matches; it is not just a single word, see the example below:
.
.
name os ksd
56633223223
some text..................
some text..................
My search criterion here is to find a unique number such as "56633223223" and retrieve the 1 line above and the 3 lines below it. I can do that on both the base file and the report, and then compare whether they match. Overall, I need a shell script for this.
Since the strings above and below are unique but the line count varies, I have put the counts in a file called "actlist":
56633223223 1 5
56633223224 1 6
56633223225 1 3
.
.
Now from "Rcount" below I get how many iterations to perform, and in each iteration I have to get the ith row and see if the word count is 3; if it is, I need to capture those values in variables and use them.
I'm stuck on which command to use for that. I'm thinking of using awk, but if there is anything better please advise. Here's some pseudo-code showing what I'm trying to do:
xxxxx=/root/xxx/xxxxxxx
Rcount=`wc -l $xxxxx | awk -F " " '{print $1}'`
i=1
while ((i <= Rcount))
do
record=_________________'(Awk command to retrieve ith(1st) record (of $xxxx),
wcount=_________________'(Awk command to count the number of words in $record)
(( i=i+1 ))
done
Note: record, wcount values are later printed to a log file.
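(For reference, a sketch of what those two blanks might contain, using standard awk and wc - though whether this loop is the right approach at all is questioned in the answers below:)
record=$(awk -v n="$i" 'NR == n' "$xxxxx")     # the ith line of the file
wcount=$(printf '%s\n' "$record" | wc -w)      # number of whitespace-separated words in that line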
Sounds like you're looking for something like this:
#!/bin/bash
while read -r word1 word2 word3 junk; do
    if [[ -n "$word1" && -n "$word2" && -n "$word3" && -z "$junk" ]]; then
        echo "all good"
    else
        echo "error"
    fi
done < /root/shravan/actlist
This will go through each line of your input file, assigning the three columns to word1, word2 and word3. The -n tests check that read hasn't assigned an empty value to any of those variables. The -z check ensures that there are only three columns, so $junk is empty.
I PROMISE you you are going about this all wrong. To find words in file1 and search for those words in file2 and file3 is just:
awk '
NR==FNR{ for (i=1;i<=NF;i++) words[$i]; next }
{ for (word in words) if ($0 ~ word) print FILENAME, word }
' file1 file2 file3
or similar (assuming a simple grep -f file1 file2 file3 isn't adequate). It DOES NOT involve shell loops to call awk to pull out strings to save in shell variables to pass to other shell commands, etc, etc.
So far all you're doing is asking us to help you implement part of what you think is the solution to your problem. We're struggling to do that because what you're asking for doesn't make sense as part of any reasonable solution to what your actual problem sounds like, so it's hard to suggest anything sensible.
If you tell us what you are trying to do AS A WHOLE, with sample input and expected output for your whole process, then we can help you.
We don't seem to be getting anywhere so let's try a stab at the kind of solution I think you might want and then take it from there.
Look at these 2 files "old" and "new" side by side (line numbers added by the cat -n):
$ paste old new | cat -n
1 a b
2 b 56633223223
3 56633223223 c
4 c d
5 d h
6 e 56633223225
7 f i
8 g Z
9 h k
10 56633223225 l
11 i
12 j
13 k
14 l
Now lets take this "actlist":
$ cat actlist
56633223223 1 2
56633223225 1 3
and run this awk command on all 3 of the above files (yes, I know it could be briefer, more efficient, etc. but favoring simplicity and clarity for now):
$ cat tst.awk
ARGIND==1 {
    numPre[$1] = $2
    numSuc[$1] = $3
}
ARGIND==2 {
    oldLine[FNR] = $0
    if ($0 in numPre) {
        oldHitFnr[$0] = FNR
    }
}
ARGIND==3 {
    newLine[FNR] = $0
    if ($0 in numPre) {
        newHitFnr[$0] = FNR
    }
}
END {
    for (str in numPre) {
        if ( str in oldHitFnr ) {
            if ( str in newHitFnr ) {
                for (i=-numPre[str]; i<=numSuc[str]; i++) {
                    oldFnr = oldHitFnr[str] + i
                    newFnr = newHitFnr[str] + i
                    if (oldLine[oldFnr] != newLine[newFnr]) {
                        print str, "mismatch at old line", oldFnr, "new line", newFnr
                        print "\t" oldLine[oldFnr], "vs", newLine[newFnr]
                    }
                }
            }
            else {
                print str, "is present in old file but not new file"
            }
        }
        else if (str in newHitFnr) {
            print str, "is present in new file but not old file"
        }
    }
}
$ awk -f tst.awk actlist old new
56633223225 mismatch at old line 12 new line 8
j vs Z
It's outputting that result because the 2nd line after 56633223225 is j in file "old" but Z in file "new", and the file "actlist" said the 2 files had to be common from one line before until 3 lines after that pattern.
Is that what you're trying to do? The above uses GNU awk for ARGIND but the workaround is trivial for other awks.
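For reference, one common portable workaround (a sketch, not part of the original answer) is to maintain your own file counter and test it in place of ARGIND; this assumes none of the input files is empty:
# portable stand-in for GNU awk's ARGIND: bump a counter whenever a new input file starts
FNR == 1 { argind++ }
argind == 1 { numPre[$1] = $2; numSuc[$1] = $3 }
argind == 2 { oldLine[FNR] = $0; if ($0 in numPre) oldHitFnr[$0] = FNR }
argind == 3 { newLine[FNR] = $0; if ($0 in numPre) newHitFnr[$0] = FNR }
# ...followed by the same END block as in tst.awk above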
Use the below code:
awk '{if (NF == 3) { word1=$1; word2=$2; word3=$3; print "Words are:" word1, word2, word3} else {print "Line", NR, "is having", NF, "Words" }}' filename.txt
I have given the solution as per the requirement.
awk '{                              # awk starts here and reads the file line by line
    if (NF == 3)                    # check whether the current line has 3 fields; NF is the number of fields in the current line
    {   word1=$1;                   # if the current line has exactly 3 fields, the 1st field is assigned to the word1 variable
        word2=$2;                   # the 2nd field is assigned to the word2 variable
        word3=$3;                   # the 3rd field is assigned to the word3 variable
        print word1, word2, word3}  # print all 3 fields
}' filename.txt >> output.txt       # these 3 fields are appended to a file which can be used for further processing
This is as per the requirement; there are many other ways of doing this, but awk was asked for.

Calculate difference between two numbers in a file

I want to know if it is possible to calculate the difference between two floating-point numbers contained on two distinct lines of a file, in a single bash command line.
File content example :
Start at 123456.789
...
...
...
End at 123654.987
I would like to do an echo of 123654.987-123456.789
Is that possible? What is the magic command line?
Thank you!
awk '
/Start/ { start = $3 } # 3rd field in line matching "Start"
/End/ {
end = $3; # 3rd field in line matching "End"
print end - start # Print the difference.
}
' < file
If you really want to do this on one line:
awk '/Start/ { start = $3 } /End/ { end = $3; print end - start }' < file
you can do this with this command:
start=`grep 'Start' FILENAME| cut -d ' ' -f 3`; end=`grep 'End' FILENAME | cut -d ' ' -f 3`; echo "$end-$start" | bc
You need the 'bc' program for this (for floating point math). You can install it with apt-get install bc, or yum, or rpm, zypper... OS specific :)
Bash doesn't support floating-point operations, but you can split your numbers into parts and perform integer operations. Example:
#!/bin/bash
echo $(( ${2%.*} - ${1%.*} )).$(( ${2#*.} - ${1#*.} ))
Result:
./test.sh 123456.789 123654.987
198.198
EDIT:
The correct solution would be to use not a command-line hack but a tool designed for performing floating-point operations. For example, bc:
echo 123654.987-123456.789 | bc
output:
198.198
Here's a weird way:
printf -- "-%s+%s\n" $(grep -oP '(Start|End) at \K[\d.]+' file) | bc
