bash script to rewrite numbers sequentially - bash

I'd like to 're-sequence' some variable assignment values that are within a large BASH script I'm writing. At present, I have to do this manually, and it's quite time-consuming. ;)
e.g.:
(some code here)
ab=0
(and some here too)
ab=3
(more code here)
cd=2; ab=1
(more code here)
ab=2
What I'd like to do is run a command that can re-order the assignment values of 'ab' so we get:
(some code here)
ab=0
(and some here too)
ab=1
(more code here)
cd=2; ab=2
(more code here)
ab=3
The indentations exist as these usually form part of a code block, like an 'if' or 'for' block.
The variable name will always be the same. The first occurrence in the script should be made a zero. I thought if something (like sed) could search for 'ab=' followed by an integer, then change that integer according to an incrementing value, this would be perfect.
Hoping someone out there may know of something that can do this already. I use 'Kate' for my BASH editing.
Any thoughts? Thank you.

$ # can also use: perl -pe 's/\bab=\K\d+/$i++/ge' file
$ perl -pe 's/(\bab=)\d+/$1.$i++/ge' file
(some code here)
ab=0
(and some here too)
ab=1
(more code here)
cd=2; ab=2
(more code here)
ab=3
(\bab=)\d+ matches ab= followed by one or more digits. \b is a word boundary marker, so names like dab=4 don't match.
The e modifier lets you use Perl code in the replacement section.
$1.$i++ concatenates ab= with the value of $i (which is 0 by default); $i is then incremented.
Use perl -i -pe for in-place editing.

@teracoy, try:
awk '/ab=/{sub(/ab=[0-9]+/,"ab="i++);print;next} 1' Input_file
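A possible portable variant, offered only as a sketch: the one-liner above changes just the first ab= on each line, and it would also touch names that merely end in ab, such as dab=4. The function below loops over every match with match() and skips matches preceded by a word character, which roughly approximates Perl's \b in plain POSIX awk. The function name resequence_ab and the counter n are illustrative and not part of the original answer.

```shell
# Sketch: renumber every standalone "ab=<digits>" in order of appearance.
resequence_ab() {
  awk '
  {
    out = ""; rest = $0
    while (match(rest, /ab=[0-9]+/)) {
      # Character just before the match (may be the tail of text already copied).
      prev = (RSTART > 1) ? substr(rest, RSTART - 1, 1) : substr(out, length(out), 1)
      if (prev ~ /[A-Za-z0-9_]/)
        out = out substr(rest, 1, RSTART + RLENGTH - 1)     # part of a longer name: keep as is
      else
        out = out substr(rest, 1, RSTART - 1) "ab=" (n++)   # standalone: renumber
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }' "$@"
}
```

It could be used as resequence_ab script.sh > script.new, reviewing the result before replacing the original file.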

With GNU awk for multi-char RS, RT, and gensub():
$ awk -v RS='\\<ab=[0-9]+' '{ORS=gensub(/[0-9]+/,i++,1,RT)}1' file
(some code here)
ab=0
(and some here too)
ab=1
(more code here)
cd=2; ab=2
(more code here)
ab=3
Use awk -i inplace ... (GNU awk 4.1.0 or later) for in-place editing if desired.

Related

Combine two expression in Bash

I did check the ABS, but it was hard to find a reference to my problem/question there.
Here it is. Consider the following code (which extracts the first character of OtherVar and then converts MyChar to uppercase):
OtherVar=foobar
MyChar=${OtherVar:0:1} # get first character of OtherVar string variable
MyChar=${MyChar^} # first character to upper case
Could I somehow condense the second and third line into one statement?
P.S.: As was pointed out below, there is no need for a named variable. I should add that I would like to avoid sub-shells and the like, and would also accept a somewhat hacky way to achieve the desired result.
P.P.S.: The question is purely educational.
You could do it all-in-one without forking sub-shell or running external command:
printf -v MyChar %.1s "${OtherVar^}"
Or:
read -n1 MyChar <<<"${OtherVar^}"
Another option:
declare -u MyChar=${OtherVar:0:1}
But I can't see the point in such optimization in a bash script.
There are more suitable text processing interpreters, like awk, sed, even perl or python if performance matters.
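As a quick check of the fork-free options, here is a small sketch that runs each of them on the question's sample value. The variable names viaPrintf, viaRead and viaDeclare are made up for the comparison. It assumes bash 4.0 or later, which is needed for ${var^} and declare -u.

```shell
#!/usr/bin/env bash
# Requires bash 4.0 or later for ${var^} and declare -u.
OtherVar=foobar

# 1. printf with a precision: %.1s keeps at most one character.
printf -v viaPrintf '%.1s' "${OtherVar^}"

# 2. read -n1 stops after one character of the here-string.
read -r -n1 viaRead <<<"${OtherVar^}"

# 3. declare -u uppercases the value on this and on later assignments.
declare -u viaDeclare=${OtherVar:0:1}

echo "$viaPrintf $viaRead $viaDeclare"   # prints: F F F
```

One design point to keep in mind: declare -u sets an attribute on the variable, so anything assigned to it afterwards is uppercased as well, which may or may not be what you want.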
You could use the cut command and put it in a complex expression to get it on one line, but I'm not sure it makes the code too much clearer:
OtherVar=foobar
MyChar=$(echo "${OtherVar^}" | cut -c1) # uppercase first character and cut string

extract data between similar patterns

I am trying to use sed to print the contents between two patterns including the first one. I was using this answer as a source.
My file looks like this:
>item_1
abcabcabacabcabcabcabcabacabcabcabcabcabacabcabc
>item_2
bcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdb
>item_3
cdecde
>item_4
defdefdefdefdefdefdef
I want it to start at item_2 (inclusive) and stop at the next occurring > (exclusive). So my code is sed -n '/item_2/,/>/{/>/!p;}'.
The result wanted is:
item_2
bcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdb
but I get it without item_2.
Any ideas?
Using awk, split input by >s and print part(s) matching item_2.
$ awk 'BEGIN{RS=">";ORS=""} /item_2/' file
item_2
bcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdb
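One caveat with matching /item_2/ against the whole record is that it also selects records such as item_20, or any record whose body happens to contain that text. A line-by-line sketch that compares the header exactly might look like the following. The function name extract_item and the flag keep are illustrative, and it assumes header lines always start with > in column one.

```shell
# Sketch: print one record from a ">"-headed file, matching the header exactly.
extract_item() {
  # $1 = header name without the leading ">", $2 = input file
  awk -v name="$1" '
    /^>/ { keep = (substr($0, 2) == name)    # new header: is it the wanted one?
           if (keep) print substr($0, 2)     # print it without the ">"
           next }
    keep                                     # print body lines of the wanted record
  ' "$2"
}
```

For the sample file, extract_item item_2 file should print the item_2 header followed by its sequence line.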
I would go for the awk method suggested by oguz for its simplicity. Now if you are interested in a sed way, out of curiosity, you could fix what you have already tried with a minor change:
sed -n '/^>item_2/ s/.// ; //,/>/ { />/! p }' input_file
The empty regex // recalls the previous regex, which is handy here to avoid duplicating /^>item_2/. But keep in mind that // is dynamic: it recalls the last regex evaluated at runtime, which is not necessarily the closest regex to its left (although it often is). Depending on the program flow (branching, address ranges), the content of the same // can change, and here we actually have an interesting example! (And I'm not saying that because it's my baby ^^)
On a line where /^>item_2/ matches, the s/.// command is executed and the latest regex before // becomes /./, so the following address range is equivalent to /./,/>/.
On a line where /^>item_2/ does not match, the latest regex before // is /^>item_2/ so the range is equivalent to /^>item_2/,/>/.
To avoid confusion here as the effect of // changes during execution, it's important to note that an address range evaluates only its left side when not triggered and only its right side when triggered.
This might work for you (GNU sed):
sed -n ':a;/^>item_2/{s/.//;:b;p;n;/^>/!bb;ba}' file
Turn off implicit printing -n.
If a line begins >item_2, remove the first character, print the line and fetch the next line
If that line does not begin with a >, repeat the last two instructions.
Otherwise, repeat the whole set of instructions.
If there will always be only one line following >item_2, then:
sed '/^>item_2/!d;s/.//;n' file

Issue with bash script using SED/AWK for substitution

I have been working on this little script at work to free up my own time and am currently stuck on part of it. The script is supposed to pull some content from a JSON, modify the content, and then re-upload it. The modification part is the portion that doesn't work.
An example of what the content looks like after being extracted from the JSON is:
<p>App1_v1.0_20160911_release.apk</p<p>App2_v2.0_20160915_beta.apk</p><p>App3_v3.0_20150909_VendorRelease.apk</p>
The modification function is supposed to update the list with the newer app filenames in the same location. I've tried using both SED and AWK to get this to work but I haven't gotten anywhere fast.
Here are examples of both commands and the parameters for the substitution I am trying to run on the example file:
old_name=App1_.*_release.apk
new_name=App1_v1.0_20160920_1152_release.apk
sed "s/$old_name/$new_name/" body > upload
awk -v oldname="$old_name" -v newname="$new_name" '{sub(oldname, newname)}1' body > upload
What ends up happening is the substitution will change the correct part of the list, but then nuke everything between that point and the end of the list.
Thank you for any and all help.
PS: If I didn't explain something correctly or you feel some information is missing, please comment and let me know so I can better explain the problem.
There are SO many possible values of oldname, newname, and your input data that could cause either of the commands you wrote to fail. Don't use that "replace a regexp with a backreference-enabled string" approach in any command; use string operations instead (which means you can't use sed, since sed doesn't support literal strings).
This modifies your sample input as you say you want:
$ awk -v new='App1_v1.0_20160920_1152_release.apk' 'BEGIN{RS="</p>\n?"; FS=OFS="<p>"} NR==1{$2=new} {printf "%s%s", $0, RT}' file
<p>App1_v1.0_20160920_1152_release.apk<p>App2_v2.0_20160915_beta.apk</p><p>App3_v3.0_20150909_VendorRelease.apk</p>
If that's not adequate then edit your question to better explain your requirements and provide more truly representative sample input/output.
The above uses GNU awk for multi-char RS and RT.
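To illustrate the "string operations" advice in isolation, here is a minimal sketch that replaces one literal string with another using index() and substr(), so that characters such as . or * in either string carry no special meaning. It assumes the exact old filename is known, which is not quite the question's setup (there, old_name is a pattern). The names replace_literal, OLD_STR and NEW_STR are hypothetical, and the sample data is a well-formed variant of the question's list.

```shell
# Sketch: replace the first literal occurrence of one string with another.
replace_literal() {
  # $1 = literal text to find, $2 = replacement text; reads stdin, writes stdout
  OLD_STR="$1" NEW_STR="$2" awk '
    BEGIN { old = ENVIRON["OLD_STR"]; new = ENVIRON["NEW_STR"] }
    {
      pos = index($0, old)        # plain substring search, no regex involved
      if (pos)
        $0 = substr($0, 1, pos - 1) new substr($0, pos + length(old))
      print
    }'
}

body='<p>App1_v1.0_20160911_release.apk</p><p>App2_v2.0_20160915_beta.apk</p>'
result=$(printf '%s\n' "$body" |
  replace_literal 'App1_v1.0_20160911_release.apk' 'App1_v1.0_20160920_1152_release.apk')
echo "$result"
```

Passing the strings through ENVIRON rather than -v is a deliberate choice here: awk processes backslash escapes in -v assignments, which would alter strings that contain backslashes.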

Awk multiplication gives zero

I am a bit new to using awk. My goal is to create a bash function of the form:
myfunction file column value
That takes the given column number in file, multiplies it by value and rewrites the file. For now I have written the following:
function multiply_column {
file=$1
column=$2
value=$3
awk -F" " '{print $col*mul}' col=$column mul=$value $file
}
My file looks like this:
0.400000E+15 0.168933E+00 -0.180294E-44 0.168933E+00
0.401000E+15 0.167689E+00 -0.181383E-44 0.167689E+00
0.402000E+15 0.166502E+00 -0.182475E-44 0.166502E+00
0.403000E+15 0.165371E+00 -0.183569E-44 0.165371E+00
0.404000E+15 0.164298E+00 -0.184666E-44 0.164298E+00
0.405000E+15 0.163284E+00 -0.185766E-44 0.163284E+00
0.406000E+15 0.162328E+00 -0.186868E-44 0.162328E+00
0.407000E+15 0.161431E+00 -0.187972E-44 0.161431E+00
0.408000E+15 0.160593E+00 -0.189080E-44 0.160593E+00
0.409000E+15 0.159816E+00 -0.190189E-44 0.159816E+00
0.410000E+15 0.159099E+00 -0.191302E-44 0.159099E+00
0.411000E+15 0.158442E+00 -0.192416E-44 0.158442E+00
0.412000E+15 0.157847E+00 -0.193534E-44 0.157847E+00
0.413000E+15 0.157312E+00 -0.194653E-44 0.157312E+00
0.414000E+15 0.156840E+00 -0.195775E-44 0.156840E+00
0.415000E+15 0.156429E+00 -0.196899E-44 0.156429E+00
0.416000E+15 0.156081E+00 -0.198026E-44 0.156081E+00
0.417000E+15 0.155796E+00 -0.199154E-44 0.155796E+00
0.418000E+15 0.155573E+00 -0.200285E-44 0.155573E+00
0.419000E+15 0.155413E+00 -0.201418E-44 0.155413E+00
0.420000E+15 0.155318E+00 -0.202554E-44 0.155318E+00
0.421000E+15 0.155285E+00 -0.203691E-44 0.155285E+00
0.422000E+15 0.155318E+00 -0.204831E-44 0.155318E+00
0.423000E+15 0.155414E+00 -0.205973E-44 0.155414E+00
0.424000E+15 0.155575E+00 -0.207116E-44 0.155575E+00
0.425000E+15 0.155802E+00 -0.208262E-44 0.155802E+00
I managed to just print the first column, but when I multiply it with my value, awk gives me 0. I tried my function with other files where data was formatted differently, and it worked perfectly. I also tried to combine it with bc, without any success.
Does anyone see why in this case awk gives 0 ?
Thanks in advance !
######### EDIT
I just found out that if my data file uses commas and not dots (i.e. 0,400000E+15 instead of 0.400000E+15), my function works fine. So somehow, somewhere, something is configured to understand commas as the scientific notation separator instead of dots. Does that ring a bell to anyone ?
Set LC_ALL=C before executing your script to get the most commonly expected behavior for this and other locale-dependent issues. See http://www.gnu.org/software/gawk/manual/gawk.html#Locales. Also don't pointlessly set FS to its default value, do quote your shell variables (google that if you don't know why), and do fix the way you are setting your variables to use the form that produces the most intuitive results (see http://cfajohnson.com/shell/cus-faq-2.html#Q24):
LC_ALL=C awk -v col="$column" -v mul="$value" '{print $col*mul}' "$file"
Read the book Effective AWK Programming, 4th Edition, by Arnold Robbins.
There is a mismatch between the locale used to create the data file and your current one.
For example the French locale and similar ones use the comma as their decimal separator while the dot is the most widely used, and is also the POSIX default.
If you want commas to be accepted as decimal separators, you might work around the issue like this:
LC_NUMERIC=fr_FR.UTF-8 awk '{print $col*mul}' col="$column" mul="$value" "$file"
Note that this won't work as is with GNU awk, which doesn't honor the numeric locale setting by default. You would need to use the --use-lc-numeric flag to override that.
Alternatively, if you want dots to be accepted as decimal separators but your current locale uses commas and you are not using GNU awk, you can run this:
LC_NUMERIC=C awk '{print $col*mul}' col="$column" mul="$value" "$file"
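Putting the advice above together, the question's function might be sketched as follows. This is one possible shape and not the asker's final code. It assumes dot-decimal data, uses mktemp (widely available, though not in POSIX) for the rewrite step, and keeps the other columns on each line.

```shell
# Sketch: multiply one column of a whitespace-separated file and rewrite the file.
multiply_column() {
  # $1 = file, $2 = column number, $3 = multiplier
  tmp=$(mktemp) || return 1
  # LC_ALL=C asks awk to treat the dot as the decimal separator.
  if LC_ALL=C awk -v col="$2" -v mul="$3" '{ $col = $col * mul } 1' "$1" > "$tmp"; then
    cat "$tmp" > "$1"      # overwrite in place, keeping the original permissions
  fi
  rm -f "$tmp"
}
```

Two side effects worth knowing: assigning to a field makes awk rebuild the line with single spaces between columns, and the multiplied value is written in awk's default number format (for example 0.337866) and no longer in the file's original E notation.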

Replace string with result of command

I have data in zdt format (like this), where I want to perform this python script only on the third column (the pinyin one). I have tried to do this with sed and awk but I have not had any success due to my limited knowledge of these tools. Ideally, I want to feed the column’s contents to the python script and then have the source replaced with the yield of the script.
This is roughly what I envision but the call is not executed, not even when in quotes.
s/([a-z]+[1,2,3,4]?)(?=.*\t)/decode_pinyin(\1)/g
I am not too strict of the tools (sed, awk, python, …) used, I just want a shell script for batch processing of a number of files. It would be best if the original spaces are preserved.
Try something like this:
awk -F'\t' '{printf "decode_pinyin(\"%s\")\n", $3}' file
This outputs:
decode_pinyin("ru4xiang1 sui2su2")
decode_pinyin("ru4")
decode_pinyin("xiang1")
decode_pinyin("sui2")
decode_pinyin("su2")
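If the goal is to put the converted text back into the file and not only to print the calls, one possible approach is to run the converter once over the whole third column and then merge its output back by line number. The sketch below makes several assumptions: the data is tab-separated, every line has a third column, and the converter is a command that reads lines on stdin and writes exactly one output line per input line. The function name convert_column3 is made up, and tr stands in for the real pinyin script in the usage line.

```shell
# Sketch: replace column 3 of a tab-separated file with a filter's output.
convert_column3() {
  # $1 = tab-separated file; remaining arguments = filter command and its options
  file=$1; shift
  tmp=$(mktemp) || return 1
  cut -f3 "$file" | "$@" > "$tmp"          # run the filter once over the whole column
  awk -F'\t' -v OFS='\t' '
    NR == FNR { conv[FNR] = $0; next }     # first file: converted values by line number
    { $3 = conv[FNR]; print }              # second file: swap them into column 3
  ' "$tmp" "$file"
  rm -f "$tmp"
}

# Usage with a stand-in filter (uppercasing, not pinyin decoding):
# convert_column3 data.zdt tr a-z A-Z > data.converted
```

Because only the third field is reassigned and the output separator is also a tab, spaces inside the other columns stay as they were.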
