I am a bit new to using awk. My goal is to create a bash function of the form:
myfunction file column value
That takes the given column number in file, multiplies it by value and rewrites the file. For now I have written the following:
function multiply_column {
file=$1
column=$2
value=$3
awk -F" " '{print $col*mul}' col=$column mul=$value $file
}
My file looks like this:
0.400000E+15 0.168933E+00 -0.180294E-44 0.168933E+00
0.401000E+15 0.167689E+00 -0.181383E-44 0.167689E+00
0.402000E+15 0.166502E+00 -0.182475E-44 0.166502E+00
0.403000E+15 0.165371E+00 -0.183569E-44 0.165371E+00
0.404000E+15 0.164298E+00 -0.184666E-44 0.164298E+00
0.405000E+15 0.163284E+00 -0.185766E-44 0.163284E+00
0.406000E+15 0.162328E+00 -0.186868E-44 0.162328E+00
0.407000E+15 0.161431E+00 -0.187972E-44 0.161431E+00
0.408000E+15 0.160593E+00 -0.189080E-44 0.160593E+00
0.409000E+15 0.159816E+00 -0.190189E-44 0.159816E+00
0.410000E+15 0.159099E+00 -0.191302E-44 0.159099E+00
0.411000E+15 0.158442E+00 -0.192416E-44 0.158442E+00
0.412000E+15 0.157847E+00 -0.193534E-44 0.157847E+00
0.413000E+15 0.157312E+00 -0.194653E-44 0.157312E+00
0.414000E+15 0.156840E+00 -0.195775E-44 0.156840E+00
0.415000E+15 0.156429E+00 -0.196899E-44 0.156429E+00
0.416000E+15 0.156081E+00 -0.198026E-44 0.156081E+00
0.417000E+15 0.155796E+00 -0.199154E-44 0.155796E+00
0.418000E+15 0.155573E+00 -0.200285E-44 0.155573E+00
0.419000E+15 0.155413E+00 -0.201418E-44 0.155413E+00
0.420000E+15 0.155318E+00 -0.202554E-44 0.155318E+00
0.421000E+15 0.155285E+00 -0.203691E-44 0.155285E+00
0.422000E+15 0.155318E+00 -0.204831E-44 0.155318E+00
0.423000E+15 0.155414E+00 -0.205973E-44 0.155414E+00
0.424000E+15 0.155575E+00 -0.207116E-44 0.155575E+00
0.425000E+15 0.155802E+00 -0.208262E-44 0.155802E+00
I managed to just print the first column, but when I multiply it with my value, awk gives me 0. I tried my function with other files where data was formatted differently, and it worked perfectly. I also tried to combine it with bc, without any success.
Does anyone see why awk gives 0 in this case?
Thanks in advance !
######### EDIT
I just found out that if my data file uses commas instead of dots (i.e. 0,400000E+15 instead of 0.400000E+15), my function works fine. So something, somewhere, is configured to treat the comma rather than the dot as the decimal separator. Does that ring a bell for anyone?
Set LC_ALL=C before executing your script to get the most commonly expected behavior for this and other locale-dependent issues. See http://www.gnu.org/software/gawk/manual/gawk.html#Locales. Also, don't pointlessly set FS to its default value, do quote your shell variables (google that if you don't know why), and do set your awk variables with -v, the form that produces the most intuitive results (see http://cfajohnson.com/shell/cus-faq-2.html#Q24):
LC_ALL=C awk -v col="$column" -v mul="$value" '{print $col*mul}' "$file"
Read the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
There is a mismatch between the locale used to create the data file and your current one.
For example, the French locale and similar ones use the comma as their decimal separator, while the dot is the most widely used separator and is also the POSIX default.
If you want commas to be accepted as decimal separators, you might work around the issue like this:
LC_NUMERIC=fr_FR.UTF-8 awk '{print $col*mul}' col="$column" mul="$value" "$file"
Note that this won't work as is with GNU awk which doesn't honor the numeric locale setting by default. You would need to use the --use-lc-numeric flag to override.
Alternatively, if you want dots to be accepted as decimal separators but your current locale uses commas and you are not using GNU awk, you can run this:
LC_NUMERIC=C awk '{print $col*mul}' col="$column" mul="$value" "$file"
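Neither command above rewrites the file, which was the original goal. A minimal sketch of the whole function, assuming the data uses dots, that a temporary file from mktemp is acceptable, and that the result should be printed as %.6E (that output format is an assumption, not something stated in the question):

```shell
# Sketch: multiply one column of a whitespace-separated file and rewrite the file.
multiply_column() {
    local file=$1 column=$2 value=$3 tmp
    tmp=$(mktemp) || return 1
    # LC_ALL=C applies to this awk call only, so "." is read as the decimal point.
    # Assigning to $col rebuilds the line with single spaces between fields.
    # The "%.6E" output format is an assumption made for this sketch.
    if LC_ALL=C awk -v col="$column" -v mul="$value" \
        '{ $col = sprintf("%.6E", $col * mul) } 1' "$file" > "$tmp"
    then
        # Copy back instead of mv so the original file keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Calling multiply_column data.txt 2 10 would then multiply the second column of data.txt by 10 (data.txt is a placeholder name).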
Starting Question
I have a CSV file which is formed this way (variable.csv)
E,F,G,H,I,J
a1,
,b2,b3
c1,,,c4,c5,c6
As you can see, the first and second columns do not have all the commas needed. Here's what I want:
E,F,G,H,I,J
a1,,,,,
,b2,b3,,,
c1,,,c4,c5,c6
With this, now every row has the right number of columns. In other words, I'm looking for a unix command which smartly appends the correct number of commas to the end of each row to make the row have the number of columns that we expect, based off the header.
Here's what I tried, based off of some searching:
awk -F, -v OFS=, 'NF=6' variable.csv. This works in the above case, BUT...
Final Question
...Can we have this command work if the column data contains commas itself, or even new line characters? e.g.
E,F,G,H,I,J
"a1\n",
,b2,"b3,3"
c1,,,c4,c5,c6
to
E,F,G,H,I,J
"a1\n",,,,,
,b2,"b3,3",,,
c1,,,c4,c5,c6
(Apologies if this example's formatting is malformed due to the way the newline is represented.)
Short answer:
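python3 -c 'import fileinput,sys,csv;b=list(csv.reader(fileinput.input()));w=max(len(i)for i in b);csv.writer(sys.stdout,lineterminator="\n").writerows(i+[""]*(w-len(i))for i in b)' variable.csv
The python script may be long, but this is to ensure that all cases are handled. Writing the rows back through csv.writer, rather than joining them with commas, keeps fields such as "b3,3" quoted. To break it down:
import fileinput,sys,csv
b=list(csv.reader(fileinput.input())) # read every row with a CSV-aware parser
w=max(len(i)for i in b) # how many fields? (the widest row)
csv.writer(sys.stdout,lineterminator="\n").writerows(i+[""]*(w-len(i))for i in b) # pad each row and write it, re-quoting where needed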
BTW, in your starting problem
awk -F, -v OFS=, 'NF<6{$6=""}1' variable.csv
should work. (I think it's implementation or version related. Your code works on GNU awk but not on Mac version.)
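For the simple case, the field count does not have to be hard-coded as 6. A sketch that takes it from the header line and avoids assigning to NF, so it does not depend on how a given awk handles that; it assumes no quoted commas or embedded newlines, and pad_csv is just an illustrative name:

```shell
# Sketch: pad every row with empty fields up to the header's field count.
pad_csv() {
    awk -F, -v OFS=, '
        NR == 1 { want = NF }                            # header defines the width
        { for (i = NF + 1; i <= want; i++) $i = "" }     # append missing fields
        1                                                # print the (possibly rebuilt) row
    ' "$@"
}
```

Usage would be pad_csv variable.csv, or any command piped into pad_csv.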
I'd like to 're-sequence' some variable assignment values that are within a large BASH script I'm writing. At present, I have to do this manually, and it's quite time-consuming. ;)
e.g.:
(some code here)
ab=0
(and some here too)
ab=3
(more code here)
cd=2; ab=1
(more code here)
ab=2
What I'd like to do is run a command that can re-order the assignment values of 'ab' so we get:
(some code here)
ab=0
(and some here too)
ab=1
(more code here)
cd=2; ab=2
(more code here)
ab=3
The indentations exist as these usually form part of a code block, like an 'if' or 'for' block.
The variable name will always be the same. The first occurrence in the script should be made a zero. I thought if something (like sed) could search for 'ab=' followed by an integer, then change that integer according to an incrementing value, this would be perfect.
Hoping someone out there may know of something that can do this already. I use 'Kate' for my BASH editing.
Any thoughts? Thank you.
$ # can also use: perl -pe 's/\bab=\K\d+/$i++/ge' file
$ perl -pe 's/(\bab=)\d+/$1.$i++/ge' file
(some code here)
ab=0
(and some here too)
ab=1
(more code here)
cd=2; ab=2
(more code here)
ab=3
(\bab=)\d+ matches ab= followed by one or more digits. \b is a word boundary marker, so that words like dab=4 don't match.
The e modifier allows Perl code to be used in the replacement section.
$1.$i++ is the string concatenation of ab= and the value of $i (which is 0 by default). Then $i gets incremented.
Use perl -i -pe for in-place editing.
@teracoy, try:
awk '/ab=/{sub(/ab=[0-9]+/,"ab="i++);print;next} 1' Input_file
With GNU awk for multi-char RS, RT, and gensub():
$ awk -v RS='\\<ab=[0-9]+' '{ORS=gensub(/[0-9]+/,i++,1,RT)}1' file
(some code here)
ab=0
(and some here too)
ab=1
(more code here)
cd=2; ab=2
(more code here)
ab=3
Use awk -i inplace ... for inplace editing if desired.
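Where neither Perl nor GNU awk is available, the same idea can be sketched with match() and substr() from POSIX awk. This version handles several assignments on one line and skips names such as dab=4. The function name resequence and its two parameters are illustrative, and the variable name is assumed to be a plain identifier with no regexp metacharacters:

```shell
# Sketch: renumber every "<name>=<integer>" in order of appearance, starting at 0.
resequence() {
    # $1: variable name (plain identifier assumed), $2: file to read
    awk -v name="$1" '
    {
        out = ""; rest = $0
        re = name "=[0-9]+"
        while (match(rest, re)) {
            before = out substr(rest, 1, RSTART - 1)
            prev = substr(before, length(before), 1)   # character just before the match
            if (prev ~ /[A-Za-z0-9_]/)
                out = before substr(rest, RSTART, RLENGTH)   # part of a longer name: keep
            else
                out = before name "=" (n++)                  # renumber this assignment
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$2"
}
```

The result goes to standard output; redirect it to a new file and move that over the original to edit in place.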
I have been working on this little script at work to free up my own time and am currently stuck on part of it. The script is supposed to pull some content from a JSON, modify the content, and then re-upload it. The modification part is the portion that doesn't work.
An example of what the content looks like after being extracted from the JSON is:
<p>App1_v1.0_20160911_release.apk</p<p>App2_v2.0_20160915_beta.apk</p><p>App3_v3.0_20150909_VendorRelease.apk</p>
The modification function is supposed to update the list with the newer app filenames in the same location. I've tried using both SED and AWK to get this to work but I haven't gotten anywhere fast.
Here are examples of both commands and the parameters for the substitution I am trying to run on the example file:
old_name=App1_.*_release.apk
new_name=App1_v1.0_20160920_1152_release.apk
sed "s/$old_name/$new_name/" body > upload
awk -v oldname="$old_name" -v newname="$new_name" '{sub(oldname, newname)}1' body > upload
What ends up happening is the substitution will change the correct part of the list, but then nuke everything between that point and the end of the list.
Thank you for any and all help.
PS: If I didn't explain something correctly or you feel some information is missing, please comment and let me know so I can better explain the problem.
There are SO many possible values of oldname, newname, and your input data that could cause either of the commands you wrote to fail. Don't use that "replace a regexp with a backreference-enabled string" approach in any command; use string operations instead (which means you can't use sed, since sed has no literal-string operations).
This modifies your sample input as you say you want:
$ awk -v new='App1_v1.0_20160920_1152_release.apk' 'BEGIN{RS="</p>\n?"; FS=OFS="<p>"} NR==1{$2=new} {printf "%s%s", $0, RT}' file
<p>App1_v1.0_20160920_1152_release.apk<p>App2_v2.0_20160915_beta.apk</p><p>App3_v3.0_20150909_VendorRelease.apk</p>
If that's not adequate then edit your question to better explain your requirements and provide more truly representative sample input/output.
The above uses GNU awk for multi-char RS and RT.
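One likely cause of the symptom in the question is that .* is greedy: it extends to the last _release.apk on the line, not the first. The sketch below uses made-up input containing two release entries to show the effect, and a bracket expression that cannot cross a tag as one way to bound the match if a regexp is kept anyway. The variable names are illustrative, and the new name must not contain / or & for the sed replacement to be safe:

```shell
# Hypothetical input: two entries end in "_release.apk".
body='<p>App1_v1.0_20160911_release.apk</p><p>App2_v2.0_20160911_release.apk</p>'
new_name='App1_v1.0_20160920_1152_release.apk'

old_greedy='App1_.*_release\.apk'       # .* runs to the LAST "_release.apk"
old_bounded='App1_[^<]*_release\.apk'   # [^<]* stops at the next tag

printf '%s\n' "$body" | sed "s/$old_greedy/$new_name/"
# prints: <p>App1_v1.0_20160920_1152_release.apk</p>

printf '%s\n' "$body" | sed "s/$old_bounded/$new_name/"
# prints: <p>App1_v1.0_20160920_1152_release.apk</p><p>App2_v2.0_20160911_release.apk</p>
```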
I have a CSV file with several thousand lines, and I need to take some of the columns in that file to create another CSV file to use for import to a database.
I'm not in shape with shell scripting anymore, is there anyone who can help with pointing me in the correct direction?
I have a bash script to read the source file but when I try to print the columns I want to a new file it just doesn't work.
while IFS=, read symbol tr_ven tr_date sec_type sec_name name
do
echo "$name,$name,$symbol" >> output.csv
done < test.csv
Above is the code I have. Out of the 6 columns in the original file, I want to build a CSV with "column6, column6, column1"
The test CSV file is like this:
Symbol,Trading Venue,Trading Date,Security Type,Security Name,Company Name
AAAIF,Grey Market,22/01/2015,Fund,,Alternative Investment Trust
AAALF,Grey Market,22/01/2015,Ordinary Shares,,Aareal Bank AG
AAARF,Grey Market,22/01/2015,Ordinary Shares,,Aluar Aluminio Argentino S.A.I.C.
What am I doing wrong with my script? Or, is there an easier - and faster - way of doing this?
Edit
These are the real headers:
Symbol,US Trading Venue,Trading Date,OTC Tier,Caveat Emptor,Security Type,Security Class,Security Name,REG_SHO,Rule_3210,Country of Domicile,Company Name
I'm trying to get the last column, which is number 12, but it always comes up empty.
The snippet looks and works fine to me, maybe you have some weird characters in the file or it is coming from a DOS environment (use dos2unix to "clean" it!). Also, you can make use of read -r to prevent strange behaviours with backslashes.
But let's see how awk can solve this even faster:
awk 'BEGIN{FS=OFS=","} {print $6,$6,$1}' test.csv >> output.csv
Explanation
BEGIN{FS=OFS=","} this sets the input and output field separators to the comma. Alternatively, you can say -F",", -F, or pass it as a variable with -v FS=",". OFS can likewise be set with -v OFS=",".
{print $6,$6,$1} prints the 6th field twice and then the 1st one. Note that using print, every comma-separated parameter that you give will be printed with the OFS that was previously set. Here, with a comma.
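For the 12-column file from the edit, $NF picks the last column without counting fields. If the file has DOS line endings, the last field ends in a carriage return, which can make it look empty or garbled on a terminal. A sketch that strips it first (the function name last_last_first is illustrative):

```shell
# Sketch: print "last column, last column, first column" for each line.
last_last_first() {
    awk 'BEGIN { FS = OFS = "," }
         { sub(/\r$/, "") }        # drop a trailing carriage return, if present
         { print $NF, $NF, $1 }' "$@"
}
```

Usage would be last_last_first test.csv > output.csv. Note that this still splits on every comma, so it assumes no field contains a quoted comma.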
I have data in zdt format (like this), where I want to perform this python script only on the third column (the pinyin one). I have tried to do this with sed and awk but I have not had any success due to my limited knowledge of these tools. Ideally, I want to feed the column’s contents to the python script and then have the source replaced with the yield of the script.
This is roughly what I envision but the call is not executed, not even when in quotes.
s/([a-z]+[1,2,3,4]?)(?=.*\t)/decode_pinyin(\1)/g
I am not too strict about the tools used (sed, awk, python, …); I just want a shell script for batch processing of a number of files. It would be best if the original spaces are preserved.
Try something like this:
awk -F'\t' '{printf "decode_pinyin(\"%s\")\n", $3}' file
This outputs:
decode_pinyin("ru4xiang1 sui2su2")
decode_pinyin("ru4")
decode_pinyin("xiang1")
decode_pinyin("sui2")
decode_pinyin("su2")
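To actually replace the column rather than print calls, one option is to pipe the third column through the converter and merge its output back in. The sketch below assumes the converter can run as a filter that emits exactly one output line per input line, and that every line has at least three tab-separated columns. The function name replace_column3 is illustrative:

```shell
# Sketch: run column 3 of a tab-separated file through a filter command.
replace_column3() {
    # $1: input file; remaining arguments: the filter command and its options
    local file=$1 tmp
    shift
    tmp=$(mktemp) || return 1
    cut -f3 "$file" | "$@" > "$tmp"          # converted column, one line per row
    awk -F'\t' -v OFS='\t' -v conv="$tmp" '
        { if ((getline new < conv) > 0) $3 = new }   # swap in the converted value
        1
    ' "$file"
    rm -f "$tmp"
}
```

For example, replace_column3 words.zdt python3 decode_pinyin.py > words.new, where both file names are placeholders. Because the field separator is a tab, spaces inside a column are left untouched.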