Break up a string using AWK or PERL

I know you can use field delimiters to break up a field in AWK; however, I have a question regarding a string without any delimiters. I need to process the following data, and I'm not sure how to start:
RyanWehe989987412rwehe#asu.edu2025550126CO2001BlakeStDenver80205
JosephLee605497184josephl#mailinator.com3035550103CO5986BudweiserWayAlamosa81101
AmyJohnson783333251amyj#mailinator.com6515550164MN14N5thStMinneapolis55403
DanielJEverhard314849866everhard#asu.edu5059358554NM8830JohnsonRdAlbuquerque87122
PhilipEPeterson325764011peterson#asu.edu4561238888WA542468thAveLacey98513
MattVNulk124085733nulk#asu.edu2093865442KSManhattanStRiley87512
BrandonTLyons123456123btlyons1#asu.edu5755595459AZ635WElmStMesa85212
RogerATurtle983421567rat#gmail.com8587754321IA3400SWIslanDrdDesmoines50021
MarcJWhiz745629754marcwhiz76#yahoo.com6195323200CA215NCollegeGroveWaySandiego91210
I want to format the raw data into this:
Ryan Wehe, 989-98-7412
2001 Blake St
Denver, CO 80205
wehe#asu.edu
(202) 555-0126
Joseph Lee, 605-49-7184
5986 Budweiser Way
Alamosa, CO 81101
josephl#mailinator.com
(303) 555-0103
AmyJohnson, 783-33-3251
14 N 5th St
Minneapolis, MN 55403
amyj#mailinator.com
(651) 555-0164

To the best of my knowledge, awk provides no facility for using capture groups to define the field separator.
Given that, I think a quick Perl hack might be your best option:
perl -ne '/([A-Z][[:lower:]]*)([A-Z]*[[:lower:]]*)([0-9]{9})(.*?\.\w{2,3})([0-9]{10})(.*?)([0-9]{5})$/ && print "$1 $2 $3 $4 $5 $6 $7\n"' addresses.txt
Which returns this:
Ryan Wehe 989987412 rwehe#asu.edu 2025550126 CO2001BlakeStDenver 80205
Joseph Lee 605497184 josephl#mailinator.com 3035550103 CO5986BudweiserWayAlamosa 81101
Amy Johnson 783333251 amyj#mailinator.com 6515550164 MN14N5thStMinneapolis 55403
Daniel JEverhard 314849866 everhard#asu.edu 5059358554 NM8830JohnsonRdAlbuquerque 87122
Philip EPeterson 325764011 peterson#asu.edu 4561238888 WA542468thAveLacey 98513
Matt VNulk 124085733 nulk#asu.edu 2093865442 KSManhattanStRiley 87512
Brandon TLyons 123456123 btlyons1#asu.edu 5755595459 AZ635WElmStMesa 85212
Roger ATurtle 983421567 rat#gmail.com 8587754321 IA3400SWIslanDrdDesmoines 50021
Marc JWhiz 745629754 marcwhiz76#yahoo.com 6195323200 CA215NCollegeGroveWaySandiego 91210
Your expected output uses both formats (Ryan Wehe, but AmyJohnson), so I was unsure whether you need to break the names apart (i.e. Ryan Wehe instead of RyanWehe); adjusting it either way is fairly straightforward.

Related

How to round the output in shell?

For our webshop we get a CSV file (automatically updated) with product data from the manufacturers.
Some manufacturers quote prices excluding tax and some including it.
I want to change the prices with a shell script to add 21% tax and round them to the nearest .95 or .50.
For example, I get this sheet:
sku|ean|name|type|price_excl_vat|price
EU-123|123123123123|Product name|simple|24.9900
I use this code:
sed -i "1 s/price/price_excl_vat/" inputfile
awk 'BEGIN {FS=OFS="|"} NR==1 {print $0 "|price"; next} {print $0 "|" $5*1.21}' inputfile > outputfile
the output is:
sku|ean|name|type|price_excl_vat|price
EU-123|123123123123|Product name|simple|24.9900|30.2379
How do I round it to the correct price, like below?
sku|ean|name|type|price_excl_vat|price
EU-123|123123123123|Product name|simple|24.9900|29.95

awk to the rescue!
awk 'BEGIN {FS=OFS="|"}
$NF==$NF+0 {a=$NF*1.21;
r=a-int(a);
if (r<0.225) a=a-r-0.05;
else if (r<0.725) a=a-r+0.50;
else a=a-r+0.95;
$(NF+1)=a} 1'
Note that in your example the nearest value to 30.2379 is 30.50. Perhaps you want to round down?
To round down instead of to the nearest value, with the price column held in a variable (k), use the version below. The newly computed value is appended to the end of the row.
awk 'BEGIN {FS=OFS="|"; k=5}
$k==$k+0 {a=$k*1.21;
r=a-int(a);
if (r<0.50) a=a-r-0.05;
else if (r<0.95) a=a-r+0.50;
else a=a-r+0.95;
$(NF+1)=a} 1'
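For reuse, the round-down variant can be wrapped in a shell function. This is a sketch: add_vat_price is a hypothetical name, the column number is passed as the first argument, the 21% rate is hard-coded as in the question, and sprintf("%.2f") is used so the appended price always has two decimals (30.50 rather than 30.5). Rows whose price column is not numeric, such as the header, are passed through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a VAT-inclusive price, rounded DOWN to .50/.95.
# $1   = 1-based number of the column holding the net price
# $2.. = input files (stdin if none)
add_vat_price() {
  local k=$1
  shift
  awk -v k="$k" 'BEGIN { FS = OFS = "|" }
    $k == $k + 0 {                           # only rows with a numeric price
      a = $k * 1.21                          # add 21% VAT (rate assumed fixed)
      r = a - int(a)                         # fractional part
      if (r < 0.50)      a = int(a) - 0.05   # e.g. 30.24 -> 29.95
      else if (r < 0.95) a = int(a) + 0.50   # e.g. 30.73 -> 30.50
      else               a = int(a) + 0.95   # e.g. 30.97 -> 30.95
      $(NF + 1) = sprintf("%.2f", a)         # always two decimals
    }
    1' "$@"                                  # print every row, changed or not
}
```

Usage: add_vat_price 5 inputfile > outputfile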

awk '# define the field separator for input and output
BEGIN{FS=OFS="|"}
# add/overwrite the 6th field with the price label, on the header only
NR==1 && (NF == 5 || NF == 6) { $6 = "price"; print; next}
# add the price including 21% VAT, truncated (not rounded) to 0.01, if missing
NF == 5 { $6 = int( $5 * 121 ) / 100 }
# print the line (modified or not, e.g. empty lines); 7 is just any non-zero, i.e. true, pattern
7
' inputfile \
> outputfile
Self-documented.
I'm not sure about your sed for the header, because your sample already shows a header with price, so take whichever one you want.

Not knowing what your program looks like makes it difficult to give you more information.
However, both awk and bash have a printf command, which can be used for rounding floating-point numbers. (Yes, Bash arithmetic is integer-only, but printf can still treat a string as a decimal number.)
I gave you the link for the C printf function because the one for Bash doesn't include the formatting codes. Read it and weep, because the documentation is a bit dense, and if you've never used printf before, it can be quite difficult to understand. Fortunately, an example will bring things to light:
$ foo="23.42532"
$ printf "%2.2f\n" "$foo"
23.43
All rounded for you! The f means it's a floating-point number. The % marks the beginning of a formatting sequence. In 2.2, the number before the dot is the minimum width of the whole output field (digits, decimal point and all), and the number after the dot is the precision: how many digits to print to the right of the decimal point. Here 23.43 is already wider than 2 characters, so only the precision has a visible effect. If you said %8.2f, printf would make the output at least eight characters wide and left-pad the number with spaces. The \n on the end is the newline character.
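A few calls make the width/precision distinction concrete. This sketch sets LC_ALL=C on the assumption that your locale might otherwise use a comma as the decimal separator, in which case bash's printf would reject 23.42532.

```shell
#!/usr/bin/env bash
# Force "." as the decimal separator so the examples behave the same everywhere.
export LC_ALL=C

foo="23.42532"
printf '%.2f\n' "$foo"       # 23.43      (precision .2: two digits after the point)
printf '[%8.2f]\n' "$foo"    # [   23.43] (width 8: left-padded with spaces)
printf '[%-8.2f]\n' "$foo"   # [23.43   ] (- flag: padded on the right instead)
printf '[%2.2f]\n' "$foo"    # [23.43]    (width 2 is narrower than the result: no effect)

# awk's printf (and sprintf) understand the same format codes.
awk 'BEGIN { printf "%.2f\n", 24.99 * 1.21 }'   # 30.24
```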
Fortunately, although printf can be hard to understand at first, it's pretty much the same in almost all programming languages. It's in awk, Perl, Python, C, Java, and many more languages. And, if the information you need isn't in printf, try the documentation on sprintf which is like printf, but prints the formatted text into a string.
The best documentation I've seen is in the Perl sprintf documentation because it gives you plenty of examples.

Sorting by Date in Shell

Good day. I've been trying to sort the following data from a txt file using a shell script, but as of now I've been unable to do so.
Here is what the data in the file looks like:
Name:ID:Date
Clinton Mcdaniel:100:16/04/2016
Patience Mccarty:101:18/03/2013
Carol Holman:102:24/10/2013
Roth Lamb:103:11/02/2015
Chase Gardner:104:14/06/2014
Jacob Tucker:105:05/11/2013
Maite Barr:106:24/04/2014
Acton Galloway:107:18/01/2013
Helen Orr:108:10/05/2014
Avye Rose:109:07/06/2014
What I want is to be able to sort this by date instead of by name or ID.
When I execute the following code I get this:
Code:
sort -t "/" -k3.9 -k3.4 -k3
Result:
Acton Galloway:107:18/01/2013
Amaya Lynn:149:11/08/2013
Anne Sullivan:190:12/01/2013
Bruno Hood:169:01/08/2013
Cameron Phelps:187:17/11/2013
Carol Holman:102:24/10/2013
Chaney Mcgee:183:11/09/2013
Drew Fowler:173:28/07/2013
Hadassah Green:176:17/01/2013
Jacob Tucker:105:05/11/2013
Jenette Morgan:160:28/11/2013
Lael Aguirre:148:29/05/2013
Lareina Morin:168:06/05/2013
Laura Mercado:171:06/06/2013
Leonard Richard:154:02/06/2013
As you can see, it only sorts by the year; the months and everything else are still out of place. Does anyone know how to correctly sort this by date?
EDIT:
Well, I've found how to do it; answer below:
Code: sort -n -t":" -k3.9 -k3.4,3.5 -k3
Result:
Anne Sullivan:190:12/01/2013
Hadassah Green:176:17/01/2013
Acton Galloway:107:18/01/2013
Nasim Gonzalez:163:18/01/2013
Patience Mccarty:101:18/03/2013
Sacha Stevens:164:01/04/2013
Lareina Morin:168:06/05/2013
Lael Aguirre:148:29/05/2013
Leonard Richard:154:02/06/2013
Laura Mercado:171:06/06/2013
Drew Fowler:173:28/07/2013
Bruno Hood:169:01/08/2013
Virginia Puckett:144:08/08/2013
Moses Mckay:177:09/08/2013
Amaya Lynn:149:11/08/2013
Chaney Mcgee:183:11/09/2013
Willa Bond:153:22/09/2013
Oren Flores:184:27/09/2013
Olga Buckley:181:11/10/2013
Carol Holman:102:24/10/2013
Jacob Tucker:105:05/11/2013
Veda Gillespie:125:09/11/2013
Thor Workman:152:12/11/2013
Cameron Phelps:187:17/11/2013
Jenette Morgan:160:28/11/2013
Mason Contreras:129:29/12/2013
Martena Sosa:158:30/12/2013
Vivian Stevens:146:20/01/2014
Benedict Massey:175:02/03/2014
Macey Holden:127:01/04/2014
Orla Estrada:174:06/04/2014
Maite Barr:106:24/04/2014
Helen Orr:108:10/05/2014
Randall Colon:199:27/05/2014
Avye Rose:109:07/06/2014
Cleo Decker:117:12/06/2014
Chase Gardner:104:14/06/2014
Mark Lynn:113:21/06/2014
Geraldine Solis:197:24/06/2014
Thor Wheeler:180:25/06/2014
Aimee Martin:192:21/07/2014
Gareth Cervantes:166:26/08/2014
Serena Fernandez:122:24/09/2014

The sort you are using will fail for any date before year 2000 (e.g. 1999 will sort after 2098). Continuing from your question in the comment, you currently show
sort -n -t":" -k3.9 -k3.4,3.5 -k3
You should use
sort -n -t":" -k3.7 -k3.4,3.5 -k3.1,3.2
Explanation:
Your -t ':' separates the fields on each colon. The -k option takes a KEYDEF of the form F[.C][OPTS], optionally followed by ,F[.C] for where the key ends (that's field.character, then any ordering options; you need no separate option after the character here, because -n applies to every key). Your date is field 3:
d d / m m / y y y y
1 2 3 4 5 6 7 8 9 0 -- chars counting from 1 in field 3
So you first sort by -k3.9 (from the 9th character in field 3), which is only the last two digits of the 4-digit year. You really want to sort on -k3.7, which is the start of the 4-digit year.
You next sort by the month (characters 4,5) which is fine.
Lastly, you sort on -k3 (which fails to limit the characters considered). Just as you have limited the sort on the month to chars 4,5, you should limit the sort of the days to characters 1,2.
Putting that together gives you sort -n -t":" -k3.7 -k3.4,3.5 -k3.1,3.2. Hope that answers your question from the comment.
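To check the corrected command, here it is wrapped in a small function (sort_by_date is a hypothetical name) and run on a few lines of the sample data. It assumes every date is zero-padded to exactly DD/MM/YYYY, since the keys are character positions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort Name:ID:DD/MM/YYYY lines chronologically.
# -t':'        split fields on colons, so the date is field 3
# -k3.7        year: from character 7 of field 3 to the end of the line
# -k3.4,3.5    month: characters 4-5 of field 3
# -k3.1,3.2    day: characters 1-2 of field 3
# -n           compare every key numerically
sort_by_date() {
  sort -n -t':' -k3.7 -k3.4,3.5 -k3.1,3.2 "$@"
}

sort_by_date <<'EOF'
Clinton Mcdaniel:100:16/04/2016
Patience Mccarty:101:18/03/2013
Roth Lamb:103:11/02/2015
Chase Gardner:104:14/06/2014
Avye Rose:109:07/06/2014
EOF
```

This prints Patience Mccarty first and Clinton Mcdaniel last, with Avye Rose (07/06/2014) ahead of Chase Gardner (14/06/2014), because the day key breaks the tie within June 2014.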

You're hamstrung by your (terrible, IMO) date format. Here's a bit of a Schwartzian transform:
awk -F'[:/]' '{printf "%s%s%s %s\n", $NF, $(NF-1), $(NF-2), $0}' file | sort -n | cut -d' ' -f2-
That extracts the year, month and day and prepends them to each line as a single YYYYMMDD word. Then you can sort numerically, quite simply, and finally discard that date prefix with cut.

Extract text and evaluate in bash

I need some help getting a script up and running. Basically, I have some data that comes from a command's output, and I want to select some of it and evaluate it.
Example data is
JSnow <jsnow#email.com> John Snow spotted 30/1/2015
BBaggins <bbaggins#email.com> Bilbo Baggins spotted 20/03/2015
Batman <batman#email.com> Batman spotted 09/09/2015
So far I have something along the lines of:
# Define date to check
check=$(date -d "-90 days" "+%Y/%m/%d")
# Return user name
for user in $(command | awk '{print $1}')
do
# Return last logon date
lastdate=$(command | awk '{for(i=1;i<=NF;i++) if ($i=="spotted") print $(i+1)}')
# Evaluate date against current -90 days
if $lastdate < $check; then
printf "$user not logged on for ages"
fi
done
I have a couple of problems, not least the fact that whilst I can get the individual pieces of information, I don't know how to go about putting it all together!! I'm also guessing my date evaluation will be more complicated, but at this point that's another problem and is just there to give a better idea of my intentions. If anyone can explain the logical steps needed to achieve my goal as well as propose a solution, that would be great. Thanks

Every time you write a loop in shell just to manipulate text you have the wrong approach (see, for example, https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice). The general purpose text manipulation tool that comes on every UNIX installation is awk. This uses GNU awk for time functions:
$ cat tst.awk
BEGIN { check = systime() - (90 * 24 * 60 * 60) }
{
user = $1
date = gensub(/([0-9]+)\/([0-9]+)\/([0-9]+)/,"\\3 \\2 \\1 0 0 0",1,$NF)
secs = mktime(date)
if (secs < check) {
printf "%s not logged in for ages\n", user
}
}
$ cat file
JSnow <jsnow#email.com> John Snow spotted 30/1/2015
BBaggins <bbaggins#email.com> Bilbo Baggins spotted 20/03/2015
Batman <batman#email.com> Batman spotted 09/09/2015
$ cat file | awk -f tst.awk
JSnow not logged in for ages
BBaggins not logged in for ages
Batman not logged in for ages
Replace cat file with command.
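If GNU awk is not available, one alternative (a sketch, not part of the answer above) is to let GNU date compute the cutoff, as the question already does with date -d, and compare dates as YYYYMMDD integers in any POSIX awk. stale_users is a hypothetical helper; it assumes the date is the last field, in D/M/YYYY form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print users whose last "spotted" date (last field,
# D/M/YYYY) is before a cutoff given as YYYYMMDD in $1.
stale_users() {
  local cutoff=$1
  shift
  awk -v cutoff="$cutoff" '{
    n = split($NF, d, "/")                   # d[1]=day d[2]=month d[3]=year
    if (n != 3) next                         # skip lines without a date
    key = d[3] * 10000 + d[2] * 100 + d[1]   # 30/1/2015 -> 20150130
    if (key < cutoff + 0)
      printf "%s not logged in for ages\n", $1
  }' "$@"
}

stale_users 20150401 <<'EOF'
JSnow <jsnow#email.com> John Snow spotted 30/1/2015
BBaggins <bbaggins#email.com> Bilbo Baggins spotted 20/03/2015
Batman <batman#email.com> Batman spotted 09/09/2015
EOF
```

With a fixed cutoff of 20150401 only JSnow and BBaggins are reported. For the real check, pipe your command into it with the 90-day cutoff: command | stale_users "$(date -d '-90 days' +%Y%m%d)"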

Calculate distance: subtract the first column of the second line from the second column of the first line using awk

I have a question. I have a file with coordinates (TAB separated)
2 10
35 50
90 200
400 10000
...
I would like to subtract the second column of each line from the first column of the next line, i.e. calculate the distance (35 - 10 = 25, 90 - 50 = 40, and so on). I would like a file with
25
40
200
...
How could I do that using awk?
Thank you very much in advance

Here is an awk one-liner that may help you:
kent$ awk 'NR>1{print $1-a}{a=$2}' file
25
40
200

Here's a pure bash solution:
{
read _ ps
while read f s; do
echo $((f-ps))
((ps=s))
done
} < input_file
This only works if you have (small) integers, as it uses bash's arithmetic. If you want to deal with arbitrary sized integers or floats, you can use bc (with only one fork):
{
read _ ps
while read f s; do
printf '%s-%s\n' "$f" "$ps"
ps=$s
done
} < input_file | bc
Now I'll leave it to the others to give an awk answer!
Alright, since nobody wants to upvote my answer, here's a really funny solution that uses bash and bc:
a=( $(<input_file) )
printf -- '-(%s)+(%s);\n' "${a[@]:1:${#a[@]}-2}" | bc
or the same with dc (shorter but doesn't work with negative numbers):
a=( $(<input_file) )
printf '%s %sr-pc' "${a[@]:1:${#a[@]}-2}" | dc

Using sed, with ksh for the evaluation (note that the sed expressions below assume the columns are separated by spaces, not tabs):
sed -n "
1x
1!H
$ !b
x
s/^ *[0-9]\{1,\} \(.*\) [0-9]\{1,\} *\n* *$/\1 /
s/\([0-9]\{1,\}\)\(\n\)\([0-9]\{1,\}\) /echo \$((\3 - \1))\2/g
s/\n *$//
w /tmp/Evaluate.me
" input_file
. /tmp/Evaluate.me
rm /tmp/Evaluate.me

Bash shell scripting - Error setting variables

I'm new at bash scripting. I tried the following:
filename01 = ''
if [ $# -eq 0 ]
then
filename01 = 'newList01.txt'
else
filename01 = $1
fi
I get the following error:
./smallScript02.sh: line 9: filename01: command not found
./smallScript02.sh: line 13: filename01: command not found
I imagine that I am not treating the variables correctly, but I don't know how. Also, I am trying to use grep to extract the second and third words from a text file. The file looks like:
1966 Bart Starr QB Green Bay Packers
1967 Johnny Unitas QB Baltimore Colts
1968 Earl Morrall QB Baltimore Colts
1969 Roman Gabriel QB Los Angeles Rams
1970 John Brodie QB San Francisco 49ers
1971 Alan Page DT Minnesota Vikings
1972 Larry Brown RB Washington Redskins
Any help would be appreciated.

When you assign variables in bash, there should be no spaces on either side of the = sign.
# good
filename01="newList01.txt"
# bad
filename01 = "newList01.txt"
For your second problem, use awk, not grep. The following will extract the second and third items from each line of a file whose name is stored in $filename01:
< "$filename01" awk '{print $2, $3}'

In bash (and other Bourne-type shells), you can use a default value if a variable is empty or unset:
filename01=${1:-newList01.txt}
I'd recommend spending some time with the bash manual: http://www.gnu.org/software/bash/manual/bashref.html
Here's a way to extract the name:
while read -r first second third rest; do
echo "$second $third"
done < "$filename01"
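Putting the two answers together, a minimal sketch of the whole script might look like this. print_names is a hypothetical function name, and awk's default splitting on whitespace is assumed to be what you want.

```shell
#!/usr/bin/env bash
# Hypothetical helper combining both answers.
print_names() {
  # Use the first argument as the file name, or newList01.txt if none is given.
  # Note: no spaces around "=" in an assignment.
  local filename01=${1:-newList01.txt}
  # Print the 2nd and 3rd whitespace-separated words (first and last name).
  awk '{ print $2, $3 }' "$filename01"
}
```

In a script you would call it as print_names "$1", so that running the script without an argument falls back to newList01.txt.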
