I would like to write a Unix script that produces the following result:
textfile1 contains the following text:
keyval1,1
keyval1,2
keyval1,3
keyval1,4
keyval2,1
keyval2,2
keyval3,1
keyval4,1
keyval4,3
keyval4,4
Expected result:
keyval1 (1,2,3,4)
keyval2 (1,2)
keyval3 (1)
keyval4 (1,3,4)
Thank you.
I'm new to unix and this is what I have done so far. It's not working yet though :(
#!/bin/ksh
f1 = 'cut -d "," -f 1 keyval.txt'
f2 = 'cut -d "," -f 2 keyval.txt'
while f1 <> f2
do
echo f1 "("f2")"
done > output.txt
You can do this in a breeze using AWK:
#!/usr/bin/awk -f
BEGIN {
FS = ","
closeBracket = ""
}
{
if (key != $1)
{
key = $1
printf "%s%s (%s", closeBracket, key, $2
}
else
{
printf ",%s", $2
}
closeBracket = ")\n"
}
END {
printf "%s", closeBracket
}
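Note that the script above relies on all lines for a given key being adjacent in the input. If that cannot be guaranteed, a minimal sketch along these lines collects the values per key and prints the keys in first-seen order (the function name group_by_key and the array names are just illustrative, not part of the original answer):

```shell
# Hypothetical helper: group "key,value" lines by key, even when unsorted.
group_by_key() {
  awk -F, '
    # First time we see a key: remember its position and start its value list.
    !($1 in vals) { order[++n] = $1; vals[$1] = $2; next }
    # Key seen before: append the value to its list.
    { vals[$1] = vals[$1] "," $2 }
    END {
      for (i = 1; i <= n; i++)
        printf "%s (%s)\n", order[i], vals[order[i]]
    }
  ' "$@"
}
```

Usage would be something like group_by_key textfile1 > output.txt. The trade-off is that all values are held in memory until the end of the input.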
A bit late to the party, but I had this one lying around, almost:
#!/usr/bin/perl
while (<>)
{
/^(.+),(.+?)\s*$/ or next;
push @{$h{$1}}, $2;
}
print map {"$_ (" . join(',', @{$h{$_}}) . ")\n"} sort keys %h;
Not particularly beautiful, but it gets the job done.
Below are the contents of the data file (results):
13450708,13470474,US,15
24954,24845,JPN,44
14258992,14365059,US,4
24954,24845,IND,44
I want to send the above data sets by email in a tabular format. For that I am using the awk script below.
The challenge I am facing is this: I want to make the background color red if the last field of a data set (here 15, 44, 4, 44) is greater than 40.
Can you please tell me how to do that in the code below?
awk 'BEGIN{
FS=","
print "<HTML>""<TABLE border="1"><TH>Store_count</TH><TH>Store_sold</TH><TH>Store_code</TH><TH>Backlogs</TH>"
}
{
printf "<TR>"
for(i=1;i<=NF;i++)
printf "<TD>%s</TD>", $i
print "</TR>"
}
END{
print "</TABLE></BODY></HTML>"
}
' results > file1.html
I really don't understand why you're struggling with this since you seem to already have all of the information to do what you want, but anyway - just change:
printf "<TD>%s</TD>", $i
to
printf "<TD%s>%s</TD>", ( (i==NF) && ($i > 40) ? " style=\"background-color:red\"" : "" ), $i
or if you don't like ternary expressions:
printf "<TD"
if ( (i==NF) && ($i > 40) ) {
printf " style=\"background-color:red\""
}
printf ">%s</TD>", $i
or similar.
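For reference, here is a sketch of how the whole script might look with the ternary version dropped in. The function name results_to_html and the limit variable are made up for illustration, and the opening BODY tag and the TR around the header cells are additions, so adjust to taste:

```shell
# Hypothetical wrapper: render comma-separated rows as an HTML table,
# colouring the last cell red when its value exceeds "limit".
results_to_html() {
  awk -F, -v limit=40 '
    BEGIN {
      print "<HTML><BODY><TABLE border=\"1\"><TR><TH>Store_count</TH><TH>Store_sold</TH><TH>Store_code</TH><TH>Backlogs</TH></TR>"
    }
    {
      printf "<TR>"
      for (i = 1; i <= NF; i++)
        printf "<TD%s>%s</TD>", ((i == NF && $i > limit) ? " style=\"background-color:red\"" : ""), $i
      print "</TR>"
    }
    END { print "</TABLE></BODY></HTML>" }
  ' "$@"
}
```

You would call it as results_to_html results > file1.html. Passing the threshold in with -v keeps the number 40 out of the awk body, which makes it easier to change later.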
I realized where I had made the mistake. For me, the code below gives the expected results:
for(i=1;i<=NF;i++)
if (i ==4 && $4 >= 40)
{
printf "%s", $i
}
else
{
printf "%s", $i
}
print ""
}
END{
print ""
}
' results1 > file1.html
I would like to find the contiguous ranges given a set of dates by day
given the following sample
2016-01-01
2016-01-02
2016-01-03
2016-01-04
2016-01-05
2016-01-06
2016-01-08
2016-01-09
2016-01-10
2016-01-11
2016-01-12
2016-01-15
2016-01-16
2016-01-17
2016-01-20
2016-01-21
2016-01-30
2016-01-31
2016-02-01
I expect the following result
2016-01-01-2016-01-06
2016-01-08-2016-01-12
2016-01-15-2016-01-17
2016-01-20-2016-01-21
2016-01-30-2016-01-31
2016-02-01-2016-02-01
I have already come across this question, which is almost the opposite of what I want, but with integers.
I have formulated the following which works with integers.
awk 'NR==1 {l=$1; n=$1} {if ($1==n){n=$1+1} else{print l"-"n-1; l=$1 ;n=$1+1} } END {print l"-"$1}' file.txt
With GNU awk for mktime():
$ cat tst.awk
BEGIN { FS=OFS="-" }
{ currSecs = mktime( $1" "$2" "$3" 0 0 0" ) }
(currSecs - prevSecs) > (24*60*60) {
if (NR>1) {
print startDate, prevDate
}
startDate = $0
}
{ prevSecs = currSecs; prevDate = $0 }
END { print startDate, prevDate }
$ awk -f tst.awk file
2016-01-01-2016-01-06
2016-01-08-2016-01-12
2016-01-15-2016-01-17
2016-01-20-2016-01-21
2016-01-30-2016-02-01
With any awk if you don't care about ranges restarting when months change (as apparent in your expected output and the comment under your question):
$ cat tst.awk
BEGIN { FS=OFS="-" }
{ currYrMth = $1 FS $2; currDay = $3 }
(currYrMth != prevYrMth) || ((currDay - prevDay) > 1) {
if (NR>1) {
print startDate, prevDate
}
startDate = $0
}
{ prevYrMth = currYrMth; prevDay = currDay; prevDate = $0 }
END { print startDate, prevDate }
$ awk -f tst.awk file
2016-01-01-2016-01-06
2016-01-08-2016-01-12
2016-01-15-2016-01-17
2016-01-20-2016-01-21
2016-01-30-2016-01-31
2016-02-01-2016-02-01
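If you do want ranges to continue across month (and year) boundaries but don't have GNU awk's mktime(), one option is to convert each date to a day count yourself. The sketch below uses the standard Julian Day Number formula for Gregorian dates; the names date_ranges and jdn are illustrative only, and it assumes the input is sorted, one YYYY-MM-DD date per line:

```shell
# Hypothetical helper: print contiguous date ranges using any POSIX awk.
date_ranges() {
  awk -F- '
    # Julian Day Number of a Gregorian calendar date (integer arithmetic).
    function jdn(y, m, d,    a, y2, m2) {
      a  = int((14 - m) / 12)
      y2 = y + 4800 - a
      m2 = m + 12 * a - 3
      return d + int((153 * m2 + 2) / 5) + 365 * y2 \
             + int(y2 / 4) - int(y2 / 100) + int(y2 / 400) - 32045
    }
    {
      curr = jdn($1, $2, $3)
      # Start a new range on the first line or whenever a day is skipped.
      if (NR == 1 || curr - prev != 1) {
        if (NR > 1) print start "-" last
        start = $0
      }
      prev = curr
      last = $0
    }
    END { if (NR > 0) print start "-" last }
  ' "$@"
}
```

Run it as date_ranges file. With the sample input the final range comes out as 2016-01-30-2016-02-01, since 2016-02-01 directly follows 2016-01-31.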
If you have GNU Awk you can use its time functions.
gawk -F - 'NR==1 || $1 "-" $2 "-" $3 != following {
if (following != "") print start "-" latest;
start = $1 "-" $2 "-" $3
this = mktime($1 " " $2 " " $3 " 0 0 0")
}
{
this += 24*60*60
following = strftime("%F", this)
latest = $1 "-" $2 "-" $3 }
END { if (start != latest) print start "-" latest }' filename
Unit ranges will print like "2016-04-15-2016-04-15" which is a bit of a wart, but easy to fix if you need to. Also the END block has a bug in this case, but again, this should at least get you started.
gawk:
#!/bin/awk -f
BEGIN{
FS="-"
}
{
a[NR]=mktime($1" "$2" "$3" 0 0 0")
b[NR]=$2;
if ( (a[NR-1]+86400) != a[NR] || b[NR-1]!=b[NR] ) {
if(NR!=1){
print s" - "strftime("%Y-%m-%d",a[NR-1])
};
s=$0
}
}
END{
print s" - "$0
}
Array a is indexed by NR and holds the epoch time derived from $0 using the gawk time function mktime.
Array b is indexed by NR and holds the month from $2.
If the previous line's epoch time + 86400 (one day) does not equal the current line's epoch time, or the month differs between the previous and current lines, then (except on the first line) print s" - "strftime("%Y-%m-%d",a[NR-1]) and reset s, the start date, to $0.
END:
Print the last start date s and the last line.
So I am trying to write a bash script to check whether all values in a data set are within a certain margin of the average.
So far:
#!/bin/bash
cat massbuild.csv
while IFS=, read col1 col2
do
x=$(grep "$col2" $col1.pdb | grep "HETATM" | awk '{ sum += $7; n++ } END { if (n > 0) print sum / n; }')
i=$(grep "$col2" $col1.pdb | grep "HETATM" | awk '{print $7;}')
if $(($i > $[$x + 15])); then
echo "OUTSIDE THE RANGE!"
fi
done < massbuild.csv
I have broken it down into components to test, and found that the values of x and i are read correctly, but adding 15 to x, or the comparison with i, doesn't seem to work.
I have read around online and I am stumped =/
Without sample input and expected output we're just guessing but MAYBE this is the right starting point for your script (untested, of course, since no in/out provided):
#!/bin/bash
awk -F, '
NR==FNR {
file = $1 ".pdb"
if (!(file in file2col2s)) ARGV[ARGC++] = file
file2col2s[file] = (file2col2s[file] != "" ? file2col2s[file] FS : "") $2
next
}
FNR==1 { split(file2col2s[FILENAME],col2s) }
/HETATM/ {
for (i=1;i in col2s;i++) {
col2 = col2s[i]
if ($0 ~ col2) {
sum[FILENAME,col2] += $7
cnt[FILENAME,col2]++
}
}
}
END {
for (file in file2col2s) {
split(file2col2s[file],col2s)
for (i=1;i in col2s;i++) {
col2 = col2s[i]
print sum[file,col2]
print cnt[file,col2]
}
}
}
' massbuild.csv
Does this help?
a=4; b=0; if [ "$a" -lt "$(( $b + 5 ))" ]; then echo "a < b + 5"; else echo "a >= b + 5"; fi
Ref: http://www.tldp.org/LDP/abs/html/comparison-ops.html
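One caveat: shell arithmetic such as $(( ... )) and -lt only handles integers, and an average will usually have a fractional part. A rough sketch of doing the "within a margin of the average" check entirely in awk instead (the function name outliers is made up here, and it assumes the values arrive one per line on stdin):

```shell
# Hypothetical helper: report values further than "margin" from the mean.
# $1 = margin; values are read one per line from stdin.
outliers() {
  awk -v margin="$1" '
    { v[NR] = $1; sum += $1 }          # remember every value, keep a running sum
    END {
      if (NR == 0) exit                # no input, nothing to report
      avg = sum / NR
      for (i = 1; i <= NR; i++)
        if (v[i] > avg + margin || v[i] < avg - margin)
          print v[i], "OUTSIDE THE RANGE!"
    }
  '
}
```

In your loop that would be something like grep "$col2" "$col1.pdb" | grep "HETATM" | awk '{print $7}' | outliers 15. The values have to be stored because the average is only known after the last line has been read.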
I have a file in the following format
id_1,1,0,2,3,lable1
id_2,3,2,2,1,lable1
id_3,5,1,7,6,lable1
and I want the sum of each column (I have over 300 columns):
9,3,11,10,lable1
How can I do that using bash?
I tried what is described here, but it didn't work.
Using awk:
$ awk -F, '{for (i=2;i<NF;i++)a[i]+=$i}END{for (i=2;i<NF;i++) printf a[i]",";print $NF}' file
9,3,11,10,lable1
This will print the sum of each column (from i=2 to i=NF-1) of a comma-separated file, followed by the value of the last column from the last row (i.e. lable1).
If the totals would need to be grouped by the label in the last column, you could try this:
awk -F, '
{
L[$NF]
for(i=2; i<NF; i++) T[$NF,i]+=$i
}
END{
for(i in L){
s=i
for(j=NF-1; j>1; j--) s=T[i,j] FS s
print s
}
}
' file
If the labels in the last column are sorted then you could try without arrays and save memory:
awk -F, '
function labelsum(){
s=p
for(i=NF-1; i>1; i--) s=T[i] FS s
print s
split(x,T)
}
p!=$NF{
if(p) labelsum()
p=$NF
}
{
for(i=2; i<NF; i++) T[i]+=$i
}
END {
labelsum()
}
' file
Here's a Perl one-liner:
<file perl -lanF, -E 'for ( 0 .. $#F ) { $sums{ $_ } += $F[ $_ ]; } END { say join ",", map { $sums{ $_ } } sort { $a <=> $b } keys %sums; }'
It will only do sums, so the first and last column in your example will be 0.
This version will follow your example output:
<file perl -lanF, -E 'for ( 1 .. $#F - 1 ) { $sums{ $_ } += $F[ $_ ]; } END { $sums{ $#F } = $F[ -1 ]; say join ",", map { $sums{ $_ } } sort { $a <=> $b } keys %sums; }'
A modified version based on the solution you linked:
#!/bin/bash
colnum=6
filename="temp"
for ((i=2;i<$colnum;++i))
do
sum=$(cut -d ',' -f $i $filename | paste -sd+ | bc)
echo -n $sum','
done
head -1 $filename | cut -d ',' -f $colnum
Pure bash solution:
#!/usr/bin/bash
while IFS=, read -a arr
do
for((i=1;i<${#arr[*]}-1;i++))
do
((farr[$i]=${farr[$i]}+${arr[$i]}))
done
farr[$i]=${arr[$i]}
done < file
(IFS=,;echo "${farr[*]}")
Input file:
GET /static_register_ad_request_1_2037_0_0_0_1_1_4_8335086462.gif?pa=99439_50491&country=US&state_fips_code=US_CA&city_name=Los%2BAngeles&dpId=2&dmkNm=apple&dmlNm=iPod%2Btouch&osNm=iPhone%2BOS&osvNm=5.1.1&bNm=Safari&bvNm=null&spNm=SBC%2BInternet%2BServices&kv=0_0&sessionId=0A80187E0138A0AE42E4DE3F783E7A08&sdk_version=4.0.5.6%20&domain=805AOEtUaMu&ad_catalog=99439_50491&make=APPLE&width=320&height=460&slot_type=PREROLL&model=iPod%20touch%205.1.1&iabcat=artsandentertainment&iabsubcat=music&age=113&gender=2&zip=92869 HTTP/1.1
Output file:
domain sdk_version
805AOEtUaMu 4.0.5.6%20
I could use sed -n 's/.*sdk_version=\([^&]*\).*domain=\([^&]*\).*/\1 \2/p' to get the result, but that puts sdk_version in the first column. What I need is to swap the sdk_version and domain columns in the output file.
Could anyone help me with this? Thank you so much in advance :)
Just swap your backreferences:
sed -n 's/.*sdk_version=\([^&]*\).*domain=\([^&]*\).*/\2 \1/p'
One way using awk:
awk '
BEGIN {
FS = "&";
}
{
# Field 16 is domain and field 15 is sdk_version; print domain first.
split( $16, d, /=/ );
split( $15, s, /=/ );
printf( "%s %s\n", d[2], s[2] );
}
' infile
Output:
805AOEtUaMu 4.0.5.6%20
If you want to handle arbitrary order, I would suggest switching to awk or Perl.
perl -ne 'm/[?&]domain=([^&]+)/ and $d = $1;
m/[?&]sdk_version=([^&]+)/ and $s = $1;
print "$d\t$s\n"' logfile
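For comparison, here is a rough awk sketch of the same arbitrary-order idea. It splits the query string on & and looks parameters up by name rather than by position. The function name extract_params is made up for this example, and it assumes request lines shaped like the sample above (method, path with query string, then " HTTP/x.y"):

```shell
# Hypothetical helper: print "domain<TAB>sdk_version" for each request line,
# regardless of where the two parameters appear in the query string.
extract_params() {
  awk '
    {
      domain = sdk = ""
      q = $0
      sub(/^[^?]*\?/, "", q)       # drop everything up to and including "?"
      sub(/ HTTP\/.*$/, "", q)     # drop the trailing protocol version
      n = split(q, kv, "&")
      for (i = 1; i <= n; i++) {
        eq   = index(kv[i], "=")
        name = substr(kv[i], 1, eq - 1)
        val  = substr(kv[i], eq + 1)
        if (name == "domain") domain = val
        else if (name == "sdk_version") sdk = val
      }
      print domain "\t" sdk
    }
  ' "$@"
}
```

Usage: extract_params logfile. Unlike the Perl snippet, the two variables are reset for every line, so a line that lacks one of the parameters prints an empty column instead of repeating the previous line's value.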