A difference between Unicode and ASCII operators - utf-8

I found out that Unicode and ASCII operators sometimes work differently when quote-interpolated.
Consider this:
$ perl6 -e'my $a = BagHash.new: <a a a a b b b c c c c c d>;for $a.keys -> $k { say "$k => $a<<$k>>" }'
d => 1
b => 3
c => 5
a => 4
and this:
$ perl6 -e'my $a = BagHash.new: <a a a a b b b c c c c c d>;for $a.keys -> $k { say "$k => $a«$k»" }'
c => c(5) a(4) b(3) d«c»
a => c(5) a(4) b(3) d«a»
b => c(5) a(4) b(3) d«b»
d => c(5) a(4) b(3) d«d»
But this works even when using a Unicode operator:
$ perl6 -e'my $a = BagHash.new: <a a a a b b b c c c c c d>;for $a.keys -> $k { say "$k => {$a«$k»}" }'
d => 1
b => 3
a => 4
c => 5
Is this a bug, or is there an explanation I can't see?

This seems to be fixed by commit 2835 from MasterDuke17, which adds » to the closing characters that bracket_ending accepts:
sub bracket_ending($matches) {
my $check := $matches[+$matches - 1];
my str $str := $check.Str;
my $last := nqp::substr($str, nqp::chars($check) - 1, 1);
- $last eq ')' || $last eq '}' || $last eq ']' || $last eq '>'
+ $last eq ')' || $last eq '}' || $last eq ']' || $last eq '>' || $last eq '»'
}


Print the entire row that has a difference in value when comparing the columns

I want to print every entire row whose values don't match.
Example:
Symbol Qty Symbol Qty Symbol qty
a 10 a 10 a 11
b 11 b 11 b 11
c 12 c 12 f 13
f 12 f 12 g 13
OUTPUT :
a 10 a 10 a 11
c 12 c 12 (empty Space)
f 12 f 12 f 13
empty space {ES} g 13
awk 'FNR==NR{a[$0];next}!($0 in a ) ' output1.csv output2.csv >> finn1.csv
awk 'FNR==NR{a[$0];next}!($0 in a ) ' finn1.csv output4.csv >> finn.csv
But this prints only the one column that differs, like a 11, whereas I need the whole line.
Assuming that you only want to test for mismatched Qty fields, try this:
#!/bin/bash
declare input_file="/path/to/input_file"
declare -i header_flag=0 a b c
while read line; do
[ ${header_flag} -eq 0 ] && header_flag=1 && continue # Ignore first line.
[ ${#line} -eq 0 ] && continue # Ignore blank lines.
read x a x b x c x <<< ${line} # Reuse ${x} because it is not used.
[ ${a} -ne ${b} -o ${a} -ne ${c} -o ${b} -ne ${c} ] && echo ${line}
done < ${input_file}
The awk one-liner
awk '!($1 == $3 && $2 == $4 && $3 == $5 && $4 == $6)' file
will output
Symbol Qty Symbol Qty Symbol qty
a 10 a 10 a 11
c 12 c 12 f 13
f 12 f 12 g 13
You're going about this the wrong way: you can't mash all the files into one and then try to find which rows have different or missing values. You need to process the individual files:
$ cat file1
Symbol Qty
a 10
b 11
c 12
f 12
$ cat file2
Symbol Qty
a 10
b 11
c 12
f 12
$ cat file3
Symbol qty
a 11
b 11
f 13
g 13
Then, assuming you have GNU awk:
gawk '
FNR > 1 { qty[$1][FILENAME] = $1 " " $2 }
END {
OFS = "\t"
for (sym in qty) {
missing = !((ARGV[1] in qty[sym]) && (ARGV[2] in qty[sym]) && (ARGV[3] in qty[sym]))
unequal = !(qty[sym][ARGV[1]] == qty[sym][ARGV[2]] && qty[sym][ARGV[1]] == qty[sym][ARGV[3]])
if (missing || unequal) {
print qty[sym][ARGV[1]], qty[sym][ARGV[2]], qty[sym][ARGV[3]]
}
}
}
' file{1,2,3}
outputs
a 10 a 10 a 11
c 12 c 12
f 12 f 12 f 13
g 13
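For readers without GNU awk, the same per-file comparison can be sketched in POSIX awk by replacing the arrays of arrays with composite `[symbol, file-number]` keys. This is only an illustration under a few assumptions: the function name `compare_qty` is made up here, every input file is assumed to start with a header line, and rows are printed in order of first appearance rather than in awk's unspecified `for (sym in ...)` order.

```shell
compare_qty() {
  # Usage: compare_qty file1 file2 file3   (hypothetical helper, not from the answer above)
  awk -v OFS='\t' '
    FNR == 1 { nfiles++; next }                 # first line of each file: count the file, skip its header
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++count] = $1 }
      cell[$1, nfiles] = $1 " " $2              # "symbol qty" text to print for this file
      qty[$1, nfiles]  = $2                     # quantity to compare
    }
    END {
      for (k = 1; k <= count; k++) {
        sym = order[k]
        differs = 0
        for (f = 1; f <= nfiles; f++) {
          # a symbol differs if it is missing from any file or its qty is not the same as in file 1
          if (!((sym, f) in qty) || qty[sym, f] != qty[sym, 1]) { differs = 1 }
        }
        if (differs) {
          line = ""
          for (f = 1; f <= nfiles; f++) {
            sep = (f > 1) ? OFS : ""
            val = ((sym, f) in cell) ? cell[sym, f] : ""
            line = line sep val
          }
          print line
        }
      }
    }
  ' "$@"
}
```

Called as `compare_qty file1 file2 file3`, it prints one tab-separated line per symbol that is missing from some file or whose quantity differs, with an empty cell where the symbol is absent.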

How to Pivot Data Using AWK

From:
DT X Y Z
10 75 0 3
20 100 1 6
30 125 2 9
To:
DT ID VALUE
10 X 75
20 Y 0
30 Z 3
10 X 100
20 Y 1
30 Z 6
10 X 125
20 Y 2
30 Z 9
I got it working with the following:
#my original dataset is separated by "," and have 280 cols
tempfile=dataset.csv;
col_count=`head -n1 $tempfile | tr -cd "," | wc -c`;
col_count=`expr $col_count + 1`;
for i in `seq 4 $col_count`; do
echo $i;
pt="{print \$"$i"}";
col_name=`head -n 1 $tempfile | sed s'/ //'g | awk -F"," "$pt"`;
awk -F"," -v header="DT,ID,$col_name" -f st.awk $tempfile | awk 'NR>1 {print substr($0,index($0,$1))",'"$col_name"'"}' | sed 's/ //g' >> New$tempfile;
done;
# file st.awk:
# the code below was found on some stackoverflow page, with some minor changes
BEGIN {
# Parse headers into an assoc array h
split(header, a, ",")
for(i in a) {
h[a[i]]=2
}
}
# Find the column numbers in the first line of a file
FNR==1{
split("", cols) # This will re-init cols
for(i=1;i<=NF;i++) {
if($i in h) {
cols[i]=1
}
}
next
}
# Print those columns on all other lines
{
res = ""
for(i=1;i<=NF;i++) {
if(i in cols) {
s = res ? OFS : ""
res = res "," $i
}
}
if (res) {
print res
}
}
You can try this awk (MAWK Version 1.2)
Your data can be 5x5 or more
mawk -v OFS='\t' '
NR==1 {
nbfield=(NF-1)
for(i=1;i<NF;i++)
ID[i]=$(i+1)
print $1 OFS "ID" OFS "VALUE"
next
}
{
numrecord=((NR-1)%nbfield)
numrecord = numrecord ? numrecord : nbfield
for(i=0;i<=nbfield;i++)
val[ID[i],numrecord]=$(i+1)
}
numrecord==nbfield {
for(i=1;i<=nbfield;i++)
for(j=1;j<=nbfield;j++)
print val[ID[0],j] OFS ID[j] OFS val[ID[j],i]
}
' infile
Input:
-- ColOne ColTwo ColThr
RowOne A B C D E
RowTwo F G H I J
RowThr K L M N O
RowFor P Q R S T
RowFiv U V W X Y
Output:
RowNbr | ColNbr | RowColVal
------ | ------ | ---------
RowOne | ColOne | A
RowOne | ColTwo | B
RowOne | ColThr | C
RowTwo | ColOne | F
RowTwo | ColTwo | G
RowTwo | ColThr | H
RowThr | ColOne | K
RowThr | ColTwo | L
RowThr | ColThr | M
Pivot script:
# pivot a table
BEGIN { # before processing input lines, emit output header
OFS = " | " # set the output field-separator
fmtOutDtl = "%6s | %-6s | %-9s" "\n" # set the output format for all detail lines: InpRowHdr, InpColHdr, InpVal
fmtOutHdr = "%6s | ColNbr | RowColVal" "\n" # set the output format for the header line
strOutDiv = "------ | ------ | ---------" # set the divider line
print "" # emit blank line before output
} # done with output header
NR == 1 { # when we are on the input header line / the first row
FldCnt = ( NF - 1 ) # number of columns to process is number of fields on this row, except for the first val
for( idxCol = 1; idxCol < NF; idxCol++ ) # scan col numbers after the first, ignoring the first val
ColHds[ idxCol ] = $( idxCol + 1 ) # store the next col-val as this ColHdr
printf( fmtOutHdr, "RowNbr" ) # emit header line: RowNbr-header, input column headers
print strOutDiv # emit divider row after header line and before data lines
next # skip to the next input row
} # done with first input row
{ # for each body input row
RecNbr = ( ( NR - 1 ) % FldCnt ) # get RecNum for this row: ( RecNum - 1 ) Mod [number of fields]: zero-based / 0..[number_of_cols-1]
RecNbr = RecNbr ? RecNbr : FldCnt # promote from zero-based to one-based: 0 => [number of fields]: one -based / 1..[number_of_cols ]
for( idxCol = 0; idxCol <= FldCnt; idxCol++ ) # scan col numbers including the first
Rws[ ColHds[ idxCol ], RecNbr ] = $( idxCol + 1 ) # store this row+col val in this Row position under this ColHdr
} # done with this body input row
RecNbr == FldCnt { # when we are on the last input row that we are processing (lines beyond FldCnt are not emitted)
for( idxCol = 1; idxCol <= FldCnt; idxCol++ ) { # scan col numbers after the first
for( idxRow = 1; idxRow <= FldCnt; idxRow++ ) { # scan row numbers after the first, up to number of cols
printf( fmtOutDtl \
,Rws[ ColHds[ 0 ] , idxCol ] \
, ColHds[ idxRow ] \
,Rws[ ColHds[ idxRow ] , idxCol ] ) # emit input rowHdr, colHdr, row+col val
} # done scanning row numbers
print "" # emit a blank line after each input row
} # done scanning col numbers
} # done with the last input row
END { # after processing input lines
} # do nothing
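If what is wanted is instead the conventional wide-to-long reshape, where every data row is paired with every column header, a much shorter awk program is enough. Note that this is a different arrangement from the "To:" sample in the question, so treat it as a sketch of a swapped-in technique rather than a drop-in answer. The function name `unpivot` is hypothetical, and whitespace-separated input with a header line is assumed.

```shell
unpivot() {
  # Reads a whitespace-separated table on stdin; first column is the key (hypothetical helper)
  awk -v OFS='\t' '
    NR == 1 {
      for (i = 2; i <= NF; i++) name[i] = $i    # remember the column headers
      print $1, "ID", "VALUE"
      next
    }
    {
      for (i = 2; i <= NF; i++) print $1, name[i], $i   # one output row per (key, column) pair
    }
  '
}
```

For example, `unpivot < infile` turns the row `10 75 0 3` under the header `DT X Y Z` into three rows: `10 X 75`, `10 Y 0` and `10 Z 3`. For comma-separated data, adding `-F,` to the awk call should be sufficient.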

Break one column into several columns every time you see a pattern

I have a fairly simple question, but I find it hard to solve.
I have two quite long columns of data, and I want to separate them into several columns. The script should start writing data into a new column each time it finds a specific string in the first column:
input:
A B
1 C
2 C
3 C
4 C
A D
1 D
2 D
3 D
4 D
output:
A B A D
1 C 1 D
2 C 2 D
3 C 3 D
4 C 4 D
(the separating pattern is A)
You can do this using single awk:
awk 'NR>1 && /^A/{p=1} {if (p) print a[++i], $0; else a[NR]=$0}' OFS='\t' file
A B A D
1 C 1 D
2 C 2 D
3 C 3 D
4 C 4 D
awk with paste:
$ awk '$1 == "A" { ++n } { print > ("t.tmp." n) }' input.txt
$ ls t.tmp.*
t.tmp.1 t.tmp.2
$ paste t.tmp.*
A B A D
1 C 1 D
2 C 2 D
3 C 3 D
4 C 4 D
EDIT
More efficient (only build the file name once for each group) and more robust (avoid the chance of having too many open files by closing them as we go) --- thanks, Ed Morton:
awk '$1 == "A" { close(out); out = "t.tmp." ++n} { print > out }' input.txt
(Above assumes first record contains pattern. If not, can initialize out in a BEGIN block.)
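The BEGIN-block variant mentioned in that note could look like the sketch below. The wrapper name `split_on_marker`, the prefix argument and the choice of suffix `0` for lines that precede the first marker are all assumptions made for illustration.

```shell
split_on_marker() {
  # $1 = input file, $2 = output file prefix (both placeholders)
  awk -v prefix="$2" '
    BEGIN { out = prefix "0" }                       # lines before the first "A" go to <prefix>0
    $1 == "A" { close(out); n++; out = prefix n }    # new group: close the old file, name the next one
    { print > out }
  ' "$1"
}
```

With `split_on_marker input.txt t.tmp.` the groups land in `t.tmp.1`, `t.tmp.2`, and so on, and `t.tmp.0` is only created when the input does not start with the pattern.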
Using csplit and paste
$ csplit -zsf file infile.txt '/A/' {*}
$ paste file*
A B A D
1 C 1 D
2 C 2 D
3 C 3 D
4 C 4 D
From man csplit
csplit - split a file into sections determined by context lines
-z, --elide-empty-files
remove empty output files
-s, --quiet, --silent
do not print counts of output file sizes
-f, --prefix=PREFIX
use PREFIX instead of 'xx'
{*} repeat the previous pattern as many times as possible
Using GNU awk multi-line records. This works for any number of occurrences of the pattern, but assumes the columns are of equal length:
pat=A
awk -vpat=$pat -F'\n' '
BEGIN {RS="(^|\n)"pat" "}
NR>1{
nr=NR-2
fld[nr][0]=pat" "$1
for(i=2; i<=NF; ++i)
fld[nr][i-1]=$i
}
END {
for(i=0; i < NF; ++i) {
for(j=0; j < NR-1; ++j)
printf("%s%s", j?"\t":"", fld[j][i])
printf("\n")
}
}
'
input
A B
1 C
2 C
3 C
4 C
A D
1 D
2 D
3 D
4 D
A X
1 X
3 X
5 X
7 X
output
A B A D A X
1 C 1 D 1 X
2 C 2 D 3 X
3 C 3 D 5 X
4 C 4 D 7 X
This is the idiomatic awk solution to this problem:
$ awk -v OFS='\t' '
$1 == "A" { numRows=0; ++numCols }
{ val[++numRows,numCols] = $0 }
END {
for (rowNr=1;rowNr<=numRows;rowNr++) {
for (colNr=1;colNr<=numCols;colNr++) {
printf "%s%s", val[rowNr,colNr], (colNr<numCols ? OFS : ORS)
}
}
}
' file
A B A D
1 C 1 D
2 C 2 D
3 C 3 D
4 C 4 D

Nested for loop - output once

I have a nested for loop to print one letter each from each variable.
for i in a b ; do for j in 1 2; do echo "$i $j"; done; done
a 1
a 2
b 1
b 2
The output I need is:
a 1
b 2
How do I get it?
letters=(a b c d) # declare an array with four elements
numbers=(1 2 3 4)
for ((i=0;i<${#letters[@]};i++)); do echo "${letters[$i]} ${numbers[$i]}"; done
Output:
a 1
b 2
c 3
d 4
${#letters[@]} is the number of elements in array letters.
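As a small sketch of how that element count can be used defensively, the function below stops at the shorter of the two arrays instead of printing empty values. The function name `pair_up` and the deliberately mismatched array lengths are illustrative assumptions, not part of the answer above.

```shell
pair_up() {
  local letters=(a b c d)
  local numbers=(1 2 3)              # deliberately one element shorter
  local n=${#letters[@]}             # number of elements in letters
  if (( ${#numbers[@]} < n )); then
    n=${#numbers[@]}                 # stop at the shorter array
  fi
  local i
  for ((i = 0; i < n; i++)); do
    printf '%s %s\n' "${letters[i]}" "${numbers[i]}"
  done
}
```

Calling `pair_up` prints three pairs, `a 1` through `c 3`. Note that `${#letters}` without `[@]` gives the string length of the first element, not the element count.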
You can also do the same using regular variables and string indexes:
#!/bin/bash
letters="abcdefghi"
nums="123456789"
for ((i = 0; i < ${#nums}; i++)); do
printf "%s %s\n" ${letters:i:1} ${nums:i:1}
done
Output
$ bash prnidx.sh
a 1
b 2
c 3
d 4
e 5
f 6
g 7
h 8
i 9

Unix Command (Mac OS): cut and move rows

Could you please give me a hint which unix command I can use to do the following:
I want to convert these lines...
1 a i
2 b ii
3 c iii
4 d iv
5 e v
6 f vi
7 g vii
8 h viii
9 i xi
...into those:
1 a i 4 d iv 7 g vii
2 b ii 5 e v 8 h viii
3 c iii 6 f vi 9 i xi
rs and perl -pne just transpose them, but I need a completely new arrangement, as you can see. Perl code would be preferred, but I am thankful for any help.
cheers
marsch
Using a perl one-liner
perl -lne 'push @{$l[($.-1) % 3]}, $_; }{ print "@$_" for @l' data.txt | column -t
Explanation:
Switches:
-l: Enable line ending processing, specifies line terminator
-n: Creates a while(<>){..} loop for each line in your input file.
-e: Tells perl to execute the code on command line.
Code:
push @{$l[($.-1) % 3]}, $_;: Push each line onto one of three arrays, chosen by the line number modulo 3
}{ print "@$_" for @l: At the end of processing, print each of the 3 arrays as one row
| column -t: Even out the columns
I would go with split and paste from coreutils. Try the following commands:
split -l3 infile
paste -d' ' xaa xab xac | column -t
Output:
1 a i 4 d iv 7 g vii
2 b ii 5 e v 8 h viii
3 c iii 6 f vi 9 i xi
Here is a oneliner:
perl -ne 'chomp; push @a,$_ if $_; unless($. % 3) {push @f,[@a]; @a = ()} END {for my $i (@f) { for (@$i) {print "$_ "} print "\n"}}' filename.txt
output
1 a i 2 b ii 3 c iii
4 d iv 5 e v 6 f vi
7 g vii 8 h viii 9 i xi
I use ruby
string = "1 a i
2 b ii
3 c iii
4 d iv
5 e v
6 f vi
7 g vii
8 h viii
9 i xi "
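ary = string.split("\n").map(&:strip) # one entry per input line, without stray spaces
new_ary = Array.new(3) { [] } # three output rows, each its own array
ary.each_with_index do |e, i|
new_ary[i % 3] << e # line i goes to output row i modulo 3
end
puts new_ary.map { |row| row.join(" ") }.join("\n")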
Hope to help:)
