I want to group the rows by their columns, forming nested subgroups, and get the count of each last-column value.
For example: main group A, subgroups D and J, and the count of P within those subgroups, as well as the total count over the last column.
I can form the top-level groups, but the subgroups seem harder. Any help on how to do this is appreciated.
Input:
A,D,J,P
A,D,J,Q
A,D,K,P
A,D,K,P
A,E,J,Q
A,E,K,Q
A,E,J,Q
B,F,L,R
B,F,L,R
B,F,M,S
C,H,N,T
C,H,O,U
C,H,N,T
C,H,O,U
Output:
A D J P 1
         Q 1
      K P 2
A E J Q 2
      K Q 1
B F L R 2
      M S 1
C H N T 2
      O U 2
    Total 14
Here's a different approach: a shell script that uses SQLite to calculate the group counts (it requires SQLite 3.25 or newer because it uses window functions):
#!/bin/sh
file="$1"
sqlite3 -batch -noheader <<EOF
CREATE TABLE data(c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT);
.mode csv
.import "$file" data
.mode list
.separator " "
-- Order each window like the final ORDER BY, so a value is shown on the first printed row of its group
SELECT (CASE c1 WHEN lag(c1, 1) OVER (PARTITION BY c1 ORDER BY c2,c3,c4) THEN ' ' ELSE c1 END)
, (CASE c2 WHEN lag(c2, 1) OVER (PARTITION BY c1,c2 ORDER BY c3,c4) THEN ' ' ELSE c2 END)
, (CASE c3 WHEN lag(c3, 1) OVER (PARTITION BY c1,c2,c3 ORDER BY c4) THEN ' ' ELSE c3 END)
, c4
, count(*)
FROM data
GROUP BY c1, c2, c3, c4
ORDER BY c1, c2, c3, c4;
SELECT 'Total ' || count(*) FROM data;
EOF
Running this gives:
$ ./group.sh example.csv
A D J P 1
      Q 1
    K P 2
  E J Q 2
    K Q 1
B F L R 2
    M S 1
C H N T 2
    O U 2
Total 14
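For comparison, the partitioning idea behind the lag() columns can be sketched in plain Python. This is a minimal sketch, not part of the answer above: `grouped_report` and `SAMPLE` are hypothetical names, and it assumes every row has the same number of comma-separated columns with no CSV quoting.

```python
from collections import Counter

# Hypothetical sample input: the rows from the question
SAMPLE = """A,D,J,P
A,D,J,Q
A,D,K,P
A,D,K,P
A,E,J,Q
A,E,K,Q
A,E,J,Q
B,F,L,R
B,F,L,R
B,F,M,S
C,H,N,T
C,H,O,U
C,H,N,T
C,H,O,U""".splitlines()


def grouped_report(lines):
    """Count identical rows and blank each leading column that repeats the
    previous row's prefix, like LAG(ci) OVER (PARTITION BY c1..ci)."""
    counts = Counter(tuple(line.strip().split(",")) for line in lines if line.strip())
    report = []
    prev = ()
    for row in sorted(counts):
        cells = []
        for i, value in enumerate(row[:-1]):
            # Show ci only on the first row of its (c1..ci) partition
            cells.append(" " if prev[:i + 1] == row[:i + 1] else value)
        cells.append(row[-1])            # the last column is always printed
        cells.append(str(counts[row]))   # number of identical rows
        report.append(" ".join(cells))
        prev = row
    report.append("Total %d" % sum(counts.values()))
    return report


if __name__ == "__main__":
    for line in grouped_report(SAMPLE):
        print(line)
```

A row starts a new c1..ci partition exactly when its first i columns differ from the previous row in sorted order, which is why comparing prefixes with the previous row gives the same result as the lag() test.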
There's also a one-liner using GNU datamash, though it doesn't produce the nested output format:
$ datamash -st, groupby 1,2,3,4 count 4 < example.csv | tr , ' '
A D J P 1
A D J Q 1
A D K P 2
A E J Q 2
A E K Q 1
B F L R 2
B F M S 1
C H N T 2
C H O U 2
Using Perl
Script
perl -0777 -lne '
  s/^(.+?)$/$x++;$kv{$1}++/mge;
  foreach my $k (sort keys %kv)
  { $q=$c=$k;
    while(length($p) > 0)
    {
      last if $c=~/^$p/g;
      $q=substr($c,length($p)-1);
      $p=~s/(.$)//;
    }
    printf( "%9s\n", "$q $kv{$k}") ;
    $p=$k;
  }
  print "Total $x";
' anurag.txt
Output:
A,D,J,P 1
      Q 1
    K,P 2
  E,J,Q 2
    K,Q 1
B,F,L,R 2
    M,S 1
C,H,N,T 2
    O,U 2
Total 14
$ cat tst.awk
BEGIN { FS="," }
!($0 in cnt) { recs[++numRecs] = $0 }   # remember first-seen order
{ cnt[$0]++ }
END {
    for (recNr=1; recNr<=numRecs; recNr++) {
        rec = recs[recNr]
        n = split(rec,f)
        newVal = 0
        for (i=1; i<=n; i++) {
            if (f[i] != p[i]) {
                newVal = 1
            }
            printf "%s%s", (newVal ? f[i] : " "), OFS
            p[i] = f[i]
        }
        print cnt[rec]
        tot += cnt[rec]
    }
    print "Total", tot+0
}
$ awk -f tst.awk file
A D J P 1
      Q 1
    K P 2
  E J Q 2
    K Q 1
B F L R 2
    M S 1
C H N T 2
    O U 2
Total 14
I'll propose a multi-stage solution in the spirit of the Unix toolset.
First, create a sorted, counted, de-delimited data format:
$ sort file | uniq -c | awk '{print $2,$1}' | tr ',' ' '
A D J P 1
A D J Q 1
A D K P 2
A E J Q 2
A E K Q 1
B F L R 2
B F M S 1
C H N T 2
C H O U 2
Now the task is to blank out, on each line, the longest common prefix it shares with the previous line:
... | awk 'NR==1 {p=$0}
          NR>1 {k=0;
                # extend t while it is still a prefix of the previous line
                while(index(p, t=substr($0,1,++k)) == 1);
                gsub(/./," ",t); sub(/^ /,"",t);
                p=$0; $0=t substr(p,k)}1'
A D J P 1
      Q 1
    K P 2
  E J Q 2
    K Q 1
B F L R 2
    M S 1
C H N T 2
    O U 2
Whether this is easier to understand than a single script remains to be seen.
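The same prefix-blanking step can be sketched in Python. Unlike the awk version, which blanks the shared prefix character by character, this sketch (the helper `blank_common_prefix` is a hypothetical name) cuts the shared prefix back to the last space, so a multi-character field is never half-blanked. It assumes space-separated lines that are already sorted and counted, such as the output of the first stage.

```python
import os.path


def blank_common_prefix(lines):
    """Replace the part of each line shared with the previous line by spaces,
    blanking whole space-separated fields only."""
    result = []
    prev = None
    for line in lines:
        if prev is None:
            result.append(line)
        else:
            # Longest common prefix, compared character by character
            shared = os.path.commonprefix([prev, line])
            # Only blank up to (and including) the last separator in it
            k = shared.rfind(" ") + 1
            result.append(" " * k + line[k:])
        prev = line
    return result


stage1 = ["A D J P 1", "A D J Q 1", "A D K P 2", "A E J Q 2", "A E K Q 1",
          "B F L R 2", "B F M S 1", "C H N T 2", "C H O U 2"]
for out in blank_common_prefix(stage1):
    print(out)
```

With single-letter fields this gives the same result as the awk step; the field snapping only matters for values such as `AB` and `AC`, which share the character `A` but not a whole field.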
My first version didn't quite produce your example output, but this one does. :-)
$ cat ABCD
A,D,J,P
A,D,J,Q
A,D,K,P
A,D,K,P
A,E,J,Q
A,E,K,Q
A,E,J,Q
B,F,L,R
B,F,L,R
B,F,M,S
C,H,N,T
C,H,O,U
C,H,N,T
C,H,O,U
$ awk '{a[$0]+=1}END{for(i in a) print i","a[i];print "Total",NR}' ABCD |\
sort | \
awk -F, '
/Total/{print;next}
{d1=(a1!=$1); d2=(d1||a2!=$2); d3=(d2||a3!=$3); d4=(d3||a4!=$4)
print d1?$1:" ",d2?$2:" ",d3?$3:" ",d4?$4:" ",$5
a1=$1;a2=$2;a3=$3;a4=$4}'
A D J P 1
      Q 1
    K P 2
  E J Q 2
    K Q 1
B F L R 2
    M S 1
C H N T 2
    O U 2
Total 14
$
The first awk script reads every line and increments the element of array a indexed by the whole line. At the end (the END block) it loops over the indices of a and prints each index followed by its value, i.e. the number of times that line occurs in the data. Finally it prints the total number of lines processed, which awk keeps in the built-in variable NR (number of records).
The second awk script either prints the Total line and skips any further processing, or compares each field (split on commas) with the corresponding field of the previous line. A field is printed if it, or any field to its left, differs from the previous line; otherwise a space is printed in its place.
Suppose every letter in the following represents a name. What is the best way to sort the rows by how common their ancestors are?
A B C D
E F G H
I J K L
M N C D
O P C D
Q R C D
S T G H
U V G H
W J K L
X J K L
The result should be:
I J K L # Three shared names are more important than two names
W J K L
X J K L
A B C D # C D is repeated more than G H
M N C D
O P C D
Q R C D
E F G H
S T G H
U V G H
EDIT:
Names might contain spaces (double names).
Consider the following example where each letter represents a single word:
A B C D M
E F G H M
I J K L M
M N C D M
O P C D
Q R C D
S T G H
U V G H
W J K L
X J K L
The output should be:
A B C D M
M N C D M
I J K L M
E F G H M
W J K L
X J K L
O P C D
Q R C D
S T G H
U V G H
First, count the number of occurrences of each ancestor chain (each suffix of a row). Then rank each row according to those counts. Try this:
from collections import defaultdict
words = """A B C D
E F G H
I J K L
M N C D
O P C D
Q R C D
S T G H
U V G H
W J K L
X J K L"""
words = words.split('\n')
# Count ancestors
counters = defaultdict(lambda: defaultdict(int))
for word in words:
    parts = word.split()
    while parts:
        counters[len(parts)][tuple(parts)] += 1
        parts.pop(0)
# Calculate tuple of ranks, used for sorting
ranks = {}
for word in words:
    rank = []
    parts = word.split()
    while parts:
        rank.append(counters[len(parts)][tuple(parts)])
        parts.pop(0)
    ranks[word] = tuple(rank)
# Sort by ancestor count, longest chain comes first
words.sort(key=lambda word: ranks[word], reverse=True)
print(words)
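To see why "I J K L" sorts first, it helps to look at the rank tuples themselves. The sketch below recomputes them with a flat `collections.Counter` keyed by suffix tuples (a tuple already encodes its length, so the outer dict keyed by `len(parts)` isn't strictly needed); `suffixes` and `rank` are hypothetical helper names, not part of the answer above.

```python
from collections import Counter

rows = ["A B C D", "E F G H", "I J K L", "M N C D", "O P C D",
        "Q R C D", "S T G H", "U V G H", "W J K L", "X J K L"]


def suffixes(row):
    """All ancestor chains of a row, longest first: ABCD, BCD, CD, D."""
    parts = row.split()
    return [tuple(parts[i:]) for i in range(len(parts))]


# How many rows end with each chain
chain_counts = Counter(chain for row in rows for chain in suffixes(row))


def rank(row):
    """Tuple of chain counts, longest chain first."""
    return tuple(chain_counts[chain] for chain in suffixes(row))


for row in ("I J K L", "A B C D", "E F G H"):
    print(row, rank(row))

print(sorted(rows, key=rank, reverse=True))
```

Tuples compare element by element, so the three-name chain in the second position decides first: "J K L" occurs three times while "B C D" and "F G H" occur once each, which puts the J K L rows ahead even though "C D" is more common than "K L". Python's sort stays stable with `reverse=True`, so rows with equal ranks keep their input order.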
Here's how you could do it in Java, using essentially the same method as @fafl's solution:
// Requires java.util.* (ArrayList, Arrays, Collections, Comparator, HashMap, List, Map)
static List<Name> sortNames(String[] input)
{
    List<Name> names = new ArrayList<>();
    for (String name : input)
        names.add(new Name(name));
    // Count how often each ancestor chain (suffix) occurs across all names
    Map<String, Integer> partCount = new HashMap<>();
    for (Name name : names)
        for (String part : name.parts)
            partCount.merge(part, 1, Integer::sum);
    for (Name name : names)
        for (String part : name.parts)
            name.counts.add(partCount.get(part));
    Collections.sort(names, new Comparator<Name>()
    {
        public int compare(Name n1, Name n2)
        {
            // Higher counts first; stop at the shorter name so that
            // names with different numbers of parts don't throw
            int n = Math.min(n1.counts.size(), n2.counts.size());
            for (int c, i = 0; i < n; i++)
                if ((c = Integer.compare(n2.counts.get(i), n1.counts.get(i))) != 0)
                    return c;
            return 0;
        }
    });
    return names;
}
static class Name
{
    List<String> parts = new ArrayList<>();   // e.g. "A B C D", "B C D", "C D", "D"
    List<Integer> counts = new ArrayList<>();
    Name(String name)
    {
        List<String> s = Arrays.asList(name.split("\\s+"));
        for (int i = 0; i < s.size(); i++)
            parts.add(String.join(" ", s.subList(i, s.size())));
    }
}
Test:
public static void main(String[] args)
{
    String[] input = {
        "A B C D",
        "W J K L",
        "E F G H",
        "I J K L",
        "M N C D",
        "O P C D",
        "Q R C D",
        "S T G H",
        "U V G H",
        "X J K L" };
    for (Name name : sortNames(input))
        System.out.println(name.parts.get(0));
}
Output:
I J K L
W J K L
X J K L
A B C D
M N C D
O P C D
Q R C D
E F G H
S T G H
U V G H
I have been trying to write a simple program without importing any library. I want to print the strings in the following list vertically, without using any complex algorithm. Any help would be appreciated.
['San Francisco', 'Christchurch ', 'Sydney ', 'Bangkok ', 'Copenhagen ']
This can be done using some built-in functions, like max(), len() and zip():
L = ['San Francisco', 'Christchurch ', 'Sydney ', 'Bangkok ', 'Copenhagen ']
max_length = len(max(L, key=len))
new_L = []
for e in L:
    new_L.append(e + ' ' * (max_length - len(e)))
for e in zip(*new_L):
    for el in e:
        if el != ' ':
            print(el, end=' ')
Output:
S C S B C a h y a o n r d n p i n g e F s e k n r t y o h a c k a n h g c u e i r n s c c h o
The lines:
new_L = []
for e in L:
    new_L.append(e + ' ' * (max_length - len(e)))
can be written as a list comprehension:
new_L = [e + ' ' * (max_length - len(e)) for e in L]
Edit:
L = ['San Francisco', 'Christchurch ', 'Sydney ', 'Bangkok ', 'Copenhagen ']
# Get the maximum length of a string in the list
max_length = len(max(L, key=len))
#print(max(L, key=len))  # the longest string in the list
#print(max_length)
# Iterate through indices of max_length: 0, 1, 2, 3 ...
for i in range(max_length):
    # Iterate through each city in the list
    for city in L:
        # Print the character if the index is within the city's length and it isn't a space
        if i < len(city) and city[i] != ' ':
            print(city[i], end=' ')
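Both versions above print every character on a single line. If "vertical form" means one output line per character position, with each city in its own column, a small variant of the same padding idea does it, still without imports. This is a sketch under that interpretation; `vertical_lines` is a hypothetical helper name.

```python
cities = ['San Francisco', 'Christchurch ', 'Sydney ', 'Bangkok ', 'Copenhagen ']


def vertical_lines(words):
    """Return one line per character position; each word keeps its own column."""
    width = max(len(w) for w in words)
    padded = [w.ljust(width) for w in words]   # pad so every word has `width` chars
    lines = []
    for i in range(width):
        # i-th character of every word, separated by single spaces
        lines.append(" ".join(w[i] for w in padded).rstrip())
    return lines


for line in vertical_lines(cities):
    print(line)
```

Keeping the spaces (instead of skipping them) is what preserves the columns: a city that has already ended leaves a gap rather than shifting the later cities to the left.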
How do I encode the request in UTF-16? Here's what I have:
# Create Savon client
@client = Savon::Client.new do
  wsdl.document = File.expand_path("account_list.wsdl", __FILE__)
end
# Set header encoding
@client.http.headers["Content-Type"] = "text/xml;charset=UTF-16"
# Setup ssl configuration
@client.http.auth.ssl.cert_key_file = "cert_key_file.pem"
@client.http.auth.ssl.cert_file = "cert_file.pem"
@client.http.auth.ssl.ca_cert_file = "ca_cert_file.pem"
@client.http.auth.ssl.verify_mode=:none
# Execute request
response = @client.request :account_list do
  soap.body = {
    :id => "18615618"
  }
end
Here's the beginning of what's sent; notice the encoding="UTF-8":
Content-Type: text/xml;charset=UTF-16, SOAPAction: "accountList", Content-Length: 888 <?xml version="1.0" encoding="UTF-8"?><env:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema
Here's the error I get:
< s o a p : E n v e l o p e x m l n s : s o a p = " h t t p : / / s c h e m a s . x m l s o a p . o r g / s o a p / e n v e l o p e / " x m l n s : w s>d
< s o a p : B o d y >
< s o a p : F a u l t >
< f a u l t c o d e > s o a p : C l i e n t < / f a u l t c o d e >
< f a u l t s t r i n g > F a i l e d t o p r o c e s s S O A P r e q u e s t . S O A P b o d y n o t i n U T F - 1 6 .
< / f a u l t s t r i n g >
< d e t a i l >
< w s d l _ o p s : e r r o r > F a i l e d t o p r o c e s s S O A P r e q u e s t . S O A P b o d y n o t i n U T F - 1 6 .
< / w s d l _ o p s : e r r o r >
< / d e t a i l >
< / s o a p : F a u l t >
< / s o a p : B o d y >
< / s o a p : E n v e l o p e >
Currently, the only way Savon lets you change the XML declaration is through its integrated Builder method:
response = @client.request(:account_list) do
  soap.xml(:xml, :encoding => "UTF-16") { |xml| xml.id("18615618") }
end
You'll miss out on a lot of XML support with this approach, though: there's no SOAP envelope, no header, and no body:
<?xml version="1.0" encoding="UTF-16"?><id>18615618</id>
I'll use your ticket to come up with a better solution asap!
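The spaced-out fault message is most likely UTF-16 text whose bytes are being shown as single-byte characters: every ASCII character is followed by a zero byte. It also hints at the underlying issue: `charset=UTF-16` in the header or `encoding="UTF-16"` in the declaration only labels the body, and the bytes themselves still have to be UTF-16. Below is a small plain-Python illustration, not Savon code; how to hand raw bytes to Savon is not shown, and whether the server also expects a byte-order mark or a particular byte order is an assumption to check.

```python
body = '<?xml version="1.0" encoding="UTF-16"?><id>18615618</id>'

# The declaration says UTF-16, but these bytes are UTF-8 (one byte per ASCII char)
utf8_bytes = body.encode("utf-8")

# Actually encoding the text as UTF-16 (little-endian, no byte-order mark)
utf16_bytes = body.encode("utf-16-le")

# Each ASCII character now takes two bytes, the second one being zero
print(len(utf8_bytes), len(utf16_bytes))

# Shown byte by byte with the zeros as spaces, it looks like the fault message
print(utf16_bytes.decode("latin-1").replace("\x00", " "))
```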