How do I remove entries in a tuple with duplicated properties?


I have a tuple dictionary consisting of three attributes: Name, Address, Phone.
Example:
Johnny Tarr, 1234 Gaelic Way, 555-402-9687
Patrick Murphy, 1234 Taylor Road, 555-555-5555
Patrick Murphy, 1234 Morrison Court, 555-555-5555
How do I remove the entries where two of the three properties are duplicated?
A failed attempt to iterate the collection:
for (int i = 0; i < fileList.Count - 1; i++)
{
for (int j = i + 1; j < fileList.Count; j++)
{
// Test Results: There are supposed to be 362 duplicates. Outputting only 225 entries. A mix of duplicates and not duplicates.
if (fileList[i].Item1.Equals(fileList[j].Item1, StringComparison.CurrentCultureIgnoreCase) && fileList[i].Item3.Equals(fileList[j].Item3, StringComparison.CurrentCultureIgnoreCase))
{
file.WriteLine(fileList[i].Item1 + "|" + fileList[i].Item2 + "|" + fileList[i].Item3);
}
}
}

var distincts = fileList.GroupBy(t => t.Item1 + "," + t.Item3)
.Where(g => g.Count() == 1)
.Select(g => g.Single());
foreach (var item in distincts)
{
Console.WriteLine(item);
}
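This groups the tuples by Name and Phone, keeps only the groups that contain a single tuple, and selects that tuple for the output. Every entry whose Name/Phone pair occurs more than once is therefore dropped. Note that the concatenated key is case-sensitive, unlike the comparison in the question.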
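For comparison, the same group-and-keep-singletons idea can be sketched in Python. This is only an illustration, not part of the answer above: the `entries` list and the `unique_by_name_phone` helper are hypothetical names, and the key is case-folded on the assumption that the case-insensitive comparison from the question is wanted.

```python
from collections import Counter

def unique_by_name_phone(entries):
    """Keep only entries whose (name, phone) pair occurs exactly once."""
    def key(entry):
        name, _address, phone = entry
        # Case-fold so "patrick murphy" and "Patrick Murphy" count as the same key.
        return (name.casefold(), phone.casefold())

    counts = Counter(key(e) for e in entries)
    return [e for e in entries if counts[key(e)] == 1]

# Sample data taken from the question.
entries = [
    ("Johnny Tarr", "1234 Gaelic Way", "555-402-9687"),
    ("Patrick Murphy", "1234 Taylor Road", "555-555-5555"),
    ("Patrick Murphy", "1234 Morrison Court", "555-555-5555"),
]

print(unique_by_name_phone(entries))
```

With the sample data only the Johnny Tarr entry remains, because both Patrick Murphy entries share a name and a phone number.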

Related

Generate all possible customer and account combinations

I am developing an application to store the relationship data of customers and accounts (w.r.t banking domain). Typically in banks a customer can have an account which is a sole account or have a joint account with another customer.
Eg 1: Customer C1 has a sole account A1.
Eg 2: Customer C1 and C2 have a joint account JA1 where C1 is primary holder and C2 is non-primary holder.
I am looking for an algorithm that will generate all possible combinations of relationships for a given number of customers and accounts.
For example: if the number of customers = 2 and number of accounts = 2, then the algorithm should generate the below entries.
Combination #1:
C1-A1-Primary
C1-A2-Primary
C2-A1-Non-Primary
C2-A2-Non-Primary
Combination #2:
C1-A1-Primary
C1-A2-Non-Primary
C2-A1-Non-Primary
C2-A2-Primary
Combination #3:
C1-A1-Non-Primary
C1-A2-Primary
C2-A1-Primary
C2-A2-Non-Primary
Combination #4:
C1-A1-Non-Primary
C1-A2-Non-Primary
C2-A1-Primary
C2-A2-Primary
Combination #5:
C1-A1-Sole
C1-A2-Primary
C2-A2-Non-Primary
Combination #6:
C1-A1-Sole
C1-A2-Non-Primary
C2-A2-Primary
Combination #7:
C1-A2-Sole
C1-A1-Primary
C2-A1-Non-Primary
Combination #8:
C1-A2-Sole
C1-A1-Non-Primary
C2-A1-Primary
Edit: This is not the complete list of combinations - but the algorithm is supposed to generate all those.

You have 2 problems to solve here:
Get all possible account types for N customers. You can do it like this:
const allAccounts = [];
for (let i = 1; i <= customersNumber; i++) {
  allAccounts.push(`C${i}-Sole`);
  for (let j = 1; j <= customersNumber; j++) {
    if (i === j) continue;
    allAccounts.push(`C${i}-Primary C${j}-NonPrimary`);
  }
}
For 2 customers the result will be:
[
"C1-Sole",
"C1-Primary C2-NonPrimary",
"C2-Sole",
"C2-Primary C1-NonPrimary"
]
Get all possible combinations of length r (with repetitions) from this array. We want to exclude two types of combinations here:
ones that have 2 or more sole accounts for the same customer.
ones that are not connected (have no common customers, if I got you right)
// checks if two accounts are connected
function connected(customers1, customers2) {
return customers1.filter(cu => customers2.includes(cu)).length > 0;
}
// checks if acc1 and acc2 are the same Sole account
function sameSoleAccount(acc1, acc2) {
return acc1.type === 'Sole' && acc1 === acc2;
}
function printAccount(i, a) {
const c = a.customers;
return a.type === 'Sole' ? `${c[0]}-A${i}-Sole` : `${c[0]}-A${i}-Primary ${c[1]}-A${i}-NonPrimary`;
}
function combination(chosen, arr, index, r) {
if (index === r) {
const combination = chosen.map((c, i) => printAccount(i + 1, arr[c])).join(', ');
console.log(combination);
return;
}
for (let i = 0; i < arr.length; i++) {
if (chosen.length === 0 ||
chosen.some(ch => !sameSoleAccount(arr[ch], arr[i])
&& connected(arr[ch].customers, arr[i].customers))) {
const copy = chosen.slice();
copy[index] = i;
combination(copy, arr, index + 1, r);
}
}
}
function allPossibleCombinations(accountsNumber, customersNumber) {
const allAccounts = [];
for (let i = 1; i <= customersNumber; i++) {
allAccounts.push({customers: [`C${i}`], type: 'Sole'});
for (let j = 1; j <= customersNumber; j++) {
if (i === j) continue;
allAccounts.push({customers: [`C${i}`, `C${j}`], type: 'Joint'});
}
}
console.log(`All possible combinations for ${customersNumber} customers and ${accountsNumber} accounts: `);
combination([], allAccounts, 0, accountsNumber);
}
allPossibleCombinations(2, 2);

If you have limited relationship between accounts and customers:
1) create dict with:
dMapCustomer = {<nCustomerId>: [<nAccountId1>, <nAccountId2>]}
2) for each customer create all possible pairs, it is just
lCustomerPairs = [(nCustomerId, nAccountId1), (nCustomerId, nAccountId2), ...]
3) concat all pairs from step 2.
l = []
for nCustomer in lCustomer:
    l += lCustomerPairs
If any account can be linked with any customer, then just:
lAccounts = [1,2,3]
lCustomers = [4,5,6]
list(product(lCustomers, lAccounts)) # all possible pairs of customer and account
Function product generates all possible pairs from two lists:
def product(l1, l2):
    pools = [tuple(pool) for pool in [l1, l2]]
    result = [[]]
    for pool in pools:
        result = [x + [y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
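As a side note, Python's standard library already ships a function with this behaviour, `itertools.product`, so a hand-written version is not strictly needed. A minimal sketch, with the variable names `accounts` and `customers` chosen here purely for illustration:

```python
from itertools import product

accounts = [1, 2, 3]
customers = [4, 5, 6]

# Every (customer, account) pair; the customer varies slowest.
pairs = list(product(customers, accounts))

print(len(pairs))
print(pairs[:3])
```

Three customers and three accounts give nine pairs, starting with (4, 1), (4, 2), (4, 3).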

Sort the sorted List after a second criterion

I have a List resultList and I want to sort it by the content of its Object[] elements: first by the account, 'row[11]', and then by the vehicle, 'row[1]'. How can I do the second sort with Groovy?
Object[] row = (Object[]) resultList.get(i);
String account = row[11];
String vehicle = row[1];
Example
Account Vehicle
HKB 300 - PB
HKV 400 - PDAAA
HMN 200 - PBC200
HZA 155 - PCL
HZA 160 - PGA100
HZAB 165 - PGA100
HZAC 170 - PGA100
Code
int execute(List <Object[]> resultList) {
Object[] row = null;
resultList = resultList.sort{ a,b -> a[11] <=> b[11]};
//Then here I want to sort the sorted resultList after the vehicle also second group.
for (int i=resultList.size()-1; i >= 0; i--) {
row = (Object[]) resultList.get(i);
String account = row[11];
String vehicle = row[1];
}
return 1;
}

I think you just need to replace:
resultList = resultList.sort{ a,b -> a[11] <=> b[11]};
with
resultList = resultList.sort { a, b ->
a[11] <=> b[11] ?: a[1] <=> b[1]
}
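The same "compare by account, fall back to vehicle on a tie" idea can be sketched in Python by sorting on a tuple key. This is an illustration only: `sort_rows` and `make_row` are hypothetical helpers, and the sample values are made up rather than taken from the question's data.

```python
def sort_rows(rows, account_idx=11, vehicle_idx=1):
    """Sort by account first, then by vehicle when accounts are equal."""
    return sorted(rows, key=lambda row: (row[account_idx], row[vehicle_idx]))

def make_row(account, vehicle):
    # Build a 12-element row so indices 1 and 11 exist, as in the question.
    row = [None] * 12
    row[11] = account
    row[1] = vehicle
    return row

rows = [
    make_row("HZA", "PGA100"),
    make_row("HKB", "PB"),
    make_row("HZA", "PCL"),
]

ordered = sort_rows(rows)
print([(r[11], r[1]) for r in ordered])
```

Tuples compare element by element, so the vehicle is only consulted when two accounts are equal, which is what the `?:` chain in the Groovy closure does.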

Google Spreadsheet - Convert multiple columns to one column

I want to loop through a set of rows in a Google Spreadsheet that look like this:
XXX 123 234 234
YYY 789 098 765
ZZZ 76 123 345
End Result Needs to Be:
XXX: 123
XXX: 234
XXX: 234
YYY: 789
YYY: 098
etc.
My current code:
function loopshoplocations(){
var sheet = SpreadsheetApp.getActiveSheet();
var data = sheet.getRange('A4:A8').getValues();
var i=0;
for(i = 0; i < 4; i++){
return ('Shop Location: ' + data[i][0]);
}}

Alternatively with a formula
=ArrayFormula( transpose(split(query(rept(left(A2:A, 3)&" ", 3),,50000), " "))&": "
&transpose(split(query(regexreplace(A2:A, "^(.+?)\s",""),,50000), " ")))

This code assumes that there is a header row on row one. The data gets appended at the end of the sheet. If you want something different, the code would need to be adjusted.
function loopshoplocations() {
var data,L,outerArray,innerArray,numberOfRows,sheet,startRow;
startRow = 2;
sheet = SpreadsheetApp.getActiveSheet();
numberOfRows = sheet.getLastRow();
data = sheet.getRange(startRow,1,numberOfRows-startRow+1,4).getValues();
L = data.length;
//Logger.log(data);
outerArray = [];
var i,j;
for(i = 0; i < L; i++){
for (j=1; j<4 ; j+=1) {//Begin count at one, not zero
innerArray = [];//Reset
innerArray.push(data[i][0]);
innerArray.push(data[i][j]);
//Logger.log(innerArray)
outerArray.push(innerArray);
};
};
sheet.getRange(numberOfRows+1,1,outerArray.length,outerArray[0].length).setValues(outerArray);
};
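Stripped of the spreadsheet API, the reshaping itself is a nested loop over each row's label and its remaining cells. A minimal Python sketch of that step, assuming the rows have already been read into lists of strings (the `unpivot` name is hypothetical):

```python
def unpivot(rows):
    """Turn [label, v1, v2, ...] rows into one 'label: value' string per value."""
    return [f"{label}: {value}" for label, *values in rows for value in values]

# Sample rows from the question, kept as strings so "098" keeps its leading zero.
rows = [
    ["XXX", "123", "234", "234"],
    ["YYY", "789", "098", "765"],
    ["ZZZ", "76", "123", "345"],
]

for line in unpivot(rows):
    print(line)
```

This prints nine lines, beginning with "XXX: 123", "XXX: 234", "XXX: 234", "YYY: 789".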

Linq - return index of collection using conditional logic

I have a collection
List<int> periods = new List<int>();
periods.Add(0);
periods.Add(30);
periods.Add(60);
periods.Add(90);
periods.Add(120);
periods.Add(180);
var overDueDays = 31;
I have a variable overDueDays. When its value is between 0 and 29, I want to return index 0. When it is between 30 and 59, I want to return index 1, and so on. The periods list comes from the database, so it is not hard-coded and the values can differ from the ones shown here. What is the best way to do this using LINQ in one statement?

It's not really what Linq is designed for, but (assuming that the range is not fixed) you could do the following to get the index
List<int> periods = new List<int>();
periods.Add(0);
periods.Add(30);
periods.Add(60);
periods.Add(90);
periods.Add(120);
periods.Add(180);
var overDueDays = 31;
var result = periods.IndexOf(periods.First(n => overDueDays < n)) - 1;

You can use .TakeWhile():
int periodIndex = periods.TakeWhile(p => p <= overDueDays).Count() - 1;

How about this?
var qPeriods = periods.Where(v => v <= overDueDays)
.Select((result, i) => new { index = i })
.Last();

Assuming that periods is sorted, you can use the following approach:
var result = periods.Skip(1)
.Select((o, i) => new { Index = i, Value = o })
.FirstOrDefault(o => overDueDays < o.Value);
if (result != null)
{
Console.WriteLine(result.Index);
}
else
{
Console.WriteLine("Matching range not found!");
}
The first value is skipped since we're interested in comparing with the upper value of the range. By skipping it, the indices fall into place without the need to subtract 1. FirstOrDefault is used in case overDueDays doesn't fall between any of the available ranges.
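Outside LINQ, looking up which sorted range a value falls into is a binary search. A minimal Python sketch using the standard `bisect` module, assuming `periods` is sorted in ascending order; the `period_index` helper is a hypothetical name:

```python
from bisect import bisect_right

def period_index(periods, overdue_days):
    """Index of the last period start that is <= overdue_days, or -1 if none."""
    return bisect_right(periods, overdue_days) - 1

periods = [0, 30, 60, 90, 120, 180]

print(period_index(periods, 29))   # still in the first range
print(period_index(periods, 31))   # second range
print(period_index(periods, 200))  # past the last boundary
```

For the values above this gives 0, 1 and 5. A value below the first boundary yields -1, which the caller would need to treat as "no matching range".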

How to match a column name and find out the column position in awk?

I am trying to parse some CSV files using awk. I am new to shell scripting and awk.
The CSV file I am working on looks something like this:
fnName,minAccessTime,maxAccessTime
getInfo,300,600
getStage,600,800
getStage,600,800
getInfo,250,620
getInfo,200,700
getStage,700,1000
getInfo,280,600
I need to find the average AccessTimes of the different functions.
I have been working with awk and can get the average times as long as the exact column numbers are specified, like $2, $3, etc.
However, I need a general script: if I pass "minAccessTime" as a command-line argument, the script should print the average of that column (instead of my explicitly specifying $2 or $3 in awk).
I have been googling this and have looked through various forums, but none of the suggestions seem to work.
Can someone tell me how to do this? It would be of great help!
Thanks in advance!!

This awk script should give you all that you want.
It first evaluates which column you're interested in by using the name passed in as the COLM variable and checking against the first line. It converts this into an index (it's left as the default 0 if it couldn't find the column).
It then basically runs through all other lines in your input file. On all these other lines (assuming you've specified a valid column), it updates the count, sum, minimum and maximum for both the overall data plus each individual function name.
The former is stored in count, sum, min and max. The latter are stored in associative arrays with similar names (with _arr appended).
Then, once all records are read, the END section outputs the information.
NR == 1 {
for (i = 1; i <= NF; i++) {
if ($i == COLM) {
cidx = i;
}
}
}
NR > 1 {
if (cidx > 0) {
count++;
sum += $cidx;
if (count == 1) {
min = $cidx;
max = $cidx;
} else {
if ($cidx < min) { min = $cidx; }
if ($cidx > max) { max = $cidx; }
}
count_arr[$1]++;
sum_arr[$1] += $cidx;
if (count_arr[$1] == 1) {
min_arr[$1] = $cidx;
max_arr[$1] = $cidx;
} else {
if ($cidx < min_arr[$1]) { min_arr[$1] = $cidx; }
if ($cidx > max_arr[$1]) { max_arr[$1] = $cidx; }
}
}
}
END {
if (cidx == 0) {
print "Column '" COLM "' does not exist"
} else {
print "Overall:"
print " Total records = " count
print " Sum of column = " sum
if (count > 0) {
print " Min of column = " min
print " Max of column = " max
print " Avg of column = " sum / count
}
for (task in count_arr) {
print "Function " task ":"
print " Total records = " count_arr[task]
print " Sum of column = " sum_arr[task]
print " Min of column = " min_arr[task]
print " Max of column = " max_arr[task]
print " Avg of column = " sum_arr[task] / count_arr[task]
}
}
}
Storing that script into qq.awk and placing your sample data into qq.in, then running:
awk -F, -vCOLM=minAccessTime -f qq.awk qq.in
generates the following output, which I'm relatively certain will give you every possible piece of information you need:
Overall:
Total records = 7
Sum of column = 2930
Min of column = 200
Max of column = 700
Avg of column = 418.571
Function getStage:
Total records = 3
Sum of column = 1900
Min of column = 600
Max of column = 700
Avg of column = 633.333
Function getInfo:
Total records = 4
Sum of column = 1030
Min of column = 200
Max of column = 300
Avg of column = 257.5
For maxAccessTime, you get:
Overall:
Total records = 7
Sum of column = 5120
Min of column = 600
Max of column = 1000
Avg of column = 731.429
Function getStage:
Total records = 3
Sum of column = 2600
Min of column = 800
Max of column = 1000
Avg of column = 866.667
Function getInfo:
Total records = 4
Sum of column = 2520
Min of column = 600
Max of column = 700
Avg of column = 630
And, for xyzzy (a non-existent column), you'll see:
Column 'xyzzy' does not exist
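If awk is not a hard requirement, the same lookup-by-header idea can be sketched in Python with the standard `csv` module, which maps each row to its column names directly. This is an illustration under assumptions: the `average_by_function` helper is a hypothetical name, the first column is assumed to be called fnName as in the sample, and the data is passed in as a string rather than read from a file.

```python
import csv
import io
from collections import defaultdict

def average_by_function(csv_text, column):
    """Average the named column per function name (the fnName column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if column not in (reader.fieldnames or []):
        raise KeyError(f"Column '{column}' does not exist")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in reader:
        totals[row["fnName"]] += float(row[column])
        counts[row["fnName"]] += 1
    return {name: totals[name] / counts[name] for name in totals}

# Sample data from the question.
sample = """fnName,minAccessTime,maxAccessTime
getInfo,300,600
getStage,600,800
getStage,600,800
getInfo,250,620
getInfo,200,700
getStage,700,1000
getInfo,280,600
"""

print(average_by_function(sample, "minAccessTime"))
```

For minAccessTime this yields 257.5 for getInfo and roughly 633.33 for getStage, matching the awk output above.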

If I understand the requirements correctly, you want the average of a column, and you'd like to specify the column by name.
Try the following script (avg.awk):
BEGIN {
FS=",";
}
NR == 1 {
for (i=1; i <= NF; ++i) {
if ($i == SELECTED_FIELD) {
SELECTED_COL=i;
}
}
}
NR > 1 && $1 ~ SELECTED_FNAME {
sum[$1] = sum[$1] + $SELECTED_COL;
count[$1] = count[$1] + 1;
}
END {
for (f in sum) {
printf("Average %s for %s: %d\n", SELECTED_FIELD, f, sum[f] / count[f]);
}
}
and invoke your script like this
awk -v SELECTED_FIELD=minAccessTime -f avg.awk < data.csv
or
awk -v SELECTED_FIELD=maxAccessTime -f avg.awk < data.csv
or
awk -v SELECTED_FIELD=maxAccessTime -v SELECTED_FNAME=getInfo -f avg.awk < data.csv
EDIT:
Rewritten to group by function name (assumed to be first field)
EDIT2:
Rewritten to allow additional parameter to filter by function name (assumed to be first field)
