Finding elements that appear in groups most - algorithm

I'm having trouble figuring out how to approach this algorithm.
Input: any number of lists each holding elements grouped by a common attribute
For example,
matched_by_first_name = {"bob" => [person, person, ...], "nancy" => [person, ...], ...}
matched_by_zip_code = {"12345" => [person, person, ...], "56789" => [person, ...], ...}
Output: List of groups of people that appear most frequently in the same groups, with separate "weightings" per input list. So, I might weight two people grouped by the same first name more than I would weight two people grouped by the same zip code.
In other words:
matches = [[person, person], [person], [person, person, person]]
Basically, if there are two persons and for every single grouping they are in the same group, then they should definitely be in the same final matched group. If there's only one group they're not in, then they should probably still be matched (depending on the weighting of that group type).
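One possible way to make the weighting idea concrete, offered as a rough sketch rather than a definitive answer: score each pair of people by adding up the weight of every grouping in which they share a group, then merge pairs whose score reaches a threshold. The method names, the weights, the threshold and the use of a simple union-find for merging are all assumptions made for illustration; none of them come from the question.

```ruby
require 'set'

# Hypothetical sketch: names, weights and threshold are illustrative only.
# groupings: { grouping_name => { key => [person, ...] } }
# weights:   { grouping_name => numeric weight }
def pair_scores(groupings, weights)
  scores = Hash.new(0.0)
  groupings.each do |name, groups|
    groups.each_value do |people|
      # Every pair inside the same group earns that grouping's weight.
      people.uniq.combination(2) do |a, b|
        scores[Set[a, b]] += weights.fetch(name)
      end
    end
  end
  scores
end

def match_groups(groupings, weights, threshold)
  people = groupings.values.flat_map(&:values).flatten(1).uniq
  parent = people.to_h { |p| [p, p] }
  # Union-find "find": follow parent links up to the representative.
  find = lambda do |p|
    p = parent[p] until parent[p] == p
    p
  end
  pair_scores(groupings, weights).each do |pair, score|
    a, b = pair.to_a
    # Merge the two clusters when the pair's total weight is high enough.
    parent[find.(a)] = find.(b) if score >= threshold
  end
  people.group_by { |p| find.(p) }.values
end
```

With `groupings = { first_name: { "bob" => [:p1, :p2], "nancy" => [:p3] }, zip_code: { "12345" => [:p1, :p2, :p3] } }`, weights `{ first_name: 2.0, zip_code: 1.0 }` and a threshold of `2.0`, this returns `[[:p1, :p2], [:p3]]`. Note that merging is transitive in this sketch: if A matches B and B matches C, all three end up together even when A and C score low, which may or may not be what you want.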

Related

How can I get a total sum from one table if it is linked to many other tables, using LINQ?

I have multiple tables with one-to-many relationships, e.g. Country -> Region -> Center -> Greater -> Section. The Section table has a column for census. I am trying to write a LINQ query for my view to get the total census grouped by Country. I also need to know, for each country, how many regions, centers and greaters there are, along with the total census. These can be separate queries; that is no problem.
In this case, the SelectMany method in LINQ is your friend. Each country has a collection of regions, right? You can use .SelectMany to combine all of the regions from several countries into a single collection of regions. Then you need to get a collection of centers from all of the regions, and so on and so on.
Consider this code:
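context.Countries
    .Select(country => new
    {
        country.Id,
        // Flatten Regions -> Centers -> Greaters -> Sections for this country only,
        // then total the census of those sections.
        TotalCensus = country.Regions
            .SelectMany(region => region.Centers)
            .SelectMany(center => center.Greaters)
            .SelectMany(greater => greater.Sections)
            .Sum(section => section.Census)
    });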

Substring matching in Lucene with a list of items

Let's say I have a field in my database: 'city'.
I have a list of cities, and I would like to get the results where the 'city' field matches one of the cities in my list.
I have two problems.
First, my list is long (around 3000 cities). Is there a better way than:
city:"city1" OR city:"city2" OR ... OR city:"city3000"
Second, sometimes the city is listed as part of a bigger string, so I want results where 'city:"city1"'
but also 'city:"my city is city1"' and 'city:"my city is city1 and its nice"' or 'city:"city1 is my city"'.
I would imagine that using 'city:"*city1*"' would do so, but Lucene does not support * at the beginning of the search term.

Right algorithm for grouping food items and ingredients?

Problem
Given N food items, each containing a set of ingredients. There is a pool of M ingredients.
A group consists of food items and ingredients such that each food item in the group contains all of the ingredients in the group.
The problem is to create groups from the foods and ingredients such that every food item and ingredient pairing is covered (there should be a group corresponding to each mapping of a food item to one of its ingredients), with the constraint of minimising the number of groups created.
Example:
Input
N = 3, M = 3
Ingredients('a', 'b', 'c')
Food Item 1 containing ('a', 'b', 'c') ingredients.
Food Item 2 containing ('b', 'a') ingredients.
Food Item 3 containing ('a', 'b', 'c') ingredients.
Output
2 groups
Group 1: (Food Item 1, Food Item 2, Food Item 3)('a', 'b')
Group 2: (Food Item 1, Food Item 3)('c')
The solution that I thought of is to compute all subsequences of the ingredients, assign them to groups and add the appropriate food items in the group. But, this doesn't seem to be the right algorithm.
I think I can help with this, but it's not clear from your question what "minimizing" means to you, or what "is covered" means.
In your example, if you considered only the first 2 groups, that is
Food Item 1
Food Item 2
it seems that all food items are covered, and also all the ingredients, 'a'+'b' being in the first group and 'c' being in the second group. What am I missing here that made you add the third group?
Thanks,
The easiest approach would be:
Create a group for every single ingredient.
Put into each group all the food items that contain that ingredient.
If two different groups contain the same food items, join them together.
Repeat until no two groups contain exactly the same food items.
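The steps above could be sketched in Ruby roughly as follows. The method name `ingredient_groups` and the shape of the `foods` hash (food item name mapped to its ingredient list) are assumptions for illustration. Grouping ingredients by their exact set of food items does the "join and repeat" steps in a single pass. This follows the simple approach described here; it is not claimed to produce the minimum number of groups in general.

```ruby
# Hypothetical sketch: `foods` maps a food item name to its ingredient list.
def ingredient_groups(foods)
  # Steps 1-2: for every ingredient, collect the food items that contain it.
  foods_by_ingredient = Hash.new { |h, k| h[k] = [] }
  foods.each do |food, ingredients|
    ingredients.uniq.each { |ing| foods_by_ingredient[ing] << food }
  end
  # Steps 3-4: ingredients with exactly the same food items share one group.
  foods_by_ingredient
    .group_by { |_ing, items| items.sort }
    .map { |items, pairs| [items, pairs.map(&:first)] }
end

foods = {
  "Food Item 1" => %w[a b c],
  "Food Item 2" => %w[b a],
  "Food Item 3" => %w[a b c]
}
groups = ingredient_groups(foods)
groups.each { |items, ingredients| puts "#{items.inspect} #{ingredients.inspect}" }
```

For the example input this yields two groups: all three food items with 'a' and 'b', and Food Items 1 and 3 with 'c'.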

Searching through a CSV with variable number of columns

I could use some help thinking about a puzzle I'm solving. I was going to tackle it in Ruby, but it could be in another language like JavaScript/Node. I need help breaking down the problem and designing a solution.
I'm currently working on a command-line program that reads in a CSV, searches the CSV based on the arguments, and then produces output based on what it finds.
The CSV rows have one of two formats. The simple one is a list of restaurants, food items, and their prices:
restaurant ID, price, item label
But for restaurants offering combo meals, where there can be any number of items in a value meal:
restaurant ID, price, item 1 label, item 2 label, ...
So the idea is that you run this program with the CSV file to read and the food items you want to eat as arguments, and it outputs the restaurant you should go to and the total price it will cost you. It is okay to purchase extra items, as long as the total cost is minimized.
Sample data.csv
1, 4.00, burger
1, 8.00, tofu_log
2, 5.00, burger
2, 6.50, tofu_log
$ foodfinder.rb data.csv burger tofu_log
=> 2, 11.5
Likewise with the rows that have multiple food items:
5, 4.00, extreme_fajita
5, 8.00, fancy_european_water
6, 5.00, fancy_european_water
6, 6.00, extreme_fajita, jalapeno_poppers, extra_salsa
$ foodfinder.rb data.csv fancy_european_water extreme_fajita
=> 6, 11.0
Since data normalization isn't an option (I can't shove these into a DB), I was wondering how to parse the CSV in an efficient way. Also, the fact that some rows have multiple food items leaves me unsure how to store them. I'm guessing I'd want to import the rows into a hash and then search through the hash in some fashion. Any guidance, wizards?
With Ruby, I'd skip the standard CSV libraries, and just load the rows, split them into, at most, three pieces, and convert the third into an array. From that point you have all you need:
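# Assumes the CSV lives at 'data.csv' (the sample file name from the question).
records = File.readlines('data.csv', chomp: true).map { |row|
  # Split into at most three pieces: id, price, and the remaining item labels.
  row.split(/,\s?/, 3)
}.map { |arr|
  [arr[0].to_i, arr[1].to_f, arr[2].split(/,\s?/)]
}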
Now your records will be:
[
  [5, 4.00, ["extreme_fajita"]],
  [5, 8.00, ["fancy_european_water"]],
  [6, 5.00, ["fancy_european_water"]],
  [6, 6.00, ["extreme_fajita", "jalapeno_poppers", "extra_salsa"]]
]
The data is now in the form you need, so you can apply what you know about solving NP-complete problems to it.
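As one possible way to search records of that shape, here is a minimal sketch. It is not the answerer's method: the method name `cheapest_restaurant` and the bitmask approach are assumptions chosen for illustration. For each restaurant it tracks the cheapest known cost of covering every subset of the wanted items, which is exponential in the number of wanted items but fine for a handful.

```ruby
# Hypothetical sketch: `records` has the [id, price, [items]] shape shown above.
def cheapest_restaurant(records, wanted)
  full = (1 << wanted.size) - 1
  best = nil
  records.group_by(&:first).each do |id, rows|
    # cost[mask] = cheapest way found so far to cover the wanted items in mask.
    cost = { 0 => 0.0 }
    rows.each do |_id, price, items|
      row_mask = wanted.each_index.sum { |i| items.include?(wanted[i]) ? 1 << i : 0 }
      next if row_mask.zero? # this row offers nothing we asked for
      # Iterate over a snapshot so new entries can be added safely.
      cost.to_a.each do |mask, total|
        merged = mask | row_mask
        candidate = total + price
        cost[merged] = candidate if !cost.key?(merged) || candidate < cost[merged]
      end
    end
    total = cost[full]
    best = [id, total] if total && (best.nil? || total < best[1])
  end
  best # nil when no single restaurant offers every wanted item
end

records = [
  [5, 4.00, ["extreme_fajita"]],
  [5, 8.00, ["fancy_european_water"]],
  [6, 5.00, ["fancy_european_water"]],
  [6, 6.00, ["extreme_fajita", "jalapeno_poppers", "extra_salsa"]]
]
p cheapest_restaurant(records, ["fancy_european_water", "extreme_fajita"])
```

For the sample rows this prints `[6, 11.0]`, matching the expected output in the question. Extra items in a combo row are simply ignored, which is how "it is okay to purchase extra items" is handled here.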
That data is easy to parse with the standard CSV library and a tiny bit of array wrangling:
require 'csv'
data = CSV.read('data.csv')
          .map { |r| [r[0], r[1].strip, r[2..-1].map(&:strip)] }
That gives you this in data:
data = [
  ['5', '4.00', ['extreme_fajita']],
  ['5', '8.00', ['fancy_european_water']],
  #...
]
From there it is easy to build whatever indexed structure you need.
However, if you're just interested in finding rows with 'extra_salsa' then use select instead of map:
want = CSV.foreach('x.csv')
          .select { |r| r[2..-1].map(&:strip).include?('extra_salsa') }
and clean up want for printing however you need to.
You're going to be spinning through the whole CSV every time your script runs, so you should search it while you're scanning it. Building intermediate indexed data structures is just a waste of time if you're only doing one search per run.

List within lists within lists

I have a list called Countries, and each country has a list of Towns, each of which in turn has a list of Streets. And a street has a number of houses. Lists within lists within lists. Very simple.
I need to generate a list of houses that are located in countries whose names start with the letter 'A'. Not a very logical example, but it's easier to explain than the more complex structure I'm dealing with.
This is, of course, not too complex and could be done by creating a List and then ForEaching all countries.Where(Name.StartsWith('A')), then ForEaching all towns and streets, and finally adding each house on each street to the list.
I don't like that method so I want something prettier...
Could this be done by using something like Aggregate on the Countries.Where() list? If so, how? (That is, in a single statement.)
Yes, the selection will be on the top list only, so that should make it easier.
This looks like a job for Enumerable.SelectMany (which lets you flatten one level of hierarchy):
// Requires: using System.Collections.Generic; using System.Linq;
List<Country> countries = GetCountries();   // hypothetical loader
IEnumerable<Country> aCountries = countries
    .Where(c => c.Name.StartsWith("A"));
List<House> aCountryHouses = aCountries
    .SelectMany(c => c.Towns)      // countries -> towns
    .SelectMany(t => t.Streets)    // towns -> streets
    .SelectMany(s => s.Houses)     // streets -> houses
    .ToList();

Resources