Using multiple OR conditions in ArrayFormula - google-sheets-formula

I have a column of names:
Smith John Sr
Smith John R
Smith Jr John L
Smith III John
Smith John IV
I know the last name is always the first word.
I know the first name is sometimes the second word.
However, sometimes "Jr" or "III" may be there instead.
Otherwise, the title can be in the fourth word if it is not the middle initial.
How do I use an ARRAYFORMULA with multiple OR conditions so that I can extract into a second column the first name, the last name, and the applicable title (e.g. Jr, Sr, III, IV)? I want to disregard the middle initial, which I figured I could detect with a character count of 1 within the OR logic. Is that correct?

If you already have a working formula, you can join conditions with + when at least one of them must be met (like OR) and with * when all of them must be met (like AND). For example:
=IF(OR(A1=1,B1=1),1,0)
As an array formula you could write it as:
=ArrayFormula(IF((A1:A=1)+(B1:B=1),TRUE,FALSE))
Since TRUE is treated as 1 and FALSE as 0, the sum of two conditions is 1 or more whenever at least one of them is TRUE, and IF treats that positive result as true and returns its "true" branch. Hope it's useful.
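The same arithmetic can be checked outside Sheets. Here is a minimal Python sketch (the function names and sample columns are made up for illustration) in which booleans are added for OR and multiplied for AND, row by row, in the way the array formula above does:

```python
def or_by_sum(col_a, col_b):
    """Row-wise OR: True counts as 1, so the sum is non-zero if either test passes."""
    return [bool((a == 1) + (b == 1)) for a, b in zip(col_a, col_b)]


def and_by_product(col_a, col_b):
    """Row-wise AND: the product is non-zero only if both tests pass."""
    return [bool((a == 1) * (b == 1)) for a, b in zip(col_a, col_b)]


# Hypothetical sample columns A and B.
col_a = [1, 0, 1, 0]
col_b = [1, 1, 0, 0]
print(or_by_sum(col_a, col_b))
print(and_by_product(col_a, col_b))
```

Only the last row, where neither column equals 1, fails the OR test; only the first row passes the AND test.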

Related

Most common "denominators" in a two column list in Google Sheets

How can I find the most commonly found 'Code' (Col B) associated with each unique 'Name' (Col A), and find the closest value if the 'Code' in Col B is unique?
The image below shows the shared Google Sheet with the starting data in columns A and B and the desired output in columns C and D. Each unique Name has associated Codes. Column D displays the most commonly occurring Code for each unique Name. For example, Buick La Sabre 1 has 3 associated codes in B3, B4 and B5, but D3 shows only 98761 because it appears more frequently in B2:B than the other 2 codes do. I explain what I mean by the closest value below.
Codes that have a count of 1 are unique, so for those the output in column D tries to find the closest match.
However, when the count of a code in B2:B is greater than 1, the output in column D is the most frequent code associated with the Name.
Approach when there are 2 or more of the same value in column B
Query
I thought I might use a QUERY with ORDER BY count(B) DESC LIMIT 2, in a fashion similar to this working formula:
QUERY($A$1:$D$25,"SELECT A, B ORDER BY B DESC Limit 2",1)
but I could not get it to work when I substituted in the Count function.
SORT & INDEX OR VLOOKUP
If the query function can't be fixed to work, then I thought another approach might be to combine a Vlookup/Index after sorting column B in a descending order.
UNIQUE(sort($B$3:$B,if(len($B$3:$B),countif($B$3:$B,$B$3:$B),),0,1,1))
Since a VLOOKUP or INDEX with multiple criteria returns only the first matching value it finds, sorting by frequency first means that the first match would be the most frequent value.
Approach when there are fewer than 2 of the same value in column B
This is a little more complicated since the values can be numbers and letters.
A solution like the one seen in the image below could be used if everything were a number. In our case the codes are usually 3 to 5 characters long and alphanumeric, starting with 0 or 1 letters followed by numbers. I'm not sure what the best way to match a code like A1234 would be. I imagine a solution might be to SPLIT off the letters and try to match those first. For example, A1234 would be split into A | 1234, then matched on the closest letter and then on the closest number. But I really am not sure what the best solution might be that works within the constraints of Google Sheets.
In the event that a number is equidistant between two numbers, the lower number should be chosen. For example, if 8 is the number and the closest match would be 6 or 10, then 6 should be selected.
In the event that a letter is being used, it should work in a similar fashion. For example, thinking of {A, B, C} as {1, 2, 3}, B should preferentially match to A since it comes before C.
In summary, I am looking for a way to find the code in col B most frequently associated with each unique name in col A in this sheet and, where none of the codes in B2:B repeat, a formula that will find the closest match for a number or alphanumeric code.
You can use this formula:
=QUERY({range of numerators & denominators}, "select Col2, count(Col2) group by Col2 label Col2 'Denominator', count(Col2) 'Count'")
That outputs something like this:
Denominator  Count
-----------  -------
Den 1        Count 1
Den 2        Count 2
Use:
=ARRAY_CONSTRAIN(SORTN(QUERY({A3:B},
"select Col1,Col2,count(Col2)
where Col1 is not null
group by Col1,Col2
order by count(Col2) desc,Col2 asc
label count(Col2)''"), 9^9, 2, 1, 1), 9^9, 2)
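To make the logic of that formula easier to follow, here is a rough Python sketch of what it appears to do: count each (name, code) pair, rank by count descending and then code ascending, and keep the first row per name. The function name and the sample rows are hypothetical (they are not taken from the shared sheet), codes are assumed to be text, and the closest-match part of the question is not covered.

```python
from collections import Counter


def most_frequent_code(rows):
    """Return {name: code}, picking the most frequent code per name.

    Ties are broken by the smaller code, mirroring
    "order by count(Col2) desc, Col2 asc" in the formula.
    """
    counts = Counter((name, code) for name, code in rows if name)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0][1]))
    result = {}
    for (name, code), _count in ranked:
        result.setdefault(name, code)  # keep only the first (best) row per name
    return result


# Hypothetical sample data.
rows = [
    ("Buick La Sabre 1", "98761"),
    ("Buick La Sabre 1", "12345"),
    ("Buick La Sabre 1", "98761"),
    ("Ford F150", "A1234"),
    ("Ford F150", "A1200"),
]
print(most_frequent_code(rows))
```

In this sample the first name gets its repeated code, while the second name, whose codes each occur once, falls back to the smaller code.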

Prolog, strange answer of setof

I'm using the online compiler https://swish.swi-prolog.org/
Given the following facts:
frontier(spain,france).
frontier(spain,portugal).
frontier(portugal,spain).
frontier(france,spain).
frontier(france,italy).
frontier(france,germany).
frontier(france,belgium).
frontier(france,swiztland).
frontier(belgium,netherlands).
frontier(belgium,france).
frontier(belgium,germany).
frontier(netherlands,germany).
frontier(netherlands,belgium).
frontier(germany,netherlands).
frontier(germany,belgium).
frontier(germany,france).
frontier(germany,austria).
frontier(germany,swiztland).
frontier(austria,germany).
frontier(austria,swiztland).
frontier(austria,italy).
frontier(swiztland,austria).
frontier(swiztland,france).
frontier(swiztland,germany).
frontier(swiztland,italy).
frontier(italy,france).
frontier(italy,swiztland).
frontier(italy,austria).
I would like to obtain all of the countries, without repetitions.
So I use the setof predicate, which removes duplicates, like this:
setof(Country, (frontier(Country,_)), Countries).
The problem is that, when I execute the query, I get several solutions, one after another:
[germany, italy, swiztland]
[france, germany, netherlands]
[belgium, germany, italy, spain, swiztland]
[austria, belgium, france, netherlands, swiztland]
[austria, france, swiztland]
[belgium, germany]
[spain]
[france, portugal]
[austria, france, germany, italy]
I don't understand why. I expected Countries to be the sorted list of all the countries, without repetitions. That is why I use the anonymous variable in the second argument of frontier: I don't care about the second argument, I only want the distinct values of the first argument.
Any help?
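A likely explanation: setof/3 treats the anonymous variable as a free variable and, on backtracking, returns one sorted list per distinct binding of it. Here that means one list per neighbouring country, which matches the nine lists above (the first one, for example, contains exactly the countries that border austria). In Prolog the usual remedy is to quantify that variable with ^, for example setof(Country, N^frontier(Country, N), Countries). The Python sketch below only models the two behaviours on the same facts; the function names are invented for illustration.

```python
from collections import defaultdict

# The frontier/2 facts from the question, as (country, neighbour) pairs.
FRONTIER = [
    ("spain", "france"), ("spain", "portugal"), ("portugal", "spain"),
    ("france", "spain"), ("france", "italy"), ("france", "germany"),
    ("france", "belgium"), ("france", "swiztland"),
    ("belgium", "netherlands"), ("belgium", "france"), ("belgium", "germany"),
    ("netherlands", "germany"), ("netherlands", "belgium"),
    ("germany", "netherlands"), ("germany", "belgium"), ("germany", "france"),
    ("germany", "austria"), ("germany", "swiztland"),
    ("austria", "germany"), ("austria", "swiztland"), ("austria", "italy"),
    ("swiztland", "austria"), ("swiztland", "france"),
    ("swiztland", "germany"), ("swiztland", "italy"),
    ("italy", "france"), ("italy", "swiztland"), ("italy", "austria"),
]


def setof_with_free_variable(facts):
    """One sorted, duplicate-free list per binding of the second argument."""
    groups = defaultdict(set)
    for country, neighbour in facts:
        groups[neighbour].add(country)
    return [sorted(groups[neighbour]) for neighbour in sorted(groups)]


def setof_quantified(facts):
    """A single sorted, duplicate-free list, ignoring the second argument."""
    return sorted({country for country, _neighbour in facts})


for solution in setof_with_free_variable(FRONTIER):
    print(solution)
print(setof_quantified(FRONTIER))
```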

Prolog, print employees with same names

This is my first time using Prolog.
I have employees:
employee(eID,firstname,lastname,month,year).
I have units:
unit(uID,type,eId).
I want to make a predicate
double_name(X).
that prints the last names of the employees with the same first name in the unit X.
I am doing something like this :
double_name(X) :-
    unit(X,_,_eID),
    employee(_eID,_firstname,_,_,_),
    _name = _firstname,
    employee(_,_name,_lastname,_,_),
    write(_lastname).
But it prints all the employees in the unit.
How can I print only the employees with the same first name?
unit(unit_01,type,1).
unit(unit_01,type,2).
unit(unit_01,type,3).
employee(1,mary,smith,6,1992).
employee(2,fred,jones,1,1990).
employee(3,mary,cobbler,2,1995).
double_name(Unit) :-
    unit(Unit,_,Eid_1),
    employee(Eid_1,Firstname,Lastname_1,_,_),
    unit(Unit,_,Eid_2),
    Eid_1 \= Eid_2,
    employee(Eid_2,Firstname,Lastname_2,_,_),
    write(Firstname),write(","),write(Lastname_1),nl,
    write(Firstname),write(","),write(Lastname_2).
Variables in Prolog typically start with an upper-case letter; starting them with an underscore is allowed, but not typical.
In double_name/1, goals like
unit(Unit,_,Eid_1)
employee(Eid_1,Firstname,Lastname_1,_,_)
are used to load values from the facts into variables while checking, via unification, that any variables that are already bound match the fact.
To ensure that a person is not compared with themselves, the two employee IDs must differ:
Eid_1 \= Eid_2
To make sure that the two people have the same first name, the same variable, Firstname, is used in both employee/5 goals.
The write/1 and nl/0 predicates just write the result to the screen.
Example:
?- double_name(unit_01).
mary,smith
mary,cobbler
true ;
mary,cobbler
mary,smith
true ;
false.
Notice that the correct answer is duplicated: the same pair is found once in each order. This can be resolved.
See: Prolog check if first element in lists are not equal and second item in list is equal
and look at the use of normalize/4 and setof/3 in my answer
which I leave as an exercise for you.
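To see why each pair shows up twice and how an ordering constraint removes the mirror image, here is a small Python sketch over the same facts. It is not the normalize/4 and setof/3 approach referenced above, just an illustration of the idea; the data structures and the function name are assumptions made for the example.

```python
from itertools import combinations

# The unit/3 and employee/5 facts from the answer (month and year omitted).
UNITS = [("unit_01", "type", 1), ("unit_01", "type", 2), ("unit_01", "type", 3)]
EMPLOYEES = {1: ("mary", "smith"), 2: ("fred", "jones"), 3: ("mary", "cobbler")}


def double_name(unit):
    """Return (first name, last name 1, last name 2) for each pair sharing a first name."""
    ids = sorted(eid for unit_id, _type, eid in UNITS if unit_id == unit)
    pairs = []
    # combinations() yields each unordered pair once (id_1 < id_2),
    # so the mirrored pair (id_2, id_1) is never produced.
    for id_1, id_2 in combinations(ids, 2):
        first_1, last_1 = EMPLOYEES[id_1]
        first_2, last_2 = EMPLOYEES[id_2]
        if first_1 == first_2:
            pairs.append((first_1, last_1, last_2))
    return pairs


print(double_name("unit_01"))
```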

Remove duplicate words from an address using Oracle PL/SQL

There are two types of addresses, as in the examples below:
1. '3 Mayers Court 3 Mayers Court': the total number of words in the address is even, and all words (or combinations of words) are duplicated.
2. 'Manor House Manor' or '1 Briar Cottages 1 Briar': the total number of words in the address is odd, so there is a middle word, and the words (or combinations of words) to its left and right are duplicates.
I can do this through code, but I've no idea how to remove duplicate words through PL/SQL. I've been instructed to do this through PL/SQL anonymous block or through a function. Any help would be appreciated.
If these are the only cases that may appear in your data, you could use the query below. You can put this logic into a function, but a query is faster and simpler.
Nothing fancy here: I'm just dividing the string in half and comparing the second half with the source. It works for the given examples, but obviously there may be cases where you need more logic. For instance, if the string contains consecutive spaces, you have to get rid of them first.
select address,
       case when address like sub||'%'
            then substr(address, 1, length(address) - length(sub))
            else address
       end trimmed
  from (select address,
               trim(substr(address, instr(address, ' ', 1, sn/2 + 1))) sub
          from (select address, regexp_count(address, ' ') sn from t))
Result:
ADDRESS                        TRIMMED
-----------------------------  -----------------------------
3 Mayers Court 3 Mayers Court  3 Mayers Court
905 Mayers Street              905 Mayers Street
Manor House Manor              Manor House
1 Briar Cottages 1 Briar       1 Briar Cottages
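The split-in-half idea can also be sketched outside the database. The Python function below is a rough illustration only, not a translation of the exact Oracle semantics: the function name is made up, it collapses consecutive spaces first, and it leaves single-word addresses untouched.

```python
def trim_repeated_tail(address):
    """Drop the trailing words of an address when they repeat its beginning."""
    words = address.split()            # also collapses consecutive spaces
    normalized = " ".join(words)
    spaces = len(words) - 1
    start = spaces // 2 + 1            # first word after the middle space
    tail = " ".join(words[start:])
    if tail and normalized.startswith(tail):
        return normalized[: len(normalized) - len(tail)].rstrip()
    return normalized


for sample in [
    "3 Mayers Court 3 Mayers Court",
    "905 Mayers Street",
    "Manor House Manor",
    "1 Briar Cottages 1 Briar",
]:
    print(sample, "->", trim_repeated_tail(sample))
```

Like the query, this compares the tail with the start of the string character by character, so it inherits the same limitation: it only handles the two address shapes described in the question.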

Algorithm to create unique random concatenation of items

I'm thinking about an algorithm that will create X most unique concatenations of Y parts, where each part can be one of several items. For example 3 parts:
part #1: 0,1,2
part #2: a,b,c
part #3: x,y,z
And one possible (random) result of 5 concatenations:
0ax
1by
2cz
0bz (note that '0by' would be "less unique" than '0bz' because 'by' was already used)
2ay (note that 'a' has not yet followed '2', and 'y' has not yet followed 'a')
A simple BAD result for the next concatenation:
1cy ('c' has not followed '1' and 'y' has not followed 'c', BUT '1'-'y' was already used as first-last)
Simple GOOD next results would be:
0cy ('c' has not followed '0', 'y' has not followed 'c', and '0'-'y' has not been used as first-last)
1az
1cx
I know that this restriction limits the possible results, but when all fully unique possibilities are gone, the algorithm should continue and try to keep as much uniqueness as is still available (repeating as little as possible).
Consider a real example:
Boy/Girl/Martin
bought/stole/get
bottle/milk/water
And I want results like:
Boy get milk
Martin stole bottle
Girl bought water
Boy bought bottle (not water, because of 'bought+water' and not milk, because of 'Boy+milk')
Maybe start with a tree of all combinations, but how to select most unique trees first?
Edit: With this sample data we can see that creating fully unique results from 4 words with 3 possibilities each gives us only 3 results:
Martin stole a bottle
Boy bought an milk
He get hard water
But more results may be requested. So the 4th result should keep as much uniqueness as is still available, like Martin bought hard milk, not Martin stole a water.
Edit: A start for a solution?
Imagine each part as a barrel which can be rotated: when it rotates down, the last item becomes the first; when it rotates up, the first item becomes the last. Now set the barrels like this:
Martin|stole |a   |bottle
Boy   |bought|an  |milk
He    |get   |hard|water
Now write the sentences as we see them, then rotate the first barrel UP once, the second twice, the third three times, and so on. We get these sentences (note that the third barrel did one full rotation):
Boy   |get   |a   |milk
He    |stole |an  |water
Martin|bought|hard|bottle
And we get the next solutions. We can repeat the process once more to get more solutions:
He    |bought|a   |water
Martin|get   |an  |bottle
Boy   |stole |hard|milk
The problem is that the first barrel stays connected with the last one, because they rotate in parallel.
I'm wondering whether it would be more unique if I rotated the last barrel one more time in the last solution (but then I would introduce other connections like an-water, although these would be repeated only 2 times, not 3 times like now). I don't know whether "barrels" are a good way of thinking here.
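The barrel rotation described above can be sketched in a few lines of Python. This only reproduces the rotation scheme from the question (barrel number i rotates up i times per round); the helper names are made up for the example, and "bottle" is spelled out in full.

```python
from collections import deque


def rotate_up(barrel, steps):
    """Rotate a barrel up: each step moves the first item to the end."""
    rotated = deque(barrel)
    rotated.rotate(-steps)
    return list(rotated)


def next_round(barrels):
    """Rotate the 1st barrel up once, the 2nd twice, the 3rd three times, and so on."""
    return [rotate_up(barrel, index + 1) for index, barrel in enumerate(barrels)]


def sentences(barrels):
    """Read the barrels row by row."""
    return [" ".join(row) for row in zip(*barrels)]


barrels = [
    ["Martin", "Boy", "He"],
    ["stole", "bought", "get"],
    ["a", "an", "hard"],
    ["bottle", "milk", "water"],
]
round_1 = next_round(barrels)
round_2 = next_round(round_1)
print(sentences(round_1))
print(sentences(round_2))
```

With 3 items per barrel, rotating by 4 is the same as rotating by 1, which is why the first and last barrels stay locked together in this scheme.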
I think that we should first find a definition of uniqueness.
For example, what causes uniqueness to drop? Using a word that was already used? Is repeating 2 words close to each other less unique than repeating a word with a gap of other words in between? So this problem can be subjective.
But I think that over many sequences each word should be used a similar number of times (for example, by selecting a word randomly and removing it from the set, then refilling the set once all words have been used so that they can be drawn again). This is easy to do.
But even if each word is used a similar number of times, we should do something to avoid repeating connections between words. I think that repeating words far from each other is more unique than repeating them next to each other.
Any time you need a new concatenation, just generate a completely random one, calculate its fitness, and then either accept that concatenation or reject it (probabilistically, that is).
const C = 1.0;
function CreateGoodConcatenation() {
    for (let rejectionCount = 0; ; rejectionCount++) {
        const candidate = CreateRandomConcatenation();
        const fitness = CalculateFitness(candidate); // returns 0 < fitness <= 1
        const r = Math.random(); // uniform in [0, 1)
        // bias toward acceptability as rejectionCount increases
        const adjusted_r = Math.pow(r, C * rejectionCount + 1);
        if (adjusted_r < fitness) {
            return candidate;
        }
    }
}
CalculateFitness should never return zero. If it does, you might find yourself in an infinite loop.
As you increase C, less ideal concatenations are accepted more readily.
As you decrease C, you face increased iterations for each call to CreateGoodConcatenation (plus less entropy in the result)
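For a concrete picture, here is a self-contained Python sketch of the same accept/reject loop. The fitness function is only one possible choice, assumed for illustration: it penalises a candidate for every pair of (position, item) it shares with earlier results, and it can never reach zero. The parts are the ones from the question; all function names are hypothetical.

```python
import random
from itertools import combinations

PARTS = [["0", "1", "2"], ["a", "b", "c"], ["x", "y", "z"]]


def pairs(candidate):
    """All pairs of (position, item) in a candidate, e.g. first-second and first-last."""
    return set(combinations(enumerate(candidate), 2))


def fitness(candidate, used_pairs):
    """1.0 for a fully new candidate, lower for each repeated pair, never zero."""
    repeats = sum(1 for pair in pairs(candidate) if pair in used_pairs)
    return 1.0 / (1.0 + repeats)


def create_good_concatenation(used_pairs, rng, c=1.0):
    """Draw random candidates until one is accepted, then record its pairs."""
    rejection_count = 0
    while True:
        candidate = tuple(rng.choice(part) for part in PARTS)
        r = rng.random()
        # A larger exponent shrinks r, so acceptance gets easier after each rejection.
        if r ** (c * rejection_count + 1) < fitness(candidate, used_pairs):
            used_pairs |= pairs(candidate)
            return "".join(candidate)
        rejection_count += 1


rng = random.Random(0)  # seeded only to make runs repeatable
used = set()
results = [create_good_concatenation(used, rng) for _ in range(5)]
print(results)
```

Because acceptance is probabilistic, repeats are discouraged rather than forbidden, which lets the loop keep producing results after the fully unique combinations run out.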

Resources