Remove duplicate words from an address using Oracle PL/SQL

I need to remove duplicate words from an address using Oracle PL/SQL.
There are two types of addresses, for example:
1. '3 Mayers Court 3 Mayers Court': the total number of words is even, and the second half of the address repeats the first half.
2. 'Manor House Manor' or '1 Briar Cottages 1 Briar': the total number of words is odd, so there is a middle word, and the words to its right repeat the words to its left.
I can do this in application code, but I have no idea how to remove duplicate words in PL/SQL. I've been instructed to do this in a PL/SQL anonymous block or a function. Any help would be appreciated.

If these are the only cases that may appear in your data, you could use the query below. You can put this logic into a function, but a query is faster and simpler.
Nothing fancy here: I'm just taking the second half of the string and checking whether the address starts with it. It works for the given examples, but obviously there may be cases where you need more logic. For instance, if the string contains consecutive spaces, you have to get rid of them first.
demo
select address,
       case when address like sub||'%'
            then substr(address, 1, length(address) - length(sub))
            else address
       end trimmed
  from (select address, trim(substr(address, instr(address, ' ', 1, sn/2 + 1))) sub
          from (select address, regexp_count(address, ' ') sn from t))
Result:
ADDRESS                       TRIMMED
----------------------------- -----------------------------
3 Mayers Court 3 Mayers Court 3 Mayers Court
905 Mayers Street             905 Mayers Street
Manor House Manor             Manor House
1 Briar Cottages 1 Briar      1 Briar Cottages
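
For comparison, the same halving idea can be sketched outside the database. This is only an illustration in Python, not a translation of the query: it compares whole words rather than string prefixes, and the function name `remove_repeated_tail` is made up for the example.

```python
def remove_repeated_tail(address: str) -> str:
    """Drop the trailing words if they repeat the leading words."""
    words = address.split()          # split() also collapses consecutive spaces
    half = len(words) // 2           # for odd counts the middle word is kept
    if half > 0 and words[:half] == words[len(words) - half:]:
        return " ".join(words[:len(words) - half])
    return " ".join(words)


for sample in ["3 Mayers Court 3 Mayers Court",
               "905 Mayers Street",
               "Manor House Manor",
               "1 Briar Cottages 1 Briar"]:
    print(sample, "->", remove_repeated_tail(sample))
```

Like the query, this only handles the two patterns from the question; an address that repeats in some other way is returned unchanged.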

Related

Using multiple OR conditions in ArrayFormula

I have a column of names:
Smith John Sr
Smith John R
Smith Jr John L
Smith III John
Smith John IV
I know the last name is always the first word.
I know the first name is sometimes the second word.
However, sometimes "Jr" or "III" may be there instead.
Otherwise, the title can be the fourth word if it is not the middle initial.
How do I use an ArrayFormula with multiple OR conditions so that I can extract the first name, the last name, and the applicable title (e.g. Jr, Sr, III, IV) into a second column? I want to disregard the initial, which I figured I could detect with a character count of 1 within the OR logic, correct?

If you already have a working formula, you can use + to join conditions where at least one must be met (like OR) and * to join conditions that must all be met (like AND). For example, take this non-array formula:
=IF(OR(A1=1,B1=1),1,0)
As an array formula you could write it as:
=ArrayFormula(IF((A1:A=1)+(B1:B=1),TRUE,FALSE))
Since TRUE is treated as 1 and FALSE as 0, the sum of two conditions is 1 or more when at least one of them is TRUE, and any non-zero result makes IF return its true branch. Hope it's useful.
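
The arithmetic behind this trick is not specific to Sheets. Here is a small Python sketch of the same idea, with made-up column values `a` and `b`: adding boolean conditions behaves like OR, multiplying them behaves like AND.

```python
# Hypothetical column values, one entry per row.
a = [1, 0, 2]
b = [1, 0, 1]

# True counts as 1 and False as 0, so a sum above zero means "at least one is true"
# and a product above zero means "all are true".
or_result = [((x == 1) + (y == 1)) > 0 for x, y in zip(a, b)]
and_result = [((x == 1) * (y == 1)) > 0 for x, y in zip(a, b)]

print(or_result)
print(and_result)
```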

Most common "denominators" in a two column list in Google Sheets

How can I find the most common 'Code' (column B) associated with each unique 'Name' (column A), and find the closest value if the 'Code' in column B is unique?
The image below shows the shared Google Sheet with the starting data in columns A and B and the desired output in columns C and D. Each unique Name has associated codes. Column D displays the most commonly occurring Code for each unique Name. For example, Buick La Sabre 1 has 3 associated codes in B3, B4 and B5, but D3 shows only 98761 because it appears more frequently in B2:B than the other 2 codes do. I will explain what I mean by the closest value below.
Codes with a count of 1 are unique, so the output in column D tries to find the closest match.
However, when the count of a code in B2:B is greater than 1, the output in column D is the most frequent code associated with the Name.
Approach when there are 2 or more of the same value in column B
Query
I thought I might use a QUERY with an ORDER BY count(B) DESC LIMIT 2, in a fashion similar to this working formula:
QUERY($A$1:$D$25,"SELECT A, B ORDER BY B DESC Limit 2",1)
but I could not get it to work when I substituted in the count function.
SORT & INDEX OR VLOOKUP
If the query function can't be fixed to work, then I thought another approach might be to combine a Vlookup/Index after sorting column B in descending order of frequency.
UNIQUE(sort($B$3:$B,if(len($B$3:$B),countif($B$3:$B,$B$3:$B),),0,1,1))
Since a Vlookup or Index using multiple criteria just returns the first matching value it finds, sorting by frequency first means that first match would be the most frequent value.
Approach when there are fewer than 2 of the same value in column B
This is a little more complicated since the values can contain numbers and letters.
A solution like that seen in the image below could be used if everything were a number. In our case the codes are usually 3 - 5 character alphanumeric codes that start with 0 - 1 letters followed by numbers. I'm not sure what the best way to match a code like A1234 would be. I imagine a solution might be to SPLIT off the letters and try to match those first. For example, A1234 would be split into A | 1234, then matched on the closest letter and then on the closest number. But I really am not sure what the best solution might be within the constraints of Google Sheets.
In the event that a number is equidistant between two numbers, the lower number should be chosen. For example, if 8 is the number and the closest match would be 6 or 10, then 6 should be selected.
In the event that a letter is being used, it should work in a similar fashion. For example, thinking of {A, B, C} as {1, 2, 3}, B should preferentially match to A since it comes before C.
In summary, I'm looking for a way to find the most frequent code in column B associated with each unique name in column A in this sheet, and, in the event that none of the codes in B2:B repeat, a formula that will find the closest match for a number or alphanumeric code.
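
The tie-breaking rule described above (nearest value wins, and the lower value wins when two are equally near) can be written down compactly. This is only a Python sketch of the rule, not a Sheets formula; the `closest` helper is made up for the illustration, and it handles plain numbers and single letters but not mixed codes like A1234.

```python
def closest(target, candidates, key=lambda v: v):
    """Return the candidate nearest to target; on a tie, prefer the lower one."""
    return min(candidates, key=lambda c: (abs(key(c) - key(target)), key(c)))


print(closest(8, [6, 10]))               # equidistant, so the lower number is chosen
print(closest("B", ["A", "C"], key=ord))  # letters compared by their character code
```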

You can use this formula:
=QUERY({range of numerators & denominators}, "select Col2, count(Col2) group by Col2 label Col2 'Denominator', count(Col2) 'Count'")
That outputs something like this:
Denominator  Count
Den 1        Count 1
Den 2        Count 2

use:
=ARRAY_CONSTRAIN(SORTN(QUERY({A3:B},
"select Col1,Col2,count(Col2)
where Col1 is not null
group by Col1,Col2
order by count(Col2) desc,Col2 asc
label count(Col2)''"), 9^9, 2, 1, 1), 9^9, 2)
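
To make the intent of this formula easier to follow, here is a rough Python sketch of the same steps: count each (name, code) pair, then keep the most frequent code per name, preferring the lower code on a tie. The sample rows and the function name `most_common_code_per_name` are invented for the illustration; they are not taken from the shared sheet.

```python
from collections import Counter, defaultdict


def most_common_code_per_name(rows):
    """rows: iterable of (name, code). Returns {name: most frequent code}."""
    counts = defaultdict(Counter)
    for name, code in rows:
        if name:                      # mirrors "where Col1 is not null"
            counts[name][code] += 1
    # Highest count first, then lowest code, like
    # "order by count(Col2) desc, Col2 asc".
    return {name: min(c, key=lambda code: (-c[code], code))
            for name, c in counts.items()}


# Hypothetical data for illustration only.
rows = [("Car A", "98761"), ("Car A", "12345"), ("Car A", "98761"),
        ("Car B", "B200"), ("Car B", "A100"), ("", "ignored")]
print(most_common_code_per_name(rows))
```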

Visual Basic Function Procedure

I need help with the following homework problem. I have done everything except the instructions I numbered. Please help!
A furniture manufacturer makes two types of furniture—chairs and sofas.
The cost per chair is $350, the cost per sofa is $925, and the sales tax rate is 5%.
Write a Visual Basic program to create an invoice form for an order.
After the data on the left side of the form are entered, the user can display an invoice in a list box by pressing the Process Order button.
The user can click on the Clear Order Form button to clear all text boxes and the list box, and can click on the Quit button to exit the program.
1. The invoice number consists of the capitalized first two letters of the customer’s last name, followed by the last four digits of the zip code.
2. The customer name is input with the last name first, followed by a comma, a space, and the first name. However, the name is displayed in the invoice in the proper order.
3. The generation of the invoice number and the reordering of the first and last names should be carried out by Function procedures.

Seeing as this is homework and you haven't provided any code to show what effort you have made on your own, I'm not going to provide any specific answers, but I will try to point you in the right direction.
Your first 2 numbered items look to be variations on the same theme: string manipulation. Assuming you have the customer's name and address information from the order form, you just need to write two separate functions that take the name and address, extract the data you need, and return the value (which covers your 3rd item).
To get the parts of the name and address that make up the invoice number, you need to think about using the Left() and Right() functions.
Something like:
Dim first as String, last as String, word as String
word = "Foo"
first = Left(word, 1)
last = Right(word, 1)
Debug.Print(first) 'prints "F"
Debug.Print(last) 'prints "o"
Once you get the parts you need, then you just need to worry about joining the parts together in the order you want. The concatenation operator for strings is &. So using the above example, it would go something like:
Dim concat as String
concat = first & last
Debug.Print(concat) 'prints "Fo"
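
If it helps to see the same idea outside VB, here is a rough Python equivalent. The `left` and `right` helpers are made up to mimic Left() and Right(); Python itself does this with slicing and joins strings with + rather than &.

```python
def left(text, n):
    """First n characters, like VB's Left()."""
    return text[:n]


def right(text, n):
    """Last n characters, like VB's Right()."""
    return text[-n:] if n > 0 else ""


word = "Foo"
concat = left(word, 1) + right(word, 1)
print(concat)
```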
Your final item, using a Function procedure to generate the desired values, is very easily google-able (is that even a word?). The syntax is very simple, so here's a quick example of a common function that is not built into VB6:
Private Function IsOdd(value As Integer) As Boolean
    ' Determine whether value is odd or even by checking whether
    ' dividing it by 2 leaves a remainder (the Mod operator).
    If (value Mod 2) = 0 Then
        IsOdd = False ' remainder is 0, so value is even
    Else
        IsOdd = True ' otherwise value is odd
    End If
End Function
Hopefully this gets you going in the right direction.

Algorithm to create unique random concatenation of items

I'm thinking about an algorithm that will create the X most unique concatenations of Y parts, where each part can be one of several items. For example, with 3 parts:
part #1: 0,1,2
part #2: a,b,c
part #3: x,y,z
And one possible (random) result of 5 concatenations:
0ax
1by
2cz
0bz (note that '0by' would be "less unique" than '0bz' because 'by' has already been used)
2ay (note that 'a' hasn't followed '2' yet, and 'y' hasn't followed 'a' yet)
A simple BAD result for the next concatenation:
1cy ('c' hasn't followed '1', 'y' hasn't followed 'c', BUT '1'-'y' has already been used as first-last)
Simple GOOD next results would be:
0cy ('c' hasn't followed '0', 'y' hasn't followed 'c', and '0'-'y' hasn't been used as first-last)
1az
1cx
I know that this constraint limits the possible results, but once all fully unique possibilities are exhausted, the algorithm should continue and try to keep as much uniqueness as is still available (repeating as little as possible).
Consider real example:
Boy/Girl/Martin
bought/stole/get
bottle/milk/water
And I want results like:
Boy get milk
Martin stole bottle
Girl bought water
Boy bought bottle (not water, because of 'bought+water' and not milk, because of 'Boy+milk')
Maybe start with a tree of all combinations, but how to select most unique trees first?
Edit: With this sample data we can see that building fully unique results from 4 words with 3 possibilities each gives us only 3 results:
Martin stole a bottle
Boy bought an milk
He get hard water
But more results may be requested. So the 4th result should keep as much uniqueness as is available, like Martin bought hard milk, not Martin stole a water.
Edit: A start for a solution?
Imagine each part as a barrel which can be rotated: the last item becomes the first when it rotates down, and the first becomes the last when it rotates up. Now set the barrels like this:
Martin|stole |a   |bottle
Boy   |bought|an  |milk
He    |get   |hard|water
Now read the sentences across as we see them, then rotate the first barrel UP once, the second twice, the third three times, and so on. We get these sentences (note that the third barrel did one full rotation):
Boy   |get   |a   |milk
He    |stole |an  |water
Martin|bought|hard|bottle
That gives us the next solutions. We can repeat the process once more to get more solutions:
He    |bought|a   |water
Martin|get   |an  |bottle
Boy   |stole |hard|milk
The problem is that the first barrel stays connected with the last, because they rotate in parallel.
I'm wondering whether it would be more unique if I rotated the last barrel one more time in the last solution (but then I'd introduce other connections like an-water, although those would be repeated only 2 times, not 3 times like now). I don't know whether "barrels" are a good way of thinking here.
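
The barrel rotation described above is easy to try out in code. This is a minimal Python sketch of exactly that procedure (barrel i is rotated up by i steps each round); the function names are made up for the illustration, and it does nothing to fix the first-last coupling mentioned above.

```python
def rotate_up(barrel, steps):
    """Rotate a barrel up: the first item moves to the end, `steps` times."""
    k = steps % len(barrel)
    return barrel[k:] + barrel[:k]


def next_round(barrels):
    """Rotate the 1st barrel once, the 2nd twice, the 3rd three times, and so on."""
    return [rotate_up(b, i + 1) for i, b in enumerate(barrels)]


def sentences(barrels):
    """Read the barrels row by row."""
    return [" ".join(row) for row in zip(*barrels)]


barrels = [["Martin", "Boy", "He"],
           ["stole", "bought", "get"],
           ["a", "an", "hard"],
           ["bottle", "milk", "water"]]

round1 = next_round(barrels)
round2 = next_round(round1)
for r in (barrels, round1, round2):
    print(sentences(r))
```

With 3 items per barrel, the first and fourth barrels both end up shifted by one position per round, which is why their items always stay paired.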
I think that we should first find a definition of uniqueness.
For example, what makes uniqueness drop? Using a word that was already used? Is repeating 2 words next to each other less unique than repeating a word with a gap of other words in between? So this problem can be subjective.
But I think that over many sequences each word should be used a similar number of times (like selecting a word randomly and removing it from a set, then restoring all the options once every word has been taken so they can be picked again). This is easy to do.
But even if we use each word a similar number of times, we should do something to avoid repeating connections between words. I think that repeating words far from each other is more unique than repeating them next to each other.

Anytime you need a new concatenation, just generate a completely random one, calculate its fitness, and then either accept that concatenation or reject it (probabilistically, that is).
const C = 1.0;

function CreateGoodConcatenation() {
  for (let rejectionCount = 0; ; rejectionCount++) {
    const candidate = CreateRandomConcatenation();
    const fitness = CalculateFitness(candidate); // returns 0 < fitness <= 1
    const r = Math.random(); // uniform in [0, 1)
    // bias toward acceptance as rejectionCount increases
    const adjusted_r = Math.pow(r, C * rejectionCount + 1);
    if (adjusted_r < fitness) {
      return candidate;
    }
  }
}
CalculateFitness should never return zero. If it does, you might find yourself in an infinite loop.
As you increase C, less ideal concatenations are accepted more readily.
As you decrease C, you face increased iterations for each call to CreateGoodConcatenation (plus less entropy in the result)
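
Here is one way the accept/reject loop might look as a runnable Python sketch. The fitness function is an assumption made for the example, not part of the answer above: it scores a candidate as 1 / (1 + number of item pairs already used at the same positions), which includes the first-last pair from the question. All names are illustrative.

```python
import random
from itertools import combinations

C = 1.0


def pairs(candidate):
    """All pairs of (position, item), including non-adjacent ones like first-last."""
    return set(combinations(enumerate(candidate), 2))


def fitness(candidate, seen_pairs):
    """Assumed scoring: 1.0 for no repeated pairs, lower for each repeat, never 0."""
    repeats = len(pairs(candidate) & seen_pairs)
    return 1.0 / (1 + repeats)


def create_good_concatenation(parts, seen_pairs, rng):
    rejections = 0
    while True:
        candidate = tuple(rng.choice(p) for p in parts)
        # Raising r to a growing power pushes it toward 0, so acceptance
        # becomes more likely after each rejection.
        adjusted_r = rng.random() ** (C * rejections + 1)
        if adjusted_r < fitness(candidate, seen_pairs):
            seen_pairs.update(pairs(candidate))
            return candidate
        rejections += 1


parts = [["0", "1", "2"], ["a", "b", "c"], ["x", "y", "z"]]
seen = set()
rng = random.Random(0)
results = ["".join(create_good_concatenation(parts, seen, rng)) for _ in range(5)]
print(results)
```

Because acceptance is probabilistic, a repeated pair can still slip through; a stricter fitness function or a smaller C makes that less likely at the cost of more iterations.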

Free Pascal - Problem solving query (not syntax) - how to approach the next phase of this loop

I have more of a 'problem solving' question than a syntax related problem.
Briefly, I'm creating a program that will read a text file full of words (that may feasibly be a list of passwords), one word per line - I'll be using ReadLn for that bit in a loop. For every word it finds, I want it to add "an amount" of obfuscation in line with how users these days will use '3' instead of 'E' in their passwords, or '1' instead of 'I'. I work in the IT security field and password breaking is often part of it and that's what the program is for.
So far the program generates a LEET table holding many different values for each letter of the alphabet and stacks them in a StringGrid that I can access as part of the process (it is also displayed visually as a table).
type
  TLetters = 'A'..'Z';
  TLeet = array[TLetters] of TStringList;
var
  SourceFileName, str : string;
  StartIndexFile : TextFile;
  i : TLetters;
  leet : TLeet;
  s : string;
  n, o, ColumnSize : integer;
begin
  for i in TLetters do
    leet[ i ] := TStringList.Create;
  // The next sequence of loops populates the string grid by counting the number of characters for each letter of the alphabet and then inserting them down, column by column and row by row...
  // Letter A:
  s := '4 # /-\ /\ ^ aye ∂ ci λ Z';
  ColumnSize := wordcount(s,[' ']);
  o := 0;
  for n := 0 to ColumnSize do
    leet['A'].Add(ExtractWord(n,s,[' ']));
  for o := 0 to ColumnSize do
    StringGrid1.Cells[1,o] := Leet['A'][o];
  // And so on for B - Z
  // ... then an OpenDialog that opens the source text file to read. Got that sorted
  // A load of file opening stuff and then the obfuscation
  repeat
    Readln(StartIndexFile, Str);
    LblProgress.Caption := ('Parsing Index File...please wait');
    // OBFUSCATE THE WORDS HERE TO SOME EXTENT
    // but now I have hit a barrier....
  until(EOF(StartIndexFile));
My problem is this: given the word 'Edward', for example, how do I decide to what level I should obfuscate it? Should just the first letter 'E' be replaced with a '3', and nothing more? Or should the first two letters 'E' and 'd' be replaced with ALL the values in the LEET table for both letters (meaning dozens of new words would be generated from 'Edward', and so on), or all the values for 'E' but nothing else? The list goes on. Potentially, for every word, I could create thousands of additional ones! A 100 GB source file would soon become terabytes!
In other words, I need to set "a level" at which the program will operate, which the user can choose. But I'm not sure how to structure that level.
I didn't really think it through enough before I started. My initial thought was "It would be cool to have a program that would take an index of words from a computer and then generate variations of every word to account for people who obfuscate characters", but having come to code it, I've realised it's a bigger job than I thought, and I am now stuck at the section for actually 'LEETing my input file'!

You could use a level (0-10) as input.
0: replace nothing
10: replace all letters with LEET letters.
Depending on the length of the word, you calculate how many letters to replace, and then replace randomly chosen letters in the word, so that you do not always replace the first letter for level 1, etc.
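
As a rough illustration of the level idea, here is a short Python sketch (not Free Pascal, and not the asker's program). The tiny `LEET` table, the `obfuscate` name, and the rule for turning a level into a count are assumptions for the example; in particular, it scales by the number of replaceable letters rather than by the full word length.

```python
import random

# A deliberately tiny, hypothetical table; a real one would hold many variants per letter.
LEET = {"A": ["4"], "E": ["3"], "I": ["1"]}


def obfuscate(word, level, rng):
    """Replace a share of the replaceable letters: level 0 = none, 10 = all."""
    if not 0 <= level <= 10:
        raise ValueError("level must be between 0 and 10")
    positions = [i for i, ch in enumerate(word) if ch.upper() in LEET]
    count = round(len(positions) * level / 10)
    chars = list(word)
    # Pick random positions so level 1 does not always hit the first letter.
    for i in rng.sample(positions, count):
        chars[i] = rng.choice(LEET[chars[i].upper()])
    return "".join(chars)


rng = random.Random()
print(obfuscate("Edward", 0, rng))
print(obfuscate("Edward", 5, rng))
print(obfuscate("Edward", 10, rng))
```

This produces one variant per word per level, which keeps the output file the same order of size as the input instead of multiplying it by every combination in the table.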
