Logic for parsing names

Logic for parsing names - algorithm

I am wanting to solve this problem, but am kind of unsure how to correctly structure the logic for doing this. I am given a list of user names and I am told to find an extracted name for that. So, for example, I'll see a list of user names such as this:
jason
dooley
smith
rob.smith
kristi.bailey
kristi.betty.bailey
kristi.b.bailey
robertvolk
robvolk
k.b.dula
kristidula
kristibettydula
kristibdula
kdula
kbdula
alexanderson
caesardv
joseluis.lopez
jbpritzker
jean-luc.vey
dvandewal
malami
jgarciathome
christophertroethlisberger
How can I then turn each user name into an extracted name? The only parameter I am given is that every user name is guaranteed to have at least a partial person's name.
So for example, kristi.bailey would be turned into "Kristi Bailey"
alexanderson would be turned into "Alex Anderson"
So, the pattern I see is that, if I see a period I will turn that into two strings (possibly a first and last name). If I see three periods then it will be first, middle. The problem I am having trouble finding the logic for is when the name is just clumped up together like alexanderson or jgarciathome. How can I turn that into an extracted name? I was thinking of doing something like if I see 2 consonants and a vowel in a row I would separate the names, but I don't think that'll work.
Any ideas?

I'd use a string.StartsWith method and a string.EndsWith method and determine the maximum overlap on each. As long as it's more than 2 characters, call that the common name. Sort them into buckets based on the common name. It's a naive implementation, but it that's where I'd start.
Example:
string name1 = "kristi.bailey";
string name2 = "kristi.betty.bailey";
// We've got a 6 character overlap for first name:
name2.StartsWith(name1.Substring(0,6)) // this is true
// We've got a 6 character overlap for last name:
name2.EndsWith(name1.Substring(7)) // this is true
HTH!

Related

My flow fails for no reason: Invalid Template Language? This is what I do

Team,
Occasionally my flow fails and its enough test it manually to running again. However, I want to avoid that this error ocurrs again to stay in calm.
The error that appears is this:
Unable to process template language expressions in action 'Periodo' inputs at line '0' and column '0': 'The template language function 'split' expects its first parameter to be of type string. The provided value is of type 'Null'. Please see https://aka.ms/logicexpressions#split for usage details.'.
And it appears in 2 of the 4 variables that I create:
Client and Periodo
The variable Clientlooks this:
The same scenario to "Periodo".
The variables are build in the same way:
His formula:
trim(first(split(first(skip(split(outputs('Compos'),'client = '),1)),'indicator')))
His formula:
trim(first(split(first(skip(split(outputs('Compos'),'period = '),1)),'DATA_REPORT_DELIVERY')))
The same scenario to the 4 variables. 4 of them strings (numbers).
Also I attached email example where I extract the info:
CO NIV ICE REFRESCOS DE SOYA has finished successfully.CO NIV ICE REFRESCOS DE SOYA
User
binary.struggle#mail.com
Parameters
output = 7
country = 170
period = 202204012
DATA_REPORT_DELIVERY = NO
read_persistance = YES
write_persistance = YES
client = 18277
indicator_group = SALES
Could you give some help? I reach some attepmpts succeded but it fails for no apparent reason:
Thank you.

I'm not sure if you're interested but I'd do it a slightly different way. It's a little more verbose but it will work and it makes your expressions a lot simpler.
I've just taken two of your desired outputs and provided a solution for those, one being client and the other being country. You can apply the other two as need be given it's the same pattern.
If I take client for example, this is the concept.
Initialize Data
This is your string that you provided in your question.
Initialize Split Lines
This will split up your string for each new line. The expression for this step is ...
split(variables('Data'), '\n')
However, you can't just enter that expression into the editor, you need to do it and then edit in in code view and change it from \\n to \n.
Filter For 'client'
This will filter the array created from the split line step and find the item that contains the word client.
`contains(item(), 'client')`
On the other parallel branches, you'd change out the word to whatever you're searching for, e.g. country.
This should give us a single item array with a string.
Initialize 'client'
Finally, we want to extract the value on the right hand side of the equals sign. The expression for this is ...
trim(split(body('Filter_For_''client''')[0], '=')[1])
Again, just change out the body name for the other action in each case.
I need to put body('Filter_For_''client''')[0] and specify the first item in an array because the filter step returns an array. We're going to assume the length is always 1.
Result
You can see from all of that, you have the value as need be. Like I said, it's a little more verbose but (I think) easier to follow and troubleshoot if something goes wrong.

Algorithm for translating MLB play-by-play records into descriptive text

I'm trying to collect a dataset that could be used for automatically generating baseball articles.
I have play-by-play records of MLB games from retrosheet.org that I would like to be written out to plain text, as those that could possibly appear as part of a recap news article.
Here are some examples of the play-by-play records:
play,2,0,semim001,32,.CBFFFBBX,9/F
play,2,0,phegj001,01,FX,S7/G
play,2,0,martn003,01,CX,3/G
play,2,1,youne003,00,,NP
The following is what I would like to achieve:
For the first example
play,2,0,semim001,32,.CBFFFBBX,9/F,
I want it to be written out as something like:
"semim001 (Marcus Semien) was on three balls and two strikes in the second inning as the away player. He hit the ball into play after one called strike, one ball, three fouls, and another two balls. The fly ball was caught by the right outfielder."
The plays are formatted in the following way:
The first field is the inning, an integer starting at 1.
The second field is either 0 (for visiting team) or 1 (for home team).
The third field is the Retrosheet player id of the player at the plate.
The fourth field is the count on the batter when this particular event (play) occurred. Most Retrosheet games do not have this information, and in such cases, "??" appears in this field.
The fifth field is of variable length and contains all pitches to this batter in this plate appearance and is described below. If pitches are unknown, this field is left empty, nothing is between the commas.
The sixth field describes the play or event that occurred.
Explanations for all the symbols in the fifth and sixth field can be found on this Retrosheet page.
With Python 3, I've been able to format all the info of invariable length into a formatted sentence, which is all but the last two fields. I'm having difficulty in thinking of an efficient way to unparse (correct me if this is the wrong term to use here) the fifth and sixth fields, the pitches and the events that occurred, due to their variable length and wide variety of things that can occur.
I think I could write out all the rules based on the info on the Retrosheet website, but I'm looking for suggestions for a smarter way to do this. I wrote natural language processing as tags, hoping this could be a trivial problem in that field. Any pointers will be greatly appreciated!

Comparing two files in Ruby with different data types

I had an interview today and wanted input on how you would solve this issue that came up. I answered the question, but in my mind I was thinking there is a better way.
Here is the scenario. You have two files that you need to compare. In the first file you have a list in string format of NFL team abbreviations for example:
ARI
CHIC
GB
NYG
DET
WASH
PHL
PITT
STL
SF
CLEV
IND
DAL
KC
In the second file you would have the following information in a hash or json for example:
"data":
{"description": name: "CLEV","totfd":26,"totyds":396,"pyds":282,"ryds":114,"pen":4,"penyds":24,
"trnovr":0,"pt":4,"ptyds":163,"ptavg":36,"top":"37:05"}},"players":null}
How would you take the strings in the first file (the abbreviations) and see if that abbreviation was included somewhere in the data of the second file? So, for example I want to see if CLEV, ARI, WASH, so on would be anywhere in the second file. If that abbreviation is included I would want to extract information based on that abbreviation.
Here was my answer:
I would iterate over each abbreviation looking for that specific abbreviation inside the second file.
I felt my answer was poor, but I wanted to see if others had a good idea on what they would do.
thanks
Mike Riley

You should ask questions in your interview. Some questions I'd ask:
Will the hash/json include duplicate data for teams? Meaning, will CLEV have multiple records in there? If not, now you know you have unique data so there's no need to group anything ahead of time.
If it's not unique, I'd get a list of all the names that exist in the hash, so you can do a comparison between the array given and the other file.
This is in O(n) for the traversal + O(logN) for the value lookup:
hash = [{'description': 'some team', 'name': 'CLEV','totfd':26,'totyds':396,'pyds':282 },
{'description': 'some team', 'name': 'PHL','totfd':26,'totyds':396,'pyds':282 }]
hash_names = hash.map { |team| team[:name] }
Now that we have a list of names in the hash, we can find out where there is an overlap. We can add the two arrays together and figure out who shows up in there more than once. There are many ways to do that, but we should keep with our run time of O(n):
list = ["ARI","CHIC","GB","NYG","DET","WASH","PHL","PITT","STL","SF","CLEV","IND","DAL"]
teams_in_both = (list + hash_names).group_by { |team| team }.keep_if { |_, occ| occ.size > 1 }.map(&:first)
Now we have a list of:
["PHL", "CLEV"]
We know enough to say who's important to us and can fetch the remaining data accordingly.

Parsing one large array into several sub-arrays

I have a list of adjectives (found here), that I would like to be the basis for a "random_adjective(category)" method.
I'm really just taking a stab at this, as my first real attempt at a useful program.
Step 1: Open file, remove formatting. No problem.
list=File.read('adjectivelist')
list.gsub(/\n/, " ")
The next step is to break the string up by category..
list.split(" ")
Now I have an array of every word in the file. Neat. The ones with a tilde before them represent the category names.
Now I would like to break up this LARGE array into several smaller ones, based on category.
I need help with the syntax here, although the pseudocode for this would be something like
Scan the array for an element which begins with a tilde.
Now create a new array based on the name of that element sans the tilde, and ALSO place this "category name" into the "categories" array. Now pull all the elements from the main array, and pop them into the sub-array, until you meet another tilde. Then repeat the process until there are no more elements in the array.
Finally I would pull a random word from the category named in the parameter. If there was no category name matching the parameter, it would return false and exit (this is simply in case I want to add more categories later.)
Tips would be appreciated

You may want to go back and split first time around like this:
categories = list.split(" ~")
Then each list item will start with the category name. This will save you having to go back through your data structure as you suggest. Consider that a tip: sometimes it's better to re-think the start of a coding problem than to head inexorably forwards
The structure you are reaching towards is probably a Hash, where the keys are category names, and the values are arrays of all the matching adjectives. It might look like this:
{
'category' => [ 'word1', 'word2', 'word3' ]
}
So you might do this:
words_in_category = Hash.new
categories.each do |category_string|
cat_name, *words = category_string.split(" ")
words_in_category[cat_name] = words
end
Finally, to pick a random element from an array, Ruby provides a very useful method sample, so you can just do this
words_in_category[ chosen_category ].sample
. . . assuming chosen_category contains the string name of an actual category. I'll leave it to you to figure out how to put this all together and handle errors, bad input etc

Use slice_before:
categories = list.split(" ").slice_before(/~\w+/)
This will create an sub array for each word starting with ~, containing all words before the next matching word.

If this file format is your original and you have freedom to change it, then I recommend you save the data as yaml or json format and read it when needed. There are libraries to do this. That is all. No worry about the mess. Don't spend time reinventing the wheel.

Parsing text files in Ruby when the content isn't well formed

I'm trying to read files and create a hashmap of the contents, but I'm having trouble at the parsing step. An example of the text file is
put 3
returns 3
between
3
pargraphs 1
4
3
#foo 18
****** 2
The word becomes the key and the number is the value. Notice that the spacing is fairly erratic. The word isn't always a word (which doesn't get picked up by /\w+/) and the number associated with that word isn't always on the same line. This is why I'm calling it not well-formed. If there were one word and one number on one line, I could just split it, but unfortunately, this isn't the case. I'm trying to create a hashmap like this.
{"put"=>3, "#foo"=>18, "returns"=>3, "paragraphs"=>1, "******"=>2, "4"=>3, "between"=>3}
Coming from Java, it's fairly easy. Using Scanner I could just use scanner.next() for the next key and scanner.nextInt() for the number associated with it. I'm not quite sure how to do this in Ruby when it seems I have to use regular expressions for everything.

I'd recommend just using split, as in:
h = Hash[*s.split]
where s is your text (eg s = open('filename').read. Believe it or not, this will give you precisely what you're after.
EDIT: I realized you wanted the values as integers. You can add that as follows:
h.each{|k,v| h[k] = v.to_i}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio