Linq OrderBy but ignore first word if "the" - linq

I have the following LINQ expression to do my ordering, but I was wondering how to change it so that it orders by name but ignores the first word if it is "the".
CaseStudies.OrderBy(a => a.Name)

Simplest way (if "the" is always lower-case and there is no more than one space between words):
CaseStudies.OrderBy(a => a.Name.StartsWith("the ") ? a.Name.Substring(4) : a.Name)
You can create a method with a nice descriptive name and move this logic there, along with a null check and a case-insensitive comparison:
private string RemoveDefiniteArticle(string s)
{
    if (String.IsNullOrEmpty(s))
        return s;
    if (s.StartsWith("the ", StringComparison.CurrentCultureIgnoreCase))
        return s.Substring(4).TrimStart();
    return s;
}
And use it
CaseStudies.OrderBy(a => RemoveDefiniteArticle(a.Name))

There are a surprising number of edge cases here. Suppose your list is
List<string> strings = new List<string> { "The aardvark", "the bear", "The cat", " dog", " elephant"};
Then the starting point is handling "the" at the start
strings.OrderBy(w => w.StartsWith("the ") ? w.Substring(4) : w);
Which gives:
elephant
dog
the bear
The cat
The aardvark
Ignoring case is better
strings.OrderBy(w => w.StartsWith("the ", StringComparison.CurrentCultureIgnoreCase) ? w.Substring(4) : w);
Giving:
elephant
The cat
dog
The aardvark
the bear
Handling multiple spaces after the leading "the" is even better, but not perfect:
strings.OrderBy(w => w.StartsWith("the ", StringComparison.CurrentCultureIgnoreCase) ? w.Substring(4).TrimStart() : w);
Giving:
elephant
dog
The aardvark
the bear
The cat
Handling leading spaces before the leading "the" looks correct
strings.OrderBy(w => w.TrimStart().StartsWith("the ", StringComparison.CurrentCultureIgnoreCase) ? w.TrimStart().Substring(4).TrimStart() : w.TrimStart());
Gives:
The aardvark
the bear
The cat
dog
elephant
But there may be other edge cases around null/empty/whitespace checking at multiple points...

CaseStudies.OrderBy(a => a.Name.TrimStart().StartsWith("the ", StringComparison.CurrentCultureIgnoreCase) ? a.Name.TrimStart().Substring(4).TrimStart() : a.Name)

Related

Strange behaviour corrupting some part of a string but only particular chars, Atom, RSpec

I am struggling with an odd situation. I have a function which iterates over an array of strings and splits each string on "is", which I am testing in RSpec like so:
test body
info_combo = ["pish pish Iron is 3910 Credits","glob prok Gold is 57800 Credits"]
expect(interpreter.solveForUnknownInfo(info_combo)).to eq some_final_expectable_object
function
def getSubjectsAndObjects(info_combo)
  subjects = []
  objects = []
  info_combo.each do |info_str|
    print info_str
    subjectsAndObjects = info_str.split("is")
    print subjectsAndObjects
    subjects << subjectsAndObjects[0]
    objects << subjectsAndObjects[1]
  end
  return subjects, objects
end
printed output while debugging
"pish pish Iron is 3910 Credits" => first iteration input
["p", "h p", "h Iron ", " 3910 Credits"] => crazy unexpected
"glob prok Gold is 57800 Credits" => second iteration input
["glob prok Gold ", " 57800 Credits"] => expectable output
## after replacing the first 'pish' in the first input string with 'another_random_word' ...
"another_random_word pish Iron is 3910 Credits" => first iteration input
["another_random_word p", "h Iron ", " 3910 Credits"] =>some hopeful change
"glob prok Gold is 57800 Credits" => second iteration input
["glob prok Gold ", " 57800 Credits"] => expectable output
## after replacing the final 'pish' with 'another_random_word'
"another_random_word another_random_word Iron is 3910 Credits" => first iteration input
["another_random_word another_random_word Iron ", " 3910 Credits"] => now totally expectable/desired output from function
"glob prok Gold is 57800 Credits" => second iteration input
["glob prok Gold ", " 57800 Credits"] => expectable output
This is really confusing for me. I have no idea how to debug this or ideas of what might be going wrong. I thought it was a text editor glitch (Atom), have restarted the program and no changes.
Something I've missed? Any ideas? Also ideas on improving the question/title are very welcome.
You've missed something fairly straightforward: the middle two characters of "pish" are "is". So of course, if you split on "is", that gets split into "p" and "h".
There are a couple of ways around this. The simplest, in your case, is probably to split on " is " (that is, "is" with a space on each side). Depending on exact needs, you might instead split on regular expressions such as /\sis\s/ ("is" with some sort of whitespace on either side, could be space, tab, etc) or /\bis\b/ ("is" with a word boundary on either side - in this case, the "is" can't be in the middle of the word, but the surrounding whitespace isn't actually part of the match, so it's not removed from the string).
"his is hers".split(/\sis\s/) # => ["his", "hers"]
"his is hers".split(/\bis\b/) # => ["his ", " hers"]
Note that in the first case, the spaces are part of the delimiter and are removed along with it, but in the second case, they are not part of the delimiter, and are not removed.
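As a rough sketch of the simplest fix applied to the data from the question, the method below splits on " is " (with a space on each side). The method name split_subjects_and_objects and the limit of 2 passed to split are illustrative choices, not part of the original code; the limit just keeps any later " is " inside the object part.

```ruby
# Hypothetical rewrite of the question's method: split on " is " rather than "is".
def split_subjects_and_objects(info_combo)
  subjects = []
  objects = []
  info_combo.each do |info_str|
    # The limit of 2 means only the first " is " acts as the delimiter.
    subject, object = info_str.split(" is ", 2)
    subjects << subject
    objects << object
  end
  [subjects, objects]
end

info_combo = ["pish pish Iron is 3910 Credits", "glob prok Gold is 57800 Credits"]
subjects, objects = split_subjects_and_objects(info_combo)
p subjects # the "is" inside "pish" is no longer treated as a delimiter
p objects
```

Because the spaces are part of the delimiter here, the pieces come back without the stray leading and trailing spaces seen in the debugging output.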

how to detect all caps word in a string

I am new to Java. I have a text file containing different words on each line, and I want to read that file as a string in order to detect whether certain words are written in all caps (abbreviations). The exception is that a word starting with "#" should not be counted. For example, I have:
OMG terry is cute #HAWT SMH
The result will be:
Abbreviations = 2.
or
terry likes TGIF parties #ANDERSON
The result will be:
Abbreviations = 1.
Please help
Try using the .split(String regex) and .contains(CharSequence s) methods.
I think they will help you a lot.
Function split:
http://www.tutorialspoint.com/java/java_string_split.htm
Function contains:
http://www.tutorialspoint.com/java/lang/string_contains.htm
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str1 = "OMG terry is cute #HAWT SMH";
String str2 = "terry likes TGIF parties #ANDERSON";
// one or more capital letters with whitespace directly before and after
Pattern p = Pattern.compile("(?<=\\s)([A-Z]+)(?=\\s)");
// pay attention! a space is added before and after the input so that
// candidates at the beginning/end of the sentence are caught too
Matcher matcher = p.matcher(" " + str1 + " ");
int i = 0;
while (matcher.find()) {
    i++; // count how many matches were found
}
System.out.println("matches: " + i); // prints 2
matcher = p.matcher(" " + str2 + " ");
i = 0;
while (matcher.find()) {
    i++;
}
System.out.println("matches: " + i); // prints 1
OUTPUT:
matches: 2
matches: 1
Here is a bazooka for your spider problem.
(mystring+" ").split("(?<![#A-Z])[A-Z]{2,}").length-1;
Pad the string with a space (because .split removes trailing empty strings).
Split on the pattern "the character behind this is neither # nor a capital letter, and this is two or more capital letters". Excluding a preceding capital letter stops the pattern from matching part-way through a word such as #HAWT. This returns an array of the substrings of the string that are not part of abbreviations.
Take the length of the array and subtract 1.
Example:
String mystring = "OMG terry is cute #HAWT SMH";
String[] arr = (mystring+" ").split("(?<![#A-Z])[A-Z]{2,}");
//arr is now {"", " terry is cute #HAWT ", " "}, three strings
return arr.length-1; //returns 2

Ruby diff two strings and make an array of the parts that are the same

With Ruby, how can I get the diff between two strings, then use the identical parts as a base to split the rest?
For example, I have two strings (Not all strings will have this formatting):
String1 = "Computer: Person1, Title: King, Phone: 555-1212"
String2 = "Computer: PersonB, Title: Queen, Phone: 123-4567"
I would like to be able to compare (diff) the two strings so that I get the result:
["Computer: ",", Title:",", Phone:"]
then use this to reparse the original strings to get:
["Person1","King","555-1212"] and ["PersonB","Queen","123-4567"]
which I could label in db/storage with the former array.
Are there features to do this and how would I achieve these results?
The aim is to not need any prior knowledge of the formatting. This way just the data are analyzed for patterning and then divided as such. It may be comma delimited, new lines, spaced out, etc.
I am looking at gem "diffy" and "diff-lcs" to see if they might help split this up.
I think all you need is a hash; with a hash you can do anything fancy.
>> require 'json'
>> String1 = "Computer: Person1, Title: King, Phone: 555-1212"
>> a = String1.gsub(/[^\s:,]+/) { |w| "\"#{w}\"" }
>> a.insert(0, "{")
>> a.insert(-1, "}")
>> a1 = JSON.parse(a)
>> #=> {
"Computer" => "Person1",
"Title" => "King",
"Phone" => "555-1212"
}
Then you can request what you want in question, like
>> a1["Computer"]
>> #=> "Person1"
And you can abstract it further into a method:
def str_to_hash(str)
  output = str.gsub(/[^\s:,]+/) { |w| "\"#{w}\"" }
  output.insert(0, "{").insert(-1, "}")
  JSON.parse(output)
end
>> h2 = str_to_hash(String2)
>> h2["Computer"]
>> #=>"PersonB"
String1 = "Computer: Person1, Title: King, Phone: 555-1212"
String2 = "Computer: PersonB, Title: Queen, Phone: 123-4567"
keys = String1.split - (String1.split - String2.split)
values = String1.split - keys
You need to find a suitable way to split for your specific data. For instance, if values are allowed to contain spaces inside double quotes, you can do something like .split(/"?[^ ]*\ ?[^ ]*"?/), but there is no general solution that will handle any type of data.
And then you need to clean up the resulting values.
Given those strings, I would rather split the columns on ",", then use the part before ":" as the name of each column.
There is also the longest common subsequence problem, which is related, but it is not smart enough to handle the semantics of the data.
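The suggestion of splitting columns on "," and taking the part before ":" as the column name might be sketched as follows. The method name parse_fields is made up for illustration, and the sketch assumes every column contains a ":" and that values contain no commas, which does give up the "no prior knowledge of formatting" goal from the question.

```ruby
# Hypothetical helper: turn "Name: value, Name: value" into a hash.
# Assumes each comma-separated column contains a colon.
def parse_fields(line)
  line.split(",").map do |column|
    # Split on the first colon only, then trim surrounding whitespace.
    column.split(":", 2).map(&:strip)
  end.to_h
end

string1 = "Computer: Person1, Title: King, Phone: 555-1212"
fields = parse_fields(string1)
p fields.keys   # column names
p fields.values # column values
```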
s1 = String1.split(' ')
s2 = String2.split(' ')
s1 - s2
=> ["Person1,", "King,", "555-1212"]
s2 - s1
=> ["PersonB,", "Queen,", "123-4567"]
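Putting the array-difference idea together with the clean-up step could look roughly like this. It is only a sketch under the assumption that labels and values are separated by whitespace and that the only clean-up needed is removing a trailing comma; the method name split_labels_and_values is hypothetical.

```ruby
# Hypothetical helper: tokens shared by both strings are treated as labels,
# and whatever is left over in each string is treated as that string's values.
def split_labels_and_values(a, b)
  tokens_a = a.split
  tokens_b = b.split
  labels = tokens_a & tokens_b # tokens common to both, in order of first appearance
  # Drop the labels, then strip a trailing comma from each remaining token.
  clean = ->(tokens) { (tokens - labels).map { |t| t.chomp(",") } }
  [labels, clean.call(tokens_a), clean.call(tokens_b)]
end

string1 = "Computer: Person1, Title: King, Phone: 555-1212"
string2 = "Computer: PersonB, Title: Queen, Phone: 123-4567"
labels, values1, values2 = split_labels_and_values(string1, string2)
p labels
p values1
p values2
```

If two records happen to share a value (for example the same title), that value would be mistaken for a label, so this only works when the records differ in every field.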

Linq search string word separate string

I have a list of strings. I want to search it and return a new list. The search value is a string (a sentence) whose words are separated by spaces.
So I am looking for a way to find every string that contains each word of the sentence.
sample :
list = {"abcdef", "abc", "ab", "cd ab"}
search "ab" => return list with "abcdef", "abc", "ab", "cd ab"
search "abc" => return list with "abcdef", "abc"
search "ab cd" => return list with "abcdef","cd ab"
It's simple, but I don't know how to do it with Linq in a single command. Something like
if l.contains(list)
where contains checks every element of the list.
It may be simple; I am just asking how. Or maybe there is a link to another post that I have not seen.
Thank you
if (list.Any(w => w.Contains(something)))
The tricky one is your last case of "ab cd".
var list = new List<string> {"abcdef", "abc", "ab", "cd ab"};
var result = list.Where (w => "ab cd".Split().All (s => w.Contains(s)));

Ruby - Abbreviating a string containing a name to first name last initial

A fairly simple question: I need to take a string containing, for example, "Bob Smith" and return "Bob S." - or "Javier de Luca" and return "Javier de L.". In other words, abbreviate the last word in a string to just its first initial and add a period.
Here's what I have - it works, but it seems clumsy.
str = str.split(' ')
str[str.length - 1] = "#{str.last[0]}."
str = str.join(' ')
Surely, there's a more elegant way.
>> "Bob Smith".sub(/(.+\b.).+\z/, '\1.')
=> "Bob S."
>> "Javier de Luca".sub(/(.+\b.).+\z/, '\1.')
=> "Javier de L."
This regular expression captures everything up to and including the first character of the last word. It then replaces the whole match with the capture plus a period, using the replacement '\1.'.
What about this:
name = 'Javier de Luca'
name.sub!(/(\w)\w+$/, '\1.')
You could use tap in 1.9:
str = str.split(/\s+/).tap { |a| a[-1].sub!(/(.).+/) { "#{$1}." } }.join(' ')
Using a[-1].sub! modifies the last element in place, so the tap block modifies a as well as passing it through to the join call. The .+ leaves unusual names like Joe B alone; if you want that to become Joe B., then use .* instead of .+.
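For comparison, a plain-method version without regular expressions might look like the sketch below. The method name abbreviate_last_word is made up for illustration, and leaving single-word names untouched is an assumption about the desired behaviour, not something the question specifies.

```ruby
# Hypothetical helper: replace the last word with its initial plus a period.
def abbreviate_last_word(name)
  words = name.split
  # Assumption: a name with fewer than two words is returned unchanged.
  return name if words.size < 2
  words[-1] = "#{words[-1][0]}."
  words.join(" ")
end

puts abbreviate_last_word("Bob Smith")
puts abbreviate_last_word("Javier de Luca")
```

Unlike the .+ variant above, this one turns Joe B into Joe B., and it also collapses runs of whitespace between words into single spaces.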
