I'm trying to compare these two strings:
string1 = 'client 90 llc'
string2 = 'client 501 llc'
using:
expect(client[i]).toBeGreaterThan(client[i-1]);
This FAILS because the comparison thinks 90 is greater than 501, I'm assuming because it's doing a character-by-character comparison. Is there a way to make it compare the whole number? The web application lists string2 after string1 because 501 is bigger than 90.
UPDATE: there is no particular format for these strings. It can be
'client abc'
'client 90 llc'
'client 501 llc'
'abcclient'
'client111'
'client 22'
'33client'
If you know the format of the string, you can extract values with the help of regular expressions. In your case you want to extract a varying number from the middle of the string, surrounded by parts that are always the same. The following regular expression might work:
/^client (\d+) llc$/
^ - beginning of the string
() - capture specific group of characters
\d - represents a digit (0-9); the backslash is required because \d is an escape sequence, and without it the pattern would match the letter d
+ - the preceding token may appear 1 or more times
$ - end of the string
As a result, we'll be able to find a group of digits in the middle of the string. You can create a utility function to extract the value:
function extractNumber(string) {
var pattern = /^client (\d+) llc$/;
var match = string.match(pattern);
if (match !== null) {
// match[0] holds the match of the entire pattern;
// match[1] holds the value of the group (\d+), converted here to a number
return Number(match[1]);
}
return null; // unable to extract a number
}
And use it in tests:
var number1 = extractNumber(string1); // 90
var number2 = extractNumber(string2); // 501
expect(number2).toBeGreaterThan(number1); // 501 > 90
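Yes, the comparison is character-based: toBeGreaterThan compares the two values with JavaScript's > operator, and for strings that means lexicographic (character-by-character) order, so '9' > '5' decides the result. One way around this is to extract the numbers from the strings and compare only those, as shown below: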
var string1 = 'client 90 llc';
var string2 = 'client 501 llc';
var number1 = parseInt(string1.substring(string1.indexOf(' '), string1.lastIndexOf(' ')), 10);
var number2 = parseInt(string2.substring(string2.indexOf(' '), string2.lastIndexOf(' ')), 10);
expect(number2).toBeGreaterThan(number1); // 501 > 90, so this passes
I am assuming that your strings always follow the pattern shown in your snippet. You could also use a regular expression in place of the substring() calls to get the same result. Hope this helps.
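Both answers assume the fixed "client N llc" format. For the free-form strings in the UPDATE, a common technique is natural ordering: split each string into runs of digits and non-digits, and compare digit runs as numbers. The sketch below is in Ruby, the language of the other threads on this page, so it would need porting to JavaScript for a Jasmine spec. natural_key is a hypothetical helper, and it assumes the web application orders names this way (case-sensitive, leading zeros ignored).

```ruby
# Hypothetical helper: a sort key for "natural" ordering.
# Each run of digits becomes [0, Integer]; each run of other characters [1, String].
# The leading tag means an Integer is never compared directly with a String.
def natural_key(str)
  str.scan(/\d+|\D+/).map { |run| run.match?(/\A\d/) ? [0, run.to_i] : [1, run] }
end

names = ['client abc', 'client 90 llc', 'client 501 llc', 'abcclient',
         'client111', 'client 22', '33client']

sorted = names.sort_by { |name| natural_key(name) }
puts sorted
```

For the pair in the question, (natural_key('client 90 llc') <=> natural_key('client 501 llc')) is -1, so "client 90 llc" sorts before "client 501 llc".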
Related
I have large integers (typically 15-30 digits) stored as strings that represent a certain amount of a given currency (such as ETH). Stored with each one is the number of places to move the decimal point.
{
"base_price"=>"5000000000000000000",
"decimals"=>18
}
The output I'm ultimately looking for is 5.00 (which is what you'd get if you took the decimal point of 5000000000000000000 and moved it 18 positions to the left).
How would I do that in Ruby?
Given:
my_map = {
"base_price"=>"5000000000000000000",
"decimals"=>18
}
You could use:
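# Float division: fine for display here, but a Float only holds about 15-17 significant digits
my_number = my_map["base_price"].to_i / (10**my_map["decimals"]).to_f
puts(format("%.2f", my_number)) # 5.00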
h = { "base_price"=>"5000000000000000000", "decimals"=>18 }
bef, aft = h["base_price"].split(/(?=\d{#{h["decimals"]}}\z)/)
#=> ["5", "000000000000000000"]
bef + '.' + aft[0,2]
#=> "5.00"
The regular expression uses the positive lookahead (?=\d{18}\z) to split the string at a ("zero-width") location between digits such that 18 digits follow to the end of the string.
Alternatively, one could write:
str = h["base_price"][0, h["base_price"].size-h["decimals"]+2]
#=> h["base_price"][0, 3]
#=> "500"
str.insert(str.size-2, '.')
#=> "5.00"
Neither of these addresses potential boundary cases such as
{ "base_price"=>"500", "decimals"=>1 }
or
{ "base_price"=>"500", "decimals"=>4 }
Nor do they consider rounding issues.
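One way to handle both the boundary cases and rounding is to stay in Integer arithmetic, which Ruby does at arbitrary precision, so nothing is lost to a Float conversion. This is a sketch, not taken from the answers above: shift_decimal is a hypothetical helper that assumes a non-negative amount, at least one output decimal place, and round-half-up.

```ruby
# Hypothetical helper: move the decimal point of an integer string `decimals`
# places to the left and format it with `places` digits after the point.
# Assumes a non-negative amount and places >= 1; rounds half up.
def shift_decimal(digits, decimals, places = 2)
  value = Integer(digits, 10)   # arbitrary-precision Integer, no Float involved
  divisor = 10**decimals
  # value / 10**decimals, scaled by 10**places and rounded half up
  scaled = (value * 10**places + divisor / 2) / divisor
  whole, frac = scaled.divmod(10**places)
  "#{whole}.#{frac.to_s.rjust(places, '0')}"
end

h = { "base_price" => "5000000000000000000", "decimals" => 18 }
puts shift_decimal(h["base_price"], h["decimals"])  # 5.00
puts shift_decimal("500", 1)                         # 50.00
puts shift_decimal("500", 4)                         # 0.05
```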
Regular expressions and interpolation?
my_map = {
"base_price"=>"5000000000000000000",
"decimals"=>18
}
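my_map["base_price"].sub(/(0{#{my_map["decimals"]}})\s*$/) { ".#{$1}" }
The number of decimal places is interpolated into the regular expression as the count of zeroes to look for at the end of the string (plus zero or more whitespace characters). That match is replaced by the captured zeroes with a . in front of them. Note the block form: with a plain replacement string such as ".#{$1}", the interpolation happens before sub runs, so $1 would still hold whatever an earlier match left there. Also, this only works when the trailing digits are all zeroes.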
Producing:
=> "5.000000000000000000"
I want to extract from a table all rows where a string column contains at least one word that starts with a specified character.
Example:
Row 1: 'this is the first row'
Row 2: 'this is th second row'
Row 3: 'this is the third row'
If the specified character is T -> I would extract all 3 rows
If the specified character is S -> I would extract only the second row
...
Please help me
Assuming that by "word" you mean a space-delimited sequence of characters (from the start of the string to a space, between two spaces, or from a space to the end), you can split on the delimiter and test each word for a match:
using System;
using System.Linq;

var src = new[] {
"this is the first row",
"this is th second row",
"this is the third row"
};
var findChar = 'S';
var lowerFindChar = char.ToLower(findChar);
var matches = src.Where(s => s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Any(w => char.ToLower(w[0]) == lowerFindChar));
The LINQ Enumerable.Any method tests a sequence to see if any element matches, so you can split each string into a sequence of words and see if any word begins with the desired letter, ignoring case.
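Try this:
rows.Where(r => Regex.IsMatch(r, "(^| )[Tt]"))
You can replace Tt with Ss (both patterns assume you want to match either upper or lower case). The (^| ) part also catches a word at the very start of the row, which a plain " [Tt]" would miss; Regex is in the System.Text.RegularExpressions namespace.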
The problem of course is, what is a "word"?
Is the character sequence 'word' in the sentence above a word according to your definition? It isn't preceded by a space, or by any other white-space character.
A definition of a word could be:
Define wordCharacter: something like A-Z, a-z.
Define word:
- the non-empty sequence of wordCharacters at the beginning of a string followed by a non-wordcharacter
- or the non-empty sequence of wordCharacters at the end of a string preceded by a non-wordcharacter
- or any non-empty sequence of wordCharacters in the string preceded and followed by a non-wordcharacter
Define start of word: the first character of a word.
String: "Some strange characters: 'A', 9, äll, B9 C$ X?"
- Words: Some, strange, characters, A
- Not words: 9, äll, B9, C$, X?
So you first have to specify precisely what you mean by word, then you can define functions.
I'll write it as an extension method on string, so usage will look similar to LINQ. See Extension Methods Demystified.
using System;
using System.Collections.Generic;

static class WordExtensions
{
    // TODO: implement your own definition of a word character; A-Z, a-z is used here
    static bool IsWordCharacter(char c) => (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');

    public static IEnumerable<string> SplitIntoWords(this string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        int startIndex = 0;
        while (startIndex != text.Length)
        {   // not at end of string. Find the beginning of the next word:
            while (startIndex < text.Length && !IsWordCharacter(text[startIndex]))
            {
                ++startIndex;
            }
            // now startIndex points to the first character of the next word
            // or to the end of the text
            if (startIndex != text.Length)
            {   // found the beginning of a word.
                // the first character after the word is either the first non-word character,
                // or the end of the string
                int indexAfterWord = startIndex + 1;
                while (indexAfterWord < text.Length && IsWordCharacter(text[indexAfterWord]))
                {
                    ++indexAfterWord;
                }
                // all characters from startIndex to indexAfterWord-1 are word characters,
                // so together they form a word
                int wordLength = indexAfterWord - startIndex;
                yield return text.Substring(startIndex, wordLength);
                // continue searching after this word
                startIndex = indexAfterWord;
            }
        }
    }
}
Now that you've got a procedure to split any string into your definition of words, your query will be simple:
IEnumerable<string> texts = ...
char specifiedChar = 'T';
// keep only those texts that have at least one word that starts with specifiedChar:
var textsWithWordThatStartsWithSpecifiedChar = texts
// split the text into words
// keep only the words that start with specifiedChar
// if there is such a word: keep the text
.Where(text => text.SplitIntoWords()
.Where(word => word.Length > 0 && word[0] == specifiedChar)
.Any());
var yourChar = "s";
var texts = new List<string> {
"this is the first row",
"this is th second row",
"this is the third row"
};
var result = texts.Where(p => p.StartsWith(yourChar) || p.Contains(" " + yourChar));
EDITED:
Alternative way (I'm not sure it works in a LINQ query):
var result = texts.Where(p => (" " + p).Contains(" " + yourChar));
You can use .ToLower() if you want a case-insensitive check.
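For comparison, here is the same filter as a Ruby sketch (Ruby being the language of the other threads on this page). It uses the "run of letters A-Z/a-z" word definition from the answer above and a case-insensitive first-letter test; rows_with_word_starting is a hypothetical name.

```ruby
# Hypothetical helper: keep the rows that contain at least one word starting
# with `char` (case-insensitive). A word is a run of ASCII letters.
def rows_with_word_starting(rows, char)
  rows.select do |row|
    row.scan(/[A-Za-z]+/).any? { |word| word[0].casecmp?(char) }
  end
end

rows = [
  "this is the first row",
  "this is th second row",
  "this is the third row"
]
p rows_with_word_starting(rows, "T")  # all three rows
p rows_with_word_starting(rows, "S")  # ["this is th second row"]
```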
I want to check and capture 2 (or x) words before and after a target string in a multiline text. The problem is that if there are fewer than x words, the regex cuts the last word apart so that it still gets x matches.
For example
text = "This is an example /year"
if example is the target:
Matching Data: "is" , "an", "/yea", "r"
If I add random words after /year, it matches correctly.
How could I fix this so that if less than x words exist just stop there or return empty for the rest of the matches?
So it should be
Matching Data: "is" , "an", "/year", ""
def checkWords(target, text, numLeft = 2, numRight = 2)
target = target.compact.map{|x| x.inspect}.join('').gsub(/"/, '')
regex = ""
regex += "\\s+{,2}(\\S+)\\s+{,2}" * numLeft
regex += target
regex += "\\s+{,2}(\\S+)" * numRight
pattern = Regexp.new(regex)
matches = pattern.match(text)
puts matches.inspect
end
Since you want to capture the words before and after target, you need to put a capturing group around each whole part of the regex that matches the 0 to 2 occurrences of spaces/non-spaces. Also, you need to allow a minimum bound of 0: use a {0,2} (or the more succinct {,2}) limiting quantifier to make sure you get the context on the left even if it is missing on the right:
/((?:\S+\s+){,2})target((?:\s+\S+){,2})/
See this Rubular demo
If you use /(?:(\S+)\s+){0,2}target(?:\s+(\S+)){0,2}/, all captured values but the last one will be lost, i.e. once quantified, repeated capturing groups only store the value captured during the last iteration in the group buffer.
Also note that setting a {,2} quantifier on the + quantifier makes no sense, \\s+{,2} = \\s+.
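Putting that regex back into the question's function could look like the Ruby sketch below. words_around is a hypothetical rewrite of checkWords: it takes target as a plain string (the question's array handling is left out), escapes it with Regexp.escape, and pads missing words with "" to produce the output the question asks for.

```ruby
# Hypothetical rewrite of checkWords: capture up to num_left words before and
# up to num_right words after target, padding missing words with "".
def words_around(target, text, num_left = 2, num_right = 2)
  # One capturing group around each whole quantified part, as in the answer
  pattern = /((?:\S+\s+){0,#{num_left}})#{Regexp.escape(target)}((?:\s+\S+){0,#{num_right}})/
  match = pattern.match(text)
  return nil unless match

  left = match[1].split
  right = match[2].split
  Array.new(num_left - left.size, "") + left + right + Array.new(num_right - right.size, "")
end

p words_around("example", "This is an example /year")
# => ["is", "an", "/year", ""]
```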
The quiz problem:
You are given the following short list of movies exported from an Excel comma-separated values (CSV) file. Each entry is a single string that contains the movie name in double quotes, zero or more spaces, and the movie rating in double quotes. For example, here is a list with three entries:
movies = [
%q{"Aladdin", "G"},
%q{"I, Robot", "PG-13"},
%q{"Star Wars","PG"}
]
Your job is to create a regular expression to help parse this list:
movies.each do |movie|
movie.match(regexp)
title,rating = $1,$2
end
# => for first entry, title should be Aladdin, rating should be G,
# => WITHOUT the double quotes
You may assume movie titles and ratings never contain double-quote marks. Within a single entry, a variable number of spaces (including 0) may appear between the comma after the title and the opening quote of the rating.
Which of the following regular expressions will accomplish this? Check all that apply.
regexp = /"([^"]+)",\s*"([^"]+)"/
regexp = /"(.*)",\s*"(.*)"/
regexp = /"(.*)", "(.*)"/
regexp = /(.*),\s*(.*)/
Would someone explain why the answer was (1) and (2)?
Each entry will be a string like "Aladdin", "G". Let's take a look at correct answer #1:
/"([^"]+)",\s*"([^"]+)"/
"([^"]+)" = one or more characters that are not ", surrounded by "
, = a comma
\s* = a number of spaces (including 0)
"([^"]+)" = like first
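That is exactly the kind of string you will get. Here is how the example string lines up with those four points:
"Aladdin", "G"
# "Aladdin" matches the first point, the comma the second, the space the third (\s*), and "G" the fourth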
Now let's take a look at the second correct answer:
/"(.*)",\s*"(.*)"/
"(.*)" = any number (including 0) of almost any character surrounded by ".
, = a comma
\s* = any number of spaces (including 0)
"(.*)" = see first point
This is correct as well, as the following irb session (using Ruby 1.9.3) shows:
'"Aladdin", "G"'.match(/"([^"]+)",\s*"([^"]+)"/) # number 1
# => #<MatchData "\"Aladdin\", \"G\"" 1:"Aladdin" 2:"G">
'"Aladdin", "G"'.match(/"(.*)",\s*"(.*)"/) # number 2
# => #<MatchData "\"Aladdin\", \"G\"" 1:"Aladdin" 2:"G">
Just for completeness I'll tell why the third and fourth are wrong as well:
/"(.*)", "(.*)"/
The above regex is:
"(.*)" = any number (including 0) of almost any character surrounded by "
, = a comma
(a literal space) = a single space
"(.*)" = see first point
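This is wrong because it requires exactly one space after the comma, while the task says any number of spaces (including 0) may appear there. The third entry, "Star Wars","PG", has no space, so it does not match, as the following irb session shows:
'"Star Wars","PG"'.match(/"(.*)", "(.*)"/) # number 3
# => nil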
The fourth regex is:
/(.*),\s*(.*)/
which is:
(.*) = any number (including 0) of almost any character
, = a comma
\s* = any number (including 0) of spaces
(.*) = see first point
This is wrong because nothing in it deals with the double quotes. The captures keep the surrounding quotes (for the first entry, the title would be "Aladdin" with its double quotes included, which the task rules out), and the regex does not check that titles are quoted at all, so it also accepts strings that are not valid entries, such as ",", as the following irb session shows:
'","'.match(/(.*),\s*(.*)/) # number 4
# => #<MatchData "\",\"" 1:"\"" 2:"\"">
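To check all four candidates at once, the Ruby snippet below runs each regex over the three entries from the quiz and compares the captures with the expected title/rating pairs. The expected list is written out by hand from the task statement.

```ruby
movies = [
  %q{"Aladdin", "G"},
  %q{"I, Robot", "PG-13"},
  %q{"Star Wars","PG"}
]
expected = [["Aladdin", "G"], ["I, Robot", "PG-13"], ["Star Wars", "PG"]]

candidates = {
  1 => /"([^"]+)",\s*"([^"]+)"/,
  2 => /"(.*)",\s*"(.*)"/,
  3 => /"(.*)", "(.*)"/,
  4 => /(.*),\s*(.*)/
}

results = candidates.map do |number, regexp|
  # an entry yields nil when the regex does not match it at all
  captures = movies.map { |movie| (m = movie.match(regexp)) && [m[1], m[2]] }
  [number, captures == expected]
end.to_h

results.each { |number, ok| puts "#{number}: #{ok ? 'correct' : 'wrong'}" }
```

Regexes 1 and 2 come out correct; 3 fails on the "Star Wars" entry and 4 keeps the double quotes in its captures.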
I'm attempting to parse blocks of text and need a way to detect the difference between apostrophes in different contexts. Possession and abbreviation in one group, quotations in the other.
e.g.
"I'm the cars' owner" -> ["I'm", "the", "cars'", "owner"]
but
"He said 'hello there' " -> ["He","said"," 'hello there' "]
Detecting whitespace on either side won't help, as things like " 'ello " and " cars' " would parse as one end of a quotation; the same goes for matching pairs of apostrophes. I'm getting the feeling that there's no way of doing it other than an outrageously complicated NLP solution, and that I'm just going to have to ignore any apostrophes not occurring mid-word, which would be unfortunate.
EDIT:
Since writing I have realised this is impossible. Any regex-ish based parser would have to parse:
'ello there my mates' dogs
in 2 different ways, and could only do that with an understanding of the rest of the sentence. I guess I'm left with the inelegant solution of ignoring the least likely case and hoping it's rare enough to cause only infrequent anomalies.
Hm, I'm afraid this won't be easy. Here's a regex that kinda works, alas only for stuff like "I'm" and "I've" (s1 and s2 below are the two example strings from the question):
>> s1 =~ /[\w\s]*((?<!I)'(?:[^']+)')[\w\s]*/
=> nil
>> s2 =~ /[\w\s]*((?<!I)'(?:[^']+)')[\w\s]*/
=> 0
>> $1
=> "'hello there'"
If you play around with it a bit more, you may be able to eliminate some other common contractions, which might still be better than nothing.
Some rules to think about:
Quotes will start with an apostrophe with a whitespace character or nothing before it.
Quotes will end with an apostrophe with punctuation or a whitespace character after it.
Some words may look like the end of quotes, e.g., peoples'.
Quote delimiting apostrophes will never have letters directly before and after them.
Use a very simple two-phase process.
In pass 1 of 2, start with this regular expression to break the text down into alternating segments of word and non-word characters.
/(\w+)|(\W+)/gi
Store the matches in a list like this (I'm using AS3-style pseudo-code, since I don't work with Ruby):
class MatchedWord
{
var text:String;
var charIndex:int;
var isWord:Boolean;
var isContraction:Boolean = false;
function MatchedWord( text:String, charIndex:int, isWord:Boolean )
{
this.text = text; this.charIndex = charIndex; this.isWord = isWord;
}
}
var match:Object;
var matched_word:MatchedWord;
var matched_words:Vector.<MatchedWord> = new Vector.<MatchedWord>();
var words_regex:RegExp = /(\w+)|(\W+)/gi
words_regex.lastIndex = 0; //this is where to start looking for matches, and is updated to the end of the last match each time exec is called
while ((match = words_regex.exec( original_text )) != null)
matched_words.push( new MatchedWord( match[0], match.index, match[1] != null ) ); //match[0] is the entire match and match[1] is the first parenthetical group (if it's null, then it's not a word and match[2] would be non-null)
In pass 2 of 2, iterate over the list of matches to find contractions by checking whether each (trimmed, non-word) match ENDS with an apostrophe. If it does, check whether the next adjacent (word) match is one of only 8 common contraction endings. Of all the two-part contractions I could think of, there are only 8 common endings:
d
l
ll
m
re
s
t
ve
Once you've identified such a pair of matches (non-word)="'" and (word)="d", then you just include the preceding adjacent (word) match and concatenate the three matches to get your contraction.
Understanding the process just described, one modification you must make is to expand that list of contraction endings to include contractions that start with an apostrophe, such as "'twas" and "'tis". For those, you simply don't concatenate the preceding adjacent (word) match, and you look at the apostrophe match a little more closely to see if it included other non-word characters before it (that's why it's important that it ends with an apostrophe). If the trimmed string EQUALS an apostrophe, then merge it with the next match, and if it only ENDS with an apostrophe, then strip off the apostrophe and merge it with the following match. Likewise, conditions that will include the prior match should first check that the (trimmed non-word) match ending with an apostrophe EQUALS an apostrophe, so that no extra non-word characters are included accidentally.
Another modification you may need to make is to expand that list of 8 endings to include endings that are whole words, such as "g'day" and "g'night". Again, it's a simple modification involving a conditional check of the preceding (word) match. If it's "g", then you include it.
That process should capture the majority of contractions, and is flexible enough to include new ones you can think of.
The data structure would look like this.
Condition(Ending, PreCondition)
where PreCondition is
"*", "!", or "<exact string>"
The final list of conditions would look like this:
new Condition("d","*"); //if apostrophe d is found, include the preceding word string and count as successful contraction match
new Condition("l","*");
new Condition("ll","*");
new Condition("m","*");
new Condition("re","*");
new Condition("s","*");
new Condition("t","*");
new Condition("ve","*");
new Condition("twas","!"); //if apostrophe twas is found, exclude the preceding word string and count as successful contraction match
new Condition("tis","!");
new Condition("day","g"); //if apostrophe day is found and preceding word string is g, then include preceding word string and count as successful contraction match
new Condition("night","g");
If you just process those conditions as I explained, that should cover all of these 84 contractions (and more):
'tis 'twas ain't aren't can't could've couldn't didn't doesn't don't
everybody's g'day g'night hadn't hasn't haven't he'd he'll he's how'd
how'll how's I'd I'll I'm I've isn't it'd it'll it's let's li'l
might've mightn't mustn't needn't nobody's nothing's shan't she'd
she'll she's should've shouldn't that'd that'll that's there's they'd
they'll they're they've wasn't we'd we'll we're we've weren't what'll
what're what'd what's what've when'd when'll when's where'd where'll
where's who's who'll who're who'd who've why'd why'll
why's won't would've wouldn't you'd you'll you're you've
On a side note, don't forget about slang contractions that don't use apostrophes such as "gotta" > "got to" and "gonna" > "going to".
Here is the final AS3 code. Overall, you're looking at fewer than 50 lines of code to parse the text into alternating word and non-word groups, and to identify and merge contractions. Simple. The MatchedWord class above already has an isContraction flag, and the code below sets it whenever a contraction is identified.
//Automatically merge known contractions
var conditions:Array = [
["d","*"], //if apostrophe d is found, include the preceding word string and count as successful contraction match
["l","*"],
["ll","*"],
["m","*"],
["re","*"],
["s","*"],
["t","*"],
["ve","*"],
["twas","!"], //if apostrophe twas is found, exclude the preceding word string and count as successful contraction match
["tis","!"],
["day","g"], //if apostrophe day is found and preceding word string is g, then include preceding word string and count as successful contraction match
["night","g"]
];
for (var i:int = 0; i < matched_words.length - 1; i++) //not a typo: intentionally stopping at the next-to-last index so m_next never needs a bounds check
{
var m:MatchedWord = matched_words[i];
var apostrophe_text:String = StringUtils.trim( m.text ); //check if this ends with an apostrophe first, then deal more closely with it
if (!m.isWord && StringUtils.endsWith( apostrophe_text, "'" ))
{
var m_next:MatchedWord = matched_words[i + 1]; //no bounds check necessary, since loop intentionally stopped at next to last index
var m_prev:MatchedWord = ((i - 1) >= 0) ? matched_words[i - 1] : null; //bounds check needed here: at index 0 there is no prior match, and whether we need it depends on the precondition
for each (var condition:Array in conditions)
{
if (StringUtils.trim( m_next.text ) == condition[0])
{
var pre_condition:String = condition[1];
switch (pre_condition)
{
case "*": //success after one final check, include prior match, merge current and next match into prior match and delete current and next match
if (m_prev != null && apostrophe_text == "'") //EQUAL apostrophe, not just ENDS with apostrophe
{
m_prev.text += m.text + m_next.text;
m_prev.isContraction = true;
matched_words.splice( i, 2 );
}
break;
case "!": //success after one final check, do not include prior match, merge current and next match, and delete next match
if (apostrophe_text == "'")
{
m.text += m_next.text;
m.isWord = true; //match now includes word text so flip it to a "word" block for logical consistency
m.isContraction = true;
matched_words.splice( i + 1, 1 );
}
else
{ //strip apostrophe off end and merge with next item, nothing needs deleted
//preserve spaces and match start indexes by manipulating untrimmed strings
var apostrophe_end:int = m.text.lastIndexOf( "'" );
var apostrophe_ending:String = m.text.substring( apostrophe_end, m.text.length );
m.text = m.text.substring( 0, m.text.length - apostrophe_ending.length); //strip apostrophe and any trailing spaces
m_next.text = apostrophe_ending + m_next.text;
m_next.charIndex = m.charIndex + apostrophe_end;
m_next.isContraction = true;
}
break;
default: //conditional success, check prior match meets condition
if (m_prev != null && m_prev.text == pre_condition)
{
m_prev.text += m.text + m_next.text;
m_prev.isContraction = true;
matched_words.splice( i, 2 );
}
break;
}
}
}
}
}
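Since the question is about Ruby, here is a rough Ruby translation of the two-pass idea. It is a simplified sketch rather than a line-by-line port of the AS3 code: it only merges an apostrophe that sits directly before the ending (so "cars' owner" is left alone), it compares endings case-insensitively, and CONDITIONS, Token, tokenize and merge_contractions are hypothetical names.

```ruby
# Contraction endings and their preconditions, as in the AS3 table:
# "*" = merge with the preceding word, "!" = no preceding word,
# anything else = the preceding word must equal that string.
CONDITIONS = {
  "d" => "*", "l" => "*", "ll" => "*", "m" => "*", "re" => "*",
  "s" => "*", "t" => "*", "ve" => "*",
  "twas" => "!", "tis" => "!",
  "day" => "g", "night" => "g"
}.freeze

Token = Struct.new(:text, :word, :contraction)

# Pass 1: alternating runs of word (\w+) and non-word (\W+) characters.
def tokenize(text)
  text.scan(/(\w+)|(\W+)/).map { |w, nw| Token.new(w || nw, !w.nil?, false) }
end

# Pass 2: merge word + "'" + ending (or "'" + ending) into one token.
def merge_contractions(tokens)
  out = []
  i = 0
  while i < tokens.size
    tok = tokens[i]
    nxt = tokens[i + 1]
    pre = nxt && CONDITIONS[nxt.text.downcase]
    # only a non-word run ending right before a known ending can start a contraction
    if pre && !tok.word && tok.text.end_with?("'")
      prev = out.last
      if pre == "!"
        # 'twas / 'tis: keep any leading non-word text, glue the apostrophe to the word
        lead = tok.text[0...-1]
        out << Token.new(lead, false, false) unless lead.empty?
        out << Token.new("'" + nxt.text, true, true)
        i += 2
        next
      elsif tok.text == "'" && prev&.word && (pre == "*" || prev.text.downcase == pre)
        prev.text += tok.text + nxt.text
        prev.contraction = true
        i += 2
        next
      end
    end
    out << tok
    i += 1
  end
  out
end

tokens = merge_contractions(tokenize("I'm the cars' owner, g'day! 'Twas fine."))
p tokens.select(&:word).map(&:text)
# => ["I'm", "the", "cars", "owner", "g'day", "'Twas", "fine"]
```

Because every token keeps its original text, joining all token texts gives back the input string unchanged.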