Parse many numbers containing commas from string - ruby

I have a series of strings that all include 1 or many numbers (a number in this case would be 123,123,123) in the following format
"This is a number 123,124,123"
"These are some more numbers 123,345,123; 231,123,123; 124,152,123"
"This one is an odd situation 123,124,125; 123,123,123; more text"
What is the cleanest way to parse these numbers into either an array or a string that I can split that looks like this?
"123,124,123"
"123,345,123;231,123,123;124,152,123"
"123,124,125;123,123,123;"
Ultimately I want to be able to separate out the numbers like this.
"123,124,123"
"123,345,123" "231,123,123" "124,152,123"
"123,124,125" "123,123,123"
Currently attempting to use
"string".scan( /\d/ )
but obviously this is only giving me the numbers without the commas and also not separated properly.

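Scan for runs of digits and commas:
string.scan(/[\d,]+/)
Note that this also picks up any stray comma elsewhere in the text as a match of its own.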

Another way would be to remove the unwanted characters.
arr = ["This is a number 123,124,123",
"These are some more numbers 123,345,123; 231,123,123; 124,152,123",
"This one is an odd situation 123,124,125; 123,123,123; more text"]
arr.map { |str| str.gsub(/[^\s\d,]+/,'').split }
#=> [["123,124,123"],
# ["123,345,123", "231,123,123", "124,152,123"],
# ["123,124,125", "123,123,123"]]

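A regex that matches numbers in this format is \d{1,3}(,\d{3})*. With String#scan, make the group non-capturing, (?:,\d{3})*, because scan returns only the captured groups when the pattern contains any.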
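As a rough sketch of how that pattern could be used with scan (the names NUMBER_PATTERN and extract_numbers are made up for illustration, and the group is written as non-capturing so that scan returns whole matches):

```ruby
# Non-capturing group (?:...) so that scan returns the full match
# instead of only the last captured group.
NUMBER_PATTERN = /\d{1,3}(?:,\d{3})*/

# Hypothetical helper: returns every comma-grouped number found in str.
def extract_numbers(str)
  str.scan(NUMBER_PATTERN)
end

strings = [
  "This is a number 123,124,123",
  "These are some more numbers 123,345,123; 231,123,123; 124,152,123",
  "This one is an odd situation 123,124,125; 123,123,123; more text"
]

strings.each { |s| p extract_numbers(s) }
#=> ["123,124,123"]
#=> ["123,345,123", "231,123,123", "124,152,123"]
#=> ["123,124,125", "123,123,123"]
```

Unlike the character-class approach, this pattern will not match a lone comma, since every match has to start with a digit.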

Related

How to use gsubstitution with more letters

I've written this code in Ruby:
string = "hahahah"
print string.gsub("a","b")
How do I add more letter replacements into gsub?
string.gsub("a","b")("h","l") and string.gsub("a","b";"h","l")
didn't work...
Update: I have also tried this, but without any success.
letters = {
"a" => "l"
"b" => "n"
...
"z" => "f"
}
string = "hahahah"
print string.gsub(\/w\,letters)
You're overcomplicating. As with most method calls in Ruby, you can simply chain #gsub calls together, one after the other:
str = 'adfh'
print str.gsub("a","b").gsub("h","l") #=> 'bdfl'
What you're doing here is applying the second #gsub to the result of the first one.
Of course, that gets a bit long-winded if you do too many of them. So, when you find yourself stringing too many together, you'll want to look for a regex solution. Rubular is a great place to tinker with them.
To use your hash trick with #gsub and a regex, provide a hash with an entry for every possible match. This has the same result as the two #gsub calls:
print str.gsub(/[ah]/, {'a'=>'b', 'h'=>'l'}) #=> 'bdfl'
The regex matches either a or h (/[ah]/), and the hash is saying what to substitute for each of them.
All that said, str.tr('ah', 'bl') is the simplest way to solve your problem as specified, as some commenters have mentioned, so long as you are working with single letters. If you need to work with two or more characters per substitution, you'll need to use #gsub.
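For the multi-character case, one possible sketch combines the hash form of #gsub with Regexp.union, which builds an alternation from the hash keys and escapes them. The table REPLACEMENTS and the helper replace_all are hypothetical names used only for this example:

```ruby
# Hypothetical replacement table with multi-character keys.
REPLACEMENTS = { "ha" => "ho", "!" => "?" }.freeze

# Builds one regex out of all keys and lets gsub look up each match in the hash.
def replace_all(str, table)
  pattern = Regexp.union(table.keys) # keys are escaped automatically
  str.gsub(pattern, table)
end

puts replace_all("hahaha!", REPLACEMENTS) #=> hohoho?
```

Alternatives are tried in the order the keys appear, so if one key is a prefix of another, it is probably safest to list the longer key first.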

How to sort an array using only the first 5 characters of a string in Ruby?

So, say I have an array of strings such as:
["74712 Don", "48342 Cindy", "50912 Nick"]
and I want to sort them by the number in front of the name. How would I sort by only the first 5 characters of each element (while also evaluating them as numbers)?
Thanks
Assuming you wish to sort by the leading digits of the strings, you can do the following:
["74712 Don", "48342 Cindy", "50912 Nick"].sort_by(&:to_i)
#=> ["48342 Cindy", "50912 Nick", "74712 Don"]
This works because String#to_i ignores "extraneous characters past the end of a valid number".
If some elements of the array may have more than five leading digits, but only the first five are to be considered, one would use
["74712 Don", "48342 Cindy", "209124 Nick"].sort_by { |s| s[0,5].to_i }
#=> ["209124 Nick", "48342 Cindy", "74712 Don"]
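To illustrate the String#to_i behaviour this relies on, here is a small sketch. The helper name sort_by_leading_number and the "Unknown" entry are assumptions added for the example; a string with no leading digits converts to 0 and therefore sorts first:

```ruby
# Hypothetical helper: sorts entries by the integer at the start of each string.
def sort_by_leading_number(entries)
  entries.sort_by(&:to_i)
end

p "74712 Don".to_i #=> 74712
p "Don".to_i       #=> 0 (no leading digits)

p sort_by_leading_number(["74712 Don", "Unknown", "48342 Cindy"])
#=> ["Unknown", "48342 Cindy", "74712 Don"]
```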

Ruby Delete From Array On Criteria

I'm just learning Ruby and have been tackling small code projects to accelerate the process.
What I'm trying to do here is read only the alphabetic words from a text file into an array, then delete the words from the array that are less than 5 characters long. Then where the stdout is at the bottom, I'm intending to use the array. My code currently works, but is very very slow since it has to read the entire file, then individually check each element and delete the appropriate ones. This seems like it's doing too much work.
goal = File.read('big.txt').split(/\s/).map do |word|
word.scan(/[[:alpha:]]+/).uniq
end
goal.each { |word|
if word.length < 5
goal.delete(word)
end
}
puts goal.sample
Is there a way to apply the criteria to my File.read block to keep it from mapping the short words to begin with? I'm open to anything that would help me speed this up.
You might want to change your regex instead so that it only catches words of at least 5 characters to begin with:
goal = File.read('C:\Users\bkuhar\Documents\php\big.txt').split(/\s/).flat_map do |word|
  word.scan(/[[:alpha:]]{5,}/).uniq
end
A further optimization might be to maintain a Set instead of an Array, so that duplicates are dropped as you go instead of re-scanning for uniqueness:
require 'set'
goal = Set.new
File.read('C:\Users\bkuhar\Documents\php\big.txt').scan(/\b[[:alpha:]]{5,}\b/).each do |w|
  goal << w
end
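If reading the whole file at once is the concern, a line-by-line variant might look like the sketch below. The method long_words and its min_length keyword are hypothetical, and a StringIO stands in for the real file so the example is self-contained:

```ruby
require 'set'
require 'stringio'

# Hypothetical helper: collects unique alphabetic words of at least
# min_length characters from any IO-like object, one line at a time.
def long_words(io, min_length: 5)
  words = Set.new
  io.each_line do |line|
    line.scan(/[[:alpha:]]+/) { |w| words << w if w.length >= min_length }
  end
  words
end

# StringIO is used here in place of a file; with a real file this could be
# File.open('big.txt') { |f| long_words(f) }
sample = StringIO.new("The quick brown foxes\njumped over seven lazy dogs\n")
p long_words(sample).to_a
#=> ["quick", "brown", "foxes", "jumped", "seven"]
```

Because the filter runs inside the scan block, short words are never stored, and the Set takes care of duplicates.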
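In this case, use the delete_if method:
# goal is your array
goal.delete_if { |w| w.length < 5 }
This removes the words shorter than 5 characters from goal in place and returns the same array. Use reject instead if you want a new array and the original left untouched.
Hope this helps.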
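A brief sketch of the difference between Array#delete_if, which mutates the receiver, and Array#reject, which returns a new array. The helper without_short_words and the sample word list are made up for the example:

```ruby
# Hypothetical helper: returns a new array without the short words.
def without_short_words(words, min_length = 5)
  words.reject { |w| w.length < min_length }
end

words = %w[cat horse zebra ox elephant]

kept = without_short_words(words)
p kept         #=> ["horse", "zebra", "elephant"]
p words.length #=> 5 (the original is unchanged)

# delete_if removes the elements from the receiver and returns the receiver.
returned = words.delete_if { |w| w.length < 5 }
p returned.equal?(words) #=> true
p words                  #=> ["horse", "zebra", "elephant"]
```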
I don't really understand what much of the code in your first loop is for.
You take every chunk of text separated by white space, and map it to a unique value in an array generated by chunking together groups of letter characters, and plug that into an array.
This is way too complicated for what you want. Try this:
# chomp: true drops each line's trailing newline so it is not counted in the length
goal = File.readlines('big.txt', chomp: true).select do |word|
  word =~ /^[a-zA-Z]+$/ &&
    word.length >= 5
end
This makes it easy to add new conditions, too. If the word can't contain 'q' or 'Q', for example:
# chomp: true drops each line's trailing newline so it is not counted in the length
goal = File.readlines('big.txt', chomp: true).select do |word|
  word =~ /^[a-zA-Z]+$/ &&
    word.length >= 5 &&
    !word.upcase.include?('Q')
end
This assumes that each word in your dictionary is on its own line. You could go back to splitting on whitespace, but it makes me wonder whether the file you are reading is written, human-readable text; that is, it has 'words' ending in periods or commas, like this sentence. In that case, splitting on whitespace alone will not work.
Another note: map is the wrong array method to use here. It transforms every value of one array and builds a new array out of the results. You want to pick certain values from an array without changing them, and the Array#select method does exactly that.
Also, feel free to change the regex back to the [[:alpha:]] character class if you are expecting non-ASCII letters.
Edit: Second version
goal = File.read('big.txt').scan(/[a-z][a-z']{4,}/i)
Explanation: Read the whole file as one string and capture all occurrences of a group of letters, at least 5 long and possibly containing but not starting with a '. String#scan returns all those occurrences as an array; it is used rather than match because match returns only the first occurrence.
This works well, and it's only one line for your whole task, but it'll match
sugar'
in
I'd like some 'sugar', if you know what I mean
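Like above, if your word can't contain q or Q, you could change the regex to the following (with String#scan, the capture group makes each result a one-element array, so call flatten on the result):
/([a-pr-z][a-pr-z']{4,})[ .'",]/i
Be aware that this can still match the part of a word that follows a q, so filtering with include? as above is more reliable.
And an idea: do another select on goal, removing all those entries that end with a '. This overcomes the limitation of my regex.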
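That clean-up step could be sketched as follows. All helper names and the sample sentence are hypothetical; the last helper is a variation of my own that trims the trailing apostrophe rather than discarding the word:

```ruby
# Letters and apostrophes, at least 5 characters, not starting with an apostrophe.
WORD_PATTERN = /[a-z][a-z']{4,}/i

# Hypothetical helper: all candidate words in the text.
def candidate_words(text)
  text.scan(WORD_PATTERN)
end

# The suggestion above: drop entries that end with an apostrophe.
def drop_trailing_apostrophe_words(words)
  words.reject { |w| w.end_with?("'") }
end

# A variation (an assumption, not from the answer): keep the word, trim the apostrophe.
def strip_trailing_apostrophes(words)
  words.map { |w| w.sub(/'+\z/, "") }
end

words = candidate_words("I'd like some 'sugar', if you wouldn't mind, please")
p words                                 #=> ["sugar'", "wouldn't", "please"]
p drop_trailing_apostrophe_words(words) #=> ["wouldn't", "please"]
p strip_trailing_apostrophes(words)     #=> ["sugar", "wouldn't", "please"]
```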

Ruby regex to capture any string with a number

I am looking for a regular expression in Ruby to capture a sentence that has any sort of number in it.
For instance, I need to capture all of the following:
"5 different ways to do it"
"2 x 2 is certainly 4"
"there are 15 different things"
"Try to get to 10"
I only want to capture sentences with a number within, but that has nothing else before or after the number. I don't want to include things like:
"$2 billion dollars"
"The 5x effect"
It has to be just a sequence for 1 or more numbers at the beginning, middle, or end of a sentence.
Thanks.
You probably want something like:
/^.*(?<!\S)\d+(?!\S).*$/
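This matches any line containing a run of digits that is neither preceded nor followed by a non-whitespace character; the look-arounds (?<!\S) and (?!\S) do the checking.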
This
(s =~ /(^|\s)\d+(\s|$)/) ? s : nil
will return the string s if it contains at least one non-negative integer that:
is the entire string,
is at the beginning of the string and followed by a whitespace character,
is at the end of the string and preceded by a whitespace character, or
is both preceded and followed by a whitespace character.
Otherwise it returns nil.
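A quick way to check a pattern like this against the sentences from the question is sketched below. The constant STANDALONE_NUMBER and the predicate standalone_number? are hypothetical names, and the pattern is the look-around form without the surrounding .* since only a yes/no answer is needed:

```ruby
# A run of digits with no non-whitespace character directly before or after it.
STANDALONE_NUMBER = /(?<!\S)\d+(?!\S)/

# Hypothetical predicate: true if the sentence contains a standalone number.
def standalone_number?(sentence)
  sentence.match?(STANDALONE_NUMBER)
end

wanted = [
  "5 different ways to do it",
  "2 x 2 is certainly 4",
  "there are 15 different things",
  "Try to get to 10"
]
unwanted = ["$2 billion dollars", "The 5x effect"]

p wanted.map { |s| standalone_number?(s) }   #=> [true, true, true, true]
p unwanted.map { |s| standalone_number?(s) } #=> [false, false]
```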

How can I determine that one alphanumeric ID is greater than another in Ruby?

Right now I am working on a project that issues IDs consisting of both letters and numbers, for example 345A22. I need this program to be able to tell that for example, 345B22 is greater than 345A22. I can't assume that the letters will be in the same position all the time (ie we do have some id's with 22335Q) but when I compare two numbers the letters will be in the same position.
How do I accomplish this in Ruby?
You can use the String#<=> method to compare strings; see its entry in the Ruby String documentation.
>> "345B22" <=> "345A22"
=> 1
Where the 1 return value means that 345B22 is greater.
If a simple string comparison won't do the trick (e.g. the IDs have different lengths), try converting the IDs (assuming they all match ^[0-9A-Z]*$) into integers by treating them as base-36 numbers, for example with String#to_i(36).
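A minimal sketch of the base-36 idea, assuming every ID consists only of digits and letters. The helper name id_value is made up, and Kernel#Integer with an explicit base is used here because it raises ArgumentError on invalid characters, whereas String#to_i(36) would silently stop at them:

```ruby
# Hypothetical helper: interprets an alphanumeric ID as a base-36 integer.
def id_value(id)
  Integer(id, 36)
end

p id_value("345B22") > id_value("345A22") #=> true

# With different lengths, string order and numeric order can disagree:
p "Z9" > "100"                     #=> true  (character-by-character comparison)
p id_value("Z9") > id_value("100") #=> false (1269 vs 1296)
```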
In Ruby, strings support the same comparison operators as numbers, but they are compared character by character rather than numerically (so "10" > "9" is false).
2 > 1 #=> true
"2" > "1" #=> true
"B" > "A" #=> true
Not sure I understand your question, but I'm guessing that you mentally parse the ids into components (so 345B22 is 345, B, 22) and then are wishing for a numeric sort for things that are numbers (i.e., 12 > 2) and a string sort for things that are strings (AB < B).
If this is what you intend, something like the following would do the trick:
ids.sort_by do |id|
id.scan(/\d+|[a-zA-Z]+/).map {|c| c =~ /\d/ ? c.rjust(20) : c.ljust(20) }.join
end
What this does is extract all consecutive runs of digits or letters, right-justify the digit runs and left-justify the letter runs, concatenate the results, and then sort by this expanded, canonicalized ID.
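To see the effect, the same block can be wrapped in a helper and compared with a plain sort. The name sort_ids and the sample IDs are assumptions for the example; the padding width of 20 is kept from the answer and presumes no run is longer than that:

```ruby
# Hypothetical wrapper around the sort_by block from the answer.
def sort_ids(ids)
  ids.sort_by do |id|
    # Digit runs are right-justified (numeric order), letter runs left-justified.
    id.scan(/\d+|[a-zA-Z]+/).map { |c| c =~ /\d/ ? c.rjust(20) : c.ljust(20) }.join
  end
end

ids = ["345B22", "345A22", "345A3"]

p ids.sort      #=> ["345A22", "345A3", "345B22"] (plain string order)
p sort_ids(ids) #=> ["345A3", "345A22", "345B22"] (3 sorts before 22)
```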
