I am missing something obvious. I have a tab-separated string in a text file.
Here is what it looks like in the text file:
hello world foo bar
So each of those words has a tab between them in the text file.
I read it into a variable
line = ""
File.open("some_file", "r+") do |file|
line = file.gets
end
Now I simply want to split up the words by the tab separation:
word1, word2, word3, word4 = line.split("\t")
However, ALL the words end up in the first variable, and the other variables are left as nil:
p word1
=> "hello world foo bar"
p word2
=> nil
p word3
=> nil
p word4
=> nil
What am I missing? A word should be within each of those variables.
This is because your string does not contain "\t" in it (but rather spaces):
words = 'hello world foo bar'
words.split(' ')
#=> ["hello", "world", "foo", "bar"]
If it really did contain tabs:
"hello\tworld"
you then would indeed be able to split it as intended:
"hello\tworld".split("\t")
#=> ["hello", "world"]
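To confirm which separator the file really contains, it can help to look at the raw line. The following is a minimal sketch; the two sample strings are assumptions standing in for whatever file.gets returned, not the asker's actual data.

```ruby
# Hypothetical sample lines standing in for the result of file.gets.
spaced = "hello world foo bar\n"
tabbed = "hello\tworld\tfoo\tbar\n"

# inspect makes invisible characters visible: a tab prints as \t.
puts spaced.inspect
puts tabbed.inspect

# include? reports directly whether a tab is present.
spaced_has_tab = spaced.include?("\t")
tabbed_has_tab = tabbed.include?("\t")

# split with no argument splits on any run of whitespace (spaces, tabs,
# newlines), so it copes with either kind of line.
spaced_words = spaced.split
tabbed_words = tabbed.split
```

If the inspected line shows spaces where tabs were expected, the editor that saved the file may have converted them.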
Related
I would like a regexp that matches all groups of words (single words and sub-sentences) in a sentence separated by white space.
Example:
"foo bar bar2".scan(regexp)
I want a regexp that will return:
['foo', 'bar', 'bar2', 'foo bar', 'bar bar2', 'foo bar bar2']
So far, I tried:
"foo bar bar2".scan(/\S*[\S]/) (i.e. regexp = /\S*[\S]/)
which returns ['foo', 'bar', 'bar2']
"foo bar bar2".scan(/\S* [\S]+/) (i.e. regexp = /\S* [\S]+/)
which returns ["foo bar", " bar2"]
words = "foo bar bar2".scan(/\S+/)
result = 1.upto(words.length).map do |n|
words.each_cons(n).to_a
end.flatten(1)
#⇒ [["foo"], ["bar"], ["bar2"],
# ["foo", "bar"], ["bar", "bar2"],
# ["foo", "bar", "bar2"]]
result.map { |e| e.join(' ') }
#⇒ ["foo", "bar", "bar2", "foo bar", "bar bar2", "foo bar bar2"]
Here we used Enumerable#each_cons to get to the result.
Mudasobwa posted a nice variation of this answer.
I've used combination, a built-in method for arrays. The procedure is almost the same:
string = "foo bar bar2"
groups = string.split
objects = []
for i in 1..groups.size
  # collect every combination of size i without overwriting groups
  objects << groups.combination(i).to_a
end
results = objects.flatten(1).map { |e| e.join('-') }
puts results
Anyway, you can't do it with one regex. (Suppose you have 50 words and need to find all the combinations; a regex can't do it.) You will need to iterate over the objects as Mudasobwa showed.
I would start like this: the regex, if you want to use one, can be /\S+/, for example.
This regex will match words, and by words I mean groups of characters surrounded by whitespace.
With this you can scan your text or split your string. You can do it many ways, and in the end you will have an array with the words you want to combine.
string = "foo bar bar2"
Then you split it, creating an array, and apply the combination method to that array.
groups = string.split
=> ["foo", "bar", "bar2"]
The combination method takes a number as its argument, and that number is the size of each combination: combination(2) combines the elements in groups of two, combination(1) in groups of one, and combination(0) in groups of zero (which is why we start the combinations at 1).
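As a small illustration of how the size argument behaves, here is a sketch using the same three words. The results are sorted because the Ruby documentation does not promise the order in which combinations are yielded.

```ruby
groups = %w[foo bar bar2]

# Size 0 yields exactly one combination: the empty one.
zero = groups.combination(0).to_a

# Size 1 yields each element on its own.
ones = groups.combination(1).to_a.sort

# Size 2 yields every pair, including the non-adjacent foo/bar2 pair.
twos = groups.combination(2).to_a.sort

# A size larger than the array yields nothing at all.
too_big = groups.combination(4).to_a
```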
You need to loop over all possible group sizes, saving the results in a results array:
objects = []
Use the number of elements as the upper bound of the loop:
for i in 1..groups.size
  # collect every combination of size i without overwriting groups
  objects << groups.combination(i).to_a
end
Now you just have to flatten the nested arrays and join each group into a single string, which gets rid of the commas and double quotes:
results = objects.flatten(1).map { |e| e.join('-') }
That's it! You can run the code above (an example with more words) here: https://repl.it/JLK9/1
P.S.: both the question and the mentioned answer are lacking a combination (foo-bar2).
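The foo-bar2 difference comes down to adjacency. The expected output in the question lists only groups of neighbouring words, which is what each_cons produces, while combination also pairs words that are not next to each other. A minimal sketch comparing the two, with variable names chosen only for illustration:

```ruby
words = "foo bar bar2".split

# each_cons(n) slides a window over n adjacent words, so only
# contiguous groups appear.
contiguous = (1..words.size).flat_map do |n|
  words.each_cons(n).map { |group| group.join(" ") }
end

# combination(n) picks any n words, adjacent or not.
all_subsets = (1..words.size).flat_map do |n|
  words.combination(n).map { |group| group.join(" ") }
end

# Whatever combination adds on top of each_cons.
extra = all_subsets - contiguous
```

Which of the two is right depends on whether non-adjacent groups such as "foo bar2" are wanted.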
I have a string from which I want to extract everything except a certain pattern into another variable.
first_string = "Q13 Hello, World!"
I'd like to get the Hello, World! out of the string and into another variable so that: second_string = "Hello, World!".
I attempted to create a regex that extracts all but the "Q13" and it works on Rubular but not in the console.
> first_string = "Q13 Hello, World!"
> second_string = first_string.scan(/[^(Q[0-9]{1,})]/)
=> [" ", "H", "e", "l", "l", "o", ",", " ", "W", "o", "r", "l", "d", "!"]
> second_string.join()
=> " Hello World!"
This is fine, but I can't lose the leading space using the regex. That wouldn't be a problem, except I have some application-specific caveats...
Not all strings will have "Q13"... the "Q" will be there but the number will change. I don't know if "Q13" will come at the beginning or end of the text. I can't be certain what text will be in the string.
I can't rely on the leading space being there. It might also be a trailing space.
Any ideas?
Assuming you want to omit the Q[number] and any surrounding whitespace:
second_string = first_string.gsub(/\s?Q\d+\s?/, "")
If you want to omit the Q[number] but not the surrounding whitespace:
second_string = first_string.gsub(/Q\d+/, "")
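Since the marker may sit at either end of the string, another option is to remove it and then strip whatever whitespace is left over. This is only a sketch: strip_q_marker is a hypothetical helper name, and it assumes the marker appears at most once, at the start or the end.

```ruby
# Hypothetical helper: remove one Q<number> marker, then trim the edges.
def strip_q_marker(text)
  text.sub(/Q\d+/, "").strip
end

leading  = strip_q_marker("Q13 Hello, World!")
trailing = strip_q_marker("Hello, World! Q7")
absent   = strip_q_marker("Hello, World!")
```

A marker in the middle of the text would leave a double space behind, so that case would need extra handling.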
Try this:
second_string = first_string.scan(/\A(?:Q[0-9]+)?(?: )?(.*?)(?: )?(?:Q[0-9]+)?\z/).flatten.first
Live test in Ruby console
2.0.0p247 :001 > first_string = "Q12 Hello World! Q87"
=> "Q12 Hello World! Q87"
2.0.0p247 :002 > second_string = first_string.scan(/\A(?:Q[0-9]+)?(?: )?(.*?)(?: )?(?:Q[0-9]+)?\z/).flatten.first
=> "Hello World!"
I am using Ruby and looking for a way to read in a sample string with the following text:
"This is a test
file, dog cat bark
meow woof woof"
and split it into an array of words based on whitespace, but keep each \n in the array as a separate element.
I know I can use string.split(/\n/) to get
["This is a test", "file, dog cat bark", "meow woof woof"]
Also, string.split(/ /) yields
["This", "is", "a", "test\nfile,", "dog", "cat", "bark\nmeow", "woof", "woof"]
But I am looking for a way to get:
["This", "is", "a", "test", "\n", "file,", "dog", "cat", "bark", "\n", "meow", "woof", "woof"]
Is there any way to accomplish this using Ruby?
It's a strange thing to do but:
string.split /(?=\n)|(?<=\n)| /
#=> ["This", "is", "a", "test", "\n", "file,", "dog", "cat", "bark", "\n", "meow", "woof", "woof"]
You could turn your logic around a bit and look for what you want instead of looking for the delimiters between what you want. A simple scan like this should do the trick:
>> s.scan(/\S+|\n+/)
=> ["This", "is", "a", "test", "\n", "file,", "dog", "cat", "bark", "\n", "meow", "woof", "woof"]
That assumes that repeated \n should be a single token of course.
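To make that assumption concrete, here is a short sketch on a made-up string with a blank line in it, comparing \n+ against a plain \n:

```ruby
s = "one\n\ntwo"

# \n+ collapses a run of newlines into a single token.
grouped = s.scan(/\S+|\n+/)

# \n keeps every newline as its own token.
separate = s.scan(/\S+|\n/)
```

Here grouped has three elements and separate has four, so the choice matters only when the text can contain blank lines.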
This isn't particularly elegant, but you could try replacing "\n" with " \n " (note the spaces surrounding \n), and then split the resulting string on / /.
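A sketch of that replace-then-split idea, assuming the sample text from the question:

```ruby
text = "This is a test\nfile, dog cat bark\nmeow woof woof"

# Pad each newline with spaces so that it becomes its own token.
padded = text.gsub("\n", " \n ")

# Split on the regexp / /, not the string " ": splitting on " " treats all
# whitespace as a separator and would swallow the "\n" tokens.
tokens = padded.split(/ /)
```

If a line already ends with a space, the padding creates two spaces in a row and split(/ /) returns an empty string there, so the input may need cleaning first.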
This is an odd request, and perhaps, if you told us WHY you want to do that, we could help you do it in a more straightforward and conventional fashion.
It looks like you're trying to split the words and still know where your original line-ends were. Having the lines split into individual words is useful for many things, but keeping the line-ends... not so much in my experience.
When I'm dealing with text and need to break the lines up for processing, I do it this way:
text = "This is a test
file, dog cat bark
meow woof woof"
data = text.lines.map(&:split)
At this point, data looks like:
[["This", "is", "a", "test"],
["file,", "dog", "cat", "bark"],
["meow", "woof", "woof"]]
I know that each sub-array was a separate line, so if I need to process by lines I can do it using an iterator like each or map, or to reconstruct the original text I can join(" ") the sub-array elements, then join("\n") the resulting lines:
data.map{ |a| a.join(' ') }.join("\n")
=> "This is a test\nfile, dog cat bark\nmeow woof woof"
I'm relatively new to ruby and I'm trying to figure out the "ruby" way of extracting multiple values from a string, based on grouping in regexes. I'm using ruby 1.8 (so I don't think I have named captures).
I could just match and then assign $1,$2 - but I feel like there's got to be a more elegant way (this is ruby, after all).
I've also got something working with grep, but it seems hackish since I'm using an array and just grabbing the first element:
input="FOO: 1 BAR: 2"
foo, bar = input.grep(/FOO: (\d+) BAR: (\d+)/){[$1,$2]}[0]
p foo
p bar
I've tried searching online and browsing the ruby docs, but haven't been able to figure anything better out.
Ruby's String#match method returns a MatchData object, whose captures method returns an Array of the captures.
>> string = "FOO: 1 BAR: 2"
=> "FOO: 1 BAR: 2"
>> string.match /FOO: (\d+) BAR: (\d+)/
=> #<MatchData "FOO: 1 BAR: 2" 1:"1" 2:"2">
>> _.captures
=> ["1", "2"]
>> foo, bar = _
=> ["1", "2"]
>> foo
=> "1"
>> bar
=> "2"
To summarize:
foo, bar = input.match(/FOO: (\d+) BAR: (\d+)/).captures
Either:
foo, bar = string.scan(/[A-Z]+: (\d+)/).flatten
or:
foo, bar = string.match(/FOO: (\d+) BAR: (\d+)/).captures
Use scan instead:
input="FOO: 1 BAR: 2"
input.scan(/FOO: (\d+) BAR: (\d+)/) #=> [["1", "2"]]
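For readers on Ruby 1.9 or later, named captures are available, and it is worth guarding against a failed match, because match returns nil in that case. A minimal sketch under those assumptions:

```ruby
input = "FOO: 1 BAR: 2"

# Named groups (Ruby 1.9+) let you look captures up by name.
match = input.match(/FOO: (?<foo>\d+) BAR: (?<bar>\d+)/)
foo = match[:foo]
bar = match[:bar]

# match returns nil when nothing matches, so guard before calling captures.
miss = "no numbers here".match(/FOO: (\d+) BAR: (\d+)/)
a, b = miss ? miss.captures : [nil, nil]

# scan returns an array of capture arrays; first picks the only match.
x, y = input.scan(/FOO: (\d+) BAR: (\d+)/).first
```

The captures are strings, so call to_i on them if numbers are needed.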
I have an array of strings of different lengths and contents.
Now I'm looking for an easy way to extract the last word from each string, without knowing how long that word is or how long the string is.
Something like:
array.each { |string| puts string.fetch(" ", last) }
This should work just fine:
"my random sentence".split.last # => "sentence"
To exclude punctuation, delete it:
"my random sentence..,.!?".split.last.delete('.!?,') #=> "sentence"
To get the last words from an array of strings, use collect:
["random sentence...", "lorem ipsum!!!"].collect { |s| s.split.last.delete('.!?,') } # => ["sentence", "ipsum"]
array_of_strings = ["test 1", "test 2", "test 3"]
array_of_strings.map{|str| str.split.last} #=> ["1","2","3"]
["one two", "three four five"].collect { |s| s.split.last }
=> ["two", "five"]
"a string of words!".match(/(.*\s)*(.+)\Z/)[2] #=> 'words!'
This catches everything from the last whitespace on, so it includes the punctuation.
To extract that from an array of strings, use it with collect:
["a string of words", "Something to say?", "Try me!"].collect {|s| s.match(/(.*\s)*(.+)\Z/)[2] } #=> ["words", "say?", "me!"]
The problem with all of these solutions is that they only consider spaces as word separators. Using a regex, you can treat any non-word character as a word separator. Here is what I use:
str = 'Non-space characters, like foo=bar.'
str.split(/\W/).last
# "bar"
This is the simplest way I can think of.
hostname> irb
irb(main):001:0> str = 'This is a string.'
=> "This is a string."
irb(main):002:0> words = str.split(/\s+/).last
=> "string."
irb(main):003:0>
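The answers above can be folded into one small helper that asks for the words directly instead of naming the separators. This is a sketch only: last_word is a hypothetical name, and \w counts letters, digits and underscores, so an apostrophe splits a word like "don't".

```ruby
# Hypothetical helper: return the last run of word characters, or nil
# when the string contains none.
def last_word(str)
  str.scan(/\w+/).last
end

sentences = ["random sentence...", "lorem ipsum!!!", "foo=bar."]
last_words = sentences.map { |s| last_word(s) }
```

Whether trailing punctuation belongs to the word is a choice; use split.last instead when it should be kept.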