How to match exactly one string when there are multiple matches [closed] - ruby

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I am trying to extract only the first journal reference from a file with 3-4 entries. Any ideas on how to get only the first occurrence of a match?
Here is what I have done so far. I can extract the references, but I am getting all of them:
if file_line =~ /^ JOURNAL \*?(.*)/
  captured_journal = $1
end
To be more clear, this is some of the file I am trying to extract only the first JOURNAL entry from:
JOURNAL Genomics 33 (2), 229-246 (1996)
PUBMED 8660972
REFERENCE 2 (bases 1 to 17009)
AUTHORS Lopez,J.V.
TITLE Direct Submission
JOURNAL Submitted (07-FEB-1995) Jose V. Lopez, Laboratory of Viral
Carcinogenesis, PRI/DynCorp, Biological Carcinogenesis and
Development Prog, Bldg 560, Room 11-21, NCI-Frederick Cancer
Research and Development Center, Frederick, MD 21702-1201, USA
I only want "Genomics 33 (2), 229-246 (1996)" but I am also getting the next JOURNAL entries.

It is hard to answer your question because your example does not show the complete code.
One possibility: Your if file_line is inside a loop. Then you could leave the loop:
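# Declare the variable outside the block so it is still visible after the loop.
captured_journal = nil
filecontent.each_line { |file_line|
  if file_line =~ /^ JOURNAL \*?(.*)/
    captured_journal = $1
    break # stop after the first match
  end
}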
As an alternative, you could check whether you have already found an entry:
captured_journal = nil
filecontent.each_line { |file_line|
  if file_line =~ /^ JOURNAL \*?(.*)/
    # keep only the first capture; later matches are ignored
    captured_journal = $1 unless captured_journal
  end
}
But maybe you are not in a loop and the file content is stored in a String (e.g. with File.read). Then you could use a simple regex:
filecontent =~ /^ JOURNAL \*?(.*)/
captured_journal = $1
or
/^ JOURNAL \*?(.*)/.match(filecontent)[1]
Correction after you posted more details:
You could use the regex /^\s*JOURNAL\s+(.*)/. Your regexp uses a fixed number of spaces. With \s+ the number of spaces is flexible.
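As a rough sketch of the corrected regex in use, the snippet below pulls only the first JOURNAL entry out of a string. The sample text is adapted from the question, and its spacing is an assumption because the original file layout is not shown; the names `record` and `JOURNAL_RE` are made up for illustration.

```ruby
# Sample record adapted from the question (spacing is assumed).
record = <<~TEXT
  JOURNAL   Genomics 33 (2), 229-246 (1996)
  PUBMED    8660972
  REFERENCE 2 (bases 1 to 17009)
  AUTHORS   Lopez,J.V.
  TITLE     Direct Submission
  JOURNAL   Submitted (07-FEB-1995) Jose V. Lopez, Laboratory of Viral
TEXT

JOURNAL_RE = /^\s*JOURNAL\s+(.*)/

# Whole-string variant: String#[] with a regexp and a group index returns
# the capture from the first match only, or nil if nothing matches.
first_journal = record[JOURNAL_RE, 1]

# Line-by-line variant: leave the loop as soon as one line matches.
first_by_line = nil
record.each_line do |line|
  if (m = JOURNAL_RE.match(line))
    first_by_line = m[1]
    break
  end
end

puts first_journal
```

Both variants stop at the first entry, so later JOURNAL lines are never captured.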

Related

Merge numbered files with variable names [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 11 months ago.
The community is reviewing whether to reopen this question as of 10 months ago.
I have a number of numbered files, e.g.:
alpha_01.txt alpha_02.txt beta_01.txt beta_02.txt
I want to run a single-line bash command that outputs correctly merged files based on their variable name (e.g. alpha, beta, ...), that is, alpha.txt beta.txt.
I can do so for a single file:
cat alpha_*.txt(n) >>alpha.txt 2>/dev/null
But I don't know the name before _*.txt.
Can I use a wildcard here? Or what would be the best solution?
If you want to concatenate all the alpha_xxx.txt files then you cannot have beta_xxx.txt in the arguments of cat.
As @tripleee said, the easiest way would be to use a for loop where you list all the prefixes:
for name in alpha beta
do
cat "$name"_*.txt > "$name".txt
done
Now, if you don't know the prefixes in advance, you can always work something out with awk:
awk '
BEGIN {
    # ARGV[0] is "awk"; the file arguments are ARGV[1] .. ARGV[ARGC-1]
    for (i = 1; i < ARGC; i++) {
        filename = ARGV[i]
        # skip anything that is not <prefix>_<digits>.<ext>
        if (filename !~ /^(.*\/)?[^\/]+_[0-9]+\.[^\/.]+$/)
            continue
        match(filename, /^(.*\/)?[^\/]+_/)
        prefix = substr(filename, RSTART, RLENGTH-1)
        match(filename, /\.[^.\/]+$/)
        suffix = substr(filename, RSTART, RLENGTH)
        outfile[filename] = prefix suffix
    }
}
FILENAME in outfile { print $0 > (outfile[FILENAME]) }
' ./*.txt
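For comparison, the same grouping idea (derive the output name by dropping the _NN part, then append each numbered file to it) can be sketched in Ruby. This is not part of the answer above; `merge_numbered_files` is a hypothetical helper, and it assumes zero-padded numbers so that a plain sort gives the right order.

```ruby
require "tmpdir"

# Hypothetical helper: concatenate files named <prefix>_<digits>.<ext> in
# `dir` into <prefix>.<ext>. Returns the names of the files it wrote.
def merge_numbered_files(dir)
  files = Dir.children(dir).select { |name| File.file?(File.join(dir, name)) }
  groups = files.sort.group_by do |name|
    m = name.match(/\A(.+)_\d+(\.[^.]+)\z/)
    m && "#{m[1]}#{m[2]}" # "alpha_01.txt" -> "alpha.txt", nil if no match
  end
  groups.delete(nil) # drop names that do not fit the pattern
  groups.map do |target, parts|
    File.open(File.join(dir, target), "w") do |out|
      parts.each { |part| out.write(File.read(File.join(dir, part))) }
    end
    target
  end
end

# Small demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  { "alpha_01.txt" => "a1\n", "alpha_02.txt" => "a2\n",
    "beta_01.txt" => "b1\n" }.each do |name, body|
    File.write(File.join(dir, name), body)
  end
  merge_numbered_files(dir)
  puts File.read(File.join(dir, "alpha.txt"))
end
```

Because the merged files (alpha.txt, beta.txt) have no _NN part, they are ignored on a second run, as in the awk version.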

Ruby - Find first and last names in text from email address [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
This is an interesting issue I have been playing around with but unable to find an answer for.
I have a text file of unstructured data that includes emails as well as full names. I already have the emails extracted but I want to map first and last names to each email as well.
Suppose the email is ksmith@gmail.com, and somewhere on the page is 'Kevin Smith'.
I'd want to use whatever is in front of the '@' to find the full name somewhere in the text. Obviously, searching for 'ksmith' will return no match. So, starting from the left, I'd drop one character and search for 'smith', which would match.
Once I find 'Smith', I also want to find the first name. I could assume the match is always the last name (since most emails contain the last name but not the first) and search to the left of 'Smith' until reaching the next space (in front of 'Kevin'), treating whatever lies between that space and 'Smith' as the first name.
But what if the full name is "Kevin Michael Smith" or "Kevin P. Smith"? In that case I don't want "Michael" or "P.", but "Kevin" as the first name.
Or what if the email is structured as smithk@gmail.com? Then shrinking the substring from the left will never match, and I'd need to try from the other side as well.
Basically I need a method smart enough to recognize these full names in a number of cases.
Any help would be appreciated!
I am trying to do this in Ruby, if that helps
Once you find the last name, don't stop at the first space to the left of 'Smith'. Instead, each time you reach a name part, check whether there is a space before its first letter; if there is, keep moving left to the next part. Your algorithm finds "P." for "Kevin P. Smith", but checking for a space before "P" tells you to continue to the previous part. For "Kevin Michael John Smith" you first reach "John"; there is a space before "J", so you move back to "Michael"; there is a space before "M", so you move back to "Kevin". There is no space before "Kevin", so that is the first name.
The easiest solution is to use split, for example:
parts = full_name.split(" ")
first_name = parts[0]
My suggestion is to build an array of the full names and compare each one against the part of the email before the '@'. For example:
names = ["kevin smit", "andrew john", "thom devid", "M. K. Add", "k smith"]
email = "ksmith@gmail.com"
local = email.split('@')[0]
=> "ksmith"
first = local[0]
=> "k"
last = local[1..-1]
=> "smith"
names.each do |name|
  parts = name.split(" ")
  # Only handle "First Last" and "First Middle Last".
  next unless parts.size == 2 || parts.size == 3
  # The first initial and the last word must both match the email.
  if parts.first[0] == first && parts.last == last
    p "#{parts.first} #{parts.last}"
  end
end
This should work for the cases above. If you need to handle more name layouts, a case/when statement may read better than a growing if/elsif chain.
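To make the two email layouts from the question concrete (ksmith and smithk), here is a minimal sketch that searches free text directly instead of a prepared array. `guess_name` is a hypothetical helper, and it assumes the local part is always one initial plus the last name, in either order. Treat it as a heuristic, not a reliable name parser.

```ruby
# Hypothetical helper: returns [first_name, last_name] found in `text`,
# or nil. Assumes the email's local part is <initial><last> or <last><initial>.
def guess_name(email, text)
  local = email.split("@").first.to_s.downcase
  return nil if local.size < 2

  candidates = [
    [local[0], local[1..-1]],  # "ksmith" -> initial "k", last "smith"
    [local[-1], local[0..-2]]  # "smithk" -> initial "k", last "smith"
  ]

  candidates.each do |initial, last|
    # A word starting with the initial, up to two middle names or
    # initials ("Michael", "P."), then the last name.
    pattern = /\b(#{Regexp.escape(initial)}\w*)(?:\s+\w+\.?){0,2}?\s+(#{Regexp.escape(last)})\b/i
    m = text.match(pattern)
    return [m[1], m[2]] if m
  end
  nil
end

p guess_name("ksmith@gmail.com", "Please contact Kevin P. Smith today")
```

Allowing middle parts means an unrelated word that happens to start with the right initial can be picked up as the first name, so the result is best treated as a candidate to verify.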

Retrieving multiple matched tokens from a regexp [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
Here's what I want to happen:
> /x(y\d)*/.somefunction('xy1y2y3').each { |x| puts x }
y1
y2
y3
This seems like a pretty natural use of the asterisk in a regexp. I've matched a bunch of tokens and I want them printed out.
The closest I've been able to find is:
/x((y\d)*)/.match('xy1y2y3')[1].scan(/y\d/).each { |x| puts x }
Which is just abysmal.
The issue you are running into has to do with the regex rather than Ruby. You are repeating a capture group rather than capturing a repeated group. You could use
str.scan(/x((?:y\d)*)/)
However, this will capture all of the groups combined as one string. In order to do what you actually want to do (check that the string follows the pattern x followed by these groups) you unfortunately need to do two steps as you are doing in your question. Either that, or you can remove the additional requirement and search only for the pattern as other answers are suggesting.
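A small sketch of the difference described above: a repeated capture group keeps only its last iteration, so checking the overall pattern and extracting the tokens takes two steps. The \A and \z anchors are an addition here, on the assumption that the whole string should follow the pattern.

```ruby
str = "xy1y2y3"

# Repeating a capture group: only the final iteration is remembered.
last_only = str.match(/x(y\d)*/)[1]

# Step 1: check the overall shape and capture the whole repeated run.
run = str[/\Ax((?:y\d)*)\z/, 1]

# Step 2: split the run into tokens (empty if the shape did not match).
tokens = run ? run.scan(/y\d/) : []

puts tokens
```

Here `last_only` holds just the final token, while `tokens` holds all three.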
I assume this is what you want:
'xy1y2y3'.gsub(/y\d/) { |s| puts s }
The gsub method accepts a block.
Based on your input and output, this looks about right:
'xy1y2y3'.scan(/y\d/)
# => ["y1", "y2", "y3"]
Use this if you want to print them:
puts 'xy1y2y3'.scan(/y\d/)
# >> y1
# >> y2
# >> y3
String's scan is your friend if you want to look through a string and capture repeating patterns.

ruby regular expression match string between last two delimiters [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I need a regex that matches everything between the last two '/' characters.
For example, for the string tom/jack/sam/jill/ I need to match jill,
and in that case I also need to match tom/jack/sam (without the trailing '/').
Thoughts appreciated!
1)
str = "tom/jack/sam/jill/"
*the_rest, last = str.split("/")
the_rest = the_rest.join("/")
puts last, the_rest
--output:--
jill
tom/jack/sam
2)
str = "tom/jack/sam/jill/"
md = str.match %r{
(.*) #Any character 0 or more times(greedy), captured in group 1
/ #followed by a forward slash
([^/]+) #followed by not a forward slash, one or more times, captured in group 2
}x #Ignore whitespace and comments in regex
puts md[2], md[1] if md
--output:--
jill
tom/jack/sam
If what you want is, given the string tom/jack/sam/jill/, to extract two groups, jill and tom/jack/sam/,
the regexp you need is: ^((?:[^\/]+\/)+)([^\/]+)\/$.
Note that this regexp does not accept a / at the beginning of the string and requires a / at the end of the string.
Take a look: http://rubular.com/r/mxBYtC31N2
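As a sketch of how that regexp might be used from Ruby, the hypothetical helper below wraps it and strips the trailing slash from the first group, since the question asked for tom/jack/sam without it. It uses \A and \z instead of ^ and $, because in Ruby ^ and $ match at line boundaries rather than only at the ends of the string.

```ruby
# Hypothetical helper: "a/b/c/d/" -> ["a/b/c", "d"], or nil if the string
# does not have the expected shape.
def split_last_segment(path)
  m = path.match(%r{\A((?:[^/]+/)+)([^/]+)/\z})
  return nil unless m

  # Group 1 still ends with "/", so drop it.
  [m[1].chomp("/"), m[2]]
end

rest, last = split_last_segment("tom/jack/sam/jill/")
puts last, rest
```

Strings with a leading slash or without a trailing slash return nil, which matches the restrictions noted above.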

In ruby, how do I split a string into an array, using only the last occurrence of a word [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I am creating an event, and I would like to be able to parse a single string and populate the model's attributes. As an example, I would like to do the following:
string = "Workout at the gym at 7pm on July 4th for 1 hour"
From this string, I would like to set the following variables:
title = Workout at the gym
date_time = 7pm on July 4th
duration = 1 hour
If you are always going to use that format you can do:
re = string.match(/(?<title>.*) at (?<date_time>.*) for (?<duration>.*)/)
title, date_time, duration = re[:title], re[:date_time], re[:duration]
# ["Workout at the gym", "7pm on July 4th", "1 hour"]
The following should work for you:
/^(.*) at (\d{1,2}[ap]m.*) for (.*)$/gm
In the replace, you would use:
title = $1\n\ndate_time = $2\n\nduration = $3
Working example: http://regexr.com?35i10
Explanation:
^ means that we want to start at the beginning of the line.
(.*) at means store everything up to " at " in the first group.
(\d{1,2}[ap]m.*) means a 1- or 2-digit number (\d{1,2}) followed by a or p, then m, then everything until...
for, which is simple enough.
(.*)$ means store everything until the end of the line.
/gm tells the regular expression to be global and multi-line. These are regexr-style flags; Ruby has no g flag, and in Ruby ^ and $ already match at line boundaries.
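str = "Workout at the gym at 7pm on July 4th for 1 hour"
# Split on the whole words only, so "at" inside another word is left alone.
a = str.split(/ at | for /).map(&:strip)
# => ["Workout", "the gym", "7pm on July 4th", "1 hour"]
# Rejoin the leftover pieces with " at " so the title keeps its own "at".
duration,date_time,title = a.pop,a.pop,a.join(" at ")
duration # => "1 hour"
date_time # => "7pm on July 4th"
title # => "Workout at the gym"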
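Since the question is about splitting on the last occurrence of a word, String#rpartition is another option worth sketching. It splits at the rightmost occurrence of a separator and leaves earlier occurrences untouched. This assumes the string always ends with " for <duration>" and that the last " at " before it introduces the date.

```ruby
string = "Workout at the gym at 7pm on July 4th for 1 hour"

# rpartition returns [before, separator, after] for the LAST occurrence.
rest, _, duration = string.rpartition(" for ")
title, _, date_time = rest.rpartition(" at ")

puts title, date_time, duration
```

If a separator is missing, rpartition returns empty strings for the first two parts, so `title` would come back empty. Real input probably needs a check for that.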
