Extracting numbers and slashes from a string

I have the following string:
Policy 023203232/02/05 saved successfully
I need to extract 023203232/02/05 from it, so I wrote the following code:
puts a[/\d+\/\d+\/\d+/]
It works fine, but if the number grows with more slashes, like 023203232/02/05/06, I have to add one more \d+. I don't know how many slash-separated numbers there will be, so can anyone suggest a generic solution?
If the string is
Policy 023203232/02/05 saved successfully
then the result should be
023203232/02/05
If the string is
Policy 023203232/02/05/06 saved successfully
then the result should be
023203232/02/05/06
If the string is
Policy 023203232/02/05/06/08 saved successfully
then the result should be
023203232/02/05/06/08
How to write such a generic regular expression?

Assuming contiguous digits and slashes (i.e. no spaces in between) and no other numbers in the string (scan collects every digit run, and join concatenates them all), this should work:
a.scan(/\d+\/?/).join
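A minimal sketch checking this one-liner against the three strings from the question; extract_policy_number is a hypothetical helper name. Because scan collects every digit run in the string, the result is only right while the policy number is the only number present.

```ruby
# Collect every "digits, optionally followed by a slash" chunk and glue them
# back together. Assumes the policy number is the only number in the string.
def extract_policy_number(str)
  str.scan(/\d+\/?/).join
end

[
  "Policy 023203232/02/05 saved successfully",
  "Policy 023203232/02/05/06 saved successfully",
  "Policy 023203232/02/05/06/08 saved successfully"
].each { |s| puts extract_policy_number(s) }
# Prints:
# 023203232/02/05
# 023203232/02/05/06
# 023203232/02/05/06/08
```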

This regex should be what you're looking for:
/(\d+\/?)+/
It means:
One or more repetitions of:
at least one digit
optionally followed by a /
It should be a bit more robust than the other answers:
"Policy 023203232/02/05/06/08 saved successfully"[/(\d+\/?)+/]
# => "023203232/02/05/06/08"
"Policy 023203232/02/05/07/3434343/56 saved successfully 09/56/32"[/(\d+\/?)+/]
# => "023203232/02/05/07/3434343/56"
"Policy // // 023203232/02/05/07/3434343/56 saved successfully 09/56/32"[/(\d+\/?)+/]
# => "023203232/02/05/07/3434343/56"
If you want to make sure that the number comes right after 'Policy ' but don't want 'Policy ' inside your match, you can use a positive lookbehind:
/(?<=Policy )(\d+\/?)+/
Here it is in action:
"2017/03/31 Policy 023203232/02/05/07/3434343/56 saved successfully"[/(?<=Policy )(\d+\/?)+/]
# => "023203232/02/05/07/3434343/56"

This one looks for a series of digits or slashes:
str = "Policy 023203232/02/05/07/3434343/56 saved successfully 09/56/3"
p str.match(/[0-9\/]+/)[0] # => "023203232/02/05/07/3434343/56"
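To make the robustness claim concrete, here is a small comparison of the patterns on the trickier input from the second answer. The last pattern, /\d+(?:\/\d+)*/, is an extra variant not taken from any answer: like /(\d+\/?)+/ it needs a digit first, but it never returns a trailing slash.

```ruby
tricky = "Policy // // 023203232/02/05/07/3434343/56 saved successfully 09/56/32"

# scan + join glues every number together, including the trailing 09/56/32.
joined = tricky.scan(/\d+\/?/).join

# A bare character class returns the first run of digits OR slashes: here "//".
char_class = tricky[/[0-9\/]+/]

# Requiring a digit before each optional slash skips the "//" runs.
grouped = tricky[/(\d+\/?)+/]

# Extra variant (not from the answers): digits, then any number of "/digits",
# so the match can never end with a slash.
no_trailing = tricky[/\d+(?:\/\d+)*/]

p joined      # => "023203232/02/05/07/3434343/5609/56/32"
p char_class  # => "//"
p grouped     # => "023203232/02/05/07/3434343/56"
p no_trailing # => "023203232/02/05/07/3434343/56"
```

The two digit-first patterns only differ on input such as "Policy 12/ saved", where the grouped pattern returns "12/" and the variant returns "12".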

Related

Replace special character with its index

I need to replace all special characters within a string with their index.
For example,
"I-need_to#change$all%special^characters^"
should become:
"I1need6to9change16all20special28characters39"
Each special character has a different index, so each one gets its own number.
I have found many examples that replace every match with a single character, or that count occurrences of a character. I also found a very similar question, but I could not adapt it to replace each character with its index, because I need to replace all of the special characters.
I have also tried to do something like this:
str.gsub!(/[^0-9A-Za-z]/, '')
Here str is my example string.
This removes the special characters entirely, but I want each one replaced with its index instead. Handling either all special characters or only these seven would be enough:
\/*[]:?
I mainly need to replace these seven, but it would be OK to replace all of them.
I need a simple way to do this.
Thanks in advance.

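You can use the global variable $` and the block form of gsub. Inside the block, $` holds the part of the string before the current match, so its length is the index of the matched character: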
irb> str = "I-need_to#change$all%special^characters^"
=> "I-need_to#change$all%special^characters^"
irb> str.gsub(/[^0-9A-Za-z]/) { $`.length }
=> "I1need6to9change16all20special28characters39"
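An equivalent sketch that reads the offset from the current MatchData ($~.begin(0), i.e. Regexp.last_match.begin(0)) instead of measuring the length of $`, and that also covers the question's seven-character variant. The helper name replace_with_index and the constant SEVEN_SPECIALS are hypothetical.

```ruby
# Hypothetical constant: the seven characters from the question, \ / * [ ] : ?
SEVEN_SPECIALS = /[\\\/*\[\]:?]/

# Replace every character matched by `pattern` with its index in `str`.
# Inside the block, $~ is the MatchData for the current match and
# begin(0) is the offset at which that match starts.
def replace_with_index(str, pattern = /[^0-9A-Za-z]/)
  str.gsub(pattern) { $~.begin(0).to_s }
end

puts replace_with_index("I-need_to#change$all%special^characters^")
# => I1need6to9change16all20special28characters39
puts replace_with_index("I-need/this*one?", SEVEN_SPECIALS)
# => I-need6this11one15
```

With SEVEN_SPECIALS the "-" is left alone, because it is not one of the seven characters.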

Extracting a string using regular expression

I need to extract a string like 'MT/23232' from messages like the ones below. I have written some code, but it's not working. Can anyone help me here?
'Policy created with MT/1212'
'Policy created with MT/121212'
'Policy created with MT/21212121212'
I have written this code:
msg="MT/33235"
id = msg.scan(/MT/\d+/\d+/)[0]
But it's not working for me. Can anyone help me extract this string?

You need to escape the forward slash next to MT in your regex (an unescaped / ends the regex literal), and you don't need a forward slash after \d+. I also suggest adding a lookbehind so that you get a clean result: (?<=\s) is a positive lookbehind which asserts that the match must be preceded by a whitespace character.
msg.scan(/(?<=\s)MT\/\d+/)[0]
If you don't care about the preceding character, the regex below is fine:
msg.scan(/MT\/\d+/)[0]
Example:
> msg = 'Policy created with MT/21212121212'
=> "Policy created with MT/21212121212"
> msg.scan(/(?<=\s)MT\/\d+/)[0]
=> "MT/21212121212"
> msg.match(/(?<=\s)MT\/\d+/)[0]
=> "MT/21212121212"

If the MT/... part always runs to the end of the line:
your_string.scan(/\sMT.*$/).last.strip
If your required substring can be anywhere in the string, then:
your_string.scan(/\bMT\/\d+\b/).last.strip # "\b" is for word boundaries
Or you can specify the acceptable digits this way:
your_string.scan(/\bMT\/[0-9]+\b/).last.strip
Lastly, if the string format is going to remain as you specified, then:
your_string.split.last
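A short sketch running the word-boundary pattern over the three sample messages from the question, using String#[] instead of scan. This is a swapped-in call, not one from either answer; the reason for it is that String#[] returns nil when nothing matches, whereas scan(...).last.strip raises NoMethodError on nil.

```ruby
inputs = [
  "Policy created with MT/1212",
  "Policy created with MT/121212",
  "Policy created with MT/21212121212"
]

# String#[] with a regex returns the first match, or nil when there is none.
inputs.each { |msg| puts msg[/\bMT\/\d+\b/] }
# Prints:
# MT/1212
# MT/121212
# MT/21212121212

p "Policy created without an id"[/\bMT\/\d+\b/] # => nil
```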

ruby extract string between two string

I have a string like this:
str1='"{\"#Network\":{\"command\":\"Connect\",\"data\":
{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'
I want to extract the somename string from it. The values of xx:xx:xx:xx:xx:xx, somename and 123456789 can change, but the syntax will remain the same as above.
I saw similar posts on this site, but I don't know how to use a regex in this case.
Any ideas how to extract it?

Parse the string as JSON and get the values that way.
require 'json'
str = "{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"
json = JSON.parse(str.strip)
name = json["#Network"]["data"]["Name"]
pwd = json["#Network"]["data"]["Pwd"]
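The JSON answer above assumes the string holds real quotes and a NUL byte, as in its double-quoted literal. In the question, str1 is single-quoted, so it actually holds literal backslashes before each quote, a wrapping pair of quotes and a literal backslash-zero, which JSON.parse rejects as-is. A minimal sketch, assuming exactly that layout and that the values themselves contain no escaped characters, unwraps and unescapes the text before parsing:

```ruby
require 'json'

# Assumption: the input looks exactly like the question's single-quoted str1.
raw = '"{\"#Network\":{\"command\":\"Connect\",\"data\":
{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'

# Drop the leading quote and the trailing \0" (3 characters),
# then turn every backslash-quote back into a plain quote.
json_text = raw[1..-4].gsub('\"', '"')

data = JSON.parse(json_text)
puts data["#Network"]["data"]["Name"] # => somename
puts data["#Network"]["data"]["Pwd"]  # => 123456789
```

The naive gsub would corrupt values that legitimately contain escaped quotes, which is why the lead-in assumption matters.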

Since you don't know regex, let's leave it out for now and try manual parsing, which is a bit easier to understand.
Your original input, without the outer apostrophes and the variable name, is:
"{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"
You say that you need to get the 'somename' value and that the 'grammar will not change'. Cool!
First, look at what delimits that value: it has quotes, then there's a colon to the left and a comma to the right. However, looking at other parts, the same layout is also used near the command and near the pwd, so colon-quote-data-quote-comma is not enough. Looking further to the sides, there's a \"Name\". It never occurs anywhere in the input data except in this place. This is just great! It means we can quickly find the whereabouts of the data just by searching for the \"Name\" text:
inputdata = .....
estposition = inputdata.index('\"Name\"')
raise "well-known marker was not found in the input" unless estposition
Now we know:
where the part starts
and that after the "Name" text there's always a colon, a quote, and then the interesting data
and that there's always a quote after the interesting data
Let's find all of them:
colonquote = inputdata.index(':\"', estposition)
datastart = colonquote+3
lastquote = inputdata.index('\"', datastart)
dataend = lastquote-1
index returns the start position of the match, so it returns the position of the : and the position of the \. Since we want the text between them, we must add or subtract a few positions to move past the :\" at the beginning, or back from the \" at the end.
Then, fetch the data from between them:
value = inputdata[datastart..dataend]
And that's it.
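The steps above can be collected into one runnable sketch. It assumes the input has the literal-backslash layout of the question's single-quoted str1, and extract_name is a hypothetical wrapper name; the body is the answer's own index arithmetic.

```ruby
# Same layout as the question's single-quoted str1: literal backslashes before quotes.
inputdata = '"{\"#Network\":{\"command\":\"Connect\",\"data\":
{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'

# Hypothetical helper wrapping the index-based steps from the answer.
def extract_name(inputdata)
  estposition = inputdata.index('\"Name\"')
  raise "well-known marker was not found in the input" unless estposition

  colonquote = inputdata.index(':\"', estposition) # the :\" right after \"Name\"
  datastart = colonquote + 3                        # skip the 3 characters : \ "
  lastquote = inputdata.index('\"', datastart)      # the \" that closes the value
  dataend = lastquote - 1

  inputdata[datastart..dataend]
end

puts extract_name(inputdata) # => somename
```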
Now, step back and look at the input data once again. You say that grammar is always the same. The various bits are obviously separated by colons and commas. Let's try using it directly:
parts = inputdata.split(/[:,]/)
=> ["\"{\\\"#Network\\\"",
"{\\\"command\\\"",
"\\\"Connect\\\"",
"\\\"data\\\"",
"\n{\\\"Id\\\"",
"\\\"xx",
"xx",
"xx",
"xx",
"xx",
"xx\\\"",
"\\\"Name\\\"",
"\\\"somename\\\"",
"\\\"Pwd\\\"",
"\\\"123456789\\\"}}}\\0\""]
Please ignore the regex for now; just assume it means "a colon or a comma". Now parts holds all the, well, parts that were produced by cutting the inputdata into pieces at every colon or comma.
If the layout never changes, your interesting data will always be the 13th element (index 12):
almostvalue = parts[12]
=> "\\\"somename\\\""
Now just strip the spurious characters. Since the grammar is constant, there are 2 characters to cut from each side:
value = almostvalue[2..-3]
OK, another way. Since regexes have already shown up, let's try them. We know:
the data is prefixed with \"Name\", then a colon and a backslash-quote
the data consists of some text without quotes inside (well, at least I guess so)
the data ends with a backslash-quote
In regex syntax, the parts would be, respectively:
\"Name\":\"
[^\"]*
\"
Together:
inputdata =~ /\\"Name\\":\\"([^\"]*)\\"/
value = $1
Note that I surrounded the interesting part with (), so after a successful match that part is available in the special variable $1.
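The same regex idea can be written with String#[] and a capture-group index, so no $1 is needed. This sketch assumes the literal-backslash layout of the question's str1, and it excludes backslashes from the value as well ([^\\"]* rather than [^\"]*), so the match does not rely on backtracking past the closing backslash.

```ruby
inputdata = '"{\"#Network\":{\"command\":\"Connect\",\"data\":
{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'

# \\" matches a literal backslash followed by a quote; [^\\"]* stops at either,
# so capture group 1 holds just the value between \"Name\":\" and the next \".
name = inputdata[/\\"Name\\":\\"([^\\"]*)\\"/, 1]
pwd  = inputdata[/\\"Pwd\\":\\"([^\\"]*)\\"/, 1]

puts name # => somename
puts pwd  # => 123456789
```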
Yet another way:
If you look at the grammar carefully, it really resembles a set of embedded hashes:
\"
{ \"#Network\" :
{ \"command\" : \"Connect\",
\"data\" :
{ \"Id\" : \"xx:xx:xx:xx:xx:xx\",
\"Name\" : \"somename\",
\"Pwd\" : \"123456789\"
}
}
}
\0\"
If we wrote the same thing as a Ruby hash:
{ "#Network" =>
{ "command" => "Connect",
"data" =>
{ "Id" => "xx:xx:xx:xx:xx:xx",
"Name" => "somename",
"Pwd" => "123456789"
}
}
}
What's the difference? The colon was replaced with =>, and the backslashes before the quotes are gone. Oh, and the opening/closing \" is gone, and so is the \0 at the end. Let's play:
tmp = inputdata[1..-4] # drop the leading " and the trailing \0"
tmp.gsub!('\"', '"') # replace every \" with just "
Now, what about the colons? We cannot just replace : with =>, because that would damage the internal colons of the xx:xx:xx:xx:xx:xx part. But look: all the other colons always have a quote right before them!
tmp.gsub!('":', '"=>') # replace every quote-colon with quote-arrow
Now our tmp is:
{"#Network"=>{"command"=>"Connect","data"=>{"Id"=>"xx:xx:xx:xx:xx:xx","Name"=>"somename","Pwd"=>"123456789"}}}
formatted a little:
{ "#Network"=>
{ "command"=>"Connect",
"data"=>
{ "Id"=>"xx:xx:xx:xx:xx:xx","Name"=>"somename","Pwd"=>"123456789" }
}
}
So, it looks just like a Ruby hash. Let's try 'destringizing' it:
packeddata = eval(tmp)
value = packeddata['#Network']['data']['Name']
Done.
Well, this has grown a bit and Jonas was obviously faster, so I'll leave the JSON part to him since he has already written it ;) The data was so similar to a Ruby hash because it was obviously formatted as JSON, which is a hash-like structure too. Using the proper format-reading tools is usually the best idea, but keep in mind that when the JSON library is asked to read the data, it will read all of it, and only then can you ask it "what was inside at the key xx/yy/zz", just like I showed you with the read-it-as-a-Hash attempt. Sometimes, when your program is under tight time constraints, you cannot afford to read it all. Then scanning with a regex, or scanning manually for "known markers", may (not must) be much faster and thus preferable, but still much less convenient. Have fun.

Rails 3 + regex - Replace part of a string, 1 occurrence

I'm new to Rails, and furthermore to regex. Been looking around, but I'm blocked...
I have a string like this:
Current: http://zs.domain.com/user_images/123456789/imageName_size.ext
Wanted: http://zs.domain.com/user_images/123456789/imageName.ext
I've managed to get this:
http://a0.twimg.com/profile/1240267050/logo1.png
=> losing all occurrences with
picture.gsub!(/_([a-z0-9-]+)/, '')
or this:
http://a0.twimg.com/profile_images/1240267050/logo1
=> changing only the last occurrence, but losing the extension with
picture.gsub!(/_([a-z0-9-]+).(png|gif|jpg|jpeg)/, '')

You're almost there. The second parameter is the string with which the match will be replaced, and you can re-use matched groups from the match. This will do the trick:
picture.gsub!(/_([a-z0-9-]+).(png|gif|jpg|jpeg)/, '.\2')
To accommodate the additional conditions posed in the comment:
picture.gsub!(/_([^\/]+).(png|gif|jpg|jpeg)/, '.\2')

markijbema's answer will change the string
.../xxx_yyygifzzz/...,
into
.../xxxgifzzz/....
In order to avoid that, you can do this:
picture.gsub!(/_[^\/]+(?=\.[^\.]+\z)/, '')
(?=...) is a lookahead: it requires its pattern to follow the match, but that text is not included in the match.
\z matches the end of the string, so this regexp is safe to use even when some intermediate directory contains a string like the one above.
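A quick check of the lookahead pattern on the question's URL. sub is used because only one occurrence should change. The second URL and the LAST_SIZE_SUFFIX constant are hypothetical, added to show that the greedy [^\/]+ also swallows underscores that belong to the image name itself.

```ruby
# Pattern from the answer: "_" plus non-slash characters, but only when a final
# ".ext" (a dot, then non-dot characters up to the end of the string) comes next.
SIZE_SUFFIX = /_[^\/]+(?=\.[^\.]+\z)/

# Hypothetical variant: also exclude "_" from the suffix, so only the last
# "_size" segment is removed when the image name contains underscores.
LAST_SIZE_SUFFIX = /_[^\/_]+(?=\.[^\.]+\z)/

url  = "http://zs.domain.com/user_images/123456789/imageName_size.ext"
url2 = "http://zs.domain.com/user_images/123456789/my_image_size.ext"

# sub replaces only the first match, which is the single occurrence we want.
puts url.sub(SIZE_SUFFIX, '')       # => http://zs.domain.com/user_images/123456789/imageName.ext
puts url2.sub(SIZE_SUFFIX, '')      # => http://zs.domain.com/user_images/123456789/my.ext
puts url2.sub(LAST_SIZE_SUFFIX, '') # => http://zs.domain.com/user_images/123456789/my_image.ext
```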

Why doesn't this Ruby replace regex work as expected?

Consider the following string which is a C fragment in a file:
strcat(errbuf,errbuftemp);
I want to replace errbuf (but not errbuftemp) with the prefix G-> plus errbuf. To do that successfully, I check the character after and the character before errbuf to see if it's in a list of approved characters and then I perform the replace.
I created the following Ruby file:
line = " strcat(errbuf,errbuftemp);"
item = "errbuf"
puts line.gsub(/([ \t\n\r(),\[\]]{1})#{item}([ \t\n\r(),\[\]]{1})/, "#{$1}G\->#{item}#{$2}")
Expected result:
strcat(G->errbuf,errbuftemp);
Actual result
strcatG->errbuferrbuftemp);
Basically, the matched characters before and after errbuf are not reinserted back with the replace expression.
Can anyone point out what I'm doing wrong?

Because you must use either the block form, gsub(/.../){"...#{$1}...#{$2}..."}, or backreferences in a single-quoted replacement string, gsub(/.../,'...\1...\2...').
Here was the same problem: werid, same expression yield different value when excuting two times in irb
The problem is that the variable $1 is interpolated into the argument string before gsub is run, meaning that the previous value of $1 is what the symbol gets replaced with. You can replace the second argument with '\1 ?' to get the intended effect. (Chuck)

I think part of the problem is the use of gsub() instead of sub().
Here are two alternatives:
str = 'strcat(errbuf,errbuftemp);'
str.sub(/\w+,/) { |s| 'G->' + s } # => "strcat(G->errbuf,errbuftemp);"
str.sub(/\((\w+)\b/, '(G->\1') # => "strcat(G->errbuf,errbuftemp);"
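A sketch of both fixes named in the first answer, applied to the question's own pattern, plus a variant with lookarounds. The lookaround version is a swapped-in technique, not from the answers: because the captured delimiters are consumed, "errbuf,errbuf" only gets its first occurrence prefixed, while lookarounds leave the shared comma available for the next match. The variable names are illustrative.

```ruby
line = " strcat(errbuf,errbuftemp);"
item = "errbuf"

# Delimiters allowed before/after the identifier (same set as in the question).
delim = '[ \t\n\r(),\[\]]'

# Fix 1: read $1/$2 inside a block, which runs once per match.
captured = /(#{delim})#{Regexp.escape(item)}(#{delim})/
with_block = line.gsub(captured) { "#{$1}G->#{item}#{$2}" }

# Fix 2: \1 and \2 backreferences in the replacement string ("\\1" is \1).
with_refs = line.gsub(captured, "\\1G->#{item}\\2")

# Variant: lookarounds check the neighbours without consuming them.
around = /(?<=#{delim})#{Regexp.escape(item)}(?=#{delim})/
with_lookaround = "f(errbuf,errbuf);".gsub(around, "G->#{item}")

puts with_block      # =>  strcat(G->errbuf,errbuftemp);
puts with_refs       # =>  strcat(G->errbuf,errbuftemp);
puts with_lookaround # => f(G->errbuf,G->errbuf);
```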
