Extracting strings out of one long string in Ruby

I have this really long string and I would like to extract specific strings out of it in a list form.
The string:
[#<User id: 1, login: "test", hash ... ]
I would like to extract everything that appears between login: " and ", so in this case it would be the word test. This string can be arbitrarily long, but the pattern will be the same. How can I extract the words as a list?
Thanks!

string.scan(/login: "(.*?)",/).flatten
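A quick check using a made-up two-user string in the same shape as the question's (the second user and the hash values are invented for illustration). With a capture group, scan returns one array per match, so flatten is what turns the result into a plain list:

```ruby
# Hypothetical sample in the same shape as the question's string.
string = '[#<User id: 1, login: "test", hash: "abc">, #<User id: 2, login: "alice", hash: "def">]'

# With a capture group, scan returns one array per match: [["test"], ["alice"]]
pairs = string.scan(/login: "(.*?)",/)

# flatten turns that into a plain list of logins
logins = pairs.flatten
p logins  # => ["test", "alice"]
```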

Related

How to remove the last character from a string in JMeter

I have a string like the one below and am trying to remove the last character from it. Can someone please help with this?
What if I have a lengthy string and I only want to remove its last character?
Example: "city": "Winston Salem","state": "NC","zip": "27127","country": " "}}
I only want to remove the last '}'.
Use a String method like replace:
String newString = oldString.replace("}}", "}");
Note that replace changes every occurrence of "}}", not only the last one. If that's not your case, you can use any other method of the String API, as long as the value is cast to a String; for example, oldString.substring(0, oldString.length() - 1) removes exactly the last character.

Parse multiline fixed-width text files in ruby

I am trying to parse a multi-line fixed-width file in Ruby, and it seems I can't parse the information I need. I can parse it fine when the information is on one line, for example:
Name LastName DOB
John Doe 01/01/2001
Jane Doe 01/02/2002
But where I am facing a challenge is when the file has a structure like the one below:
This message needs to be AccountId: 7854639
parsed in a single key Phone: 823972839563
of the json that I want to produce Email: test#test.com
The multiline text is always, let's say, at the same coordinates, and it is dynamic. I'm not sure how to parse this and map it into a JSON value, for example.
Here's a simplistic, un-golfed approach:
freeform_text = str.split("\n").map do |s|
  # keep only the free-form text to the left of each "Key: value" pair
  m = s.match(/^(.*)\s+(.*):(.*)$/)
  m ? m[1].strip : ''
end.join(' ')
# Produces:
# "This message needs to be parsed in a single key of the json that I want to produce"
There are other, more-idiomatic approaches, but this gives you a hint of the direction to take.
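Since the question says the text always sits at the same coordinates, another option is to slice each line by column instead of matching a regex. This is a sketch under assumptions: the key/value area is taken to start at column 40 (KEY_COLUMN is a hypothetical constant, because the real offset isn't given), and the output key "message" is also made up. It joins the left-hand text into one string, collects the right-hand "Key: value" pairs into a Hash, and serializes the result with Ruby's standard json library.

```ruby
require 'json'

# Hypothetical: column where the "Key: value" area starts in the fixed-width file.
KEY_COLUMN = 40

# Sample input rebuilt in that assumed layout (left text padded to KEY_COLUMN).
rows = [
  ['This message needs to be', 'AccountId: 7854639'],
  ['parsed in a single key', 'Phone: 823972839563'],
  ['of the json that I want to produce', 'Email: test#test.com']
]
text = rows.map { |left, right| left.ljust(KEY_COLUMN) + right }.join("\n")

record = {}
message_parts = []
text.each_line do |raw|
  line = raw.chomp
  # Columns 0...KEY_COLUMN hold the free-form text.
  message_parts << line[0, KEY_COLUMN].to_s.strip
  # The rest of the line holds one "Key: value" pair; split on the first colon only.
  key, value = line[KEY_COLUMN..-1].to_s.split(/:\s*/, 2)
  record[key] = value if key && value
end
# "message" is a hypothetical key name for the joined free-form text.
record['message'] = message_parts.reject(&:empty?).join(' ')

puts JSON.generate(record)
```

Compared with the regex, slicing by column doesn't care which characters appear in the free text, but it breaks as soon as the column offset changes.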
str = "This message needs to be AccountId: 7854639
parsed in a single key Phone: 823972839563
of the json that I want to produce Email: test#test.com"
p str.scan(/([^\s]+:[^\n]+)/).flatten
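To get from those "Key: value" strings to something JSON-ready, one option (a sketch, not part of the answer above) is to split each string on its first colon and collect the pairs into a Hash:

```ruby
require 'json'

str = "This message needs to be AccountId: 7854639
parsed in a single key Phone: 823972839563
of the json that I want to produce Email: test#test.com"

# Same scan as above: one "Key: value" string per line.
pairs = str.scan(/([^\s]+:[^\n]+)/).flatten

# Split on the first colon (and any spaces after it), then build a Hash.
fields = pairs.map { |pair| pair.split(/:\s*/, 2) }.to_h

puts fields.to_json
# prints {"AccountId":"7854639","Phone":"823972839563","Email":"test#test.com"}
```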

Ruby: extract a string between two strings

I have a string like the one below:
str1='"{\"#Network\":{\"command\":\"Connect\",\"data\":
{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'
I wanted to extract the somename string from the above string. Values of xx:xx:xx:xx:xx:xx, somename and 123456789 can change but the syntax will remain same as above.
I saw similar posts on this site but don't know how to use regex in the above case.
Any ideas how to extract the above string.
Parse the string to JSON and get the values that way.
require 'json'
str = "{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"
json = JSON.parse(str.strip)
name = json["#Network"]["data"]["Name"]
pwd = json["#Network"]["data"]["Pwd"]
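A self-contained version of the JSON approach for trying it out. Hash#dig (Ruby 2.3+) is used here as a shorthand for the chained [] calls; it returns nil instead of raising when a level is missing:

```ruby
require 'json'

str = "{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"

# strip also removes the trailing NUL ("\0") character before parsing
json = JSON.parse(str.strip)

# dig walks the nested keys one level at a time
name = json.dig("#Network", "data", "Name")
pwd  = json.dig("#Network", "data", "Pwd")
puts name
puts pwd
```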
Since you don't know regex, let's leave it out for now and try manual parsing, which is a bit easier to understand.
Your original input, without the outer apostrophes and the variable name, is:
"{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"
You say that you need to get the 'somename' value and that the 'grammar will not change'. Cool!
First, look at what delimits that value: it is wrapped in quotes, with a colon to the left and a comma to the right. However, the same layout is used elsewhere too (around the command value, for example), so colon-quote-data-quote-comma is not enough. Looking further to the left, there's a \"Name\", and it occurs nowhere else in the input. This is just great! It means we can quickly find where the data is just by searching for the \"Name\" text:
inputdata = .....
estposition = inputdata.index('\"Name\"')
raise "well-known marker was not found in the input" unless estposition
Now we know:
where the part starts,
that after the "Name" text there's always a colon, a backslash-quote, and then the interesting data,
and that there's always a backslash-quote after the interesting data.
Let's find all of them:
colonquote = inputdata.index(':\"', estposition)
datastart = colonquote+3
lastquote = inputdata.index('\"', datastart)
dataend = lastquote-1
index returns the start position of the match, so it returns the position of the : and the position of the \. Since we want the text between them, we have to add or subtract a few positions: move past the :\" at the beginning and step back from the \" at the end.
Then, fetch the data from between them:
value = inputdata[datastart..dataend]
And that's it.
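Put together as a runnable script. The input is written with single quotes, as in the question, so the backslashes stay literal characters:

```ruby
# Single quotes keep \" and \0 as literal backslash sequences, as in the question.
inputdata = '"{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'

estposition = inputdata.index('\"Name\"')
raise 'well-known marker was not found in the input' unless estposition

colonquote = inputdata.index(':\"', estposition) # position of the ':'
datastart  = colonquote + 3                      # skip the 3 characters :\"
lastquote  = inputdata.index('\"', datastart)    # position of the closing backslash
dataend    = lastquote - 1

value = inputdata[datastart..dataend]
puts value # prints somename
```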
Now, step back and look at the input data once again. You say that the grammar is always the same. The various bits are obviously separated by colons and commas, so let's use that directly:
parts = inputdata.split(/[:,]/)
=> ["\"{\\\"#Network\\\"",
"{\\\"command\\\"",
"\\\"Connect\\\"",
"\\\"data\\\"",
"\n{\\\"Id\\\"",
"\\\"xx",
"xx",
"xx",
"xx",
"xx",
"xx\\\"",
"\\\"Name\\\"",
"\\\"somename\\\"",
"\\\"Pwd\\\"",
"\\\"123456789\\\"}}}\\0\""]
Please ignore the regex for now; just assume it says "a colon or a comma". Now parts holds all the, well, parts that were produced by cutting inputdata into pieces at every colon or comma.
If the layout never changes, the interesting data will always be the 13th element (index 12):
almostvalue = parts[12]
=> "\\\"somename\\\""
Now just strip the spurious characters. Since the grammar is constant, there are 2 characters to cut from each side:
value = almostvalue[2..-3]
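The split-based version as a runnable script. The check on parts[11] is an addition, not part of the answer: a cheap guard that fails loudly if the layout shifts, since a fixed index otherwise silently returns the wrong piece.

```ruby
inputdata = '"{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'

# Cut at every colon or comma.
parts = inputdata.split(/[:,]/)

# Extra guard (not in the answer): the piece before the value must be the \"Name\" key.
raise 'unexpected layout' unless parts[11] == '\"Name\"'

almostvalue = parts[12]    # \"somename\" (with literal backslashes)
value = almostvalue[2..-3] # drop the 2-character \" from each side
puts value
```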
OK, another way. Since a regex has already shown up, let's use one. We know that:
the data is prefixed with \"Name\", then a colon and a backslash-quote;
the data consists of some text without quotes inside (well, at least I guess so);
the data ends with a backslash-quote.
The parts in regex syntax would be, respectively (a literal backslash must be written as \\ inside a regex):
\\"Name\\":\\"
[^\"]*
\\"
Together:
inputdata =~ /\\"Name\\":\\"([^\"]*)\\"/
value = $1
Note that I surrounded the interesting part with (), so after a successful match that part is available in the $1 special variable.
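The same regex as a runnable snippet, shown both with =~ plus $1 (as in the answer) and with String#[] given a regex and a group index, which returns the captured text directly, or nil when there is no match:

```ruby
inputdata = '"{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'

# Group 1 captures the text between \"Name\":\" and the next \".
pattern = /\\"Name\\":\\"([^\"]*)\\"/

# Variant 1: =~ plus the $1 special variable.
value = $1 if inputdata =~ pattern

# Variant 2: String#[] with a group index.
value2 = inputdata[pattern, 1]

puts value
puts value2
```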
Yet another way:
If you look at the grammar carefully, it really resembles a set of nested hashes:
"
{ \"#Network\" :
{ \"command\" : \"Connect\",
\"data\" :
{ \"Id\" : \"xx:xx:xx:xx:xx:xx\",
\"Name\" : \"somename\",
\"Pwd\" : \"123456789\"
}
}
}
\0"
Written as Ruby hashes, it would look something like this:
{ "#Network" =>
{ "command" => "Connect",
"data" =>
{ "Id" => "xx:xx:xx:xx:xx:xx",
"Name" => "somename",
"Pwd" => "123456789"
}
}
}
What's the difference? The colons were replaced with =>, and the backslashes before the quotes are gone. Also, the opening quote is gone, and so are the \0 and the closing quote at the end. Let's play:
tmp = inputdata[1..-4] # remove the opening " and the closing \0"
tmp.gsub!('\"', '"') # replace every \" with just "
Now, what about the colons? We cannot just replace every : with =>, because that would damage the internal colons of the xx:xx:xx:xx:xx:xx part. But look: all the other colons are always preceded by a quote!
tmp.gsub!('":', '"=>') # replace every quote-colon with quote-arrow
Now our tmp is:
{"#Network"=>{"command"=>"Connect","data"=>{"Id"=>"xx:xx:xx:xx:xx:xx","Name"=>"somename","Pwd"=>"123456789"}}}
formatted a little:
{ "#Network"=>
{ "command"=>"Connect",
"data"=>
{ "Id"=>"xx:xx:xx:xx:xx:xx","Name"=>"somename","Pwd"=>"123456789" }
}
}
So, it looks just like a Ruby hash. Let's try 'destringizing' it:
packeddata = eval(tmp) # caution: eval runs arbitrary Ruby code, so only use it on trusted input
value = packeddata['#Network']['data']['Name']
Done.
Well, this has grown a bit, and Jonas was obviously faster, so I'll leave the JSON part to him since he has already written it ;) The data looked so similar to a Ruby hash because it is formatted as JSON, which is a hash-like structure too. Using the proper format-reading tools is usually the best idea, but keep in mind that the JSON library reads all of the data first, and only then can you ask it "what was inside at the key xx/yy/zz", just like I showed you with the read-it-as-a-Hash attempt. Sometimes, when your program is very short on time, you cannot afford to read it all. Then scanning with a regex, or scanning manually for "known markers", may (not must) be much faster and thus preferable, but it is still much less convenient. Have fun.
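Since the data really is JSON, the eval step can also be replaced by JSON.parse once the wrapper and the backslashes are gone, which avoids running the input as Ruby code. This sketch swaps eval for Ruby's standard json library and assumes the input keeps exactly the wrapper shown in the question (one leading quote, then a trailing \0 and quote):

```ruby
require 'json'

inputdata = '"{\"#Network\":{\"command\":\"Connect\",\"data\":{\"Id\":\"xx:xx:xx:xx:xx:xx\",\"Name\":\"somename\",\"Pwd\":\"123456789\"}}}\0"'

# Drop the leading " (index 0) and the trailing \0" (last 3 characters).
tmp = inputdata[1..-4]
# Turn every backslash-quote into a plain quote; what is left is valid JSON.
tmp = tmp.gsub('\"', '"')

data = JSON.parse(tmp)
value = data['#Network']['data']['Name']
puts value
```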

How do I extract the right most number in a string?

I have strings like this:
https://www.facebook.com/username_with_number_14/posts/101505775425654414
https://www.facebook.com/username/posts/101505775425654466
I need to extract the number on the end of the string in Ruby. In the first string, it is the second and last number, whereas in the second string it is the first, only and last number.
At the moment I am extracting the number like this:
int1 = Regexp.new('.*?(\\d+)',Regexp::IGNORECASE).match()[1]
But when this is applied to the first string, it extracts the number part of the username, not the desired number.
How can I do it so that it will work on both strings?
text = <<ENDTEXT
https://www.facebook.com/username_with_number_14/posts/101505775425654414
https://www.facebook.com/username/posts/101505775425654466
ENDTEXT
p text.lines.map{|line| line.scan(/\d+/).last}
#=> ["101505775425654414", "101505775425654466"]
For me, a regexp like this works:
^.*?(\d+)$
Look here: http://rubular.com/r/CJzsgjedqJ
Try this:
int1 = Regexp.new('.*\\/(\\d+)$',Regexp::IGNORECASE).match(url)[1] # url: the string being searched (name chosen for illustration)
In Ruby, $ matches the end of the line (\z anchors to the very end of the string). So capturing group 1 holds all the digits from the last / to the end.
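For comparison, a sketch that anchors the match at the very end of the string with \z and uses String#[] to return the matched digits (or nil if the string doesn't end in a number):

```ruby
urls = [
  'https://www.facebook.com/username_with_number_14/posts/101505775425654414',
  'https://www.facebook.com/username/posts/101505775425654466'
]

# \d+\z matches the run of digits that ends the string; String#[] returns it or nil.
ids = urls.map { |url| url[/\d+\z/] }
p ids # => ["101505775425654414", "101505775425654466"]
```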

Ruby: How can I process a CSV file with "bad commas"?

I need to process a CSV file from FedEx.com containing shipping history. Unfortunately FedEx doesn't seem to actually test its CSV files as it doesn't quote strings that have commas in them.
For instance, a company name might be "Dog Widgets, Inc." but the CSV doesn't quote that string, so any CSV parser thinks that comma before "Inc." is the start of a new field.
Is there any way I can reliably parse those rows using Ruby?
The only differentiating characteristic that I can find is that the commas that are part of a string have a space after them. Commas that separate fields have no spaces. No clue how that helps me parse this, but it is something I noticed.
You can use a negative lookahead:
>> "foo,bar,baz,pop, blah,foobar".split(/,(?![ \t])/)
=> ["foo", "bar", "baz", "pop, blah", "foobar"]
Well, here's an idea: You could replace each instance of comma-followed-by-a-space with a unique character, then parse the CSV as usual, then go through the resulting rows and reverse the replace.
Perhaps something along these lines, using gsub to change the ', ' to something else:
ruby-1.9.2-p0 > "foo,bar,baz,pop, blah,foobar".gsub(/,\ /,'| ').split(',')
[
[0] "foo",
[1] "bar",
[2] "baz",
[3] "pop| blah",
[4] "foobar"
]
and then remove the | afterwards.
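A fuller sketch of the placeholder idea using Ruby's csv library. The sample row and the placeholder character ("\u0001", assumed never to occur in real data) are made up for illustration:

```ruby
require 'csv'

# Hypothetical FedEx-style row: the comma in "Dog Widgets, Inc." is not quoted.
line = 'ABC123,Dog Widgets, Inc.,2012-05-01,Delivered'

# Placeholder assumed never to appear in the real data.
PLACEHOLDER = "\u0001"

# 1. Hide every comma that is followed by a space.
masked = line.gsub(', ', "#{PLACEHOLDER} ")
# 2. Parse the masked row with the regular CSV parser.
row = CSV.parse_line(masked)
# 3. Restore the hidden commas (empty CSV fields come back as nil).
fields = row.map { |field| field && field.gsub(PLACEHOLDER, ',') }

p fields # => ["ABC123", "Dog Widgets, Inc.", "2012-05-01", "Delivered"]
```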
If you are lucky enough to have only one field like that, you can parse the leading fields off the start and the trailing fields off the end, and assume whatever is left is the offending field. In Python (no habla ruby) this would look something like:
fields = line.split(',') # doesn't work if some fields are quoted
fields = fields[:5] + [','.join(fields[5:-3])] + fields[-3:]
Whatever you do, you should at a minimum be able to determine the number of offending commas, and that gives you something (a sanity check, if nothing else).
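The same slicing in Ruby, including the comma-count sanity check, as a sketch. The 5 leading and 3 trailing field counts are the hypothetical ones from the Python snippet, and the method name is made up. Note split(',', -1): without the -1, Ruby drops trailing empty fields, which Python's split keeps.

```ruby
# Hypothetical layout taken from the Python snippet: 5 fields before the bad one, 3 after.
LEADING  = 5
TRAILING = 3
EXPECTED = LEADING + 1 + TRAILING

# Hypothetical helper: rejoin the middle pieces into the single offending field.
def split_with_one_bad_field(line)
  fields = line.split(',', -1) # -1 keeps trailing empty fields, like Python's split
  offending_commas = fields.size - EXPECTED
  # Sanity check: fewer pieces than expected means the layout assumption is wrong.
  raise ArgumentError, "expected at least #{EXPECTED} fields, got #{fields.size}" if offending_commas < 0

  middle = fields[LEADING...(fields.size - TRAILING)].join(',')
  fields.first(LEADING) + [middle] + fields.last(TRAILING)
end

p split_with_one_bad_field('a,b,c,d,e,Dog Widgets, Inc.,x,y,z')
```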
