URI with variable - ruby

I've got a file with uids on separate lines, and I'm trying to include them in a URI.
File.open("File Path").readlines.each do |line|
puts line
uid = line
uri = URI("http://example:port/path/variable=#{uid}&fragment")
res = Net::HTTP.get_response(uri)
puts res.body
But I get an error saying "bad URI(is not URI?)".
Could anyone help?
Thanks

Can you try this?
uid = line.strip
strip removes leading and trailing whitespace, including newlines.
With
p uid
or
puts uid.inspect
you may see the real content of the string.

It depends a lot on what you are actually feeding it, but I recommend trying these two things to troubleshoot your code.
Use puts "[#{uid}]" to see exactly what the line variable contains. This will help you notice a trailing newline, for example: the right square bracket will be printed on the next line, and you will know your input is malformed.
Try constructing the URL like this: uri = URI("http://example:port/path/variable=#{URI.encode_www_form_component(uid)}&fragment"). This escapes characters that are not normally allowed in a URI. URI.encode did a similar job in older Rubies but was removed in Ruby 3.0.
Hope this helps.

Do you mean this?
uri = URI("http://example:port/path/?variable=#{uid}&fragment")
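Putting the suggestions above together, a minimal sketch might look like the following. The host example.com, the port 8080, and the helper name build_uri are placeholders that do not appear in the question, and URI.encode_www_form_component is only one possible way to escape the value.

```ruby
require 'uri'

# Hypothetical helper: builds the request URI for one line of the uid file.
# The host and port are placeholders, not values from the question.
def build_uri(raw_line)
  uid = raw_line.strip                          # drop the trailing newline and stray spaces
  escaped = URI.encode_www_form_component(uid)  # escape characters not allowed in a query value
  URI("http://example.com:8080/path/?variable=#{escaped}")
end

uri = build_uri("abc 123\n")
puts uri  # the space is escaped and the newline is gone
```

In the original loop this would replace the uri = URI(...) line, with line passed in as raw_line.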

Related

Issue copying file into new file gsub with regex, variable and string?

I'm struggling with a script to target specific XML files in a directory and rename them as copies with a different name.
I put in the puts statements for debugging, and, from what I can tell, everything looks OK until the FileUtils.cp line. I tried this with simpler text and it worked, but my overly complicated cp(file, file.gsub()) seems to be causing problems that I can't figure out.
def piano_treatment(cats)
FileUtils.chdir('12Piano')
src = Dir.glob('*.xml')
src.each do |file|
puts file
cats.each do |text|
puts text
if file =~ /#{text}--\d\d/
puts "Match Found!!"
puts FileUtils.pwd
FileUtils.cp(file, file.gsub!(/#{text}--\d\d/, "#{text}--\d\dBass "))
end
end
end
end
piano_treatment(cats)
I get the following output in Terminal:
12Piano--05Free Stuff--11Test.xml
05Free Stuff
Match Found!!
/Users/mbp/Desktop/Sibelius_Export/12Piano
cp 12Piano--05Free Stuff--ddBass Test.xml 12Piano--05Free Stuff--ddBass Test.xml
/Users/mbp/.rvm/rubies/ruby-2.0.0-p247/lib/ruby/2.0.0/fileutils.rb:1551:in `stat': No such file or directory - 12Piano--05Free Stuff--ddBass Test.xml (Errno::ENOENT)
Why is \d\d showing up as "dd" when it should actually be numbers? Is this a single vs. double quote issue? Both yield errors.
Any suggestions are appreciated. Thanks.
EDIT One additional change was needed to this code. The FileUtils.chdir('12Piano') would change the directory for the first iteration of the loop, but it would revert to the source directory after that. Instead I did this:
def piano_treatment(cats)
src = Dir.glob('12Piano/*.xml')
which sets the match path for the whole method.
Your replacement string is not a regex, so \d has no special meaning, but is just a literal string. You need to specify a group in your regex, and then you can use the captured group in your replacement string:
FileUtils.cp(file, file.gsub(/#{text}--(\d\d)/, "#{text}--\\1Bass "))
The parentheses in the regex form the group, which can be used (by number) in the replacement string: \1 for the first group, \2 for the second, etc. \0 refers to the entire regex match.
Update
Replaced gsub!() with gsub() and escaped the backslash in the replacement string (to treat \1 as the capture group, not a literal character... Doh!).
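To see the capture group at work without touching the file system, here is a small sketch. The helper name bass_name is made up for illustration, Regexp.escape is an extra precaution that is not part of the answer above, and sub is used instead of gsub because only one match is expected.

```ruby
# Hypothetical helper: returns the copy's file name, or nil when the category
# text does not occur in the file name.
def bass_name(file, text)
  pattern = /#{Regexp.escape(text)}--(\d\d)/  # (\d\d) captures the two digits
  return nil unless file =~ pattern
  file.sub(pattern, "#{text}--\\1Bass ")      # \\1 re-inserts the captured digits
end

puts bass_name("12Piano--05Free Stuff--11Test.xml", "05Free Stuff")
# prints: 12Piano--05Free Stuff--11Bass Test.xml
```

The returned name could then be passed as the second argument to FileUtils.cp.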

How to work with multibyte strings read from a file in Ruby? See inside

I'm a newbie to Ruby with a Perl background, and I have a problem with .reverse on a multibyte string read from a UTF-8 encoded file.
Code:
#!C:\Ruby200-x64\bin\ruby
puts "Content-Type:text/plain;charset=utf8\n\n" #I execute it via CGI
$: << "."
puts "А это строка".reverse #mb-string output is pretty fine
#but when I do the following, it fails;
file = File.open('test_rb_file.txt','r')
file.each_line {|line| puts line.reverse} #puts line works good, but not puts line.reverse
The script itself is in UTF-8, and test_rb_file.txt is in UTF-8. Outputting a multibyte string literal works fine, but reading one from the file and reversing it fails.
I think specifying the encoding of the file I read from (test_rb_file.txt) would do the trick, but I don't know how to do that yet, and I may be wrong about that.
Any ideas to fix the problem? Thanks in advance.
UPD: All fixed, thanks everyone. The following sets the encoding of the input file and fixes the problem:
file = File.open('test_rb_file.txt','r:UTF-8')
File.open('test_rb_file.txt','r:UTF-8')
To check the encoding of a string, call "YourString".encoding.
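As a rough, self-contained illustration of the fix, the sketch below writes a UTF-8 line to a temporary file and reads it back with an explicit encoding. The helper name reversed_lines and the use of Tempfile are assumptions made only so the example can run on its own.

```ruby
require 'tempfile'

# Hypothetical helper: reads a file as UTF-8 and returns each line reversed.
def reversed_lines(path)
  File.open(path, 'r:UTF-8') do |file|
    # chomp first so the newline does not end up at the front of the result
    file.each_line.map { |line| line.chomp.reverse }
  end
end

Tempfile.create('test_rb_file') do |tmp|
  File.binwrite(tmp.path, "А это строка\n")  # raw UTF-8 bytes on disk
  puts reversed_lines(tmp.path)
end
```

Because the lines are tagged as UTF-8, reverse works on characters and not on individual bytes.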

easy issue about Ruby

I would like to know what this code does:
File.open(filename,"r").each_line do |line|
if (!line.strip.empty? and !line.starts_with?(" "))
....
.....
end
end
Especially, what is strip? Thanks for your time!
strip removes all leading and trailing whitespace characters from a string. In essence, the code you pasted checks that the line contains something other than whitespace AND that its first character is not a space.
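A small sketch of that condition on a few sample lines. The helper name content_lines and the sample data are invented for illustration, and core Ruby's start_with? is used here in place of the starts_with? alias from the question.

```ruby
# Hypothetical helper: keeps lines that are not blank and do not begin with a space.
def content_lines(lines)
  lines.select { |line| !line.strip.empty? && !line.start_with?(" ") }
end

p "  padded \n".strip  # => "padded"
p content_lines(["foo\n", "   \n", " indented\n", "\n", "bar"])
# => ["foo\n", "bar"]
```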

CSV.read Illegal quoting in line x

I am using ruby CSV.read with massive data. From time to time the library encounters poorly formatted lines, for instance:
"Illegal quoting in line 53657."
It would be easier to ignore the line and skip it than to go through each CSV file and fix the formatting. How can I do this?
I had this problem in a line like 123,456,a"b"c
The problem is that the CSV parser expects double quotes, if they appear, to entirely surround the comma-delimited field.
Solution: use a quote character other than " that I was sure would not appear in my data:
CSV.read(filename, :quote_char => "|")
The liberal_parsing option is available starting in Ruby 2.4 for cases like this. From the documentation:
When set to a true value, CSV will attempt to parse input not conformant with RFC 4180, such as double quotes in unquoted fields.
To enable it, pass it as an option to the CSV read/parse/new methods:
CSV.read(filename, liberal_parsing: true)
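A quick way to see the difference, using the problem line quoted in an earlier answer as sample input (this is a sketch and requires Ruby 2.4 or later):

```ruby
require 'csv'

line = '123,456,a"b"c'

# Strict parsing rejects the stray quotes.
begin
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  puts "strict parsing failed: #{e.message}"
end

# Liberal parsing keeps the quotes as part of the field.
row = CSV.parse_line(line, liberal_parsing: true)
p row
```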
Don't let CSV both read and parse the file.
Just read the file yourself and hand each line to CSV.parse_line, and then rescue any exceptions it throws.
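One possible shape for that approach is sketched below. The helper name parse_lenient and the sample data are made up for illustration. Note the trade-off: feeding the parser one physical line at a time also rejects valid quoted fields that contain embedded newlines.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: parses each line on its own and records the numbers
# of the lines that CSV refuses to parse.
def parse_lenient(io)
  rows = []
  skipped = []
  io.each_line.with_index(1) do |line, number|
    begin
      rows << CSV.parse_line(line)
    rescue CSV::MalformedCSVError
      skipped << number  # remember the bad line and carry on
    end
  end
  [rows, skipped]
end

data = StringIO.new(%Q(1,2,3\n123,456,a"b"c\n4,5,6\n))
rows, skipped = parse_lenient(data)
p rows
p skipped
```

For a real file, File.open(filename) { |f| parse_lenient(f) } would work the same way.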
Try setting the quote character to one that cannot appear in the data, such as the null byte:
require 'csv'
CSV.foreach(file, headers: :first_row, quote_char: "\x00") do |line|
p line
end
Apparently this error can also be caused by unprintable BOM characters. This thread suggests using a file mode to force a conversion, which is what finally worked for me.
require 'csv'
CSV.open(filename, 'r:bom|utf-8') do |csv|
# do something
end
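To check that the file mode really drops the BOM, a sketch along these lines can help. The helper name read_csv_without_bom and the temporary file are assumptions made so the example is self-contained.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: reads a whole CSV file, discarding a leading UTF-8 BOM if present.
def read_csv_without_bom(path)
  CSV.open(path, 'r:bom|utf-8') { |csv| csv.read }
end

Tempfile.create('bom') do |tmp|
  # \xEF\xBB\xBF is the UTF-8 byte order mark
  File.binwrite(tmp.path, "\xEF\xBB\xBFid,name\n1,a\n")
  p read_csv_without_bom(tmp.path).first  # the first header is "id" with no BOM in front
end
```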

How can I match a URL but exclude terminators from the match?

I want to match urls in text and replace them with anchor tags, but I want to exclude some terminators just like how Twitter matches urls in tweets.
So far I've got this, but it's obviously not working too well.
(http[s]?\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?)
EDIT: Some example urls. In all cases below I only want to match "http://www.example.com"
http://www.example.com.
http://www.example.com:
"http://www.example.com"
http://www.example.com;
http://www.example.com!
[http://www.example.com]
{http://www.example.com}
http://www.example.com*
I looked into this very issue last year and developed a solution that you may want to look at. See: URL Linkification (HTTP/FTP). This link is a test page for the JavaScript solution with many examples of difficult-to-linkify URLs.
My regex solution, written for both PHP and JavaScript (but easily translated to Ruby), is not simple, but neither is the problem, as it turns out. For more information I would also recommend reading:
The Problem With URLs by Jeff Atwood, and
An Improved Liberal, Accurate Regex Pattern for Matching URLs by John Gruber
The comments following Jeff's blog post are a must read if you want to do this right...
Ruby's URI module has an extract method that is used to parse out URLs from text. Parsing the returned values lets you piggyback on the heuristics in the module to extract the scheme and host information from a URL, avoiding reinventing the wheel.
text = '
http://www.example.com.
http://www.example.com:
"http://www.example.com"
http://www.example.com;
http://www.example.com!
[http://www.example.com]
{http://www.example.com}
http://www.example.com*
http://www.example.com/foo/bar?q=foobar
http://www.example.com:81
'
require 'uri'
puts URI::extract(text).map{ |u| uri = URI.parse(u); "#{ uri.scheme }://#{ uri.host[/(^.+?)\.?$/, 1] }" }
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
# >> http://www.example.com
The only gotcha is that a period '.' is a legitimate character in a host name, so URI#host won't strip it. Trailing periods are removed in the map block, where the URL is rebuilt. Note that rebuilding the URL this way also drops the path and query information.
A pragmatic and easily understandable solution is:
regex = %r!"(https?://[-.\w]+\.\w{2,6})"!
Some notes:
With %r we can choose the start and end delimiter. In this case I used exclamation mark, since I want to use slash unescaped in the regex.
The optional quantifier (i.e. '?') binds only to the preceding expression, in this case 's'. There's no need to put the 's' in a character class [s]?. It's the same as s?.
Inside the character class [-.\w] we don't need to escape dash and dot in order to make them match dot and dash literally. Dash should be first, however, to not mean range.
\w matches [A-Za-z0-9_] in Ruby. It's not exactly the full definition of URL characters, but combined with dash and dot it may be enough for our needs.
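The pattern assumes top-level domains between 2 and 6 characters long, e.g. '.se' and '.travel'. Longer top-level domains exist, so raise the upper bound if you need to match them.
I'm not sure what you mean by "I want to exclude some terminators", but this regex matches only the wanted URL in your example.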
We want to use the first capture group, e.g. like this:
if input =~ %r!"(https?://[-.\w]+\.\w{2,6})"!
match = $~[1]
else
match = ""
end
What about this?
%r|https?://[-\w.]*\w|
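A short sketch that tries this last pattern on the example lines from the question. The variable names are arbitrary. Because the pattern has to end on a word character, trailing punctuation is left out of the match. It also stops at the host, so any port or path is dropped.

```ruby
url_pattern = %r|https?://[-\w.]*\w|

text = <<~TEXT
  http://www.example.com.
  http://www.example.com:
  "http://www.example.com"
  http://www.example.com;
  http://www.example.com!
  [http://www.example.com]
  {http://www.example.com}
  http://www.example.com*
TEXT

urls = text.scan(url_pattern)
puts urls.length  # one match per example line
puts urls.uniq    # every match is the bare URL
```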
