Can't 'file.open.read' a url within a ruby if-block - ruby

I want to create a ruby script which will take barcodes from a text file, search a webservice for that barcode and download the result.
First I tried to test the webservice download. In a file when I hardcode the query things work fine:
result_download = open('http://webservice.org/api/?query=barcode:78686112327', 'User-Agent' => 'UserAgent email@gmail.com').read
It all works fine.
When I try to take the barcode from a textfile and run the query I run into problems.
IO.foreach(filename) {|barcode| barcode
  website = "'http://webservice.org/api/?query=barcode:"+barcode.to_str.chomp + "', 'User-Agent' => 'UserAgent email@gmail.com'"
  website = website.to_s
  mb_metadata = open(website).read
}
The result of this is:
/home/user/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/open-uri.rb:37:in `initialize': No such file or directory @ rb_sysopen - http://webservice.org/api/?query=barcode:78686112327', 'User-Agent' => 'UserAgent email@gmail.com' (Errno::ENOENT)
I can't figure out if this problem occurs because the string I generate somehow isn't a valid url and ruby is trying to open a non-existent file, or is the issue that I am doing all this in a for loop and the file/url doesn't exist there. I have tried using open(website).write instead of open(website).read but that produces the same error.
Any help would be much appreciated.

The error message says explicitly that Ruby found no such file:
http://webservice.org/api/?query=barcode:78686112327', 'User-Agent' => 'UserAgent email@gmail.com'
You are passing all the arguments to open as one big string (website), so open treats the whole thing as a file name rather than as a URL plus an options hash. Pass the URL and the headers as separate arguments instead:
IO.foreach(filename) do |barcode|
  website = "http://webservice.org/api/?query=barcode:#{barcode.to_str.chomp}"
  mb_metadata = open(website, 'User-Agent' => 'UserAgent email@gmail.com').read
end
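The same idea can be sketched as a small helper that keeps the URL and the header options as two separate values. The helper name `barcode_request` is hypothetical; the URL and header values are the ones from the question. On Ruby 3.0 and later, `Kernel#open` no longer opens URLs, so `URI.open` from open-uri is the call to use there.

```ruby
require 'open-uri'

# Hypothetical helper: returns the URL and the options hash as two separate
# values, so they can be passed to open as two separate arguments.
def barcode_request(barcode)
  url = "http://webservice.org/api/?query=barcode:#{barcode.to_s.chomp}"
  headers = { 'User-Agent' => 'UserAgent email@gmail.com' }
  [url, headers]
end

# Usage (performs a network request, so it is left commented out):
# IO.foreach(filename) do |line|
#   url, headers = barcode_request(line)
#   puts URI.open(url, headers).read
# end
```

Keeping the two values apart makes it hard to repeat the original mistake of folding the options into the URL string.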

Related

Ruby Post returns 404 URL Not found while curl works fine

I'm trying to write some Ruby code to update GitLab CI/CD variables using the REST endpoint update variable. When I perform a curl with the same path, the same private token, and the same --form data it updates the variable as expected. When I use the Ruby code that I put together based on reading stackoverflow and the net::http docs, it fails with a 404 URL not found.
I can use a similar piece of code to create a new CI/CD variable successfully. I can also delete an existing variable and re-create it, but I would like to know what mistake I am making in the update call.
Can someone point out what I did wrong?
#!/usr/bin/env ruby
require 'net/http'
require 'uri'
token = File.read(__dir__ + '/.gitlab-token').chomp
host = 'https://gitlab.com/'
variables_path = 'api/v4/projects/123456/variables'
env_var = 'MY_VAR'
update_uri = URI(host + variables_path + '/' + env_var)
# I've written the above this way because my actual code
# has a delete and create in order to "update" the variable
response = Net::HTTP.start(update_uri.host, update_uri.port, use_ssl: true) do |http|
  update_request = Net::HTTP::Post.new(update_uri)
  update_request['PRIVATE-TOKEN'] = token
  form_data = [
    ['value', 'a new value']
  ]
  update_request.set_form(form_data, 'multipart/form-data')
  response = http.request(update_request)
  response.body
end
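One thing worth checking, offered as a sketch rather than a confirmed diagnosis: GitLab's API documents the update-variable endpoint as a `PUT` on `/projects/:id/variables/:key`, while the code above sends a `POST` to that path. The helper name `build_update_request` and the token string are hypothetical; the project ID and variable name are the placeholders from the question.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a PUT request for
# updating an existing CI/CD variable.
def build_update_request(uri, token, new_value)
  request = Net::HTTP::Put.new(uri)
  request['PRIVATE-TOKEN'] = token
  request.set_form([['value', new_value]], 'multipart/form-data')
  request
end

update_uri = URI('https://gitlab.com/api/v4/projects/123456/variables/MY_VAR')
update_request = build_update_request(update_uri, 'placeholder-token', 'a new value')

# Sending it (network call, left commented out):
# response = Net::HTTP.start(update_uri.host, update_uri.port, use_ssl: true) do |http|
#   http.request(update_request)
# end
# puts response.code, response.body
```

Only the request class changes relative to the create call; the header and form data are built the same way.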

Proper way to upload a doc to FSCrawler for indexing in Elasticsearch

I'm prototyping a Rails application to upload documents to FSCrawler (running the REST interface), to incorporate into an Elasticsearch index. Using their example, this works:
response = `curl -F "file=@#{params[:document][:upload].tempfile.path}" "http://127.0.0.1:8080/fscrawler/_upload?debug=true"`
The file gets uploaded, and the content gets indexed. This is an example of what I get:
"{\n \"ok\" : true,\n \"filename\" : \"RackMultipart20200130-91061-16swulg.pdf\",\n \"url\" : \"http://127.0.0.1:9200/local/_doc/d661edecf3e28572676e97a6f0d1d\",\n \"doc\" : {\n \"content\" : \"\\n \\n \\n\\nBasically, what you need to know is that Dante is all IP-based, and makes use of common IT standards. Each Dante device behaves \\n\\nmuch like any other network device you would already find on your network. \\n\\nIn order to make integration into an existing network easy, here are some of the things that Dante does: \\n\\n▪ Dante...
When I run curl at the command line, I get EVERYTHING, like the "filename" being properly set. If I use it as above, in the Rails controller, as you can see, the filename is set to the Tempfile's filename. That's not a workable solution. Trying to use params[:document][:upload].tempfile (without .path) or just params[:document][:upload] both fail entirely.
I'm trying to do this "the right way," but every incarnation of using a proper HTTP client to do this fails. I can't figure out how to invoke an HTTP POST that will submit a file to FSCrawler the way curl (on the command line) does it.
In this example, I'm just trying to send the file using the Tempfile object. For some reason, FSCrawler gives me the error shown in the comment; I get a little metadata, but no content is indexed:
## Failed to extract [100000] characters of text for ...
## org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
uri = URI("http://127.0.0.1:8080/fscrawler/_upload?debug=true")
request = Net::HTTP::Post.new(uri)
form_data = [['file', params[:document][:upload].tempfile,
              { filename: params[:document][:upload].original_filename,
                content_type: params[:document][:upload].content_type }]]
request.set_form form_data, 'multipart/form-data'
response = Net::HTTP.start(uri.hostname, uri.port) do |http|
  http.request(request)
end
If I change the above to use params[:document][:upload].tempfile.path, then I don't get the error about the InputStream, but I also (still) do not get any content indexed. This is an example of what I get:
{"_index":"local","_type":"_doc","_id":"72c9ecf2a83440994eb87d28786e6","_version":3,"_seq_no":26,"_primary_term":1,"found":true,"_source":{"content":"/var/folders/bn/pcc1h8p16tl534pw__fdz2sw0000gn/T/RackMultipart20200130-91061-134tcxn.pdf\n","meta":{},"file":{"extension":"pdf","content_type":"text/plain; charset=ISO-8859-1","indexing_date":"2020-01-30T15:33:45.481+0000","filename":"Similarity in Postgres and Rails using Trigrams · pganalyze.pdf"},"path":{"virtual":"Similarity in Postgres and Rails using Trigrams · pganalyze.pdf","real":"Similarity in Postgres and Rails using Trigrams · pganalyze.pdf"}}}
If I use RestClient and try to send the file by referencing the actual path to the Tempfile, I get this error message and nothing else:
## Unsupported media type
response = RestClient.post 'http://127.0.0.1:8080/fscrawler/_upload?debug=true',
  file: params[:document][:upload].tempfile.path,
  content_type: params[:document][:upload].content_type
If I try to .read() the file, and submit that, then I break the FSCrawler form:
## Internal server error
request = RestClient::Request.new(
  :method => :post,
  :url => 'http://127.0.0.1:8080/fscrawler/_upload?debug=true',
  :payload => {
    :multipart => true,
    :file => File.read(params[:document][:upload].tempfile),
    :content_type => params[:document][:upload].content_type
  })
response = request.execute
Obviously, I've been trying this every way I can, but I can't replicate whatever curl is doing with any known Ruby-based HTTP clients. I'm utterly lost as to how to get Ruby to submit data to FSCrawler in a way that will get the document contents indexed properly. I've been at this far longer than I care to admit. What am I missing here?
I finally tried Faraday, and, based on this answer, came up with the following:
connection = Faraday.new('http://127.0.0.1:8080') do |f|
  f.request :multipart
  f.request :url_encoded
  f.adapter :net_http
end
file = Faraday::UploadIO.new(
  params[:document][:upload].tempfile.path,
  params[:document][:upload].content_type,
  params[:document][:upload].original_filename
)
payload = { :file => file }
response = connection.post('/fscrawler/_upload', payload)
Using Fiddler helped me to see the results of my attempts, as I got closer and closer to the curl request. This snippet posts the request almost exactly as curl does. To route this call through the proxy, I just needed to add , proxy: 'http://localhost:8866' to the end of the connection setup.

Ruby hipchat gem invalid send file

So this is related to an earlier post I made on this method. This is essentially what I am using to send files via hipchat:
#!/usr/bin/env ruby
require 'hipchat'
client = HipChat::Client.new('HIPCHAT_TOKEN', :api_version => 'v2', :server_url => 'HIPCHAT_URL')
client.user('some_username').send_file('message', File.open('./output/some-file.csv') )
client['some_hipchat_room'].send_file('some_user', 'message', File.open('./output/some-file.csv') )
Now for some reason the send_file method is invalid:
/path/to/gems/hipchat-1.5.4/lib/hipchat/errors.rb:40:in `response_code_to_exception_for': You requested an invalid method. path:https://hipchat.illum.io/v2/user/myuser@myemail/share/file?auth_token=asdfgibberishasdf method:Net::HTTP::Get (HipChat::MethodNotAllowed)
from /path/to/gems/gems/hipchat-1.5.4/lib/hipchat/user.rb:50:in `send_file'
I think this indicates that you should be using POST instead of GET, but I'm not sure because I haven't used this library or HipChat.
Looking at the question you referenced and the source posted by another user, they send the request using self.class.post, while your debug output shows Net::HTTP::Get.
To debug, could you try:
file = Tempfile.new('foo').tap do |f|
  f.write("the content")
  f.rewind
end
user = client.user(some_username)
user.send_file('some bytes', file)
The issue was that I was attempting to connect to the server via http instead of https. If the following client is causing issues:
client = HipChat::Client.new('HIPCHAT_TOKEN', :api_version => 'v2', :server_url => 'my.company.com')
Then try adding https:// to the beginning of the server URL:
client = HipChat::Client.new('HIPCHAT_TOKEN', :api_version => 'v2', :server_url => 'https://my.company.com')
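A minimal sketch of that fix, assuming the server URL comes from configuration and may lack a scheme. The helper `with_https` is hypothetical and not part of the hipchat gem; it only normalizes the string before it is handed to `HipChat::Client.new`.

```ruby
# Hypothetical helper (not part of the hipchat gem): make sure a server
# URL carries an explicit https:// scheme.
def with_https(server_url)
  case server_url
  when %r{\Ahttps://}i then server_url
  when %r{\Ahttp://}i  then server_url.sub(%r{\Ahttp://}i, 'https://')
  else "https://#{server_url}"
  end
end

puts with_https('my.company.com')
puts with_https('http://my.company.com')
```

The result would then be passed as the `:server_url` option shown above.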

Ruby upload file error

I am trying to create a form that will upload files to AWS S3. I have searched all around for an answer but I am getting the error "TypeError at /upload can't convert Symbol into Integer"
Here is the block of code
post '/upload' do
  s3 = AWS::S3.new(
    :access_key_id => 'X',
    :secret_access_key => 'X')
  bucket = s3.buckets['X']
  title = params['title']
  desc = params['desc']
  file = params['file'][:tempfile]
  s3.buckets['indio'].objects[title].write(:file => file)
end
I get the error on the line
file = params['file'][:tempfile]
Can someone point out what I am doing wrong?
Typically, the error "can't convert Symbol into Integer" hints that you are trying to index an array (or a string) with something that is not an integer.
From this I suspect that params['file'] is an array or a string, and not the hash you expect.
Find out exactly what params['file'] contains and continue from there.
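To illustrate: indexing a String with a Symbol raises this kind of TypeError (worded "no implicit conversion of Symbol into Integer" on newer Rubies), whereas a multipart upload normally arrives as a Hash with a `:tempfile` key. One possible cause, stated as an assumption, is a form submitted without multipart encoding, which sends only the file name as a string. The helper name `extract_tempfile` is hypothetical.

```ruby
# Hypothetical helper: return the uploaded tempfile, or raise a descriptive
# error when the parameter is not the Hash a multipart upload produces.
def extract_tempfile(file_param)
  unless file_param.is_a?(Hash) && file_param.key?(:tempfile)
    raise ArgumentError,
          "expected an upload Hash, got #{file_param.class}: #{file_param.inspect}"
  end
  file_param[:tempfile]
end

# A plain string cannot be indexed with a Symbol:
begin
  'photo.png'[:tempfile]
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```

Calling `extract_tempfile(params['file'])` in the route would report the actual class and value instead of the opaque TypeError.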

Multipart File Upload in Ruby

I simply want to upload an image to a server with POST. As simple as this task sounds, there seems to be no simple solution in Ruby.
In my application I am using WWW::Mechanize for most things so I wanted to use it for this too, and had a source like this:
f = File.new(filename, File::RDWR)
reply = agent.post(
  'http://rest-test.heroku.com',
  {
    :pict => f,
    :function => 'picture2',
    :username => @username,
    :password => @password,
    :pict_to => 0,
    :pict_type => 0
  }
)
f.close
This results in a garbage file on the server that looks completely scrambled:
(Screenshot of the scrambled image: http://imagehub.org/f/1tk8/garbage.png)
My next step was to downgrade WWW::Mechanize to version 0.8.5. This worked until I tried to run it, which failed with an error like "Module not found in hpricot_scan.so". Using the Dependency Walker tool I could find out that hpricot_scan.so needed msvcrt-ruby18.dll. Yet after I put that .dll into my Ruby/bin-folder it gave me an empty error box from where on I couldn't debug very much further. So the problem here is that Mechanize 0.8.5 has a dependency on Hpricot instead of Nokogiri (which works flawlessly).
The next idea was to use a different gem, so I tried using Net::HTTP. After short research I could find out that there is no native support for multipart forms in Net::HTTP and instead you have to build a class that encodes etc. for you. The most helpful I could find was the Multipart-class by Stanislav Vitvitskiy. This class looked good so far, but it does not do what I need, because I don't want to post only files, I also want to post normal data, and that is not possible with his class.
My last attempt was to use RestClient. This looked promising, as there have been examples on how to upload files. Yet I can't get it to post the form as multipart.
f = File.new(filename, File::RDWR)
reply = RestClient.post(
  'http://rest-test.heroku.com',
  :pict => f,
  :function => 'picture2',
  :username => @username,
  :password => @password,
  :pict_to => 0,
  :pict_type => 0
)
f.close
I am using http://rest-test.heroku.com which sends back the request to debug if it is sent correctly, and I always get this back:
POST http://rest-test.heroku.com/ with a 101 byte payload,
content type application/x-www-form-urlencoded
{
"pict" => "#<File:0x30d30c4>",
"username" => "s1kx",
"pict_to" => "0",
"function" => "picture2",
"pict_type" => "0",
"password" => "password"
}
This clearly shows that it does not use multipart/form-data as content-type but the standard application/x-www-form-urlencoded, although it definitely sees that pict is a file.
How can I upload a file in Ruby to a multipart form without implementing the whole encoding and data aligning myself?
Long problem, short answer: I was missing the binary mode for reading the image under Windows.
f = File.new(filename, File::RDWR)
had to be
f = File.new(filename, "rb")
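A small sketch of why the mode matters, using a throwaway temp file instead of a real image. The bytes are the eight-byte PNG signature, which contains a CR LF pair; text mode on Windows may translate line endings and so corrupt such data, while "rb" reads the bytes untouched. The helper name `read_binary` is hypothetical.

```ruby
require 'tempfile'

# Hypothetical helper: read a file's raw bytes in binary mode.
def read_binary(path)
  File.open(path, 'rb') { |f| f.read }
end

# The PNG signature: 0x89 "PNG" CR LF 0x1A LF.
original = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A].pack('C*')

round_trip = Tempfile.create('sample') do |tmp|
  tmp.binmode            # write the bytes without any translation
  tmp.write(original)
  tmp.flush
  read_binary(tmp.path)
end

puts round_trip.encoding == Encoding::BINARY
puts round_trip == original
```

The block form of `File.open` also closes the file automatically, which removes the need for a separate `f.close`.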
Another method is to use Bash and curl. I used this method when I wanted to test multiple file uploads.
bash_command = 'curl -v -F "file=@texas.png,texas_reversed.png" http://localhost:9292/fog_upload/upload'
command_result = `#{bash_command}` # the backticks are important
puts command_result
