Get data from HTTP persistent connection - Ruby

I have been trying for a few days and I am a little confused here.
I am using Clojure's http-kit to make a keepalive GET request.
(ns weibo-collector.weibo
  (:require [org.httpkit.client :as http]
            [clojure.java.io :as io]))

(def sub-url "http://c.api.weibo.com/datapush/status?subid=10542")

(defn spit-to-file [content]
  (spit "sample.data" content :append true))

@(http/get sub-url {:as :stream :keepalive 3000000}
   (fn [{:keys [status headers body error opts]}]
     (spit-to-file body)))
I am pretty sure that I made a persistent connection to the target server, but nothing is written to the sample.data file.
I tried both :as :stream and :as :text.
I also tried a Ruby version; that program creates a persistent connection too, and still nothing is written.
Typically the target would use a webhook to notify my server that new data is coming, but how do I get data from the persistent connection?
---EDIT---
require 'awesome_print'
require 'httpclient'

url = "http://c.api.weibo.com/datapush/status?subid=10542"

c = HTTPClient.new
conn = c.get_async(url)

# Read from the streaming response on a separate thread.
Thread.new do
  res = conn.pop
  loop do
    text = ""
    # Accumulate one byte at a time until a CRLF-terminated line is complete.
    while ch = res.content.read(1)
      text = text + ch
      break if text.end_with? "\r\n"
    end
    ap text
  end
end

# Keep the main thread alive while the reader thread works.
sleep
Above is a working example in Ruby; it uses a thread to read data from the connection. So I must be missing something about how to get the data in Clojure.

Related

Proper way to upload a doc to FSCrawler for indexing in Elasticsearch

I'm prototyping a Rails application to upload documents to FSCrawler (running the REST interface) and incorporate them into an Elasticsearch index. Using their example, this works:
response = `curl -F "file=@#{params[:document][:upload].tempfile.path}" "http://127.0.0.1:8080/fscrawler/_upload?debug=true"`
The file gets uploaded, and the content gets indexed. This is an example of what I get:
"{\n \"ok\" : true,\n \"filename\" : \"RackMultipart20200130-91061-16swulg.pdf\",\n \"url\" : \"http://127.0.0.1:9200/local/_doc/d661edecf3e28572676e97a6f0d1d\",\n \"doc\" : {\n \"content\" : \"\\n \\n \\n\\nBasically, what you need to know is that Dante is all IP-based, and makes use of common IT standards. Each Dante device behaves \\n\\nmuch like any other network device you would already find on your network. \\n\\nIn order to make integration into an existing network easy, here are some of the things that Dante does: \\n\\n▪ Dante...
When I run curl at the command line, I get EVERYTHING, like the "filename" being properly set. If I use it as above, in the Rails controller, as you can see, the filename is set to the Tempfile's filename. That's not a workable solution. Trying to use params[:document][:upload].tempfile (without .path) or just params[:document][:upload] both fail entirely.
I'm trying to do this "the right way," but every incarnation of using a proper HTTP client to do this fails. I can't figure out how to invoke an HTTP POST that will submit a file to FSCrawler the way curl (on the command line) does it.
In this example, I'm just trying to send the file by using the Tempfile file object. For some reason, FSCrawler gives me the error in the comment, and I get a little metadata, but no content is indexed:
## Failed to extract [100000] characters of text for ...
## org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
uri = URI("http://127.0.0.1:8080/fscrawler/_upload?debug=true")
request = Net::HTTP::Post.new(uri)
form_data = [['file', params[:document][:upload].tempfile,
              { filename: params[:document][:upload].original_filename,
                content_type: params[:document][:upload].content_type }]]
request.set_form form_data, 'multipart/form-data'
response = Net::HTTP.start(uri.hostname, uri.port) do |http|
  http.request(request)
end
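One possible cause of that ZeroByteFileException is that the Rack tempfile has already been read by the time the form is built, leaving its read pointer at end-of-file, so Net::HTTP streams zero bytes. This is only a guess, but rewinding the tempfile before building the form would rule it out; a minimal variant of the snippet above:
upload = params[:document][:upload]
upload.tempfile.rewind # reset the read pointer so the file part has content
form_data = [['file', upload.tempfile,
              { filename: upload.original_filename,
                content_type: upload.content_type }]]
request.set_form form_data, 'multipart/form-data'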
If I change the above to use params[:document][:upload].tempfile.path, then I don't get the error about the InputStream, but I also (still) do not get any content indexed. This is an example of what I get:
{"_index":"local","_type":"_doc","_id":"72c9ecf2a83440994eb87d28786e6","_version":3,"_seq_no":26,"_primary_term":1,"found":true,"_source":{"content":"/var/folders/bn/pcc1h8p16tl534pw__fdz2sw0000gn/T/RackMultipart20200130-91061-134tcxn.pdf\n","meta":{},"file":{"extension":"pdf","content_type":"text/plain; charset=ISO-8859-1","indexing_date":"2020-01-30T15:33:45.481+0000","filename":"Similarity in Postgres and Rails using Trigrams · pganalyze.pdf"},"path":{"virtual":"Similarity in Postgres and Rails using Trigrams · pganalyze.pdf","real":"Similarity in Postgres and Rails using Trigrams · pganalyze.pdf"}}}
If I try to use RestClient, and I try to send the file by referencing the actual path to the Tempfile, then I get this error message, and I get nothing:
## Unsupported media type
response = RestClient.post 'http://127.0.0.1:8080/fscrawler/_upload?debug=true',
  file: params[:document][:upload].tempfile.path,
  content_type: params[:document][:upload].content_type
If I try to .read() the file, and submit that, then I break the FSCrawler form:
## Internal server error
request = RestClient::Request.new(
  :method => :post,
  :url => 'http://127.0.0.1:8080/fscrawler/_upload?debug=true',
  :payload => {
    :multipart => true,
    :file => File.read(params[:document][:upload].tempfile),
    :content_type => params[:document][:upload].content_type
  }
)
response = request.execute
Obviously, I've been trying this every way I can, but I can't replicate whatever curl is doing with any known Ruby-based HTTP clients. I'm utterly lost as to how to get Ruby to submit data to FSCrawler in a way that will get the document contents indexed properly. I've been at this far longer than I care to admit. What am I missing here?
I finally tried Faraday, and, based on this answer, came up with the following:
connection = Faraday.new('http://127.0.0.1:8080') do |f|
  f.request :multipart
  f.request :url_encoded
  f.adapter :net_http
end
file = Faraday::UploadIO.new(
  params[:document][:upload].tempfile.path,
  params[:document][:upload].content_type,
  params[:document][:upload].original_filename
)
payload = { :file => file }
response = connection.post('/fscrawler/_upload', payload)
Using Fiddler helped me to see the results of my attempts, as I got closer and closer to the curl request. This snippet posts the request almost exactly as curl does. To route this call through the proxy, I just needed to add , proxy: 'http://localhost:8866' to the end of the connection setup.
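For what it's worth, RestClient can also produce a multipart request: it switches to multipart/form-data automatically when the payload contains an IO object, such as an open File, rather than a path string. A minimal sketch, untested against FSCrawler:
require 'rest-client'

upload = params[:document][:upload]
# An open File (not a path string) makes RestClient build multipart/form-data.
response = RestClient.post(
  'http://127.0.0.1:8080/fscrawler/_upload?debug=true',
  file: File.open(upload.tempfile.path, 'rb')
)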

How can I send messages to specific client using Faye Websockets?

I've been working on a web application which is essentially a web messenger, using Sinatra. My goal is to have all messages encrypted using PGP and to have full-duplex communication between clients using Faye websockets.
My main problem is being able to send messages to a specific client using Faye. On top of this, all messages in a single chatroom are saved twice, once for each person, since they are PGP encrypted.
So far I've thought of starting up a new socket object for every client and storing them in a hash. I do not know if this approach is the most efficient one. I have seen that socket.io, for example, allows you to emit to a specific client, but it seems Faye websockets does not. I am also considering a pub/sub model, but once again I am not sure.
Any advice is appreciated, thanks!
I am iodine's author, so I might be biased in my approach.
I would consider naming a channel after the user ID (i.e., user1...user201983) and sending the message to that user's channel.
I think Faye will support this. I know that when using iodine's native websockets and built-in pub/sub, this is quite effective.
So far I've thought of starting up a new socket object for every client and storing them in a hash...
This is a very common mistake, often seen in simple examples.
It works only in single-process environments, and then you will have to recode the whole logic in order to scale your application.
The channel approach allows you to scale using Redis or any other pub/sub service without recoding your application's logic.
Here's a quick example you can run from the Ruby terminal (irb). I'm using plezi.io just to make it a bit shorter to code:
require 'plezi'
require 'json'

class Example
  def index
    "Use Websockets to connect."
  end

  def pre_connect
    if !params[:id]
      puts "an attempt to connect without credentials was made."
      return false
    end
    true
  end

  def on_open
    subscribe channel: params[:id]
  end

  def on_message data
    begin
      msg = JSON.parse(data)
      if !msg["to"] || !msg["data"]
        puts "JSON message error", data
        return
      end
      msg["from"] = params[:id]
      publish channel: msg["to"].to_s, message: msg.to_json
    rescue => e
      puts "JSON parsing failed!", e.message
    end
  end
end

Plezi.route "/(:id)", Example # the optional :id segment becomes params[:id]
Iodine.threads = 1
exit # in irb, exiting triggers the at_exit hook and starts the server
To test this example, use a Javascript client, maybe something like this:
// in browser tab 1
var id = 1;
ws = new WebSocket("ws://localhost:3000/" + id);
ws.onopen = function(e) { console.log("opened connection"); };
ws.onclose = function(e) { console.log("closed connection"); };
ws.onmessage = function(e) { console.log(e.data); };
ws.send_to = function(to, data) {
  this.send(JSON.stringify({to: to, data: data}));
}.bind(ws);
// in browser tab 2
var id = 2;
ws = new WebSocket("ws://localhost:3000/" + id);
ws.onopen = function(e) { console.log("opened connection"); };
ws.onclose = function(e) { console.log("closed connection"); };
ws.onmessage = function(e) { console.log(e.data); };
ws.send_to = function(to, data) {
  this.send(JSON.stringify({to: to, data: data}));
}.bind(ws);
ws.send_to(1, "hello!");

Is it possible to get the headers of a request using Ruby's HTTPClient gem before the request completes?

I have a requirement to proxy a request in a Rails app. I was hoping I could proxy it with chunking (so, one chunk received, one chunk sent). The app is working fine without chunking (load the request into memory, then transmit).
Here is my code to proxy the chunks through to the end-client:
self.response.headers['Last-Modified'] = Time.now.ctime.to_s
self.response_body = Enumerator.new do |y|
  client = HTTPClient.new
  http_response = client.get(proxy_url, nil, headers) do |chunk|
    y << chunk
  end
end
The problem is, I can't inspect http_response until all the chunks have been received, so I can't set my response headers based on the headers returned by the upstream server.
What I'm trying to do is transmit the headers returned from the client before the first chunk is sent. Is this possible?
If not, is this pattern possible in any other Ruby HTTP client gem?
Update
I have a solution for you.
If you call get_async instead, it will return immediately with an HTTPClient::Connection object that is updated with the header information as soon as it is received. The code sample below demonstrates this.
The patch to HTTPClient::Connection is almost certainly not necessary for you, but it lets you write things like conn.queue.size and conn.queue.empty?.
conn.pop blocks until the response (or exception) has been pushed to the queue by the async thread and then returns the normal HTTP::Message object. (Note that, if you are using the monkey patch, you can use conn.queue.empty? to see if pop is going to block.)
resp.content returns an IO object which is a pipe read endpoint, and can be called as soon as pop has returned. The other end is written by the async thread as the data arrives, and you can read the entire content in one go or in whatever size chunks you like using read.
require 'httpclient'

# Expose the connection's internal queue (not strictly necessary; it just
# allows things like conn.queue.empty? to check for a pending response).
class HTTPClient::Connection
  attr_reader :queue
end

client = HTTPClient.new
conn = client.get_async 'http://en.wikipedia.org/wiki/Ruby_(programming_language)'

# pop blocks until the async thread has pushed the response headers.
resp = conn.pop
resp.header.all.each { |name, val| puts "#{name}=#{val}" }
puts

# resp.content is the read end of a pipe; the async thread writes body
# data to the other end as it arrives.
pipe = resp.content
while chunk = pipe.read(8192)
  print chunk
end
You could parse the first chunk you receive to extract the headers, but I suggest you call head first to get the header information, then do the get as well.
(Update: the first chunk holds the beginning of the content, so parsing it for headers won't work.)
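The HEAD-then-GET approach could look roughly like this; a sketch that reuses the proxy_url and headers variables from the question, with the caveat that the two requests are independent and could in principle see different responses:
client = HTTPClient.new

# First request: fetch only the headers and copy them to our response.
head = client.head(proxy_url, nil, headers)
head.header.all.each do |name, value|
  self.response.headers[name] = value unless name.downcase == 'transfer-encoding'
end

# Second request: stream the body through chunk by chunk.
self.response_body = Enumerator.new do |y|
  client.get(proxy_url, nil, headers) { |chunk| y << chunk }
end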

How to calculate the amount of data downloaded and the total data to be downloaded in Ruby

I'm trying to build a desktop client that manages some downloads with Ruby. I would like to know how to determine how much of the data has been downloaded and the total size of the data to be downloaded.
I'm trying to do this with Ruby, so any help would be useful.
Thanks in advance.
Like Wayne said in his comment, it depends on the protocol that is used to transfer the files. With HTTP, for example, the HTTP response will include a Content-Length header, which tells you the length of the file you are downloading. After that you have to keep track of the number of bytes you've read from the HTTP connection.
Something like this seems to work (for HTTP), but I wouldn't be surprised if it could be done more elegantly:
require 'net/http'

url = URI.parse('http://www.google.com/index.html')
req = Net::HTTP::Get.new(url.path)
res = Net::HTTP.start(url.host, url.port) do |http|
  http.request(req) do |res|
    remaining = res.content_length
    puts "total length: #{remaining}"
    res.read_body do |segment|
      puts "read #{segment.length} bytes"
      remaining = remaining - segment.length
      puts "#{remaining} bytes remaining"
    end
  end
end
www.google.com/index.html is a bad example since the content gets returned in one segment, but try it on a larger object and you should see multiple "read..." lines.
If you're using Net::HTTP then the length of whatever you're requesting should be in the response header. Net::HTTP responses mix in Net::HTTPHeader, where you'll find content_length(). Note that it only works if the size is determined before the transfer happens.
Net::HTTPResponse has a method that reads the body in chunks, so you can use that to determine the progress. Start at 0, add the length of each chunk, and compare the running total to the content length; see the sketch after the example below.
http.request_get('/index.html') { |res|
  res.read_body do |segment|
    print segment
  end
} # example taken from the Ruby documentation
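Putting the two pieces together, a small progress tracker might look like this (www.example.com stands in for a real host):
require 'net/http'

Net::HTTP.start('www.example.com') do |http|
  http.request_get('/index.html') do |res|
    total = res.content_length # may be nil, e.g. for chunked responses
    downloaded = 0
    res.read_body do |segment|
      downloaded += segment.length
      if total
        puts "#{downloaded}/#{total} bytes (#{(100.0 * downloaded / total).round(1)}%)"
      else
        puts "#{downloaded} bytes so far"
      end
    end
  end
end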
If you're using FTP then it should be easier, through Net::FTP. Connect to the server, get the size of a given file with size(filename), and then download the file with get, getbinaryfile, or gettextfile.
This is the signature of the get method: get(remotefile, localfile = File.basename(remotefile), blocksize = DEFAULT_BLOCKSIZE) {|data| ...}
ftp.get('file.something', 'file.something.local', 1024) { |data|
  puts "Downloaded 1024 more bytes"
}
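Combining size with the block form of getbinaryfile gives you progress reporting over FTP as well; a minimal sketch, assuming anonymous login and placeholder file names:
require 'net/ftp'

Net::FTP.open('ftp.example.com') do |ftp|
  ftp.login # anonymous; pass credentials here if the server requires them
  total = ftp.size('file.something')
  downloaded = 0
  ftp.getbinaryfile('file.something', 'file.something.local', 1024) do |data|
    downloaded += data.size
    puts "#{downloaded}/#{total} bytes"
  end
end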

Multipart File Upload in Ruby

I simply want to upload an image to a server with POST. As simple as this task sounds, there seems to be no simple solution in Ruby.
In my application I am using WWW::Mechanize for most things, so I wanted to use it for this too, and had source like this:
f = File.new(filename, File::RDWR)
reply = agent.post(
  'http://rest-test.heroku.com',
  {
    :pict => f,
    :function => 'picture2',
    :username => @username,
    :password => @password,
    :pict_to => 0,
    :pict_type => 0
  }
)
f.close
This results in a totally garbled file on the server that looks scrambled all over:
(screenshot of the scrambled image: http://imagehub.org/f/1tk8/garbage.png)
My next step was to downgrade WWW::Mechanize to version 0.8.5. This worked until I tried to run it, which failed with an error like "Module not found in hpricot_scan.so". Using the Dependency Walker tool I found out that hpricot_scan.so needed msvcrt-ruby18.dll. Yet after I put that .dll into my Ruby/bin folder, it gave me an empty error box, from which point I couldn't debug much further. So the problem here is that Mechanize 0.8.5 depends on Hpricot instead of Nokogiri (which works flawlessly).
The next idea was to use a different gem, so I tried Net::HTTP. After a short research I found out that there is no native support for multipart forms in Net::HTTP; instead you have to build a class that does the encoding for you. The most helpful one I could find was the Multipart class by Stanislav Vitvitskiy. This class looked good so far, but it does not do what I need, because I don't want to post only files, I also want to post normal data, and that is not possible with his class.
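As an aside: Net::HTTP has since gained set_form on request objects (added in later Ruby versions, after this question was written), which handles exactly this mix of ordinary fields and file parts, so on current Rubies the hand-rolled encoder is unnecessary. A rough sketch using the question's field names, with placeholder credentials:
require 'net/http'

uri = URI('http://rest-test.heroku.com')
request = Net::HTTP::Post.new(uri)
# Plain string values become ordinary fields; IO values become file parts.
request.set_form(
  [['function', 'picture2'],
   ['username', 'user'],      # placeholder
   ['password', 'secret'],    # placeholder
   ['pict', File.open(filename, 'rb')]],
  'multipart/form-data'
)
response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }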
My last attempt was to use RestClient. This looked promising, as there have been examples on how to upload files. Yet I can't get it to post the form as multipart.
f = File.new(filename, File::RDWR)
reply = RestClient.post(
  'http://rest-test.heroku.com',
  :pict => f,
  :function => 'picture2',
  :username => @username,
  :password => @password,
  :pict_to => 0,
  :pict_type => 0
)
f.close
I am using http://rest-test.heroku.com, which echoes the request back so I can check whether it was sent correctly, and I always get this back:
POST http://rest-test.heroku.com/ with a 101 byte payload,
content type application/x-www-form-urlencoded
{
  "pict" => "#<File:0x30d30c4>",
  "username" => "s1kx",
  "pict_to" => "0",
  "function" => "picture2",
  "pict_type" => "0",
  "password" => "password"
}
This clearly shows that it does not use multipart/form-data as content-type but the standard application/x-www-form-urlencoded, although it definitely sees that pict is a file.
How can I upload a file in Ruby to a multipart form without implementing the whole encoding and data aligning myself?
Long problem, short answer: I was missing the binary mode for reading the image under Windows.
f = File.new(filename, File::RDWR)
had to be
f = File.new(filename, "rb")
Another method is to use Bash and Curl. I used this method when I wanted to test multiple file uploads.
bash_command = 'curl -v -F "file=@texas.png,texas_reversed.png" ' \
               'http://localhost:9292/fog_upload/upload'
command_result = `#{bash_command}` # the backticks are important
puts command_result
