Writing a simple webserver in Ruby - ruby

I want to create an extremely simple web server in Ruby for development purposes (and no, I don't want to use a ready-made solution).
Here is the code:
#!/usr/bin/ruby
require 'socket'
server = TCPServer.new('127.0.0.1', 8080)
while connection = server.accept
  headers = []
  length = 0
  while line = connection.gets
    headers << line
    if line =~ /^Content-Length:\s+(\d+)/i
      length = $1.to_i
    end
    break if line == "\r\n"
  end
  body = connection.readpartial(length)
  IO.popen(ARGV[0], 'r+') do |script|
    script.print(headers.join + body)
    script.close_write
    connection.print script.read
  end
  connection.close
end
The idea is to run this script from the command line and pass it another script, which receives the request on its standard input and writes the complete response to its standard output.
So far so good, but this turns out to be really fragile, as it breaks on the second request with the error:
/usr/bin/serve:24:in `write': Broken pipe (Errno::EPIPE)
from /usr/bin/serve:24:in `print'
from /usr/bin/serve:24
from /usr/bin/serve:23:in `popen'
from /usr/bin/serve:23
Any idea how to improve the above code to be sufficient for easy use?
Versions: Ubuntu 9.10 (2.6.31-20-generic), Ruby 1.8.7 (2009-06-12 patchlevel 174) [i486-linux]

The problem appears to be in the child script, since the parent script in your question runs fine on my box (Debian Squeeze, Ruby 1.8.7 patchlevel 249).
I created the dummy child script bar.rb:
#!/usr/bin/ruby1.8
s = $stdin.read
$stderr.puts s
print s
I then ran your script, passing it the path to the dummy script:
$ /tmp/foo.rb /tmp/bar.rb
Then I hit it with wget:
$ wget localhost:8080/index
And saw the dummy script's output:
GET /index HTTP/1.0^M
User-Agent: Wget/1.12 (linux-gnu)^M
Accept: */*^M
Host: localhost:8080^M
Connection: Keep-Alive^M
^M
I also saw that wget received what it sent:
$ cat index
GET /index HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: localhost:8080
Connection: Keep-Alive
It worked the same no matter how many times I hit it with wget.
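The traceback in the question shows the failure in the write to the child process. One plausible cause, which is an assumption rather than something this thread confirms, is a child script that exits before reading all of its standard input. A minimal sketch of a more tolerant parent, using a hypothetical helper named run_handler (not part of the question's code):

```ruby
# Hypothetical helper (not from the question): feeds `request` to `command`
# on its stdin and returns whatever the command prints on its stdout.
def run_handler(command, request)
  IO.popen(command, 'r+') do |script|
    begin
      script.print(request)
    rescue Errno::EPIPE
      # The child exited or closed its stdin before reading the whole request.
    end
    script.close_write
    script.read
  end
end
```

In the question's loop this could stand in for the IO.popen block, for example connection.print run_handler(ARGV[0], headers.join + body). The server then keeps running even when one child misbehaves, although the client of that request may still receive an empty response.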

The Ruby Web Servers Booklet describes most web server implementation strategies.

Ruby's WEBrick library gives you an easy way to build a web server.
http://www.ruby-doc.org/stdlib/libdoc/webrick/rdoc/

Related

Ruby server logging a socket's request thrice

I am writing a simple server in Ruby in order to understand the Socket module. Here is my code:
require 'socket'
s = TCPServer.new(3939)
loop do
c = s.accept
STDERR.puts c.gets
c.close
end
I simply want to print the request to the server console before closing the socket. Why does it print the request thrice, instead of just once?
If I hit that server with curl
$ curl localhost:3939
I get an empty reply
curl: (52) Empty reply from server
and a single GET request
GET / HTTP/1.1
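The loop in the question reads only the first line of the request with c.gets and then closes the socket without writing anything, which is consistent with curl reporting an empty reply. A minimal sketch of reading the whole request head instead, assuming the head ends at the first blank line; read_request_head is a hypothetical helper, and a StringIO stands in for the socket so the snippet runs without a network connection:

```ruby
require 'stringio'

# Hypothetical helper: collects lines up to and including the blank line
# ("\r\n") that terminates the request head.
def read_request_head(io)
  lines = []
  while (line = io.gets)
    lines << line
    break if line == "\r\n"
  end
  lines
end

# A StringIO plays the role of the accepted socket here.
fake_socket = StringIO.new("GET / HTTP/1.1\r\nHost: localhost:3939\r\n\r\n")
head = read_request_head(fake_socket)
STDERR.puts head.first
```

With a real connection, the same helper could be called as read_request_head(c) right after s.accept.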

Scrapy failed to find Xpath that Nokogiri found

I recently started working on a website that needs to crawl products from several stores/sites.
I am fairly new to Python and Scrapy, in which the original code was written, so when testing crawlers and XPaths I use Scrapy and also open another console to test with Nokogiri (a Ruby gem).
On one particular site, I failed to extract some content using Scrapy, but I found that I can get this content from the same URL with the same XPath using Nokogiri.
Here are the code snippets used in both cases:
Scrapy
yield Request(product_url,headers={'User-Agent':'curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3'}, callback=self.parse_item)
def parse_item(self, response):
    script = response.xpath('//script[contains(text(),"var ProductViewJSON")]')
    yield {
        'url': response.url,
        'script length': len(script),
        'script': script,
    }
It produces the following result:
{"url": "http://www.pullandbear.com/eg/en/man/accessories/pack-of-3-assorted-bracelets-c29537p100036212.html", "script length": 0, "script": []},
Nokogiri
require 'nokogiri'
require 'open-uri'
html_data = open('http://www.pullandbear.com/eg/en/man/accessories/pack-of-3-assorted-bracelets-c29537p100036212.html', 'User-Agent' => 'curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3').read
nokogiri_object = Nokogiri::HTML(html_data)
script = nokogiri_object.xpath('//script[contains(text(),"var ProductViewJSON")]')
script.length # produces 1
Can anybody help me explain this? Please note that this Scrapy code used to work; I have just been told that it has stopped, and the main problem was the need to add the headers.
I hope I was clear enough. Thanks for your interest :)
Edit
I tried parsing the URL from the Scrapy shell, using the same User-Agent as the spider's request and Nokogiri's. It worked for me and found the element matching the XPath, but it still does not work within the spider.
The cause for this is the User-Agent you use.
I tried the site with a simple scrapy shell (with the default User-Agent) and I get the following response:
>>> response.body
'<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD><BODY>\n<H1>Access Denied</H1>\n \nYou don\'t have permission to access "http://www.pullandbear.com/eg/en/man/accessories/pack-of-3-assorted-bracelets-c29537p100036212.html" on this server.<P>\nReference #18.3f496768.1453197808.1ef09a53\n</BODY>\n</HTML>\n'
So change the User-Agent in your Request (or set it once through Scrapy's settings) and you should be able to gather your information.
As you can see, the server returns an Access Denied page for User-Agents that are not browsers, such as your cURL agent.
If I start the shell with the following User-Agent:
'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36'
and execute your XPath, I get the following result:
>>> response.xpath('//script[contains(text(),"var ProductViewJSON")]')
[<Selector xpath='//script[contains(text(),"var ProductViewJSON")]' data=u'<script type="text/javascript">\r\n\tvar Pr'>]

Ruby fcgi with spawn-fcgi closes when a page is requested

I have a Ruby script called server.rb that I open using spawn-fcgi -a 127.0.0.1 -p 9001 /bin/ruby server.rb.
Running sudo netstat -lnptu | grep :9001 tells me that ruby is listening.
I have also set nginx up to pass .rb files to 127.0.0.1:9001.
But once I request a .rb file:
ruby disappears from netstat
nginx returns a 502 error (Bad Gateway)
This gets printed to the console: 2015/06/02 14:47:37 [error] 1852#0: *30 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: localhost, request: "GET /ruby/ HTTP/1.1", upstream: "fastcgi://127.0.0.1:9001", host: "localhost", referrer: "http://localhost/"
server.rb
require "rubygems"
require "fcgi"
loop FCGI.each do |request|
File.write("test.txt", "Loading file #{__FILE__}!")
request.out.print "Content-Type: text/plain\\n\\nHello from #{__FILE__}"
request.finish
end
The problem is in:
\\n\\n
You are escaping the backslashes, so the output contains no actual newlines. The client receives the whole line, treats it all as a header, and then closes the connection.
Replace it with:
request.out.print "Content-Type: text/plain\n\nHello from #{__FILE__}"
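The difference between the two string literals can be checked in plain Ruby, without FastCGI. A small sketch (the variable names are only for illustration):

```ruby
escaped = "Content-Type: text/plain\\n\\nHello" # backslash followed by the letter n
correct = "Content-Type: text/plain\n\nHello"   # two real newline characters

# The escaped version contains no newline at all, so there is no blank
# line to separate the header from the body.
escaped_has_newline = escaped.include?("\n")

# The correct version splits cleanly into a header part and a body part.
header, body = correct.split("\n\n", 2)

puts escaped
puts header
puts body
```

Here escaped_has_newline is false, while the correct string splits into the Content-Type header and the body text.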

Ruby TCPServer fails to work sometimes

I've implemented a very simple kind of server in Ruby, using TCPServer. I have a Server class with serve method:
def serve
  # Do the actual serving in a child process
  @pid = fork do
    # Trap signal sent by #stop or by pressing ^C
    Signal.trap('INT') { exit }
    # Create a new server on port 2835 (1 ounce = 28.35 grams)
    server = TCPServer.new('localhost', 2835)
    @logger.info 'Listening on http://localhost:2835...'
    loop do
      socket = server.accept
      request_line = socket.gets
      @logger.info "* #{request_line}"
      socket.print "message"
      socket.close
    end
  end
end
and a stop method:
def stop
  @logger.info 'Shutting down'
  Process.kill('INT', @pid)
  Process.wait
  @pid = nil
end
I run my server from the command line, using:
if __FILE__ == $0
  server = Server.new
  server.logger = Logger.new(STDOUT)
  server.logger.formatter = proc { |severity, datetime, progname, msg| "#{msg}\n" }
  begin
    server.serve
    Process.wait
  rescue Interrupt
    server.stop
  end
end
The problem is that sometimes, when I run ruby server.rb from my terminal, the server starts, but requests to localhost:2835 fail. Only after several requests does it start serving pages. In other cases, I need to stop and restart the server before it serves pages properly. Why is this happening? Am I doing something wrong? I find this very weird...
The same thing applies to my specs: I have some specs defined, including some Capybara specs. Before each test I create and start a server, and after each test I stop it. The problem persists: tests sometimes pass and sometimes fail because the requested page could not be found.
Is there something fishy going on with my forking?
I would appreciate any answer, because I have nowhere else to look...
Your code is not an HTTP server. It is a TCP server that sends the string "message" over the socket after receiving a newline.
The reason that your code isn't a valid HTTP server is that it doesn't conform to the HTTP protocol. One of the many requirements of the HTTP protocol is that the server respond with a message of the form
HTTP/1.1 <code> <reason>
Here <code> is a number and <reason> is a human-readable "status", like "OK" or "Server Error" or something along those lines. The string "message" obviously does not conform to this requirement.
Here is a simple introduction to how you might build an HTTP server in Ruby: https://practicingruby.com/articles/implementing-an-http-file-server
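As a rough illustration of the status-line requirement, here is a minimal sketch that wraps a body in an HTTP/1.1 response. The helper name http_response and the choice of headers are assumptions made for this example, not something taken from the question or the linked article:

```ruby
# Hypothetical helper: builds a minimal HTTP/1.1 response consisting of a
# status line, a few headers, a blank line, and the body.
def http_response(body, code: 200, reason: 'OK')
  "HTTP/1.1 #{code} #{reason}\r\n" \
    "Content-Type: text/plain\r\n" \
    "Content-Length: #{body.bytesize}\r\n" \
    "Connection: close\r\n" \
    "\r\n" \
    "#{body}"
end

response = http_response('message')
puts response
```

In the question's loop, socket.print http_response("message") could then take the place of socket.print "message".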

How to silently start Sinatra + Thin?

I have a Sinatra::Base webservice which I want to start from a command line Ruby program, so I have this:
# command line program file
require 'mymodule/server'
puts "Running on 0.0.0.0:4567, debugging to STDOUT..."
MyModule::Server.run! bind: '0.0.0.0', port: 4567, environment: :production
This works as expected, but it prints:
$ myscript
Running on 0.0.0.0:4567, debugging to STDOUT...
== Sinatra/1.3.1 has taken the stage on 4567 for production with backup from Thin
>> Thin web server (v1.3.1 codename Triple Espresso)
>> Maximum connections set to 1024
>> Listening on 0.0.0.0:4567, CTRL+C to stop
127.0.0.1 - - [23/Dec/2011 18:44:55] "POST /images HTTP/1.1" 200 360 0.0133
...
I want it to be silent and let me output what I want. For example, if I start it without daemonizing, I want to see only a message from the command line program plus the log output, something like:
$ myscript
Running on 0.0.0.0:4567, debugging to STDOUT...
127.0.0.1 - - [23/Dec/2011 18:44:55] "POST /images HTTP/1.1" 200 360 0.0133
...
I would also like to shut it down silently, hiding:
== Sinatra has ended his set (crowd applauds)
One last question: is this the best way to start a Sinatra app with Thin from inside application code (a Ruby script in this case)?
You can turn off Sinatra logging with
set :logging, false
http://www.sinatrarb.com/configuration.html
As for whether this is the best way to start a Sinatra app, you might want to look at the "foreman" gem and its "Procfile" (which Heroku.com uses) as an example:
http://ddollar.github.com/foreman/
