I'm searching for a simple Ruby/Bash solution to investigate a log file, for example an Apache access log.
My log contains lines beginning with the string "authorization:".
The goal of the script is to return the whole line two lines after this match (the next but one), which contains the string "x-forwarded-for".
host: 10.127.5.12:8088^M
accept: */*^M
date: Wed, 19 Apr 2019 22:12:36 GMT^M
authorization: FOP ASC-amsterdam-b2c-v7:fkj9234f$t34g34rf=^M
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0)
x-forwarded-for: 195.99.33.222, 10.127.72.254^M
x-forwarded-host: my.luckyhost.com^M
x-forwarded-server: server.luckyhost.de^M
connection: Keep-Alive^M
^M
My question relates to the if condition.
How can I get the line number from readlines and, in a second step, return the whole line containing x-forwarded-for?
file = File.open(args[:apache_access_log], "r")
log_snapshot = file.readlines
file.close
log_snapshot.reverse_each do |line|
if line.include? "authorization:"
puts line
end
end
Maybe something along these lines:
log_snapshot.each_with_index.reverse_each do |line, n|
case (line)
when /authorization:/
puts '%d: %s' % [ n + 1, line ]
end
end
Here each_with_index generates 0-indexed line numbers, hence the n + 1 when printing. I've switched to a case statement so you have more flexibility in matching different conditions. For example, you can easily add the /i flag for a case-insensitive match, or add \A at the start of the pattern to anchor it at the beginning of the string.
Another thing to consider is using the block form of File.open, like this:
File.open(args[:apache_access_log], "r") do |f|
f.readlines.each_with_index.reverse_each do |line, n|
# ...
end
end
That eliminates the need for an explicit close call; the file is closed automatically at the end of the block.
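To get from the matched line to the x-forwarded-for line, one option is to use the index to look ahead in the array. The following is only a sketch: the method name forwarded_for_after_auth and the three-line lookahead window are assumptions made for illustration, not something from the question.

```ruby
# Return every "x-forwarded-for" line that closely follows an
# "authorization:" line. `lines` is any array of log lines,
# e.g. the result of File.readlines.
def forwarded_for_after_auth(lines)
  results = []
  lines.each_with_index do |line, n|
    next unless line.match?(/\Aauthorization:/i)

    # Look at the next three lines instead of hard-coding n + 2,
    # in case a header is missing or an extra one is present.
    candidate = lines[n + 1, 3].to_a.find { |l| l.match?(/\Ax-forwarded-for:/i) }
    results << candidate.chomp if candidate
  end
  results
end

sample = [
  "date: Wed, 19 Apr 2019 22:12:36 GMT\r\n",
  "authorization: FOP example\r\n",
  "user-agent: Mozilla/5.0\r\n",
  "x-forwarded-for: 195.99.33.222, 10.127.72.254\r\n",
  "x-forwarded-host: my.luckyhost.com\r\n"
]
puts forwarded_for_after_auth(sample)
```

The chomp call also strips the trailing carriage return (shown as ^M in the log excerpt).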
I'm writing code to search for a string in all txt files of a directory. The code works OK for 2 out of 3 files.
search = ['first', 'second', ...]
Dir["directory/*.txt"].each do |txt|
file = File.read(txt, encoding: "ISO8859-1:utf-8")
search.each do |se|
puts se if file.include? se #added to see if it finds a record - not working
file.each_line do |li|
if li.include? se
puts li # I removed everything else to see if this works - not working
end
end
end
end
As I said, it works fine with 2 of the 3 files (80 MB, 88 MB, 224 MB). I left just the 224 MB file in the directory (the one that is not working), but still nothing.
I have been searching all day but didn't find anything that helps. Why would it not work for the 224 MB file, if it has the same txt format and comes from the same source?
EDIT:
It's "not working" in the sense that it doesn't find a string that I know is there, and this only happens for the third file mentioned.
Edit2:
I did li.split("\t") and know that index 2 of the result is the column containing the search string.
Then I changed the code to:
file.each_line.with_index do |li, line|
data = li.split("\t")
if line == 3
puts data[2] #I got in console the string that i'm looking for
end
# but then when i try to use it I cant
if data[2] == search #this is false i tried change both .to_s or .to_i
puts li
end
I did another test like:
puts data[2].to_i + 1 #result is 1 when data[2] is just numbers
I downloaded the file again and tried again, but nothing seems to work. It's as if it can return the string data[2] but doesn't recognize it or can't do anything with it. And like I said, this happens in just 1 file out of 3.
[EDIT]
The problem was that the txt files were damaged at the source. Months later I tried this code again with newly generated txt files, and it worked with no issues.
Thanks all for comments and answers
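For anyone hitting similar symptoms (a field that prints correctly but fails == and gives 0 from to_i), inspecting the raw bytes can help show what is actually in the string. The sketch below uses a made-up field value with a leading UTF-8 byte order mark purely as an illustration; the damaged files in this question may have contained something else entirely, and clean_field is a hypothetical helper.

```ruby
# Hypothetical field value: it looks like "123" when printed, but it starts
# with a UTF-8 byte order mark, which is invisible in most terminals.
field = "\uFEFF123"

puts field == "123"   # false: the strings differ despite looking identical
puts field.to_i       # 0: to_i stops at the first non-digit character
p field.bytes         # the raw bytes reveal the extra leading characters
p field.encoding

# Hypothetical cleanup helper: drop a leading BOM, then trim whitespace.
def clean_field(str)
  str.sub(/\A\uFEFF/, "").strip
end

puts clean_field(field) == "123"
```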
I've seen similar issues when working with strings that exceed the threshold of some memory limitation somewhere.
I would try breaking the large files up into smaller chunks like this:
FILE_SIZE_LIMIT_IN_MB = 80
search = ['first', 'second', ...]
def read_file(path)
File.open(path, 'r') do |f|
until f.eof? do
yield f.read(FILE_SIZE_LIMIT_IN_MB * 1024 * 1024)
end
end
end
Dir["directory/*.txt"].each do |txt|
read_file(txt) do |file|
search.each do |se|
puts se if file.include? se #added to see if it finds a record - not working
file.each_line do |li|
if li.include? se
puts li # I removed everything else to see if this works - not working
end
end
end
end
end
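One caveat with fixed-size chunks is that a line (and therefore a search term) can be cut in half at a chunk boundary. A possible way around this, sketched here under the assumption that records are newline-terminated, is to carry the incomplete tail of each chunk over to the next one. The method name each_line_in_chunks is made up for this example, and the yielded lines are binary strings, so searching for non-ASCII terms would need extra encoding handling.

```ruby
require 'stringio'

# Yield complete lines from `io`, reading `chunk_size` bytes at a time.
# A partial line at the end of a chunk is kept and completed by the next chunk.
def each_line_in_chunks(io, chunk_size)
  carry = String.new              # binary buffer for the unfinished tail
  while (chunk = io.read(chunk_size))
    carry << chunk
    last_newline = carry.rindex("\n")
    next if last_newline.nil?     # no complete line yet, keep reading

    # Everything up to the last newline is complete; the rest stays in carry.
    complete = carry.slice!(0..last_newline)
    complete.each_line { |line| yield line }
  end
  yield carry unless carry.empty? # final line without a trailing newline
end

# Usage with an in-memory IO and a deliberately tiny chunk size.
io = StringIO.new("alpha\nbeta gamma\ndelta")
each_line_in_chunks(io, 4) { |line| puts line if line.include?("gamma") }
```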
It looks like you are searching line by line. If so, you can save a lot of memory overhead by reading the file line by line instead of loading it whole. In order to do that, you'll want to move the search.each loop inside the loop that reads the files. Here's my attempt:
search = ['first', 'second', ...]
Dir["directory/*.txt"].each do |txt|
File.foreach(txt, encoding: "ISO8859-1:utf-8") do |li|
search.each do |se|
puts se if li.include? se
end
end
end
The foreach method doesn't slurp in the entire file.
This doesn't work if the search string stretches across a newline barrier. If you have some other separator that would work better, you can optionally override the default:
File.foreach(txt, "\t", encoding: "ISO8859-1:utf-8") do |r| # Tab-separated records
I would like to read a CSV file with the headers: true option, but the first 5 lines of my file contain unwanted data. So I want line 6 to be the header and to start reading the file at line 6.
But when I read the file with CSV.readlines("my_file.csv", headers: true).drop(5),
it still uses line 1 as the header. How can I set line 6 as the header?
Pre-read the garbage lines before you start CSV.
require 'csv'
File.open("my_file.csv") do |f|
5.times { f.gets }
csv = CSV.new(f, headers: true)
puts csv.shift.inspect
end
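To try the same idea without a file on disk, here is a small self-contained sketch that uses StringIO in place of the file. The method name read_csv_skipping and the sample data are invented for illustration.

```ruby
require 'csv'
require 'stringio'

# Skip `skip` leading lines of `io`, then parse the remainder as CSV,
# treating the first remaining line as the header row.
def read_csv_skipping(io, skip)
  skip.times { io.gets }
  CSV.new(io, headers: true).map(&:to_h)
end

data = StringIO.new(<<~TEXT)
  junk 1
  junk 2
  name,age
  Ann,31
  Bob,27
TEXT

p read_csv_skipping(data, 2)
```

Because CSV.new starts reading from the current position of the IO object, the lines consumed by gets never reach the parser.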
Here is my solution
require 'csv'
my_header = CSV.readlines("my_file.csv").drop(5).first
CSV.readlines("my_file.csv", headers: my_header).drop(6).each do |row|
# do something with row
end
I've worked a bit with Ruby's CSV module, but am having some problems getting it to ignore multiple header lines.
Specifically, here are the first twenty lines of a file I want to parse:
USGS Digital Spectral Library splib06a
Clark and others 2007, USGS, Data Series 231.
For further information on spectrsocopy, see: http://speclab.cr.usgs.gov
ASCII Spectral Data file contents:
line 15 title
line 16 history
line 17 to end: 3-columns of data:
wavelength reflectance standard deviation
(standard deviation of 0.000000 means not measured)
( -1.23e34 indicates a deleted number)
----------------------------------------------------
Olivine GDS70.a Fo89 165um W1R1Bb AREF
copy of splib05a r 5038
0.205100 -1.23e34 0.090781
0.213100 -1.23e34 0.018820
0.221100 -1.23e34 0.005416
0.229100 -1.23e34 0.002928
The actual headers are given on the tenth line, and the seventeenth line is where the actual data start.
Here's my code:
require "nyaplot"
# Note that DataFrame basically just inherits from Ruby's CSV module.
class SpectraHelper < Nyaplot::DataFrame
class << self
def from_csv filename
df = super(filename, col_sep: ' ') do |csv|
csv.convert do |field, info|
STDERR.puts "Field is #{field}"
end
end
end
end
def csv_headers
[:wavelength, :reflectance, :standard_deviation]
end
end
def read_asc filename
f = File.open(filename, "r")
16.times do
line = f.gets
puts "Ignoring #{line}"
end
d = SpectraHelper.from_csv(f)
end
The output suggests that my calls to f.gets are not actually ignoring those lines, and I can't understand why. Here are the first few lines of output:
Field is Clark
Field is and
Field is others
Field is 2007,
Field is USGS,
I tried looking for a tutorial or example which shows processing of more complicated CSV files, but haven't had much luck. If someone could point me towards a resource which answers this question, I would be grateful (and would prefer to mark that as accepted over a solution to my specific problem — but both would be appreciated).
Using Ruby 2.1.
I believe that you are using ::open, which uses IO.open. This method will open the file again.
I modified the script a bit
require 'csv'
class SpectraHelper < CSV
def self.from_csv(filename)
df = open(filename, 'r' , col_sep: ' ') do |csv|
csv.drop(16).each {|c| p c}
end
end
end
def read_asc(filename)
SpectraHelper.from_csv(filename)
end
read_asc "data/csv1.csv"
It turns out the problem here was not with my understanding of CSV, but rather with how Nyaplot::DataFrame handles CSV files.
Basically, Nyaplot doesn't actually store things as CSVs. CSV is just an intermediate format. So a simple way to handle the files makes use of #khelli's suggestion:
def read_asc filename
Nyaplot::DataFrame.new(CSV.open(filename, 'r',
col_sep: ' ',
headers: [:wavelength, :reflectance, :standard_deviation],
converters: :numeric).
drop(16).
map do |csv_row|
csv_row.to_h.delete_if { |k,v| k.nil? }
end)
end
Thanks, everyone, for the suggestions.
I wouldn't use the CSV module, since your file is not well formatted. The following code will read the file and give you an array of your records:
lines = File.readlines(filename)
lines.slice!(0,16)
records = lines.map {|line| line.chomp.split}
The records output:
[["0.205100", "-1.23e34", "0.090781"], ["0.213100", "-1.23e34", "0.018820"], ["0.221100", "-1.23e34", "0.005416"], ["0.229100", "-1.23e34", "0.002928"]]
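If numeric values are needed rather than strings, the fields can be converted afterwards. The sketch below is one possible approach: the constant DELETED and the method parse_records are names made up here, and mapping the -1.23e34 marker to nil is an assumption based on the file header's note that it "indicates a deleted number".

```ruby
# Marker the file header describes as "a deleted number".
DELETED = -1.23e34

# Turn whitespace-separated text lines into rows of Floats,
# replacing the deleted-number marker with nil.
def parse_records(lines)
  lines.map do |line|
    line.split.map do |field|
      value = Float(field)   # raises ArgumentError on non-numeric input
      value == DELETED ? nil : value
    end
  end
end

rows = parse_records([
  "0.205100 -1.23e34 0.090781\n",
  "0.213100 -1.23e34 0.018820\n"
])
p rows
```

Float() is used instead of to_f so that a stray header line raises an error rather than silently becoming 0.0.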
I'm writing a small webserver. I want to read the HTTP Request. It works when there is no body involved. But when a body is sent then I can't read the content of the body in a satisfying manner.
I read the data coming from the client via TCPSocket. The TCPSocket::gets method reads until the data for the body is received. There is no delimiter or EOF sent to signal the end of the HTTP Request body. The HTTP/1.1 Specification - Section 4.4 lists five cases to get the message length. Point 1) works. Points 2) and 4) are not relevant for my application. Point 5) is not an option because I need to send a response.
I can read the value of the Content-Length field. But when I try to "persuade" the TCPSocket to read the last part of the HTTP Request via read(contentlength) or rcv(contentlength), I have no success. Reading line-by-line until the \r\n which separates Header and Body works, but after that I'm stuck - at least in the way I want to do it.
So my questions are:
Is there a possibility to do it the way I intended in the code?
Are there better ways to achieve my goal of reading the HTTP Request correctly (which I really hope for)?
Here is runnable code. The part that I want to work is commented out.
#!/usr/bin/ruby
require 'socket'
server = TCPServer.new 2000
loop do
Thread.start(server.accept) do |client|
hascontent = false
contentlength = 0
content = ""
request = ""
#This seems to work, but I'm not really happy with it, too much is happening in
#the loop
while(buf = client.readpartial(4096))
request = request + buf
split = request.split("\r\n")
puts request
puts request.dump
puts split.length
puts split.inspect
if(request.index("\r\n\r\n")>0)
break
end
end
#This part is commented out because it doesn't work
=begin
while(line = client.gets)
puts ":" + line
request = request + line
if(line.start_with?("Content-Length"))
hascontent = true
split = line.split(' ')
contentlength = split[1]
end
if(line == "\r\n" and !hascontent)
break
end
if(line == "\r\n" and hascontent)
puts "Trying to get content :P"
puts contentlength
puts content.length
puts client.inspect
#tried read, with and without parameter, rcv, also with and
#without param and their nonblocking couterparts
#where does my thought process go in the wrong direction
while(readin = client.readpartial(contentlength))
puts readin
content = content + readin
end
break
end
end
=end
puts request
client.close
end
end
So... I have just had this issue for the past 2 hours also, and so I did some digging into the Socket API. Turns out Socket extends BasicSocket which has a method recvmsg. When I tried calling it I got the following:
["GET / HTTP/1.1\r\nHost: localhost:12357\r\nConnection: keep-alive\r\nCache-Control: max-age=0\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate, br\r\nAccept-Language: en-US,en;q=0.9\r\n\r\n", #<Addrinfo: empty-sockaddr SOCK_STREAM>, 0]
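I.e. the complete HTTP request, the sender's address information, and any flags that were raised.
You can use recvmsg to read the request. Note that it returns whatever data has arrived so far, so this yields the entire HTTP request only when the request arrives in one piece: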
raw_request = client.recvmsg.first # recvmsg returns an array; the data is its first element
request = /(?<METHOD>\w+) \/(?<RESOURCE>[^ ]*) HTTP\/1.\d\r\n(?<HEADERS>(.+\r\n)*)(?:\r\n)?(?<BODY>(.|\s)*)/i.match(raw_request)
p request["BODY"]
I have no idea how to do it without recvmsg but I am glad the functionality exists.
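For what it's worth, the approach from the question (read the header lines, then read exactly Content-Length bytes) can also be made to work without recvmsg. Two things seem to go wrong in the commented-out code: contentlength is still a String when it is passed to readpartial, and the inner while loop keeps reading after the body has been consumed. The following is a rough sketch under the assumption that the body length is given by Content-Length only (no chunked transfer encoding); the method name read_http_request is invented, and it accepts any IO so it can be tried with StringIO instead of a socket.

```ruby
require 'stringio'

# Read one HTTP request from `io` (a TCPSocket, or a StringIO for testing).
# Returns [request_line, headers_hash, body_string].
def read_http_request(io)
  request_line = io.gets("\r\n").to_s.chomp
  headers = {}
  while (line = io.gets("\r\n"))
    line = line.chomp
    break if line.empty?          # blank line ends the header section

    name, value = line.split(":", 2)
    headers[name.strip.downcase] = value.to_s.strip
  end

  # Convert the header value to an Integer before using it as a byte count.
  length = headers.fetch("content-length", "0").to_i
  body = length.positive? ? io.read(length).to_s : ""
  [request_line, headers, body]
end

raw = "POST /form HTTP/1.1\r\nHost: localhost\r\nContent-Length: 5\r\n\r\nhello"
request_line, headers, body = read_http_request(StringIO.new(raw))
puts request_line
puts headers.inspect
puts body
```

On a real socket, io.read(length) blocks until that many bytes have arrived or the client closes the connection, so the response can be sent as soon as it returns.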
Hi, I have a simple form that allows a user to input a name, their gender and a password. I use Digest::MD5.hexdigest to hash the input. Once I have the hashed input, e.g. d1c261ede46c1c66b7e873564291ebdc, I want to append it to a file I have already created. However, everything I have tried just isn't working. Can anyone please help? Thank you in advance. Here is what I have:
input = STDIN.read( ENV["CONTENT_LENGHT"] )
puts "Content-type: text/html \n\n"
require 'digest/md5'
digest = Digest::MD5.hexdigest(input)
f = File.open("register.txt", "a")
f.write(digest)
f.close
I have also tried this with no luck:
File.open("register.txt", "a") do |f|
f.puts(digest)
end
If the code is verbatim, then I think you have a typo in the first line: did you really mean CONTENT_LENGHT, or should it be CONTENT_LENGTH? With the correct name, ENV[] returns a string if the variable is set, which will upset STDIN#read: I get TypeError: can't convert String into Integer. With the misspelled name, ENV[] returns nil, which tells STDIN#read to read until EOF, which from the console means, I think, Control-Z (on Windows; Control-D on Unix-like systems). That might be causing a problem.
I suggest you investigate by modifying your script thus:
read_length = ENV["CONTENT_LENGTH"].to_i # assumed typo fixed, convert to integer
puts "read length = #{read_length}"
input = STDIN.read( read_length )
puts "input = #{input}"
puts "Content-type: text/html \n\n" # this seems to serve no purpose
require 'digest/md5'
digest = Digest::MD5.hexdigest(input)
puts "digest = #{digest}"
# prefer this version: it's more idiomatically "Rubyish"
File.open("register.txt", "a") do |f|
puts "file opened"
f.puts(digest)
end
file_content = File.read("register.txt")
puts "done, file content = #{file_content}"
This works on my machine, with the following output (when CONTENT_LENGTH set to 12):
read length = 12
abcdefghijkl
input = abcdefghijkl
Content-type: text/html
digest = 9fc9d606912030dca86582ed62595cf7
file opened
done, file content = 6cfbc6ae37c91b4faf7310fbc2b7d5e8
e271dc47fa80ddc9e6590042ad9ed2b7
b0fb8772912c4ac0f13525409c2b224e
9fc9d606912030dca86582ed62595cf7