I have a log file and need to create a hash key for each URL in it. Each line of the log has been placed into an array, and I am looping through the array assigning hash keys.
I need to get from this:
"2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"
to this:
"/logschecks/scripts/setup1.php"
I have tried using match, scan and split, but they have all failed to get me where I need to go.
My method currently looks like:
def pathHistogram (rowsInFile)
  i = 0
  urlHash = Hash.new
  while i <= rowsInFile.length - 1
    urlKey = rowsInFile[i].scan(/<"GET ">/).last.first
    if urlHash.has_key?(urlKey) == true
      # get the number of stars already in there and add one.
      urlHash[urlKey] = urlHash[urlKey] + '*'
      i = i + 1
    else
      urlHash[urlKey] = '*'
      i = i + 1
    end
  end
end
I know that just scanning for "GET " won't complete the job, but I was trying to baby-step through it. The match and split versions I tried were fairly epic fails, but I was likely using them incorrectly and they are long gone.
Running this script gives me an undefined method error on "first", though I have gotten other errors when I vary the way this is handled.
I should also say I am not married to using scan. If another method would work better, I would be more than happy to switch.
Any help would be greatly appreciated.
You state in a comment on the other answer that the pattern is basically "GET ... HTTP", where you are interested in the ... part. That can be extracted very easily:
line = '2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"'
line[/"GET (.*?) HTTP/, 1]
# => "/logschecks/scripts/setup1.php"
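If you want to fold that straight into the histogram you were building, here is a minimal, untested sketch that counts with integers instead of your string of stars (rows_in_file is assumed to be the array of log lines):
def path_histogram(rows_in_file)
  url_hash = Hash.new(0)                 # unseen keys start with a count of 0
  rows_in_file.each do |row|
    url_key = row[/"GET (.*?) HTTP/, 1]  # extract the path from the request
    url_hash[url_key] += 1 if url_key    # skip lines that have no GET request
  end
  url_hash
end
If you really do want the strings of stars, url_hash[url_key] = (url_hash[url_key] || '') + '*' with a plain Hash.new works the same way.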
Assuming each of your input lines contains /logschecks/...:
x = "2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: \"GET /logschecks/scripts/setup1.php HTTP/1.1\", host: \"www.example.com\""
x[%r(/logscheck[/\w\.]+)] # => "/logschecks/scripts/setup1.php"
Scanning HTTP logs isn't hard, but how you go about it will vary depending on the format. In the sample you've given it's easier than a standard log because you have some landmarks to look for:
Search for request: " using something like:
/request: "\S+ (\S+)/i
That pattern will skip over GET, POST, HEAD or whatever method was used for the request.
log_line[/request: "\S+ (\S+)/i, 1] # => "/logschecks/scripts/setup1.php"
You might want to know the request method if you're mining your logs. In that case...
Search for request: "[GET|POST|HEAD|...] using something like:
/request: "(\S+) (\S+)/i
You'd use it like:
method, url = log_line.match(/request: "(\S+) (\S+)/i).captures # => ["GET", "/logschecks/scripts/setup1.php"]
method # => "GET"
url # => "/logschecks/scripts/setup1.php"
You can also grab whatever is inside the double-quotes, then split it to get at the parts:
/request: "([^"]+)"/i
For instance:
log_line = %[2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"]
method, url, http_ver = log_line[/request: "([^"]+)"/i, 1].split # => ["GET", "/logschecks/scripts/setup1.php", "HTTP/1.1"]
method # => "GET"
url # => "/logschecks/scripts/setup1.php"
http_ver # => "HTTP/1.1"
Or use a slightly more complex pattern with named captures, one of the modern regex extensions, and reduce the code:
log_line = %[2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"]
/request: "(?<method>\S+) (?<url>\S+) (?<http_ver>\S+)"/i =~ log_line
method # => "GET"
url # => "/logschecks/scripts/setup1.php"
http_ver # => "HTTP/1.1"
I am trying to create a custom fact I can use as the value for a class parameter in a hiera yaml file.
I am using the openstack/puppet-keystone module and I want to use fernet-keys.
According to the comments in the module I can use this parameter.
# [*fernet_keys*]
# (Optional) Hash of Keystone fernet keys
# If you enable this parameter, make sure enable_fernet_setup is set to True.
# Example of valid value:
# fernet_keys:
# /etc/keystone/fernet-keys/0:
# content: c_aJfy6At9y-toNS9SF1NQMTSkSzQ-OBYeYulTqKsWU=
# /etc/keystone/fernet-keys/1:
# content: zx0hNG7CStxFz5KXZRsf7sE4lju0dLYvXdGDIKGcd7k=
# Puppet will create a file per key in $fernet_key_repository.
# Note: defaults to false so keystone-manage fernet_setup will be executed.
# Otherwise Puppet will manage keys with File resource.
# Defaults to false
So I wrote this custom fact ...
[root@puppetmaster modules]# cat keystone_fernet/lib/facter/fernet_keys.rb
Facter.add(:fernet_keys) do
  setcode do
    fernet_keys = {}
    puts('Debug keyrepo is /etc/keystone/fernet-keys')
    Dir.glob('/etc/keystone/fernet-keys/*').each do |fernet_file|
      data = File.read(fernet_file)
      if data
        content = {}
        puts("Debug Key file #{fernet_file} contains #{data}")
        fernet_keys[fernet_file] = { 'content' => data }
      end
    end
    fernet_keys
  end
end
Then in my keystone.yaml file I have this line:
keystone::fernet_keys: '%{::fernet_keys}'
But when I run puppet agent -t on my node I get this error:
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, "{\"/etc/keystone/fernet-keys/1\"=>{\"content\"=>\"xxxxxxxxxxxxxxxxxxxx=\"}, \"/etc/keystone/fernet-keys/0\"=>{\"content\"=>\"xxxxxxxxxxxxxxxxxxxx=\"}}" is not a Hash. It looks to be a String at /etc/puppetlabs/code/environments/production/modules/keystone/manifests/init.pp:1144:7 on node mgmt-01
I had assumed that I had formatted the hash correctly because facter -p fernet_keys output this on the agent:
{
  /etc/keystone/fernet-keys/1 => {
    content => "xxxxxxxxxxxxxxxxxxxx="
  },
  /etc/keystone/fernet-keys/0 => {
    content => "xxxxxxxxxxxxxxxxxxxx="
  }
}
The code in the keystone module looks like this (with line numbers)
1142
1143   if $fernet_keys {
1144     validate_hash($fernet_keys)
1145     create_resources('file', $fernet_keys, {
1146       'owner'     => $keystone_user,
1147       'group'     => $keystone_group,
1148       'subscribe' => 'Anchor[keystone::install::end]',
1149     }
1150     )
1151   } else {
Puppet does not necessarily think your fact value is a string -- it might, if the client is set to stringify facts, but that's actually beside the point. The bottom line is that Hiera interpolation tokens don't work the way you think. Specifically:
Hiera can interpolate values of any of Puppet's data types, but the value will be converted to a string.
(Emphasis added.)
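If the value has to reach the class as a real hash, one possible workaround (not part of the quoted documentation, and untested) is to bypass Hiera interpolation entirely and pass the structured fact to the class from a wrapper manifest, along these lines:
# Hypothetical wrapper/profile class: the fact keeps its Hash type because it
# never passes through Hiera string interpolation.
class profile::keystone {
  class { '::keystone':
    fernet_keys => $facts['fernet_keys'],
  }
}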
My idea is to use Rack to load a web application separated into two parts. This is the structure:
my_app
|______api
| |______poros
|
|______fe
| |______build
| |______node_modules
| |______package.json
| |______public
| | |______ index.html
| | ...
| |______README.md
| |______src
| |______index.js
| ...
|______config.ru
The fe part (which stands for front-end) is a React application created with create-react-app. I've built it with
npm run build
and then its static optimized build is under the my_app/fe/build directory.
This is my my_app/config.ru:
#\ -w -p 3000
require 'emeraldfw'
use Rack::Reloader, 0
use Rack::ContentLength
run EmeraldFW::Loader.new
And this is my EmeraldFW::Loader class, which is part of a gem that is installed and running fine.
module EmeraldFW
  class Loader
    def call(env)
      path_info = (env['PATH_INFO'] == '/') ? '/index.html' : env['PATH_INFO']
      extension = path_info.split('/').last.split('.').last
      file = File.read("fe/build#{path_info}")
      [200, { 'Content-Type' => content_type[extension] }, [file]]
    end

    def content_type
      {
        'html' => 'text/html',
        'js'   => 'text/javascript',
        'css'  => 'text/css',
        'svg'  => 'image/svg+xm',
        'ico'  => 'image/x-icon',
        'map'  => 'application/octet-stream'
      }
    end
  end
end
As you may see, this is all quite simple. I capture the request in my EmeraldFW::Loader class and transform its path_info a bit. If the request is '/' I rewrite it to '/index.html' before doing anything else. In all cases I prepend fe/build so the file is loaded from the static build of the React application.
When I run
rackup config.ru
and load the application at http://localhost:3000 the result is completely fine:
[2017-03-15 21:28:23] INFO WEBrick 1.3.1
[2017-03-15 21:28:23] INFO ruby 2.3.3 (2016-11-21) [x86_64-linux]
[2017-03-15 21:28:23] INFO WEBrick::HTTPServer#start: pid=11728 port=3000
::1 - - [15/Mar/2017:21:28:27 -0300] "GET / HTTP/1.1" 200 386 0.0088
::1 - - [15/Mar/2017:21:28:27 -0300] "GET /static/css/main.9a0fe4f1.css HTTP/1.1" 200 623 0.0105
::1 - - [15/Mar/2017:21:28:27 -0300] "GET /static/js/main.91328589.js HTTP/1.1" 200 153643 0.0086
::1 - - [15/Mar/2017:21:28:28 -0300] "GET /static/media/logo.5d5d9eef.svg HTTP/1.1" 200 2671 0.0036
::1 - - [15/Mar/2017:21:28:28 -0300] "GET /static/js/main.91328589.js.map HTTP/1.1" 200 1705922 0.1079
::1 - - [15/Mar/2017:21:28:28 -0300] "GET /static/css/main.9a0fe4f1.css.map HTTP/1.1" 200 105 0.0021
As you may see, all resources are loading correctly, with the correct MIME types. But my default React logo, which should be spinning on the app's front page, is not there! It is as if it had never been loaded.
The big picture here is to have this Rack middleware load the React front-end of the app and, at the same time, correctly redirect the API requests made with fetch from the front-end to the poros (Plain Old Ruby Objects) that make up the API part.
The concept is quite simple. But I just can't understand why this specific resource, the svg logo, is not loading.
It was all a matter of a wrong MIME type, caused by a typo.
It may be clearly seen in my question that I wrote:
def content_type
  {
    'html' => 'text/html',
    'js'   => 'text/javascript',
    'css'  => 'text/css',
    'svg'  => 'image/svg+xm', # <== Here is the error!!!
    'ico'  => 'image/x-icon',
    'map'  => 'application/octet-stream'
  }
end
when the correct MIME type for SVG files is 'image/svg+xml', with an 'l' at the end...
This was making the browser ignore the file it received, and so it would not display it.
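With the missing letter restored the entry becomes:
'svg' => 'image/svg+xml',
As an aside, Rack already ships with a MIME lookup table, so a hand-rolled hash like this could be replaced with something along the lines of Rack::Mime.mime_type(File.extname(path_info)), which returns image/svg+xml for .svg files.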
I'm having issues getting data from GitHub Archive.
The main issue is my problem with encoding {} and .. in my URL. Maybe I am misreading the GitHub API or not understanding encoding correctly.
require 'open-uri'
require 'faraday'

conn = Faraday.new(:url => 'http://data.githubarchive.org/') do |faraday|
  faraday.request :url_encoded            # form-encode POST params
  faraday.response :logger                # log requests to STDOUT
  faraday.adapter Faraday.default_adapter # make requests with Net::HTTP
end

# query = '2015-01-01-15.json.gz' # this one works!!
query = '2015-01-01-{0..23}.json.gz' # this one doesn't work
encoded_query = URI.encode(query)
response = conn.get(encoded_query)
p response.body
The GitHub Archive example for retrieving a range of files is:
wget http://data.githubarchive.org/2015-01-01-{0..23}.json.gz
The {0..23} part is being interpreted by wget itself as a range of 0 .. 23. You can test this by executing that command with the -v flag which returns:
wget -v http://data.githubarchive.org/2015-01-01-{0..1}.json.gz
--2015-06-11 13:31:07-- http://data.githubarchive.org/2015-01-01-0.json.gz
Resolving data.githubarchive.org... 74.125.25.128, 2607:f8b0:400e:c03::80
Connecting to data.githubarchive.org|74.125.25.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2615399 (2.5M) [application/x-gzip]
Saving to: '2015-01-01-0.json.gz'
2015-01-01-0.json.gz 100%[===========================================================================================================================================>] 2.49M 3.03MB/s in 0.8s
2015-06-11 13:31:09 (3.03 MB/s) - '2015-01-01-0.json.gz' saved [2615399/2615399]
--2015-06-11 13:31:09-- http://data.githubarchive.org/2015-01-01-1.json.gz
Reusing existing connection to data.githubarchive.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 2535599 (2.4M) [application/x-gzip]
Saving to: '2015-01-01-1.json.gz'
2015-01-01-1.json.gz 100%[===========================================================================================================================================>] 2.42M 867KB/s in 2.9s
2015-06-11 13:31:11 (867 KB/s) - '2015-01-01-1.json.gz' saved [2535599/2535599]
FINISHED --2015-06-11 13:31:11--
Total wall clock time: 4.3s
Downloaded: 2 files, 4.9M in 3.7s (1.33 MB/s)
In other words, wget is substituting values into the URL and then getting that new URL. This isn't obvious behavior, nor is it well documented, but you can find mention of it "out there". For instance in "All the Wget Commands You Should Know":
7. Download a list of sequentially numbered files from a server
wget http://example.com/images/{1..20}.jpg
To do what you want, you need to iterate over the range in Ruby using something like this untested code:
0.upto(23) do |i|
  response = conn.get("/2015-01-01-#{ i }.json.gz")
  p response.body
end
To get a better idea of what's going wrong, let's start with the example given in the GitHub documentation:
wget http://data.githubarchive.org/2015-01-01-{0..23}.json.gz
The thing to note here is that {0..23} is automagically getting expanded by bash. You can see this by running the following command:
echo {0..23}
> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
This means wget doesn't get called just once, but instead gets called a total of 24 times. The problem you're having is that Ruby doesn't automagically expand {0..23} like bash does, and instead you're making a literal call to http://data.githubarchive.org/2015-01-01-{0..23}.json.gz, which doesn't exist.
Instead you will need to loop through 0..23 yourself and make a single call every time:
(0..23).each do |n|
  query = "2015-01-01-#{n}.json.gz"
  encoded_query = URI.encode(query)
  response = conn.get(encoded_query)
  p response.body
end
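Since each file is gzipped, newline-delimited JSON, you will also want to decompress the body before parsing it. A minimal sketch using only the standard library (not part of the original answer) might look like:
require 'zlib'
require 'stringio'

# Each line of the decompressed archive file is a separate JSON event.
events = Zlib::GzipReader.new(StringIO.new(response.body)).read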
I am using the plivo search number api to list telephone numbers by selected parameters.
The ruby code is:
get '/search_numbers' do
  p = RestAPI.new(AUTH_ID, AUTH_TOKEN)
  params = {'country_iso' => 'GB'}
  response = p.search_phone_number(params)
end
and the error I receive is:
88.106.106.78 - - [22/Jan/2015:15:02:59 +0000] "GET /search_numbers HTTP/1.0" 200 - 1.7485
2015-01-22 15:02:59 +0000: Read error: #<NoMethodError: undefined method `bytesize' for ["api_id", "c025f0cc-a247-11e4-b153-22000abcaa64"]:Array>
I simply cannot figure out what is wrong. I can see no typos, the parameter being passed is correct and in the correct format, and the function is the right one to use.
Help, please!
How can I disable messages from webrick echoed on to the terminal? For the INFO messages that appear at the beginning, I was able to disable it by setting the Logger parameter so as:
s = WEBrick::HTTPServer.new(
  Port: 3000,
  BindAddress: "localhost",
  Logger: WEBrick::Log.new("/dev/null"),
)
But I further want to disable the messages that look like:
localhost - - [17/Jun/2011:10:01:38 EDT] "GET .... HTTP/1.1" 200 0
http://localhost:3000/ -> .....
when a request is made from the web browser.
Following the link to the source and the suggestion provided by Yet Another Geek, I was able to figure out a way: set the AccessLog parameter to [] (changed from [nil, nil] following a suggestion by Robert Watkins).
s = WEBrick::HTTPServer.new(
  Port: 3000,
  BindAddress: "localhost",
  Logger: WEBrick::Log.new("/dev/null"),
  AccessLog: [],
)