Logstash filter fails to convert character encoding (Ruby)

We have a log in the GB18030 charset, and we want to convert it to UTF-8 when it is shipped from Logstash to Elasticsearch. After reading the documentation on the Logstash website, we found that a filter could do this for us. I had never used Ruby before, so I copied a filter from the Logstash website and modified it a little.
# Call this file 'foo.rb' (in logstash/filters, as above)
require "logstash/filters/base"
require "logstash/namespace"
require "logstash/environment"
class LogStash::Filters::Foo < LogStash::Filters::Base
# Setting the config_name here is required. This is how you
# configure this filter from your logstash config.
#
# filter {
# foo { ... }
# }
config_name "foo"
attr_accessor :logger
# New plugins should start life at milestone 1.
milestone 1
public
def register
# nothing to do
end # def register
public
def filter(event)
# return nothing unless there's an actual filter event
return unless filter?(event)
# Replace the event message with our message as configured in the
# config file.
a = event["message"].force_encoding("GB18030")
$stdout.write("\n origin:"+a+"\n")
$stdout.write("\n encoded:" + a.encode("UTF-8", "GB18030") + "\n\n")
event["message"] = event["message"].encode("UTF-8","GB18030")
# filter_matched should go in the last line of our successful code
filter_matched(event)
end # def filter
end # class LogStash::Filters::Foo
Here's the screen output
origin:\xA1\xA2\xD3\xFE]\xCA\xFD\xBE\xDD\xD5\xFD\xD4\xD3\xCA\xFD\xBE\xE2\xBC\xD3
encoded:\xA1\xA2\xD3\xFE]\xCA\xFD\xBE\xDD\xD5\xFD\xD4\xD3\xCA\xFD\xBE\xE2\xBC\xD3
It seems the "encode" call didn't work. When I tried encoding the string above in the Ruby 2.1.5 interpreter, it worked fine.
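The conversion the filter attempts can be checked in isolation, outside Logstash. The following is a minimal sketch in plain Ruby, assuming the interpreter ships the GB18030 transcoder (as the Ruby 2.1.5 test above suggests). The sample text and the helper name gb18030_to_utf8 are made up for illustration and are not part of the filter.

```ruby
# Sample text written with \u escapes so the literal is UTF-8 regardless
# of the source file encoding (illustrative text, not taken from the log).
original = "\u6570\u636E"

# Simulate what arrives from the log: GB18030 bytes with no encoding label.
raw = original.encode("GB18030").force_encoding("BINARY")

# Hypothetical helper: label the bytes as GB18030, then transcode to UTF-8.
# dup keeps the caller's string untouched, since force_encoding mutates.
def gb18030_to_utf8(bytes)
  bytes.dup.force_encoding("GB18030").encode("UTF-8")
end

converted = gb18030_to_utf8(raw)
puts converted.encoding          # the encoding label after conversion
puts converted == original       # true if the round trip preserved the text
```

If this round trip succeeds in the interpreter but the filter output still shows escaped bytes, the difference is more likely in how the string reaches or leaves the filter than in the encode call itself.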

Minitest reports the wrong line number when an assertion fails inside a block

I have written an assertion that collects new records created while it yields to a block. Here's an example, with a failing assertion inside that block:
product =
assert_latest_record Product do # line 337
post :create,
:product => { ... }
assert false # line 340
end
The source of my assertion is below, but I don't think it's relevant. It does not intercept Minitest exceptions, or even call rescue or ensure.
The problem occurs when an assertion inside that block fails. The fault diagnostic message reports the line number as 337, the line of the outer assertion, not 340, the line of the inner assertion that failed. This matters if, for example, my colleagues have written a run-on test with far too many lines in it; isolating the failing line becomes more difficult.
Why doesn't Minitest report the correct line number?
The source:
##
# When a test case calls methods that write new ActiveModel records to a database,
# sometimes the test needs to assert those records were created, by fetching them back
# for inspection. +assert_latest_record+ collects every record in the given model or
# models that appear while its block runs, and returns either a single record or a ragged
# array.
#
# ==== Parameters
#
# * +models+ - At least 1 ActiveRecord model or association.
# * +message+ - Optional string or ->{block} to provide more diagnostics at failure time.
# * <code>&block</code> - Required block to call and monitor for new records.
#
# ==== Example
#
# user, email_addresses =
# assert_latest_record User, EmailAddress, ->{ 'Need moar records!' } do
# post :create, ...
# end
# assert_equal 'franklyn', user.login # 1 user, so not an array
# assert_equal 2, email_addresses.size
# assert_equal 'franklyn@gmail.com', email_addresses.first.mail
# assert_equal 'franklyn@hotmail.com', email_addresses.second.mail
#
# ==== Returns
#
# The returned value is a set of one or more created records. The set is normalized,
# so all arrays of one item are replaced with the item itself.
#
# ==== Operations
#
# The last argument to +assert_latest_record+ can be a string or a callable block.
# At failure time the assertion adds this string or this block's return value to
# the diagnostic message.
#
# You may call +assert_latest_record+ with anything that responds to <code>.pluck(:id)</code>
# and <code>.where()</code>, including ActiveRecord associations:
#
# user = User.last
# email_address =
# assert_latest_record user.email_addresses do
# post :add_email_address, user_id: user.id, ...
# end
# assert_equal 'franklyn@philly.com', email_address.mail
# assert_equal email_address.user_id, user.id, 'This assertion is redundant.'
#
def assert_latest_record(*models, &block)
models, message = _get_latest_record_args(models, 'assert')
latests = _get_latest_record(models, block)
latests.include?(nil) and _flunk_latest_record(models, latests, message, true)
pass # Increment the test runner's assertion count
return latests.size > 1 ? latests : latests.first
end
##
# When a test case calls methods that might write new ActiveModel records to a
# database, sometimes the test must check that no records were written.
# +refute_latest_record+ watches for new records in the given class or classes
# that appear while its block runs, and fails if any appear.
#
# ==== Parameters
#
# See +assert_latest_record+.
#
# ==== Operations
#
# refute_latest_record User, EmailAddress, ->{ 'GET should not create records' } do
# get :index
# end
#
# The last argument to +refute_latest_record+ can be a string or a callable block.
# At failure time the assertion adds this string or this block's return value to
# the diagnostic message.
#
# Like +assert_latest_record+, you may call +refute_latest_record+ with anything
# that responds to <code>pluck(:id)</code> and <code>where()</code>, including
# ActiveRecord associations.
#
def refute_latest_record(*models, &block)
models, message = _get_latest_record_args(models, 'refute')
latests = _get_latest_record(models, block)
latests.all?(&:nil?) or _flunk_latest_record(models, latests, message, false)
pass
return
end
##
# Sometimes a test must detect new records without using an assertion that passes
# judgment on whether they should have been written. Call +get_latest_record+ to
# return a ragged array of records created during its block, or +nil+:
#
# user, email_addresses, posts =
# get_latest_record User, EmailAddress, Post do
# post :create, ...
# end
#
# assert_nil posts, "Don't create Post records while creating a User"
#
# Unlike +assert_latest_record+, +get_latest_record+ does not take a +message+ string
# or block, because it has no diagnostic message.
#
# Like +assert_latest_record+, you may call +get_latest_record+ with anything
# that responds to <code>.pluck(:id)</code> and <code>.where()</code>, including
# ActiveRecord associations.
#
def get_latest_record(*models, &block)
assert models.any?, 'Call get_latest_record with one or more ActiveRecord models or associations.'
refute_nil block, 'Call get_latest_record with a block.'
records = _get_latest_record(models, block)
return records.size > 1 ? records : records.first
end # Methods should be easy to use correctly and hard to use incorrectly...
def _get_latest_record_args(models, what) #:nodoc:
message = nil
message = models.pop unless models.last.respond_to?(:pluck)
valid_message = message.nil? || message.kind_of?(String) || message.respond_to?(:call)
models.length > 0 && valid_message and return models, message
raise "call #{what}_latest_record(models..., message) with any number\n" +
'of Model classes or associations, followed by an optional diagnostic message'
end
private :_get_latest_record_args
def _get_latest_record(models, block) #:nodoc:
id_sets = models.map{ |model| model.pluck(*model.primary_key) } # Sorry about your memory!
block.call
record_sets = []
models.each_with_index do |model, index|
pk = model.primary_key
set = id_sets[index]
records =
if set.length == 0
model
elsif pk.is_a?(Array)
pks = pk.map{ |k| "`#{k}` = ?" }.join(' AND ')
pks = [ "(#{pks})" ] * set.length
pks = pks.join(' OR ')
model.where.not(pks, *set.flatten)
else
model.where.not(pk => set)
end
records = records.order(pk).to_a
record_sets.push records.size > 1 ? records : records.first
end
return record_sets
end
private :_get_latest_record
def _flunk_latest_record(models, latests, message, polarity) #:nodoc:
itch_list = []
models.each_with_index do |model, index|
records_found = latests[index] != nil
records_found == polarity or itch_list << model.name
end
itch_list = itch_list.join(', ')
diagnostic = "should#{' not' unless polarity} create new #{itch_list} record(s) in block"
message = nil if message == ''
message = message.call.to_s if message.respond_to?(:call)
message = [ message, diagnostic ].compact.join("\n")
raise Minitest::Assertion, message
end
private :_flunk_latest_record
You could try disabling Minitest's backtrace filtering in test_helper.rb, so that the full backtrace is shown:
def MiniTest.filter_backtrace(backtrace)
backtrace
end
I'm not sure if this is the default, but depending on your configuration, the backtrace might not be shown.
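To see why the unfiltered backtrace is worth looking at, here is a plain-Ruby sketch (it does not use Minitest at all). It suggests that when an exception is raised inside a block, the first frame of the raw backtrace is the inner line, so a report that shows only the outer line has dropped or summarized frames somewhere along the way. The method outer_helper is a hypothetical stand-in for a custom assertion that yields to a block.

```ruby
# Hypothetical stand-in for a custom assertion that yields to its block.
def outer_helper
  yield
end

inner_line = nil
backtrace = begin
  outer_helper do
    inner_line = __LINE__ + 1        # remember the line of the raise below
    raise "failure inside the block"
  end
rescue => e
  e.backtrace                        # raw, unfiltered frames
end

puts "raised on line #{inner_line}"
puts backtrace.first(3)              # innermost frames come first
```

The first printed frame should carry the same line number as the raise, with the frames for outer_helper and its caller following it.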

How do I call a function in Ruby?

I'm trying to call a function, but I keep getting an error. This is my code:
require 'rubygems'
require 'net/http'
require 'uri'
require 'json'
class AlchemyAPI
#Setup the endpoints
@@ENDPOINTS = {}
@@ENDPOINTS['taxonomy'] = {}
@@ENDPOINTS['taxonomy']['url'] = '/url/URLGetRankedTaxonomy'
@@ENDPOINTS['taxonomy']['text'] = '/text/TextGetRankedTaxonomy'
@@ENDPOINTS['taxonomy']['html'] = '/html/HTMLGetRankedTaxonomy'
@@BASE_URL = 'http://access.alchemyapi.com/calls'
def initialize()
begin
key = File.read('C:\Users\KVadher\Desktop\api_key.txt')
key.strip!
if key.empty?
#The key file shouldn't be blank
puts 'The api_key.txt file appears to be blank, please copy/paste your API key in the file: api_key.txt'
puts 'If you do not have an API Key from AlchemyAPI please register for one at: http://www.alchemyapi.com/api/register.html'
Process.exit(1)
end
if key.length != 40
#Keys should be exactly 40 characters long
puts 'It appears that the key in api_key.txt is invalid. Please make sure the file only includes the API key, and it is the correct one.'
Process.exit(1)
end
@apiKey = key
rescue => err
#The file doesn't exist, so show the message and create the file.
puts 'API Key not found! Please copy/paste your API key into the file: api_key.txt'
puts 'If you do not have an API Key from AlchemyAPI please register for one at: http://www.alchemyapi.com/api/register.html'
#create a blank file to hold the key
File.open("api_key.txt", "w") {}
Process.exit(1)
end
end
# Categorizes the text for a URL, text or HTML.
# For an overview, please refer to: http://www.alchemyapi.com/products/features/text-categorization/
# For the docs, please refer to: http://www.alchemyapi.com/api/taxonomy/
#
# INPUT:
# flavor -> which version of the call, i.e. url, text or html.
# data -> the data to analyze, either the url, text or html code.
# options -> various parameters that can be used to adjust how the API works, see below for more info on the available options.
#
# Available Options:
# showSourceText -> 0: disabled (default), 1: enabled.
#
# OUTPUT:
# The response, already converted from JSON to a Ruby object.
#
def taxonomy(flavor, data, options = {})
unless @@ENDPOINTS['taxonomy'].key?(flavor)
return { 'status'=>'ERROR', 'statusInfo'=>'Taxonomy info for ' + flavor + ' not available' }
end
#Add the URL encoded data to the options and analyze
options[flavor] = data
return analyze(@@ENDPOINTS['taxonomy'][flavor], options)
print
end
**taxonomy(text,"trees",1)**
end
I have marked my call with ** **. Am I doing something incorrect? The error I receive is:
C:/Users/KVadher/Desktop/testrub:139:in `<class:AlchemyAPI>': undefined local variable or method `text' for AlchemyAPI:Class (NameError)
from C:/Users/KVadher/Desktop/testrub:6:in `<main>'
I feel as though I'm calling it as normal and that there is something wrong with the API code itself, although I may be wrong.
Yes, as jon snow says, the method call must be outside of the class definition. The methods are defined along with the class.
Also, options should be a Hash, not a number: the method runs options[flavor] = data, which will cause you another problem if you pass 1.
I believe you meant to put text in quotes, as that is one of your flavors.
Furthermore, because taxonomy is defined with def inside a class, it is an instance method, and you must create an instance of the class to use it:
my_instance = AlchemyAPI.new
my_taxonomy = my_instance.taxonomy("text", "trees")
That's enough to get this call to work. It seems like you have a ways to go to get everything working, though. Good luck!
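The points above can be tried on a stripped-down class. This is only a sketch: Classifier is a hypothetical stand-in for AlchemyAPI, and its taxonomy method returns the options hash instead of calling any web service.

```ruby
# Hypothetical stand-in for AlchemyAPI with the same method shape.
class Classifier
  # Instance method: defined in the class body, called on an instance.
  def taxonomy(flavor, data, options = {})
    options[flavor] = data   # works because options is a Hash
    options
  end
end

# The call lives outside the class definition, on an instance,
# with the flavor passed as a quoted string.
classifier = Classifier.new
result = classifier.taxonomy("text", "trees")
p result
```

Calling taxonomy(text, "trees", 1) inside the class body fails for three separate reasons: text is an undefined name there, the method is not available on the class itself, and 1 is not a Hash.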

What is the proper way to input options from a file? | Ruby Scripts

I'm trying to write a script that takes IP addresses from a hosts file and username info from a config file. I'm obviously not holding the file name as a proper hash value.
What should my File.new(options[:config_file], 'r').each { |params| puts params } be calling? I've tried what it is currently set to, and
File.new(config_file, 'r').each { |params| puts params }, as well as File.new(:config_file, 'r').each { |params| puts params } with no luck.
Should I be doing something different altogether? Like load(filename = nil)?
options = {}
opt_parser = OptionParser.new do |opt|
opt.banner = 'Usage: opt_parser COMMAND [OPTIONS]'
opt.on('--host_file','I need hosts, put them here') do |host_file|
options[:host_file] = host_file
end
opt.on('--config_file', 'I need config info, put it here') do |config_file|
options[:config_file] = config_file
end
opt.on('-h', '--help', 'What your looking at') do |help|
options[:help] = help
puts opt
end
end
opt_parser.parse!
if options[:config_file]
File.new(options[:config_file], 'r').each { |params| puts params }
end
if options[:host_file]
File.new(options[:host_file], 'r').each { |host| puts host }
end
Parsing the hosts file
You can write your own parser or use a gem already implementing one.
Example using the "hosts" gem: (you need to install it)
require 'hosts'
hosts = Hosts::File.read('/etc/hosts')
entries = hosts.elements.select{ |element| element.is_a? Hosts::Entry }
addresses = Hash[entries.map{ |entry| [entry.name, entry.address] }]
# You should get a hash of entry names and addresses
# {"localhost"=>"127.0.0.1", "ip6-localhost"=>"::1"}
Parsing the config file
A common way to store configuration is to use YAML files.
Considering the following YAML file (in '/tmp/config.yml'):
username: foo
password: bar
You can parse this config file using the YAML module:
require 'yaml'
config = YAML.load_file('/tmp/config.yml')
# You should get a hash of config values
# {"username"=>"foo", "password"=>"bar"}
If you don't want your password stored in plain text in a config file, you can:
ask for the password at runtime, if your context allows that
use an environment variable to store the password and retrieve it at runtime
Edit:
If you only need to extract hostnames from a text file with one hostname per line, you can use something like hostnames = IO.readlines("config.yml").map{ |line| line.chomp } to get an array of hostnames. You can then iterate through this array to do your operations.
www.ruby-doc.org/core-2.1.0/IO.html#method-i-readline
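On the original question of what options[:config_file] holds: with OptionParser, a switch declared as '--config_file' with no argument placeholder is a plain flag, so the block receives true rather than a file name. A minimal sketch of the likely fix follows, assuming the intent is to pass a path after each switch. The FILE placeholder, the parse_args wrapper and the sample file names are illustrative choices, not part of the original script.

```ruby
require 'optparse'

# Hypothetical wrapper so the parsing can be exercised with any argv array.
def parse_args(argv)
  options = {}
  OptionParser.new do |opt|
    opt.banner = 'Usage: opt_parser COMMAND [OPTIONS]'
    # The FILE placeholder tells OptionParser the switch takes a value,
    # so the block receives the path instead of true.
    opt.on('--host_file FILE', 'I need hosts, put them here') do |path|
      options[:host_file] = path
    end
    opt.on('--config_file FILE', 'I need config info, put it here') do |path|
      options[:config_file] = path
    end
  end.parse!(argv)
  options
end

options = parse_args(['--host_file', 'hosts.txt', '--config_file', 'config.yml'])
p options
```

With the paths stored as strings, File.new(options[:config_file], 'r') or YAML.load_file(options[:config_file]) should receive a usable file name.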

Jekyll - generating JSON files alongside the HTML files

I'd like to make Jekyll create an HTML file and a JSON file for each page and post. This is to offer a JSON API of my Jekyll blog - e.g. a post can be accessed either at /posts/2012/01/01/my-post.html or /posts/2012/01/01/my-post.json
Does anyone know if there's a Jekyll plugin, or how I would begin to write such a plugin, to generate two sets of files side-by-side?
I was looking for something like this too, so I learned a bit of ruby and made a script that generates JSON representations of Jekyll blog posts. I’m still working on it, but most of it is there.
I put this together with Gruntjs, Sass, Backbonejs, Requirejs and Coffeescript. If you like, you can take a look at my jekyll-backbone project on Github.
# encoding: utf-8
#
# Title:
# ======
# Jekyll to JSON Generator
#
# Description:
# ============
# A plugin for generating JSON representations of your
# site content for easy use with JS MVC frameworks like Backbone.
#
# Author:
# ======
# Jezen Thomas
# jezenthomas@gmail.com
# http://jezenthomas.com
module Jekyll
require 'json'
class JSONGenerator < Generator
safe true
priority :low
def generate(site)
# Converter for .md > .html
converter = site.getConverterImpl(Jekyll::Converters::Markdown)
# Iterate over all posts
site.posts.each do |post|
# Encode the HTML to JSON
hash = { "content" => converter.convert(post.content)}
title = post.title.downcase.tr(' ', '-').delete("’!")
# Start building the path
path = "_site/dist/"
# Add categories to path if they exist
if (post.data['categories'].class == String)
path << post.data['categories'].tr(' ', '/')
elsif (post.data['categories'].class == Array)
path << post.data['categories'].join('/')
end
# Add the sanitized post title to complete the path
path << "/#{title}"
# Create the directories from the path
FileUtils.mkpath(path) # no-op if the directory already exists
# Create the JSON file and inject the data (the block form closes the file)
File.open("#{path}/raw.json", "w") do |f|
f.puts JSON.generate(hash)
end
end
end
end
end
There are two ways you can accomplish this, depending on your needs. If you want to use a layout to accomplish the task, then you want to use a Generator. You would loop through each page of your site and generate a new .json version of the page. You could optionally make which pages get generated conditional upon the site.config or the presence of a variable in the YAML front matter of the pages. Jekyll uses a generator to handle slicing blog posts up into indices with a given number of posts per page.
The second way is to use a Converter (same link, scroll down). The converter will allow you to execute arbitrary code on your content to translate it to a different format. For an example of how this works, check out the markdown converter that comes with Jekyll.
I think this is a cool idea!
Take a look at JekyllBot and the following code.
require 'json'
module Jekyll
class JSONPostGenerator < Generator
safe true
def generate(site)
site.posts.each do |post|
render_json(post,site)
end
site.pages.each do |page|
render_json(page,site)
end
end
def render_json(post, site)
#add `json: false` to YAML to prevent JSONification
if post.data.has_key? "json" and !post.data["json"]
return
end
path = post.destination( site.source )
#only act on post/pages index in /index.html
return if /\/index\.html$/.match(path).nil?
#change file path
path['/index.html'] = '.json'
#render post using no template(s)
post.render( {}, site.site_payload)
#prepare output for JSON
post.data["related_posts"] = related_posts(post,site)
output = post.to_liquid
output["next"] = output["next"].id unless output["next"].nil?
output["previous"] = output["previous"].id unless output["previous"].nil?
#write
#todo, figure out how to overwrite post.destination
#so we can just use post.write
FileUtils.mkdir_p(File.dirname(path))
File.open(path, 'w') do |f|
f.write(output.to_json)
end
end
def related_posts(post, site)
related = []
return related unless post.instance_of?(Post)
post.related_posts(site.posts).each do |post|
related.push :url => post.url, :id => post.id, :title => post.to_liquid["title"]
end
related
end
end
end
Both should do exactly what you want.
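Stripped of the Jekyll-specific calls, both plugins do the same two things: derive a .json path next to the .html one, and write the serialized data there. The sketch below shows just that part in plain Ruby, writing into a temporary directory. The helper names json_path_for and write_json_sibling and the sample post data are hypothetical, and nothing here is Jekyll's API.

```ruby
require 'json'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: map an HTML output path to its JSON sibling.
def json_path_for(html_path)
  html_path.sub(/\.html\z/, '.json')
end

# Hypothetical helper: write the given data as JSON next to the HTML file,
# creating intermediate directories as needed.
def write_json_sibling(html_path, data)
  path = json_path_for(html_path)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, JSON.generate(data))
  path
end

# Usage with made-up post data, in a throwaway directory.
Dir.mktmpdir do |site_dir|
  html = File.join(site_dir, 'posts/2012/01/01/my-post.html')
  json = write_json_sibling(html, 'title' => 'My Post', 'content' => '<p>Hello</p>')
  puts File.read(json)
end
```

Inside a real generator, the data hash would come from the post object and the base directory from the site's destination, as the two plugins above do in their own ways.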

understanding Ruby code?

I was wondering if anyone can help me understand the Ruby code below? I'm pretty new to Ruby programming and am having trouble understanding what each function does.
When I run this with my Twitter username and password as parameters, I get a stream of Twitter feed samples. What do I need to do with this code to only display the hashtags?
I'm trying to gather the hashtags every 30 seconds, then sort from least to most occurrences of the hashtags.
Not looking for solutions, but for ideas. Thanks!
require 'eventmachine'
require 'em-http'
require 'json'
usage = "#{$0} <user> <password>"
abort usage unless user = ARGV.shift
abort usage unless password = ARGV.shift
url = 'https://stream.twitter.com/1/statuses/sample.json'
def handle_tweet(tweet)
return unless tweet['text']
puts "#{tweet['user']['screen_name']}: #{tweet['text']}"
end
EventMachine.run do
http = EventMachine::HttpRequest.new(url).get :head => { 'Authorization' => [ user, password ] }
buffer = ""
http.stream do |chunk|
buffer += chunk
while line = buffer.slice!(/.+\r?\n/)
handle_tweet JSON.parse(line)
end
end
end
puts "#{tweet['user']['screen_name']}: #{tweet['text']}"
That line shows you a user name followed by the content of the tweet.
Let's take a step back for a sec.
Hash tags appear inside the tweet's content--this means they're inside tweet['text']. A hash tag always takes the form of a # followed by a bunch of non-space characters. That's really easy to grab with a regex. Ruby's core API facilitates that via String#scan. Example:
"twitter is short #foo yawn #bar".scan(/\#\w+/) # => ["#foo", "#bar"]
What you want is something like this:
def handle_tweet(tweet)
return unless tweet['text']
# puts "#{tweet['user']['screen_name']}: #{tweet['text']}" # OLD
puts tweet['text'].scan(/\#\w+/).to_s
end
tweet['text'].scan(/#\w+/) is an array of strings. You can do whatever you want with that array. Supposing you're new to Ruby and want to print the hash tags to the console, here's a brief note about printing arrays with puts:
puts array # => "#foo\n#bar"
puts array.to_s # => '["#foo", "#bar"]'
#Load Libraries
require 'eventmachine'
require 'em-http'
require 'json'
# Looks like this section assumes you're calling this from commandline.
usage = "#{$0} <user> <password>" # $0 returns the name of the program
abort usage unless user = ARGV.shift # Return first argument passed when program called
abort usage unless password = ARGV.shift
# The URL
url = 'https://stream.twitter.com/1/statuses/sample.json'
# method which, when called later, prints out the tweets
def handle_tweet(tweet)
return unless tweet['text'] # Ensures tweet object has 'text' property
puts "#{tweet['user']['screen_name']}: #{tweet['text']}" # write the result
end
# Create an HTTP request obj to URL above with user authorization
EventMachine.run do
http = EventMachine::HttpRequest.new(url).get :head => { 'Authorization' => [ user, password ] }
# Initiate an empty string for the buffer
buffer = ""
# Read the stream by line
http.stream do |chunk|
buffer += chunk
while line = buffer.slice!(/.+\r?\n/) # cut each line at newline
handle_tweet JSON.parse(line) # send each tweet object to handle_tweet method
end
end
end
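The code above is a commented version of what the source is doing. If you just want the hashtags, you'll want to rewrite handle_tweet to something like this:
def handle_tweet(tweet)
return unless tweet['text'] # skip objects without text
tweet['text'].scan(/#\w+/) do |tag| # each hashtag in the tweet text
puts tag
end
end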
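For the counting and sorting part of the question, one possible idea (not a full solution) is to tally tags in a Hash with a default of 0 and sort the pairs by count. The sketch below works on a plain array of tweet texts; count_hashtags is a hypothetical helper, and folding tags to lowercase is an assumption about how duplicates should be treated.

```ruby
# Hypothetical helper: count hashtags across tweet texts and return
# [tag, count] pairs sorted from least to most frequent.
def count_hashtags(texts)
  counts = Hash.new(0)                       # missing tags start at 0
  texts.each do |text|
    text.scan(/#\w+/) { |tag| counts[tag.downcase] += 1 }
  end
  counts.sort_by { |tag, n| [n, tag] }       # ties broken alphabetically
end

sample = ["twitter is short #foo yawn #bar", "#Foo again"]
p count_hashtags(sample)   # => [["#bar", 1], ["#foo", 2]]
```

In the streaming script, handle_tweet could append each tweet's text to a shared array, and a timer such as EventMachine.add_periodic_timer(30) could print the sorted counts and clear the array; treat that wiring as a suggestion to adapt rather than tested code.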
