I have code that works but am soliciting suggestions for improvement.
I have a file containing ruby hashes:
{"dat"=>"2013-09-01T20:40:00-07:00", "sca"=>"5", "del"=>"755", "dir"=>"S"}
{"dat"=>"2013-09-01T21:00:00-07:00", "sca"=>"5", "del"=>"459", "dir"=>"S"}
that I want to convert to JSON that is both valid and human-readable. This code is compact and produces valid JSON...
#!/usr/bin/env ruby
# expected input: file of hashes, one/line
# output: properly formatted json array
require 'json'
json_array = []
while input = ARGF.gets
input.each_line do |line|
json_array.push( eval(line) )
end
end
print json_array
puts
..but without any newlines is not easily human-readable:
[{"dat"=>"2013-09-01T20:40:00-07:00", "sca"=>"5", "del"=>"755", "dir"=>"S"}, {"dat"=>"2013-09-01T21:00:00-07:00", "sca"=>"5", "del"=>"459", "dir"=>"S"}]
Substituting
puts JSON.pretty_generate(json_array)
for the two output lines above produces valid JSON that is human-readable, but verbose:
[
{
"dat": "2013-09-01T20:40:00-07:00",
"sca": "5",
"del": "755",
"dir": "S"
},
(more lines...)
Better from a human-readability standpoint would be to have one "record" on each line:
[
{"dat":"2013-09-01T20:40:00-07:00","sca":"5","del":"755","dir":"S"},
{"dat":"2013-09-01T21:00:00-07:00","sca":"5","del":"459","dir":"S"}
]
But in order to avoid the trailing comma issue [apparently a common problem - see http://trailingcomma.com/ ] I have resorted to an ugly loop with special casing. While it accomplishes the goal, I'm not happy about it and I feel like there must be a simpler way:
#!/usr/bin/env ruby
# expected input: file of hashes, one/line
# output: properly formatted json array
require 'json'
prevHash = ""
currHash = ""
puts "["
while input = ARGF.gets
# To prevent a dangling comma on the last element of the output JSON array,
# this counter-intuitive loop always outputs the previous, not the current,
# array element with a trailing comma.
input.each_line do |currLine|
currHash = eval(currLine) # convert string to hash
if (prevHash != "") # if not first time thru
puts " " + prevHash.to_json + ","
end
prevHash = currHash
end
end
# then, finally add the last array element *without* the troublesome trailing comma
puts " " + currHash.to_json
puts "]"
Suggestions welcome, particularly those that show me the artful one-liner that I missed.
JSON.pretty_generate accepts an optional hash parameter where you can configure the generator.
A state hash can have the following keys:
indent: a string used to indent levels (default: ''),
space: a string that is put after a : or , delimiter (default: ''),
space_before: a string that is put before a : pair delimiter (default: ''),
object_nl: a string that is put at the end of a JSON object (default: ''),
array_nl: a string that is put at the end of a JSON array (default: ''),
allow_nan: true if NaN, Infinity, and -Infinity should be generated, otherwise an exception is thrown if these values are encountered. This option defaults to false.
max_nesting: The maximum depth of nesting allowed in the data structures from which JSON is to be generated. Disable depth checking with :max_nesting => false; it defaults to 19.
Playing around with that, the closest I could get to your requirement is
JSON.pretty_generate(hash, {object_nl: '', indent: ' '})
which renders to
[
{ "dat": "2013-09-01T20:40:00-07:00", "sca": "5", "del": "755", "dir": "S"},
{ "dat": "2013-09-01T21:00:00-07:00", "sca": "5", "del": "459", "dir": "S"}
]
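If the exact one-record-per-line layout matters more than using the generator options, another route is to serialize each record separately and join the results. This sidesteps the trailing-comma problem, because join only places the separator between elements. What follows is a minimal sketch, assuming the records have already been collected into an array of hashes; the `records` variable and the `one_record_per_line` helper are names made up for illustration.

```ruby
require 'json'

# Hypothetical helper: one compact JSON object per line, wrapped in [ ].
def one_record_per_line(records)
  # join inserts ",\n" only between elements, so the last line has no comma.
  body = records.map { |record| "  " + record.to_json }.join(",\n")
  "[\n#{body}\n]"
end

records = [
  { "dat" => "2013-09-01T20:40:00-07:00", "sca" => "5", "del" => "755", "dir" => "S" },
  { "dat" => "2013-09-01T21:00:00-07:00", "sca" => "5", "del" => "459", "dir" => "S" }
]

output = one_record_per_line(records)
puts output
```

The result still parses as a normal JSON array, so consumers do not need to know about the layout.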
How would you convert a string to an array in Ruby?
What I want to do is convert a string like "[value1, value2, value3]" to an array [value1, value2, value3]. Keep in mind some of these values may be strings themselves.
I am trying to write it in a method called str_to_ary.
def str_to_ary
@to_convert = self
#however everything I try beyond this point fails
end
Well, that looks like a JSON.
require 'json'
def str_to_ary
JSON.parse(@to_convert)
end
Note that this is true and works only if those string values in there are between double quotes, not single quotes.
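To illustrate the double-quote caveat, here is a small sketch using literal strings rather than the asker's instance variable. The sample inputs are assumptions chosen for the demonstration.

```ruby
require 'json'

# Double-quoted elements form valid JSON, so they parse directly.
parsed = JSON.parse('["value1", 2, "value3"]')

# Single-quoted elements are not valid JSON and raise a parse error.
single_quoted_ok =
  begin
    JSON.parse("['value1', 'value2']")
    true
  rescue JSON::ParserError
    false
  end

p parsed
p single_quoted_ok
```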
Well, if you know that [ is always in the first position and ] is always in the last position, then you can start with
string = "[X, 1, Test, 22, 3]"
trimmed = string[1,string.length-2]
array = trimmed.split(", ")
array # => ["X", "1", "Test", "22", "3"]
If you then want to cast 1, 22 or 3 into Integers, that's a different problem that requires more thought. What values are you expecting to have in the array?
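As one possible answer to the casting question, the sketch below converts tokens that look like integers and leaves everything else as strings. It assumes the values never contain commas themselves, and it takes the string as an argument instead of reading it from the receiver, which is a change from the asker's method signature.

```ruby
# Hypothetical variant of str_to_ary that takes the string as an argument.
def str_to_ary(str)
  inner = str.strip.delete_prefix("[").delete_suffix("]")
  inner.split(",").map do |token|
    token = token.strip
    # Assumption: only tokens made purely of digits (with an optional sign)
    # should become Integers; everything else stays a String.
    token.match?(/\A[+-]?\d+\z/) ? token.to_i : token
  end
end

p str_to_ary("[X, 1, Test, 22, 3]")
```

`delete_prefix` and `delete_suffix` need Ruby 2.5 or later; on older versions the slicing shown above does the same job.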
I have a list of Unicode character codes that I would like to output with rumoji. Here's the code I'm using to iterate over my data.
require "rumoji"
# this works
puts Rumoji.decode("\u{1F600}")
# feed some data
data = [
"1F600",
"1F476",
"1F474"
]
data.each do |line|
# this doesn't work
puts Rumoji.decode("\u{#{line}}")
puts Rumoji.decode("\u{" + line + "}")
end
I'm not sure how I can use variable names inside the escaped string.
You cannot use \u together with string interpolation, because the \u{...} escape is resolved when the string literal is parsed, before any interpolation runs. What you can do instead is Array#pack an array of integers:
▶ data.map { |e| e.to_i(16) }.pack 'U*'
#⇒ "😀👶👴"
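If you need one string per code point (for example, to pass each one to Rumoji.decode inside the loop from the question), a per-element conversion is another option. This is a sketch that leaves the rumoji call out so it runs without the gem.

```ruby
data = ["1F600", "1F476", "1F474"]

# Convert each hex code point to an Integer, then to a one-character UTF-8 string.
emoji = data.map { |code| code.hex.chr(Encoding::UTF_8) }

emoji.each { |char| puts char }
```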
I have a string called "example", like this:
192.168.1.40,8.8.8.8,12.34.45.56,408,-,1812
192.168.1.128,192.168.101.222,12.34.45.56,384,-,1807
and I would like to obtain this output:
{"string1":"192.168.1.40","string2":"8.8.8.8","string3":"12.34.45.56","string4":408,"string5":"-","string6":1812}
{"string1":"192.168.1.128","string2":"192.168.101.222","string3":"12.34.45.56","string4":384,"string5":"-","string6":1807}
I did this:
example = example.gsub("\n","}\n{\"string1\": \"")
example = example.insert(0, "{\"string1\": \"")
example = example.concat("}")
and I obtained:
{"string1":"192.168.1.40,8.8.8.8,12.34.45.56,408,-,1812}
{"string1":"192.168.1.128,192.168.101.222,12.34.45.56,384,-,1807}
but I don't know how to make the other changes. Thanks!!
Well, to get it as a ruby hash, which you can output as json or whatever you need:
out = {}
your_input_data.split(",").each_with_index { |val, i| out["string#{i + 1}"] = val }
(but you would need to do this for each line: input.lines.each { |line| ... do the above here } - but I am not clear - do you want a list of maps?)
I made the assumption that you didn't want values that were just numbers to be double-quoted.
DATA.each_line do |line|
l = line.chomp.split(',').map.with_index do |v, i|
v = v =~ /^\d+$/ ? v : "\"#{v}\""
"\"string#{i+1}\":#{v}"
end
print "{", l.join(','), "}\n"
end
__END__
192.168.1.40,8.8.8.8,12.34.45.56,408,-,1812
192.168.1.128,192.168.101.222,12.34.45.56,384,-,1807
Result:
{"string1":"192.168.1.40","string2":"8.8.8.8","string3":"12.34.45.56","string4":408,"string5":"-","string6":1812}
{"string1":"192.168.1.128","string2":"192.168.101.222","string3":"12.34.45.56","string4":384,"string5":"-","string6":1807}
It seems from the code you wrote that you are looking for a single string as output rather than a more elaborate Ruby data structure or output to a printed stream.
This is working for me:
example = '192.168.1.40,8.8.8.8,12.34.45.56,408,-,1812
192.168.1.128,192.168.101.222,12.34.45.56,384,-,1807'
result = example.split("\n").map do |line|
n = 0
line.split(',').map{|s| %Q|"string#{n+=1}":"#{s}"|}.join(',')
end.map{|c| "{#{c}}"}.join("\n")
puts result
{"string1":"192.168.1.40","string2":"8.8.8.8","string3":"12.34.45.56","string4":"408","string5":"-","string6":"1812"}
{"string1":"192.168.1.128","string2":"192.168.101.222","string3":"12.34.45.56","string4":"384","string5":"-","string6":"1807"}
This splits the input into lines, then splits each line into separate strings, then concatenates each string with its JSON key, and finally reassembles everything with join, first with commas and then with newlines. If you'd rather have lists than reassembled strings, just omit the respective join.
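The answers above build the JSON text by hand. A variation is to build a Ruby hash per line and let the json library do the quoting and escaping. The sketch below assumes, like one of the answers, that fields made only of digits should become JSON numbers; `line_to_json` is a hypothetical helper name.

```ruby
require 'json'

# Hypothetical helper: turn one comma-separated line into a JSON object string.
def line_to_json(line)
  pairs = line.chomp.split(",").each_with_index.map do |value, index|
    # Assumption: fields consisting only of digits should become JSON numbers.
    ["string#{index + 1}", value.match?(/\A\d+\z/) ? value.to_i : value]
  end
  pairs.to_h.to_json
end

example = "192.168.1.40,8.8.8.8,12.34.45.56,408,-,1812\n" \
          "192.168.1.128,192.168.101.222,12.34.45.56,384,-,1807"

result = example.each_line.map { |line| line_to_json(line) }.join("\n")
puts result
```

Letting `to_json` produce the text means values containing quotes or backslashes are escaped correctly, which the string-concatenation versions do not handle.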
I want to parse the following text and find lines that start with '+' or '-':
--- a/product.json
+++ b/product.json
@@ -1,4 +1,4 @@
{
- "name": "Coca Cola",
- "barcode": "41324134132"
+ "name": "Sprite",
+ "barcode": "41324134131"
}
\ No newline at end of file
When I find such a line, I want to store the attribute name. I.e., for:
- "name": "Coca Cola",
I want to store name in minus_array.
You want to iterate over the lines, and find lines that begin with either - or + followed by whitespace:
text = %[
--- a/product.json
+++ b/product.json
@@ -1,4 +1,4 @@
{
- "name": "Coca Cola",
- "barcode": "41324134132"
+ "name": "Sprite",
+ "barcode": "41324134131"
}
\ No newline at end of file
]
text.lines.select{ |l| l.lstrip[/^[+-]\s/] }.map{ |s| s.split[1] }
# => ["\"name\":", "\"barcode\":", "\"name\":", "\"barcode\":"]
lines splits a string on line-ends, returning the entire line, including the trailing line-end character.
lstrip removes whitespace at the start of the line. This is to normalize lines allowing the regex pattern to be a bit more simple.
l.lstrip[/^[+-]\s/] is a bit of Ruby String sleight of hand that basically says to apply the pattern to the string and return the matching text. If nothing in the string matches, nil is returned, which acts as false as far as select is concerned. If the string has something that matches the pattern, [] returns the text, which acts as a true value for select, which then passes on that string.
map iterates over all elements that select passes to it, and transforms each element by splitting it on whitespace, which is the default behavior of split. [1] returns the second element of the resulting array.
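The String#[] behavior described above can be checked in isolation. The sample lines below are taken from the diff in the question.

```ruby
pattern = /^[+-]\s/

matched   = '+ "name": "Sprite",'[pattern]  # the matching text, "+ "
unmatched = '{'[pattern]                    # nil, which select treats as false
header    = '+++ b/product.json'[pattern]   # nil: "+" is followed by "+", not whitespace

p matched, unmatched, header
```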
Here's an alternate path to the same place:
ary = []
text.lines.each do |l|
i = l.strip
ary << i if i[/^\{$/] .. i[/^}$/]
end
ary[1..-2].map{ |s| s.split[1] } # => ["\"name\":", "\"barcode\":", "\"name\":", "\"barcode\":"]
That'll get you started. How to remove duplicates, strip the leading/trailing double-quotes and colon is your task.
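One possible way to finish that task is to capture the sign and the bare attribute name in a single regular expression and sort the names into two arrays. This is only a sketch: `plus_array` is an assumed counterpart to the `minus_array` mentioned in the question, and the pattern assumes attribute names never contain double quotes.

```ruby
text = <<~'DIFF'
  --- a/product.json
  +++ b/product.json
  @@ -1,4 +1,4 @@
   {
  -  "name": "Coca Cola",
  -  "barcode": "41324134132"
  +  "name": "Sprite",
  +  "barcode": "41324134131"
   }
  \ No newline at end of file
DIFF

minus_array = []
plus_array  = []

text.each_line do |line|
  # Group 1 is the sign, group 2 is the attribute name without quotes or colon.
  # Requiring whitespace after the sign skips the "---" and "+++" header lines.
  match = line.match(/\A([+-])\s+"([^"]+)"\s*:/)
  next unless match

  target = match[1] == "-" ? minus_array : plus_array
  target << match[2]
end

p minus_array
p plus_array
```

Calling `uniq` on either array removes duplicate names if the same attribute appears in several hunks.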
text.split(/\n/).select { |l| l =~ /^\+./ }
If you're using file:
File.open('your_file.txt', "r").select { |l| l =~ /^\+./ }
Use group_by to group according to the first character:
groups = text.lines.group_by { |l| l[0] }
groups['-']
# => ["--- a/product.json\n", "- \"name\": \"Coca Cola\",\n", "- \"barcode\": \"41324134132\"\n"]
groups['+']
# => ["+++ b/product.json\n", "+ \"name\": \"Sprite\",\n", "+ \"barcode\": \"41324134131\"\n"]
File.readlines("file.txt").each do |line|
if line.start_with?('+ ', '- ')
words = line.split(":")
key = words[0].match(/".*"/)
val = words[1].match(/".*"/)
# You can then do what you will with the name and value here
# For example: minus_array << key[0] if line.start_with?('-')
end
end
I'm not entirely sure of the constraints you have with this, so I can't give a more specific answer. Basically, you can iterate over the lines of a file with File.readlines('file').each { }. Then we check whether the line starts with + or -, and grab the name and value accordingly. I put a space in the start_with? arguments because otherwise it would also match the top two lines of your example.
Hopefully that's what you were looking for!
I'm writing a client for a third-party API, and they provide data in a weird format. At first it might look like JSON, but it's not, and I'm a bit confused about how I should handle it.
It's a key-value based format (much like JSON).
Keys are separated by '=' from their values.
Keys and values are wrapped within double-quotes.
Dictionaries start with '{' and end with '}'.
Arrays start with '('
and end with ')'
Lines end with ';' (except for array contents) and an end-of-line character (\r, I think).
Sometimes there seems to be Unicode (stuff like \U2623 for the biohazard sign) in strings.
What could this format possibly be? Should I use a premade gem to parse it, or should I build my own parser?
{ "anArray" = (
"100",
"200",
"300"
);
"aDictionary" = {
"aString" = "Something";
};
}
EDIT: This format seems to be Apple's property list, but it's neither XML nor binary... This makes sense, as the API is from a WebObjects web service. I will try to use the CFPropertyList gem to parse it; if there is a better solution, please let me know.
EDIT 2: This is a NeXTSTEP property list.
Here's a robust answer using a custom StringScanner-based parser. It allows whitespace to be optional, allows trailing commas after the last item in a list and allows omitting the semicolon after the last dictionary key/value pair. It allows the outermost item to be a dictionary, array, or string. And it allows really any sort of legal string content, including parens and curly braces and escaped text like \n.
Seen in action:
p parse('{ "array" = ( "1", "2", ( "3", "4" ) ); "hash"={ "key"={ "more"="oh}]yes;!"; }; }; }')
#=> {"array"=>["1", "2", ["3", "4"]], "hash"=>{"key"=>{"more"=>"oh}]yes;!"}}}
puts parse('("Escaped \"Quotes\" Allowed", "And Unicode \u2623 OK")')
#=> Escaped "Quotes" Allowed
#=> And Unicode ☣ OK
The code:
require 'strscan'
def parse(str)
ss, getstr, getary, getdct = StringScanner.new(str)
getvalue = ->{
if ss.scan /\s*\{\s*/ then getdct[]
elsif ss.scan /\s*\(\s*/ then getary[]
elsif str = getstr[] then str
elsif ss.scan /\s*[)}]\s*/ then nil end
}
getstr = ->{
if str=ss.scan(/\s*"(?:[^"\\]|\\u\d+|\\.)*"\s*/i)
eval str.gsub(/([^\\](?:\\\\)*)#(?=[{@$])/,'\1\#')
end
}
getary = ->{
[].tap do |a|
while v=getvalue[]
a << v
ss.scan /\s*,\s*/
end
end
}
getdct = ->{
{}.tap do |h|
while key = getstr[]
ss.scan /\s*=\s*/
if value=getvalue[] then h[key]=value; ss.scan(/\s*;\s*/) end
end
end
}
getvalue[]
end
As an alternative to rolling your own parser from scratch in the future, you might also want to look into the Treetop Ruby library.
Edit: I've replaced the implementation of getstr above with one that should prevent running arbitrary Ruby code inside the eval. For more details, see "Eval a string without interpolation". Seen in action:
@secret = "OH NO!"
$secret = "OH NO!"
@@secret = "OH NO!"
puts parse('"\"#{:NOT&&:very}\" bad. \u262E\n#@secret \\#$secret \\\\#@@secret"')
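To see what the escaping step in getstr buys you, the substitution can be tried on its own. This sketch applies the same pattern to a made-up input string and compares evaluating it with and without the escape; the constant and variable names are invented for the demonstration.

```ruby
# Escape any "#" that would start interpolation ("#{", "#@", "#$"),
# provided it is not already preceded by an odd number of backslashes.
ESCAPE_INTERPOLATION = /([^\\](?:\\\\)*)#(?=[{@$])/

raw  = '"sum: #{1 + 1}"'                       # text as it might arrive from the API
safe = raw.gsub(ESCAPE_INTERPOLATION, '\1\#')  # "#{" becomes "\#{"

unescaped_result = eval(raw)   # interpolation runs: the embedded code is executed
escaped_result   = eval(safe)  # the "#{1 + 1}" part is kept as literal text

puts unescaped_result
puts escaped_result
```

Escaping closes the interpolation hole, but the string is still passed through eval, so this should be treated as a mitigation rather than a guarantee.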
Here's a very quick-and-dirty hack that transforms the syntax into valid Ruby and then evals it. Note that this could be dangerous. More importantly, this will convert all parentheses inside keys and values into square brackets.
def parse(str)
eval(
str
.gsub( /" = (?=[({"])/, '" => ' ) # Dictionary separators become =>
.gsub( /(?<=[)}"]); (?=[)}"])/, ', ' ) # Dictionary semicolons become ,
.tr( '()', '[]' ) # ALL parens become square brackets
)
end
p parse('{ "anArray" = ( "100", "200", "300" ); "aDictionary" = { "aString" = "Something"; }; }')
#=> {"anArray"=>["100", "200", "300"], "aDictionary"=>{"aString"=>"Something"}}
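The parenthesis caveat is easy to reproduce. The sketch below repeats the converter under the hypothetical name `quick_parse` so it can run on its own, and feeds it a made-up value that contains parentheses.

```ruby
# Copy of the quick-and-dirty converter, renamed for this demonstration.
def quick_parse(str)
  eval(
    str
      .gsub(/" = (?=[({"])/, '" => ')         # dictionary separators become =>
      .gsub(/(?<=[)}"]); (?=[)}"])/, ', ')    # dictionary semicolons become ,
      .tr('()', '[]')                         # ALL parens become square brackets
  )
end

result = quick_parse('{ "formula" = "f(x)"; }')
p result
```

The value comes back as "f[x]" rather than "f(x)", which is why the StringScanner parser is the safer choice when values may contain parentheses.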