Hi, I converted a PDF to a text file in Ruby 1.9.3.
Here is part of the text file:
[["Rate", "Card", "February", "29,", "2012"]]
[["Termination", "Color", "Test", "No", "Rate", "Currency", "Notes"]]
[["x", "A", "CAMEL", "56731973573", "$", "0.1400", "USD", "30/45/100%"]]
[["y", "A", "CARDINAL", "56731972501", "$", "0.1400", "USD", "30/45/100%"]]
[["z", "A", "CARNELIAN", "56731971654", "$", "0.1400", "USD", "30/45/100%"]]
.....
....
[["Rate", "Card", "February", "29,", "2012"]]
[["Termination", "Color", "Test", "No", "Rate", "Currency", "Notes"]]
I store every line in a separate array. The problem is that I don't want to read the first two lines, which appear many times in my text file because they are the header of every page in the PDF. Any idea how to do that? Thanks!
You can read the file into an array and reject the lines you do not need. Note that File.readlines keeps the trailing newline on each line, so chomp it before comparing:
rejected = [
'[["Rate", "Card", "February", "29,", "2012"]]',
'[["Termination", "Color", "Test", "No", "Rate", "Currency", "Notes"]]',
]
lines = File.readlines('/path/to/file').map(&:chomp).reject { |line| rejected.include? line }
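A self-contained sketch of the same idea, run against in-memory lines instead of a real file. The helper name strip_page_headers and the sample array are made up for illustration; the header strings are copied from the question.

```ruby
# Header lines that repeat on every page (copied from the sample in the question).
PAGE_HEADERS = [
  '[["Rate", "Card", "February", "29,", "2012"]]',
  '[["Termination", "Color", "Test", "No", "Rate", "Currency", "Notes"]]'
].freeze

# Hypothetical helper: strips the trailing newline from each line, then drops
# every line that exactly matches a known page header.
def strip_page_headers(lines, headers = PAGE_HEADERS)
  lines.map(&:chomp).reject { |line| headers.include?(line) }
end

# Stand-in for File.readlines('/path/to/file'); each element keeps its "\n".
sample = [
  %([["Rate", "Card", "February", "29,", "2012"]]\n),
  %([["Termination", "Color", "Test", "No", "Rate", "Currency", "Notes"]]\n),
  %([["x", "A", "CAMEL", "56731973573", "$", "0.1400", "USD", "30/45/100%"]]\n),
  %([["Rate", "Card", "February", "29,", "2012"]]\n)
]

data_lines = strip_page_headers(sample)
```

Only the CAMEL row should remain in data_lines. Exact string matching is the simplest option; if the header text varies between pages (a different date, say), a regular expression would be a more forgiving test.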
How can I get iso code and name from Money::Currency.all in Ruby
How can I get iso code and name from Money::Currency.table in Ruby
Money::Currency.all is an Array; I tried map, but it didn't work.
Money::Currency.table is a Hash, and I haven't found a method that handles it.
This is Money::Currency.all:
{
"id": "usd",
"alternate_symbols": [
"US$"
],
"decimal_mark": ".",
"disambiguate_symbol": "US$",
"html_entity": "$",
"iso_code": "USD",
"iso_numeric": "840",
"name": "United States Dollar",
"priority": 1,
"smallest_denomination": 1,
"subunit": "Cent",
"subunit_to_unit": 100,
"symbol": "$",
"symbol_first": true,
"thousands_separator": ","
},
{
"id": "eur",
"alternate_symbols": [],
"decimal_mark": ",",
"disambiguate_symbol": null,
"html_entity": "€",
"iso_code": "EUR",
"iso_numeric": "978",
"name": "Euro",
"priority": 2,
"smallest_denomination": 1,
"subunit": "Cent",
"subunit_to_unit": 100,
"symbol": "€",
"symbol_first": true,
"thousands_separator": "."
},
This is Money::Currency.table:
"aed": {
"priority": 100,
"iso_code": "AED",
"name": "United Arab Emirates Dirham",
"symbol": "د.إ",
"alternate_symbols": [
"DH",
"Dhs"
],
"subunit": "Fils",
"subunit_to_unit": 100,
"symbol_first": false,
"html_entity": "",
"decimal_mark": ".",
"thousands_separator": ",",
"iso_numeric": "784",
"smallest_denomination": 25
},
This works with money-6.6.1 (the one I currently use). Probably works for the newer versions too.
Money::Currency.all.map { |m| [m.iso_code, m.name] } # => [["USD", "United States Dollar"], ...]
Money::Currency.table.values.map { |m| [m[:iso_code], m[:name]] }
# or
Money::Currency.table.values.map { |m| [m["iso_code"], m["name"]] }
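To try the Hash variant without the gem installed, here is a minimal sketch using a hand-built stand-in for Money::Currency.table. The symbol keys and the trimmed-down entries are assumptions for illustration; check the key type in your version of the gem.

```ruby
# Stand-in for Money::Currency.table. Symbol keys are an assumption here,
# matching the first form shown above; entries are trimmed to a few fields.
table = {
  usd: { iso_code: "USD", name: "United States Dollar", priority: 1 },
  eur: { iso_code: "EUR", name: "Euro", priority: 2 }
}

# [iso_code, name] pairs, in the hash's insertion order
pairs = table.values.map { |c| [c[:iso_code], c[:name]] }

# iso_code => name lookup built from the same values
names_by_code = table.values.each_with_object({}) do |c, lookup|
  lookup[c[:iso_code]] = c[:name]
end
```

The pairs form suits a select box; the lookup hash is handier when you already have a code and need its name.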
You can use pluck to get values directly from an ActiveRecord collection.
In your case you can do:
Money::Currency.all.pluck(:iso_code, :name)
You can read more about pluck from here
I am using the following expression:
jsonPath($, "$array.map({id: value.get('id'), type: value.get('type') })")
It produces the array below, but the key (id) is not kept unique:
[{
"id": "1",
"type": "1"
},
{
"id": "1",
"type": "2"
},
{
"id": "2",
"type": "1"
}]
What can I use in the SnapLogic expression language, or which snap, to get the following array with unique ids?
[{
"id": "1",
"types": ["1", "2"]
},
{
"id": "2",
"types": ["1"]
}]
Any ideas?
Use Group By Fields snap to group based on id and then use a simple mapper to create the desired JSON. Please note that you have to sort the incoming documents by id before doing the group by.
Sample Pipeline
Final Mapper expressions
$groupBy.id mapped to id
jsonPath($, "$groups[*].type") mapped to types
Resulting output
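For comparison, the same group-then-collect logic can be sketched in plain Ruby. This only illustrates the transformation on the sample data from the question; it is not SnapLogic expression syntax.

```ruby
require 'json'

# Sample input copied from the question
input = JSON.parse('[{"id":"1","type":"1"},{"id":"1","type":"2"},{"id":"2","type":"1"}]')

# Group documents by id, then collect each group's type values into an array
grouped = input.group_by { |doc| doc["id"] }.map do |id, docs|
  { "id" => id, "types" => docs.map { |doc| doc["type"] } }
end
```

Unlike the Group By Fields snap, Ruby's group_by does not need the input sorted first, because it builds the whole hash in memory.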
I have quite a large array of hashes (stored in @hash["response"]["results"]) returned by my program in JSON format.
I have seen several examples on Stack Overflow on how to convert a simple hash to CSV format, however I haven't been able to find any complex examples of doing it with a larger dataset.
I would like to use the hash keys ("pluginID", "ip", "pluginName", etc.) as the CSV headers and the hash values ("11112", "100.100.100.100", "Name for plugin here", etc.) for the CSV row content.
Note that the "repository" key is a hash itself and for that I'd like to just use the name, as opposed to the ID or description.
Any help is greatly appreciated. I have played with some code samples following the Ruby CSV standard library instructions but I am not even getting close.
@hash = '{
"type": "regular",
"response": {
"Records": "137",
"rRecords": 137,
"startOffset": "0",
"endOffset": "500",
"matchingDataElementCount": "-1",
"results": [
{ "pluginID": "11112",
"ip": "100.100.100.100",
"pluginName": "Name for plugin here",
"firstSeen": "1444208776",
"lastSeen": "1451974232",
"synopsis": "synopsis contents",
"description": "Full description would go here... Full description would go here... Full description would go here... Full description would go here... Full description would go here...",
"solution": "",
"version": "Revision: 1.51",
"pluginText": "output text here",
"dnsName": "name",
"repository": {
"id": "1",
"name": "Name Here As Well",
"description": "Description here also"
},
"pluginInfo": "11112 (0/6) Name for plugin here"
},
{ "pluginID": "11113",
"ip": "100.100.100.100",
"pluginName": "Name for plugin here",
"firstSeen": "1444455329",
"lastSeen": "1451974232",
"synopsis": "Tsynopsis contents",
"description": "Full description would go here... Full description would go here... Full description would go here... Full description would go here... Full description would go here...",
"solution": "",
"version": "Revision: 1.51",
"pluginText": "output text here",
"dnsName": "name here",
"repository": {
"id": "1",
"name": "Name Here As Well",
"description": "Description here also"
},
"pluginInfo": "11112 (0/6) Name for plugin here"
},
{ "pluginID": "11113",
"ip": "100.100.100.100",
"pluginName": "Name for plugin here : Passed",
"firstSeen": "1444455329",
"lastSeen": "1444455329",
"synopsis": "nope, more synopsis data here",
"description": "Uanother different description",
"solution": "",
"version": "Revision: 1.14",
"pluginText": "",
"dnsName": "name here",
"repository": {
"id": "1",
"name": "Name Here As Well",
"description": "Description here also"
},
"pluginInfo": "11114 (0/6) Name for plugin here : Passed"
},
{ "pluginID": "11115",
"ip": "100.100.100.100",
"pluginName": "Name for plugin here",
"firstSeen": "1444455329",
"lastSeen": "1444455329",
"synopsis": "Tsynopsis contents",
"description": "Full description would go here... Full description would go here... Full description would go here... Full description would go here... Full description would go here...",
"solution": "",
"version": "Revision: 1.51",
"pluginText": "output text here",
"dnsName": "",
"repository": {
"id": "1",
"name": "Name Here As Well",
"description": "Description here also"
},
"pluginInfo": "11116 (0/6) Name for plugin here"
}
]
},
"code": 0,
"msg": "",
"msg_det": [],
"time": 1454733549
}'
This is pretty easy. There are essentially five steps:
1. Parse the JSON into a Ruby Hash.
2. Get the key names from the first hash in the "results" array† and write them to the CSV file as headers.
3. Iterate over the "results" array and for each hash:
   a. Replace the "repository" hash with its "name" value.
   b. Extract the values in the same order as the headers and write them to the CSV file.
The code looks something like this:
require 'json'
require 'csv'
json = '{
"type": "regular",
"response": {
...
},
...
}'
# Parse the JSON
hash = JSON.parse(json)
# Get the Hash we're interested in
results = hash['response']['results']
# Get the key names to use as headers
headers = results[0].keys
filename = "/path/to/output.csv"
CSV.open(filename, 'w', headers: :first_row) do |csv|
# Write the headers to the CSV
csv << headers
# Iterate over the "results" hashes
results.each do |result|
# Replace the "repository" hash with its "name" value
result['repository'] = result['repository']['name']
# Get the values in the same order as the headers and write them to the CSV
csv << result.values_at(*headers)
end
end
†This code (headers = results[0].keys) assumes that the first "results" hash will have all of the keys you want in the CSV. If that's not the case you need to either:
Specify the headers explicitly, e.g.:
headers = %w[ pluginID ip pluginName ... ]
Loop over all of the hashes and build a list of all of their keys:
headers = results.reduce([]) {|all_keys, result| all_keys | result.keys }
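Putting the pieces together, here is a compact sketch that builds the CSV as a string with CSV.generate rather than writing a file. The two records are trimmed-down versions of the data in the question, and the second one carries an extra "dnsName" key to show the union-of-keys approach filling in blanks.

```ruby
require 'json'
require 'csv'

# Trimmed-down sample in the same shape as the question's data
json = '{"response":{"results":[
  {"pluginID":"11112","ip":"100.100.100.100","repository":{"id":"1","name":"Name Here As Well"}},
  {"pluginID":"11113","ip":"100.100.100.100","dnsName":"name here","repository":{"id":"1","name":"Name Here As Well"}}
]}}'

results = JSON.parse(json)['response']['results']

# Replace "repository" with its name, without mutating the parsed data
rows = results.map { |r| r.merge('repository' => r.dig('repository', 'name')) }

# Union of all keys, in first-seen order
headers = rows.reduce([]) { |all_keys, row| all_keys | row.keys }

csv_text = CSV.generate do |csv|
  csv << headers
  # values_at returns nil for a missing key, which CSV writes as an empty field
  rows.each { |row| csv << row.values_at(*headers) }
end
```

Using merge instead of assigning to result['repository'] keeps the original hashes intact, which matters if they are reused after the export.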
I used a solution like this:
require 'csv'
stats_rows = @hash["response"]["results"].each_with_object([]) do |e, memo|
memo << [e["pluginID"], e["ip"], e["pluginName"]]
end
CSV.generate do |csv|
csv << ["pluginID", "ip", "pluginName"] # put your hash keys into the CSV as headers
stats_rows.each do |row| #values
csv << row
end
end
I am working with an API which accepts some JSON objects (sent as post request) and fails others based on certain criteria.
I am trying to compile a "log" of the objects which have failed and ones which have been validated successfully so I don't have to manually copy and paste them each time. (There are hundreds of objects).
Basically if the API returns "false", I want to push that object into a file, and if it returns true, all those objects go into another file.
I have tried to read a bunch of documentation / blogs on "select, detect, reject" etc enumerators but my problem is very different from the examples given.
I have written some pseudo code in my ruby file below and I think I'm going along the right lines, but need a bit of guidance to complete the task:
restaurants = JSON.parse File.read('pretty-minified.json')
restaurants.each do |restaurant|
create_response = HTTParty.post("https://api.hailoapp.com/business/create",
{
:body => restaurant.to_json,
:headers => { "Content-Type" => "text", "Accept" => "application/x-www-form-urlencoded", "Authorization" => "token #{api_token}" }
})
data = create_response.to_hash
alert = data["valid"]
if alert == false
# select restaurant json objects which return false and push into new file
# false_rest = restaurants.detect { |r| r == false }
File.open('false_objects.json', 'w') do |file|
file << JSON.pretty_generate(false_rest)
else
# select restaurant json objects which return true and push into another file
File.open('true_objects.json', 'w') do |file|
file << JSON.pretty_generate()
end
end
An example of the output (JSON) from the API is as follows:
{"id":"102427","valid":true}
{"valid":false}
The JSON file is basically a huge array of hashes (or objects); here is a short excerpt:
[
{
"id": "223078",
"name": "3 South Place",
"phone": "+442032151270",
"email": "3sp@southplacehotel.com",
"website": "",
"location": {
"latitude": 51.5190536,
"longitude": -0.0871038,
"address": {
"line1": "3 South Place",
"line2": "",
"line3": "",
"postcode": "EC2M 2AF",
"city": "London",
"country": "UK"
}
}
},
{
"id": "210071",
"name": "5th View Bar & Food",
"phone": "+442077347869",
"email": "waterstones.piccadilly@elior.com",
"website": "http://www.5thview.com",
"location": {
"latitude": 51.5089594,
"longitude": -0.1359897,
"address": {
"line1": "Waterstone's Piccadilly",
"line2": "203-205 Piccadilly",
"line3": "",
"postcode": "W1J 9HA",
"city": "London",
"country": "UK"
}
}
},
{
"id": "239971",
"name": "65 & King",
"phone": "+442072292233",
"email": "hello@65king.com",
"website": "http://www.65king.com/",
"location": {
"latitude": 51.5152533,
"longitude": -0.1916538,
"address": {
"line1": "65 Westbourne Grove",
"line2": "",
"line3": "",
"postcode": "W2 4UJ",
"city": "London",
"country": "UK"
}
}
}
]
Assuming you want to filter by emails ending with elior.com (the condition is easy to change):
Note: the data above looks like a JavaScript literal, not a valid Ruby object. I assume you received it as a string, hence the JSON parsing:
require 'json'
array = JSON.parse(restaurants) # restaurants is a string: '[{....... as you received it
result = array.group_by do |e|
# more sophisticated condition goes here
e['email'] =~ /elior\.com$/ ? true : false
end
File.open('false_objects.json', 'w') do |file|
file << JSON.pretty_generate(result[false])
end
File.open('true_objects.json', 'w') do |file|
file << JSON.pretty_generate(result[true])
end
result is now a hash with two keys:
#⇒ {
# true: [..valids here ..],
# false: [..invalids here..]
# }
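If the split should follow the API's "valid" flag, as in the question, rather than a property of the record itself, the same idea can be sketched with partition. The lambda fake_api is a hypothetical stand-in for the HTTParty.post call (its "valid" rule is invented so the example runs offline), and the two records are trimmed from the excerpt above.

```ruby
require 'json'

# Hypothetical stand-in for the HTTP request: takes a restaurant hash and
# returns a parsed response shaped like the API output in the question.
# The rule used here (non-empty website means valid) is invented.
fake_api = lambda do |restaurant|
  if restaurant['website'].to_s.empty?
    { 'valid' => false }
  else
    { 'id' => restaurant['id'], 'valid' => true }
  end
end

restaurants = [
  { 'id' => '223078', 'name' => '3 South Place', 'website' => '' },
  { 'id' => '210071', 'name' => '5th View Bar & Food', 'website' => 'http://www.5thview.com' }
]

# partition makes a single pass, so the API is called once per restaurant
accepted, rejected = restaurants.partition { |r| fake_api.call(r)['valid'] == true }

true_json  = JSON.pretty_generate(accepted)
false_json = JSON.pretty_generate(rejected)

# Write each file once, after all calls have finished:
# File.write('true_objects.json', true_json)
# File.write('false_objects.json', false_json)
```

Collecting first and writing afterwards avoids reopening each file in 'w' mode inside the loop, which would overwrite it on every iteration.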
I'm currently downloading a ton of Jira issues to generate a report. The 'full data' file has a ton of individual records like this:
{
"key": "645",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Crash when saving document",
"closedDate": "2014-10-03T09:01:23.000+0200",
"flag": null,
"fixVersionID": "123",
"fixVersionName": "2.7"
}
However, because I'm downloading multiple versions and appending to the same file I end up with this kind of structure.
[
{
"key": "645",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Crash when saving document",
"closedDate": "2014-10-03T09:01:23.000+0200",
"flag": null,
"fixVersionID": "123",
"fixVersionName": "2.7"
}
]
[
{
"key": "552",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Graphical Issue",
"closedDate": "2014-10-13T09:01:23.000+0200",
"flag": null,
"fixVersionID": "456",
"fixVersionName": "2.8"
}
]
What I want to do is count the number of records with a specific date, and then do the same while looping from a start date to an end date, using jq.
But I can't figure out how to:
1. Flatten the records so that they are one array, not two
2. Strip the T09:01:23.000+0200 from the closedDate value
3. Count the number of objects with a specific date value such as 2014-10-13
You have multiple independent inputs. To be able to combine them in any meaningful way, you'll have to slurp up the input. The inputs will be treated as an array of the inputs. Then you could combine them into a single array by adding them.
Since the dates are all in a certain fixed format, you can take substrings of the dates.
"2014-10-13T09:01:23.000+0200"[:10] -> "2014-10-13"
Given that, you can then filter by the date you want and count using the length filter.
add | map(select(.closedDate[:10]=="2014-10-13")) | length
e.g.,
$ cat input.json
[
{
"key": "645",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Crash when saving document",
"closedDate": "2014-10-03T09:01:23.000+0200",
"flag": null,
"fixVersionID": "123",
"fixVersionName": "2.7"
}
]
[
{
"key": "552",
"type": "Bug",
"typeid": "1",
"status": "Closed",
"summary": "Graphical Issue",
"closedDate": "2014-10-13T09:01:23.000+0200",
"flag": null,
"fixVersionID": "456",
"fixVersionName": "2.8"
}
]
$ jq -s 'add | map(select(.closedDate[:10]=="2014-10-13")) | length' input.json
1
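If the counting ends up in a Ruby script rather than in jq, the same three steps (flatten, truncate the date, count) can be sketched as below. The batches are assumed to be parsed already and are trimmed to the relevant keys; the start and end dates are arbitrary examples.

```ruby
require 'json'
require 'date'

# Two separately downloaded batches, already parsed (trimmed to the relevant keys)
batches = [
  JSON.parse('[{"key":"645","closedDate":"2014-10-03T09:01:23.000+0200"}]'),
  JSON.parse('[{"key":"552","closedDate":"2014-10-13T09:01:23.000+0200"}]')
]

# One level of flattening merges the arrays, much like jq's `add`
issues = batches.flatten(1)

# Count issues per day, using the first 10 characters of closedDate as the key
counts = Hash.new(0)
issues.each { |issue| counts[issue['closedDate'][0, 10]] += 1 }

# Walk a date range; days with no closed issues report 0
start_date  = Date.new(2014, 10, 1)
end_date    = Date.new(2014, 10, 13)
per_day = (start_date..end_date).map { |day| [day.to_s, counts[day.to_s]] }
```

Building the per-day counts once and then looking them up keeps the date loop cheap, instead of rescanning every issue for each day.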
For questions 1 and 2:
$ echo -e "[\n$(sed '/^[][]$/d;/closedDate/s/\(T[^"]*\)//g' json)\n]" > flat-json
To count the records for a specific day:
$ grep "closedDate" flat-json | grep "2014-10-13" | wc -l