json files saved with trailing "?" - ruby

I'm saving a bunch of data as a hash and saving it as a json file. Code for the saving:
def write_to_file(id, data)
Dir.chdir(File.dirname(__FILE__)+"/specs")
filename = "./"+id+".json"
File.open("#{filename}", 'w') do |f2|
f2.write(data.to_json)
end
end
I want to save it as id.json, but the file is getting saved with a "?" at the end. For example, 199015806906670?.json where the original value of "id" is 199015806806670.
If I search for 19901580606670 I am also unable to use TAB to autocomplete.
Does anyone know why this is happening?
EDIT:
Sample from file containing the ids:
104184946332304
131321736945390
693134284084652
146974018804301
288608807960773
Code to get these:
url = File.open("newapps/curlist.txt","r")
url.each_line do |line|
func1(line) #func1 calls write_to_file, no changes to line in func1
end

As has been mentioned in the comments you likely have a newline as the last character of your id field. The character (being invalid) is being replaced with a question mark.
Use this to remove the newline...
filename = "./"+id.chomp+".json"

Related

What's the correct way to read an array from a csv field in Ruby?

I am trying to save some class objects to a csv file, everything works fine. I can save and read back from the csv file, there is only a 'minor' problem with an attribute that is an Array of Strings.
When I save it to the file it appears like this: "[""Dan Brown""]"
CSV.open('documents.csv', "w") do |csv|
csv << %w[ISBN Titre Auteurs Type Disponibilité]
#docs.each { |doc|
csv << [doc.isbn, doc.titre, doc.auteurs, doc.type, doc.empruntable ? "Disponible" : "Emprunté"]
}
end
And when I try to extract the data from the file I end up with something like this: ["[\"Dan Brown\"]"].
table = CSV.parse(File.read("documents.csv"), headers: true)
table.each do |row|
doc = Document.new(row['Titre'], row['ISBN'], row['Type'])
doc.auteurs << row['Auteurs'] #This the array where there is a 'problem'
if row['Disponibilité'] == "Disponible"
doc.empruntable = true
else
doc.empruntable = false
end
#docs.push(doc) #this an array where I save my objects
end
I tried many things to solve this but without any luck. I would be thankful if you can help me find a solution.
Since a CSV file, by it's nature, contains in its fields only strings, not arrays or other data types, the CSV class is applying the to_s method of the objects to turn them into a string before putting them into the CSV.
When you later read them back, you just get this - the string representation of what once had been your array. The only one who knows that 'Auteurs' should end up as an array of strings, is the application, i.e. you.
Hence on reading the CSV, after having extracted the autheurs string, you need to convert it manually back to an Array, because there is no automatic "inverse method" to reverse the to_s.
A cheap, but dangerous way to do it, is to use eval, which indeed would reconstruct your array. However, you need to be sure that nobody had a chance to fiddle manually with the CSV data, because an eval allows sneaking in arbitrary code.
A safer way would be to either write your own conversion function to and from String representation, or use a format such as YAML or JSON for representing the Array as String, instead of using to_s.

Is there a way to read fixed length files using csv.reader() module in Python 2.x

I have a fixed length file like:
0001ABC,DEF1234
The file definition is:
id[1:4]
name[5:11]
phone[12:15]
I need to load this data into a table. I tried to use CSV module and defined the fixed lengths of each field. It is working fine except for the name field.
For the NAME field, only the value till ABC is getting loaded. The reason is:
As I am using CSV module, it is treating 0001ABC, as a value and only parsing till that.
I tried to use escapechar = ',' while reading the file, but it removes the ',' from the data. I also tried quoting=csv.QUOTE_ALL but that didnt work either.
with open("xyz.csv") as csvfile:
readCSV = csv.reader(csvfile)
writeCSV = open("sample_csv", 'w');
output = csv.writer(writeCSV, dialect='excel', lineterminator="\n")
for row in readCSV:
print(row) # to debug #
data= str(row[0])
print(data) # to debug #
id = data[0:4]
name = data[5:11]
phone = data[12:15]
output.writerow([id,name,phone])
writeCSV.close()
Output of the print commands:
row: ['0001ABC','DEF1234']
data: 0001ABC
Ideally, I expect to see the entire set 0001ABC,DEF1234 in the variable: data.
I can then use the parsing as mentioned in the code to break it into different fields.
Can you please let me know where I am going wrong?

How to use get_object in ruby for AWS?

I am very new to ruby. I am able to connect to AWS S3 using ruby. I am using following code
filePath = '/TMEventLogs/stable/DeviceWiFi/20160803/1.0/20160803063600-2f9aa901-2ce7-4932-aafd-f7286cdb9871.csv'
s3.get_object({bucket: "analyticspoc", key:"TMEventLogs/stable/DeviceWiFi/20160803/1.0/"}, target:filePath ) do |chunk|
puts "1"
end
In above code s3 is client. "analyticspoc" is root bucket. My path to csv file is as follows All Buckets /analyticspoc/TMEventLogs/stable/DeviceWiFi/20160803/1.0/20160803063600-2f9aa901-2ce7-4932-aafd-f7286cdb9871.csv.
I have tried above code. I above code I was getting error Error getting objects: [Aws::S3::Errors::NoSuchKey] - The specified key does not exist. Using above code I want to read the contents of a file. How to do that ? Please tell me what is the mistake in above code
Got the answer. You can use list_objects for accessing array of file names in chunk(1000 at a time) where as get_object is used for accessing the content of a single file as follows
BUCKET = "analyticspoc"
path = "TMEventLogs/stable/DeviceWiFi/20160803/1.0/"
s3.list_objects(bucket:BUCKET, prefix: path).each do |response|
contents = response.contents
end
file_name = "TMEventLogs/stable/DeviceWiFi/20160803/1.0/012121212121"
response = s3.get_object(bucket: BUCKET, key: file_name)
As far as I can tell you're passing in the arguments incorrectly. It should be a single options hash according to the documentation for get_object:
s3.get_object(
bucket: "analyticspoc",
key: "TMEventLogs/stable/DeviceWiFi/20160803/1.0/",
target: filePath
) do |chunk|
puts "1"
end
I believe it was trying to use your hash as a string key which is obviously not going to work.
With Ruby the curly braces { } are only necessary in method calls if additional arguments follow that need to be in another hash or are non-hash in nature. This makes the syntax a lot less ugly in most cases where options are deliberately last, and sometimes first and last by virtue of being the only argument.

How to add a namespace to existing xml file

I want to open this file and get all elements that start with us-gaap.
ftp://ftp.sec.gov/edgar/data/916789/0001558370-15-001143.txt
To get elements I tried like this:
str = '<html><body><us-gaap:foo>foo</us-gaap:foo></body></html>'
doc = Nokogiri::XML(File.read(str))
doc.xpath('//us-gaap:*')
Nokogiri::XML::XPath::SyntaxError: Undefined namespace prefix: //us-gaap:*
from /Users/ironsand/.rbenv/versions/2.2.2/lib/ruby/gems/2.2.0/gems/nokogiri-1.6.7.2/lib/nokogiri/xml/searchable.rb:165:in `evaluate'
doc.namespaces returns {}, so I think I have to add namespace us-gaap.
There are some questions about "adding namespace with Nokogiri", but it looks like about how to create a new XML document, not how to add a namespace to existing documents.
How can I add a namespace to existing document?
I know I can remove the namespace by Nokogiri::XML::Document#remove_namespaces!, but I don't want to use it because it removes also necesarry information.
You have asked an XY Problem. You think that the problem is that you need to add a missing namespace; the real problem is that the file you're trying to parse is not valid XML.
require 'nokogiri'
doc = Nokogiri.XML( IO.read('0001558370-15-001143.txt') )
doc.errors.length
#=> 5716
For example, the <ACCEPTANCE-DATETIME> 'element' opened on line 3 is never closed, and on line 16 there is a raw ampersand in the text:
STANDARD INDUSTRIAL CLASSIFICATION: ELECTRIC HOUSEWARES & FANS [3634]
which ought to be escaped as an entity.
However, the document has valid XML fragments within it! In particular, there is one XML document that defines xmlns:us-gaap namespace, from lines 27243-49312. Let's extract just that, using only the knowledge that the root element defines the namespace we want, and the assumptions that no element with the same name is nested within the document, and that the root element does not have an unescaped > character in any attribute. (These assumptions are valid for this file, but may not be valid for every XML file.)
txt = IO.read('0001558370-15-001143.txt')
gaap_finder = %r{(<(\w+) [^>]+xmlns:us-gaap=.+?</\2>)}m
txt.scan(gaap_finder) do |xml,_|
doc = Nokogiri.XML( xml )
gaaps = doc.xpath('//us-gaap:*')
p gaaps.length
#=> 569
end
The code above handles the case where there may be more than one XML document in the txt file, though in this case there is only one.
Decoded, the gaap_finder regex says this:
%r{...}m — this is a regular expression (that allows slashes in it, unescaped) with "multiline mode", where a period will match newline characters
(...) — capture everything we find
< — start with a literal "less-than" symbol
(\w+) — find one or more word characters (the tag name), and save them
— the word characters must be followed by a space (important to avoid capturing the <xsd:xbrl ...> element in this file)
[^>]+ — followed by one or more characters that is NOT a "greater-than" symbol (to ensure that we stay in the same element that we started in)
xmlns:us-gaap\s*= — followed by this literal namespace declaration (which may have whitespace separating it from the equals sign)
.+? — followed by anything (as little as possible)...
</\2> — ...up until you see a closing tag with the same name as what we captured for the name of the starting tag
Because of the way scan works when the regex has capturing groups, each result is a two-element array, where the first element is the entire captured XML and the second element is the name of the tag that we captured (which we "discard" by assigning it to the _ variable).
If you want to be less magic about your capturing, the text file format appears to always wrap each XML document in <XBRL>...</XBRL>. So, you could do this to process every XML file (there are seven, five of which do not happen to have any us-gaap namespaces):
txt = IO.read('0001558370-15-001143.txt')
xbrls = %r{(?<=<XBRL>).+?(?=</XBRL>)}m # find text inside <XBRL>…</XBRL>
txt.scan(xbrls) do |xml|
doc = Nokogiri.XML( xml )
if doc.namespaces["xmlns:us-gaap"]
gaaps = doc.xpath('//us-gaap:*')
p gaaps.length
end
end
#=> 569
#=> 0 (for the XML Schema document that defines the namespace)
I couldn't figure out how to update an existing doc with a new namespace, but since Nokogiri will recognize namespaces on the root element, and those namespaces are, syntactically, just attributes, you can update the document with a new namespace declaration, serialize the doc to a string, and re-parse it:
str = '<html><body><us-gaap:foo>foo</us-gaap:foo></body></html>'
doc_without_ns = Nokogiri::XML(str)
doc_without_ns.root['xmlns:us-gaap'] = 'http://your/actual/ns/here'
doc = Nokogiri::XML(doc_without_ns.to_xml)
doc.xpath("//us-gaap:*")
# Returns [#<Nokogiri::XML::Element:0x3ff375583f9c name="foo" namespace=#<Nokogiri::XML::Namespace:0x3ff375583f24 prefix="us-gaap" href="http://your/actual/ns/here"> children=[#<Nokogiri::XML::Text:0x3ff375583768 "foo">]>]

Ruby String to access an object attribute

I have a text file (objects.txt) which contains Objects and its attributes.
The content of the file is something like:
Object.attribute = "data"
On a different file, I am Loading the objects.txt file and if I type:
puts object.attribute it prints out data
The issue comes when I am trying to access the object and/or the attribute with a string. What I am doing is:
var = "object" + "." + "access"
puts var
It prints out object.access and not the content of it "data".
I have already tried with instance_variable_get and it works, but I have to modify the object.txt and append an # at the beginning to make it an instance variable, but I cannot do this, because I am not the owner of the object.txt file.
As a workaround I can parse the object.txt file and get the data that I need but I don't want to do this, as I want take advantage of what is already there.
Any suggestions?
Yes, puts is correctly spitting out "object.access" because you are creating that string exactly.
In order to evaluate a string as if it were ruby code, you need to use eval()
eg:
var = "object" + "." + "access"
puts eval(var)
=> "data"
Be aware that doing this is quite dangerous if you are evaluating anything that potentially comes from another user.

Resources