In my previous question, I had a main CSV file and I had two extra CSVs. I wanted to go through both of these CSVs and remove the entries from the main CSV if something matched within these two files. My solution ended up looking like this:
require 'csv'
main_list = CSV.table('./script/main.csv', headers: true)
already_called = CSV.table('./script/already_called.csv', headers: true)
already_mailed = CSV.table('./script/already_mailed.csv', headers: true)
updated_list = main_list.delete_if do |main_row|
  # Only the block's last expression decides deletion, so join both checks with ||.
  already_called.any? { |called_row| main_row[:id] == called_row[:id] } ||
    already_mailed.any? { |mailed| main_row[:curr_addr_line_1] == mailed[:street_address_1] }
end
CSV.open('./script/final_list.csv', 'w', headers: main_list.headers, write_headers: true) do |writer|
writer << updated_list
end
I believe the filtering works. Now I want to write the result to final_list.csv with each row's data lined up under the headers, but that is not what I get.
My ideal format is something along the lines of
id,nm,last_nm,first_nm,birth_dt, ......
"10001","PERSON, PersonA","PERSON","PersonA","5/5/1999"......
"10031","PERSON, PersonB","PERSON","PersonB","3/1/1901"......
However with my code above, I get something like this:
# redacted some headers
# headers seem to be fine?
id,nm,last_nm,first_nm,birth_dt, ......
"1003984,""PERSON, PersonA"",PersonA,PERSON,5/5/1999,58,MALE,ENGLISH,,
XXX-XXX-XXXX,XXX#XXX.COM,PO BOX ???,,CITY,STATE,ANOTHER PLACE,UNITED STATES,
???,""???"",4082000,INSURANCE,2,INSURANCE,""408,200,001.00"",MEDICARE A AND B,
""11,043,785.00"",2500084,ADDRESS,1,COMMERCIAL,
""27,083.00"",ADDRESS,I50.32,SYMPTOMS,???,DATE,OFFICE VISIT,""PERSON"",6/18/2021,""PERSON"",
0,,0,0,,,,""PERSON"",1,55.0,10/7/2020,,,
""[""""???"""",""""???"""",""""???""""]"",""[""""diag""""]"",
""[""""MED"""",""""MED"""",""""MED"""",""""MEDS"""",""""XYZ MED"""",
""""XYZ MEDS"""",""""MEDS"""",""""MEDIS"""",""""XYZ MEDS"""""",,,,,,
............ and more
I've redacted a lot of information above but I wanted to show the format I was getting. When I upload this to Google Sheets, it all outputs into one row with the headers being in the correct place, at least. However, it seems my entire output is just in one very long row. What's especially confusing are the long quotes at the end.
How do I match up the headers and columns correctly?
I've looked around for a while and I'm unsure how to proceed. Ruby is not my language of choice so I really appreciate some guidance!
Thanks to @Stephan I ended up with this and it worked!
CSV.open('./script/final_list.csv', 'w', headers: main_list.headers, write_headers: true) do |writer|
updated_list.each { |row| writer << row }
end
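As far as I can tell, the difference is that `writer << x` emits a single line, so appending the whole table appears to squash every row into one record, whereas appending each row writes one line per row. A minimal sketch of the row-by-row version, using `CSV.generate` and a made-up two-column sample (hypothetical data, not the real file) so it runs without any files on disk:

```ruby
require 'csv'

# Hypothetical sample data standing in for the filtered main list.
sample = <<~DATA
  id,nm
  10001,"PERSON, PersonA"
  10031,"PERSON, PersonB"
DATA
updated_list = CSV.parse(sample, headers: true)

# Append one CSV::Row at a time so each row becomes its own output line.
output = CSV.generate(headers: updated_list.headers, write_headers: true) do |writer|
  updated_list.each { |row| writer << row }
end

puts output
```

When writing to a file, the same `each` loop goes inside the `CSV.open(..., 'w', ...)` block.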
I have a requirement to proxy a request in a Rails app. I was hoping I could proxy it with chunking (so, 1 chunk received, one chunk is sent). The app is working fine without chunking (load the request into memory, and transmit).
Here is my code to proxy the chunks through to the end-client:
self.response.headers['Last-Modified'] = Time.now.ctime.to_s
self.response_body = Enumerator.new do |y|
client = HTTPClient.new
http_response = client.get(proxy_url, nil, headers) do |chunk|
y << chunk
end
end
The problem is, I can't inspect "http_response" until all the chunks have been received, thus I can't set the headers based on the headers of the client.
What I'm trying to do is transmit the headers returned from the client before the first chunk is sent. Is this possible?
If not, is this pattern possible in any other Ruby HTTP client gem?
Update
I have a solution for you.
If you call get_async instead, it will return immediately with an HTTPClient::Connection object that is updated with the header information as soon as it is received. The code sample below demonstrates.
The patch to HTTPClient::Connection is almost certainly not necessary for you, but it lets you write things like conn.queue.size and conn.queue.empty?.
conn.pop blocks until the response (or exception) has been pushed to the queue by the async thread and then returns the normal HTTP::Message object. (Note that, if you are using the monkey patch, you can use conn.queue.empty? to see if pop is going to block.)
resp.content returns an IO object which is a pipe read endpoint, and can be called as soon as pop has returned. The other end is written by the async thread as the data arrives, and you can read the entire content in one go or in whatever size chunks you like using read.
require 'httpclient'
class HTTPClient::Connection
attr_reader :queue
end
client = HTTPClient.new
conn = client.get_async 'http://en.wikipedia.org/wiki/Ruby_(programming_language)'
resp = conn.pop
resp.header.all.each { |name, val| puts "#{name}=#{val}" }
puts
pipe = resp.content
while chunk = pipe.read(8192)
print chunk
end
You could parse the first chunk you receive to extract the headers, but I suggest you call head first to get the header information. Then do the get as well.
(Updated - the first chunk holds the beginning of the content so this won't work.)
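On the question of whether another client supports this pattern: as far as I know, Net::HTTP from the standard library makes the response headers available before the body is read when `request` is given a block. A minimal sketch, assuming a plain GET; `fetch_streaming` is a hypothetical helper name and the `:headers` / `:chunk` symbols are just my own convention, not part of any library:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: yields [:headers, hash] once, before any body data,
# then [:chunk, string] for every body fragment as it arrives.
def fetch_streaming(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(uri)) do |res|
      # At this point only the status line and headers have been read.
      yield :headers, res.to_hash
      # read_body with a block streams the body instead of buffering it.
      res.read_body { |chunk| yield :chunk, chunk }
    end
  end
end
```

The caller can copy whatever it needs out of the `:headers` event before forwarding the first `:chunk`. How that fits into the Rails response cycle is not shown here.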
I want to retrieve bibtex data (for building a bibliography) by sending a DOI (Digital Object Identifier) to http://www.crossref.org from within matlab.
The crossref API suggests something like this:
curl -LH "Accept: text/bibliography; style=bibtex" http://dx.doi.org/10.1038/nrd842
based on this source.
Another example from here suggests the following in ruby:
require 'open-uri'
URI.open("http://dx.doi.org/10.1038/nrd842", "Accept" => "text/bibliography; style=bibtex") { |f| f.each { |line| print line } }
Although I've heard ruby rocks I want to do this in matlab and have no clue how to translate the ruby message or interpret the crossref command.
The following is what I have so far to send a doi to crossref and retrieve data in xml (in variable retdat), but not bibtex, format:
clear
clc
doi = '10.1038/nrd842';
URL_PATTERN = 'http://dx.doi.org/%s';
fetchurl = sprintf(URL_PATTERN,doi);
numinputs = 1;
www = java.net.URL(fetchurl);
is = www.openStream;
%Read stream of data
isr = java.io.InputStreamReader(is);
br = java.io.BufferedReader(isr);
%Parse return data
retdat = [];
next_line = toCharArray(br.readLine)'; %First line contains headings, determine length
%Loop through data
while ischar(next_line)
retdat = [retdat, 13, next_line];
tmp = br.readLine;
try
next_line = toCharArray(tmp)';
if strcmp(next_line,'M END')
next_line = [];
break
end
catch
break;
end
end
%Cleanup java objects
br.close;
isr.close;
is.close;
Help translating the ruby statement to something matlab can send using a script such as that posted to establish the communication with crossref would be greatly appreciated.
Edit:
Additional constraints include backward compatibility of the code (back at least to R14) :>(. Also, no use of ruby, since that solves the problem but is not a "matlab" solution, see here for how to invoke ruby from matlab via system('ruby script.rb').
You can easily edit urlread for what you need. I won't post my modified urlread function code due to copyright.
In urlread, (mine is at C:\Program Files\MATLAB\R2012a\toolbox\matlab\iofun\urlread.m), as the least elegant solution:
Right before "% Read the data from the connection." I added:
urlConnection.setRequestProperty('Accept','text/bibliography; style=bibtex');
The answer from user2034006 lays the path to a solution.
The following script works when urlread is modified:
URL_PATTERN = 'http://dx.doi.org/%s';
doi = '10.1038/nrd842';
fetchurl = sprintf(URL_PATTERN,doi);
method = 'post';
params= {};
[string,status] = urlread(fetchurl,method,params);
The modification in urlread is not identical to the suggestion of user2034006. Things worked when the line
urlConnection.setRequestProperty('Content-Type','application/x-www-form-urlencoded');
in urlread was replaced with
urlConnection.setRequestProperty('Accept','text/bibliography; style=bibtex');
I am trying to save compressed strings to a file and load them later for use in the game. I kept getting "in 'finish': buffer error" errors when loading the data back up for use. I came up with this:
require "zlib"
def deflate(string)
zipper = Zlib::Deflate.new
data = zipper.deflate(string, Zlib::FINISH)
end
def inflate(string)
zstream = Zlib::Inflate.new
buf = zstream.inflate(string)
zstream.finish
zstream.close
buf
end
setting = ["nothing","nada","nope"]
taggedskills = ["nothing","nada","nope","nuhuh"]
File.open('testzip.txt','wb') do |w|
w.write(deflate("hello world")+"\n")
w.write(deflate("goodbye world")+"\n")
w.write(deflate("etc")+"\n")
w.write(deflate("etc")+"\n")
w.write(deflate("Setting: name "+setting[0]+" set"+(setting[1].class == String ? "str" : "num")+" "+setting[1].to_s)+"\n")
w.write(deflate("Taggedskill: "+taggedskills[0]+" "+taggedskills[1]+" "+taggedskills[2]+" "+taggedskills[3])+"\n")
w.write(deflate("etc")+"\n")
end
File.open('testzip.txt','rb') do |file|
file.each do |line|
p inflate(line)
end
end
It throws the error at the "Taggedskill:" line. I don't know why, but changing the prefix to "Skilltag:", "Skillt:", etc. still throws a buffer error, while prefixes like "Setting:" or "Thing:" work fine. Oddly, changing the prefix of the setting line to "Taggedskill:" also works fine. What is going on here?
In testzip.txt, you are storing newline-separated binary blobs. However, a binary blob may itself contain newline bytes, so when you open testzip.txt and split it by line, you may end up splitting one blob that inflate would understand into two fragments that it does not understand.
Try running wc -l testzip.txt after you get the error. You'll see that the file contains one more line than the number of lines you wrote.
What you need to do is compress the whole file at once, not line by line.
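A minimal sketch of the whole-file approach, assuming the records themselves never contain a newline. The helper names `save_records` and `load_records` and the file name are hypothetical:

```ruby
require 'zlib'
require 'tmpdir'

# Join the records with newlines, compress the result once, write it in binary mode.
def save_records(path, records)
  File.binwrite(path, Zlib::Deflate.deflate(records.join("\n")))
end

# Read the whole file back, inflate it once, then split it into records.
def load_records(path)
  Zlib::Inflate.inflate(File.binread(path)).split("\n")
end

records = ['hello world', 'goodbye world', 'Taggedskill: nothing nada nope nuhuh']
path = File.join(Dir.tmpdir, 'testzip_sketch.bin') # hypothetical file name
save_records(path, records)
p load_records(path)
```

Because the newlines are now inside the compressed stream rather than between blobs, a stray newline byte in the compressed output can no longer cut a record in half.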
I'm trying to build a desktop client that manages some downloads with Ruby. I would like to know how to go about trying to identify how much of the data is downloaded and the size of the entire data that is to be downloaded.
I'm trying to do this with Ruby, so any help would be useful.
Thanks in advance.
Like Wayne said in his comment, it depends on the protocol that is used to transfer the files. With HTTP for example, the HTTP response will include a Content-Length header which will tell you the length of the file that you are downloading. After you know that you will have to keep track of the number of bytes that you've read from the HTTP connection.
Something like this seems to work (for HTTP), but I wouldn't be surprised if it could be done more elegantly:
require 'net/http'
url = URI.parse('http://www.google.com/index.html')
req = Net::HTTP::Get.new(url.path)
Net::HTTP.start(url.host, url.port) do |http|
  http.request(req) do |res|
    # content_length is nil when the server sends no Content-Length header
    remaining = res.content_length
    puts "total length: #{remaining || 'unknown'}"
    res.read_body do |segment|
      puts "read #{segment.bytesize} bytes"
      next if remaining.nil?
      remaining -= segment.bytesize
      puts "#{remaining} bytes remaining"
    end
  end
end
www.google.com/index.html is a bad example since the content gets returned in one segment, but try it on a larger object and you should see multiple "read..." lines.
If you're using Net::HTTP, the length of whatever you're requesting should be in the response headers. Net::HTTPResponse mixes in Net::HTTPHeader, which provides content_length. It only works if the server knows the size before the transfer starts and sends a Content-Length header; otherwise it returns nil.
Net::HTTPResponse has a method that reads the body in chunks, so you can use that to determine the progress. Start at 0 and add the length of each chunk, compare it to the total size and you're done.
http.request_get('/index.html') {|res|
res.read_body do |segment|
print segment
end
} #Example taken from Ruby-Documentation
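The "start at 0 and add the length of each chunk" bookkeeping could be sketched like this. `DownloadProgress` is a hypothetical helper class, and the chunks here are made-up strings standing in for what `read_body` would yield:

```ruby
# Hypothetical helper that tracks download progress from chunk sizes.
class DownloadProgress
  attr_reader :received, :total

  def initialize(total)
    @total = total # may be nil when the server sends no Content-Length
    @received = 0
  end

  # Record one chunk and return the running byte count.
  def add(chunk)
    @received += chunk.bytesize
  end

  # Percentage complete, or nil when the total size is unknown.
  def percent
    return nil if total.nil? || total.zero?

    (100.0 * received / total).round(1)
  end
end

# Simulated chunks; in real use these come from res.read_body.
progress = DownloadProgress.new(200)
['a' * 50, 'b' * 100].each do |chunk|
  progress.add(chunk)
  puts "#{progress.received}/#{progress.total} bytes (#{progress.percent}%)"
end
```

In a real download you would create the tracker with `res.content_length` and call `progress.add(segment)` inside the `read_body` block.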
If you're using FTP, it should be easier through Net::FTP. Connect to the server, get the size of a given file with size(filename), and then download the file with get, getbinaryfile or gettextfile.
This is the signature of the get method: get(remotefile, localfile = File.basename(remotefile), blocksize = DEFAULT_BLOCKSIZE) {|data| ...}
ftp.get('file.something', 'file.something.local', 1024) { |data|
  # The last block may be shorter than the block size, so report the actual size.
  puts "Downloaded #{data.bytesize} more bytes"
}