Data in Mbox to JSON or CSV? - ruby

I just downloaded all my Gmail with the new download functionality from Google, and it gives me a large .mbox file. What would be a basic shell of a script to start extracting and processing individual emails from the file?

The book "Mining the Social Web" (O'Reilly, 2nd ed.) by Matthew Russell gives some code for doing this in Python. All of his code is on GitHub. You will want the files prefixed with 'mailbox': https://github.com/ptwobrussell/Mining-the-Social-Web/tree/master/python_code

Check out this GitHub repo: https://github.com/PS1607/mbox-to-json
It also extracts the attachments for you.
If you want to convert to CSV instead, change line 55 in src/main.py from df.to_json to df.to_csv.
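Since the question is tagged ruby, here is a minimal Ruby sketch of the same idea using only the standard library. It assumes the classic mbox layout, where every message begins with a line starting with "From " and body lines that start that way have been escaped (for example as ">From "). The method names each_mbox_message and parse_headers and the file name mail.mbox are made up for illustration, and the header parser is deliberately naive (no MIME decoding).

```ruby
require 'json'

# Yields each raw message (without its "From " separator line) found in an
# mbox-formatted IO object.
def each_mbox_message(io)
  current = nil
  io.each_line do |line|
    if line.start_with?('From ')
      yield current.join if current
      current = []
    elsif current
      current << line
    end
  end
  yield current.join if current
end

# Naive header parser: reads "Name: value" lines up to the first blank line
# and folds continuation lines (leading whitespace) into the previous header.
def parse_headers(raw)
  headers = {}
  last = nil
  raw.each_line do |line|
    line = line.chomp
    break if line.empty?

    if line.start_with?(' ', "\t") && last
      headers[last] << ' ' << line.strip
    elsif (match = line.match(/\A([^:\s]+):\s*(.*)\z/))
      last = match[1]
      headers[last] = match[2].dup
    end
  end
  headers
end

# Possible usage (hypothetical file name), printing one JSON object per message:
#
# File.open('mail.mbox', 'rb') do |file|
#   each_mbox_message(file) do |raw|
#     puts JSON.generate(parse_headers(raw))
#   end
# end
```

Reading line by line keeps memory use flat even for a multi-gigabyte export. Header values containing raw non-ASCII bytes may need to be re-encoded to valid UTF-8 before JSON.generate accepts them, and the same hashes could be written out as rows with Ruby's csv library if CSV is the goal.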


The document \"<?xml version='1.0'?>\\n\" does not have a valid root

I am new to parsing XML in Ruby and I am stuck on an issue. I'll try my best to explain.
I get the response below from an API:
"PK\x03\x04\x14\x00\b\b\b\x00,\x18ET\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x1D\x00\x00\x00"
I believe it is in .tds format.
I am trying to parse it into valid XML, so this is what I tried:
xml = Nokogiri::XML(response)
which gives me #<Nokogiri::XML::Document:0xf744 name="document">
Then I tried Hash.from_xml(xml.to_xml)
but this throws the error The document \"<?xml version='1.0'?>\\n\" does not have a valid root
Any idea what I am missing here?
This string starts with "PK", the initials of Phil Katz, author of the ZIP format, which means it is a ZIP file. Several formats are ZIP files that follow further structural conventions, such as Java JAR files, all OpenDocument formats (.ods, .odt) and all MS Office Open XML formats.
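As a quick sanity check, the "PK" signature can be tested before handing the response to any parser. This is only a sketch: zip_data? is a made-up helper name, and it looks only for the local file header signature ("PK\x03\x04"), so an empty archive, which starts with a different "PK" record, would not match.

```ruby
# Signature that opens the first local file header of a non-empty ZIP archive.
ZIP_LOCAL_HEADER = "PK\x03\x04".b

# Returns true when the given bytes start with the ZIP local file header.
def zip_data?(bytes)
  bytes.byteslice(0, 4).to_s.b == ZIP_LOCAL_HEADER
end

response = "PK\x03\x04\x14\x00\b\b\b\x00".b # start of the response from the question
puts zip_data?(response)                    # true
puts zip_data?("<?xml version='1.0'?>\n")   # false
```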
Since you are probably expecting an .ods file, you could unzip it and then use Nokogiri to parse the XML, but there's a better way to proceed.
There's a gem called Roo that supports the most common spreadsheet formats and provides a nice Ruby API for working with them: https://github.com/roo-rb/roo
I would recommend saving the string to a temporary file and then opening it with Roo.
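A rough sketch of the temporary-file step, using only the standard library. The helper name with_spreadsheet_tempfile is invented here, and the '.ods' extension is a guess based on the assumption above, so change it if the payload turns out to be something else. The Roo call is shown only as a comment because it needs the roo gem installed.

```ruby
require 'tempfile'

# Writes raw bytes to a temporary file, yields its path, and deletes the
# file once the block returns. Returns the value of the block.
def with_spreadsheet_tempfile(bytes, extension: '.ods')
  Tempfile.create(['response', extension]) do |file|
    file.binmode      # the payload is binary, so avoid any newline translation
    file.write(bytes)
    file.flush        # make sure everything is on disk before another reader opens it
    yield file.path
  end
end

# Possible usage with Roo (requires the roo gem):
#
# with_spreadsheet_tempfile(response) do |path|
#   sheet = Roo::Spreadsheet.open(path)
#   p sheet.row(1)
# end
```

Keeping the file extension matters because Roo, as far as I know, picks the spreadsheet format from it unless told otherwise.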

File Uploading to Google Drive

I am just starting with the Google Drive API. The quick start is quite handy, but it only explains the file listing feature.
I cannot find the file upload feature on the Upload file data page.
The RI documentation for Google::Apis only contains hundreds of new line characters (1128 to be specific), and nothing interesting.
How do I upload files to google drive using the Ruby API?
Here are two upload examples using Drive V2 (one resumable, one not), and here you can see the method you should be using in V3 (create_file instead of insert_file), as well as an explanation of all the possible parameters you can provide.
The first example could be written in V3 like this (V3 renames the title field to name):
drive_service.create_file({name: 'My Favorite Movie'}, upload_source: 'mymovie.m4v',
content_type: 'video/mp4')
Reference:
Class: Google::Apis::DriveV3
create_file

How to read excel file tibco activities?

I have a requirement to read an Excel file using TIBCO palettes. Can anybody please shed some light on this? I am new to TIBCO BW. Please tell me what steps I should follow.
I am assuming you are not referring to CSV files, for which you could use the File Read and Parse activities of BW.
If you want to parse or render a multi-worksheet workbook, you can try publicly available APIs such as Apache POI or commercial APIs such as those from Aspose to build your own Java-based solution. Then you can use the Java Code or general Java activities to embed and use that code.
And then there's another ready-to-use option available from us: an Excel Plugin for TIBCO BusinessWorks, if you wish to leverage all built-in features of BW (XPath mapping, etc) when parsing or rendering your Excel.
Edit 1:
As per your comment, you can also try the following steps, if you are looking for a more homegrown solution.
Based on one of the (public/commercial) libraries above you can write generic Java Code to parse each cell of each row of each sheet of the workbook. Output should be an XML string. Then create an XSD to match your output. It is at your discretion, which information of the cell you want to read from the workbook - you already are aware of the complexity of the API, I am sure.
Create a BW (sub)process that calls your code from a Java activity, and use Parse XML to parse your XML string result into your XSD structure. Configure the End activity to use your XSD and map (copy) your Parse XML result into the End activity.
Then wrap this subprocess into a Custom Activity (General Activities Palette). Create a Custom Palette and now you can re-use what you did in many other BW projects. The path to the custom palettes can be found in TIBCO Designer under Edit > Preferences > General > User Directories.
If you add Error Output schemas, you will also get typed error outputs from that custom activity.
HTH,
Hendrik

Simple Excel Scripts For Ruby-Watir?

Can anyone explain how to pass different values (valid, invalid, empty) for the username and password fields using Microsoft Excel?
I need to read the values from an Excel sheet.
I need the script in Ruby.
I have tried to find references, but there is no proper documentation for Excel scripts.
I would suggest a gem like roo. I am not sure where it is hosted nowadays, but either of these links should help:
http://roo.rubyforge.org/
http://rubygems.org/gems/roo
The "roo" gem is the answer to your query.
It is simple to use and very easy to understand.
This gem allows you to access the content of:
* Open-office spreadsheets (.ods)
* Excel spreadsheets (.xls)
* Google (online) spreadsheets
* Excel’s new file format .xlsx
These links are very helpful:
http://roo.rubyforge.org/
http://roo.rubyforge.org/rdoc/index.html
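To make that concrete, here is a small sketch of how the test data could be pulled out of a sheet. It assumes a spreadsheet whose first row holds the column headers "Username" and "Password"; the file name logins.xlsx, the method names and the Watir field names are all made up for illustration. The roo gem is loaded inside load_rows so the row handling can be tried without it.

```ruby
# Turns spreadsheet rows (first row = header) into hashes such as
# { username: 'alice', password: 'secret' }. Empty cells become ''.
def credentials_from_rows(rows)
  header, *data = rows
  keys = header.map { |name| name.to_s.strip.downcase.to_sym }
  data.map do |row|
    keys.zip(row).to_h.transform_values(&:to_s)
  end
end

# Loads every row of the default (first) sheet with the roo gem,
# which is assumed to be installed.
def load_rows(path)
  require 'roo'
  sheet = Roo::Spreadsheet.open(path)
  (sheet.first_row..sheet.last_row).map { |index| sheet.row(index) }
end

# Possible usage in a Watir script (hypothetical file and field names):
#
# credentials_from_rows(load_rows('logins.xlsx')).each do |cred|
#   browser.text_field(name: 'username').set(cred[:username])
#   browser.text_field(name: 'password').set(cred[:password])
#   browser.button(name: 'login').click
# end
```

With this layout, the valid, invalid and empty cases are just different rows in the sheet, so new test data can be added without touching the script.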

Ruby - Working with Mechanize::File response without saving to disk

I'm working on my first ORM project and am using Mechanize. Here's the situation:
I'm downloading a zip file from my website into a Mechanize::File object. Inside the zip is a file buried three folders deep (folder_1/folder_2/file.txt). I'd like to pull file.txt out of the zip file and return that instead of the zip file itself.
My first thought was to use zip/zipfilesystem. I can do this fine if I save the file to disk first and use Zip::ZipFile.open(src), but can anyone tell me how, or whether, it is possible to read it straight from the Mechanize::File body?
My gut says this has to be possible and I'm just missing something basic. I tried...
zipfile = Mechanize::File.body
Zip::ZipFile.open(zipfile)
...but from what I can tell Zip::ZipFile is only set up to locate a source on a filesystem.
Any direction would be much appreciated, and let me know if you have any questions.
Thanks in advance
Rob
It seems what you want to do is not possible with rubyzip. From rubyzip library's TODO file:
SUggestion: ZipInputStream/ZipOutputStream should accept an IO object in addition to a filename.
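That TODO entry describes rubyzip as it was when this was written. Later releases (1.0 and up, where Zip::ZipFile was renamed Zip::File) can, as far as I know, read an archive from memory through Zip::File.open_buffer. A sketch under that assumption follows; read_zip_entry is a made-up helper name, and the gem is required inside the method so the dependency is explicit.

```ruby
require 'stringio'

# Returns the contents of one entry from ZIP data held in memory, or nil
# when the archive has no entry with that path.
def read_zip_entry(zip_bytes, entry_path)
  require 'zip' # rubyzip gem, assumed to be version 1.0 or later
  archive = Zip::File.open_buffer(StringIO.new(zip_bytes))
  entry = archive.find_entry(entry_path)
  entry && entry.get_input_stream.read
end

# Possible usage with Mechanize (hypothetical URL):
#
# page = agent.get('http://example.com/archive.zip')
# text = read_zip_entry(page.body, 'folder_1/folder_2/file.txt')
```

Nothing is written to disk here; the whole archive stays in memory, which is fine for small downloads but worth reconsidering for very large ones.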
