how to split, remove part of data in column with ; - ruby

I am using the Spreadsheet gem.
My code is:
book = Spreadsheet.open 'excel-file.xls'
sheet = book.worksheet 0
book.write 'output-file.xls'
I want to remove data that comes after ";" in a column:
FULTON BANK NA;FULTON BANK
I just want it to be FULTON BANK NA for example.
Also, I want to leave price data like this: $78,000.00 and want to strip
all other data from a specific column:
MORTGAGE - CORPORATE;($78,000.00)
I just want it to be $78,000.00 for example.

You could do it this way:
s = 'FULTON BANK NA;FULTON BANK'
s = s[/[^;]+/]
That will leave everything before the first semicolon in s. Or you could do it like this:
s = s.split(';')[0]
Or
s.gsub!(/;.*/, '') # This modifies s in place
For the second one, it depends on the format of your data but you could start with this:
s = 'MORTGAGE - CORPORATE;($78,000.00)'
s = s[/\(([^)]+)\)/, 1]
Or, if the last component may or may not have parentheses, you could do something like this:
s = s.split(';')[-1].tr('()', '')
That will split s into pieces at the semicolons (split(';')), take the last component ([-1]), and then remove any parentheses that are there (.tr('()', '')).
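The two approaches above can be sketched as small helpers that are applied to every cell value in a column. This is only a sketch: the helper names before_semicolon and last_component and the sample values are made up for illustration and are not part of the Spreadsheet gem, and writing the cleaned values back into the sheet is left out.

```ruby
# Hypothetical helpers for illustration; not part of the Spreadsheet gem.

# Keep only the text before the first semicolon.
def before_semicolon(value)
  value.to_s.split(';', 2).first.to_s.strip
end

# Keep only the last semicolon-separated component, without parentheses.
def last_component(value)
  value.to_s.split(';').last.to_s.tr('()', '').strip
end

# Sample cell values standing in for two spreadsheet columns.
names   = ['FULTON BANK NA;FULTON BANK', 'NO SEMICOLON HERE']
amounts = ['MORTGAGE - CORPORATE;($78,000.00)', '$1,250.00']

cleaned_names   = names.map { |cell| before_semicolon(cell) }
cleaned_amounts = amounts.map { |cell| last_component(cell) }

p cleaned_names
p cleaned_amounts
```

Calling to_s first means empty (nil) cells come back as empty strings instead of raising an error.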

Related

Excel Power Query: How to Combine All List Items into Single Row

I have a query from Excel 2016 Power Query to the Microsoft Cognitive Services text key phrase API, which gets keywords from tweets. It works fine.
However, the JSON doc that's returned per query is converted by Power Query into a list of ~1-5 rows.
In the case of the pic, I want all responses returned to be in one cell/row, regardless of the number of items returned.
Here is my full M query (you need to put your own key in) if you're interested.
let
TweetCognitive = (TweetID as text, TweetText as text) =>
let
JsonRecords = Text.FromBinary(Json.FromValue([id=TweetID, text=TweetText])),
JsonRequest = "{""documents"": [" & JsonRecords & "]}",
JsonContent = Text.ToBinary(JsonRequest, TextEncoding.Ascii),
Response =
Web.Contents("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases?",
[
Headers = [#"Ocp-Apim-Subscription-Key"="yourkeyhere",
#"Content-Type"="application/json", Accept="application/json"],
Content=JsonContent
]),
JsonResponse = Json.Document(Response,1252)
in
JsonResponse
in
TweetCognitive
You can use List.Accumulate to turn a list of values into a single value. For example, this would combine the values in the list into a single text value with ". " separating each row's value:
List.Accumulate(JsonResponse, "", (state, current) => state & current & ". ")
This would generate "monday frank love happiness today. nice good kind. tomorrow. " in your example. If you want to get rid of the trailing space, you can surround the List.Accumulate expression with Text.Trim.
The basic function to concatenate elements in a list is Text.Combine. For instance:
Text.Combine(JsonResponse, " ")
This avoids the extra delimiter at the end that you get with List.Accumulate. Note also that List.Combine is for creating a longer combined list from shorter lists, and the similar naming there may cause confusion.

Ruby - Extra punctuation in file when using regex and csv class to write to a file

I'm using regex to grab parameters from an html file.
I've tested the regexp and it seems to be fine- it appears that the csv conversion is what's causing the issue, but I'm not sure.
Here is what I have:
mechanics_file= File.read(filename)
mechanics= mechanics_file.scan(/(?<=70%">)(.*)(?=<\/td)/)
id_file= File.read(filename)
id=id_file.scan(/(?<="propertyids\[]" value=")(.*)(?=")/)
puts id.zip(mechanics)
CSV.open('csvfile.csv', 'w') do |csv|
id.zip(mechanics) { |row| csv << row }
end
The puts output looks like this:
2073
Acting
2689
Action / Movement Programming
But the contents of the csv look like this:
"[""2073""]","[""Acting""]"
"[""2689""]","[""Action / Movement Programming""]"
How do I get rid of all of the extra quotes and brackets? Am I doing something wrong in the process of writing to a csv?
This is my first project in ruby so I would appreciate a child-friendly explanation :) Thanks in advance!
String#scan returns an Array of Arrays (bold emphasis mine):
scan(pattern) → array
Both forms iterate through str, matching the pattern (which may be a Regexp or a String). For each match, a result is generated and either added to the result array or passed to the block. If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.
a = "cruel world"
# […]
a.scan(/(...)/) #=> [["cru"], ["el "], ["wor"]]
So, id looks like this:
id == [['2073'], ['2689']]
and mechanics looks like this:
mechanics == [['Acting'], ['Action / Movement Programming']]
id.zip(mechanics) then looks like this:
id.zip(mechanics) == [[['2073'], ['Acting']], [['2689'], ['Action / Movement Programming']]]
Which means that in your loop, each row looks like this:
row == [['2073'], ['Acting']]
row == [['2689'], ['Action / Movement Programming']]
CSV#<< expects an Array of Strings, or things that can be converted to Strings as an argument. You are passing it an Array of Arrays, which it will happily convert to an Array of Strings for you by calling Array#to_s on each element, and that looks like this:
[['2073'], ['Acting']].map(&:to_s) == [ '["2073"]', '["Acting"]' ]
[['2689'], ['Action / Movement Programming']].map(&:to_s) == [ '["2689"]', '["Action / Movement Programming"]' ]
Lastly, " is the string delimiter in CSV, and needs to be escaped by doubling it, so what actually gets written to the CSV file is this:
"[""2073""]","[""Acting""]"
"[""2689""]","[""Action / Movement Programming""]"
The simplest way to correct this would be to flatten the return values of the scans (and maybe also convert the IDs to Integers, assuming that they are, in fact, Integers):
mechanics_file = File.read(filename)
mechanics = mechanics_file.scan(/(?<=70%">)(.*)(?=<\/td)/).flatten
id_file = File.read(filename)
id = id_file.scan(/(?<="propertyids\[]" value=")(.*)(?=")/).flatten.map(&:to_i)
CSV.open('csvfile.csv', 'w') do |csv|
id.zip(mechanics) { |row| csv << row }
end
Another suggestion would be to forgo the Regexps completely and use an HTML parser to parse the HTML.
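To see the effect of flatten without touching any files, here is a minimal sketch that runs the same two regexps over an inline HTML fragment and builds the CSV in memory with CSV.generate. The HTML fragment is invented for illustration and only assumes that each input and each td sits on its own line.

```ruby
require 'csv'

# Made-up HTML standing in for the real file contents.
html = <<~HTML
  <input name="propertyids[]" value="2073">
  <td width="70%">Acting</td>
  <input name="propertyids[]" value="2689">
  <td width="70%">Action / Movement Programming</td>
HTML

# Each regexp has one group, so scan returns one single-element array
# per match; flatten turns that into a plain array of strings.
ids       = html.scan(/(?<="propertyids\[\]" value=")(.*)(?=")/).flatten
mechanics = html.scan(/(?<=70%">)(.*)(?=<\/td)/).flatten

# Every row passed to << is now an array of two strings.
csv_text = CSV.generate do |csv|
  ids.zip(mechanics) { |row| csv << row }
end

puts csv_text
```

Because each row now holds plain strings rather than nested arrays, the brackets and doubled quotes no longer appear in the output.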

Prevent string data from being padded with spaces from the left

I have an input box that I use to enter alphanumeric account numbers into a database. The box accepts up to 25 characters. However, an account number may be shorter than 25 characters. In that case, the account number is saved with blank spaces before it instead of being saved at the left of the column. How can I solve this?
I would like each number to be saved like the two hyphenated numbers and not with a space like the first record.
Code summary:
Set objDB = New db.Detail_Data
objDB.ConnectionString = CONNECTSTRING
With objDB
.summary_code = CDbl(mvarSumcode)
.charge_code = UCase$(Me.txtChargeCode)
.clientID = UCase$(Me.txtClientID)
.JobID = UCase$(Me.txtJobID)
.Invno = UCase$(Me.txtInvno.Text)
.TransAmt = CCur(Me.txtTransAmt)
.Gl_accno = Format(Me.txtGL, "#########################")
.Description = Me.txtDescription
blnStatus = .AddDetail
End With
Looks like it works as coded. Your line:
.Gl_accno = Format(Me.txtGL, "#########################")
Format with the # symbol right-justifies the string, filling in spaces on the left, unless you add a ! like so:
.Gl_accno = Format(Me.txtGL, "!#########################")

Extra column when scanning JSON into CSV using .map, sorted order is lost

I am writing a script to convert JSON data to an ordered CSV spreadsheet.
The JSON data itself does not necessarily contain all keys (some fields in the spreadsheet should say "NA").
Typical JSON data looks like this:
json = {"ReferringUrl":"N","PubEndDate":"2010/05/30","ItmId":"347628959","ParentItemId":"46999"}
I have a list of the keys found in each column of the spreadsheet:
keys = ["ReferringUrl", "PubEndDate", "ItmId", "ParentItemId", "OtherKey", "Etc"]
My thought was that I could iterate through each line of JSON like this:
parsed = JSON.parse(json)
result = (0..keys.length).map{ |i| parsed[keys[i]] || 'NA'} #add values associated with keys to an array, using NA if no value is present
CSV.open('file.csv', 'wb') do |csv|
csv << keys #create headings on spreadsheet
csv << result #load data associated with headings into the next line
end
Ideally, this would create a CSV file with the proper information in the proper order in a spreadsheet. However, what happens is the result data comes in completely out of order, and contains an extra column that I don't know what to do with.
Looking at the actual data, since there are actually about 100 keys and most of the fields contain NA, it is very difficult to determine what is happening.
Any advice?
The extra column comes from 0..keys.length, which includes the end of the range. The last lookup is parsed[keys[keys.length]], i.e. parsed[nil], which is nil, so the || 'NA' fallback adds one extra 'NA' at the end of result. You can avoid that entirely by mapping keys directly:
result = keys.map { |key| parsed.fetch(key, 'NA') }
As for the random order of the values, I suspect you aren't giving us all of the relevant information, because I tested your code and the result came out in the same order as keys.
Range has two possible notations, .. and .... The three-dot form ... is exclusive, meaning the range (A...B) does not include B.
Change to
result = (0...keys.length).map{ |i| parsed[keys[i]] || 'NA'} #add values associated with keys to an array, using NA if no value is present
And see if that prevents the last value in that range from evaluating to nil.
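The difference between the two range forms, and the direct keys.map alternative, can be checked side by side with the sample data from the question. This is a small sketch; the variable names inclusive, exclusive and direct are only for illustration.

```ruby
require 'json'

json = '{"ReferringUrl":"N","PubEndDate":"2010/05/30","ItmId":"347628959","ParentItemId":"46999"}'
keys = ["ReferringUrl", "PubEndDate", "ItmId", "ParentItemId", "OtherKey", "Etc"]
parsed = JSON.parse(json)

# Two dots: the range includes keys.length, so there is one lookup too many.
inclusive = (0..keys.length).map { |i| parsed[keys[i]] || 'NA' }

# Three dots: the range stops at keys.length - 1.
exclusive = (0...keys.length).map { |i| parsed[keys[i]] || 'NA' }

# No index arithmetic at all: one value per key, in key order.
direct = keys.map { |key| parsed.fetch(key, 'NA') }

puts inclusive.length
puts exclusive.length
p direct
```

One difference to keep in mind: fetch only falls back to 'NA' when the key is missing, while || 'NA' also replaces values that are present but null or false in the JSON.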

RegEx to remove new line characters and replace with comma

I scraped a website using Nokogiri and after using xpath I was left with the following string (which is a few td's pushed into one string).
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t"
My goal is to make this into an array that looks like the following (it will be a nested array):
["Total First Downs", "359", "274"]
The issue is creating a regex that removes the escaped characters and subs in one "," but does not sub in a "," after the last set of integers. If the comma after the last set of integers is necessary, I could use #compact to get rid of the nil that occurs in the array. If you need the code I used to scrape the website, here it is (please note that I saved the webpage for testing so that my IP address would not get burned during the trial phase):
f = File.open('page')
doc = Nokogiri::HTML(f)
f.close
number = doc.xpath('//tr[@class="tbdy1"]').count
stats = Array.new(number) {Array.new}
i = 0
doc.xpath('//tr[@class="tbdy1"]').each do |tr|
stats[i] << tr.text
i += 1
end
Thanks for your help
I don't fully understand your problem, but the result can be easily achieved with this:
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t"
.split(/[\n\t]+/)
# => ["Total First Downs", "359", "274"]
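For the nested array the question asks for, the same split can be mapped over every row's text. The sketch below uses plain strings in place of the tr.text values from Nokogiri; the second row is invented for illustration. It also shows why no trailing-comma handling or compact is needed.

```ruby
# Plain strings standing in for tr.text; the second row is made up.
rows = [
  "Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t",
  "Rushing\n\t\t\t\t\t\t\t\t120\n\t\t\t\t\t\t\t\t98\n\t\t\t\t\t\t\t"
]

# split drops trailing empty strings, so the whitespace after the last
# number does not produce an extra element.
stats = rows.map { |text| text.split(/[\n\t]+/) }

# If a comma-separated string is wanted instead, strip first so that the
# trailing whitespace is not turned into a trailing comma.
joined = rows.first.strip.gsub(/[\n\t]+/, ",")

p stats
puts joined
```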
Try with gsub, passing the pattern as a Regexp literal rather than a string:
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t".gsub(/[\n\t]+/, ",")
