Separate a row of strings into separate columns using Ruby

I am trying to manipulate a csv file using Ruby which will separate a row of strings into separate columns. Starting with 'Part#' to create a column then move past the comma to 'Quantity' and create a second column next to it and so on... I anticipate that I will need to utilize the split method to create an array. Is this the best method and how would I paste the array into excel so that it creates rows?
I would like the same thing to happen for the rows below the header containing the actual data where it separates into S-001, 1, [Mela] etc.
Here is a sample of the csv:
Sheet Goods
Part#,Quantity,Description,Length(L),Width(W),Thickness(T),Square Foot (per),Square Foot (total),Total Length (Feet),Material,
S-001,1, [Mela] Fridge Sides, 30",12",0 5/8",2.5,2.5,2.5,Not assigned,
S-002,1, [Mela] Fridge Sides#1,30",12",0 5/8",2.5,2.5,2.5,Not assigned,
S-003,1, [Mela] Fridge TB,32 1/4", 30",0 5/8",6.72,6.72,2.69,Not assigned,
S-004,1, [Mela] Fridge TB#1,32 1/4", 30",0 5/8",6.72,6.72,2.69,Not assigned,
S-005,1, [Mela] Fridge back,32 3/4",11 1/4",0 5/8",2.56,2.56,2.73,Not assigned,
Any help would be appreciated!
EDIT:
This is what the data should look like by the time it's done:
Sheet Goods
Part# Quantity Description Length (L) Thickness (T) Square Foot (per) Square Foot (total) Total Length (Feet) Material
S-001 1 [Mela] Fridge Sides 30 5/8 2.5 2.5 2.5 Not assigned
Where the commas are removed and the data between the commas are put into separate columns.
Mark

First, use the libraries for the task: CSV. Secondly, it's pretty handy to have the rows indexed by column name (and not by a meaningless number). An example (where you'd get all widths):
require 'csv'
rows = CSV.open("data.csv")
# the first line is the sheet name, the second holds the column headers
name, headers = rows.take(2)
widths = rows.map do |row_values|
  row = Hash[headers.zip(row_values)]
  # here your specific processing
  row["Width(W)"]
end
As noted by Jim, your text is not valid CSV, double quotes are reserved.
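A minimal sketch of the same header-indexed idea applied to two rows of the question's sample. It assumes that the `liberal_parsing: true` option of Ruby's CSV library is acceptable for tolerating the bare inch marks; the variable names are only illustrative.

```ruby
require 'csv'

# Two data rows copied from the question; line 1 is a title, line 2 the header.
sample = <<~TEXT
  Sheet Goods
  Part#,Quantity,Description,Length(L),Width(W),Thickness(T),Square Foot (per),Square Foot (total),Total Length (Feet),Material,
  S-001,1, [Mela] Fridge Sides, 30",12",0 5/8",2.5,2.5,2.5,Not assigned,
  S-003,1, [Mela] Fridge TB,32 1/4", 30",0 5/8",6.72,6.72,2.69,Not assigned,
TEXT

# liberal_parsing tolerates the bare double quotes used here as inch marks.
_title, header, *data = CSV.parse(sample, liberal_parsing: true)

# Every line ends with a comma, which yields a trailing nil field; drop it.
header = header.compact

rows = data.map do |values|
  # zip stops at the header's length, so the trailing nil value is ignored.
  header.zip(values.map { |v| v&.strip }).to_h
end

widths = rows.map { |row| row['Width(W)'] }
puts widths.inspect
```

Each element of `rows` is a Hash keyed by column name, so any column can be pulled out the same way as `Width(W)`.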

Related

CSV - Processing each group of contiguous rows having the same values for certain fields

I have a large CSV file with the following headers: "sku", "year", "color", "price", "discount", "inventory", "published_on", "rate", "demographic" and "tags".
I would like to perform various calculations for each contiguous group of rows having the same values for "sku", "year" and "color". I will refer to this partition of the file as each group of rows. For example, if the file looked like this:
sku,year,color,price,discount,...
100,2019,white,24.61,2.3,...
100,2019,white,29.11,2.1,...
100,2019,white,33.48,2.9,...
100,2019,black,58.12,1.3,...
200,2018,brown,44.15,3.1,...
200,2018,brown,53.07,3.2,...
100,2019,white,16.91,2.9,...
there would be four groups of rows: rows 1, 2 and 3 (after the header row), row 4 alone, rows 5 and 6, and row 7 alone. Notice that the last row is not included in the first group even though it has the same values for the first three fields. That is because it is not contiguous with the first group.
An example of a calculation that might be performed for each group of rows would be to determine the total inventory for the group. In general, the measure to be computed is some function of the values contained in all the rows of the group of rows. The specific calculations for each group of rows is not central to my question. Let us simply assume that each group of rows is passed to some method which returns the measure of interest.
I wish to return an array containing one element per group of rows, each element (perhaps an array or hash) containing the common values of "sku", "year" and "color" and the calculated measure of interest.
Because the file is large it must be read line-by-line, rather than gulping it into an array.
What's the best way to do this?
Enumerable#chunk is perfect for this.
CSV.foreach('path/to/csv', headers: true).
  chunk { |row| row.values_at('sku', 'year', 'color') }.
  each do |(sku, year, color), rows|
    # process `rows` with the current `[sku, year, color]` combination
  end
Obviously, that last each can be replaced by map or flat_map, as needed.
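As a rough illustration of the chunk approach, here is a sketch that uses the first five columns of the question's sample rows. It parses an in-memory string only to stay self-contained, and `max_price` is a hypothetical stand-in for whatever measure is actually needed.

```ruby
require 'csv'

text = <<~TEXT
  sku,year,color,price,discount
  100,2019,white,24.61,2.3
  100,2019,white,29.11,2.1
  100,2019,white,33.48,2.9
  100,2019,black,58.12,1.3
  200,2018,brown,44.15,3.1
  200,2018,brown,53.07,3.2
  100,2019,white,16.91,2.9
TEXT

# Hypothetical measure: the highest price within one group of rows.
def max_price(rows)
  rows.map { |row| row['price'].to_f }.max
end

results = CSV.parse(text, headers: true)
             .chunk { |row| row.values_at('sku', 'year', 'color') }
             .map { |key, rows| { key: key, rows: rows.size, max_price: max_price(rows) } }

results.each { |result| puts result.inspect }
```

Because the results are collected in an array, the final white row stays a separate, fourth group instead of being merged with the first one. For a real file, `CSV.foreach(path, headers: true)` would take the place of `CSV.parse(text, headers: true)`, as in the answer.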
Here is an example of how that might be done. I will read the CSV file line-by-line to minimize memory requirements.
Code
require 'csv'

def doit(fname, common_headers)
  CSV.foreach(fname, headers: true).
    slice_when { |csv1,csv2| csv1.values_at(*common_headers) !=
                             csv2.values_at(*common_headers) }.
    each_with_object({}) { |arr,h|
      h[arr.first.to_h.slice(*common_headers)] = calc(arr) }
end

def calc(arr)
  arr.sum { |csv| csv['price'].to_f }.fdiv(arr.size).round(2)
end
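The method calc needs to be customized for the application. Here I am computing the average price for each contiguous group of records having the same values for "sku", "year" and "color". Note that the result is a hash keyed by the common values, so if the same combination of values appears in two non-contiguous groups (as in the question's example), the later group overwrites the earlier one; collect the results in an array instead if that can happen.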
See CSV::foreach, Enumerable#slice_when, CSV::Row#values_at, CSV::Row#to_h and Hash#slice.
Example
Now let's construct a CSV file.
str =<<~END
sku,year,color,price
1,2015,red,22.41
1,2015,red,33.61
1,2015,red,12.15
1,2015,blue,36.18
2,2015,yellow,9.08
2,2015,yellow,13.71
END
fname = 't.csv'
File.write(fname, str)
#=> 129
The common headers must be given:
common_headers = ['sku', 'year', 'color']
The average prices are obtained by executing doit:
doit(fname, common_headers)
#=> {{"sku"=>"1", "year"=>"2015", "color"=>"red"}=>22.72,
# {"sku"=>"1", "year"=>"2015", "color"=>"blue"}=>36.18,
# {"sku"=>"2", "year"=>"2015", "color"=>"yellow"}=>11.4}
Note:
((22.41 + 33.61 + 12.15)/3).round(2)
#=> 22.72
((36.18)/1).round(2)
#=> 36.18
((9.08 + 13.71)/2).round(2)
#=> 11.4
The methods foreach and slice_when both return enumerators. Therefore, for each contiguous block of lines from the file having the same values for the keys in common_headers, memory is acquired, calculations are performed for those lines and then that memory is released (by Ruby). In addition, memory is needed to hold the hash that is returned at the end.

Removing entire entry if line is less than an amount desired

I have a long list made up of text like this
Email: example@example.com
Language Spoken: Sample
Points: 52600
Lifetime points: 100000
Country: US
Number: 1234
Gender: Male
Status: Activated
=============================================
I need a way of filtering this list so that only students with more than 52600 points are shown. I am currently looking at solutions for this. I thought Excel might be a start, but I am not sure and wanted input.
Here's a solution in Excel:
1) Copy the text into column A.
2) In B1 enter "1", then in B2 enter the formula: =IF(LEFT(A1,1)="=",B1+1,B1), then copy that down to the end.
(This numbers the groups of lines divided by the equals signs.)
3) In C1 enter the formula: =IF(LEFT(A1,8)="Points: ",VALUE(RIGHT(A1,LEN(A1)-8)),0), then copy that down to the end.
(This populates column C with the points; every other line gets 0.)
4) In D1 enter the formula: =SUMIF(B:B,B1,C:C), then copy that down to the end.
(This sums the amounts in column C for each group number in column B, so every line of an entry carries that entry's points.)
5) Finally, put a filter on column D and filter by greater than the amount desired (the question asks for more than 52600).
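The same grouping and filtering can also be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: entries are separated by a line of equals signs, each entry has one line starting with "Points:", and the sample emails and the `threshold` variable are made up.

```ruby
text = <<~TEXT
  Email: first@example.com
  Points: 52600
  Lifetime points: 100000
  Country: US
  =============================================
  Email: second@example.com
  Points: 60000
  Lifetime points: 90000
  Country: CA
  =============================================
TEXT

threshold = 52_600

# Split the list into entries on the separator lines and drop empty pieces.
entries = text.split(/^=+\s*$/).map(&:strip).reject(&:empty?)

# Keep only the entries whose "Points:" line is strictly above the threshold.
selected = entries.select do |entry|
  # The anchored, capitalised pattern skips the "Lifetime points:" line.
  points = entry[/^Points:\s*(\d+)/, 1].to_i
  points > threshold
end

puts selected.join("\n=============================================\n")
```

An entry with no "Points:" line is treated as having 0 points and is therefore filtered out.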

How to match between two arrays and update one based on criteria

I'm trying to match two supplier csv's and update one based on the results of the other; things like if price is different, update one file with the matching item of the other. If the product is in the first csv but not in the other, update it. Once the data set is adjusted, I'll write it back to the csv which I'm ok with. Each supplier file is about 9000 lines long. Sample data from the two Puts lines in the code are:
#<struct RecordBUY item_type=nil, buy_product_id="1000", product_name="Plastic Jeweled Crown", product_type=nil, product_code_SKU="105238", option_set=nil, duplicate={"1000"=>["105238"]}, brand_name="Rubies Costumes", prod_desc="This plastic crown has six large jewel stones accross the top. Adjustable headband. (Colors of the jewel stones may vary, our choice please.)", cost_price="$3.76", prod_weight="00.14", prod_width="5.75", prod_height="0.5", prod_depth="23.5", prod_category="Hats, Wigs & Masks", prod_upn="082686025935", prod_size="One Size", prod_color="Gold">
#<struct BCRecord item_type="Product", bc_product_id="620", product_name="Dollar Ring", product_type=nil, product_code_SKU="109624", option_set=nil, duplicate=nil, brand_name="Rubies Costumes", prod_desc="Ring has three large glittery Dollar Signs '$' that extend over your fingers.", cost_price="3.20", prod_weight="0.7200", prod_width="4.0000", prod_height="1.0000", prod_depth="7.0000", prod_category="Accessories & Makeup", prod_upn="82686006996", prod_size=nil, prod_color=nil, option_set=nil, price="5.60", allow_purchases=[21]>
I read the csv data into arrays against respective objects, but don't know how to do searching and updating efficiently. I did not come across concepts to avoid the bad ones (or whether doing a bad one on 9k lines is actually bad or just frowned upon). What I have is:
puts records[0]
puts recordsBC[1]
#start script
records.each do | buyline |
  recordsBC.each do | bcline |
    if bcline.product_code_SKU == buyline.product_code_SKU
      ##update pricing (brute force);
      #bcline.price = buyline.cost_price * 1.75 #this fails with undefined method `price=' for #<Record:0x007fbb9088b960>
      bcline.cost_price = buyline.cost_price
    end
    ##if product is in BC currently, but not in buy - needs to be marked as inactive in BC
    if bcline.product_code_SKU.include? buyline.product_code_SKU
      #bcline.allow_purchases = "N" # this fails with undefined method `allow_purchases=' for #<Record:0x007fb2878822c8>
    end
    #if product is in Buy but not in BC then add it into BC
    if buyline.product_code_SKU.include? bcline.product_code_SKU
      recordsBC.push buyline
    end
  end
end
I can't figure out a better way, nor understand why I'm getting the undefined method errors on some but not all lines. I'm not after complete answers, just enough to figure out the rest of the solution.
I'd start by reducing the number of iterations. At the moment you are iterating through all of recordsBC for each buyline. So I'd start with:
records.each do | buyline |
  record_subset = recordsBC.select{|r|!(r.product_code_SKU.split & buyline.product_code_SKU.split).empty?}
  record_subset.each do |bcline|
    .....
  end
end
That should mean you only iterate through bcline items that have a matching product_code_SKU. You may have to modify the split as your example doesn't show how multiple SKUs are separated (e.g. '123 456', '123,456', or '123/456')
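Another way to cut the iterations further, not part of the answer above, is to index each list by SKU in a Hash so that every lookup is a single step. The sketch below assumes one SKU per record and uses hypothetical, trimmed-down structs (`buy_item`, `bc_item`) in place of the real record types.

```ruby
# Hypothetical stand-ins for the two record types, reduced to the fields used here.
buy_item = Struct.new(:product_code_SKU, :cost_price)
bc_item  = Struct.new(:product_code_SKU, :cost_price, :allow_purchases)

records = [
  buy_item.new('105238', '3.76'),
  buy_item.new('999999', '1.00')
]
records_bc = [
  bc_item.new('105238', '3.20', 'Y'),
  bc_item.new('109624', '3.20', 'Y')
]

# Build each index once: SKU => record.
bc_by_sku  = records_bc.to_h { |bc| [bc.product_code_SKU, bc] }
buy_by_sku = records.to_h { |buy| [buy.product_code_SKU, buy] }

# One pass over the Buy list: update matching BC records, collect the rest.
missing_from_bc = []
records.each do |buy|
  bc = bc_by_sku[buy.product_code_SKU]
  if bc
    bc.cost_price = buy.cost_price # price update
  else
    missing_from_bc << buy         # in Buy but not in BC
  end
end

# One pass over the BC list: mark products that Buy no longer carries.
records_bc.each do |bc|
  bc.allow_purchases = 'N' unless buy_by_sku.key?(bc.product_code_SKU)
end

puts records_bc.map(&:to_a).inspect
puts missing_from_bc.map(&:to_a).inspect
```

Records that are missing from BC are collected in a separate array rather than pushed onto `records_bc` while it is being searched, which keeps the two passes independent.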

How do I create a loop that uses the last column in an array as the starting point for the next loop? In MATLAB

I have output = A(:,Nout), where Nout is a point along the array and : selects all points in the column.
So it returns the values in the last column.
How do I use output as the first column of A for the next iteration?
Your question is not clear. You may mean a variety of things.
If you want to loop through values in the first column in some order specified by your last column you can:
Asort = A ( A (:, end), :);
and then loop through Asort.
You may also mean to loop N times for each row where N is defined by last column of A.
You can do it using a nested loop:
for Arow = A(:, end)'   % transpose: a for loop iterates over columns
    for ii = 1:Arow
        % your code here
    end
end
You may also mean several other things, but instead of me guessing, you could try clarifying a bit. :)
(This should be a comment, but I can't add comments yet, sorry.)

Visual Basic Function Procedure

I need help with the following H.W. problem. I have done everything except the instructions I numbered. Please help!
A furniture manufacturer makes two types of furniture—chairs and sofas.
The cost per chair is $350, the cost per sofa is $925, and the sales tax rate is 5%.
Write a Visual Basic program to create an invoice form for an order.
After the data on the left side of the form are entered, the user can display an invoice in a list box by pressing the Process Order button.
The user can click on the Clear Order Form button to clear all text boxes and the list box, and can click on the Quit button to exit the program.
The invoice number consists of the capitalized first two letters of the customer’s last name, followed by the last four digits of the zip code.
The customer name is input with the last name first, followed by a comma, a space, and the first name. However, the name is displayed in the invoice in the proper order.
The generation of the invoice number and the reordering of the first and last names should be carried out by Function procedures.
Seeing as this is homework and you haven't provided any code to show what effort you have made on your own, I'm not going to provide any specific answers, but I will try to point you in the right direction.
Your first 2 numbered items look to be variations on the same theme... string manipulation. Assuming you have the customer's address information from the order form, you just need to write 2 separate functions that take the parts of the name and address, extract the data you need and return the value (which covers your 3rd item).
To get parts of the name and address to generate the invoice number, you need to think about using the Left() and Right() functions.
Something like:
Dim first as String, last as String, word as String
word = "Foo"
first = Left(word, 1)
last = Right(word, 1)
Debug.Print(first) 'prints "F"
Debug.Print(last) 'prints "o"
Once you get the parts you need, then you just need to worry about joining the parts together in the order you want. The concatenation operator for strings is &. So using the above example, it would go something like:
Dim concat as String
concat = first & last
Debug.Print(concat) 'prints "Fo"
Your final item, using a Function procedure to generate the desired values, is very easily google-able (is that even a word). The syntax is very simple, so here's a quick example of a common function that is not built into VB6:
Private Function IsOdd(value as Integer) As Boolean
    If (value Mod 2) = 0 Then 'determines if value is odd or even by checking
                              ' if the value divided by 2 has a remainder or not
                              ' (aka Mod operator)
        IsOdd = False ' if remainder is 0, set IsOdd to False
    Else
        IsOdd = True ' otherwise set IsOdd to True
    End If
End Function
Hopefully this gets you going in the right direction.
