Ruby CSV re-arranging Array - ruby

I'm not sure what the appropriate title for this question is, so if someone could help me with that too, it would be nice.
I have a CSV file that looks something like this:
ID | Num
a | 1
a | 2
a | 3
b | 4
b | 5
c | 6
c | 7
I need the result to be:
ID | Num
a | 1,2,3
b | 4,5
c | 6,7
Currently, my solution is:
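require 'csv'

ary = CSV.read('some_file')  # read every row into an array of arrays
final = Array.new
id = ary[1][0]               # ary[0] is the header row ("ID", "Num")
numJoin = ary[1][1]
(2...ary.length).each do |i|
  if id == ary[i][0]
    numJoin = numJoin + "," + ary[i][1]  # same ID: append this Num
  else
    final << [id, numJoin]              # ID changed: store the finished group
    id = ary[i][0]
    numJoin = ary[i][1]
  end
end
final << [id, numJoin]       # store the last group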
It works, but I would like to learn other ways to solve this, as I think there should be simpler approaches.
Thanks in advance.

You can use group_by, which groups elements by the return value of the block passed to it; in this case, that value is the ID.
ary = ary.group_by { |v| v[0] }
P.S. That file doesn't look like a CSV.
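For comparison, here is a minimal, self-contained sketch of the group_by approach. It assumes the data is a real comma-separated file with an ID,Num header row; the sample rows are inlined in place of 'some_file', and the map/join step that turns each group back into a comma-joined string is an addition for illustration.

```ruby
require 'csv'

# Sample data inlined for a self-contained run; with a real file you would
# call CSV.read('some_file', headers: true) instead.
data = <<~CSV_DATA
  ID,Num
  a,1
  a,2
  a,3
  b,4
  b,5
  c,6
  c,7
CSV_DATA

rows = CSV.parse(data, headers: true)

# group_by buckets rows by the value the block returns (the ID);
# map then joins each bucket's Num values with commas.
joined = rows.group_by { |row| row['ID'] }
             .map { |id, group| [id, group.map { |row| row['Num'] }.join(',')] }

joined.each { |id, nums| puts "#{id} | #{nums}" }
```

Unlike the index-based loop in the question, group_by does not need rows with the same ID to be adjacent, and it never has to index past the end of the array.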

Related

how to update terminal values in realtime

I have the following ASCII table:
+---------------------------------------------+
| Report |
+----------+----------+-------------+---------+
| Store | Total |
+----------+----------+-------------+---------+
| A | 2723 |
| B | 7277 |
+----------+----------+-------------+---------+
I need to keep the total updated while updates are running on my database.
How can I do that?
I already have a method that gets the updated total.
But how can I keep the total displayed and refreshing on the terminal screen?
You can achieve this using the following gems:
https://github.com/ruby/curses
https://github.com/tj/terminal-table
Example:
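require 'terminal-table'
require 'curses'

Curses.init_screen
Curses.crmode                # pass keystrokes through without waiting for Enter
Curses.noecho                # don't echo typed characters
Curses.stdscr.keypad = true
begin
  row = 0
  col = 0
  loop do
    # Rebuild the table on every tick; replace the random values with
    # your own method that fetches the updated totals.
    table = Terminal::Table.new do |t|
      t << ['Random 1', Random.rand(1...10)]
      t.add_row ['Random 2', Random.rand(10...100)]
    end
    Curses.setpos(row, col)           # setpos takes (line, column)
    Curses.addstr(table.render.to_s)  # draw over the previous frame
    Curses.refresh
    sleep 1
  end
ensure
  Curses.close_screen         # restore the terminal, e.g. after Ctrl-C
end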
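If pulling in curses is more than you need, a lighter alternative (a swapped-in technique, not part of the answer above) is to repaint with ANSI escape sequences, which most modern terminals understand. The sketch below assumes such a terminal; render_report, redraw and fetch_totals are hypothetical names, and the table layout only loosely mirrors the one in the question.

```ruby
require 'stringio'

# Formats the per-store totals as a small plain-text table (layout is illustrative).
def render_report(totals)
  lines = ['Store | Total']
  totals.each { |store, total| lines << format('%-5s | %5d', store, total) }
  lines.join("\n")
end

# "\e[H" moves the cursor to the top-left corner and "\e[2J" clears the screen
# (standard ANSI/VT100 escape sequences), so each call repaints the report in place.
def redraw(io, totals)
  io.print("\e[H\e[2J", render_report(totals), "\n")
  io.flush
end

# Usage in a real terminal, where fetch_totals is a hypothetical stand-in for
# the method you already have that returns the updated totals:
#
#   loop do
#     redraw($stdout, fetch_totals)
#     sleep 1
#   end
```

Clearing and repainting the whole screen is simple but can flicker on slow terminals; curses is designed to minimise that by sending only the changes on each refresh.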

Extract value label in command [stata]

I am trying to create a new string variable that combines the string of a real number (an ID) with a name. The name is a numeric variable with a value label.
Example data can be found below:
* Input Data
clear
input long num id
1 689347
2 972623
end
label values num num
label def num 1 "Label A" 2 "Label B"
+------------------+
| num id |
|------------------|
| Label A 689347 |
| Label B 972623 |
+------------------+
What I would like to do is create a string of the type 689347 - Label A. This is very easy to do by simply using decode on num, then writing a new string as follows:
tempvar numstr
decode num, gen(`numstr')
gen label = string(id) + " - " + `numstr'
+-------------------------------------+
| num id label |
|-------------------------------------|
| Label A 689347 689347 - Label A |
| Label B 972623 972623 - Label B |
+-------------------------------------+
This is already pretty easy, but is there a way to do this in one line, without the decode command?
For example something like:
gen label = string(id) + " " + string(num)
The problem with this is, of course, that this will just give a string of the real number value (1 and 2) that num takes on.
In this post you can see how to reference the value label in an if command.
My question is:
Is there a way to tell Stata to create a string and pull the value label instead of the value?
The best I can do is two lines.
decode num, generate(label)
replace label = string(id) + " - " + label
If you do not want to use decode, then this does the trick (note that it relies on num being equal to the observation number, as it is in the example data):
generate label = ""
forvalues i = 1 / 2 {
    replace label = string(id) + " - " + "`: label num `i''" in `i'
}

Reshape data in pig - change row values to column names

Is there a way to reshape the data in pig?
The data looks like this:
id | p1 | count
1 | "Accessory" | 3
1 | "clothing" | 2
2 | "Books" | 1
I want to reshape the data so that the output looks like this:
id | Accessory | clothing | Books
1 | 3 | 2 | 0
2 | 0 | 0 | 1
Can anyone suggest a way to do this?
If it's a fixed set of product lines, the code below might help; otherwise, you can write a custom UDF to achieve this.
Input: a.csv
1|Accessory|3
1|Clothing|2
2|Books|1
Pig snippet:
test = LOAD 'a.csv' USING PigStorage('|') AS (product_id:long, product_name:chararray, rec_cnt:long);
req_stats = FOREACH (GROUP test BY product_id) {
    accessory = FILTER test BY product_name == 'Accessory';
    clothing = FILTER test BY product_name == 'Clothing';
    books = FILTER test BY product_name == 'Books';
    GENERATE group AS product_id,
        (IsEmpty(accessory) ? '0' : BagToString(accessory.rec_cnt)) AS a_cnt,
        (IsEmpty(clothing) ? '0' : BagToString(clothing.rec_cnt)) AS c_cnt,
        (IsEmpty(books) ? '0' : BagToString(books.rec_cnt)) AS b_cnt;
};
DUMP req_stats;
Output of DUMP req_stats:
(1,3,2,0)
(2,0,0,1)

Number of string value occurrences for distinct another column value

I have a model Counter which returns the following records:
name | flowers | counter
vino | rose    | 1
vino | lily    | 1
gaya | rose    | 1
rosi | lily    | 1
vino | lily    | 1
rosi | rose    | 1
rosi | rose    | 1
I want to display in the table like:
name | Rose | Lily |
---------------------
Vino | 1 | 2 |
---------------------
Gaya | 1 | 0 |
---------------------
Rosi | 2 | 1 |
I want to display the count of each flower for each distinct name. I have tried the following, but I'm wondering how to do it more elegantly:
def counter_results
  @counter_results = {}
  Counter.each do |name|
    rose = Counter.where(flower: 'rose').count
    lily = Counter.where(flower: 'lily').count
    @counter_results['name'] = name
    @counter_results['rose_count'] = rose
    @counter_results['lily_count'] = lily
  end
  return @counter_results
end
With this, I don't get the hash values I expect.
This will give you slightly different output, but I think it is probably closer to what you want than what you showed.
You can use the query:
Counter.group([:name, :flowers]).sum(:counter)
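to get a hash that looks like this (name/flower combinations that never occur, such as gaya/lily, are simply absent, because GROUP BY only returns groups that exist):
{ ["vino", "rose"] => 1, ["vino", "lily"] => 2, ["gaya", "rose"] => 1, ["rosi", "lily"] => 1, ["rosi", "rose"] => 2 }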
And you can do something like this to generate your hash:
def counter_results
  @counter_results = {}
  # keys are [name, flowers] arrays; join them into "name_flower" strings
  Counter.group([:name, :flowers]).sum(:counter).each do |k, v|
    @counter_results[k.join("_")] = v
  end
  @counter_results
end
The resulting hash would look like this:
{
  "vino_rose" => 1,
  "vino_lily" => 2,
  "gaya_rose" => 1,
  "rosi_lily" => 1,
  "rosi_rose" => 2
}
Somebody else may have a better way to do it, but that should get you pretty close.
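If you want the Name/Rose/Lily layout from the question, with 0 for pairs that never occur, you can pivot the grouped sums in plain Ruby. This is a sketch assuming the hash has the [name, flower] => count shape described above; the literal below is a stand-in built by hand from the sample records, not real query output.

```ruby
# Stand-in for Counter.group([:name, :flowers]).sum(:counter), hand-built
# from the sample records in the question.
sums = {
  ['vino', 'rose'] => 1, ['vino', 'lily'] => 2,
  ['gaya', 'rose'] => 1,
  ['rosi', 'lily'] => 1, ['rosi', 'rose'] => 2
}

# Every flower that appears anywhere becomes a column.
flowers = sums.keys.map(&:last).uniq.sort

# Pivot to name => { flower => count }; each new name starts with 0 for every
# flower, so pairs missing from the grouped query still show up as 0.
table = Hash.new { |h, name| h[name] = flowers.map { |f| [f, 0] }.to_h }
sums.each { |(name, flower), count| table[name][flower] = count }

# Print one line per name, with the flower columns in sorted order.
puts "Name | #{flowers.map(&:capitalize).join(' | ')}"
table.each do |name, counts|
  puts "#{name.capitalize} | #{flowers.map { |f| counts[f] }.join(' | ')}"
end
```

Note that table keeps its default block, so looking up a name that is not present creates a zero-filled entry for it.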

merge rows csv by id ruby

I have a .csv file that, for simplicity, has two fields: ID and comments. IDs repeat across rows because whenever a comment reached the maximum character length of the table it was generated from, another row was needed. I now need to merge the associated comments, creating one row for each unique ID, using Ruby.
To illustrate, I'm trying to turn this:
ID | COMMENT
1 | fragment 1
1 | fragment 2
2 | fragment 1
3 | fragment 1
3 | fragment 2
3 | fragment 3
into this:
ID | COMMENT
1 | fragment 1 fragment 2
2 | fragment 1
3 | fragment 1 fragment 2 fragment 3
I've come close to a solution using inject({}) and a hash, but I'm still working on getting all the data merged correctly. Meanwhile, my code seems to be getting too complicated, with multiple hashes and arrays just to merge selected rows.
What's the best/simplest way to achieve this type of row merge? Could it be done with just arrays?
Would appreciate advice on how one would normally do this in Ruby.
Keep the headers and use group by ID:
require 'csv'

rows = CSV.read('comment.csv', headers: true)
rows.group_by { |row| row['ID'] }.values.each do |group|
  # Array#* with a string joins: prints "ID | all comments joined by spaces"
  puts [group.first['ID'], group.map { |r| r['COMMENT'] } * ' '] * ' | '
end
You could use the column indexes 0 and 1 instead, but I think the header field names are clearer.
With the following CSV file, tmp.csv:
1,fragment 11
1,fragment 21
2,fragment 21
2,fragment 22
3,fragment 31
3,fragment 32
3,fragment 33
Try this (demonstrated in irb):
irb> require 'csv'
=> true
irb> h = Hash.new
=> {}
irb> CSV.foreach("tmp.csv") {|r| h[r[0]] = h.key?(r[0]) ? h[r[0]] + " " + r[1] : r[1]}
=> nil
irb> h
=> {"1"=>"fragment 11 fragment 21", "2"=>"fragment 21 fragment 22", "3"=>"fragment 31 fragment 32 fragment 33"}
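A minimal sketch of another common pattern: collect the fragments per ID in a hash of arrays, then write one merged row per ID back out with CSV.generate. The sample data is inlined so it runs on its own and is assumed to have an ID,COMMENT header row; with a real file you would read comment.csv instead.

```ruby
require 'csv'

# Sample input inlined for a self-contained run (assumed ID,COMMENT header).
input = <<~DATA
  ID,COMMENT
  1,fragment 1
  1,fragment 2
  2,fragment 1
  3,fragment 1
  3,fragment 2
  3,fragment 3
DATA

# Collect every comment fragment under its ID; missing keys start as [].
fragments = Hash.new { |h, id| h[id] = [] }
CSV.parse(input, headers: true) do |row|
  fragments[row['ID']] << row['COMMENT']
end

# Write one row per unique ID back out as CSV, joining fragments with a space.
merged = CSV.generate do |csv|
  csv << %w[ID COMMENT]
  fragments.each { |id, parts| csv << [id, parts.join(' ')] }
end

puts merged
```

Because Ruby hashes preserve insertion order, IDs come out in the order they first appear, and the fragments keep their original order within each ID.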
