I'm using Ruby 2.5.0 and I have the below function as part of my script. When I run it I get the below error:
in `ensure in get_database_connection': undefined method `critical=' for Thread:Class (NoMethodError)
I understand that for Ruby 1.9.0 and above, Thread.critical is no longer supported, so how can I edit my function to make it run under Ruby 2.5.0?
Thanks.
# Geodict
# Copyright (C) 2010 Pete Warden <pete#petewarden.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
require 'rubygems'
require 'postgres'
require 'set'
# Some hackiness to include the library script, even if invoked from another directory
require File.join(File.expand_path(File.dirname(__FILE__)), 'dstk_config')
# Global holder for the database connections
$connections = {}
# The main entry point. This function takes an unstructured text string and returns a list of all the
# fragments it could identify as locations, together with lat/lon positions
def find_locations_in_text(text)
current_index = text.length-1
result = []
$tokenized_words = {}
setup_countries_cache()
setup_regions_cache()
# This loop goes through the text string in *reverse* order. Since locations in English are typically
# described with the broadest category last, preceded by more and more specific designations towards
# the beginning, it simplifies things to walk the string in that direction too
while current_index>=0 do
current_word, pulled_index, ignored_skipped = pull_word_from_end(text, current_index)
lower_word = current_word.downcase
could_be_country = $countries_cache.has_key?(lower_word)
could_be_region = $regions_cache.has_key?(lower_word)
if not could_be_country and not could_be_region
current_index = pulled_index
next
end
# This holds the results of the match function for the final element of the sequence. This lets us
# optimize out repeated calls to see if the end of the current string is a country for example
match_cache = {}
token_result = nil
# These 'token sequences' describe patterns of discrete location elements that we'll look for.
$token_sequences.each() do |token_sequence|
# The sequences are specified in the order they'll occur in the text, but since we're walking
# backwards we need to reverse them and go through the sequence in that order too
token_sequence = token_sequence.reverse
# Now go through the sequence and see if we can match up all the tokens in it with parts of
# the string
token_result = nil
token_index = current_index
token_sequence.each_with_index do |token_name, token_position|
# The token definition describes how to recognize part of a string as a match. Typical
# tokens include country, city and region names
token_definition = $token_definitions[token_name]
match_function = token_definition[:match_function]
# This logic optimizes out repeated calls to the same match function
if token_position == 0 and match_cache.has_key?(token_name)
token_result = match_cache[token_name]
else
# The meat of the algorithm, checks the ending of the current string against the
# token testing function, eg seeing if it matches a country name
token_result = send(match_function, text, token_index, token_result)
if token_position == 0
match_cache[token_name] = token_result
end
end
if !token_result
# The string doesn't match this token, so the sequence as a whole isn't a match
break
else
# The current token did match, so move backwards through the string to the start of
# the matched portion, and see if the preceding words match the next required token
token_index = token_result[:found_tokens][0][:start_index]-1
end
end
# We got through the whole sequence and all the tokens match, so we have a winner!
if token_result
current_word, current_index, end_skipped = pull_word_from_end(text, current_index)
break
end
end
if !token_result
# None of the sequences matched, so back up a word and start over again
ignored_word, current_index, end_skipped = pull_word_from_end(text, current_index)
else
# We found a matching sequence, so add the information to the result
result.push(token_result)
found_tokens = token_result[:found_tokens]
current_index = found_tokens[0][:start_index]-1
end
end
# Reverse the result so it's in the order that the locations occurred in the text
result.reverse!
return result
end
# Functions that look at a small portion of the text, and try to identify any location identifiers
# Caches the countries and regions tables in memory
$countries_cache = {}
$is_countries_cache_setup = false
def setup_countries_cache()
if $is_countries_cache_setup then return end
select = 'SELECT * FROM countries'
hashes = select_as_hashes(select, DSTKConfig::DATABASE)
hashes.each do |hash|
last_word = hash['last_word'].downcase
if !$countries_cache.has_key?(last_word)
$countries_cache[last_word] = []
end
$countries_cache[last_word].push(hash)
end
$is_countries_cache_setup = true
end
$regions_cache = {}
$is_regions_cache_setup = false
def setup_regions_cache()
if $is_regions_cache_setup then return end
select = 'SELECT * FROM regions'
hashes = select_as_hashes(select, DSTKConfig::DATABASE)
hashes.each do |hash|
last_word = hash['last_word'].downcase
if !$regions_cache.has_key?(last_word)
$regions_cache[last_word] = []
end
$regions_cache[last_word].push(hash)
end
$is_regions_cache_setup = true
end
# Translates a two-letter country code into a readable name
def get_country_name_from_code(country_code)
if !country_code then return nil end
setup_countries_cache()
result = country_code
$countries_cache.each do |last_word, countries|
countries.each do |row|
if row['country_code'] and row['country_code'].downcase == country_code.downcase
result = row['country']
end
end
end
result
end
# Matches the current fragment against our database of countries
def is_country(text, text_starting_index, previous_result)
current_word = ''
current_index = text_starting_index
pulled_word_count = 0
found_row = nil
# Walk backwards through the current fragment, pulling out words and seeing if they match
# the country names we know about
while pulled_word_count < DSTKConfig::WORD_MAX do
pulled_word, current_index, end_skipped = pull_word_from_end(text, current_index)
pulled_word_count += 1
if current_word == ''
# This is the first time through, so the full word is just the one we pulled
current_word = pulled_word
# Make a note of the real end of the word, ignoring any trailing whitespace
word_end_index = (text_starting_index-end_skipped)
# We've indexed the locations by the word they end with, so find all of them
# that have the current word as a suffix
last_word = pulled_word.downcase
if !$countries_cache.has_key?(last_word)
break
end
candidate_dicts = $countries_cache[last_word]
name_map = {}
candidate_dicts.each do |candidate_dict|
name = candidate_dict['country'].downcase
name_map[name] = candidate_dict
end
else
current_word = pulled_word+' '+current_word
end
# This happens if we've walked backwards all the way to the start of the string
if current_word == ''
return nil
end
# If the first letter of the name is lower case, then it can't be the start of a country
# Somewhat arbitrary, but for my purposes it's better to miss some ambiguous ones like this
# than to pull in erroneous words as countries (eg thinking the 'uk' in .co.uk is a country)
if current_word[0].chr =~ /[a-z]/
next
end
name_key = current_word.downcase
if name_map.has_key?(name_key)
found_row = name_map[name_key]
end
if found_row
# We've found a valid country name
break
end
if current_index < 0
# We've walked back to the start of the string
break
end
end
if !found_row
# We've walked backwards through the current words, and haven't found a good country match
return nil
end
# Were there any tokens found already in the sequence? Unlikely with countries, but for
# consistency's sake I'm leaving the logic in
if !previous_result
current_result = {
:found_tokens => [],
}
else
current_result = previous_result
end
country_code = found_row['country_code']
lat = found_row['lat']
lon = found_row['lon']
# Prepend all the information we've found out about this location to the start of the :found_tokens
# array in the result
current_result[:found_tokens].unshift({
:type => :COUNTRY,
:code => country_code,
:lat => lat,
:lon => lon,
:matched_string => current_word,
:start_index => (current_index+1),
:end_index => word_end_index
})
return current_result
end
# Looks through our database of 2 million towns and cities around the world to locate any that match the
# words at the end of the current text fragment
def is_city(text, text_starting_index, previous_result)
# If we're part of a sequence, then use any country or region information to narrow down our search
country_code = nil
region_code = nil
if previous_result
found_tokens = previous_result[:found_tokens]
found_tokens.each do |found_token|
type = found_token[:type]
if type == :COUNTRY
country_code = found_token[:code]
elsif type == :REGION
region_code = found_token[:code]
end
end
end
current_word = ''
current_index = text_starting_index
pulled_word_count = 0
found_row = nil
while pulled_word_count < DSTKConfig::WORD_MAX do
pulled_word, current_index, end_skipped = pull_word_from_end(text, current_index)
pulled_word_count += 1
if current_word == ''
current_word = pulled_word
word_end_index = (text_starting_index-end_skipped)
select = "SELECT * FROM cities WHERE last_word='"+pulled_word.downcase+"'"
if country_code
select += " AND country='"+country_code.downcase+"'"
end
if region_code
select += " AND region_code='"+region_code.upcase.strip+"'"
end
# There may be multiple cities with the same name, so pick the one with the largest population
select += ' ORDER BY population;'
hashes = select_as_hashes(select, DSTKConfig::DATABASE)
name_map = {}
hashes.each do |hash|
name = hash['city'].downcase
name_map[name] = hash
end
else
current_word = pulled_word+' '+current_word
end
if current_word == ''
return nil
end
if current_word[0].chr =~ /[a-z]/
next
end
name_key = current_word.downcase
if name_map.has_key?(name_key)
found_row = name_map[name_key]
end
if found_row
break
end
if current_index < 0
break
end
end
if !found_row
return nil
end
if !previous_result
current_result = {
:found_tokens => [],
}
else
current_result = previous_result
end
lat = found_row['lat']
lon = found_row['lon']
country_code = found_row['country'].downcase
current_result[:found_tokens].unshift( {
:type => :CITY,
:lat => lat,
:lon => lon,
:country_code => country_code,
:matched_string => current_word,
:start_index => (current_index+1),
:end_index => word_end_index
})
return current_result
end
# This looks for sub-regions within countries. At the moment the only values in the database are for US states
def is_region(text, text_starting_index, previous_result)
# Narrow down the search by country, if we already have it
country_code = nil
if previous_result
found_tokens = previous_result[:found_tokens]
found_tokens.each do |found_token|
type = found_token[:type]
if type == :COUNTRY
country_code = found_token[:code]
end
end
end
current_word = ''
current_index = text_starting_index
pulled_word_count = 0
found_row = nil
while pulled_word_count < DSTKConfig::WORD_MAX do
pulled_word, current_index, end_skipped = pull_word_from_end(text, current_index)
pulled_word_count += 1
if current_word == ''
current_word = pulled_word
word_end_index = (text_starting_index-end_skipped)
last_word = pulled_word.downcase
if !$regions_cache.has_key?(last_word)
break
end
all_candidate_dicts = $regions_cache[last_word]
if country_code
candidate_dicts = []
all_candidate_dicts.each do |possible_dict|
candidate_country = possible_dict['country_code']
if candidate_country.downcase == country_code.downcase
candidate_dicts << possible_dict
end
end
else
candidate_dicts = all_candidate_dicts
end
name_map = {}
candidate_dicts.each do |candidate_dict|
name = candidate_dict['region'].downcase
name_map[name] = candidate_dict
end
else
current_word = pulled_word+' '+current_word
end
if current_word == ''
return nil
end
if current_word[0].chr =~ /[a-z]/
next
end
name_key = current_word.downcase
if name_map.has_key?(name_key)
found_row = name_map[name_key]
end
if found_row
break
end
if current_index < 0
break
end
end
if !found_row
return nil
end
if !previous_result
current_result = {
:found_tokens => [],
}
else
current_result = previous_result
end
region_code = found_row['region_code']
lat = found_row['lat']
lon = found_row['lon']
country_code = found_row['country_code'].downcase
current_result[:found_tokens].unshift( {
:type => :REGION,
:code => region_code,
:lat => lat,
:lon => lon,
:country_code => country_code,
:matched_string => current_word,
:start_index => (current_index+1),
:end_index=> word_end_index
})
return current_result
end
# A special case - used to look for 'at' or 'in' before a possible location word. This helps me be more certain
# that it really is a location in this context. Think 'the New York Times' vs 'in New York' - with the latter
# fragment we can be pretty sure it's talking about a location
def is_location_word(text, text_starting_index, previous_result)
current_index = text_starting_index
current_word, current_index, end_skipped = pull_word_from_end(text, current_index)
word_end_index = (text_starting_index-end_skipped)
if current_word == ''
return nil
end
current_word.downcase!
if !DSTKConfig::LOCATION_WORDS.has_key?(current_word)
return nil
end
return previous_result
end
def is_postal_code(text, text_starting_index, previous_result)
# Narrow down the search by country, if we already have it
country_code = nil
if previous_result
found_tokens = previous_result[:found_tokens]
found_tokens.each do |found_token|
type = found_token[:type]
if type == :COUNTRY
country_code = found_token[:code]
end
end
end
current_word = ''
current_index = text_starting_index
pulled_word_count = 0
found_rows = nil
while pulled_word_count < DSTKConfig::WORD_MAX do
pulled_word, current_index, end_skipped = pull_word_from_end(text, current_index)
pulled_word_count += 1
if current_word == ''
current_word = pulled_word
word_end_index = (text_starting_index-end_skipped)
last_word = pulled_word.downcase
select = "SELECT * FROM postal_codes"
select += " WHERE last_word='"+pulled_word.downcase+"'"
select += " OR last_word='"+pulled_word.upcase+"'"
if country_code
select += " AND country_code='"+country_code.upcase+"'"
end
candidate_dicts = select_as_hashes(select, DSTKConfig::DATABASE)
name_map = {}
candidate_dicts.each do |candidate_dict|
name = candidate_dict['postal_code'].downcase
if !name_map[name] then name_map[name] = [] end
name_map[name] << candidate_dict
end
else
current_word = pulled_word+' '+current_word
end
if current_word == ''
return nil
end
if current_word[0].chr =~ /[a-z]/
next
end
name_key = current_word.downcase
if name_map.has_key?(name_key)
found_rows = name_map[name_key]
end
if found_rows
break
end
if current_index < 0
break
end
end
if !found_rows
return nil
end
# Confirm the postal code against the country suffix
found_row = nil
if country_code
found_rows.each do |row|
if row['country_code'] == country_code
found_row = row
break
end
end
end
if !found_row
return nil
end
# Also pull in the prefixed region, if there is one
region_result = is_region(text, current_index, nil)
if region_result
region_token = region_result[:found_tokens][0]
region_code = region_token[:code]
if found_row['region_code'] == region_code
current_index = region_token[:start_index]-1
current_word = region_token[:matched_string] + ' ' + current_word
end
end
if !found_row
return nil
end
if !previous_result
current_result = {
:found_tokens => [],
}
else
current_result = previous_result
end
region_code = found_row['region_code']
lat = found_row['lat']
lon = found_row['lon']
country_code = found_row['country_code'].downcase
region_code = found_row['region_code'].downcase
postal_code = found_row['postal_code'].downcase
current_result[:found_tokens].unshift( {
:type => :POSTAL_CODE,
:code => postal_code,
:lat => lat,
:lon => lon,
:region_code => region_code,
:country_code => country_code,
:matched_string => current_word,
:start_index => (current_index+1),
:end_index=> word_end_index
})
return current_result
end
# Characters to ignore when pulling out words
WHITESPACE = " \t'\",.-/\n\r<>!?".split(//).to_set
$tokenized_words = {}
# Walks backwards through the text from the end, pulling out a single unbroken sequence of non-whitespace
# characters, trimming any whitespace off the end
def pull_word_from_end(text, index, use_cache=true)
if use_cache and $tokenized_words.has_key?(index)
return $tokenized_words[index]
end
found_word = ''
current_index = index
end_skipped = 0
while current_index>=0 do
current_char = text[current_index].chr
current_index -= 1
if WHITESPACE.include?(current_char)
if found_word == ''
end_skipped += 1
next
else
current_index += 1
break
end
end
found_word << current_char
end
# reverse the result (since we're appending for efficiency's sake)
found_word.reverse!
result = [found_word, current_index, end_skipped]
$tokenized_words[index] = result
return result
end
# Converts the result of an SQL fetch into an associative dictionary, rather than a numerically indexed list
def get_hash_from_row(fields, row)
d = {}
fields.each_with_index do |field, index|
value = row[index]
d[field] = value
end
return d
end
# Returns the most specific token from the array
def get_most_specific_token(tokens)
if !tokens then return nil end
result = nil
result_priority = nil
tokens.each do |token|
priority = $token_priorities[token[:type]]
if !result or result_priority > priority
result = token
result_priority = priority
end
end
result
end
# Returns the results of the SQL select statement as associative arrays/hashes
def select_as_hashes(select, database_name)
begin
conn = get_database_connection(database_name)
Thread.critical = true
res = conn.exec('BEGIN')
res.clear
res = conn.exec('DECLARE myportal CURSOR FOR '+select)
res.clear
res = conn.exec('FETCH ALL in myportal')
fields = res.fields
rows = res.result
res = conn.exec('CLOSE myportal')
res = conn.exec('END')
result = []
rows.each do |row|
hash = get_hash_from_row(fields, row)
result.push(hash)
end
rescue PGError
if conn
printf(STDERR, conn.error)
else
$stderr.puts 'select_as_hashes() - no connection for ' + database_name
end
if conn
conn.close
end
$connections[database_name] = nil
exit(1)
ensure
Thread.critical = false
end
return result
end
def get_database_connection(database_name)
begin
Thread.critical = true
if !$connections[database_name]
$connections[database_name] = PGconn.connect(DSTKConfig::HOST,
DSTKConfig::PORT,
'',
'',
database_name,
DSTKConfig::USER,
DSTKConfig::PASSWORD)
end
ensure
Thread.critical = false
end
if !$connections[database_name]
$stderr.puts "get_database_connection('#{database_name}') - Couldn't open connection"
end
$connections[database_name]
end
# Types of locations we'll be looking for
$token_definitions = {
:COUNTRY => {
:match_function => :is_country
},
:CITY => {
:match_function => :is_city
},
:REGION => {
:match_function => :is_region
},
:LOCATION_WORD => {
:match_function => :is_location_word
},
:POSTAL_CODE => {
:match_function => :is_postal_code
}
}
# Particular sequences of those location words that give us more confidence they're actually describing
# a place in the text, and aren't coincidental names (eg 'New York Times')
$token_sequences = [
[ :POSTAL_CODE, :REGION, :COUNTRY ],
[ :REGION, :POSTAL_CODE, :COUNTRY ],
[ :POSTAL_CODE, :CITY, :COUNTRY ],
[ :POSTAL_CODE, :COUNTRY ],
[ :CITY, :COUNTRY ],
[ :CITY, :REGION ],
[ :REGION, :COUNTRY ],
[ :COUNTRY ],
[ :LOCATION_WORD, :REGION ], # Regions and cities are too common as words to use without additional evidence
[ :LOCATION_WORD, :CITY ]
]
# Location identifiers in order of decreasing specificity
$token_priorities = {
:POSTAL_CODE => 0,
:CITY => 1,
:REGION => 2,
:COUNTRY => 3,
}
if __FILE__ == $0
require 'json'
test_text = <<-TEXT
Spain
Italy
Bulgaria
Foofofofof
New Zealand
Barcelona, Spain
Wellington New Zealand
I've been working on the railroad, all the live-long day! The quick brown fox jumped over the lazy dog in Alabama
I'm mentioning Los Angeles here, but without California or CA right after it, it won't be detected. If I talk about living in Wisconsin on the other hand, that 'in' gives the algorithm extra evidence it's actually a location.
It should still pick up more qualified names like Amman Jordan or Atlanta, Georgia though!
Dallas, TX or New York, NY
It should now pick up Queensland, Australia, or even NSW, Australia!
Postal codes like QLD 4002, Australia, QC H3W, Canada, 2608 Lillehammer, Norway, or CA 94117, USA are supported too.
TEXT
puts "Analyzing '#{test_text}'"
puts "Found locations:"
locations = find_locations_in_text(test_text)
locations.each_with_index do |location_info, index|
found_tokens = location_info[:found_tokens]
location = get_most_specific_token(found_tokens)
match_start_index = found_tokens[0][:start_index]
match_end_index = found_tokens[found_tokens.length-1][:end_index]
matched_string = test_text[match_start_index..match_end_index]
result = {
'type' => location[:type],
'name' => location[:matched_string],
'latitude' => location[:lat].to_s,
'longitude' => location[:lon].to_s,
'start_index' => location[:start_index].to_s,
'end_index' => location[:end_index].to_s,
'matched_string' => matched_string,
'country' => location[:country_code],
'code' => location[:code],
}
puts result.to_json
end
end
You can either remove the call, or ask first before trying:
Thread.respond_to?(:critical=) and Thread.critical = true
That being said, since Thread.critical= was removed in Ruby 1.9 it's pretty safe to trash that code entirely. Anyone running Ruby 1.8.x is living dangerously.
Unless you have a specific requirement to support 1.8.x, you can simply delete the calls or switch to an alternative.
The purpose of critical= is to prevent pre-emption of the thread by another. That's a really heavy-handed way to synchronize threads, and dangerous enough that Ruby pulled support for it before it could become more pervasive.
What you probably want is a Mutex if you need to lock a resource. There's no obviously shared resources here unless get_database_connection returns one. It doesn't seem to as the connection is closed on error.
This code is full of some seriously suspect things, like using $connections, a global variable, and hard-exiting the whole process on failure. You may want to do a more thorough investigation as to what the purpose of the critical lock was in the first place.
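If the goal was to serialize access to the shared $connections hash, a Mutex is the modern replacement. A minimal sketch of that idea, assuming a stand-in open_connection helper rather than the real PGconn.connect call:

```ruby
require 'thread' # Mutex is built in on modern Rubies; this require is harmless

$connections = {}
$connections_mutex = Mutex.new

# Hypothetical stand-in for PGconn.connect, purely for illustration
def open_connection(database_name)
  "connection-to-#{database_name}"
end

def get_database_connection(database_name)
  # synchronize replaces the old Thread.critical = true / ensure ... = false pair:
  # only one thread at a time can run the memoizing block
  $connections_mutex.synchronize do
    $connections[database_name] ||= open_connection(database_name)
  end
end
```

Unlike Thread.critical, the mutex only blocks threads contending for this specific resource instead of freezing the entire scheduler.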
I've searched and haven't found a method for this particular conundrum. I have two CSV files of data that sometimes relate to the same thing. Here's an example:
CSV1 (500 lines):
date,reference,amount,type
10/13/2015,,1510.40,sale
10/13/2015,,312.90,sale
10/14/2015,,928.50,sale
10/15/2015,,820.25,sale
10/12/2015,,702.70,credit
CSV2 (20000 lines):
reference,date,amount
243534985,10/13/2015,312.90
345893745,10/15/2015,820.25
086234523,10/14/2015,928.50
458235832,10/13/2015,1510.40
My goal is to match the date and amount from CSV2 with the date and amount in CSV1, and write the reference from CSV2 to the reference column in the corresponding row.
This is a simplified view, as CSV2 actually contains many, many more columns - these are just the relevant ones, so ideally I'd like to refer to them by header name or maybe index somehow?
Here's what I've attempted, but I'm a bit stuck.
require 'csv'
data1 = {}
data2 = {}
CSV.foreach("data1.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
data1[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
CSV.foreach("data2.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
data2[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
data1.each do |data1_row|
data2.each do |data2_row|
if (data1_row['comparitive'] == data2_row['comparitive'])
puts data1_row['identifier'] + data2_row['column_thats_important_and_wanted']
end
end
end
Result:
22:in `[]': no implicit conversion of String into Integer (TypeError)
I've also tried:
CSV.foreach('data2.csv') do |data2|
CSV.foreach('data1.csv') do |data1|
if (data1[3] == data2[4])
data1[1] << data2[1]
puts "Change made!"
else
puts "nothing changed."
end
end
end
This however did not match anything inside the if statement, so perhaps not the right approach?
The headers method should help you match columns--from there it's a matter of parsing and writing the modified data back out to a file.
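A minimal sketch of that idea, building a lookup keyed on [date, amount] and using small inline samples in place of the real files (column names taken from the question's headers):

```ruby
require 'csv'

# CSV2 sample: build a lookup from [date, amount] to reference
csv2 = CSV.parse(<<~DATA, headers: true)
  reference,date,amount
  243534985,10/13/2015,312.90
  458235832,10/13/2015,1510.40
DATA

lookup = {}
csv2.each { |row| lookup[[row['date'], row['amount']]] = row['reference'] }

# CSV1 sample: fill in the empty reference column from the lookup
csv1 = CSV.parse(<<~DATA, headers: true)
  date,reference,amount,type
  10/13/2015,,1510.40,sale
DATA

csv1.each do |row|
  row['reference'] = lookup[[row['date'], row['amount']]]
end
```

Because the lookup is a hash, this is one pass over each file instead of a nested loop over 500 × 20000 rows. Note the amounts are compared as strings here; real data may need normalization (e.g. stripping trailing zeros).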
Solved.
data1 = CSV.read('data1.csv')
data2 = CSV.read('data2.csv')
data2.each do |data2|
data1.each do |data1|
if (data1[5] == data2[4])
data1[1] = data2[1]
puts "Change made!"
puts data1
end
end
end
File.open('referenced.csv','w'){ |f| f << data1.map(&:to_csv).join("")}
I've got a list of persons saved in an array and I want to loop a file with organizations looking for matches and save them but it keeps going wrong. I think I'm doing something wrong with the arrays.
This is exactly what I'm doing:
I have a list of persons in a file called 'personen_fixed.csv'.
I save that list into an array.
I have another file that also has the name of the people ("pers2"), but also three other interesting columns of data. I save the four columns into arrays.
I want to loop over the first array (the persons) and search for matches with the list of persons ("pers2").
If there is a match I want to save that row.
What I'm getting now is two rows of data, of which one is filled with ALL persons. See my code below. At the bottom I have some sample input data.
require 'csv'
array_pers1 = []
array_pers2 = []
array_orgaan = []
array_functie = []
array_rol = []
filename_1 = 'personen_fixed.csv'
CSV.foreach(filename_1, :col_sep => ";", :encoding => "windows-1251:utf-8", :return_headers => false) do |row|
array_pers1 << row[0].to_s
end
filename_2 = 'Functies_fixed.csv'
CSV.foreach(filename_2, :col_sep => ";", :encoding => "windows-1251:utf-8", :return_headers => false) do |row|
array_pers2 << row[1].to_s
array_orgaan << row[16].to_s
array_functie << row[17].to_s
array_rol << row[18].to_s
end
CSV.open("testrij.csv", "w") do |row|
row << ["rijnummer","link","ptext","soort_woonhuis"]
for rij in array_pers1
for x in 1...4426 do
if rij === array_pers2["#{x}".to_f]
pers2 = array_pers2["#{x}".to_f]
orgaan = array_orgaan["#{x}".to_f]
functie = array_functie["#{x}".to_f]
rol = array_rol["#{x}".to_f]
row << [pers2,orgaan,functie,rol]
else
pers2 = ""
orgaan = ""
functie = ""
rol = ""
end
end
end
end
input data for the first excel data (excel column name and first row of data):
person
someonesname
Input data for the second excel file:
person,organizationid,role,organization,function
someonesname,34971,member,americanairways,boardofdirectors
Since many of the people in the dataset have multiple jobs at different organizations, I want to save them all next to each other (output I'm going for):
person,organization(1),function(1),role(1),organization(2),function(2),role(2) (max 5)
I don't understand the purpose of storing a single row from your Functies csv file in 4 separate arrays, and then combining them together later, so my answer doesn't tell you why your approach isn't working. Instead, I suggest a different approach that I believe is cleaner.
Building an array of names from the first file is ok. For the second file, I would store each row as an array and use a hash:
data = {
"name1" => ["name1", "orgaan1", "functie1", "rol1"],
"name2" => ["name2", "orgaan2", "functie2", "rol2"],
...
}
Building it might look like
data = {}
CSV.foreach(filename_2, :col_sep => ";", :encoding => "windows-1251:utf-8", :return_headers => false) do |row|
name = row[1]
orgaan = row[16]
functie = row[17]
rol = row[18]
data[name] = [name, orgaan, functie, rol]
end
Then you would iterate over your first array and keep all the arrays that match
results = []
for name in array_pers1
results << data[name] if data.include?(name)
end
On the other hand, if you don't want to use a hash and insist on using arrays (perhaps because names are not unique), I would still store them like
data = [
["name1", "orgaan1", "functie1", "rol1"],
["name2", "orgaan2", "functie2", "rol2"]
]
And then during your search step you would just iterate like
results = []
for name in array_pers1
for row in data
results << row if row[0] == name
end
end
I have a vertical CSV file that looks like this:
name,value
case,"123Case0001"
custodian,"Doe_John"
PDate,"10/30/2013"
I can read the file like this:
CSV.foreach("#{batch_File_Dir_cdata}", :quote_char => '"', :col_sep =>',', :row_sep =>:auto, :headers => true) do |record|
ev_info = record[0]
ev_val = record[1]
end
The problem is, I need to get a specific ev_val for just one specific ev_info. I could potentially use the row number, but foresight tells me that this could change. What will be the same is the name of information. I want to find the row with the specific information name and get that value.
When I do the foreach, it gets that value and then goes past it and leaves me with an empty variable, because it went on to the other rows.
Can anyone help?
You've got a lot of choices, but the easiest is to assign to a variable based on the contents, as in:
ev_info = record[0]
ev_val = record[1] if ev_info == 'special name'
Note, though, that you need to define whatever variable you are assigning to outside of the block as it will otherwise be created as a local variable and be inaccessible to you afterwards.
Alternatively, you can read in the entire array and then select the record you're interested in with index or select.
I'd do it something like:
require 'pp'
require 'csv'
ROWS_IN_RECORD = 4
data = []
File.open('test.dat', 'r') do |fi|
loop do
record = {}
ROWS_IN_RECORD.times do
row = fi.readline.parse_csv
record[row.first] = row.last
end
data << record
break if fi.eof?
end
end
pp data
Running that outputs:
[{"name"=>"value",
"case"=>"123Case0001",
"custodian"=>"Doe_John",
"PDate"=>"10/30/2013"},
{"name"=>"value_2",
"case"=>"123Case0001 2",
"custodian"=>"Doe_John 2",
"PDate"=>"10/30/2013 2"}]
It returns an array of hashes, so each hash is the record you'd normally get from CSV if the file was a normal CSV file.
There are other ways of breaking down the input file into logical groups, but this is scalable, with a minor change, to work on huge data files. For a huge file just process each record at the end of the loop instead of pushing it onto the data variable.
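That streaming variant might look like the following, with a hypothetical process_record handler standing in for whatever per-record work is needed (StringIO holds the sample data so the sketch is self-contained):

```ruby
require 'csv'
require 'stringio'

ROWS_IN_RECORD = 4

processed = []

# Hypothetical per-record handler; in a real script this might write to a
# database or another file instead of collecting into an array
def process_record(record, sink)
  sink << record['custodian']
end

io = StringIO.new(<<~DATA)
  name,value
  case,"123Case0001"
  custodian,"Doe_John"
  PDate,"10/30/2013"
  name,value_2
  case,"123Case0001 2"
  custodian,"Doe_John 2"
  PDate,"10/30/2013 2"
DATA

loop do
  record = {}
  ROWS_IN_RECORD.times do
    row = io.readline.parse_csv
    record[row.first] = row.last
  end
  process_record(record, processed) # handle each record as it completes
  break if io.eof?
end
```

Memory use stays flat no matter how many records the file contains, since only one record is held at a time.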
I got it to work. I originally had the following:
CSV.foreach("#{batch_File_Dir_cdata}", :quote_char => '"', :col_sep =>',', :row_sep =>:auto, :headers => true) do |record|
ev_info = record[0]
c_val = record[1]
case when ev_info == "Custodian"
cust = cval
end
end
puts cust
what I needed to do was this (note c_val has to be initialized outside the block, or it won't be visible afterwards):
c_val = nil
CSV.foreach("#{batch_File_Dir_cdata}", :quote_char => '"', :col_sep =>',', :row_sep =>:auto, :headers => true) do |record|
ev_info = record[0]
case when ev_info == "Custodian"
c_val = record[1]
end
end
puts c_val
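An alternative to scanning row by row: since the file is a small name/value list, it can be loaded once into a hash and values looked up by name afterwards. A sketch with the sample data inlined:

```ruby
require 'csv'

# The vertical sample file from the question
raw = <<~DATA
  name,value
  case,"123Case0001"
  custodian,"Doe_John"
  PDate,"10/30/2013"
DATA

# Each data row is a [name, value] pair; to_h turns the pairs into a lookup
record = CSV.parse(raw, headers: true)
            .map { |row| [row['name'], row['value']] }
            .to_h
```

Any value is then record['custodian'], record['PDate'], and so on, with no block-scope issues.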
Edit: The issue is that I can't get the number of arrays within the hash, so that the count can be used as x in something like function.each_index { |x| code }.
I'm trying to use the number of rows as a way of repeating an action X times, depending on how much data is pulled from a CSV file.
Terminal issued:
=> can't convert Symbol into Integer (TypeError)
Complete error:
=> ~/home/tests/Product.rb:30:in `[]': can't convert Symbol into Integer (TypeError)
from ~/home/tests/Product.rb:30:in `getNumRel'
from test.rb:36:in `<main>'
the function is that is performing the action is:
def getNumRel
if defined? @releaseHashTable
return @releaseHashTable[:releasename].length
else
@releaseHashTable = readReleaseCSV()
return @releaseHashTable[:releasename].length
end
end
The csv data pull is just a hash of arrays, nothing snazzy.
def readReleaseCSV()
$log.info("Method "+"#{self.class.name}"+"."+"#{__method__}"+" has started")
$log.debug("reading product csv file")
# Create a Hash where the default is an empty Array
result = Array.new
csvPath = "#{File.dirname(__FILE__)}"+"/../../data/addingProdRelProjIterTestSuite/releaseCSVdata.csv"
CSV.foreach(csvPath, :headers => true, :header_converters => :symbol) do |row|
row.each do |column, value|
if "#{column}" == "prodid"
proHash = Hash.new { |h, k| h[k] = [ ] }
proHash['relid'] << row[:relid]
proHash['releasename'] << row[:releasename]
proHash['inheritcomponents'] << row[:inheritcomponents]
productId = Integer(value)
if result[productId] == nil
result[productId] = Array.new
end
result[productId][result[productId].length] = proHash
end
end
end
$log.info("Method "+"#{self.class.name}"+"."+"#{__method__}"+" has finished")
@productReleaseArr = result
end
Sorry, couldn't resist, cleaned up your method.
# empty brackets unnecessary, no uppercase in method names
def read_release_csv
# you don't need + here
$log.info("Method #{self.class.name}.#{__method__} has started")
$log.debug("reading product csv file")
# you're returning this array. It is not a hash. [] is preferred over Array.new
result = []
csvPath = "#{File.dirname(__FILE__)}/../../data/addingProdRelProjIterTestSuite/releaseCSVdata.csv"
CSV.foreach(csvPath, :headers => true, :header_converters => :symbol) do |row|
row.each do |column, value|
# to_s is preferred
if column.to_s == "prodid"
proHash = Hash.new { |h, k| h[k] = [ ] }
proHash['relid'] << row[:relid]
proHash['releasename'] << row[:releasename]
proHash['inheritcomponents'] << row[:inheritcomponents]
# to_i is preferred
productId = value.to_i
# this notation is preferred
result[productId] ||= []
# this is identical to what you did and more readable
result[productId] << proHash
end
end
end
$log.info("Method #{self.class.name}.#{__method__} has finished")
@productReleaseArr = result
end
You haven't given much to go on, but it appears that @releaseHashTable contains an Array, not a Hash.
Update: Based on the implementation you posted, you can see that productId is an integer and that the return value of readReleaseCSV() is an array.
In order to get the releasename you want, you have to do this:
@releaseHashTable[productId][n][:releasename]
where productId and n are integers. Either you'll have to specify them specifically, or (if you don't know n) you'll have to introduce a loop to collect all the releasenames for all the products of a particular productId.
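Given that structure (an array indexed by productId, where each element is an array of per-release hashes), such a loop could look like the sketch below. The sample data is made up; note also that the question's readReleaseCSV builds the inner hashes with string keys like 'releasename', and that each value is itself a one-element array, hence the flatten:

```ruby
# Reconstructed shape of readReleaseCSV()'s return value:
# index = productId, value = array of per-release hashes
release_table = []
release_table[7] = [
  { 'relid' => ['1'], 'releasename' => ['Alpha'], 'inheritcomponents' => ['true'] },
  { 'relid' => ['2'], 'releasename' => ['Beta'],  'inheritcomponents' => ['false'] },
]

product_id = 7

# Collect every release name for one product
names = release_table[product_id].map { |rel| rel['releasename'] }.flatten

# The count the question's getNumRel was after is then just the length
release_count = release_table[product_id].length
```

Indexing the outer structure with a Symbol is what raised the TypeError; integer indices into the array, then string keys into the hashes, is the access pattern the data actually supports.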
This is what Mark Thomas meant:
> a = [1,2,3] # => [1, 2, 3]
> a[:sym]
TypeError: can't convert Symbol into Integer
# here starts the backtrace
from (irb):2:in `[]'
from (irb):2
An Array is only accessible by an integer index, like so: a[1] fetches the second element from the array.
You return an array, and that's why your code fails:
#....
result = Array.new
#....
@productReleaseArr = result
# and then later on you call
@releaseHashTable = readReleaseCSV()
@releaseHashTable[:releasename] # which gives you TypeError: can't convert Symbol into Integer