Searching for a word within a list of lists - sublist

I'm trying to write a function that searches a list of lists for a word or fragment given by the user and returns every entry that contains it.
Here's what I have so far:
def cat(dogs):
    """
    searches for cat names in dogs
    """
    search = raw_input("search for: ")
    word = search[0].upper() + search[1:]
    for i in range(len(dogs)):
        if word in dogs[i]:
            print "yes"
        else:
            print "sorry, nothing found"
    return
How do I fix this?
Thanks so much!!

def company(stockdata):
    '''
    searches for companies with the word input and gives back full
    company names and ticker tags
    '''
    searchString = raw_input("search for: ")
    # str() guards against non-string fields such as prices
    flatList = [str(item).upper() for sublist in stockdata for item in sublist]
    if any(searchString.upper() in s for s in flatList):
        print "Yes"
    else:
        print "sorry, nothing found"

I would suggest converting both your search string and the strings in stockdata to uppercase during the search, so that it matches both Computer and computer.
You should also print "sorry, nothing found" only when no result was found. I've added a results variable to collect the search results.
def company(stockdata):
    """
    searches for companies with the word inputted and gives back full
    company names and ticker tags
    """
    found = False
    results = []
    search = raw_input("search for: ")
    word = search.upper()
    for i in range(len(stockdata)):
        # I've used stockdata[i][0] because the string is in a list of lists
        if word in stockdata[i][0].upper() or word in stockdata[i][1].upper():
            found = True
            results.append(stockdata[i][0])
    if found:
        print 'yes'
        print 'results : ',results
    else:
        print "sorry, nothing found"
stock = [['AAME', 'Atlantic American Corporation', '2013-11-04', 4.04, 4.05, 4.01, 4.05, 5400.0, 4.05], ['AAON', 'AAON Inc.', '2013-11-04', 27.28, 27.48, 27.08, 27.32, 96300.0, 27.32], ['AAPL', 'Apple Inc.', '2013-11-04', 521.1, 526.82, 518.81, 526.75, 8716100.0, 526.75], ['AAWW', 'Atlas Air Worldwide Holdings', '2013-11-04', 38.65, 39.48, 38.65, 38.93, 490500.0, 38.93], ['AAXJ', 'iShares MSCI All Country Asia ex Japan Index Fund', '2013-11-04', 60.55, 60.55, 60.3, 60.48, 260300.0, 60.48], ['ABAX', 'ABAXIS Inc.', '2013-11-04', 36.01, 36.91, 35.89, 36.2, 208300.0, 36.2]]
company(stock)
For the search term abaxis, this produces:
yes
results : ['ABAX']
Note: if possible, please provide a sample of the stock data list that is passed to the function, so I can make sure this works.
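A variant of the same search that returns the matching rows instead of printing them can be sketched as follows. It uses Python 3 syntax, takes the search term as a parameter rather than reading it with raw_input, and only checks string fields so that numeric columns are skipped. The name find_rows and the shortened sample rows are assumptions made for illustration.

```python
def find_rows(rows, term):
    """Return every row with at least one string field containing term (case-insensitive)."""
    needle = term.upper()
    matches = []
    for row in rows:
        # Skip non-string fields such as prices and volumes.
        if any(isinstance(field, str) and needle in field.upper() for field in row):
            matches.append(row)
    return matches


# Hypothetical sample in the same shape as the stock list above.
sample = [
    ['AAPL', 'Apple Inc.', '2013-11-04', 521.1],
    ['ABAX', 'ABAXIS Inc.', '2013-11-04', 36.01],
]

for row in find_rows(sample, 'abaxis'):
    print(row[0], row[1])
```

Returning the rows keeps the lookup separate from the output, so the caller decides whether to print a ticker, a full name, or a "nothing found" message when the list is empty.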

If you're searching a list of lists, you need another for loop unless I'm misunderstanding your question. K DawG has given the best answer so far. Unfortunately I can't upvote it.
def company(stockdata):
    search = raw_input("search for: ")
    word = search.upper()
    for i in range(len(stockdata)):
        for j in range(len(stockdata[i])):
            if word in stockdata[i][j].upper():
                print stockdata[i][j]
            else:
                print "sorry, nothing found"
    return
data = [["computer", "cheese"], ["apple"], ["mac Computers"]]
company(data)
For the search term computer, this prints:
computer
sorry, nothing found
sorry, nothing found
mac Computers

Related

Check 4 chars with regex Ruby console input

I have 4 characters: the first is a letter ('L', for example), the next two are digits, and the last is a letter again, all separated by single spaces. The user enters them in the Ruby console. I need to check that they are separated by exactly one space, that there are no other stray characters, and that nothing follows the last letter.
So if a user enters, for example, 'L 5 7 A' (read with gets.chomp), I need to check that everything is valid and separated by only one space, and return input[1], input[2], input[3]. How can I do that? Thanks.
You can do something like this:
puts "Enter string"
input = gets.chomp
r = /^(L)\s(\d)\s(\d)\s([A-Z])$/
matches = input.match r
puts matches ? "inputs: #{$1}, #{$2}, #{$3}, #{$4}" : "input-format incorrect"
Here $1 is the first capture, similarly for $2, $3 etc. If you want to store the result in an array you can use:
matches = input.match(r).to_a
then the first element is the entire match, followed by each capture.
Try
/^\w\s(\d)\s(\d)\s(\w)$/
Rubular is a good sandbox site for experimenting with and debugging regexes.

Searching a list and displaying whether or not the list contains the search variable

I have a Twitter manager program where one option is searching for tweets containing whatever input the user provides. I am unable to figure out a way to have the for loop skip the else statement when there is a match found in the list. As it works now, the program will print the tweet it finds that matches the search criteria, but also prints that no tweet is found containing the search criteria.
# Option 3, search tweet_list for specific content and display if found
# in descending order starting with the most recent.
elif count == 3:
    tweet_list.reverse()
    if len(tweet_list) == 0:
        print('There are no tweets to search.', '\n')
    else:
        search = input('What would you like to search for? ')
        print('\nSearch Results')
        print('--------------')
        for n in tweet_list:
            a = n.get_author()
            b = n.get_text()
            c = n.get_age()
            if search in a or search in b:
                print(a + '-' + c + '\n' + b + '\n')
        else:
            print('No tweets contain "' + search + '".' + '\n')
        print('')
Create a variable, say found, and initialize it to False. When you find a tweet, set it to True. Then replace the else: with if not found:.
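The flag approach described above can be sketched like this. It is a minimal illustration, not the asker's program: plain (author, text) tuples stand in for the tweet objects, and the function name search_tweets is hypothetical.

```python
def search_tweets(tweets, search):
    """Print matching tweets; report once if none matched. Returns the matches."""
    found = False
    matches = []
    for author, text in tweets:
        if search in author or search in text:
            found = True  # remember that at least one tweet matched
            matches.append((author, text))
            print(author + '\n' + text + '\n')
    # Runs after the loop, and only when no tweet matched.
    if not found:
        print('No tweets contain "' + search + '".\n')
    return matches


# Hypothetical data for illustration.
tweets = [('alice', 'coffee time'), ('bob', 'rainy day')]
search_tweets(tweets, 'coffee')
search_tweets(tweets, 'pizza')
```

Because the "not found" check sits after the loop rather than in an else branch, the message is printed at most once and never alongside a match.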

RegEx to remove new line characters and replace with comma

I scraped a website using Nokogiri and after using xpath I was left with the following string (which is a few td's pushed into one string).
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t"
My goal is to make this into an array that looks like the following (it will be a nested array):
["Total First Downs", "359", "274"]
The issue is writing a regex that removes the escaped characters and substitutes a single "," for each run, but does not add a "," after the last set of integers. If the comma after the last set of integers is unavoidable, I could use #compact to get rid of the nil that occurs in the array. If you need to see how I scraped the website, here is the code (please note I saved the webpage for testing so that my IP address doesn't get burned during the trial phase):
require 'nokogiri'
f = File.open('page')
doc = Nokogiri::HTML(f)
f.close
number = doc.xpath('//tr[@class="tbdy1"]').count
stats = Array.new(number) { Array.new }
i = 0
doc.xpath('//tr[@class="tbdy1"]').each do |tr|
  stats[i] << tr.text
  i += 1
end
Thanks for your help
I don't fully understand your problem, but the result can be easily achieved with this:
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t"
.split(/[\n\t]+/)
# => ["Total First Downs", "359", "274"]
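Try with gsub, passing a regex literal rather than a string, and strip first so no comma is added at the end:
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t".strip.gsub(/[\n\t]+/, ",")
# => "Total First Downs,359,274"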

Programming concept

I want to make a program that sorts mail from junk mail using a point system.
Certain words are categorized in my program as "junk words", and I have assigned each of them a different number of points. For each junk word that appears in the mail, the program should award the points that word is worth.
My pseudocode:
Read text from file
Look for "junk words"
For each junk word that comes up, add the points the word is worth.
If the total points for the junk words reaches 10, print "SPAM" followed by a list of the words in the file that were categorized as junk words, and their points.
Example (a textfile):
Hello!
Do you have trouble sleeping?
Do you need to rest?
Then dont hesitate call us for the absolute solution- without charge!
So when the program runs and analyzes the text above, the output should look like:
SPAM 14p
trouble 6p
charge 3p
solution 5p
So what I was planning to write was something like this:
class junk(object):
    fil = open("filnamne.txt","r")
    junkwords = {"trouble":"6p","solution":"3p","virus":"4p"}
    words = junkwords
    if words in fil:
        print("SPAM")
    else:
        print("The file doesn't contain any junk")
So my problem now is: how do I give points for each word in my list that comes up in the file?
And how do I sum the total points, so that if total_points is > 10 the program prints "SPAM",
followed by the list of the junk words found in the file and the total points for each word?
Here is a quick script that might get you close to there:
MAXPOINTS = 10
JUNKWORDS = {"trouble": 6, "solution": 5, "charge": 3, "virus": 7}
fil = open("filnamne.txt", "r")
foundwords = {}
points = 0
for word in fil.read().split():
    if word in JUNKWORDS:
        if word not in foundwords:
            foundwords[word] = 0
        points += JUNKWORDS[word]
        foundwords[word] += 1
fil.close()
if points > MAXPOINTS:
    print "SPAM"
    for word in foundwords:
        # occurrences of the word times its point value
        print word, foundwords[word]*JUNKWORDS[word]
else:
    print "The file doesn't contain any junk"
You may want to use .lower() on the words and make all your dictionary keys lowercase. Maybe also remove all non-alphanumeric characters.
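The normalization suggested above (lowercasing and removing punctuation around each word) might look like the sketch below. It uses Python 3 syntax and scores a string instead of a file so it can be run as is. The function name score_text is an assumption, and the point values are the ones from this answer.

```python
import string

JUNKWORDS = {"trouble": 6, "solution": 5, "charge": 3, "virus": 7}


def score_text(text):
    """Return (total_points, {word: points}) for the junk words found in text."""
    found = {}
    for raw in text.split():
        # Lowercase and strip surrounding punctuation, so "charge!" becomes "charge".
        word = raw.lower().strip(string.punctuation)
        if word in JUNKWORDS:
            found[word] = found.get(word, 0) + JUNKWORDS[word]
    return sum(found.values()), found


sample = ("Hello! Do you have trouble sleeping? Do you need to rest? "
          "Then dont hesitate call us for the absolute solution- without charge!")
total, found = score_text(sample)
print(total, found)
```

Without the strip step, "solution-" and "charge!" from the example mail would not match the dictionary keys and only "trouble" would be counted.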
Here's another approach:
from collections import Counter
word_points = {'trouble': 6, 'solution': 5, 'charge': 3, 'virus': 7}
words = []
with open('ham.txt') as f:
    for line in f:
        if line.strip(): # weed out empty lines
            for word in line.split():
                words.append(word)
count_of_words = Counter(words)
total_points = {}
for word in word_points:
    if word in count_of_words:
        total_points[word] = word_points[word] * count_of_words[word]
total = sum(total_points.itervalues())  # sum the points, not the words
if total > 10:
    print 'SPAM {}'.format(total)
    for i in total_points.iteritems():
        print 'Word: {} Points: {}'.format(*i)
There are some optimizations you can do, but it should give you an idea of the general logic. Counter is available from Python 2.7 and above.
I have assumed that each word is worth a different number of points, so I have used a dictionary.
You need to find the number of times each junk word occurs in the file.
You should store the points for each word as an integer, not as '6p' or '4p'.
So, try this:
def find_junk(filename):
    word_points = {"trouble":6,"solution":3,"charge":2,"virus":4}
    word_count = {word:0 for word in word_points}
    count = 0
    found = []
    with open(filename) as f:
        for line in f:
            line = line.lower()
            for word in word_points:
                c = line.count(word)
                if c > 0:
                    count += c * word_points[word]
                    if word not in found:  # list each junk word only once
                        found.append(word)
                    word_count[word] += c
    if count >= 10:
        print ' SPAM'*4
        for word in found:
            print '%10s%3s%3s' % (word, word_points[word], word_count[word])
    else:
        print "Not spam"
find_junk('spam.txt')

How to do this in regex in ruby?

The string has one of the following formats (the language is Ruby):
#word > subcategory
#word word > sub / category
#word > sub category
#word word > subcategory
I just want to match the "word" or "word word" (two words with a space).
So far I have this, but it's not matching the space:
scan(/#([^ ]*)/)[0]
Also, the second one appears to be working, but certain phrases aren't matching even though they're identical. I have no idea why. Is there something wrong with the following? (This is to match "subcategory" or "sub category".)
scan(/.* > (.*)$/)[0]
The first portion is letters only; the second portion can contain any number of spaces, words, and characters like / or _.
Try this:
^#([^>]*)
[^>]* will match anything until the first > (or the end of the text).
^ is not really needed, but it may protect you from mistakes (for example, if the category contains another hash sign)
Working example: http://rubular.com/r/LO6T9AV3rp
Note that you can match both the word and the category on the same match, for example, using the pattern:
^#([^>]*)>(.*)$
You can capture both groups, and use them:
s = "#word word > sub / category"
m = s.match(/^#([^>]*)>(.*)$/)
puts m[1]
puts m[2]
Working example: http://ideone.com/SPlvm
I don't quite understand your question.
Do you want to retrieve XXX and YYY from a string of the form "#XXX > YYY"?
In that case, the following regular expression will help:
scan(/#([^>]*?) *> *(.*)$/)
For example:
> "#world world > sub / category".scan(/#([^>]*?) *> *(.*)$/)
=> [["world world", "sub / category"]]

Resources