Searching a list and displaying whether or not the list contains the search variable - for-loop

I have a Twitter manager program where one option is searching for tweets containing whatever input the user provides. I am unable to figure out a way to have the for loop skip the else statement when there is a match found in the list. As it works now, the program will print the tweet it finds that matches the search criteria, but also prints that no tweet is found containing the search criteria.
# Option 3, search tweet_list for specific content and display if found
# in descending order starting with the most recent.
elif count == 3:
    tweet_list.reverse()
    if len(tweet_list) == 0:
        print('There are no tweets to search.', '\n')
    else:
        search = input('What would you like to search for? ')
        print('\nSearch Results')
        print('--------------')
        for n in tweet_list:
            a = n.get_author()
            b = n.get_text()
            c = n.get_age()
            if search in a or search in b:
                print(a + '-' + c + '\n' + b + '\n')
        else:
            print('No tweets contain "' + search + '".' + '\n')
        print('')

Create a variable, say found, and initialize it to False before the loop. When you find a matching tweet, set it to True. Then replace the else: with if not found:, so the message prints only when nothing matched.
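A minimal sketch of the flag approach described above. The Tweet class here is a hypothetical stand-in for the asker's own class (only its three getters are taken from the question), and search_tweets is an illustrative helper name, not part of the original program.

```python
class Tweet:
    """Hypothetical stand-in for the asker's tweet class."""

    def __init__(self, author, text, age):
        self._author, self._text, self._age = author, text, age

    def get_author(self):
        return self._author

    def get_text(self):
        return self._text

    def get_age(self):
        return self._age


def search_tweets(tweet_list, search):
    """Return formatted lines for matching tweets, or one 'not found' line."""
    lines = []
    found = False  # flips to True on the first match
    for tweet in tweet_list:
        author, text = tweet.get_author(), tweet.get_text()
        if search in author or search in text:
            found = True
            lines.append(author + '-' + tweet.get_age() + '\n' + text)
    if not found:  # checked once, after the loop, only when nothing matched
        lines.append('No tweets contain "' + search + '".')
    return lines


tweets = [Tweet('ann', 'hello world', '1h'), Tweet('bob', 'lunch time', '2h')]
for line in search_tweets(tweets, 'lunch'):
    print(line)
```

Returning the lines instead of printing inside the loop is only a convenience for testing; the same flag works with print calls in place.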

Related

Sort an array depending on given set of characters within each element

I have an array made of a number of elements read from a .txt file. Each element is long and has a lot of information, for instance:
20201102066000000000000000000000000020052IC04008409Z8000000000030546676591AFIP
All the lines are already part of the array lines_array, but I need to sort them by the content from the 36th character through the 51st, which in the example provided above would be:
20052IC04008409Z
I was already able to catch the patterns from each of the elements in the array:
lines_array = File.readlines(complete_filename)
pattern = nil
lines_array.each do |line|
  pattern = line[36] + line[37] + line[38] + line[39] + line[40] + line[41] + line[42] + line[43] +
            line[44] + line[45] + line[46] + line[47] + line[48] + line[49] + line[50] + line[51]
end
What I need to do now is sort all the elements of the array (the long lines) alphabetically, based on the content of the variable pattern. I tried the sort and sort_by methods, but I wasn't able to pass my variable pattern as a parameter. For example, a correct order of three given elements would be:
20201102066000000000000000000000000020001IC04180127X8000000000030546676591AFIP
20201104066000000000000000000000000020001IC04182757T8000000000030546676591AFIP
20201102066000000000000000000000000020001IC05020641D8000000000030546676591AFIP
Any help?
First, there are easier and cleaner ways to extract the substring you're after. You could use String#[] or String#slice:
# These do the same thing, use whichever reads better to you.
pattern = line[36, 16]
pattern = line.slice(36, 16)
pattern = line[36..51]
pattern = line.slice(36..51)
Then you can use Enumerable#sort_by with a block that returns that slice as the sort key:
sorted = lines_array.sort_by { |str| str[36, 16] }
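For comparison, here is the same sort-by-substring idea sketched in Python rather than Ruby. The records are hypothetical, built so that a 16-character key sits at index 36, and sort_by_slice is an illustrative name.

```python
def sort_by_slice(lines, start=36, length=16):
    """Sort fixed-width records by the substring starting at `start`."""
    return sorted(lines, key=lambda line: line[start:start + length])


# Hypothetical records: 36 filler characters, a 16-character key, then a tail.
prefix = "0" * 36
keys = ("20001IC05020641D", "20001IC04180127X", "20001IC04182757T")
records = [prefix + key + "TAIL" for key in keys]

ordered = sort_by_slice(records)
print([record[36:52] for record in ordered])
# prints ['20001IC04180127X', '20001IC04182757T', '20001IC05020641D']
```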

How to find text across HTML tag boundaries?

I have HTML like this:
<div>Lorem ipsum <b>dolor sit</b> amet.</div>
How can I find a plain text based match for my search string ipsum dolor in this HTML? I need the start and end XPath node pointers for the match, plus character indexes to point inside these start and stop nodes. I use Nokogiri to work with the DOM, but any solution for Ruby is fine.
Difficulty:
I can't node.traverse {|node| … } through the DOM and do a plain text search whenever a text node comes across, because my search string can cross tag boundaries.
I can't do a plain text search after converting the HTML to plain text, because I need the XPath indexes as result.
I could implement it myself with basic tree traversal, but before I do I'm asking if there is a Nokogiri function or trick to do it more comfortably.
You could do something like:
doc.search('div').find{|div| div.text[/ipsum dolor/]}
In the end, we used the following code. It is shown for the example given in the question, but it also works in the general case of arbitrary-depth HTML tag nesting, which is what we need.
In addition, we implemented it so that it can ignore runs of excess (≥2) whitespace characters. This is why we have to search for the end of the match and can't just add the length of the search string (the quote) to the start position of the match: the number of whitespace characters in the search string and in the matched text might differ.
doc = Nokogiri::HTML.fragment("<div>Lorem ipsum <b>dolor sit</b> amet.</div>")
quote = 'ipsum dolor'
# (1) Find search string in document text, "plain text in plain text".
quote_query =
quote.split(/[[:space:]]+/).map { |w| Regexp.quote(w) }.join('[[:space:]]+')
start_index = doc.text.index(/#{quote_query}/i)
end_index = start_index+doc.text[/#{quote_query}/i].size
# (2) Find XPath values and character indexes for our search match.
#
# To do this, walk through all text nodes and count characters until
# encountering both the start_index and end_index character counts
# of our search match.
start_xpath, start_offset, end_xpath, end_offset = nil
i = 0
doc.xpath('.//text() | text()').each do |x|
  offset = 0
  x.text.split('').each do
    if i == start_index
      e = x.previous
      sum = 0
      while e
        sum += e.text.size
        e = e.previous
      end
      start_xpath = x.path.gsub(/^\?/, '').gsub(
        /#{Regexp.quote('/text()')}.*$/, ''
      )
      start_offset = offset + sum
    elsif i + 1 == end_index
      e = x.previous
      sum = 0
      while e
        sum += e.text.size
        e = e.previous
      end
      end_xpath = x.path.gsub(/^\?/, '').gsub(
        /#{Regexp.quote('/text()')}.*$/, ''
      )
      end_offset = offset + 1 + sum
    end
    offset += 1
    i += 1
  end
end
At this point, we can retrieve the desired XPath values for the start and stop of the search match (and in addition, character offsets pointing to the exact character inside the XPath designated element for the start and stop of the search match). We get:
puts start_xpath
/div
puts start_offset
6
puts end_xpath
/div/b
puts end_offset
5
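The character-counting idea can also be sketched independently of Nokogiri. The Python sketch below assumes the text nodes have already been collected as (path, text) pairs in document order; the paths shown are hypothetical, and locate is an illustrative name. Unlike the Ruby code above, the offsets here are relative to the text node itself, not to its parent element, and the end offset is exclusive.

```python
import re
from bisect import bisect_right
from itertools import accumulate


def locate(text_nodes, quote):
    """Map a whitespace-tolerant match of `quote` to (path, offset) pairs.

    Returns ((start_path, start_offset), (end_path, end_offset)) or None.
    """
    words = quote.split()
    if not words:
        return None
    full_text = "".join(text for _, text in text_nodes)
    # Allow any run of whitespace between the words of the quote.
    pattern = r"\s+".join(re.escape(word) for word in words)
    match = re.search(pattern, full_text, re.IGNORECASE)
    if match is None:
        return None

    # starts[i] is the index in full_text where text node i begins.
    lengths = [len(text) for _, text in text_nodes]
    starts = [0] + list(accumulate(lengths))[:-1]

    def node_at(index):
        i = bisect_right(starts, index) - 1
        return text_nodes[i][0], index - starts[i]

    start = node_at(match.start())
    end_path, last_char = node_at(match.end() - 1)  # last matched character
    return start, (end_path, last_char + 1)


# Hypothetical text nodes for <div>Lorem ipsum <b>dolor sit</b> amet.</div>
nodes = [
    ("/div/text()[1]", "Lorem ipsum "),
    ("/div/b/text()", "dolor sit"),
    ("/div/text()[2]", " amet."),
]
print(locate(nodes, "ipsum dolor"))
# prints (('/div/text()[1]', 6), ('/div/b/text()', 5))
```

Precomputing the start index of each node and using a binary search avoids walking every character, at the cost of building one extra list.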

Searching for a word within a list of lists

I'm trying to write a function that will search a list of lists to find a word or fragment given by the user and return all units including that word.
Here's what I have so far:
def cat(dogs):
    """
    searches for cat names in dogs
    """
    search = raw_input("search for: ")
    word = search[0].upper() + search[1:]
    for i in range(len(dogs)):
        if word in dogs[i]:
            print "yes"
        else:
            print "sorry, nothing found"
    return
How do I fix this?
Thanks so much!!
'''
searches for companies with the word input and gives back full
company names and ticker tags
'''
def company(stockdata):
    searchString = raw_input("search for: ")
    flatList = [item.upper() for sublist in stockdata for item in sublist]
    if any(searchString.upper() in s for s in flatList):
        print "Yes"
    else:
        print "sorry, nothing found"
I would suggest converting both your search string and the strings in stockdata to uppercase during the search, so that it matches both Computer and computer.
You should also print sorry, nothing found only when no result was found. I've added a results variable to show the search results.
def company(stockdata):
    """
    searches for companies with the word inputted and gives back full
    company names and ticker tags
    """
    found = False
    results = []
    search = raw_input("search for: ")
    word = search.upper()
    for i in range(len(stockdata)):
        if word in stockdata[i][0].upper() or word in stockdata[i][1].upper(): # I've used stockdata[i][0] because the string is in a list of list
            found = True
            results.append(stockdata[i][0])
    if found:
        print 'yes'
        print 'results : ',results
    else:
        print "sorry, nothing found"
stock = [['AAME', 'Atlantic American Corporation', '2013-11-04', 4.04, 4.05, 4.01, 4.05, 5400.0, 4.05], ['AAON', 'AAON Inc.', '2013-11-04', 27.28, 27.48, 27.08, 27.32, 96300.0, 27.32], ['AAPL', 'Apple Inc.', '2013-11-04', 521.1, 526.82, 518.81, 526.75, 8716100.0, 526.75], ['AAWW', 'Atlas Air Worldwide Holdings', '2013-11-04', 38.65, 39.48, 38.65, 38.93, 490500.0, 38.93], ['AAXJ', 'iShares MSCI All Country Asia ex Japan Index Fund', '2013-11-04', 60.55, 60.55, 60.3, 60.48, 260300.0, 60.48], ['ABAX', 'ABAXIS Inc.', '2013-11-04', 36.01, 36.91, 35.89, 36.2, 208300.0, 36.2]]
company(stock)
For the search term abaxis, this produces:
yes
results : ['ABAX']
Note: if possible, please provide a sample of the stock data list that is passed to the function, to make sure this works.
If you're searching a list of lists, you need another for loop unless I'm misunderstanding your question. K DawG has given the best answer so far. Unfortunately I can't upvote it.
def company(stockdata):
    search = raw_input("search for: ")
    word = search.upper()
    for i in range(len(stockdata)):
        for j in range(len(stockdata[i])):
            if word in stockdata[i][j].upper():
                print stockdata[i][j]
            else:
                print "sorry, nothing found"
    return
data = [["computer", "cheese"], ["apple"], ["mac Computers"]]
company(data)
returns:
computer
sorry, nothing found
sorry, nothing found
mac Computers
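The nested loop above prints the "nothing found" message once per non-matching item. A minimal Python 3 sketch that collects the matches first and reports once might look like this. find_matches is an illustrative name, and skipping non-string items is an assumption made so that numeric fields, as in the stock rows shown earlier, do not raise an error.

```python
def find_matches(rows, term):
    """Return every string in a list of lists that contains `term`, ignoring case."""
    needle = term.upper()
    return [
        item
        for row in rows
        for item in row
        # Assumption: non-string fields (prices, volumes) are simply skipped.
        if isinstance(item, str) and needle in item.upper()
    ]


data = [["computer", "cheese"], ["apple"], ["mac Computers"]]
matches = find_matches(data, "computer")
if matches:
    for item in matches:
        print(item)
else:
    print("sorry, nothing found")
```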

Programming concept

I want to make a program that separates junk mail from regular mail using a point system.
For certain words in the mail, I want the program to award points: each word that I have categorized as a "junk word" is assigned its own point value.
My pseudocode:
Read text from file
Look for "junk words"
for each word that comes up give the point the word is worth.
If the total points for all junk words exceed 10, print "SPAM" followed by a list of the words that were in the file and categorized as junk words, with their points.
Example (a textfile):
Hello!
Do you have trouble sleeping?
Do you need to rest?
Then dont hesitate call us for the absolute solution- without charge!
So when the program runs and analyzes the text above, the output should look like:
SPAM 14p
trouble 6p
charge 3p
solution 5p
So what I was planning to write was something like this:
class junk(object):
    fil = open("filnamne.txt","r")
    junkwords = {"trouble":"6p","solution":"3p","virus":"4p"}
    words = junkwords
    if words in fil:
        print("SPAM")
    else:
        print("The file doesn't contain any junk")
So my problem now is: how do I give points for each word in my list that comes up in the file?
And how do I sum the total points, so that if total_points is > 10 the program prints "SPAM",
followed by the list of the junk words that are found in the file and the total points of each word?
Here is a quick script that might get you close to there:
MAXPOINTS = 10
JUNKWORDS = {"trouble": 6, "solution": 5, "charge": 3, "virus": 7}
fil = open("filnamne.txt", "r")
foundwords = {}
points = 0
for word in fil.read().split():
    if word in JUNKWORDS:
        if word not in foundwords:
            foundwords[word] = 0
        points += JUNKWORDS[word]
        foundwords[word] += 1
fil.close()
if points > MAXPOINTS:
    print "SPAM"
    for word in foundwords:
        print word, foundwords[word]*JUNKWORDS[word]
else:
    print "The file doesn't contain any junk"
You may want to use .lower() on the words and make all your dictionary keys lowercase. Maybe also remove all non-alphanumeric characters.
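A minimal Python 3 sketch of that normalization step, assuming the same point values as the script above. score_text is an illustrative name, and the sample sentence is adapted from the question's example text.

```python
import re

JUNKWORDS = {"trouble": 6, "solution": 5, "charge": 3, "virus": 7}


def score_text(text, junkwords=JUNKWORDS):
    """Return (total_points, {word: points}) for the junk words found in `text`."""
    # Lowercase first, then keep only runs of letters and digits as words,
    # so "TROUBLE" and "solution-" both match their dictionary keys.
    words = re.findall(r"[a-z0-9]+", text.lower())
    per_word = {}
    for word in words:
        if word in junkwords:
            per_word[word] = per_word.get(word, 0) + junkwords[word]
    return sum(per_word.values()), per_word


sample = "Do you have TROUBLE sleeping? Call us for the absolute solution- without charge!"
total, found = score_text(sample)
print(total, found)
# prints 14 {'trouble': 6, 'solution': 5, 'charge': 3}
```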
Here's another approach:
from collections import Counter
word_points = {'trouble': 6, 'solution': 5, 'charge': 3, 'virus': 7}
words = []
with open('ham.txt') as f:
    for line in f:
        if line.strip():  # weed out empty lines
            for word in line.split():
                words.append(word)
count_of_words = Counter(words)
total_points = {}
for word in word_points:
    if word in count_of_words:
        total_points[word] = word_points[word] * count_of_words[word]
# Sum the point values (not the keys) to get the overall score.
total = sum(total_points.values())
if total > 10:
    print 'SPAM {}'.format(total)
    for i in total_points.iteritems():
        print 'Word: {} Points: {}'.format(*i)
There are some optimizations you can do, but it should give you an idea of the general logic. Counter is available from Python 2.7 and above.
I have assumed that each word has different points, so I have used a dictionary.
You need to find the number of times each word in words appears in the file.
You should store the points for each word as an integer, not as '6p' or '4p'.
So, try this:
def find_junk(filename):
    word_points = {"trouble":6,"solution":3,"charge":2,"virus":4}
    word_count = {word:0 for word in word_points}
    count = 0
    found = []
    with open(filename) as f:
        for line in f:
            line = line.lower()
            for word in word_points:
                c = line.count(word)
                if c > 0:
                    count += c * word_points[word]
                    if word not in found:  # list each junk word only once
                        found.append(word)
                    word_count[word] += c
    if count >= 10:
        print ' SPAM'*4
        for word in found:
            print '%10s%3s%3s' % (word, word_points[word], word_count[word])
    else:
        print "Not spam"
find_junk('spam.txt')

Check for dates consistency in MATLAB

Is there any straightforward way to do that? I want to give an array of dates as input (for example 1997-01-02 1997-01-03... using the format yyyy-mm-dd) and get 1 if all the elements of the given array are consistent and 0 otherwise.
Any idea?
Here is one idea:
d = {
    '1997-01-02'
    '1997-01-03'
    '1111-99-99'
    'not a date'
}
isDateValid = false(size(d));
for i=1:numel(d)
    try
        str = datestr(datenum(d{i},'yyyy-mm-dd'),'yyyy-mm-dd');
        isDateValid(i) = isequal(str,d{i});
    catch ME
    end
end
The result:
>> isDateValid
isDateValid =
1
1
0
0
The reason I do the conversion back and forth is that MATLAB carries values outside the normal range of a field over to the next one: the third example will actually be parsed as 1119-06-07, while the last one will throw an exception.
Many ways to do this using regexp. A couple of simple ones:
str = '1917-01-23';
regexp(str,'\d\d\d\d-\d\d-\d\d')
ans =
1
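regexp returns the starting index of the match, so a string that begins with this pattern gives 1, and a string with no match gives an empty result. Anchor the pattern with ^ and $ if the whole string must match.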
Or do this:
regexp(str,'-','split')
ans =
'1917' '01' '23'
Now you can verify the first piece is a valid year, the second a valid month, etc.
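The split-and-verify idea can be sketched as follows. This is written in Python rather than MATLAB, purely as an illustration of the logic; is_valid_date and all_dates_valid are illustrative names. It checks the yyyy-mm-dd shape with an anchored pattern, then lets the date constructor reject impossible month or day values.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens.
PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_valid_date(text):
    """True if `text` is a real calendar date written as yyyy-mm-dd."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return False
    year, month, day = map(int, match.groups())
    try:
        date(year, month, day)  # raises ValueError for e.g. month 99
    except ValueError:
        return False
    return True


def all_dates_valid(dates):
    """True only if every element is a valid yyyy-mm-dd date."""
    return all(is_valid_date(d) for d in dates)


print(all_dates_valid(["1997-01-02", "1997-01-03"]))
print(all_dates_valid(["1997-01-02", "1111-99-99", "not a date"]))
```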

Resources