Can i do this code python with snowball? - stemming

The word length is 5. I want to delete the letter in position 0 and the letter in position 3
with python seems like this :
word = word[1:3] + word[4] #this is with python
The question is, How i can do it with snowball ?

Maybe the solution seems like this:
do(
[substring] among (
$p1 ($word_len == 5 delete)
)
markto $p2
[substring] among (
'cha' (atmark==$p2 delete)
)
)

Related

Get line number where first occurrence of a value appears?

I have a CSV file like below:
E Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Mean
1 0.7019 0.6734 0.6599 0.6511 0.701 0.6977 0.680833333
2 0.6421 0.6478 0.6095 0.608 0.6525 0.6285 0.6314
3 0.6039 0.6096 0.563 0.5539 0.6218 0.5716 0.5873
4 0.5564 0.5545 0.5138 0.4962 0.5781 0.5154 0.535733333
5 0.5056 0.4972 0.4704 0.4488 0.5245 0.4694 0.485983333
I'm trying to use find the row number where the final column has a value below a certain range. For example, below 0.6.
Using the above CSV file, I want to return 3 because E = 3 is the first row where Mean <= 0.60. If there is no value below 0.6 I want to return 0. I am in effect returning the value in the first column based on the final column.
I plan to initialize this number as a constant in gnuplot. How can this be done? I've tagged awk because I think it's related.
In case you want a gnuplot-only version... if you use a file remove the datablock and replace $Data by your filename in " ".
Edit: You can do it without a dummy table, it can be done shorter with stats (check help stats). Even shorter than the accepted solution (well, we are not at code golf here), but additionally platform-independent because it's gnuplot-only.
Furthermore, in case E could be any number, i.e. 0 as well, then it might be better
to first assign E = NaN and then compare E to NaN (see here: gnuplot: How to compare to NaN?).
Script:
### conditional extraction into a variable
reset session
$Data <<EOD
E Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Mean
1 0.7019 0.6734 0.6599 0.6511 0.701 0.6977 0.680833333
2 0.6421 0.6478 0.6095 0.608 0.6525 0.6285 0.6314
3 0.6039 0.6096 0.563 0.5539 0.6218 0.5716 0.5873
4 0.5564 0.5545 0.5138 0.4962 0.5781 0.5154 0.535733333
5 0.5056 0.4972 0.4704 0.4488 0.5245 0.4694 0.485983333
EOD
E = NaN
stats $Data u ($8<=0.6 && E!=E? E=$1 : 0) nooutput
print E
### end of script
Result:
3.0
Actually, OP wants to return E=0 if the condition was not met. Then the script would be like this:
E=0
stats $Data u ($8<=0.6 && E==0? E=$1 : 0) nooutput
Another awk. You could initialize the default return value to var ret in BEGIN but since it's 0 there is really no point as empty var+0 produces the same effect. If the threshold value of 0.6 is not met before the ENDis reached, that is returned. If it is met, exit invokes the END and ret is output:
$ awk '
NR>1 && $NF<0.6 { # final column has a value below a certain range
ret=$1 # I want to return 3 because E = 3
exit
}
END {
print ret+0
}' file
Output:
3
Something like this should do the trick:
awk 'NR>1 && $8<.6 {print $1;fnd=1;exit}END{if(!fnd){print 0}}' yourfile

Netlogo How to avoid NOBODY Runtime error?

How can I avoid NOBODY Runtime error? The following is sample code. This is a code that can be used for zero division error avoidance. Therefore, I know that it can not be used to avoid NOBODY error. But I can not find any other way. The following is a Runtime error message-> "IFELSE-VALUE expected input to be a TRUE / FALSE but got NOBODY instead." I appreciate your advice.
set top ifelse-value (nobody)
[ 0 ][ top ]
set ts turtles with [speed = 0 and not right-end]
set top max-one-of turtles [who]
set topx [xcor] of top ; A NOBODY error appears in "of" of this code
set L count ts with [xcor > topx]
The input of ifelse-value needs to be a reporter the returns either true or false (full details here. So, if you use nobody as the input, Netlogo does not evaluate whether or not the input is nobody or not, it just reads nobody- in other words your input is not returning either true or false.
For input, then, you need to instead use a boolean variable (one that is either true or false), a to-report that returns true or false, an expression that Netlogo can evaluate, etc. Consider the following examples:
to go
let x true
set top ifelse-value ( x )
["x is true"]
["x is NOT true"]
print ( word "Example 1: " top )
set top ifelse-value ( this-is-true )
["Reporter returned true"]
["Reporter did not return true"]
print ( word "Example 2: " top )
set x nobody
set top ifelse-value ( x = nobody )
["x IS nobody"]
["Is NOT nobody"]
print ( word "Example 3: " top )
set x 0
set top ifelse-value ( x = nobody )
["x IS nobody"]
["x Is NOT nobody"]
print ( word "Example 4: " top )
set top ifelse-value ( nobody = nobody )
["nobody = nobody"]
["nobody != nobody"]
print ( word "Example 5: " top )
end
to-report this-is-true
report true
end

How to make the agent pick the highest value between two values?

guys. I created this procedure in NetLogo for my agents (farmers):
to calculate-deforestation
ask farmers [
set net-family-labor ( family-labor - ( ag-size * cell-labor-ag-keep ) )
set net-family-money ( family-money - ( ag-size * cell-cost-ag-keep ) )
ifelse net-family-labor < 0 or net-family-money < 0
[ set n-aband-cell-labor ( family-labor / cell-labor-ag-keep )
set n-aband-cell-money ( family-money / cell-cost-ag-keep )
set n-aband with-max [ n-aband-cell-labor n-aband-cell-money ]
]
[ set n-def-cell-labor ( net-family-labor / cell-labor-deforest )
set n-def-cell-money ( net-family-money / cell-cost-deforest )
set n-def with-min [ n-def-cell-labor n-def-cell-money ]
]
]
end
For the "n-aband", I would like to get the max value between "n-aband-cell-labor" and "n-aband-cell-money" (either one or the other; the same goes for "n-def"). I know a limited number of NetLogo primitives but the ones I was able to find do not work for my case, for instance, "with-max", "max-n-of", "max-one-of". I am sure there must be one that would work but I am having trouble finding it in the NetLogo dictionary. I wonder if anyone could suggest me one that could work for my case. Thank you in advance.
If you want to get the max value of a list, simply use max. So,
set n-aband max (list n-aband-cell-labor n-aband-cell-money )
will set n-aband to the highest of the two values.

Why does that loop sometimes click randomly on screen?

I have made that loop my self and Iam trying to make it faster, better... but sometimes after it repeat searching for existing... it press random ( i think cuz its not similar to any img iam using in sikuli ) place on the screen. Maybe you will know why.
Part of this loop below
while surowiec_1:
if exists("1451060448708.png", 1) or exists("1451061746632.png", 1):
foo = [w_lewo, w_prawo, w_dol, w_gore]
randomListElement = foo[random.randint(0,len(foo)-1)]
click(randomListElement)
wait(3)
else:
if exists("1450930340868.png", 1 ):
click(hemp)
wait(1)
hemp = exists("1450930340868.png", 1)
elif exists("1451086210167.png", 1):
click(tree)
wait(1)
tree = exists("1451086210167.png", 1)
elif exists("1451022614047.png", 1 ):
hover("1451022614047.png")
click(flower)
flower = exists("1451022614047.png", 1)
elif exists("1451021823366.png", 1 ):
click(fish)
fish = exists("1451021823366.png")
elif exists("1451022083851.png", 1 ):
click(bigfish)
bigfish = exists("1451022083851.png", 1)
else:
foo = [w_lewo, w_prawo, w_dol, w_gore]
randomListElement = foo[random.randint(0,len(foo)-1)]
click(randomListElement)
wait(3)
I wonder if this is just program problem with img recognitions or I have made a mistake.
You call twice the exist method indending to get the same match (the first one in your if statement, the second time to assign it to the value. You ask sikuli to evaluate the image twice, and it can have different results.
From the method's documentation
the best match can be accessed using Region.getLastMatch() afterwards.

Parallel Reading/Writing File in c

Problem is to read a file of size about 20GB simultaneously by n processes. File contains one string at each line and Length of the strings may or may not be same. String length can be at-most 10 bytes long.
I have a cluster of having 16 nodes. Each node are the uni-processor and having 6GB RAM.I am using MPI to write Parallel codes.
What are the efficient way to partition this big file so that all resources can be utilized ?
Note: The constraints to the partitions is to read file as a chunk of fixed number of lines.
Assume file contains 1600 lines(e.g. 1600 strings). then first process should read from 1st line to 100th line, second process should do from 101th line to 200th line and so on....
As i think that one can't read a file by more than one processes at a time because we have only one file handler that point to somewhere only one string. then how other processes can read parallely from different chunks?
So as you're discovering, text file formats are poor for dealing with large amounts of data; not only are they larger than binary formats, but you run into formatting problems like here (seaching for newlines), and everything is much slower (data must be converted into strings). There can easily be 10x difference in IO speeds between text-based formats and binary formats for numerical data. But we'll assume for now you're stuck with the text file format.
Presumably, you're doing this partitioning for speed. But unless you have a parallel filesystem -- that is, multiple servers serving from multiple disks, and a FS that can keep those coordinated -- it's unlikely you're going to get a significant speedup from having multiple MPI tasks reading from the same file, as ultimately these requests are all going to get serialized anyway at the server/controller/disk level.
Further, reading in large blocks of data is going to be much faster than fseek()ing around and doing small reads looking for newlines.
So my suggestion would be to have one process (perhaps the last) read all the data in as few chunks as it can and send the relevant lines to each task (including, finally, itself). If you know how many lines the file has at the start, this is fairly simple; read in say 2 GB of data, search through memory for the end of the N/Pth line, and send that to task 0, send task 0 a "completed your data" message, and continue.
You don't specify if there are any constraints on the partitions, so I'll assume there are none. I'll also assume that you want the partitions to be as close to equal in size as possible.
The naïve approach would be to split the file into chunks of size 20GB/n. The starting position of chunk i wouild be i*20GB/n for i=0..n-1.
The problem with that is, of course, that there's no guarantee that chunk boundaries would fall between the lines of the input file. In general, they won't.
Fortunately, there's an easy way to correct for this. Having established the boundaries as above, shift them slightly so that each of them (except i=0) is placed after the following newline.
That'll involve reading 15 small fragments of the file, but will result in a very even partition.
In fact, the correction can be done by each node individually, but it's probably not worth complicating the explanation with that.
I think it would be better to write a piece of code that would get line lengths and distribute lines to processes. That distributing function would work not with strings themselves, but only their lengths.
To find an algorythm for even distribution of sources of fixed size is not a problem.
And after that the distributing func will tell other processes what pieces they have to get for work. Process 0 (distributor) will read a line. It already knows, that the line num. 1 should be worked by the process 1. ... P.0 reads line num. N and knows what process has to work with it.
Oh! We needn't optimize the distribution from the start. Simply the distributor process reads a new line from input and gives it to a free process. That's all.
So, you have even two solutions: heavily optimized and easy one.
We could reach even more optimalization if the distributor process will reoptimalize the unread yet strings from time to time.
Here is a function in python using mpi and the pypar extension to read the number of lines in a big file using mpi to split up the duties amongst a number of hosts.
def getFileLineCount( file1 ):
import pypar, mmap, os
"""
uses pypar and mpi to speed up counting lines
parameters:
file1 - the file name to count lines
returns:
(line count)
"""
p1 = open( file1, "r" )
f1 = mmap.mmap( p1.fileno(), 0, None, mmap.ACCESS_READ )
#work out file size
fSize = os.stat( file1 ).st_size
#divide up to farm out line counting
chunk = ( fSize / pypar.size() ) + 1
lines = 0
#set start and end locations
seekStart = chunk * ( pypar.rank() )
seekEnd = chunk * ( pypar.rank() + 1 )
if seekEnd > fSize:
seekEnd = fSize
#find start of next line after chunk
if pypar.rank() > 0:
f1.seek( seekStart )
l1 = f1.readline()
seekStart = f1.tell()
#tell previous rank my seek start to make their seek end
if pypar.rank() > 0:
# logging.info( 'Sending to %d, seek start %d' % ( pypar.rank() - 1, seekStart ) )
pypar.send( seekStart, pypar.rank() - 1 )
if pypar.rank() < pypar.size() - 1:
seekEnd = pypar.receive( pypar.rank() + 1 )
# logging.info( 'Receiving from %d, seek end %d' % ( pypar.rank() + 1, seekEnd ) )
f1.seek( seekStart )
logging.info( 'Calculating line lengths and positions from file byte %d to %d' % ( seekStart, seekEnd ) )
l1 = f1.readline()
prevLine = l1
while len( l1 ) > 0:
lines += 1
l1 = f1.readline()
if f1.tell() > seekEnd or len( l1 ) == 0:
break
prevLine = l1
#while
f1.close()
p1.close()
if pypar.rank() == 0:
logging.info( 'Receiving line info' )
for p in range( 1, pypar.size() ):
lines += pypar.receive( p )
else:
logging.info( 'Sending my line info' )
pypar.send( lines, 0 )
lines = pypar.broadcast( lines )
return ( lines )

Resources