Storing geo information in Redis - ruby

I've just run into a problem. I use Redis to store geo information, for
example:
hset 10001 la 41.000333
hset 10001 lo 121.999999
or
zadd la 41.xxxxx pk-value
zadd lo 121.xxxxx pk-value
There are about 40,000 key-value pairs.
The key is the terminal id, and the value stores that terminal's
GPS info.
I need to compute which terminals are around a given location.
For example, my location is (41.000123, 121.999988), and I want the
fastest way to find the terminals around it. I already know how to
compute the distance between two locations.
All I want is a fast way to iterate over all the data. Redis 2.6 has Lua support. Can it help solve my problem?

You probably want to use geohashes; then you will be able to store (and search by) lon/lat with any precision you want, and it is relatively easy to get the points that fall in a given bounding box.
For a Redis implementation, have a look at geodis.
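For a rough illustration of the idea, a minimal sketch in Python, assuming the pygeohash package and redis-py (the key names and the precision are illustrative, and points near a cell edge also require checking the neighbouring cells):
import pygeohash
import redis

r = redis.Redis(decode_responses=True)

def add_terminal(terminal_id, lat, lon):
    # precision 6 gives cells of roughly a kilometre; tune to your search radius
    cell = pygeohash.encode(lat, lon, precision=6)
    r.sadd("geo:" + cell, terminal_id)

def same_cell(lat, lon):
    # terminals whose geohash prefix matches ours, i.e. in the same cell
    cell = pygeohash.encode(lat, lon, precision=6)
    return r.smembers("geo:" + cell)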

As I understand your question, you want to find all values close to some coordinates? One way would be to use Lua scripting; another would be to store one sorted set for each approximate latitude/longitude (if you know in advance which granularity you require). Example:
zadd la.41 41.000333 pk-value
zadd lo.121 121.999999 pk-value
Then, when you need to find something close to some coords (let's say (42.01, 122.03)), you would do something like:
lat = 42.01
lon = 122.03
lat_min, lat_mid, lat_max = round(lat - 1), round(lat), round(lat + 1)
lon_min, lon_mid, lon_max = round(lon - 1), round(lon), round(lon + 1)
Thus, you would look in the sorted sets la.41, la.42, la.43 and lo.121, lo.122, lo.123. Note that a single ZINTERSTORE across all six sets would come back empty, because each terminal sits in exactly one la.* and one lo.* bucket; union each axis first, then intersect the two axes:
zunionstore tmp.la 3 la.${lat_min} la.${lat_mid} la.${lat_max}
zunionstore tmp.lo 3 lo.${lon_min} lo.${lon_mid} lo.${lon_max}
zinterstore close.${lat},${lon} 2 tmp.la tmp.lo
Now, close.${lat},${lon} should contain the id of every terminal close to the supplied coordinates.
Obviously, you could store the coordinates at a finer granularity, like la.41.0 and lo.121.0, and only look at terminals that are that close. Optionally, you could further filter the result in your client code.
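A sketch of this scheme with redis-py (assuming redis-py 3+; key names follow the example above):
import redis

r = redis.Redis(decode_responses=True)

def add_terminal(terminal_id, lat, lon):
    # one degree-sized bucket per axis, scored by the exact coordinate
    r.zadd("la.%d" % round(lat), {terminal_id: lat})
    r.zadd("lo.%d" % round(lon), {terminal_id: lon})

def nearby(lat, lon):
    las = ["la.%d" % round(lat + d) for d in (-1, 0, 1)]
    los = ["lo.%d" % round(lon + d) for d in (-1, 0, 1)]
    r.zunionstore("tmp.la", las)               # any nearby latitude band
    r.zunionstore("tmp.lo", los)               # any nearby longitude band
    dest = "close.%s,%s" % (lat, lon)
    r.zinterstore(dest, ["tmp.la", "tmp.lo"])  # must match on both axes
    return r.zrange(dest, 0, -1)
The two unions collapse the candidate buckets on each axis; the final intersection keeps only terminals matching on both axes, which you can then filter by exact distance client-side.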

Related

Random choice in AnyLogic

I want to select a random number from [0, 1, 2, 3] with probabilities [0.1, 0.2, 0.3, 0.4] in AnyLogic. This can easily be done in Python using numpy.random.choice, but I couldn't find a way to do it in AnyLogic. I don't want to use the customized distribution, since I want to apply this to many agents with different parameters.
You can do this with customDistributions: you can just get the value by calling customDistribution().
If you want to do it in a more flexible way:
Create a variable called cd in your agent of type CustomDistribution, and then you can do something like this:
int[] x = {0, 1, 2, 3, 4};            // you need an extra number to complete the interval
double[] y = {0.1, 0.2, 0.3, 0.4, 0}; // the extra number has probability 0
cd = new CustomDistribution(x, y, new Random());
To get the random value you do:
roundToInt(cd.get(new Random()));
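For reference, this is the single numpy call from the question that the snippet above reproduces:
import numpy as np

value = np.random.choice([0, 1, 2, 3], p=[0.1, 0.2, 0.3, 0.4])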

(Using Julia) How can I reduce my data matrix by averaging values from the same hour?

I am trying to reduce the size of my data and I cannot make it work. I have data points taken every minute over one month, and I want to reduce this to one sample for every hour. The problem is that some of my rows have "NA" values, so I delete them, which means there are not exactly 60 points in every hour; it varies.
I have a 'Timestamp' column, which I have used to make a 'datehour' column that has the same value when rows share the same date and hour. I want to average all the values with the same 'datehour' value.
How can I do this? I have tried using the if and for loops below, but it takes very long to run.
Thanks for all your help! I am new to Julia and come from a Matlab background.
======= CODE ==========
uniquedatehour = unique(datehour, 1)
index = []
avedata = reshape([], 0, length(alldata[1, :]))
for j in uniquedatehour
    for i in 1:length(datehour)
        if datehour[i] == j
            index = vcat(index, i)
        else
            rows = alldata[index, :]
            rows = convert(Array{Float64,2}, rows)
            avehour = mean(rows, 1)
            avedata = vcat(avedata, avehour)
            index = []
            continue
        end
    end
end
There are several layers to optimizing this code. I am assuming that your data is sorted on datehour (your code assumes this).
Layer one: general recommendation
Wrap your code in a function. Executing code in global scope in Julia is much slower than within a function. When wrapping it, make sure to either pass the data to your function as arguments or, if the data stays in global scope, qualify it with const;
Layer two: recommendations to your algorithm
A statement like [] creates an array of type Any, which is slow; use a type qualifier like index = Int[] to make it fast;
Using vcat like index = vcat(index, i) is inefficient; it is better to do push!(index, i) in place;
It is better to preallocate avedata, e.g. with fill(NA, length(uniquedatehour), size(alldata, 2)), and assign values into the existing matrix than to vcat onto it;
If I am not mistaken, your code will produce incorrect results, as it never catches the last entry of the uniquedatehour vector (assume it has only one element and check what happens: avedata will have zero rows);
The line rows = convert(Array{Float64,2}, rows) is probably not needed at all. If alldata is not a Matrix{Float64}, it is better to convert it once at the beginning with Matrix{Float64}(alldata);
You can change the line rows = alldata[index,:] to a view, view(alldata, index, :), to avoid an allocation;
In general you can avoid creating the index vector entirely: it is enough to remember the start s and end e positions of the range of equal values and then use the range s:e to select the rows you want.
If you correct those things, please post your updated code and maybe I can help further; there is still room for improvement, but it requires a somewhat different algorithmic approach (then again, you may prefer the option below for simplicity).
Layer three: how I would do it
I would use DataFrames package to handle this problem like this:
using DataFrames
df = DataFrame(alldata) # assuming alldata is Matrix{Float64}, otherwise convert it here
df[:grouping] = datehour
agg = aggregate(df, :grouping, mean) # maybe this is all what you need if DataFrame is OK for you
Matrix(agg[2:end]) # here is how you can convert DataFrame back to a matrix
This is not the fastest solution (it converts to a DataFrame and back), but it is much simpler for me.

Using redis to store a structured event log

I'm a bit new to Redis, so please forgive me if this is basic.
I'm working on an app that sends automatic replies to users for certain events. I would like to use Redis to store who has received what event.
Essentially, in Ruby, the data structure could look like this, where you have a map of users to events and the dates each event was sent:
{
  "mary@example.com" => {
    "sent_comment_reply" => ["12/12/2014", "3/6/2015"],
    "added_post_reply"   => ["1/4/2006", "7/1/2016"]
  }
}
What is the best way to represent this in a Redis data structure so you can ask: did Mary get a sent_comment_reply? And if so, when was the latest?
In short, the question is: how (if possible) can you have a hash structure that holds an array in Redis?
The rationale, as opposed to using a set or list with a compound key, is that hashes have O(1) lookup time, whereas lookups on lists (LRANGE) and sets (SMEMBERS) will be O(s+n) and O(n), respectively.
One way of structuring this in Redis, assuming you know the user's events and want the latest to be fresh in memory:
A sorted set per user. The members of the sorted set are event codes (sent_comment_reply, added_post_reply), with the score of the latest event as the highest. You can use ZRANK to answer the question:
Did Mary get a sent_comment_reply?
A hash, also per user. This time the field is the event (sent_comment_reply) and the value is its latest content, updated on each occurrence to include the body, date, etc. This answers the question:
And if so, when was the latest?
Note: sorted sets are really fast, and in this example we are depending on the events as the data.
With sorted sets you can add, remove, or update elements in a very fast way (in a time proportional to the logarithm of the number of elements). Since elements are taken in order and not ordered afterwards, you can also get ranges by score or by rank (position) in a very fast way. Accessing the middle of a sorted set is also very fast, so you can use Sorted Sets as a smart list of non repeating elements where you can quickly access everything you need: elements in order, fast existence test, fast access to elements in the middle!
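A minimal sketch of this two-structure scheme with redis-py (key names like events:<user> and latest:<user> are illustrative):
import time
import redis

r = redis.Redis(decode_responses=True)

def record_event(user, event, body):
    now = time.time()
    # sorted set per user: members are event codes, score is the latest time
    r.zadd("events:" + user, {event: now})
    # hash per user: field is the event code, value is the latest payload
    r.hset("latest:" + user, event, body)

def latest_event_time(user, event):
    # ZSCORE answers both questions: None means the event was never sent,
    # otherwise the score is the timestamp of the latest occurrence
    return r.zscore("events:" + user, event)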
A possible approach to using a hash to map an array is as follows (a sketch in Python with redis-py):
def add_element(r, key, value):
    length = r.hlen(key)       # number of fields already stored
    r.hset(key, length, value) # array[i] becomes field i of the hash
This maps the array[i] element to field i in the hash at key.
This will work for some cases, but I would probably go with the answer suggested in https://stackoverflow.com/a/34886801/2868839
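To read such an "array" back in order, a sketch (HGETALL does not guarantee field order, so sort by the numeric field; assumes decode_responses=True on the client):
def get_array(r, key):
    items = r.hgetall(key)
    return [items[k] for k in sorted(items, key=int)]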

Condense nested for loop to improve processing time with text analysis python

I am working on an untrained classifier model in Python 2.7, and I have the following loop:
features = [0 for i in xrange(len(dictionary))]
for bgrm in new_scored:
    for i in xrange(len(dictionary)):
        if bgrm[0] == dictionary[i]:
            features[i] = int(bgrm[1])
            break
I have a "dictionary" of bigrams that I have collected from a data set containing customer reviews and I would like to construct feature arrays of each review corresponding to the dictionary I have created. It would contain the frequencies of the bigrams found within the review of the features in the dictionary (I hope that makes sense). new_scored is a list of tuples which contains the bigrams found within a particular review paired with their relative frequency of occurrence in that review. The final feature arrays will be the same length as the original dictionary with few non zero entries.
The above works fine but I am looking at a data set of 13000 reviews, for each review to loop through this code is going to take for eeever (if my computer doesnt run out of RAM first). I have been sitting with it for a while and cannot see how I can condense it.
I am very new to python so I was hoping a more experienced could help with condensing it or perhaps point me in the right direction towards a library that will contain the function I need.
Thank you in advance!
Consider making dictionary an actual dict object (or some fancier subclass of dict if it better suits your needs), as opposed to an iterable (list or tuple seems like what it is now). dictionary could map bigrams as keys to an integer identifier that would identify a feature position.
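For instance, assuming the current dictionary is a list of bigrams (called bigram_list here for illustration), the mapping could be built like this:
# map each bigram to the feature position it occupies
dictionary = {bigram: i for i, bigram in enumerate(bigram_list)}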
If you refactor dictionary that way, then the loop can be rewritten as:
features = [0 for key in dictionary]
for bgram in new_scored:
    try:
        features[dictionary[bgram[0]]] = int(bgram[1])
    except KeyError:
        pass  # or do something if the bigram is not in the dictionary for some reason
This should convert what was an O(n) traversal through dictionary into an O(1) hash lookup.
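An equivalent variant using dict.get, which avoids exception handling when misses are common:
features = [0 for key in dictionary]
for bgram in new_scored:
    i = dictionary.get(bgram[0])  # None if the bigram is not a feature
    if i is not None:
        features[i] = int(bgram[1])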
Hope this helps.

How do I pick a random image in Java (Eclipse)?

I have 10 images and I want one of them to come up at random. What code should I use? I'm still an amateur, so I hope I can get some answers here... The images are named 'pic1', 'pic2', and so on. Is it possible to get the number from the file name and use Math.random()?
Store the pics in an array.
Say the array indexes run from 0 to 9.
Use Java's Math.random() to get a random double in [0, 1).
Multiply the result by 10 and truncate to an int to get an index in your range: (int)(Math.random() * 10).
