Insert the same value twice in a column called 'mer_value' of a shapefile

Hi there, I want to insert each value between 0 and the output of the total_polygons function twice into the 'mer_value' column of a shapefile (starting from the bottom left), so that the printed output would look something like this.
mer_value
0.0
0.0
1.0
1.0
2.0
2.0
....
Right now it only inserts each value once, from 0.0 all the way to the end, like the example below.
mer_value
0.0
1.0
2.0
.....
Any help is appreciated.
import geopandas as gpd

number_of_geocells = 2

# create a function to read the total polygons in a given shapefile, divide it by the
# number of geocells and store the value in a new variable called "total_polygons"
def total_polygons(shapefile):
    total_polygons = gpd.read_file(shapefile).shape[0] / number_of_geocells
    total_geocells_per_zone_rounded = round(total_polygons)
    return total_geocells_per_zone_rounded

# create a function to duplicate the shapefile and save it as a new shapefile called "zones"
def duplicate_shapefile(shapefile):
    zones = gpd.read_file(shapefile)
    zones.to_file("C:/tmp/shapefiletotestautomation/zones.shp")
    return zones

# create a function to create a column called 'mer_value' in the shapefile called "zones"
def create_merge_value(shapefile):
    zones = gpd.read_file(shapefile)
    zones['mer_value'] = zones.index
    zones.to_file("C:/tmp/shapefiletotestautomation/zones.shp")
    return zones

# create a loop to insert the same number twice into column "mer_value" in the shapefile called "zones"
def insert_merge_value(shapefile):
    zones = gpd.read_file(shapefile)
    for i in range(total_polygons(shapefile)):
        zones.loc[i, 'mer_value'] = i
        zones.loc[i + number_of_geocells, 'mer_value'] = i
    zones.to_file("C:/tmp/shapefiletotestautomation/zones.shp")
    return zones

print(insert_merge_value("C:/tmp/shapefiletotestautomation/m/GEOCELLS.shp"))
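One way to get the 0, 0, 1, 1, 2, 2, ... pattern is to derive mer_value from each row's position instead of looping with offset indices. This is just a minimal sketch of that idea, reusing the paths and number_of_geocells from the code above; np.arange(len(zones)) // number_of_geocells yields 0, 0, 1, 1, 2, 2, ... when number_of_geocells is 2:
import numpy as np
import geopandas as gpd

number_of_geocells = 2

def insert_merge_value(shapefile):
    zones = gpd.read_file(shapefile)
    # floor-divide the row position by the group size: rows 0 and 1 get 0.0,
    # rows 2 and 3 get 1.0, and so on
    zones['mer_value'] = (np.arange(len(zones)) // number_of_geocells).astype(float)
    zones.to_file("C:/tmp/shapefiletotestautomation/zones.shp")
    return zones

print(insert_merge_value("C:/tmp/shapefiletotestautomation/m/GEOCELLS.shp"))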

Related

DolphinDB: how to divide time period based on given conditions?

Suppose the given stock data are in 1-minute intervals. I'm trying to split the time series so that each window covers around 1.5 million shares traded per stock, obtaining data records with different time windows. Here, "around every 1.5 million" means that the value at a time point should be added to the current window if sum_volume would end up closer to 1.5 million after adding it; otherwise the value should not be added.
It can be achieved by the following script. The key to grouping the data lies in the expression: iif(accumulate(caclCumVol{1500000}, volume) == volume, time, NULL).ffill()
// Define an accumulate function `caclCumVol`. If the value at this point should be included in the current group, the function returns the accumulated volume. Otherwise, it creates a new group and returns the volume at the current point.
def caclCumVol(target, a, b){
    newVal = a + b
    if(newVal < target) return newVal
    else if(newVal - target > target - a) return b
    else return newVal
}
// import data
t = loadText("f:/DolphinDB/sample.csv")
// The key lies in the expression iif(accumulate(caclCumVol{1500000}, volume) == volume, time, NULL).ffill()
// If the accumulated value == volume, it means a new group begins.
// If it is the start of a new group, record the current time. Otherwise leave it NULL and fill it with the function `ffill`. Therefore, the data in the same group all use the same start time.
output = select first(wind_code) as wind_code, first(date) as date, sum(volume) as sum_volume, last(time) as endTime from t group by iif(accumulate(caclCumVol{1500000}, volume) == volume, time, NULL).ffill() as startTime
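For comparison, here is a rough Python/pandas sketch of the same grouping idea (the file path and the wind_code/date/time/volume column names are taken from the script above; this only illustrates the logic, it is not DolphinDB's implementation):
import pandas as pd

TARGET = 1_500_000  # same threshold as caclCumVol{1500000}

def assign_groups(volumes, target=TARGET):
    # Mirror caclCumVol: keep accumulating while under the target; when adding a row
    # would overshoot by more than the current shortfall, start a new group instead.
    ids, running, gid = [], 0, 0
    for vol in volumes:
        new_total = running + vol
        if new_total < target:
            running = new_total      # still under target: stay in the current group
        elif new_total - target > target - running:
            gid += 1                 # overshoot too large: this row opens a new group
            running = vol
        else:
            running = new_total      # overshoot is closer to target: include this row
        ids.append(gid)
    return ids

t = pd.read_csv("f:/DolphinDB/sample.csv")
t["group"] = assign_groups(t["volume"])
output = t.groupby("group", as_index=False).agg(
    wind_code=("wind_code", "first"),
    date=("date", "first"),
    sum_volume=("volume", "sum"),
    startTime=("time", "first"),
    endTime=("time", "last"),
)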

Using nearest neighbour to find the nearest postcode for new postcodes

I have a list of new postcodes, and I'm trying to find the nearest postcode from an existing postcode file to attach to each new postcode. I am using the code below, but it seems to have duplicated some rows; please could I have some help resolving this?
My 2 dataframes are:
new_postcode_df which contains 92,590 rows, and columns:
Postcode e.g. "AB101BJ"
Latitude e.g. 57.146051
Longitude e.g. -2.107375
current_postcode_df which contains 1,738,339 rows, and columns:
Postcode e.g. "AB101AB"
Latitude e.g. 57.149606
Longitude e.g. -2.096916
My desired output is output_df, with columns:
new_postcode e.g. "AB101BJ"
current_postcode e.g. "AB101AB"
My code is below:
new_postcode_df_gps = new_postcode_df[["lat", "long"]].values
current_postcode_df_gps = current_postcode_df[["Latitude", "Longitude"]].values
new_postcode_df_radians = np.radians(new_postcode_df_gps)
current_postcode_df_radians = np.radians(current_postcode_df_gps)
tree = BallTree(current_postcode_df_radians , leaf_size=15, metric='haversine')
distance, index = tree.query(new_postcode_df_radians, k=1)
earth_radius = 6371000
distance_in_meters = distance * earth_radius
current_postcode_df.Postcode_NS[index[:,0]]
My output is shown in the attached image, where you can see that postcodes beginning with "GY" have been added near the top, which should not be the case; postcodes starting with "AB" should all be at the top.
The new dataframe has increased from 92,590 rows to 92,848 rows.
[Image of final output dataframe]
Libraries I'm using are:
import pandas as pd
import numpy as np
from sklearn.neighbors import BallTree
new_postcode_df = pd.DataFrame({"Postcode":["AB101BJ", "AB101BL", "AB107FU"],
"Latitude":[57.146051, 57.148655, 57.119636],
"Longitude":[-2.107375, -2.097433, -2.147906]})
current_postcode_df = pd.DataFrame({"Postcode":["AB101AB", "AB101AF", "AB101AG"],
"Latitude":[57.149606, 57.148707, 57.149051],
"Longitude":[-2.096916, -2.097806, -2.097004]})
output_df = pd.DataFrame({"Postcode":["AB101RS", "AB129TS", "GY35HG"]})
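One common cause of the extra rows is label-based indexing: current_postcode_df.Postcode_NS[index[:,0]] looks rows up by index label, and assigning the resulting Series to another frame lets pandas realign on the index. Here is a sketch that keeps every lookup positional instead (it assumes both frames use the Postcode/Latitude/Longitude column names shown in the sample data above):
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

# convert to radians, as the haversine metric expects
new_rad = np.radians(new_postcode_df[["Latitude", "Longitude"]].to_numpy())
cur_rad = np.radians(current_postcode_df[["Latitude", "Longitude"]].to_numpy())

tree = BallTree(cur_rad, leaf_size=15, metric="haversine")
distance, index = tree.query(new_rad, k=1)   # exactly one match per new postcode

output_df = pd.DataFrame({
    "new_postcode": new_postcode_df["Postcode"].to_numpy(),
    # .iloc plus .to_numpy() keeps the lookup positional, so no index alignment happens
    "current_postcode": current_postcode_df["Postcode"].iloc[index[:, 0]].to_numpy(),
    "distance_m": distance[:, 0] * 6371000,  # haversine distances are in radians; scale by Earth radius
})
print(output_df)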

Query REST API latitude and longitude

I want my users to query two slug fields (latitude, longitude); the two values are then compared against the database to find the safehouses within a 1.5 km radius, and the API displays the nearest ones.
For example, when the users add latitude and longitude to their query,
www.example.com/safeplace/?find=-37.8770,145.0442
This will show the nearest safeplaces within 1.5km
Here is my function
import math

def distance(lat1, long1, lat2, long2):
    R = 6371  # Earth radius in km
    dLat = math.radians(lat2 - lat1)  # convert degrees to radians
    dLong = math.radians(long2 - long1)
    lat1 = math.radians(lat1)
    lat2 = math.radians(lat2)
    a = (math.sin(dLat/2) * math.sin(dLat/2) +
         math.sin(dLong/2) * math.sin(dLong/2) * math.cos(lat1) * math.cos(lat2))
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    d = R * c
    return d
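As a quick sanity check, you can call it with the query point from the example URL and a second, purely illustrative coordinate pair; the result is in kilometres:
print(distance(-37.8770, 145.0442, -37.8136, 144.9631))  # roughly 10 km for these two points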
Here is my model
from django.db import models

class Safeplace(models.Model):
    establishment = models.CharField(max_length=250)
    address = models.CharField(max_length=250)
    suburb = models.CharField(max_length=250)
    postcode = models.IntegerField()
    state = models.CharField(max_length=250)
    type = models.CharField(max_length=250)
    latitude = models.DecimalField(decimal_places=6, max_digits=10)
    longtitude = models.DecimalField(decimal_places=6, max_digits=10)
Is there a way to run a for loop over my database? I am currently working with Django on SQLite. In views.py, how can I implement the distance function with the user input from my REST API URL to find the nearest safeplaces and display them through the REST API?
What you need is to run a comparison for loop in your views.py. It is pretty difficult to execute, but I will try to explain it step by step.
I am assuming you are using that distance(lat1, long1, lat2, long2) function and, for example, trying to find the places within 2 km.
In views.py
import pandas as pd
from rest_framework.generics import ListAPIView
# adjust these import paths to your project layout
from .models import Safeplace
from .serializers import SafeplaceSerializer

class someapiview(ListAPIView):
    serializer_class = SafeplaceSerializer

    ### Now we are creating the method which reads the lat and lng parameters ###
    def get_queryset(self):
        lat = float(self.request.query_params.get('lat', None))
        lng = float(self.request.query_params.get('lng', None))
        ### Now we are reading your API using pandas ###
        df = pd.read_json('yourapi')  ## yourapi is a URL to your API
        obj = []
        for x in range(0, len(df)):
            latx = float(df['latitude'][x])
            lngx = float(df['longitude'][x])
            ### Calculating the distance ###
            km = distance(lat, lng, latx, lngx)
            if km <= 2:
                obj.append(df['id'][x])
        ### Django auto-generates a primary key, which is usually called id ###
        ### Now we use those pks to filter a queryset ###
        return Safeplace.objects.filter(pk__in=obj)
I used pandas as a workaround; the load time might be slow if you have lots of data, but I think this does the job. GeoDjango usually provides an efficient way of dealing with longitude and latitude, but I am not very competent with GeoDjango, so I cannot really tell. Still, I believe this is a good workaround.
UPDATE:
You can query with "www.yourapi.com/safeplace?lat=x&lng=y".
I believe you know how to set up the URLs.
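If you would rather keep the ?find=-37.8770,145.0442 format and the 1.5 km radius from the question, here is a minimal sketch of a variation of the same get_queryset that loops over the model directly instead of going through pandas (the longtitude spelling matches the model field above; error handling for a missing or malformed find parameter is omitted):
    def get_queryset(self):
        # parse "?find=lat,lng" from the request
        find = self.request.query_params.get('find', None)
        lat_str, lng_str = find.split(',')
        lat, lng = float(lat_str), float(lng_str)
        # keep only the safeplaces within 1.5 km of the query point
        nearby_pks = [
            sp.pk for sp in Safeplace.objects.all()
            if distance(lat, lng, float(sp.latitude), float(sp.longtitude)) <= 1.5
        ]
        return Safeplace.objects.filter(pk__in=nearby_pks)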

How to pick up a random key from a table in Lua?

I'm using this code to draw random pics from this table:
FishImages = {image1 = love.graphics.newImage("bg/fish1.png"),
image2 = love.graphics.newImage("bg/fish2.png"),
image3 = love.graphics.newImage("bg/fish3.png"),
image4 = love.graphics.newImage("bg/fish4.png"),}
with this function: love.graphics.draw(FishImages.image1, pos.x, pos.y) -- I guess the modification goes here
So how do I pick a random key from a table in Lua?
math.random(1,4) generates a random integer in the range 1 to 4. So you can use:
FishImages['image' .. tostring(math.random(1,4))]

Labeled LDA learn in Stanford Topic Modeling Toolbox

It works fine when I run example-6-llda-learn.scala as follows:
val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);

val tokenizer = {
  SimpleEnglishTokenizer() ~>     // tokenize on space and punctuation
  CaseFolder() ~>                 // lowercase everything
  WordsAndNumbersOnlyFilter() ~>  // ignore non-words and non-numbers
  MinimumLengthFilter(3)          // take terms with >=3 characters
}

val text = {
  source ~>                             // read from the source file
  Column(4) ~>                          // select column containing text
  TokenizeWith(tokenizer) ~>            // tokenize with tokenizer above
  TermCounter() ~>                      // collect counts (needed below)
  TermMinimumDocumentCountFilter(4) ~>  // filter terms in <4 docs
  TermDynamicStopListFilter(30) ~>      // filter out 30 most common terms
  DocumentMinimumLengthFilter(5)        // take only docs with >=5 terms
}

// define fields from the dataset we are going to slice against
val labels = {
  source ~>                               // read from the source file
  Column(2) ~>                            // take column two, the year
  TokenizeWith(WhitespaceTokenizer()) ~>  // turns label field into an array
  TermCounter() ~>                        // collect label counts
  TermMinimumDocumentCountFilter(10)      // filter labels in < 10 docs
}

val dataset = LabeledLDADataset(text, labels);

// define the model parameters
val modelParams = LabeledLDAModelParams(dataset);

// Name of the output model folder to generate
val modelPath = file("llda-cvb0-" + dataset.signature + "-" + modelParams.signature);

// Trains the model, writing to the given output path
TrainCVB0LabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1000);
// or could use TrainGibbsLabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1500);
But it's not ok when I change the last line from:
TrainCVB0LabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1000);
to:
TrainGibbsLabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1500);
The CVB0 method also uses a lot of memory: training a corpus of 10,000 documents with about 10 labels per document takes around 30 GB of memory.
I've encountered the same situation, and indeed I believe it's a bug. Check GibbsLabeledLDA.scala in edu.stanford.nlp.tmt.model.llda under the src/main/scala folder, from line 204:
val z = doc.labels(zI);
val pZ = (doc.theta(z) + topicSmoothing(z)) *
  (countTopicTerm(z)(term) + termSmooth) /
  (countTopic(z) + termSmoothDenom);
doc.labels is self-explanatory, and doc.theta records the distribution (counts, actually) of its labels; it has the same size as doc.labels.
zI is the index variable iterating over doc.labels, while z gets the actual label number. Here comes the problem: it's possible that this document has only one label, say 1000, so zI is 0 and z is 1000, and doc.theta(z) goes out of range.
I suppose the solution would be to change doc.theta(z) to doc.theta(zI).
(I'm still checking whether the results are meaningful; in any case, this bug has made me less confident in this toolbox.)
