Im using statsmodels to fit an ARIMA in python. However the forecast values are getting inverted after integrating, could anyone point what I am doing wrong?
Differencing the series for stationarity
ts['val_diff1'] = ts.Value - ts.Value.shift(1)
Fitting a ARIMA model
model2 = sm.tsa.ARIMA(ts.val_diff1.dropna(inplace=False), order=(4, 0, 4))
results2 = model2.fit(disp=-1)
Transforming fitted values back with integrating to get fitted values(un-difference)
ts['frc2'] = np.r_[(ts['Value'].iloc[0]),results2.fittedvalues].cumsum()
However I am getting inverted forecast.
[1]: https://i.stack.imgur.com/pCKfl.png
Related
I am using geopandas sample data for this question.
import geopandas as gpd
df = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
My real dataset is somewhat different containing only 'polygon' type geometry points (in EPSG::4326), but what I would like to do is figure out the area of each polygon for each country in kilometers squared.
I am new to geopandas so I'm not sure if I am doing this right. My process is as follows;
ndf=df
ndf.to_crs("epsg:32633")
ndf["area"] = ndf['geometry'].area/ 10**6
ndf.head(2)
but the resulting areas don't make sense.
So I tried
df_2= df.to_crs({'proj':'cea'})
df_2["area"] = df_2['geometry'].area/ 10**6
df_2.head(2)
which is better, but still not accurate when run a google search for the areas.
So I'm wondering 1) is this the correct method? 2) how do I know the best projection type?
Computing polygon areas on equal-area types of map projection does not always yield good result due to the requirement of dense vertices along the boundaries of the polygon involved.
Computing on the un-projected earth surface is not difficult. With appropriate Python library that takes great-circle arcs between succeeding vertices that forms the surface areas in the computation, the results are more accurate.
The most accurate (imho) method to compute surface areas on the earth with Python can be demonstrated with this simple code.
import geopandas as gpd
from pyproj import Geod, Proj
# Use the included dataset of Geopandas
df = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# Prep an ellipsoidal earth (WGS84's parameters)
geod = Geod('+a=6378137 +f=0.0033528106647475126')
# List of countries
some_countries = ["Thailand", "Nepal"]
def area_perim (country_name):
pgon = df[df["name"]==country_name].geometry.iloc[0]
# Extract list of longitude/latitude of country's boundary
lons, lats = pgon.exterior.xy[:][0], pgon.exterior.xy[:][1]
# Compute surface area and perimeter
poly_area, poly_perimeter = geod.polygon_area_perimeter(lons, lats)
# Print the results
print("\nCountry:", country_name)
print("Area, (sq.Km): {:.1f}".format(abs(poly_area)/10**6))
print("Perimeter, (Km): {:.2f}".format(poly_perimeter/10**3))
for each in some_countries:
area_perim(each)
Output:
Country: Thailand
Area, (sq.Km): 510125.6
Perimeter, (Km): 5555.56
Country: Nepal
Area, (sq.Km): 150706.9
Perimeter, (Km): 1983.42
Note that, df has CRS = epsg:4326.
If the source geodataframe you use has CRS other than epsg:4326, you can convert it to epsg:4326 before use.
See reference for more details.
Context
I have an merged geodataframe of 1). Postalcode areas and 2). total amount of deliveries within that postalcode area in the city of Groningen called results. The geodataframe includes geometry that include Polygons and Multiploygons visualizing different Postal code areas within the city.
I am new to GeoPandas and therefore I've tried different tutorials including this one from the geopandas official website wherein I got introduced into interactive Folium maps, which I really like. I was able to plot my geodataframe using result.explore(), which resulted in the following map
The problem
So far so good, but now I want to simply place an marker using the folium libarty with the goal to calculate the distance between the marker and the postalcode areas. After some looking on the internet I found out in the quickstart guild that you need to create an folium.Map, then you need folium.Choropleth for my geodataframe and folium.Marker and add them to the folium.Map.
m = folium.Map(location=[53.21917, 6.56667], zoom_start=15)
folium.Marker(
[53.210903, 6.598276],
popup="My marker"
).add_to(m)
folium.Choropleth(results, data=results, columns="Postcode", fill_color='OrRd', name="Postalcode areas").add_to(m)
folium.LayerControl().add_to(m)
m
But when try to run the above code I get the following error:
What is the (possible) best way?
Besides my failing code (which would be great if someone could help me out). I am curious if this is the way to do it (Folium map + marker + choropleth). Is it not possible to call geodataframe.explore() which results into the map in second picture and then just add an marker on the same map? I have the feeling that I am making it too difficult, there must be an better solution using Geopandas.
you have not provided the geometry. Have found postal districts of Netherlands and used that
explore() supports will draw a point as a marker with appropriate parameters
hence two layers,
one is postal areas coloured using number of deliveries
second is point, with distance to each area calculated
import geopandas as gpd
import shapely.geometry
import pandas as pd
import numpy as np
geo_url = "https://geodata.nationaalgeoregister.nl/cbsgebiedsindelingen/wfs?request=GetFeature&service=WFS&version=2.0.0&typeName=cbs_provincie_2017_gegeneraliseerd&outputFormat=json"
gdf = gpd.read_file(geo_url).assign(
deliveries=lambda d: np.random.randint(10**4, 10**6, len(d))
)
p = gpd.GeoSeries(shapely.geometry.Point(6.598276, 53.210903), crs="epsg:4386")
# calc distances to point
gdf["distance"] = gdf.distance(p.to_crs(gdf.crs).values[0])
# dataframe of flattened distances
dfp = pd.DataFrame(
[
"<br>".join(
[f"{a} - {b:.2f}" for a, b in gdf.loc[:, ["statcode", "distance"]].values]
)
],
columns=["info"],
)
# generate colored choropleth
m = gdf.explore(
column="deliveries", categorical=True, legend=False, height=400, width=400
)
# add marker with distances
gpd.GeoDataFrame(
geometry=p,
data=dfp,
).explore(m=m, marker_type="marker")
I have a row that contains the names and photos of people in Oracle, how do I make face recognition that can recognize names only by taking pictures from the camera ??
what techniques can I use?
Firstly, do not store the raw images in the blob column. You should store the vector representation of raw images. The following python code block will find the vector representation of a face image.
#!pip install deepface
from deepface.basemodels import VGGFace, Facenet
model = VGGFace.loadModel() #you can use google facenet instead of vgg
target_size = model.layers[0].input_shape
#preprocess detects facial area and aligns it
img = functions.preprocess_face(img="img.jpg", target_size=target_size)
representation = model.predict(img)[0,:]
Here, you can either pass exact image path like img.jpg or the 3D array to img argument of preprocess_face. In this way, you will store the vector representations in the blob column of oracle database.
When you have a new face image, and want to find its identity in the database find its representation again.
#preprocess detects facial area and aligns it
target_img = functions.preprocess_face(img="target.jpg", target_size=target_size)
target_representation = model.predict(target_img )[0,:]
Now, you have the vector representation of the target image and vector representations of the database images. You need to find the similarity score of target image representation and each instance of database representations.
Euclidean distance is the easiest way to compare vectors.
def findEuclideanDistance(source_representation, test_representation):
euclidean_distance = source_representation - test_representation
euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
euclidean_distance = np.sqrt(euclidean_distance)
return euclidean_distance
We will compare each data base instance to target. Suppose that representations of data base instances are stored in representations object.
distances = []
for i in range(0, len(representations)):
source_representation = representations[i]
#find the distance between target_representation and source_representation
distance = findEuclideanDistance(source_representation, target_representation )
distances.append(distance)
Distances list stores the distance of each item in the data base to target. We need to find the lowest distance.
import numpy as np
idx = np.argmax(distances)
Idx is the id of the target image in the database.
I am using an ImageDataGenerator to augment my images. I need to get the y labels from the generator.
Example : I have 10 training images, 7 are label 0 and 3 are label 1. I want to increase training set size to 100.
total_training_images = 100
total_val_images = 50
model.fit_generator(
train_generator,
steps_per_epoch= total_training_images // batch_size,
epochs=epochs,
validation_data=validation_generator,
validation_steps= total_val_images // batch_size)
By my understanding, this trains a model on 100 training images for each epoch, with each image being augmented in some way or the other according to my data generator, and then validates on 50 images.
If I do train_generator.classes, I get an output [0,0,0,0,0,0,0,1,1,1]. This corresponds to my 7 images of label 0 and 3 images of label 1.
For these new 100 images, how do I get the y-labels?
Does this mean when I am augmenting this to 100 images, my new train_generator labels are the same thing, but repeated 10 times? Essentially np.append(train_generator.classes) 10 times?
I am following this tutorial, if that helps :
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
The labels generate as one-hot-encoding with the images.Hope this helps !
training_set.class_indices
from keras.preprocessing import image
import matplotlib.pyplot as plt
x,y = train_generator.next()
for i in range(0,3):
image = x[i]
label = y[i]
print (label)
plt.imshow(image)
plt.show()
Based on what you're saying about the generator, yes.
It will replicate the same label for each augmented image. (Otherwise the model would not train properly).
One simple way to check what the generator is outputting is to get what it yields:
X,Y = train_generator.next() #or next(train_generator)
Just remember that this will place the generator in a position to yield the second element, not the first anymore. (This would make the fit method start from the second element).
I am using sci-kit image to get the "regionprops" of a segmented image. I then wish to replace each of the segment labels with their corresponding statistic (e.g eccentricity).
from skimage import segmentation
from skimage.measure import regionprops
#a segmented image
labels = segmentation.slic(img1, compactness=10, n_segments=200)
propimage = labels
#props loop
for region in regionprops(labels1, properties ='eccentricity') :
eccentricity = region.eccentricity
propimage[propimage==region] = eccentricity
This runs, but the propimage values do not change from their original labels
I have also tried:
for i in range(0,max(labels)):
prop = regions[i].eccentricity #the way to cal a single prop
propimage[i]= prop
This delivers this error
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I am a recent migrant from matlab where I have implemented this, but the data structures used are completely different.
Can any one help me with this?
Thanks
Use ndimage from scipy : the sum() function can operate using your label array.
from scipy import ndimage as nd
sizes = nd.sum(label_file[0]>0, labels=label_file[0], index=np.arange(0,label_file[1])
You can then evaluate the distribution with numpy.histogram and so on.