numpy savetxt sort order

I am trying to save text using numpy, and I need to figure out how to sort the array before saving; essentially I want a reverse order.
p is this array:
3 2.5
2 1.98
1 7.2
import numpy
with open('fin.dat', 'a') as fout:
    numpy.savetxt(fout, p, fmt='%.4f')
I want to save as
1 7.2
2 1.98
3 2.5
How can I do this?

import numpy as np
p = np.array([[3, 2.5], [2, 1.98], [1, 7.2]])
p = p[::-1]   # reverses the row order
# [[1.  , 7.2 ], [2.  , 1.98], [3.  , 2.5 ]]
np.savetxt(file, p)

I edited your question to clarify that p is a 2D array. You need to reverse p before passing it to savetxt. For example,
q = p[::-1, :]
reverses the order of the rows. There is a simple function that does the same thing:
q = np.flipud(p)
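If the rows aren't already in simple reverse order, you can sort by a column before saving. A minimal sketch using argsort (my addition, not part of the original answers):
import numpy as np

p = np.array([[3, 2.5], [2, 1.98], [1, 7.2]])
q = p[np.argsort(p[:, 0])]   # rows reordered by the first column, ascending
with open('fin.dat', 'a') as fout:
    np.savetxt(fout, q, fmt='%.4f')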

Related

Loop until matrix is full?

I have a conditional statement that adds a row of binary values from matrix A to matrix B. I want to put this in a loop so that it keeps adding rows from matrix A until matrix B is full. Currently matrix B is initialized as a 10 by 10 matrix of zeros. Do I need to initialize matrix B differently to create this condition, or is there a way of doing it as is?
Below is roughly how my code looks so far
from random import sample
import numpy as np

matrixA = np.random.randint(2, size=(10, 10))
matrixB = np.zeros((10, 10))
x, y = sample(range(1, 10), k=2)
if someCondition:
    matrixB = np.append(matrixB, [matrixA[x]], axis=0)
else:
    matrixB = np.append(matrixB, [matrixA[y]], axis=0)
You don't need a loop for this; it is easy to do with slicing. For example:
import numpy as np

A = np.random.randint(0, 10, size=(20, 10))
print(A)
# Take rows up to the one that satisfies your condition; here I assume it to be 10.
B = A[:10, :]
print(B)
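If you do want the loop the question describes, here is a minimal sketch that collects rows until B is "full" (here, 10 rows); some_condition is a hypothetical stand-in for whatever test you actually use:
import numpy as np
from random import sample

matrixA = np.random.randint(2, size=(10, 10))
rows = []
while len(rows) < 10:                         # stop once B would be full
    x, y = sample(range(10), k=2)
    some_condition = np.random.rand() < 0.5   # placeholder condition
    rows.append(matrixA[x] if some_condition else matrixA[y])
matrixB = np.vstack(rows)                     # build B once at the end
Appending to a Python list and stacking once is much cheaper than growing an array with np.append inside the loop.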

inverse of symmetric matrix is not symmetric in Julia

I am using Julia version 0.6.2 and I am facing this problem.
mat = zeros(6, 6)
for i = 1 : 6
    for j = 1 : 6
        mat[i, j] = exp(-(i - j)^2)
    end
end
issymmetric(mat)
issymmetric(inv(mat))
And the output is
Main> issymmetric(mat)
true
Main> issymmetric(inv(mat))
false
I also tried the following Matlab code
mat = zeros(6, 6);
for i = 1 : 6
    for j = 1 : 6
        mat(i, j) = exp(-(i - j)^2);
    end
end
issymmetric(mat)
issymmetric(inv(mat))
And the output is
logical 1
logical 1
The generic inv in Julia goes through an LU factorization, and floating-point roundoff in that factorization need not be symmetric, which is why the result fails an exact symmetry check. Apart from manually making the matrix symmetric as you propose, e.g. taking the average of the matrix and its transpose like
A = inv(mat)
(A + A.')/2
probably a cleaner way is
smat = Symmetric(mat)
B = inv(smat)
Now B (as well as smat) passes issymmetric. Moreover, the fact that it is symmetric is ensured at the type level (Symmetric), and some functions can take advantage of this additional information. This is exactly what inv does for smat.
EDIT: the question was also posted on Discourse, where you can find additional discussion about the performance of Symmetric.
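For comparison, the same phenomenon is easy to reproduce in Python with numpy; this is my own illustration, not part of the original thread. np.linalg.inv also goes through an LU factorization, so an exact symmetry check typically fails even though the inverse is symmetric to within roundoff:
import numpy as np

# Same 6x6 matrix as in the question, built with 0-based indices.
i = np.arange(6)
mat = np.exp(-(i[:, None] - i[None, :]) ** 2)

A = np.linalg.inv(mat)
print(np.array_equal(A, A.T))   # typically False: roundoff breaks exact symmetry
print(np.allclose(A, A.T))      # True: symmetric up to floating-point error

A_sym = (A + A.T) / 2           # enforce exact symmetry, as suggested above
print(np.array_equal(A_sym, A_sym.T))   # True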

Python: break up dataframe (one row per entry in column, instead of multiple entries in column)

I have a solution to a problem, that to my despair is somewhat slow, and I am seeking advice on how to speed up my solution (by adding vectorization or other clever methods). I have a dataframe that looks like this:
import pandas as pd

toy = pd.DataFrame([[1, 'cv', 'c,d,e'], [2, 'search', 'a,b,c,d,e'], [3, 'cv', 'd']],
                   columns=['id', 'ch', 'kw'])
Output is:
   id      ch         kw
0   1      cv      c,d,e
1   2  search  a,b,c,d,e
2   3      cv          d
The task is to break up the kw column into one (replicated) row per comma-separated entry in each string. Thus, what I wish to achieve is:
   id      ch kw
0   1      cv  c
1   1      cv  d
2   1      cv  e
3   2  search  a
4   2  search  b
5   2  search  c
6   2  search  d
7   2  search  e
8   3      cv  d
My initial solution is the following:
data = pd.DataFrame()
for row in toy.itertuples():
    keys = row.kw.split(",")
    data = data.append([[row.id, row.ch, key] for key in keys], ignore_index=True)
data.columns = ['id', 'ch', 'kw']
Problem is: it is slow for larger dataframes. My hope is that someone has encountered a similar problem before, and knows how to optimize my solution. I'm using python 3.4.x and pandas 0.19+ if that is of importance.
Thank you!
You can use str.split to turn each string into a list, then str.len to get each list's length.
Then create a new DataFrame with the constructor, using numpy.repeat and numpy.concatenate:
import numpy as np

cols = toy.columns
splitted = toy['kw'].str.split(',')
l = splitted.str.len()
toy = pd.DataFrame({'id': np.repeat(toy['id'], l),
                    'ch': np.repeat(toy['ch'], l),
                    'kw': np.concatenate(splitted)})
toy = toy.reindex_axis(cols, axis=1)
print (toy)
   id      ch kw
0   1      cv  c
0   1      cv  d
0   1      cv  e
1   2  search  a
1   2  search  b
1   2  search  c
1   2  search  d
1   2  search  e
2   3      cv  d
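On newer pandas (0.25 or later, so after the version pinned in the question), DataFrame.explode does this directly; a short sketch:
import pandas as pd

toy = pd.DataFrame([[1, 'cv', 'c,d,e'], [2, 'search', 'a,b,c,d,e'], [3, 'cv', 'd']],
                   columns=['id', 'ch', 'kw'])
out = toy.assign(kw=toy['kw'].str.split(',')).explode('kw')
print(out)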

Code to calculate 1D Median Filter

I wonder if anyone knows some Python or Java code to calculate a 1D median filter.
I have a file comma delimited with two fields: Date and Signal.
Something like that:
2014-06-01 11:22:12, 23.8
2014-06-01 11:23:12, 25.9
2014-06-01 11:24:12, 45.7
I would like to read this file, apply a 1D median filter of size 23 to the Signal field, and save the result to another file to remove the noise.
Thanks in advance.
Alexandre.
In case someone stumbles on this later:
To extract the data you can use a regex, and for a custom median filter you can have a look here.
I will leave a copy down here in case it is removed:
import numpy as np

def medfilt(x, k):
    """Apply a length-k median filter to a 1D array x.
    Boundaries are extended by repeating endpoints.
    """
    assert k % 2 == 1, "Median filter length must be odd."
    assert x.ndim == 1, "Input must be one-dimensional."
    k2 = (k - 1) // 2
    y = np.zeros((len(x), k), dtype=x.dtype)
    y[:, k2] = x
    for i in range(k2):
        j = k2 - i
        y[j:, i] = x[:-j]
        y[:j, i] = x[0]
        y[:-j, -(i + 1)] = x[j:]
        y[-j:, -(i + 1)] = x[-1]
    return np.median(y, axis=1)
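A quick check of the helper above, using made-up values:
import numpy as np

x = np.array([23.8, 25.9, 45.7, 24.0, 26.1])
print(medfilt(x, 3))
# [23.8 25.9 25.9 26.1 26.1]  -- the 45.7 spike is suppressed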
scipy.signal.medfilt accepts 1D kernels:
import pandas as pd
import scipy.signal

def median_filter(file_name, new_file_name, kernel_size):
    with open(file_name, 'r') as f:
        df = pd.read_csv(f, header=None)
    signal = df.iloc[:, 1].values
    median = scipy.signal.medfilt(signal, kernel_size)
    df = df.drop(df.columns[1], axis=1)
    df[1] = median
    df.to_csv(new_file_name, sep=',', index=None, header=None)

if __name__ == '__main__':
    median_filter('old_signal.csv', 'new_signal.csv', 23)
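For completeness, a pandas-only alternative using a rolling median (my addition, not from the original answers); min_periods=1 keeps the edges from turning into NaN, which roughly approximates the endpoint handling above:
import pandas as pd

df = pd.read_csv('old_signal.csv', header=None, names=['date', 'signal'])
df['signal'] = df['signal'].rolling(23, center=True, min_periods=1).median()
df.to_csv('new_signal.csv', index=False, header=False)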

numpy: evaluating function in matrix, using previous array as argument in calculating the next

I have an m x n array a, where m > 1E6 and n <= 5.
I have functions F and G, which are composed like this: F(u, G(u, t)). u is a 1 x n array, t is a scalar, and F and G return 1 x n arrays.
I need to evaluate each row of a in F, using the previously evaluated row as the u-array for the next evaluation. I need to make m such evaluations.
This has to be really fast. I was previously impressed by scitools.std StringFunction evaluation for a whole array, but this problem requires using the previously calculated array as an argument in calculating the next. I don't know if StringFunction can do this.
For example:
from numpy import zeros, asarray, cos

a = zeros((1000000, 4))
a[0] = asarray([1., 69., 3., 4.1])

# A is a float defined elsewhere; h is a function which accepts a float as
# its argument and returns an arbitrary float. h is defined elsewhere.

def G(u, t):
    return asarray([u[0], u[1]*A, cos(u[2]), t*h(u[3])])

def F(u, t):
    return u + G(u, t)

dt = 1E-6
for i in range(1, 1000000):
    a[i] = F(a[i-1], i*dt)
The problem with the above code is that it is slow as hell. I need to get these calculations done with numpy in milliseconds.
How can I do what I want?
Thank you for your time.
Kind regards,
Marius
This sort of thing is very difficult to do in numpy alone. If we look at it column by column, we see a few simpler solutions.
a[:,0] is very easy: F doubles it each step (u[0] + u[0]), so it is a cumulative product of 2s:
import numpy as np

col0 = np.ones(1000) * 2
col0[0] = 1  # or whatever start value
np.cumprod(col0, out=col0)
np.allclose(col0, a[:1000, 0])
True
As mentioned earlier, this will overflow very quickly. a[:,1] can be done much along the same lines, since each step multiplies by 1 + A.
I do not believe there is a way to do the next two columns quickly inside numpy alone. We can turn to numba for this:
from numba import autojit
import numpy as np

def python_loop(start, count):
    out = np.zeros(count, dtype=np.double)
    out[0] = start
    for x in xrange(count - 1):
        out[x+1] = out[x] + np.cos(out[x])
    return out

numba_loop = autojit(python_loop)
np.allclose(numba_loop(3, 1000), a[:1000, 2])
True
%timeit python_loop(3,1000000)
1 loops, best of 3: 4.14 s per loop
%timeit numba_loop(3,1000000)
1 loops, best of 3: 42.5 ms per loop
Although it's worth pointing out that this converges to pi/2 very quickly, so there is little point in calculating the recursion past ~20 values for any start value. The following returns the exact same answer to double precision; I didn't bother finding the cutoff, but it is much less than 50:
%timeit tmp = np.empty((1000000));
tmp[:50] = numba_loop(3,50);
tmp[50:] = np.pi/2
100 loops, best of 3: 2.25 ms per loop
You can do something similar with the fourth column. Of course you can autojit all of the functions, but this gives you several different options to try out depending on numba usage:
Use cumprod for the first two columns
Use an approximation for column 3 (and possibly 4) where only the first few iterations are calculated
Implement columns 3 and 4 in numba using autojit
Wrap everything inside of an autojit loop (the best option; a sketch with modern numba appears at the end of this section)
The way you have presented this, all rows past ~200 will be either np.inf or np.pi/2. Exploit this.
Slightly faster. Your first column is basically 2^n. Calculating 2^n for n up to 1000000 is going to overflow; the second column is even worse.
import numpy as np

def calc(arr, t0=1E-6):
    u = arr[0]
    dt = 1E-6
    h = lambda x: np.random.random(1) * 50.0

    def firstColGen(uStart):
        u = uStart
        while True:
            u += u
            yield u

    def secondColGen(uStart, A):
        u = uStart
        while True:
            u += u * A
            yield u

    def thirdColGen(uStart):
        u = uStart
        while True:
            u += np.cos(u)
            yield u

    def fourthColGen(uStart, h, t0, dt):
        u = uStart
        t = t0
        while True:
            u += h(u) * t   # use the current time t, matching t*h(u[3]) in the question
            t += dt
            yield u

    first = firstColGen(u[0])
    second = secondColGen(u[1], A)
    third = thirdColGen(u[2])
    fourth = fourthColGen(u[3], h, t0, dt)
    for i in xrange(1, len(arr)):
        arr[i] = [first.next(), second.next(), third.next(), fourth.next()]
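As a sketch of the "wrap everything inside a jitted loop" option from the first answer, here it is with modern numba's njit (autojit has since been removed from numba); A and h are assumed stand-ins for the question's values:
import numpy as np
from numba import njit

A = 0.5            # assumed constant; defined elsewhere in the question

@njit
def h(x):          # hypothetical placeholder for the question's h
    return 0.1 * x

@njit
def run(a, dt):
    # Each row is F(u, t) = u + G(u, t) applied to the previous row.
    for i in range(1, a.shape[0]):
        u = a[i - 1]
        a[i, 0] = u[0] + u[0]
        a[i, 1] = u[1] + u[1] * A
        a[i, 2] = u[2] + np.cos(u[2])
        a[i, 3] = u[3] + (i * dt) * h(u[3])

a = np.zeros((1000000, 4))
a[0] = np.asarray([1., 69., 3., 4.1])
run(a, 1e-6)
A single jitted loop over rows keeps everything in one pass and avoids the per-step Python overhead of the original.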
