I have a matrix of shape (T, K), with K << T. I want to extend it into shape (T, T) and shift the i-th row to the right by i steps.
For example:
inputs: T = 5 and K = 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
expected outputs:
1 2 3 0 0
0 1 2 3 0
0 0 1 2 3
0 0 0 1 2
0 0 0 0 1
My solution:
right_pad = T - K + 1
output = F.pad(input, (0, right_pad), 'constant', value=0)
output = output.view(-1)[:-T].view(T, T)
My solution raises the error "gradient computation has been modified by an in-place operation". Is there an efficient and feasible way to achieve my purpose?
Your function is fine and is not the cause of your error (tested with PyTorch 1.6.0; if you are using another version, please update your dependencies).
The code below works fine:
import torch
import torch.nn as nn
import torch.nn.functional as F
T = 5
K = 3
inputs = torch.tensor(
[[1, 2, 3,], [1, 2, 3,], [1, 2, 3,], [1, 2, 3,], [1, 2, 3,],],
requires_grad=True,
dtype=torch.float,
)
right_pad = T - K + 1
output = F.pad(inputs, (0, right_pad), "constant", value=0)
output = output.flatten()[:-T].reshape(T, T)
output.sum().backward()
print(inputs.grad)
Please notice I have explicitly specified dtype as torch.float, as you can't backprop through integer tensors.
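For instance (a minimal sketch continuing the snippet above, not part of the original code), trying to give requires_grad=True to an integer tensor fails immediately:
torch.tensor([[1, 2, 3]], requires_grad=True)
# RuntimeError: only tensors of floating point (and complex) dtype can require gradients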
view and slice will never break backpropagation, as the gradient is connected to a single value no matter whether it is viewed as 1D, unsqueezed 2D, or anything else. Those operations are not in-place. An in-place modification that breaks the gradient could look like this:
output[0, 3] = 15.
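To see that error in action, here is a standalone sketch (my own illustration, separate from the code above) where an in-place write really does break backward, because exp saves its output for the backward pass:
import torch
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x.exp()          # exp's backward needs y itself
y[0] = 15.0          # in-place write bumps y's version counter
y.sum().backward()   # RuntimeError: one of the variables needed for gradient
                     # computation has been modified by an inplace operation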
Also, your solution returns this:
tensor([[1., 2., 3., 0., 0.],
[0., 1., 2., 3., 0.],
[0., 0., 1., 2., 3.],
[0., 0., 0., 1., 2.],
[3., 0., 0., 0., 1.]], grad_fn=<ViewBackward>)
so you have a 3 in the bottom-left corner. If that's not what you expect, you should add this line (multiplication by an upper triangular matrix of ones) after output = output.flatten()[:-T].reshape(T, T):
output *= torch.triu(torch.ones_like(output))
which gives:
tensor([[1., 2., 3., 0., 0.],
[0., 1., 2., 3., 0.],
[0., 0., 1., 2., 3.],
[0., 0., 0., 1., 2.],
[0., 0., 0., 0., 1.]], grad_fn=<AsStridedBackward>)
And inputs.grad:
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 0.],
[1., 0., 0.]])
You can do this column by column with PyTorch:
import torch
import numpy as np
T, K = 5, 3  # sizes from the example above
# input is a T x K tensor
input = torch.ones((T, K))
index = torch.tensor(np.linspace(0, T - 1, num=T, dtype=np.int64))
output = torch.zeros((T, T))
# place column k of the input on the k-th superdiagonal of the output
output[index, index] = input[:, 0]
for k in range(1, K):
    output[index[:-k], index[:-k] + k] = input[:-k, k]
print(output)
My understanding is that the JaccardIndex of a tensor with itself should return 1, considering the intersection and union of a set with itself is always the set itself.
However, when I experiment with the JaccardIndex class from the torchmetrics library, I see the following:
from torchmetrics import JaccardIndex
import torch
pred = torch.tensor([1, 2, 19, 17, 17])
target = torch.tensor([1, 2, 3, 17, 4])
jaccard = JaccardIndex(num_classes=21)
jaccard(pred, pred)
Out[13]: tensor(0.1905)
jaccard(target, pred)
Out[14]: tensor(0.1190)
So instead of 1 the similarity of pred to itself is 0.1905.
Why is this so?
What am I missing?
The metric is averaging the Jaccard score over the 21 classes you defined. Only 4 of those classes (1, 2, 17 and 19) appear in pred, each with a per-class score of 1, so the average is 4/21 ≈ 0.1905.
If you pass average=None to the metric, it will show the Jaccard score per class:
from torchmetrics import JaccardIndex
import torch
pred = torch.tensor([1, 2, 19, 17, 17])
target = torch.tensor([1, 2, 3, 17, 4])
jaccard = JaccardIndex(num_classes=21, average=None)
jaccard(pred, pred)
Out: tensor([0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
0., 1., 0.])
Source: Docs
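As a quick sanity check (a small sketch of my own, not from the original answer), averaging those per-class scores reproduces the 0.1905 from the question:
import torch
# classes 1, 2, 17 and 19 are the only ones present in pred, each scoring 1.0
per_class = torch.zeros(21)
per_class[[1, 2, 17, 19]] = 1.0
print(per_class.mean())   # tensor(0.1905), i.e. 4 / 21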
Problem I am trying to solve:
Compressing training instances by aggregating the label (weighted mean) and summing the weight over instances that share the same feature, while keeping binary log loss the same as cross-entropy loss. Here is an example, and the log_loss test cases below show that binary log loss is equivalent to weighted log loss.
original data: compressed_data
feature, label, weight, prediction feature, label, weight, prediction
x1, 1, 1, 0.8 x1, 1/3, 3, 0.8
x1, 0, 2, 0.8 -->
x2, 1, 2, 0.1 x2, 2/3, 3, 0.1
x2, 0, 1, 0.1
x3, 1, 1, 0.9 x3, 1, 1, 0.9
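To make the aggregation rule explicit, here is a small illustrative check (my own sketch, plain numpy): the compressed label is the weighted mean of the original labels and the compressed weight is the sum of the original weights, e.g. for x1:
import numpy as np
labels = np.array([1, 0])    # original labels for feature x1
weights = np.array([1, 2])   # original weights for feature x1
print(np.average(labels, weights=weights))   # 0.3333... -> compressed label 1/3
print(weights.sum())                         # 3         -> compressed weight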
Issue: binary log loss is not always equivalent to cross-entropy loss in LightGBM. The change in model performance metrics (such as log loss, average precision and ROC AUC) is mild, but the change in the actual predictions and their distribution is quite significant. Experiment 1 shows that they are equivalent in the binary-label case, while Experiment 2 shows there are cases where binary log loss does not align with cross entropy (see the examples for more details).
First, verify with numpy that binary log loss is the same as cross-entropy loss:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from lightgbm.sklearn import LGBMRegressor, LGBMClassifier
import lightgbm
# use X of cancer data as training feature for both experiment 1 and 2
X, _ = load_breast_cancer(return_X_y=True)
def logloss(y_true, y_pred, weight):
    l = np.mean((-(y_true * np.log(y_pred)) - ((1 - y_true) * np.log(1 - y_pred))) * weight)
    # normalize loss
    l = l * y_true.shape[0] / weight.sum()
    return l
"""
feature, label, weight, prediction feature, label, weight, prediction
x1, 1, 1/3, 0.7
x1, 1, 1/3, 0.7 --> x1, 2/3, 1, 0.7
x1, 0, 1/3, 0.7
"""
l1 = logloss(np.array([1,1,0]), np.array([0.7,0.7,.7]), np.array([1/3,1/3,1/3]))
l2 = logloss(np.array([2/3]), np.array([0.7]), np.array([1]))
"""
feature, label, weight, prediction feature, label, weight, prediction
x1, 1, 1, 0.8 x1, 1/3, 3, 0.8
x1, 0, 2, 0.8 -->
x2, 1, 2, 0.1 x2, 2/3, 3, 0.1
x2, 0, 1, 0.1
x3, 1, 1, 0.9 x3, 1, 1, 0.9
"""
l3 = logloss(np.array([1,0,1,0,1]),
np.array([0.8,0.8,0.1,0.1,0.9]),
np.array([1,2,2,1,1]))
l4 = logloss(np.array([1/3,2/3,1]), np.array([0.8,0.1,0.9]), np.array([3,3,1]))
np.testing.assert_almost_equal(l1, l2, decimal=4)
np.testing.assert_almost_equal(l3, l4, decimal=4)
Experiment 1 (binary log loss is equivalent to cross-entropy loss in the binary-label case):
######## data for experiment 1
np.random.seed(42)
n = X.shape[0]
y_binary = np.random.randint(0,2,size=(n))
eps = 1e-2
y_float = np.random.uniform(eps,1-eps,size=(n))
lgbm_params = {
'boosting_type': 'gbdt',
'class_weight': None,
'colsample_bytree':1,
'importance_type': 'split',
'learning_rate': 0.06472914709339864,
'max_depth': 46,
'min_child_weight': 0.001,
'min_split_gain': 0.0,
'n_estimators': 20,
'n_jobs': 1,
'num_leaves': 178,
'random_state': 1574094090,
'reg_alpha': 0.4894283599023894,
'reg_lambda': 0.09743058458885945,
'silent': True,
'subsample':1,
# 'subsample_for_bin': 200000, # try larger values (10M+)
# 'subsample_freq': 252,
'min_data_in_bin':1,
'min_child_samples':1,
}
X_train_array, X_test_array, y_train_binary, y_test_binary, y_train_float, y_test_float = \
train_test_split(X, y_binary, y_float, test_size=0.3, random_state=1)
##### binary-label case in the sklearn API: the binary objective is equivalent to the cross_entropy objective
binary_model1 = LGBMClassifier(objective='binary')
binary_model1.set_params(**lgbm_params)
binary_model1.fit(
X_train_array,
y_train_binary,
sample_weight=np.ones(X_train_array.shape[0])
)
binary_model2 = LGBMRegressor(objective='cross_entropy')
binary_model2.set_params(**lgbm_params)
binary_model2.fit(
X_train_array,
y_train_binary,
sample_weight=np.ones(X_train_array.shape[0])
)
binary_pred_1 = binary_model1.predict_proba(X_test_array)[:,1]
binary_pred_2 = binary_model2.predict(X_test_array)
binary_y_pred_diff = binary_pred_1-binary_pred_2
# binary log loss and cross_entropy loss are same given binary labels
np.testing.assert_almost_equal(binary_pred_1, binary_pred_2, decimal=4)
Experiment 2: cross-entropy loss can be different from log loss (not sure why):
######## data for experiment 2
def make_compressed_df(X, fixed_ratio=None):
    """
    This function simulates compressed data: instances with the same feature are deduped,
    the label becomes the mean of those instance labels, and the weight becomes the sum of
    those instance weights.
    args:
        fixed_ratio: int or None; if int, the ratio of pos_count/neg_count is constant
                     (the key of the experiment!)
    ex.
    original data:           compressed data:
    feature, label, weight   feature, label, pos_count, neg_count, weight
    x1, 1, 1
    x1, 1, 1            -->  x1, 2/3, 2, 1, 3
    x1, 0, 1
    -------------------------------------------------
    x2, 0, 1
    x2, 1, 1            -->  x2, 1/2, 1, 1, 2
    -------------------------------------------------
    x3, 1, 1
    x3, 1, 1            -->  x3, 2/2, 2, 0, 2
    """
    compressed_df = pd.DataFrame(X)
    pos_count = np.random.randint(1, 3, size=(X.shape[0]))
    compressed_df['pos_count'] = pos_count
    if fixed_ratio:
        compressed_df['neg_count'] = int(fixed_ratio) * compressed_df['pos_count']
    else:
        neg_count = np.random.randint(1, 3, size=(X.shape[0]))
        compressed_df['neg_count'] = neg_count
    compressed_df['total_count'] = compressed_df['pos_count'] + compressed_df['neg_count']
    compressed_df['weight'] = compressed_df['pos_count'] + compressed_df['neg_count']
    compressed_df['label'] = compressed_df['pos_count'] / compressed_df['total_count']
    return compressed_df
def restore_data(df):
    """
    Restore the original features, labels and weight based on pos_count and neg_count.
    Instances with the same feature are repeated (pos_count + neg_count) times, the labels
    become [1]*pos_count + [0]*neg_count, and the weight becomes weight/(pos_count + neg_count).
    ex.
    compressed data:                               original data:
    feature, label, pos_count, neg_count, weight   feature, label, weight
                                                   x1, 1, 1
    x1, 2/3, 2, 1, 3                          -->  x1, 1, 1
                                                   x1, 0, 1
    -------------------------------------------------
                                                   x2, 0, 1
    x2, 1/2, 1, 1, 2                          -->  x2, 1, 1
    -------------------------------------------------
                                                   x3, 1, 1
    x3, 2/2, 2, 0, 2                          -->  x3, 1, 1
    """
    pos_df = df.loc[df.index.repeat(df['pos_count'])].copy()
    pos_df['label'] = 1
    neg_df = df.loc[df.index.repeat(df['neg_count'])].copy()
    neg_df['label'] = 0
    df = pd.concat([pos_df, neg_df], axis=0)
    del pos_df, neg_df
    df['weight'] = df['weight'] / df['total_count']
    df = df.drop(['pos_count', 'neg_count', 'total_count'], axis=1)
    return df
def make_compressed_and_restored_data(X, fixed_ratio):
    np.random.seed(42)
    compressed_df = make_compressed_df(X, fixed_ratio)
    compressed_train_df, compressed_test_df = train_test_split(
        compressed_df, test_size=0.3, random_state=1)
    restored_train_df = restore_data(compressed_train_df)
    restored_test_df = restore_data(compressed_test_df)
    return (compressed_train_df, compressed_test_df), (restored_train_df, restored_test_df)
# when ratio of pos_count/neg_count is not fixed, objectives are different
(compressed_train_random_ratio_df, compressed_test_df), \
(restored_train_random_ratio_df, restored_test_random_ratio_df) = \
make_compressed_and_restored_data(X, fixed_ratio=None)
model1 = LGBMClassifier(objective='binary')
model1.set_params(**lgbm_params)
model1.fit(
restored_train_random_ratio_df.iloc[:,:30],
restored_train_random_ratio_df['label'],
sample_weight=restored_train_random_ratio_df['weight']
)
model2 = LGBMRegressor(objective='cross_entropy')
model2.set_params(**lgbm_params)
model2.fit(
compressed_train_random_ratio_df.iloc[:,:30],
compressed_train_random_ratio_df['label'],
sample_weight=compressed_train_random_ratio_df['weight']
)
y1 = model1.predict_proba(compressed_test_df.iloc[:,:30])[:,1]
y2 = model2.predict(compressed_test_df.iloc[:,:30])
# this assertion fails
np.testing.assert_almost_equal(y1, y2, decimal=4)
# when ratio of pos_count/neg_count is fixed, objectives are same
(compressed_train_fixed_ratio_df, compressed_test_fixed_ratio_df), \
(restored_train_fixed_ratio_df, restored_test_fixed_ratio_df) = \
make_compressed_and_restored_data(X, fixed_ratio=2)
model3 = LGBMClassifier(objective='binary')
model3.set_params(**lgbm_params)
model3.fit(
restored_train_fixed_ratio_df.iloc[:,:30],
restored_train_fixed_ratio_df['label'],
sample_weight=restored_train_fixed_ratio_df['weight']
)
model4 = LGBMRegressor(objective='cross_entropy')
model4.set_params(**lgbm_params)
model4.fit(
compressed_train_fixed_ratio_df.iloc[:,:30],
compressed_train_fixed_ratio_df['label'],
sample_weight=compressed_train_fixed_ratio_df['weight']
)
y3 = model3.predict_proba(compressed_test_fixed_ratio_df.iloc[:,:30])[:,1]
y4 = model4.predict(compressed_test_fixed_ratio_df.iloc[:,:30])
# this assertion passes
np.testing.assert_almost_equal(y3, y4, decimal=4)
It looks like this question was cross-posted here and in the official LightGBM repo.
LightGBM maintainers have provided an answer there: https://github.com/microsoft/LightGBM/issues/3576.
I have a vector y = [1; 1; 2; 3] and a matrix Y = zeros(4, 3).
I need to set to 1 the entry in each row of Y whose column corresponds to the value of the vector y, i.e.
Y = [1, 0, 0; 1, 0, 0; 0, 1, 0; 0, 0, 1]
Y(y) or Y(:, y) does not give me the result I need!
Any idea how I could achieve this?
You need to convert those column indices into linear indices. You do it like so:
octave:1> A = zeros (4, 3);
octave:2> c_sub = [1, 1, 2, 3];
octave:3> ind = sub2ind (size (A), 1:rows(A), c_sub)
ind =
1 2 7 12
octave:4> A(ind) = 1
A =
1 0 0
1 0 0
0 1 0
0 0 1
However, if your matrix is that sparse, do create a proper sparse matrix:
octave:4> sparse (1:4, c_sub, 1, 4, 3)
ans =
Compressed Column Sparse (rows = 4, cols = 3, nnz = 4 [33%])
(1, 1) -> 1
(2, 1) -> 1
(3, 2) -> 1
(4, 3) -> 1
and maybe consider using a logical matrix (use false instead of zeros and true instead of 1).
In the past I've often used loops of the following kind (Haskell example):
upperBoundToTuples :: Int -> [(Int, Int)]
upperBoundToTuples n = [(x,y) | x <- [0..n], y <- [x+1..n]]
The above code produces the tuples (x, y) with 0 <= x < y <= n.
I was wondering if there was an efficient way of getting those (x, y) indices given a single index? Possible applications include optimization problems on the GPU, where loops are not allowed and each thread only gets an index.
Also if it is possible for the 2D case, could such an algorithm be generalized to multiple dimensions?
You're asking for a bijection from [0, N(N+1)/2) to pairs (x, y) with 0 <= x < y <= N.
Here's one simple way to define it (in pseudocode, but should be trivial to convert to Haskell):
x0, y0 = i / (N + 1), i % (N + 1)
if x0 < y0 then result = (x0, y0)
else result = (N - 1 - x0, N - y0)
Here's a visualisation of the function for N=6. The map is laid out in a table with rows of length N+1=7, with the first row representing the value of the function for i=0 to 6, the next i=7 to 13 and so on. If you look very closely and carefully, you can see that things above the leading diagonal map to their own location in the table, and things on or below the diagonal map rotationally to the later entries.
5,6 0,1 0,2 0,3 0,4 0,5 0,6
4,6 4,5 1,2 1,3 1,4 1,5 1,6
3,6 3,5 3,4 2,3 2,4 2,5 2,6
And here's the opposite of this visualisation: a table T of size (N+1) by (N+1) with T[x, y] = i where i is mapped to (x, y) by the function above.
- 1 2 3 4 5 6
- - 9 10 11 12 13
- - - 17 18 19 20
- - - - 16 15 14
- - - - - 8 7
- - - - - - 0
- - - - - - -
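If it helps, here is a small Python sketch of the same map (my own translation of the pseudocode above, using integer division), which reproduces the N=6 layout row by row:
def pair(i, n):
    # bijection from [0, n*(n+1)/2) to pairs (x, y) with 0 <= x < y <= n
    x0, y0 = divmod(i, n + 1)
    return (x0, y0) if x0 < y0 else (n - 1 - x0, n - y0)
print([pair(i, 6) for i in range(6 * 7 // 2)])
# [(5, 6), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (4, 6), (4, 5), (1, 2), ...]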
Higher dimensions
This method can probably be made to work in higher dimensions, but I don't immediately see how. As an alternative, here's a simple but somewhat inefficient method that does work in arbitrary dimensions.
First, note there are choose(N + 1, k) increasing sequences of length k drawn from the numbers 0 to N (where choose(N, k) is the binomial coefficient). Of those, choose(N, k - 1) end with N. That gives this recursive function, which generates the sequences in descending colexicographical order (again in pseudocode):
sequence(N, k, index)
= [] if k == 0
= sequence(N - 1, k - 1, index) + [N] if index < choose(N, k - 1)
= sequence(N - 1, k, index - choose(N, k - 1)) otherwise
Here's sequence(5, 3, index) for index between 0 and 19:
0 -> [3, 4, 5]
1 -> [2, 4, 5]
2 -> [1, 4, 5]
3 -> [0, 4, 5]
4 -> [2, 3, 5]
5 -> [1, 3, 5]
6 -> [0, 3, 5]
7 -> [1, 2, 5]
8 -> [0, 2, 5]
9 -> [0, 1, 5]
10 -> [2, 3, 4]
11 -> [1, 3, 4]
12 -> [0, 3, 4]
13 -> [1, 2, 4]
14 -> [0, 2, 4]
15 -> [0, 1, 4]
16 -> [1, 2, 3]
17 -> [0, 2, 3]
18 -> [0, 1, 3]
19 -> [0, 1, 2]
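Here is a direct Python transcription of that recursion (my own sketch, with math.comb playing the role of choose), which reproduces the listing above:
from math import comb
def sequence(n, k, index):
    # index-th increasing k-subsequence of 0..n, in the order described above
    if k == 0:
        return []
    if index < comb(n, k - 1):
        return sequence(n - 1, k - 1, index) + [n]
    return sequence(n - 1, k, index - comb(n, k - 1))
print([sequence(5, 3, i) for i in range(20)])
# [[3, 4, 5], [2, 4, 5], [1, 4, 5], [0, 4, 5], [2, 3, 5], ...]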
We may equivalently consider [(x,y) | x<-[0..n], y<-[0..x-1]]. This list has length
ℓ_n = Σ_{x=0..n} x = n·(n+1)/2.
Hence we can get, for a given ℓ, the nearest lower n through
2·ℓ_n = n·(n+1) = n² + n
n = −½ ± √(¼ + 2·ℓ_n)
In particular, for a given index i,
n_i⁻ = ⌊−½ + √(¼ + 2·i)⌋
is the x-length of the last fully completed triangle. Thus, the index i lies in row n_i⁻ + 1. That triangle had an area of
ℓ_{n_i⁻} = n_i⁻·(n_i⁻+1)/2
which we therefore need to subtract from i to get the remainder index (in the y-direction). This gives rise to the definition
lowerTriangularTuple :: Int -> (Int,Int)
lowerTriangularTuple i = (nmin+1, i - (nmin*(nmin+1))`div`2)
where nmin = floor $ -1/2 + sqrt(1/4 + 2 * fromIntegral i)
Example:
GHCi> lowerTriangularTuple <$> [0..30]
[(1,0),(2,0),(2,1),(3,0),(3,1),(3,2),(4,0),(4,1),(4,2),(4,3),(5,0),(5,1),(5,2),(5,3),(5,4),(6,0),(6,1),(6,2),(6,3),(6,4),(6,5),(7,0),(7,1),(7,2),(7,3),(7,4),(7,5),(7,6),(8,0),(8,1),(8,2)]
I have 2 vectors that are the x and y coordinates of the 8 vertices of a polygon:
x=[5 5 7 7 9 9 5 7]
y=[8 6 6 8 6 8 10 10]
I want to sort them (clockwise) to obtain the right vectors (to draw the polygon correctly):
x=[5 7 9 9 7 7 5 5]
y=[6 6 6 8 8 10 10 8]
Step 1: Find the unweighted mean of the vertices:
cx = mean(x);
cy = mean(y);
Step 2: Find the angles:
a = atan2(y - cy, x - cx);
Step 3: Find the correct sorted order:
[~, order] = sort(a);
Step 4: Reorder the coordinates:
x = x(order);
y = y(order);
Python version (numpy) for Ben Voigt's algorithm:
import numpy as np
def clockwise(points):
    x = points[0,:]
    y = points[1,:]
    cx = np.mean(x)
    cy = np.mean(y)
    a = np.arctan2(y - cy, x - cx)
    order = a.ravel().argsort()
    x = x[order]
    y = y[order]
    return np.vstack([x,y])
Example:
In [281]: pts
Out[281]:
array([[7, 2, 2, 7],
[5, 1, 5, 1]])
In [282]: clockwise(pts)
Out[282]:
array([[2, 7, 7, 2],
[1, 1, 5, 5]])
I tried the solutions by @Ben Voigt and @mclafee, but I think they are sorting the wrong way.
When using atan2 the angles are stated in the following way:
Matlab Atan2
The angle is positive for counter-clockwise angles (upper half-plane,
y > 0), and negative for clockwise angles (lower half-plane, y < 0).
Wikipedia Atan2
This means that using ascending sort() of Numpy or Matlab will progress counterclockwise.
This can be verified using the Shoelace equation
Wikipedia Shoelace
Python Shoelace
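As a small numpy sketch of that check (my own addition, not from the linked pages): the shoelace signed area is positive for counter-clockwise vertex order (with the y axis pointing up) and negative for clockwise order:
import numpy as np
def signed_area(x, y):
    # shoelace formula: 0.5 * sum(x_i * y_{i+1} - x_{i+1} * y_i) over the closed polygon
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)
# the earlier ascending-sort output [[2, 7, 7, 2], [1, 1, 5, 5]] has positive area, i.e. counter-clockwise:
print(signed_area(np.array([2, 7, 7, 2]), np.array([1, 1, 5, 5])))   # 20.0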
So, adjusting the answers mentioned above to use descending sorting, the correct solution in Matlab is:
cx = mean(x);
cy = mean(y);
a = atan2(y - cy, x - cx);
[~, order] = sort(a, 'descend');
x = x(order);
y = y(order);
The solution in numpy is
import numpy as np
def clockwise(points):
    x = points[0,:]
    y = points[1,:]
    cx = np.mean(x)
    cy = np.mean(y)
    a = np.arctan2(y - cy, x - cx)
    order = a.ravel().argsort()[::-1]
    x = x[order]
    y = y[order]
    return np.vstack([x,y])
pts = np.array([[7, 2, 2, 7],
                [5, 1, 5, 1]])
print(pts)
print(clockwise(pts))
pts = np.array([[1.0, 1.0],
                [-1.0, -1.0],
                [1.0, -1.0],
                [-1.0, 1.0]]).transpose()
print(pts)
print(clockwise(pts))
Output:
[[7 2 2 7]
[5 1 5 1]]
[[2 7 7 2]
[5 5 1 1]]
[[ 1. -1. 1. -1.]
[ 1. -1. -1. 1.]]
[[-1. 1. 1. -1.]
[ 1. 1. -1. -1.]]
Please notice the [::-1] used to invert arrays / lists.
This algorithm does not apply to non-convex polygons.
Instead, consider using MATLAB's poly2cw().