gdb query xmm-registers, select representation [duplicate] - debugging

I've been using GDB for 1 day and I've accumulated a decent understanding of it.
However when I set a breakpoint at the final semicolon using GDB and print registers I can't fully interpret the meaning of the data stored into the XMM register.
I don't know if the data is in (MSB > LSB) format or vice versa.
__m128i S = _mm_load_si128((__m128i*)Array16Bytes);
}
So this is the result that I'm getting.
(gdb) print $xmm0
$1 = {
v4_float = {1.2593182e-07, -4.1251766e-18, -5.43431603e-31, -2.73406277e-14},
v2_double = {4.6236050467459811e-58, -3.7422963639201271e-245},
v16_int8 = {52, 7, 55, -32, -94, -104, 49, 49, -115, 48, 90, -120, -88, -10, 67, 50},
v8_int16 = {13319, 14304, -23912, 12593, -29392, 23176, -22282, 17202},
v4_int32 = {872888288, -1567084239, -1926210936, -1460255950},
v2_int64 = {3749026652749312305, -8273012972482837710},
uint128 = 0x340737e0a29831318d305a88a8f64332
}
So would someone kindly guide me how to interpret the data.

SSE (XMM) registers can be interpreted in various different ways. The register itself has no knowledge of the implicit data representation, it just holds 128 bits of data. An XMM register can represent:
4 x 32 bit floats __m128
2 x 64 bit doubles __m128d
16 x 8 bit ints __m128i
8 x 16 bit ints __m128i
4 x 32 bit ints __m128i
2 x 64 bit ints __m128i
128 individual bits __m128i
So when gdb displays an XMM register it gives you all possible interpretations, as seen in your example above.
If you want to display a register using a specific interpretation (e.g. 16 x 8 bit ints) then you can do it like this:
(gdb) p $xmm0.v16_int8
$1 = {0, 0, 0, 0, 0, 0, 0, 0, -113, -32, 32, -50, 0, 0, 0, 2}
As for endianness, gdb displays the register contents in natural order, i.e. left to right, from most significant (MS) to least significant (LS).
So if you have the following code:
#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h> // SSE2: declares __m128i and _mm_load_si128
int main(int argc, char *argv[])
{
int8_t buff[16] __attribute__ ((aligned(16))) = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
__m128i v = _mm_load_si128((__m128i *)buff);
printf("v = %vd\n", v); // %vd is a non-standard vector conversion; not every C library supports it
return 0;
}
If you compile and run this you will see:
v = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
However if you step through the code in gdb and examine v you will see:
v16_int8 = {15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0}

Related

Eigen: Extracting a block misses some entries

I try to take a block in Eigen:
Eigen::VectorXi v = Eigen::VectorXi::Zero(20);
v << 7, 10, 11, 14, 15, 16, 16, 1, 2, 3, 2, 3, 4, 5, 4, 5, 0, 0, 0, 0;
cout << "v = " << v << endl;
v = v.block(0, 0, 16, 1);
cout << "v = "<< v << endl;
Strangely, the first two entries in v will be zero after taking the block.
The output of the program looks as follows:
v = 7 # start original vector from here
10
11
14
15
16
16
1
2
3
2
3
4
5
4
5
0
0
0
0
v = 0 # start block out of vector, why zero here?
0
11
14
15
16
16
1
2
3
2
3
4
5
4
5
What's going wrong here? Could this be a bug in Eigen, or did I misunderstand something in the documentation?
You are experiencing an aliasing problem: v gets resized before v.block(...) gets assigned to the new v. There are two solutions:
Evaluate into a temporary using .eval():
v = v.block(0,0,16,1).eval(); // or
v = v.head(16).eval(); // shorter but equivalent
Or use conservativeResize() (this only works if you want to keep the top-left corner):
v.conservativeResize(16);
For further reference this page summarizes some common aliasing pitfalls (it also mentions the resizing alias you experienced).

How to change the outerstride in Eigen matrices without using map function

I want to change the outerstride of Eigen matrix at compile time without using map function.
I tried to change this using the OuterStrideAtCompileTime variable, but it doesn't work. Is there any way to do that?
One more thing: printing mat.outerStride() always gives the number of rows of the input matrix. How do I print the outer stride of an Eigen matrix?
Thanks in advance.
I was defining an eigen matrix with map function like
MatrixXf mat;
float arr[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
mat = Map<Matrix<float,Dynamic,Dynamic,Eigen::RowMajor>, 0, OuterStride<Dynamic> > (arr,4,4,OuterStride<Dynamic>(5));
It's working fine, whenever I tried to change the outer stride by using
mat.OuterStrideAtCompileTime = 7;
It's not working.
Outer stride is a parameter related to the data storage. A more commonly used name is leading dimension. You can find some explanation here.
http://www.ibm.com/support/knowledgecenter/SSFHY8_5.3.0/com.ibm.cluster.essl.v5r3.essl100.doc/am5gr_leaddi.htm
Basically for an existing matrix, it can not be changed. The only way to change it without changing the elements of the matrix is to copy the matrix to a new memory space using a different outer stride setting. This usually happens when you copy a matrix into another as a sub-matrix.
For a column-major matrix the minimum possible outer stride equals the number of rows, which is the number you have printed out.
When using Eigen, you don't need to worry about it, as Eigen usually takes care of it for you, except for Eigen::Map.
Your code doesn't actually work. Setting the outer stride to 5 already goes out of range: the existing 4x4 matrix stored in arr has stride 4, and stride 5 x 4 rows = 20 > 16.
#include <iostream>
#include <Eigen/Eigen>
int main(void) {
using namespace Eigen;
MatrixXf mat;
float arr[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
mat = Map<Matrix<float, Dynamic, Dynamic, Eigen::RowMajor>, 0,
OuterStride<Dynamic> >(arr, 4, 4, OuterStride<Dynamic>(5));
std::cout << "mat with stride 5:\n" << mat << std::endl;
mat = Map<Matrix<float, Dynamic, Dynamic, Eigen::RowMajor>, 0,
OuterStride<Dynamic> >(arr, 4, 4, OuterStride<Dynamic>(4));
std::cout << "mat with stride 4:\n" << mat << std::endl;
return 0;
}
Please compare the output.
mat with stride 5:
1 2 3 4
6 7 8 9
11 12 13 14
16 0 0 5.01639e-14
mat with stride 4:
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
If you extend the array to 20 elements
#include <iostream>
#include <Eigen/Eigen>
int main(void) {
using namespace Eigen;
MatrixXf mat;
float arr[20] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 };
Map<Matrix<float, Dynamic, Dynamic, Eigen::RowMajor>, 0, OuterStride<Dynamic> > map1(arr, 4, 4, OuterStride<Dynamic>(5));
mat = map1;
std::cout << "map1 outer stride: " << map1.outerStride() << std::endl << map1 << std::endl;
std::cout << "mat outer stride: " << mat.outerStride() << std::endl << mat << std::endl;
Map<Matrix<float, Dynamic, Dynamic, Eigen::RowMajor>, 0, OuterStride<Dynamic> > map2(arr, 4, 4, OuterStride<Dynamic>(4));
mat = map2;
std::cout << "map2 outer stride: " << map2.outerStride() << std::endl << map2 << std::endl;
std::cout << "mat outer stride: " << mat.outerStride() << std::endl << mat << std::endl;
return 0;
}
The output will be
map1 outer stride: 5
1 2 3 4
6 7 8 9
11 12 13 14
16 17 18 19
mat outer stride: 4
1 2 3 4
6 7 8 9
11 12 13 14
16 17 18 19
map2 outer stride: 4
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
mat outer stride: 4
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
You can also see that the outer stride changes when map1 is copied to mat.
I hope this gives you a better view of what the outer stride is.
In fact in your original code, you are using Map in a wrong way - you shouldn't have copied Map() to Matrix mat.
That's why when you print the stride of mat, it is always 4.
What you need to do is to eliminate the unnecessary data copy and print the stride of map1/map2.

Replacing part of std::vector by smaller std::vector

I wonder what would be the correct way to replace (overwriting) a part of a given std::vector "input" by another, smaller std::vector?
I do need to keep the rest of the original vector unchanged.
Also I do not need to bother what has been in the original vector and
I don't need to keep the smaller vector afterwards anymore.
Say I have this:
std::vector<int> input = { 0, 0, 1, 1, 2, 22, 3, 33, 99 };
std::vector<int> a = { 1, 2, 3 };
std::vector<int> b = { 4, 5, 6, 7, 8 };
And I want to achieve that:
input = { 1, 2, 3, 4, 5, 6, 7, 8, 99}
What is the right way to do it? I thought of something like
input.replace(input.begin(), input.begin()+a.size(), a);
// intermediate input would look like that: input = { 1, 2, 3, 1, 2, 22, 3, 33, 99 };
input.replace(input.begin()+a.size(), input.begin()+a.size()+b.size(), b);
There should be a standard way to do it, shouldn't it?
My thoughts on this so far are the following:
I can not use std::vector::assign for it destroys all elements of input
std::vector::push_back would not replace but enlarge the input --> not what I want
std::vector::insert also creates new elements and enlarges the input vector, but I know for sure that a.size() + b.size() <= input.size()
std::vector::swap would not work since there is some content of input that needs to remain there ( in the example the last element) also it would not work to add b that way
std::vector::emplace also increases the input.size -> seems wrong as well
Also I would prefer if the solution would not waste performance by unnecessary clears or writing back values into the vectors a or b. My vectors will be very large for real and this is about performance in the end.
Any competent help would be appreciated very much.
You seem to be after std::copy(). This is how you would use it in your example (live demo on Coliru):
#include <algorithm> // Necessary for `std::copy`...
// ...
std::vector<int> input = { 0, 0, 1, 1, 2, 22, 3, 33, 99 };
std::vector<int> a = { 1, 2, 3 };
std::vector<int> b = { 4, 5, 6, 7, 8 };
std::copy(std::begin(a), std::end(a), std::begin(input));
std::copy(std::begin(b), std::end(b), std::begin(input) + a.size());
As Zyx2000 notes in the comments, in this case you can also use the iterator returned by the first call to std::copy() as the insertion point for the next copy:
auto last = std::copy(std::begin(a), std::end(a), std::begin(input));
std::copy(std::begin(b), std::end(b), last);
This way, random-access iterators are no longer required - that was the case when we had the expression std::begin(input) + a.size().
The first two arguments to std::copy() denote the source range of elements you want to copy. The third argument is an iterator to the first element you want to overwrite in the destination container.
When using std::copy(), make sure that the destination container is large enough to accommodate the number of elements you intend to copy.
Also, the source and the target range should not overlap.
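Try this (note that std::set_union requires both input ranges to be sorted and writes their sorted union, so it only gives the desired result here because a and b are sorted and every element of a is smaller than every element of b):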
#include <iostream>
#include <vector>
#include <algorithm>
int main() {
std::vector<int> input = { 0, 0, 1, 1, 2, 22, 3, 33, 99 };
std::vector<int> a = { 1, 2, 3 };
std::vector<int> b = { 4, 5, 6, 7, 8 };
std::set_union( a.begin(), a.end(), b.begin(), b.end(), input.begin() );
for ( std::vector<int>::const_iterator iter = input.begin();
iter != input.end();
++iter )
{
std::cout << *iter << " ";
}
return 0;
}
It outputs:
1 2 3 4 5 6 7 8 99

XS PPCODE not behaving

I'm working on calling a third-party DLL from my Perl project using XS, under Cygwin on Windows using g++. One of the DLL functions takes a struct as an argument and returns its main results in a pointer to a struct. For now I pass in a flat list of 28 integers and populate the first struct. Then I call the function. Then I want to flatten the resulting struct into a list of up to 54 integers.
(This seems like a lot of integers, but the DLL function is quite complex and takes a long time to run, so I think it's worth it. Unless someone has a better idea?)
This is close to working. I can tell that the results are mostly sensible. But there are two bizarre problems.
When I print out the same variables, I get different results depending on whether it's in a 'for' loop or not! I show this below. I've stared at this so long now.
I get "Out of memory" as soon as I get to the first XPUSHs.
Here is the XS code.
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "ppport.h"
#include "dll.h"
MODULE = Bridge::Solver::DDS_IF PACKAGE = Bridge::Solver::DDS_IF
PROTOTYPES: ENABLE
void
SolveBoard(inlist)
SV * inlist
INIT:
struct deal dl;
struct futureTricks fut;
int target, solutions, mode, thrId;
int i, j, ret;
if ((! SvROK(inlist)) ||
(SvTYPE(SvRV(inlist)) != SVt_PVAV) ||
av_len((AV *) SvRV(inlist)) != 27)
{
XSRETURN_UNDEF;
}
printf("New INIT OK\n");
PPCODE:
dl.trump = SvIV(*av_fetch((AV *)SvRV(inlist), 0, 0));
dl.first = SvIV(*av_fetch((AV *)SvRV(inlist), 1, 0));
dl.currentTrickSuit[0] = SvIV(*av_fetch((AV *)SvRV(inlist), 2, 0));
dl.currentTrickSuit[1] = SvIV(*av_fetch((AV *)SvRV(inlist), 3, 0));
dl.currentTrickSuit[2] = SvIV(*av_fetch((AV *)SvRV(inlist), 4, 0));
dl.currentTrickRank[0] = SvIV(*av_fetch((AV *)SvRV(inlist), 5, 0));
dl.currentTrickRank[1] = SvIV(*av_fetch((AV *)SvRV(inlist), 6, 0));
dl.currentTrickRank[2] = SvIV(*av_fetch((AV *)SvRV(inlist), 7, 0));
dl.remainCards[0][0] = SvUV(*av_fetch((AV *)SvRV(inlist), 8, 0));
dl.remainCards[0][1] = SvUV(*av_fetch((AV *)SvRV(inlist), 9, 0));
dl.remainCards[0][2] = SvUV(*av_fetch((AV *)SvRV(inlist), 10, 0));
dl.remainCards[0][3] = SvUV(*av_fetch((AV *)SvRV(inlist), 11, 0));
dl.remainCards[1][0] = SvUV(*av_fetch((AV *)SvRV(inlist), 12, 0));
dl.remainCards[1][1] = SvUV(*av_fetch((AV *)SvRV(inlist), 13, 0));
dl.remainCards[1][2] = SvUV(*av_fetch((AV *)SvRV(inlist), 14, 0));
dl.remainCards[1][3] = SvUV(*av_fetch((AV *)SvRV(inlist), 15, 0));
dl.remainCards[2][0] = SvUV(*av_fetch((AV *)SvRV(inlist), 16, 0));
dl.remainCards[2][1] = SvUV(*av_fetch((AV *)SvRV(inlist), 17, 0));
dl.remainCards[2][2] = SvUV(*av_fetch((AV *)SvRV(inlist), 18, 0));
dl.remainCards[2][3] = SvUV(*av_fetch((AV *)SvRV(inlist), 19, 0));
dl.remainCards[3][0] = SvUV(*av_fetch((AV *)SvRV(inlist), 20, 0));
dl.remainCards[3][1] = SvUV(*av_fetch((AV *)SvRV(inlist), 21, 0));
dl.remainCards[3][2] = SvUV(*av_fetch((AV *)SvRV(inlist), 22, 0));
dl.remainCards[3][3] = SvUV(*av_fetch((AV *)SvRV(inlist), 23, 0));
target = SvIV(*av_fetch((AV *)SvRV(inlist), 24, 0));
solutions = SvIV(*av_fetch((AV *)SvRV(inlist), 25, 0));
mode = SvIV(*av_fetch((AV *)SvRV(inlist), 26, 0));
thrId = SvIV(*av_fetch((AV *)SvRV(inlist), 27, 0));
ret = SolveBoard(dl, target, solutions, mode, &fut, thrId);
printf("Return code %d\n", ret);
printf("Nodes %d\n", fut.nodes);
printf("Cards %d\n", fut.cards);
printf("%6s %12s %12s %12s %12s\n",
"", "suit", "rank", "equals", "score");
printf("%6d %12d %12d %12d %12d\n\n",
0, fut.suit[0], fut.rank[0], fut.equals[0], fut.score[0]);
for (i = 0; i < 13; i++)
{
printf("%6d %12d %12d %12d %12d\n",
i, fut.suit[i], fut.rank[i], fut.equals[i], fut.score[i]);
}
printf("\n%6d %12d %12d %12d %12d\n\n",
0, fut.suit[0], fut.rank[0], fut.equals[0], fut.score[0]);
printf("Trying to push nodes\n");
XPUSHs(sv_2mortal(newSViv(fut.nodes)));
printf("Trying to push cards\n");
XPUSHs(sv_2mortal(newSViv(fut.cards)));
printf("Trying to loop\n");
for (i = 0; i <= 12; i++)
{
XPUSHs(sv_2mortal(newSViv(fut.suit [i])));
XPUSHs(sv_2mortal(newSViv(fut.rank [i])));
XPUSHs(sv_2mortal(newSViv(fut.equals[i])));
XPUSHs(sv_2mortal(newSViv(fut.score [i])));
}
printf("Done looping\n");
Here is the relevant part of the DLL header file.
struct futureTricks
{
int nodes;
int cards;
int suit[13];
int rank[13];
int equals[13];
int score[13];
};
struct deal
{
int trump;
int first;
int currentTrickSuit[3];
int currentTrickRank[3];
unsigned int remainCards[4][4];
};
extern "C" int SolveBoard(
struct deal dl,
int target,
int solutions,
int mode,
struct futureTricks *futp,
int threadIndex);
And here is the output. The return code is OK. The nodes and cards are not. If you squint, you might notice that 0 and 768 also occur within the output table, so maybe there's some kind of offset going on.
The first bizarre thing is that the two '0' lines before and after the main table are different from the '0' line in the main table. The data in the main table is as expected, though, including the garbage in lines 10-12.
The second problem is that XPUSHs doesn't do as intended.
New INIT OK
Return code 1
Nodes 0
Cards 768
suit rank equals score
0 0 2 -2147319000 -2147296756
0 2 2 0 2
1 2 6 0 2
2 2 10 768 2
3 2 13 0 2
4 3 14 0 2
5 0 6 0 1
6 0 10 512 1
7 0 13 0 1
8 3 4 0 0
9 3 11 0 0
10 1773292640 -2147056120 4 -2147319000
11 1772354411 0 -2146989552 -2146837752
12 8192 35 2665016 -2147319000
0 0 2 -2147319000 -2147296756
Trying to push nodes
Out of memory!
It was indeed a problem with the stack.
The supplied dll.h tested _WIN32 and #define'd STDCALL to __stdcall under _WIN32, otherwise to empty.
g++ under Cygwin does not define _WIN32, so STDCALL expanded to nothing and I guess the calling convention defaulted to __cdecl.
Manually defining _WIN32 created lots of other errors, so instead I added to dll.h a test for __CYGWIN__, which the compiler does define, and gave the change to the author for his next release.
A very frustrating error to find, so I hope this might help somebody else in the future. You never know...
As for the offset problem, it may be because Perl's headers interfere quite badly with C definitions.
Including dll.h before all the other headers will probably solve that.

Hand tracing a pseudo code

I have this pseudo code that I need to hand trace:
begin
count <- 1
while count < 11
t <- (count ^ 2) - 1
output t
count <- count + 1
endwhile
end
I am unsure what <- means and I don't really understand what to do with the t. I also keep getting 1,1,1, etc. every time I go through. Any help would be appreciated!
First off the operator <- means "gets", as in an assignment. So:
count <- count + 1
Means to set the variable count to the value count + 1.
Second, the program will output the first 10 values of x^2 - 1, so:
t <- count^2 - 1
will evaluate to:
0, 3, 8, 15, 24, 35, 48, 63, 80, 99
for the values of count
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
respectively.
Here is the code for it in C++, hope it helps:
#include <iostream>
int main() {
int count = 1; // count <- 1
int t;
while (count < 11) { // while count < 11
t = count * count - 1; // t <- (count ^ 2) - 1
std::cout << t << std::endl; // output t
count++; // count <- count + 1
} // endwhile
return 0;
}
And as said in the previous answer:
count takes the values: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
and t will take the values: 0, 3, 8, 15, 24, 35, 48, 63, 80, 99
