I'm new to kernel development, and I need to write a Linux kernel module that performs several matrix multiplications (I'm working on an x86_64 platform). I'm trying to use fixed-point values for these operations, but during compilation the compiler reports this error:
error: SSE register return with SSE disabled
I don't know much about SSE or this issue in particular, but from what I've found, and according to most answers to questions about this problem, it is related to the use of floating-point (FP) arithmetic in kernel space, which is rarely a good idea (hence the use of fixed-point arithmetic). This error seems weird to me because I'm pretty sure I'm not using any FP values or operations, yet it keeps popping up, and in ways that seem strange to me. For instance, I have this block of code:
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
const int scale = 16;
#define DOUBLE_TO_FIXED(x) ((x) * (1 << scale))
#define FIXED_TO_DOUBLE(x) ((x) / (1 << scale))
#define MULT(x, y) ((((x) >> 8) * ((y) >> 8)) >> 0)
#define DIV(x, y) (((x) << 8) / (y) << 8)
#define OUTPUT_ROWS 6
#define OUTPUT_COLUMNS 2
struct matrix {
    int rows;
    int cols;
    double *data;
};

double outputlayer_weights[OUTPUT_ROWS * OUTPUT_COLUMNS] =
{
     0.7977986 , -0.77172316,
    -0.43078753,  0.67738613,
    -1.04312621,  1.0552227 ,
    -0.32619684,  0.14119884,
    -0.72325027,  0.64673559,
     0.58467862, -0.06229197
};
...
void matmul (struct matrix *A, struct matrix *B, struct matrix *C) {
    int i, j, k, a, b, sum, fixed_prod;

    if (A->cols != B->rows) {
        return;
    }

    for (i = 0; i < A->rows; i++) {
        for (j = 0; j < B->cols; j++) {
            sum = 0;
            for (k = 0; k < A->cols; k++) {
                a = DOUBLE_TO_FIXED(A->data[i * A->rows + k]);
                b = DOUBLE_TO_FIXED(B->data[k * B->rows + j]);
                fixed_prod = MULT(a, b);
                sum += fixed_prod;
            }
            /* Commented the following line, causes error */
            //C->data[i * C->rows + j] = sum;
        }
    }
}
...
static int __init insert_matmul_init (void)
{
    printk(KERN_INFO "INSERTING MATMUL");
    return 0;
}

static void __exit insert_matmul_exit (void)
{
    printk(KERN_INFO "REMOVING MATMUL");
}

module_init (insert_matmul_init);
module_exit (insert_matmul_exit);
which compiles with no errors (I left out code that I found irrelevant to the problem). I have made sure to comment out any error-prone lines to get to a point where the module compiles cleanly, and I am trying to solve the errors one by one. However, when I uncomment this line:
C->data[i * C->rows + j] = sum;
I get this error message on a previous (unmodified) line of code:
error: SSE register return with SSE disabled
sum += fixed_prod;
~~~~^~~~~~~~~~~~~
From what I understand, there are no FP operations taking place, at least in this section, so I need help figuring out what might be causing this error. Maybe my fixed-point implementation is flawed (I'm no expert in that matter either), or maybe I'm missing something obvious. Just in case, I have tested the same logic in a user-space program (using floating-point values) and it works fine. In any case, any help in solving this issue would be appreciated. Thanks in advance!
Edit: I have included the definition of struct matrix and an example matrix. I have been using the default kbuild command for building external modules; here is what my Makefile looks like:
obj-m = matrix_mult.o
KVERSION = $(shell uname -r)
all:
	make -C /lib/modules/$(KVERSION)/build M=$(PWD) modules
Linux compiles kernel code with -mgeneral-regs-only on x86, which produces this error in functions that do anything with FP or SIMD. (Except via inline asm, because then the compiler doesn't see the FP instructions, only the assembler does.)
> From what I understand, there are no FP operations taking place, at least in this section, so I need help figuring out what might be causing this error.
GCC optimizes whole functions when optimization is enabled, and you are using FP inside that function: your macro does an FP multiply and then a truncating conversion to integer when the result is assigned to an int, since the MCVE you eventually provided shows struct matrix containing double *data.
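Spelled out, here is roughly what the compiler sees after expanding a = DOUBLE_TO_FIXED(A->data[...]) when data is a double * (a sketch, not your exact code):

/* An FP multiply followed by a truncating double-to-int conversion:
 * exactly the mulsd + cvttsd2si pair that needs SSE registers. */
static const int scale = 16;

int to_fixed(const double *data, int idx)
{
    double tmp = data[idx] * (1 << scale); /* FP multiply: mulsd     */
    return (int)tmp;                       /* truncation:  cvttsd2si */
}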
If you stop the compiler from using FP instructions (like Linux does by building with -mgeneral-regs-only), it refuses to compile your file instead of doing software floating-point.
The only odd thing is that it pins the error on an integer += instead of on one of the statements that compiles to a mulsd and a cvttsd2si.
If you disable optimization (-O0 -mgeneral-regs-only) you get a more obvious location for the same error (https://godbolt.org/z/Tv5nG6nd4):
<source>: In function 'void matmul(matrix*, matrix*, matrix*)':
<source>:9:33: error: SSE register return with SSE disabled
9 | #define DOUBLE_TO_FIXED(x) ((x) * (1 << scale))
| ~~~~~^~~~~~~~~~~~~~~
<source>:46:21: note: in expansion of macro 'DOUBLE_TO_FIXED'
46 | a = DOUBLE_TO_FIXED(A->data[i * A->rows + k]);
| ^~~~~~~~~~~~~~~
If you really want to know what's going on with the GCC internals, you could dig into it with -fdump-tree-... options, e.g. on the Godbolt compiler explorer there's a dropdown for GCC Tree / RTL output that would let you look at the GIMPLE or RTL internal representation of your function's logic after various analyzer passes.
But if you just want to know whether there's a way to make this function work: no, obviously not, unless you compile its file without -mgeneral-regs-only. All functions in a file compiled that way must only be called by callers that have used kernel_fpu_begin() before the call (and kernel_fpu_end() after).
You can't safely use kernel_fpu_begin() inside a function compiled to allow it to use SSE / x87 registers; after optimization, it might already have corrupted user-space FPU state before the call to kernel_fpu_begin() is reached. The symptom of getting this wrong is not a fault but corrupted user-space state, so don't assume that "happens to work" means "correct". Also, depending on how GCC optimizes, the code-gen might be fine with your version but broken with earlier or later GCC or clang versions. I somewhat expect that kernel_fpu_begin() at the top of this function would get called before the compiler did anything with FP instructions, but that doesn't mean it would be safe and correct.
See also Generate and optimize FP / SIMD code in the Linux Kernel on files which contains kernel_fpu_begin()?
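To illustrate the split described above, here is a rough sketch of that calling pattern; do_fp_matmul() is a hypothetical helper, and struct matrix is the one from the question:

#include <asm/fpu/api.h>   /* kernel_fpu_begin() / kernel_fpu_end() */

/* Hypothetical helper defined in a SEPARATE .c file that is compiled
 * with SSE enabled; this file must stay -mgeneral-regs-only so the
 * compiler can't emit FP instructions before kernel_fpu_begin() runs. */
void do_fp_matmul(struct matrix *A, struct matrix *B, struct matrix *C);

static void matmul_with_fpu(struct matrix *A, struct matrix *B,
                            struct matrix *C)
{
    kernel_fpu_begin();    /* save user-space FPU/SSE state */
    do_fp_matmul(A, B, C);
    kernel_fpu_end();      /* restore it */
}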
Apparently -msse2 overrides -mgeneral-regs-only, so that option is probably just an alias for -mno-mmx -mno-sse plus whatever option disables x87. So you might be able to use __attribute__((target("sse2"))) on a function without changing build options for it, but that would be x86-specific. Of course, so is -mgeneral-regs-only. And there isn't a -mno-general-regs-only option to override the kernel's normal CFLAGS.
I don't have a specific suggestion for the best way to set up a build option, if you really do think it's worth using kernel_fpu_begin at all here (rather than using fixed-point the whole way through).
Obviously, if you do save/restore the FPU state, you might as well use FP for the whole loop instead of using FP only to convert to fixed-point and back.
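If instead you go fixed-point all the way, the double conversion can happen once in user space so that no FP type ever appears in the module. A minimal sketch, assuming s15.16 values stored in 32-bit integers (the names here are mine, not the asker's; note the row-major indexing uses cols as the stride):

#include <linux/types.h>

#define FRAC_BITS 16

struct fx_matrix {
    int rows;
    int cols;
    s32 *data;   /* s15.16 fixed-point, converted in user space */
};

/* s15.16 multiply: widen to 64 bits, then drop the extra fraction
 * bits so the product stays in s15.16 format without overflowing. */
static inline s32 fx_mul(s32 a, s32 b)
{
    return (s32)(((s64)a * b) >> FRAC_BITS);
}

static void fx_matmul(const struct fx_matrix *A, const struct fx_matrix *B,
                      struct fx_matrix *C)
{
    int i, j, k;
    s32 sum;

    if (A->cols != B->rows)
        return;

    for (i = 0; i < A->rows; i++) {
        for (j = 0; j < B->cols; j++) {
            sum = 0;
            for (k = 0; k < A->cols; k++)
                sum += fx_mul(A->data[i * A->cols + k],
                              B->data[k * B->cols + j]);
            C->data[i * C->cols + j] = sum;
        }
    }
}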
I tried an example of auto for variable initialization and with the STL in C++. For a normal variable, I printed the type using typeid(var_name).name(), which gives i (int), d (double), Pi (pointer to int), and so on; that works fine.
But while working with the STL,
#include <iostream>
#include <string>
#include <typeinfo>
#include <vector>
using namespace std;

int main()
{
    vector<string> st;
    st.push_back("geeks");
    st.push_back("for");
    for (auto it = st.begin(); it != st.end(); it++)
        cout << typeid(it).name() << "\n";
    return 0;
}
which gives output like this:
N9__gnu_cxx17__normal_iteratorIPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt6vectorIS6_SaIS6_EEEE
N9__gnu_cxx17__normal_iteratorIPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt6vectorIS6_SaIS6_EEEE
I am unable to understand the logic behind this output. Can anyone explain why it prints this? Thanks in advance!
That's the "name-mangled" version of the name of the type of it. std::type_info::name() is not required by the standard to return a name in human-readable format (a shortcoming, IMHO), and GCC doesn't do so.
To get the actual, human-readable name, you need to call the abi::__cxa_demangle() function provided by GCC, but note that this is non-portable, so if your project needs to work with different compilers, you'll need to wrap it appropriately.
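For example, here is a minimal sketch (GCC/Clang only; the abi:: interface lives in <cxxabi.h>) that demangles the iterator type from the question:

#include <cxxabi.h>
#include <cstdlib>
#include <iostream>
#include <string>
#include <typeinfo>
#include <vector>

int main()
{
    std::vector<std::string> st;
    auto it = st.begin();

    int status = 0;
    // __cxa_demangle returns a malloc()'d string that we must free().
    char *readable = abi::__cxa_demangle(typeid(it).name(), nullptr, nullptr, &status);
    if (status == 0 && readable) {
        std::cout << readable << "\n";  // prints the human-readable iterator type
        std::free(readable);
    }
    return 0;
}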
I want to read a binary file in 16-bit words. Right now, I'm using a std::ifstream to read into a two-character array c.
#include <iostream>
#include <fstream>
#include <stdint.h>

int main() {
    std::ifstream file("./tetris.rom", std::ios::in | std::ios::binary);
    char c[2];
    while (file.read(c, 2)) {
        uint16_t word = (static_cast<uint8_t>(c[0]) << 8) | static_cast<uint8_t>(c[1]);
        std::cout << "word\t" << std::hex << word << std::endl;
    }
}
This works for me, but is there a better (either safer or faster) way of doing this in C++11?
There are no new file-reading APIs in C++11.
If the file fits into your RAM, the fastest approach is usually to map it into memory and access it as a byte array. However, the C++ standard library does not provide an API for that. You can do it with Boost, though; see Boost.Interprocess Memory-Mapped Files.
The usual advice stands, though: start with simple, correctly working code, then benchmark and see whether file reading is actually the bottleneck.
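If you do want to try memory mapping, here is a rough sketch using Boost.Interprocess (the file name is taken from your example; error handling omitted):

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstdint>
#include <iostream>

int main()
{
    namespace bip = boost::interprocess;
    // Map the whole file read-only; no explicit read() loop needed.
    bip::file_mapping  file("./tetris.rom", bip::read_only);
    bip::mapped_region region(file, bip::read_only);

    const std::uint8_t *bytes = static_cast<const std::uint8_t *>(region.get_address());
    std::size_t size = region.get_size();

    // Same big-endian 16-bit extraction as the original loop.
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        std::uint16_t word = static_cast<std::uint16_t>((bytes[i] << 8) | bytes[i + 1]);
        std::cout << "word\t" << std::hex << word << "\n";
    }
    return 0;
}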
I downloaded some open-source code that uses LAPACK/BLAS, and I want to convert it to Eigen-based code for automatic SIMD code generation.
Is there a function in the Eigen library equivalent to dsyev in LAPACK?
dsyev returns an info value that serves several purposes, but as far as I know, the eigensolvers in the Eigen library only return eigenvalues and eigenvectors.
Is there a function in Eigen that does what I want?
I think what you want is .info(), as well as other APIs provided by SelfAdjointEigenSolver.
The tutorial page also shows how to use it.
#include <iostream>
#include <Eigen/Dense>

using namespace std;
using namespace Eigen;

int main()
{
    Matrix2f A;
    A << 1, 2, 2, 3;
    cout << "Here is the matrix A:\n" << A << endl;

    SelfAdjointEigenSolver<Matrix2f> eigensolver(A);
    if (eigensolver.info() != Success) abort();

    cout << "The eigenvalues of A are:\n" << eigensolver.eigenvalues() << endl;
    cout << "Here's a matrix whose columns are eigenvectors of A \n"
         << "corresponding to these eigenvalues:\n"
         << eigensolver.eigenvectors() << endl;
}
If you really want to know the details of NoConvergence as reported by dsyev(), you may have to use the low-level LAPACK API.
This function returns a value info.
If info = 0, the execution is successful.
If info = -i, the i-th parameter had an illegal value.
If info = i, then the algorithm failed to converge; i indicates the number of elements of an intermediate tridiagonal form which did not converge to zero.
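If you go that route, a sketch using the LAPACKE C interface looks like this (assumes a LAPACKE installation; link with -llapacke):

#include <lapacke.h>
#include <stdio.h>

int main(void)
{
    double a[4] = { 1, 2, 2, 3 };  /* 2x2 symmetric matrix, row-major */
    double w[2];                   /* receives the eigenvalues */

    /* 'V' = also compute eigenvectors, 'U' = upper triangle is stored */
    lapack_int info = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'U', 2, a, 2, w);

    if (info > 0)
        printf("%d off-diagonal elements did not converge to zero\n", (int)info);
    else if (info < 0)
        printf("parameter %d had an illegal value\n", (int)-info);
    else
        printf("eigenvalues: %g %g\n", w[0], w[1]);
    return 0;
}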
I am trying to do arbitrary-precision arithmetic combined with the nice array syntax from Blitz++. My problem is that the generic math functions like cos, exp, and so on don't work:
#include <blitz/array.h>
#include <boost/multiprecision/float128.hpp>

using namespace boost::multiprecision;
using namespace blitz;

int main() {
    float128 a = 1;
    a = cos(a);
    cout << a << endl;

    Array<float128,3> myarray(2,3,4);
    myarray = 1;
    myarray = cos(myarray);
    cout << myarray;
}
g++ test.cpp -lquadmath -o test
The first block, using only float128 but not Blitz, works fine. The second block, with Blitz, however, won't do the cos(myarray): the compiler seemingly figures out the iteration, but cannot find the function that does the actual cos(x) on the values: Compiler error log
I would also like to use boost::multiprecision::mpfr, but one thing at a time. I hope someone can help.
I have found a solution, but it involves patching Blitz. I wrote this patch for blitz-0.10, and with the modified Blitz the above code just works.