How to efficiently prepend a vector to a vector - c++11

I have 2 vectors like so:
std::vector<unsigned char> v1;
std::vector<unsigned char> v2;
Each of them has some data of course.
I know that the following is the way to append v2 to v1:
v1.insert(v1.end(), v2.begin(), v2.end());
Question:
How can I prepend v1 to v2 instead?
v2.insert(v2.begin(), v1.begin(), v1.end()) doesn't seem to work here.
I know that I might get suggestions of using std::deque instead, but the problem is that v2 comes as a std::vector out of a function in a legacy piece of code which I cannot change. And it contains a huge amount of data which I do not want to copy into v1 by appending. So I simply want to prepend v1 to v2, because v1 is extremely small compared to v2.

This does work very well:
std::vector<int> a{ 1,2,3,4 };
std::vector<int> b{ 5, 6, 7 };
a.insert(a.begin(), b.begin(), b.end());
// a is {5, 6, 7, 1, 2, 3, 4}
What problem did you face?

Related

What is the efficient way of sorting columnar data

In my program, I need an efficient way of sorting in-memory columnar data.
Let me explain this problem.
The data consists of four objects:
(1, 15, 'apple'), (9, 27, 'pear'), (7, 38, 'banana'), (4, 99, 'orange')
And the four objects are kept in memory in columnar form, which looks like this:
[1, 9, 7, 4], [15, 27, 38, 99], ['apple', 'pear', 'banana', 'orange']
I need to sort this list according to the second column in ascending order and the third in descending order.
With only one column, it is a simple sorting problem.
But the case is different when two or more columns exist for in-memory columnar data:
the swap function may incur too much overhead during sorting, since every swap has to touch all the columns.
I've checked several open-source implementations to find the best practice, e.g., Apache Arrow, Presto, TDengine, and other projects.
I found that index sort is a way to avoid the overhead introduced by swapping, since only the indices, rather than the columnar data, are swapped.
I'm wondering: is index sort the most efficient way to handle this problem?
If you want speed, then C++ is one of the fastest languages available.
You can use std::sort with the parallel execution policy std::execution::par_unseq, which enables a multi-threaded, multi-core parallel sort.
As you can see in my code below, I did an arg-sort, because that is what you asked about. But in C++ it is not really necessary; a regular sort is enough, for two reasons.
First, cache locality: sorting algorithms are often more or less cache friendly, meaning that they frequently compare and swap elements that are nearby in memory, so swapping the data elements themselves instead of indices is faster.
Second, swapping of elements in std::sort is done through std::swap, which in turn uses std::move; moving classes like std::string is very efficient because it just swaps the pointers to the data instead of copying the data itself.
From the above it follows that doing an arg-sort instead of a regular sort might even be slower.
The following code snippet first creates a small example file with the data tuples that you provided. In a real use case you should remove the file-writing code so that you don't overwrite your own file. At the end it writes the sorted result to a new file.
After the program finishes, see the created files data.in and data.out.
Try it online!
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <tuple>
#include <execution>
#include <algorithm>

int main() {
    {
        // Create the example input file (remove this block in real use).
        std::ofstream f("data.in");
        f << R"(
7, 38, banana
9, 27, pear
4, 99, orange
1, 15, apple
)";
    }
    std::vector<std::tuple<int, int, std::string>> data;
    {
        // Parse "int, int, string" lines into tuples.
        std::ifstream f("data.in");
        std::string c;
        while (true) {
            int a = 0, b = 0;
            char comma = 0;
            c.clear();
            f >> a >> comma >> b >> comma >> c;
            if (c.empty() && !f)
                break;
            data.push_back({a, b, c});
        }
    }
    // Arg-sort: fill the index array 0, 1, 2, ... and sort it instead of the data.
    std::vector<size_t> idx(data.size());
    for (size_t i = 0; i < idx.size(); ++i)
        idx[i] = i;
    std::sort(std::execution::par_unseq, idx.begin(), idx.end(),
        [&data](auto const & i, auto const & j){
            auto const & [_0, x0, y0] = data[i];
            auto const & [_1, x1, y1] = data[j];
            if (x0 < x1)
                return true;
            else if (x0 == x1)
                return y0 > y1;  // third column descending on ties
            else
                return false;
        });
    {
        // Write the rows out in sorted index order.
        std::ofstream f("data.out");
        for (size_t i = 0; i < idx.size(); ++i) {
            auto const & [x, y, z] = data[idx[i]];
            f << x << ", " << y << ", " << z << std::endl;
        }
    }
}
Input:
7, 38, banana
9, 27, pear
4, 99, orange
1, 15, apple
Output:
1, 15, apple
9, 27, pear
7, 38, banana
4, 99, orange
Normally this kind of activity is the job of a database.
A database can store data in any sorted order; then there is no need to sort the data every time you retrieve it. If you use SQL Server as the database, you can create a table like:
Create Table TableName (
    FirstColumn int not null,
    SecondColumn int not null,
    ThirdColumn nvarchar(1000) not null,
    Primary Key (SecondColumn ASC, ThirdColumn DESC)
)
The most important part is the clustered index, which I chose to be the combination of SecondColumn ascending and ThirdColumn descending, as requested in the question. Why? Because the clustered index (the primary key here) specifies the physical order of the data on disk, so the data is already stored in your preferred order. You can also add nonclustered indexes that cover your queries, to ensure that other queries, whose sort order differs from the physical order, are still fast enough.
Be aware that a bad design performs poorly, so if you are not familiar with database design, get help from database developers or administrators.

Eigen: map non-contiguous data in an array with stride

I have a data array (double *) in memory which looks like:
[x0,y0,z0,junk,x1,y1,z1,junk,...]
I would like to map it to an Eigen vector and virtually remove the junk values by doing something like:
Eigen::Map<
Eigen::Matrix<double, Eigen::Dynamic, 1, Eigen::ColMajor>,
Eigen::Unaligned,
Eigen::OuterStride<4>
>
But it does not work, because OuterStride seems to be restricted to 2D matrices.
Is there a trick to do what I want?
Many thanks!
With the head of Eigen, you can map it as a 2D matrix and then view it as a 1D vector:
auto m1 = Matrix<double,3,Dynamic>::Map(ptr, 3, n, OuterStride<4>());
auto v = m1.reshaped(); // new in future Eigen 3.4
But be aware that accesses to such a v involve a costly integer division/modulo.
If you want a solution compatible with Eigen 3.3, you can do something like this
VectorXd convert(double const* ptr, Index n)
{
VectorXd res(n*3);
Matrix3Xd::Map(res.data(), 3, n) = Matrix4Xd::Map(ptr, 4, n).topRows<3>();
return res;
}
But this of course would copy the data, which you probably intended to avoid.
Alternatively, you should think about whether it is possible to access your data as a 3xN array/matrix instead of a flat vector (really depends on what you are actually doing).

Correct way to initialize tensor in ARM compute library?

What is the correct way to initialize a tensor in the ARM compute library? I have not found any documentation on what is the correct way to do it.
The tensor I have contains floats (F32). I can write data directly by accessing the underlying data through the buffer() interface, which returns a pointer to uint8_t. However, I am not sure how to figure out the data layout because it does not appear to be contiguous, i.e. if I write 4 floats to a 4x1 tensor,
Tensor x{};
x.allocator()->init(TensorInfo(4, 1, Format::F32));
float xdata[] = {1, 2, 3, 4};
FILE *fd = fmemopen(x.buffer(), 4 * sizeof(float), "wb");
fwrite(xdata, sizeof(float), 4, fd);
fclose(fd);
x.print(std::cout);
This prints out,
1 2 3 1.17549e-38
The first 3 elements of 'x' are initialized, but the last one is not. If I change the fwrite line to,
fwrite(xdata, sizeof(float), 6, fd);
then the output is
1 2 3 4
So it may be that there are more bytes being allocated than necessary for 4 floats, or this could be some misleading coincidence. Either way, this is not the right way to initialize the values of the tensor.
Any help would be greatly appreciated.
From the Arm Compute Library documentation (v18.08), it looks like the right way to initialize in your case would be the "import_memory" function. See the example here: https://github.com/ARM-software/ComputeLibrary/blob/master/tests/validation/NEON/UNIT/TensorAllocator.cpp
I think you have to allocate the tensor as well.
More precisely:
Tensor x{};
x.allocator()->init(TensorInfo(4, 1, Format::F32)); // Set the metadata
x.allocator()->allocate(); // Now the memory has been allocated
float xdata[] = {1, 2, 3, 4};
memcpy(x.buffer(), xdata, 4 * sizeof(float));
x.print(std::cout);
This code is not tested, but it should give you a fairly good idea!

Initializing a 2D vector using initialization list in C++11

How can I initialize a 2D vector using an initializer list?
For a normal vector, doing:
vector<int> myvect {1,2,3,4};
would suffice. But for a 2D one doing :
vector<vector<int>> myvect{ {10,20,30,40},
{50,60,70,80}
};
What is a correct way of doing it?
And how can I iterate through it using for?
for(auto x: myvect)
{
cout<<x[j++]<<endl;
}
this for loop only shows:
10,1 !
And by the way, what does this mean?
vector<int> myvect[5] {1,2,3,4};
I saw it here and can't understand it! Link
What is a correct way of doing it?
The way you showed is a possible way. You could also use:
vector<vector<int>> myvect = { {10,20,30,40},
{50,60,70,80} };
vector<vector<int>> myvect{ vector<int>{10,20,30,40},
vector<int>{50,60,70,80} };
The first one constructs a std::initializer_list<std::vector<int>> where the elements are directly initialized from the inner braced-initializer-lists. The second one explicitly constructs temporary vectors which then are moved into a std::initializer_list<std::vector<int>>. This will probably not make a difference, since that move can be elided.
In any way, the elements of the std::initializer_list<std::vector<int>> are copied back out into myvect (you cannot move out of a std::initializer_list).
And how can i iterate through it using for?
You essentially have a vector of vectors, therefore you need two loops:
for(vector<int> const& innerVec : myvect)
{
for(int element : innerVec)
{
cout << element << ',';
}
cout << endl;
}
I refrained from using auto to explicitly show the resulting types.
And by the way what does this mean ?
This is probably a typo. As it stands, it's illegal. The declaration vector<int> myvect[5]; declares an array of 5 vector<int>. The following list-initialization therefore needs to initialize the array, but the elements of this list are not implicitly convertible to vector<int> (there's a ctor that takes a size_t, but it's explicit).
That has already been pointed out in the comments on that site.
I guess the author wanted to write std::vector<int> vArray = {3, 2, 7, 5, 8};.

coding with vectors using the Accelerate framework

I'm playing around with the Accelerate framework for the first time with the goal of implementing some vectorized code into an iOS application. I've never tried to do anything with respect to working with vectors in Objective C or C. Having some experience with MATLAB, I wonder if using Accelerate is indeed that much more of a pain. Suppose I'd want to calculate the following:
b = 4*(sin(a/2))^2 where a and b are vectors.
MATLAB code:
a = 1:4;
b = 4*(sin(a/2)).^2;
However, as I see it after some sifting through the documentation, things are quite different using Accelerate.
My C implementation:
float a[4] = {1,2,3,4}; //define a
int len = 4;
float div = 2; //define 2
float a2[len]; //define intermediate result 1
vDSP_vsdiv(a, 1, &div, a2, 1, len); //divide
float sinResult[len]; //define intermediate result 2
vvsinf(sinResult, a2, &len); //take sine
float sqResult[len]; //square the result
vDSP_vsq(sinResult, 1, sqResult, 1, len); //take square
float factor = 4; //multiply all this by four
float b[len]; //define answer vector
vDSP_vsmul(sqResult, 1, &factor, b, 1, len); //multiply
//unset all variables I didn't actually need
Honestly, I don't know what's worst here: keeping track of all intermediate steps, trying to memorize how the arguments are passed in vDSP with respect to VecLib (quite different), or that it takes so much time doing something quite trivial.
I really hope I am missing something here and that most steps can be merged or shortened. Any recommendations on coding resources, good coding habits (learned the hard way or from a book), etc. would be very welcome! How do you all deal with multiple lines of vector calculations?
I guess you could write it that way, but it seems awfully complicated to me. I like this better (intel-specific, but can easily be abstracted for other architectures):
#include <Accelerate/Accelerate.h>
#include <immintrin.h>
const __m128 a = {1,2,3,4};
const __m128 sina2 = vsinf(a*_mm_set1_ps(0.5));
const __m128 b = _mm_set1_ps(4)*sina2*sina2;
Also, just to be pedantic, what you're doing here is not linear algebra. Linear algebra involves only linear operations (no squaring, no transcendental operations like sin).
Edit: as you noted, the above won't quite work out of the box on iOS; the biggest issue is that there is no vsinf (vMathLib is not available in Accelerate on iOS). I don't have the SDK installed on my machine to test, but I believe that something like the following should work:
#include <Accelerate/Accelerate.h>
const vFloat a = {1, 2, 3, 4};
const vFloat a2 = a*(vFloat){0.5,0.5,0.5,0.5};
const int n = 4;
vFloat sina2;
vvsinf((float *)&sina2, (const float *)&a2, &n);
const vFloat b = sina2*sina2*(vFloat){4,4,4,4};
Not quite as pretty as what is possible with vMathLib, but still fairly compact.
In general, a lot of basic arithmetic operations on vectors just work; there's no need to use calls to any library, which is why Accelerate doesn't go out of its way to supply those operations cleanly. Instead, Accelerate usually tries to provide operations that aren't immediately available by other means.
To answer my own question:
In iOS 6, vMathLib will be introduced. As Stephen clarified, vMathLib could already be used on OSX, but it was not available in iOS. Until now.
The functions that vMathLib provides will allow for easier vector calculations.
