How to print the address of all the dynamically allocated bytes - memory-management

First of all, I am not sure whether my question is correct.
Here's the question.
char* ch = new char[20];
Now I want to know the address of each of the 20 allocated bytes.
I want to do something like
for(int i=0;i<20;i++)
cout<<ch++;
I am getting blank characters when I do this. Isn't this the correct way to do it?
Can't I print the address of all 20 bytes?
My aim: I will initially allocate 20 bytes of memory for the character type. Then I want to write one character at a time into each memory location. How do I do this?

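cout treats a char* as a C string and prints the characters it points to, so you see blank or garbage characters rather than addresses. Cast each element's address to void* to print the address itself. Try
const unsigned int COUNT = 20 ;
char *ch = new char [COUNT] ;
for (unsigned int ii = 0 ; ii < COUNT ; ++ ii)
{
void *tmp = &ch [ii] ; // a void* is printed as an address, not as a string
cout << tmp << endl ;
}
delete [] ch ; // release the array once it is no longer needed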
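The question also asks how to write one character at a time into each location. A minimal sketch of both steps follows; the helper names fill_letters and byte_addresses are made up for illustration and are not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: writes one character into each byte of the buffer,
// cycling through 'a'..'z'.
void fill_letters(char* buf, std::size_t count) {
    assert(buf != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        buf[i] = static_cast<char>('a' + i % 26);
    }
}

// Hypothetical helper: collects the address of every byte as a void*,
// which is the form that operator<< prints as an address.
std::vector<const void*> byte_addresses(const char* buf, std::size_t count) {
    assert(buf != nullptr);
    std::vector<const void*> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(static_cast<const void*>(buf + i));
    }
    return out;
}
```

After char* ch = new char[20]; you would call fill_letters(ch, 20), stream each element of byte_addresses(ch, 20) to cout, and finish with delete[] ch. Indexing with buf[i] instead of ch++ keeps the original pointer intact, so it can still be passed to delete[].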

Related

Counting Byte Occurrence in Read Files in BPF

I am relatively new to BPF and trying to write a program that counts the occurrence of each byte read from a file (later will calculate entropy).
The idea is to have two BPF_PERCPU_ARRAYS to circumvent stack size limitations. To one, I will copy the first 4096 bytes of the content of the written file, and with the other, I will count the occurrence of each possible value of a byte.
The described arrays are initialized like this:
struct data_t {
u8 data[4096];
};
struct counter_t {
u32 data[256];
};
BPF_PERCPU_ARRAY(storage, struct data_t, 1); //to store the buffer
BPF_PERCPU_ARRAY(countarr, struct counter_t, 1); //to count occurrences
and used later in a function:
//set both arrays to zero
int zero = 0;
struct data_t *pstorage = storage.lookup(&zero);
struct counter_t *pcountarr = countarr.lookup(&zero);
//check if init worked
if (!pstorage)
return 0;
if (!pcountarr)
return 0;
//copy data to storage
bpf_probe_read((void*)&pstorage->data, sizeof(pstorage->data), (void*)buf);
u8 tmpint = 0;
for(i = 0; i < 4095; i++){
if (i == count){
break;
}
tmpint = pstorage->data[i];
//TROUBLE IS HERE
//bpf_trace_printk("Current Byte: %d", (int)tmpint); //THIS IS LINE A
//pcountarr->data[tmpint]++; //THIS IS LINE B
}
The last two lines that are commented out are the ones giving me trouble. Uncommenting line A gives me the error
invalid access to map value, value_size=4096 off=4096 size=1
R8 min value is outside of the allowed memory range
processed 102513 insns (limit 1000000) max_states_per_insn 4 total_states 981 peak_states 977 mark_read 459
with R8 (are R8 and R8_w the same?) being
R8_w=map_value(id=0,off=4096,ks=4,vs=4096,imm=0)
Doing so with line B results in pretty much the same problem. At this point I'm fairly lost given my experience, and I wish I had posted this several days ago :D...
Any help is appreciated :)
You are assigning zero to i, but it is defined outside of the loop: for(i = 0; i < 4095; i++){. I suspect that i is not an unsigned type and thus can have a negative minimum value according to the verifier. I would define i as a u16 and see if that fixes the issue:
for(u16 i = 0; i < 4095; i++){
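For reference, here is the counting logic as a plain user-space C++ sketch, with the entropy step the question mentions. This is not BPF code and says nothing about what the verifier will accept; kMaxBytes, count_bytes and shannon_entropy are hypothetical names. It only shows the idea of clamping the loop bound so that the index provably stays inside the buffer.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Assumed cap, mirroring the 4096-byte map value in the question.
constexpr std::size_t kMaxBytes = 4096;

// Counts how often each byte value occurs in the first `count` bytes.
std::array<std::uint32_t, 256> count_bytes(const std::uint8_t* data,
                                           std::size_t count) {
    assert(data != nullptr);
    std::array<std::uint32_t, 256> hist{};  // all counters start at zero
    // Clamp once, so the index can never reach kMaxBytes.
    const std::size_t n = count < kMaxBytes ? count : kMaxBytes;
    for (std::size_t i = 0; i < n; ++i) {
        ++hist[data[i]];  // a byte value is always a valid index into 256 slots
    }
    return hist;
}

// Shannon entropy in bits per byte for a histogram over `total` bytes.
double shannon_entropy(const std::array<std::uint32_t, 256>& hist,
                       std::size_t total) {
    if (total == 0) return 0.0;
    double h = 0.0;
    for (std::uint32_t c : hist) {
        if (c == 0) continue;  // empty buckets contribute nothing
        const double p = static_cast<double>(c) / static_cast<double>(total);
        h -= p * std::log2(p);
    }
    return h;
}
```

The unsigned index and the single clamped bound are the two properties the answer above is aiming for in the BPF loop as well.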

Number of Computing units in OpenCL

__kernel void kmp(__global char pattern[2*4], __global char* string, __global int failure[2*4], __global int ret[2], int g_length, int l_length, int thread_num){
int pattern_num = 2;
int pattern_size = 4;
int gid = get_group_id(0);
int glid = get_global_id(0);
int lid = get_local_id(0);
int i, j, x = 0;
int old = 0;
__local char tmp_string[32768];
event_t event;
event = async_work_group_copy(tmp_string+lid*l_length, string+glid*l_length, l_length, 0);
wait_group_events(1, &event);
for(i = 0; i < pattern_num; i++){
x = i*pattern_size;
for(j = lid*l_length; j < (lid+1)*l_length; j++){
while(tmp_string[j] != pattern[x] && x > 0 && x != i*pattern_size){
x = failure[x-1]+i*pattern_size;
}
if(tmp_string[j] == pattern[x]){
if(x == (i+1)*pattern_size-1){
//ret[i]++;
old = atomic_add(&ret[i], 1);
x = failure[x]+i*pattern_size;
}
else{
x++;
}
}
}
}
barrier(CLK_LOCAL_MEM_FENCE);
}
I need help with this code.
To find the matched pattern in the string, I wrote code like this.
I'm using an AMD Hawaii GPU, which has 44 groups with 64 cores in each group (2816 computing units in total, I mean).
The problem is that when I try to use more than 44 computing units (more than 1 core per group, e.g. 88 units using 2 cores in each group, or 2816 units using 64 cores in each group), it doesn't work well.
It can't find the correct number of matches.
I checked the indices into the string, the IDs (glid, gid, lid) and the sizes of all variables.
But, there is nothing wrong.
Anyone who has some advice, please help!
What exactly goes wrong when you say it doesn't work well? Also, why are you not doing any work while the async copy is in flight? Maybe a simple global-to-local assignment would work. Why is there a local barrier at the end of the kernel?
Anyway, the error seems to be in the async copy. It is given different argument values by each thread in a group. For it to work correctly, it must be given exactly the same values by all threads of a group. That's why it works with local size = 1 and not with bigger local groups.
For example, glid is different for all 64 threads in a group, so it won't work. The async work-group copy command makes all threads of a group cooperate on the same copy, not on different copies. If you need different copies, you need to issue multiple async commands one after another; they still run asynchronously if you wait on all of their events at once.
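One way to get group-uniform arguments is to compute the copy range from the group id and local size only. The host-side C++ sketch below just works out that arithmetic; GroupCopy, group_copy_range and local_chunk_start are hypothetical names, and it assumes a 1-D range with no global offset, so that glid == gid * local_size + lid.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of the slice of the global string that one
// whole work-group needs, expressed only in group-wide values.
struct GroupCopy {
    std::size_t src_offset;  // first element of the group's slice
    std::size_t count;       // number of elements copied in one call
};

GroupCopy group_copy_range(std::size_t group_id, std::size_t local_size,
                           std::size_t l_length) {
    assert(local_size > 0);
    // Neither field depends on the local id, so every work-item in the
    // group computes identical values.
    return GroupCopy{group_id * local_size * l_length, local_size * l_length};
}

// Where one work-item's chunk starts inside the local buffer after the copy.
std::size_t local_chunk_start(std::size_t local_id, std::size_t l_length) {
    return local_id * l_length;
}
```

In the kernel this would correspond to something like async_work_group_copy(tmp_string, string + gid * get_local_size(0) * l_length, get_local_size(0) * l_length, 0), after which each work-item still reads its own chunk starting at lid * l_length. I have not run this against the original kernel.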

When to use hton/ntoh and when to convert data myself?

To convert a byte array from another machine that is big-endian, we can use:
long long convert(unsigned char data[]) {
long long res;
res = 0;
for( int i=0;i < DATA_SIZE; ++i)
res = (res << 8) + data[i];
return res;
}
If the other machine is little-endian, we can use
long long convert(unsigned char data[]) {
long long res;
res = 0;
for( int i=DATA_SIZE-1;i >=0 ; --i)
res = (res << 8) + data[i];
return res;
}
Why do we need the above functions? Shouldn't we use hton at the sender and ntoh when receiving? Is it because hton/ntoh convert integers, while this convert() is for a char array?
The hton/ntoh functions convert between network order and host order. If these two are the same (i.e., on big-endian machines) these functions do nothing. So they cannot be portably relied upon to swap endianness. Also, as you pointed out, they are only defined for 16-bit (htons) and 32-bit (htonl) integers; your code can handle up to the sizeof(long long) depending on how DATA_SIZE is set.
Over the network you always receive a series of bytes (octets), which you can't pass directly to ntohs or ntohl. Supposing the incoming bytes are buffered in the (unsigned) char array buf, you could do
uint16_t x = ntohs(*(uint16_t *)(buf+offset));
but this is not portable unless buf+offset is always even, so that you read with correct alignment. Similarly, to do
uint32_t y = ntohl(*(uint32_t *)(buf+offset));
you have to make sure that 4 divides buf+offset. Your convert() functions don't have this limitation: they can process a byte series at an arbitrary (unaligned) memory address.
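To illustrate the alignment point, here is a small sketch that assembles a 32-bit value byte by byte, in the same spirit as convert(). The names read_be32 and read_le32 are made up for this example. Because it only ever reads single bytes, it works at any offset and gives the same result on big-endian and little-endian hosts.

```cpp
#include <cassert>
#include <cstdint>

// Reads 4 bytes in big-endian (network) order from any address.
std::uint32_t read_be32(const unsigned char* p) {
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Reads 4 bytes in little-endian order from any address.
std::uint32_t read_le32(const unsigned char* p) {
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[3]) << 24) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           static_cast<std::uint32_t>(p[0]);
}
```

For example, read_be32(buf + 1) is fine even though buf + 1 is odd. If you would rather keep ntohl, the usual alternative is to memcpy the four bytes into a uint32_t first and then call ntohl on that variable.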

Pass host pointer array to device global memory pointer array?

Suppose we have:
struct collapsed {
char **seq;
int num;
};
...
__device__ collapsed *xdev;
...
collapsed *x_dev;
cudaGetSymbolAddress((void **)&x_dev, xdev);
cudaMemcpyToSymbol(x_dev, x, sizeof(collapsed)*size); // x is already defined as collapsed *; this line gives the ERROR
Why do you think I am getting the error "invalid device symbol" at the last line?
The first problem here is that x_dev isn't a device symbol. It might contain an address in device memory, but that address cannot be passed to cudaMemcpyToSymbol. The call should just be:
cudaMemcpyToSymbol(xdev, ......);
Which brings up the second problem. Doing this:
cudaMemcpyToSymbol(xdev, x, sizeof(collapsed)*size);
would be illegal. xdev is a pointer, so the only valid value you can copy to xdev is a device address. If x is the address of a struct collapsed in device memory, then the only valid version of this memory transfer operation is
cudaMemcpyToSymbol(xdev, &x, sizeof(collapsed *));
i.e. x must previously have been set to the address of memory allocated on the device, something like
collapsed *x;
cudaMalloc((void **)&x, sizeof(collapsed)*size);
cudaMemcpy(x, host_src, sizeof(collapsed)*size, cudaMemcpyHostToDevice);
As promised, here is a complete working example. First the code:
#include <cstdlib>
#include <iostream>
#include <string>
#include <cuda_runtime.h>
struct collapsed {
char **seq;
int num;
};
__device__ collapsed xdev;
__global__
void kernel(const size_t item_sz)
{
if (threadIdx.x < xdev.num) {
char *p = xdev.seq[threadIdx.x];
char val = 0x30 + threadIdx.x;
for(size_t i=0; i<item_sz; i++) {
p[i] = val;
}
}
}
#define gpuQ(ans) { gpu_assert((ans), __FILE__, __LINE__); }
void gpu_assert(cudaError_t code, const char *file, const int line)
{
if (code != cudaSuccess)
{
std::cerr << "gpu_assert: " << cudaGetErrorString(code) << " "
<< file << " " << line << std::endl;
exit(code);
}
}
int main(void)
{
const int nitems = 32;
const size_t item_sz = 16;
const size_t buf_sz = size_t(nitems) * item_sz;
// Gpu memory for sequences
char *_buf;
gpuQ( cudaMalloc((void **)&_buf, buf_sz) );
gpuQ( cudaMemset(_buf, 0x7a, buf_sz) );
// Host array for holding sequence device pointers
char **seq = new char*[nitems];
size_t offset = 0;
for(int i=0; i<nitems; i++, offset += item_sz) {
seq[i] = _buf + offset;
}
// Device array holding sequence pointers
char **_seq;
size_t seq_sz = sizeof(char*) * size_t(nitems);
gpuQ( cudaMalloc((void **)&_seq, seq_sz) );
gpuQ( cudaMemcpy(_seq, seq, seq_sz, cudaMemcpyHostToDevice) );
// Host copy of the xdev structure to copy to the device
collapsed xdev_host;
xdev_host.num = nitems;
xdev_host.seq = _seq;
// Copy to device symbol
gpuQ( cudaMemcpyToSymbol(xdev, &xdev_host, sizeof(collapsed)) );
// Run Kernel
kernel<<<1,nitems>>>(item_sz);
// Copy back buffer
char *buf = new char[buf_sz];
gpuQ( cudaMemcpy(buf, _buf, buf_sz, cudaMemcpyDeviceToHost) );
// Print out seq values
// Each string should be ASCII starting from '0' (0x30)
char *seq_vals = buf;
for(int i=0; i<nitems; i++, seq_vals += item_sz) {
std::string s;
s.append(seq_vals, item_sz);
std::cout << s << std::endl;
}
return 0;
}
and here it is compiled and run:
$ /usr/local/cuda/bin/nvcc -arch=sm_12 -Xptxas=-v -g -G -o erogol erogol.cu
./erogol.cu(19): Warning: Cannot tell what pointer points to, assuming global memory space
ptxas info : 8 bytes gmem, 4 bytes cmem[14]
ptxas info : Compiling entry function '_Z6kernelm' for 'sm_12'
ptxas info : Used 5 registers, 20 bytes smem, 4 bytes cmem[1]
$ /usr/local/cuda/bin/cuda-memcheck ./erogol
========= CUDA-MEMCHECK
0000000000000000
1111111111111111
2222222222222222
3333333333333333
4444444444444444
5555555555555555
6666666666666666
7777777777777777
8888888888888888
9999999999999999
::::::::::::::::
;;;;;;;;;;;;;;;;
<<<<<<<<<<<<<<<<
================
>>>>>>>>>>>>>>>>
????????????????
@@@@@@@@@@@@@@@@
AAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDD
EEEEEEEEEEEEEEEE
FFFFFFFFFFFFFFFF
GGGGGGGGGGGGGGGG
HHHHHHHHHHHHHHHH
IIIIIIIIIIIIIIII
JJJJJJJJJJJJJJJJ
KKKKKKKKKKKKKKKK
LLLLLLLLLLLLLLLL
MMMMMMMMMMMMMMMM
NNNNNNNNNNNNNNNN
OOOOOOOOOOOOOOOO
========= ERROR SUMMARY: 0 errors
Some notes:
To simplify things a bit, I have only used a single memory allocation _buf to hold all of the string data. Each value of seq is set to a different address within _buf. This is functionally equivalent to running a separate cudaMalloc call for each pointer, but much faster.
The key concept is to assemble a copy of the structure you wish to access on the device in host memory, then copy that to the device. All of the pointers in my xdev_host are device pointers. The CUDA API doesn't have any sort of deep copy or automatic pointer translation facility, so it is the programmer's responsibility to make sure this is correct.
Each thread in the kernel just fills its sequence with a different ASCII character. Note that I have declared my xdev as a structure rather than a pointer to a structure, and I copy values rather than a reference to the __device__ symbol (again, to simplify things slightly). Otherwise, the sequence of operations is what you would need to make your design pattern work.
Because I only have access to a compute 1.x device, the compiler issues a warning. On compute 2.x and 3.x devices this won't happen because of their improved memory model. The warning is normal and can be safely ignored.
Because each sequence is just written into a different part of _buf, I can transfer all the sequences back to the host with a single cudaMemcpy call.

How do I reverse a UTF-8 string in place?

Recently, someone asked about an algorithm for reversing a string in place in C. Most of the proposed solutions had trouble dealing with non-single-byte strings. So, I was wondering what a good algorithm for dealing specifically with UTF-8 strings would be.
I came up with some code, which I'm posting as an answer, but I'd be glad to see other people's ideas or suggestions. I preferred to use actual code, so I've chosen C#, as it seems to be one of the most popular languages on this site, but I don't mind if your code is in another language, as long as it can be reasonably understood by anyone who is familiar with an imperative language. And, as this is intended to show how such an algorithm could be implemented at a low level (by low level I just mean dealing with bytes), the idea is to avoid using libraries for the core code.
Notes:
I'm interested in the algorithm itself, its performance and how could it be optimized (I mean algorithm-level optimization, not replacing i++ with ++i and such; I'm not really interested in actual benchmarks either).
I don't mean to actually use it in production code or "reinventing the wheel". This is just out of curiosity and as an exercise.
I'm using C# byte arrays, so I'm assuming you can get the length of the string without running through the string until you find a NUL.
That is, I'm not accounting for the complexity of finding the length of the string. But if you're using C, for instance, you could factor that out by using strlen() before calling the core code.
Edit:
As Mike F points out, my code (and other people's code posted here) is not dealing with composite characters. Some info about those here. I'm not familiar with the concept, but if that means that there are "combining characters", i.e., characters / code points that are only valid in combination with other "base" characters / code points, a look-up table of such characters could be used to preserve the order of the "global" character ("base" + "combining" characters) when reversing.
I'd make one pass reversing the bytes, then a second pass that reverses the bytes in any multibyte characters (which are easily detected in UTF8) back to their correct order.
You can definitely handle this in line in a single pass, but I wouldn't bother unless the routine became a bottleneck.
This code assumes that the input UTF-8 string is valid and well formed (i.e. at most 4 bytes per multibyte character):
#include <string.h>
void utf8rev(char *str)
{
/* this assumes that str is valid UTF-8 */
char *scanl, *scanr, *scanr2, c;
/* first reverse the string */
for (scanl= str, scanr= str + strlen(str); scanl < scanr;)
c= *scanl, *scanl++= *--scanr, *scanr= c;
/* then scan all bytes and reverse each multibyte character */
for (scanl= scanr= str; c= *scanr++;) {
if ( (c & 0x80) == 0) // ASCII char
scanl= scanr;
else if ( (c & 0xc0) == 0xc0 ) { // start of multibyte
scanr2= scanr;
switch (scanr - scanl) {
case 4: c= *scanl, *scanl++= *--scanr, *scanr= c; // fallthrough
case 3: // fallthrough
case 2: c= *scanl, *scanl++= *--scanr, *scanr= c;
}
scanr= scanl= scanr2;
}
}
}
// quick and dirty main for testing purposes
#include <stdio.h>
int main(int argc, char* argv[])
{
char buffer[256];
buffer[sizeof(buffer)-1]= '\0';
while (--argc > 0) {
strncpy(buffer, argv[argc], sizeof(buffer)-1); // don't overwrite final null
printf("%s → ", buffer);
utf8rev(buffer);
printf("%s\n", buffer);
}
return 0;
}
If you compile this program (example name: so199260.c) and run it on a UTF-8 environment (a Linux installation in this case):
$ so199260 γεια και χαρά français АДЖИ a♠♡♢♣b
a♠♡♢♣b → b♣♢♡♠a
АДЖИ → ИЖДА
français → siaçnarf
χαρά → άραχ
και → ιακ
γεια → αιεγ
If the code is too cryptic, I will happily clarify.
Agree that your approach is the only sane way to do it in-place.
Personally I don't like revalidating UTF8 inside every function that deals with it, and generally only do what's needed to avoid crashes; it adds up to a lot less code. Dunno much C# so here it is in C:
(edited to eliminate strlen)
void reverse( char *start, char *end )
{
while( start < end )
{
char c = *start;
*start++ = *end;
*end-- = c;
}
}
char *reverse_char( char *start )
{
char *end = start;
while( (end[1] & 0xC0) == 0x80 ) end++;
reverse( start, end );
return( end+1 );
}
void reverse_string( char *string )
{
char *end = string;
while( *end ) end = reverse_char( end );
reverse( string, end-1 );
}
My initial approach could be summarized this way:
1) Reverse bytes naively
2) Run the string backwards and fix the utf8 sequences as you go.
Illegal sequences are dealt with in the second step and in the first step, we check if the string is in "sync" (that is, if it starts with a legal leading byte).
EDIT: improved validation for leading byte in Reverse()
class UTF8Utils {
public static void Reverse(byte[] str) {
int len = str.Length;
int i = 0;
int j = len - 1;
// first, check if the string is "synced", i.e., it starts
// with a valid leading character. Will check for illegal
// sequences thru the whole string later.
byte leadChar = str[0];
// if it starts with 10xx xxxx, it's a trailing char...
// if it starts with 1111 10xx or 1111 110x
// it's out of the 4 bytes range.
// EDIT: added validation for 7 bytes seq and 0xff
if( (leadChar & 0xc0) == 0x80 ||
(leadChar & 0xfc) == 0xf8 ||
(leadChar & 0xfe) == 0xfc ||
(leadChar & 0xff) == 0xfe ||
leadChar == 0xff) {
throw new Exception("Illegal UTF-8 sequence");
}
// reverse bytes in-place naïvely
while(i < j) {
byte tmp = str[i];
str[i] = str[j];
str[j] = tmp;
i++;
j--;
}
// now, run the string again to fix the multibyte sequences
UTF8Utils.ReverseMbSequences(str);
}
private static void ReverseMbSequences(byte[] str) {
int i = str.Length - 1;
byte leadChar = 0;
int nBytes = 0;
// loop backwards thru the reversed buffer
while(i >= 0) {
// since the first byte in the unreversed buffer is assumed to be
// the leading char of that byte, it seems safe to assume that the
// last byte is now the leading char. (Given that the string is
// not out of sync -- we checked that out already)
leadChar = str[i];
// check how many bytes this sequence takes and validate against
// illegal sequences
if(leadChar < 0x80) {
nBytes = 1;
} else if((leadChar & 0xe0) == 0xc0) {
if((str[i-1] & 0xc0) != 0x80) {
throw new Exception("Illegal UTF-8 sequence");
}
nBytes = 2;
} else if ((leadChar & 0xf0) == 0xe0) {
if((str[i-1] & 0xc0) != 0x80 ||
(str[i-2] & 0xc0) != 0x80 ) {
throw new Exception("Illegal UTF-8 sequence");
}
nBytes = 3;
} else if ((leadChar & 0xf8) == 0xf0) {
if((str[i-1] & 0xc0) != 0x80 ||
(str[i-2] & 0xc0) != 0x80 ||
(str[i-3] & 0xc0) != 0x80 ) {
throw new Exception("Illegal UTF-8 sequence");
}
nBytes = 4;
} else {
throw new Exception("Illegal UTF-8 sequence");
}
// now, reverse the current sequence and then continue
// with the next one
int back = i;
int front = back - nBytes + 1;
while(front < back) {
byte tmp = str[front];
str[front] = str[back];
str[back] = tmp;
front++;
back--;
}
i -= nBytes;
}
}
}
The best solution:
Convert to a wide char string
Reverse the new string
Never, never, never, never treat single bytes as characters.
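For comparison with the in-place versions, here is a rough sketch of that decode, reverse, re-encode approach. It is not in place, it assumes the input is already valid UTF-8 (nothing is validated beyond a bounds assert), and the function names decode_utf8, encode_utf8 and reverse_utf8 are made up for this example. Like the other answers, it reverses code points, so combining characters still end up detached from their base characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Decodes well-formed UTF-8 into code points (assumption: input is valid).
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        // Sequence length is determined by the lead byte.
        const std::size_t len = b < 0x80 ? 1
                              : (b & 0xE0) == 0xC0 ? 2
                              : (b & 0xF0) == 0xE0 ? 3 : 4;
        assert(i + len <= in.size());
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? b : b & (0x7F >> len);
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Encodes code points back to UTF-8 (1 to 4 bytes each).
std::string encode_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Reverses a UTF-8 string code point by code point.
std::string reverse_utf8(const std::string& in) {
    std::u32string cps = decode_utf8(in);
    std::reverse(cps.begin(), cps.end());
    return encode_utf8(cps);
}
```

The cost is an extra buffer of four bytes per code point, which is the trade-off against the byte-swapping answers above.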

Resources