Our application workflow uses protobuf for cross-language IPC.
We have a C++ application publishing data over Linux shared memory to various clients on the same host.
Data is serialized with the protobuf "SerializeAsString" API before it is written to shared memory, and the client side calls "ParseFromString".
Some of the clients are written in Python, while others are written in C++.
Even though the data we get after parsing appears to be fine, in C++ the "ParseFromString" method always returns false.
At first we used protobuf v3.15.5 on the Python clients and got "RuntimeWarning: Unexpected end-group tag: Not all data was converted" from the ParseFromString() call.
After upgrading protobuf on both the server and the client side to 21.12, we started getting a DecodeError exception instead: google.protobuf.message.DecodeError: Error parsing message
Again, the strange thing is that all the data looks fine despite the exceptions. Any suggestions?
Language: C++ / Python
Operating system:
Server - Docker image Ubuntu 20.04.5 running on aarch64.
Client - Docker image Ubuntu 20.04.5 running on x86.
Runtime / compiler: Python 3.10, GCC/G++ 9.
What did you do?
Steps to reproduce the behavior:
Part of my Proto:
message Frame {
  bytes depth = 1;
  bytes rgb = 2;
  uint64 SampleTime = 3;
  uint64 SentTime = 4;
  uint64 AlignTime = 5;
}
message CameraData {
  Frame frame = 1;
  uint32 fps = 2;
}
Serialize with:
data blob :
Parse with :
google.protobuf.message.DecodeError: Error parsing message
Thanks a lot for your Help !

I am working with Itzik. You are right, the schema is not the same as the serialized data.
This is the correct schema:
message ExposureState { bool AutoExposure = 1; uint32 ExposureValue = 2;}
message TData {
float X = 1;
float Y = 2;
float Z = 3;
float qX = 4;
float qY = 5;
float qZ = 6;
float qW = 7;
float X_velocity = 8;
float Y_velocity = 9;
float Z_velocity = 10;
float X_angular_velocity = 11;
float Y_angular_velocity = 12;
float Z_angular_velocity = 13;
float X_acceleration = 14;
float Y_acceleration = 15;
float Z_acceleration = 16;
float X_angular_acceleration = 17;
float Y_angular_acceleration = 18;
float Z_angular_acceleration = 19;
uint32 confidence = 20;
float asic_temperature = 21;
float motion_module_temperature = 22;
uint32 fps = 23;
uint64 SampleTime = 24;
uint64 SentTime = 25;
ExposureState Exposure = 26;
}
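One way to check whether a blob matches the schema you think it has is to list the field numbers and wire types it actually contains. The sketch below is a minimal, hand-written walker over the protobuf wire format that does not need the generated classes. The helper names `read_varint` and `list_top_level_fields` and the sample bytes are made up for illustration, and group wire types (3 and 4) are deliberately not handled.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def list_top_level_fields(buf):
    """Return [(field_number, wire_type), ...] for the top-level fields of a blob."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:        # varint (uint32, uint64, bool, ...)
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:      # fixed 64-bit
            pos += 8
        elif wire_type == 2:      # length-delimited (bytes, string, sub-message)
            length, pos = read_varint(buf, pos)
            pos += length
        elif wire_type == 5:      # fixed 32-bit (float, fixed32)
            pos += 4
        else:
            raise ValueError(
                f"unsupported wire type {wire_type} at field {field_number}")
        if pos > len(buf):
            raise ValueError("field runs past the end of the buffer")
        fields.append((field_number, wire_type))
    return fields


# Hand-built sample: field 1 as a float (wire type 5), field 23 as a varint.
sample = bytes([0x0D, 0x00, 0x00, 0x80, 0x3F,   # tag (1 << 3) | 5, value 1.0f
                0xB8, 0x01, 0x1E])              # tag (23 << 3) | 0, value 30
print(list_top_level_fields(sample))            # [(1, 5), (23, 0)]
```

If a blob shows field 1 with wire type 5, it cannot be a CameraData message, whose field 1 is a sub-message (wire type 2), but it is consistent with TData, whose field 1 is a float.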


Ways to improve ESC/POS Thermal_Printer image printing speed?

I have been working on image printing with a portable thermal printer for weeks, and this is the code I have for image printing.
public static byte[] GetByteImage(Bitmap bm, int BitmapWidth)
{
    BitmapData data = GetGreyScaledBitmapData(bm, BitmapWidth);
    BitArray dots = data.Dots;
    string t = data.Width.ToString();
    byte[] width = BitConverter.GetBytes(data.Width);
    int offset = 0;
    MemoryStream stream = new MemoryStream();
    BinaryWriter bw = new BinaryWriter(stream);
    //Line spacing
    while (offset < data.Height)
    {
        //Declare printer to print image mode
        for (int x = 0; x < data.Width; ++x)
        {
            // Each column of a 24-dot strip is sent as three 8-dot slices.
            for (int k = 0; k < 3; ++k)
            {
                byte slice = 0;
                for (int b = 0; b < 8; ++b)
                {
                    int y = (((offset / 8) + k) * 8) + b;
                    int i = (y * data.Width) + x;
                    bool v = false;
                    if (i < dots.Length)
                    {
                        v = dots[i];
                    }
                    slice |= (byte)((v ? 1 : 0) << (7 - b));
                }
                // Append the finished slice to the output stream.
                bw.Write(slice);
            }
        }
        offset += 24;
    }
    byte[] bytes = stream.ToArray();
    return bytes;
}
public static BitmapData GetGreyScaledBitmapData(Bitmap bmpFileName, double imgsize)
{
    using (var bitmap = (Bitmap)(bmpFileName))
    {
        var threshold = 127;
        var index = 0;
        double multiplier = imgsize;
        double scale = (double)(multiplier / (double)bitmap.Width);
        int xheight = (int)(bitmap.Height * scale);
        int xwidth = (int)(bitmap.Width * scale);
        var dimensions = xwidth * xheight;
        var dots = new BitArray(dimensions);
        for (var y = 0; y < xheight; y++)
        {
            for (var x = 0; x < xwidth; x++)
            {
                // Nearest-neighbour lookup in the source bitmap.
                var _x = (int)(x / scale);
                var _y = (int)(y / scale);
                Android.Graphics.Color color = new Android.Graphics.Color(bitmap.GetPixel(_x, _y));
                var luminance = (int)(color.R * 0.3 + color.G * 0.59 + color.B * 0.11);
                // Dark pixels become printed dots.
                dots[index] = (luminance < threshold);
                index++;
            }
        }
        return new BitmapData()
        {
            Dots = dots,
            Height = (int)(bitmap.Height * scale),
            Width = (int)(bitmap.Width * scale)
        };
    }
}
public class BitmapData
{
    public BitArray Dots { get; set; }
    public int Height { get; set; }
    public int Width { get; set; }
}
The problem is that it prints very slowly and makes a jerking sound while printing.
Another problem is that the method that converts the image to greyscale is a bit slow.
When I tested with other apps, I found that they make no jerking sound and print the image almost instantly after the print button is clicked.
Is there a way to improve the above code so it prints smoothly?
This is the app I tested: Printer Lab - Thermal printer manager
The thermal printer I used: RPP300 72mm Mobile Printer
The ESC * command you are using prints the image in strips that are 24 dots high.
That is why, as you noticed, printing is jerky and slow.
Please use a combination of the GS * and GS / commands to improve it.
Details of their specifications are described on pages 24 to 26 of the Thermal Mobile Printer Command Set Manual.
In addition:
By the way, I had overlooked another command, GS v 0. It is described on pages 32 and 33 of the manual.
With it, the data to send is easier to create.
However, smooth printing still depends on the printer performance and the communication line speed.
The program in this article is a bit image data conversion process for the FS q and GS ( L / GS 8 L commands, but it can also be used for the GS * command. Please try it.
Convert raster byte[] image data to column Format in C#
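To make the GS v 0 suggestion concrete, here is a minimal sketch of how a raster command could be assembled. It assumes the common ESC/POS layout `GS v 0 m xL xH yL yH d1...dk`, where the x value is the number of bytes per row and the y value is the number of dot rows, with the most significant bit as the leftmost dot. The function name `pack_gs_v0` is made up, all rows are assumed to have the same length, and the exact parameters should be checked against pages 32 and 33 of the printer manual.

```python
def pack_gs_v0(rows):
    """Build a GS v 0 raster command from rows of booleans (True = printed dot)."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    bytes_per_row = (width + 7) // 8
    out = bytearray([0x1D, 0x76, 0x30, 0x00])                 # GS v 0, m = 0 (normal size)
    out += bytes([bytes_per_row & 0xFF, bytes_per_row >> 8])  # xL, xH
    out += bytes([height & 0xFF, height >> 8])                # yL, yH
    for row in rows:
        for byte_index in range(bytes_per_row):
            value = 0
            for bit in range(8):
                x = byte_index * 8 + bit
                if x < width and row[x]:
                    value |= 0x80 >> bit                      # MSB is the leftmost dot
            out.append(value)
    return bytes(out)


# Made-up 12x2 image: eight black dots on the first row, nothing on the second.
image = [[True] * 8 + [False] * 4,
         [False] * 12]
command = pack_gs_v0(image)
print(command.hex(" "))   # 1d 76 30 00 02 00 02 00 ff 00 00 00
```

Because the whole image goes out as one command in row order, the printer does not have to stop after every 24-dot strip, which is the behaviour the answer blames for the jerking.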
Finally got a solution. I was really dumb back then. Just ask your printer manufacturer for an SDK, or find an SDK from another printer manufacturer.

Segmentation Fault accessing qpscale_table in AVFrame

I'm modifying this file slightly:
This code decodes a video and makes opencv Mats out of the frame pixels as it goes.
In particular I only want to grab frames that have specific macroblock-related data. I'm attempting to get that data like this:
total_qp = get_total_qp(decframe->qscale_table, mb_width, mb_height, mb_stride);
However, whenever I try to access the data by iterating over that array, I get a segmentation fault:
static float get_total_qp(int8_t *qscale_table, int mb_width, int mb_height, int mb_stride)
{
    int mb_count = mb_height * mb_width;
    int y, x;
    float qp_total = 0.0f;
    for (y = 0; y < mb_height; y++) {
        for (x = 0; x < mb_width; x++) {
            qp_total += qscale_table[x + y * mb_stride]; /* <-- SEGFAULT here */
        }
    }
    return qp_total;
}
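The indexing itself can be checked independently of FFmpeg. The sketch below is a Python stand-in for the loop above, under the assumption that the table is a flat sequence whose rows are `mb_stride` entries apart; it adds the guard that the C version lacks for a table that was never populated. The function name `total_qp` and the sample data are made up.

```python
def total_qp(qscale_table, mb_width, mb_height, mb_stride):
    """Sum the per-macroblock values of a strided table, or None if it is absent."""
    if qscale_table is None:          # mirrors a NULL qscale_table in C
        return None
    if mb_width <= 0 or mb_height <= 0:
        return 0
    last_index = (mb_height - 1) * mb_stride + (mb_width - 1)
    if last_index >= len(qscale_table):
        raise ValueError("table is smaller than the stride and height imply")
    total = 0
    for y in range(mb_height):
        row_start = y * mb_stride     # rows are mb_stride apart, not mb_width
        total += sum(qscale_table[row_start:row_start + mb_width])
    return total


# Made-up 2x2 table with stride 3; the value 99 marks padding that must be skipped.
table = [1, 2, 99, 3, 4, 99]
print(total_qp(table, mb_width=2, mb_height=2, mb_stride=3))   # 10
print(total_qp(None, mb_width=2, mb_height=2, mb_stride=3))    # None
```

In C, the equivalent of the first check is testing the pointer against NULL before the loop, which turns the crash into a condition the caller can handle.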
I've also tried sending in:
and I've tried populating it, but this won't compile because it can't find that function:
int8_t *qscale_table = av_frame_get_qp_table(decframe->qscale_table, &mb_stride, &qscale_type);
So my question is this:
Given an AVFrame* how do I ensure that the qscale_table is populated and access it?
It turns out that the qscale_table doesn't get exported onto the decoded frame after the decoding happens in h264dec.c.
In order to retrieve the values I had to modify the finalize_frame method in h264dec.c to export the qscale_table onto the frame, like so:
static int h264_export_qp_table(H264Context *h, AVFrame *f, H264Picture *p, int qp_type)
{
    AVBufferRef *ref = av_buffer_ref(p->qscale_table_buf);
    int offset = 2*h->mb_stride + 1;
    /* av_buffer_ref() returns NULL on allocation failure */
    if (!ref)
        return AVERROR(ENOMEM);
    av_assert0(ref->size >= offset + h->mb_stride * ((f->height+15)/16));
    ref->size -= offset;
    ref->data += offset;
    /* pass the type given by the caller rather than the frame's current one */
    return av_frame_set_qp_table(f, ref, h->mb_stride, qp_type);
}
and add the call to finalize_frame:
    ff_print_debug_info2(h->avctx, dst, NULL,
                         h->mb_width, h->mb_height, h->mb_stride, 1);
    // NT: make the qscale_table accessible!
    h264_export_qp_table(h, dst, out, FF_QSCALE_TYPE_H264);
And then recompile FFmpeg using these instructions:

Battery scheduling with CP optimizer

I'm a student and new to CP Optimizer.
I want to build a battery charging/discharging schedule in CP.
So I want to know how much to charge or discharge at each step.
using CP;
int numEVs = ...;
range EVs = 0..numEVs-1;
int time = ...;
range times = 0..time-1;
int cost[times] = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24];
float min_soc[EVs] = [0.4,0.4,0.4,0.4,0.4];
float max_soc[EVs] = [0.9,0.9,0.9,0.9,0.9];
float Soc[EVs] = [0.4, 0.5, 0.6, 0.7, 0.8];
int k[times,EVs];
tuple EVs2 {
  key int id;
  int Cpower[times];
  int Dpower[times];
}
//float delSm[EVs] = Soc[EVs] - min_soc[EVs];
//float delSp[EVs] = Soc[EVs] - max_soc[EVs];
dvar interval t[i in times] optional size 1;
dvar int Pcmax[times, EVs]; // why I can't use float.
dvar int Pdmax[times, EVs];
//dvar int k[times,EVs] in 0..1;
dexpr float Cost = sum(t, j in EVs) (k[t,j]*cost[t]*Pcmax[t,j] - (1-k[t,j])*cost[t]*Pdmax[t,j]);
minimize Cost; // minimize charging/discharging price
subject to {
  forall(t, j in EVs)
    k[t,j]*Pcmax[t,j] - (1-k[t,j])*Pdmax[t,j] >= Soc[j]-min_soc[j] && k[t,j]*Pcmax[t,j] - (1-k[t,j])*Pdmax[t,j] <= Soc[j]-max_soc[j];
  // each EV's battery state of charge must stay within its limits.
  forall(t, j in EVs) {
    Pdmax[t][j] >= 0;
    Pdmax[t][j] <= 10;
    Pcmax[t][j] >= 0;
    Pcmax[t][j] <= 8;
  }
}
This is my code, but it is not working. Please help.
Copying a short version of the answer this question got where it was cross-posted: The model is infeasible. You can use the conflict refiner to figure out which constraints render the problem infeasible and then fix them.
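The conflict refiner is the right tool for locating the conflict, but one candidate can be checked by hand from the posted data. The first constraint asks the net power of each EV to be at least `Soc[j]-min_soc[j]` and at most `Soc[j]-max_soc[j]`. The sketch below simply evaluates those two bounds with the values from the question; it is a quick arithmetic check, not a replacement for the refiner, and the helper name `power_window` is made up.

```python
# Data copied from the model in the question.
min_soc = [0.4, 0.4, 0.4, 0.4, 0.4]
max_soc = [0.9, 0.9, 0.9, 0.9, 0.9]
soc = [0.4, 0.5, 0.6, 0.7, 0.8]


def power_window(j):
    """Bounds that the posted constraint puts on the net power of EV j."""
    lower = soc[j] - min_soc[j]   # net power >= Soc[j] - min_soc[j]
    upper = soc[j] - max_soc[j]   # net power <= Soc[j] - max_soc[j]
    return lower, upper


empty = []
for j in range(len(soc)):
    lower, upper = power_window(j)
    print(f"EV {j}: need {lower:.1f} <= net power <= {upper:.1f}")
    if lower > upper:
        empty.append(j)

print("EVs with an empty window:", empty)   # [0, 1, 2, 3, 4]
```

Since `max_soc` is larger than `min_soc` for every EV, the lower bound is always above the upper bound, so no value of the net power can satisfy both halves of that constraint.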

implementing PID algorithm in line following robot

I'm working on a small project with the NXT Mindstorms set. My intention was to build a robot that can follow a line smoothly and as fast as possible. After a little research I found the PID algorithm, and I was able to understand it and implement it in NXC code. The robot does everything right according to the algorithm, but when the line is interrupted (gaps) the robot loses the line and can't get back to it. When the gap is up to 9 cm it is fine and the robot can get back, but at 10 cm it just loses the line. I'm using one light sensor. Is there any way I can adjust the PID code to handle this problem?
My Code:
// kd, ki, kp are also defined
task main()
{
  int error = 0;
  float previous_error = 0;
  float setpoint = 0;
  float actual_position = 0;
  int integral = 0;
  float derivative = 0;
  float speed = 50;
  float lasterror = 0;
  float correction = 0;
  float turn = 0;
  float fahrenA = 0;
  float fahrenC = 0;
  while (true)
  {
    actual_position = LIGHTSENSOR;
    error = setpoint - actual_position;
    integral = error + integral;
    derivative = error - previous_error;
    correction = (kp * error) + (ki * integral) + (kd * derivative);
    turn = correction / 100;
    fahrenA = Tp + turn;
    fahrenC = Tp - turn;
    previous_error = error;
  }
}
By a sine-wave pattern, we mean that the robot may follow a weaving, sine-shaped path to increase the chances of re-capturing the line after losing it. You can code the path using simple if-else and timers/tachometer readings. (Thanks to #Spektre for the suggestion!)
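The idea can be sketched as a controller with two modes: normal PID steering while the sensor sees the line, and a sine-shaped steering command while it does not. The Python below is only an illustration of that switch, not NXC code. Every name and number in it (`lost_level`, `amplitude`, `period`, the gains) is a made-up tuning value, and it assumes that a reading at or above `lost_level` means the sensor sees only background.

```python
import math


def pid_turn(error, state, kp, ki, kd):
    """One PID update; state carries the integral and the previous error."""
    state["integral"] += error
    derivative = error - state["previous_error"]
    state["previous_error"] = error
    return kp * error + ki * state["integral"] + kd * derivative


def search_turn(ticks_lost, amplitude=40.0, period=20):
    """Sine-wave steering used while the line is lost (hypothetical tuning)."""
    return amplitude * math.sin(2.0 * math.pi * ticks_lost / period)


def motor_powers(reading, state, setpoint=50, lost_level=65, base_power=50,
                 kp=1.0, ki=0.0, kd=0.5):
    """Return (left, right) motor power for one control tick."""
    if reading >= lost_level:            # only background visible: line lost
        state["ticks_lost"] += 1
        turn = search_turn(state["ticks_lost"])
    else:                                # line visible: ordinary PID steering
        state["ticks_lost"] = 0
        turn = pid_turn(setpoint - reading, state, kp, ki, kd)
    return base_power + turn, base_power - turn


state = {"integral": 0.0, "previous_error": 0.0, "ticks_lost": 0}
on_line = motor_powers(48, state)   # near the setpoint: PID branch
lost = motor_powers(70, state)      # bright background: search branch
print(on_line)                      # (53.0, 47.0)
print(lost)
```

Freezing the PID state while the line is lost keeps the integral term from winding up during the gap, so the controller resumes from a sensible state once the line is found again.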

CUDA limit seems to be reached, but what limit is that?

I have a CUDA program that seems to be hitting some sort of limit of some resource, but I can't figure out what that resource is. Here is the kernel function:
__global__ void DoCheck(float2* points, int* segmentToPolylineIndexMap,
                        int segmentCount, int* output)
{
    int segmentIndex = threadIdx.x + blockIdx.x * blockDim.x;
    int pointCount = segmentCount + 1;
    // threads past the last segment have nothing to do
    if(segmentIndex >= segmentCount)
        return;
    int polylineIndex = segmentToPolylineIndexMap[segmentIndex];
    int result = 0;
    if(polylineIndex >= 0)
    {
        float2 p1 = points[segmentIndex];
        float2 p2 = points[segmentIndex+1];
        float2 A = p2;
        float2 a;
        a.x = p2.x - p1.x;
        a.y = p2.y - p1.y;
        for(int i = segmentIndex+2; i < segmentCount; i++)
        {
            int currentPolylineIndex = segmentToPolylineIndexMap[i];
            // if not a different segment within our polyline and
            // not a fake segment
            bool isLegit = (currentPolylineIndex != polylineIndex &&
                            currentPolylineIndex >= 0);
            float2 p3 = points[i];
            float2 p4 = points[i+1];
            float2 B = p4;
            float2 b;
            b.x = p4.x - p3.x;
            b.y = p4.y - p3.y;
            float2 c;
            c.x = B.x - A.x;
            c.y = B.y - A.y;
            float2 b_perp;
            b_perp.x = -b.y;
            b_perp.y = b.x;
            float numerator = dot(b_perp, c);
            float denominator = dot(b_perp, a);
            bool isParallel = (denominator == 0.0);
            float quotient = numerator / denominator;
            float2 intersectionPoint;
            intersectionPoint.x = quotient * a.x + A.x;
            intersectionPoint.y = quotient * a.y + A.y;
            result = result | (isLegit && !isParallel &&
                intersectionPoint.x > min(p1.x, p2.x) &&
                intersectionPoint.x > min(p3.x, p4.x) &&
                intersectionPoint.x < max(p1.x, p2.x) &&
                intersectionPoint.x < max(p3.x, p4.x) &&
                intersectionPoint.y > min(p1.y, p2.y) &&
                intersectionPoint.y > min(p3.y, p4.y) &&
                intersectionPoint.y < max(p1.y, p2.y) &&
                intersectionPoint.y < max(p3.y, p4.y));
        }
    }
    output[segmentIndex] = result;
}
Here is the call to execute the kernel function:
DoCheck<<<702, 32>>>(
The sizes of the parameters are as follows:
devicePoints = 22,464 float2s = 179,712 bytes
deviceSegmentsToPolylineIndexMap = 22,463 ints = 89,852 bytes
numSegments = 1 int = 4 bytes
deviceOutput = 22,463 ints = 89,852 bytes
When I execute this kernel, it crashes the video card. It would appear that I am hitting some sort of limit, because if I execute the kernel using DoCheck<<<300, 32>>>(...);, it works. Just to be clear, the parameters are the same; only the number of blocks is different.
Any idea why one crashes the video driver and the other doesn't? The one that fails still seems to be within the card's limit on the number of blocks.
More information on my system configuration:
Video Card: nVidia 8800GT
CUDA Version: 1.1
OS: Windows Server 2008 R2
I also tried it on a laptop with the following configuration, but got the same results:
Video Card: nVidia Quadro FX 880M
CUDA Version: 1.2
OS: Windows 7 64-bit
The resource which is being exhausted is time. On all current CUDA platforms, the display driver includes a watchdog timer which will kill any kernel which takes more than a few seconds to execute. Running code on a card which is running a display is subject to this limit.
On the WDDM Windows platforms you are using, there are three possible solutions/work-arounds:
1. Get a Tesla card and use the TCC driver, which eliminates the problem completely.
2. Try modifying registry settings to increase the timer limit (google for the TdrDelay registry key for more information, but I am not a Windows user and can't be more specific than that).
3. Modify your kernel code to be "re-entrant" and process the data parallel work load in several kernel launches rather than one. Kernel launch overhead isn't all that large and processing the workload over several kernel runs is often pretty easy to achieve, depending on the algorithm you are using.
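For the last option, the host side mostly needs a plan of which index range each launch covers. The sketch below computes such a plan in Python for the sizes in the question; the function name `launch_plan` is made up, and the limit of 300 blocks per launch is taken only from the launch the asker reported as working. It assumes the kernel would gain an extra offset parameter (hypothetical) that is added to `threadIdx.x + blockIdx.x * blockDim.x`.

```python
def launch_plan(total_threads, threads_per_block, max_blocks_per_launch):
    """Split one large grid into (first_index, block_count) pairs, one per launch."""
    total_blocks = (total_threads + threads_per_block - 1) // threads_per_block
    plan = []
    first_block = 0
    while first_block < total_blocks:
        blocks = min(max_blocks_per_launch, total_blocks - first_block)
        # first_index is the offset the kernel would add to its thread index
        plan.append((first_block * threads_per_block, blocks))
        first_block += blocks
    return plan


# 22,463 segments, 32 threads per block, at most 300 blocks per launch.
plan = launch_plan(22463, 32, 300)
print(plan)   # [(0, 300), (9600, 300), (19200, 102)]
```

Equal block counts do not mean equal run time here: the inner loop of the posted kernel runs from `segmentIndex+2` to `segmentCount`, so launches covering low indices do more work per thread and may need a smaller block limit than later ones.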
