Pentaho Kettle - Convert hex to Number from field of type binary - binaryfiles

I need to use the Kettle/PDI Community Edition to read big fixed-length data files and do some ETL work on them. During development I ran into the following issue:
The Kettle plugin "Fixed File Input" allows multiple data types, with the remark that they are actually Strings or byte arrays.
My input contained both: Strings, and byte arrays holding the little-endian representation of long, int and short values (Intel byte order).
Example of a record structure to be read:
Column1(char:8), Column2(long:8 hex), Column3(char:2), Column4(int:4 hex).
I tried to use the "Select Values" plugin and change the Binary type of a column to Integer, but that conversion is not implemented. Finally I ended up with the following solution:
I used "User Defined Java Class" with the code pasted below.
As you can see, I used a formula to obtain the long value.
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }

    // It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
    // enough to handle any new fields you are creating in this step.
    r = createOutputRow(r, data.outputRowMeta.size());

    // Get the value from an input field
    byte[] buf;
    long longValue;

    // BAN_L - 8 bytes
    buf = get(Fields.In, "BAN").getBinary(r);
    longValue = ((buf[0] & 0xFFL) << 0) | ((buf[1] & 0xFFL) << 8)
            | ((buf[2] & 0xFFL) << 16) | ((buf[3] & 0xFFL) << 24)
            | ((buf[4] & 0xFFL) << 32) | ((buf[5] & 0xFFL) << 40)
            | ((buf[6] & 0xFFL) << 48) | ((buf[7] & 0xFFL) << 56);
    get(Fields.Out, "BAN_L").setValue(r, longValue);

    // DEPOSIT_PAID_AMT - 4 bytes
    buf = get(Fields.In, "DEPOSIT_PAID_AMT").getBinary(r);
    longValue = ((buf[0] & 0xFFL) << 0) | ((buf[1] & 0xFFL) << 8)
            | ((buf[2] & 0xFFL) << 16) | ((buf[3] & 0xFFL) << 24);
    get(Fields.Out, "DEPOSIT_PAID_AMT_L").setValue(r, longValue);

    // BILL_SEQ_NO_L - 2 bytes
    buf = get(Fields.In, "BILL_SEQ_NO").getBinary(r);
    longValue = ((buf[0] & 0xFFL) << 0) | ((buf[1] & 0xFFL) << 8);
    get(Fields.Out, "BILL_SEQ_NO_L").setValue(r, longValue);

    // Send the row on to the next step.
    putRow(data.outputRowMeta, r);
    //binaryToDecimal();
    return true;
}
The problem arises when a single data extract contains 8-20 binary fields.
Is there any alternative to this approach, so that I can call something like:
getNumberFromLE(byte [] buff, buff.length);
Is there any other plugin in development which can be used to transform byte[] to the Pentaho Kettle "Number" data type? (BigNumber and Integer would also be fine.)

I found the following possibilities:
1) it is possible to add additional types to the ValueMetaInterface class:
org.pentaho.di.core.row.ValueMetaInterface
and add conversion functions into
org.pentaho.di.core.row.ValueMeta
2) add an implementation of getNumberFromLE to the "Common use" code snippets of "User Defined Java Class" (a sketch of such a helper is shown after this list)
3) add new data types as a plugin, as described in the links below:
Jira pluggable types
GitHub pdi-valuemeta-map
AddingDataTypes
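For option 2, a minimal sketch of such a helper is below. It is hypothetical (not an existing Kettle snippet) and simply generalizes the shift formula used in processRow() above to any field length up to 8 bytes; it can be pasted into the "User Defined Java Class" body.

private long getNumberFromLE(byte[] buf, int len)
{
    // Same formula as above, generalized: combine the first 'len' bytes of 'buf'
    // as a little-endian integer.
    long value = 0L;
    for (int i = 0; i < len; i++) {
        value |= (buf[i] & 0xFFL) << (8 * i);
    }
    return value;
}

With that helper, each field conversion reduces to a single line, for example:

get(Fields.Out, "BAN_L").setValue(r, getNumberFromLE(get(Fields.In, "BAN").getBinary(r), 8));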

Related

C++ map::find & map::at performance difference

I was grading some exercises, and for one particular program, although the algorithm seemed correct, it was far too slow (and I mean far too slow). The program was accessing a map using map::at (introduced in C++11). With the minimal change of replacing at with find (and fixing the syntax), the same program became really fast compared to the original version.
Looking at cplusplus.com, both methods claim to have the same complexity, and I couldn't see why one would behave differently from the other (apart from API differences such as not throwing an exception, etc.).
Then I saw that the description in the section about data races is different, but I don't fully understand the implications. Is my assumption correct that map::at is thread safe (whereas map::find is not) and thus incurs some runtime penalty?
http://www.cplusplus.com/reference/map/map/at/
http://www.cplusplus.com/reference/map/map/find/
Edit
Both versions are inside a loop executed 10,000,000 times. No optimization flags, just g++ foo.cpp. Here is the diff (the arrayX variables are vectors, m is a map):
< auto t = m.find(array1.at(i));
< auto t2 = t->second.find(array2.at(i));
< y = t->second.size();
< cout << array.at(i) << "[" << t2->second << " of " << y << "]" << endl;
---
> auto t = m.at(array1.at(i));
> x = t.at(array2.at(i));
> y = m.at(array1.at(i)).size();
> cout << array.at(i) << "[" << x << " of " << y << "]" << endl;
The performance difference you are observing can be attributed to object copying.
auto t = m.at(array1.at(i));
According to the template argument deduction rules (the same rules are applied for the auto specifier), in the above statement t is deduced to mapped_type, which triggers an object copy.
You need to define t as auto& t for it to be deduced to mapped_type&.
Related conversation: `auto` specifier type deduction for references
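For illustration only (the question's map holds nested maps; a plain vector stands in here), a minimal sketch of the copy-versus-reference difference:

#include <map>
#include <string>
#include <vector>

int main()
{
    std::map<std::string, std::vector<int>> m;
    m["key"] = std::vector<int>(1000000, 42);

    auto  t1 = m.at("key");   // deduced as std::vector<int>  -> copies one million elements
    auto& t2 = m.at("key");   // deduced as std::vector<int>& -> no copy
    return t1.size() == t2.size() ? 0 : 1;
}

The find() version only copies an iterator (auto t = m.find(...)), which is why that variant looked so much faster.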

C++ string.substr compiles differently

I have two pairs of files. The source files are identical copies apart from the path to the identical text files they interrogate.
One pair runs on Linux Cinnamon 18.x the other on Raspbian Stretch. Each pair is compiled on its own platform.
std::string sTemp = ImportDS18B20("testy.txt");
if (sTemp.find("YES") != std::string::npos) {
    size_t p = sTemp.find("t= ");
    if (p != std::string::npos) {
        p += 3;
        sFloor = sTemp.substr(p);
        uint uTemp = sFloor.length();
        std::cout << uTemp << " |" << sFloor << "| " << std::endl;
    }
    break;
}
The code produces 5 |19555| on Raspbian and 6 |19555\n| on Cinnamon. (\n is of course just to represent a CR on this site.)
I assume this is a C++ compiler issue. Is that correct? How do I make the code portable?
I suspect that your issue is with the ImportDS18B20() function rather than the code you've posted or the compiler. To verify that the files are identical, check the length and md5sum.
I would strip the trailing \r (and \n, to make it cross-platform) from sFloor itself:
sFloor = sTemp.substr(p);
while (!sFloor.empty() && (sFloor.back() == '\r' || sFloor.back() == '\n'))
    sFloor.pop_back();
uint uTemp = sFloor.length();
Mike

SDL2 - how to get the appropriate display resolution

I'm trying to open a fullscreen window using SDL2. I've looked thoroughly at the documentation on display and window management ( https://wiki.libsdl.org/CategoryVideo ), but I don't understand what the best practice is for getting the resolution of the display I am actually working on.
I have the following sample code:
SDL_DisplayMode mMode;
SDL_Rect mRect;
int ret0 = SDL_GetDisplayBounds(0, &mRect);
std::cout << "bounds w and h are: " << mRect.w << " x " << mRect.h << std::endl;
int ret2 = SDL_GetCurrentDisplayMode(0, &mMode);
std::cout << "current display res w and h are: " << mMode.w << " x " << mMode.h << std::endl;
int ret3 = SDL_GetDisplayMode(0, 0, &mMode);
std::cout << "display mode res w and h are: " << mMode.w << " x " << mMode.h << std::endl;
I am working on a single display that has a resolution of 1920x1080. However, the printed results are:
[screenshot of the program output]
It seems that SDL_GetDisplayMode() is the only call that reports the correct resolution, so I'd be inclined to use that one. However, I've read that SDL_GetDisplayMode() sorts display modes by a certain priority, so that calling it with index 0 returns the largest supported resolution for the display, which is not necessarily the current resolution (see also: SDL desktop resolution detection in Linux).
My question is: what is the best practice to obtain the correct resolution?
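For comparison (a sketch, not from the original post): SDL2 also provides SDL_GetDesktopDisplayMode(), which reports the current desktop resolution rather than the largest mode the display supports, so it may be closer to what is wanted here.

#include <SDL.h>
#include <iostream>

int main()
{
    if (SDL_Init(SDL_INIT_VIDEO) != 0) {
        std::cout << "SDL_Init failed: " << SDL_GetError() << std::endl;
        return 1;
    }
    SDL_DisplayMode mode;
    // Desktop mode of display 0; returns 0 on success.
    if (SDL_GetDesktopDisplayMode(0, &mode) != 0) {
        std::cout << "SDL_GetDesktopDisplayMode failed: " << SDL_GetError() << std::endl;
    } else {
        std::cout << "desktop res w and h are: " << mode.w << " x " << mode.h << std::endl;
    }
    SDL_Quit();
    return 0;
}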

How to save Image from FLIR AX5 camera into raw format using eBus sdk

Intro: I am trying to write a program which connects to a FLIR AX5 (GigE Vision) camera and then saves images at regular intervals to a pre-specified location on my PC. These images must be 14-bit, since they contain the temperature information. Later I need to process these images using OpenCV to get some meaningful results from the obtained temperature data.
Current position: I can save an image at regular intervals, but the image I am getting doesn't contain 14-bit data, only 8-bit data, even after I change the PixelFormat to 14 bit and the CMOS and LVDT bit depths to 14 bit. I checked the resulting .bin file in MATLAB and found that the max pixel value is 255, which means the image is being stored in 8-bit format. I am using the sample code provided by the eBus SDK to do this job; in this code I have made some changes as per my requirements.
Please help me save the image in a raw format from which I can read the temperature data.
P.S. The relevant code is here.
// If the buffer contains an image, display width and height.
uint32_t lWidth = 0, lHeight = 0;
lType = lBuffer->GetPayloadType();
cout << fixed << setprecision( 1 );
cout << lDoodle[ lDoodleIndex ];
cout << " BlockID: " << uppercase << hex << setfill( '0' ) << setw( 16 ) << lBuffer->GetBlockID();
if (lType == PvPayloadTypeImage)
{
    // Get image specific buffer interface.
    PvImage *lImage = lBuffer->GetImage();

    // Read width, height.
    lWidth = lImage->GetWidth();
    lHeight = lImage->GetHeight();
    cout << " W: " << dec << lWidth << " H: " << lHeight;

    lBuffer->GetImage()->Alloc(lWidth, lHeight, lBuffer->GetImage()->GetPixelType());

    if (lBuffer->GetBlockID() % 50 == 0) {
        char filename[] = IMAGE_SAVE_LOC;
        std::string s = std::to_string(lBuffer->GetBlockID());
        char const *schar = s.c_str();
        strcat(filename, schar);
        strcat(filename, ".bin");
        lBufferWriter.Store(lBuffer, filename);
    }
Be sure that the streaming is configured for a 14-bit stream.
Before creating the PvStream you have to set PixelFormat to 14 bits. If your PvDevice object is called _pvDevice:
_pvDevice->GetParameters()->SetEnumValue("PixelFormat", PvPixelMono14);
_pvDevice->GetParameters()->SetEnumValue("DigitalOutput", 3);
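A sketch of the same calls with error checking. This assumes the eBUS SDK's usual PvResult return from SetEnumValue; the exact accessors (IsOK(), GetCodeString().GetAscii()) should be verified against your SDK version.

PvResult lResult = _pvDevice->GetParameters()->SetEnumValue("PixelFormat", PvPixelMono14);
if (!lResult.IsOK())
{
    // The device rejected the parameter name or value.
    cout << "Could not set PixelFormat: " << lResult.GetCodeString().GetAscii() << endl;
}
lResult = _pvDevice->GetParameters()->SetEnumValue("DigitalOutput", 3);
if (!lResult.IsOK())
{
    cout << "Could not set DigitalOutput: " << lResult.GetCodeString().GetAscii() << endl;
}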

UTF-16 to UTF-8 using ICU library

I want to convert UTF-16 strings to UTF-8. I came across the ICU library from the Unicode project. I am having problems doing the conversion, since ICU's default internal representation is UTF-16.
I have tried using a converter:
UErrorCode myError = U_ZERO_ERROR;
UConverter *conv = ucnv_open("UTF-8", &myError);
int32_t bytes = ucnv_fromUChars(conv, target, 0, (UChar*)source, numread, &myError);
char *targetLimit = target + reqdLen;
const UChar *sourceLimit = mySrc + numread;
ucnv_fromUnicode(conv,&target, targetLimit, &mySrc, sourceLimit, NULL, TRUE, &myError);
I get bytes back as a large negative number, and garbage at the original target location. What am I missing?
It's a best practice to check for errors after calls that specify a UErrorCode parameter. I would start there.
Something like...
if (U_FAILURE(status))
{
std::cout << "error: " << status << ":" << u_errorName(status) << std::endl;
}
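As a minimal sketch (not from the original answer), here is the usual two-pass ucnv_fromUChars() pattern with error checking, wrapped in a hypothetical helper and assuming a source buffer and numread length as in the question:

#include <unicode/utypes.h>
#include <unicode/ucnv.h>
#include <iostream>
#include <string>
#include <vector>

std::string toUtf8(const UChar *source, int32_t numread)
{
    UErrorCode err = U_ZERO_ERROR;
    UConverter *conv = ucnv_open("UTF-8", &err);
    if (U_FAILURE(err)) {
        std::cout << "ucnv_open error: " << u_errorName(err) << std::endl;
        return std::string();
    }
    // Preflight pass: a zero-capacity destination reports the required length
    // and sets U_BUFFER_OVERFLOW_ERROR, which must be cleared before converting.
    int32_t needed = ucnv_fromUChars(conv, NULL, 0, source, numread, &err);
    if (err == U_BUFFER_OVERFLOW_ERROR) {
        err = U_ZERO_ERROR;
    }
    std::vector<char> buf(needed + 1, 0);
    ucnv_fromUChars(conv, buf.data(), (int32_t)buf.size(), source, numread, &err);
    if (U_FAILURE(err)) {
        std::cout << "conversion error: " << err << ":" << u_errorName(err) << std::endl;
    }
    ucnv_close(conv);
    return std::string(buf.data());
}

Note that in the question's code myError is never reset after the first call; ICU functions return immediately when the UErrorCode they receive already holds a failure (and the zero-capacity call sets U_BUFFER_OVERFLOW_ERROR), which would explain the untouched target buffer.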
