Send negative value OBX-5 segment in HL7 - hl7-v2

How do I send a negative measured numeric value in the OBX segment of an ORU_R01 message for the IHE PCD DEC profile using HL7 v2.6?
Also, where can I find that requirement as defined by IHE?
-Thanks-

Since you have to define a data type for each Observation Value (OBX-5), your question is really about how to use HL7 data types to encode negative values. For example, the HL7 v2 standard defines NM as:
A number represented as a series of ASCII numeric characters consisting of an optional leading sign (+ or -), the digits and an optional decimal point.
Check HL7v2.6 Chapter 2A for other data types.
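For illustration only, here is a minimal Python sketch of an OBX segment carrying a negative NM value; the observation identifier, units, and other field values are placeholders invented for the example, not taken from the PCD DEC profile:

obx_fields = [
    "OBX",                  # segment name
    "1",                    # OBX-1  Set ID
    "NM",                   # OBX-2  Value Type: numeric
    "12345^EXAMPLE_OBS^L",  # OBX-3  Observation Identifier (placeholder)
    "",                     # OBX-4  Observation Sub-ID
    "-2.5",                 # OBX-5  Observation Value, with the optional leading sign NM allows
    "cm^cm^UCUM",           # OBX-6  Units (placeholder)
    "", "", "", "",         # OBX-7 through OBX-10 left empty
    "F",                    # OBX-11 Observation Result Status (F = final)
]
print("|".join(obx_fields))
# OBX|1|NM|12345^EXAMPLE_OBS^L||-2.5|cm^cm^UCUM|||||F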

Here is a link to the IHE Pathology and Laboratory Medicine Technical Framework which contains their requirements.


maximum field number in protobuf message

The official documentation for protocol buffers, https://developers.google.com/protocol-buffers/docs/proto3, says the maximum field number for fields in a protobuf message is 2^29-1. But why this limit?
Can anyone explain it in some detail? I am a newbie to this.
I read the answers to the question "why 2^29-1 is the biggest key in protocol buffers",
but I am still not clear on it.
Each field in an encoded protocol buffer has a header (called key or tag) prefixed to the actual encoded value. The encoding spec defines this key:
Each key in the streamed message is a varint with the value (field_number << 3) | wire_type – in other words, the last three bits of the number store the wire type.
Here the spec says the tag is a varint whose lowest 3 bits encode the wire type. A varint can encode a 64-bit value, so going by this definition alone the limit would be 2^61-1.
In addition to this, the Language Guide narrows this down to a 32 bit value at max.
The smallest field number you can specify is 1, and the largest is 2^29 - 1, or 536,870,911.
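To make the arithmetic concrete, here is a small Python sketch (not protobuf library code) of how a field key is built and varint-encoded; once the decoded key is required to fit in a uint32, only 29 bits remain for the field number:

def encode_varint(n):
    # base-128 varint: 7 payload bits per byte, high bit set while more bytes follow
    out = bytearray()
    while True:
        n, low = n >> 7, n & 0x7F
        out.append(low | (0x80 if n else 0))
        if not n:
            return bytes(out)

def field_key(field_number, wire_type):
    return encode_varint((field_number << 3) | wire_type)

print(field_key(1, 0).hex())          # 08 -> key value 8, one byte on the wire
print(field_key(536870911, 0).hex())  # f8ffffff0f -> key value 0xFFFFFFF8, the largest that still fits in a uint32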
The reasons for this are not given. I can only speculate about the reasons behind it:
An artificial limit, as no one expects a message to have that many fields. Just think about fitting a message with that many fields into memory.
As the key is a varint, it isn't simply the next 4 bytes in the raw buffer, but rather a variable number of bytes (Java code reading a varint32). Each byte carries 7 bits of actual data and 1 bit indicating whether the end has been reached. It could be that for performance reasons it was deemed better to limit the range.
Since proto3 is the 3rd version of protocol buffers, it could be that either proto1 or proto2 defined the tag to be a varint32. To keep backwards compatibility, this limit still holds in proto3 today.
Because of this line:
#define GOOGLE_PROTOBUF_WIRE_FORMAT_MAKE_TAG(FIELD_NUMBER, TYPE) \
static_cast<uint32>((static_cast<uint32>(FIELD_NUMBER) << 3) | (TYPE))
This line creates a "tag", which leaves only 29 (32 - 3) bits to store the field number.
I don't know why Google uses uint32 instead of uint64 here, though; since the key is a varint, maybe they decided 2^29-1 fields is large enough for a single message declaration.
I suspect this is simply so that a field-header (wire-type and tag-number) can be decoded and handled as a 32-bit value. The wire-type is always the 3 least significant bits, leaving 29 bits for the tag number. Technically "varint" should support 64 bits, but it makes sense to limit it to reasonable numbers, not least because "varint" encoding means that larger numbers take more bytes to encode.
Edit: I realise now that this is similar to the linked post, but... it remains true! Each field in protobuf is prefixed by a "varint" that expresses which field (tag-number) follows and what data type it is (wire-type). The latter is especially important so that unexpected fields (version differences) can be stored or skipped correctly. It is convenient for that field-header to be trivially processed by most frameworks, and most frameworks are fine with 32-bit integers.
This is another question rather than a comment. The document says:
Field numbers in the range 16 through 2047 take two bytes. So you
should reserve the numbers 1 through 15 for very frequently occurring
message elements. Remember to leave some room for frequently occurring
elements that might be added in the future.
Because the first byte's top 5 bits are used for the field number and its bottom 3 bits for the field type, isn't it the case that field numbers from 31 (because zero is not used) to 2047 take two bytes? (I also guess the second byte's lower 3 bits are used for the field type as well... I'm in the middle of reading it, so I'll fix this when I know.)
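For what it's worth, the arithmetic behind the quoted "16 through 2047" statement is this: every varint byte spends its top bit as a continuation flag, leaving 7 payload bits; in the first byte, 3 of those payload bits hold the wire type, so only 4 bits (field numbers 1 through 15) remain, and a second byte contributes 7 more payload bits, reaching 2^11 - 1 = 2047. The wire type is not repeated in later bytes. A quick check, reusing the varint key encoding sketched above:

def key_size(field_number, wire_type=0):
    # bytes occupied by the varint-encoded key (field_number << 3) | wire_type
    key = (field_number << 3) | wire_type
    size = 1
    while key > 0x7F:
        key >>= 7
        size += 1
    return size

for n in (1, 15, 16, 2047, 2048):
    print(n, key_size(n))   # 1 and 15 take one byte; 16 and 2047 take two; 2048 takes three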

Does some kind of sorting convention exist?

Is there some established convention for sorting lines (characters)? Some convention that would play a role similar to the one PCRE plays for regular expressions.
For example, if you sort 0A1b-a2_B (each character on its own line) with Sublime Text (Ctrl-F9) and Vim (:%sort), the result is the same (see below). However, I'm not sure it will be the same in other editors and IDEs.
-
0
1
2
A
B
_
a
b
Generally, characters are sorted based on their numeric value. While this used to apply only to ASCII characters, it has been adopted by Unicode encodings as well. http://www.asciitable.com/
If no preference is given to the contrary, this is the de facto standard for sorting characters. Save for the actual alphabetical characters, the ordering is somewhat arbitrary.
There are two main ways of sorting character strings:
Lexicographic: numeric value of either the codepoint values or the code unit values or the serialized code unit values (bytes). For some character encodings, they would all be the same. The algorithm is very simple but this method is not human-friendly.
Culture/Locale-specific: an ordinal database for each supported culture is used. For the Unicode character set, it's called the CLDR. Also, in applying sorting for Unicode, sorting can respect grapheme clusters. A grapheme cluster is a base codepoint followed by a sequence of zero or more non-spacing (applied as extensions of the previous glyph) marks.
For some older character sets with one encoding, designed for only one or two scripts, the two methods might amount to the same thing.
Sometimes, people read a format into strings, such as a sequence of letters followed by a sequence of digits, or one of several date formats. These are very specialized sorts that need to be applied where users expect them. Note: the ISO 8601 date format (which uses the Gregorian calendar) sorts correctly regardless of method (for all? character encodings).
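A small Python sketch of the two approaches, using the characters from the question (the locale name below is an assumption and may not be installed on every system):

import locale

chars = list("0A1b-a2_B")

# Lexicographic sort: plain codepoint order, which is what produced the listing above.
print(sorted(chars))                     # ['-', '0', '1', '2', 'A', 'B', '_', 'a', 'b']

# Culture/locale-specific sort: ordering comes from the active locale's collation rules.
try:
    locale.setlocale(locale.LC_COLLATE, "en_US.UTF-8")
    print(sorted(chars, key=locale.strxfrm))
except locale.Error:
    print("locale not available on this system")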

Convert a long character field to numeric, NOT scientific notation (SAS)

I need to join two tables. One table has householdid as CHAR30, which appears to be center-aligned, and the other has householdid as numeric 20. I need to convert to the numeric 20, but when I do, the value appears truncated, perhaps because of the strange alignment (not all of the 30 positions are actually used).
When I try to keep the full 30 positions as a numeric, I instead get a conversion to scientific notation, so of course it will not work as a key ID for later operations.
As long as the number is converted properly, it doesn't matter what format it has. A format just tells SAS how to show you the number. Behind the scenes, it is just a DOUBLE.
1.0 = 1 = 1e0
Now if you have converted to a number and cannot get a join, then look at the informat you used to read it in.
try
num_id = input(strip(char_id),best32.);
Strip removes leading and trailing blanks. The BEST32. INFORMAT tries its "best" to read the number up to 32 characters in length.
You cannot store a 20 digit number as a numeric in SAS. SAS stores all numbers as 8 byte floating point and so does not have enough bits to represent that many digits uniquely. You can ask SAS what is the largest integer it can represent exactly by using the CONSTANT() function.
data _null_;
  x = constant('EXACTINT', 8);
  put x= comma32.;
run;
x=9,007,199,254,740,992
Read and store your 20 and 30 digit strings as character variables.
Use the bestd32. format. Tends to work out pretty well for long key variables. Depending on the length of the variable, you can change 32 to whichever length you need.
Based on the comments under the original question, the only thing you can do is convert all ID fields to strings and use the strings to do the joins. @Reeza suggested this in one of the comments, but it should have been posted as an answer.
I assume you are pulling this information out of another database/system that allows for greater numeric precision than SAS does. If you don't convert the values to strings when they are read into SAS, then you run the risk of losing precision.
If you lose precision, the ID in SAS is likely to become very slightly different from the ID in the original system, which can cause problems when searching the original system for an ID obtained from SAS.
Be sure you don't read the numbers into SAS as numeric and then convert to string. If you do it that way, you still lose precision as soon as the numbers are stored in SAS as numeric variables.
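A quick illustration of that precision loss, sketched in Python but using the same 8-byte IEEE-754 doubles that SAS numerics use (the 20-digit ID below is made up):

household_id = 12345678901234567890      # 20-digit integer ID
as_float = float(household_id)           # same storage class as a SAS numeric
print(int(as_float))                     # 12345678901234567168 on an IEEE-754 system
print(household_id == int(as_float))     # False: the round trip changed the ID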

Decimal representation: different purposes of zoned decimal and packed decimal

I'm taking a Computer Science course, and when I read these definitions I understand them. But I don't know what the different purposes of the two representations are, or why.
Here is the short explanation of purpose that my book gives:
Zoned decimal: highly compatible with text data.
Packed decimal: faster computing speed.
Something I want to know is:
1) In the zoned decimal representation there is a zone section accompanying every digit. Why? I see no purpose in it :(
2) Why do they say zoned decimal is compatible with text data, and why is packed decimal faster?
Thanks :)
Firstly, where are you learning CS? Those terms are from the 1960s; the more common name is BCD (Binary Coded Decimal).
Zoned decimal uses an entire byte for each digit. This means you can just print a number as if it were text (each 'character' stores a digit 0-9), but since there are only 10 digits and a byte can hold 256 different values, this is a bit wasteful.
Packed decimal uses the fact that 4 bits can store 16 different values. So you can store two digits in a byte (top 4 bits and bottom 4 bits). This is still a bit wasteful, since you don't use the full capacity, but it's pretty easy to extract the two digits with just shift and mask operations.
Pretty much the only place you would see BCD these days is in some low-level hardware where you want to read or transmit a digit without using a microprocessor at all. It's easy to build a BCD counter out of nothing but transistors.
But if you want to do any maths, you either have to do long multiplication on each digit, as you would on paper, or convert to regular ints and back again.
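A small sketch of the two layouts in Python, just to show the byte patterns (the zone nibble here is the ASCII-style 0x3 rather than EBCDIC's 0xF, purely for illustration):

def zoned(n):
    # one byte per digit: zone nibble 0x3 (so the bytes read as ASCII '0'..'9') + digit nibble
    return bytes(0x30 | int(d) for d in str(n))

def packed(n):
    # two digits per byte, with a trailing sign nibble (0xC = +, 0xD = -)
    digits = [int(d) for d in str(abs(n))]
    nibbles = digits + [0xD if n < 0 else 0xC]
    if len(nibbles) % 2:                     # pad to a whole number of bytes
        nibbles = [0] + nibbles
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))

print(zoned(1234).hex())    # 31323334 -> also readable as the text "1234"
print(packed(1234).hex())   # 01234c
print(packed(-42).hex())    # 042d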
Both of these representations have fallen out of favor, perhaps because they are not directly supported by C, and hence by all of the systems descended from Unix.
Packed decimal has an advantage in two respects: since it takes up less space, it can get off the bus and into the processor faster, and many CISC instruction sets have dedicated instructions for decimal arithmetic. To quote from http://en.wikipedia.org/wiki/Packed_decimal#Packed_BCD:
Packed BCD [binary coded decimal] is supported in the COBOL programming language as the
"COMPUTATIONAL-3" (an IBM extension adopted by many other compiler
vendors) or "PACKED-DECIMAL" (part of the 1985 COBOL standard) data
type. Besides the IBM System/360 and later compatible mainframes,
packed BCD was implemented in the native instruction set of the
original VAX processors from Digital Equipment Corporation and was the
native format for the Burroughs Corporation Medium Systems line of
mainframes (descended from the 1950s Electrodata 200 series).
Zoned decimal (http://en.wikipedia.org/wiki/Zoned_decimal#Zoned_decimal) has an easy mapping between characters on punch cards and their representation in memory, which perhaps explains your textbook's claim that it is "highly compatible with text data." As the Wikipedia article suggests, it's a term more used in IBM mainframe circles. On minis, we tended to just call it plain old decimal, PIC 9 data.
"Zoned Decimal" in its natural environment is meant to be compatable with the EBCDIC char set .
ASCII represents numbers as x'3x' -- x'39' which display as character "0" to "9".
The EBCDIC character sets (which has its origins in Hollerith pucnched cards) uses a similar but different scheme where x'F0' is displayed as characer "0' and x'F9' is displayed as character '9'.
Punched cards had a fixed length of 80 characters in many cases 10 or 12 of these characters were eaten up with record type identifiers and sequence numbers (desperately important if you dropped a bunch of cards on the floor!). So space was at a premium. Rather than enter a "+" or "-" character next to each number an "overpunch" extra holes near the top bit of the card was used to represent a positive or negative numbers, so saving a byte.
These overpunched characters were encdoded in EBCDIC as x"D0' to x'D9" for -0 to -9 and x'C0' to x'C9' for +0 to +9 usually in the last digit of the number.
Hence the "Zoned Decimal" format. The first four bits of each byte are the Zone, the second four bits the "number" to -42 was encoded as x'F4D2'.
This is more of a convention than anything else as the computer could not do anything with this format. So it needed to be encoded into "packed" format before any calculations took place. This is pretty easy s 'X'F4D2' -> x'042D' is mostly a case a grabbing the last zone then extracting the "numeric" four bits from each byte, which, could then be converted to binary.
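For illustration, a rough Python sketch of that zoned-to-packed step for x'F4D2' (a real mainframe would use the PACK instruction; this just shows the nibble shuffling):

def zoned_ebcdic_to_packed(zoned: bytes) -> bytes:
    # Zoned EBCDIC: each byte is zone nibble + digit nibble; the last byte's
    # zone nibble carries the sign (F or C = positive, D = negative).
    digits = [b & 0x0F for b in zoned]   # keep the digit nibbles
    sign = zoned[-1] >> 4                # sign comes from the last zone
    nibbles = digits + [sign]
    if len(nibbles) % 2:
        nibbles = [0] + nibbles
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))

print(zoned_ebcdic_to_packed(bytes.fromhex("F4D2")).hex().upper())   # 042D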
When IBM mainframes were designed, the largest groups of users were banks, insurance companies and utility companies. The bulk of their processing followed this pattern:
read punch card
read tape record
add monthly payment to balance
store new balance on tape
print new balance
Most of the calculations involved currency amounts and most of the results were displayed immediately. It became clear that if the machine could do the arithmetic directly on the packed decimal values you could avoid several expensive "convert to binary" and "convert to decimal" instructions. As a bonus it made it easy to place the decimal point at the correct position and perform any decimal rounding. So a great deal of work went into implementing native packed decimal instructions (zero, add, subtract, multiply, divide, shift and round etc.).
This has been the preferred currency format for IBM mainframes ever since.
For many years, developers on other platforms poured scorn on mainframers for using such an archaic format, and only recently began to realize how difficult it is to do fixed-point decimal arithmetic to the standards that accountants and tax collectors expect. Thanks to the efforts of Mike Cowlishaw and others, the rest of the world has caught up with the venerable IBM 360, and Java programmers can now calculate sales tax correctly using the BigDecimal library, which is based on a variation of the old packed decimal format.

What's the name of this algorithm/routine?

I am writing a utility class which converts strings from one alphabet to another. This is useful in situations where you have a target alphabet you wish to use, with a restriction on the number of characters available. For example, if you can use lower case letters and numbers, but only 12 characters, it is possible to compress a timestamp from the alphabet 0123456789 -: into abcdefghijklmnopqrstuvwxyz0123456789, so 2010-10-29 13:14:00 might become 5hhyo9v8mk6avy (19 characters reduced to 16).
The class is designed to convert back and forth between alphabets, and also calculate the longest source string that can safely be stored in a target alphabet given a particular number of characters.
I was thinking of publishing this through Google Code; however, I'd obviously like other people to find it and use it, hence the question of what this is called. I've had to use this approach in two separate projects, with Bloomberg and a proprietary system, when you need to generate unique file names of a certain length but want to keep some plaintext, so GUIDs aren't appropriate.
Your examples bear some similarity to a Dictionary coder with a fixed target and source dictionaries. Also worthwhile to look at is Fibonacci coding, which has a fixed target dictionary (of variable-length bits), which is variably targeted.
I think it also depends on whether it is very important that your target alphabet has fixed-width entries: if you allow a fixed alphabet with variable-length codes, your compression ratio will approach the entropy that much more closely! If the source alphabet distribution is known in advance, a static Huffman tree could easily be generated.
Here is a simple algorithm:
Consider that you don't have to transmit the alphabet used for encoding. Also, you don't use (or transmit) the probabilities of the input symbols, as standard compression methods do, so we just re-encode the data somehow.
In this case we can treat the input data as a number represented in a base equal to the cardinality of the input alphabet. We just have to change its representation to another base, which is a simple task.
EDITED example:
input alphabet: ABC, output alphabet: 0123456789
message ABAC will translate to 0102 in base 3, that is 11 (9 + 2) in base 10.
11 written in the output alphabet (base 10): 11
We could have a problem decoding it, because we don't know how many 0s (leading A's in the example) to put at the beginning of the decoded result, so we have to use one of these modifications:
1) encode the size of the compressed data somewhere in the stream;
2) use a dummy 1 at the start of the stream. In this way our example becomes:
10102 (base 3) = 81 + 9 + 2 = 92 (base 10).
Now after decoding we just have to ignore the leading 1 (this also provides basic error detection).
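Here is a minimal sketch of that scheme in Python, following the dummy-1 trick just described (it assumes every character of the message appears in the source alphabet):

def convert(msg, src_alphabet, dst_alphabet):
    # Interpret msg as a number in base len(src_alphabet), with a dummy leading 1
    # so that leading zero-symbols survive the round trip.
    n = 1
    for ch in msg:
        n = n * len(src_alphabet) + src_alphabet.index(ch)
    # Re-express n in base len(dst_alphabet).
    out = []
    while n:
        n, r = divmod(n, len(dst_alphabet))
        out.append(dst_alphabet[r])
    return "".join(reversed(out))

def invert(code, dst_alphabet, src_alphabet):
    n = 0
    for ch in code:
        n = n * len(dst_alphabet) + dst_alphabet.index(ch)
    out = []
    while n > 1:                        # stop when only the dummy 1 is left
        n, r = divmod(n, len(src_alphabet))
        out.append(src_alphabet[r])
    return "".join(reversed(out))

print(convert("ABAC", "ABC", "0123456789"))   # 92, as in the worked example
print(invert("92", "0123456789", "ABC"))      # ABAC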
The main problem with this approach is that in most cases (GCD == 1) each new encoded character will completely change the output. This is very inefficient and difficult to implement. We end up with arithmetic coding as the best solution (actually a simplified version of it).
You probably know about Base64, which does the same thing, just usually the other way around. Too bad there are way too many Google results for BaseX or BaseN...
