Ruby IO#read max length for single read - ruby

How can I determine the maximum length IO#read can read in a single call on the current platform?
irb(main):301:0> File.size('C:/large.file') / 1024 / 1024
=> 2145
irb(main):302:0> s = IO.read 'C:/large.file'
IOError: file too big for single read

That message comes from io.c, in remain_size. It is raised when the (remaining) size of the file is greater than or equal to LONG_MAX, and that value depends on the platform your Ruby was compiled for.
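A more direct way to get that value is to ask Ruby how wide the C `long` it was built with is. This is a minimal sketch. It assumes, as described above, that the limit is LONG_MAX, and the constant names LONG_BYTES and LONG_MAX are only illustrative. On Windows, where a C `long` is 32 bits even in 64-bit builds, it yields 2**31 - 1 (about 2 GiB). That would be consistent with the 2145 MB file in the question failing.

```ruby
# "l!" packs a native C long, so the packed size is sizeof(long) for the C
# compiler this Ruby was built with (4 bytes on Windows, 8 on most 64-bit Unix).
LONG_BYTES = [0].pack("l!").bytesize

# Largest value a signed integer of that width can hold, i.e. C's LONG_MAX.
LONG_MAX = 2**(8 * LONG_BYTES - 1) - 1

puts LONG_MAX
```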
At least in Ruby 1.8.7, the maximum Fixnum value happens to be half of LONG_MAX, rounded down. You can therefore derive the limit from the smallest power of two that is already a Bignum:
2 * 2 ** (1..128).to_a.find { | i | (1 << i).kind_of? Bignum } - 1
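You should not rely on that, though. It depends on an implementation detail, and since Ruby 2.4 Fixnum and Bignum have been unified into Integer, so the snippet above no longer gives the right answer on current Ruby versions.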
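If the goal is simply to process a file larger than that limit, reading it in pieces should avoid the single-read check entirely, because each IO#read(n) call asks for a bounded length rather than "the rest of the file". Below is a minimal sketch. The helper name each_chunk and the 16 MiB default are illustrative choices, not anything from the original answer.

```ruby
# Process a file in fixed-size pieces instead of one whole-file read.
# each_chunk is an illustrative helper; 16 MiB is an arbitrary default.
def each_chunk(path, chunk_size = 16 * 1024 * 1024)
  File.open(path, "rb") do |f|
    # IO#read(n) returns at most n bytes, and nil once end of file is reached.
    while (chunk = f.read(chunk_size))
      yield chunk
    end
  end
end

# Usage: total size without holding the whole file in one string.
# total = 0
# each_chunk("C:/large.file") { |chunk| total += chunk.bytesize }
```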


Dynamic number system in Qlik Sense

My data consists of large numbers. I have a column, say 'amount', and when I use it in charts (sum of amount on the Y axis) it shows something like 1.4G. I want billions shown as, e.g., 2.8B, millions as 80M, and thousands (14,000) simply as 14k.
I have used if(sum(amount)/1000000000 > 1, Num(sum(amount)/1000000000, '#,###B'), Num(sum(amount)/1000000, '#,###M')), but it does not show the M or B at the end of the figure. Also, how do I include thousands in the same expression?
EDIT: Updated to include the dual() function.
This worked for me:
=dual(
    if(sum(amount) < 1, Num(sum(amount), '#,##0.00'),
    if(sum(amount) < 1000, Num(sum(amount), '#,##0'),
    if(sum(amount) < 1000000, Num(sum(amount)/1000, '#,##0k'),
    if(sum(amount) < 1000000000, Num(sum(amount)/1000000, '#,##0M'),
        Num(sum(amount)/1000000000, '#,##0B')
    )))),
    sum(amount)
)
Here are some example outputs using this script to format it:
=sum(amount)     Formatted
2,526,163,764    3B
79,342,364       79M
5,589,255        5M
947,470          947k
583              583
0.6434           0.64
To show decimals, e.g. 2.53B instead of 3B, add zeroes after the decimal point in the format string, as in '#,##0.00B'.
Also make sure that the Number Formatting property is set to Auto or Measure expression.
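The same tiered logic can be sanity-checked outside Qlik. Below is a hypothetical Ruby analogue, not Qlik syntax. `abbreviate` mirrors the thresholds of the expression above, and a small `Dual` struct plays the role of dual(): it carries the display text alongside the underlying number, so that sorting still uses the number. Both names are illustrative, and unlike '#,##0' the sketch adds no thousands separators.

```ruby
# Hypothetical Ruby stand-in for Qlik's dual(): display text plus the number.
Dual = Struct.new(:text, :value)

# Mirrors the thresholds of the Qlik expression; "%.0f" rounds to a whole number.
def abbreviate(amount)
  text =
    if amount < 1                then format("%.2f", amount)
    elsif amount < 1_000         then format("%.0f", amount)
    elsif amount < 1_000_000     then format("%.0fk", amount / 1_000.0)
    elsif amount < 1_000_000_000 then format("%.0fM", amount / 1_000_000.0)
    else                              format("%.0fB", amount / 1_000_000_000.0)
    end
  Dual.new(text, amount)
end

rows = [79_342_364, 583, 947_470].map { |n| abbreviate(n) }
p rows.map(&:text)                  # ["79M", "583", "947k"]
p rows.sort_by(&:value).map(&:text) # ["583", "947k", "79M"]
p rows.map(&:text).sort             # ["583", "79M", "947k"]
```

Sorting by `value` orders the rows numerically. Sorting the labels alone puts "79M" before "947k", which is why keeping the number next to the label, as dual() does, matters.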

Golang readers: why write int64 numbers using the bitwise operator <<?

I have come across the following code when dealing with Go readers to limit the number of bytes read from a remote client when sending a file through multipart upload (e.g. in Postman).
r.Body = http.MaxBytesReader(w, r.Body, 32<<20+1024)
If I am not mistaken, the above notation should represent 33555456 bytes, or 33.555456 MB (32 * 2 ^ 20) + 1024. Or is this number not correct?
What I don't understand is:
Why did the author write it like this? Why use 20 and not some other number?
Why did the author add +1024 at all? Why didn't he write 33 MB instead?
Would it be OK to write 33555456 directly as an int64?
If I am not mistaken, the above notation should represent 33555456 bytes, or 33.555456 MB (32 * 2 ^ 20) + 1024. Or is this number not correct?
Correct. You can trivially check it yourself.
fmt.Println(32<<20+1024)
Why didn't he write 33 MB instead?
Because this number is not 33 MB. 33 * 1024 * 1024 = 34603008
would it be OK to write 33555456 directly as int64?
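Naturally. Go evaluates constant expressions such as 32<<20+1024 exactly at compile time, so the compiled program contains 33555456 either way. The shifted notation is probably easier to read once you see the logic behind 32, 20 and 1024: 1<<20 is 2^20 = 1,048,576 bytes (1 MiB), so 32<<20 is 32 MiB, plus 1024 bytes (1 KiB).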
Ease of reading is why I almost always (when not using Ruby) write constants like "50 MB" as 50 * 1024 * 1024 and "30 days" as 30 * 86400.
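The arithmetic is easy to check in Ruby too, with one catch: operator precedence differs. In Go, << has the same precedence as * and so binds tighter than +. In Ruby (as in C), + binds tighter than <<, so the Go expression needs explicit parentheses. A minimal sketch; MIB and LIMIT are illustrative names.

```ruby
# Go's 32<<20+1024 spelled out in Ruby. Without the parentheses, Ruby would
# parse 32 << 20 + 1024 as 32 << 1044, because + binds tighter than << here.
MIB = 1 << 20              # 2**20 = 1_048_576 bytes, one mebibyte
LIMIT = (32 << 20) + 1024  # 32 MiB plus 1 KiB

puts LIMIT                     # 33555456
puts LIMIT == 32 * MIB + 1024  # true
puts 33 * MIB                  # 34603008, so "33 MB" would be a different limit
```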

MPI_Allreduce with Fortran and 2-byte integers

I'm trying to do an MPI sum of 2-byte integers:
INTEGER, PARAMETER :: SIK2 = SELECTED_INT_KIND(2)
INTEGER(SIK2) :: s_save(dim)
It is an array whose values range from 1 to 48 at most, so 2 bytes are enough, and I want to keep memory usage low.
Therefore I tried the following:
CALL MPI_TYPE_CREATE_F90_INTEGER(SIK2, int2type, ierr)
CALL MPI_ALLreduce(MPI_IN_PLACE, s_save, nkpt_in, int2type, MPI_SUM, world_comm, ierr)
This works well with gfortran + Open MPI.
However, with Intel I get a crash:
MPI_Allreduce(1000)......: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x55d2160, count=987, dtype=USER<f90_integer>, MPI_SUM, MPI_COMM_WORLD) failed
MPIR_SUM_check_dtype(106): MPI_Op MPI_SUM operation not defined for this datatype
Is there a proper (or recommended) way to do this so that it works for most compilers?

Strange Value for MF_MT_AM_FORMAT_TYPE and MF_MT_H264_MAX_MB_PER_SEC

I am trying to enumerate the video capture formats of a Logitech camera. I am using this.
I got the following entries:
MF_MT_FRAME_SIZE 640 x 480
MF_MT_AVG_BITRATE 6619136
MF_MT_COMPRESSED 1
MF_MT_H264_MAX_MB_PER_SEC 245,0,245,0,0,0,0,0,0,0
MF_MT_MAJOR_TYPE MFMediaType_Video
MF_MT_H264_SUPPORTED_USAGES 3
MF_MT_H264_SUPPORTED_RATE_CONTROL_MODES 15
MF_MT_AM_FORMAT_TYPE {2017BE05-6629-4248-AAED-7E1A47BC9B9C}
MF_MT_H264_SUPPORTED_SYNC_FRAME_TYPES 2
MF_MT_MPEG2_LEVEL 40
MF_MT_H264_SIMULCAST_SUPPORT 0
MF_MT_MPEG2_PROFILE 256
MF_MT_FIXED_SIZE_SAMPLES 0
MF_MT_H264_CAPABILITIES 33
MF_MT_FRAME_RATE 30 x 1
MF_MT_PIXEL_ASPECT_RATIO 1 x 1
MF_MT_H264_SUPPORTED_SLICE_MODES 14
MF_MT_ALL_SAMPLES_INDEPENDENT 0
MF_MT_FRAME_RATE_RANGE_MIN 30 x 1
MF_MT_INTERLACE_MODE 2
MF_MT_FRAME_RATE_RANGE_MAX 30 x 1
MF_MT_H264_RESOLUTION_SCALING 3
MF_MT_H264_MAX_CODEC_CONFIG_DELAY 1
MF_MT_SUBTYPE MFVideoFormat_H264_ES
MF_MT_H264_SVC_CAPABILITIES 1
Note: I have modified the function in Media Type Debugging Code as follows. When I run the program, I get cElement = 10, and I print pElement in a for loop to get this value: MF_MT_H264_MAX_MB_PER_SEC 245,0,245,0,0,0,0,0,0,0
case VT_VECTOR | VT_UI1:
{
    //DBGMSG(L"<<byte array Value>>");
    // Item count: the blob is a byte array, so divide by the size of a UINT.
    UINT cElement = var.caub.cElems / sizeof(UINT);
    // View the byte array as an array of UINT values.
    UINT* pElement = (UINT*)(var.caub.pElems);
    for (UINT i = 0; i < cElement; i++)
        DBGMSG(L"%u,", pElement[i]);   // %u: the values are unsigned
    break;
}
I am not able to find out what these values signify:
MF_MT_AM_FORMAT_TYPE {2017BE05-6629-4248-AAED-7E1A47BC9B9C}
MF_MT_H264_MAX_MB_PER_SEC 245,0,245,0,0,0,0,0,0,0
MSDN explains the value of the MF_MT_H264_MAX_MB_PER_SEC attribute:
Data type
UINT32[] stored as UINT8[]
Hence, array of bytes is the expected formatting.
The value of the attribute is an array of UINT32 values, which correspond to the following fields in the UVC 1.5 H.264 video format descriptor.
You have:
dwMaxMBperSecOneResolutionNoScalability
Specifies the maximum macroblock processing rate allowed for
non-scalable Advanced Video Coding (AVC) streams, summing up across
all layers when all layers have the same resolution.
16056565
dwMaxMBperSecTwoResolutionsNoScalability
Specifies the maximum macroblock processing rate allowed for
non-scalable AVC streams, summing up across all layers when all layers
consist of two different resolutions.
0
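Here is where 16056565 comes from. Reading the values printed in the question as individual bytes, as this answer does, the first four (245, 0, 245, 0) form one little-endian UINT32: 245 + 245 * 65536 = 16056565. The next four zeros form the second field, 0. Below is a small Ruby sketch of that decoding, assuming little-endian byte order (as on x86/x64 Windows).

```ruby
# First eight values printed in the question, read as raw bytes of the blob.
bytes = [245, 0, 245, 0, 0, 0, 0, 0]

# "C*" packs each integer as one unsigned byte; "V*" then reads the buffer as
# unsigned 32-bit little-endian integers, four bytes at a time.
fields = bytes.pack("C*").unpack("V*")

p fields                              # [16056565, 0]
p fields.first == 245 + 245 * 65_536  # true
```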
The media type GUID "2017be05-6629-4248-aaed-7e1a47bc9b9c" is FORMAT_UVCH264Video.
You can then cast the pbFormat buffer to KS_H264VIDEOINFO*.

Ruby big array and memory

I created a big array a, whose memory grew to ~500 MB:
a = []
t = Thread.new do
  loop do
    sleep 1
    print "#{a.size} "
  end
end
5_000_000.times do
  a << [rand(36**10).to_s(36)]
end
puts "\n size is #{a.size}"
a = []
t.join
After that, I "cleared" a, but the allocated memory didn't change until I killed the process. Is there something special I need to do to free all the data that was assigned to a?
If I use the Ruby Garbage Collection Profiler on a slightly modified version of your code:
GC::Profiler.enable
GC::Profiler.clear
a = []
5_000_000.times do
  a << [rand(36**10).to_s(36)]
end
puts "\n size is #{a.size}"
a = []
GC::Profiler.report
I get the following output on Ruby 1.9.3 (some columns and rows removed):
GC 60 invokes.
Index  Invoke Time(sec)  Use Size(byte)  Total Size(byte)  ...
    1             0.109          131136            409200  ...
    2             0.125          192528            409200  ...
...
   58            33.484       199150344         260938656  ...
   59            36.000       211394640         260955024  ...
The profile starts with 131 136 bytes used and ends with 211 394 640 bytes used, without decreasing anywhere in the run, so we can assume that none of these collections freed the array's data.
If I then add one line of code that appends a single element to a, placed after a has grown to 5 million elements and then been reassigned an empty array:
GC::Profiler.enable
GC::Profiler.clear
a = []
5_000_000.times do
  a << [rand(36**10).to_s(36)]
end
puts "\n size is #{a.size}"
a = []
# the only change is to add one element to the (now) empty array a
a << [rand(36**10).to_s(36)]
GC::Profiler.report
This changes the profiler output to (some columns and rows removed):
GC 62 invokes.
Index  Invoke Time(sec)  Use Size(byte)  Total Size(byte)  ...
    1             0.156          131376            409200  ...
    2             0.172          192792            409200  ...
...
   59            35.375       211187736         260955024  ...
   60            36.625       211395000         469679760  ...
   61            41.891         2280168         307832976  ...
This run starts with 131 376 bytes used, similar to the previous run, and grows as before. It ends with only 2 280 168 bytes used, however, far below the 211 394 640 bytes at the end of the previous run. So we can assume that a garbage collection took place during this run, probably triggered by the new line of code that adds an element to a.
The short answer is no, you don't need to do anything special to remove the data that was assigned to a, but hopefully this gives you the tools to prove it.
You can call GC.start, but you might not want to; see, for example, the Stack Overflow question "Ruby garbage collect" for a discussion. Basically, I'd let the garbage collector decide for itself when to run unless you have a compelling reason to force it.
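To check this on a current Ruby without reading profiler tables, here is a minimal sketch using GC.stat. It assumes Ruby 2.2 or later for the :heap_live_slots key, and live_slots is an illustrative helper. GC.start is forced here only to make the effect measurable, in line with the advice above not to do so in normal code. after_gc is typically far lower than with_array, but CRuby scans the machine stack conservatively, so an exact drop is not guaranteed. These counts describe Ruby's object heap, not the process memory the OS reports, which may stay high after objects are freed.

```ruby
# Assumes Ruby >= 2.2, where GC.stat reports :heap_live_slots (live Ruby objects).
def live_slots
  GC.stat(:heap_live_slots)
end

# Same shape as the question's data, but smaller: one-element arrays of strings.
a = Array.new(500_000) { [rand(36**10).to_s(36)] }
with_array = live_slots

a = []       # drop the only reference to the big array
GC.start     # forced here only to make the effect measurable
after_gc = live_slots

puts "live slots while referenced: #{with_array}"
puts "live slots after a = [] and GC.start: #{after_gc}"
```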
