gnuplot experiencing overflow?

I'm trying to plot certain data created with fortran and I find that gnuplot simply ignores part of my data.
After double- and triple-checking that I had gotten all the I/O right and that gnuplot could read the data I was creating, I came to realize that gnuplot doesn't accept observations greater than or equal to 1.0E308 in absolute value.
I don't think it's an issue with my machine, as fortran is creating this data without complaining.
Therefore, my question is whether there's a way within gnuplot to figure out what the largest number it will handle is, and whether that limit is adjustable in any way. I couldn't find anything of the sort in the documentation.
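For context, the hard ceiling of IEEE 754 double precision, which is what gnuplot (like most plotting tools) typically uses to store coordinates, can be checked from Python; note that 1.0E308 itself is still below that ceiling:

    import sys

    # Largest finite IEEE 754 double; nothing beyond this fits in a C double.
    print(sys.float_info.max)            # 1.7976931348623157e+308
    print(sys.float_info.max > 1.0e308)  # True: 1.0E308 is still representable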


LabVIEW - How to accumulate data in an array?

I made a program meant to simulate the intensity of light when many light bulbs are put together. I have the intensity data for one bulb in .xls files. I want the program to work as follows.
Open the .xls file and get the data.
Put the data into different positions. I put one data set (one bulb) in each Excel sheet; this simulates putting the bulb in different places.
Sum the data in the same cell across the different sheets.
My LabVIEW front panel and block diagram are shown in the attached screenshots.
My problem is that this program runs too slowly. How should I improve it? I have an idea to make a big array and accumulate the data in that array; however, I do not know how to do it. The Insert Into Array and Replace Array Subset functions are not suitable for my purposes.
The most probable reason for the slow performance is that you perform a lot of operations on the Excel file. You should instead read the data into memory and operate on it in the VI. At the end, if you need to, you can update the Excel file with the final results. (A rough sketch of this read-once, sum-in-memory idea appears after the links below.)
It would be difficult to tell you exactly how to do it. As you said, you're a beginner, and I think the best way would be to simply do some LabVIEW exercises and gain more experience to understand how to work with arrays :) I recommend taking a look at the examples (Help->Find Examples), reading some user guides from ni.com, or finding other "getting started" materials on the Internet.
Check these, you may find them useful:
https://zone.ni.com/reference/en-XX/help/371361R-01/lvhowto/lv_getting_started/
https://www.ni.com/getting-started/labview-basics/data-structures
https://www.ni.com/pl-pl/support/documentation/supplemental/08/labview-arrays-and-clusters-explained.html
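Not LabVIEW, but here is a rough sketch of the "read everything once, sum in memory, write once" idea described above, in Python with pandas (the file names are made up, and each sheet is assumed to be a plain numeric grid of the same shape):

    import pandas as pd

    # Read every sheet of the workbook into memory in one pass;
    # sheet_name=None returns a dict of DataFrames, one per sheet.
    sheets = pd.read_excel("bulb_intensity.xlsx", sheet_name=None, header=None)

    # Sum the grids cell by cell (all sheets assumed to have the same shape).
    total = sum(df.to_numpy() for df in sheets.values())

    # Write the combined result back out once, at the very end.
    pd.DataFrame(total).to_excel("combined.xlsx", header=False, index=False)

The same pattern applies in LabVIEW: one read pass, accumulation in an in-memory array, and a single write at the end.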

ELKI and ARFF files

I compare my results against a result base, but I keep getting different results from it, even though I have their data.
I wonder whether it is hard to get the same results, and why; maybe it is because they invoked it from a Java program while I do it in the GUI with ARFF files, which is said to be troublesome and not actively developed right now.
My question is: would the results of a ROC AUC curve produced by ELKI vary if I invoked it from a Java program rather than from the GUI, as I do now? I would like to get precise results and to know that I am doing it right.
Results from the MiniGUI are precise.
The UI is an assistant for building a command line, but that doesn't introduce any imprecision. It may introduce some performance cost when e.g. -verbose is used. The visualization may cause memory problems after the algorithm has finished.
Obviously, the input format (CSV or ARFF) shouldn't have any impact on the outcome. Unless you introduce incorrect additional columns, e.g. an id column that should not be used for analysis...

My Algorithm only fails for large values - How do I debug this?

I'm working on transcribing as3delaunay to Objective-C. For the most part, the entire algorithm works and creates graphs exactly as they should be. However, for large values (thousands of points), the algorithm mostly works, but creates some incorrect graphs.
I've been going back through and checking the most obvious places for error, and I haven't been able to actually find anything. For smaller inputs I ran the original algorithm and wrote its output to JSON files. I then read that output into my own tests (tests with only 3 or 4 points) and debugged until the output matched; I checked the output of the two algorithms line by line and found the discrepancies. But I can't feasibly do that for 1000 points.
Answers don't need to be specific to my situation (although suggesting tools I can use would be excellent).
How can I debug algorithms that only fail for large values?
If you are transcribing an existing algorithm to Objective-C, do you have a working original in some other language? In that case, I would be inclined to put in print statements in both versions and debug the first discrepancy (the first, because later discrepancies could be knock-on errors).
I think it is very likely that the program also makes mistakes for smaller graphs, but more rarely. My first step would in fact be to use the working original (or some other means) to run a large number of automatically checked test runs on small graphs, hoping to find the bug on some more manageable input size.
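A sketch of that automated checking, as a Python harness comparing the two implementations on many small random inputs (the executable names and the JSON-on-stdin/stdout interface are assumptions for illustration; since you already dump JSON, something similar may be within reach):

    import json
    import random
    import subprocess

    def triangulate(cmd, points):
        # Run an external triangulator that reads points as JSON on stdin
        # and writes its edge list as JSON on stdout (hypothetical interface).
        result = subprocess.run(cmd, input=json.dumps(points),
                                capture_output=True, text=True, check=True)
        return json.loads(result.stdout)

    random.seed(0)  # reproducible failures
    for trial in range(10000):
        n = random.randint(3, 12)  # stay small so a failure is debuggable
        points = [[random.uniform(0, 100), random.uniform(0, 100)] for _ in range(n)]
        expected = triangulate(["./as3delaunay_reference"], points)  # known-good original
        actual = triangulate(["./objc_port"], points)                # port under test
        if sorted(map(tuple, expected)) != sorted(map(tuple, actual)):
            print("Mismatch on trial", trial, "with points:", points)
            break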
Find the threshold
If it works for 3 or 4 items, but not for 1000, then there's probably some threshold in between. Use a binary search to find that threshold.
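A minimal sketch of that binary search, assuming you can wrap one run in a works_for(n) predicate (for example: generate n random points, run the port, compare against the reference) and that failures are roughly monotonic in n:

    def find_threshold(works_for, lo=4, hi=1000):
        # Smallest n in (lo, hi] for which works_for(n) is False,
        # assuming works_for(lo) is True and works_for(hi) is False.
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if works_for(mid):
                lo = mid   # still fine at mid, so the failure lies above
            else:
                hi = mid   # already failing at mid, so look at or below
        return hi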
The threshold itself may be a clue. For example, maybe it corresponds to a magic value in the algorithm or to some other value you wouldn't expect to be correlated. For example, perhaps it's a problem when the number of items exceeds the number of pixels in the x direction of the chart you're trying to draw. The clue might be enough to help you solve the problem. If not, it may give you a clue as to how to force the problem to happen with a smaller value (e.g., debug it with a very narrow chart area).
The threshold may be smaller than you think, and may be directly debuggable.
If the threshold is a big value, like 1000, perhaps you can set a conditional breakpoint to skip right to iteration 999 and then single-step from there.
There may not be a definite threshold, which suggests that it's not the magnitude of the input size, but some other property you should be looking at (e.g., powers of 10 don't work, but everything else does).
Decompose the problem and write unit tests
This can be tedious but is often extremely valuable--not just for the current issue, but for the future. Convince yourself that each individual piece works in isolation.
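For example, with a geometry-heavy port you might pin down the low-level helpers first. A hypothetical pytest-style check (the circumcenter function and its module are made-up names standing in for whatever your code actually exposes):

    import math
    from delaunay_helpers import circumcenter  # hypothetical helper under test

    def test_circumcenter_of_right_triangle():
        # For a right triangle, the circumcenter is the midpoint of the hypotenuse.
        cx, cy = circumcenter((0.0, 0.0), (4.0, 0.0), (0.0, 3.0))
        assert math.isclose(cx, 2.0)
        assert math.isclose(cy, 1.5)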
Re-visit recent changes
If it used to work and now it doesn't, look at the most recent changes first. Source control tools are very useful in helping you remember what has changed recently.
Remove code and add it back piece by piece
Comment out as much code as you can and still get some kind of reasonable output (even if that output doesn't meet all the requirements). For example, instead of using a complicated rounding function, just truncate values. Comment out code that adds decorative touches. Put assert(false) in any special case handlers you don't think should be activated for the test data.
Now verify that output, and slowly add back the functionality you removed, one baby step at a time. Test thoroughly at each step.
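For instance (a hedged Python illustration; both functions are hypothetical stand-ins):

    # Instead of the complicated rounding routine, just truncate while debugging.
    def simplified_round(x):
        return int(x)  # crude stand-in for the real rounding logic

    # A special case the test data should never reach; make it impossible
    # for it to run silently while the problem is being narrowed down.
    def handle_collinear_points(points):
        assert False, "collinear-point handler hit unexpectedly: %r" % (points,)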
Profile the code
Profiling is usually for optimization, but it can sometimes give you insight into code, especially when the data size is too large for single-stepping through the debugger. I like to use line or statement counts. Is the loop body executing the number of times you expect? Or twice as often? Or not at all? How about the then and else clauses of those if statements? Logic bugs often become very obvious with this type of profiling.
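A rough sketch of poor-man's statement counting in Python (the loop and branch are stand-ins for whatever the real algorithm does); cProfile gives per-function call counts if you would rather not instrument by hand:

    from collections import Counter

    counts = Counter()

    def process(items):
        for item in items:
            counts["loop body"] += 1
            if item % 2 == 0:              # stand-in for the real condition
                counts["then branch"] += 1
            else:
                counts["else branch"] += 1

    process(range(1000))
    print(counts)  # do the counts match what you expect for 1000 items?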

Quickest way to write/read/cache simple information

I do apologize for the terrible question. I'm a 3D guy who dabbles in Python for plugins and scripts.
I've successfully come up with the worst possible way to export particle information (two vectors per particle per frame for position and alignment). My first method was to write out a billion vectors per line to a .txt where each line represented a frame. Now I have it just writing out a .txt per frame and loading and closing the right one depending on the frame.
Yeah, it's slow. And dumb. Whatever. What direction would you suggest I go, or what should I research? A different file type? A :checks google: bin, perhaps? Or should my crude method actually not take very long, and something else is making things move more slowly? I don't need an exhaustive answer, just some general information to get me moving in the right direction.
Thanks a million.
If this info is going to be read by another Python application (especially if it's the same application that wrote it out), look into just pickling your data structures. Just build them in memory and use pickle to dump them out to a binary file (see the sketch after these caveats). The caveats here:
1) Do you have the memory to do it all at once, or does it have to be one frame at a time? You can make big combined files in the first case; you'd need to do one-frame-per-file in the second. If you're running out of memory, the yield statement is your friend.
2) Pickled files need to be written and read with the same Python version to be reliable, so you need to be sure all the reading and writing apps are on the same Python version.
3) Pickled files are binary, so not human readable.
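A minimal sketch of the pickle route (file name and per-frame layout are made up; adapt to however your vectors are actually organized):

    import pickle

    # Dummy stand-in for the exporter's data: one list per frame of
    # (particle_id, position, alignment) tuples. Replace with the real source.
    simulated_frames = [
        [(0, (0.0, 1.0, 2.0), (1.0, 0.0, 0.0)),
         (1, (3.0, 4.0, 5.0), (0.0, 1.0, 0.0))],
        [(0, (0.1, 1.1, 2.1), (1.0, 0.0, 0.0)),
         (1, (3.1, 4.1, 5.1), (0.0, 1.0, 0.0))],
    ]

    # Build the whole structure in memory: frame index -> {particle id: (pos, align)}.
    frames = {
        i: {pid: (pos, align) for pid, pos, align in particles}
        for i, particles in enumerate(simulated_frames)
    }

    # Write everything to one binary file in a single pass...
    with open("particles.pkl", "wb") as f:
        pickle.dump(frames, f, protocol=pickle.HIGHEST_PROTOCOL)

    # ...and load it back in the consuming script.
    with open("particles.pkl", "rb") as f:
        frames = pickle.load(f)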
If you need to exchange with other applications, look into Alembic, which is an open-source file format designed for this sort of problem - baking out large volumes of particle or simulation data. There's a commercial exporter available from EcoCortex which comes with a Python module for dealing with Alembic data.

Techniques for handling arrays whose storage requirements exceed RAM

I am author of a scientific application that performs calculations on a gridded basis (think finite difference grid computation). Each grid cell is represented by a data object that holds values of state variables and cell-specific constants. Until now, all grid cell objects have been present in RAM at all times during the simulation.
I am running into situations where the people using my code wish to run it with more grid cells than they have available RAM. I am thinking about reworking my code so that information on only a subset of cells is held in RAM at any given time. Unfortunately the grids (or matrices if you prefer) are not sparse, which eliminates a whole class of possible solutions.
Question: I assume that there are libraries out in the wild designed to facilitate this type of data access (i.e. retrieve constants and variables, update variables, store for future reference, wipe memory, move on...). After several hours of searching Google and Stack Overflow, I have found relatively few libraries of this sort.
I am aware of a few options, such as this one from the HSL mathematical library: http://www.hsl.rl.ac.uk/specs/hsl_of01.pdf. I'd prefer to work with something that is open source and is written in Fortran or C. (my code is mostly Fortran 95/2003, with a little C and Python thrown in for good measure!)
I'd appreciate any suggestions regarding available libraries or advice on how to reformulate my problem. Thanks!
Bite the bullet and roll your own?
I deal with too-large data all the time, such as 30,000+ series of half-hourly data that span decades. Because of the regularity of the data (though daylight-saving changeovers are a nuisance), it proved quite straightforward to devise a scheme involving a random-access disc file and procedures ReadDay and WriteDay that take a series number and a day number, with further details because series start and stop at different dates. Thus, what used to be an array access like Array(Run,DayNum) becomes ReturnCode = ReadDay(Run,DayNum,Array) and so forth, the return codes indicating the presence or absence of that day's data, etc. The key is that a day's data is a convenient and (almost) regular size, and although my program allocates a buffer of one record per series, it runs in ~100MB of memory rather than gigabytes.
Because your array is non-sparse, it is regular. Given that a grid cell's data are of fixed size, you could devise a random-access disc file with each record holding one cell, or perhaps a row's worth of cells (or a column's worth), or some other worthwhile blob size. I chose 4,096 bytes per record, as that is the disc file allocation size. Let the computer's operating system and disc storage controller do whatever buffering to real memory they feel up to. Typical execution is then limited by the speed of data transfer, unless the computation on the local data is heavy; thus I see CPU use of a few percent until data requests start being satisfied from buffers.
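The same record-file idea sketched in Python rather than Fortran (record layout, grid width, and the values-per-cell count are assumptions for illustration; each record holds one cell's values as fixed-size binary):

    import struct

    VALUES_PER_CELL = 8                          # assumed doubles per grid cell
    RECORD = struct.Struct("<%dd" % VALUES_PER_CELL)
    NCOLS = 1000                                 # grid width, to linearize (i, j)

    def read_cell(f, i, j):
        # Fetch one cell's values from its fixed offset in the grid file.
        f.seek((i * NCOLS + j) * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))

    def write_cell(f, i, j, values):
        # Store one cell's values back at the same fixed offset.
        f.seek((i * NCOLS + j) * RECORD.size)
        f.write(RECORD.pack(*values))

    # Open the (pre-sized) file once in "r+b" mode and let the operating
    # system's cache do the buffering, as suggested above.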
Because Fortran uses the same syntax for function references as for array references (unlike, say, Pascal), instead of declaring DIMENSION ARRAY(Big,Big) you would remove that declaration and devise FUNCTION ARRAY(i,j), and all read references in your source file stay as they are. Alas, in the absence of a "palindromic" function declaration, assignments of values to your array will have to be done with a different syntax, so you devise a subroutine or similar. Possibly a scratchpad array could be collated, worked upon with the convenient syntax, and then written back if changed.
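In Python, the closest analogue to keeping plain array syntax is numpy.memmap, which presents a disk-backed file as an ordinary array and lets the operating system page pieces in and out on demand (file name, dtype, and shape below are assumptions):

    import numpy as np

    # Create (or reopen with mode="r+") a disk-backed array larger than RAM;
    # normal indexing works, and only the touched pages become resident.
    grid = np.memmap("grid_state.dat", dtype=np.float64,
                     mode="w+", shape=(50_000, 50_000))   # ~20 GB on disk

    grid[123, 456] = 42.0           # reads and writes look like an in-memory array
    block = grid[1000:1010, :100]   # slices only pull in the pages they cover

    grid.flush()                    # push dirty pages back out to the file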
