Extracting data from text file in AMPL without adding indexes - matrix

I'm new to AMPL and I have data in a text file in matrix form from which I need to use certain values. However, I don't know how to use the matrices directly without having to manually add column and row indexes to them. Is there a way around this?
So the data I need to use looks something like this, with hundreds of rows and columns (and several more matrices like this), and I would like to use it as a parameter with index i for rows and j for columns.
t=1
0.0 40.95 40.36 38.14 44.87 29.7 26.85 28.61 29.73 39.15 41.49 32.37 33.13 59.63 38.72 42.34 40.59 33.77 44.69 38.14 33.45 47.27 38.93 56.43 44.74 35.38 58.27 31.57 55.76 35.83 51.01 59.29 39.11 30.91 58.24 52.83 42.65 32.25 41.13 41.88 46.94 30.72 46.69 55.5 45.15 42.28 47.86 54.6 42.25 48.57 32.83 37.52 58.18 46.27 43.98 33.43 39.41 34.0 57.23 32.98 33.4 47.8 40.36 53.84 51.66 47.76 30.95 50.34 ...

I'm not aware of an easy way to do this. The closest thing is probably the table format given in section 9.3 of the AMPL Book. This avoids needing to give indices for every term individually, but it still requires explicitly stating row and column indices.
AMPL doesn't seem to do a lot with position-based input formats, probably because it defaults to treating index sets as unordered so the concept of "first row" etc. isn't meaningful.
If you really wanted to do it within AMPL, you could probably put together a work-around along these lines:
declare a single-index param with length equal to the total size of your matrix (e.g. if your matrix is 10 x 100, this param has length 1000)
edit the beginning and end of your "matrix" data file to turn it into appropriate format for a single-index parameter indexed from 1 to n
then define your matrix something like this:
param m{i in 1..nrows,j in 1..ncols} := x[j+i*(ncols-1)];
(not tested, I won't promise that I have rows and columns the right way around there!)
But you're probably better off editing the input file into one of the standard AMPL matrix formats. AMPL isn't really designed for data wrangling - you can do it in a pinch but if you're doing this kind of thing repeatedly it may be less trouble to code it in a general-purpose language e.g. Python.

Related

custom array printing in gdb

I know gdb has several means of exploring data, some of them quite convenient. However, I cannot combine them to get that I need/want. I would like to display some custom string based on the first n values of a big array starting at <PT_arr>, and the last m values of the same array at a distance (in this case) 4096. Looking something like this:
table beginning:
0x804cfe0 <PT_arr>: 0x00100300 0x00200300 0x00300300 0x00400300
table end:
0x804cfe0 <PT_arr+4064>: 0x00500300 0x00600300 0x00700300 0x00800300
printf let's me add custom text (like table beginning)
the examine x gives me that nice alignment, let's me read many elements and group them by byte, words, etc; and shows addresses at the left (which is ideal for my case).
x aligns the content of regions of memory in an easy to read manner with the size and unit parameters. (what I want)
display is constantly printing. (what I want).
The issue with display (manual), is that unlike examine x (manual) it doesn't have a size or unit parameter.
Is there a way to accomplish that?
Thanks.

(Using Julia) How can I reduce my data matrix by averaging values from the same hour?

I am trying to reduce the size of my data and I cannot make it work. I have data points taken every minute over 1 month. I want to reduce this data to have one sample for every hour. The problem is: Some of my runs have "NA" value, so I delete these rows. There is not exactly 60 points for every hour - it varies.
I have a 'Timestamp' column. I have used this to make a 'datehour' column which has the same value if the data set has the same date and hour. I want to average all the values with the same 'datehour' value.
How can I do this? I have tried using the if and for loop below, but it takes so long to run.
Thanks for all your help! I am new to Julia and come from a Matlab background.
======= CODE ==========
uniquedatehour=unique(datehour,1)
index=[]
avedata=reshape([],0,length(alldata[1,:]))
for j in uniquedatehour
for i in 1:length(datehour)
if datehour[i]==j
index=vcat(index,i)
else
rows=alldata[index,:]
rows=convert(Array{Float64,2},rows)
avehour=mean(rows,1)
avedata=vcat(avedata,avehour)
index=[]
continue
end
end
end
There are several layers to optimizing this code. I am assuming that your data is sorted on datehour (your code assumes this).
Layer one: general recommendation
Wrap your code in a function. Executing code in global scope in Julia is much slower than within a function. By wrapping it make sure to either pass data to your function as arguments or if data is in global scope it should be qualified with const;
Layer two: recommendations to your algorithm
Statement like [] creates an array of type Any which is slow, you should use type qualifier like index=Int[] to make it fast;
Using vcat like index=vcat(index,i) is inefficient, it is better to do push!(index, i) in place;
It is better to preallocate avedata with e.g. fill(NA, length(uniquedatehour), size(alldata, 2)) and assign values to an existing matrix than to do vcat on it;
Your code will produce incorrect results if I am not mistaken as it will not catch the last entry of uniquedatehour vector (assume it has only one element and check what happens - avedata will have zero rows)
Line rows=convert(Array{Float64,2},rows) is probably not needed at all. If alldata is not Matrix{Float64} it is better to convert it at the beginning with Matrix{Float64}(alldata);
You can change line rows=alldata[index,:] to a view like view(alldata, index, :) to avoid allocation;
In general you can avoid creation of index vector as it is enough that you remember start s and end e position of the range of the same values and then use range s:e to select rows you want.
If you correct those things please post your updated code and maybe I can help further as there is still room for improvement but requires a bit different algorithmic approach (but maybe you will prefer option below for simplicity).
Layer three: how I would do it
I would use DataFrames package to handle this problem like this:
using DataFrames
df = DataFrame(alldata) # assuming alldata is Matrix{Float64}, otherwise convert it here
df[:grouping] = datehour
agg = aggregate(df, :grouping, mean) # maybe this is all what you need if DataFrame is OK for you
Matrix(agg[2:end]) # here is how you can convert DataFrame back to a matrix
This is not the fastest solution (as it converts to a DataFrame and back but it is much simpler for me).

Format statement with unknown columns

I am attempting to use fortran to write out a comma-delimited file for import into another commercial package. The issue is that I have an unknown number of data columns. My output needs to look like this:
a_string,a_float,a_different_float,float_array_elem1,float_array_elem2,...,float_array_elemn
which would result in something that might look like this:
L1080,546876.23,4325678.21,300.2,150.125,...,0.125
L1090,563245.1,2356345.21,27.1245,...,0.00983
I have three issues. One, I would prefer the elements to be tightly grouped (variable column width), two, I do not know how to define a variable number of array elements in the format statement, and three, the array elements can span a large range--maybe 12 orders of magnitude. The following code conceptually does what I want, but the variable 'n' and the lack of column-width definition throws an error (of course):
WRITE(50,900) linenames(ii),loc(ii,1:2),recon(ii,1:n)
900 FORMAT(A,',',F,',',F,n(',',F))
(I should note that n is fixed at run-time.) The write statement does what I want it to when I do WRITE(50,*), except that it's width-delimited.
I think this thread almost answered my question, but I got quite confused: SO. Right now I have a shell script with awk fixing the issue, but that solution is...inelegant. I could do some manipulation to make the output a string, and then just write it, but I would rather like to avoid that option if at all possible.
I'm doing this in Fortran 90 but I like to try to keep my code as backwards-compatible as possible.
the format close to what you want is f0.3, this will give no spaces and a fixed number of decimal places. I think if you want to also lop off trailing zeros you'll need to do a good bit of work.
The 'n' in your write statement can be larger than the number of data values, so one (old school) approach is to put a big number there, eg 100000. Modern fortran does have some syntax to specify indefinite repeat, i'm sure someone will offer that up.
----edit
the unlimited repeat is as you might guess an asterisk..and is evideltly "brand new" in f2008
In order to make sure that no space occurs between the entries in your line, you can write them separately in character variables and then print them out using theadjustl() function in fortran:
program csv
implicit none
integer, parameter :: dp = kind(1.0d0)
integer, parameter :: nn = 3
real(dp), parameter :: floatarray(nn) = [ -1.0_dp, -2.0_dp, -3.0_dp ]
integer :: ii
character(30) :: buffer(nn+2), myformat
! Create format string with appropriate number of fields.
write(myformat, "(A,I0,A)") "(A,", nn + 2, "(',',A))"
! You should execute the following lines in a loop for every line you want to output
write(buffer(1), "(F20.2)") 1.0_dp ! a_float
write(buffer(2), "(F20.2)") 2.0_dp ! a_different_float
do ii = 1, nn
write(buffer(2+ii), "(F20.3)") floatarray(ii)
end do
write(*, myformat) "a_string", (trim(adjustl(buffer(ii))), ii = 1, nn + 2)
end program csv
The demonstration above is only for one output line, but you can easily write a loop around the appropriate block to execute it for all your output lines. Also, you can choose different numerical format for the different entries, if you wish.

SSRS 2008: Using StDevP from multiple fields / Combining multiple fields in general

I'd like to calculate the standard deviation over two fields from the same dataset.
example:
MyFields1 = 10, 10
MyFields2 = 20
What I want now, is the standard deviation for (10,10,20), the expected result is 4.7
In SSRS I'd like to have something like this:
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value)
Unfortunately this isn't possible, since (Fields!MyField1.Value + Fields!MyField2.Value) returns a single value and not a list of values. Is there no way to combine two fields from the same dataset into some kind of temporary dataset?
The only solutions I have are:
To create a new Dataset that contains all values from both fields. But this is very annoying because I need about twenty of those and I have six report parameters that need to filter every query. => It's probably getting very slow and annoying to maintain.
Write the formula by hand. But I don't really know how yet. StDevP is not that trivial to me. This is how I did it with Avg which is mathematically simpler:
=(SUM(Fields!MyField1.Value)+SUM(Fields!MyField2.Value))/2
found here: http://social.msdn.microsoft.com/Forums/is/sqlreportingservices/thread/7ff43716-2529-4240-a84d-42ada929020e
Btw. I know that it's odd to make such a calculation, but this is what my customer wants and I have to deliver somehow.
Thanks for any help.
CTDevP is standard deviation.
Such expression works fine for me
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value) but it's deviation from one value (Fields!MyField1.Value + Fields!MyField2.Value) which is always 0.
you can look here for formula:
standard deviation (wiki)
I believe that you need to calculate this for some group (or full dataset), to do this you need set in the CTDevP your scope:
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value, "MyDataSet1")

R: Which heatmap/image to get row-sorted plot without any dendrogram?

Which package is best for a heatmap/image with sorting on rows only, but don't show any dendrogram or other visual clutter (just a 2D colored grid with automatic named labels on both axes). I don't need fancy clustering beyond basic numeric sorting. The data is a 39x10 table of numerics in the range (0,0.21) which I want to visualize.
I searched SO (see this) and the R sites, and tried a few out. Check out R Graphical Manual to see an excellent searchable list of screenshots and corresponding packages.
The range of packages is confusing - which one is the preferred heatmap (like ggplot2 is for most other plotting)? Here is what I found out so far:
base::image - bad, no name labels on axes, no sorting/clustering
base::heatmap - options are far less intelligible than the following:
pheatmap::pheatmap - fantastic but can't seem to turn off the
dendrograms? (any hacks?)
ggplot2 people use geom_tile, as Andrie points out
gplots::heatmap.2 , ref - seems
to be favored by biotech people, but way overkill for my purposes. (no
relation to ggplot* or Prof Wickham)
plotrix::color2D.matplot also exists
base::heatmap is annoying, even with args heatmap(..., Colv=NA, keep.dendro=FALSE) it still plots the unwanted dendrogram on rows.
For now I'm going with pheatmap(..., cluster_cols=FALSE, cluster_rows=FALSE) and manually presorting my table, like this guy: Order of rows in heatmap?
Addendum: to display the value inside each cell, see: display a matrix, including the values, as a heatmap . I didn't need that but it's nice-to-have.
With pheatmap you can use options treeheight_row and treeheight_col and set these to 0.
just another option you have not mentioned...package bipartite as it is as simple as you say
library(bipartite)
mat<-matrix(c(1,2,3,1,2,3,1,2,3),byrow=TRUE,nrow=3)
rownames(mat)<-c("a","b","c")
colnames(mat)<-c("a","b","c")
visweb(mat,type="nested")

Resources