Iteratively populate dataframes using a for loop in Julia


I am looking to find a way to iteratively populate a dataframe in Julia.
I have a working function that creates multiple points along a line:
#function to draw QMD lines
using DataFrames
function make_lines(qmd)
    BA=Float64[]
    TPA=Float64[]
    QMD=Int[]
    for i in stk_percent
        tpa= 1*(i*10)/(a[1]+a[2]*(-0.259+0.973*qmd)+a[3]*qmd^2)
        ba=pi*(qmd/24)^2*tpa
        push!(TPA,tpa)
        push!(BA,ba)
        push!(QMD,qmd)
    end
    return DataFrame(TPA=TPA,BA=BA,QMD=QMD)
end
The next step I am trying to accomplish is to run the make_lines function in a loop over a predefined set of inputs and collect all the outputs in one single dataframe, but I cannot get it to work.
dia = [7, 8, 10, 12, 14, 16, 18, 20, 22]
# can't get for loop to append all the data frames?
for i in dia
    df=DataFrame(TPA=Float64[],BA=Float64[],QMD=Int[])
    append!(df,make_lines(i))
    return df
end
At first I thought the problem was how I was using DataFrames, since I have never used push! etc. before, but I got this code chunk to work:
#this works to combine dataframe
test=make_lines(22)
test2=make_lines(8)
test[:]
append!(test,test2)
So why, when I run the for loop, do I end up with only the last dataframe it produces?
Am I misinterpreting something? From what I have read, DataFrames in Julia work differently from data frames in R, but I cannot wrap my head around how to get this working.

You are pretty close, but there are a couple of places where you are getting tripped up in your code. You currently have:
dia = [7, 8, 10, 12, 14, 16, 18, 20, 22]
# can't get for loop to append all the data frames?
for i in dia
    df=DataFrame(TPA=Float64[],BA=Float64[],QMD=Int[])
    append!(df,make_lines(i))
    return df
end
This isn't quite what you want for two reasons:
One: this snippet isn't a function, so it doesn't make sense to have return in it, and having it there will cause problems.
Two: At each step in your loop, you are re-creating your dataframe df from scratch, erasing everything that you put before it. This is why, as you say, you only end up with the last data frame that it produces. Instead, you would want something like:
dia = [7, 8, 10, 12, 14, 16, 18, 20, 22]
df=DataFrame(TPA=Float64[],BA=Float64[],QMD=Int[])
for i in dia
    append!(df,make_lines(i))
end
Note: I couldn't get a completely working version of your code going, because the objects stk_percent and a in your main function are never defined, so I didn't know what to put in for them. I made up some values for them and the loop above worked fine, so once you fix that you should be in a much better spot.
Performance tip: when you do fix those, I recommend making them explicit arguments that you pass to your function. It will still work if they are just global variables, but that leads to suboptimal performance of your code, both now and in the future, and potentially worse things, like confusion about variable scope or values changing when you don't expect them to. It's best to adopt as many good practices as is practicable right from the beginning of your journey with Julia.
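Putting both suggestions together, here is a minimal sketch that runs end to end. The values of stk_percent and a are made up purely so the code executes (the question never defines them), and the three-argument make_lines signature follows the performance tip rather than the asker's original one-argument version. The final line shows reduce(vcat, ...), which stacks a vector of DataFrames in a single call, as an alternative to appending inside the loop.

```julia
using DataFrames

# make_lines with stk_percent and the coefficients a passed in explicitly,
# instead of being read from global variables.
function make_lines(qmd, stk_percent, a)
    TPA = Float64[]
    BA = Float64[]
    QMD = Int[]
    for i in stk_percent
        tpa = 1 * (i * 10) / (a[1] + a[2] * (-0.259 + 0.973 * qmd) + a[3] * qmd^2)
        ba = pi * (qmd / 24)^2 * tpa
        push!(TPA, tpa)
        push!(BA, ba)
        push!(QMD, qmd)
    end
    return DataFrame(TPA = TPA, BA = BA, QMD = QMD)
end

# Hypothetical inputs: made-up values chosen only so the example runs.
stk_percent = 10:10:100
a = (0.1, 0.1, 0.01)
dia = [7, 8, 10, 12, 14, 16, 18, 20, 22]

# Create the empty DataFrame once, before the loop, then append each chunk to it.
df = DataFrame(TPA = Float64[], BA = Float64[], QMD = Int[])
for d in dia
    append!(df, make_lines(d, stk_percent, a))
end

# Alternative: build all the pieces first, then stack them with one vcat.
df2 = reduce(vcat, [make_lines(d, stk_percent, a) for d in dia])
```

Both df and df2 end up with one row per (diameter, stocking value) pair, 90 rows with these made-up inputs.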

I managed to create an empty dataframe by providing the column types and the column names. With current versions of DataFrames, each type is first turned into an empty vector of that type:
using DataFrames, Dates
types = [DateTime; fill(Float64, 2); String; fill(Float64, 2)]
df = DataFrame([T[] for T in types], ["Date","A","B","Letter","C","D"])
Then, inside the for loop, I populate it by calling rename! on each result so that its column names match, followed by append! to add its rows.
This is especially useful for large datasets with many columns.
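The rename!-then-append! pattern might look like the following sketch. make_rows is a hypothetical stand-in for whatever produces each chunk; it deliberately uses column names (x1 to x6) that differ from the target table. That is the case where renaming first matters, because append! by default requires both tables to have the same set of column names.

```julia
using DataFrames, Dates

# Empty target table: one zero-length, typed column per name.
types = [DateTime; fill(Float64, 2); String; fill(Float64, 2)]
colnames = ["Date", "A", "B", "Letter", "C", "D"]
df = DataFrame([T[] for T in types], colnames)

# Hypothetical producer of one chunk; its column names (x1 to x6) differ from df's.
function make_rows(i)
    return DataFrame(x1 = [DateTime(2020, 1, i)], x2 = [1.0 * i], x3 = [2.0 * i],
                     x4 = [string('a' + i - 1)], x5 = [3.0 * i], x6 = [4.0 * i])
end

for i in 1:3
    part = make_rows(i)
    rename!(part, colnames)   # first give the chunk the target's column names
    append!(df, part)         # then add its rows to df
end
```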

Related

Is there a drawback in using rxjs for readonly collection manipulation

I need to do a Min and Max operation on an array I get from the server side.
I am new to the rxjs extensions, but that library is really meant to observe changes on a collection, whereas in my case it's just a one-time calculation on a collection that doesn't change until I refresh the data from the server.
I just want to use the right tool for the job, so I'm asking: is it correct to use rxjs here, or is that like shooting at flies with bombs?
Or should I rather use a library like https://github.com/ENikS/LINQ
to get the Min/Max value of a collection?

There is a LINQ implementation, IxJS, that is developed and maintained by the same team that develops RxJS. This might be the right tool for you.
However, you could use RxJS as well: when using Rx.Observable.from([1, 2, ...]), execution is synchronous on subscription.
I would use IxJS, however:
// An array of values.. (just creating some random ones here)
const values = [2, 4, 23, 1, 0, 34, 56, 2, 3, 45, 98, 6, 3];
// Create an enumerable from the array
const valEnum = Ix.Enumerable.fromArray(values);
const min = valEnum.min();
const max = valEnum.max();
Working example on jsfiddle.

https://github.com/ENikS/LINQ uses the latest language features and is theoretically much faster than IxJS. The last edit to IxJS was 3 years ago, and ECMAScript 2015 (ECMA-262 6th edition) introduced a few very important advancements and speed improvements.
It also complies better with the standard LINQ API and can operate on any collection that implements the iterable protocol, including strings, maps, and typed arrays. IxJS can only query arrays.

highcharts remove redundant data points to improve speed

I am drawing a simple line chart with Highcharts. A single chart can include a very large number of points, which makes interacting with the chart slow.
Since many data points are redundant, I came up with the idea of not adding a new data point if its value is the same as the previous one. This reduces the amount of data but should still produce the same graph.
Please see this example: http://jsfiddle.net/qm94j14t/1/
I would like to have one straight line without the data points from February until November.
Right now the data array looks like this:
data: [7,7,7,7,7,7,7,7,7,7,7,10]
What do i need to change in the code to get a straight line without these redundant 7 values?

Instead of the [y_1, y_2, ..., y_n] format, use the [[x_1, y_1], [x_2, y_2], ..., [x_n, y_n]] format.
Then remove the redundant points (demo: http://jsfiddle.net/qm94j14t/7/). With the 12 monthly values above at x = 0 to 11, that gives [[0, 7], [10, 7], [11, 10]].
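The "remove redundant points" step can be written as a small filter: keep a point if it is the first or last one, or if its value differs from a neighbour, and attach its original position as the x value. This is a sketch in Julia (the main language of this page) rather than JavaScript; it assumes the series uses the default x positions 0, 1, 2 and so on, and compress_series is a hypothetical helper name.

```julia
# Keep point k only if it is the first or last point, or if its y value differs
# from the previous or next one; interior points of a flat run are dropped.
# Each kept point becomes [x, y], with x = 0-based position in the original series.
function compress_series(ys::AbstractVector)
    n = length(ys)
    pts = Vector{Vector{eltype(ys)}}()
    for k in 1:n
        keep = k == 1 || k == n || ys[k] != ys[k-1] || ys[k] != ys[k+1]
        keep && push!(pts, [k - 1, ys[k]])
    end
    return pts
end

data = [fill(7, 11); 10]   # January to November are 7, December is 10
compress_series(data)
```

The point just before a change is kept on purpose: without [10, 7], Highcharts would draw a sloped line from [0, 7] straight to [11, 10].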

Most efficient way to parse a file in Lua

I'm trying to figure out the most efficient way to parse data from a file using Lua. For example, let's say I have a file (example.txt) with something like this in it:
0, Data
74, Instance
4294967295, User
255, Time
If I only want the numbers before the ",", I can think of a few ways to get them. I'd start by opening the file with f = io.open("example.txt") and then use a for loop to parse each line of f. This leads to the heart of my question: what is the most efficient way to do this?
In the for loop I could use any of these methods to get the number before the comma:
line:find(pattern)
line:gmatch(pattern)
line:match(pattern)
or a split function (Lua has no built-in one)
Has anyone run speed tests on these or other methods and can point out the fastest way to parse? Bonus points if you can speak to parsing speeds for small vs. large files.

You probably want to use line:match("%d+").
line:find would work as well but returns more than you want.
line:gmatch is not what you need because it is meant to match several items in a string, not just one, and is meant to be used in a loop.
As for speed, you'll have to make your own measurements. Start with the simple code below:
for line in io.lines("example.txt") do
    local x=line:match("%d+")
    if x~=nil then print(x) end
end

How to write to and read from a file in Visual C++/CLI?

I am a new learner of C++/CLI. My teacher posted the following code as an example of how to save an object to a file and read an object back from the file. I think I understand the code. Now my teacher has also asked how to save an array of objects (of the same type) and read it back.
Q1. How can I know how many objects are in the file?
Q2. What commands should I use to write and read an array of objects?
Thanks.
Player ^Joe = gcnew Player("Joe", "Human", "Thief", 10, 18, 9, 13, 10, 11);
Console::WriteLine("Original Joe");
Joe->Print();
FileStream ^plStream = File::Create((args->Length==1)?args[0]:"Player.dat");
BinaryFormatter ^f = gcnew BinaryFormatter();
f->Serialize(plStream, Joe);
plStream->Close();
plStream = File::OpenRead((args->Length==1)?args[0]:"Player.dat");
Player ^JoeClone = (Player^)f->Deserialize(plStream);
plStream->Close();
Console::WriteLine("\nCloned Joe");
JoeClone->Print();

For Q1, a simple way to know how many objects are in the file is to make the number of objects in the array the first thing you write to the file.
When reading, read that count first and then loop that many times to read the objects back. I'll let you come up with the code for that.
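The count-first layout is language-independent. Here is a minimal sketch of it in Julia (the language of the main question on this page) using plain binary writes rather than .NET's BinaryFormatter; Player is a hypothetical two-field stand-in for the class in the question, and each name is stored with its own length prefix so it can be read back.

```julia
# Hypothetical stand-in for the Player class (two fields for brevity).
struct Player
    name::String
    level::Int32
end

function write_players(io::IO, players::Vector{Player})
    write(io, Int32(length(players)))      # header: how many records follow
    for p in players
        write(io, Int32(sizeof(p.name)))   # byte length of the name
        write(io, p.name)                  # the name's UTF-8 bytes
        write(io, p.level)
    end
end

function read_players(io::IO)
    n = read(io, Int32)                    # read the count first
    players = Player[]
    for _ in 1:n                           # then loop exactly n times
        len = read(io, Int32)
        name = String(read(io, len))
        level = read(io, Int32)
        push!(players, Player(name, level))
    end
    return players
end

buf = IOBuffer()
write_players(buf, [Player("Joe", 10), Player("Ann", 12)])
seekstart(buf)
players = read_players(buf)
```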

Put your objects into one of the standard containers (for example, a generic List) and serialize/deserialize that container.
The example in the docs shows how to do this with a Hashtable instead of a List; it should not be too hard to adapt: http://msdn.microsoft.com/en-us/library/c5sbs8z9.aspx
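Julia's Serialization standard library plays a role similar to BinaryFormatter, so the "serialize the whole container" approach can be sketched as below; Player is a hypothetical stand-in for the class in the question. Because the vector is stored as a single object, its length comes back with it and no separate count has to be written.

```julia
using Serialization

# Hypothetical stand-in for the Player class from the question.
struct Player
    name::String
    level::Int
end

players = [Player("Joe", 10), Player("Ann", 12)]

buf = IOBuffer()
serialize(buf, players)        # write the whole vector as one object
seekstart(buf)
restored = deserialize(buf)    # comes back as a Vector{Player}, length included
```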

Problem converting a Matrix to Data Frame in R (R thinks all numeric types are factors)

I am passing data from C# to R over a COM interface. When the data arrives in R it is housed in a 'Matrix'. Some of the functions that I use require that the data be inside a 'DataFrame' instead. I convert the data structure using
newDataFrame <- as.data.frame(oldMatrix)
The table of data reaches R just fine, once I make the conversion to the DataFrame however, it assumes all of my numeric data are factors!
So it turns: {34, 46, 90, 54, 69, 54} into {1, 2, 3, 4, 5, 4}
My data table DOES have factors in it though, so I just can't force the whole thing to be numeric. Is there any way around this? Note: I can't export the data as a CSV onto the filesystem and read it into R manually.
On a side note, the function I am using that requires a data frame is from the 'Hmisc' package:
hist.data.frame(dataFrame)
It produces a frequency histogram for every column of the data frame and arranges them all in a grid pattern (quite nifty)!
Thanks!
-Dave

I think you have mis-diagnosed the problem - all columns in a matrix must be of the same type, so this is likely to be where the problem arises, not the conversion to a data frame.

I've had this problem before. You need to set stringsAsFactors=FALSE when you read or convert the data, for example as.data.frame(oldMatrix, stringsAsFactors = FALSE).
Then you can convert individual variables/columns to the types you want (e.g. with as.numeric() or as.factor()), without worrying about how the numbers are treated.
