plotting multi bar graph (like clustered column in Excel) - d3.js

I have the data in the following format
Type Sub-type Value
A A_1 10
A A_1 20
A A_1 30
A A_1 40
A A_2 25
A A_2 35
A A_3 45
B B_1 10
B B_1 20
B B_2 30
C C_1 10
C C_1 20
C C_2 10
C C_2 20
I want to draw a multi-bar plot in which bars of the same sub-type have the same color, different sub-types get different columns, and the types are separated by some space.
[Edited]
I used the example at http://bl.ocks.org/mbostock/3887051 with its data.csv file, extended with some more rows:
CA,2704659,4499890,2159981,3853788,10604510,8819342,4114496
CA,3704659,4499890,2659981,3853788,10604510,8819342,4114496
CA,6704659,4499890,2159981,3853788,10604510,8819342,4114496
TX,2027307,3277946,1420518,2454721,7017731,5656528,2472223
NY,1208495,2141490,1058031,1999120,5355235,5120254,2607672
NY,1008495,2671490,1058031,1999120,5355235,5120254,2607672
NY,1208495,2141490,1058031,1999120,5355235,5120254,2607672
FL,1140516,1938695,925060,1607297,4782119,4746856,3187797
IL,894368,1558919,725973,1311479,3596343,3239173,1575308
PA,737462,1345341,679201,1203944,3157759,3414001,1910571
PA,737462,1345341,679201,1203944,3157759,3414001,1910571
PA,37462,345341,79201,3944,31579,34101,1910571

This question is too broad and the problem is not clearly stated. To get you started, here is a very basic gnuplot solution (so.dat is the file with the data you provided):
set datafile separator ","
set style fill solid
set style histogram
set style data histograms
set boxwidth .9
set yrange [0:11000000]
plot "so.dat" u 2:xtic(1), "so.dat" u 3:xtic(1), "so.dat" u 4:xtic(1), "so.dat" u 5:xtic(1), "so.dat" u 6:xtic(1), "so.dat" u 7:xtic(1), "so.dat" u 8:xtic(1)
which produces a clustered histogram. You could customize from here, if that is close enough to what you want.

Related

How to plot on the same graph with for cycles with Gnuplot?

I want to fit multiple data sets and plot the results on the same graph. What I am doing is:
do for [i=2:500]{
fit f(x) "myData" using 1:i via a,b
plot f(x)
}
The fit works fine; the big problem is that this code produces a different plot at each iteration. I would like to have all the fitted functions in a single graph. Is there any way?
I guess you cannot fit and plot in the same loop. Well, there would be the multiplot environment (check help multiplot), but I guess this is not your idea.
So, you can fit in a do for loop and store the fitted parameters in an array for later use during plotting.
You didn't specify any function, so I assumed something. Check the following minimized example:
Code:
### fitting in a loop
reset session
$Data <<EOD
1 1 6 4
2 4 10 1
3 9 15 0
4 16 22 1
5 25 31 4
6 36 42 9
7 49 55 16
EOD
f(x,a,b,c) = a*(x-b)**2 + c
colMin = 2
colMax = 4
set fit quiet nolog
array A[colMax]
array B[colMax]
array C[colMax]
do for [col=colMin:colMax] {
    a=1; b=1; c=1 # some initial values, sometimes 0 or NaN is not a good start
    fit f(x,a,b,c) $Data u 1:col via a,b,c
    A[col] = a; B[col] = b; C[col] = c
}
set key top left
plot for [col=colMin:colMax] $Data u 1:col w p pt 7 title sprintf("Column %d",col), \
for [col=colMin:colMax] f(x,A[col],B[col],C[col]) w l \
title sprintf("a=%.2f, b=%.2f, c=%.2f",A[col],B[col],C[col])
### end of code
Result:
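For comparison, the same "fit in a loop, store the parameters, use them afterwards" pattern can be sketched in Python. This is only an illustration under stated assumptions, not the gnuplot method: instead of gnuplot's nonlinear fit it solves a linear least-squares problem for y = p0 + p1*x + p2*x**2 and then converts to the a*(x-b)**2 + c form used above. The names fit_parabola and params are hypothetical, and the data is the $Data block from the script.

```python
# Data block from the gnuplot script: x in column 1, three y columns.
DATA = [
    (1, 1, 6, 4), (2, 4, 10, 1), (3, 9, 15, 0), (4, 16, 22, 1),
    (5, 25, 31, 4), (6, 36, 42, 9), (7, 49, 55, 16),
]

def fit_parabola(xs, ys):
    """Least-squares fit of y = a*(x-b)**2 + c; returns (a, b, c).

    Assumption: the fit is done in the linear form p0 + p1*x + p2*x**2
    via the normal equations, which differs from gnuplot's iterative fit.
    """
    s = [sum(x ** k for x in xs) for k in range(5)]            # sums of x^0..x^4
    v = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [v[i]] for i in range(3)]  # augmented matrix
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [p - f * q for p, q in zip(m[r], m[col])]
    p0, p1, p2 = (m[i][3] / m[i][i] for i in range(3))
    # Convert p0 + p1*x + p2*x**2 to a*(x-b)**2 + c.
    a = p2
    b = -p1 / (2 * p2)
    c = p0 - p1 ** 2 / (4 * p2)
    return a, b, c

xs = [row[0] for row in DATA]
params = {}                      # plays the role of the arrays A, B, C
for col in range(2, 5):          # gnuplot-style column numbers 2..4
    ys = [row[col - 1] for row in DATA]
    params[col] = fit_parabola(xs, ys)

# All fits are now available at once, e.g. for a single plot command.
for col, (a, b, c) in sorted(params.items()):
    print("Column %d: a=%.2f, b=%.2f, c=%.2f" % (col, a, b, c))
```

The point is the same as in the gnuplot script: the loop only fits and stores, and everything that needs all results together happens after the loop.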

GNUPLOT 5: conditional plotting with timefmt abscissa

I am having a hard time applying conditional plotting to data with a timefmt abscissa using gnuplot 5.0 patchlevel 6.
I am trying to plot the content of an ASCII file consisting of two columns:
2016-12-01 12
2017-01-01 1
2017-02-01 2
2017-03-01 3
2017-04-01 4
2017-05-01 5
2017-06-01 6
so I just issue:
set timefmt "%Y-%m-%d"
set xdata time
p 'file.dat' u 1:2 w l, '' u 1:($1>strptime("%Y-%m-%d","2017-03-01")?$2:10) w p
I expect the plot to look like a line following the second column and a series of dots, following the line for the last three abscissas or marking the value 10 at the previous ones.
Actually, all the points are at 10. Do you have any clue about what is happening? Many thanks in advance.
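Use timecolumn. With time data, $1 reads the field as a plain number, so it only sees the leading year (2016 or 2017), which is always smaller than the epoch seconds returned by strptime; the condition is therefore never true and every point lands at 10.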
p 'file.dat' u 1:2 w l, '' u 1:(timecolumn(1, "%Y-%m-%d")>strptime("%Y-%m-%d", "2017-03-01") ? column(2) : (10)) w p
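To see why the comparison has to be made on parsed times, here is a small Python sketch of the same condition. It is an illustration only: to_epoch is a hypothetical helper, and the "naive" list mimics what I believe $1 yields in gnuplot (just the leading number of the date string).

```python
import calendar
import time

def to_epoch(date_text, fmt="%Y-%m-%d"):
    """Seconds since 1970-01-01 UTC for a date string (hypothetical helper)."""
    return calendar.timegm(time.strptime(date_text, fmt))

# The two columns of file.dat from the question.
rows = [
    ("2016-12-01", 12), ("2017-01-01", 1), ("2017-02-01", 2),
    ("2017-03-01", 3), ("2017-04-01", 4), ("2017-05-01", 5),
    ("2017-06-01", 6),
]

threshold = to_epoch("2017-03-01")

# Comparing only the leading number (the year) against epoch seconds:
# 2016 or 2017 is never larger than roughly 1.49e9, so this is always False.
naive = [float(date.split("-")[0]) > threshold for date, _ in rows]

# Comparing parsed times, which is what timecolumn() makes possible.
points = [value if to_epoch(date) > threshold else 10 for date, value in rows]

print(naive)
print(points)
```

With the parsed comparison only the last three dates are later than 2017-03-01, so only those points follow the data.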

Combining every column-combination of an arbitrary number of matrices

I'm trying to figure out a way to do a certain "reduction".
I have a varying number of matrices of varying size, e.g.
1 2 2 2 5 6...70 70
3 7 8 9 7 7...88 89
1 3 4
2 7 7
3 8 8
9 9 9
.
.
44 49 49 49 49 49 49
50 50 50 50 50 50 50
87 87 88 89 90 91 92
What I need to do (and I hope that I'm explaining this clearly enough) is to combine any possible
combination of columns from these matrices, this means that one column might be
1
3
1
2
3
9
.
.
.
44
50
87
Which would reduce down to
1
2
3
9
.
.
.
44
50
87
The reason why I'm doing this is that I need to find the smallest unique combined column.
What am I trying to accomplish
For those interested, I'm trying to find the smallest set of gene knockouts
to disable reactions. Here, every matrix represents a reaction, and the columns represent the indices of
the genes that would disable that reaction.
The method may be as brute force as needed, as these matrices rarely become overwhelmingly large,
and the reaction combinations won't be long either.
The problem
I can't (as far as I know) create a for loop with an arbitrary number of iterators, and the number of
matrices (reactions to disable) is arbitrary.
Clarification
If I have matrices A,B,C with columns a1,a2...b1,b2...c1...cn what I need
are the columns [a1 b1 c1], [a1, b1, c2], ..., [a1 b1 cn] ... [an bn cn]
Solution
Courtesy of Michael Ohlrogge below.
Extension of his answer, for completeness
His solution ends with
MyProd = product(Array_of_ColGroups...)
Which gets the job done
And picking up where he left off
collection = collect(MyProd); #MyProd is an iterator
merged_cols = Array[] # the rows of 'collection' are arrays of arrays
for (i,v) in enumerate(collection)
# I apologize for this line
push!(merged_cols, sort!(unique(vcat(v...))))
end
# find all lengths so I can find which is the minimum
lengths = map(x -> length(x), merged_cols);
loc_of_shortest = find(broadcast((x,y) -> length(x) == y, merged_cols,minimum(lengths)))
best_gene_combos = merged_cols[loc_of_shortest]
tl;dr - complete solution:
# example matrices
a = rand(1:50, 8,4); b = rand(1:50, 10,5); c = rand(1:50, 12,4);
Matrices = [a,b,c];
toJagged(x) = [x[:,i] for i in 1:size(x,2)];
JaggedMatrices = [toJagged(x) for x in Matrices];
Combined = [unique(i) for i in JaggedMatrices[1]];
for n in 2:length(JaggedMatrices)
Combined = [unique([i;j]) for i in Combined, j in JaggedMatrices[n]];
end
Lengths = [length(s) for s in Combined];
Minima = findin(Lengths, min(Lengths...));
SubscriptsArray = ind2sub(size(Lengths), Minima);
ComboTuples = [((i[j] for i in SubscriptsArray)...) for j in 1:length(Minima)]
Explanation:
Assume you have matrix a and b
a = rand(1:50, 8,4);
b = rand(1:50, 10,5);
Express them as a jagged array, columns first
A = [a[:,i] for i in 1:size(a,2)];
B = [b[:,i] for i in 1:size(b,2)];
Concatenate rows for all column combinations using a list comprehension; remove duplicates on the spot:
Combined = [unique([i;j]) for i in A, j in B];
You now have all column combinations of a and b, as concatenated rows with duplicates removed. Find the lengths easily:
Lengths = [length(s) for s in Combined];
If you have more than two matrices, perform this process iteratively in a for loop, using the Combined matrix in place of a. For example, if you have a matrix c:
c = rand(1:50, 12,4);
C = [c[:,i] for i in 1:size(c,2)];
Combined = [unique([i;j]) for i in Combined, j in C];
Once you have the Lengths array as a multidimensional array (as many dimensions as input matrices, where the size of each dimension is the number of columns in each matrix), you can find the column combinations that correspond to the lowest value (there may well be more than one combination), via a simple ind2sub operation:
Minima = findin(Lengths, min(Lengths...));
SubscriptsArray = ind2sub(size(Lengths), Minima)
(e.g. for a randomized run with 3 input matrices, I happened to get 5 results with the minimal length of 19. The result of ind2sub was ([4,4,3,4,4],[3,3,4,5,3],[1,3,3,3,4]).)
You can convert this further to a list of "Column Combination" tuples with a (somewhat ugly) list comprehension:
ComboTuples = [((i[j] for i in SubscriptsArray)...) for j in 1:length(Minima)]
# results in:
# 5-element Array{Tuple{Int64,Int64,Int64},1}:
# (4,3,1)
# (4,3,3)
# (3,4,3)
# (4,5,3)
# (4,3,4)
Ok, let's see if I understand this. You've got n matrices and want all combinations with one column from each of the n matrices? If so, how about the product() (for Cartesian product) from the Iterators package?
using Iterators
n = 3
Array_of_Arrays = [rand(3,3) for idx = 1:n] ## arbitrary representation of your set of arrays.
Array_of_ColGroups = Array(Array, length(Array_of_Arrays))
for (idx, MyArray) in enumerate(Array_of_Arrays)
Array_of_ColGroups[idx] = [MyArray[:,jdx] for jdx in 1:size(MyArray,2)]
end
MyProd = product(Array_of_ColGroups...)
This will create an iterator object which you can then loop over to consider the specific combinations of columns.
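The whole reduction (one column from each matrix, merge, drop duplicates, keep the shortest) can also be sketched in Python with itertools.product, which handles an arbitrary number of matrices without nested loops. This is a minimal brute-force sketch, not a translation of the Julia code above; the function name smallest_merged_columns and the three small example matrices are made up for illustration.

```python
from itertools import product

def smallest_merged_columns(matrices):
    """Try every combination of one column per matrix (brute force).

    matrices: list of matrices, each given as a list of rows.
    Returns (best_length, [(column_indices, merged_sorted_values), ...]).
    """
    # Split every matrix into its columns.
    column_groups = [list(zip(*m)) for m in matrices]
    best_len = None
    best = []
    # product() iterates over all index tuples, one index per matrix.
    for idx in product(*(range(len(g)) for g in column_groups)):
        merged = sorted(set().union(*(g[i] for g, i in zip(column_groups, idx))))
        if best_len is None or len(merged) < best_len:
            best_len, best = len(merged), [(idx, merged)]
        elif len(merged) == best_len:
            best.append((idx, merged))
    return best_len, best

# Hypothetical example: three "reactions" with two columns each.
A = [[1, 2],
     [3, 3]]
B = [[1, 4],
     [2, 5],
     [3, 6]]
C = [[3, 7],
     [1, 8]]

best_len, best = smallest_merged_columns([A, B, C])
print(best_len)
for idx, genes in best:
    print(idx, genes)
```

Column indices here are 0-based, so the tuple (1, 0, 0) means the second column of A with the first columns of B and C. All ties for the minimum are kept, since there may well be more than one smallest combination.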

Matlab - Algorithm for calculating 1d consecutive line segment edges from midpoints?

I have a rectilinear grid that can be described with two vectors: one for the x-coordinates of the cell centres and one for the y-coordinates. These are just points whose spacing varies: the x spacing goes from 50 down to 10 and back up to 20 (55..45..30..10,10,10..10,12..20,20,20), and the y spacing goes from 60 down to 40 and back up to 60 (60,60,60,55..42,40,40,40..40,42..60,60). The grid is made like this:
e.g. x = 1 2 3, y = 10 11 12
gridx = 1 2 3    gridy = 10 10 10
        1 2 3            11 11 11
        1 2 3            12 12 12
so then cell centre 1 is 1,10 cc2 is 2,10 etc.
Now I'm trying to formulate an algorithm to calculate the positions of the cell edges in the x and y directions. My first idea was to get the first edge using x(1)-[x(2)-x(1)]/2; in the real case x(2)-x(1) is equal to 60 and x(1) = 16348.95, so celledge1 = x(1)-30 = 16318.95. After calculating the first edge, I go through a loop and calculate the rest like this:
for aa = 2:length(x)+1
    celledge1(aa) = x(aa-1) + (x(aa-1) - celledge1(aa-1));
end
And I did the same for y. This, however, does not work: in the area where the edge spacing should be 40, my y vector alternates between approximately 35 and 45 (35,45,35,45...).
Does anyone have any idea why this doesn't work and can point me in the right direction? Cheers
Edit: Tried to find a solution using geometric algebra:
We are trying to find the points A,B,C,....H. From basic geometry we know:
c1 (centre 1) = [A+B]/2 and c2 = [B+C]/2 etc. etc.
So we have 7 equations and 8 variables. We also know that the first few distances between centres are equal (60,60,60,60), therefore the first segment is 60 too.
B - A = 60
So now we have 8 equations and 8 variables so I made this algorithm in Matlab:
edgex = zeros(length(DATA2.x)+1,1);
edgey = zeros(length(DATA2.y)+1,1);
edgex(1) = (DATA2.x(1)*2-diffx(1))/2;
edgey(1) = (DATA2.y(1)*2-diffy(1))/2;
for aa = 2:length(DATA2.x)+1
edgex(aa) = DATA2.x(aa-1)*2-edgex(aa-1);
end
for aa = 2:length(DATA2.y)+1
edgey(aa) = DATA2.y(aa-1)*2-edgey(aa-1);
end
And I still got the same answer as before, with the y spacing going 35,45,35,45 where it should be 40,40,40... Could it be an accuracy error?
Edit: here are the numbers if you're interested; I did the same computation as above, only in Excel: http://www.filedropper.com/workoutedges
It seems you're just trying to interpolate your data. You can do this with the built-in interp1
x = [30 24 19 16 8 7 16 22 29 31];
xi = interp1(2:2:numel(x)*2, x, 1:(numel(x)*2+1), 'linear', 'extrap');
This just sets up the original data as the even-indexed elements and interpolates the odd indices, including extrapolation for the two end points.
Results:
xi =
Columns 1 through 11:
33.0000 30.0000 27.0000 24.0000 21.5000 19.0000 17.5000 16.0000 12.0000 8.0000 7.5000
Columns 12 through 21:
7.0000 11.5000 16.0000 19.0000 22.0000 25.5000 29.0000 30.0000 31.0000 32.0000
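The edges are the odd-indexed entries of xi: each one is the midpoint between two neighbouring centres, plus one extrapolated edge at each end. The sketch below writes that out in Python and compares it with the recurrence from the question (next edge = 2*centre - previous edge). The function names are hypothetical, and the second set of centres is made up to imitate a spacing that shrinks from 60 to 40. As far as I can tell, the 35,45 pattern is not an accuracy problem: the recurrence forces every centre to be the exact midpoint of its cell, so any mismatch introduced where the spacing changes is carried forward with alternating sign.

```python
def edges_from_centres(centres):
    """Edges as midpoints between neighbouring centres, ends extrapolated."""
    c = [float(v) for v in centres]
    if len(c) < 2:
        raise ValueError("need at least two centres")
    first = c[0] - (c[1] - c[0]) / 2      # extrapolate half a spacing outwards
    last = c[-1] + (c[-1] - c[-2]) / 2
    mids = [(a + b) / 2 for a, b in zip(c, c[1:])]
    return [first] + mids + [last]

def edges_by_recurrence(centres):
    """The approach from the question: edge[k+1] = 2*centre[k] - edge[k]."""
    c = [float(v) for v in centres]
    edges = [c[0] - (c[1] - c[0]) / 2]
    for centre in c:
        edges.append(2 * centre - edges[-1])
    return edges

def widths(edges):
    return [b - a for a, b in zip(edges, edges[1:])]

# Same data as the interp1 example above.
x = [30, 24, 19, 16, 8, 7, 16, 22, 29, 31]
print(edges_from_centres(x))

# Made-up centres with spacings 60, 55, 42, 40, 40, 40.
y = [0, 60, 115, 157, 197, 237, 277]
print(widths(edges_by_recurrence(y)))   # oscillates between 34 and 46
print(widths(edges_from_centres(y)))    # settles at 40
```

The price of the midpoint approach is that, where the spacing changes, a centre is no longer exactly in the middle of its cell; whether that matters depends on how the grid was generated.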

Gnuplot: plotting the maximum of two files

Let's assume I have two files formatted like this:
x --- y
0 --- 2
1 --- 2.4
2 --- 3.6
which differ in the values of y.
Is there a way to plot a single graph that shows, for every x, the maximum value of y between the two files?
I don't know if I explained myself sufficiently well.
I was trying conditional expressions, but I couldn't find any expression that lets me look into 2 different files.
There is no way to combine two files or more in a single plot with gnuplot only. You must use an external tool to do this, e.g. the command line utility paste:
max(x, y) = (x > y ? x : y)
plot '< paste fileA.txt fileB.txt' using 1:(max($2, $4))
The y values are contained in the second and fourth columns.
This next version uses a python script with numpy to concatenate the files, but any other scripting language would also do:
"""paste.py: merge lines of two files."""
import numpy as np
import sys
if len(sys.argv) < 3:
    raise RuntimeError('Need two files')
A = np.loadtxt(sys.argv[1])
B = np.loadtxt(sys.argv[2])
np.savetxt(sys.stdout, np.c_[A, B], delimiter='\t')
To plot, use:
max(x, y) = (x > y ? x : y)
plot '< python paste.py fileA.txt fileB.txt' using 1:(max($2, $4))
Just for the record, there is a way to get the maximum out of two files with gnuplot only.
It is probably more efficient to use Linux tools or, on Windows, to install e.g. CoreUtils from GnuWin, but with gnuplot only you are platform-independent and need no extra installations.
Assumption: both files have same number of lines and identical x-values
Edit: simplified code which works for all gnuplot versions>=4.6.0 and faster version for gnuplot>=5.2.0 using an array.
The simple "trick" is to write the y value of one file into a single string and address them via word(). For small data this is ok, but for larger data (>10'000 lines) it might get slow because apparently it runs with something like O(N^2). Just to get an idea (on my system): 1'000 lines take 0.4 seconds, 10'000 lines take 13 seconds and 20'000 lines already take 45 seconds.
As comparison, the array-solution for gnuplot>=5.2.0 just takes about 3 seconds for 10'000 lines.
Data:
SO19079146_1.dat
1 1.1
2 2.1
4 1.5
6 1.3
7 0.2
8 1.5
9 2.1
SO19079146_2.dat
1 2.1
2 2.5
4 1.5
6 0.3
7 0.7
8 1.0
9 1.4
Script 1: (works for gnuplot>=4.6.0, March 2012)
### plot maximum from two files
reset
FILE1 = 'SO19079146_1.dat'
FILE2 = 'SO19079146_2.dat'
data2 = ''
stats FILE2 u (data2=data2.' '.sprintf("%g",$2)) nooutput
set offset 1,1,1,1
max(col) = (i=int(column(0)+1), y1=column(col), y2=real(word(data2,i)), y1>y2 ? y1 : y2)
plot FILE1 u 1:(max(2)) w lp pt 7 lw 8 lc rgb "grey" ti "Max", \
'' u 1:2 w lp pt 7 lc rgb "red" ti "Data1", \
FILE2 u 1:2 w lp pt 7 lc rgb "blue" ti "Data2"
### end of script
Script 2: (works for gnuplot>=5.2.0, Sept. 2017)
### find the maximum out of two files/datablocks (gnuplot>=5.2.0)
reset session
FILE1 = 'SO19079146_1.dat'
FILE2 = 'SO19079146_2.dat'
stats FILE1 u 0 nooutput
array A[STATS_records]
stats FILE2 u (i=int($0+1), A[i]=$2) nooutput
set offset 1,1,1,1
max(col) = (i=int(column(0)+1), y1=column(col), y2=A[i], y1>y2 ? y1 : y2)
plot FILE1 u 1:(max(2)) w lp pt 7 lw 8 lc "grey" ti "Max", \
'' u 1:2 w lp pt 7 lc "red" ti "Data1", \
FILE2 u 1:2 w lp pt 7 lc "blue" ti "Data2"
### end of script
Result: (identical for all above versions)
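If an external helper is acceptable after all, the row-wise maximum can also be computed with the Python standard library alone, without numpy. This is a sketch under the same assumption as above (both files have the same number of lines and identical x-values); max_of_two is a hypothetical name, and the data is embedded as strings here so the example is self-contained.

```python
def max_of_two(lines_a, lines_b):
    """Pair up the lines of two x/y files and keep the larger y for each x.

    Assumes both inputs have the same number of lines and identical x-values.
    """
    result = []
    for line_a, line_b in zip(lines_a, lines_b):
        xa, ya = (float(v) for v in line_a.split()[:2])
        xb, yb = (float(v) for v in line_b.split()[:2])
        if xa != xb:
            raise ValueError("x-values differ: %g vs %g" % (xa, xb))
        result.append((xa, max(ya, yb)))
    return result

# Contents of SO19079146_1.dat and SO19079146_2.dat from above.
FILE1 = """1 1.1
2 2.1
4 1.5
6 1.3
7 0.2
8 1.5
9 2.1"""
FILE2 = """1 2.1
2 2.5
4 1.5
6 0.3
7 0.7
8 1.0
9 1.4"""

maxima = max_of_two(FILE1.splitlines(), FILE2.splitlines())
for x, y in maxima:
    print("%g\t%g" % (x, y))
```

With real files, the two arguments can simply be open file objects, and the printed output can be piped into gnuplot in the same way as the paste examples.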
