I'm writing a program in IDL to read DICOM images, store them in a big matrix, and finally save them in a .dat file. The DICOM files are named IM0, IM1, IM2, ..., IM2177. I wrote the code below but I am getting an error. I am using IDL version 6.4.
files = file_search('E:\SE7\IM*)
n_files = n_elements(files)
full_data = fltarr(256,256,n_files)
for i=0L, n_files-1 do begin
full_data[*,*,i] = read_dicom('E:\SE7\IM')
endfor
path = 'E:\'
open, 1, path + "full_data.dat'
writeu, 1, full_data
close, 1
I am not sure how to loop over the DICOM names, i.e. IM0, IM1, IM2, etc.
After I store them in the big matrix (i.e. full_data =[256,256,2178]) I would like to make the 3D matrix 4D. Is that possible? I would like to make it have the dimensions [256, 256, 22, 99] i.e. 2178/99.
I'm not sure what error you are getting, but you are missing a quotation mark in the first line. It should be:
files = file_search('E:\SE7\IM*')
To loop over the DICOM name, you can string concatenate the loop index using + and STRTRIM() as follows:
for i=0L, n_files-1 do begin
full_data[*,*,i] = read_dicom('E:\SE7\IM'+STRTRIM(i,2))
endfor
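One thing to keep in mind when choosing between building the names from the loop index and taking them from a directory listing: if the listing is sorted as plain strings, IM10 comes before IM2, so the slices would land in the wrong order. The following is a minimal Python sketch of that effect (not IDL); the twelve names and the `numeric_key` helper are hypothetical and only illustrate the ordering.

```python
import re


def numeric_key(name):
    """Sort key: the integer that follows the 'IM' prefix."""
    match = re.fullmatch(r"IM(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return int(match.group(1))


# Hypothetical directory listing, sorted as plain strings.
listing = sorted(f"IM{i}" for i in range(12))

# Re-sorting by the numeric suffix restores acquisition order.
in_slice_order = sorted(listing, key=numeric_key)

# Building the names from the loop index gives the same order directly.
built_from_index = [f"IM{i}" for i in range(12)]
```

Building the name from the index, as in the loop above, avoids the problem altogether.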
Finally, to turn your (256,256,2178) matrix into a (256,256,22,99) matrix, use REFORM, which changes the dimensions without changing the data (REBIN would resample the array instead):
final_data = REFORM(full_data, 256, 256, 22, 99)
Depending on the way you want the dimensions arranged, you may need additional operations. This post is a great primer on how to manipulate arrays and their dimensionality: Coyote Dimensional Juggling Tutorial.
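To see why the arrangement matters: a reshape that only relabels the dimensions keeps every element at the same linear position, and IDL stores arrays with the first index varying fastest. Under that assumption, slice k of the 3D array ends up at position (k mod 22, k div 22) in the last two dimensions. The Python sketch below only illustrates this index arithmetic; `split_slice_index` is a hypothetical helper, not an IDL routine.

```python
def split_slice_index(k, n_inner=22, n_outer=99):
    """Map a 3D slice index k to (inner, outer) indices of the 4D array.

    Assumes a reshape that preserves linear order with the first index
    varying fastest, so the 22-long dimension cycles before the 99-long one.
    """
    if not 0 <= k < n_inner * n_outer:
        raise IndexError(f"slice index {k} out of range")
    outer, inner = divmod(k, n_inner)
    return inner, outer


first = split_slice_index(0)
wrap = split_slice_index(22)
last = split_slice_index(2177)
```

If the files are instead ordered so that the 99-long index should change fastest, the data would need to be reshaped to (256,256,99,22) and then have its last two dimensions swapped.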
I'm attempting to use this tutorial to manipulate and plot ATAC-sequencing data. I have all the libraries listed in that tutorial installed and loaded, except while they use biocLite(BSgenome.Hsapiens.UCSC.hg19) for the human genome, I'm using biocLite(TxDb.Mmusculus.UCSC.mm10.knownGene) for the mouse genome.
Here I have loaded in my BAM file
sorted_AL1.1BAM <-"Sorted_1_S1_L001_R1_001.fastq.gz.subread.bam"
And created an object called TSSs, which holds the transcription start site regions from the mouse genome. I ultimately want to plot the average signal in my read data across mouse transcription start sites.
TSSs <- resize(genes(TxDb.Mmusculus.UCSC.mm10.knownGene), fix = "start", 1)
The problem occurs with the following code:
nucFree <- regionPlot(bamFile = sorted_AL1.1BAM, testRanges = TSSs, style = "point",
format = "bam", paired = TRUE, minFragmentLength = 0, maxFragmentLength = 100,
forceFragment = 50)
The error is as follows:
Reading Bam header information.....Done
Filtering regions which extend outside of genome boundaries.....Done
Filtered 24528 of 24528 regions
Splitting regions by Watson and Crick strand..Error in DataFrame(..., check.names = FALSE) :
different row counts implied by arguments
I assume my BAM file contains empty values that need to be changed to NAs. My issue is that I'm not sure how to visualize and manipulate BAM files in R in order to do this. Any help would be appreciated.
I tried the following:
data.frame(sorted_AL1.1BAM)
sorted_AL1.1BAM[sorted_AL1.1BAM == ''] <- NA
I expected this to resolve the issue of different row counts, but I get the same error message.
I am working with large matrices of data (Nrow x Ncol) that are too large to be stored in memory. Instead, it is standard in my field of work to save the data into a binary file. Due to the nature of the work, I only need to access 1 column of the matrix at a time. I also need to be able to modify a column and then save the updated column back into the binary file. So far I have managed to figure out how to save a matrix as a binary file and how to read 1 'column' of the matrix from the binary file into memory. However, after I edit the contents of a column I cannot figure out how to save that column back into the binary file.
As an example, suppose the data file is a 32-bit identity matrix that has been saved to disk.
Nrow = 500
Ncol = 325
data = eye(Float32,Nrow,Ncol)
stream_data = open("data","w")
write(stream_data,data[:])
close(stream_data)
Reading the entire file from disk and then reshaping back into the matrix is straightforward:
stream_data = open("data","r")
data_matrix = read(stream_data,Float32,Nrow*Ncol)
data_matrix = reshape(data_matrix,Nrow,Ncol)
close(stream_data)
As I said before, the data-matrices I am working with are too large to read into memory and as a result the code written above would normally not be possible to execute. Instead, I need to work with 1 column at a time. The following is a solution to read 1 column (e.g. the 7th column) of the matrix into memory:
icol = 7
stream_data = open("data","r")
position_data = 4*Nrow*(icol-1)
seek(stream_data,position_data)
data_col = read(stream_data,Float32,Nrow)
close(stream_data)
Note that the coefficient '4' in the 'position_data' variable is because I am working with Float32. Also, I don't fully understand what the seek command is doing here, but it seems to be giving me the correct output based on the following tests:
data == data_matrix # true
data[:,7] == data_col # true
For the sake of this problem, let's say I have determined that the column I loaded (i.e. the 7th column) needs to be replaced with zeros:
data_col = zeros(Float32,size(data_col))
The problem now, is to figure out how to save this column back into the binary file without affecting any of the other data. Naturally I intend to use 'write' to perform this task. However, I am not entirely sure how to proceed. I know I need to start by opening up a stream to the data; however I am not sure what 'mode' I need to use: "w", "w+", "a", or "a+"? Here is a failed attempt using "w":
icol = 7
stream_data = open("data","w")
position_data = 4*Nrow*(icol-1)
seek(stream_data,position_data)
write(stream_data,data_col)
close(stream_data)
The original binary file (before my failed attempt to edit the binary file) occupied 650000 bytes on disk. This is consistent with the fact that the matrix is size 500x325 and Float32 numbers occupy 4 bytes (i.e. 4*500*325 = 650000). However, after my attempt to edit the binary file I have observed that the binary file now occupies only 14000 bytes of space. Some quick mental math shows that 14000 bytes corresponds to 7 columns of data (4*500*7 = 14000). A quick check confirms that the binary file has replaced all of the original data with a new matrix with size 500x7, and whose elements are all zeros.
stream_data = open("data","r")
data_new_matrix = read(stream_data,Float32,Nrow*7)
data_new_matrix = reshape(data_new_matrix,Nrow,7)
sum(abs(data_new_matrix)) # 0.0f0
What do I need to do/change in order to modify only the 7th 'column' in the binary file?
Instead of
icol = 7
stream_data = open("data","w")
position_data = 4*Nrow*(icol-1)
seek(stream_data,position_data)
write(stream_data,data_col)
close(stream_data)
in the OP, write
icol = 7
stream_data = open("data","r+")
position_data = 4*Nrow*(icol-1)
seek(stream_data,position_data)
write(stream_data,data_col)
close(stream_data)
i.e. replace "w" with "r+" and everything works.
The reference for open is http://docs.julialang.org/en/release-0.4/stdlib/io-network/#Base.open and it explains the various modes. Preferably, open should not be called with the string mode parameter, which is somewhat confusing and definitely slower.
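The same behaviour can be reproduced outside Julia. The sketch below is Python, not the Julia code from the question: it writes the 500x325 Float32 identity matrix column by column, overwrites column 7 once with mode "r+b" and once with mode "wb", and records the resulting file sizes. The helper names and the temporary file are assumptions made for the illustration.

```python
import os
import struct
import tempfile

NROW, NCOL = 500, 325
ITEM = struct.calcsize("<f")  # 4 bytes per 32-bit float


def write_identity(path):
    """Write an NROW x NCOL identity matrix, one column after another."""
    with open(path, "wb") as f:
        for col in range(NCOL):
            column = [1.0 if row == col else 0.0 for row in range(NROW)]
            f.write(struct.pack(f"<{NROW}f", *column))


def overwrite_column(path, icol, values, mode):
    """Seek to 1-based column icol and write NROW floats there."""
    with open(path, mode) as f:
        f.seek(ITEM * NROW * (icol - 1))
        f.write(struct.pack(f"<{NROW}f", *values))


def read_column(path, icol):
    """Read 1-based column icol back as a list of floats."""
    with open(path, "rb") as f:
        f.seek(ITEM * NROW * (icol - 1))
        return list(struct.unpack(f"<{NROW}f", f.read(ITEM * NROW)))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data")
    zeros = [0.0] * NROW

    # "r+b" opens the existing file for update without truncating it.
    write_identity(path)
    overwrite_column(path, 7, zeros, "r+b")
    size_after_update = os.path.getsize(path)
    col7 = read_column(path, 7)
    col8 = read_column(path, 8)

    # "wb" truncates the file first, so only the bytes up to the end of
    # the written column remain afterwards.
    write_identity(path)
    overwrite_column(path, 7, zeros, "wb")
    size_after_truncate = os.path.getsize(path)
```

In update mode the file keeps its 650000 bytes and the neighbouring columns are untouched; in write mode it shrinks to 14000 bytes, which matches what the question observed.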
You can use SharedArrays for the need you describe:
data=SharedArray("/some/absolute/path/to/a/file", Float32,(Nrow,Ncol))
# do something with data
data[:,1]=a[:,1].+1
exit()
# restart julia
data=SharedArray("/some/absolute/path/to/a/file", Float32,(Nrow,Ncol))
@show data[1,1]
# prints 1
Now, be mindful that you're supposed to handle synchronisation to read/write from/to this file (if you have async workers) and that you're not supposed to change the size of the array (unless you know what you're doing).
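As far as I understand it, a file-backed SharedArray works by memory-mapping the file, so assignments to the array end up in the file. The sketch below shows the same idea in Python with the standard `mmap` module. It is an analogy rather than the Julia API, and the small 4x3 size and the temporary file are assumptions made for the illustration.

```python
import mmap
import os
import struct
import tempfile

NROW, NCOL = 4, 3
ITEM = struct.calcsize("<f")  # 4 bytes per 32-bit float

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data")

    # Create a zero-filled file of the right size; a mapping cannot grow it.
    with open(path, "wb") as f:
        f.write(bytes(ITEM * NROW * NCOL))

    # Map the file and overwrite the first column in place.
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        offset = ITEM * NROW * 0  # byte offset of column 1
        struct.pack_into(f"<{NROW}f", mm, offset, *([1.0] * NROW))
        mm.flush()

    # Re-open the file normally to check that the change was persisted.
    with open(path, "rb") as f:
        reloaded = struct.unpack(f"<{NROW * NCOL}f", f.read())
```

As with the SharedArray, the mapped file has a fixed size, and concurrent writers would need their own synchronisation.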
Problem in generating file names
I have around 4000 .txt files each containing three columns of data. I want to read all the 3 columns from a single file one at a time and then plot three values which correspond to x,y,z values on a contour plot.
These files are created at various time steps, so a plot from one file will be a level curve and plots from all of them will give me a contour plot.
The problem is that I want to do something which I can do in bash like this:
for n in `seq -f "%09g" 30001 200 830001`; do
./someFile$n.whateverFileFormat
done
How can I do this in matlab so that if I have let's say:
t-000030001.txt
1 2 3
......
......
......
t-0000320001.txt
2 4 5
. . .
. . .
. . .
and so on to
t-0008300001.txt
3 5 6
. . .
. . .
and on it goes.
I want to load these files one at a time, store the values in an inf x 3 array, plot them on a contour plot, and repeat this for all the files so that I have all of them on a single plot.
P.S. I need to reproduce something equivalent to the bash script above so that the files are loaded in the right order; only then will I be able to read from them.
One way to get the list of file names is this:
fnames = arrayfun(@(num)sprintf('t-%09g.txt', num), 30001:200:830001, 'Uniformoutput', 0);
Let's have a closer look: 30001:200:830001 generates an array, starting at 30001, incrementing by 200, ending at 830001. sprintf generates a formatted string, and arrayfun applies the anonymous function passed as its first argument to each element of the array in its second argument (the sequence). The output is a cell array containing the file names.
EDIT
The solution above is equivalent to the following code:
ind = 30001:200:830001;
fnames = cell(numel(ind), 1);
for i = 1:numel(ind)
fnames{i} = sprintf('t-%09g.txt',ind(i));
end
This stores all the values in a cell array.
Writing @(num)sprintf('t-%09g.txt', num) creates an anonymous function. The looping happens in arrayfun.
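For comparison, here is the same sequence of names as a short Python sketch (not MATLAB). The `file_names` helper is hypothetical; the pattern and the range are the ones from the bash loop in the question.

```python
def file_names(start=30001, stop=830001, step=200):
    """Zero-padded names t-000030001.txt, t-000030201.txt, ... (stop inclusive)."""
    return [f"t-{n:09d}.txt" for n in range(start, stop + 1, step)]


names = file_names()
```

The list has 4001 entries, which agrees with the "around 4000" files mentioned in the question.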
I am writing a program in IDL that needs to read n images (each of m pixels) from a directory, convert them to grayscale, flatten each image into a single vector, and then form an m x n matrix from the data.
So far I have managed to read and convert a single image to a grayscale vector, but I can't figure out how to extend this to reading multiple image files.
Can anyone advise on how I could adapt my code in order to do this?
(The image files will all be of the same size, and stored in the same directory with convenient filenames - i.e. testpicture1, testpicture2, etc)
Thanks
pro readimage
image = READ_IMAGE('Z:\My Documents\testpicture.jpg')
redChannel = REFORM(image[0, *, *])
greenChannel = REFORM(image[1, * , *])
blueChannel = REFORM(image[2, *, *])
grayscaleImage = BYTE(0.299*FLOAT(redChannel) + $
0.587*FLOAT(greenChannel) + 0.114*FLOAT(blueChannel))
imageVec = grayscaleImage[*]
end
Use FILE_SEARCH to find the names and number of the images of the given name:
filenames = FILE_SEARCH('Z:\My Documents\testpicture*.jpg', count=nfiles)
You will probably also want to declare an array to hold your results:
imageVec = bytarr(m, nfiles)
Then loop over the files with a FOR loop doing what you are doing already:
for f = 0L, nfiles - 1L do begin
; stuff you are already doing
imageVec[*, f] = grayscaleImage[*]
endfor
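The overall data flow (convert each image to grayscale, flatten it, and store it as one column of an m x n matrix) can be sketched as follows. This is plain Python rather than IDL, with images represented as nested lists of (r, g, b) tuples; the helper names and the tiny 2x2 test images are hypothetical, and the pixel ordering within a column is only illustrative.

```python
def to_gray_vector(image):
    """Flatten an image (rows of (r, g, b) tuples) into gray levels 0..255."""
    return [
        int(0.299 * r + 0.587 * g + 0.114 * b)  # same weights as the question
        for row in image
        for (r, g, b) in row
    ]


def stack_as_columns(images):
    """Return an m x n matrix (list of rows) with one image per column."""
    vectors = [to_gray_vector(img) for img in images]
    m = len(vectors[0])
    if any(len(v) != m for v in vectors):
        raise ValueError("all images must have the same number of pixels")
    return [[v[p] for v in vectors] for p in range(m)]


# Two hypothetical 2x2 images: one with red, green, blue and black pixels,
# and one that is entirely black.
colourful = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (0, 0, 0)]]
black = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]

matrix = stack_as_columns([colourful, black])
```

Checking that every image has the same pixel count before stacking mirrors the requirement that the result array be declared with a fixed m.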
I am quite new to image processing and would like to produce an array that stores 10 images. I would then like to run a for loop through some code that identifies some properties of the images, specifically the surface area of a biological specimen, and outputs an array containing 10 areas.
Below is what I have managed to put together so far. It produces the following error message:
??? Index exceeds matrix dimensions.
Error in ==> Testing1 at 14
nova(i).img = imread([myDir B(i).name]);
Below is the code I've been working on so far:
my_Dir = 'AC04/';
ext_img='*.jpg';
B = dir([my_Dir ext_img]);
nfile = max(size(B));
nova = zeros(1,nfile);
for i = 1:nfile
nova(i).img = imread([myDir B(i).name]);
end
areaarray = zeros(1,nfile);
for k = 1:nfile
[nova(k), threshold] = edge(nova(k), 'sobel');
.
.
.
.%code in this area is irrelevant to the problem I think%
.
.
.
areaarray(k) = bwarea(BWfinal);
end
areaarray
There are a few ways you could store images in an array-like structure in Matlab. You could use an array of structs. In that case you could do as you did:
nova(i).img = imread([myDir B(i).name]);
You access the first image with nova(1).img, the second one with nova(2).img, etc.
Another way to do it is to use a cell array (similar to an ordinary array, but more flexible in that its members can be of different types):
nova{i} = imread([myDir B(i).name]);
You access the first image with nova{1}, the second one with nova{2}, etc.
[ IMPORTANT ] In both cases you should remove this line from the code:
nova = zeros(1,nfile);
I suppose you tried to pre-allocate memory for the images. Since you're a beginner, I advise you not to be concerned with that. It is an optimization to address if you run into performance issues; if you don't, take advantage of Matlab's automatic memory (re)allocation.