Getting tsv file into 1Kb genomic bins - genomicranges

I have a .tsv file laid out as chromosome, genomic position, methylation status.
I want to bin it into 1 kb genomic bins and get the average of the third column, i.e. the mean methylation per 1 kb bin.
Thanks in advance :)
I tried GenomicRanges, but my data is not in the right format for it.
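For illustration, the binning itself can be sketched without GenomicRanges. This is a minimal Python sketch, not a GenomicRanges solution; it assumes the file has no header row, that positions are 1-based, and that the third column is numeric. The names `read_tsv` and `bin_methylation` are hypothetical.

```python
import csv
from collections import defaultdict


def read_tsv(path):
    """Yield (chromosome, position, methylation) rows from a header-less TSV (assumed layout)."""
    with open(path, newline="") as handle:
        for chrom, pos, meth in csv.reader(handle, delimiter="\t"):
            yield chrom, pos, meth


def bin_methylation(rows, bin_size=1000):
    """Return sorted (chrom, bin_start, bin_end, mean_methylation) tuples.

    Positions are assumed to be 1-based, so the first bin covers 1..bin_size.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (chrom, bin_start) -> [sum, count]
    for chrom, pos, meth in rows:
        bin_start = (int(pos) - 1) // bin_size * bin_size + 1
        entry = totals[(chrom, bin_start)]
        entry[0] += float(meth)
        entry[1] += 1
    return [
        (chrom, start, start + bin_size - 1, total / count)
        for (chrom, start), (total, count) in sorted(totals.items())
    ]
```

Calling `bin_methylation(read_tsv("methylation.tsv"))` (file name is a placeholder) gives one row per non-empty bin; empty bins are simply absent from the output.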

Related

Pixel by pixel image hashing methods

I'm trying to remove damaged JPEG duplicates from some 27k photos. Unfortunately, most of these are damaged duplicates that show half or less of the original image before cutting out to mess/grey.
Is there any intelligent algorithm that hashes a picture pixel by pixel (starting top left and reading LTR), instead of hashing a reduced-size version of the full image (as aHash, pHash and dHash do)?
The thing is, most algorithms just reduce the image size and then create a hash in order to compare the pictures. Because these damaged files lack most of the data, such hashes cannot match them to the originals; what I need is a comparison of only the first few lines or pixels of an image. The only software coming close to this is AllDup, but it only compares bit by bit and does not check the actual image data.
Would that even be possible?
Thanks in advance.
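One way to picture the idea is a running hash over decoded pixel rows, read top to bottom, so that two images can be compared on their shared leading rows. The sketch below is only that: it assumes the images have already been decoded into rows of 0-255 values by some other tool, and that the undamaged top rows of a damaged file decode to exactly the same values as the original, which may not hold for every JPEG. The function names are hypothetical.

```python
import hashlib


def prefix_hashes(rows):
    """Return one digest per row, each covering all rows from the top down to that row."""
    running = hashlib.sha256()
    digests = []
    for row in rows:
        running.update(bytes(row))  # row is a sequence of 0-255 pixel values
        digests.append(running.copy().hexdigest())
    return digests


def shared_leading_rows(rows_a, rows_b):
    """Count how many rows, starting at the top, are identical in both images."""
    count = 0
    for digest_a, digest_b in zip(prefix_hashes(rows_a), prefix_hashes(rows_b)):
        if digest_a != digest_b:
            break
        count += 1
    return count
```

Storing the prefix digests of each intact photo in a set would let a damaged file be looked up by the digest of its first few rows; a tolerance for small decoding differences would need a different, non-exact comparison.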

Empty PowerPoint chart size is massive - Why?

I created a macro that updates around 55 PowerPoint slides, ranging from populating tables to updating line and bar charts. The macro works well; however, for some reason the PowerPoint file size has increased significantly. While I was working on the macro the size was around 80,000 KB, and after a few minor changes it suddenly almost doubled to 150,000 KB. To find out which slides cause this huge size, I published the slides to see the individual slide sizes and was able to narrow down the problem. Due to the large variety of charts I will focus on one kind.
I have 2 regular line charts on one slide and the size is 5000+ KB! Whenever I delete one of the two, the size is reduced to roughly half the size.
I have taken the following steps to try to find the problem:
1) Removed and deleted all cells that the chart references to (inside the PowerPoint) -- No change in file size.
2) Removed all chart features, such as axis title, legends, etc -- No change in file size.
3) Slide is not macro enabled and has therefore no macro included in the file.
4) Made sure there are no hidden objects.
All that is left is an empty 'Chart Placeholder' with no data in the XL file and yet the size is very large.
The PowerPoint slide contains no images either. A regular PowerPoint slide with a line chart should only have a size of around 50-100 KB and I am not sure how the chart has such a massive size.
First time posting my question here! Hopefully someone can help out.
Thanks!
UPDATE:
I finally was able to find the problem. For some reason, all charts had the maximum number of rows open (1+ million rows) making the file size that large!
I added wb.Worksheets(1).UsedRange to the end of each procedure and now the entire file size is around 4000 KB!
Thank you.
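As an alternative to publishing slides one by one, the file itself can be inspected: a .pptx is a ZIP archive, and the workbooks behind charts are typically stored as separate parts (usually under ppt/embeddings/). A minimal Python sketch, with the hypothetical helper name `largest_parts`, that lists the biggest parts:

```python
import zipfile


def largest_parts(pptx_path, top=10):
    """Return (name, uncompressed_size, compressed_size) for the largest parts of a .pptx."""
    with zipfile.ZipFile(pptx_path) as archive:
        infos = sorted(archive.infolist(), key=lambda info: info.file_size, reverse=True)
        return [(info.filename, info.file_size, info.compress_size) for info in infos[:top]]
```

Printing `largest_parts("deck.pptx")` (file name is a placeholder) should show whether embedded workbooks, rather than the slides themselves, account for the size.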

How can I efficiently stabilize a set of images?

I regularly record data at 30 Hz for one hour at a time (over 100,000 frames). I am recording an object that moves around minimally (less than a hundred pixels). Each frame is stored as a .tiff file.
I'm trying to efficiently generate a "stabilization matrix." For example, if the image has shifted two pixels to the left and one pixel up, then I want to generate a matrix with a row as:
[2, 1]
to signify that image needs to be shifted 2 units on the x-axis, and one unit on the y-axis. Each row in the matrix would represent the necessary "shift" in this way.
Is this possible? I am open to using any language or platform. I also have access to a cluster at my university. I inherited code written in MATLAB, but it takes about 12 hours to run. I'm hoping to find a more efficient solution. Any pointers in the right direction would be greatly appreciated.
Thanks in advance.
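To make the "stabilization matrix" concrete, here is a minimal pure-Python sketch that estimates an integer shift per frame by brute-force search over candidate offsets, scoring each by mean absolute difference against the first frame. It is meant to show the bookkeeping, not to be fast: for 100,000 real frames a vectorized or FFT-based method (for example phase correlation) would be the usual choice. Frames are assumed to be equally sized 2-D lists of numbers, and the function names are hypothetical.

```python
def estimate_shift(ref, frame, max_shift):
    """Return (dx, dy) such that frame[y][x] best matches ref[y - dy][x - dx].

    Positive dx means the content moved right, positive dy means it moved down.
    """
    height, width = len(ref), len(ref[0])
    if max_shift >= min(height, width):
        raise ValueError("max_shift must be smaller than the frame size")
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            # Only compare the region where the shifted frames overlap.
            for y in range(max(0, dy), min(height, height + dy)):
                for x in range(max(0, dx), min(width, width + dx)):
                    total += abs(frame[y][x] - ref[y - dy][x - dx])
                    count += 1
            score = total / count
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]


def stabilization_matrix(frames, max_shift):
    """Return one [x_correction, y_correction] row per frame, relative to the first frame."""
    ref = frames[0]
    rows = []
    for frame in frames:
        dx, dy = estimate_shift(ref, frame, max_shift)
        rows.append([-dx, -dy])  # correction is the opposite of the measured motion
    return rows
```

With this convention, a frame whose content moved two pixels left and one pixel up yields the row [2, 1], as in the question. Since each frame is compared only with the reference, the frames can be processed independently, which suits a cluster.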

saving a very large image in MATLAB

I know that this question has been asked before, but I could not find a clear answer to it. I have data for a very high-resolution color image of size 50,000 by 60,000 with the data type uint8. I cannot save the entire image using imwrite. I get an error that says:
"Images must contain fewer than 2^32 - 1 bytes of data"
Is there a way to save the entire image in MATLAB?
Right now, I have to break the data into smaller pieces (sub-images) and then use imwrite to write each piece to a PNG file. The output format of the file is not important.
Your image occupies 50000*60000*3 = 9.0e+09 bytes of data (uint8 takes one byte per sample), which is about 2.1 times MATLAB's image size limit. Why not split it into 20 pieces, save them, and then merge them manually? If you split your image into 20 pieces of 50000x3000x3, each would fit well within the 2^32 limit.
I am sure OP has enough aptitude to do this, but I'll explain the procedure anyway. Convert your image into a 50000x60000x3 array and do the following:
x = 0:3000:60000;  % column boundaries of the 20 strips
for i = 1:length(x)-1
    % write columns x(i)+1 .. x(i+1) of A as one PNG strip
    imwrite(A(:, x(i)+1:x(i+1), :), strcat('image', num2str(i), '.png'), 'png');
end
This would create 20 images for you with names 'image1.png', 'image2.png' and so on. Then, you can merge these images manually using this first google search result. Perhaps, there is some fancier way to do this, but I think this is the easiest one.
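The size arithmetic behind the strip width can be checked with a short sketch. It is written in Python only for illustration (it does not touch MATLAB or imwrite), assumes one byte per uint8 sample, and uses hypothetical helper names.

```python
LIMIT = 2**32 - 1  # byte limit quoted in the imwrite error message


def strip_bounds(width, strip_width):
    """Return 0-based (start, end) column ranges, end exclusive, covering the full width."""
    return [(start, min(start + strip_width, width)) for start in range(0, width, strip_width)]


def strip_bytes(height, strip_width, channels, bytes_per_sample=1):
    """Return the raw size in bytes of one strip of the image."""
    return height * strip_width * channels * bytes_per_sample


full_size = strip_bytes(50000, 60000, 3)   # whole image
one_strip = strip_bytes(50000, 3000, 3)    # one 3000-column strip
n_strips = len(strip_bounds(60000, 3000))  # number of strips needed
```

Here `full_size` exceeds `LIMIT` while `one_strip` is well below it, so 3000-column strips are comfortably small; wider strips would also fit as long as `strip_bytes` stays under the limit.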
Another question has an answer which worked for me: if your image is stored as a double matrix, convert to uint8 with im2uint8(img), then save.

JPEG - Can EOI Marker appear inside image data after SOS?

I understand that a JPEG file starts with 0xFFD8 (SOI), followed by a number of 0xFFEn segments holding metadata, then a number of segments holding the compression-related data (DQT, DHT, etc.), of which the final one is 0xFFDA (SOS); then comes the actual image data, which ends with 0xFFD9 (EOI). Each of those segments states its length in the two bytes following the JPEG marker, so it is a trivial exercise to calculate the end of a segment/start of the next segment, and the start of the image data can be calculated from the length of the SOS segment.
Up to that point, the appearance of 0xFFD9 (EOI) is irrelevant [1], because the segments are identified by their length. As far as I can see, however, there is no way of determining the length of the image data other than finding the 0xFFD9 (EOI) marker following the SOS segment. For that to be reliable, 0xFFD9 must not appear inside the actual image data itself. Is there something built into the JPEG algorithm to ensure that, or am I missing something here?
[1] A second 0xFFD8 and 0xFFD9 can appear if a thumbnail is included in the image, but that is taken care of by the length of the containing segment, usually a 0xFFE1 (APP1) segment from what I have seen. In images I have checked so far, the start and size of the thumbnail image data are still given in the 0x0201 (JPEGInterchangeFormat - Offset to JPEG SOI) and 0x0202 (JPEGInterchangeFormatLength - Bytes of JPEG data) fields in IFD1, even though these were deprecated in Tech Note #2.
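In JPEG, the compressed byte value FF is encoded as FF00 (a zero byte is stuffed after it).
The compressed value FFD9 would therefore be encoded as FF00D9, so the EOI marker cannot appear inside the image data.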
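To show how a reader can rely on this, here is a minimal Python sketch that walks the segments by length and then scans the entropy-coded data for the next real marker, skipping stuffed FF00 pairs, FF fill bytes and restart markers (FFD0 to FFD7). It is a sketch under simplifying assumptions rather than a full parser: it does no validation beyond the marker bytes, and `find_eoi` is a hypothetical name.

```python
def find_eoi(data: bytes) -> int:
    """Return the offset of the EOI marker (0xFFD9) that ends the image, or -1 if absent."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos, size = 2, len(data)
    while pos + 1 < size:
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte preceding a marker
            pos += 1
            continue
        if marker == 0xD9:  # EOI reached outside any length-delimited segment
            return pos
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # standalone markers, no length field
            pos += 2
            continue
        if pos + 4 > size:
            break
        # Length-prefixed segment: the length counts its own two bytes but not the marker.
        pos += 2 + int.from_bytes(data[pos + 2:pos + 4], "big")
        if marker == 0xDA:
            # Entropy-coded data follows the SOS header: advance to the next real marker.
            while pos + 1 < size:
                nxt = data[pos + 1]
                if data[pos] == 0xFF and nxt not in (0x00, 0xFF) and not 0xD0 <= nxt <= 0xD7:
                    break
                pos += 1
    return -1
```

Because segments before SOS are skipped by length, an FFD9 inside an embedded thumbnail is never examined, and inside the scan data a literal FF is always followed by 00, so the first FFD9 found this way is the real end of image.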
