Open large text file (300GB) in text editor - text-files

I have a 300 GB text file that contains genomics data and I want to open it. What text editor can I use?
Also, I want to compare two genomics data text files of the same large size (hundreds of GB) line by line to extract the newly added sequences. What is an efficient way to do this?
Thanks,
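One possible approach, sketched here rather than offered as a definitive answer, is to stream the files instead of opening them in an editor. The hypothetical helper `peek` reads only the first few lines, and `added_lines` walks two files in lockstep and yields lines that appear only in the newer one. The sketch assumes both files have already been sorted in plain byte order (for example with an external sort such as `LC_ALL=C sort`); the function names and that preprocessing step are assumptions, not something stated in the question.

```python
from itertools import islice


def peek(path, n=10):
    """Return the first n lines of a file without reading the rest."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


def _read(fh):
    """Read one line as bytes without its newline; None signals end of file."""
    line = fh.readline()
    return line.rstrip(b"\n") if line else None


def added_lines(old_path, new_path):
    """Yield lines that occur in new_path but not in old_path.

    ASSUMPTION: both files are already sorted in plain byte order.
    Only one line per file is held in memory at any time.
    """
    with open(old_path, "rb") as old, open(new_path, "rb") as new:
        old_line = _read(old)
        new_line = _read(new)
        while new_line is not None:
            # Skip old lines that sort before the current new line.
            while old_line is not None and old_line < new_line:
                old_line = _read(old)
            if old_line == new_line:
                # Present in both files: consume the matching old line.
                old_line = _read(old)
            else:
                yield new_line.decode("utf-8", errors="replace")
            new_line = _read(new)
```

Because `added_lines` is a generator, its results can be written to an output file one line at a time, so memory use does not grow with file size.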

Related

How to convert a DGN file (shared cells) into a DWG file (AutoCad blocks) in FME?

My colleague and I are trying to convert a DGN file (with many points) into a RealDWG file. The DGN file is only about 90 MB because it uses shared cells. Shared cells are elements that are stored only once in the DGN file, regardless of how often the cell is placed within it, which keeps the DGN file relatively small. However, when I simply connect the DGN reader to the DWG writer, the resulting RealDWG file is roughly 600 MB! Apparently, during the translation to RealDWG each shared cell placement is treated as an autoblock in itself. I would like the RealDWG file to work the same way as the DGN: only one autoblock is stored and all other placements reference it while still being shown visually, which should make the RealDWG file smaller. Does anyone know whether that can be achieved in FME?
To convert a DGN file to DWG format in MicroStation, follow these steps:
1-Open the DGN file in MicroStation using File > Open.
2-From the menu bar, choose File > Save As.
3-Use the [Save as type:] drop-down and change it to: Autodesk DWG Files (*.DWG)
4-Click [Options] and change the desired settings.
https://i.stack.imgur.com/gNHcB.png
For unsupported dimensions:
-Set "ON"
-Drop Unsupported Dimensions (from the "Advanced" tab)
https://i.stack.imgur.com/ROKD6.jpg
5-Click Save.

Converting bold text within a .doc to marked-up text programmatically

I am currently dealing with a large .docx file (roughly 400 pages). It is divided into sections that are easily digestible by humans and look like this:
Bold text
Written paragraph
This is perfectly human-readable and great. Unfortunately, we have an in-house program at our university that uses the markup of .docx files to sort them and do some processing on them. By this I mean that sectioning a .doc/.docx using only bold formatting is not enough; you must use the built-in tools within MS Office to do this.
So what I need to write is a simple script that finds the bold text in a .docx document and converts it to properly marked-up "Heading 1"s, or similar. It doesn't matter to me whether the .docx file format is preserved.
Is it possible to do this? What APIs/languages/tools should I start looking into to accomplish this relatively simple task?
Using a short VBA macro you can iterate over all paragraphs and change the style for all paragraphs containing only bold text into a heading style:
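Sub FormatBoldAsHeading()
    Dim p As Paragraph
    ' Walk every paragraph in the active document.
    For Each p In ActiveDocument.Paragraphs
        ' Font.Bold is wdUndefined when a paragraph mixes bold and non-bold text,
        ' so only paragraphs that are bold throughout pass this test.
        If p.Range.Font.Bold <> wdUndefined And p.Range.Font.Bold Then
            p.Style = WdBuiltinStyle.wdStyleHeading1
        End If
    Next p
End Sub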

Is it possible to highlight the output of putty?

I don't know whether this is possible; forgive me if it is a silly question.
I am using PuTTY to run scripts that produce a large set of unorganized data, which I then have to search.
For instance, in the large set of data retrieved I want to highlight, or change the font color of, any sentence that contains the text "ERROR".
"the large set of data retrieved i have to highlight or change the color of the font"
hgrep (highlight grep) provides such functionality.
http://acme.com/software/hgrep/
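In the same spirit as a highlighting grep, a small filter can be sketched in a few lines. This is a minimal sketch, assuming the terminal interprets ANSI color escape sequences (xterm-style terminals generally do); the function names `highlight` and `highlight_stream` are hypothetical and not part of hgrep.

```python
RED_BOLD = "\033[1;31m"  # ANSI escape sequence: bold text, red foreground
RESET = "\033[0m"        # ANSI escape sequence: back to default attributes


def highlight(line, keyword="ERROR"):
    """Wrap the whole line in color codes if it contains the keyword."""
    if keyword in line:
        return f"{RED_BOLD}{line}{RESET}"
    return line


def highlight_stream(lines, keyword="ERROR"):
    """Lazily highlight an iterable of lines, one line at a time."""
    for raw in lines:
        yield highlight(raw.rstrip("\n"), keyword)
```

In a script that receives the output through a pipe, looping over `highlight_stream(sys.stdin)` and printing each result processes the data line by line, so the full output never has to fit in memory.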
Using vi:
":set hlsearch" highlights the search keyword. However, when the data is large, you need enough memory to load the whole data file.

How can I output .doc files with bolded and colored text

I need to output text to a .doc file. I am currently just outputting to a file like usual and using a .doc at the end of the file name
File.open('output_file.doc', 'a+') {|x| x.write(str)}
The issue is I want to make some of the text red and bold. How can this be achieved? I am using ruby, but I can easily switch to jruby thanks to the amazingness that is rvm, so if there are java libraries for this, that'd be great as well.
The short answer: use .rtf and then convert to .doc using Word or OpenOffice. The following .rtf file writes "normal text red text more normal text." with the red text colored and bolded:
{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf350
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;\red255\green0\blue0;}
\margl1440\margr1440\vieww13280\viewh10420\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural\pardirnatural
\f0\fs24 \cf0 normal text
\b \cf2 red text
\b0 \cf0 more normal text.}
The long answer:
Strings are just plain ASCII text, so there is no command that can make them bold. This is a property of all files in general, not just how Ruby works with files.
What text editors do is use key strings within the file as commands to render the text in a certain way. For example, double asterisks surround bold text in the Stack Overflow editor. The file format of a file determines these rules.
.rtf is a basic file format that has the features you want and is easy to convert to .doc using MS Word or OpenOffice. The advantage of .rtf is that it is human-readable. So you can write an rtf file with red text, rename it .txt, open it in a text editor, and see what "decorations" the red font added. Play around with the parameters.
If you are curious, the complete .rtf specifications can be found here:
http://www.biblioscape.com/rtf15_spec.htm
What's all the garbage at the top? That is header stuff. Fortunately you don't need to add more header material to add more text.
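To illustrate how little header material is actually needed, here is a minimal sketch in Python (the question uses Ruby, but the same string building carries over directly). The function names `rtf_escape` and `build_rtf` are hypothetical, the header is a trimmed-down assumption rather than the exact header shown above, and non-ASCII text is not handled.

```python
def rtf_escape(text):
    """Escape the three characters that have special meaning in RTF."""
    # The backslash must be escaped first so later escapes are not doubled.
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def build_rtf(segments):
    """Build a minimal RTF document from (text, is_red_bold) pairs."""
    header = (
        "{\\rtf1\\ansi"
        "{\\fonttbl\\f0\\fswiss Helvetica;}"
        # Color table: index 0 is the default color, index 1 is red.
        "{\\colortbl;\\red255\\green0\\blue0;}"
        "\\f0\\fs24 "
    )
    body = []
    for text, emphasized in segments:
        if emphasized:
            # \b turns bold on and \cf1 selects color 1; the enclosing
            # group braces limit both to this segment.
            body.append("{\\b\\cf1 " + rtf_escape(text) + "}")
        else:
            body.append(rtf_escape(text))
    return header + "".join(body) + "}"
```

Writing the returned string to a file such as `output_file.rtf` should give a document that a word processor can open and save as .doc.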

Decompress "wmz" file

When we save an MS Word .doc file as an HTML file, we get "wmz" files for the math equation objects.
I tried decompressing the wmz file and saving the content as a jpg.
I can open this jpg file in "Microsoft Picture Manager" properly, but trying to open it in a browser displays the error message "The image cannot be displayed, because it contains errors".
What is the procedure to decompress this wmz file and convert it to jpg?
What will be the extension of the decompressed file?
.WMZ seems to be a zipped .WMF file.
You can open the unzipped file with a picture viewer/editor (I just tried IrfanView) and save it as .jpg.
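The decompression step can be sketched as follows. This assumes that the .wmz file is a gzip-compressed .wmf (the answer above only says "zipped", so treat the gzip format as an assumption to verify on your own files); the function name `wmz_to_wmf` is hypothetical.

```python
import gzip
import shutil


def wmz_to_wmf(wmz_path, wmf_path):
    """Decompress a .wmz file into a .wmf file and return the output path.

    ASSUMPTION: the input is gzip-compressed; gzip.open raises an error
    on read if it is not.
    """
    with gzip.open(wmz_path, "rb") as src, open(wmf_path, "wb") as dst:
        # Copy in chunks so the whole image is never held in memory at once.
        shutil.copyfileobj(src, dst)
    return wmf_path
```

The resulting file would carry the .wmf extension and still needs an image viewer or converter to produce a .jpg.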
When you save your Word documents as "Web Page, filtered" you won't get these WMZ files but just PNG files.
Set the "Web Options" to target a low version of IE (e.g. 4.0) and check "allow PNG files" and "disable features not supported by these browsers".
An added advantage is that the web page will display better in different browsers.
However, you should do all of this only after first making a copy of your document (and associated files) in another location using Explorer. Open this copy with Word and then save it as "Web Page, filtered". Keep the original for editing. (Don't save the original as "Web Page, filtered" or you will lose the ability to edit the equation objects.)
Thanks for the help.
In the end I could not remove the black background from the image file,
so I am using a roundabout approach for now:
1) Decompress the wmz file to a byte array (wmf).
2) Open a new Word document.
3) Paste the byte array into the Word document. (This document should contain only this data and no other information.)
4) Save the doc as an HTML file (WdSaveFormat.wdFormatFilteredHTML).
5) Open the "_files" directory created for the HTML output.
6) Find the only "gif" file created inside the directory.
