How to use file command with named pipe - bash

Original problem - I want to check a file format starting at every single offset of a given file.
To do that, the idea was to call the command file and find a way to call it starting at a chosen offset. But this command doesn't work:
file <(tail -c +10 nk_nuclear_codes.dat)
It fails with this error message:
/dev/fd/63: broken symbolic link to pipe:[26963]
I'm using WSL, and I don't know whether this is a WSL problem; I've done this kind of thing before, but I don't remember whether I used a different approach on Linux (Ubuntu).
I could copy the file once for each starting byte, but even though the file is relatively small (200 kB), copying at every offset is quadratic in the file size: 40 GB of copies. How could I achieve this? Either by calling file on a named pipe, or with another approach?

I suggest:
tail -c +10 nk_nuclear_codes.dat | file -
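This works because file reads from standard input when the filename is -, so the piped bytes are probed directly instead of the broken /dev/fd symlink. For the original goal of probing every offset, here is a minimal bash sketch, assuming the filename above; stat -c %s and tail -c +N are GNU coreutils behavior:
#!/bin/bash
f=nk_nuclear_codes.dat
size=$(stat -c %s "$f")                  # file size in bytes
for ((off = 1; off <= size; off++)); do  # tail -c +N is 1-based
    printf '%d: ' "$off"
    tail -c "+$off" "$f" | file -        # probe from stdin, no temp copy
done
This runs file once per offset, which is slow for 200,000 offsets, but it never copies the file.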

Related

Restoring disk space after a failed sort command applied to a large text file

After running the sort command on Ubuntu on an 89 GB text file, I got a message after about 30 minutes that there was no disk space left. As far as I can see, the space used by the output file is 0.
The command I used was something like sort myfile.txt > outfile.txt.
I'm using Ubuntu 16.04.
I have no clue which files or folders are taking the space.
Your temporary folder (/tmp) has run out of space. sort writes its intermediate results to $TMPDIR (or /tmp by default) and then merges them into the result file. You can change the temp folder with the -T, --temporary-directory flag.
For example, if you want to use your current working directory:
sort -T $(pwd) /var/log/syslog > syslog.sorted
To see all the docs for the sort command use:
man sort
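For example, a hedged sketch of pointing the scratch space somewhere roomier; /mnt/bigdisk/tmp is a hypothetical directory on a filesystem with enough free space:
df -h /tmp .                                          # check free space first
mkdir -p /mnt/bigdisk/tmp                             # hypothetical scratch dir
sort -T /mnt/bigdisk/tmp myfile.txt > outfile.txt     # via the -T flag
TMPDIR=/mnt/bigdisk/tmp sort myfile.txt > outfile.txt # equivalent, via the env var
Note that sort's intermediate files can take roughly as much space as the input, so an 89 GB input needs on the order of 89 GB free in the temp directory.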

Loop Over Files as Input for Program, Rename and Write Output to Different Directory

I have a problem writing the output of a program to a different directory when I loop over different files as inputs. I run this on the command line. The problem is that I do not know how to tell the program to put the output, with a changed filename, into a different directory than the input directory.
Here is the command, although it is a bioinformatics tool which requires specific input file formats. I am sorry that I could not give a better example. Nonetheless, the program is called computeMatrix, from a software toolbox called deeptools2.
command:
for f in ~/my/path/*spc_files*; do computeMatrix reference-point --referencePoint center --regionsFileName /target/region.bed --binSize 500 --scoreFileName "$f" --outFileName "$f.matrix"; done
So far, I tried to use the basename command to get just the filename and then change the directory in front of it. However, I could not figure out:
whether this is combinable
what the correct order of the commands is (e.g.: outputFile=`basename "$f"`, "~/new/targetDir/`basename $f`")
Probably there are other options to solve the problem which I could not think of or find; one common pattern is sketched below.
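A hedged sketch of that pattern, assuming a hypothetical output directory ~/new/targetDir (the computeMatrix flags are taken from the question above): strip the input directory with basename, then prepend the target directory.
#!/bin/bash
outdir=~/new/targetDir            # hypothetical target directory
mkdir -p "$outdir"
for f in ~/my/path/*spc_files*; do
    name=$(basename "$f")         # filename without the input directory
    computeMatrix reference-point --referencePoint center \
        --regionsFileName /target/region.bed --binSize 500 \
        --scoreFileName "$f" \
        --outFileName "$outdir/$name.matrix"
done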

How can I run ROUGE summarization on Windows?

I installed Strawberry Perl to run the ROUGE program on Windows. But when I run my program, I receive this error:
The system can't find the path specified.
My code attempts to run "ROUGE-1.5.5.pl", but I think the system can't find this file. So maybe I don't initialize the path correctly?
I changed my code to:
#!/usr/bin/perl
use Cwd;
$curdir=getcwd;
$ROUGE="..\ROUGE-1.5.5.pl";
chdir("sample-test");
$cmd="$ROUGE -e ..\data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -a DUC2002-ROUGE.in.26.spl.xml > ..\sample-output\output.out";
print $cmd,"\n";
system($cmd);
chdir($curdir);
and I receive this error:
Missing braces on \o{} at C:\runROUGE-test.pl line 7, near "$ROUGE"
Execution of C:\runROUGE-test.pl aborted due to compilation errors.
Judging by the error output, you are attempting to run \ROUGE-1.5.5.pl, where you probably want it without the spurious leading backslash (or with ..\ROUGE-1.5.5.pl if the parent directory is not on your PATH).
Similarly, you probably want the output in sample-output\output.out, or even just output.out, not \sample-output\output.out unless you specifically have a folder C:\sample-output for this purpose.
The leading backslash is significant: it makes the path absolute, relative to the root of the current drive on Windows. ..\ is the relative path to the parent folder.
Why are you writing a Perl script to run a Perl script, though? Either a simple batch file, or copy/pasting the command directly at the DOS prompt would seem like a less roundabout solution.
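The compilation error itself comes from backslashes inside double-quoted Perl strings: the \o in ..\sample-output\output.out is parsed as an octal escape, hence Missing braces on \o{}. A minimal corrected sketch, assuming the directory layout implied by the original script; it uses forward slashes (which Windows Perl accepts) to avoid the escaping problem, and prefixes perl so the command does not depend on the .pl file association:
#!/usr/bin/perl
use Cwd;
$curdir = getcwd;
# Forward slashes avoid octal-escape errors like \o{} that backslashes trigger:
$ROUGE = "../ROUGE-1.5.5.pl";
chdir("sample-test");
$cmd = "perl $ROUGE -e ../data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -a DUC2002-ROUGE.in.26.spl.xml > ../sample-output/output.out";
print $cmd, "\n";
system($cmd);
chdir($curdir);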
The problem is that when you redirect the output of the program to the file output.out in the folder sample-output, the sample-output folder does not exist.
Command Prompt will not create folders for you, only files. Try first creating a directory called "sample-output" (in your drive root) such that the path resolves to something like C:\sample-output, and run it again.
If the same problem results, try using an absolute path such as C:\sample-output\output.out instead.
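For example, from the prompt (cmd syntax; the path comes from the answer above):
mkdir C:\sample-output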

saving entire file in VIM

I have a very large CSV file, over 2.5 GB, that, when importing into SQL Server 2005, gives the error message "Column delimiter not found" on a specific line (82,449).
The issue is with double quotes within the text of that column; in this instance, it's a note field where someone wrote "Transferred money to ""MIKE"", Thnks".
Because the file is so large, I can't open it up in Notepad++ and make the change, which brought me to find VIM.
I am very new to VIM, and I reviewed the tutorial document, which taught me how to change the file: 82449G to jump to the line, l to move over to the spot, and x to delete the double quotes.
When I save the file using :saveas c:\Test VIM\Test.csv, only a portion of the file is saved. The original file is 2.6 GB, and the newly saved one is 1.1 GB; the original has 9,389,222 rows, and the new one has 3,751,878. I tried using the G command to get to the bottom of the file before saving, which increased the size quite a bit but still didn't save the whole file; before using G, the saved file was only 230 MB.
Any ideas as to why I'm not saving the entire file?
You really need to use a "stream editor", something similar to sed on Linux, that lets you pipe your text through it without trying to keep the entire file in memory. With sed I'd do something like:
sed 's/""MIKE""/"MIKE"/' < source_file_to_read > cleaned_file_to_write
There is a sed for Windows.
As a second choice, you could use a programming language like Perl, Python, or Ruby to process the text line by line from the file, writing each line out as it searches for the doubled quotes, changing the line in question when it finds it, and continuing to write until the file has been completely processed.
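For instance, a one-line Perl filter (a hedged sketch with Unix shell quoting; it performs the same substitution as the sed example above, and the filenames are placeholders):
perl -pe 's/""MIKE""/"MIKE"/g' source_file_to_read > cleaned_file_to_write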
VIM might be able to load the file if your machine has enough free RAM, but it'll be a slow process. If it does load, you can search from direct mode using:
:/""MIKE""/
and manually remove a doubled-quote, or have VIM make the change automatically using:
:%s/""MIKE""/"MIKE"/g
In either case, write, then close, the file using:
:wq
In VIM, "direct mode" is the normal state of the editor (usually called normal mode), and you can get to it by pressing your ESC key.
You can also split the file into smaller, more manageable chunks and then combine it back afterwards. Here's a bash script that can split the file into equal parts:
#!/bin/bash
fspec=the_big_file.csv
num_files=10                                # how many mini-files you want
total_lines=$(wc -l < "${fspec}")           # count lines without a useless cat
# Round up, so no lines are left over after the division:
((lines_per_file = (total_lines + num_files - 1) / num_files))
split --lines=${lines_per_file} "${fspec}" part.   # writes part.aa, part.ab, ...
echo "Total Lines = ${total_lines}"
echo "Lines per file = ${lines_per_file}"
wc -l part.*
I just tested it on a 1GB file with 61151570 lines, and each resulting file was almost 100 MB
Edit:
I just realized you are on Windows, so the above may not apply there. You can use a simple text-splitter utility, a Windows program which does the same thing.
When you're able to open the file without errors like E342: Out of memory!, you should be able to save the complete file, too. There should at least be an error on :w; a partial save without an error is a severe loss of data and should be reported as a bug, either on the vim_dev mailing list or at http://code.google.com/p/vim/issues/list
Which exact version of Vim are you using? Using GVIM 7.3.600 (32-bit) on Windows 7/x64, I wasn't able to open a 1.9 GB file without running out of memory. I was able to successfully open, edit, and save (fully!) a 3.9 GB file with the 64-bit version 7.3.000 from here. If you're not using that native 64-bit version yet, give it a try.

read directory file

We all know that in Linux a directory is a special file containing the file names and inode numbers of its constituent files. I want to read the contents of this directory file using a standard command-line utility.
cat . gives an error saying that it cannot open a directory.
However, vim apparently can understand the content of this file, probably via readdir(). It displays the contents of the directory file in a formatted manner. I want the raw contents of the file. How is this possible?
As far as I can tell, it cannot be done. I was pretty sure dd would do it, and then I found the following in its documentation:
‘directory’
Fail unless the file is a directory. Most operating systems do not allow I/O to a directory, so this flag has limited utility.
http://www.gnu.org/software/coreutils/manual/html_node/dd-invocation.html
So I think you have your answer there: dd supports the flag, as probably do a number of other utilities, but that doesn't mean Linux allows the read.
I think stat might be the command you're looking for.
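A small illustration of what is recoverable from the shell (a hedged sketch; these show the directory's metadata and the name-to-inode mapping, not the raw on-disk bytes):
stat .    # the directory's own metadata: size, inode, link count, times
ls -i .   # the inode number next to each constituent file name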
