Append text to the top of a file - shell

I want to add text at the top of my data.txt file, but this code adds the text at the end of the file. How can I modify this code to write the text at the top of my data.txt file? Thanks in advance for any assistance.
open (MYFILE, '>>data.txt');
print MYFILE "Title\n";
close (MYFILE)

perl -pi -e 'print "Title\n" if $. == 1' data.txt
With -p, the explicit print runs before the implicit print of the current line, so the title lands above line 1.

Your syntax is slightly off and uses the deprecated two-argument form of open (thanks, Seth):
open(MYFILE, '>>', "data.txt") or die $!;
You will have to make a full pass through the file and write out the desired data before the existing file contents:
open my $in, '<', $file or die "Can't read old file: $!";
open my $out, '>', "$file.new" or die "Can't write new file: $!";
print $out "# Add this line to the top\n"; # <--- HERE'S THE MAGIC
while( <$in> ) {
    print $out $_;
}
close $out;
close $in;
unlink($file);
rename("$file.new", $file);
(gratuitously stolen from the Perl FAQ, then modified)
This will process the file line-by-line so that on large files you don't chew up a ton of memory. But, it's not exactly fast.
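If you'd rather not manage the temporary file yourself, the core Tie::File module can express the same rewrite; a minimal sketch (not from the FAQ, and it still rewrites the whole file behind the scenes, so it's no faster):
use Tie::File;
# Tie the file to an array; element 0 is line 1, and changes are written back to disk.
tie my @lines, 'Tie::File', 'data.txt' or die "Can't tie data.txt: $!";
unshift @lines, 'Title';   # records are stored chomped, so Tie::File adds the newline
untie @lines;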
Hope that helps.

There is a much simpler one-liner to prepend a block of text to every file. Let's say you have a set of files named body1, body2, body3, etc, to which you want to prepend a block of text contained in a file called header:
cat header | perl -0 -i -pe 'BEGIN {$h = <STDIN>}; print $h' body*

Appending to the top is normally called prepending.
open(M,"<","data.txt");
@m = <M>;
close(M);
open(M,">","data.txt");
print M "foo\n";
print M @m;
close(M);
Alternatively, open data.txt- for writing and then rename data.txt- to data.txt after the close; the rename is atomic, so interruptions cannot leave the data.txt file truncated.
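A minimal sketch of that atomic variant, with the file names from this answer and error checking added:
open my $out, '>', 'data.txt-' or die "Can't write data.txt-: $!";
print $out "foo\n";                        # the new first line
open my $in, '<', 'data.txt' or die "Can't read data.txt: $!";
print $out $_ while <$in>;                 # copy the existing contents after it
close $in;
close $out;
rename 'data.txt-', 'data.txt' or die "Can't rename: $!";   # atomic replace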

See the Perl FAQ Entry on this topic

perl -ni -e 'print "Title\n" if $. == 1; print' filename (the trailing print is needed because -n, unlike -p, does not print each line for you; this prints the title only once)

IO::Uncompress::Gunzip stops after first "original" gzipped file inside "concatenated" gzipped file

In bash, you can concatenate gzipped files and the result is a valid gzipped file. As far as I recall, I have always been able to treat these "concatenated" gzipped files as normal gzipped files (my example code from link above):
echo 'Hello world!' > hello.txt
echo 'Howdy world!' > howdy.txt
gzip hello.txt
gzip howdy.txt
cat hello.txt.gz howdy.txt.gz > greetings.txt.gz
gunzip greetings.txt.gz
cat greetings.txt
Which outputs
Hello world!
Howdy world!
However, when trying to read this same file using Perl's core IO::Uncompress::Gunzip module, it doesn't get past the first original file. Here is the result:
./my_zcat greetings.txt.gz
Hello world!
Here is the code for my_zcat:
#!/bin/env perl
use strict;
use warnings;
use v5.10;
use IO::Uncompress::Gunzip qw($GunzipError);
my $file_name = shift;
my $fh = IO::Uncompress::Gunzip->new($file_name) or die $GunzipError;
while (defined(my $line = readline $fh))
{
    print $line;
}
If I totally decompress the files before creating a new gzipped file, I don't have this problem:
zcat hello.txt.gz howdy.txt.gz | gzip > greetings_via_zcat.txt.gz
./my_zcat greetings_via_zcat.txt.gz
Hello world!
Howdy world!
So, what is the difference between greetings.txt.gz and greetings_via_zcat.txt.gz, and how might I get IO::Uncompress::Gunzip to work correctly with greetings.txt.gz?
Based on this answer to another question, I'm guessing that IO::Uncompress::Gunzip messes up because of the metadata between the files. But, since greetings.txt.gz is a valid Gzip file, I would expect IO::Uncompress::Gunzip to work.
My workaround for now will be piping from zcat (which of course doesn't help Windows users much):
#!/bin/env perl
use strict;
use warnings;
use v5.10;
my $file_name = shift;
open(my $fh, '-|', "zcat $file_name");
while (defined(my $line = readline $fh))
{
    print $line;
}
This is covered explicitly in the IO::Compress FAQ section Dealing with concatenated gzip files. Basically you just have to include the MultiStream option when you construct the IO::Uncompress::Gunzip object.
Here is a definition of the MultiStream option:
MultiStream => 0|1
If the input file/buffer contains multiple compressed data streams, this option will uncompress the whole lot as a single data stream.
Defaults to 0.
So your code needs this change:
my $fh = IO::Uncompress::Gunzip->new($file_name, MultiStream => 1) or die $GunzipError;
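For completeness, here is my_zcat with only that change applied (otherwise identical to your script):
#!/bin/env perl
use strict;
use warnings;
use v5.10;
use IO::Uncompress::Gunzip qw($GunzipError);
my $file_name = shift;
# MultiStream => 1 tells Gunzip to keep reading across member boundaries
my $fh = IO::Uncompress::Gunzip->new($file_name, MultiStream => 1)
    or die $GunzipError;
while (defined(my $line = readline $fh))
{
    print $line;
}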

How to delete text in text file in Windows using Perl?

I want to port my Perl application to Windows.
Currently it calls out to "grep" to delete text in a text file, like so:
system("grep -v '$mcadd' $ARGV[0] >> $ARGV[0].bak");
system("mv $ARGV[0].bak $ARGV[0]");
This works perfectly well in ubuntu, but I'm not sure (a) how to modify my perl script to achieve the same effect on Windows, and (b) whether there is a way to achieve the effect in a way that will work in both environments.
Is there another way to delete text in Perl?
You can use perl's inplace editing facility.
~/pperl_programs$ cat data.txt
hello world
goodbye mars
goodbye perl6
back to perl5
Run this:
use strict;
use warnings;
use 5.020;
my $fname = 'data.txt';
#Always use three arg form of open().
#Don't use bareword filehandles.
#open my $INFILE, '<', $fname
#    or die "Couldn't open $fname: $!";
{
    local $^I = ".bak";    #Turn on inplace editing for this block only
    local @ARGV = $fname;  #Set @ARGV for this block only
    while (my $line = <>) {  #"diamond operator" reads from @ARGV
        if ($line !~ /hello/) {
            print $line;  #This does not go to STDOUT--it goes to a new file that perl creates for you.
        }
    }
} #Return $^I and @ARGV to their previous values
#close $INFILE;
Here is the result:
$ cat data.txt
goodbye mars
goodbye perl6
back to perl5
With inplace editing turned on, perl takes care of creating a new file, sending print() output to the new file, then when you are done, renaming the new file to the original file name, and saving a copy of the original file with a .bak extension.
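That block form is handy inside a larger program; for a quick job, the -i switch sets $^I from the command line, so a rough one-liner equivalent of the block above is:
perl -i.bak -ne 'print unless /hello/' data.txt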
system("perl -n -e 'if(\$_ !~ /$mcadd/) { print \$_; }' \$ARGV[0] >> \$ARGV[0].bak");
system("rename \$ARGV[0].bak \$ARGV[0]");
This should work in windows.
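If you want one version that works on both Ubuntu and Windows, you can skip system() entirely and do the filtering and renaming in Perl. A minimal sketch, assuming $mcadd is a literal string to match (hence the \Q...\E); here it is taken from the command line for illustration:
use strict;
use warnings;
use File::Copy qw(move);   # core module, works on both Unix and Windows
my $mcadd = shift;         # the text to delete
open my $in,  '<', $ARGV[0]       or die "Can't read $ARGV[0]: $!";
open my $out, '>', "$ARGV[0].bak" or die "Can't write $ARGV[0].bak: $!";
while (my $line = <$in>) {
    print $out $line unless $line =~ /\Q$mcadd\E/;   # keep lines that don't match
}
close $in;
close $out;
move("$ARGV[0].bak", $ARGV[0]) or die "Can't replace $ARGV[0]: $!";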

Splitting large text file on every blank line

I'm having a bit of trouble splitting a large text file into multiple smaller ones. The syntax of my text file is the following:
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion
asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion
asdasd #299 yadayada 60 40
content
content
contend done
...and so on
A typical information table in my file has anywhere between 10 and 40 rows.
I would like this file to be split into n smaller files, where n is the number of content tables.
That is
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion
would be its own separate file (whateverN.txt),
and
asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion
again a separate file whateverN+1.txt and so forth.
It seems like awk or Perl are nifty tools for this, but having never used them before, I find the syntax kind of baffling.
I found these two questions that almost correspond to my problem, but I failed to modify the syntax to fit my needs:
Split text file into multiple files & How can I split a text file into multiple text files? (on Unix & Linux)
How should one modify the command line inputs, so that it solves my problem?
Setting RS to null tells awk to use one or more blank lines as the record separator. Then you can simply use NR to set the name of the file corresponding to each new record:
awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt
RS:
This is awk's input record separator. Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. It can also be the null string, in which case records are separated by runs of blank lines, or a regexp, in which case records are separated by matches of the regexp in the input text.
$ cat file.txt
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion
asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion
asdasd #299 yadayada 60 40
content
content
contend done
$ awk -v RS= '{print > ("whatever-" NR ".txt")}' file.txt
$ ls whatever-*.txt
whatever-1.txt whatever-2.txt whatever-3.txt
$ cat whatever-1.txt
dasdas #42319 blaablaa 50 50
content content
more content
content conclusion
$ cat whatever-2.txt
asdasd #92012 blaablaa 30 70
content again
more of it
content conclusion
$ cat whatever-3.txt
asdasd #299 yadayada 60 40
content
content
contend done
$
You could use the csplit command:
csplit \
    --quiet \
    --prefix=whatever \
    --suffix-format=%02d.txt \
    --suppress-matched \
    infile.txt /^$/ {*}
POSIX csplit only uses short options and doesn't know --suffix-format and --suppress-matched, so this requires GNU csplit.
This is what the options do:
--quiet – suppress output of file sizes
--prefix=whatever – use whatever instead of the default xx filename prefix
--suffix-format=%02d.txt – append .txt to the default two digit suffix
--suppress-matched – don't include the lines matching the pattern on which the input is split
/^$/ {*} – split on pattern "empty line" (/^$/) as often as possible ({*})
Perl has a useful feature called the input record separator, $/.
This is the 'marker' for separating records when reading a file.
So:
#!/usr/bin/env perl
use strict;
use warnings;
local $/ = "\n\n";
my $count = 0;
while ( my $chunk = <> ) {
    open ( my $output, '>', "filename_".$count++ ) or die $!;
    print {$output} $chunk;
    close ( $output );
}
Just like that. The <> is the 'magic' filehandle, in that it reads piped data or from files specified on command line (opens them and reads them). This is similar to how sed or grep work.
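For example, if you saved the script above as split_chunks.pl (a name chosen here for illustration), either invocation works:
perl split_chunks.pl yourfilename_here
cat yourfilename_here | perl split_chunks.pl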
This can be reduced to a one-liner:
perl -00 -pe 'open ( $out, ">", "filename_".++$n ); select $out;' yourfilename_here
You can use this awk:
awk 'BEGIN{file="content"++i".txt"} !NF{file="content"++i".txt";next} {print > file}' yourfile
(OR)
awk 'BEGIN{i++} !NF{++i;next} {print > "filename"i".txt"}' yourfile
More readable format:
BEGIN {
    file="content"++i".txt"
}
!NF {
    file="content"++i".txt";
    next
}
{
    print > file
}
In case you get "too many open files" error as follows...
awk: whatever-18.txt makes too many open files
input record number 18, file file.txt
source line number 1
You may need to close each newly created file before creating the next one, as follows.
awk -v RS= '{close("whatever-" i ".txt"); i++}{print > ("whatever-" i ".txt")}' file.txt
Since it's Friday and I'm feeling a bit helpful... :)
Try this. If the file is as small as you imply it's simplest to just read it all at once and work in memory.
use strict;
use warnings;
# slurp file
local $/ = undef;
open my $fh, '<', 'test.txt' or die $!;
my $text = <$fh>;
close $fh;
# split on double new line
my @chunks = split(/\n\n/, $text);
# make new files from chunks
my $count = 1;
for my $chunk (@chunks) {
    open my $ofh, '>', "whatever$count.txt" or die $!;
    print $ofh $chunk, "\n";
    close $ofh;
    $count++;
}
The perl docs can explain any individual commands you don't understand but at this point you should probably look into a tutorial as well.
awk -v RS="\n\n" '{for (i=1;i<=NR;i++); print > i-1}' file.txt
Sets record separator as blank line, prints each record as a separate file numbered 1, 2, 3, etc. Last file (only) ends in blank line.
Try this bash script also
#!/bin/bash
i=1
fileName="OutputFile_$i"
while read line ; do
    if [ "$line" == "" ] ; then
        ((++i))
        fileName="OutputFile_$i"
    else
        echo "$line" >> "$fileName"
    fi
done < InputFile.txt
You can also try split -p "^$" (the -p pattern option is available in BSD split, but not in GNU split).

Extracting the first two characters from a file in perl into another file

I'm having a little bit of trouble with my code below. I'm trying to figure out how to open up all these text files (.csv files that end in DIS, which all have one line in them), get the first two characters (these are all numbers) from them, and print them into another file of the same name with a ".number" suffix. Some of these .DIS files don't have anything in them, in which case I want to print "0".
Lastly, I would like to go through each original .DIS file and delete the first 3 characters -- I did this through bash.
my @DIS = <*.DIS>;
foreach my $file (@DIS){
    my $name = $file;
    my $output = "$name.number";
    open(INHANDLE, "< $file") || die("Could not open file");
    while(<INHANDLE>){
        open(OUT_FILE,">$output") || die;
        my $line = $_;
        chomp ($line);
        my $string = $line;
        if ($string eq ""){
            print "0";
        } else {
            print substr($string,0,2);
        }
    }
    system("sed -i 's/\(.\{3\}\)//' $file");
}
When I run this code, I get a list of numbers concatenated together and empty .DIS.number files. I'm rather new to Perl, so any help would be appreciated!
When I run this code, I get a list of numbers concatenated together and empty .DIS.number files.
This is because of this line.
print substr($string,0,2);
print defaults to printing to STDOUT (i.e. the screen). You need to give it the filehandle to print to.
print OUT_FILE substr($string,0,2);
They're being concatenated because print just prints what you tell it to, it won't put newlines in for you (there are some global variables which can change this, don't mess with them). You have to add the newline yourself.
print OUT_FILE substr($string,0,2), "\n";
As a final note, when working with files in Perl I would suggest using lexical filehandles, Path::Tiny, and autodie. They will avoid a great number of classic problems working with files in Perl.
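As a sketch of those suggestions applied to your loop (autodie makes open and close die with a useful message on failure, so the || die noise goes away; the logic is otherwise yours):
use strict;
use warnings;
use autodie;   # open/close now die on failure automatically
for my $file (glob '*.DIS') {
    open my $in,  '<', $file;            # lexical filehandles, three-arg open
    open my $out, '>', "$file.number";
    my $line = <$in>;
    chomp $line if defined $line;
    # an empty file (or a line shorter than two characters) gets a "0"
    if (defined $line && length $line >= 2) {
        print {$out} substr($line, 0, 2), "\n";
    }
    else {
        print {$out} "0\n";
    }
}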
I suggest you do it like this.
Each *.DIS file is opened and the contents read into $text. Then a regex substitution is used to remove the first three characters from the string and capture the first two in $1.
If the substitution succeeded, then the contents of $1 are written to the number file; otherwise the original file is empty (or shorter than two characters) and a zero is written instead. The remaining contents of $text are then written back to the *.DIS file.
use strict;
use warnings;
use v5.10.1;
use autodie;
for my $dis_file ( glob '*.DIS' ) {
    my $text = do {
        open my $fh, '<', $dis_file;
        <$fh>;
    };
    my $num_file = "$dis_file.number";
    open my $dis_fh, '>', $dis_file;
    open my $num_fh, '>', $num_file;
    if ( defined $text and $text =~ s/^(..).?// ) {
        print $num_fh "$1\n";
        print $dis_fh $text;
    }
    else {
        print $num_fh "0\n";
        print $dis_fh "-\n";
    }
}
This awk script extracts the first two chars of each file to its own file. Empty files are expected to have one empty line, based on the spec.
awk 'FNR==1{pre=substr($0,1,2);pre=length(pre)==2?pre:0; print pre > FILENAME".number"}' *.DIS
This will remove the first 3 chars
cut -c 4-
A bash for loop will be better to do both, though we'll need to modify the awk script a little bit:
for f in *.DIS;
do awk 'NR==1{pre=substr($0,1,2);$0=length(pre)==2?pre:0; print}' $f > $f.number;
cut -c 4- $f > $f.cut;
done
Explanation: loop through all files in *.DIS; for the first line of each file, try to get the first two chars (1,2) of the line ($0) and assign them to pre. If the length of pre is not two (the line is either empty or has only 1 char), set the line to 0, otherwise use pre; then print the line. The output file name will be the input file name with a .number suffix appended. The $0 assignment is a trick to save a couple of keystrokes, since print without arguments prints $0; otherwise you could provide the argument.
Ideally you should quote "$f", since the file names may contain spaces...

Print to out file

I am trying to find intersecting lines between two files. One of the files is 'Sample_hg19_mapped.bed', and the other one, 'intersect.RData', has some of the same data as the first one.
Bed file:
chrM 16338 16363 HWI-ST575:220:C2MMMACXX:3:1112:17158:21371 255 -
chrM 16352 16377 HWI-ST575:220:C2MMMACXX:3:1102:7906:41988 255 -
chrM 16352 16377 HWI-ST575:220:C2MMMACXX:3:2113:18341:36393 255 -
chrM 16376 16401 HWI-ST575:220:C2MMMACXX:3:1310:14517:85268 255 -
RData file:
HWI-ST575:220:C2MMMACXX:3:1310:14517:85268
HWI-ST575:220:C2MMMACXX:3:2113:18341:36393
HWI-ST575:220:C2MMMACXX:3:2113:45341:56393
And as output, it needs to give the lines of the BED file that have the same value as in the RData file. For example, the first and second values of RData exist in the BED file, but not the third one, so the output needs to be:
chrM 16376 16401 HWI-ST575:220:C2MMMACXX:3:1310:14517:85268 255 -
chrM 16352 16377 HWI-ST575:220:C2MMMACXX:3:2113:18341:36393 255 -
I managed it with this code:
perl -ane '$f=$F[0].$F[1]; print "$k{$f}$_" if $k{$f}; $k{$f}=$_;' Sample_hg19_mapped.bed intersect.RData
But the lines that match go to the screen, and I want to keep them in a file; I cannot make the output file. I tried this one by changing a lot:
####!/bin/bash
perl -ane '$f=$F[0].$F[1]';"Sample_hg19_mapped.bed intersect.RData"
if $k{$f};$k{$f}=$_ {
print "$k{$f}$_";
} else {
print "epic fail";
}
open($f, ">", "output.txt")
or die "cannot open > output.txt: $!";
close $f;
print "done\n";
But I have so many errors like:
/var/spool/slurmd/job2572366/slurm_script: line 3: Sample_hg19_mapped.bed intersect.RData: command not found
/var/spool/slurmd/job2572366/slurm_script: line 6: syntax error near unexpected token `}'
/var/spool/slurmd/job2572366/slurm_script: line 6: `} else {'
Can you maybe help me on this?
Thank you so much
If your command works but outputs to the screen, simply redirect that to a file:
command > output.txt
e.g.
perl -ane '$f=$F[0].$F[1]; print "$k{$f}$_" if $k{$f}; $k{$f}=$_;' Sample_hg19_mapped.bed intersect.RData > output.txt
If you want to remove all the empty lines you can add next if /^\s*$/; to the start:
perl -ane 'next if /^\s*$/; $f=$F[0].$F[1]; print "$k{$f}$_" if $k{$f}; $k{$f}=$_;' Sample_hg19_mapped.bed intersect.RData > output.txt
This will skip any input lines which are only whitespace.
Your code is a bit messy and the errors come from that, but if you want to output to a file you can do this:
open (MYFILE, '>>NameOfFile');
print MYFILE $variable;
Have a try with this:
This uses your RData values as hash keys, and then looks for them in the bed file, printing any matches to 'output.txt'.
use strict;
use warnings;
use autodie;
open my $bed, '<', 'in.txt';
open my $rdata, '<', 'Rdata.txt';
my (%bed, %rdata);
while(<$rdata>){
    chomp;
    $rdata{$_} = 2; # Each line is a key in the hash %rdata
}
open my $out_file, '>', 'output.txt';
while(<$bed>){
    chomp;
    next unless /chrM/;
    my @split = split /\t/;
    print $out_file "$_\n" if $rdata{$split[3]}; # will print to output.txt any line where the 4th column matches a key from %rdata
}
The following perl one-liner should do what you need:
perl -lane'
    BEGIN { $x = pop; %h = map { chomp; $_ => 1 } <>; @ARGV = $x }
    print if /./ && $h{$F[3]}
' intersect.RData Sample_hg19_mapped.bed
We load intersect.RData into a hash map in the BEGIN block.
In the main body we check whether the fourth field ($F[3]) from the Sample_hg19_mapped.bed file is present in our hash map. If it is, then we print the line.
If the output looks fine to you then you can redirect to another file.
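Written out as a full script, the same idea looks roughly like this (a sketch, with the file names from the question and the output sent straight to output.txt):
#!/usr/bin/env perl
use strict;
use warnings;
# Load every read name from intersect.RData into a hash for O(1) lookups.
open my $rdata, '<', 'intersect.RData' or die "Can't read intersect.RData: $!";
my %h;
while (my $name = <$rdata>) {
    chomp $name;
    $h{$name} = 1 if length $name;
}
close $rdata;
# Print each BED line whose fourth column is one of those names.
open my $bed, '<', 'Sample_hg19_mapped.bed' or die "Can't read BED file: $!";
open my $out, '>', 'output.txt' or die "Can't write output.txt: $!";
while (my $line = <$bed>) {
    my @fields = split ' ', $line;
    print {$out} $line if @fields >= 4 && $h{$fields[3]};
}
close $bed;
close $out;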
