Column Maniputation

Column Maniputation - etl

I would like to group my rows of inputs according to the first string of many comma-separated.
Basically, there will be 3 groups, which are "Motifs", "Chromatin_Structure" and "Protein_Binding"
The important output is the third one after 2 "|".
There might be some duplicates too like K562. The duplicates are not needed.
If the strings are not present, simply put a ". (dot)"
Input:
Motifs|PWM|Sox17|,Motifs|PWM|Sox8|,Chromatin_Structure|DNase-seq|K562|Znf4g7d3,Chromatin_Structure|DNase-seq|K562|,Chromatin_Structure|DNase-seq|TCF7L2|Znfe103c6,Protein_Binding|ChIP-seq|CTCF|HeLa-S3|,Protein_Binding|ChIP-seq|CTCF|HeLa-S3|
.
Motifs|PWM|TCF11|
Protein_Binding|ChIP-seq|MAFF|HepG2|
Desired output:
Sox17,Sox8 K562,TCF7L2 CTCF
. . .
TCF11 . .
. . MAFF
Codes that I tried.
sed 's/Motifs|PWM|//'
Appreciate your helps!

Perl one-liner (Using the term loosely):
$ perl -F, -lane '
my (%groups, #output);
for my $grp (#F) {
my #x = split /\|/, $grp;
$groups{$x[0]}{$x[2]} = 1;
}
for my $n (qw/Motifs Chromatin_Structure Protein_Binding/) {
if (exists $groups{$n}) {
push #output, join(",", sort keys %{$groups{$n}});
} else {
push #output, ".";
}
}
print join("\t", #output);' input.csv
Sox17,Sox8 K562,TCF7L2 CTCF
. . .
TCF11 . .
. . MAFF
And because I think it's woefully underappreciated as a scripting language, a tcl version:
#!/usr/bin/env tclsh
proc main {} {
while {[gets stdin line] >= 0} {
foreach grp [split $line ,] {
set x [split $grp |]
dict set groups [lindex $x 0] [lindex $x 2] 1
}
foreach n {Motifs Chromatin_Structure Protein_Binding} {
if {[dict exists $groups $n]} {
lappend output [join [dict keys [dict get $groups $n]] ,]
} else {
lappend output .
}
}
puts [join $output \t]
unset groups output
}
}
main
$ ./example.tcl < input.csv
Sox17,Sox8 K562,TCF7L2 CTCF
. . .
TCF11 . .
. . MAFF

Related

tcl insert string at certain line in file

I want to insert one string at the certain line ,and I know the line number.
ex.
#Aa_version = Aa/45.21-a32_1
#Aa_version = Aa/47.21-a33_1
Aa_version = Aa/45.27-a57_2 ->I can get this line number n
and I want to insert one line Aa/49.27-a54_1 at line n+1
and put Aa_version = Aa/45.27-a57_2 -> #Aa_version = Aa/45.27-a57_2
output is like
#Aa_version = Aa/45.21-a32_1
#Aa_version = Aa/47.21-a33_1
#Aa_version = Aa/45.27-a57_2
Aa_version = Aa/49.27-a54_1
and my code is
set Aa ""
set fp [open $file "r+"]
set lines [split [read $fp] \n]
set idx [lsearch -regexp $lines {^Aa_version} ]
regexp {Aa(.+)} [lindex $lines $idx] Aa_version
set old_version "#$Aa_version"
set newAa [gets stdin]
set new_version "Aa_version =$newAa "
puts $old_version ->replace $Aa_version
puts $new_version
close $fp
How can I put them at the correct line
thanks

I think it's easier and cleaner to handle the input file a line at a time and look at each one in turn instead of reading it all in one single go and then finding, altering and inserting elements in a list:
set fp [open $file]
while {[gets $fp line] >= 0} {
# If the line starts with Aa_version...
if {[string match "Aa_version*" $line]} {
# Comment it out
puts "#$line"
# And read the new version and write it out
set newAa [gets stdin]
puts "Aa_version = $newAa"
# Copy the rest of the file to standard output and exit the loop
chan copy $fp stdout
break
} else {
puts $line
}
}
close $fp
But if you want to keep the current list based approach, lreplace is your friend:
set fp [open $file]
set lines [split [read $fp] \n]
close $fp
set idx [lsearch -glob $lines "Aa_version*"]
# If a match was found...
if {$idx >= 0} {
# Read the new verson
set newAa [gets stdin]
# Replace the version element with two new elements:
# Commented out previous version, and new version
set lines [lreplace $lines $idx $idx \
"#[lindex $lines $idx]" "Aa_version = $newAa"]
}
puts [join $lines \n]

Iteration in TCL using for each

I'm trying to iterate the value using for each in TCL
This how i give input
./a.sh value1,value2
in the a.sh script
set var [lindex $argv 0]
set values [split $var ","]
foreach {set i 0} {$values} {
puts "iterated once $i"
}
i want the iteration to happen twice since there are two values passed . but instead it is iterating only once
please help me on this
thanks in advance

The foreach command, in its simplest form, takes a variable name, a list value, and a script to run for each element of the list. What you were passing was… weird in several ways at once. Look, here's a correct way of doing it:
set values [split $var ","]
foreach item $values {
puts "iterated: $item"
}
If you want to count through, you should set up your own counter:
set values [split $var ","]
foreach item $values {
set i [incr counter]
puts "iterated #$i: $item"
}
That can be shortened to this:
foreach item [split $var ","] {
puts "iterated #[incr counter]: $item"
}

Bulk editing files through command line?

I have a list of about 500 folders. Inside each of those folders is a functions.php file.
I need to search every functions.php file for the following text:
function wp_initialize_the_theme_finish()
I need to replace any line that has the above text with this:
function wp_initialize_the_theme_finish() { $uri = strtolower($_SERVER["REQUEST_URI"]); if(is_admin() || substr_count($uri, "wp-admin") > 0 || substr_count($uri, "wp-login") > 0 ) { /* */ } else { $l = 'mydomain.com'; $f = dirname(__file__) . "/footer.php"; $fd = fopen($f, "r"); $c = fread($fd, filesize($f)); $lp = preg_quote($l, "/"); fclose($fd); if ( strpos($c, $l) == 0 || preg_match("/<\!--(.*" . $lp . ".*)-->/si", $c) || preg_match("/<\?php([^\?]+[^>]+" . $lp . ".*)\?>/si", $c) ) { wp_initialize_the_theme_message(); die; } } } wp_initialize_the_theme_finish();
NOTE: I need to replace the entire line with my new line, not just the beginning.
Any help would be greatly appreciated.

There's a pretty detailed article written on it. Seems to relate to your question very well. Essentially the command is:
find . -name "*/function.php" -print | xargs sed -i 's/foo/bar/g'
Where foo is :
function wp_initialize_the_theme_finish().+\n
and bar is:
function wp_initialize_the_theme_finish() { $uri = strtolower($_SERVER["REQUEST_URI"]); if(is_admin() || substr_count($uri, "wp-admin") > 0 || substr_count($uri, "wp-login") > 0 ) { /* */ } else { $l = 'mydomain.com'; $f = dirname(__file__) . "/footer.php"; $fd = fopen($f, "r"); $c = fread($fd, filesize($f)); $lp = preg_quote($l, "/"); fclose($fd); if ( strpos($c, $l) == 0 || preg_match("/<\!--(.*" . $lp . ".*)-->/si", $c) || preg_match("/<\?php([^\?]+[^>]+" . $lp . ".*)\?>/si", $c) ) { wp_initialize_the_theme_message(); die; } } } wp_initialize_the_theme_finish();
Use the following rules to escape special characters in foo and bar:
In a nutshell, for sed:
Write the regex between single quotes.
Use '\'' to search for asingle quote.
Put a backslash before $.*/[]^ and only those characters.

Use find command, to search for all the relevant files, then use sed -i to update the files

Since search and replace strings are reasonably long, first store them in variables.
Then try using find along with sed using -exec option
#!/bin/bash
search='^.*function wp_initialize_the_theme_finish().*$'
replace='function wp_initialize_the_theme_finish() { $uri = strtolower($_SERVER["REQUEST_URI"]); if(is_admin() || substr_count($uri, "wp-admin") > 0 || substr_count($uri, "wp-login") > 0 ) { /* */ } else { $l = "mydomain.com"; $f = dirname(__file__) . "/footer.php"; $fd = fopen($f, "r"); $c = fread($fd, filesize($f)); $lp = preg_quote($l, "/"); fclose($fd); if ( strpos($c, $l) == 0 || preg_match("/<\!--(.*" . $lp . ".*)-->/si", $c) || preg_match("/<\?php([^\?]+[^>]+" . $lp . ".*)\?>/si", $c) ) { wp_initialize_the_theme_message(); die; } } } wp_initialize_the_theme_finish();'
find -type f -name 'function.php' -exec sed -i "s/${search}/${replace}/g" {} \;
Other alternative using xargs
find -type f -name 'function.php' -print0 | xargs -0 sed -i "s/${search}/${replace}/g"

Perl: Weird Tie::File behaviour in Windows as opposed to Unix

I have this perl script that uses Tie::File.
In Linux(Ubuntu) when I invoke the script via Bash it works as expected but in Windows when I invoke the script via Powershell it behaves weirdly (check P.S. below).
Code:
#!/usr/bin/perl -T
use strict;
use warnings;
use Tie::File;
use CommonStringTasks;
if ( #ARGV != 4 ) {
print "ERROR:Inadequate/Redundant arguments.\n";
print "Usage: perl <pl_executable> <path/to/peer_main.java> <peer_main.java>\n";
print " <score_file_index> <port_step_index>\n";
print $ARGV[0], "\n";
print $ARGV[1], "\n";
print $ARGV[2], "\n";
print $ARGV[3], "\n";
exit 1;
}
my $PEER_DIR = $ARGV[0];
my $PEER_FILE = $ARGV[1];
my $PEER_PACKAGE = "src/planetlab/app";
my $PEER_PATH = "${PEER_DIR}/${PEER_PACKAGE}/${PEER_FILE}";
# Check if args are tainted ...
# Check $PEER_PATH file permissions ...
open(my $file, "+<", "$PEER_PATH")
or
die("File ", $PEER_FILE, " could not be opened for editing:$!");
# Edit the file and change variables for debugging/deployment setup.
# Number demanglers:
# -flock -> arg2 -> 2 stands for FILE_EX
# Options (critical!):
# -Memory: Inhibit caching as this will allow record changes on the fly.
tie my #fileLines,
'Tie::File',
$file,
memory => 0
or
die("File ", $PEER_FILE, " could not be tied with Tie::File:$!");
flock $file, 2;
my $i = 0;
my $scoreLine = "int FILE_INDEX = " . $SCORE . ";";
my $portLine = "int SERVER_PORT = " . $PORT . ";";
my $originalScoreLine = "int FILE_INDEX =";
my $originalPortLine = "int SERVER_PORT =";
(tied #fileLines)->defer;
while (my $line = <$file>) {
if ( ($line =~ m/($scoreLine)/) && ($SCORE+1 > 0) ) {
print "Original line (score): ", "\n", $scoreLine, "\n";
chomp $line;
$line = substr($line, 0, -($scoreDigits+1));
$line = $line . (++$SCORE) . ";";
print "Editing line (score): ", $i, "\n", trimLeadSpaces($fileLines[$i]), "\n";
$fileLines[$i] = $line;
print "Line replaced with:\n", trimLeadSpaces($line), "\n";
next;
}
if ( ($line =~ m/($portLine)/) && ($PORT > 0) ) {
print "Original line (port): ", "\n", $portLine, "\n";
chomp $line;
$line = substr($line, 0, -($portDigits+1));
$line = $line . (++$PORT) . ";";
print "Editing line (port): ", $i, "\n", trimLeadSpaces($fileLines[$i]), "\n";
$fileLines[$i] = $line;
print "Line replaced with:\n", trimLeadSpaces($line), "\n";
last;
}
# Restore original settings.
if ( ($line =~ m/($originalScoreLine)/) && ($SCORE < 0) ) {
print "Restoring line (score) - FROM: ", "\n", $fileLines[$i], "\n";
$fileLines[$i] = " private static final int FILE_INDEX = 0;";
print "Restoring line (score) - TO: ", "\n", $fileLines[$i], "\n";
next;
}
if ( ($line =~ m/($originalPortLine)/) && ($PORT < 0) ) {
print "Restoring line (port) - FROM: ", "\n", $fileLines[$i], "\n";
$PORT = abs($PORT);
$fileLines[$i] = " private static final int SERVER_PORT = " . $PORT . ";";
print "Restoring line (port) - TO: ", "\n", $fileLines[$i], "\n";
last;
}
} continue {
$i++;
}
(tied #fileLines)->flush;
untie #fileLines;
close $file;
The perl version in both OSes is 5+(in Windows Active-State Perl with CPAN modules).
Could it be the way I open the filehandle? Any ideas anyone?
P.S.: The first version had a while (<$file>) and instead of $line I used the $_ variable but when I did that I had a behaviour where specific lines would not be edited but instead the file would get appended with a hundred newlines or so followed by the (correctly) edited line and so on. I also had a warning about $fileLines[$i] being uninitialized!Clearly something's wrong with the Tie::File structure in Windows or something else that I am not aware of. Same erratic behaviour takes place with the changes and in Linux(Ubuntu) behaviour again is as expected.

The OPs question is vague, and lacks input and expected output. Therefore I will simply note some of my concerns:
First, using Tie::File and <$file> and flock on the same handle seems to be both overkill and dangerous. I would recommend simply using Tie::File to iterate and to edit, such as:
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::File;
tie my #lines, 'Tie::File', 'filename';
foreach my $linenum ( 0..$#lines ) {
if ($lines[$linenum] =~ /something/) {
$lines[$linenum] = 'somethingelse';
}
}
Perhaps better than edit inline, as Tie::File allows, copy the file to a backup, iterate over the lines using <$file>, then write to a new file with the old name.
#!/usr/bin/env perl
use strict;
use warnings;
use File::Copy 'move';
my $infile = $ARGV[0];
move( $infile, "$infile.bak");
open my $inhandle, '<', "$infile.bak";
open my $outhandle, '>', $infile;
while( my $line = <$inhandle> ) {
if ($line =~ /something/) {
$line = 'somethingelse';
}
print $outhandle $line;
}
Second, the -MModule flag simply translates to a use Module; at the top of the script. Therefore -MCPAN is use CPAN;, however loading the CPAN module does nothing for the script. CPAN.pm gives a script the ability to install modules.
Third, we will be able to help better if you give and example input, an expected output, and a stripped down script that clearly shows how this operation is to perform while still failing in the same way that the actual script does.

I found out the source of my problems. The reason was the record separator!
Tie::File expected in Windows a /r/n record separator so it read the whole file in just one pass. My files are in UTF-8, with Unix line endings.
That is why when I was traversing the $fileLines and accessed any index beyond 0 I got from perl a warning that the string was not initialized. Fixed the problem and now I am ready to go on! :D
P.S.: Mr Joel Berger I am marking your answer as valid/appropriate because you really tried helping me and I followed your first advice about the file handle :).
Thank you everyone for assisting me xD xD xD

Why can't I use more than 20 files with my Perl script and Windows's SendTo?

I'm trying to emulate RapidCRC's ability to check crc32 values within filenames on Windows Vista Ultimate 64-bit. However, I seem to be running into some kind of argument limitation.
I wrote a quick Perl script, created a batch file to call it, then placed a shortcut to the batch file in %APPDATA%\Microsoft\Windows\SendTo
This works great when I select about 20 files or less, right-click and "send to" my batch file script. However, nothing happens at all when I select more than that. I suspect there's a character or number of arguments limit somewhere.
Hopefully I'm missing something simple and that the solution or a workaround isn't too painful.
References:
batch file (crc32_inline.bat):
crc32_inline.pl %*
Perl notes:
I'm using (strawberry) perl v5.10.0
I have C:\strawberry\perl\bin in my path, which is where crc32.bat exists.
perl script (crc32_inline.pl):
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd;
use English qw( -no_match_vars );
use File::Basename;
$OUTPUT_AUTOFLUSH = 1;
my $crc32_cmd = 'crc32.bat';
my $failure_report_basename = 'crc32_failures.txt';
my %failures = ();
print "\n";
foreach my $arg (#ARGV) {
# if the file has a crc, check to see if it matches the calculated
# crc.
if (-f $arg and $arg =~ /\[([0-9a-f]{8})\]/i) {
my $crc = uc $1;
my $basename = basename($arg);
print "checking ${basename}... ";
my $calculated_crc = uc `${crc32_cmd} "${arg}"`;
chomp($calculated_crc);
if ($crc eq $calculated_crc) {
print "passed.\n";
}
else {
print "FAILED (calculated ${calculated_crc})\n";
my $dirname = dirname($arg);
$failures{$dirname}{$basename} = $calculated_crc;
}
}
}
print "\nReport Summary:\n";
if (scalar keys %failures == 0) {
print " All files OK\n";
}
else {
print sprintf(" %d / %d files failed crc32 validation.\n" .
" See %s for details.\n",
scalar keys %failures,
scalar #ARGV,
$failure_report_basename);
my $failure_report_fullname = $failure_report_basename;
if (defined -f $ARGV[0]) {
$failure_report_fullname
= dirname($ARGV[0]) . '/' . $failure_report_basename;
}
$OUTPUT_AUTOFLUSH = 0;
open my $fh, '>' . $failure_report_fullname or die $!;
foreach my $dirname (sort keys %failures) {
print {$fh} $dirname . "\n";
foreach my $basename (sort keys %{$failures{$dirname}}) {
print {$fh} sprintf(" crc32(%s) basename(%s)\n",
$failures{$dirname}{$basename},
$basename);
}
}
close $fh;
$OUTPUT_AUTOFLUSH = 1;
}
print sprintf("\n%s done! (%d seconds elapsed)\n" .
"Press enter to exit.\n",
basename($0),
time() - $BASETIME);
<STDIN>;

I will recommend just putting a shortcut to your script in the "Send To" directory instead of doing it via a batch file (which is subject to cmd.exes limits on command line length).

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Column Maniputation - etl

Related

tcl insert string at certain line in file

Iteration in TCL using for each

Bulk editing files through command line?

Perl: Weird Tie::File behaviour in Windows as opposed to Unix

Why can't I use more than 20 files with my Perl script and Windows's SendTo?

Categories

Resources