I have a print_dot() function that outputs dot on stdout.
That way I can do:
$ ./myprogram < input | dot -T x11
It works great when I try to print one graph.
Now when I print several graphs, nothing shows up. The dot window is blank, X11 and dot take all the CPU. Nothing is printed on stderr.
$ echo -e "graph { a -- b }" | dot -T x11 # work
$ echo -e "graph { a -- b } \n graph { c --d }" | dot -T x11 # doesn't work
# it seems to be interpreted nonetheless
$ echo -e "graph { a -- b } \n graph { c -- d } " | dot -T xdot
graph {
...
}
graph {
...
}
Also, when I remove the \n between the 2 graphs, only the first graph is interpreted (what a nice feature...):
$ echo -e "graph { a -- b } graph { c -- d } " | dot -T xdot
graph {
...
}
Piping the xdot output to dot again doesn't fix the problem.
So, how does one render multiple graphs with graphviz?
One calls dot multiple times. Or one puts everything into a single graph, taking care to avoid duplication of names.
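For example, here is a minimal sketch of the single-graph approach (the node names and the file name merged.dot are only illustrative): wrap the would-be separate graphs in subgraphs of one parent graph, so dot only ever sees a single graph:
graph merged {
    subgraph cluster_first  { a -- b }
    subgraph cluster_second { c -- d }
}
$ dot -T x11 merged.dot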
Use gvpack
$ echo -e "graph { a -- b }\ngraph { c -- d }" | gvpack -u | dot -Tpng > graphs.png
Result: a single graphs.png containing both graphs.
A simple script that reads graphs on stdin and opens one dot instance per graph:
#!/usr/bin/perl
use strict;
use warnings;

my $o;   # graph currently being accumulated
my @l;   # list of complete graphs

while (<>) {
    if (/^\s*(di)?graph/) {
        push @l, $o if defined $o;   # save the previous graph, if any
        $o = '';
    }
    $o .= $_;
}
push @l, $o if defined $o && $o =~ /graph/;

# fork one dot viewer per graph
for (@l) {
    if (fork() == 0) {
        open my $p, '|-', 'dot -Tx11' or die $!;
        print $p $_;
        close $p;
        exit 0;
    }
}
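Assuming the script above is saved as, say, multidot.pl (the name is just an example) and made executable, it slots straight into the original pipeline:
$ ./myprogram < input | ./multidot.pl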
There is a Capture the Flag challenge.
I have two files; the first contains scrambled text like this, with about 550 entries:
dnaoyt
cinuertdso
bda
haey
tolpap
...
The second file is a dictionary with about 9,000 entries
radar
ccd
gcc
fcc
historical
...
The goal is to find the right, unscrambled version of the word, which is contained in the dictionary file.
My approach is to sort the characters of a word from the first file and then check whether a word from the second file has the same length. If so, I sort that one too and compare them.
This is my fully functional bash script, but it is very slow.
#!/bin/bash
while IFS="" read -r p || [ -n "$p" ]
do
    var=0
    ro=$(echo $p | perl -F -lane 'print sort @F')
    len_ro=${#ro}
    while IFS="" read -r o || [ -n "$o" ]
    do
        ro2=$(echo $o | perl -F -lane 'print sort @F')
        len_ro2=${#ro2}
        let "var+=1"
        if [ $len_ro == $len_ro2 ]; then
            if [ $ro == $ro2 ]; then
                echo $o >> new.txt
                echo $var >> whichline.txt
            fi
        fi
    done < dictionary.txt
done < scrambled-words.txt
I have also tried converting all characters to their ASCII values and summing each word, but while comparing I realized that different character patterns can produce the same sum (for example, "ad" and "bc" both sum to 197).
[edit]
For the record:
- there are no anagrams in the dictionary
- to get the flag, you need to export the unscrambled words as one blob and make a SHA hash of it (that's the flag)
- link to the CTF for whoever wanted the files: https://challenges.reply.com/tamtamy/user/login.action
You're better off creating a lookup dictionary (keyed by the sorted word) from the dictionary file.
Your loop body is executed 550 * 9,000 = 4,950,000 times (O(N*M)).
The solution I propose executes two loops of at most 9,000 passes each (O(N+M)).
Bonus: It finds all possible solutions at no cost.
#!/usr/bin/perl
use strict;
use warnings qw( all );
use feature qw( say );

my $dict_qfn      = "dictionary.txt";
my $scrambled_qfn = "scrambled-words.txt";

sub key { join "", sort split //, $_[0] }

my %dict;
{
    open(my $fh, "<", $dict_qfn)
        or die("Can't open \"$dict_qfn\": $!\n");
    while (<$fh>) {
        chomp;
        push @{ $dict{key($_)} }, $_;
    }
}

{
    open(my $fh, "<", $scrambled_qfn)
        or die("Can't open \"$scrambled_qfn\": $!\n");
    while (<$fh>) {
        chomp;
        my $matches = $dict{key($_)};
        say "$_ matches @$matches" if $matches;
    }
}
I wouldn't be surprised if this takes only one millionth of the time of your solution for the sizes you provided (and it scales much better than yours if you were to increase the sizes).
I would do something like this with gawk
gawk '
NR == FNR {
    dict[csort()] = $0
    next
}
{
    print dict[csort()]
}
# sort the characters of the current record and return them joined
function csort(    chars, sorted, n, i) {
    split($0, chars, "")
    n = asort(chars)
    for (i = 1; i <= n; i++)
        sorted = sorted chars[i]
    return sorted
}' dictionary.txt scrambled-words.txt
Here's a Perl-free solution I came up with using sort and join:
sort_letters() {
    # Splits each letter onto a line, sorts the letters, then joins them
    # e.g. "hello" becomes "ehllo"
    echo "${1}" | fold -b1 | sort | tr -d '\n'
}
# For each input file...
for input in "dict.txt" "words.txt"; do
    # Convert each line to [sorted] [original],
    # then sort and save the results with a .sorted extension
    while read -r original; do
        sorted=$(sort_letters "${original}")
        echo "${sorted} ${original}"
    done < "${input}" | sort > "${input}.sorted"
done
# Join the two files on the [sorted] word
# outputting the scrambled and unscrambled words
join -j 1 -o 1.2,2.2 "words.txt.sorted" "dict.txt.sorted"
I tried something very similar, but a bit different.
#!/bin/bash
exec 3<scrambled-words.txt
while read -r line <&3; do
    printf "%s" "${line}" | perl -F -lane 'print sort @F'
done >scrambled-words_sorted.txt
exec 3>&-

exec 3<dictionary.txt
while read -r line <&3; do
    printf "%s" "${line}" | perl -F -lane 'print sort @F'
done >dictionary_sorted.txt
exec 3>&-

printf "" > whichline.txt
exec 3<scrambled-words_sorted.txt
while read -r line <&3; do
    counter="$((++counter))"
    grep -n -e "^${line}$" dictionary_sorted.txt | cut -d ':' -f 1 | tr -d '\n' >>whichline.txt
    printf "\n" >>whichline.txt
done
exec 3>&-
As you can see I don't create a new.txt file; instead I only create whichline.txt, with a blank line where the word doesn't match. You can easily paste the two files together to create new.txt.
The logic behind the script is nearly the same as yours, except that I call perl fewer times and save two support files.
I think (but I am not sure) that creating them and looping over only one file is better than ~5 million calls to perl. This way perl is called "only" ~10k times.
Finally, I decided to use grep because it's (maybe) the fastest regex matcher, and by matching the entire line the length check is implicit in the regex.
Please note that what @benjamin-w said still applies and, in that case, grep will misbehave; I did not handle it!
I hope this could help [:
I have this simple flat file (file.txt):
a43
test1
abc
cvb
bnm
test2
test1
def
ijk
xyz
test2
kfo
I need all the lines between test1 and test2 in two forms. The first form creates two new files, like:
newfile1.txt :
test1
abc
cvb
bnm
test2
newfile2.txt
test1
def
ijk
xyz
test2
and the second form creates only one new file, like:
newfile.txt
test1abccvbbnmtest2
test1defijkxyztest2
Do you have any suggestions?
EDIT
For the second form, I used this:
sed -n '/test1/,/test2/p' file.txt > newfile.txt
But it gives me a result like:
test1abccvbbnmtest2test1defijkxyztest2
I need a line break, like:
test1abccvbbnmtest2
test1defijkxyztest2
You can use this awk:
awk -v fn="newfile.txt" '/test1/ {
f="newfile" ++n ".txt";
s=1
} s {
print > f;
printf "%s", $0 > fn
} /test2/ {
close(f);
print "" > fn;
s=0
} END {
close(fn)
}' file
Perl, like sed and other languages, has the ability to select ranges of lines from a file, so it's a good fit for what you're trying to do.
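For example, the range (flip-flop) operator makes the basic line selection a one-liner; this only prints the ranges, whereas the full script below also splits them into files:
$ perl -ne 'print if /test1/ .. /test2/' file.txt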
This solution ended up being a lot more complicated than I thought it would be. I see no good reason to use it over @anubhava's awk solution. But I wrote it, so here it is:
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;

use constant {
    RANGE_START  => qr/\Atest1\z/,
    RANGE_END    => qr/\Atest2\z/,
    SUMMARY_FILE => 'newfile.txt',
    GROUP_FILE   => 'newfile%d.txt'
};

my $n = 1;   # starting number of group file
my @wg;      # storage for "working group" of lines

# Open summary file to write to.
open(my $sfh, '>', SUMMARY_FILE) or die $!;

while (my $line = <>) {
    chomp $line;

    # If the line is within the range, add it to our working group.
    push @wg, $line if $line =~ RANGE_START .. $line =~ RANGE_END;

    if ($line =~ RANGE_END) {
        # We are at the end of a group, so summarize it and write it out.
        unless (@wg > 2) {
            # Discard any partial or empty groups.
            @wg = ();
            next;
        }

        # Write a line to the summary file.
        $sfh->say(join '', @wg);

        # Write out all lines to the group file.
        my $group_file = sprintf(GROUP_FILE, $n);
        open(my $gfh, '>', $group_file) or die $!;
        $gfh->say(join "\n", @wg);
        close($gfh);
        printf STDERR "WROTE %s with %d lines\n", $group_file, scalar @wg;

        # Get ready for the next group.
        $n++;
        @wg = ();
    }
}

close($sfh);
printf STDERR "WROTE %s with %d groups\n", SUMMARY_FILE, $n - 1;
To use it, write the above lines into a file named e.g. ranges.pl, and make it executable with chmod +x ranges.pl. Then:
$ ./ranges.pl plat.txt
WROTE newfile1.txt with 5 lines
WROTE newfile2.txt with 5 lines
WROTE newfile.txt with 2 groups
$ cat newfile1.txt
test1
abc
cvb
bnm
test2
$ cat newfile.txt
test1abccvbbnmtest2
test1defijkxyztest2
For the second form, you can add a newline after "test2" by appending \n:
sed -n '/test1/,/test2/p' file.txt | sed -e 's/test2/test2\n/g' > newfile.txt
sed is not well suited to creating multiple files, so for the first form you should find another solution.
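For the first form, a small awk sketch in the spirit of the awk answer above might look like this (untested; the newfileN.txt naming is only an assumption based on the question):
awk '/test1/ { f = "newfile" ++n ".txt" }
     f       { print > f }
     /test2/ { close(f); f = "" }' file.txt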
I am using bash to loop through a large input file (constants.txt) that looks like:
searchterm1
searchterm2
searchterm3
...in an effort to remove search terms from the file if they are not used in the code base. I am trying to use grep and awk, but with no success. I also want to exclude the images and constants directories.
#!/bin/bash
while read a; do
output=`grep -R $a ../website | grep -v ../website/images | grep -v ../website/constants | grep -v ../website/.git`
if [ -z "$output" ]
then echo "$a" >> notneeded.txt
else echo "$a used $($output | wc -l) times" >> needed.txt
fi
done < constants.txt
The desired effect of this would be two files: one showing all of the search terms that are found in the code base (needed.txt), and another with the search terms that are not found in the code base (notneeded.txt).
needed.txt
searchterm1 used 4 times
searchterm3 used 10 times
notneeded.txt
searchterm2
I've tried awk as well in a similar fashion, but I cannot get it to loop and output as desired.
Not sure but it sounds like you're looking for something like this (assuming no spaces in your file names):
awk '
NR==FNR{ terms[$0]; next }
{
for (term in terms) {
if ($0 ~ term) {
hits[term]++
}
}
}
END {
for (term in terms) {
if (term in hits) {
print term " used " hits[term] " times" > "needed.txt"
}
else {
print term > "notneeded.txt"
}
}
}
' constants.txt $( find ../website -type f -print | egrep -v '\.\.\/website\/(images|constants|\.git)' )
There's probably some find option to make the egrep unnecessary.
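For example (untested), find's -not -path tests should let you drop the egrep entirely:
find ../website -type f -not -path '*/images/*' -not -path '*/constants/*' -not -path '*/.git/*'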
**Edit: Okay, so I've tried implementing everyone's advice so far.
- I've added quotes around each variable, "$1" and "$codon", to avoid whitespace issues.
- I've added the -ioc flags to grep to handle capitalization.
- I tried using tr -d ' ', but that leads to a runtime error saying -d' ' is an invalid option.
Unfortunately I am still seeing the same problem. Or a different problem, which is that it tells me that every codon appears exactly once. Which is a different kind of wrong.
Thanks for everything so far - I'm still open to new ideas. I've updated my code below.**
I have this bash script that is supposed to count all permutations of (A C G T) in a given file.
One line of the script is not giving me the desired result and I don't know why - especially because I can enter the exact same line of code in the command prompt and get the desired result.
The line, executed in the command prompt, is:
cat dnafile | grep -o GCT | wc -l
This line tells me how many times the regular expression "GCT" appears in the file dnafile. When I run this command the result I get is 10 (which is accurate).
In the code itself, I run a modified version of the same command:
cat $1 | grep -o $codon | wc -l
Where $1 is the file name, and $codon is the 3-letter combination. When I run this from within the program, the answer I get is ALWAYS 0 (which is decidedly not accurate).
I was hoping one of you fine gents could enlighten this lost soul as to why this is not working as expected.
Thank you very, very much!
My code:
#!/bin/bash
# countcodons <dnafile> counts occurrences of each codon in the sequence contained within <dnafile>

if [[ $# != 1 ]]
then
    echo "Format is: countcodons <dnafile>"
    exit
fi

nucleos=(a c g t)
allCods=()

# mix and match nucleotides to create all codons
for x in {0..3}
do
    for y in {0..3}
    do
        for z in {0..3}
        do
            perm=${nucleos[$x]}${nucleos[$y]}${nucleos[$z]}
            allCods=("${allCods[@]}" "$perm")
        done
    done
done

# for each codon, use grep to count # of occurrences in the file
len=${#allCods[*]}
for (( n=0; n<len; n++ ))
do
    codon=${allCods[$n]}
    occs=`cat "$1" | grep -ioc "$codon" | wc -l`
    echo "$codon appears: $occs"
#    if (( $occs > 0 ))
#    then
#        echo "$codon : $occs"
#    fi
done
exit
You're generating your sequences in lowercase. Your code greps for gct, not GCT. You want to add the -i switch to grep. Try:
occs=`grep -io "$codon" "$1" | wc -l`
You've got your logic backwards - you shouldn't have to read your input file once for every codon, you should only have to read it once and check each line for every codon.
You didn't supply any sample input or expected output so it's untested but something like this is the right approach:
awk '
BEGIN {
nucleosStr="a c g t"
split(nucleosStr,nucleos)
#mix and match nucleotides to create all codons
for (x in nucleos) {
for (y in nucleos) {
for (z in nucleos) {
perm = nucleos[x] nucleos[y] nucleos[z]
allCodsStr = allCodsStr (allCodsStr?" ":"") perm
}
}
}
split(allCodsStr,allCods)
}
{
#for each codon, count # of occurances in file
for (n in allCods) {
codon = allCods[n]
if ( tolower($0) ~ codon ) {
occs[n]++
}
}
}
END {
for (n in allCods) {
printf "%s appears: %d\n", allCods[n], occs[n]
}
}
' "$1"
I expect you'll see a huge performance improvement with that approach if your file is moderately large.
Try:
occs=`cat $1 | grep -o $codon | wc -l | tr -d ' '`
The problem is that wc indents the output, so $occs has a bunch of spaces at the beginning.
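For example, on BSD/macOS wc right-aligns its counts in a fixed-width field, so the raw count comes back with leading spaces (GNU wc does not do this):
$ echo GCTGCT | grep -o GCT | wc -l
       2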
Hey, I'm trying to write a little bash script. It should copy a dir and all files in it. Then it should search each file and dir in this copied dir for a string (e.g. #ForTestingOnly) and save that line number. Then it should go on counting each { and }, and as soon as the counts are equal it should save that line number too. => it should delete all the lines between these two numbers.
I'm trying to make a script which searches for all these annotations and then deletes the method directly after each annotation.
Thx for help...
so far I have:
echo "please enter dir"
read dir
newdir="$dir""_final"
cp -r $dir $newdir
cd $newdir
grep -lr -E '#ForTestingOnly' * | xargs sed -i 's/#ForTestingOnly//g'
Now with grep I can search for and remove the #ForTestingOnly annotation, but I'd like to delete it together with the following method...
Give this a try. It's oblivious to braces in comments and literals, though, as David Gelhar warned. It only finds and deletes the first occurrence of the "#ForTestingOnly" block (under the assumption that there will only be one anyway).
#!/bin/bash
find . -maxdepth 1 | while read -r file
do
    open=0 close=0
    # start=$(sed -n '/#ForTestingOnly/{=;q}' "$file")
    while read -r line
    do
        case $line in
            *{*) (( open++ ))  ;;
            *}*) (( close++ )) ;;
            '')  : ;;  # skip blank lines
            *)   # these lines contain the line number that the sed "=" command printed
                 if (( open == close ))
                 then
                     break
                 fi
                 ;;
        esac
    # split braces onto separate lines dropping all other chars
    # print the line number once per line that contains either { or }
    # done < <(sed -n "$start,$ { /[{}]/ s/\([{}]\)/\1\n/g;ta;b;:a;p;=}" "$file")
    done < <(sed -n "/#ForTestingOnly/,$ { /[{}]/ s/\([{}]\)/\1\n/g;ta;b;:a;p;=}" "$file")
    end=$line
    # sed -i "${start},${end}d" "$file"
    sed -i "/#ForTestingOnly/,${end}d" "$file"
done
Edit: Removed one call to sed (by commenting out and replacing a few lines).
Edit 2:
Here's a breakdown of the main sed line:
sed -n "/#ForTestingOnly/,$ { /[{}]/ s/\([{}]\)/\1\n/g;ta;b;:a;p;=}" "$file"
-n - only print lines when explicitly requested
/#ForTestingOnly/,$ - from the line containing "#ForTestingOnly" to the end of the file
s/ ... / ... /g - perform a global (per-line) substitution
\( ... \) - capture
[{}] - the characters that appear in the list between the square brackets
\1\n - substitute what was captured plus a newline
ta - if a substitution was made, branch to label "a"
b - branch (no label means "to the end", beginning the per-line cycle again for the next line) - this branch functions as an "else" for the ta; I could have used T instead of ta;b;:a, but some versions of sed don't support T
:a - label "a"
p - print the line (actually, print the pattern buffer which now consists of possibly multiple lines with a "{" or "}" on each one)
= - print the current line number of the input file
The second sed command simply says to delete the lines starting at the one that has the target string and ending at the line found by the while loop.
The sed command at the top which I commented out says to find the target string and print the line number it's on and quit. That line isn't necessary since the main sed command is taking care of starting in the right place.
The inner while loop looks at the output of the main sed command and increments counters for each brace. When the counts match, it stops.
The outer while loop steps through all the files in the current directory.
I fixed the bugs in the old version. The new version has two scripts: an awk script and a bash driver.
The driver is:
#!/bin/bash
AWK_SCRIPT=ann.awk

for i in $(find . -type f -print); do
    while [ 1 ]; do
        cmd=$(awk -f $AWK_SCRIPT $i)
        if [ -z "$cmd" ]; then
            break
        else
            eval $cmd
        fi
    done
done
the new awk script is:
BEGIN {
    # line number where we will start deleting
    start = 0;
}
{
    # check current line for the annotation
    # we're looking for
    if ($0 ~ /#ForTestingOnly/) {
        start = NR;
        found_first_open_brace = 0;
        num_open = 0;
        num_close = 0;
    }
    if (start != 0) {
        if (num_open == num_close && found_first_open_brace == 1) {
            print "sed -i \'\' -e '" start "," NR " d' " ARGV[1];
            start = 0;
            exit;
        }
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1);
            if (c == "{") {
                found_first_open_brace = 1;
                num_open++;
            }
            if (c == "}") {
                num_close++;
            }
        }
    }
}
Set the path to the awk script in the driver then run the driver in the root dir.
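A hypothetical run, assuming the awk script is saved as ann.awk (matching the AWK_SCRIPT variable above) and the driver as strip_annotations.sh; that driver name, and the myproject_final directory, are just examples:
$ cp ann.awk strip_annotations.sh myproject_final/
$ cd myproject_final
$ bash strip_annotations.sh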