How do I perform set operations on two sets that contain structured data?
For example:
set(set(<a b c>), set(<d e f>)) ⊆ set(set(<a b c>), set(<d e f>), set(<g h i>)) # True
set(set(<a b c>), set(<d e f>)) eq set(set(<a b c>), set(<d e f>), set(<g h i>)) # False
set(set(<a b c>), set(<d e f>)) ∩ set(set(<a b c>), set(<d e f>), set(<g h i>)) # set(set(<a b c>), set(<d e f>))
Regardless of values in a Set, you can use the eqv operator to find out if they are the same:
$ raku -e 'say <a b c>.Set eqv <c b a>.Set'
True
$ raku -e 'say <a b c>.Set eqv <d b a>.Set'
False
$ raku -e 'say set(<a b c>.Set,<a b d>.Set) eqv set(<d b a>.Set,<c b a>.Set)'
True
I'm trying to produce output that resembles that of ls. The ls command prints like this:
file1.txt file3.txt file5.txt
file2.txt file4.txt
But I want this sample list:
a b c d e f g h i j k l m n o p q r s t u v w x y z
to appear as:
a e i m q u y
b f j n r v z
c g k o s w
d h l p t x
That case gives 7 columns, which is fine, but I want up to 8 columns at most. Next, the following list:
a b c d e f g h i j k l m n o p q r s t u v w
will have to show as:
a d g j m p s v
b e h k n q t w
c f i l o r u
And "a b c d e f g h" will have to show as is because it is already 8 columns in 1 line, but:
a b c d e f g h i
will show as:
a c e g i
b d f h
And:
a b c d e f g h i j
will show as:
a c e g i
b d f h j
One way:
#!/usr/bin/env tclsh
proc columnize {lst {columns 8}} {
    set len [llength $lst]
    set nrows [expr {int(ceil($len / (0.0 + $columns)))}]
    set cols [list]
    for {set n 0} {$n < $len} {incr n $nrows} {
        lappend cols [lrange $lst $n [expr {$n + $nrows - 1}]]
    }
    for {set n 0} {$n < $nrows} {incr n} {
        set row [list]
        foreach col $cols {
            lappend row [lindex $col $n]
        }
        puts [join $row " "]
    }
}
columnize {a b c d e f g h i j k l m n o p q r s t u v w x y z}
puts ----
columnize {a b c d e f g h i j k l m n o p q r s t u v w}
puts ----
columnize {a b c d e f g h}
puts ----
columnize {a b c d e f g h i}
puts ----
columnize {a b c d e f g h i j}
The columnize function first figures out how many rows are needed with a simple division of the length of the list by the number of columns requested, then splits the list up into chunks of that length, one per column, and finally iterates through those sublists extracting the current row's element for each column, and prints the row out as a space-separated list.
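For comparison, here is a minimal Ruby sketch of the same three steps (compute the row count, slice the list into columns, emit row by row). The names columnize and max_columns are illustrative only, and unlike the Tcl version it drops the empty cells of a short last column instead of printing trailing spaces.

```ruby
# Sketch: lay a list out column-major, using at most max_columns columns.
# `columnize` and `max_columns` are illustrative names, not from any library.
def columnize(items, max_columns = 8)
  return [] if items.empty?
  # Rows needed so that no more than max_columns columns are used.
  nrows = (items.length.to_f / max_columns).ceil
  # One chunk of nrows items per column; the last column may be shorter.
  cols = items.each_slice(nrows).to_a
  # Row n takes the n-th item of every column that has one.
  (0...nrows).map { |n| cols.map { |col| col[n] }.compact.join(" ") }
end

puts columnize(("a".."z").to_a)
```

Calling columnize(("a".."i").to_a) should give the two rows "a c e g i" and "b d f h", matching the layout asked for in the question.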
The output I expect is as follows:
[Before character h there is nothing, so "#" is assigned. After character h are "e","l","l".]
[Before character e is "h". After character e are "l","l","o".]
[Before character l are "h" and "e". After character l are "l" and "o".]
[Before character l are "h","e","l". After character l is "o".]
[Before character o are "e","l","l". After character o there is nothing, so "#" is assigned.]
# # # h e l l
# # h e l l o
# h e l l o #
h e l l o # #
e l l o # # #
# # # w o n d
# # w o n d e
# w o n d e r
w o n d e r f
o n d e r f u
n d e r f u l
d e r f u l #
e r f u l # #
r f u l # # #
Input file:
h e l l o
w o n d e r f u l
Code:
awk -v s1="# # #" \
'BEGIN{v=length(s1)}
{$0=s1 $0 s1;num=split($0, A,"");
  for(i=v+1;i<=num-v;i++){
    q=i-v;p=i+v;
    while(q<=p){
      Q=Q?Q OFS A[q]:A[q];q++
    };
    print Q;Q=""
  }
}' InputFile
But the result I got is:
# # # h e l
# # h e l l
# # h e l l
# h e l l o
# h e l l o #
h e l l o #
e l l o # #
e l l o # #
l l o # # #
# # # w o n
# # w o n d
# # w o n d
# w o n d e
# w o n d e
w o n d e r
o n d e r
o n d e r f
n d e r f
n d e r f u
d e r f u
d e r f u l
e r f u l #
e r f u l #
r f u l # #
r f u l # #
f u l # # #
How can I solve this? Please guide me. Thanks.
Add gsub(/ /,"") to the top of @fedorqui's answer to your previous question, change ## to ###, and change 5 to 7, and you get:
$ cat tst.awk
{
    gsub(/ /,"")
    n=length($0)
    $0 = "###" $0 "###"
    gsub(/./, "& ")
    for (i=1; i<=2*n; i+=2)
        print substr($0, i, 7*2-1)
    print ""
}
$ awk -f tst.awk file
# # # h e l l
# # h e l l o
# h e l l o #
h e l l o # #
e l l o # # #
# # # w o n d
# # w o n d e
# w o n d e r
w o n d e r f
o n d e r f u
n d e r f u l
d e r f u l #
e r f u l # #
r f u l # # #
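The same padding-and-window idea can be sketched in Ruby, assuming each input line holds one word whose characters may be separated by spaces. The method name windows and its pad parameter are hypothetical, chosen only for this illustration.

```ruby
# Sketch: show every character of a word with `pad` neighbours on each side,
# filling positions beyond the ends with "#". `windows` is a hypothetical name.
def windows(line, pad = 3)
  chars = line.split.join.chars            # "h e l l o" -> ["h","e","l","l","o"]
  padded = ["#"] * pad + chars + ["#"] * pad
  # each_cons yields every run of 2*pad+1 consecutive elements,
  # which is exactly one window per original character.
  padded.each_cons(2 * pad + 1).map { |w| w.join(" ") }
end

puts windows("h e l l o")
```

Because the padding is added as whole array elements rather than as a string containing spaces, the window width does not depend on the separator.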
I have the following string.
melody = "F# G# A# B |A# G# F# |A# B C# D# |C# B A# |F# F# |F# F# |F#F#G#G#A#A#BB|A# G# F# "
I want to convert F# to f, G# to g etc.
melody.gsub(/C#/, 'c').gsub(/D#/,'d').gsub(/F#/,'f').gsub(/G#/,'g').gsub(/A#/,'a')
The above gives the desired output (shown below), but I am wondering if I can call gsub only once.
"f g a B |a g f |a B c d |c B a |f f |f f |ffggaaBB|a g f "
String#gsub accepts an optional block; the return value of the block is used as the replacement string:
melody.gsub(/[CDFGA]#/) { |x| x[0].downcase }
# => "f g a B |a g f |a B c d |c B a |f f |f f |ffggaaBB|a g f "
You can use a hash too:
melody.gsub(/[CDFGA]#/, {'C#' => 'c', 'D#' => 'd', 'F#' => 'f', 'G#' => 'g', 'A#' => 'a'})
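As a quick check that the two forms agree, here is a small sketch on a shortened sample string (not the full melody from the question). The variable names are illustrative, and the lookup table is built programmatically only to avoid typing each pair.

```ruby
# Sketch on a shortened sample string (an assumption, not the question's melody).
sample = "F# G# A# B |C# D# "
sharps = /[CDFGA]#/

# Block form: the block's return value replaces each match.
with_block = sample.gsub(sharps) { |note| note[0].downcase }

# Hash form: each match is looked up as a key. A match that is missing from
# the hash is replaced by an empty string, so the keys must cover the regex.
table = %w[C D F G A].to_h { |n| ["#{n}#", n.downcase] }
with_hash = sample.gsub(sharps, table)

puts with_block
puts with_hash
```

The block form needs no table at all here because the replacement can be derived from the match itself; the hash form is the better fit when the mapping is arbitrary.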
I'm trying to understand how Ruby's stdout actually works, since I'm struggling with the output of some code.
Within my script I'm using a Unix sort, which works fine from the terminal but not from Ruby. Suppose you have this in your file (TSV):
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
My ruby code is this:
@raw_file=File.open(ARGV[0],"r") unless File.open(ARGV[0],"r").nil?
tmp_raw=File.new("#{@pwd}/tmp_raw","w")
`cut -f1,6,3,4,2,5,9,12 #{@raw_file.path} | sort -k1,1 -k8,8 > #{tmp_raw.path}`
This is what I get (misplaced separators):
a b c d e f i
1a b c d e f g h i l m
1
What's happening here?
When running from the terminal, I get no misplaced separators.
Instead of writing to a temporary file, passing the file via an argument, and so on, you can use Ruby's Open3 module to create the pipeline in a more Ruby-friendly manner (instead of relying on the underlying shell):
require 'open3'
raw_file = File.open(ARGV[0], "r")
commands = [
  ["cut", "-f1,6,3,4,2,5,9,12"],
  ["sort", "-k1,1", "-k8,8"],
]
result = Open3.pipeline_r(*commands, in: raw_file) do |out|
  break out.read
end
puts result
Shell escaping problems, for example, become a thing of the past, and no temporary files are necessary, since pipes are used.
I would, however, advise doing this kind of processing in Ruby itself, instead of calling external utilities; you're getting no benefit from using Ruby here, you're just doing shell stuff.
As Linuxios says, your code never uses STDOUT, so your question doesn't make a lot of sense.
Here's a simple example showing how to do this all in Ruby.
Starting with an input file called "test.txt":
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
This code:
File.open('test_out.txt', 'w') do |test_out|
  File.foreach('test.txt') do |line_in|
    chars = line_in.split
    test_out.puts chars.values_at(0, 5, 2, 3, 1, 4, 8, 10).sort_by{ |*c| [c[0], c[7]] }.join("\t")
  end
end
Creates this output in 'test_out.txt':
a b c d e f i m
a b c d e f i m
a b c d e f i m
a b c d e f i m
a b c d e f i m
a b c d e f i m
a b c d e f i m
a b c d e f i m
Read about values_at and sort_by.
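Note that sort_by in the code above orders the selected values within a single line, whereas the shell pipeline in the question sorts whole lines. If the goal is to approximate cut followed by sort, a sketch along these lines may be closer. The helper name select_and_sort and the three-line sample input are assumptions made for illustration, and the comparison is plain string ordering rather than sort's locale-aware collation.

```ruby
# Sketch: keep selected tab-separated fields of each line, then sort the lines.
# `select_and_sort` is a hypothetical helper; field numbers are 1-based like cut's.
def select_and_sort(lines, fields, sort_keys)
  rows = lines.map do |line|
    cells = line.chomp.split("\t")
    # Like cut, emit fields in file order and skip ones the line does not have.
    fields.sort.map { |f| cells[f - 1] }.compact
  end
  # Sort whole rows by the chosen output columns (missing cells sort as "").
  sorted = rows.sort_by { |row| sort_keys.map { |k| row[k - 1].to_s } }
  sorted.map { |row| row.join("\t") }
end

# Made-up sample data, not the file from the question.
input = ["b\t2\tx\n", "a\t9\ty\n", "a\t1\tz\n"]
puts select_and_sort(input, [1, 3, 2], [1, 2])
```

For the question's pipeline the call would be select_and_sort(File.readlines(path), [1, 6, 3, 4, 2, 5, 9, 12], [1, 8]), where path is whatever file name the script receives.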
I have a small bioinformatics problem that I think should be easy to solve. Related to "genotype phasing". But I'm not sure how to tackle it.
In the extract below, the first column contains identifiers, and the subsequent columns are binary genotypes labelled "a" or "b". "-" means a missing value.
Si_gnF.scaffold10533.53688bp_tag414456 b a a b b a b a a a b a b b a b a a b b a a b b
Si_gnF.scaffold10533.76297bp_tag414484 a b b a a b a b b b a b a a b a b b a - b b a a
Si_gnF.scaffold10533.98416bp_tag414526 a b b a a b a b b b a b a a b a b b a a b b a a
Si_gnF.scaffold10534.48805bp_tag414546 b a a b a b a b b b b b b a a a a b a b b b b a
Si_gnF.scaffold10535.1091787bp_tag414684 a a a b b a a a b a b a a a a b b b a a b b a a
Si_gnF.scaffold10535.1151107bp_tag414765 b b b a a b b b a b a - b b b a a a b b a a b b
Si_gnF.scaffold10535.1220879bp_tag414877 a a a b b a a a b a b a a a a b b b a a b b a a
Si_gnF.scaffold10535.1304464bp_tag414988 b b b a a b b b a b a b b b b a a a b b a a b b
Si_gnF.scaffold10535.1347462bp_tag415047 b b b a a b b b a b a b b b b a a a b b a a b b
Si_gnF.scaffold10535.1379804bp_tag415090 b b b a a b b b a b a b b b b a a a b b a a b b
Si_gnF.scaffold10535.1540335bp_tag415345 a a a b b a a a b a b a a a a b b b a a b b a a
Si_gnF.scaffold10535.1585442bp_tag415410 a a a b b a a a b a b a a a a b b b a a b b a a
Si_gnF.scaffold10535.1609908bp_tag415431 b b b a a b a b a b a b b b b a a a b b a a b b
Si_gnF.scaffold10535.1711158bp_tag415567 b b b a a b b b a b a b b b b a a a b b a a b b
Si_gnF.scaffold10535.1744394bp_tag415609 b b b a a b b b a b a b b b b a a a b b a a b b
Si_gnF.scaffold10535.1751886bp_tag415620 a a a b b a a a b a b a a a a b b b a a b b a a
Si_gnF.scaffold10535.1752774bp_tag415622 a a a b b a a a b a b a a a a b b b a a b b a a
Si_gnF.scaffold10535.1789478bp_tag415675 b b - a a b b b a b a b b b b a a a b b a a b b
Si_gnF.scaffold10535.1800135bp_tag415687 b b b a a b b b a b a b b b b a a a b b a a b b
Si_gnF.scaffold10535.1885424bp_tag415814 a a a b b a a a b a b a a a a b b b a a b b a a
Basically, I want to minimize the number of differences between lines. (I cannot edit individual columns, but can flip the labels on whole lines). The result for the first four lines would be this:
Si_gnF.scaffold10533.53688bp_tag414456 b a a b b a b a a a b a b b a b a a b b a a b b
Si_gnF.scaffold10533.76297bp_tag414484 b a a b b a b a a a b a b b a b a a b - a a b b <-- this one flipped
Si_gnF.scaffold10533.98416bp_tag414526 b a a b b a b a a a b a b b a b a a b b a a b b <-- this one flipped
Si_gnF.scaffold10533.53688bp_tag414456 b a a b b a b a a a b a b b a b a a b b a a b b
As a first step I'll need to make pairwise comparisons. But what is a good way of quantifying the differences, so that I know for which lines labels must be flipped? (2 consecutive lines rarely match 100%; there can be multiple (even many) mismatches as well as missing values).
(ideally in ruby or R)
You can use the Levenshtein algorithm to quantify the difference between two strings. One way to do it:
require 'text' # See http://rubygems.org/gems/text
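lines # => an array with each line
# Drop the identifier column and return the sorted genotype labels as one string.
def genotypes(line)
  line.split.drop(1).sort.join
end
def compare(line1, line2)
  Text::Levenshtein.distance(genotypes(line1), genotypes(line2))
end
compare(lines[0], lines[1]) # => 1 (one value different)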
(If "a b a a" is not equal to "a a a b", remove the sort from the method.)
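Since the position of each genotype matters in the question, a position-by-position mismatch count (a Hamming-style distance, which is a different measure from Levenshtein) may be a simpler way to decide whether a line should be flipped. The following is only a sketch under that assumption: the names mismatches, flip, and phase are hypothetical, and missing values are simply skipped.

```ruby
# Sketch: decide whether a line's a/b labels should be swapped to match a reference.
# `mismatches`, `flip`, and `phase` are hypothetical names.

# Count positions where two genotype arrays differ, ignoring missing values ("-").
def mismatches(xs, ys)
  xs.zip(ys).count { |x, y| x != "-" && y != "-" && x != y }
end

# Swap every "a" with "b"; "-" stays as it is.
def flip(genotypes)
  genotypes.map { |g| g.tr("ab", "ba") }
end

# Return the orientation of `genotypes` that disagrees least with `reference`.
def phase(reference, genotypes)
  flipped = flip(genotypes)
  mismatches(reference, flipped) < mismatches(reference, genotypes) ? flipped : genotypes
end

# Shortened, made-up example rows (not taken from the data above).
ref  = %w[b a a b b a]
line = %w[a b - a a b]
puts phase(ref, line).join(" ")
```

Running phase over consecutive lines, each time using the previously phased line as the reference, would be one way to propagate the orientation down the file. Whether that greedy order is appropriate for real linkage data is a separate question.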