cat file_a
aaa
bbb
ccc
cat file_b
ddd
eee
fff
cat file_x
bbb
ccc
ddd
eee
I want to cat file_a file_b | remove_from_stream_what_is_in(file_x)
Result:
aaa
fff
If there is no standard filter that can do this, I wonder whether there is a way with ruby -ne '...'.
Try:
$ cat file_a file_b | grep -vFf file_x
aaa
fff
-v means remove matching lines.
-F tells grep to treat the match patterns as fixed strings, not regular expressions.
-f file_x tells grep to get the match patterns from the lines of file_x.
Other options that you may want to consider are:
-w tells grep to match only complete words.
-x tells grep to match only complete lines.
From Ruby, first recreating the files from the question:
IO.write('file_a', %w| aaa bbb ccc |.join("\n"))     #=> 11
IO.write('file_b', %w| ddd eee fff |.join("\n"))     #=> 11
IO.write('file_x', %w| bbb ccc ddd eee |.join("\n")) #=> 15
Then it is just array difference:
IO.readlines('file_a', chomp: true) + IO.readlines('file_b', chomp: true) -
  IO.readlines('file_x', chomp: true)
#=> ["aaa", "fff"]
I need to find the links on a page whose text consists of two words. How can this be done with XPath?
<div class="navbar">
  <p>
    <a href="#">Aaa aaa</a>
    <a href="#">Bbb</a>
    <a href="#">Ccc ccc</a>
    <a href="#">Ddd</a>
    <a href="#">Eee</a>
    <a href="#">Fff fff ff</a>
  </p>
</div>
If you can differentiate the strings by the count of spaces, you could use this XPath-1.0 expression:
/div/p/a[string-length(normalize-space(.))-string-length(translate(normalize-space(.),' ',''))=1]
This matches all links whose text consists of exactly two words (one space after normalization).
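If you end up doing this from Ruby, here is a sketch of the same expression run through Nokogiri (the gem is an assumption on my part, and //a is used instead of /div/p/a because Nokogiri::HTML wraps the fragment in html/body):

require 'nokogiri'  # assumption: the Nokogiri gem is available

html = <<~HTML
  <div class="navbar">
    <p>
      <a href="#">Aaa aaa</a>
      <a href="#">Bbb</a>
      <a href="#">Ccc ccc</a>
      <a href="#">Fff fff ff</a>
    </p>
  </div>
HTML

doc = Nokogiri::HTML(html)

# Two words => exactly one space once whitespace is normalized, so the
# length difference between the text and the text minus spaces is 1.
expr = "//a[string-length(normalize-space(.)) - " \
       "string-length(translate(normalize-space(.), ' ', '')) = 1]"

puts doc.xpath(expr).map(&:text)  # prints "Aaa aaa" and "Ccc ccc"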
My two cents: remove the non-space characters and look at the spaces that are left.
XPath 1: //a[translate(normalize-space(.),"abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ","") = " "]
echo -e 'cat //a[translate(normalize-space(.),"abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ","") = " "] \n bye' | xmllint --shell test.html
/ > cat //a[translate(normalize-space(.),"abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ","") = " "]
-------
Aaa aaa
-------
Ccc ccc
/ > bye
Or, using the length of the remaining spaces:
XPath 2: //a[string-length(translate(normalize-space(.),"abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ","")) = 1]
echo -e 'cat //a[string-length(translate(normalize-space(.),"abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ","")) = 1] \n bye' | xmllint --shell test.html
/ > cat //a[string-length(translate(normalize-space(.),"abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ","")) = 1]
-------
Aaa aaa
-------
Ccc ccc
/ > bye
I have 2 CSV files. The 1st is my main CSV that contains all the columns I need; the 2nd contains 2 columns, where the 1st column is an identifier and the 2nd is the replacement value. For example:
Main.csv
aaa 111 bbb 222 ccc 333
ddd 444 eee 555 fff 666
iii 777 jjj 888 kkk 999
lll 101 eee 201 nnn 301
replacement.csv
bbb abc
jjj def
eee ghi
I want the result to look like the following: the 3rd column of Main.csv is the identifier and is matched against the 1st column of replacement.csv; wherever they match, the 5th column of Main.csv should be replaced with the 2nd column of replacement.csv. Main.csv can contain repeated identifiers, so every occurrence should get the appropriate replacement value:
aaa 111 bbb 222 abc 333
ddd 444 eee 555 ghi 666
iii 777 jjj 888 def 999
lll 101 eee 201 ghi 301
I tried code like this:
while read col1 col2 col3 col4 col5 col6
do
    while read col7 col8
    do
        if[$col7==col3]
        then
            col5=col8
        fi
    done < RepCSV
done < MainCSV > MainCSV
But it did not work.
I'm quite new to bash, so any help would be appreciated. Thanks in advance.
Using awk:
$ awk '
NR==FNR {                   # first file: replacement.csv
  a[$1]=$2                  # hash $2 into a, keyed by $1
  next                      # move on to the next record
}
{                           # second file: Main.csv
  $5=($3 in a?a[$3]:$5)     # replace $5 based on $3
}1' replacement.csv Main.csv
aaa 111 bbb 222 abc 333
ddd 444 eee 555 ghi 666
iii 777 jjj 888 def 999
lll 101 eee 201 ghi 301
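The same two-pass idea translates directly to Ruby if that is ever easier to extend (a sketch using the file names from the question; the columns are whitespace-separated as in the sample data):

# Pass 1: build the lookup table from replacement.csv (key -> new value).
repl = File.readlines('replacement.csv', chomp: true)
           .map(&:split)
           .to_h

# Pass 2: rewrite column 5 of Main.csv whenever column 3 is a known key.
File.foreach('Main.csv', chomp: true) do |line|
  cols = line.split
  cols[4] = repl[cols[2]] if repl.key?(cols[2])
  puts cols.join(' ')
end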
How to do natural sort on uniq -c output?
When the counts are <10, the uniq -c | sort output looks fine:
alvas@ubi:~/testdir$ echo -e "aaa\nbbb\naa\ncd\nada\naaa\nbbb\naa\nccd\naa" > test.txt
alvas@ubi:~/testdir$ cat test.txt
aaa
bbb
aa
cd
ada
aaa
bbb
aa
ccd
aa
alvas@ubi:~/testdir$ cat test.txt | sort | uniq -c | sort
1 ada
1 ccd
1 cd
2 aaa
2 bbb
3 aa
but when a count reaches 10 or more (or even runs into the hundreds or thousands), the output gets jumbled, because sort is comparing strings rather than integers:
alvas@ubi:~/testdir$ echo -e "aaa\nbbb\naa\nnaa\nnaa\naa\nnaa\nnaa\nnaa\nnaa\nnaa\nnaa\nnaa\nnaa\nnnaa\ncd\nada\naaa\nbbb\naa\nccd\naa" > test.txt
alvas@ubi:~/testdir$ cat test.txt | sort | uniq -c | sort
10 naa
1 ada
1 ccd
1 cd
1 nnaa
2 aaa
2 bbb
4 aa
How can I get a natural (numeric) sort of the "uniq -c" output, in ascending or descending order?
Use -n in your sort command, so that it sorts numerically. Also -r allows you to reverse the result:
$ sort test.txt | uniq -c | sort -n
1 ada
1 ccd
1 cd
1 nnaa
2 aaa
2 bbb
4 aa
10 naa
$ sort test.txt | uniq -c | sort -nr
10 naa
4 aa
2 bbb
2 aaa
1 nnaa
1 cd
1 ccd
1 ada
From man sort:
-n, --numeric-sort
compare according to string numerical value
-r, --reverse
reverse the result of comparisons
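For comparison, the same count-then-sort idea in Ruby (a sketch; Enumerable#tally needs Ruby 2.7 or later):

counts = File.readlines('test.txt', chomp: true).tally  # { line => count }

# Ascending by count, then by line, mimicking `sort | uniq -c | sort -n`.
counts.sort_by { |line, n| [n, line] }.each do |line, n|
  printf("%7d %s\n", n, line)
end

Negate n in the sort key (or reverse the resulting array) for descending order.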
Here is the question: I have two files:
file1:
aaa
bbb
ccc
ddd
file2:
bbb
ddd
How can I use diff to get this output (only the differences)?
aaa
ccc
If what you want is the records unique to file1, use comm: -2 suppresses the lines that appear only in file2 and -3 suppresses the lines common to both files, leaving just the lines unique to file1 (comm expects sorted input, hence the process substitutions):
$ comm -23 <(sort file1) <(sort file2)
aaa
ccc
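For comparison, the same difference is a one-liner in Ruby, reusing the Array#- trick shown earlier (a sketch assuming the files are small enough to read whole):

# Lines of file1 that do not appear in file2.
p File.readlines('file1', chomp: true) - File.readlines('file2', chomp: true)
#=> ["aaa", "ccc"]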
I'm using Ruby's regular expressions to deal with text such as
${1:aaa|bbbb}
${233:aaa | bbbb | ccc ccccc }
${34: aaa | bbbb | cccccccc |d}
${343: aaa | bbbb | cccccccc |dddddd ddddddddd}
${3443:a aa|bbbb|cccccccc|d}
${353:aa a| b b b b | c c c c c c c c | dddddd}
I want to get the trimmed text between the pipes. For example, for the first line above I want aaa and bbbb; for the second line I want aaa, bbbb and ccc ccccc. I have written a regular expression and a piece of Ruby code to test it:
array = "${33:aaa|bbbb|cccccccc}".scan(/\$\{\s*(\d+)\s*:(\s*[^\|]+\s*)(?:\|(\s*[^\|]+\s*))+\}/)
puts array
My problem is that the (?:\|(\s*[^\|]+\s*))+ part doesn't create multiple capture groups. I don't know how to solve this, because the number of items in each line is variable. Can anyone help?
When you repeat a capturing group in a regular expression, the capturing group only stores the text matched by its last iteration. If you need to capture multiple iterations, you'll need to use more than one regex. (.NET is the only exception to this. Its CaptureCollection provides the matches of all iterations of a capturing group.)
In your case, you could do a search-and-replace to replace ^\d+: with nothing. That strips off the number and colon at the start of your string. Then call split() using the regex \s*\|\s* to split the string into the elements delimited by vertical bars.
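A minimal Ruby sketch of that two-step approach (the substitution here also strips the leading ${ and the trailing }, since those appear in the sample strings):

line = '${233:aaa | bbbb | ccc ccccc }'

# Step 1: strip the "${<number>:" prefix and the closing "}".
body = line.sub(/\A\$\{\d+:/, '').sub(/\}\z/, '')

# Step 2: split on the bars, allowing whitespace on either side, then trim.
parts = body.split(/\s*\|\s*/).map(&:strip)

p parts  #=> ["aaa", "bbbb", "ccc ccccc"]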
Why don't you split your string?
str = "${233:aaa | bbbb | ccc ccccc }"
str.split(/\d+|\$|\{|\}|:|\|/).select { |v| !v.empty? }.map { |v| v.strip }.join(', ')
#=> "aaa, bbbb, ccc ccccc"
This might help you
Script
a = [
  '${1:aaa|bbbb}',
  '${233:aaa | bbbb | ccc ccccc }',
  '${34: aaa | bbbb | cccccccc |d}',
  '${343: aaa | bbbb | cccccccc |dddddd ddddddddd}',
  '${3443:a aa|bbbb|cccccccc|d}',
  '${353:aa a| b b b b | c c c c c c c c | dddddd}'
]

a.each do |input|
  puts input
  input.scan(/[:|]([^|}]+)/).flatten.each do |s|
    puts s.gsub(/(^\s+|\s+$)/, '') # trim
  end
end
Output
${1:aaa|bbbb}
aaa
bbbb
${233:aaa | bbbb | ccc ccccc }
aaa
bbbb
ccc ccccc
${34: aaa | bbbb | cccccccc |d}
aaa
bbbb
cccccccc
d
${343: aaa | bbbb | cccccccc |dddddd ddddddddd}
aaa
bbbb
cccccccc
dddddd ddddddddd
${3443:a aa|bbbb|cccccccc|d}
a aa
bbbb
cccccccc
d
${353:aa a| b b b b | c c c c c c c c | dddddd}
aa a
b b b b
c c c c c c c c
dddddd
Instead of trying to do everything at once, divide and conquer:
DATA.each do |line|
  line =~ /:(.+)\}/
  items = $1.strip.split( /\s* \| \s*/x )
  p items
end
__END__
${1:aaa|bbbb}
${233:aaa | bbbb | ccc ccccc }
${34: aaa | bbbb | cccccccc |d}
${343: aaa | bbbb | cccccccc |dddddd ddddddddd}
${3443:a aa|bbbb|cccccccc|d}
${353:aa a| b b b b | c c c c c c c c | dddddd}
If you want to do it with a single regex, you can use scan, but this seems more difficult to grok:
DATA.each do |line|
  items = line.scan( /[:|] ([^|}]+) /x ).flatten.map { |i| i.strip }
  p items
end