I have a page with content and table like this:
table
tr
tr
tr
tr (needed)
tr
tr
tr
tr
tr
tr
tr
tr (needed)
tr
tr
tr
tr
tr
tr
tr
tr(needed)
...
I want to extract the tr elements in these positions:
4, 12, 20, 28, ...
In other words, I want to take the tr in position 4, then the tr at 4+8, then the tr at 4+8+8, and so on.
Is there any way to get that using XPath?
Here's how you can do this quite concisely and efficiently, and without any fixed limit on the possible number of trs:
//table/tr[(position() mod 8) = 4]
This will select tr number 4, 12, 20, ... 8n + 4, etc.
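The arithmetic can be checked outside an XPath engine. The sketch below is only a stand-in: it applies awk's % operator to plain row numbers rather than evaluating XPath, and select_positions is a hypothetical helper name.

```shell
# Hypothetical helper: print every position p in 1..N with p mod 8 = 4.
# This mirrors the predicate [(position() mod 8) = 4]; it does not run XPath.
select_positions() {
    awk -v n="$1" 'BEGIN { for (i = 1; i <= n; i++) if (i % 8 == 4) print i }'
}

select_positions 30   # prints 4, 12, 20 and 28, one per line
```

To start at a different row or use a different stride, change the 4 or the 8 accordingly.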
I have two strings:
l1='a1 a2 b1 b2 c1 c2'
l2='a1 b3 c1'
And I want to check if each element of string l2 exists in l1, and then remove it from l1.
Is it possible to do that without a for loop?
You can do this:
l1=$(comm -23 <(echo "$l1" | tr ' ' '\n' | sort) <(echo "$l2" | tr ' ' '\n' | sort) | tr '\n' ' ')
comm compares two sorted inputs line by line and outputs three columns: the lines unique to the first input, the lines unique to the second input, and the lines common to both. The -23 option suppresses the second and third columns, so it reports only the lines that are unique to the first input.
Since comm requires its input to be sorted lines, I first pipe each variable to tr to put each word on its own line, and then to sort. <(...) is a common shell extension called process substitution that allows a command to be used where a filename is expected; it's available in bash and zsh, for example.
At the end I use tr again to translate the newlines back to spaces.
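To see what each of comm's three columns contains for the strings in the question, here is a small sketch. It assumes mktemp is available and uses temporary files instead of process substitution so that it also runs in a plain POSIX shell; comm_words is a hypothetical helper name.

```shell
# Hypothetical helper: comm_words FLAGS STRING1 STRING2
# Splits both strings into sorted one-word-per-line files and runs comm on them.
comm_words() {
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    # LC_ALL=C keeps sort and comm using the same collation order.
    printf '%s\n' "$2" | tr ' ' '\n' | LC_ALL=C sort > "$a"
    printf '%s\n' "$3" | tr ' ' '\n' | LC_ALL=C sort > "$b"
    LC_ALL=C comm "$1" "$a" "$b"
    rm -f "$a" "$b"
}

comm_words -23 'a1 a2 b1 b2 c1 c2' 'a1 b3 c1'   # only in the first:  a2 b1 b2 c2
comm_words -13 'a1 a2 b1 b2 c1 c2' 'a1 b3 c1'   # only in the second: b3
comm_words -12 'a1 a2 b1 b2 c1 c2' 'a1 b3 c1'   # common to both:     a1 c1
```

Each call prints one word per line; the comments list the words in order.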
If you don't have process substitution, you can emulate it with named pipes:
mkfifo p1
echo "$l1" | tr ' ' '\n' | sort > p1 &
mkfifo p2
echo "$l2" | tr ' ' '\n' | sort > p2 &
l1=$(comm -23 p1 p2 | tr '\n' ' ')
rm p1 p2
I am trying to use this command to remove all < and > and + characters from a string. But the command tr is inserting a space character in its place. I want to also remove all spaces.
In
<ModelName>.<123456798123465>.
Out
ModelName . 123456798123465 .
Expected Output
ModelName.123456798123465.
Command used
String | tr '<>+,' ' '
What am I missing here ?
If you want to delete specific characters, use tr -d SET. tr SET ' ' means to translate all characters in SET to space characters.
$ echo '<ModelName>.<123456798123465>.' | tr '<>+' ' '
ModelName . 123456798123465 .
$ echo '<ModelName>.<123456798123465>.' | tr -d '<>+'
ModelName.123456798123465.
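The question also asks to remove spaces. A minimal sketch, assuming the space can simply be added to the set passed to tr -d; strip_chars is a hypothetical helper name:

```shell
# Hypothetical helper: delete <, >, + and space characters from the argument.
strip_chars() {
    printf '%s\n' "$1" | tr -d '<>+ '
}

strip_chars '<ModelName> . <123456798123465> .'
# prints: ModelName.123456798123465.
```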
You could use sed instead of tr like below.
$ echo '<ModelName>.<123456798123465>.' | sed 's/[+<>]//g'
ModelName.123456798123465.
The character class [+<>] matches any +, < or > symbol. Replacing the matched characters with an empty string gives the desired output.
What you need is the -d option of the tr command:
$ echo "<ModelName>.<123456798123465>." | tr -d '<>+,'
ModelName.123456798123465.
From the man pages
-d, --delete
delete characters in SET1, do not translate
How can I extract the words that contain the pattern "arum" from the following line:
Agarum anoestrum alabastrum sun antirumor alarum antiserum ambulacrum antistrumatic Anatherum antistrumous androphorum antrum 4. foodstuff foody nonfood Aplectrum
So words like sun, 4., foody, nonfood should be removed.
You can use grep:
echo "Agarum anoestrum sun" | tr ' ' '\n' | grep "arum"
tr is used to split the input string into one word per line, since grep operates on a per-line basis and would otherwise display the whole line.
If you want the output to be in one line again, use:
echo "Agarum anoestrum sun" | tr ' ' '\n' | grep "arum" | tr '\n' ' '
Using grep -Eo:
grep -Eo 'a[[:alnum:]]*rum' file
arum
anoestrum
alabastrum
antirum
alarum
antiserum
ambulacrum
antistrum
atherum
antistrum
androphorum
antrum
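Note that -o prints only the matching fragment (arum rather than Agarum, atherum rather than Anatherum). A hedged variant, assuming the whole word is wanted and that case should be ignored: pad the pattern with [[:alnum:]]* on both sides and add -i. whole_words is a hypothetical helper name.

```shell
# Hypothetical helper: read text on stdin and print every whole word that
# contains an "a" (any case) followed later by "rum", one word per line.
whole_words() {
    grep -Eio '[[:alnum:]]*a[[:alnum:]]*rum[[:alnum:]]*'
}

printf '%s\n' 'Agarum sun antirumor 4. Aplectrum' | whole_words
# prints Agarum, antirumor and Aplectrum, one per line
```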
Try:
echo Agarum anoestrum alabastrum sun antirumor alarum antiserum ambulacrum antistrumatic Anatherum antistrumous androphorum antrum 4. foodstuff foody nonfood Aplectrum | awk '{for (i=1;i<=NF;i++) { if (match($i, "[aA][[:alnum:]]*[rR][uU][mM]") != 0) { printf ("%s ", $i) }} print ""}'
I have two files and I need to print only the words (not complete lines) that are in the first file but not in the second file. I have tried wdiff, but it prints complete lines and is not useful.
Sample of file:
وكان مكنيل وقتها رئيس رابطة مؤرخي أمريكا ـ
كما فهمت - من شاهد الحادثة. ثم يصف كيف قدم
مكنيل الرجلين الخصمين, فكانت له صرامته, إذ
حدد عشرين دقيقة فقط لكل منهما أن يقدم رأيه
وحجته, ثم وقت للرد, ثم يجيبان عن أسئلة قليلة
من القاعة, والمناقشة في وقت محدد.
Make two files that contain each word on its own line, and sort them. Then use comm:
$ cat fileA
ﻮﻛﺎﻧ ﻢﻜﻨﻴﻟ ﻮﻘﺘﻫﺍ ﺮﺌﻴﺳ ﺭﺎﺒﻃﺓ ﻡﺅﺮﺨﻳ ﺄﻣﺮﻴﻛﺍ ـ
ﻚﻣﺍ ﻒﻬﻤﺗ - ﻢﻧ ﺵﺎﻫﺩ ﺎﻠﺣﺍﺪﺛﺓ. ﺚﻣ ﻲﺼﻓ ﻚﻴﻓ ﻕﺪﻣ
$ cat fileB
ﻮﻘﺘﻫﺍ ﺮﺌﻴﺳ ﺭﺎﺒﻃﺓ ﺄﻣﺮﻴﻛﺍ ـ
ﻚﻣﺍ ﻒﻬﻤﺗ - ﻢﻧ ﺵﺎﻫﺩ ﻲﺼﻓ ﻚﻴﻓ ﻕﺪﻣ
$ tr ' ' '\n' < fileA | sort > fileA-sorted
$ tr ' ' '\n' < fileB | sort > fileB-sorted
$ comm -23 fileA-sorted fileB-sorted
ﺎﻠﺣﺍﺪﺛﺓ.
ﺚﻣ
ﻢﻜﻨﻴﻟ
ﻡﺅﺮﺨﻳ
ﻮﻛﺎﻧ
$
This can also be written on a single line in bash:
comm -23 <(tr ' ' '\n' < fileA | sort) <(tr ' ' '\n' < fileB | sort)
This is not an answer, but a comment too long to be a comment. I'm sorry - I don't yet know the etiquette in this case, so please let me know if there's a better way to do this.
I thought both the approaches given in other answers were interesting, but was concerned that the grep version would require m * n comparisons, where m and n are the numbers of words in each file respectively.
I'm running bash on OSX and ran the following smoke test to compare:
Grab two random selections of 10K words from my dictionary:
gsort -R /usr/share/dict/words | head -n 10000 > words1
gsort -R /usr/share/dict/words | head -n 10000 > words2
Compare the running time for each solution:
Using comm:
time comm -23 <(tr ' ' '\n' < words1 | sort) <(tr ' ' '\n' < words2 | sort)
Result:
real 0m0.143s
user 0m0.225s
sys 0m0.018s
Using grep:
time grep -wf <(tr ' ' '\n' < words1) <(tr ' ' '\n' < words2)
Result:
real 1m25.988s
user 1m25.925s
sys 0m0.063s
I'm not sure about memory complexity. I'd be interested in any criticism of this analysis, or commentary on how to evaluate which solution is better?
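You can avoid sorting (especially if the input files are huge) by using grep. To print the words of file1 that are not in file2, use file2's words as fixed-string patterns (-F), match whole lines only (-x), and invert the match (-v):
grep -vxFf <(tr ' ' '\n' < file2) <(tr ' ' '\n' < file1)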
How can I list all words of length 3 without duplicates?
Using tr ' ' '\n' < cca1.txt | grep '^.\{3\}$'
lists all words of length 3,
but when I add sort -u, as in tr ' ' '\n' < cca1.txt | grep '^.\{3\}$' | sort -u
to list the words of length 3 without duplicates,
it lists parts of words rather than whole words of length 3.
Any suggestions?
sort -u can be tricky.
Simply use:
tr ' ' '\n' < cca1.txt | grep '^...$' | sort | uniq
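If sorted output is not required, duplicates can also be dropped without sort at all. A minimal sketch, assuming awk is available; three_letter_words is a hypothetical helper name. Be aware that some awk implementations count bytes rather than characters in length(), which matters for non-ASCII text.

```shell
# Hypothetical helper: read text on stdin and print each distinct word of
# length 3 once, in order of first appearance.
three_letter_words() {
    # tr -s squeezes runs of spaces so no empty lines are produced.
    tr -s ' ' '\n' | awk 'length($0) == 3 && !seen[$0]++'
}

printf '%s\n' 'the cat the dog house a cat' | three_letter_words
# prints the, cat and dog, one per line
```

It would be called on the file from the question as three_letter_words < cca1.txt.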