copy the last column to the first position - awk / bash

I want to copy the last column's value to the first position and comment out the old value. For example:
word1 word2 1233425 -----> 1233425 word1 word2 #1233425
word1 word2 word3 49586 -----> 49586 word1 word2 word3 #49586
I don't know how many words precede the number.
I tried an awk script:
awk '{$1="";score=$NF;$NF="";print $score $0 #$score}' file
But it does not work.

What about this? It is pretty similar to yours.
$ awk '{score=$NF; $NF="#"$NF; print score, $0}' file
1233425 word1 word2 #1233425
49586 word1 word2 word3 #49586
Note that in your case you are emptying $1, which is not necessary (and leaves a stray leading space). Just store score as you did and then add # to the beginning of $NF. Your version also fails because, inside the print, # starts an awk comment, so #$score is ignored, and $score is a field reference: it prints the field whose number is the stored score, not the variable itself.

Using awk
awk '{f=$NF;$NF="#" $NF;print f,$0}' file
Since we posted the same answer, here is a shorter variation :)
awk '{$0=$NF FS$0;$NF="#"$NF}1' file
$0=$NF FS $0 prepends the last field to the line.
$NF="#"$NF adds # to the (new) last field.
1 prints the line.
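One caveat worth noting with both variants: assigning to any field (or to $0) makes awk rebuild the record with OFS, so runs of whitespace collapse to single spaces. A quick check:
$ echo 'word1   word2   1233425' | awk '{$0=$NF FS$0;$NF="#"$NF}1'
1233425 word1 word2 #1233425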

A perl way to do it:
perl -pe 's/^(.+) (\d+)/$2 $1 #$2/' infile

sed 's/\(.*\) \([^[:blank:]]\{1,\}\)/\2 \1 #\2/' YourFile
with GNU sed, add the --posix option
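A quick check of this sed command on the sample input: the greedy \(.*\) grabs everything up to the last blank, and \([^[:blank:]]\{1,\}\) captures the final number.
$ echo 'word1 word2 1233425' | sed 's/\(.*\) \([^[:blank:]]\{1,\}\)/\2 \1 #\2/'
1233425 word1 word2 #1233425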

Bash Unix search for a list of words in multiple files

I have a list of words that I need to check in more than one hundred text files. My list of words is in a file named word2search.txt.
This text file contains N words:
Word1
Word2
Word3
Word4
Word5
Word6
Wordn
So far I've written this bash script:
#!/bin/bash
listOfWord2Find=/home/mobaxterm/MyDocuments/word2search.txt
while IFS= read -r listOfWord2Find
do
echo "$listOfWord2Find"
grep -l -R "$listOfWord2Find" /home/mobaxterm/MyDocuments/txt/*.txt
echo "================================================================="
done <"$listOfWord2Find"
The result does not satisfy me; I can hardly make use of it:
Word1
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
/home/mobaxterm/MyDocuments/txt/file2.txt
/home/mobaxterm/MyDocuments/txt/file3.txt
=================================================================
Word2
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
=================================================================
Word3
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file4.txt
/home/mobaxterm/MyDocuments/txt/file5.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
=================================================================
Word4
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
=================================================================
Word5
/home/mobaxterm/MyDocuments/txt/new 6.txt
=================================================================
This is what I want to see:
/home/mobaxterm/MyDocuments/txt/file1.txt : Word1, Word2, Word3, Word4
/home/mobaxterm/MyDocuments/txt/file2.txt : Word1
/home/mobaxterm/MyDocuments/txt/file3.txt : Word1
/home/mobaxterm/MyDocuments/txt/file4.txt : Word3
/home/mobaxterm/MyDocuments/txt/file5.txt : Word3
/home/mobaxterm/MyDocuments/txt/new 6.txt : Word1, Word2, Word3, Word4, Word5, Word6
I do not understand why my script doesn't show me Word6 (there are files that contain it); it stops at Word5. To work around this issue, I've added an extra line, blablabla, at the end of the list (a word I'm sure will not be found).
If you can help me on this subject :)
Thank you.
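A likely explanation for the missing Word6, assuming word2search.txt has no trailing newline: read returns a non-zero status on a final unterminated line, so the loop body never runs for it, which is also why the blablabla sentinel papers over the problem. A common workaround is to test the variable as well (the loop variable is renamed here so it no longer shadows the file path):
while IFS= read -r word || [ -n "$word" ]; do
    grep -l -R "$word" /home/mobaxterm/MyDocuments/txt/*.txt
done <"$listOfWord2Find"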
Another, much more elegant approach: search for all the words in each file, one file at a time.
Use grep's multi-pattern option -f, --file=FILE, and print only the matched words with -o, --only-matching.
Then pipe the matched words through sort, uniq, awk, and sed to massage them into a comma-separated list.
Like this:
script.sh
#!/bin/bash
for currFile in "$@"; do
    # collect matched words, dedupe, and join into a comma-separated list
    matched_words_list=$(grep --only-matching --file="$WORDS_LIST" "$currFile" | sort | uniq | awk -v ORS=', ' 1 | sed 's/, $//')
    printf "%s : %s\n" "$currFile" "$matched_words_list"
done
script.sh output
The words list file is passed in the environment variable WORDS_LIST; the inspected files are passed as the argument list input.*.txt:
export WORDS_LIST=./words.txt; ./script.sh input.*.txt
input.1.txt : word1, word2
input.2.txt : word4
input.3.txt :
Explanation:
using words.txt:
word2
word1
word5
word4
using input.1.txt:
word1
word2
word3
word3
word1
word3
The complete pipeline:
grep --file=words.txt -o input.1.txt |sort|uniq|awk -vORS=, 1|sed s/,$//
word1,word2
output 1
List all matched words from words.txt in inspected file input.1.txt
grep --file=words.txt -o input.1.txt
word1
word2
word1
output 2
List all matched words from words.txt in inspected file input.1.txt
Then sort the output words list
grep --file=words.txt -o input.1.txt|sort
word1
word1
word2
output 3
List all matched words from words.txt in inspected file input.1.txt
Then sort the output words list
Then remove duplicate words
grep --file=words.txt -o input.1.txt|sort|uniq
word1
word2
output 4
List all matched words from words.txt in inspected file input.1.txt
Then sort the output words list
Then remove duplicate words
Then create a csv list from the unique words
grep --file=words.txt -o input.1.txt|sort|uniq|awk -vORS=, 1
word1,word2,
output 5
List all matched words from words.txt in inspected file input.1.txt
Then sort the output words list
Then remove duplicate words
Then create a csv list from the unique words
Then remove the trailing , from the csv list
grep --file=words.txt -o input.1.txt|sort|uniq|awk -vORS=, 1|sed s/,$//
word1,word2
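One caveat with grep -o here: it matches substrings, so word1 in the list would also match inside word12. If that matters, -w restricts matches to whole words; sort -u is also a shorter sort|uniq. A hypothetical variant:
grep -ow --file=words.txt input.1.txt | sort -u | awk -v ORS=', ' 1 | sed 's/, $//'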
The suggested strategy is to scan each line once against all the words.
I suggest writing a gawk script; gawk is the standard awk on Linux, and the ENDFILE rule used below is a gawk extension.
script.awk
FNR == NR {                             # only in the first file, which holds the match words list
    matchWordsArr[++wordsCount] = $0;   # read the match words into an ordered array
    matchedWordInFile[wordsCount] = 0;  # reset the matchedWordInFile array
}
FNR != NR {                             # read a line of an inspected file
    for (i in matchWordsArr) {          # scan the line for every match word
        if ($0 ~ matchWordsArr[i]) matchedWordInFile[i]++;  # if a word matched, increment its counter
    }
}
ENDFILE {                               # on each file's read completion (gawk extension)
    if (FNR != NR) {                    # if not the first file
        outputLine = sprintf("%s: ", FILENAME);  # start outputLine with the current file name
        for (i in matchWordsArr) {      # iterate over the match words
            if (matchedWordInFile[i] == 0) continue;  # skip unmatched words
            outputLine = sprintf("%s%s%s", outputLine, separator, matchWordsArr[i]);  # append matched word
            matchedWordInFile[i] = 0;   # reset the counter for the next file
            separator = ",";            # set the words list separator to ","
        }
        print outputLine;
    }
    outputLine = separator = "";        # reset outputLine and the separator
}
input.1.txt:
word1
word2
word3
input.2.txt:
word3
word4
word5
input.3.txt:
word3
word7
word8
words.txt:
word2
word1
word5
word4
running:
$ awk -f script.awk words.txt input.*.txt
input.1.txt: word2,word1
input.2.txt: word5,word4
input.3.txt:
Just grep:
grep -f list.txt input.*.txt
-f FILENAME lets grep read its search patterns from the given file.
If you want to display the filename along with the match, pass -H in addition to that:
grep -Hf list.txt input.*.txt
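For the sample words.txt and input.*.txt files above (using words.txt as list.txt), this should print:
input.1.txt:word1
input.1.txt:word2
input.2.txt:word4
input.2.txt:word5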

grep: remove exact matches from a line without removing the whole line

I have a patterns.txt file, and I would like to remove all exact matches of the patterns from FILE.txt. FILE.txt is the following:
word1 word2
word3 word4
word5 word6
The pattern file contains:
word1
word6
The expected output is:
word2
word3 word4
word5
The command below removes the whole row where there is an exact match. How can I remove only the exact match from a line, without removing the whole line? I don't want to use for-loops to achieve this.
cat FILE.txt | grep -wvf patterns.txt
With sed:
re=$(tr '\n' '|' < patterns.txt)
sed -r "s/$re//; s/^[[:space:]]*//" file
word2
word3 word4
word5
Note: Make sure patterns.txt does not end with a newline and has no empty lines, since | would end up in each of those positions and make the regex match the empty string.
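As an alternative sketch that sidesteps the trailing-newline caveat: paste joins lines without leaving a trailing delimiter.
re=$(paste -sd'|' patterns.txt)
sed -r "s/$re//; s/^[[:space:]]*//" file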
You may try this awk:
awk 'FNR == NR {pats[$1]; next} {more=0; for (i=1; i<=NF; ++i) if (!($i in pats)) printf "%s", (more++ ? OFS : "") $i; print ""}' patterns.txt file
word2
word3 word4
word5
A more readable version:
awk '
FNR == NR {
pats[$1]
next
}
{
more = 0
for (i=1; i<=NF; ++i)
if (!($i in pats))
printf "%s", (more++ ? OFS : "") $i
print ""
}' patterns.txt file
To split text using a word (string) as the delimiter, you can use awk:
awk -F 'word' '{print $1; print $2}' file.txt
If you want to display only what comes after the delimiter:
awk -F 'word' '{print $2}' file.txt
If you need to change the pattern continuously, you can wrap this in a loop; a sketch follows.
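A minimal sketch of such a loop, assuming the delimiter words are known in advance (the word list here is hypothetical):
for w in word1 word2 word3; do
    awk -F "$w" '{print $2}' file.txt
done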
My first thought was to do just what @anubhava did. Then I thought that perl might be good for this: perl has good capabilities for filtering lists. The problem is that perl doesn't have an FNR variable. But I played around with a2p and came up with this:
perl -lane '
$FNR = $. - $FNRbase;
if ($. == $FNR) {
$ignore{$F[0]} = 1;
} else {
print join " ", grep {not exists $ignore{$_}} @F;
}
} continue {
$FNRbase = $. if eof
' pattern.txt FILE.txt
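For what it's worth, eof alone can stand in for the FNR bookkeeping when there are exactly two input files. A sketch (the flag name $past_patterns is mine):
perl -lane '
    if ($past_patterns) { print join " ", grep { !exists $ignore{$_} } @F }
    else                { $ignore{$F[0]} = 1 }
    $past_patterns = 1 if eof;
' pattern.txt FILE.txt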

How to find lines of a file where the first 2 words differ from the previous and next lines

Consider the following file:
word1 word2 word3
word1 word2 word3
word6 word7 word8
word6 word7 word9
word9 word10 word4
word1 word2 word5
word1 word2 word5
I am looking for a shell command that outputs the lines whose first 2 words differ from both the previous and the next line.
Expected output:
word9 word10 word4
Any idea?
case 1: each line has the same number of words (fields)
uniq can skip leading fields but not trailing ones
rev reverses the characters on a line
Since each line has the same number of fields (one trailing field to ignore), we can reverse each line, let uniq skip the reversed last field, and reverse back:
<file rev | uniq -u -f1 | rev
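Applied to the sample file, this prints the expected line:
$ <file rev | uniq -u -f1 | rev
word9 word10 word4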
case 2: arbitrary number of words on each line
We can write an awk script that keeps track of the current and previous lines (plus a flag for the line before that) and prints the previous line when appropriate:
awk <file '
{
# does current line match previous line?
diff = !( $1==p1 && $2==p2 )
# print stashed line if not duplicate
if (diff && pdiff) print p0
# stash current line data
pdiff=diff; p0=$0; p1=$1; p2=$2
}
END {
# print the final line if appropriate
if (pdiff) print p0
}
'
I guess there is some redundancy here, but it works:
$ awk '{k=$1 FS $2}
k!=p && p!=pp {print p0}
{p0=$0; pp=p; p=k}
END {if(p!=pp) print}' file
word9 word10 word4

how can I remove a pattern from the beginning of lines between two words using sed or awk

I want to remove a pattern at the beginning of each line of a paragraph that contains Word1 on its first line and ends with word2. For example, if I have the following file and I want to substitute --MW with nothing:
--MW Word1 this is paragraph number 1
--MW aaa
--MW bbb
--MW ccc
--MW word2
I want to get as a result:
Word1 this is paragraph number 1
aaa
bbb
ccc
word2
Thanks in advance
Using sed
sed '/Word1/,/word2/s/--MW //' file
Using awk
awk '/Word1/,/word2/{sub(/--MW /,a)}1' file
Both act on the lines between and including the matched phrases and then do a substitution on each of them; they print all lines. (In the awk version, the uninitialized variable a serves as an empty replacement string.)
If you have your text in myfile.txt you could try:
awk '$2=="Word1"{f=1} {w2=($2=="word2"); if (f) {$1=""; sub(/^ /,"")} print; if (w2) f=0}' myfile.txt
If you are sure the pattern will always be at the beginning of the line, this command might help:
sed 's/^--MW //' file.txt
Please test and let us know if this works for you.
Hopefully, this will do it for you:
$ echo "--MW Word1 this is paragraph number 1" | cut -d ' ' -f 2-
Here the text is passed to the cut command, which removes the first token (using space as the separator) while keeping the rest, i.e., from the second token to the end.

Removing line numbers (not entire line) from a file in unix

How can I remove line numbers from a file if the line numbers have been added by 'nl'?
example:
file1:
word1 word2
word3
word4
After the command nl file1 > file2 (this is the exact command used):
file2:
1 word1 word2
2 word3
3 word4
Here is the part the question revolves around: removing the line numbers from file2 and storing the lines in file3 (or, if possible, removing the numbers in file2 while keeping the lines in file2).
file3:
word1 word2
word3
word4
sed 's/ *[0-9]*.//' file2 > file3
(nl left-pads the number with spaces and follows it with a tab; the . in the pattern consumes that tab.)
Yep. As it has been answered before:
you can use awk, with a tab as the field separator, since nl inserts one:
awk -F'\t' '{print $2}' file2 > newfile
you can use cut; tab is its default delimiter:
cut -f2 file2 > newfile
The cut utility will work. In this case there is only one tab-separated field after the number, so cut -f2 is enough; if there were more columns, cut -f2- would preserve all except the first.
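A quick check of the tab-based approach on file1 from the question:
$ nl file1 | cut -f2-
word1 word2
word3
word4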
Something like this should solve it, using the tab that nl inserts as the delimiter:
cut -f2- < file2
This will remove only the first word/number from each line in file2 and put the rest in file3 (the sub() strips the leading space that awk leaves behind when a field is emptied):
awk '{$1 = ""; sub(/^ /, ""); print}' file2 > file3
file3:
word1 word2
word3
word4
Assuming you can't just cp file1 file3....
nl file1 > file2
sed 's/^[[:space:]]*[0-9]*[[:space:]]*//' file2 > file3
