How can I remove line numbers from a file if the line numbers have been added by 'nl'?
example:
file1:
word1 word2
word3
word4
After command: nl file1 > file2
This is the exact command used.
file2:
1 word1 word2
2 word3
3 word4
Here is the part I am stuck on: removing the line numbers from file2 and storing the lines in file3 (or, if possible, removing the numbers from file2 in place).
file3:
word1 word2
word3
word4
sed 's/ *[0-9]*.//' file2 > file3
Yep. As it was answered here:
you can use awk (note that $2 is only the second whitespace-separated field, so anything after the first word is lost; the field-blanking variant further down handles multi-word lines):
awk '{print $2}' file > newfile
you can use cut (nl puts a tab between the number and the text, and tab is cut's default delimiter):
cut -f2 file > newfile
The cut utility will work. nl separates the number from the text with a tab, which is cut's default delimiter, so you can use just cut -f2; if the text itself contained tabs (i.e., more fields), cut -f2- would preserve all fields except the first.
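As a quick sanity check, a minimal round trip (a sketch assuming nl's defaults: space padding plus a tab separator) looks like this:
printf 'word1 word2\nword3\nword4\n' > file1
nl file1 > file2
cut -f2- file2 > file3   # the tab nl inserts is cut's default delimiter
diff file1 file3         # no output means the numbers were removed cleanly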
Something like this should solve it (cut's default delimiter is the tab that nl inserts after the number):
cut -f2- < file2
This will remove only the first word/number from each line of file2 and put the rest in file3 (the sub() call strips the leading space that blanking $1 leaves behind):
awk '{$1 = ""; sub(/^ /, ""); print $0;}' file2 > file3
file3:
word1 word2
word3
word4
Assuming you can't just cp file1 file3....
nl file1 > file2
sed 's/^[[:space:]]*[0-9]*[[:space:]]*\(.*\)$/\1/' file2 > file3
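Since the question also asks about stripping the numbers from file2 in place, GNU sed's -i flag can do that directly (a sketch; -i is a GNU extension, and BSD/macOS sed wants -i '' instead):
sed -i 's/^[[:space:]]*[0-9]*[[:space:]]*//' file2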
Related
I have a list of words I need to check in more than one hundred text files.
My word-list file is named word2search.txt.
This text file contains N words:
Word1
Word2
Word3
Word4
Word5
Word6
Wordn
So far I've written this bash script:
#!/bin/bash
listOfWord2Find=/home/mobaxterm/MyDocuments/word2search.txt
while IFS= read -r listOfWord2Find
do
echo "$listOfWord2Find"
grep -l -R "$listOfWord2Find" /home/mobaxterm/MyDocuments/txt/*.txt
echo "================================================================="
done <"$listOfWord2Find"
The result does not satisfy me; I can hardly work with this output:
Word1
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
/home/mobaxterm/MyDocuments/txt/file2.txt
/home/mobaxterm/MyDocuments/txt/file3.txt
=================================================================
Word2
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
=================================================================
Word3
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file4.txt
/home/mobaxterm/MyDocuments/txt/file5.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
=================================================================
Word4
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
=================================================================
Word5
/home/mobaxterm/MyDocuments/txt/new 6.txt
=================================================================
This is what I want to see:
/home/mobaxterm/MyDocuments/txt/file1.txt : Word1, Word2, Word3, Word4
/home/mobaxterm/MyDocuments/txt/file2.txt : Word1
/home/mobaxterm/MyDocuments/txt/file3.txt : Word1
/home/mobaxterm/MyDocuments/txt/file4.txt : Word3
/home/mobaxterm/MyDocuments/txt/file5.txt : Word3
/home/mobaxterm/MyDocuments/txt/new 6.txt : Word1, Word2, Word3, Word4, Word5, Word6
I do not understand why my script doesn't show me Word6 (there are files that contain it). It stops at Word5. To work around this, I've added an extra line, blablabla (a word I'm sure won't be found), at the end of the list.
If you can help me on this subject :)
Thank you.
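A side note on the missing Wordn: if word2search.txt has no trailing newline, read -r returns non-zero on the final line even though it fills the variable, so the while loop exits before processing it. A loop of the following shape handles that last unterminated line (a sketch; using a separate loop variable also avoids clobbering $listOfWord2Find inside the loop):
while IFS= read -r word || [ -n "$word" ]; do
    grep -l -R "$word" /home/mobaxterm/MyDocuments/txt/*.txt
done < "$listOfWord2Find"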
Another, more elegant, approach is to search for all the words in each file, one file at a time.
Use grep's multi-pattern option -f, --file=FILE, and print the matched words with -o, --only-matching.
Then pipe the result through a short pipeline that massages the words into a CSV list.
Like this:
script.sh
#!/bin/bash
# quote "$@" and the variables so file names with spaces (like "new 6.txt") survive
for currFile in "$@"; do
    matched_words_list=$(grep --only-matching --file="$WORDS_LIST" "$currFile" | sort | uniq | awk -v ORS=', ' 1 | sed "s/, $//")
    printf "%s : %s\n" "$currFile" "$matched_words_list"
done
script.sh output
Pass the word-list file in the environment variable WORDS_LIST, and pass the files to inspect as the argument list input.*.txt:
export WORDS_LIST=./words.txt; ./script.sh input.*.txt
input.1.txt : word1, word2
input.2.txt : word4
input.3.txt :
Explanation:
using words.txt:
word2
word1
word5
word4
using input.1.txt:
word1
word2
word3
word3
word1
word3
And massage the grep output through this pipeline:
grep --file=words.txt -o input.1.txt |sort|uniq|awk -vORS=, 1|sed s/,$//
word1,word2
output 1
List all matched words from words.txt in the inspected file input.1.txt
grep --file=words.txt -o input.1.txt
word1
word2
word1
output 2
List all matched words from words.txt in the inspected file input.1.txt
Then sort the output word list
grep --file=words.txt -o input.1.txt|sort
word1
word1
word2
output 3
List all matched words from words.txt in the inspected file input.1.txt
Then sort the output word list
Then remove duplicate words
grep --file=words.txt -o input.1.txt|sort|uniq
word1
word2
output 4
List all matched words from words.txt in the inspected file input.1.txt
Then sort the output word list
Then remove duplicate words
Then create a CSV list from the unique words
grep --file=words.txt -o input.1.txt|sort|uniq|awk -vORS=, 1
word1,word2,
output 5
List all matched words from words.txt in the inspected file input.1.txt
Then sort the output word list
Then remove duplicate words
Then create a CSV list from the unique words
Then remove the trailing , from the CSV list
grep --file=words.txt -o input.1.txt|sort|uniq|awk -vORS=, 1|sed s/,$//
word1,word2
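As an aside, sort | uniq can be shortened to sort -u, and paste can build the CSV list in one step instead of the awk/sed pair, since its -s mode never leaves a trailing delimiter (a sketch with the same sample files):
grep --file=words.txt -o input.1.txt | sort -u | paste -sd, -
word1,word2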
The suggested strategy is to scan each line once against all the words.
I suggest writing a gawk script (gawk is the standard awk on Linux):
script.awk
FNR == NR {                            # only in the first file, the match-words list
    matchWordsArr[++wordsCount] = $0;  # read the match words into an ordered array
    matchedWordInFile[wordsCount] = 0; # reset the matchedWordInFile array
}
FNR != NR {                            # a line in an inspected file
    for (i = 1; i <= wordsCount; i++) {    # scan the line for every match word, in list order
        if ($0 ~ matchWordsArr[i]) matchedWordInFile[i]++;  # count each matched word
    }
}
ENDFILE {                              # on each file-read completion (a gawk extension)
    if (FNR != NR) {                   # skip the match-words file itself
        outputLine = sprintf("%s: ", FILENAME);  # start the output line with the current file name
        for (i = 1; i <= wordsCount; i++) {      # iterate over the match words
            if (matchedWordInFile[i] == 0) continue;  # skip unmatched words
            outputLine = sprintf("%s%s%s", outputLine, separator, matchWordsArr[i]);  # append the matched word
            matchedWordInFile[i] = 0;  # reset the per-file match count
            separator = ",";           # set the word-list separator ","
        }
        print outputLine;
    }
    outputLine = separator = "";       # reset the separator and outputLine
}
input.1.txt:
word1
word2
word3
input.2.txt:
word3
word4
word5
input.3.txt:
word3
word7
word8
words.txt:
word2
word1
word5
word4
running:
$ awk -f script.awk words.txt input.*.txt
input.1.txt: word2,word1
input.2.txt: word5,word4
input.3.txt:
Just grep:
grep -f list.txt input.*.txt
-f FILENAME allows you to give grep a file of patterns to search for.
If you want to display the filename along with the match, pass -H in addition to that:
grep -Hf list.txt input.*.txt
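If you want the one-line-per-file summary from the question rather than raw matching lines, one hedged sketch builds it from grep -o plus -H (it assumes file names contain no colons, since the colon is used as the field separator; the order of the output files is unspecified):
grep -oHf list.txt input.*.txt | sort -u |
  awk -F: '{a[$1] = (($1 in a) ? a[$1] ", " : "") $2} END {for (f in a) print f " : " a[f]}'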
I have a patterns.txt file and I would like to remove all exact matches of the patterns from FILE.txt. FILE.txt contains the following:
word1 word2
word3 word4
word5 word6
The patterns file contains:
word1
word6
The expected output is:
word2
word3 word4
word5
The command below removes the whole row where there is an exact match. How can I only remove the exact match from a line without removing the whole line? I don't want to use for-loops to achieve this.
cat FILE.txt | grep -wvf patterns.txt
With sed:
re=$(tr '\n' '|' < patterns.txt)
sed -r "s/($re)//g; s/^[[:space:]]*//; s/[[:space:]]*$//" file
word2
word3 word4
word5
Note: Make sure patterns.txt does not have a trailing new line or extra new lines since | will end up in each of those positions.
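A slightly more defensive way to build the alternation (a sketch) filters out empty lines and lets paste join with |, which never appends a trailing delimiter:
re=$(grep -v '^$' patterns.txt | paste -sd'|' -)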
You may try this awk:
awk 'FNR == NR {pats[$1]; next} {more=0; for (i=1; i<=NF; ++i) if (!($i in pats)) printf "%s", (more++ ? OFS : "") $i; print ""}' patterns.txt file
word2
word3 word4
word5
A more readable version:
awk '
FNR == NR {
pats[$1]
next
}
{
more = 0
for (i=1; i<=NF; ++i)
if (!($i in pats))
printf "%s", (more++ ? OFS : "") $i
print ""
}' patterns.txt file
In order to split text using a word (string) as the delimiter, you can use awk:
awk -F 'word' '{print $1; print $2}' file.txt
If you only want to display what comes after the delimiter, it would be:
awk -F 'word' '{print $2}' file.txt
If the delimiter word needs to change continuously, you would have to wrap this in a loop, as sketched below.
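For example (a sketch; delims.txt is a hypothetical file holding one delimiter word per line):
while IFS= read -r w; do
    awk -F "$w" '{print $2}' file.txt
done < delims.txt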
My first thought was to do just what @anubhava did. Then I thought that perl might be good for this: perl has good capabilities for filtering lists. The problem is that perl doesn't have an FNR variable, but I played around with a2p and came up with this:
perl -lane '
$FNR = $. - $FNRbase;
if ($. == $FNR) {
$ignore{$F[0]} = 1;
} else {
print join " ", grep {not exists $ignore{$_}} @F;
}
} continue {
$FNRbase = $. if eof
' pattern.txt FILE.txt
I have one file that is a list of numbers, and another file (with the same number of lines) in which I need the length of each line to match the number on the corresponding line of the first file. For example:
file 1:
5
8
7
11
15
file 2:
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
output:
abcde
abcdefgh
abcdefg
abcdefghijk
abcdefghijklmno
I've tried using awk and cut together but I keep getting the error "fatal: attempt to use array `line' in a scalar context". I'm not sure how else to go about this. Any guidance is much appreciated!
awk is probably more appropriate, but you can also do:
while IFS= read -r line <&3; do
    IFS= read -r len <&4; echo "${line:0:$len}";
done 3< file2 4< file1
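(The 3< and 4< redirections open file2 and file1 on separate file descriptors, so each loop iteration reads one line from each file in lockstep.)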
awk is your tool for this; use one of:
# read all the lengths, then process file2
awk 'NR == FNR {len[NR] = $1; next} {print substr($0, 1, len[FNR])}' file1 file2
# fetch a line from file1 whilst processing file2
awk '{getline len < lenfile; print substr($0, 1, len)}' lenfile=file1 file2
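A quick run of the first one-liner against the sample files from the question:
awk 'NR == FNR {len[NR] = $1; next} {print substr($0, 1, len[FNR])}' file1 file2
abcde
abcdefgh
abcdefg
abcdefghijk
abcdefghijklmno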
another awk
$ paste file1 file2 | awk '{print substr($2,1,$1)}'
abcde
abcdefgh
abcdefg
abcdefghijk
abcdefghijklmno
Using Perl
perl -lne ' BEGIN { open($f,"file1.txt"); @x=<$f>; close($f) }
print substr($_,0,$x[$.-1]) ' file2.txt
with the given inputs
$ cat cmswen1.txt
5
8
7
11
15
$ cat cmswen2.txt
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
$ perl -lne ' BEGIN { open($f,"cmswen1.txt"); @x=<$f>; close($f) } print substr($_,0,$x[$.-1]) ' cmswen2.txt
abcde
abcdefgh
abcdefg
abcdefghijk
abcdefghijklmno
$
I want to remove a pattern at the beginning of each line of a paragraph that starts with a line containing Word1 and ends with a line containing word2. For example, given the following file, I want to substitute --MW with nothing:
--MW Word1 this is paragraph number 1
--MW aaa
--MW bbb
--MW ccc
--MW word2
I want to get this result:
Word1 this is paragraph number 1
aaa
bbb
ccc
word2
Thanks in advance
Using sed
sed '/Word1/,/word2/s/--MW //' file
Using awk
awk '/Word1/,/word2/{sub(/--MW /,"")}1' file
Both act on the lines between and including the matched phrases and then do a substitution on each of those lines. They print all lines.
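A quick check, assuming the sample is saved as myfile.txt:
sed '/Word1/,/word2/s/--MW //' myfile.txt
Word1 this is paragraph number 1
aaa
bbb
ccc
word2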
If you have your text in myfile.txt you could try:
awk 'BEGIN{f=0} $2=="Word1"{f=1} {if (f==1) {$1=""; sub(/^ /,""); print $0} else {print $0}} $2=="word2"{f=0}' myfile.txt
If you are sure the pattern is going to be at the beginning of the line, then this command might help:
sed 's/^--MW //' file.txt
Please test and let us know if this works for you.
Hopefully, this will do it for you:
$ echo "--MW Word1 this is paragraph number 1" | cut -d ' ' -f 2-
Here the text is piped to the cut command, which drops the first token, using space as the separator, and keeps the rest of the tokens, i.e., from the second to the end.
I want to copy the last column's value to the first position and comment out the old value.
For example:
word1 word2 1233425 -----> 1233425 word1 word2 #1233425
word1 word2 word3 49586 -----> 49586 word1 word2 word3 #49586
I don't know the number of words preceding the number.
I tried with an awk script:
awk '{$1="";score=$NF;$NF="";print $score $0 #$score}' file
But it does not work.
What about this? It is pretty similar to yours.
$ awk '{score=$NF; $NF="#"$NF; print score, $0}' file
1233425 word1 word2 #1233425
49586 word1 word2 word3 #49586
Note that in your case you are emptying $1, which is not necessary. Just store score as you did and then add # to the beginning of $NF.
Using awk
awk '{f=$NF;$NF="#" $NF;print f,$0}' file
Since we posted the same answer, here is a shorter variation :)
awk '{$0=$NF FS$0;$NF="#"$NF}1' file
$0=$NF FS $0 prepends the last field to the line
$NF="#"$NF prefixes the last field with #
1 prints the line
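Run against the sample file, both variants print the same result:
awk '{$0=$NF FS $0; $NF="#"$NF}1' file
1233425 word1 word2 #1233425
49586 word1 word2 word3 #49586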
A perl way to do it:
perl -pe 's/^(.+) (\d+)$/$2 $1 #$2/' infile
sed 's/\(.*\) \([^[:blank:]]\{1,\}\)/\2 \1 #\2/' YourFile
with GNU sed, add the --posix option