How can I remove line numbers from a file if the line numbers have been added by 'nl'?
example:
file1:
word1 word2
word3
word4
After command: nl file1 > file2
This is the exact command used.
file2:
1 word1 word2
2 word3
3 word4
Here is the part I am stuck on: removing the line numbers from file2 and storing the lines in file3 (or, if possible, removing the numbers from file2 in place).
file3:
word1 word2
word3
word4
sed 's/ *[0-9]*.//' file2 > file3
Yep. As it was answered here:
you can use awk (note that $2 is only the second whitespace-separated field, so anything after the first word is lost; the field-blanking variant further down handles multi-word lines):
awk '{print $2}' file > newfile
you can use cut (nl puts a tab between the number and the text, and tab is cut's default delimiter):
cut -f2 file > newfile
The cut utility will work. nl separates the number from the text with a tab, which is cut's default delimiter, so you can use just cut -f2; if the text itself contained tabs (i.e., more fields), cut -f2- would preserve all fields except the first.
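As a quick sanity check, a minimal round trip (a sketch assuming nl's defaults: space padding plus a tab separator) looks like this:
printf 'word1 word2\nword3\nword4\n' > file1
nl file1 > file2
cut -f2- file2 > file3   # the tab nl inserts is cut's default delimiter
diff file1 file3         # no output means the numbers were removed cleanly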
Something like this should solve it (cut's default delimiter is the tab that nl inserts after the number):
cut -f2- < file2
This will remove only the first word/number from each line of file2 and put the rest in file3 (the sub() call strips the leading space that blanking $1 leaves behind):
awk '{$1 = ""; sub(/^ /, ""); print $0;}' file2 > file3
file3:
word1 word2
word3
word4
Assuming you can't just cp file1 file3....
nl file1 > file2
sed 's/^[[:space:]]*[0-9]*[[:space:]]*\(.*\)$/\1/' file2 > file3
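Since the question also asks about stripping the numbers from file2 in place, GNU sed's -i flag can do that directly (a sketch; -i is a GNU extension, and BSD/macOS sed wants -i '' instead):
sed -i 's/^[[:space:]]*[0-9]*[[:space:]]*//' file2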
Related
I have a list of words I need to check in more than one hundred text files.
My word-list file is named word2search.txt.
This text file contains N words:
Word1
Word2
Word3
Word4
Word5
Word6
Wordn
So far I've written this bash script:
#!/bin/bash
listOfWord2Find=/home/mobaxterm/MyDocuments/word2search.txt
while IFS= read -r listOfWord2Find
do
echo "$listOfWord2Find"
grep -l -R "$listOfWord2Find" /home/mobaxterm/MyDocuments/txt/*.txt
echo "================================================================="
done <"$listOfWord2Find"
The result does not satisfy me; I can hardly work with this output:
Word1
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
/home/mobaxterm/MyDocuments/txt/file2.txt
/home/mobaxterm/MyDocuments/txt/file3.txt
=================================================================
Word2
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
=================================================================
Word3
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file4.txt
/home/mobaxterm/MyDocuments/txt/file5.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
=================================================================
Word4
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
=================================================================
Word5
/home/mobaxterm/MyDocuments/txt/new 6.txt
=================================================================
This is what I want to see:
/home/mobaxterm/MyDocuments/txt/file1.txt : Word1, Word2, Word3, Word4
/home/mobaxterm/MyDocuments/txt/file2.txt : Word1
/home/mobaxterm/MyDocuments/txt/file3.txt : Word1
/home/mobaxterm/MyDocuments/txt/file4.txt : Word3
/home/mobaxterm/MyDocuments/txt/file5.txt : Word3
/home/mobaxterm/MyDocuments/txt/new 6.txt : Word1, Word2, Word3, Word4, Word5, Word6
I do not understand why my script doesn't show me Word6 (there are files that contain it). It stops at Word5. To work around this, I've added an extra line, blablabla (a word I'm sure won't be found), at the end of the list.
If you can help me on this subject :)
Thank you.
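A side note on the missing Wordn: if word2search.txt has no trailing newline, read -r returns non-zero on the final line even though it fills the variable, so the while loop exits before processing it. A loop of the following shape handles that last unterminated line (a sketch; using a separate loop variable also avoids clobbering $listOfWord2Find inside the loop):
while IFS= read -r word || [ -n "$word" ]; do
    grep -l -R "$word" /home/mobaxterm/MyDocuments/txt/*.txt
done < "$listOfWord2Find"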
Another, more elegant, approach is to search for all the words in each file, one file at a time.
Use grep's multi-pattern option -f, --file=FILE, and print the matched words with -o, --only-matching.
Then pipe the result through a short pipeline that massages the words into a CSV list.
Like this:
script.sh
#!/bin/bash
# quote "$@" and the variables so file names with spaces (like "new 6.txt") survive
for currFile in "$@"; do
    matched_words_list=$(grep --only-matching --file="$WORDS_LIST" "$currFile" | sort | uniq | awk -v ORS=', ' 1 | sed "s/, $//")
    printf "%s : %s\n" "$currFile" "$matched_words_list"
done
script.sh output
Pass the word-list file in the environment variable WORDS_LIST, and pass the files to inspect as the argument list input.*.txt:
export WORDS_LIST=./words.txt; ./script.sh input.*.txt
input.1.txt : word1, word2
input.2.txt : word4
input.3.txt :
Explanation:
using words.txt:
word2
word1
word5
word4
using input.1.txt:
word1
word2
word3
word3
word1
word3
And massage the grep output through this pipeline:
grep --file=words.txt -o input.1.txt |sort|uniq|awk -vORS=, 1|sed s/,$//
word1,word2
output 1
List all matched words from words.txt in the inspected file input.1.txt
grep --file=words.txt -o input.1.txt
word1
word2
word1
output 2
List all matched words from words.txt in the inspected file input.1.txt
Then sort the output word list
grep --file=words.txt -o input.1.txt|sort
word1
word1
word2
output 3
List all matched words from words.txt in the inspected file input.1.txt
Then sort the output word list
Then remove duplicate words
grep --file=words.txt -o input.1.txt|sort|uniq
word1
word2
output 4
List all matched words from words.txt in the inspected file input.1.txt
Then sort the output word list
Then remove duplicate words
Then create a CSV list from the unique words
grep --file=words.txt -o input.1.txt|sort|uniq|awk -vORS=, 1
word1,word2,
output 5
List all matched words from words.txt in the inspected file input.1.txt
Then sort the output word list
Then remove duplicate words
Then create a CSV list from the unique words
Then remove the trailing , from the CSV list
grep --file=words.txt -o input.1.txt|sort|uniq|awk -vORS=, 1|sed s/,$//
word1,word2
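As an aside, sort | uniq can be shortened to sort -u, and paste can build the CSV list in one step instead of the awk/sed pair, since its -s mode never leaves a trailing delimiter (a sketch with the same sample files):
grep --file=words.txt -o input.1.txt | sort -u | paste -sd, -
word1,word2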
The suggested strategy is to scan each line once against all the words.
I suggest writing a gawk script (gawk is the standard awk on Linux):
script.awk
FNR == NR {                            # only in the first file, the match-words list
    matchWordsArr[++wordsCount] = $0;  # read the match words into an ordered array
    matchedWordInFile[wordsCount] = 0; # reset the matchedWordInFile array
}
FNR != NR {                            # a line in an inspected file
    for (i = 1; i <= wordsCount; i++) {    # scan the line for every match word, in list order
        if ($0 ~ matchWordsArr[i]) matchedWordInFile[i]++;  # count each matched word
    }
}
ENDFILE {                              # on each file-read completion (a gawk extension)
    if (FNR != NR) {                   # skip the match-words file itself
        outputLine = sprintf("%s: ", FILENAME);  # start the output line with the current file name
        for (i = 1; i <= wordsCount; i++) {      # iterate over the match words
            if (matchedWordInFile[i] == 0) continue;  # skip unmatched words
            outputLine = sprintf("%s%s%s", outputLine, separator, matchWordsArr[i]);  # append the matched word
            matchedWordInFile[i] = 0;  # reset the per-file match count
            separator = ",";           # set the word-list separator ","
        }
        print outputLine;
    }
    outputLine = separator = "";       # reset the separator and outputLine
}
input.1.txt:
word1
word2
word3
input.2.txt:
word3
word4
word5
input.3.txt:
word3
word7
word8
words.txt:
word2
word1
word5
word4
running:
$ awk -f script.awk words.txt input.*.txt
input.1.txt: word2,word1
input.2.txt: word5,word4
input.3.txt:
Just grep:
grep -f list.txt input.*.txt
-f FILENAME allows you to give grep a file of patterns to search for.
If you want to display the filename along with the match, pass -H in addition to that:
grep -Hf list.txt input.*.txt
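If you want the one-line-per-file summary from the question rather than raw matching lines, one hedged sketch builds it from grep -o plus -H (it assumes file names contain no colons, since the colon is used as the field separator; the order of the output files is unspecified):
grep -oHf list.txt input.*.txt | sort -u |
  awk -F: '{a[$1] = (($1 in a) ? a[$1] ", " : "") $2} END {for (f in a) print f " : " a[f]}'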
I have a patterns.txt file and I would like to remove all exact matches of the patterns from FILE.txt. FILE.txt contains the following:
word1 word2
word3 word4
word5 word6
The patterns file contains:
word1
word6
The expected output is:
word2
word3 word4
word5
The command below removes the whole row where there is an exact match. How can I only remove the exact match from a line without removing the whole line? I don't want to use for-loops to achieve this.
cat FILE.txt | grep -wvf patterns.txt
With sed:
re=$(tr '\n' '|' < patterns.txt)
sed -r "s/($re)//g; s/^[[:space:]]*//; s/[[:space:]]*$//" file
word2
word3 word4
word5
Note: Make sure patterns.txt does not have a trailing new line or extra new lines since | will end up in each of those positions.
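A slightly more defensive way to build the alternation (a sketch) filters out empty lines and lets paste join with |, which never appends a trailing delimiter:
re=$(grep -v '^$' patterns.txt | paste -sd'|' -)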
You may try this awk:
awk 'FNR == NR {pats[$1]; next} {more=0; for (i=1; i<=NF; ++i) if (!($i in pats)) printf "%s", (more++ ? OFS : "") $i; print ""}' patterns.txt file
word2
word3 word4
word5
A more readable version:
awk '
FNR == NR {
pats[$1]
next
}
{
more = 0
for (i=1; i<=NF; ++i)
if (!($i in pats))
printf "%s", (more++ ? OFS : "") $i
print ""
}' patterns.txt file
In order to split text using a word (string) as the delimiter, you can use awk:
awk -F 'word' '{print $1; print $2}' file.txt
If you only want to display what comes after the delimiter, it would be:
awk -F 'word' '{print $2}' file.txt
If the delimiter word needs to change continuously, you would have to wrap this in a loop, as sketched below.
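For example (a sketch; delims.txt is a hypothetical file holding one delimiter word per line):
while IFS= read -r w; do
    awk -F "$w" '{print $2}' file.txt
done < delims.txt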
My first thought was to do just what @anubhava did. Then I thought that perl might be good for this: perl has good capabilities for filtering lists. The problem is that perl doesn't have an FNR variable, but I played around with a2p and came up with this:
perl -lane '
$FNR = $. - $FNRbase;
if ($. == $FNR) {
$ignore{$F[0]} = 1;
} else {
print join " ", grep {not exists $ignore{$_}} @F;
}
} continue {
$FNRbase = $. if eof
' pattern.txt FILE.txt
I have one file that is a list of numbers, and another file (with the same number of lines) in which I need the length of each line to match the number on the corresponding line of the first file. For example:
file 1:
5
8
7
11
15
file 2:
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
output:
abcde
abcdefgh
abcdefg
abcdefghijk
abcdefghijklmno
I've tried using awk and cut together but I keep getting the error "fatal: attempt to use array `line' in a scalar context". I'm not sure how else to go about this. Any guidance is much appreciated!
awk is probably more appropriate, but you can also do:
while IFS= read -r line <&3; do
    IFS= read -r len <&4; echo "${line:0:$len}";
done 3< file2 4< file1
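(The 3< and 4< redirections open file2 and file1 on separate file descriptors, so each loop iteration reads one line from each file in lockstep.)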
awk is your tool for this; use one of:
# read all the lengths, then process file2
awk 'NR == FNR {len[NR] = $1; next} {print substr($0, 1, len[FNR])}' file1 file2
# fetch a line from file1 whilst processing file2
awk '{getline len < lenfile; print substr($0, 1, len)}' lenfile=file1 file2
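A quick run of the first one-liner against the sample files from the question:
awk 'NR == FNR {len[NR] = $1; next} {print substr($0, 1, len[FNR])}' file1 file2
abcde
abcdefgh
abcdefg
abcdefghijk
abcdefghijklmno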
another awk
$ paste file1 file2 | awk '{print substr($2,1,$1)}'
abcde
abcdefgh
abcdefg
abcdefghijk
abcdefghijklmno
Using Perl
perl -lne ' BEGIN { open($f,"file1.txt"); @x=<$f>; close($f) }
print substr($_,0,$x[$.-1]) ' file2.txt
with the given inputs
$ cat cmswen1.txt
5
8
7
11
15
$ cat cmswen2.txt
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
$ perl -lne ' BEGIN { open($f,"cmswen1.txt"); @x=<$f>; close($f) } print substr($_,0,$x[$.-1]) ' cmswen2.txt
abcde
abcdefgh
abcdefg
abcdefghijk
abcdefghijklmno
$
I want to remove a pattern at the beginning of each line of a paragraph that starts with a line containing Word1 and ends with a line containing word2. For example, given the following file, I want to substitute --MW with nothing:
--MW Word1 this is paragraph number 1
--MW aaa
--MW bbb
--MW ccc
--MW word2
I want to get this result:
Word1 this is paragraph number 1
aaa
bbb
ccc
word2
Thanks in advance
Using sed
sed '/Word1/,/word2/s/--MW //' file
Using awk
awk '/Word1/,/word2/{sub(/--MW /,"")}1' file
Both act on the lines between and including the matched phrases and then do a substitution on each of those lines. They print all lines.
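A quick check, assuming the sample is saved as myfile.txt:
sed '/Word1/,/word2/s/--MW //' myfile.txt
Word1 this is paragraph number 1
aaa
bbb
ccc
word2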
If you have your text in myfile.txt you could try:
awk 'BEGIN{f=0} $2=="Word1"{f=1} {if (f==1) {$1=""; sub(/^ /,""); print $0} else {print $0}} $2=="word2"{f=0}' myfile.txt
If you are sure the pattern is going to be at the beginning of the line, then this command might help:
sed 's/^--MW //' file.txt
Please test and let us know if this works for you.
Hopefully, this will do it for you:
$ echo "--MW Word1 this is paragraph number 1" | cut -d ' ' -f 2-
Here the text is piped to the cut command, which drops the first token, using space as the separator, and keeps the rest of the tokens, i.e., from the second to the end.
I want to copy the last column's value to the first position and comment out the old value.
For example:
word1 word2 1233425 -----> 1233425 word1 word2 #1233425
word1 word2 word3 49586 -----> 49586 word1 word2 word3 #49586
I don't know the number of words preceding the number.
I tried with an awk script:
awk '{$1="";score=$NF;$NF="";print $score $0 #$score}' file
But it does not work.
What about this? It is pretty similar to yours.
$ awk '{score=$NF; $NF="#"$NF; print score, $0}' file
1233425 word1 word2 #1233425
49586 word1 word2 word3 #49586
Note that in your case you are emptying $1, which is not necessary. Just store score as you did and then add # to the beginning of $NF.
Using awk
awk '{f=$NF;$NF="#" $NF;print f,$0}' file
Since we posted the same answer, here is a shorter variation :)
awk '{$0=$NF FS$0;$NF="#"$NF}1' file
$0=$NF FS $0 prepends the last field to the line
$NF="#"$NF prefixes the last field with #
1 prints the line
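Run against the sample file, both variants print the same result:
awk '{$0=$NF FS $0; $NF="#"$NF}1' file
1233425 word1 word2 #1233425
49586 word1 word2 word3 #49586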
A perl way to do it:
perl -pe 's/^(.+) (\d+)$/$2 $1 #$2/' infile
sed 's/\(.*\) \([^[:blank:]]\{1,\}\)/\2 \1 #\2/' YourFile
with GNU sed, add the --posix option