Bash - omit lines starting with a mis-spelled word (using hunspell) - bash

I have a file words.txt in which each line is a word, followed by a TAB, followed by an integer (which represents the word's frequency). I want to generate a new file containing only those lines where the word is spelled correctly.
Using cat words.txt | hunspell -1 -G > ok_words.txt I can get a list of correct words, but how can I also include the remainder of each line (i.e. the TAB and the number)?
Input:
adwy 27
bird 10
cat 12
dog 42
erfgq 9
fish 2
Desired Output:
bird 10
cat 12
dog 42
fish 2

The easiest way would be to use the join command:
$ join words.txt ok_words.txt
bird 10
cat 12
dog 42
fish 2
or to preserve tabs:
$ join -t $'\t' words.txt ok_words.txt
bird 10
cat 12
dog 42
fish 2
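To check the recipe end to end without hunspell, here is a toy sketch in which ok_words.txt stands in for hunspell's output. Note that join expects both inputs to be sorted on the join field, so run unsorted files through sort first:

```shell
#!/usr/bin/env bash
# Toy data: words.txt holds word<TAB>count; ok_words.txt is the
# (sorted) list of correctly spelled words, as hunspell -1 -G would emit.
printf 'adwy\t27\nbird\t10\ncat\t12\ndog\t42\n' > words.txt
printf 'bird\ncat\ndog\n' > ok_words.txt
join -t $'\t' words.txt ok_words.txt
```

This prints the bird, cat, and dog lines together with their tab-separated counts, dropping the misspelled adwy line.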

Related

Multiple lines added to vim line by line

Can you please help me add multiple lines of text to a file via a bash script, through vim?
I tried this:
vim -c "3 s/^/
add-this-line1
add-this-line2
add-this-line3/" -c "wq" /var/www/html/webserver/output_file.txt
But, the output of the file looks like this:
3 add-this-line1 add-this-line2 add-this-line3
What I want to do is add the lines one by one, starting FROM line 3 of the output_file.txt, not all crammed together at line 3.
This is more of a job for ed, IMO
seq 10 > file
ed file <<END_ED
3a
first
second
third
.
wq
END_ED
For those new to ed, the line with the dot signals the end of "insert mode".
file now contains:
1
2
3
first
second
third
4
5
6
7
8
9
10
If you really want to do it via vim, I believe you need to insert newlines in your substitution:
vim -c "3 s/^/add-this-line1\radd-this-line2\radd-this-line3\r/" -c "wq" /var/www/html/webserver/output_file.txt
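An alternative without ed or vim: sed's r command reads a file in after a given address, so the same insertion after line 3 can be sketched like this (insert.txt is an illustrative name, not from the question):

```shell
#!/usr/bin/env bash
# Reproduce the ed `3a` example with sed: read insert.txt in after line 3.
seq 10 > file
printf '%s\n' first second third > insert.txt
sed '3r insert.txt' file
```

As with the ed version, redirect the output (or use GNU sed's -i) to write the result back to a file.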
With ex or ed if available/acceptable.
printf '%s\n' '3a' 'foo' 'bar' 'baz' 'more' . 'w output_file.txt' | ex -s input_file.txt
Replace ex with ed and it should be the same output.
Using a bash array to store the data that needs to be inserted.
to_be_inserted=(foo bar baz more)
printf '%s\n' '3a' "${to_be_inserted[@]}" . 'w output_file.txt' | ex -s input_file.txt
Again change ex to ed should do the same.
If the input file needs to be edited in place, remove the output_file.txt and just leave the w.
Though it seems you want to insert at the beginning of each line, starting from line 3, as in your 3 s/^/ attempt.
Given the file.txt that was created by running
printf '%s\n' {1..10} > file.txt
A bit of shell scripting would do the trick.
#!/usr/bin/env bash
start=3
to_be_inserted=(
foo
bar
baz
more
)
for i in "${to_be_inserted[@]}"; do
printf -v output '%ds/^/%s/' "$start" "$i"
ed_array+=("$output")
((start++))
done
printf '%s\n' "${ed_array[@]}" ,p Q | ed -s file.txt
Output
1
2
foo3
bar4
baz5
more6
7
8
9
10
Change Q to w if in-place editing is needed.
Remove the ,p if you don't want to see the output.

how to use sed to replace the specific line/lines in a file with the contents from another file

I want to replace several lines in one of my files with the corresponding lines from another file, which is located in another folder, using the sed command.
For example: file1.txt is in /storage/file folder, and it looks like this:
'ABC'
'EFG' 001
HJK
file2.txt is located in the /storage folder, and it looks like this:
'kkk' 123456789
yyy
so I want to use the content of file2.txt to replace the 2nd and 3rd lines of file1.txt, and file1.txt should then look like this:
'ABC'
'kkk' 123456789
yyy
I should probably make my question clearer. I'm trying to write a shell script that replaces several lines of a file (let's call it old.txt) with new contents supplied in other files (which contain only the content to be written into the old file; for example, dataA.txt, dataB.txt, ...).
Let's say, I want to replace the 3rd line of old.txt which is:
'TIME_STEPS' 'TIME CYCLE' 'ELAPSED' 100 77760 0 1.e+99 1. 9999 1. 1.e-20 1.e+99
with the new data that I supplied in dataA.txt which is:
'TIME_STEPS' 'TIME CYCLE' 'ELAPSED' 500 8520 0 1.e+99 1. 9999 1. 1.e-20 1.e+99
and to replace the 15th to 18th lines of the old.txt file which looks like:
100 0 1
101 1 2
102 2 1.5
103 4 52
with the supplied dataB.txt file, which looks like this (also containing 4 lines):
-100
-101
-102
-103
As I'm totally new to shell script programming and have only used sed before, I tried the following:
To change the 3rd line, I did sed -i '3c r ../../dataA.txt' old.txt, where r ../../dataA.txt gives the location of dataA.txt. However, c needs to be followed by the replacement content itself rather than the path of a file containing it, so I'm not sure how to use sed correctly here. Another idea I'm considering is to insert dataA.txt, dataB.txt, ... in front of the lines I want to modify and then delete the old lines, but I'm still not sure how to do that after googling for so long...
To replace a range of lines with entire contents of another file:
sed -e '15r file2' -e '15,18d' file1
To replace a single line with entire contents of another file:
sed -e '2{r file2' -e 'd}' file1
If you don't know whether file2 ends in newline or not, you can use the below trick (see What does this mean in Linux sed '$a\' a.txt):
sed '$ a\' file2 | sed -e '3{r /dev/stdin' -e 'd}' file1
The main trick is to use the r command to add the contents of the other file at the starting line address, and then delete the line(s) to be replaced. The -e option is needed because everything after r is treated as a filename.
Note that these have been tested with GNU sed, I'm not sure if it will vary for other implementations.
See my github repo for more examples, such as matching lines based on regex instead of line numbers.
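The range-replacement recipe can be verified on toy files (names illustrative; behavior checked with GNU sed, where the text queued by r is still emitted even though d deletes the matching lines):

```shell
#!/usr/bin/env bash
# Replace lines 2-3 of file1 with the whole of file2.
printf '%s\n' a b c d > file1
printf '%s\n' X Y > file2
sed -e '2r file2' -e '2,3d' file1
```

The output is a, X, Y, d: line 1 passes through, lines 2 and 3 are deleted, and file2's contents are read in where line 2 used to be.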
It is trivial with ed
printf '%s\n' '2,$d' 'r /storage/file2.txt' ,p Q | ed -s /storage/file/file1.txt
Here is a syntax that should work in a wider variety of Unix shells:
printf '2,$d\nr /storage/file2.txt\n,p\nQ\n' | ed -s /storage/file/file1.txt
In 2,$d, the 2 and $ are line addresses (2 is line 2 and $ is the last line in the buffer) and d means delete.
,p means print everything to stdout, which is your screen.
Q quits unconditionally, silencing the "buffer modified" warning that a plain q would raise.
With ed, to replace line 3 of a file with the content of another file, without using shell variables:
First, delete the content of line 3 of the file.
printf '%s\n' '3d' ,p Q | ed -s file1.txt
Then add the content of the other file, say file2.txt at line 3.
printf '%s\n' '2r file2.txt' ,p Q | ed -s file1.txt
To replace a group/set of lines in a file with the content of another file.
First delete the lines, say 15 to 18 from say file1.txt
printf '%s\n' '15,18d' ,p Q | ed -s file1.txt
Then add the content of say file2.txt to line 15 of file1.txt
printf '%s\n' '14r file2.txt' ,p Q | ed -s file1.txt
Q quits without saving; replace it with w to actually edit the files.
The r command appends, so 14r means append the content of the other file after line 14, which makes it start at line 15. Likewise, 2r appends after line 2, so the content starts at line 3.
All of that can also be done in one line. The code below is adapted to your data/file names, and it assumes all the text files are in the directory where you run it; otherwise, use the files' absolute paths.
printf '%s\n' '3d' '2r dataA.txt' '15,18d' '14r dataB.txt' ,n Q | ed -s old.txt
Replace the Q with w if you're satisfied with the output and want to actually edit old.txt.
The ,n prints everything to stdout (your screen) with a line number at the front of each line.
To see what code is actually being piped to ed, remove or comment out the pipe | and everything after it.
See info ed or man ed for more info about ed
An example of that ed script.
Create a new directory and cd into it.
mkdir temp && cd temp
cat dataA.txt
Output
'TIME_STEPS' 'TIME CYCLE' 'ELAPSED' 500 8520 0 1.e+99 1. 9999 1. 1.e-20 1.e+99
cat dataB.txt
Output
-100
-101
-102
-103
cat old.txt
Output
foo
bar
'TIME_STEPS' 'TIME CYCLE' 'ELAPSED' 100 77760 0 1.e+99 1. 9999 1. 1.e-20 1.e+99
a
b
c
d
e
f
g
h
i
j
k
100 0 1
101 1 2
102 2 1.5
103 4 52
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
The script.
printf '%s\n' '3d' '2r dataA.txt' '15,18d' '14r dataB.txt' ,n w | ed -s old.txt
Output
1 foo
2 bar
3 'TIME_STEPS' 'TIME CYCLE' 'ELAPSED' 500 8520 0 1.e+99 1. 9999 1. 1.e-20 1.e+99
4 a
5 b
6 c
7 d
8 e
9 f
10 g
11 h
12 i
13 j
14 k
15 -100
16 -101
17 -102
18 -103
19 l
20 m
21 n
22 o
23 p
24 q
25 r
26 s
27 t
28 u
29 v
30 w
31 x
32 y
33 z
The actual old.txt
cat old.txt
Output
foo
bar
'TIME_STEPS' 'TIME CYCLE' 'ELAPSED' 500 8520 0 1.e+99 1. 9999 1. 1.e-20 1.e+99
a
b
c
d
e
f
g
h
i
j
k
-100
-101
-102
-103
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z

How to get the length of each word in a column without AWK, sed or a loop? [duplicate]

This question already has answers here:
Length of string in bash
Is it even possible? I currently have a one-liner to count the number of words in a file. If I output what I currently have it looks like this:
3 abcdef
3 abcd
3 fec
2 abc
This is all done in one line without loops, and I was wondering if I could add a column with the length of each word. I was thinking I could use wc -m to count the characters, but I don't know if I can do that without a loop.
As stated in the title: no AWK, sed, Perl... just good old bash.
What I want:
3 abcdef 6
3 abcd 4
3 fec 3
2 abc 3
Where the last column is length of each word.
while read -r num word; do
printf '%s %s %s\n' "$num" "$word" "${#word}"
done < file
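The loop above can be exercised on inline data with a here-document; ${#word} expands to the length, in characters, of $word:

```shell
#!/usr/bin/env bash
# Same read loop as above, fed from a here-document instead of a file.
while read -r num word; do
  printf '%s %s %s\n' "$num" "$word" "${#word}"
done <<'EOF'
3 abcdef
2 abc
EOF
```

This prints "3 abcdef 6" and "2 abc 3", matching the desired output format.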
You can do something like this also:
File
> cat test.txt
3 abcdef
3 abcd
3 fec
2 abc
Bash script
> cat test.txt.sh
#!/bin/bash
while read -r line; do
items=($line) # split the line
strlen=${#items[1]} # get the 2nd item's length
echo $line $strlen # print the line and the length
done < test.txt
Results
> bash test.txt.sh
3 abcdef 6
3 abcd 4
3 fec 3
2 abc 3

Randomly sample lines retaining commented header lines

I'm attempting to randomly sample lines from a (large) file, while always retaining a set of "header lines". Header lines are always at the top of the file and unlike any other lines, begin with a #.
The actual file format I'm dealing with is VCF, but I've kept the question general.
Requirements:
Output all header lines (identified by a # at line start)
The command / script should (have the option to) read from STDIN
The command / script should output to STDOUT
For example, consider the following sample file (file.in):
#blah de blah
1
2
3
4
5
6
7
8
9
10
An example output (file.out) would be:
#blah de blah
10
2
5
3
4
I have a working solution (in this case selecting 5 non-header lines at random) using bash. It is capable of reading from STDIN (I can cat the contents of file.in into the rest of the command); however, it writes to a named file rather than STDOUT:
cat file.in | tee >(awk '$1 ~ /^#/' > file.out) | awk '$1 !~ /^#/' | shuf -n 5 >> file.out
By using process substitution (thanks Tom Fenech), both commands are seen as files.
Then using cat we can concatenate these "files" together and output to STDOUT.
cat <(awk '/^#/' file) <(awk '!/^#/' file | shuf -n 10)
Input
#blah de blah
1
2
3
4
5
6
7
8
9
10
Output
#blah de blah
1
9
8
4
7
2
3
10
6
5
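Note that the cat <(...) <(...) approach opens the file twice, so it cannot consume STDIN directly. A single-pass sketch that can (assuming an awk with fflush() and shuf being available): header lines are flushed to stdout immediately, and everything else is piped to a single shuf process from inside awk.

```shell
#!/usr/bin/env bash
# Works when the input is a pipe, e.g. `zcat file.vcf.gz | ...`:
# headers go straight out; non-header lines are funneled to shuf.
printf '%s\n' '#blah de blah' 1 2 3 4 5 6 7 8 9 10 |
awk '/^#/ { print; fflush(); next } { print | "shuf -n 5" }'
```

Because shuf only produces output once its input is exhausted, the header lines always appear before the sampled lines.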

Doing multi-staged text manipulation on the command line?

I have a file with a bunch of text in it, separated by newlines:
ex.
"This is sentence 1.\n"
"This is sentence 2.\n"
"This is sentence 3. It has more characters then some other ones.\n"
"This is sentence 4. Again it also has a whole bunch of characters.\n"
I want to use some set of command-line tools that will, for each line, count the number of characters in that line and then, if there are more than X characters, split the line on periods (".") and count the number of characters in each element of the split.
ex. of final output, by line number:
1. 24
2. 24
3. 69: 20, 49 (i.e. "This is sentence 3" has 20 characters, "It has more characters then some other ones" has 49 characters)
wc only takes a file name as input, so I'm having trouble directing it to take a text string to do the character count on.
head -n2 processed.txt | tr "." "\n" | xargs -0 -I line wc -m line
gives me the error: ": open: No such file or directory"
awk is perfect for this. The code below should get you started and you can work out the rest:
awk -F. '{print length($0),NF,length($1)}' yourfile
Output:
23 2 19
23 2 19
68 3 19
70 3 19
It uses a period as the field separator (-F.), prints the length of the whole line ($0), the number of fields (NF), and the length of the first field ($1).
Here is another little example that prints each line, followed by its total length and the length of every field except the last (the loop starts at i=0, and $0 is the entire line):
awk -F. '{print $0;for(i=0;i<NF;i++)print length($i)}' yourfile
"This is sentence 1.\n"
23
19
"This is sentence 2.\n"
23
19
"This is sentence 3. It has more characters then some other ones.\n"
68
19
44
"This is sentence 4. Again it also has a whole bunch of characters.\n"
70
19
46
By the way, "wc" can process strings sent to its stdin like this:
echo -n "Hello" | wc -c
5
How about:
head -n2 processed.txt | tr "." "\n" | wc -m
You should get a better understanding of what xargs does and how pipes work; do google for a good tutorial on those before using them =).
xargs passes each line separately to the next utility, which is not what you want here: you want wc to see all the lines, so just pipe the entire output of tr to it.
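Putting that advice together: drop xargs and pipe tr's entire output straight into wc. A toy run, with inline data standing in for processed.txt:

```shell
#!/usr/bin/env bash
# Each '.' becomes a newline; wc -m then counts every character once.
printf 'This is sentence 1.\nThis is sentence 2.\n' | tr '.' '\n' | wc -m
```

This prints 40: tr's one-for-one substitution leaves the total character count unchanged, and wc -m happily reads from its stdin.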
