My task
I have a file A.txt with the following content.
aijdish uhuih
buh iiu hhuih
zhuh hiu
d uhiuhg ui
...
I want to select the lines that contain these words: aijdish, d, buh ...
I only know that I can:
cat A.txt | grep "aijdish" > temp.txt
cat A.txt | grep "d" >> temp.txt
cat A.txt | grep "buh" >> temp.txt
...
But this time I have several thousand words to select, so how can I do this in bash?
Since you have many words to look for, I suggest putting the patterns into a file and using grep's -f option:
$ cat grep-pattern.txt
aijdish
buh
d
$ grep -f grep-pattern.txt inputfile
aijdish uhuih
buh iiu hhuih
d uhiuhg ui
But if you have words like d you might want to add the -w option to match only whole words and not parts of words.
grep -wf grep-pattern.txt inputfile
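For example, with the sample lines shown above, the single-letter pattern d also matches inside aijdish unless -w is given:
$ grep d inputfile
aijdish uhuih
d uhiuhg ui
$ grep -w d inputfile
d uhiuhg ui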
$ grep -E "aijdish|d|buh" inputfile
aijdish uhuih
buh iiu hhuih
d uhiuhg ui
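If you don't want to type a long alternation by hand, it can also be generated from the word file (a small sketch, assuming one word per line and no empty lines in grep-pattern.txt as above; for very long lists the -f option remains the more robust choice):
$ grep -E "$(paste -sd'|' grep-pattern.txt)" inputfile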
Store the words to be searched for in a file (say a.txt), then write a script that reads each word from a.txt and matches it against the required file, as in the sketch below.
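A minimal sketch of such a script (assuming the word list is a.txt and the data file is called inputfile, as in this thread); note that grep -wf a.txt inputfile does the same thing in a single call and is much faster for thousands of words:
while read -r word; do
    grep -w -- "$word" inputfile
done < a.txt
Keep in mind that with this loop a line matching several words is printed once per match.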
Suppose I am writing a shell script foo.bash to concatenate the contents of test/*.txt with a comma, like this:
> cat test/x.txt
a b c
> cat test/y.txt
1 2 3
> foo.bash test
a b c,
1 2 3
How would you write such a script?
Could you please try the following (in case you want to concatenate the lines of the files line by line, with a comma):
paste -d, *.txt
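As a quick check of what this does with the question's two files (run from inside test/, or use test/*.txt), paste -d, joins corresponding lines of the files side by side:
$ paste -d, test/*.txt
a b c,1 2 3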
EDIT2: To concatenate the contents of all .txt files with a comma, try the following (needs GNU awk):
awk 'ENDFILE{print ","} 1' *.txt | sed '$d'
What about
for file in /tmp/test/*.txt; do
    echo -n "$(cat "$file"),"
done | sed 's/.$//'
or maybe
for file in /tmp/test/*.txt; do
    sed 's/$/,/' "$file"
done | sed 's/.$//'
You could use a regex to achieve it. The command below grabs the content of every file and appends it after the name of the file ($ARGV).
$ grep -ER '*'
a.txt:a
b.txt:b
c.txt:c
$ perl -pe 's/^(.*)\n$/$ARGV:\1,/;' * > file.txt
$ cat file.txt
a.txt:a,b.txt:b,c.txt:c,
I want to merge a large number of files into a single file, and the merge should happen in ascending order of the file names. I have tried the command below and it works as intended, but the only problem is that after the merge, output.txt contains all the data on a single line, because every input file has only one line of data without a trailing newline.
Is there any way to merge each file's data into output.txt as a separate line, rather than merging everything into a single line?
My list of files has the naming format of 9999_xyz_1.json, 9999_xyz_2.json, 9999_xyz_3.json, ....., 9999_xyz_12000.json.
Example:
$ cat 9999_xyz_1.json
abcdef
$ cat 9999_xyz_2.json
12345
$ cat 9999_xyz_3.json
Hello
Expected output.txt:
abcdef
12345
Hello
Actual output:
$ ls -d -1 -v "$PWD/"9999_xyz_*.json | xargs cat
abcdef12345
EDIT:
Since my input files won't contain any spaces or special characters like backslashes or quotes, I decided to use the command below, which works for me as expected.
find . -name '9999_xyz_*.json' -type f | sort -V | xargs awk 1 > output.txt
I also tried with a file name containing a space; below are the results with two different commands.
Example:
$ cat 9999_xyz_1.json
abcdef
$ cat 9999_ xyz_2.json -- This File name contains a space
12345
$ cat 9999_xyz_3.json
Hello
Expected output.txt:
abcdef
12345
Hello
Command:
find . -name '9999_xyz_*.json' -print0 -type f | sort -V | xargs -0 awk 1 > output.txt
Output:
Successfully completed the merge as expected, but with an error at the end.
abcdef
12345
Hello
awk: cmd. line:1: fatal: cannot open file `
' for reading (No such file or directory)
Command:
Here I used sort with the -zV options to avoid the error that occurred with the above command.
find . -name '9999_xyz_*.json' -print0 -type f | sort -zV | xargs -0 awk 1 > output.txt
Output:
The command completed successfully, but the results are not as expected. Here the file name containing a space is sorted last; the expectation is that it should be in second position after the sort.
abcdef
Hello
12345
I would approach this with a for loop, and use echo to add the newline between each file:
for x in `ls -v -1 -d "$PWD/"9999_xyz_*.json`; do
    cat $x
    echo
done > output.txt
Now, someone will invariably comment that you should never parse the output of ls, but I'm not sure how else to sort the files in the right order, so I kept your original ls command to enumerate the files, which worked according to your question.
EDIT
You can optimize this a lot by using awk 1 as @oguzismail did in his answer:
ls -d -1 -v "$PWD/"9999_xyz_*.json | xargs awk 1 > output.txt
This solution finishes in 4 seconds on my machine with 12000 files as in your question, while the for loop takes 13 minutes to run. The difference is that the for loop launches 12000 cat processes, while xargs needs only a handful of awk processes, which is a lot more efficient.
Note: if you want to upvote this, make sure to upvote @oguzismail's answer too, since using awk 1 is his idea. But his answer with printf and sort -V is safer, so you probably want to use that solution anyway.
Don't parse the output of ls, use an array instead.
for fname in 9999_xyz_*.json; do
    index="${fname##*_}"
    index="${index%.json}"
    files[index]="$fname"
done && awk 1 "${files[@]}" > output.txt
Another approach that relies on GNU extensions:
printf '%s\0' 9999_xyz_*.json | sort -zV | xargs -0 awk 1 > output.txt
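Why awk 1 fixes the missing-newline problem: awk prints every input record followed by the output record separator, so a file whose last line lacks a trailing newline still ends up on its own line. A tiny illustration (hypothetical files reconstructed from the question, written without trailing newlines via printf):
$ printf 'abcdef' > 9999_xyz_1.json
$ printf '12345' > 9999_xyz_2.json
$ cat 9999_xyz_1.json 9999_xyz_2.json
abcdef12345
$ awk 1 9999_xyz_1.json 9999_xyz_2.json
abcdef
12345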
I have a small bash script as follows :
cat foo.txt | grep "balt" > bar_file
Ideally, I would like every word that contains "balt" to be removed from the foo.txt file. Can I get direction on how to move words from one file to another based on what's grepped?
As a side note: there is no need to use cat and pipe its output to grep, since you can pass the filename directly to grep, which saves one process.
As for your question, you can use grep's -o option to get only the matching words containing balt, together with \b word-boundary anchors, like this:
$ cat foo.txt
abcd baltabcd xyz
xdef abbaltcd xyz
balt
$ grep -o '\b\w*balt\w*\b' foo.txt
baltabcd
abbaltcd
balt
$ grep -o '\b\w*balt\w*\b' foo.txt > bar_file
$ cat bar_file
baltabcd
abbaltcd
balt
$
As you can see, grep matches zero or more word characters before and after balt and puts the matches into another file.
Example words were: baltabcd, abbaltcd and balt
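The grep above only extracts the words; if you also want them removed from foo.txt itself (the "move" part of the question), one possible follow-up is a substitution with the same pattern (a sketch, assuming GNU sed for the \b and \w extensions; drop -i to preview first). Note that it leaves the surrounding whitespace behind, so you may want to squeeze spaces or delete now-empty lines afterwards:
sed -i 's/\b\w*balt\w*\b//g' foo.txt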
Basically what I'm trying to do is move lines 1 through 4 from A.txt
and replace the lines 5 through 8 in B.txt with them.
I figured out how to get the first four lines with sed,
but I cannot figure out how to "send" them to replace the lines in the second txt file.
cat A.txt
1 a
2 b
3 c
4 d
5 e
cat B.txt
one
two
three
four
five
six
seven
eight
nine
Result
one
two
three
four
1 a
2 b
3 c
4 d
nine
This might work for you (GNU sed):
sed -i -e '5,8R a.txt' -e '5,8d' b.txt
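The R command queues one line of a.txt after each of lines 5 to 8 of b.txt, and the d then deletes the original lines; the queued lines are still flushed to the output. To preview the result without modifying b.txt (same GNU sed requirement), simply drop -i:
sed -e '5,8R a.txt' -e '5,8d' b.txt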
for your example, this awk one-liner works too:
awk 'NR>4&&NR<9{getline $0<"a.txt"}7' b.txt
This prints the expected output; you need to play with redirection if you want to save it back to b.txt, for example as shown below.
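One way to do that (you cannot redirect onto the file you are still reading, so write to a temporary file first):
awk 'NR>4&&NR<9{getline $0<"a.txt"}7' b.txt > tmp && mv tmp b.txt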
This awk should do:
awk 'FNR==NR {a[NR]=$0;next} FNR>=5 && FNR<=8 {$0=a[FNR-4]}1' A.txt B.txt > tmp && mv tmp B.txt
It stores the lines of A.txt in an array named a.
Then, if the line number in B.txt is between 5 and 8, the line is replaced with the corresponding entry from array a.
The result is stored in a temp file tmp and then moved back to B.txt.
#!/usr/local/bin/bash -x
sed -n '1,4p' B.txt > B.txt.tmp
sed -n '1,4p' A.txt >> B.txt.tmp
sed -n '9p' B.txt >> B.txt.tmp
mv B.txt B.txt.bak
mv B.txt.tmp B.txt
This is static. Still, as long as you know your line addresses, this will work.
If you want support for variable span lengths, you will need to do something like this in your files:-
#----------numbers-begin----------
one
two
three
four
#----------numbers-end----------
From there, you can get to them inside the file with:-
sed -n '/--numbers-begin--/,/--numbers-end--/p' <filename> > newfile
Not only does that give you anchors to play with, but sed printing is my own preferred method of importing strings for variables in scripts, because it doesn't cause the shell to try and literally interpret the text as a command, as cat does for some reason.
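Note that the range print also includes the two marker lines themselves; if you only want the lines between them, one simple follow-up (a sketch) is to trim the first and last lines of the result:
sed -n '/--numbers-begin--/,/--numbers-end--/p' <filename> | sed '1d;$d' > newfile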
The other thing that you can do in future files, is something like this:-
numbers:one
numbers:two
numbers:three
numbers:four
words:dog
words:cat
words:rat
Then:-
#!/usr/local/bin/bash
for i in $(sed -n '/^/,/$/p' file)
do
    if [ -n "$(echo "${i}" | sed -n '/numbers/p')" ]
    then
        echo "${i}" | cut -d':' -f2 >> numbers-only-file
    fi
done
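For what it's worth, the same filtering can be done in a single pipeline (a sketch under the same assumptions, i.e. the prefixed data lives in file):
grep '^numbers:' file | cut -d':' -f2 > numbers-only-file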
Data structuring. It's all about the data structuring. Structure your data properly, and you will have practically no work at all.
I have 2 files:
file1
string list:
ben
john
eric
file2
few rows:
ben when to school
my mother went out
the dog is big
john has FB
eric is nice guy
exported file:
my mother went out
the dog is big
I would like to use grep -v to remove the rows that contain the strings from the list.
This is the idea, but the command is wrong:
grep -v `cat file2` file1 > out
Thanks
Asaf
Using a file for patterns:
grep -v -f "file2" file1 > out
or else you can do:
grep -v -e "string1" -e "string2" file2
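With the sample files from the question, the file-based form produces exactly the expected output:
$ grep -v -f file1 file2
my mother went out
the dog is big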
If I understand the question correctly, 4 matched lines should be removed one by one, and a single grep can't do it unless you define a REGEXP containing those 4 lines, which is tricky
But this script should do it
cp file2 out
cat file1 | while read STR ; do
    LINE="`grep \"$STR\" file2`"
    sed -i "/^$LINE\$/d" out
done
cat out