I need to remove subdomains from a file:
.domain.com
.sub.domain.com -- this must be removed
.domain.com.uk
.sub2.domain.com.uk -- this must be removed
So I have used sed:
sed '/\.domain.com$/d' file
sed '/\.domain.com.uk$/d' file
That part was simple, but when I try to do it in a loop, problems appear:
while read line
do
sed '/\$line$/d' filename > filename
done < filename
I suppose it is a "." and $ quoting problem; I have tried escaping them in many ways, but I am out of ideas now.
A solution inspired by NeronLeVelu's idea:
#!/bin/bash
#set -x

# Reverse every line so that suffix (subdomain) relationships become
# prefix relationships, then sort so each parent sorts before its subdomains.
domains=($(rev domains | sort))

for i in `seq 0 ${#domains[@]}` ;do
    domain=${domains[$i]}
    [ -z "$domain" ] && continue
    # Blank out every later entry that extends this reversed domain.
    # Note the dots in $domain are unescaped regex metacharacters here;
    # that is fine for this data but not fully general.
    for j in `seq $i ${#domains[@]}` ;do
        [[ ${domains[$j]} =~ $domain.+ ]] && domains[$j]=
    done
done

# Reverse the surviving entries back and write them out.
for i in `seq 0 ${#domains[@]}` ;do
    [ -n "${domains[$i]}" ] && echo "${domains[$i]}" | rev >> result.txt
done
For cat domains:
.domain.com
.sub.domain.com
.domain.co.uk
.sub2.domain.co.uk
sub.domain.co.uk
abc.yahoo.com
post.yahoo.com
yahoo.com
You get cat result.txt:
.domain.co.uk
.domain.com
yahoo.com
sed -n 's/.*/²&³/;H
$ {x;s/$/\
/
: again
s|\(\n\)²\([^³]*\)³\(.*\)\1²[^³]*\2³|\1\2\3|
t again
s/[²³]//g;s/.\(.*\)./\1/
p
}' YourFile
Load the file into the working buffer, then iteratively remove any line that ends with an earlier one, and finally print the result. The temporary edge delimiters (² and ³) are easier to manage than \n in the pattern.
Add --posix -e for GNU sed (this was tested on AIX sed).
Your loop is a bit confusing because you're trying to use sed to delete patterns from a file, but you take the patterns from the same file.
If you really want to remove subdomains from filename then I suppose you need something more like the following:
#!/bin/bash
set -x
cp domains domains.tmp
while read domain
do
sed -r -e "/[[:alnum:]]+${domain//./\\.}$/d" domains.tmp > domains.tmp2
cp domains.tmp2 domains.tmp
done < dom.txt
Where cat domains is:
.domain.com
.sub.domain.com
.domain.co.uk
.sub2.domain.co.uk
sub.domain.co.uk
abc.yahoo.com
post.yahoo.com
and cat dom.txt is:
.domain.com
.domain.co.uk
.yahoo.com
Running the script on these inputs results in:
$ cat domains.tmp
.domain.com
.domain.co.uk
Each iteration removes the subdomains of the domain currently read from dom.txt and stores the result in a temporary file, whose contents are used in the next iteration for further filtering.
It's good to run your scripts with set -x; you'll see the substitutions as they happen.
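For instance, the first pass through the loop above would trace roughly like this (the exact quoting in the output can vary between bash versions, and xtrace does not show the redirections):
+ cp domains domains.tmp
+ read domain
+ sed -r -e '/[[:alnum:]]+\.domain\.com$/d' domains.tmp
+ cp domains.tmp2 domains.tmp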
This is my first question here. I have a folder called "materials", which has 40 text files in it. I am basically trying to combine the text files that contain the word "carbon" (in both capitalized and lowercase form) into a single file, leaving newlines between them. I used grep -w carbon * to identify the files that contain the word carbon. I just don't know what to do after this point. I really appreciate all your help!
grep -il carbon materials/*.txt | while read line; do
    echo ">> Adding $line";
    cat "$line" >> result.out;
    echo >> result.out;
done
Explanation
grep searches for the string in the files. -i ignores case for the searched string. -l prints only the names of the files containing the string
the while loop iterates over the files containing the string
cat with >> appends each file's contents to result.out
echo >> adds a newline after each file's content
Execution
$ ls -1 materials/*.txt
materials/1.txt
materials/2.txt
materials/3.txt
$ grep -i carbon materials/*.txt
materials/1.txt:carbon
materials/2.txt:CARBON
$ grep -il carbon materials/*.txt | while read line; do echo ">> Adding $line"; cat "$line" >> result.out; echo >> result.out; done
>> Adding materials/1.txt
>> Adding materials/2.txt
$ cat result.out
carbon
CARBON
Try this (assuming your filenames don't contain newline characters):
grep -iwl carbon ./* |
while IFS= read -r f; do cat "$f"; echo; done > /tmp/combined
If it is possible that your filenames may contain newline characters and your shell is bash, then:
grep -iwlZ carbon ./* |
while IFS= read -r -d '' f; do cat "$f"; echo; done > /tmp/combined
grep is assumed to be GNU grep (for the -w and -Z options). Note that these will leave a trailing newline character in the file /tmp/combined.
I have been working in bash and need to create a string argument. Bash is newish to me, to the point that I don't know how to build a string from a list.
# foo.txt is a list of absolute file names.
/foo/bar/a.txt
/foo/bar/b.txt
/delta/test/b.txt
should turn into: a.txt,b.txt,b.txt
OR: /foo/bar/a.txt,/foo/bar/b.txt,/delta/test/b.txt
code
s=""
for file in $(cat foo.txt);
do
#what goes here? s += $file ?
done
myShellScript --script $s
I figure there was an easy way to do this.
with for loop:
for file in $(cat foo.txt);do echo -n "$file",;done|sed 's/,$/\n/g'
with tr:
cat foo.txt|tr '\n' ','|sed 's/,$/\n/g'
only sed:
sed ':a;N;$!ba;s/\n/,/g' foo.txt
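If paste is available, the join can also be done in one step (an alternative sketch, not from the answer above; it produces the full-path variant the asker showed second):
paste -sd, foo.txt
Here -s joins all lines of the file into one and -d, makes the separator a comma.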
This seems to work:
#!/bin/bash
input="foo.txt"
> tmp    # start fresh so re-runs don't append to an old tmp
while IFS= read -r var
do
    basename "$var" >> tmp
done < "$input"
paste -d, -s tmp > result.txt
output: a.txt,b.txt,b.txt
basename gets you the file names you need and paste will put them in the order you seem to need.
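Assuming a GNU basename that supports -a (and paths without whitespace, as in the sample foo.txt), the temporary file can be skipped entirely, for example:
basename -a $(< foo.txt) | paste -sd, -
basename -a strips the directories from all of its arguments at once, and paste joins the resulting lines with commas.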
The input field separator can be used with set to create split/join functionality:
# split the lines of foo.txt into positional parameters
IFS=$'\n'
set -- $(< foo.txt)
# join with commas
IFS=,
echo "$*"
For just the file names, add some sed:
IFS=$'\n'; set -- $(sed 's|.*/||' foo.txt); IFS=,; echo "$*"
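A variation on the same split/join idea using a bash array (mapfile needs bash 4+), just as a sketch:
# split: read the lines of foo.txt into an array
mapfile -t names < foo.txt
# join: set IFS in a subshell so the change doesn't leak out
(IFS=,; echo "${names[*]}")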
What I have is a file (let's call it 'xfile'), containing lines such as
file1 <- this line goes to file1
file2 <- this goes to file2
and what I want to do is run a script that does the work of actually taking the lines and writing them into the file.
The way I would do that manually could be like the following (for the first line)
(echo "this line goes to file1"; echo) >> file1
So, to automate it, this is what I tried to do
IFS=$'\n'
for l in $(grep '[a-z]* <- .*' xfile); do
$(echo $l | sed -e 's/\([a-z]*\) <- \(.*\)/(echo "\2"; echo)\>\>\1/g')
done
unset IFS
But what I get is
-bash: file1(echo "this content goes to file1"; echo)>>: command not found
-bash: file2(echo "this goes to file2"; echo)>>: command not found
(on OS X)
What's wrong?
This solves your problem on Linux:
awk -F ' <- ' '{print $2 >> $1}' xfile
Take care in choosing the field separator so that the new file names do not end up with leading or trailing spaces.
Give this a try on OS X.
You can use the regex capabilities of bash directly. When you use the =~ operator to compare a variable to a regular expression, bash populates the BASH_REMATCH array with matches from the groups in the regex.
re='(.*) <- (.*)'
while read -r; do
if [[ $REPLY =~ $re ]]; then
file=${BASH_REMATCH[1]}
line=${BASH_REMATCH[2]}
printf '%s\n' "$line" >> "$file"
fi
done < xfile
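With the sample xfile shown above, either approach should leave you with:
$ cat file1
this line goes to file1
$ cat file2
this goes to file2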
I have a file called file_names_list.txt which contains absolute file names, for example, the first line is:
~/Projects/project/src/files/file.mm
I run a script to grep each of these files,
for file in $(cat file_names_list.txt); do
echo "doing file: $file"
grep '[ \t]*if (.* = .*) {' $file | while read -r line ; do ...
and I get the output:
doing file: ~/Projects/project/src/files/file.mm
grep: ~/Projects/project/src/files/file.mm: No such file or directory
But if I go to the terminal and enter
grep '[ \t]*if (.* = .*) {' ~/Projects/project/src/files/file.mm
I get the proper grep output
What's the problem here? I'm out of ideas
The problem is with the ~ character. That character gets expanded to your home directory when you use it in bash but in this case, it is just another character stored in the variable $file. To see the difference, try this:
file='~'
echo $file
echo ~
So now you have to either recreate the file file_names_list.txt or try to fix it, e.g. with sed:
sed -i -e "s|^~/|$HOME/|" file_names_list.txt
Also note that it would be preferable to use a while loop instead of a for loop:
while IFS= read -r file; do
# write your code here
done < file_names_list.txt
You can use your script like this:
while IFS= read -r f; do
    grep '[ \t]*if .* = .* {' "${f/#\~/$HOME}"
done < file_names_list.txt
Since a ~ stored in a variable is not expanded, we replace a leading ~ with $HOME on each line using the bash expansion "${f/#\~/$HOME}".
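A quick way to see that expansion at work (the resulting path depends on your actual $HOME):
f='~/Projects/project/src/files/file.mm'
echo "${f/#\~/$HOME}"
# e.g. /home/you/Projects/project/src/files/file.mm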
I would like to remove only the file name from each line of the following configuration file.
Configuration File -- test.conf
knowledgebase/arun/test.rf
knowledgebase/arunraj/tester/test.drl
knowledgebase/arunraj2/arun/test/tester.drl
The above file should be read, and the lines with the file names removed should go to another file called output.txt.
Here is my attempt. It is not working for me at all; I am getting empty files only.
#!/bin/bash
file=test.conf
while IFS= read -r line
do
# grep --exclude=*.drl line
# awk 'BEGIN {getline line ; gsub("*.drl","", line) ; print line}'
# awk '{ gsub("/",".drl",$NF); print line }' arun.conf
# awk 'NF{NF--};1' line arun.conf
echo $line | rev | cut -d'/' -f 1 | rev >> output.txt
done < "$file"
Expected output:
knowledgebase/arun
knowledgebase/arunraj/tester
knowledgebase/arunraj2/arun/test
There's the dirname command to make it easy and reliable:
#!/bin/bash
file=test.conf
while IFS= read -r line
do
dirname "$line"
done < "$file" > output.txt
There are Bash shell parameter expansions that will work OK with the list of names given but won't work reliably for some names:
file=test.conf
while IFS= read -r line
do
echo "${line%/*}"
done < "$file" > output.txt
There's sed to do the job — easily with the given set of names:
sed 's%/[^/]*$%%' test.conf > output.txt
It's harder if you have to deal with names like /plain.file (or plain.file — the same sorts of edge cases that trip up the shell expansion).
You could add Perl, Python, Awk variants to the list of ways of doing the job.
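For example, a Perl equivalent of the sed command above (a sketch, not part of the original answer):
perl -pe 's%/[^/]*$%%' test.conf > output.txt
It has the same edge-case caveats as the sed version for names with no directory part.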
You can get the path like this:
path=${fullpath%/*}
It cuts away the last / and everything after it.
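For instance, with the first line of test.conf:
fullpath=knowledgebase/arun/test.rf
path=${fullpath%/*}
echo "$path"    # prints: knowledgebase/arun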
Using an awk one-liner you can do this:
awk 'BEGIN{FS=OFS="/"} {NF--} 1' test.conf
Output:
knowledgebase/arun
knowledgebase/arunraj/tester
knowledgebase/arunraj2/arun/test