Remove specific words from a text file in bash

I want to remove specific words from a txt file in bash.
Here is my current script:
echo "Sequenzia Import Tag Sidecar Processor v0.2"
echo "=============================================================="
rootfol=$(pwd)
echo "Selecting files from current folder........"
images=$(ls *.jpg *.jpeg *.png *.gif)
echo "Converting sidecar files to folders........"
for file in $images
do
split -l 8 "$file.txt" tags-
for block in tags-*
do
foldername=$(cat "$rootfol/$block" | tr '\r\n' ' ')
FOO_NO_EXTERNAL_SPACE="$(echo -e "${foldername}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
mkdir "$FOO_NO_EXTERNAL_SPACE" > /dev/null
cd "$FOO_NO_EXTERNAL_SPACE"
done
mv "$rootfol/$file" "$file"
cd "$rootfol"
rm tags-* $file.txt
done
echo "DONE! Move files to import folder"
What it does is read the txt file that is named the same as an image and create folders that are interpreted as tags during an import into a Sequenzia image board (based on myimoutobooru) (https://code.acr.moe/kazari/sequenzia).
What I want to do is remove specific words (actually they are symbol combinations) from the sidecar file so that they do not cause issues with the import process.
I want to remove combinations like ">_<" and ":o" from the file.
What can I add to my current script that allows me to do this with a list of illegal words?

Before the split -l 8 "$file.txt" tags- line, I suggest you clean up $file.txt using something like:
sed -f sedscript <"$file.txt" >tempfile
sedscript is a file that you create beforehand containing all your unwanted strings, e.g.
s/>_<//g
s/:o//g
You'd change your split command to use tempfile.
Experimenting with stdin/stdout on my PC suggests that multiple substitutions in a sed script are executed in the same pass over the input file. Therefore, if the file is large, this approach avoids reading it multiple times.
Another variant of this approach is:
sed -e 's/>_<//g' -e 's/:o//g' <infile >outfile
Repeat the
-e 's/xxx//g'
option as many times as required. (The quotes matter here: unquoted > and < would be taken as shell redirections.)
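Applied to the script in the question, the cleanup slots in just before the split; a minimal sketch (sedscript and tempfile are placeholder names):
sed -f "$rootfol/sedscript" <"$file.txt" >tempfile
split -l 8 tempfile tags-
Remember to also remove tempfile in the cleanup line at the end of the loop (rm tags-* tempfile "$file.txt").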

You can create a file which lists your illegal strings, one per line, and iterate through the lines of that file, using sed to remove each one from your input, like this (a sketch; illegal.txt is a placeholder name, and any entry containing sed metacharacters or the / delimiter would need escaping first):
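while IFS= read -r word; do
    # delete every occurrence of the current word; fine for strings like
    # >_< and :o, which contain no regex metacharacters
    sed -i "s/$word//g" "$file.txt"
done < illegal.txt
Note that sed -i edits in place with GNU sed; on BSD/macOS sed you would write sed -i '' instead.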

Related

Sed & Mac OS Terminal: How to remove parentheses content from the first line of every file?

I am on macOS 10.14.6 and have a directory that contains subdirectories that all contain text files. Altogether, there are many hundreds of text files.
I would like to go through the text files and check for any content in the first line that is in parentheses. If such content is found, then the parentheses (and content in the parentheses) should be removed.
Example:
Before removal:
The new world (82 edition)
After removal:
The new world
How would I do this?
Steps I have tried:
Googling around, it seems sed would be best for this.
I have found this thread, which provides sed code for removing bracketed content:
sed -e 's/([^()]*)//g'
However, I am not sure how to adapt it to work on multiple files and also to limit it to the first line of those files. I found this thread which explains how to use sed on multiple files, but I am not sure how to adapt the example to work with parenthesized content.
Please note: as long as the solution works in the macOS terminal, it does not need to use sed. However, from Googling, sed seems the most suited.
I managed to achieve what you're after simply by using a bash script and sed together, like so:
#!/bin/bash
for filename in "$PWD"/*.txt; do
    sed -i '' '1 s/([^()]*)//g' "$filename"
done
The script simply iterates over all the .txt files in $PWD (the current working directory, so that you can add this script to your bin and run it anywhere), and then runs the command
sed -i '' '1 s/([^()]*)//g' "$filename"
on each file. By starting the sed expression with the number 1 we tell sed to work only on the first line of the file :)
Edit: Best Answer
The above works fine in a directory where all contained objects are files, not directories; in other words, it does not search recursively through subdirectories.
Therefore, after some research, this command should perform exactly what the question asks:
find . -name "*.txt" -exec sed -i '' '1 s/([^()]*)//g' {} \;
I must stress, and reiterate, that you should test this on a backup first to confirm it works. Otherwise, use the same command as above but change the '' to control the creation of backups. For example,
find . -name "*.txt" -exec sed -i '.bkp' '1 s/([^()]*)//g' {} \;
This command will perform the sed replacement in the original file (keeping the filename) but will create a backup file for each, with .bkp appended; for example test1.txt becomes test1.txt.bkp. This is a safer option, but choose what works best for you :)
Good try. The single-line command you were looking for:
sed -E '1s|\([^\)]+\)||'
The command to replace each input file first line:
sed -Ei '1s|\([^\)]+\)||' *.txt
example:
echo "The new world (82 edition)" |sed -E '1s|\([^\)]+\)||'
The new world
Explanation
sed -Ei: the E option enables the extended RegExp syntax; the i option does in-place file replacement
sed -Ei '1s|match RegExp||': on the first line only, replace the first string matching the RegExp with the empty string
\([^\)]+\): RegExp matching a literal (, then [^\)]+ (one or more characters that are not )), terminated by a literal )
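Note that without a trailing g flag, only the first parenthesized group on the line is removed; for example:
echo "The new world (82 edition) (second printing)" | sed -E '1s|\([^\)]+\)||'
The new world  (second printing)
Append g ('1s|\([^\)]+\)||g') to remove them all.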
Try:
# create a temporary file
tmp=$(mktemp)
# for each something in _the current directory_
for i in *; do
    # if it is not a file, don't parse it
    if [ ! -f "$i" ]; then continue; fi
    # remove parentheses on the first line, save the output in the temporary file
    sed '1s/([^)]*)//g' "$i" > "$tmp"
    # move the temporary file over the original file
    mv "$tmp" "$i"
done
# remove the temporary file if it is still around (the final mv consumed it)
rm -f "$tmp"

Script that lists all file names in a folder, along with some text after each name, into a txt file

I need to create a text file that lists all the files in a folder, each followed by a comma and the number 15. For example:
My folder has video.mp4, video2.mp4, picture1.jpg, picture2.jpg, picture3.png
I need the text file to read as follows:
video.mp4,15
video2.mp4,15
picture1.jpg,15
picture2.jpg,15
picture3.png,15
No spaces, just filename.ext,15 on each line. I am using a Raspberry Pi. I am aware that the command ls > filename.txt would put all the file names into a file, but how would I get a ,15 after every line?
Thanks
bash one-liner:
for f in *; do echo "$f,15" >> filename.txt; done
To avoid reopening the output file on each iteration, you may redirect the entire loop's output with > filename.txt instead:
for f in *; do echo "$f,15"; done > filename.txt
$ printf '%s,15\n' *
picture1.jpg,15
picture2.jpg,15
picture3.png,15
video.mp4,15
video2.mp4,15
This will work if those are the only files in the directory. The format specifier %s,15\n will be applied to each of printf's arguments (the names in the current directory) and they will be outputted with ,15 appended (and a newline).
If there are other files in the directory, then the following would work as well; printf doesn't care whether files with these names actually exist:
$ printf '%s,15\n' video.mp4 video2.mp4 picture1.jpg picture2.jpg "whatever this is"
video.mp4,15
video2.mp4,15
picture1.jpg,15
picture2.jpg,15
whatever this is,15
Or, on all MP4, PNG and JPEG files:
$ printf '%s,15\n' *.mp4 *.jpg *.png
video.mp4,15
video2.mp4,15
picture1.jpg,15
picture2.jpg,15
picture3.png,15
Then redirect this to a file with printf ...as above... >output.txt.
If you're using Bash, then this will not make use of any external utility, as printf is built into the shell.
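You can confirm this from the prompt:
$ type printf
printf is a shell builtin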
You need to do something like this:
#!/bin/bash
for i in $(ls folder_name); do
    echo "$i,15" >> filename.txt;
done
It's possible to do this in one line; however, if you want to create a script, consider code readability in the long run.
Edit 1: better solution
As #CristianRamon-Cortes suggested in the comments below, you should not rely on the output of ls because of the problems explained in this discussion: why not parse ls. As such, here's how you should write the script instead:
#!/bin/bash
cd folder_name
for i in *; do
echo $i",15" >> filename.txt;
done
You can skip the part cd folder_name if you are already in the folder.
Edit 2: Enhanced solution:
As suggested by #kusalananda, you'd better do the redirection after done to avoid opening the file in each iteration of the for loop, so the script will look like this:
#!/bin/bash
cd folder_name
for i in *; do
echo $i",15";
done > filename.txt
Just one command line, using two msr commands to recursively (-r) search specific files:
msr -rp your-dir1,dir2,dirN -l -f "\.(mp4|jpg|png)$" -PAC | msr -t .+ -o '$0,15' -PIC > save-file.txt
If you want to sort by time, add --wt to first command like: msr --wt -l -rp your-dirs
Sort by size? Add --sz but only the prior one is effective if use both --sz and --wt.
If you want to exclude some directory, add like: --nd "^(test|garbage)$"
To remove trailing \r\n in save-file.txt: msr -p save-file.txt -S -t "\s+$" -o "" -R
See msr.exe / msr.gcc48 etc. in the tools directory of my open project https://github.com/qualiu/msr.
A solution without a loop:
ls | xargs -I{} echo {},15 > filename.txt
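If file names may contain spaces or other awkward characters, parsing ls is fragile; a find-based sketch avoids it:
find . -maxdepth 1 -type f -exec sh -c 'for f; do printf "%s,15\n" "${f#./}"; done' _ {} + > filename.txt
Here ${f#./} just strips the leading ./ that find adds to each name. Note that filename.txt itself will appear in the list, since the redirection creates it before find runs.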

find specific text in a directory and delete the lines from the files

I want to find specific text in a directory, and then delete the lines from the files that include the specific text.
Now I have two questions:
How can I achieve the task?
What is wrong with what I have tried? I have tried the methods below, but they failed. The details follow:
grep -rnw "./" -e "webdesign"
This searches the current directory with pattern "webdesign", and I get the result:
.//pages/index.html:1:{% load webdesign %}
.//pages/pricing.html:1:{% load webdesign %}
.//prototypes.py:16: 'django.contrib.webdesign',
Then I use sed to remove the lines from those files, which doesn't work; I only get a blank file (I mean it deletes all of my file content):
sed -i "/webdesign/d" ./pages/index.html
or
sed "/webdesign/d" ./pages/index.html > ./pages/index.html
My software environment is: OS X Yosemite, Mac Terminal, Bash
A loop in bash will do the trick, provided that there are no filenames with spaces (in which case other solutions are possible, but this is the simplest):
for i in `grep -lrnw "yourdirectory/" -e "webdesign"`
do
    sed "/webdesign/d" "$i" > "$i.tmp"
    # safety to avoid destroying the file if a problem arises (disk full?)
    if [ $? = 0 ] ; then
        mv -f "$i.tmp" "$i"
    fi
done
Note that you should not keep this script in the directory you are processing: the script itself contains the word webdesign, so it would be modified as well :)
Thanks to choroba, I now know that the -i option doesn't work as wished on OS X. It has another meaning there, or it would be rejected by the option parser: it takes a backup-suffix argument, which is why the problem is difficult to see at first.
Without -i you cannot work on a file in-place. And redirecting output to the input just destroys the input file (!). That's why your solution did not work.
You can install GNU sed that supports the -i option, then
sed -i '/webdesign/d' files
should work. Note that it's safer to use -i~ to create a backup.
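To apply that to every matching file under the directory in one go, something like this should work (a sketch, assuming GNU sed is the sed on your PATH; on a stock Mac you would install it first):
grep -rlw "./" -e "webdesign" | while IFS= read -r f; do
    sed -i~ '/webdesign/d' "$f"
done
The backup copies are left next to the originals with a ~ suffix.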
You cannot write to the same file you're reading from; that's why
sed '/webdesign/d' file > file
doesn't work (the shell truncates the file before sed can read anything from it). Create a temporary file instead:
sed '/webdesign/d' file > file.tmp
mv file.tmp file

Search text and append to each end of line of text file - OSX

I'm new to OSX command line tools.
I am trying to find a block of text in a file and append this text at the end of all lines in another text file. At run time I don't know what this text will be, I just know it will be located within "BEGINHMM" and "ENDHMM". Also, I don't know the makeup of the destination file, except for that it will not be an empty text file.
The command which finds the block of text of interest is:
sed -n '/<BEGINHMM>/,/<ENDHMM>/p' proto
where "proto" is a text file containing the text of interest.
I've been trying to pipe the output of the above command to another 'sed' command, in the following manner:
xargs -I '{}' sed -i .bak 's/$/{}/' monophones0.txt
but I am getting some bizarre results; for example, I see the "{}" inserted in the text.
I've also tried piping to:
xargs -0 sed -i .bak 's/$/&/' monophones0.txt
but I just get the printout (similar to terminal echo) of the text I am trying to grab.
Ultimately I want to loop over several 'proto' files in multiple directories and copy the text between the "BEGINHMM", "ENDHMM" block in each directory, and append the selected text to that directory's monophones.txt lines.
I am running the commands in the terminal, bash, OSX 10.12.2
Any help would be appreciated.
(1) Your sed command is of the form sed -n '/A/,/B/p'; this will include the lines on which A and B occur, even if these strings do not appear at the beginning of the line. This form may have other surprises in store for you as well (what do you expect will happen if B is missing or repeated?), but the remainder of this post assumes that's what you want.
(2) It's not clear how you intend to specify the "proto" files, but you do indicate they might be in several directories, so for the remainder of this post, I'll assume they are listed, one per line, in a file named proto.txt in each directory. This will ensure that you don't run into any limitations on command-line length, but the following can easily be modified if you don't want to create such a file.
(3) Here is a script which will use the sed command you've mentioned to copy segments from each of the "proto" files specified in a directory to monophones0.txt in the directory in which the script is executed.
#!/bin/bash
OUT=monophones0.txt
while IFS= read -r file
do
    if [ -r "$file" ] ; then
        sed -n '/<BEGINHMM>/,/<ENDHMM>/p' "$file" >> "$OUT"
    elif [ -n "$file" ] ; then
        echo "NOT FOUND: $file" >&2
    fi
done < proto.txt
Just like what you did before:
tmpfile=$(mktemp)
sed -n '/<BEGINHMM>/,/<ENDHMM>/p' proto > "$tmpfile"
sed -i .bak "r $tmpfile" monophones0.txt
rm "$tmpfile"
This is the basic idea; there are other checks you need to perform to make this a robust script.
– 4ae1e1
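If the goal really is to land the extracted text at the end of every line of monophones0.txt (the r command above inserts it after each line instead), one sketch is to flatten the block and let awk append it (assuming the block contains no backslash sequences, which awk -v would interpret):
block=$(sed -n '/<BEGINHMM>/,/<ENDHMM>/p' proto | tr '\n' ' ')
awk -v suffix="$block" '{ print $0 suffix }' monophones0.txt > monophones0.txt.new &&
    mv monophones0.txt.new monophones0.txt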

Write a shell script that replaces multiple strings in multiple files

I need to search through many files in a directory for a list of keywords and add a prefix to all of them. For example, if various files in my directory contained the terms foo, bar, and baz, I would need to change all instances of these terms to: prefix_foo, prefix_bar, and prefix_baz.
I'd like to write a shell script to do this so I can avoid doing the search one keyword at a time in SublimeText (there are a lot of them). Unfortunately, my shell-fu is not that strong.
So far, following this advice, I have created a file called "replace.sed" with all of the terms formatted like this:
s/foo/prefix_foo/g
s/bar/prefix_bar/g
s/baz/prefix_baz/g
The terminal command it suggests to use with this list is:
sed -f replace.sed < old.txt > new.txt
I was able to adapt this to replace instances within the file (instead of creating a new file) by setting up the following script, which I called inline.sh:
#!/bin/sh -e
in=${1?No input file specified}
mv "$in" "${bak=.$in.bak}"
shift
"$@" < "$bak" > "$in"
Putting it all together, I ended up with this command:
~/inline.sh old.txt sed -f replace.sed
I tried this and it works, for one file at a time. How would I adapt this to search and replace through all of the files in my entire directory?
for f in *; do
    [[ -f "$f" ]] && ~/inline.sh "$f" sed -f ~/replace.sed
done
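If the files live in subdirectories as well, you can skip inline.sh and do the temp-file dance directly under find (a sketch):
find . -type f -exec sh -c '
    for f; do
        sed -f ~/replace.sed "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done' _ {} +
The _ placeholder fills $0 of the inner shell, so the found files land in its positional parameters for the loop.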
In a script:
#!/bin/bash
files=`ls -1 your_directory | egrep keyword`
for i in ${files[@]}; do
    cp ${i} prefix_${i}
done
This will, of course, leave the originals where they are.