I have a number of Word documents that I'd like to remove some elements from. What I would like to do is as follows:
Copy and paste the entire contents of the Word file into a text file (may not be necessary) OR convert .doc to .txt
Using regex: replace \[.*\] with "" AND replace \(.*\) with ""
Save the result to a text file with the same name as the original word document.
Thoughts and direction appreciated. I don't know how to do any of these things programmatically, so for now I'm doing it all manually.
If it matters, I'm using Ubuntu 11.04
Since you're open to using plain text, some improvements to your algo:
Use antiword to automate conversion from doc to txt
Use sed to do in-place regex modification: sed -i -e's/bad/good/' file.txt
Update (in response to comment):
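The regexes are fine in principle, but note that in sed's basic regex syntax a literal parenthesis is written unescaped, while \( \) forms a group. I also didn't completely understand the objective:
if you want to replace occurrences of [foo] & (foo) with "" use (drop the "" from the replacement to delete the matches outright):
sed -i -e's/\[.*\]/""/g' file.txt; sed -i -e's/(.*)/""/g' file.txt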
if you want to replace occurrences [foo] & (foo) with "foo" each use:
sed -i -e's/\[\(.*\)\]/"\1"/g' file.txt; sed -i -e's/(\(.*\))/"\1"/g' file.txt
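Putting the conversion and the clean-up together, a minimal sketch might look like the following. It assumes antiword is installed and that the .doc files sit in the current directory. strip_brackets is a name made up for this example, and the patterns use [^]]* and [^)]* instead of .* so that two bracketed spans on the same line are removed separately rather than as one greedy match.

```shell
#!/bin/sh
# Hypothetical helper: delete every [..] and (..) span read from stdin.
# [^]]* and [^)]* stop at the first closing bracket, unlike a greedy .*
strip_brackets() {
    sed -e 's/\[[^]]*\]//g' -e 's/([^)]*)//g'
}

# Convert each .doc in the current directory (assumes antiword is on PATH).
if command -v antiword >/dev/null 2>&1; then
    for doc in ./*.doc; do
        [ -e "$doc" ] || continue      # no .doc files: skip the unexpanded glob
        # report.doc becomes report.txt, next to the original
        antiword "$doc" | strip_brackets > "${doc%.doc}.txt"
    done
fi
```

The function can also be tried on its own, for example with printf 'a [b] c (d)\n' | strip_brackets.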
I have a problem: I have a file that I would like to edit from the command line, if I knew how. I would like to locate the line to edit by its content.
I am in CyberPatriot, and my team is second in my state. I know one of the people on the number one team. It kills me, so I want to make a list of commands that I can work from to be faster and more efficient.
Imagine I had this file:
example
oof
goo
random
yes
and I wanted to change it to this:
example
oof
goo
random 'added text'
yes
How do I do so?
I know I can use the echo command to add text to the end of a file, but I don't know how to add text to the end of a specific line.
Thanks, Owen
You can use sed for this purpose.
sed 's/random/& Hello World/' file
to append text to the matched string.
You can use ^random$ to make sure the entire line is matched, before appending.
If you need to modify the file directly, you can use the -i flag, which facilitates in-place editing. Further, using -i.bak creates a backup of the original file first before modifying it, as in
sed -i.bak 's/random/& Hello World/' file
The original copy of the file can be found in file.bak
More about sed : https://www.gnu.org/software/sed/manual/sed.html
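As a rough illustration of the anchored form applied to the sample file from the question, the sketch below builds a scratch copy and appends the text only when the whole line is random. The temporary directory and the variable names are assumptions made for this example.

```shell
#!/bin/sh
# Work on a scratch copy of the sample file from the question.
tmpdir=$(mktemp -d)
file="$tmpdir/file"
printf '%s\n' example oof goo random yes > "$file"

# ^random$ anchors the match to the whole line; & re-inserts the matched text.
# Inside double quotes the $ is escaped so the shell passes it through to sed.
result=$(sed "s/^random\$/& 'added text'/" "$file")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Without -i the file itself is left untouched, which makes it easy to check the output before editing in place.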
Use something like below
sed '4!d' file | xargs -I{} sed -i "4s/{}/{} \'added text\'/" file
Basically, in the above command we get the 4th line of the file using sed '4!d' file and then use that line as the search pattern, replacing it with the same text followed by the new text ('added text').
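If the line number is already known, a simpler variant may be enough: address line 4 directly and substitute at the end of the line, so the line's content never has to be passed through xargs or treated as a regex. This is a sketch assuming GNU sed (for -i without a suffix); the scratch file is only there to make the example self-contained.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
file="$tmpdir/file"
printf '%s\n' example oof goo random yes > "$file"

# 4 restricts the command to line 4; s/$/.../ inserts text at the end of it.
sed -i "4s/\$/ 'added text'/" "$file"

line4=$(sed -n '4p' "$file")    # print only line 4 to check the result
printf '%s\n' "$line4"

rm -rf "$tmpdir"
```

Because nothing from the file is interpolated into the sed script, this form keeps working when the line contains slashes or other characters that are special in a regex.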
I am doing a find and replace using sed in a bash script. I want to search each file for the words files and no. If both words are present on the same line, then replace red with green; otherwise do nothing.
sed -i -e '/files|no s/red/green' $file
But I am unable to do so. I am not receiving any error and the file doesn't get updated.
What am I doing wrong here, or what is the correct way of achieving my result?
/files|no/ means to match lines with either files or no, it doesn't require both words on the same line.
To match the words in either order, use /files.*no|no.*files/.
sed -i -r -e '/files.*no|no.*files/s/red/green/' "$file"
Notice that you need another / at the end of the pattern, before s, and the s operation requires / at the end of the replacement.
And you need the -r option to make sed use extended regexp; otherwise you have to use \| instead of just |.
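A quick way to convince yourself that the alternation covers both orders is to run it over a few made-up lines. The sample text below is invented for this sketch, and -E is used as the equivalent spelling of -r in GNU sed.

```shell
#!/bin/sh
# Four invented lines: both words in each order, then one word only.
sample=$(printf '%s\n' \
    'no red files here' \
    'files come first, then no red' \
    'red files only' \
    'no red at all')

# Only lines containing both "files" and "no" (in either order) are edited.
result=$(printf '%s\n' "$sample" | sed -E '/files.*no|no.*files/s/red/green/')
printf '%s\n' "$result"
```

Only the first two lines change. Note that the patterns match substrings, so a word such as "nothing" would also count as containing no.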
This might work for you (GNU sed):
sed '/files/{/no/s/red/green/}' file
or:
sed '/files/!b;/no/s/red/green/' file
This method allows for easy extension e.g. foo, bar and baz:
sed '/foo/!b;/bar/!b;/baz/!b;s/red/green/' file
or fee, fie, foe and fix:
sed '/fee/!b;/fie/!b;/foe/!b;/fix/!b;s/bacon/cereal/' file
An awk version
awk '/files/ && /no/ {sub(/red/,"green")} 1' file
/files/ && /no/ files and no have to be on the same line, in any order
sub(/red/,"green") replace red with green. Use gsub(/red/,"green") if there are multiple red
1 always true, do the default action, print the line.
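The difference between sub and gsub is easiest to see on a line with two occurrences of red. The input line here is made up for the illustration.

```shell
#!/bin/sh
line='no files: red red'

# sub replaces only the first match on the line
first=$(printf '%s\n' "$line" | awk '/files/ && /no/ {sub(/red/,"green")} 1')

# gsub replaces every match on the line
all=$(printf '%s\n' "$line" | awk '/files/ && /no/ {gsub(/red/,"green")} 1')

printf '%s\n%s\n' "$first" "$all"
```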
I have a huge .txt file from which I want all spaces, line breaks, indentation etc. removed. It should literally be one long string.
I tried
sed -i 's/\ //g' test.txt
but nothing happens
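sed -n 's/[[:blank:]]//g;H
$ {x;s/\n//g;p;}' test.txt
The H and the $ block are needed if you also want to remove the newlines, because sed processes its input line by line, so a single line never contains a newline. H appends each line to the hold space; on the last line ($), x swaps the hold space back in and s/\n//g deletes the newlines. -n suppresses the automatic per-line output, and p prints the joined result once at the end.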
Seems to work ok for me:
[~/Desktop]
==> cat test.txt
the quick brown fox
[~/Desktop]
==> sed -i "s/\ //g" test.txt
[~/Desktop]
==> cat test.txt
thequickbrownfox
A literal " " can be awkward to match, especially inside double quotes, because bash interprets the string before passing it to sed.
sed -i -e 's/\s//g' file.txt
... should work (it works for me). "\s" matches any whitespace character, and the single quotes '' keep bash from interpreting the expression before it is passed to sed.
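One caveat worth sketching: because sed works line by line, \s removes spaces and tabs but leaves the line breaks in place. For the "one long string" goal, tr is a possible alternative; it is swapped in here and does not come from the answers above. \s itself is a GNU sed extension.

```shell
#!/bin/sh
# sed sees one line at a time, so the newline between lines survives
per_line=$(printf 'the quick\tbrown\n  fox\n' | sed 's/\s//g')

# tr deletes every whitespace character in the stream, newlines included
one_string=$(printf 'the quick\tbrown\n  fox\n' | tr -d '[:space:]')

printf '%s\n' "$per_line"
printf '%s\n' "$one_string"
```

For a real file the tr form would read from it and write to a new one, for example tr -d '[:space:]' < test.txt > out.txt, since tr cannot edit in place.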
Since you use Cygwin, I assume your OS is Windows, in which case you don't need bash to achieve your goal. Just open your txt file in a text editor and replace the whitespace with nothing; all of the whitespace in your txt file will then be removed.
This method handles almost any kind of removal, and it also works in Excel, Word, and so on.
Good luck!
I am attempting to write a bash script that will use sed to replace an entire line in a text file beginning with a given string, and I only want it to perform this replacement for the first match.
For example, in my text file I may have:
hair=brown
age=25
eyes=blue
age=35
weight=177
And I may want to simply replace the first occurrence of a line beginning with "age" with a different number without affecting the 2nd instance of age:
hair=brown
age=55
eyes=blue
age=35
weight=177
So far, I've come up with
sed -i "0,/^PATTERN/s/^PATTERN/PATTERN=XY/" test.txt
but this will only replace the string "age" itself rather than the entire line. I've been trying to throw a "\c" in there somewhere to change the entire line but nothing is working so far. Does anyone have any ideas as to how this can be resolved? Thanks.
Like @ruakh suggests, you can use
sed -i "0,/^PATTERN/ s/^PATTERN=.*$/PATTERN=XY/" test.txt
A shorter and less repetitive way of doing the same would be
sed -i '0,/^\(PATTERN=\).*/s//\1XY/' test.txt
which takes advantage of backreferences and the fact that not specifying a pattern in an s-expression will use the previously matched pattern.
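Applied to the sample data from the question, the range trick might look like this. It is a sketch assuming GNU sed (both 0,/re/ and -i without a suffix are GNU features), with age and 55 standing in for PATTERN and XY; the replacement spells out age= instead of using \1 so that the digits of the new value cannot be misread as part of the backreference.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
file="$tmpdir/test.txt"
printf '%s\n' hair=brown age=25 eyes=blue age=35 weight=177 > "$file"

# 0,/re/ ends the range at the first matching line, so later matches are
# outside it; the empty pattern in s// reuses the regex from the address.
sed -i '0,/^age=.*/s//age=55/' "$file"

result=$(cat "$file")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```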
0,...-ranges only work in GNU sed. An alternative might be to use shell redirect with sed:
{ sed '/^\(PATTERN\).*/!b; s//\1VAL/; q'; cat ;} < file
or use awk:
awk '$1=="LABEL" && !n++ {$2="VALUE"}1' FS=\\= OFS=\\= file
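For comparison, here is the awk approach run against the question's sample data, as a sketch with age and 55 standing in for LABEL and VALUE. It sets the separators with -F and -v rather than the trailing FS=\\= assignments, which is only a stylistic choice.

```shell
#!/bin/sh
# !n++ is true only for the first line whose first field is "age";
# assigning to $2 rebuilds the line using OFS, and the trailing 1 prints it.
result=$(printf '%s\n' hair=brown age=25 eyes=blue age=35 weight=177 |
    awk -F= -v OFS== '$1 == "age" && !n++ { $2 = "55" } 1')
printf '%s\n' "$result"
```

This version does not depend on GNU extensions, at the cost of not editing the file in place.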
I am trying to trim a text file, but the following command has not worked:
FIND "word1" C:\Users\Username\Desktop\test.txt | IF EXIST "word1" (DEL "word1")
The syntax is incorrect; I have tried many different combinations with no luck.
If you are trying to remove specific text from a file, you can use sed (there are versions available for Windows such as this one). For example, to remove all instances of "word1":
sed -e "s/word1//g" inputfile > outputfile
Or if you want to only remove "word1" when it is not embedded in other text:
sed -e "s/\bword1\b//g" inputfile > outputfile
The second one uses \b to indicate word boundaries. Note that in a Windows command prompt, you need to enclose the sed script in double quotes.
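The effect of the word boundaries can be seen on a single made-up line. This sketch is written for a Unix-style shell and assumes GNU sed, since \b is a GNU extension; at a Windows command prompt the same sed scripts would be wrapped in double quotes, as noted above.

```shell
#!/bin/sh
input='word1 sword1 word1s word1'

# Plain pattern: removes word1 everywhere, even inside longer words
all_gone=$(printf '%s\n' "$input" | sed -e 's/word1//g')

# \b on both sides: removes word1 only where it stands alone
whole_only=$(printf '%s\n' "$input" | sed -e 's/\bword1\b//g')

printf '[%s]\n[%s]\n' "$all_gone" "$whole_only"
```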