How to remove repeated padding characters in a file with script? - shell

I have a binary file, and the tail always padded by a special character, like as '0x1a'(^Z). I want to remove all
the padding characters from tail.
The problem is how to count the number of repeated bytes, then use
truncate -s -NUMBER_OF_PADDINGS file
to delete them.
Write a C program to count and remove is possible, but that maybe too high-weight, so is there a shell command to do this kind of work?
Part of the file like this
hexdump -v -e '/1 "%X"' file
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007C40C0E89616200000210111011001140000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000FF00042A70470000000000000000A00000000000000032000510001F0000000D000900005000B10001A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A1A

Related

Bash script which adds space inside long words in Pages file

I like to convert documents to EPUB format because it is easier for me to read. However, if I do this for for example some code documentation, some really long lines of code are not readable in the EPUB, because they trail off-screen. I would like to automatically insert spaces in any words in a text file (specifically, a Pages document) over a certain length, so they are reduced to say, 10 character words, at maximum. Then, I will convert that Pages document to an EPUB.
How can I write a bash script which goes through a Pages document and inserts spaces into any word longer than, perhaps, 10 characters?
sed is your friend:
$ cat input.txt
a file with a
verylongwordinit to test with.
$ sed 's/[^[:space:]]\{10\}/& /g' input.txt
a file with a
verylongwo rdinit to test with.
For every sequence of 10 non-whitespace characters in each line, add a space after (The & in the replacement text is itself replaced with the matched text).
If you want to change the file inline instead of making a copy, ed comes into play:
ed input.txt <<'EOF'
s/[^[:space:]]\{10\}/& /g
w
EOF
(Or some versions of sed take an -i switch for inline editing)

How do I remove a piece of a string using sed?

I have a need to replace all occurrences of "StoreId..." that have store numbers with leading zeros with store numbers without leading zero's. I've done it successfully using the below:
sed -i '/StoreId=\"0022\"/c\StoreId=\"22\"' $1
But doing it like this is just awful as I need a line for each store number and the script will cycle through each sed statement... it just takes forever.
I would like to have a line like this:
StoreId="0343"
And rewrite it like this:
StoreId="343"
(or StoreId="0030" to StoreId="30")
Basically I just want to drop the leading zeroes in one line. I tried reading through the SED manual, but I'm just not smart enough. Any help would be appreciated.
You don't need to match anything after the leading zeroes.
sed -i -r 's/StoreId="0+/StoreId="/' "$1"
0+ matches a sequence of at least 1 0 characters.

Use sed to count periods, commas, and numbers?

I have a file that looks like this:
19.217.179.33,175.176.12.8
253.149.205.57,174.210.221.195
222.118.178.218,255.99.100.202
241.55.199.243,167.98.204.104
38.224.198.117,21.11.184.68
Each line is 2 IP addresses, separated by a comma. So, each line should meet these requirements:
Has 1 comma.
Has 6 periods.
Has ONLY numbers, commas, and periods.
If a line is missing a period, has more/less than one commas, has a letter, is blank, or anything like that - it isn't correct. Basically I just want to use sed or something similar to loop through each line in the file and make sure each of them meets the above requirements.
Is this something that can be done with sed? I know you can use it to delete files that do/don't have matching strings, but I wasn't sure about counting specific characters or verifying that a line only has certain characters.
Any help would be greatly appreciated. Thanks!
I think grep is a better tool for this. You just want to ensure that each line matches a particular regex, so invert the grep with -v and label the input invalid if any line gets output. Something like:
grep -qvE '^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$' input || echo input is valid
You can simplify that a bit:
IP='([0-9]{1,3}\.){3}[0-9]{1,3}'
grep -qvE "^$IP,$IP$" input || echo input is valid
Or if you are more interested in invalid data:
grep -qvE "^$IP,$IP$" input && echo input is invalid
What I'd do is to think up a regular expression that fits the 'proper' lines, and omits them from printing. Like this:
sed -r '/^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$/d' file
Everything that remains is a wrong line.
Here's the recipe in more detail:
[0-9]{1,3} between one and three digits
\. literal period (just the period is a wildcard and matches any character)
(...){3} three repetitions of something, so together
([0-9]{1,3}\.){3}[0-9]{1,3} makes up something that looks like an IP address. (Though note that it doesn't enforce the <256 rule, so 999.999.999.999 matches.)
/^ ... $/ the match needs to start at the beginning of the line and run until its end.
'/ ... /d' print everything except lines that match what's inside the two slashes
-r is needed to recognise the {1,3} syntax.
This will find and print the lines that are wrong. If you want to delete the wrong lines, you can easily invert this:
sed -i.bak -n -r '/^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$/p' file
-i.bak means keep a backup, but overwrite the input file
-n means don't output anything unless expressly directed to output, and
/ ... /p output all the lines that match this regex.
If you would like to display only information about file contents correctness , you can use this command:
sed -n -r '/^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$/!{a \
FILE IS INCORRECT
;q;};$aFILE IS OK'
It's modified version of #chw21 answer, but displays only information text:
FILE IS INCORRECT, or
FILE IS OK.

Cut Typed Line Numbers From Multiple Lines in a Text File

I have a long text file that has numbers in front of each of the paths listed in the file like the example below:
1) /some/path/here/file.txt
1) /some/path/here/1file.edf
2) /some/another_path/here/2file.txt
3) /some/other_path/here/3file.txt
3) /some/other_path/here/4file.edf
3) /some/other_path/here/5file.edf
...
This file continues for thousands of lines. What I have to do is cut the number off the first part of each of these lines so that I could tar the list of files without the numbers interfering. Is there a way that I could do this either using Shell Commands of Emacs?
Regular expressions are your friend here. You can do this in emacs or on the command line. In emacs, if you press C-M-% (control+alt+shift+5) or run query-replace-regexp it will prompt you for a regex to find, and another to replace. If you use ^[0-9]*) (note that there is a space at the end of this), and then leave "repalce" empty, you will replace the number, the paren, and the space with nothing. If you start at the top of the file, you can type ! to replace all, or go one by one to check that it's working for a bit first.
In the regexp, ^ matches the beginning of the line [0-9] means match a single character in the range 0 through 9, the * means "match any number of the previous thing", then the ) matches a literal paren (in emacs this is the default, in shell you'd likely need to escape this with \).
Something like:
cat list.txt | cut -f 2 -d' ' | xargs tar czf archive.tar.gz

How do I alter the n-th line in multiple files using SED?

I have a series of text files that I want to convert to markdown. I want to remove any leading spaces and add a hash sign to the first line in every file. If I run this:
sed -i.bak '1s/ *\(.*\)/\#\1/g' *.md
It alters the first line of the first file and processes them all, leaving the rest of the files unchanged.
What am I missing that will search and replace something on the n-th line of multiple files?
Using bash on OSX 10.7
The problem is that sed by default treats any number of files as a single stream, and thus line-number offsets are relative to the start of the first file.
For GNU sed, you can use the -s (--separate) flag to modify this behavior:
sed -s -i.bak '1s/^ */#/' *.md
...or, with non-GNU sed (including the one on Mac OS X), you can loop over the files and invoke once per each:
for f in *.md; do sed -i.bak '1s/^ */#/' "$f"; done
Note that the regex is a bit simplified here -- no need to match parts of the line that you aren't going to change.
XARgs will do the trick for you:
http://en.wikipedia.org/wiki/Xargs
Remove the *.md from the end of your sed command, then use XArgs to gather your files one at a time and send them to your sed command as a single entity, sorry I don't have time to work it out for you but the wikiPedia article should show you what you need to know.
sed -rsi.bak '1s/^/#/;s/^[ \t]+//' *.md
You don't need g(lobally) at the end of the command(s), because you wan't to replace something at the begin of line, and not multiple times.
You use two commands, one to modify line 1 (1s...), seperated from the second command for the leading blanks (and tabs? :=\t) with a semicolon. To remove blanks in the first line, switch the order:
sed -rsi.bak 's/^[ \t]+//;1s/^/#/' *.md
Remove the \t if you don't need it. Then you don't need a group either:
sed -rsi.bak 's/^ +//;1s/^/#/' *.md
-r is a flag to signal special treatment of regular expressions. You don't need to mask the plus in that case.

Resources