Variable XFILEBASE64 has base64 encoded content and I want to replace some string with that base64 content.
Sure enough base64 is packed with special characters, and I've tried $'\001' as delimiter, still getting the error message. Any suggestions?
XFILEBASE64=`cat ./sample.xml | base64`
cat ./template.xml | sed "s$'\001'<Doc>###%DOCDATA%###<\/Doc>$'\001'<Doc>${XFILEBASE64}<\/Doc>$'\001'g"
> sed: -e expression #1, char 256: unterminated `s' command
EDIT: Looks like problem has nothing to do with sed, it must be hidden in base64 operations.
sample.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<a>testsdfasdfasdfasfasdfdasfdads</a>
To reproduce the problem:
foo=`base64 ./sample.xml`
echo $foo | base64 --decode
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<base64: invalid input
The problem was in base64 encoding, -w 0 option of base64 did the trick.
cat ./sample.xml | base64 -w 0
Base64 has only three special characters (wikipedia) - usually +, / and =. You can use for example #, , & or ; with no problem.
Tempo64="$( echo "${XFILEBASE64}" | sed 's/[\\&*./+!]/\\&/g' )"
sed "s!<Doc>###%DOCDATA%###</Doc>!<Doc>${Tempo64}</Doc>!g" ./template.xml
should work and posix compliant
The command
sed "s$'\001'<Doc>###%DOCDATA%###<\/Doc>$'\001'<Doc>${XFILEBASE64}<\/Doc>$'\001'g"
should be written as
sed s$'\001'"<Doc>###%DOCDATA%###<\/Doc>"$'\001'"<Doc>${XFILEBASE64}<\/Doc>"$'\001'g
That's to say, $'...' in double quotes are not special.
Adding to user52028778's comment as I had to do more reading to diagnose what the issue was. The problem is in the base64 command. The default base64 command includes line wraps after 76 characters for some linux distros. This is inserting a line termination into the sed command later when the variable is used causing it to fail.
cat ./sample.xml | base64 --wrap=0
https://linux.die.net/man/1/base64
Related
This is the output of cat command and I don't know what this special character is called that is at the end of the file to even search for. How to remove this special character in bash?
EDIT:
Here is the actual xml file(I am just copy pasting):
<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types>
<name>ApexClass</name>
<members>CreditNotesManager</members>
<members>CreditNotesManagerTest</members>
</types>
<version>47.0</version>
</Package>%
It's unclear how the % (percent sign) is ending up in your file; it's easy to remove with sed:
sed -i '' 's/\(</.*>\)%.*/\1/g' file.xml
This will remove the percent and re-save your file. If you want to do a dry-run omit the -i '' portion as this is tells sed to save the file in-line.
As mentioned in the comments, there are many ways to do it. Just be sure you aren't removing something that you want to keep.
If it is just at the last line, this should work. Using ed(1)
printf '%s\n' '$s/%//' w | ed -s file.xml
If you don't need to save changes, you could use grep:
grep -v "%" <file.xml
This uses grep along with it's inverse matching flag -v. This method will remove all instances of the character % and print the result to STOUT. The < character is a method to tell grep which file you're talking about.
EDIT: actually you don't even need the redirection, so:
grep -v "%" file.xml
This is actually a feature of zsh, not bash.
To disable it, unsetopt prompt_cr prompt_sp
The reverse prompt character showing up means that line had an end-of-file before a final ascii linefeed (newline) character.
How to remove this special character at the end of the file
I am trying to make a bash script work on MinGW and it seems the shell is not able to decode something like the following.
t=$(openssl rand -base64 64)
echo "$t" | base64 --decode
Resulting in,
Ԋ7▒%
▒7▒SUfX▒L:o<x▒▒䈏ţ94V▒▒▒▒uW;▒▒pxu▒base64: invalid input
Interestingly, if I output the base64 character and run the command as such, it works.
echo "+e5dcWsijZ83uR2cuKxIDJTTiwTvqB7J0EJ63paJdzGomQxw9BhfPvFTkdKP9I1o
g29pZKjUfDO8/SUNt+idWQ==" | base64 --decode
Anybody knows what I am doing wrong?
Thanks
I solved the above case by passing --ignore-garbage flag to the base64 decode. It ignores the non-alphabet characters.
echo "$t" | base64 --decode --ignore-garbage
However, I would still like to know how did I create "garbage" ;) ?
I think what has happened here is the base64 string contains some embedded spaces, and that causes the actual "invalid input" w (and what you observe to as garbage.)
The openssl rand -base64 64 command introduces some newlines (not spaces), for example,
openssl rand -base64 64 > b64.txt
... then viewing the b64.txt file in an editor I see two separate lines
tPKqKPbH5LkGu13KR6zDdJOBpUGD4pAqS6wKGS32EOyJaK0AmTG4da3fDuOI4T+k
abInqlQcH5k7k9ZVEzv8FA==
... and this implies there is a newline character between the 'k' and 'a'
So the base64 string has this embedded newline. The base64 -d can handle newlines (as demonstrated by your successful example), but, it cannot handle space char.
The newlines can get translated to spaces by some actions of the shell. This is very likely happening by the echo $t I.e. if t has newlines inside it, then echo will just replace then with single space. Actually, how it behaves can depend on shell options, and type of string quotes, if any, applied.
To fix the command, we can remove the newline before piping to the base64 -d command.
One way to do that is to use tr command, e.g. the following works on Linux:
t=$(openssl rand -base64 64 | tr -d '\n')
echo $t | base64 -d
... or alternatively, remove the spaces, again using tr
t=$(openssl rand -base64 64)
echo $t | tr -d ' ' | base64 -d
I'm trying to replace a special character with sed, the character are Þ to replace for ;
The lines of the file are, for example;
0370ÞA020Þ4000011600ÞRED USADOÞ0,00Þ20190414
0370ÞA020Þ4000011601ÞRED USADOÞ0,00Þ20190414
0370ÞA020Þ4000011602ÞRED USADOÞ0,00Þ20190414
Thanks!
Edit
Its worked and solved.
Thanks!!!
Try this - simple substitution work for me
sed 's/Þ/;/g'
That's the job tr was created to do but look at these results:
$ tr 'Þ' ';' < file
0370;;A020;;4000011600;;RED USADO;;0,00;;20190414
0370;;A020;;4000011601;;RED USADO;;0,00;;20190414
0370;;A020;;4000011602;;RED USADO;;0,00;;20190414
$ sed 's/Þ/;/g' < file
0370;A020;4000011600;RED USADO;0,00;20190414
0370;A020;4000011601;RED USADO;0,00;20190414
0370;A020;4000011602;RED USADO;0,00;20190414
tr seems to consider every Þ as being 2 duplicate characters - sed may think the same but while tr is converting a set of chars to a set of chars, sed is converting a regexp to a string and so even if it considers Þ to be 2 characters wide it'll still do what you want. So just an interesting warning about trying to use tr to replace non-ASCII characters - YMMV!
if your data in 'd' file, try gnu sed:
sed -E 'y/Þ/;/' d
If I base64 encode a string which consists of seven characters e.g. abcdefg with the website https://www.base64encode.org/ the result is YWJjZGVmZw==. The trailing "==" characters are padding because the number of input characters cannot be divided by 7.
I've to reproduce this result in bash. So I've tried the following command:
echo "abcdefg" | base64
However, the result is different now:
YWJjZGVmZwo=
I'm using Ubuntu where base64 (GNU coreutils) 8.25 is installed.
I would be glad if someone could give me a hint.
I've just noticed that the reason for the described behaviour is the newline which echo writes at the end. So the correct command is the following which suppress the newline
echo -n "abcdefg" | base64
Then the output is like I expect it:
YWJjZGVmZw==
It is also tricky how a here-string will produce unexpected output. It is probably missing the null character \0.
$ base64 <<<"abcdefg"
YWJjZGVmZwo=
$ printf 'abcdefg' | base64
YWJjZGVmZw==
This question already has answers here:
Base 64 encoding from command line gives different output than other methods
(2 answers)
Closed 7 days ago.
Can anyone explain this?
[vagrant#centos ~]$ echo "10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm" | base64
MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQo=
[vagrant#centos ~]$ echo "MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQ==" | base64 -d
10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm
The first string encodes with o= at the end, but the encoded string with == at the end instead, decodes to the same original string...
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
Compare these
echo "10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm" | base64 | od -c
echo "MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQ==" | base64 -D | od -c
echo "MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQo=" | base64 -D | od -c
If we don't send the newline when using echo the o is missing, have a look at this...
echo -n "10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm" | base64
It's the newline that's being encoded that gives the o in o=
The = is padding and it might not always be there. Have a look here..
https://en.wikipedia.org/wiki/Base64#Padding
Different implementations may also use different padding characters. You can see some of the differences here
https://en.wikipedia.org/wiki/Base64#Variants_summary_table
From the RFC
3.2. Padding of Encoded Data
In some circumstances, the use of padding ("=") in base-encoded
data is not required or used. In the general case, when
assumptions about the size of transported data cannot be made,
padding is required to yield correct decoded data.
Implementations MUST include appropriate pad characters at the end
of encoded data unless the specification referring to this document
explicitly states otherwise.
The base64 and base32 alphabets use padding, as described below in
sections 4 and 6, but the base16 alphabet does not need it; see
section 8.
When you use $echo, a newline is appended to the end of the output. This newline character is part of the base64 encoding. When you change the 'o' to a '=', you're changing the encoding of the newline character. In this case, the character it decodes to is still not a printable character.
In my terminal, decoding the two string yields the same output, but the string ending in "o=" has a newline, and the string ending in "==" does not.
$> echo "MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQo=" | base64 -d
10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm
$> echo "MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQ==" | base64 -d
10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm $>
Using $echo -n would allow you to pass the string into base64 without the trailing newline. The string without the newline encodes to the string ending in "==".
On Macs, I noticed I had to append "\c" to the end of my string to get it to work, like this:
[vagrant#centos ~]$ echo "10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm\c" | base64
The result was:
MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbVxjCg==
PHP also encodes it properly, which leads me to believe there is some issue with the base64 program in bash, as I haven't found any mention of 'o' somehow being used as a padding character.
php > echo base64_encode("10IXydrdsc4DVAgxzrXldNw5GMeVAHKG:TAO04JuWz4PBVWYm");
MTBJWHlkcmRzYzREVkFneHpyWGxkTnc1R01lVkFIS0c6VEFPMDRKdVd6NFBCVldZbQ==