I'm searching (without success) for a script that would work like a batch file and let me prepend a BOM to a UTF-8 text file if it doesn't already have one.
Neither the language it is written in (Perl, Python, C, bash) nor the OS it runs on matters to me. I have access to a wide range of computers.
I've found a lot of scripts that do the reverse (strip the BOM), which seems kind of silly to me, as many Windows programs have trouble reading UTF-8 text files that lack a BOM.
Did I miss the obvious?
Thanks!
The easiest way I found for this is
#!/usr/bin/env bash
#Add BOM to the new file
printf '\xEF\xBB\xBF' > with_bom.txt
# Append the content of the source file to the new file
cat source_file.txt >> with_bom.txt
I know it uses an external program (cat), but it does the job easily in bash.
Tested on OS X, but it should work on Linux as well.
NOTE: it assumes that the file doesn't already have a BOM (!)
I wrote this addbom.sh using the 'file' command and ICU's 'uconv' command.
#!/bin/sh
if [ $# -eq 0 ]
then
echo usage $0 files ...
exit 1
fi
for file in "$@"
do
echo "# Processing: $file" 1>&2
if [ ! -f "$file" ]
then
echo Not a file: "$file" 1>&2
exit 1
fi
TYPE=`file - < "$file" | cut -d: -f2`
if echo "$TYPE" | grep -q '(with BOM)'
then
echo "# $file already has BOM, skipping." 1>&2
else
( mv "${file}" "${file}~" && uconv -f utf-8 -t utf-8 --add-signature < "${file}~" > "${file}" ) || { echo Error processing "$file" 1>&2 ; exit 1; }
fi
done
Edit: Added quotes around the mv arguments. Thanks @DirkR, and glad this script has been so helpful!
(Answer based on https://stackoverflow.com/a/9815107/1260896 by yingted)
To add BOMs to all the files that start with "foo-", you can use sed. sed's -i option edits the files in place and can keep a backup if you give it a suffix (e.g. -i.bak).
sed -i '1s/^\(\xef\xbb\xbf\)\?/\xef\xbb\xbf/' foo-*
If you know for sure there is no BOM already, you can simplify the command:
sed -i '1s/^/\xef\xbb\xbf/' foo-*
Make sure UTF-8 is really what you need, because e.g. UTF-16 uses a different BOM (otherwise see "How can I re-add a unicode byte order marker in linux?").
As an improvement on Yaron U.'s solution, you can do it all on a single line:
printf '\xEF\xBB\xBF' | cat - source.txt > source-with-bom.txt
The cat - part tells cat to read standard input first, so the BOM piped in from the printf command ends up in front of the contents of source.txt. Tested on OS X and Ubuntu.
I find it pretty simple. Assuming the file is always UTF-8 (you're not detecting the encoding, you know the encoding):
Read the first three bytes. Compare them to the UTF-8 BOM sequence (Wikipedia says it's 0xEF, 0xBB, 0xBF).
If they match, write them to the new file and then copy everything else from the original file to the new file.
If they differ, first write the BOM, then write the three bytes, and only then copy everything else from the original file to the new file.
In C, fopen/fclose/fread/fwrite should be enough.
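The steps above can also be sketched in a few lines of bash instead of C. This is only an illustration under the same assumption (the input is known to be UTF-8); add_bom is a made-up helper name, and the sketch relies on head -c, od and mktemp being available.

```shell
#!/usr/bin/env bash
# add_bom FILE: prepend a UTF-8 BOM to FILE unless it already starts with one.
# (add_bom is a hypothetical helper name, not something from the answers here.)
add_bom() {
    local file=$1 head3 tmp status
    # Render the first three bytes as hex, e.g. "efbbbf" for a BOM.
    head3=$(head -c 3 < "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$head3" = "efbbbf" ]; then
        return 0    # BOM already present: leave the file untouched
    fi
    tmp=$(mktemp) || return 1
    # Octal escapes are used because not every printf understands \x.
    # Writing back with cat (rather than mv) keeps the original permissions.
    { printf '\357\273\277'; cat -- "$file"; } > "$tmp" &&
        cat -- "$tmp" > "$file"
    status=$?
    rm -f -- "$tmp"
    return "$status"
}
```

Calling add_bom twice on the same file should leave it with a single BOM, since the second call returns early.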
Open the file in Notepad. Click Save As. Under Encoding, select "UTF-8(BOM)" (it is listed below plain "UTF-8").
I've created a script based on Steven R. Loomis's code.
https://github.com/Vdragon/addUTF-8bomb
Check out https://github.com/Vdragon/C_CPP_project_template/blob/development/Tools/convertSourceCodeToUTF-8withBOM.bash.sh for an example of using this script.
In VBA (Access):
Dim name As String
Dim tmpName As String
tmpName = "tmp1.txt"
name = "final.txt"
Dim file As Object
Dim finalFile As Object
Set file = CreateObject("Scripting.FileSystemObject")
Set finalFile = file.CreateTextFile(name)
'Add BOM
finalFile.Write Chr(239)
finalFile.Write Chr(187)
finalFile.Write Chr(191)
'transfer text from tmp to final file:
Dim tmpFile As Object
Set tmpFile = file.OpenTextFile(tmpName, 1)
finalFile.Write tmpFile.ReadAll
finalFile.Close
tmpFile.Close
file.DeleteFile tmpName
Here is the batch file I use for this purpose in Windows. It should be saved with ANSI (Windows-1252) encoding for the /p= part.
@echo off
if [%~1]==[] goto usage
if not exist "%~1" goto notfound
setlocal
set /p AREYOUSURE="Adding UTF-8 BOM to '%~1'. Are you sure (Y/[N])? "
if /i "%AREYOUSURE%" neq "Y" goto canceled
:: Main code is here. Create a temp file containing the BOM, then append the requested file contents, and finally overwrite the original file
(echo|set /p=ï»¿)>"%~1.temp"
type "%~1">>"%~1.temp"
move /y "%~1.temp" "%~1" >nul
@echo Added UTF-8 BOM to "%~1"
pause
exit /b 0
:usage
@echo Usage: %0 ^<FILE_NAME^>
goto end
:notfound
@echo File not found: "%~1"
goto end
:canceled
@echo Operation canceled.
goto end
:end
pause
exit /b 1
You can save the file as e.g. C:\addbom.bat and use the following .reg file to add it to the right-click context menu of all files:
Windows Registry Editor Version 5.00
[HKEY_CLASSES_ROOT\*\Shell\Add UTF-8 BOM]
[HKEY_CLASSES_ROOT\*\Shell\Add UTF-8 BOM\command]
@="C:\\addbom.bat \"%1\""
So here is my code:
@echo off
set WAL=wallpaper.txt
echo ‰PNG >> %WAL%
echo >> %WAL%
echo >> %WAL%
echo IHDR I ZòRã pHYs Ä Ä•+ iIDATX…í—®â#Çÿws áÈî Qh’TµO#RU[GÕ¸6!iUÈ…ìŒÅ…jæ >> %WAL%
echo véÉL/_eÙÝ\²?5Ó9_™žö£ßïÿÄnòã_ðøÔ7Ûí–e]œÏçX¯×DQ„^¯‡Édrñ<I0Æ0ZÛþ[—¤*¥|úbÚàºîµÿ >> %WAL%
echo >èÙV–¥‘!RJZçyŽN§Û¶é¬)˜™Æ9GÇHÓŽãžëºd[)e¼Ì$IÈGóìRexž!„¡ÀX7iÕ“²,CY–°m›ŒFQDÁJ)éÌ÷}¸®‹<ÏIçZÉ5‡°mžçÁ¶mH)iûŒ¢UU]G)Ev€sËB Š"0ÆH/ÏsdYv5ž‡3‰sÆ9€Ýn‡ñxL{ÆØ£æn†! #¥6› >> %WAL%
echo „ €¢(Ðét ¾ïzeYÒ™eY8†n· Øl6¤·ZÇ18çäG§u¹Åqlì•R ν¥Î&àvúþ.z 5KJ/Ë^¯àü‚-ËÂñx$9Çq¨¤ïÑú’êš¾DÝ„“$A–e—׳Ôå¯ûÕaŒÑ¥åyn¡4M±X,òópOB ,KÌf³»²UUÑút:]ý¬xº/={c˜ÏçF¬)ŠÂh÷h•I“ÉÛíÖ˜bõôi¦}ðb±€ã8R’ì«ÃqS<z\ËåA zVgïûFk ¾NFwý-i~çM§SAðT¯|ËßÎ9 `¿ßÓ³n·KC¦-÷w#4M¿Lâg'îÛ–Û+yËr{5¿ ÇË<ÐÉ•~ IEND®B`‚ >> %WAL%
pause
ren C:\Users\Moi\Desktop\test\wallpaper.txt wallpaper.png
pause
exit
What it does is it writes something into a text file and then converts it to a .png
(the text is what I get when I convert a .png to a .txt)
But it only writes
‰PNG
in the file and doesn't convert it to a .png
Is there another way to do it? Am I doing it wrong? All help is appreciated.
That won't work, because echo appends a CRLF to the end of every line and, more generally, does not cope with control characters.
The "good" method, with only native tools, is:
Encoding image to batch:
Using PowerShell, encode your file (the original image, in your case) in Base64.
Split the Base64 file into lines of at most 4096 characters.
Assign those lines to consecutive variables in your batch file.
Decoding image to disk:
Echo all the previously encoded variables into a temporary file (e.g. in the %TEMP% folder).
Using PowerShell, convert the Base64 file to a binary file.
Delete the temporary Base64 file.
Do what you want with the binary file.
Reference:
Base64 encoding/decoding with Powershell
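To see why the Base64 detour helps, here is the same round trip sketched for a Unix-like shell. This is not the PowerShell route described above: it swaps in the base64 command-line tool (assumed here to accept -d for decoding, as the GNU coreutils version does), and encode_file / decode_file are made-up helper names.

```shell
#!/usr/bin/env bash
# encode_file SRC: print the bytes of SRC as Base64 text on stdout.
# (encode_file and decode_file are hypothetical helper names.)
encode_file() {
    base64 < "$1"
}

# decode_file DEST: read Base64 text on stdin and write the raw bytes to DEST.
# Assumes a base64 tool that accepts -d for decoding.
decode_file() {
    base64 -d > "$1"
}
```

The encoded text contains only letters, digits, "+", "/" and "=", so it can be passed through echo and text files without being mangled, and decoding gives back the original bytes, including the 0x89 and control characters at the start of a PNG.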
Can this: command > file.txt 2>&1 be altered in such a way to have it prepend some "header" (and possibly append some "tail") text to each stream?
For example, "file.txt" would read:
[std_output]Hello world![/std_output]
[std_error]Crash![/std_error]
EDIT: The caveat is that this operation should only involve writing to a SINGLE file (i.e. no other temp files should be involved). The writing would be preferred to be atomic, though multiple writes via multiple commands that can be compounded into one single command is also acceptable.
>file.txt echo [std_output]
>tempfile.txt echo [std_error]
command >>file.txt 2>>tempfile.txt
>>file.txt echo [/std_output]
>>tempfile.txt echo [/std_error]
type tempfile.txt >>file.txt
del tempfile.txt
Is possibly a solution. It's so simple I'll not bother to explain it.
For a given set of files ending in .bam within /a/given/path/ I would like to echo a specific string of characters that is variable in length. I have tried the following unsuccessfully:
for dir in /a/given/path/*.bam; do echo ${dir##path/%%.bam}; done
The intention is to echo the bam filenames (from path/ onwards) without the .bam extension, but it just echoes the entire path. If I change to:
for dir in /a/given/path/*.bam; do echo ${dir%%.bam}; done
it will echo
/a/given/path/filename1
/a/given/path/filename22
Ideally I will be able to echo the filename only even for filenames of various lengths (which is preventing me from using echo ${dir:15:9}, for example).
Thanks for your help.
I can think of two ways to do this:
for dir in /a/given/path/*.bam; do dir="${dir##*/}"; echo "${dir%.bam}"; done
(two-step substitution) or
( cd /a/given/path && for dir in *.bam; do echo "${dir%.bam}"; done )
Unix has commands "dirname" and "basename" that can be combined to do what (I think) you are asking for.
for f in "/a/given/path"/*.bam;do
echo $(basename "$(dirname "$f")")/$(basename "$f" .bam);done
All the double-quotes are to handle possible spaces in path- and filenames.
Use basename to get the filename and then cut to discard .bam:
for dir in /a/given/path/*.bam; do basename "$dir" | cut -f 1 -d .; done
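One difference between the approaches shows up when a filename contains more than one dot: cutting on "." keeps only the part before the first dot, while the % expansion removes just the trailing .bam. A small sketch to compare the two; stem_of is a made-up helper name and sample.sorted.bam is a made-up filename.

```shell
#!/usr/bin/env bash
# stem_of PATH: print the filename without its directory or a trailing .bam.
# (stem_of is a hypothetical helper name.)
stem_of() {
    local name=${1##*/}            # drop everything up to the last slash
    printf '%s\n' "${name%.bam}"   # drop one trailing ".bam", keep other dots
}

stem_of /a/given/path/sample.sorted.bam                     # prints: sample.sorted
basename /a/given/path/sample.sorted.bam | cut -f 1 -d .    # prints: sample
```

For names with a single dot, such as filename22.bam, both give the same result.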
I have this problem: when I put my variables into an external "config-style" file and import it with . /var/scripts/siDiagConfig.sh, the variables don't work properly.
For example, I have a variable MTU=1500, and when I echo it, it prints "1500", which is correct. But when I use the variable in a grep command like somethingawesome | grep ${MTU} -c, the variable is not recognized properly. In this example, the console prints 0 instead of 2. When I reassign the variable with MTU=1500, the code works without any problems.
Any idea what I could have missed?
Is there any other way I could put my variables in an external file?
My siDiagConfig.sh file:
#!/bin/bash
....
export MTU=1500
....
Edit (solution):
I remembered that I had created the file on my Windows system. I copied the code from siDiagConfig.sh, created a new file on the Unix system, and pasted the code there. Now it works without any problems =)
Thanks for the help!
[gigauser@gigabox : /scm/gigafolder/toratora/test_aks]
cat conf.config ; echo -----; cat testfile.txt ; echo ------; cat mainfile.sh ; echo --------; ./mainfile.sh
export GIGA=giga
export fifa=FIFA
I'm GIGA
I like fifa
#!/bin/bash
. conf.config
echo GIGA = $GIGA
echo fifa = $fifa
cat -n testfile.txt
echo
echo -- Now lets grep = $GIGA with case insensitive On
echo
echo -`grep -in "${GIGA}" testfile.txt`-;
echo =`grep -ic "${GIGA}" testfile.txt`=
echo
echo Now again but with case insensitive Off
echo -`grep -n "${GIGA}" testfile.txt`-;
echo =`grep -c "${GIGA}" testfile.txt`=
GIGA = giga
fifa = FIFA
1 I'm GIGA
2 I like fifa
-- Now lets grep = giga with case insensitive On
-1:I'm GIGA-
=1=
Now again but with case insensitive Off
--
=0=
[gigauser@gigabox : /scm/gigafolder/toratora/test_aks]
>
Since feature requests to mark a comment as an answer remain declined, I copy the above solution here.
Yes; that \r is the trouble. It is a carriage return. Convert DOS files to Unix files before executing them on Unix. – Jonathan Leffler
OK, now I remember that I created the file on my Windows system. I just copied the code from siDiagConfig.sh, created a new file on the Unix system, and pasted the code there. Now it works without any problems =) Thanks for the help! – Simons0n
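For anyone hitting the same thing: sourcing a file with Windows (CRLF) line endings leaves a carriage return at the end of each value, so MTU is really 1500 followed by \r. echo hides that, but grep then searches for the carriage return as well. Below is a rough sketch of how one might check for and remove the carriage returns without retyping the file; has_crlf and strip_cr are made-up helper names, and tr is just one way to do the conversion.

```shell
#!/usr/bin/env bash
# has_crlf FILE: succeed if FILE contains at least one carriage return.
# (has_crlf and strip_cr are hypothetical helper names.)
has_crlf() {
    grep -q $'\r' < "$1"
}

# strip_cr SRC DEST: copy SRC to DEST with every carriage return removed.
# SRC and DEST must be different files.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, strip_cr siDiagConfig.sh siDiagConfig.unix.sh (the second name is only an illustration) would write a cleaned copy that can then be sourced.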
I have a bunch of files that are incomplete: the last line is missing an EOL character.
What's the easiest way to add the newline, using any tool (awk maybe?)?
To add a newline at the end of a file:
echo >>file
To add a line at the end of every file in the current directory:
for x in *; do echo >>"$x"; done
If you don't know in advance whether each file ends in a newline, test the last character first. tail -c 1 prints the last character of a file. Since command substitution truncates any final newline, $(tail -c 1 <file) is empty if the file is empty or ends in a newline, and non-empty if the file ends in a non-newline character.
for x in *; do if [ -n "$(tail -c 1 <"$x")" ]; then echo >>"$x"; fi; done
Vim is great for that because if you do not open a file in binary mode, it will automatically end the file with the detected line ending.
So:
vim file -c 'wq'
should work, regardless of whether your files have Unix, Windows or Mac end of line style.
echo >> filename
Try it before mass use :)