character set changing or what? - utf-8

I am a bit confused about this topic. How can what looks like the same sentence give different Google results?
peоple in thе wоrld
First, search for the sentence above on Google and look at the results. Then take this one:
people in the world
Search for this second sentence and notice the differences.
How could it be?

If you dump it as hex, you can see the difference. I just ran it through xxd.
The first sentence:
0000000: 7065 d0be 706c 6520 696e 2074 68d0 b520 pe..ple in th..
0000010: 77d0 be72 6c64 0a w..rld.
The second sentence:
0000000: 7065 6f70 6c65 2069 6e20 7468 6520 776f people in the wo
0000010: 726c 640a rld.
This one appears to be valid ASCII.
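The two look very similar, but the first sentence is not plain ASCII: it is UTF-8 text in which some letters are Cyrillic look-alikes. The bytes d0 be encode U+043E (Cyrillic "о") and d0 b5 encode U+0435 (Cyrillic "е"), where the second sentence has the ASCII bytes 6f and 65.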
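As a rough way to do the same check without reading a hex dump, here is a minimal Python sketch. The function name non_ascii_chars is made up for illustration, and the sample string is rebuilt from the bytes in the xxd output above.

```python
import unicodedata

def non_ascii_chars(text):
    """Return (index, char, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

# The first sentence, rebuilt from the bytes shown in the hex dump.
suspect = b"pe\xd0\xbeple in th\xd0\xb5 w\xd0\xberld".decode("utf-8")
plain = "people in the world"

for hit in non_ascii_chars(suspect):
    print(hit)
print(non_ascii_chars(plain))  # empty list: nothing outside ASCII
```

Any hit in a string that is supposed to be plain English is worth a closer look.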

Related

Jenkins job adding DOS newlines to shell scripts, causing a "bad interpreter" error on UNIX

I am trying to run the following shell script but I am getting the "/bin/bash^M: bad interpreter" error.
Script:
#!/bin/bash
rm -rf tomcat8/webapps/cpproject*
I know that this is happening because I am editing the file on Windows and trying to run it on Unix. I tried all the fixes suggested on Stack Overflow. I tried the following:
Installed the Vim editor and saved the file with it.
In Notepad++, changed the End Of Line option to Unix and tried saving new documents in Unix format.
Installed gedit and saved the file with it.
Installed dos2unix and tried it.
I tried a lot of other ways but nothing worked. Can someone suggest how to fix this issue?
Updated:
The "xxd before_install1.sh | head" command shows the following.
00000000: 2321 2f62 696e 2f62 6173 680a 726d 202d #!/bin/bash.rm -
00000010: 7266 2074 6f6d 6361 7438 2f77 6562 6170 rf tomcat8/webap
00000020: 7073 2f63 7070 726f 6a65 6374 2a ps/cpproject*
Update:
I made sure that I am not running an old file: I changed the file name each time, so I know the new file is the one being executed.
I am using the CodeDeploy Jenkins plugin, which generates the war file and zips it with the scripts and the appspec.yml file. Since I change the file name each time and update appspec.yml with the new name, I am sure that the new file is being uploaded. But it still gives the same error.
Even after the upload to the S3 bucket, I downloaded the zip file, made sure that it contains the latest script, and ran xxd on it, with the following result.
$ xxd before_install5.sh | head
00000000: 2321 2f62 696e 2f62 6173 680d 0a72 6d20 #!/bin/bash..rm
00000010: 2d72 6620 746f 6d63 6174 382f 7765 6261 -rf tomcat8/weba
00000020: 7070 732f 6370 7072 6f6a 6563 742a 0d0a pps/cpproject*..
Update:
@Charles Duffy, thank you for your response. I develop these scripts locally and push them to GitHub. A Jenkins job takes the code from the GitHub repository and packages it into a war file. The "CodeDeploy" plugin then zips this war file together with the shell scripts (under the scripts folder) and the appspec.yml file, and pushes the zip to an S3 bucket in AWS. The following is what is happening.
xxd command in my local system file:
$ xxd before_install5.sh | head
00000000: 2321 2f62 696e 2f62 6173 680a 726d 202d #!/bin/bash.rm -
00000010: 7266 2074 6f6d 6361 7438 2f77 6562 6170 rf tomcat8/webap
00000020: 7073 2f63 7070 726f 6a65 6374 2a0a ps/cpproject*.
xxd command on the file after it is uploaded to GitHub:
$ xxd before_install5.sh | head
00000000: 2321 2f62 696e 2f62 6173 680a 726d 202d #!/bin/bash.rm -
00000010: 7266 2074 6f6d 6361 7438 2f77 6562 6170 rf tomcat8/webap
00000020: 7073 2f63 7070 726f 6a65 6374 2a0a ps/cpproject*.
xxd command on the file after it is pulled by Jenkins from GitHub into the local workspace (in the .jenkins/workspace folder):
$ xxd before_install5.sh | head
00000000: 2321 2f62 696e 2f62 6173 680d 0a72 6d20 #!/bin/bash..rm
00000010: 2d72 6620 746f 6d63 6174 382f 7765 6261 -rf tomcat8/weba
00000020: 7070 732f 6370 7072 6f6a 6563 742a 0d0a pps/cpproject*..
xxd command on the file after it is pushed to the S3 bucket in AWS:
$ xxd before_install5.sh | head
00000000: 2321 2f62 696e 2f62 6173 680d 0a72 6d20 #!/bin/bash..rm
00000010: 2d72 6620 746f 6d63 6174 382f 7765 6261 -rf tomcat8/weba
00000020: 7070 732f 6370 7072 6f6a 6563 742a 0d0a pps/cpproject*..
I see that extra carriage-return bytes appear (each '0a' becomes '0d0a') once Jenkins has taken the code into its local workspace. So at this point it does not seem to be a problem with the CodeDeploy plugin. Is there any way to avoid this?
Thank you.
If the script is called 'myscript.dos', you can delete the carriage returns with tr:
tr -d '\015' < myscript.dos > myscript.unix
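For anyone who has no tr at hand (for example on a Windows build machine), the same deletion can be sketched in Python. This is only an illustration: strip_carriage_returns is a made-up helper name, and the sample bytes are taken from the xxd output in the question.

```python
def strip_carriage_returns(data: bytes) -> bytes:
    """Delete every CR byte (0x0d, octal 015), like tr -d '\\015'."""
    return data.replace(b"\r", b"")

# First 16 bytes of the broken script, as shown by xxd in the question.
dos = bytes.fromhex("23212f62696e2f626173680d0a726d20")
unix = strip_carriage_returns(dos)

print(dos)   # b'#!/bin/bash\r\nrm '
print(unix)  # b'#!/bin/bash\nrm '
```

To convert a real file, read and write it in binary mode ("rb" and "wb") so that Python does not translate the line endings itself.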
@Charles Duffy, thank you for your time and patience in resolving this issue. It is finally working. I am posting the actual problem and solution here so that it will be useful for others.
Work:
I am developing a deployment strategy for AWS using Jenkins and AWS CodeDeploy, with the CodeDeploy plugin in Jenkins. Jenkins creates a war file, and the CodeDeploy plugin creates a zip file containing the war file, appspec.yml and some shell scripts (to stop and start servers, etc.). This zip file is pushed to S3, and from there CodeDeploy is triggered and deploys the war file.
Problem:
The shell scripts that start and stop the server were developed on a Windows system, so they were saved with DOS line endings and would not run on the Unix EC2 machines. I removed the DOS line endings from the scripts using different techniques, but since Jenkins runs locally (on a Windows machine), the line endings were added again when the code was pulled from GitHub.
Solution: See the question above and the answers to it for how to identify whether DOS line endings are being added to shell scripts. Since, in my case, they are added by Jenkins, I have to remove them in Jenkins before it produces the zip file.
Go to Jenkins and configure the shell executable first. Since I have Git installed on my system, I already have a shell executable.
Once this is done, go to the Jenkins job and add an "Execute Shell" build step with the commands shown in the picture below.
This removes all the DOS line endings added to the scripts, and the shell scripts are then executed correctly by CodeDeploy in AWS on the Linux system.
Thank you @Charles Duffy and the others who helped me resolve these issues.
Thank you very much,
Subbu.

in ruby in windows, executing the cmd prompt command 'move' gives error "The syntax of the command is incorrect."

In Ruby on Windows, executing the cmd prompt command 'move' gives the error "The syntax of the command is incorrect."
But it works outside of Ruby:
C:\rubytest>echo asdf>c:\techprogs\azzz.azz
C:\rubytest>del c:\techprogs\azzz.azz
C:\rubytest>echo asdf>c:\techprogs\azzz.azz
C:\rubytest>move /y c:\techprogs\azzz.azz c:\techprogs\autorun.bat
1 file(s) moved.
C:\rubytest>move /y c:\techprogs\azzz.azz c:\techprogs\autorun.bat
The system cannot find the file specified.
C:\rubytest>
All of that above is fine and expected.
Notice I never get an error that says "The syntax of the command is incorrect."
Then try it in Ruby.
I have a simple file with one line:
C:\rubytest>type syntaxcommandincorrect.rb
`move /y c:\techprogs\azzz.azz c:\techprogs\autorun.bat`
C:\rubytest>
But it gives that error about the syntax, whether or not the source file exists:
C:\rubytest>del c:\techprogs\azzz.azz
C:\rubytest>ruby syntaxcommandincorrect.rb
The syntax of the command is incorrect.
C:\rubytest>echo asdf>c:\techprogs\azzz.azz
C:\rubytest>ruby syntaxcommandincorrect.rb
The syntax of the command is incorrect.
C:\rubytest>
The problem here is probably the backslashes, which are escape characters inside interpolated Ruby strings: that covers double-quoted strings but also backtick-style shell commands.
As such your command is being interpreted as:
move /y c:^Iechprogs^Gzzz.azz c:^Iechprogs^Gutorun.bat
Where ^I is "\t" which is a tab character, and ^G is "\a" which is a bell character.
Instead:
`move /y c:\\techprogs\\azzz.azz c:\\techprogs\\autorun.bat`
Now remember that Ruby has a very comprehensive library of functions you can use to address this directly. Don't treat it like a fancy shell scripting language:
require 'fileutils'
FileUtils.mv('c:\techprogs\azzz.azz', 'c:\techprogs\autorun.bat', force: true)
Here I'm using single quotes, which leave \t and \a alone, to avoid the double backslashes, and force: true plays the role of /y. This uses FileUtils.mv, part of a whole package of useful file and directory manipulation utilities.
As a plus you also get proper exceptions if something goes wrong, or an error code if the move failed.
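The escape mechanism itself is not specific to Ruby. As a side illustration only (this is Python, not Ruby, chosen because its ordinary string literals treat \t and \a the same way), the sketch below shows how the path changes and how doubling the backslashes or using a raw string keeps it intact. The variable names are made up for the example.

```python
# In an ordinary string literal, \t becomes a tab (byte 09) and \a a bell (byte 07).
mangled = "c:\techprogs\azzz.azz"

# Doubling the backslashes, or using a raw string, keeps them literal.
escaped = "c:\\techprogs\\azzz.azz"
raw = r"c:\techprogs\azzz.azz"

print(repr(mangled))  # 'c:\techprogs\x07zzz.azz'
print(repr(escaped))  # 'c:\\techprogs\\azzz.azz'
print(escaped == raw)
```

The mangled value contains the same 09 and 07 bytes that the ^I and ^G notation above stands for.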
Added by barlop
Confirming the above: the double backslash fixes it. Running puts `echo copy /y c:\techprogs...` shows what happens with a single backslash. The t of techprogs is removed, as c:\techprogs becomes c:<ascii-9>echprogs, and \autorun becomes <ascii-7>utorun.
C:\rubytest>cmdoutoutwithoutinitbat.rb | xxd
0000000: 6162 6364 6566 670d 0a63 6f70 7920 2f79 abcdefg..copy /y
0000010: 2063 3a09 6563 6870 726f 6773 0775 746f c:.echprogs.uto
0000020: 7275 6e2e 6261 7420 633a 0965 6368 7072 run.bat c:.echpr
0000030: 6f67 7307 7a7a 7a2e 617a 7a0d 0a61 6263 ogs.zzz.azz..abc
0000040: 6465 6667 0d0a 6d6f 7665 202f 7920 633a defg..move /y c:
0000050: 0965 6368 7072 6f67 7307 7a7a 7a2e 617a .echprogs.zzz.az
0000060: 7a20 633a 0965 6368 7072 6f67 7307 7574 z c:.echprogs.ut
0000070: 6f72 756e 2e62 6174 0d0a orun.bat..
C:\rubytest>

SoftQuad DESC or font file binary

I read this question but it didn't help me. I am solving a challenge with two files. The first one is a .png that gives me the upper half of an image. The second file is identified as "SoftQuad DESC or font file binary", and I am sure it should somehow be converted into a .png file to complete the image. I googled and got a hint about magic bytes, but I am unable to match the bytes.
These are the first two rows of output of xxd command
00000000: aaaa a6bb 67bb bf18 dd94 15e6 252c 0a2f ....g.......%,./
00000010: fe14 d943 e8b5 6ad5 2264 1632 646e debc ...C..j."d.2dn..
These are the last two rows of output of xxd command
00001c10: 7a05 7f4c 3600 0000 0049 454e 44ae 4260 z..L6....IEND.B`
00001c20: 82 .
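For reference when matching magic bytes: a PNG file starts with the 8-byte signature 89 50 4e 47 0d 0a 1a 0a and ends with an empty IEND chunk, 00 00 00 00 49 45 4e 44 ae 42 60 82. A minimal Python sketch of that comparison, using the bytes from the two dumps above (the helper names are made up for illustration, and this only checks the markers, it does not repair the file):

```python
PNG_SIGNATURE = bytes.fromhex("89504e470d0a1a0a")
PNG_TRAILER = bytes.fromhex("0000000049454e44ae426082")  # empty IEND chunk plus its CRC

def starts_like_png(data: bytes) -> bool:
    """True if the data begins with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)

def ends_like_png(data: bytes) -> bool:
    """True if the data ends with the IEND chunk that closes a PNG."""
    return data.endswith(PNG_TRAILER)

# First row and last rows of the xxd output shown above.
head = bytes.fromhex("aaaaa6bb67bbbf18dd9415e6252c0a2f")
tail = bytes.fromhex("7a057f4c360000000049454e44ae426082")

print(starts_like_png(head))
print(ends_like_png(tail))
```

On these bytes the end of the file matches the PNG trailer while the start does not match the PNG signature.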

netsh add sslcert parameter is incorrect from cmd

Note that, while there is a lot on this issue already, it invariably covers either using this from powershell (where braces and dashes can be an issue) or a typo in the docs where ipport is followed by a colon.
I am in cmd
C:> netsh http add sslcert ipport=0.0.0.0:8180 appid={12345678-db90-4b66-8b01-88f7af2e36bf} certhash=‎1234567890
The parameter is incorrect.
In actual usage I'm using the correct certhash I got from my certificate store, not the obviously fake one above.
So what is going on? Is there a way to get more info?
Explained in my comment:
I'm using the correct certhash… Supposedly "The SHA hash of the certificate. This hash is 20 bytes long and specified as a hex string" instead of fake 1234567890?
However, there is a harmful format symbol Left-To-Right Mark (Unicode
U+200E) after Equals Sign in your certhash=‎1234567890
Screenshot taken from Unicode Analyzer:
Another way to detect invisible characters using my Alt KeyCode Finder script:
==> mycharmap h=‎1
Ch Unicode Alt? CP IME Alt Alt0 IME 0405/cs-CZ; CP852; ANSI 1250
h U+0068 104 …104… 104 0104 Latin Small Letter H
= U+003D 61 …61… 61 061 Equals Sign
‎ U+200E 8206 …14… Left-To-Right Mark
CP862 he-IL 0253 (ANSI 1255) Hebrew
CP720 ar-EG 0253 (ANSI 1256) Arabic
1 U+0031 49 …49… 49 049 Digit One
h=‎1
==> chcp
Active code page: 852
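If neither of those tools is available, the same kind of check can be approximated with Python's standard unicodedata module. This is a minimal sketch under that assumption: invisible_chars is a made-up name, and the Left-To-Right Mark is written as the escape \u200e so that it is visible in the source.

```python
import unicodedata

def invisible_chars(text):
    """Return (index, code point, name) for format (Cf) and control (Cc) characters."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) in ("Cf", "Cc")
    ]

# The argument from the question, with the hidden mark spelled out as an escape.
pasted = "certhash=\u200e1234567890"
typed = "certhash=1234567890"

print(invisible_chars(pasted))
print(invisible_chars(typed))
```

Retyping the argument by hand, instead of pasting it, is the simplest way to get rid of such a character.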

Strange character for empty line in TextWrangler and cat -v

I have a text file, which on my Mac I open with TextWrangler. I enable the invisible characters to see the line endings. I see that every empty line has a red, upside down question mark in it. Which character is this?
When I type cat -v file.txt in the terminal, it shows these characters as ^@ (and the line endings themselves as ^M). What I need to know is the regex for that specific character, like \n for the end of line.
In the hex dump, I see the following:
0000000: 312e 300d 0a00 0d0a 2231 3130 3030 3030 1.0....."1100000
0000010: 3030 3222 3b22 3922 3b22 5354 4422 3b3b 002";"9";"STD";;
0000020: 3b0d 0a22 3131 3030 3030 3030 3639 223b ;.."1100000069";
If I manually remove the strange characters, and make a new hex dump, I see:
0000000: 312e 300d 0a0d 0a22 3131 3030 3030 3030 1.0...."11000000
0000010: 3032 223b 2239 223b 2253 5444 223b 3b3b 02";"9";"STD";;;
0000020: 0d0a 2231 3130 3030 3030 3036 3922 3b22 .."1100000069";"
The difference is a byte sequence 00. Is there an encoding in which this 00 is required for empty lines?
The red inverted question mark you are looking at is apparently a NULL / NUL character (byte 00). Whether or not it makes any difference depends on the applications writing and reading the files in question. (So it's most likely not a general encoding issue. Compare: Wikipedia.)
Once you made the hidden characters visible in TextWrangler, you can mark that/any character (or character sequence for that matter), and copy it to the Find input field using CMD + E. The NULL character shows up as \x{00} on my machine.
Alternatively, you might use Text -> Zap Gremlins... with (at least) Null (ASCII 0) characters checked and Replace with code selected; that way you are told \x00. Either one of these should work when searching for these characters, no matter whether grep is enabled or not. I am not sure, though, whether \s should find it as well in grep mode: it does not on my machine. But \W does match it.
Please comment, if and as this requires adjustment / further detail.
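Outside of TextWrangler, the same NUL bytes can be located and removed with a few lines of Python. This is a minimal sketch using the first bytes of the hex dump from the question; nul_offsets is a made-up helper name.

```python
import re

def nul_offsets(data: bytes):
    """Return the byte offsets of every NUL (0x00) byte."""
    return [m.start() for m in re.finditer(rb"\x00", data)]

# First ten bytes of the original file, taken from the hex dump in the question.
sample = bytes.fromhex("312e300d0a000d0a2231")

print(nul_offsets(sample))
cleaned = sample.replace(b"\x00", b"")
print(cleaned)
```

The pattern \x00 is the same escape the answer above reports, so it should carry over to other regex tools that accept hexadecimal escapes.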

Resources