I have a script with a variable that might contain some weird characters: 🍿 ✔. I need to remove them but, honestly, I don't even know where to begin to match those characters. I can't copy and paste them into my script; they just show up as ?? ?. How can I match those characters with sed or awk? I can't use perl, php, or anything much beyond sed or awk because of what is available on the system.
First, put some flag strings around your special characters and run hexdump -C on the result so you can easily see their bytes. Then use those hex codes to write the sed command. For example:
[STEP 118] # cat file
>>>🍿 ✔<<<
[STEP 119] # hexdump -C file
00000000 3e 3e 3e f0 9f 8d bf 20 e2 9c 94 3c 3c 3c 0a |>>>.... ...<<<.|
^^^^^^^^^^^^^^^^^^^^^^^^
[STEP 120] # sed -e $'s/\xf0\x9f\x8d\xbf\x20\xe2\x9c\x94//g' file # need to use the $'...'
>>><<<
[STEP 121] #
Then remove the added flag strings when all is done.
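Since the value in the question lives in a variable rather than a file, the same byte-level sed command can be fed from printf. This is only a minimal sketch, assuming bash (for the $'...' quoting) and a sed that matches raw bytes under LC_ALL=C; the names var and cleaned are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical variable holding the two characters between flag strings,
# written as UTF-8 byte escapes so the script itself stays pure ASCII.
var=$'>>>\xf0\x9f\x8d\xbf \xe2\x9c\x94<<<'

# LC_ALL=C makes sed treat the input as single bytes, so the byte
# sequences found with hexdump can be matched literally.
cleaned=$(printf '%s' "$var" | LC_ALL=C sed $'s/\xf0\x9f\x8d\xbf//g; s/\xe2\x9c\x94//g')

printf '%s\n' "$cleaned"
```

Each character is removed by its own substitution here, so the space that sat between them is left in place.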
Try this. (The file contains some Control-M characters and the characters mentioned in the question; I am printing only the lines that contain alphanumeric characters.)
$cat f
hello vipin
street1
pin 12345
🍿 ✔
$awk '/[[:alnum:]]/ {print }' f
hello vipin
street1
pin 12345
It looks like the Control-M characters disappeared when the input file was saved on SO.
$ cat file
some weird characters: 🍿 ✔. I need to remove
second line of some weird characters: 🍿 ✔. I need to remove
$ tr -c -d '[:print:][:space:]' < file
some weird characters: . I need to remove
second line of some weird characters: . I need to remove
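The same tr filter can be applied to a variable instead of a file. The sketch below assumes that forcing the C locale is acceptable, so that every byte above 0x7f counts as non-printable; var and cleaned are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical variable; the octal escapes are the UTF-8 bytes of the
# two characters from the question.
var=$(printf 'weird: \360\237\215\277 \342\234\224. end')

# In the C locale every byte above 0x7f is non-printable, so -c -d deletes
# all bytes of the multibyte characters and keeps ASCII text and whitespace.
cleaned=$(printf '%s' "$var" | LC_ALL=C tr -c -d '[:print:][:space:]')

printf '%s\n' "$cleaned"
```

Note that this strips every non-ASCII byte, not just these two characters, so legitimate accented text would be removed as well.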
The solution I ended up using was simply changing the encoding of the script from ASCII to UTF-8, which I did with Notepad++. Then I could work with the characters directly instead of taking a roundabout route such as converting to hex (which I couldn't do anyway, because the value is an environment variable and does not come from a file). I also didn't need awk or sed, as the following was much simpler:
cleaned_var=${environmental_variable//" 🍿 ✔"}
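If changing the script's encoding is not an option, the same parameter expansion might be written with bash's $'...' quoting so the script stays pure ASCII. This is a sketch under that assumption; the sample value assigned to environmental_variable is invented for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the environment variable from the question.
environmental_variable=$'report \xf0\x9f\x8d\xbf \xe2\x9c\x94'

# $'...' turns the \xHH escapes into raw bytes, so the pattern is the UTF-8
# encoding of the two characters (each preceded by a space) without having
# to type the characters themselves.
cleaned_var=${environmental_variable//$' \xf0\x9f\x8d\xbf \xe2\x9c\x94'/}

printf '%s\n' "$cleaned_var"
```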
Since my reputation is too low to post an image I will reproduce the terminal
output where my question originated from:
username@computer:/run$ cat rsyslogd.pid
599username@computer:/run$ cat acpid.pid
636
username@computer:/run$
cat acpid.pid
comes with a linebreak whereas
cat rsyslogd.pid
doesn't.
But if I open both files there is no visible difference (e.g. the file
acpid.pid
doesn't have an additional blank line)
The question is: Why does one .pid file come with a linebreak and the other one doesn't?
Additional information: My operating system is Ubuntu 18.04.3
The rsyslogd.pid file probably doesn't end with a newline character (ASCII 0x0A).
You didn't mention how you opened the files, but, I suspect you used a text editor which will not display non-printable characters (like newline and backspace). Rather than using a text editor try looking at the raw file with the hexdump tool. Then compare the hex values against an ASCII table. I think you will find that the non-printable characters after the 599 and 636 are different.
hexdump -C rsyslogd.pid
hexdump -C acpid.pid
The following sequence of commands reproduces your output. The key is to use the -n flag for the echo command to create a file without a newline character at the end.
$ echo -n test > file_no_new_line.txt
$ echo test > file_with_new_line.txt
$ cat file_no_new_line.txt
test$ cat file_with_new_line.txt
test
$
Here is the output of hexdump for the two files shown in my example.
$ hexdump -C file_no_new_line.txt
00000000 74 65 73 74 |test|
00000004
$ hexdump -C file_with_new_line.txt
00000000 74 65 73 74 0a |test.|
00000005
$
Whether the command output (in this case from cat) and the shell prompt ($) run into each other also depends on the shell. If the behavior can't be reproduced with the steps above, try another shell (e.g. /bin/sh).
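For a check that does not depend on reading hexdump output, the last byte of a file can be tested directly. The helper below is a hypothetical sketch (the name ends_with_newline is made up); it relies on command substitution stripping trailing newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the file is non-empty and its last byte is 0x0a.
ends_with_newline() {
    # Command substitution strips trailing newlines, so an empty result
    # from "tail -c 1" means the final byte was a newline.
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

dir=$(mktemp -d)
printf 'test' > "$dir/file_no_new_line.txt"
printf 'test\n' > "$dir/file_with_new_line.txt"

for f in "$dir"/file_no_new_line.txt "$dir"/file_with_new_line.txt; do
    if ends_with_newline "$f"; then
        echo "${f##*/}: ends with a newline"
    else
        echo "${f##*/}: no trailing newline"
    fi
done
```

Running the helper against rsyslogd.pid and acpid.pid should show which of the two lacks the final newline.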
I have to read specific lines from an input file given on the command line, alter all of those lines, and append them to another text file.
The input file has these lines (this set of lines appears only once):
00 TOOL | Value of SIMIN:
00 TOOL | /absolute/path/to/some/required/file_1
00 TOOL | /absolute/path/to/some/required/file_2
00 TOOL | /absolute/path/to/some/required/file_3
00 TOOL |
00 TOOL | Value of SIMOUT:
The lines we need to read are between two specific patterns.
Starting pattern:
00 TOOL | Value of SIMIN:
Ending pattern:
00 TOOL |
00 TOOL | Value of SIMOUT:
Then I need to convert them to the following lines and append them to another file:
export SIMIN=/absolute/path/to/some/required/file_1:$SIMIN
export SIMIN=/absolute/path/to/some/required/file_2:$SIMIN
export SIMIN=/absolute/path/to/some/required/file_3:$SIMIN
I learned from this post how to read a file line by line.
But for my requirement, I may need to read the file into a buffer, search for the line patterns described above, pick out the lines in between, and alter them.
Any help/suggestion will be highly appreciated!
Could you please try following.
awk '
/SIMOUT/{
flag=""
}
/SIMIN/{
flag=1
}
flag && match($0,/\/[^ ]*/){
print "export SIMIN=" substr($0,RSTART,RLENGTH)":$SIMIN"
}
' Input_file
If you want to save the output to a file, add > output_file to the end of the command above (or >> output_file to append to an existing file).
EDIT: Adding Ed's solution here too.
awk '/SIMOUT/{f=0} f&&/\//{printf "export SIMIN=%s\n", $NF} /SIMIN/{f=1}' Input_file
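To see the flag approach end to end, here is a small sketch that builds a sample input, runs a variant of the awk program, and appends the result to a second file. The temporary files, the extra /some/other/path line after SIMOUT, and the tightened patterns ("Value of SIMIN:" rather than just SIMIN) are assumptions made for the demonstration.

```shell
#!/usr/bin/env bash
input=$(mktemp)
output=$(mktemp)

# Sample input modelled on the question; the last line is invented to
# show that paths after the SIMOUT header are ignored.
cat > "$input" <<'EOF'
00 TOOL | Value of SIMIN:
00 TOOL |   /absolute/path/to/some/required/file_1
00 TOOL |   /absolute/path/to/some/required/file_2
00 TOOL |
00 TOOL | Value of SIMOUT:
00 TOOL |   /some/other/path
EOF

awk '
    /Value of SIMOUT:/ { in_block = 0 }          # stop at the end pattern
    in_block && match($0, /\/[^ ]*/) {           # first "/..." token on the line
        print "export SIMIN=" substr($0, RSTART, RLENGTH) ":$SIMIN"
    }
    /Value of SIMIN:/ { in_block = 1 }           # start after the start pattern
' "$input" >> "$output"                          # >> appends instead of overwriting

cat "$output"
```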
I need to check my string variable for the presence of extended ASCII characters (one byte, decimal codes 128-255). If any are present, each should be replaced with its multi-character hex equivalent, ready for a later grep command etc.
Example string: "Ørsted\ Salg", I need it to be converted to "\xD8rsted\ Salg".
I know a way to do it with a hash table in Bash 4:
declare -A symbolHashTable=(
["Ø"]="D8"
);
currSearchTerm="Ørsted\ Salg"
for curRow in "${!symbolHashTable[@]}"; do
currSearchTerm=$(echo $currSearchTerm | sed s/$curRow/'\\x'${symbolHashTable[$curRow]}/)
done
But that seems too tedious for all 128 characters. There should be a shorter and probably faster way to do it, without listing every symbol.
I can detect whether the string has any of the characters in it with:
echo $currSearchTerm | grep -P "[\x80-\xFF]"
I am almost sure there is a way to make sed do it, but I get lost somewhere in the "replace with" part.
You can easily do this with Perl:
#!/bin/bash
original='Ørsted'
replaced=$(perl -pe 's/([\x80-\xFF])/"\\x".unpack "H*", $1/eg' <<< "$original")
echo "The original variable's hex encoding is:"
od -t x1 <<< "$original"
echo "Therefore I converted $original into $replaced"
Here's the output when the file and terminal are ISO-8859-1:
The original variable's hex encoding is:
0000000 d8 72 73 74 65 64 0a
0000007
Therefore I converted Ørsted into \xd8rsted
Here's the output when the file and terminal are UTF-8:
The original variable's hex encoding is:
0000000 c3 98 72 73 74 65 64 0a
0000010
Therefore I converted Ørsted into \xc3\x98rsted
In both cases it works as expected.
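If Perl is not available, a similar result might be obtained with od and bash built-ins. The function below is a hypothetical sketch, not a drop-in replacement for the Perl one-liner: it assumes bash's printf -v and %b support, and it emits uppercase hex digits to match the \xD8 form asked for in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the argument with every byte >= 0x80
# replaced by a literal \xHH escape; ASCII bytes are left unchanged.
hex_escape_high_bytes() {
    local out='' h piece
    # od prints each input byte as two hex digits; -v disables the "*"
    # shorthand for repeated lines, -An drops the offset column.
    for h in $(printf '%s' "$1" | od -An -v -tx1); do
        if (( 16#$h >= 0x80 )); then
            printf -v piece '\\x%02X' "$((16#$h))"   # e.g. \xD8
        else
            printf -v piece '%b' "\\x$h"             # the original ASCII byte
        fi
        out+=$piece
    done
    printf '%s\n' "$out"
}

# ISO-8859-1 bytes of the example string from the question.
hex_escape_high_bytes $'\xd8rsted\\ Salg'
```

As with the Perl version, a UTF-8 encoded input yields one escape per byte, so the same character becomes \xC3\x98.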
I have a file containing hex representations of code from a small program, and am trying to actually convert it into the program itself.
For example, here is a sample of such text, stored in a file, input.txt:
8d
00
a1
21
53
57
43
48
0e
00
bb
I am using the following Bash snippet to convert the file to a binary file:
rm outfile; while read h; do echo -n ${h}; echo -ne \\x${h} >> outfile; done < input.txt
After opening the output file in VIM:
¡!SWCH»
And then converting it to hex representation via xxd:
0000000: 8d00 a121 5357 4348 0e00 bb0a ...!SWCH....
This is all good, except for one thing: there is a trailing byte, 0a, at the end of my binary output file. This happens for every program file I work with. How is the trailing 0a being appended to every output binary file? It's not present in my input file.
Thank you.
Simply run xxd directly from Bash, like this:
xxd outfile > outfile.hex
and you will see that there isn't any 0a.
The 0a is appended somewhere when Vim sends the buffer to the xxd command. If you want to convert inside Vim, try
vim -b outfile
which opens outfile in binary mode.
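To confirm this without any editor in the loop, the conversion can be repeated and the result inspected with od and wc. This is a sketch assuming bash's printf, which understands \xHH escapes in its format string; the temporary directory is created only for the demonstration.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)

# Recreate the sample input from the question: one hex byte per line.
printf '%s\n' 8d 00 a1 21 53 57 43 48 0e 00 bb > "$tmp/input.txt"

: > "$tmp/outfile"                      # start with an empty output file
while read -r h; do
    printf "\\x$h" >> "$tmp/outfile"    # write the byte, no newline added
done < "$tmp/input.txt"

od -An -v -tx1 "$tmp/outfile"           # dump every byte as hex
wc -c < "$tmp/outfile"                  # byte count: one per input line
```

Eleven input lines give eleven output bytes, the last of which is bb, so the loop itself adds no trailing 0a.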
I have some lines in a flat file. Take these two lines, for instance:
1 aa bb 05 may 2014 cc G 14-MAY-2014 hello world
j sd az 20140505 sd G 14-MAY-2014 hello world haha
As you may have noticed, I can count neither the characters nor the spaces, because the lines are not well aligned, and the fourth field sometimes looks like 20140505 and sometimes like 05 may 2014. So what I want is to match the G or the 14-MAY-2014; then I can easily get the fields that follow: hello world or hello world haha. Can anyone help me? Thank you!
Assuming your lines are in a file called test.txt:
cat test.txt | sed -r 's/^.*-[0-9]{4}\s//'
This uses GNU sed on a Linux system. There are many other ways. Here I simply remove everything up to and including the date from the beginning of the line.
sed -r 's/^.*-[0-9]{4}\s//'
-r = extended regex; makes things like the quantifier {4} possible
's/ ... //' = s is for substitute;
it matches the first part and replaces it with the second.
Since the second part is empty, the match is deleted.
^ = start of line
.* = any character, any number of times
-[0-9]{4} = a dash, followed by four digits ([0-9]), the year part of the date
\s = any white space
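On systems without GNU sed, a variant using -E and the POSIX class [[:space:]] may be more portable. This is a sketch under the assumption that the local sed accepts -E (GNU and BSD sed both do); the two input lines are the ones from the question.

```shell
#!/usr/bin/env bash
# Feed the two sample lines to sed and strip everything up to and
# including the last "-YYYY" followed by whitespace.
result=$(printf '%s\n' \
    '1 aa bb 05 may 2014 cc G 14-MAY-2014 hello world' \
    'j sd az 20140505 sd G 14-MAY-2014 hello world haha' |
    sed -E 's/^.*-[0-9]{4}[[:space:]]+//')

printf '%s\n' "$result"
```

Because .* is greedy, the match extends to the last dash-plus-four-digits on the line, which is the end of the date in both samples.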
You can use a lookbehind in a Perl regex:
perl -lne '/(?<=14-MAY-2014)(.*)/ && print $1' file
It will print anything after 14-MAY-2014.
You can also use grep if it supports -P:
grep -Po '(?<=14-MAY-2014)(.*)' file
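When neither Perl nor grep -P is available and the date is not known in advance, awk can look for the G marker followed by a date. The sketch below assumes the date always has the form two digits, three uppercase letters, four digits; that format is inferred from the two sample lines only.

```shell
#!/usr/bin/env bash
# Print whatever follows " G DD-MON-YYYY " on each line that contains it.
result=$(printf '%s\n' \
    '1 aa bb 05 may 2014 cc G 14-MAY-2014 hello world' \
    'j sd az 20140505 sd G 14-MAY-2014 hello world haha' |
    awk 'match($0, / G [0-9][0-9]-[A-Z][A-Z][A-Z]-[0-9][0-9][0-9][0-9] /) {
             # RSTART + RLENGTH is the first character after the match
             print substr($0, RSTART + RLENGTH)
         }')

printf '%s\n' "$result"
```

The digits and letters are spelled out one by one instead of using {2} or {4}, since not every awk implementation supports interval expressions.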