remove set of characters surrounding value - bash

I'm redirecting the output of an API call to a file;
however, I always get the following characters surrounding the value I need:
domainid='^[[39;49;00m^[[33;01m75307d12-e3f4-4a96-ac23-e2a9439f8299^[[39;49;00m'
Desired output
domainid='75307d12-e3f4-4a96-ac23-e2a9439f8299'
I really have no idea how to clean the output and make it look like the above.
Any suggestions will be highly appreciated.
Thank you

Those are ANSI escape sequences (control characters), and they are typically used to add colors, underlining, and so forth to terminal output.
First order of business is to check if your API command line tool supports a no-color mode. That would solve your problem at the source.
Barring that, try this Server Fault answer, which has a command to clear ANSI sequences out of a text file using sed.
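For reference, a typical sed command for this looks something like the following (a sketch assuming GNU sed, which understands the \x1b hex escape for the ESC byte; the color sequences in your sample all end in m):
sed 's/\x1b\[[0-9;]*m//g' output.txt > clean.txt
A broader pattern like \x1b\[[0-9;]*[a-zA-Z] would also catch non-color sequences, at the cost of being more aggressive.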

You could remove the undesired characters by replacing the line with just the submatches you want to keep:
... | sed -r "s/(domainid=).*([0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}).*/\1'\2'/i"

Related

How to handle two dashes in ReST

I'm using Sphinx to document a command line utility written in Python. I want to be able to document a command line option, such as --region like this:
**--region** <region_name>
in ReST and then use Sphinx to generate my HTML and man pages for me.
This works great when generating man pages, but in the generated HTML the -- gets turned into -, which is incorrect. I have found that if I change my source ReST document to look like this:
**---region** <region_name>
The HTML generates correctly but now my man pages have --- instead of --. Also incorrect.
I've tried escaping the dashes with a backslash character (e.g. \-\-) but that had no effect.
Any help would be much appreciated.
This is a configuration option in Sphinx that is on by default: the html_use_smartypants option (http://sphinx-doc.org/config.html?highlight=dash#confval-html_use_smartypants).
If you turn off the option, then you will have to use the Unicode character '–' if you want an en-dash.
With
**-\\-region** <region_name>
it should work.
In Sphinx 1.6 html_use_smartypants has been deprecated, and it is no longer necessary to set html_use_smartypants = False in your conf.py or as an argument to sphinx-build. Instead you should use smart_quotes = False.
If you want the transformations formerly provided by html_use_smartypants, it is recommended to use smart_quotes instead, e.g., smart_quotes = True.
Note that at the time of this writing Read the Docs pins sphinx==1.5.3, which does not support the smart_quotes option. Until then, you'll need to continue using html_use_smartypants.
EDIT: It appears that Sphinx now uses smartquotes instead of docutils' smart_quotes. h/t @bad_coder.
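Putting that together, a minimal conf.py sketch (assuming Sphinx >= 1.6.6, where the option is spelled smartquotes):
# conf.py
smartquotes = False  # pre-1.6: html_use_smartypants = False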
To add two dashes, add the following:
.. include:: <isotech.txt>
|minus|\ |minus|\ region
Note the backslash and the space. This avoids having a space between the minus signs and the name of the parameter.
You only need to include isotech.txt once per page.
With this solution, you can keep the extension smartypants and write two dashes in every part of the text you need. Not just in option lists or literals.
As commented by @mzjn, the best way to address the original submitter's need is to use Option Lists.
The format is simple: a sequence of lines that start with -, --, + or /, followed by the actual option, (at least) two spaces and then the option's description:
-l long listing
-r reversed sorting
-t sort by time
--all do not ignore entries starting with .
The number of spaces between option and description may vary by line; it just needs to be at least two. This allows for a clear presentation (as above) in the source as well as in the generated document.
Option Lists have syntax for an option argument as well (just put an additional word or several words enclosed in <> before the two spaces); see the linked page for details.
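For example, the original question's option could be written as an option-list entry like this (a sketch):
--region <region_name>  The region to operate on.
Because the line is parsed as an option list, the leading -- should survive the smart-quote transformation intact.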
The other answers on this page targeted the original submitter's question; this one addresses their actual need.

Windows SED command - simple search and replace without regex

How should I use the sed command to find and replace a given word/words/sentence without treating any of them as special characters?
In other words, how do I treat the find and replace parameters as plain text?
In the following example, where I want to replace 'sagar' with '+sagar', I have to give the following command:
sed "s/sagar/\\+sagar/g"
I know that \ should be escaped with another \, but I can't do this manipulation by hand, as there are so many special characters and combinations of them.
I am going to take the find and replace parameters as input from the user's screen, and I want to execute sed from C# code.
Simply put, I do not want sed to use regular expressions at all; I want my command text to be treated as plain text.
Is this possible? If so, how can I do it?
While there may be sed versions that have an option like --noregex_matching, most of them don't. Because you're getting the search and replace input by prompting a user, your best bet is to scan the user input strings for regexp special characters and escape them as appropriate.
Also, will your users expect, for example, their all-caps search input to correctly match and replace a lower- or mixed-case string? In that case, recall that you could rewrite their target string as [Ss][Aa][Gg][Aa][Rr] and replace with +Sagar.
Note that there are far fewer regex characters used on the replacement side: '&' means "the complete string that was matched", and then there are the numbered replacement groups, like \1, \2, .... Given users who have no knowledge or expectation that they can use such characters, the likelihood of them using \1 in their required substitution is pretty low. More likely they may have a valid use for &, so you'll have to scan (at least) for that and replace it with \&. In a basic sed, that's about it. (There may be others in the latest GNU seds, or in some of the seds that have their genesis as PC tools.)
For a replacement string, you shouldn't have to escape the + char at all. Probably yes for \. Again, you can scan your user's "naive" input and add escape chars as needed.
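As a sketch of that scanning step (hypothetical helper names; POSIX shell and a basic sed assumed — the same idea ports straightforwardly to C# string handling), you could pre-escape both user strings before building the s/// command:
# Backslash-escape BRE metacharacters and the / delimiter in the search text
escape_pattern() { printf '%s\n' "$1" | sed 's%[][\.*^$/]%\\&%g'; }
# Only \, & and the delimiter are special on the replacement side
escape_replacement() { printf '%s\n' "$1" | sed 's%[\/&]%\\&%g'; }
pat=$(escape_pattern "$user_find")
rep=$(escape_replacement "$user_replace")
sed "s/$pat/$rep/g" infile > outfile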
Finally, if you're doing this for a "package" that will be distributed and you'll be relying on the users' version of sed, beware that there are many versions of sed floating around: some have their roots in Unix/Linux, and others, particularly super-sed, (I'm pretty sure) got started as PC standalones and have a very different feature set.
IHTH.

Best way to read output of shell command

In Vim, what is the best (portable and fast) way to read the output of a shell command? This output may be binary and thus contain NULs, and may (or may not) have a trailing newline, which matters. The current solutions I see:
Use system(). Problems: does not work with NULs.
Use :read !. Problems: won't preserve a trailing newline, and tries to be smart about detecting the output format (dos/unix/mac).
Use ! with redirection to a temporary file, then readfile(, "b") to read it. Problems: two filesystem calls; the 'shellredir' option also redirects stderr by default; and it should be less portable ('shellredir' is mentioned here because it is likely to be set to a valid value).
Use system() and filter the output through xxd. Problems: very slow, least portable (no equivalent of 'shellredir' for pipes).
Any other ideas?
You are using a text editor. If you care about NULs, trailing EOLs, and (possibly) conflicting encodings, don't you need to use a hex editor anyway?
If I need this amount of control of my operations, I use the xxd route indeed, with
:se binary
One nice option you seem to have missed is insert-mode expression-register insertion:
<C-r>=system('ls -l')<Enter>
This may or may not be smarter/less intrusive about character encoding business, but you could try it if it is important enough for you.
Or you could use the Perl or Python support to effectively use popen.
Rough idea:
:perl open(F, "ls /tmp/ |"); my @lines = (<F>); $curbuf->Append(0, @lines)
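A rough Python counterpart of the same idea (a sketch; assumes Vim built with +python, and Python 2 to match the Perl example's vintage):
:py import vim, subprocess
:py out = subprocess.check_output(['ls', '/tmp/'])
:py vim.current.buffer.append(out.splitlines(), 0)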

How to escape unicode characters in bash prompt correctly

I have a specific method for building my bash prompt; let's say it looks like this:
CHAR="༇ "
my_function="
prompt=\" \[\$CHAR\]\"
echo -e \$prompt"
PS1="\$(${my_function}) \$ "
To explain the above: I'm building my bash prompt by executing a function stored in a string, which was a decision made as the result of this question. Let's pretend it works fine, because it does, except when unicode characters get involved.
I am trying to find the proper way to escape a unicode character, because right now it messes with the bash line length. An easy way to test if it's broken is to type a long command, execute it, press CTRL-R and type to find it, and then press CTRL-A / CTRL-E to jump to the beginning / end of the line. If the text gets garbled, then it's not working.
I have tried several things to properly escape the unicode character in the function string, but nothing seems to be working.
Special characters like this work:
COLOR_BLUE=$(tput sgr0 && tput setaf 6)
my_function="
prompt="\\[\$COLOR_BLUE\\] \"
echo -e \$prompt"
This is the main reason I made the prompt a function string. That escape sequence does NOT mess with the line length; it's just the unicode character that does.
The \[...\] sequence says to ignore this part of the string completely, which is useful when your prompt contains a zero-length sequence, such as a control sequence which changes the text color or the title bar, say. But in this case, you are printing a character, so the length of it is not zero. Perhaps you could work around this by, say, using a no-op escape sequence to fool Bash into calculating the correct line length, but it sounds like that way lies madness.
The correct solution would be for the line length calculations in Bash to correctly grok UTF-8 (or whichever Unicode encoding it is that you are using). Uhm, have you tried without the \[...\] sequence?
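For contrast, a conventional prompt where \[...\] wraps only zero-width color sequences behaves fine (a minimal sketch, not the questioner's setup):
COLOR_BLUE=$(tput setaf 6)
RESET=$(tput sgr0)
PS1="\[$COLOR_BLUE\]\u@\h\[$RESET\] \$ "
The trouble here is precisely that the glyph inside \[...\] is not zero-width.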
Edit: The following implements the solution I propose in the comments below. The cursor position is saved, then two spaces are printed, outside of \[...\], then the cursor position is restored, and the Unicode character is printed on top of the two spaces. This assumes a fixed font width, with double width for the Unicode character.
PS1='\['"`tput sc`"'\] \['"`tput rc`"'༇ \] \$ '
At least in the OSX Terminal, Bash 3.2.17(1)-release, this passes cursory [sic] testing.
In the interest of transparency and legibility, I have ignored the requirement to have the prompt's functionality inside a function, and the color coding; this just changes the prompt to the character, space, dollar prompt, space. Adapt to suit your somewhat more complex needs.
@tripleee wins it; posting the final solution here because it's a pain to post code in comments:
CHAR="༇"
my_function="
prompt=\" \\[`tput sc`\\] \\[`tput rc`\\]\\[\$CHAR\\] \"
echo -e \$prompt"
PS1="\$(${my_function}) \$ "
The trick, as pointed out in @tripleee's link, is the use of the commands tput sc and tput rc, which save and then restore the cursor position. The code is effectively saving the cursor position, printing two spaces for width, restoring the cursor position to before the spaces, then printing the special character, so that the width of the line comes from the two spaces, not the character.
(Not the answer to your problem, but some pointers and general experience related to your issue.)
I see the behaviour you describe with cmd-line editing (Ctrl-R, ... Ctrl-A, Ctrl-E ...) all the time, even without unicode chars.
At one work-site, I spent the time to figure out the difference between the terminal's interpretation of the TERM setting vs. the TERM definition used by the OS (well, stty, I suppose).
NOW, when I have this problem, I escape out of my current attempt to edit the line, bring the line up again, and then immediately go to 'vi' mode, which opens the vi editor (press just the 'v' char, right?). All the ease of use of a full-fledged session of vi; why go with less ;-)?
Looking again at your problem description, when you say
my_function="
prompt=\" \[\$CHAR\]\"
echo -e \$prompt"
That is just a string definition, right? I'm assuming you're simplifying the problem definition, and that this is the output of your my_function. It seems very likely that the steps of creating the function definition, calling the function, AND using the returned values present a lot of opportunities for shell-quoting to not work the way you want.
If you edit your question to include the my_function definition and its complete use (reducing your function to just what is causing the problem), it may be easier for others to help with this too. Finally, do you use set -vx regularly? It can help show the how/when/what of variable expansions; you may find something there.
Failing all of those, look at O'Reilly's termcap & terminfo. You may need to look at the man page for your local system's stty and related cmds, AND you may do well to look for user groups specific to your Linux system (I'm assuming you use a Linux variant).
I hope this helps.

How to enumerate unique characters in a UTF-8 document? With sed?

I'm converting some Polish<->English dictionaries from RTF to HTML. The Polish special characters are coming out fine. But IPA (International Phonetic Alphabet) glyphs get changed to funny things, depending on what program I use for conversion. For example, /ˈbiːrɪ/ comes out as /ÈbiùrI/ or /∪βιρΙ/.
I'd like to correct these documents with a search & replace, but I want to make sure I don't miss any characters and don't want to manually pore over dictionary entries. I'd like to output a list of all unique, NON-ascii characters in a document.
I found this thread:
Find Unique Characters in a File
... and I tried the following two proposals:
sed -e "s/./\0\n/g" inputfile | sort -u
sed -e "s/(.)/\1\n/g" inputfile | sort -u
They both work nicely, and seem to both generate the same output. My problem is that they only output standard ASCII characters, and what I'm looking for is exactly the opposite.
The sed tool looks awesome, but I don't have time to learn it right now (though I intend to later). I'm hoping the solution will be clear to someone who's already mastered this tool, and they can save me a lot of time. [-:
Thanks in advance!
This is not a sed solution but a Python solution. It reads the contents of the file, decodes it as UTF-8, turns it into a set (thus throwing away duplicates), discards the ASCII characters (0-127), sorts it, and then joins it back together with a newline between each character (one character per line):
'\n'.join(sorted(set(unicode(open(inputfile).read(), 'utf-8')) - set(chr(i) for i in xrange(128))))
As something you'd run from the command line if you felt so inclined,
python -c "print '\n'.join(sorted(set(unicode(open('inputfile').read(), 'utf-8')) - set(chr(i) for i in xrange(128))))"
(You could also use ''.join instead of '\n'.join which would list the characters without a newline in between.)
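Note that the snippet above is Python 2. A Python 3 sketch of the same approach (same hypothetical file name):
with open('inputfile', encoding='utf-8') as f:
    print('\n'.join(sorted(set(f.read()) - set(map(chr, range(128))))))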
