how to print user1 from user1#10.129.12.121 using shell scripting or sed - bash

I wanted to print the name from the entire address by shell scripting. So user1#12.12.23.234 should give output "user1" and similarly 11234#12.123.12.23 should give output 11234

Reading from the terminal:
$ IFS=# read user host && echo "$user"
<user1#12.12.23.234>
user1
Reading from a variable:
$ address='user1#12.12.23.234'
$ cut -d# -f1 <<< "$address"
user1
$ sed 's/#.*//' <<< "$address"
user1
$ awk -F# '{print $1}' <<< "$address"
user1

Using bash in place editing:
EMAIL='user#server.com'
echo "${EMAIL%#*}
This is a Bash built-in, so it might not be very portable (it won't run with sh if it's not linked to /bin/bash for example), but it is probably faster since it doesn't fork a process to handle the editing.
Using sed:
echo "$EMAIL" | sed -e 's/#.*//'
This tells sed to replace the # character and as many characters that it can find after it up to the end of line with nothing, ie. removing everything after the #.
This option is probably better if you have multiple emails stored in a file, then you can do something like
sed -e 's/#.*//' emails.txt > users.txt
Hope this helps =)

I tend to use expr for this kind of thing:
address='user1#12.12.23.234'
expr "$address" : '\([^#]*\)'
This is a use of expr for its pattern matching and extraction abilities. Translated, the above says: Please print out the longest prefix of $address that doesn't contain an #.
The expr tool is covered by Posix, so this should be pretty portable.
As a note, some historical versions of expr will interpret an argument with a leading - as an option. If you care about guarding against that, you can add an extra letter to the beginning of the string, and just avoid matching it, like so:
expr "x$address" : 'x\([^#]*\)'

Related

How to remove the username/hostname line from an output on Korn Shell?

I run the command
df -gP /data1 /data2 | grep -v File | awk '{print $1}' |
awk -F/dev/ '$0=$2' | tr '\n' '
on the AIX shell (ksh) and it prints the output below:
lv_data01 lv_data02 root#testhost:/
However, I would like the output to be printed this way. Could someone help?
lv_data01 lv_data02
Using grep … | awk … | awk … is not necessary; a single awk could do the whole job. So could sed and it might even be easier. I'd be tempted to deal with the spacing by using:
x=$(df … | sed …); echo $x
The tr command, once corrected, replaces newlines with spaces, so the prompt follows without a newline before it. The ; echo suggestion adds the missing newline; the echo $x suggestion (note no double quotes) does too.
As for the sed command:
sed -n '/File/!{ s/[[:space:]].*//; s%^.*/dev/%%p; }'
Don't print anything by default
If the line doesn't match File (doing the work of grep -v):
remove the first space (blank or tab) and everything after it (doing the work of awk '{print $1}')
replace everything up to /dev/ with nothing and print (doing the work of awk -F/dev/ '{$0=$2}')
The command substitution and capture, followed by echo, deals with spaces and newlines.
So, my suggested solution is:
x=$(df -gP /data1 /data2 | sed -n '/File/!{ s/[[:space:]].*//; s%^.*/dev/%%p; }'); echo $x
You could add unset x after the echo if you are going to be using this directly in the shell and not in a shell script. If it'll be encapsulated in a shell script, you don't have to worry about it.
I'm blithely assuming the output from df -gP won't contain a path such as this, with two occurrences of /dev:
/who/knows/dev/lv_data01/dev/bin
If that's a real problem, you can fix the sed script, but I don't think it will be. It's one thing the second awk script in the question handles differently.

How to insert one character in front of a variable using sed

I want to turn this input_variable = 1
into input_variable = 01
From previous posts here I tried this but didn't work:
sed -e "s/\0" <<< "$input_variable"
I get:
Syntax error: redirection unexpected
What do I do wrong?
Thanks!
EDIT
Thanks to Benjamin I found a workaround (I would still like to know why the sed didn't work):
new_variable="0$input_variable"
While it can be done with sed, simple assignment in your script can do exactly what you want done. For example, if you have input_variable=1 and want input_variable=01, you can simply add a leading 0 by assignment:
input_variable="0${input_variable}"
or for additional types of numeric formatting you can use the printf -v option and take advantage of the format-specifiers provided by the printf function. For example:
printf -v input_variable "%02d" $input_variable
will zero-pad input_variable to a length of 2 (or any width you specify with the field-width modifier). You can also just add the leading zero regardless of the width with:
printf -v input_variable "0%s" $input_variable
sed is an excellent tool, but it isn't really the correct tool for this job.
You don't close the substitution command. Each substitution command must contain 3 delimiters
sed -e 's/pattern/replacement/' <<< 'text' # 3 backslashes
What you want to do could be done with:
sed -e 's/.*/0&/' <<< $input_variable
EDIT:
You are probably using Ubuntu and stumbled upon dash also known as the Almquist shell, which does not have the <<< redirection operator. The following would be a POSIX-compliant alternative, which works with dash as well:
sed -e 's/.*/0&/' <<~
$input_variable
~
And also this:
echo $input_variable | sed -e 's/.*/0&/'
To have the variable take on the new value, do this:
input_variable=$(echo $input_variable | sed -e 's/.*/0&/')
That's however not how you would write the shell script. Shell scripts usually give out some textual output, rather than setting external variables:
So, the script, let's call it append_zero.sh:
#!/bin/sh
echo $1 | sed 's/.*/0&/'
and you would execute it like this:
$ input_variable=1
$ input_variable=$(append_zero.sh input_variable)
$ echo $input_variable
01
This way you have a working shell script that you can reuse with any Unix system that has a POSIX compliant /bin/sh

Optimize shell script for multiple sed replacements

I have a file containing a list of replacement pairs (about 100 of them) which are used by sed to replace strings in files.
The pairs go like:
old|new
tobereplaced|replacement
(stuffiwant).*(too)|\1\2
and my current code is:
cat replacement_list | while read i
do
old=$(echo "$i" | awk -F'|' '{print $1}') #due to the need for extended regex
new=$(echo "$i" | awk -F'|' '{print $2}')
sed -r "s/`echo "$old"`/`echo "$new"`/g" -i file
done
I cannot help but think that there is a more optimal way of performing the replacements. I tried turning the loop around to run through lines of the file first but that turned out to be much more expensive.
Are there any other ways of speeding up this script?
EDIT
Thanks for all the quick responses. Let me try out the various suggestions before choosing an answer.
One thing to clear up: I also need subexpressions/groups functionality. For example, one replacement I might need is:
([0-9])U|\10 #the extra brackets and escapes were required for my original code
Some details on the improvements (to be updated):
Method: processing time
Original script: 0.85s
cut instead of awk: 0.71s
anubhava's method: 0.18s
chthonicdaemon's method: 0.01s
You can use sed to produce correctly -formatted sed input:
sed -e 's/^/s|/; s/$/|g/' replacement_list | sed -r -f - file
I recently benchmarked various string replacement methods, among them a custom program, sed -e, perl -lnpe and an probably not that widely known MySQL command line utility, replace. replace being optimized for string replacements was almost an order of magnitude faster than sed. The results looked something like this (slowest first):
custom program > sed > LANG=C sed > perl > LANG=C perl > replace
If you want performance, use replace. To have it available on your system, you'll need to install some MySQL distribution, though.
From replace.c:
Replace strings in textfile
This program replaces strings in files or from stdin to stdout. It accepts a list of from-string/to-string pairs and replaces each occurrence of a from-string with the corresponding to-string. The first occurrence of a found string is matched. If there is more than one possibility for the string to replace, longer matches are preferred before shorter matches.
...
The programs make a DFA-state-machine of the strings and the speed isn't dependent on the count of replace-strings (only of the number of replaces). A line is assumed ending with \n or \0. There are no limit exept memory on length of strings.
More on sed. You can utilize multiple cores with sed, by splitting your replacements into #cpus groups and then pipe them through sed commands, something like this:
$ sed -e 's/A/B/g; ...' file.txt | \
sed -e 's/B/C/g; ...' | \
sed -e 's/C/D/g; ...' | \
sed -e 's/D/E/g; ...' > out
Also, if you use sed or perl and your system has an UTF-8 setup, then it also boosts performance to place a LANG=C in front of the commands:
$ LANG=C sed ...
You can cut down unnecessary awk invocations and use BASH to break name-value pairs:
while IFS='|' read -r old new; do
# echo "$old :: $new"
sed -i "s~$old~$new~g" file
done < replacement_list
IFS='|' will give enable read to populate name-value in 2 different shell variables old and new.
This is assuming ~ is not present in your name-value pairs. If that is not the case then feel free to use an alternate sed delimiter.
Here is what I would try:
store your sed search-replace pair in a Bash array like ;
build your sed command based on this array using parameter expansion
run command.
patterns=(
old new
tobereplaced replacement
)
pattern_count=${#patterns[*]} # number of pattern
sedArgs=() # will hold the list of sed arguments
for (( i=0 ; i<$pattern_count ; i=i+2 )); do # don't need to loop on the replacement…
search=${patterns[i]};
replace=${patterns[i+1]}; # … here we got the replacement part
sedArgs+=" -e s/$search/$replace/g"
done
sed ${sedArgs[#]} file
This result in this command:
sed -e s/old/new/g -e s/tobereplaced/replacement/g file
You can try this.
pattern=''
cat replacement_list | while read i
do
old=$(echo "$i" | awk -F'|' '{print $1}') #due to the need for extended regex
new=$(echo "$i" | awk -F'|' '{print $2}')
pattern=${pattern}"s/${old}/${new}/g;"
done
sed -r ${pattern} -i file
This will run the sed command only once on the file with all the replacements. You may also want to replace awk with cut. cut may be more optimized then awk, though I am not sure about that.
old=`echo $i | cut -d"|" -f1`
new=`echo $i | cut -d"|" -f2`
You might want to do the whole thing in awk:
awk -F\| 'NR==FNR{old[++n]=$1;new[n]=$2;next}{for(i=1;i<=n;++i)gsub(old[i],new[i])}1' replacement_list file
Build up a list of old and new words from the first file. The next ensures that the rest of the script isn't run on the first file. For the second file, loop through the list of replacements and perform them each one by one. The 1 at the end means that the line is printed.
{ cat replacement_list;echo "-End-"; cat YourFile; } | sed -n '1,/-End-/ s/$/³/;1h;1!H;$ {g
t again
:again
/^-End-³\n/ {s///;b done
}
s/^\([^|]*\)|\([^³]*\)³\(\n\)\(.*\)\1/\1|\2³\3\4\2/
t again
s/^[^³]*³\n//
t again
:done
p
}'
More for fun to code via sed. Try maybe for a time perfomance because this start only 1 sed that is recursif.
for posix sed (so --posix with GNU sed)
explaination
copy replacement list in front of file content with a delimiter (for line with ³ and for list with -End-) for an easier sed handling (hard to use \n in class character in posix sed.
place all line in buffer (add the delimiter of line for replacement list and -End- before)
if this is -End-³, remove the line and go to final print
replace each first pattern (group 1) found in text by second patttern (group 2)
if found, restart (t again)
remove first line
restart process (t again). T is needed because b does not reset the test and next t is always true.
Thanks to #miku above;
I have a 100MB file with a list of 80k replacement-strings.
I tried various combinations of sed's sequentially or parallel, but didn't see throughputs getting shorter than about a 20-hour runtime.
Instead I put my list into a sequence of scripts like "cat in | replace aold anew bold bnew cold cnew ... > out ; rm in ; mv out in".
I randomly picked 1000 replacements per file, so it all went like this:
# first, split my replace-list into manageable chunks (89 files in this case)
split -a 4 -l 1000 80kReplacePairs rep_
# next, make a 'replace' script out of each chunk
for F in rep_* ; do \
echo "create and make executable a scriptfile" ; \
echo '#!/bin/sh' > run_$F.sh ; chmod +x run_$F.sh ; \
echo "for each chunk-file line, strip line-ends," ; \
echo "then with sed, turn '{long list}' into 'cat in | {long list}' > out" ; \
cat $F | tr '\n' ' ' | sed 's/^/cat in | replace /;s/$/ > out/' >> run_$F.sh ;
echo "and append commands to switch in and out files, for next script" ; \
echo -e " && \\\\ \nrm in && mv out in\n" >> run_$F.sh ; \
done
# put all the replace-scripts in sequence into a main script
ls ./run_rep_aa* > allrun.sh
# make it executable
chmod +x allrun.sh
# run it
nohup ./allrun.sh &
.. which ran in under 5 mins, a lot less than 20 hours !
Looking back, I could have used more pairs per script, by finding how many lines would make up the limit.
xargs --show-limits </dev/null 2>&1 | grep --color=always "actually use:"
Maximum length of command we could actually use: 2090490
So just under 2MB; how many pairs would that be for my script ?
head -c 2090490 80kReplacePairs | wc -l
76923
So it seems I could have used 2 * 40000-line chunks
to expand on chthonicdaemon's solution
live demo
#! /bin/sh
# build regex from text file
REGEX_FILE=some-patch.regex.diff
# test
# set these with "export key=val"
SOME_VAR_NAME=hello
ANOTHER_VAR_NAME=world
escape_b() {
echo "$1" | sed 's,/,\\/,g'
}
regex="$(
(echo; cat "$REGEX_FILE"; echo) \
| perl -p -0 -e '
s/\n#[^\n]*/\n/g;
s/\(\(SOME_VAR_NAME\)\)/'"$(escape_b "$SOME_VAR_NAME")"'/g;
s/\(\(ANOTHER_VAR_NAME\)\)/'"$(escape_b "$ANOTHER_VAR_NAME")"'/g;
s/([^\n])\//\1\\\//g;
s/\n-([^\n]+)\n\+([^\n]*)(?:\n\/([^\n]+))?\n/s\/\1\/\2\/\3;\n/g;
'
)"
echo "regex:"; echo "$regex" # debug
exec perl -00 -p -i -e "$regex" "$#"
prefixing lines with -+/ allows empty "plus" values, and protects leading whitespace from buggy text editors
sample input: some-patch.regex.diff
# file format is similar to diff/patch
# this is a comment
# replace all "a/a" with "b/b"
-a/a
+b/b
/g
-a1|a2
+b1|b2
/sg
# this is another comment
-(a1).*(a2)
+b\1b\2b
-a\na\na
+b
-a1-((SOME_VAR_NAME))-a2
+b1-((ANOTHER_VAR_NAME))-b2
sample output
s/a\/a/b\/b/g;
s/a1|a2/b1|b2/;;
s/(a1).*(a2)/b\1b\2b/;
s/a\na\na/b/;
s/a1-hello-a2/b1-world-b2/;
this regex format is compatible with sed and perl
since miku mentioned mysql replace:
replacing fixed strings with regex is non-trivial,
since you must escape all regex chars,
but you also must handle backslash escapes ...
naive escaper:
echo '\(\n' | perl -p -e 's/([.+*?()\[\]])/\\\1/g'
\\(\n

How can I strip first X characters from string using sed?

I am writing shell script for embedded Linux in a small industrial box. I have a variable containing the text pid: 1234 and I want to strip first X characters from the line, so only 1234 stays. I have more variables I need to "clean", so I need to cut away X first characters and ${string:5} doesn't work for some reason in my system.
The only thing the box seems to have is sed.
I am trying to make the following to work:
result=$(echo "$pid" | sed 's/^.\{4\}//g')
Any ideas?
The following should work:
var="pid: 1234"
var=${var:5}
Are you sure bash is the shell executing your script?
Even the POSIX-compliant
var=${var#?????}
would be preferable to using an external process, although this requires you to hard-code the 5 in the form of a fixed-length pattern.
Here's a concise method to cut the first X characters using cut(1). This example removes the first 4 characters by cutting a substring starting with 5th character.
echo "$pid" | cut -c 5-
Use the -r option ("use extended regular expressions in the script") to sed in order to use the {n} syntax:
$ echo 'pid: 1234'| sed -r 's/^.{5}//'
1234
Cut first two characters from string:
$ string="1234567890"; echo "${string:2}"
34567890
pipe it through awk '{print substr($0,42)}' where 42 is one more than the number of characters to drop. For example:
$ echo abcde| awk '{print substr($0,2)}'
bcde
$
Chances are, you'll have cut as well. If so:
[me#home]$ echo "pid: 1234" | cut -d" " -f2
1234
Well, there have been solutions here with sed, awk, cut and using bash syntax. I just want to throw in another POSIX conform variant:
$ echo "pid: 1234" | tail -c +6
1234
-c tells tail at which byte offset to start, counting from the end of the input data, yet if the the number starts with a + sign, it is from the beginning of the input data to the end.
Another way, using cut instead of sed.
result=`echo $pid | cut -c 5-`
I found the answer in pure sed supplied by this question (admittedly, posted after this question was posted). This does exactly what you asked, solely in sed:
result=\`echo "$pid" | sed '/./ { s/pid:\ //g; }'\``
The dot in sed '/./) is whatever you want to match. Your question is exactly what I was attempting to, except in my case I wanted to match a specific line in a file and then uncomment it. In my case it was:
# Uncomment a line (edit the file in-place):
sed -i '/#\ COMMENTED_LINE_TO_MATCH/ { s/#\ //g; }' /path/to/target/file
The -i after sed is to edit the file in place (remove this switch if you want to test your matching expression prior to editing the file).
(I posted this because I wanted to do this entirely with sed as this question asked and none of the previous answered solved that problem.)
Rather than removing n characters from the start, perhaps you could just extract the digits directly. Like so...
$ echo "pid: 1234" | grep -Po "\d+"
This may be a more robust solution, and seems more intuitive.
This will do the job too:
echo "$pid"|awk '{print $2}'

String Manipulation in Bash

I am a newbie in Bash and I am doing some string manipulation.
I have the following file among other files in my directory:
jdk-6u20-solaris-i586.sh
I am doing the following to get jdk-6u20 in my script:
myvar=`ls -la | awk '{print $9}' | egrep "i586" | cut -c1-8`
echo $myvar
but now I want to convert jdk-6u20 to jdk1.6.0_20. I can't seem to figure out how to do it.
It must be as generic as possible. For example if I had jdk-6u25, I should be able to convert it at the same way to jdk1.6.0_25 so on and so forth
Any suggestions?
Depending on exactly how generic you want it, and how standard your inputs will be, you can probably use AWK to do everything. By using FS="regexp" to specify field separators, you can break down the original string by whatever tokens make the most sense, and put them back together in whatever order using printf.
For example, assuming both dashes and the letter 'u' are only used to separate fields:
myvar="jdk-6u20-solaris-i586.sh"
echo $myvar | awk 'BEGIN {FS="[-u]"}; {printf "%s1.%s.0_%s",$1,$2,$3}'
Flavour according to taste.
Using only Bash:
for file in jdk*i586*
do
file="${file%*-solaris*}"
file="${file/-/1.}"
file="${file/u/.0_}"
do_something_with "$file"
done
i think that sed is the command for you
You can try this snippet:
for fname in *; do
newname=`echo "$fname" | sed 's,^jdk-\([0-9]\)u\([0-9][0-9]*\)-.*$,jdk1.\1.0_\2,'`
if [ "$fname" != "$newname" ]; then
echo "old $fname, new $newname"
fi
done
awk 'if(match($9,"i586")){gsub("jdk-6u20","jdk1.6.0_20");print $9;}'
The if(match()) supersedes the egrep bit if you want to use it. You could use substr($9,1,8) instead of cut as well.
garph0 has a good idea with sed; you could do
myvar=`ls jdk*i586.sh | sed 's/jdk-\([0-9]\)u\([0-9]\+\).\+$/jdk1.\1.0_\2/'`
You're needing the awk in there is an artifact of the -l switch on ls. For pattern substitution on lines of text, sed is the long-time champion:
ls | sed -n '/^jdk/s/jdk-\([0-9][0-9]*\)u\([0-9][0-9]*\)$/jdk1.\1.0_\2/p'
This was written in "old-school" sed which should have greater portability across platforms. The expression says:
don't print lines unless they match -n
on lines beginning with 'jdk' do:
on a line that contains only "jdk-IntegerAuIntegerB"
change it to "jdk.1.IntegerA.0_IntegerB"
and print it
Your sample becomes even simpler as:
myvar=`echo *solaris-i586.sh | sed 's/-solaris-i586\.sh//'`

Resources