How can I strip first X characters from string using sed? - bash

I am writing shell script for embedded Linux in a small industrial box. I have a variable containing the text pid: 1234 and I want to strip first X characters from the line, so only 1234 stays. I have more variables I need to "clean", so I need to cut away X first characters and ${string:5} doesn't work for some reason in my system.
The only thing the box seems to have is sed.
I am trying to make the following to work:
result=$(echo "$pid" | sed 's/^.\{4\}//g')
Any ideas?

The following should work:
var="pid: 1234"
var=${var:5}
Are you sure bash is the shell executing your script?
Even the POSIX-compliant
var=${var#?????}
would be preferable to using an external process, although this requires you to hard-code the 5 in the form of a fixed-length pattern.

Here's a concise method to cut the first X characters using cut(1). This example removes the first 4 characters by cutting a substring starting with 5th character.
echo "$pid" | cut -c 5-

Use the -r option ("use extended regular expressions in the script") to sed in order to use the {n} syntax:
$ echo 'pid: 1234'| sed -r 's/^.{5}//'
1234

Cut first two characters from string:
$ string="1234567890"; echo "${string:2}"
34567890

pipe it through awk '{print substr($0,42)}' where 42 is one more than the number of characters to drop. For example:
$ echo abcde| awk '{print substr($0,2)}'
bcde
$

Chances are, you'll have cut as well. If so:
[me#home]$ echo "pid: 1234" | cut -d" " -f2
1234

Well, there have been solutions here with sed, awk, cut and using bash syntax. I just want to throw in another POSIX conform variant:
$ echo "pid: 1234" | tail -c +6
1234
-c tells tail at which byte offset to start, counting from the end of the input data, yet if the the number starts with a + sign, it is from the beginning of the input data to the end.

Another way, using cut instead of sed.
result=`echo $pid | cut -c 5-`

I found the answer in pure sed supplied by this question (admittedly, posted after this question was posted). This does exactly what you asked, solely in sed:
result=\`echo "$pid" | sed '/./ { s/pid:\ //g; }'\``
The dot in sed '/./) is whatever you want to match. Your question is exactly what I was attempting to, except in my case I wanted to match a specific line in a file and then uncomment it. In my case it was:
# Uncomment a line (edit the file in-place):
sed -i '/#\ COMMENTED_LINE_TO_MATCH/ { s/#\ //g; }' /path/to/target/file
The -i after sed is to edit the file in place (remove this switch if you want to test your matching expression prior to editing the file).
(I posted this because I wanted to do this entirely with sed as this question asked and none of the previous answered solved that problem.)

Rather than removing n characters from the start, perhaps you could just extract the digits directly. Like so...
$ echo "pid: 1234" | grep -Po "\d+"
This may be a more robust solution, and seems more intuitive.

This will do the job too:
echo "$pid"|awk '{print $2}'

Related

"grep"ing first 12 of last 24 character from a line

I am trying to extract "first 12 of last 24 character" from a line, i.e.,
for a line:
species,subl,cmp= 1 4 1 s1,torque= 0.41207E-09-0.45586E-13
I need to extract "0.41207E-0".
(I have not written the code, so don't curse me for its formatting. )
I have managed to do this via:
var_s=`grep "species,subl,cmp= $3 $4 $5" $tfile |sed -n '$s/.*\(........................\)$/\1/p'|sed -n '$s/\(............\).*$/\1/p'`
but, is there any more readable way of doing this, rather then counting dots?
EDIT
Thanks to both of you;
so, I have sed,awk grep and bash.
I will run that in loop, for 100's of file.
so, can you also suggest me which one is most efficient, wrt time?
One way with GNU sed (without counting dots):
$ sed -r 's/.*(.{11}).{12}/\1/' file
0.41207E-09
Similarly with GNU grep:
$ grep -Po '.{11}(?=.{12}$)' file
0.41207E-09
Perhaps a python solution may also be helpful:
python -c 'import sys;print "\n".join([a[-24:-13] for a in sys.stdin])' < file
0.41207E-09
I'm not sure your example data and question match up so just change the values in the {n} quantifier accordingly.
Simplest is using pure bash:
echo "${str:(-24):12}"
OR awk can also do that:
awk '{print substr($0, length($0)-23, 12)}' <<< $str
OUTPUT:
0.41207E-09
EDIT: For using bash solution on a file:
while read l; do echo "${l:(-24):12}"; done < file
Another one, less efficient but has the advantage of making you discover new tools
`echo "$str" | rev | cut -b 1-24 | rev | cut -b 1-12
You can use awk to get first 12 characters of last 24 characters from a line:
awk '{substr($0,(length($0)-23))};{print substr($0,(length($0)-10))}' myfile.txt

how to print user1 from user1#10.129.12.121 using shell scripting or sed

I wanted to print the name from the entire address by shell scripting. So user1#12.12.23.234 should give output "user1" and similarly 11234#12.123.12.23 should give output 11234
Reading from the terminal:
$ IFS=# read user host && echo "$user"
<user1#12.12.23.234>
user1
Reading from a variable:
$ address='user1#12.12.23.234'
$ cut -d# -f1 <<< "$address"
user1
$ sed 's/#.*//' <<< "$address"
user1
$ awk -F# '{print $1}' <<< "$address"
user1
Using bash in place editing:
EMAIL='user#server.com'
echo "${EMAIL%#*}
This is a Bash built-in, so it might not be very portable (it won't run with sh if it's not linked to /bin/bash for example), but it is probably faster since it doesn't fork a process to handle the editing.
Using sed:
echo "$EMAIL" | sed -e 's/#.*//'
This tells sed to replace the # character and as many characters that it can find after it up to the end of line with nothing, ie. removing everything after the #.
This option is probably better if you have multiple emails stored in a file, then you can do something like
sed -e 's/#.*//' emails.txt > users.txt
Hope this helps =)
I tend to use expr for this kind of thing:
address='user1#12.12.23.234'
expr "$address" : '\([^#]*\)'
This is a use of expr for its pattern matching and extraction abilities. Translated, the above says: Please print out the longest prefix of $address that doesn't contain an #.
The expr tool is covered by Posix, so this should be pretty portable.
As a note, some historical versions of expr will interpret an argument with a leading - as an option. If you care about guarding against that, you can add an extra letter to the beginning of the string, and just avoid matching it, like so:
expr "x$address" : 'x\([^#]*\)'

Counting commas in a line in bash

Sometimes I receive a CSV file which has a carriage return inside a cell. This is not an acceptable format to a program that will use it as input.
In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?
Strip everything but the commas, and then count number of characters left:
$ echo foo,bar,baz | tr -cd , | wc -c
2
To count the number of times a comma appears, you can use something like awk:
string=(line of input from CSV file)
echo "$string" | awk -F "," '{print NF-1}'
But this really isn't sufficient to determine whether a field has carriage returns in it. Fields can have commas inside as long as they're surrounded by quotes.
What worked for me better than the other solutions was this. If test.txt has:
foo,bar,baz
baz,foo,foobar,bar
Then cat test.txt | xargs -I % sh -c 'echo % | tr -cd , | wc -c' produces
2
3
This works very well for streaming sources, or tailing logs, etc.
In pure Bash:
while IFS=, read -ra array
do
echo "$((${#array[#]} - 1))"
done < inputfile
or
while read -r line
do
count=${line//[^,]}
echo "${#count}"
done < inputfile
Try Perl:
$ perl -ne 'print 0+#{[/,/g]},"\n"'
a
0
a,a
1
a,a,a,a,a
4
Depending on what you are trying to do with the CSV data, it may be helpful to use a wrapper script like csvquote to temporarily replace the problematic newlines (and commas) inside quoted fields, then restore them. For instance:
csvquote inputfile.csv | wc -l
and
csvquote inputfile.csv | cut -d, -f1 | csvquote -u
may be the sort of thing you're looking for. See [https://github.com/dbro/csvquote][1] for the code and more information
An example Python command you could run (since it's going to be installed on most modern shells) is:
python -c "import pathlib; print({l.count(',') for l in pathlib.Path('my_file.csv').read_text().splitlines()})"
This counts the number of commas per line, then makes a set from them (so if your lines all have the same number of commas in, you'll get a set with just that number in).
Just remove all of the carriage returns:
tr -d "\r" old_file > new_file

How to remove the last character from a bash grep output

COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2`
outputs something like this
"Abc Inc";
What I want to do is I want to remove the trailing ";" as well. How can i do that? I am a beginner to bash. Any thoughts or suggestions would be helpful.
This will remove the last character contained in your COMPANY_NAME var regardless if it is or not a semicolon:
echo "$COMPANY_NAME" | rev | cut -c 2- | rev
I'd use sed 's/;$//'. eg:
COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2 | sed 's/;$//'`
foo="hello world"
echo ${foo%?}
hello worl
I'd use head --bytes -1, or head -c-1 for short.
COMPANY_NAME=`cat file.txt | grep "company_name" | cut -d '=' -f 2 | head --bytes -1`
head outputs only the beginning of a stream or file. Typically it counts lines, but it can be made to count characters/bytes instead. head --bytes 10 will output the first ten characters, but head --bytes -10 will output everything except the last ten.
NB: you may have issues if the final character is multi-byte, but a semi-colon isn't
I'd recommend this solution over sed or cut because
It's exactly what head was designed to do, thus less command-line options and an easier-to-read command
It saves you having to think about regular expressions, which are cool/powerful but often overkill
It saves your machine having to think about regular expressions, so will be imperceptibly faster
I believe the cleanest way to strip a single character from a string with bash is:
echo ${COMPANY_NAME:: -1}
but I haven't been able to embed the grep piece within the curly braces, so your particular task becomes a two-liner:
COMPANY_NAME=$(grep "company_name" file.txt); COMPANY_NAME=${COMPANY_NAME:: -1}
This will strip any character, semicolon or not, but can get rid of the semicolon specifically, too.
To remove ALL semicolons, wherever they may fall:
echo ${COMPANY_NAME/;/}
To remove only a semicolon at the end:
echo ${COMPANY_NAME%;}
Or, to remove multiple semicolons from the end:
echo ${COMPANY_NAME%%;}
For great detail and more on this approach, The Linux Documentation Project covers a lot of ground at http://tldp.org/LDP/abs/html/string-manipulation.html
Using sed, if you don't know what the last character actually is:
$ grep company_name file.txt | cut -d '=' -f2 | sed 's/.$//'
"Abc Inc"
Don't abuse cats. Did you know that grep can read files, too?
The canonical approach would be this:
grep "company_name" file.txt | cut -d '=' -f 2 | sed -e 's/;$//'
the smarter approach would use a single perl or awk statement, which can do filter and different transformations at once. For example something like this:
COMPANY_NAME=$( perl -ne '/company_name=(.*);/ && print $1' file.txt )
don't have to chain so many tools. Just one awk command does the job
COMPANY_NAME=$(awk -F"=" '/company_name/{gsub(/;$/,"",$2) ;print $2}' file.txt)
In Bash using only one external utility:
IFS='= ' read -r discard COMPANY_NAME <<< $(grep "company_name" file.txt)
COMPANY_NAME=${COMPANY_NAME/%?}
Assuming the quotation marks are actually part of the output, couldn't you just use the -o switch to return everything between the quote marks?
COMPANY_NAME="\"ABC Inc\";" | echo $COMPANY_NAME | grep -o "\"*.*\""
you can strip the beginnings and ends of a string by N characters using this bash construct, as someone said already
$ fred=abcdefg.rpm
$ echo ${fred:1:-4}
bcdefg
HOWEVER, this is not supported in older versions of bash.. as I discovered just now writing a script for a Red hat EL6 install process. This is the sole reason for posting here.
A hacky way to achieve this is to use sed with extended regex like this:
$ fred=abcdefg.rpm
$ echo $fred | sed -re 's/^.(.*)....$/\1/g'
bcdefg
Some refinements to answer above. To remove more than one char you add multiple question marks. For example, to remove last two chars from variable $SRC_IP_MSG, you can use:
SRC_IP_MSG=${SRC_IP_MSG%??}
cat file.txt | grep "company_name" | cut -d '=' -f 2 | cut -d ';' -f 1
I am not finding that sed 's/;$//' works. It doesn't trim anything, though I'm wondering whether it's because the character I'm trying to trim off happens to be a "$". What does work for me is sed 's/.\{1\}$//'.

String Manipulation in Bash

I am a newbie in Bash and I am doing some string manipulation.
I have the following file among other files in my directory:
jdk-6u20-solaris-i586.sh
I am doing the following to get jdk-6u20 in my script:
myvar=`ls -la | awk '{print $9}' | egrep "i586" | cut -c1-8`
echo $myvar
but now I want to convert jdk-6u20 to jdk1.6.0_20. I can't seem to figure out how to do it.
It must be as generic as possible. For example if I had jdk-6u25, I should be able to convert it at the same way to jdk1.6.0_25 so on and so forth
Any suggestions?
Depending on exactly how generic you want it, and how standard your inputs will be, you can probably use AWK to do everything. By using FS="regexp" to specify field separators, you can break down the original string by whatever tokens make the most sense, and put them back together in whatever order using printf.
For example, assuming both dashes and the letter 'u' are only used to separate fields:
myvar="jdk-6u20-solaris-i586.sh"
echo $myvar | awk 'BEGIN {FS="[-u]"}; {printf "%s1.%s.0_%s",$1,$2,$3}'
Flavour according to taste.
Using only Bash:
for file in jdk*i586*
do
file="${file%*-solaris*}"
file="${file/-/1.}"
file="${file/u/.0_}"
do_something_with "$file"
done
i think that sed is the command for you
You can try this snippet:
for fname in *; do
newname=`echo "$fname" | sed 's,^jdk-\([0-9]\)u\([0-9][0-9]*\)-.*$,jdk1.\1.0_\2,'`
if [ "$fname" != "$newname" ]; then
echo "old $fname, new $newname"
fi
done
awk 'if(match($9,"i586")){gsub("jdk-6u20","jdk1.6.0_20");print $9;}'
The if(match()) supersedes the egrep bit if you want to use it. You could use substr($9,1,8) instead of cut as well.
garph0 has a good idea with sed; you could do
myvar=`ls jdk*i586.sh | sed 's/jdk-\([0-9]\)u\([0-9]\+\).\+$/jdk1.\1.0_\2/'`
You're needing the awk in there is an artifact of the -l switch on ls. For pattern substitution on lines of text, sed is the long-time champion:
ls | sed -n '/^jdk/s/jdk-\([0-9][0-9]*\)u\([0-9][0-9]*\)$/jdk1.\1.0_\2/p'
This was written in "old-school" sed which should have greater portability across platforms. The expression says:
don't print lines unless they match -n
on lines beginning with 'jdk' do:
on a line that contains only "jdk-IntegerAuIntegerB"
change it to "jdk.1.IntegerA.0_IntegerB"
and print it
Your sample becomes even simpler as:
myvar=`echo *solaris-i586.sh | sed 's/-solaris-i586\.sh//'`

Resources