Iterate over a string without using space as the separator in Bash - bash

I've been trying to parse some xmllint output for hours, but I can't get it to work the way I need.
Output of xmllint --xpath "//fub/@name" menu.xml:
name="Kessel" name="Lager" name="Puffer " name="Boiler.sen"
name="Boiler.jun" name="HK Senior" name="HK Junior" name="Fbh"
name="Solar" name="F.Wärme" name="Sys "
Now I need to separate all the names (including their spaces) and get them into separate variables.
My approach was this:
fubNames=$(xmllint --xpath "//fub/@name" menu.xml | sed 's/name=//g')
for name in $fubNames
do
echo $name
done
but this does not work, because the for loop splits the string on spaces.
I need the names with their spaces intact. (Note: some names have a trailing space.)
Does anyone know how to do this properly?

I suggest:
xmllint --xpath "//fub/@name" menu.xml | grep -o '"[^"]*"' | while IFS= read -r name; do echo "$name"; done

grep approach:
xmllint --xpath "//fub/@name" menu.xml | grep -Po 'name=\K\"([^"]+)\"'
The output:
"Kessel"
"Lager"
"Puffer "
"Boiler.sen"
"Boiler.jun"
"HK Senior"
"HK Junior"
"Fbh"
"Solar"
"F.Wärme"
"Sys "
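The -P option enables Perl-compatible regular expressions; \K discards what has been matched so far (name=), so only the quoted value is reported.
The -o option prints only the matched part of each line.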
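To end up with the names in separate variables rather than just printed, one option is to load them into a Bash array. This is a minimal sketch, assuming Bash 4+ (for mapfile); the sample variable is made-up stand-in data for the xmllint output, and the sed step that strips the quotes is an addition, not part of the answers above.

```shell
#!/usr/bin/env bash
# Stand-in for the xmllint output (assumption: name="..." pairs, several per line).
sample='name="Kessel" name="Lager" name="Puffer " name="Boiler.sen"
name="HK Senior" name="Sys "'

# grep -o prints each quoted value on its own line; sed removes the quotes.
# mapfile -t strips only the newline, so trailing spaces survive.
mapfile -t names < <(printf '%s\n' "$sample" | grep -o '"[^"]*"' | sed 's/^"//; s/"$//')

# Brackets make trailing spaces visible.
for name in "${names[@]}"; do
    printf '[%s]\n' "$name"
done
```

Each element can then be used on its own, for example "${names[2]}", as long as the expansion is quoted.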

Related

Grep and awk use

I have been trying for a day but could not fix it. I am not familiar with this method.
content query --uri content://com.android.contacts/contacts | grep "+9053158888" | awk -F'[,,= ]' '{cmd="content delete --uri content://com.android.contacts/contacts/"$(NF-3);system(cmd)}'
but it does not find anything.
My string:
Row: 9991 last_time_contacted=0, phonetic_name=NULL, custom_ringtone=NULL, contact_status_ts=NULL, pinned=0, photo_id=NULL, photo_file_id=NULL, contact_status_res_package=NULL, contact_chat_capability=NULL, contact_status_icon=NULL, display_name_alt=+90532555688, sort_key_alt=+90532555688, in_visible_group=1, starred=0, contact_status_label=NULL, phonebook_label=#, is_user_profile=0, has_phone_number=1, display_name_source=40, phonetic_name_style=0, send_to_voicemail=0, lookup=0r10070-24121C1814241820221C1A14.3789r10071-24121C1814241820221C1A14.0r10072-24121C1814241820221C1A14.0r10073-24121C1814241820221C1A14.0r10074-24121C1814241820221C1A14.0r10075-24121C1814241820221C1A14.0r10078-24121C1814241820221C1A14.0r10082-24121C1814241820221C1A14.0r10083-24121C1814241820221C1A14.0r10084-24121C1814241820221C1A14.0r10085-24121C1814241820221C1A14.0r10086-24121C1814241820221C1A14.0r10087-24121C1814241820221C1A14.0r10092-24121C1814241820221C1A14.0r10094-24121C1814241820221C1A14.0r10097-24121C1814241820221C1A14, phonebook_label_alt=#, contact_last_updated_timestamp=1612984348874, photo_uri=NULL, phonebook_bucket=213, contact_status=NULL, display_name=+90532555688, sort_key=+90532555688, photo_thumb_uri=NULL, contact_presence=NULL, in_default_directory=1, times_contacted=0, _id=10097, name_raw_contact_id=10070, phonebook_bucket_alt=213
I need the string "_id=10097".
You may use this grep to find word _id followed by a = and 1+ digits:
... | grep -Eo '\b_id=[0-9]+'
_id=10097
To get all occurrences of it, try the following, written and tested with the shown samples in GNU grep. Here str is a shell variable that holds your sample input.
echo "$str" | grep -oP ', \K_id=\d+'
OR try with awk:
echo "$str" |
awk 'match($0,/, _id=[0-9]+/){print substr($0,RSTART+2,RLENGTH-2)}'
Above will output as:
_id=10097
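If the row is already in a shell variable, Bash's built-in regex matching can pull out the number without grep or awk. A rough sketch, assuming a shortened version of the sample row; the names row, re and id are illustrative only.

```shell
#!/usr/bin/env bash
# Shortened copy of the sample row (assumption: fields are separated by ", ").
row='Row: 9991 last_time_contacted=0, display_name=+90532555688, times_contacted=0, _id=10097, name_raw_contact_id=10070, phonebook_bucket_alt=213'

# Requiring ", " before _id keeps name_raw_contact_id from matching.
re=', _id=([0-9]+)'

id=''
if [[ $row =~ $re ]]; then
    id=${BASH_REMATCH[1]}   # first capture group: the digits only
fi
echo "_id=$id"
```

Keeping the pattern in a variable avoids quoting surprises inside [[ ... =~ ... ]].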

Extract values from a property file using bash

I have a variable which contains key/values separated by space:
echo $PROPERTY
server_geo=BOS db.jdbc_url=jdbc\:mysql\://mysql-test.com\:3306/db02 db.name=db02 db.hostname=/mysql-test.com datasource.class.xa=com.mysql.jdbc.jdbc2.optional.MysqlXADataSource server_uid=BOS_mysql57 hibernate33.dialect=org.hibernate.dialect.MySQL5InnoDBDialect hibernate.connection.username=db02 server_labels=mysql57,mysql5,mysql db.jdbc_class=com.mysql.jdbc.Driver db.schema=db02 hibernate.connection.driver_class=com.mysql.jdbc.Driver uuid=a19ua19 db.primary_label=mysql57 db.port=3306 server_label_primary=mysql57 hibernate.dialect=org.hibernate.dialect.MySQL5InnoDBDialect
I need to extract the value of a single key, for example db.jdbc_url.
I tried a code snippet I found:
echo $PROPERTY | sed -e 's/ db.jdbc_url=\(\S*\).*/\1/g'
but that also returns the properties that come before my key.
How can I fix it?
Thanks
If db.name always follows db.jdbc_url, use grep lookarounds (the space inside the lookahead keeps the trailing space out of the result):
$ echo "${PROPERTY}" | grep -oP '(?<=db\.jdbc_url=).*(?= db\.name)'
jdbc\:mysql\://mysql-test.com\:3306/db02
or add the VAR to an array,
$ myarr=($(echo $PROPERTY))
$ echo "${myarr[1]}" | grep -oP '(?<=db.jdbc_url=).*(?=$)'
jdbc\:mysql\://mysql-test.com\:3306/db02
This happens because you are using the substitute command (sed s/.../.../), which leaves any text outside the matched part unchanged. Putting .* before db\.jdbc_url, together with the start (^) and end ($) anchors, makes the pattern match the whole content of the variable.
To be totally safe, your regex should be:
sed -e 's/^.*db\.jdbc_url=\(\S*\).*$/\1/g'
You can use grep for this, like so:
echo $PROPERTY | grep -oE "db.jdbc_url=\S+" | cut -d'=' -f2
The regex is very close to the one you used with sed.
The -o option prints only the matched part of each matching line.
Edit: if you want only the value, cut on the '='.
Edit 2: egrep reports that it is deprecated, so use grep -oE instead; the result is the same. Just to cover all bases :-)
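If several keys are needed, it may be simpler to split the variable once and look the values up by name. The following is a sketch only, assuming Bash 4+ (associative arrays) and values that contain no whitespace; PROPERTY is shortened here, and pairs and props are illustrative names.

```shell
#!/usr/bin/env bash
# Shortened sample; single quotes keep the backslashes literal.
PROPERTY='server_geo=BOS db.jdbc_url=jdbc\:mysql\://mysql-test.com\:3306/db02 db.name=db02 db.port=3306'

# Split on whitespace into an array; -r keeps backslashes as they are.
read -ra pairs <<< "$PROPERTY"

declare -A props
for pair in "${pairs[@]}"; do
    key=${pair%%=*}     # text before the first '='
    value=${pair#*=}    # text after the first '='
    props["$key"]=$value
done

printf '%s\n' "${props[db.jdbc_url]}"
```

Splitting at the first '=' only means a value that itself contains '=' is kept whole.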

Remove Leading Spaces from a variable in Bash

I have a script that exports an XML file to my desktop and then extracts all the data in the "id" tags and exports that to a CSV file.
xmlstarlet sel -t -m '//id[1]' -v . -n </users/$USER/Desktop/List.xml > /users/$USER/Desktop/List2.csv
I then use the following command to add commas after each number and store it as a variable.
devices=$(sed "s/$/,/g" /users/$USER/Desktop/List2.csv)
If I echo that variable I get an output that looks like this:
123,
124,
125,
etc.
What I need help with is removing that whitespace so that the output looks like 123,124,125, with no leading space. I've tried multiple solutions but can't get any of them to work. Any help would be amazing!
If you don't want newlines, don't tell xmlstarlet to put them there in the first place.
That is, change -n to -o ',' to put a comma after each value rather than a newline:
{ xmlstarlet sel -t -m '//id[1]' -v . -o ',' && printf '\n'; } \
<"/users/$USER/Desktop/List.xml" \
>"/users/$USER/Desktop/List2.csv"
The printf '\n' here puts a final newline at the end of your CSV file after xmlstarlet has finished writing its output.
If you don't want the trailing , this leaves on the output file, the easiest way to be rid of it is to read the result of xmlstarlet into a variable and manipulate it there:
content=$(xmlstarlet sel -t -m '//id[1]' -v . -o ',' <"/users/$USER/Desktop/List.xml")
printf '%s\n' "${content%,}" >"/users/$USER/Desktop/List2.csv"
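The ${content%,} expansion removes one trailing comma if there is one and leaves the value alone otherwise. A small illustration with made-up values (with_comma and without_comma are hypothetical names):

```shell
#!/usr/bin/env bash
# Made-up values standing in for the xmlstarlet output.
with_comma='123,124,125,'
without_comma='123,124,125'

# %, strips the shortest suffix matching "," (a single trailing comma).
trimmed_a=${with_comma%,}
trimmed_b=${without_comma%,}

printf '%s\n' "$trimmed_a" "$trimmed_b"
```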
For a sed solution, try
sed ':a;N;$!ba;y/\n/,/' /users/$USER/Desktop/List2.csv
or if you want a comma even after the last:
sed ':a;N;$!ba;y/\n/,/;s/$/,/' /users/$USER/Desktop/List2.csv
but an easier option would be
tr "\n" "," < /users/$USER/Desktop/List2.csv
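Another option, not mentioned in the answers above, is paste, which joins lines with a delimiter and does not leave a trailing one. A minimal sketch, assuming the input has one plain id per line (without the commas added by the earlier sed step); printf stands in for the real file here.

```shell
#!/usr/bin/env bash
# printf stands in for List2.csv with one id per line (assumption).
# -s joins all input lines into one; -d, uses a comma as the separator;
# the final "-" reads from standard input.
ids=$(printf '%s\n' 123 124 125 | paste -sd, -)

echo "$ids"
```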

How to use sed to extract a string [duplicate]

I need to extract a number from the output of a command, cmd. The output is "type: 1000".
So my question is how to execute the command, store its output in a variable and extract 1000 in a shell script. Also how do you store the extracted string in a variable?
This question has been answered in pieces here before; it would be something like this:
line=$(sed -n '2p' myfile)
echo "$line"
if echo "$line" | grep -q 'type: 1000'; then
echo "It's there!"
fi
Store output of sed into a variable
String contains in Bash
EDIT: sed is very limited, you would need to use bash, perl or awk for what you need.
This is a typical use case for grep:
output=$(cmd | grep -o '[0-9]\+')
You can write the output of a command or even a pipeline of commands into a shell variable using so called command substitution:
variable=$(cmd);
In the comments it turned out that the output of cmd contains more lines than just type: 1000. In that case I would suggest sed, printing the number from the first matching line and then quitting:
output=$(cmd | sed -n '/type: /{s/.*type: \([0-9]\+\).*/\1/p;q}')
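The same idea can be sketched in plain Bash, without sed. This assumes the wanted line starts with "type: "; the cmd function below is a hypothetical stand-in for the real command and its extra lines are made up.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real command (assumption: multi-line output).
cmd() { printf '%s\n' 'name: demo' 'type: 1000' 'size: 3'; }

number=''
while IFS= read -r line; do
    if [[ $line == 'type: '* ]]; then
        number=${line#type: }   # drop the "type: " prefix
        break                   # stop at the first match
    fi
done < <(cmd)

echo "$number"
```

Feeding the loop through process substitution keeps number set after the loop, which would not be the case with cmd | while ... in Bash's default configuration.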
You tagged your question as sed, but the description does not rule out other tools, so here's a solution using awk. Note that there must be no spaces around the = in a shell assignment, and that using ': ' as the field separator keeps the leading space out of the result.
output=`cmd | awk -F': ' '/type: [0-9]+/{print $2}'`
Alternatively, you can use the newer $( ) syntax. Some find the newer syntax preferable, and it can be nested conveniently without escaping backticks.
output=$(cmd | awk -F': ' '/type: [0-9]+/{print $2}')
If the output is rigidly restricted to "type: " followed by a number, you can just use cut.
var=$(echo 'type: 1000' | cut -f 2 -d ' ')
Obviously you'll have to pipe the output of your command to cut, I'm using echo as a demo.
In addition, I'd use grep and then cut if the string you are searching is more complex. Assuming the text can contain all kinds of numbers, but only one occurrence of "type: " followed by a number, you can use the command:
>> var=$(echo "hello 12 type: 1000 foo 1001" | grep -oE "type: [0-9]+" | cut -f 2 -d ' ')
>> echo $var
1000
You can use the | operator to send the output of one command to another, like so:
printf ' 1\n 2\n 3\n' | grep "2"
This sends the three lines " 1", " 2" and " 3" to the grep command, which prints the line containing 2. (printf is used because Bash's echo does not expand \n by default.) It sounds like you might want to do something like:
cmd | grep "type"
Here is a plain sed solution that uses a regular expression to find the number in your string:
cmd | sed 's/^.*type: \([0-9]\+\)/\1/g'
^ anchors the match at the start of the line
.* matches any sequence of characters (including none)
\([0-9]\+\) captures one or more digits
\1 refers to the first captured group (the only one here) and is used as the replacement for everything that was matched
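To see the capture group at work, here is a small sketch on made-up input. It assumes the number may be followed by more text, so a trailing .* is added to the pattern, and it writes [0-9][0-9]* instead of [0-9]\+ because \+ is a GNU extension.

```shell
#!/usr/bin/env bash
# Made-up input line; the real data would come from cmd.
input='status ok type: 1000 units'

# Everything before and after the digits is matched and replaced by group 1.
result=$(printf '%s\n' "$input" | sed 's/^.*type: \([0-9][0-9]*\).*$/\1/')

echo "$result"
```

Lines that do not contain "type: " pass through unchanged; adding -n and the p flag would suppress them.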

extract substring from lines using grep, awk,sed or etc

I have a file with many lines like:
lily weisy
I want to extract www.youtube.com/user/airuike and lily weisy, and then I also want to separate airuike from www.youtube.com/user/,
so I want to get 3 strings: www.youtube.com/user/airuike, airuike and lily weisy.
How can I achieve this? Thanks
do this:
sed -e 's/.*href="\([^"]*\)".*>\([^<]*\)<.*/link:\1 name:\2/' < data
will give you the first part. But I'm not sure what you are doing with it after this.
Since it is HTML, and HTML should be parsed with an HTML parser and not with grep/sed/awk, you could use the pattern matching function of my Xidel.
xidel yourfile.html -e '<a class="yt-uix-sessionlink yt-user-name " dir="ltr">{$link := @href, $user := substring-after($link, "www.youtube.com/user/"), $name:=text()}</a>*'
Or if you want a CSV-like result:
xidel yourfile.html -e '<a class="yt-uix-sessionlink yt-user-name " dir="ltr">{string-join((@href, substring-after(@href, "www.youtube.com/user/"), text()), ", ")}</a>*' --hide-variable-names
It is kind of sad that you also want to have the airuike string; otherwise it could be as simple as
xidel /yourfile.html -e '{$name}*'
(and you were supposed to be able to use xidel '{$name}*', but it seems I haven't thought the syntax through. Just one error check and it is breaking everything. )
$ awk '{split($0,a,/(["<>]|:\/\/)/); u=a[4]; sub(/.*\//,"",a[4]); print u,a[4],a[12]}' file
www.youtube.com/user/airuike airuike lily weisy
I think something like this should work:
while IFS= read -r line
do
href=$(echo "$line" | grep -o 'http[^"]*')
user=$(echo "$href" | grep -o '[^/]*$')
text=$(echo "$line" | grep -o '[^>]*<\/a>$' | grep -o '^[^<]*')
echo "href: $href"
echo "user: $user"
echo "text: $text"
done < yourfile
Regular expressions basics: http://en.wikipedia.org/wiki/Regular_expression#POSIX_Basic_Regular_Expressions
Upd: checked and fixed
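For a single line, the three strings can also be pulled out with Bash's regex matching and parameter expansion. This is only a sketch: the sample line below is an assumed reconstruction of what such an anchor tag might look like, not the asker's actual data, and a real HTML parser remains the more robust choice.

```shell
#!/usr/bin/env bash
# Assumed sample line (hypothetical; the real markup may differ).
line='<a href="http://www.youtube.com/user/airuike" class="yt-uix-sessionlink yt-user-name " dir="ltr">lily weisy</a>'

# Group 1: URL without the scheme. Group 2: link text.
re='href="https?://([^"]*)"[^>]*>([^<]*)</a>'

link='' user='' name=''
if [[ $line =~ $re ]]; then
    link=${BASH_REMATCH[1]}
    name=${BASH_REMATCH[2]}
    user=${link##*/}    # keep only what follows the last '/'
fi

printf '%s\n' "$link" "$user" "$name"
```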

Resources