I have two files.
analizeddata.txt:
A001->A002->A003->A004
A001->A005->A007
A022->A033
[...]
and
matrix.txt:
A001|Scott
A002|Bob
A003|Mark
A004|Jane
A005|Elion
A007|Brooke
A022|Meggie
A023|Tif
[..]
How can I replace the IDs in analizeddata.txt, or produce a new file, using the names in the second column of matrix.txt?
The expected output file would be:
Scott->Bob->Mark->Jane
Scott->Elion->Brooke
Meggie->Tif
[...]
Thanks
Just use sed to replace the strings you want.
sed 's/|/\//g' matrix.txt turns each line of matrix.txt into a pattern such as A001/Scott, which is then used as the regexp/replacement part of a second sed s/regexp/replacement/ command.
The sed -i option updates analizeddata.txt in place, so back the file up before running this command.
for replace_mode in $(sed 's/|/\//g' matrix.txt); do sed -i "s/$replace_mode/g" analizeddata.txt; done
A suggested awk script:
awk -F"|" 'FNR==NR{arr[$1]=$2;next}{for(i in arr)gsub(i,arr[i])}1' matrix.txt analizeddata.txt
With the provided sample data, the result is:
Scott->Bob->Mark->Jane
Scott->Elion->Brooke
Meggie->A033
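One caveat with the gsub loop: it substitutes substrings, so if one ID were a prefix of another (say a hypothetical A00 next to A001), the shorter one could be replaced inside the longer one. Below is a minimal sketch in plain bash that replaces whole tokens only. It assumes bash 4+ and that IDs never contain "->" or "|". translate_chains is a hypothetical helper name, and IDs missing from matrix.txt are left untouched.

```shell
#!/usr/bin/env bash
# Sketch: replace IDs token by token (bash 4+ for associative arrays).
# translate_chains is a hypothetical helper, not part of any tool.
translate_chains() {
    local matrix=$1 data=$2 id name line rest token key out
    local -A names=()
    # Load the "ID|Name" pairs into a lookup table.
    # Keys get an "id:" prefix so an empty token never becomes an empty subscript.
    while IFS='|' read -r id name; do
        key="id:$id"
        names[$key]=$name
    done < "$matrix"
    # Rewrite each chain, splitting on the literal "->" separator.
    while IFS= read -r line || [[ -n $line ]]; do
        out=
        rest=$line
        while [[ $rest == *'->'* ]]; do
            token=${rest%%->*}        # text before the first "->"
            key="id:$token"
            out+="${names[$key]:-$token}->"
            rest=${rest#*->}          # text after the first "->"
        done
        key="id:$rest"
        printf '%s\n' "$out${names[$key]:-$rest}"
    done < "$data"
}
```

It could be called as translate_chains matrix.txt analizeddata.txt > newfile.txt, which leaves the original file alone.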
I want to read an XML file and store its values in variables.
For example,
qhr2400.xml
<XML>
<OPERATION type="1">
<TABLENAME>TABLE</TABLENAME>
<ROWSET>
<ROW>
<CLLI>518</CLLI>
<COLLECTION_DATE>06/04/20 00:45:00</COLLECTION_DATE>
<SS7RT>99</SS7RT>
<AQPRT_1>84</AQPRT_1>
<L7RMSUOCT_01>80</L7RMSUOCT_01>
<L7RMSUOCT_02>80</L7RMSUOCT_02>
</ROW>
</ROWSET>
</OPERATION>
</XML>
I want its value in a variable like $CLLI =518, $COLLECTION_DATE = 06/04/20 00:45:00, SS7RT = 99..
so that I can use these values further to write an insert query.
Basically I want to load this .xml data into a database table.
this is what I tried.
read_xml.sh
awk 'NF==1 && (/ +<[a-zA-Z]+>/ || /^<[a-zA-Z]+>/ || / +<\/[a-zA-Z]+>/){
next
}
{
sub(/^ +/,"")
gsub(/\"|<|>/,"",$0);
sub(/\/.*/,"");
if($0){
print
}
}
' qhr2400.xml
Output
OPERATION type=1
CLLI5018
COLLECTION_DATE06
SS7RT99
AQPRT_184
L7RMSUOCT_0180
L7RMSUOCT_0280
Any help is appreciated.
Thanks!
Don't parse XML/HTML with regex; use a proper XML/HTML parser and a powerful XPath query.
Theory:
According to compiler theory, XML/HTML can't be parsed with regular expressions, which are based on finite state machines. Because XML/HTML is hierarchical, you need a pushdown automaton and an LALR grammar, handled with a tool like YACC.
See also this thread: why-its-not-possible-to-use-regex-to-parse-html-xml
realLife©®™ everyday tool in a shell :
You can use one of the following :
xmllint often installed by default with libxml2, xpath1
xmlstarlet can edit, select, transform... Not installed by default, xpath1
xpath installed via perl's module XML::XPath, xpath1
xidel xpath3
saxon-lint my own project, wrapper over @Michael Kay's Saxon-HE Java library, xpath3
or you can use high level languages and proper libs, I think of :
python's lxml (from lxml import etree)
perl's XML::LibXML, XML::XPath, XML::Twig::XPath, HTML::TreeBuilder::XPath
ruby nokogiri, check this example
php DOMXpath, check this example
Check: Using regular expressions with HTML tags
For this, you need an XML parser and an XPath query in your shell, see:
$ xidel -se '//CLLI/text()' file.xml
Once your XML is fixed (opening/closing tag mismatch: TABLENANE/TABLENAME):
xmllint --xpath '//CLLI/text()' file
This command ships with libxml2 and is far from exotic, since it is installed by default on many Linux distros.
Output
518
So now, you can retrieve all wanted values in shell variables, one example:
$ collectiondate=$(xidel -se '//COLLECTION_DATE/text()' file)
$ echo "$collectiondate"
But please, don't use awk or regex to parse XML.
There are other tools, check:
How to execute XPath one-liners from shell?
Check too: Using regular expressions with HTML tags (same thing for XML)
Going further
declare -A arr
for i in CLLI COLLECTION_DATE SS7RT; do
read arr[$i] < <(xmllint --xpath "//$i/text()" file.xml)
done
Now you have an associative array with the keys CLLI, COLLECTION_DATE and SS7RT:
Keys:
$ printf '%s\n' "${!arr[@]}"
CLLI
SS7RT
COLLECTION_DATE
Values:
$ printf '%s\n' "${arr[@]}"
518
99
06/04/20 00:45:00
for COLLECTION_DATE:
$ echo "${arr[COLLECTION_DATE]}"
06/04/20 00:45:00
It's possible to fill an indexed array in one line too (-t strips the trailing newline from each element):
readarray -t a < <(xidel -se '//*[self::CLLI or self::COLLECTION_DATE or self::SS7RT]/text()' file.xml)
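Since the stated goal is an INSERT statement, the extracted values can be turned into one directly. Here is a rough sketch, assuming the values have already been read into the array as above, that the column names match the XML element names (an assumption; the question does not show the table schema), and that every value can be passed as a quoted string. build_insert is a hypothetical helper. For anything beyond a one-off load, the database client's own parameter binding is the safer route.

```shell
#!/usr/bin/env bash
# Sketch: build an INSERT statement from "COLUMN=value" arguments.
# build_insert is a hypothetical helper; the table and column names are assumptions.
build_insert() {
    local table=$1 pair col val cols= vals= q="'"
    shift
    for pair in "$@"; do
        col=${pair%%=*}          # text before the first "="
        val=${pair#*=}           # text after the first "="
        val=${val//$q/$q$q}      # double any single quote inside the value
        cols+="${cols:+, }$col"
        vals+="${vals:+, }$q$val$q"
    done
    printf 'INSERT INTO %s (%s) VALUES (%s);\n' "$table" "$cols" "$vals"
}
```

With the array from above it could be called as build_insert TABLE "CLLI=${arr[CLLI]}" "COLLECTION_DATE=${arr[COLLECTION_DATE]}" "SS7RT=${arr[SS7RT]}".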
I want its value in a variable like $CLLI =518, $COLLECTION_DATE = 06/04/20 00:45:00, SS7RT = 99.. so that I can use these values further to write an insert query.
I'm going to interpret this as: you want every child node of the "ROW" node, with its value, exported as a variable.
As "Gilles Quenot" already mentioned, please don't parse XML with regex. I'd suggest you give xidel a try.
You could do it manually and call xidel for each and every node...
CLLI=$(xidel -s qhr2400.xml -e '//CLLI')
COLLECTION_DATE=$(xidel -s qhr2400.xml -e '//COLLECTION_DATE')
[...]
...but xidel itself can also export variables, multiple at once even:
#multiple queries, multiple declarations:
xidel -s qhr2400.xml -e 'CLLI:=//CLLI' -e 'COLLECTION_DATE:=//COLLECTION_DATE' -e '[...]' --output-format=bash
#or one query, multiple declarations:
xidel -s qhr2400.xml -e 'CLLI:=//CLLI,COLLECTION_DATE:=//COLLECTION_DATE,[...]' --output-format=bash
CLLI='518'
COLLECTION_DATE='06/04/20 00:45:00'
[...]
The output is just text. To actually set these variables you have to use Bash's eval built-in command:
eval "$(xidel -s qhr2400.xml -e 'CLLI:=//CLLI,COLLECTION_DATE:=//COLLECTION_DATE,[...]' --output-format=bash)"
And finally, to do it fully automatically for every child node of the "ROW" node:
xidel -s qhr2400.xml -e '//ROW/*/name()'
CLLI
COLLECTION_DATE
SS7RT
AQPRT_1
L7RMSUOCT_01
L7RMSUOCT_02
xidel -s qhr2400.xml -e 'for $x in //ROW/*/name() return eval(x"//ROW/{$x}")'
518
06/04/20 00:45:00
99
84
80
80
xidel -s qhr2400.xml -e 'for $x in //ROW/*/name() return eval(x"{$x}:=//ROW/{$x}")[0]' --output-format=bash
CLLI='518'
COLLECTION_DATE='06/04/20 00:45:00'
SS7RT='99'
AQPRT_1='84'
L7RMSUOCT_01='80'
L7RMSUOCT_02='80'
result=
eval "$(xidel -s qhr2400.xml -e 'for $x in //ROW/*/name() return eval(x"{$x}:=//ROW/{$x}")[0]' --output-format=bash)"
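Because eval runs whatever text it is given, it can be worth checking the generated assignments first when the XML comes from a source you don't control. Below is a minimal sketch, assuming the input is one NAME='value' assignment per line as in the output above and that values contain no single quotes or line breaks. assign_checked is a hypothetical helper, not part of xidel. Lines that don't match the pattern (such as the bare result= line) are skipped.

```shell
#!/usr/bin/env bash
# Sketch: assign NAME='value' lines without eval.
# assign_checked is a hypothetical helper. The underscore-prefixed locals are
# there to reduce the chance of clashing with an assigned variable name.
assign_checked() {
    local _line
    local _re="^([A-Za-z_][A-Za-z0-9_]*)='([^']*)'\$"
    while IFS= read -r _line || [[ -n $_line ]]; do
        if [[ $_line =~ $_re ]]; then
            # printf -v stores the value in the named variable; nothing is executed.
            printf -v "${BASH_REMATCH[1]}" '%s' "${BASH_REMATCH[2]}"
        fi
    done
}
```

It has to read from a redirection or process substitution rather than a pipe, e.g. assign_checked < <(xidel ... --output-format=bash), so that the assignments happen in the current shell.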
Another approach is to use XSLT (XSL Transformation)
Here is a fixed and indented version of the OP's XML file:
$ cat demo.xml
<XML>
<OPERATION type="1">
<TABLENAME>TABLE</TABLENAME>
<ROWSET>
<ROW>
<CLLI>518</CLLI>
<COLLECTION_DATE>06/04/20 00:45:00</COLLECTION_DATE>
<SS7RT>99</SS7RT>
<AQPRT_1>84</AQPRT_1>
<L7RMSUOCT_01>80</L7RMSUOCT_01>
<L7RMSUOCT_02>80</L7RMSUOCT_02>
</ROW>
</ROWSET>
</OPERATION>
</XML>
This is the stylesheet I will use:
$ cat demo.xsl
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="utf-8" />
<xsl:strip-space elements="*"/>
<xsl:template match="ROW">
<xsl:text>CLLI="</xsl:text><xsl:value-of select="CLLI"/><xsl:text>" </xsl:text>
<xsl:text>COLLECTION_DATE="</xsl:text><xsl:value-of select="COLLECTION_DATE"/><xsl:text>" </xsl:text>
<xsl:text>SS7RT="</xsl:text><xsl:value-of select="SS7RT"/><xsl:text>" </xsl:text>
<xsl:text>AQPRT_1="</xsl:text><xsl:value-of select="AQPRT_1"/><xsl:text>" </xsl:text>
<xsl:text>L7RMSUOCT_01="</xsl:text><xsl:value-of select="L7RMSUOCT_01"/><xsl:text>" </xsl:text>
<xsl:text>L7RMSUOCT_02="</xsl:text><xsl:value-of select="L7RMSUOCT_02"/><xsl:text>" </xsl:text>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
Here is a simple shell script which uses xsltproc to transform demo.xml into text suitable as input to eval, creating shell variables for the required element values.
$ cat demo.sh
#!/bin/bash
eval "$(xsltproc demo.xsl demo.xml)"
echo "CLLI: $CLLI"
echo "COLLECTION_DATE: $COLLECTION_DATE"
echo "SS7RT: $SS7RT"
echo "AQPRT_1: $AQPRT_1"
echo "L7RMSUOCT_01: $L7RMSUOCT_01"
echo "L7RMSUOCT_02: $L7RMSUOCT_02"
Run the script:
$ ./demo.sh
CLLI: 518
COLLECTION_DATE: 06/04/20 00:45:00
SS7RT: 99
AQPRT_1: 84
L7RMSUOCT_01: 80
L7RMSUOCT_02: 80
$
read_xml.sh
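gawk '
BEGIN {
    # Split each line on the angle brackets: $2 is the tag name, $3 its content.
    FS = "<|>"
}
# Remember every element whose content contains a digit.
$3 ~ /[0-9]/ { vars[$2] = $3 }
END {
    print vars["CLLI"]
    print vars["SS7RT"]
    print vars["COLLECTION_DATE"]
    # etc...
}
' qhr2400.xml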
result:
518
99
06/04/20 00:45:00
Of course, instead of printing in END, you can use the values in the vars array for something else.
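For readers who want the values in shell variables rather than printed by awk, the same line-oriented idea can be sketched in plain bash. It has the same limitation as the awk script: it only handles files where each element sits on one line as <NAME>value</NAME>, and it is not a real XML parser. read_flat_xml and vars are hypothetical names, and bash 4+ is assumed.

```shell
#!/usr/bin/env bash
# Sketch: collect <NAME>value</NAME> lines into an associative array (bash 4+).
# Only flat, one-element-per-line input is handled; this is not an XML parser.
declare -A vars=()

read_flat_xml() {
    local _line
    local _re='^[[:space:]]*<([A-Za-z_][A-Za-z0-9_]*)>([^<]*)</([A-Za-z_][A-Za-z0-9_]*)>[[:space:]]*$'
    while IFS= read -r _line || [[ -n $_line ]]; do
        # Keep the line only if the opening and closing tag names agree.
        if [[ $_line =~ $_re && ${BASH_REMATCH[1]} == "${BASH_REMATCH[3]}" ]]; then
            vars[${BASH_REMATCH[1]}]=${BASH_REMATCH[2]}
        fi
    done
}
```

After read_flat_xml < qhr2400.xml, a value is available as "${vars[CLLI]}".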
Rejecting AWK as an XML or HTML parser is unreasonable. AWK is a great parser for any file, including damaged XML files. Using AWK requires more thought, but in return you don't need to install any exotic software. An XML file can be laid out so that AWK reads some lines incorrectly, but the same can be said of XML analysis tools.
EDIT:
We handle an error in the XML file: a field split across several lines.
The file qhr2400.xml contains:
<CLLI>
518
</CLLI>
instead of
<CLLI>518</CLLI>
call:
tr -d '\n' < qhr2400.xml | sed 's/>[[:space:]]*/>/g; s/[[:space:]]*</\n</g' | awk -f readxml.awk
readxml.awk is now:
BEGIN {
    FS = "<|>"
}
$3 ~ /[0-9]/ { vars[$2] = $3 }
END {
    print vars["CLLI"]
    print vars["SS7RT"]
    print vars["COLLECTION_DATE"]
    # etc...
}
the result is correct
EDIT2
For some time there has been a worrying fashion for adding complexity instead of simplifying the environment. A ready-made additional tool is usually a quick solution and may tempt you with its ease of use. Unfortunately, it is not always possible to install a huge Perl, Python or Ruby environment, e.g. on an embedded system with 32MB of flash; it is not always possible to compile even a smaller tool for your processor architecture; and company policy can rightly prohibit adding anything to the standard set. The same reasoning applies to one-time processing of a file. AWK, sed and tr are usually available, and then they are the only rescue. Also, parsing an XML file does not always mean extracting key-value pairs; it can be something completely different, e.g.
"ROW> <CLLI> 518 </CLLI> <COLLECTION", which makes ready-made XPath-based analysis tools useless. AWK is a programming language written specifically for processing text files, in a practically unlimited way once the standard Unix tools are added.
However, if you have little experience, it is better to rely on ready-made solutions where possible.
I would like to add a counter to each line in a text file, indexed over the alphabet. Is there a straightforward way of doing this? For instance, if my text file looks like this
textAblabla
textBblabla
textCblabla
...
textABblabla
textACblabla
it should be converted into
A textAblabla
B textBblabla
C textCblabla
...
AB textABblabla
AC textACblabla
EDIT: It should be noted that it may very well be the case that each line is distinct. I now highlight this since it seems that my original question has caused some confusion.
EDIT 2: I do not want to print the list, I want to modify (overwrite) the original file with the enumeration.
With Perl:
perl -e '$x="A"; while (<>) {print $x++." $_";}' file
Adapting Cyrus' answer to add the -p command-line option and changing the code to overwrite the original file:
$ perl -pi -e 'BEGIN { $x="A" }; $_ = $x++ . " $_"' your_file_here
With sed, assuming each line already contains its index in uppercase, as in the sample:
sed -E 's/([^A-Z]+)([A-Z]+)(.*)/\2 \1\2\3/' infile
Add -i to overwrite the file.
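Perl's ++ on a string such as "Z" rolls over to "AA", which is what makes the Perl one-liners work for any line content. For comparison, here is a sketch of the same numbering in plain bash, assuming the sequence A..Z, AA..AZ, BA.. (the question's ellipsis leaves the exact sequence open). alpha_label and number_lines are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: spreadsheet-style labels (1 -> A, 26 -> Z, 27 -> AA, ...).
# alpha_label and number_lines are hypothetical helpers.
alpha_label() {
    local n=$1 label= letters=ABCDEFGHIJKLMNOPQRSTUVWXYZ
    while (( n > 0 )); do
        n=$(( n - 1 ))                            # shift to 0-based so Z maps to 25
        label=${letters:$(( n % 26 )):1}$label    # prepend the lowest "digit"
        n=$(( n / 26 ))
    done
    printf '%s' "$label"
}

# Prefix every line read from standard input with its label.
number_lines() {
    local i=0 line
    while IFS= read -r line || [[ -n $line ]]; do
        i=$(( i + 1 ))
        printf '%s %s\n' "$(alpha_label "$i")" "$line"
    done
}
```

To overwrite the original, write to a temporary file first: number_lines < file > file.tmp && mv file.tmp file.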
I'm on Linux. I have a file to modify in my bash script.
My original file looks like this:
...
ERIC-1898
HELENE-5456
THOMAS-54565
IRON-06516
...
I'd like to modify this file so that each word is duplicated (with -SYSTEM- inserted in the second copy) and both copies are wrapped in double quotes.
So the result has to look like this:
...
"ERIC-1898" "ERIC-SYSTEM-1898"
"HELENE-5456" "HELENE-SYSTEM-5456"
"THOMAS-54565" "THOMAS-SYSTEM-54565"
"IRON-06516" "IRON-SYSTEM-06516"
...
How can I do that, for example with sed?
With sed and two capture groups:
$ sed 's/\(.*-\)\(.*\)/"&" "\1SYSTEM-\2"/' infile
"ERIC-1898" "ERIC-SYSTEM-1898"
"HELENE-5456" "HELENE-SYSTEM-5456"
"THOMAS-54565" "THOMAS-SYSTEM-54565"
"IRON-06516" "IRON-SYSTEM-06516"
Assuming that there is exactly one hyphen per input line.
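If a name could itself contain hyphens (a hypothetical input such as JEAN-PAUL-123), one option is to split on the last hyphen. Here is a sketch in plain bash using parameter expansion. add_system is a hypothetical helper, and passing lines without any hyphen through unchanged is an assumption about the desired behaviour.

```shell
#!/usr/bin/env bash
# Sketch: insert -SYSTEM- before the part after the last hyphen.
# add_system is a hypothetical helper; it reads standard input.
add_system() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == *-* ]]; then
            # ${line%-*} drops the last hyphen and what follows it,
            # ${line##*-} keeps only what follows the last hyphen.
            printf '"%s" "%s-SYSTEM-%s"\n' "$line" "${line%-*}" "${line##*-}"
        else
            printf '%s\n' "$line"
        fi
    done
}
```

Usage would be add_system < infile.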
awk solution:
awk -F'-' '{printf("\"%s\" \"%s-SYSTEM-%s\"\n", $1FS$2,$1,$2)}' file
The output would be like:
"ERIC-1898" "ERIC-SYSTEM-1898"
"HELENE-5456" "HELENE-SYSTEM-5456"
"THOMAS-54565" "THOMAS-SYSTEM-54565"
"IRON-06516" "IRON-SYSTEM-06516"
Without using an external program:
#!/bin/bash
# Split each line at the first hyphen; IFS is changed for read only.
while IFS=- read -r first second; do
    printf '"%s-%s" "%s-SYSTEM-%s"\n' "$first" "$second" "$first" "$second"
done < infile
awk '{d=$0; sub(/-/,"-SYSTEM-",d); print "\042" $0 "\042 \042" d "\042"}' file
"ERIC-1898" "ERIC-SYSTEM-1898"
"HELENE-5456" "HELENE-SYSTEM-5456"
"THOMAS-54565" "THOMAS-SYSTEM-54565"
"IRON-06516" "IRON-SYSTEM-06516"
I did some research on how to replace text between delimiters, but because of my lack of knowledge of awk and sed I couldn't adapt a command to my problem. I found the most similar question here, but after adjusting the command to awk '/^(name=|&)/{f=f?0:1}f&&/*/{$0="//" $0}1' file it didn't work. Also, I would like to do the replacement in a variable instead of in a file. And if it isn't too much to ask, a very short explanation would be great :)
I have the following URL in the variable $url, and the variable $new=unnamed384:
http://www.example.com/?name=unnamed293&file=4
I need to replace text between "name=" and "&" with variable $new.
E.g. This is variable $url before:
http://www.example.com/?name=unnamed293&file=4
This is variable $url after:
http://www.example.com/?name=unnamed384&file=4
How about:
$ new="unnamed384"
$ url="http://www.example.com/?name=unnamed293&file=4"
$ sed "s/name=[^&]*/name=$new/" <<< "$url"
http://www.example.com/?name=unnamed384&file=4
s/(.*\?name=)[^\&]*(&.*)/$1$n$2/g
The above will do.
Tested below with perl:
> echo "http://www.example.com/?name=unnamed293&file=4"
http://www.example.com/?name=unnamed293&file=4
> echo "http://www.example.com/?name=unnamed293&file=4" | perl -lne '$n="unamed394";$_=~s/(.*\?name=)[^\&]*(&.*)/$1$n$2/g;print'
http://www.example.com/?name=unamed394&file=4
>
$ new='unnamed384'
$ echo 'http://www.example.com/?name=unnamed293&file=4' | awk -v new="$new" -F'[=&]' '{ print $1 "=" new "&" $3 "=" $4 }'
http://www.example.com/?name=unnamed384&file=4
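Since the URL already lives in a variable, the replacement can also be sketched without any external tool, using bash's regex match. This assumes bash 3.2+ and a name parameter that appears once. set_name_param is a hypothetical helper. Unlike interpolating $new into a sed expression, the new value is not interpreted, so it may contain / or &.

```shell
#!/usr/bin/env bash
# Sketch: replace the value of the "name" query parameter inside a variable.
# set_name_param is a hypothetical helper.
set_name_param() {
    local url=$1 new=$2
    # Group 1: everything up to and including "name="; group 2: the rest after the old value.
    local re='^(.*[?&]name=)[^&]*(.*)$'
    if [[ $url =~ $re ]]; then
        url=${BASH_REMATCH[1]}$new${BASH_REMATCH[2]}
    fi
    printf '%s\n' "$url"
}
```

For example, url=$(set_name_param "$url" "$new") updates $url in place; a URL without a name parameter is returned unchanged.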