I'm trying to use awk to change the color of cells in an HTML table. Ideally, I would be able to use awk to locate the Nth instance (a variable passed from earlier in the script) of "tg-6k2t" after "Bob" and change the color code to "tg-b5xm". This is a giant HTML table with many different people's names.
<tr>
<td class="tg-6k2t">Bob</td>
<td class="tg-6k2t"></td>
<td class="tg-6k2t"></td>
<td class="tg-6k2t"></td>
<td class="tg-6k2t"></td>
<td class="tg-6k2t"></td>
</tr>
My desired output would be
<tr>
<td class="tg-6k2t">Bob</td>
<td class="tg-6k2t"></td>
<td class="tg-6k2t"></td>
<td class="tg-b5xm"></td>
<td class="tg-6k2t"></td>
<td class="tg-6k2t"></td>
</tr>
You can do it with an Awk statement as follows,
awk -v count=6 '/"tg-6k2t".*Bob/{x=count}x--==1{sub(/tg-6k2t/,"tg-b5xm")}1' file
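which generates the output below. The substitution is applied to the 6th line counting from the line that matches Bob (that line counts as the first); change count to suit. For example, count=4 yields the desired output shown in the question.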
<tr>
<td class="tg-6k2t">Bob</td>
<td class="tg-6k2t"></td>
<td class="tg-6k2t"></td>
<td class="tg-6k2t"></td>
<td class="tg-6k2t"></td>
<td class="tg-b5xm"></td>
</tr>
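The one-liner above counts lines, not occurrences of the class. If the cells are not guaranteed to sit on consecutive lines, a variant that counts matching cells may be closer to what the question asks. This is only a sketch: recolor_nth is a hypothetical helper name, and it assumes the name appears as the full content of a td and that counting should stop at the end of that row.

```shell
# recolor_nth NAME N  (hypothetical helper; reads HTML on stdin)
# Recolors the Nth "tg-6k2t" cell that follows the cell containing NAME.
# The cell holding NAME itself is not counted.
recolor_nth() {
  awk -v name="$1" -v n="$2" '
    # While inside the row of NAME, count cells with the old class.
    seen && /class="tg-6k2t"/ && ++hits == n {
      sub(/tg-6k2t/, "tg-b5xm")
      seen = 0
    }
    # Start counting on the lines after the one that holds NAME.
    index($0, ">" name "</td>") { seen = 1; hits = 0 }
    # Assumption: never count past the end of the row.
    /<\/tr>/ { seen = 0 }
    { print }
  '
}

# Usage: recolor_nth Bob 3 < file
```

With the sample row from the question, N=3 changes the third empty cell after Bob, which is the desired output.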
<blockTable colWidths="34.0,134.0,50.0,50.0,71.0,100.0,100.0" repeatRows="1" style="Table11B">
<tr>
<td>
<para style="P5">2</para>
</td>
</tr>
</blockTable>
I want to execute the above code only if object.rentmaterial is not empty.
You can give your condition like this
<para style="P2">
[[ x.fieldname== True]]
[[ x.date ]]
</para>
I have managed to extract data from a website and then get the relevant data from the extracted webpage. Now I am stuck on how to extract the data from the <td> columns into an array for data manipulation.
My extracted HTML is following:
<tbody>
<tr>
<td>abc3207</td>
<td>151</td>
<td>Lorem Ipsum</td>
<td>Off Campus</td>
<td>OFF</td>
<td>12 of 999 </td>
<td> </td>
<td> </td>
<td>Get</td>
</tr>
<tr>
<td>abc3207</td>
<td>151</td>
<td>Dolor Sit Amet</td>
<td>Mount Lawley</td>
<td>ON</td>
<td>45 of 999 </td>
<td>Activity</td>
<td> </td>
<td>Get</td>
</tr>
</tbody>
I am doing this using a bash script as I must do it via bash only.
To parse HTML or XML, you are better off using dedicated command-line tools such as xmlstarlet or xmllint.
But with your HTML sample, you can try this:
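# -t strips the trailing newline from each array element
mapfile -t td < <(sed -n 's/[\t ]*<td[^>]*>\(.*\)<\/td>/\1/p' file)
for cell in "${td[@]}"; do
    # print via a format string so that % or \ in the data is not interpreted
    printf '%s\n' "$cell"
done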
sed extracts the contents of every td and passes the result to mapfile using process substitution.
mapfile stores each line from the process substitution in an array variable named td.
This works on your simple HTML because it has:
one td tag per line
opening and closing td tags on the same line
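If the cells need to stay grouped by row rather than in one flat array, something like the following might help. It is a sketch under the same assumptions as above (one td per line, opening and closing tag on the same line), plus the assumption that no cell text contains a "|". rows_from_html is a hypothetical helper name.

```shell
# rows_from_html [FILE...]  (hypothetical helper)
# Prints one line per <tr>, with the cell texts joined by "|".
rows_from_html() {
  awk '
    /<tr>/   { row = ""; ncell = 0; next }   # a new row starts
    /<\/tr>/ { print row; next }             # the row is complete
    /<td[^>]*>.*<\/td>/ {
      cell = $0
      sub(/^.*<td[^>]*>/, "", cell)          # drop everything up to the opening tag
      sub(/<\/td>.*$/, "", cell)             # drop the closing tag and what follows
      gsub(/^[ \t]+|[ \t]+$/, "", cell)      # trim surrounding blanks
      row = row (ncell++ ? "|" : "") cell
    }
  ' "$@"
}

# Usage in bash: one array element per row, then split a row into cells.
# mapfile -t rows < <(rows_from_html file)
# IFS='|' read -r -a cells <<< "${rows[0]}"
```

Empty cells such as <td> </td> come out as empty fields, so the column positions stay aligned across rows.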
I have a group of HTML files from which I have to extract the content between <hr> and </hr> tags. I have done everything except this extraction. What I have done is:
1. Loaded all the HTML files and stored them in @html_files.
2. Then I store each file's content in the @useful_files array.
3. Then I loop over the @useful_files array and check each line for <hr>. If it is found, I need the following lines of content in the @elements array.
Is this possible? Am I on the right track?
foreach(@html_files){
    $single_file = $_;
    $elemets = ();
    open $fh, '<', $dir.'/'.$single_file or die "Could not open '$single_file' $!\n";
    @useful_files = ();
    @useful_files = <$fh>;
    foreach(@useful_files){
        $line = $_;
        chomp($line);
        if($line =~ /<hr>/){
            @elements = $line;
        }
    }
    create(@elements,$single_file)
}
Thanks !!!
My input html file will be like this
<HR SIZE="3" style="COLOR:#999999" WIDTH="100%" ALIGN="CENTER">
<P STYLE="margin-top:0px;margin-bottom:0px; text-indent:4%"><FONT STYLE="font-family:Times New Roman" SIZE="2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. </FONT></P>
<P STYLE="font-size:12px;margin-top:0px;margin-bottom:0px"> </P>
<TABLE CELLSPACING="0" CELLPADDING="0" WIDTH="100%" BORDER="0" STYLE="BORDER-COLLAPSE:COLLAPSE">
<TR>
<TD WIDTH="45%"></TD>
<TD VALIGN="bottom" WIDTH="1%"></TD>
<TD WIDTH="4%"></TD>
<TD VALIGN="bottom"></TD>
<TD WIDTH="4%"></TD>
<TD VALIGN="bottom" WIDTH="1%"></TD>
<TD WIDTH="44%"></TD></TR>
<TR>
<TD VALIGN="top"></TD>
<TD VALIGN="bottom"><FONT SIZE="1"> </FONT></TD>
<TD VALIGN="bottom"></TD>
<TD VALIGN="bottom"><FONT SIZE="1"> </FONT></TD>
<TD VALIGN="bottom"><FONT STYLE="font-family:Times New Roman" SIZE="2">Title:</FONT></TD>
<TD VALIGN="bottom"><FONT SIZE="1"> </FONT></TD>
<TD VALIGN="bottom"><FONT STYLE="font-family:Times New Roman" SIZE="2">John</FONT></TD></TR>
</TABLE>
<p Style='page-break-before:always'>
<HR SIZE="3" style="COLOR:#999999" WIDTH="100%" ALIGN="CENTER">
The HTML code I have copied here is just a sample. I need the exact content between the <hr> tags in the @elements array.
In the simplest way, you may do this:
my @cont;
foreach (@ARGV) {
    open my $fh,'<',$_ or die "Could not open '$_': $!\n";
    # join the file into one string, then collect every <hr>...</hr> capture
    push @cont,join('',map { chomp; $_ } <$fh>)=~m%<hr>(.*?)</hr>%g;
}
#print join("\n",@cont,'');
And yes, don't worry: all files will be closed on exit "automagically" :)
Hint: uncomment the print statement to see the result.
You can use grep in the command line:
grep -Pzo '<hr>\K((.|\n)*)(?=</hr>)' file.html
This will allow you to extract anything between <hr> and </hr> even if new lines are present.
Example:
tiago@dell:/tmp$ grep -Pzo '<hr>\K((.|\n)*)(?=</hr>)' <<< '<hr>a b c d </hr>'
a b c d
tiago@dell:/tmp$ grep -Pzo '<hr>\K((.|\n)*)(?=</hr>)' <<< $'<hr>a b\nc d </hr>'
a b
c d
And of course you can run grep against multiple files.
I know people say not to parse HTML with a regex, but this seems like the kind of relatively simple task that warrants the use of a regex.
Try this:
if ($line =~ m/<hr>(.*?)<\/hr>/){
push @elements, $1;
}
This will extract the text between <hr> and </hr> and store it in the next index in the @elements array.
Also you should ALWAYS use strict; and use warnings; at the top of your code! This will stop you from making dumb mistakes and prevent many needless headaches down the road.
You should also close your file after you are done extracting its contents into the @useful_files array! close $fh;
(On a side note, the name of this array is misleading. I would suggest you name it something like @lines or @file_contents since it contains the contents of a single file... not multiple files as your variable name seems to suggest.)
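One thing to watch for: the sample input in the question uses uppercase <HR ...> tags with attributes and has no </hr>, so a literal, case-sensitive <hr> pattern would not match it as posted. A line-based alternative could look like the sketch below. It assumes each <hr> tag sits on its own line and that the tags come in pairs (content is printed between the 1st and 2nd, the 3rd and 4th, and so on). between_hr is a hypothetical helper name.

```shell
# between_hr [FILE...]  (hypothetical helper)
# Prints the lines that lie between a pair of <hr ...> lines, case-insensitively.
between_hr() {
  awk '
    # Any line holding an <hr> tag (with or without attributes) toggles the state.
    tolower($0) ~ /<hr([^a-z]|$)/ { inside = !inside; next }
    inside   # print only while between two <hr> lines
  ' "$@"
}

# Usage: between_hr file.html
```

The <hr> lines themselves are dropped from the output. If the content between every pair of consecutive tags is wanted instead, the toggle would need to be replaced by a different rule.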
I am trying to parse an html table in order to obtain the values. See here.
<tr>
<th>CLI:</th>
<td>0044123456789</td>
</tr>
<tr>
<th>Call Type:</th>
<td>New Enquiry</td>
</tr>
<tr>
<th class=3D"nopaddingtop">Caller's Name:</th>
<td class=3D"nopaddingtop"> </td>
</tr>
<tr>
<th class=3D"nopaddingmid"></th>
<td class=3D"nopaddingmid">Mr</td>
</tr>
<tr>
<th class=3D"nopaddingmid"></th>
<td class=3D"nopaddingmid">Lee</td>
</tr>
<tr>
<th class=3D"nopaddingbot"></th>
<td class=3D"nopaddingbot">Butler</td>
</tr>
I want to read the values associated with "CLI", "Call Type", and "Caller's Name" into separate variables using sed / awk.
For example:
cli="0044123456789"
call_type="New Enquiry"
caller_name="Mr Lee Butler"
How can I do this?
Many thanks, Neil.
An example for the CLI value:
var=$(xmllint --html --xpath '//th[contains(., "CLI")]/../td/text()' file.html)
echo "$var"
For the part that spans multiple <tr> rows:
$ for key in {4..6}; do
xmllint \
--html \
--xpath "//th[contains(., 'CLI')]/../../tr[$key]/td/text()" file.html
printf ' '
done
echo
Output:
Mr Lee Butler
I am dealing with some HTML code and I got stuck on a problem. Here is an extract of the code; the format is exactly the same:
<tr>
<td nowrap valign="top" class="table_1row"><a name="d071301" id="d071301"></a>13-Jul-2011</td>
<td width="21%" valign="top" class="table_1row">LCQ8: Personal data of job</td>
Here I have to match
<tr>
<td nowrap valign="top"
and insert something before <tr>. The problem is that I have to match a pattern that spans different lines.
I have tried
grep -c "<tr>\n<td nowrap valign="top"" test.html
grep -c "<tr>\n*<td nowrap valign="top"" test.html
grep -c "<tr>*<td nowrap valign="top"" test.html
as a test, but none of them works. So I see two ways to approach the problem:
Match <td nowrap valign="top" and insert in the line above
Match whole string
<tr>
<td nowrap valign="top"
Could anyone suggest a way of doing it in either way?
Using sed you can perform a replacement across multiple lines. It is also easy to reuse the match in the substitution.
sed "/\s*<tr>\s*/ { N; s/.*<tr>\n\s*<td.*/insertion\n&/ }"
This cryptic line basically says:
match a line containing <tr> (/\s*<tr>\s*/)
append the next line to the pattern space (N)
substitute the matched pattern with the insertion followed by the matched string, where & represents the matched string (s/.*<tr>\n\s*<td.*/insertion\n&/)
Sed is very powerful for performing substitutions, and it is a tool worth knowing. See this manual if you want to learn more about sed:
http://www.grymoire.com/Unix/Sed.html
Try grep -Pz "tr>\s*\n\s*<td" (GNU grep; without -z grep matches one line at a time, so \n can never match).
It's not clear how it will help you to insert something before <tr>, but anyway.
Quoted strings do not nest, you need to escape the quote characters, or use single quotes instead of double quotes.
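For the "insert in the line above" variant, an awk sketch that holds back one line may be easier to reason about than a multi-line regex. It assumes <tr> and the td start on adjacent lines, as in the extract. insert_before_tr is a hypothetical helper name, and the inserted text goes through awk -v, so backslash escapes in it are interpreted.

```shell
# insert_before_tr TEXT [FILE...]  (hypothetical helper)
# Prints TEXT on its own line before every <tr> whose next line
# starts with <td nowrap valign="top".
insert_before_tr() {
  ins=$1
  shift
  awk -v ins="$ins" '
    NR > 1 {
      # prev is the held-back line, $0 is the line that follows it
      if (prev ~ /<tr>/ && $0 ~ /^[ \t]*<td nowrap valign="top"/) print ins
      print prev
    }
    { prev = $0 }
    END { if (NR > 0) print prev }   # flush the last held-back line
  ' "$@"
}

# Usage: insert_before_tr '<!-- marker -->' test.html
```

Single quotes around the awk program also sidestep the nested double-quote problem mentioned above.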