I need to split the following string into two parts in AWK. Each of the lines below is one single line.
Dec 10 03:38:49 cgnat1.dd.com 1 2015 Dec 9 14:38:47 02-g4-adsl - - NAT44 - [UserbasedW - 100.70.92.248 vrf-testnet - 222.222.34.125 - 38912 39935 - - ]
Dec 10 03:38:52 cgnat2.dd.com 1 2015 Dec 9 14:38:51 01-g2-adsl - - NAT44 - [UserbasedW - 100.70.21.77 vrf-testnet - 222.222.34.42 - 1024 2047 - - ][UserbasedW - 100.70.21.36 vrf-testnet - 222.222.34.38 - 64512 65535 - - ]
First part:
Dec 10 03:38:49 cgnat1.dd.com 1 2015 Dec 9 14:38:47 02-g4-adsl - - NAT44 -
Second part:
[UserbasedW - 100.70.92.248 vrf-testnet - 222.222.34.125 - 38912 39935 - - ]
How can I achieve this?
awk -F '[' '{print $1 "\n" FS $2}'
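This prints everything before the first [ as the first part, then the bracketed remainder. Note that with -F '[' a line containing several bracketed groups (like the second sample) keeps only the first group in $2. Here is a hedged variant that splits at the first [ only, so all groups stay together in the second part (logfile is an assumed input file name):

awk '{
    i = index($0, "[")           # position of the first "["
    print substr($0, 1, i - 1)   # the header part
    print substr($0, i)          # everything from the first "[" onward
}' logfile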
I'm parsing an HTML file successfully with xmllint, but when I combine two or more XPath expressions I get only one occurrence instead of all of them.
When I run the expressions separately I get something like this:
Expression:
xmllint --html --xpath "//h3[contains(text(),'Rodada')]/../following-sibling::div//span[contains(@class,'partida-desc')][2]/text()" 2012-campeonato-brasileiro.html 2>/dev/null
Result:
Couto Pereira - Curitiba - PR
Aflitos - Recife - PE
Serra Dourada - Goiania - GO
But when I combine the expressions:
prefix="//h3[contains(text(),'Rodada')]/../following-sibling::div"
xmllint --html --xpath "normalize-space(concat($prefix//span[contains(@class,'partida-desc')]/text(),';',$prefix//div[contains(@class,'pull-left')]//img/@title,';',$prefix//div[contains(@class,'pull-right')]//img/@title,';',$prefix//strong/span/text(),';',$prefix//span[contains(@class,'partida-desc')][2]/text()))" 2012-campeonato-brasileiro.html 2>/dev/null
Result:
Sáb, 19/05/2012 18:30 - Jogo: 3 ;Palmeiras - SP;Portuguesa - SP;1 x 1; Pacaembu - Sao Paulo - SP
It works but stops at the first result. I can't make it parse the whole file.
To run this example, you can download the HTML with:
curl https://www.cbf.com.br/futebol-brasileiro/competicoes/campeonato-brasileiro-serie-a/2012 --compressed > /tmp/2012-campeonato-brasileiro.html
In XPath 1.0, when you call a function like normalize-space or concat on an argument that is a node-set, only the string value of the first node in the node-set is used.
In XPath 2 and later you can use e.g. //foo/normalize-space() or //foo/concat(.//bar, ';', .//baz) or string-join(//foo, ';').
With pure XPath 1.0 you would need to iterate over the matching nodes in the host language (e.g. shell, XSLT, or Java) and do the concatenation there.
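For illustration, a minimal sketch of that iteration in the shell, assuming the same file as above (the numeric formatting of xmllint's count() output can vary between versions, hence the defensive stripping):

file=2012-campeonato-brasileiro.html
n=$(xmllint --html --xpath "count(//span[contains(@class,'partida-desc')])" "$file" 2>/dev/null)
n=${n%%.*}   # strip any decimal part some versions print
for ((i = 1; i <= n; i++)); do
    # select the i-th matching node document-wide and print its string value
    xmllint --html --xpath "normalize-space((//span[contains(@class,'partida-desc')])[$i])" "$file" 2>/dev/null
    echo     # the string result comes without a trailing newline
done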
concat() will operate on the first node of a node-set only.
The following command adds more processing to take advantage of xmllint's shell mode:
echo -e "cd //h3[contains(text(),'Rodada')]/../following-sibling::div \n cat .//span[contains(@class,'partida-desc')]/text() | .//div[contains(@class,'pull-left')]//img/@title | .//div[contains(@class,'pull-right')]//img/@title | .//strong/span/text() | .//span[contains(@class,'partida-desc')][2]/text() \nbye\n" | \
xmllint --html --shell 2012-campeonato-brasileiro.html 2>/dev/null | \
tr -s ' ' | grep -v '^ *$' | \
gawk 'BEGIN{ RS="(\n -------){3,3}"; FS="\n -------\n"; OFS=";"} {if(NR>2) { print gensub(/\n/,"","g",$1),gensub(/title="([^"]+)"/,"\\1","g",$2),gensub(/title="([^"]+)"/,"\\1","g",$3),$4,$5}}'
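The regex-valued RS and FS carve xmllint's shell output, which prints ------- separator lines between results, into records of five fields; the gensub() calls then strip the embedded newlines from the first field and unwrap the title="..." attributes into plain text.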
Result:
Sáb, 19/05/2012 21:00 - Jogo: 4 ; Figueirense - SC; Náutico - PE;2 x 1; Orlando Scarpelli - Florianopolis - SC
Dom, 20/05/2012 16:00 - Jogo: 8 ; Ponte Preta - SP; Atlético - MG;0 x 1; Moisés Lucarelli - Campinas - SP
Dom, 20/05/2012 16:00 - Jogo: 5 ; Corinthians - SP; Fluminense - RJ;0 x 1; Pacaembu - Sao Paulo - SP
Dom, 20/05/2012 16:00 - Jogo: 7 ; Botafogo - RJ; São Paulo - SP;4 x 2; João Havelange - Rio de Janeiro - RJ
Dom, 20/05/2012 16:00 - Jogo: 6 ; Internacional - RS; Coritiba - PR;2 x 0; Beira-Rio - Porto Alegre - RS
Dom, 20/05/2012 18:30 - Jogo: 1 ; Vasco da Gama - RJ; Grêmio - RS;2 x 1; São Januário - Rio de Janeiro - RJ
Dom, 20/05/2012 18:30 - Jogo: 2 ; Bahia - BA; Santos - SP;0 x 0; Pituaçu - Salvador - BA
.... (more records)
More cleanup might be needed, since the fields contain leading/trailing spaces.
Note: the HTML needs to be converted to Unix newlines first:
dos2unix 2012-campeonato-brasileiro.html
Thanks for your answers!
Considering my alternatives, this is my best solution so far:
#!/bin/bash
RODADAS=$(xmllint --html --xpath "//h3[contains(text(),'Rodada ')]/text()" $1 2>/dev/null)
while read i
do
for x in {1..10}
do
PREFIX="//h3[contains(text(), '$i')]/../following-sibling::div/ul/li[$x]";
xmllint --html --xpath "normalize-space(concat($PREFIX//span[contains(@class,'partida-desc')]/text(),';',$PREFIX//div[contains(@class,'pull-left')]//img/@title,';',$PREFIX//div[contains(@class,'pull-right')]//img/@title,';',$PREFIX//strong/span/text(),';',$PREFIX//span[contains(@class,'partida-desc')][2]/text()))" $1 2>/dev/null;
done
done <<< "$RODADAS"
Run:
./html-csv.sh 2012-campeonato-brasileiro.html
Result:
Sáb, 01/12/2012 19:30 - Jogo: 373 ;Santos - SP;Palmeiras - SP;3 x 1; Vila Belmiro - Santos - SP
Dom, 02/12/2012 17:00 - Jogo: 372 ;Fluminense - RJ;Vasco da Gama - RJ;1 x 2; João Havelange - Rio de Janeiro - RJ
Dom, 02/12/2012 17:00 - Jogo: 374 ;São Paulo - SP;Corinthians - SP;3 x 1; Pacaembu - Sao Paulo - SP
I have a directory with a lot of files in it.
Each day, new files are added automatically.
The filenames are formatted like this:
[GROUP_ID]_[RANDOM_NUMBER].txt
Example: 012_1234.txt
For every day, for every GROUP_ID (032, 024, 044, etc.), I want to keep only the biggest file of the day.
So, for example, for the two days March 27 and 28 I have:
March 27 - 012_1234.txt - 12ko
March 27 - 012_0243.txt - 3000ko
March 27 - 016_5647.txt - 25ko
March 27 - 024_4354.txt - 20ko
March 27 - 032_8745.txt - 40ko
March 28 - 032_1254.txt - 16ko
March 28 - 036_0456.txt - 30ko
March 28 - 042_7645.txt - 500ko
March 28 - 042_2310.txt - 25ko
March 28 - 042_2125.txt - 34ko
March 28 - 044_4510.txt - 35ko
And I want to have:
March 27 - 012_0243.txt - 3000ko
March 27 - 016_5647.txt - 25ko
March 27 - 024_4354.txt - 20ko
March 27 - 032_8745.txt - 40ko
March 28 - 032_1254.txt - 16ko
March 28 - 036_0456.txt - 30ko
March 28 - 042_7645.txt - 500ko
March 28 - 044_4510.txt - 35ko
I can't find the right bash ls/find command to do that. Does somebody have an idea?
With this command, I can display the biggest file for each day.
ls -l *.txt --time-style=+%s |
awk '{$6 = int($6/86400); print}' |
sort -nk6,6 -nrk5,5 | sort -sunk6,6
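Here int($6/86400) turns the epoch timestamp printed by --time-style=+%s into a day index, the first sort orders each day's files by size descending, and the final stable unique sort (-sunk6,6) keeps only the first, i.e. biggest, entry per day.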
But I want the biggest file of each GROUP_ID for each day.
So, if there is only one file for the "012" group_id, of 10ko, I want to display it, even if there are bigger files for other group_ids...
I found the solution myself:
ls -l | tail -n+2 |
awk '{ split($NF,var,"_"); group_id=var[1]; print $0" "group_id }' |
sort -k9,9 -k5,5nr |
awk '$10 != x { print } { x = $10 }'
This gives me the biggest file for each group_id; now I just have to add handling for the day part.
For information:
tail -n+2: hides the "total" line of the ls output
First awk: extracts the group_id part (012, 036...) from the filename and appends it after the original line ($0)
sort: sorts on the filename and then on the size, descending
Second awk: keeps the biggest file of each group_id (column 10, added by the first awk)
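For the combined day + group_id requirement, a hedged sketch along the same lines (GNU coreutils assumed, and filenames without spaces):

# print: day, size, group_id, name for every file
stat -c '%y %s %n' *.txt |
awk '{ split($NF, p, "_"); print substr($1, 1, 10), $4, p[1], $NF }' |
# order by day, then group_id, then size descending...
sort -k1,1 -k3,3 -k2,2nr |
# ...and keep the first (biggest) entry per (day, group_id)
awk '!seen[$1, $3]++'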
I have a text file looking like
text_a_3 xxx yyy
- - - - - - - - - - -
text_b_2 xyx zyz
- - - - - - - - - - -
text_b_3 xxy zyy
- - - - - - - - - - -
text_a_2 foo bar
- - - - - - - - - - -
text_a_1 foo bla
- - - - - - - - - - -
text_b_1 bla bla
I want to sort this file numerically, based on the first field, so that my output would look like:
text_a_1 foo bla
- - - - - - - - - - -
text_a_2 foo bar
- - - - - - - - - - -
text_a_3 xxx yyy
- - - - - - - - - - -
text_b_1 bla bla
- - - - - - - - - - -
text_b_2 xyx zyz
- - - - - - - - - - -
text_b_3 xxy zyy
I thought sort would do the job. I thus tried
sort -n name_of_my_file
sort -k1 -n name_of_my_file
But it gives
- - - - - - - - - - -
- - - - - - - - - - -
- - - - - - - - - - -
- - - - - - - - - - -
- - - - - - - - - - -
text_a_1 foo bla
text_a_2 foo bar
text_a_3 xxx yyy
text_b_1 bla bla
text_b_2 xyx zyz
text_b_3 xxy zyy
The option --field-separator is not of any help.
Is there any way to achieve this with a one-line, sort-based command?
Or is the only solution to extract the text-containing lines, sort them, and insert the line delimiters afterwards?
Using GNU awk only, relying on the internal sort function asort(), with the record separator set to the dashes line:
awk -v RS='- - - - - - - - - - -\n' '
{a[++c]=$0}
END{
asort(a)
for(i=1;i<=c;i++)
printf "%s%s",a[i],(i==c?"":RS)
}' name_of_my_file
The script first loads the records of the input file into the array a. Once the file has been read, the array is sorted and printed back with the same record separator.
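One caveat: asort() compares strings here, so the single-digit suffixes in the sample sort correctly, but a text_a_10 would sort before text_a_2.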
When the line delimiters are all on the even lines, you can use
paste -d'\r' - - < yourfile | sort -n | tr '\r' '\n'
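paste -d'\r' - - joins every pair of input lines with a carriage return, so each data line carries its delimiter line through sort as a single unit; tr then restores the newlines.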
I actually prefer removing the delimiters first, sorting, and adding them back afterwards, so please reconsider your requirements:
grep -Ev "(- )*-" yourfile | sort -n | sed 's/$/\n- - - - - - - - - - -/'
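Note that the sed here appends a delimiter after every line, including the last one, so the output ends with a trailing delimiter that the input did not have.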
The following sort + awk may help you.
sort -t"_" -k2 -k3 Input_file | awk '/^-/ && !val{val=$0} !/^-/{if(prev){print prev ORS val};prev=$0} END{print prev}'
Adding a non-one-liner form of the solution too now:
sort -t"_" -k2 -k3 Input_file |
awk '
/^-/ && !val{
val=$0}
!/^-/{
if(prev){
print prev ORS val};
prev=$0
}
END{
print prev
}'
I have two text files with the following line format:
Value - Value - Number
I need to merge these files into a new one that contains only the lines with the common Value - Value pairs, followed by the two Number values.
For example if I have these files:
File1.txt
Jack - Mark - 12
Alex - Ryan - 15
Jack - Ryan - 22
File2.txt
Paul - Bill - 11
Jack - Mark - 18
Jack - Ryan - 20
The merged file will contain:
Jack - Mark - 12 - 18
Jack - Ryan - 22 - 20
How can I do this?
awk to the rescue!
awk -F' - ' 'BEGIN{OFS=FS}
NR==FNR{a[$1,$2]=$3;next}
($1,$2) in a{print $1,$2,a[$1,$2],$3}' file1 file2
Jack - Mark - 12 - 18
Jack - Ryan - 22 - 20
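The NR==FNR block runs only while the first file is being read (the overall record number still equals the per-file one); it caches each Value pair's number in a, and the second block then prints a combined line for every pair of file2 that exists in a.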
alternatively, with decorate/join/undecorate
$ join <(sort file1 | sed 's/ - /-/') <(sort file2 | sed 's/ - /-/') |
sed 's/-/ - /'
Jack - Mark - 12 - 18
Jack - Ryan - 22 - 20
I need to split a string into an array.
Here is my string:
Dec 10 03:39:13 cgnat2.dd.com 1 2015 Dec 9 14:39:11 01-g3-adsl - - NAT44 -
[UserbasedW - 100.70.24.236 vrf-testnet - 222.222.34.65 - 3072 4095 - - ]
[UserbasedW - 100.70.25.9 vrf-testnet - 222.222.34.65 - 16384 17407 - - ]
[UserbasedW - 100.70.25.142 vrf-testnet - 222.222.34.69 - 9216 10239 - - ]
My output array should be:
Dec 10 03:39:13 cgnat2.dd.com 1 2015 Dec 9 14:39:11 01-g3-adsl - - NAT44 -
[UserbasedW - 100.70.24.236 vrf-testnet - 222.222.34.65 - 3072 4095 - - ]
[UserbasedW - 100.70.25.9 vrf-testnet - 222.222.34.65 - 16384 17407 - - ]
[UserbasedW - 100.70.25.142 vrf-testnet - 222.222.34.69 - 9216 10239 - - ]
How can I do this?
There are a number of ways to approach the separation into an array. In bash, command substitution and herestrings can make it relatively painless. You will need to control word-splitting, which you can do by setting the internal field separator to break on newlines only. To load the separated string into the array, you have several options. Presuming you have the string contained in a variable named string, you can do something like the following.
oifs="$IFS" ## save current Internal Field Separator
IFS=$'\n' ## set to break on newline only
## separate string into array using sed and herestring
array=( $(sed 's/[ ][[]/\n[/g' <<<"$string") )
IFS="$oifs" ## restore original IFS
This is not the only way to approach the matter. You can also use process substitution along with read to add each of the separated lines to the array with a while read -r line; do array+=("$line"); done < <(sed ...) loop, as sketched below.
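A minimal sketch of that second approach, assuming GNU sed (for the \n in the replacement) and $string holding the log line:

array=()
while IFS= read -r line; do
    array+=("$line")               # one element per separated line
done < <(sed 's/ \[/\n[/g' <<<"$string")

printf '%s\n' "${array[@]}"        # print each element on its own line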
It's not entirely clear what you are trying to do, but the following pipeline does split your string at the boundaries defined by square brackets:
sed -e $'s/\\(\\]\\)/\\1\t/g' -e $'s/\\(\\[\\)/\t\\1/g' | tr '\t' '\n' | grep -v '^$'
Output:
Dec 10 03:39:13 cgnat2.dd.com 1 2015 Dec 9 14:39:11 01-g3-adsl - - NAT44 -
[UserbasedW - 100.70.24.236 vrf-testnet - 222.222.34.65 - 3072 4095 - - ]
[UserbasedW - 100.70.25.9 vrf-testnet - 222.222.34.65 - 16384 17407 - - ]
[UserbasedW - 100.70.25.142 vrf-testnet - 222.222.34.69 - 9216 10239 - - ]