This is the code
for f in tmp_20100923*.xml
do
str1=`more "$f"|grep count=`
i=`echo $str1 | awk -F "." '{print($2)}'`
j=`echo $i | awk -F " " '{print($2)}'` # output is `count="0"`
sed 's/count=//g' $j > $k; echo $k;
done
I tried to get the value 0 from the above output using a sed filter, but without success. Could you please advise how I can separate 0 from the string count="0"?
You can have AWK do everything (gensub is a GNU awk extension, so this needs gawk):
for f in tmp_20100923*.xml
do
k=$(awk -F '.' '/count=/ {split($2,a," "); print gensub("count=","","",a[2])}' "$f")
done
Edit:
Based on your comment, you don't need to split on the decimal. You can also have AWK do the summation. So you don't need a shell loop.
awk '/count=/ { sub("count=","",$2); gsub("\042","",$2); sum += $2} END{print sum}' tmp_20100923*.xml
Remove all non digits from $j:
echo ${j//[^0-9]/}
You are running sed on a file whose name is the value of $j.
Instead, pipe the value into sed:
echo $j | sed 's/count=//g'
You can use this sed regexp:
sed 's/count="\(.*\)"/\1/'
However your script has another problem:
j=`echo $i | awk -F " " '{print($2)}'` # output is `count="0"`
sed 's/count=//g' $j > $k; echo $k;
should be
j=`echo $i | awk -F " " '{print($2)}'` # output is `count="0"`
echo $j | sed 's/count=//g'
or better:
echo $i | awk -F " " '{print($2)}' | sed 's/count=//g'
sed treats its non-option arguments as names of input files. $j is a shell variable holding the output of another program (awk), not a file name.
Also, the ">" redirection writes to a file. You wrote ">$k" and then "echo $k", as if >$k stored the output of sed in the $k variable.
If you want to keep the output of sed in the variable k, write instead:
j=`echo $i | awk -F " " '{print($2)}'` # output is `count="0"`
k=`echo $j | sed 's/count=//g'`
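The two points above (sed reads from a pipe rather than from a variable, and command substitution rather than ">" fills a variable) can be put together in a minimal sketch. The function name extract_count is hypothetical, and the sample line is only an assumption about what the XML looks like, since the question does not show it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a count="..." attribute
# found in the text passed as the first argument.
extract_count() {
    # sed receives the text on standard input, not as a file name.
    printf '%s\n' "$1" | sed -n 's/.*count="\([^"]*\)".*/\1/p'
}

# Assumed sample line; the real XML layout is not shown in the question.
line='<result name="x" count="0" />'

# Command substitution, not ">", stores the output in a variable.
k=$(extract_count "$line")
echo "$k"
```

With -n and the p flag, sed prints nothing for lines that contain no count attribute, so k is simply empty in that case.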
This should snag everything between the quotes.
sed -re 's/count="([^"]+)"/\1/g'
-r is short for --regexp-extended, which lets you do cool stuff with regular expressions, and the expression I've given you means:
search for count=",
then store ( one or more characters that are not a " ), then
make sure it's followed by a ", then
replace everything with the stuff in the parentheses (\1 is the first register)
I have a file log_file which has contents such as
CCO O-MR1 Sync:No:3:No:346:Yes
CCO P Sync:No:1:No:106:Yes
CCO P Checkout:Yes:1:No:10:No
CCO O-MR1 Checkout(2.2):Yes:1:No:10:No
I am trying to obtain the 4 fields based on ":" delimiter
The script that I have is
#!/bin/bash
log_file=$1
for i in `cat $log_file` ; do
echo $i
field_a=`echo $i | awk -F '[:]' '{print $1}'`
echo $field_a
field_b=`echo $i | awk -F '[:]' '{print $2}'`
echo $field_b
...
done
but the value that this code gives for field_a is wrong, it splits the line based on " " delimiter.
echo $i also prints wrong value.
What else can I use to correct this?
This is covered in detail in BashFAQ #1. To summarize, use a while read loop with IFS set to contain (only) the characters that should be used to split fields.
while IFS=: read -r field_a field_b other_fields; do
echo "field_a is $field_a"
echo "field_b is $field_b"
echo "Remaining fields are $other_fields"
done <"$log_file"
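When the number of fields is not fixed, the same IFS=: technique can fill an array instead of named variables. This is only a sketch: the function name split_record is hypothetical, and read -a is a bash feature, not plain sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split one colon-delimited record into an array.
# Spaces inside a field are preserved because IFS contains only ":".
split_record() {
    local -a fields
    IFS=: read -r -a fields <<<"$1"
    printf '%s\n' "${fields[@]}"
}

# Sample record taken from the log file in the question.
split_record 'CCO O-MR1 Sync:No:3:No:346:Yes'
```

Setting IFS on the same line as read limits the change to that one command, so the rest of the script keeps the default word splitting.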
I would like to process a 200 GB file with lines like the following:
...
{"captureTime": "1534303617.738","ua": "..."}
...
The objective is to split this file into multiple files grouped by hours.
Here is my basic script:
#!/bin/sh
echo "Splitting files"
echo "Total lines"
sed -n '$=' $1
echo "First Date"
head -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
echo "Last Date"
tail -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
while read p; do
date=$(echo "$p" | sed 's/{"captureTime": "//' | sed 's/","ua":.*//' | xargs -i date -d '@{}' '+%Y%m%d%H')
echo $p >> split.$date
done <$1
Some facts:
80 000 000 lines to process
jq doesn't work well since some JSON lines are invalid.
Could you help me to optimize this bash script?
Thank you
This awk solution might come to your rescue:
awk -F'"' '{file="split." strftime("%Y%m%d%H",$4); print >> file; close(file) }' "$1"
It essentially replaces your while-loop.
Furthermore, you can replace the complete script with:
# Start AWK file
BEGIN{ FS="\"" }
(NR==1){tmin=tmax=$4}
($4 > tmax) { tmax = $4 }
($4 < tmin) { tmin = $4 }
{ file="split."strftime("%Y%m%d%H",$4); print >> file; close(file) }
END {
print "Total lines processed: ", NR
print "First date: "strftime("%Y%m%d%H",tmin)
print "Last date: "strftime("%Y%m%d%H",tmax)
}
Which you then can run as:
awk -f <awk_file.awk> <jq-file>
Note: strftime is a GNU awk extension, so this requires gawk.
You can start optimizing by replacing this
sed 's/{"captureTime": "//' | sed 's/","ua":.*//'
with this
sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
-n suppress automatic printing of pattern space
-E use extended regular expressions in the script
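As a small illustration of how -n, -E and the p flag work together, here is a sketch that wraps a similar expression in a function. The name capture_time is hypothetical, and the pattern assumes captureTime is always the first key on the line, as in the sample from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the captureTime value of one line.
capture_time() {
    # -n suppresses automatic printing; the p flag prints only lines
    # where the substitution succeeded, so malformed lines print nothing.
    printf '%s\n' "$1" | sed -nE 's/^\{"captureTime": "([0-9.]+)".*/\1/p'
}

# Sample line from the question.
capture_time '{"captureTime": "1534303617.738","ua": "..."}'
```

Because non-matching lines produce no output, the invalid JSON lines mentioned in the question are skipped instead of being passed on to date.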
Here is my code
cd /bin/
echo *xyz?2* | cut -f 1 -d '.'
How can I change this command to display the files without their extensions?
Bests.
Dump the filenames into an array and then use parameter expansion:
$ arr=(xyz?2*); echo "${arr[*]%.*}"
xyz32281 xyz32406 xyz32459 xyz3252 xyz7214 xyz8286
Assuming your filenames don't have any whitespace or glob characters.
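If names with whitespace are a concern, the same parameter expansion can be applied one name at a time. A minimal sketch, with a hypothetical function name strip_ext and made-up file names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each argument with its last ".suffix"
# removed, one name per line. Names without a dot are left unchanged.
strip_ext() {
    local name
    for name in "$@"; do
        printf '%s\n' "${name%.*}"
    done
}

# Assumed names: one with a space and one without an extension.
strip_ext 'xyz32281.txt' 'xyz 3252.log' 'xyz7214'
```

With real files it could be called as strip_ext xyz?2* because the shell passes every matched name as a separate argument, whatever characters it contains.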
You can just use printf '%s\n' instead of echo in your command:
printf '%s\n' *xyz?2* | cut -f 1 -d '.'
xyz32281
xyz32406
xyz32459
xyz3252
xyz7214
xyz8286
If you must use echo, then use awk like this:
echo *xyz?2* | awk '{for(i=1; i<=NF; i++) print (split($i, a, /\./)==2 ? a[1] : $i)}'
xyz32281
xyz32406
xyz32459
xyz3252
xyz7214
xyz8286
This awk command iterates through each filename matched by the glob pattern and splits each name on the dot. If the split yields exactly two parts, the first part is printed; otherwise the full filename is printed.
Your problem is that all the filenames from echo *xyz?2* are printed on one line. When the filenames contain no spaces or newlines, you can fix this by moving them to separate lines and joining them again when finished.
echo *xyz?2* | tr ' ' '\n' | cut -f 1 -d '.' | tr '\n' ' '| sed '$s/ $/\n/'
You can do this a lot easier with sed:
echo *xyz?2* | sed 's/[.][^. ]*//g'
I have a file, let's call it 'a.txt', and this file contains the following line of text
do to what
I'm wondering what the sed command is to reverse the order of the words so that it looks like
what to do
Do I have to do some sort of append? Like append 'do' to 'to' so it would look like
to ++ do (used ++ just to make it clear)
I know tac can do something related
$ cat file
do to what
$ tac -s' ' file
what to do $
Where the -s defines the separator, which is by default a newline.
I would use awk to do this:
awk '{ for (i=NF; i>=1; i--) printf (i!=1) ? $i OFS : $i "\n" }' file.txt
Results:
what to do
EDIT:
If you require a one-liner to modify your file "in-place", try:
{ rm file.txt && awk '{ for (i=NF; i>=1; i--) printf (i!=1) ? $i OFS : $i "\n" }' > file.txt; } < file.txt
sed answer
As this question was tagged sed, my 1st answer was:
First, using an arbitrary _ to mark the spaces already processed, when a.txt contains do to what:
sed -e '
:a;
s/\([^_]*\) \([^ ]*\)/\2_\1/;
ta;
y/_/ /;
' a.txt
what to do
Then, when the input contains do to to to what:
sed -e '
:a;
s/^\(\|.* \)\([^+ ]\+\) \2\([+]*\)\(\| .*\)$/\1\2\3+\4/g;
ta;
:b;
s/\([^_]*\) \([^ ]*\)/\2_\1/;
tb;
y/_/ /;
' <<<'do to to to what'
what to++ do
There is one + for each suppressed duplicate word:
sed -e ':a;s/^\(\|.* \)\([^+ ]\+\) \2\([+]*\)\(\| .*\)$/\1\2\3+\4/g;ta;
:b;s/\([^_]*\) \([^ ]*\)/\2_\1/;tb;
y/_/ /;' <<<'do do to what what what what'
what+++ to do+
bash answer
But as there are a lot of people searching for simple bash solutions, here is a simple way:
xargs < <(uniq <(tac <(tr \ \\n <<<'do do to what what what what')))
what to do
this could be written:
tr \ \\n <<<'do do to what what what what' | tac | uniq | xargs
what to do
or even with some bash scripting:
revcnt () {
local wrd cnt plus out="";
while read cnt wrd; do
printf -v plus %$((cnt-1))s;
out+=$wrd${plus// /+}\ ;
done < <(uniq -c <(tac <(tr \ \\n )));
echo $out
}
Will do:
revcnt <<<'do do to what what what what'
what+++ to do+
Or as pure bash
revcnt() {
local out i;
for ((i=$#; i>0; i--))
do
[[ $out =~ ${!i}[+]*$ ]] && out+=+ || out+=\ ${!i};
done;
echo $out
}
where the string has to be passed as arguments:
revcnt do do to what what what what
what+++ to do+
Or if processing standard input (or a file) is required:
revcnt() {
local out i arr;
while read -a arr; do
out=""
for ((i=${#arr[@]}; i--; 1))
do
[[ $out =~ ${arr[i]}[+]*$ ]] && out+=+ || out+=\ ${arr[i]};
done;
echo $out;
done
}
So you can process multiple lines:
revcnt <<eof
do to what
do to to to what
do do to what what what what
eof
what to do
what to++ do
what+++ to do+
This might work for you (GNU sed):
sed -r 'G;:a;s/^\n//;t;s/^(\S+|\s+)(.*)\n/\2\n\1/;ta' file
Explanation:
G appends a newline (followed by the empty hold space) to the end of the pattern space (PS)
:a defines the loop label a
s/^\n//;t when the newline is at the front of the PS, remove it; t then branches to the end of the script, which prints the line
s/^(\S+|\s+)(.*)\n/\2\n\1/;ta move the leading run of non-space or space characters to directly after the newline and loop to :a
The -r switch makes the regexp easier-on-the-eye (grouping (...), alternation ...|... and the metacharacter for one-or-more + are relieved of the need of a backslash prefix).
Alternative:
sed -E 'G;:a;s/^(\S+)(\s*)(.*\n)/\3\2\1/;ta;s/.//' file
N.B. To reverse the characters of the line, adapt the above solution to:
sed -E 'G;:a;s/^(.)(.*\n)/\2\1/;ta;s/.//' file
Maybe you would like perl for this:
perl -lane '@rev=reverse(@F);print "@rev"' your_file
As Bernhard said, tac can be used here:
#!/usr/bin/env bash
set -eu
echo '1 2 3
2 3 4
3 4 5' | while IFS= read -r; do
echo -n "$REPLY " | tac -s' '
echo
done
$ ./1.sh
3 2 1
4 3 2
5 4 3
I believe my example is more helpful.
Here is a script that reads words from the file replaced.txt and prints the translation of each word on its own line, but I want to display all the output on a single line.
#!/bin/sh
echo
echo "Enter the word to be translated"
read a
IFS=" " # Set the field separator
set $a # Breaks the string into $1, $2, ...
for a # a for loop by default loop through $1, $2, ...
do
{
b= grep "$a" replaced.txt | cut -f 2 -d" "
}
done
Content of "replaced.txt" file is given below:
hllo HELLO
m AM
rshbh RISHABH
jn JAIN
hw HOW
ws WAS
ur YOUR
dy DAY
This question can't be appropriate to what I asked, I just need the help to put output of the script in a single line.
Your entire script can be replaced by:
#!/bin/bash
echo
read -r -p "Enter the words to be translated: " a
echo $(printf "%s\n" $a | grep -Ff - replaced.txt | cut -f 2 -d ' ')
No need for a loop.
The echo with an unquoted argument removes embedded newlines and replaces each sequence of multiple spaces and/or tabs with one space.
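The effect of quoting can be seen in a small sketch. The sample words are an assumption standing in for what cut would print, one per line.

```shell
#!/usr/bin/env bash
# Assumed sample: three words separated by newlines, as cut would emit them.
words=$'HELLO\nHOW\nWAS'

unquoted=$(echo $words)     # word splitting turns the newlines into single spaces
quoted=$(echo "$words")     # quoting keeps the newlines

echo "$unquoted"
echo "$quoted"
```

The first echo prints everything on one line and the second prints one word per line. The unquoted form also expands glob characters such as *, so it is only safe when the words are known to be plain text.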
One hackish-but-simple way to remove trailing newlines from the output of a command is to wrap it in printf %s "$(...) ". That is, you can change this:
b= grep "$a" replaced.txt | cut -f 2 -d" "
to this:
printf %s "$(grep "$a" replaced.txt | cut -f 2 -d" ") "
and add an echo command after the loop completes.
The $(...) notation sets up a "command substitution": the command grep "$a" replaced.txt | cut -f 2 -d" " is run in a subshell, and its output, minus any trailing newlines, is substituted into the argument-list. So, for example, if the command outputs DAY, then the above is equivalent to this:
printf %s "DAY "
(The printf %s ... notation is equivalent to echo -n ... — it outputs a string without adding a trailing newline — except that its behavior is more portably consistent, and it won't misbehave if the string you want to print happens to start with -n or -e or whatnot.)
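The loop-plus-final-echo idea described above might look like this. It is a sketch only: the function name join_line is hypothetical, and it takes the words as arguments instead of looking them up in replaced.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print all arguments on one line, space separated.
join_line() {
    local word
    for word in "$@"; do
        printf '%s ' "$word"   # no newline is added after each word
    done
    echo                       # finish the line once the loop is done
}

# "-n" is printed literally; echo -n could treat it as an option.
join_line HELLO -n DAY
```

The output line ends with a trailing space, just as in the printf %s "$(...) " form above.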
You can also use
awk 'BEGIN { OFS=": "; ORS=" "; } NF >= 2 { print $2; }'
in a pipe after the cut.