rearrange data - bash

If I have a list of data in a text file, separated by newlines, is there a way to prepend something, then the data, then append something else, then the data again?
E.g. a field X would become new X = X;
Can you do this with bash or sed, or just Unix tools like cut?
EDIT:
I am trying to get "ITEM_SITE_ID :{$row['ITEM_SITE_ID']} " as the output.
I am using this line: awk '{ print "\""$1 " {:$row['$1']} " }'
And I get this: "ITEM_SITE_ID {:$row[]}
What have I missed?

I think the problem is your single quotes: they are not properly escaped, and in fact a single quote cannot be escaped inside a single-quoted shell string.
With sed:
sed "s/\(.*\)/\1 = \1;/"
Or in your case:
sed "s/\(.*\)/\"\1 :{\$row['\1']}\"/"
And with bash:
while IFS= read -r line
do
echo "\"$line :{\$row['$line']}\""
done
And you can actually do it in awk using Bash's $'...' strings:
awk $'{ print "\\"" $1 " :{$row[\'" $1 "\']}\\"" }'
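Another option, if you prefer to stay inside ordinary single quotes, is the usual '\'' splice (close the quote, emit an escaped quote, reopen); it produces the same awk program:
awk '{ print "\"" $1 " :{$row['\''" $1 "'\'']}\"" }'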

Awk is often the perfect tool for tasks like this. For your specific example:
awk '{ print "new " $1 " = " $1 ";" }'

Related

How to select the last line of the shell output

Hi, I have a shell command like this:
s3=$(awk 'BEGIN{ print "S3 bucket path" }
/Executing command\(queryId/{ sub(/.*queryId=[^[:space:]]+: /,""); q=$0 }
/s3:\/\//{ print "," $10 }' OFS=',' hive-server2.log)
The output of the above command looks like this:
echo $s3
2018-02-21T17:58:22,
2018-02-21T17:58:26,
2018-02-21T18:05:33,
2018-02-21T18:05:34
I want to select only the last line, i.e. I need the output to be:
2018-02-21T18:05:34
I tried this:
awk -v $s3 '{print $(NF)}'
It's not working. Any help would be appreciated.
In general, command | tail -n 1 prints the last line of the output from command. However, where command is of the form awk '... { ... print something }' you can refactor to awk '... { ... result = something } END { print result }' to avoid spawning a separate process just to discard the other output. (Conversely, you can replace awk '/condition/ { print something }' | head -n 1 with awk '/condition/ { print something; exit }'.)
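For example, using the pattern from your script, these two commands print essentially the same thing, but the second avoids the extra process:
awk '/s3:\/\//{ print $10 }' hive-server2.log | tail -n 1
awk '/s3:\/\//{ last = $10 } END { print last }' hive-server2.log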
If you already have the result in a shell variable s3 and want to print just the last line, a parameter expansion echo "${s3##*$'\n'}" does that. The C-style string $'\n' to represent a newline is a Bash extension, and the parameter expansion operator ## to remove the longest matching prefix isn't entirely portable either, so you should make sure the shebang line says #!/bin/bash, not #!/bin/sh.
Notice also that $s3 without quotes is an error unless you specifically require the shell to perform whitespace tokenization and wildcard expansion on the value. You should basically always use double quotes around variables except in a couple of very specific scenarios.
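A quick demonstration of the difference, with a two-line dummy value:
s3=$'first\nlast'
echo $s3                 # word splitting joins the lines: first last
echo "$s3"               # prints both lines intact
echo "${s3##*$'\n'}"     # prints just: last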
Your Awk command would not work for two reasons: firstly, as explained in the previous paragraph, the unquoted $s3 undergoes word splitting, so only its first token ends up after -v; and secondly, the resulting Awk command line is malformed (probably a syntax error). In more detail, you are basically running
awk -v s3=firstvalue secondvalue thirdvalue '{ print $(NF) }'
       ^ value       ^ script    ^ file names ...
where you probably wanted to say
awk -v s3=$'firstvalue\nsecondvalue\nthirdvalue' '{ print $(NF) }'
But even with quoting, your command would set s3 to something, but then tell Awk to (ignore the variable and) process standard input, which on the command line leaves it reading from your terminal. A fixed script might look like
awk 'END { print }' <<<"$s3"
which passes the variable as standard input to Awk, which prints the last line. The <<<value "here string" syntax is also a Bash extension, and not portable to POSIX sh.
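If you need to stay portable to POSIX sh, printf can take the place of the here string:
printf '%s\n' "$s3" | awk 'END { print }'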
A much simpler way is
command | grep "your filter" | tail -n 1
or directly
command | tail -n 1
You could try this:
echo -e "This is the first line \nThis is the second line" | awk 'END{print}'
Another approach is to process the file from the end and exit after the first match:
tac file | awk '/match/{print; exit}'
Hi, you can do it just by adding echo "$s3" | sed '$!d' (the quotes matter, since they preserve the newlines):
s3=$(awk 'BEGIN{ print "S3 bucket path" }/Executing command\(queryId/{ sub(/.*queryId=[^[:space:]]+: /,""); q=$0 } /s3:\/\//{ print "," $10 }' OFS=',' hive-server2.log)
echo "$s3" | sed '$!d'
It will simply print:
2018-02-21T18:05:34
Hope this will help you.

How to manipulate text to one side of a delimiter while preserving text on the opposite side

I am trying to translate some documents in which every line is of the form:
name1:text to be translated
name2:text to be translated
I am using translate-shell to perform the translations: trans -b :es -input ~/path/to/file
The desired output would be:
name1:texto a traducir
name2:texto a traducir
But instead I am getting this output:
nombre1:texto a traducir
nombre2:texto a traducir
If I had to guess, I would say the answer probably lies in separating the fields with awk, but I'm having difficulty understanding the man pages well enough to figure out how to do it properly. Right now I'm doing this
awk -F: '/:/ { print $1 ": " $2 }' ~/path/to/file
to separate the fields and then attempting to work with each field separately. But I am confused about awk's pattern-action statements. Can I run another command from within awk? So far all my attempts to do so have resulted in syntax errors.
Here is a recipe involving cut and paste:
Cut the names and texts into two separate files:
cut -d: -f1 yourfile > names.txt
cut -d: -f2- yourfile > text.txt
Translate text.txt using whatever workflow you are using at the moment.
Combine the old names.txt with the translated text:
paste -d: names.txt yourtranslated_text
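Put together with the trans invocation from the question (assuming trans writes its result to standard output), the whole recipe might look like:
cut -d: -f1 yourfile > names.txt
cut -d: -f2- yourfile > text.txt
trans -b :es -input text.txt > translated.txt
paste -d: names.txt translated.txt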
I think @LarsFischer has the best answer so far, but just in case you have some reason to need awk, and you can pass individual strings to trans, and the text to be translated cannot contain newlines, this is how you'd do it:
awk '
{
    name = text = $0
    sub(/:.*/, "", name)       # name = everything before the first ":"
    sub(/[^:]+:/, "", text)    # text = everything after the first ":"
    cmd = "trans args \"" text "\""
    if ( (cmd | getline rslt) > 0 ) {
        print name ":" rslt
    }
    close(cmd)
}
' file
Well, I can't get translate-shell to work on my machine, but maybe something like this:
awk -v dq='"' -F: '{printf "%s ", $1; gsub(/^.*:/,""); system("trans -b :es "dq""$0""dq)}' test.in
Another alternative is to paste the original and translated files together and cut the needed fields, that is:
paste -d: original translation | cut -d: -f1,4
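To see why it is fields 1 and 4: paste joins the two two-field files with another :, so each line becomes name:text:nombre:texto, and cut then keeps the first and fourth fields:
paste -d: original translation
name1:text to be translated:nombre1:texto a traducir
paste -d: original translation | cut -d: -f1,4
name1:texto a traducir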

Modify content inside quotation marks, BASH

Good day to all,
I was wondering how to modify the content inside quotation marks while leaving the content outside them unmodified.
Input line:
,,,"Investigacion,,, desarrollo",,,
Output line:
,,,"Investigacion, desarrollo",,,
Initial try:
sed 's/\"",,,""*/,/g'
But nothing happens. Thanks in advance for any clue.
The idiomatic awk way to do this is simply:
$ awk 'BEGIN{FS=OFS="\""} {sub(/,+/,",",$2)} 1' file
,,,"Investigacion, desarrollo",,,
or if you can have more than one set of quoted strings on each line:
$ cat file
,,,"Investigacion,,, desarrollo",,,"foo,,,,bar",,,
$ awk 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) sub(/,+/,",",$i)} 1' file
,,,"Investigacion, desarrollo",,,"foo,bar",,,
This approach works because everything up to the first " is field 1, everything from there to the second " is field 2, and so on; everything between quotes is therefore in the even-numbered fields. It can only fail if you have newlines or escaped double quotes inside your fields, but that would affect every other possible solution too, so you'd need to add such cases to your sample input if you want a solution that handles them.
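You can see the field numbering directly:
echo ',,,"Investigacion,,, desarrollo",,,' | awk -F'"' '{ for (i=1; i<=NF; i++) print i, $i }'
1 ,,,
2 Investigacion,,, desarrollo
3 ,,,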
Using a language that has built-in CSV parsing capabilities, like Perl, will help.
perl -MText::ParseWords -ne '
print join ",", map { $_ =~ s/,,,/,/; $_ } parse_line(",", 1, $_)
' file
,,,"Investigacion, desarrollo",,,
Text::ParseWords is a core module, so you don't need to download it from CPAN. Using the parse_line function, we set the delimiter and a flag to keep the quotes; then we just do a simple substitution and join the line to make your CSV again.
Using egrep, sed and tr:
s=',,,"Investigacion,,, desarrollo",,,'
r=$(egrep -o '"[^"]*"|,' <<< "$s"|sed '/^"/s/,\{2,\}/,/g'|tr -d "\n")
echo "$r"
,,,"Investigacion, desarrollo",,,
Using awk:
awk '{ p = ""; while (match($0, /"[^"]*,{2,}[^"]*"/)) { t = substr($0, RSTART, RLENGTH); gsub(/,+/, ",", t); p = p substr($0, 1, RSTART - 1) t; $0 = substr($0, RSTART + RLENGTH); }; $0 = p $0 } 1'
Test:
$ echo ',,,"Investigacion,,, desarrollo",,,' | awk ...
,,,"Investigacion, desarrollo",,,
$ echo ',,,"Investigacion,,, desarrollo",,,",,, "' | awk ...
,,,"Investigacion, desarrollo",,,", "

Passing variables into awk from bash

I am writing a shell script in which I have to print certain columns of a file, so I am trying to use awk. The column numbers are calculated in the script; nprop is a variable in a for loop that changes from 1 to 8.
avg=1+3*$nprop
awk -v a=$avg '{print $a " " $a+1 " " $a+2}' $filename5 >> neig5.dat
I have also tried the following:
awk -v a=$avg '{print $a " " $(a+1) " " $(a+2) }' $filename5 >> neig5.dat
This results in printing the first three columns all the time.
avg=1+3*$nprop
This will set avg to the literal string 1+3*4 if $nprop is 4, for instance. You should be evaluating that expression:
avg=$(( 1+3*$nprop ))
And use the version of the awk script with parentheses.
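For example, with nprop set to 4:
nprop=4
avg=1+3*$nprop;       echo "$avg"    # prints the literal string 1+3*4
avg=$(( 1+3*nprop )); echo "$avg"    # prints 13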
This single awk script is a translation of what you want:
awk '{for(i=4;i<=25;i+=3)printf "%s %s %s ",$i,$(i+1),$(i+2);print ""}'
You don't need to parse your file 8 times in a shell loop; just parse it once with awk.
Use a BEGIN{ } block to create a couple of awk variables:
avg=$((1+3*$nprop))
awk -v a="$avg" 'BEGIN{ap1=a+1; ap2=a+2} {print $a " " $ap1 " " $ap2}' "$filename5" >> neig5.dat
awk -v n="$nprop" 'BEGIN{x=3*n} {a=x; print $++a, $++a, $++a}' file
If you just want your seed value (nprop) to increment on every pass of the file and process the file 8 times, get rid of your external loop and just do this:
awk 'BEGIN{for (i=2;i<=8;i++) ARGV[++ARGC] = ARGV[1]} FNR==1{p++} {a=3*p; print $++a, $++a, $++a}' file
In GNU awk you can replace the FNR==1{p++} pass counter with the built-in ARGIND variable.
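That variant would look like:
gawk 'BEGIN{for (i=2;i<=8;i++) ARGV[++ARGC] = ARGV[1]} {a=3*ARGIND; print $++a, $++a, $++a}' file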

What is the optimal way to extract values between parentheses in bash/awk?

I have the output in this format:
Infosome - infotwo: (29333) - data-info-ids: (33389, 94934)
I want to extract the last two numbers in the last pair of parentheses. Sometimes there is only a single number in the last pair of parentheses.
This is the code I used.
echo "Infosome - infotwo: (29333) - data-info-ids: (33389, 94934)" | \
tr "," " " | tr "(" " " | tr ")" " " | awk -F: '{print $4}'
Is there a cleaner way to extract the values, or a more optimal one?
Try this:
awk -F '[()]' '{print $(NF-1)}' input | tr -d ,
It's kind of a refactoring of your command: with ( and ) as field separators, the last parenthesized group lands in the next-to-last field (the trailing ) leaves an empty final field), and tr -d , then strips the comma.
awk -F\( '{gsub("[,)]", " ", $NF); print $NF}' input
will give
33389 94934
I am a bit unclear about the meaning of "optimal"/"professional" in this problem's context, but this only uses one command/tool; I'm not sure if that qualifies.
Or building on @kev's approach (but not needing tr to eliminate the comma):
awk -F'[(,)]' '{print $4, $5}' input
outputs:
33389 94934
This can also be done in pure bash. Assuming the text always looks like the sample in the question, the following should work:
$ text="Infosome - infotwo: (29333) - data-info-ids: (33389, 94934)"
$ result="${text/*(}"
$ echo ${result//[,)]}
33389 94934
This uses shell "parameter expansion" (which you can search for in bash's man page) to strip the string in much the same way you did using tr. Strictly speaking, the quotes in the second line are not necessary, but they help with StackOverflow syntax highlighting. :-)
You could alternatively make this a little more flexible by looking for the actual field you're interested in. If you're using GNU awk, you can specify an RS with multiple characters:
$ gawk -vRS=" - " -vFS=": *" '
    { f[$1] = $2; }
    END {
        print f["data-info-ids"];
        # Or you could strip the non-numeric characters to get just numbers.
        #print gensub(/[^0-9 ]/,"","g",f["data-info-ids"]);
    }' <<<"$text"
I prefer this way, because it actually interprets the input data for what it is -- structured text representing some sort of array.
