Move a column using sed - bash

My Awk script generates this output:
1396.0893854748604 jdbc:mysql 192.168.0.8:3306/ycsb 3
I need to put the final column at the start, without simply swapping it with the first. I need to do this using sed, or another pipe that is not awk.
I have tried variants of this command, but with no luck. My output just stays the same.
sed 's#\(.*\),\(.*\),\(.*\)#\4,\1,\2,\3#g'
Just for clarity my desired output would look like this:
3 1396.0893854748604 jdbc:mysql 192.168.0.8:3306/ycsb

You should use awk for this. It's a much better fit:
awk '{print $4, $1, $2, $3}' yourfilename
Update: Oh right... Now I see that you require not using awk again... that's a weird requirement. Leaving this here because it's an otherwise outstanding answer...
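For a sed-only route: the attempted command never matches because the pattern looks for commas while the fields are space-separated, so the s command fails and the line passes through unchanged (it also references \4 while capturing only three groups). A minimal sketch that moves the last space-separated field to the front, assuming no trailing whitespace on the line:
sed -E 's/^(.*) ([^ ]+)$/\2 \1/'
The greedy ^(.*) takes everything up to the last space and ([^ ]+)$ captures the final field, so the replacement simply emits them in the opposite order.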

Related

Find Everything between 2 strings -- Sed

I have file which has data in below format.
{"default":true,"groupADG":["ABC","XYZ:mno"],"groupAPR":true}
{"default":true,"groupADG":["PQR"],"groupAPR":true}
I am trying to get output as
"ABC","XYZ:mno"
"PQR"
I tried doing it using sed but somewhere I am going wrong.
sed -e 's/groupADG":[\(.*\)],"groupAPR"/\1/' file.txt
Regards.
Note: If anyone is rating the question negative, I would request that you give a reason as well. I have tried to fix it myself, and since I was unable to, I posted it here. I also included my trial example.
Here is one potential solution:
sed -n 's/.*\([[].*[]]\).*/\1/p' file.txt
To exclude the brackets:
sed -n 's/.*\([[]\)\(.*\)\([]]\).*/\2/p'
Also, this would work using AWK:
awk -F'[][]' '{print $2}' file.txt
Just watch out for edge cases (e.g. if there are multiple fields with square brackets in the same line, you may need a different strategy).
With your shown samples, the following may also help:
awk 'match($0,/\[[^]]*/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Or, closer to OP's attempt, anchor on "groupADG" as well:
awk 'match($0,/"groupADG":\[[^]]*/){print substr($0,RSTART+12,RLENGTH-12)}' Input_file
With awk, setting FS to [][] and adding the condition /groupADG/:
awk -F'[][]' '/groupADG/ {print $2}' file
"ABC","XYZ:mno"
"PQR"

Grabbing only text/substring between 4th and 7th underscores in all lines of a file

I have a list.txt which contains the following lines.
Primer_Adapter_clean_KL01_BOLD1_100_KL01_BOLD1_100_N701_S507_L001_merged.fasta
Primer_Adapt_clean_KL01_BOLD1_500_KL01_BOLD1_500_N704_S507_L001_merged.fasta
Primer_Adapt_clean_LD03_BOLD2_Sessile_LD03_BOLD2_Sessile_N710_S506_L001_merged.fasta
Now I would like to grab only the substring between the 4th underscore and 7th underscore such that it will appear as below
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
I tried the below awk command but I guess I've got it wrong. Any help here would be appreciated. If this can be achieved via sed, I would be interested in that solution too.
awk -v FPAT="[^__]*" '$4=$7' list.txt
I feel like awk is overkill for this. You can use cut to select just the fields you want:
$ cut -d_ -f5-7 list.txt
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
awk 'BEGIN{FS=OFS="_"} {print $5,$6,$7}' file
Output:
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
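Since a sed solution was also requested: here is a sketch using extended regular expressions (-E; older GNU seds spell it -r). It consumes the first four underscore-delimited fields and keeps the next three:
sed -E 's/^([^_]*_){4}(([^_]*_){2}[^_]*).*/\2/' list.txt
The ([^_]*_){4} eats fields 1-4 together with their underscores, and the second group captures fields 5-7, i.e. the text between the 4th and 7th underscores.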

Paste columns side by side on many files using awk

I've been struggling for quite a while on this problem (please note I'm not a really good bash coder, let alone awk).
I have about 10000 files, each formatted the same way (quite heavy as well, about 3 MB). I would like to get the 3rd column of each file and paste them side by side in a new file.
I found many solutions using paste, awk, or cut, but none of them worked when working with wildcards. For instance,
paste <(awk '{print $3}' file1 ) <(awk '{print $3}' file2 ) <(awk '{print $3}' file3) > output
would work great if I only had 3 files, but I won't type that for 10000 of them. So I gave it a try with wildcards:
paste <(awk '{print $3}' file* ) > output
And it does extract the 3rd columns, but stacked in one single column. I tried some other approaches, but always ended up with the same result. Is there a way to paste them side by side using wildcards?
Thank you very much for your help!
Baptiste G.
EDIT 1: With the help of schorsch312, I found a solution that works for me. Instead of getting the columns and pasting them side by side, I print each column as a line and add them one after the other:
for i in files*; do
    awk '{printf "%s ", $3} END{print ""}' "$i" >> output
done
It works, but 1/ it's quite slow, and 2/ it's not exactly what I asked in the title, as my output file is the "transpose". That doesn't really matter to me, because it's only floats and I can transpose it later with python if needed.
I know that you said awk alone, but I don't know how to do it that way. Here is a simple bash script which does what you'd like:
# do a loop over all your files
for i in file*; do
# use awk to get the 3rd column of each file and save it
awk '{print $3}' "$i" > "row_$i"
done
# now paste them together.
paste row_* > output
# cleanup
rm row_*
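To avoid the temporary files and one process per file, a single awk pass can accumulate each output row in memory. A sketch, assuming every file has the same number of rows, the joined rows fit in memory, and the glob doesn't overflow the argument list:
awk '{ a[FNR] = (FNR in a ? a[FNR] " " : "") $3 } END { for (i = 1; i in a; i++) print a[i] }' file*
Each input line appends its 3rd field to the string for that line number, so line i of the output ends up holding the 3rd column of every file, side by side.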

Delete every other row in CSV file using AWK or grep

I have a file like this:
1000_Tv178.tif,34.88552709
1000_Tv178.tif,
1000_Tv178.tif,34.66987165
1000_Tv178.tif,
1001_Tv180.tif,65.51335742
1001_Tv180.tif,
1002_Tv184.tif,33.83784863
1002_Tv184.tif,
1002_Tv184.tif,22.82542442
1002_Tv184.tif,
How can I make it like this using a simple Bash command?
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
In other words, I need to delete every other row, starting with the second.
Thanks!
hek2mgl's (deleted) answer was on the right track, given the output you actually desire.
awk -F, '$2'
This says, print every row where the second field has a value.
If the second field has a value, but is nothing but whitespace you want to exclude, try this:
awk -F, '$2~/.*[^[:space:]].*/'
You could also do this with sed:
sed '/,$/d'
Which says: delete every line that ends with a comma. I'm sure there's a better way; I avoid sed.
If you really want to explicitly delete every other row:
awk 'NR%2'
This says, print every row where the row number modulo 2 is not zero. If you really want to delete every even row it doesn't actually matter that it's a comma-delimited file.
awk provides a simple way
awk 'NR % 2' file.txt
This might work for you (GNU sed):
sed '2~2d' file
or:
sed 'n;d' file
Here n prints the current line and loads the next one, which d then deletes, so every other line survives.
Here's the GNU sed equivalent of the awk answers provided. Now you can safely use sed's -i flag, by specifying a backup extension:
sed -n -i.bak 'N;P' file.txt
Note that gawk4 can do this too:
gawk -i inplace -v INPLACE_SUFFIX=".bak" 'NR%2==1' file.txt
Results:
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
If OP's input does not contain a space after the last number or the comma, this awk can be used:
awk '!/,$/'
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
But it's not robust at all; any space after the comma breaks it.
This should fix the last space:
awk '!/,[ ]*$/'
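Since the question title also mentions grep, the same filter can be expressed there as well; a sketch with the same trailing-whitespace guard:
grep -v ',[[:space:]]*$' file.txt
The -v inverts the match, so only lines whose second field is non-empty survive.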
Thanks for your help guys, but I also had to make a workaround:
I read the file into R and then wrote it out again. Then I installed the GNU version of awk and used gawk '{if ((FNR % 2) != 0) {print $0}}'. So if anyone else has the same problem, try it!

How do I print a field from a pipe-separated file?

I have a file with fields separated by pipe characters and I want to print only the second field. This attempt fails:
$ cat file | awk -F| '{print $2}'
awk: syntax error near line 1
awk: bailing out near line 1
bash: {print $2}: command not found
Is there a way to do this?
Or just use one command:
cut -d '|' -f FIELDNUMBER
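For the second field here, that would be, for example:
cut -d '|' -f 2 file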
The key point here is that the pipe character (|) must be escaped to the shell. Use "\|" or "'|'" to protect it from shell interpretation and allow it to be passed to awk on the command line.
Reading the comments, I see that the original poster presents a simplified version of the original problem, which involved filtering the file before selecting and printing the fields. A pass through grep was used and the result piped into awk for field selection. That accounts for the wholly unnecessary cat file that appears in the question (it replaces the grep <pattern> file).
Fine, that will work. However, awk is largely a pattern matching tool on its own, and can be trusted to find and work on the matching lines without needing to invoke grep. Use something like:
awk -F\| '/<pattern>/{print $2;}{next;}' file
The /<pattern>/ bit tells awk to perform the action that follows on lines that match <pattern>.
The lost-looking {next;} is a default action skipping to the next line in the input. It does not seem to be necessary, but I have this habit from long ago...
The pipe character needs to be escaped so that the shell doesn't interpret it. A simple solution:
$ awk -F\| '{print $2}' file
Another choice would be to quote the character:
$ awk -F'|' '{print $2}' file
Another way using awk
awk 'BEGIN { FS = "|" } ; { print $2 }'
As written, that last command reads from standard input rather than from the file, so it prints nothing; you should either pipe the file in with 'cat file' or simply list the filename after the awk program.
