How can I switch around the content of a line of text - bash

I have a large file (around 39,000 lines of text) that consists of the following:
1:iowemiowe093j4384d
2:98j238d92dd2d
3:98h2d078h78dbe0c
(continues in the same manner)
and I need to reverse the order of the two sections of the lines, so the output would be:
iowemiowe093j4384d:1
98j238d92dd2d:2
98h2d078h78dbe0c:3
I've tried using cut to do this but have not been able to get it to behave properly (this is in a bash environment). What would be the best way to do this?

awk -F: '{print $2":"$1}' input-file
Or
awk -F: '{print $2,$1}' OFS=: input-file
If you may have more than 2 fields:
awk -F: '{printf "%s", $NF; for(i=NF-1; i; i--) printf ":%s", $i; print ""}' input-file
Or
perl -F: -lanE '$, = ":"; say reverse @F' input-file
or
perl -F: -lanE 'say join(":", reverse @F)' input-file
(Both perl one-liners need the -l switch: it strips the trailing newline before the autosplit, so the last field doesn't need a chop.)

One way using GNU sed:
sed -ri 's/([^:]+):(.*)/\2:\1/' file.txt
Results:
iowemiowe093j4384d:1
98j238d92dd2d:2
98h2d078h78dbe0c:3

Using cut and paste with process substitution (the file is read twice), almost as fast as the awk solution from William Pursell, just not as elegant:
paste -d: <(cut -d: -f2 input-file) <(cut -d: -f1 input-file)
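For comparison, a genuinely pure-Bash sketch using parameter expansion to swap the parts before and after the first colon (it will be much slower than awk on 39,000 lines):
while IFS= read -r line; do
    printf '%s:%s\n' "${line#*:}" "${line%%:*}"   # part after the first ':' then the part before it
done < input-file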

Related

uniq sort parsing

I have one file with field separated by ";", like this:
test;group;10.10.10.10;action2
test2;group;10.10.13.11;action1
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test5;group2;10.10.10.12;action5
test6;group4;10.10.13.11;action8
I would like to identify all non-unique IP addresses (3rd column). With the example the extract should be:
test;group;10.10.10.10;action2
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test2;group;10.10.13.11;action1
test6;group4;10.10.13.11;action8
Sorted by IP address (3rd column).
Using simple commands like cat, uniq, sort, awk (not Perl, not Python, only shell).
Any idea?
$ awk -F';' 'NR==FNR{a[$3]++;next}a[$3]>1' file file|sort -t";" -k3
test;group;10.10.10.10;action2
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test2;group;10.10.13.11;action1
test6;group4;10.10.13.11;action8
awk picks all lines whose IP ($3) appears more than once;
sort then sorts them by IP.
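For reference, a commented version of the same two-pass command (the file is named twice because it is read twice):
awk -F';' '
    NR == FNR { a[$3]++; next }   # first pass: count how often each IP occurs
    a[$3] > 1                     # second pass: print lines whose IP occurred more than once
' file file | sort -t";" -k3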
You can also try this solution using grep, cut, sort, uniq, and a casual process substitution in the middle.
grep -f <(cut -d ';' -f3 file | sort | uniq -d) file | sort -t ';' -k3
It is not really elegant (I actually prefer the awk answer given above), but I think it's worth sharing, since it accomplishes what you want.
Here is another awk-assisted pipeline:
$ awk -F';' '{print $0 "\t" $3}' file | sort -sk2 | uniq -Df1 | cut -f1
test;group;10.10.10.10;action2
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
test2;group;10.10.13.11;action1
test6;group4;10.10.13.11;action8
Single pass, so no special caching is needed; it also keeps the original order (stable sort). Assumes a tab doesn't appear in the fields.
This is very similar to Kent's answer, but with a single pass through the file. The tradeoff is memory: you need to store the lines to keep. This uses GNU awk for the PROCINFO variable.
awk -F';' '
    { count[$3]++; lines[$3] = lines[$3] $0 ORS }
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (key in count)
            if (count[key] > 1)
                printf "%s", lines[key]
    }
' file
The equivalent perl:
perl -F';' -lane '
$count{$F[2]}++; push @{$lines{$F[2]}}, $_
} END {
print join $/, @{$lines{$_}}
    for sort grep {$count{$_} > 1} keys %count
' file
awk + sort + uniq + cut:
$ awk -F ';' '{print $0,$3}' <file> | sort -k2 | uniq -D -f1 | cut -d' ' -f1
sort + awk
$ sort -t';' -k3,3 file | awk -F ';' '($3==k){c++;b=b"\n"$0}($3!=k){if (c>1) print b;c=1;k=$3;b=$0}END{if(c>1)print b}'
awk
$ awk -F ';' '{b[$3"_"++k[$3]]=$0; }
END{for (i in k) if(k[i]>1) for(j=1;j<=k[i];j++) print b[i"_"j] }' <file>
This buffers the full file (as sort does) and keeps track of how many times each key appears. At the end, if a key appears more than once, the full set is printed.
test2;group;10.10.13.11;action1
test6;group4;10.10.13.11;action8
test;group;10.10.10.10;action2
test3;group3;10.10.10.10;action3
tes4;group;10.10.10.10;action4
If you want it sorted (asorti requires GNU awk):
$ awk -F ';' '{b[$3"_"++k[$3]]=$0; }
END{ n=asorti(k,l);
for (i=1;i<=n;i++) if(k[l[i]]>1) for(j=1;j<=k[l[i]];j++) print b[l[i]"_"j] }' <file>
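If GNU awk is not available, a portable sketch is to do the same buffering in plain awk and let sort handle the ordering:
awk -F';' '{ count[$3]++; lines[$3] = lines[$3] $0 ORS }
END { for (key in count) if (count[key] > 1) printf "%s", lines[key] }' file | sort -t';' -k3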

Bash: concatenate lines in csv file (1+2, 3+4 etc)

I have a CSV file with increasing integers in the first column and some text after them.
1,text1a,text1b
2,text2a,text2b
3,text3a,text3b
4,text4a,text4b
...
I would like to join lines 1+2, 3+4, etc. and write the outcome to a new CSV file.
The desired output would be
1,text1a,text1b,2,text2a,text2b
3,text3a,text3b,4,text4a,text4b
...
A second option without the numbers would be great as well. The actual input would be
1,text,text,,,text#text.com,2,text.text,text
2,text,text,,,text#text.com,3,text.text,text
3,text,text,,,text#text.com,2,text.text,text
4,text,text,,,text#text.com,3,text.text,text
Desired outcome
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
$ pr -2ats, file
gives you
1,text1a,text1b,2,text2a,text2b
3,text3a,text3b,4,text4a,text4b
UPDATE
for the second part
$ cut -d, -f2- file | pr -2ats,
will give you
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
text,text,,,text#text.com,2,text.text,text,text,text,,,text#text.com,3,text.text,text
awk solution:
awk '{ printf "%s%s",$0,(!(NR%2)? ORS:",") }' input.csv > output.csv
The output.csv content:
1,text1a,text1b,2,text2a,text2b
3,text3a,text3b,4,text4a,text4b
----------
Additional approach (to skip numbers):
awk -F',' '{ printf "%s%s",$2 FS $3,(!(NR%2)? ORS:FS) }' input.csv > output.csv
The output.csv content:
text1a,text1b,text2a,text2b
text3a,text3b,text4a,text4b
3rd approach (for your extended input):
awk -F',' '{ sub(/^[0-9]+,/,"",$0); printf "%s%s",$0,(!(NR%2)? ORS:FS) }' input.csv > output.csv
With bash, cut, GNU sed (for the first~step addresses) and paste:
paste -d, <(cut -d, -f 2- file | sed '2~2d') <(cut -d, -f 2- file | sed '1~2d')
Output:
text1a,text1b,text2a,text2b
text3a,text3b,text4a,text4b
I hoped to get started with something as simple as
printf '%s,%s\n' $(<inputfile)
This turns out wrong when you have spaces inside your text fields.
The improvement is rather a mess:
source <(echo "printf '%s,%s\n' $(sed 's/.*/"&"/' inputfile|tr '\n' ' ')")
Skipping the first field can be done in the same sed command:
source <(echo "printf '%s,%s\n' $(sed -r 's/([^,]*),(.*)/"\2"/' inputfile|tr '\n' ' ')")
EDIT:
This solution will fail when the input has special characters, so you should use a simple solution such as
cut -d, -f2- file | paste -d, - -
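For completeness, a pure-Bash sketch that reads the lines in pairs and drops the leading number field (it assumes an even number of lines):
while IFS= read -r first && IFS= read -r second; do
    printf '%s,%s\n' "${first#*,}" "${second#*,}"   # strip everything up to the first comma on each line
done < input.csv > output.csv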

Bash string replace on command result

I have a simple bash script that gets the load average using uptime and awk, for example
LOAD_5M=$(uptime | awk -F'load averages:' '{ print $2}' | awk '{print $2}')
However this includes a ',' at the end of the load average
e.g.
0.51,
So I have then replaced the comma with a string replace like so:
LOAD_5M=${LOAD_5M/,/}
I'm not an awk or bash whizz-kid, so while this gives me the result I want, I am wondering if there is a more succinct way of writing this, either by:
Using awk to get the load average without the comma, or
Stripping the comma in a single line
You can do that in the same awk command:
uptime | awk -F 'load averages?: *' '{split($2, a, ",? "); print a[2]}'
1.32
The 5 min load is available in /proc/loadavg. You can simply use cut:
cut -d' ' -f2 /proc/loadavg
With awk you can issue:
awk '{print $2}' /proc/loadavg
If you are not working on Linux, the file /proc/loadavg will not be present. In this case I would suggest using sed, like this:
uptime | sed 's/.*, \(.*\),.*,.*/\1/'
uptime | awk -F'load average:' '{ print $2}' | awk -F, '{print $2}'
0.38
(My uptime output has 'load average:' singular)
The load average numbers are always the last 3 fields in the 'uptime' output so:
IFS=' ,' read -a uptime_fields <<<"$(uptime)"
LOAD_5M=${uptime_fields[@]: -2:1}
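On Linux you can also skip uptime and the parsing entirely by letting read split /proc/loadavg (the variable names here are just illustrative):
read -r load_1m load_5m load_15m _ < /proc/loadavg   # the first three fields are the 1/5/15-minute averages
echo "$load_5m"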

Awk and head not identifying columns properly

Here is my script, which splits the 3 columns of hist.txt into 2 separate files: hist1.dat with the first and second columns, and hist2.dat with the first and third columns. The columns in hist.txt may be separated by more than one space. I then want to save into histogram1.dat and histogram2.dat only the first n lines, up to the last nonzero value in the data column.
The script creates histogram1.dat correctly, but histogram2.dat contains all the lines from hist2.dat.
hist.txt is like :
http://pastebin.com/JqgSKZrP
#!/bin/bash
sed 's/\t/ /g' hist.txt | awk '{print $1 " " $2;}' > hist1.dat
sed 's/\t/ /g' hist.txt | awk '{print $1 " " $3;}' > hist2.dat
head -n $( awk 'BEGIN {last=1}; {if($2!=0) last=NR};END {print last}' hist1.dat) hist1.dat > histogram1.dat
head -n $( awk 'BEGIN {last=1}; {if($2!=0) last=NR};END {print last}' hist2.dat) hist2.dat > histogram2.dat
What is the cause of this problem? Might it be due to some special restriction with head?
Thanks.
For your first histogram, try
awk '$2 ~ /000000/{exit}{print $1, $2}' hist.txt
and for your second:
awk '$3 ~ /000000/{exit}{print $1, $3}' hist.txt
Hope I understood you correctly...
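A more direct two-pass sketch, without the intermediate hist1.dat/hist2.dat files, assuming "nonzero" means the data column is not numerically 0 (the first pass records the last line with a nonzero value, the second prints up to it):
awk 'NR==FNR { if ($2 != 0) last = FNR; next } FNR <= last { print $1, $2 }' hist.txt hist.txt > histogram1.dat
awk 'NR==FNR { if ($3 != 0) last = FNR; next } FNR <= last { print $1, $3 }' hist.txt hist.txt > histogram2.dat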

Delete text in file after a match

I have a file with the following:
/home/adversion/web/wp-content/plugins/akismet/index1.php: PHP.Mailer-7 FOUND
/home/beckydodman/web/oldshop/images/google68274020601e.php: Trojan.PHP-1 FOUND
/home/resurgence/web/Issue 272/Batch 2 for Helen/keynote_Philip Baldwin (author revise).doc: W97M.Thus.A FOUND
/home/resurgence/web/Issue 272/from Helen/M keynote_Philip Baldwin.doc: W97M.Thus.A FOUND
/home/skda/web/clients/sandbox/wp-content/themes/editorial/cache/external_dc8e1cb5bf0392f054e59734fa15469b.php: Trojan.PHP-58 FOUND
I need to clean this file up by removing the colon (:) and everything after it,
so that it looks like this:
/home/adversion/web/wp-content/plugins/akismet/index1.php
/home/beckydodman/web/oldshop/images/google68274020601e.php
/home/resurgence/web/Issue 272/Batch 2 for Helen/keynote_Philip Baldwin (author revise).doc
/home/resurgence/web/Issue 272/from Helen/M keynote_Philip Baldwin.doc
/home/skda/web/clients/sandbox/wp-content/themes/editorial/cache/external_dc8e1cb5bf0392f054e59734fa15469b.php
Use awk:
$ awk -F: '{print $1}' input
/home/adversion/web/wp-content/plugins/akismet/index1.php
/home/beckydodman/web/oldshop/images/google68274020601e.php
/home/resurgence/web/Issue 272/Batch 2 for Helen/keynote_Philip Baldwin (author revise).doc
/home/resurgence/web/Issue 272/from Helen/M keynote_Philip Baldwin.doc
/home/skda/web/clients/sandbox/wp-content/themes/editorial/cache/external_dc8e1cb5bf0392f054e59734fa15469b.php
or cut
$ cut -d: -f1 input
or sed
$ sed 's/:.*$//' input
or perl in awk-mode
$ perl -F: -lane 'print $F[0]' input
finally, pure bash
#!/bin/bash
while IFS= read -r line
do
    echo "${line%%:*}"
done < input
This should be enough
awk -F: '{print $1}' file-name
Here is a non-sed/awk solution:
cut -d : -f 1 [filename]
Pipe it through sed:
$ echo "/home/adversion/web/wp-content/plugins/akismet/index1.php: PHP.Mailer-7 FOUND" | sed 's/: .*$//'
/home/adversion/web/wp-content/plugins/akismet/index1.php
This works as long as ': ' doesn't also appear earlier in the line. Note that the awk / cut examples above are more likely to fail, as they match ':' rather than ': '.
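Another short option, if your grep supports -o, is to print only the part of each line before the first colon:
grep -o '^[^:]*' input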
