Swap columns in shell from a file - bash

Using cut -d":" -f1,3 I made a new file which looks like this:
username1:number1
username2:number2
username3:number3
But what I actually want is for my file to look like this:
number1:username1
number2:username2
number3:username3
I tried cut -d":" -f3,1 but it still gives me username1:number1, even though I want the 3rd column to come first and the 1st column to be printed last. Any help with that?

cut -f3,1 will print the same as cut -f1,3. Use awk:
awk -F: '{print $3 FS $1}' file
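Run against the original colon-delimited file (the one you fed to cut, where the username is field 1 and the number is field 3), this prints the fields in the order you want. A quick check with a made-up passwd-style line:
$ echo 'username1:x:number1:other' | awk -F: '{print $3 FS $1}'
number1:username1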

I like awk for this sort of thing. You've already got an awk answer though.
To do this in pure bash, you'd do the following:
while IFS=: read -r one two; do printf '%s:%s\n' "$two" "$one"; done < input.txt
The IFS variable is the field separator used to slice up the input into separate variables for read, and I'm using printf to give us predictably formatted output.
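For example, if input.txt holds the username:number pairs from the question, the loop produces the swapped output:
$ while IFS=: read -r one two; do printf '%s:%s\n' "$two" "$one"; done < input.txt
number1:username1
number2:username2
number3:username3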

Related

Awk strings enclosed in brackets

I'm trying to make a script to make reading logs easier. I'm having trouble extracting a string enclosed in brackets.
I want to extract the thread ID of a log which looks like this:
[CURRENT_DATE][THREAD_ID][PROCESS_NAME]Some random text here
I have tried this but it prints the CURRENT_DATE:
awk -F '[][]' '{print $2}'
If I use print $3 it prints the Some random text here part.
Is there any way that I could somehow read the string enclosed in brackets?
You may use this awk:
s='[CURRENT_DATE][THREAD_ID][PROCESS_NAME]Some random text here'
awk -F '\\]\\[' '{print $2}' <<< "$s"
THREAD_ID
-F '\\]\\[' will make text ][ as delimiter.
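To see how the line gets split with this delimiter, you can print every field; note that the leading [ stays attached to $1 and everything after the last ] ends up in $3:
$ awk -F '\\]\\[' '{for (i = 1; i <= NF; i++) print i, $i}' <<< "$s"
1 [CURRENT_DATE
2 THREAD_ID
3 PROCESS_NAME]Some random text here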
How about this? (Note that multi-character delimiters seem not to be available in GNU awk 4, or at least not in the awk version the OP is using.)
pattern='[CURRENT_DATE][THREAD_ID][PROCESS_NAME]Some random text here'
echo $pattern
awk -F '[' '{print substr($3, 1, length($3)-1)}' <<< "$pattern"
Different versions of awk behave in different ways. Without knowing what you're running, it's difficult to say why your existing code behaves as it does.
You already know that with a field separator of [][] or just [, you have an empty field at the beginning of each line. Instead, I'd try this:
awk -F']' '{gsub(/\[/,""); print $2}' input.log
This simply strips out the left-square-bracket and uses its fellow as your field delimiter. The advantage of using ] instead of [ is that it makes $1 your first field.
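For example, with the sample line from the question this prints only the thread ID (gsub rewrites $0, which makes awk re-split the fields on ]):
$ echo '[CURRENT_DATE][THREAD_ID][PROCESS_NAME]Some random text here' | awk -F']' '{gsub(/\[/,""); print $2}'
THREAD_ID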

read line by line with awk and parse variables

I have a script that reads log files and parses the data to insert it into a MySQL table.
My script looks like:
while read x;do
var=$(echo ${x}|cut -d+ -f1)
var2=$(echo ${x}|cut -d_ -f3)
...
echo "$var,$var2,.." >> mysql.infile
done<logfile
The problem is that the log files are thousands of lines long and this is taking hours...
I read that awk is better; I tried it, but I don't know the syntax to parse the variables...
EDIT:
The inputs are structured firewall logs, so they are pretty large files, like:
#timestamp $HOST reason="idle Timeout" source-address="x.x.x.x"
source-port="19219" destination-address="x.x.x.x"
destination-port="53" service-name="dns-udp" application="DNS"....
So I'm using a lot of grep calls for ~60 variables, e.g.:
sourceaddress=$(echo ${x}|grep -P -o '.{0,0}source-address=\".{0,50}'|cut -d\" -f2)
If you think perl would be better, I'm open to suggestions and maybe a hint on how to script it...
To answer your question, I assume the following rules of the game:
each line contains various variables
each variable can be found by a different delimiter.
This gives you the following awk script :
awk 'BEGIN{OFS=","}
{ FS="+"; $0=$0; var=$1;
FS="_"; $0=$0; var2=$3;
...
print var, var2, ... >> "mysql.infile"
}' logfile
It basically does the following (a minimal runnable sketch follows the list):
set the output separator to ,
read line
set the field separator to +, re-parse the line ($0=$0) and determine the first variable
set the field separator to '_', re-parse the line ($0=$0) and determine the second variable
... continue for all variables
print the line to the output file.
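As a sketch of the FS-switching trick, run on a made-up line abc+def_ghi_jkl (hypothetical sample data, not the OP's log format):
$ echo 'abc+def_ghi_jkl' | awk 'BEGIN{OFS=","} {FS="+"; $0=$0; var=$1; FS="_"; $0=$0; var2=$3; print var, var2}'
abc,jkl
Here var is everything before the first + and var2 is the third _-separated field, mirroring the cut -d+ -f1 and cut -d_ -f3 calls in the original script.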
The perl script below might help:
perl -ane '/^[^+]*/;printf "%s,",$&;/^([^_]*_){2}([^_]*){1}_.*/;printf "%s\n",$+' logfile
Since $& can result in a performance penalty, you could also use the /p modifier, as below:
perl -ane '/^[^+]*/p;printf "%s,",${^MATCH};/^([^_]*_){2}([^_]*){1}_.*/;printf "%s\n",$+' logfile
For more on Perl regex matching, refer to the Perl documentation (perldoc perlre).
if you're extracting the values in order, something like this will help
$ awk -F\" '{for(i=2;i<=NF;i+=2) print $i}' file
idle Timeout
x.x.x.x
19219
x.x.x.x
53
dns-udp
DNS
you can easily change the output format as well
$ awk -F\" -v OFS=, '{for(i=2;i<=NF;i+=2)
printf "%s", $i ((i>NF-2)?ORS:OFS)}' file
idle Timeout,x.x.x.x,19219,x.x.x.x,53,dns-udp,DNS

Efficiently extract multiple columns into variables

This seems like it should be simple, but has been driving me nuts trying to find a way to phrase a search to answer it.
Quite simply I have a file structured as three columns consisting of a name, a path, and a pattern. While looping through the lines I would like to extract each of these into its own variable for use.
The solution I have currently is (forgive typos):
while read line
do
name=$(echo "$line" | awk '{print $1}')
path=$(echo "$line" | awk '{print $2}')
pattern=$(echo "$line" | awk '{print $3}')
done < "myFile.txt"
However, this seems like a really inefficient way to do it, as it means invoking awk once for each variable. So, is there a better way to do this? The delimiter is just a tab at the moment, but I can change it to anything that's easier to work with.
while read name path pattern
do
# Do something
done < myFile.txt
And if you want to change your delimiter to something else, like a ,:
while IFS=, read name path pattern
do
# Do something
done < myFile.txt
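As a quick sketch with made-up data (the file name and contents are just for illustration):
$ printf 'alpha\t/tmp/a\t*.log\nbeta\t/tmp/b\t*.txt\n' > myFile.txt
$ while read name path pattern; do echo "name=$name path=$path pattern=$pattern"; done < myFile.txt
name=alpha path=/tmp/a pattern=*.log
name=beta path=/tmp/b pattern=*.txt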

Unix cut: Print same Field twice

Say I have file - a.csv
ram,33,professional,doc
shaym,23,salaried,eng
Now I need this output (please don't ask me why):
ram,doc,doc,
shaym,eng,eng,
I am using the cut command:
cut -d',' -f1,4,4 a.csv
But the output remains
ram,doc
shaym,eng
That means cut can only print a field once. I need to print the same field twice or n times.
Why do I need this ? (Optional to read)
Ah. It's a long story. I have a file like this
#,#,-,-
#,#,#,#,#,#,#,-
#,#,#,-
I have to convert this to:
#,#,-,-,-,-,-
#,#,#,#,#,#,#,-
#,#,#,-,-,-,-
Here each '#' and '-' refers to different numerical data. Thanks.
You can't print the same field twice. cut prints a selection of fields (or characters or bytes) in order. See Combining 2 different cut outputs in a single command? and Reorder fields/characters with cut command for some very similar requests.
The right tool to use here is awk, if your CSV doesn't have quotes around fields.
awk -F , -v OFS=, '{print $1, $4, $4}'
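With the sample a.csv this gives:
$ awk -F , -v OFS=, '{print $1, $4, $4}' a.csv
ram,doc,doc
shaym,eng,eng
If you really need the trailing comma from the desired output, print an extra empty field, e.g. awk -F , -v OFS=, '{print $1, $4, $4, ""}'.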
If you don't want to use awk (why? what strange system has cut and sed but no awk?), you can use sed (still assuming that your CSV doesn't have quotes around fields). Match the first four comma-separated fields and select the ones you want in the order you want.
sed -e 's/^\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\)/\1,\4,\4/'
$ sed 's/,.*,/,/; s/\(,.*\)/\1\1,/' a.csv
ram,doc,doc,
shaym,eng,eng,
What this does:
Replace everything between the first and last comma with just a comma
Repeat the last ",something" part and tack on a comma. Voilà!
Assumptions made:
You want the first field, then twice the last field
No escaped commas within the first and last fields
Why do you need exactly this output? :-)
using perl:
perl -F, -ane 'chomp($F[3]);$a=$F[0].",".$F[3].",".$F[3];print $a."\n"' your_file
using sed:
sed 's/\([^,]*\),.*,\(.*\)/\1,\2,\2/g' your_file
As others have noted, cut doesn't support field repetition.
You can combine cut and sed, for example if the repeated element is at the end:
< a.csv cut -d, -f1,4 | sed 's/,[^,]*$/&&,/'
Output:
ram,doc,doc,
shaym,eng,eng,
Edit
To make the repetition variable, you could do something like this (assuming you have coreutils available):
n=10
rep=$(seq $n | sed 's:.*:\&:' | tr -d '\n')
< a.csv cut -d, -f1,4 | sed 's/,[^,]*$/'"$rep"',/'
Output:
ram,doc,doc,doc,doc,doc,doc,doc,doc,doc,doc,
shaym,eng,eng,eng,eng,eng,eng,eng,eng,eng,eng,
I had the same problem, but instead of adding all the columns to awk, I just used (to duplicate the 2nd column):
awk -v OFS='\t' '$2=$2"\t"$2' # for tab-delimited files
For CSVs you can just use
awk -F , -v OFS=, '$2=$2","$2'
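For example, duplicating the second column of the sample data:
$ echo 'ram,33,professional,doc' | awk -F , -v OFS=, '$2=$2","$2'
ram,33,33,professional,doc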

How can I get the 2nd and 3rd columns of a tab-delimited file in bash?

I want to use bash to process a tab-delimited file. I only need the second and third columns written to a new file.
cut(1) was made expressly for this purpose:
cut -f 2-3 input.txt > output.txt
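For example, with a small made-up tab-separated record:
$ printf 'one\ttwo\tthree\tfour\n' | cut -f 2-3
two	three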
cut is probably the best choice here; second to that is awk:
awk -F"\t" '{print $2 "\t" $3}' input > out
Expanding on carl-norum's answer, using only a tab as the delimiter, not all blanks:
cut -d$'\t' -f 2-3 input.txt > output.txt
Don't put a space between -d and $'\t'.
