Awk strings enclosed in brackets - bash

I'm trying to make a script to make reading logs easier. I'm having trouble extracting a string enclosed in brackets.
I want to extract the thread ID of a log which looks like this:
[CURRENT_DATE][THREAD_ID][PROCESS_NAME]Some random text here
I have tried this but it prints the CURRENT_DATE:
awk -F '[][]' '{print $2}'
If I use print $3 it prints the Some random text here part.
Is there any way that I could somehow read the string enclosed in brackets?

You may use this awk:
s='[CURRENT_DATE][THREAD_ID][PROCESS_NAME]Some random text here'
awk -F '\\]\\[' '{print $2}' <<< "$s"
THREAD_ID
-F '\\]\\[' makes the two-character sequence ][ the field delimiter.

How about this? (Note that multi-character delimiters do not seem to be available in GNU awk 4, or at least not in the awk version the OP is using.)
pattern='[CURRENT_DATE][THREAD_ID][PROCESS_NAME]Some random text here'
echo "$pattern"
awk -F '[' '{print substr($3, 1, length($3)-1)}' <<< "$pattern"

Different versions of awk behave in different ways. Without knowing what you're running, it's difficult to say why your existing code behaves as it does.
You already know that with a field separator of [][] or just [, you have an empty field at the beginning of each line. Instead, I'd try this:
awk -F']' '{gsub(/\[/,""); print $2}' input.log
This simply strips out the left-square-bracket and uses its fellow as your field delimiter. The advantage of using ] instead of [ is that it makes $1 your first field.
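To see why the original `[][]` separator seemed to return the wrong field, it can help to dump every field awk produces. The following is only a sketch using the sample line from the question; the variable names `line`, `fields` and `thread_id` are illustrative. Because every bracket is a separator, an empty field sits before the first `[` and between each `][` pair, so the thread ID ends up in `$4`.

```shell
line='[CURRENT_DATE][THREAD_ID][PROCESS_NAME]Some random text here'

# List every field the bracket-class separator produces, with its index.
fields=$(printf '%s\n' "$line" |
  awk -F '[][]' '{ for (i = 1; i <= NF; i++) printf "%d=<%s>\n", i, $i }')
printf '%s\n' "$fields"

# With the empty fields accounted for, the thread ID is field 4.
thread_id=$(printf '%s\n' "$line" | awk -F '[][]' '{ print $4 }')
printf '%s\n' "$thread_id"
```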


List of extensions of filenames in bash script in one line

I currently have the following line of code:
ls /some/dir/prefix* | sed -e 's/prefix.//' | tr '\n' ' '
Which does achieve what I want it to do:
Get list of files starting with prefix
Remove path and prefix from each string
Remove newlines and replace with spaces for later processing.
For example:
/some/dir/prefix.hello
/some/dir/prefix.world
Should become
hello world
But I feel like there's a nicer way of doing this. Is there a better way to do this in one line?
Here is a two-liner using just built-ins that does it:
fnames=(some/dir/prefix*)
echo "${fnames[@]##*.}"
And here's how this works:
fnames=(some/dir/prefix*) creates an array with all the files starting with prefix and avoids all the problems that come with parsing ls
echo "${fnames[@]##*.}" is a combination of two parameter expansions: ${fnames[@]} expands to all array elements, and the ##*. part removes the longest match of anything that ends with . from each array element, leaving just the extension
If you're hell-bent on a one-liner, just join the two commands with &&.
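The difference between the longest-match (`##`) and shortest-match (`#`) forms is easiest to see on a name with more than one dot. This is a small sketch on a single string; the file name `prefix.tar.gz` is a made-up example, not one from the question.

```shell
file='/some/dir/prefix.tar.gz'

# "##*." strips the LONGEST leading match of "anything followed by a dot".
longest=${file##*.}
# "#*." strips only the SHORTEST such match.
shortest=${file#*.}

printf '%s\n' "$longest"    # gz
printf '%s\n' "$shortest"   # tar.gz
```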
Passing ls output to external programs is not recommended; the following bash solution may help you here.
for file in prefix*; do echo "${file##*.}"; done
Here is the same solution in multi-line form:
for file in prefix*
do
echo "${file##*.}"
done
Here is a very simple Awk one-liner to achieve this:
awk -F. '{$0=FILENAME; printf "%s ", $NF; nextfile}' /some/dir/prefix*
It essentially does the following:
-F.: Set the field separator FS to a dot (.). This way $NF represents the extension.
$0=FILENAME: Discard the current record and replace it with FILENAME, which makes awk re-split the fields.
printf "%s ", $NF; nextfile: Print the extension followed by a space and go to the next file.
The problem with this is that awk still has to read a record from the current file. If that file is empty, this will fail.
To make this work with empty files, you could use the gawk extension BEGINFILE:
awk -F. 'BEGINFILE{$0=FILENAME; printf "%s ", $NF; nextfile}' /some/dir/prefix*
Or you can loop over all the arguments :
awk -F. 'BEGIN{for(i=1;i<ARGC;i++){$0=ARGV[i]; printf "%s ", $NF};exit}' /some/dir/prefix*
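The argument loop can be tried safely in a scratch directory. The sketch below assumes `mktemp -d` is available and uses two deliberately empty files named after the question's example; it walks the indices from 1 to ARGC-1 so that ARGV[0] (the program name) is skipped and the order matches the command line.

```shell
# Scratch directory with two EMPTY files, to show no file content is read.
dir=$(mktemp -d)
: > "$dir/prefix.hello"
: > "$dir/prefix.world"

# Only the argument list is inspected; awk exits before opening any file.
exts=$(awk -F. 'BEGIN {
  for (i = 1; i < ARGC; i++) { $0 = ARGV[i]; printf "%s ", $NF }
  exit
}' "$dir"/prefix*)

printf '%s\n' "$exts"
rm -rf "$dir"
```

Because only the text after the last dot is used, a dot elsewhere in the path (such as in the temporary directory name) does not affect the result.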
One approach with awk:
ls /some/dir/prefix* | awk -F"." '{printf "%s ", $2} END {print ""}'
It might qualify as being "nicer" because there's only one command the output is piped through?!

How can I determine the number of fields in a CSV, from the shell?

I have a well-formed CSV file, which may or may not have a header line; and may or may not have quoted data. I want to determine the number of columns in it, using the shell.
Now, if I can be sure there are no quoted commas in the file, the following seems to work:
x=$(tail -1 00-45-19-tester-trace.csv | grep -o , | wc -l); echo $((x + 1))
but what if I can't make that assumption? That is, what if I can't assume a comma is always a field separator? How do I do it then?
If it helps, you're allowed to make the assumption of there being no quoted quotes (i.e. \"s within quoted strings); but better not to make that one either.
If you cannot make any optimistic assumptions about the data, then there won't be a simple solution in Bash. It's not trivial to parse a general CSV format with possible embedded newlines and embedded separators. You're better off not writing that in bash, but using an existing proper CSV parser. For example, Python has one built into its standard library.
If you can assume that there are no embedded newlines and no embedded separators, then it's simple to split on commas using awk:
awk -F, '{ print NF; exit }' input.csv
-F, tells awk to use comma as the field separator, and the automatic NF variable is the number of fields on the current line.
If you want to allow embedded separators, but you can assume no embedded double quotes, then you can eliminate the embedded separators with a simple filter, before piping to the same awk as earlier:
head -n 1 input.csv | sed -e 's/"[^"]*"//g' | awk ...
Note that both of these examples use the first line to decide the number of fields. If the input has a header line, this should work quite well, as the header should not contain embedded newlines.
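The effect of the quote-stripping filter can be checked on a tiny sample. This is a sketch under the same assumptions as the answer (no embedded newlines, no embedded double quotes); the CSV content and the variable names are invented for illustration.

```shell
csv='name,"city, state",age
alice,"Springfield, IL",30'

# Naive split: the comma inside the quoted header cell is counted too.
naive=$(printf '%s\n' "$csv" | head -n 1 | awk -F, '{ print NF; exit }')

# Remove quoted cells first, then count the remaining separators.
count=$(printf '%s\n' "$csv" | head -n 1 |
  sed -e 's/"[^"]*"//g' | awk -F, '{ print NF; exit }')

printf 'naive=%s filtered=%s\n' "$naive" "$count"
```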
Count the fields in the first row, then verify that all rows have the same number:
CNT=$(head -n1 hhdata.csv | awk -F ',' '{print NF}')
awk -F ',' '{print NF}' hhdata.csv | grep -vx "$CNT"
This doesn't cope with embedded commas, but it will highlight them if they exist.
If the file has no double quotes, use the command below:
awk -F"," '{ print NF }' filename| sort -u
If the file has every column enclosed in double quotes, use the command below:
awk -F, '{gsub(/"[^"]*"/,x);print NF}' filename | sort -u

How do I separate a link to get the end of a URL in shell?

I have some data that looks like this
"thumbnailUrl": "http://placehold.it/150/adf4e1"
I want to know how I can get the trailing part of the URL. I want the output to be
adf4e1
I was trying to grep when starting with / and ending with " but I'm only a beginner in shell scripting and need some help.
I came up with a quick and dirty solution, using grep (with perl regex) and cut:
$ cat file
"thumbnailUrl": "http://placehold.it/150/adf4e1"
"anotherUrl": "http://stackoverflow.com/questions/3979680"
"thumbnailUrl": "http://facebook.com/12f"
"randortag": "http://google.com/this/is/how/we/roll/3fk19as1"
$ cat file | grep -o '/\w*"$' | cut -d'/' -f2- | cut -d'"' -f1
adf4e1
3979680
12f
3fk19as1
We could kill this with a thousand little cuts, or just one blow from Awk:
awk -F'[/"]' '{ print $(NF-1); }'
Test:
$ echo '"thumbnailUrl": "http://placehold.it/150/adf4e1"' \
| awk -F'[/"]' '{ print $(NF-1); }'
adf4e1
Filter through Awk using double quotes and slashes as field separators. This means that the trailing part ../adf4e1" is separated as {..}</>{adf4e1}<">{} where curly braces denote fields and angle brackets separators. The Awk variable NF gives the 1-based number of fields and so $NF is the last field. That's not the one we want, because it is blank; we want $(NF-1): the second-to-last field.
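The field layout described above can be made visible by printing each field with its index. This is just an inspection sketch on the sample line from the question; the variable names are illustrative.

```shell
line='"thumbnailUrl": "http://placehold.it/150/adf4e1"'

# Dump every field; empty ones show up as <>.
printf '%s\n' "$line" |
  awk -F'[/"]' '{ for (i = 1; i <= NF; i++) printf "%d=<%s>\n", i, $i }'

# The closing quote leaves an empty last field, so the ID is at NF-1.
nf=$(printf '%s\n' "$line" | awk -F'[/"]' '{ print NF }')
last=$(printf '%s\n' "$line" | awk -F'[/"]' '{ print $(NF-1) }')
printf 'NF=%s id=%s\n' "$nf" "$last"
```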
"Golfed" version:
awk -F[/\"] '$0=$(NF-1)'
If the original string is coming from a larger JSON object, use something like jq to extract the value you want.
For example:
$ jq -n '{thumbnail: "http://placehold.it/150/adf4e1"}' |
> jq -r '.thumbnail|split("/")[-1]'
adf4e1
(The first command just generates a valid JSON object representing the original source of your data; the second command parses it and extracts the desired value. The split function splits the URL into an array, from which you only care about the last element.)
You can also do this purely in bash using string replacement and substring removal if you wrap your string in single quotes and assign it to a variable.
#!/bin/bash
string='"thumbnailUrl": "http://placehold.it/150/adf4e1"'
string="${string//\"}"
echo "${string##*/}"
adf4e1 #output
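The `${string//\"}` replacement is a bash feature. If the script has to run under a plain POSIX sh, a similar result can be had with two substring removals. This is a sketch that assumes the value always ends in exactly one double quote; `trimmed` and `id` are illustrative names.

```shell
string='"thumbnailUrl": "http://placehold.it/150/adf4e1"'

# Drop the single trailing double quote (shortest suffix match).
trimmed=${string%\"}
# Drop everything up to and including the last slash.
id=${trimmed##*/}

printf '%s\n' "$id"   # adf4e1
```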
You can do that using the 'cut' command in Linux. Cut on '/' and keep the last piece. Try it, it's fun!
Refer to http://www.thegeekstuff.com/2013/06/cut-command-examples

Unix cut: Print same Field twice

Say I have file - a.csv
ram,33,professional,doc
shaym,23,salaried,eng
Now I need this output (pls dont ask me why)
ram,doc,doc,
shaym,eng,eng,
I am using cut command
cut -d',' -f1,4,4 a.csv
But the output remains
ram,doc
shaym,eng
That means cut can only print a Field just one time. I need to print the same field twice or n times.
Why do I need this ? (Optional to read)
Ah. It's a long story. I have a file like this
#,#,-,-
#,#,#,#,#,#,#,-
#,#,#,-
I have to convert this to
#,#,-,-,-,-,-
#,#,#,#,#,#,#,-
#,#,#,-,-,-,-
Here each '#' and '-' refers to different numerical data. Thanks.
You can't print the same field twice. cut prints a selection of fields (or characters or bytes) in order. See Combining 2 different cut outputs in a single command? and Reorder fields/characters with cut command for some very similar requests.
The right tool to use here is awk, if your CSV doesn't have quotes around fields.
awk -F , -v OFS=, '{print $1, $4, $4}'
If you don't want to use awk (why? what strange system has cut and sed but no awk?), you can use sed (still assuming that your CSV doesn't have quotes around fields). Match the first four comma-separated fields and select the ones you want in the order you want.
sed -e 's/^\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\)/\1,\4,\4/'
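The awk command above prints ram,doc,doc, without the trailing comma shown in the question. If that trailing comma matters, one way to get it is to print an extra empty field. This is a sketch using the two sample rows from the question, fed through a pipe instead of a.csv.

```shell
# Printing "" as a fourth output field yields the trailing comma.
out=$(printf '%s\n' 'ram,33,professional,doc' 'shaym,23,salaried,eng' |
  awk -F, -v OFS=, '{ print $1, $4, $4, "" }')

printf '%s\n' "$out"
# ram,doc,doc,
# shaym,eng,eng,
```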
$ sed 's/,.*,/,/; s/\(,.*\)/\1\1,/' a.csv
ram,doc,doc,
shaym,eng,eng,
What this does:
Replace everything between the first and last comma with just a comma
Repeat the last ",something" part and tack on a comma. Voilà!
Assumptions made:
You want the first field, then twice the last field
No escaped commas within the first and last fields
Why do you need exactly this output? :-)
using perl:
perl -F, -ane 'chomp($F[3]);$a=$F[0].",".$F[3].",".$F[3];print $a."\n"' your_file
using sed:
sed 's/\([^,]*\),.*,\(.*\)/\1,\2,\2/g' your_file
As others have noted, cut doesn't support field repetition.
You can combine cut and sed, for example if the repeated element is at the end:
< a.csv cut -d, -f1,4 | sed 's/,[^,]*$/&&,/'
Output:
ram,doc,doc,
shaym,eng,eng,
Edit
To make the repetition variable, you could do something like this (assuming you have coreutils available):
n=10
rep=$(seq $n | sed 's:.*:\&:' | tr -d '\n')
< a.csv cut -d, -f1,4 | sed 's/,[^,]*$/'"$rep"',/'
Output:
ram,doc,doc,doc,doc,doc,doc,doc,doc,doc,doc,
shaym,eng,eng,eng,eng,eng,eng,eng,eng,eng,eng,
I had the same problem, but instead of adding all the columns to awk, I just used (to duplicate the 2nd column):
awk -v OFS='\t' '$2=$2"\t"$2' # for tab-delimited files
For CSVs you can just use
awk -F , -v OFS=, '$2=$2","$2'
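For the motivating case in the question (rows of different lengths), repeating the last field until a row reaches a minimum width may be closer to what is needed. The sketch below assumes that reading of the requirement; `pad_rows` is a hypothetical helper name and 7 is the width implied by the question's expected output. Rows that are already wider are left unchanged.

```shell
# pad_rows N: repeat each row's last field until the row has N fields.
pad_rows() {
  awk -F, -v OFS=, -v n="$1" '{
    last = $NF
    for (i = NF + 1; i <= n; i++) $i = last   # assigning $i extends the row
    print
  }'
}

padded=$(printf '%s\n' '#,#,-,-' '#,#,#,#,#,#,#,-' '#,#,#,-' | pad_rows 7)
printf '%s\n' "$padded"
```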

How do I print a field from a pipe-separated file?

I have a file with fields separated by pipe characters and I want to print only the second field. This attempt fails:
$ cat file | awk -F| '{print $2}'
awk: syntax error near line 1
awk: bailing out near line 1
bash: {print $2}: command not found
Is there a way to do this?
Or just use one command:
cut -d '|' -f FIELDNUMBER
The key point here is that the pipe character (|) must be escaped from the shell. Use "\|" or "'|'" to protect it from shell interpretation and allow it to be passed to awk on the command line.
Reading the comments, I see that the original poster presents a simplified version of the original problem, which involved filtering the file before selecting and printing the fields. A pass through grep was used and the result piped into awk for field selection. That accounts for the wholly unnecessary cat file that appears in the question (it replaces the grep <pattern> file).
Fine, that will work. However, awk is largely a pattern matching tool on its own, and can be trusted to find and work on the matching lines without needing to invoke grep. Use something like:
awk -F\| '/<pattern>/{print $2;}{next;}' file
The /<pattern>/ bit tells awk to perform the action that follows on lines that match <pattern>.
The lost-looking {next;} is a default action skipping to the next line in the input. It does not seem to be necessary, but I have this habit from long ago...
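A pattern-action pair replacing a grep-then-awk pipeline might look like the sketch below. The pipe-separated sample data and the pattern `inactive` are invented for illustration; only lines matching the pattern reach the print action.

```shell
data='alice|admin|active
bob|user|inactive
carol|admin|inactive'

# awk does the filtering itself: no separate grep is needed.
roles=$(printf '%s\n' "$data" | awk -F'|' '/inactive/ { print $2 }')

printf '%s\n' "$roles"
# user
# admin
```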
The pipe character needs to be escaped so that the shell doesn't interpret it. A simple solution:
$ awk -F\| '{print $2}' file
Another choice would be to quote the character:
$ awk -F'|' '{print $2}' file
Another way using awk
awk 'BEGIN { FS = "|" } ; { print $2 }'
And 'file' contains no pipe symbols, so it prints nothing. You should either use 'cat file' or simply list the file after the awk program.
