Can't remove line break from $0 in awk - bash

I'm trying to parse some data from a dat file using awk, but I can't get rid of a line break that is being added to $0. I've tried gsub(/\n/,""), but it does nothing.
Example below:
from dat file:
<A>1
<B>2
running:
awk '
BEGIN {FS = ">"; ORS=""; OFS=""}
/<A>/ {printf $2; printf$2}
' file.dat
currently gives me:
1
1
when I want:
11

I think you just want
awk -F '>' '/<A>/ { print $2 $2 }' file.dat
That said, your code should work as well; the problem was that your input file contained DOS-style line endings (CRLF), which can be removed with, for example, dos2unix. See How to convert DOS/Windows newline (CRLF) to Unix newline (\n) in a Bash script? for more ways to do it.
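If converting the file first is not an option, the carriage return can also be stripped inside awk itself. A minimal sketch, assuming the input really has CRLF line endings; the sample data is generated inline with printf instead of being read from file.dat, and the function name strip_cr_demo is made up for illustration:

```shell
# Simulate a DOS-style file: every line ends in \r\n.
# sub(/\r$/, "") removes the trailing carriage return from $0,
# which also re-splits the fields before $2 is used.
strip_cr_demo() {
  printf '<A>1\r\n<B>2\r\n' |
    awk -F '>' '{ sub(/\r$/, "") } /<A>/ { print $2 $2 }'
}

strip_cr_demo
```

With the carriage return gone, the two copies of $2 are printed next to each other on one line.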

Related

Convert hex to dec using awk

I have large CSV files in which a few columns hold hex values that I need to convert to decimal. The files are very big, so processing them row by row takes a long time. I want to know how this can be done in parallel using awk.
If I process the code line by line it works.
I process the files like this.
while read -r line;
do
start_time=`echo "$line"|awk -F "," '{ print $1 }'`
end_time=`echo "$line"|awk -F "," '{ print $2 }'`
st_time=$((16#$start_time))
en_time=$((16#$end_time))
Then I echo the required fields to output file.
Sample Input file:
16a91f90539,16a91f931a9,e,0
16a91f90bab,16a91f931a9,e,0
Expected output:
1557227177273,1557227188649,e,0
1557227178923,1557227188649,e,0
I need to know how the statement "((16#$start_time))" , can be used in awk.
I tried
awk -F',' '{OFS=",";}{print '"(($1#16))"','$en_time',$3'
But this syntax does not work.
With GNU awk for strtonum() you don't need to spawn multiple shells on each input line:
$ awk 'BEGIN{FS=OFS=","} {for (i=1;i<=2;i++) $i=strtonum("0x"$i)} 1' file
1557227177273,1557227188649,e,0
1557227178923,1557227188649,e,0
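You can run shell commands from within awk with system(...). system() waits for the command to finish, so no close() is needed afterwards; close() is only required for commands opened through a pipe or getline.
awk -F "," '{ cmd=sprintf("echo $((0x%s))", $1); system(cmd) }' input
(the system call does not work with $((16#...)) on my system, but does work with $((0x...)); this is probably because system() runs the command with /bin/sh, which on many systems is not bash and does not support the 16#... syntax)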
With getline you can assign the echo'ed output to a variable. See https://www.gnu.org/software/gawk/manual/html_node/Getline-Notes.html to get you started.
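The getline approach could look roughly like the sketch below. It assumes that /bin/sh understands $((0x...)), as the answer above reports, and that the input is trusted, because field contents are pasted into a shell command. The function name hex_to_dec_getline is made up for illustration. It starts one shell per converted field, so it is much slower than strtonum() on large files.

```shell
# hex_to_dec_getline: convert the first two comma-separated fields from
# hex to decimal by asking the shell for each value.
hex_to_dec_getline() {
  awk 'BEGIN { FS = OFS = "," }
  {
    for (i = 1; i <= 2; i++) {
      cmd = "echo $((0x" $i "))"     # shell arithmetic does the conversion
      if ((cmd | getline dec) > 0)   # read the command output into dec
        $i = dec
      close(cmd)                     # pipes opened with getline must be closed
    }
    print
  }'
}

printf '16a91f90539,16a91f931a9,e,0\n' | hex_to_dec_getline
```

For the sample line this prints 1557227177273,1557227188649,e,0, matching the expected output in the question.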

List of extensions of filenames in bash script in one line

I currently have the following line of code:
ls /some/dir/prefix* | sed -e 's/prefix.//' | tr '\n' ' '
Which does achieve what I want it to do:
Get list of files starting with prefix
Remove path and prefix from each string
Remove newlines and replace with spaces for later processing.
For example:
/some/dir/prefix.hello
/some/dir/prefix.world
Should become
hello world
But I feel like there's a nicer way of doing this. Is there a better way to do this in one line?
Here is a two-liner using just built-ins that does it:
fnames=(some/dir/prefix*)
echo "${fnames[@]##*.}"
And here's how this works:
fnames=(some/dir/prefix*) creates an array with all the files starting with prefix and avoids all the problems that come with parsing ls
echo "${fnames[@]##*.}" is a combination of two parameter expansions: ${fnames[@]} expands to all array elements, and the ##*. part removes the longest match of anything that ends with . from each array element, leaving just the extension
If you're hell-bent on a one-liner, just join the two commands with &&.
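To try the array approach without touching a real directory, here is a small self-contained sketch. It needs bash (arrays are not available in plain sh). The function name list_extensions and the temporary directory are made up for illustration.

```shell
#!/usr/bin/env bash
# list_extensions DIR PREFIX: print the extensions of DIR/PREFIX* on one line.
list_extensions() {
  local dir=$1 prefix=$2
  local fnames=("$dir"/"$prefix"*)   # glob into an array, no ls involved
  echo "${fnames[@]##*.}"            # strip everything up to the last dot
}

# Build a throwaway directory so the example does not depend on /some/dir.
tmp=$(mktemp -d)
touch "$tmp/prefix.hello" "$tmp/prefix.world"
list_extensions "$tmp" prefix
rm -r "$tmp"
```

Note that if nothing matches, the glob is left unexpanded unless nullglob is set, so the output would then be derived from the literal pattern.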
Passing ls output to external programs is not recommended; the following bash solution may help.
for file in prefix*; do echo "${file##*.}"; done
Here is the same solution in multi-line form:
for file in prefix*
do
  echo "${file##*.}"
done
Here is a very simple awk one-liner to achieve this:
awk -F. '{$0=FILENAME; printf "%s ", $NF; nextfile}' /some/dir/prefix*
It essentially does the following:
-F.: Sets the field separator FS to a dot (.), so that $NF is the extension.
$0=FILENAME: Discards the current record and replaces it with FILENAME, which re-splits the fields.
printf "%s ", $NF; nextfile: Prints the extension followed by a space and moves on to the next file.
The problem with this is that awk still has to read a record from each file. If a file is empty, no record is read and nothing is printed for it.
To make this work with empty files, you could use the gawk extension BEGINFILE:
awk -F. 'BEGINFILE{$0=FILENAME; printf "%s ", $NF; nextfile}' /some/dir/prefix*
Or you can loop over all the arguments (starting at index 1, because ARGV[0] is the name of the awk command itself):
awk -F. 'BEGIN{for(i=1;i<ARGC;i++){$0=ARGV[i]; printf "%s ", $NF}; exit}' /some/dir/prefix*
One approach with awk:
ls /some/dir/prefix* | awk -F"." '{printf "%s ", $2} END {print ""}'
It might qualify as being "nicer" because there's only one command the output is piped through?!

"grep" a csv file including multi-lines fields?

file.csv:
XA90;"standard"
XA100;"this is
the multi-line"
XA110;"other standard"
I want to grep the "XA100" entry like this:
grep XA100 file.csv
to obtain this result:
XA100;"this is
the multi-line"
but grep returns only one line:
XA100;"this is
file.csv contains 3 entries.
The "XA100" entry contains a multi-line field.
grep doesn't seem to be the right tool for searching a CSV file that includes multi-line fields.
Do you know a way to do this?
Edit: the real-world file contains many columns. The search term can be in any column (not necessarily at the beginning of a line or of a field). All fields are enclosed in ". Any field can span multiple lines, from one to any number, and this cannot be predicted.
Give this line a try:
awk '/^XA100;/{p=1}p;p&&/"$/{p=0}' file
I extended your example a bit:
kent$ cat f
XA90;"standard"
XA100;"this is
the
multi-
line"
XA110;"other standard"
kent$ awk '/^XA100;/{p=1}p;p&&/"$/{p=0}' f
XA100;"this is
the
multi-
line"
In the comments you mention: In the real world file, each line start with ". I assume they also end with " and present you this:
Test file:
$ cat file
"single line"
"multi-
lined"
Code and outputs:
$ awk 'BEGIN{RS=ORS="\"\n"} /single/' file
"single line"
$ awk 'BEGIN{RS=ORS="\"\n"} /m/' file
"multi-
lined"
You can also parametrize the search:
$ awk -v s="multi" 'BEGIN{RS=ORS="\"\n"} match($0,s)' file
"multi-
lined"
try:
Solution 1:
awk -v RS="XA" 'NR==3{gsub(/$\n$/,"");print RS $0}' Input_file
This sets the record separator to the string XA and selects the third record. The gsub call with $\n$ removes the extra newline at the end of the record by replacing it with an empty string. The record separator is then printed in front of the record.
Solution 2:
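awk '/XA100/{print; while ((getline) > 0 && $0 !~ /^XA/) print}' Input_file
This looks for the string XA100 and prints the matching line. It then uses getline in a while loop to read and print the following lines until it reaches a line that starts with XA. Checking the return value of getline stops the loop at the end of the file instead of looping forever when XA100 is the last entry.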
If this file was exported from MS-Excel or similar then lines end with \r\n while the newlines inside quotes are just \ns so then all you need is:
$ awk -v RS='\r\n' '/XA100/' file
XA100;"this is
the multi-line"
The above uses GNU awk for multi-char RS. On some platforms, e.g. cygwin, you'll have to add -v BINMODE=3 so gawk sees the \rs rather than them getting stripped by underlying C primitives.
Otherwise, it's extremely hard to parse CSV files in general without a real CSV parser (which awk currently doesn't have but is in the works for GNU awk) but you could do this (again with GNU awk for multi-char RS):
$ cat file
XA90;"standard"
XA100;"this is
the multi-line"
XA110;"other standard"
$ awk -v RS="\"[^\"]*\"" -v ORS= '{gsub(/\n/," ",RT); print $0 RT}' file
XA90;"standard"
XA100;"this is the multi-line"
XA110;"other standard"
to replace all newlines within quotes with blank chars and then process it as regular 1-line-per-record file.
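For awks without multi-character RS, a different technique (not one of the answers above) is to keep appending lines to a buffer until the number of double quotes in it is even, and only then test the buffer against the search term. The sketch below assumes every multi-line field is enclosed in double quotes and that embedded quotes are doubled (""), so they do not upset the count. The function name csv_grep is made up for illustration.

```shell
# csv_grep TERM FILE: print every logical CSV record that matches TERM,
# where a record may span several physical lines inside double quotes.
csv_grep() {
  awk -v term="$1" '
    {
      rec = (rec == "" ? $0 : rec "\n" $0)  # append this line to the buffer
      n = gsub(/"/, "&", rec)               # count the quotes seen so far
      if (n % 2 == 0) {                     # even count: record is complete
        if (rec ~ term) print rec
        rec = ""
      }
    }
  ' "$2"
}

# Try it on the sample data from the question.
f=$(mktemp)
printf 'XA90;"standard"\nXA100;"this is\nthe multi-line"\nXA110;"other standard"\n' > "$f"
csv_grep XA100 "$f"
rm -f "$f"
```

The search term is treated as an awk regular expression, so it can match in any column.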
Using PS response, this works for the small example:
sed 's/^X/\n&/' file.csv | awk -v RS= '/XA100/ {print}'
For my real-world CSV file (many columns, the search term anywhere, an unknown number of multi-line fields, " characters escaped as "", continuation lines that may begin with ", and all fields enclosed in "), this works. Note that the sed part excludes a second " character:
sed 's/^"[^"]/\n&/' file.csv | awk -v RS= '/RESEARCH_TERM/ {print}'
This works because the first column of an entry cannot start with "". The first column always looks like "XXXXXXXXX", where X is any character but ".
Thank you all for so many responses; other solutions may work depending on the CSV file format you use.

Unix Output of command to text file

I'm reading from a file called IMSI.txt using the following command:
$awk 'NR>2' IMSI.txt | awk '{print $NF}'
I need the output of this command to go to a new file called NEW.txt
So i did this :
$awk 'NR>2' IMSI.txt | awk '{print $NF}' > NEW.txt
This worked fine, but when I open the file, all of the command's output is on the same line.
The newlines are being ignored.
For example, if I get this output in the console:
222
111
333
I open the text file and I get
222111333
How can I fix that?
Thank you for your help :)
PS: I am using Cygwin on Windows
I am guessing your (Windows-y) editor would like to see a carriage return before the linefeed at the end of each line, not a bare linefeed (which is what awk outputs). Change your print to this
print $NF "\r"
so it looks like this altogether:
awk 'NR>2 {print $NF "\r"}' IMSI.txt
Simply set ORS to "\r\n", which makes awk end every output record with a DOS line ending. I believe this is the most natural solution:
awk -v ORS="\r\n" 'NR>2 {print $NF}' IMSI.txt > NEW.txt
Tested on a virtual XP system with Cygwin.
From Awk's manual:
ORS The output record separator, by default a newline.
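To check that the carriage returns really end up in the output, the result can be piped through od -c, which shows them as \r. A small sketch with inline sample data standing in for IMSI.txt; the function name crlf_last_field is made up for illustration:

```shell
# crlf_last_field: print the last field of every line after the first two,
# ending each output record with \r\n instead of a bare \n.
crlf_last_field() {
  awk -v ORS='\r\n' 'NR > 2 { print $NF }'
}

# od -c makes the otherwise invisible \r characters visible.
printf 'header1\nheader2\na 222\nb 111\nc 333\n' | crlf_last_field | od -c
```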

How do I print a field from a pipe-separated file?

I have a file with fields separated by pipe characters and I want to print only the second field. This attempt fails:
$ cat file | awk -F| '{print $2}'
awk: syntax error near line 1
awk: bailing out near line 1
bash: {print $2}: command not found
Is there a way to do this?
Or just use one command:
cut -d '|' -f FIELDNUMBER
The key point here is that the pipe character (|) must be protected from the shell. Use \| or '|' to keep the shell from interpreting it so that it is passed to awk on the command line.
Reading the comments, I see that the original poster presents a simplified version of the original problem, which involved filtering the file before selecting and printing the fields. A pass through grep was used and the result piped into awk for field selection. That accounts for the wholly unnecessary cat file that appears in the question (it replaces the grep <pattern> file).
Fine, that will work. However, awk is largely a pattern matching tool on its own, and can be trusted to find and work on the matching lines without needing to invoke grep. Use something like:
awk -F\| '/<pattern>/{print $2;}{next;}' file
The /<pattern>/ bit tells awk to perform the action that follows on lines that match <pattern>.
The lost-looking {next;} is a default action skipping to the next line in the input. It does not seem to be necessary, but I have this habit from long ago...
The pipe character needs to be escaped so that the shell doesn't interpret it. A simple solution:
$ awk -F\| '{print $2}' file
Another choice would be to quote the character:
$ awk -F'|' '{print $2}' file
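Both forms hand the same single | character to awk, which can be checked quickly with inline data. A small sketch; the sample lines and the function name second_field are made up for illustration:

```shell
# second_field: print the second pipe-separated field of each input line.
second_field() {
  awk -F'|' '{ print $2 }'
}

# Quoted form, via the helper above.
printf 'a|b|c\nd|e|f\n' | second_field

# Backslash-escaped form; the shell removes the backslash before awk sees it.
printf 'a|b|c\nd|e|f\n' | awk -F\| '{ print $2 }'
```

Each command prints b and e on separate lines.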
Another way using awk
awk 'BEGIN { FS = "|" } ; { print $2 }'
And 'file' contains no pipe symbols, so it prints nothing. You should either use 'cat file' or simply list the file after the awk program.
