Extracting expression from csv field - bash

I'm trying to extract the value that comes after word= in a CSV file that looks like this:
1473228800,0.0,word=google.sentence=Android.something=not_set
1480228800,100.0,word=google_analytics.number=not_set.country=US.source=internet
1493228800,0.0,location=NY.word=Android.sentence=not_set.something=not_set.type=gauge
and the output I need is (it's important for me to print only "word" and its value):
1473228800,0.0,word=google
1480228800,100.0,word=google_analytics
1493228800,0.0,word=Android
I tried using sed and awk, but each gave me a solution for only some of the lines in the CSV file.
This is my last try using awk:
awk -F "," '{sub(/.*word.*=(.*)\.*/,"word=\1", $3);print $1","$2","$3}'

awk solution:
awk -F, '{match($3,/word=[^.]+/); print $1,$2,substr($3,RSTART,RLENGTH)}' OFS=',' file
The output:
1473228800,0.0,word=google
1480228800,100.0,word=google_analytics
1493228800,0.0,word=Android
match($3,/word=[^.]+/) - to match the needed sequence within the 3rd field
substr($3,RSTART,RLENGTH) - to extract matched sequence from the 3rd field
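The match() function sets the predefined variable RSTART to the index (1-based character position) at which the matched substring starts. It also sets the predefined variable RLENGTH to the length in characters of the matched substring. If there is no match, RSTART is 0 and RLENGTH is -1, so substr() returns an empty string.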
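A minimal sketch of the same match() approach wrapped in a shell function, assuming the input is read from stdin or from files given as arguments. The name extract_word is hypothetical, and the guard for lines without word= is a design choice of this sketch: the original one-liner would print an empty third field (a trailing comma) on such lines.

```shell
# Sketch: print "word=..." from the 3rd comma-separated field using match().
# extract_word is a hypothetical helper name, not part of the original answer.
extract_word() {
    awk -F, -v OFS=, '{
        # match() returns the start position (0 if no match) and sets RSTART/RLENGTH
        if (match($3, /word=[^.]+/))
            print $1, $2, substr($3, RSTART, RLENGTH)
        else
            print $1, $2    # no word= key on this line: keep the first two fields only
    }' "$@"
}

extract_word <<'EOF'
1473228800,0.0,word=google.sentence=Android.something=not_set
1480228800,100.0,word=google_analytics.number=not_set.country=US.source=internet
1493228800,0.0,location=NY.word=Android.sentence=not_set.something=not_set.type=gauge
EOF
```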

Try:
awk -F, '{sub(/.*word/,"word",$3);sub(/\..*/,"",$3);print $1,$2,$3}' OFS="," Input_file
This sets the field separator to , and then substitutes everything up to and including word (.*word) in $3 with the string word. It then removes everything from the first dot onwards in $3 (replaces it with nothing), since, as per your question, we don't need it. Finally it prints the first, second and third fields, with the output field separator set to a comma.
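Both one-liners look for the text word anywhere in the third field, so a key that merely ends in word can be picked up instead: with a hypothetical key such as password=, the match() version returns the first occurrence and the sub() version the last one. A minimal sketch that only accepts word= at the start of a dot-separated component, assuming the component values never contain dots themselves (extract_word_strict is a hypothetical name; lines without a word= component are skipped):

```shell
# Sketch: accept "word=" only at the start of a dot-separated component.
# extract_word_strict is a hypothetical helper name; "password=" below is a
# made-up key used only to show the difference.
extract_word_strict() {
    awk -F, -v OFS=, '{
        s = "." $3                        # prepend a dot so the first component also follows one
        if (match(s, /[.]word=[^.]*/))    # leftmost ".word=" plus everything up to the next dot
            print $1, $2, substr(s, RSTART + 1, RLENGTH - 1)    # skip the leading dot
    }' "$@"
}

printf '%s\n' '1,0.0,password=secret.word=google' '2,0.0,word=Android.password=x' |
    extract_word_strict
```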

Related

Print part of a comma-separated field using AWK

I have a line containing this string:
$DLOAD , 123 , Loadcase name=SUBCASE_1
I am trying to only print SUBCASE_1. Here is my code, but I get a syntax error.
awk -F, '{n=split($3,a,"="); a[n]} {printf(a[1]}' myfile
How can I fix this?
1st solution: If you only want to get the last field (the one containing =), then, with your shown samples, please try the following:
awk -F',[[:space:]]+|=' '{print $NF}' Input_file
2nd solution: Or, if you specifically want the value after = in the 3rd field, try the following awk code. It makes a comma followed by one or more spaces the field separator, splits the 3rd field on = into the array arr in the main program, and then prints the 2nd element of arr.
awk -F',[[:space:]]+' '{split($3,arr,"=");print arr[2]}' Input_file
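For comparison, the split() idea from the question also works once the unbalanced printf(a[1]} is removed and a[n] is actually printed. A minimal sketch, assuming the same comma-separated line; last_value is a hypothetical helper name:

```shell
# The awk from the question with the syntax error fixed. split() returns the
# number of pieces, so a[n] is the text after the last "=" in the 3rd field.
# last_value is a hypothetical helper name.
last_value() {
    awk -F, '{ n = split($3, a, "="); print a[n] }' "$@"
}

# Single quotes keep the shell from expanding $DLOAD.
printf '%s\n' '$DLOAD , 123 , Loadcase name=SUBCASE_1' | last_value
```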
Possibly the shortest solution would be:
awk -F= '{print $NF}' file
Here you simply use '=' as the field separator and then print the last field.
Example Use/Output
Using your sample input in a heredoc with the delimiter quoted (<< 'eof') to prevent expansion of $DLOAD, you would have:
$ awk -F= '{print $NF}' << 'eof'
> $DLOAD , 123 , Loadcase name=SUBCASE_1
> eof
SUBCASE_1
(of course in this case it probably doesn't matter whether $DLOAD was expanded or not, but for completeness, in case $DLOAD included another '=' ...)

Filter records based on Text in Unix

I'm trying to extract all the records that match the text "IN" in the 10th field of this file.
I tried the following, but it's not giving me the right results. Any help would be highly appreciated.
awk '$10 == "IN" {print $0}'
input_file: my input file
A1|A2|A3|A4|A5|A6|A7|A8|A9|PK|A11|A13|A14|A15|A16|A17|A18
1|2|3|4|5|6|7|8|9|IN|11|12|13|14|15|16|17|18
AW|BW|CQ|AA|AR|AF|RR|AKL|ASD|US|PP|BN|TY|OL|Q3|M8|I7|V6
AR|BR|CR|A8|AN|AQ|RU|A11|A13|IN|P9P|B0N|T2Y|O4L|Q43|M88|I71|V16
output_file: my output should be
1|2|3|4|5|6|7|8|9|IN|11|12|13|14|15|16|17|18
AR|BR|CR|A8|AN|AQ|RU|A11|A13|IN|P9P|B0N|T2Y|O4L|Q43|M88|I71|V16
All the records that have "IN" in the 10th field should be kept.
Since you haven't specified a field separator in your awk code, awk uses its default (runs of spaces and tabs), but your Input_file is | (pipe) delimited, so you need to tell awk about it.
Could you please try the following?
awk -F'|' '$10=="IN"' Input_file
Explanation of the above code:
awk -F'|' ' ##Setting field separator as |(pipe) for all lines of Input_file.
$10=="IN"      ##Checking if the 10th field is equal to IN; if so, awk prints the current line (the default action).
' Input_file ##Mentioning Input_file name here.
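If the value to look for changes, it can be passed in with -v instead of being hard-coded in the awk program. A minimal sketch run on the sample input; filter_col10 is a hypothetical helper name. The header line is not printed because its 10th field is PK.

```shell
# Sketch: print the |-delimited records whose 10th field equals a given value.
# filter_col10 is a hypothetical helper; the value is passed with -v.
filter_col10() {
    value=$1
    shift
    awk -F'|' -v val="$value" '$10 == val' "$@"
}

filter_col10 IN <<'EOF'
A1|A2|A3|A4|A5|A6|A7|A8|A9|PK|A11|A13|A14|A15|A16|A17|A18
1|2|3|4|5|6|7|8|9|IN|11|12|13|14|15|16|17|18
AW|BW|CQ|AA|AR|AF|RR|AKL|ASD|US|PP|BN|TY|OL|Q3|M8|I7|V6
AR|BR|CR|A8|AN|AQ|RU|A11|A13|IN|P9P|B0N|T2Y|O4L|Q43|M88|I71|V16
EOF
```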

How to set FS to eof?

I want to read whole file not per lines. How to change field separator to eof-symbol?
I do:
awk "^[0-9]+∆DD$1[PS].*$" $(ls -tr)
$1 is a parameter (some integer) and .* is the message that I want to print. The problem is that the message can contain \n, so this code prints only the first line of it. How can I scan the whole file instead of line by line?
Can I do this using awk, sed, or grep? The script must be at most 60 characters long (including spaces).
Assuming you mean record separator, not field separator, with GNU awk you'd do:
gawk -v RS='^$' '{ print "<" $0 ">" }' file
Replace the print with whatever you really want to do and update your question with some sample input and expected output if you want help with that part too.
The portable way to do this, by the way, is to build up the record line by line and then process it in the END section:
awk '{rec = rec (NR>1?RS:"") $0} END{ print "<" rec ">" }' file
Inside END you can then use nf = split(rec,flds) to create fields if necessary.
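A minimal sketch of that portable approach, showing that the newlines survive in rec and that split() with the default FS treats them as separators too; slurp_demo is a hypothetical name:

```shell
# Portable sketch: collect the whole input into rec, then process it in END.
# slurp_demo is a hypothetical name. With the default FS (" "), split() treats
# runs of blanks, tabs and newlines as separators, so rec splits across lines.
slurp_demo() {
    awk '{ rec = rec (NR > 1 ? RS : "") $0 }
         END { nf = split(rec, flds); print nf; print "<" rec ">" }' "$@"
}

printf 'first line\nsecond\n' | slurp_demo
```

This prints 3 (the words first, line and second), followed by the whole input wrapped in angle brackets with its embedded newline intact.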

Filtering data in a text file in bash

I am trying to filter the data in a text file. There are 2 fields in the text file. The first one is text, while the 2nd one has 3 parts separated by _. The first part of the second field is a date in yyyyMMdd format and the next 2 are strings:
xyz yyyyMMdd_abc_lmn
Now I want to filter the lines in the file based on the date in the second field. I have come up with the following awk command, but it doesn't seem to work: it outputs the entire file, so I am definitely missing something.
Awk command:
awk -F'\t' -v ldate='20140101' '{cdate=substr($2, 1, 8); if( cdate <= ldate) {print $1'\t\t'$2}}' label
Try:
awk -v ldate='20140101' '{split($2,fld,/_/); if(fld[1]<=ldate) print $1,$2}' file
Note:
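We are using the split function, which splits the string given as the first argument on the regex given as the third argument and stores the pieces in the array named by the second argument.
You don't need to set -F'\t' unless your input file is tab-delimited. The default value of FS is a single space, which makes awk split on runs of spaces and tabs. If the fields are actually separated by spaces, setting FS to a tab leaves $2 empty, and the empty date string compares as less than ldate, so every line gets printed.
To output two tabs between the fields, you can set the OFS variable like this: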
awk -F'\t' -v OFS='\t\t' -v ldate='20140101' '{split($2,fld,/_/); if(fld[1]<=ldate) print $1,$2}' file
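A minimal sketch of the split() approach run on a few made-up sample lines (the lines and the helper name filter_by_date are hypothetical). Because yyyyMMdd dates have a fixed width, comparing them as numbers or as strings gives the same order.

```shell
# Sketch: keep lines whose date (the part of field 2 before the first "_")
# is on or before the given date. filter_by_date is a hypothetical helper.
filter_by_date() {
    awk -v ldate="$1" '{ split($2, fld, /_/); if (fld[1] <= ldate) print $1, $2 }'
}

printf '%s\n' 'xyz 20131231_abc_lmn' 'pqr 20140101_def_uvw' 'stu 20140215_ghi_rst' |
    filter_by_date 20140101
```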
Try this:
awk -v ldate='20140101' 'substr($NF,1,8) <= ldate' label

Cut and replace bash

I have to process a file with data organized like this
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
etc
Columns can have different length but lines always have the same number of columns.
I want to be able to cut a specific column of a given line and change it to the value I want.
For example I'd apply my command and change the file to
AAAAA:BB:XXXX:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
I know how to select a specific line with sed and then cut the field but I have no idea on how to replace the field with the value I have.
Thanks
Here's a way to do it with awk:
Going with your example, if you wanted to replace the 3rd field of the 1st line:
awk 'BEGIN{FS=OFS=":"} {if (NR==1) {$3 = "XXXX"}; print}' input_file
Input:
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
Output:
AAAAA:BB:XXXX:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
Explanation:
awk: invoke the awk command
'...': everything enclosed in single quotes is the set of instructions for awk
BEGIN{FS=OFS=":"}: Use : as delimiters for both input and output. FS stands for Field Separator. OFS stands for Output Field Separator.
if (NR==1) {$3 = "XXXX"};: If Number of Records (NR) read so far is 1, then set the 3rd field ($3) to "XXXX".
print: print the current line
input_file: name of your input file.
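To choose the line, column and new value at run time and write the result back to the file, the same awk program can take them as -v variables. A minimal sketch; replace_field and its argument order are hypothetical, and it assumes mktemp is available (common on Linux and BSD, though not POSIX). POSIX awk has no in-place option, so the output goes to a temporary file and is copied back, which keeps the original file's permissions.

```shell
# Sketch: set column col of line line to val in a ":"-delimited file.
# replace_field and its argument order are hypothetical.
replace_field() {
    file=$1 line=$2 col=$3 val=$4
    tmp=$(mktemp) || return 1
    # Note: -v interprets backslash escapes in val.
    awk -v line="$line" -v col="$col" -v val="$val" \
        'BEGIN { FS = OFS = ":" } NR == line { $col = val } { print }' "$file" > "$tmp" &&
        cat "$tmp" > "$file"    # copy back so the original file keeps its permissions
    status=$?
    rm -f "$tmp"
    return "$status"
}

demo=$(mktemp)
printf '%s\n' 'AAAAA:BB:CCC:EEEE:DDDD' 'FF:III:JJJ:KK:LLL' 'MMMM:NN:OOO:PP' > "$demo"
replace_field "$demo" 1 3 XXXX
cat "$demo"
rm -f "$demo"
```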
If instead what you are trying to accomplish is simply to replace all occurrences of CCC with XXXX in your file, simply do:
sed -i 's/CCC/XXXX/g' input_file
Note that this will also replace partial matches, such as ABCCCDD -> ABXXXXDD
This might work for you (GNU sed):
sed -r 's/^(([^:]*:){2})CCC(:|$)/\1XXXX\3/' file
or
awk -F: -vOFS=: '$3=="CCC"{$3="XXXX"};1' file
