Filtering data in a text file in bash

I am trying to filter the data in a text file. There are 2 fields in the text file. The first one is text, while the 2nd one has 3 parts separated by _. The first part of the second field is a date in yyyyMMdd format and the next 2 are strings:
xyz yyyyMMdd_abc_lmn
Now I want to filter the lines in the file based on the date in the second field. I have come up with the following awk command, but it doesn't seem to work: it outputs the entire file, so I am definitely missing something.
Awk command:
awk -F'\t' -v ldate='20140101' '{cdate=substr($2, 1, 8); if( cdate <= ldate) {print $1'\t\t'$2}}' label

Try:
awk -v ldate='20140101' '{split($2,fld,/_/); if(fld[1]<=ldate) print $1,$2}' file
Note:
We are using the split function, which splits the field on the regex supplied as the third argument and stores the resulting pieces in the array named by the second argument.
You don't need to set -F'\t' unless your input file is actually tab-delimited. The default value of FS is whitespace, so setting it to a tab might throw off the interpretation of $2.
To output with two tabs you can set the OFS variable like:
awk -F'\t' -v OFS='\t\t' -v ldate='20140101' '{split($2,fld,/_/); if(fld[1]<=ldate) print $1,$2}' file
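As a minimal sketch (the sample rows here are invented, assuming tab-delimited input as in the question), the split-based filter behaves like this:

```shell
# Hypothetical tab-delimited sample: second field is yyyyMMdd_abc_lmn
printf 'foo\t20131225_abc_lmn\nbar\t20140315_abc_lmn\n' > label
awk -F'\t' -v OFS='\t\t' -v ldate='20140101' \
    '{split($2,fld,/_/); if (fld[1]<=ldate) print $1,$2}' label
# only the foo line survives: 20131225 sorts before 20140101 as a string
```

String comparison works here because yyyyMMdd dates sort lexicographically in the same order as chronologically.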

Try this:
awk -v ldate='20140101' 'substr($NF,1,8) <= ldate' label
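For context (a sketch with invented sample rows): with the default whitespace FS, $NF is the last field, and a bare condition with no action triggers awk's default print of the whole line:

```shell
printf 'xyz 20131225_abc_lmn\nxyz 20140315_abc_lmn\n' |
    awk -v ldate='20140101' 'substr($NF,1,8) <= ldate'
# prints only the first line; substr pulls the 8-digit date off the last field
```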

Related

AWK write to a file based on number of fields in csv

I want to iterate over a csv file and, while writing to a file, discard the rows which don't have all columns.
I have an input file mtest.csv like this
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP2##TestProcess2##TestDevice2
TestIP3##TestProcess3##TestDevice3##TestID3
But I want to write only those records where all 4 columns are present. The output should not contain the TestIP2 row at all, as it has only 3 columns.
Sample output should look like this:
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP3##TestProcess3##TestDevice3##TestID3
I used to do this to get all the columns, but it writes the TestIP2 row as well, which has only 3 columns:
awk -F "\##" '{print $1"\##"substr($2,1,50)"\##"substr($3,1,50)"\##"substr($4,1,50)}' mtest.csv >output2.csv
But when I try to ensure that it writes to the file only when all 4 columns are present, it doesn't work:
awk -F "\##", 'NF >3 {print $1"\##"substr($2,1,50)"\##"substr($3,1,50)"\##"substr($4,1,50); exit}' mtest.csv >output2.csv
You are making things harder than they need to be. All you need to do is check NF==4 to output only records containing four fields. Your complete awk expression would be:
awk -F'##' NF==4 < mtest.csv
(note: the default action in awk is print, so no explicit print is required.)
Example Use/Output
With your sample input in mtest.csv, you would receive:
$ awk -F'##' NF==4 < mtest.csv
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP3##TestProcess3##TestDevice3##TestID3
Thanks David and vukung.
Both your solutions are okay. I want to write to a file so that I can trim the length of each field as well.
I think this statement below works out (the backslashes before ## in the print string are unnecessary):
awk -F "##" 'NF>3 {print $1"##"substr($2,1,50)"##"substr($3,1,2)"##"substr($4,1,3)}' mtest.csv >output2.csv
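Putting the pieces together (a sketch using the sample mtest.csv from the question; the 50-character limit is arbitrary):

```shell
cat > mtest.csv <<'EOF'
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP2##TestProcess2##TestDevice2
TestIP3##TestProcess3##TestDevice3##TestID3
EOF
# keep only 4-field rows, trimming each field to at most 50 characters
awk -F'##' 'NF==4 {print $1"##"substr($2,1,50)"##"substr($3,1,50)"##"substr($4,1,50)}' \
    mtest.csv > output2.csv
grep -c '' output2.csv   # 3 rows written; the TestIP2 row is dropped
```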

Script for changing CSV into Key Value (KV) format

I have a CSV file with data as below:
row_identifier,DBNAME,tblsps_name,Cur_size,Max_size,Used,Free,Percentage
tablespace,MRETF,RERETOSB15_DATA,51200,45600,14284,31316,31
tablespace,MRETF,SPOTLIGHT_DATA,500,2000,259,1741,13
tablespace,MRETF,DDLAUDITING,25,25,2,23,8
I want the output in the following format:
tablespace,MRETF,tblsps_name:RERETOSB15_DATA,Cur_size:51200,Max_size:45600,Used:14284,Free:31316,Percentage:31
tablespace,MRETF,tblsps_name:SPOTLIGHT_DATA,Cur_size:500,Max_size:2000,Used:259,Free:1741,Percentage:13
and so on..
Is it possible to get the output in the above key:value format?
Next time at least pretend that you tried something ;-)
awk -F"," 'FNR > 1 {print $1","$2",tblsps_name:"$3",Cur_size:"$4",Max_size:"$5",Used:"$6",Free:"$7",Percentage:"$8}' your.csv
-F"," sets the field separator, FNR > 1 skips the first header line, $1 is the first column, and so on
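A quick sketch on one of the sample rows from the question:

```shell
printf 'row_identifier,DBNAME,tblsps_name,Cur_size,Max_size,Used,Free,Percentage\ntablespace,MRETF,DDLAUDITING,25,25,2,23,8\n' |
    awk -F"," 'FNR > 1 {print $1","$2",tblsps_name:"$3",Cur_size:"$4",Max_size:"$5",Used:"$6",Free:"$7",Percentage:"$8}'
# -> tablespace,MRETF,tblsps_name:DDLAUDITING,Cur_size:25,Max_size:25,Used:2,Free:23,Percentage:8
```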

Extracting expression from csv field

I'm trying to extract the value that comes after word= in CSV file that looks like this:
1473228800,0.0,word=google.sentence=Android.something=not_set
1480228800,100.0,word=google_analytics.number=not_set.country=US.source=internet
1493228800,0.0,location=NY.word=Android.sentence=not_set.something=not_set.type=gauge
and the output I need is (it's important for me to only print "word" and its value):
1473228800,0.0,word=google
1480228800,100.0,word=google_analytics
1493228800,0.0,word=Android
I tried using sed and awk, but each gave me a solution for only some of the lines in the csv file.
This is my last try using awk:
awk -F "," '{sub(/.*word.*=(.*)\.*/,"word=\1", $3);print $1","$2","$3}'
awk solution:
awk -F, '{match($3,/word=[^.]+/); print $1,$2,substr($3,RSTART,RLENGTH)}' OFS=',' file
The output:
1473228800,0.0,word=google
1480228800,100.0,word=google_analytics
1493228800,0.0,word=Android
match($3,/word=[^.]+/) - to match the needed sequence within the 3rd field
substr($3,RSTART,RLENGTH) - to extract matched sequence from the 3rd field
The match() function sets the predefined variable RSTART to the
index. It also sets the predefined variable RLENGTH to the length in
characters of the matched substring.
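A minimal illustration of how match() sets the two variables, using one invented field value where word= is not at the start:

```shell
echo 'location=NY.word=Android.sentence=not_set' |
    awk '{match($0, /word=[^.]+/); print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)}'
# match starts at position 13 and spans 12 characters: "word=Android"
```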
try:
awk -F, '{sub(/.*word/,"word",$3);sub(/\..*/,"",$3);print $1,$2,$3}' OFS="," Input_file
Making the field separator a comma, then substituting everything up to word in $3 with the string word. Then substituting from the DOT onward with nothing in $3, as we don't need it per your question. Finally printing the first, second and third fields, with the output field separator set to comma.
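A sketch of the two-sub approach on the trickiest sample line (the one where word= is not at the start of the third field):

```shell
echo '1493228800,0.0,location=NY.word=Android.sentence=not_set.something=not_set.type=gauge' |
    awk -F, '{sub(/.*word/,"word",$3); sub(/\..*/,"",$3); print $1,$2,$3}' OFS=","
# -> 1493228800,0.0,word=Android
```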

How to set FS to eof?

I want to read the whole file, not line by line. How do I change the field separator to an eof symbol?
I do:
awk "^[0-9]+∆DD$1[PS].*$" $(ls -tr)
$1 is a parameter (some integer), and .* is the message that I want to print. The problem: the message can contain \n, so this code prints only the first line of the file. How can I scan the whole file rather than line by line?
Can I do this using awk, sed, or grep? The script must be at most 60 characters long (including spaces).
Assuming you mean record separator, not field separator, with GNU awk you'd do:
gawk -v RS='^$' '{ print "<" $0 ">" }' file
Replace the print with whatever you really want to do and update your question with some sample input and expected output if you want help with that part too.
The portable way to do this, by the way, is to build up the record line by line and then process it in the END section:
awk '{rec = rec (NR>1?RS:"") $0} END{ print "<" rec ">" }' file
using nf = split(rec,flds) to create fields if necessary.
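A sketch of the portable variant on invented input: the whole file ends up in rec, which split() can break back into per-line fields:

```shell
printf 'alpha\nbeta\ngamma\n' |
    awk '{rec = rec (NR>1 ? RS : "") $0}
         END{ n = split(rec, flds, RS); print n, flds[2] }'
# -> 3 beta
```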

Cut and replace bash

I have to process a file with data organized like this
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
etc
Columns can have different lengths, but lines always have the same number of columns.
I want to be able to cut a specific column of a given line and change it to the value I want.
For example I'd apply my command and change the file to
AAAAA:BB:XXXX:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
I know how to select a specific line with sed and then cut the field but I have no idea on how to replace the field with the value I have.
Thanks
Here's a way to do it with awk:
Going with your example, if you wanted to replace the 3rd field of the 1st line:
awk 'BEGIN{FS=OFS=":"} {if (NR==1) {$3 = "XXXX"}; print}' input_file
Input:
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
Output:
AAAAA:BB:XXXX:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
Explanation:
awk: invoke the awk command
'...': everything enclosed by single-quotes are instructions to awk
BEGIN{FS=OFS=":"}: Use : as delimiters for both input and output. FS stands for Field Separator. OFS stands for Output Field Separator.
if (NR==1) {$3 = "XXXX"};: If Number of Records (NR) read so far is 1, then set the 3rd field ($3) to "XXXX".
print: print the current line
input_file: name of your input file.
If instead what you are trying to accomplish is simply replace all occurrences of CCC with XXXX in your file, simply do:
sed -i 's/CCC/XXXX/g' input_file
Note that this will also replace partial matches, such as ABCCCDD -> ABXXXXDD
This might work for you (GNU sed):
sed -r 's/^(([^:]*:?){2})CCC/\1XXXX/' file
or
awk -F: -vOFS=: '$3=="CCC"{$3="XXXX"};1' file
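A sketch contrasting the two on the sample data: the awk version only rewrites a whole-field match (so a field like CCCX would be left alone), and the bare 1 at the end prints every line:

```shell
printf 'AAAAA:BB:CCC:EEEE:DDDD\nFF:III:JJJ:KK:LLL\n' |
    awk -F: -v OFS=: '$3=="CCC"{$3="XXXX"} 1'
# -> AAAAA:BB:XXXX:EEEE:DDDD
#    FF:III:JJJ:KK:LLL
```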
