bash, get line from file with condition on a particular column - bash

I've got a file with two columns; a line of this file looks like this: username,date
username,20150706
I'm trying to get the lines whose date in the second column is older than e.g. 20151231.
EDIT, SOLUTION: I want to do it in a loop: read the dates (the second column of the file), then compare each one with another date, e.g. 20151231. If a date is, let's say, older than 20151231, write the whole line (username,date) to a file. Here's how I do it:
date1=20151231
while read -r line; do
    # extract the date from the second comma-separated column
    date2=$(echo "$line" | cut -d "," -f2)
    # the YYYYMMDD format makes a plain numeric comparison work
    if (( date2 < date1 )); then
        echo "$line" >> filename2
    fi
done < filename
So one little problem solved :)
Thanks in advance for any suggestions.

You can solve this with awk:
awk -F',' -v threshold=20151231 '$2 < threshold' file
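For example, with the sample line from the question:
$ awk -F',' -v threshold=20151231 '$2 < threshold' file
username,20150706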
Hope this helps!

Related

Use awk to analyze csv file - combined with shell 'date' command in awk

I have a .csv file which has dates and the answer about enjoyable or not:
2019-04-1,enjoyable
2019-04-2,unenjoyable
2019-04-3,unenjoyable
2019-04-4,enjoyable
2019-04-5,unenjoyable
2019-04-6,unenjoyable
2019-04-7,enjoyable
2019-04-8,unenjoyable
2019-04-9,unenjoyable
2019-04-10,enjoyable
2019-04-11,enjoyable
2019-04-12,enjoyable
2019-04-13,unenjoyable
2019-04-14,enjoyable
2019-04-15,unenjoyable
2019-04-16,unenjoyable
2019-04-17,unenjoyable
2019-04-18,enjoyable
2019-04-19,unenjoyable
2019-04-20,unenjoyable
2019-04-21,unenjoyable
2019-04-22,unenjoyable
2019-04-23,unenjoyable
2019-04-24,unenjoyable
2019-04-25,unenjoyable
2019-04-26,unenjoyable
What I want to do is to print the day of the week in the third column, separated by ',', like this:
2019-04-1,enjoyable,2
2019-04-2,unenjoyable,3
I tried:
dates=$(awk '{FS=","}{print $1,$2}' weather_stat.csv)
weeks=$(
for vars in $dates[first_row]
do
echo $(date -j -f '%Y-%m-%d' $vars "+%w")
done
)
merge($dates,$weeks)
The first part of the code works without any problem, but in the second part I am confused about how to get the data in the first row of the variable "dates" (I use dates[first_row] to mean the first row of that variable) so I can apply the 'date' command to it.
And for the third part, I want to merge these two tables together. I found the 'join' command, but it seems to work on two files instead of two variables (I don't want to create any new files during the process).
Could anyone tell me how to get the rows of a variable instead of a file in shell, and how to merge two table-like variables?
As you're learning shell scripting, here's some code to study:
to read your csv file, and get the weekday number for each date in the file:
while IFS=, read -r date rest; do echo "$date,$(date -d "$date" +%w)"; done < file.csv
to join the output of that command with your file:
weekdays=$(while IFS=, read -r date rest; do echo "$date,$(date -d "$date" +%w)"; done < file.csv)
join -t, file.csv <(echo "$weekdays")
Or, without needing to store the result in an intermediate variable:
join -t, file.csv <(
while IFS=, read -r date rest; do echo "$date,$(date -d "$date" +%w)"; done < file.csv
)
The newlines within the <() are not necessary, but useful for maintainable code.
However, you can see that this is less efficient because you have to process the file twice. With awk you only have to read through the file once.
With GNU awk:
awk 'BEGIN{FS=OFS=","}
{ split($1,a,"-")
  t=sprintf("%04d %02d %02d 00 00 00",a[1],a[2],a[3])
  print $0,strftime("%w",mktime(t))
}' file.csv
With only your Bourne shell, so less efficient than awk if you have a lot of lines in your CSV file:
while IFS=, read date enjoy; do
date -d "$date" +"$date,$enjoy,%w"
done < your.csv

Trying to create a script that counts the length of all the reads in a fastq file but getting no return

I am trying to count the length of each read in a fastq file from Illumina sequencing and output this to a tsv or any sort of file, so I can later also look at this and count the number of reads per file. So I need to cycle down the file and extract each line that has a read on it (every 4th line), then get its length and store this as an output.
num=2
for file in *.fastq
do
echo "counting $file"
function file_length(){
wc -l $file | awk '{print$FNR}'
}
for line in $file_length
do
awk 'NR==$num' $file | chrlen > ${file}read_length.tsv
num=$((num + 4))
done
done
Currently all I get is the "counting $file" message and no other output, but also no errors.
Your script contains a lot of errors in both syntax and algorithm. Please try shellcheck to see what the problems are. The biggest issue is the $file_length part:
you may have wanted to call a function file_length() here, but it is just
an undefined variable, which is evaluated as null in the for loop.
If you want to print the length of every read (in a fastq file the sequence is on every 4th line, starting from line 2),
please try something like:
for file in *.fastq; do
    awk 'NR%4==2 {print length}' "$file" > "${file}_length.tsv"
done
Or if you want to put the results together in a single tsv file, with one row per read, try:
tsvfile="read_length.tsv"
for file in *.fastq; do
    awk -v f="$file" 'NR%4==2 {print f "\t" length($0)}' "$file" >> "$tsvfile"
done
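Since you also mention counting the number of reads per file: every read occupies exactly 4 lines, so dividing the line count by 4 gives the read count. A minimal sketch, assuming well-formed fastq files:
for file in *.fastq; do
    # each read is exactly 4 lines in a well-formed fastq file
    echo "$file: $(( $(wc -l < "$file") / 4 )) reads"
done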
Hope this helps.

Split file by date and keep header in Bash

I need to split a TSV file by date using whatever standard CLI tools come with OS X 10.10 (e.g. sed, awk, etc.). FYI, the shell is Bash.
The input file has a header row and follows a tab-separated format (the date and time are in the first column). I'm adding "\t" below to show the tabs, and "…" to indicate that the rows have many more columns:
Transaction Date\t Account Number\t…
9/16/2004 12:00:00 AM\t ABC00147223\t…
9/17/2004 12:00:00 AM\t ABC00147223\t…
10/05/2004 12:00:00 AM\t ABC00147223\t…
The output should be:
A separate file for each unique year AND month (based on the example above I would get 2 output files: 9/2004 and 10/2004)
Maintain the first/header row of the original file
Filename in the form YYYYMM.txt
Thank you for your help.
If you want to do it in pure bash, do as below...
#!/bin/bash
datafile=inputdatafile.dat
ctr=0
while IFS= read -r line
do
    # counter to keep track of the line number
    ctr=$((ctr + 1))
    # skip the header line
    if [[ $ctr -gt 1 ]]
    then
        # build the filename from the date field of the record
        vdate=${line%% *}                        # date portion, e.g. 9/16/2004
        vmonth=$(printf "%02d" "${vdate%%/*}")   # month with padding 0
        vyear=${vdate##*/}                       # year
        vfilename="${vyear}${vmonth}.txt"        # filename in YYYYMM.txt format
        # if the output file doesn't exist yet, seed it with the header record
        if [ ! -f "$vfilename" ]; then
            head -1 "$datafile" > "$vfilename"
        fi
        # append the record to that file
        echo "$line" >> "$vfilename"
    fi
done < "$datafile"
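With the sample input above, this produces two files, 200409.txt and 200410.txt, each seeded with the header row.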
Not sure how big your data files are, but it's never a good idea to parse large files with shell scripting; use other utils like awk, sed, grep, etc. for that instead.
For big files, use a nawk / gawk one-liner as below ... it will do all you need.
# use nawk or gawk if you don't get the expected results using awk
nawk '{if(NR==1)h=$0;} {if(NR>1){ split($1,a,"/"); fn=sprintf("%04d%02d.txt",a[3],a[1]); if(system( "[ ! -f " fn " ] ")==0)print h >> fn; print >> fn;} }' inputdatafile.dat
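For readability, here is the same one-liner expanded (behavior unchanged):
nawk '
  NR==1 { h=$0; next }                      # remember the header line
  {
    split($1, a, "/")                       # a[1]=month, a[2]=day, a[3]=year
    fn = sprintf("%04d%02d.txt", a[3], a[1])
    if (system("[ ! -f " fn " ]") == 0)     # output file does not exist yet
      print h >> fn                         # seed it with the header row
    print >> fn                             # append the record
  }' inputdatafile.dat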

appending text to specific line in file bash

So I have a file that contains some lines of text separated by ','. I want to create a script that counts how many parts a line has and, if the line contains 16 parts, adds a new one. So far it's working great. The only thing that is not working is appending the ',xx' at the end. See my example below:
Original file:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
Expected result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
This is my code:
while read p; do
if [[ $p == "HEA"* ]]
then
IFS=',' read -ra ADDR <<< "$p"
echo ${#ADDR[@]}
arrayCount=${#ADDR[@]}
if [ "${arrayCount}" -eq 16 ];
then
sed -i "/$p/ s/\$/,xx/g" $f
fi
fi
done <$f
Result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
What am I doing wrong? I'm sure it's something small but I can't find it..
It can be done using awk:
awk -F, 'NF==16{$0 = $0 FS "xx"} 1' file
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
-F, sets the input field separator to a comma
NF==16 is the condition: execute the block inside { and } if the number of fields is 16
$0 = $0 FS "xx" appends ,xx at the end of the line (FS supplies the comma)
1 is an always-true condition; awk's default action for it is to print the line
If you want to use sed instead, the key points are:
Use the ${line_number}s/.../.../ form; to target a specific line, you need to find out its line number first.
Use the special character & to refer to the matched string.
The sed statement should look like the following:
sed -i "${line_number}s/.*/&,xx/" "$f"
I would prefer to leave it to you to play around with it, but if you prefer I can give you a full working sample.
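For reference, a minimal sketch of how those pieces could fit together (assuming GNU sed, and that $f holds the filename as in your script):
while read -r line_number; do
    sed -i "${line_number}s/.*/&,xx/" "$f"
done < <(awk -F, 'NF==16 {print NR}' "$f")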

Bash script to convert a date and time column to unix timestamp in .csv

I am trying to create a script to convert two columns in a .csv file, which are a date and a time, into unix timestamps. So I need to get the date and time columns from each row, convert them, and insert the result into an additional column at the end containing the timestamp.
Could anyone help me? So far I have discovered the unix command to convert any given date and time to a unix timestamp:
date -d "2011/11/25 10:00:00" "+%s"
1322215200
I have no experience with bash scripting; could anyone get me started?
Examples of my columns and rows:
Columns: Date, Time,
Row 1: 25/10/2011, 10:54:36,
Row 2: 25/10/2011, 11:15:17,
Row 3: 26/10/2011, 01:04:39,
Thanks so much in advance!
You don't provide an excerpt from your csv file, so I'm using this one:
[foo.csv]
2011/11/25;12:00:00
2010/11/25;13:00:00
2009/11/25;19:00:00
Here's one way to solve your problem:
$ cat foo.csv | while read line ; do echo $line\;$(date -d "${line//;/ }" "+%s") ; done
2011/11/25;12:00:00;1322218800
2010/11/25;13:00:00;1290686400
2009/11/25;19:00:00;1259172000
(EDIT: Removed an unnecessary variable.)
(EDIT2: Altered the date command so the script actually works.)
This should do the job:
awk 'BEGIN{FS=OFS=", "}{t=$1" "$2; "date -d \""t"\" +%s"|getline d; print $1,$2,d}' yourCSV.csv
note
you didn't give any example, and you mentioned csv, so I assume that the column separator in your file is a comma.
test
kent$ echo "2011/11/25, 10:00:00"|awk 'BEGIN{FS=OFS=", "}{t=$1" "$2; "date -d \""t"\" +%s"|getline d; print $1,$2,d}'
2011/11/25, 10:00:00, 1322211600
Now two improvements:
First: no need for cat foo.csv; just stream it into the while loop via < foo.csv.
Second: no need for echo & tr to create the date string format; just use bash's internal pattern substitution and do it in place:
while read line ; do echo ${line}\;$(date -d "${line//;/ }" +'%s'); done < foo.csv
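Note that the sample rows in the question are actually comma separated with DD/MM/YYYY dates, which date -d won't parse directly. Here's a small sketch adapting the same idea to that format (input.csv is a placeholder name):
while IFS=', ' read -r d t _; do
    # rearrange DD/MM/YYYY into YYYY/MM/DD so GNU date can parse it
    ts=$(date -d "${d:6:4}/${d:3:2}/${d:0:2} $t" "+%s")
    echo "$d, $t, $ts"
done < input.csv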
