delete all lines after a specific date - bash

I have a lot of *.csv files. I want to delete everything after a specific date: every line whose date field is later than 20031231 should be removed.
How do I solve this with a few lines of a shell script? Sample input:
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
Test,20040101,000100,0.73342,0.744318

Quick and dirty, and without any other info about constraints: assuming the file is sorted by date, sed can simply quit at the first line that no longer contains 20031231:
sed -n '/20031231/!q;p' YourFile

If you want a shell script, awk is the best fit. This will do the trick:
awk 'BEGIN {FS=","} {if ($2 == "20031231") print $0}' input.csv > output.csv
This writes only the lines whose second field is 20031231 to a separate output file.
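Since the question mentions many *.csv files, any of these one-liners can be wrapped in a loop. A minimal sketch, assuming the files sit in the current directory and that a *_trimmed.csv copy next to each original is acceptable:
for f in *.csv; do
    # keep only rows whose date field is on or before the cutoff
    awk -F, '$2 <= 20031231' "$f" > "${f%.csv}_trimmed.csv"
done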

This version ignores empty lines and unmatched data (anything dated after 20031231).
awk file:
$ cat awk.awk
{
    # keep non-empty lines dated on or before 20031231
    if ($2 <= "20031231" && $0 != "") {
        print $0
    } else {
        next
    }
}
execution:
$ awk -F',' -f awk.awk input
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
one liner:
$ awk -F',' '{if($2<="20031231" && $0!=""){print $0}else{next}}' input
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818

With Miller (http://johnkerl.org/miller/doc/)
mlr --nidx --fs "," filter '$2<=20031231' input
keeps everything up to that date and drops
Test,20040101,000100,0.73342,0.744318

With awk please try:
awk -F, '$2<=20031231' input.csv
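To overwrite the input file with the filtered result, one option is to redirect to a temporary file and move it back (a sketch; the .tmp suffix is arbitrary):
awk -F, '$2<=20031231' input.csv > input.csv.tmp && mv input.csv.tmp input.csv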

Related

how to discard the last field of each line of a file using the awk command

how to discard the last field using awk
My list.txt file contains data like below:
Ram/45/simple
Gin/Run/657/No/Sand
Ram/Hol/Sin
Tan/Tin/Bun
but I require output like below:
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
I tried the following command, but it prints only the last field:
cat list.txt |awk -F '/' '{print $(NF)}'
45
No
Hol
Tin
With GNU awk, you could try the following.
awk 'BEGIN{FS=OFS="/"} NF--' Input_file
Or, with any awk, try the following.
awk 'BEGIN{FS=OFS="/"} match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}' Input_file
This simple awk should work:
awk '{sub(/\/[^/]*$/, "")} 1' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
Or even this simpler sed should also work:
sed 's~/[^/]*$~~' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
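For completeness, a loop-based variant that should also work in any POSIX awk (a sketch, not taken from the answers above): rebuild each line from all fields except the last.
awk -F/ '{ out = $1; for (i = 2; i < NF; i++) out = out "/" $i; print out }' list.txt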

Using awk to search for a line that starts with but also contains a string

I have a file with multiple lines that start with the same keyword. I only want to modify one of them, and it's easy to distinguish the two: I want the one under the [dbinfo] section. The domain name is static, so I know that won't change.
awk -F '=' '$1 ~ /^dbhost/ {print $NF};' myfile.txt
myfile.txt
[ual]
path=/web/
dbhost=ez098sf
[dbinfo]
dbhost=ec0001.us-east-1.localdomain
dbname=ez098sf_default
dbpass=XXXXXX
You can use this awk command to first check for presence of [dbinfo] section and then modify dbhost parameter:
awk -v h='newhost' 'BEGIN{FS=OFS="="}
$0 == "[dbinfo]" {sec=1} sec && $1 == "dbhost"{$2 = h; sec=0} 1' file
[ual]
path=/web/
dbhost=ez098sf
[dbinfo]
dbhost=newhost
dbname=ez098sf_default
dbpass=XXXXXX
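The command prints the modified file to stdout; to change the file itself, you could redirect to a temporary file and move it back (a sketch, the .tmp name is arbitrary):
awk -v h='newhost' 'BEGIN{FS=OFS="="} $0=="[dbinfo]"{sec=1} sec && $1=="dbhost"{$2=h; sec=0} 1' file > file.tmp && mv file.tmp file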
You want to utilize a little bit of a state machine here (this one prints the value rather than modifying it):
awk -F '=' '
$0 ~ /^\[.*\]/ {in_db_info=($0=="[dbinfo]")}
$0 ~ /^dbhost/{if (in_db_info) print $2;}' myfile.txt
You can also do it with sed:
sed '/\[dbinfo\]/,/\[/s/\(^dbhost=\).*/\1domain.com/' myfile.txt

rearranging columns in a csv as new lines

I have the following csv:
1,host1,group1,group2
2,host2,group3,group4
3,host3,group5
4,host4,group6,group7,group8
I want to achieve the following:
1,host1,group1
1,host1,group2
2,host2,group3
2,host2,group4
3,host3,group5
4,host4,group6
4,host4,group7
4,host4,group8
How do I do this from the Linux command line?
$ awk -F , -v OFS=, '{for(i=3;i<=NF;i++) print $1,$2,$i}' data
1,host1,group1
1,host1,group2
2,host2,group3
2,host2,group4
3,host3,group5
4,host4,group6
4,host4,group7
4,host4,group8
awk 'BEGIN{FS=","}
{nf=NF;count=3;
while(nf-2>0){
printf "%s,%s,%s\n",$1,$2,$count;
count++;nf--
}
}' your_file
also would give you the desired result.
1,host1,group1
1,host1,group2
2,host2,group3
2,host2,group4
3,host3,group5
4,host4,group6
4,host4,group7
4,host4,group8

how can I insert a character at a certain position in a csv line

How should I go about inserting a character at a certain point in a csv line? For instance, if I had the following:
1,2,3,4,5,6,7
How could I insert ,,,,, right before the fifth field (the 5), so it would look like
1,2,3,4,,,,,,5,6,7
I found a link for how to do this in Java, but unfortunately I did not have much luck finding out how to do it with bash. Any help would be much appreciated, thanks!
You can use awk to change a specific field:
awk -F"," '{OFS=","; a=$5; $5=",,,,,",a; print $0}' file
The idea is to update the field 5 with the desired values and then print the whole line.
echo "1,2,3,4,5,6,7" | awk -F"," '{a=$5; $5=",,,,,"a; OFS=","; print}'
would print:
1,2,3,4,,,,,,5,6,7
awk -F, 'BEGIN{OFS=","}{$5=",,,,,"$5;print}' your_file
tested below:
> echo "1,2,3,4,5,6" | awk -F, 'BEGIN{OFS=","}{$5=",,,,,"$5;print}'
1,2,3,4,,,,,,5,6
>
or you can do it using perl:
> echo "1,2,3,4,5,6" | perl -F, -lane '$F[4]=~s/^/,,,,,/g;print join(",",#F)'
1,2,3,4,,,,,,5,6
>
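A plain sed variant is also possible (a sketch, not from the answers above): capture the first four comma-terminated fields and re-insert them followed by five extra commas.
echo "1,2,3,4,5,6" | sed 's/^\(\([^,]*,\)\{4\}\)/\1,,,,,/'
1,2,3,4,,,,,,5,6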

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns I want and append them to a new file
It works, but I want a better way to do this. Specifically, I want an alternative method for exploding each line into an array; using command line arguments doesn't seem like the way to go. Any comments are welcome.
# Takes a ::-separated file as the 1st parameter
SOURCE=$1
# create csv target file
TARGET=${SOURCE/dat/csv}
touch "$TARGET"
echo "#userId,itemId" > "$TARGET"
IFS=","
while read LINE
do
    # Replace all matches of :: with a ,
    CSV_LINE=${LINE//::/,}
    set -- $CSV_LINE
    echo "$1,$2" >> "$TARGET"
done < "$SOURCE"
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
The following would print columns 1 and 2 from infile.dat. Replace $1, $2 with a comma-separated list of the numbered columns you do want.
awk 'BEGIN { FS="::"; OFS="," } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a one-liner to do it.
Awk can probably do it easily too.
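For instance, a possible Perl one-liner (a sketch, assuming columns 1 and 2 are the ones wanted):
perl -F'::' -lane 'print join ",", @F[0,1]' infile.dat > infile.csv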
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or to avoid a UUOC award (and prolong the life of your keyboard by 3 characters):
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
awk is indeed the right tool for the job here; it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
$
