How to get all lines containing dates before today from a text file in the macOS terminal?

In a text file there are lots of dates and I want to grep or find all the dates before today.
Lines look like abc def ghi:2018-06-20 mno pqr, and there are others without a date. The dates are chaotic and not in order. I want to get all lines of the file containing a date before today (not ordered, just as they appear in the file).
What I know:
Today is date +%Y-%m-%d, and I know how to save it in a variable $A.
Getting lines with this date: grep "$A" file.txt
I know how to put this in a for-loop to cover, say, the days of one week. But how can I find all the dates before today? I think I need a comparison like: if $A > $B, then grep "$B" file.txt.
Thank you for your help!
[Yes, I searched a lot but I did not find my solution anywhere.]

$ today="$(date "+%s")"
$ input="/tmp/file.txt"
$ cat "${input}"
abc def ghi:2018-06-25 mno pqr
abc def ghi:2018-06-24 mno pqr
abc def ghi:2018-06-23 mno pqr
abc def ghi:2018-06-22 mno pqr
abc def ghi:2018-06-21 mno pqr
abc def ghi:2018-06-20 mno pqr
def ghi:2018-06-20 mno pqr
abc ghi:2018-06-20mno pqr abc
abc def ghi:2017-06-20 mno pqr
abc def2018-06-20 mno pqr
abc def ghi:2018-06-19 mno pqr
def ghi:2018-06-21 mno pqr
abc ghi:2018-07-20 mno pqr
abc def ghi:2018-06-20 mno pqr
abc def2018-05-20 mno pqr
1sss018-05-20 mno pqr
1sss05-20-2018 mno pqr
$ sed -n 's/.*\([[:digit:]]\{4\}-[[:digit:]]\{2\}-[[:digit:]]\{2\}\).*/\1/p' "${input}" \
| sort -u \
| xargs -n1 date -j -f '%Y-%m-%d' '+%s' \
| xargs -n1 -I% awk 'BEGIN{if(%<'${today}'){print %}}' \
| xargs -n1 date -j -f '%s' '+%Y-%m-%d' \
| xargs -n1 -I% grep % "${input}" \
| sort -u
abc def ghi:2017-06-20 mno pqr
abc def ghi:2018-06-19 mno pqr
abc def ghi:2018-06-20 mno pqr
abc def ghi:2018-06-21 mno pqr
abc def ghi:2018-06-22 mno pqr
abc def2018-05-20 mno pqr
abc def2018-06-20 mno pqr
abc ghi:2018-06-20mno pqr abc
def ghi:2018-06-20 mno pqr
def ghi:2018-06-21 mno pqr
$today is the current date in seconds since the epoch, and $input is the file you want to parse. sed hunts for date-like strings (without verifying they are real dates; for instance 0000-99-99 would match), the first sort eliminates duplicate dates, the first xargs/date converts the found dates into seconds since the epoch, xargs/awk keeps only the dates before today, the next xargs/date converts those back to %Y-%m-%d, xargs/grep finds all lines containing those dates in the input file, and the last sort eliminates any duplicated lines.
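Since ISO-8601 dates compare correctly as plain strings, the same idea can be sketched with a single awk process (a sketch, assuming the dates always have the YYYY-MM-DD shape; like the sed above, it does not verify they are real dates):
awk -v today="$(date '+%Y-%m-%d')" '
    match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) &&
    substr($0, RSTART, RLENGTH) < today
' "${input}"
The explicit [0-9] repetitions avoid {n} interval expressions, which older awks (including the stock macOS awk) may not support.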

Cool. Now iterate over the dates (for example from today to 6 days ago) and grep the file for each date:
# iterate over i = 0, 1, 2, 3, ..., 6
for i in $(seq 0 6); do
    # subtract $i days from today, for example: date --date="-5 days" +%Y-%m-%d
    A=$(date --date="-$i days" +%Y-%m-%d)
    grep "$A" file.txt
    # or, shorter: grep "$(date --date="-$i days" +%Y-%m-%d)" file.txt
done
You can also create one big grep argument and this should work faster:
grep "$(for i in $(seq 0 6); do echo -n "$(date --date="-$i days" +%Y-%m-%d)\|"; done | sed 's/\\|$//')" file.txt
For each date from today back to 6 days ago I generate a string that looks like %Y-%m-%d\|, then I remove the trailing \| with sed 's/\\|$//'. The resulting command looks like grep "2018-06-23\|2018-06-22\|2018-06-21\|<and so on...>" file.txt. The \| acts as "or" in grep's basic regular expressions.
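Note that --date= is GNU date syntax and does not exist in macOS's BSD date. A rough macOS equivalent of the loop (a sketch using the BSD -v adjustment flag):
for i in $(seq 0 6); do
    # -v-${i}d steps the current date back by $i days (BSD/macOS date)
    grep "$(date -v-${i}d +%Y-%m-%d)" file.txt
done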

awk is a very powerful scripting tool that can do the job without resorting to multiple processes and pipes.
#!/usr/bin/awk -f
# Note: systime() and mktime() are gawk extensions; the stock macOS awk lacks them.
BEGIN {
    today = systime()
}
/:[0-9]{4}-[0-9]{2}-[0-9]{2} / {
    for (field = 1; field <= NF; field++) {
        if (split($field, b, /:/) > 1) {
            # turn "2018-06-20" into "2018 06 20" for mktime()
            gsub(/-/, " ", b[2])
            t = mktime(b[2] " 0 0 0")
            if (t > 0 && t < today) {
                print $0
                next    # stop after the first qualifying date on the line
            }
        }
    }
}
The BEGIN block simply sets the variable today to the current system time.
/:[0-9]{4}-[0-9]{2}-[0-9]{2} / ensures only lines that contain a date-like string preceded by a colon : are processed.
The for loop iterates over the fields of the line to find that date-like string.
split then separates the matching field at the colon to get the date string, and gsub replaces the dashes - with spaces so mktime() can parse it.
Running mktime() on the date-like string and comparing against today tells us whether the line qualifies.
Finally, the entire line is printed when it qualifies.
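Since systime() and mktime() are gawk extensions, one way to run the script on macOS is with gawk installed (e.g. from Homebrew); before_today.awk is just a hypothetical file name for the script above:
gawk -f before_today.awk /tmp/file.txt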

Assuming you know which column the date is in, you can also do this:
awk '$2 < "2020-09-16"' input.txt
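This works because ISO-8601 dates compare correctly as plain strings. To avoid hard-coding today's date, you can pass it in (a sketch, same column assumption):
awk -v today="$(date +%Y-%m-%d)" '$2 < today' input.txt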

Related

How to find and merge some specific lines from one file B into another file A in Linux, where the lines in file B can increase or decrease

File A:
abc
bcd
def
ghi
jkl
File B:
bcd
def
klm
Desired output:
abc
bcd
def
klm
ghi
jkl
Give this awk one-liner a try:
awk '!a[$0]++' fileA fileB > output
It works for your example files.
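If the idiom is unclear: a[$0]++ evaluates to 0 (false) the first time a line is seen and to something non-zero afterwards, so !a[$0]++ is true exactly once per distinct line. Spelled out, the equivalent is:
awk '{ if (!seen[$0]) { print; seen[$0] = 1 } }' fileA fileB > output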
cat A B | sort -u will remove the repeated lines, but it also sorts the output. @Kent's answer is more elegant, but still, the output doesn't fully satisfy your description.

Replace a word of a line if matched

I am given a file. If a line has "xxx" as its third word then I need to replace it with "yyy". My final output must have all the original lines with the modified lines.
The input file is-
abc xyz mno
xxx xyz abc
abc xyz xxx
abc xxx xxx xxx
The required output file should be-
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
I have tried-
grep "\bxxx\b" file.txt | awk '{if ($3=="xxx") print $0;}' | sed -e 's/[^ ]*[^ ]/yyy/3'
but this gives the output as-
abc xyz yyy
abc xxx yyy xxx
The following simple awk may help you here.
awk '$3=="xxx"{$3="yyy"} 1' Input_file
Output will be as follows.
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
Explanation: the condition checks whether the 3rd field $3 equals the string xxx; if so, $3 is set to the string yyy. The trailing 1 works because awk operates on condition-then-action pairs: 1 is an always-TRUE condition with no action attached, so the default action of printing the current line happens for every record (whether or not the 3rd field was changed).
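Spelled out without the trailing-1 shorthand, the same program is roughly:
awk '{ if ($3 == "xxx") $3 = "yyy"; print }' Input_file
One caveat with either form: when awk assigns to a field it rebuilds the record, so runs of whitespace between fields are collapsed to single spaces.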
sed solution:
sed -E 's/^(([^[:space:]]+[[:space:]]+){2})xxx\>/\1yyy/' file
The output:
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
To modify the file in place, add the -i option: sed -Ei ....
In general the awk command may look like
awk '{command set 1}condition{command set 2}' file
Command set 1 is executed for every line, while command set 2 is executed only if the condition preceding it is true.
My final output must have all the original lines with the modified
lines
In your case
awk 'BEGIN{print "Original File";i=1}
{print}
$3=="xxx"{$3="yyy"}
{rec[i++]=$0}
END{print "Modified File";for(i=1;i<=NR;i++)print rec[i]}'file
should solve that.
Explanation
$3 is the third space-delimited field in awk. If it matches "xxx", then it is replaced. The unmodified lines are printed first while the (possibly modified) lines are stored in an array. At the end, the stored lines are printed. BEGIN and END blocks are executed only at the beginning and the end respectively. NR is the awk built-in variable that holds the number of records processed so far; since it is used in the END block, it gives us the total number of records.
All good :-)
Ravinder has already provided you with the shortest awk solution possible.
In sed, the following would work:
sed -E 's/(([^ ]+ ){2})xxx/\1yyy/'
Or if your sed doesn't include -E, you can use the more painful BRE notation:
sed 's/\(\([^ ][^ ]* \)\{2\}\)xxx/\1yyy/'
And if you're in the mood to handle this in bash alone, something like this might work:
while read -r line; do
    read -r -a a <<<"$line"
    [[ "${a[2]}" == "xxx" ]] && a[2]="yyy"
    # note: this prints a trailing space at the end of each line
    printf '%s ' "${a[@]}"
    printf '\n'
done < input.txt

Print into one file the query result of a database file

I have two files:
abc
ghi
and the second (aka database file)
abc 123
def 456
ghi 789
and I want to query the database file so that, when its first column matches a line of the first file, that line is printed together with the second column
So my output would be
abc 123
ghi 789
logically, I understand what I have to do, but I lack the bash commands for it...
my attempt was to use join with the -1 option, but I do not understand how to implement it...
what's wrong with join?
$ cat 1
abc
ghi
$ cat 2
abc 123
def 456
ghi 789
$ join 1 2
abc 123
ghi 789
then if you want to store it somewhere just redirect the stdout.
join is a little overkill here (as it requires sorting) because file1 has just one column. Can you not use grep -f?
grep -Fwf file1 file2
-F treats the contents of file1 as fixed strings, not patterns
-w requires matches to be whole words
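One detail if you do reach for join: it expects both inputs to be sorted on the join field, so for unsorted files a common pattern is process substitution:
join <(sort file1) <(sort file2)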

How to make cat start a new line

I have four files:
one_file.txt
abc | def
two_file.txt
ghi | jkl
three_file.txt
mno | pqr
four_WORD.txt
xyz| xyz
I want to concatenate all of the files ending with "file.txt" (i.e. all except four_WORD.txt) in order to get:
abc | def
ghi | jkl
mno | pqr
To accomplish this, I run:
cat *file.txt > full_set.txt
However, full_set.txt comes out as:
abc | defmno | pqrghi | jkl
Any ideas how to do this correctly and efficiently so that each ends up on its own line? In reality, I need to do the above for a lot of very large files. Thank you in advance for your help.
Try:
awk 1 *file.txt > full_set.txt
This is less efficient than a bare cat, but it supplies the missing \n at the end of each file.
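The 1 here is a pattern that is always true with no action attached, so awk's default action (print the current record, terminated by a newline) runs for every line. The spelled-out equivalent is:
awk '{ print }' *file.txt > full_set.txt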
Many tools will add newlines if they are missing. Try e.g.
sed '' *file.txt > full_set.txt
but this depends on your sed version. Others to try include awk, grep -ho '.*' *file.txt, etc.
this works for me:
for file in *file.txt; do cat "$file"; echo; done > full_set.txt
I hope this will help you. (Note that the echo appends an extra blank line after any file that already ends with a newline; the next answer avoids that.)
You can loop over each file and check whether its last byte is a newline, outputting one if it isn't.
for file in *file.txt; do
    cat "$file"
    # command substitution strips trailing newlines, so the result is empty
    # exactly when the file already ends with one
    [[ $(tail -c 1 "$file") == "" ]] || echo
done > full_set.txt
You can use a one-line for loop for this. The following line:
for f in *_file.txt; do (cat "${f}") >> full_set.txt; done
Yields the desired output:
$ cat full_set.txt
abc | def
mno | pqr
ghi | jkl
Note, though, that a bare cat only yields separate lines when each file already ends with a newline; if not, add an echo after the cat as in the loops above.
find . -name "*file.txt" -print0 | xargs -0 cat > full_set.txt
(Using -print0 with xargs -0 keeps filenames with spaces intact; the same trailing-newline caveat applies.)

How can I add a new line to a large file every n characters in terminal (one liner sed)?

What am I missing here?
file.txt:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
in Terminal:
> sed "s/.\{3\}/&\n/g" < file.txt > new-file.txt
result: new-file.txt
ABCnDEFnGHInJKLnMNOnPQRnSTUnVWXnYZ
Expected Result:
ABC
DEF
...
VWX
YZ
Use sed (note that interpreting \n in the replacement is a GNU sed feature; BSD/macOS sed writes a literal n instead, which is exactly what happened in the question):
$ sed 's/.../&\n/g' file.txt
Or use grep:
$ grep -oE '.{1,3}' file.txt
result:
ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ
$ echo abcdefghi | dd cbs=3 conv=unblock 2>/dev/null
abc
def
ghi
Just with bash:
while read -n 3 chars; do printf "%s\n" "$chars"; done < file.txt > new-file.txt
An option, although maybe not quite correct depending on your input file, is fold (part of GNU coreutils, and also shipped with macOS). It wraps lines so that no line is more than w characters long, e.g.:
$ <<< 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' fold -w3
ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ
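Applied to the question's files, that would be something like:
fold -w3 file.txt > new-file.txt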
One way to do it is to explicitly hit the Enter key while typing the sed command, which embeds a literal newline in the replacement (the form BSD/macOS sed understands):
$ sed 's/.\{3\}/&\
/g' < file.txt > new-file.txt
$ cat new-file.txt
ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ
The following ended up working for me:
perl -0777 -pe 's/(.{3})/\1\n/sg' < file.txt > new-file.txt
Still not sure why the original didn't work.
Thanks for your help.
