I have four files:
one_file.txt
abc | def
two_file.txt
ghi | jkl
three_file.txt
mno | pqr
four_WORD.txt
xyz| xyz
I want to concatenate all of the files ending with "file.txt" (i.e. all except four_WORD.txt) in order to get:
abc | def
ghi | jkl
mno | pqr
To accomplish this, I run:
cat *file.txt > full_set.txt
However, full_set.txt comes out as:
abc | defmno | pqrghi | jkl
Any ideas how to do this correctly and efficiently so that each ends up on its own line? In reality, I need to do the above for a lot of very large files. Thank you in advance for your help.
Try:
awk 1 *file.txt > full_set.txt
This is less efficient than a bare cat, but it adds a trailing \n to any file that is missing one.
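The behaviour can be checked by reproducing the problem with files that lack a trailing newline. This is only a minimal sketch: the temporary directory and the function name join_files are made up for illustration.

```shell
# Hypothetical helper: concatenate files so that each one ends with a newline.
# awk prints every record followed by its output record separator (a newline
# by default), so a missing final newline is restored.
join_files() {
    awk 1 "$@"
}

# Reproduce the problem with files that have no trailing newline.
tmpdir=$(mktemp -d)
printf 'abc | def' > "$tmpdir/one_file.txt"
printf 'ghi | jkl' > "$tmpdir/two_file.txt"

cat "$tmpdir"/*file.txt; echo     # abc | defghi | jkl  (lines run together)
join_files "$tmpdir"/*file.txt    # each file's content on its own line

rm -rf "$tmpdir"
```

Files that already end with a newline pass through unchanged, so no blank lines are introduced.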
Many tools will add newlines if they are missing. Try e.g.
sed '' *file.txt >full_set.txt
but this depends on your sed version. Other options include Awk, grep -ho '.*' *file.txt, etc.
this works for me:
for file in *file.txt; do cat "$file"; echo; done > full_set.txt
I hope this will help you.
You can loop over each file and do a check to see if the last line ends in a new line, outputting one if it doesn't.
for file in *file.txt; do
cat "$file"
[[ $(tail -c 1 "$file") == "" ]] || echo
done > full_set.txt
You can use a one-line for loop for this. The following line:
for f in *_file.txt; do (cat "${f}") >> full_set.txt; done
Yields the desired output:
$ cat full_set.txt
abc | def
mno | pqr
ghi | jkl
Also, possible duplicate.
find . -name "*file.txt" | xargs cat > full_set.txt
In a textfile there are lots of dates and I want to grep or find all the dates before today.
Lines look like abc def ghi:2018-06-20 mno pqr, and others have no date. The dates are not in any order. I want to get all lines of the file that include a date before today (not sorted, just in the order they appear in the file).
What I know:
Today = date +%Y-%m-%d and how to save it in a variable $A
Get lines with this date grep $A file.txt
I know how to implement this in a for-loop to get maybe some days of a week. But how can I find all the dates before today? I think I do have to get a comparison like if $A > $B do grep $B file.txt.
Thank you for your help!
[Yes, I searched a lot but I did not find my solution anywhere.]
$ today="$(date "+%s")"
$ input="/tmp/file.txt"
$ cat "${input}"
abc def ghi:2018-06-25 mno pqr
abc def ghi:2018-06-24 mno pqr
abc def ghi:2018-06-23 mno pqr
abc def ghi:2018-06-22 mno pqr
abc def ghi:2018-06-21 mno pqr
abc def ghi:2018-06-20 mno pqr
def ghi:2018-06-20 mno pqr
abc ghi:2018-06-20mno pqr abc
abc def ghi:2017-06-20 mno pqr
abc def2018-06-20 mno pqr
abc def ghi:2018-06-19 mno pqr
def ghi:2018-06-21 mno pqr
abc ghi:2018-07-20 mno pqr
abc def ghi:2018-06-20 mno pqr
abc def2018-05-20 mno pqr
1sss018-05-20 mno pqr
1sss05-20-2018 mno pqr
$ sed -n 's/.*\([[:digit:]]\{4\}-[[:digit:]]\{2\}-[[:digit:]]\{2\}\).*/\1/p' "${input}" \
| sort -u \
| xargs -n1 date -j -f '%Y-%m-%d' '+%s' \
| xargs -n1 -I% awk 'BEGIN{if(%<'${today}'){print %}}' \
| xargs -n1 date -j -f '%s' '+%Y-%m-%d' \
| xargs -n1 -I% grep % $input \
| sort -u
abc def ghi:2017-06-20 mno pqr
abc def ghi:2018-06-19 mno pqr
abc def ghi:2018-06-20 mno pqr
abc def ghi:2018-06-21 mno pqr
abc def ghi:2018-06-22 mno pqr
abc def2018-05-20 mno pqr
abc def2018-06-20 mno pqr
abc ghi:2018-06-20mno pqr abc
def ghi:2018-06-20 mno pqr
def ghi:2018-06-21 mno pqr
$today is the current time in seconds since the epoch, and $input is the file you want to parse. sed extracts the dates (without verifying that they are real dates; for instance, 0000-99-99 would match), the first sort eliminates duplicate dates, the first xargs/date converts each date into seconds since the epoch (date -j -f is the BSD/macOS form), xargs/awk outputs only the dates before today, the next xargs/date converts each date back to "%Y-%m-%d", xargs/grep finds the lines containing those dates in the input file, and the last sort eliminates any duplicated lines.
Cool. Now iterate over the dates (for example from today to 6 days ago) and grep the file for each date:
# iterate over i = 0, 1, 2, 3, ..., 6
for i in $(seq 0 6); do
# so subtract $i days from today, for example `date --date="-5 days" +%Y-%m-%d`
A=$(date --date="-$i days" +%Y-%m-%d)
grep "$A" file.txt
# or shorter grep "$(date --date="-$i days" +%Y-%m-%d)" file.txt
done
You can also create one big grep argument and this should work faster:
grep "$(for i in $(seq 0 6); do echo -n "$(date --date="-$i days" +%Y-%m-%d)\|"; done | sed 's/\\|$//')" file.txt
For each date from today back to 6 days ago, I generate a string that looks like %Y-%m-%d\|, and then remove the last \| with sed 's/\\|$//'. The resulting command looks like grep "2018-06-23\|2018-06-22\|2018-06-21\|<and so on...>" file.txt. In grep, \| means "or".
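An alternative that avoids building and trimming the \| string is to hand grep the dates as a list of fixed-string patterns. The sketch below assumes GNU date (for --date, as in the loop above) and a grep that accepts -f - to read patterns from standard input; the function names recent_dates and grep_recent are made up for illustration.

```shell
# Print the dates for today and the previous six days, one per line.
# Assumes GNU date (for --date).
recent_dates() {
    for i in $(seq 0 6); do
        date --date="-$i days" +%Y-%m-%d
    done
}

# Hypothetical wrapper: print the lines of the given file that contain any of
# those dates. -F treats each pattern as a fixed string; -f - reads the
# pattern list from standard input.
grep_recent() {
    recent_dates | grep -F -f - "$1"
}

# Example call:
# grep_recent file.txt
```

Like the loop, this only covers a fixed window of days; it does not find every date before today.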
awk is a very powerful scripting tool that can do the job without resorting to multiple processes and pipes.
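#!/usr/bin/awk -f
# systime() and mktime() are gawk extensions.
BEGIN {
    today = systime()
}
/:[0-9]{4}-[0-9]{2}-[0-9]{2} / {
    for (field = 1; field <= NF; field++) {
        # only fields of the form prefix:YYYY-MM-DD are candidates
        if (split($field, b, ":") > 1) {
            gsub(/-/, " ", b[2])
            ts = mktime(b[2] " 0 0 0")
            if (ts > 0 && ts < today) {
                print $0
                break   # print each qualifying line only once
            }
        }
    }
}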
The BEGIN block simply sets the variable today to the current system time.
/:[0-9]{4}-[0-9]{2}-[0-9]{2} / will only process lines that contain date like strings preceded by a colon :
The for loop iterates on all the fields in a line to search for this date like string.
The next couple of lines split the field into an array to get the date string and replace all dashes - with spaces.
Running mktime() on each of these date-like strings and comparing the result against today tells us whether the line qualifies.
Finally printing the entire line when it qualifies.
Assuming you know what column you're looking for the date in, you can also do this:
awk '$2 < "2020-09-16"' input.txt
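The one-liner above works because YYYY-MM-DD dates sort correctly as plain strings. When the column is not known in advance, the same string comparison can be applied to the first date-shaped substring on each line. This is a sketch under that assumption; the function name lines_before and its arguments are made up for illustration, and like the sed approach it does not check that the match is a real date.

```shell
# Hypothetical helper: print lines whose first YYYY-MM-DD-shaped string sorts
# before the cutoff date. ISO-style dates compare correctly as plain strings.
# usage: lines_before CUTOFF FILE
lines_before() {
    awk -v cutoff="$1" '
        match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) &&
            substr($0, RSTART, RLENGTH) < cutoff
    ' "$2"
}

# Example call with today as the cutoff:
# lines_before "$(date +%Y-%m-%d)" file.txt
```

Lines are printed in their original order, and lines without a date are skipped because match() returns 0 for them.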
I am given a file. If a line has "xxx" as its third word then I need to replace it with "yyy". My final output must have all the original lines with the modified lines.
The input file is-
abc xyz mno
xxx xyz abc
abc xyz xxx
abc xxx xxx xxx
The required output file should be-
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
I have tried-
grep "\bxxx\b" file.txt | awk '{if ($3=="xxx") print $0;}' | sed -e 's/[^ ]*[^ ]/yyy/3'
but this gives the output as-
abc xyz yyy
abc xxx yyy xxx
The following simple awk command may help.
awk '$3=="xxx"{$3="yyy"} 1' Input_file
Output will be as follows.
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
Explanation: If the third field ($3) equals the string xxx, set it to yyy. awk works on a condition-then-action basis; the trailing 1 is a condition that is always true and has no action attached, so the default action runs and prints the current line (either unchanged or with the new third field).
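The same one-liner can be wrapped in a small function and tried on the sample input. This is only a sketch; the function name replace_third is made up for illustration.

```shell
# Hypothetical wrapper around the one-liner; reads the file named in $1.
replace_third() {
    # Rule 1: when field 3 is exactly "xxx", overwrite it with "yyy".
    # Rule 2: the bare 1 is an always-true condition, so every line is printed.
    awk '$3 == "xxx" { $3 = "yyy" } 1' "$1"
}

# Example call:
# replace_third Input_file
```

One side effect to be aware of: assigning to a field makes awk rebuild the line with single spaces between fields, so runs of spaces or tabs on modified lines are collapsed.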
sed solution:
sed -E 's/^(([^[:space:]]+[[:space:]]+){2})xxx\>/\1yyy/' file
The output:
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
To modify the file inplace add -i option: sed -Ei ....
In general the awk command may look like
awk '{command set 1}condition{command set 2}' file
The command set 1 would be executed for every line while command set 2 will be executed if the condition preceding that is true.
My final output must have all the original lines with the modified
lines
In your case
awk 'BEGIN{print "Original File";i=1}
{print}
$3=="xxx"{$3="yyy"}
{rec[i++]=$0}
END{print "Modified File";for(i=1;i<=NR;i++)print rec[i]}' file
should solve that.
Explanation
$3 is the third whitespace-delimited field in awk. If it matches "xxx", it is replaced. The unmodified lines are printed first, while the modified lines are stored in an array. At the end, the modified lines are printed. The BEGIN and END blocks are executed only at the beginning and the end, respectively. NR is the awk built-in variable that holds the number of records processed so far. Since it is used in the END block, it gives the total number of records.
All good :-)
Ravinder has already provided you with the shortest awk solution possible.
In sed, the following would work:
sed -E 's/^(([^ ]+ ){2})xxx/\1yyy/' input.txt
Or if your sed doesn't include -E, you can use the more painful BRE notation:
sed 's/^\(\([^ ][^ ]* \)\{2\}\)xxx/\1yyy/' input.txt
And if you're in the mood to handle this in bash alone, something like this might work:
while read -r line; do
read -r -a a <<<"$line"
[[ "${a[2]}" == "xxx" ]] && a[2]="yyy"
printf '%s\n' "${a[*]}"
done < input.txt
Here are two files where I need to eliminate the data that they do not have in common:
a.txt:
hello world
tom tom
super hero
b.txt:
hello dolly 1
tom sawyer 2
miss sunshine 3
super man 4
I tried:
grep -f a.txt b.txt >> c.txt
And this:
awk '{print $1}' test1.txt
because I need to check only if the first word of the line exists in the two files (even if not at the same line number).
But then what is the best way to get the following output in the new file?
output in c.txt:
hello dolly 1
tom sawyer 2
super man 4
Use awk where you iterate over both files:
$ awk 'NR == FNR { a[$1] = 1; next } a[$1]' a.txt b.txt
hello dolly 1
tom sawyer 2
super man 4
NR == FNR is only true for the first file making { a[$1] = 1; next } only run on said file.
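The two-file idiom can be wrapped in a function to make the role of each file explicit. This is a minimal sketch; the function name filter_by_first_word and the array name seen are made up for illustration.

```shell
# Hypothetical helper: print lines of DATA_FILE whose first word also appears
# as the first word of some line in KEYS_FILE.
# usage: filter_by_first_word KEYS_FILE DATA_FILE
filter_by_first_word() {
    awk '
        NR == FNR { seen[$1] = 1; next }   # first file: remember each first word
        $1 in seen                          # second file: print if the word was seen
    ' "$1" "$2"
}

# Example call:
# filter_by_first_word a.txt b.txt > c.txt
```

One caveat of the NR == FNR test: if the first file is empty, the condition is also true for every line of the second file, so nothing is printed.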
Use sed to generate a sed script from the input, then use another sed to execute it.
sed 's=^=/^=;s= .*= /p=' a.txt | sed -nf- b.txt
The first sed turns your a.txt into
/^hello /p
/^tom /p
/^super /p
which prints (p) whenever a line contains hello, tom, or super at the beginning of line (^) followed by a space.
This combines grep, cut and sed with process substitution:
$ grep -f <(cut -d ' ' -f 1 a.txt | sed 's/^/^/;s/$/ /') b.txt
hello dolly 1
tom sawyer 2
super man 4
The output of the process substitution is this (piping to cat -A to show spaces):
$ cut -d ' ' -f 1 a.txt | sed 's/^/^/;s/$/ /' | cat -A
^hello $
^tom $
^super $
We then use this as input for grep -f, resulting in the above.
If your shell doesn't support process substitution, but your grep supports reading from stdin with the -f option (it should), you can use this instead:
$ cut -d ' ' -f 1 a.txt | sed 's/^/^/;s/$/ /' | grep -f - b.txt
hello dolly 1
tom sawyer 2
super man 4
I have a file, say xyz, containing records like:
$cat xyz
ABC
ABCABC
ABCABCABC
I want to cut first pattern so result should be like:
AC
ACABC
ACABCABC
I am trying to cut pattern using awk like:
$ cat xyz|awk -F 'B' '{print $1,$2}'
A CA
A CA
A CA
Of course, B is the delimiter, so I am getting the above result. How could I do that?
Thanks
I understand you want to delete first B in each line. If so, this will work:
sed 's/B//' xyz
Output:
AC
ACABC
ACABCABC
If you want the file to be modified in place, add -i:
sed -i 's/B//' xyz
I see you tried to edit my answer to add a new question. Note that you should do that by updating your question or by writing in the comments.
Thanks. I have one more case: I want to delete the first pattern
only if the line has more than one repeated pattern, like:
$cat xyz
ABC
ABCABC
ABCABCABC
Output should be:
ABC
ACABC
ACABCABC
This can be a way to do it:
while IFS= read -r line
do
    # count the occurrences of B on this line
    if [ "$(printf '%s\n' "$line" | grep -o "B" | wc -l)" -ge 2 ]
    then
        printf '%s\n' "$line" | sed 's/B//'
    else
        printf '%s\n' "$line"
    fi
done < xyz
Output:
ABC
ACABC
ACABCABC
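The same result can probably be had without a shell loop by giving sed an address, so that the substitution only runs on lines containing at least two occurrences of B. This is a sketch, and the function name drop_first_b_if_repeated is made up for illustration.

```shell
# Delete the first B only on lines that contain at least two of them.
# The address /B.*B/ selects those lines; s/B// then removes the first B.
drop_first_b_if_repeated() {
    sed '/B.*B/s/B//' "$1"
}

# Example call:
# drop_first_b_if_repeated xyz
```

This starts a single sed process for the whole file instead of several processes per line, which matters for large inputs.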
What am I missing here?
file.txt:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
in Terminal:
> sed "s/.\{3\}/&\n/g" < file.txt > new-file.txt
result: new-file.txt
ABCnDEFnGHInJKLnMNOnPQRnSTUnVWXnYZ
Expected Result:
ABC
DEF
...
VWX
YZ
Use sed:
$ sed 's/.../&\n/g' file.txt
Or use grep:
$ grep -oE '.{1,3}' file.txt
result:
ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ
$ echo abcdefghi | dd cbs=3 conv=unblock 2>/dev/null
abc
def
ghi
Just with bash:
while IFS= read -r -n 3 chars; do printf "%s\n" "$chars"; done < file.txt > new-file.txt
Another option, although maybe not quite correct depending on your input file, is fold from GNU coreutils. This wraps lines so that no line is more than w characters long, e.g.:
$ <<< 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' fold -w3
ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ
One way to do it is to explicitly hit the Enter key while typing the sed command:
$ sed 's/.\{3\}/&\
/g' < file.txt > new-file.txt
$ cat new-file.txt
ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ
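The n characters in the original result suggest that the sed in use does not treat \n in the replacement as a newline; POSIX leaves that case unspecified, while GNU sed accepts it as an extension. A backslash followed by a literal newline is the portable form, and it can be scripted without pressing Enter mid-command by keeping the newline in a variable. This is a sketch; the function name split_every_three and the variable name newline are made up for illustration.

```shell
# Hypothetical helper: insert a line break after every three characters.
# The replacement ends with a backslash followed by a literal newline, which
# is the portable way to emit a newline from sed's s command.
split_every_three() {
    newline='
'
    sed "s/.\{3\}/&\\${newline}/g" "$@"
}

# Example call:
# split_every_three file.txt > new-file.txt
```

A line whose length is an exact multiple of three gets a break after its last group too, so it is followed by an empty line.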
The following ended up working for me:
perl -0777 -pe 's/(.{3})/$1\n/sg' < file.txt > new-file.txt
Still not sure why the original didn't work.
Thanks for your help.