How can I add a new line to a large file every n characters in terminal (one liner sed)? - bash

What am I missing here?
file.txt:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
in Terminal:
> sed "s/.\{3\}/&\n/g" < file.txt > new-file.txt
result: new-file.txt
ABCnDEFnGHInJKLnMNOnPQRnSTUnVWXnYZ
Expected Result:
ABC
DEF
...
VWX
YZ

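Use sed (this form needs GNU sed; BSD/macOS sed does not interpret \n in the replacement and prints a literal n, as in the question):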
$ sed 's/.../&\n/g' file.txt
Or use grep:
$ grep -oE '.{1,3}' file.txt
result:
ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ

$ echo abcdefghi | dd cbs=3 conv=unblock 2>/dev/null
abc
def
ghi

Just with bash:
while read -n 3 chars; do printf "%s\n" "$chars"; done < file.txt > new-file.txt

Another option, although it may not be quite right depending on your input file, is fold from GNU coreutils. It wraps lines so that no line is more than w characters long, e.g.:
$ <<< 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' fold -w3
ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ

One way to do it is to explicitly hit the Enter key while typing the sed command:
$ sed 's/.\{3\}/&\
/g' < file.txt > new-file.txt
$ cat new-file.txt
ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ

The following ended up working for me:
perl -0777 -pe 's/(.{3})/\1\n/sg' < file.txt > new-file.txt
Still not sure why the original didn't work.
Thanks for your help.
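A likely explanation, assuming the original command ran under BSD/macOS sed: that sed does not treat \n in the replacement as a newline, which matches the literal n in the output shown in the question. A backslash followed by a real newline is the portable spelling. The sketch below wraps that in a helper; the name wrap_every is hypothetical and not from the thread.

```shell
# wrap_every N: read stdin and emit a newline after every N characters.
# Inside double quotes, "\\" becomes a single backslash, so sed sees
# "backslash + literal newline" in the replacement, which POSIX sed accepts.
wrap_every() {
    n=$1
    sed "s/.\{$n\}/&\\
/g"
}

# Same input as file.txt in the question
printf 'ABCDEFGHIJKLMNOPQRSTUVWXYZ\n' | wrap_every 3
```

If the line length is an exact multiple of N, this leaves one empty line at the end, just like the other sed answers here.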

Related

replace last word in specific line sed

I have file like
abc dog 1.0
abc cat 2.4
abc elephant 1.2
and I want to replace last word from a line which contains 'elephant' with string which I know.
The result should be
abc dog 1.0
abc cat 2.4
abc elephant mystring
I have sed '/.*elephant.*/s/%/%/' $file but what should be instead of '%'?
EDIT:
odd example
abc dogdogdogdog 1.0
abc cat 2.4
abc elephant 1.2
and now try to change last line.
EDIT: To preserve the spacing, try the following.
awk '
match($0,/elephant[^0-9]*/){
  val=substr($0,RSTART,RLENGTH-1)
  sub("elephant","",val)
  $NF=val "my_string"
  val=""
}
1
' Input_file
If you are OK with awk, try the following.
awk '/elephant/{$NF="my_string"} 1' Input_file
To save the output back into Input_file itself, try the following.
awk '/elephant/{$NF="my_string"} 1' Input_file > temp_file && mv temp_file Input_file
Basic version:
sed '/elephant/ s/[^[:blank:]]\{1,\}$/mystring/' "$file"
If there could be trailing blanks at the end of the line:
sed '/elephant/ s/[^[:blank:]]\{1,\}[[:blank:]]*$/mystring/' "$file"
An alternative that does the substitution and preserves the spacing:
awk '/elephant/{sub(".{"length($NF)"}$","new")}7' file
with your example:
kent$ cat f
abc dog 1.0
abc cat 2.4
abc elephant 1.2
kent$ awk '/elephant/{sub(".{"length($NF)"}$","new")}7' f
abc dog 1.0
abc cat 2.4
abc elephant new
Robustly in any awk:
$ awk '$2=="elephant"{sub(/[^[:space:]]+$/,""); $0=$0 "mystring"} 1' file
abc dog 1.0
abc cat 2.4
abc elephant mystring
Note that unlike the other answers you have so far it will not fail when the target string (elephant) is part of some other string or appears in some other location than the 2nd field or contains any regexp metachars, or when the replacement string contains &, etc.
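As a sketch of how the exact-match approach above could take the key and the replacement from shell variables: the helper name replace_last_field is hypothetical, and passing the values through the environment (ENVIRON) rather than awk -v is a choice made here so that backslashes and & in the replacement stay literal. It uses [^ \t] instead of [[:space:]] only to stay friendly to very old awks.

```shell
# replace_last_field KEY NEW: on lines whose 2nd field equals KEY exactly,
# replace the last non-blank word with NEW. Reads stdin, writes stdout.
replace_last_field() {
    key=$1 new=$2 awk '
        $2 == ENVIRON["key"] {
            sub(/[^ \t]+$/, "")        # drop the last word, keep the spacing before it
            $0 = $0 ENVIRON["new"]     # plain concatenation, so "&" is not special
        }
        1                              # print every line
    '
}

# Usage: replace_last_field elephant mystring < file
```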

How to find and merge some specific lines from one file B to another file A in linux with condition that lines in file B can be increase or decrease

File A:
abc
bcd
def
ghi
jkl
File B:
bcd
def
klm
Desired output:
abc
bcd
def
klm
ghi
jkl
Give this awk one-liner a try:
awk '!a[$0]++' fileA fileB > output
It works for your example files.
cat A B | sort -u will remove the repeated lines and sort the result. @Kent's answer is more elegant, but its output still doesn't match the order in your description.
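One possible reading of the desired output, which is an assumption and not stated in the question, is that each line found only in file B should be inserted right after the nearest preceding line of B that also exists in file A. A minimal awk sketch under that assumption follows; merge_after_anchor is a hypothetical name.

```shell
# merge_after_anchor A B: print A, inserting every B-only line after the
# last line that preceded it in B and also exists in A (its "anchor").
merge_after_anchor() {
    awk '
        pass == 1 { in_a[$0] = 1; next }              # pass 1: remember the lines of A
        pass == 2 {                                   # pass 2: attach B-only lines to an anchor
            if ($0 in in_a) anchor = $0
            else extra[anchor] = extra[anchor] $0 "\n"
            next
        }
        FNR == 1 { printf "%s", extra[""] }           # B-only lines that had no anchor go first
        { print; printf "%s", extra[$0]; extra[$0] = "" }   # pass 3: A plus the attached lines
    ' pass=1 "$1" pass=2 "$2" pass=3 "$1"
}

# Usage: merge_after_anchor fileA fileB > output
```

With the example files this puts klm after def, as in the desired output. How it should behave when lines are removed from B is left open, since the question does not specify it.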

How to get all lines including dates before today from a textfile in the macOS terminal?

In a textfile there are lots of dates and I want to grep or find all the dates before today.
Lines look like abc def ghi:2018-06-20 mno pqr; other lines have no date. The dates are not in any order. I want all lines of the file that contain a date before today (not sorted, just in the order they appear in the file).
What I know:
Today = date +%Y-%m-%d and how to save it in a variable $A
Get lines with this date grep $A file.txt
I know how to implement this in a for-loop to get maybe some days of a week. But how can I find all the dates before today? I think I do have to get a comparison like if $A > $B do grep $B file.txt.
Thank you for your help!
[Yes, I searched a lot but I did not find my solution anywhere.]
$ today="$(date "+%s")"
$ input="/tmp/file.txt"
$ cat "${input}"
abc def ghi:2018-06-25 mno pqr
abc def ghi:2018-06-24 mno pqr
abc def ghi:2018-06-23 mno pqr
abc def ghi:2018-06-22 mno pqr
abc def ghi:2018-06-21 mno pqr
abc def ghi:2018-06-20 mno pqr
def ghi:2018-06-20 mno pqr
abc ghi:2018-06-20mno pqr abc
abc def ghi:2017-06-20 mno pqr
abc def2018-06-20 mno pqr
abc def ghi:2018-06-19 mno pqr
def ghi:2018-06-21 mno pqr
abc ghi:2018-07-20 mno pqr
abc def ghi:2018-06-20 mno pqr
abc def2018-05-20 mno pqr
1sss018-05-20 mno pqr
1sss05-20-2018 mno pqr
$ sed -n 's/.*\([[:digit:]]\{4\}-[[:digit:]]\{2\}-[[:digit:]]\{2\}\).*/\1/p' "${input}" \
| sort -u \
| xargs -n1 date -j -f '%Y-%m-%d' '+%s' \
| xargs -n1 -I% awk 'BEGIN{if(%<'${today}'){print %}}' \
| xargs -n1 date -j -f '%s' '+%Y-%m-%d' \
| xargs -n1 -I% grep % $input \
| sort -u
abc def ghi:2017-06-20 mno pqr
abc def ghi:2018-06-19 mno pqr
abc def ghi:2018-06-20 mno pqr
abc def ghi:2018-06-21 mno pqr
abc def ghi:2018-06-22 mno pqr
abc def2018-05-20 mno pqr
abc def2018-06-20 mno pqr
abc ghi:2018-06-20mno pqr abc
def ghi:2018-06-20 mno pqr
def ghi:2018-06-21 mno pqr
$today is the current date in seconds since the epoch, and $input is the file you want to parse. sed hunts for dates (without verifying they are real dates; for instance, 0000-99-99 would match), the first sort eliminates duplicate input dates, the first xargs/date converts all the found dates into seconds since the epoch, xargs/awk outputs only the dates before today, the next xargs/date converts each date back to "%Y-%m-%d", xargs/grep finds all lines containing those dates in the input file, and the last sort eliminates any duplicated lines. Note that date -j -f is the BSD/macOS form of date.
Cool. Now iterate over the dates (for example from today to 6 days ago) and grep the file for each date. This uses GNU date's --date option; the BSD date shipped with macOS uses -v instead, e.g. date -v-5d +%Y-%m-%d.
# iterate over i = 0, 1, 2, 3, ..., 6
for i in $(seq 0 6); do
    # subtract $i days from today, for example `date --date="-5 days" +%Y-%m-%d`
    A=$(date --date="-$i days" +%Y-%m-%d)
    grep "$A" file.txt
    # or shorter: grep "$(date --date="-$i days" +%Y-%m-%d)" file.txt
done
You can also create one big grep argument, which should be faster:
grep "$(for i in $(seq 0 6); do echo -n "$(date --date="-$i days" +%Y-%m-%d)\|"; done | sed 's/\\|$//')" file.txt
For each date from today to 6 days ago I generate a string that looks like %Y-%m-%d\|, then I remove the last \| with sed 's/\\|$//'. The resulting grep looks like grep "2018-06-23\|2018-06-22\|2018-06-21\|<and so on...>" file.txt. The \| acts as "or" in grep expressions.
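The "append a separator, then strip the last one" step can also be done by joining lines with paste and switching to grep -E, where a plain | means "or". This is only a sketch: the helper name dates_to_pattern is hypothetical, and the dates are hard-coded here instead of coming from date.

```shell
# dates_to_pattern: join one date per line (stdin) into "d1|d2|...".
# paste -s puts the separator only between lines, so nothing has to be trimmed.
dates_to_pattern() {
    paste -s -d '|' -
}

# Fixed example dates standing in for the output of the date loop
pattern=$(printf '%s\n' 2018-06-23 2018-06-22 2018-06-21 | dates_to_pattern)
printf '%s\n' "$pattern"

# The result would then be used as: grep -E "$pattern" file.txt
```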
awk is a very powerful scripting tool that can do the job without resorting to multiple processes and pipes.
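#!/usr/bin/awk -f
# Needs gawk: systime() and mktime() are GNU extensions.
BEGIN {
    today = systime()
}
/:[0-9]{4}-[0-9]{2}-[0-9]{2} / {
    for (field = 1; field <= NF; field++) {
        # a field like ghi:2018-06-20 splits into b[1]="ghi", b[2]="2018-06-20"
        if (split($field, b, /:/) > 1) {
            gsub(/-/, " ", b[2])              # mktime() wants "YYYY MM DD HH MM SS"
            t = mktime(b[2] " 0 0 0")
            if (t > 0 && t < today) {
                print $0
                break                         # print each line at most once
            }
        }
    }
}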
The BEGIN block simply sets the variable today to the current system time.
/:[0-9]{4}-[0-9]{2}-[0-9]{2} / only processes lines that contain a date-like string preceded by a colon and followed by a space.
The for loop iterates over all the fields in a line to search for this date-like string.
The next couple of lines split the field on the colon to get the date string and replace all dashes - with spaces.
Running mktime() on each date-like string and comparing the result against today tells us whether the line qualifies.
Finally, the entire line is printed when it qualifies.
Assuming you know what column you're looking for the date in, you can also do this:
awk '$2 < "2020-09-16"' input.txt
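The string comparison above works because YYYY-MM-DD dates sort the same way as text and as dates. When the column is not known, the same idea can be applied to every date-like substring in a line. This is a minimal sketch that should also run with the stock macOS awk, since it avoids mktime() and interval expressions; lines_before is a hypothetical name.

```shell
# lines_before CUTOFF: print each stdin line that contains at least one
# YYYY-MM-DD string that sorts before CUTOFF (also given as YYYY-MM-DD).
lines_before() {
    awk -v cutoff="$1" '
        {
            rest = $0
            while (match(rest, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/)) {
                # plain string comparison is enough for ISO dates
                if (substr(rest, RSTART, RLENGTH) < cutoff) { print; break }
                rest = substr(rest, RSTART + RLENGTH)   # keep scanning after this match
            }
        }
    '
}

# Usage: lines_before "$(date +%Y-%m-%d)" < file.txt
```

Like the sed pipeline, this does not check that a match is a real calendar date, and the lines come out in file order.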

BASH: grep characters and replace by the same plus tab

Basically, the only thing I need is to replace two spaces by a tab; this is the query:
abc def ghi K00001 jkl
all the columns are separated by a tab; the K00001 jkl is separated by two spaces. But I want these two spaces to be replaced by a tab.
I cannot just grep for all double spaces, since other contents have two spaces and they should stay.
My approach would be to grep:
grep '[0-9][0-9][0-9][0-9][0-9] ' file
but I want to replace it to have the same K00001<TAB>jkl
How do I replace by the same string? Can I use variables to store the grep result and then print the modified (tab not spaces) by the same string?
sed -r "s/([A-Z][0-9]{5})  /\1\t/" File
or
sed -r "s/([A-Z][0-9]{5})\s{2}/\1\t/" File
(\1 puts back only the code; & would also put back the two spaces.)
Example :
AMD$ echo "abc def ghi K00001  jkl" | sed -r "s/([A-Z][0-9]{5})  /\1\t/"
abc def ghi K00001 jkl
You can use this sed:
sed -E $'s/([^[:blank:]]) {2}([^[:blank:]])/\\1\t\\2/g' file
Regex ([^[:blank:]]) {2}([^[:blank:]]) makes sure to match 2 spaces surrounded by 2 non-space characters. In replacement we put back surrounding characters using back-references \1 and \2
I would use awk, since with awk, whether the fields are separated by one, two, or more spaces, I can force the output to be tab-separated:
$ echo "abc def ghi K00001 jkl" |awk -v OFS="\t" '{$1=$1}1'
abc def ghi K00001 jkl
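The sed answers above rely on \t and -r, which GNU sed understands but BSD/macOS sed may not. A sketch that avoids both by letting printf produce the tab character and using basic regular expressions; two_spaces_to_tab is a hypothetical name, and the pattern assumes the code always looks like one capital letter followed by five digits, as in K00001.

```shell
# two_spaces_to_tab: replace the two spaces after a code such as K00001
# with a single tab; other double spaces are left untouched.
two_spaces_to_tab() {
    tab=$(printf '\t')                              # a literal tab character
    sed "s/\([A-Z][0-9]\{5\}\)  /\1$tab/"
}

# Usage: two_spaces_to_tab < File
```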

How to make cat start a new line

I have four files:
one_file.txt
abc | def
two_file.txt
ghi | jkl
three_file.txt
mno | pqr
four_WORD.txt
xyz| xyz
I want to concatenate all of the files ending with "file.txt" (i.e. all except four_WORD.txt) in order to get:
abc | def
ghi | jkl
mno | pqr
To accomplish this, I run:
cat *file.txt > full_set.txt
However, full_set.txt comes out as:
abc | defmno | pqrghi | jkl
Any ideas how to do this correctly and efficiently so that each ends up on its own line? In reality, I need to do the above for a lot of very large files. Thank you in advance for your help.
Try:
awk 1 *file.txt > full_set.txt
This is less efficient than a bare cat but will add an extra \n if missing at the end of each file
Many tools will add newlines if they are missing. Try e.g.
sed '' *file.txt >full_set.txt
but this depends on your sed version. Others to try include Awk, grep -ho '.*' *file.txt, etc.
This works for me (note that it always appends a newline, so files that already end in one get an extra blank line):
for file in *file.txt; do cat "$file"; echo; done > full_set.txt
I hope this will help you.
You can loop over each file and do a check to see if the last line ends in a new line, outputting one if it doesn't.
for file in *file.txt; do
cat "$file"
[[ $(tail -c 1 "$file") == "" ]] || echo
done > full_set.txt
You can use one line for loop for this. The following line:
for f in *_file.txt; do (cat "${f}") >> full_set.txt; done
Yields the desired output:
$ cat full_set.txt
abc | def
mno | pqr
ghi | jkl
Also, possible duplicate.
find . -name "*file.txt" | xargs cat > full_set.txt
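The check used in the loop answer above (append a newline only when the last byte is not one) can be written for plain POSIX sh and wrapped in a function. This is a sketch; cat_lines is a hypothetical name.

```shell
# cat_lines FILE...: concatenate files, adding a newline after any file
# that does not already end with one.
cat_lines() {
    for f in "$@"; do
        cat "$f"
        # $(tail -c 1 FILE) is empty when the last byte is a newline
        # (command substitution strips it) or when the file is empty.
        [ -z "$(tail -c 1 "$f")" ] || echo
    done
}

# Usage: cat_lines *file.txt > full_set.txt
```

Files are read in glob order, so one_file.txt, three_file.txt, two_file.txt for the example; pass the names explicitly if a different order is needed.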

Resources