How to redirect tail of multiple files to a new file with newlines? - bash

I suspect this is an easy one. I have a directory of files and I need the last line from each file grouped into a new file.
I used:
tail -q myFile_seed*.csv > output.csv
But the output file is one long line. Is there a simple way to redirect with newlines so that each file is on its own line?

It appears that your files do not have the usual \r\n appended to the final line of the file. In this case, you'll need to handle each file separately, rather than have tail process them all at once.
for f in myfile_seed*.csv; do
tail -n 1 "$f"
printf "\n"
done > output.csv
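If you want to confirm whether a given file really lacks a trailing newline before picking an approach, a quick check along these lines should work (a minimal sketch; myFile_seed1.csv is just a placeholder name):
tail -c 1 myFile_seed1.csv | od -c    # shows \n as the last byte only if the file ends with a newline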

You can do:
tail -q -n 1 myFile_seed*.csv > output.csv

Another option would be a one-liner:
ls myFile_seed*.csv | xargs -ifile sh -c "tail -n 1 file; echo " > output.csv
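Parsing the output of ls can misbehave with unusual file names; a more robust variant of the same idea, sketched with a null-separated find pipeline (assumes GNU or BSD find and xargs), would be:
find . -maxdepth 1 -name 'myFile_seed*.csv' -print0 |
xargs -0 -n 1 sh -c 'tail -n 1 "$1"; echo' _ > output.csv
Note that find does not guarantee any particular file order.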

Related

Combine multiple files into one including the file name

I have been looking around trying to combine multiple text files into one, including the name of the file.
My current file content is:
1111,2222,3333,4444
What I'm after is:
File1,1111,2222,3333,4444
File1,1111,2222,3333,4445
File1,1111,2222,3333,4446
File1,1111,2222,3333,4447
File2,1111,2222,3333,114444
File2,1111,2222,3333,114445
File2,1111,2222,3333,114446
I found multiple examples of how to combine them all, but nothing that combines them while including the file name.
Could you please try the following, assuming that your input file names have the .csv extension:
awk 'BEGIN{OFS=","} {print FILENAME,$0}' *.csv > output_file
After seeing the OP's comments: if the file extensions are .txt, then try:
awk 'BEGIN{OFS=","} {print FILENAME,$0}' *.txt > output_file
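If the output should show the name without its extension, as in the desired File1,... lines above, a small variation of the same awk approach could strip it (a sketch using POSIX awk's sub()):
awk 'BEGIN{OFS=","} {name=FILENAME; sub(/\.[^.]*$/, "", name); print name, $0}' *.txt > output_file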
Assuming all your files have a .txt extension and contain only one line as in the example, you can use the following code:
for f in *.txt; do echo "$f,$(cat "$f")"; done > output.log
where output.log is the output file.
Well, it works:
printf "%s\n" *.txt |
xargs -n1 -d $'\n' bash -c 'xargs -n1 -d $'\''\n'\'' printf "%s,%s\n" "$1" <"$1"' --
First, output a newline-separated list of files.
Then, for each file, the outer xargs executes bash.
Inside bash, an inner xargs runs once per line of the file,
executing printf "%s,%s\n" <filename> <line> for each line of input.
Tested in repl.
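For comparison, a plain while-read loop produces the same filename,line pairs and may be easier to follow (a sketch, assuming the same *.txt files):
for f in *.txt; do
while IFS= read -r line; do printf '%s,%s\n' "$f" "$line"; done < "$f"
done > output.log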
Solved using grep "" *.txt -I > $filename.
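Note that grep "" *.txt separates the file name from the line with a colon; if a comma is needed, as in the desired output, the separator can be swapped afterwards (a sketch, assuming the file names themselves contain no colon):
grep "" *.txt | sed 's/:/,/' > output.log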

need to clean file via SED or GREP

I have these files
NotRequired.txt (having lines which need to be remove)
Need2CleanSED.txt (big file , need to clean)
Need2CleanGRP.txt (big file , need to clean)
content:
more NotRequired.txt
[abc-xyz_pqr-pe2_123]
[lon-abc-tkt_1202]
[wat-7600-1_414]
[indo-pak_isu-5_761]
I am reading the above file and want to remove its lines from Need2Clean???.txt; I am trying via sed and grep, but with no success.
myFile="NotRequired.txt"
while IFS= read -r HKline
do
sed -i '/$HKline/d' Need2CleanSED.txt
done < "$myFile"
myFile="NotRequired.txt"
while IFS= read -r HKline
do
grep -vE \"$HKline\" Need2CleanGRP.txt > Need2CleanGRP.txt
done < "$myFile"
It looks as if the variable and the [] characters are causing the problem.
What you're doing is extremely inefficient and error prone. Just do this:
grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt
Thanks to grep -F the above treats each line of NotRequired.txt as a string rather than a regexp so you don't have to worry about escaping RE metachars like [ and you don't need to wrap it in a shell loop - that one command will remove all undesirable lines in one execution of grep.
Never do command file > file btw as the shell might decide to execute the > file first and so empty file before command gets a chance to read it! Always do command file > tmp && mv tmp file instead.
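If moreutils is installed, its sponge command soaks up all of its input before writing, so the same filtering can be expressed without naming a temporary file (a sketch under that assumption):
grep -vF -f NotRequired.txt Need2CleanGRP.txt | sponge Need2CleanGRP.txt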
Your assumption is correct. The [...] construct looks for any characters in that set, so you have to preface ("escape") them with \. The easiest way is to do that in your original file:
sed -i -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}"
If you don't like that, you can filter the file as you feed it into the loop, using process substitution:
done < <(sed -e 's:\[:\\[:' -e 's:\]:\\]:' "$myFile")
Finally, you can use sed on each HKline variable:
HKline=$( echo $HKline | sed -e 's:\[:\\[:' -e 's:\]:\\]:' )
try gnu sed:
sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt| sed -Ef - Need2CleanSED.txt
Two sed processes are chained into one by a shell pipe.
NotRequired.txt is 'slurped' all at once by sed -z, which replaces each \n with | and escapes the [ and ] metacharacters as \[ and \]; the second sed process then uses that result as a regex script for the input file, i.e. Need2CleanSED.txt. Output of the first process:
/\[abc-xyz_pqr-pe2_123\]|\[lon-abc-tkt_1202\]|\[wat-7600-1_414\]|\[indo-pak_isu-5_761\]/d
Add the -u (unbuffered) option to avoid batch processing and get more direct I/O.

File will be modified after while loop reads file in Bash

I need to pass each line in a text file to a program. What I did is use a while loop to read each line of the file and then pass each line to the program. My script is:
tail -n +2 output.txt | while IFS=' ' read ln
do
line=(${ln})
prog "${line[@]}" > newout
grep "runtime\|opt" newout | sed -e 's/ $/\n/' > res.txt
done
I did not modify output.txt at all. However, its contents get changed and only the heading is left. Why is the script doing that?

How could I redirect file name into counts by tab using one line commands in bash?

I have some files in FASTA format and want to count their reads, and I would like the output to contain the file names with their corresponding counts.
input file names:
1.fa
2.fa
3.fa
...
I tried:
for i in $(ls -t -v *.fa); do grep -c '>' $i > echo $i >> out.txt ; done
Problem:
It gives me out.txt, but with doubled file names and the counts separated by ':'. However, I need a tab separator and unique file names.
1.fa:7323580
1.fa:7323580
2.fa:5591179
2.fa:5591179
...
Suggested solution
grep -c '>' *.fa | sed 's/:/'$'\t'/ > out.txt
The $'\t' is a Bash-ism called ANSI-C Quoting.
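An equivalent that avoids the quoting issue, if you prefer awk (a sketch, assuming none of the file names contain a colon):
grep -c '>' *.fa | awk -F: -v OFS='\t' '{print $1, $2}' > out.txt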
Analysis of what went wrong
Your code is:
for i in $(ls -t -v *.fa); do grep -c '>' $i > echo $i >> out.txt ; done
It isn't a good idea to parse the output of the ls command. However, if your file names are well behaved (roughly, in the portable filename character set, which is [-A-Za-z0-9._]), you'll be reasonably OK.
Your grep command, though, is confused. It is:
grep -c '>' $i > echo $i >> out.txt
That could be written more clearly as:
grep -c '>' $i $i > echo >> out.txt
This means 'count the number of lines containing > in $i, and then in $i again, and send the output first to a file echo, and then append to out.txt. Since the append overrides the redirection, the file echo is empty. You get the file name included in the output because there are two files to search; with only one file, you wouldn't get the file name too. (One way to ensure you get file names with regular (not -c or -l) grep is to scan /dev/null too. Many versions of grep also provide options to get the name explicitly, but POSIX doesn't mandate one. BSD grep uses -H; so does GNU grep.)
So, that's why you got the double file names and entries in your output.
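For reference, either of these forces the file name to appear even when only one file is searched (sketches; the -H form assumes GNU or BSD grep):
grep -c '>' "$i" /dev/null    # portable: /dev/null makes it two files
grep -c -H '>' "$i"           # GNU/BSD grep: -H always prints the file name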
Try this:
for i in $(ls -t -v *.fa)
do
c=$(grep -c '>' "$i")
printf '%s\t%s\n' "$i" "$c" >> out.txt
done

Unix: merge many files, while deleting first line of all files

I have >100 files that I need to merge, but for each file the first line has to be removed. What is the most efficient way to do this under Unix? I suspect it's probably a command using cat and sed '1d'. All files have the same extension and are in the same folder, so we probably could use *.extension to point to the files. Many thanks!
Assuming your filenames are sorted in the order you want your files appended, you can use:
ls *.extension | xargs -n 1 tail -n +2
EDIT: After Sorin and Gilles comments about the possible dangers of piping ls output, you could use:
find . -name "*.extension" | xargs -n 1 tail -n +2
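If the file names may contain spaces or other whitespace, a null-separated variant avoids the word-splitting problems (a sketch, assuming GNU or BSD find and xargs):
find . -name "*.extension" -print0 | xargs -0 -n 1 tail -n +2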
Everyone has to be complicated. This is really easy:
tail -q -n +2 file1 file2 file3
And so on. If you have a large number of files you can load them in to an array first:
list=(file1 file2 file3)
tail -q -n +2 "${list[#]}"
All the files with a given extension in the current directory?
list=(*.extension)
tail -q -n +2 "${list[#]}"
Or just
tail -q -n +2 *.extension
Just append each file after removing the first line.
#!/bin/bash
DEST=/tmp/out
FILES="file1 file2 file3"  # space-separated list of files
: > "$DEST"                # start with an empty output file
for FILE in $FILES
do
sed -e '1d' "$FILE" >> "$DEST"
done
tail outputs the last lines of a file. You can tell it how many lines to print, or how many lines to omit at the beginning (-n +N where N is the number of the first line to print, counting from 1 — so +2 omits one line). With GNU utilities (i.e. under Linux or Cygwin), FreeBSD or other systems that have the -q option:
tail -q -n +2 *.extension
By default, tail prints a header before each file; -q suppresses that, but it is not standard. If your implementation doesn't have it, or to be portable, you need to iterate over the files.
for x in *.extension; do tail -n +2 <"$x"; done
Alternatively, you can call Awk, which has a way to identify the first line of each file. This is likely to be faster if you have a lot of small files and slower if you have many large files.
awk 'FNR != 1' *.extension
ls -1 file*.txt | xargs nawk 'FNR!=1'
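The ls | xargs form breaks on file names containing spaces; a find-based equivalent avoids that (a sketch; -maxdepth assumes GNU or BSD find, and merged.txt is just a placeholder output name):
find . -maxdepth 1 -name 'file*.txt' -exec awk 'FNR != 1' {} + > merged.txt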
