How to merge multiple files in order and append filename at the end in bash

I have multiple files like this:
BOB_1.brother_bob12.txt
BOB_2.brother_bob12.txt
..
BOB_35.brother_bob12.txt
How can I join these files in order from {1..36} and append the filename at the end of each row? I have tried:
for i in *.txt; do sed 's/$/ '"$i"'/' "$i"; done > outfile # joins, but not in order
cat $(for((i=1;i<38;i++)); do echo -n "BOB_${i}.brother_bob12.txt "; done) # joins in order but no filename at the end
file sample:
1 345 378 1 3 4 5 C T
1 456 789 -1 2 3 4 A T

Do not use cat $(...). You may simply do:
for ((i=1;i<38;i++)); do
  f="BOB_${i}.brother_bob12.txt"
  sed "s/$/ $f/" "$f"
done
You may also do:
printf "%s\n" BOB_{1..38}.brother_bob12.txt |
xargs -d'\n' -I{} sed 's/$/ {}/' '{}'

You may use:
for i in {1..36}; do
  fn="BOB_${i}.brother_bob12.txt"
  [[ -f $fn ]] && awk -v OFS='\t' '{print $0, FILENAME}' "$fn"
done > output
Note that it will insert FILENAME as the last field in every record. If this is not what you want then show your expected output in question.
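As a quick sanity check of the FILENAME approach on two throwaway files (the contents are invented):

```shell
# Two small sample files matching the question's naming scheme
printf '1 345 378\n' > BOB_1.brother_bob12.txt
printf '1 456 789\n' > BOB_2.brother_bob12.txt

# Join in numeric order, appending each file's name as a tab-separated last field
for i in 1 2; do
  fn="BOB_${i}.brother_bob12.txt"
  [ -f "$fn" ] && awk -v OFS='\t' '{print $0, FILENAME}' "$fn"
done > output
```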

This might work for you (GNU sed):
sed -n 'p;F' BOB_{1..36}.brother_bob12.txt | sed 'N;s/\n/ /' >newFile
This uses two invocations of sed: the first appends the file name after each line of each file (the F command prints the current input file name on its own line); the second joins each line/filename pair by replacing the newline between them with a space.
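A small demonstration of the two-pass pipeline on invented sample files (F requires GNU sed):

```shell
printf 'a\nb\n' > f1.txt
printf 'c\n' > f2.txt

# Pass 1: p prints each line, then F prints the current file name on its own line.
# Pass 2: N;s/\n/ / joins each line/filename pair with a space.
sed -n 'p;F' f1.txt f2.txt | sed 'N;s/\n/ /' > newFile
```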

Estimate number of lines in a file and insert that value as first line

I have many files, and for each one I need to count the number of lines and add that value as the first line. To count the lines, I used something like this:
wc -l 000600.txt | awk '{ print $1 }'
However, I have had no success doing this for all files and then adding the corresponding value as the first line of each file.
An example:
a.txt b.txt c.txt
>> print a
15
>> print b
22
>> print c
56
Then 15, 22 and 56 should be added respectively to: a.txt b.txt and c.txt
I appreciate the help.
You can add a placeholder pattern (for example LINENUM) as the first line of the file and then use the following script:
wc -l a.txt | awk 'BEGIN {FS =" ";} { print $1;}' | xargs -I {} sed -i 's/LINENUM/LINENUM:{}/' a.txt
or simply use this script:
wc -l a.txt | awk 'BEGIN {FS =" ";} { print $1;}' | xargs -I {} sed -i '1s/^/LINENUM:{}\n/' a.txt
The following adds the line count as the first line of every *.txt file in the current directory. Using a group command here is also faster than the in-place editing commands for large files. Do not change the spaces or semicolons inside the grouping; they are required syntax.
for f in *.txt; do
  { wc -l < "$f"; cat "$f"; } > "${f}.tmp" && mv "${f}.tmp" "$f"
done
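For example, on a single throwaway file:

```shell
printf 'alpha\nbeta\n' > demo.txt
# Run wc -l and cat as a group so both outputs land in the temp file,
# then replace the original
{ wc -l < demo.txt; cat demo.txt; } > demo.txt.tmp && mv demo.txt.tmp demo.txt
```

demo.txt now begins with its former line count, 2.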
To iterate over all the files, you can use this script:
for f in *; do if [ -f "$f" ]; then wc -l "$f" | awk '{ print $1 }' | xargs -I {} sed -i '1s/^/LINENUM:{}\n/' "$f"; fi; done
This might work for you (GNU sed):
sed -i '1h;1!H;$!d;=;x' file1 file2 file3 etc ...
This slurps each file into memory, then prints the last line's line number (which is the file's line count) before printing the file itself.
Alternative:
sed -i ':a;$!{N;ba};=' file?
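A quick sanity check of the first command on a made-up three-line file (GNU sed):

```shell
printf 'x\ny\nz\n' > file1
# 1h;1!H accumulates every line in the hold space; $!d suppresses output
# until the last line; = prints the current (last) line number, i.e. the
# line count; x swaps the accumulated file back in for auto-printing.
sed -i '1h;1!H;$!d;=;x' file1
```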

sed: interpolating variables in timestamp format

I would like to use sed to extract all the lines between two specific strings from a file.
I need to do this on a script and my two strings are variables.
The strings will be in a sort of time stamp format, which means they can be something like:
2014/01/01 or 2014/01/01 08:01
I was trying with something like:
sed -n '/$1/,/$2/p' $file
or even
sed -n '/"$1"/,/"$2"/p' $file
with no luck. I also tried replacing the / delimiter with ;.
I'm pretty sure the problem is caused by the / and blank characters in the input variables, but I can't figure out the proper syntax.
The syntax to use alternate regex delimiters is:
\%regexp%
Match lines matching the regular expression regexp. The % may be replaced by any other single character.
https://www.gnu.org/software/sed/manual/sed.html#Addresses
So, pick one of
sed -n '\#'"$1"'#,\#'"$2"'#p' "$file"
sed -n "\\#$1#,\\#$2#p" "$file"
sed -n "$( printf '\#%s#,\#%s#p' "$1" "$2" )" "$file"
or awk
awk -v start="$1" -v end="$2" '$0 ~ start {p=1}; p; $0 ~ end {p=0}' "$file"
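A quick demonstration of the alternate-delimiter form with slash-containing timestamps (the log content is invented for the demo):

```shell
cat > log.txt <<'EOF'
2014/01/01 start
2014/01/02 middle
2014/01/03 end
2014/01/04 after
EOF

start='2014/01/01'
end='2014/01/03'
# \#regexp# makes # the address delimiter, so the slashes in the dates
# need no escaping
sed -n '\#'"$start"'#,\#'"$end"'#p' log.txt > got.txt
```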
From the first $1 to the last $2:
sed -n "\\#$1#,\$p" "$file" | tac | sed -n "\\#$2#,\$p" | tac
This prints from the first $1 to the end, reverses the lines, prints from the first $2 to the new end, and reverses the lines again.
An example: from the first "5" to the last "7"
$ set -- 5 7
$ seq 20 | sed -n "\\#$1#,\$p" | tac | sed -n "\\#$2#,\$p" | tac
5
6
7
8
9
10
11
12
13
14
15
16
17
Try using double quotes instead of single ones, so the shell expands the variables:
sed -n "/$1/,/$2/p" "$file"
Note that this still fails when the variables contain / characters, which is why the alternate-delimiter forms are needed here.

extract multiple lines of a file unix

I have a file A with 400,000 lines. I have another file B that has a bunch of line numbers.
File B:
-------
98
101
25012
10098
23489
I have to extract the line numbers specified in file B from file A. That is, I want to extract lines 98, 101, 25012, 10098, and 23489 from file A. How can I extract these lines in the following cases?
File B is an explicit file.
File B arrives from a pipe, e.g. grep -n pattern somefile.txt gives me file B.
I wanted to use sed -n 'x'p fileA. However, I don't know how to supply the 'x' from a file. Also, I don't know how to pipe the value of 'x' from a command.
sed can print the line numbers you want:
$ printf $'foo\nbar\nbaz\n' | sed -ne '2p'
bar
If you want multiple lines:
$ printf $'foo\nbar\nbaz\n' | sed -ne '2p;3p'
bar
baz
To transform a set of line numbers into a sed command like this, use sed for beautiful sedception:
$ printf $'98\n101' | sed -e 's/$/p;/'
98p;
101p;
Putting it all together:
sed -ne "$(sed -e 's/$/p;/' B)" A
Testing:
$ cat A
1
22
333
4444
$ cat B
1
3
$ sed -ne "$(sed -e 's/$/p;/' B)" A
1
333
QED.
awk fits this task better:
when fileA is a file:
awk 'NR==FNR{a[$0]=1;next}a[FNR]' fileB fileA
when fileA comes from a pipe:
cat fileA|awk 'NR==FNR{a[$0]=1;next}a[FNR]' fileB -
If you want fileB as a file or from a pipe, the same awk command applies:
awk '...' fileB fileA
and
cat fileB|awk '...' - fileA
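To see how the NR==FNR idiom works, here is a tiny made-up example:

```shell
# fileB lists the wanted line numbers; fileA holds the data
printf '2\n4\n' > fileB
printf 'one\ntwo\nthree\nfour\n' > fileA

# While reading fileB, NR==FNR holds: record each wanted number and skip.
# While reading fileA, a line prints only when its number FNR was recorded.
awk 'NR==FNR{a[$0]=1;next} a[FNR]' fileB fileA > got
```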

awk parse filename and add result to the end of each line

I have number of files which have similar names like
DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out
DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out
DWH_Export_AUSTA_20120801_20120831_v1_1.csv.397.dat.2012-10-02 04-04-16.out
etc.
I need to get the number before .csv (1 or 2) from the file name and append it to the end of every line in the file, separated by a TAB.
I have written this code; it finds the number I need, but I do not know how to put this number into the file. There is a space in the filename, and my script breaks because of it.
I am also not sure how to pass a list of files to the script. For now I am working with only one file.
My code:
#!/bin/sh
string="DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out"
out=$(echo $string | awk 'BEGIN {FS="_"};{print substr ($7,0,1)}')
awk ' { print $0"\t$out" } ' $string
for file in *; do
  sfx=$(echo "$file" | sed 's/.*_\(.*\).csv.*/\1/')
  sed -i "s/$/\t$sfx/" "$file"
done
Using sed:
$ sed 's/.*_\(.*\).csv.*/&\t\1/' file
DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out 1
DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out 2
DWH_Export_AUSTA_20120801_20120831_v1_1.csv.397.dat.2012-10-02 04-04-16.out 1
To make this for many files:
sed 's/.*_\(.*\).csv.*/&\t\1/' file1 file2 file3
OR
sed 's/.*_\(.*\).csv.*/&\t\1/' file*
To have the changes saved in the same file (if you have GNU sed):
sed -i 's/.*_\(.*\).csv.*/&\t\1/' file
Untested, but this should do what you want (extract the number before .csv and append that number to the end of every line in the .out file)
awk 'FNR==1 { split(FILENAME, field, /[_.]/) }
{ print $0"\t"field[7] > FILENAME"_aaaa" }' *.out
for file in *_aaaa; do mv "$file" "${file/_aaaa}"; done
If I understood correctly, you want to append the number from the filename to every line in that file - this should do it:
#!/bin/bash
while (( $# > 0 )); do
  num=$(echo "$1" | sed -r 's/.*_([0-9]+)\.csv.*/\t\1/' )
  #awk -e "{ print \$0\"\t${num}\"; }" < "$1" > "$1.new"
  #sed -r "s/$/\t$num/" < "$1" > "$1.new"
  #sed -ri "s/$/\t$num/" "$1"
  shift
done
Run the script and give it the names of the files you want to process. $# is the number of command-line arguments, which is decremented at the end of the loop by shift (it drops the first argument and shifts the others down). Extract the number from the filename, and pick one of the three commented lines to do the appending: awk gives you more flexibility, the first sed creates new files, and the second sed processes them in place (if you are running GNU sed, that is).
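A runnable sketch of the loop with the first (non-destructive) sed line enabled; the file name is shortened for the demo, the extraction keeps only the digits, and GNU sed is assumed:

```shell
# A sample file following the question's "_<num>.csv" naming pattern
printf 'r1\nr2\n' > 'Export_v1_2.csv.397.dat.out'

set -- 'Export_v1_2.csv.397.dat.out'
while [ "$#" -gt 0 ]; do
  # Pull the digits that precede ".csv" out of the file name
  num=$(echo "$1" | sed -r 's/.*_([0-9]+)\.csv.*/\1/')
  # Append a tab and the number to every line, writing a new file
  sed "s/\$/\t$num/" < "$1" > "$1.new"
  shift
done
```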
Instead of awk, you may want to go with sed or coreutils.
Grab the number from the filename, using grep for variety:
num=$(<<<"$filename" grep -Eo '[^_]+\.csv' | cut -d. -f1)
<<<"$filename" (a here-string) is equivalent to piping in echo "$filename".
With sed
Append num to each line with GNU sed:
sed "s/\$/\t$num/" "$filename"
Use the -i switch to modify filename in-place.
With paste
You also need to know the length of the file for this method:
len=$(<"$filename" wc -l)
Combine filename and num with paste:
paste "$filename" <(seq "$len" | while read -r _; do echo "$num"; done)
Complete example
for filename in DWH_Export*; do
  num=$(echo "$filename" | grep -Eo '[^_]+\.csv' | cut -d. -f1)
  sed -i "s/\$/\t$num/" "$filename"
done

Delete first line of file if it's empty

How can I delete the first (!) line of a text file if it's empty, using e.g. sed or other standard UNIX tools. I tried this command:
sed '/^$/d' < somefile
But this will delete the first empty line, not the first line of the file if it's empty. Can I give sed a condition concerning the line number?
With Levon's answer I built this small script based on awk:
#!/bin/bash
for FILE in $(find some_directory -name "*.csv")
do
  echo "Processing ${FILE}"
  awk '{if (NR==1 && NF==0) next};1' < "${FILE}" > "${FILE}.killfirstline"
  mv "${FILE}.killfirstline" "${FILE}"
done
The simplest thing in sed is:
sed '1{/^$/d}'
Note that this does not delete a line that contains all blanks, but only a line that contains nothing but a single newline. To get rid of blanks:
sed '1{/^ *$/d}'
and to eliminate all whitespace:
sed '1{/^[[:space:]]*$/d}'
Note that some versions of sed require a terminator inside the block, so you might need to add a semicolon, e.g. sed '1{/^$/d;}'
Using sed, try this:
sed -e '2,$b' -e '/^$/d' < somefile
or to make the change in place:
sed -i~ -e '2,$b' -e '/^$/d' somefile
If you don't have to do this in-place, you can use awk and redirect the output into a different file.
awk '{if (NR==1 && NF==0) next};1' somefile
This will print the contents of the file except if it's the first line (NR == 1) and it doesn't contain any data (NF == 0).
NR is the current line number; NF is the number of fields on the line, where fields are separated by blanks/tabs.
E.g.,
$ cat -n data.txt
1
2 this is some text
3 and here
4 too
5
6 blank above
7 the end
$ awk '{if (NR==1 && NF==0) next};1' data.txt | cat -n
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
and with a file whose first line is not blank, the output is unchanged:
$ cat -n data2.txt
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
$ awk '{if (NR==1 && NF==0) next};1' data2.txt | cat -n
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
Update:
This sed solution should also work for in-place replacement:
sed -i.bak '1{/^$/d}' somefile
The original file will be saved with a .bak extension.
To delete the first line of all files under the current directory if the first line is empty:
find . -type f | xargs sed -i -e '2,$b' -e '/^$/d'
This might work for you:
sed '1!b;/^$/d' file
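A quick check of this one-liner on invented input, with an empty first line and a later empty line that must survive:

```shell
printf '\nhello\n\nworld\n' > somefile
# 1!b: every line except the first branches (prints unchanged);
# line 1 is deleted only if it is empty
sed '1!b;/^$/d' somefile > out.txt
```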
