How can I delete the first (!) line of a text file if it's empty, using e.g. sed or other standard UNIX tools? I tried this command:
sed '/^$/d' < somefile
But this deletes every empty line, not just the first line of the file when it is empty. Can I give sed a condition on the line number?
With Levon's answer I built this small script based on awk:
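#!/bin/bash
# Read NUL-separated names so paths containing spaces survive intact.
find some_directory -name "*.csv" -print0 | while IFS= read -r -d '' FILE
do
echo "Processing ${FILE}"
awk '{if (NR==1 && NF==0) next};1' < "${FILE}" > "${FILE}.killfirstline" &&
mv "${FILE}.killfirstline" "${FILE}"
done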
The simplest thing in sed is:
sed '1{/^$/d}'
Note that this does not delete a line that contains only blanks; it deletes only a completely empty line (nothing but the newline). To also delete a first line made up of spaces:
sed '1{/^ *$/d}'
and to match any kind of whitespace (tabs included):
sed '1{/^[[:space:]]*$/d}'
Note that some versions of sed require a terminator inside the block, so you might need to add a semicolon, e.g. sed '1{/^$/d;}'
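To try the variants above on sample input, here is a minimal sketch. It assumes a POSIX-compatible sed, and the function name strip_first_blank is only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: drop line 1 only when it is empty or whitespace-only.
# With no file arguments it reads standard input.
strip_first_blank() {
    sed '1{/^[[:space:]]*$/d;}' "$@"
}

# Blank first line is removed; the later blank line is kept.
printf '\nfirst\n\nlast\n' | strip_first_blank
# Non-blank first line: the input passes through unchanged.
printf 'first\n\nlast\n' | strip_first_blank
```

The trailing semicolon inside the braces is kept so the same command should also work with sed versions that require a terminator.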
Using sed, try this:
sed -e '2,$b' -e '/^$/d' < somefile
or to make the change in place:
sed -i~ -e '2,$b' -e '/^$/d' somefile
If you don't have to do this in-place, you can use awk and redirect the output into a different file.
awk '{if (NR==1 && NF==0) next};1' somefile
This will print the contents of the file except if it's the first line (NR == 1) and it doesn't contain any data (NF == 0).
NR is the current line number; NF is the number of fields on the current line, separated by blanks/tabs.
E.g.,
$ cat -n data.txt
1
2 this is some text
3 and here
4 too
5
6 blank above
7 the end
$ awk '{if (NR==1 && NF==0) next};1' data.txt | cat -n
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
and
$ cat -n data2.txt
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
$ awk '{if (NR==1 && NF==0) next};1' data2.txt | cat -n
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
Update:
This sed solution should also work for in-place replacement:
sed -i.bak '1{/^$/d}' somefile
The original file will be saved with a .bak extension
Delete the first line of every file under the current directory if that line is empty:
find -type f | xargs sed -i -e '2,$b' -e '/^$/d'
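File names containing spaces or newlines can break a plain find | xargs pipeline. A sketch of one alternative, assuming GNU sed (for -i without a backup suffix) and a find that supports -exec ... {} +; the function name and its directory argument are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: strip an empty first line from every regular file
# under a directory (default: the current directory).
# Assumes GNU sed, where -i processes each file separately, so the address
# "1" means line 1 of each file rather than line 1 of the combined input.
strip_first_blank_tree() {
    find "${1:-.}" -type f -exec sed -i -e '1{/^$/d;}' {} +
}
```

Call it as strip_first_blank_tree some_directory. Because find hands the names to sed directly, no word splitting happens along the way.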
This might work for you:
sed '1!b;/^$/d' file
I have multiple files like this:
BOB_1.brother_bob12.txt
BOB_2.brother_bob12.txt
..
BOB_35.brother_bob12.txt
How to join these files in order from {1..36} and append filename at the end of each row? I have tried:
for i in *.txt; do sed 's/$/ '"$i"'/' $i; done > outfile #joins but not in order
cat $(for((i=1;i<38;i++)); do echo -n "BOB_${i}.brother_bob12.txt "; done) # joins in order but no filename at the end
file sample:
1 345 378 1 3 4 5 C T
1 456 789 -1 2 3 4 A T
Do not do cat $(....). You may just:
for ((i=1;i<38;i++)); do
f="BOB_${i}.brother_bob12.txt"
sed "s/$/ $f/" "$f"
done
You may also do:
printf "%s\n" bob.txt BOB_{1..38}.brother_bob12.txt |
xargs -d'\n' -i sed 's/$/ {}/' '{}'
You may use:
for i in {1..36}; do
fn="BOB_${i}.brother_bob12.txt"
[[ -f $fn ]] && awk -v OFS='\t' '{print $0, FILENAME}' "$fn"
done > output
Note that it will insert FILENAME as the last field in every record. If this is not what you want then show your expected output in question.
This might work for you (GNU sed):
sed -n 'p;F' BOB_{1..36}.brother_bob12.txt | sed 'N;s/\n/ /' >newFile
This uses two invocations of sed: the first prints the file name on its own line after each line of each file, and the second replaces the newline between each pair of lines with a space.
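The loop-based answers can be wrapped up as follows. This is only a sketch: the function name join_with_names is made up, and the file name pattern is taken from the question.

```shell
#!/bin/sh
# Hypothetical helper: print every line of BOB_<i>.brother_bob12.txt for
# i = $1 .. $2 in numeric order, with the file name appended as a last field.
join_with_names() {
    i=$1
    while [ "$i" -le "$2" ]; do
        fn="BOB_${i}.brother_bob12.txt"
        # Skip gaps in the numbering instead of failing.
        if [ -f "$fn" ]; then
            awk '{ print $0, FILENAME }' "$fn"
        fi
        i=$((i + 1))
    done
}
```

For example, join_with_names 1 36 > outfile. Counting with a shell variable gives numeric order, whereas a glob such as *.txt sorts BOB_10 before BOB_2.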
I have a list of numbers in a file
cat to_delete.txt
2
3
6
9
11
and many txt files in one folder. Each file has tab delimited lines (can be more lines than this).
3 0.55667 0.66778 0.54321 0.12345
6 0.99999 0.44444 0.55555 0.66666
7 0.33333 0.34567 0.56789 0.34543
I want to remove the lines whose first number ($1 in awk) appears in to_delete.txt and keep only the lines whose first number does not. The result should replace the old file.
Expected output
7 0.33333 0.34567 0.56789 0.34543
This is what I got so far, which doesn't remove anything;
for file in *.txt; do awk '$1 != /2|3|6|9|11/' "$file" > "$tmp" && mv "$tmp" "$file"; done
I've looked through so many similar questions here but still cannot make it work. I also tried grep -v -f to_delete.txt and sed -n -i '/$to_delete/!p'
Any help is appreciated. Thanks!
In awk:
$ awk 'NR==FNR{a[$1];next}!($1 in a)' delete file
Output:
7 0.33333 0.34567 0.56789 0.34543
Explained:
$ awk '
NR==FNR { # hash records in delete file to a hash
a[$1]
next
}
!($1 in a) # if $1 not found in record in files after the first, output
' delete files* # mind the file order
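The same two-file idiom as a small function, in case that is easier to reuse. A sketch only; drop_listed and the array name seen are illustrative names.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of the data files whose first field
# is NOT listed in the key file (first argument).
drop_listed() {
    keys=$1
    shift
    awk 'NR == FNR { seen[$1]; next }   # first file: remember each key
         !($1 in seen)                  # later files: print unlisted rows
        ' "$keys" "$@"
}
```

Usage: drop_listed to_delete.txt file.txt. One caveat of NR==FNR: if the key file is empty, the test is also true for the first data file, so nothing from that file is printed.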
My first idea was this:
printf "%s\n" *.txt | xargs -n1 sed -i "$(sed 's!.*!/^& /d!' to_delete.txt)"
printf "%s\n" *.txt - outputs the *.txt files each on separate lines
| xargs -n1 execute the following command for each line passing the line content as the input
sed -i - edit file in place
$( ... ) - command substitution
sed 's!.*!/^& /d!' to_delete.txt - for each line in to_delete.txt, prefix the line with /^ and append a space and /d. That way, from the list of numbers I get a list of regexes to delete, like:
/^2 /d
/^3 /d
/^6 /d
and so on. Which tells sed to delete lines matching the regex - line starting with the number followed by a space.
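Since the question says the data files are tab-delimited, a regex ending in a literal space would not match them. A hedged variant of the same idea that accepts a space or a tab after the number; both function names are made up for this sketch:

```shell
#!/bin/sh
# Hypothetical helper: turn a list of numbers (one per line) into sed delete
# commands matching the number at line start followed by any whitespace.
make_delete_script() {
    sed 's!.*!/^&[[:space:]]/d!' "$1"
}

# Hypothetical helper: print the data file ($2) without the rows listed in $1.
drop_listed_sed() {
    sed "$(make_delete_script "$1")" "$2"
}
```

Usage: drop_listed_sed to_delete.txt file.txt. A line consisting of the number alone, with nothing after it, is not matched by this pattern.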
But I think awk would be simpler. You could do:
awk '$1 != 2 && $1 != 3 && $1 != 6 ... and so on ...'
but that would be longish, unreadable. It's easier to read the map from the file and then check if the number is in the array:
awk 'FNR==NR{ map[$1] } FNR!=NR && !($1 in map)' to_delete.txt "$file"
The FNR==NR is true only for the first file. So when we read it, we set the map[$1] (we "set" it, just so such element exists). Then FNR!=NR is true for the second file, for which we check if the first element is the key in the map. If it is not, the expression is true and the line gets printed out.
all together:
tmp=$(mktemp); for file in *.txt; do [ "$file" = to_delete.txt ] && continue; awk 'FNR==NR{ map[$1] } FNR!=NR && !($1 in map)' to_delete.txt "$file" > "$tmp" && mv "$tmp" "$file"; done
I am using Bash to find the dimensions of a matrix. Here is my code to get the number of elements in one row, however it prints out for the whole file. I just need the number of elements in ONE ROW.
grep -oP "\^I" $1 | wc -l
Here is what the $1 is referring to:
1^I2^I3^I4$
5^I6^I7^I8$
For some reason, it is printing out 9 instead of 3.
Thanks in advance!
Use:
cat $1 | head -n 1 | sed 's/\^I/\n/g' | wc -l
I take only the first row using head, replace every column delimiter with a newline using sed, then pipe that to wc.
You can use sed before calling grep to isolate one specific line of your file:
sed -n '1p' file | grep -oP '\^I' | wc -l
# 1p prints only the 1st line; 2p would print the second line, etc.
on your input it gives:
using awk
$ awk -F'\\^I' 'NR==1{print NF-1}' $1
3
-F'\\^I' use ^I as field separator
NR==1 first line only
print NF-1 since the question is about counting number of ^I, need to print number of fields minus one
also, if $1 is argument being passed to shell script, use "$1" as good practice
and a guess, this is actual data OP is working with
$ cat ip.txt
1 2 3 4
5 6 7 8
$ cat -A ip.txt
1^I2^I3^I4$
5^I6^I7^I8$
$ # exit to avoid unnecessary processing of other lines
$ awk -F'\t' 'NR==1{print NF-1; exit}' ip.txt
3
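If the goal is the number of elements in the row rather than the number of separators, NF itself is the answer. A small sketch, assuming the file really is tab-separated as the cat -A output suggests; the function name first_row_width is illustrative:

```shell
#!/bin/sh
# Hypothetical helper: number of tab-separated elements in the first row.
first_row_width() {
    # exit after line 1 so the rest of the file is never read
    awk -F'\t' 'NR == 1 { print NF; exit }' "$1"
}
```

For the 2x4 matrix shown above this prints 4, while NF-1 gives the 3 tab characters.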
sed 's:\^I:\n:g; q' | wc -l
s:\^I:\n:g change all ^I to \n
q quit after first line
I have a shell script variable var="7,8,9"
These are the line numbers I want to delete from a file using sed.
Here is what I tried:
sed -i "$var"'d' test_file.txt
But I got the error `sed: -e expression #1, char 4: unknown command: ,'
Is there any other way to remove the lines?
sed doesn't accept a comma-delimited list of line numbers; a comma separates the two ends of an address range, so 7,8,9d is a syntax error.
You can use this awk command, which uses a bit of Bash string manipulation to build a regex from the comma-separated line numbers:
awk -v var="^(${var//,/|})$" 'NR !~ var' test_file.txt
This will set awk variable var as this regex:
^(7|8|9)$
And then condition NR !~ var ensures that we print only those lines that don't match above regex.
For in-place editing, if you have GNU awk 4.1 or later, use:
awk -i inplace -v var="^(${var//,/|})$" 'NR !~ var' test_file.txt
Or for older awk use:
awk -v var="^(${var//,/|})$" 'NR !~ var' test_file.txt > $$.tmp && mv $$.tmp test_file.txt
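As an alternative to building a regex, the list can be split inside awk and used as a lookup table. This is a different technique from the one above, sketched here under the assumption that the list contains plain line numbers only; delete_lines is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: print a file ($2) without the line numbers given
# as a comma-separated list ($1), e.g. "7,8,9".
delete_lines() {
    awk -v list="$1" '
        BEGIN {
            n = split(list, parts, ",")
            for (i = 1; i <= n; i++) drop[parts[i]]   # create one key per number
        }
        !(NR in drop)                                 # print lines not listed
    ' "$2"
}
```

Usage: delete_lines "$var" test_file.txt > $$.tmp && mv $$.tmp test_file.txt. This needs no Bash-specific expansion, so it also runs under plain sh.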
I like sed, you were close to it. You just need to split each line number into a separate command. How about this:
sed -e "$(echo 1,3,4 | tr ',' '\n' | while read N; do printf '%dd;' $N; done)"
Do it like this:
sed -i "`echo $var|sed 's/,/d;/g'`d;" file
Another option to consider would be ed, with printf '%s\n' to put commands onto separate lines:
lines=( 9 8 7 )
printf '%s\n' "${lines[@]/%/d}" w | ed -s file
The array lines contains the line numbers to be deleted; it's important to put these in descending order, so that earlier deletions do not shift the later line numbers! The expansion ${lines[@]/%/d} appends a d (delete) command to each line number, and w writes the file at the end. You can change this to ,p instead to check the output before overwriting your file.
As an aside, for this example, you could also just use 7,9 as a single entry in the array.
I have a number of files with similar names, like
DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out
DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out
DWH_Export_AUSTA_20120801_20120831_v1_1.csv.397.dat.2012-10-02 04-04-16.out
etc.
I need to get the number before .csv (1 or 2) from the file name and append it to the end of every line in the file, separated by a TAB.
I have written this code. It finds the number I need, but I do not know how to put this number into the file. There is also a space in the file name, and my script breaks because of it.
I am also not sure how to pass a list of files to the script. For now I am working with only one file.
My code:
#!/bin/sh
string="DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out"
out=$(echo $string | awk 'BEGIN {FS="_"};{print substr ($7,0,1)}')
awk ' { print $0"\t$out" } ' $string
for file in *
do
sfx=$(echo "$file" | sed 's/.*_\(.*\).csv.*/\1/')
sed -i "s/$/\t$sfx/" "$file"
done
Using sed:
$ sed 's/.*_\(.*\).csv.*/&\t\1/' file
DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out 1
DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out 2
DWH_Export_AUSTA_20120801_20120831_v1_1.csv.397.dat.2012-10-02 04-04-16.out 1
To make this for many files:
sed 's/.*_\(.*\).csv.*/&\t\1/' file1 file2 file3
OR
sed 's/.*_\(.*\).csv.*/&\t\1/' file*
To save the change in the same file (if you have GNU sed):
sed -i 's/.*\(.\).csv.*/&\t\1/' file
Untested, but this should do what you want (extract the number before .csv and append that number to the end of every line in the .out file)
awk 'FNR==1 { split(FILENAME, field, /[_.]/) }
{ print $0"\t"field[7] > (FILENAME "_aaaa") }' *.out
for file in *_aaaa; do mv "$file" "${file%_aaaa}"; done
If I understood correctly, you want to append the number from the filename to every line in that file - this should do it:
#!/bin/bash
while [[ 0 < $# ]]; do
num=$(echo "$1" | sed -r 's/.*_([0-9]+)\.csv.*/\1/' )
#awk -e "{ print \$0\"\t${num}\"; }" < "$1" > "$1.new"
#sed -r "s/$/\t$num/" < "$1" > "$1.new"
#sed -ri "s/$/\t$num/" "$1"
shift
done
Run the script and give it names of the files you want to process. $# is the number of command line arguments for the script which is decremented at the end of the loop by shift, which drops the first argument, and shifts the other ones. Extract the number from the filename and pick one of the three commented lines to do the appending: awk gives you more flexibility, first sed creates new files, second sed processes them in-place (in case you are running GNU sed, that is).
Instead of awk, you may want to go with sed or coreutils.
Grab number from filename, with grep for variety:
num=$(<<<filename grep -Eo '[^_]+\.csv' | cut -d. -f1)
<<<filename feeds the string to grep on standard input, like echo filename | grep ... does.
With sed
Append num to each line with GNU sed:
sed "s/\$/\t$num/" filename
Use the -i switch to modify filename in-place.
With paste
You also need to know the length of the file for this method:
len=$(<filename wc -l)
Combine filename and num with paste:
paste filename <(seq $len | while read; do echo $num; done)
Complete example
for filename in DWH_Export*; do
num=$(echo "$filename" | grep -Eo '[^_]+\.csv' | cut -d. -f1)
sed -i "s/\$/\t$num/" "$filename"
done
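The number can also be pulled out with shell parameter expansion alone, and awk can do the appending without relying on GNU sed's \t. A sketch under the assumption that file names follow the pattern from the question; csv_suffix and append_suffix are made-up names.

```shell
#!/bin/sh
# Hypothetical helper: the text between the last "_" and ".csv" in a name.
csv_suffix() {
    base=${1%%.csv*}              # drop ".csv" and everything after it
    printf '%s\n' "${base##*_}"   # keep what follows the last "_"
}

# Hypothetical helper: print a file with that number appended to every
# line after a tab. The quoting keeps names with spaces intact.
append_suffix() {
    awk -v num="$(csv_suffix "$1")" '{ print $0 "\t" num }' "$1"
}
```

Usage for many files: for f in DWH_Export*; do append_suffix "$f" > "$f.new" && mv "$f.new" "$f"; done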