How can I redirect fixed lines to a new file with shell - bash

I know we can use > to redirect output to a file, but I want to write a fixed number of lines to each file.
For example, if more something outputs 3210 lines, then I want
lines 1~1000 in file1
lines 1001~2000 in file2
lines 2001~3000 in file3
lines 3001~3210 in file4.
How can I do this with a shell script?
Thanks.

The split command is what you need.
split -l 1000 your_file.txt "prefix"
Where:
-l - split by lines.
1000 - the number of lines per output file.
your_file.txt - the file you want to split.
prefix - a prefix for the output files' names.
Example for a file of 3210 lines:
# Generate the file
$ seq 3210 > your_file.txt
# Split the file
$ split -l 1000 your_file.txt "prefix"
# Check the output files' names
$ ls prefix*
prefixaa prefixab prefixac prefixad
# Check all files' ending
$ tail prefixa*
==> prefixaa <==
991
992
993
994
995
996
997
998
999
1000
==> prefixab <==
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
==> prefixac <==
2991
2992
2993
2994
2995
2996
2997
2998
2999
3000
==> prefixad <==
3201
3202
3203
3204
3205
3206
3207
3208
3209
3210
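If you want the pieces named file1 through file4 as in the question, GNU split can number the suffixes directly. A sketch assuming GNU coreutils (these long options are GNU extensions; BSD split lacks them):

```shell
# Generate a 3210-line sample file
seq 3210 > your_file.txt
# --numeric-suffixes=1 starts counting at 1, -a 1 keeps the suffix
# one digit wide, so the chunks come out as file1..file4
split -l 1000 --numeric-suffixes=1 -a 1 your_file.txt file
# file1..file3 hold 1000 lines each; file4 holds the remaining 210
wc -l file1 file2 file3 file4
```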

Related

how to combine more than two file into one new file with specific name using bash

I have many files; the file names are:
p004c01.txt
p004c05.txt
p006c01.txt
p006c02.txt
p007c01.txt
p007c03.txt
p007c04.txt
...
$cat p004c01.txt
#header
122.5 -0.256 547
123.6 NaN 325
$cat p004c05.txt
#header
122.1 2.054 247
122.2 -1.112 105
$cat p006c01.txt
#header
99 -0.200 333
121.4 -1.206 243
$cat p006c02.txt
#header
122.5 2.200 987
99 -1.335 556
I want the files to look like this
file1
$cat p004.txt
122 -0.256 547
122 2.054 247
122 -1.112 105
file2
$cat p006.txt
122.5 2.200 987
121.4 -1.206 243
99 -1.335 556
99 -0.200 333
And the same for the other files:
files that share the same p???? number in
p????cxx.txt
should go into the same new file.
I tried it file by file, like this:
cat p004* | sed '/#/d'| sort -k 1n | sed '/NaN/d' |awk '{print substr($1,2,3),$2,$3,$4,$5}' > p004.txt
Can anyone help me with a simple script for all the data?
Thank you :)
Perhaps this will work for you:
for f in {001..999}; do tail -q -n +2 p"$f"c* > p"$f".txt; done 2>/dev/null
(tail -n +2 skips the header line of each file; -q suppresses the ==> filename <== headers that tail would otherwise print when given several files.)
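A sketch of one way to fold in the cleanup steps from the asker's own pipeline (dropping '#' header lines and NaN rows, then sorting numerically on column 1). Deriving the prefixes via ls/sed assumes simple filenames like p004c01.txt with no whitespace:

```shell
# Sample input files, as in the question
printf '#header\n122.5 -0.256 547\n123.6 NaN 325\n'   > p004c01.txt
printf '#header\n122.1 2.054 247\n122.2 -1.112 105\n' > p004c05.txt

# For every p???? prefix, merge its c-files: strip '#' header lines,
# drop NaN rows, sort numerically on the first column
for prefix in $(ls p[0-9][0-9][0-9]c*.txt | sed 's/c.*//' | sort -u); do
  cat "$prefix"c*.txt | sed '/#/d; /NaN/d' | sort -k1,1n > "$prefix".txt
done

cat p004.txt
# 122.1 2.054 247
# 122.2 -1.112 105
# 122.5 -0.256 547
```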

Trying to sort a text file with dates in brackets with "sort"

I'm trying to sort a text by date.
My file format is:
...
[15/08/2019 - 01:58:49] some text here
[15/08/2019 - 02:21:23] more text here
[15/08/2019 - 02:56:11] blah blah blah
...
I've tried multiple different methods with the sort command.
One attempt: "sort -b --key=1n --debug Final_out.txt"
sort: using ‘en_US.UTF-8’ sorting rules
sort: key 1 is numeric and spans multiple fields
sort: option '-b' is ignored
^ no match for key
^ no match for key
...
__
.?
^ no match for key
__
.?
^ no match for key
__
sort: write failed: 'standard output': Input/output error
sort: write error
Second attempt: "sort -n -b --key=10,11 --debug Final_out.txt"
This produced the same output as above.
I'm just about to tear my hair out. This has to be possible, it's Linux! Could someone kindly give me pointers?
As Shawnn suggests, how about a bash solution:
#!/bin/bash
pat='^\[([0-9]{2})/([0-9]{2})/([0-9]{4})[[:blank:]]+-[[:blank:]]+([0-9]{2}:[0-9]{2}:[0-9]{2})\]'
while IFS= read -r line; do
    if [[ $line =~ $pat ]]; then
        m=( "${BASH_REMATCH[@]}" ) # make a copy just to shorten the variable name
        echo -e "${m[3]}${m[2]}${m[1]}_${m[4]}\t$line"
    fi
done < file.txt | sort -t $'\t' -k1,1 | cut -f2-
The variable pat is a regular expression that matches the date and time field
and captures day, month, year and time, in that order, into the BASH_REMATCH
array.
After extracting the date and time, the script builds a new string
composed of year, month, day and time, in sortable order, and prepends
that string to the current line, delimited with a tab.
The whole stream is then piped to sort, keyed on the 1st field.
Finally, the 1st field is cut off.
The input file file.txt:
[10/01/2020 - 01:23:45] lorem ipsum
[15/08/2019 - 02:21:23] more text here
[15/08/2019 - 02:56:11] blah blah blah
[15/08/2019 - 01:58:49] some text here
[14/08/2019 - 12:34:56] dolor sit amet
Output:
[14/08/2019 - 12:34:56] dolor sit amet
[15/08/2019 - 01:58:49] some text here
[15/08/2019 - 02:21:23] more text here
[15/08/2019 - 02:56:11] blah blah blah
[10/01/2020 - 01:23:45] lorem ipsum
Here is an alternative, shorter way to sort, using GNU awk:
cat file
[10/01/2020 - 01:23:45] lorem ipsum
[15/08/2019 - 02:21:23] more text here
[15/08/2019 - 02:56:11] blah blah blah
[15/08/2019 - 01:58:49] some text here
[14/08/2019 - 12:34:56] dolor sit amet
Use this awk:
awk -v FPAT='[0-9:]+' '{ map[$3,$2,$1,$4] = $0 }
END { PROCINFO["sorted_in"]="@ind_str_asc"; for (k in map) print map[k] }' file
[14/08/2019 - 12:34:56] dolor sit amet
[15/08/2019 - 01:58:49] some text here
[15/08/2019 - 02:21:23] more text here
[15/08/2019 - 02:56:11] blah blah blah
[10/01/2020 - 01:23:45] lorem ipsum
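Since the bracketed timestamp is fixed-width, the asker's original goal of using plain sort is also reachable by keying on character positions. A sketch, assuming every line starts with [dd/mm/yyyy - hh:mm:ss]:

```shell
# Sample input, as above
cat > file.txt <<'EOF'
[10/01/2020 - 01:23:45] lorem ipsum
[15/08/2019 - 02:21:23] more text here
[15/08/2019 - 02:56:11] blah blah blah
[15/08/2019 - 01:58:49] some text here
[14/08/2019 - 12:34:56] dolor sit amet
EOF

# Field 1 is '[dd/mm/yyyy': the year is chars 8-11, the month 5-6,
# the day 2-3; field 3 (the time) breaks ties within a day
sort -k1.8,1.11n -k1.5,1.6n -k1.2,1.3n -k3,3 file.txt
```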
I had the same issue with my history, with HISTTIMEFORMAT="%d/%m/%y %T "
To sort by year, month and day, I used these options to sort:
before
history | awk '/0[78]\/06/{print" "$1" "$2" "$3" command number "NR}'|head -20
1921 07/06/22 09:21:05 command number 925
1922 07/06/22 13:23:31 command number 926
1923 07/06/22 13:24:16 command number 927
1924 07/06/22 13:23:31 command number 928
1925 07/06/22 13:24:16 command number 929
1926 08/06/22 10:59:12 command number 930
1927 08/06/22 10:59:21 command number 931
1928 08/06/22 10:59:26 command number 932
1929 08/06/22 10:59:27 command number 933
1930 08/06/22 10:59:34 command number 934
1931 08/06/22 10:59:44 command number 935
1932 08/06/22 11:01:47 command number 936
1933 08/06/22 11:03:35 command number 937
1934 08/06/22 11:03:44 command number 938
1935 08/06/22 11:03:48 command number 939
1936 08/06/22 11:04:02 command number 940
1937 08/06/22 11:12:17 command number 941
1938 07/06/22 13:24:16 command number 942
1939 08/06/22 09:22:10 command number 943
1940 08/06/22 09:29:41 command number 944
after
history | awk '/0[78]\/06/{print" "$1" "$2" "$3" command number "NR}'|head -20|sort -bn -k2.7,2.8 -k2.4,2.5 -k2.1,2.2 -k3.1,3.2 -k3.4,3.5 -k3.7,3.8 -k1
1921 07/06/22 09:21:05 command number 925
1922 07/06/22 13:23:31 command number 926
1924 07/06/22 13:23:31 command number 928
1923 07/06/22 13:24:16 command number 927
1925 07/06/22 13:24:16 command number 929
1938 07/06/22 13:24:16 command number 942
1939 08/06/22 09:22:10 command number 943
1940 08/06/22 09:29:41 command number 944
1926 08/06/22 10:59:12 command number 930
1927 08/06/22 10:59:21 command number 931
1928 08/06/22 10:59:26 command number 932
1929 08/06/22 10:59:27 command number 933
1930 08/06/22 10:59:34 command number 934
1931 08/06/22 10:59:44 command number 935
1932 08/06/22 11:01:47 command number 936
1933 08/06/22 11:03:35 command number 937
1934 08/06/22 11:03:44 command number 938
1935 08/06/22 11:03:48 command number 939
1936 08/06/22 11:04:02 command number 940
1937 08/06/22 11:12:17 command number 941
Explanation of the sort -bn -k2.7,2.8 -k2.4,2.5 -k2.1,2.2 -k3.1,3.2 -k3.4,3.5 -k3.7,3.8 -k1 command:
b ignores leading blanks
n selects numeric sorting
-k2.7,2.8 is the first key: the 2nd field (the date), from its 7th to its 8th character (yy)
and so on for the remaining parts of fields 2 (the date) and 3 (the time)
And, for @Ventus, the solution can be sort -n -k1.9,1.12 -k1.5,1.6 -k1.2,1.3 -k3.1,3.2 -k3.4,3.5 -k3.7,3.8

loop through numeric text files in bash and add numbers row wise

I have a set of text files in a folder, like so:
a.txt
1
2
3
4
5
b.txt
1000
1001
1002
1003
1004
.. and so on (assume a fixed number of rows, but an unknown number of text files). What I am looking for is a results file that is a row-wise summation across all files:
result.txt
1001
1003
1005
1007
1009
How do I go about achieving this in bash, without using Python etc.?
Using awk
Try:
$ awk '{a[FNR]+=$0} END{for(i=1;i<=FNR;i++)print a[i]}' *.txt
1001
1003
1005
1007
1009
How it works:
a[FNR]+=$0
For every line read, we add the value of that line, $0, to partial sum, a[FNR], where a is an array and FNR is the line number in the current file.
END{for(i=1;i<=FNR;i++)print a[i]}
After all the files have been read in, this prints out the sum for each line number.
Using paste and bc
$ paste -d+ *.txt | bc
1001
1003
1005
1007
1009
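For completeness, a pure-bash sketch with no awk or bc, assuming all files have the same number of rows. The names a.txt and b.txt are the sample files from the question; in practice you would loop over a glob:

```shell
# Sample inputs, as in the question
printf '%s\n' 1 2 3 4 5 > a.txt
printf '%s\n' 1000 1001 1002 1003 1004 > b.txt

# Accumulate per-row sums in an array, one pass per file
sums=()
for f in a.txt b.txt; do   # in practice: for f in *.txt
  i=0
  while IFS= read -r n; do
    sums[i]=$(( ${sums[i]:-0} + n ))
    i=$(( i + 1 ))
  done < "$f"
done
# result.txt is written only after the loop, so a *.txt glob would not
# pick it up on the first run
printf '%s\n' "${sums[@]}" > result.txt
cat result.txt
# 1001
# 1003
# 1005
# 1007
# 1009
```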

bash insert space in last but one position of each line

Using the $ regex I can match the last position of each line, but if I have the following:
12345
23456
34567
I need to add a space so it becomes
1234 5
2345 6
3456 7
Thanks!
$ sed 's/.$/ &/' file
1234 5
2345 6
3456 7
gawk -v FIELDWIDTHS='4 1' '{$1=$1}1' file
1234 5
2345 6
3456 7
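A pure-shell variant using parameter expansion is also possible; a sketch with no sed or awk (nums.txt is a hypothetical sample file):

```shell
printf '%s\n' 12345 23456 34567 > nums.txt
# ${line%?} is everything but the last character; stripping that prefix
# from the line again leaves just the last character
while IFS= read -r line; do
  printf '%s %s\n' "${line%?}" "${line#"${line%?}"}"
done < nums.txt
# 1234 5
# 2345 6
# 3456 7
```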

bash - check for word in specific column, check value in other column of this line, cut and paste the line to new text file

My text files contain ~20k lines and look like this:
file_A:
ATOM 624 SC1 SER 288 54.730 23.870 56.950 1.00 0.00
ATOM 3199 NC3 POP 487 50.780 27.750 27.500 1.00 3.18
ATOM 3910 C2B POP 541 96.340 99.070 39.500 1.00 7.00
ATOM 4125 W PW 559 55.550 64.300 16.880 1.00 0.00
Now I need to check for POP in column 4 (lines 2 and 3) and check whether the values in the last column (10) exceed a specific threshold (e.g. 5.00). These lines - in this case just line 3 - need to be removed from file_A and copied to a new file_B. Meaning:
file_A:
ATOM 624 SC1 SER 288 54.730 23.870 56.950 1.00 0.00
ATOM 3199 NC3 POP 487 50.780 27.750 27.500 1.00 3.18
ATOM 4125 W PW 559 55.550 64.300 16.880 1.00 0.00
file_B:
ATOM 3910 C2B POP 541 96.340 99.070 39.500 1.00 7.00
I'm not sure whether to use sed, grep or awk, or some combination of them :/
So far I could only delete the lines and create a new file without them...
awk '!/POP/' file_A > file_B
EDIT:
Does the following work for removing more than one word?
for (( i=0 ; i<$numberoflipids ; i++ ))
do
awk '$4~/"${nol[$i]}"/&&$NF>"$pr"{print >"patch_rmlipids.pdb";next}{print > "tmp"}' bilayer_CG_ordered.pdb && mv tmp patch.pdb
done
where $nol is an array containing the words to be removed, $pr is the given threshold, and the .pdb files are the files used
awk
awk '$4~/POP/&&$NF>5{print >"fileb";next}{print > "tmp"}' filea && mv tmp filea
$4~/POP/&&$NF>5 - checks whether the fourth field contains POP and the last field is greater than five
{print >"fileb";next} - if so, writes the line to fileb and skips the remaining statements
{print > "tmp"} - executed only if the first part fails; writes the line to tmp
filea && mv tmp filea - the input file; if the awk command succeeds, overwrite it with tmp
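Regarding the EDIT above: shell variables do not expand inside a single-quoted awk program, so ${nol[$i]} and $pr in that loop would be taken literally. A sketch of the multi-word variant that passes them in with awk -v instead (nol and pr mirror the asker's names; the sample file reuses file_A's lines):

```shell
# Sample working file, using file_A from the question
cat > patch.pdb <<'EOF'
ATOM 624 SC1 SER 288 54.730 23.870 56.950 1.00 0.00
ATOM 3199 NC3 POP 487 50.780 27.750 27.500 1.00 3.18
ATOM 3910 C2B POP 541 96.340 99.070 39.500 1.00 7.00
ATOM 4125 W PW 559 55.550 64.300 16.880 1.00 0.00
EOF

nol=(POP)   # words whose lines may be removed
pr=5        # threshold for the last column
for w in "${nol[@]}"; do
  # -v hands the shell values to awk; >> appends so matches from every
  # loop iteration accumulate in patch_rmlipids.pdb
  awk -v w="$w" -v pr="$pr" \
      '$4 == w && $NF > pr {print >> "patch_rmlipids.pdb"; next} {print}' \
      patch.pdb > tmp && mv tmp patch.pdb
done
```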
