I have a text file having a list of around 150 to 200 file names
abc.txt
pqr.txt
xyz.txt
...
...
I need a string of comma separated files.
Each string should have not more than 20 files. So the echo will look something like this...
$string1="abc.txt,pqr.txt,xyz.txt..."
$string2="abc1.txt,pqr1.txt,xyz1.txt..."
...
The number of strings will be different depending upon the number of lines in the file. I have written something like this...
#!/bin/sh
delim=','
for gsfile in `cat filelist.txt`
do
filelist=$filelist$delim$gsfile
echo $filelist
done
The tr command below works as expected, but how do I restrict each string to 20 file names?
cat filelist.txt | tr '\n' ','
Just use xargs:
$ seq 1 50 | xargs -n20 | tr ' ' ,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40
41,42,43,44,45,46,47,48,49,50
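One caveat if the list holds real file names rather than numbers: xargs splits its input on whitespace and interprets quotes, so a name containing a space would be broken up. A minimal sketch that groups whole lines instead, using awk; the function name join_in_groups is only illustrative:

```shell
# join_in_groups N: read lines on stdin and print them comma-joined,
# with at most N entries per output line (hypothetical helper name).
join_in_groups() {
    awk -v n="$1" '
        # no comma before the first entry of each group
        { printf "%s%s", (NR % n == 1 || n == 1 ? "" : ","), $0 }
        # close the group after every n-th entry
        NR % n == 0 { print "" }
        # close a final, shorter group if there is one
        END { if (NR % n != 0) print "" }
    '
}
```

It would be called as join_in_groups 20 < filelist.txt.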
One way using sed:
As in Igor Chubin's answer, put 50 numbers into infile:
seq 1 50 >infile
Content of script.sed:
:b
## While not last line...
$! {
## Check if the buffer has 19 newlines. Try substituting the match with itself and
## check if it succeeds; if not, append the next line and do it again in a loop.
s/\(\n[^\n]*\)\{19\}/&/
ta
N
bb
}
## There are 20 lines in the buffer or found end of file, so substitute all '\n'
## with commas and print.
:a
s/\n/,/g
p
Run it like:
sed -nf script.sed infile
With following output:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40
41,42,43,44,45,46,47,48,49,50
This might work for you:
seq 41 | paste -sd ',,,,,,,,,,,,,,,,,,,\n'
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40
41
or GNU sed:
seq 41 | sed ':a;$bb;N;s/\n/&/19;Ta;:b;y/\n/,/'
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40
41
Use a flag in sed's s command to replace every 20th comma with a newline:
< filelist.txt tr '\n' , | sed ':a; s/,/\n/20; P; D; ta'; echo
Using POSIX sed or awk, I would like to duplicate every second character in every pair of neighboring characters and list every newly-formed pair on a new line.
example.txt:
abcd 10001.
Expected result:
ab
bc
cd
d
1
10
00
00
01
1.
So far, this is what I have (N.B. omit "--posix" if on macOS). For some reason, adding a literal newline character before \2 does not produce the expected result. Removing the first group and using \1 has the same effect. What am I missing?
sed --posix -E -e 's/(.)(.)/&\2\
/g' example.txt
abb
cdd
100
000
1..
Try:
$ echo "abcd 10001." | awk '{for(i=1;i<length($0);i++) print substr($0,i,2)}'
ab
bc
cd
d
1
10
00
00
01
1.
You may use
sed --posix -e 's/./&\
&/g' example.txt | sed '1d;$d'
The first sed command finds every char in the string and replaces with the same char, then a newline and then the same char again. Since it replaces first and last chars, the first and last resulting lines must be removed, which is achieved with sed '1d;$d'.
Had sed supported lookarounds, one could have used (?!^).(?!$) (any char that is not at the start or end of the string) and the last sed command would not have been necessary, but sed does not support them. Perl does: perl -pe 's/(?!^).(?!$)/$&\n$&/g' example.txt ($& in the replacement is the whole match, the same as the & placeholder in sed).
With GNU awk, you could try the following. It was written and tested with the shown samples; an online run is at
https://ideone.com/qahp0S
awk '
BEGIN{
FS=""
}
{
for(i=1;i<=(NF-1);i++){
print $i$(i+1)
}
}
' Input_file
Explanation: the BEGIN section sets the field separator to the empty string, so every character becomes its own field (this is GNU awk behavior; POSIX leaves an empty FS unspecified). The main block then loops from the first field to the second-to-last one and, in each iteration, prints the current field followed by the next one.
Using same routine, it can be done in bash itself:
s='abcd 10001.'
for((i=0; i<${#s}-1; i++)); do echo "${s:i:2}"; done
ab
bc
cd
d
1
10
00
00
01
1.
Just for fun, a single sed consisting of 3 substitutions:
$ echo "abcd 10001." | sed 's/./&&/g;s/\(^.\|.$\)//g;s/../&\n/g'
The first part duplicates all characters, the second part removes the first and last character, the third part adds a newline character after each character-pair.
If you want to be POSIX compliant you have to do:
$ echo "abcd 10001." | sed -e 's/./&&/g' -e 's/^.//g' -e 's/.$//g' -e 's/../&\n/g'
Here we had to add an extra expression because the alternation operator \| in \(^.\|.$\) is a GNU extension; POSIX basic regular expressions (BREs) have no alternation.
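POSIX also leaves \n in the replacement part of s unspecified; the portable spelling is a backslash followed by a literal newline. A sketch of the same duplicate, trim, split idea written that way (the function name pairs is made up here, and the trailing sed only drops the empty line left after the last pair):

```shell
# pairs: read text on stdin, print every pair of neighboring characters
# on its own line (hypothetical helper name).
pairs() {
    sed -e 's/./&&/g' \
        -e 's/^.//' \
        -e 's/.$//' \
        -e 's/../&\
/g' | sed '/^$/d'
}
```

For example, printf 'abcd 10001.\n' | pairs prints the ten pairs from ab through 1. shown in the question.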
This might work for you (GNU sed):
sed 's/.\(.\)/&\n\1/;/../P;D' file
Replace the first two characters by the first two characters, a newline and the second character.
Print the first line if it is two characters long, delete the first line and repeat.
Alternative, more long winded:
sed -E ':a;s/^(([^\n]{2}\n)*[^\n])([^\n])([^\n])/\1\3\n\3\4/;ta' file
Or, with no hardcoded new line:
sed -E '/.../{G;s/^(.(.))(.*)(.)/\1\4\2\3/;P;D}' file
Lastly:
sed 's/./&\n&/g;s/^..\|..$//g' file
Process substitution isn't specified by POSIX. The POSIX requirement was only specified for awk and sed, so maybe the next solution is acceptable:
paste -d '\0' <(echo; fold -w1 example.txt) <(fold -w1 example.txt) | grep ..
or
while IFS= read -r -n1 ch; do
printf "%s\n%s" "${ch}" "${ch}"
done < example.txt | grep ..
or
sed 's/./&&/g;s/.//' example.txt | grep -o ..
I'm trying to print every nth line from file, but n is not a constant but a variable.
For instance, I want to replace sed -n '1~5p' with something like sed -n '1~${i}p'.
Is this possible?
awk can also do it in a more elegant way:
awk -v n=YOUR_NUM 'NR%n==1' file
With -v n=YOUR_NUM you set the interval. Then NR%n==1 is true exactly when the line number has the form m*n+1 (1, n+1, 2n+1, ...), so awk prints those lines.
Note how convenient awk is for this: if you want the lines of the form 7m+k instead, replace the 1 with k (0 <= k < 7), for example: awk -v n=7 'NR%n==3' file.
Example
Let's print every 7th line:
$ seq 50 | awk -v n=7 'NR%n==1'
1
8
15
22
29
36
43
50
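Wrapped as a small shell function, the same test reads as follows. This is only a sketch; every_nth and its optional offset argument are names introduced here, not part of awk:

```shell
# every_nth N [K]: print the lines whose number is K, K+N, K+2N, ...
# K defaults to 1; K equal to N selects lines N, 2N, 3N, ...
every_nth() {
    awk -v n="$1" -v k="${2:-1}" 'NR % n == k % n'
}
```

With it, seq 50 | every_nth 7 prints the same eight numbers as above, and seq 50 | every_nth 7 3 starts at line 3.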
Or in sed:
$ n=7
$ seq 50 | sed -n "1~$n p" # quote the expression, so that "$n" is expanded
1
8
15
22
29
36
43
50
The point is that you should wrap the sed code in double quotes (") instead of single quotes. Variables are not expanded within single quotes, so:
sed -n "1~${i} p"
You can do it like this for example:
i=3
sed "2,${i}s/.*/changed line/g" InputFile
Example:
AMD$ cat File
aaaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbb
cccccccccccccccccccccc
ddddddddddddddddddddddd
eeeeeeeeeeeeeeeeeeeeeeee
fffffffffffffffffffff
ggggggggggggggggggggg
AMD$ i=4; sed "2,${i}s/.*/changed line/g" File
aaaaaaaaaaaaaaaaaaaaaaaa
changed line
changed line
changed line
eeeeeeeeeeeeeeeeeeeeeeee
fffffffffffffffffffff
ggggggggggggggggggggg
The key is to use " " for variable substitution.
Why sed?
head -${i} YourFile
I have a file with ~700,000 lines and I would like to remove a bunch of specific lines (~30,000) using bash scripting or another method.
I know I can remove lines using sed:
sed -i.bak -e '1d;34d;45d;678d' myfile.txt # an example
I have the lines in a text file but I don't know if I can use it as input to sed, maybe perl??
Thanks
A few options:
sed -f <(sed 's/$/d/' lines_file) data_file
awk 'NR==FNR {del[$1]; next} !(FNR in del)' lines_file data_file
perl -MPath::Class -e '
%del = map {$_ => 1} file("lines_file")->slurp(chomp => 1);
$f = file("data_file")->openr();
while (<$f>) {
print unless $del{$.};
}
'
perl -ne'
BEGIN{ local @ARGV =pop; @h{<>} =() }
exists $h{"$.\n"} or print;
' myfile.txt lines
You can remove the lines using a sed script file.
First make a list of the lines to remove (one line number per line).
$ cat lines
1
34
45
678
Convert this file into sed commands.
$ sed -e 's|$| d|' lines >lines.sed
$ cat lines.sed
1 d
34 d
45 d
678 d
Now pass this script file to sed with -f.
$ sed -i.bak -f lines.sed file_with_70k_lines
This will remove the lines.
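The two steps can be combined into one function. The following is a sketch under a few assumptions: delete_lines is a made-up name, the result goes to standard output instead of being edited in place, and entries in the list that are not plain numbers are silently ignored:

```shell
# delete_lines LINES_FILE DATA_FILE
# Print DATA_FILE without the lines whose numbers are listed in LINES_FILE.
delete_lines() {
    local script status
    script=$(mktemp) || return 1
    # keep only purely numeric entries and turn each "N" into the sed command "Nd"
    sed -n 's/^[0-9][0-9]*$/&d/p' "$1" > "$script"
    sed -f "$script" "$2"
    status=$?
    rm -f "$script"
    return "$status"
}
```

Usage would look like delete_lines lines file_with_70k_lines > cleaned_file.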
If you can create a text file of the format
1d
34d
45d
678d
then you can run something like
sed -i.bak -f scriptfile datafile
You can use a genuine editor for that, and ed is the standard editor.
I'm assuming your lines are in a file lines.txt, one number per line, e.g.,
1
34
45
678
Then (with a blatant bashism):
ed -s file.txt < <(sed -n '/^[[:digit:]]\+$/p' lines.txt | sort -nr | sed 's/$/d/'; printf '%s\n' w q)
A first sed selects only the numbers from file lines.txt (just in case).
There's something special to take into account here: when you delete line 1, line 34 of the original file becomes line 33. So it's better to remove the lines from the end: start with 678, then 45, and so on. That's why we're using sort -nr (to sort the numbers in reverse order). A final sed appends d (ed's delete command) to the numbers.
Then we issue the w (write) and q (quit) commands.
Note that this overwrites the original file!
I have a document A which contains n lines. I also have a sequence of n integers all of which are unique and <n. My goal is to create a document B which has the same contents as A, but with reordered lines, based on the given sequence.
Example:
A:
Foo
Bar
Bat
sequence: 2,0,1 (meaning: First line 2, then line 0, then line 1)
Output (B):
Bat
Foo
Bar
Thanks in advance for the help
Another solution:
You can create a sequence file by doing (assuming sequence is comma delimited):
echo $sequence | sed s/,/\\n/g > seq.txt
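Then, just do:
paste seq.txt A.txt | sort -n | sed "s/^[0-9]*\s//"
Note that this puts line i of A at the position given by the i-th number of the sequence; for 2,0,1 that yields Bar, Bat, Foo, which is the inverse of the order asked for in the question.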
Here's a bash function. The order can be delimited by anything.
Usage: schwartzianTransform "A.txt" 2 0 1
function schwartzianTransform {
local file="$1"
shift
local sequence="$*"
echo -n "$sequence" | sed 's/[^[:digit:]][^[:digit:]]*/\
/g' | paste -d ' ' - "$file" | sort -n | sed 's/^[[:digit:]]* //'
}
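For the reading given in the question (the first number names the line that should come out first), the decorate, sort, undecorate idea needs one more sort. A sketch, assuming the sequence is a complete 0-based permutation of the file's line numbers; reorder_by_sort is a made-up name:

```shell
# reorder_by_sort FILE IDX...
# Print the lines of FILE in the order given by the 0-based indices IDX...
reorder_by_sort() {
    local file=$1
    shift
    # 1. decorate each requested index with its output position: "pos idx"
    # 2. sort by idx so that row i lines up with line i of the file
    # 3. paste the file lines on, then sort back by output position
    # 4. strip the two helper columns
    printf '%s\n' "$@" | awk '{ print NR, $1 }' |
        sort -k2,2n |
        paste -d ' ' - "$file" |
        sort -k1,1n |
        cut -d ' ' -f3-
}
```

With the sample file, reorder_by_sort A.txt 2 0 1 prints Bat, Foo, Bar.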
Read the file into an array and then use the power of indexing :
echo "Enter the input file name"
read ip
index=0
while IFS= read -r line ; do
NAME[$index]="$line"
index=$(($index+1))
done < "$ip"
echo "Enter the file having order"
read od
while read -r line ; do
echo "${NAME[$line]}";
done < "$od"
[aman@aman sh]$ cat test
Foo
Bar
Bat
[aman@aman sh]$ cat od
2
0
1
[aman@aman sh]$ ./order.sh
Enter the input file name
test
Enter the file having order
od
Bat
Foo
Bar
an awk oneliner could do the job:
awk -vs="$s" '{d[NR-1]=$0}END{split(s,a,",");for(i=1;i<=length(a);i++)print d[a[i]]}' file
$s is your sequence.
Take a look at this example:
kent$ seq 10 >file # create a 10-line file
kent$ s=$(seq 0 9 |shuf|tr '\n' ','|sed 's/,$//') # get a random sequence by shuf
kent$ echo $s #check the sequence in var $s
7,9,1,0,5,4,3,8,6,2
kent$ awk -vs="$s" '{d[NR-1]=$0}END{split(s,a,",");for(i=1;i<=length(a);i++)print d[a[i]]}' file
8
10
2
1
6
5
4
9
7
3
One way (not efficient for big files, though):
$ seq="2 0 1"
$ for i in $seq
> do
> awk -v l="$i" 'NR==l+1' file
> done
Bat
Foo
Bar
If your file is a big one, you can use this one:
$ seq='2,0,1'
$ x=$(echo $seq | awk '{printf "%dp;", $0+1;print $0+1> "tn.txt"}' RS=,)
$ sed -n "$x" file | awk 'NR==FNR{a[++i]=$0;next}{print a[$0]}' - tn.txt
The 2nd line builds a sed print script (3p;1p;2p; for this sequence) and writes the 1-based line numbers to tn.txt; the 3rd line passes that script to sed. sed prints only the lines whose numbers appear in the sequence, but in file order rather than in the order of the sequence. The awk command then reorders the sed output according to the sequence.
I have a tab delimited file (in which number of columns in each row is not fixed) which looks like this:
chr1 92536437 92537640 NM_024813 NM_053274
I want to have a file from this in following order (first three columns are identifiers which I need it while splitting it)
chr1 92536437 92537640 NM_024813
chr1 92536437 92537640 NM_053274
Suggestions for a shell script.
#!/bin/bash
{
IFS=$'\t'
while read -r a b c rest
do
for fld in $rest
do
echo -e "$a\t$b\t$c\t$fld"
done
done
}
Note that IFS has to be a real tab character; in bash, $'\t' produces one.
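The same splitting can also be sketched in awk, which avoids changing IFS. The function name split_rows is made up for this example, and rows with fewer than four columns produce no output:

```shell
# split_rows [FILE...]
# For every tab-separated row, repeat the first three columns once
# for each remaining column.
split_rows() {
    awk 'BEGIN { FS = OFS = "\t" }
         { for (i = 4; i <= NF; i++) print $1, $2, $3, $i }' "$@"
}
```

It reads the named files, or standard input if none are given: split_rows in.txt > out.txt.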
I also thought I should do a perl version:
#!/bin/perl -n
($a,$b,$c,@r)=(chomp and split /\t/); print "$a\t$b\t$c\t$_\n" for @r
To do it all from the commandline, reading from in.txt and outputting to out.txt:
perl -ne '($a,$b,$c,@r)=(chomp and split /\t/); print "$a\t$b\t$c\t$_\n" for @r' in.txt > out.txt
Of course if you save the perl script (say as script.pl)
perl script.pl in.txt > out.txt
If you also make the script file executable (chmod +x script.pl):
./script.pl in.txt > out.txt
HTH
Not shell, and the other answer is perfectly fine, but I one-lined it in Perl:
perl -F'/\s/' -lane '$,="\t"; print @F,$_ for splice @F,3' $FILE
Edit: New (even more unreadable ;) version, inspired by the other answers. Abusing perl's command line parameters and special variables for autosplitting and line ending handling.
Means: For each of the fields after the first three (for splice @F,3), print the first three and it (print @F,$_).
-F sets the field separator to \s (should be \t) for -a autosplitting into @F.
-l turns on line ending handling for -n which runs the -e code for each line of the input.
$, is the output field separator.
[Edited]
So you want to duplicate the first three columns for each remaining item?
$ cat File | while read X
do PRE=$(echo "$X" | cut -f1-3 -d ' ')
for Y in $(echo "$X" | cut -f4- -d ' ')
do echo $PRE $Y >> OutputFilename
done
done
Returns:
chr 786 789 NM
chr 786 789 NR
chr 786 789 NT
chr 123 345 NR
This cuts the first three space delimited columns as a prefix, and then abuses the fact that a for loop will step through a space delimited list to call echo.
Enjoy.
This is just a subset of your data comparison in two files question.
Extracting my slightly hacky solution from there:
for i in 4 5 6 7; do join -e _ -j $i f f -o 1.1,1.2,1.3,0; done | sed '/_$/d'