sed rows from a file using row numbers stored in an array - bash

I have an array such that
echo ${arr[@]}
1 13 19 30 34
I would like to use this array to select rows (1, 13, 19, 30 and 34) from another file with sed. I know that I can use a loop, but I would like to know if there is a more straightforward way to do this. So far I have not been able to do it.
Thanks

sed solution:
a=(1 13 19 30 34)
sed -n "$(sed 's/[^[:space:]]*/&p;/g' <<< ${a[#]})" file
This will extract 1, 13, 19, 30 and 34th rows from file
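For the array above, the inner sed appends a p command and a semicolon to each number, so the generated sed program is:
1p; 13p; 19p; 30p; 34p;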

You can apply a single sed command to each selected line by appending the command and a semicolon to each line number, then running the result as a sed program. This can be managed compactly using bash pattern replacement on arrays; for example, to print the selected lines, use the p command (-n suppresses printing of the unselected lines):
sed -n "${arr[*]/%/p;}" file
This also works with more complex commands such as s/from/to/:
sed "${arr[*]/%/s/from/to/;}" file
The replacement is performed only on the selected lines.
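A quick way to see the generated program and check the result (a minimal sketch using seq as stand-in input):
arr=(1 3 5)
echo "${arr[*]/%/p;}"              # -> 1p; 3p; 5p;
seq 6 | sed -n "${arr[*]/%/p;}"    # prints lines 1, 3 and 5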

awk -v rows="${arr[*]}" 'BEGIN{split(rows,tmp); for (i in tmp) nrs[tmp[i]]} NR in nrs' file
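The same awk program, expanded with comments (functionally identical):
awk -v rows="${arr[*]}" '
  BEGIN {
    split(rows, tmp)            # split "1 13 19 30 34" on whitespace
    for (i in tmp) nrs[tmp[i]]  # use the numbers as keys of a lookup table
  }
  NR in nrs                     # print any line whose number is in the table
' file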

You could use awk and the system function to run the sed command
awk '{ for (i=1;i<=NF;i++) { system("sed -n \""$i"p\" filename") } }' <<< "${arr[*]}"
This is open to command injection, though, and it runs a separate sed process (re-reading the file) for every row number, so assess the risk accordingly.
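One way to reduce the injection risk (a sketch, not a complete defense) is to check in bash that every array element is a plain number before it reaches the shell command:
for i in "${arr[@]}"; do
  [[ $i =~ ^[0-9]+$ ]] || { echo "not a line number: $i" >&2; exit 1; }
done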

Related

Remove everything in a pipe delimited file after second-to-last pipe

How can I remove everything in a pipe-delimited file after the second-to-last pipe? For example, for the line
David|3456|ACCOUNT|MALFUNCTION|CANON|456
the result should be
David|3456|ACCOUNT|MALFUNCTION
Replace |(string without pipe)|(string without pipe) at the end of each line:
sed 's/|[^|]*|[^|]*$//' inputfile
Using awk, something like
awk -F'|' 'BEGIN{OFS="|"}{NF=NF-2; print}' inputfile
David|3456|ACCOUNT|MALFUNCTION
(or) use cut if you know the total number of columns, i.e., 6 -> 4:
cut -d'|' -f -4 inputfile
David|3456|ACCOUNT|MALFUNCTION
The command I would use is
sed -r 's/(.*)\|.*\|.*/\1/' input.txt > output.txt
The greedy .* in the capture group makes the two \| match the last two pipes, so everything from the second-to-last pipe onward is dropped.
A pure Bash solution:
while IFS= read -r line || [[ -n $line ]] ; do
printf '%s\n' "${line%|*|*}"
done <inputfile
See Reading input files by line using read command in shell scripting skips last line (particularly the answer by Jahid) for details of how the while loop works.
See pattern matching in Bash for information about ${line%|*|*}.
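As a quick check of the pattern match, using the sample line from the question:
line='David|3456|ACCOUNT|MALFUNCTION|CANON|456'
echo "${line%|*|*}"    # -> David|3456|ACCOUNT|MALFUNCTION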

"grep"ing first 12 of last 24 character from a line

I am trying to extract "first 12 of last 24 character" from a line, i.e.,
for a line:
species,subl,cmp= 1 4 1 s1,torque= 0.41207E-09-0.45586E-13
I need to extract "0.41207E-09".
(I have not written the code, so don't curse me for its formatting.)
I have managed to do this via:
var_s=`grep "species,subl,cmp= $3 $4 $5" $tfile |sed -n '$s/.*\(........................\)$/\1/p'|sed -n '$s/\(............\).*$/\1/p'`
but is there a more readable way of doing this, rather than counting dots?
EDIT
Thanks to both of you; so I have sed, awk, grep and bash solutions. I will run this in a loop over hundreds of files, so can you also suggest which one is most efficient with respect to time?
One way with GNU sed (without counting dots):
$ sed -r 's/.*(.{11}).{12}/\1/' file
0.41207E-09
Similarly with GNU grep:
$ grep -Po '.{11}(?=.{12}$)' file
0.41207E-09
Perhaps a python solution may also be helpful:
python3 -c 'import sys; print("\n".join(line[-24:-13] for line in sys.stdin))' < file
0.41207E-09
I'm not sure your example data and question match up so just change the values in the {n} quantifier accordingly.
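Taking the question's numbers literally (the first 12 of the last 24 characters), the same lookahead approach would be:
grep -Po '.{12}(?=.{12}$)' file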
Simplest is using pure bash:
echo "${str:(-24):12}"
OR awk can also do that:
awk '{print substr($0, length($0)-23, 12)}' <<< $str
OUTPUT:
0.41207E-09
EDIT: For using bash solution on a file:
while read l; do echo "${l:(-24):12}"; done < file
Another one, less efficient, but it has the advantage of making you discover new tools:
echo "$str" | rev | cut -b 1-24 | rev | cut -b 1-12
You can use awk to get the first 12 characters of the last 24 characters of a line:
awk '{ s = substr($0, length($0)-23); print substr(s, 1, 12) }' myfile.txt

sed: Argument list too long

I have created a script in a Unix environment. In the script, I use the sed command shown below to delete a specified set of lines, not necessarily a simple range, given by line number, from a file.
sed -i "101d; 102d; ... 4930d;" <file_name>
When I execute this it shows the following error:
sed: Arg is too long
Can you please help to resolve this problem?
If you want to delete a contiguous range of lines, you can specify a range of line numbers:
sed -i '101,4930d' file
If you want to delete some arbitrary set of lines that can't easily be expressed as a range, you can put the commands in a file rather than on the command line, and use sed -f.
For example, if foo.sed contains:
2d
4d
6d
8d
10d
then this:
sed -i -f foo.sed file
will delete lines 2, 4, 6, 8, and 10 from file. Putting the commands in a file rather than on the command line avoids limits on command line length.
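A minimal sketch of generating such a script from a list of line numbers (the array name lines is illustrative):
lines=(2 4 6 8 10)
printf '%dd\n' "${lines[@]}" > foo.sed
sed -i -f foo.sed file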
If there's some pattern to the lines you want to delete, you might consider using a more sophisticated tool such as Awk or Perl.
I had this exact same problem.
I originally put the giant sed command sed -i "101d; 102d; ... 4930d;" <file_name> in a file and tried to execute it as a bash script.
To fix it, I put only the deletion commands in a file and ran that file as a sed script; I was then able to execute the 18,193 deletion commands that had failed to run before.
sed -i -f to_delete.sed input_file
to_delete.sed:
101d;102d;...4930d
With awk:
awk ' NR < 101 || NR > 4930 { print } ' input_file
This might work for you (GNU sed and awk):
cat <<\! >/tmp/a
2
4
6
8
!
seq 10 >/tmp/b
sed 's/$/d/' /tmp/a | sed -f - /tmp/b
1
3
5
7
9
10
awk 'NR==FNR{a[$0];next};FNR in a{next};1' /tmp/{a,b}
1
3
5
7
9
10
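The same awk idea, expanded with comments (functionally identical):
awk '
  NR == FNR { del[$0]; next }   # first file: collect the line numbers to delete
  FNR in del { next }           # second file: skip those lines
  1                             # print everything else
' /tmp/a /tmp/b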

Counting commas in a line in bash

Sometimes I receive a CSV file which has a carriage return inside a cell. This is not an acceptable format to a program that will use it as input.
In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?
Strip everything but the commas, and then count number of characters left:
$ echo foo,bar,baz | tr -cd , | wc -c
2
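To make a script fail when a line has the wrong count (a sketch; the expected count of 2 matches the foo,bar,baz example):
expected=2
while IFS= read -r line; do
  n=$(tr -cd , <<< "$line" | wc -c)
  (( n == expected )) || { echo "bad line: $line" >&2; exit 1; }
done < inputfile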
To count the number of times a comma appears, you can use something like awk:
string='line of input from CSV file'    # placeholder for one CSV record
echo "$string" | awk -F "," '{print NF-1}'
But this really isn't sufficient to determine whether a field has carriage returns in it. Fields can have commas inside as long as they're surrounded by quotes.
What worked for me better than the other solutions was this. If test.txt has:
foo,bar,baz
baz,foo,foobar,bar
Then cat test.txt | xargs -I % sh -c 'echo % | tr -cd , | wc -c' produces
2
3
This works very well for streaming sources, or tailing logs, etc.
In pure Bash:
while IFS=, read -ra array
do
echo "$((${#array[#]} - 1))"
done < inputfile
or
while read -r line
do
count=${line//[^,]}
echo "${#count}"
done < inputfile
Try Perl:
$ perl -ne 'print 0+@{[/,/g]}, "\n"'
a
0
a,a
1
a,a,a,a,a
4
Depending on what you are trying to do with the CSV data, it may be helpful to use a wrapper script like csvquote to temporarily replace the problematic newlines (and commas) inside quoted fields, then restore them. For instance:
csvquote inputfile.csv | wc -l
and
csvquote inputfile.csv | cut -d, -f1 | csvquote -u
may be the sort of thing you're looking for. See https://github.com/dbro/csvquote for the code and more information.
An example Python command you could run (since Python is installed on most modern systems) is:
python -c "import pathlib; print({l.count(',') for l in pathlib.Path('my_file.csv').read_text().splitlines()})"
This counts the number of commas per line, then makes a set from them (so if your lines all have the same number of commas in, you'll get a set with just that number in).
Just remove all of the carriage returns:
tr -d "\r" old_file > new_file

How can I delete every Xth line in a text file?

Consider a text file with scientific data, e.g.:
5.787037037037037063e-02 2.048402977658663748e-01
1.157407407407407413e-01 4.021264347118673754e-01
1.736111111111111049e-01 5.782032163406526371e-01
How can I easily delete, for instance, every second line, or every 9 out of 10 lines in the file? Is it for example possible with a bash script?
Background: the file is very large but I need much less data to plot. Note that I am using Ubuntu/Linux.
This is easy to accomplish with awk.
Remove every other line:
awk 'NR % 2 == 0' file > newfile
Remove every 10th line:
awk 'NR % 10 != 0' file > newfile
The NR variable in awk is the line number. Anything outside of { } in awk is a conditional, and the default action is to print.
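The condition generalizes: for example, to keep only every nth line (here n=10, passed in as a variable):
awk -v n=10 'NR % n == 0' file > newfile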
How about perl?
perl -n -e '$.%10==0&&print' # print every 10th line
You could possibly do it with sed, e.g.
sed -n -e 'p;N;d;' file # print every other line, starting with line 1
If you have GNU sed it's pretty easy
sed -n -e '0~10p' file # print every 10th line
sed -n -e '1~2p' file # print every other line starting with line 1
sed -n -e '0~2p' file # print every other line starting with line 2
Try something like:
awk 'NR%3==0{print $0}' file
This will print one line in three. Or:
awk 'NR%10<9{print $0}' file
will print 9 lines out of ten.
This might work for you (GNU sed):
seq 10 | sed '0~2d' # delete every 2nd line
1
3
5
7
9
seq 100 | sed '0~10!d' # delete 9 out of 10 lines
10
20
30
40
50
60
70
80
90
100
You can use awk and a shell script. Awk can be difficult, but...
This will delete the specific lines you tell it to:
nawk -f awkfile.awk [filename]
awkfile.awk contents
BEGIN {
  if (!lines) lines="3 4 7 8"   # default line numbers; override with -v lines="..."
  n = split(lines, lA, FS)      # split the list on whitespace
  for (i=1; i<=n; i++)
    linesA[lA[i]]               # build a lookup table of line numbers
}
!(FNR in linesA)                # print every line whose number is not in the table
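For example, to delete a different set of lines without editing the script (this relies on the if (!lines) default above):
nawk -v lines="2 5 9" -f awkfile.awk filename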
Also, I can't remember whether Vim comes with standard Ubuntu or not. If not, get it.
Then open the file with vim
vim [filename]
Then type
:%!awk NR\%2
(the backslash prevents Vim from expanding % into the current file name)
This will delete every other line. Just change the 2 to another integer for a different frequency.
