I have many very large files. Each file contains the same block of content repeated three times. I want to delete the first repeat from all of them so that only the last two remain.
The code I have loops through the lines, identifies the position of each repeat (via a counter), and saves the positions in variables (FIRST and END). My hope was to then use sed -i '${FIRST},${END}d ${i}.log' to cut that section out of the file.
However, when I run the code I get this error: sed: -e expression #1, char 3: extra characters after command
Here is the code that reads the files, where "Cite" is the keyword that identifies repeats:
while read -r LINE ; do
((LCOUNT++))
if [[ "$LINE" =~ "Cite" ]] ; then
((CITE++))
if [[ "$CITE" = 1 ]] ; then
FIRST=${LCOUNT}
fi
if [[ "$CITE" = 2 ]] ; then
END=$((LCOUNT - 1))
fi
fi
done < "./${i}.log"
Your command
sed -i '${FIRST},${END}d ${i}.log'
does not make sense. You call sed here with two arguments: The option
-i
and a single string which is literally
${FIRST},${END}d ${i}.log
Since you have used single quotes, no parameter expansion occurs, and the whole string is passed to sed as a single argument, which sed interprets as its program. That text is not a valid sed program, hence the error; and because the file name is inside that same string, sed also gets no file argument to edit.
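You could do something like this, letting the shell expand the variables inside double quotes and passing the file name as a separate argument:
sed -i "${FIRST},${END}d" "${i}.log"
(Leave out -i to print the result to standard output first and check it.)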
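Putting the pieces together, here is a minimal sketch of the whole loop. It assumes bash, GNU sed (for -i without a backup suffix), and that every file contains at least two "Cite" lines; the file name run1.log and its contents are made-up test data.

```shell
#!/usr/bin/env bash
# Sketch: keep only the last two repeats of each ${i}.log by deleting from the
# first "Cite" line up to (not including) the second one.
# Assumes GNU sed; run1.log below is hypothetical sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' header Cite a Cite b Cite c > run1.log

for i in run1; do
  # Reset the counters for every file.
  LCOUNT=0 CITE=0 FIRST='' END=''
  while IFS= read -r LINE; do
    LCOUNT=$((LCOUNT + 1))
    if [[ $LINE == *Cite* ]]; then
      CITE=$((CITE + 1))
      if (( CITE == 1 )); then FIRST=$LCOUNT; fi
      if (( CITE == 2 )); then END=$((LCOUNT - 1)); fi
    fi
  done < "./${i}.log"

  # Double quotes let the shell expand the variables; the file name is a
  # separate argument, so sed treats it as the file to edit.
  if [[ -n $FIRST && -n $END ]]; then
    sed -i "${FIRST},${END}d" "./${i}.log"
  fi
done

cat run1.log
```

Here the first "Cite" is on line 2 and the second on line 4, so sed deletes lines 2-3 and the file keeps the header plus the last two repeats.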
As an aside, regarding the title of your post: bash has no separate numeric variable type. Every (scalar) variable holds a string. You can do a
typeset -i foo
which gives foo the integer attribute: bash evaluates every value assigned to it as an arithmetic expression, so the stored string always represents an integer, but it is still a string. For instance,
foo=abc # abc is evaluated as a variable name; if it is unset, foo becomes 0
foo=00005 # a leading 0 means octal, so foo becomes 5 (00008 would be an error)
foo=5a # raises an error
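A short demonstration of the integer attribute in bash (abc is unset on purpose so that it evaluates to 0):

```shell
#!/usr/bin/env bash
# Demonstrates the integer attribute set by typeset -i (same as declare -i).
typeset -i foo
unset abc            # make sure abc is unset for the first example

foo=abc              # evaluated as an arithmetic expression: unset abc -> 0
echo "$foo"          # 0
foo=00005            # leading zero means octal; octal 5 is 5
echo "$foo"          # 5
foo=010              # octal 10 is decimal 8
echo "$foo"          # 8

# Invalid values are rejected with an error; running the assignment in a
# subshell keeps the error from affecting the rest of the script.
if ( foo=5a ) 2>/dev/null; then
  echo "5a accepted"
else
  echo "5a rejected"
fi
```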
This might work for you (GNU sed):
sed -ni '/Cite/!{p;b};:a;n;//!ba;:b;p;n;bb' file1 file2 ... filen
Turn off implicit printing (-n) and edit in place (-i).
If a line does not match Cite, print it and go on to the next line.
Otherwise, discard the following lines up to the next match, then print that matching line and every remaining line to the end of the file.
N.B. -i treats each file separately, just as the -s option does, but edits the files in place. So run the command with -s first, and when you are satisfied that the results are as expected, substitute -i.
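A sketch of that preview-then-edit workflow on throwaway data (GNU sed assumed; sample.log is a made-up file). Note that the final loop prints the line before reading the next one (:b;p;n;bb), so the second Cite line, which starts the part you want to keep, is not dropped.

```shell
#!/usr/bin/env bash
# Sketch: preview with -s, then edit in place with -i (GNU sed).
# sample.log is hypothetical test data containing three "Cite" blocks.
tmp=$(mktemp -d)
printf '%s\n' header Cite a Cite b Cite c > "$tmp/sample.log"

script='/Cite/!{p;b};:a;n;//!ba;:b;p;n;bb'

# Preview: -s treats each file separately and writes to stdout.
sed -ns "$script" "$tmp/sample.log"

# When the preview looks right, switch -s to -i to rewrite the file.
sed -ni "$script" "$tmp/sample.log"
cat "$tmp/sample.log"
```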
I have a list of files stored in a text file. For every Python file in that list, I want to run the corresponding test file using pytest.
My file looks like this:
/folder1/file1.txt
/folder1/file2.jpg
/folder1/file3.md
/folder1/file4.py
/folder1/folder2/file5.py
For the 4th and 5th files, I want to run pytest like this:
pytest /folder1/test_file4.py
pytest /folder1/folder2/test_file5.py
Currently, I am using this command:
cat /workspace/filelist.txt | while read line; do if [[ $$line == *.py ]]; then exec "pytest test_$${line}"; fi; done;
which does not work correctly, because each line contains the full path, not just the file name. Any idea how to implement this?
Use Bash's substring removal (parameter expansion) to insert test_ in front of the file name. As a one-liner:
$ while read line; do if [[ $line == *.py ]]; then echo "pytest ${line%/*}/test_${line##*/}"; fi; done < file
In more readable form:
while read line
do
if [[ $line == *.py ]]
then
echo "pytest ${line%/*}/test_${line##*/}"
fi
done < file
Output:
pytest /folder1/test_file4.py
pytest /folder1/folder2/test_file5.py
I don't know anything about Google Cloud Build, so I'll leave the double dollar signs for you to experiment with.
Update:
In case some files already have the test_ prefix, use this bash script, which uses an extglob pattern in the substring removal:
shopt -s extglob # notice
while read line
do
if [[ $line == *.py ]]
then
echo "pytest ${line%/*}/test_${line##*/?(test_)}" # notice
fi
done < file
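To check the extglob version without running pytest, here is a sketch that prints the commands for a made-up list, including one file that already has the test_ prefix:

```shell
#!/usr/bin/env bash
# Sketch: print (rather than run) the pytest commands for a sample list.
# The list contents are hypothetical.
shopt -s extglob   # must be enabled before the loop below is parsed
list=$(mktemp)
printf '%s\n' /folder1/file1.txt /folder1/file4.py /folder1/folder2/test_file5.py > "$list"

while IFS= read -r line; do
  if [[ $line == *.py ]]; then
    # ${line%/*}: directory part; ${line##*/?(test_)}: file name minus any test_ prefix
    echo "pytest ${line%/*}/test_${line##*/?(test_)}"
  fi
done < "$list" > "$list.out"
cat "$list.out"
```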
You can easily refactor all your conditions into a simple sed script. This also gets rid of the useless cat and of the exec, which is worse than useless: exec replaces the shell running the loop, so at most one test would ever run.
sed -n 's%[^/]*\.py$%test_&%p' /workspace/filelist.txt |
xargs -n 1 pytest
The regular expression matches the file name after the last slash (the entire line if there is no slash); including the .py suffix in the pattern makes sure it only matches Python files.
The pipe to xargs is a common way to convert standard input into command-line arguments. The -n 1 says to pass one argument at a time, rather than as many as possible. (pytest accepts several test paths on one command line, so you can also take out the -n 1 and let xargs pass in as many as fit.)
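To see what the pipeline would run without actually running pytest, you can put echo in front of it. A sketch using a made-up copy of the list in a temporary file:

```shell
#!/usr/bin/env bash
# Sketch: dry run of the sed | xargs pipeline; "echo pytest" stands in for
# pytest so the commands are printed instead of executed.
# The list contents are hypothetical sample data.
list=$(mktemp)
printf '%s\n' /folder1/file1.txt /folder1/file4.py /folder1/folder2/file5.py > "$list"

sed -n 's%[^/]*\.py$%test_&%p' "$list" | xargs -n 1 echo pytest > "$list.out"
cat "$list.out"
```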
If you want to avoid adding the test_ prefix to files which already have it, one solution is to break up the sed script into two separate actions:
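sed -n 's%/test_[^/]*\.py$%&%p;t;s%[^/]*\.py$%test_&%p' /workspace/filelist.txt |
xargs -n 1 pytest
The first s command replaces a file name that already starts with test_ by itself (&) and prints it. The substitution changes nothing, but it is what t checks: t branches (here, to the end of the script) only if a substitution has succeeded since the last input line was read, so the rest of the script is skipped for that line. A plain /regex/p address would not work here, because it does not set the flag that t tests. (This assumes every line contains a slash, as in your list.)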
(MacOS / BSD sed will want a newline instead of a semicolon after the t command.)
sed is arguably a bit of a write-only language; this is already approaching the point where you might prefer to rewrite it in Awk instead.
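For comparison, a minimal Awk sketch of the same idea (not part of the original answer): split each line on /, strip an existing test_ prefix from the last field, then add it back. echo again stands in for pytest, and the list is sample data.

```shell
#!/usr/bin/env bash
# Sketch of an Awk alternative. Assigning to $NF rebuilds the line with OFS.
# Dry run: "echo pytest" stands in for pytest. Sample data is hypothetical.
list=$(mktemp)
printf '%s\n' /folder1/file3.md /folder1/file4.py /folder1/folder2/test_file5.py > "$list"

awk -F/ -v OFS=/ '/\.py$/ { sub(/^test_/, "", $NF); $NF = "test_" $NF; print }' "$list" |
  xargs -n 1 echo pytest > "$list.out"
cat "$list.out"
```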
You may want to focus only on lines that end with ".py".
You can select those with grep and a regex that checks whether a line ends in .py; that eliminates the if statement.
IFS=$'\n'
for file in $(grep '\.py$' /workspace/filelist.txt); do pytest "$file"; done
Note that this passes each listed file itself to pytest; to run the corresponding test file instead, build the name as ${file%/*}/test_${file##*/}.
I am working on a shell script that replaces the value of the variable "SUBDIRS" in a Makefile.
I used the command below; it works, but it stops after the first occurrence of "SUBDIRS", and the resulting Makefile is incomplete.
sed -z -i "s/\(SUBDIRS = \).*/\1$(tr '\n' ' ' < changed_fe_modules.log)\n/g" Makefile
Now I want to keep the rest of my Makefile as it is, replace only the 3 occurrences of "SUBDIRS= abcdefgh", and update the Makefile properly.
Please suggest how to replace all 3 occurrences while keeping the rest of the Makefile, from start to end, as in the original.
Makefile input sample:
Desired Makefile output sample:
Right now, the current command gives me the output below: it stops after the first replacement, and the file is incomplete.
With sed, this will be very hard to do.
The reason you're seeing this behavior is that you're using the -z option with sed. The -z option separates lines with NUL characters, not newlines. This means the entire file up to the first NUL character (a Makefile has none, so that is the whole file) is treated as a single "line" for the purposes of sed's pattern matching.
So in this regex:
\(SUBDIRS = \).*
the .* matches the entire rest of the file after the first SUBDIRS = match, newlines included. You then replace the entire rest of the file with the contents of the changed_fe_modules.log file. After that there's nothing left to match, so sed is done.
If your original makefile listed all the SUBDIRS on a single line, not using backslash/newline separators, it would be simple; you can just use:
sed -i "s/^SUBDIRS = .*/SUBDIRS = $(tr '\n' ' ' < changed_fe_modules.log)/" Makefile
If you have to keep the backslash/newline continuations, you probably won't be able to make this change cleanly using sed. You'll need something more powerful, like Perl, which has non-greedy matching.
ETA
You could also write it in plain shell:
new_subdirs=$(tr '\n' ' ' < changed_fe_modules.log)
line_cont=false
in_subdirs=false
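# IFS= keeps the leading tabs of recipe lines, and || [ -n "$line" ]
# also handles a last line that has no trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
if $line_cont; then
case $line in
(*\\) : still continuing ;;
(*) line_cont=false ;;
esac
$in_subdirs || printf '%s\n' "$line"
continue
fi
# The first pattern is quoted: an unquoted space would split it into two words.
case $line in
("SUBDIRS ="*)
echo "SUBDIRS = $new_subdirs"
in_subdirs=true ;;
(*) printf '%s\n' "$line"
in_subdirs=false ;;
esac
case $line in
(*\\) line_cont=true ;;
esac
done < Makefile > Makefile.new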
mv -f Makefile.new Makefile
(note, completely untested)
I get output from a command that looks like this: 2048,4096,8192,16384,24576,32768.
I want to split it into 6 different files, keeping only the numbers, not the commas, e.g.:
The initial text 2048,4096,8192,16384,24576,32768 should be split into: 2048 to file A, 4096 to file B, 8192 to file C, and so on.
The output follows these rules:
There are always 6 values, separated by commas
The numbers are always 3 to 5 digits long
As I said, the commas don't interest me, because I'm going to do mathematical operations with those numbers
I tried deleting the last X characters, but couldn't find a way to "detect" a comma so that the operation stops there.
Is this possible using sed?
The following requires no commands external to a POSIX-compliant shell (such as busybox ash, which you're most likely to be using on Android):
csv=/system/file.csv
IFS=, read a b c d e f <"$csv"
echo "$a" >A
echo "$b" >B
echo "$c" >C
echo "$d" >D
echo "$e" >E
echo "$f" >F
This does assume that the files to be written (A, B, C, D, E and F) are all inside the current working directory. If you want to write them somewhere else, either amend their names, or use cd to change to that other directory.
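Because read has already removed the commas, the values can go straight into shell arithmetic, which is what the asker wants to do with them. A quick check in a scratch directory, with a made-up file.csv standing in for the command's real output:

```shell
#!/bin/sh
# Sketch: split the six comma-separated values with read, write two of them
# out, and show that they are usable in arithmetic. file.csv is sample data.
cd "$(mktemp -d)" || exit 1
printf '2048,4096,8192,16384,24576,32768\n' > file.csv

IFS=, read -r a b c d e f < file.csv
echo "$a" > A
echo "$f" > F

# The commas are gone, so the values work directly in $(( )).
total=$((a + b + c + d + e + f))
echo "total=$total"   # total=88064
```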
I'm not sure sed is the right tool for that.
With a simple Bash script:
IFS=',' read -ra val < file.csv
for i in "${val[@]}"; do
echo "$i" > file$(( ++j ))
done
It writes each value of your CSV into file1, file2, etc.:
The read command assigns the values from file.csv to the array variable val.
Using a loop, each value is written to its own file.
Just make sure you have write permissions in the current directory. If not, change the redirection (eg: > /dirWithWritePermissions/).
This might work for you (GNU sed, bash and parallel):
parallel --xapply echo {1} ">>" part{2} :::: <(sed 's/,/\n/g' file.csv) ::: {1..6}
This "zips" together two files reusing the shorter file as necessary.
N.B. Remember to remove any part* files before applying this command otherwise those files will grow (>> appends).
declare -a list_of_files=(fileA fileB fileC fileD fileE fileF)
readarray -t a < <(sed 's/,/\n/;{;P;D;}' <<< '2048,4096,8192,16384,24576,32768')
num=0
for i in "${a[@]}"; do
echo "$i" > "${list_of_files[num]}"
num=$((num + 1))
done
Explanation:
s/,/\n/ substitutes the first comma with a newline
{ starts a command group
P prints everything in the pattern space up to the first newline
D deletes everything in the pattern space up to and including the first newline and then restarts the cycle with what is left, without reading a new line; that is how the next comma gets processed
} ends the command group
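A quick check (GNU sed, since \n in the replacement is a GNU extension) that the P/D loop produces one number per line, the same as a plain global substitution:

```shell
#!/usr/bin/env bash
# Sketch: compare the P/D loop with s/,/\n/g on the sample input.
input='2048,4096,8192,16384,24576,32768'
loop_out=$(sed 's/,/\n/;{;P;D;}' <<< "$input")
g_out=$(sed 's/,/\n/g' <<< "$input")
printf '%s\n' "$loop_out"
if [[ $loop_out == "$g_out" ]]; then
  echo "same output"
fi
```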
EDIT:
Let's say you want to copy the information into /system/file but want to have every number in its own row:
$ sed 's/,/\n/;{;P;D;}' < /sys/module/lowmemorykiller/parameters/minfree > /system/file
This will create a new file /system/file that will contain the formatted output.
EDIT: even shorter: sed 's/,/\n/g', which simply replaces every comma with a newline (the g flag at the end makes it replace all commas on the line, not just the first).
Also note that while sed is a nice tool to use (you gotta love it for its confusing language and commands...), the better and faster way is to use bash's built-in read.
Long story short, I'm trying to grep a value contained in the first column of a text file by using a variable.
Here's a sample of the script, with the grep command that doesn't work:
for ii in `cat list.txt`
do
grep '^$ii' >outfile.txt
done
Contents of list.txt :
123,"first product",description,20.456789
456,"second product",description,30.123456
789,"third product",description,40.123456
If I perform grep '^123' list.txt, it produces the correct output... Just the first line of list.txt.
If I try to use the variable (ie grep '^ii' list.txt) I get a "^ii command not found" error. I tried to combine text with the variable to get it to work:
VAR1= "'^"$ii"'"
but the VAR1 variable contained a carriage return after the $ii variable:
'^123
'
I've tried a laundry list of things to remove the CR/LF (e.g. sed and awk), but to no avail. There has to be an easier way to perform the grep command using the variable. I would prefer to stay with the grep command because it works perfectly when performing it manually.
You have mixed things up in the command grep '^ii' list.txt. The character ^ anchors the match at the beginning of the line, while a $ is needed to get the value of a variable.
To grep for the value of ii (here 123) at the beginning of a line, use
ii="123"
grep "^$ii" list.txt
(You should use double quotes here: inside single quotes the shell does not expand $ii.)
This is a good moment to learn good habits: keep using lowercase variable names (well done), and use curly braces (they don't hurt, and they are needed in other cases):
ii="123"
grep "^${ii}" list.txt
Now we are both forgetting something: our grep will also match 1234,"4-digit product",description,11.1111. Include a , in the grep:
ii="123"
grep "^${ii}," list.txt
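To see the difference the comma makes, here is a sketch on a made-up list.txt that includes the 4-digit product:

```shell
#!/usr/bin/env bash
# Sketch: compare "^${ii}" with "^${ii}," on sample data.
list=$(mktemp)
cat > "$list" <<'EOF'
123,"first product",description,20.456789
456,"second product",description,30.123456
1234,"4-digit product",description,11.1111
EOF

ii="123"
without_comma=$(grep -c "^${ii}" "$list")   # counts both 123 and 1234
with_comma=$(grep -c "^${ii}," "$list")     # counts only 123
echo "without comma: $without_comma, with comma: $with_comma"
```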
And how did you get the "^ii command not found" error? I think you used backquotes (the old syntax for command substitution; better is $(...), as in echo "example: $(date)") and wrote
grep `^ii` list.txt # wrong !
#!/bin/sh
# Read every character before the first comma into the variable ii.
while IFS=, read -r ii rest; do
# Echo the value of ii. If these values are what you want, you're done; no
# need for grep.
echo "ii = $ii"
# If you want to find something associated with these values in another
# file, however, you can grep the file for the values. Use double quotes so
# that the value of $ii is substituted in the argument to grep, and append
# (>>) so that each iteration doesn't overwrite the previous results.
grep "^$ii" some_other_file.txt >>outfile.txt
done <list.txt
I'm new to Bash scripting and I'm having a bit of a hard time. I'm trying to alter the configuration values of a config file. If it finds an existing value I want it to update it, but if it doesn't exist I want it to append it. This is as far as I got from various tutorials and snippets online:
# FUNCTION TO MODIFY CONFIG BY APPEND OR REPLACE
# $1 File
# $2 Find
# $3 Replace / Append
function replaceappend() {
grep -q '^$2' $1
sed -i 's/^$2.*/$3/' $1
echo '$3' >> $1
}
replaceappend "/etc/test.conf" "Port 20" "Port 10"
However, as you might imagine, this doesn't work. The problem seems to be the logic: I'm not sure how to capture the result of grep in order to choose between sed and echo.
Just use the return value of the command and use double-quotes instead of single quotes:
if ! sed -i "/$2/{s//$3/;h};"'${x;/./{x;q0};x;q1}' "$1"
then
echo "$3" >> "$1"
fi
SOURCE: Return code of sed for no match for the q command
This is treading outside my normal use of sed, so let me give an explanation of how this works, as I understand it:
sed "/$2/{s//$3/;h};"'${x;/./{x;q0};x;q1}' $1
The first /$2/ is an address: we will do the commands within {...} for any line that matches it. As a by-product, it also becomes the "last regular expression", which the empty regex in s// reuses.
The command {s//$3/;h} says to substitute the text matched by that regex with $3 and then copy the pattern space into the "hold space", a buffer within sed that persists across lines.
The $ after the single quote is another address: it says to do the next command on the LAST line.
The command {x;/./{x;q0};x;q1} says:
x = swap the hold space and the pattern space
/./ = an address which matches any non-empty pattern space; after the swap, that means the hold space was non-empty, i.e. at least one line matched
{x;q0} = swap back (so the last line is printed) and exit with status 0 (success); q with an exit code is a GNU sed extension
x;q1 = otherwise (nothing matched), swap back and exit with status 1 (failure)
The double quotes around the first part allow substitution for $2 and $3. The single quotes around the latter part stop the shell from trying to expand ${x;...}, where the $ is sed's last-line address.
A bit complicated, but it seems to work AS LONG AS YOU HAVE SOMETHING IN THE FILE. An empty file has no last line, so the $ commands never run, sed exits with status 0, and nothing is appended.
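A sketch that wraps the sed command in the asker's replaceappend function and exercises both paths. It assumes GNU sed (q with an exit code, -i without a suffix), and the test.conf contents are sample data. Since $2 and $3 end up inside a sed regex and replacement, they must not contain / or regex metacharacters.

```shell
#!/usr/bin/env bash
# Sketch: sed-only replace-or-append, using sed's exit status.
replaceappend() {
  # $1 file, $2 text to find, $3 replacement / line to append
  if ! sed -i "/$2/{s//$3/;h};"'${x;/./{x;q0};x;q1}' "$1"; then
    echo "$3" >> "$1"
  fi
}

conf=$(mktemp)
printf '%s\n' 'Port 20' 'Foo yes' > "$conf"

replaceappend "$conf" "Port 20" "Port 10"   # found: replaced in place
replaceappend "$conf" "Bar 1" "Bar 2"       # not found: sed exits 1, line appended
cat "$conf"
```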
To be honest, after all this complication... unless the files you are working with are so long that a double pass would be really bad, I would probably go back to the grep solution, like this:
if grep -q "^$2" "$1"
then
sed -i "s/^$2.*$/$3/" "$1"
else
echo "$3" >>"$1"
fi
That's a WHOLE lot easier to understand and maintain later...
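The grep approach as a complete function, in a minimal sketch (GNU sed -i assumed; test.conf contents are sample data; as before, the arguments must not contain / or regex metacharacters):

```shell
#!/usr/bin/env bash
# Sketch: grep decides whether to replace with sed or append with echo.
replaceappend() {
  local file=$1 find=$2 replace=$3
  if grep -q "^${find}" "$file"; then
    sed -i "s/^${find}.*\$/${replace}/" "$file"
  else
    echo "$replace" >> "$file"
  fi
}

conf=$(mktemp)
printf '%s\n' 'Port 20' 'Foo yes' > "$conf"
replaceappend "$conf" "Port 20" "Port 10"
replaceappend "$conf" "Bar 1" "Bar 2"
cat "$conf"
```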