Deleting first n rows and column x from multiple files using Bash script - bash

I am aware that the "deleting n rows" and "deleting column x" questions have both been answered individually before. My current problem is that I'm writing my first bash script, and am having trouble making that script work the way I want it to.
file0001.csv (there are several hundred files like these in one folder)
Data number of lines 540
No.,Profile,Unit
1,1027.84,µm
2,1027.92,µm
3,1028,µm
4,1028.81,µm
Desired output
1,1027.84
2,1027.92
3,1028
4,1028.81
I am able to use sed and cut individually but for some reason the following bash script doesn't take cut into account. It also gives me an error "sed: can't read ls: No such file or directory", yet sed is successful and the output is saved to the original files.
sem2csv.sh
for files in 'ls *.csv' #list of all .csv files
do
sed '1,2d' -i $files | cut -f '1-2' -d ','
done
Actual output:
1,1027.84,µm
2,1027.92,µm
3,1028,µm
4,1028.81,µm
I know there may be awk one-liners but I would really like to understand why this particular bash script isn't running as intended. What am I missing?

The -i option of sed modifies the file in place. Your pipeline to cut receives no input because sed -i produces no output. Without this option, sed would write the results to standard output, instead of back to the file, and then your pipeline would work; but then you would have to take care of writing the results back to the original file yourself.
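A minimal sketch of the "write the results back yourself" route, assuming a scratch directory and a made-up sample file (the `workdir` and `.tmp` names are illustrative, not from the question):

```shell
# Scratch directory with one hypothetical sample file
workdir=$(mktemp -d)
printf '%s\n' 'Data number of lines 540' 'No.,Profile,Unit' \
    '1,1027.84,µm' '2,1027.92,µm' > "$workdir/file0001.csv"

for f in "$workdir"/*.csv; do
    # No -i here: sed prints to stdout, so cut has something to read.
    # The result goes to a temporary file, which replaces the original
    # only if cut exited successfully.
    sed '1,2d' "$f" | cut -d ',' -f 1-2 > "$f.tmp" && mv "$f.tmp" "$f"
done

result=$(cat "$workdir/file0001.csv")
printf '%s\n' "$result"
```

Redirecting straight back to "$f" would not work, because the shell truncates the file before sed gets to read it; hence the temporary file.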
Moreover, single quotes inhibit expansion -- you are "looping" over the single literal string ls *.csv. The fact that you are not quoting it properly then causes the string to be subject to wildcard expansion inside the loop. So after variable interpolation, your sed command expands to
sed -i 1,2d ls *.csv
and then the shell expands *.csv because it is not quoted. (You should have been receiving a warning that there is no file named ls in the current directory, too.) You probably attempted to copy an example which used backticks (ASCII 96) instead of single quotes (ASCII 39) -- the difference is quite significant.
Anyway, the ls is useless -- the proper idiom is
for files in *.csv; do
sed '1,2d' "$files" ... # the double quotes here are important
done
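To see the quoting difference in isolation, here is a small sketch that counts loop iterations in a scratch directory (the file names and counter variables are made up for illustration):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.csv b.csv

# Single quotes: one iteration over the literal string
quoted_count=0
for item in 'ls *.csv'; do
    quoted_count=$((quoted_count + 1))
    quoted_item=$item
done

# Bare glob: one iteration per matching file
glob_count=0
for item in *.csv; do
    glob_count=$((glob_count + 1))
done

printf 'quoted: %d iteration(s), value "%s"\n' "$quoted_count" "$quoted_item"
printf 'glob:   %d iteration(s)\n' "$glob_count"
```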
Mixing sed and cut is usually not a good idea because you can express anything cut can do in terms of a simple sed script. So your entire script could be
for f in *.csv; do
sed -i -e '1,2d' -e 's/,[^,]*$//' "$f"
done
which says to remove the last comma and everything after it. (If your sed does not like multiple -e options, try with a semicolon separator: sed -i '1,2d;s/,[^,]*$//' "$f")
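With several hundred files it may be worth keeping backups on the first run. A sketch of the same loop using a suffix on -i, assuming a scratch directory and sample data (the `.bak` suffix is an arbitrary choice):

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Data number of lines 540' 'No.,Profile,Unit' \
    '1,1027.84,µm' '2,1027.92,µm' > "$workdir/file0001.csv"

for f in "$workdir"/*.csv; do
    # -i.bak edits in place and keeps the original as file0001.csv.bak
    sed -i.bak -e '1,2d' -e 's/,[^,]*$//' "$f"
done

edited=$(cat "$workdir/file0001.csv")
backup_lines=$(wc -l < "$workdir/file0001.csv.bak")
printf '%s\n' "$edited"
```

Once the edited files look right, the `.bak` copies can be deleted.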

You may use awk,
$ awk 'NR>2{sub(/,[^,]*$/,"",$0);print}' file
1,1027.84
2,1027.92
3,1028
4,1028.81
or
sed -i '1,2d;s/,[^,]*$//' file
1,2d; for deleting the first two lines.
s/,[^,]*$// removes the last comma part in remaining lines.

Related

How to remove characters in filename up to and including second underscore

I've been looking around for a while on this and can't seem to find a solution on how to use sed to do this. I have a file that is named:
FILE_772829345_D594_242_25kd_kljd.mov
that I want to be renamed
D594_242_25kd_kljd.mov
I have been trying to get sed to do this, but have only been able to remove the first section of the filename:
echo 'FILE_772829345_D594_242_25kd_kljd.mov' | sed 's/[^_]*//'
_772829345_D594_242_25kd_kljd.mov
How would I get sed to do the same instruction again, up to the second underscore?
If the filename is in a shell variable, you don't even need to use sed, just use a shell expansion with # to trim through the second underscore:
filename="FILE_772829345_D594_242_25kd_kljd.mov"
echo "${filename#*_*_}" # prints "D594_242_25kd_kljd.mov"
BTW, if you're going to use mv to rename the file, use its -i option to avoid files being overwritten if there are any name conflicts:
mv -i "$filename" "${filename#*_*_}"
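Applied to a whole directory, the same expansion might look like the sketch below. The file names are invented, and the skip test for names with fewer than two underscores is an added precaution rather than part of the answer:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch FILE_772829345_D594_242_25kd_kljd.mov FILE_1_other_clip.mov plain.mov

for f in *.mov; do
    new=${f#*_*_}                  # strip through the second underscore
    [ "$new" = "$f" ] && continue  # fewer than two underscores: leave it alone
    mv -i "$f" "$new"
done

ls
```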
If all your files are named similarly, you can use cut which would be a lot simpler than sed with a regex:
cut -f3- -d_ <<< "FILE_772829345_D594_242_25kd_kljd.mov"
Output:
D594_242_25kd_kljd.mov

Iterating through files in a folder with sed

I have a list of CSV files and would like to use a for loop to edit the content of each file with sed. I have this sed command, which works fine when I test it on one file:
sed 's/[ "-]//g'
So now I want to execute this command for each file in a folder. I've tried this but so far no luck:
for i in *.csv; do sed 's/[ "-]//g' > $i.csv; done
I would like it to overwrite each file with the result of the sed edit. The sed command removes all spaces, double quotes (") and hyphens (-).
Small changes,
for i in *.csv
do
sed -i 's/[ "-]//g' "$i"
done
Changes:
When you iterate with for, $i holds each filename in turn (one.csv, two.csv, etc.), so you can pass it directly as the input file to sed.
-i is for in-place editing: sed performs the substitution and updates the file for you, so no output redirection is required.
In the code you wrote, sed was never given an input file, so it reads from standard input instead; the redirection also targets $i.csv, which would create one.csv.csv rather than overwrite one.csv.
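A quick way to convince yourself that the quoted "$i" matters is to try the loop on a name containing a space. This sketch assumes GNU sed and uses a made-up file in a scratch directory:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'a "b" - c' > 'my file.csv'   # hypothetical name containing a space

for i in *.csv; do
    sed -i 's/[ "-]//g' "$i"    # quoted: the name stays a single argument
done

cleaned=$(cat 'my file.csv')
printf '%s\n' "$cleaned"
```

Without the double quotes, sed would be asked to edit two files named `my` and `file.csv`.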
In my case I wanted to replace the first occurrence of a particular string on each line in several text files, so I used the following:
# replace the first 16 on each line with 1, in every .txt file
sed -i 's/16/1/' *.txt
In your case, you can try this in a terminal:
sed -i 's/[ "-]//g' *.csv
In certain scenarios it might be worth finding the files and executing a command on them, as explained in this answer (as stated there, make sure the output of echo $PATH doesn't contain .)
find /path/to/csv/ -type f -name '*.csv' -execdir sed -i 's/[ "-]//g' {} \;
here we:
find all files (type f) which end with .csv in the folder /path/to/csv/
sed the found files in place, ie we replace the original files with the changed version instead of creating numbered csv files ($i.csv)
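The find approach can be tried safely on a throwaway tree first. This sketch assumes GNU find and GNU sed; the directory layout is invented, and it uses `-exec ... {} +` (one sed process for many files) instead of `-execdir`, which is a swapped-in variation:

```shell
root=$(mktemp -d)    # stands in for /path/to/csv/
mkdir -p "$root/sub"
printf '%s\n' '"1 - 2"' > "$root/top.csv"
printf '%s\n' '"3 - 4"' > "$root/sub/nested.csv"
printf '%s\n' '"5 - 6"' > "$root/notes.txt"

# -name restricts the match to *.csv; {} + hands many files to one sed process
find "$root" -type f -name '*.csv' -exec sed -i 's/[ "-]//g' {} +

top=$(cat "$root/top.csv")
nested=$(cat "$root/sub/nested.csv")
untouched=$(cat "$root/notes.txt")
printf '%s\n' "$top" "$nested" "$untouched"
```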

match multiple conditions with GNU sed

I'm using sed to replace values in other bash scripts, such as:
somedata="$(<somefile.sh)"
somedata=`sed 's/ ==/==/g' <<< $somedata` # [space]== becomes ==
somedata=`sed 's/== /==/g' <<< $somedata` # ==[space] becomes ==
The same for ||, &&, !=, etc. I think steps should be reduced with the right regex match. The operator does not need surrounding spaces, but may have a space before and after, only before, or only after. Is there a way to handle all of these with one sed command?
There are many other conditions not mentioned also. The script takes more time to execute than desired.
The goal is to reduce the overall execution time so I am hoping to reduce the number of commands used with clever regex to match multiple conditions.
I'm also considering tr, awk or perl - whichever is fastest?
With GNU sed, you can use the | (or) operator:
$ sed -r 's/ *(&&|\|\|) */\1/g' <<< "foo && bar || baz"
foo&&bar||baz
 *(&&|\|\|) *: matches zero or more spaces, followed by any of the |-separated strings, followed by zero or more spaces
the matched operator is captured by the group and written back with the backreference \1, which drops the surrounding spaces
Edit:
As pointed out in comments, you can use the -E flag with GNU sed in place of -r. Your command will be more portable:
sed -E 's/ *(\&\&|\|\|) */\1/g'
As GNU sed also supports \| alternation operator with Basic Regular Expressions, you can use it for better readability:
sed 's/ *\(&&\|||\) */\1/g'
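The alternation extends naturally to the other operators mentioned in the question. A sketch, assuming a sed that supports -E; the `squeeze_ops` function name and the operator list are illustrative only:

```shell
# Hypothetical operator list; extend the alternation as needed
squeeze_ops() {
    sed -E 's/ *(&&|\|\||==|!=) */\1/g'
}

squeezed=$(printf '%s\n' 'a == b && c != d || e' | squeeze_ops)
printf '%s\n' "$squeezed"
```

Note that this rewrites every match, including operators that happen to appear inside quoted strings or comments in the script being processed.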
You can chain multiple sed substitutions with the -e flag:
$ echo -n "test data here" | sed -e 's/test/TEST/' \
-e 's/data/HERE/' \
-e 's/here/DATA/'
TEST HERE DATA
You can use a sed script file (-f option) together with the -i option (edit in place, no need to store the content in a variable):
sed -i -f mysedfile somefile.sh
mysedfile may contain expressions, one per line:
s/ *&& */\&\&/g
s/ *== */==/g
(Or use the -e option to pass several expressions, but if you have a lot of them it will quickly become unreadable.)
BTW: the -i option creates a temporary file in the same directory as the processed file; if the operation succeeds, the original file is deleted and the temporary file is renamed to the original file name. As the GNU sed manual puts it:
When the end of the file is reached, the temporary file is renamed
to the output file's original name. The extension, if supplied,
is used to modify the name of the old file before renaming the
temporary file, thereby making a backup copy.
So there's no extra I/O overhead with that option, and no need at all to store the content in a variable.
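An end-to-end sketch of the script-file approach, assuming GNU sed and a scratch directory; the one-line content of somefile.sh is made up for the demonstration:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Quoted 'EOF' keeps the backslashes exactly as typed
cat > mysedfile <<'EOF'
s/ *&& */\&\&/g
s/ *== */==/g
EOF

printf '%s\n' 'if [ a == b ] && [ c == d ]' > somefile.sh
sed -i -f mysedfile somefile.sh

squeezed=$(cat somefile.sh)
printf '%s\n' "$squeezed"
```

The `\&` in the replacement is needed because a bare `&` there means "the whole match".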

sed bash substitution only if variable has a value

I'm trying to find a way using variables and sed to do a specific text substitution using a changing input file, but only if there is a value given to replace the existing string with. No value= do nothing (rather than remove the existing string).
Example
Substitute.csv contains 5 lines, the third of which is empty:
this-has-text
this-has-text
this-has-text
this-has-text
and file.txt has one sentence:
"When trying this I want to be sure that text-this-has is left alone."
If I run the following command in a shell script
Text='text-this-has'
Change=`sed -n '3p' substitute.csv`
grep -rl $Text /home/username/file.txt | xargs sed -i "s|$Text|$Change|"
I end up with
"When trying this I want to be sure that is left alone."
But I'd like it to remain as
"When trying this I want to be sure that text-this-has is left alone."
Any way to tell sed "If I give you nothing new, do nothing"?
I apologize for the overthinking, bad habit. Essentially what I'd like to accomplish is if line 3 of the csv file has a value - replace $Text with $Change inline. If the line is empty, leave $Text as $Text.
Text='text-this-has'
Change=$(sed -n '3p' substitute.csv)
if [[ -n $Change ]]; then
grep -rl $Text /home/username/file.txt | xargs sed -i "s|$Text|$Change|"
fi
Just keep it simple and use awk:
awk -v t="$Text" -v c="$Change" 'c!=""{sub(t,c)} {print}' file
If you need inplace editing just use GNU awk with -i inplace.
Given your clarified requirement, this is probably what you actually want:
awk -v t="$Text" 'NR==FNR{if (NR==3) c=$0; next} c!=""{sub(t,c)} {print}' Substitute.csv file.txt
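To check both cases of that awk command, here is a sketch run against throwaway copies of the two files (the second Substitute.csv, with `x` filler lines, is invented for the test):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
sentence='When trying this I want to be sure that text-this-has is left alone.'
printf '%s\n' "$sentence" > file.txt
Text='text-this-has'

# Line 3 is empty: the sentence should pass through unchanged
printf '%s\n' 'this-has-text' 'this-has-text' '' 'this-has-text' 'this-has-text' > Substitute.csv
unchanged=$(awk -v t="$Text" 'NR==FNR{if (NR==3) c=$0; next} c!=""{sub(t,c)} {print}' Substitute.csv file.txt)

# Line 3 has a value: the first match on each line is replaced
printf '%s\n' 'x' 'x' 'this-has-text' 'x' 'x' > Substitute.csv
changed=$(awk -v t="$Text" 'NR==FNR{if (NR==3) c=$0; next} c!=""{sub(t,c)} {print}' Substitute.csv file.txt)

printf '%s\n' "$unchanged" "$changed"
```

Keep in mind that sub() treats `t` as a regular expression, so a search string containing characters such as `.` or `*` would need escaping.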
Testing whether $Change has a value before launching into the grep and sed is undoubtedly the most efficient bash solution, although I'm a bit skeptical about the duplication of grep and sed; it saves a temporary file in the case of files which don't contain the target string, but at the cost of an extra scan up to the match in the case of files which do contain it.
If you're looking for typing efficiency, though, the following might be interesting:
find . -name '*.txt' -exec sed -i "s|$Text|${Change:-&}|" {} \;
This recursively finds all files whose names end with the extension .txt and executes the sed command on each one. ${Change:-&} means "the value of $Change if it exists and is non-empty, and otherwise an &"; & in the replacement of a sed s command means "the matched text", so s|foo|&| replaces the first occurrence of foo on each line with itself. That's an expensive no-op, but if your time matters more than CPU time, it might be worth it.
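The effect of the `${Change:-&}` fallback can be seen without touching any files. A small sketch on a single line of input (the `kept` and `swapped` variable names are illustrative):

```shell
Text='text-this-has'
line='I want to be sure that text-this-has is left alone.'

# Empty Change: the replacement becomes "&", i.e. the matched text itself
Change=''
kept=$(printf '%s\n' "$line" | sed "s|$Text|${Change:-&}|")

# Non-empty Change: an ordinary substitution
Change='this-has-text'
swapped=$(printf '%s\n' "$line" | sed "s|$Text|${Change:-&}|")

printf '%s\n' "$kept" "$swapped"
```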

using sed to find and replace in bash for loop

I have a large number of words in a text file to replace.
This script is working up until the sed command where I get:
sed: 1: "*.js": invalid command code *
PS... Bash isn't one of my strong points - this doesn't need to be pretty or efficient
cd '/Users/xxxxxx/Sites/xxxxxx'
echo `pwd`;
for line in `cat myFile.txt`
do
export IFS=":"
i=0
list=()
for word in $line; do
list[$i]=$word
i=$[i+1]
done
echo ${list[0]}
echo ${list[1]}
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
done
You're running BSD sed (under OS X), therefore the -i flag requires an argument specifying what you want the suffix to be.
Also, no files match the glob *.js.
This looks like a simple typo:
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
Should be:
sed -i "s/${list[0]}/${list[1]}/g" *.js
(just like the echo lines above)
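Printing both strings shows what sed actually receives in each case. A sketch for bash (arrays are not available in plain sh), with made-up array contents:

```shell
list=(from to)

wrong="s/{$list[0]}/{$list[1]}/g"     # $list is element 0; "[0]}" stays literal
right="s/${list[0]}/${list[1]}/g"     # braces around the whole subscripted name

printf 'wrong: %s\nright: %s\n' "$wrong" "$right"
```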
So myFile.txt contains a list of from:to substitutions, and you are looping over each of those. Why don't you create a sed script from this file instead?
cd '/Users/xxxxxx/Sites/xxxxxx'
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt |
# Output from first sed script is a sed script!
# It contains substitutions like this:
# s:from:to:
# s:other:substitute:
sed -f - -i~ *.js
Your sed might not like the -f - which means sed should read its script from standard input. If that is the case, perhaps you can create a temporary script like this instead;
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt >script.sed
sed -f script.sed -i~ *.js
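Here is the temporary-script variant run end to end in a scratch directory, assuming GNU sed. The from:to pairs and a.js are invented, and the generated commands get a `g` flag appended, which is a small variation on the commands above:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'foo:bar' 'baz:qux' > myFile.txt      # hypothetical from:to pairs
printf '%s\n' 'foo baz foo' > a.js

# Appending ":g" instead of ":" makes each generated substitution global
sed -e 's/^/s:/' -e 's/$/:g/' myFile.txt > script.sed
sed -f script.sed -i~ *.js

generated=$(cat script.sed)
result=$(cat a.js)
printf '%s\n' "$generated" "$result"
```

The original content survives in a.js~ because of the `~` suffix given to -i.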
Another approach, if you don't feel very confident with sed and think you'll forget in a week what those voodoo symbols mean, is to use IFS more effectively:
while IFS=":" read -r PATTERN REPLACEMENT # read each line and split it into fields separated by ":"
do
sed -i~ "s/${PATTERN}/${REPLACEMENT}/g" *.js # -i~ keeps a backup and works with both GNU and BSD sed
done < myFile.txt
The only pitfall I can see (there may be more) is that if either PATTERN or REPLACEMENT contains a slash (/), it will break your sed expression.
You can change the sed separator to a non-printable character and you should be safe.
Anyway, if you know what's in your myFile.txt, you can just pick any separator that doesn't appear in it.
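A sketch of the non-printable separator idea, assuming bash (for the `$'\001'` quoting) and GNU sed; the sample values and the `sep` variable are made up:

```shell
PATTERN='a/b'           # hypothetical values containing slashes
REPLACEMENT='c/d'
sep=$'\001'             # Ctrl-A, unlikely to occur in ordinary text

replaced=$(printf '%s\n' 'x a/b y' | sed "s${sep}${PATTERN}${sep}${REPLACEMENT}${sep}g")
printf '%s\n' "$replaced"
```

This only sidesteps the separator clash; regex metacharacters in PATTERN, and `&` or backslashes in REPLACEMENT, are still interpreted by sed.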
