Removing n columns from all files in a directory in Unix - shell

I have around 2000 files, each with a varying number of columns.
I want to remove the last 4 columns from each file.
I tried the command below, but it writes to a new file rather than editing in place. The delimiter of the files is #:
awk -F"#" '{NF-=4;OFS="#";print}' test > testing.csv
I want to save each file under its original name (e.g. filename test stays test).
How do I remove the last 4 columns and save each file under the same name?
Can someone please help?

If you have a recent version of GNU awk, you could try the following (note FS and OFS are set to the # delimiter from the question):
gawk -i inplace -v INPLACE_SUFFIX=.bak 'BEGIN{FS=OFS="#"} {NF-=4} 1' *.csv
This also keeps a .bak backup of each input file.
The above is the safe option, since it keeps a backup of each Input_file. If you are happy with the result and DO NOT want backup files, you can simply run the following:
gawk -i inplace 'BEGIN{FS=OFS="#"} {NF-=4} 1' *.csv
NOTE: With GNU awk version 5+ you could use inplace::suffix='.bak' instead, as per @Sundeep's comment here.

You really, really, really do not want to edit the files "in-place". It is (almost) always the wrong thing to do. For something like this, you want to do something like:
$ rm -rf new-dir/
$ mkdir new-dir
$ for file in old-dir/*; do
f=${file#old-dir/};
awk '{NF-=4; $1=$1; print}' FS=# OFS=# "$file" > new-dir/"$f"; done
Then, after you know things have worked, you can replace your original directory with the new one.
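The swap itself can be sketched like this, using the same hypothetical directory names as above:

```shell
# Keep the originals under a backup name, then promote the new files.
mv old-dir old-dir.bak
mv new-dir old-dir
# Once everything checks out, remove the backup:
# rm -rf old-dir.bak
```

This way there is never a moment where a file exists only in a half-written state.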

Using any POSIX awk:
tmp=$(mktemp) || exit 1
for file in *; do
awk '{sub(/(#[^#]*){4}$/,"")}1' "$file" > "$tmp" &&
mv -- "$tmp" "$file"
done

Related

How to rename a CSV file from a value in the CSV file

I have 100 1-line CSV files. The files are currently labeled AAA.txt, AAB.txt, ABB.txt (after I used split -l 1 on them). The first field in each of these files is what I want to rename the file as, so instead of AAA, AAB and ABB it would be the first value.
Input CSV (filename AAA.txt)
1234ABC, stuff, stuff
Desired Output (filename 1234ABC.csv)
1234ABC, stuff, stuff
I don't want to edit the content of the CSV itself, just change the filename
something like this should work:
for f in ./*; do new_name=$(head -1 "$f" | cut -d, -f1); cp "$f" dir/"$new_name".csv; done
This copies them into a new dir, just in case something goes wrong or you need the original file names.
Starting with your original file before splitting, you can do it all in one shot:
$ awk -F, '{print > ($1".csv")}' originalFile.csv
This stores the whole input file in a .csv named after its first column:
awk -F, '{print $0 > $1".csv" }' aaa.txt
In a terminal, change directory (e.g. cd /path/to/directory the files are in) and then use the following compound command:
for f in *.txt; do echo mv -n "$f" "$(awk -F, '{print $1}' "$f").csv"; done
Note: There is an intentional echo command there for you to test with; it will only print out each mv command so you can see that the outcome is what you want. You can then run it again with just echo removed from the compound command to actually rename the files via mv.

awk: getting all of a line but the last field, keeping the delimiters

I have to make a one-liner that renames all files in the current directory
that end in ".hola" to ".txt".
For example:
sample.hola and name.hi.hola will be renamed to sample.txt and name.hi.txt respectively
I was thinking about something like:
ls -1 *.hola | awk '{NF="";print "$0.hola $0.txt"}' (*)
And then passing the stdin to xargs mv -T with a |
But the output of (*) for the example would be sample and name hi.
How do I get the output name.hi for name.hi.hola using awk?
Why would you want to involve awk in this?
$ for f in *.hola; do echo mv "$f" "${f%hola}txt"; done
mv name.hi.hola name.hi.txt
mv sample.hola sample.txt
Remove the echo when you're happy with the output.
Well, for your specific problem, I recommend the rename command. Depending on the version on your system, you can do either rename -s .hola .txt *.hola, or rename 's/\.hola$/.txt/' *.hola.
Also, you shouldn't use ls to get filenames. When you run ls *.hola, the shell expands *.hola to a list of all the filenames matching that pattern, and ls is just a glorified echo at that point. You can get the same result using e.g. printf '%s\n' *.hola without running any program outside the shell.
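A minimal sketch of that point, assuming a directory containing sample.hola and name.hi.hola:

```shell
# The shell expands the glob before printf ever runs; printf never
# touches the filesystem, it just prints each argument on its own line.
printf '%s\n' *.hola
```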
And your awk is missing any attempt to remove the .hola. If you have GNU awk, you can do something like this:
awk -F. -v OFS=. '{old=$0; NF-=1; new=$0".txt"; print old" "new}'
That won't work on BSD/MacOS awk. In that case you can do something like this:
awk -F. '{
old=$0; new=$1;
for (i=2;i<NF;++i) { new=new"."$i };
print old" "new".txt"; }'
Either way, I'm sure @EdMorton probably has a better awk-based solution.
How about this? Simple and straightforward:
for file in *.hola; do mv "$file" "${file/%hola/txt}"; done
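For reference, the /% in that expansion anchors the pattern to the end of the string, so only a trailing hola is replaced:

```shell
file=name.hi.hola
echo "${file/%hola/txt}"   # prints name.hi.txt
```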

Bash Shell Scripting assigning new variables for output of a grep search

EDIT 2:
I've decided to re-write this in order to better portray my outcome.
I'm currently using this code to output a list of files within various directories:
for file in /directoryX/*.txt
do
grep -rl "Annual Compensation" $file
done
The output shows all files that have a certain table I'm trying to extract in a layout like this:
txtfile1.txt
txtfile2.txt
txtfile3.txt
I have been using this awk command on each individual .txt file to extract the table and then send it to a .csv:
awk '/Annual Compensation/{f=1} f{print; if (/<\/TABLE>/) exit}' txtfile1.txt > txtfile1.csv
My goal is to find a command that will run my awk command against each file in the list all at once. Thank you to those that have provided suggestions already.
If I understand what you're asking, I think what you want to do is add a line after the grep, or instead of the grep, that says:
awk '/Annual Compensation/{f=1} f{print; if (/<\/TABLE>/) exit}' $file > ${file}_new.csv
When you say ${file}_new.csv, it expands the file variable, then adds the string "_new.csv" to it. That's what you're shooting for, right?
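A quick illustration of that expansion, using a hypothetical filename:

```shell
file=txtfile1.txt
echo "${file}_new.csv"        # prints txtfile1.txt_new.csv
# To drop the old extension instead, strip it first:
echo "${file%.txt}_new.csv"   # prints txtfile1_new.csv
```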
Modifying your code:
for file in /directoryX/*.txt
do
files+=($(grep -rl "Annual Compensation" "$file"))
done
for f in "${files[@]}"; do
awk '/Annual Compensation/{f=1} f{print; if (/<\/TABLE>/) exit}' "$f" > "$f"_new.csv
done
Alternative code:
files+=($(grep -rl "Annual Compensation" /directoryX/*))
for f in "${files[@]}"; do
awk '/Annual Compensation/{f=1} f{print; if (/<\/TABLE>/) exit}' "$f" > "$f"_new.csv
done
In both cases, I have not verified the grep and awk results - it is just a copy-paste of your code.

how to remove <Technology> and </Technology> words from a file using shell script?

My text file contains 100 lines, and it definitely contains the words Technology and /Technology. I want to remove the words Technology and /Technology from the file using shell scripting.
sed -i.bak -e 's#/Technology##g' -e 's#Technology##g' my_text_file
This deletes the words and also makes a backup of the original file, just in case.
sed -i -e 's#/Technology##g' -e 's#Technology##g' my_text_file
This will not make a backup but just modify the original file
You can try this one (note the capital T, to match the case used in the file):
sed -r 's/<\/?Technology>//g' file
Here is an awk approach:
cat file
This Technology
More here
Mine /Technology must go
awk '{gsub(/\/*Technology/,"")}1' file
This
More here
Mine must go
By adding a trailing space to the regex, it will not leave an extra space in the output (though a Technology at the end of a line, with no space after it, is then left alone):
awk '{gsub(/\/*Technology /,"")}1' file
This Technology
More here
Mine must go
To write back to original file
awk '{gsub(/\/*Technology /,"")}1' file > tmp && mv tmp file
If you have GNU awk 4.1+ you can edit the file in place:
gawk -i inplace '{gsub(/\/*Technology /,"")}1' file

bash script to add current date in each record as first column

How do I add the current date to each record as the first column?
Input file:
12345|Test1
67890|Test2
Expected output file:
2014-04-26|12345|Test1
2014-04-26|67890|Test2
Thanks,
sed -e "s,^,$(date +'%Y-%m-%d')|," file
(Note %m for the month; %M would give the minute.) If you use Linux (more specifically, GNU sed), then you may use in-place editing with the -i flag:
sed -i -e "s,^,$(date +'%Y-%m-%d')|," file
Otherwise you have to store the result in a temporary file and then rename it over the original.
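That temporary-file version might look like this (a sketch for systems whose sed lacks -i; file is the same hypothetical filename used above, and %m is the month):

```shell
tmp=$(mktemp) || exit 1
sed -e "s,^,$(date +'%Y-%m-%d')|," file > "$tmp" &&
mv -- "$tmp" file
```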
You could use awk
awk -v OFS='|' -v cdate="$(date '+%Y-%m-%d')" '{print cdate, $0}' file
You can use sed for example:
sed -i "/^$/ !s/^/`date +"%Y-%m-%d"`|/" data_file
If you want to edit the file, why not use ed, the standard editor? Common versions of ed will support the following:
printf '%s\n' "$(date '+%%s/^/%Y-%m-%d|/')" wq | ed -s file
(this will edit the file in place, so make sure you have appropriate backups if you want to revert the changes).
