awk parse filename and add result to the end of each line - bash

I have a number of files with similar names, like:
DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out
DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out
DWH_Export_AUSTA_20120801_20120831_v1_1.csv.397.dat.2012-10-02 04-04-16.out
etc.
I need to get the number before .csv (1 or 2) from the filename and append it to the end of every line in that file, separated by a TAB.
I have written the code below. It finds the number I need, but I do not know how to write it into the file. Also, there are spaces in the filenames, and my script breaks because of them.
I am also not sure how to pass a list of files to the script; right now I am working with only one file.
My code:
#!/bin/sh
string="DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out"
out=$(echo $string | awk 'BEGIN {FS="_"};{print substr ($7,0,1)}')
awk ' { print $0"\t$out" } ' $string

for file in *.out
do
    sfx=$(echo "$file" | sed 's/.*_\(.*\)\.csv.*/\1/')
    sed -i "s/$/\t$sfx/" "$file"
done
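The same extraction can be done without spawning echo and sed for every file. The sketch below assumes bash and that every name contains _<number>.csv; the function name csv_suffix is made up for illustration. Because the filename is only ever expanded inside double quotes, the spaces in these names do no harm.

```shell
#!/bin/bash
# Sketch: get the number before ".csv" using parameter expansion only.
# Assumption: every name looks like ..._<number>.csv...
# csv_suffix is a hypothetical helper name.
csv_suffix() {
    local name=${1%%.csv*}        # drop ".csv" and everything after it
    printf '%s\n' "${name##*_}"   # keep only what follows the last "_"
}

# usage (GNU sed -i):
# for file in *.out; do sed -i "s/\$/\t$(csv_suffix "$file")/" "$file"; done
```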

Using sed:
$ sed 's/.*_\(.*\).csv.*/&\t\1/' file
DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out 1
DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out 2
DWH_Export_AUSTA_20120801_20120831_v1_1.csv.397.dat.2012-10-02 04-04-16.out 1
To make this for many files:
sed 's/.*_\(.*\).csv.*/&\t\1/' file1 file2 file3
OR
sed 's/.*_\(.*\).csv.*/&\t\1/' file*
To save the changes in the same file (if you have GNU sed):
sed -i 's/.*_\(.*\).csv.*/&\t\1/' file

Untested, but this should do what you want (extract the number before .csv and append it to the end of every line in the .out file):
awk 'FNR==1 { split(FILENAME, field, /[_.]/) }
{ print $0"\t"field[7] > (FILENAME "_aaaa") }' *.out
for file in *_aaaa; do mv "$file" "${file%_aaaa}"; done
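To see why field[7] is the right index, the sketch below runs the same awk + mv steps on two throwaway files in a temporary directory (bash assumed; the file contents are invented). Splitting the first sample name on "_" and "." gives DWH, Export, AUSTA, 20120701, 20120731, v1, 1, csv, ..., so the number lands in field 7. The redirection target is parenthesized to keep it unambiguous, and ${file%_aaaa} removes the suffix only from the end of the name.

```shell
#!/bin/bash
# Sketch: exercise the awk + mv approach on made-up sample files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'a\nb\n' > 'DWH_Export_AUSTA_20120701_20120731_v1_1.csv.397.dat.2012-10-02 04-01-46.out'
printf 'c\n'    > 'DWH_Export_AUSTA_20120701_20120731_v1_2.csv.397.dat.2012-10-02 04-03-12.out'

# split() on "_" and "." puts the number before ".csv" in field[7]
awk 'FNR == 1 { split(FILENAME, field, /[_.]/) }
     { print $0 "\t" field[7] > (FILENAME "_aaaa") }' *.out

# move each rewritten copy back over its original
for file in *_aaaa; do mv "$file" "${file%_aaaa}"; done

cat -- *.out
```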

If I understood correctly, you want to append the number from the filename to every line in that file - this should do it:
#!/bin/bash
while (( $# > 0 )); do
    num=$(echo "$1" | sed -r 's/.*_([0-9]+)\.csv.*/\1/')
    #awk -v num="$num" '{ print $0 "\t" num }' < "$1" > "$1.new"
    #sed -r "s/$/\t$num/" < "$1" > "$1.new"
    #sed -ri "s/$/\t$num/" "$1"
    shift
done
Run the script and give it the names of the files you want to process. $# is the number of command-line arguments; shift drops the first argument and moves the others down, so $# decreases by one at the end of each pass. The script extracts the number from the filename; uncomment one of the three lines to do the appending: awk gives you more flexibility; the first sed, like awk, writes a new .new file; the second sed edits the file in place (GNU sed only).
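If GNU sed's -i is not available (BSD and macOS sed require an argument after -i, e.g. -i ''), the same loop can go through a temporary file instead. A minimal POSIX sh sketch, assuming mktemp exists and every name contains _<digits>.csv; append_num is a hypothetical name. Writing the result back with cat, rather than mv, keeps the original file's permissions.

```shell
#!/bin/sh
# Sketch: append the number before ".csv" to every line of each file
# given as an argument, without relying on GNU sed -i.
append_num() {
    for f in "$@"; do
        num=$(printf '%s\n' "$f" | sed 's/.*_\([0-9][0-9]*\)\.csv.*/\1/')
        tmp=$(mktemp) || return 1
        # build the new contents first, then copy them over the original
        awk -v num="$num" '{ print $0 "\t" num }' "$f" > "$tmp" &&
            cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}

# usage: append_num *.out
```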

Instead of awk, you may want to go with sed or coreutils.
Grab number from filename, with grep for variety:
num=$(<<<"$filename" grep -Eo '[^_]+\.csv' | cut -d. -f1)
<<<"$filename" is a bash here-string, equivalent to piping echo "$filename" into the command.
With sed
Append num to each line with GNU sed:
sed "s/\$/\t$num/" filename
Use the -i switch to modify filename in-place.
With paste
You also need to know the length of the file for this method:
len=$(<filename wc -l)
Combine filename and num with paste:
paste filename <(seq $len | while read; do echo $num; done)
Complete example
for filename in DWH_Export*; do
    num=$(echo "$filename" | grep -Eo '[^_]+\.csv' | cut -d. -f1)
    sed -i "s/\$/\t$num/" "$filename"
done

Related

How to write a command line script that will loop through every line in a text file and append a string at the end of each? [duplicate]

How do I add a string after each line in a file using bash? Can it be done using the sed command, if so how?
If your sed allows in place editing via the -i parameter:
sed -e 's/$/string after each line/' -i filename
If not, you have to make a temporary file:
typeset TMP_FILE=$( mktemp )
cp -p filename "${TMP_FILE}"
sed -e 's/$/string after each line/' "${TMP_FILE}" > filename
rm -f "${TMP_FILE}"
I prefer echo. Using pure bash:
while IFS= read -r line; do echo "${line}${string}"; done < file
I prefer using awk.
If there is only one column, use $0, else replace it with the last column.
One way,
awk '{print $0, "string to append after each line"}' file > new_file
or this,
awk '$0=$0"string to append after each line"' file > new_file
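The two awk variants are not quite equivalent: print with a comma inserts OFS (a single space by default) between the line and the string, while $0=$0"..." concatenates them directly. A small sketch with made-up function names that makes the difference visible:

```shell
#!/bin/bash
# Sketch: compare the two awk one-liners (function names are hypothetical).
# print $0, s   -> joins with OFS, so a space appears before the suffix
# $0 = $0 s     -> plain concatenation; the non-empty result is "true",
#                  so awk's default action prints the modified line
append_with_ofs() { awk -v s="$1" '{ print $0, s }'; }
append_plain()    { awk -v s="$1" '$0 = $0 s'; }

printf 'bar\ncandy\n' | append_with_ofs '!'
printf 'bar\ncandy\n' | append_plain '!'
```

To get a TAB instead of a space with the first form, set the separator explicitly, e.g. awk -v OFS='\t' '{ print $0, "suffix" }' file.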
If you have it, the lam (laminate) utility can do it, for example:
$ lam filename -s "string after each line"
Pure POSIX shell and sponge:
suffix=foobar
while IFS= read -r l ; do printf '%s%s\n' "$l" "${suffix}" ; done < file |
sponge file
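xargs and printf (xargs splits each line into whitespace-separated words and interprets quotes, so this only suits lines without spaces or quotes):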
suffix=foobar
xargs -L 1 printf "%s${suffix}\n" < file | sponge file
Using join:
suffix=foobar
join file file -e "${suffix}" -o 1.1,2.99999 | sponge file
Shell tools using paste, yes, head & wc:
suffix=foobar
paste file <(yes "${suffix}" | head -$(wc -l < file) ) | sponge file
Note that paste inserts a Tab char before $suffix.
Of course, sponge can be replaced with a temp file that is afterwards mv'd over the original file, as in some other answers.
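If sponge (from moreutils) is not installed, a few lines of bash can stand in for it. The function name soak is invented here; the essential point, as with sponge, is that all of stdin is read into a temporary file before the target file is touched. Copying back with cat keeps the target's permissions.

```shell
#!/bin/bash
# Sketch of a sponge stand-in (soak is a hypothetical name).
soak() {
    local tmp
    tmp=$(mktemp) || return 1
    # read ALL input first, only then overwrite the target
    cat > "$tmp" && cat "$tmp" > "$1"
    local status=$?
    rm -f "$tmp"
    return "$status"
}

# usage, as in the paste answer above:
# paste file <(yes "${suffix}" | head -n "$(wc -l < file)") | soak file
```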
This is just to add on using the echo command to add a string at the end of each line in a file:
while IFS= read -r line; do echo "${line}string to add"; done < input-file >> output-file
>> appends the result to output-file instead of overwriting it.
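Sed is a little ugly; you could do it elegantly like so (note that for i in `cat foo` splits on whitespace rather than on lines, so it only suits single-word lines like these):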
hendry@i7 tmp$ cat foo
bar
candy
car
hendry@i7 tmp$ for i in `cat foo`; do echo ${i}bar; done
barbar
candybar
carbar

One line command with variable, word count and zcat

I have many files on a server, each containing many lines:
201701010530.contentState.csv.gz
201701020530.contentState.csv.gz
201701030530.contentState.csv.gz
201701040530.contentState.csv.gz
I would like to get this result with a one-line command:
170033|20170101
169865|20170102
170010|20170103
170715|20170104
The goal is to get the number of lines in each file, next to the date that is already part of its filename.
I tried this, but each result comes out on two lines instead of one:
for f in $(ls -1 2017*gz);do zcat $f | wc -l;echo $f | awk '{print substr($0,1,8)}';done
Thanks in advance guys.
Just use zcat file | wc -l to get the number of lines.
For the name, I understand it is enough to extract the first 8 characters:
$ t="201701030530.contentState.csv.gz"
$ echo "${t:0:8}"
20170103
All together:
for file in 2017*gz;
do
lines=$(zcat "$file" | wc -l)
printf "%s|%s\n" "$lines" "${file:0:8}"
done > myresult.csv
Note the usage of for file in 2017*gz; to go through the files matching the 2017*gz pattern: this suffices, no need to parse ls!
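The same loop, packaged as a function. Two assumptions go beyond the answer: gzip -dc is used instead of zcat, because on some systems (older macOS, for example) zcat expects .Z files, and any leading directory is stripped before taking the first 8 characters. The name count_gz_lines is hypothetical.

```shell
#!/bin/bash
# Sketch: print "<line count>|<yyyymmdd>" for each .gz file given.
count_gz_lines() {
    local file base lines
    for file in "$@"; do
        base=${file##*/}                                    # drop any directory part
        lines=$(gzip -dc -- "$file" | wc -l | tr -d ' ')    # BSD wc pads with spaces
        printf '%s|%s\n' "$lines" "${base:0:8}"
    done
}

# usage: count_gz_lines 2017*gz > myresult.csv
```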
Use zgrep -c ^ file to count the lines, here encapsulated in awk:
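$ awk 'FNR==1{ cmd = "zgrep -c ^ " FILENAME; cmd | getline s; close(cmd); print s "|" substr(FILENAME,1,8); nextfile }' *.gz
12|20170101
Keeping the command in a variable (cmd) lets you close() it after reading, so awk does not run out of open pipes when there are many files; nextfile skips the rest of each compressed file, which awk would otherwise read for nothing.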

How to remove a filename from the list of path in Shell

I would like to remove only the filename from each path in the following configuration file.
Configuration File -- test.conf
knowledgebase/arun/test.rf
knowledgebase/arunraj/tester/test.drl
knowledgebase/arunraj2/arun/test/tester.drl
The file above should be read, and the paths with the filenames removed should go to another file called output.txt.
Here is my attempt. It doesn't work at all; I only get empty files.
#!/bin/bash
file=test.conf
while IFS= read -r line
do
# grep --exclude=*.drl line
# awk 'BEGIN {getline line ; gsub("*.drl","", line) ; print line}'
# awk '{ gsub("/",".drl",$NF); print line }' arun.conf
# awk 'NF{NF--};1' line arun.conf
echo $line | rev | cut -d'/' -f 1 | rev >> output.txt
done < "$file"
Expected output:
knowledgebase/arun
knowledgebase/arunraj/tester
knowledgebase/arunraj2/arun/test
There's the dirname command to make it easy and reliable:
#!/bin/bash
file=test.conf
while IFS= read -r line
do
dirname "$line"
done < "$file" > output.txt
There are Bash shell parameter expansions that will work OK with the list of names given but won't work reliably for some names:
file=test.conf
while IFS= read -r line
do
echo "${line%/*}"
done < "$file" > output.txt
There's sed to do the job — easily with the given set of names:
sed 's%/[^/]*$%%' test.conf > output.txt
It's harder if you have to deal with names like /plain.file (or plain.file — the same sorts of edge cases that trip up the shell expansion).
You could add Perl, Python, Awk variants to the list of ways of doing the job.
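The edge cases mentioned above (a bare plain.file, or /plain.file directly under the root) can be handled with a case statement around the same parameter expansion. A sketch, assuming bash and paths without a trailing slash (dirname copes with those, this function does not); parent_of is a made-up name.

```shell
#!/bin/bash
# Sketch: dirname-like helper built on ${var%/*} (parent_of is hypothetical).
parent_of() {
    local p
    case $1 in
        */*) p=${1%/*}                     # cut from the last / onward
             printf '%s\n' "${p:-/}" ;;    # "/plain.file" -> "" -> "/"
        *)   printf '.\n' ;;               # "plain.file" has no directory part
    esac
}

# usage: while IFS= read -r line; do parent_of "$line"; done < test.conf > output.txt
```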
You can get the path like this:
path=${fullpath%/*}
It cuts away everything from the last / onward.
Using awk one liner you can do this:
awk 'BEGIN{FS=OFS="/"} {NF--} 1' test.conf
Output:
knowledgebase/arun
knowledgebase/arunraj/tester
knowledgebase/arunraj2/arun/test

Print text at specified line number bash

I want to print the text at a specified line number from a file.
Here is my bash script
line=12
sed -n "$line{p;q;}"
My line number comes in a variable. But the above code is not working. What should I do?
Using sed
line=12
sed -n "${line}p" my_file
# Multiple lines
line1=10
line2=15
sed -n "${line1},${line2}p" my_file
In awk:
awk "NR==${line}" my_file
# Multiple lines
awk "NR >= ${line1} && NR <= ${line2}" my_file
Or using head and tail but probably not as efficient:
head -${line} my_file | tail -1
# Multiple lines
head -${line2} my_file | tail -$(($line2-$line1+1))
You have to give the file name as an argument to sed.
line=12
sed -n "$line{p;q;}" filename
If you are passing the filename as an argument to a bash script, you need to use:
line=12
sed -n "$line{p;q;}" "$1"
A fast sed command (useful for bigger files, because q stops reading as soon as line n has been printed) is:
n=12; sed $n'q;d' file
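The same stop-early idea works in awk: print line n, then exit so the rest of the file is never read. A sketch with a hypothetical print_line function:

```shell
#!/bin/bash
# Sketch: print line $1 of file $2, then stop reading (print_line is hypothetical).
print_line() {
    awk -v n="$1" 'NR == n { print; exit }' "$2"
}

# usage: print_line 12 my_file
```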

Speed up bash filter function to run commands consecutively instead of per line

I have written the following filter as a function in my ~/.bash_profile:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
exit 0
}
to find lines of anything piped into it matching a regular expression, and highlight matches using ANSI escape codes on a VT100-compatible terminal.
For example, the following finds and highlights the strings bin, U or 1 which are whole words in the last 10 lines of /etc/passwd:
tail /etc/passwd | hilite "\b(bin|[U1])\b"
However, the script runs very slowly as each line forks an echo, egrep and sed.
In this case, it would be more efficient to do egrep on the entire input, and then run sed on its output.
How can I modify my function to do this? I would prefer to not create any temporary files if possible.
P.S. Is there another way to find and highlight lines in a similar way?
sed can do a bit of grepping itself: if you give it the -n flag (or #n instruction in a script) it won't echo any output unless asked. So
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
could be simplified to
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
EDIT:
Here's the whole function:
hilite() {
    local REGEX_SED=$(echo "$1" | sed "s/[|()]/\\\&/g")
    sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
}
That's all there is to it - no while loop, reading, grepping, etc.
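An awk variant can also filter and highlight in a single pass, and it needs no escaping step because awk already uses extended regular expressions. This is a sketch under assumptions: the pattern must be a POSIX ERE, so GNU extensions such as \b are not available (gawk spells a word boundary \y), and hilite_awk is a made-up name. The pattern is passed through ENVIRON rather than -v, because -v processes backslash escapes in its value.

```shell
#!/bin/bash
# Sketch: print only matching lines, with every match in reverse video.
# gsub() returns the number of replacements, so non-matching lines are skipped;
# "&" in the replacement stands for the matched text.
hilite_awk() {
    HILITE_RE=$1 awk '{ if (gsub(ENVIRON["HILITE_RE"], "\033[7m&\033[0m")) print }'
}

# usage: tail /etc/passwd | hilite_awk '(bin|[U1])'
```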
If your egrep supports --color, just put this in .bash_profile:
hilite() { command egrep --color=auto "$@"; }
(Personally, I would name the function egrep; hence the usage of command).
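On current systems the same one-liner is better written with grep -E: GNU grep 3.8 and later print a warning that egrep is obsolescent. A sketch; note that --color=auto only adds colour when the output goes to a terminal.

```shell
#!/bin/bash
# Sketch: egrep one-liner rewritten with grep -E; all arguments are passed through.
hilite() { command grep -E --color=auto "$@"; }

# usage: tail /etc/passwd | hilite '\b(bin|[U1])\b'
```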
I think you can replace the whole while loop with simply
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
because sed can read from stdin line-by-line so you don't need read
I'm not sure if running egrep and piping to sed is faster than using sed alone, but you can always compare using time.
Edit: added -n and p to sed to print only highlighted lines.
Well, you could simply do this:
egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
But I'm not sure that it'll be that much faster ; )
Just for the record, this is a method using a temporary file:
hilite() {
    local REGEX_SED=$(echo "$1" | sed "s/[|()]/\\\&/g")
    local FILE=$2
    if [ -z "$FILE" ]
    then
        FILE=~/tmp
        cat > "$FILE"
    fi
    egrep "$1" "$FILE" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
    return $?
}
It also takes a file path as an optional second argument; without one, it first copies stdin to ~/tmp, for cases like:
cat /etc/passwd | hilite "\b(bin|[U1])\b"
