Add filename of each file as a separator row when merging into a single file (bash)

I have the current script, which combines all the CSV files in a folder into a single CSV file, and it works great. I need to add functionality that inserts the filename of each original CSV as a header row for its data block, so I know which section is which.
Can someone assist? This is not my strong point and I am in over my head.
#!/bin/bash
OutFileName="./Data/all/all.csv"   # Fixed output name
i=0                                # Reset a counter
for filename in ./Data/all/*.csv; do
    if [ "$filename" != "$OutFileName" ]; then   # Avoid recursion
        if [[ $i -eq 0 ]]; then
            head -1 "$filename" > "$OutFileName"     # Copy header if it is the first file
        fi
        tail -n +2 "$filename" >> "$OutFileName"     # Append from the 2nd line of each file
        i=$(( i + 1 ))                               # Increase the counter
    fi
done
I will be automating this with a Run Shell Script action in Apple Automator.
Thank you for any help.
(The original post included screenshots of an example input file and the combined output here.) Once combined, I need the filename to appear where the "headers" are.

When you want to generate something like ...
Header1,Header2,Header3
file1.csv
a,b,c
x,y,z
file2.csv
1,2,3
9,9,9
file3.csv
...
... then you just have to insert an echo "$filename" >> "$OutFileName" in front of the tail command. Here is an updated version of your script with some minor improvements.
#!/bin/bash
out="./Data/all/all.csv"
i=0
rm -f "$out"
for file in ./Data/all/*.csv; do
    [ "$file" = "$out" ] && continue   # skip the output file itself
    (( i++ == 0 )) && head -1 "$file"  # header only once, from the first input
    echo "$file"                       # filename separator row
    tail -n +2 "$file"                 # data rows without the header
done > "$out"
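Two details are worth noting about this version: the loop writes through a single redirection (done > "$out") instead of reopening the output on every iteration, and rm -f clears the previous output up front. Note, though, that bash sets up the > "$out" redirection before the for loop's glob is expanded, so an empty all.csv exists again by the time *.csv is matched; that is why the loop still skips the output file explicitly.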

There is no concept of a "header line" other than the first line of the CSV file. What you can do is add a new column.
I've switched to Awk because it simplifies the script considerably; your original logic becomes literally a one-liner:
awk -F , 'NR==1 { OFS=FS; $(NF+1) = "Filename" }   # extend the header once
          FNR==1 && NR>1 { next }                  # skip the headers of later files
          FNR>1 { $(NF+1) = FILENAME } 1' all/*.csv >all.csv
Not saving the output in the same directory as the inputs removes the pesky corner case handling.
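For illustration, with two small hypothetical files under all/ (contents invented here, not taken from the question), the command would produce:
$ cat all/file1.csv
Header1,Header2,Header3
a,b,c
$ cat all/file2.csv
Header1,Header2,Header3
1,2,3
$ awk -F , 'NR==1 { OFS=FS; $(NF+1) = "Filename" }
            FNR==1 && NR>1 { next }
            FNR>1 { $(NF+1) = FILENAME } 1' all/*.csv
Header1,Header2,Header3,Filename
a,b,c,all/file1.csv
1,2,3,all/file2.csv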

Related

How to compare 2 files word by word and store the differing words in a result output file

Suppose there are two files:
File1.txt
My name is Anamika.
File2.txt
My name is Anamitra.
I want result file storing:
Result.txt
Anamika
Anamitra
I use PuTTY, so I can't use wdiff. Is there any other alternative?
Not my greatest script, but it works. Others might come up with something more elegant.
#!/bin/bash
if [ $# != 2 ]
then
    echo "Arguments: file1 file2"
    exit 1
fi
file1=$1
file2=$2
# Do this for both files
for F in $file1 $file2
do
    if [ ! -f $F ]
    then
        echo "ERROR: $F does not exist."
        exit 2
    else
        # Create a temporary file with every word from the file
        for w in $(cat $F)
        do
            echo $w >> ${F}.tmp
        done
    fi
done
# Compare the temporary files, since they are now 1 word per line.
# The grep keeps only the lines diff starts with > or <.
# The awk keeps only the word (i.e. removes < or >).
# The sed removes any character that is not alphanumeric,
# e.g. a . at the end.
diff ${file1}.tmp ${file2}.tmp | grep -E '^[<>]' | awk '{print $2}' | sed 's/[^a-zA-Z0-9]//g' > Result.txt
# Cleanup!
rm -f ${file1}.tmp ${file2}.tmp
This uses a trick with the for loop: if you use for to loop over a file's contents like this, it loops over each word, NOT each line, as beginners in bash tend to believe. Here that behavior is actually useful to know, since it transforms the files into one word per line.
Ex: file content == This is a sentence.
After the for loop is done, the temporary file will contain:
This
is
a
sentence.
Then it is trivial to run diff on the files.
One last detail: your sample output did not include a . at the end, hence the sed command to keep only alphanumeric characters.
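If you want something more compact, here is a hedged alternative sketch: tr splits each file into one word per line, the same sed strips the punctuation, and comm keeps the words unique to each file (comm needs sorted input, hence the sort):
tr -s '[:space:]' '\n' < File1.txt | sed 's/[^a-zA-Z0-9]//g' | sort > File1.tmp
tr -s '[:space:]' '\n' < File2.txt | sed 's/[^a-zA-Z0-9]//g' | sort > File2.tmp
# -3 suppresses the lines common to both; tr strips comm's tab indentation
comm -3 File1.tmp File2.tmp | tr -d '\t' > Result.txt
rm -f File1.tmp File2.tmp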

Extract a line from a text file using grep?

I have a text file called log.txt, and it logs the file name and the path it was taken from. So, something like this:
2.txt
/home/test/etc/2.txt
Basically, the file name and its previous location. I want to use grep to grab the file's directory, save it as a variable, and move the file back to its original location.
for var in "$@"
do
    if grep "$var" log.txt
    then
        : # code if found
    else
        : # code if not found
    fi
done
This just prints 2.txt and its directory to the console, since the directory line has 2.txt in it.
Thanks.
Maybe flip the logic to make it more efficient?
f=''
while read prev
do  case "$prev" in
        */*) [[ -e "$f" ]] && mv "$f" "$prev";;  # path line: move the remembered file back
        *)   f="$prev"; continue;;               # bare name line: remember the name
    esac
done < log.txt
That walks through all the files in the log and, if they exist locally, moves them back. It should be functionally the same without a grep per file.
If the name is always just the basename of the path, then why save it in the log at all?
If it is, then:
while read prev
do  f="${prev##*/}"   # strip the path info
    [[ -e "$f" ]] && mv "$f" "$prev"
done < <( grep / log.txt )
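Before running that for real, a quick dry run that prints the mv commands instead of executing them can help; just put echo in front of mv:
while read prev
do  f="${prev##*/}"
    [[ -e "$f" ]] && echo mv "$f" "$prev"
done < <( grep / log.txt )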
Having the file name and its path on the same line would significantly simplify your script. But maybe try something like:
# Convert the command-line arguments to lines
printf '%s\n' "$@" |
# Pair them up with entries in the file
awk 'NR==FNR { f[$0]; next }
     FNR%2 { if ($0 in f) p=$0; else p=""; next }
     p { print "mv \"" p "\" \"" $0 "\"" }' - log.txt |
sh
Test it by replacing sh with cat and see what you get. If it looks correct, switch back.
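For instance, with the two-line log.txt from the question and 2.txt passed as an argument, the cat variant should show the single command that would be handed to sh:
$ printf '%s\n' 2.txt |
  awk 'NR==FNR { f[$0]; next }
       FNR%2 { if ($0 in f) p=$0; else p=""; next }
       p { print "mv \"" p "\" \"" $0 "\"" }' - log.txt
mv "2.txt" "/home/test/etc/2.txt"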
Briefly, something similar could perhaps be pulled off with printf '%s\n' "$@" | grep -A 1 -Fxf - log.txt, but you end up having to parse the output to pair up the lines anyway.
Another solution:
for f in $(grep -v "/" log.txt); do
    grep "/$f" log.txt | xargs -I{} cp "$f" {}
done
By the way, grep -q (for "quiet") suppresses the output, which would stop the matches from printing to your console in the if grep test from your question.

How to browse a line from a file?

I have a file that contains 10 lines with this sort of content:
aaaa,bbb,132,a.g.n.
I want to walk through every line, char by char, and put the data that appears before each "," into an output file.
if [ $# -eq 2 ] && [ -f $1 ]
then
    echo "Read nr of fields to be saved or nr of commas."
    read n
    nrLines=$(wc -l < $1)
    while $nrLines!="1" read -r line || [[ -n "$line" ]]; do
        for (( i=1; i<=$n; ++i ))
        do
            while [ read -r -n1 temp ]
            do
                if [ temp != "," ]
                then
                    echo $temp > $(result$i)
                else
                fi
            done
            paste -d"\n" $2 $(result$i)
        done
        nrLines=$($nrLines-1)
    done
else
    echo "File not found!"
fi
In parameter $2 I have an empty file in which I will store the data from file $1 after I extract it without the " , " and add a couple of comments.
Example:
My input_file contains:
a.b.c.d,aabb,comp,dddd
My output_file is empty.
I call my script: ./script.sh input_file output_file
After execution the output_file contains:
First line info: a.b.c.d
Second line info: aabb
Third line info: comp
(yes, without the 4th line info)
You can do what you want very simply with parameter-expansion and substring-removal using bash alone. For example, take an example file:
$ cat dat/10lines.txt
aaaa,bbb,132,a.g.n.
aaaa,bbb,133,a.g.n.
aaaa,bbb,134,a.g.n.
aaaa,bbb,135,a.g.n.
aaaa,bbb,136,a.g.n.
aaaa,bbb,137,a.g.n.
aaaa,bbb,138,a.g.n.
aaaa,bbb,139,a.g.n.
aaaa,bbb,140,a.g.n.
aaaa,bbb,141,a.g.n.
A simple one-liner using native bash string handling could be the following, giving these results:
$ while read -r line; do echo "${line%,*}"; done <dat/10lines.txt
aaaa,bbb,132
aaaa,bbb,133
aaaa,bbb,134
aaaa,bbb,135
aaaa,bbb,136
aaaa,bbb,137
aaaa,bbb,138
aaaa,bbb,139
aaaa,bbb,140
aaaa,bbb,141
Parameter expansion with substring removal works as follows:
var=aaaa,bbb,132,a.g.n.
Beginning at the left and removing up to, and including, the first ',' is:
${var#*,} # bbb,132,a.g.n.
Beginning at the left and removing up to, and including, the last ',' is:
${var##*,} # a.g.n.
Beginning at the right and removing up to, and including, the first ',' is:
${var%,*} # aaaa,bbb,132
Beginning at the right and removing up to, and including, the last ',' is:
${var%%,*} # aaaa
Note: the text to remove above is represented with a wildcard '*', but wildcard use is not required. It can be any allowable text. For example, to remove the trailing ,a.g.n. only where the preceding number is 136, you can do the following:
${var%,136*},136 # aaaa,bbb,136 (only when var actually contains ,136; other values would first need a match test)
To print the 2016th line from a file named file.txt, you have to run a command like this:
sed -n '2016p' < file.txt
More examples:
sed -n '2p' < file.txt
will print the 2nd line,
sed -n '2011p' < file.txt
the 2011th line,
sed -n '10,33p' < file.txt
lines 10 up to 33,
sed -n '1p;3p' < file.txt
the 1st and 3rd lines,
and so on...
For more detail, please have a look at this tutorial and this answer.
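One aside: on large files it can be worth quitting right after the print so sed does not read the rest of the file; this form should work in both GNU and BSD sed:
sed -n '2016{p;q}' file.txt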
In native bash the following should do what you want, assuming you replace the contents of your script.sh with the below:
#!/bin/bash
IN_FILE=${1}
OUT_FILE=${2}
IFS=\,
while read line; do
    set -- ${line}
    for ((i=1; i<=${#}; i++)); do
        ((${i}==4)) && continue
        ((n+=1))
        printf '%s\n' "Line ${n} info: ${!i}"
    done
done < ${IN_FILE} > ${OUT_FILE}
This prints every field of each line except the 4th, each on its own line in the output file (I assume this is your requirement, as per your comment?).
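For instance, using the sample line from the question as input_file, a run should look like this:
$ ./script.sh input_file output_file
$ cat output_file
Line 1 info: a.b.c.d
Line 2 info: aabb
Line 3 info: comp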
[wspace#wspace sandbox]$ awk -F"," 'BEGIN{OFS="\n"}{for(i=1; i<=NF-1; i++){print "line Info: "$i}}' data.txt
line Info: a.b.c.d
line Info: aabb
line Info: comp
This little snippet can ignore the last field.
Updated:
#!/usr/bin/env bash
if [ ! -f "$1" -o $# -ne 2 ];then
    echo "Usage: $(basename $0) input_file out_file"
    exit 127
fi
input_file=$1
output_file=$2
: > $output_file
if [ "$(wc -l < $1)" -ne 0 ];then
    while true
    do
        read -r -n1 char
        if [ "$char" == "" ];then
            break
        elif [ "$char" != "," ];then
            temp=$temp$char
        else
            echo "line info: $temp" >> $output_file
            temp=""
        fi
    done < $input_file
else
    echo "file $1 is empty"
fi
Maybe this is what you want
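The reason this stops where it does: read -r -n1 returns an empty $char when it hits the newline delimiter, so the break fires at the end of the first line, and the text after the final comma (which has no trailing comma of its own) never reaches the echo branch. That matches your example of dropping the 4th field, but it also means only the first line of a multi-line file is processed.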
Did you try:
sed "s|,|\n|g" $1 | head -n -1 > $2
I assume that only the last word does not have a comma on its right.
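A quick check with the sample line (this relies on GNU sed for \n in the replacement and GNU head for the negative line count):
$ printf 'a.b.c.d,aabb,comp,dddd\n' | sed 's|,|\n|g' | head -n -1
a.b.c.d
aabb
comp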
Try this (tested with your sample line):
#!/bin/bash
# script.sh
echo "Number of fields to save ?"
read nf
while IFS=$',' read -r -a arr; do
    newarr=( "${arr[@]:0:${nf}}" )
done < "$1"
for i in "${newarr[@]}"; do
    printf "%s\n" $i
done > "$2"
Execute script with :
$ ./script.sh inputfile outputfile
Number of fields to save ?
3
$ cat outputfile
a.b.c.d
aabb
comp
All the words separated by commas are stored in the array $arr.
A tmp array $newarr keeps only the first $nf elements ($nf comes from the read command).
It loops over the new array and prints the result to $2, the output file.

Bash script to remove redundant lines

Good afternoon,
I'm trying to make a bash script that cleans out some data output files. The files look like this:
/path/
/path/to
/path/to/keep
/another/
/another/path/
/another/path/to
/another/path/to/keep
I'd like to end up with this:
/path/to/keep
/another/path/to/keep
I want to cycle through the lines of the file, checking the next line to see if it contains the current line and, if so, deleting the current line from the file. Here's my code:
for LINE in $(cat bbutters_data2.txt)
do
    grep -A1 ${LINE} bbutters_data2.txt
    if [ $? -eq 0 ]
    then
        sed -i '/${LINE}/d' ./bbutters_data2.txt
    fi
done
Assuming that your input file is sorted in the way that you have shown:
$ awk 'NR>1 && substr($0,1,length(last))!=last {print last;} {last=$0;} END{print last}' file
/path/to/keep
/another/path/to/keep
How it works
awk reads through the input file line by line. Every time we read a new line, we compare it to the last one. If the last line is not a prefix of the new line, we print the last line. In more detail:
NR>1 && substr($0,1,length(last))!=last {print last;}
If this is not the first line, and if the last line read, called last, is not a prefix of the current line, $0, then print the last line.
last=$0
Update the variable last to the current line.
END{print last}
After we finish reading the file, print the last line.
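To see the prefix test in isolation, here is a tiny check with hypothetical values (not from the input file):
$ awk 'BEGIN { last="/path/to"; line="/path/to/keep"
               if (substr(line,1,length(last)) == last) print "prefix"
               else print "not a prefix" }'
prefix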
I like the awk solution, but bash itself can handle the task. Note: both the awk and the bash solutions require that the lesser, included paths be listed in increasing order. Here is an alternative bash solution (bash-only due to the glob match operation):
#!/bin/bash
fn="${1:-/dev/stdin}"    ## accept filename or stdin
[ -r "$fn" ] || {        ## validate file is readable
    printf "error: file not found: '%s'\n" "$fn"
    exit 1
}
declare -i cnt=0         ## flag for 1st iteration
while read -r line; do   ## for each line in file
    ## if 1st iteration, fill 'last', increment 'cnt', continue
    [ $cnt -eq 0 ] && { last="$line"; ((cnt++)); continue; }
    ## while 'line' is a child of 'last', continue, else print
    [[ $line = "${last%/}"/* ]] || printf "%s\n" "$last"
    last="$line"         ## update last=$line
done <"$fn"
[ ${#line} -eq 0 ] &&    ## print last line (updated for non-POSIX line end)
    printf "%s\n" "$last" ||
    printf "%s\n" "$line"
exit 0
Output
$ bash path_uniql.sh < dat/incpaths.txt
/path/to/keep
/another/path/to/keep

I need to parse a log file into multiple files based on delimiters

I have a log file which I need to parse into multiple files.
############################################################################################
6610
############################################################################################
GTI02152 I gtirreqi 20130906 000034 TC SJ014825 GTT_E_REQ_INF テーブル挿入件数 16件
############################################################################################
Z5000
############################################################################################
GTP10000 I NIPS gtgZ5000 20130906 000054 TC SJ014825 シェル開始
############################################################################################
I need to create files like 6610.txt, which will have all the values under 6610 (like GTI02152...), and likewise Z5000.txt (GTP10000). Any help will be greatly appreciated!
The below script would help you get at the information; you can modify it to create the data you require.
#!/bin/sh
paste -d, - - - - - < data.dat | cut -d ',' -f 2,4 > file.out
while read p; do
    fileName=`echo $p | cut -d ',' -f 1`
    echo $fileName
    dataInfo=`echo $p | cut -d ',' -f 2`
    echo $dataInfo
done < file.out
Here's an awk styled answer:
I put the following into a file named awko and chmod +x it to use it:
#!/usr/bin/awk -f
BEGIN { p = 0 }   # look for filename flag - start at zero
/^\#/ { p = !p }  # toggle it to find the filename
# either make a filename or write to the last filename, based on the flag
$0 !~ /^\#/ {
    if( p == 1 ) filename = $1 ".txt"
    else         print $0 > filename
}
Running ./awko data.txt produced two files, 6610.txt and Z5000.txt, from your example data. It's capable of sending more data lines to the output files as well.
You can do it with Ruby as well:
ruby -e 'File.read(ARGV.shift).scan(/^[^#].*?(?=^[#])/m).each{|e| name = e.split[0]; File.write("#{name}.txt", e)}' file
Example output:
> for A in *.txt; do echo "---- $A ----"; cat "$A"; done
---- 6610.txt ----
6610
---- GTI02152.txt ----
GTI02152 I gtirreqi 20130906 000034 TC SJ014825 GTT_E_REQ_INF テーブル挿入件数 16件
---- GTP10000.txt ----
GTP10000 I NIPS gtgZ5000 20130906 000054 TC SJ014825 シェル開始
---- Z5000.txt ----
Z5000
This script makes the following assumptions:
Each record is separated by an empty line
#### lines are purely comment/space filler and can be ignored during parsing
The first line of each record (ignoring ####) contains the basename for the filename
The name of the logfile is passed as the first argument to this script.
#!/bin/bash
# write records to this temporary file, rename later
tempfile=$(mktemp)
while read line; do
    if [[ $line == "" ]] ; then
        # line is empty - separator - save existing record and start a new one
        mv "$tempfile" "$filename"
        filename=""
        tempfile=$(mktemp)
    else
        # output non-empty line to record file
        echo "$line" >> "$tempfile"
        if [[ $filename == "" ]] ; then
            # we haven't yet figured out the filename for this record
            if [[ $line =~ ^#+$ ]] ; then
                # ignore #### comment lines
                :
            else
                # 1st non-comment line in record is filename
                filename=${line}.txt
            fi
        fi
    fi
done < "$1"
# end of input file might not have an explicit empty-line separator -
# make sure the last record file is moved correctly
if [[ -e $tempfile ]] ; then
    mv "$tempfile" "$filename"
fi
