Best way to merge two lines with same pattern - shell

I have a text file like below
Input:
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,CALLS_TREATED,0
I am wondering what the best way is to merge these two lines into:
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0,CALLS_TREATED,0

With this as the input file:
$ cat file
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,CALLS_TREATED,0
We can get the output you want with:
$ awk -F, -v OFS=, 'NR==1{first=$0;next;} {print first,$6,$7;}' file
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0,CALLS_TREATED,0
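If the file can contain more than one such pair, or more than two lines per key, here is a sketch that generalizes this by grouping on the first five fields (treating those as the key is an assumption based on your sample):
awk -F, -v OFS=, '
{
    key = $1 OFS $2 OFS $3 OFS $4 OFS $5    # assume fields 1-5 identify the group
    if (key in out)
        out[key] = out[key] OFS $6 OFS $7   # append this line's metric and value
    else {
        order[++n] = key                    # remember first-seen order
        out[key] = $0
    }
}
END { for (i = 1; i <= n; i++) print out[order[i]] }' file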

This is a more general solution that reads the two lines item by item, where items are separated by commas. After the first mismatch, the remaining items from the first line are appended to the output, followed by the remaining items from the second line.
The most complicated tool this uses is sed. Looking at it again, even sed can be replaced.
#!/bin/bash
inFile="$1"
tmp=$(mktemp -d)

# Split each of the two lines into one item per line.
sed -n '1p' <"$inFile" | tr "," "\n" > "$tmp/in1"
sed -n '2p' <"$inFile" | tr "," "\n" > "$tmp/in2"

{ while true; do
    read -r f1 <&3; r1=$?
    read -r f2 <&4; r2=$?
    [ $r1 -ne 0 ] && [ $r2 -ne 0 ] && break     # both lines exhausted
    [ $r1 -ne 0 ] && { echo "$f2"; continue; }  # only line 2 has items left
    [ $r2 -ne 0 ] && { echo "$f1"; continue; }  # only line 1 has items left
    if [ "$f1" == "$f2" ]; then
        echo "$f1"
    else
        # First mismatch: emit the rest of line 1, then the rest of line 2.
        while echo "$f1"; do
            read -r f1 <&3 || break
        done
        while echo "$f2"; do
            read -r f2 <&4 || break
        done
    fi
done; } 3<"$tmp/in1" 4<"$tmp/in2" | tr '\n' ',' | sed 's/.$/\n/'

rm -rf "$tmp"
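Incidentally, the trailing tr | sed pair only rejoins the items with commas and restores the final newline. As noted above, even sed can be dropped: paste does both steps at once, so the last line of the pipeline could instead end with:
done; } 3<"$tmp/in1" 4<"$tmp/in2" | paste -sd, -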
Assuming your input file looks like this:
$ cat in.txt
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,CALLS_TREATED,0
You can then run the script as:
$ ./merge.sh in.txt
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0,CALLS_TREATED,0

Related

Bash - Extract Matching String from GZIP Files Is Running Very Slow

Complete novice in Bash. Trying to iterate through 1000 gzip files; maybe GNU parallel is the solution?
#!/bin/bash
ctr=0
echo "file_name,symbol,record_count" > $1
dir="/data/myfolder"
for f in "$dir"/*.gz; do
    gunzip -c $f | while read line; do
        str=`echo $line | cut -d"|" -f1`
        if [ "$str" == "H" ]; then
            if [ $ctr -gt 0 ]; then
                echo "$f,$sym,$ctr" >> $1
            fi
            ctr=0
            sym=`echo $line | cut -d"|" -f3`
            echo $sym
        else
            ctr=$((ctr+1))
        fi
    done
done
Any help to speed up the process will be greatly appreciated!!!
#!/bin/bash
ctr=0
export ctr
echo "file_name,symbol,record_count" > "$1"
dir="/data/myfolder"
export dir

doit() {
    f="$1"
    gunzip -c "$f" | while read line; do
        str=`echo $line | cut -d"|" -f1`
        if [ "$str" == "H" ]; then
            if [ $ctr -gt 0 ]; then
                echo "$f,$sym,$ctr"
            fi
            ctr=0
            sym=`echo $line | cut -d"|" -f3`
            echo $sym >&2            # progress output, kept out of the CSV
        else
            ctr=$((ctr+1))
        fi
    done
}
export -f doit

# Append (>>) so the header line written above is not truncated away;
# stderr (the progress output) still goes to the terminal.
parallel doit ::: "$dir"/*.gz 2>&1 >> "$1"
The Bash while read loop is probably your main bottleneck here. Calling multiple external processes for simple field splitting will exacerbate the problem. Briefly,
while IFS="|" read -r first second third rest; do ...
leverages the shell's built-in field splitting functionality, but you probably want to convert the whole thing to a simple Awk script anyway.
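For illustration, here is a minimal sketch of the loop body rewritten with built-in splitting (variable names are illustrative; the process substitution also keeps the loop out of a pipeline subshell, so ctr survives between lines):
ctr=0
while IFS='|' read -r first _ sym _; do   # the shell splits the fields; no echo/cut
    if [ "$first" = "H" ]; then
        [ "$ctr" -gt 0 ] && echo "$f,$cursym,$ctr"
        ctr=0
        cursym=$sym
    else
        ctr=$((ctr+1))
    fi
done < <(gunzip -c "$f")
The full Awk conversion: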
echo "file_name,symbol,record_count" > "$1"
for f in "/data/myfolder"/*.gz; do
gunzip -c "$f" |
awk -F "\|" -v f="$f" -v OFS="," '
/H/ { if(ctr) print f, sym, ctr
ctr=0; sym=$3;
print sym >"/dev/stderr"
next }
{ ++ctr }'
done >>"$1"
This vaguely assumes that printing the lone sym is just for diagnostics. It should hopefully not be hard to see how this can be refactored if this is an incorrect assumption.
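One caveat carried over from the original loop: a group is only printed when the next H record arrives, so the final group of each file is never reported. If that is unintended, an END clause along these lines (an assumption about the desired output) would flush it:
END { if (ctr) print f, sym, ctr }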

Linux: Appending values into files, to the end of particular lines, and at the bottom of the file if there is no "key"

I have one file, file1, that has values like so:
key1|value1|
key2|value2|
key3|value3|
I have another file, file2, that has key-based values I would like to add to file1:
key2 value4
key3 value5
key4 value6
I would like to add values to file1 on lines where the "key" matches, and if a "key" is not in file1, simply add the new key and value at the bottom:
key1|value1|
key2|value2|value4|
key3|value3|value5|
key4|value6|
It seems like this is something that could be done with 2 calls to awk, but I am not familiar enough with it. I'm also open to using bash or shell commands.
UPDATE
I found this to work
awk 'NR==FNR {a[$1]=$2; next} {print $1,$2,a[$1];delete a[$1]}END{for(k in a) print k,a[k]}' file2 file1
The only deviation from the desired output is that keys from file1 that are not in file2 are not known ahead of time, so they are printed at the end to keep things semi-online:
awk -v first=data1.txt -f script.awk data2.txt
BEGIN {
    OLD = FS
    FS = "|"
    while (getline < first)
        table[$1] = $0
    OFS = FS
    FS = OLD
}
!($1 in table) {
    queue[$1] = $0
}
$1 in table {
    id = $1
    gsub(FS, OFS)
    sub(/[^|]*\|/, "")
    print table[id] $0 OFS
    delete table[id]
}
END {
    for (id in table)
        print table[id]
    for (id in queue) {
        gsub(FS, OFS, queue[id])
        print queue[id] OFS
    }
}
key2|value2|value4|
key3|value3|value5|
key1|value1|
key4|value6|
This is the LOL answer... ha ha. I basically loop over both files, keeping track of keys as I go, and sort at the end. Sillyish, and probably not even something you would want to use bash for, but here it is:
declare -a checked
checked=()
file="/tmp/file.txt"
> "${file}"
while IFS= read -r line1; do
    key1=$(echo $line1 | cut -d'|' -f1)
    if ! grep -qi ${key1} "/tmp/file2.txt"; then
        echo "$line1" >> "${file}"
        continue
    fi
    while IFS= read -r line2; do
        key2=$(echo $line2 | cut -d' ' -f1)
        if ! grep -qi ${key2} "/tmp/file1.txt"; then
            if ! [[ "${checked[*]}" =~ $key2 ]]; then
                echo "$(echo $line2 | awk '{print $1"|"$2}')|" >> "${file}"
                checked+=(${key2})
                continue
            fi
        fi
        if [[ "$key2" == "$key1" ]]; then
            echo "${line1}$(echo $line2 | cut -d' ' -f2-)|" >> "${file}"
            continue
        fi
    done < "/tmp/file2.txt"
done < "/tmp/file1.txt"
sort -k2 -n ${file}
[[ -f "${file}" ]] && rm -f "${file}"
Output:
key1|value1|
key2|value2|value4|
key3|value3|value5|
key4|value6|

Bash, deleting specific row from file

I have a file that lists a filename and the path to that file on each row.
I want to delete the rows whose files do not exist anymore.
file.txt (For now all existing files):
file1;~/Documents/test/123
file2;~/Documents/test/456
file3;~/Test
file4;~/Files/678
Now if I delete any of the given files (file2 and file4, for example) and run my script, I want it to test whether the file in each row exists, and remove the row if it does not.
file.txt (after removing file2 and file4):
file1;~/Documents/test/123
file3;~/Test
What I got so far (not working at all; it does not even run):
#!/bin/sh
backup=`cat file.txt`
rm -f file.txt
touch file.txt
while read -r line
do
    dir=`echo "$line" | awk -F';' '{print $2}'`
    file=`echo "$line" | awk -F';' '{print $1}'`
    if [ -f "$dir"/"$file" ]; then
        echo "$line" >> file.txt
    fi
done << "$backup"
Here's one way:
tmp=$(mktemp)
while IFS=';' read -r file rest; do
    [ -f "$file" ] && printf '%s;%s\n' "$file" "$rest"
done < file.txt > "$tmp" && mv "$tmp" file.txt
or if you don't want a temp file for some reason:
tmp=()
while IFS=';' read -r file rest; do
    [ -f "$file" ] && tmp+=( "$file;$rest" )
done < file.txt &&
printf '%s\n' "${tmp[@]}" > file.txt
Both are untested but should be very close if not exactly correct.
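One detail worth checking against the question, though: there the first field is a file name and the second is a directory, and a ~ stored literally in the data is not expanded inside quotes. A sketch that accounts for both (the ${dir/#\~/$HOME} substitution assumes only a plain leading ~):
tmp=$(mktemp)
while IFS= read -r line; do
    name=${line%%;*}                  # field 1: the file name
    dir=${line#*;}; dir=${dir%%;*}    # field 2: the directory
    dir=${dir/#\~/$HOME}              # expand a literal leading ~
    [ -e "$dir/$name" ] && printf '%s\n' "$line"
done < file.txt > "$tmp" && mv "$tmp" file.txt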
If I understand, this should do it.
touch file.txt file2.txt
for i in `cat file.txt`; do
    fp=`echo $i | cut -d ';' -f2`
    if [ -e $fp ]; then
        echo "$i" >> file2.txt
    fi
done
mv file2.txt file.txt

bash, adding string after a line

I'm trying to put together a bash script that will search a bunch of files, and if it finds a particular string in a file, add a new line immediately after that string and then move on to the next file.
#! /bin/bash
echo "Creating variables"
SEARCHDIR=testfile
LINENUM=1
find $SEARCHDIR* -type f -name *.xml | while read i; do
    echo "Checking $i"
    ISBE=`cat $i | grep STRING_TO_SEARCH_FOR`
    if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
        echo "found $i"
        cat $i | while read LINE; do
            ((LINENUM=LINENUM+1))
            if [[ $LINE == "<STRING_TO_SEARCH_FOR>" ]] ; then
                echo "editing $i"
                awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
            fi
        done
    fi
    LINENUM=1
done
the bit I'm having trouble with is
awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
If I just use $i at the end, it outputs the content to the screen; if I use $i > $i, it just erases the file; and if I use $i >> $i, it gets stuck in a loop until the disk fills up.
any suggestions?
Unfortunately awk doesn't have an in-place editing option similar to sed's -i, so you can write to a temp file and then move it over the original:
awk '{commands}' file > tmpfile && mv tmpfile file
or, if you have GNU awk 4.1.0 or newer, the -i inplace option is available, so you can do:
awk -i inplace '{commands}' file
to modify the original file.
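Applied to this question, inserting a line after each match could then look like this (STRING_TO_SEARCH_FOR and the inserted text are the question's placeholders):
gawk -i inplace '{ print } /STRING_TO_SEARCH_FOR/ { print "new line to insert" }' "$i"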
# cat $i | while read LINE; do
#     ((LINENUM=LINENUM+1))
#     if [[ $LINE == "<STRING_TO_SEARCH_FOR>" ]] ; then
#         echo "editing $i"
#         awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
#     fi
# done
# replaced by
sed -i 's/STRING_TO_SEARCH_FOR/&\n/g' ${i}
or use awk in place of sed.
Also replace
# ISBE=`cat $i | grep STRING_TO_SEARCH_FOR`
# if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
# by
if [ $( grep -c 'STRING_TO_SEARCH_FOR' ${i} ) -gt 0 ]; then
# useful when files are huge; if they are not, run sed directly on every file instead,
# which is faster (but then there is no echo reporting which file matched)
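For what it's worth, the count is not actually needed for the test; grep's -q flag exits successfully on the first match, which is simpler and stops reading large files early:
if grep -q 'STRING_TO_SEARCH_FOR' "${i}"; then
    echo "found $i"
fi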
If you can, maybe use a temporary file?
~$ awk ... $i > tmpfile
~$ mv tmpfile $i
Or simply awk ... $i > tmpfile && mv tmpfile $i
Note that you can use mktemp to create this temporary file.
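Put together, that might look like this ('{commands}' stands for your awk program, as above):
tmp=$(mktemp) &&
awk '{commands}' "$i" > "$tmp" &&
mv "$tmp" "$i"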
Otherwise, with sed you can insert a line right after a match:
~$ cat f
auie
nrst
abcd
efgh
1234
~$ sed '/abcd/{a\
new_line
}' f
auie
nrst
abcd
new_line
efgh
1234
The command checks whether the line matches /abcd/; if so, it appends (a\) the line new_line.
And since sed has the -i option to edit in place, you can do:
if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
    echo "found $i"
    echo "editing $i"
    sed -i "/STRING_TO_SEARCH_FOR/{a\
new line to insert
}" "$i"
fi

bash: only process line if not in second file

I have this block of code:
while IFS=$'\n' read -r line || [[ -n "$line" ]]; do
    if [ "$line" != "" ]; then
        echo -e "$lanIP\t$line" >> /tmp/ipList;
    fi
done < "/tmp/includeList"
I know this must be really simple. But I have another list (/tmp/excludeList). I only want to echo the line within my while loop if the line isn't found in my excludeList. How do I do that? Is there some awk statement or something?
You can do this with grep alone:
$ cat file
blue
green
red
yellow
pink
$ cat exclude
green
pink
$ grep -vx -f exclude file
blue
red
yellow
The -v flag tells grep to only output the lines in file that are not found in exclude, and the -x flag forces whole-line matching.
use grep
while IFS=$'\n' read -r line || [[ -n "$line" ]]; do
    if [[ -n ${line} ]] \
            && ! grep -xF "$line" excludefile &>/dev/null; then
        echo -e "$lanIP\t$line" >> /tmp/ipList;
    fi
done < "/tmp/includeList"
The -n ${line} test means "if $line is not empty".
grep returns true if $line is found in the exclude file; that is inverted by the !, so the condition is true if the line is not found.
-x means the whole line must match, so nothing else can appear on it.
-F means fixed string, so any regex metacharacters that end up in $line are matched literally.
Hope this helps
With awk:
awk -v ip="$lanIP" -v OFS="\t" '
    NR==FNR {exclude[$0]=1; next}
    /[^[:space:]]/ && !($0 in exclude) {print ip, $0}
' /tmp/excludeList /tmp/includeList > /tmp/ipList
This reads the exclude list into an array (as the array keys) -- the NR==FNR condition is true while awk is reading the first file from the arguments. Then, while reading the include file, if the current line contains a non-space character and it does not exist in the exclude array, print it.
The equivalent with grep:
grep -vxF -f /tmp/excludeList /tmp/includeList | while IFS= read -r line; do
    [[ -n "$line" ]] && printf "%s\t%s\n" "$lanIP" "$line"
done > /tmp/ipList
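If the loop is only there to skip empty lines and prefix the IP, awk can take over that part too, so the whole thing becomes a single pipeline (a sketch equivalent to the loop above):
grep -vxF -f /tmp/excludeList /tmp/includeList |
awk -v ip="$lanIP" -v OFS='\t' 'NF { print ip, $0 }' > /tmp/ipList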
