Reformatting a csv file, script is confused by ' %." ' - bash

I'm using bash on cygwin.
I have to take a .csv file that is a subset of a much larger set of settings and shuffle the new csv settings (same keys, different values) into the 1000-plus-line original, making a new .json file.
I have put together a script to automate this. The first step in the process is to "clean up" the csv file by extracting lines that start with "mme " and "sms ". Everything else is to pass through cleanly to the "clean" .csv file.
This routine is as follows:
# clean up the settings, throwing out mme and sms entries
cat extract.csv | while read -r LINE; do
if [[ $LINE == "mme "* ]]
then
printf "$LINE\n" >> mme_settings.csv
elif [[ $LINE == "sms "* ]]
then
printf "$LINE\n" >> sms_settings.csv
else
printf "$LINE\n" >> extract_clean.csv
fi
done
My problem is that this thing stubs its toe on the following string at the end of one entry: 100%." When it's done with the line, it simply elides the %." and the new-line marker following it, and smears the two lines together:
... 100next.entry.keyname...
I would love to reach in and simply manually delimit the % sign, but it's not a realistic option for my use case. Clearly I'm missing something. My suspicion is that I am in some wise abusing cat or read in the first line.
If there is some place I should have looked to find the answer before bugging you all, by all means point me in that direction and I'll sod off.

Syntax for printf is :
printf format [argument]...
In [ printf ] format string, anything followed by % is a format specifier as described in the link above. What you would like to do is :
while read -r line; do # Replaced LINE with line, full uppercase variable are reserved for the syste,
if [[ "$line" = "mme "* ]] # Here* would glob for anything that comes next
then
printf "%s\n" $line >> mme_settings.csv
elif [[ "$line" = "sms "* ]]
then
printf "%s\n" $line >> sms_settings.csv
else
printf "%s\n" $line >> extract_clean.csv
fi
done<extract.csv # Avoided the useless use of cat

As pointed out, your problem is expanding a parameter containing a formatting instruction in the formatting argument of printf, which can be solved by using echo instead or moving the parameter to be expanded out of the formatting string, as demonstrated in other answers.
I recommend not looping over your whole file with Bash in the first place, as it's notoriously slow; you're extracting lines starting with certain patterns, which is a job at which grep excels:
grep '^mme ' extract.csv > mme_settings.csv
grep '^sms ' extract.csv > sms_settings.csv
grep -v '^mme \|^sms ' extract.csv > extract_clean.csv
The third command uses the -v option (extract lines that don't match) and alternation to exclude lines both starting with mme and sms.

Related

Copy number of line composed by special character in bash

I have an exercise where I have a file and at the begin of it I have something like
#!usr/bin/bash
# tototata
#tititutu
#ttta
Hello world
Hi
Test test
#zabdazj
#this is it
And I have to take each first line starting with a # until the line where I don't have one and stock it in a variable. In case of a shebang, it has to skip it and if there's blank space between lines, it has to skip them too. We just want the comment between the shebang and the next character.
I'm new to bash and I would like to know if there's a way to do it please ?
Expected output:
# tototata
#tititutu
#ttta
Try in this easy way to better understand.
#!/bin/bash
sed 1d your_input_file | while read line;
do
check=$( echo $line | grep ^"[#;]" )
if ([ ! -z "$check" ] || [ -z "$line" ])
then
echo $line;
else
exit 1;
fi
done
This may be more correct, although your question was unclear about weather the input file had a script shebang, if the shebang had to be skipped to match your sample output, or if the input file shebang was just bogus.
It is also unclear for what to do, if the first lines of the input file are not starting with #.
You should really post your assignment's text as a reference.
Anyway here is a script that does collects first set of consecutive lines starting with a sharp # into the arr array variable.
It may not be an exact solution to your assignment (witch you should be able to solve with what your previous lessons taught you), but will get you some clues and keys to iterate reading lines from a file and testing that lines starts with a #.
#!/usr/bin/env bash
# Our variable to store parsed lines
# Is an array of strings with an entry per line
declare -a arr=()
# Iterate reading lines from the file
# while it matches Regex: ^[#]
# mean while lines starts with a sharp #
while IFS=$'\n' read -r line && [[ "$line" =~ ^[#] ]]; do
# Add line to the arr array variable
arr+=("$line")
done <a.txt
# Print each array entries with a newline
printf '%s\n' "${arr[#]}"
How about this (not tested, so you may have to debug it a bit, but my comments in the code should explain what is going on):
while read line
do
# initial is 1 one the first line, and 0 after this. When the script starts,
# the variable is undefined.
: ${initial:=1}
# Test for lines starting with #. Need to quote the hash
# so that it is not taken as comment.
if [[ $line == '#'* ]]
then
# Test for initial #!
if (( initial == 1 )) && [[ $line == '#!'* ]]
then
: # ignore it
else
echo $line # or do whatever you want to do with it
fi
fi
# stop on non-blank, non-comment line
if [[ $line != *[^\ ]* ]]
then
break
fi
initial=0 # Next line won't be an initial line
done < your_file

Counting newline characters in bash shell script

I cannot get this script to work at all. I am just trying to count the number of lines in a file WITHOUT using wc. here is what I have so far
FILE=file.txt
lines=0
while IFS= read -n1 char
do
if [ "$char" == "\n" ]
then
lines=$((lines+1))
fi
done < $FILE
this is just a small part of a bigger script that should count total words, characters and lines in a file. I cannot figure any of it out though. Please help
The problem is the if-statement conditional is never true.. Its as if the program cannot detect what a '\n' is.
declare -i lines=0 words=0 chars=0
while IFS= read -r line; do
((lines++))
array=($line) # don't quote the var to enable word splitting
((words += ${#array[#]}))
((chars += ${#line} + 1)) # add 1 for the newline
done < "$filename"
echo "$lines $words $chars $filename"
You have two problems there. They are fixed in the following:
#!/bin/bash
file=file.txt
lines=0
while IFS= read -rN1 char; do
if [[ "$char" == $'\n' ]]; then
((++lines))
fi
done < "$file"
One problem was the $'\n' in the test, the other one, more subtle, was that you need to use the -N switch, not the -n one in read (help read for more information). Oh, and you also want to use the -r option (check with and without, when you have backslashes in your file).
Minor things I changed: Use more robust [[...]], used lower case variable names (it's considered bad practice to have upper case variable names). Used arithmetic ((++lines)) instead of the silly lines=$((lines+1)).

How to printf a variable length line in fixed length chunks?

I need to to analyze (with grep) and print (with some formatting) the content of an
app's log.
This log contains text data in variable length lines. What I need is, after some grepping, loop each line of this output and print it with a maximum fixed length of 50 characters. If a line is longer than 50 chars, it should print a newline and then continue with the rest in the following line and so on until the line is completed.
I tried to use printf to do this, but it's not working and I don't know why. It just outputs the lines in same fashion of echo, without any consideration about printf formatting, though the \t character (tab) works.
function printContext
{
str="$1"
log="$2"
tmp="/tmp/deluge/$$"
rm -f $tmp
echo ""
echo -e "\tLog entries for $str :"
ln=$(grep -F "$str" "$log" &> "$tmp" ; cat "$tmp" | wc -l)
if [ $ln -gt 0 ];
then
while read line
do
printf "\t%50s\n" "$line"
done < $tmp
fi
}
What's wrong? I Know that I can make a substring routine to accomplish this task, but printf should be handy for stuff like this.
Instead of:
printf "\t%50s\n" "$line"
use
printf "\t%.50s\n" "$line"
to truncate your line to 50 characters only.
I'm not sure about printf but seeing as how perl is installed everywhere, how about a simple 1 liner?
echo $ln | perl -ne ' while( m/.{1,50}/g ){ print "$&\n" } '
Here's a clunky bash-only way to break the string into 50-character chunks
i=0
chars=50
while [[ -n "${y:$((chars*i)):$chars}" ]]; do
printf "\t%s\n" "${y:$((chars*i)):$chars}"
((i++))
done

Read a config file in BASH without using "source"

I'm attempting to read a config file that is formatted as follows:
USER = username
TARGET = arrows
I realize that if I got rid of the spaces, I could simply source the config file, but for security reasons I'm trying to avoid that. I know there is a way to read the config file line by line. I think the process is something like:
Read lines into an array
Filter out all of the lines that start with #
search for the variable names in the array
After that I'm lost. Any and all help would be greatly appreciated. I've tried something like this with no success:
backup2.config>cat ~/1
grep '^[^#].*' | while read one two;do
echo $two
done
I pulled that from a forum post I found, just not sure how to modify it to fit my needs since I'm so new to shell scripting.
http://www.linuxquestions.org/questions/programming-9/bash-shell-program-read-a-configuration-file-276852/
Would it be possible to automatically assign a variable by looping through both arrays?
for (( i = 0 ; i < ${#VALUE[#]} ; i++ ))
do
"${NAME[i]}"=VALUE[i]
done
echo $USER
Such that calling $USER would output "username"? The above code isn't working but I know the solution is something similar to that.
The following script iterates over each line in your input file (vars in my case) and does a pattern match against =. If the equal sign is found it will use Parameter Expansion to parse out the variable name from the value. It then stores each part in it's own array, name and value respectively.
#!/bin/bash
i=0
while read line; do
if [[ "$line" =~ ^[^#]*= ]]; then
name[i]=${line%% =*}
value[i]=${line#*= }
((i++))
fi
done < vars
echo "total array elements: ${#name[#]}"
echo "name[0]: ${name[0]}"
echo "value[0]: ${value[0]}"
echo "name[1]: ${name[1]}"
echo "value[1]: ${value[1]}"
echo "name array: ${name[#]}"
echo "value array: ${value[#]}"
Input
$ cat vars
sdf
USER = username
TARGET = arrows
asdf
as23
Output
$ ./varscript
total array elements: 2
name[0]: USER
value[0]: username
name[1]: TARGET
value[1]: arrows
name array: USER TARGET
value array: username arrows
First, USER is a shell environment variable, so it might be better if you used something else. Using lowercase or mixed case variable names is a way to avoid name collisions.
#!/bin/bash
configfile="/path/to/file"
shopt -s extglob
while IFS='= ' read lhs rhs
do
if [[ $lhs != *( )#* ]]
then
# you can test for variables to accept or other conditions here
declare $lhs=$rhs
fi
done < "$configfile"
This sets the vars in your file to the value associated with it.
echo "Username: $USER, Target: $TARGET"
would output
Username: username, Target: arrows
Another way to do this using keys and values is with an associative array:
Add this line before the while loop:
declare -A settings
Remove the declare line inside the while loop and replace it with:
settings[$lhs]=$rhs
Then:
# set keys
user=USER
target=TARGET
# access values
echo "Username: ${settings[$user]}, Target: ${settings[$target]}"
would output
Username: username, Target: arrows
I have a script which only takes a very limited number of settings, and processes them one at a time, so I've adapted SiegeX's answer to whitelist the settings I care about and act on them as it comes to them.
I've also removed the requirement for spaces around the = in favour of ignoring any that exist using the trim function from another answer.
function trim()
{
local var=$1;
var="${var#"${var%%[![:space:]]*}"}"; # remove leading whitespace characters
var="${var%"${var##*[![:space:]]}"}"; # remove trailing whitespace characters
echo -n "$var";
}
while read line; do
if [[ "$line" =~ ^[^#]*= ]]; then
setting_name=$(trim "${line%%=*}");
setting_value=$(trim "${line#*=}");
case "$setting_name" in
max_foos)
prune_foos $setting_value;
;;
max_bars)
prune_bars $setting_value;
;;
*)
echo "Unrecognised setting: $setting_name";
;;
esac;
fi
done <"$config_file";
Thanks SiegeX. I think the later updates you mentioned does not reflect in this URL.
I had to edit the regex to remove the quotes to get it working. With quotes, array returned is empty.
i=0
while read line; do
if [[ "$line" =~ ^[^#]*= ]]; then
name[i]=${line%% =*}
value[i]=${line##*= }
((i++))
fi
done < vars
A still better version is .
i=0
while read line; do
if [[ "$line" =~ ^[^#]*= ]]; then
name[i]=`echo $line | cut -d'=' -f 1`
value[i]=`echo $line | cut -d'=' -f 2`
((i++))
fi
done < vars
The first version is seen to have issues if there is no space before and after "=" in the config file. Also if the value is missing, i see that the name and value are populated as same. The second version does not have any of these. In addition it trims out unwanted leading and trailing spaces.
This version reads values that can have = within it. Earlier version splits at first occurance of =.
i=0
while read line; do
if [[ "$line" =~ ^[^#]*= ]]; then
name[i]=`echo $line | cut -d'=' -f 1`
value[i]=`echo $line | cut -d'=' -f 2-`
((i++))
fi
done < vars

How can I get my bash script to work?

My bash script doesn't work the way I want it to:
#!/bin/bash
total="0"
count="0"
#FILE="$1" This is the easier way
for FILE in $*
do
# Start processing all processable files
while read line
do
if [[ "$line" =~ ^Total ]];
then
tmp=$(echo $line | cut -d':' -f2)
count=$(expr $count + 1)
total=$(expr $total + $tmp)
fi
done < $FILE
done
echo "The Total Is: $total"
echo "$FILE"
Is there another way to modify this script so that it reads arguments into $1 instead of $FILE? I've tried using a while loop:
while [ $1 != "" ]
do ....
done
Also when I implement that the code repeats itself. Is there a way to fix that as well?
Another problem that I'm having is that when I have multiple files hi*.txt it gives me duplicates. Why? I have files like hi1.txt hi1.txt~ but the tilde file is of 0 bytes, so my script shouldn't be finding anything.
What i have is fine, but could be improved. I appreciate your awk suggestions but its currently beyond my level as a unix programmer.
Strager: The files that my text editor generates automatically contain nothing..it is of 0 bytes..But yeah i went ahead and deleted them just to be sure. But no my script is in fact reading everything twice. I suppose its looping again when it really shouldnt. I've tried to silence that action with the exit commands..But wasnt successful.
while [ "$1" != "" ]; do
# Code here
# Next argument
shift
done
This code is pretty sweet, but I'm specifying all the possible commands at one time. Example: hi[145].txt
If supplied would read all three files at once.
Suppose the user enters hi*.txt;
I then get all my hi files read twice and then added again.
How can I code it so that it reads my files (just once) upon specification of hi*.txt?
I really think that this is because of not having $1.
It looks like you are trying to add up the totals from the lines labelled 'Total:' in the files provided. It is always a good idea to state what you're trying to do - as well as how you're trying to do it (see How to Ask Questions the Smart Way).
If so, then you're doing in about as complicated a way as I can see. What was wrong with:
grep '^Total:' "$#" |
cut -d: -f2 |
awk '{sum += $1}
END { print sum }'
This doesn't print out "The total is" etc; and it is not clear why you echo $FILE at the end of your version.
You can use Perl or any other suitable program in place of awk; you could do the whole job in Perl or Python - indeed, the cut work could be done by awk:
grep "^Total:" "$#" |
awk -F: '{sum += $2}
END { print sum }'
Taken still further, the whole job could be done by awk:
awk -F: '$1 ~ /^Total/ { sum += $2 }
END { print sum }' "$#"
The code in Perl wouldn't be much harder and the result might be quicker:
perl -na -F: -e '$sum += $F[1] if m/^Total:/; END { print $sum; }' "$#"
When iterating over the file name arguments provided in a shell script, you should use '"$#"' in place of '$*' as the latter notation does not preserve spaces in file names.
Your comment about '$1' is confusing to me. You could be asking to read from the file whose name is in $1 on each iteration; that is done using:
while [ $# -gt 0 ]
do
...process $1...
shift
done
HTH!
If you define a function, it'll receive the argument as $1. Why is $1 more valuable to you than $FILE, though?
#!/bin/sh
process() {
echo "doing something with $1"
}
for i in "$#" # Note use of "$#" to not break on filenames with whitespace
do
process "$i"
done
while [ "$1" != "" ]; do
# Code here
# Next argument
shift
done
On your problem with tilde files ... those are temporary files created by your text editor. Delete them if you don't want them to be matched by your glob expression (wildcard). Otherwise, filter them in your script (not recommended).

Resources