Bash XSV auto populate empty values with CSV column - bash

I have a CSV export that I need to map to new values in order to import it into a different system. I am using ArangoDB to create this data migration mapping.
Below is the full script used:
#!/bin/bash
execute () {
filepath=$1
prefix=$2
keyField=$3
filename=`basename "${filepath%.csv}"`
collection="$prefix$filename"
filepath="/data-migration/$filepath"
# Check for "_key" column
if ! xsv headers "$1" | grep -q _key
# Add "_key" column using the keyfield provided
then
xsv select $keyField "$1" | sed -e "1s/$keyField/_key/" > "$1._key"
xsv cat columns "$1" "$1._key" > "$1.cat"
mv "$1.cat" "$1"
rm "$1._key"
fi
# Import CSV into Arango Collection
docker exec arango arangoimp --collection "$collection" --type csv "$filepath" --server.password ''
}
# This single line runs the execute() above
execute 'myDirectory/myFile.csv' prefix_ OLD_ORG_ID__C
So far I've deduced that the $keyField (OLD_ORG_ID__C) parameter passed to the execute() function is used in the loop of the script. This looks for the $keyField column and then migrates its values to a newly created _key column using the XSV toolkit.
OLD_ORG_ID__C | _key
A123 -> A123
B123 -> B123
-> ## <-auto populate
Unfortunately, not every row has a value for the OLD_ORG_ID__C column, and as a result the _key for that row is also empty, which then causes the import to Arango to fail.
Note: This _key field is necessary for my AQL scripts to work properly
How can I rewrite the loop to auto-index the blank values?
then
xsv select $keyField "$1" | sed -e "1s/$keyField/_key/" > "$1._key"
xsv cat columns "$1" "$1._key" > "$1.cat"
mv "$1.cat" "$1"
rm "$1._key"
fi
Is there a better way to solve this issue? Perhaps xsv sort by the keyField and then auto-populate from the blank rows to the end?
UPDATE: Per the comments/answer I tried something along these lines, but so far it is still not working.
#!/bin/bash
execute () {
filepath=$1
prefix=$2
keyField=$3
filename=`basename "${filepath%.csv}"`
collection="$prefix$filename"
filepath="/data-migration/$filepath"
# Check for "_key" column
if ! xsv headers "$1" | grep -q _key
# Add "_key" column using the keyfield provided
then
awk -F, 'NR==1 { for(i=1; i<=NF;++i) if ($i == "'$keyField'") field=i; print; next }
$field == "" { $field = "_generated_" ++n }1' $1 > $1-test.csv
fi
}
# import a single collection if needed
execute 'agas/Account.csv' agas_ OLD_ORG_ID__C
This creates an Account-test.csv file, but unfortunately it does not have the "_key" column or any changes to the OLD_ORG_ID__C values. Preferably I would only want to see the "_key" values populated with auto-numbered values when OLD_ORG_ID__C is blank, otherwise they should copy the provided value.

If your question is "how can I find from the first header line of a CSV file which field is named OLD_ORG_ID__C, then on subsequent lines put a unique value in this column if it is empty" try something like
awk -F, -v OFS=, 'NR==1 { for(i=1; i<=NF;++i) if ($i == "OLD_ORG_ID__C") field=i ; print; next }
$field == "" { $field = "_generated_" ++n }1' file >newfile
This has no provision for coping with complexities like quoted fields with embedded commas. (I have no idea what xsv is but maybe it would be better equipped for such scenarios?)
If I can guess what this code does
xsv select $keyField "$1" |
sed -e "1s/$keyField/_key/" > "$1._key"
then probably you could replace it with something like
xsv select "$keyField" "$1" |
awk -v field="$keyField" 'NR==1 { $0 = field }
/^$/ { $0 = NR } 1' >"$1._key"
to replace the first line with the value of $keyField and replace any subsequent empty lines with their line number.
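Putting that together, the whole if-block from the question might end up looking like the sketch below. This is untested guesswork rather than a drop-in fix: the header is forced to _key (what the original sed intended) instead of $keyField, and blank keys are filled with _generated_ plus the line number.
then
xsv select "$keyField" "$1" |
awk 'NR==1 { $0 = "_key" } /^$/ { $0 = "_generated_" NR } 1' > "$1._key"
xsv cat columns "$1" "$1._key" > "$1.cat"
mv "$1.cat" "$1"
rm "$1._key"
fi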

Related

How can I extract a value from .ini using sed [duplicate]

I have a parameters.ini file, such as:
[parameters.ini]
database_user = user
database_version = 20110611142248
I want to read in and use the database version specified in the parameters.ini file from within a bash shell script so I can process it.
#!/bin/sh
# Need to get database version from parameters.ini file to use in script
php app/console doctrine:migrations:migrate $DATABASE_VERSION
How would I do this?
How about grepping for that line then using awk
version=$(awk -F "=" '/database_version/ {print $2}' parameters.ini)
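The captured value still carries the space that follows the = in the file, so it may be worth stripping whitespace before handing it to the command from the question (the stripping line is my addition):
version=$(awk -F "=" '/database_version/ {print $2}' parameters.ini)
version=${version// /}   # drop the spaces picked up from " = "
php app/console doctrine:migrations:migrate "$version"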
You can use the bash native parser to interpret ini values:
$ source <(grep = file.ini)
Sample file:
[section-a]
var1=value1
var2=value2
IPS=( "1.2.3.4" "1.2.3.5" )
To access the variables, you simply print them: echo $var1. You may also use arrays as shown above (echo ${IPS[@]}).
If you only want a single value just grep for it:
source <(grep var1 file.ini)
For the demo, check this recording at asciinema.
It is simple as you don't need for any external library to parse the data, but it comes with some disadvantages. For example:
If you have spaces around = (between variable name and value), then you have to trim the spaces first, e.g.
$ source <(grep = file.ini | sed 's/ *= */=/g')
Or if you don't care about the spaces (including in the middle), use:
$ source <(grep = file.ini | tr -d ' ')
To support ; comments, replace them with #:
$ sed "s/;/#/g" foo.ini | source /dev/stdin
Sections aren't supported (e.g. if you have [section-name], you have to filter it out as shown above, e.g. with grep =); the same goes for other unexpected content.
If you need to read a specific value under a specific section, use grep -A, sed, awk or ex.
E.g.
source <(grep = <(grep -A5 '\[section-b\]' file.ini))
Note: Where -A5 is the number of rows to read in the section. Replace source with cat to debug.
If you've got any parsing errors, ignore them by adding: 2>/dev/null
See also:
How to parse and convert ini file into bash array variables? at serverfault SE
Are there any tools for modifying INI style files from shell script
A sed one-liner that takes sections into account. Example file:
[section1]
param1=123
param2=345
param3=678
[section2]
param1=abc
param2=def
param3=ghi
[section3]
param1=000
param2=111
param3=222
Say you want param2 from section2. Run the following:
sed -nr "/^\[section2\]/ { :l /^param2[ ]*=/ { s/[^=]*=[ ]*//; p; q;}; n; b l;}" ./file.ini
will give you
def
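A small wrapper makes the same one-liner reusable; the ini_get name and argument layout below are my own invention, and the section and key are assumed to contain no regex metacharacters:
ini_get() {   # usage: ini_get FILE SECTION KEY
sed -nr "/^\[$2\]/ { :l /^$3[ ]*=/ { s/[^=]*=[ ]*//; p; q; }; n; b l; }" "$1"
}
ini_get ./file.ini section2 param2   # prints: def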
Bash does not provide a parser for these files. Obviously you can use an awk command or a couple of sed calls, but if you are a bash purist and don't want to use any other tools, then you can try the following obscure code:
#!/usr/bin/env bash
cfg_parser ()
{
ini="$(<$1)" # read the file
ini="${ini//[/\[}" # escape [
ini="${ini//]/\]}" # escape ]
IFS=$'\n' && ini=( ${ini} ) # convert to line-array
ini=( ${ini[*]//;*/} ) # remove comments with ;
ini=( ${ini[*]/\ =/=} ) # remove spaces before =
ini=( ${ini[*]/=\ /=} ) # remove spaces after =
ini=( ${ini[*]/\ =\ /=} ) # remove spaces around =
ini=( ${ini[*]/#\\[/\}$'\n'cfg.section.} ) # set section prefix
ini=( ${ini[*]/%\\]/ \(} ) # convert text2function (1)
ini=( ${ini[*]/=/=\( } ) # convert item to array
ini=( ${ini[*]/%/ \)} ) # close array parenthesis
ini=( ${ini[*]/%\\ \)/ \\} ) # the multiline trick
ini=( ${ini[*]/%\( \)/\(\) \{} ) # convert text2function (2)
ini=( ${ini[*]/%\} \)/\}} ) # remove extra parenthesis
ini[0]="" # remove first element
ini[${#ini[*]} + 1]='}' # add the last brace
eval "$(echo "${ini[*]}")" # eval the result
}
cfg_writer ()
{
IFS=' '$'\n'
fun="$(declare -F)"
fun="${fun//declare -f/}"
for f in $fun; do
[ "${f#cfg.section}" == "${f}" ] && continue
item="$(declare -f ${f})"
item="${item##*\{}"
item="${item%\}}"
item="${item//=*;/}"
vars="${item//=*/}"
eval $f
echo "[${f#cfg.section.}]"
for var in $vars; do
echo $var=\"${!var}\"
done
done
}
Usage:
# parse the config file called 'myfile.ini', with the following
# contents:
# [sec2]
# var2='something'
cfg_parser 'myfile.ini'
# enable section called 'sec2' (in the file [sec2]) for reading
cfg.section.sec2
# read the content of the variable called 'var2' (in the file
# var2=XXX). If your var2 is an array, then you can use
# ${var[index]}
echo "$var2"
Bash ini-parser can be found at The Old School DevOps blog site.
Just source your .ini file into the bash script body:
File example.ini:
DBNAME=test
DBUSER=scott
DBPASSWORD=tiger
File example.sh
#!/bin/bash
#Including .ini file
. example.ini
#Test
echo "${DBNAME} ${DBUSER} ${DBPASSWORD}"
All of the solutions I've seen so far also hit on commented-out lines. This one doesn't, if the comment character is ;:
awk -F '=' '{if (! ($0 ~ /^;/) && $0 ~ /database_version/) print $2}' file.ini
You may use crudini tool to get ini values, e.g.:
DATABASE_VERSION=$(crudini --get parameters.ini '' database_version)
One of several possible solutions:
dbver=$(sed -n 's/.*database_version *= *\([^ ]*.*\)/\1/p' < parameters.ini)
echo $dbver
Display the value of my_key in an ini-style my_file:
sed -n -e 's/^\s*my_key\s*=\s*//p' my_file
-n -- do not print anything by default
-e -- execute the expression
s/PATTERN//p -- display anything following this pattern
In the pattern:
^ -- pattern begins at the beginning of the line
\s -- whitespace character
* -- zero or many (whitespace characters)
Example:
$ cat my_file
# Example INI file
something = foo
my_key = bar
not_my_key = baz
my_key_2 = bing
$ sed -n -e 's/^\s*my_key\s*=\s*//p' my_file
bar
So:
Find a pattern where the line begins with zero or many whitespace characters,
followed by the string my_key, followed by zero or many whitespace characters, an equal sign, then zero or many whitespace characters again. Display the rest of the content on that line following that pattern.
Similar to the other Python answers, you can do this using the -c flag to execute a sequence of Python statements given on the command line:
$ python3 -c "import configparser; c = configparser.ConfigParser(); c.read('parameters.ini'); print(c['parameters.ini']['database_version'])"
20110611142248
This has the advantage of requiring only the Python standard library and the advantage of not writing a separate script file.
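Captured into a variable, it plugs straight into the command from the question (assuming python3 is on the PATH):
DATABASE_VERSION=$(python3 -c "import configparser; c = configparser.ConfigParser(); c.read('parameters.ini'); print(c['parameters.ini']['database_version'])")
php app/console doctrine:migrations:migrate "$DATABASE_VERSION"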
Or use a here document for better readability, thusly:
#!/bin/bash
python3 << EOI
import configparser
c = configparser.ConfigParser()
c.read('params.txt')
print(c['chassis']['serialNumber'])
EOI
serialNumber=$(python3 << EOI
import configparser
c = configparser.ConfigParser()
c.read('params.txt')
print(c['chassis']['serialNumber'])
EOI
)
echo $serialNumber
sed
You can use sed to parse the ini configuration file, especially when you have section names like:
# last modified 1 April 2001 by John Doe
[owner]
name=John Doe
organization=Acme Widgets Inc.
[database]
# use IP address in case network name resolution is not working
server=192.0.2.62
port=143
file=payroll.dat
so you can use the following sed script to parse the above data:
# Configuration bindings found outside any section are given
# to the default section.
1 {
x
s/^/default/
x
}
# Lines starting with a #-character are comments.
/#/n
# Sections are unpacked and stored in the hold space.
/\[/ {
s/\[\(.*\)\]/\1/
x
b
}
# Bindings are unpacked and decorated with the section
# they belong to, before being printed.
/=/ {
s/^[[:space:]]*//
s/[[:space:]]*=[[:space:]]*/|/
G
s/\(.*\)\n\(.*\)/\2|\1/
p
}
this will convert the ini data into this flat format:
owner|name|John Doe
owner|organization|Acme Widgets Inc.
database|server|192.0.2.62
database|port|143
database|file|payroll.dat
which is easier to parse with sed, awk or read, since the section name is present on every line.
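For example, once the data is in that section|key|value form, a single awk filter pulls out one value (parse.sed and file.ini below are placeholder names for the sed script and the data above):
sed -n -f parse.sed file.ini | awk -F'|' '$1 == "database" && $2 == "port" { print $3 }'
# prints: 143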
Credits & source: Configuration files for shell scripts, Michael Grünewald
Alternatively, you can use this project: chilladx/config-parser, a configuration parser using sed.
For people (like me) looking to read INI files from shell scripts (read shell, not bash) - I've knocked up a little helper library which tries to do exactly that:
https://github.com/wallyhall/shini (MIT license, do with it as you please. I've linked it above rather than including it inline, as the code is quite lengthy.)
It's somewhat more "complicated" than the simple sed lines suggested above - but works on a very similar basis.
Function reads in a file line-by-line - looking for section markers ([section]) and key/value declarations (key=value).
Ultimately you get a callback to your own function - section, key and value.
Here is my version, which parses sections and populates a global associative array g_iniProperties with it.
Note that this works only with bash v4.2 and higher.
function parseIniFile() { #accepts the name of the file to parse as argument ($1)
#declare syntax below (-gA) only works with bash 4.2 and higher
unset g_iniProperties
declare -gA g_iniProperties
currentSection=""
while read -r line
do
if [[ $line = \[* ]] ; then
currentSection=$(echo $line | sed -e 's/\r//g' | tr -d "[]")
else
if [[ $line = *=* ]] ; then
cleanLine=$(echo $line | sed -e 's/\r//g')
key=$currentSection.$(echo $cleanLine | awk -F: '{ st = index($0,"=");print substr($0,0,st-1)}')
value=$(echo $cleanLine | awk -F: '{ st = index($0,"=");print substr($0,st+1)}')
g_iniProperties[$key]=$value
fi
fi;
done < $1
}
And here is a sample code using the function above:
parseIniFile "/path/to/myFile.ini"
for key in "${!g_iniProperties[@]}"; do
echo "Found key/value $key = ${g_iniProperties[$key]}"
done
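A single value can also be looked up directly once the file has been parsed; mysection and mykey below are hypothetical names:
parseIniFile "/path/to/myFile.ini"
echo "Value of mykey in [mysection]: ${g_iniProperties[mysection.mykey]}"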
Yet another implementation using awk with a little more flexibility.
function parse_ini() {
cat /dev/stdin | awk -v section="$1" -v key="$2" '
BEGIN {
if (length(key) > 0) { params=2 }
else if (length(section) > 0) { params=1 }
else { params=0 }
}
match($0,/#/) { next }
match($0,/^\[(.+)\]$/){
current=substr($0, RSTART+1, RLENGTH-2)
found=current==section
if (params==0) { print current }
}
match($0,/(.+)=(.+)/) {
if (found) {
if (params==2 && key==$1) { print $3 }
if (params==1) { printf "%s=%s\n",$1,$3 }
}
}'
}
You can call it passing between 0 and 2 params:
cat myfile1.ini myfile2.ini | parse_ini # List section names
cat myfile1.ini myfile2.ini | parse_ini 'my-section' # Prints keys and values from a section
cat myfile1.ini myfile2.ini | parse_ini 'my-section' 'my-key' # Print a single value
complex simplicity
ini file
test.ini
[section1]
name1=value1
name2=value2
[section2]
name1=value_1
name2 = value_2
bash script with read and execute
/bin/parseini
#!/bin/bash
set +a
while read p; do
reSec='^\[(.*)\]$'
#reNV='[ ]*([^ ]*)+[ ]*=(.*)' #Remove only spaces around name
reNV='[ ]*([^ ]*)+[ ]*=[ ]*(.*)' #Remove spaces around name and spaces before value
if [[ $p =~ $reSec ]]; then
section=${BASH_REMATCH[1]}
elif [[ $p =~ $reNV ]]; then
sNm=${section}_${BASH_REMATCH[1]}
sVa=${BASH_REMATCH[2]}
set -a
eval "$(echo "$sNm"=\""$sVa"\")"
set +a
fi
done < $1
then in another script I source the results of the command and can use any variables within
test.sh
#!/bin/bash
source parseini test.ini
echo $section2_name2
finally from command line the output is thus
# ./test.sh
value_2
Some of the answers don't respect comments. Some don't respect sections. Some recognize only one syntax (only ":" or only "="). Some Python answers fail on my machine because of differing capitalization or failing to import the sys module. All are a bit too terse for me.
So I wrote my own, and if you have a modern Python, you can probably call this from your Bash shell. It has the advantage of adhering to some of the common Python coding conventions, and even provides sensible error messages and help. To use it, name it something like myconfig.py (do NOT call it configparser.py or it may try to import itself), make it executable, and call it like
value=$(myconfig.py something.ini sectionname value)
Here's my code for Python 3.5 on Linux:
#!/usr/bin/env python3
# Last Modified: Thu Aug 3 13:58:50 PDT 2017
"""A program that Bash can call to parse an .ini file"""
import sys
import configparser
import argparse
if __name__ == '__main__':
parser = argparse.ArgumentParser(description="A program that Bash can call to parse an .ini file")
parser.add_argument("inifile", help="name of the .ini file")
parser.add_argument("section", help="name of the section in the .ini file")
parser.add_argument("itemname", help="name of the desired value")
args = parser.parse_args()
config = configparser.ConfigParser()
config.read(args.inifile)
print(config.get(args.section, args.itemname))
I wrote a quick and easy python script to include in my bash script.
For example, your ini file is called food.ini
and in the file you can have some sections and some lines:
[FRUIT]
Oranges = 14
Apples = 6
Copy this small 6-line Python script and save it as, say, read_ini.py (don't call it configparser.py, or it will shadow the configparser module it imports):
#!/usr/bin/python
import configparser
import sys
config = configparser.ConfigParser()
config.read(sys.argv[1])
print(config.get(sys.argv[2], sys.argv[3]))
Now, in your bash script you could do this for example.
OrangeQty=$(python read_ini.py food.ini FRUIT Oranges)
or
ApplesQty=$(python read_ini.py food.ini FRUIT Apples)
echo $ApplesQty
This presupposes:
you have Python installed
you have the configparser library installed (this should come with a std python installation)
Hope it helps
:¬)
An explanation of the answer with the sed one-liner.
[section1]
param1=123
param2=345
param3=678
[section2]
param1=abc
param2=def
param3=ghi
[section3]
param1=000
param2=111
param3=222
sed -nr "/^\[section2\]/ { :l /^\s*[^#].*/ p; n; /^\[/ q; b l; }" ./file.ini
To understand, it will be easier to format the line like this:
sed -nr "
# start processing when we found the word \"section2\"
/^\[section2\]/ { #the set of commands inside { } will be executed
#create a label \"l\" (https://www.grymoire.com/Unix/Sed.html#uh-58)
:l /^\s*[^#].*/ p;
# move on to the next line. For the first run it is the \"param1=abc\"
n;
# check if this line is beginning of new section. If yes - then exit.
/^\[/ q
#otherwise jump to the label \"l\"
b l
}
" file.ini
This script takes parameters as follows:
pars_ini.ksh <path to ini file> <name of section in ini file> <the name in name=value to return>
For example, if your ini has:
[ environment ]
a=x
[ DataBase_Sector ]
DSN = something
Then calling :
pars_ini.ksh /users/bubu_user/parameters.ini DataBase_Sector DSN
this will retrieve the value "something".
The script "pars_ini.ksh":
#!/bin/ksh
#INI_FILE=path/to/file.ini
#INI_SECTION=TheSection
# BEGIN parse-ini-file.sh
# SET UP THE MINIMUM VARS FIRST
alias sed=/usr/local/bin/sed
INI_FILE=$1
INI_SECTION=$2
INI_NAME=$3
INI_VALUE=""
eval `sed -e 's/[[:space:]]*\=[[:space:]]*/=/g' \
-e 's/;.*$//' \
-e 's/[[:space:]]*$//' \
-e 's/^[[:space:]]*//' \
-e "s/^\(.*\)=\([^\"']*\)$/\1=\"\2\"/" \
< $INI_FILE \
| sed -n -e "/^\[$INI_SECTION\]/,/^\s*\[/{/^[^;].*\=.*/p;}"`
TEMP_VALUE=`echo "$"$INI_NAME`
echo `eval echo $TEMP_VALUE`
This implementation uses awk and has the following advantages:
Will only return the first matching entry
Ignores lines that start with a ;
Trims leading and trailing whitespace, but not internal whitespace
Formatted version:
awk -F '=' '/^\s*database_version\s*=/ {
sub(/^ +/, "", $2);
sub(/ +$/, "", $2);
print $2;
exit;
}' parameters.ini
One-liner:
awk -F '=' '/^\s*database_version\s*=/ { sub(/^ +/, "", $2); sub(/ +$/, "", $2); print $2; exit; }' parameters.ini
You can use the CSV parser xsv to parse INI data.
cargo install xsv
$ cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
$ xsv select -d "=" - <<< "$( cat /etc/*release )" | xsv search --no-headers --select 1 "DISTRIB_CODENAME" | xsv select 2
xenial
or from a file.
$ xsv select -d "=" - file.ini | xsv search --no-headers --select 1 "DISTRIB_CODENAME" | xsv select 2
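Applied to the parameters.ini from the question, an untested sketch following the same pipeline might look like this; the grep keeps only key=value lines so every record has the same number of fields, and tr strips the spaces around = (both steps are my additions):
DATABASE_VERSION=$(grep = parameters.ini | tr -d ' ' | xsv select -d "=" - | xsv search --no-headers --select 1 "database_version" | xsv select 2)
echo "$DATABASE_VERSION"   # 20110611142248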
My version of the one-liner
#!/bin/bash
#Reader for MS Windows 3.1 Ini-files
#Usage: inireader.sh
# e.g.: inireader.sh win.ini ERRORS DISABLE
# would return value "no" from the section of win.ini
#[ERRORS]
#DISABLE=no
INIFILE=$1
SECTION=$2
ITEM=$3
cat $INIFILE | sed -n "/^\[$SECTION\]/,/^\[.*\]/p" | grep "^[[:space:]]*$ITEM[[:space:]]*=" | sed 's/.*=[[:space:]]*//'
Just finished writing my own parser. I tried to use the various parsers found here; none seems to work with both ksh93 (AIX) and bash (Linux).
It's old programming style - parsing line by line. Pretty fast since it uses few external commands, but a bit slower because of all the eval calls required for the dynamic name of the array.
The ini format supports 3 special syntaxes:
includefile=ini file -->
Load an additional ini file. Useful for splitting an ini into multiple files, or re-using pieces of configuration
includedir=directory -->
Same as includefile, but include a complete directory
includesection=section -->
Copy an existing section to the current section.
I used all those syntaxes to build pretty complex, re-usable ini files. Useful to install products when installing a new OS - we do that a lot.
Values can be accessed with ${ini[$section.$item]}. The array MUST be defined before calling this.
Have fun. Hope it's useful for someone else!
function Show_Debug {
[[ $DEBUG = YES ]] && echo "DEBUG $@"
}
function Fatal {
echo "$@. Script aborted"
exit 2
}
#-------------------------------------------------------------------------------
# This function load an ini file in the array "ini"
# The "ini" array must be defined in the calling program (typeset -A ini)
#
# It could be any array name, the default array name is "ini".
#
# There is heavy usage of "eval" since ksh and bash do not support
# reference variable. The name of the ini is passed as variable, and must
# be "eval" at run-time to work. Very specific syntax was used and must be
# understood before making any modifications.
#
# It complexify greatly the program, but add flexibility.
#-------------------------------------------------------------------------------
function Load_Ini {
Show_Debug "$0($@)"
typeset ini_file="$1"
# Name of the array to fill. By default, it's "ini"
typeset ini_array_name="${2:-ini}"
typeset section variable value line my_section file subsection value_array include_directory all_index index sections pre_parse
typeset LF="
"
if [[ ! -s $ini_file ]]; then
Fatal "The ini file is empty or absent in $0 [$ini_file]"
fi
include_directory=$(dirname $ini_file)
include_directory=${include_directory:-$(pwd)}
Show_Debug "include_directory=$include_directory"
section=""
# Since this code support both bash and ksh93, you cannot use
# the syntax "echo xyz|while read line". bash doesn't work like
# that.
# It forces the use of "<<<", introduced in bash and ksh93.
Show_Debug "Reading file $ini_file and putting the results in array $ini_array_name"
pre_parse="$(sed 's/^ *//g;s/#.*//g;s/ *$//g' <$ini_file | egrep -v '^$')"
while read line; do
if [[ ${line:0:1} = "[" ]]; then # Is the line starting with "["?
# Replace [section_name] to section_name by removing the first and last character
section="${line:1}"
section="${section%\]}"
eval "sections=\${$ini_array_name[sections_list]}"
sections="$sections${sections:+ }$section"
eval "$ini_array_name[sections_list]=\"$sections\""
Show_Debug "$ini_array_name[sections_list]=\"$sections\""
eval "$ini_array_name[$section.exist]=YES"
Show_Debug "$ini_array_name[$section.exist]='YES'"
else
variable=${line%%=*} # content before the =
value=${line#*=} # content after the =
if [[ $variable = includefile ]]; then
# Include a single file
Load_Ini "$include_directory/$value" "$ini_array_name"
continue
elif [[ $variable = includedir ]]; then
# Include a directory
# If the value doesn't start with a /, add the calculated include_directory
if [[ $value != /* ]]; then
value="$include_directory/$value"
fi
# go thru each file
for file in $(ls $value/*.ini 2>/dev/null); do
if [[ $file != *.ini ]]; then continue; fi
# Load a single file
Load_Ini "$file" "$ini_array_name"
done
continue
elif [[ $variable = includesection ]]; then
# Copy an existing section into the current section
eval "all_index=\"\${!$ini_array_name[@]}\""
# It's not necessarily fast. Need to go thru all the array
for index in $all_index; do
# Only if it is the requested section
if [[ $index = $value.* ]]; then
# Evaluate the subsection [section.subsection] --> subsection
subsection=${index#*.}
# Get the current value (source section)
eval "value_array=\"\${$ini_array_name[$index]}\""
# Assign the value to the current section
# The $value_array must be resolved on the second pass of the eval, so make sure the
# first pass doesn't resolve it (\$value_array instead of $value_array).
# It must be evaluated on the second pass in case there is special character like $1,
# or ' or " in it (code).
eval "$ini_array_name[$section.$subsection]=\"\$value_array\""
Show_Debug "$ini_array_name[$section.$subsection]=\"$value_array\""
fi
done
fi
# Add the value to the array
eval "current_value=\"\${$ini_array_name[$section.$variable]}\""
# If there's already something for this field, add it with the current
# content separated by a LF (line_feed)
new_value="$current_value${current_value:+$LF}$value"
# Assign the content
# The $new_value must be resolved on the second pass of the eval, so make sure the
# first pass doesn't resolve it (\$new_value instead of $new_value).
# It must be evaluated on the second pass in case there is special character like $1,
# or ' or " in it (code).
eval "$ini_array_name[$section.$variable]=\"\$new_value\""
Show_Debug "$ini_array_name[$section.$variable]=\"$new_value\""
fi
done <<< "$pre_parse"
Show_Debug "exit $0($@)\n"
}
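A minimal usage sketch (the file path and the database.server key below are hypothetical), following the access pattern described above:
typeset -A ini                   # the array MUST exist before the call
Load_Ini "/path/to/product.ini"  # fills the default array "ini"
echo "${ini[database.server]}"   # value of "server" in [database]
echo "${ini[sections_list]}"     # space-separated list of parsed sections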
When I use a password in base64, I put the separator ":" because the base64 string may have "=". For example (I use ksh):
> echo "Abc123" | base64
QWJjMTIzCg==
In parameters.ini put the line pass:QWJjMTIzCg==, and finally:
> PASS=`awk -F":" '/pass/ {print $2 }' parameters.ini | base64 --decode`
> echo "$PASS"
Abc123
If the line has spaces like "pass : QWJjMTIzCg== " add | tr -d ' ' to trim them:
> PASS=`awk -F":" '/pass/ {print $2 }' parameters.ini | tr -d ' ' | base64 --decode`
> echo "[$PASS]"
[Abc123]
This uses the system perl and clean regular expressions:
cat parameters.ini | perl -0777ne 'print "$1" if /\[\s*parameters\.ini\s*\][\s\S]*?\sdatabase_version\s*=\s*(.*)/'
Karen Gabrielyan's answer, among the other answers, was the best, but in some environments we don't have awk (like typical busybox), so I changed the answer to the code below.
trim()
{
local trimmed="$1"
# Strip leading space.
trimmed="${trimmed## }"
# Strip trailing space.
trimmed="${trimmed%% }"
echo "$trimmed"
}
function parseIniFile() { #accepts the name of the file to parse as argument ($1)
#declare syntax below (-gA) only works with bash 4.2 and higher
unset g_iniProperties
declare -gA g_iniProperties
currentSection=""
while read -r line
do
if [[ $line = \[* ]] ; then
currentSection=$(echo $line | sed -e 's/\r//g' | tr -d "[]")
else
if [[ $line = *=* ]] ; then
cleanLine=$(echo $line | sed -e 's/\r//g')
key=$(trim $currentSection.$(echo $cleanLine | cut -d'=' -f1))
value=$(trim $(echo $cleanLine | cut -d'=' -f2))
g_iniProperties[$key]=$value
fi
fi;
done < $1
}
If Python is available, the following will read all the sections, keys and values and save them in variables with their names following the format "[section]_[key]". Python can read .ini files properly, so we make use of it.
#!/bin/bash
eval $(python3 << EOP
from configparser import ConfigParser
config = ConfigParser()
config.read("config.ini")
for section in config.sections():
for (key, val) in config.items(section):
print(section + "_" + key + "=\"" + val + "\"")
EOP
)
echo "Environment_type: ${Environment_type}"
echo "Environment_name: ${Environment_name}"
config.ini
[Environment]
type = DEV
name = D01
If using sections, this will do the job :
Example raw output :
$ ./settings
[section]
SETTING_ONE=this is setting one
SETTING_TWO=This is the second setting
ANOTHER_SETTING=This is another setting
Regexp parsing :
$ ./settings | sed -n -E "/^\[.*\]/{s/\[(.*)\]/\1/;h;n;};/^[a-zA-Z]/{s/#.*//;G;s/([^ ]*) *= *(.*)\n(.*)/\3_\1='\2'/;p;}"
section_SETTING_ONE='this is setting one'
section_SETTING_TWO='This is the second setting'
section_ANOTHER_SETTING='This is another setting'
Now all together :
$ eval "$(./settings | sed -n -E "/^\[.*\]/{s/\[(.*)\]/\1/;h;n;};/^[a-zA-Z]/{s/#.*//;G;s/([^ ]*) *= *(.*)\n(.*)/\3_\1='\2'/;p;}")"
$ echo $section_SETTING_TWO
This is the second setting
I have a nice one-liner (assuming you have php and jq installed):
cat file.ini | php -r "echo json_encode(parse_ini_string(file_get_contents('php://stdin'), true, INI_SCANNER_RAW));" | jq '.section.key'
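For the file from the original question, the same pipeline can be pointed at the [parameters.ini] section; the section name contains a dot, so it is indexed with the bracket form in jq, and -r strips the JSON quotes:
cat parameters.ini | php -r "echo json_encode(parse_ini_string(file_get_contents('php://stdin'), true, INI_SCANNER_RAW));" | jq -r '.["parameters.ini"].database_version'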
This thread does not have enough solutions to choose from, thus here is my solution; it does not require tools like sed or awk:
grep '^\[section\]' -A 999 config.ini | tail -n +2 | grep -B 999 '^\[' | head -n -1 | grep '^key' | cut -d '=' -f 2
If you expect sections with more than 999 lines, feel free to adapt the example above. Note that you may want to trim the resulting value to remove spaces or a comment string after the value. Remove the ^ if you need to match keys that do not start at the beginning of the line, as in the example of the question; better, in that case, match explicitly for whitespace and tabs.
If you have multiple values in a given section you want to read, but want to avoid reading the file multiple times:
CONFIG_SECTION=$(grep '^\[section\]' -A 999 config.ini | tail -n +2 | grep -B 999 '^\[' | head -n -1)
KEY1=$(echo ${CONFIG_SECTION} | tr ' ' '\n' | grep key1 | cut -d '=' -f 2)
echo "KEY1=${KEY1}"
KEY2=$(echo ${CONFIG_SECTION} | tr ' ' '\n' | grep key2 | cut -d '=' -f 2)
echo "KEY2=${KEY2}"

How to merge two or more lines if they start with the same word?

I have a file like this:
AAKRKA HIST1H1B AAGAGAAKRKATGPP
AAKRKA HIST1H1E RKSAGAAKRKASGPP
AAKRLN ACAT1 LMTADAAKRLNVTPL
AAKRLN SUCLG2 NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA
I would like to kind of merge 2 lines if they are exactly the same in the 1st column. The desired output is:
AAKRKA HIST1H1B,HIST1H1E AAGAGAAKRKATGPP,RKSAGAAKRKASGPP
AAKRLN ACAT1,SUCLG2 LMTADAAKRLNVTPL,NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA
Sometimes there could be more than two lines starting with the same word. How could I reach the desired output with bash/awk?
Thanks for help!
Since this resembles SQL-like group operations, you can use sqlite3, which can be called from bash.
with the given inputs
$ cat aqua.txt
AAKRKA HIST1H1B AAGAGAAKRKATGPP
AAKRKA HIST1H1E RKSAGAAKRKASGPP
AAKRLN ACAT1 LMTADAAKRLNVTPL
AAKRLN SUCLG2 NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA
$
Script:
$ cat ./sqlite_join.sh
#!/bin/sh
sqlite3 << EOF
create table data(a,b,c);
.separator ' '
.import $1 data
select a, group_concat(b) , group_concat(c) from data group by a;
EOF
$
Results
$ ./sqlite_join.sh aqua.txt
AAKRKA HIST1H1B,HIST1H1E AAGAGAAKRKATGPP,RKSAGAAKRKASGPP
AAKRLN ACAT1,SUCLG2 LMTADAAKRLNVTPL,NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA
$
This is a two-liner in awk; the first line stores the second and third fields in associative arrays indexed by the first field, accumulating fields with identical indices with leading commas before each field, and the second line iterates over the two arrays, deleting the leading comma on output:
{ second[$1] = second[$1] "," $2; third[$1] = third[$1] "," $3 }
END { for (i in second) print i, substr(second[i],2), substr(third[i],2) }
I made no assumptions about the order of the input or the output. If you want sorted output, pipe the output through sort. You can run the program at https://ideone.com/sbgLNk.
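If the two lines are saved to a file (calling it merge.awk here is just my choice), running it looks like this:
awk -f merge.awk input.txt          # order as produced by awk's "for (i in ...)"
awk -f merge.awk input.txt | sort   # sorted by the first column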
try this:
DATAFILE=data.txt
cut -d " " -f1 < $DATAFILE | sort | uniq |
while read key; do
column1="$key"
column2=""
column3=""
grep "$key" $DATAFILE |
while read line; do
set -- $line
[ -n "$column2" ] && [ -n "$2" ] && column2="$column2,"
[ -n "$column3" ] && [ -n "$3" ] && column3="$column3,"
column2="$column2$2"
column3="$column3$3"
echo "$column1 $column2 $column3"
done | tail -n1
done

Append data to another column in a CSV if duplicate is found in first column

I have a CSV with data such as:
somename1,value1
somename1,value2
somename1,value3
anothername1,anothervalue1
anothername1,anothervalue2
anothername1,anothervalue3
I would like to rewrite the CSV so that when a duplicate in column 1 is found, the data is appended as new columns on the first entry.
For instance, the desired output would be :
somename1,value1,value2,value3
anothername1,anothervalue1,anothervalue2,anothervalue3
How can I do this in a shell script?
TIA
You need much more than just removing duplicated lines when using Awk; you need logic, as below, to build up an array of elements for each unique entry in $1.
The solution creates a hash map with the unique values of $1 as the array indices and the corresponding $2 values, joined with a , separator, as the elements.
awk 'BEGIN{FS=OFS=","; prev="";}{ if (prev != $1) {unique[$1]=$2;} else {unique[$1]=(unique[$1]","$2)} prev=$1; }END{for (i in unique) print i,unique[i]}' file
anothername1,anothervalue1,anothervalue2,anothervalue3
somename1,value1,value2,value3
A more readable version would be to have something like,
BEGIN {
# set input and output field separator to ',' and initialize
# variable holding last instance of $1 to empty
FS=OFS=","
prev=""
}
{
# Update the value of $2 directly in the hash array only when new
# unique elements are found in $1
if (prev != $1){
unique[$1]=$2
}
else {
unique[$1]=(unique[$1]","$2)
}
# Update the current $1
prev=$1
}
END {
for (i in unique) {
print i,unique[i]
}
}
FILE=$1
NAMES=`cut -d',' -f 1 $FILE | sort -u`
for NAME in $NAMES; do
echo -n "$NAME"
VALUES=`grep "$NAME" $FILE | cut -d',' -f2`
for VAL in $VALUES; do
echo -n ",$VAL"
done
echo ""
done
running with your data generates:
>bash script.sh data1.txt
anothername1,anothervalue1,anothervalue2,anothervalue3
somename1,value1,value2,value3
The filename of your data has to be passed as a parameter. The output can be written to a new file by redirecting:
>bash script.sh data1.txt > data_new.txt

Looping through multiline CSV rows in bash

I have the following csv file with 3 columns:
row1value1,row1value2,"row1
multi
line
value"
row2value1,row2value2,"row2
multi
line
value"
Is there a way to loop through its rows like (this does not work, it reads lines):
while read $ROW
do
#some code that uses $ROW variable
done < file.csv
Using gnu-awk you can do this using FPAT:
awk -v RS='"\n' -v FPAT='"[^"]*"|[^,]*' '{
print "Record #", NR, " =======>"
for (i=1; i<=NF; i++) {
sub(/^"/, "", $i)
printf "Field # %d, value=[%s]\n", i, $i
}
}' file.csv
Record # 1 =======>
Field # 1, value=[row1value1]
Field # 2, value=[row1value2]
Field # 3, value=[row1
multi
line
value]
Record # 2 =======>
Field # 1, value=[row2value1]
Field # 2, value=[row2value2]
Field # 3, value=[row2
multi
line
value]
However, as I commented above, a dedicated CSV parser using PHP, Perl or Python will be more robust for this job.
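For completeness, here is one hedged way to lean on such a parser without leaving bash, using python3 -c (assuming python3 is installed): the csv module handles the quoting, and each record is re-emitted on a single physical line, with the ASCII unit separator between fields and embedded newlines escaped as \n.
while IFS=$'\x1f' read -r col1 col2 col3; do
echo "third column: $col3"
done < <(python3 -c '
import csv, sys
for row in csv.reader(sys.stdin):
    # join fields with the unit separator; escape embedded newlines
    print("\x1f".join(field.replace("\n", "\\n") for field in row))
' < file.csv)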
Here is a pure bash solution. The multiline_csv.sh script translates the multiline csv into standard csv by replacing the newline characters between quotes with some replacement string. So the usage is
./multiline_csv.sh CSVFILE SEP
I placed your example script in a file called ./multi.csv. Running the command ./multiline_csv.sh ./multi.csv "\n" yielded the following output
[ericthewry@eric-arch-pc stackoverflow]$ ./multiline_csv.sh ./multi.csv "\n"
r1c2,r1c2,"row1\nmulti\nline\nvalue"
r2c1,r2c2,"row2\nmultiline\nvalue"
This can be easily translated back to the original csv file using printf:
[ericthewry@eric-arch-pc stackoverflow]$ printf "$(./multiline_csv.sh ./multi.csv "\n")\n"
r1c2,r1c2,"row1
multi
line
value"
r2c1,r2c2,"row2
multiline
value"
This might be an Arch-specific quirk of echo/sprintf (I'm not sure), but you could use some other separator string like ~~~++??//NEWLINE\\??++~~~ that you could sed out if need be.
# multiline_csv.sh
open=0
line_is_open(){
quote="$2"
(printf "$1" | sed -e "s/\(.\)/\1\n/g") | (while read char; do
if [[ "$char" = '"' ]]; then
open=$((($open + 1) % 2))
fi
done && echo $open)
}
cat "$1" | while read ln ; do
flatline="${ln}"
open=$(line_is_open "${ln}" $open)
until [[ "$open" = "0" ]]; do
if read newln
then
flatline="${flatline}$2${newln}"
open=$(line_is_open "${newln}" $open)
else
break
fi
done
echo "${flatline}"
done
Once you've done this translation, you can proceed as you would normally via the while read $ROW do ... done method.

Use awk to parse source code

I'm looking to create documentation from source code that I have. I've been looking around and something like awk seems like it will work, but I've had no luck so far. The information is split in two files, file1.c and file2.c.
Note: I've set up an automatic build environment for the program. This detects changes in the source and builds it. I would like to generate a text file containing a list of any variables which have been modified since the last successful build. The script I'm looking for would be a post-build step, and would run after compilation
In file1.c I have a list of function calls (all the same function) that have a string name to identify them such as:
newFunction("THIS_IS_THE_STRING_I_WANT", otherVariables, 0, &iAlsoNeedThis);
newFunction("I_WANT_THIS_STRING_TOO", otherVariable, 0, &iAnotherOneINeed);
etc...
The fourth parameter in the function call names the variable whose value is defined in file2. For example:
iAlsoNeedThis = 25;
iAnotherOneINeed = 42;
etc...
I'm looking to output the list to a txt file in the following format:
THIS_IS_THE_STRING_I_WANT = 25
I_WANT_THIS_STRING_TOO = 42
Is there any way to do this?
Thanks
Here is a start:
NR==FNR { # Only true when we are reading the first file
split($1,s,"\"") # Get the string in quotes from the first field
gsub(/[^a-zA-Z]/,"",$4) # Remove the non-alpha chars from the fourth field
m[$4]=s[2] # Create array
next
}
$1 in m { # Match field four from file1 with field one of file2
sub(/;/,"") # Get rid of the ;
print m[$1],$2,$3 # Print output
}
Saving this script.awk and running it with your example produces:
$ awk -f script.awk file1 file2
THIS_IS_THE_STRING_I_WANT = 25
I_WANT_THIS_STRING_TOO = 42
Edit:
The modifications you require affect the first line of the script:
NR==FNR && $3=="0," && /start here/,/end here/ {
You can do it in the shell like so.
#!/bin/sh
eval $(sed 's/[^a-zA-Z0-9=]//g' file2)
while read -r line; do
case $line in
(newFunction*)
set -- $line
string=${1#*\"}
string=${string%%\"*}
while test $# -gt 1; do shift; done
x=${1#&}
x=${x%);}
eval x=\$$x
printf '%s = %s\n' $string $x
esac
done < file1.c
Assumptions: newFunction is at the start of the line. Nothing follows the );. Whitespace exactly as in your samples. Output
THIS_IS_THE_STRING_I_WANT = 25
I_WANT_THIS_STRING_TOO = 42
You can source file2.c so the variables get defined in bash (for the sample shown, the spaces around = would first have to be stripped for the assignments to be valid shell syntax). Then you just have to print $iAlsoNeedThis to get the value from iAlsoNeedThis = 25;.
Sourcing can be done with . file2.c.
Then, what you can do is:
while read line;
do
name=$(echo $line | cut -d"\"" -f2);
value=$(echo $line | cut -d"&" -f2 | cut -d")" -f1);
echo $name = ${!value};
done < file1.c
to get the THIS_IS_THE_STRING_I_WANT, I_WANT_THIS_STRING_TOO text.
