Get tokens from a String until they are exhausted in shell script - bash

I have a shell script that reads input strings from stdin and gets only part of the value from the input. The input string can have any number of key/value pairs and is in the following format:
{"input0":"name:/data/name0.csv",
"input1":"name:/data/name1.csv",
....}
So in the above example, I want to get these as the output of my script:
/data/name0.csv
/data/name1.csv
.....
I think I need two while loops: one to keep reading from stdin, and another to extract the values from the input until there are no more. Can someone let me know how to write the second loop block?

If you have
{"input0":"name:/data/name0.csv",
"input1":"name:/data/name1.csv",
....}
inside a file abc.in, then you can do the following to parse your input with a command called sed:
cat abc.in | sed 's/.*"input[0-9]\+":"name:\(\/data\/name[0-9]\+.csv\)".*$/\1/g'
It basically matches each line against a regular expression of the form: beginning of line, then anything, "input and a number":"name:/data/name and a number.csv", then anything up to the end of line, and replaces the whole match with the captured path.
The result is:
/data/name0.csv
/data/name1.csv
...

A simple BashFAQ #1 loop works here, with jq preprocessing your string into line-oriented content:
while read -r value; do
  echo "${value#name:}"
done < <(jq -r '.[]')
That said, you can actually do the whole thing just in jq with no bash at all; the following transforms your given input directly to your desired output (given jq 1.5 or newer):
jq -r '.[] | sub("name:"; "")'
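For example, assuming the JSON from the question is saved in a file abc.in (as in the earlier answer), that one-liner does the whole job:
jq -r '.[] | sub("name:"; "")' abc.in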
If you really want to do things the fragile way rather than leveraging a JSON parser, you can do that too:
# This is evil: Will fail very badly if input formatting changes
content_re='"name:(.*)"'
while read -r line; do
  [[ $line =~ $content_re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
done
There's still no inner loop required -- just a single loop iterating over lines of input, with the body determining how to process each line.

Related

File input getting consumed inside the while loop

I'm reading through a lookup file and performing a set of actions for each line in the file. However, the while loop only reads the first line in the file and exits. Here's the current code that I have.
sql_from_lkp $lookup

function sql_from_lkp {
  lkp=$1
  while read line; do
    sql_from_columns ${line}
    echo ${line}
  done < ${lkp}
}

function sql_from_columns {
  table_name=$1
  table_column_info_file=${table_name}_columns
  line_count=`cat $table_info_file | wc -l`
  ....
}
By selectively commenting the code, I found that if I comment out the line_count line, the while loop goes through every line in the file and works fine. So the input is getting consumed by the cat statement.
I've checked other answers and understood that ssh usually consumes the file inputs inside while loops if the -n option is not used. But I'm not sure how to fix this case. Need some help.
You've mistyped a variable name: $table_info_file should be $table_column_info_file.
If you correct that, your problem will go away.
By referring to a non-extant variable - the mistyped $table_info_file - you're essentially executing cat | wc -l (no filename argument passed to cat) in sql_from_columns(), which makes cat read from stdin.
Therefore - after having read the 1st line in the while loop - the cat command in sql_from_columns() consumes the entire rest of your input (< ${lkp}), which is why the while loop exits after the 1st iteration.
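A minimal sketch of the effect (lookup.txt is a stand-in for your lookup file); the bare cat has no filename, so it drains the loop's stdin:
while read -r line; do
  echo "line: $line"
  rest=$(cat | wc -l)    # no filename: cat reads everything left on stdin
  echo "lines swallowed: $rest"
done < lookup.txt        # only the first line makes it into the loop body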
Generally,
You should double-quote all your variable references so as not to subject their values to word-splitting and globbing.
Bash won't allow you to call functions before they're defined, so as presented in your question, your code fundamentally couldn't work.
While the legacy `...` syntax for command substitutions is still supported, it has pitfalls that can be avoided with the modern $(...) syntax.
A more efficient way to count lines is to pass the input file to wc -l via < rather than via cat and a pipeline (wc also accepts filename operands directly, but it then prints the input filename after the counts).
Incidentally, you probably would have caught your mistyped variable reference more easily had you done that, as Bash would have reported an ambiguous redirect error in the absence of a filename following <.
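You can see that for yourself: with the mistyped variable unset, the redirection target expands to nothing:
$ wc -l < $table_info_file
bash: $table_info_file: ambiguous redirect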
Here's a reformulation that addresses all the issues:
function sql_from_lkp {
  lkp=$1
  while read line; do
    sql_from_columns "${line}"
    echo "${line}"
  done < "${lkp}"
}

function sql_from_columns {
  table_name=$1
  table_column_info_file=${table_name}_columns
  line_count=$(wc -l < "$table_column_info_file")
  # ...
}
sql_from_lkp "$lookup"
Note that I've only added double quotes where strictly needed to make the command robust; it wouldn't hurt to add them whenever a parameter (variable) is referenced.

Bash/Shell | How to prioritize quote from IFS in read [duplicate]

This question already has answers here:
IFS separate a string like "Hello","World","this","is, a boring", "line"
I'm working with a hand-filled file and I'm having trouble parsing it.
My input file cannot be altered, and the language of my code can't change from bash script.
I made a simple example to make it easy for you ^^
var="hey","i'm","happy, like","you"
IFS="," read -r one two tree for five <<<"$var"
echo $one:$two:$tree:$for:$five
Now I think you already saw the problem here. I would like to get
hey:i'm:happy, like:you:
but I get
hey:i'm:happy: like:you
I need a way to tell read that the quotes take priority over the IFS. I have read about the eval command, but I can't take that risk.
To wrap up: this is a directory file, and the troublesome field is the description, so it could contain basically anything.
The original file looks like this:
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
Edit #1
I will give a better example; the one I used above is too simple, and @StefanHegny found that it causes another error.
while read -r ldapLine
do
  IFS=',' read -r objectClass dumy1 uidNumber gidNumber username description modifyTimestamp nsAccountLock gecos homeDirectory loginShell createTimestamp dumy2 <<<"$ldapLine"
  isANetuser=0
  while IFS=":" read -r -a class
  do
    for i in "${class[@]}"
    do
      if [ "$i" == "account" ]
      then
        isANetuser=1
        break
      fi
    done
  done <<< $objectClass
  if [ $isANetuser == 0 ]
  then
    continue
  fi
  #MORE STUFF APPEND#
done < file.csv
So this is a small part of the code, but it should explain what I do. The file.csv contains a lot of lines like this:
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
If the various bash versions you will use are all more recent than v3.0, when regexes and BASH_REMATCH were introduced, you could use something like the following function: [Note 1]
each_field () {
  local v=,$1;
  while [[ $v =~ ^,(([^\",]*)|\"[^\"]*\") ]]; do
    printf "%s\n" "${BASH_REMATCH[2]:-${BASH_REMATCH[1]:1:-1}}";
    v=${v:${#BASH_REMATCH[0]}};
  done
}
Its argument is a single line (remember to quote it!) and it prints each comma-separated field on a separate line. As written, it assumes that no field has an enclosed newline; that's legal in CSV, but it makes dividing the file into lines a lot more complicated. If you actually needed to deal with that scenario, you could change the \n in the printf statement to a \0 and then use something like xargs -0 to process the output. (Or you could insert whatever processing you need to do to the field in place of the printf statement.)
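For example, here is a sketch of that NUL-delimited route, assuming a hypothetical variant each_field_z whose printf writes '%s\0' instead of '%s\n':
each_field_z "$line" | while IFS= read -r -d '' field; do
  # read -d '' treats the NUL byte as the field delimiter
  printf 'field: %s\n' "$field"
done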
It goes to some trouble to dequote quoted fields without modifying unquoted fields. However, it will fail on fields with embedded double quotes. That's fixable, if necessary. [Note 2]
Here's a sample, in case that wasn't obvious:
while IFS= read -r line; do
  each_field "$line"
  printf "%s\n" "-----"
done <<EOF
type,cn,uid,gid,gecos,"description",timestamp,disabled
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
EOF
Output:
type
cn
uid
gid
gecos
description
timestamp
disabled
-----
top:shadowAccount:account:posixAccount
Jdupon
12345
6789
Jdupon
Jean Mark, Dupon
20140511083750Z
Jean Mark, Dupon
/home/user/Jdupon
/bin/ksh
20120512083750Z
-----
Notes:
I'm not saying you should use this function. You should use a CSV parser, or a language which includes a good CSV parsing library, like python. But I believe this bash function will work, albeit slowly, on correctly-formatted CSV files of a certain common CSV dialect.
Here's a version which handles doubled quotes inside quoted fields, which is the classic CSV syntax for interior quotes:
each_field () {
  local v=,$1;
  while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
    echo "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//\"\"/\"}}";
    v=${v:${#BASH_REMATCH[0]}};
  done
}
My suggestion, as in some previous answers (see below), is to switch the separator to | (and use IFS="|" instead):
sed -r 's/,([^,"]*|"[^"]*")/|\1/g'
This requires a sed that has extended regular expressions (-r) however.
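A minimal usage sketch against the asker's file.csv (note that the double quotes remain part of each field):
while IFS='|' read -r objectClass rest; do
  printf '%s\n' "$objectClass"   # e.g. "top:shadowAccount:account:posixAccount"
done < <(sed -r 's/,([^,"]*|"[^"]*")/|\1/g' file.csv)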
Should I use AWK or SED to remove commas between quotation marks from a CSV file? (BASH)
Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern

Numerically sorting strings from file

Thing is, I would like to numerically sort those strings from the file without changing the content of the file; the strings in the file must not be changed by the sorting operation. I want to use the lines for editing them later, so my variable var should get values starting with 0:wc through 200:wc.
Input:
11:wc
1:wc
0:wc
200:wc
Desired order:
0:wc
1:wc
11:wc
200:wc
I'm using this code, but it has no effect:
sort -k1n $1 | while read line
do
  if [[ ${line:0:1} != "#" ]]
  then
    var=$line
  fi
done <$1
Why not just
$ sort -k1n -t: file.txt
specifying the field separator as ':'.
You need to sort numerically on the first key, and if you need them for later, just read them into an array:
myarray=( $(sort -k1n <file) )
which will provide an array with sorted contents:
0:wc
1:wc
11:wc
200:wc
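You can then iterate over the sorted lines without touching the file, for example:
for var in "${myarray[@]}"; do
  echo "$var"    # 0:wc, then 1:wc, 11:wc, 200:wc
done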
Two issues:
When you create a pipe, such as command | while read line; do ... done, the individual commands in the pipe (command and while read line; do ... done) run in subshells.
The subshells are created with copies of all the current variables, but are not able to reflect changes back into their parent. In this case line is only present in the subshell, and when the subshell terminates, it disappears with it.
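A minimal illustration of that effect:
count=0
printf '1\n2\n3\n' | while read -r line; do count=$((count+1)); done
echo "$count"    # prints 0: the increments happened in the subshell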
You can use bash process substitution to avoid creating a subshell for one of the pipeline commands. For example, you could use:
while read line; do ... done < <(command)
If you both pipe and redirect, the redirect wins.
So when you write command | while read line; do ... done < input, the while loop actually reads from input, not from the output of command.
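For example (input.txt is a hypothetical file):
printf 'from-pipe\n' | while read -r line; do echo "$line"; done < input.txt
# echoes the lines of input.txt; the piped string is never read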

Shell file doesn't extract value properly [grep/cut] from file [bash]

I have a test.txt file which contains key/value pairs just like any other property file.
test.txt
Name="ABC"
Age="24"
Place="xyz"
I want to extract the value of each key into a corresponding variable. For that I have written the following shell script:
master.sh
file=test.txt
while read line; do
  value1=`grep -i 'Name' $file|cut -f2 -d'=' $file`
  value2=`grep -i 'Age' $file|cut -f2 -d'=' $file`
done <$file
but when I execute it, it doesn't run properly: it gives me the entire line extracted by the grep part of the command as output. Can someone please point me to the error?
If I understood your question correctly, the following Bash script should do the trick:
#!/bin/bash
IFS="="
while read k v ; do
test -z "$k" && continue # skip empty lines
declare $k=$v
done <test.txt
echo $Name
echo $Age
echo $Place
Why is that working? Most information can be retrieved from bash's man page:
IFS is the "Internal Field Separator", which is used by bash's 'read' command to separate fields in each line. By default, IFS splits along whitespace, but here it is redefined to split along the equal sign. It is a bash-only solution similar to the 'cut' command, where you define the equal sign as delimiter ('-d =').
The 'read' builtin reads two fields from a line. As only two variables are provided (k and v), the first field ends up in k, all remaining fields (i.e. after the equal sign) end up in v.
As the comment states, empty lines are skipped, i.e. those where the k variable is empty (test -z).
'declare' is a bash builtin as well, which performs the assignment (after expanding $k and $v), i.e. the declare statement becomes equivalent to Name="ABC" etc.
'<test.txt' after 'done' tells bash to read test.txt and to feed it line by line into the 'read' builtin further up.
The three 'echo' statements are simply to show that this solution did work.
The format of the file is valid sh syntax, so you could just source the file:
source test.txt
In any case, your code doesn't work because after the pipe you shouldn't specify the file again.
value1=$(grep -i 'Name' "$file" | cut -f2 -d'=')
would keep your logic.
This is a comment, but the comment box does not allow formatting. Consider rewriting this:
while read line; do
  value1=`grep -i 'Name' $file|cut -f2 -d'=' $file`
  value2=`grep -i 'Age' $file|cut -f2 -d'=' $file`
done <$file
as:
while IFS== read key value; do
  case $key in
    Name|name) value1=$value;;
    Age|age) value2=$value;;
  esac;
done < $file
Parsing the line multiple times via cut is inefficient. This is slightly different from your version, since the comparison is case-sensitive, but that is easily fixed if necessary. For example, you could preprocess the input file and convert everything to lower case. You can do the preprocessing on the fly, but be aware that this will put your while loop in a subprocess, which will require some additional care (since the variable definitions will end with the pipeline); that aside, it is not significant. But running the entire file through grep twice for each line of the file is O(n^2), and ghastly! (Why are you reading the entire file anyway instead of just echoing the line?)
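For instance, here is a minimal sketch of a case-insensitive variant that avoids both the preprocessing pass and the subshell issue (it assumes bash 4+ for the ${key,,} expansion):
while IFS== read -r key value; do
  case ${key,,} in    # ,, lower-cases the key for the comparison only
    name) value1=$value;;
    age) value2=$value;;
  esac
done < "$file"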

Writing a bash script that removes every line of text not beginning with x, y, or z

I'm trying to write a bash script that goes through a text file, TextInput.txt, and remove every line of text that doesn't start with s1, s2 ... sN (where sX is a single- or multi-character string; in the example below, I'll go with 'Rabbit', 'Squirrel', and 'Puppy' as accepted starts of lines). It should then write everything to a new text file, TextOutput.txt.
Being new to bash scripting, I've only managed to write this fairly limited dummy code.
#!/bin/sh
TextInput=$(<TextInput.txt)
AcceptedStrings[0]='Rabbit'
AcceptedStrings[1]='Squirrel'
AcceptedStrings[2]='Puppy'
# What goes here?
echo "$TextModified" > TextOutput.txt
If the strings are specified on the command line, you can loop through them to build up a grep pattern. If you already know all the strings, you can do grep -E '^(str1|str2|str3)' filename to keep only the lines that begin with one of them.
Reading your question again: you said that you want to remove lines that don't start with the specified strings, but in the example you gave a list of accepted starts of lines. You may need to clarify that.
As mentioned in the comment above, you could use grep. However, if you wanted to do it using bash, you could say:
acceptedstring=( Rabbit Squirrel Puppy )
pattern=$( IFS=$'|'; echo "${acceptedstring[*]}" )
while read -r line; do
[[ "$line" =~ ^($pattern) ]] || echo "$line"
done < TextInput.txt > TextOutput.txt
First convert your array of accepted strings into a pattern and then use grep to find all the lines which match the pattern:
pattern=$(printf "|%s" "${AcceptedStrings[@]}")
pattern=${pattern:1}
grep -E "^($pattern)" TextInput.txt > TextOutput.txt
