What defines a "column" in bash? In awk? - bash

I was viewing this question: Bash - Take nth column in a text file
I want to make a function that writes to a textfile that I can then parse using the method above. So, for example, I want my function to write 'dates' in the first column, 'ID's in the second column, and 'addresses' in the third column. Then once I have this, a user could, for example, see if a certain ID is present in the file by querying for the second column, then looking at each item there. The user could do this using the method discussed in the question above.
What defines a column? Is it just a space delimiter? Is it a tab?
If I want to output this information as stated above, what would the method where I write to the file look like? So far I have:
cat "$DATE $ID $ADDRESS \n" > myfile.data

In bash, as opposed to awk, columns are separated by characters in IFS.
That is to say, if you set:
IFS=$'\t'
...then columns, as understood by bash builtins such as read first second rest, will be separated by tabs. On the output side, printf '%s\n' "${array[*]}" will print the items in the array array separated by the first character of IFS.
The default value of IFS is equivalent to $' \t\n' -- that is, the space, the tab, and the newline character.
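Both directions can be sketched in a few lines. This is a minimal illustration, assuming a made-up tab-separated sample line; the variable names are not from the question.

```bash
#!/usr/bin/env bash
# Input side: split one tab-separated line into three variables.
# The IFS assignment is a prefix, so it applies to this read only.
line=$'2024-01-15\tID42\t12 Main Street'
IFS=$'\t' read -r first second rest <<<"$line"

# Output side: "${array[*]}" joins elements with the first character of IFS.
# The subshell keeps the IFS change from leaking into the rest of the script.
fields=("$first" "$second" "$rest")
joined=$(IFS='|'; printf '%s' "${fields[*]}")

printf 'second column: %s\n' "$second"
printf 'joined: %s\n' "$joined"
```

Because the space is not in IFS during the read, the address keeps its embedded spaces.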
To write a file with a delimiter of your choice, and (presumably) more than one row (replace the while read with however you're actually getting your data, or only use the inside of the loop if you're only writing one line):
while read -r date id address; do
printf '%s\t' "$date" "$id" "$address" >&3; printf '\n' >&3
done 3>filename
...or, if you don't want the trailing tab left by the above:
IFS=$'\t' # use a tab as the field separator for output
while IFS=$' \t\n' read -r date id address; do
entry=( "$date" "$id" "$address" )
printf '%s\n' "${entry[*]}" >&3
done 3>filename
Putting 3>filename on the outside of the loop is much more efficient than >>filename on each line, which re-opens the output file once per line written.
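To connect this back to the question's goal of checking whether an ID is present, here is a rough sketch. The sample rows, the temporary file standing in for myfile.data, and the helper name has_id are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sample data written to a temporary file.
datafile=$(mktemp)
{
  printf '%s\t%s\t%s\n' '2024-01-15' 'ID42' '12 Main Street'
  printf '%s\t%s\t%s\n' '2024-01-16' 'ID77' '9 Elm Road'
} >"$datafile"

# has_id FILE ID: succeed if ID appears in the second tab-separated column.
has_id() {
  local file=$1 wanted=$2 date id address
  while IFS=$'\t' read -r date id address; do
    [[ $id == "$wanted" ]] && return 0
  done <"$file"
  return 1
}

if has_id "$datafile" 'ID77'; then found=yes; else found=no; fi
printf 'ID77 present: %s\n' "$found"
rm -f "$datafile"
```

Passing all three values to a single printf format writes a whole row at once, so no trailing tab is produced.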

If you're going to use awk, the columns are separated by the field separator. See FS in man awk for details.
Most tools support some way of changing the column separator:
cut -d
sort -t
bash itself uses the IFS variable (Internal Field Separator) for word splitting.
cat expects a file name as an argument. To output a string, use echo instead.

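If we are talking about awk, then the default field separator is a single space, which awk treats specially: fields are separated by runs of spaces and tabs.
It's easy to change the "Field Separator" (FS) that awk uses when parsing a file: awk -F, '{print $2}' uses a comma as the separator. Assigning FS inside the main action, as in awk '{FS=",";print $2}', takes effect only from the next record on in most awk implementations, so the first line would still be split on whitespace. (Note: this does not respect quotes and the like, as a CSV parser would.)
To write to the file I would use echo and the append operator >>.
>> appends whereas > overwrites the file.
echo -e makes echo interpret \n and similar escape sequences, but echo already ends its output with a newline, so neither is needed here.
So the command would be
echo "$DATE $ID $ADDRESS" >> myfile.data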
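As a quick sketch of changing awk's field separator, the two forms below both set FS before the first record is read. The comma-separated sample input is made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sample input: date, ID, address separated by commas.
csv=$'2024-01-15,ID42,Main Street\n2024-01-16,ID77,Elm Road'

# -F sets FS before any input is read.
ids_flag=$(printf '%s\n' "$csv" | awk -F, '{ print $2 }')

# A BEGIN block does the same from inside the program.
ids_begin=$(printf '%s\n' "$csv" | awk 'BEGIN { FS = "," } { print $2 }')

printf '%s\n' "$ids_flag"
```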

Related

Split string using \r\n using IFS in bash

I would like to split string contains \r\n in bash but carriage return and \n gives issue. Can anyone give me hint for different IFS? I tried IFS=' |\' too.
input:
projects.google.tests.inbox.document_01\r\nprojects.google.tests.inbox.document_02\r\nprojects.google.tests.inbox.global_02
Code:
IFS=$'\r'
inputData="projects.google.tests.inbox.document_01\r\nprojects.google.tests.inbox.document_02\r\nprojects.google.tests.inbox.global_02"
for line1 in ${inputData}; do
line2=`echo "${line1}"`
echo ${line2} //Expected one by one entry
done
Expected:
projects.google.tests.inbox.document_01
projects.google.tests.inbox.document_02
projects.google.tests.inbox.global_02
inputData=$'projects.google.tests.inbox.document_01\r\nprojects.google.tests.inbox.document_02\r\nprojects.google.tests.inbox.global_02'
while IFS= read -r line; do
line=${line%$'\r'}
echo "$line"
done <<<"$inputData"
Note:
The string is defined as string=$'foo\r\n', not string="foo\r\n". The latter does not put an actual CRLF sequence in your variable. See ANSI C-like strings on the bash-hackers' wiki for a description of this syntax.
${line%$'\r'} is a parameter expansion which strips a literal carriage return off the end of the contents of the variable line, should one exist.
The practice for reading an input stream line-by-line (used here) is described in detail in BashFAQ #1. Unlike iterating with for, it does not attempt to expand your data as globs.
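The difference between the two quoting styles is easy to check by measuring string lengths. A small sketch, using foo as a stand-in value:

```bash
#!/usr/bin/env bash
plain="foo\r\n"      # 7 characters: the backslashes, r and n stay literal
ansi=$'foo\r\n'      # 5 characters: f, o, o, CR, LF

printf 'plain length: %d\n' "${#plain}"
printf 'ansi length:  %d\n' "${#ansi}"

# read consumes the LF as the line terminator; the CR is stripped afterwards.
IFS= read -r line <<<"$ansi"
line=${line%$'\r'}
printf 'cleaned: [%s]\n' "$line"
```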
The following awk command could help with your question.
awk '{gsub(/\\r\\n/,RS)} 1' Input_file
OR
echo "$var" | awk '{gsub(/\\r\\n/,RS)} 1'
Output will be as follows.
projects.google.tests.inbox.document_01
projects.google.tests.inbox.document_02
projects.google.tests.inbox.global_02
Explanation: gsub performs a global substitution. Its form is gsub(regex, replacement, target), where target defaults to the current line. Here the regex \\r\\n matches the literal four-character sequence backslash, r, backslash, n (the doubled backslashes make each backslash literal), and every match is replaced with RS, the record separator, whose default value is a newline. awk programs consist of condition/action pairs; the trailing 1 is a condition that is always true and has no action, so the default action, printing the current line, runs.
EDIT: With a variable you could use as following.
var="projects.google.tests.inbox.document_01\r\nprojects.google.tests.inbox.document_02\r\nprojects.google.tests.inbox.global_02"
echo "$var" | awk '{gsub(/\\r\\n/,RS)} 1'
projects.google.tests.inbox.document_01
projects.google.tests.inbox.document_02
projects.google.tests.inbox.global_02

Is it possible to get an element from a string variable by index?

If I have a variable with multiple string elements separated by spaces, is it possible to get an element by providing an index? Something similar you could do with arrays?:
my_var="string1 string2 string3 string4"
echo $[my_var[3]] # this does not work
You use... an array.
$ my_var=("string1" "string2" "string3" "string4")
$ echo "${my_var[3]}"
string4
However, you might be asking, given your original string, can you split it into an array?
$ read -a arr <<< "$my_var"
This works, but only if every space in the string is to be treated as a delimiter. You can't quote some to treat as literal space; that's why arrays were added to the language in the first place.
You might be in luck and your string uses some other delimiter, e.g. a comma, in which case you can set the value of IFS:
IFS=, read -r -a arr <<< "$my_var"
but in general, strings-as-lists are fragile.
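A short sketch of both cases follows. The question's my_var is reused; the comma-separated string is an invented example.

```bash
#!/usr/bin/env bash
my_var="string1 string2 string3 string4"
read -r -a words <<<"$my_var"
third=${words[2]}        # indices start at 0, so this is the third word

# With a comma delimiter, spaces inside an element survive the split.
csv="alpha,beta gamma,delta"
IFS=, read -r -a parts <<<"$csv"

printf 'third word: %s\n' "$third"
printf 'parts: %d, second: %s\n' "${#parts[@]}" "${parts[1]}"
```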
You can store it in an array using the read command, setting the input field separator appropriately. It doesn't matter for your current input, since the default IFS handles the single space between words.
IFS=' ' read -ra inputArray <<<"$my_var"
Setting IFS=' ' is optional here; I added it to stress the need for it when splitting strings that contain other delimiters.
and access individual elements as
printf "%s\n" "${inputArray[0]}"
string1
When you want the third word, you can use
cut -d" " -f3 <<< "${my_var}"

bash - IFS changes behavior of echo -n in for loop

I have code that requires a response within a for loop.
Prior to the loop I set IFS="\n"
Within the loop echo -n is ignored (except for the last line).
Note: this is just an example of the behavior of echo -n
Example:
IFS='\n'
for line in `cat file`
do
echo -n $line
done
This outputs:
this is a test
this is a test
this is a test$
with the user prompt occurring only at the end of the last line.
Why is this occurring and is there a fix?
Neither IFS="\n" nor IFS='\n' set $IFS to a newline; instead they set it to literal \ followed by literal n.
You'd have to use an ANSI C-quoted string in order to assign an actual newline: IFS=$'\n'; alternatively, you could use a normal string literal that contains an actual newline (spans 2 lines).
Assigning literal \n had the effect that the output from cat file was not split into lines, because an actual newline was not present in $IFS; potentially - though not with your sample file content - the output could have been split into fields by embedded \ and n characters.
Without either, the entire file contents were passed at once, resulting in a single iteration of your for loop.
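The effect can be reproduced without a file. The sketch below counts how many fields an unquoted expansion yields under each IFS value; count_words is a hypothetical helper and the two-line text mirrors the sample output above.

```bash
#!/usr/bin/env bash
text=$'this is a test\nthis is a test'

# count_words IFS_VALUE STRING: print the number of fields word splitting yields.
count_words() {
  local IFS=$1
  set -f          # keep glob characters in the data from expanding
  set -- $2       # unquoted on purpose: word splitting is what is measured
  set +f
  echo "$#"
}

with_literal=$(count_words '\n' "$text")    # IFS is a backslash and the letter n
with_newline=$(count_words $'\n' "$text")   # IFS is an actual newline

printf 'literal \\n: %s field(s)\n' "$with_literal"
printf 'real newline: %s field(s)\n' "$with_newline"
```

Since the text contains neither a backslash nor the letter n, the literal form leaves it as a single field, which matches the single loop iteration described above.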
That said, your approach to looping over lines from a file is ill-advised; try something like the following instead:
while IFS= read -r line; do
echo -n "$line"
done < file
Never use for loops when parsing files in bash. Use while loops instead. Here is a really good tutorial on that.
http://mywiki.wooledge.org/BashFAQ/001

Explanation of bash specific syntax

Came across this piece of code:
for entry in $(echo $tmp | tr ';' '\n')
do
echo $entry
rproj="${entry%%,*}"
rhash="${entry##*,}"
remoteproj[$rproj]=$rhash
done
So I do understand that initially ';' is converted to new line so that all entries in the file are on a separate line. However, I am seeing this for the first time:
rproj="${entry%%,*}"
rhash="${entry##*,}"
I do understand that this is taking everything before ',' and after comma ',' . But, is this more efficient than split? Also, if someone please explain the syntax because I am unable to relate this to regular expression or bash syntax.
These are string manipulation operators.
${string##substring}
Deletes the longest match of $substring from the front of $string.
With the pattern *, (as in ${entry##*,}) it removes everything up to and including the last comma, leaving the text after it.
${string%%substring}
Deletes the longest match of $substring from the back of $string.
With the pattern ,* (as in ${entry%%,*}) it removes everything from the first comma to the end, leaving the text before it.
Btw, I would use the internal field separator instead of the tr command:
IFS=';'
for entry in $tmp ; do
echo $entry
rproj="${entry%%,*}"
rhash="${entry##*,}"
remoteproj[$rproj]=$rhash
done
unset IFS
Like this.
Use the read command both to split the original line and to split each entry.
IFS=';' read -r -a entries <<< "$tmp"
for entry in "${entries[@]}"; do
IFS=, read -r rproj rhash <<< "$entry"
remoteproj["$rproj"]=$rhash
done
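For a version that can be run as-is, the sketch below assumes tmp holds "project,hash" pairs separated by semicolons (the sample values are invented) and declares remoteproj as an associative array, which requires bash 4 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical input in the shape the question implies.
tmp="projA,abc123;projB,def456"

declare -A remoteproj     # associative array keyed by project name
IFS=';' read -r -a entries <<<"$tmp"
for entry in "${entries[@]}"; do
  # Split each entry at the comma into project name and hash.
  IFS=, read -r rproj rhash <<<"$entry"
  remoteproj["$rproj"]=$rhash
done

printf 'projB -> %s\n' "${remoteproj[projB]}"
```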
For performance it is best to do things without subshells. I am still getting confused between % and #, but these internal evaluations are way better than using sed, cut or perl.
The %% means "remove the largest possible matching string from the end of the variable's contents".
The ## means "remove the largest possible matching string from the beginning of the variable's contents".
You can see the working with a simple test:
for entry in key,value a,b,c
do
echo "$entry is split into ${entry%%,*} and ${entry##*,}"
done
The result of splitting key,value is obvious. When you are splitting a,b,c the field b is lost.
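The single-character forms % and # remove the shortest match instead of the longest, which is one way to keep the middle field. A small sketch using the same a,b,c value:

```bash
#!/usr/bin/env bash
entry="a,b,c"

first=${entry%%,*}         # longest suffix match ",b,c" removed   -> a
all_but_last=${entry%,*}   # shortest suffix match ",c" removed    -> a,b
last=${entry##*,}          # longest prefix match "a,b," removed   -> c
all_but_first=${entry#*,}  # shortest prefix match "a," removed    -> b,c

printf '%s | %s | %s | %s\n' "$first" "$all_but_last" "$last" "$all_but_first"
```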

linux shell script: split string, put them in an array then loop through them [duplicate]

In a bash script how do I split string with a separator like ; and loop through the resulting array?
You can probably skip the step of explicitly creating an array...
One trick that I like to use is to set the internal field separator (IFS) to the delimiter character. This is especially handy for iterating through the space- or newline-delimited results from the stdout of any of a number of unix commands.
Below is an example using semicolons (as you had mentioned in your question):
export IFS=";"
sentence="one;two;three"
for word in $sentence; do
echo "$word"
done
Note: in regular Bourne-shell scripting setting and exporting the IFS would occur on two separate lines (IFS='x'; export IFS;).
If you don't wish to mess with IFS (perhaps for the code within the loop) this might help.
If you know that your string will not have whitespace, you can substitute the ';' with a space and use the for/in construct:
#local str
for str in ${STR//;/ } ; do
echo "+ \"$str\""
done
But if you might have whitespace, then for this approach you will need to use a temp variable to hold the "rest" like this:
#local str rest
rest=$STR
while [ -n "$rest" ] ; do
str=${rest%%;*} # Everything up to the first ';'
# Trim up to the first ';' -- and handle final case, too.
[ "$rest" = "${rest/;/}" ] && rest= || rest=${rest#*;}
echo "+ \"$str\""
done
Here's a variation on ashirazi's answer which doesn't rely on $IFS. It does have its own issues, which I outline below.
sentence="one;two;three"
sentence=${sentence//;/$'\n'} # change the semicolons to white space
for word in $sentence
do
echo "$word"
done
Here I've used a newline, but you could use a tab "\t" or a space. However, if any of those characters are in the text it will be split there, too. That's the advantage of $IFS - it can not only enable a separator, but disable the default ones. Just make sure you save its value before you change it - as others have suggested.
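Two common ways to keep an IFS change contained are sketched below, assuming IFS was set (not unset) beforehand. The sample sentence is adapted from the one above, with a space added to show that it survives.

```bash
#!/usr/bin/env bash
sentence="one;two words;three"

# Option 1: an IFS assignment prefixed to read applies to that command only.
IFS=';' read -r -a words <<<"$sentence"

# Option 2: save and restore IFS around an unquoted expansion.
old_ifs=$IFS
IFS=';'
set -f                    # avoid glob expansion of the split words
count=0
for word in $sentence; do
  count=$((count + 1))
done
set +f
IFS=$old_ifs

printf 'array elements: %d, loop iterations: %d\n' "${#words[@]}" "$count"
printf 'second element: %s\n' "${words[1]}"
```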
Here is an example code that you may use:
$ STR="String;1;2;3"
$ for EACH in `echo "$STR" | grep -o -e "[^;]*"`; do
echo "Found: \"$EACH\"";
done
grep -o -e "[^;]*" will select anything that is not ';', therefore splitting the string by ';'.
Hope that helps.
sentence="one;two;three"
a="${sentence};"
while [ -n "${a}" ]
do
echo "${a%%;*}"
a=${a#*;}
done

Resources