Transform file content to a specific output in bash

I'm new to bash scripting, and I am trying to transform a file that:
Is a CSV
Has random line jumps that break the format (bad inputs from user in CSV)
The outcome needs to be a processed file that separates lines correctly (lines of a set # of elements/columns), removing the random jumps.
My first approach was to remove all the line jumps by using
variable=$(cat $1)
tr -d "\n" $variable > $variable
and then thought of reading char by char looking for ',' with a counter, and adding a line jump after every set number of them was found. When trying to do this, I found info on while IFS= read and tried the following:
while IFS=, read -r col1 col2 col3 col4 <<< $variable; do
echo "$col1" "$col2" "$col3" "$col4" '\n'
done
I'm clearly missing something there (I even believe the while loop should already remove the line jumps, please correct me about it) and I am not sure how to keep going about it. I am not looking for a bunch of code that does the job, but trying to find someone here who could point me towards the right direction (maybe I'm not understanding IFS, or I need to process something prior to that step... whatever it could be).
Thanks in advance.
edit: I removed the line jump I was adding in the echo and it now prints
col1 col2 col3 col4
and never ends, so I am clearly not able to link each element from the file to the variables before printing.

Instead of echo "$col1" "$col2" "$col3" "$col4" '\n'
just write this:
echo -e "$col1 $col2 $col3 $col4 \n"
Actually, the problem isn't with the shell; it is the echo command itself, and the lack of double quotes around the variable interpolation. You can try using echo -e, but that isn't supported on all platforms, which is one of the reasons printf is now recommended for portability.
You can also try and insert the newline directly into your shell script (if a script is what you're writing) so it looks like...
#!/bin/sh
echo "this is for
test"
#EOF
or equivalently
#!/bin/sh
string="this is for
test"
echo "$string"
# note double quotes!
good luck
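On the question's underlying task, one possible direction is sketched below; it is not the asker's or an answerer's code. Instead of deleting every newline up front, it appends physical lines to a buffer until the buffer holds enough commas for a full record, then prints it. Note also that in the original loop the `<<< $variable` is attached to `read` inside the loop condition, so every iteration re-reads the same text from the start and the loop never ends; redirecting the input once, for the whole loop, avoids that. The function name `rejoin_csv` and the column count are hypothetical, and the sketch assumes that no field contains a comma and that a stray line break never falls inside the last field of a record.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rejoin CSV records that were split by stray newlines.
# Usage: rejoin_csv NCOLS < broken.csv
# Assumes no field contains a comma and no break falls inside the last field.
rejoin_csv() {
    local ncols=$1 buf='' line commas
    # The "|| [ -n ... ]" part also handles a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        buf+=$line
        commas=${buf//[^,]/}            # keep only the commas, then count them
        if [ "${#commas}" -ge $((ncols - 1)) ]; then
            printf '%s\n' "$buf"
            buf=''
        fi
    done
    # Print whatever is left over (an incomplete trailing record), if anything.
    if [ -n "$buf" ]; then
        printf '%s\n' "$buf"
    fi
}

# Example: two 4-column records, the first one broken across two lines.
printf 'a,b\n,c,d\ne,f,g,h\n' | rejoin_csv 4
```

For real-world CSV with quoted fields, a proper CSV parser is likely the safer choice.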

Related

Read CSV and add data using condition

I am trying to read a CSV which has data like:
Name Time
John
Ken
Paul
I want to read column one and, if it matches, change the time. For example, if $1 = John, then change John's time to $2.
Here is what I have so far:
while IFS=, read -r col1 col2
do
echo "$col1"
if[$col1 eq $1] then
echo "$2:$col2"
done < test.csv >> newupdate.csv
To run: ./test.sh John 30
I am trying to keep the CSV updated, so I thought making a new file would be okay; that way I can read the updated file again on the next run and update it again.
Your shell script has a number of syntax errors. You need spaces inside [...] and you should generally quote your variables. You can usefully try http://shellcheck.net/ before asking for human assistance.
while IFS=, read -r col1 col2
do
if [ "$col1" = "$1" ]; then
col2=$2
fi
echo "$col1,$col2" # comma or colon separated?
done < test.csv >newupdate.csv
Notice how we always print the entire current line, with or without modifications depending on the first field. Notice also the semicolon (or equivalently newline) before then, and use of = as the equality comparison operator for strings. (The numeric comparison operator is -eq with a dash, not eq.)
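The difference between the string operator = and the numeric operator -eq can be seen in a minimal sketch. The helper names `same_string` and `same_number` are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers contrasting the two kinds of comparison in [ ... ].
same_string() { [ "$1" = "$2" ]; }      # textual equality
same_number() { [ "$1" -eq "$2" ]; }    # integer equality

# "07" and "7" differ as text but denote the same integer.
if same_string "07" "7"; then echo "strings equal"; else echo "strings differ"; fi
if same_number "07" "7"; then echo "numbers equal"; else echo "numbers differ"; fi
```

For names such as John, only the string comparison makes sense; -eq would report an error on non-numeric operands.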
However, it's probably both simpler and faster to use Awk instead. The shell isn't very good (or very quick) at looping over lines in the first place.
awk -F , -v who="$1" -v what="$2" 'BEGIN { OFS=FS }
$1 == who { $2 = what } 1' test.csv >newupdate.csv
Doing this in sed will be even more succinct; but the error symptoms if your variables contain characters which have a special meaning to sed will be bewildering. So don't really do this.
sed "s/^$1,.*/$1,$2/" test.csv >newupdate.csv
There are ways to make this less brittle, but then not using sed for any non-trivial scripts is probably the most straightforward solution.
None of these scripts use any Bash-specific syntax, so you could run them under any POSIX-compatible shell.

I want to compare one line to the next line, but only in the third column, from a file using bash

So, what I'm trying to do is read in a file, loop through it comparing it line by line, but only in the third column. Sorry if this doesn't make sense, but maybe this will help. I have a file of names:
JOHN SMITH SMITH
JIM JOHNSON JOHNSON
JIM SMITH SMITH
I want to see if (first, col3)SMITH is equal to JOHNSON, if not, move onto the next name. If (first, col3) SMITH is equal to (second, col3) SMITH, then I'll do something with that.
Again, I'm sorry if this doesn't make much sense, but I tried to explain it as best as I could.
I was attempting to see if they were equal, but obviously that didn't work. Here is what I have so far, but I got stuck:
while read -a line
do
if [ ${line[2]} == ${line[2]} ]
then
echo -e "${line[2]}" >> names5.txt
else
echo "Not equal."
fi
done < names4.txt
Store your immediately prior line in a separate variable, so you can compare against it:
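#!/usr/bin/env bash
old_line=( )
while read -r -a line
do
  # compare the third field of the previous line with that of the current one
  if [ "${old_line[2]}" = "${line[2]}" ]; then
    printf '%s\n' "${line[2]}"
  else
    echo "Not equal." >&2
  fi
  old_line=( "${line[@]}" )  # remember this line for the next iteration
done <names4.txt >>names5.txt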
Some other changes of note:
Instead of re-opening names5.txt every time you want to write a single line to it, we're opening it just once, for the whole loop. (You could make this >names5.txt if you want to clear it at the top of the loop and append from there, which is likely to be desirable behavior).
We're avoiding echo -e. See the APPLICATION USE and RATIONALE sections of the POSIX standard for echo for background on why echo use is not recommended for new development when contents are not tightly constrained (known not to contain any backslashes, for example).
We're quoting both sides of the test operation. This is mandatory with [ ] to ensure correct operation if words can be expanded as globs (i.e. if you have a word *, you don't want it replaced with a list of files in your current directory in the final command), or if they can contain spaces (not so much a concern here, since you're using the same IFS value for the read -a as the unquoted expansion). Even if using [[ ]], you want to quote the right-hand side so it's treated as a literal string and not a pattern.
We're passing -r to read, which ensures that backslashes are not silently removed (changing \t in the input to just t, for example).
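The effect of -r can be seen in a small sketch. The helper names `read_cooked` and `read_raw` are hypothetical, and the input is a literal backslash followed by t, not a tab character.

```shell
#!/bin/sh
# Hypothetical helpers: read one line from stdin with and without -r.
read_cooked() { IFS= read line; printf '%s\n' "$line"; }
read_raw()    { IFS= read -r line; printf '%s\n' "$line"; }

printf '%s\n' 'a\tb' | read_cooked    # the backslash is consumed by read
printf '%s\n' 'a\tb' | read_raw       # the backslash is kept
```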
When you want to compare each third field with all previous third fields, you need to store the old third fields in an array. You can use awk for this.
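A minimal sketch of the awk approach, assuming whitespace-separated fields as in names4.txt; the function name `repeated_third_fields` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each third field that already appeared on an
# earlier line (not only on the immediately preceding one).
repeated_third_fields() {
    # seen[] counts occurrences; print the value the second time it shows up.
    awk 'seen[$3]++ == 1 { print $3 }' "$@"
}

printf '%s\n' 'JOHN SMITH SMITH' 'JIM JOHNSON JOHNSON' 'JIM SMITH SMITH' |
    repeated_third_fields
```

Called with a file name (for example `repeated_third_fields names4.txt`), it reads that file instead of standard input.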
When you only want to see the repeated third fields, you can use other tools:
cut -d" " -f3 names4.txt | sort | uniq -d
EDIT:
When you only want to print duplicates from 2 consecutive lines, it is even easier:
cut -d" " -f3 names4.txt | uniq -d

Bash/Shell | How to prioritize quote from IFS in read [duplicate]

This question already has answers here:
IFS separate a string like "Hello","World","this","is, a boring", "line"
(3 answers)
Closed 6 years ago.
I'm working with a hand-filled file and I am having trouble parsing it.
My input file cannot be altered, and my code has to stay in bash script.
I made a simple example to make it easy for you ^^
var="hey","i'm","happy, like","you"
IFS="," read -r one two tree for five <<<"$var"
echo $one:$two:$tree:$for:$five
Now I think you already saw the problem here. I would like to get
hey:i'm:happy, like:you:
but I get
hey:i'm:happy: like:you
I need a way to tell the read that the " " are more important than the IFS. I have read about the eval command but I can't take that risk.
Finally, this is a directory file and the troublesome field is the description one, so it could have basically anything in it.
The original file looks like this:
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
Edit #1
I will give a better example; the one I used above is too simple, and @StefanHegny found that it causes another error.
while read -r ldapLine
do
IFS=',' read -r objectClass dumy1 uidNumber gidNumber username description modifyTimestamp nsAccountLock gecos homeDirectory loginShell createTimestamp dumy2 <<<"$ldapLine"
isANetuser=0
while IFS=":" read -r -a class
do
for i in "${class[@]}"
do
if [ "$i" == "account" ]
then
isANetuser=1
break
fi
done
done <<< $objectClass
if [ $isANetuser == 0 ]
then
continue
fi
#MORE STUFF APPEND#
done < file.csv
So this is a small part of the code but it should explain what I do. The file.csv is a lot of lines like this:
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
If the various bash versions you will use are all more recent than v3.0, when regexes and BASH_REMATCH were introduced, you could use something like the following function (the negative length in ${BASH_REMATCH[1]:1:-1} additionally needs bash 4.2 or later): [Note 1]
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"[^\"]*\") ]]; do
printf "%s\n" "${BASH_REMATCH[2]:-${BASH_REMATCH[1]:1:-1}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
Its argument is a single line (remember to quote it!) and it prints each comma-separated field on a separate line. As written, it assumes that no field has an enclosed newline; that's legal in CSV, but it makes dividing the file into lines a lot more complicated. If you actually needed to deal with that scenario, you could change the \n in the printf statement to a \0 and then use something like xargs -0 to process the output. (Or you could insert whatever processing you need to do to the field in place of the printf statement.)
It goes to some trouble to dequote quoted fields without modifying unquoted fields. However, it will fail on fields with embedded double quotes. That's fixable, if necessary. [Note 2]
Here's a sample, in case that wasn't obvious:
while IFS= read -r line; do
each_field "$line"
printf "%s\n" "-----"
done <<EOF
type,cn,uid,gid,gecos,"description",timestamp,disabled
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
EOF
Output (each empty "" field prints as a blank line; those blank lines are not shown here):
type
cn
uid
gid
gecos
description
timestamp
disabled
-----
top:shadowAccount:account:posixAccount
Jdupon
12345
6789
Jdupon
Jean Mark, Dupon
20140511083750Z
Jean Mark, Dupon
/home/user/Jdupon
/bin/ksh
20120512083750Z
-----
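The NUL-delimited variant mentioned above can be sketched as follows. This is an adaptation under stated assumptions, not the answer's own code: the name `each_field0` is hypothetical, the regex is kept in a variable, and a third capture group replaces the substring arithmetic. Like the original, it does not handle embedded double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical variant of each_field that terminates every field with a NUL
# byte instead of a newline, so fields may safely contain newlines.
each_field0 () {
    local v=,$1
    # Group 2: an unquoted field; group 3: the inside of a quoted field.
    local re='^,(([^",]*)|"([^"]*)")'
    while [[ $v =~ $re ]]; do
        if [[ ${BASH_REMATCH[1]} == \"* ]]; then
            printf '%s\0' "${BASH_REMATCH[3]}"   # quoted: drop the quotes
        else
            printf '%s\0' "${BASH_REMATCH[2]}"   # unquoted: print as-is
        fi
        v=${v:${#BASH_REMATCH[0]}}               # drop the consumed field
    done
}

# Collect the fields of one line into an array by reading up to each NUL.
fields=()
while IFS= read -r -d '' f; do
    fields+=("$f")
done < <(each_field0 '"Jdupon","Jean Mark, Dupon","","/bin/ksh"')
printf '%s fields, second is: %s\n' "${#fields[@]}" "${fields[1]}"
```

Reading with `read -d ''` keeps everything inside bash; piping the function's output to xargs -0, as the answer suggests, is the alternative when each field should be handed to another program.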
Notes:
1. I'm not saying you should use this function. You should use a CSV parser, or a language which includes a good CSV parsing library, like Python. But I believe this bash function will work, albeit slowly, on correctly-formatted CSV files of a certain common CSV dialect.
2. Here's a version which handles doubled quotes inside quoted fields, which is the classic CSV syntax for interior quotes:
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
echo "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//\"\"/\"}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
My suggestion, as in some previous answers (see below), is to switch the separator to | (and use IFS="|" instead):
sed -r 's/,([^,"]*|"[^"]*")/|\1/g'
This requires a sed that has extended regular expressions (-r) however.
Should I use AWK or SED to remove commas between quotation marks from a CSV file? (BASH)
Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern

How to read a space in bash - read will not

Many people have shown how to keep spaces when reading a line in bash. But I have a character-based algorithm which needs to process each and every character separately, spaces included. Unfortunately I am unable to get bash read to read a single space character from input.
while read -r -n 1 c; do
printf "[%c]" "$c"
done <<< "mark spitz"
printf "[ ]\n"
yields
[m][a][r][k][][s][p][i][t][z][][ ]
I've hacked my way around this, but it would be nice to figure out how to read any single character.
Yep, tried setting IFS, etc.
Just set the input field separator(a) so that it doesn't treat space (or any character) as a delimiter, that works just fine:
printf 'mark spitz' | while IFS="" read -r -n 1 c; do
printf "[%c]" "$c"
done
echo
That gives you:
[m][a][r][k][ ][s][p][i][t][z]
You'll notice I've also slightly changed how you're getting the input there. <<< appends a newline to the string, which shows up as an extraneous empty read at the end; while that's not important to the reading method itself, I thought it best to change it to avoid any confusion.
(a) Yes, I'm aware that you said you've tried setting IFS but, since you didn't actually show how you'd tried this, and it appears to work fine the way I do it, I have to assume you may have just done something wrong.
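Both effects, the preserved space and the extra empty read caused by a here-string, can be reproduced with a short sketch. The function name `bracket_chars` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every character read from stdin in brackets.
bracket_chars() {
    local c
    # IFS= keeps read from stripping whitespace; -n 1 reads one character.
    while IFS= read -r -n 1 c; do
        printf '[%s]' "$c"
    done
    printf '\n'
}

printf 'a b' | bracket_chars    # input without a trailing newline
bracket_chars <<< 'a b'         # the here-string appends a newline
```

In the second call the trailing newline is read as an empty value, which is where the extra pair of brackets in the question's output comes from.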

Bourne Shell Scripting -- simple for loop syntax

I'm not entirely new to programming, but I'm not exactly experienced. I want to write small shell script for practice.
Here's what I have so far:
#!/bin/sh
name=$0
links=$3
owner=$4
if [ $# -ne 1 ]
then
echo "Usage: $0 <directory>"
exit 1
fi
if [ ! -e $1 ]
then
echo "$1 not found"
exit 1
elif [ -d $1 ]
then
echo "Name\t\tLinks\t\tOwner\t\tDate"
echo "$name\t$links\t$owner\t$date"
exit 0
fi
Basically what I'm trying to do is have the script go through all of the files in a specified directory and then display the name of each file with the amount of links it has, its owner, and the date it was created. What would be the syntax for displaying the date of creation or at least the date of last modification of the file?
Another thing is, what is the syntax for creating a for loop? From what I understand I would have to write something like for $1 in $1 ($1 being all of the files in the directory the user typed in correct?) and then go through checking each file and displaying the information for each one. How would I start and end the for loop (what is the syntax for this?).
As you can see I'm not very familiar with Bourne shell programming. If you have any helpful websites or have a better way of approaching this please show me!
Syntax for a for loop:
for var in list
do
echo $var
done
for example:
for var in *
do
echo $var
done
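Applied to the question, a loop over the entries of the directory given as the first argument might look like the following sketch. The function name `list_dir` is hypothetical, and the sketch prints only names; links, owner and date would still have to come from another tool.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every entry in directory $1.
list_dir() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue    # skip the literal pattern if nothing matched
        printf '%s\n' "${f##*/}"   # strip the leading directory part
    done
}

list_dir .
```

Quoting "$1" and "$f" keeps directory and file names containing spaces intact.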
What you might want to consider however is something like this:
ls -l | while read perms links owner group size date1 date2 time filename
do
echo $filename
done
which splits the output of ls -l into fields on-the-fly so you don't need to do any splitting yourself.
The field-splitting is controlled by the shell variable IFS, which by default contains a space, tab and newline. If you change this in a shell script, remember to change it back. By changing the value of IFS you can, for example, parse CSV files by setting it to a comma. This example reads three fields from a CSV and prints only the 2nd and 3rd (it's roughly the shell equivalent of cut -d, -f2,3 inputfile.csv):
oldifs=$IFS
IFS=","
while read field1 field2 field3
do
echo $field2 $field3
done < inputfile.csv
IFS=$oldifs
(note: you don't need to revert IFS, but I generally do to make sure that further text processing in a script isn't affected after I'm done with it).
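An alternative to saving and restoring IFS is to set it only for the read command itself, which leaves the rest of the script untouched. A minimal sketch, with the hypothetical function name `fields_2_3`:

```shell
#!/bin/sh
# Hypothetical helper: print fields 2 and 3 of comma-separated stdin.
# IFS=, applies only to each read call, so the global IFS is not modified.
fields_2_3() {
    while IFS=, read -r field1 field2 field3; do
        printf '%s %s\n' "$field2" "$field3"
    done
}

printf '%s\n' 'a,b,c' '1,2,3' | fields_2_3
```

Unlike cut, the last variable receives the whole remainder of the line when a line has more than three fields, and the output here is separated by a space rather than a comma.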
Plenty of documentation out there on both for and while loops; just google for it :-)
$1 is the first positional parameter, so $3 is the third and $4 is the fourth. They have nothing to do with the directory (or its files) the script was started from. If your script was started using this, for example:
./script.sh apple banana cherry date elderberry
then the variable $1 would equal "apple" and so on. The special parameter $# is the count of positional parameters, which in this case would be five.
The name of the script is contained in $0, and $* and $@ both expand to all the positional parameters; they behave differently depending on whether they appear in quotes.
You can refer to the positional parameters using a substring-style index:
${@:2:1}
would give "banana" using the example above. And:
${@: -1}
or
${@:$#}
would give the last ("elderberry"). Note that the space before the minus sign is required in this context.
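These expansions can be tried out in a short bash sketch. The function name `show_params` is hypothetical; inside a function the positional parameters are the function's own arguments, which makes the example easy to run without a separate script file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show a few ways of picking positional parameters.
show_params() {
    printf 'count=%s\n' "$#"           # number of positional parameters
    printf 'second=%s\n' "${@:2:1}"    # one parameter, starting at index 2
    printf 'last=%s\n' "${@: -1}"      # the space before -1 is required
}

show_params apple banana cherry date elderberry
```

The substring-style forms are a bash feature and are not available in a plain Bourne or POSIX sh.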
You might want to look at Advanced Bash-Scripting Guide. It has a section that explains loops.
I suggest to use find with the option -printf "%P\t%n\t%u\t%t"
for x in "$@"; do
echo "$x"
done
The "$@" protects any whitespace in supplied file names. Obviously, do your real work in place of echo "$x", which isn't doing much. But $@ is all the junk supplied on the command line to your script.
But also, your script bails out if $# is not equal to 1, but you're apparently fully expecting up to 4 arguments (hence the $4 you reference in the early part of your script).
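What the quotes around "$@" protect against can be seen by counting arguments. A minimal sketch with the hypothetical helper `count_args` and made-up file names:

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

set -- "my file.txt" other.txt   # two arguments, one containing a space
count_args "$@"                  # quoted: the arguments stay intact
count_args $@                    # unquoted: split again on whitespace
```

The quoted form reports two arguments, the unquoted form three, because "my file.txt" is split at the space.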
assuming you have GNU find on your system
find /path -type f -printf "filename: %f | hardlinks: %n| owner: %u | time: %TH %Tb %TY\n"

Resources