Many people have shown how to keep spaces when reading a line in bash. But I have a character-based algorithm which needs to process each and every character separately - spaces included. Unfortunately I am unable to get bash read to read a single space character from input.
while read -r -n 1 c; do
printf "[%c]" "$c"
done <<< "mark spitz"
printf "[ ]\n"
yields
[m][a][r][k][][s][p][i][t][z][][ ]
I've hacked my way around this, but it would be nice to figure out how to read any single character.
Yep, tried setting IFS, etc.
Just set the input field separator(a) so that it doesn't treat space (or any character) as a delimiter; that works just fine:
printf 'mark spitz' | while IFS="" read -r -n 1 c; do
printf "[%c]" "$c"
done
echo
That gives you:
[m][a][r][k][ ][s][p][i][t][z]
You'll notice I've also slightly changed how you're getting the input there; <<< appends a newline at the end (the source of that extraneous empty read in your output) and, while it's not important to the input method itself, I thought it best to change that to avoid any confusion.
(a) Yes, I'm aware that you said you've tried setting IFS but, since you didn't actually show how you'd tried this, and it appears to work fine the way I do it, I have to assume you may have just done something wrong.
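As an aside (not part of the original answer): read -n still treats the newline as end-of-input, which is why a newline shows up as an empty read. If your bash is new enough (the -N option appeared around bash 4.1), a hedged variant that captures newlines as characters too:
printf 'mark spitz\nmark spitz' | while IFS= read -r -N 1 c; do
printf '[%q]' "$c"   # %q makes the newline visible as $'\n'
done
echo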
Related
So, what I'm trying to do is read in a file, loop through it comparing it line by line, but only in the third column. Sorry if this doesn't make sense, but maybe this will help. I have a file of names:
JOHN SMITH SMITH
JIM JOHNSON JOHNSON
JIM SMITH SMITH
I want to see if (first, col3) SMITH is equal to JOHNSON; if not, move on to the next name. If (first, col3) SMITH is equal to (second, col3) SMITH, then I'll do something with that.
Again, I'm sorry if this doesn't make much sense, but I tried to explain it as best as I could.
I was attempting to see if they were equal, but obviously that didn't work. Here is what I have so far, but I got stuck:
while read -a line
do
if [ ${line[2]} == ${line[2]} ]
then
echo -e "${line[2]}" >> names5.txt
else
echo "Not equal."
fi
done < names4.txt
Store your immediately prior line in a separate variable, so you can compare against it:
#!/usr/bin/env bash
old_line=( )
while read -r -a line
do
if [ "${line[2]}" = "${line[2]}" ]; then
printf '%s\n' "${line[2]}"
else
echo "Not equal." >&2
fi
old_line=( "${line[#]}" )
done <names4.txt >>names5.txt
Some other changes of note:
Instead of re-opening names5.txt every time you want to write a single line to it, we're opening it just once, for the whole loop. (You could make this >names5.txt if you want to truncate it when the loop starts and append from there, which is likely to be desirable behavior).
We're avoiding echo -e. See the APPLICATION USE and RATIONALE sections of the POSIX standard for echo for background on why echo use is not recommended for new development when contents are not tightly constrained (known not to contain any backslashes, for example).
We're quoting both sides of the test operation. This is mandatory with [ ] to ensure correct operation if words can be expanded as globs (i.e. if you have a word *, you don't want it replaced with a list of files in your current directory in the final command), or if they can contain spaces (not so much a concern here, since you're using the same IFS value for the read -a as the unquoted expansion). Even if using [[ ]], you want to quote the right-hand side so it's treated as a literal string and not a pattern.
We're passing -r to read, which ensures that backslashes are not silently removed (changing \t in the input to just t, for example).
When you want to compare each third field with all previous third fields, you need to store the old third fields in an array. You can use awk for this.
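A hedged sketch of that awk approach (assuming whitespace-separated fields, as in names4.txt above):
# Remember every third field seen so far and flag any repeat.
awk '{ if ($3 in seen) print $3 " (first seen on line " seen[$3] ")"; seen[$3] = NR }' names4.txt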
When you only want to see the repeated third fields, you can use other tools:
cut -d" " -f3 names4.txt | sort | uniq -d
EDIT:
When you only want to print doubles from 2 consecutive lines, it is even easier:
cut -d" " -f3 names4.txt | uniq -d
I have some text files $f resembling the following
function
%blah
%blah
%blah
code here
I want to insert the following text before the first empty line:
%
%This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike
%3.0 Unported License. See notes at the end of this file for more information.
I tried the following:
top=$(cat ./PATH/text.txt)
top="${top//$'\n'/\\n}"
sed -i.bak 's#^$#'"$top"'\\n#' $f
where the second line (I think) preserves the newlines in the text and the third line (I think) substitutes the first empty line with the text plus a new empty line.
Two problems:
1- My code appends the following text:
%n%This work is licensed under the Creative Commons
Attribution-NonCommercial-ShareAlike n%3.0 Unported License. See notes
at the end of this file for more information.\n
2- It appends it at the end of the file.
Can someone please help me understand the problems with my code?
If you are using GNU sed, the following would work.
Use ^$ to find the empty line and then use sed to replace/put the text that you want.
# Define your replacement text in a variable
a="%\n%This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike\n%3.0 Unported License. See notes at the end of this file for more information."
Note that $a should include those \n escapes, which sed will interpret directly as newlines.
$ sed "0,/^$/s//$a/" inputfile.txt
In the above syntax, 0,/^$/ is a GNU extension: the range ends at the first line matching ^$ (and, unlike 1,/^$/, it allows that match to be line 1), so the substitution applies only to the first empty line.
Output:
function
%blah
%blah
%
%This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike
%3.0 Unported License. See notes at the end of this file for more information.
%blah
code here
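If your sed lacks the GNU-specific 0,/regexp/ range, a rough awk equivalent might look like this (hedged: it relies on awk -v expanding the \n escapes in $a into real newlines):
awk -v a="$a" '!found && /^$/ { print a; found=1; next } 1' inputfile.txt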
You've included bash and sed tags in your question. Since I can't seem to come up with a way of doing this in sed, here's a bash-only solution. It's likely to perform the worst of all working solutions you might find.
The following works with your sample input:
$ while read -r x; do [[ -z "$x" ]] && cat boilerplate; printf '%s\n' "$x"; done < src
This will however insert the boilerplate before EVERY blank line, which is probably not what you're after. Instead, we should probably make this more than a one-liner:
#!/usr/bin/env bash
y=true
while read -r x; do
if [[ -z "$x" ]] && $y; then
cat boilerplate
y=false
fi
printf '%s\n' "$x"
done < src
Note that unlike the code in your question, this doesn't store your boilerplate in a variable, it just cats it "at the right time".
Note that this sends the combined output to stdout. If your goal is to modify the original file, you'll need to wrap this in something that moves around temporary files. (Note that sed's -i option also doesn't really edit files in place, it only hides the moving-around-temp-files from you.)
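For illustration only (not part of the original answer), that temp-file dance might look something like this, assuming mktemp is available:
#!/usr/bin/env bash
tmp=$(mktemp) || exit
y=true
while read -r x; do
if [[ -z "$x" ]] && $y; then
cat boilerplate
y=false
fi
printf '%s\n' "$x"
done < src > "$tmp" && mv -- "$tmp" src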
The following alternatives are probably a better idea.
A similar solution to the bash one might be achieved with better performance using awk:
awk 'NR==FNR{b=b $0 ORS;next} /^$/&&!y{printf "%s",b;y++} 1' boilerplate src
This awk solution obviously reads your boilerplate into a variable, though it's not a shell variable.
Notwithstanding non-standard platform-specific extensions, awk does not have any facility for editing files "in place" either. A portable solution using awk would still need to push temp files around.
And of course, the following old standard of ed is great to keep in your back pocket:
printf 'H\n/^$/\n-\n.r boilerplate\nw\nq\n' | ed src
In bash, of course, you could always use a here-string, which might be clearer:
$ ed src <<< $'H\n/^$/\n-\n.r boilerplate\nw\nq\n'
The ed command is the non-stream version of sed. Or rather, sed is the stream version of ed, which has been around since before the dinosaurs and is still going strong.
The commands we're using are separated by newlines and fed to ed's standard input. You can discard stdout if you feel the urge. The commands shown here are:
H - instruct ed to print more useful errors, if it gets any.
/^$/ - search for the first occurrence of an empty line.
- - GO BACK ONE LINE. Awesome, right?
.r boilerplate - Read your boilerplate at the current line,
w - and write the file.
q - Quit.
Note that this does not keep a .bak file. You'll need to do that yourself if you really want one.
And if, as you suggested in comments, the filename you're reading is to be constructed from a variable, note that variable expansion does not happen inside ANSI-C quoting ($' .. '). You can either switch quoting mechanisms mid-script:
ed "$file" <<< $'H\n/^$/\n-\n.r ./TATTOO_'"$currn"$'/top.txt\nw\nq\n'
Or you could put the ed script in a variable constructed by printf:
printf -v scr 'H\n/^$/\n-\n.r ./TATTOO_%s/top.txt\nw\nq\n' "$currn"
ed "$file" <<< "$scr"`
Adding the text to a variable so you can interpolate the variable is wasteful and an unnecessary complication. sed can easily read the contents of a file by itself.
sed -i.bak '1r./PATH/text.txt' "$f"
Unfortunately, this part of sed is poorly standardized, so you may have to experiment a little bit. Some dialects require a newline (perhaps, or perhaps not, preceded by a backslash) before the filename.
sed -i.bak '1r\
./PATH/text.txt' "$f"
(Notice also the double quotes around the file name. You generally always want double quotes around variables which contain file names.)
Adapting a standard sed recipe, we can extend this to apply to the first empty line instead of the first line.
sed -i.bak -e '/^$/!b' -e 'r./PATH/text.txt' -e :a -e '$!{' -e n -e ba -e } "$f"
This adds the boilerplate after the first empty line, but perhaps that's acceptable. Refactoring it to replace the empty line, or to add an empty line after the insertion, should not be too challenging anyway. (Maybe use sed -n and instead explicitly print everything except the empty line.)
In brief terms, this skips to the end (simply prints) up until we find the first empty line. Then, we read and print the file, and go into a loop which prints the remainder of the file without returning to the beginning of the script.
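A hedged sketch of that refactoring, combining the GNU-only 0,/^$/ range seen earlier with r and d so the boilerplate replaces the empty line outright:
sed -i.bak -e '0,/^$/{' -e '/^$/{' -e 'r ./PATH/text.txt' -e 'd' -e '}' -e '}' "$f"
The r output is still flushed at the end of the cycle even though d deletes the empty line, so the file contents land exactly where the blank line was.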
Here's a sed script that I think works. The extra bit to be inserted is held in a shell variable.
b='##\n## comment piece\n##'
sed --posix -ne '
1,/^$/ {
/^$/ {
x;
/^true$/ !{
x
s/^$/true/
i\
'"$b"'
};
x;
s/^.*$//
}
}
p
' file1
With the examples using ranges of 1,/^$/, an empty first line would result in the disclaimer being printed twice. To avoid this, I've set it up to put a flag in the hold space ( x; s/^$/true/ ) that I can swap into the pattern space to check whether it's the first blank. Once there's a match for a blank line, i\ inserts the comment ($b) in front of the pattern space.
Thanks to ghoti for the initial plan.
I'm working with a hand-filled file and I am having issues parsing it.
My input file cannot be altered, and the language of my code can't change from bash script.
I made a simple example to make it easy for you ^^
var="hey","i'm","happy, like","you"
IFS="," read -r one two tree for five <<<"$var"
echo $one:$two:$tree:$for:$five
Now I think you already saw the problem here. I would like to get
hey:i'm:happy, like:you:
but I get
hey:i'm:happy: like:you
I need a way to tell read that the quotes are more important than the IFS. I have read about the eval command, but I can't take that risk.
To end, this is a directory file and the troublesome field is the description one, so it could have basically anything in it.
The original file looks like this:
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
"type","cn","uid","gid","gecos","description","timestamp","disabled"
Edit #1
I will give a better example; the one I used above is too simple, and @StefanHegny found that it causes another error.
while read -r ldapLine
do
IFS=',' read -r objectClass dumy1 uidNumber gidNumber username description modifyTimestamp nsAccountLock gecos homeDirectory loginShell createTimestamp dumy2 <<<"$ldapLine"
isANetuser=0
while IFS=":" read -r -a class
do
for i in "${class[#]}"
do
if [ "$i" == "account" ]
then
isANetuser=1
break
fi
done
done <<< $objectClass
if [ $isANetuser == 0 ]
then
continue
fi
#MORE STUFF APPEND#
done < file.csv
So this is a small part of the code but it should explain what I do. The file.csv is a lot of lines like this:
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
If the various bash versions you will use are all more recent than v3.0, when regexes and BASH_REMATCH were introduced, you could use something like the following function: [Note 1]
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"[^\"]*\") ]]; do
printf "%s\n" "${BASH_REMATCH[2]:-${BASH_REMATCH[1]:1:-1}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
Its argument is a single line (remember to quote it!) and it prints each comma-separated field on a separate line. As written, it assumes that no field has an enclosed newline; that's legal in CSV, but it makes dividing the file into lines a lot more complicated. If you actually needed to deal with that scenario, you could change the \n in the printf statement to a \0 and then use something like xargs -0 to process the output. (Or you could insert whatever processing you need to do to the field in place of the printf statement.)
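A hedged sketch of that NUL-delimited variant (each_field_z is a hypothetical name, not from the original answer):
each_field_z () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"[^\"]*\") ]]; do
printf '%s\0' "${BASH_REMATCH[2]:-${BASH_REMATCH[1]:1:-1}}";   # NUL instead of newline
v=${v:${#BASH_REMATCH[0]}};
done
}
# Each NUL-terminated field survives embedded newlines:
each_field_z "$line" | xargs -0 -n1 printf '<%s>\n'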
It goes to some trouble to dequote quoted fields without modifying unquoted fields. However, it will fail on fields with embedded double quotes. That's fixable, if necessary. [Note 2]
Here's a sample, in case that wasn't obvious:
while IFS= read -r line; do
each_field "$line"
printf "%s\n" "-----"
done <<EOF
type,cn,uid,gid,gecos,"description",timestamp,disabled
"top:shadowAccount:account:posixAccount","Jdupon","12345","6789","Jdupon","Jean Mark, Dupon","20140511083750Z","","Jean Mark, Dupon","/home/user/Jdupon","/bin/ksh","20120512083750Z","",""
EOF
Output:
type
cn
uid
gid
gecos
description
timestamp
disabled
-----
top:shadowAccount:account:posixAccount
Jdupon
12345
6789
Jdupon
Jean Mark, Dupon
20140511083750Z

Jean Mark, Dupon
/home/user/Jdupon
/bin/ksh
20120512083750Z


-----
Notes:
I'm not saying you should use this function. You should use a CSV parser, or a language which includes a good CSV parsing library, like python. But I believe this bash function will work, albeit slowly, on correctly-formatted CSV files of a certain common CSV dialect.
Here's a version which handles doubled quotes inside quoted fields, which is the classic CSV syntax for interior quotes:
each_field () {
local v=,$1;
while [[ $v =~ ^,(([^\",]*)|\"(([^\"]|\"\")*)\") ]]; do
echo "${BASH_REMATCH[2]:-${BASH_REMATCH[3]//\"\"/\"}}";
v=${v:${#BASH_REMATCH[0]}};
done
}
My suggestion, as in some previous answers (see below), is to switch the separator to | (and use IFS="|" instead):
sed -r 's/,([^,"]*|"[^"]*")/|\1/g'
This requires a sed that has extended regular expressions (-r) however.
Should I use AWK or SED to remove commas between quotation marks from a CSV file? (BASH)
Is it possible to write a regular expression that matches a particular pattern and then does a replace with a part of the pattern
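Returning to the | trick above, a quick sketch of how it might plug into the read loop from the question (hedged: it assumes GNU sed for -r, literal quotes present in the data as in the real file, and no field containing a literal |):
var='"hey","i'\''m","happy, like","you"'
line=$(printf '%s\n' "$var" | sed -r 's/,([^,"]*|"[^"]*")/|\1/g')
IFS='|' read -r one two three four <<<"$line"
echo "$one:$two:$three:$four"
# prints "hey":"i'm":"happy, like":"you" - the surrounding quotes remain attached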
I am currently trying to extract ALL matching expressions from a text which e.g. looks like this and put them into an array.
aaaaaaaaa${bbbbbbb}ccccccc${dddd}eeeee
ssssssssssssssssss${TTTTTT}efhsekfh ej
348653jlk3jß1094utß43t59ßgöelfl,-s-fko
The matching expressions are similar to this: ${}. Beware that I need the full expression, not only the word in between! So in this case the result should be an array which contains:
${bbbbbbb}
${dddd}
${TTTTTT}
Problems I have stumbled upon and couldn't solve:
It should NOT recognize this as a whole:
${bbbbbbb}ccccccc${dddd} but each one on its own.
grep -o is not installed on the old machine, and Perl is not allowed either!
Many approaches, e.g. BASH_REMATCH, only deliver the whole line or the first occurrence of the expression, instead of all matching expressions in the line!
The mentioned pattern \${[^}]*} seems to work partly: it can extract the first occurrence of the expression, but it always omits the ones following after that if they're in the same text line. What I need is ALL matching expressions found in the line, not only the first one.
You could split the string on any of the characters $,{,}:
$ s='...blaaaaa${blabla}bloooo${bla}bluuuuu...'
$ echo "$s"
...blaaaaa${blabla}bloooo${bla}bluuuuu...
$ IFS='${}' read -ra words <<< "$s"
$ for ((i=0; i<${#words[@]}; i++)); do printf "%d %s\n" $i "${words[i]}"; done
0 ...blaaaaa
1
2 blabla
3 bloooo
4
5 bla
6 bluuuuu...
So if you're trying to extract the words inside the braces:
$ for ((i=2; i<${#words[@]}; i+=3)); do printf "%d %s\n" $i "${words[i]}"; done
2 blabla
5 bla
If the above doesn't suit you, grep will work:
$ echo '...blaaaaa${blabla}bloooo${bla}bluuuuu...' | grep -o '\${[^}]\+}'
${blabla}
${bla}
You still haven't told us exactly what output you want.
Since it bugged me a lot, I asked directly on www.unix.com and was kindly provided with a solution which fits my ancient shell. So if anyone has the same problem, here is the solution:
line='aaaa$aa{yyy}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
IFS=\$ read -a words <<< "$line"
regex='^(\{[^}]+})'
for e in "${words[@]}"; do
if [[ $e =~ $regex ]]; then
echo "\$${BASH_REMATCH[0]}";
fi;
done
which then prints the following, without even getting disturbed by random occurrences of $ and { or } between the syntactically correct expressions:
${important}
${important2}
${importantstring3}
I have updated the full solution after I got another update from the forums: now it also ignores this: aaa$aa{yyy}aaaa - which it previously printed as ${yyy} - but which it should completely ignore as there are characters between $ and {. Now with the additional anchoring on the beginning of the regexp it works as expected.
I just found another issue: theoretically, using the above approach, I would still get wrong output if the read line looks like this: line='{ccc}aaaa${important}aaa'. The IFS would split it and the regex would match {ccc}, although it didn't have the $ sign in front. This is suboptimal.
However, the following approach could solve it: after getting the BASH_REMATCH, I would need to search the original line - the one I gave to the IFS - for this exact expression ${ccc}, with the difference that the $ is included! Only if it finds this exact match does it count as a valid match; otherwise it should be ignored. Kind of a reverse search method...
Update - adding this reverse search ignores the trap at the beginning of the line:
pattern="\$${BASH_REMATCH[0]}";
searchresult="";
searchresult=`echo "$line" | grep "$pattern"`;
if [ "$searchresult" != "" ]; then echo "It was found!"; fi;
Negligible issue: if the line looks like line='{ccc}aaaaaa${ccc}bbbbb', it would recognize the first {ccc} as a valid match (although it isn't) and print it, because the reverse search found the second ${ccc}. Although this is not intended, it's irrelevant for my specific purpose, as it implies that this pattern does in fact exist at least once in the same line.
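For completeness, a hedged alternative not taken from the forum thread: since a single [[ =~ ]] only reports the first match, you can chop each match off the string and loop (array += needs bash 3.1+). Anchoring $ and { together in one regex means neither of the traps above applies:
line='{ccc}aaaa${important}xxx${important2}oo{o$}oo$oo${importantstring3}'
matches=()
while [[ $line =~ \$\{[^}]*\} ]]; do
matches+=( "${BASH_REMATCH[0]}" )    # the full ${...} expression
line=${line#*"${BASH_REMATCH[0]}"}   # continue after this match
done
printf '%s\n' "${matches[@]}"
# prints ${important}, ${important2} and ${importantstring3}, one per line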
I have a file which has very long rows of data. When I try to read it using a shell script, the data comes out on multiple lines, i.e. it breaks at certain points.
Example row:
B_18453583||Active|917396140129|405819121107402|Active|7396140129||7396140129|||||||||18-MAY-10|||||18-MAY-10|405819121107402|Outgoing International Calls,Outgoing Calls,WAP,Call Waiting,MMS,Data Service,National Roaming-Voice,Outgoing International Calls except home country,Conference Call,STD,Call Forwarding-Barr,CLIP,Incoming Calls,INTSNS,WAPSNS,International Roaming-Voice,ISD,Incoming Calls When Roaming Internationally,INTERNET||For You Plan||||||||||||||||||
All this is the content of a single line.
I use a normal read like this:
var=`cat pranay.psv`
for i in $var; do
echo $i
done
The output comes as:
B_18453583||Active|917396140129|405819121107402|Active|7396140129||7396140129|||||||||18- MAY-10|||||18-MAY-10|405819121107402|Outgoing
International
Calls,Outgoing
Calls,WAP,Call
Waiting,MMS,Data
Service,National
Roaming-Voice,Outgoing
International
Calls
except
home
country,Conference
Call,STD,Call
Forwarding-Barr,CLIP,Incoming
Calls,INTSNS,WAPSNS,International
Roaming-Voice,ISD,Incoming
Calls
When
Roaming
Internationally,INTERNET||For
You
Plan||||||||||||||||||
How do I print it all in a single line?
Please help.
Thanks
This is because of word splitting. An easier way to do this (which also dispenses with the useless use of cat) is this:
while IFS= read -r -d $'\n' -u 9
do
echo "$REPLY"
done 9< pranay.psv
To explain in detail:
$'...' can be used to create human readable strings with escape sequences. See man bash.
IFS= is necessary to avoid having any characters in IFS stripped from the start and end of $REPLY.
-r avoids interpreting backslash in text specially.
-d $'\n' splits lines by the newline character.
Using file descriptor 9 for the data instead of standard input avoids greedy commands like cat eating all of it.
You need proper quoting. In your case, you should use the command read:
while IFS= read -r line ; do
echo "$line"
done < pranay.psv
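To see why the quoting matters, a quick demonstration:
line='one    two|three,four'
echo $line    # unquoted: word splitting collapses the whitespace - prints 'one two|three,four'
echo "$line"  # quoted: prints 'one    two|three,four' intact, on a single line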