Bash for loop / echo removing capitalized characters from string array - bash

I was trying to enumerate a bunch of files in bash and noticed the following strange error occurring.
The input string 'rtpwatcher_Class_Sync_License_Capture' gets echo'd as 'rtpwatcher_Class_ ync_ icense_Capture', seemingly removing uppercase characters at random.
Code:
hb_names=("rtpwatcher_Truckmove_Statemachine" "rtpwatcher_Class_Sync_License_Capture")
hb_test="rtpwatcher_Truckmove_Statemachine,rtpwatcher_Class_Sync_License_Capture"
for i in $(echo $hb_test | tr ',' '\n')
do
echo $i
done
for hb in ${hb_names[@]}; do
echo $hb
done
Output:
rtpwatcher_Truckmove_ tatemachine
rtpwatcher_Class_ ync_ icense_Capture
rtpwatcher_Truckmove_ tatemachine
rtpwatcher_Class_ ync_ icense_Capture
I've tried changing the string to only have one upper case character (rtpwatcher_Class_sync_license_capture) and the output was 'rtpwatcher_Class_sync_license_capture' as expected.
SOLVED:
for hb in "${hb_names[@]}"; do
echo "${hb}"
done
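The underlying cause is word splitting on the unquoted expansion: the characters that vanish (S and L) are exactly what you'd see if IFS had been modified earlier in the script to contain those letters - bash splits the value on them and echo rejoins the pieces with spaces. A plausible reproduction (the IFS value is an assumption, not shown in the original post):
IFS='SL'   # hypothetical: IFS changed somewhere earlier in the script
s="rtpwatcher_Class_Sync_License_Capture"
echo $s    # unquoted: rtpwatcher_Class_ ync_ icense_Capture
echo "$s"  # quoted: rtpwatcher_Class_Sync_License_Capture
Double-quoting suppresses word splitting, which is why the SOLVED version works.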

Related

How to prepend to a string that comes out of a pipe

I have two strings saved in a bash variable, delimited by :. I want to extract the second string, prepend it with THIS_VAR=, and append it to a file named saved.txt.
For example if myVar="abc:pqr", THIS_VAR=pqr should be appended to saved.txt.
This is what I have so far,
myVar="abc:pqr"
echo $myVar | cut -d ':' -f 2 >> saved.txt
How do I prepend THIS_VAR=?
printf 'THIS_VAR=%q\n' "${myVar#*:}"
See Shell Parameter Expansion and run help printf.
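For the example above, this appends the desired line to saved.txt:
myVar="abc:pqr"
printf 'THIS_VAR=%q\n' "${myVar#*:}" >> saved.txt  # saved.txt now contains THIS_VAR=pqr
${myVar#*:} strips everything up to and including the first :, and %q quotes the value safely should it ever contain shell metacharacters.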
The more general solution, in addition to @konsolebox's answer, is piping into a compound statement, where you can perform arbitrary operations:
echo This is in the middle | {
echo This is first
cat
echo This is last
}
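For instance, the block above prints:
This is first
This is in the middle
This is last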

Convert multi-line csv to single line using Linux tools

I have a .csv file that contains double-quoted multi-line fields. I need to convert the multi-line cells to a single line. It doesn't show in the sample data, but I do not know which fields might be multi-line, so any solution will need to check every field. I do know how many columns I'll have. The first line will also need to be skipped. I don't know how much data there will be, so performance isn't a consideration.
I need something that I can run from a bash script on Linux. Preferably using tools such as awk or sed and not actual programming languages.
The data will be processed further with Logstash but it doesn't handle double quoted multi-line fields hence the need to do some pre-processing.
I tried something like this and it kind of works on one row but fails on multiple rows.
sed -e :0 -e '/,.*,.*,.*,.*,/b' -e N -e '1n;N;N;N;s/\n/ /g' -e b0 file.csv
CSV example
First name,Last name,Address,ZIP
John,Doe,"Country
City
Street",12345
The output I want is
First name,Last name,Address,ZIP
John,Doe,Country City Street,12345
Jane,Doe,Country City Street,67890
etc.
etc.
First, my apologies for getting here 7 months late...
I came across a problem similar to yours today, with multiple multi-line fields. I was glad to find your question, but at least in my case there is the added complexity that, as more than one field is affected, quotes might open, close and open again on the same line... Anyway, reading a lot and combining answers from different posts, I came up with something like this:
First I count the quotes in a line. To do that, I strip everything but quotes and then use wc:
quotes=$(echo "$line" | tr -cd '"' | wc -c) # Counts the quotes
If you think of a single multi-line field, knowing if the quotes are 1 or 2 is enough. In a more generic scenario like mine I have to know if the number of quotes is odd or even to know if the line completes the record or expects more information.
To check for even or odd you can use the mod operand (%), in general:
even % 2 = 0
odd % 2 = 1
For the first line:
Odd means that the line expects more information on the next line.
Even means the line is complete.
For the subsequent lines, I have to know the status of the previous one. For instance, in your sample text:
First name,Last name,Address,ZIP
John,Doe,"Country
City
Street",12345
You can say line 1 (John,Doe,"Country) has 1 quote (odd), which means the status of the record is incomplete, or open.
When you go to line 2, there is no quote (even). Nevertheless, this does not mean the record is complete; you have to consider the previous status... So for the lines following the first one:
Odd means that the record status toggles (incomplete to complete).
Even means that the record status remains as on the previous line.
What I did was loop line by line while carrying the status of the last line over to the next one:
incomplete=0
while IFS= read -r line; do  # -r keeps backslashes, IFS= keeps leading whitespace
    quotes=$(echo "$line" | tr -cd '"' | wc -c)  # Counts the quotes
    incomplete=$(( (quotes + incomplete) % 2 ))  # Check if odd or even to decide the status
    if [ "$incomplete" -eq 1 ]; then
        echo -n "$line " >> new.csv  # If the line is incomplete, join it with the next
    else
        echo "$line" >> new.csv     # If the line completes the record, finish it
    fi
done < file.csv
Once this is executed on a file in your format, it generates a new.csv like this:
First name,Last name,Address,ZIP
John,Doe,"Country City Street",12345
I like one-liners as much as everyone; I wrote that script just for the sake of clarity. You can - arguably - write it in one line like:
i=0;cat file.csv|while read l;do i=$((($(echo $l|tr -cd '"'|wc -c)+$i)%2));[[ $i = 1 ]] && echo -n "$l " || echo "$l";done >new.csv
I would appreciate it if you could go back to your example and see if this works for your case (which you most likely already solved). Hopefully this can still help someone else down the road...
Recovering the multi-line fields
Every need is different, in my case I wanted the records in one line to further process the csv to add some bash-extracted data, but I would like to keep the csv as it was. To accomplish that, instead of joining the lines with a space I used a code - likely unique - that I could then search and replace:
i=0;cat file.csv|while read l;do i=$((($(echo $l|tr -cd '"'|wc -c)+$i)%2));[[ $i = 1 ]] && echo -n "$l ~newline~ " || echo "$l";done >new.csv
The code is ~newline~; this is totally arbitrary, of course.
Then, after doing my processing, I took the csv text file and replaced the coded newlines with real newlines:
sed -i 's/ ~newline~ /\n/g' new.csv
References:
Ternary operator: https://stackoverflow.com/a/3953666/6316852
Count char occurrences: https://stackoverflow.com/a/41119233/6316852
Other peculiar cases: https://www.linuxquestions.org/questions/programming-9/complex-bash-string-substitution-of-csv-file-with-multiline-data-937179/
TL;DR
Run this:
i=0;cat file.csv|while read l;do i=$((($(echo $l|tr -cd '"'|wc -c)+$i)%2));[[ $i = 1 ]] && echo -n "$l " || echo "$l";done >new.csv
... and collect results in new.csv
I hope it helps!
If Perl is your option, please try the following:
perl -e '
while (<>) {
$str .= $_;
}
while ($str =~ /("(("")|[^"])*")|((^|(?<=,))[^,]*((?=,)|$))/g) {
if (($el = $&) =~ /^".*"$/s) {
$el =~ s/^"//s; $el =~ s/"$//s;
$el =~ s/""/"/g;
$el =~ s/\s+(?!$)/ /g;
}
push(@ary, $el);
}
foreach (@ary) {
print /\n$/ ? "$_" : "$_,";
}' sample.csv
sample.csv:
First name,Last name,Address,ZIP
John,Doe,"Country
City
Street",12345
John,Doe,"Country
City
Street",67890
Result:
First name,Last name,Address,ZIP
John,Doe,Country City Street,12345
John,Doe,Country City Street,67890
This might work for you (GNU sed):
sed ':a;s/[^,]\+/&/4;tb;N;ba;:b;s/\n\+/ /g;s/"//g' file
Test each line to see that it contains the correct number of fields (in the example that was 4). If there are not enough fields, append the next line and repeat the test. Otherwise, replace the newline(s) by spaces and finally remove the "'s.
N.B. This may be fraught with problems such as ,'s between "'s and quoted "'s.
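On the sample from the question this produces (a quick check, assuming GNU sed):
$ sed ':a;s/[^,]\+/&/4;tb;N;ba;:b;s/\n\+/ /g;s/"//g' file.csv
First name,Last name,Address,ZIP
John,Doe,Country City Street,12345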
Try cat -v file.csv. When the file was made with Excel, you might have some luck: When the newlines in a field are a simple \n and the newline at the end is a \r\n (which will look like ^M), parsing is simple.
# delete all newlines and replace the ^M with a new newline.
tr -d "\n" < file.csv| tr "\r" "\n"
# Above two steps with one command
tr "\n\r" " \n" < file.csv
When you want a space between the joined lines, you need an additional step.
tr "\n\r" " \n" < file.csv | sed '2,$ s/^ //'
EDIT: @sjaak commented that this didn't work in his case.
When your broken lines also have ^M you can still be a lucky (wo-)man.
When your broken field is always the first field in double quotes and you have GNU sed 4.2.2, you can join 2 lines when the first line has exactly one double quote.
sed -rz ':a;s/(\n|^)([^"]*)"([^"]*)\n/\1\2"\3 /;ta' file.csv
Explanation:
-z don't use \n as line endings
:a label for repeating the step after successful replacement
(\n|^) Search after a newline or the very first line
([^"]*) Substring without a "
ta Go back to label a and repeat
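On the sample from the question, the record is joined and the quotes are kept:
$ sed -rz ':a;s/(\n|^)([^"]*)"([^"]*)\n/\1\2"\3 /;ta' file.csv
First name,Last name,Address,ZIP
John,Doe,"Country City Street",12345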
awk pattern matching works here.
Answer in one line:
awk '/,"/{ORS=" "};/",/{ORS="\n"}{print $0}' YourFile
if you'd like to drop quotes, you could use:
awk '/,"/{ORS=" "};/",/{ORS="\n"}{print $0}' YourFile | sed 's/"//gw NewFile'
but I prefer to keep it.
To explain the code:
/Pattern/ : find Pattern in the current line.
ORS : the output record separator.
$0 : the whole of the current line.
's/OldPattern/NewPattern/' : substitute the first OldPattern with NewPattern
/g : does the previous substitution for all occurrences of OldPattern
/w : write the result to NewFile
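Applied to the CSV example from the question, the first command gives:
$ awk '/,"/{ORS=" "};/",/{ORS="\n"}{print $0}' file.csv
First name,Last name,Address,ZIP
John,Doe,"Country City Street",12345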

How do I recursively replace part of a string with another given string in bash?

I need to write a bash script that converts a string of only integers "intString" to :id. intString always exists after /, may never contain anything else (create_step2 is not a valid intString), and may end at either a second / or the end of the line. intString may be any 1-8 characters. The script needs to be run on every line of a given file.
For example:
/sample/123456/url should be converted to /sample/:id/url
and /sample_url/9 should be converted to /sample_url/:id; however, /sample_url_2/ should remain the same.
Any help would be appreciated!
It seems like the long way around the problem to go recursive, but then I don't know what problem you are solving. It seems like a good sed command like
sed -E 's/\/[0-9]{1,}/\/:id/g'
could do it in one shot, but if you insist on being recursive then it might go something like this ...
#!/bin/bash
function restring()
{
    s="$1"
    # replace the first /digits segment with /:id
    s="$(echo "$s" | sed -E 's/\/[0-9]{1,}/\/:id/')"
    if echo "$s" | grep -qE '/[0-9]{1,}' ; then
        restring "$s"   # more digit segments left: recurse
    else
        echo "$s"       # done: print the result
    fi
}
restring "$1"
Now run it:
$ ./restring.sh "/foo/123/bar/456/baz/45435/andstuff"
/foo/:id/bar/:id/baz/:id/andstuff
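For comparison, the non-recursive sed one-liner from the top gives the same result in a single pass, and processes every line when given a file:
$ echo "/foo/123/bar/456/baz/45435/andstuff" | sed -E 's/\/[0-9]{1,}/\/:id/g'
/foo/:id/bar/:id/baz/:id/andstuff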

\t Tab is lost in bash script

I'm reading a text file line by line in a bash script. The text file is a tab-separated csv - however, when I try to cut the read line, it does not work; it seems like the \t is converted to a blank space somewhere.
The code below is not what I am finally doing - I have not yet implemented the actual workload, until the data can be read reliably.
for (( currlineno=2 ; $currlineno <= $maxlines ; currlineno++ )); do
currline=$(sed -n "$currlineno"p "$IMPORT_TABLE".csv )
echo $currline |cut -f2
done
Now when I change the two lines like below, it works:
for (( currlineno=2 ; $currlineno <= $maxlines ; currlineno++ )); do
currline=$(sed -n "$currlineno"p "$IMPORT_TABLE".csv |tr '\t' ';')
echo $currline |cut -f2 -d ';'
done
but I cannot do it like that as my text file also contains ';' ',' and '.' in the fields. Tab is the only acceptable option for me, as my fields will never contain it.
That's because you don't double-quote your variable.
tabbed=$'a\tb'
echo $tabbed : "$tabbed"
When bash sees the variable outside of quotes, it applies word splitting on its contents, and echo just outputs its parameters separated by spaces. Double quotes make the value one parameter, even if it contains whitespace, newlines, etc.
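A minimal fix for the loop in the question is therefore just to double-quote the expansion (a sketch, using the same variables as above):
for (( currlineno=2 ; $currlineno <= $maxlines ; currlineno++ )); do
    currline=$(sed -n "$currlineno"p "$IMPORT_TABLE".csv)
    echo "$currline" | cut -f2  # quoted: cut sees the original tabs
done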

gnuplot for cycle and spaces in filename

I have a small script in bash which generates graphs via gnuplot.
Everything works fine until the names of the input files contain space(s).
Here's what I've got:
INPUTFILES=("data1.txt" "data2 with spaces.txt" "data3.txt")
...
#MAXROWS is set earlier, not relevant.
for LINE in $( seq 0 $(( MAXROWS - 1 )) );do
gnuplot << EOF
reset
set terminal png
set output "out/graf_${LINE}.png"
filenames="${INPUTFILES[@]}"
set multiplot
plot for [file in filenames] file every ::0::${LINE} using 1:2 with line title "graf_${LINE}"
unset multiplot
EOF
done
This code works, but only when the names of the input files contain no spaces.
In the example gnuplot evaluates this:
1 iteration: file=data1.txt - CORRECT
2 iteration: file=data2 - INCORRECT
3 iteration: file=with - INCORRECT
4 iteration: file=spaces.txt - INCORRECT
The quick answer is that you can't do exactly what you want to do. Gnuplot splits the string in an iteration on spaces and there's no way around that (AFAIK). Depending on what you want, there may be a "work-around". You can write a (recursive) function in gnuplot to replace one character string with another --
#S,C & R stand for STRING, CHARS and REPLACEMENT to help this be a little more legible.
replace(S,C,R)=(strstrt(S,C)) ? \
replace( S[:strstrt(S,C)-1].R.S[strstrt(S,C)+strlen(C):] ,C,R) : S
Bonus points to anyone who can figure out how to do this without recursion...
Then your (bash) loop looks something like:
INPUTFILES_BEFORE=("data1.txt" "data2 with spaces.txt" "data3.txt")
INPUTFILES=()
#C style loop to avoid changing IFS -- Sorry SO doesn't like the #...
#This loop pre-processes files and changes spaces to '#_#'
for (( i=0; i < ${#INPUTFILES_BEFORE[@]}; i++)); do
FILE=${INPUTFILES_BEFORE[${i}]}
INPUTFILES+=( "`echo ${FILE} | sed -e 's/ /#_#/g'`" ) #replace ' ' with '#_#'
done
which preprocesses your input files, replacing the spaces in the filenames with '#_#'... Finally, the "complete" script:
...
INPUTFILES_BEFORE=("data1.txt" "data2 with spaces.txt" "data3.txt")
INPUTFILES=()
for (( i=0; i < ${#INPUTFILES_BEFORE[@]}; i++)); do
FILE=${INPUTFILES_BEFORE[${i}]}
INPUTFILES+=( "`echo ${FILE} | sed -e 's/ /#_#/g'`" ) #replace ' ' with '#_#'
done
for LINE in $( seq 0 $(( MAXROWS - 1 )) );do
gnuplot <<EOF
filenames="${INPUTFILES[@]}"
replace(S,C,R)=(strstrt(S,C)) ? \
replace( S[:strstrt(S,C)-1].R.S[strstrt(S,C)+strlen(C):] , C ,R) : S
#replace '#_#' with ' ' in filenames.
plot for [file in filenames] replace(file,'#_#',' ') every ::0::${LINE} using 1:2 with line title "graf_${LINE}"
EOF
done
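As an aside, the echo | sed preprocessing could also be done with bash parameter expansion - same result, no subshell:
INPUTFILES+=( "${FILE// /#_#}" )  # replace every space with '#_#'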
However, I think the take-away here is that you shouldn't use spaces in filenames ;)
Escape the spaces:
"data2\ with\ spaces.txt"
EDIT
It seems that even with escape sequences, as you have mentioned, the bash for loop will always split the input on the spaces.
Can you convert your script to work in a while loop fashion:
http://ubuntuforums.org/showthread.php?t=83424
This may also be a solution, but it's new to me and I'm still playing with it to understand exactly what it does:
http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html
