Reading from csv adding a new line to variables - bash

I am writing a bash script that reads rows from a CSV file and puts them in variables. However, read seems to add a newline to the variables. I tried removing it, but that doesn't work.
In this case, ROW2 prints as:
Testing - line1
- line2
However, ROW1 is fine.
My script:
sed '1d' export.csv | while IFS=,read -r ROW1 ROW2
do
ROW2=${ROW2%$'\n'}
echo ROW2
done < export.csv
export.csv:
abc,Testing

Try this (I corrected some typos in your code):
sed '1d' export.csv | while IFS=, read -r ROW1 ROW2
do
ROW1=${ROW1%$'\n'}
echo "$ROW1 $ROW2"
done

This might work better for you assuming that you're deleting the first line of the file to avoid processing CSV headers:
#!/bin/bash
while IFS=, read -r ROW1 ROW2
do
echo $ROW1 $ROW2
done < <(sed '1d' export.csv)
This uses a technique called process substitution to feed the output of the sed command into the standard input of the while loop.
With that, I don't see any newlines in the variables.
You may be mistaking the newline that echo appends for one stored in the variable. For example, if we run the above script and pipe the output to hexdump, we can see which newlines actually exist:
# ./test.sh | hexdump -C
00000000 61 62 63 20 54 65 73 74 69 6e 67 0a |abc Testing.|
There is one newline character present in the output (0a) and it is created by the echo command. To prove it, we can add the -n flag to the echo call (echo -n $ROW1 $ROW2) to suppress the newline:
# ./test.sh | hexdump -C
00000000 61 62 63 20 54 65 73 74 69 6e 67 |abc Testing|
0000000b
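To check the variables themselves rather than the echo output, something like the following sketch may help. The sample file is a hypothetical stand-in for export.csv (a header line plus the one row from the question), and the variable names rows and last_row2 are only for illustration. printf's %q prints each value in shell-quoted form, so a stray newline or carriage return would become visible.

```bash
#!/bin/bash
# Hypothetical sample file standing in for export.csv (header plus one row).
csv=$(mktemp)
printf 'col1,col2\nabc,Testing\n' > "$csv"

rows=0
while IFS=, read -r ROW1 ROW2
do
    rows=$((rows + 1))
    # %q shows the value in shell-quoted form, so a hidden newline or
    # carriage return would appear as $'...\n' or $'...\r'.
    printf 'ROW1=%q ROW2=%q length=%d\n' "$ROW1" "$ROW2" "${#ROW2}"
    last_row2=$ROW2
done < <(sed '1d' "$csv")

# The loop is fed by process substitution rather than a pipe, so it runs
# in the current shell and these variables are still set here.
echo "rows=$rows last=$last_row2"
rm -f "$csv"
```

With the pipe form (sed ... | while ...), the loop body runs in a subshell by default, so rows and last_row2 would not keep their values after done.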

Related

UNIX/Linux shell script: Removing variant form emoji from a text

Consider you are using a Linux/UNIX shell whose default character set is UTF-8:
$ echo $LANG
en_US.UTF-8
You have a text file, emoji.txt, which is coded in UTF-8:
$ file -i ./emoji.txt
./emoji.txt: text/plain; charset=utf-8
This text file contains some emoji and a variant form escape sequence:
$ cat ./emoji.txt
Standard ☁
Variant form ☁️
$ uni2ascii -a B -q ./emoji.txt
Standard \x2601
Variant form \x2601\xFE0F
You want to remove both emoji, including the variant form character (\xFE0F, the Unicode variation selector U+FE0F), so the output should be
Standard
Variant form
How would you do this?
Update: this question is not about how to remove the last word in every line. Imagine an emoji2.txt that contains a large text with many emoji characters, some of which are followed by the variant form sequence.

With GNU sed and bash:
sed -E s/$'\u2601\uFE0F?'//g emoji.txt
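If the \u escapes or the locale cause trouble, a byte-level variant of the same substitution may be an option. This is only a sketch: it assumes the file is UTF-8, where U+2601 is the bytes e2 98 81 and U+FE0F is ef b8 8f, and it builds its own sample input instead of reading emoji.txt.

```bash
#!/bin/bash
# Sample input built from explicit UTF-8 bytes:
#   U+2601 CLOUD is e2 98 81, U+FE0F is ef b8 8f.
input=$'Standard \xe2\x98\x81\nVariant form \xe2\x98\x81\xef\xb8\x8f\n'

# LC_ALL=C makes sed treat the data as plain bytes, so the optional group
# matches the three bytes of U+FE0F when they directly follow the cloud.
cleaned=$(printf '%s' "$input" |
    LC_ALL=C sed -E $'s/\xe2\x98\x81(\xef\xb8\x8f)?//g')

printf '%s\n' "$cleaned"
```

As in the original command, the blank before the emoji stays on each line.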

You can use awk, like this:
$ cat emo.ascii
Standard \x2601
Variant form \x2601\xFE0F
$ ascii2uni -a B emo.ascii
Standard ☁
Variant form ☁️
3 tokens converted # note: this is stderr
$ ascii2uni -a B emo.ascii | awk -F' ' '{NF--}1' | cat -A
3 tokens converted # note: this is stderr
Standard$
Variant form$
NF-- will decrease the field count in awk, which effectively removes the last field. 1 evaluates to true, which makes awk print the modified line.
(Used cat -A here only to show that there aren't any invisible characters left)

Have awk print all but the last field:
$ awk '/^Standard/ || /^Variant form/ { $(NF)="" }1' emoji.txt
Standard
Variant form
NOTE: This particular solution will leave the field separator (blank) on the end of the output line; if you want to strip the trailing blank you can pipe to sed, tr, etc ... or have awk loop through fields 1 to (NF-1) and output via printf
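The loop-and-printf variant mentioned in the note could look roughly like this. It is a sketch on self-made sample input (the same two lines as emoji.txt, written as UTF-8 bytes), and like the command above it simply drops the last field of each line, whatever that field is.

```bash
#!/bin/bash
# Same two lines as emoji.txt, built from explicit UTF-8 bytes.
input=$'Standard \xe2\x98\x81\nVariant form \xe2\x98\x81\xef\xb8\x8f\n'

trimmed=$(printf '%s' "$input" | awk '{
    # Print fields 1..NF-1 with a single space between them and
    # nothing after the last one, so no trailing blank is left.
    for (i = 1; i < NF; i++)
        printf "%s%s", (i > 1 ? " " : ""), $i
    printf "\n"
}')

printf '%s\n' "$trimmed"
```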

Use the nkf command. nkf -s tries to convert the character encoding to Shift_JIS, which does not support emoji, so the emoji and the variant form sequence are dropped. Then convert the result back to UTF-8 with nkf -w.
$ cat emoji.txt | nkf -s | nkf -w
Standard
Variant form
$ cat emoji.txt | nkf -s | nkf -w | od -tx1c
0000000 53 74 61 6e 64 61 72 64 20 0a 56 61 72 69 61 6e
S t a n d a r d \n V a r i a n
0000020 74 20 66 6f 72 6d 20 0a
t f o r m \n
0000030
I thought Ruby might work, because \p{Emoji} matches emoji, but it leaves the variant form character (U+FE0F) behind:
$ ruby -nle 'puts $_.gsub!(/\p{Emoji}/,"")' emoji.txt
Standard
Variant form ️
$ ruby -nle 'puts $_.gsub!(/\p{Emoji}/,"")' emoji.txt | od -tx1c
0000000 53 74 61 6e 64 61 72 64 20 0a 56 61 72 69 61 6e
S t a n d a r d \n V a r i a n
0000020 74 20 66 6f 72 6d 20 ef b8 8f 0a
t f o r m 217 \n
0000033

Convert the Unicode text file to an ASCII representation, remove the escape sequences that stand for the unwanted Unicode characters, and convert the result back to UTF-8:
$ uni2ascii -q ./emoji.txt | sed "s/ 0x2601\(0xFE0F\)\?//g" | ascii2uni -q
Standard
Variant form
$

Capturing special characters from stdin to a shell variable

I have a program which prints something that contains null bytes \0 and special characters like \x1f and newlines. For instance:
someprogram
#!/bin/bash
printf "ALICE\0BOB\x1fCHARLIE\n"
Given such a program, I want to read its output in such a way that all those special characters are captured in a shell variable output. So, if I run:
echo $output
because I'm not giving -e, I'd want the output to be:
ALICE\0BOB\x1fCHARLIE\n
How can this be achieved?
My first attempt was:
output=$(someprogram)
But I got this echoed output which doesn't have the special characters:
./myscript.sh: line 2: warning: command substitution: ignored null byte in input
ALICEBOBCHARLIE
I also tried to use read as follows:
output=""
while read -r
do
output="$output$REPLY"
done < <(someprogram)
Then I got rid of the warning but the output is still missing all special characters:
ALICEBOBCHARLIE
So how can I capture the output of someprogram in such a way that I have all the special characters in my resulting string?
EDIT: Note that it is possible to have such strings in bash:
$ x="ALICE\0BOB\x1fCHARLIE\n"
$ echo $x
ALICE\0BOB\x1fCHARLIE\n
So that shouldn't be the problem.
EDIT2: I'll reformulate the question a little bit now that I got an accepted answer and I understood things a little bit better. So, I just needed to be able to store the output of someprogram in some shell variable in such a way that I can print it to stdout without any changes in any special characters as if someprogram was just piped directly to stdout.

You can't store a zero byte in a bash variable. It's impossible.
The usual solution is to convert the stream of bytes into hexadecimal and convert it back each time you want to do something with it.
$ x=$(printf "ALICE\0BOB\x1fCHARLIE\n" | xxd -p)
$ echo "$x"
414c49434500424f421f434841524c49450a
$ <<<"$x" xxd -p -r | hexdump -C
00000000 41 4c 49 43 45 00 42 4f 42 1f 43 48 41 52 4c 49 |ALICE.BOB.CHARLI|
00000010 45 0a |E.|
00000012
You can also write your own serialization and deserialization functions for the purpose.
Another idea is to read the data into an array, using the zero byte as the separator (any other byte is valid inside an element). However, this cannot tell whether the input ended with a trailing zero byte:
$ readarray -d '' arr < <(printf "ALICE\0BOB\x1fCHARLIE\n")
$ printf "%s\0" "${arr[@]}" | hexdump -C
00000000 41 4c 49 43 45 00 42 4f 42 1f 43 48 41 52 4c 49 |ALICE.BOB.CHARLI|
00000010 45 0a 00 |E..|
# ^^ additional zero byte if input doesn't contain a trailing zero byte
00000013
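The hexadecimal round trip described above can be wrapped in a pair of helper functions. The sketch below is one possible shape, not the only one: the names to_hex and from_hex are made up for illustration, and it uses od and bash's printf builtin in place of xxd so that it does not depend on xxd being installed.

```bash
#!/bin/bash
# Serialize: read raw bytes on stdin, print one continuous hex string.
to_hex() {
    od -An -v -tx1 | tr -d ' \n'
}

# Deserialize: turn the hex string in $1 back into raw bytes on stdout.
from_hex() {
    local hex=$1 i
    for ((i = 0; i < ${#hex}; i += 2)); do
        # Builds a format such as \x41; printf emits the matching byte,
        # including a real zero byte for \x00.
        printf "\\x${hex:i:2}"
    done
}

# The variable only ever holds hex digits, never the zero byte itself.
captured=$(printf 'ALICE\0BOB\x1fCHARLIE\n' | to_hex)
echo "$captured"

# Replaying the bytes and encoding them again should give the same string.
roundtrip=$(from_hex "$captured" | to_hex)
```

To print the original output unchanged, call from_hex "$captured" and send its stdout wherever someprogram's output would have gone.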

N combinations of words, simple bash

File 1:
1F
2F
3F
4F
5f
File 2:
1F
2F
3F
4F
5f
I have a simple code that produces all possible combinations
#!/bin/bash
for a in $(awk '{print $1}' intf1)
do
for b in $(awk '{print $1}' intf2)
do
echo -e "$a:$b" >> file
done
done
Output of this code:
1F:1F
1F:2F
1F:3F
1F:4F
2F:1F
etc
But I would like to:
1) Completely avoid repetitions
2) "Select the number" (the number of words (lines) that will be taken from the second file):
Each two lines in second file:
1F:2F
1F:3F
2F:3F
2F:4F
3F:4F
3F:5F
4F:5F
Each three lines in second file:
1F:2F
1F:3F
1F:4F
2F:3F
2F:4F
2F:5F
etc..
And etc

If the files can be sorted (and duplicate lines dropped), this could work:
printf "%s\n" $(eval "echo {$(sort -u file.txt | paste -sd, -)}:{$(sort -u file2.txt | head -2 | paste -sd, -)}") | sort -u
It uses bash brace expansion to generate the combinations, like:
$ echo {a,b}{X,Y,Z}
aX aY aZ bX bY bZ
Also, you must ABSOLUTELY trust the content of the files, because the code uses the dangerous eval.
The argument to head (here head -2) limits the count of lines taken from file2.txt.
The code produces (by limiting the second file to 2 already sorted lines) the following:
20160702F:20160702F
20160702F:20160714F
20160714F:20160702F
20160714F:20160714F
20160807F:20160702F
20160807F:20160714F
20160819F:20160702F
20160819F:20160714F
20160831F:20160702F
20160831F:20160714F
20160912F:20160702F
20160912F:20160714F
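If eval is a concern, the same pairs can be produced with two nested read loops. This is a sketch under assumptions: the sample files, the limit variable and the pairs variable are hypothetical, and it reproduces the approach above (every line of the first file paired with the first N sorted lines of the second), not the sliding selection shown in the question.

```bash
#!/bin/bash
# Hypothetical sample inputs in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 1F 2F 3F > "$dir/file.txt"
printf '%s\n' 1F 2F 3F 4F 5F > "$dir/file2.txt"
limit=2   # how many lines to take from the second file

pairs=$(
    while IFS= read -r a; do
        while IFS= read -r b; do
            printf '%s:%s\n' "$a" "$b"
        done < <(sort -u "$dir/file2.txt" | head -n "$limit")
    done < <(sort -u "$dir/file.txt")
)

printf '%s\n' "$pairs"
rm -r "$dir"
```

Because the file contents are only ever used as data, nothing in them gets executed. Adding a test such as [[ $a == "$b" ]] && continue in the inner loop would skip pairs like 1F:1F.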

Conversion hex string into ascii in bash command line

I have a lot of strings like this one and want a command to convert them to ASCII. I tried echo -e and od, but they did not work.
0xA7.0x9B.0x46.0x8D.0x1E.0x52.0xA7.0x9B.0x7B.0x31.0xD2

This worked for me.
$ echo 54657374696e67203120322033 | xxd -r -p
Testing 1 2 3$
-r tells it to convert hex to ascii as opposed to its normal mode of doing the opposite
-p tells it to use a plain format.

This code will convert the text 0xA7.0x9B.0x46.0x8D.0x1E.0x52.0xA7.0x9B.0x7B.0x31.0xD2 into a stream of 11 bytes with equivalent values. These bytes will be written to standard out.
TESTDATA=$(echo '0xA7.0x9B.0x46.0x8D.0x1E.0x52.0xA7.0x9B.0x7B.0x31.0xD2' | tr '.' ' ')
for c in $TESTDATA; do
echo $c | xxd -r
done
As others have pointed out, this will not result in a printable ASCII string, for the simple reason that the specified bytes are not ASCII. You need to post more information about how you obtained this string for us to help you with that.
How it works: xxd -r translates hexadecimal data to binary (like a reverse hexdump). xxd requires that each line start off with the index number of the first character on the line (run hexdump on something and see how each line starts off with an index number). In our case we want that number to always be zero, since each execution only has one line. As luck would have it, our data already has zeros before every character as part of the 0x notation. The lower case x is ignored by xxd, so all we have to do is pipe each 0xhh character to xxd and let it do the work.
The tr translates periods to spaces so that for will split it up correctly.
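A variant of the same idea that avoids xxd is sketched below. It assumes bash (for the array read and the printf builtin's \xHH escapes); the names tokens, emit_bytes and dump are only illustrative. Since the bytes are not printable text, the sketch inspects them with od instead of printing them raw.

```bash
#!/bin/bash
hex='0xA7.0x9B.0x46.0x8D.0x1E.0x52.0xA7.0x9B.0x7B.0x31.0xD2'

# Split on the dots into an array of 0xHH tokens.
IFS=. read -r -a tokens <<< "$hex"

# Emit one raw byte per token; ${t#0x} drops the leading 0x.
emit_bytes() {
    local t
    for t in "${tokens[@]}"; do
        printf "\\x${t#0x}"
    done
}

# Dump the resulting bytes as hex to confirm what was written.
dump=$(emit_bytes | od -An -v -tx1 | tr -d ' \n')
echo "$dump"
```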

You can use xxd:
$ cat hex.txt
68 65 6c 6c 6f
$ cat hex.txt | xxd -r -p
hello

You can use something like this.
$ cat test_file.txt
54 68 69 73 20 69 73 20 74 65 78 74 20 64 61 74 61 2e 0a 4f 6e 65 20 6d 6f 72 65 20 6c 69 6e 65 20 6f 66 20 74 65 73 74 20 64 61 74 61 2e
$ for c in `cat test_file.txt`; do printf "\x$c"; done;
This is text data.
One more line of test data.

The values you provided can be treated as Unicode code points. Put them in an array:
declare -a ARR=(0xA7 0x9B 0x46 0x8D 0x1E 0x52 0xA7 0x9B 0x7B 0x31 0xD2)
Then print the character for each value:
for ((n=0; n < ${#ARR[*]}; n++)); do echo -e "\u${ARR[$n]//0x/}"; done
The output contains a few printable characters and some non-printable ones.
For converting hex values to plaintext using the echo command:
echo -e "\x<hex value here>"
And for converting Unicode code points (given in hex) to plaintext using the echo command:
echo -e "\u<code point here>"
And then for converting octal to plaintext using the echo command:
echo -e "\0<octal value here>"
When you have encoding values you aren't familiar with, take the time to check out the ranges in the common encoding schemes to determine what encoding a value belongs to. Then conversion from there is a snap.
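A quick way to try the three escape forms is to aim them all at the same known character. The sketch below assumes bash's builtin echo (the \u form needs a reasonably recent bash) and uses the letter A, which is hex 41, octal 101 and code point U+0041. The variable names are made up for the example.

```bash
#!/bin/bash
# Each form should produce the letter A.
from_hex=$(echo -e "\x41")       # hexadecimal byte value
from_unicode=$(echo -e "\u0041") # Unicode code point
from_octal=$(echo -e "\0101")    # octal byte value

echo "$from_hex $from_unicode $from_octal"
```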

The echo -e must have been failing for you because of wrong escaping.
The following code works fine for me on a similar output from your_program with arguments:
echo -e $(your_program with arguments | sed -e 's/0x\(..\)\.\?/\\x\1/g')
Please note however that your original hexstring consists of non-printable characters.

Make a script like this:
#!/bin/bash
echo $((0x$1)).$((0x$2)).$((0x$3)).$((0x$4))
Example:
bash converthextoip.sh c0 a8 00 0b
Result:
192.168.0.11

bash: cat the first lines of a file & get position

I have a very big file that contains n lines of text (with n < 1000) at the beginning, then an empty line, and then lots of untyped binary data.
I would like to extract the first n lines of text, and then somehow extract the exact offset of the binary data.
Extracting the first lines is simple, but how can I get the offset? bash is not encoding aware, so just counting up the number of characters is senseless.

grep has an option -b to output the byte offset.
Example:
$ hexdump -C foo
00000000 66 6f 6f 0a 0a 62 61 72 0a |foo..bar.|
00000009
$ grep -b "^$" foo
4:
$ hexdump -s 5 -C foo
00000005 62 61 72 0a |bar.|
00000009
In the last step I used 5 instead of 4 to skip the newline.
Also works with umlauts (äöü) in the file.
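Putting the offset to use, a sketch along these lines splits the file into its text header and the data that follows. The sample file is hypothetical and deliberately tiny, the -a and -m options assume GNU grep, and the variable names are only for illustration.

```bash
#!/bin/bash
# Hypothetical sample: a text header, an empty line, then the "binary" part.
f=$(mktemp)
printf 'foo\n\nbar\n' > "$f"

# -b prints "offset:line", -m1 stops at the first empty line, and -a keeps
# grep from refusing to print matches for a binary file.
offset=$(grep -a -b -m1 '^$' "$f" | cut -d: -f1)

# The empty line is a single newline byte, so the data starts one byte later.
data_start=$((offset + 1))

# Everything before the empty line is the header.
header=$(head -c "$offset" "$f")

# tail -c +N counts bytes from 1, hence the extra +1. The sample data is
# text, so a variable works here; real binary data belongs in a file.
payload=$(tail -c +"$((data_start + 1))" "$f")

echo "offset=$offset data_start=$data_start"
rm -f "$f"
```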

Use grep to find the empty line
grep -n "^$" your_file | tr -d ':'
Optionally use tail -n 1 if you want the last empty line (that is, if the top part of the file can contain empty lines before the binary stuff starts).
Use head to get the top part of the file.
head -n "$num" your_file
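The two steps can be combined roughly as follows. This is a sketch with a hypothetical sample file; it uses cut instead of tr to pick out the line number, stops at the first empty line with -m1 (a GNU grep option, as is -a), and subtracts one so that the empty line itself is not part of the extracted text.

```bash
#!/bin/bash
# Hypothetical sample: two text lines, an empty line, then other data.
f=$(mktemp)
printf 'line one\nline two\n\nbinary part\n' > "$f"

# Line number of the first empty line.
num=$(grep -a -n -m1 '^$' "$f" | cut -d: -f1)

# Keep only the lines before the empty one.
text=$(head -n "$((num - 1))" "$f")

printf '%s\n' "$text"
rm -f "$f"
```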

You might want to use tools like hexdump or od to retrieve binary offsets instead of bash.

Perl can tell you where you are in a file:
pos=$( perl -le '
open $fh, "<", $ARGV[0];
$/ = ""; # read the file in "paragraphs"
$first_paragraph = <$fh>;
print tell($fh)
' filename )
Parenthetically, I was attempting to one-liner this
pos=$( perl -00 -lne 'if ($. == 2) {print tell(___what?___); exit}' filename )
What is the "current filehandle" variable? I couldn't find it in the docs.
