Getting rid of some special symbol while reading from a file - bash

I am writing a small script that reads some configuration options from a settings file with a certain format (option=value or option=value1 value2 ...).
settings-file:
SomeOption=asdf
IFS=HDMI1 HDMI2 VGA1 DP1
SomeOtherOption=ghjk
Script:
for VALUE in $(cat settings | grep IFS | sed 's/.*=\(.*\)/\1/'); do
    echo "$VALUE"x
done
Now I get the following output:
HDMI1x
HDMI2x
VGA1x
xP1
Expected output:
HDMI1x
HDMI2x
VGA1x
DP1x
I obviously can't use the data like this, since the last entry read is mangled somehow. What is going on, and how do I stop it from happening?
Regards

Generally you can use awk like this:
awk -F'[= ]' '$1=="IFS"{for(i=2;i<=NF;i++)print $i"x"}' settings
-F'[= ]' splits the line on = or space. The awk program then checks whether the first field (the variable name) equals IFS, and if so iterates through fields 2 to NF and prints each one.
However, in comments you said that the file is using Windows line endings. In this case you need to pre-process the file before using awk. You can use tr to remove the carriage return symbols:
tr -d '\r' < settings | awk -F'[= ]' '$1=="IFS"{for(i=2;i<=NF;i++)print $i"x"}'

The reason is likely that your settings file uses DOS line endings.
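A quick way to confirm this (a sketch, assuming GNU cat or any POSIX od is available) is to make the carriage returns visible:
cat -A settings       # GNU cat: CRLF lines end in ^M$
od -c settings | head # portable: CRLF shows up as \r \n pairs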
Once you've fixed that (with dos2unix for example), your loop can also be modified to the following, removing two utility invocations:
for value in $( sed -n -e 's/^IFS.*=\(.*\)/\1/p' settings ); do
    echo "$value"x
done
Or you can do it all in one go, removing the need to modify the settings file at all:
tr -d '\r' <settings |
for value in $( sed -n -e 's/^IFS.*=\(.*\)/\1/p' ); do
    echo "$value"x
done
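As an alternative sketch, if you would rather not rely on word-splitting a command substitution at all, the same loop can be fed through a pipe (still assuming the IFS line holds space-separated values):
tr -d '\r' <settings | sed -n 's/^IFS.*=\(.*\)/\1/p' | tr ' ' '\n' |
while IFS= read -r value; do
    echo "$value"x
done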

Related

How to grab the value of a variable that is tabbed using bash

I have a golang file named ver.go consisting of one constant.
const (
	ver = "1.1.1"
)
I want to be able to output 1.1.1 using a bash command. I am able to do this without a problem if I get rid of the tab at the beginning and the spaces, like so:
const (
ver="1.1.1"
)
by using this command: awk -F= '/^ver=/{print $2}' ver.go | sed -e 's/^"//' -e 's/"$//'
However, since it must be formatted properly with gofmt, I can't seem to figure it out with the tab in there, as well as the space after the equal sign.
Any help is appreciated.
You may use this awk:
awk -F= '$1 ~ /^[[:blank:]]*ver/{gsub(/["[:blank:]]+/, ""); print $2}' file
1.1.1
A simple solution if you just want "1.1.1" from the file ver.go is to let sed do the work for you. You can use the normal substitution form with a single backreference to capture the "1.1.1" and reinsert it as the replacement for the entire line. Use sed -n to suppress the normal printing of the pattern space and add a p after the substitution, so that only the line matching your REGEX is printed following a successful substitution, e.g.
sed -n 's/^[[:space:]]*ver[[:space:]]*=[[:space:]]*"\([^"][^"]*\).*$/\1/p' ver.go
This will work with or without the spaces (or tabs) before the "ver".
Example Use/Output
With your ver.go contents, you would get:
$ sed -n 's/^[[:space:]]*ver[[:space:]]*=[[:space:]]*"\([^"][^"]*\).*$/\1/p' ver.go
1.1.1
If I misunderstood what you are after, please let me know. If you have further questions, just drop a comment below.
With bash and Parameter Expansion to remove all ":
while read -r key x value; do [[ "$key" == "ver" ]] && echo "${value//\"/}"; done < ver.go
Output:
1.1.1
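To illustrate what the expansion is doing, here is the same idea on a single hand-written line (an illustrative sketch):
line=$'\tver = "1.1.1"'          # the gofmt-style line, leading tab included
read -r key x value <<< "$line"  # read strips the leading tab; key=ver, x==, value="1.1.1"
echo "${value//\"/}"             # removes every " and prints 1.1.1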

Unable to get sed to replace commas with a word in my CSV

Hello, I am using bash to create a CSV file by extracting data from an html file using grep. The problem is that after getting the data and then using sed to take out the , in it and put in a word like My_com, it goes crazy on me. Here is my code.
time=$(grep -oP 'data-context-item-time=.*.data-context-item-views' index.html \
| cut -d'"' -f2)
title=$(grep -oP 'data-context-item-title=.*.data-context-item-id' index.html |\
cut -d'"' -f2)
sed "s/,/\My_commoms/g" $title
echo "$user,$views,$time,$title" >> test
I keep getting this error
sed: can't read Flipping: No such file or directory
sed: can't read the: No such file or directory
and so on
Any advice on what's wrong with my code?
You can't use sed on text directly on the command line like that; sed expects a file, so it is reading your text as a file name. Try this for your second to last line:
echo $title | sed 's/,/My_com/g'
that way sed reads the text as if from a file (stdin in this case). Also note that I've used single quotes in the argument to sed; in this case I don't think it will make any difference, but in general it is good practice to make sure bash doesn't alter the command at all.
If you don't want to use the echo | sed chain, you might also be able to rewrite it like this:
sed 's/,/My_com/g' <<< "$title"
I think that only works in bash, not dash etc. This is called a 'here-string', and bash passes the stuff on the right of the <<< to the command on its stdin, so you get the same effect.
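For example, with an illustrative value in $title (chosen to match the error messages above):
title='Flipping, the, Bird'
sed 's/,/My_com/g' <<< "$title"   # prints: FlippingMy_com theMy_com Bird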

Counting commas in a line in bash

Sometimes I receive a CSV file which has a carriage return inside a cell. This is not an acceptable format to a program that will use it as input.
In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?
Strip everything but the commas, and then count number of characters left:
$ echo foo,bar,baz | tr -cd , | wc -c
2
To count the number of times a comma appears, you can use something like awk:
string="line of input from CSV file"
echo "$string" | awk -F "," '{print NF-1}'
But this really isn't sufficient to determine whether a field has carriage returns in it. Fields can have commas inside as long as they're surrounded by quotes.
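If quoted fields matter and GNU awk is available, FPAT lets you describe what a field looks like rather than what separates fields, so commas inside quotes are not counted (a sketch assuming gawk):
echo 'a,"b,c",d' | gawk -v FPAT='([^,]*)|("[^"]*")' '{print NF-1}'   # prints 2, not 3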
What worked for me better than the other solutions was this. If test.txt has:
foo,bar,baz
baz,foo,foobar,bar
Then cat test.txt | xargs -I % sh -c 'echo % | tr -cd , | wc -c' produces
2
3
This works very well for streaming sources, or tailing logs, etc.
In pure Bash:
while IFS=, read -ra array
do
    echo "$(( ${#array[@]} - 1 ))"
done < inputfile
or
while read -r line
do
    count=${line//[^,]}
    echo "${#count}"
done < inputfile
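With the test.txt shown earlier (foo,bar,baz and baz,foo,foobar,bar) as inputfile, either loop prints:
2
3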
Try Perl:
$ perl -ne 'print 0+@{[/,/g]},"\n"'
a
0
a,a
1
a,a,a,a,a
4
Depending on what you are trying to do with the CSV data, it may be helpful to use a wrapper script like csvquote to temporarily replace the problematic newlines (and commas) inside quoted fields, then restore them. For instance:
csvquote inputfile.csv | wc -l
and
csvquote inputfile.csv | cut -d, -f1 | csvquote -u
may be the sort of thing you're looking for. See https://github.com/dbro/csvquote for the code and more information.
An example Python command you could run (since it's going to be installed on most modern shells) is:
python -c "import pathlib; print({l.count(',') for l in pathlib.Path('my_file.csv').read_text().splitlines()})"
This counts the number of commas per line, then makes a set from them (so if your lines all have the same number of commas, you'll get a set containing just that number).
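Since the stated goal is to make an existing bash script fail when the comma count is wrong, a small awk guard can be dropped in directly (a sketch; n stands in for your expected field count, i.e. commas + 1, and input.csv is a placeholder name):
awk -F, -v n=5 'NF != n { print "bad line " NR; bad=1 } END { exit bad }' input.csv || exit 1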
Just remove all of the carriage returns:
tr -d "\r" old_file > new_file

Get substring from file using "sed"

Can anyone help me to get substring using sed program?
I have a file with this line:
....
define("BASE", "empty"); # there can be any string (not only "empty").
....
And I need to get "empty" as string variable to my bash script.
At this moment I have:
sed -n '/define(\"BASE\"/p' path/to/file.ext
# returns this line:
# define("BASE", "empty");
# but I need 'empty'
UPD: Thanks to @Jaypal
For now I have bash script:
DBNAME=`sed -n '/define(\"BASE\"/p' path/to/file.ext`
echo $DBNAME | sed -r 's/.*"([a-zA-Z]+)".*/\1/'
It works OK, but is there any way to do the same manipulation with one line of code?
What you should use is
sed -n 's/.*\".*\", \"\(.*\)\".*/\1/p' yourFile.txt
which means something (.*) followed by something in quotes (\".*\"), then a comma and a blank space (, ), and then again something within quotes (\"\(.*\)\").
The escaped parentheses mark the part that you can reuse later, i.e. the string within the second pair of quotes; it is reused with \1.
I put -n in front (together with the p flag after the substitution) in order to answer the updated question, i.e. to output only the line that was manipulated.
This should help -
sed -r 's/.*"([a-zA-Z]+)"\);/\1/' path/to/file.ext
If you are ok with using awk then you can try the following -
awk -F\" '/define\(/{print $(NF-1)}' path/to/file.ext
Update:
DBNAME=$(sed -r '/define\(\"BASE\"/s/.*"([a-zA-Z]+)"\);/\1/' path/to/file.ext)
sed -nr '/^define.*"(.*)".*$/{s//\1/;p}' path/to/file.ext
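If your grep has PCRE support (-P, a GNU extension), \K can discard everything matched so far, which gives yet another one-liner (a sketch against the sample line):
grep -oP 'define\("BASE", *"\K[^"]+' path/to/file.ext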
If your file doesn't change over time (i.e. the line numbers will always be the same), you can take the line and use delimiters to cut your part out:
sed -n 'Xp' your.file | cut -d ' ' -f 2 | cut -d "\"" -f 2
assuming X is the line number of your required line.

Concise and portable "join" on the Unix command-line

How can I join multiple lines into one line, with a separator where the new-line characters were, and avoiding a trailing separator and, optionally, ignoring empty lines?
Example. Consider a text file, foo.txt, with three lines:
foo
bar
baz
The desired output is:
foo,bar,baz
The command I'm using now:
tr '\n' ',' <foo.txt |sed 's/,$//g'
Ideally it would be something like this:
cat foo.txt |join ,
What's:
the most portable, concise, readable way.
the most concise way using non-standard unix tools.
Of course I could write something, or just use an alias. But I'm interested to know the options.
Perhaps a little surprisingly, paste is a good way to do this:
paste -s -d","
This won't deal with the empty lines you mentioned. For that, pipe your text through grep, first:
grep -v '^$' | paste -s -d"," -
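For example, with the foo.txt from the question:
$ grep -v '^$' foo.txt | paste -s -d"," -
foo,bar,baz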
This sed one-liner should work -
sed -e :a -e 'N;s/\n/,/;ba' file
Test:
[jaypal:~/Temp] cat file
foo
bar
baz
[jaypal:~/Temp] sed -e :a -e 'N;s/\n/,/;ba' file
foo,bar,baz
To handle empty lines, you can remove the empty lines and pipe it to the above one-liner.
sed -e '/^$/d' file | sed -e :a -e 'N;s/\n/,/;ba'
How about using xargs?
For your case:
$ cat foo.txt | sed 's/$/, /' | xargs
Be careful about the input length limit of the xargs command. (This means a very long input file cannot be handled this way.)
Perl:
cat data.txt | perl -pe 'if(!eof){chomp;$_.=","}'
or yet shorter and faster, surprisingly:
cat data.txt | perl -pe 'if(!eof){s/\n/,/}'
or, if you want:
cat data.txt | perl -pe 's/\n/,/ unless eof'
Just for fun, here's an all-builtins solution
IFS=$'\n' read -r -d '' -a data < foo.txt ; ( IFS=, ; echo "${data[*]}" ; )
You can use printf instead of echo if the trailing newline is a problem.
This works by setting IFS, the delimiters that read will split on, to just newline and not other whitespace, then telling read to not stop reading until it reaches a nul (instead of the newline it usually uses) and to add each item read into the array (-a) data. Then, in a subshell, so as not to clobber the IFS of the interactive shell, we set IFS to , and expand the array with *, which delimits each item in the array with the first character of IFS.
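For instance, the printf variant would look like this (same assumptions as above):
IFS=$'\n' read -r -d '' -a data < foo.txt ; ( IFS=, ; printf '%s' "${data[*]}" ; )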
I needed to accomplish something similar, printing a comma-separated list of fields from a file, and was happy with piping STDOUT to xargs and ruby, like so:
cat data.txt | cut -f 16 -d ' ' | grep -o "\d\+" | xargs ruby -e "puts ARGV.join(', ')"
I had a log file where some data was broken into multiple lines. When this occurred, the last character of the first line was the semi-colon (;). I joined these lines by using the following commands:
for LINE in $(cat $FILE | tr -s " " "|")
do
    if [ $(echo $LINE | egrep ";$") ]
    then
        echo "$LINE\c" | tr -s "|" " " >> $MYFILE
    else
        echo "$LINE" | tr -s "|" " " >> $MYFILE
    fi
done
The result is a file where lines that were split in the log file were one line in my new file.
Simple way to join the lines with space in-place using ex (also ignoring blank lines), use:
ex +%j -cwq foo.txt
If you want to print the results to the standard output, try:
ex +%j +%p -scq! foo.txt
To join lines without spaces, use +%j! instead of +%j.
To use different delimiter, it's a bit more tricky:
ex +"g/^$/d" +"%s/\n/_/e" +%p -scq! foo.txt
where g/^$/d (or v/\S/d) removes blank lines and s/\n/_/ is a substitution which basically works the same as in sed, but applied to all lines (%). When parsing is done, print the buffer (%p). Finally, -cq! executes the vi q! command, which quits without saving (-s silences the output).
Please note that ex is equivalent to vi -e.
This method is quite portable, as most Linux/Unix systems ship with ex/vi by default. It's also more compatible than using sed, whose in-place parameter (-i) is not a standard extension, and the utility itself is more stream-oriented, therefore not as portable.
POSIX shell:
( set -- $(cat foo.txt) ; IFS=+ ; printf '%s\n' "$*" )
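For the example file this prints foo+bar+baz; the join character is simply whatever IFS is set to, so a comma-separated result would be:
( set -- $(cat foo.txt) ; IFS=, ; printf '%s\n' "$*" )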
My answer is:
awk '{printf "%s", ","$0}' foo.txt
printf is enough. We don't need -F"\n" to change the field separator.
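Note that this prints a leading comma (,foo,bar,baz for the example file) and no trailing newline. If that matters, a common variant keeps the separator empty until after the first line:
awk '{printf "%s%s", sep, $0; sep=","} END {print ""}' foo.txt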
