I'm trying to read a file line by line in bash.
Every line has the format text|number.
I want to produce a file of the form text,text,text and so on, i.e. just the text from each line of the original file, separated by commas.
Here is what I've tried, but I couldn't get it to work:
FILENAME=$1
OLD_IFS=$IFS
IFS=$'\n'
i=0
for line in $(cat "$FILENAME"); do
array=(`echo $line | sed -e 's/|/,/g'`)
echo ${array[0]}
i=i+1;
done
IFS=$OLD_IFS
But this prints both the text and the number, just in a different format: text number
Here is sample input:
dsadadq-2321dsad-dasdas|4212
dsadadq-2321dsad-d22as|4322
Here is the desired output:
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
What did I do wrong?
Not pure bash, but you could do this in awk:
awk -F'|' 'NR>1{printf(",")} {printf("%s",$1)}'
Alternately, in pure bash and without having to strip the final comma:
#!/bin/bash
# You can get your input from somewhere else if you like. Even stdin to the script.
input=$'dsadadq-2321dsad-dasdas|4212\ndsadadq-2321dsad-d22as|4322\n'
# Output should be reset to empty, for safety.
output=""
# Step through our input. (I don't know your column names.)
while IFS='|' read -r left right; do
# Only add a field if it exists. Salt to taste.
if [[ -n "$left" ]]; then
# Append data to output string
output="${output:+$output,}$left"
fi
done <<< "$input"
echo "$output"
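The ${output:+$output,} expansion is what avoids the trailing comma: it expands to the current value plus a comma only when output is already non-empty, and to nothing otherwise. A minimal sketch of that idiom in isolation (the item values are made up for illustration):

```bash
#!/bin/bash
# Build a comma-separated list without a leading or trailing comma.
output=""
for item in alpha beta gamma; do
    # First pass: output is empty, so the ${...:+...} part expands to nothing.
    # Later passes: it expands to "<current output>,".
    output="${output:+$output,}$item"
done
echo "$output"
```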
No need for arrays and sed:
while IFS='' read line ; do
echo -n "${line%|*}",
done < "$FILENAME"
You just have to remove the last comma :-)
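One way to drop that last comma, sketched under the assumption that the whole result fits comfortably in a variable. join_first_field is a hypothetical helper name, not something from the answer above:

```bash
#!/bin/bash
# join_first_field FILE
# Print the part of every line before its last "|", joined by commas.
join_first_field() {
    local line out=
    # "|| [[ -n $line ]]" also picks up a final line that lacks a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        out+="${line%|*},"
    done < "$1"
    # Strip the single trailing comma left by the loop.
    printf '%s\n' "${out%,}"
}

# Demo on the sample input from the question.
sample=$(mktemp)
printf '%s\n' 'dsadadq-2321dsad-dasdas|4212' 'dsadadq-2321dsad-d22as|4322' > "$sample"
result=$(join_first_field "$sample")
echo "$result"
rm -f "$sample"
```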
Using sed:
$ sed ':a;N;$!ba;s/|[0-9]*\n*/,/g;s/,$//' file
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
Alternatively, here is a bit more readable sed with tr:
$ sed 's/|.*$/,/g' file | tr -d '\n' | sed 's/,$//'
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
Choroba has the best answer (imho) except that it does not handle blank lines and it adds a trailing comma. Also, mucking with IFS is unnecessary.
This is a modification of his answer that solves those problems:
while read line ; do
if [ -n "$line" ]; then
if [ -n "$afterfirst" ]; then echo -n ,; fi
afterfirst=1
echo -n "${line%|*}"
fi
done < "$FILENAME"
The first if just filters out blank lines. The second if and the $afterfirst variable prevent the extra comma: a comma is echoed before every entry except the first one. ${line%|*} is a bash parameter expansion that deletes the end of a parameter's value if it matches a pattern. line is the parameter, % indicates that a trailing pattern should be deleted, and |* is the pattern to delete.
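To see how the trailing (%) and leading (#) forms behave, and how the shortest-match and longest-match variants differ, here is a small sketch. The a|b|c value is invented just to have more than one | in it:

```bash
#!/bin/bash
line='dsadadq-2321dsad-dasdas|4212'
text=${line%|*}        # remove the shortest trailing match of "|*"
number=${line##*|}     # remove the longest leading match of "*|"

multi='a|b|c'
shortest=${multi%|*}   # only the last "|c" is removed
longest=${multi%%|*}   # everything from the first "|" is removed

printf '%s\n' "$text" "$number" "$shortest" "$longest"
```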
I am attempting to return the line number of lines that have a break. An input example:
2938
383

3938
3

383
33333
But my script is not working and I can't see why. My script:
input="./input.txt"
declare -i count=0
while IFS= read -r line;
do
((count++))
if [ "$line" == $'\n\n' ]; then
echo "$count"
fi
done < "$input"
So I would expect, 3, 6 as output.
I just receive a blank response in the terminal when I execute. So there isn't a syntax error, something else is wrong with the approach I am taking. Bit stumped and grateful for any pointers..
Also "just use awk" doesn't help me. I need this structure for additional conditions (this is just a preliminary test) and I don't know awk syntax.
The issue is that "$line" == $'\n\n' can never match: read strips the trailing newline, so an empty input line arrives as an empty string. Instead, you can match an empty line with the regex pattern ^$:
if [[ "$line" =~ ^$ ]]; then
Now it should work.
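For reference, a sketch of the whole loop with that fix applied, keeping the question's structure so further conditions can be added later. blank_line_numbers is a hypothetical wrapper name, and the sample file is recreated from the question's input on the assumption that lines 3 and 6 are the empty ones:

```bash
#!/bin/bash
# blank_line_numbers FILE
# Print the line number of every empty line in FILE.
blank_line_numbers() {
    local line
    local -i count=0
    while IFS= read -r line || [[ -n $line ]]; do
        count=$((count + 1))
        # read has already removed the newline, so an empty line is "".
        if [[ $line =~ ^$ ]]; then
            echo "$count"
        fi
    done < "$1"
}

# Demo: eight lines, with empty ones at positions 3 and 6.
sample=$(mktemp)
printf '%s\n' 2938 383 '' 3938 3 '' 383 33333 > "$sample"
result=$(blank_line_numbers "$sample")
echo "$result"
rm -f "$sample"
```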
It's also much easier with the awk command:
$ awk '$0 == ""{ print NR }' test.txt
3
6
As Roman suggested, line read by read terminates with a delimiter, and that delimiter would not show up in the line the way you're testing for.
If the pattern you are searching for looks like an empty line (which I infer is how a "double newline" always manifests), then you can just test for that:
while read -r; do
((count++))
if [[ -z "$REPLY" ]]; then
echo "$count"
fi
done < "$input"
Note that IFS is for field-splitting data on lines, and since we're only interested in empty lines, IFS is moot.
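One subtlety worth checking if lines may consist only of spaces or tabs: with no variable name, read -r keeps the line verbatim in REPLY, whereas naming a variable under the default IFS trims leading and trailing whitespace. A quick sketch (the padded sample string is made up):

```bash
#!/bin/bash
sample='  padded  '

# No variable name: the line lands in REPLY unmodified.
read -r <<< "$sample"
via_reply=$REPLY

# Named variable, default IFS: surrounding whitespace is trimmed.
read -r trimmed <<< "$sample"

# Named variable, empty IFS: whitespace is preserved again.
IFS= read -r verbatim <<< "$sample"

printf '[%s]\n' "$via_reply" "$trimmed" "$verbatim"
```

So with the REPLY form above, a whitespace-only line is not reported as empty.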
Or if the file is small enough to fit in memory and you want something faster:
mapfile -t -O1 foo < i
declare -p foo
for n in "${!foo[@]}"; do
if [[ -z "${foo[$n]}" ]]; then
echo "$n"
fi
done
Reading the file all at once (mapfile) then stepping through an array may be easier on resources than stepping through a file line by line.
You can also just use GNU awk:
gawk -v RS= -F '\n' '{ print (i += NF); i += length(RT) - 1 }' input.txt
Using FS = ".+" ensures that line numbers are printed only for truly zero-length lines (i.e. $0 == ""), while rows consisting entirely of [[:space:]] characters are skipped. The {m,g,n}awk below is shorthand for mawk, gawk or nawk:
echo '2938
383

3938
3

383
33333' |
{m,g,n}awk -F'.+' '!NF && $!NF = NR'
3
6
This sed one-liner should do the job at once:
sed -n '/^$/=' input.txt
Simply writes the current line number (the = command) if the line read is empty (the /^$/ matches the empty line).
I can't figure out why my script is not displaying the string separated by white space.
This is my code:
While read -r row
do
line = ($row)
for word in $line
do
echo ${word[0]}
done
done < $1
Say the line is "add $s0 $s0 $t1".
I want the output to be "add".
While read -r row
This will try to run a command called While, you'll probably get an error for that. The shell keyword is while.
do
line = ($row)
This will try to run a command called line. Some systems do ship a small utility by that name (it reads one line from standard input), but that is probably not what you want. Assignments in the shell must not have whitespace around the equals sign.
If that assignment worked, it would make an array called line.
for word in $line
Referencing the array just by name expands to the first item of it, so the loop is useless here.
do
echo ${word[0]}
And here, indexing is not very useful since word is going to be a single value, not an array.
I suspect what you want is this:
while read -r row ; do
words=($row);
echo "${words[0]}"
done
Though if $row contains glob characters like *, they'll be expanded to matching filenames.
This would be better:
read -r -a words
echo "${words[0]}"
or simply
read -r line
echo "${line%% *}" # remove everything after the first space
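To make the globbing caveat concrete, here is a sketch that runs in a scratch directory. The file names a.txt and b.txt and the sample row are invented for the demonstration:

```bash
#!/bin/bash
# Work in a throwaway directory so the glob has something to match.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt"
start_dir=$PWD
cd "$workdir" || exit 1

row='* add $s0'

unsafe=($row)                # unquoted: split on whitespace, then glob-expanded
read -r -a safe <<< "$row"   # split on IFS only, no filename expansion

echo "unsafe: ${unsafe[*]}"
echo "safe:   ${safe[*]}"

cd "$start_dir" || exit 1
rm -rf "$workdir"
```

With the array assignment the * is replaced by the file names, while read -a leaves it as a literal first word.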
This works fine:
while read -r row
do
echo "$row" | awk '{print $1}'
done
Here, while read -r row reads each line of standard input into the variable row, and awk '{print $1}' prints only the first word of that line.
Do you want each token on a separate line? Why not just use sed?
$ echo "1 2 3 hi" | sed -r 's/[ \t]+/\n/g'
1
2
3
hi
If you want the first word of each line, then:
$ echo "1 2 3 hi" | sed -r 's/^([^ \t]+).+/\1/'
1
If it's a file, then remove "echo ... | " and just give the filename as a parameter to sed:
$ sed -r 's/^([^ \t]+).+/\1/' file.txt
I have a file in which I want to add a * character on a specific line, at a specific position in that line.
Is that possible?
Thank you
You can use an external tool such as sed or awk to manipulate the data. You can run the tool directly from the command line or include it in your bash script.
Example:
$ a="This is a test program that will print
Hello World!
Test programm Finished"
$ sed -E '2s/(.{4})/&\*/' <<<"$a" #Or <file
#Output:
This is a test program that will print
Hell*o World!
Test programm Finished
In the test above, we insert an asterisk after the 4th character of line 2.
If you want to operate on a file and change it in place, use sed -E -i '....'
The same result can also be achieved with GNU awk:
awk 'BEGIN{OFS=FS=""}NR==2{sub(/./,"&*",$4)}1' <<<"$a"
In pure bash you can achieve above output with something like this:
while read -r line;do
let ++c
[[ $c == 2 ]] && printf '%s*%s\n' "${line:0:4}" "${line:4}" || printf '%s\n' "${line}"
# alternative:
# [[ $c == 2 ]] && echo "${line:0:4}*${line:4}" || echo "${line}"
done <<<"$a"
#Alternative for file read:
# done <file >newfile
If your variable is just a single line, you don't need the loop. You can do it directly like:
printf '%s*%s\n' "${a:0:4}" "${a:4}"
# Or even
printf '%s\n' "${a:0:4}*${a:4}" #or echo "${a:0:4}*${a:4}"
I suggest using sed. If you want to insert an asterisk on the 2nd line, after the 5th character:
sed -r "2s/^(.{5})(.*)$/\1*\2/" myfile.txt
2s says you are going to perform a substitution on the 2nd line. ^(.{5})(.*)$ says you are taking 5 characters from the beginning of the line and all characters after it. \1*\2 says you are building the string from the first match (i.e. 5 beginning characters) then a * then the second match (i.e. characters until the end of the line).
If your line and column are in variables, you can do something like this:
_line=5
_column=2
sed -r "${_line}s/^(.{${_column}})(.*)$/\1*\2/" myfile.txt
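If sed is not an option, the same line/offset idea can be sketched in pure bash. insert_at is a hypothetical helper, and the sample text is borrowed from the earlier example in this thread:

```bash
#!/bin/bash
# insert_at FILE LINE_NO OFFSET TEXT
# Print FILE with TEXT inserted after OFFSET characters of line LINE_NO.
insert_at() {
    local file=$1 target=$2 offset=$3 text=$4 line
    local -i n=0
    while IFS= read -r line || [[ -n $line ]]; do
        n=$((n + 1))
        if (( n == target )); then
            # Split the line at OFFSET and put TEXT in between.
            printf '%s%s%s\n' "${line:0:offset}" "$text" "${line:offset}"
        else
            printf '%s\n' "$line"
        fi
    done < "$file"
}

# Demo: asterisk after the 4th character of line 2.
sample=$(mktemp)
printf '%s\n' 'This is a test program that will print' 'Hello World!' 'Test programm Finished' > "$sample"
result=$(insert_at "$sample" 2 4 '*')
echo "$result"
rm -f "$sample"
```

Redirect the output to a new file and move it over the original if the change should be permanent.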
What I have is a file (let's call it 'xfile'), containing lines such as
file1 <- this line goes to file1
file2 <- this goes to file2
and what I want to do is run a script that does the work of actually taking the lines and writing them into the file.
The way I would do that manually could be like the following (for the first line)
(echo "this line goes to file1"; echo) >> file1
So, to automate it, this is what I tried to do
IFS=$'\n'
for l in $(grep '[a-z]* <- .*' xfile); do
$(echo $l | sed -e 's/\([a-z]*\) <- \(.*\)/(echo "\2"; echo)\>\>\1/g')
done
unset IFS
But what I get is
-bash: file1(echo "this content goes to file1"; echo)>>: command not found
-bash: file2(echo "this goes to file2"; echo)>>: command not found
(on OS X)
What's wrong?
This solves your problem on Linux
awk -F ' <- ' '{print $2 >> $1}' xfile
Take care to choose the field separator so that the new files do not end up with leading or trailing spaces.
Give this a try on OSX
You can use the regex capabilities of bash directly. When you use the =~ operator to compare a variable to a regular expression, bash populates the BASH_REMATCH array with matches from the groups in the regex.
re='(.*) <- (.*)'
while read -r; do
if [[ $REPLY =~ $re ]]; then
file=${BASH_REMATCH[1]}
line=${BASH_REMATCH[2]}
printf '%s\n' "$line" >> "$file"
fi
done < xfile
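To see what ends up in BASH_REMATCH, here is a sketch on a single string. The entry variable is just a stand-in for one line of xfile:

```bash
#!/bin/bash
re='(.*) <- (.*)'
entry='file1 <- this line goes to file1'

if [[ $entry =~ $re ]]; then
    whole=${BASH_REMATCH[0]}     # the entire matched text
    target=${BASH_REMATCH[1]}    # first group: the file name
    content=${BASH_REMATCH[2]}   # second group: the text to append
    printf 'target=%s\ncontent=%s\n' "$target" "$content"
fi
```

Keeping the regex in a variable (re) and leaving $re unquoted on the right-hand side of =~ is what makes bash treat it as a pattern rather than a literal string.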
I am trying to remove newlines from a file. My file is like this (it contains backward slashes):
line1\|
line2\|
I am using the following script to remove newlines:
#!/bin/bash
INPUT="file1"
while read line
do
echo -n $line
done < $INPUT
I get the following output:
line1|line2|
It removes the backslashes. How can I retain those backslashes?
The -r option to read prevents backslash processing of the input.
while read -r line
do
echo -n "$line"
done < $INPUT
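The difference is easy to check on a single string. raw below is a stand-in for one line of the file:

```bash
#!/bin/bash
raw='line1\|'

# Without -r, the backslash is treated as an escape character and dropped.
read without_r <<< "$raw"

# With -r, the backslash is kept literally.
read -r with_r <<< "$raw"

printf '%s\n' "$without_r" "$with_r"
```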
But if you just want to remove all newlines from the input, the tr command would be better:
tr -d '\n' < $INPUT
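With sed, note that a plain sed 's/\n//' /path/to/file does nothing, because sed reads one line at a time and the pattern space never contains the trailing newline. Join the lines first, for example with GNU sed: sed ':a;N;$!ba;s/\n//g' /path/to/file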