I am trying to loop through my file and grab the lines in groups of 2. Every entry in the file consists of a header line followed by a line containing the data.
I am trying to loop through the file, grab every two lines, and manipulate them. My current problem is echoing the next line within the loop: every time I hit a header row, I want to print the data line (the next line) along with it.
out="$(cat $1)" #file
file=${out}
iter=0
for line in $file;
do
if [ $((iter%2)) -eq 0 ];
then
#this will be true when it hits a header
echo $line
# I need to echo the next line here
fi
echo "space"
iter=$((iter+1))
done
Here is an example of a possible input file:
>fc11ba964421kjniwefkniojhsdeddb4_runid=65bedc43sdfsdfsdfsd76b7303_read=42_ch=459_start_time=2017-11-01T21:10:05Z <br>
TGAGCTATTATTATCGGCGACTATCTATCTACGACGACTCTAGCTACGACTATCGACTCGACTACSAGCTACTACGTACCGATC
>fd38df1sd6sdf9867345uh43tr8199_runid=65be1fasdfsdfgdsfg4376b7303_read=60_ch=424_start_time=2017-11-01T21:10:06Z <br>
TGAGCTATTATTATCGGCGACTATCTATCTACGACGACTCTAGCTACGACTATCGACTCGACTACSAGCTACTACGTACCGATC
>1d03jknsdfnjhdsf78sd89ds89cc17d_runid=65bedsdfsdfsdf03_read=24_ch=439_start_time=201711-01T21:09:43Z <br>
TGAGCTATTATTATCGGCGACTATCTATCTACGACGACTCTAGCTACGACTATCGACTCGACTACSAGCTACTACGTACCGATC
Header lines start with > and the data lines are the ones containing the sequence (TGAGC...).
EDIT:
For those asking about the output: based on the original question, I am trying to access the header and data together. Each header and its matching data will be processed 6 times. The end goal is to have each header and data pair:
>fc11ba964421kjniwe (original header)
GATATCTAGCTACTACTAT (original data)
translate to:
>F1_fc11ba964421kjniwe
ASNASDKLNASDHGASKNHDLK
>F2_fc11ba964421kjniwe
ASHGASKNHDLKNASDKLNASD
>F3_fc11ba964421kjniwe
KNHDLKNASDKLNASDASHGAS
>R1_fc11ba964421kjniwe
ASHGLKNASDKLNASDASKNHD
>R2_fc11ba964421kjniwe
AKNASDKLNASDSHGASKNHDL
>R3_fc11ba964421kjniwe
SKNHDLKNASDKASHGALNASD
and then the next header and data entry would generate another 6 lines
If you know your records each consist of exactly 2 lines, use the read command twice on each iteration of the while loop.
while IFS= read -r line1; IFS= read -r line2; do
    ...
done < "$1"
Your for line in $file notation cannot work: in bash, the text after in is a list of whitespace-split words, not an input file, so you would iterate word by word rather than line by line. What you're probably looking for is a while read loop that takes the file as standard input. Something like this:
while IFS= read -r header; do
    # We should be starting with a header.
    if [[ $header != ">"* ]]; then
        echo "ERROR: corrupt header: $header" >&2
        break
    fi
    # read the next line...
    IFS= read -r data
    printf '%s\n' "$data" >> data.out
done < "$file"
I don't know what output you're looking for, so I just made something up. This loop enforces the header position with the if statement and appends the data lines to an output file.
Of course, if you don't want this enforcement, you could simply:
grep -v '^>' "$file"
to return lines which are not headers.
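As an aside, if you only need the pairs joined rather than processed, a common idiom is paste reading standard input twice:

paste -d ' ' - - < "$file"

This prints each header and its data line joined by a single space.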
Related
I'm a beginner in bash and here is my problem. I have a file just like this one:
Azzzezzzezzzezzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I'm trying to edit this file in a script. The letters A, B, C are unique in this file and there is only one per line.
I want to replace the first e of each line with a number, which can be:
1 in line beginning with an A,
2 in line beginning with a B,
3 in line beginning with a C,
and I'd like to loop this in order to have this type of result
Azzz1zzz5zzz1zzz...
Bzzz2zzz4zzz5zzz...
Czzz3zzz6zzz3zzz...
All the numbers here are random integer variables between 0 and 9. I really need to start by replacing 1,2,3 on the first pass of my loop, then 5,4,6, then 1,5,3, and so on.
I tried this
sed "0,/e/s/e/$1/;0,/e/s/e/$2/;0,/e/s/e/$3/" /tmp/myfile
But the result was this (because I didn't specify the line)
Azzz1zzz2zzz3zzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I noticed that doing sed -i "/A/ s/$/ezzz/" /tmp/myfile adds ezzz at the end of the A line, so I tried this
sed -i "/A/ 0,/e/s/e/$1/;/B/ 0,/e/s/e/$2/;/C/ 0,/e/s/e/$3/" /tmp/myfile
but it failed
sed: -e expression #1, char 5: unknown command: `0'
Here I'm lost.
I have the number of e's in each A, B, or C line in a variable (let's call it number_of_e_per_line).
Thank you for taking the time to help me.
Just apply the s command to the line that matches A (note the double quotes, so that $1, $2, $3 expand):
sed "
/^A/{ s/e/$1/; }
/^B/{ s/e/$2/; }
# or shorter
/^C/s/e/$3/
" /tmp/myfile
The s command by default replaces the first occurrence. You can, for example, use s/e/$1/2 to replace the second occurrence, or s/e/$1/g (g for "global") to replace all occurrences.
0,/e/ specifies a range of lines: it selects lines from the first up to and including the first line that matches /e/.
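As a quick illustration of that range (note 0,/e/ is a GNU sed extension; the three-line input here is made up):

$ printf 'x\nfee\nfee\n' | sed '0,/e/s/e/E/'
x
fEe
fee

The substitution applies only up to the first matching line; later lines pass through unchanged.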
sed is not part of Bash. It is a separate (crude) programming language and is a very standard command. See https://www.grymoire.com/Unix/Sed.html .
Continuing from the comment: sed is a poor choice here unless all your files have exactly 3 lines. The reason is that sed processes each line separately and has no way to keep a running count of the occurrences of 'e' across lines.
Instead, wrapping sed in a script and keeping track of the replacements allows you to handle any file no matter the number of lines. You just loop and handle the lines one at a time, e.g.
#!/bin/bash

[ -z "$1" ] && {  ## validate one filename argument provided
    printf "error: filename argument required.\nusage: %s filename\n" "$0" >&2
    exit 1
}

[ -s "$1" ] || {  ## validate file exists and is non-empty
    printf "error: file not found or empty '%s'.\n" "$1" >&2
    exit 1
}

declare -i n=1  ## occurrence counter, initialized to 1

## loop reading each line
while read -r line || [ -n "$line" ]; do
    [[ $line =~ ^.*e.*$ ]] || continue  ## line has 'e', or get the next one
    sed "s/e/1/$n" <<< "$line"          ## substitute the n-th occurrence of 'e'
    ((n++))                             ## increment counter
done < "$1"
Your data file having "..." at the end of each line suggests your file is larger than the snippet posted. If you have lines beginning with 'A' - 'Z', you don't want to write 26 separate /match/s/find/replace/ substitutions. And if you have somewhere between 3 and 26 lines (or more), you don't want to rewrite a different sed expression for every new file you are faced with.
That's why I say sed is a poor choice: you really have no way to make this a generic task with sed. The downside of using a script is that it in turn becomes a poor choice as the number of records grows (over 100,000 or so, purely for efficiency reasons).
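If efficiency does become a concern, the same counter logic fits in a single awk process. This is only a sketch, not the approach above; unlike the script, it prints every line (modified or not), so its output can replace the file:

awk '{
    if (index($0, "e")) {       # line contains at least one e
        n++                     # one counter increment per matching line
        cnt = 0; out = ""
        for (i = 1; i <= length($0); i++) {
            ch = substr($0, i, 1)
            if (ch == "e" && ++cnt == n) ch = "1"
            out = out ch
        }
        $0 = out
    }
    print
}' file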
Example Use/Output
With the script in replace-e-incremental.sh and your data in file, you would do:
$ bash replace-e-incremental.sh file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...
To Modify file In-Place
Since you make multiple calls to sed here, you need to redirect the output of the script to a temporary file and then replace the original by overwriting it with the temp file, e.g.
$ bash replace-e-incremental.sh file > mytempfile && mv -f mytempfile file
$ cat file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...
I'm working on a long Bash script. I want to read cells from a CSV file into Bash variables. I can parse lines and the first column, but not any other column. Here's my code so far:
cat myfile.csv | while read line
do
    read -d, col1 col2 < <(echo $line)
    echo "I got:$col1|$col2"
done
It's only printing the first column. As an additional test, I tried the following:
read -d, x y < <(echo a,b,)
And $y is empty. So I tried:
read x y < <(echo a b)
And $y is b. Why?
You need to use IFS instead of -d:
while IFS=, read -r col1 col2
do
    echo "I got:$col1|$col2"
done < myfile.csv
To skip a given number of header lines:
skip_headers=3
while IFS=, read -r col1 col2
do
    if ((skip_headers))
    then
        ((skip_headers--))
    else
        echo "I got:$col1|$col2"
    fi
done < myfile.csv
Note that for general-purpose CSV parsing you should use a specialized tool which can handle quoted fields with internal commas, among other issues that Bash can't handle by itself. Examples of such tools are csvtool and csvkit.
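For example, with csvkit installed, you could extract just the first two columns while respecting quoted fields (a sketch; adjust the column numbers to your file):

csvcut -c 1,2 myfile.csv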
How to parse a CSV file in Bash?
Coming late to this question, and since Bash does offer new features: this question is about bash, and none of the already-posted answers show this powerful and standards-compliant way of doing precisely this.
Parsing CSV files under bash, using a loadable module
Conforming to RFC 4180, a string like this sample CSV row:
12,22.45,"Hello, ""man"".","A, b.",42
should be split as
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
Bash loadable C compiled modules.
Under bash, you can create, edit, and use loadable C-compiled modules. Once loaded, they work like any other builtin! (You can find more information in the bash source tree.)
The current source tree (Oct 15 2021, bash V5.1-rc3) contains a bunch of samples:
accept listen for and accept a remote network connection on a given port
asort Sort arrays in-place
basename Return non-directory portion of pathname.
cat cat(1) replacement with no options - the way cat was intended.
csv process one line of csv data and populate an indexed array.
dirname Return directory portion of pathname.
fdflags Change the flag associated with one of bash's open file descriptors.
finfo Print file info.
head Copy first part of files.
hello Obligatory "Hello World" / sample loadable.
...
tee Duplicate standard input.
template Example template for loadable builtin.
truefalse True and false builtins.
tty Return terminal name.
uname Print system information.
unlink Remove a directory entry.
whoami Print out username of current user.
There is a full working CSV parser ready to use in the examples/loadables directory: csv.c!
On Debian GNU/Linux based systems, you may have to install the bash-builtins package:
apt install bash-builtins
Using loadable bash-builtins:
Then:
enable -f /usr/lib/bash/csv csv
From there, you could use csv as a bash builtin.
With my sample: 12,22.45,"Hello, ""man"".","A, b.",42
csv -a myArray '12,22.45,"Hello, ""man"".","A, b.",42'
printf "%s\n" "${myArray[#]}" | cat -n
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
Then in a loop, processing a file.
while IFS= read -r line; do
    csv -a aVar "$line"
    printf "First two columns are: [ '%s' - '%s' ]\n" "${aVar[0]}" "${aVar[1]}"
done < myfile.csv
This way is clearly quicker and more robust than any other combination of bash builtins, or than forking to any external binary.
Unfortunately, depending on your system implementation, if your version of bash was compiled without loadable-builtin support, this may not work...
Complete sample with multiline CSV fields.
Conforming to RFC 4180, a string like this single CSV row:
12,22.45,"Hello ""man"",
This is a good day, today!","A, b.",42
should be split as
1 12
2 22.45
3 Hello "man",
This is a good day, today!
4 A, b.
5 42
Full sample script for parsing CSV containing multiline fields
Here is a small sample file with one header line, 4 columns, and 3 rows. Because two fields contain newlines, the file is 6 lines long.
Id,Name,Desc,Value
1234,Cpt1023,"Energy counter",34213
2343,Sns2123,"Temperatur sensor
to trigg for alarm",48.4
42,Eye1412,"Solar sensor ""Day /
Night""",12199.21
And a small script able to parse this file correctly:
#!/bin/bash

enable -f /usr/lib/bash/csv csv

file="sample.csv"
exec {FD}<"$file"

read -ru $FD line
csv -a headline "$line"
printf -v fieldfmt '%-8s: "%%q"\\n' "${headline[@]}"
numcols=${#headline[@]}

while read -ru $FD line; do
    while csv -a row "$line"; (( ${#row[@]} < numcols )); do
        read -ru $FD sline || break
        line+=$'\n'"$sline"
    done
    printf "$fieldfmt\\n" "${row[@]}"
done
This may render (I've used printf "%q" to represent non-printable characters, such as newlines, as $'\n'):
Id : "1234"
Name : "Cpt1023"
Desc : "Energy\ counter"
Value : "34213"
Id : "2343"
Name : "Sns2123"
Desc : "$'Temperatur sensor\nto trigg for alarm'"
Value : "48.4"
Id : "42"
Name : "Eye1412"
Desc : "$'Solar sensor "Day /\nNight"'"
Value : "12199.21"
You can find a full working sample here: csvsample.sh.txt or csvsample.sh.
Note:
In this sample, I use the header line to determine the row width (number of columns). If your header line could hold newlines (or if your CSV uses more than one header line), you will have to pass the number of columns (and the number of header lines) as arguments to your script.
Warning:
Of course, parsing CSV this way is not perfect! It works for many simple CSV files, but pay attention to encoding and security! For example, this module won't be able to handle binary fields!
Read the csv.c source code comments and RFC 4180 carefully!
From the man page:
-d delim
The first character of delim is used to terminate the input line,
rather than newline.
You are using -d, so the input line is terminated at the comma; read never sees the rest of the line. That's why $y is empty.
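A quick demonstration of both behaviors:

$ read -d, x y < <(echo a,b,); echo "x=$x y=$y"
x=a y=
$ read x y < <(echo a b); echo "x=$x y=$y"
x=a y=b

With -d, reading stops at the first comma, so only x gets a value.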
We can parse CSV files with quoted strings, delimited by, say, |, with the following code:
while read -r line
do
    field1=$(echo "$line" | awk -F'|' '{printf "%s", $1}' | tr -d '"')
    field2=$(echo "$line" | awk -F'|' '{printf "%s", $2}' | tr -d '"')
    echo "$field1 $field2"
done < "$csvFile"
awk parses the string fields into variables and tr removes the quotes.
This is slightly slower, as awk is executed for each field.
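To avoid those per-field forks, the same work can be done with one awk invocation for the whole file (a sketch under the same assumptions: | delimiter, double quotes to strip):

awk -F'|' '{ gsub(/"/, ""); print $1, $2 }' "$csvFile"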
In addition to the answer from @Dennis Williamson, it may be helpful to skip the first line when it contains the header of the CSV:
{
    read
    while IFS=, read -r col1 col2
    do
        echo "I got:$col1|$col2"
    done
} < myfile.csv
If you want to read a CSV file while skipping its first (header) line, this is a solution:
i=1
while IFS=, read -ra line
do
    test $i -eq 1 && ((i=i+1)) && continue  # skip the header line
    for col_val in "${line[@]}"
    do
        echo -n "$col_val|"
    done
    echo
done < "$csvFile"
I need a while loop that can get every two lines and store them in variables.
while read data; do
    echo $data
done
So I need to do something with each block of text, where each block is two lines.
For this input:
some text here
some text here a
some text here 2
some text here 2a
This merges two lines using while read line. It's NOT how I'd do it, but it does what you said you wanted...
last=""
while read line; do
if [ "$last" != "" ]; then
echo "$last$line"
last=""
else
last=$line
fi
done
if [ "$last" != "" ]; then
echo "$last"
fi
This great article (How to merge every two lines into one from the command line?) shows lots of different ways of merging two lines.
You can read two lines in the while condition:
while read -r first && read -r second
do
    echo "${first} ${second}"
done
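For example, with the sample input from the question:

$ printf '%s\n' 'some text here' 'some text here a' 'some text here 2' 'some text here 2a' | while read -r first && read -r second; do echo "$first $second"; done
some text here some text here a
some text here 2 some text here 2a

Note that if the input has an odd number of lines, the loop ends when the second read fails, so a trailing unpaired line is silently dropped.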
It would help to know what you want to do to the two lines, but you can collect each block of 2 surrounded by empty lines easily enough with awk, e.g.
awk '
    NF==0 { n=0; next }
    n<2   { arr[++n]=$0 }
    n==2  { printf "do to: %s & %s\n",arr[1],arr[2]; n=0 }
' file
or as a 1-liner:
awk 'NF==0{n=0;next} n<2{arr[++n]=$0} n==2{printf "do to: %s & %s\n",arr[1],arr[2]; n=0}' file
There are 3 rules: the first checks whether the line is empty with NF==0 and, if so, sets the index n to zero and skips to the next record (line). The second, n<2, adds the current line to the array arr. The final rule, n==2, does whatever you need with the lines contained in arr[1] and arr[2] and then resets the index n to zero.
Example Input File
Shamelessly borrowed from the other answer and modified (thank you), you could have:
$ cat file
some text here
some text here a

some text here 2
some text here 2a
Example Use/Output
Where each 2-lines separated by whitespace are collected and then output with "do to: " prefixed and the lines joined by " & ", for example purposes only:
$ awk 'NF==0{n=0;next} n<2{arr[++n]=$0} n==2{printf "do to: %s & %s\n",arr[1],arr[2]; n=0}' file
do to: some text here & some text here a
do to: some text here 2 & some text here 2a
Depending on what you need to do to the lines, awk may provide a very efficient solution. (as may sed)
I have this code in Elastix2.5 (CentOS):
for variable in $(while read line; do myarray[ $index]="$line"; index=$(($index+1)); echo "$line"; done < prueba);
This extracts the values of each line from the "prueba" file.
The prueba file contains passwords like this:
Admin1234
Hello543
Chicken5444
Dino6759
3434Cars4
Adminis5555
But $variable only gets values from lines that contain characters; I need it to get NULL (empty) values from the blank lines as well. How can I do that?
Your problem is the use of a for loop with a command substitution ($(...)); let's look at this simple example:
$ for v in $(echo 'line_1'; echo ''; echo 'line_3'); do echo "$v"; done
line_1
line_3
Note how the empty string produced by the 2nd echo command is effectively discarded.
Analogously, any empty lines produced by your while loop are discarded.
The solution is to avoid for loops altogether for parsing command output:
In your case, simply use only the while loop for iterating over the input file:
while read -r line; do
    myarray[index++]="$line"
done < prueba

printf '%s\n' "${myarray[@]}"
-r was added to ensure that read doesn't modify the input (doesn't try to interpret \-prefixed sequences) - this is good practice in general.
Note how incrementing the index was moved directly into the array subscript (index++).
printf '%s\n' "${myarray[@]}" prints all array elements after the file's been read, demonstrating that empty lines were read as well.
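As an aside, on bash 4.0 or newer, mapfile (also known as readarray) loads the file into an array in one step and likewise preserves empty lines:

mapfile -t myarray < prueba
printf '%s\n' "${myarray[@]}"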
You can use is_null function.
is_null($a)
http://php.net/manual/en/function.is-null.php
I have a text file in which each a first block of text on each line is separated by a tab from a second block of text like so:
VERBS, AUXILIARY. "Be," subjunctive and quasi-subjunctive Be, Beest, &c., was used in A.-S. (beon) generally in a future sense.
In case it is hard to tell, tab is long space between "quasi-subjunctive" and "Be".
So, off the top of my head, I am thinking of a 'for' loop in which a variable is set using 'sed' to read the first block of text of a line, up to and including the tab (or not, it doesn't really matter), and then the variable is used to find subsequent matches, adding a "(x)" right before the tab to make the line unique. The 'x' would be a running counter, numbering the first instance '1' and then each subsequent match one number higher.
One problem I see is stopping 'sed' after each subsequent match so the counter can be incremented. Is there a way to do this, since it is sed's normal behaviour (as far as I know) to continue through without stopping until all lines are processed?
You can set IFS to the TAB character and read the line into variables. Something like:
$ while IFS=$'\t' read block1 block2;do
echo "block1 is $block1"
echo "block2 is $block2"
done < file
block1 is VERBS, AUXILIARY. "Be," subjunctive and quasi-subjunctive
block2 is Be, Beest, &c., was used in A.-S. (beon) generally in a future sense.
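For the numbering step of your plan, here is a hedged awk sketch instead of sed: it counts repeats of the first block and appends " (n)" to it before the tab, numbering the first instance 1 as you described:

awk -F'\t' -v OFS='\t' '{ cnt[$1]++; $1 = $1 " (" cnt[$1] ")"; print }' file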
OK, so I got the job done with this little (or perhaps big, if it's overkill?) script I whipped up:
#!/bin/bash

sedLnCnt=1
while [[ "$sedLnCnt" -lt 521 ]]; do
    lN=$(sed -n "${sedLnCnt} p" sGNoSecNums.html | sed -r 's/^([^\t]*\t).*$/\1/') #; echo "\$lN: $lN"
    lnNum=($(grep -n "$lN" sGNoSecNums.html | sed -r 's/^([0-9]+):.*$/\1/')) #; echo "num of matches: ${#lnNum[@]}"
    if [[ "${#lnNum[@]}" -gt 1 ]]; then
        lCnt="${#lnNum[@]}"
        ((eleN = lCnt-1)) #; echo "\$eleN: ${eleN}" # $eleN needs to be 1 less than the match count (zero-based array)
        while [[ "$lCnt" -gt 0 ]]; do
            sed -ri "${lnNum[$eleN]}s/^([^\t]*)\t/\1 \(${lCnt}\)\t/" sGNoSecNums.html
            ((lCnt--))
            ((eleN--))
        done
    fi
    ((sedLnCnt++))
done
grep was the perfect way to find the line numbers of the matches, jamming them into an array and then editing each line to append the unique identifier.