I have a tab separated file:
c1 1000000
c2 2000000
c3 1000000
I would like to loop through each line of that file and save the second column in a variable to then loop through increments of that number and generate a specific new file out of it.
out=""
while read i; do
length=$(echo $i | cut -d$'\t' -f2) #How to use $i here?
c=$(echo $i | cut -d$'\t' -f1)
start=1
end=10000
for (( i = 0; i < $(expr $length / 500); i++ )); do
start=$(expr $start + $i \* 500)
end=$(expr $end + $i \* 500)
echo $c $start $end >> out
done
done <file
Of course, I am always happy to learn about how inefficient my code may be and how I can improve it.
Thanks for your input!
The problem isn't specific to loops -- it's specific to unquoted echos. As described in BashPitfalls #14, echo $i string-splits and glob-expands the contents of $i before passing them to echo.
Part of string-splitting is that the content are split into words, and the words are passed as separate parameters -- so what it actually runs is echo "c1" "1000000", which doesn't put a tab between the two values, so your cut command can't find a tab to cut on.
The Right Way to fix this is to not use cut at all:
while IFS=$'\t' read -r c length; do
Related
I have an input file, over 1,000,000 lines long which looks something like this:
G A 0|0:2,0:2:3:0,3,32
G A 0|1:2,0:2:3:0,3,32
G C 1|1:0,1:1:3:32,3,0
C G 1|1:0,1:1:3:32,3,0
A G 1|0:0,1:1:3:39,3,0
For my purposes, everything after the first : in the third field is irrelevant (but I left it in as it'll affect the code).
The first field defines the values coded as 0 in the third, and the second field defines the values coded as 1
So, for example:
G A 0|0 = G|G
G A 1|0 = A|G
G A 1|1 = A|A
etc.
I first need to decode the third field, and then convert it from a vertical list to a horizontal list of values, with the values before the | on one line, and the values after on a second line.
So the example at the top would look like this:
HAP0 GGCGG
HAP1 GACGA
I've been working in bash, but any other suggestions are welcome. I have a script which does the job - but it's incredibly slow and long-winded and I'm sure there's a better way.
echo "HAP0 " > output.txt
echo "HAP1 " >> output.txt
while IFS=$'\t' read -a array; do
ref=${array[0]}
alt=${array[1]}
data=${array[2]}
IFS=$':' read -a code <<< $data
IFS=$'|' read -a hap <<< ${code[0]}
if [[ "${hap[0]}" -eq 0 ]]; then
sed -i "1s/$/${ref}/" output.txt
elif [[ "${hap[0]}" -eq 1 ]]; then
sed -i "1s/$/${alt}/" output.txt
fi
if [[ "${hap[1]}" -eq 0 ]]; then
sed -i "2s/$/${ref}/" output.txt
elif [[ "${hap[1]}" -eq 1 ]]; then
sed -i "2s/$/${alt}/" output.txt
fi
done < input.txt
Suggestions?
Instead of running sed in a subshell, use parameter expansion.
#!/bin/bash
printf '%s ' HAP0 > tmp0
printf '%s ' HAP1 > tmp1
while read -a cols ; do
indexes=${cols[2]}
indexes=${indexes%%:*}
idx0=${indexes%|*}
idx1=${indexes#*|}
printf '%s' ${cols[idx0]} >> tmp0
printf '%s' ${cols[idx1]} >> tmp1
done < "$1"
cat tmp0
printf '\n'
cat tmp1
printf '\n'
rm tmp0 tmp1
The script creates two temporaty files, one contains the first line, the second file the second line.
Or, use Perl for even faster solution:
#!/usr/bin/perl
use warnings;
use strict;
my #haps;
while (<>) {
my #cols = split /[\s|:]+/, $_, 5;
$haps[$_] .= $cols[ $cols[ $_ + 2 ] ] for 0, 1;
}
print "HAP$_ $haps[$_]\n" for 0, 1;
I'm not use to the syntax of bash script. I'm trying to read a file. For each line I want to keep only the part of the string before the delimiter '/' and put it back into a new file if the word respect a perticular length. I've download a dictionary, but the format does not meet my expectation. Since there is 84000 words, I don't really want to manualy remove what after the '/' for each word. I though it would be an easy thing and I follow couple of idea in other similar question on this site, but it seem that I'm missing something somewhere because it still doesn't work. I can't get the length right. The file Test_Input contains one word per line. Here's the code:
#!/usr/bin/bash
filename="Test_Input.txt"
while read -r line
do
sub= echo $line | cut -d '/' -f1
length= echo ${#sub}
if $length >= 4 && $length <= 10;
then echo $sub >> Test_Output.txt
fi
done < "$filename"
Several items:
I assume that you have been using single back-quotes in the assignments, and not literally sub= echo $line | cut -d '/' -f1, as this would have certainly failed. Alternatively, you can also use sub=$(), as in $(echo $line | cut -d '/' -f1)
The conditions in an if clause need to be encompassed by single or double [], like this: if [[ $length -ge 4 ]] && [[ $length -le 10 ]];
Which brings me to the next point: <= doesn't reliably work in bash. Just use -ge for "greater or equal" and -le for "less or equal".
If your line does not contain any / characters, in your version sub will contain the whole line. This might not be what you want, so I'd advise to also add the -s flag to cut.
You don't need somevar=$(echo $someothervar). Just use somevar=$someothervar
Here's a version that works:
#!/usr/bin/env bash
filename="Test_Input.txt"
while read -r line
do
sub=$(echo $line | cut -s -d '/' -f 1)
length=${#sub}
if [[ $length -ge 4 ]] && [[ $length -le 10 ]];
then echo $sub >> Test_Output.txt
fi
done < "$filename"
Of course, you could also just use sed:
sed -n -r '/^[^/]{4,10}\// s;/.*$;;p' Test_Input.txt > Test_Output.txt
Explanation:
-n Don't print anything unless explicitly marked for printing.
-r Use the extended regex
/<searchterm>/ <operation> Search for lines that match a certain criteria, and perform this operation:
Searchterm is: ^[^/]{4,10}\/ From the beginning of the line, there should be between 4 and 10 non-slash characters, followed by the slash
Operation is: s;/.*$;;p replace everything between the first slash and the end of the line with nothing, then print.
awk is the best tool for this
awk -F/ 'length($1) >= 4 && length($1) <= 10 {print $1} > newfile
I have a file text titled test.txt containing
1 2 3 4
2 3 4
1 12 2 4 5 66
I would like to read it line by line and for each line I would like to extract the elements.
I tried with
while read lines;
do
echo $lines
done < test.txt
It prints correctly the lines of the matrix saved in the txt file but now I don't know how extract the single elements from the variable lines...
I would like to do something like this
while read lines;
do
((for i = 0; i<=numberofelementline; i++))
do
element = ....
echo $element
done
done < test.txt
read -a line <<< $a
for i in ${!line[#]}; do echo ${rline[i]}; done
We read a file / line and store as an array. Next for each element of the array we run a for loop. This way you know the element and it's position.
You could replace all spaces to new lines and read them again with another while loop:
while read LINE; do
echo "$LINE" | tr -s ' ' '\n' | while read NUM; do
echo $NUM
done
done < test.txt
If you just want to echo them:
cat test.txt | tr ' ' '\n'
If you want to use the numbers:
while read num; do
echo $num
...
done < <( cat test.txt | tr ' ' '\n' )
I have the following at the moment:
for file in *
do
list="$list""$file "`cat $file | wc -l | sort -k1`$'\n'
done
echo "$list"
This is printing:
fileA 10
fileB 20
fileC 30
I would then like to cycle through $list and cut column 2 and perform calculations.
When I do:
for line in "$list"
do
noOfLinesInFile=`echo "$line" | cut -d\ -f2`
echo "$noOfLinesInFile"
done
It prints:
10
20
30
BUT, the for loop is only being entered once. In this example, it should be entering the loop 3 times.
Can someone please tell me what I should do here to achieve this?
If you quote the variable
for line in "$list"
there is only one word, so the loop is executed just once.
Without quotes, $line would be populated with any word found in the $list, which is not what you want, either, as it would process the values one by one, not lines.
You can set the $IFS variable to newline to split $list on newlines:
IFS=$'\n'
for line in $list ; do
...
done
Don't forget to reset IFS to the original value - either put the whole part into a subshell (if no variables should survive the loop)
(
IFS=$'\n'
for ...
)
or backup the value:
IFS_=$IFS
IFS=$'\n'
for ...
IFS=$IFS_
...
done
This is because list in shell are just defined using space as a separator.
# list="a b c"
# for i in $list; do echo $i; done
a
b
c
# for i in "$list"; do echo $i; done
a b c
in your first loop, you actually are not building a list in shell sens.
You should setting other than default separators either for the loop, in the append, or in the cut...
Use arrays instead:
#!/bin/bash
files=()
linecounts=()
for file in *; do
files+=("$file")
linecounts+=("$(wc -l < "$file")")
done
for i in "${!files[#]}" ;do
echo "${linecounts[i]}"
printf '%s %s\n' "${files[i]}" "${linecounts[i]}" ## Another form.
done
Although it can be done simpler as printf '%s\n' "${linecounts[#]}".
wc -l will only output one value, so you don't need to sort it:
for file in *; do
list+="$file "$( wc -l < "$file" )$'\n'
done
echo "$list"
Then, you can use a while loop to read the list line-by-line:
while read file nlines; do
echo $nlines
done <<< "$list"
That while loop is fragile if any filename has spaces. This is a bit more robust:
while read -a words; do
echo ${words[-1]}
done <<< "$list"
Working in a shell script here, trying to count the number of words/characters/lines in a file without using the wc command. I can get the file broken into lines and count those easy enough, but I'm struggling here to get the words and the characters.
#define word_count function
count_stuff(){
c=0
w=0
l=0
local f="$1"
while read Line
do
l=`expr $line + 1`
# now that I have a line I want to break it into words and characters???
done < "$f"
echo "Number characters: $chars"
echo "Number words: $words"
echo "Number lines: $line"
}
As for characters, try this (adjust echo "test" to where you get your output from):
expr `echo "test" | sed "s/./ + 1/g;s/^/0/"`
As for lines, try this:
expr `echo -e "test\ntest\ntest" | sed "s/^.*$/./" | tr -d "\n" | sed "s/./ + 1/g;s/^/0/"`
===
As for your code, you want something like this to count words (if you want to go at it completely raw):
while read line ; do
set $line ;
while true ; do
[ -z $1 ] && break
l=`expr $l + 1`
shift ;
done ;
done
You can do this with the following Bash shell script:
count=0
for var in `cat $1`
do
count=`echo $count+1 | bc`
done
echo $count