I am in a situation similar to this one and am having difficulty implementing this kind of solution for my case.
I have file.tsv formatted as follows:
x y
dog woof
CAT meow
loud_goose honk-honk
duck quack
with a fixed number of columns (but a variable number of rows), and I need to loop over those pairs of values, skipping the header row, in a script like the following (pseudocode):
for elements in list; do
./script1 elements[1] elements[2]
./script2 elements[1] elements[2]
done
so that script* can take the arguments from the pair and run with them.
Is there a way to do it in Bash?
I was thinking I could do something like this:
list1={`awk 'NR > 1{print $1}' file.tsv`}
list2={`awk 'NR > 1{print $2}' file.tsv`}
and then to call them in the loop based on their position, but I am not sure how.
Thanks!
Shell arrays are not multi-dimensional, so an array element cannot store two arguments for your scripts. However, since you are processing lines from file.tsv, you can iterate on each line, reading both elements at once, like this:
#!/usr/bin/env sh
# Populate tab with a literal tab character
# (command substitution strips trailing newlines, but the tab survives)
tab="$(printf '\t')"
{
# Read first line in dummy variable _ to skip header
read -r _
# Iterate reading tab delimited x and y from each line
# (the || [ -n "$x" ] clause keeps a final line that lacks a trailing newline)
while IFS="$tab" read -r x y || [ -n "$x" ]; do
./script1 "$x" "$y"
./script2 "$x" "$y"
done
} < file.tsv # from this file
You could try just a while + read loop with the -a flag and IFS.
#!/usr/bin/env bash
while IFS=$' \t' read -ra line; do
echo ./script1 "${line[0]}" "${line[1]}"
echo ./script2 "${line[0]}" "${line[1]}"
done < <(tail -n +2 file.tsv)
Or without the tail
#!/usr/bin/env bash
skip=0 start=-1
while IFS=$' \t' read -ra line; do
if ((start++ >= skip)); then
echo ./script1 "${line[0]}" "${line[1]}"
echo ./script2 "${line[0]}" "${line[1]}"
fi
done < file.tsv
Remove the echos if you're satisfied with the output.
I have a variable foo.
echo "$foo" ---> abc,bc,cde
I wanted to put quotes around each value.
Expected result = 'abc','bc','cde'.
I have tried this way, but it's not working:
join_lines() {
local IFS=${1:-,}
set --
while IFS= read -r line; do set -- "$@" "$'line'"; done
echo "$*"
}
Please try the following, written and tested with the shown samples in GNU awk.
Without loop:
var="abc,bc,cde"
echo "$var" | awk -v s1="'" 'BEGIN{FS=",";OFS="\047,\047"} {$1=$1;$0=s1 $0 s1} 1'
With a loop, the usual way to go through all (comma-separated) fields:
var="abc,bc,cde"
echo "$var" | awk -v s1="'" 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++){$i=s1 $i s1}} 1'
Output will be 'abc','bc','cde'.
As an alternative, using sed: replace every comma with ',' and add a ' at the beginning and end of the line to wrap the first/last tokens.
sed -e "s/^/'/" -e "s/$/'/" -e "s/,/','/g"
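For example:
echo "abc,bc,cde" | sed -e "s/^/'/" -e "s/$/'/" -e "s/,/','/g"
Output will be 'abc','bc','cde'.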
On the surface, the question is how to convert a comma-separated list of values (stored in a shell variable) into a comma-separated list of quoted tokens. Extending the logic provided by the OP, but using shell arrays:
foo="abc,bc,cde"
IFS=, read -a items <<< "$foo"
result=
for r in "${items[#]}" ; do
[ "$result" ] && result+=","
result+="'$r'"
done
echo "RESULT=$result"
If needed, the logic can be placed into a function/filter:
function join_lines {
local -a items
local input result
while IFS=, read -a items ; do
result=
for r in "${items[#]}" ; do
[ "$result" ] && result+=","
result+="'$r'"
done
echo "$result"
done
}
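A quick check, feeding the function from standard input:
echo "abc,bc,cde" | join_lines
Output will be 'abc','bc','cde'.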
I am trying to parse a huge text file, say 200 MB.
The text file contains some strings:
123
1234
12345
 12345
so my script looked like
while read line ; do
echo "$line"
done <textfile
however using this above method, my string " 12345" gets truncated to "12345"
I tried using
sed -n "$i"p textfile
but the throughput is reduced from 27 to 0.2 lines per second, which is unacceptable ;-)
Any idea how to solve this?
You want to read the lines without a field separator:
while IFS="" read line; do
echo "$line"
done <<< " 12345"
If you also want to skip interpretation of backslash escapes, use
while IFS="" read -r line; do
echo "$line"
done <<< " 12345"
You can write the IFS without double quotes:
while IFS= read -r line; do
echo "$line"
done <<< " 12345"
This seems to be what you're looking for:
while IFS= read line; do
echo "$line"
done < textfile
The safest method is to use read -r rather than plain read; the -r flag skips interpretation of backslash escapes (thanks Walter A):
while IFS= read -r line; do
echo "$line"
done < textfile
OPTION 1:
#!/bin/bash
# read whole file into array
readarray -t aMyArray < <(cat textfile)
# echo each line of the array
# this will preserve spaces
for i in "${aMyArray[#]}"; do echo "$i"; done
readarray -- read lines from standard input
-t -- omit trailing newline character
aMyArray -- name of array to store file in
< <() -- execute command; redirect stdout into array
cat textfile -- file you want to store in variable
for i in "${aMyArray[#]}" -- for every element in aMyArray
"" -- needed to maintain spaces in elements
${ [#]} -- reference all elements in array
do echo "$i"; -- for every iteration of "$i" echo it
"" -- to maintain variable spaces
$i -- equals each element of the array aMyArray as it cycles through
done -- close for loop
OPTION 2:
To accommodate your larger file, you could do this to reduce the work and speed up the processing.
#!/bin/bash
sSearchFile=textfile
sSearchStrings="1|2|3|space"
while IFS= read -r line; do
echo "${line}"
done < <(egrep "${sSearchStrings}" "${sSearchFile}")
This greps the file first (faster) before cycling it through the while loop. Let me know how this works for you. Notice you can add multiple search strings to the $sSearchStrings variable.
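For example (these particular patterns are hypothetical; egrep treats | as alternation, so lines matching any of them pass through):
sSearchStrings="12345|HEADER|total"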
OPTION 3:
And an all-in-one solution: keep your search criteria in a text file and combine everything else...
#!/bin/bash
# identify file containing search strings
sStringsFile="searchstrings.file"
sSearchStrings=""
while IFS= read -r string; do
# if $sSearchStrings is empty, start it with $string
if [[ -z $sSearchStrings ]]; then sSearchStrings="${string}"
# otherwise append "|" and $string to it
else sSearchStrings="${sSearchStrings}|${string}"; fi
# read search criteria in from file
done <"${sStringsFile}"
# identify file to be searched
sSearchFile="text.file"
while IFS= read -r line; do
echo "${line}"
done < <(egrep "${sSearchStrings}" "${sSearchFile}")
I have the following at the moment:
for file in *
do
list="$list""$file "`cat $file | wc -l | sort -k1`$'\n'
done
echo "$list"
This is printing:
fileA 10
fileB 20
fileC 30
I would then like to cycle through $list and cut column 2 and perform calculations.
When I do:
for line in "$list"
do
noOfLinesInFile=`echo "$line" | cut -d' ' -f2`
echo "$noOfLinesInFile"
done
It prints:
10
20
30
BUT, the for loop is only being entered once. In this example, it should be entering the loop 3 times.
Can someone please tell me what I should do here to achieve this?
If you quote the variable
for line in "$list"
there is only one word, so the loop is executed just once.
Without quotes, $line would be populated with any word found in the $list, which is not what you want, either, as it would process the values one by one, not lines.
You can set the $IFS variable to newline to split $list on newlines:
IFS=$'\n'
for line in $list ; do
...
done
Don't forget to reset IFS to the original value - either put the whole part into a subshell (if no variables should survive the loop)
(
IFS=$'\n'
for ...
)
or backup the value (the restore can be the first command in the loop body, because $list is split before the first iteration):
IFS_=$IFS
IFS=$'\n'
for ...
IFS=$IFS_
...
done
This is because lists in the shell are just strings split using whitespace as separators.
# list="a b c"
# for i in $list; do echo $i; done
a
b
c
# for i in "$list"; do echo $i; done
a b c
In your first loop, you are not actually building a list in the shell sense.
You should set separators other than the default, either for the loop, in the append, or in the cut...
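For example, a minimal sketch of the first option, setting IFS for the loop and reusing the OP's cut (assuming $list is built as in the question):
IFS=$'\n'                      # split $list on newlines only
for line in $list; do
    noOfLinesInFile=$(echo "$line" | cut -d' ' -f2)
    echo "$noOfLinesInFile"
done
unset IFS                      # restore default word splitting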
Use arrays instead:
#!/bin/bash
files=()
linecounts=()
for file in *; do
files+=("$file")
linecounts+=("$(wc -l < "$file")")
done
for i in "${!files[#]}" ;do
echo "${linecounts[i]}"
printf '%s %s\n' "${files[i]}" "${linecounts[i]}" ## Another form.
done
Although it can be done simpler as printf '%s\n' "${linecounts[@]}".
wc -l will only output one value, so you don't need to sort it:
for file in *; do
list+="$file "$( wc -l < "$file" )$'\n'
done
echo "$list"
Then, you can use a while loop to read the list line-by-line:
while read file nlines; do
echo "$nlines"
done <<< "$list"
That while loop is fragile if any filename has spaces. This is a bit more robust:
while read -a words; do
echo "${words[-1]}"
done <<< "$list"
I have a large config file that I use to define variables for a script to pull from it, each defined on a single line. It looks something like this:
var val
foo bar
foo1 bar1
foo2 bar2
I have gathered a list of out-of-date variables that I want to remove from the list. I could go through it manually, but I would like to do it with a script, which would be at least more stimulating. The file that contains the values may contain multiple instances. The idea is to find the value and, if it is found, remove the entire line.
Does anyone know if this is possible? I know sed does this but I do not know how to make it use a file input.
#!/bin/bash
shopt -s extglob
REMOVE=(foo1 foo2)
IFS='|' eval 'PATTERN="@(${REMOVE[*]})"'
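# PATTERN becomes the extglob @(foo1|foo2), matching either word exactly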
while read -r LINE; do
read A B <<< "$LINE"
[[ $A != $PATTERN ]] && echo "$LINE"
done < input_file.txt > output_file.txt
Or (Use with a copy first)
#!/bin/bash
shopt -s extglob
FILE=$1 REMOVE=("${@:2}")
IFS='|' eval 'PATTERN="@(${REMOVE[*]})"'
SAVE=()
while read -r LINE; do
read A B <<< "$LINE"
[[ $A != $PATTERN ]] && SAVE+=("$LINE")
done < "$FILE"
printf '%s\n' "${SAVE[#]}" > "$FILE"
Running with
bash script.sh your_config_file pattern1 pattern2 ...
Or
#!/bin/bash
shopt -s extglob
FILE=$1 PATTERNS_FILE=$2
readarray -t REMOVE < "$PATTERNS_FILE"
IFS='|' eval 'PATTERN="@(${REMOVE[*]})"'
SAVE=()
while read -r LINE; do
read A B <<< "$LINE"
[[ $A != $PATTERN ]] && SAVE+=("$LINE")
done < "$FILE"
printf '%s\n' "${SAVE[#]}" > "$FILE"
Running with
bash script.sh your_config_file patterns_file
Here's one with sed. Add words to the array, then use
./script target_filename
(assuming you put the following in a file called script). It is not very efficient, since it runs sed once per word; it might be more efficient to concatenate the words into a single regex, like bbonev did (see the sketch after this block).
#!/bin/bash
declare -a array=("foo1" "foo2")
for i in "${array[#]}";
do
sed -i "/^${i}\s.*/d" $1
done
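A sketch of that single-regex idea (assuming GNU sed, as above):
#!/bin/bash
declare -a array=("foo1" "foo2")
# build foo1|foo2 in a subshell so the IFS change does not leak
regex=$(IFS='|'; printf '%s' "${array[*]}")
# delete every matching line in a single pass over the file
sed -i -E "/^(${regex})[[:space:]]/d" "$1"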
It's actually even simpler using file input
If you have a word file
word1
word2
word3
.....
then the following will do the job
#!/bin/bash
while read -r i;
do
sed -i "/^${i}\s.*/d" "$2"
done <"$1"
usage:
./script wordlist target_file
I have a CSV file:
data1,data2,data2
data3,data4,data5
data6,data7,data8
I want to convert it to (Contained in a variable):
variable=data1,data2,data2%0D%0Adata3,data4,data5%0D%0Adata6,data7,data8
My attempt:
data=''
cat csv | while read line
do
data="${data}%0D%0A${line}"
done
echo $data # Fails: data stays empty (the pipe runs the loop in a sub-shell, so the changes are lost)
Please help.
Simpler to just strip the newlines from the file:
tr -d '\n' < yourfile.txt > concatfile.txt
In bash,
data=$(
while read line
do
echo -n "%0D%0A${line}"
done < csv)
In non-bash shells, you can use `...` instead of $(...). Also, echo -n, which suppresses the newline, is unfortunately not completely portable, but again this will work in bash.
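A more portable sketch of the same loop using printf (note the doubled %% so printf does not treat %0D as a format directive):
data=$(
while read line
do
    printf '%%0D%%0A%s' "$line"
done < csv)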
Some of these answers are incredibly complicated. How about this.
data="$(xargs printf ',%s' < csv | cut -b 2-)"
or
data="$(tr '\n' ',' < csv | cut -b 2-)"
Too "external utility" for you?
IFS=$'\n', read -d '' -a data < csv
Now you have an array! Output it however you like, perhaps with
data="$(tr ' ' , <<<"${data[#]}")"
Still too "external utility?" Well fine,
data="$(printf "${data[0]}" ; printf ',%s' "${data[#]:1:${#data}}")"
Yes, printf can be a builtin. If it isn't but your echo is and it supports -n, use echo -n instead:
data="$(echo -n "${data[0]}" ; for d in "${data[#]:1:${#data[#]}}" ; do echo -n ,"$d" ; done)"
Okay, now I admit that I am getting a bit silly. Andrew's answer is perfectly correct.
I would much prefer a loop:
for line in $(cat file.txt); do echo -n $line; done
Note: This solution requires the input file to have a new line at the end of the file or it will drop the last line.
Another short bash solution
variable=$(
RS=""
while read line; do
printf "%s%s" "$RS" "$line"
RS='%0D%0A'
done < filename
)
awk 'END { print r }
{ r = r ? r OFS $0 : $0 }
' OFS='%0D%0A' infile
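To capture the result in a variable, as the question asks:
variable=$(awk 'END { print r }
  { r = r ? r OFS $0 : $0 }
' OFS='%0D%0A' infile)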
With shell:
data=
while IFS= read -r; do
[ -n "$data" ] &&
data=$data%0D%0A$REPLY ||
data=$REPLY
done < infile
printf '%s\n' "$data"
Recent bash versions:
data=
while IFS= read -r; do
[[ -n $data ]] &&
data+=%0D%0A$REPLY ||
data=$REPLY
done < infile
printf '%s\n' "$data"
A very simple single-line solution which requires no extra files, as it's quite easy to understand (just cat the file together and perform a sed replace):
output=$(echo $(cat ./myFile.txt) | sed 's/ /%0D%0A/g')
Useless use of cat, punished! You want to feed the CSV into the loop:
while read line; do
# ...
done < csv
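A minimal sketch of the completed loop: with the redirection, the loop runs in the current shell, so data survives it:
data=
while IFS= read -r line; do
    data=${data:+${data}%0D%0A}$line
done < csv
echo "$data"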