sed command to select a certain number of lines from a file - shell

I am trying to split huge files into smaller files of around 30k lines each.
I found it can be done with sed -n 'from_line,to_line p', and it works fine with literal line numbers, but in my case I am using two variables and I get an error for that.
Here is the script I am using:
k=1
for i in `ls final*`
do
count=`wc -l $i|awk '{print $1}'`
marker1=1
marker2=30000
no_of_files=$(( count/30000 ))
#echo $no_of_files
no_of_files=$(( no_of_files+1 ))
while [[ no_of_files -ne 0 ]];do
if [[ $marker2 -gt $count ]];then
sed -n '$marker1,$count p' $i > purge$k.txt
else
sed -n '$marker1,$marker2 p' $i > purge$k.txt
marker1=$(( marker2+1 ))
marker2=$(( marker2+30000 ))
fi
no_of_files=$(( no_of_files-1 ))
k=$(( k+1 ))
done
done
I am getting the error below while running the script:
sed: $marker1,$marker2 p is not a recognized function.
sed: $marker1,$marker2 p is not a recognized function.
sed: $marker1,$marker2 p is not a recognized function.
sed: $marker1,$marker2 p is not a recognized function.
sed: $marker1,$marker2 p is not a recognized function.
sed: $marker1,$marker2 p is not a recognized function.
sed: $marker1,$count p is not a recognized function.

It doesn't work because you put the variables inside single quotes, so the shell never expands them.
Try changing the sed commands as follows:
sed -n "$marker1,$count p"
or, splitting the variables out of the single quotes so they still expand to plain line numbers:
sed -n ''$marker1','$count' p'

Some small changes:
Use double quotes in sed. Do not use the old backticks; use $( ) instead.
Change k=$(( k+1 )) to (( k++ )).
k=1
for i in $(ls final*)
do
count=$(wc -l <$i)
marker1=1
marker2=30000
no_of_files=$(( count/30000 ))
#echo $no_of_files
(( no_of_files++ ))
while [[ no_of_files -ne 0 ]];do
if [[ $marker2 -gt $count ]];then
sed -n "$marker1,$count p" $i > purge$k.txt
else
sed -n "$marker1,$marker2 p" $i > purge$k.txt
marker1=$(( marker2+1 ))
marker2=$(( marker2+30000 ))
fi
(( no_of_files-- ))
(( k++ ))
done
done
The wc -l $i|awk '{print $1}' can be simplified to:
awk 'END {print NR}' $i
or
wc -l < $i

As others have noted, you have your shell variables inside single quotes, so they are not being expanded. But you are using the wrong tool: what you are doing creates N files using N passes. split -l 30000 "$i" will split the file into 30,000-line pieces called xaa, xab, ... You can tell split what to call the xaa files too, as sketched below.
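For example, the whole loop above could be replaced by one split call per input file. A sketch (the purge_ prefix is just one choice of name, and -d plus --additional-suffix need GNU split):
for i in final*; do
    # 30,000-line pieces named purge_<file>_00.txt, purge_<file>_01.txt, ...
    split -l 30000 -d --additional-suffix='.txt' "$i" "purge_${i}_"
done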


Why is grep not getting the last argument?

This is a script that searches for each line of a file ($1) in another file ($2):
val=$(wc -l < $1)
for ((i = 1; i <= val; i++))
do
line=$(sed '$i!d' $1)
if grep -q "$(echo $line)" $2
then
echo found
fi
done
But it gets stuck at the if grep.
It behaves as if it's not getting $2.
a script that searches for each line of a file ($1) in another file ($2)
No need to write your own script for that. Use grep's -f option:
if grep -qf "$1" "$2"; then
echo found
else
echo not found
fi
Solved: the problem was how I was passing the line number to sed:
#!/bin/bash
val=$(wc -l < $1)
for ((i = 1; i <= val; i++))
do
line=$(sed "$i!d" $1)
if ! grep -q "$(echo $line)" $2
then
echo $line
fi
done
This works fine, if you do:
./script file1 file2
It gives you the lines of the first file that are missing in the second.
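For what it's worth, the same result can usually be had in a single pass with grep alone. A sketch, assuming the lines should be compared literally and as whole lines:
# print every line of file1 that does not appear verbatim as a line of file2
grep -vxFf file2 file1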

count number of lines from several files and store the counts in one variable using a for loop

I want to count the number of lines in many text files and then find the lowest count. I am trying to do this in a for loop, but it only stores the result from the last text file in the loop.
for txt in home/data/*.txt
do
count_txt=$(cat $txt | wc -l) | bc
done
Thanks
give this one-liner a try:
wc -l /path/*.txt|awk 'NR==1{m=$1}{m=($1*1)<m?($1*1):m}END{print m}'
shopt -s nullglob
FILES=(home/data/*.txt) LOWEST_COUNT='N/A' FILE=''
[[ ${#FILES[@]} -gt 0 ]] && read -r LOWEST_COUNT FILE < <(exec wc -l "${FILES[@]}" | sort -n)
echo "$LOWEST_COUNT | $FILE"
You just need something like this (using GNU awk for ENDFILE):
awk 'ENDFILE{min = (min < FNR ? min : FNR)} END{print min}' home/data/*.txt
Update: as EdMorton commented, awk is the right tool for this kind of problem. The approach below is not a final implementation and it fails for some filenames (for example, filenames with spaces); awk is more performant and reliable (see the quoting note after the script).
If you want to use a for loop, you can do something like this:
#!/bin/bash
MAX="0"
MIN="INIT"
for F in home/data/*.txt
do
NBLINE=$(cat $F | wc -l)
if [[ "$NBLINE" -gt "$MAX" ]] ; then
MAX="$NBLINE"
BIG_FILE="$F"
fi
if [[ "$MIN" == "INIT" ]] ; then
MIN="$NBLINE"
SMA_FILE="$F"
fi
if [[ "$NBLINE" -lt "$MIN" ]] ; then
MIN="$NBLINE"
SMA_FILE="$F"
fi
done
echo "File = $BIG_FILE -- Lines = $MAX"
echo "File = $SMA_FILE -- Lines = $MIN"
exit
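Incidentally, most of the filenames-with-spaces fragility mentioned in the update comes from unquoted expansions. A minimal hardening of the count line inside the loop (a sketch):
# quote the expansion and read from stdin so no filename appears in wc's output
NBLINE=$(wc -l < "$F")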

setting variables to lines of a file in bash?

I'm trying to set the value of a variable to one line of a file, over and over.
for i in {1..5}
do
THIS = "grep -m $i'[a-z]' newdict2" | tail -1
echo $THIS
done
What's the trick to this black magic?
It's actually easier to do this with sed than with tail and grep's -m option:
for i in {1..5}
do
THIS=$(grep -e '[a-z]' newdict2 | sed -ne "${i}p")
echo "$THIS"
done
If you are iterating from 1 to x anyway, another way to solve it is by reading lines in a loop:
while IFS= read -r THIS; do
echo "$THIS"
done < <(grep -e '[a-z]' newdict2)
And through awk:
while IFS= read -r THIS; do
echo "$THIS"
done < <(awk '/[a-z]/ && ++i <= 5' newdict2)
Another awk version with different initial value:
while IFS= read -r THIS; do
echo "$THIS"
done < <(awk 'BEGIN { i = 2 } /[a-z]/ && i++ <= 5' newdict2)
It's better to find the occurrences once, and loop over them.
grep -m "$i" '[a-z]' newdict |
nl |
while read i THIS; do
echo "$THIS"
done
If you don't need $i for anything inside the loop, remove nl and just read THIS.
Note also the use of double quotes around variable interpolations.
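In other words, the nl-free variant would look roughly like this (a sketch, using the question's limit of 5 matches):
grep -m 5 '[a-z]' newdict2 |
while read -r THIS; do
    echo "$THIS"
done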

Transpose one line/lines from column to row using shell

I want to convert a column of data in a txt file to a row in a csv file using unix commands.
Example:
ApplChk1,
ApplChk2,
v_baseLoanAmountTI,
v_plannedClosingDateField,
downPaymentTI,
This is a column present in a txt file.
I want the output in a csv file as follows:
ApplChk1,ApplChk2,v_baseLoanAmountTI,v_plannedClosingDateField,downPaymentTI,
Please let me know how to do it.
Thanks in advance
If that's a single column which you want to convert to a row, there are many possibilities:
tr -d '\n' < filename ; echo # option 1 OR
xargs echo -n < filename ; echo # option 2 (This option however, will shrink spaces & eat quotes) OR
while read x; do echo -n "$x" ; done < filename; echo # option 3
Please let us know what the input would look like for the multi-line case.
A funny pure bash solution (bash ≥ 4.1):
mapfile -t < file.txt; printf '%s' "${MAPFILE[@]}" $'\n'
Done!
for i in `< file.txt` ; do echo -n $i; done; echo ""
gives the output
ApplChk1,ApplChk2,v_baseLoanAmountTI,v_plannedClosingDateField,downPaymentTI,
To send output to a file:
{ for i in `< file.txt` ; do echo -n $i ; done; echo; } > out.csv
When I run it, this is what happens:
[jenny@jennys:tmp]$ more file.txt
ApplChk1,
ApplChk2,
v_baseLoanAmountTI,
v_plannedClosingDateField,
downPaymentTI,
[jenny@jenny:tmp]$ { for i in `< file.txt` ; do echo -n $i ; done; echo; } > out.csv
[jenny@jenny:tmp]$ more out.csv
ApplChk1,ApplChk2,v_baseLoanAmountTI,v_plannedClosingDateField,downPaymentTI,
perl -pe 's/\n//g' your_file
The above will output to stdout.
If you want to do it in place:
perl -pi -e 's/\n//g' your_file
You could use the Linux sed command to replace the line breaks (\n) with commas , or spaces:
sed -z 's/\n/,/g' test.txt > test.csv
You could also add the -i option if you want to change the file in place:
sed -i -z 's/\n/,/g' test.txt

count words in a file without using wc

Working in a shell script here, trying to count the number of words/characters/lines in a file without using the wc command. I can get the file broken into lines and count those easily enough, but I'm struggling to get the words and the characters.
#define word_count function
count_stuff(){
c=0
w=0
l=0
local f="$1"
while read Line
do
l=`expr $line + 1`
# now that I have a line I want to break it into words and characters???
done < "$f"
echo "Number characters: $chars"
echo "Number words: $words"
echo "Number lines: $line"
}
As for characters, try this (adjust echo "test" to where you get your output from):
expr `echo "test" | sed "s/./ + 1/g;s/^/0/"`
As for lines, try this:
expr `echo -e "test\ntest\ntest" | sed "s/^.*$/./" | tr -d "\n" | sed "s/./ + 1/g;s/^/0/"`
===
As for your code, you want something like this to count words (if you want to go at it completely raw):
while read line ; do
set $line ;
while true ; do
[ -z $1 ] && break
l=`expr $l + 1`
shift ;
done ;
done
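Putting those pieces together, a minimal pure-bash version of the question's count_stuff function could look like this (a sketch; like any read loop it misses a final line that has no trailing newline):
count_stuff() {
    local f="$1"
    local chars=0 words=0 lines=0 line
    set -f                                  # disable globbing while we word-split lines
    while IFS= read -r line; do
        lines=$(( lines + 1 ))
        chars=$(( chars + ${#line} + 1 ))   # +1 for the newline that read stripped
        set -- $line                        # unquoted on purpose: let the shell split into words
        words=$(( words + $# ))
    done < "$f"
    set +f
    echo "Number characters: $chars"
    echo "Number words: $words"
    echo "Number lines: $lines"
}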
You can count the words with the following Bash shell script:
count=0
for var in `cat $1`
do
count=`echo $count+1 | bc`
done
echo $count
