I am trying to split huge files into chunks of around 30k lines each.
I found this can be done with sed -n 'from_line,to_line p', and it works fine when I hard-code the line numbers, but in my case I am using two variables and I get an error.
Here is the script I am using:
k=1
for i in `ls final*`
do
    count=`wc -l $i | awk '{print $1}'`
    marker1=1
    marker2=30000
    no_of_files=$(( count/30000 ))
    #echo $no_of_files
    no_of_files=$(( no_of_files+1 ))
    while [[ no_of_files -ne 0 ]]; do
        if [[ $marker2 -gt $count ]]; then
            sed -n '$marker1,$count p' $i > purge$k.txt
        else
            sed -n '$marker1,$marker2 p' $i > purge$k.txt
            marker1=$(( marker2+1 ))
            marker2=$(( marker2+30000 ))
        fi
        no_of_files=$(( no_of_files-1 ))
        k=$(( k+1 ))
    done
done
I am getting the error below while running the script:
sed: $marker1,$marker2 p is not a recognized function.
sed: $marker1,$marker2 p is not a recognized function.
sed: $marker1,$marker2 p is not a recognized function.
sed: $marker1,$marker2 p is not a recognized function.
sed: $marker1,$marker2 p is not a recognized function.
sed: $marker1,$marker2 p is not a recognized function.
sed: $marker1,$count p is not a recognized function.
It doesn't work because you use the variables inside single quotes, where the shell does not expand them.
Change the sed commands to use double quotes:
sed -n "$marker1,$count p"
(Avoid the form sed -n '/'$marker1'/,/'$count'/p' — the slashes make sed treat the values as regex patterns to search for, not as line numbers, which is not what you want here.)
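A quick self-contained demo of the working double-quoted form (demo.txt is a made-up sample file):

```shell
printf 'a\nb\nc\nd\ne\n' > demo.txt   # 5-line sample file
marker1=2; count=4
sed -n "$marker1,$count p" demo.txt   # double quotes expand the variables: prints lines 2-4 (b, c, d)
```

With single quotes, sed would receive the literal text $marker1,$count p and fail, which is exactly the error above.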
Some small changes: use double quotes in sed, replace the old backticks with $( ), and change k=$(( k+1 )) to (( k++ )).
k=1
for i in $(ls final*)
do
    count=$(wc -l < "$i")
    marker1=1
    marker2=30000
    no_of_files=$(( count/30000 ))
    #echo $no_of_files
    (( no_of_files++ ))
    while [[ $no_of_files -ne 0 ]]; do
        if [[ $marker2 -gt $count ]]; then
            sed -n "$marker1,$count p" "$i" > "purge$k.txt"
        else
            sed -n "$marker1,$marker2 p" "$i" > "purge$k.txt"
            marker1=$(( marker2+1 ))
            marker2=$(( marker2+30000 ))
        fi
        (( no_of_files-- ))
        (( k++ ))
    done
done
The wc -l $i | awk '{print $1}' could be written as:
awk 'END {print NR}' $i
or
wc -l < $i
As others have noted, your shell variables are inside single quotes, so they are not being expanded. But you are also using the wrong tool: your approach makes N passes over the input to create N files. split -l 30000 "$i" will split the file in one pass into 30,000-line pieces called xaa, xab, ... You can tell split what to name the pieces, too.
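As a sketch (the sample file is made up; -d asks GNU split for numeric suffixes, and the final argument is the output prefix):

```shell
seq 70000 > final_sample.txt               # made-up sample input: 70,000 lines
split -l 30000 -d final_sample.txt purge   # GNU split: -d gives purge00, purge01, ...
wc -l purge*                               # purge00, purge01: 30000 lines each; purge02: 10000
```

Without -d you get the classic purgeaa, purgeab, ... names instead.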
Related
This is a script that searches for each line of a file ($1) in another file ($2):
val=$(wc -l < $1)
for ((i = 1; i <= val; i++))
do
line=$(sed '$i!d' $1)
if grep -q "$(echo $line)" $2
then
echo found
fi
done
But it gets stuck at the if grep — it behaves as if it's not receiving $2.
a script that searches for each line of a file ($1) in another file ($2)
No need to write your own script for that. Use grep's -f option:
if grep -qf "$1" "$2"; then
echo found
else
echo not found
fi
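One caveat worth knowing: -f treats each line of the pattern file as a regular expression. If the lines should match literally and as whole lines, add -F and -x. A small sketch with made-up file names:

```shell
printf 'foo\nb.r\n' > patterns.txt
printf 'bar\nbaz\n' > data.txt
grep -qf  patterns.txt data.txt && echo "regex match"       # 'b.r' matches 'bar' as a regex
grep -qFf patterns.txt data.txt || echo "no literal match"  # -F takes patterns as fixed strings
```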
Solved — the problem was how I was passing the line number to sed (single quotes kept $i from expanding):
#!/bin/bash
val=$(wc -l < "$1")
for ((i = 1; i <= val; i++))
do
    line=$(sed "$i!d" "$1")
    if ! grep -q "$line" "$2"
    then
        echo "$line"
    fi
done
This works fine, if you do:
./script file1 file2
It gives you the lines of the first file that are missing in the second.
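For what it's worth, the whole loop can be replaced by a single grep call: -v inverts the match, -x matches whole lines, -F takes the patterns literally, and -f reads them from the second file (file names below are made up):

```shell
printf 'a\nb\nc\n' > file1
printf 'b\n'       > file2
grep -vxFf file2 file1    # lines of file1 missing from file2: a, c
```

This avoids one sed and one grep invocation per line, which matters on large files.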
I want to count the number of lines in many text files, store the counts, and then find the lowest one. I am trying to do this in a for loop, but it only keeps the result from the last file in the loop.
for txt in home/data/*.txt
do
count_txt=$(cat $txt | wc -l) | bc
done
Thanks
give this one-liner a try:
wc -l /path/*.txt|awk 'NR==1{m=$1}{m=($1*1)<m?($1*1):m}END{print m}'
shopt -s nullglob
FILES=(home/data/*.txt) LOWEST_COUNT='N/A' FILE=''
[[ ${#FILES[@]} -gt 0 ]] && read -r LOWEST_COUNT FILE < <(exec wc -l "${FILES[@]}" | sort -n)
echo "$LOWEST_COUNT | $FILE"
You just need something like this (using GNU awk for ENDFILE; min is seeded explicitly so the first comparison starts from a real value):
awk 'ENDFILE{min = (min == "" || FNR < min ? FNR : min)} END{print min}' home/data/*.txt
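A portable sketch of the same idea that works without gawk's ENDFILE, by remembering each file's last FNR (the sample data below is made up):

```shell
mkdir -p data_demo
seq 5 > data_demo/a.txt
seq 3 > data_demo/b.txt
# the last FNR recorded per file is its line count; pick the minimum at the end
awk '{ n[FILENAME] = FNR }
     END { for (f in n) if (min == "" || n[f] < min) min = n[f]; print min }' data_demo/*.txt
```

Note this variant skips completely empty files, since they contribute no records.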
Update: as EdMorton commented, awk is the right tool for this kind of problem. The approach below is not a final implementation and fails for some filenames (e.g. filenames with spaces); in short, awk is more performant and reliable.
If you want to use a for loop, you can do something like this :
#!/bin/bash
MAX="0"
MIN="INIT"
for F in home/data/*.txt
do
    NBLINE=$(wc -l < "$F")
    if [[ "$NBLINE" -gt "$MAX" ]]; then
        MAX="$NBLINE"
        BIG_FILE="$F"
    fi
    if [[ "$MIN" == "INIT" ]]; then
        MIN="$NBLINE"
        SMA_FILE="$F"
    fi
    if [[ "$NBLINE" -lt "$MIN" ]]; then
        MIN="$NBLINE"
        SMA_FILE="$F"
    fi
done
echo "File = $BIG_FILE -- Lines = $MAX"
echo "File = $SMA_FILE -- Lines = $MIN"
exit
I'm trying to set the value of a variable to one line of a file, over and over.
for i in {1..5}
do
THIS = "grep -m $i'[a-z]' newdict2" | tail -1
echo $THIS
done
What's the trick to this black magic?
It's actually easier to do it with sed than with tail and grep's -m option:
for i in {1..5}
do
THIS=$(grep -e '[a-z]' newdict2 | sed -ne "${i}p")
echo "$THIS"
done
If you're iterating from 1 up to some limit, another way to solve it is by reading lines in a loop:
while IFS= read -r THIS; do
echo "$THIS"
done < <(grep -e '[a-z]' newdict2)
And through awk:
while IFS= read -r THIS; do
echo "$THIS"
done < <(awk '/[a-z]/ && ++i <= 5' newdict2)
Another awk version with different initial value:
while IFS= read -r THIS; do
echo "$THIS"
done < <(awk 'BEGIN { i = 2 } /[a-z]/ && i++ <= 5' newdict2)
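To illustrate what the first awk filter feeds into the loop (newdict2_demo is a made-up sample; ++i only increments on lines that match the regex):

```shell
printf 'abc\n123\ndef\n456\nghi\njkl\nmno\n' > newdict2_demo
awk '/[a-z]/ && ++i <= 5' newdict2_demo   # first five lines containing a-z letters
```

This prints abc, def, ghi, jkl, mno and skips the purely numeric lines.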
It's better to find the occurrences once, and loop over them.
grep -m 5 '[a-z]' newdict2 |
nl |
while read i THIS; do
echo "$THIS"
done
If you don't need $i for anything inside the loop, remove nl and just read THIS.
Note also the use of double quotes around variable interpolations.
I want to convert a column of data in a txt file into a row of a csv file using unix commands.
example:
ApplChk1,
ApplChk2,
v_baseLoanAmountTI,
v_plannedClosingDateField,
downPaymentTI,
This is a column present in a txt file.
I want the output in a csv file, as follows:
ApplChk1,ApplChk2,v_baseLoanAmountTI,v_plannedClosingDateField,downPaymentTI,
Please let me know how to do it.
Thanks in advance
If that's a single column which you want to convert to a row, then there are many possibilities:
tr -d '\n' < filename ; echo # option 1 OR
xargs echo -n < filename ; echo # option 2 (This option however, will shrink spaces & eat quotes) OR
while read x; do echo -n "$x" ; done < filename; echo # option 3
Please let us know how the input would look for the multi-line case.
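For a shortened version of the sample column above, option 1 behaves like this:

```shell
printf 'ApplChk1,\nApplChk2,\ndownPaymentTI,\n' > column.txt   # made-up sample file
tr -d '\n' < column.txt ; echo   # → ApplChk1,ApplChk2,downPaymentTI,
```

The trailing echo just restores the final newline that tr -d '\n' removed.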
A funny pure bash solution (bash ≥ 4.1):
mapfile -t < file.txt; printf '%s' "${MAPFILE[@]}" $'\n'
Done!
for i in `< file.txt` ; do echo -n $i; done; echo ""
gives the output
ApplChk1,ApplChk2,v_baseLoanAmountTI,v_plannedClosingDateField,downPaymentTI,
To send output to a file:
{ for i in `< file.txt` ; do echo -n $i ; done; echo; } > out.csv
When I run it, this is what happens:
[jenny@jennys:tmp]$ more file.txt
ApplChk1,
ApplChk2,
v_baseLoanAmountTI,
v_plannedClosingDateField,
downPaymentTI,
[jenny@jenny:tmp]$ { for i in `< file.txt` ; do echo -n $i ; done; echo; } > out.csv
[jenny@jenny:tmp]$ more out.csv
ApplChk1,ApplChk2,v_baseLoanAmountTI,v_plannedClosingDateField,downPaymentTI,
perl -pe 's/\n//g' your_file
the above will output to stdout.
if you want to do it in place:
perl -pi -e 's/\n//g' your_file
You could use the sed command to replace the line breaks (\n) with commas (,) or spaces:
sed -z 's/\n/,/g' test.txt > test.csv
You could also add the -i option if you want to change file in-place :
sed -i -z 's/\n/,/g' test.txt
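Note that -z (NUL-separated input records) is a GNU sed extension, and the file's final newline gets replaced too, so the output ends with a comma rather than a newline:

```shell
printf 'a\nb\nc\n' > test.txt       # made-up sample file
sed -z 's/\n/,/g' test.txt ; echo   # → a,b,c,
```

If you don't want the trailing comma, strip it afterwards or use one of the tr/paste-style approaches above.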
Working in a shell script here, trying to count the number of words, characters, and lines in a file without using the wc command. I can break the file into lines and count those easily enough, but I'm struggling to get the words and the characters.
#define word_count function
count_stuff(){
c=0
w=0
l=0
local f="$1"
while read Line
do
l=`expr $line + 1`
# now that I have a line I want to break it into words and characters???
done < "$f"
echo "Number characters: $chars"
echo "Number words: $words"
echo "Number lines: $line"
}
As for characters, try this (adjust echo "test" to where you get your output from):
expr `echo "test" | sed "s/./ + 1/g;s/^/0/"`
As for lines, try this:
expr `echo -e "test\ntest\ntest" | sed "s/^.*$/./" | tr -d "\n" | sed "s/./ + 1/g;s/^/0/"`
===
As for your code, you want something like this to count words (if you want to go at it completely raw):
l=0
while read line ; do
    set -- $line
    while true ; do
        [ -z "$1" ] && break
        l=`expr $l + 1`
        shift
    done
done
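A shorter take on the same idea: after set -- $line, $# already holds the number of words on the line, so the inner shift loop is unnecessary. A sketch (the count_words name and sample file are made up):

```shell
count_words() {
  local w=0 line
  while read -r line; do
    set -- $line          # word-split the line on whitespace (default IFS)
    w=$(( w + $# ))       # $# = number of positional parameters = words on this line
  done < "$1"
  echo "$w"
}
printf 'one two three\nfour\n' > words_demo.txt
count_words words_demo.txt    # → 4
```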
You can do this with the following Bash shell script:
count=0
for var in `cat $1`
do
count=`echo $count+1 | bc`
done
echo $count
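Characters can be counted the same way in pure bash with ${#line}, adding one for each newline that read strips (the sample file is made up):

```shell
printf 'ab\ncde\n' > chars_demo.txt
chars=0
while IFS= read -r line; do
  chars=$(( chars + ${#line} + 1 ))   # +1 for the newline read strips off
done < chars_demo.txt
echo "$chars"                         # → 7, same as wc -c would report
```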