Unix: Head, Tail, Middle of all files recursively - shell

Requirement:
There might be multiple files within a folder. For each file, I want to retrieve the top 10, bottom 10 and middle 10 lines and dump them into one file.
Example:
Input files: APPLE.TXT, ORANGE.TXT, BANANA.TXT
Output file: Final.TXT, which will contain the top 10, bottom 10 and middle 10 lines of each file above.
Final.TXT will have:
APPLE.TXT
ABC
CDE
EFG
ORANGE.TXT
DEF
GEH
IJK
etc.
Thanks for your help.

Here are a few pointers to get you started:
Use head to get the first ten lines:
head -10 file
To append the output of a command to a file, use >>, e.g. head -10 file >> output
Use tail to get the last ten lines:
tail -10 file
Use sed to get the middle ten lines. You need to work out the line numbers first as shown below:
total=$(wc -l < file)
middle=$((total/2))
start=$((middle-4))
end=$((middle+5))
sed -n ${start},${end}p file
Of course, you should first check that your file has at least ten lines.
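Putting those pieces together, a minimal sketch that walks a folder recursively and appends the three slices of every file to Final.TXT might look like this (the *.TXT pattern, the Final.TXT name and the assumption that every file has at least 30 lines are taken from the example, not from a tested script):
#!/bin/bash
# Sketch: append the top 10, middle 10 and bottom 10 lines of every *.TXT
# file found under the current directory to Final.TXT.
out=Final.TXT
: > "$out"                                 # start with an empty output file
find . -type f -name '*.TXT' ! -name "$out" -print0 |
while IFS= read -r -d '' file; do
    total=$(wc -l < "$file")
    middle=$((total / 2))
    start=$((middle - 4))
    end=$((middle + 5))
    {
        echo "$file"                       # file name header, as in the example
        head -10 "$file"                   # top 10
        sed -n "${start},${end}p" "$file"  # middle 10
        tail -10 "$file"                   # bottom 10
    } >> "$out"
done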

Related

Read data from a text file and convert the images it refers to into PDFs

I have a text file with this content:
123412-01
123413-01
123411-01
123414-01
123415-01
I would like to write a script (Bash/shell/command line or PHP) that will select the first 6 digits of each line and copy the files in another directory that contains images named like this:
123412-123.jpg
123412-124.jpg
123412-123.jpg
123413-123.jpg
123414-123.jpg
123415-123.jpg
123416-123.jpg
I don't know if I'm clear in my question.
Read the first line and select the first 3 digits (keep them in memory)
Read the first line and now select the first 6 digits (keep them in memory)
Convert the images (in a second directory) that contain the first 3 digits and the first 6 digits in folder and subfolder. For example, using the command:
convert /blabla/Images/H(3 first digits)/(6digits)-*.jpg /test/PDF/(6digits)-01.pdf
Read the second line...
I have written this code to try it, but it's not working.
cat id.txt
sF= cut -b 1-3 id.txt
F=cut cut -b 1-6 id.txt
while read -r line ;
do
convert /blabla/Images/H$sF/$F.jpg /test/PDF/'$F'-01.pdf
done
I think I'm doing something wrong with the variables and the path, but I can't find a solution.
I think you mean this:
#!/bin/bash
while read line; do
    first3=${line:0:3}
    first6=${line:0:6}
    echo convert "/blah/blah/images/H${first3}/${first6}-*.jpg" "/test/PDF/${first6}-01.pdf"
done < file.txt
which gives this:
Sample Output
convert /blah/blah/images/H123/123412-*.jpg /test/PDF/123412-01.pdf
convert /blah/blah/images/H123/123413-*.jpg /test/PDF/123413-01.pdf
convert /blah/blah/images/H123/123411-*.jpg /test/PDF/123411-01.pdf
convert /blah/blah/images/H123/123414-*.jpg /test/PDF/123414-01.pdf
convert /blah/blah/images/H123/123415-*.jpg /test/PDF/123415-01.pdf
In bash:
cat file.txt|cut -b 1-6|while read N; do cp "FIRSTDIR/image_$N.jpg" SECONDDIR; done

How to sum up numbers in my file?

I have a folder, my_folder, which contains over 800 files, myfile_* where * is the unique ID for each file. In each file I basically have a variety of repeated fields, but the one I am interested in is the <rating> field. Lines of this field look like the following: <rating>n where n is the rating score. These lines occur every 14th line, starting at line 10 (10 + 14i) and ending when the file ends. It is my job to write a script, myscript.sh, to sum up all values of n per file in my folder and then sort from highest to smallest. The output would look as follows:
myfile_1234 5112
myfile_5214 2134
myfile_6124 1233
...
where the number suffixes are the sum of n per file. My files vary in length dramatically, from as little as 20 fields to as many as 2500. How would I go about doing this? I figure that I will use some form of grep command to find occurrences of <rating> and then sum up the numbers following the occurrences, or maybe I could use the fact that the lines occur every 10 + 14i lines, starting at 10. Thanks for your time; any suggestions are much appreciated.
Input File:
<Overall Rating>2.5
<Avg. Price>$155
<URL>
<Author>Jeter5
<Content>I hope we're not disappointed! We enjoyed New Orleans...
<Date>Dec 19, 2008
<No. Reader>-1
<No. Helpful>-1
<rating>4
<Value>-1
<Rooms>3
<Location>5
<Cleanliness>3
<Check in / front desk>5
<Service>5
<Business service>5
<Author>...
repeat fields again...
The script must take the folder name as an argument in the command line, such as ./myscript.sh my_folder
Here's my solution:
#!/bin/bash
dir=$1
grep -P -o '(?<=<rating>).*' "$dir"/* | awk -F: '{A[$1]+=$2;next}END{for(i in A){print i,A[i]}}' | sort -rn -k2
Looks like the sort at the end wasn't needed, so you could remove that.
You could use awk and not worry about the starting line.
If I understood well, if you type the following command:
grep rating fileName.txt
you'll have something like (I've created a sample input file):
grep "<rating>" myfile_12345
<rating>7
<rating>1
<rating>2
you can use this awk command:
awk -F"<rating>" 'BEGIN{sum=0}{sum+=$2}END{print sum}' myfile_12345
output:
10
then you can use it in a for loop
for file in $(find . -name "myfile_*")
do
    printf "%s" "$file"
    awk -F"<rating>" 'BEGIN{sum=0}{sum+=$2}END{printf " %s\t\n", sum}' "$file"
done
output:
./myfile_12345 10
./myfile_17676 19
./myfile_9898 24
Best Regards
Claudio
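Putting the two ideas above together, here is a minimal sketch of myscript.sh that takes the folder name as its first argument and sorts from highest to smallest (the myfile_* pattern is taken from the question; treat this as a sketch rather than a tested script):
#!/bin/bash
# Usage: ./myscript.sh my_folder
dir=$1
for file in "$dir"/myfile_*; do
    # Sum the number that follows <rating> on every line containing the tag
    sum=$(awk -F'<rating>' 'NF > 1 { total += $2 } END { print total + 0 }' "$file")
    printf '%s %s\n' "$(basename "$file")" "$sum"
done | sort -k2,2 -rn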

Extract line from a file in shell script

I have a text file of 5,000,000 lines and I want to extract one line from every 1000 and write them into a new text file. The new text file should be 5000 lines long.
Can you help me?
I would use a Python script to do so. However, the same logic can be used with your shell as well. Here is the Python code.
input_file = 'path/file.txt'
output_file = 'path/output.txt'
n = 0
with open(input_file, 'r') as f:
    with open(output_file, 'w') as o:
        for line in f:
            n += 1
            if n == 1000:
                o.write(line)
                n = 0
Basically, you initialise a counter, then you iterate over the file line by line, incrementing the counter for each line; when the counter hits 1000, you write the line to the new file and reset the counter.
The same counter logic can be applied while iterating over the lines of the file in the Bash shell.
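For instance, a rough sketch in bash (the file paths are the same placeholders as above; for a 5,000,000-line file the awk approach below will be much faster):
#!/bin/bash
# Sketch: write every 1000th line of path/file.txt to path/output.txt
n=0
while IFS= read -r line; do
    ((n++))
    if ((n == 1000)); then
        printf '%s\n' "$line"
        n=0
    fi
done < path/file.txt > path/output.txt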
Try:
awk 'NR%1000==1' infile > outfile
see this link for more options: remove odd or even lines from text file in terminal in linux
You can use either head or tail, depending on which line you'd like to extract.
To extract first line from each file (for instance *.txt files):
head -n1 *.txt | grep -ve ^= -e ^$ > first.txt
To extract the last line from each file, simply use tail instead of head.
For extracting a specific line, see: How do I use Head and Tail to print specific lines of a file.

Shell script copying lines from multiple files

I have multiple files which have the same structure but not the same data. Say their names are values_#####.txt (values_00001.txt, values_00002.txt, etc.).
I want to extract a specific line from each file and copy it in another file. For example, I want to extract the 8th line from values_00001.txt, the 16th line from values_00002.txt, the 24th line from values_00003.txt and so on (increment = 8 each time), and copy them line by line in a new file (say values.dat).
I am new to shell scripting; I tried to use sed, but I couldn't figure out how to do that.
Thank you in advance for your answers!
I believe the ordering of the files is also important to make sure you get the output in the desired sequence.
Consider this script:
n=8
while read f; do
    sed "${n}q;d" "$f" >> output.txt
    ((n+=8))
done < <(printf "%s\n" values_*.txt | sort -t_ -nk2,2)
This can make it:
for var in {1..NUMBER}
do
    awk -v line=$var 'NR==8*line' values_${var}.txt >> values.dat
done
Explanation
The for loop is basic.
-v line=$var "gives" the $var value to awk, so it can be used with the variable line.
'NR==8*line' prints the line number 8*{value we are checking}.
values_${var}.txt gets the file values_1.txt, values_2.txt, and so on.
>> values.dat redirects to values.dat file.
Test
I created 3 identical files a1, a2, a3. They contain 30 lines, each line being its line number:
$ cat a1
1
2
3
4
...
Executing the one liner:
$ for var in {1..3}; do awk -v line=$var 'NR==8*line' a${var} >> values.dat; done
$ cat values.dat
8
16
24
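Note that the file names in the question are zero-padded (values_00001.txt, values_00002.txt, ...), so a variant of the same loop that builds those names could be used (a sketch, assuming five-digit padding; replace NUMBER with the number of files):
for var in $(seq 1 NUMBER)
do
    file=$(printf 'values_%05d.txt' "$var")
    awk -v line="$var" 'NR == 8 * line' "$file" >> values.dat
done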

Copying part of a large file using command line

I've a text file with 2 million lines. Each line has some transaction information.
e.g.
23848923748, sample text, feild2 , 12/12/2008
etc
What I want to do is create a new file from a certain unique transaction number onwards. So I want to split the file at the line where this number exists.
How can I do this from the command line?
I can find the line by doing this:
cat myfile.txt | grep 23423423423
use sed like this
sed '/23423423423/,$!d' myfile.txt
Just confirm that the unique transaction number cannot appear as a pattern in some other line of your file (especially before the correct matching line).
There is already a Perl answer here, so I'll give one more awk way :-)
awk 'BEGIN{skip=1} /number/{skip=0} {if (skip!=1) print $0}' myfile.txt
(replace number with the transaction number you are looking for)
On a random file in my tmp directory, this is how I output everything from the line matching popd onwards in a file named tmp.sh:
tail -n+`grep -n popd tmp.sh | cut -f 1 -d:` tmp.sh
tail -n+X prints from line number X onwards; grep -n prefixes each matching line with its line number and a colon, and cut extracts just the line number from grep's output.
So for your case it would be:
tail -n+`grep -n 23423423423 myfile.txt | cut -f 1 -d:` myfile.txt
And it should indeed match from the first occurrence onwards.
It's not a pretty solution, but how about using the -A parameter of grep?
Like this:
mc@zolty:/tmp$ cat a
1
2
3
4
5
6
7
mc@zolty:/tmp$ cat a | grep 3 -A1000000
3
4
5
6
7
The only problem I see in this solution is the 1000000 magic number. Probably someone will know the answer without using such a trick.
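One way to avoid the magic number (a sketch only) is to use the file's own line count as the context size, since -A just has to be at least as large as the number of remaining lines:
# Use the line count of the file itself instead of a magic number
grep -A $(wc -l < myfile.txt) 23423423423 myfile.txt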
You can probably get the line number using grep and then use tail to print the file from that point into your output file.
Sorry I don't have actual code to show, but hopefully the idea is clear.
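For what it's worth, a sketch of that idea (much the same as the tail/grep combination shown above; -m1 is GNU grep's option to stop after the first match, so only the first occurrence sets the starting line):
line=$(grep -n -m1 23423423423 myfile.txt | cut -d: -f1)
tail -n +"$line" myfile.txt > output.txt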
I would write a quick Perl script, frankly. It's invaluable for anything like this (relatively simple issues) and as soon as something more complex rears its head (as it will do!) then you'll need the extra power.
Something like:
#!/usr/bin/perl
my $out = 0;
while (<STDIN>) {
    $out = 1 if /23423423423/;
    print $_ if $out;
}
and run it using:
$ perl mysplit.pl < input > output
Not tested, I'm afraid.
