Shell script copying lines from multiple files - bash

I have multiple files which have the same structure but not the same data. Say their names are values_#####.txt (values_00001.txt, values_00002.txt, etc.).
I want to extract a specific line from each file and copy it in another file. For example, I want to extract the 8th line from values_00001.txt, the 16th line from values_00002.txt, the 24th line from values_00003.txt and so on (increment = 8 each time), and copy them line by line in a new file (say values.dat).
I am new to shell scripting. I tried to use sed, but I couldn't figure out how to do it.
Thank you in advance for your answers!

I believe the ordering of the files is also important, to make sure you get the output in the desired sequence.
Consider this script:
n=8
while read -r f; do
    sed "${n}q;d" "$f" >> output.txt   # "Nq;d" prints only line N: d suppresses every line, q quits (and prints) at line N
    ((n+=8))
done < <(printf '%s\n' values_*.txt | sort -t_ -nk2,2)
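The whole job can also be done in a single awk pass. Here is a sketch on hypothetical data (three 30-line files in a scratch directory); it relies on the question's zero-padded names so that plain glob order already matches numeric order:

```shell
# Demo setup with made-up data: three files of 30 numbered lines each.
mkdir -p /tmp/valdemo && cd /tmp/valdemo
for i in 00001 00002 00003; do seq 30 > "values_$i.txt"; done

# FNR resets to 1 at the start of each input file, so fileno counts
# files; from file number fileno we keep only line number 8*fileno.
awk 'FNR == 1 { fileno++ } FNR == 8*fileno' values_*.txt > values.dat

cat values.dat
```

This avoids starting one sed process per file, and it needs no sort step when the names are zero-padded.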

This will do it:
for var in {1..NUMBER}   # replace NUMBER with the number of files (brace expansion needs a literal number)
do
    awk -v line="$var" 'NR==8*line' "values_${var}.txt" >> values.dat
done
Explanation
The for loop is basic.
-v line=$var "gives" the $var value to awk, so it can be used with the variable line.
'NR==8*line' selects the line whose number equals 8*line (awk prints a line when the condition is true).
values_${var}.txt expands to values_1.txt, values_2.txt, and so on (adjust the pattern if your files are zero-padded, e.g. values_00001.txt).
>> values.dat redirects to values.dat file.
Test
I created 3 equal files a1, a2, a3. Each contains 30 lines, and each line holds its own line number:
$ cat a1
1
2
3
4
...
Executing the one liner:
$ for var in {1..3}; do awk -v line=$var 'NR==8*line' a${var} >> values.dat; done
$ cat values.dat
8
16
24

Related

awk to write different columns from different lines into single line of output file?

I am using a while loop to read from a file that contains a list of hostnames, run a command against each host, and write specific data from the results into a second file. I need the output to be line 33 column 3 and line 224 column 7, written as a single line in the second file. I can do it for either one or the other, but I'm having trouble getting it to work for both. Example:
while read i; do
/usr/openv/netbackup/bin/admincmd/bpgetconfig -M $i |\
awk -v j=33 -v k=3 'FNR == j {print $k}' > /tmp/clientversion.txt
done < /tmp/clientlist.txt
Any hints or help is greatly appreciated!
You could use something like this:
awk 'NR==33{a=$3}NR==224{print a,$7}'
This saves the value in the third column of line 33 to the variable a, then prints it out along with the seventh column of line 224.
However, you're currently overwriting the file /tmp/clientversion.txt every iteration of the while loop. Assuming you want the file to contain all of the output once the loop has run, you should move the redirection outside the loop:
while read -r i; do
/usr/openv/netbackup/bin/admincmd/bpgetconfig -M $i |\
awk 'NR==33{a=$3}NR==224{print a,$7}'
done < /tmp/clientlist.txt > /tmp/clientversion.txt
As a bonus, I have added the -r switch to read (see related discussion here). Depending on the contents of your input file, you might also want to use double quotes around "$i" as well.
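Since bpgetconfig is not available everywhere, here is a sketch of the same awk program on generated stand-in data (the f<line>_<col> field values are made up purely to show which fields get picked):

```shell
# Generate 224 lines of the form "lineN fN_2 fN_3 ... fN_7", then apply
# the same awk: save column 3 of line 33, print it with column 7 of 224.
out=$(awk 'BEGIN {
    for (i = 1; i <= 224; i++) {
        printf "line%d", i
        for (j = 2; j <= 7; j++) printf " f%d_%d", i, j
        print ""
    }
}' </dev/null | awk 'NR==33{a=$3} NR==224{print a,$7}')
echo "$out"   # f33_3 f224_7
```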

How to display only lines 12-24 of an arbitrary text file?

I have a set of text files and I'd like to display lines 12-24 by running a bash script on each file.
For one of the files, this works:
tail -14 | head -11
But since other files have different lengths, I cannot run the same script on them.
What is the command I'm looking for to output lines 12-24 of the text file?
Use sed with -n argument
sed -n 12,24p <FILENAME>
For a funny pure Bash (≥4) possibility:
mapfile -t -s 11 -n 13 lines < file
printf '%s\n' "${lines[@]}"
This skips the first 11 lines (with -s 11), reads 13 lines (with -n 13), and stores each line in an element of the array lines.
Using awk:
awk '12<= NR && NR <= 24' file
In awk, NR is the line number. The above condition insists that NR be both greater than or equal to 12 and less than or equal to 24. If it is, then the line is printed. Otherwise, it isn't.
A more efficient solution
It would be more efficient to stop reading the file after the upper line limit has been reached. This solution does that:
awk 'NR>24 {exit;} NR>=12' file
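The same early exit works in sed too, if that is preferred; 24q quits right after line 24 has been printed (a sketch using seq as stand-in input):

```shell
# Print lines 12 through 24, then quit instead of reading the rest.
out=$(seq 1000 | sed -n '12,24p;24q')
echo "$out"
```

On a large file this saves reading everything past line 24, just like the awk version.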

Extract line from text file based on leading characters of each line

I have a very large data dump that I need to manipulate. Basically, I receive a text file that has data from multiple tables in it. The first two characters of each line tell me which table the line is from. I need to read each of these lines and extract them into text files, appending each line to the text file for its table. Each table should have its own text file.
For example, lets say the data file looks like this...
HDxxxxxxxxxxxxx
HDyyyyyyyyyyyyy
ENxxxxxxxxxxxxx
ENyyyyyyyyyyyyy
HSyyyyyyyyyyyyy
What I would need is the first two lines to be in a text file named HD_out.txt, the 3rd and 4th lines in one named EN_out.txt, and the last one in a file named HS_out.txt.
Does anyone know how this could be done with either a simple batch file or a UNIX shell script?
Use awk to split file based on first 2 characters:
gawk -v FIELDWIDTHS='2 99999' '{print $2 > $1"_out.txt"}' input.txt
Using bash:
while read -r line; do
echo "${line:2}" >> "${line:0:2}_out.txt"
done < inputFile
${var:startposition:length} is the bash substring expansion used to capture sub-strings. This splits your input file based on the first two characters of each line. If you want to include the table prefix in the output, just use echo "$line" >> "${line:0:2}_out.txt" instead of what is shown above.
Demo:
$ ls
file
$ cat file
HDxxxxxxxxxxxxx
HDyyyyyyyyyyyyy
ENxxxxxxxxxxxxx
ENyyyyyyyyyyyyy
HSyyyyyyyyyyyyy
$ while read -r line; do echo "${line:2}" >> "${line:0:2}_out.txt"; done < file
$ ls
EN_out.txt file HD_out.txt HS_out.txt
$ head *.txt
==> EN_out.txt <==
xxxxxxxxxxxxx
yyyyyyyyyyyyy
==> HD_out.txt <==
xxxxxxxxxxxxx
yyyyyyyyyyyyy
==> HS_out.txt <==
yyyyyyyyyyyyy
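FIELDWIDTHS is a GNU awk extension; the same split can be sketched with substr, which every awk has (demo data and scratch directory are made up for illustration):

```shell
# Demo setup: the sample data from the question, in a scratch directory.
mkdir -p /tmp/splitdemo && cd /tmp/splitdemo
printf '%s\n' HDxxx HDyyy ENxxx ENyyy HSyyy > input.txt

# substr($0,1,2) is the 2-char table prefix (used to build the file
# name); substr($0,3) is the rest of the line (the data written to it).
awk '{ print substr($0, 3) > (substr($0, 1, 2) "_out.txt") }' input.txt

head *_out.txt
```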

Cut and paste a line with an exact match using sed

I have a text file (~8 GB). Let's call this file A. File A has about 100,000 lines with 19 words and integers separated by a space. I need to cut several lines from file A and paste them into a new file (file B). The lines should be deleted from file A. The lines to be cut from file A should contain an exact matching string.
I then need to repeat this several times, removing lines from file A with a different matching string every time. Each time, file A is getting smaller.
I can do this using sed, but it takes two commands, like this:
# Find lines in file A with the matching string and copy them to file B.
sed -ne '/\<matchingString\>/p' fileA > fileB
# Find the lines in file A with the matching string again and delete them,
# writing a tmp file to hold the lines that were not deleted.
sed '/\<matchingString\>/d' fileA > tmp
# Replace file A with the tmp file.
mv tmp fileA
Here is an example of files A and B. I want to extract all lines containing hg15
File A:
ID pos frac xp mf ...
23 43210 0.1 2 hg15...
...
...
File B:
23 43210 0.1 2 hg15...
I'm fairly new to writing shell scripts and using all the Unix tools, but I feel I should be able to do this more elegantly and faster. Can anyone please guide me toward improving this script? I don't specifically need to use sed. I have been searching the web and Stack Overflow without finding a solution to this exact problem. I'm using Red Hat and bash.
Thanks.
This might work for you (GNU sed):
sed 's|.*|/\\<&\\>/{w fileB\nd}|' matchingString_file | sed -i.bak -f - fileA
This makes a sed script from the matching strings that writes the matching lines to fileB and deletes them from fileA.
N.B. a backup of fileA is made too.
To make a different file for each exact word match use:
sed 's|.*|/\\<&\\>/{w "&.txt"\nd}|' matchingString_file | sed -i.bak -f - fileA
I'd use grep for this, but apart from that small improvement, this is probably already the fastest way to do it, even though it means applying the regexp to each line twice:
grep '\<matchingString\>' A > B
grep -v '\<matchingString\>' A > tmp
mv tmp A
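For comparison, a single-pass variant is possible: awk can route each line to the right file as it reads, so the 8 GB file is only scanned once per matching string. A sketch using the hg15 string and sample rows from the question:

```shell
# Demo setup: a small made-up stand-in for file A.
mkdir -p /tmp/cutdemo && cd /tmp/cutdemo
printf '%s\n' 'ID pos frac xp mf' '23 43210 0.1 2 hg15' '24 43211 0.2 3 hg16' > A

# Matching lines go to B, everything else to tmp; tmp then replaces A.
awk '/hg15/ { print > "B"; next } { print > "tmp" }' A && mv tmp A

cat B   # the extracted line(s)
cat A   # file A with those lines removed
```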
The next approach would be to read the file line by line, check each line, and write it to either B or tmp depending on the check. (And mv tmp A again at the end.) But there is no standard Unix tool which does this in one step (AFAIK), and doing it in shell will probably reduce performance massively:
while IFS='' read -r line
do
    if expr "$line" : '.*<matchingString>' >/dev/null   # expr anchors at the start, hence the leading .*
    then
        echo "$line"        # matching lines go to B
    else
        echo "$line" 1>&3   # everything else goes to tmp
    fi
done < A > B 3> tmp
You could try to do this using Python (or similar scripting languages):
import os
import re

with open('B', 'w') as b, open('tmp', 'w') as tmp, open('A') as a:
    for line in a:
        if re.search(r'<matchingString>', line):
            b.write(line)
        else:
            tmp.write(line)
os.rename('tmp', 'A')
(Note re.search rather than re.match: like the sed version, the string may match anywhere in the line. os must be imported for the final rename.)
But this is a little out of scope here (not shell anymore).
Hope this will help you...
Note that a read loop is unnecessary here: sed already processes the whole file in one pass, so running it once per input line would append the same matches over and over. The two commands only need to run once per matching string:
# Find lines in file A with the matching string and append them to file B.
sed -ne '/\<matchingString\>/p' fileA >> fileB
# Find the lines in file A with the matching string again and delete them,
# writing a tmp file to hold the lines that were not deleted.
sed '/\<matchingString\>/d' fileA > tmp
# Once you are done, replace file A with the tmp file.
mv tmp fileA
PS: I'm appending to file B since the whole process is repeated for each match pattern.

Using bash and sed

Okay, so I'm not too great at this, but I have a bash script to pick a random number, then use sed to read lines off of files.
It's not working and I must have done something wrong. Could anyone correct my code?
I want the code to pull the line (random number) from each of those files, then output it as a single string (with spaces between).
NUMBER=$[ ( $RANDOM % 100 ) + 1 ]
sed -n NUMBER'p' /Users/user/Desktop/Street.txt
sed -n NUMBER'p' /Users/user/Desktop/City.txt
sed -n NUMBER'p' /Users/user/Desktop/State.txt
sed -n NUMBER'p' /Users/user/Desktop/Zip.txt
You probably need to use $NUMBER in your sed commands, rather than just NUMBER (or ${NUMBER} if other text is directly next to it). Example:
sed -n "${NUMBER}p" /Users/user/Desktop/Street.txt
The following script will use the same randomly chosen number to grab that line from each of the 4 input files you specified and concatenate those lines into a single variable called $outstring.
#!/bin/bash
NUMBER=$(((RANDOM % 100)+1))
for file in Street City State Zip; do
outstring+="$(sed -n "${NUMBER}p" "./${file}.txt") "
done
echo "$outstring"
Note: If you want (potentially) different line numbers from each of the 4 input files, then simply put the NUMBER= statement inside the for-loop.
This has the advantage of choosing from the whole of each file rather than only the first 100 lines. It will choose a different line from each file.
for f in Street City State Zip
do
printf '%s ' "$(shuf -n 1 "/Users/user/Desktop/$f.txt")"
done
printf '\n'
