I am trying to grep for a series of patterns, listed one per line in a text file (list.txt), to find how many matches there are for each pattern in a target file (file1). The target file looks like this:
$ cat file1
2346 TGCA
2346 TGCA
7721 GTAC
7721 GTAC
7721 CTAC
And I need counts of each numerical pattern (2346 and 7721).
I have this script that works if you provide a list of the patterns in quotes:
$ for p in '7721' '2346'; do printf '%s = ' "$p"; grep -c "$p" file1; done
7721 = 3
2346 = 2
What I would like to do is search for all patterns in list.txt:
$ cat list.txt
7721
2346
6555
25425
22
125
....
19222
How can I convert my script above to read list.txt, search for each pattern, and return the same output as above (pattern = count), e.g.:
2346 = 2
7721 = 3
....
19222 = 6
Try this awk one-liner:
awk 'NR==FNR{p[$0]=0; next} $1 in p{p[$1]++} END{for (x in p) print x, "=", p[x]}' list.txt file1
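For reference, the OP's original loop can also be adapted directly by reading the patterns from list.txt (a minimal sketch; -w is added so that a short pattern such as 22 does not also count lines containing 2271, drop it to keep pure substring matching):
while read -r p; do
    printf '%s = ' "$p"
    grep -c -w "$p" file1
done < list.txt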
I don't have much experience using UNIX and I wonder how to do this:
I have a bash variable with this content:
82 195 9 53
Current file looks like:
A
B
C
D
I want to add a new column to the file with those numbers, like this:
A 82
B 195
C 9
D 53
Hope you can help me. Thanks in advance.
Or simply use paste with a space as the delimiter, e.g. with your example file content in file:
var="82 195 9 53"
paste -d ' ' file <(printf "%s\n" $var)
(note: $var is used unquoted in the process substitution with printf, so it word-splits into one number per line)
Result
A 82
B 195
C 9
D 53
Note for a general POSIX shell solution, you would simply pipe the output of printf to paste instead of using the bash-only process substitution, e.g.
printf "%s\n" $var | paste -d ' ' file -
With bash and an array:
numbers="82 195 9 53"
array=($numbers)
declare -i c=0 # declare with integer flag
while read -r line; do
    echo "$line ${array[$c]}"
    c=c+1
done < file
Output:
A 82
B 195
C 9
D 53
One idea using awk:
x='82 195 9 53'
awk -v x="${x}" 'BEGIN { split(x,arr) } { print $0,arr[FNR] }' file.txt
Where:
-v x="${x}" - pass the shell variable "${x}" in as the awk variable x
split(x,arr) - split awk variable x into array arr[] (default delimiter is space); this will give us arr[1]=82, arr[2]=195, arr[3]=9 and arr[4]=53
This generates:
A 82
B 195
C 9
D 53
The question has been tagged with windows-subsystem-for-linux.
If the input file has Windows/DOS line endings (\r\n), the proposed awk solution may generate incorrect results, e.g.:
82 # each line
195 # appears
9 # to begin
53 # with a space
In this scenario the OP has a couple of options:
before calling awk, run dos2unix file.txt to convert to unix line endings (\n), or ...
change the awk/BEGIN block to BEGIN { RS="\r\n"; split(x,arr) } (assembled below)
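Putting the second option together (a sketch; note that a multi-character RS is a gawk extension, so a strictly POSIX awk would need the dos2unix route instead):
x='82 195 9 53'
awk -v x="${x}" 'BEGIN { RS="\r\n"; split(x,arr) } { print $0, arr[FNR] }' file.txt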
With mapfile (aka readarray), which is a bash 4+ feature:
#!/usr/bin/env bash
##: The variable with numbers
variable='82 195 9 53'
##: Save the variable into an array named numbers
mapfile -t numbers <<< "${variable// /$'\n'}"
##: Save the content of the file into an array named file_content
mapfile -t file_content < file.txt
##: Loop through the indices of both the arrays and print them side-by-side
for i in "${!file_content[@]}"; do
    printf '%s %d\n' "${file_content[i]}" "${numbers[i]}"
done
I have a file consisting of 4000 rows. I need to iterate over the records of that file in a shell script, extracting the first 10 rows and sending them to the Java code I have already written, then the next 10 rows, and so on.
To pass 10 lines at a time as arguments to your script:
< file xargs -d$'\n' -n 10 myscript
To pipe 10 lines at a time as input to your script (the trailing {} becomes $0 inside sh -c, so "$@" holds all 10 lines):
< file xargs -d$'\n' -n 10 sh -c 'printf "%s\n" "$#" | myscript' {}
Assuming your input is in a file named file which I'm creating with 30 instead of 4000 lines of input:
$ seq 30 > file
and modifying it to have some lines that contain spaces, some that contain shell variables, and some that contain regexp and globbing chars, to show that no type of shell expansion is being done:
$ head -10 file
1
here is a multi-field line
3
4
$HOME
6
.*
8
9
10
Here's 10 args at a time being passed to an awk script:
$ < file xargs -d$'\n' -n 10 awk 'BEGIN{for (i=1; i<ARGC; i++) print i, "<" ARGV[i] ">"; exit} END{print "---"}'
1 <1>
2 <here is a multi-field line>
3 <3>
4 <4>
5 <$HOME>
6 <6>
7 <.*>
8 <8>
9 <9>
10 <10>
---
1 <11>
2 <12>
3 <13>
4 <14>
5 <15>
6 <16>
7 <17>
8 <18>
9 <19>
10 <20>
---
1 <21>
2 <22>
3 <23>
4 <24>
5 <25>
6 <26>
7 <27>
8 <28>
9 <29>
10 <30>
---
and here's 10 lines of input at a time being passed to an awk script:
$ < file xargs -d$'\n' -n 10 sh -c 'printf "%s\n" "$#" | awk '\''{print NR, "<" $0 ">"} END{print "---"}'\''' {}
1 <1>
2 <here is a multi-field line>
3 <3>
4 <4>
5 <$HOME>
6 <6>
7 <.*>
8 <8>
9 <9>
10 <10>
---
1 <11>
2 <12>
3 <13>
4 <14>
5 <15>
6 <16>
7 <17>
8 <18>
9 <19>
10 <20>
---
1 <21>
2 <22>
3 <23>
4 <24>
5 <25>
6 <26>
7 <27>
8 <28>
9 <29>
10 <30>
---
Assuming the OP wants to pass lines as arguments to their code, they could try the following (untested, since I don't have the OP's Java code):
awk '
FNR%10==0{
    system("your_java_code " value OFS $0)
    value=""
    next    # without this, the 10th line would also be carried into the next batch
}
{
    value=(value ? value OFS : "") $0
}
END{
    if(value){
        system("your_java_code " value)
    }
}
' Input_file
OR
awk '
{
    value=(value ? value OFS : "") $0
}
FNR%10==0{
    system("your_java_code " value)
    value=""
}
END{
    if(value){
        system("your_java_code " value)
    }
}
' Input_file
PS: To be on the safe side, I kept the END section of the awk code so that if there are leftover lines (i.e. the total number of lines is not evenly divisible by 10), the Java program is still called with the remaining lines.
This might work for you (GNU parallel):
parallel -kN10 javaProgram :::: file
This will pass lines 1-10, 11-20, ... as arguments to the program javaProgram.
If you want to pass 10 lines at a time as input instead, use:
parallel -kN10 --cat javaProgram :::: file
Sounds to me like you want to slice out rows from a file, then pipe those rows to java. This interpretation differs from the other answers, so let me know if I'm not understanding you:
$ file=/etc/services
$ count=$(wc -l < "${file}")
$ start=1
$ stride=10
$ for ((i=start; i<=count; i+=stride)); do
      awk -v i="${i}" -v stride="${stride}" \
          'NR > (i+stride) { exit } NR >= i && NR < (i + stride)' "${file}" \
          | java ...
  done
file holds the path to the data rows, count is the total count of rows in that file, start is the first row, and stride is how many rows you want to slice out in each iteration.
The for loop then advances i by stride, while awk slices out the rows so numbered. We pipe them to the java program on standard input.
Assuming that you are passing the groups of 10 lines from your file to your script as command-line arguments, this is an answer:
rows=4000      # the number of rows in file
groupsize=10   # the size of the line groups
OIFS="$IFS"; IFS=$'\n'             # use newline as input field separator to avoid `for` splitting on spaces
groups=$(($rows / $groupsize))     # the number of groups of lines
for i in $(seq 1 $groups); do      # loop through each group of lines
    from=$((($i * $groupsize) - $groupsize + 1))
    to=$(($i * $groupsize))
    arguments=""                   # reset the argument list for this group
    # build the arguments for each script invocation by concatenating each group of lines
    for line in `sed -n -e ${from},${to}p file`; do   # 'file' is your input file name
        arguments="$arguments \"$line\""
    done
    echo script $arguments         # remove echo and change 'script' to your script name
done
IFS="$OIFS"                        # restore original input field separator
Like this:
for ((i=0; i<4000; i+=10)); do
    arr=( )                        # create a new empty array
    for ((j=i+1; j<=i+10; j++)); do
        arr+=( $j )                # add the id to the array
    done
    printf '%s\n' "${arr[@]}"      # or execute your command with all ten ids
done
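To turn those ids into actual lines from the file, one option is to hand each batch's range to sed (a sketch; 'myscript' is a hypothetical stand-in for the Java invocation):
for ((i=1; i<=4000; i+=10)); do
    sed -n "${i},$((i+9))p" file | myscript   # lines i..i+9 of 'file', piped to the consumer
done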
I have a log file with plenty of collected logs. I already made a grep command with a regex that outputs the numbers of the lines that match it.
This is the grep command I'm using to output the matched lines:
grep -n -E 'START_REGEX|END_REGEX' Example.log | cut -d ':' -f 1 > ranges.txt
The regex is conditional: it can match the beginning of a specific log entry or its end, so the output is something like:
12
45
128
136
...
The idea is to use this as a source of ranges to make specific cuts on the log file, from the first number to the second, and save them to another file.
The ranges are made of couples from the output; according to the example, the first range is 12,45 and the second 128,136.
I expect to see in the final file all the text from line 12 to 45 and then from 128 to 136.
The problem I'm facing is that the sed command seems to work with only one range at a time.
sed -E -iTMP "$START_RANGE,$END_RANGE! d;$END_RANGEq" $FILE_NAME
Is there any way (maybe with awk) to do that in just one "cycle"?
Constraints: I can only use standard, supported bash commands.
You can use an awk statement too:
awk '(NR>=12 && NR<=45) || (NR>=128 && NR<=136)' file
where NR is a special variable in awk which keeps track of the line number as it processes the file.
An example:
seq 1 10 > file
cat file
1
2
3
4
5
6
7
8
9
10
awk '(NR>=1 && NR<=3) || (NR>=8 && NR<=10)' file
1
2
3
8
9
10
You can also avoid hard-coding the line numbers by using the -v option to pass them in as variables:
awk -v start1=1 -v end1=3 -v start2=8 -v end2=10 '(NR>=start1 && NR<=end1) || (NR>=start2 && NR<=end2)' file
1
2
3
8
9
10
With sed you can do multiple ranges of lines like so:
sed -n '12,45p;128,136p'
This would output lines 12-45, then 128-136.
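Since the OP's start/end pairs actually live in ranges.txt, here is a sketch that combines both ideas: read the numbers pairwise from ranges.txt, then let awk print every line of Example.log that falls inside one of the ranges (this assumes ranges.txt holds an even count of line numbers, one per line):
awk 'NR==FNR { r[NR]=$1; next }                    # first pass: load the range boundaries
     { for (i=1; i in r; i+=2)                     # second pass: test each line against each pair
           if (FNR>=r[i] && FNR<=r[i+1]) { print; next } }
' ranges.txt Example.log > extracted.log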
I have a large directory of data files which I am in the process of manipulating to get them in a desired format. They each begin and end 15 lines too soon, meaning I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence.
To begin, I have written the following code to separate the relevant data into easy chunks:
#!/bin/bash
destination='media/user/directory/'
for file1 in `ls $destination*.ascii`
do
    echo $file1
    file2="${file1}.end"
    file3="${file1}.snip"
    sed -e '16,$d' $file1 > $file2
    sed -e '1,15d' $file1 > $file3
done
This worked perfectly, so the next step is the world's simplest cat command:
cat $file3 $file2 > outfile
However, what I need to do is to stitch file2 to the previous file3 in the sequence. See how these files in the directory are all sequential over time:
*_20090412T235945_20090413T235944_* ### April 13
*_20090413T235945_20090414T235944_* ### April 14
So I need to take the 15 lines snipped off the April 14 example above and paste it to the end of the April 13 example.
This doesn't have to be part of the original code, in fact it would be probably best if it weren't. I was just hoping someone would be able to help me get this going.
Thanks in advance! If there is anything I have been unclear about that needs further explanation, please let me know.
"I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence."
If I understand what you want correctly, it can be done with one line of code:
awk 'NR==1 || FNR==16{close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
When this has run, the files file1.new, file2.new, and file3.new will be in the new form with the lines transferred. Of course, you are not limited to three files: you may specify as many as you like on the command line.
Example
To keep our example short, let's just strip the first 2 lines instead of 15. Consider these test files:
$ cat file1
1
2
3
$ cat file2
4
5
6
7
8
$ cat file3
9
10
11
12
13
14
15
Here is the result of running our command:
$ awk 'NR==1 || FNR==3{close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
$ cat file1.new
1
2
3
4
5
$ cat file2.new
6
7
8
9
10
$ cat file3.new
11
12
13
14
15
As you can see, the first two lines of each file have been transferred to the preceding file.
How it works
awk implicitly reads each file line-by-line. The job of our code is to choose which new file a line should be written to based on its line number. The variable f will contain the name of the file that we are writing to.
NR==1 || FNR==16{f=FILENAME ".new"}
When we are reading the first line of the first file, NR==1, or when we are reading the 16th line of whatever file we are on, FNR==16, we update f to be the name of the current file with .new added to the end.
For the short example, which transferred 2 lines instead of 15, we used the same code but with FNR==16 replaced with FNR==3.
print>f
This prints the current line to file f.
(If this were a shell script, we would use >>. This is not a shell script. This is awk.)
Using a glob to specify the file names
destination='media/user/directory/'
awk 'NR==1 || FNR==16{close(f); f=FILENAME ".new"} {print>f}' "$destination"*.ascii
Your task is not that difficult at all. You want to gather a list of all _end files in the directory (using a for loop and globbing, NOT looping over the results of ls). Once you have all the _end files, you simply parse the dates out of each filename using parameter expansion with substring removal, say into d1 and d2 for date1 and date2 in:
stuff_20090413T235945_20090414T235944_end
| d1 | | d2 |
then you simply subtract 1 from d1 into, say, date0 or d0, and construct the previous filename out of d0 and d1, using _snip instead of _end. (Note that plain integer subtraction on a YYYYMMDD date only works within a month; crossing a month boundary would need real date arithmetic, e.g. GNU date -d.) Then just test for the existence of the previous _snip filename, and if it exists, paste your info from the current _end file to the previous _snip file. e.g.
#!/bin/bash
for i in *end; do              ## find all _end files
    d1="${i#*stuff_}"          ## isolate first date in filename
    d1="${d1%%T*}"
    d2="${i%T*}"               ## isolate second date
    d2="${d2##*_}"
    d0=$((d1 - 1))             ## subtract 1 from first date, get snip d1
    prev="${i/$d1/$d0}"        ## create previous 'snip' filename
    prev="${prev/$d2/$d1}"
    prev="${prev%end}snip"
    if [ -f "$prev" ]          ## test that prev snip file exists
    then
        printf "paste to : %s\n" "$prev"
        printf " from : %s\n\n" "$i"
    fi
done
Test Input Files
$ ls -1
stuff_20090413T235945_20090414T235944_end
stuff_20090413T235945_20090414T235944_snip
stuff_20090414T235945_20090415T235944_end
stuff_20090414T235945_20090415T235944_snip
stuff_20090415T235945_20090416T235944_end
stuff_20090415T235945_20090416T235944_snip
stuff_20090416T235945_20090417T235944_end
stuff_20090416T235945_20090417T235944_snip
stuff_20090417T235945_20090418T235944_end
stuff_20090417T235945_20090418T235944_snip
stuff_20090418T235945_20090419T235944_end
stuff_20090418T235945_20090419T235944_snip
Example Use/Output
$ bash endsnip.sh
paste to : stuff_20090413T235945_20090414T235944_snip
from : stuff_20090414T235945_20090415T235944_end
paste to : stuff_20090414T235945_20090415T235944_snip
from : stuff_20090415T235945_20090416T235944_end
paste to : stuff_20090415T235945_20090416T235944_snip
from : stuff_20090416T235945_20090417T235944_end
paste to : stuff_20090416T235945_20090417T235944_snip
from : stuff_20090417T235945_20090418T235944_end
paste to : stuff_20090417T235945_20090418T235944_snip
from : stuff_20090418T235945_20090419T235944_end
(of course replace stuff_ with your actual prefix)
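To actually perform the stitch rather than just print it, the two printf lines can be replaced with an append (a sketch; note it modifies the previous _snip file in place):
cat "$i" >> "$prev"    ## append the current _end content to the previous _snip file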
Let me know if you have questions.
You could store the previous $file3 value in a variable, and skip the first run by checking that the variable is non-empty with -n:
#!/bin/bash
destination='media/user/directory/'
prev=""
for file1 in "$destination"*.ascii
do
    echo "$file1"
    file2="${file1}.end"
    file3="${file1}.snip"
    sed -e '16,$d' "$file1" > "$file2"
    sed -e '1,15d' "$file1" > "$file3"
    if [ -n "$prev" ]; then                       # a previous snip exists, so not the first run
        cat "$prev" "$file2" > "${prev}.joined"   # one output per pair; a single fixed 'outfile' would be overwritten each time
    fi
    prev=$file3
done
Is it even possible? I currently have a one-liner to count the number of occurrences of each word in a file. If I output what I currently have, it looks like this:
3 abcdef
3 abcd
3 fec
2 abc
This is all done in one line without loops, and I would like to add a column with the length of each word. I was thinking I could use wc -m to count the characters, but I don't know if I can do that without a loop.
As seen in the title: no awk, sed, or perl, just good old bash.
What I want:
3 abcdef 6
3 abcd 4
3 fec 3
2 abc 3
Where the last column is length of each word.
while read -r num word; do
    printf '%s %s %s\n' "$num" "$word" "${#word}"
done < file
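Run against the sample data (assuming it is saved as file), this prints:
$ while read -r num word; do printf '%s %s %s\n' "$num" "$word" "${#word}"; done < file
3 abcdef 6
3 abcd 4
3 fec 3
2 abc 3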
You can also do something like this:
File
> cat test.txt
3 abcdef
3 abcd
3 fec
2 abc
Bash script
> cat test.txt.sh
#!/bin/bash
while read -r line; do
    items=($line)          # split the line into words
    strlen=${#items[1]}    # get the 2nd item's length
    echo $line $strlen     # print the line and the length
done < test.txt
Results
> bash test.txt.sh
3 abcdef 6
3 abcd 4
3 fec 3
2 abc 3