save stream output as multiple files - bash

I have a program (pull) which downloads files and emits their contents (JSON) to stdout. The input to the program is the id of every document I want to download, like so:
pull one two three
>
> { ...one }
> {
...two
}
> { ...three }
However, I would now like to pipe that output to a different file for each document it emits, ideally being able to derive each filename from the order of the args initially used: one two three.
So, the outcome I am looking for would be something like the below.
pull one two three | > $1.json
>
> saved one.json
> saved two.json
> saved three.json
Is there any way to achieve this or something similar at all?
Update
I just would like to clarify how the program works and why it may not be ideal to loop through the arguments and execute the program once for each argument declared.
Whenever pull gets executed, it performs two operations:
A: An expensive operation (slow to resolve): this retrieves all documents available in a database where we can look up items by the argument names provided when invoking pull.
B: An operation specific to each provided argument: after A resolves, we use its response to get the data needed to retrieve the individual document.
This means that calling A+B once for every argument wouldn't be ideal, as A is an expensive operation.
So instead of having, AB AB AB AB I would like to have ABBBB.

You're doing it the hard way.
for f in one two three; do pull "$f" > "$f.json" & done
Unless something in the script is not compatible with multiple simultaneous copies, this will make the process faster as well. If it is, just change the & to ;.
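Spelled out, that sequential form is the same loop with ; instead of &:
for f in one two three; do pull "$f" > "$f.json"; done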
Update
Try just always writing the individual files. If you also need to be able to send them to stdout, just cat the file afterwards, or use tee when writing it.
If that's not ok, then you will need to clearly identify and parse the data blocks. For example, if the start of a section is THE ONLY place { appears as the first character on a line, that's a decent sentinel value. Split your output to files using that.
For example, throw this into another script:
awk 'NR==FNR { ndx=1; split($0,fn); name=""; next; } /^{/ { name=fn[ndx++]; } { if (length(name)) print $0 > name".json"; }' <( echo "$@" ) <( pull "$@" )
call that script with one two three and it should do what you want.
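As a minimal sketch, the wrapper script (saved as splitstream, the name used in the In Use demo below, and made executable) is just that one-liner under a shebang:
#!/bin/bash
# splitstream: split pull's streamed JSON output into one file per argument
awk 'NR==FNR { ndx=1; split($0,fn); name=""; next; }
/^{/ { name=fn[ndx++]; }
{ if (length(name)) print $0 > name".json"; }' <( echo "$@" ) <( pull "$@" )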
Explanation
awk '...' <( echo "$@" ) <( pull "$@" )
This executes two commands and returns their outputs as "files", streams of input for awk to process. The first just puts the list of arguments provided on one line for awk to load into an array. The second executes your pull script with those args, which provides the streaming output you already get.
NR==FNR { ndx=1; split($0,fn); name=""; next; }
This tells awk to initialize a file-controlling index, read the single line from the echo command (the args) and split them into an array of filename bases desired, then skip the rest of processing for that record (it isn't "data", it's metadata, and we're done with it.) We initialize name to an empty string so that we can check for length - otherwise those leading blank lines end up in .json, which probably isn't what you want.
/^{/ { name=fn[ndx++]; }
This tells awk, each time it sees { as the very first character on a line, to set the output filename base to the array entry at the current index (which we initialized at 1 above) and to increment the index for the next time.
{ if (length(name)) print $0 > name".json"; }
This tells awk to print each line to a file named whatever the current index is pointing at, with ".json" appended. if (length(name)) throws away the leading blank line(s) before the first block of JSON.
The result is that each new set will trigger a new filename from your given arguments.
That work for you?
In Use
$: ls *.json
ls: cannot access '*.json': No such file or directory
$: pull one two three # my script to simulate output
{ ...one... }
{
...two...
}
{ ...three... }
$: splitstream one two three # the above command in a file to receive args
$: grep . one* two* three* # now they exist
one.json:{ ...one... }
two.json:{
two.json: ...two...
two.json:}
three.json:{ ...three... }

Related

Compare two text files line by line, finding differences but ignoring numerical values differences

I'm working on a bash script to compare two similar text files line by line and find any differences between corresponding lines. I should point out the difference and report which line it is on, but I should ignore numerical values in this comparison.
Example:
Process is running; process found : 12603 process is listening on port 1200
Process is running; process found : 43023 process is listening on port 1200
In the example above, the script shouldn't find any difference since it's just the process id and it changes all the time.
But otherwise I want it to notify me of the differences between the lines.
Example:
Process is running; process found : 12603 process is listening on port 1200
Process is not running; process found : 43023 process is not listening on port 1200
I already have a working script to find the differences, and I've used the following function to find the differences while ignoring the numerical values, but it's not working perfectly. Any suggestions?
COMPARE_FILES()
{
awk 'NR==FNR{a[FNR]=$0;next}$0!~a[FNR]{print $0}' $1 $2
}
Where $1 and $2 are the two files to compare.
Would you please try the following:
COMPARE_FILES() {
awk '
NR==FNR {a[FNR]=$0; next}
{
b=$0; gsub(/[0-9]+/,"",b)
c=a[FNR]; gsub(/[0-9]+/,"",c)
if (b != c) {printf "< %s\n> %s\n", $0, a[FNR]}
}' "$1" "$2"
}
Jettison the digits before making the comparison. I would ameliorate your code in the following way: replace
NR==FNR{a[FNR]=$0;next}$0!~a[FNR]{print $0}
with
NR==FNR{a[FNR]=$0;next}gensub(/[[:digit:]]/,"","g",$0)!~gensub(/[[:digit:]]/,"","g",a[FNR]){print $0}
Explanation: I harness the gensub string function as it returns a new string (gsub alters the selected variable's value in place). I replace every [:digit:] character with the empty string (i.e. delete it) globally.
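A throwaway illustration of that difference (this assumes GNU awk, which gensub requires; it is not part of the original answer):
echo "port 1200" | gawk '{ s = gensub(/[[:digit:]]/, "", "g", $0); print s; print $0 }'
# prints "port " (the new string returned by gensub) and then "port 1200" ($0 is untouched)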
Using any awk:
compare_files() {
awk '{key=$0; gsub(/[0-9]+(\.[0-9]+)?/,0,key)} NR==FNR{a[FNR]=key; next} key!~a[FNR]' "${@}"
}
The above doesn't just remove the digits, it replaces every set of numbers, whether they're integers like 17 or decimals like 17.31, with the number 0 to avoid false matches.
For example, given input like:
file1: foo 1234 bar
file2: foo bar
If you just remove the digits then those 2 lines incorrectly become identical:
file1: foo bar
file2: foo bar
whereas if you replace all numbers with a 0 then they correctly remain different:
file1: foo 0 bar
file2: foo bar
Note that although the above compares the lines after converting numbers to 0, it doesn't modify the original lines, so the output shows the original lines rather than the modified ones, which makes it easier to investigate the differences further.
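If you want to see that distinction at the command line, a quick throwaway check using the same gsub conversion looks like:
printf 'foo 1234 bar\n' | awk '{gsub(/[0-9]+(\.[0-9]+)?/,0)} 1'   # -> foo 0 bar
printf 'foo bar\n'      | awk '{gsub(/[0-9]+(\.[0-9]+)?/,0)} 1'   # -> foo bar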

How to search for multiple Substrings in a string

I'm new to bash programming and I've been trying to write a function to search for multiple substrings in a string for log analysis.
For example:
I have a Log-File which contains a string like this:
"01-01-2020 STREETNEW Function Triggered Command 3 processed - New street created."
Now I want to search for 2 substrings in this string.
The first substring I'm looking for is "Command 3" to identify which action was triggered. If "Command 3" is found, I want to search for a second substring "New street created" to check the output of the triggered action.
So far, I have written a contains function which helps me find a match, and this works fine. The problem is that this function is only able to find a match for one substring.
My function looks like this:
declare -a arrayLog # This Array contains my Log-File line per line
declare -a arrayCodeNumber # Containing the code numbers, e.g. "Command 3"
declare -a arrayActionTriggered # Containing the result of the triggered action, e.g. "New street created"
#[... build arrays etc...]
function contains() {
local n=$#
local value=${!n}
for ((i=1;i < $#;i++)) {
shopt -s nocasematch
[[ "${!i}" =~ "${value^^}" ]] && echo "y;$i" || echo "n;$i"
}
}
#I'm calling the function like this in a for-loop:
contains "${arrayLog[#]}" "${arrayCodeNumber[i]}"
#[... processing function results ...]
My function returns "y;$i" or "n;$i" to indicate whether there was a match and in which line of the log file the match was found - I need this output for processing the matching results later in my code.
Unfortunately I don't know how to extend or improve my function to search for multiple substrings in a line.
What would I do to extend the function to accept 2 input arrays (for my matching parameters) and 1 log array, and also extend the matching process?
Thanks a lot in advance!
Kind regards,
Tobi
Consider this approach
#!/bin/bash
cmd=('Command 2' 'Command 3')
act=('Street destroyed' 'New street created')
for i in ${!cmd[@]}; {
grep -no "${cmd[$i]}.*${act[$i]}" file
}
Usage
$ ./test
2:Command 2 processed - Street destroyed
1:Command 3 processed - New street created
From grep help
$ grep --help
...
-o, --only-matching show only the part of a line matching PATTERN
-n, --line-number print line number with output lines
...

How to make and name multiple text files after using the cut command?

I have about 50 data text files that I need to remove several columns from.
I have been using the cut command to remove the columns and rename the output files individually, but I will have many more of these files and need a way to do it at a large scale.
Currently I have been using:
cut -f1,6,7,8 filename.txt >> filename_Fixed.txt
And I am able to remove the columns from all the files using:
cut -f1,6,7,8 *.txt
But I'm only able to get all the output in the terminal or I can write it to a single text file.
What I want is to edit several files using cut to remove the required columns:
filename1.txt
filename2.txt
filename3.txt
filename4.txt
.
.
.
And get the edited output to write to individual files:
filename_Fixed1.txt
filename_Fixed2.txt
filename_Fixed3.txt
filename_Fixed4.txt
.
.
.
But I haven't been able to find a way to write the output to new text files. I'm new to using the command line and not much of a coder, so maybe I don't know what terms to search for? I haven't even been able to find anything through Google searches that has helped me. It seems like it should be simple, but I am struggling.
In desperation, I did try this bit of code, knowing it wouldn't work:
cut -f1,6,7,8 *.txt >> ( FILENAME ".fixed" )
I found the portion after ">>" nested in an awk command that outputs multiple files.
I also tried (again knowing it wouldn't work) to wildcard the output files but got an ambiguous redirect error.
Did you try for?
for f in *.txt ; do
cut -f 1,6,7,8 "$f" > "$(basename "$f" .txt)_fixed.txt"
done
(N.B. I can't try the basename now, you can replace it with "${f}_fixed")
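As a quick aside (not from the original answer), basename strips the directory part and an optional suffix, which is what builds the new name here:
basename /some/dir/filename1.txt .txt   # hypothetical path; prints: filename1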
You can also process it all in awk itself which would make the process much more efficient, especially for large numbers of files, for example:
awk '
NF < 8 {
print "contains less than 8 fields: ", FILENAME
next
}
{ fn=FILENAME
idx=match(fn, /[0-9]+.*$/)
if (idx == 0) {
print "no numeric suffix for file: ", fn
next;
}
newfn=substr(fn,1,idx-1) "_Fixed" substr(fn,idx)
print $1,$6,$7,$8 > newfn
}
' *.txt
Which contains two rules (the expressions between {...}). The first:
NF < 8 {
print "contains less than 8 fields: ", FILENAME
next
}
simply checks that the current line contains at least 8 fields (since you want field 8 as your last field). If a line contains fewer than 8 fields, it prints the warning and skips to the next line of input.
The second rule:
{ fn=FILENAME
idx=match(fn, /[0-9]+.*$/)
if (idx == 0) {
print "no numeric suffix for file: ", fn
next;
}
newfn=substr(fn,1,idx-1) "_Fixed" substr(fn,idx)
print $1,$6,$7,$8 > newfn
}
fn=FILENAME stores the current filename as fn to cut down typing,
idx=match(fn, /[0-9]+.*$/) locates the index where the numeric suffix of the filename begins (e.g. where "3.txt" starts),
if (idx == 0) then a numeric suffix was not found; warn and move on to the next record,
newfn=substr(fn,1,idx-1) "_Fixed" substr(fn,idx) form the new filename from the non-numeric prefix (e.g. "filename"), add "_Fixed" with string-concatenation and then add the numeric suffix, and finally
print $1,$6,$7,$8 > newfn print fields (columns) 1,6,7,8 redirecting output to the new filename.
For more information on each of the string-functions used above, see the GNU awk User's Guide - 9.1.3 String-Manipulation Functions
If I understand what you were attempting, this should be able to handle as many files as you have -- so long as the files have a numeric suffix to place "_Fixed" before in the filename and each file has at least 8 fields (columns). You can just copy/middle-mouse-paste the entire command at the command-line to test.

Find Replace using Values in another File

I have a directory of files, myFiles/, and a text file values.txt in which one column is a set of values to find, and the second column is the corresponding replace value.
The goal is to replace all instances of find values (first column of values.txt) with the corresponding replace values (second column of values.txt) in all of the files located in myFiles/.
For example...
values.txt:
Hello Goodbye
Happy Sad
Running the command would replace all instances of "Hello" with "Goodbye" in every file in myFiles/, as well as replace every instance of "Happy" with "Sad" in every file in myFiles/.
I've taken as many attempts at using awk/sed and so on as I can think logical, but have failed to produce a command that performs the action desired.
Any guidance is appreciated. Thank you!
Read each line from values.txt
Split that line into 2 words
Use sed for each line to replace the 1st word with the 2nd word in all files in the myFiles/ directory
Note: I've used bash parameter expansion to split the line (${line% *} etc.), assuming values.txt is a space-separated 2-column file. If that's not the case, you may use awk or cut to split the line.
while read -r line;do
sed -i "s/${line#* }/${line% *}/g" myFiles/* # '-i' edits files in place and 'g' replaces all occurrences of patterns
done < values.txt
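To illustrate the parameter expansions used above (assuming the space-separated two-column format):
line="Hello Goodbye"
echo "${line% *}"   # first column (the find value):    Hello
echo "${line#* }"   # second column (the replace value): Goodbye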
You can do what you want with awk.
#! /usr/bin/awk -f
# snarf in first file, values.txt
FNR == NR {
subs[$1] = $2
next
}
# apply replacements to subsequent files
{
for( old in subs ) {
while( start = index($0, old) ) {
len = length(old)
$0 = substr($0, 1, start - 1) subs[old] substr($0, start + len)
}
}
print
}
When you invoke it, put values.txt as the first file to be processed.
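For example, assuming the script is saved as replace.awk (a name picked here purely for illustration) and made executable, the invocation would be along the lines of:
./replace.awk values.txt myFiles/*   # transformed text is written to stdout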
Option One:
create a python script
with open('filename', 'r') as infile, etc., read the values.txt file into a Python dict with 'from' as key and 'to' as value, then close the infile.
use shutil to read in the wanted directory, iterate over the files; for each, either popen a sed 's/from/to/g', or read in the file, iterating over all the lines and doing the find/replace on each line.
Option Two:
bash script
read in a from/to pair
invoke
perl -p -i -e 's/from/to/g' dirname/*.txt
done
The second is probably easier to write, but offers less exception handling.
It's called 'Perl PIE' and it's a relatively famous hack for doing find/replace in lots of files at once.
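Putting Option Two together, a rough sketch (assuming values.txt is space separated and the find values contain no / characters) might be:
while read -r from to; do
    # \Q...\E makes Perl treat the find value as a literal string inside the regex
    perl -p -i -e "s/\Q$from\E/$to/g" myFiles/*.txt
done < values.txt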

Adding file information to an AWK comparison

I'm using awk to perform a file comparison against a file listing in found.txt
while read line; do
awk 'FNR==NR{a[$1]++;next}$1 in a' $line compare.txt >> $CHECKFILE
done < found.txt
found.txt contains full path information to a number of files that may contain the data. While I am able to determine that data exists in both files and output that data to $CHECKFILE, I wanted to be able to include the line from found.txt (the filename) in which the match was found.
In other words I end up with something like:
File " /xxxx/yyy/zzz/data.txt "contains the following lines in found.txt $line
I'm just not sure how to get the /xxxx/yyy/zzz/data.txt information into the stream.
Appended for clarification:
The file found.txt contains the full path information to several files on the system
/path/to/data/directory1/file.txt
/path/to/data/directory2/file2.txt
/path/to/data/directory3/file3.txt
each of the files has a list of parameters that need to be checked for existence before appending additional information to them later in the script.
so for example, file.txt contains the following fields
parameter1 = true
parameter2 = false
...
parameter35 = true
the compare.txt file contains a number of parameters as well.
So if parameter35 (or any other parameter) shows up in one of the three files, its output gets dropped into the check file.
Both of the scripts (yours and the one I posted) will give me that output but I would also like to echo in the line that is being read at that point in the loop. Sounds like I would just be able to somehow pipe it in, but my awk expertise is limited.
It's not really clear what you want but try this (no shell loop required):
awk '
ARGIND==1 { ARGV[ARGC] = $0; ARGC++; next }
ARGIND==2 { keys[$1]; next }
$1 in keys { print FILENAME, $1 }
' found.txt compare.txt > "$CHECKFILE"
ARGIND is gawk-specific; if you don't have it, add FNR==1{ARGIND++} at the start of the script.
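Spelled out, that portable variant would look something like:
awk '
FNR==1 { ARGIND++ }
ARGIND==1 { ARGV[ARGC] = $0; ARGC++; next }
ARGIND==2 { keys[$1]; next }
$1 in keys { print FILENAME, $1 }
' found.txt compare.txt > "$CHECKFILE"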
Pass the name into awk inside a variable like this:
awk -v file="$line" '{... print "File: " file }'
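For instance, folded back into your original loop it could look roughly like this (keeping your existing logic, just adding the variable):
while read -r line; do
    awk -v file="$line" 'FNR==NR{a[$1]++;next} $1 in a {print "File: " file, $0}' "$line" compare.txt >> "$CHECKFILE"
done < found.txt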
