Extend parsing command to multiple files - bash

There are a number of files I use which can each be parsed into three files (per input file), named like:
third_file_1_1.txt
first_file_1_2.txt....
How can I do this for all input files in an automated process?

You can use the FILENAME builtin variable to know what file you are currently processing. This allows you to do:
awk '/^[01]+$/ {print > "third_"FILENAME; next}/ [...]' $(<list.txt)
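Fleshed out, the full command might look like the sketch below. Only the binary-line rule comes from the answer above; the other two rules and their first_/second_ prefixes are assumptions standing in for the elided patterns, and list.txt is assumed to hold the input file names:
awk '
    /^[01]+$/     { print > ("third_" FILENAME); next }   # binary lines, from the answer
    /^[A-Za-z]+$/ { print > ("first_" FILENAME); next }   # hypothetical rule
                  { print > ("second_" FILENAME) }        # everything else
' $(<list.txt)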

How to generate MD5 hash for the first n bytes in sh scripts?

I have the following PHP code line:
$md5_code = md5(file_get_contents($filePath, FALSE, NULL, 0, n_bytes));
which generates an MD5 hash of a file's first n_bytes.
I would like to make a similar executable script/program that takes a file and writes the MD5 hash generated from the first n_bytes to a text file.
I think I need to mention that the script should work on both Windows and Linux machines.
The script could work something like this:
Move the selected file to where the script is
Execute the script
Get the generated hash from the newly created text file
Is this even possible?
Assuming you have functional Linux utilities on your Windows box, you can use bash and md5sum. You do not need to create a new file, as md5sum can process data on stdin.
# Modify variables to use parameters/values as needed.
filename=...
n_bytes=100
out_file=md5.out
head -c "$n_bytes" "$filename" | md5sum | awk '{ print $1 }' > "$out_file"
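Wrapped up as a standalone script, a minimal sketch could look like this (the script name and argument handling are assumptions, not part of the answer):
#!/bin/bash
# Hypothetical usage: ./md5head.sh FILE [N_BYTES] [OUT_FILE]
filename=$1
n_bytes=${2:-100}          # default to 100 bytes if not given
out_file=${3:-md5.out}
head -c "$n_bytes" "$filename" | md5sum | awk '{ print $1 }' > "$out_file"
Under an environment like Git Bash or Cygwin, the same pipeline should also run on Windows, which covers the cross-platform requirement.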

String of a filename used within a bash script

I have a list of files:
name1andlast1.address
name2andlast2.address
name3andlast3.address
...
I want to create a bash script that uses each *.address file plus files whose names are substrings of the *.address filename. The delimiter is "and" (with the file extension dropped), so I can derive name#.list and last#.list.
For example, within a bash script that runs for name1andlast1.address, I would like to use a set of files (in another directory) called name1.list and last1.list:
grep "something" name1.list > output1
grep "something" last1.list > output2
grep "something" name1andlast1.address > output3
The order of the grep commands is not important. What matters is how to use the filename (i.e. name1andlast1.address) to feed name1.list and last1.list into my bash script. I need a way to extract those basenames separated by "and", and a way to do this iteratively over multiple *.address files.
bash solution:
for f in *.address; do
    name_l=${f%and*}.list
    last_l=${f#*and}
    last_l=${last_l//.address/.list}
    # ... do further search/processing with "$name_l" and "$last_l"
done
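Putting the greps from the question into that loop might look like the sketch below; the lists directory and the output file names are illustrative assumptions:
lists_dir=/path/to/lists               # assumed location of the .list files
for f in *.address; do
    base=${f%.address}                 # e.g. name1andlast1
    name_l=${f%and*}.list              # e.g. name1.list
    last_l=${f#*and}                   # e.g. last1.address
    last_l=${last_l%.address}.list     # e.g. last1.list
    grep "something" "$lists_dir/$name_l" > "output_${base}_name"
    grep "something" "$lists_dir/$last_l" > "output_${base}_last"
    grep "something" "$f" > "output_${base}_address"
done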

Replace string with result of command

I have data in zdt format (like this), where I want to run this Python script on the third column only (the pinyin one). I have tried to do this with sed and awk but have not had any success, due to my limited knowledge of these tools. Ideally, I want to feed the column's contents to the Python script and then have the source replaced with the script's output.
This is roughly what I envision, but the call is not executed, not even when in quotes:
s/([a-z]+[1,2,3,4]?)(?=.*\t)/decode_pinyin(\1)/g
I am not too particular about the tools (sed, awk, python, …) used; I just want a shell script for batch processing of a number of files. It would be best if the original spacing were preserved.
Try something like this:
awk -F'\t' '{printf "decode_pinyin(\"%s\")\n", $3}' file
This outputs:
decode_pinyin("ru4xiang1 sui2su2")
decode_pinyin("ru4")
decode_pinyin("xiang1")
decode_pinyin("sui2")
decode_pinyin("su2")

bash - reading multiple input files and creating matching output files by name and sequence

I do not know much bash scripting, but I know the task I would like to do would be greatly simplified by it. I would like to test a program against expected output using many test input files.
For example, I have files named "input1.txt, input2.txt, input3.txt..." and expected output in files "output1.txt, output2.txt, output3.txt...". I would like to run my program with each of the input files and output a corresponding "test1.txt, test2.txt, test3.txt...". Then I would do a "cmp output1.txt test1.txt" for each file.
So I think it would start like this, roughly:
for i in input*; do
    ./myprog.py < "$i" > someoutputthing
done
One question I have is: how would I match the numbers in the filename? Thanks for your help.
If the input file name pattern is inputX.txt, you need to remove input from the beginning. You do not have to remove the extension, as you want to use the same extension for the output:
output=output${i#input}
See Parameter Expansion in man bash.
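Combining that expansion with the loop from the question, the whole test run might look like this sketch (myprog.py and the file naming come from the question; the OK/FAILED reporting is an assumption about what you want to see):
for i in input*.txt; do
    n=${i#input}                       # "1.txt" from "input1.txt"
    ./myprog.py < "$i" > "test$n"
    if cmp -s "output$n" "test$n"; then
        echo "test$n OK"
    else
        echo "test$n FAILED"
    fi
done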

method for merging two files, opinion needed

Problem: I have two folders (one is the Delta folder, where the files get updated, and the other is the Original folder, where the original files live). Every time a file updates in the Delta folder I need to merge the file from the Original folder with the updated file from the Delta folder.
Note: the file names in the Delta folder and the Original folder match, but the content of the files may differ. For example:
$ cat Delta_Folder/1.properties
account.org.com.email=New-Email
account.value.range=True
$ cat Original_Folder/1.properties
account.org.com.email=Old-Email
account.value.range=False
range.list.type=String
currency.country=Sweden
Now, I need to merge Delta_Folder/1.properties with Original_Folder/1.properties so that my updated Original_Folder/1.properties becomes:
account.org.com.email=New-Email
account.value.range=True
range.list.type=String
currency.country=Sweden
The solution I opted for:
1. Find all *.properties files in the Delta folder and save the list to a temp file (delta-files.txt).
2. Find all *.properties files in the Original folder and save the list to a temp file (original-files.txt).
3. Get the list of files common to both folders and loop over them.
4. For each file, read each line from the Delta folder's property file (delta-line="account.org.com.email=New-Email") and split the line on the "=" delimiter into two string variables
(delta-line-string1=account.org.com.email; delta-line-string2=New-Email).
5. Read each line from the Original folder's property file (orig-line="account.org.com.email=Old-Email") and split it the same way
(orig-line-string1=account.org.com.email; orig-line-string2=Old-Email).
6. If delta-line-string1 == orig-line-string1, update $orig-line with $delta-line,
i.e.:
if account.org.com.email == account.org.com.email then replace
account.org.com.email=Old-Email in Original_Folder/1.properties with
account.org.com.email=New-Email
Once the loop finishes all lines in a file, it moves on to the next file, and continues until all the common files in the folder are done.
For looping I used for loops, for splitting lines I used awk, and for replacing content I used sed.
Overall it is working fine, but it takes a long time (about 4 minutes per file), because for every line it goes through three loops, splits the line, finds the variable in the other file, and replaces the line.
I am wondering if there is any way to reduce the loops so that the script executes faster.
With paste and awk:
File 2:
$ cat /tmp/l2
account.org.com.email=Old-Email
account.value.range=False
currency.country=Sweden
range.list.type=String
File 1:
$ cat /tmp/l1
account.org.com.email=New-Email
account.value.range=True
The command + output:
paste /tmp/l2 /tmp/l1 | awk '{print $NF}'
account.org.com.email=New-Email
account.value.range=True
currency.country=Sweden
range.list.type=String
Or with a single awk command, if sorting is not important:
awk -F'=' '{arr[$1]=$2}END{for (x in arr) {print x"="arr[x]}}' /tmp/l2 /tmp/l1
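Scaled up to the two folders from the question, that one-liner could be driven by a loop like this sketch (overwriting the original file in place is an assumption about the desired behaviour, and line order is not preserved, as noted above):
for delta in Delta_Folder/*.properties; do
    orig="Original_Folder/${delta##*/}"
    [ -f "$orig" ] || continue         # only merge files present in both folders
    awk -F'=' '{arr[$1]=$2} END {for (x in arr) print x "=" arr[x]}' \
        "$orig" "$delta" > "$orig.tmp" && mv "$orig.tmp" "$orig"
done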
I think your two main options are:
Completely reimplement this in a more featureful language, like perl.
While reading the delta file, build up a sed script. For each line of the delta file, you want a sed instruction similar to:
s/account.org.com.email=.*$/account.org.com.email=value_from_delta_file/g
That way you don't loop through the original files a bunch of extra times. Don't forget to escape & / and \ as mentioned in this answer.
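Generating that sed script mechanically might look like the following sketch; it assumes keys and values contain none of the characters (&, / and \) that would need escaping, and that dots in the keys matching any character on the pattern side is acceptable here:
# Build one substitution per delta line, then apply them all in a single pass.
awk -F'=' '{print "s/^" $1 "=.*$/" $0 "/"}' Delta_Folder/1.properties > merge.sed
sed -f merge.sed Original_Folder/1.properties > merged.properties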
Is using a database at all an option here?
Then you would only have to write code for extracting data from the Delta files (assuming that can't be replaced by a database connection).
It just seems like this is going to keep getting more complicated and slower as time goes on.
