Find string from a file to another file in shell script - bash

I am new to shell scripting. Just wanna know how can I obtain the result I wanted with the following:
I have two files (FILE_A and FILE_B)
FILE_A contains:
09228606355,71295939,1,http://sun.net.ph/043xafj.xml,01000001C123000D30
09228505450,71295857,1,http://sun.net.ph/004xafk.xml,01000001C123000D30
FILE_B contains:
http://sun.net.ph/161ybfq.xml ,9220002354016,93111
http://sun.net.ph/004xafk.xml ,9220002354074,93111
If the URL (4th field) in FILE_A is present in FILE_B, the output will be:
09228505450,71295857,1,http://sun.net.ph/004xafk.xml,01000001C123000D30,9220002354074,93111
That is, it displays the whole line from FILE_A with the 2nd and 3rd fields of FILE_B appended.
I hope my question is clear. Thank you.

This might work for you (GNU sed):
sed -r 's/^\s*(\S+)\s*,(.*)/\\#^([^,]*,){3}\1#s#$#,\2#p/' fileB | sed -nrf - fileA
This builds a sed script from fileB and runs it against fileA. The second sed script is run in silent mode and only those lines that match the sed script are printed out.

Try this:
paste -d , A B | awk -F , '{if ($4==$6) print "match", $1,$2,$3,$4,$5,$7,$8;}'
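I removed the spaces in your file B so that the $4==$6 comparison works.
I use paste with , as the delimiter to join line N of A and line N of B into one composite line, which puts the URL from A in field 4 and the URL from B in field 6. awk then compares the two URLs and, if they match, prints the fields you care about. Note that this only compares lines at the same position in both files, so a URL that appears on a different line number in B will not be found.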
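If the matching URLs are not guaranteed to sit on the same line number in both files, a two-pass awk lookup is one possible alternative. The following is only a sketch under assumptions: the file names fileA and fileB, the temporary directory, and the array name extra are placeholders, and the sample data is copied from the question.

```shell
# Sample data copied from the question; the file names are placeholders.
tmp=$(mktemp -d)
cat > "$tmp/fileA" <<'EOF'
09228606355,71295939,1,http://sun.net.ph/043xafj.xml,01000001C123000D30
09228505450,71295857,1,http://sun.net.ph/004xafk.xml,01000001C123000D30
EOF
cat > "$tmp/fileB" <<'EOF'
http://sun.net.ph/161ybfq.xml ,9220002354016,93111
http://sun.net.ph/004xafk.xml ,9220002354074,93111
EOF

# First pass (NR == FNR, i.e. while reading fileB): remember fields 2 and 3,
# keyed by the URL with surrounding blanks stripped.
# Second pass (fileA): print lines whose 4th field is a known URL,
# with the remembered fields appended.
joined=$(awk -F, '
    NR == FNR {
        url = $1
        gsub(/^[ \t]+|[ \t]+$/, "", url)
        extra[url] = $2 "," $3
        next
    }
    $4 in extra { print $0 "," extra[$4] }
' "$tmp/fileB" "$tmp/fileA")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

Because the URL is used as an exact array key rather than as a regular expression, the dots in it are not treated as wildcards.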

Related

Remove filler spaces from blank lines in linux script

I am trying to work on a bash script that will take files from one github repo and copy them over to another one.
I have this mostly working however 1 file I am trying to move over has spaces on all of its blank lines like so:
FROM metrics_flags ORDER BY DeliveryDate ASC
)
SELECT * FROM selected;
""";
Notice how it's not just a blank line: there are actually 10-20 spaces on that blank line between the 2 blocks of code.
Is there some unix command that can parse the file and remove the spaces (but keep the blank line)?
I tried
awk 'NF { $1=$1; print }' file.txt
and
sed -e 's/^[ \t]*//' file.txt
with no success.
Unless you change the delimiters, awk splits each record (line) into whitespace-separated fields. When a field is assigned to, awk rebuilds the record from its fields, joined by the output field separator (a single space by default), so the original whitespace between and around the fields is discarded.
The 'trick' is to get awk to re-evaluate the line by setting any field (even an empty one) to itself:
awk '{$1=$1; print}' test.txt
will turn whitespace-only lines into empty lines and write the file contents to stdout, where they can be redirected to a file if required. Note that it also strips leading and trailing whitespace from every other line and collapses runs of whitespace between words into a single space.
I don't know why you used NF as a pattern in your awk attempt, but the similar approach without it, as above, keeps the blank lines.
Edit: after a quick experiment, I think what was happening with your awk attempt was that the NF pattern made awk skip lines with no fields entirely, so the blank lines were dropped. Removing that pattern allows the now-empty lines to be printed.
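To make the difference between these variants concrete, here is a small sketch that runs them on a made-up three-line sample (an indented line, a line of spaces only, and a plain line). The sample text and variable names are assumptions for illustration only.

```shell
# Hypothetical sample: indented line, whitespace-only line, plain line.
sample=$(printf '    SELECT 1\n          \nFROM t\n')

# NF as a pattern: lines with no fields are skipped, so the blank line vanishes.
nf_out=$(printf '%s\n' "$sample" | awk 'NF { $1 = $1; print }')

# No pattern: the blank line survives (now truly empty),
# but the indentation of the first line is lost as well.
awk_out=$(printf '%s\n' "$sample" | awk '{ $1 = $1; print }')

# Anchored sed: only whitespace-only lines are touched; indentation is kept.
sed_out=$(printf '%s\n' "$sample" | sed 's/^[[:space:]]*$//')

printf '%s\n---\n%s\n---\n%s\n' "$nf_out" "$awk_out" "$sed_out"
```

If the file is source code whose indentation matters, the anchored substitution is probably the safer choice of the three.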
This should do what you describe, removing the whitespace only from lines that contain nothing else:
sed -E 's|^\s+$||' file
The -E (extended regex) option is required for the unescaped + in \s+, which means one or more whitespace characters (\s itself is a GNU extension). I think you might have accidentally used a lowercase -e.
If you like the output, you can add -i to apply the edit to your file.
This is an example of using awk to achieve the same:
awk '{gsub(/^\s+$/, "")}; { print }' file
To apply it, use -i inplace:
awk -i inplace '{gsub(/^\s+$/, "")}; { print }' file
I tested this on Ubuntu 22.04 with GNU sed 4.8 and GNU awk 5.1.0
Odd ...
sed -i 's/^[[:space:]]*$//g' file.txt
definitely works for me; I don't see why your sed version wouldn't, though.
On MacOS, this works (TESTED):
sed -E -i "" 's/^[[:space:]]*$//g' file.txt
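Since the -i option takes its backup suffix differently in GNU sed and in the sed shipped with macOS, one way to sidestep the difference is to write to a temporary file and copy the result back. This is only a sketch: the function name strip_blank_padding and the demo file are made up for illustration.

```shell
# Hypothetical helper: empty out whitespace-only lines in the file given as $1
# without relying on sed -i.
strip_blank_padding() {
    tmp=$(mktemp) || return 1
    if sed 's/^[[:space:]]*$//' "$1" > "$tmp"; then
        # Copy the contents back instead of using mv, so the original
        # file keeps its permissions.
        cat "$tmp" > "$1"
    fi
    rm -f "$tmp"
}

# Demo on a throwaway file.
demo=$(mktemp)
printf 'SELECT 1\n          \n  FROM t\n' > "$demo"
strip_blank_padding "$demo"
result=$(cat "$demo")
rm -f "$demo"
printf '%s\n' "$result"
```

The [[:space:]] class is used here instead of \s because it is part of POSIX and should behave the same in both sed implementations.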

Delete all strings that do not contain any uppercase in Bash

I need to delete from a file all the words that do not contain any uppercase in bash.
I use the sed command but the output is the same as the input:
I tried sed 's/[^0-9]*//' file
Example input:
sjasd
ksaLK
asdn
Asdw
Output
ksaLK
Asdw
Could you please try the following?
sed -n '/[A-Z]/p' Input_file
As per @PaulHodges's comment, once you are happy with the results, add the -i option to the command above (sed -i ...) to make the changes in Input_file itself.
To make a file without those:
grep '[A-Z]' infile > outfile
This is a nondestructive way to check first. Then you could replace the old file with the new one.
If you really want to edit the existing file in place:
sed -i '/[A-Z]/!d' infile
This says to delete all lines that do not have a capital letter.
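As a quick check against the sample input from the question, the filter can be run on the four example words. This sketch makes two assumptions that are not in the answers above: it uses the POSIX class [[:upper:]] instead of [A-Z], and it sets LC_ALL=C, because in some locales a range such as [A-Z] may also match lowercase letters.

```shell
# Sample words copied from the question.
kept=$(printf '%s\n' sjasd ksaLK asdn Asdw | LC_ALL=C grep '[[:upper:]]')
printf '%s\n' "$kept"
```

Only the lines containing at least one uppercase letter are printed; the input itself is not modified.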

Making bash output a certain word from a .txt file

I have a question on Bash:
Like the title says, I require bash to output a certain word, depending on where it is in the file. In my explicit example I have a simple .txt file.
I already found out that you can count the number of words within a file with the command:
wc -w < myFile.txt
An output example would be:
78501
There certainly is also a way to make "cat" to only show word number x. Something like:
cat myFile.txt | wordno. 3125
desired-word
Notice, that I will welcome any command, that gets this done, not only cat.
Alternatively or in addition, I would be happy to know how you can make certain characters in a file show, based on their place in it. Something like:
cat myFile.txt | characterno. 2342
desired-character
I already know how you can achieve this with a variable:
a="hello, how are you"
echo ${a:9:1}
w
The only problem is that a variable can only be so long: if it has to hold a whole .txt file, this won't work.
I look forward to your answers!
You could use awk for this job: it splits its input at whitespace and prints field number wordnumber. tr is used to turn the newlines into spaces, so that the whole file becomes a single record without gluing the last word of one line to the first word of the next:
tr '\n' ' ' < myFile.txt | awk -v wordnumber=5 '{ print $wordnumber }'
And if you want, for example, the 5th character, you could do it like this:
head -c 5 myFile.txt | tail -c 1
Since you have not shown a sample of the input or the expected output, I couldn't test this, but awk can do it along these lines:
awk 'FNR==1{print substr($0,2342,1);next}' Input_file
Here FNR==1 restricts awk to the first line, and substr($0,2342,1) takes 1 character starting at position 2342; increase the last argument to take more characters. Note that this only looks at the first line, so it assumes the character you want is on that line.
With gawk:
awk 'BEGIN{RS="[[:space:]]+"} NR==12345' file
or
gawk 'NR==12345' RS="[[:space:]]+" file
I'm setting the record separator to a sequence of whitespace characters, which includes newlines, and then printing the 12345th record.
To improve the average performance you can exit the script once the match is found:
gawk 'BEGIN{RS="[[:space:]]+"}NR==12345{print;exit}' file
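If gawk is not available, the same idea can be sketched with the default field splitting that any POSIX awk provides: walk over the fields of each line and stop at the n-th one. The function name nth_word and the demo file below are made up for illustration.

```shell
# Hypothetical helper: print word number $1 (1-based) of file $2.
nth_word() {
    awk -v n="$1" '
        {
            for (i = 1; i <= NF; i++)
                if (++count == n) { print $i; exit }
        }
    ' "$2"
}

# Demo on a throwaway file with leading blanks and an empty line.
demo=$(mktemp)
printf '  alpha beta\ngamma\n\n delta epsilon\n' > "$demo"
third=$(nth_word 3 "$demo")
fourth=$(nth_word 4 "$demo")
rm -f "$demo"
printf '%s\n%s\n' "$third" "$fourth"
```

Because awk reads the file line by line and exits as soon as the word is found, the whole file never has to fit into a shell variable.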

Bash script delete a line in the file

I have a file, which has multiple lines.
For example:
a
ab#
ad.
a12fs
b
c
...
I want to use sed or awk to delete a line if it includes symbols or numbers (for example, I want to delete the ab#, ad., a12fs ... lines).
In other words, I just want to keep the lines that contain only letters ([a-z] and [A-Z]).
I know how to delete number line,
sed '/[0-9]/d' file.txt
but I do not know how to delete symbols lines.
Or is there an easy way to do that?
To keep blank lines:
grep '^[[:alpha:]]*$' file
sed '/[^[:alpha:]]/d' file
awk '/^[[:alpha:]]*$/' file
To remove blank lines:
grep -E '^[[:alpha:]]+$' file
sed -E -n '/^[[:alpha:]]+$/p' file
awk '/^[[:alpha:]]+$/' file
grep works well too and is even simpler: just do the reverse: keep the lines that interest you, which are way easier to define
grep -i '^[a-z]*$' file.txt
(This matches lines containing only letters, plus empty lines; the -i option makes grep case-insensitive.)
To remove empty lines as well (the -E option is needed for the unescaped +):
grep -iE '^[a-z]+$' file.txt
Be careful with Windows text files: each line ends with a carriage return, so depending on the grep version nothing may match (I tested on Windows here and it works).
But just in case:
grep -iP '^[a-z]*\r?$' file.txt
(Note the -P option, which enables Perl regular expressions; without it \r is not recognized.)
You can use this sed (letters only, since lines containing numbers should be deleted too):
sed '/^[A-Za-z]\+$/!d' file
(OR)
sed '/[^A-Za-z]/d' file
$ awk '!/[^[:alpha:]]/' file.txt
a
b
c
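The difference between the * and + variants only shows up on empty lines, which the sample in the question does not contain. The sketch below adds one empty line to that sample (an assumption made purely for illustration) and runs both grep forms.

```shell
# Sample from the question, plus one empty line added for illustration.
sample=$(printf '%s\n' a 'ab#' 'ad.' a12fs '' b c)

# * allows zero letters, so the empty line is kept.
with_blank=$(printf '%s\n' "$sample" | grep '^[[:alpha:]]*$')

# + requires at least one letter; it needs -E (or \+ with GNU grep).
no_blank=$(printf '%s\n' "$sample" | grep -E '^[[:alpha:]]+$')

printf '%s\n---\n%s\n' "$with_blank" "$no_blank"
```

In both cases the lines containing #, . or digits are dropped, because the pattern is anchored at both ends and allows only letters in between.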

Shell script - How do I insert data into a separate file at a specific line?

In the following shell script, how do I insert ${Today} into a separate existing file, index.html, at line 4? (Lines 1-3 in index.html already have some code. Line 4 is empty. Lines 5 to the end of the file have some HTML code.)
#!/bin/sh
Today=$(date "+%Y.%m.%d-%H.%M.%S")
#insert ${Today} into a separate existing file (index.html) in line 4
#<to-do>
I'd use awk for this:
awk 'NR==4 {print strftime("%Y.%m.%d-%H.%M.%S", systime())} 1' file
You can also pass in a variable if you don't want to generate the date string inside awk:
Today=$(date "+%Y.%m.%d-%H.%M.%S")
awk -v today="$Today" 'NR==4 {print today} 1' file
The sed utility can also put text on a specific line. Strictly speaking, the command below replaces rather than inserts: it overwrites whatever it finds on line 4.
Today=$(date "+%Y.%m.%d-%H.%M.%S")
sed -i -e "4s/^.*$/$Today/" index.html
The -i argument tells sed to edit in place--it effectively overwrites the input file. I think this option makes sed a better choice than awk for your problem. For testing, remove the -i argument, and it will write to stdout instead.
If you want this to work only if line 4 is a blank line (no whitespace, no characters), use this instead.
Today=$(date "+%Y.%m.%d-%H.%M.%S")
sed -i -e "4s/^$/$Today/" index.html
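For systems where sed -i is unavailable or behaves differently, the same effect can be sketched with awk and a temporary file. The demo page contents and the variable names page and tmp are assumptions for illustration; in the real script the target would be index.html. Passing the date with -v means it is inserted as data rather than spliced into a sed expression.

```shell
# Throwaway stand-in for index.html: lines 1-3 have code, line 4 is empty.
page=$(mktemp)
printf '%s\n' '<html>' '<body>' '<p>' '' '</p>' '</body>' > "$page"

Today=$(date "+%Y.%m.%d-%H.%M.%S")

# Replace line 4 with the date, but only if it is really empty;
# every other line is printed unchanged.
tmp=$(mktemp)
awk -v today="$Today" '
    NR == 4 && $0 == "" { print today; next }
    { print }
' "$page" > "$tmp" && cat "$tmp" > "$page"
rm -f "$tmp"

line4=$(sed -n '4p' "$page")
total=$(wc -l < "$page")
rm -f "$page"
printf 'line 4 is now: %s\n' "$line4"
```

The number of lines stays the same, because the empty line 4 is replaced rather than pushed down.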
