Read a file in a Bash script - bash

I have a file on my file system that I want to read in a Bash script. I only need selected values from it, and I don't want to read the whole file because it is very large. The file has the following format:
Name=TEST
Add=TEST
LOC=TEST
The file contains data like the above. From it, I want to get only the Add value into a variable. Could you please suggest how I can do this?
This is how I currently read the file:
file="data.txt"
while IFS= read line
do
# display $line or do something with $line
echo "$line"
done < "$file"

Use the right tool meant for the job, Awk in this case to speed things up!
dateValue="$(awk -F"=" '$1=="Add"{print $2; exit}' "$file")"
printf "%s\n" "$dateValue"
TEST
The idea is to split input lines on = as the delimiter. The awk logic checks whether the $1 field equals Add and, if so, prints the corresponding value in $2.
The exit after print is optional. It stops processing as soon as the first Add line is found, which speeds things up if the file is as large as you have indicated.
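As a small illustration of the early exit, here is a sketch that runs the same awk program against a throwaway file. The sample keys and values are made up for the demonstration; the second Add line is there only to show that awk stops at the first match.

```bash
#!/usr/bin/env bash
# Hypothetical sample data; only the key=value layout comes from the question.
file=$(mktemp)
printf '%s\n' 'Name=alice' 'Add=2024-01-15' 'LOC=paris' 'Add=ignored' > "$file"

# Split on "=", print the value of the first Add line, then stop reading.
dateValue=$(awk -F= '$1 == "Add" { print $2; exit }' "$file")
printf '%s\n' "$dateValue"

rm -f "$file"
```

Because of the exit, the later Add=ignored line is never examined.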

You could rewrite your loop this way, notice the break after you got your line:
while IFS='=' read -r key value; do
if [[ $key == "Add" ]]; then
# your logic
break
fi
done < "$file"
If your intention is to just get the very first occurrence of "Add=", then you could use grep this way:
value=$(grep -m 1 '^Add=' "$file" | cut -f2 -d=)
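A sketch of the loop approach wrapped in a reusable function, assuming the file really is one key=value pair per line. The name get_value is hypothetical. The function compares the key field, prints the matching value, and returns immediately so the rest of the file is not read.

```bash
#!/usr/bin/env bash
# get_value KEY FILE (hypothetical helper): print the value of the first
# KEY=... line in FILE and stop reading; return 1 if KEY is not found.
get_value() {
  local key value
  while IFS='=' read -r key value; do
    if [ "$key" = "$1" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done < "$2"
  return 1
}

# Usage: add=$(get_value Add data.txt)
```

Since read assigns everything after the first = to value, a value that itself contains = is kept whole, whereas cut -f2 -d= would keep only the part before the next =.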

Related

Bash script output not replacing source file content

I would be grateful for education on my question. The goal is to use the filename of an image to create alternate text in Markdown image references for multiple instances in a large number of Markdown files. (I realize from an accessibility standpoint this is a far-from-optimal practice to create alternate text - this is a temporary solution.) For example, I would like:
![](media/image-dir/image-file-with-hyphens.png)
to become
![image file with hyphens](media/image-dir/image-file-with-hyphens.png)
Current script:
for file in *.md; do
while read -r line; do
if [[ $line =~ "![]" ]]; then
# CREATE ALTERNATIVE TEXT OUT OF IMAGE FILENAME
# get text after last instance of / in filepath
processLine=`echo $line | grep -oE "[^\/]+$"`
# remove image filetypes
processLine2=`echo $processLine | sed 's/.png)//g'`
processLine3=`echo $processLine2 | sed 's/.jpg)//g'`
# remove numbers at end of filename
processLine4=`echo $processLine3 | sed 's/[0-9+]$//g'`
# remove hyphens in filename
processLine5=`echo $processLine4 | sed 's/-/ /g'`
# PUT ALTERNATIVE TEXT IN IMAGE FILEPATH
# trim ![ off front of original line
assembleLine2=`echo $line | sed 's/!\[//g'`
# string together `![` + filename without hyphens + rest of image filepath
assembleLine3='!['"$processLine5"''"$assembleLine2"''
fi
done < $file > $file.tmp && mv $file.tmp $file
done
As it stands, the file comes out blank.
If I add echo $file before while read -r line, the file maintains its original state, but all image references are as follows:
Text
![](media/image-dir/image-file-with-hyphens.png)
![image file with hyphens](media/image-dir/image-file-with-hyphens.png)
Text
If I remove > $file.tmp && mv $file.tmp $file, the console returns nothing.
I've never encountered this in any other Bash script and, at least from the terms I'm using, am not finding the right help in any resource. If anyone is able to help me understand my errors or point me in the right direction, I would be grateful.
If your aim is to replace
![](media/image-dir/image-file-with-hyphens.png)
with
![image file with hyphens](media/image-dir/image-file-with-hyphens.png),
then you can try this sed
for file in *.md; do
  sed -E 's/(\S+\[)(\].\S+.)/\1image file with hyphens\2/' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done
The block of code calculates an assembleLine3, but you don't have echo "${assembleLine3}".
Nothing is written to stdout or to a file.
When you are debugging (or trying to ask a minimal question), remove the first while-loop and most of the processing. Testing the following code is easy:
while read -r line; do
if [[ $line =~ "![]" ]]; then
processLine=`echo $line | grep -oE "[^\/]+$"`
fi
done < testfile.md
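To make the missing output concrete, here is a sketch of a filter that writes every line back out, rewritten or not. It assumes each image reference sits alone on its own line, and it uses parameter expansion instead of the grep/sed chain. The function name add_alt_text is hypothetical, and the step that strips trailing digits from the filename is left out.

```bash
#!/usr/bin/env bash
# add_alt_text (hypothetical helper): read Markdown on stdin, write it to
# stdout, filling in empty alt text from the image filename.
add_alt_text() {
  local line path name alt
  local re='^!\[]\((.+)\)$'          # a line that is exactly ![](path)
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      path=${BASH_REMATCH[1]}
      name=${path##*/}               # text after the last /
      name=${name%.*}                # drop the file extension
      alt=${name//-/ }               # hyphens become spaces
      printf '![%s](%s)\n' "$alt" "$path"
    else
      printf '%s\n' "$line"          # unchanged lines must be echoed too
    fi
  done
}

# Usage: add_alt_text < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
```

The else branch matters as much as the rewrite: without it, every line that is not an image reference would vanish from the output file.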

parse and echo string in a bash while loop

I have a file with this structure:
picture1_123.txt
picture2_456.txt
picture3_789.txt
picture4_012.txt
I wanted to get only the first segment of the file name, that is, picture1 to picture4.
I first used the following code:
cat picture | while read -r line; do cut -f1 -d "_"; echo $line; done
This returns the following output:
picture2
picture3
picture4
picture1_123.txt
This error got corrected when I changed the code to the following:
cat picture | while read line; do s=$(echo $line | cut -f1 -d "_"); echo $s; done
picture1
picture2
picture3
picture4
Why, in the first version:
1. are the lines printed in a different order than in the original file?
2. is no operation done on picture1_123.txt, so that picture1 is not printed?
Thank you!
What Was Wrong
Here's what your old code did:
On the first (and only) iteration of the loop, read line read the first line into line.
The cut command read the entire rest of the file, and wrote the results of extracting only the desired field to stdout. It did not inspect, read, or modify the line variable.
Finally, your echo $line wrote the first line in entirety, with nothing being cut.
Because all input had been consumed by cut, nothing remained for the next read line to consume, so the loop never ran a second time.
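The effect can be checked by counting loop iterations. This sketch uses a temporary copy of the four sample lines and compares a loop whose body lets cut read the loop's own stdin with one that feeds cut through a pipe. The variable names are made up for the demonstration.

```bash
#!/usr/bin/env bash
input=$(mktemp)
printf '%s\n' picture1_123.txt picture2_456.txt picture3_789.txt picture4_012.txt > "$input"

# cut inherits the loop's stdin and consumes the remaining three lines.
iterations=0
while read -r line; do
  iterations=$((iterations + 1))
  cut -f1 -d_ > /dev/null
done < "$input"
echo "cut reading the loop's stdin: $iterations iteration(s)"

# Here cut reads only what the pipe hands it, so read sees every line.
fixed=0
while read -r line; do
  fixed=$((fixed + 1))
  printf '%s\n' "$line" | cut -f1 -d_ > /dev/null
done < "$input"
echo "cut reading a pipe: $fixed iteration(s)"

rm -f "$input"
```

The first loop runs once and the second runs four times, which matches the explanation above.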
How To Do It Right
The simple way to do this is to let read separate out your prefix:
while IFS=_ read -r prefix suffix; do
echo "$prefix"
done <picture
...or to just run nothing but cut, and not use any while read loop at all:
cut -f1 -d_ <picture

Evaluating a log file using a sh script

I have a log file with a lot of lines with the following format:
IP - - [Timestamp Zone] 'Command Weblink Format' - size
I want to write a script.sh that gives me the number of times each website has been clicked.
The command:
awk '{print $7}' server.log | sort -u
should give me a list which puts each unique weblink in a separate line. The command
grep 'Weblink1' server.log | wc -l
should give me the number of times the Weblink1 has been clicked. I want a command that converts each line created by the Awk command above to a variable and then create a loop that runs the grep command on the extracted weblink. I could use
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "Text read from file: $line"
done
(source: Read a file line by line assigning the value to a variable) but I don't want to save the output of the Awk script in a .txt file.
My guess would be:
while IFS='' read -r line || [[ -n "$line" ]]; do
grep '$line' server.log | wc -l | ='$variabel' |
echo " $line was clicked $variable times "
done
But I'm not really familiar with connecting commands in a loop, as this is my first time. Would this loop work and how do I connect my loop and the Awk script?
Shell commands in a loop connect the same way they do without a loop, and you aren't very close. But yes, this can be done in a loop if you want the horribly inefficient way for some reason such as a learning experience:
awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do
n=$(grep -c "$line" server.log)
echo "$line" clicked $n times
done
# you only need the read || [ -n ] idiom if the input can end with an
# unterminated partial line (is ill-formed); awk print output can't.
# you don't really need the IFS= and -r because the data here is URLs
# which cannot contain whitespace and shouldn't contain backslash,
# but I left them in as good-habit-forming.
# in general variable expansions should be doublequoted
# to prevent wordsplitting and/or globbing, although in this case
# $line is a URL which cannot contain whitespace and practically
# cannot be a glob. $n is a number and definitely safe.
# grep -c does the count so you don't need wc -l
or more simply
awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do
echo "$line" clicked $(grep -c "$line" server.log) times
done
However if you just want the correct results, it is much more efficient and somewhat simpler to do it in one pass in awk:
awk '{n[$7]++}
END{for(i in n){
print i,"clicked",n[i],"times"}}' server.log |
sort
# or GNU awk 4+ can do the sort itself, see the doc:
awk '{n[$7]++}
END{PROCINFO["sorted_in"]="@ind_str_asc";
for(i in n){
print i,"clicked",n[i],"times"}}' server.log
The associative array n collects the values from the seventh field as keys, and on each line, the value for the extracted key is incremented. Thus, at the end, the keys in n are all the URLs in the file, and the value for each is the number of times it occurred.
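A runnable sketch of the one-pass count on a tiny made-up log. The three log lines and their URLs are invented for illustration and follow the common access-log layout, in which the requested path is the seventh whitespace-separated field.

```bash
#!/usr/bin/env bash
# Hypothetical sample log; only the field layout matters here.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
192.0.2.2 - - [10/Oct/2023:13:56:01 +0000] "GET /about.html HTTP/1.1" 200 512
192.0.2.1 - - [10/Oct/2023:13:57:12 +0000] "GET /index.html HTTP/1.1" 200 2326
EOF

# Count occurrences of field 7 in a single pass, then sort for stable output.
counts=$(awk '{ n[$7]++ } END { for (url in n) print url, "clicked", n[url], "times" }' "$log" | sort)
printf '%s\n' "$counts"

rm -f "$log"
```

The pipe to sort is there because the order of a for (url in n) traversal is not defined by awk.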

Naming awk output in loop

I'm relatively new to the world of shell scripts so hopefully this won't be too difficult. I have a file (dirlist) with a list of directories. I want to
1. cat 'dirlist' with the path to each file
2. use a program called samtools to modify the file from dirlist
3. use awk to subset the samtools output on a variable chr17
4. write the output to a file that uses the 8th field of the directory, from 'dirlist' for naming
5. do this for all the files listed in dirlist
I think I have all the pieces here. Items 1-3 are working fine but the loop is simply naming the file "echo".
for i in `cat dirlist`; do samtools depth $i | awk '$1 == "chr17" {print $0}' echo $i | awk -F'[/]' '{print $8}'; done
Any help would be greatly appreciated
A native bash implementation (just one process, rather than starting an awk for every file) follows:
while IFS= read -r filename; do
while IFS= read -r line; do
if [[ $line = "chr17"[[:space:]]* ]]; then
IFS=/ read -r -a pieces <<<"$filename"
printf '%s\n' "${pieces[7]}"
fi
done < <(samtools depth "$filename")
done <dirlist
I think that's what you want to do
... | awk -v f="$i" 'BEGIN{split(f,fs,"/")} $1=="chr17" {print > fs[8]}'
the final file name will be generated from the original file name split by "/" and use only the 8th segment. Kind of unusual, perhaps needs some error handling.
not tested, caveat emptor...
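For completeness, here is a sketch of how the awk fragment could sit inside a loop over dirlist. The depth function is a stand-in so the sketch runs without samtools; replacing its body with samtools depth "$1" is the assumed real-world form. The name process_dirlist is hypothetical, and output files are written to the current directory.

```bash
#!/usr/bin/env bash
# Stand-in for `samtools depth FILE` (assumption, so the sketch runs anywhere).
# For real data, replace the body with: samtools depth "$1"
depth() { cat "$1"; }

# process_dirlist LIST (hypothetical helper): for each path in LIST, keep the
# chr17 rows and write them to a file named after the 8th "/"-separated field.
process_dirlist() {
  local path
  while IFS= read -r path; do
    depth "$path" |
      awk -v f="$path" 'BEGIN { split(f, fs, "/") } $1 == "chr17" { print > (fs[8]) }'
  done < "$1"
}

# Usage: process_dirlist dirlist
```

Note that for an absolute path the first field is empty, so the 8th field is the 7th directory name, and that no output file is created for an input without chr17 rows.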

How can I delete specific lines using awk/sed based on the contents of another file

How can I delete specific lines from a file based on line numbers that are contained in another file? I know how to delete specific lines by providing them on the command line, but I do not know how to delete them based on line numbers contained in another file. The file containing the line numbers is in the following format:
15768
15775
15777
15782
15784
15789
15791
15798
15800
15807
15809
15815
15817
15824
15826
There are 2073 lines in total that I need to remove. I've tried searching for how to do this, but I was not able to find an example similar to this.
Thanks for your help.
Assuming the line numbers to be deleted are in a file to-be-deleted and the data is in big-data-file, then, using Bash process substitution:
sed -f <(sed 's/$/d/' to-be-deleted) big-data-file > smaller-data-file
The inner sed 's/$/d/' command converts the line numbers into sed delete operations. The outer sed command reads those delete commands and applies them to the big data file.
Using awk (f1 holds the line numbers, f2 the data):
awk 'FNR==NR{a[$0];next} !(FNR in a)' f1 f2
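A small check of the awk one-liner on throwaway files, reusing the names to-be-deleted and big-data-file from the sed answer. The file contents are invented. While awk reads the first file, FNR equals NR and each line number is stored as an array key; for the second file, a line is printed only if its own number is not among those keys.

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf '%s\n' 2 4 > "$tmp/to-be-deleted"          # line numbers to drop
printf '%s\n' a b c d e > "$tmp/big-data-file"    # five sample lines

result=$(awk 'FNR==NR { a[$0]; next } !(FNR in a)' "$tmp/to-be-deleted" "$tmp/big-data-file")
printf '%s\n' "$result"

rm -r "$tmp"
```

One caveat of the FNR==NR idiom: if the first file is empty, the condition also holds for the second file, so nothing is printed.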
ed is the standard editor.
Here's a possibility to drive ed to do your edit (in place):
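#!/bin/bash
ed -s file < <(
    # delete from the highest line number down, so that earlier deletions
    # do not shift the lines that are still to be deleted
    while read -r line; do
        [[ $line =~ ^[[:digit:]]+$ ]] || continue
        printf "%d d\n" "$line"
    done < <(sort -rn lines)
    echo "wq"
)
This opens the file file with ed, reads the file lines that contains the line numbers (highest first, because ed renumbers the buffer after every deletion), checks that each line read is indeed a number, gives ed the command to delete that line, and when all is done asks ed to write and quit with wq.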
You might want to replace the [[ $line =~ ^[[:digit:]]+$ ]] || continue line by:
[[ $line =~ ^[[:digit:]]+$ ]] || { printf >&2 "*** WARNING: Line %s not deleted\n" "$line"; continue; }
so as to be warned when invalid lines are present in the file lines.
Make sure you read glenn jackman's comment:
I've heard some older implementations of ed do not accept wq as a single command: printf "%s\n" w q
YMMV.
