I know some people may say the solution is sed, but it didn't work for me.
I read a variable with read var, and I want to check whether its value exists in a column I specify in my file. If it doesn't, the script should keep asking "Please enter a valid code"; if it does, it should delete that line. Thanks.
CODE
read var
sed -i '/$var/d' file.txt
And I want to add some sort of check that confirms whether the user entered a valid code.
The structure of the file is
code;Name;Surname
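Single quotes stop the shell from expanding $var, so sed never sees its value. There are no spaces or special characters to protect in this sed script, so the single quotes can be dropped and only the variable needs double quotes: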
read var
sed -i /"$var"/d file.txt
And a demo -- make a list from 1 to 3, remove 2:
seq 3 > three.txt; var=2; sed -i /"$var"/d three.txt ; cat three.txt
Outputs:
1
3
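Note that /"$var"/ matches the value anywhere on the line, not only in the code column. A minimal sketch that anchors the match to the first ;-separated column, assuming the code contains no regex metacharacters or slashes. The sample records and the scratch directory are made up for the demo, and it writes to a temporary file instead of relying on sed -i:

```shell
# Sample data in a scratch directory (hypothetical records).
dir=$(mktemp -d)
printf '%s\n' '12;Ann;Lee' '2;Bob;Ray' '3;Al2;Doe' > "$dir/file.txt"

var=2
# ^ anchors to the start of the line and ; ends the first column,
# so "12;..." and "3;Al2;..." are left alone.
sed "/^$var;/d" "$dir/file.txt" > "$dir/file.tmp" && mv "$dir/file.tmp" "$dir/file.txt"

cat "$dir/file.txt"
```

Only the 2;Bob;Ray row is removed here; with the unanchored pattern all three rows would match.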
The following uses awk to search for and remove lines whose first column is $code. If a line is removed, awk exits successfully and break is called.
file="input_file"
while :; do
echo "Enter valid code:"
read -r code
[ -n "$code" ] || continue
awk -F';' -v c="$code" '$1 == c {f=1;next}1;END{exit(f?0:1)}' \
"$file" > "$file.out" && break
done
mv "$file.out" "$file"
This will continue to ask for a code until the user enters one that appears in the first column of $file, at which point the loop is broken.
Then $file.out is renamed to $file. Technically $file.out is recreated on every attempt.
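The same awk test can be wrapped in a function so it can be tried without typing at a prompt. This is only a sketch of the idea above: delete_code, the .out suffix handling, and the sample file are illustrative names, not part of the original answer.

```shell
# delete_code FILE CODE: remove rows whose first ;-separated field is CODE.
# Returns 0 if at least one row was removed, 1 otherwise (file left untouched).
delete_code() {
    tmp=$1.out
    if awk -F';' -v c="$2" '$1 == c {f=1; next} 1; END {exit(f ? 0 : 1)}' "$1" > "$tmp"
    then
        mv "$tmp" "$1"
    else
        rm -f "$tmp"
        return 1
    fi
}

dir=$(mktemp -d)
printf '%s\n' 'a1;Ann;Lee' 'b2;Bob;Ray' > "$dir/codes.txt"

delete_code "$dir/codes.txt" zz || echo "zz: not a valid code"
delete_code "$dir/codes.txt" a1 && echo "a1: deleted"
cat "$dir/codes.txt"
```

An interactive loop could then be written as until delete_code "$file" "$code"; do ...; done, re-reading $code on each failure.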
You can have $var or ${var} expanded with
read var
sed -i '/'${var}'/d' file.txt
But what will happen when $var has a space? Nothing good, so use double quotes as well:
read var
sed -i '/'"${var}"'/d' file.txt
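A small demonstration of the difference, assuming a made-up three-line file and a value that contains a space. It writes to a second file rather than using -i so the input stays intact:

```shell
dir=$(mktemp -d)
printf '%s\n' 'keep me' 'drop this line' 'keep me too' > "$dir/file.txt"

var='drop this'
# With "${var}" quoted, sed receives the single script /drop this/d.
sed '/'"${var}"'/d' "$dir/file.txt" > "$dir/result.txt"
cat "$dir/result.txt"

# Unquoted, the shell splits the script into '/drop' and 'this/d' and sed fails.
if ! sed '/'${var}'/d' "$dir/file.txt" > /dev/null 2>&1; then
    echo "unquoted expansion failed"
fi
```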
I was trying with awk on the file below:
$ cat test.txt
code;Name;Surname
xyz;n1;s1
abc;dd;ff
xyz;w;t
abc;ft;op
It prints the lines that are going to be deleted, but I could not figure out how to delete them from within awk after printing the info.
$ var=xyz; awk -v var="$var" -F ";" '{ if ($1 == var ) print "FOUND " var " And Going to delete the line" NR " " $0 ; }' test.txt
FOUND xyz And Going to delete the line2 xyz;n1;s1
FOUND xyz And Going to delete the line4 xyz;w;t
The command below displays the info and then deletes the lines, but it takes two passes over the input file.
$ var=xyz; awk -v var="$var" -F ";" '{ if ($1 == var ) print "FOUND " var " And Going to delete the line" NR " " $0 ; }' test.txt && awk -v var="$var" -F ";" '($1 != var)' test.txt > _tmp_ && mv _tmp_ test.txt
FOUND xyz And Going to delete the line2 xyz;n1;s1
FOUND xyz And Going to delete the line4 xyz;w;t
New file after deletion:
$ cat test.txt
code;Name;Surname
abc;dd;ff
abc;ft;op
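The report and the deletion can probably be combined into one pass by sending the messages to stderr and the surviving rows to stdout. A sketch under that assumption, using the sample data above in a scratch directory; the message wording is slightly changed and "/dev/stderr" relies on the awk in use supporting that name (gawk, mawk and the one true awk do):

```shell
dir=$(mktemp -d)
printf '%s\n' 'code;Name;Surname' 'xyz;n1;s1' 'abc;dd;ff' 'xyz;w;t' 'abc;ft;op' > "$dir/test.txt"

var=xyz
# Matching rows are reported on stderr and skipped; all other rows go to stdout.
awk -v var="$var" -F ';' '
    $1 == var { print "FOUND " var ", deleting line " NR ": " $0 > "/dev/stderr"; next }
    { print }
' "$dir/test.txt" > "$dir/_tmp_" && mv "$dir/_tmp_" "$dir/test.txt"

cat "$dir/test.txt"
```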
I have a list of numbers in a file
cat to_delete.txt
2
3
6
9
11
and many txt files in one folder. Each file has tab delimited lines (can be more lines than this).
3 0.55667 0.66778 0.54321 0.12345
6 0.99999 0.44444 0.55555 0.66666
7 0.33333 0.34567 0.56789 0.34543
I want to remove the lines whose first number ($1 for awk) is in to_delete.txt and keep only the lines whose first number is not in to_delete.txt. The result should replace the old file.
Expected output
7 0.33333 0.34567 0.56789 0.34543
This is what I got so far, which doesn't remove anything;
for file in *.txt; do awk '$1 != /2|3|6|9|11/' "$file" > "$tmp" && mv "$tmp" "$file"; done
I've looked through so many similar questions here but still cannot make it work. I also tried grep -v -f to_delete.txt and sed -n -i '/$to_delete/!p'
Any help is appreciated. Thanks!
In awk:
$ awk 'NR==FNR{a[$1];next}!($1 in a)' delete file
Output:
7 0.33333 0.34567 0.56789 0.34543
Explained:
$ awk '
NR==FNR { # hash records in delete file to a hash
a[$1]
next
}
!($1 in a) # if $1 not found in record in files after the first, output
' delete files* # mind the file order
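To replace each file in place, the one-liner can be run once per file through a temporary file. A minimal sketch with made-up sample data; it assumes the delete list lives outside the data directory so the glob does not pick it up, and the directory layout and .tmp suffix are illustrative only:

```shell
dir=$(mktemp -d)
printf '%s\n' 2 3 6 9 11 > "$dir/to_delete.txt"
mkdir "$dir/data"
printf '3\t0.55667\t0.66778\n6\t0.99999\t0.44444\n7\t0.33333\t0.34567\n' > "$dir/data/a.txt"
printf '1\t0.1\t0.2\n11\t0.3\t0.4\n' > "$dir/data/b.txt"

for file in "$dir"/data/*.txt; do
    # The first file fills the lookup array; rows of the second file are
    # printed only when their first field is not a key in it.
    awk 'NR==FNR {a[$1]; next} !($1 in a)' "$dir/to_delete.txt" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
done

cat "$dir/data/a.txt" "$dir/data/b.txt"
```

Running awk once per file keeps NR==FNR true only for the delete list, which is what the "mind the file order" comment is about.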
My first idea was this:
printf "%s\n" *.txt | xargs -n1 sed -i "$(sed 's!.*!/^& /d!' to_delete.txt)"
printf "%s\n" *.txt - outputs the *.txt files, each on a separate line
| xargs -n1 - executes the following command once per line, passing the line content as an argument
sed -i - edit file in place
$( ... ) - command substitution
sed 's!.*!/^& /d!' to_delete.txt - for each line in to_delete.txt, prefix the line with /^ and suffix it with a space and /d. That way, from the list of numbers I get a list of regexes to delete, like:
/^2 /d
/^3 /d
/^6 /d
and so on. Which tells sed to delete lines matching the regex - line starting with the number followed by a space.
But I think awk would be simpler. You could do:
awk '$1 != 2 && $1 != 3 && $1 != 6 ... and so on ...'
but that would be longish, unreadable. It's easier to read the map from the file and then check if the number is in the array:
awk 'FNR==NR{ map[$1] } FNR!=NR && !($1 in map)' to_delete.txt "$file"
The FNR==NR is true only for the first file. So when we read it, we set the map[$1] (we "set" it, just so such element exists). Then FNR!=NR is true for the second file, for which we check if the first element is the key in the map. If it is not, the expression is true and the line gets printed out.
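All together (skipping to_delete.txt itself, since it also matches *.txt, and writing to a temporary file next to each input):
for file in *.txt; do [ "$file" = to_delete.txt ] && continue; awk 'FNR==NR{ map[$1] } FNR!=NR && !($1 in map)' to_delete.txt "$file" > "$file.tmp" && mv "$file.tmp" "$file"; done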
Here is an example file.txt:
This line contains ABC
This line contains DEF
This line contains GHI
and here the following list.txt:
contains ABC<TAB>ABC
contains DEF<TAB>DEF
Now I am writing a script that does the following for each line of the external file list.txt:
take the string from column 1 of list.txt and search for it in a third file, file.txt
if the search succeeds, return the string from column 2 of list.txt
So my output.txt is:
ABC
DEF
This is my code for grep/echo with putting the query/return strings manually:
if grep -i -q 'contains abc' file.txt
then
echo ABC >output.txt
else
echo -n
fi
if grep -i -q 'contains def' file.txt
then
echo DEF >>output.txt
else
echo -n
fi
I have about 100 search terms, which makes the task laborious if done manually. So how do I use while read line; do [commands]; done < list.txt together with the commands for column 1 and column 2 inside that script?
I would like to use simple grep/echo/awk commands if possible.
Something like this?
$ awk -F'\t' 'FNR==NR { a[$1] = $2; next } {for (x in a) if (index($0, x)) {print a[x]}} ' list.txt file.txt
ABC
DEF
For the lines of the first file (FNR==NR), read the key-value pairs into array a. Then, for the lines of the second file, loop through the array, check if the key is found on the line, and if so, print the stored value. index($0, x) tries to find the contents of x in the current line, $0. $0 ~ x would instead take x as a regex to match with.
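The difference between index() and ~ only shows up when a key contains regex metacharacters. A small illustration with a made-up key, a.c, whose dot matches any character when treated as a regex:

```shell
# The key "a.c" contains a dot, which is a wildcard in a regex.
out=$(printf '%s\n' 'xx abc xx' 'xx a.c xx' |
awk -v x='a.c' '{
    literal = index($0, x) ? "yes" : "no"   # fixed-string search
    regex   = ($0 ~ x)     ? "yes" : "no"   # x interpreted as a regex
    print $0 ": index=" literal " regex=" regex
}')
printf '%s\n' "$out"
```

The first line is reported with index=no regex=yes, the second with index=yes regex=yes, so index() is the safer choice when the search terms are plain text.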
If you want to do it in the shell, starting a separate grep for each and every line of list.txt, something like this:
while IFS=$'\t' read k v ; do
grep -qFe "$k" file.txt && echo "$v"
done < list.txt
read k v reads a line of input and splits it (based on IFS) into k and v.
grep -F takes the pattern as a fixed string, not a regex, and -q prevents it from outputting the matching line. grep returns true if any matching lines are found, so $v is printed if $k is found in file.txt.
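The question searched case-insensitively (grep -i) and wrote to output.txt, so a variant of the loop along those lines might look as follows. It is a sketch with sample files in a scratch directory; the third list entry is invented to show a non-match, and the tab is produced with printf because $'\t' is not available in every sh:

```shell
dir=$(mktemp -d)
printf '%s\n' 'This line contains ABC' 'This line contains DEF' 'This line contains GHI' > "$dir/file.txt"
printf 'contains abc\tABC\ncontains def\tDEF\ncontains xyz\tXYZ\n' > "$dir/list.txt"

tab=$(printf '\t')   # portable way to get a literal tab
while IFS=$tab read -r k v; do
    # -i ignores case, -F treats $k as a fixed string, -q suppresses output
    if grep -qiFe "$k" "$dir/file.txt"; then
        echo "$v"
    fi
done < "$dir/list.txt" > "$dir/output.txt"

cat "$dir/output.txt"
```

Redirecting the whole loop once avoids having to choose between > and >> for each term.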
Using awk and grep:
for text in `awk '{print $4}' file.txt `
do
grep "contains $text" list.txt |awk -F $'\t' '{print $2}'
done
Some data files are imported with header names on the first row, and others have no headers. The ones with headers always have "company" as the first field of the first row. To load them into the DB I need to get rid of that first row. So I need to write a .sh script that deletes the first row only in files whose first column of the first row is "company". I guess I need to combine awk with an if statement, but I don't know exactly how.
if head -n 1 input.csv | cut -f 1 -d ',' | grep -qx company
then tail -n +2 input.csv > output.csv
else
cp input.csv output.csv
fi
If you're sure the string "company" appears only as 1st field on headers, you can go this way
sed -e /^company,/d oldfile > newfile
supposing the separator is a comma.
Another solution :
if head -n 1 oldfile | grep -q "^company," ; then
sed -e 1d oldfile > newfile
else
cp oldfile newfile
fi
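The same test can also be done in a single awk program that only looks at the first row, so a data row that happens to start with company further down is kept. A sketch, assuming comma-separated input; strip_company_header is a name made up for this example:

```shell
# strip_company_header: copy stdin to stdout, dropping line 1 only when
# its first comma-separated field is exactly "company".
strip_company_header() {
    awk -F, 'NR == 1 && $1 == "company" { next } { print }'
}

with=$(printf '%s\n' 'company,city' 'Acme,Oslo' | strip_company_header)
without=$(printf '%s\n' 'Acme,Oslo' 'company,Rome' | strip_company_header)
printf '%s\n' "$with" "---" "$without"
```

The first input loses its header; the second is passed through unchanged, including the later company,Rome row.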
No if needed. Just do it straight forward as you stated your requirements:
strip_header_if_present() {
# Print the first line unless it starts with company:
IFS='' read -r first_line
echo "$first_line" | grep -v ^company,
# Now print the remaining lines:
cat
}
To use this shell function:
strip_header_if_present < input.csv > output.csv
I have the following file:
>A6NGG8_201_I_F
line2
>B1AK53_719_S_R
line4
>B1AK53_744_D_N
line5
>B7U540_205_R_H
line6
>B7U540_354_T_M
line7
where I want to print out all odd lines. I can do this by:
$ sed -n 1~2p file
>A6NGG8_201_I_F
>B1AK53_719_S_R
>B1AK53_744_D_N
>B7U540_205_R_H
>B7U540_354_T_M
and so I want to store the number in each line as a variable in bash, however I run into a problem - storing the result of sed puts the output all on one line:
#!/bin/bash
line1=$(sed -n 1~2p)
echo ${line1}
in which the output is:
>A6NGG8_201_I_F >B1AK53_719_S_R >B1AK53_744_D_N >B7U540_205_R_H >B7U540_354_T_M
so that when I do something like:
#!/bin/bash
line1=$(sed -n 1~2p)
pos=$(echo ${line1} | awk -F"[__]" 'NF>2{print $2}')
echo ${pos}
I get
201
where I of course want:
201
719
744
205
354
How do I store the result of sed as separate lines so that they are processed properly when piped into my awk statement? I see you can use the /a notation; however, when I tried sed -n '/1~2p/a' file, it did not work in my bash script. Thanks
As said in comments, you need to quote the variable to make this happen:
echo "${line1}"
instead of
echo ${line1}
However, you can directly say:
awk -F_ 'NR%2 && NF>2 {print $2}' file
This will process odd lines and, in them, print the 2nd _-separated field, but only if there are more than 2 fields.
From tripleee's answer I observe that a FASTA file can contain a different format. If so, I guess you will still want to get the ID in the lines starting with ">". This can be translated as:
awk -F_ '/^>/ && NF>2 {print $2}' file
See an example of how quoting preserves the format:
The file:
$ cat a
hello
bye
Read it into a variable:
$ var=$(< a)
echo without quoting:
$ echo $var
hello bye
Let's quote!
$ echo "$var"
hello
bye
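Applied to the question, quoting lets the stored headers be processed one per line. A minimal sketch in plain shell, assuming the headers are already in a variable (here filled with three of the sample headers instead of calling sed) and that the wanted number is always the second _-separated field:

```shell
# Header lines as they would come out of sed -n 1~2p (sample data from the question).
line1=$(printf '%s\n' '>A6NGG8_201_I_F' '>B1AK53_719_S_R' '>B1AK53_744_D_N')

# Quoting "$line1" keeps the newlines, so read sees one header per iteration.
pos=$(printf '%s\n' "$line1" | while IFS= read -r header; do
    rest=${header#*_}             # drop everything up to the first underscore
    printf '%s\n' "${rest%%_*}"   # keep what precedes the next underscore
done)

printf '%s\n' "$pos"
```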
If you are trying to get the header lines out of a FASTA file, your problem statement is wrong -- the data between the headers could be more than one line. You could simply do
sed -n '/^>/!d;s/^[^_]*_//;s/_.*//p' file.fasta
to get just the second underscore-delimited field out of each header line; or equivalently, in Awk,
awk -F _ '/^>/ { print $2 }' file.fasta
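To see why matching on ">" is more robust than taking odd lines, here is a small comparison on a made-up FASTA file whose first record has two sequence lines. The file name and sequences are invented for the demo:

```shell
dir=$(mktemp -d)
# Hypothetical FASTA file where the first record spans two sequence lines.
printf '%s\n' '>A6NGG8_201_I_F' 'MKV' 'LLA' '>B1AK53_719_S_R' 'MTT' > "$dir/demo.fasta"

by_header=$(awk -F_ '/^>/ { print $2 }' "$dir/demo.fasta")
by_parity=$(awk -F_ 'NR % 2 && NF > 2 { print $2 }' "$dir/demo.fasta")

printf 'header match:\n%s\n' "$by_header"
printf 'odd lines only:\n%s\n' "$by_parity"
```

The header match finds both 201 and 719, while the odd-line version misses 719 because that header sits on line 4.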
I would like to count the copies of each line in a txt file. I have tried many things until now, but none worked well. In my case the text has just one word on each line.
This was my last try
echo -n 'enter file for edit: '
read file
for line in $file ; do
echo 'grep -w $line $file'
done; <$file
For example:
input file
a
a
a
c
c
Output file
a 3
c 2
Thanks in advance.
$ sort < "$file" | uniq -c | awk '{print $2 " " $1}'
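The pipeline works because uniq -c only counts adjacent duplicates, so the input has to be sorted first, and the final awk swaps the count and the word. An alternative sketch that counts in awk alone, using the sample input from the question in a scratch directory (file names are illustrative):

```shell
dir=$(mktemp -d)
printf '%s\n' a a a c c > "$dir/words.txt"

# count[line] accumulates occurrences; sort makes the output order stable,
# since awk does not guarantee any order for "for (k in count)".
awk '{ count[$0]++ } END { for (k in count) print k, count[k] }' "$dir/words.txt" | sort > "$dir/counts.txt"

cat "$dir/counts.txt"
```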