Can someone walk through what this code is doing?
if [ -f "saved.txt" ]; then // What does -f do?
rm saved.txt
fi
in=$(echo "{query}" | tr -d "\\") // How does this work?
// What does | tr -d "\\" mean?
echo "$in" > saved.txt // Is this simply putting the
// value of $in into saved.txt?
The initial if statement tests whether saved.txt exists and is a regular file; that is what the -f file test operator checks. If it does, the file is removed.
The script then echoes the literal characters {query} and pipes them to the tr command. tr stands for translate; with -d it deletes every character that appears in the SET you give it. Per the man page, the sequence \\ inside a SET stands for a backslash, so tr -d "\\" deletes all backslashes from the input.
The result is stored in $in.
Finally, the value of in is written to saved.txt.
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard output.
-c, -C, --complement
first complement SET1
-d, --delete
delete characters in SET1, do not translate
-s, --squeeze-repeats
replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that
character
-t, --truncate-set1
first truncate SET1 to length of SET2
--help display this help and exit
--version
output version information and exit
SETs are specified as strings of characters. Most represent themselves. Interpreted sequences are:
\NNN character with octal value NNN (1 to 3 octal digits)
\\ backslash
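To see that deletion in isolation, here is a minimal sketch (the input string is made up for illustration):
$ echo 'one\two\three' | tr -d '\\'
onetwothree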
The first part tests if saved.txt exists before trying to remove it.
The second part copies the contents of query (I'm assuming a typo and that should be ${query}, not {query}) into in, minus any backslashes.
The third part, you are correct; it writes the value of in to the file saved.txt.
The text file is like this:
#एक
1के
अंकगणित8IU
अधोरेखाunderscore
$thatऔर
%redएकyellow
$चिह्न
अंडरस्कोर#_
The desired text file should look like this:
#
1
8IU
underscore
$that
%redyellow
$
#_
This is what I have tried so far, using awk:
awk -F"[अ-ह]*" '{print $1}' filename.txt
And the output that I am getting is:
#
1
$that
%red
$
and using awk -F"[अ-ह]*" '{print $1,$2}' filename.txt I get an output like this:
#
1 े
ं
ो
$that
%red yellow
$ ि
ं
Is there any way to solve this in a bash script?
Using perl:
$ perl -CSD -lpe 's/\p{Devanagari}+//g' input.txt
#
1
8IU
underscore
$that
%redyellow
$
#_
-CSD tells perl that the standard streams and any opened files are UTF-8 encoded. -l strips the newline from each input line and adds it back on output. -p loops over the input, printing each line to standard output after executing the script given by -e. If you want to modify the file in place, add the -i option.
The regular expression matches any codepoints assigned to the Devanagari script in the Unicode standard and removes them. Use \P{Devanagari} to do the opposite and remove the non-Devanagari characters.
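For instance, the inverted one-liner keeps only the Devanagari text. On the same input.txt it should produce (a sketch of the inverse, not part of the original question):
$ perl -CSD -lpe 's/\P{Devanagari}+//g' input.txt
एक
के
अंकगणित
अधोरेखा
और
एक
चिह्न
अंडरस्कोर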
Using awk you can do:
awk '{sub(/[^\x00-\x7F]+/, "")} 1' file
#
1
8IU
underscore
$that
%redyellow
$
#_
See the documentation on bracket expressions, https://www.gnu.org/software/gawk/manual/html_node/Bracket-Expressions.html, for [\x00-\x7F].
This range matches all values numerically between zero and 127, which is the defined range of the ASCII character set. The complemented character list [^\x00-\x7F] therefore matches any character that is not in the ASCII range.
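Note that sub() replaces only the first run of non-ASCII characters on each line, which suffices here because every sample line contains a single Devanagari run. If a line could contain several separate runs (an assumption beyond the question's sample data), use gsub() instead:
awk '{gsub(/[^\x00-\x7F]+/, "")} 1' file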
tr is a very good fit for this task:
LC_ALL=C tr -c -d '[:cntrl:][:graph:]' < input.txt
Setting LC_ALL=C selects the POSIX C locale, in which only the ASCII character set is classified.
It then instructs tr to delete (-d) the complement (-c) of [:cntrl:][:graph:], that is, every character that is neither a control character nor a visible (graphic) character. In the C locale, bytes outside the ASCII range fall into that complement, so all non-ASCII characters are discarded. (Plain spaces are also neither control nor graphic, so they are deleted too; the sample input contains none, so that is harmless here.)
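Run on the sample input.txt, this should reproduce the desired output from the question:
$ LC_ALL=C tr -c -d '[:cntrl:][:graph:]' < input.txt
#
1
8IU
underscore
$that
%redyellow
$
#_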
I have a small script that downloads a value from a web page.
Before anyone loses their mind because I am not using an HTML parser: besides the headers, the whole web page has only 3 lines of text between one pair of pre tags. I am just after the number values - that is it.
</head><body><pre>
sym
---
12300
</pre></body></html>
This is the script:
#!/bin/bash
wget -O foocounthtml.txt "http://foopage"
tr -d "\n" foocounthtml.txt > foocountnonewlines.txt
Anyhow the tr command is throwing an error.
tr: extra operand ‘foocounthtml.txt’
Only one string may be given when deleting without squeezing repeats.
Try 'tr --help' for more information.
Yes, I could use sed for in-place modification with the -i flag. However, I am perplexed by this tr error. Redirecting tr output works fine from the command line, but not in a script.
The tr command takes SETs of characters as its arguments rather than file names, and always reads its input from standard input. From the man page:
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard output.
...
SETs are specified as strings of characters. Most represent themselves. Interpreted sequences are:
So tr is expecting its input on standard input rather than as a file operand. You can simply pipe the file's contents to tr for the results you want:
cat foocounthtml.txt | tr -d "\n" > foocountnonewlines.txt
or, as @CharlesDuffy points out, it would be faster to read directly from the file:
tr -d "\n" < foocounthtml.txt > foocountnonewlines.txt
I've stored my data in a neckrev_dim.csv file, structured like the following:
subjectID,dim3,pixdim3
MR44825,405,0.625
I also have a separate subjects.csv, just containing all the subjectIDs:
MR44825
MR55843
Now I want to use this data in basic arithmetic operations using bash.
subjlist=subjects.csv
for subj in ` cat $subjlist `
do
dim3=$(grep -w '$subj' neckrev_dim.csv | cut -d ',' -f 2)
pixdim3=$(grep -w '$subj' neckrev_dim.csv | cut -d ',' -f 3)
total_length=$(($dim3*$pixdim3))
echo $total_length
done
This leads to the following error:
syntax error: operand expected (error token is "*")
I think the problem lies within the grep, but I can't figure it out.
Thanks in advance!
The main issue is that POSIX arithmetic does not support decimals, only integers.
You will have to use something else, like bc for non-integer arithmetic.
The other issue is that you are single-quoting $subj -- you should use double quotes so the variable gets expanded.
Try the following:
subjlist=subjects.csv
while read -r subj
do
dim3=$(grep -w "$subj" neckrev_dim.csv | cut -d ',' -f 2)
pixdim3=$(grep -w "$subj" neckrev_dim.csv | cut -d ',' -f 3)
echo "$dim3 * $pixdim3" | bc
done < "$subjlist"
Note, here bc is reading from standard input, so we just need to echo the arithmetic expression to bc.
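For example, with the numbers from the sample row in the question, bc handles the decimal multiplication that $(( )) cannot:
$ echo "405 * 0.625" | bc
253.125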
You need to change the single quotes to double quotes around the $subj. Single quotes won't expand the variable.
The solution below is designed to work accurately and more generally for different types of key values and different CSV lines, avoiding some of the limitations and failure modes of the other solutions.
Description of the code
Using single key fields read one per line from file keys.txt, search for the key in the first field in a CSV file generic.csv and do some floating-point (non-integer) math on the numbers in the other fields.
Performance enhancements:
If $key selects a unique row in the file, change XexitX below to exit so that awk doesn't keep reading the rest of the file unnecessarily; otherwise, delete XexitX and it will do all the lines matching that key.
If generic.csv is a large file, then sort it and replace the awk line with the look --binary line. This will replace a linear search with a binary search. Make sure you sort the whole file:
sort -o generic.csv generic.csv
Limitations:
The $key key must not contain backslashes or double quotes in the awk version. This could be fixed using sed -e 's/\\/&&/g' -e 's/"/\\"/g' on the field. The look --binary version doesn't care.
The generic.csv file must use commas only, no "quoted" CSV fields. This means no fields may contain commas.
The look --binary version does key prefix matching on the CSV lines, so you can't have a key that is a prefix of another, e.g. keys ABC and AB aren't distinct. The awk version doesn't have this problem.
Advantages of this over other solutions:
Reads the CSV only once per key, not multiple times.
The $key is matched exactly on the first field and not on any fields that might be added to the rest of the CSV line - no false matches. (The look --binary version does do prefix matching, so you can't have a key that is a prefix of another.)
The key field is a text field, not a regular expression, so it may contain special characters without any need to worry about escaping regular expression metacharacters to avoid errors.
No need to use grep or cut to separate fields; only one pipe, not three.
Can easily scale up to huge CSV files by using look --binary instead of awk.
while read -r key ; do
    # SEE NOTES: look --binary "$key" generic.csv \
    awk -F, "\$1 == \"$key\" { print ; XexitX }" generic.csv \
    | while IFS=, read -r key num1 num2 ; do
        echo "$key: $(dc -e "$num1 $num2 * p")"
    done
done <keys.txt
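As a quick sanity check (hypothetical sample data, not from the question): if keys.txt contains the single key MR44825 and generic.csv contains the line MR44825,405,0.625, then with XexitX replaced by exit the loop should print (405 * 0.625 = 253.125):
MR44825: 253.125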
(to LaTeX users) I want to search for manually labeled items
(to whom it may concern) script file on GitHub
I tried to find a solution, but what I found suggested removing spaces first. In my case, I think there should be a simpler solution, perhaps using grep or awk or some other tool.
Consider the following lines:
\item[a)] some text
\item [i) ] any text
\item[ i)] foo and faa
\item [ 1) ] foo again
I want to find (or count) whether there are items with a single ) inside the brackets. The format may have blank spaces inside the brackets and/or around them. Also, the character before the closing parenthesis could be any letter or number.
Edit: I tried grep "\[a)\]" but it missed [ a) ].
Since there are many possible ways to write an item, I cannot settle on a single pattern. I think something like this is enough for me:
\item<blank spaces>[<blank spaces><letter or number>)<blank spaces>]
Replacing blank spaces first would not work, because the pattern above generally has text around it (for example: \item[ a)] consider the function...).
The output should indicate whether there are such patterns. It could be zero or the number of occurrences.
So to do it all in the grep itself:
grep -c -E '\\item\s*\[\s*\w+\)\s*\]' file.txt
Note all the \s* checks for spaces. Also -c to get the count.
Breaking it down:
\\ a backslash (needs escape in grep)
item "item"
\s* optional whitespaces
\[ "[" (needs escape in -E)
\s* optional whitespaces
\w+ at least one 'word' char
\) ")" (needs escape in -E)
\s* optional whitespaces
\] "]" (needs escape in -E)
The following awk may also help here (it simply removes the spaces between [ and ], then looks for a digit or letter followed by ) inside):
awk '
match($0,/\[.*\]/){                      # find the bracketed span
  val=substr($0,RSTART+1,RLENGTH-1);     # take its contents (keeps the closing ])
  gsub(/[[:space:]]+/,"",val);           # strip all blank space
  if(val ~ /[a-z0-9]+\)/){ count++ }     # a letter or digit followed by )
}
END{
  print count+0                          # +0 prints 0 rather than a blank line when nothing matched
}' Input_file
So I am thinking something like this:
tr -d " \t" < file.txt | grep -c '\\item\[[0-9A-Za-z])\]'
This will count the number of matches for you.
Edit: Added \t to tr call. Now removes all spaces and tabs.
Here is a grep-only version. This could also be useful for printing out all of the matches (by removing -c), since the version above modifies the input:
grep -c '\\item *\[ *[0-9A-Za-z]) *\]' file.txt
Here is a more versatile approach, if this is what you're looking for: we write the matches to a file and count its lines to get the number of matches.
grep '\\item *\[ *[0-9A-Za-z]) *\]' file.txt > matches.txt
wc -l < matches.txt
Input:
echo "1234ABC89,234" # A
echo "0520001DEF78,66" # B
echo "46545455KRJ21,00"
From the above strings, I need to split the characters to get the alphabetic field and the number after that.
From "1234ABC89,234", the output should be:
ABC
89,234
From "0520001DEF78,66", the output should be:
DEF
78,66
I have many strings that I need to split like this.
Here is my script so far:
echo "1234ABC89,234" | cut -d',' -f1
but it gives me 1234ABC89 which isn't what I want.
Assuming that you want to discard leading digits only, and that the letters will be all upper case, the following should work:
echo "1234ABC89,234" | sed 's/^[0-9]*\([A-Z]*\)\([0-9].*\)/\1\n\2/'
This works fine with GNU sed (I have 4.2.2), but other sed implementations might not like the \n, in which case you'll need to substitute something else.
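For reference, with GNU sed the command above prints:
ABC
89,234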
Depending on the version of sed you can try:
echo "0520001DEF78,66" | sed -E -e 's/[0-9]*([A-Z]*)([,0-9]*)/\1\n\2/'
or:
echo "0520001DEF78,66" | sed -E -e 's/[0-9]*([A-Z]*)([,0-9]*)/\1$\2/' | tr '$' '\n'
DEF
78,66
Explanation: the regular expression rewrites the input into the expected output, except that instead of a newline it inserts a "$" sign, which we then convert to a newline with the tr command. (This assumes the input itself never contains a "$" character.)
Where do the strings come from? Are they read from a file (or other source external to the script), or are they stored in the script? If they're in the script, you should simply reformat the data so it is easier to manage. Therefore, it is sensible to assume they come from an external data source such as a file, or data piped into the script.
You could simply feed the data through sed:
sed 's/^[0-9]*\([A-Z]*\)/\1 /' |
while read alpha number
do
    …process the two fields…
done
The only trick to watch there is that if you set variables in the loop, they won't necessarily be visible to the script after the done. There are ways around that problem — some of which depend on which shell you use. This much is the same in any derivative of the Bourne shell.
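If you do need the values after the loop, one common bash workaround (a sketch assuming bash rather than a plain POSIX shell; strings.txt is a hypothetical file holding the raw strings) is to feed the loop with process substitution instead of a pipe, so the loop body runs in the current shell:
count=0
while read -r alpha number
do
    count=$((count + 1))    # still visible after the loop ends
done < <(sed 's/^[0-9]*\([A-Z]*\)/\1 /' strings.txt)
echo "processed $count strings"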
You said you have many strings like this, so I recommend, if possible, saving them to a file such as input.txt:
1234ABC89,234
0520001DEF78,66
46545455KRJ21,00
On your command line, try this sed command, reading input.txt as the file argument:
$ sed -E 's/([[:digit:]]+)([[:alpha:]]{3})(.+)/\2\t\3/g' input.txt
ABC 89,234
DEF 78,66
KRJ 21,00
How it works
uses -E for extended regular expressions to save on typing; otherwise, for example, we would have to escape the grouping parentheses as \(
uses grouping with ( and ), searching for three groups:
first, digits: + specifies one or more of them. Oddly, using [0-9] produced an extra blank space in the results, so the POSIX class [[:digit:]] is used instead
next, POSIX alphabetical characters, whether lowercase or uppercase; {3} specifies exactly 3 of them
the last group matches ., meaning any character, with + for one or more occurrences
\2\t\3 then returns group 2 and group 3, with a tab separator
Thus you are able to extract two separate fields per line, just separated by tab, for easier manipulation later.