How can I count _ in a CSV file from the command line?

I have a CSV document with "_" in some lines, and I want to find out how many underscores there are via the terminal.
My file looks like this:
NP_000008.1 MAAALARASGPARRALC
NP000010.7 MAPWPHENSSLAPWPDLPTL
NP_000011.2 MTLGSPRKGLLMLLMALVTQG
NP_000016.1 MAPWPHENSSLAPWPDLPTL
NP000043.4 MDPSMGVNS

grep -c '_' file.csv
Try the above.
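Note that grep -c counts matching lines, not individual matches. If a line may contain more than one underscore and you want the total number of _ characters, one option (not part of the original answer) is:
grep -o '_' file.csv | wc -l
Here grep -o prints each match on its own line and wc -l counts those lines.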


Unix sed command - global replacement is not working

I have a scenario where we want to collapse runs of repeated double quotes inside the data into a single double quote. Because the input data is comma-delimited and every column is enclosed in double quotes "", I ran into an issue, explained below:
The sample data looks like this:
"int","","123","abd"""sf123","top"
So, the output would be:
"int","","123","abd"sf123","top"
I tried the approach below, but only the first occurrence is handled, and I am not sure what the issue is:
sed -ie 's/,"",/,"NULL",/g;s/""/"/g;s/,"NULL",/,"",/g' inputfile.txt
replacing all ,"", with ,"NULL",
replacing all multiple occurrences of double quotes (""" or "" or """") with a single "
replacing the step-1 change back to the original: ,"NULL", back to ,"",
But only the first occurrence is getting changed and the rest looks the same, as below:
If the input is:
"int","","","123","abd"""sf123","top"
the output is coming as:
"int","","NULL","123","abd"sf123","top"
But, the output should be:
"int","","","123","abd"sf123","top"
You may try this Perl one-liner with a lookahead:
perl -pe 's/("")+(?=")//g' file
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
Where input is:
cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
Breakdown:
("")+: Match 1+ pairs of double quotes
(?="): If those pairs are followed by a single "
Using sed
$ sed -E 's/(,"",)?"+(",)?/\1"\2/g' input_file
"int","","123","abd"sf123","top"
"int","","NULL","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
With awk, using your shown samples, please try the following code. It was written and tested in GNU awk and should work in any version of awk.
awk '
BEGIN{ FS=OFS="," }
{
for(i=1;i<=NF;i++){
if($i!~/^""$/){
gsub(/"+/,"\"",$i)
}
}
}
1
' Input_file
Explanation: set the field separator and output field separator to , for all lines of Input_file. Then traverse each field of the line; if a field is not an empty quoted string (""), globally replace every run of one or more " characters with a single ". Then print the line.
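As a quick check of my own (not from the original answer), running the same logic as a one-liner against the sample file shown above (cat file) gives:
awk 'BEGIN{FS=OFS=","}{for(i=1;i<=NF;i++)if($i!~/^""$/)gsub(/"+/,"\"",$i)}1' file
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"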
With sed you can match one or more repeated sets of "" using a group, followed by a single ".
Then, in the replacement, use a single ":
sed -E 's/("")+"/"/g' file
For this content
$ cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
The output is
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
sed 's#"""#"#' file
That works. I will demonstrate another method though, which you may also find useful in other situations.
#!/bin/sh -x
cat > ed1 <<EOF
3s/"""/"/
wq
EOF
cp file stack
cat stack | tr ',' '\n' > f2
ed -s f2 < ed1
cat f2 | tr '\n' ',' > stack
rm -v ./f2
rm -v ./ed1
The point of this is that if you have a big CSV record all on one line and you want to edit a specific field, then, if you know the field number, you can convert all the commas to newlines, use the field number as a line number with ed to substitute on that field, append after it, or insert before it, and then convert back to CSV.
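As a small, self-contained illustration of that idea (a sketch; record.csv, old, and new are made-up names for illustration, and it assumes the fields contain no embedded commas):
tr ',' '\n' < record.csv > fields.txt              # one field per line
printf '4s/old/new/\nwq\n' | ed -s fields.txt      # edit field 4 as line 4
paste -sd',' fields.txt > record.csv               # join back into one CSV line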

AWK post-processing of multi-column data

I am working with a set of txt files, each containing multi-column information on one line. Within my bash script I use the following AWK expression to take the filename of each txt file as well as the number from the 5th column and save them in two-column format in a results CSV file (piped to sed, which removes the path of the file and its extension from the final CSV):
awk '-F, *' '{if(FNR==2) printf("%s| %s \n", FILENAME,$5) }' ${tmp}/*.txt | sed 's|\/Users/gleb/Desktop/scripts/clusterizator/tmp/||; s|\.txt||' >> ${home}/"${experiment}".csv
obtaining something like this (for 5 txt files) as CSV:
lig177_cl_5.2| -0.1400
lig331_cl_3.5| -8.0000
lig394_cl_1.9| -4.3600
lig420_cl_3.8| -5.5200
lig550_cl_2.0| -4.3200
How would it be possible to modify my AWK expression in order to exclude "_cl_x.x" from the name of each txt file, as well as add the name of the CSV as a comment on the first line of the resulting CSV file:
# results.CSV
lig177| -0.1400
lig331| -8.0000
lig394| -4.3600
lig420| -5.5200
lig550| -4.3200
Based on the rest of the pipeline, I think you want to do something like this and get rid of the sed invocation.
awk -F', *' 'FNR==2 {f=FILENAME;
sub(/.*\//,"",f);
sub(/_.*/ ,"",f);
printf("%s| %s\n", f, $5) }' "${tmp}"/*.txt >> "${home}/${experiment}.csv"
This will convert
/Users/gleb/Desktop/scripts/clusterizator/tmp/lig177_cl_5.2.txt
to
lig177
The pattern replacement is generic:
/path/to/the/file/filename_otherstringshere...
will be reduced to just filename: everything up to the last / character is removed first, then everything from the first _ character onward. This relies on the greedy matching of regex patterns.
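A quick way to see the two sub() calls in isolation (my own check, not from the original answer):
echo '/Users/gleb/Desktop/scripts/clusterizator/tmp/lig177_cl_5.2.txt' | awk '{sub(/.*\//,""); sub(/_.*/,""); print}'
lig177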
For the output filename comment, it's easier to write it before the awk call, since it's only one line:
$ echo "${experiment}.csv" > "${home}/${experiment}.csv"
$ awk ... >> "${home}/${experiment}.csv"

grep a string containing ":" patterns

This is a piece of my log file on the server:
"order_items_subtotal":"60.5100","order_final_due_amount":"0.0000","items":[{"product_id"
I need to grep the logs which contain "order_final_due_amount":"0.0000" in my whole log file.
For this, I did the following:
tail -f pp_create_shipment2018-12-05.log | grep "order_final_due_amount":"0.0000"
but I got zero results. What could be wrong with my tail command?
" is interpreted by the shell (it's used to quote e.g. spaces).
grep "order_final_due_amount":"0.0000"
is equivalent to
grep order_final_due_amount:0.0000
To pass " to grep, you need to quote it:
grep '"order_final_due_amount":"0\.0000"'
(Also, . is special in regexes and should be escaped.)
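Alternatively (not mentioned in the answer above), grep -F treats the pattern as a fixed string, so nothing in it needs to be escaped:
tail -f pp_create_shipment2018-12-05.log | grep -F '"order_final_due_amount":"0.0000"'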
Using Perl, you just need to escape the ".". The qr// takes care of the rest.
Check this out:
> cat product.log
order items
items1
"order_items_subtotal":"60.5100","order_final_due_amount":"0.0000","items":[{"product_id"
item2
"order_items_subtotal":"60.5100","order_final_due_amount":"000000","items":[{"product_id"
items3
"order_items_subtotal":"60.5100",order_final_due_amount:"0.0000","items":[{"product_id"
items4
> perl -ne ' $pat=qr/"order_final_due_amount":"0\.0000"/; print if /$pat/ ' product.log
"order_items_subtotal":"60.5100","order_final_due_amount":"0.0000","items":[{"product_id"
>
Thanks to melpomene, the following also works:
> perl -ne ' print if /"order_final_due_amount":"0\.0000"/ ' product.log
"order_items_subtotal":"60.5100","order_final_due_amount":"0.0000","items":[{"product_id"
>

Remove Leading Spaces from a variable in Bash

I have a script that exports an XML file to my desktop and then extracts all the data in the "id" tags and exports that to a CSV file.
xmlstarlet sel -t -m '//id[1]' -v . -n </users/$USER/Desktop/List.xml > /users/$USER/Desktop/List2.csv
I then use the following command to add commas after each number and store it as a variable.
devices=$(sed "s/$/,/g" /users/$USER/Desktop/List2.csv)
If I echo that variable I get an output that looks like this:
123,
124,
125,
etc.
What I need help with is removing those line breaks so that the output looks like 123,124,125 with no leading space. I've tried multiple solutions but can't get them to work. Any help would be amazing!
If you don't want newlines, don't tell xmlstarlet to put them there in the first place.
That is, change -n to -o , to put a comma after each value rather than a newline:
{ xmlstarlet sel -t -m '//id[1]' -v . -o ',' && printf '\n'; } \
<"/users/$USER/Desktop/List.xml" \
>"/users/$USER/Desktop/List2.csv"
The printf '\n' here puts a final newline at the end of your CSV file after xmlstarlet has finished writing its output.
If you don't want the trailing , this leaves on the output file, the easiest way to be rid of it is to read the result of xmlstarlet into a variable and manipulate it there:
content=$(xmlstarlet sel -t -m '//id[1]' -v . -o ',' <"/users/$USER/Desktop/List.xml")
printf '%s\n' "${content%,}" >"/users/$USER/Desktop/List2.csv"
For a sed solution (which first slurps the whole file into the pattern space, then translates every newline into a comma), try
sed ':a;N;$!ba;y/\n/,/' /users/$USER/Desktop/List2.csv
or if you want a comma even after the last:
sed ':a;N;$!ba;y/\n/,/;s/$/,/' /users/$USER/Desktop/List2.csv
though simpler still would be
cat /users/$USER/Desktop/List2.csv | tr "\n" ","
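Another common alternative (not from the original answers) is paste, which joins all lines with a delimiter and does not leave a trailing comma:
paste -sd',' /users/$USER/Desktop/List2.csv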

I need to format data from a command in unix shell scripting and put it in an array/table. I have given my command below

`grep 'MOVING_DATA Phase' /tmp/$$.status | awk '{print $11}'`
This is my command, and I am getting results like:
"database"."table1"
"database"."table2"
"database"."table3"
.....
.....
"database"."tablen"
I want to remove the double quotes, remove the '.' delimiter, and store the database name and table names in an array. I would appreciate it if the result were displayed in the form of a table. I am new to Unix. Please help me.
You can get rid of grep and use awk's gsub/sub functions to remove the double quotes and the .:
awk '/MOVING_DATA Phase/{gsub(/["]/, "", $11);sub(/\./, " ", $11);print $11}' /tmp/$$.status
To create BASH arrays:
#!/bin/bash
tabArr=()
dbArr=()
while read -r db table; do
tabArr+=("$table")
dbArr+=("$db")
done < <(awk '/MOVING_DATA Phase/{gsub(/["]/, "", $11);sub(/\./, " ", $11);print $11}' /tmp/$$.status)
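If you also want the result displayed in the form of a table (a sketch, not part of the original answer; the column width of 20 is arbitrary):
printf '%-20s %s\n' "DATABASE" "TABLE"
for i in "${!dbArr[@]}"; do
    printf '%-20s %s\n' "${dbArr[$i]}" "${tabArr[$i]}"
done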
