Different number of matches with xpath and string() [duplicate] - xpath

This question already has answers here:
XPath text with children
(2 answers)
Get xmllint to output xpath results \n-separated, for attribute selector
(5 answers)
Closed 3 years ago.
I use xmllint as an external command in the Vim editor. When I ran the command below on a buffer containing well-formed XML
%!xmllint --xpath '//a/@href' --encode UTF-8 -
it returned the number of matches that I expected. But since I wanted only the values of the href attributes, I ran another command that uses string():
%!xmllint --xpath 'string(//a/@href)' --encode UTF-8 -
but I was very surprised to see only the value of the first matched href. Am I misunderstanding something in XPath, or is this a bug?
I can attach the source XML (ca 450 lines) if needed but I think it should be clear from the description that the culprit is somewhere else.
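This is XPath 1.0 behaviour rather than a bug: string() applied to a node-set returns the string-value of only the first node in document order. One possible workaround, sketched below, is to keep the plain attribute query and strip the href="..." wrappers afterwards. The helper name extract_hrefs and the sample text are made up for illustration, and the sample only imitates what xmllint might print for attribute nodes; the exact spacing and line breaks can differ between libxml2 versions.

```shell
#!/bin/sh
# Hypothetical helper: reduce href="..." pairs on stdin to bare values, one per line.
extract_hrefs() {
  grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//'
}

# Simulated output of: xmllint --xpath '//a/@href' file.xml
sample=' href="https://example.com/a" href="https://example.com/b"'

hrefs=$(printf '%s\n' "$sample" | extract_hrefs)
printf '%s\n' "$hrefs"
```

In Vim this could be used as %!xmllint --xpath '//a/@href' - | grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//'. Values are left as serialized, so an escaped &amp; in a URL stays escaped.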

Related

How to extract links from a text file? [duplicate]

This question already has answers here:
What is the best regular expression to check if a string is a valid URL?
(62 answers)
Closed 3 months ago.
Suppose there is a text file test.txt. It contains text and links to resources such as https://example.com/kqodbjcuic49w95rofwjue. How can I extract only the list of these links from there? (preferably via bash, but not required)
I tried this solution:
sed 's/^.*href="\([^"]*\).*$/\1/'
But it didn't help me.
grep -oP "((?:(?:http|ftp|ws)s?|sftp):\/\/?)?([^:/\s.#?]+\.[^:/\s#?]+|localhost)(:\d+)?((?:\/\w+)*\/)?([\w\-.]+[^#?\s]+)?([^#]+)?(#[\w-]*)?" test.txt
will display all URLs inside the file.
(The regex comes from BSimjoo's link)
Grep text files guide at https://www.linode.com/docs/guides/how-to-grep-for-text-in-files/
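If the file only contains http:// and https:// links, a much shorter pattern may be enough. The sketch below assumes that a link ends at the first whitespace, double quote or angle bracket; the sample file contents are made up for illustration, and trailing punctuation such as a full stop right after a URL would be captured as part of it.

```shell
#!/bin/sh
# Build a small sample file (hypothetical contents).
tmp=$(mktemp)
printf '%s\n' \
  'See https://example.com/kqodbjcuic49w95rofwjue for details.' \
  'No link on this line.' \
  'Mirror: http://example.org/path?x=1 and more text' > "$tmp"

# -o prints only the matched part, -E enables extended regular expressions.
links=$(grep -oE 'https?://[^[:space:]"<>]+' "$tmp")
printf '%s\n' "$links"

rm -f "$tmp"
```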

grep names in a small file matching a large file [duplicate]

This question already has answers here:
Are shell scripts sensitive to encoding and line endings?
(14 answers)
grep not showing result which read id from file
(2 answers)
Closed 12 months ago.
My small file contains this information line by line:
abc.123
abc.258
abc.952
I want to find the lines in my bigger file (~30 GB) that match these names. I tried this command, but it didn't give me any result.
grep -f small.txt big.txt
I have checked that abc.123, abc.258 and abc.952 all exist in my bigger file: when I grep for each of these names one by one, I get exactly the result I want.
grep "abc.123" big.txt
I have no idea where I could have gone wrong.
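The linked duplicates point at line endings as a likely cause. The sketch below assumes that small.txt was saved with Windows (CRLF) line endings, which is only a guess about the asker's file; the file contents are made up for illustration. Each pattern then ends in a carriage return that never occurs in big.txt, so nothing matches until the carriage returns are stripped.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Pattern file with CRLF line endings (assumed cause of the problem).
printf 'abc.123\r\nabc.258\r\nabc.952\r\n' > "$dir/small.txt"
# Target file with Unix line endings.
printf 'abc.123 first\nxyz.000 other\nabc.952 third\n' > "$dir/big.txt"

# Count matching lines with the unmodified pattern file.
# grep exits non-zero when nothing matches, hence the "|| true".
before=$(grep -c -f "$dir/small.txt" "$dir/big.txt" || true)

# Strip carriage returns, then match the names as fixed strings (-F),
# so that the dot in abc.123 is not treated as "any character".
tr -d '\r' < "$dir/small.txt" > "$dir/small.unix.txt"
after=$(grep -c -F -f "$dir/small.unix.txt" "$dir/big.txt" || true)

echo "matching lines before: $before, after: $after"
rm -rf "$dir"
```

Running file small.txt or od -c small.txt is a quick way to check whether the carriage returns are really there.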

How to replace quotes inside a quoted field of a non-standard CSV file using a one-liner bash command? [duplicate]

This question already has answers here:
What's the most robust way to efficiently parse CSV using awk?
(6 answers)
Closed 4 years ago.
This post was edited and submitted for review 11 months ago and failed to reopen the post:
Original close reason(s) were not resolved
I have a file like this:
col1×col2×col3
12×"Some field with "quotes" inside it"×"Some field without quotes inside but with new lines \n"
And I would like to replace the interior double quotes with single quotes so the result will look like this:
col1×col2×col3
12×"Some field with 'quotes' inside it"×"Some field without quotes inside but with new lines \n"
I guess this can be done with sed, awk or ex but I haven't been able to figure out a clean and quick way of doing it. Real CSV files are of the order of millions of lines.
The preferred solution would be a one-liner using the aforementioned programs.
A simple workaround using sed, based on your fields separator ×, could be:
sed -E "s/([^×])\"([^×])/\1'\2/g" file
This replaces each " that is both preceded and followed by a character other than × with '.
Note that sed does not support lookahead or lookbehind, so we have to capture the neighbouring characters in groups and reinsert them.
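To see the effect, the command can be run on the two sample lines from the question. This is only a sketch under the question's assumptions: every record sits on a single line and × never appears inside a field. A quote at the very start or end of a line has no neighbour on one side and is therefore left alone, and two adjacent interior quotes would not both be replaced in one pass.

```shell
#!/bin/sh
# Sample data from the question (the \n inside the last field is literal text).
input=$(printf '%s\n' \
  'col1×col2×col3' \
  '12×"Some field with "quotes" inside it"×"Some field without quotes inside but with new lines \n"')

# Replace every double quote that has a non-× character on both sides.
fixed=$(printf '%s\n' "$input" | sed -E "s/([^×])\"([^×])/\1'\2/g")
printf '%s\n' "$fixed"
```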

Decode URL Unix/Bash Command Line (without sed) [duplicate]

This question already has answers here:
Bash script to convert from HTML entities to characters
(12 answers)
Closed 4 years ago.
I am scraping a website with curl and parsing out what I need.
The URLs are returned with characters encoded as numeric ASCII codes, like
GET v2.12/...?fields=&#123;fieldname_of_type_Tab&#125; HTTP/1.1
How can I convert this to UTF-8 (char) directly from the command line (ideally something I can pipe | to) so that the result is...
GET v2.12/...?fields={fieldname_of_type_Tab} HTTP/1.1
EDIT: There are a number of solutions with sed but the regex that goes along with it is quite ugly. Since the provided answer leveraging perl is very clean I hope we can leave this question open
These are HTML entities.
You can decode them with perl:
$ echo 'http://domain.tld/?fields=&#123;fieldname_of_type_Tab&#125;' |
perl -MHTML::Entities -pe 'decode_entities($_)'
Output:
http://domain.tld/?fields={fieldname_of_type_Tab}
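If the HTML::Entities module is not installed, a similar filter can be sketched with Python's standard html.unescape function. This is an alternative to the perl answer, not part of it, and it assumes that python3 is on the PATH; the variable names are made up for illustration.

```shell
#!/bin/sh
encoded='http://domain.tld/?fields=&#123;fieldname_of_type_Tab&#125;'

if command -v python3 >/dev/null 2>&1; then
  # Read stdin, decode HTML entities, write the result to stdout.
  decoded=$(printf '%s\n' "$encoded" |
    python3 -c 'import html, sys; sys.stdout.write(html.unescape(sys.stdin.read()))')
  printf '%s\n' "$decoded"
else
  echo "python3 is not installed" >&2
fi
```

Because the filter reads stdin and writes stdout, it can sit at the end of a curl pipeline in the same way as the perl command.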

jq not working with key including dash [duplicate]

This question already has answers here:
jq not working on tag name with dashes and numbers
(2 answers)
Closed 5 years ago.
I have a REST API, which returns something like this:
{
"foo": 1,
"bar": 2,
"foo-bar": 3
}
When I run `http /endpoint/url | jq '.foo-bar'`, it gives the following error:
jq: error (at <stdin>:1): null (null) and boolean (true) cannot be subtracted
It looks like jq thinks I'm trying to do an arithmetic operation with foo-bar.
How do I correctly form this kind of path? Or is this a bug in jq?
In JSON text, keys are always double-quoted. Perhaps your REST API returned properly quoted keys and the example in your last edit was incorrect, because without the quotes jq cannot parse the input as valid JSON.
As for the issue you are seeing, you need to put the field name in quotes to let jq know that you are accessing a single field foo-bar and not two separate fields:
jq '."foo-bar"'
Or, more explicitly, use the bracketed index syntax: jq '.["foo-bar"]'
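Both forms can be checked against the sample document from the question. The sketch below feeds the JSON in from a shell variable instead of the REST endpoint and skips the check when jq is not installed; the variable names are made up for illustration.

```shell
#!/bin/sh
json='{"foo": 1, "bar": 2, "foo-bar": 3}'

if command -v jq >/dev/null 2>&1; then
  # Quoted key after the dot.
  quoted=$(printf '%s' "$json" | jq '."foo-bar"')
  # Bracketed index with the key as a string.
  indexed=$(printf '%s' "$json" | jq '.["foo-bar"]')
  echo "quoted: $quoted, indexed: $indexed"
else
  echo "jq is not installed" >&2
fi
```

The single quotes around the jq program matter here: they stop the shell from removing the inner double quotes before jq sees them.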

Resources