XPATH to eliminate characters

The page linked below has a "Source: Reuters" line followed by a copyright notice. How do I eliminate all the characters after "Reuters"?
http://www.rediff.com/cricket/report/confident-australia-move-on-to-perth-with-urn-on-their-minds-ashes-2017-18-pix/20171208.htmt
"Source: source
© Copyright 2019 Reuters Limited. All rights reserved. Republication or redistribution of Reuters content, including by framing or similar means, is expressly prohibited without the prior written consent of Reuters. Reuters shall not be liable for any errors or delays in the content, or for any actions taken in reliance thereon."
This text is under //div[@class='grey1']/text()[2]
I know that I have to use the substring functions, but I am not sure how to use them.
Is this correct?
substring-before(//div[@class='grey1']/text()[2],'Limited')
Is this the correct way to eliminate characters?

Assuming that the structure of that text snippet is always the same (for example, that the word "Reuters" is always followed by "Limited"), this XPath expression should work:
substring-after(substring-before(//div[@class='grey1']/text()[2],' Limited'),' © ')
Output:
Copyright 2019 Reuters
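To check the logic outside an XPath engine, the same nesting can be mimicked in plain Python. This is only a sketch: the helper names substring_before and substring_after are made up here to mirror the XPath 1.0 functions, and the sample string is an abbreviated stand-in for the text node, assumed to contain ' © ' and ' Limited' exactly as the expression expects.

```python
def substring_before(text: str, marker: str) -> str:
    """Mimic XPath 1.0 substring-before: text preceding the first marker, '' if absent."""
    if not marker:
        return ""
    head, sep, _ = text.partition(marker)
    return head if sep else ""


def substring_after(text: str, marker: str) -> str:
    """Mimic XPath 1.0 substring-after: text following the first marker, '' if absent."""
    if not marker:
        return text
    _, sep, tail = text.partition(marker)
    return tail if sep else ""


# Abbreviated, assumed stand-in for the second text node of the div.
sample = "Source: source © Copyright 2019 Reuters Limited. All rights reserved."

# Same nesting as the XPath expression: cut the tail first, then the head.
result = substring_after(substring_before(sample, " Limited"), " © ")
print(result)
```

Note that both helpers return an empty string when the marker is missing, which is also what the XPath functions do, so an unexpected page layout yields an empty result rather than an error.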

Related

Does SonarQube count comment blocks in duplicated code?

Simple question. SonarQube has a duplicate code scanner. Does it examine comment blocks in this algorithm? If I have 1000 source files with the same copyright header, will it detect these as duplicated code?

From the SonarQube documentation on duplications:
SonarQube allows to detect "Type 2" duplications which means : structurally/syntactically identical fragments except for variations in literals and comments.
So your copyright header will not be counted as duplication.

"descript.ion" file spec?

There appears to be a somewhat standard "descript.ion" file in the Windows world that provides metadata for some or all of the files in a given directory.
I know there are various programs that write this file (for example NewsBin, a Usenet downloader) and read it (for example FAR, a file manager mimicking the old Norton Commander).
I'm writing my own file indexer, and would like to add the ability to parse and use the info from "descript.ion" files.
The problem is that I have not been able to find an actual spec for the file, despite much googling.
I reverse engineered it as best I could, but I'm not certain whether I captured 100% of the possible details, so I figured I'd ask SO.
Here are example lines from the file:
"Rus Song1.mp3" SovietMus 1/2, rus_song@gmail.com, Fri Aug 08 00:46:27 2008
RusSong2.mp3 SovietMus 2/2, rus_song@gmail.com, Fri Aug 08 01:46:22 2008
The structure seems to be:
The first "token" is a file name.
If the token starts with any character other than a double quote, the token ends at the first space character.
If the token starts with a double quote, the token ends at the following double quote.
(Not sure what happens if the filename contains a double quote. IIRC that's illegal in Windows filesystems, so escaping the quote may be a moot question.)
The last token (from the very last comma to the end of the line) is a timestamp.
The second-to-last token (between the second-to-last comma and the very last comma) is the name of the poster from the Usenet newsgroup. I'm not quite sure what happens in the generic format, since the only descript.ion files I've seen were from NewsBin, which is obviously Usenet-centric.
Everything in between is a description, in NewsBin's case coming from the post's subject.
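The reverse-engineered rules above could be sketched as a small Python parser. It assumes the NewsBin-style layout only (description, poster and timestamp separated by the last two commas). The names Entry and parse_line are hypothetical, and lines with fewer than two commas are treated as a filename plus a free-form description.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    filename: str
    description: str
    poster: Optional[str]
    timestamp: Optional[str]


def parse_line(line: str) -> Entry:
    """Split one descript.ion line according to the reverse-engineered rules."""
    line = line.rstrip("\r\n")
    if line.startswith('"'):
        # Quoted filename: the token ends at the next double quote.
        end = line.find('"', 1)
        if end == -1:
            raise ValueError("unterminated quoted filename")
        filename, rest = line[1:end], line[end + 1:]
    else:
        # Unquoted filename: the token ends at the first space.
        filename, _, rest = line.partition(" ")
    rest = rest.strip()
    # NewsBin-style tail: "<description>, <poster>, <timestamp>".
    parts = rest.rsplit(",", 2)
    if len(parts) == 3:
        description, poster, timestamp = (p.strip() for p in parts)
        return Entry(filename, description, poster, timestamp)
    # Generic case: everything after the filename is the description.
    return Entry(filename, rest, None, None)


entry = parse_line(
    '"Rus Song1.mp3" SovietMus 1/2, rus_song@gmail.com, Fri Aug 08 00:46:27 2008'
)
print(entry)
```

Splitting from the right keeps commas inside the description intact, at the cost of misreading generic descriptions that happen to contain two or more commas.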
Questions:
Does anyone know of a more official "descript.ion" file spec/documentation?
(or, at least, have your own knowledge of those files and can verify my spec)
Does anyone know of any other programs that read or write this file?
Thanks!

The description files on my system are from Total Commander as well. They follow the basic spec mentioned in the other answers:
Filename Text I typed to describe the file
"Long filename" Some text
Each line ends in a normal Windows line break.
In addition, the program stores multi-line comments as follows:
Filename This is the first line\nSecond line\nLast line\x04\xc2
Here, I mean that the descript.ion file contains a backslash and a letter 'n' where I typed a line break, and two special characters 04 C2 at the end of the comment. In addition, the line is ended by a Windows line break 0D 0A.
Apparently, the two extra characters at the end of the line signal the end of a multiline comment. If I remove them, the comment is rendered as a single line in the GUI, and the '\n' sequences are displayed literally.
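A reader for this convention might look like the Python sketch below. It relies only on the behaviour observed above (a trailing 04 C2 byte pair marks a multi-line comment whose line breaks are stored as a literal backslash followed by 'n'). The names MULTILINE_MARKER and decode_comment are hypothetical, and nothing here comes from an official Total Commander specification.

```python
MULTILINE_MARKER = b"\x04\xc2"  # byte pair observed at the end of multi-line comments


def decode_comment(raw: bytes) -> bytes:
    """Expand literal backslash-n sequences, but only when the marker is present."""
    raw = raw.rstrip(b"\r\n")  # drop the Windows line break that ends the record
    if raw.endswith(MULTILINE_MARKER):
        body = raw[: -len(MULTILINE_MARKER)]
        return body.replace(b"\\n", b"\n")
    # Without the marker the comment is a single line; keep "\n" literal.
    return raw


multi = decode_comment(b"This is the first line\\nSecond line\\nLast line\x04\xc2\r\n")
single = decode_comment(b"Only one line with a literal \\n inside\r\n")
print(multi)
print(single)
```

Working on bytes rather than text avoids guessing the file's character encoding, which the answer does not state.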

The original usage of DESCRIPT.ION was to provide longer more descriptive names to 8.3 filenames; all it had was the shortname and a longer description. As you've found, others have co-opted the name with varying formats and usages. Frankly speaking, I don't think you'll find any specific commonality among the various usages.

Format is simple: FileName remainder of the line is a description of the file
https://jpsoft.com/ascii/descfile.txt
(Wayback Machine)

The descript.ion file is used extensively by the file management utility Total Commander, a shareware program available at www.ghisler.com. From version 7.5 of TC, it can have a length of 4096 bytes. I have been using it extensively to annotate my files without any issues. You can look up other users' experiences on the Total Commander users forum.

The answer above looks correct to me; just an addition:
from http://filext.com/file-extension/ION
The ION file type is primarily associated with '4DOS'. Note: Norton Utilities also uses 4DOS.
http://www.optimasc.com/products/fileid/4dos-descext.pdf
Collected links to 4DOS description-aware programs of all kind and 4DOS tools.
http://www.4dos.info/4tools.htm
http://drupal.org/node/289988

Visual Basic 6: Tif Page convert from compression type LZW, JPEG etc to Group4

Does anyone know of an open source library in Visual Basic 6 that converts pages in a TIF file from, let's say, LZW to Group4 format?
Thanks.

An edit of my answer to your other question!
ImageMagick is an excellent free open source image manipulation package. There is an OLE (ActiveX) control which you could use from VB6. I've never tried it myself - I always use the ImageMagick command line. I understand the control just takes the normal command lines anyway.
The command-line option for changing the TIFF compression is -compress. Something like the line below should write in Group4 format (air code based on the tutorial and manual):
convert myfile.tiff -compress Group4 myGroup4file.tiff
Choices for the -compress argument qualifier include: None, Group4, and LZW.
EDIT: ImageMagick is licensed under the GPL: if you use the control and redistribute your program, it's possible your program would have to be free open-source. Apparently it's not yet been legally tested whether dynamic linking to a GPL library or control invokes the GPL. You could always launch the ImageMagick command-line which to me should be safe [I am not a lawyer].
EDIT2: The ImageMagick website says it uses GPL, but the license wording doesn't look like GPL 1, 2 or 3 to me. It also contains this: "For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof"

The FreeImage library has worked well for me in the past.
http://freeimage.sourceforge.net/

using par for formatting comments in code with international characters

I'm using Par (on Linux) to format comments nicely and quickly. The problem is that now I want to write comments that include some international characters, like áéíóú or äëïöü...
Berkeley Par treats each of these international characters as 2 ASCII characters (I believe), so its output comes out broken because it doesn't count characters properly.
Did you face this problem before? Do you have any solution? Ideas?
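The suspected miscount can be illustrated in Python: in UTF-8 each of these accented letters occupies two bytes, so a tool that measures line width in bytes sees twice as many columns as are displayed. Whether this is exactly what happens inside Par is an assumption based on the description above.

```python
import unicodedata

# Normalize to the precomposed form so each accented letter is one code point.
text = unicodedata.normalize("NFC", "áéíóú")

char_count = len(text)                  # columns a reader sees
byte_count = len(text.encode("utf-8"))  # columns a byte-oriented tool counts

print(char_count, byte_count)
```

A formatter that wraps at a fixed width using the byte count would therefore break lines containing such characters earlier than expected.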

You mean the code from "Add multibyte characters support in 'par'" (or just the patches applied to the original source) doesn't work for you?
Then maybe it is a problem with your shell or the font it uses. Are you sure the shell and font you use are able to render Unicode characters?

Par, as distributed in Ubuntu from Hardy on, is supposed to handle multi-byte encodings.
http://packages.ubuntu.com/hardy/par

I've never even heard of this tool, but check out par 1.52.
The latest version of Par, released on 2001-Apr-29, tar'd and gzip'd. The only real change is better support for 8-bit character sets (as opposed to just 7-bit ASCII), but see also the release notes.
Edit: On the page, see par_1.52-i18n.3.diff.gz:
A patch by Jérôme Pouiller that adds support for multibyte charsets (like UTF-8), plus Debian packaging. Copied from http://sysmic.org/par/debian/. See also his original announcement.

What information to put in comments at the top of a sourcecode file?

What information do you consider worth putting in the comment at the beginning of a source code file?
All I could think of was the name of the author and perhaps the date the file was created (although I'm not too sure this information has any useful value).
[EDIT] To clarify, I don't mean comments before a class, but the first lines of the file, before the include statements and whatnot. Like
/**
* Author: Name
* Created: 11.05.2009
*
* (c) Copyright by Blub Corp.
**/

What to put in the file header:
Library/component that source code is part of
Copyright details
Brief and meaningful description of class(es) in source file
What NOT to put in the file header:
Anything that duplicates low level logic which is part of the code itself. This can lead to maintenance problems if it isn't updated when the source code changes.
Author name(s). Why?
In the world of revision control systems, while there may be an initial author of some code, eventually ownership becomes blurred. This is especially true when code enters the maintenance phase of the life cycle where owners can change regularly.
All code eventually becomes "community wiki" after enough changes ;-)
Would you want your name associated with all of the code forever, knowing full well that you will not be responsible for the code until its death?
Creation and last changed dates. This is for similar reasons as listed above. Revision control includes this information - why duplicate it in the header, making more work for yourself and risking leaving inaccurate information in the comment when things inevitably change?

Copyright
Original author(s)
License (if it's open-source)
One-line purpose statement or description
Further overall documentation and usage examples
Edit: Changed Author(s) to Original Author(s)

If the file contains a very common class or piece of functionality that can be understood with reasonable common sense, you don't really need to put much in the description. If, on the other hand, the file holds a class or function that is very specific to the project, or that encompasses complicated logic, you should give a high-level overview and state the purpose of the source file.

File Encoding! (utf-8)
# -*- encoding: utf-8 -*-
Especially if you plan on sharing your code with someone else in some other part of the world at some point.

In addition to all the above, put a short description of the purpose of the code in that file, plus anything you think can be helpful in debugging and understanding the functionality. This helps with maintenance and support.
