I would like to review a patch from a colleague. We are unable to use a review tool, so I would like to add comments to the patch file he made. Is it possible to write inline comments in an (svn) patch file?
I couldn't find any information on this in the svn red book. I was even unable to find the patch file grammar to figure it out myself.
The diff format is just the unified diff format. If you wanted, you could put some text after the range info. Consider this diff, produced with the command svn diff -c 1544711 https://svn.apache.org/repos/asf/subversion/trunk:
Index: subversion/mod_dav_svn/mod_dav_svn.c
===================================================================
--- subversion/mod_dav_svn/mod_dav_svn.c (revision 1544710)
+++ subversion/mod_dav_svn/mod_dav_svn.c (revision 1544711)
@@ -1097,7 +1097,8 @@
/* Fill the filename on the request with a bogus path since we aren't serving
* a file off the disk. This means that <Directory> blocks will not match and
- * that %f in logging formats will show as "svn:/path/to/repo/path/in/repo". */
+ * %f in logging formats will show as "dav_svn:/path/to/repo/path/in/repo".
+ */
static int dav_svn__translate_name(request_rec *r)
{
const char *fs_path, *repos_basename, *repos_path;
@@ -1146,7 +1147,7 @@
if (repos_path && '/' == repos_path[0] && '\0' == repos_path[1])
repos_path = NULL;
- /* Combine 'svn:', fs_path and repos_path to produce the bogus path we're
+ /* Combine 'dav_svn:', fs_path and repos_path to produce the bogus path we're
* placing in r->filename. We can't use our standard join helpers such
* as svn_dirent_join. fs_path is a dirent and repos_path is a fspath
* (that can be trivially converted to a relpath by skipping the leading
@@ -1154,7 +1155,7 @@
* repository is 'trunk/c:hi' this results in a non canonical dirent on
* Windows. Instead we just cat them together. */
r->filename = apr_pstrcat(r->pool,
- "svn:", fs_path, repos_path, SVN_VA_NULL);
+ "dav_svn:", fs_path, repos_path, SVN_VA_NULL);
/* Leave a note to ourselves so that we know not to decline in the
* map_to_storage hook. */
If you add the option -x-p to that command you'll get:
Index: subversion/mod_dav_svn/mod_dav_svn.c
===================================================================
--- subversion/mod_dav_svn/mod_dav_svn.c (revision 1544710)
+++ subversion/mod_dav_svn/mod_dav_svn.c (revision 1544711)
@@ -1097,7 +1097,8 @@ static int dav_svn__handler(request_rec *r)
/* Fill the filename on the request with a bogus path since we aren't serving
* a file off the disk. This means that <Directory> blocks will not match and
- * that %f in logging formats will show as "svn:/path/to/repo/path/in/repo". */
+ * %f in logging formats will show as "dav_svn:/path/to/repo/path/in/repo".
+ */
static int dav_svn__translate_name(request_rec *r)
{
const char *fs_path, *repos_basename, *repos_path;
@@ -1146,7 +1147,7 @@ static int dav_svn__translate_name(request_rec *r)
if (repos_path && '/' == repos_path[0] && '\0' == repos_path[1])
repos_path = NULL;
- /* Combine 'svn:', fs_path and repos_path to produce the bogus path we're
+ /* Combine 'dav_svn:', fs_path and repos_path to produce the bogus path we're
* placing in r->filename. We can't use our standard join helpers such
* as svn_dirent_join. fs_path is a dirent and repos_path is a fspath
* (that can be trivially converted to a relpath by skipping the leading
@@ -1154,7 +1155,7 @@ static int dav_svn__translate_name(request_rec *r)
* repository is 'trunk/c:hi' this results in a non canonical dirent on
* Windows. Instead we just cat them together. */
r->filename = apr_pstrcat(r->pool,
- "svn:", fs_path, repos_path, SVN_VA_NULL);
+ "dav_svn:", fs_path, repos_path, SVN_VA_NULL);
/* Leave a note to ourselves so that we know not to decline in the
* map_to_storage hook. */
Note how the function name is added after the @@ on the range lines. This portion of the line is ignored by any software processing the diff, so you're free to put whatever you want there. You could put your comments there.
Unidiff hunks start each line with ' ' (space) to mean context (an unchanged line), '+' to mean an added line, or '-' to mean a removed line. A lot of parsers (including Subversion's svn patch command) will discard lines that start with some other character. So you might be able to simply insert a line that starts with some other character, but that's not guaranteed to be as portable as the method above.
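For illustration, here is what one of the hunks above could look like with a comment appended after the closing @@ of the range line, plus an extra comment inserted as a line starting with an otherwise unused character (the review text is made up, and the '#'-prefixed line relies on the parser skipping unknown prefixes, so the @@ method is the safer of the two):

@@ -1146,7 +1147,7 @@ REVIEW: is the 'svn:' to 'dav_svn:' rename safe for existing log parsers?
 if (repos_path && '/' == repos_path[0] && '\0' == repos_path[1])
 repos_path = NULL;
# REVIEW: please also mention this change in the release notes.
- /* Combine 'svn:', fs_path and repos_path to produce the bogus path we're
+ /* Combine 'dav_svn:', fs_path and repos_path to produce the bogus path we're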
With the configuration below in pyproject.toml:
[tool.black]
# How many characters per line to allow.
line-length = 120
# When processing Jupyter Notebooks, add the given magic to the list of known
# python-magics (timeit, prun, capture, pypy, python3, python, time).
# Useful for formatting cells with custom python magics.
# python-cell-magics =
# Require a specific version of Black to be running
# (useful for unifying results across many environments e.g. with a pyproject.toml file).
# It can be either a major version number or an exact version.
# required-version =
# A regular expression that matches files and directories that should be
# included on recursive searches. An empty value means all files are included
# regardless of the name. Use forward slashes for directories on all platforms (Windows, too).
# Exclusions are calculated first, inclusions later.
# include = "(\.pyi?|\.ipynb)$"
# A regular expression that matches files and directories that should be
# excluded on recursive searches. An empty value means no paths are excluded.
# Use forward slashes for directories on all platforms (Windows, too).
# Exclusions are calculated first, inclusions later.
# exclude = "/(\.direnv|\.eggs|\.git|\.hg|\.mypy_cache|\.nox|\.tox|\.venv|venv|\.svn|\.ipynb_checkpoints|_build|buck-out|build|dist|__pypackages__)/"
# Like 'exclude', but adds additional files and directories on top of the excluded ones.
# (Useful if you simply want to add to the default).
# extend-exclude =
# Like 'exclude', but files and directories matching this regex will be excluded
# even when they are passed explicitly as arguments.
# force-exclude =
# The name of the file when passing it through stdin.
# Useful to make sure Black will respect 'force-exclude' option on some editors that rely on using stdin.
# stdin-filename =
# Number of parallel workers.
# Can be a number or a range.
# workers =
and this command line:
black --config "pyproject.toml" --target-version py39 --check --diff .
the following line of code is flagged:
ave_quantity = self.exec_math(math_iterable["mean"], "mean", []) # execute the "mean" fxn on the dataset # cspell: disable-line # fmt: skip
--- properties/datasets/models.py 2022-11-30 00:01:16.590743 +0000
+++ properties/datasets/models.py 2022-11-30 00:01:18.692767 +0000
@@ -746,11 +746,13 @@
calculate the mean value of all the dataset points
return: numerical value of this function when all variables are zero
rtype: float
"""
- ave_quantity = self.exec_math(math_iterable["mean"], "mean", []) # execute the "mean" fxn on the dataset # fmt:skip
+ ave_quantity = self.exec_math(
+ math_iterable["mean"], "mean", []
+ ) # execute the "mean" fxn on the dataset # fmt:skip
return getattr(ave_quantity, "magnitude", 0.0)
def serialize(self, flat=False):
return {
"type": "dataset",
would reformat properties/datasets/models.py
Oh no! 💥 💔 💥
1 file would be reformatted, 102 files would be left unchanged.
What am I missing here?
Using black v22.10.0
Also asked here --> https://github.com/psf/black/issues/451#issuecomment-1331478945
I would like to compare two of my log files, generated before and after an implementation, to see if it has impacted anything. However, the order of the logs is not the same every time. Since the log file also has multiple indented lines, when I tried to sort, everything got sorted, but I would like to keep each child intact with its parent. The indented lines use spaces, not tabs.
Any help would be greatly appreciated. I am fine with either a Windows or a Linux solution.
Eg of the file:
#This is a sample code
Parent1 to be verified
    Child1 to be verified
    Child2 to be verified
        Child21 to be verified
        Child23 to be verified
        Child22 to be verified
            Child221 to be verified
    Child4 to be verified
    Child5 to be verified
        Child53 to be verified
        Child52 to be verified
            Child522 to be verified
            Child521 to be verified
    Child3 to be verified
I am posting another answer here to sort it hierarchically, using python.
The idea is to attach the parents to the children to make sure that the children under the same parent are sorted together.
See the python script below:
"""Attach parent to children in an indentation-structured text"""
from typing import Tuple, List
import sys
# A unique separator to separate the parent and child in each line
SEPARATOR = '#'
# The indentation
INDENT = ' '
def parse_line(line: str) -> Tuple[int, str]:
"""Parse a line into indentation level and its content
with indentation stripped
Args:
line (str): One of the lines from the input file, with newline ending
Returns:
Tuple[int, str]: The indentation level and the content with
indentation stripped.
Raises:
ValueError: If the line is incorrectly indented.
"""
# strip the leading white spaces
lstripped_line = line.lstrip()
# get the indentation
indent = line[:-len(lstripped_line)]
# Let's check if the indentation is correct
# meaning it should be N * INDENT
n = len(indent) // len(INDENT)
if INDENT * n != indent:
raise ValueError(f"Wrong indentation of line: {line}")
return n, lstripped_line.rstrip('\r\n')
def format_text(txtfile: str) -> List[str]:
"""Format the text file by attaching the parent to it children
Args:
txtfile (str): The text file
Returns:
List[str]: A list of formatted lines
"""
formatted = []
par_indent = par_line = None
with open(txtfile) as ftxt:
for line in ftxt:
# get the indentation level and line without indentation
indent, line_noindent = parse_line(line)
# level 1 parents
if indent == 0:
par_indent = indent
par_line = line_noindent
formatted.append(line_noindent)
# children
elif indent > par_indent:
formatted.append(par_line +
SEPARATOR * (indent - par_indent) +
line_noindent)
par_indent = indent
par_line = par_line + SEPARATOR + line_noindent
# siblings or dedentation
else:
# We just need first `indent` parts of parent line as our prefix
prefix = SEPARATOR.join(par_line.split(SEPARATOR)[:indent])
formatted.append(prefix + SEPARATOR + line_noindent)
par_indent = indent
par_line = prefix + SEPARATOR + line_noindent
return formatted
def sort_and_revert(lines: List[str]):
"""Sort the formatted lines and revert the leading parents
into indentations
Args:
lines (List[str]): list of formatted lines
Prints:
The sorted and reverted lines
"""
sorted_lines = sorted(lines)
for line in sorted_lines:
if SEPARATOR not in line:
print(line)
else:
leading, _, orig_line = line.rpartition(SEPARATOR)
print(INDENT * (leading.count(SEPARATOR) + 1) + orig_line)
def main():
"""Main entry"""
if len(sys.argv) < 2:
print(f"Usage: {sys.argv[0]} <file>")
sys.exit(1)
formatted = format_text(sys.argv[1])
sort_and_revert(formatted)
if __name__ == "__main__":
main()
Let's save it as format.py, and we have a test file, say test.txt:
parent2
    child2-1
        child2-1-1
    child2-2
parent1
    child1-2
        child1-2-2
        child1-2-1
    child1-1
Let's test it:
$ python format.py test.txt
parent1
    child1-1
    child1-2
        child1-2-1
        child1-2-2
parent2
    child2-1
        child2-1-1
    child2-2
If you wonder how the format_text function formats the text, here are the intermediate results, which also explain why we can get the file sorted as we want:
parent2
parent2#child2-1
parent2#child2-1#child2-1-1
parent2#child2-2
parent1
parent1#child1-2
parent1#child1-2#child1-2-2
parent1#child1-2#child1-2-1
parent1#child1-1
You may see that each child has its parents attached, all the way up to the root, so that the children under the same parent are sorted together.
Short answer (Linux solution):
sed ':a;N;$!ba;s/\n /#/g' test.txt | sort | sed ':a;N;$!ba;s/#/\n /g'
Test it out:
test.txt
parent2
    child2-1
        child2-1-1
    child2-2
parent1
    child1-1
    child1-2
        child1-2-1
$ sed ':a;N;$!ba;s/\n /#/g' test.txt | sort | sed ':a;N;$!ba;s/#/\n /g'
parent1
    child1-1
    child1-2
        child1-2-1
parent2
    child2-1
        child2-1-1
    child2-2
Explanation:
The idea is to replace each newline followed by an indentation space with a character that does not otherwise appear in your file (here I used # as an example; if it is not unique in your file, use another character or even a string), because we need to turn it back into the newline and indentation space later.
About sed command:
:a create a label 'a'
N append the next line to the pattern space
$! if not the last line, ba branch (go to) label 'a'
s substitute, /\n / regex for newline followed by a space
/#/ a unique character to replace the newline and space
if it is not unique in your file, use other characters or even a string
/g global match (as many times as it can)
I am able to read my files from the FTP location if I specify the exact filename. My problem is that I'm trying to automate this process where I have to read these files week over week and the filename changes randomly. There is no specific pattern to it, so it can't be predetermined.
Is there a way in SAS to read the names of all the files present at an FTP location and present the user with a dialog box listing them, so they can pick the filename they want to read?
In the SAS Display Manager interface you can use a DATA step WINDOW statement or the macro %WINDOW statement to define a simple picker, and DISPLAY or %DISPLAY to raise it. The picker is very basic: no scroll bars or other modern adornments.
An FTP folder listing is retrieved using the LS option of the FILENAME statement's FTP access method.
Sample code:
/**/
* location of FTP folder;
filename folder ftp
    user = 'anonymous'
    host = 'ftp.cdc.gov'
    cd = '/pub/Health_Statistics/NCHS/Publications/ICD9-CM/2011'
    ls
;

* retrieve listing;
data files;
    infile folder;
    input;
    order + 1;
    fileinfo = _infile_;
run;
/**/
%macro picker(
    /* Dynamically build a %WINDOW definition, display it and return the last selected item */
    name=,        /* Name of window */
    title=,       /* First line text */
    data=,        /* data set containing items */
    order=order,  /* variable for ordering items in the picker */
    item=,        /* variable to pick a value of */
    result=       /* name of macro variable that will contain the picked item, must pre-exist in caller scope */
);
    %* field definitions will look like
    %* #2 #2 field<i> 1 color=blue attr=rev_video " &filename<i>" ;

    %local n i row field_def;

    proc sql noprint;
        select count(*) into :n trimmed from &data;
        %do i = 1 %to &n;
            %local field&i item&i;
        %end;
        select
            &order, &item
        into
            :order1-, :item1-
        from
            &data;
    quit;

    %do i = 1 %to &N;
        %let field_def = &field_def #%eval(&i+1) #2 field&i 1 color=blue attr=rev_video " &&item&i";
    %end;

    %WINDOW PICKER rows=30 columns=80
        #1 #1 "&title. - Mark an item and Press F3"
        &field_def
    ;

    %display PICKER;

    %do i = 1 %to &N;
        %if %length(&&field&i) %then %let &result=&&item&i;
    %end;
%mend;
%let selected=;
%picker(name=PICKER, title=Pick a file, data=files, item=fileinfo, result=selected);
%put &=selected;
More sophisticated pickers can be built using SAS/AF. Other possibilities include Stored Process prompt dialogs, SAS Studio snippets, or a SAS server page.
Ok, so this is a unique question.
We are getting files (daily) from a company. These files are downloaded from their servers to ours (SFTP). The company we deal with uses a third-party provider that creates the files (and reduces their size) to make downloads faster and also reduce file size on their servers.
We download 9 files daily from the server: 3 groups of 3 files.
Each group of files consists of 2 XML files and one "image" file.
One of these XML files gives us information on the 'image' file.
Information in the XML file we need:
offset: Gives us where a section of data starts
length: Used with offset, gives us the end of that section
count: Gives us the number of elements held in the file
The 'image' file itself is unusable until we split the file into pieces based on the offset and length of each image in the file. The images are basically concatenated together. We need to extract these images to be able to view them.
An example of offset, length and count values are as follows:
offset: 0
length: 2670
offset: 2670
length: 2670
offset: 5340
length: 2670
offset: 8010
length: 2670
count: 4
This means that there are 4 (count) items. The first item begins at offset[0] and is length[0] bytes long. The second item begins at offset[1] and is length[1] bytes long, etc.
I need to split the images at these points, and these points PRECISELY, with no room for error. The third party provider will not provide us with the code and we are to figure this out ourselves. The image file is not readable without splitting and is essentially useless until then.
My question: Does anyone have a way of splitting files at a specific byte?
P.S. I do not have any code yet. I don't even know where to begin with this one. I am not new to coding, but I have never done file splitting by the byte.
I don't care which language this uses. I just need to make it work.
EDIT
The OS is Windows
You hooked me. Here's a rough Java method that can split a file based on offset and length. This requires at least Java 8.
A few of the classes used:
SeekableByteChannel
ByteBuffer
And an article I found helpful in producing this example.
/**
 * Method that splits the data provided in fileToSplit into outputDirectory based on the
 * collection of offsets and lengths provided in offsetAndLength.
 *
 * Example of input offsetAndLength:
 * Long[][] data = new Long[][]{
 *     {0L, 2670L},
 *     {2670L, 2670L},
 *     {5340L, 2670L},
 *     {8010L, 2670L}
 * };
 *
 * Output files will be placed in outputDirectory and named img0, img1... imgN
 *
 * @param fileToSplit
 * @param outputDirectory
 * @param offsetAndLength
 * @throws IOException
 */
public static void split( Path fileToSplit, Path outputDirectory, Long[][] offsetAndLength ) throws IOException{
    try (SeekableByteChannel sbc = Files.newByteChannel(fileToSplit, StandardOpenOption.READ )){
        for(int x = 0; x < offsetAndLength.length; x++){
            ByteBuffer buffer = ByteBuffer.allocate(offsetAndLength[x][1].intValue());
            sbc.position(offsetAndLength[x][0]);
            sbc.read(buffer);
            buffer.flip();
            File img = new File(outputDirectory.toFile(), "img"+x);
            img.createNewFile();
            try(FileChannel output = FileChannel.open(img.toPath(), StandardOpenOption.WRITE)){
                output.write(buffer);
            }
            buffer.clear();
        }
    }
}
I leave parsing the XML file to you.
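If Java isn't a requirement, the same seek-and-read approach can be sketched in Python. This is only an illustrative sketch: the (offset, length) pairs are hard-coded to the example values from the question, and the input file name and output directory are made up.

import os

# (offset, length) pairs taken from the example values in the question
sections = [(0, 2670), (2670, 2670), (5340, 2670), (8010, 2670)]

def split_file(path, out_dir, sections):
    """Write each (offset, length) slice of `path` to out_dir/img0, img1, ..."""
    os.makedirs(out_dir, exist_ok=True)
    with open(path, "rb") as src:
        for i, (offset, length) in enumerate(sections):
            src.seek(offset)           # jump to the start of this image
            data = src.read(length)    # read exactly this image's bytes
            with open(os.path.join(out_dir, f"img{i}"), "wb") as out:
                out.write(data)

split_file("images.bin", "out", sections)  # "images.bin" and "out" are placeholder names

Whichever language you use, the essential part is the same: seek to each offset and read exactly length bytes.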
Parsing a directory tree with hundreds of thousands of files looking for valid (non-empty, readable) log files. What is the most efficient order of tests for early bail?
Here's an example I use as a File::Find preprocess stage and, being new to Perl, I wonder which tests are slowest / redundant / inefficiently ordered.
sub filter {
    my $nicename = substr( $File::Find::dir, $_pathLength );
    my @clean;
    my $filecount = my $dircount = 0;

    foreach (@_) {
        next unless -R $_;            # readable
        next unless -f _ || -d _;     # file or dir.
        next if ( $_ =~ m/^\./ );     # ignore files/folders starting with a period
        if ( -f _ ) {                 # regular file
            next unless ( my $size = -s _ );                          # does it have a size?
            next unless ( $_ =~ m/([^.]+)$/ )[0] eq $_log_file_ext;   # correct file extension?
            next if exists( $_previousRun{ $_ . " ($size)" } );       # don't add files we've already processed
            $filecount++;
        } elsif ( -d _ ) {            # dir
            $dircount++;
        }
        push( @clean, $_ );
    }

    $_fileCount += $filecount;
    $_dirCount  += $dircount;
    Utils::logit("'$nicename' contains $filecount new files and $dircount folders to explore.");
    return @clean;
}
Any info you can provide on Perls internals and behaviours would be useful to me.
At the very end I run some specific checks for "regular file" and "directory". Are there other things I should check for and avoid adding to my clean list?
As a rough rule of thumb, 'going to disk' is the most expensive thing you'll be doing.
So when trying to optimise IO-based work:
First, discard anything you can based on name/location. (e.g. 'does filename contain a .')
Then discard based on file attributes - coalesce if you can into a single stat call, because then you're making a single IO.
And then do anything else.
I'm at least fairly sure that your -s, -d, -f etc. will be triggering stat() operations each time they go (which will probably get cached, so it doesn't hurt that much). But you do also test -f and -d twice: once to do the next unless and again to do the if.
But you might find you can do a single stat and get most of the metadata you're interested in:
http://perldoc.perl.org/functions/stat.html
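To make the single-stat idea concrete, here is a minimal sketch (in Python rather than Perl, purely as an illustration of the ordering; the file-extension value is a hypothetical example): do the zero-IO name tests first, then answer the type and size questions from one stat call.

import os
import stat

def keep(path, log_ext=".log"):  # log_ext is a hypothetical example value
    name = os.path.basename(path)
    # name-based tests cost no disk IO, so do them first
    if name.startswith(".") or not name.endswith(log_ext):
        return False
    try:
        st = os.stat(path)       # a single trip to disk
    except OSError:
        return False
    # the remaining tests reuse that one stat result
    return stat.S_ISREG(st.st_mode) and st.st_size > 0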
In the grand scheme of things, though, I wouldn't worry about it too much. Your limiting factor will be disk IO, and the odd additional stat or regular expression won't make much difference to the overall speed.