Please help, I am new to unix,
I want to read file which are available before 130000(hhmiss), meaning file from 00 hours to 13 hours.
XYZ_20170504_{00,01,02,03,04,05,06,07,08,09,10,11,12}[0-5][0-9][0-5][0-9].csv
OR
XYZ_20170504_{13,14,15,16,17,18,19,20,21,22,23}[0-5][0-9][0-5][0-9]
Double digit pattern matching is not working for my case.
Examples
File pattern 1: XYZ_DATE_TIME.csv
I want to mget files from sftp server where twice a day.
String_yyyymmdd_hhmiss.csv
XYZ_20170514_035959.csv
OR
XYZ_20170514_165959.csv
First of all in your question, i can see an error in your search regarding the dates.
Your regex is looking for this
XYZ_20170504_
and your file contains this
XYZ_20170514_
So obviously these will never match. I think this is more like what you need
000000 to 125959
XYZ_20170514_(0[0-9]|1[0-2])[0-5][0-9][0-5][0-9].csv
OR
130000 to 235959
XYZ_20170514_(1[3-9]|2[0-3])[0-5][0-9][0-5][0-9].csv
Also this is a relatively quick fix, only looking for matches relating to the time. You should also look to further improve this so that you fully sure you are matching correctly on the dates also.
For your own reference there are many tools available online to help you with your regular expressions and pattern matching. The one i favour most is regex101, however there are many others available too.
Related
So I was spliting some large files, everything worked properly until a file of 81GB came to scene. The split command seems that made its job, but the last files has a non correlated name. Look at the right bottom of picture.
And I'm using the command like this:
split -b 125M ./2014.txt 2014/2014_
Anyone knows why instead of create the file 2014_za created the 2014_zaaa?
You can only have 676 files named [a-z][a-z], while your command required more.
Here are some options for what split could do:
Crash.
This is the behavior mandated by POSIX, and followed by macOS.
Start writing larger suffixes.
This is a bad choice because after _zz comes _aaa, but now the files will show up in the wrong order in ls and cat * will no longer join them in correct order.
Save the last range, _z, for longer suffixes.
This is a good choice because after _yz comes _zaaa, which has room to grow while still remaining in alphabetical order. This is what GNU does, and the behavior you're seeing.
If you want all the names to be uniform without triggering any of these behaviors, just use a larger suffix length with -a 6 to ensure you have enough room.
I have a large source code where most of the documentation and source code comments are in english. But one of the minor contributors wrote comments in a different language, spread in various places.
Is there a simple trick that will let me find them ? I imagine first a way to extract all comments from the code and generate a single text file (with possible source file / line number info), then pipe this through some language detection app.
If that matters, I'm on Linux and the current compiler on this project is CLang.
The only thing that comes to mind is to go through all of the code manually and check it yourself. If it's a similar language, that doesn't contain foreign letters, consider using something with a spellchecker. This way, the text that isn't recognized will get underlined, and easy to spot.
Other than that, I don't see an easy way to go through with this.
You could make a program, that reads the files and only prints the comments out to another output file, where you then spell check that file, but this would seem to be a waste of time, as you would easily be able to spot the comments yourself.
If you do make a program for that, however, keep in mind that there are three things to check for:
If comment starts with /*, make sure it stops reading when encountering */
If comment starts with //, only read one line - unless:
If line starting with // ends with \, read next line as well
While it is possible to detect a language from a string automatically, you need way more words than fit in a usual comment to do so.
Solution: Use your own eyes and your own brain...
There appears to be a somewhat standard "descript.ion" file in Windows programs universe which provides meta data for all/some of the files in a given directory.
I know there are various programs which write this file (example: NewsBin, UseNet downloader) and read it (Example: "FAR", a file manager mimicking old Norton Commander).
I'm writing my own file indexer, and would like to add the ability to parse and use the info from "descript.ion" files.
The problem I have is that I have not been able to find an actual spec for the file, despine much googling.
I reverse engineered it as best I could, but I'm not certain whether I captured 100% of the possible details, so I figured I'd ask SO.
Here are example lines from the file:
"Rus Song1.mp3" SovietMus 1/2, rus_song#gmail.com, Fri Aug 08 00:46:27 2008
RusSong2.mp3 SovietMus 2/2, rus_song#gmail.com, Fri Aug 08 01:46:22 2008
As it seems the structure is:
First "token" is a file name.
If the token starts with any letter but double quote, the token ends at the first space character.
If the token starts with the double quote, the end of token is the following double quote
Not sure what happens if filename contains a double quote, IIRC it's illegal in Windows filesystems, so escaping the quote may be a moot question)
Last token (end of line to the very last comma moving backwards) is a timestamp.
Second to last token (the very last comma to second-to-last comma moving backwards) is the name of the poster from the Usenet newsgroup. I'm not quite sure what happens in generic format since the only descript.ion files I saw were from NewsBin that is obviously Usenet centric.
Everything in between is a description, in NewsBin's case coming from post's subject.
QUESTIONs:
Does anyone know of a bit more official "descript.ion" file spec/documentation?
(or, at elast, have your own knowledge of those files and can verify my spec)
Does anyone know of any other programs that read or write this file?
Thanks!
The description files on my system are from Total Commander as well. They follow the basic spec mentioned in the other answers:
Filename Text I typed to describe the file
"Long filename" Some text
Each line ends in a normal Windows line break.
In addition, the program stores multi-line comments as follows:
Filename This is the first line\\nSecond line\\nLast line\x04\xc2
Here, I mean that the descript.ion file contains a backslash and a letter 'n' where I typed a line break, and two special characters 04 C2 at the end of the comment. In addition, the line is ended by a Windows line break 0D 0A.
Apparently, the two extra characters at the end of the line signal the end of a multiline comment. If I remove them, the comment is rendered as a single line in the GUI, and the '\n' sequences are displayed literally.
The original usage of DESCRIPT.ION was to provide longer more descriptive names to 8.3 filenames; all it had was the shortname and a longer description. As you've found, others have co-opted the name with varying formats and usages. Frankly speaking, I don't think you'll find any specific commonality among the various usages.
Format is simple: FileName remainder of the line is a description of the file
https://jpsoft.com/ascii/descfile.txt
(Wayback Machine)
The descript.ion file is extensively used in the file management utility "total commander", a shareware found in www.ghisler.com. From version 7.5 of TC, it can have length of 4096 bytes. I have been using it extensively to annotate my files without any issues. You may look up different user's experience at the total commander users forum.
the answer above looks correct for me, just a addition:
from http://filext.com/file-extension/ION
The ION file type is primarily associated with '4DOS'. Note: Norton Utilities also uses 4DOS.
http://www.optimasc.com/products/fileid/4dos-descext.pdf
Collected links to 4DOS description-aware programs of all kind and 4DOS tools.
http://www.4dos.info/4tools.htm
http://drupal.org/node/289988
In a solution with lots of files and projects - how would you find all completely commented files? I assume that every line of code starts with // (EDIT: or is empty) in such files.
I am using VS 2008, C#, ReSharper is available.
I know, normally such files should not exist - that's what a source safe is for ...
To find all files in and under the current directory in which all lines begin with '//':
find . -type f -exec sh -c 'grep -vq "^//" {} || echo {}' \;
Note that this will report empty files.
The argument to grep can easily be expanded to account for whitespace, or generalized to match an arbitrary regex.
There is no way to achieve this with a simple search style with the components you've mentioned. Doing this would require a bit of interpretation on the file but could be done with a fairly simple script.
It sounds like you're looking for files without code though vs. files with all comments. For example if there are 1000 lines where 900 are commented and 100 are blank, it seems to meet your criteria.
The script should be fairly straight forward to write but you would need to look out for the following weird cases
Block comments
if blocks which are always false. For example #if 0
Empty lines
Well, you could write a program (probably a console app) to recursively walk the directory and file tree. Read in all .cs files and check each line to see if its first non-space and non-tab characters are "//". If you wanted to get really fancy, you could count the total lines and the lines with "//" and display the percentages so you could catch files that didn't have absolutely every line commented out. You'll just need to understand a little bit about System.IO to get the files and string functions to look for the characters you are looking for. That should cover it.
This should be close to what you're looking for: http://www.codeproject.com/KB/cs/csharplinecounter.aspx
Look for the method in the project that determines if a line is commented or not, and you can use that to build a count, etc.
If I wanted to create a string which is guaranteed not to represent a filename, I could put one of the following characters in it on Windows:
\ / : * ? | < >
e.g.
this-is-a-filename.png
?this-is-not.png
Is there any way to identify a string as 'not possibly a file' on Linux?
There are almost no restrictions - apart from '/' and '\0', you're allowed to use anything. However, some people think it's not a good idea to allow this much flexibility.
An empty string is the only truly invalid path name on Linux, which may work for you if you need only one invalid name. You could also use a string like "///foo", which would not be a canonical path name, although it could refer to a file ("/foo"). Another possibility would be something like "/dev/null/foo", since /dev/null has a POSIX-defined non-directory meaning. If you only need strings that could not refer to a regular file you could use "/" or ".", since those are always directories.
Technically it's not invalid but files with dash(-) at the beginning of their name will put you in a lot of troubles. It's because it has conflicts with command arguments.
I personally find that a lot of the time the problem is not Linux but the applications one is using on Linux.
Take for example Amarok. Recently I noticed that certain artists I had copied from my Windows machine where not appearing in the library. I check and confirmed that the files were there and then I noticed that certain characters in the folder names (Named for the artist) were represented with a weird-looking square rather than an actual character.
In a shell terminal the filenames look even stranger: /Music/Albums/Einst$'\374'rzende\ Neubauten is an example of how strange.
While these files were definitely there, Amarok could not see them for some reason. I was able to use some shell trickery to rename them to sane versions which I could then re-name with ASCII-only characters using Musicbrainz Picard. Unfortunately, Picard was also unable to open the files until I renamed them, hence the need for a shell script.
Overall this a a tricky area and it seems to get very thorny if you are trying to synchronise a music collection between Windows and Linux wherein certain folder or file names contain funky characters.
The safest thing to do is stick to ASCII-only filenames.