So I am trying to make a script that references other files. I want to be able to keep track of the file even if it moves. So I was thinking if I could assign a file a unique value then I could find the location of the file by searching by the unique value I assigned it.
Is there a better way to do this?
Basically I'd like to be able to find a file from a value it has as an extended attribute. But I don't know if this is possible.
Any help would be great. Thanks.
You could use the inode number (show it with ls -i /some/file), which is unique within a filesystem and does not change when the file is changed or moved, UNLESS you move the file to a different partition. If you don't need to track files across multiple partitions, then this would be a very easy solution.
To find a file by inode number you can use find -inum <inode number>
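A minimal sketch of that workflow, assuming the file stays on the same filesystem. The function names and the paths in the usage comment are placeholders, not something from the answer above.

```shell
# Print the inode number of a file (first field of `ls -i` output).
inode_of() {
    ls -di -- "$1" | awk '{print $1}'
}

# Search the tree below directory $1 for the inode number $2.
# -xdev keeps find on one filesystem, because inode numbers are only
# unique within a single filesystem.
find_by_inode() {
    find "$1" -xdev -inum "$2"
}

# Usage (hypothetical paths):
#   ino=$(inode_of /some/file)
#   mv /some/file /some/other/place
#   find_by_inode /some "$ino"
```

Note that find has to walk the whole tree, so on a large filesystem it helps to start from the narrowest directory that could contain the file.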
If you need to move the file across filesystems, you can use the setfattr command to set extended attributes on the file. Note that your kernel and filesystem must support extended attributes.
There is no one-shot command to find the file, but you can write a simple script that searches for it by your unique key.
To set a key such as user.unique on yourfile, with a string value of your choice (e.g. n1234):
setfattr -n user.unique -v n1234 yourfile
To retrieve the value, use the getfattr command:
getfattr -n user.unique yourfile
or to get all extended file attributes:
getfattr -d yourfile
To test whether your kernel can handle extended attributes on your filesystem types:
zcat /proc/config.gz | grep FS_XATTR
To write a simple script to search using extended attributes you can refer to this Hack; on the same page you can read more about extended attributes.
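As a rough sketch of such a search script (not the one from the linked page), the function below walks a directory and prints the files whose attribute matches. It assumes the getfattr tool from the attr package is installed; find_by_xattr is a made-up name.

```shell
# Print every regular file under directory $1 whose extended attribute
# $2 has exactly the value $3. Files without the attribute are skipped.
find_by_xattr() {
    find "$1" -type f -exec sh -c '
        key=$1 want=$2
        shift 2
        for f in "$@"; do
            # --only-values prints just the attribute value; getfattr
            # fails when the attribute is missing, so skip those files.
            val=$(getfattr --only-values -n "$key" -- "$f" 2>/dev/null) || continue
            if [ "$val" = "$want" ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh "$2" "$3" {} +
}

# Usage (hypothetical directory):
#   find_by_xattr /home/me user.unique n1234
```

Keep in mind that not every copy or archive tool preserves extended attributes by default, so it is worth checking how the files are moved.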
I have a directory on server B that contains 'dated' directories like:
2015-03-01_10.07.11
2015-03-02_10.05.02
2015-02-25_11.05.02
2015-02-24_11.07.05
I need to copy the content of the directory with the latest date.
In my example, I'd have to copy contents of the 2015-03-02_10.05.02 directory.
How would I do that?
Thanks,
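These directories sort correctly according to their names, so a plain ls (which sorts by name) or ls -r (reverse order) already lists them in date order. Note that ls -t sorts by modification time instead, which may not agree with the names.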
So then the problem becomes how to capture the sorted list and extract the first (or last) entry. Either an array or a string with a regex can do this. There are probably many other ways too. For example, look at the find and sort man pages.
I ended up using ls -1lr | tail -n 1
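For comparison, here is a sketch that avoids parsing ls output altogether. It relies on the shell expanding a glob in sorted order, so the last match is the newest directory as long as the names keep the YYYY-MM-DD_hh.mm.ss pattern. The function name and the paths in the usage comment are placeholders, and it only covers the local case, not copying between servers.

```shell
# Print the directory under $1 whose name sorts last.
latest_dir() {
    latest=
    for entry in "$1"/*/; do
        [ -d "$entry" ] || continue   # skip the literal pattern if nothing matched
        latest=${entry%/}             # keep overwriting; the last one wins
    done
    printf '%s\n' "$latest"
}

# Usage (hypothetical paths): copy the contents of the newest directory.
#   cp -R "$(latest_dir /path/to/dated)"/. /path/to/destination/
```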
So I have a directory with ~50 files, and each contains different things. I often find myself not remembering which files contain what. (This is not a problem with the naming -- it is sort of like having a list of programs and not remembering which files contain conditionals).
Anyways, so far, I've been using
cat * | grep "desiredString"
for a string that I know is in there. However, this just gives me the lines which contain the desired string. This is usually enough, but I'd like it to give me the file names instead, if at all possible.
How could I go about doing this?
It sounds like you want grep -l, which will list the files that contain a particular string. You can also just pass the filename arguments directly to grep and skip cat.
grep -l "desiredString" *
In the directory containing the files among which you want to search:
grep -rn "desiredString" .
This can list all the files matching "desiredString", with file names, matching lines and line numbers.
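The two answers can also be combined. As a small sketch, the function below (the name is made up) searches recursively but prints only the file names, and uses -F so the search string is taken literally rather than as a regular expression.

```shell
# List the names of files under directory $1 that contain the literal
# string $2. -r recurses, -l prints only file names, -F disables
# regular-expression interpretation of the string.
files_containing() {
    grep -rlF -- "$2" "$1"
}

# Usage:
#   files_containing . "desiredString"
```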
I have to combine a lot of similar csv files to one file. They are stored in many different subdirectories but the single csv files have the same name.
I need to append them columnwise, but I need the first "name" column only once. So I want to keep the first column of the first csv file and remove it from all the following ones. Referring to this question, I tried the following command, which iterates through all the subdirectories while the final file sits in the main directory (it starts out as a copy of one of the many csv files, so that it already contains the "name" column):
for i in */; do paste final_table.csv <(cut -f 2- "$i"single_table.csv) > final_table.csv ; done
However it seems like paste does not work when one of the input files is also the output file.
How would I solve this correctly?
Don't overwrite with output the file you're reading input from. Instead, mv/rename it to an intermediate name, let your script read from that file, and output to a file with the original name. Remove the input file when complete.
Alternatively, choose an intermediate name for the output file, write all output to it, and only after all input has been processed, mv/rename the output file to the final name.
As an intermediate name, the original name with a temporary ending ("extension") appended could be useful.
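Here is a sketch of the temporary-file approach applied to the loop from the question. It assumes the files really are comma-separated, so cut and paste get an explicit delimiter (both default to tabs). The function name combine_tables and the DELIM variable are made up for illustration.

```shell
# Append columns 2.. of every */single_table.csv to the file named in $1.
# Assumption: fields are comma-separated; change DELIM for tab-separated data.
DELIM=','

combine_tables() {
    final=$1
    tmp=$(mktemp "$final.XXXXXX") || return 1
    for f in */single_table.csv; do
        [ -f "$f" ] || continue
        # Read the current result, write the widened table to the
        # temporary file, then rename it over the original.
        cut -d "$DELIM" -f 2- "$f" | paste -d "$DELIM" "$final" - > "$tmp" &&
            mv "$tmp" "$final"
    done
    rm -f "$tmp"
}

# Usage, run from the main directory:
#   combine_tables final_table.csv
```

Because the result is only renamed into place after paste has finished, the input file is never truncated while it is still being read. One side effect is that the final file ends up with the restrictive permissions mktemp gives the temporary file.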
The sponge utility from the moreutils package is what I always use for this kind of situation:
for i in */; do
paste final_table.csv <(cut -f 2- "$i"single_table.csv) | sponge final_table.csv
done
sponge quite simply "soaks up" standard in and writes to the filename you give it afterwards. It is written specifically for situations like this, to avoid the need for you to create (and then remember to delete) a temporary file.
I'm working with a large CSV that follows a basic process.
Backup the working original
Generate a skeleton CSV
Read from another CSV, format the contents, and then append it to the skeleton
Append the data from the backup to the new one.
The issue I'm running into is that when I read in the contents of the backup, I use grep -Ev -f with a file of regexes to keep undesired data from the backup out of the next revision. This currently presents a problem because grep appears to evaluate each regex in the file against every line from STDIN, which causes duplicates. The simple solution would be to pipe it through sort | uniq and call it a day, but that would wreck the formatting of the CSV currently in use. I can elaborate if needed, but the short of it is that I run a script to bulk-process IP addresses while other people also edit the file manually, and with the current form of the script the final output would be all of the automated content followed by the manual entries at the bottom of the file.
So, is there any way, without some ugly looping of grep, to tell it to stop evaluating a line after a pattern is matched? Using -m 1 stops grep after the first match in the whole stream, whereas I need it to stop after the first match on each line.
For the task you want to accomplish, it would be best in my opinion to use AWK. You can find an excellent tutorial for AWK at http://www.grymoire.com/Unix/Awk.html. You basically need to change the input field separator for awk with -F and pass your script file with -f:
awk -F',' -f foo.awk bar.dat
As far as the problem with sorting is concerned, see http://www.linuxquestions.org/questions/linux-general-1/how-to-use-awk-to-sort-243177/
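One possible reading of the AWK suggestion, as a sketch only: load the regexes once, then test each input line against them and move on at the first match. Every surviving line is printed exactly once and in its original order, so no sort | uniq is needed. The function name and the file names in the usage comment are placeholders.

```shell
# Print the lines of file $2 that match none of the regexes in file $1
# (one regex per line, blank lines ignored).
filter_lines() {
    awk '
        # First file: remember each non-empty pattern.
        NR == FNR { if ($0 != "") pat[++n] = $0; next }
        # Second file: skip the line as soon as any pattern matches.
        {
            for (i = 1; i <= n; i++)
                if ($0 ~ pat[i]) next
            print
        }
    ' "$1" "$2"
}

# Usage (hypothetical file names):
#   filter_lines exclude_patterns.txt backup.csv >> new.csv
```

The NR == FNR test assumes the pattern file is not empty. With an empty pattern file, the data file would be treated as the pattern list and nothing would be printed.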
First off, I am not a Unix expert by any stretch, so please forgive a little naivety in my question.
I have a requirement to list the unencrypted files in a given directory that potentially contains both encrypted and unencrypted files.
I cannot reliably identify these files by file extension alone and was hoping someone in the SO community might be able to help me out.
I can run:
file * | egrep -w 'text|XML'
but that will only identify the files that are either text or XML. I could possibly use this if I can't do much better, as currently the only other files in the directory are text or XML files, but I really wanted to identify all unencrypted files, whatever type they may be.
Is this possible in a single line command?
EDIT: the encrypted files are encrypted via openSSL
The command I use to unencrypt the files is:
openssl enc -d -aes128 -in <encrypted_filename> -out <unencrypted_filename>
Your problem is not a trivial one. The Solaris file command uses "magic" - /etc/magic. This is a set of rules that attempt to determine what flavor a file is. It is not perfect.
If you read the /etc/magic file, note that the last column is verbiage that is in the output of the file command when it recognizes something, some structure in a file.
Basically the file command looks at the first few bytes of a file, just like the exec() family of system calls does. So, #!/bin/sh in the very first line of a file, in the first characters of the line, identifies to exec() the "command interpreter" that exec() needs to invoke to "run" the file. file gets the same idea and says "command text", "awk text", etc.
Your issue is that you have to work out what types of files you are going to see as output from file. You need to spend time delving into the non-encrypted files to see what "answers" you can expect from file. Otherwise you can run file over the whole directory tree and sort out all of what you think are correct answers.
find /path/to/files -type f -exec file {} \; | nawk -F':' '!arr[$2]++' > outputfile
This gives you a list of distinct answers about what file thinks you have. Put the ones you like in a file, call it good.txt
find /path/to/files -type f -exec file {} \; > bigfile
nawk -F':' 'FILENAME=="good.txt" {arr[$1]++}
FILENAME=="bigfile" {if($2 in arr) {print $1}} ' good.txt bigfile > nonencryptedfiles.txt
THIS IS NOT 100% guaranteed. file can be fooled.
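Since the edit to the question says the files were encrypted with openssl, checking the leading bytes directly may be a useful shortcut. As far as I know, openssl enc in its default salted, passphrase-based mode writes the eight ASCII bytes Salted__ at the start of its output. Files encrypted with -nosalt, with an explicit key (-K), or base64-encoded with -a will not start with those bytes, so treat this as a sketch under that assumption. The function name is made up.

```shell
# Succeeds if file $1 starts with the bytes "Salted__".
# The comparison is done on a hex dump so that binary data is handled safely.
has_salt_header() {
    [ "$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')" = "53616c7465645f5f" ]
}

# Usage: list regular files in the current directory without the header.
#   for f in ./*; do
#       [ -f "$f" ] && ! has_salt_header "$f" && printf '%s\n' "$f"
#   done
```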
The way to identify encrypted files is by the amount of randomness, or entropy, they contain. Files that are encrypted (or at least files that are encrypted well) should look random in the statistical sense. Files that contain unencrypted information—whether text, graphics, binary data, or machine code—are not statistically random.
A standard way to calculate randomness is with an autocorrelation function. You'd probably need to autocorrelate only the first few hundred bytes of each file, so the process can be fairly quick.
It's a hack, but you might be able to take advantage of one of the properties of compression algorithms: they work by removing redundancy from data. Well-encrypted files look random and have almost no redundancy, so they cannot be compressed (or again, at least not much), and you might try compressing some portion of each file and comparing the compression ratios.
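A rough sketch of the compression test, assuming gzip and a head that supports -c are available. The function name, the 64 KiB sample size, and the 95 percent threshold are arbitrary choices for illustration, not tuned values.

```shell
# Succeeds (exit status 0) if the first 64 KiB of file $1 barely shrink
# under gzip, which suggests encrypted or already-compressed content.
looks_incompressible() {
    orig=$(( $(head -c 65536 "$1" | wc -c) ))
    comp=$(( $(head -c 65536 "$1" | gzip -c | wc -c) ))
    # Empty files are never reported; otherwise compare sizes in percent.
    [ "$orig" -gt 0 ] && [ $(( comp * 100 )) -ge $(( orig * 95 )) ]
}

# Usage: list files that compress well, i.e. are probably not encrypted.
#   for f in ./*; do
#       [ -f "$f" ] && ! looks_incompressible "$f" && printf '%s\n' "$f"
#   done
```

Already-compressed formats (zip, jpeg, and so on) and very small files will also look incompressible, so this can only narrow the list down, not settle it.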
SO has several other questions about finding randomness or entropy, and many of them have good suggestions, like this one:
How can I determine the statistical randomness of a binary string?
Good luck!