I am using ExifTool to set the camera body serial number to a unique value for each image in a batch of several hundred images. I already store each image's serial number in IPTC. I also put it in the camera body serial number tag as a second location, because it takes a little more effort to remove from there.
The serial number is in the format ###-###-####-####, where the last four digits are the number to increment. The first three groups of digits don't change within a batch, so I only need to increment the last group.
EXAMPLE
If I have 100 images in my first batch, they would be numbered:
811-010-5469-0001, 811-010-5469-0002, 811-010-5469-0003 ... 811-010-5469-0100
I can successfully drag a group of images onto my ExifTool shortcut, which is named
exiftool(-SerialNumber='001-001-0001-0001')
and it changes the EXIF SerialNumber tag on the images. However, I haven't found what to add to make the number increment for each image.
I have tried variations on the below without success:
exiftool(-SerialNumber+=001-001-0001-0001)
exiftool(-SerialNumber+='001-001-0001-0001')
I realize ExifTool is most likely treating the first line as numbers being subtracted and the second line as a string. I have also tried:
exiftool(-SerialNumber+='1')
exiftool(-SerialNumber+=1)
just to see if I can get it to increment with a basic single-digit number. This hasn't worked either.
Maybe it can't be incremented this way and I need to use ExifTool from the command line. If so, I am learning the command line/PowerShell (Windows) but am still weak in this area. I would appreciate some pointers to get started if this is the route I need to take. I am not afraid to use the command line; I would just need a bit more hand-holding than normal as a starting point. I am also learning Linux and could do this project there, but again, I would need a bit more hand-holding to get it done.
I program in PHP, JavaScript and other languages, so code is not foreign to me. I just lack experience writing it for the command line.
If further clarification is needed, please let me know in the comments.
Your help and guidance are appreciated!
You'll probably have to go to the command line rather than rely on drag and drop, as this command relies on ExifTool's advanced formatting feature.
Exiftool "-SerialNumber<001-001-0001-${filesequence;$_=sprintf('%04d', $_+1 )}" <FILE/DIR>
If you want something more general-purpose that uses the original serial number in the file, you could use
Exiftool "-SerialNumber<${SerialNumber}-${filesequence;$_=sprintf('%04d', $_+1 )}" <FILE/DIR>
This will just add the file count to the end of the current serial number in the image, though if you have images from multiple cameras in the same directory, that could get messy.
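To make the `${filesequence;$_=sprintf('%04d', $_+1 )}` part concrete, here is a minimal Python sketch of the numbering it produces. It assumes `filesequence` is 0 for the first file processed, which is why the command adds 1. It also assumes files are taken in sorted name order, which may not match the order ExifTool actually processes them in. `serial_for` and `build_commands` are hypothetical helper names. The sketch only builds `exiftool -SerialNumber=...` argument lists and doesn't run anything.

```python
from pathlib import Path

PREFIX = "811-010-5469"  # fixed part of the serial for this batch (example from the question)


def serial_for(index, prefix=PREFIX):
    """Serial for a 0-based file index, like sprintf('%04d', $_+1) in the ExifTool command."""
    return f"{prefix}-{index + 1:04d}"


def build_commands(paths, prefix=PREFIX):
    """Hypothetical helper: one exiftool argument list per file, in sorted name order."""
    ordered = sorted(paths, key=lambda p: Path(p).name)
    return [
        ["exiftool", f"-SerialNumber={serial_for(i, prefix)}", str(p)]
        for i, p in enumerate(ordered)
    ]


if __name__ == "__main__":
    for cmd in build_commands(["b.jpg", "a.jpg", "c.jpg"]):
        print(" ".join(cmd))
```

Each argument list could be handed to `subprocess.run`. However, that launches ExifTool once per file, which is much slower for several hundred images than the single command above.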
As for using the command line, you just need to rename the executable to remove the arguments in parentheses (for example, to exiftool.exe). Then either move it somewhere on the command line's path or use the full path to ExifTool.
As for your previous attempts: the += operator is used with numbers and with lists. The SerialNumber tag is usually a string, though that can depend on where it's being written to.
If I understand your question correctly, something like this should work:
1..100 | % {
$sn = '811-010-5469-{0:D4}' -f $_
# apply $sn
}
or like this (if you iterate over files):
$i = 1
Get-ChildItem 'C:\some\folder' -File | % {
$sn = '811-010-5469-{0:D4}' -f $i
# write $sn to the current file's EXIF SerialNumber (assumes exiftool.exe is on the PATH)
exiftool "-SerialNumber=$sn" $_.FullName
$i++
}
I study genetic data from 288 fish samples (Fish_one, Fish_two ...)
I have four files per fish, each with a different suffix.
e.g., for sample_name Fish_one:
file 1 = "Fish_one.1.fq.gz"
file 2 = "Fish_one.2.fq.gz"
file 3 = "Fish_one.rem.1.fq.gz"
file 4 = "Fish_one.rem.2.fq.gz"
I would like to apply the following concatenate instructions to all my samples, using maybe a text file containing a list of all the sample_name, that would be provided to a loop?
cp sample_name.1.fq.gz sample_name.fq.gz
cat sample_name.2.fq.gz >> sample_name.fq.gz
cat sample_name.rem.1.fq.gz >> sample_name.fq.gz
cat sample_name.rem.2.fq.gz >> sample_name.fq.gz
In the end, I would have only one file per sample, ideally in a different folder.
I would be very grateful to receive a bit of help on this one, even though I'm sure the answer is quite simple for a non-novice!
Many thanks,
Noé
I would like to apply the following concatenate instructions to all my
samples, using maybe a text file containing a list of all the
sample_name, that would be provided to a loop?
In the first place, the name of the cat command is mnemonic for "concatenate". It accepts multiple command-line arguments naming sources to concatenate, in order, to standard output, which is exactly what you want to do. It is poor form to use a cp and three cats where a single cat would do.
In the second place, although you certainly could use a file of name stems to drive the operation you describe, it's likely that you don't need to go to the trouble to create or maintain such a file. Globbing will probably do the job satisfactorily. As long as there aren't any name stems that need to be excluded, then, I'd probably go with something like this:
for f in *.rem.1.fq.gz; do
stem=${f%.rem.1.fq.gz}
cat "$stem".{1,2,rem.1,rem.2}.fq.gz > "${other_dir}/${stem}.fq.gz"
done
That recognizes the groups present in the current working directory by the members whose names end with .rem.1.fq.gz. It extracts the common name stem from that member's name, then concatenates the four members to the correspondingly-named output file in the directory identified by ${other_dir}. It relies on brace expansion to form the arguments to cat, so as to minimize code and (IMO) improve clarity.
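This step might ever need to live inside a larger Python pipeline. If so, the same stem-based grouping can be sketched with `pathlib` and `shutil.copyfileobj`. It's an illustration rather than a replacement for the loop above. Like the loop, it finds each group by its `.rem.1.fq.gz` member and appends the four files byte for byte in the same order. Concatenated gzip files form a valid multi-member gzip stream, so nothing needs to be decompressed. `merge_samples` and its directory arguments are hypothetical names.

```python
import shutil
from pathlib import Path

# Same order as the cat command: .1, .2, .rem.1, .rem.2
SUFFIXES = (".1.fq.gz", ".2.fq.gz", ".rem.1.fq.gz", ".rem.2.fq.gz")


def merge_samples(src_dir, out_dir):
    """Concatenate each sample's four gzip files into out_dir/<stem>.fq.gz."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Each group is recognized by its .rem.1.fq.gz member, as in the shell loop.
    for marker in sorted(src_dir.glob("*.rem.1.fq.gz")):
        stem = marker.name[: -len(".rem.1.fq.gz")]
        target = out_dir / f"{stem}.fq.gz"
        with open(target, "wb") as out:
            for suffix in SUFFIXES:
                with open(src_dir / f"{stem}{suffix}", "rb") as part:
                    shutil.copyfileobj(part, out)  # raw bytes; gzip members simply follow each other
        written.append(target)
    return written
```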
I want to separate data by week. The week is stated in a specific field on every line. I'd like to know how to use grep, cut, or anything else relevant on JUST that field while still keeping the rest of the data I'm given. I need to be able to pipe the information in via |, because that's how the rest of my program needs it.
As the output gets processed, it should look something like this:
asset.14548.extension 0
asset.40795.extension 0
asset.98745.extension 1
I want to sort those names by week number while keeping the asset name in my output, because the number of times each asset shows up gets counted. My problem is that I can't make my program take just the "1" from the week number while ignoring the "1" in the asset name.
UPDATE
The closest answer I found was
grep "^.........................$week" ;
That's good, but it relies on every line being the same length. Is there a way I can have it start from the right instead of the left? If so, that'd answer my question.
(^ anchors the match at the start of the line, and each . matches exactly one character of any kind, so the dots skip a fixed number of positions.)
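I found what I was looking for in the documentation: anchors! Just as ^ anchors a match to the start of the line, $ anchors it to the end:
grep " $week$" file
The space before $week makes sure only the whole last field can match, so week 1 won't also match a line ending in 11. If $week is 0, this outputs: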
asset.14548.extension 0
asset.40795.extension 0
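For comparison, here is the same end-of-line anchoring in a small Python filter that reads from a pipe. It's a sketch that assumes the week is always the last whitespace-separated field. The `\s` before the number makes sure week `1` doesn't match a line ending in `11`. `lines_for_week` and the script name `week_filter.py` are made up for illustration.

```python
import re
import sys


def lines_for_week(lines, week):
    """Yield the lines whose last whitespace-separated field is exactly `week`."""
    # whitespace + week + end of line: the week must be a whole field at the very end,
    # so digits inside names like "asset.14548.extension" are never considered.
    pattern = re.compile(r"\s" + re.escape(week) + r"$")
    for line in lines:
        if pattern.search(line.rstrip("\n")):
            yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: your_program | python week_filter.py 0
    sys.stdout.writelines(lines_for_week(sys.stdin, sys.argv[1]))
```

Without exactly one argument the script does nothing, so it's safe to import the function elsewhere.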
I couldn't find my exact question or a closely similar question with a simple answer, so hopefully it helps the next person scratching their head on this.
I am working on a Processing sketch that gets the video feed from my webcam/smartphone and shows it while running. I want to import a film's subtitle file (an .srt converted to .txt) into it. Each line of the text file begins with numbers that stand for the subtitle's start and end frames, followed by the actual text.
Here's an example:
{9232}{9331}Those swings are dangerous.|Stay off there. I haven't fixed them yet.
{9333}{9374}I think you're gonna live.
What I would like to do is figure out code that will:
use the numbers as start/end frames so each subtitle appears at the right time, as in the film
display the subtitles
use the '|' sign as a marker that triggers a line break
I guess that might already be quite complicated, but I wanted to check whether anyone has done anything similar before.
I guess what I want is to save myself from writing the whole
if ((current_frame > 9232) && (current_frame < 9331)) {
text("Those swings are dangerous.", 200, 500/2);
text("Stay off there. I haven't fixed them yet.", 200, (500/2 + 35));
}
thing for each subtitle...
I am quite new to Processing, so I'm not familiar with many commands apart from 'for' and 'if'. I'm a newbie at importing .txt files and an ignoramus when it comes to arrays. But I really want to find a nice way to do the last two parts.
Any help in any form will be greatly appreciated :)
Cheers,
George
For displaying the appropriate subtitle, you could do something like the following (explanation below, sorry in advance for the wall of text):
String[] subtitles = loadStrings("subtitles.txt");
int currentFrame = 0;
int subtitleIndex = -1;
int startFrame = -1, endFrame = -1;
int fontSize = 10; //change to suit your taste; call textSize(fontSize) in setup() so it also matches the line height
String[] currentSubtitle;
...
//draw loop start:
//video drawing code goes here
if(currentFrame > endFrame && subtitleIndex < subtitles.length - 1){ //update which subtitle is now/next, but don't run past the last one
subtitleIndex++;
startFrame = int(subtitles[subtitleIndex].split("\\}\\{")[0].substring(1));
endFrame = int(subtitles[subtitleIndex].split("\\}\\{")[1].split("\\}")[0]);
currentSubtitle = subtitles[subtitleIndex].split("\\}")[2].split("\\|");
}
if(currentFrame >= startFrame && currentFrame <= endFrame){
for(int i = 0; i < currentSubtitle.length; i++){
text(currentSubtitle[i], width/2, height - fontSize * (currentSubtitle.length - i));
}
}
currentFrame++;
//draw loop end
Probably that looks pretty intimidating to you, so here's some walk-through commentary. Your program will be a type of state machine. It will either be in the state of displaying a subtitle, or not. We'll keep this in mind later when we're designing the code. First, you need to declare and initialize your variables.
The first line uses the loadStrings() function, which reads through a text file and returns a String array where each element in the array is a line in the file. Of course, you'll need to change the filename to fit your file.
Your code uses a variable called current_frame, which is a very good idea. I've renamed it to currentFrame to fit the Java coding convention. We'll start at zero, and later on our code will increment it every time a frame is displayed. This variable tells us where we are in the subtitle sequence and which message should be displayed (if any).
Because the information for what frame each subtitle starts and ends on is encoded in a string, it's a bit tricky to incorporate it into the code. For now, let's just create some variables that represent when the "current" subtitle-- the subtitle that we're either currently displaying or will be displaying next-- starts and ends. We'll also create an index to keep track of which element in the subtitles array is the "current" subtitle. These variables all start at -1, which may seem a bit odd. Whereas we initialized currentFrame to 0, these don't really have a real "initial" value, at least not for now. If we chose 0, then that's not really true, because the first subtitle may not (probably doesn't) begin and end at frame 0, and any other positive number doesn't make much sense. -1 is often used as a dummy index that will be replaced before the variable actually gets used, so we'll do that here, too.
Now for the final variable: currentSubtitle. The immediate thought would be to have this be a plain String, not a String array. However, because each subtitle may need to be split on the pipe (|) symbols, each subtitle may actually represent several lines of text, so we'll create an array just to be safe. It's possible that some subtitles may be a single-element array, but that's fine.
Now for the hard part!
Presumably, your code will have some sort of loop in it, where on each iteration the pertinent video frame is drawn to the screen and (if the conditions are met), the subtitle is drawn over top of it. I've left out the video portion, as that's not part of your question.
Before we do anything else, we need to remember that some of our variables don't have real values yet: all those -1s from before need to be set to something. The basic logic of the drawing loop has two parts:

1) Figure out if a subtitle needs to be drawn, and if so, draw it.
2) Figure out if the "current" subtitle needs to be moved to the next one in the array.

Let's do #2 first, because on the first time through the loop, we don't know anything about the current subtitle yet! The criterion (in general) for moving to the next subtitle is whether we're past the end of the current one: currentFrame > endFrame. If that is true, then we need to shift all of our variables to the next subtitle. subtitleIndex is easy: we just add one. The rest are... not as easy. I know it looks disgusting, but I'll talk about that at the end so as not to break the flow. Skip ahead to the bottom if you just can't wait :)
After (if necessary) changing all of the variables so that they're relevant to the current subtitle, we need to do some actual displaying. The currentSubtitle variable can refer either to the subtitle that needs to be displayed RIGHT NOW or merely to the next one in the sequence. So we need to check which one it is for this frame. That's the second if statement: if we're past the start and before the end, then we should be displaying the subtitle!

Recall that our currentSubtitle variable is an array, so we can't just display it directly. We'll need to loop through it and display each element on a separate line. You mentioned the text() command, so I won't go too in depth here. The tricky bit is the y-coordinate of the text, since it's supposed to be on multiple lines. We want the first element to be above the second, which is above the third, etc. To do that, we'll have the y-coordinate depend on which element we're on, marked by i. We can scale the difference between lines by changing the value of fontSize; that'll just be up to your taste. If you also set the text size with textSize(fontSize), the number you choose will be roughly the height of a line in pixels.
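To make the state-machine idea concrete outside Processing, here is a Python sketch of the same two steps. It assumes the subtitle lines have already been turned into `(start_frame, end_frame, [lines])` tuples. `SubtitleTrack` and `lines_at` are hypothetical names. One deliberate difference: it advances with `while` instead of `if` and stops at the last cue. That way, skipping several frames, or running past the final subtitle, can't leave the index behind or push it off the end of the list.

```python
class SubtitleTrack:
    """Tracks the "current" cue the same way the draw loop does."""

    def __init__(self, cues):
        # cues: list of (start_frame, end_frame, [line, ...]) sorted by start_frame
        self.cues = cues
        self.index = -1            # dummy value, replaced on the first call
        self.start = self.end = -1
        self.lines = []

    def lines_at(self, frame):
        """Return the lines to draw at this frame (an empty list if none)."""
        # Step 2 first: move on while we're past the current cue's end.
        while frame > self.end and self.index < len(self.cues) - 1:
            self.index += 1
            self.start, self.end, self.lines = self.cues[self.index]
        # Step 1: display only if the frame lies inside the current cue.
        if self.start <= frame <= self.end:
            return self.lines
        return []
```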
Now for the messy bit that I didn't want to explain above. This code depends on String's split() method. You call it on the string you want to split, and it takes a regex describing where to split it. To get the startFrame out of a subtitle line in the file, we need to split it along the curly braces, because those are the dividers between the numbers. First, we'll split the string everywhere that "}{" occurs: right after the first number (and right before the second). Because split() returns an array, we can reference a single string from it using an index between square brackets. We know that the first number will be in the first string returned by splitting on "}{", so we'll use index 0. This will return (for example) "{1234", because split() removes the thing you're splitting on. Now we just need to take the substring that starts after the first character, convert it to an int using int(), and we're done!
For the second number, we can take a similar approach. Let's split on "}{" again, only this time we'll take the second (index 1) element of the returned array. Now we have something like "9331}Those swings are dang...", which we can split again on "}". We choose the first string of that array, convert it to an int, and we're done! In both cases, we're using subtitles[subtitleIndex] as the original String. It is the raw line of the file that we loaded using loadStrings() at the beginning. Note that during all this splitting, the original string in subtitles is never changed: split(), substring(), etc. only return new strings and don't modify the string you call them on.
I'll leave it to you to figure out how the last line in that sequence works :)
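An alternative to chaining split() calls is a single regular expression with capture groups for the two numbers and the text. Here's a Python sketch of that alternative, which is not what the Processing code does. It assumes every line has exactly the `{start}{end}text` shape from the question; `parse_subtitle` is a hypothetical name. Python's `str.split("|")` splits on the literal character, so no escaping is needed there, unlike Java's regex-based split().

```python
import re

# One subtitle line: {start}{end}text, with '|' separating the display lines.
LINE_RE = re.compile(r"^\{(\d+)\}\{(\d+)\}(.*)$")


def parse_subtitle(line):
    """Return (start_frame, end_frame, [display lines]) for one subtitle line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a subtitle line: {line!r}")
    start, end, text = match.groups()
    return int(start), int(end), text.split("|")
```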
Finally, you'll see that there are a bunch of backslashes cluttering up the split() calls. This is because split() takes a regex, not a plain string. Regexes use a lot of special notation that I won't get into here. If you just passed split() something like "}{", it would try to interpret the braces as regex syntax and would not behave as expected. You need to escape those characters, telling split() that you don't want them interpreted as special and just want the characters themselves. To do that, you put a backslash before any character that needs to be escaped. However, the backslash is also a special character inside a Java string literal, so it has to be escaped too! That's why you see stuff like "\\{". In the string literal, \\ produces a single backslash, and the regex engine then uses that backslash to escape the {. Note that the | character also needs to be escaped, since in a regex it means "or".
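The double escaping comes from writing regexes inside Java string literals. For comparison, here's a Python sketch where raw strings (`r"..."`) remove one level of backslashes. In this sketch, `re.escape()` builds the escaped pattern from literal text. The sample line is shortened from the question.

```python
import re

line = "{9232}{9331}Those swings are dangerous.|Stay off there."

# A raw string hands backslashes straight to the regex engine, so one backslash
# per special character is enough. Java needs "\\}\\{" because the string
# literal itself consumes one level of backslashes first.
parts = re.split(r"\}\{", line)

# re.escape() builds the escaped pattern from literal text.
escaped_braces = re.escape("}{")   # r"\}\{"
escaped_pipe = re.escape("|")      # r"\|"

# Unescaped, "|" would mean "empty pattern OR empty pattern" and match the
# empty string everywhere; escaped, it matches only the literal pipe.
pieces = re.split(escaped_pipe, parts[1])
```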
Sorry for the wall of text! It's nice to see questions asked intelligently and politely, so I thought I'd give a good answer in return.
I created a GUI and used uiimport to import a dataset into the MATLAB workspace. I would like to pass this imported data to another function in MATLAB. How do I pass the imported dataset into another function? I tried the code below, but the function doesn't pick up the data in the MATLAB workspace. Any ideas?
[file_input, pathname] = uigetfile( ...
{'*.txt', 'Text (*.txt)'; ...
'*.xls', 'Excel (*.xls)'; ...
'*.*', 'All Files (*.*)'}, ...
'Select files');
uiimport(file_input);
M = dlmread(file_input);
X = freed(M);
I think that you need to assign the result of this statement:
uiimport(file_input);
to a variable, like this
dataset = uiimport(file_input);
and then pass that to your next function:
M = dlmread(dataset);
This is a very basic feature of Matlab, which suggests to me that you would find it valuable to read some of the on-line help and some of the documentation for Matlab. When you've done that you'll probably find neater and quicker ways of doing this.
EDIT: Well, @Tim, if all else fails, RTFM. So I did, and my previous answer is incorrect. What you need to pass to dlmread is the name of the file to read. So you use either uiimport or dlmread to read the file, but not both. Which one you use depends on what you are trying to do and on the format of the input file. So go RTFM and I'll do the same. If you are still having trouble, update your question and provide details of the contents of the file.
In your script you have three ways to read the file. Choose one of them depending on your file format. But first, I would combine the file name with the path:
file_input = fullfile(pathname,file_input);
I wouldn't use UIIMPORT in a script, since the user can change how the data is read. Also, the variable names depend on the file name and on the user's choices.
With DLMREAD you can only read numerical data from the file. You can also skip some number of rows or columns with
M = dlmread(file_input,'\t',1,1);
skipping the first row and one column on the left.
Or you can define a range in Excel-style notation. See the DLMREAD documentation for more details.
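For readers comparing with other tools, here's a rough Python analogue of `dlmread(file_input,'\t',1,1)`, using the standard `csv` module. It's a sketch under simple assumptions: zero-based row and column offsets as in the example above, every remaining cell numeric and non-empty, and no Excel-style range support. `read_numeric` is a hypothetical name.

```python
import csv


def read_numeric(path, delimiter="\t", row_offset=0, col_offset=0):
    """Rough analogue of dlmread(path, delimiter, R, C) with zero-based offsets."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh, delimiter=delimiter))
    # Skip the first row_offset rows and col_offset columns, then convert to float.
    return [[float(cell) for cell in row[col_offset:]] for row in rows[row_offset:] if row]
```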
The filename you pass to DLMREAD must be a string. Don't pass a file handle or any data. You will get "Filename must be a string", if it's not a string. Easy.
FREAD reads data from a binary file. See the documentation if you really have to do it.
There are many other functions to read the data from file. If you still have problems, show us an example of your file format, so we can suggest the best way to read it.
I have to deal with very large plain text files (over 10 gigabytes; yes, I know it depends on what we call large) with very long lines.
My most recent task involves some line editing based on data from another file.
The data file (the one to be modified) contains 1,500,000 lines, each about 800 characters long. Each line is unique and contains exactly one identity number, and each identity number is unique.
The modifier file is about 1,800 lines long. Each line contains an identity number, plus an amount and a date that should be changed in the data file.
I transformed the modifier file into a sed script (with a Vim regex), but it's very inefficient.
Let's say I have a line like this in the data file:
(some 500 character)id_number(some 300 character)
And I need to modify data in the 300 char part.
Based on the modifier file, I come up with sed lines like this:
/id_number/ s/^\(.\{650\}\).\{20\}/\1CHANGED_AMOUNT_AND_DATA/
So I have 1800 lines like this.
But I know that even on a very fast server, if I run
sed -i.bak -f modifier.sed data.file
it's very slow, because it has to test every pattern against every line.
Isn't there a better way?
Note: I'm not a programmer and never learned about algorithms in school.
I can use awk, sed, and an outdated version of Perl on the server.
My suggested approaches (in order of desirability) would be to process this data as:
A database (even a simple SQLite-based DB with an index will perform much better than sed/awk on a 10GB file)
A flat file containing fixed record lengths
A flat file containing variable record lengths
Using a database takes care of all those little details that slow down text-file processing (finding the record you care about, modifying the data, storing it back to the DB). Take a look at DBD::SQLite if you're using Perl.
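As an illustration of the database route, here's a minimal sketch using Python's built-in `sqlite3` module; the Perl equivalent would go through DBD::SQLite. It assumes each record can be stored as one `(id, line)` row. It also assumes a modification means overwriting `len(new_text)` characters at a 0-based offset. The table, column and function names are hypothetical. A real run would use an on-disk database loaded once rather than `:memory:`. The `PRIMARY KEY` gives SQLite an index on `id`, so each `UPDATE` is a lookup instead of a full scan.

```python
import sqlite3


def apply_modifications(records, modifications):
    """records: iterable of (id, line); modifications: iterable of (id, offset, new_text).

    Loads the records into an in-memory SQLite table keyed by id, applies each
    modification as an indexed UPDATE, and returns the updated (id, line) rows.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE records (id TEXT PRIMARY KEY, line TEXT NOT NULL)")
    db.executemany("INSERT INTO records VALUES (?, ?)", records)
    # Overwrite len(new_text) characters starting at the 0-based offset.
    # SQLite's substr() is 1-based, hence the +1 on the start of the remainder.
    db.executemany(
        "UPDATE records SET line = substr(line, 1, ?) || ? || substr(line, ?) WHERE id = ?",
        [(off, text, off + len(text) + 1, rid) for rid, off, text in modifications],
    )
    rows = db.execute("SELECT id, line FROM records ORDER BY rowid").fetchall()
    db.close()
    return rows
```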
If you want to stick with flat files, you'll want to maintain an index manually alongside the big file so you can more easily look up the record numbers you'll need to manipulate. Or, better yet, perhaps your ID numbers are your record numbers?
If you have variable record lengths, I'd suggest converting to fixed record lengths (since it appears only your ID is variable length). If you can't do that, perhaps existing data will never move around in the file? Then you can maintain the index mentioned above and add new entries as necessary. The difference is that instead of pointing to a record number, the index now points to an absolute position in the file.
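The index idea can be sketched in Python like this. It assumes the ID sits at a fixed byte position in every line, as in the question's "500 characters, ID, 300 characters" layout. It also assumes a patch never changes the record's length. `build_offset_index` and `patch_in_place` are hypothetical names. The index maps each ID to the absolute byte offset of its line, so a change becomes one seek and one write.

```python
def build_offset_index(path, id_start, id_len):
    """Map each record's id to the byte offset where its line starts.

    Assumes the id occupies bytes [id_start, id_start + id_len) of every line.
    """
    index = {}
    offset = 0
    with open(path, "rb") as fh:
        for line in fh:
            index[line[id_start:id_start + id_len].decode()] = offset
            offset += len(line)
    return index


def patch_in_place(path, index, record_id, field_start, new_bytes):
    """Overwrite bytes inside one record without rewriting the rest of the file.

    Only safe when new_bytes is the same length as the bytes it replaces.
    """
    with open(path, "r+b") as fh:
        fh.seek(index[record_id] + field_start)
        fh.write(new_bytes)
```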
I suggest a program written in Perl (I am not a sed/awk guru and don't know exactly what they are capable of).
Your "algorithm" is simple. First, you need to construct a hashmap that gives you the new data string to apply for each ID. You build it by reading the modifier file, of course.
Once this hashmap is populated, you can go through each line of your data file, read the ID in the middle of the line, and generate the new line as you've described above.
I am not a Perl guru either, but I think the program is quite simple. If you need help writing it, just ask :-)
With Perl you should use substr to get id_number, especially if id_number has a constant width.
my $id_number = substr($str, 500, $id_number_length);
After that, if $id_number is one of the IDs you need to modify, use substr again to replace the remaining text.
substr($str, -300,300, $new_text);
Perl's regular expressions are very fast, but not in this case.
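Putting the hashmap and the two substr calls together, here's a Python sketch of the per-line step, for illustration only since the answers above are about Perl. It assumes the ID has a fixed position and width. It also assumes the text to replace is the last `tail_len` characters of the line, not counting the line ending. `rewrite_line` and its parameters are hypothetical.

```python
def rewrite_line(line, modifications, id_start, id_len, tail_len):
    """Replace the last tail_len characters of a line if its id has a modification.

    The lookup mirrors substr($str, id_start, id_len); the replacement mirrors
    substr($str, -tail_len, tail_len, $new_text).
    """
    body = line.rstrip("\n")
    ending = line[len(body):]              # keep the original line ending
    new_tail = modifications.get(body[id_start:id_start + id_len])
    if new_tail is None:
        return line                        # id not in the hashmap: unchanged
    return body[:len(body) - tail_len] + new_tail + ending
```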
My suggestion is: don't use a database. A well-written Perl script will outperform a database by an order of magnitude in this sort of task. Trust me, I have a lot of practical experience with it. Perl will be finished before you have even imported the data into a database.
1,500,000 lines of 800 characters comes to about 1.2 GB. With a very slow disk (30 MB/s), you will read it in about 40 seconds. At 50 MB/s it's 24 s, at 100 MB/s 12 s, and so on. A Perl hash lookup (like a DB join) on a 2 GHz CPU runs at more than 5 million lookups per second. That means your CPU-bound work will take seconds and your I/O-bound work tens of seconds. If it really is 10 GB, the numbers change, but the proportions stay the same.
You haven't specified whether the modification changes the line length (that is, whether it can be done in place). So I won't assume it can, and I'll write the script as a filter. You also haven't specified the format of your "modifier file" or what sort of modification is needed. Let's assume it is tab-separated, something like:
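The estimate works out as follows, assuming one byte per character and decimal units (1 GB = 10^9 bytes). The figure of 5 million lookups per second is the answer's own estimate, not a measurement.

```latex
\begin{aligned}
1\,500\,000\ \text{lines} \times 800\ \text{B/line} &= 1.2 \times 10^{9}\ \text{B} \approx 1.2\ \text{GB} \\
t_{\text{read}} &= \frac{1.2\ \text{GB}}{30\ \text{MB/s}} = 40\ \text{s}
  \quad\left(\tfrac{1.2\ \text{GB}}{50\ \text{MB/s}} = 24\ \text{s},\ \tfrac{1.2\ \text{GB}}{100\ \text{MB/s}} = 12\ \text{s}\right) \\
t_{\text{lookup}} &\le \frac{1.5 \times 10^{6}\ \text{lookups}}{5 \times 10^{6}\ \text{lookups/s}} = 0.3\ \text{s}
\end{aligned}
```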
<id><tab><position_after_id><tab><amount><tab><data>
We will read data from stdin and write to stdout and script can be something like this:
use strict;
use warnings;
my $modifier_filename = 'modifier_file.txt';
open my $mf, '<', $modifier_filename or die "Can't open '$modifier_filename': $!";
my %modifications;
while (<$mf>) {
chomp;
my ($id, $position, $amount, $data) = split /\t/;
$modifications{$id} = [$position, $amount, $data];
}
close $mf;
# build the matching regexp (quotemeta escapes characters that are special in regexps)
my $id_regexp = join '|', map quotemeta, keys %modifications;
$id_regexp = qr/($id_regexp)/; # compile regexp
while (<>) {
next unless m/$id_regexp/;
next unless $modifications{$1};
my ($position, $amount, $data) = @{$modifications{$1}};
substr $_, $+[1] + $position, $amount, $data;
}
continue { print }
On my laptop it takes about half a minute for 1.5 million rows, 1800 lookup IDs and 1.2 GB of data. For 10 GB it should take no more than 5 minutes. Is that reasonably quick for you?
If you find that you are not I/O bound (for example, if you use a NAS) but CPU bound, you can sacrifice some readability and change it to this:
my $mod;
while (<>) {
next unless m/$id_regexp/;
$mod = $modifications{$1};
next unless $mod;
substr $_, $+[1] + $mod->[0], $mod->[1], $mod->[2];
}
continue { print }
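For comparison, here is the same filter sketched in Python, assuming the tab-separated modifier format above. The names `load_modifications` and `patch_lines` and the script name `patch.py` are hypothetical. One deliberate change: the IDs go into the alternation longest first. Perl's alternation, like Python's, takes the first alternative that matches at a given position, so sorting this way keeps an ID that is a prefix of a longer one from winning the match.

```python
import re
import sys


def load_modifications(lines):
    """Parse tab-separated id, position_after_id, amount, data lines into a dict."""
    mods = {}
    for raw in lines:
        if not raw.strip():
            continue                       # ignore blank lines
        rec_id, position, amount, data = raw.rstrip("\n").split("\t")
        mods[rec_id] = (int(position), int(amount), data)
    return mods


def patch_lines(lines, mods):
    """Yield every line, patched when it contains one of the ids in mods."""
    if not mods:                           # nothing to change: pass lines through
        yield from lines
        return
    # Longest ids first, escaped like Perl's quotemeta.
    alternation = "|".join(map(re.escape, sorted(mods, key=len, reverse=True)))
    id_re = re.compile("(" + alternation + ")")
    for line in lines:
        match = id_re.search(line)
        if match:
            position, amount, data = mods[match.group(1)]
            start = match.end(1) + position          # like $+[1] + $position
            line = line[:start] + data + line[start + amount:]
        yield line                                   # like continue { print }


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python patch.py modifier_file.txt < data.file > data.new
    with open(sys.argv[1]) as fh:
        modifications = load_modifications(fh)
    sys.stdout.writelines(patch_lines(sys.stdin, modifications))
```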
You should almost certainly use a database, as MikeyB suggested.
If you don't want to use a database for some reason, then if the list of modifications will fit in memory (as it currently will at 1800 lines), the most efficient method is a hashtable populated with the modifications as suggested by yves Baumes.
If you get to the point where even the list of modifications becomes huge, you need to sort both files by their IDs and then perform a list merge -- basically:
1. Compare the ID at the "top" of the input file with the ID at the "top" of the modifications file
2. Adjust the record accordingly if they match
3. Write it out
4. Discard the "top" line from whichever file had the (alphabetically or numerically) lowest ID and read another line from that file
5. Goto 1.
Behind the scenes, a database will almost certainly use a list merge if you perform this alteration using a single SQL UPDATE command.
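The list merge can be sketched in Python as a generator over two already-sorted streams. It assumes unique IDs that compare the same way they were sorted (for example, zero-padded numbers in plain string order). `merge_apply` and `apply_change` are hypothetical names. Only one record and one modification are held in memory at a time, which is the point of the technique.

```python
def merge_apply(records, modifications, apply_change):
    """Merge two id-sorted streams, applying modifications to matching records.

    records: sorted (id, line) pairs; modifications: sorted (id, change) pairs.
    apply_change(line, change) returns the updated line.
    """
    mods = iter(modifications)
    current = next(mods, None)
    for rec_id, line in records:
        # Discard modifications whose id is lower than the current record's id.
        while current is not None and current[0] < rec_id:
            current = next(mods, None)
        # Adjust the record if the ids match, then write it out.
        if current is not None and current[0] == rec_id:
            line = apply_change(line, current[1])
        yield rec_id, line
```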
Good deal on the sqlloader or datadump decision. That's the way to go.