MSBUILD: ReadLinesFromFile doesn't read duplicated lines - visual-studio-2010

I'm using MSBuild to read all of the SQL post-deployment files my database project depends on, and to write their contents into one main script that gets loaded.
First I collect all the needed files:
<ReadLinesFromFile File="$(ScriptsList)">
  <Output TaskParameter="Lines" ItemName="IncludedFiles"/>
</ReadLinesFromFile>
And then I batch over them, reading all the files, line by line, into ListedData:
<ReadLinesFromFile File="$(ScriptDirectory)$([System.String]::Copy('%(IncludedFiles.Identity)'))"
                   Condition="$([System.String]::Copy('%(IncludedFiles.Identity)').Substring(0,2))==':r'">
  <Output TaskParameter="Lines" ItemName="ListedData"/>
</ReadLinesFromFile>
All the files are found without a problem, and the result is then written to output.sql.
But the file is missing several lines, which makes output.sql impossible for sqlcmd to parse.
SOURCE:
INSERT INTO [Characteristics] (
[CharacteristicID],
[CharName],
[RuleName],
[ActionRuleName],
[CriteriaSetID],
[ActionCriteriaSetID],
[ListCodeID],
[LocalID],
[BomCategory]
)
SELECT ...something,something... from Characteristics
INSERT INTO [CharacteristicDomain] (
[RuleSet],
[CharName],
[CharSlot],
[Description],
[Seq],
[ValueInteger],
[ValueFloat],
[ValueDate],
[ValueString]
)
SELECT ...something,something... from CharacteristicsDomain
As you can see, there are several lines consisting of a single ')' bracket, and the task reads only the first one and then ignores all the duplicates (because the output is an item group, not a list). So in effect I get a file looking like this:
OUTPUT:
INSERT INTO [Characteristics] (
[CharacteristicID],
[CharName],
[RuleName],
[ActionRuleName],
[CriteriaSetID],
[ActionCriteriaSetID],
[ListCodeID],
[LocalID],
[BomCategory]
)
SELECT ...something,something... from Characteristics
INSERT INTO [CharacteristicDomain] (
[RuleSet],
[CharName],
[CharSlot],
[Description],
[Seq],
[ValueInteger],
[ValueFloat],
[ValueDate],
[ValueString]
SELECT ...something,something... from CharacteristicsDomain
Does someone know a way to read lines from files using MSBuild without losing duplicate lines?
I thought maybe there is some way to use the Exec task? I certainly can't write my own tasks, and I'm also not allowed to modify the SQL files (I can't rely on users to format the files the way I need them). I need to read the files with MSBuild, because I modify some of them before I push them to sqlcmd.

How are you writing to output.sql? If you are batching on %(ListedData.Identity), then that will give you only the unique lines. Use it as @(ListedData) and it should be fine.
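For example, a write step along these lines keeps the duplicates (WriteLinesToFile is the standard task; the output path here is an assumption):
<WriteLinesToFile File="$(ScriptDirectory)output.sql"
                  Lines="@(ListedData)"
                  Overwrite="true" />
With Lines="%(ListedData.Identity)" instead, the task batches over the unique line values, so each duplicated line is written only once.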

Your second ReadLinesFromFile, the one that creates @(ListedData), is at fault. It is using task batching with %(IncludedFiles.Identity), so both lines with the ")" will be placed into a single batch.
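To see the deduplication in isolation, here is a minimal, hypothetical target (not from the original project) that prints an item list both ways:
<Target Name="BatchingDemo">
  <ItemGroup>
    <Line Include=")" />
    <Line Include=")" />
  </ItemGroup>
  <!-- %() batches over unique Identity values, so ")" is printed once -->
  <Message Text="batched: %(Line.Identity)" Importance="high" />
  <!-- @() expands the whole item list, so this prints ");)" -->
  <Message Text="list: @(Line)" Importance="high" />
</Target>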

Related

How to loop over multiple folders to concatenate FastQ files?

I have received multiple fastq.gz files from Illumina Sequencing for 100 samples. But all the fastq.gz files for the respective samples are in separate folders according to the sample ID. Moreover, I have multiple (8-16) R1.fastq.gz and R2.fastq.gz files for one sample. So, I used the following code for concatenating all the R1.fastq.gz and R2.fastq.gz into a single R1.fastq.gz and R2.fastq.gz.
cat V350043117_L04_some_digits-525_1.fq.gz V350043117_L04_some_digits-525_1.fq.gz V350043117_L04_some_digits-525_1.fq.gz > sample_R1.fq.gz
So the file names in the sequencing output are structured as in the code above: for each sample, the string starting with V has a different number, then L has a different number, and then there is another string of digits before the _1 and _2. The numbers keep changing from sample to sample.
My question is: how can I create a loop that goes over all the folders at once, takes the different numbering of the sequence files into account, and concatenates the multiple fq.gz files into a single R1 and a single R2 file per sample?
Surely, I cannot just concatenate them one by one by going into each sample folder.
Please give some helpful tips. Thank you.
The folder structure is the following:
/data/Sample_1/....._525_1_fq.gz /....._525_2_fq.gz /....._526_1_fq.gz /....._526_2_fq.gz
/data/Sample_2/....._580_1_fq.gz /....._580_2_fq.gz /....._589_1_fq.gz /....._589_2_fq.gz
/data/Sample_3/....._690_1_fq.gz /....._690_2_fq.gz /....._645_1_fq.gz /....._645_2_fq.gz
Below I have attached a screenshot of the folder structure.
[screenshot: folder structure]
Based on the provided file structure, would you please try:
#!/bin/bash
for d in Raw2/C*/; do
  (
    cd "$d" || exit          # skip this sample if the directory can't be entered
    id=${d%/}; id=${id##*/}  # extract the sample ID from the directory name
    cat V*_1.fq.gz > "${id}_R1.fq.gz"
    cat V*_2.fq.gz > "${id}_R2.fq.gz"
  )
done
The syntax for d in Raw2/C*/ loops over the subdirectories starting with C.
The parentheses make the inner commands execute in a subshell, so we don't have to care about returning from cd "$d" (at the expense of a little extra execution time).
The variable id is assigned the ID extracted from the directory name.
cat V*_1.fq.gz, for example, will be expanded to V350028825_L04_581_1.fq.gz V350028825_L04_582_1.fq.gz V350028825_L04_583_1.fq.gz ... according to the files in the directory, and these are concatenated into ${id}_R1.fq.gz. Same for ${id}_R2.fq.gz.
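If you want to verify the grouping before writing anything, a dry-run variant of the same loop (my addition, not part of the answer above) just prints what each cat would receive:
#!/bin/bash
# dry run: show which files would be concatenated into which output, without touching anything
for d in Raw2/C*/; do
  id=${d%/}; id=${id##*/}
  echo "${id}_R1.fq.gz <=" "$d"V*_1.fq.gz
  echo "${id}_R2.fq.gz <=" "$d"V*_2.fq.gz
done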

Informatica Post command task

I am working with multiple source files and a single source instance. I created three flat files and one destination table to experiment with multiple sources. I am using the ‘file list’ concept; for that I created a text file which contains all the flat file names.
Example:
Filename: File_list.txt
File content:
Price1.txt
Price2.txt
Price3.txt
In the above example Price1.txt, Price2.txt and Price3.txt are flat file names. I specified File_list.txt as the source file while running the workflow in Informatica, so it iterates through all the flat files listed in File_list.txt and inserts all the values into the destination table.
Now what I want to do is: once the data is inserted into the destination, I need to delete those source files from their directory location.
How can I achieve this?
You'll need to write a custom script that uses File_list.txt as input and performs the delete operations. You can then call it using the Post-Session Success Command session component, or as a separate Command Task in the workflow, linked using a $YourSessionName.Status = SUCCEEDED condition.
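A minimal sketch of such a cleanup script, assuming a Unix server; the directory path is a placeholder for your session's actual source file directory:
#!/bin/bash
# delete every source file named in File_list.txt; SRC_DIR is a placeholder
SRC_DIR=/path/to/SrcFiles
while IFS= read -r f; do
  rm -f -- "$SRC_DIR/$f"
done < "$SRC_DIR/File_list.txt"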

code reuse for file existence test in a Visual Studio vcxproj file

I have a vcxproj file that contains explicit Windows shell commands in the NMakeBuildCommandLine section:
<NMakeBuildCommandLine Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
  move file1 file2
</NMakeBuildCommandLine>
I'm using MSBuild to execute the vcxproj, either directly or via a sln file. The problem is that when file1 does not exist, the output is very unhelpful and doesn't even list the file's name:
The system cannot find the file specified.
My naive solution is to replace move file1 file2 with:
if exist file1 (move file1 file2) else (echo file1 does not exist && exit 1)
(Note that, because the command line lives inside the project XML, I actually need to write &amp;&amp; instead of &&.)
This works, but it's error-prone because I need to type file1 three times per line and make sure they all match. file1 is only one of many files that need to be moved. Furthermore, the series of commands is virtually identical across the various build configurations.
How can I avoid repeating myself both within a command line and across build configurations? I thought that the UserMacros property group might help me, but I couldn't figure out how to write such a macro via the IDE. (Right-clicking on the project in Visual Studio doesn't show a field for entering user macros.) Nor could I find any discussion of the syntax of this section on the Internet, so I didn't know how to write macros with a text editor (which I would actually prefer).
There may be yet a better way within a vcxproj file to identify files that need to exist, so I'm open to any alternatives.
MSBuild has 'variables' like most other programming languages; they are called properties. You declare one in a PropertyGroup element in the project file XML and then reuse it with the $(PropertyName) syntax. An example for your case:
<PropertyGroup>
  <Src>/path/to/my/src</Src>
  <Dst>/path/to/my/dst</Dst>
</PropertyGroup>
<NMakeBuildCommandLine>
  if exist $(Src) (move $(Src) $(Dst)) else (echo $(Src) does not exist &amp;&amp; exit 1)
</NMakeBuildCommandLine>
If you want to use the IDE, which might get tedious if you have lots of values, you can indeed use so-called UserMacros, but you have to declare those in a property sheet. Go to View->Property Manager, right-click on your project and select 'Add New Property Sheet'. Double-click on it, go to 'User Macros' and add key/value pairs there. If you save everything and look at the generated files, you'll see the vcxproj now imports the property sheet, and the property sheet itself has a PropertyGroup just like the one shown above, but editable through the IDE.
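For reference, a property sheet generated that way looks roughly like this (a sketch; the macro names match the example above):
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup Label="UserMacros">
    <Src>/path/to/my/src</Src>
    <Dst>/path/to/my/dst</Dst>
  </PropertyGroup>
  <ItemGroup>
    <!-- BuildMacro items make the macros visible in the IDE's property pages -->
    <BuildMacro Include="Src">
      <Value>$(Src)</Value>
    </BuildMacro>
    <BuildMacro Include="Dst">
      <Value>$(Dst)</Value>
    </BuildMacro>
  </ItemGroup>
</Project>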
As an alternative, which might be better (less duplication, easier to automate) in the long run, you can use MSBuild code for checking file existence and moving files. This has the benefit that you only have to write the move command once, since you can have MSBuild loop over items; those are declared in an ItemGroup. Explaining everything here is a bit out of scope, but an example should make things clear:
<Target Name="BatchMove">
<ItemGroup>
<SrcFiles Include="file1">
<Dest>file2</Dest>
</SrcFiles>
<SrcFiles Include="file3">
<Dest>file4</Dest>
</SrcFiles>
</ItemGroup>
<Warning Text="Source file %(SrcFiles.Identity) does not exist" Condition="!Exists(%(SrcFiles.Identity))" />
<Move SourceFiles="%(SrcFiles.Identity)" DestinationFiles="%(SrcFiles.Dest)" Condition="Exists(%(SrcFiles.Identity))" />
</Target>
This declares two source files, file1/file3, and their respective destination files, file2/file4. If a source does not exist (using the standard MSBuild Exists check) a warning is shown; otherwise it is moved to its destination. The % characters make the line they occur in loop over each element of the SrcFiles collection. To add more files, just add to the ItemGroup. The last step is to get this target invoked from the nmake command line, which is done simply by calling msbuild on the file itself and telling it to run the target:
<NMakeBuildCommandLine>
  msbuild $(MSBuildThisFile) /t:BatchMove
</NMakeBuildCommandLine>

Wanted to use results of find command in custom script that I am building

I want to validate my XMLs for well-formedness, but some of my files do not have a single root, which is fine per the business requirements (e.g. <ri>...</ri><ri>..</ri> is valid XML in my context). xmlwf can do this check, but it flags a file if it doesn't have a single root. So I want to build a custom script which internally uses xmlwf. My custom script should do the following:
iterate through the list of files passed as input (e.g. sample.xml or s*.xml or *.xml)
for each file, prepare a temporary file as <A> + contents of the file + </A>
and call xmlwf on that temp file.
Can someone help with this?
You could add text to the beginning and end of the file using cat and bash, so that your file has a root added to it for validation purposes.
cat <(echo '<root>') sample.xml <(echo '</root>') | xmlwf
This way you don't need to write temporary files out.
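A rough sketch of the requested wrapper built on that idea (note: recent xmlwf releases exit non-zero for not-well-formed input; with older releases you may have to inspect xmlwf's output instead):
#!/bin/bash
# wrap each argument in a synthetic <A> root and validate it with xmlwf
for f in "$@"; do
  printf '%s: ' "$f"
  if cat <(echo '<A>') "$f" <(echo '</A>') | xmlwf; then
    echo "well-formed"
  else
    echo "NOT well-formed"
  fi
done
Invoke it with the file list from the question, e.g. ./check.sh s*.xml.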

Generating TDB Dataset from archive containing N-TRIPLES files

Apologies, in advance, for a possible duplicate.
I have an archive containing 117,426 files (each in the N-TRIPLES format) that I wish to load into the default graph of a TDB dataset. Due to the large number of files, I need to be able to perform this import without manually selecting individual files for upload.
I am in Bash, with Jena and Fuseki distributions at my disposal.
If possible, I want to avoid the worst-case scenario of just writing a Java application to do this. If I do have to write a Java application, what hooks exist in RIOT/TDB for programmatic bulk loading?
As a general comment, one way is to concatenate the N-Triples files into one single file.
You can load many files at once with either tdbloader or tdbloader2.
tdbloader --loc DB ... your files ...
The 117,426 files may strain your OS in a single command-line invocation. You can pipe the files into tdbloader instead (it's just like concatenating the files first):
... | tdbloader --loc DB -- -
where ... is some way to get bash to cat the files (possibly from a subshell),
e.g. (you'll need to adjust this to find all 117,426 files):
( for x in data*.nt
  do
    cat "$x"
  done
) | tdbloader --loc DB -- -
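If a glob over 117,426 files overflows the shell's argument-list limit, find piped through xargs avoids building one huge command line (the directory name and pattern here are assumptions):
find data -name '*.nt' -print0 | xargs -0 cat | tdbloader --loc DB -- -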
