How to read the details of particular file using c# - text-files

How do I read the details of particular sections using C#?
I am new to C# and I have to read the details from a text file based on the sections marked with square brackets "[]". The file looks like this:
[Header]
This is the header info for the file
[Body]
This is the body information for the provided file
and it contains many information for the file
[Summary]
Summary for the file.
I need to read the details of each of these sections (e.g. [Header], [Body]).
Any help in this direction is highly appreciated.

Assuming the text between those headers does not contain brackets, you can do it this way:
using System;
using System.Collections.Generic;

Dictionary<String, String> content = new Dictionary<String, String>();
// @"..." is a verbatim string literal, so the line breaks are kept
String text = @"[Header]
This is the header info for the file
[Body]This is the body information for the provided file and it contains many information for the file
[Summary]Summary for the file.";
foreach (String section in text.Split(new[] { '[' }, StringSplitOptions.RemoveEmptyEntries))
{
    // Split only at the first "]": part 0 is the section name, part 1 is its text
    String[] sectionParts = section.Split(new[] { ']' }, 2);
    content.Add(sectionParts[0], sectionParts[1].Trim());
}
The Dictionary will then contain the content of your file as header/text pairs.
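Splitting on "[" breaks as soon as the body text itself contains a bracket. If, as in the question's file, every header sits on its own line, a line-based parse avoids that. Below is a minimal sketch of that idea, written in Java rather than C# (the class name SectionParser is hypothetical): only a line that is entirely "[Name]" starts a new section, and every other line is appended to the current section.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SectionParser {
    // Parses "[Name]" header lines; every following line up to the next header
    // is appended to that section's text. Lines before the first header are ignored.
    static Map<String, String> parse(String text) {
        Map<String, String> sections = new LinkedHashMap<>(); // keeps file order
        String current = null;
        StringBuilder body = new StringBuilder();
        for (String line : text.split("\\R")) { // \R matches any line break
            String trimmed = line.trim();
            if (trimmed.startsWith("[") && trimmed.endsWith("]") && trimmed.length() > 2) {
                if (current != null) {
                    sections.put(current, body.toString().trim());
                }
                current = trimmed.substring(1, trimmed.length() - 1);
                body.setLength(0);
            } else if (current != null) {
                body.append(line).append('\n');
            }
        }
        if (current != null) {
            sections.put(current, body.toString().trim());
        }
        return sections;
    }
}
```

Unlike the split above, this assumes headers stand alone on their line, so a line such as "[Body]This is the body..." from the answer's sample would not be recognized as a header. A repeated section name overwrites the earlier one. In C#, the same loop can run over File.ReadAllLines(path) and fill the Dictionary<String,String>.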

Related

citation style language - extend with an additional field

I produce a bibliography with pandoc from a bibtex file. In my bibtex entries I have the location of the pdf (not a URL, just a file reference in a field named file). I would like to include this reference in the bibliography, but I do not see how to extend chicago-author-date.csl. I am completely new to CSL...
I assume I have to add something like
<text macro="file" prefix=". "/>
in the layout section. But how to define the macro? How is the connection between the bibtex field and the CSL achieved?
Is there a "how to" page somewhere?
Thank you for help!
An example bibtex entry is:
author = {Frank, Andrew U.},
title = {Geo-Ontologies Are Scale Dependent (abstract only)},
booktitle = {European Geosciences Union, General Assembly 2009, Session Knowledge and Ontologies},
year = {2009},
editor = {Pulkkinen, Tuija},
url = {http://publik.tuwien.ac.at/files/PubDat-175453.pdf},
file = {docs/docs4/4698_GeoOntologies_abstarct_EUG_09.pdf},
keywords = {Onto},
owner = {frank},
timestamp = {2018.11.29},
}
The file entry should be inserted in the output as a relative web reference (clickable), in addition to the usual output from the chicago-author-date style.
I add a nocite list to the Markdown text (read in from a file) and process it (in Haskell) with the API:
res <- processCites' markdownText
It works OK; only the file value is missing.

Google Doc Api : download inline object

I'm creating a Google Doc to HTML converter. I want to use the Docs API rather than exporting the document as HTML with the Drive API:
$service = new Google_Service_Docs($client);
$request = $service->documents->get($docId);
$elements = $request->getBody()->getContent();
$elements is an array of Google_Service_Docs_StructuralElement
Looping through paragraph > elements, if there is an inline object, the inlineObjectElement property is set with a Google_Service_Docs_InlineObjectElement
The question is: how do I get the content of a Google_Service_Docs_InlineObjectElement to save it as a file?
All we have in this object is an inlineObjectId...
I was able to find a solution for this in a blog post.
Basically, all inline objects are located at:
$inlineObjects = $request->getInlineObjects();
By the way, I recommend renaming "$request" to "$document", since documents->get() returns the document itself.
Now, with the inlineObjectId you can look up the particular object you want. Its inlineObjectProperties.embeddedObject.imageProperties.contentUri is a URI from which the image content can be downloaded (the API documents this URI as short-lived, so fetch it right away).
$inlineObjects is an associative array whose keys are the inlineObjectIds.

Parsing several csv files using Spring Batch

I need to parse several CSV files from a given folder. As each CSV has different columns, there is a separate table in the DB for each CSV. I need to know:
Does Spring Batch provide any mechanism that scans through the given folder, so that I can pass those files one by one to the reader?
As I am trying to make the reader/writer generic, is it possible to get just the column header of each CSV, so that I can build the tokenizer and the insert query from it?
Code sample
public ItemReader<Gdp> reader1() {
FlatFileItemReader<Gdp> reader1 = new FlatFileItemReader<Gdp>();
reader1.setResource(new ClassPathResource("datagdp.csv"));
reader1.setLinesToSkip(1);
reader1.setLineMapper(new DefaultLineMapper<Gdp>() {
{
setLineTokenizer(new DelimitedLineTokenizer() {
{
setNames(new String[] { "region", "gdpExpend", "value" });
}
});
setFieldSetMapper(new BeanWrapperFieldSetMapper<Gdp>() {
{
setTargetType(Gdp.class);
}
});
}
});
return reader1;
}
Use a MultiResourceItemReader to scan all files.
I think you need a kind of classifier-aware ItemReader as the MultiResourceItemReader delegate, but Spring Batch (SB) doesn't offer one, so you have to write your own.
For ItemProcessor and ItemWriter, SB offers classifier-aware implementations (ClassifierCompositeItemProcessor and ClassifierCompositeItemWriter).
Obviously, the more different input files you have, the more XML configuration you must write, but it should be straightforward to do.
I suppose you are expecting this kind of implementation.
In the partitioner (while building the partition step), read all the file names, the file headers and the insert query for each writer, and save them in the ExecutionContext.
In the slave step, pass the ExecutionContext on to every reader and writer, and take from it the file to read, the file header for the tokenizer, and the insert query for that writer.
This resolves your question.
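The partitioning approach above needs three things per file: its path, its header columns and an insert query. Below is a plain-JDK sketch of that preparation step, whose results a partitioner could copy into each partition's ExecutionContext. The class names and the table-naming convention (file name without ".csv") are hypothetical. The header split is naive (no quoted commas), and table/column names are taken straight from the files, so only use it with trusted input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CsvFolderScanner {

    // What one partition would need: the file, its header columns and an insert query.
    static final class CsvJob {
        final Path file;
        final List<String> columns;
        final String insertSql;

        CsvJob(Path file, List<String> columns, String insertSql) {
            this.file = file;
            this.columns = columns;
            this.insertSql = insertSql;
        }
    }

    // Lists the *.csv files in a folder (sorted for a stable order) and reads each header line.
    static List<CsvJob> scan(Path folder) {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder, "*.csv")) {
            for (Path p : stream) {
                files.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        files.sort(null); // natural Path ordering

        List<CsvJob> jobs = new ArrayList<>();
        for (Path file : files) {
            String header;
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                header = reader.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (header == null) {
                continue; // empty file: no header, nothing to import
            }
            // Naive split: assumes header names contain no quoted commas
            List<String> columns = Arrays.stream(header.split(","))
                    .map(String::trim)
                    .collect(Collectors.toList());
            jobs.add(new CsvJob(file, columns, insertSql(tableName(file), columns)));
        }
        return jobs;
    }

    // Hypothetical convention: the table is named after the file, minus ".csv".
    static String tableName(Path file) {
        String name = file.getFileName().toString();
        return name.substring(0, name.length() - ".csv".length());
    }

    // Builds e.g. "INSERT INTO gdp (region, value) VALUES (:region, :value)".
    static String insertSql(String table, List<String> columns) {
        String params = columns.stream()
                .map(c -> ":" + c)
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES (" + params + ")";
    }
}
```

The columns list is what you would hand to DelimitedLineTokenizer.setNames(...) (keeping setLinesToSkip(1) so the header row is not read as data), and the :name placeholders follow the named-parameter style that a JdbcBatchItemWriter can use when it is configured with an item SQL parameter source provider.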
Answers for your questions:
I don't know of a specific mechanism in Spring Batch to scan files.
You can use opencsv as a generic CSV reader; it offers several ways of reading files.
About OpenCSV:
If you are using a Maven project, add this dependency:
<dependency>
<groupId>net.sf.opencsv</groupId>
<artifactId>opencsv</artifactId>
<version>2.0</version>
</dependency>
You can read your files into objects for specific formats, or with generic headers, like this:
// CSVReader comes from au.com.bytecode.opencsv in opencsv 2.x (com.opencsv from 3.0 on);
// 'people' is the path of the CSV file and PeopleData is your own bean class
private static List<PeopleData> extrairDadosPeople() throws IOException {
    CSVReader readerPeople = new CSVReader(new FileReader(people));
    List<PeopleData> listPeople = new ArrayList<PeopleData>();
    try {
        String[] nextLine;
        while ((nextLine = readerPeople.readNext()) != null) {
            // Map each row onto a bean: column 0 and column 1
            PeopleData person = new PeopleData();
            person.setIncludeData(nextLine[0]);
            person.setPartnerCode(Long.valueOf(nextLine[1]));
            listPeople.add(person);
        }
    } finally {
        readerPeople.close(); // close the file even if a row fails to parse
    }
    return listPeople;
}
There are a lot of other ways to read CSV files using opencsv:
If you want to use an Iterator style pattern, you might do something like this:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
// nextLine[] is an array of values from the line
System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
Or, if you might just want to slurp the whole lot into a List, just call readAll()...
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
List myEntries = reader.readAll();
which will give you a List of String[] that you can iterate over. If all else fails, check the opencsv Javadocs.
If you want to customize quote characters and separators, you'll find constructors that cater for supplying your own separator and quote characters. Say you're using a tab for your separator, you can do something like this:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), '\t');
And if you single quoted your escaped characters rather than double quote them, you can use the three arg constructor:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), '\t', '\'');
You may also skip the first few lines of the file if you know that the content doesn't start till later in the file. So, for example, you can skip the first two lines by doing:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), '\t', '\'', 2);
Can I write csv files with opencsv?
Yes. There is a CSVWriter in the same package that follows the same semantics as the CSVReader. For example, to write a tab separated file:
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"), '\t');
// feed in your array (or convert your data to an array)
String[] entries = "first#second#third".split("#");
writer.writeNext(entries);
writer.close();
If you'd prefer to use your own quote characters, you may use the three arg version of the constructor, which takes a quote character (or feel free to pass in CSVWriter.NO_QUOTE_CHARACTER).
You can also customise the line terminators used in the generated file (which is handy when you're exporting from your Linux web application to Windows clients). There is a constructor argument for this purpose.
Can I dump out SQL tables to CSV?
Yes you can. There is a feature on CSVWriter so you can pass writeAll() a ResultSet.
java.sql.ResultSet myResultSet = ....
writer.writeAll(myResultSet, includeHeaders);
Is there a way to bind my CSV file to a list of Javabeans?
Yes there is. There is a set of classes to allow you to bind a CSV file to a list of JavaBeans based on column name, column position, or a custom mapping strategy. You can find the new classes in the com.opencsv.bean package. Here's how you can map to a java bean based on the field positions in your CSV file:
ColumnPositionMappingStrategy strat = new ColumnPositionMappingStrategy();
strat.setType(YourOrderBean.class);
String[] columns = new String[] {"name", "orderNumber", "id"}; // the fields to bind to in your JavaBean
strat.setColumnMapping(columns);
CsvToBean csv = new CsvToBean();
List list = csv.parse(strat, yourReader);

Read image IPTC data

I'm having some trouble reading the IPTC data of some images. I want to do this because my client already has all the keywords in the IPTC data and doesn't want to re-enter them on the site.
So I created this simple script to read them out:
$size = getimagesize($image, $info);
if(isset($info['APP13'])) {
$iptc = iptcparse($info['APP13']);
print '<pre>';
var_dump($iptc['2#025']);
print '</pre>';
}
This works perfectly in most cases, but it's having trouble with some images.
Notice: Undefined index: 2#025
Yet I can clearly see the keywords in Photoshop.
Are there any decent small libraries that could read the keywords in every image? Or am I doing something wrong here?
I've seen a lot of weird IPTC problems. It could be that you have 2 APP13 segments. I noticed that, for some reason, some JPEGs have multiple IPTC blocks. Possibly this comes from using several photo-editing programs or from some manual file manipulation.
It could be that PHP is trying to read the empty APP13, or even embedded "thumbnail metadata".
It could also be a problem with segment length: APP13 and 8BIM have length marker bytes that might have wrong values.
Try a hex editor and check the file manually.
I have found that IPTC is almost always embedded as XML using the XMP format, and is often not in the APP13 slot. You can sometimes get the IPTC info by using iptcparse($info['APP1']), but the most reliable way to get it without a third-party library is to simply search the image file for the relevant XML string (I got this from another answer, but I haven't been able to find it, otherwise I would link!):
The XML for the keywords has the form "<dc:subject>...<rdf:Seq><rdf:li>Keyword 1</rdf:li><rdf:li>Keyword 2</rdf:li>...<rdf:li>Keyword N</rdf:li></rdf:Seq>...</dc:subject>" (the XMP specification defines dc:subject as an unordered array, so many files use <rdf:Bag> instead of <rdf:Seq>).
So you can just get the file as a string using file_get_contents(get_attached_file($attachment_id)), use strpos() to find each opening (<rdf:li>) and closing (</rdf:li>) XML tag, and grab the keyword between them using substr().
The following snippet works for all JPEGs I have tested it on. It will fill the array $keys with IPTC tags taken from an image in WordPress with id $attachment_id:
$content = file_get_contents(get_attached_file($attachment_id));
// Initialize empty array to store keywords
$keys = array();
// Look for XMP data: xml tag "dc:subject" is where keywords are stored.
// strpos() returns false when a tag is missing, so test with !== false
// BEFORE adding the tag length (false + 12 would silently become 12).
$subject_pos = strpos($content, '<dc:subject>');
if ($subject_pos !== false) {
    $xmp_data_start = $subject_pos + strlen('<dc:subject>');
    $xmp_data_end = strpos($content, '</dc:subject>', $xmp_data_start);
    // Only proceed if the closing dc:subject tag exists as well
    if ($xmp_data_end !== false) {
        // The keyword list inside dc:subject is wrapped in <rdf:Seq> or <rdf:Bag>;
        // scanning for <rdf:li> directly works for either wrapper
        $key_data = substr($content, $xmp_data_start, $xmp_data_end - $xmp_data_start);
        // $ctr tracks the position of each <rdf:li> tag, starting with the first
        // (use !== false: a match at position 0 would otherwise end the loop)
        $ctr = strpos($key_data, '<rdf:li>');
        // Store each keyword, then search for the next <rdf:li> tag
        while ($ctr !== false) {
            // Skip past the tag to get the keyword itself
            $key_begin = $ctr + strlen('<rdf:li>');
            // Keyword ends where the closing tag begins
            $key_end = strpos($key_data, '</rdf:li>', $key_begin);
            // Stop if the keyword has no closing tag
            if ($key_end === false) break;
            // Cap the keyword at 100 characters (not sure what WP can handle)
            $keys[] = substr($key_data, $key_begin, min($key_end - $key_begin, 100));
            // Find the next keyword open tag
            $ctr = strpos($key_data, '<rdf:li>', $key_end);
        }
    }
}
I have this implemented in a plugin that puts IPTC keywords into WP's "Description" field.
ExifTool is very robust, if you can shell out to it (from PHP, it looks like?).
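The same scan can be written with regular expressions instead of manual strpos() bookkeeping. Below is a minimal Java sketch, assuming the image file has already been read into a String (for example with new String(Files.readAllBytes(path), StandardCharsets.UTF_8); the binary parts decode to junk, but the UTF-8 XMP packet survives). The class name XmpKeywords is hypothetical. It only looks at the first dc:subject element, ignores whether the list is wrapped in rdf:Seq or rdf:Bag, and does not decode XML entities such as &amp;.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmpKeywords {
    // (?s) lets "." match newlines, since XMP packets are usually pretty-printed
    private static final Pattern SUBJECT =
            Pattern.compile("(?s)<dc:subject>(.*?)</dc:subject>");
    // Reluctant (.*?) stops at the first closing tag; [^>]* tolerates attributes on <rdf:li>
    private static final Pattern ITEM =
            Pattern.compile("(?s)<rdf:li[^>]*>(.*?)</rdf:li>");

    // Returns the keywords found inside the first dc:subject element, or an empty list.
    static List<String> extract(String fileContent) {
        List<String> keys = new ArrayList<>();
        Matcher subject = SUBJECT.matcher(fileContent);
        if (!subject.find()) {
            return keys;
        }
        Matcher item = ITEM.matcher(subject.group(1));
        while (item.find()) {
            keys.add(item.group(1).trim());
        }
        return keys;
    }
}
```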

Iterate over Umbraco getAllTagsInGroup result

I'm trying to get a list of tags from a particular tag group in Umbraco (v4.0.2.1) using the following code:
var tags = umbraco.editorControls.tags.library.getAllTagsInGroup("document downloads");
What I want to do is just output a list of those tags. However, if I output the variable 'tags' it just outputs a list of all tags in a string. I want to split each tag onto a new line.
When I check the datatype of the 'tags' variable:
string tagType = tags.GetType().ToString();
...it outputs MS.Internal.Xml.XPath.XPathSelectionIterator.
So question is, how do I get the individual tags out of the 'tags' variable? How do I work with a variable of this data type? I can find examples of how to do it by loading an actual XML file, but I don't have an actual XML file - just the 'tags' variable to work with.
Thanks very much for any help!
EDIT1: I guess what I'm asking is, how do I loop through the nodes returned by an XPathSelectionIterator data type?
EDIT2: I've found this code, which almost does what I need:
XPathDocument document = new XPathDocument("file.xml");
XPathNavigator navigator = document.CreateNavigator();
XPathNodeIterator nodes = navigator.Select("/tags/tag");
nodes.MoveNext();
XPathNavigator nodesNavigator = nodes.Current;
XPathNodeIterator nodesText = nodesNavigator.SelectDescendants(XPathNodeType.Text, false);
while (nodesText.MoveNext())
debugString += nodesText.Current.Value.ToString();
...but it expects the URL of an actual XML file to load into the first line. My XML file is essentially the 'tags' variable, not an actual XML file. So when I replace:
XPathDocument document = new XPathDocument("file.xml");
...with:
XPathDocument document = new XPathDocument(tags);
...it just errors.
Since it is an Iterator, I would suggest you iterate it. ;-)
var tags = umbraco.editorControls.tags.library.getAllTagsInGroup("document downloads");
foreach (XPathNavigator tag in tags) {
// handle current tag
}
I think this does the trick a little better.
The problem is that getAllTagsInGroup returns the container for all tags, you need to get its children.
foreach( var tag in umbraco.editorControls.tags.library.getAllTagsInGroup("category").Current.Select("/tags/tag") )
{
/// Your Code
}
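EDIT2 above only fails because XPathDocument is handed a file name; the general technique is to parse XML you already hold in memory and loop over the selected nodes. For comparison, here is that idea sketched with Java's standard javax.xml.xpath API (in .NET, XPathDocument also has a constructor taking a TextReader such as a StringReader). The class name TagLister and the <tags><tag>... shape, taken from the /tags/tag XPath used above, are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagLister {
    // Evaluates /tags/tag over XML held in a string (no file needed)
    // and returns the text of each selected node, in document order.
    static List<String> listTags(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes;
        try {
            nodes = (NodeList) xpath.evaluate(
                    "/tags/tag", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            // Also thrown when the XML itself is malformed
            throw new IllegalArgumentException("Could not evaluate XPath over the given XML", e);
        }
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            tags.add(nodes.item(i).getTextContent());
        }
        return tags;
    }
}
```

When you already have an iterator over the nodes, as with getAllTagsInGroup, iterating it directly (as in the answers above) is simpler; parsing a string only helps when you hold the raw XML text.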
