How to add a file in Solr? - spring-boot

I use Apache Solr to work with files. I can add regular text fields via Spring, but I don't know how to add a TXT/PDF file:
@SolrDocument(solrCoreName = "accounting")
public class Accounting {

    @Id
    @Field
    private String id;

    @Field
    private File txtFile;

    @Field
    private String docType;

    @Field
    private String docTitle;

    public Accounting() {
    }

    public Accounting(String id, String docType, String docTitle) {
        this.id = id;
        this.docTitle = docTitle;
        this.docType = docType;
    }
The problem is with the txtFile field.
<field name="docTitle" type="strings"/>
<field name="docType" type="strings"/>
I added these fields to schema.xml manually, but I cannot figure out how to add a field here that will be responsible for the file, e.g. a TXT file. How do I do that? And do I declare the field private File txtFile; correctly in the entity for the file? Thank you very much.

Solr will not store the actual file anywhere. Depending on your configuration it can store the binary content, though. Solr's extract request handler (Solr Cell) relies on Apache Tika to extract the content from the document.
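If you would rather let Solr run Tika on the server side, a minimal SolrJ sketch using the extract handler could look like this (assuming the accounting core from your question, that the /update/extract handler is enabled in solrconfig.xml, and illustrative file and literal.* field values):
// A hedged sketch of the server-side alternative: Solr's ExtractingRequestHandler
// (Solr Cell) runs Tika for you. Assumes /update/extract is configured.
SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/accounting").build();
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
req.addFile(new File("invoice.pdf"), "application/pdf"); // file name is illustrative
req.setParam("literal.id", "doc-1");                     // literal fields stored alongside the content
req.setParam("literal.docType", "pdf");
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
solr.request(req);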
Alternatively, you can try something like the code below. It does not use anything from Spring Boot. The content is read from the PDF document with the Tika APIs, and the data is then indexed into Solr along with an id and the file name.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public static void main(final String[] args) throws IOException, TikaException, SAXException {
    String urlString = "http://localhost:8983/solr/TestCore1";
    SolrClient solr = new HttpSolrClient.Builder(urlString).build();

    BodyContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    File file = new File("C://Users//abhijitb//Desktop//TestDocument.pdf");
    FileInputStream inputstream = new FileInputStream(file);
    ParseContext pcontext = new ParseContext();

    // Parse the document using Tika's PDF parser to extract its text content
    PDFParser pdfparser = new PDFParser();
    pdfparser.parse(inputstream, handler, metadata, pcontext);
    //System.out.println("Contents of the PDF :" + handler.toString());

    // Index the extracted content into Solr along with an id and the file name
    try {
        String fileName = file.getName();
        SolrInputDocument document = new SolrInputDocument();
        document.addField("id", "123456");
        document.addField("title", fileName);
        document.addField("text", handler.toString());
        solr.add(document);
        solr.commit();
    } catch (SolrServerException | IOException e) {
        e.printStackTrace();
    }
}
Once you index the data, you can verify it on the Solr admin page by querying for it.
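For example, a quick check with SolrJ (the field names match the indexing code above; the query value is illustrative) might look like:
// Hedged sketch: fetch the indexed document back by its title field
SolrQuery query = new SolrQuery("title:\"TestDocument.pdf\"");
QueryResponse response = solr.query(query);
response.getResults().forEach(doc ->
        System.out.println(doc.getFieldValue("id") + " -> " + doc.getFieldValue("title")));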

Related

Spring boot ResourceUtils read file content is giving Path Traversal vulnerability

I am building a Spring Boot application where I need to read JSON files for my component tests. I have a utility method which takes the name of the file and reads the content using ResourceUtils. Here is the code:
public static String getContent(String path) throws IOException {
    File file = ResourceUtils.getFile(MyTest.class.getResource(path));
    String content = new String(Files.readAllBytes(file.toPath()));
    return content;
}
Checkmarx reports the above code with "This may cause a Path Traversal vulnerability."
How do I fix this?
Thanks
See this example for the path traversal vulnerability: Path Traversal
To fix this, change the code to something like:
private static final String BASE_PATH = "/yourbasepath/somewherewherefileisstored";

public static String getContent(String path) throws IOException {
    File file = new File(BASE_PATH, path);
    // Compare canonical paths so "../" sequences cannot escape the base directory
    if (file.getCanonicalPath().startsWith(new File(BASE_PATH).getCanonicalPath() + File.separator)) {
        String content = new String(Files.readAllBytes(file.toPath()));
        return content;
    } else {
        throw new IOException("Path traversal attempt detected: " + path);
    }
}
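As an alternative, here is a hedged sketch of the same check using java.nio, normalizing the resolved path before comparing (the base path and exception message are illustrative):
private static final Path BASE_DIR = Paths.get("/yourbasepath/somewherewherefileisstored");

public static String getContent(String path) throws IOException {
    // Resolve against the base directory and normalize away any ../ segments
    Path resolved = BASE_DIR.resolve(path).normalize();
    if (!resolved.startsWith(BASE_DIR)) {
        throw new IOException("Path traversal attempt detected: " + path);
    }
    return new String(Files.readAllBytes(resolved));
}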

Gson: How do I deserialize an inner JSON object to a map if the property name is not fixed?

My client retrieves JSON content as below:
{
    "table": "tablename",
    "update": 1495104575669,
    "rows": [
        {"column5": 11, "column6": "yyy"},
        {"column3": 22, "column4": "zzz"}
    ]
}
In the rows array the keys are not fixed. I want to retrieve the keys and values and save them into a Map using Gson 2.8.x.
How can I configure Gson to deserialize this?
Here is my idea:
public class Dataset {
    private String table;
    private long update;
    private List<Rows> lists; // <-- little confused here,
    // or private List<HashMap<String, Object>> lists

    // Setter/Getter
}

public class Rows {
    private HashMap<String, Object> map;
    ....
}

Dataset k = gson.fromJson(jsonStr, Dataset.class);
log.info(k.getRows().size()); // <-- I got two null objects
Thanks.
Gson does not support such a thing out of the box. It would be nice if you could make the property name fixed. If not, there are a few options that would probably help you:
Just rename the Dataset.lists field to Dataset.rows, if the property name is in fact fixed to rows.
If the possible name set is known in advance, suggest alternative names to Gson using @SerializedName.
If the possible name set is really unknown and may change in the future, you might want to make it fully dynamic using a custom TypeAdapter (streaming mode; requires less memory but is harder to use) or a custom JsonDeserializer (object mode; requires more memory to store intermediate tree views, but is easy to use) registered with GsonBuilder.
For option #2, you can simply add the name alternatives:
@SerializedName(value = "lists", alternate = "rows")
final List<Map<String, Object>> lists;
For option #3, bind a downstream List<Map<String, Object>> type adapter and try to detect the name dynamically. Note that I omit the Rows class deserialization strategy for simplicity (and I believe you might want to remove the Rows class in favor of a simple Map<String, Object>). Another note: declare the field as Map and try not to specify collection implementations. Hash maps are unordered, but telling Gson you're going to deal with a Map lets it pick an ordered map like LinkedTreeMap (Gson internals) or LinkedHashMap, which might be important for datasets.
// Type tokens are immutable and can be declared constants
private static final TypeToken<String> stringTypeToken = new TypeToken<String>() {
};

private static final TypeToken<Long> longTypeToken = new TypeToken<Long>() {
};

private static final TypeToken<List<Map<String, Object>>> stringToObjectMapListTypeToken = new TypeToken<List<Map<String, Object>>>() {
};

private static final Gson gson = new GsonBuilder()
        .registerTypeAdapterFactory(new TypeAdapterFactory() {
            @Override
            public <T> TypeAdapter<T> create(final Gson gson, final TypeToken<T> typeToken) {
                if ( typeToken.getRawType() != Dataset.class ) {
                    return null;
                }
                // If the actual type token represents the Dataset class, then pick the bunch of downstream type adapters
                final TypeAdapter<String> stringTypeAdapter = gson.getDelegateAdapter(this, stringTypeToken);
                final TypeAdapter<Long> primitiveLongTypeAdapter = gson.getDelegateAdapter(this, longTypeToken);
                final TypeAdapter<List<Map<String, Object>>> stringToObjectMapListTypeAdapter = gson.getDelegateAdapter(this, stringToObjectMapListTypeToken);
                // And compose the bunch into a single dataset type adapter
                final TypeAdapter<Dataset> datasetTypeAdapter = new TypeAdapter<Dataset>() {
                    @Override
                    public void write(final JsonWriter out, final Dataset dataset) {
                        // Omitted for brevity
                        throw new UnsupportedOperationException();
                    }

                    @Override
                    public Dataset read(final JsonReader in)
                            throws IOException {
                        in.beginObject();
                        String table = null;
                        long update = 0;
                        List<Map<String, Object>> lists = null;
                        while ( in.hasNext() ) {
                            final String name = in.nextName();
                            switch ( name ) {
                            case "table":
                                table = stringTypeAdapter.read(in);
                                break;
                            case "update":
                                update = primitiveLongTypeAdapter.read(in);
                                break;
                            default:
                                // Any property other than "table" and "update" is assumed to hold the rows list
                                lists = stringToObjectMapListTypeAdapter.read(in);
                                break;
                            }
                        }
                        in.endObject();
                        return new Dataset(table, update, lists);
                    }
                }.nullSafe(); // Making the type adapter null-safe
                @SuppressWarnings("unchecked")
                final TypeAdapter<T> typeAdapter = (TypeAdapter<T>) datasetTypeAdapter;
                return typeAdapter;
            }
        })
        .create();
final Dataset dataset = gson.fromJson(jsonStr, Dataset.class);
System.out.println(dataset.lists);
The code above would then print:
[{column5=11.0, column6=yyy}, {column3=22.0, column4=zzz}]

How to externalize the queries to xml files using spring

I am using Spring and its JDBC template to do read/write operations against the database. The problem in my reporting module is that I have to change the query SQL frequently to cater to frequent changes.
Using Spring JDBC, is there a way to externalize my queries so that I just change them in the XML and restart, with no need to rebuild the source for deployment? Any approach, ORM (preferred) or plain SQL, will do.
As of now I have to change the query again and again, rebuild the source, and deploy.
I am not sure if Spring provides an out-of-the-box solution for what you want, but here is one way to get it done, which I had implemented once, so I will try to save you some hard work.
You need to implement a utility to load queries from an XML resource file, something like this:
public final class LoadFromResourceFileUtils {

    public static String loadQuery(final String libraryPath,
            final String queryName) {
        final InputStream is = StreamUtils
                .streamFromClasspathResource(libraryPath);
        if (is == null) {
            throw new RuntimeException(String.format(
                    "The SQL Library %s could not be found.", libraryPath));
        }
        final Document doc = XMLParseUtils.parse(is);
        final Element qryElem = (Element) doc.selectSingleNode(String.format(
                "SQLQueries/SQLQuery[@name='%s']", queryName));
        final String ret = qryElem == null ? null : qryElem.getText();
        return ret;
    }
}
You would need to store your queries in an XML file, say queries.xml, and keep it on your classpath, e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<SQLQueries>
    <SQLQuery name="myQuery">
        <![CDATA[
        your query
        ]]>
    </SQLQuery>
</SQLQueries>
And in your DAO you can do this to get the query
String query = LoadFromResourceFileUtils.loadQuery(
"queries.xml", "myQuery");
Here are XMLParseUtils and StreamUtils (based on dom4j) for your reference:
public final class XMLParseUtils {

    public static Document parse(final InputStream inStream) {
        Document ret = null;
        try {
            if (inStream == null) {
                throw new RuntimeException(
                        "XML Input Stream for parsing is null");
            }
            final SAXReader saxReader = new SAXReader();
            ret = saxReader.read(inStream);
        } catch (final DocumentException exc) {
            throw new RuntimeException("XML Parsing error", exc);
        }
        return ret;
    }
}
public final class StreamUtils {

    private static final Logger LOGGER = LoggerFactory.getLogger(StreamUtils.class);

    public static InputStream streamFromClasspathResource(
            final String resourceClassPath) {
        final Class<StreamUtils> clazz = StreamUtils.class;
        final ClassLoader clLoader = clazz.getClassLoader();
        final InputStream inStream = clLoader
                .getResourceAsStream(resourceClassPath);
        if (inStream == null) {
            if (LOGGER.isDebugEnabled()) {
                LOGGER.debug(String.format("Resource %s NOT FOUND.",
                        resourceClassPath));
            }
        }
        return inStream;
    }
}

How to Convert gson to LinkedHashMap<String, List<String>>?

I'm new to Gson and I wonder how to convert JSON data to LinkedHashMap<String, List<String>>.
My JSON data is shown below:
{ "data":
{
"data1": ["asdf", "qwer"],
"data2": ["xczv", "aweqrfds123", "sfdgq234"],
"data3": ["dsafasd", "xcvr123", "sdfa324123"]
}
}
The field names inside data are dynamic, so I want to convert the JSON under data to LinkedHashMap<String, List<String>>.
How can I do that?
You can use a TypeToken to convert it into the expected type with Gson#fromJson(Reader, Type).
As per the JSON string, the type is LinkedHashMap<String, LinkedHashMap<String, ArrayList<String>>>.
Sample code:
BufferedReader reader = new BufferedReader(new FileReader(new File("json.txt")));
Type type = new TypeToken<LinkedHashMap<String,LinkedHashMap<String,ArrayList<String>>>>() {}.getType();
LinkedHashMap<String,LinkedHashMap<String,ArrayList<String>>> data = new Gson().fromJson(reader, type);
LinkedHashMap<String,ArrayList<String>> innerMap = data.get("data");
System.out.println(new GsonBuilder().setPrettyPrinting().create().toJson(innerMap));
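Given the JSON from the question, this should print the inner map pretty-printed, something like:
{
  "data1": [
    "asdf",
    "qwer"
  ],
  "data2": [
    "xczv",
    "aweqrfds123",
    "sfdgq234"
  ],
  "data3": [
    "dsafasd",
    "xcvr123",
    "sdfa324123"
  ]
}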
This is not how it works in the Gson world: you can't convert JSON to any Java class you want, unless you do all of it manually. The common approach works as described below:
Create a Java class which matches your JSON format; for example, you can use the Java class generator described here: http://jsongen.byingtondesign.com/
Use GsonBuilder to read your JSON from a file and map it onto the generated class.
I've used that approach, and the Java file that was generated (after I fixed a minor syntax error in your initial JSON) looks like this:
package com.json;

import java.util.List;

public class Data {

    private List data1;
    private List data2;
    private List data3;

    public List getData1() {
        return this.data1;
    }

    public void setData1(List data1) {
        this.data1 = data1;
    }

    public List getData2() {
        return this.data2;
    }

    public void setData2(List data2) {
        this.data2 = data2;
    }

    public List getData3() {
        return this.data3;
    }

    public void setData3(List data3) {
        this.data3 = data3;
    }
}
To start working with the newly created class you can use the template below:
Reader is = new InputStreamReader(new FileInputStream(new File("<path-to-json>")), "UTF-8");
Gson gson = new GsonBuilder().create();
Data d = gson.fromJson(is, Data.class);
// Start using your d instance here

Save an object with image ( save both object data and image too) inside mongoDB using Java

I want to know specifically about saving an object with an image inside it. What I want to do is save an entire object with an image inside it; the image itself must be saved. I tried the code below, but it saves only a File instance with the file path, not the image. Any help would be appreciated. Thank you. Here is my code for saving the object, but it saves a file instance instead of the image.
import java.io.File;

import org.springframework.data.mongodb.core.mapping.Document;

import com.discusit.model.Artwork;

@Document(collection = "Artwork")
public class ArtworkImpl implements Artwork {

    private String artworkName;
    private String artworkVersion;
    private String fileName;
    private File file;

    public ArtworkImpl() {
    }

    public ArtworkImpl(String name, String version, String fileName, File file) {
        this.artworkName = name;
        this.artworkVersion = version;
        this.fileName = fileName;
        this.file = file;
    }

    public String getArtworkName() {
        return artworkName;
    }

    public void setArtworkName(String artworkName) {
        this.artworkName = artworkName;
    }

    public String getArtworkVersion() {
        return artworkVersion;
    }

    public void setArtworkVersion(String artworkVersion) {
        this.artworkVersion = artworkVersion;
    }

    public String getFileName() {
        return fileName;
    }

    public void setFileName(String fileName) {
        this.fileName = fileName;
    }

    public File getFile() {
        return file;
    }

    public void setFile(File file) {
        this.file = file;
    }
}
Here is my main method:
NOTE: The main method works fine, but it saves a file instance instead of the image.
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.data.mongodb.gridfs.GridFsOperations;

import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;

public class MainApplication {

    public static void main(String[] args) {
        ApplicationContext ctx =
                new AnnotationConfigApplicationContext(SpringMongoConfig.class);
        GridFsOperations gridOperations =
                (GridFsOperations) ctx.getBean("gridFsTemplate");

        DBObject metaData = new BasicDBObject();
        metaData.put("extra1", "anything 1");
        metaData.put("extra2", "anything 2");

        InputStream inputStream = null;
        try {
            inputStream = new FileInputStream("/home/discusit/Downloads/birds.jpg");
            gridOperations.store(inputStream, "birds.jpg", "image/jpg", metaData);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } finally {
            if (inputStream != null) {
                try {
                    inputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        System.out.println("Done");
    }
}
I want to save the object with the image.
UPDATE: I got this working by converting the image to a byte array, and converting the byte array back to an image when fetching. I just want to know: is there any other way to save an image directly in MongoDB without converting it to a byte array?
You need to clarify the following about MongoDB:
1. MongoDB is a document-oriented database in which documents are stored in a format called BSON, limited to a maximum of 16MB. "Think of BSON as a binary representation of JSON (JavaScript Object Notation) documents"[1].
The BSON format supports a BinData type, in which you can store the binary representation of a file as long as the 16MB limit is not exceeded.
2. MongoDB provides a way to store files: GridFS. "GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16MB"[2].
GridFS divides files into chunks of 256K and uses two collections to store them, one for metadata and one for the file chunks; these collections are called fs.files and fs.chunks respectively.
A file stored within these collections looks like this:
> db.fs.files.find()
{
    "_id": ObjectId("51a0541d03643c8cf4122760"),
    "chunkSize": NumberLong("262144"),
    "length": NumberLong("3145782"),
    "md5": "c5dda7f15156783c53ffa42b422235b2",
    "filename": "test.png",
    "contentType": "image/bmp",
    "uploadDate": ISODate("2013-05-25T06:03:09.611Z"),
    "aliases": null,
    "metadata": {
        "extra1": "anything 1",
        "extra2": "anything 2"
    }
}

> db.fs.chunks.find()
{
    "_id": ObjectId("51a0541e03643c8cf412276c"),
    "files_id": ObjectId("51a0541d03643c8cf4122760"),
    "n": 11,
    "data": BinData(0, "BINARY_DATA_WILL_BE_STORED_HERE")
}
...
Note how the ObjectId in the files collection matches the files_id in the chunks collection.
After this clarification, the short answer to your question is:
Yes, you can store files directly in MongoDB using GridFS.
In the following link you can find a working GridFS example using Spring Data:
http://www.mkyong.com/mongodb/spring-data-mongodb-save-binary-file-gridfs-example/
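As a minimal sketch of the read side (this is not from the linked tutorial; the file name follows the main method above), you can fetch the stored image back with the same GridFsOperations bean:
// Hedged sketch: look up the stored file by name and write it back to disk.
// findOne returns a GridFSDBFile in Spring Data MongoDB 1.x; writeTo throws IOException.
GridFSDBFile stored = gridOperations.findOne(
        Query.query(Criteria.where("filename").is("birds.jpg")));
if (stored != null) {
    stored.writeTo("/home/discusit/Downloads/birds-copy.jpg");
}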
[1] http://docs.mongodb.org/manual/reference/glossary/#term-bson
[2] http://docs.mongodb.org/manual/core/gridfs/
