Here I am streaming data from a streaming directory and writing it to an output location. I am also trying to move HDFS files from an input folder into the streaming directory. At the moment this move happens once, before the streaming context starts, but I want it to be executed for every batch of the DStream. Is that even possible?
val streamed_rdd = ssc.fileStream[LongWritable, Text, TextInputFormat](streaming_directory, (t: Path) => true, true).map { case (x, y) => y.toString }

streamed_rdd.foreachRDD { rdd =>
  rdd.map(x => x.split("\t")).map(x => x(3)).foreachPartition { partitionOfRecords =>
    val connection: Connection = connectionFactory.createConnection()
    connection.setClientID("Email_send_module_client_id")
    println("connection started with active mq")
    val session: Session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
    println("created session")
    val dest = session.createQueue("dwEmailsQueue2")
    println("destination queue name = dwEmailsQueue2")
    val prod_queue = session.createProducer(dest)
    connection.start()
    partitionOfRecords.foreach { record =>
      val rec_to_send: TextMessage = session.createTextMessage(record)
      println("started creating a text message")
      prod_queue.send(rec_to_send)
      println("sent the record")
    }
    connection.close()
  }
}
val LIST = scala.collection.mutable.MutableList[String]()
val files_to_move = scala.collection.mutable.MutableList[String]()
val cmd = "hdfs dfs -ls -d " + load_directory + "/*"
println(cmd)
val system_time = System.currentTimeMillis
println(system_time)
val output = cmd.!!
output.split("\n").foreach(x => x.split(" ").foreach(x => if (x.startsWith("/user/hdpprod/")) LIST += x))
LIST.foreach(x => if (x.toString.split("/").last.split("_").last.toLong < system_time) files_to_move += x)
println("files to move " + files_to_move)
var mv_cmd: String = "hdfs dfs -mv "
for (file <- files_to_move) {
  mv_cmd += file + " "
}
mv_cmd += streaming_directory
println(mv_cmd)
val mv_output = mv_cmd.!!
println("moved the data to the folder")
if (streamed_rdd.count().toString == "0") {
  // Note: count() on a DStream returns a DStream[Long], not a number,
  // so this comparison never matches and the else branch always runs.
  println("no data in the streamed list")
} else {
  println("saving the Dstream at " + System.currentTimeMillis())
  streamed_rdd.transform(rdd => rdd.map(x => check_time_to_send + "\t" + check_time_to_send_utc + "\t" + x))
    .saveAsTextFiles("/user/hdpprod/temp/spark_streaming_output_sent/sent")
}
ssc.start()
ssc.awaitTermination()
}
}
I did the same thing with the Java implementation below. You can call this method with the RDD from inside foreachRDD.
public static void moveFiles(final String moveFilePath, final JavaRDD rdd) {
    for (final Partition partition : rdd.partitions()) {
        final UnionPartition unionPartition = (UnionPartition) partition;
        final NewHadoopPartition newHadoopPartition =
                (NewHadoopPartition) unionPartition.parentPartition();
        final String fPath = newHadoopPartition.serializableHadoopSplit().value().toString();
        final String[] filespaths = fPath.split(":");
        if ((filespaths != null) && (filespaths.length > 0)) {
            for (final String filepath : filespaths) {
                if ((filepath != null) && filepath.contains("/")) {
                    final File file = new File(filepath);
                    if (file.exists() && file.isFile()) {
                        try {
                            File destFile = new File(moveFilePath + "/" + file.getName());
                            if (destFile.exists()) {
                                destFile = new File(moveFilePath + "/" + file.getName() + "_");
                            }
                            java.nio.file.Files.move(file.toPath(), destFile.toPath(),
                                    StandardCopyOption.REPLACE_EXISTING);
                        } catch (Exception e) {
                            logger.error("Exception while moving file", e);
                        }
                    }
                }
            }
        }
    }
}
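If the goal is simply to have the move run for every batch, another option (a minimal Scala sketch, assuming load_directory and streaming_directory are plain HDFS path strings, and reusing the timestamp-suffix naming convention from the question) is to do it on the driver at the top of foreachRDD, using the Hadoop FileSystem API instead of shelling out to hdfs dfs:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

streamed_rdd.foreachRDD { rdd =>
  // Runs on the driver once per micro-batch, before the records are processed.
  val fs = FileSystem.get(new Configuration())
  val now = System.currentTimeMillis
  fs.listStatus(new Path(load_directory)).foreach { status =>
    val name = status.getPath.getName
    // Same convention as in the question: the file name ends in _<timestamp>.
    if (name.split("_").last.toLong < now) {
      fs.rename(status.getPath, new Path(streaming_directory, name))
    }
  }
  // ... process rdd as before ...
}

This also avoids parsing the textual output of hdfs dfs -ls, which is fragile.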
I am trying to copy an image stored in my application folder to a predefined folder in my gallery.
I started from image-sharing code.
This is my code:
val extension = when (requireNotNull(pictureResult).format) {
    PictureFormat.JPEG -> "jpg"
    PictureFormat.DNG -> "dng"
    else -> throw RuntimeException("Unknown format.")
}
val timestamp = System.currentTimeMillis()
val namePhoto = "picture_" + timestamp + "." + extension
val destFile = File(filesDir, namePhoto)
val folder = "/CustomFolder"
CameraUtils.writeToFile(requireNotNull(pictureResult?.data), destFile) { file ->
    if (file != null) {
        // Code to share - it works
        /*
        val context = this@PicturePreviewActivity
        val intent = Intent(Intent.ACTION_SEND)
        intent.type = "image/*"
        val uri = FileProvider.getUriForFile(context, context.packageName + ".provider", file)
        intent.putExtra(Intent.EXTRA_STREAM, uri)
        intent.addFlags(Intent.FLAG_GRANT_READ_URI_PERMISSION)
        startActivity(intent)
        */
        // Code to save image to gallery - doesn't work :(
        val photoDirectory = File(Environment.DIRECTORY_PICTURES + folder, namePhoto)
        val sourcePath = Paths.get(file.toURI())
        Log.i("LOG", "sourcePath : " + sourcePath.toString()) // /data/user/0/com.app.demo/files/picture_1663772068143.jpg
        val targetPath = Paths.get(photoDirectory.toURI())
        Log.i("LOG", "targetPath : " + targetPath.toString()) // /Pictures/CustomFolder/picture_1663772068143.jpg
        Files.move(sourcePath, targetPath, StandardCopyOption.REPLACE_EXISTING)
        // Error here but I don't know why
    } else {
        Toast.makeText(this@PicturePreviewActivity, "Error while writing file.", Toast.LENGTH_SHORT).show()
    }
}
How do I copy the image to a predefined folder?
OK, I did it!
Solution:
val folder = "/CustomFolder/" // name of your folder
val timestamp = System.currentTimeMillis()
val namePicture = "picture_" + timestamp + "." + extension
try {
    val path = Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_PICTURES + folder)
    if (!path.exists()) {
        path.mkdir()
    }
    val pathImage = File(path, namePicture)
    val stream = FileOutputStream(pathImage)
    stream.write(imageByteArray) // imageByteArray: the picture bytes, e.g. pictureResult.data
    stream.flush()
    stream.close()
} catch (e: FileNotFoundException) {
    e.printStackTrace()
}
According to the blog, HDFS uses a lease mechanism to prevent two clients from writing the same file. So I assumed one client could not delete a file that is being written by another client. However, that turns out to be wrong.
While client A is writing lock.txt, client B can delete lock.txt immediately, and client A can keep writing even though the file no longer exists. Only when closing the stream does A hit the exception:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/lock.txt (inode 643845185): File does not exist. Holder DFSClient_NONMAPREDUCE_-1636584585_1 does not have any open files
Why does this happen? My Hadoop version is 2.7.3.
================================
This is my test code:
// write process
object Create {
  private val filePath = "/user/lock.txt"

  def main(args: Array[String]): Unit = {
    println("Create start!")
    val fs = FileSystem.get(new Configuration())
    val os = Option(fs.create(new Path(filePath), false))
    if (os.isDefined) {
      println(s"Create result! $os " + System.currentTimeMillis())
      0.until(300).foreach { index =>
        os.get.write(100)
        os.get.flush()
        println("Writing...")
        Thread.sleep(1000)
      }
      println("pre close " + System.currentTimeMillis())
      os.get.close()
      println("close success! " + System.currentTimeMillis())
    }
  }
}
// delete process
object Delete {
  private val filePath = "/user/lock.txt"

  def main(args: Array[String]): Unit = {
    println("Delete start!")
    val fs = FileSystem.get(new Configuration())
    while (!fs.exists(new Path(filePath))) {
      println("File does not exist yet!")
      Thread.sleep(1000)
    }
    println("File exists!")
    while (true) {
      println("try delete!")
      val tmp = Option(fs.delete(new Path(filePath), false))
      if (tmp.isDefined) {
        println(s"delete result: ${tmp.get}! " + System.currentTimeMillis())
      }
      println("Try recover")
      if (fs.asInstanceOf[DistributedFileSystem].recoverLease(new Path(filePath))) {
        println("Recover lease success!")
        val res = Option(fs.delete(new Path(filePath), false))
        println(s"File delete success: ${res.get}")
      } else {
        println("Recover lease failed!")
      }
      Thread.sleep(1000)
    }
  }
}
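One detail worth noting: the lease only guards against concurrent writers, it does not pin the inode, and the writer pushes data straight to the DataNode pipeline, so it only learns about the delete the next time it talks to the NameNode about this file (at close, or when allocating a new block). A minimal sketch of how the writer could detect the deletion earlier by polling the namespace (a hypothetical variant of Create above, not part of the original test):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object CreateWithCheck {
  private val filePath = "/user/lock.txt"

  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val os = fs.create(new Path(filePath), false)
    var alive = true
    var index = 0
    while (alive && index < 300) {
      os.write(100)
      os.flush()
      // Explicitly ask the NameNode whether the inode is still there.
      alive = fs.exists(new Path(filePath))
      Thread.sleep(1000)
      index += 1
    }
    if (alive) {
      os.close()
    } else {
      // Per the behavior observed above, close() here would throw LeaseExpiredException.
      println("File was deleted by another client; abandoning the stream")
    }
  }
}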
I was toying with the idea of rewriting some existing bash scripts in Kotlin script.
One of the scripts has a section that unzips all the files in a directory. In bash:
unzip *.zip
Is there a nice way to unzip files in Kotlin script?
The easiest way is to just exec unzip (assuming the name of your zip file is stored in a zipFileName variable):
ProcessBuilder()
    .command("unzip", zipFileName)
    .redirectError(ProcessBuilder.Redirect.INHERIT)
    .redirectOutput(ProcessBuilder.Redirect.INHERIT)
    .start()
    .waitFor()
A different approach, which is more portable (it runs on any OS and does not require an unzip executable to be present) but somewhat less full-featured (it will not restore Unix permissions), is to do the unzipping in code:
import java.io.File
import java.util.zip.ZipFile

ZipFile(zipFileName).use { zip ->
    zip.entries().asSequence().forEach { entry ->
        val outFile = File(entry.name)
        if (entry.isDirectory) {
            outFile.mkdirs()
        } else {
            outFile.parentFile?.mkdirs() // make sure nested paths exist
            zip.getInputStream(entry).use { input ->
                outFile.outputStream().use { output ->
                    input.copyTo(output)
                }
            }
        }
    }
}
If you need to scan all *.zip file, then you can do it like this:
File(".").list { _, name -> name.endsWith(".zip") }?.forEach { zipFileName ->
// any of the above approaches
}
or like this:
import java.nio.file.*
Files.newDirectoryStream(Paths.get("."), "*.zip").forEach { path ->
val zipFileName = path.toString()
// any of the above approaches
}
This code is for unzipping from assets:
1. For unzipping, you first need an InputStream.
2. Put it in a ZipInputStream.
3. If a directory does not exist, you have to make it with mkdirs().
private val BUFFER_SIZE = 8192 // 2048
private val SDPath = Environment.getExternalStorageDirectory().absolutePath
private val unzipPath = "$SDPath/temp/zipunzipFile/unzip/"

var count = 0
val buffer = ByteArray(BUFFER_SIZE)
val context: Context = this
val am = context.getAssets()
val stream = context.getAssets().open("raw.zip")
try {
    ZipInputStream(stream).use { zis ->
        var ze: ZipEntry? = zis.nextEntry
        while (ze != null) {
            var fileName = ze.name
            fileName = fileName.substring(fileName.indexOf("/") + 1)
            val file = File(unzipPath, fileName)
            val dir = if (ze.isDirectory) file else file.parentFile
            if (!dir.isDirectory && !dir.mkdirs())
                throw FileNotFoundException("Invalid path: " + dir.absolutePath)
            if (!ze.isDirectory) {
                val fout = FileOutputStream(file)
                try {
                    while (zis.read(buffer).also { count = it } != -1)
                        fout.write(buffer, 0, count)
                } finally {
                    fout.close()
                }
            }
            ze = zis.nextEntry
        }
    }
} catch (e: IOException) {
    e.printStackTrace()
}
For unzipping from external storage:
private val sourceFile = "$SDPath/unzipFile/data/"
ZipInputStream zis = null;
try {
    zis = new ZipInputStream(new BufferedInputStream(new FileInputStream(sourceFile)));
    ZipEntry ze;
    int count;
    byte[] buffer = new byte[BUFFER_SIZE];
    while ((ze = zis.getNextEntry()) != null) {
        String fileName = ze.getName();
        fileName = fileName.substring(fileName.indexOf("/") + 1);
        File file = new File(destinationFolder, fileName);
        File dir = ze.isDirectory() ? file : file.getParentFile();
        if (!dir.isDirectory() && !dir.mkdirs())
            throw new FileNotFoundException("Invalid path: " + dir.getAbsolutePath());
        if (ze.isDirectory()) continue;
        FileOutputStream fout = new FileOutputStream(file);
        try {
            while ((count = zis.read(buffer)) != -1)
                fout.write(buffer, 0, count);
        } finally {
            fout.close();
        }
    }
} catch (IOException ioe) {
    Log.d(TAG, ioe.getMessage());
    return false;
} finally {
    if (zis != null)
        try {
            zis.close();
        } catch (IOException e) {
        }
}
return true;
I am using Elasticsearch 2.3.0 with C# (NEST).
I want to use analysis with an index, but the index settings don't change and I don't know why.
Here's my code:
private void button1_Click_1(object sender, EventArgs e)
{
    var conn = new Uri("http://localhost:9200");
    var config = new ConnectionSettings(conn);
    var client = new ElasticClient(config);
    string server = cmb_Serv.Text.Trim();
    if (server.Length > 0)
    {
        string ser = server;
        string uid = util.getConfigValue("SetUid");
        string pwd = util.getConfigValue("SetPwd");
        string dbn = cmb_Db.Text;
        string tbl = cmb_Tbl.Text;
        setWorkDbConnection(ser, uid, pwd, dbn);
        string query = util.getConfigValue("SelectMC");
        query = query.Replace("###tbl###", tbl);
        using (SqlCommand cmd1 = new SqlCommand())
        {
            using (SqlConnection con1 = new SqlConnection())
            {
                con1.ConnectionString = util.WorkConnectionString;
                con1.Open();
                cmd1.CommandTimeout = 0;
                cmd1.Connection = con1;
                cmd1.CommandText = query;
                int id_num = 0;
                SqlDataReader reader = cmd1.ExecuteReader();
                while (reader.Read())
                {
                    id_num++;
                    Console.Write("\r" + id_num);
                    var mc = new mc
                    {
                        Id = id_num,
                        code = reader[0].ToString(),
                        mainclass = reader[1].ToString().Trim()
                    };
                    client.Index(mc, idx => idx.Index("mctest_ilhee"));
                    client.Alias(x => x.Add(a => a.Alias("mcAlias").Index("mctest_ilhee")));
                    client.Map<mc>(d => d
                        .Properties(props => props
                            .String(s => s
                                .Name(p => p.mainclass)
                                .Name(p2 => p2.code)
                                .Index(FieldIndexOption.Analyzed)
                                .Analyzer("whitespace"))));
                }
                reader.Dispose();
                reader.Close();
            }
            IndexSettings Is = new IndexSettings();
            Is.Analysis.Analyzers.Add("snowball", new SnowballAnalyzer());
            Is.Analysis.Analyzers.Add("whitespace", new WhitespaceAnalyzer());
        }
    }
}
Well, first of all, your code is strange. Why are you doing the mapping inside the while loop? Do the mapping only once. Note also that the IndexSettings object you build at the end is never sent to Elasticsearch, which would explain why the index settings don't change.
Beyond that, it's impossible to help because you haven't provided the error you get. I would recommend adding a simple debug method:
protected void ValidateResponse(IResponse response)
{
    if (!response.IsValid ||
        (response is IIndicesOperationResponse && !((IIndicesOperationResponse) response).Acknowledged))
    {
        var error = string.Format("Request to ES failed with error: {0} ",
            response.ServerError != null ? response.ServerError.Error : "Unknown");
        var esRequest = string.Format("URL: {0}\n Method: {1}\n Request: {2}\n",
            response.ConnectionStatus.RequestUrl,
            response.ConnectionStatus.RequestMethod,
            response.ConnectionStatus.Request != null
                ? Encoding.UTF8.GetString(response.ConnectionStatus.Request)
                : string.Empty);
    }
}
All requests, such as client.Alias and client.Map, return a response status, so you can do:
var result = client.Map<mc>(.....YOUR_CODE_HERE....);
ValidateResponse(result);
Then you will see two things: the proper error that ES returns, plus the request that NEST sends to ES.
I could get a handle to the Google text doc I needed. I am now stuck on how to read its contents.
My code looks like:
GoogleOAuthParameters oauthParameters = new GoogleOAuthParameters();
oauthParameters.setOAuthConsumerKey(Constants.CONSUMER_KEY);
oauthParameters.setOAuthConsumerSecret(Constants.CONSUMER_SECRET);
oauthParameters.setOAuthToken(Constants.ACCESS_TOKEN);
oauthParameters.setOAuthTokenSecret(Constants.ACCESS_TOKEN_SECRET);
DocsService client = new DocsService("sakshum-YourAppName-v1");
client.setOAuthCredentials(oauthParameters, new OAuthHmacSha1Signer());
URL feedUrl = new URL("https://docs.google.com/feeds/default/private/full/");
DocumentQuery dquery = new DocumentQuery(feedUrl);
dquery.setTitleQuery("blood_donor_verification_template_dev");
dquery.setTitleExact(true);
dquery.setMaxResults(10);
DocumentListFeed resultFeed = client.getFeed(dquery, DocumentListFeed.class);
System.out.println("feed size:" + resultFeed.getEntries().size());
String emailBody = "";
for (DocumentListEntry entry : resultFeed.getEntries()) {
    System.out.println(entry.getPlainTextContent());
    emailBody = entry.getPlainTextContent();
}
Please note that entry.getPlainTextContent() does not work here; it throws an exception because the entry's content is not of TextContent type.
Finally, I solved it as:
StringBuilder emailBody = new StringBuilder(); // was a String above; append needs a StringBuilder
for (DocumentListEntry entry : resultFeed.getEntries()) {
    String docId = entry.getDocId();
    String docType = entry.getType();
    URL exportUrl =
        new URL("https://docs.google.com/feeds/download/" + docType
            + "s/Export?docID=" + docId + "&exportFormat=html");
    MediaContent mc = new MediaContent();
    mc.setUri(exportUrl.toString());
    MediaSource ms = client.getMedia(mc);
    InputStream inStream = null;
    try {
        inStream = ms.getInputStream();
        int c;
        while ((c = inStream.read()) != -1) {
            emailBody.append((char) c);
        }
    } finally {
        if (inStream != null) {
            inStream.close();
        }
    }
}