Automatic copy-paste from browser to Microsoft Word (Windows)

I want to copy specific text from a web browser (Chrome) and paste it into the right fields of a Microsoft Word document. The page in Chrome is structured like this:
Name-Deepak,Raju,Jhon,Robert.......
Salary-200,254,673,953...
Phone-987535747,856889479,64688539,357954228....
Etc.
I have a table in MS Word with these columns:
Sl. Phone. Name. Salary.
Can I write a program that copies and pastes automatically to fill in the table like this?
Sl. Phone. Name. Salary
1. 987535747. Deepak. 200
2. .......
Please suggest the most suitable platform to build this on. A bat file would be ideal if it can do the job. I know this is a slightly odd question, and that I should ask about one part of the program rather than the whole thing, but I don't know where to start.

Rather than using wget, which only retrieves the document, what you want is a way to parse the web content and write the results to an output file.
After searching the web, I could only come across:
lynx, which is a text-based browser. You can pass the -dump parameter to output the page text into a file, and then write a script to do the final bit. Also take a look at this link for more info on the switches you can use, especially -nolist if the desired text has links in it.
elinks, which is an advanced text-based browser.
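The "final bit" could look something like the Python sketch below. It assumes the page text has already been saved to a string (for example from a lynx -dump) and that every line has the Label-value,value,... shape shown in the question. The function names and the tab-separated output are my own illustrative choices, not anything lynx or Word provides.

```python
def parse_fields(text):
    """Map each label (e.g. "Name") to its list of comma-separated values."""
    fields = {}
    for line in text.splitlines():
        # Split on the first "-" only, so values may contain hyphens.
        label, sep, values = line.partition("-")
        if not sep:
            continue  # skip lines that are not in Label-values form
        cells = [v.strip() for v in values.split(",")]
        fields[label.strip()] = [c for c in cells if c]
    return fields


def build_rows(fields, order=("Phone", "Name", "Salary")):
    """Turn the per-label lists into numbered rows in the table's column order."""
    columns = [fields[label] for label in order]
    return [(str(n), *cells) for n, cells in enumerate(zip(*columns), start=1)]


def to_tsv(rows, header=("Sl", "Phone", "Name", "Salary")):
    """Render header and rows as tab-separated lines."""
    lines = ["\t".join(header)] + ["\t".join(row) for row in rows]
    return "\n".join(lines)
```

Tab-separated text like this should be easy to paste into Word and turn into a table with Word's Convert Text to Table command, which avoids driving Word from the script at all.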


How to add a separator after each word with Ghostscript -sDEVICE=txtwrite

I have used Ghostscript to successfully extract text from PDFs that contain tables.
This simple command works very well:
gswin64c -sDEVICE=txtwrite -o test.txt "c:\reports\sample.pdf"
However, some words get joined together, especially in tables. For example:
234801111111109-12-2014 16:17:04764030208117034 2883253100.00 Payment
234801111111109-12-2014 16:18:461088956908117033 2883253400.00 Payment
234801111111109-12-2014 16:19:48769948208117040 2883253750.00 Payment
should actually be:
2348011111111 09-12-2014 16:17:04 764030208117034 2883253 100.00 Payment
2348011111111 09-12-2014 16:18:46 1088956908117033 2883253 400.00 Payment
2348011111111 09-12-2014 16:19:48 769948208117040 2883253 750.00 Payment
Is there a way to add a separator character at the end of each word?
That would solve this perfectly.
No, sorry, this idea simply won't work.
There is no such thing as a 'word' in a PDF file, there is simply a sequence of character codes and positions. The txtwrite code goes to some lengths to try and reconstruct words by looking at the position of each piece of text, and the metrics of the fonts used, but there are no words in the original.
I don't claim this is perfect. If you'd like me to look at it, you will need to supply the original file. The best solution is to open a bug report and attach the file to it.
This is still an area I'm looking at for a different project (RTF output), so now is a good time to report it. I cannot guarantee being able to resolve it, but it may well simply be that the 'rebuild the page layout' code is being too simple-minded about the location of the text.
You can, however, get a lower-level output. The XML-like output will give you each fragment of text individually, along with its position on the page. You could use that information yourself to rebuild the content.
The default option tries to build a simple representation of the page by using space characters to reproduce the layout of the original, as far as possible, but I have no illusions that there aren't bugs :-)
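To illustrate what "rebuild the content yourself" might mean, here is a rough Python sketch. It assumes you have already parsed the lower-level output into fragments for one line of text, each with a horizontal position and width. The Fragment type, its field names and the min_gap threshold are all hypothetical, and this is not the algorithm txtwrite itself uses.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """One piece of text on a line (hypothetical shape, units are arbitrary)."""
    x: float      # left edge of the fragment
    width: float  # horizontal extent of the fragment
    text: str


def join_fragments(fragments, min_gap=1.0, sep=" "):
    """Join fragments left to right, inserting sep wherever there is a visible gap."""
    pieces = []
    prev_end = None
    for frag in sorted(fragments, key=lambda f: f.x):
        # A gap of at least min_gap between the previous fragment's right
        # edge and this one's left edge is treated as a word boundary.
        if prev_end is not None and frag.x - prev_end >= min_gap:
            pieces.append(sep)
        pieces.append(frag.text)
        prev_end = frag.x + frag.width
    return "".join(pieces)
```

Passing sep="|" instead of a space would give the explicit separator character asked for. The right min_gap depends on the font size and would need tuning against the real file.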

Create Multiple Slides from a List with Common Template

I have created a certificate design in PowerPoint.
Now I have to create 100+ copies of it, each with a different name (the recipient).
I was wondering if there is an easy way to do it.
I can have the list of names in Excel or in a TXT file.
I am open to other ideas as well, such as converting the slide into an image and batch processing it in a simple way.
You may also try out SlideMight, a tool for merging hierarchical data with PowerPoint templates. SlideMight supports iteration over data, to generate slides or to populate tables. There is more functionality, but you don't seem to need that. SlideMight is in fact a coding system, like mail merge for Word is.
Input data format is at this time just JSON; you would need to convert your Excel sheets first, e.g. using this Excel to JSON add-in for Excel.
There are versions for Windows and Mac OS X.
More information is at www.SlideMight.com
Disclaimer: I am the owner of Delftware Technology, the company that developed SlideMight, and I am one of the developers.
This is a question that really belongs in SuperUser, not StackOverflow (which is intended for coding questions, not software how-to-use questions).
But ...
Save your names to a plain Notepad TXT file, one name per line.
Start PowerPoint, choose File, Open and point to your TXT file (you may need to force the matter by choosing *.* in Files of type).
Apply whatever template you like to the result.
I have a commercial add-in that'll do this and quite a bit more, but from your description, you don't need it.
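If the names currently live in Excel, a small script can produce that TXT file from a CSV export. The Python sketch below assumes the names are in the first column of the export; the function name is made up for illustration. It strips leading and trailing whitespace because, as far as I know, PowerPoint treats tab-indented lines in a text outline as lower-level bullets rather than new slides.

```python
import csv
import io


def names_to_outline(csv_text):
    """Return one cleaned name per line, taken from the first CSV column."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Blank lines come back as empty rows; skip them and empty names.
        name = row[0].strip() if row else ""
        if name:
            lines.append(name)
    return "\n".join(lines) + "\n"
```

Write the returned string to a .txt file and open that file in PowerPoint as described above.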

PowerBuilder and batch processing

I'm using PowerBuilder 10.5 and, as a newbie, I'm a bit stuck. Since Google isn't giving me a satisfying answer, I'm asking the Stack Overflow group for some advice.
I have a Rich Text Edit field in which the user can write something, insert pictures and so forth. Once finished, he clicks the "Search" command button to look for the batch file that suits his needs (copy the text into an existing Word document, create a new Word document and place the folder on the web, and so forth; there are 6 different batches). The code in the clicked event of the "Search" command button is this:
String ls_s
GetFileOpenName('PB_app', ls_s, ls_s, 'BAT', "Win Batch Files (*.BAT),*.BAT", 'C:\Programs\Test')
And here come my problems: I can't connect my app and the selected batch file. I'd like the path of the selected batch file to be visible in the Single Line Edit field, but I have no idea how to get there. I'm also at a complete loss as to how to connect the PB app and the batch file, or even how to tell the batch file, "The text in the Rich Text Edit field is the one you have to work with."
So I need some advice, guidance, perhaps some links or names of any literature that would help me understand how it should be done. I've lost two days and got nowhere, and I just need some piece of advice to get me going…
Your problem is that the original programmer used one variable for two return values. GetFileOpenName() returns the full path in its second argument and the bare file name in its third, so passing ls_s for both means the two results collide. If you declare a new string variable and pass it instead of the first ls_s, you'll see that it returns the path. If you run into trouble, PB has a good help file (and the manuals are also online) which covers GetFileOpenName().
Good luck,
Terry

Insert a hyperlink to another file (Word) into Visual Studio code file

I am currently developing some functionality that implements some complex calculations. The calculations themselves are explained and defined in Word documents.
What I would like to do is create a hyperlink in each code file that references the associated Word document, just as you can in Word itself. Ideally this link would be placed in or near the XML comments for each class.
The files reside on a network share and there are no permissions to worry about.
So far I have the following but it always comes up with a file not found error.
file:///\\165.195.209.3\engdisk1\My Tool\Calculations\111-07 MyToolCalcOne.docx
I've worked out the problem is due to the spaces in the folder and filenames.
My Tool
111-07 MyToolCalcOne.docx
I tried replacing the spaces with %20, thus:
file:///\\165.195.209.3\engdisk1\My%20Tool\Calculations\111-07%20MyToolCalcOne.docx
but with no success.
So the question is: what can I use in place of the spaces?
Or, is there a better way?
One way that works beautifully is to write your own URL handler. It's absolutely trivial to do, but so very powerful and useful.
A registry key can be set to make the OS execute a program of your choice when the registered URL is launched, with the URL text being passed in as a command-line argument. It takes just a few lines of code to parse the URL in any way you see fit in order to locate and launch the documentation.
The advantages of this:
You can use a much more compact and readable form, e.g. mydocs://MyToolCalcOne.docx
A simplified format means no trouble trying to encode tricky file paths
Your program can search anywhere you like for the file, making the document storage totally portable and relocatable (e.g. you could move your docs into source control or onto a website and just tweak your URL handler to locate the files)
Your URL is unique, so you can differentiate files, web URLs, and documentation URLs
You can register many URLs, so can use different ones for specs, designs, API documentation, etc.
You have complete control over how the document is presented (does it launch Word, Internet Explorer, or a custom viewer to display the docs, for example?)
I would advise against using spaces in filenames and URLs - spaces have never worked properly under Windows, and always cause problems (or require ugliness like %20) sooner or later. The easiest and cleanest solution is simply to remove the spaces or replace them with something like underscores, dashes or periods.
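As a rough idea of what the "few lines of code" in such a handler might look like, here is a Python sketch of the parsing step only. The mydocs scheme comes from the example above and the share path from the question. The function name and the rule of joining the name onto a single root folder are my assumptions, and registering the handler and actually launching the file are left out.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlsplit

# Share path taken from the question; adjust to wherever the documents live.
DOCS_ROOT = PureWindowsPath(r"\\165.195.209.3\engdisk1\My Tool\Calculations")


def resolve_doc_url(url, root=DOCS_ROOT):
    """Translate a mydocs:// URL into the full path of the document."""
    parts = urlsplit(url)
    if parts.scheme != "mydocs":
        raise ValueError(f"not a mydocs URL: {url!r}")
    # For mydocs://Name.docx the document name lands in netloc; a trailing
    # slash that some launchers append ends up in path and is stripped.
    name = unquote(parts.netloc + parts.path).strip("/")
    relative = PureWindowsPath(name)
    # Refuse empty names, absolute paths and attempts to climb out of root.
    if not name or relative.anchor or ".." in relative.parts:
        raise ValueError(f"unsupported document name: {name!r}")
    return root / relative
```

Because the handler decodes %20 itself, file names with spaces are no longer a problem for the link text, and moving the documents only means changing DOCS_ROOT.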

Methods of Parsing Large PDF Files

I have a very large PDF File (200,000 KB or more) which contains a series of pages containing nothing but tables. I'd like to somehow parse this information using Ruby, and import the resultant data into a MySQL database.
Does anyone know of any methods for pulling this data out of the PDF? The data is formatted in the following manner:
Name | Address | Cash Reported | Year Reported | Holder Name
Sometimes the Name field overflows into the address field, in which case the remaining columns are displayed on the following line.
Due to the irregular format, I've been stuck on figuring this out. At the very least, could anyone point me to a Ruby PDF library for this task?
UPDATE: I accidentally provided incorrect information! The actual size of the file is 300 MB, or 300,000 KB. I made the change above to reflect this.
I assume you can copy'n'paste text snippets without problems when your PDF is opened in Acrobat Reader or some other PDF Viewer?
Before trying to parse and extract text from such monster files programmatically (even if it's 200 MByte only -- for simple text in tables that's huuuuge, unless you have 200000 pages...), I would proceed like this:
1. Try to sanitize the file first by re-distilling it.
2. Try different CLI tools to extract the text into a .txt file.
This is a matter of minutes. Writing a Ruby program to do this certainly is a matter of hours, days or weeks (depending on your knowledge of the PDF file format internals... I suspect you don't have much experience of that yet).
If "2." works, you may halfway be done already. If it works, you also know that doing it programmatically with Ruby is a job that can in principle be solved. If "2." doesn't work, you know it may be extremely hard to achieve programmatically.
Sanitize the 'Monster.pdf':
I suggest using Ghostscript. You can also use Adobe Acrobat Distiller if you have access to it.
gswin32c.exe ^
-o Monster-PDF-sanitized.pdf ^
-sDEVICE=pdfwrite ^
-f Monster.pdf
(I'm curious how much that single command will make your output PDF shrink if compared to the input.)
Extract text from PDF:
I suggest first trying pdftotext.exe (from the XPDF folks). There are other, slightly less convenient methods available too, but this might do the job already:
pdftotext.exe ^
-f 1 ^
-l 10 ^
-layout ^
-eol dos ^
-enc Latin1 ^
-nopgbrk ^
Monster-PDF-sanitized.pdf ^
first-10-pages-from-Monster-PDF-sanitized.txt
This will not extract all pages but only pages 1-10 (as a proof of concept, to see if it works at all). To extract from every page, just leave off the -f 1 -l 10 parameters. You may need to tweak the encoding by changing the parameter to -enc ASCII7 (or UTF-8, UCS-2).
If this doesn't work the quick'n'easy way (because, as sometimes happens, some font in the original PDF uses a custom encoding vector), you should ask a new question describing the details of your findings so far. Then you need to resort to bigger calibres to shoot down the problem.
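Once a .txt file exists, turning it into rows for the database is a separate small step. The Python sketch below is only a guess at how that could go: it assumes the -layout output separates columns with runs of two or more spaces, and that an overflowing Name sits alone on one line with the remaining four columns on the next, as the question describes. The column keys and function names are made up for illustration.

```python
import re

# Column order from the question; the key names themselves are illustrative.
COLUMNS = ["name", "address", "cash_reported", "year_reported", "holder_name"]


def split_columns(line):
    """Split a layout-preserved line on runs of two or more spaces."""
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


def parse_rows(lines):
    """Collect cells across lines until a full record of five columns is seen."""
    rows = []
    pending = []
    for line in lines:
        pending.extend(split_columns(line))
        if len(pending) >= len(COLUMNS):
            # zip() silently ignores any surplus cells beyond the five columns.
            rows.append(dict(zip(COLUMNS, pending)))
            pending = []
    return rows
```

Real extracts will likely have headers, page footers and cells containing double spaces, so treat this as a starting point to test against the first ten pages rather than something to run blindly on the whole file.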
"At the very least, could anyone point me to a Ruby PDF library for this task?"
If you haven't done so, you should check out the two previous questions: "Ruby: Reading PDF files," and "ruby pdf parsing gem/library." PDF::Reader, PDF::Toolkit, and Docsplit are some of the relatively popular suggested libraries. There is even a suggestion of using JRuby and some Java PDF library parser.
I'm not sure whether any of these solutions is actually suitable for your problem, especially since you are dealing with such huge PDF files. So unless someone offers a more informative answer, perhaps you should select a library or two and take them for a test drive.
This will be a difficult task, as rendered PDFs have no concept of tabular layout, just lines and text in predetermined locations. It may not be possible to determine what are rows and what are columns, but it may depend on the PDF itself.
The Java libraries are the most robust, and may do more than just extract text. So I would look into JRuby and iText or PDFBox.
Check whether there is any structured content in the PDF. I wrote a blog article explaining this at http://www.jpedal.org/PDFblog/?p=410
If not, you will need to build it.
Maybe the Prawn Ruby library?
