How to markup ascii-art tables automatically? - markup

I have a plain text file with tables:
http://uucode.com/tmp/20110913/vim-index.txt
(or type ":help index" in vim)
I want to convert the table into a computer-friendly form. For example, into HTML or into Excel.
Any suggestion how to make it automatically? Maybe some lightweight markup tool can recognize such tables? Or probably there are some research papers with useful algorithms?

The plugin "vimwiki : Personal Wiki for Vim" ( http://www.vim.org/scripts/script.php?script_id=2226 ) defines a table syntax for its wiki pages:
| Year | Temperature (low) | Temperature (high) |
|------+-------------------+--------------------|
| 1900 | -10 | 25 |
| 1910 | -15 | 30 |
| 1920 | -10 | 32 |
| 1930 | _N/A_ | _N/A_ |
| 1940 | -2 | 40 |
>
In HTML the following part >
| Year | Temperature (low) | Temperature (high) |
|------+-------------------+--------------------|
>
is higlighted as a table header.
And it provides a command to convert from text to html:
*:Vimwiki2HTML*
Convert current wiki page to HTML.
Maybe you could use some regular expressions to convert your tables to this format, and then use this plugin to convert it to HTML.

Related

How can i merge multiple columns from two different files in talend

Lets say i have multiple columns coming from two different files like that :
USERNAME | AGE | GENDER | CHILDREN
Joe | 23 | male | 2
Annie | 45 | female | 5
| | |
And another one like this :
USERNAME | AGE |
Jonathan | 33 |
Mike | 41 |
And i want to merge the data of the columns that have the same name into one like this while keeping the data of the columns that are unique at each field:
USERNAME | AGE | GENDER | CHILDREN
Joe | 23 | male | 2
Annie | 45 | female | 5
Jonathan | 33 | |
Mike | 41 | |
Sorry if the answer is obvious, im new to talend, thanks.
What tool is available toy you?
The Append function in SAS for example can do this for you.
You can use the append approach in Python, R or other language you intend using.
For Talen:
Copy the complete subjob1 – copy me sub job and paste it to create a second sub job.
Link the two sub jobs using an onSubjobOK link.
Open tFixedFlowInput, and change Records from first subjob to Records from second subjob.
Open tFileOutputDelimited on the new sub job, and tick Append, as shown in the following screenshot:
use a tUnite component to accomplish that
here is the link of the documentation : https://help.talend.com/r/fr-FR/8.0/orchestration/tunite
your flow would be
tFileInput1(excel or csv ) ----------------------------------------------
|
| ->tUnite -> tLogRow
tFileInput2(excel or csv )->tMap (add to empty fields GENDER & Children )|

<blockquote> tag inserted when using image in cell of RST table?

When I use the following code:
+----------------------+---------------+---------------------------------------------------------------------+
| A | B | C |
+======================+===============+=====================================================================+
| Merchant Rating | Ad Extension | Star ratings plus number of reviews for the advertiser/merchant. |
| | | |
| | |.. image:: /images/merchant-rating.png |
+----------------------+---------------+---------------------------------------------------------------------+
The text preceding the image in column C gets wrapped in <blockquote> tags in the HTML output. Is there any way to avoid this?
To avoid the blockquote tag in the first paragraph of the third column, you could try using this:
+----------------------+---------------+---------------------------------------------------------------------+
| A | B | C |
+======================+===============+=====================================================================+
| Merchant Rating | Ad Extension | Star ratings plus number of reviews for the advertiser/merchant. |
| | | |
| | | |img| |
+----------------------+---------------+---------------------------------------------------------------------+
.. |img| image:: /images/merchant-rating.png
Instead, you'll get two paragraphs.
Use a substitution and remove the separating line so that Sphinx interprets the content as a single block of text.
+-----------------+--------------+------------------------------------------------------------------+
| A | B | C |
+=================+==============+==================================================================+
| Merchant Rating | Ad Extension | Star ratings plus number of reviews for the advertiser/merchant. |
| | | |img| |
+-----------------+--------------+------------------------------------------------------------------+
.. |img| image:: /images/merchant-rating.png

Efficient way to join by levenshtein in Hive or Impala

I have two tables one includes about 17K (NLIST) records while the other 57K (FNAMES).
I would like to join the both by comparing the records using levenshtein formula.
Here is the example for the content of tables:
Table NLIST:
+------+-------------+
| ID | S_NAME |
+------+-------------+
| 1 | Avi |
| 2 | Moshe |
| 3 | David |
....
Table FNAMES:
+------+-------------+
| ID | NICKNAMES |
+------+-------------+
| 1 | Avile |
| 2 | Dudi |
| 3 | Moshiko |
| 4 | Avi |
| 5 | DAVE |
....
The above tables are just examples. In the real case the names column can include more than one word.
The required result should be:
+------+-------------+--------+
| ID | NICKNAMES | S_NAME |
+------+-------------+--------+
| 1 | Avile | Avi |
| 2 | Dudi | David |
| 3 | Moshiko | Moshe |
| 4 | Avi | Avi |
| 5 | DAVE | David |
...
Here is the code I use:
select FNAMES.NICKNAMES, NLIST.S_NAME
from NICKNAMES
LEFT OUTER JOIN NLIST
ON(true)
WHERE levenshtein (FNAMES.NICKNAMES, NLIST.S_NAME) <=4
The above code runs for a very long time and I stopped its running.
How can I make it run in a reasonable time?
In addition, I think the levenshtein distance depends on the length of the words. How can I find the optimal value for the distance (in this case I chose 4 arbitrarily)?
Hive Table performance is depends upon various point .
Query enginee
File format
use VECTORIZATION set hive.vectorized.execution.enabled = true;set hive.vectorized.execution.reduce.enabled = true;
If you have good server you can try with Impala and definitely it is faster than Hive.
You can do the fine tuning of impala which will give you an edge to execute this query faster .Tuning Impala for Performance

JasperReports: exporting the report to csv format

I am working with JasperReports 4.5.0, Spring 3.0.5 RELEASE. I am exporting my JR report in pdf, html, csv formats. With pdf, html the report is generating fine.
But when i am exporting my report in csv it is displaying all the fields and values in one column only. My code flow is exactly like this link.
Below is the example how I am getting now.
| A | B | C | D |
|S.No,IPAddress,TotalDuration,TotalBdrCount | | | |
|1,null,266082,null | | | |
2,null,null,null | | | |
3,null,null,null | | | |
4,null,null,null | | | |
Where S.No,IPAddress,TotalDuration,TotalBdrCount are the column headers and 1,null,266082,null are the values to the respective columns.
But my requirement is
| A | B | C | D |
| S.No | IPAddress | TotalDuration | TotalBdrCount |
I think you understood my problem. For this am i need to set any parameters? I am not getting. Can any one help me out regarding this issue.

IRB and large variables?

How can I print a large variable nicely in an irb prompt? I have a variable that contains many variables which are long and the printout becomes a mess to wade through. What if I just want the variable names without their values? Or, can I print each one on a separate line, tabbed-in depending on depth?
Or, can I print each one on a separate line, tabbed-in depending on depth?
Use pp (pretty print):
require 'pp'
very_long_hash = Hash[(1..23).zip(20..42)]
pp very_long_hash
# Prints:
{1=>20,
2=>21,
3=>22,
4=>23,
5=>24,
6=>25,
7=>26,
8=>27,
9=>28,
10=>29,
11=>30,
12=>31,
13=>32,
14=>33,
15=>34,
16=>35,
17=>36,
18=>37,
19=>38,
20=>39,
21=>40,
22=>41,
23=>42}
If you want something that is even more awesome than "pretty" print, you can use "awesome" print. And for a truly spaced out experience, sprinkle some hirbal medicine on your IRb!
Hirb, for example, renders ActiveRecord objects (or pretty much any database access library) as actual ASCII tables:
+-----+-------------------------+-------------+-------------------+-----------+-----------+----------+
| id | created_at | description | name | namespace | predicate | value |
+-----+-------------------------+-------------+-------------------+-----------+-----------+----------+
| 907 | 2009-03-06 21:10:41 UTC | | gem:tags=yaml | gem | tags | yaml |
| 906 | 2009-03-06 08:47:04 UTC | | gem:tags=nomonkey | gem | tags | nomonkey |
| 905 | 2009-03-04 00:30:10 UTC | | article:tags=ruby | article | tags | ruby |
+-----+-------------------------+-------------+-------------------+-----------+-----------+----------+

Resources