I'm trying to write my master's thesis in LaTeX, and it is my first real LaTeX project. In my thesis I need Japanese and some Polish characters.
I divided my thesis into submodules. My main module looks like this:
\documentclass[11pt]{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{CJK}
\includeonly{spis_tresci}
\begin{document}
% Definition of title and author
\title{ My Thesis title. }
\author{Mazeryt Freager \\
\\
\begin{CJK*}{UTF8}{min}
一部の日本人のもの
\end{CJK*}
\\ Polish characters are ąćśżźółęń}
\maketitle
\clearpage
\input{Table_of_Contents}
\end{document}
The above code works perfectly, but the problem is in the submodule "Table of Contents":
% I also need UTF-8 in this file because the table of contents includes the Polish character "ś"
\section{Spis Treści}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{CJK}
%When I add anything beyond ASCII here, compilation fails,
%no matter whether it is:
%\begin{CJK*}{UTF8}{min}
%一部の日本人のもの
%\end{CJK*}
%\\ Polish characters are ąćśżźółęń}
abcdefghijklmnoprstuwxyzABCDEFGHIJKLMNOPRSTUWXYZ
%but standard ASCII works
I searched a lot for this but didn't find any solution that works for me.
See my answer here on TeX.SX - I'll post it again here for the sake of completeness.
I believe the problem lies in a misconception about the use of the CJK environment: as @egreg said, it can't be switched on and off repeatedly. Just enclose the whole document in one CJK environment; when using CJKutf8 (see here for what difference it makes), UTF-8 characters that use the Latin script but fall outside of ASCII will be fine as well.
Thus a fixed version of your MWE would be:
\documentclass[11pt]{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{CJKutf8}
\begin{document}
% Definition of title and author
\begin{CJK*}{UTF8}{min}
\title{My Thesis title.}
\author{Mazeryt Freager\\ \\一部の日本人のもの\\śćóœ}
\maketitle
\clearpage
\input{Table_of_Contents}
\end{CJK*}
\end{document}
with Table_of_Contents.tex having the following contents:
一部の日本人のもの\\
Polish characters are: ąćśżźółęń\\
ASCII: abcdefghijklmnoprstuwxyzABCDEFGHIJKLMNOPRSTUWXYZ
and the output appearing as expected on the title page and on the first page (screenshots omitted).
I need to print a bunch of Unicode characters using LaTeX, and cannot find a solution.
Here is the simplest (not)working example:
\documentclass[10pt]{article}
\begin{document}
Test: $\beta$ βᵝᵦꞵ𝛃𝛽𝜷𝝱𝞫
\end{document}
The output is:
Test: β with XeLaTeX and LuaLaTeX.
With pdfLaTeX I get the standard error:
Package inputenc error: Unicode character (...) not set up for use with LaTeX
I am aware of the possibility of redefining all these Unicode characters to a single standardized one, \beta. However, that is not a solution, as I need to print the characters exactly as displayed above, or as in any decent text editor.
The file I use is encoded in UTF-8. I am using Texmaker, also set up for UTF-8.
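For what it's worth, under XeLaTeX and LuaLaTeX this is usually a font-coverage problem: the default Latin Modern font simply has no glyphs at those code points. A minimal sketch that should print them, assuming an installed font that covers those code points (STIX Two Math is my assumption here):
\documentclass[10pt]{article}
\usepackage{fontspec}
% Hypothetical font choice: any installed font that actually
% contains these code points will do; Latin Modern does not.
\newfontfamily\symbolfont{STIX Two Math}
\begin{document}
Test: $\beta$ {\symbolfont βᵝᵦꞵ𝛃𝛽𝜷𝝱𝞫}
\end{document}
Compile with XeLaTeX or LuaLaTeX; pdfLaTeX cannot load system fonts this way.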
I am using LaTeX to write pseudocode using the algorithm package. I want to add comments to the code in such a way that they get aligned. The following is what I managed so far, but the comments are not aligned. How do I do that?
\begin{algorithm}[H]
\caption{}
\label{}
\begin{tabbing}
\quad \=\quad \=\quad \kill
\keyw{for} each a $\in$ A \keyw{do} \\
\> command; \qquad \qquad $\blacktriangleright$ add text here \\
\keyw{end} \\
\end{tabbing}
\end{algorithm}
The comments currently come out like this:
one comment here\\
other here\\
other here\\
How do I align them?
If you're typesetting algorithms, use a dedicated pseudocode package. Here's an approach using algorithmicx's algpseudocode:
\documentclass{article}
\usepackage{algorithm,algpseudocode}
\algnewcommand{\algorithmicforeach}{\textbf{for each}}
\algdef{SE}[FOR]{ForEach}{EndForEach}[1]
{\algorithmicforeach\ #1\ \algorithmicdo}% \ForEach{#1}
{\algorithmicend\ \algorithmicforeach}% \EndForEach
\begin{document}
\begin{algorithm}
\caption{An algorithm}
\begin{algorithmic}[1]
\ForEach{$a \in A$}%
\State command \algorithmiccomment{This is a comment}
\State another command \algorithmiccomment{This is another comment}
\EndForEach
\end{algorithmic}
\end{algorithm}
\end{document}
algpseudocode already defines \ForAll; in the code above I copied that definition to create \ForEach. Comments can be added using \algorithmiccomment, and their formatting and placement can be modified, as sketched below.
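For example, here is a preamble sketch (my own variant, not part of the original answer) that renders comments as "// ..." while still pushing them to the right margin, which is what keeps them aligned:
% Redefine how \algorithmiccomment renders its argument:
% \hfill pushes every comment to the right margin, so they line up.
\algrenewcommand{\algorithmiccomment}[1]{\hfill// #1}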
I have to define a string that contains Persian text, but when running javac on my program I see "unmappable character for encoding cp1252", so I tried javac -encoding ISO-8859-1, as suggested here.
Now my program shows no errors, but only the numbers in the text appear and the Persian characters disappear. For example, my string is "من 2 کتاب و 3 کامپیوتر دارم", which means "I have 2 books and 3 computers". The text shown is just: 2 3.
By the way, when I use NetBeans I have no errors and all the text is shown, but javac causes these problems.
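For the record, a sketch of what usually fixes this: tell javac the encoding the file is actually saved in (most likely UTF-8; ISO-8859-1 cannot represent Persian at all, which is why the characters vanish). The file and class name HelloPersian are hypothetical:
// HelloPersian.java, saved as UTF-8
public class HelloPersian {
    public static void main(String[] args) {
        // "I have 2 books and 3 computers" in Persian
        String s = "من 2 کتاب و 3 کامپیوتر دارم";
        System.out.println(s);
    }
}
Compiled with:
javac -encoding UTF-8 HelloPersian.java
Whether the Persian characters then display correctly still depends on the console's own encoding and font.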
I have a decompiled stardict dictionary in the form of a tab file
κακός <tab> bad
where <tab> signifies a tabulation.
Unfortunately, the way the words are defined requires the query to include all diacritical marks. So if I want to search for ζῷον, I need to have all the iotas and circumflexes correct.
Thus I'd like to convert the whole file so that the keyword has its diacritics removed. So the line above would become
κακος <tab> <h3>κακός</h3> <br/> bad
I know I could read the file line by line in bash, as described here [1]
while read line
do
command
done <file
But is there any way to automate the conversion of each line? I have heard about iconv [2] but didn't manage to achieve the desired conversion with it. I'd prefer a bash script.
Besides, is there an automatic way of transliterating Greek, e.g. using the method Perseus has?
Edit: Maybe we could use the Unicode code points? We can notice that U+1F0x, U+1F8x for x < 8, etc. are all variants of the letter α. This would reduce the amount of manual work. I'd accept a C++ solution as well.
[1] http://en.kioskea.net/faq/1757-how-to-read-a-file-line-by-line
[2] How to remove all of the diacritics from a file?
You can remove diacritics from a string relatively easily using Perl:
$_=NFKD($_);s/\p{InCombiningDiacriticalMarks}//g;
for example:
$ echo 'ὦὢῶὼώὠὤ ᾪ' | perl -CS -MUnicode::Normalize -pne '$_=NFKD($_);s/\p{InCombiningDiacriticalMarks}//g'
ωωωωωωω Ω
This works as follows:
The -CS enables UTF8 for Perl's stdin/stdout
The -MUnicode::Normalize loads a library for Unicode normalisation
-e executes the script from the command line; -n automatically loops over lines in the input; -p prints the output automatically
NFKD() translates the line into one of the Unicode normalisation forms; this means that accents and diacritics are decomposed into separate characters, which makes it easier to remove them in the next step
s/\p{InCombiningDiacriticalMarks}//g removes all characters in the Unicode block of combining diacritical marks
This should in fact work for removing diacritics for all scripts/languages that have good Unicode support, not just Greek.
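Building on this, a sketch of the full line rewrite the question asks for, i.e. stripped keyword, then the original headword wrapped in <h3> tags, then the definition (dictionary.tab and converted.tab are placeholder names, and exactly one tab per line is assumed):
perl -CSD -MUnicode::Normalize -F'\t' -lane '
    my $plain = NFKD($F[0]);                          # decompose the headword
    $plain =~ s/\p{InCombiningDiacriticalMarks}//g;   # drop the combining marks
    print "$plain\t<h3>$F[0]</h3> <br/> $F[1]";       # rebuild the line
' dictionary.tab > converted.tab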
I'm not as familiar with Ancient Greek as I am with Modern Greek (which only really uses two diacritics).
However, I went through the vowels and found out which ones combine with diacritics. This gave me the following list:
ἆἂᾶὰάἀἄ
ἒὲέἐἔ
ἦἢῆὴήἠἤ
ἶἲῖὶίἰἴ
ὂὸόὀὄ
ὖὒῦὺύὐὔ
ὦὢῶὼώὠὤ
I saved this list to a file (test.txt) and passed it through this sed command:
cat test.txt | sed -e 's/[ἆἂᾶὰάἀἄ]/α/g;s/[ἒὲέἐἔ]/ε/g;s/[ἦἢῆὴήἠἤ]/η/g;s/[ἶἲῖὶίἰἴ]/ι/g;s/[ὂὸόὀὄ]/ο/g;s/[ὖὒῦὺύὐὔ]/υ/g;s/[ὦὢῶὼώὠὤ]/ω/g'
Credit to hungnv
It's a simple sed substitution: each character in a bracketed set is replaced with the corresponding unmarked character. The result of the above command is:
ααααααα
εεεεε
ηηηηηηη
ιιιιιιι
οοοοο
υυυυυυυ
ωωωωωωω
Regarding transliterating the Greek: the image in your post is intended to help the user type Greek on the site you took it from using similar glyphs, not necessarily similar sounds. Those are poor transliterations: e.g. β is most often transliterated as v, ψ as ps, φ as ph, etc.
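If a sound-based transliteration is what you're after, here is a sketch of the same substitution approach (my own deliberately partial mapping, covering only the consonants mentioned above; extend it as needed):
cat test.txt | perl -CS -Mutf8 -pe 'BEGIN { our %m = ("β" => "v", "ψ" => "ps", "φ" => "ph") } s/(β|ψ|φ)/$m{$1}/g'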
I'm trying to write a batch script using the Windows command line to convert some characters, for example:
É to Й
Ö to Ц
Ó to У
Ê to К
Å to Е
Í to Н
à to Г
Ø to Ш
Ù to Щ
Ç to З
with no success. I need this because I am using a program that does not support a Cyrillic font.
I already have the file with these words, like:
ОБОГРЕВ ЗОНЫ 1
ДАВЛЕНИЕ ЦВЕТА 1
...
and so on...
Is it possible?
I'm guessing that you'd like to convert the character set (also known as the code page) of a file so you can open and read it.
I'm assuming you are using a Windows computer.
Let's say that your file is russian.txt, and when you open it with Notepad the characters don't make any sense. The russian.txt file's character encoding is most probably ANSI, and its code page is Windows-1251.
Some words about character encoding:
In ANSI, one character is one byte long.
Different languages have different code pages: Windows-1251 = Russian, Windows-1252 = Western languages (English, German, Swedish...), Windows-1253 = Greek...
In UTF-8, ASCII characters are one byte long and other characters are two to four bytes long.
In UTF-16 (what Windows tools often label "Unicode"), most characters are two bytes long.
UTF-8 and UTF-16 don't need code pages.
For example, the Cyrillic letter Д is the single byte 0xC4 in Windows-1251, the two bytes 0xD0 0x94 in UTF-8, and 0x04 0x14 in big-endian UTF-16.
You can check the encoding by opening the file in Notepad and clicking File, Save As. At the bottom right corner, beside the Save button, you can see the encoding.
With some googling I found a site where you can do the character encoding conversion online. I haven't tested it, but here's the address:
http://i-tools.org/charset
I've made a script (a small program) which changes the character encoding from any ANSI/code page combination to UTF-8 or UTF-16, or vice versa.
Let's say you have an English Windows computer and want to convert russian.txt (ANSI / Windows-1251) to UTF-8.
Here's how:
Open this web page and copy the script on it to the clipboard:
VB6/VBScript change file encoding to ansi
Create a new file named ConvertCharset.vbs in the same folder as russian.txt, say C:\Temp.
Open ConvertCharset.vbs in Notepad (right click + Edit) and paste the script.
Open CMD (Windows key+R, type cmd, press Enter).
In the CMD window, type the following (pressing Enter at the end of each line):
cd C:\Temp\
cscript ConvertCharset.vbs /InputCharset:Windows-1251 /OutputCharset:utf-8 /InputFile:russian.txt /OutputFile:russian_utf-8.txt
Now you can open russian_utf-8.txt in Notepad and you'll see the Russian characters displayed correctly.
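As an aside, an untested sketch: if PowerShell is available, the same conversion can be done with two lines using the .NET encoding classes (same file paths as above):
$text = [System.IO.File]::ReadAllText('C:\Temp\russian.txt', [System.Text.Encoding]::GetEncoding(1251))
[System.IO.File]::WriteAllText('C:\Temp\russian_utf-8.txt', $text, [System.Text.Encoding]::UTF8)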
More info:
http://en.wikipedia.org/wiki/Character_encoding
http://en.wikipedia.org/wiki/Windows-1251
http://en.wikipedia.org/wiki/UTF-8
VB6/VBScript change file encoding to ansi