I installed the latest version of the free HP Vertica server on CentOS release 6.6 (Final). Next, I set up the server and created a database, IM_0609. Then I created a table with this command:
CREATE TABLE MARKS (SERIAL_NUM varchar(30),PERIOD smallint,MARK_NUM decimal(20,0), END_MARK_NUM decimal(20,0),OLD_MARK_NUM decimal(20,0),DEVICE_NAME varchar(256),DEVICE_MARK varchar(256),CALIBRATION_DATE date);
Next, I exported the data from a DB2 database to a text file:
5465465|12|+5211.|+5211.||Комплексы компьютеризированные самостоятельного предрейсового экспресс-обследования функционального состояния машиниста, водителя и оператора|ЭкОЗ-01|2004-12-09
5465465|12|+5211.|+5211.||Спектрометры эмиссионные|Metal Lab|2004-12-09
б/н|12|+5207.|+5207.|+5205.|Спектрометры эмиссионные|Metal Lab|2004-12-09
б/н|12|+5207.|+5207.|+5205.|Спектрометры эмиссионные|Metal Test|2004-12-09
....
and changed the file encoding to UTF-8.
I then imported the data from the text file into the database table with this HP Vertica command:
copy MARKS from '/home/dbadmin/result.txt' delimiter '|' null as '' exceptions '/home/dbadmin/copy-error.log' ABORT ON ERROR;
All the data loaded, but the Russian characters display as garbage; apparently this is due to a character-encoding problem with the COPY command.
5465465 12 5211 5211 (null) Êîìïëåêñû êîìïüşòåğèçèğîâàííûå ñàìîñòîÿòåëüíîãî ïğåäğåéñîâîãî ıêñïğåññ-îáñëåäîâàíèÿ ôóíêöèîíàëüíîãî ñîñòîÿíèÿ ìàøèíèñòà, âîäèòåëÿ è îï İêÎÇ-01 2004-12-09
5465465 12 5211 5211 (null) Ñïåêòğîìåòğû ıìèññèîííûå Metal Lab 2004-12-09
Question: How can I fix this problem?
Make sure your file encoding is UTF-8 (if it is not, see the iconv example below):
[dbadmin@DCG023 ~]$ file rus
rus: UTF-8 Unicode text
[dbadmin@DCG023 ~]$ cat rus
5465465|12|+5211.|+5211.||Комплексы компьютеризированные самостоятельного предрейсового экспресс-обследования функционального состояния машиниста, водителя и оператора|ЭкОЗ-01|2004-12-09
5465465|12|+5211.|+5211.||Спектрометры эмиссионные|Metal Lab|2004-12-09
б/н|12|+5207.|+5207.|+5205.|Спектрометры эмиссионные|Metal Lab|2004-12-09
б/н|12|+5207.|+5207.|+5205.|Спектрометры эмиссионные|Metal Test|2004-12-09
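If file had reported a single-byte encoding instead of UTF-8 (DB2 exports of Cyrillic text often come out in a Windows or DOS code page; CP1251 below is only a guess, so substitute the actual source encoding), the file could be converted before loading, for example:
iconv -f CP1251 -t UTF-8 result.txt > rus
file rus   # should now report: UTF-8 Unicode text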
Load the data
[dbadmin@DCG023 ~]$ vsql
Password:
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type: \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
(dbadmin@:5433) [dbadmin] > copy MARKS from '/home/dbadmin/rus' delimiter '|' null as '' ABORT ON ERROR;
Rows Loaded
-------------
4
(1 row)
Query the data
(dbadmin@:5433) [dbadmin] > select * from Marks;
SERIAL_NUM | PERIOD | MARK_NUM | END_MARK_NUM | OLD_MARK_NUM | DEVICE_NAME | DEVICE_MARK | CALIBRATION_DATE
------------+--------+----------+--------------+--------------+----------------------------------------------------------------------------------------------------------------------------------------+-------------+------------------
5465465 | 12 | 5211 | 5211 | | Комплексы компьютеризированные самостоятельного предрейсового экспресс-обследования функционального состояния машиниста, водителя и оп | ЭкОЗ-01 | 2004-12-09
5465465 | 12 | 5211 | 5211 | | Спектрометры эмиссионные | Metal Lab | 2004-12-09
б/н | 12 | 5207 | 5207 | 5205 | Спектрометры эмиссионные | Metal Lab | 2004-12-09
б/н | 12 | 5207 | 5207 | 5205 | Спектрометры эмиссионные | Metal Test | 2004-12-09
(4 rows)
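As an extra check that the Cyrillic text is stored as real multi-byte UTF-8 (and not merely rendered nicely by the terminal), the character and byte lengths can be compared. LENGTH and OCTET_LENGTH are Vertica string functions; the query below is only illustrative:
(dbadmin@:5433) [dbadmin] > select DEVICE_MARK, length(DEVICE_MARK), octet_length(DEVICE_MARK) from MARKS limit 2;
For UTF-8 Cyrillic values such as ЭкОЗ-01, the byte count should come out noticeably larger than the character count.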
Related
I am trying to save a file into a library on an IBM iSeries system using GxFtpPut in GeneXus 10 V3 with .NET, but when sending the file, GeneXus tries to send it to a Windows directory instead of the library. Sending it to the library works fine when I use the ftp command from cmd.
I've already tried changing the route it uses, to no avail, and tried to find another way of sending the file through GeneXus.
For example, when using cmd I just run:
put C:\FILES\Filename.txt Library/Filename
and it sends the file into the library.
But when I do this in GeneXus:
Call("GxFtpPut", &FileDirectory , 'Library/'+&FileName,'B' )
it does not work and tries to find a directory with that name in the Windows file system of the server.
I just want to be able to send it to the server library without issue.
IBM i has two distinct name formats depending on the file system you are trying to use. NAMEFMT 0 is the library/filename format, and is likely unknown to PC FTP clients. NAMEFMT 1 is the typical hierarchical directory path used by non-IBM i computers, and also works with IBM i if you want to put a file anywhere in the IFS (Integrated File System).
Fun fact: the native library file system is also accessible from the IFS, but to address it you need to use a format that might be a little unfamiliar: /QSYS.lib/library.lib/filename.file/membername.mbr. You may be able to drop the member name.
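For example (reusing the mylib/testpf names from the test further down; substitute your own library and file), the same target addressed through the IFS path format would be:
put test.txt /QSYS.LIB/MYLIB.LIB/TESTPF.FILE/TESTPF.MBR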
To change name format, you can issue the SITE sub-command on your remote host like this:
QUOTE SITE NAMEFMT 0 -- This sets name format 0 (library/filename)
QUOTE SITE NAMEFMT 1 -- This sets name format 1 (directory path)
I did some testing with a plain Windows FTP client. The test file on the PC was a text file created in Notepad++. It turns out that we start out in NAMEFMT 0 unless it is changed. It looks like GeneXus only supports a limited set of commands, so here is the limited FTP script that works:
ascii
put test.txt mylib/testpf
I can now pull up testpf on the greenscreen utilities and read it. I can also read testpf in my GUI SQL client. The ASCII text has been converted properly to EBCDIC.
|TESTPF |
|--------------------------------------------------------------------------------|
| |
|// ------------------------------------ |
|// Sweep |
|// |
|// Performs the sweep logic |
|// ------------------------------------ |
|dcl-proc Sweep; |
| |
| |
| exec sql |
| update atty a |
| set ymglsb = (select ymglsb from glaty |
| where atty = a.atty) |
| where atty in (select atty from glaty where atty = a.atty); |
|// where ymglsb in (select ymglsb from glaty where atty = a.atty); |
| if %subst(sqlstate: 1: 2) < '00' or |
| %subst(sqlstate: 1: 2) > '02'; |
| exec sql get diagnostics condition 1 |
| :message = message_text; |
| SendSqlMsg('02: ' + message); |
| endif; |
| |
| exec sql |
| update atty a |
| set ymglsb = '000' |
| where not exists (select * from glaty where atty = a.atty); |
| if %subst(sqlstate: 1: 2) < '00' or |
| %subst(sqlstate: 1: 2) > '02'; |
| exec sql get diagnostics condition 1 |
| :message = message_text; |
| SendSqlMsg('03: ' + message); |
| endif; |
| |
|end-proc; |
However, if I try to transfer in binary mode, the resulting data in the file looks like this:
|TESTPF |
|--------------------------------------------------------------------------------|
|ëÏÁÁø&ÁÊÃ?Ê_ËÈÇÁËÏÁÁø% |
|?ÅÑÄÀÄ%øÊ?ÄëÏÁÁøÁÌÁÄËÉ% |
|ÍøÀ/ÈÁ/ÈÈ`/ËÁÈ`_Å%ËÂËÁ%ÁÄÈ`_Å%ËÂÃÊ?_Å%/È` |
|ÏÇÁÊÁ/ÈÈ`//ÈÈ`ÏÇÁÊÁ/ÈÈ`Ñ>ËÁ%ÁÄÈ/ÈÈ`ÃÊ?_Å%/È`ÏÇÁÊÁ/ÈÈ |
|`//ÈÈ`ÏÇÁÊÁ`_Å%ËÂÑ>ËÁ%ÁÄÈ`_Å%ËÂÃÊ?_Å%/È`ÏÇÁÊÁ/ÈÈ`//ÈÈ |
|`ÑöËÍÂËÈËÉ%ËÈ/ÈÁ?ʶËÍÂËÈËÉ%ËÈ/ÈÁ |
|ÁÌÁÄËÉ%ÅÁÈÀÑ/Å>?ËÈÑÄËÄ?>ÀÑÈÑ?>_ÁËË/ÅÁ_ÁËË/ÅÁ¬ÈÁÌÈ |
|ëÁ>ÀëÉ%(ËÅ_ÁËË/ÅÁÁ>ÀÑÃÁÌÁÄËÉ%ÍøÀ/ÈÁ/ÈÈ`/ |
|ËÁÈ`_Å%ËÂÏÇÁÊÁ>?ÈÁÌÑËÈËËÁ%ÁÄÈÃÊ?_Å%/È`ÏÇÁÊÁ/ÈÈ`// |
|ÈÈ`ÑöËÍÂËÈËÉ%ËÈ/ÈÁ?ʶËÍÂËÈËÉ%ËÈ/ÈÁ |
|ÁÌÁÄËÉ%ÅÁÈÀÑ/Å>?ËÈÑÄËÄ?>ÀÑÈÑ?>_ÁËË/ÅÁ_ÁËË/ÅÁ¬ÈÁÌÈ |
|ëÁ>ÀëÉ%(ËÅ_ÁËË/ÅÁÁ>ÀÑÃÁ>ÀøÊ?Ä |
This has not been converted, because we told the IBM i FTP server not to convert to EBCDIC since the transfer was binary.
So: try ASCII mode and use the library/filename format. The target file does not need to pre-exist.
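To confirm the server-side behaviour outside GeneXus first, the same steps can be scripted with a plain command-line FTP client (the host, user, and password below are placeholders):
ftp -n my-ibmi-host <<'END'
user MYUSER MYPASS
quote site namefmt 0
ascii
put test.txt mylib/testpf
quit
END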
I am interacting with a sqlite3 database on Linux with bash scripts. It is a small tool I use for myself, and I display the data in the terminal. Some of the columns contain a lot of text, too much to show on a single line. Is there a way to word-wrap the output of the SELECT query? The output I am looking for should look something like this:
rowid | column1 | column2 | column3
------------------------------------------------
1 | value 11 | value 21 | value 31
------------------------------------------------
2 | value 12 | This is a | value 32
| | very long |
| | text |
------------------------------------------------
3 | value 13 | value 23 | value 33
------------------------------------------------
4 | value 14 | value 24 | value 34
Is there a way to do this? I was not able to find a solution to this problem. Thanks in advance and BR!
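One rough way to get this (a sketch only; mydb.db, mytable, and the column names are placeholders, the wide column is assumed to be column2, and the wrap width of 20 characters is arbitrary) is to post-process the pipe-separated output with awk:
sqlite3 -header -separator '|' mydb.db \
  "SELECT rowid, column1, column2, column3 FROM mytable;" |
awk -F'|' '
{
    # break the wide field (here: $3, i.e. column2) into chunks of at most 20 characters
    n = 0; s = $3
    while (length(s) > 0) { chunk[++n] = substr(s, 1, 20); s = substr(s, 21) }
    if (n == 0) chunk[++n] = ""
    # the first physical line carries every column; continuation lines carry only the wrapped text
    printf "%-6s| %-10s| %-20s| %s\n", $1, $2, chunk[1], $4
    for (i = 2; i <= n; i++)
        printf "%-6s| %-10s| %-20s| %s\n", "", "", chunk[i], ""
    print "------------------------------------------------"
}'
This falls apart if a value itself contains the | separator, so treat it as a starting point.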
I have a massive set of files on an extremely fast SAN disk that I would like to run Hive queries on.
An obvious option is to copy all files into HDFS by using a command like this:
hadoop dfs -copyFromLocal /path/to/file/on/filesystem /path/to/input/on/hdfs
However, I don't want to create a second copy of my files just to be able to run Hive queries on them.
Is there any way to point an HDFS folder at a local folder, so that Hadoop sees it as an actual HDFS folder? Files keep being added to the SAN disk, so Hadoop needs to see the new files as they are added.
This is similar to Azure HDInsight's approach, where you copy your files into blob storage and HDInsight's Hadoop sees them through HDFS.
For playing around with small files, using the local file system might be fine, but I wouldn't do it for any other purpose.
Putting a file in HDFS means that it is split into blocks which are replicated and distributed.
This is what later gives you both performance and availability.
Locations of [external] tables can be directed to the local file system using file:///.
Whether it works smoothly or you start getting all kinds of errors remains to be seen.
Please note that for this demo I'm using a little trick to point the location at a specific file, but your basic use will probably be for directories.
Demo
create external table etc_passwd
(
Username string
,Password string
,User_ID int
,Group_ID int
,User_ID_Info string
,Home_directory string
,shell_command string
)
row format delimited
fields terminated by ':'
stored as textfile
location 'file:///etc'
;
alter table etc_passwd set location 'file:///etc/passwd'
;
select * from etc_passwd limit 10
;
+----------+----------+---------+----------+--------------+-----------------+----------------+
| username | password | user_id | group_id | user_id_info | home_directory | shell_command |
+----------+----------+---------+----------+--------------+-----------------+----------------+
| root | x | 0 | 0 | root | /root | /bin/bash |
| bin | x | 1 | 1 | bin | /bin | /sbin/nologin |
| daemon | x | 2 | 2 | daemon | /sbin | /sbin/nologin |
| adm | x | 3 | 4 | adm | /var/adm | /sbin/nologin |
| lp | x | 4 | 7 | lp | /var/spool/lpd | /sbin/nologin |
| sync | x | 5 | 0 | sync | /sbin | /bin/sync |
| shutdown | x | 6 | 0 | shutdown | /sbin | /sbin/shutdown |
| halt | x | 7 | 0 | halt | /sbin | /sbin/halt |
| mail | x | 8 | 12 | mail | /var/spool/mail | /sbin/nologin |
| uucp | x | 10 | 14 | uucp | /var/spool/uucp | /sbin/nologin |
+----------+----------+---------+----------+--------------+-----------------+----------------+
You can mount your HDFS path onto a local folder, for example with an HDFS mount.
Please follow this for more info.
But if you want speed, it isn't an option.
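If you do go the mount route, the mount step itself looks roughly like this (a sketch that assumes the HDFS NFS Gateway service is already running; nfs-gateway-host and /mnt/hdfs are placeholders):
# create a mount point and mount the HDFS root exported by the NFS gateway
sudo mkdir -p /mnt/hdfs
sudo mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync nfs-gateway-host:/ /mnt/hdfs
Note that this only gives a local view of HDFS; files from the SAN still have to be copied in through it.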
I have multiline data, and I'd like to insert it into a table. Then, of course, I'd like to retrieve it while preserving the positions of the carriage returns.
For example, I have data like this in a text file:
-------------------------------
| ID | text |
| | |
| 01 | This is headline. |
| 02 | This is all the text.|
| | ¤ |
| | Of great story once |
| 03 | Great weather |
-------------------------------
The ¤ marks a carriage return. When I run the query after loading, the data comes out like this:
-------------------------------
| ID | text |
| | |
| 01 | This is headline. |
| 02 | This is all the text.|
| 03 | Great weather |
-------------------------------
What I'd like to have in the table is this (I have no idea how to show the carriage return in the example below):
-----------------------------------------------------
| ID | text |
| | |
| 01 | This is headline. |
| 02 | This is all the text. Of great story once |
| 03 | Great weather |
-----------------------------------------------------
The result I actually get is, of course, wrong, as the data for ID 02 wasn't imported completely.
Here is my script:
LOAD DATA
INFILE "file.txt" BADFILE "file.bad" DISCARDFILE "file.dsc"
APPEND
INTO TABLE text_table
FIELDS TERMINATED BY X'7C' TRAILING NULLCOLS
(
employee_id,
exp_pro CHAR(4000)
)
Any ideas?
First make sure the issue isn't with how you're viewing the data (or the IDE used). Sometimes viewers will simply stop at a linefeed (or carriage return, or some binary char).
Try dumping a hex representation of some data first. For example:
with txt as (
select 'This is line 1.' || chr(13) || chr(10) || 'Line 2.' as lines
from dual
)
select dump(txt.lines, 16) from txt;
You should be able to see the 0d0a (CRLF) chars, or whatever other "non-printable" chars exist, if any.
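Separately, if the dump shows the carriage returns never reached the table at all, the load itself has to preserve them. One way (a sketch only; it assumes the export can be regenerated with a distinctive end-of-record marker such as <EOR>, which the original file does not have) is to tell SQL*Loader what the real record terminator is via the STR clause, so embedded line breaks stay inside the field:
LOAD DATA
INFILE "file.txt" "STR '<EOR>\n'" BADFILE "file.bad" DISCARDFILE "file.dsc"
APPEND
INTO TABLE text_table
FIELDS TERMINATED BY X'7C' TRAILING NULLCOLS
(
employee_id,
exp_pro CHAR(4000)
)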
I am migrating a MySQL 5.1 database to Amazon EC2, and I am having issues with tables that use the longblob datatype for image storage. Basically, after the migration, the data in the longblob column is a different size, apparently because the character encoding is being handled differently.
First of all, here is an example of before and after the migration:
Old:
x??]]??}?_ѕ??d??i|w?%?????q$??+?
New:
x��]]����_ѕ��d��i|w�%�����q$��+�
I checked the character set variables on both machines and they are identical. I also checked SHOW CREATE TABLE and the output is identical as well. The clients are both connecting the same way (no SET NAMES and no character sets specified).
Here is the mysqldump command I used (I tried it without --hex-blob as well):
mysqldump --hex-blob --default-character-set=utf8 --tab=. DB_NAME
Here is how I loaded the data:
mysql DB_NAME --default-character-set=utf8 -e "LOAD DATA INFILE 'EXAMPLE.txt' INTO TABLE EXAMPLE;"
Here are the MySQL character set variables (identical):
Old:
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
New:
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
I'm not sure what else to try to get mysqldump to produce blob data that is identical on both machines. Any tips would be greatly appreciated.
The issue seems to be a bug in MySQL (http://bugs.mysql.com/bug.php?id=27724). The solution is to not use mysqldump, but to write your own SELECT ... INTO OUTFILE script for the tables that have blob data. Here is an example:
SELECT
COALESCE(column1, @nullval),
COALESCE(column2, @nullval),
COALESCE(HEX(column3), @nullval),
COALESCE(column4, @nullval),
COALESCE(column5, @nullval)
FROM table
INTO OUTFILE '/mnt/dump/table.txt'
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
To load the data:
SET NAMES utf8;
LOAD DATA INFILE '/mnt/dump/table.txt'
INTO TABLE table
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
(column1, column2, @column3, column4, column5)
SET column3 = UNHEX(@column3);
This loads the blob data correctly.
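To double-check that the blob data really is byte-identical after the reload, the same hash query can be run on both machines and the results compared (a sketch; table and the column names are the placeholders used above):
SELECT column1, LENGTH(column3), MD5(column3)
FROM table
ORDER BY column1
LIMIT 10;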