Replace character in pig - hadoop

My data is in the following format:
{"Foo":"ABC","Bar":"20090101100000","Quux":"{\"QuuxId\":1234,\"QuuxName\":\"Sam\"}"}
I need it to be in this format:
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}
I'm trying to use Pig's REPLACE function to get it into the format I need.
So I tried:
"LOGS = LOAD 'inputloc' USING TextStorage() as unparsedString:chararray;;" +
"REPL1 = foreach LOGS REPLACE($0, '"{', '{');" +
"REPL2 = foreach REPL1 REPLACE($0, '}"', '}');"
"STORE REPL2 INTO 'outputlocation';"
It throws an error: Unexpected token '{' in expression or statement.
So based on an answer here, I tried:
"REPL1 = foreach LOGS REPLACE($0, '"\\{', '\\{');"
Now it gives an error: Unexpected token '\\' in expression or statement.
Any help is sincerely appreciated.
Thanks

Works for me:
REPL1 = FOREACH LOGS GENERATE REPLACE($0, '"\\{', '\\{');
Your code is missing the GENERATE keyword, and the double quotes at the beginning and end of each statement are wrong.

Please check the below code.
LOGS = load 'inputlocation' as unparsedString:chararray;
REPL1 = foreach LOGS generate REPLACE($0, '"\\{', '\\{');
REPL2 = foreach REPL1 generate REPLACE($0, '}"', '}');
STORE REPL2 INTO 'outputlocation';
Hope it will work.

Load the data using the delimiter as shown below:
sam = load 'sampledata' using PigStorage(',');
sam1 = foreach sam generate $0,$1,CONCAT(REPLACE($2,'([^A-Za-z0-9:"{]+)',''),REPLACE($3,'([^A-Za-z0-9:"}]+)',''));
This gives the following output:
({"Foo":"ABC","Bar":"20090101100000","Quux":"{"QuuxId":1234"QuuxName":"Sam"}"})
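For comparison, the same transformation can be done by parsing the JSON instead of editing characters. This is a minimal Python sketch, not Pig. It assumes each input line is one JSON object and that the Quux value is itself a JSON document stored as a string, as in the sample from the question. The function name unnest_quux is made up for illustration.

```python
import json

def unnest_quux(line):
    """Decode the JSON string stored under "Quux" into a nested object."""
    record = json.loads(line)
    # Assumption: "Quux" holds a JSON document encoded as a string.
    record["Quux"] = json.loads(record["Quux"])
    # Compact separators keep the output free of extra spaces.
    return json.dumps(record, separators=(",", ":"))

# Sample record copied from the question (raw string keeps the backslashes).
sample = r'{"Foo":"ABC","Bar":"20090101100000","Quux":"{\"QuuxId\":1234,\"QuuxName\":\"Sam\"}"}'
print(unnest_quux(sample))
```

A character-level replace of "{ and }" alone would leave the \" escapes inside the nested value in place, which parsing avoids.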

Related

Oracle Query in Codeigniter giving ORA-01722 and ORA-01756

I usually use a MySQL database for my website, but I'm trying to learn more about Oracle.
My code was working two days ago, but now it gives an ORA-number error message.
These are my database fields:
KODE_GUDANG CHAR
GUDANG CHAR
LASTUPDATE CHAR
KODE_UNIT CHAR
NOMER_REKJURNAL CHAR
KODE_GUDANG_KREDIT CHAR
This is my model function for the query:
function getDataOneColumn($getCol, $table, $column, $id) {
return $this->db->query("SELECT $getCol as val FROM $table WHERE $column = $id")->row_array();
}
This is the controller code that gives the error ORA-01722:
$this->data['no_rek'] = ($this->data['no_rek'] =='')?$this->m_dao->getDataOneColumn("NOMER_REKJURNAL","TBL_MASTER_GUDANG","KODE_GUDANG",$this->data['kode_gdg'])['VAL']:$this->data['no_rek'];
After that I read the documentation, which says it means "You executed a SQL statement that tried to convert a string to a number".
I tried changing my code to:
$this->data['no_rek'] = ($this->data['no_rek'] =='')?$this->m_dao->getDataOneColumn("NOMER_REKJURNAL","TBL_MASTER_GUDANG","KODE_GUDANG",'"'.$this->data['kode_gdg'])['VAL'].'"':"'".$this->data['no_rek']."'";
This one gives a different error, ORA-01756, which means "You tried to execute a statement that contained a string that was not surrounded by two single quotes".
New Error
Error Number: 1722
ORA-01722: invalid number
SELECT NOMER_REKJURNAL as val FROM TBL_MASTER_GUDANG WHERE KODE_GUDANG = 04
Filename: C:/xampp/htdocs/formula/system/database/DB_driver.php
Line Number: 691
Can somebody tell me why my code started throwing this error after two days, and how to solve it?
Thank you.
After reading a lot of posts about ORA errors, I solved my problem.
I just needed to wrap the value in single quotes, i.e. "'".$val."'":
$this->data['no_rek'] = ($this->data['no_rek'] =='')?$this->m_dao->getDataOneColumn("NOMER_REKJURNAL","TBL_MASTER_GUDANG","KODE_GUDANG","'".$this->data['kode_gdg']."'")['VAL']:$this->data['no_rek'];
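The underlying issue is that 04 without quotes is a numeric literal, so the CHAR column is compared against a number. Oracle then tries to convert the column values to numbers, which fails with ORA-01722 if any value is not numeric. A bound parameter avoids hand-built quoting altogether. The sketch below uses Python's sqlite3 module purely as a stand-in (it is not Oracle or CodeIgniter). The table and column names come from the question, and the inserted row is invented sample data.

```python
import sqlite3

def get_one_column(conn, get_col, table, column, value):
    """Fetch one column of the first row where `column` equals `value`."""
    # Identifiers cannot be bound, so they are assumed to come from trusted code.
    sql = f"SELECT {get_col} AS val FROM {table} WHERE {column} = ?"
    # The value is passed separately, so it stays a string and needs no quoting.
    row = conn.execute(sql, (value,)).fetchone()
    return None if row is None else row[0]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TBL_MASTER_GUDANG (KODE_GUDANG CHAR(2), NOMER_REKJURNAL CHAR(10))")
# Invented sample row for the demonstration.
conn.execute("INSERT INTO TBL_MASTER_GUDANG VALUES ('04', '1-1001')")
print(get_one_column(conn, "NOMER_REKJURNAL", "TBL_MASTER_GUDANG", "KODE_GUDANG", "04"))
```

CodeIgniter's $this->db->query() also accepts ? placeholders with an array of bindings as its second argument, which should achieve the same thing without concatenating quotes by hand.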

Retrieve bibtex data from crossref by sending DOI from matlab: translation from ruby

I want to retrieve bibtex data (for building a bibliography) by sending a DOI (Digital Object Identifier) to http://www.crossref.org from within matlab.
The crossref API suggests something like this:
curl -LH "Accept: text/bibliography; style=bibtex" http://dx.doi.org/10.1038/nrd842
based on this source.
Another example from here suggests the following in ruby:
open("http://dx.doi.org/10.1038/nrd842","Accept" => "text/bibliography; style=bibtex"){|f| f.each {|line| print line}}
Although I've heard Ruby rocks, I want to do this in MATLAB, and I have no clue how to translate the Ruby statement or interpret the crossref command.
The following is what I have so far to send a doi to crossref and retrieve data in xml (in variable retdat), but not bibtex, format:
clear
clc
doi = '10.1038/nrd842';
URL_PATTERN = 'http://dx.doi.org/%s';
fetchurl = sprintf(URL_PATTERN,doi);
numinputs = 1;
www = java.net.URL(fetchurl);
is = www.openStream;
%Read stream of data
isr = java.io.InputStreamReader(is);
br = java.io.BufferedReader(isr);
%Parse return data
retdat = [];
next_line = toCharArray(br.readLine)'; %First line contains headings, determine length
%Loop through data
while ischar(next_line)
retdat = [retdat, 13, next_line];
tmp = br.readLine;
try
next_line = toCharArray(tmp)';
if strcmp(next_line,'M END')
next_line = [];
break
end
catch
break;
end
end
%Cleanup java objects
br.close;
isr.close;
is.close;
Help translating the ruby statement to something matlab can send using a script such as that posted to establish the communication with crossref would be greatly appreciated.
Edit:
Additional constraints include backward compatibility of the code (back at least to R14) :>(. Also, no use of ruby, since that solves the problem but is not a "matlab" solution, see here for how to invoke ruby from matlab via system('ruby script.rb').
You can easily edit urlread for what you need. I won't post my modified urlread function code due to copyright.
In urlread, (mine is at C:\Program Files\MATLAB\R2012a\toolbox\matlab\iofun\urlread.m), as the least elegant solution:
Right before "% Read the data from the connection." I added:
urlConnection.setRequestProperty('Accept','text/bibliography; style=bibtex');
The answer from user2034006 lays the path to a solution.
The following script works when urlread is modified:
URL_PATTERN = 'http://dx.doi.org/%s';
doi = '10.1038/nrd842';
fetchurl = sprintf(URL_PATTERN,doi);
method = 'post';
params= {};
[string,status] = urlread(fetchurl,method,params);
The modification in urlread is not identical to the suggestion of user2034006. Things worked when the line
urlConnection.setRequestProperty('Content-Type','application/x-www-form-urlencoded');
in urlread was replaced with
urlConnection.setRequestProperty('Accept','text/bibliography; style=bibtex');
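What the curl and Ruby examples do, and what the urlread modification reproduces, is send an Accept header so that the DOI resolver returns BibTeX instead of its default response. For anyone who wants to check the request outside MATLAB, here is a minimal Python sketch. The function names are made up for illustration; the URL pattern and header value are the ones from the question.

```python
import urllib.request

def build_request(doi):
    """Build a request asking the DOI resolver for a BibTeX-formatted entry."""
    url = "http://dx.doi.org/%s" % doi
    # The Accept header is what selects BibTeX over the default response.
    return urllib.request.Request(
        url, headers={"Accept": "text/bibliography; style=bibtex"})

def fetch_bibtex(doi):
    """Send the request (redirects are followed) and return the response text."""
    with urllib.request.urlopen(build_request(doi)) as response:
        return response.read().decode("utf-8")

# Example call (needs network access):
#   print(fetch_bibtex("10.1038/nrd842"))
```

Keeping the request construction separate from the network call makes it possible to inspect the header without contacting the server.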

Cannot compute MAX

Setup data
mkdir data
echo -e "1\n2\n3\n4\n8\n4\n3\n6" > data/data.txt
Launch Pig in local mode
pig -x local
Script
a = load 'data' Using PigStorage() As (value:int);
b = foreach a generate MAX(value);
dump b;
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.MAX as multiple or none of them fit. Please use an explicit cast.
Just found the answer: it just takes a GROUP ALL before calling the function. I feel the error message could be a little clearer.
a = load 'data' Using PigStorage() As (value:int);
b = GROUP a ALL;
c = foreach b generate MAX(a.value);
dump c;
> 8

Magento stock update with csv

I am using the following script
http://www.sonassi.com/knowledge-base/magento-kb/mass-update-stock-levels-in-magento-fast/
It works beautifully with the test CSV file.
My POS creates a CSV file but it puts a different heading so the script does not work. I want to automate the process. Is there any way to change the names of headers automatically?
The script requires the headers to be
"sku","qty"
my CSV is
"ITEM","STOCK"
Is there any way for these two different names to be linked within the script so that my script sees ITEM as sku and STOCK as qty?
You could create a PHP script that takes yourfilename.csv, the unformatted file, as input:
$file = file_get_contents('yourfilename.csv');
$file = str_replace('ITEM', 'sku', $file);
$file = str_replace('STOCK', 'qty', $file);
file_put_contents('yourfilename.csv', $file);
The below links are for your reference.
find and replace values in a flat-file using PHP
http://forums.phpfreaks.com/index.php?topic=327900.0
Hope it helps.
Cheers
PHP isn't usually the best tool for file manipulation, assuming you have SSH access.
You could also run the following commands (if you have perl installed, which is default in most setups...):
perl -pi -e 's/ITEM/sku/g' /path/to/your/csvfile.csv
perl -pi -e 's/STOCK/qty/g' /path/to/your/csvfile.csv
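Note that both str_replace and the perl one-liners replace ITEM and STOCK everywhere in the file, including inside SKU values. If that is a concern, only the header row can be rewritten. A minimal Python sketch, assuming the first line is the header; the function name and sample data are hypothetical:

```python
import csv
import io

HEADER_MAP = {"ITEM": "sku", "STOCK": "qty"}

def rename_header(csv_text):
    """Return csv_text with only the header row renamed via HEADER_MAP."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if rows:
        # Unknown column names are passed through unchanged.
        rows[0] = [HEADER_MAP.get(name, name) for name in rows[0]]
    out = io.StringIO()
    # QUOTE_ALL mirrors the quoted "sku","qty" style the script expects.
    csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Invented sample: the SKU itself contains the word ITEM and must stay intact.
sample = '"ITEM","STOCK"\n"ITEM-001","5"\n'
print(rename_header(sample), end="")
```

The data rows are left untouched, so a SKU such as ITEM-001 keeps its value.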
If you want to update qty using raw SQL, you can create a function like the one below:
function _updateStocks($data){
    $connection     = _getConnection('core_write');
    $sku            = $data[0];
    $newQty         = $data[1];
    $productId      = _getIdFromSku($sku);
    $attributeId    = _getAttributeId();
 
    $sql            = "UPDATE " . _getTableName('cataloginventory_stock_item') . " csi,
                       " . _getTableName('cataloginventory_stock_status') . " css
                       SET
                       csi.qty = ?,
                       csi.is_in_stock = ?,
                       css.qty = ?,
                       css.stock_status = ?
                       WHERE
                       csi.product_id = ?
                       AND csi.product_id = css.product_id";
    $isInStock      = $newQty > 0 ? 1 : 0;
    $stockStatus    = $newQty > 0 ? 1 : 0;
    $connection->query($sql, array($newQty, $isInStock, $newQty, $stockStatus, $productId));
}
And call the above function by passing csv row data as arguments. This is just a hint.
In order to get full working code with details you can refer to the following blog article:
Updating product qty in Magento in an easier & faster way
Hope this helps!

PHP parse error in rss parse function

I have a client who needs a website urgently, but I have no access to information such as the control panel.
The PHP version is 4.4, which is a pain as I'm used to 5.
The first problem is I keep getting:
Parse error: parse error, unexpected T_OBJECT_OPERATOR, expecting ')' in D:\hshome\*******\********\includes\functions.php on line 37
This is the function in question:
function read_rss($display=0,$url='') {
$doc = new DOMDocument();
$doc->load($url);
$itemArr = array();
foreach ($doc->getElementsByTagName('item') as $node) {
if ($display == 0) {
break;
}
$itemRSS = array(
'title'=>$node->getElementsByTagName('title')->item(0)->nodeValue,
'description'=>$node->getElementsByTagName('description')->item(0)->nodeValue,
'link'=>$node->getElementsByTagName('link')->item(0)->nodeValue);
array_push($itemArr, $itemRSS);
$display--;
}
return $itemArr;
}
And the line in question:
'title'=>$node->getElementsByTagName('title')->item(0)->nodeValue,
PHP4 does not support object dereferencing. So $obj->something()->something will not work. You need to do $tmp = $obj->something(); $tmp->something...
You can't do that in PHP 4.
Have to do something like
$nodes = $node->getElementsByTagName('title');
$item = $nodes->item(0);
$value = $item->nodeValue;
Try it and it will work.
You can't chain object calls in PHP 4. You're going to have to make each call separately to a variable and store it all.
$titleobj = $node->getElementsByTagName('title');
$itemobj = $titleobj->item(0);
$value = $itemobj->nodeValue;
...
'title'=>$value,
You'll have to do this for each of those chained calls.
As for .htaccess ... you need to talk to someone who controls the actual server. It sounds like .htaccess isn't allowed to change the setting you're trying to change.
You need to break down that line into individual variables. PHP 4 does not like -> following parentheses. Do this instead:
$title = $node->getElementsByTagName('title');
$title = $title->item(0);
$description = $node->getElementsByTagName('description');
$description = $description->item(0);
$link = $node->getElementsByTagName('link');
$link = $link->item(0);
$itemRSS = array(
'title'=>$title->nodeValue,
'description'=>$description->nodeValue,
'link'=>$link->nodeValue);
The two assignments for each variable may be redundant and could perhaps be condensed; I'm not sure how PHP 4 will respond. You can try to condense them if you want.
DOMDocument is a PHP 5 class, so you can't use it in PHP 4.
You may need to use the DOM XML (PHP 4) functions instead.
