DataError: Supplied data does not contain specified dimensions, the following dimensions were not found: ['pickup_x', 'pickup_y'] - parquet

I tried to follow the code from the NYC Taxi Hover example, and I need to read a file in the ".parq" format. This is the code I use to read it:
import dask.dataframe as dd

# Read the parquet file
df = dd.read_parquet('./data/nyc_taxi_wide.parq').persist()
Reading this parquet file produces an error. I also tried another parquet dataset downloaded from https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page, but the error still happened. This is the error message:
DataError: Supplied data does not contain specified dimensions, the following dimensions were not found: ['pickup_x', 'pickup_y'].
How can I resolve this issue?
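A minimal check, as a sketch only: list the columns the file actually contains and, if it only has longitude/latitude columns (as the raw TLC downloads do) rather than the pre-projected pickup_x/pickup_y columns the example dataset uses, map them to the names the plotting code expects. The column names below are assumptions, and a plain rename only stands in for a proper projection to Web Mercator:
import dask.dataframe as dd

df = dd.read_parquet('./data/nyc_taxi_wide.parq')
print(df.columns)  # confirm whether pickup_x / pickup_y exist in this file

# Assumed column names: if only lon/lat columns are present, rename them so
# the plot finds its dimensions; for correct axes they should really be
# projected to Web Mercator first rather than just renamed.
df = df.rename(columns={'pickup_longitude': 'pickup_x',
                        'pickup_latitude': 'pickup_y'}).persist()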

Related

Converting from .dat to netcdf(.nc)

I have a .dat file (https://www.dropbox.com/s/8dbjg0i6l7a4sb6/CRUST10-xyz-complete.dat?dl=0) that I need to convert to either .grd or .nc so that I can visualize the data in GMT (Generic Mapping Tools). I tried to do this with cdo, using the following command:
cdo -f nc import_binary CRUST10-xyz-complete.dat CRUST10-xyz-complete.nc
but got following error:
Open Error: Unknown keyword in description file
--> The invalid description file record is:
--> 0.5,89.5,4.19,0,2,0.7,0,0.73,1.62,5.01,14.25,10.06,7.36,2.7,1.5,3.81,2,3.5,0,5,6.5,7.1,8.07,5.5865805168986,6.7596467391304,2.3888888888889
The data file was not opened.
cdo import_binary (Abort): Open failed!
Can anyone please guide me?
First make a .ctl (GrADS descriptor) file, then apply:
cdo -f nc import_binary CRUST10-xyz-complete.ctl CRUST10-xyz-complete.nc
Here is an example link for making a .ctl file: http://cola.gmu.edu/grads/gadoc/descriptorfile.html
It will definitely work for you. Thanks!
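For reference, a minimal descriptor sketch; the grid size, start coordinates, increments, and variable name are placeholders, and it assumes the data has first been rewritten as a flat binary grid (import_binary reads binary data described by the .ctl file, not the ASCII .dat directly):
DSET ^CRUST10-xyz-complete.bin
TITLE CRUST1.0 example grid
UNDEF -9999.0
XDEF 360 LINEAR 0.5 1.0
YDEF 180 LINEAR -89.5 1.0
ZDEF 1 LEVELS 0
TDEF 1 LINEAR 00Z01JAN2000 1dy
VARS 1
thk 0 99 crustal thickness (km)
ENDVARS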
Without the data itself, it is hard to see what went wrong. In any case, have a look at the following post on the cdo forum:
https://code.mpimet.mpg.de/boards/1/topics/3631
which you can use as an example of how to convert ASCII data to netCDF using cdo.

importing tbl file to monetDB on windows 10

I am having trouble importing the TPC-H benchmark data (generated with dbgen) into my MonetDB database.
I've already created all the tables and I'm trying to import using the following command:
COPY RECORDS INTO region FROM "PATH\region.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n';
And I get the following error message:
syntax error, unexpected RECORDS, expecting BINARY or INTO in: "copy records"
I also found out this one on the internet:
COPY INTO sys.region 'PATH/region.tbl' using delimiters '|','\n';
But I get the following error message:
syntax error, unexpected IDENT, expecting FROM in: "copy into sys.region "C:\ProgramData\MySQL\MySQL Server 5.7\Uploads\region."
Because I'm a new MonetDB user, I can't figure out what I'm doing wrong.
Any help will be appreciated :)
The RECORDS construct expects a number, specifically how many records you are going to load. I usually do this:
COPY 5 RECORDS INTO region FROM '/path/to/region.tbl' USING DELIMITERS '|', '|\n' LOCKED;
Also, in the second attempt you are missing a FROM before the path to the file, like:
COPY INTO sys.region FROM '/path/to/region.tbl' USING DELIMITERS '|', '\n';
See here for more information: https://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto
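Since the files sit on Windows, a sketch with a placeholder path: forward slashes side-step the backslash-escape problem inside the string literal, and the '|\n' record separator accounts for the trailing '|' that dbgen writes at the end of every line:
COPY INTO sys.region FROM 'C:/path/to/region.tbl' USING DELIMITERS '|', '|\n';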

ImageMagick Warning (TIF Conversion)

When I convert a multi-image TIF file to individual TIF files using ImageMagick, I get a warning stating "Invalid TIFF directory; tags are not sorted in ascending order":
convert.exe: Invalid TIFF directory; tags are not sorted in ascending order.
'TIFFReadDirectoryCheckorder' # warning/tiff.c/TIFFWarnings/847
Hoping for any advice I can get on this error. I am trying to take a multi-image TIF file and turn it into individual files; however, I need them to stay in the order in which they were listed in the original TIF file for this to work.
If you still have the issue, check the resource limits (including disk):
convert -list resource
On Debian 9, for example, these are configured in /etc/ImageMagick-6/policy.xml.
I hope that helps.
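For the split itself, a sketch with placeholder file names: a %d (or %03d) pattern in the output name writes one file per frame, numbered in the order the frames appear in the source TIFF, and the unsorted-tags message is only a warning:
convert multi-image.tif page-%03d.tif
If convert -list resource shows a disk limit that is too small, it can be raised in policy.xml, for example:
<policy domain="resource" name="disk" value="8GiB"/>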

Load multiple files in pig - extended

Please help me out...
I have spent a lot of hours on this.
I have files in a folder that I wish to load according to the order of their file names.
I have even gone to the extent of writing Java code to convert the file names to match the format described in the guides at the following links.
Load multiple files in pig
Pig Latin: Load multiple files from a date range (part of the directory structure)
http://netezzaadmin.wordpress.com/2013/09/25/passing-parameters-to-pig-scripts/
I am using Pig 0.11.0.
In my script.pig,
set io.sort.mb 10;
REGISTER 'path_to/lib/pig/piggybank.jar';
data_ = LOAD '$input' USING org.apache.pig.piggybank.storage.XMLLoader('Data') AS (data_:chararray);
DUMP data_;
In shell
[root@servername currentfolder]# pig -x local script.pig -param input=/20131217/{1..10}.xml
Error returned:
[main] ERROR.org.apache.pig.Main - ERROR 2999: Unexpected error. Undefined parameter : input
I don't know why you are using input parameters.
For example, to load every file in the folder MyFolder/CurrentDate/ (in YYYYMMDD format), I use the following script:
%default DATE `date +%Y%m%d`;
x_basic_table = LOAD '/MyFolder/$DATE';
Nice day
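If the parameter route is still wanted, a sketch worth trying (an assumption about the cause, not something confirmed in the post): put -param before the script name, and quote the value so the shell does not brace-expand {1..10} into separate arguments; Hadoop-style globs understand comma lists rather than .. ranges:
pig -x local -param 'input=/20131217/{1,2,3,4,5,6,7,8,9,10}.xml' script.pig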

Error running hive script in pseudo distributed mode

I am trying to run a Hive script in pseudo-distributed mode. The commands in the script run absolutely fine when I run them in interactive mode. However, when I put all those commands into a script and run it, I get an error.
The script:
add jar /path/to/jar/file;
create table flights(year int, month int,code string) row format serde 'com.bizo.hive.serde.csv.CSVSerde';
load data inpath '/tmp/hive-user/On_Time_On_Time_Performance_2013_1.csv' overwrite into table flights;
The file 'On_Time_On_Time_Performance_2013_1.csv' does exist in HDFS. The error I get is:
FAILED: SemanticException Line 3:17 Invalid path ''/tmp/hive-user/On_Time_On_Time_Performance_2013_1.csv'': No files matching path hdfs://localhost:54310/tmp/hive-user/On_Time_On_Time_Performance_2013_1.csv
fs.default.name=hdfs://localhost:54310
My hadoop is running fine.
Can someone give any pointers?
Thanks.
This is not really an answer, but it is a more detailed and repeatable formulation of your question.
a) one needs to download the csv-serde from here: git clone https://github.com/ogrodnek/csv-serde
b) Build it using mvn package
c) Create a text file containing three comma-separated fields corresponding to the three fields of the given table.
d) If the path is, say, "/shared", then the following is the correct sequence to load:
add jar /shared/csv-serde/target/csv-serde-1.1.2-0.11.0-all.jar;
drop table if exists flights;
create table flights(year int, month int,code string) row format serde 'com.bizo.hive.serde.csv.CSVSerde' stored as textfile;
load data inpath '/tmp/hive-user/On_Time_On_Time_Performance_2013_1.csv' overwrite into table flights;
I do see the same error as in the OP: FAILED: SemanticException Line 2:17 Invalid path ''/tmp/hive-user/On_Time_On_Time_Performance_2013_1.csv'': No files matching path hdfs://localhost:9000/tmp/hive-user/On_Time_On_Time_Performance_2013_1.csv
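One thing worth checking, as a sketch (the HDFS path comes from the error messages; the local path is a placeholder): whether the CSV is still at that location when the script runs. LOAD DATA INPATH moves the file into the table's warehouse directory, so a second run of the same script will no longer find it.
hdfs dfs -ls /tmp/hive-user/On_Time_On_Time_Performance_2013_1.csv
# if an earlier run moved it, put it back before re-running the script
hdfs dfs -put /local/path/On_Time_On_Time_Performance_2013_1.csv /tmp/hive-user/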

Resources