Good day,
I am using rclone to upload large files to Gdrive, and I always end up with some files causing errors. For example:
9975 files to upload to Gdrive
Output:
Errors: 75
Checks: 0
Transferred: 9975
Does that mean that all the files have been uploaded correctly, or do I lose the 75 files that caused an error?
Thank you !
I found out that this means 75 files did not get transferred to Gdrive and need to be uploaded again.
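A minimal sketch of simply re-running the copy so the failed files get another chance; the local path and remote name here are placeholders, not taken from the original post:
# rclone copy is incremental, so a re-run only transfers what is still missing;
# --retries controls how many times rclone retries the whole operation on error.
rclone copy /local/files gdrive:backup --retries 5 -v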
This is my setup:
I use AWS Batch running a custom Docker image.
The startup.sh file is an entrypoint script that reads the nth line of a text file and copies the files it lists from S3 into the Docker container.
For example, if the first line of the .txt file is 'Startup_000017 Startup_000018 Startup_000019', the bash script reads this line and uses a for loop to copy each file over.
This is part of my bash script:
STARTUP_FILE_S3_URL=s3://cmtestbucke/Config/
Startup_FileNames=$(sed -n ${LINE}p file.txt)
for i in ${Startup_FileNames}
do
Startup_FileURL=${STARTUP_FILE_S3_URL}$i
echo $Startup_FileURL
aws s3 cp ${Startup_FileURL} /home/CM_Projects/ &
done
Here is the log output from aws:
s3://cmtestbucke/Config/Startup_000017
s3://cmtestbucke/Config/Startup_000018
s3://cmtestbucke/Config/Startup_000019
Completed 727 Bytes/727 Bytes (7.1 KiB/s) with 1 file(s) remaining download: s3://cmtestbucke/Config/Startup_000018 to Data/Config/Startup_000018
Completed 731 Bytes/731 Bytes (10.1 KiB/s) with 1 file(s) remaining download: s3://cmtestbucke/Config/Startup_000017 to Data/Config/Startup_000017
fatal error: *An error occurred (404) when calling the HeadObject operation: Key
"Config/Startup_000019 " does not exist.*
My s3 bucket certainly contains the object s3://cmtestbucke/Config/Startup_000019
I noticed this happens regardless of filenames. The last iteration always gives this error.
I tested this bash logic locally with the same aws commands. It copies all 3 files.
Can someone please help me figure out what is wrong here?
The problem was with the EOL of the text file. It was set to Windows (CR LF). The Docker image runs Ubuntu, which caused the error. I changed the EOL to Unix (LF) and the problem was solved.
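For reference, a minimal sketch of guarding against this inside the script itself (same file.txt and LINE variable as above; running dos2unix on file.txt would work just as well if it is installed in the image):
# Strip any Windows carriage returns before splitting the line into filenames.
Startup_FileNames=$(sed -n "${LINE}p" file.txt | tr -d '\r')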
When I try to upload files that are about 20 GB into HDFS, they usually upload until about 12-14 GB and then stop, and I get a bunch of these warnings on the command line:
"INFO hdfs.DataStreamer: Slow ReadProcessor read fields for block BP-222805046-10.66.4.100-1587360338928:blk_1073743783_2960 took 62414ms (threshold=30000ms); ack: seqno: 226662 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0, targets:"
However, if I retry the upload 5-6 times, it sometimes works on the 4th or 5th attempt. I believe that if I alter some datanode storage settings I can achieve consistent uploads without issues, but I don't know which parameters to modify in the Hadoop configuration. Thanks!
Edit: This happens when I put the file into HDFS through a Python program that uses a subprocess call to put the file in. However, even if I run the put directly from the command line, I still run into the same issue.
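For what it's worth, a sketch of the kind of client-side timeout tuning that is often tried for this symptom; the property names are standard HDFS client/datanode settings, but the values and file paths here are only illustrative, and whether this actually fixes the stall is an assumption:
# Raise the client and datanode socket timeouts (milliseconds) for this one put;
# the -D generic options must come before the -put subcommand.
hdfs dfs -D dfs.client.socket-timeout=300000 \
         -D dfs.datanode.socket.write.timeout=960000 \
         -put bigfile.dat /data/bigfile.dat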
I'm sending binary .gz files from Linux to z/OS via ftps. The file transfers seem to be fine, but when the mainframe folks pkunzip the file, they get a warning:
PEX013W Record(s) being truncated to lrecl= 996. Record# 1 is 1000 bytes.
Currently I’m sending the site commands:
SITE TRAIL
200 SITE command was accepted
SITE CYLINDERS PRIMARY=50 SECONDARY=50
200 SITE command was accepted
SITE RECFM=VB LRECL=1000 BLKSIZE=32000
200 SITE command was accepted
SITE CONDDISP=delete
200 SITE command was accepted
TYPE I
200 Representation type is Image
...
250 Transfer completed successfully.
QUIT
221 Quit command received. Goodbye.
They could read the file after the pkunzip, but having a warning is not a good thing.
Output from pkunzip:
SDSF OUTPUT DISPLAY RMD0063A JOB22093 DSID 103 LINE 25 COLUMNS 02- 81
COMMAND INPUT ===> SCROLL ===> CSR
PCM123I Authorized services are unavailable.
PAM030I INPUT Archive opened: TEST.FTP.SOA5021.GZ
PAM560I ARCHIVE FASTSEEK processing is disabled.
PDA000I DDNAME=SYS00001,DISP_STATUS=MOD,DISP_NORMAL=CATALOG,DISP_ABNORMAL=
PDA000I SPACE_TYPE=TRK,SPACE_TYPE=CYL,SPACE_TYPE=BLK
PDA000I SPACE_PRIMARY=4194304,SPACE_DIRBLKS=5767182,INFO_ALCFMT=00
PDA000I VOLUMES=DPPT71,INFO_CNTL=,INFO_STORCLASS=,INFO_MGMTCLASS=
PDA000I INFO_DATACLASS=,INFO_VSAMRECORG=00,INFO_VSAMKEYOFF=0
PDA000I INFO_COPYDD=,INFO_COPYMDL=,INFO_AVGRECU=00,INFO_DSTYPE=00
PEX013W Record(s) being truncated to lrecl= 996. Record# 1 is 1000 bytes.
PEX002I TEST.FTP.SOA5021
PEX003I Extracted to TEST.FTP.SOA5021I.TXT
PAM140I FILES: EXTRACTED EXCLUDED BYPASSED IN ERROR
PAM140I 1 0 0 0
PMT002I PKUNZIP processing complete. RC=00000004 4(Dec) Start: 12:59:48.86 End
Is there a better set of site commands to transfer a .gz file from Linux to z/OS to avoid this error?
**** Update ****
Using SaggingRufus's answer below, it turns out it doesn't much matter how you send the .gz file, as long as it's binary. His suggestion pointed us to the parameters passed to pkunzip for the output file, which was VB and was truncating 4 bytes off each record.
Because it is a variable-blocked (VB) file, 4 bytes of each record are taken up by the record descriptor word, so a 1000-byte record needs an LRECL of 1004. Allocate the output file with an LRECL of 1004 and it will be fine.
Rather than generating a .zip file, perhaps generate a .tar.gz file and transfer it to z/OS UNIX? Tar is shipped with z/OS by default, and Rocket Software provides a port of gzip that is optimized for z/OS.
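A sketch of that approach, assuming a hypothetical host name, user id, and z/OS UNIX target directory:
# Package the payload as tar.gz, then push it over explicit FTPS
# into a z/OS UNIX directory (curl transfers FTP uploads in binary by default).
tar -czf payload.tar.gz ./data
curl --ssl-reqd --user myuser -T payload.tar.gz ftp://zoshost.example.com/u/myuser/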
I'm using the following simple code to upload files to HDFS.
FileSystem hdfs = FileSystem.get(config);
hdfs.copyFromLocalFile(src, dst);
The files are generated by a webserver Java component and are rotated and closed by logback in .gz format. I've noticed that sometimes the .gz file is corrupted.
> gunzip logfile.log_2013_02_20_07.close.gz
gzip: logfile.log_2013_02_20_07.close.gz: unexpected end of file
But the following command does show me the content of the file
> hadoop fs -text /input/2013/02/20/logfile.log_2013_02_20_07.close.gz
The impact of having such files is quite disastrous: the aggregation for the whole day fails, and several slave nodes get blacklisted in such cases.
What can I do in such a case?
Can the Hadoop copyFromLocalFile() utility corrupt the file?
Has anyone met a similar problem?
It shouldn't - this error is normally associated with gzip files that weren't closed out when originally written to local disk, or that are being copied to HDFS before they have finished being written.
You should be able to check by running an md5sum on the original file and on the one in HDFS - if they match, then the original file is corrupt:
hadoop fs -cat /input/2013/02/20/logfile.log_2013_02_20_07.close.gz | md5sum
md5sum /path/to/local/logfile.log_2013_02_20_07.close.gz
If they don't match, check the timestamps on the two files - the one in HDFS should have been modified after the one on the local file system.
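As an extra guard on the upload side, one option is to verify each rotated archive locally before handing it to HDFS - a sketch, assuming the rotated files end in .close.gz and that the hadoop command is available on that host (the local directory is hypothetical):
# Only upload archives that pass gzip's integrity test,
# so half-written files never reach HDFS.
for f in /var/log/app/*.close.gz; do
  if gzip -t "$f" 2>/dev/null; then
    hadoop fs -put "$f" /input/2013/02/20/
  else
    echo "skipping $f: incomplete or corrupt gzip" >&2
  fi
done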
Hi guys, I'm trying to get the s3distcp jar file via S3, in an EMR cluster:
s3cmd get s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar
However, the "get" command is not working:
ERROR: Skipping libs/s3distcp/: No such file or directory
This file exists in other S3 regions as well, so I have even tried:
s3cmd get s3://us-east-1.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar
But the command still fails. And yet this .jar file clearly exists: when we run "s3cmd ls", we can see it listed. See below for the details (example with the eu-west region):
hadoop#ip-10-58-254-82:/mnt$ s3cmd ls s3://eu-west-1.elasticmapreduce/libs/s3distcp/
Bucket 'eu-west-1.elasticmapreduce':
2012-06-01 00:32 3614287 s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar
2012-06-05 17:14 3615026 s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.2/s3distcp.jar
2012-06-12 20:52 1893078 s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.3/s3distcp.jar
2012-06-20 01:17 1893140 s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.4/s3distcp.jar
2012-06-27 21:27 1893846 s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.5/s3distcp.jar
2012-03-15 21:21 3613175 s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0/s3distcp.jar
2012-06-27 21:27 1893846 s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar
The above seems to confirm that the file does, in fact, exist.
*How can I enable the "get" command to work for this file?*
The jar downloads just fine for me; can you paste the error message you are getting after the get command?
:s3cmd ls s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar
2012-06-01 00:32 3614287 s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar
:s3cmd get s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar
s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar -> ./s3distcp.jar [1 of 1]
3614287 of 3614287 100% in 3s 1008.86 kB/s done
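One thing that may be worth double-checking, judging by the "Skipping libs/s3distcp/" text in the quoted error, is whether the get was accidentally run against the directory prefix rather than the full key; a hedged sketch of both forms:
# Fetch the single jar by its full key...
s3cmd get s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar ./s3distcp.jar
# ...or pull everything under a prefix with --recursive.
s3cmd get --recursive s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.1/ ./s3distcp/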