Pass argument in quotes with percent signs in Windows cmd

I'm trying to use Hadoop's stat command to retrieve file information from my HDFS. On Linux, you pass a format string to stat (similar to GNU coreutils' stat command) like:
$ hdfs dfs -stat "type:%F perm:%a %u:%g size:%b mtime:%y atime:%x name:%n" /file
But I can't figure out how to get this to work in Windows' cmd prompt:
C:\> hdfs dfs -stat "type:%F" /file
C:\> hdfs dfs -stat "type:%F" /
-stat: java.net.URISyntaxException: Relative path in absolute URI: type:F
...
...it looks like it's trying to interpret the first argument as the path, instead of the second one. So I thought "maybe I need to include literal quotes?" Trying to escape the quotes with ^" doesn't work:
C:\> hdfs dfs -stat "^"type:%F^"" /
-stat: java.net.URISyntaxException: Relative path in absolute URI: ^^type:F
...
...in fact, it looks like it auto-escaped the ^ I sent in, but didn't send any quotes at all. Trying to surround the whole argument in single quotes rather than double quotes also doesn't work:
C:\> hdfs dfs -stat '^"type:%F^"' /
-stat: java.net.URISyntaxException: Relative path in absolute URI: 'type:F'
...
This time, it included the single quotes but again dropped the double quotes. Double-escaping the carets also doesn't work:
C:\> hdfs dfs -stat '^^"type:%F^^"' /
-stat: java.net.URISyntaxException: Relative path in absolute URI: 'type:F%5E%5E'
...
Triple-escaping the carets yields the same result as using a single caret.
I've found that a kludgy workaround is to begin the format string with %3 and not surround it with quotes:
C:\> hdfs dfs -stat %3%u /
3Andrew.Watsonuu
C:\> hdfs dfs -stat %3%u%g /
3Andrew.Watsonsupergroupgg
...but you can see that the returned string then has a 3 at the start, and the last flag character is doubled at the end (uu or gg). I think this is because %N tokens get substituted with the arguments I passed on the command line, like:
C:\> hdfs dfs -stat %0 /
stat: `hadoop': No such file or directory
2019-09-25 12:28:00
C:\> hdfs dfs -stat %1 /
stat: `fs': No such file or directory
2019-09-25 12:28:00
C:\> hdfs dfs -stat "%2" /
-stat: Illegal option -stat
You can see that %0, %1, and %2 correspond to the first, second, and third tokens in that command. So when I pass %3, the fourth token is substituted in its place. This explains the weird repeated, glitchy output:
C:\> hdfs dfs -stat %3"repeat" /
3repeat"repeat"repeat
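This %N substitution is ordinary Windows batch-file argument expansion, and hdfs on Windows is a .cmd script. A minimal sketch with a hypothetical args.cmd reproduces the effect:
:: args.cmd (hypothetical): demonstrates cmd's %N argument substitution
@echo off
echo zeroth: %0
echo first:  %1
echo third:  %3
Running it:
C:\> args.cmd a b c
zeroth: args.cmd
first:  a
third:  c
This is consistent with the expansion happening inside one of the nested hadoop/hdfs .cmd launcher scripts rather than at the interactive prompt (an inference, not verified against the launcher source).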
So the best solution I've come up with so far is to pass a superfluous argument at the end (which will throw an error), but then reference that argument earlier in the command, like:
C:\> hdfs dfs -stat -R %6 / "%%u %%g %%Y" 2> nul
Andrew.Watson supergroup 1569414480510
Andrew.Watson supergroup 1568728730673
...
Andrew.Watson supergroup 1568103636381
Andrew.Watson supergroup 1568103590659
It throws that error at the end, which I hide by redirecting stderr to nul. There's got to be a better way to do this. Any ideas?

Related

How can we set the block size in hadoop specific to each file?

For example, if my input file is 500 MB, I want it split into two blocks of 250 MB each; if my input file is 600 MB, the block size should be 300 MB.
If you are loading files into HDFS, you can -put them with the dfs.blocksize option; you can calculate the parameter in a shell depending on the file size (see the sketch at the end of this answer).
hdfs dfs -D dfs.blocksize=268435456 -put myfile /some/hdfs/location
If you already have files in HDFS and want to change their block size, you need to rewrite them.
(1) Move the file to a tmp location:
hdfs dfs -mv /some/hdfs/location/myfile /tmp
(2) Copy it back with -D dfs.blocksize=268435456
hdfs dfs -D dfs.blocksize=268435456 -cp /tmp/myfile /some/hdfs/location
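For the "calculate the parameter in a shell" part, here is a minimal sketch (the file name and HDFS location are placeholders; assumes GNU stat, and that dfs.blocksize must stay a multiple of the 512-byte checksum chunk):
f=myfile                              # placeholder local file
bytes=$(stat -c %s "$f")              # local file size in bytes (GNU stat)
half=$(( (bytes + 1) / 2 ))           # aim for two blocks per file
bs=$(( (half + 511) / 512 * 512 ))    # round up to a multiple of 512
hdfs dfs -D dfs.blocksize=$bs -put "$f" /some/hdfs/location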

“hdfs dfs -du” vs “hdfs dfs -count”: different results where the same were expected

Why do hdfs dfs -du -s and hdfs dfs -count -v (supposedly the same byte count, via the CONTENT_SIZE field) report nearly, but not exactly, the same values?
Example
# at user1#borderNode1
hdfs dfs -count -v "hdfs://XYZ/apps/hive/warehouse/p_xx_db.db"
# DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
# 9087 1610048 141186781009632 hdfs://XYZ/apps/hive/warehouse/p_xx_db.db
hdfs dfs -du -s "hdfs://XYZ/apps/hive/warehouse/p_xx_db.db"
#141186781010380 hdfs://XYZ/apps/hive/warehouse/p_xx_db.db
The value 141186781009632 is not 141186781010380.
The difference 141186781010380-141186781009632=748 is less than the block size (134217728 in the example)... so perhaps one is exact and the other is not, but I don't see this kind of documentation for Hadoop.
PS: no clues in the guide either:
hdfs dfs -count: "Count the number of ... bytes under the directory... output column CONTENT_SIZE".
hdfs dfs -du: "Displays sizes of files ... contained in the given directory".
The guide says only that both are the number of bytes contained under the directory.
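One hedged way to narrow such a discrepancy down (an untested sketch; assumes hdfs dfs -ls -C, which prints paths only) is to compare the two counters per immediate child and descend into whichever subtree disagrees:
for d in $(hdfs dfs -ls -C hdfs://XYZ/apps/hive/warehouse/p_xx_db.db); do
  du=$(hdfs dfs -du -s "$d" | awk '{print $1}')
  cnt=$(hdfs dfs -count "$d" | awk '{print $3}')
  [ "$du" != "$cnt" ] && echo "$d du=$du count=$cnt"
done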

hdfs ls on directory returns No such file or directory error

hdfs ls on the two directories below returns a No such file or directory error.
[mybox]$ hdfs dfs -ls /data/tdc/dv1/corp/base/dpp/raw/load_date=2018-05-01/ | grep Tenant
drwxr-xr-x - tdcdv1r tdcdv1c 0 2018-05-01 18:28 /data/tdc/dv1/corp/base/dpp/raw/load_date=2018-05-01/rtng_ky=Access.NBNOrder.Amend.Info.{Tenant}.Rejected.v2.event
drwxr-xr-x - tdcdv1r tdcdv1c 0 2018-05-01 15:35 /data/tdc/dv1/corp/base/dpp/raw/load_date=2018-05-01/rtng_ky=Access.NBNOrder.Amend.Info.{Tenant}.v2.event
See the error:
[mybox]$ hdfs dfs -ls /data/tdc/dv1/corp/base/dpp/raw/load_date=2018-05-01/rtng_ky=Access.NBNOrder.Amend.Info.{Tenant}.Rejected.v2.event
ls: `/data/tdc/dv1/corp/base/dpp/raw/load_date=2018-05-01/rtng_ky=Access.NBNOrder.Amend.Info.{Tenant}.Rejected.v2.event': No such file or directory
I am not able to understand this. It's a directory, so it should return the contents, but it's returning an error.
You just need to escape the special characters ({ and }) in the path:
hdfs dfs -ls /data/tdc/dv1/corp/base/dpp/raw/load_date=2018-05-01/rtng_ky=Access.NBNOrder.Amend.Info.\\{Tenant\\}.Rejected.v2.event
EDIT
As said in the comments, you can quote the path to avoid escaping the special characters.
This should work fine:
hdfs dfs -ls '/data/tdc/dv1/corp/base/dpp/raw/load_date=2018-05-01/rtng_ky=Access.NBNOrder.Amend.Info.{Tenant}.Rejected.v2.event'
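The underlying reason, as I understand it, is that { and } are alternation metacharacters in Hadoop's own path globbing, so an unescaped {Tenant} is read as a pattern rather than a literal directory name. A tiny illustration with hypothetical paths:
hdfs dfs -ls '/tmp/{a,b}'    # Hadoop glob: matches both /tmp/a and /tmp/b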

Difference between 'hdfs dfs -ls' and 'hdfs dfs -ls /'

Why does hdfs dfs -ls point to a different location than hdfs dfs -ls /?
It can be clearly seen from the screenshot below that the two commands give different output:
What is the main cause of the outputs above?
From the official source code, org.apache.hadoop.fs.shell.Ls.java: just search for the DESCRIPTION field. It contains the following:
public static final String DESCRIPTION =
"List the contents that match the specified file pattern. If " +
"path is not specified, the contents of /user/<currentUser> " +
"will be listed. For a directory a list of its direct children " +
"is returned (unless -" + OPTION_DIRECTORY +
" option is specified)"
hadoop fs -ls will list the home directory contents of the current user.
hadoop fs -ls / will list the direct children of the root directory.
The default location for -ls in Hadoop is the home directory of the user, in this case /user/root.
Adding the / makes the -ls command point at the root directory of the file system.
The / refers to the root folder of HDFS.
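A quick demonstration (user and directory names are hypothetical):
$ hdfs dfs -ls       # lists /user/<currentUser>, e.g. /user/alice
$ hdfs dfs -ls /     # lists direct children of the root, e.g. /tmp and /user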

Hadoop fs lookup for block size?

In Hadoop fs, how can I look up the block size for a particular file?
I was primarily interested in a command line, something like:
hadoop fs ... hdfs://fs1.data/...
But it looks like that does not exist. Is there a Java solution?
The fsck commands in the other answers list the blocks and allow you to see the number of blocks. However, to see the actual block size in bytes with no extra cruft, do:
hadoop fs -stat %o /filename
Default block size is:
hdfs getconf -confKey dfs.blocksize
Details about units
The units for the block size are not documented in the hadoop fs -stat command; however, looking at the source and the docs for the method it calls, we can see that it uses bytes and cannot report block sizes over about 9 exabytes.
The units for the hdfs getconf command may not be bytes. It returns whatever string is used for dfs.blocksize in the configuration file. (This can be seen in the source for the final function and its indirect caller.)
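For example, if the cluster sets the value with a size suffix, getconf echoes that string verbatim (a hypothetical hdfs-site.xml fragment):
<property>
  <name>dfs.blocksize</name>
  <value>128m</value>  <!-- getconf prints "128m" here, not 134217728 -->
</property>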
It seems hadoop fs doesn't have an option to do this, but hadoop fsck does.
You can try this
$HADOOP_HOME/bin/hadoop fsck /path/to/file -files -blocks
I think it should be doable with:
hadoop fsck /filename -blocks
but I get Connection refused
Try the code below:
path=hdfs://a/b/c
# note: column 3 of hdfs dfs -count output is the content size in bytes
size=`hdfs dfs -count ${path} | awk '{print $3}'`
echo $size
To display the actual block size of an existing file in HDFS, I used:
[pety#master1 ~]$ hdfs dfs -stat %o /tmp/testfile_64
67108864
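Since the question also asks for a Java solution, here is a minimal sketch using the public FileStatus API (the class name and the use of the default Configuration are assumptions for illustration):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(new Path(args[0]));
        System.out.println(status.getBlockSize());  // block size of this file, in bytes
    }
}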
