How to count the number of lines in an HDFS file?
WC command
The wc (word count) command is used in Linux/Unix to report the number of lines, words, bytes, and characters in a file. It can also be combined with pipes to count the number of lines in an HDFS file.
Print the number of lines in Unix/Linux
wc -l <filename>
The wc command with the -l option returns the number of lines in a file. We can combine it with the Hadoop file system commands to get the number of lines in an HDFS file.
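As a quick local illustration of wc -l before moving to HDFS, here is a minimal sketch. The temporary file and its three lines are made up for the example; nothing here touches a cluster.

```shell
# Create a throwaway sample file with exactly three lines (illustrative data only).
sample_file="$(mktemp)"
printf 'alpha\nbeta\ngamma\n' > "$sample_file"

# Reading from standard input makes wc print only the count, without the file name.
# tr strips the padding spaces that some wc implementations add.
line_count="$(wc -l < "$sample_file" | tr -d ' ')"
echo "$line_count"
```

Note that wc -l counts newline characters, so a final line that does not end with a newline is not included in the count.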
Count the number of lines in an HDFS file
Method 1:
hdfs dfs -cat <filename> | wc -l
Piping the output of hdfs dfs -cat into wc -l returns the number of lines in an HDFS file.
Example:
hdfs dfs -cat /apps/revisit/employee_part12-0001 | wc -l
12893
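If this count is needed repeatedly, the pipeline can be wrapped in a small shell function. The sketch below is only one possible way to do it: the function name count_hdfs_lines and the HDFS_BIN variable are hypothetical names introduced for this example, not part of Hadoop. HDFS_BIN exists only so the function can be tried with a stand-in command when no cluster is available.

```shell
# Hypothetical helper: print the number of lines in one HDFS file.
# HDFS_BIN is an assumed override hook; when unset, the real hdfs client is used.
count_hdfs_lines() {
    # Stream the file contents and count newline characters.
    # tr strips the padding spaces that some wc implementations add.
    "${HDFS_BIN:-hdfs}" dfs -cat "$1" | wc -l | tr -d ' '
}

# Usage on a machine with an HDFS client configured:
#   count_hdfs_lines /apps/revisit/employee_part12-0001
```

Be aware that if the read fails (for example, the path does not exist), wc still receives empty input and the function prints 0, so a result of 0 is worth double-checking against any error message from the hdfs command.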
Method 2:
hdfs dfs -text <filename> | wc -l
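The hdfs dfs -text command (equivalently, hadoop fs -text) takes a source file and outputs it in text format. The allowed formats are zip and TextRecordInputStream. Piping its output into wc -l therefore counts the lines of the decoded text, which matters when the file is stored in an encoded or compressed form.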
Example:
hdfs dfs -text /apps/revisit/customer_20190611060814.txt | wc -l
1672
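To see why decoding before counting matters, the following local sketch uses gzip as a stand-in for a compressed file. It is an analogy under stated assumptions rather than a Hadoop example: the sample data and file names are made up, and gzip -dc plays the role that -text plays for an encoded HDFS file.

```shell
# Build a three-line sample file and a gzip-compressed copy (illustrative data only).
tmp_dir="$(mktemp -d)"
printf 'a\nb\nc\n' > "$tmp_dir/data.txt"
gzip -c "$tmp_dir/data.txt" > "$tmp_dir/data.txt.gz"

# Counting on the raw compressed bytes (what a plain cat would stream)
# yields a number unrelated to the real line count.
raw_count="$(wc -l < "$tmp_dir/data.txt.gz" | tr -d ' ')"

# Decoding first, then counting, gives the actual number of lines.
decoded_count="$(gzip -dc "$tmp_dir/data.txt.gz" | wc -l | tr -d ' ')"

echo "raw: $raw_count, decoded: $decoded_count"
```

For plain, uncompressed text files such as the .txt example above, Method 1 and Method 2 should report the same count.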