
Running the Hadoop wordcount example, and simple HDFS operations

Source: 好程序员 (2017-09-19 09:01:00)

1. Check the Hadoop version

 

[hadoop@ltt1 sbin]$ hadoop version
Hadoop 2.6.0-cdh5.12.0
Subversion http://github.com/cloudera/hadoop -r dba647c5a8bc5e09b572d76a8d29481c78d1a0dd
Compiled by jenkins on 2017-06-29T11:33Z
Compiled with protoc 2.5.0
From source with checksum 7c45ae7a4592ce5af86bc4598c5b4
This command was run using /home/hadoop/hadoop260/share/hadoop/common/hadoop-common-2.6.0-cdh5.12.0.jar
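If an application needs the same information at runtime, the Hadoop client library exposes it through org.apache.hadoop.util.VersionInfo. A minimal sketch (the class name ShowVersion is illustrative; it assumes the hadoop-common JAR shown above is on the classpath):

import org.apache.hadoop.util.VersionInfo;

public class ShowVersion {
  public static void main(String[] args) {
    // The same fields that `hadoop version` prints on the command line.
    System.out.println("Hadoop " + VersionInfo.getVersion());
    System.out.println("Subversion " + VersionInfo.getUrl()
        + " -r " + VersionInfo.getRevision());
    System.out.println("Compiled by " + VersionInfo.getUser()
        + " on " + VersionInfo.getDate());
    System.out.println("From source with checksum " + VersionInfo.getSrcChecksum());
  }
}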

 

2. The JAR files bundled with Hadoop make it easy to smoke-test basic functionality.

View the list of MapReduce example programs supported by hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar (the general invocation pattern is hadoop jar <examples-jar> <program> <args...>; running the JAR with no program name prints the list):

 

[hadoop@ltt1 sbin]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar
An example program must be given as the first argument. Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
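Under the hood, the entry point of the examples JAR registers each of these programs with org.apache.hadoop.util.ProgramDriver, which prints the listing above when no program name is given. A rough, abridged sketch of that mechanism (the real driver lives in org.apache.hadoop.examples and registers every program listed):

import org.apache.hadoop.util.ProgramDriver;

public class ExampleDriver {
  public static void main(String[] argv) {
    int exitCode = -1;
    ProgramDriver pgd = new ProgramDriver();
    try {
      // Register name -> main class -> the description shown in the listing.
      // WordCount here is the class sketched under step 7 below.
      pgd.addClass("wordcount", WordCount.class,
          "A map/reduce program that counts the words in the input files.");
      // Dispatches to the chosen program's main(), or prints the listing.
      pgd.driver(argv);
      exitCode = 0;
    } catch (Throwable e) {
      e.printStackTrace();
    }
    System.exit(exitCode);
  }
}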

 

3. Create a directory on HDFS

hadoop fs -mkdir /input

4. List the HDFS root directory

[hadoop@ltt1 ~]$ hadoop fs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2017-09-17 08:11 /input
drwx------ - hadoop supergroup 0 2017-09-17 08:07 /tmp

5. Upload local files to HDFS (the *.txt glob is expanded by the local shell before the upload)

hadoop fs -put $HADOOP_HOME/*.txt /input

6. List the files under the /input directory on HDFS

[hadoop@ltt1 ~]$ hadoop fs -ls /input
Found 3 items
-rw-r--r--   2 hadoop supergroup      85063 2017-09-17 08:15 /input/LICENSE.txt
-rw-r--r--   2 hadoop supergroup      14978 2017-09-17 08:15 /input/NOTICE.txt
-rw-r--r--   2 hadoop supergroup       1366 2017-09-17 08:15 /input/README.txt
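Steps 3, 5, and 6 also have programmatic equivalents in the org.apache.hadoop.fs.FileSystem client API. A minimal sketch, assuming core-site.xml (with fs.defaultFS) is on the classpath; the class name HdfsOps and the local file path are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsOps {
  public static void main(String[] args) throws Exception {
    // Reads fs.defaultFS and the rest of the cluster config from the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Step 3: hadoop fs -mkdir /input
    fs.mkdirs(new Path("/input"));

    // Step 5: hadoop fs -put <local file> /input (local path is illustrative)
    fs.copyFromLocalFile(new Path("/home/hadoop/hadoop260/README.txt"),
                         new Path("/input"));

    // Step 6: hadoop fs -ls /input
    for (FileStatus st : fs.listStatus(new Path("/input"))) {
      System.out.printf("%s %d %s%n", st.getPermission(), st.getLen(), st.getPath());
    }
    fs.close();
  }
}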

7. A simple wordcount test.

 

[hadoop@ltt1 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar wordcount /input /output
17/09/17 08:19:12 INFO input.FileInputFormat: Total input paths to process : 3
17/09/17 08:19:13 INFO mapreduce.JobSubmitter: number of splits:3
17/09/17 08:19:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1505605169997_0002
17/09/17 08:19:14 INFO impl.YarnClientImpl: Submitted application application_1505605169997_0002
17/09/17 08:19:14 INFO mapreduce.Job: The url to track the job: http://ltt1.bg.cn:9180/proxy/application_1505605169997_0002/
17/09/17 08:19:14 INFO mapreduce.Job: Running job: job_1505605169997_0002
17/09/17 08:19:27 INFO mapreduce.Job: Job job_1505605169997_0002 running in uber mode : false
17/09/17 08:19:27 INFO mapreduce.Job:  map 0% reduce 0%
17/09/17 08:19:39 INFO mapreduce.Job:  map 33% reduce 0%
17/09/17 08:19:48 INFO mapreduce.Job:  map 100% reduce 0%
17/09/17 08:19:50 INFO mapreduce.Job:  map 100% reduce 100%
17/09/17 08:19:50 INFO mapreduce.Job: Job job_1505605169997_0002 completed successfully
17/09/17 08:19:50 INFO mapreduce.Job: Counters: 50

    File System Counters
        FILE: Number of bytes read=42705
        FILE: Number of bytes written=588235
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=101699
        HDFS: Number of bytes written=30167
        HDFS: Number of read operations=12
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=3
        Launched reduce tasks=1
        Data-local map tasks=2
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=47617
        Total time spent by all reduces in occupied slots (ms)=8244
        Total time spent by all map tasks (ms)=47617
        Total time spent by all reduce tasks (ms)=8244
        Total vcore-milliseconds taken by all map tasks=47617
        Total vcore-milliseconds taken by all reduce tasks=8244
        Total megabyte-milliseconds taken by all map tasks=48759808
        Total megabyte-milliseconds taken by all reduce tasks=8441856
    Map-Reduce Framework
        Map input records=2035
        Map output records=14239
        Map output bytes=155828
        Map output materialized bytes=42717
        Input split bytes=292
        Combine input records=14239
        Combine output records=2653
        Reduce input groups=2402
        Reduce shuffle bytes=42717
        Reduce input records=2653
        Reduce output records=2402
        Spilled Records=5306
        Shuffled Maps =3
        Failed Shuffles=0
        Merged Map outputs=3
        GC time elapsed (ms)=881
        CPU time spent (ms)=22320
        Physical memory (bytes) snapshot=690192384
        Virtual memory (bytes) snapshot=10862809088
        Total committed heap usage (bytes)=380243968
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=101407
    File Output Format Counters
        Bytes Written=30167
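The wordcount program invoked above is the canonical Hadoop word-count example. A minimal sketch of its logic against the 2.x MapReduce API (reconstructed from the standard example, not the exact CDH 5.12.0 source): each map task tokenizes its input split and emits a (word, 1) pair per token, and the combiner/reducer sum the counts per word, which is why the counters show 14239 combine input records collapsing to 2653 combine output records before the shuffle.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      // Emit (token, 1) for every whitespace-separated token in the line.
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Sum all counts for this word; also used as the combiner.
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /input
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}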

 

8. View the wordcount results (only part of the output is shown here, since the full result is long)

 

[hadoop@ltt1 ~]$ hadoop fs -cat /output/*
worldwide,    4
would    1
writing    2
writing,    4
written    19
xmlenc    1
year    1
you    12
your    5
zlib    1
252.227-7014(a)(1))    1
§    1
“AS    1
“Contributor    1
“Contributor”    1
“Covered    1
“Executable”    1
“Initial    1
“Larger    1
“Licensable”    1
“License”    1
“Modifications”    1
“Original    1
“Participant”)    1
“Patent    1
“Source    1
“Your”)    1
“You”    2
“commercial    3
“control”    1
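With a single reducer, the results land in /output/part-r-00000 next to an empty _SUCCESS marker file; note that MapReduce refuses to start if the output directory already exists, so remove it (hadoop fs -rm -r /output) before rerunning the job. Reading the part files from code is also straightforward. A minimal sketch, with the same classpath assumptions as the earlier FileSystem example (the class name CatOutput is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatOutput {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Stream every reducer output file to stdout,
    // like `hadoop fs -cat /output/part-*`.
    for (FileStatus st : fs.listStatus(new Path("/output"))) {
      if (st.getPath().getName().startsWith("part-")) {
        try (FSDataInputStream in = fs.open(st.getPath())) {
          IOUtils.copyBytes(in, System.out, 4096, false);
        }
      }
    }
    fs.close();
  }
}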