Spark with HDFS

Example

  • Check the configuration file (core-site.xml) to get the HDFS address:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  ...
</configuration>
  • Access HDFS in PySpark. Start the interactive shell, which creates a SparkContext for you as sc:

    pyspark
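
If you run your code as a standalone script instead of the shell, the SparkContext has to be created explicitly. A minimal sketch, assuming Spark is installed; the app name "hdfs-example" is an arbitrary placeholder:

    from pyspark import SparkConf, SparkContext

    # In the pyspark shell this context already exists as `sc`
    conf = SparkConf().setAppName("hdfs-example")
    sc = SparkContext(conf=conf)
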
  • Read from HDFS. sc.textFile returns an RDD with one element per line:

    lines = sc.textFile("hdfs://master:9000/user/input/sample.txt")
    lines.count()  # number of lines in the file
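
Because fs.defaultFS already names hdfs://master:9000, the scheme and host can usually be dropped. A sketch, assuming Spark picks up the cluster's Hadoop configuration (e.g. via HADOOP_CONF_DIR):

    lines = sc.textFile("/user/input/sample.txt")  # resolved against fs.defaultFS
    lines.first()                                  # peek at the first line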
    
  • Safe mode. While the NameNode is in safe mode, HDFS is read-only and the write below will fail. Check the state and leave safe mode if needed:

    hdfs dfsadmin -safemode get
    hdfs dfsadmin -safemode leave
    
  • Write into HDFS. saveAsTextFile writes one part file per partition, and the target directory must not already exist:

    rowdata = sc.parallelize([[1, 2], [3, 4]])
    rowdata.map(lambda row: row[0] + row[1])\
      .saveAsTextFile("hdfs://master:9000/user/input/example")
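
To verify the output without leaving PySpark, read the directory back. A sketch, assuming the paths above; with the two rows used here the sums are 3 and 7, and since they were saved as text they come back as strings:

    result = sc.textFile("hdfs://master:9000/user/input/example")
    result.collect()  # ['3', '7']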
    
You can also inspect a part file from the command line:

    hadoop fs -cat hdfs://master:9000/user/input/example/part-00000
