
HDFS chart: unable to put files in the filesystem #39

Closed
loicdescotte opened this issue Dec 11, 2020 · 3 comments


loicdescotte commented Dec 11, 2020

I am trying to copy a local file to my HDFS instance, deployed with the Helm chart.

I am running:
helm install hdfs https://github.com/Gradiant/charts/releases/download/hdfs-0.1.0/hdfs-0.1.0.tgz -f hdfs-values.yaml

My hdfs-values.yaml file:

conf:
  coreSite:
  hdfsSite:
    dfs.replication: 2
dataNode:
  replicas: 2  
  pdbMinAvailable: 2  
  resources:
    requests:
      memory: "256Mi"
      cpu: "10m"
    limits:
      memory: "2048Mi"
      cpu: "1000m"

kubectl get pods shows that all pods are running and ready.

NAME                                  READY   STATUS      RESTARTS   AGE
hdfs-httpfs-5686fd75df-2pgk7          1/1     Running     0          59m
hdfs-namenode-0                       2/2     Running     1          59m
hdfs-datanode-0                       1/1     Running     0          59m
hdfs-datanode-1                       1/1     Running     0          58m

I use port-forwarding to access the HDFS cluster on Kubernetes from my local machine:

# namenode web UI
kubectl port-forward svc/hdfs-namenode 50070:50070

# hdfs port
kubectl port-forward hdfs-namenode-0 8020:8020

On my local machine, I have just unzipped a Hadoop 2 distribution (2.10.0) and updated core-site.xml as follows, to use the forwarded port:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
  <description>The name of the default file system.  Either the
    literal string "local" or a host:port for NDFS.
  </description>
  <final>true</final>
</property>
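
(Note: fs.default.name is the old, deprecated key; in Hadoop 2 the equivalent, still-compatible key is fs.defaultFS:)

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>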
# ok:
hdfs dfs -mkdir /jars
# not ok:
hdfs dfs -put helloSpark.jar /jars
20/12/11 09:50:53 INFO hdfs.DataStreamer: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
        at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:259)
        at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1699)
        at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1655)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:710)
20/12/11 09:50:53 WARN hdfs.DataStreamer: Abandoning BP-831521929-10.42.1.6-1607678530556:blk_1073741827_1003
20/12/11 09:50:53 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[10.42.1.5:50010,DS-4153f502-30da-42d7-a415-69601658066a,DISK]

I don't see any errors in the datanode logs, and in the namenode logs the only error is the same as above.

Did I miss something in the configuration?

Thanks :)

@loicdescotte (Author)

It is working when:

  • I copy my jar into the namenode pod
  • I run the hdfs dfs -put command from inside the pod (sketched below)
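
Concretely, something like this (a sketch; the -c namenode container name is a guess for the 2/2 hdfs-namenode-0 pod):

# copy the jar into the namenode pod
kubectl cp helloSpark.jar hdfs-namenode-0:/tmp/helloSpark.jar -c namenode
# run the put from inside the cluster, where datanode addresses are routable
kubectl exec hdfs-namenode-0 -c namenode -- hdfs dfs -put /tmp/helloSpark.jar /jars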

So something must be missing in my Hadoop client configuration?

cgiraldo (Member) commented Dec 11, 2020

Your hdfs CLI must be able to reach every member of the Hadoop cluster, not just the namenode, so port-forwarding the namenode pod is not enough. The namenode only allocates blocks; the client then writes directly to the datanode addresses the namenode returns (in your log, 10.42.1.5:50010), and those are pod IPs that are not routable from your machine.

If your client is not a pod running in the same Kubernetes cluster, I recommend using the httpfs service:

https://hadoop.apache.org/docs/current/hadoop-hdfs-httpfs/index.html

Just use the deployed httpfs service:

kubectl get services

or the ingress (your Kubernetes cluster must have an ingress controller for this option):

kubectl get ingress
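
A minimal sketch of the httpfs route from outside the cluster, assuming the chart's hdfs-httpfs service listens on HttpFS's default port 14000 (verify the name and port with kubectl get services):

# forward the httpfs port instead of the namenode RPC port
kubectl port-forward svc/hdfs-httpfs 14000:14000

# upload through the WebHDFS REST API; httpfs proxies all datanode traffic,
# so a single forwarded port is enough (user.name is the pseudo-auth user)
curl -X PUT -T helloSpark.jar -H "Content-Type: application/octet-stream" \
  "http://localhost:14000/webhdfs/v1/jars/helloSpark.jar?op=CREATE&data=true&user.name=hdfs"

# or point the Hadoop CLI at httpfs using the webhdfs scheme
hdfs dfs -fs webhdfs://localhost:14000 -put helloSpark.jar /jars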

@loicdescotte (Author)

It is working with httpfs, thank you very much!
