.. _quickstart:

Quickstart
==========

This section helps you get started with the platform quickly. For more details, see the rest of this guide, the `Tutorials <https://bigdata.cesga.es/#tutorials>`_ that we have prepared, and the :ref:`want_to_know_more` section.

.. warning:: We recommend that you always start the VPN before connecting. Without it, you will not have access to some services.

If for some reason you are not using the VPN, one alternative is to launch a remote desktop from the visualization platform and connect from there.

By far, the most common way to connect is by establishing an SSH session::

    ssh username@hadoop3.cesga.es

Once connected, you will notice that there are two main filesystems:

- **HOME**: The standard filesystem when you log in
- **HDFS**: The distributed Hadoop filesystem
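
As a minimal sketch of how the two filesystems differ in practice, the commands below print the HOME directory and then list the HDFS home directory with the Hadoop client. The ``/user/<username>`` layout of the HDFS home directory is an assumption, and the ``hdfs_home`` variable is a hypothetical name used only for illustration.

```shell
# HOME: the regular filesystem, reachable with ordinary shell commands.
echo "HOME directory: $HOME"

# HDFS: accessed through the Hadoop client rather than plain ls/cp.
# Assumption: your HDFS home directory is /user/<username>.
hdfs_home="/user/$(id -un)"
echo "Expected HDFS home: $hdfs_home"

if command -v hdfs >/dev/null 2>&1; then
    # With no path argument, -ls lists your HDFS home directory.
    hdfs dfs -ls
    # Show how much space your HDFS home directory uses.
    hdfs dfs -du -s -h "$hdfs_home"
else
    echo "hdfs client not found; run this on the platform after logging in"
fi
```

The guard around the ``hdfs`` calls only keeps the sketch from failing on a machine without the Hadoop client installed.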

To migrate your HDFS data from the old platform to the new one, you can use a command similar to the following::

    hadoop distcp -i -pat -update hdfs://10.121.13.19:8020/user/uscfajlc/wcresult hdfs://nameservice1/user/uscfajlc/wcresult

.. note::
   It is recommended to launch the distcp command inside a ``screen`` session so that it keeps running if your SSH connection is closed.

See the :ref:`migrating_data` section for more details about how to migrate your data from the previous platform.
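
One possible way to combine the ``distcp`` command with ``screen`` is sketched below. The ``user`` and ``dir`` values are placeholders to replace with your own, the two cluster addresses are copied from the example command, and the session name ``distcp`` and the log file ``distcp.log`` are arbitrary choices made for this illustration.

```shell
# Placeholder values: replace with your own username and HDFS directory.
user="username"
dir="wcresult"
old_nn="hdfs://10.121.13.19:8020"   # old platform, as in the example command
new_ns="hdfs://nameservice1"        # new platform, as in the example command

distcp_cmd="hadoop distcp -i -pat -update $old_nn/user/$user/$dir $new_ns/user/$user/$dir"
echo "Command to run: $distcp_cmd"

if command -v screen >/dev/null 2>&1 && command -v hadoop >/dev/null 2>&1; then
    # -dmS starts a detached session named "distcp" that runs the copy.
    # Output goes to distcp.log because the session ends when the copy does.
    screen -dmS distcp sh -c "$distcp_cmd > distcp.log 2>&1"
    echo "Started. Reattach with: screen -r distcp"
else
    echo "screen or hadoop not found; nothing was started"
fi
```

While the copy is running you can list active sessions with ``screen -ls``, and detach from an attached session with ``Ctrl-a d``.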

You can then start using the tools you are interested in, such as :ref:`spark` or :ref:`hive`.

.. note:: The default version of Spark is 2.4.0. Take this into account if you plan to run code written for Spark 1.6.
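
If you want to confirm which Spark version your session picks up, a quick check along these lines may help. It is only a sketch: ``expected_version`` is a hypothetical variable holding the default stated in the note, and the exact text of the version banner can vary between distributions.

```shell
expected_version="2.4.0"   # default version stated in the note above

if command -v spark-submit >/dev/null 2>&1; then
    # The version banner is written to stderr, so merge it into stdout first.
    if spark-submit --version 2>&1 | grep -qF "version $expected_version"; then
        echo "Spark $expected_version is the active version"
    else
        echo "Active Spark version differs from $expected_version"
    fi
else
    echo "spark-submit not found; run this on the platform after logging in"
fi
```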

There is also a nice web user interface that you can use to get started with the platform. You can find more information in the :ref:`webui` and :ref:`hue` sections.