Exercise - Storing and Moving Data
In this exercise, you will transfer files from your personal computer (PC) to Frontera using command-line tools. These tools come preinstalled on Linux and macOS; on Windows, you may want to install the Windows Subsystem for Linux (WSL) to get the same functionality. We will start with a few simple examples that apply scp and rsync to individual files, and then walk through a more detailed workflow suited to large files and directories.
A good place to start is to check your allocation and storage usage. Log in to Frontera and run /usr/local/etc/taccinfo (or simply review its output, which is displayed automatically when you log in). Examine the information there, especially the "Disk quotas for user" section. Stay logged in, since several steps in this exercise are carried out on Frontera.
First, set up the data you are going to transfer. We provide some small example files that you can download, containing the words of Rev. Dr. Martin Luther King Jr. from his famous "I Have a Dream" speech. On your PC, create a new directory named speech in your home directory and download part1.txt, part2.txt, and part3.txt into it. If you want to become familiar with what you are transferring, view the contents with cat or a text editor.
Next, prepare a directory on Frontera to receive the files. Here, cdw is a TACC shortcut that changes to your $WORK directory:
$ cdw
$ mkdir speech
Now, from your PC, use scp to transfer a single file to the new $WORK/speech directory on Frontera. (Commands to be run on your PC are shown with the prompt [myPC]$.)
[myPC]$ scp ~/speech/part1.txt <username>@frontera.tacc.utexas.edu:\$WORK/speech
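Remember to replace <username> with your Frontera username. You can use Frontera environment variables in the remote path of an scp command by putting a backslash (\) in front of them. The backslash stops your PC's shell from expanding the variable, so it is expanded on Frontera instead.

With OpenSSH 9.0 or newer, scp uses the SFTP protocol by default, which may not expand remote variables. If \$WORK is not recognized, the -O option restores the older protocol. Alternatively, type the full path that echo $WORK prints on Frontera.

Be aware that TACC multi-factor authentication (MFA) is required for each transfer operation. Afterward, you can verify that the file arrived by running ls on Frontera, and inspect its contents with cat if you like.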
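Seeing the file with ls confirms that it arrived, but not that it arrived intact. A common extra check is to compare checksums computed on both ends. The sketch below assumes GNU coreutils' sha256sum, which Linux systems such as Frontera provide. On macOS, shasum -a 256 writes a compatible checksum file.

To stay runnable anywhere, the sketch uses placeholder text in two hypothetical scratch directories and simulates the transfer with cp. In the exercise, you would create speech.sha256 on your PC, transfer it along with the data, and run the check on Frontera.

```shell
#!/bin/sh
# Sketch only: scratch directories with placeholder text stand in for
# ~/speech on your PC and $WORK/speech on Frontera.
set -e
SCRATCH=$(mktemp -d)
trap 'rm -rf "$SCRATCH"' EXIT
mkdir -p "$SCRATCH/pc/speech" "$SCRATCH/frontera/speech"
for n in 1 2 3; do
  echo "placeholder text for part $n" > "$SCRATCH/pc/speech/part$n.txt"
done

# On the PC: record a checksum for each file, using relative names.
cd "$SCRATCH/pc/speech"
sha256sum part1.txt part2.txt part3.txt > speech.sha256

# Simulated transfer (scp or rsync in the real exercise).
cp part1.txt part2.txt part3.txt speech.sha256 "$SCRATCH/frontera/speech/"

# On Frontera: recompute and compare. Prints "part1.txt: OK" and so on,
# and exits with a nonzero status if any file differs.
cd "$SCRATCH/frontera/speech"
sha256sum -c speech.sha256
```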
Next, transfer the second file using rsync:
[myPC]$ rsync ~/speech/part2.txt <username>@frontera.tacc.utexas.edu:\$WORK/speech
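As you can see, environment variables in the remote path work with rsync as well. Recent rsync releases may escape characters such as $ before sending them to the remote side. If the variable is not expanded, use the full path that echo $WORK prints on Frontera.

Because rsync transfers only what is new or changed, you can send the remaining file by using rsync to synchronize the whole directory:
[myPC]$ rsync -av ~/speech/ <username>@frontera.tacc.utexas.edu:\$WORK/speech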
You may wish to further explore the flags for rsync as well.
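To see rsync's incremental behavior without involving Frontera, the sketch below syncs between two local scratch directories. SRC and DST are hypothetical stand-ins for ~/speech and $WORK/speech, filled with placeholder text. The sketch uses a few long-standing options:

- -a (archive mode, which preserves timestamps and permissions)
- -v (verbose)
- -n (--dry-run, report without copying)
- -i (--itemize-changes, list each change)

It assumes rsync is installed locally and simply exits if it is not.

```shell
#!/bin/sh
# Sketch only: SRC stands in for ~/speech on your PC, DST for $WORK/speech
# on Frontera. Both are scratch directories holding placeholder text.
set -e
if ! command -v rsync >/dev/null 2>&1; then
  echo "rsync is not installed here; skipping this sketch" >&2
  exit 0
fi
SCRATCH=$(mktemp -d)
trap 'rm -rf "$SCRATCH"' EXIT
SRC="$SCRATCH/speech"
DST="$SCRATCH/work_speech"
mkdir -p "$SRC" "$DST"
for n in 1 2 3; do
  echo "placeholder text for part $n" > "$SRC/part$n.txt"
done

# Mirror the exercise: parts 1 and 2 already exist at the destination.
rsync -a "$SRC/part1.txt" "$SRC/part2.txt" "$DST/"

# Dry run: itemize what a directory sync WOULD change without copying.
# part3.txt should be the only file listed.
rsync -a -n -i "$SRC/" "$DST/"

# Real sync. The trailing slash on "$SRC/" copies the directory's
# contents into DST instead of creating DST/speech.
rsync -a -v "$SRC/" "$DST/"
```

The trailing-slash behavior shown in the last command is why the directory sync in the exercise uses ~/speech/ rather than ~/speech.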
The approach above is fine for small files and small numbers of files. For large files or many files, you should instead compress the data and stripe the receiving directory, so let's walk through that workflow. Imagine that each file in our directory is large and that the directory holds many files (even though our example has only three small files). The first step is to compress the directory, for which we will use tar and gzip (via tar's -z flag). On your PC:
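[myPC]$ tar -czf civil_rights.tar.gz -C ~ speech
The -C ~ option makes tar change to your home directory before adding speech, so the archive stores relative member names such as speech/part1.txt. With ~/speech, tar would instead store the path of your home directory (minus its leading /), and extraction would recreate that whole path.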
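Before sending an archive, it can help to list its members with tar -tzf. The sketch below runs in a scratch directory with placeholder files, where $SCRATCH/home is a hypothetical stand-in for your home directory. It builds the archive with -C so that member names begin with speech/, lists them, and then extracts into a second directory (as you will do on Frontera) to confirm the round trip.

```shell
#!/bin/sh
# Sketch only: $SCRATCH/home stands in for your home directory and
# $SCRATCH/Test for the receiving directory on Frontera.
set -e
SCRATCH=$(mktemp -d)
trap 'rm -rf "$SCRATCH"' EXIT
mkdir -p "$SCRATCH/home/speech" "$SCRATCH/Test"
for n in 1 2 3; do
  echo "placeholder text for part $n" > "$SCRATCH/home/speech/part$n.txt"
done

# Create the archive: -c create, -z gzip, -f archive name.
# -C changes into the parent directory first, so members are stored
# as speech/part1.txt rather than under the full path.
tar -czf "$SCRATCH/civil_rights.tar.gz" -C "$SCRATCH/home" speech

# List the members without extracting (-t).
tar -tzf "$SCRATCH/civil_rights.tar.gz"

# Extract into the receiving directory, as on Frontera.
cd "$SCRATCH/Test"
tar -xzf "$SCRATCH/civil_rights.tar.gz"
ls speech
```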
Next, on Frontera (log in first if needed), go to $WORK with cdw. Then create a new directory named Test to serve as the receiving directory, and set its stripe count:
$ mkdir Test
$ lfs setstripe -c 2 Test
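This causes files created in this directory to be striped across 2 object storage targets (OSTs). The count of 2 is just an example. TACC's rule of thumb for Frontera is to "allow at least one stripe for each 100GB in the file," so adjust the count to suit your file sizes. Now, on your PC, copy the archive into the new directory:
[myPC]$ scp civil_rights.tar.gz <username>@frontera.tacc.utexas.edu:\$WORK/Test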
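The rule of thumb amounts to a ceiling division: a file of S GB needs at least ceil(S/100) stripes. The small shell function below computes that minimum from a size in whole gigabytes, never returning less than 1. Note that stripe_count is a hypothetical helper, not a TACC command. Treat its result as a lower bound and a starting point; the useful maximum also depends on how many OSTs the file system has.

```shell
#!/bin/sh
# Hypothetical helper: minimum stripe count under the rule of thumb
# "at least one stripe for each 100GB in the file".
# Argument: file size in whole gigabytes (non-negative integer).
stripe_count() {
  size_gb=$1
  # Ceiling division in integer arithmetic: (S + 99) / 100.
  count=$(( (size_gb + 99) / 100 ))
  # Always use at least one stripe, even for small files.
  if [ "$count" -lt 1 ]; then
    count=1
  fi
  echo "$count"
}

for size in 5 100 101 250 1000; do
  echo "${size}GB -> $(stripe_count "$size") stripe(s)"
done

# On Frontera, the result could then be passed to lfs, e.g.:
#   lfs setstripe -c "$(stripe_count 350)" Test
```

For example, a 250GB file gives a minimum of 3 stripes, while anything up to 100GB needs only 1.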
Back on Frontera, change into the Test directory, verify the striping, and extract the archive:
$ cd Test
$ lfs getstripe civil_rights.tar.gz
$ tar -xzf civil_rights.tar.gz
You should now see all three parts of the speech in the extracted directory. This second method can be used for any single large file as well, even one that did not start out as a directory. If you are curious, run /usr/local/etc/taccinfo again to see how your storage usage has changed.
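For a single large file, one option is to skip tar and compress the file directly with gzip, then decompress it on Frontera after the transfer. The sketch below uses bigfile.dat as a hypothetical name. It generates a compressible placeholder file in a scratch directory, then:

- compresses it with gzip -c (writing to standard output, so the original is kept)
- checks the compressed file with gzip -t
- confirms the round trip with cmp

How much compression helps depends on the data; already-compressed formats shrink little.

```shell
#!/bin/sh
# Sketch only: bigfile.dat is a hypothetical stand-in for a large file.
set -e
SCRATCH=$(mktemp -d)
trap 'rm -rf "$SCRATCH"' EXIT
cd "$SCRATCH"

# Make a compressible placeholder file (repeated text lines).
i=0
while [ "$i" -lt 5000 ]; do
  echo "placeholder line $i of a large data file"
  i=$((i + 1))
done > bigfile.dat

# On your PC: compress, keeping the original (-c writes to stdout).
gzip -c bigfile.dat > bigfile.dat.gz

# Test the compressed file's integrity before sending it.
gzip -t bigfile.dat.gz

# On Frontera (after scp or rsync): decompress. Here the output goes to
# a new name so the round trip can be compared against the original.
gzip -dc bigfile.dat.gz > restored.dat
cmp bigfile.dat restored.dat && echo "round trip OK"
ls -l bigfile.dat bigfile.dat.gz
```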
To clean up on Frontera when you are done:
$ cdw
$ rm -rf Test speech