Swift - STARDUST
To store raw traffic traces and flow-level traces, STARDUST uses cloud-based Swift object storage. The following tutorial will show you how to access and process objects in Swift.
“OpenStack Object Storage (swift) is used for redundant, scalable data storage using clusters of standardized servers to store petabytes of accessible data. It is a long-term storage system for large amounts of static data which can be retrieved and updated. Object Storage uses a distributed architecture with no central point of control, providing greater scalability, redundancy, and permanence. Objects are written to multiple hardware devices, with the OpenStack software responsible for ensuring data replication and integrity across the cluster. Storage clusters scale horizontally by adding new nodes. Should a node fail, OpenStack works to replicate its content from other active nodes.” (from: https://docs.openstack.org/swift/latest/admin/objectstorage-intro.html)
To understand the basic concepts of account (aka project), container, and object, please check this overview from the OpenStack documentation.
Authenticating with Swift
Log into your VM.
Find your credential file (each computer will be different).
user@vm001:~$ ls -a
user@vm001:~$ cat .limbo_cred
export OS_PROJECT_NAME=telescope
export OS_USERNAME=userxxx
export OS_PASSWORD=xxxxx
export OS_AUTH_URL=https://hermes-auth.caida.org
export OS_IDENTITY_API_VERSION=3
- This file is used to set some environment variables that allow your Swift client to authenticate with the Swift cluster.
To source the file, which loads the variables into the environment (note the leading dot followed by a space):
user@vm001:~$ . .limbo_cred
To check that it is in the environment you can use:
user@vm001:~$ echo $OS_PROJECT_NAME
telescope
When you use the Swift command:
user@vm001:~$ swift auth
it uses the environment variables to authenticate with Swift.
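Before running swift auth, it can help to confirm that every variable from the credential file actually reached the environment. The following is a minimal sketch: the function name check_swift_env is hypothetical, and the variable list is simply the one shown in the example credential file above, so adjust it if your file differs.

```shell
# Report any OpenStack variables from the credential file that are not set.
# check_swift_env is a hypothetical helper, not part of the Swift client.
check_swift_env() {
    missing=0
    for name in OS_PROJECT_NAME OS_USERNAME OS_PASSWORD OS_AUTH_URL OS_IDENTITY_API_VERSION; do
        # Indirect lookup of the variable called $name (names are fixed above,
        # so eval is not exposed to untrusted input).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    # Exit status 0 when everything is set, 1 otherwise.
    return "$missing"
}
```

Calling check_swift_env after sourcing the credential file prints nothing when all five variables are set and names each missing one otherwise. It never prints the values, so the password is not echoed to the terminal.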
Accessing objects in containers
If you run swift with no arguments, it prints a list of the available commands.
swift list with no argument will list all the containers:
user@vm001:~$ swift list
One of the containers, telescope-ucsdnt-pcap-live, contains the raw pcap files for the last 5 weeks.
To list the contents of the container:
user@vm001:~$ swift list <container name>
- Note: Since Swift containers store many files, it is not recommended to list all the files in a container.
user@vm001:~$ swift list telescope-ucsdnt-pcap-live | head
datasource=ucsd-nt/year=2020/month=09/day=01/hour=07/ucsd-nt.1598943600.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=08/ucsd-nt.1598947200.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=09/ucsd-nt.1598950800.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=10/ucsd-nt.1598954400.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=11/ucsd-nt.1598958000.pcap.gz
...
Swift has no concept of a hierarchical namespace like the UNIX file system, but you can make it present a pseudo-hierarchy by setting a delimiter:
user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d /
datasource=ucsd-nt/
- Set the delimiter to a slash (/). The listing then shows each name only up to the first slash (/).
- With a delimiter set, the objects in the container are listed as if they were hierarchical, with the levels of the hierarchy separated by a slash (/).
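To see what the delimiter does, the grouping can be imitated locally with awk on a plain list of names. This sketch only mimics the behaviour described above: pseudo_list is a hypothetical helper, not part of the Swift client, and in practice the grouping is done by the Swift cluster itself. The optional first argument plays the role of a name prefix, similar to the -p option of swift list.

```shell
# Imitate a delimiter-based listing on names read from standard input.
# $1 = prefix the names must start with (may be empty)
# $2 = delimiter (defaults to /)
# pseudo_list is a hypothetical helper for illustration only.
pseudo_list() {
    awk -v prefix="$1" -v delim="${2:-/}" '
        substr($0, 1, length(prefix)) == prefix {
            rest = substr($0, length(prefix) + 1)
            cut = index(rest, delim)
            # Names with a further delimiter collapse to a pseudo-directory;
            # names without one are printed in full.
            name = (cut > 0) ? prefix substr(rest, 1, cut + length(delim) - 1) : $0
            if (!(name in seen)) { seen[name] = 1; print name }
        }'
}
```

For example, feeding the three names a/b/1, a/b/2 and a/c/3 to pseudo_list 'a/' prints a/b/ and a/c/, one per line: everything below the next slash is collapsed into a single entry.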
Navigate down the hierarchy by including a prefix:
user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d / -p datasource=ucsd-nt/
datasource=ucsd-nt/year=2020/
- The prefix datasource=ucsd-nt/ lists everything that starts with that prefix, up to the next /.
- the next element should be
Continue to navigate down the hierarchy. This will list the months:
user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d / -p datasource=ucsd-nt/year=2020/
datasource=ucsd-nt/year=2020/month=09/
datasource=ucsd-nt/year=2020/month=10/
Now list the days:
user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d / -p datasource=ucsd-nt/year=2020/month=09/
datasource=ucsd-nt/year=2020/month=09/day=01/
datasource=ucsd-nt/year=2020/month=09/day=02/
datasource=ucsd-nt/year=2020/month=09/day=03/
datasource=ucsd-nt/year=2020/month=09/day=04/
datasource=ucsd-nt/year=2020/month=09/day=05/
...
To look for the file for 9 AM on the 27th:
user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d / -p datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/
datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz
This type of hierarchy is useful because there are so many objects in the container.
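The object names in the listings above appear to follow a fixed pattern: the path components give the UTC date and hour, and the file name carries the Unix timestamp of the start of that hour (1601197200 is 2020-09-27 09:00:00 UTC). Assuming that pattern holds for the whole container, a name can be built directly instead of navigating level by level. The sketch below assumes GNU date (for the -d option); pcap_object_name is a hypothetical helper.

```shell
# Build the object name for the hour that contains a given UTC time.
# $1 = UTC time, e.g. "2020-09-27 09:00"
# Assumes GNU date and the naming pattern inferred from the listings;
# pcap_object_name is a hypothetical helper.
pcap_object_name() {
    epoch=$(date -u -d "$1 UTC" +%s) || return 1
    # Round down to the start of the hour.
    hour_start=$((epoch - epoch % 3600))
    date -u -d "@$hour_start" \
        "+datasource=ucsd-nt/year=%Y/month=%m/day=%d/hour=%H/ucsd-nt.$hour_start.pcap.gz"
}
```

The result can be passed to swift list as a prefix to confirm that the object really exists before relying on the name.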
To check how many objects are in the container:
user@vm001:~$ swift list <container name> | wc -l
user@vm001:~$ swift list telescope-ucsdnt-pcap-live | wc -l
838
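A single total says little about how the objects are spread over time. As a rough sketch, the same listing can be tallied per day by cutting each name at its day= component. count_per_day is a hypothetical helper that reads object names on standard input; it assumes the datasource/year/month/day/hour naming shown in the listings above.

```shell
# Count objects per day from a list of object names on standard input.
# Prints "<count> <name up to the day= component>", sorted by name.
# count_per_day is a hypothetical helper.
count_per_day() {
    awk -F/ '
        {
            day = ""
            for (i = 1; i <= NF; i++) {
                # Rebuild the name one path component at a time.
                day = (i == 1) ? $i : day "/" $i
                if ($i ~ /^day=/) { count[day]++; break }
            }
        }
        END { for (d in count) print count[d], d }' | sort -k2
}
```

It would be used as swift list telescope-ucsdnt-pcap-live | count_per_day. Names without a day= component are ignored.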
Once you have the file you want to process, get information about the file by using the swift stat command:
user@vm001:~$ swift stat <container name> <object name>
user@vm001:~$ swift stat telescope-ucsdnt-pcap-live datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz
Account: 40914e7833c441859d1f0d08866bb74c
Container: telescope-ucsdnt-pcap-live
Object: datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz
Content Type: application/vnd.tcpdump.pcap
Content Length: 114818289707
Last Modified: Sun, 27 Sep 2020 11:14:51 GMT
ETag: "98212d49cf5e8b02f694b5d551c8ac29"
Manifest: .telescope-ucsdnt-pcap-live-segments/datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz/1601205047.078826/114818289707/1073741824/
Meta Mtime: 1601205047.078826
Accept-Ranges: bytes
Server: nginx
Connection: keep-alive
X-Timestamp: 1601205290.08723
X-Trans-Id: txaf4ce74e394048e2be429-005f7c15af
X-Openstack-Request-Id: txaf4ce74e394048e2be429-005f7c15af
- This gives statistics of the object in this container
- Content Length is the size of the file.
swift stat --lh also shows the file size in a more human-readable format. Example:
user@vm001:~$ swift stat --lh telescope-ucsdnt-pcap-live datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz
Account: 40914e7833c441859d1f0d08866bb74c
...
Content Type: application/vnd.tcpdump.pcap
...
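In a script it is often more convenient to have the exact byte count than the human-readable figure. The sketch below pulls the Content Length field out of swift stat output and converts a byte count to GiB. Both function names are hypothetical, and the parsing assumes the "Content Length: <bytes>" line format shown in the example above.

```shell
# Print the Content Length (in bytes) found in `swift stat` output on stdin.
# stat_content_length is a hypothetical helper.
stat_content_length() {
    awk -F': *' '/Content Length:/ { print $2; exit }'
}

# Convert a byte count to GiB (1 GiB = 1073741824 bytes), one decimal place.
# bytes_to_gib is a hypothetical helper.
bytes_to_gib() {
    awk -v bytes="$1" 'BEGIN { printf "%.1f GiB\n", bytes / 1073741824 }'
}
```

For the object above, swift stat <container name> <object name> | stat_content_length would print 114818289707, and bytes_to_gib 114818289707 prints 106.9 GiB.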
To process the file you will need to get the data to your VM because the file is stored on the Swift cluster, not locally.
The file is about 107 GiB (see Content Length above), which is more than the free space on your VM, so you cannot download it in full. You can check the storage size and availability of your VM with the following:
user@vm001:~$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        97G  3.0G   94G   4% /
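The comparison between object size and free space can also be made explicit. This is a minimal sketch, assuming the POSIX output format of df -Pk (available space in 1024-byte blocks in the fourth column); fits_on_disk is a hypothetical helper.

```shell
# Succeed only if an object of $1 bytes is smaller than the free space on
# the file system holding directory $2 (default: the current directory).
# fits_on_disk is a hypothetical helper.
fits_on_disk() {
    # df -P gives one header line and one data line; column 4 is the
    # available space in KiB because of -k.
    avail_kb=$(df -Pk "${2:-.}" | awk 'NR == 2 { print $4 }')
    [ "$1" -lt "$((avail_kb * 1024))" ]
}
```

With the numbers from this tutorial, fits_on_disk 114818289707 . fails, because 94G of free space is less than the size of the object.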
- For complete documentation of the Swift command-line tools, please check the OpenStack documentation.
Processing files stored on Swift storage
Because the VM does not have enough free space to hold the file, you can process the data as it is downloaded from the cluster instead of downloading the entire file first and then processing it.
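As a generic illustration of the idea, the stock Swift client can write a single object to standard output when the output file is given as -, so the data can be piped straight into another program. This is only a sketch of the streaming approach, not necessarily the method this tutorial recommends; stream_object is a hypothetical helper, and it assumes the object is gzip-compressed, as the .pcap.gz names suggest.

```shell
# Stream one object to standard output and decompress it on the fly;
# nothing is written to the local disk.
# $1 = container name, $2 = object name
# stream_object is a hypothetical helper.
stream_object() {
    swift download --output - "$1" "$2" | gzip -dc
}
```

The decompressed stream can then be read by any tool that accepts pcap data on standard input, for example stream_object <container name> <object name> | tcpdump -n -r - | head. Stopping the consumer early (as head does) also ends the download.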
These are the tools to do so: