Swift - STARDUST
Swift
To store raw traffic traces and flow-level traces, STARDUST uses cloud-based Swift object storage. The following tutorial will show you how to access and process objects in Swift.
Intro
“OpenStack Object Storage (swift) is used for redundant, scalable data storage using clusters of standardized servers to store petabytes of accessible data. It is a long-term storage system for large amounts of static data which can be retrieved and updated. Object Storage uses a distributed architecture with no central point of control, providing greater scalability, redundancy, and permanence. Objects are written to multiple hardware devices, with the OpenStack software responsible for ensuring data replication and integrity across the cluster. Storage clusters scale horizontally by adding new nodes. Should a node fail, OpenStack works to replicate its content from other active nodes.” (from: https://docs.openstack.org/swift/latest/admin/objectstorage-intro.html)
Basic concepts
To understand the basic concepts of account (also known as project), container, and object, please check the overview in the OpenStack documentation.
Authenticating with Swift
Log into your VM.
Find your credential file (its name and location may differ from machine to machine).
user@vm001:~$ ls -a
user@vm001:~$ cat .limbo_cred
export OS_PROJECT_NAME=telescope
export OS_USERNAME=userxxx
export OS_PASSWORD=xxxxx
export OS_AUTH_URL=https://hermes-auth.caida.org
export OS_IDENTITY_API_VERSION=3
- This file is used to set some environment variables that allow your Swift client to authenticate with the Swift cluster.
To source the file, which loads the variables into your environment (note the leading dot followed by a space):
user@vm001:~$ . .limbo_cred
To check that the variables are set in the environment, you can use:
user@vm001:~$ echo $OS_PROJECT_NAME
telescope
When you use the Swift command:
user@vm001:~$ swift auth
it uses the environment variables to authenticate with Swift.
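If authentication fails, a common cause is that the credential file was not sourced in the current shell. The sketch below is one way to check this; the function name missing_swift_vars is hypothetical, and the variable names are simply the ones exported by the credential file shown above. It prints only the names of missing variables, never their values.

```shell
# Print the name of every expected variable that is unset or empty.
# The list mirrors the exports in the credential file shown above.
missing_swift_vars() {
    for name in OS_PROJECT_NAME OS_USERNAME OS_PASSWORD OS_AUTH_URL OS_IDENTITY_API_VERSION; do
        # Indirect lookup: copy the value of the variable called $name into $value.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
        fi
    done
}

missing=$(missing_swift_vars)
if [ -n "$missing" ]; then
    echo "Not set (did you source the credential file?):"
    echo "$missing"
fi
```

If nothing is printed, all five variables are present and swift auth should be able to read them.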
Accessing objects in containers
If you run swift with no arguments, it will give you a list of possible commands.
user@vm001:~$ swift
Running swift list with no argument will list all the containers:
user@vm001:~$ swift list
One of the containers, telescope-ucsdnt-pcap-live, contains the raw pcap files for the last 5 weeks.
To list the contents of the container:
user@vm001:~$ swift list <container name>
- Note: Since Swift containers store many files, it is not recommended to list all the files in a container.
Example
user@vm001:~$ swift list telescope-ucsdnt-pcap-live | head
datasource=ucsd-nt/year=2020/month=09/day=01/hour=07/ucsd-nt.1598943600.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=08/ucsd-nt.1598947200.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=09/ucsd-nt.1598950800.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=10/ucsd-nt.1598954400.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=11/ucsd-nt.1598958000.pcap.gz
...
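Each object name in this listing is a sequence of key=value components separated by slashes, followed by the file name. As a minimal sketch (the helper name object_field is hypothetical), a single component can be pulled out of a name with standard tools:

```shell
# object_field NAME KEY: print the value of the KEY=value component of NAME.
object_field() {
    # Put each slash-separated component on its own line, then pick the
    # line whose part before "=" matches the requested key.
    printf '%s\n' "$1" | tr '/' '\n' | awk -F= -v key="$2" '$1 == key { print $2 }'
}

name='datasource=ucsd-nt/year=2020/month=09/day=01/hour=07/ucsd-nt.1598943600.pcap.gz'
object_field "$name" day    # prints 01
object_field "$name" hour   # prints 07
```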
Swift does not have the concept of a hierarchy the way the UNIX file system does, but you can have Swift present a pseudo-hierarchy by setting a delimiter:
user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d /
datasource=ucsd-nt/
- Set the delimiter to slash (/). The listing then shows each name only up to the first slash (/).
- With the delimiter set, the objects in the container are listed as if they were in a hierarchy whose levels are separated by a slash (/).
Navigate down the hierarchy by including a prefix:
user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d / -p datasource=ucsd-nt/
datasource=ucsd-nt/year=2020/
- The prefix datasource=ucsd-nt/ lists everything that has that prefix, delimited by a slash (/). The next element should be year=2020.
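To make the interaction between delimiter and prefix concrete, the following sketch imitates it locally on a plain list of names. It is not how Swift implements listings, only an illustration of the rule: keep the names that start with the prefix, cut each one after the first slash that follows the prefix, and print each result once. The function name pseudo_ls is hypothetical.

```shell
# pseudo_ls PREFIX: read object names on stdin and print what a listing
# with delimiter "/" and the given prefix would show.
pseudo_ls() {
    awk -v prefix="$1" '
        substr($0, 1, length(prefix)) == prefix {
            rest = substr($0, length(prefix) + 1)   # part after the prefix
            cut = index(rest, "/")                  # first delimiter, if any
            entry = (cut > 0) ? (prefix substr(rest, 1, cut)) : $0
            if (!(entry in seen)) {
                seen[entry] = 1
                print entry
            }
        }'
}

# Sample names taken from the listing earlier in this tutorial.
names='datasource=ucsd-nt/year=2020/month=09/day=01/hour=07/ucsd-nt.1598943600.pcap.gz
datasource=ucsd-nt/year=2020/month=09/day=01/hour=08/ucsd-nt.1598947200.pcap.gz'

printf '%s\n' "$names" | pseudo_ls ''
printf '%s\n' "$names" | pseudo_ls 'datasource=ucsd-nt/year=2020/month=09/day=01/'
```

The first call prints the single entry datasource=ucsd-nt/, and the second prints one entry per hour.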
Continue to navigate down the hierarchy. This will list the months:
user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d / -p datasource=ucsd-nt/year=2020/
datasource=ucsd-nt/year=2020/month=09/
datasource=ucsd-nt/year=2020/month=10/
Now list the days:
user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d / -p datasource=ucsd-nt/year=2020/month=09/
datasource=ucsd-nt/year=2020/month=09/day=01/
datasource=ucsd-nt/year=2020/month=09/day=02/
datasource=ucsd-nt/year=2020/month=09/day=03/
datasource=ucsd-nt/year=2020/month=09/day=04/
datasource=ucsd-nt/year=2020/month=09/day=05/
...
To look for the file for 9 AM on the 27th:
user@vm001:~$ swift list telescope-ucsdnt-pcap-live -d / -p datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/
datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz
This type of hierarchy is useful because there are so many objects in the container.
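In the names listed above, the number in the file name is a UNIX timestamp that falls on the start of the hour in UTC (for example, 1601197200 is 2020-09-27 09:00:00 UTC). Assuming every object follows this pattern, which is inferred from the examples rather than documented here, the full object name can be derived from a timestamp. The function name pcap_object_name is hypothetical, and the sketch relies on GNU date for the -d @EPOCH syntax.

```shell
# pcap_object_name EPOCH: print the object name for the hour containing EPOCH.
# Assumption: one object per hour, named after the UTC start of that hour.
pcap_object_name() {
    ts=$1
    hour_start=$(( ts - ts % 3600 ))   # round down to the start of the hour
    dir=$(date -u -d "@$hour_start" '+datasource=ucsd-nt/year=%Y/month=%m/day=%d/hour=%H')
    printf '%s/ucsd-nt.%s.pcap.gz\n' "$dir" "$hour_start"
}

pcap_object_name 1601197200
```

The printed name can be passed to swift stat to confirm that the object exists.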
To check how many objects are in the container:
user@vm001:~$ swift list <container name> | wc -l
838
Example
user@vm001:~$ swift list telescope-ucsdnt-pcap-live | wc -l
838
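A single total says little about how the objects are spread over time. As a sketch, the listing can also be summarised per day by grouping on the year, month, and day components of each name. The function name count_per_day is hypothetical, and it assumes the name layout shown in the listing above.

```shell
# count_per_day: read object names on stdin, print "year=/month=/day= COUNT".
count_per_day() {
    # Fields 2 to 4 of a slash-separated name are year=, month= and day=.
    awk -F/ '{ count[$2 "/" $3 "/" $4]++ }
             END { for (d in count) print d, count[d] }' | sort
}

# Sample names taken from the listing earlier in this tutorial.
printf '%s\n' \
    'datasource=ucsd-nt/year=2020/month=09/day=01/hour=07/ucsd-nt.1598943600.pcap.gz' \
    'datasource=ucsd-nt/year=2020/month=09/day=01/hour=08/ucsd-nt.1598947200.pcap.gz' \
    | count_per_day
```

On the VM, the same function can read the output of swift list for a container instead of the sample names.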
Once you have the file you want to process, get information about the file by using the stat command:
user@vm001:~$ swift stat <container name> <object name>
Example
user@vm001:~$ swift stat telescope-ucsdnt-pcap-live datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz
Account: 40914e7833c441859d1f0d08866bb74c
Container: telescope-ucsdnt-pcap-live
Object: datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz
Content Type: application/vnd.tcpdump.pcap
Content Length: 114818289707
Last Modified: Sun, 27 Sep 2020 11:14:51 GMT
ETag: "98212d49cf5e8b02f694b5d551c8ac29"
Manifest: .telescope-ucsdnt-pcap-live-segments/datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz/1601205047.078826/114818289707/1073741824/
Meta Mtime: 1601205047.078826
Accept-Ranges: bytes
Server: nginx
Connection: keep-alive
X-Timestamp: 1601205290.08723
X-Trans-Id: txaf4ce74e394048e2be429-005f7c15af
X-Openstack-Request-Id: txaf4ce74e394048e2be429-005f7c15af
- This gives statistics of the object in this container
- Content Length is the size of the file in bytes.
- Use swift stat --lh to also see the file size in a more human-readable format.
Example
user@vm001:~$ swift stat --lh telescope-ucsdnt-pcap-live datasource=ucsd-nt/year=2020/month=09/day=27/hour=09/ucsd-nt.1601197200.pcap.gz
Account: 40914e7833c441859d1f0d08866bb74c
...
Content Type: application/vnd.tcpdump.pcap
...
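The numbers in the stat output can also be worked out by hand. The sketch below converts the Content Length to GiB and estimates how many segments the object is stored in. It assumes that the last two numeric components of the Manifest path (114818289707 and 1073741824) are the total size and the segment size in bytes; the first matches the Content Length and the second is exactly 1 GiB, but this reading is an assumption. The function names are hypothetical.

```shell
# segment_count SIZE SEGMENT_SIZE: number of segments needed, rounding up.
segment_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# bytes_to_gib BYTES: size in GiB (2^30 bytes) with one decimal place.
bytes_to_gib() {
    awk -v bytes="$1" 'BEGIN { printf "%.1f\n", bytes / 1073741824 }'
}

segment_count 114818289707 1073741824   # prints 107
bytes_to_gib 114818289707               # prints 106.9
```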
To process the file you will need to get the data to your VM because the file is stored on the Swift cluster, not locally.
A file of this size (about 107 GiB in the example above) is too large to download in full, because it exceeds the space available on your VM. You can check the total and available storage on your VM with the following:
user@vm001:~$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 97G 3.0G 94G 4% /
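The comparison between the object size and the free space can be scripted. This is a minimal sketch: the function name fits_on_disk is hypothetical, the size is the Content Length from the stat example above, and df -Pk is used so that the available space is reported in KiB on a single line.

```shell
# fits_on_disk SIZE_BYTES AVAIL_KIB: succeed only if SIZE_BYTES fits.
fits_on_disk() {
    [ "$1" -le $(( $2 * 1024 )) ]
}

# Available space on the current filesystem, in KiB (fourth column of df -Pk).
avail_kib=$(df -Pk . | awk 'NR == 2 { print $4 }')

if fits_on_disk 114818289707 "${avail_kib:-0}"; then
    echo "object fits in the available space"
else
    echo "object is larger than the available space"
fi
```

With the 94G available in the output above, the check fails for this object.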
- For complete documentation of the Swift command-line tools, please check the OpenStack documentation.
Processing files stored on Swift storage
Because there is not enough space available on the VM to hold an entire file, you can process the data as it streams from the cluster rather than downloading the whole file first and then processing it.
These are the tools to do so: