It became clear towards the end of the data collection that the storage array (sa1) connected to the in1 collection server would fill up before all data was submitted. sshd on in1 was therefore shut down, and the new storage volume was mounted via NFS. Synchronization of data between the old and new storage array volumes was then attempted; this required incoming data submission via sshd to remain shut down while it proceeded.
This impacted DSC submission as well, as it was difficult to separate out the DITL/PCAP and DSC upload services.
The decision was taken to migrate uploads to the in2 server, which had more up-to-date and stable OS software.
Some spare space was found, and uploads were briefly re-enabled.
Data synchronization failed, needed to be restarted.
Downtime for oarc.isc.org website while migrating.
in1 server crashed with data sync incomplete, causing NFS mounts on the an1 server to hang.
Disabled uploads and restarted the data sync.
Re-enabled PCAP uploads to in2 server, sync continuing.
DSC uploads still disabled at this point.
Loss of public.oarci.net website from in1.
Fixed NFS problems to an1 analysis server, re-enabled DSC uploads.
public.oarci.net drupal website restored on in1.
Various crashes of in2 server. DSC cron jobs suspected as a possible cause - temporarily disabled these and began migrating DSC to use PostgreSQL for data storage.
JFS problem on new fd1 file server, blocked NFS mounts to an1.
Crash of fd1 file server, caused in2 to hang. Power cycled fd1.
Further crash of fd1 file server, blocked NFS mounts to an1 and in2.
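The mount options in use are not stated in the report; by default NFS clients use "hard" mounts, which retry indefinitely, so a crashed server such as fd1 blocks its clients (an1, in2) until it returns. A common mitigation, sketched below with a hypothetical server name, export, and mount point, is a soft, interruptible mount so client I/O returns an error instead of hanging:

```
# Hypothetical /etc/fstab entry on an NFS client of fd1:
#   soft    - fail I/O with an error after retrans retries, instead of hanging
#   intr    - allow signals to interrupt blocked NFS operations
#   timeo   - tenths of a second to wait before retrying a request
fd1:/export/data  /data  nfs  soft,intr,timeo=30,retrans=3  0  0
```

Soft mounts trade hangs for possible I/O errors on in-flight operations, so they are generally better suited to read-mostly data than to active writes.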
Installed latest standard non-SMP 2.6.16.27-0.6-default kernel on fd1, re-mounted filesystem on an1.
Postgres problems on in2 - needed reboot.
Further in2 crash. Restored, then shut down in2 and fd1 to upgrade BIOS firmware. Loss of an1 NFS mount again, fixed on 26th.
in2 server OS software upgraded to FreeBSD 6.2.
Brief downtime of sa1 RAID controller for performance tuning.
DDoS attack against various root servers - requirement for more space to store PCAP attack data as well as DITL data.
DSC service re-started after migration from in1 to in2 server.
Further loss of an1 NFS mount.
Crash of in2 due to lock violation.
fd1 server and NFS access to it down for 70 minutes for successful scheduled upgrade of storage array.
Various crashes of in2 server.
A longer-term resolution was achieved by replacing the disks in the old array volume with further higher-capacity new disks, and then on 2nd March re-partitioning and merging this into a single new volume with a total capacity of 4.5TB.
The OARC filesystem is now at 36% of available capacity, including both DITL data to date and additional data collected during the DDoS attack on 6/7th March.