The Nyxem Email Virus: Analysis and Inferences
Introduction
While email viruses and worms are a ubiquitous part of the
online environment, Nyxem was relatively rare in that newly
infected hosts connect once to a single website, providing a single
source of information about the infected population.
Of more critical interest to those infected, the virus also
contained a malicious payload designed to overwrite files with certain
extensions on the 3rd of every month (beginning February 3, 2006).
Affected file types include: .doc, .xls, .mdb, .mde, .ppt, .pps,
.zip, .rar, .pdf, .psd, and .dmp.
We estimate that between 469,507 and 946,835 computers in more
than 200 countries were infected by the Nyxem virus between Sunday January
15 23:40:54 UTC 2006 and Wednesday February 1 05:00:12 UTC 2006. At
least 45,401 of the infected computers were also compromised by
other forms of spyware or bot software.
Background
Virus Name
This virus has at least 17 names in active
use.
Virus Details
The Nyxem virus is a 95 KB Visual Basic executable that infects a
computer when an unwary user runs an executable email attachment.
The virus also spreads to network shares mounted on an infected
computer. After infecting a computer, it attempts to disable a
variety of antivirus products and then looks for email addresses to
automatically spread itself using a variety of Subject fields and
attachment names.
On the 3rd day of every month, the virus searches for files with
11 common file extensions (.doc, .xls, .mdb, .mde, .ppt, .pps,
.zip, .rar, .pdf, .psd, and .dmp) on all available drives and
replaces them with the text string "DATA Error [47 0F 94 93 F4
K5]".
Nyxem Virus Spread
The spread of most email viruses is extremely difficult to track due
to their spread mechanism -- using legitimate email addresses
gathered from infected computers to spread to the people with whom
virus victims normally interact. Unlike many email viruses, computers
infected with Nyxem automatically generated a single HTTP request for
the URL of an online statistics page. Each request for the statistics
page was displayed on the page itself. Presumably this behavior was
included in the virus so that the virus author could track its
progress.
At first, this seems like the perfect vantage point from which to
observe the spread of the virus. Examine the web logs, count up how
many hits the page generated, and voila! Instant worm infection
count. However, many non-virus factors artificially inflate the number
of requests for the web page in question. First, there is baseline use
-- the page is online for a reason, and people do view it. Next, as
word of mouth and news coverage publicized the location of the web
counter, many more people viewed the page to see the progress of the
virus. Finally, numerous denial-of-service attacks were launched
against the site. Because a full TCP connection must be set up to
access the page, none of the denial-of-service attacks influencing the
logs used spoofed IP addresses. It remains possible that additional
denial-of-service attacks attempted to consume bandwidth or server
resources and interfere with infected hosts contacting the counter.
There is no way to assess this possibility with the data available to us.
Because each virus-infected host accessed the web counter only once,
one approach to filtering the data would be to count only those IP
addresses which appeared in the logs a single time. While this does
not eliminate false positives of uninfected folks simply viewing the
counter, it seems to eliminate repeat visitors and sources of
denial-of-service attack traffic. However, many of the virus victims
are behind web traffic aggregators such as Network Address Translation
(NAT) and web proxy servers, which means that additional probes from
the single IP address of the NAT or proxy server could represent
additional infected computers. Dynamic addressing (typically DHCP)
will also render the single-host-per-IP-address assumption invalid,
e.g., a victim may have an IP address when they are first infected and
access the web counter, and then later another victim may have that
same IP address when they become infected. These factors significantly
complicate efforts to differentiate between virus-infected hosts and
denial-of-service attacks or other network phenomena.
Many denial-of-service attacks use one tool deployed across many
compromised computers (those in a botnet, for example). Connections
generated by those tools tend to have many factors in common, including
the browser type and referer strings. Using these characteristics
combined with high traffic volume of sudden onset and cessation, we
were able to eliminate many IP addresses generating denial-of-service
attacks from the initial data (91.1% of all hits). Next, we removed
any requests for pages other than the one accessed by the virus (0.2%
of all hits). Then we removed any connection with a referer string
(8.9% of non-DoS hits). The worm did not generate a referer string in
initiating a connection to the web counter, and our investigation of
the referer strings yielded many news articles and blogs that mentioned
the web page, but no traffic that appeared to be representative of
infected computers. Finally, we eliminated connections with browser
types indicating operating systems that could not possibly represent
infected hosts -- for example, MacOS, Unix, cell phone, and PDA devices
(0.03% of all hits).
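The filtering order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scripts used for the analysis: the `Hit` record, the `TARGET_PATH` value, and the browser-string markers are hypothetical, and the set of denial-of-service sources is taken as an input because identifying it relied on traffic-volume patterns that a per-request filter cannot capture.

```python
from dataclasses import dataclass

# Hypothetical values: the real counter URL and the exact browser-string
# patterns used in the analysis are not given in this document.
TARGET_PATH = "/counter.html"
NON_VULNERABLE_MARKERS = ("Macintosh", "Mac_PowerPC", "Linux", "SunOS", "PalmOS", "Symbian")


@dataclass(frozen=True)
class Hit:
    ip: str
    path: str
    referer: str      # empty string when the request carried no Referer header
    user_agent: str


def filter_hits(hits, dos_sources):
    """Keep only hits that could plausibly come from an infected host."""
    kept = []
    for hit in hits:
        if hit.ip in dos_sources:      # 1. sources already identified as denial-of-service traffic
            continue
        if hit.path != TARGET_PATH:    # 2. pages the virus never requested
            continue
        if hit.referer:                # 3. the virus sent no referer string
            continue
        if any(marker in hit.user_agent for marker in NON_VULNERABLE_MARKERS):
            continue                   # 4. operating systems the virus could not infect
        kept.append(hit)
    return kept
```

Applying the cheapest and most selective filter first (the denial-of-service sources, which accounted for most hits) keeps the later string comparisons to a small fraction of the log.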
Next we turned our attention to the remaining sources of multiple
connections. We compared the behavior of IP addresses with a single
browser type sending multiple probes, IP addresses with multiple
browser types that sent more probes than browser types, and those IP
addresses for which the number of browser types exactly matched the
number of connections to the web counter. As Figures 1 and 2 show,
these groups all exhibit similar behavior, so the remaining IP
addresses for which web counter connections exceed browser types are
likely to represent actual infections.
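The three categories compared above can be derived from per-address counts of probes and distinct browser strings. The sketch below assumes the filtered log has been reduced to `(ip, browser_string)` pairs; the function name and category labels are made up for illustration.

```python
from collections import defaultdict


def categorize(probes):
    """Label each IP address by how its probe count relates to its browser types.

    probes: iterable of (ip, browser_string) pairs, one per request.
    Labels:
      'matched'  - number of distinct browser strings equals number of probes
      'one_type' - a single browser string seen on more than one probe
      'mixed'    - several browser strings, but more probes than strings
    """
    probe_count = defaultdict(int)
    browsers = defaultdict(set)
    for ip, browser in probes:
        probe_count[ip] += 1
        browsers[ip].add(browser)

    labels = {}
    for ip, n_probes in probe_count.items():
        n_types = len(browsers[ip])
        if n_types == n_probes:
            labels[ip] = "matched"
        elif n_types == 1:
            labels[ip] = "one_type"
        else:
            labels[ip] = "mixed"
    return labels
```

An address that appears only once falls into `matched`, since one probe carries exactly one browser string.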
At many times in the spread of the virus, particularly early on,
probes from IP addresses representing many different browsers and more
probes than browsers generate a significant portion of the traffic.
Further investigation of these IP addresses did not show any pattern of
behavior indicating an artificial distortion. We expect that many
sites that use NATs and web proxies would exhibit this behavior as the
virus spreads, particularly because individuals within an organization
have email addresses of other individuals in their organizations on
their computers, and people are likely to open attachments sent by
their coworkers during work hours, thus providing conditions conducive
to rapid spread of the virus within an organization. We hypothesize
that a single browser type with multiple requests is much less common
than other combinations because browser identification strings change
to announce browser extensions and other installed software, and
therefore vary even in centrally managed, relatively homogeneous
populations of computers.
Figure 1:
Behavior of three categories of IP addresses of computers
infected with the Nyxem virus. IP addresses with the same
number of unique browser types as probes appear to represent
infected hosts, as the different browser identification on each
probe uniquely identifies different computers appearing from
the same IP address. Both IP addresses sourcing multiple
browser types and a greater number of probes than browser
strings and IP addresses with a single browser type but more
than one probe may represent NAT or web proxy devices with many
infected computers behind them, or they may represent dynamic
IP addresses that happened to be assigned to a series of
computers at the time those computers were infected by the
virus. All three categories show similar behavior throughout
the spread of the worm.
Figure 2:
Probes sent by the three categories of IP addresses shown in
Figure 1. Additional probes generated by IP addresses that are
likely NAT or web proxies cause a significant difference
between IP addresses sourcing multiple browsers with more
probes than browsers and IP addresses where the number of
browsers and number of probes match. The overall behavior of
all three categories appears similar, although spikes in probes
from IP addresses that sent multiple browser types and more
probes than browser types may indicate the compromise of many
hosts within a single organization.
To generate our estimate of the total number of infections, we
examine two values for each IP address: the number of unique,
vulnerable browser types and the total number of probes received from
that IP address. The former represents a lower bound on the number of
infections, while the latter represents an upper bound. Note that we
accept all instances in which a single IP address accessed the web
counter one time with a possibly vulnerable operating system. There
are likely some false positives consisting of people who viewed the web
counter a single time but were not infected. It is not possible to
distinguish this activity from that of a compromised computer with the
available data, so this remains a source of bias potentially inflating
our counts. There are plausible and legitimate reasons for a single IP
address to generate many probes with the same browser string. For
example, a set of identically configured, centrally managed computers
behind a web proxy at a single organization would generate many probes
with identical browser strings over time as the virus spread within the
organization. Because many of these single IP addresses with fewer
browser strings than probes access the web counter slowly over time,
with diurnal patterns that closely match the infection spread in their
geographic area, we believe that many of these repeats represent
additional virus infections. We think that this influence at least
counteracts the inflating effect of non-infected persons viewing the
web counter a single time, and likely pushes the true infection count
towards the mid-to-upper end of our estimated range. We estimate
the total victim count to be between 469,507 and 946,835. This
range represents between 3.2% and 6.4% of all log entries we examined.
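As a rough sketch of how the two bounds relate to the log data, the lower bound sums the distinct browser strings seen from each IP address and the upper bound counts every probe. The input format and function name here are assumptions for illustration, and the input is assumed to be already filtered to possibly vulnerable hosts.

```python
from collections import defaultdict


def infection_bounds(probes):
    """Return (lower, upper) infection estimates.

    probes: iterable of (ip, browser_string) pairs that survived filtering.
    lower: sum over IP addresses of distinct browser strings from that address.
    upper: total number of probes.
    """
    browsers = defaultdict(set)
    total_probes = 0
    for ip, browser in probes:
        browsers[ip].add(browser)
        total_probes += 1
    lower = sum(len(types) for types in browsers.values())
    return lower, total_probes
```

An address that probes once contributes one infection to both bounds, which is why single-view visitors who were not infected inflate both ends of the range equally.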
The figures below show what we believe to be new infections by IP
address accessing the web counter and by total probes to the counter.
Note that there is little difference in scale or shape of individual IP
addresses versus overall probes received, indicating that few IP
addresses involved in systemic attempts to inflate the count of
infected hosts remain in the graphed data. Diurnal variations are
readily apparent in the infection spread overall, and become more
prominent and more closely tied to daytime (when most people check
their email) when the IP addresses and probes are sorted by
continent.
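Hourly and cumulative series of the kind plotted in the figures can be produced by binning event timestamps by hour. This is a minimal sketch with made-up timestamps; the function name is hypothetical and the original plotting pipeline is not described in this document.

```python
from collections import Counter
from datetime import datetime, timezone
from itertools import accumulate


def hourly_counts(timestamps):
    """Bin timezone-aware UTC datetimes by hour.

    Returns (hours, per_hour, cumulative). Hours with no events are
    omitted rather than reported as zero.
    """
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    hours = sorted(buckets)
    per_hour = [buckets[h] for h in hours]
    return hours, per_hour, list(accumulate(per_hour))


# Made-up timestamps, for illustration only.
sample = [
    datetime(2006, 1, 15, 23, 40, 54, tzinfo=timezone.utc),
    datetime(2006, 1, 15, 23, 59, 0, tzinfo=timezone.utc),
    datetime(2006, 1, 16, 0, 5, 0, tzinfo=timezone.utc),
]
hours, per_hour, cumulative = hourly_counts(sample)
print(per_hour, cumulative)  # prints [2, 1] [2, 3]
```

Feeding in one timestamp per counted infection would approximate the lower-bound view, while feeding in every probe would approximate the upper-bound view. Grouping the timestamps by continent before binning is what makes the diurnal pattern easier to see.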
Figure 3: New Nyxem
infections every hour and cumulatively between Sunday January
15 23:40:54 UTC 2006 and Wednesday February 1 05:00:12 UTC
2006. This figure approximates the lower bound of our estimate
of the number of infected computers.
Figure 4: Probes from
Nyxem-infected computers between Sunday January 15 23:40:54 UTC
2006 and Wednesday February 1 05:00:12 UTC 2006. This figure
approximates the upper bound of our estimate of the number of
infected computers.
Figure 5: New
Nyxem infections by continent. Although the range of timezones
spanned by most continents is large, this view is sufficient to
show the diurnal patterns of increased infections during the
day and evening hours when many people check their email and
decreased infections at night. Note that South America (and
Spanish-speaking countries throughout the Americas) do not
demonstrate significant infection until four days after the
infection rates in the rest of the continents have peaked.
Figure 6:
Probes from computers infected with Nyxem across the
continents. The diurnal cycles and variations within each day
closely resemble those of infected IP addresses across every
continent. We mapped virus victims to a total of 201 countries.
Virus Victims
The Nyxem virus depended on a user opening an email
attachment to infect a computer. As this is the latest in a long
string of similar viruses, its success indicates that user education
measures intended to dissuade people from opening unexpected email
attachments have not been sufficiently effective. 45,401 Nyxem
victims (approximately ten percent of our conservative estimate) had
concurrent spyware and/or botnet infections that were advertised in
their browser string. Many more likely had concurrent infections that
were not identifiable with the available data.
Interestingly, the geographic distribution of computers infected by
the Nyxem virus differs significantly from general estimates of
Internet usage in countries around the world. The geographic
distribution of Nyxem-infected hosts also differs from that of
random-spread Internet worms that do not require human intervention
such as opening and running an executable attachment from an email.
The virus disproportionately affected the Middle East and some
countries of South America, particularly Peru.
We were highly suspicious that this unusual distribution represented
denial-of-service activity or other influences unrelated to the true
spread of the virus. However, intensive investigation of accesses to
the web counter from these countries turned up only behavior completely
consistent with the rest of the population that we believe to be
infected. For example, we observed similar distributions of browser
types and few differences between the number of infected IP addresses
and the number of probes received.
We also explored the theory that computers were sourcing
denial-of-service attack traffic from a dynamically addressed subnet.
With worms and denial-of-service attacks in the past, we have observed
hosts that generated a large volume of traffic to be repeatedly
disconnected and reconnected with varying IP addresses. This behavior
would yield relatively small numbers of probes per IP address across
many IP addresses in a dynamic address pool. We did not observe any
examples of this activity in the data.
One consistent regional variation we noted is that the virus does
not achieve significant penetration into most of South and Central
America until late in the day on January 20th and especially January
21st. This pattern differs from that of other countries around the
world, which typically show the spread of the virus peaking between the
16th and 18th and tapering steadily thereafter. We are unsure of the
reason for the delay in virus spread. The fact that the pattern
appears unique to Spanish-speaking countries, and that it significantly
affects Mexico (reversing a tapering pattern with an unusual surge in
weekend activity) leads us to wonder whether a Spanish-language version
of the virus began to spread on January 20th. It's also possible that
the topological spread of the worm simply reached South and Central
America later than the rest of the world. Graphs of all countries in
the Americas with significant spread of the virus are available here. Graphs of all countries affected by the virus are
available here. Graphs are generally named by ISO
3166 three-letter country code, and show activity by probe count, IP
address, and /24 subnet over time.
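Grouping source addresses by /24 subnet, as the per-country graphs do, is also one way to look for the dynamic-address pattern discussed above: many addresses in one subnet, each with only a few probes. The following sketch uses Python's standard `ipaddress` module; the function names are made up for illustration and assume IPv4 dotted-quad input.

```python
import ipaddress
from collections import Counter


def subnet24(ip):
    """Return the /24 network containing an IPv4 address, e.g. '192.0.2.0/24'."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def probes_per_subnet(ips):
    """Count probes per /24 subnet for an iterable of source addresses."""
    return Counter(subnet24(ip) for ip in ips)
```

Comparing the per-subnet probe count with the number of distinct addresses seen in that subnet gives a quick indication of whether traffic is concentrated on a few hosts or spread thinly across an address pool.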
The charts below show the top countries, Domain Name Service (DNS)
top-level domains, DNS domains, and connection types of infected
computers. The top-level domains and domains closely match the country
distributions -- the top-level domain spaces of highly-infected
countries appear on both lists.
| Country | Minimum Count | Minimum Percent | Maximum Count | Maximum Percent |
| --- | --- | --- | --- | --- |
| India | 151341 | 32.23 | 273013 | 28.83 |
| Peru | 87599 | 18.65 | 150785 | 15.92 |
| Italy | 38216 | 8.13 | 58002 | 6.12 |
| Turkey | 28264 | 6.01 | 43437 | 4.58 |
| United States | 26315 | 5.6 | 58791 | 6.2 |
| Egypt | 12201 | 2.59 | 25104 | 2.65 |
| Malaysia | 11160 | 2.37 | 19942 | 2.1 |
| Indonesia | 9323 | 1.98 | 21332 | 2.25 |
| Greece | 8348 | 1.77 | 13684 | 1.44 |
| Mexico | 5578 | 1.18 | 10341 | 1.09 |
| Saudi Arabia | 2519 | 0.53 | 51780 | 5.46 |
| United Arab Emirates | 1858 | 0.39 | 19371 | 2.04 |

Table 1: Nyxem victim geographic distribution by country. The chart is ordered by minimum count, but includes the top ten entries for both minimum and maximum counts. The full listing of all countries is available here, and continents are available here.

| TLD | Minimum Count | Minimum Percent | Maximum Count | Maximum Percent |
| --- | --- | --- | --- | --- |
| Unknown | 173510 | 36.95 | 367750 | 38.84 |
| net | 77706 | 16.55 | 141308 | 14.92 |
| pe | 71881 | 15.3 | 123960 | 13.09 |
| it | 31367 | 6.68 | 45923 | 4.85 |
| in | 25127 | 5.35 | 52818 | 5.57 |
| com | 18516 | 3.94 | 39283 | 4.14 |
| tr | 16162 | 3.44 | 24204 | 2.55 |
| gr | 6766 | 1.44 | 10149 | 1.07 |
| mx | 4818 | 1.02 | 9097 | 0.96 |
| my | 3988 | 0.84 | 5950 | 0.62 |
| id | 3952 | 0.84 | 8776 | 0.92 |
| sa | 1950 | 0.41 | 45565 | 4.81 |

Table 2: Nyxem victim distribution by Top-Level Domain (TLD) based on DNS hostname lookups performed on February 2, 2006. The table is ordered by minimum count, but includes the top ten entries for both minimum and maximum count. The full listing of all top-level domains is available here.

| Internet Connection | Minimum Count | Minimum Percent | Maximum Count | Maximum Percent |
| --- | --- | --- | --- | --- |
| broadband | 379058 | 80.73 | 796992 | 84.17 |
| xdsl | 56071 | 11.94 | 91115 | 9.62 |
| dialup | 17770 | 3.78 | 24443 | 2.58 |
| cable | 10636 | 2.26 | 20782 | 2.19 |
| t1 | 5309 | 1.13 | 11290 | 1.19 |
| satellite | 663 | 0.14 | 2158 | 0.22 |

Table 3: Nyxem victim distribution by Connection Speed (as estimated by Digital Envoy's Netacuity product).

| Domain | Minimum Count | Minimum Percent | Maximum Count | Maximum Percent |
| --- | --- | --- | --- | --- |
| Unknown | 173510 | 36.95 | 367750 | 38.84 |
| net.pe | 70086 | 14.92 | 119416 | 12.61 |
| touchtelindia.net | 27121 | 5.77 | 46831 | 4.94 |
| net.in | 23161 | 4.93 | 48358 | 5.1 |
| interbusiness.it | 20778 | 4.42 | 30528 | 3.22 |
| net.tr | 14994 | 3.19 | 21581 | 2.27 |
| sify.net | 13714 | 2.92 | 19208 | 2.02 |
| net.id | 3727 | 0.79 | 8235 | 0.86 |
| com.mx | 3528 | 0.75 | 6479 | 0.68 |
| eth.net | 3505 | 0.74 | 7703 | 0.81 |
| net.my | 3483 | 0.74 | 4606 | 0.48 |
| net.sa | 1950 | 0.41 | 45565 | 4.81 |

Table 4: Nyxem victim DNS domain based on hostname lookups performed on February 2, 2006. The table is ordered by minimum count, but includes the top ten entries for both minimum and maximum count. The full listing of all domains is available here.

Further Analysis
Conclusions
The Nyxem email virus is unusual in that each infected
computer generated a single request for a web page. The global spread
of email viruses is typically impossible to track given the directed,
topological manner in which they spread. Thus Nyxem represented a rare
opportunity to investigate the spread of an email virus. However, Nyxem
also presented quite an analysis challenge, as legitimate (that is,
non-virus-driven) access to the web page continued during virus spread
-- particularly after the existence of the counter displayed on that
web page became widely publicized. In addition, deliberate attempts to
skew the counter results via denial-of-service attacks and other
repeated probing further polluted the web logs. Despite these sources
of error inflating the infection count, we believe that we have arrived
at a reasonable, if somewhat less than optimally constrained, estimate
of the total number of infected computers at between 469,507 and
946,835. At least 45,401 of the infected computers were also
compromised by other forms of spyware or bot software that advertised
themselves in the browser identification string.
In many ways, the Nyxem virus is nothing special. While it does
carry a destructive payload, it follows a long history of destructive
viruses and an almost equally long history of email viruses spread via
people opening unexpected attachments. Social engineering is a
tried-and-true technique for the malicious -- as the saying goes, "you
can fool some of the people all of the time." Our estimates of the
total number of victims of Nyxem are an order of magnitude less than
estimates of the spread of other email viruses.
Much as it is near impossible to characterize the spread of most
other email worms, it is impossible to catalog the damage caused by
Nyxem. Extensive news coverage and coordinated efforts to notify
potential victims resulted in the repair of many computers before files
they contained were overwritten by the virus. Many other computers
likely were not repaired, and thus had files deleted. The extent of
the latter damage will likely never be known. File deletion is
generally not an externally visible operation, and given the choice,
large organizations generally avoid the potentially devastating damage
to reputation (not to mention significant monetary losses) that comes
with disclosing such a loss. On the other end of the spectrum, losing
files can be devastating for home and small-business owners, but the
scale of the losses is not considered newsworthy.
However, headlines such as "File-destroying
worm causes little damage" belie a major portion of the cost of
viruses like Nyxem. How many hours of time were spent trying to
identify and notify owners of infected computers? How many hours of
system administrator time, professional or otherwise, were spent
disinfecting compromised machines? While lost data may affect only a
subset of infected computers, every infected machine must be repaired
at significant temporal and monetary cost. Further, it seems unwise to
downplay the effects of the virus while it continues to spread. Most
antivirus products now protect against Nyxem, but without the media
coverage and active mitigation attempts, computers infected in the
future seem more likely to lose data as the worm deletes files on the
third day of every month.
Overall, Nyxem provided a relatively rare opportunity to get an
in-depth look at the global spread of an otherwise mundane email
virus. Other forms of malicious software (viruses, worms, bot
software) spread more quickly, more stealthily, and more widely. Many
allow the theft of financial information from unsuspecting victims.
Others allow long-term control of compromised computers for malicious
purposes. Gaining significant ground towards secure networked
computing will require progress in three major areas: software
engineering to prevent security holes in the first place, mitigation
techniques to minimize the damage caused by known and unknown security
flaws in deployed software, and user education regarding the dangers of
trusting unknown content.
Animations
The following animations, created by Bradley Huffaker with CAIDA's Cuttlefish tool, show the
spread of the Nyxem virus around the world with emphasis on the
diurnal patterns of the spread of the virus. For the top animation,
the diameter of the circle is logarithmically scaled according to the
number of infected computers at that location. In the bottom
animation, the number of infected computers is shown with vertical
bars. The latitude and longitude of each infected computer were
identified using Digital Envoy's Netacuity tool.
- Quicktime Movie (.mov) -- Try this image first; it will load in Quicktime, which is better adapted to playing large animations than most web browsers. Small (860 KB), Large (3.3 MB)
- Animated Gif (.gif) -- Try this image if you cannot view the Quicktime movie file. The large version is likely to make your web browser use a lot of memory and run very slowly. Small (860 KB), Large (3.3 MB)
Acknowledgments
We would like to thank Gadi Evron, Paul Vixie, Joe Stewart, Mikko
Hypponen, Swa Frantzen, Randy Vaughn, Chris Jackman, Jason Nealis, Rob
Thomas, and Lorna Hutcheson for providing us with data and insight into
the spread of the virus. Many thanks to kc claffy for feedback on this
document.
More Information
- Nyxem/Blackworm/Kama Sutra/MyWife Virus
- How Many Names Does This Virus Need, Anyway?
- Virus Code and Function
- Other Analysis
- Previous Internet Worm Studies
About the Authors:
David Moore is the Technical Director of CAIDA and a Ph.D. candidate in the UCSD Computer Science Department.
Colleen Shannon is a Senior Security Researcher at the Cooperative Association for Internet Data
Analysis (CAIDA) at the San Diego
Supercomputer Center (SDSC) at the University of California, San Diego
(UCSD). David and Colleen also run the UCSD Network Telescope. The
Network Telescope and associated security efforts are a joint project
of the UCSD Computer Science and Engineering Department and the
Cooperative Association for Internet Data Analysis. David and Colleen
are both members of the Collaborative
Center for Internet Epidemiology and Defenses (CCIED).
This work was sponsored by:
Grants from Cisco Systems, the National Science Foundation (NSF),
the Department of Homeland
Security (DHS), and CAIDA
members.