The contents of this legacy page are no longer maintained nor supported, and are made available only for historical purposes.

The Nyxem Email Virus: Analysis and Inferences

An analysis by David Moore and Colleen Shannon of the spread of the Nyxem (or Blackworm or Kama Sutra or MyWife or CME 24) Virus in January and early February 2006. For more information contact info@caida.org.

Support for this work was provided by Cisco Systems, NSF, DHS, and CAIDA members.


Introduction


While email viruses and worms are a ubiquitous part of the online environment, Nyxem was relatively rare in that newly infected hosts connect once to a single website, providing a single source of information about the infected population.

Of more critical interest to those infected, the virus also contained a malicious payload designed to overwrite files with certain extensions on the 3rd of every month (beginning February 3, 2006). Affected file types include: .doc, .xls, .mdb, .mde, .ppt, .pps, .zip, .rar, .pdf, .psd, and .dmp.

We estimate that between 469,507 and 946,835 computers in more than 200 countries were infected by the Nyxem virus between Sunday January 15 23:40:54 UTC 2006 and Wednesday February 1 05:00:12 UTC 2006. At least 45,401 of the infected computers were also compromised by other forms of spyware or bot software.


Background


Virus Name

This virus has at least 17 names in active use.

Virus Details

The Nyxem virus is a 95 KB Visual Basic executable that infects a computer when an unwary user runs an executable email attachment. The virus also spreads to network shares mounted on an infected computer. After infecting a computer, it attempts to disable a variety of antivirus products and then looks for email addresses so that it can automatically spread itself using a variety of Subject fields and attachment names.

On the 3rd day of every month, the virus searches for files with 11 common file extensions (.doc, .xls, .mdb, .mde, .ppt, .pps, .zip, .rar, .pdf, .psd, and .dmp) on all available drives and overwrites them with the text string "DATA Error [47 0F 94 93 F4 K5]".

Nyxem Virus Spread


The spread of most email viruses is extremely difficult to track due to their spread mechanism -- using legitimate email addresses gathered from infected computers to spread to the people with whom virus victims normally interact. Unlike many email viruses, computers infected with Nyxem automatically generated a single HTTP request for the URL of an online statistics page. Each request for the statistics page was displayed on the page itself. Presumably this behavior was included in the virus so that the virus author could track its progress.

At first, this seems like the perfect vantage point from which to observe the spread of the virus. Examine the web logs, count up how many hits the page generated, and voila! Instant worm infection count. However, many non-virus factors artificially inflate the number of requests for the web page in question. First, there is baseline use -- the page is online for a reason, and people do view it. Next, as word of mouth and news coverage publicized the location of the web counter, many more people viewed the page to see the progress of the virus. Finally, numerous denial-of-service attacks were launched against the site. Because a full TCP connection must be set up to access the page, none of the denial-of-service attacks influencing the logs used spoofed IP addresses. It remains possible that additional denial-of-service attacks attempted to consume bandwidth or server resources and interfere with infected hosts contacting the counter. There is no way to assess this possibility with the data available to us.

Because each virus-infected host accessed the web counter only once, one approach to filtering the data would be to count only those IP addresses that appeared in the logs a single time. While this does not eliminate false positives from uninfected folks simply viewing the counter, it seems to eliminate repeat visitors and sources of denial-of-service attack traffic. However, many of the virus victims are behind web traffic aggregators such as Network Address Translation (NAT) devices and web proxy servers, which means that additional probes from the single IP address of the NAT or proxy server could represent additional infected computers. Dynamic addressing (typically DHCP) also renders the single-host-per-IP-address assumption invalid: a victim may hold an IP address when they are first infected and access the web counter, and another victim may later hold that same IP address when they become infected. These factors significantly complicate efforts to differentiate between virus-infected hosts and denial-of-service attacks or other network phenomena.

Many denial-of-service attacks use one tool deployed across many compromised computers (those in a botnet, for example). Connections generated by those tools tend to have many factors in common, including the browser type and referer strings. Using these characteristics combined with high traffic volume of sudden onset and cessation, we were able to eliminate many IP addresses generating denial-of-service attacks from the initial data (91.1% of all hits). Next, we removed any requests for pages other than the one accessed by the virus (0.2% of all hits). Then we removed any connection with a referer string (8.9% of non-DoS hits). The worm did not generate a referer string in initiating a connection to the web counter, and our investigation of the referer strings yielded many news articles and blogs that mentioned the web page, but no traffic that appeared to be representative of infected computers. Finally, we eliminated connections with browser types indicating operating systems that could not possibly represent infected hosts -- for example, MacOS, Unix, cell phone, and PDA devices (0.03% of all hits).
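The filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the LogEntry record, the function names, the counter path argument, and the list of user-agent substrings are all hypothetical, and the set of denial-of-service source addresses is assumed to have been identified separately (for example, from shared browser strings and abrupt traffic bursts).

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical, simplified web log record."""
    ip: str
    path: str
    referer: str      # empty string when the request carried no referer
    user_agent: str


# Illustrative substrings only (an assumption, not the authors' list):
# user agents that indicate platforms the virus could not infect.
NON_VULNERABLE_MARKERS = (
    "Macintosh", "Mac OS", "Linux", "X11", "SunOS",
    "Symbian", "PalmOS", "Windows CE",
)


def is_possibly_vulnerable(user_agent: str) -> bool:
    """True unless the user agent names a clearly non-vulnerable platform."""
    return not any(marker in user_agent for marker in NON_VULNERABLE_MARKERS)


def filter_candidate_probes(
    entries: Iterable[LogEntry],
    dos_ips: Set[str],
    counter_path: str,
) -> List[LogEntry]:
    """Apply the four filters in the order described in the text."""
    kept = []
    for entry in entries:
        if entry.ip in dos_ips:            # 1. known denial-of-service sources
            continue
        if entry.path != counter_path:     # 2. requests for other pages
            continue
        if entry.referer:                  # 3. the virus sent no referer
            continue
        if not is_possibly_vulnerable(entry.user_agent):  # 4. wrong platform
            continue
        kept.append(entry)
    return kept
```

Entries that survive all four filters are only candidates: as discussed below, a person viewing the counter once from a vulnerable operating system is indistinguishable from an infected host in this data.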

Next we turned our attention to the remaining sources of multiple connections. We compared the behavior of IP addresses with a single browser type sending multiple probes, IP addresses with multiple browser types that sent more probes than browser types, and those IP addresses for which the number of browser types exactly matched the number of connections to the web counter. As Figures 1 and 2 show, these groups all exhibit similar behavior, so the remaining IP addresses for which probes exceed browser types are likely to represent actual infections.
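As a rough sketch of this grouping, the function below assigns each IP address to one of the three categories compared above by counting its probes and its distinct browser strings. The function name and category labels are hypothetical, and treating IP addresses with exactly one probe as a separate group is an assumption made here for clarity.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_ips(probes: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each IP to a category, given (ip, browser_string) pairs."""
    probe_counts = defaultdict(int)
    browser_sets = defaultdict(set)
    for ip, browser in probes:
        probe_counts[ip] += 1
        browser_sets[ip].add(browser)

    categories = {}
    for ip, n_probes in probe_counts.items():
        n_browsers = len(browser_sets[ip])
        if n_probes == 1:
            # Not one of the three multi-probe groups (assumption).
            categories[ip] = "single_probe"
        elif n_browsers == n_probes:
            categories[ip] = "browsers_match_probes"
        elif n_browsers == 1:
            categories[ip] = "single_browser_multiple_probes"
        else:
            categories[ip] = "multiple_browsers_more_probes"
    return categories
```

Plotting the per-hour counts of each category separately is one way to check, as the figures do, whether the groups behave alike over time.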

At many times in the spread of the virus, particularly early on, probes from IP addresses representing many different browsers and more probes than browsers generate a significant portion of the traffic. Further investigation of these IP addresses did not show any pattern of behavior indicating an artificial distortion. We expect that many sites that use NATs and web proxies would exhibit this behavior as the virus spreads, particularly because individuals within an organization have email addresses of other individuals in their organizations on their computers, and people are likely to open attachments sent by their coworkers during work hours, thus providing conditions conducive to rapid spread of the virus within an organization. We hypothesize that a single browser type with multiple requests is much less common than other combinations because browser identification strings change to announce browser extensions and other installed software, and therefore vary even in centrally managed, relatively homogeneous populations of computers.

Figure 1 (Nyxem IP addresses over time): Behavior of three categories of IP addresses of computers infected with the Nyxem virus. IP addresses with the same number of unique browser types as probes appear to represent infected hosts, as the different browser identification on each probe uniquely identifies different computers appearing from the same IP address. Both IP addresses sourcing multiple browser types and a greater number of probes than browser strings and IP addresses with a single browser type but more than one probe may represent NAT or web proxy devices with many infected computers behind them, or they may represent dynamic IP addresses that happened to be assigned to a series of computers at the time those computers were infected by the virus. All three categories show similar behavior throughout the spread of the worm.

Figure 2 (Nyxem probes over time): Probes sent by the three categories of IP addresses shown in Figure 1. Additional probes generated by IP addresses that are likely NAT or web proxies cause a significant difference between IP addresses sourcing multiple browsers with more probes than browsers and IP addresses where the number of browsers and number of probes match. The overall behavior of all three categories appears similar, although spikes in probes from IP addresses that sent multiple browser types and more probes than browser types may indicate the compromise of many hosts within a single organization.

To generate our estimate of the total number of infections, we examine two values for each IP address: the number of unique, vulnerable browser types and the total number of probes received from that IP address. The former represents a lower bound on the number of infections, while the latter represents an upper bound. Note that we accept all instances in which a single IP address accessed the web counter one time with a possibly vulnerable operating system. There are likely some false positives consisting of people who viewed the web counter a single time but were not infected. It is not possible to distinguish this activity from that of a compromised computer with the available data, so this remains a source of bias potentially inflating our counts. There are plausible and legitimate reasons for a single IP address to generate many probes with the same browser string. For example, a set of identically configured, centrally managed computers behind a web proxy at a single organization would generate many probes with identical browser strings over time as the virus spread within the organization. Because many of these single IP addresses with fewer browser strings than probes access the web counter slowly over time, with diurnal patterns that closely match the infection spread in their geographic area, we believe that many of these repeats represent additional virus infections. We think that this influence at least counteracts the inflating effect of non-infected persons viewing the web counter a single time, and likely pushes the true infection count towards the mid-to-upper end of our estimated range. We estimate the total victim count to be between 469,507 and 946,835. This range represents between 3.2% and 6.4% of all log entries we examined.
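The bounding approach in this paragraph amounts to two sums over the filtered log. The sketch below assumes the input has already been reduced to (IP address, browser string) pairs for possibly vulnerable hosts; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def estimate_infection_bounds(
    probes: Iterable[Tuple[str, str]],
) -> Tuple[int, int]:
    """Return (lower_bound, upper_bound) on the number of infections.

    lower bound: unique browser strings per IP, summed over all IPs
    upper bound: total number of probes received
    """
    browsers_by_ip = defaultdict(set)
    total_probes = 0
    for ip, browser in probes:
        browsers_by_ip[ip].add(browser)
        total_probes += 1
    lower = sum(len(browsers) for browsers in browsers_by_ip.values())
    return lower, total_probes
```

For an IP address that sent three probes carrying two distinct browser strings, this contributes 2 to the lower bound and 3 to the upper bound; an IP address with a single probe contributes 1 to both.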

The figures below show what we believe to be new infections by IP address accessing the web counter and by total probes to the counter. Note that there is little difference in scale or shape of individual IP addresses versus overall probes received, indicating that few IP addresses involved in systemic attempts to inflate the count of infected hosts remain in the graphed data. Diurnal variations are readily apparent in the infection spread overall, and become more prominent and more closely tied to daytime (when most people check their email) when the IP addresses and probes are sorted by continent.

Figure 3 (hourly and cumulative totals of Nyxem IP addresses over time): New Nyxem infections every hour and cumulatively between Sunday January 15 23:40:54 UTC 2006 and Wednesday February 1 05:00:12 UTC 2006. This figure approximates the lower bound of our estimate of the number of infected computers.

Figure 4 (hourly and cumulative totals of Nyxem probes over time): Probes from Nyxem-infected computers between Sunday January 15 23:40:54 UTC 2006 and Wednesday February 1 05:00:12 UTC 2006. This figure approximates the upper bound of our estimate of the number of infected computers.

Figure 5 (Nyxem IP addresses by continent over time): New Nyxem infections by continent. Although the range of timezones spanned by most continents is large, this view is sufficient to show the diurnal patterns of increased infections during the day and evening hours when many people check their email and decreased infections at night. Note that South America (and Spanish-speaking countries throughout the Americas) do not demonstrate significant infection until four days after the infection rates in the rest of the continents have peaked.

Figure 6 (Nyxem probes by continent over time): Probes from computers infected with Nyxem across the continents. The diurnal cycles and variations within each day closely resemble those of infected IP addresses across every continent.

We mapped virus victims to a total of 201 countries.
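Hourly and cumulative series like those plotted in Figures 3 and 4 can be derived from probe timestamps along these lines. This is an illustrative sketch with a hypothetical function name; it assumes timestamps are datetime objects in UTC and simply omits hours in which nothing was observed.

```python
from collections import Counter
from datetime import datetime, timezone
from itertools import accumulate
from typing import Iterable, List, Tuple


def hourly_and_cumulative(
    timestamps: Iterable[datetime],
) -> List[Tuple[datetime, int, int]]:
    """Return (hour_start, count_in_hour, running_total) rows, oldest first."""
    # Truncate each timestamp to the start of its hour and count per hour.
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    hours = sorted(per_hour)
    counts = [per_hour[hour] for hour in hours]
    return list(zip(hours, counts, accumulate(counts)))


if __name__ == "__main__":
    sample = [
        datetime(2006, 1, 16, 0, 5, tzinfo=timezone.utc),
        datetime(2006, 1, 16, 0, 50, tzinfo=timezone.utc),
        datetime(2006, 1, 16, 2, 1, tzinfo=timezone.utc),
    ]
    for hour, count, total in hourly_and_cumulative(sample):
        print(hour.isoformat(), count, total)
```

Grouping the same timestamps by a continent label before bucketing would give the per-continent views, at the cost of mixing several timezones within each continent as noted in the Figure 5 caption.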

Virus Victims

The Nyxem virus depended on a user opening an email attachment to infect a computer. As this is the latest in a long string of similar viruses, its success indicates that user education measures intended to dissuade people from opening unexpected email attachments have not been sufficiently effective. 45,401 Nyxem victims (approximately ten percent of our conservative estimate) had concurrent spyware and/or botnet infections that were advertised in their browser string. Many more likely had concurrent infections that were not identifiable with the available data.

Interestingly, the geographic distribution of computers infected by the Nyxem virus differs significantly from general estimates of Internet usage in countries around the world. The geographic distribution of Nyxem-infected hosts also differs from that of random-spread Internet worms that do not require human intervention such as opening and running an executable attachment from an email. The virus disproportionately affected the Middle East and some countries of South America, particularly Peru.

We were highly suspicious that this unusual distribution represented denial-of-service activity or other influences unrelated to the true spread of the virus. However, intensive investigation of accesses to the web counter from these countries turned up only behavior completely consistent with the rest of the population that we believe to be infected: for example, similar distributions of browser types, and few differences between the number of infected IP addresses and the number of probes received.

We also explored the theory that computers were sourcing denial-of-service attack traffic from a dynamically addressed subnet. With worms and denial-of-service attacks in the past, we have observed hosts that generated a large volume of traffic to be repeatedly disconnected and reconnected with varying IP addresses. This behavior would yield relatively small numbers of probes per IP address across many IP addresses in a dynamic address pool. We did not observe any examples of this activity in the data.

One consistent regional variation we noted is that the virus does not achieve significant penetration into most of South and Central America until late in the day on January 20th and especially January 21st. This pattern differs from that of other countries around the world, which typically show the spread of the virus peaking between the 16th and 18th and tapering steadily thereafter. We are unsure of the reason for the delay in virus spread. The fact that the pattern appears unique to Spanish-speaking countries, and that it significantly affects Mexico (reversing a tapering pattern with an unusual surge in weekend activity) leads us to wonder whether a Spanish-language version of the virus began to spread on January 20th. It's also possible that the topological spread of the worm simply reached South and Central America later than the rest of the world. Graphs of all countries in the Americas with significant spread of the virus are available here. Graphs of all countries affected by the virus are available here. Graphs are generally named by ISO 3166 three-letter country code, and show activity by probe count, IP address, and /24 subnet over time.

The tables below show the top countries, Domain Name System (DNS) top-level domains, connection types, and DNS domains of infected computers. The top-level domains and domains closely match the country distributions -- the top-level domain spaces of highly-infected countries appear on both lists.

Country                 Minimum Estimate      Maximum Estimate
                        Count     Percent     Count     Percent
India                   151341    32.23       273013    28.83
Peru                    87599     18.65       150785    15.92
Italy                   38216     8.13        58002     6.12
Turkey                  28264     6.01        43437     4.58
United States           26315     5.6         58791     6.2
Egypt                   12201     2.59        25104     2.65
Malaysia                11160     2.37        19942     2.1
Indonesia               9323      1.98        21332     2.25
Greece                  8348      1.77        13684     1.44
Mexico                  5578      1.18        10341     1.09
Saudi Arabia            2519      0.53        51780     5.46
United Arab Emirates    1858      0.39        19371     2.04

Table 1: Nyxem victim geographic distribution by country. The table is ordered by minimum count, but includes the top ten entries for both minimum and maximum counts. The full listing of all countries is available here, and continents are available here.

TLD                     Minimum Estimate      Maximum Estimate
                        Count     Percent     Count     Percent
Unknown                 173510    36.95       367750    38.84
net                     77706     16.55       141308    14.92
pe                      71881     15.3        123960    13.09
it                      31367     6.68        45923     4.85
in                      25127     5.35        52818     5.57
com                     18516     3.94        39283     4.14
tr                      16162     3.44        24204     2.55
gr                      6766      1.44        10149     1.07
mx                      4818      1.02        9097      0.96
my                      3988      0.84        5950      0.62
id                      3952      0.84        8776      0.92
sa                      1950      0.41        45565     4.81

Table 2: Nyxem victim distribution by Top-Level Domain (TLD) based on DNS hostname lookups performed on February 2, 2006. The table is ordered by minimum count, but includes the top ten entries for both minimum and maximum count. The full listing of all top-level domains is available here.

Internet Connection     Minimum Estimate      Maximum Estimate
                        Count     Percent     Count     Percent
broadband               379058    80.73       796992    84.17
xdsl                    56071     11.94       91115     9.62
dialup                  17770     3.78        24443     2.58
cable                   10636     2.26        20782     2.19
t1                      5309      1.13        11290     1.19
satellite               663       0.14        2158      0.22

Table 3: Nyxem victim distribution by Connection Speed (as estimated by Digital Envoy's Netacuity product).

Domain                  Minimum Estimate      Maximum Estimate
                        Count     Percent     Count     Percent
Unknown                 173510    36.95       367750    38.84
net.pe                  70086     14.92       119416    12.61
touchtelindia.net       27121     5.77        46831     4.94
net.in                  23161     4.93        48358     5.1
interbusiness.it        20778     4.42        30528     3.22
net.tr                  14994     3.19        21581     2.27
sify.net                13714     2.92        19208     2.02
net.id                  3727      0.79        8235      0.86
com.mx                  3528      0.75        6479      0.68
eth.net                 3505      0.74        7703      0.81
net.my                  3483      0.74        4606      0.48
net.sa                  1950      0.41        45565     4.81

Table 4: Nyxem victim DNS domain based on hostname lookups performed on February 2, 2006. The table is ordered by minimum count, but includes the top ten entries for both minimum and maximum count. The full listing of all domains is available here.
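The row selection described in the table captions (ordered by minimum count, but including the top ten entries for both the minimum and maximum counts) can be sketched as a union of two rankings. The function name and input shape below are hypothetical and are offered only to make the selection rule concrete.

```python
from typing import Dict, List, Tuple


def top_entries_by_either_bound(
    rows: Dict[str, Tuple[int, int]], n: int = 10
) -> List[str]:
    """Select names in the top n by minimum OR maximum count.

    rows maps a name (country, TLD, domain) to (min_count, max_count).
    The result is ordered by minimum count, largest first.
    """
    by_min = sorted(rows, key=lambda name: rows[name][0], reverse=True)[:n]
    by_max = sorted(rows, key=lambda name: rows[name][1], reverse=True)[:n]
    selected = set(by_min) | set(by_max)
    return sorted(selected, key=lambda name: rows[name][0], reverse=True)
```

This explains why each of these tables can contain more than ten rows: an entry such as Saudi Arabia ranks low by minimum count but high by maximum count, so it is appended below the top ten by minimum count.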

Further Analysis


Conclusions

The Nyxem email virus is unusual in that each infected computer generated a single request for a web page. The global spread of email viruses is typically impossible to track given the directed, topological manner in which they spread. Thus Nyxem represented a rare opportunity to investigate the spread of an email virus. However, Nyxem also presented quite an analysis challenge, as legitimate (that is, non-virus-driven) access to the web page continued during virus spread -- particularly after the existence of the counter displayed on that web page became widely publicized. In addition, deliberate attempts to skew the counter results via denial-of-service attacks and other repeated probing further polluted the web logs. Despite these sources of error inflating the infection count, we believe that we have arrived at a reasonable, if somewhat less than optimally constrained, estimate of the total number of infected computers at between 469,507 and 946,835. At least 45,401 of the infected computers were also compromised by other forms of spyware or bot software that advertised themselves in the browser identification string.

In many ways, the Nyxem virus is nothing special. While it does carry a destructive payload, it follows a long history of destructive viruses and an almost equally long history of email viruses spread via people opening unexpected attachments. Social engineering is a tried-and-true technique for the malicious -- as the saying goes, "you can fool some of the people all of the time." Our estimates of the total number of victims of Nyxem are an order of magnitude less than estimates of the spread of other email viruses.

Much as it is near impossible to characterize the spread of most other email worms, it is impossible to catalog the damage caused by Nyxem. Extensive news coverage and coordinated efforts to notify potential victims resulted in the repair of many computers before files they contained were overwritten by the virus. Many other computers likely were not repaired, and thus had files deleted. The extent of the latter damage will likely never be known. File deletion is generally not an externally visible operation, and given the choice, large organizations generally avoid the potentially devastating damage to reputation (not to mention significant monetary losses) that comes with disclosing such a loss. On the other end of the spectrum, losing files can be devastating for home and small-business owners, but the scale of the losses is not considered newsworthy.

However, headlines such as "File-destroying worm causes little damage" belie a major portion of the cost of viruses like Nyxem. How many hours of time were spent trying to identify and notify owners of infected computers? How many hours of system administrator time, professional or otherwise, were spent disinfecting compromised machines? While lost data may affect only a subset of infected computers, every infected machine must be repaired at significant temporal and monetary cost. Further, it seems unwise to downplay the effects of the virus while it continues to spread. Most antivirus products now protect against Nyxem, but without the media coverage and active mitigation attempts, computers infected in the future seem more likely to lose data as the worm overwrites files on the third day of every month.

Overall, Nyxem provided a relatively rare opportunity to get an in-depth look at the global spread of an otherwise mundane email virus. Other forms of malicious software (viruses, worms, bot software) spread more quickly, more stealthily, and more widely. Many allow the theft of financial information from unsuspecting victims. Others allow long-term control of compromised computers for malicious purposes. Gaining significant ground towards secure networked computing will require progress in three major areas: software engineering to prevent security holes in the first place, mitigation techniques to minimize the damage caused by known and unknown security flaws in deployed software, and user education regarding the dangers of trusting unknown content.

Animations

The following animations, created by Bradley Huffaker with CAIDA's Cuttlefish tool, show the spread of the Nyxem virus around the world with emphasis on the diurnal patterns of the spread of the virus. For the top animation, the diameter of the circle is logarithmically scaled according to the number of infected computers at that location. In the bottom animation, the number of infected computers is shown with vertical bars. The latitude and longitude of each infected computer were identified using Digital Envoy's Netacuity tool.

Quicktime Movie (.mov) -- Try this image first; it will load in Quicktime, which is better adapted to playing large animations than most web browsers. Small (860 KB) Large (3.3 MB)
Animated Gif (.gif) -- Try this image if you cannot view the Quicktime movie file. Small (860 KB) Large (3.3 MB)

Worm resources

About the Authors

David Moore is the Technical Director of CAIDA and a Ph.D. candidate in the UCSD Computer Science Department. Colleen Shannon is a Senior Security Researcher at the Cooperative Association for Internet Data Analysis (CAIDA) at the San Diego Supercomputer Center (SDSC) at the University of California, San Diego (UCSD). David and Colleen also run the UCSD Network Telescope. The Network Telescope and associated security efforts are a joint project of the UCSD Computer Science and Engineering Department and the Cooperative Association for Internet Data Analysis. David and Colleen are both members of the Collaborative Center for Internet Epidemiology and Defenses (CCIED).

Acknowledgments

We would like to thank Gadi Evron, Paul Vixie, Joe Stewart, Mikko Hypponen, Swa Frantzen, Randy Vaughn, Chris Jackman, Jason Nealis, Rob Thomas, and Lorna Hutcheson for providing us with data and insight into the spread of the virus. Many thanks to kc claffy for feedback on this document.

This work was sponsored by:

Cooperative Association for Internet Data Analysis, University of California at San Diego, San Diego Supercomputer Center, Cisco Systems, National Science Foundation, U.S. Department of Homeland Security
Grants from Cisco Systems, the National Science Foundation (NSF), the Department of Homeland Security (DHS), and CAIDA members.


Additional Content

Data

Raw data collected from the spread of the Nyxem Virus in January and early February 2006.
