All of the test files contain real genetic data taken from GenBank (except
the spelling comparisons). The fli*.txt files come from bacterial flagellar
proteins, and ftsa.txt comes from a cell division protein. The stx*.txt
files contain the toxin gene of the "Jack-in-the-Box" E. coli that caused
illness and death in the western US in 1993 after it contaminated a batch
of hamburger meat. These sequences come from that E. coli strain and from
Shigella dysenteriae, the bacterium in which the toxin originated. The
ecoli*.txt files contain the first N base pairs (N is given in each
filename) of two E. coli strains: the laboratory strain, E. coli K12, and
the "Jack-in-the-Box" strain, E. coli O157:H7.
List of Files
-----------------------------------------------------------------------
Name: fli8.txt
Length: 10 and 8 bp
Protein: FliC (flagellar cap protein)
Species: E. coli K12 & Salmonella typhi
Notes: Tests gap insertion combined with mismatches
Name: fli9.txt
Length: 9 and 7 bp
Protein: FliC (flagellar cap protein)
Species: E. coli K12 & Salmonella typhi
Notes: Tests consecutive gaps; the best alignment is very easy to
recognize
Name: fli10.txt
Length: 10 and 10 bp
Protein: FliC (flagellar cap protein)
Species: E. coli K12 & Salmonella typhi
Notes: Tests mismatches only, with no gaps; the best alignment is easy to
recognize
Name: stx19.txt
Length: 19 and 19 bp
Protein: StxA and StxB (Shiga Toxin Subunits A&B)
Species: Shigella dysenteriae & E. coli O157:H7 (Jack-in-the-Box E. coli)
Notes: This file tests for gaps on each side
Name: stx26.txt
Length: 21 and 26 bp
Protein: StxA and StxB (Shiga Toxin Subunits A&B)
Species: Shigella dysenteriae & E. coli O157:H7
Notes: A longer pair of sequences with gaps, consecutive gaps, and
mismatches
Name: stx27.txt
Length: 21 and 27 bp
Protein: StxA and StxB (Shiga Toxin Subunits A&B)
Species: Shigella dysenteriae & E. coli O157:H7
Notes: A longer pair of sequences with consecutive gaps
Name: gene57.txt
Length: 57 and 56 bp
Protein: ?
Species: ?
Notes: Sequence contained in the assignment writeup
Name: ftsa.txt
Length: 1263 and 1272 bp
Protein: FtsA (A cell division protein)
Species: E. coli K12 and Caulobacter crescentus
Notes:
Name: stx1230.txt
Length: 1213 and 1230 bp
Protein: StxA and StxB (Shiga Toxin Subunits A&B)
Species: Shigella dysenteriae & E. coli O157:H7
Notes: This is the toxin gene behind the 1993 "Jack-in-the-Box" E. coli
outbreak in the western US, as found both in the E. coli itself and in
the original host of the virulence genes
Name: ecoli5000.txt
Length: 5000 and 5000 bp
Proteins: First 5000 bp of each genome
Species: E. coli K12 & E. coli O157:H7
Notes: The complete genomes of both bacteria are available in text format
on the University of Wisconsin website. The K12 genome is about 4.6 Mbp
long and the O157:H7 genome about 5.5 Mbp.
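The files ecoli2500.txt, ecoli3000.txt, ecoli7000.txt, ecoli8000.txt,
ecoli9000.txt, ecoli10000.txt, ecoli20000.txt, ecoli50000.txt,
ecoli100000.txt, ecoli500000.txt, and ecoli1000000.txt are built the same
way, from the first N bp of each genome, where N is the number in the
filename.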
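The ecoli*.txt files are prefixes of whole-genome text files. The sketch below shows one way such a prefix could be cut from a genome, assuming a FASTA-style layout (header lines start with '>', and the sequence may be wrapped over many lines). The function name `first_n_bases` is hypothetical, and the exact layout of the Wisconsin files is not described here, so treat this as an illustration only.

```python
from typing import List


def first_n_bases(genome_text: str, n: int) -> str:
    """Return the first n bases of a genome stored as plain text.

    Assumption: FASTA-style input, so lines starting with '>' are headers
    and are skipped. Whitespace, digits, and punctuation are ignored, and
    bases are upper-cased.
    """
    if n <= 0:
        return ""
    bases: List[str] = []
    for line in genome_text.splitlines():
        if line.startswith(">"):
            continue  # skip FASTA header lines
        for ch in line:
            if ch.isalpha():  # keep only letters (the bases)
                bases.append(ch.upper())
                if len(bases) == n:
                    return "".join(bases)  # stop once n bases are collected
    return "".join(bases)  # the genome was shorter than n
```

For example, `first_n_bases(">hdr\nagct\ntt\n", 5)` returns `"AGCTT"`. Calling it with n = 2500, 3000, 5000, and so on for each genome would yield the two sequences for the matching ecoliN.txt file.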
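Several Notes fields (fli8.txt, fli9.txt, stx26.txt, stx27.txt) describe alignments that need gaps, consecutive gaps, and mismatches. These describe what a global alignment has to handle. The following is a minimal Needleman-Wunsch sketch with a linear gap penalty. The scores MATCH = +1, MISMATCH = -1, and GAP = -1 are assumptions, since this README does not state the assignment's scoring scheme, and the function takes two strings rather than reading the test files, whose layout is not described here.

```python
from typing import List, Tuple

# Assumed scoring scheme (not taken from the assignment).
MATCH = 1
MISMATCH = -1
GAP = -1


def needleman_wunsch(a: str, b: str) -> Tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP  # a[:i] against leading gaps
    for j in range(1, m + 1):
        score[0][j] = j * GAP  # b[:j] against leading gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # match or mismatch
                score[i - 1][j] + GAP,      # gap in b
                score[i][j - 1] + GAP,      # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            MATCH if a[i - 1] == b[j - 1] else MISMATCH
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("GATTACA", "GCATGCU")
    print(s)
    print(x)
    print(y)
```

With these assumed scores, the textbook pair GATTACA / GCATGCU gets an optimal score of 0. Consecutive gaps, as in fli9.txt, just show up as runs of '-' in the traceback, because each gap is charged independently under a linear penalty.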
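The Notes for fli10.txt describe two equal-length sequences whose best alignment has mismatches but no gaps. In that case, the alignment is a plain position-by-position comparison. The following sketch scores such an ungapped alignment and counts its mismatches (the Hamming distance). The function names and the +1/-1 scores are assumptions for illustration. An ungapped alignment is not optimal in general; that depends on how heavily gaps are penalized.

```python
def ungapped_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Score two equal-length sequences aligned column by column (no gaps)."""
    if len(a) != len(b):
        raise ValueError("an ungapped alignment needs sequences of equal length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def mismatch_count(a: str, b: str) -> int:
    """Count the positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))
```

For example, `ACGTACGTAC` and `ACGTTCGTAA` differ at two positions. Their ungapped score under the assumed scheme is 8 matches minus 2 mismatches, which gives 6.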