The EST Annotation Machine FAQ and Help

What is the aim of the EST Annotation Machine?
This software is a tool for batch retrieval of the annotation of hundreds (or even thousands!) of EST or cDNA sequences that belong to UniGene clusters, as is often the case for cDNAs immobilized on DNA (cDNA or oligonucleotide) arrays. The aim is to give, as far as possible, a "functional" view of these sequences, at least for the ESTs belonging to UniGene clusters that have an associated protein similarity.

Where does the information come from?
The information is extracted from the annotation of the UniGene database or from the annotation present in sequence databases. Both UniGene and sequence databases are mirrored and updated at IFOM on a daily basis.

What are the output options?
The requested information is available in three alternative output formats: tab-delimited text file, MS-Excel-compatible text file, or HTML table. You can receive the output in the chosen format as a hyperlink (pointing to the result file on the server), as an email attachment for PC or Mac, or directly in the body of an email message. In the last case, save the email as text without headers; in the case of the hyperlink, save the Web page as text with a .txt extension. The MS-Excel-compatible text file uses a star (*) as the column delimiter and can be imported into Excel as a delimited text file. The HTML file can be viewed with any Web browser.
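If you process the MS-Excel-compatible file with a script instead of importing it into Excel, splitting each line on the star is enough. The sketch below is a minimal example under stated assumptions: one record per line, no header row, and no `*` inside any field value. The function name is hypothetical.

```python
def read_star_delimited(text):
    """Split the text of a star-delimited result file into rows of fields.

    Assumptions (not stated in this FAQ): one record per line, no header
    row, and no '*' characters inside field values.
    """
    rows = []
    for line in text.splitlines():  # handles both \n and \r\n line endings
        if line.strip():            # skip empty lines, e.g. at end of file
            rows.append(line.split("*"))
    return rows


# Usage (the file name is hypothetical):
# with open("results.txt") as fh:
#     rows = read_star_delimited(fh.read())
```

Empty fields are kept as empty strings, so every row keeps its column positions.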

What about the input? Are there any limitations?
You can paste your EST/cDNA/genomic accession numbers as a single column into the input window, or upload them from an external text file. The current limits are 1000 accession numbers or an accession number input file of 16 KB. These limits are intended to keep the amount of data manageable on your side. The software will retrieve any sequence from a given AC, provided that it is included in a UniGene cluster.
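Before submitting a large list, you may want to check it against these limits locally. This is a minimal sketch, not part of the service: the function name is hypothetical, "16 Kbytes" is assumed to mean 16 x 1024 bytes, and any line containing whitespace-separated tokens is treated as violating the single-column rule.

```python
MAX_ACCESSIONS = 1000        # limit stated in this FAQ
MAX_FILE_BYTES = 16 * 1024   # "16 Kbytes", assumed to mean 16 x 1024 bytes


def check_accession_input(text):
    """Return a list of problems found in an accession-number list.

    An empty list means the input is within the limits. Non-empty lines
    are counted as accession numbers; blank lines are allowed.
    """
    problems = []
    lines = [line.strip() for line in text.splitlines()]
    accessions = [line for line in lines if line]
    size = len(text.encode("utf-8"))

    if len(accessions) > MAX_ACCESSIONS:
        problems.append(f"{len(accessions)} accession numbers (limit {MAX_ACCESSIONS})")
    if size > MAX_FILE_BYTES:
        problems.append(f"{size} bytes (limit {MAX_FILE_BYTES})")
    # The input must be a single column: flag lines with more than one token.
    multi = [line for line in accessions if len(line.split()) > 1]
    if multi:
        problems.append(f"{len(multi)} line(s) with more than one column")
    return problems
```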

Can I leave blanks between my ACs so that the output will line up correctly in my Excel file?
If you leave a blank line between one AC and the next, it will be replaced by the null description "Blank" in the output. This happens when you upload the file of accession numbers, but NOT when you use the cut-and-paste window. Consequently, if you want this feature (which is handy for preserving the layout of your original Excel file), you must upload the accession numbers from a file.
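One way to prepare such an upload file from a spreadsheet column is to write one accession number per line and an empty line for every row without one. This is a minimal sketch; the function name and the placeholder values in the usage comment are hypothetical, and how the server treats trailing blank lines is not documented here.

```python
def format_upload_lines(accessions):
    """Build upload-file text with one accession number per line.

    `accessions` follows your spreadsheet's row order; None or an empty
    string marks a row without an accession and becomes an empty line,
    which, according to this FAQ, is reported as "Blank" in the output.
    """
    return "\n".join((ac or "").strip() for ac in accessions) + "\n"


# Usage (file name and accession values are hypothetical placeholders):
# column = ["EST_A", None, "EST_C"]
# with open("accessions.txt", "w", encoding="utf-8") as fh:
#     fh.write(format_upload_lines(column))
```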

How come it is not interactive?
Well, it is almost interactive. If you request annotation of 1000 EST sequences, there is a good chance that the operation will take long enough for your browser to disconnect from our server, and your job would be lost. At IFOM we rely on a distributed load system to balance the workload. In addition, the indexes are updated every day; non-interactive jobs are kept suspended in a queue until indexing has finished, which ensures that your jobs complete correctly.

What does the output look like?
Below you will find a sample output HTML table with all the annotation fields selected in the form. Information about cluster ID (2), gene name (3), cluster title (4), chromosome band mapping (5), representative EST (6), LocusLink identifier (7), tissue expression pattern (8), chromosome mapping (9) and protein similarities (10) is retrieved directly from the UniGene / UniEST database.
The annotation of protein similarities (11) is retrieved dynamically from the sequence databases. The keywords (12) are retrieved from the Swissprot (or Swissprot + PIR) proteins similar to the given EST sequence. In the HTML table, the LocusLink field is a hyperlink to the IFOM SRS LocusLink server, while in the MS-Excel format it is simply the raw LocusLink identifier.
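For scripted post-processing, the numbered fields can be given names. This sketch assumes that all annotation fields were selected, that each output row (for example, one line split on the star delimiter) lists the fields in the order of the numbers above, and that column 1 holds the submitted accession number; the FAQ does not say what column 1 contains. The labels and the function name are hypothetical.

```python
# Hypothetical labels keyed by the column numbers used in this FAQ.
FIELDS = {
    1: "accession",                      # assumption: the submitted AC
    2: "cluster_id",
    3: "gene_name",
    4: "cluster_title",
    5: "chromosome_band",
    6: "representative_est",
    7: "locuslink_id",
    8: "tissue_expression",
    9: "chromosome_mapping",
    10: "protein_similarities",          # from UniGene / UniEST
    11: "protein_similarity_annotation", # from the sequence databases
    12: "keywords",                      # from Swissprot (+ PIR) hits
}


def label_row(fields):
    """Pair the fields of one output row with the labels above.

    Assumes one value per numbered column, in order; a row with a
    different number of fields raises ValueError.
    """
    if len(fields) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(fields)}")
    return {FIELDS[i]: value for i, value in enumerate(fields, start=1)}
```

Raising on a length mismatch makes it obvious when a row does not follow the assumed layout, for instance if fewer fields were selected in the form.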

Who should I groan to if this tool does not work properly, or if I want more features?
This software was created by Alessandro Guffanti at the FIRC Institute of Molecular Oncology, Milano, Italy. Local sequence databases are maintained by Davide Cittaro.
Please do keep track of the job number you are given: if you send complaints without the correct job number, there is not much we can do to help!

A number of people, mainly biologists who work with DNA arrays, have contributed to the development of this software with their feedback. This is our hall of fame:
Miriam Alcalay, FIRC Institute of Molecular Oncology, Milano, Italy
Simone Minardi, FIRC Institute of Molecular Oncology, Milano, Italy
Maria Persico, University of Milano, Italy, Biotechnology Department
Daniela Riganelli, University of Perugia, Italy