NAME

AffyDB - An interface for insert affymetrix, licr annotation tables and UCSC exons maps into a Entity Relationship mySQL database and make queries on this data.


SYNOPSIS

        # INSERT TABLES
        # AffyDB scripts that create a new set of Tables from affymetrix file annot_csv,
        # create licr information tables starting for Licr file, make index  for speed up query,
        # and generate a Report and disconnect from mySQL ...
        use AffyDB;
        my $affytable = AffyDB -> new (
                mysql     => 'affy:localhost',
                user      => 'foo',
                password  => 'bar',
                affyfile  => 'Mouse430_2_annot.csv',
                licrfile  => 'Mouse430_2.RefSeq',
                chip      => 'Mouse430_2',
                index     => "1",
                summary   => "1"
        );
        $affytable -> affy_disconnect();
        
        
        # ADD EXON MAP
        use AffyDB;
        my $affytable = AffyDB -> new (
                mysql        => 'affy:localhost',
                user         => 'foo',
                password     => 'bar',
                refseqfile   => 'RefSeqExons.txt',
                refseqcode   => 'HumanRefSeq1',
                chip         => 'test'
        );
        $affytable -> affy_disconnect();
        
        
        # RETRIVE INFORMATION
        # GET'S methods used to retrive information from mySQL tables
        # Retrive probeset informatoion:
        use AffyDB;
        my $affytable = AffyDB -> new (
                mysql     => 'affy:localhost',
                user      => 'foo',
                password  => 'bar',
        )
        
        # get single probeset INFO
        my $probe = '1415670_at';
        my @probe = $affydb -> get_probeset($probe);
        my $chip = 'Mouse Genome 430 2.0 Array';
        print "Probe is @probe\n";
        
        # get all probesets of a chip
        my @probes = $affydb -> get_all_probeset($chip);
        print join ("\n" , @probes);
        
        $affytable -> affy_disconnect();


ABSTRACT

This perl library uses perl5 objects to make it easy to create and query a mySQL improved version of Affymetrix annotation tables. This package defines AffyDB objects, attributes and arguments. Using a AffyDB object's methods, you can insert new tables and retrive data. Using this module with Probeset.pm analysis module is possible to generate original data about single probes position on the genome.

AffyDB.pm provides a simple object-oriented interface to mySQL tables.

The current version of AffyDB.pm is available at

http://bio.ifom-ieo-campus.it/splicy/src/AffyDB.pm


DESCRIPTION

CONSTRUCTOR AND INITIALIZATION

new affytable
my $affytable = AffyDB -> new (

        mysql     => 'affy:localhost',
        user      => 'foo',
        password  => 'bar',
        affyfile  => 'Mouse430_2_annot.csv',
        chip      => 'Mouse430_2',
        summary   => "1",
        index     => "1"

);

If you pass to the module an affyfile (format is csv: comma separeted values), this file will be inserted into the database. With summary you can ask for an extensive summary of the insertion and of the duplication avoided. With index we say to AffyDB.pm to generate index on tables (speed up queries).

new licr table
my $licrtable = AffyDB -> new (

        mysql       => 'affy:localhost',
        user        => 'foo',
        password    => 'bar',
        licrfile    => 'HG-U133A.RefSeq',
        chip        => 'HG-U133A',
        summary     => "1",
        index       => "1"

);

As for affy annotation tables, if you give a licr file as argument (format is ssv: semicolon separeted values), the file will be inserted into the database.

new query connection
my $affydb = AffyDB -> new (

        mysql     => 'affy:localhost',
        user      => 'foo',
        password  => 'bar',
        chip      => 'HG_U133A'

)

Creation of a new connection without insertion of a new file (for retrive queries relatives to chipcode: HG_U133A). Note: mySQl don't like minus (-) in the table names. For this getChipName method switch (-) to (_).

new index creation
my $affydb = AffyDB -> new (

        mysql => 'affy:localhost',
        chip  => 'test',
        index => '1'

);

In this mode, mySQl index for speed up are created:

``CREATE INDEX design_index ON design_table (public_id)''
``CREATE INDEX probes ON probes_table (set_id)''
``CREATE INDEX probes2licr ON probes_table (licr_id)''
``CREATE INDEX affyinfo ON affyinfo_table (set_id)''
``CREATE INDEX affynote ON affynote_table (public_id)''
``CREATE INDEX affyrefseq ON affynote_table (refseq_tran)''
``CREATE INDEX affyfunc ON affyfunc_table (public_id)''
``CREATE INDEX licr ON licrinfo_table (licr_id)''
``CREATE INDEX licr2unigene ON licrinfo_table (unigene)''
``CREATE INDEX probeset ON main_table (set_id)''
``CREATE INDEX set2public_id ON main_table (public_id)''

ATTRIBUTES-ARGUMENTS AND OTHER INFOS

mysql
Mysql is the name and host of the mySQL database in format: 'database:host'

user and password
mySQL username and password to connect to database

dbh (database handler)
Current databade handler. Is set with the getDBhandle method.

affyfile
affyfile is the affy annotation table as csv file (comma separeted values). If give this attribute to the module parse the file and try to insert information into the mySQL database. The subroutine check headers check table headers integrity and exit if find some differences.

licrfile
licrfile is the licr annotation file as ssv file (semicolon separeted values). If give this attribute to the module parse the file and try to insert information into the mySQL database.

refseqfile
refseqfile is the UCSC exon map as TAB-separeted file. If give this attribute to the module parse the file and try to insert information into the mySQL database.

chip
NOTE: into the mySQL tables genechip name are LONG (Mouse Genome 430 2.0 Array) while THIS chipcode affect the name of the probe and design tables for that specific genechip. CONVENTION: as chipcode you should use the name of the file without suffix: ``_annot.csv''.

EXAMPLE: table => 'MOE430A_annot.csv', chip => 'MOE430A'

Cause mySQL doesn't like '-' into the table name, The module will substitute '-' with '_' (underscore). So chip names like HG-U133A are changed to HG_U133A.

refseqcode
Use refseqcode to call Exon map table (as chipcode).

summary
(1 or 0) If you want an extensive summary of database creation and information about duplicates into the csv table put 1 here.

index
Set this argument to 1 if you want to make some index to accellerate SQL queries (BETTER)

CLASS AND OBJECT METHODS

getDBhandle
Return the current dbh (DBI database handler)

setDBhandle
Set the current DBI handler

get_affyfile
Return file name of the affymetric annotation table

get_licrfile
Return file name of the Licr annotation table

get_refseqfile
Return file name of the UCSC exon map

get_refseqcode
Return the RefSeq code for table creation and queries.

getChipName
Get the chipcode, if chipcode contain '-' (minus) substitute with '_' (underscore)

getSummary
Check if an extensive summary are asked from the user

affy_disconnect
Disconnect from the database ($dbh -> disconnect of the DBi module) use always this method to disconnect from AffyDB mySQL DB

get_index
Check if creations of index for this chip is required, used to call method create_index

GET METHODS

freequery
Prepare and execute a query from the script caller (SQL syntax)

Es:

        my $query = <STDIN>;
        chomp $query;
        my @result = $affydb -> freequery($query);
        print join ("\n", @result);

get_probeset
Prepare and execute a query to retrive probeset_id ($array[0]), genechip ($array[1]) and relative affymetrix public_id ($array[2]) and probeset description ($array[3]), cluster ($array[4]), assignments ($array[5]), notes ($array[6]) starting from a probeset id (Es: 1007_s_at)

Es:

        my $probe = '1415670_at';
        my @probe = $affydb -> get_probeset($probe);
        print "Probe is @probe\n";

get_matching_probeset
Prepare and execute a query to retrive ONLY the probesets IDs that match for a specific ACC (public_id of the affymetrix annotation tables)

Es:

        my $public_id = 'NM_013477';
        my @probes = $affydb -> get_matching_probeset ($public_id);
        print join ("\n", @probes);

get_all_probeset
Prepare and execute a query to retrive all probesets IDs (ES: 1007_s_at) starting from a chip name.

Es:

        my $chip = 'Mouse Genome 430 2.0 Array';
        my @probes = $affydb -> get_all_probeset($chip);
        print join ("\n" , @probes);

get_probeset_design
Prepare and execute a query to retrive design affymetrix information of a specific probeset.
        $info[0] = public_id
        $info[1] = seq_type
        $info[2] = seq_source
        $info[3] = target_des
        $info[4] = arch_unigene
        $info[5] = trans_id
        $info[6] = description
        $info[7] = cluster
        $info[8] = assignments
        $info[9] = notes

Es:


        my @info = $affydb -> get_probeset_design ($probe);
        print join ("\n",@info);

get_chip
Prepare and execute a query to retrive chip information starting from the chipcode.
        $info[0] = chipcode
        $info[1] = genechip_name
        $info[2] = organism
        $info[3] = annotation_date

Es:

        my @info = $affydb -> get_chip ($chip);
        print "GENECHIP info are:\n";
        print join ("\n",@info);

get_oligos
Prepare and execute a query to retrive ALL DISTINCT probes of a specific probeset. (Relation One to Many)

Format String:


        PROBE_ID: SET_ID | X:N | Y:M | ACGTGCGTGTGTGTACGCGCGAA

You will retrive an array that contains a list of rows with this format.

get_oligos_html
Prepare and execute a query to retrive ALL DISTINCT probes of a specific probeset. (Relation One to Many)

Format HTML:

        <tr><td>(X,Y) </td><td>ACGCGCGTGCAGCAGCGCAGCATGACGA</td></tr>"

get_probes
Prepare and execute a query to retrive ALL probes of a specific probeset. (WITH REDUNDANCY)

Format String:

        PROBE_ID: SET_ID | X:N | Y:M | ACGTGCGTGTGTGTACGCGCGAA

You will retrive an array that contains a list of rows with this format.

get_locations
Prepare and execute a query to retrieve ALL matching locations for a specific Probeset (One to many). Location are pushed ito an array of array:
        '_locations' => [
                                           [
                                                '1007_s_at',
                                                 '1',
                                                'NM_001954',
                                                '[3678..3702]3840',
                                                '(+)'
                                                ],
                                        ]
        

get_probe_info
Prepare and execute a query to retrive licr information about a list of single probes.
        $info[0] = probe_id
        $info[1] = licr_id
        $info[2] = x
        $info[3] = y
        $info[4] = oligo
        $info[5] = set_id
        $info[6] = position
        $info[7] = strand

get_distinct_licr
Prepare and execute a query to retrive ALL distinct RefSeq that match for this probeset (licr tables). Information are pushed in a array of array as locations.

get_distinct_licr_html
Prepare and execute a query to retrive ALL distinct RefSeq that match for this probeset (licr tables). Information are stored in a HTML table.

CSS
table class: licr
td class: head and value

get_affy_alignments
Prepare and execute a query to retrive Alignments information (affymetrix tables).

FORMAT STRING:

	GENOME VERSION: @$row[1] 
	ALIGNMENTS:     @$row[2]

get_affy_alignments_html
Prepare and execute a query to retrive Alignments information (affymetrix tables).

FORMAT STRING:

	GENOME VERSION: @$row[1]
	ALIGNMENT:      @$row[2]

get_affy_note
Prepare and execute a query to retrive Affymetrix annotation.

FORMAT STRING:

	PUBLIC ID:    @$row[0] 
	GENE_SYMBOL:  @$row[1] 
	GENE_TITLE:   @$row[2] 
	CHR_LOCATION: @$row[3] 
	UNIGENE:      @$row[4] 
	ENSEMBL:      @$row[5] 
	LOCUSLINK:    @$row[6] 
	SWISSPROT:    @$row[7] 
	EC:           @$row[8] 
	OMIM:         @$row[9] 
	REFSEQ_PROT:  @$row[10] 
	REFSEQ_TRAN:  @$row[11] 
	FLYBASE:      @$row[12]  
	AGI:          @$row[13] 
	WORMBASE:     @$row[14] 
	MGI:          @$row[15] 
	RGD:          @$row[16] 
	SGD:          @$row[17]

get_affy_note_html
Prepare and execute a query to retrive Affymetrix annotation.

FORMAT STRING:

	PUBLIC ID:    @$row[0]
	GENE_SYMBOL:  @$row[1]
	GENE_TITLE    @$row[2]
	CHR_LOCATION: @$row[3]
	UNIGENE:      @$row[4]
	UNIGENE_TYPE: @$row[5]
	ENSEMBL:      @$row[6]
	LOCUSLINK:    @$row[7]
	SWISSPROT:    @$row[8]
	EC:           @$row[9]
	OMIM:         @$row[10]
	REFSEQ_PROT:  @$row[11]
	REFSEQ_TRAN:  @$row[12]
	FLYBASE:      @$row[13]
	AGI:          @$row[14]
	WORMBASE:     @$row[15]
	MGI:          @$row[16]
	RGD:          @$row[17]
	SGD:          @$row[18]

get_affy_func
Prepare and execute a query to retrive Affymetrix functional annotation.

FORMAT STRING:

        PUBLIC ID: @$row[0] 
	GO BIO:    @$row[1] 
	GO CELL:   @$row[2] 
	GO MOL:    @$row[3] 
	PATHWAY:   @$row[4]  
        PROT FAM:  @$row[5] 
	PROT DOM:  @$row[6] 
	INTERPRO:  @$row[7] 
	MEMBRANE:  @$row[8] 
	QTL:       @$row[9]

get_affy_func_html
Prepare and execute a query to retrive Affymetrix functional annotation.

FORMAT STRING:

        PUBLIC ID: @$row[0]
        GO BIO:    @$row[1]
        GO CELL:   @$row[2]
        GO MOL:    @$row[3]
        PATHWAY:   @$row[4]
        PROT FAM:  @$row[5]
        PROT DOM:  @$row[6]
        INTERPRO:  @$row[7]
        MEMBRANE:  @$row[8]
        QTL:       @$row[9]

get_refseq_map
Prepare and execute a query to retrive specific map for a RefSeq code (ACC). As locations, refseq_map is an array of array.

get_headers
Take the headers (first line) of the affymetrix annotation table. Es:
my @headers = $affydb -> get_headers();
print join (``\n'', @headers);


BUGS

Please report them!


FILES

AffyDB.pm


SEE ALSO

Probeset.pm


AUTHOR

Davide Rambaldi, IFOM-FIRC www.ifom-firc.it e-mail: filter-drambald@ifom-ieo-campus.it


COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.