
Pfam VM documentation

VM Summary

Installing the Pfam website from scratch can be a tricky and time-consuming task. To make it easier for users to install and run the site locally, we have built a virtual machine (VM) with everything necessary pre-installed and, as far as possible, pre-configured.

The following pages explain how to install the virtual machine in your local virtualisation software and then configure it to run the website. With the exception of the "Install VM" section, you can find exactly the same information in a set of text files in the home directory of the main "pfamadmin" user account within the VM itself.

Note that this VM encapsulates only the code needed to run the Pfam website. It does not include the Pfam database, since that runs to several hundred gigabytes when installed and running. You will need to retrieve and install the database separately. The VM also does not include the code needed to run sequence and other searches through the website. That functionality requires another set of scripts and software and, in our case, an entirely separate set of machines. We'll look at building another VM for that task later, if there's interest.

Setting up the VM

These are the basic steps involved:

  1. Install the virtual machine image
  2. Set up the virtual machine itself
  3. Install the Pfam release database and create the "web_user" database
  4. Configure the PfamWeb application
  5. Set up the cron scripts
  6. Start PfamWeb and lighttpd
  7. Enable monit to make the site start on boot

Machine specification

Operating system      Ubuntu 11.10 (oneiric) (64 bit)
Last package update   2012-01-24
CPU cores             1
Memory                1032 Mb
Hard drive            20 Gb (virtual size, dynamically allocated)
OVA file              PfamWeb_20120124.ova (1.0Gb) (md5 checksum 9777f4723a4a3f65ffa2ac8911a96115)

Install the VM image

This page gives some pointers on how to install the virtual machine image in your local virtualisation software. The description covers the process in two of the most popular packages, VMware Fusion (for Mac) and VirtualBox (for Mac, Linux and Windows).

Note: the VM was built using VMware and tested only briefly under VirtualBox. These instructions are based on running the software on a desktop machine, rather than a full-blown virtualisation infrastructure.

Download the VM image

The VM is packaged as an Open Virtual Appliance (OVA), essentially a bundle of files that make up an Open Virtualisation Format (OVF) virtual machine. The package should be usable in many virtualisation frameworks. You can find the OVA file on our FTP area.
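Before importing the image, it may be worth confirming that the download is intact. The following is a minimal sketch that compares the MD5 digest of a file against the checksum quoted in the machine specification. The helper names "md5_of" and "check_md5" are illustrative only, and the sketch assumes that either "md5sum" (linux) or "md5" (mac) is available on the host.

```shell
#!/bin/sh
# Expected checksum, copied from the "Machine specification" section.
EXPECTED_MD5=9777f4723a4a3f65ffa2ac8911a96115

# Print the MD5 digest of the file given as $1, using whichever
# tool is available on the host.
md5_of() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | awk '{ print $1 }'
  else
    md5 -q "$1"    # mac
  fi
}

# Succeed (exit status 0) only if the digest of file $1 equals $2.
check_md5() {
  [ "$(md5_of "$1")" = "$2" ]
}

# Usage:
#   check_md5 PfamWeb_20120124.ova "$EXPECTED_MD5" && echo "checksum OK"
```

If the check fails, the safest course is probably to download the OVA file again before trying to import it.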

Install the image

The OVA package must be converted to a VMX file for use in VMware. This is done using ovftool. (You may need to install ovftool separately.)

VMware example
shell% ovftool PfamWeb.ova PfamWeb.vmx
Opening OVA source: PfamWeb.ova
The manifest validates
Opening VMX target: PfamWeb.vmx
Writing VMX file: PfamWeb.vmx
Disk Transfer Completed                    

Completed successfully
shell% 

The OVA file can be imported directly into VirtualBox. Choose "Import" from the "File" menu.

Start the VM under the GUI

It's useful to start the VM at least once under the GUI of your virtualisation software, if only to check that it boots and allows you to log in. If you are using VirtualBox, you will also need to install the "guest additions" using the GUI.

Start the VM in "headless" mode

Once you've checked that you can successfully log into the VM and that it connects to your network etc., you may want to start it in "headless" mode, i.e. without a GUI. If you're planning to run the VM as a server for a long period, you will almost certainly want to run in this mode.

VMware example
shell% /Library/Application\ Support/VMware\ Fusion/vmrun start \
  $HOME/Documents/Virtual\ Machines.localized/PfamWeb.vmwarevm/PfamWeb.vmx nogui
VirtualBox example
shell% VBoxManage startvm PfamWeb --type headless
Waiting for VM "PfamWeb" to power on...
VM "PfamWeb" has been successfully started.

Finding the IP address of the running VM

If you run your VM in a GUI, you will be able to log in directly, but if you run the VM in "headless" mode you may need to query the guest OS to find the assigned IP address.

The following script gives an example of how to retrieve the IP address of the VM semi-automatically using the "vmrun" command with VMware. The VMware tools must be installed in the guest before this script will work. The VM comes with VMware tools pre-installed but if you update packages in the guest OS, you may need to re-install them.

VMware example script: get_guest_ip.sh
#!/bin/sh

VMRUN=/Library/Application\ Support/VMware\ Fusion/vmrun
GU=pfamadmin
GP="admin password"
VM=$HOME/Documents/Virtual\ Machines.localized/PfamWeb.vmwarevm/PfamWeb.vmx
TMP_FILE=/tmp/IP_address

"$VMRUN" -T fusion -gu "$GU" -gp "$GP" runScriptInGuest        "$VM" "/bin/bash" "/sbin/ifconfig eth0 | perl -ne 'print \$1 if m/inet addr:(\\d+\\.\\d+\\.\\d+\\.\\d+) /' > $TMP_FILE"
"$VMRUN" -T fusion -gu "$GU" -gp "$GP" CopyFileFromGuestToHost "$VM" "$TMP_FILE" "$TMP_FILE"
"$VMRUN" -T fusion -gu "$GU" -gp "$GP" deleteFileInGuest       "$VM" "$TMP_FILE"

cat $TMP_FILE
\rm $TMP_FILE

You can then log in using something like:

VMware example
ssh pfamadmin@`get_guest_ip.sh`

As with VMware, VirtualBox allows querying of the guest OS from the host. Also like VMware, VirtualBox requires software on the guest to allow the host to interact with it. These "guest additions" need to be installed before you can query the VM for information like its IP address.

Once the guest additions are installed, the VM can be interrogated from the host using VBoxManage, something like this:

VirtualBox example
ssh pfamadmin@`VBoxManage guestproperty get "PfamWeb" "/VirtualBox/GuestInfo/Net/0/V4/IP" | awk '{ print $2 }'`
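The one-liner above hands whatever VBoxManage prints straight to ssh, which gives a confusing error if the guest has not yet reported an address. As a hedged sketch, the output can be checked first. The function name "parse_guest_ip" is illustrative, and the sketch assumes that VBoxManage prints "Value: <address>" when the property is set and "No value set!" otherwise.

```shell
#!/bin/sh
# Read the output of "VBoxManage guestproperty get" on stdin and print
# the address. Exit with status 1 if no "Value:" line was seen.
parse_guest_ip() {
  awk '$1 == "Value:" && $2 != "" { print $2; found = 1 }
       END { exit (found ? 0 : 1) }'
}

# Usage (assumes the VM is registered under the name "PfamWeb"):
#   ip=$(VBoxManage guestproperty get "PfamWeb" \
#          "/VirtualBox/GuestInfo/Net/0/V4/IP" | parse_guest_ip) \
#     && ssh "pfamadmin@$ip"
```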

Re-configuring the VM specification

The VM, as shipped, has 1Gb of memory, 20Gb of disk space and a single CPU core. These are probably the minimum requirements for running the PfamWeb server. If your server will be used heavily, you may want to consider increasing the memory allocated to the VM to 2Gb or more. Adding more CPUs or CPU cores will improve the performance of the website, though the specification of your database server will also have a significant effect. Consult the documentation for your virtualisation software to find out how to re-configure the VM.

Logging into the VM

The VM is configured with a single user account, "pfamadmin". The password for the account is "admin password". It is highly recommended that you change the password immediately after you log in for the first time.


Installing VirtualBox "guest additions"

Installing the guest additions into VirtualBox can be somewhat involved. Here are some pointers to getting it to work.

Mount the guest additions CD in the VM

VirtualBox can present the guest additions installer to the VM as a CD image. Choose "Install Guest Additions" from the "Devices" menu of the VM, which effectively inserts a CD into the virtual CD drive of the virtual machine. You then need to mount the CD:

Mount "guest additions" CD
pfamadmin@ubuntu:~$ ls /dev/cdrom*
/dev/cdrom2@
pfamadmin@ubuntu:~$ sudo mount /dev/cdrom2 /media
[sudo] password for pfamadmin: 
mount: block device /dev/sr0 is write-protected, mounting read-only
pfamadmin@ubuntu:~$ 

Install pre-requisites

Next you need to install the necessary pre-requisites for building linux kernel modules: the guest additions include a module that allows the host to interact with the guest.

First, install the module-assistant package, which helps install the required kernel headers and other essentials for building kernel modules. Note: since this involves using apt-get to download and install packages, you may need to configure your proxy settings before you can do this.

Install module-assistant
pfamadmin@ubuntu:~$ sudo apt-get install module-assistant
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  module-assistant
0 upgraded, 1 newly installed, 0 to remove and 2 not upgraded.
Need to get 0 B/101 kB of archives.
After this operation, 582 kB of additional disk space will be used.
Selecting previously deselected package module-assistant.
(Reading database ... 34381 files and directories currently installed.)
Unpacking module-assistant (from .../module-assistant_0.11.4_all.deb) ...
Processing triggers for man-db ...
Setting up module-assistant (0.11.4) ...
pfamadmin@ubuntu:~$ 
Install pre-requisites
pfamadmin@ubuntu:~$ sudo module-assistant prepare
Getting source for kernel version: 3.0.0-12-generic
apt-get install linux-headers-3.0.0-12-generic 
Reading package lists... Done
...

Install the guest additions

Finally, run the installer:

Install "guest additions"
pfamadmin@ubuntu:~$ sudo /media/VBoxLinuxAdditions.run 
Verifying archive integrity... All good.
Uncompressing VirtualBox 4.1.8 Guest Additions for Linux.........
VirtualBox Guest Additions installer
Removing installed version 4.1.8 of VirtualBox Guest Additions...
Removing existing VirtualBox DKMS kernel modules ...done.
Removing existing VirtualBox non-DKMS kernel modules ...done.
Building the VirtualBox Guest Additions kernel modules
The headers for the current running kernel were not found. If the following
module compilation fails then this could be the reason.

Building the main Guest Additions module ...done.
Building the shared folder support module ...done.
Building the OpenGL support module ...done.
Doing non-kernel setup of the Guest Additions ...done.
Starting the VirtualBox Guest Additions ...done.
Installing the Window System drivers ...fail!
(Could not find the X.Org or XFree86 Window System.)
pfamadmin@ubuntu:~$ 

If the installation has been successful, you should now be able to probe the VM from the host machine:

Check guest properties
shell% VBoxManage guestproperty enumerate vm
Name: /VirtualBox/GuestInfo/OS/Product, value: Linux, timestamp: 1327501754794589000, flags: 
Name: /VirtualBox/GuestInfo/Net/0/V4/IP, value: 193.62.207.242, timestamp: 1327501754802162000, flags: 
Name: /VirtualBox/HostInfo/GUI/LanguageID, value: en_US, timestamp: 1327499066451493000, flags: 
...

In particular you should now see an entry /VirtualBox/GuestInfo/Net/0/V4/IP, which was not previously visible. The value for that property gives the IP address for the running VM.
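To pick a single property out of the "enumerate" listing, a small filter along the following lines may help. This is only a sketch: the function name "guest_property" is made up for illustration, and it assumes the "Name: ..., value: ..., timestamp: ..." layout shown above, which may differ in other VirtualBox versions.

```shell
#!/bin/sh
# Read "VBoxManage guestproperty enumerate" output on stdin and print the
# value of the property whose name is given as $1.
guest_property() {
  awk -v key="$1" -F', ' '
    $1 == "Name: " key {
      sub(/^value: /, "", $2)   # strip the "value: " label
      print $2
    }'
}

# Usage (replace PfamWeb with the name of your VM):
#   VBoxManage guestproperty enumerate PfamWeb \
#     | guest_property /VirtualBox/GuestInfo/Net/0/V4/IP
```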

Configure the virtual machine

This file documents the steps needed to set up the virtual machine
itself.

jt6 20111012 WTSI


1. Change the "pfamadmin" password
==================================

On traditional unix systems, the user "root" had total control over all files
and processes. With modern linux distributions such as Ubuntu, the root account
is disabled for direct logins by default. Instead, individual user accounts are
granted "super-user" permissions for certain operations using the "sudo" command.

This VM comes configured with a single user account, "pfamadmin", with a
default password of "admin password". Since this account can be used to make
any change to the system, it's important to reset the password immediately. 

Use the "passwd" command. You will be asked to give the old password ("admin
password") and then to enter the new password twice. 

  pfamadmin@ubuntu:~$ passwd
  Changing password for pfamadmin.
  (current) UNIX password: 
  Enter new UNIX password: 
  Retype new UNIX password: 
  passwd: password updated successfully
  pfamadmin@ubuntu:~$

If you'll be logging into the VM frequently, you'll probably want to set up
passwordless logins, using public/private keys for authentication instead:

  https://help.ubuntu.com/11.10/serverguide/C/openssh-server.html


2. Configure a web-proxy
========================

Some institutions require their users to access the wider internet through a
network gateway, or "proxy". Any software that needs to access the network
will need to be configured to direct requests through the proxy.

"apt-get"
---------

Ubuntu uses a packaging system for installing and maintaining software.
"apt-get" and related commands can be used to manage packages, including
updating them when bug-fixes or security patches are released. 

If you need to configure your browser to use a web proxy in order to
access the web from your site, you will also need to configure "apt-get"
to use the same proxy, so that it can retrieve information about updates
and the update packages themselves. 

Edit the following file:

  └── etc/
      └── apt/
          └── apt.conf.d
              └── 70debconf

Append the following line:

  Acquire::http::proxy "http://<proxy URL>:<proxy port>";

including the URL and port number for your site's proxy. If you need to
use a proxy for http connections, you'll probably also need to use one
for secure connections. If so, append a second line:

  Acquire::https::proxy "https://<secure proxy URL>:<secure proxy port>";

Check the Ubuntu documentation and "howtos" on the web for information
on keeping the VM up-to-date.
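
As an illustration, the proxy lines could be appended with a small helper
like the one below, rather than by hand. This is a sketch only: the function
name "add_apt_proxy" and the proxy address "proxy.example.org:3128" are
placeholders, not values used by the VM.

```shell
#!/bin/sh
# Append an "Acquire::<scheme>::proxy" line to an apt configuration file,
# unless a proxy for that scheme is already configured in the file.
#   $1 = configuration file, $2 = scheme (http or https), $3 = proxy URL
add_apt_proxy() {
  conf=$1; scheme=$2; url=$3
  if grep -q "^Acquire::${scheme}::proxy" "$conf" 2>/dev/null; then
    return 0    # already set, so leave the existing line alone
  fi
  printf 'Acquire::%s::proxy "%s";\n' "$scheme" "$url" >> "$conf"
}

# Usage (needs root permissions to write to the file):
#   add_apt_proxy /etc/apt/apt.conf.d/70debconf http http://proxy.example.org:3128
```

Running the helper a second time for the same scheme leaves the file
unchanged, so it should be safe to repeat.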

cpan
----

Perl modules are generally installed using a perl-specific packaging
tool called "cpan". Again, in order to contact repositories and to check
for updates, cpan needs to know if you use a web proxy:

  pfamadmin@ubuntu:~$ sudo cpan
  [sudo] password for pfamadmin: 

  cpan shell -- CPAN exploration and modules installation (v1.9402)
  Enter 'h' for help.

  cpan[1]> o conf http_proxy http://<proxy URL>:<proxy port>
      http_proxy         [http://<proxy URL>:<proxy port>]
  Please use 'o conf commit' to make the config permanent!

  cpan[2]> o conf commit
  commit: wrote '/etc/perl/CPAN/Config.pm'

  cpan[3]> q
  Lockfile removed.

You may need to use cpan to update perl modules occasionally, but mostly you
should leave the perl installation untouched.

In the shell
------------

Finally, you may want to configure the proxy in the ".bashrc" file, so that
it's set for all new shells. 

  └── home/
      └── pfamadmin/
          └── .bashrc

Append the following lines:

  export http_proxy=<proxy URL>:<proxy port>
  export https_proxy=<secure proxy URL>:<secure proxy port>


3. Set up an email address to receive monitoring alerts
=======================================================

We use Monit (http://mmonit.com/monit/) to maintain and control the
server processes. When it detects a change in a service or when it
restarts a failed process, monit will send an email to an administrator
to let them know about the event.

The VM is configured so that monit will send event emails to the
"pfamadmin" user on the VM itself. You may want to change the email
address to the main address of the person who will be administering the
machine.

The monit configuration file is:

  └── etc/
      └── monit/
          └── conf.d/
              └── general

Edit "general" and change the email address on the "set alert" line:

  set alert pfamadmin@localhost
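
If you prefer to script the change, something like the following sketch
could be used. The function name "set_monit_alert" and the address
"admin@example.org" are hypothetical, and the sketch assumes that the new
address contains no "/" characters.

```shell
#!/bin/sh
# Replace the address on the "set alert" line of a monit configuration file.
#   $1 = configuration file, $2 = new email address
set_monit_alert() {
  conf=$1; address=$2
  tmp=$(mktemp) || return 1
  if sed "s/^\([[:space:]]*set alert[[:space:]][[:space:]]*\)[^[:space:]]*/\1$address/" \
       "$conf" > "$tmp"; then
    cat "$tmp" > "$conf"    # overwrite in place, keeping file permissions
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return $status
}

# Usage (needs root permissions to write to the file):
#   set_monit_alert /etc/monit/conf.d/general admin@example.org
```

Monit will probably need to re-read its configuration before the new address
takes effect, for example with "sudo monit reload".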


4. Configure the "physical" parameters of the VM
================================================

One of the advantages of a virtual machine is that its "physical"
characteristics can be changed easily. For example, the Pfam website VM
is initially configured as a single-core, 64-bit machine with 1Gb of
RAM. One parameter that you might choose to change is the available
memory.

If the website is heavily used, you may want to increase the memory of
the VM to perhaps 4Gb or more, if available. Alternatively, if the host
machine is relatively small, you may want to reduce the memory allocated
to the guest (this VM), so that the host machine doesn't run out of
memory itself.

Before you can alter the characteristics of the VM, you will probably
need to shut down the VM. Use:

  pfamadmin@ubuntu:~$ sudo shutdown -h now

to shutdown the guest operating system cleanly, then reconfigure and
restart the VM using your virtualisation software.
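
For VirtualBox users, the reconfiguration can also be done from the host's
command line. The sketch below only prints the "VBoxManage modifyvm" command,
so that it can be checked before it is run. The function name
"modifyvm_command" is illustrative, and the VM name "PfamWeb" is an
assumption about how the machine was registered on import.

```shell
#!/bin/sh
# Print the VBoxManage command that would set the memory and CPU count.
#   $1 = VM name, $2 = memory in Gb, $3 = number of CPU cores
modifyvm_command() {
  mem_mb=$(( $2 * 1024 ))    # VBoxManage expects the memory size in Mb
  printf 'VBoxManage modifyvm "%s" --memory %d --cpus %d\n' "$1" "$mem_mb" "$3"
}

# Usage (with the VM shut down):
#   modifyvm_command PfamWeb 2 2
# prints the command for 2Gb of memory and two cores, which can then be
# copied to the shell prompt on the host.
```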


5. Set your timezone
====================

The VM is configured for the "Europe/London" timezone. You will need to
reconfigure it if you're in a different zone. The safest way is to use
"dpkg-reconfigure tzdata", which presents you with a list of
geographical areas and timezones and sets the value in /etc/timezone for
you:

  pfamadmin@ubuntu:~$ cat /etc/timezone
  America/Los_Angeles
  pfamadmin@ubuntu:~$ sudo dpkg-reconfigure tzdata

  Current default time zone: 'Europe/London'
  Local time is now:      Mon Jan 16 11:47:18 GMT 2012.
  Universal Time is now:  Mon Jan 16 11:47:18 UTC 2012.

  pfamadmin@ubuntu:~$ cat /etc/timezone
  Europe/London
  pfamadmin@ubuntu:~$ 


6. Configure mail
=================

The mail system on the VM is configured for local mail only. Also,
mail sent to the "root" user will be forwarded to "pfamadmin"; you can
read it using "mail" or "alpine". 

"root" is commonly sent notices about problems with the system, such as
warnings from monit, so you may want to configure the mail system to
send mails to a user outside of the VM.  You will need to reconfigure
the postfix system to do that:

  pfamadmin@ubuntu:~$ sudo dpkg-reconfigure postfix
  ...

This will take you through a series of forms to set up the mail system
as needed.


7. Configure access to the machine from your network
====================================================

The VM is configured as a basic Ubuntu server, with no restrictions on
access or limits on who can view the website. You will need to configure
security features such as firewall rules or access restrictions
according to your local network policy. Please check the standard Ubuntu
documentation for help on security improvements to the machine.


Useful links
============

http://en.wikipedia.org/wiki/Sudo
https://help.ubuntu.com/community/InstallingSoftware
http://mmonit.com/monit/
http://en.wikipedia.org/wiki/Virtual_machine
https://help.ubuntu.com/11.10/serverguide/C/postfix.html

Install databases

This file documents the steps involved with setting up the Pfam databases.

The Pfam release database uses around 300Gb of disk space when installed
and running, making it too large to distribute easily within a VM.
Instead, you will need to set up a local installation of MySQL and
install the Pfam database within it, using dump files available on the
Pfam FTP area. 

The Pfam website uses two separate databases. One, the main Pfam release
database, contains all of the data about Pfam families. The other, "web_user",
stores ancillary data, such as the content scraped from Wikipedia. You will
need to set up both databases in order to run the website.

jt6 20111012 WTSI


1. Install MySQL
================

You can download MySQL from:

  http://dev.mysql.com/downloads/mysql/

The main Pfam websites use MySQL 5.0 and we recommend that you install
at least version 5.0.

Installing and configuring MySQL itself is beyond the scope of this document,
but there are many tutorials and guides on the web.


2. Install the Pfam database
============================

You can find the database dump files on the Pfam FTP area, at:

  ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/database_files/

The "database_files" directory contains two files for every table, a
".sql" file with the SQL "CREATE TABLE" statement, and a ".txt" file
with the raw data. You will need to download and install all of the
tables. One way to load, for example, the VERSION table is:

  mysql> source VERSION.sql
  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.08 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.02 sec)

  Query OK, 0 rows affected (0.04 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  mysql> load data infile "/nfs/users/nfs_j/jt6/Downloads/VERSION.txt" into table VERSION;
  Query OK, 1 row affected (0.01 sec)
  Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

You can confirm that the table now has data by doing an extra query:

  mysql> select * from VERSION\G
  *************************** 1. row ***************************
                       pfam_release: 25.0
                  pfam_release_date: 2011-03-31
                 swiss_prot_version: 2010_05
                     trembl_version: 2010_05
                      hmmer_version: 3.0b2
                     pfamA_coverage: 76.7
          pfamB_additional_coverage: 7.9
             pfamA_residue_coverage: 53.9
  pfamB_additional_residue_coverage: 6.9
                    number_families: 12273
  1 row in set (0.00 sec)

  mysql> 
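
Since every table needs the same two steps, it may be convenient to generate
the commands for all tables in one go. The following is a sketch only: the
function name "generate_load_sql" is made up, and it assumes that the
downloaded files have been uncompressed into a single directory that the
MySQL server can read. Tables are emitted in alphabetical order, which may
not suit a schema with foreign key constraints.

```shell
#!/bin/sh
# Print a "source" command for every .sql file in directory $1, followed by
# a "load data infile" statement if a matching .txt data file exists.
generate_load_sql() {
  dir=$(cd "$1" && pwd) || return 1    # the server needs an absolute path
  for sql in "$dir"/*.sql; do
    [ -e "$sql" ] || continue          # no .sql files at all
    table=$(basename "$sql" .sql)
    echo "source $sql"
    if [ -e "$dir/$table.txt" ]; then
      echo "load data infile \"$dir/$table.txt\" into table $table;"
    fi
  done
}

# Usage (the directory and database names here are examples):
#   generate_load_sql $HOME/Downloads/database_files > load_all.sql
#   mysql pfam_25_0 < load_all.sql
```

Reviewing the generated file before feeding it to the mysql client is
probably wise, given the size of some of the tables.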


3. Create the "web_user" database
=================================

Creating the "web_user" database should be as simple as running an SQL
script to create the database and a set of empty tables.

You can find the script for generating the "web_user" database at:

  └── home/
      └── pfamadmin/
          └── create_web_user.sql

You can execute the script either from the shell using something like:

  pfamadmin@ubuntu:~$ mysql < create_web_user.sql

or from within the mysql client, with:

  mysql> source create_web_user.sql


Useful links
============

http://dev.mysql.com/doc/refman/5.1/en/installing.html
http://dev.mysql.com/doc/refman/5.0/en/batch-commands.html

Configure PfamWeb application

This file explains how to configure the PfamWeb application.

The Pfam website is built as a Perl application, PfamWeb. The dynamic content
in the pages is retrieved from a pair of MySQL databases and the web
application needs to be configured to connect to them.

Sitting in front of the PfamWeb application is a light-weight web server,
lighttpd, which serves static content such as images and CSS files, and proxies
dynamic content from the perl application to the browser. The web server is
configured to use port 8000 by default, but this can be changed if necessary.

jt6 20111012 WTSI


1. Set the database connection parameters
=========================================

PfamWeb is the Perl application that runs the Pfam website. It's
configured via a pair of Apache-style configuration files:

  └── opt/
      └── www/
          └── conf/
              ├── pfamweb.conf
              └── pfamweb_local.conf

The only changes you should need to make will be in the
"pfamweb_local.conf" file. You should not need to edit the main
"pfamweb.conf" file.

The main configuration parameters are the database connection settings:

  
    schema_class "PfamDB"

    connect_info "dbi:mysql:database=pfam_25_0;host=database_server;port=3306"
    connect_info pfamwebro
    connect_info 

    dasDsn   "http://das.sanger.ac.uk/das/pfam"
    dasTo    4
    dasProxy "http://:"
  

The first "connect_info" line specifies the data source name (DSN),
essentially the connection details of the database itself. You will need
to have the host name of the machine where MySQL is running, as well as
the port where it's running, and the name of the database itself
(usually something like "pfam_25_0").

The next two lines give the username and password for the database
account that will be used to connect to the Pfam database. We recommend
making this a read-only account. For further security you should
restrict the account to accessing only the Pfam database schema.

The next lines give the parameters for the DAS-components of the site.
If your VM will need to use a web-proxy to connect to the wider
internet, you need to give it here on the "dasProxy" line, otherwise
leave the value blank ("").

Finally, you need to give connection details for the "web_user" schema.
The database account that you use here should have read/write access to
the web_user schema:

  
    schema_class "WebUser"

    connect_info "dbi:mysql:database=web_user;host=database_server;port=3306"
    connect_info webuser
    connect_info 

    
      AutoCommit 1
      mysql_enable_utf8 1
    
  

You should leave all other configuration parameters untouched.
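
The two database accounts described above could be created with statements
along the following lines. This is a hedged sketch rather than the set-up
used for the main Pfam site: the function name "print_grants" is
illustrative, "<password>" is a placeholder that must be replaced, and the
"GRANT ... IDENTIFIED BY" form assumes MySQL 5.x.

```shell
#!/bin/sh
# Print GRANT statements for the read-only Pfam account and the read/write
# "web_user" account.
#   $1 = name of the Pfam release database
#   $2 = host from which the web server connects to MySQL
print_grants() {
  pfam_db=$1; web_host=$2
  cat <<EOF
GRANT SELECT ON ${pfam_db}.* TO 'pfamwebro'@'${web_host}' IDENTIFIED BY '<password>';
GRANT SELECT, INSERT, UPDATE, DELETE ON web_user.* TO 'webuser'@'${web_host}' IDENTIFIED BY '<password>';
EOF
}

# Usage (192.168.0.20 stands in for the address of the VM):
#   print_grants pfam_25_0 192.168.0.20
```

The first statement limits "pfamwebro" to reading the Pfam schema, in line
with the recommendation above. The second gives "webuser" the write access
that the "web_user" schema needs.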


2. Set the port number for the website
======================================

You may need to change the default port number (8000), depending on your
network environment. The port is set in one of the lighttpd configuration
files, "20-pfamweb.conf":

  └── etc/
      └── lighttpd/
          └── conf-available/
              └── 20-pfamweb.conf

Change the value on the following line:

  server.port = 8000
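
As a sketch of how the edit might be scripted, the helper below checks that
the new port is a sensible number before rewriting the "server.port" line.
The function name "set_server_port" is illustrative only.

```shell
#!/bin/sh
# Set the value of "server.port" in a lighttpd configuration file.
#   $1 = configuration file, $2 = new port number (1-65535)
set_server_port() {
  conf=$1; port=$2
  case $port in
    ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
  esac
  if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "port out of range" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  sed "s/^\([[:space:]]*server\.port[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\1$port/" \
    "$conf" > "$tmp"
  cat "$tmp" > "$conf"    # overwrite in place, keeping file permissions
  rm -f "$tmp"
}

# Usage (needs root permissions to write to the file):
#   set_server_port /etc/lighttpd/conf-available/20-pfamweb.conf 8080
```

If lighttpd is already running, it will need to be restarted before it
listens on the new port.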


3. Configure PfamWeb to expect a front-end proxy
================================================

When designing network architectures, it's common to make all web
traffic come through a dedicated machine, a "front-end" proxy, which
then directs traffic to the appropriate internal machine, something like
this:
                     Internal network | Internet
+------------+      +------------+    |    +------------+
|   PfamWeb  |<-----| Front-end  |<-~ | ~--|   Client   |
|   server   |      |   proxy    |    |    |   browser  |
+------------+      +------------+    |    +------------+
"request from        192.168.0.10     |    87.248.112.181
87.248.112.181"                       |

If you plan to make your Pfam website accessible outside of your
immediate network, and if external traffic will be arriving via an
internal proxy, you may need to adjust the configuration of the
PfamWeb application to tell it that it is running behind a front-end
proxy. 

Telling PfamWeb about the proxy allows it to record the correct IP for
incoming requests, using the IP of the client rather than that of the
proxy, which can be important for auditing or for dealing with malicious
traffic from the wider network.

More importantly, PfamWeb builds many of the URLs in the Pfam website
dynamically and these links can be incorrectly generated if requests
are not properly treated as coming from outside of your network.

To tell PfamWeb that it's running behind a front-end proxy, edit the
local PfamWeb configuration file:

  └── opt/
      └── www/
          └── conf/
              └── pfamweb_local.conf

Change the value of "using_frontend_proxy" to 1:

  using_frontend_proxy 1


Useful links
============

http://search.cpan.org/dist/Config-General/
http://redmine.lighttpd.net/wiki/lighttpd/Docs:ConfigurationOptions

Configure cron jobs


This file explains how to configure the cron jobs that maintain some
of the ancillary data needed by the website.

jt6 20111012 WTSI

There are three cron scripts on the VM:

  mapping_cron.pl
  scrape_cron.pl
  update_das_sources.pl

The first script is responsible for updating the live mapping between
Pfam families and Wikipedia articles. The second downloads the content
of any Wikipedia articles that are referenced by a Pfam family. The
final script updates the list of available DAS sources from the DAS
registry.


1. Edit the cron script configuration file
==========================================

All three scripts deposit data in the "web_user" database. The database
connection parameters are given in:

  └── opt/
      └── www/
          └── conf/
              └── crons.conf

The connection parameters are given in a slightly different format in
this file, compared to the other configuration files, but the host name,
port and account details will be the same:

  
    db_name  web_user
    db_host  database_server
    db_port  3306
    username webuser
    password 
  

You can use the same database account for this as for the web_user
database in the PfamWeb configuration.

If you need to use a web proxy to access the web, you will need to
configure it in two places for the cron scripts.

First, in the "crons.conf" file, set it on the "das_proxy" line:

  das_proxy "http://:"

Leave the value blank if you don't have to use a proxy.


2. Configure the crontab
========================

If you need to use a proxy, that also needs to be set in the crontab
file. Edit the crontab:

  pfamadmin@ubuntu:~$ crontab -e
  
which will drop you into your configured editor. If you would prefer to
use a different editor, you can quit without saving and then choose a
different one using the "select-editor" command at the shell prompt.

In the crontab file, look for the line:

  # http_proxy=http://:

Add your server name and port then uncomment the line by removing the
leading "#". If you don't need to use a proxy, leave that line
untouched.

By default, the output of the cron scripts will be sent to the
"pfamadmin" user on the local machine. If you would prefer to choose a
different address to receive these logging emails, set the value of
"MAILTO" as appropriate.


3. Enable the cron scripts
==========================

Again in the crontab ("crontab -e"), look for the following three lines and
remove the leading "#" to enable the jobs:

  #   00  00,12       *  *  *  $PERLBIN /opt/www/PfamScripts/wiki/mapping_cron.pl -c $CRON_CONFIG
  #   01  01,13       *  *  *  $PERLBIN /opt/www/PfamScripts/wiki/scrape_cron.pl -c $CRON_CONFIG
  #   30  01          *  *  *  $PERLBIN /opt/www/PfamBackend/scripts/update_das_sources.pl -c $CRON_CONFIG

Save the file and exit the editor to install the new crontab.
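
The same change can be made without an editor, by filtering the existing
crontab. The sketch below strips the leading "#" only from lines that
mention one of the three cron scripts and leaves every other line, including
the commented-out proxy setting, untouched. The function name
"enable_pfam_crons" is made up for this example.

```shell
#!/bin/sh
# Read a crontab on stdin and write it to stdout with the three Pfam cron
# jobs uncommented.
enable_pfam_crons() {
  sed -E 's/^[[:space:]]*#[[:space:]]*(.*(mapping_cron|scrape_cron|update_das_sources)\.pl.*)$/\1/'
}

# Usage:
#   crontab -l | enable_pfam_crons > new_crontab
#   crontab new_crontab
```

Writing the result to a file first gives you a chance to inspect it before
it is installed.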


4. Update the Wikipedia mapping and content
===========================================

You can now wait until the cron scripts run automatically overnight, or
you can run them manually to populate the two relevant tables
immediately.

To run the scripts:

  pfamadmin@ubuntu:~$ export http_proxy=http://:
  pfamadmin@ubuntu:~$ export PERL5LIB=/opt/www/PfamLib:/opt/www/PfamSchemata
  pfamadmin@ubuntu:~$ PERLBIN=/usr/bin/perl
  pfamadmin@ubuntu:~$ CRON_CONFIG=/opt/www/conf/crons.conf
  pfamadmin@ubuntu:~$ $PERLBIN /opt/www/PfamScripts/wiki/mapping_cron.pl -c $CRON_CONFIG
  main:::90 INFO: retrieved 2091 article-to-entry rows for db rfam
  main:::90 INFO: retrieved 4916 article-to-entry rows for db pfam
  main:::97 INFO: got 7007 accessions in final mapping

  pfamadmin@ubuntu:~$ $PERLBIN /opt/www/PfamScripts/wiki/scrape_cron.pl -c $CRON_CONFIG
  ...


Useful links
============

http://en.wikipedia.org/wiki/Cron

Start website


This document explains how to start the Pfam website on the VM. 

Both the PfamWeb application and the lighttpd web server are controlled using
"monit". Monit takes care of starting the processes initially and then watches
to make sure they are running constantly, restarting them if they crash or
become unresponsive.

jt6 20111012 WTSI


1. Start PfamWeb
================

To start the PfamWeb application:

  pfamadmin@ubuntu:~$ sudo monit start pfamweb

There will be no output from this command, but you can check the
progress of the start-up using the shell alias "server":

  pfamadmin@ubuntu:~$ server
  root      3476     1  0 00:05 ?        00:00:00 /bin/sh /etc/init.d/pfamweb start
  www-data  3488     1 42 00:05 ?        00:00:01 perl /opt/www/PfamWeb/script/pfamweb_fastcgi.pl -M FCGI::ProcManager::MaxRequests -n 10 -l /tmp/pfamweb_fastcgi.socket -p /var/run/pfamweb.pid
  1000      3494  2000  0 00:05 pts/5    00:00:00 egrep --color=auto cgi|famweb|lighttpd

In this listing, process 3488 is the "master" server process starting up.
Checking again a few seconds later shows the server processes up and running:

  pfamadmin@ubuntu:~$ server
  www-data  3488     1  1 00:05 ?        00:00:04 perl-fcgi-pm [PfamWeb]
  www-data  3499  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3500  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3503  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3504  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3505  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3506  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3507  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3508  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3509  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3510  3488  0 00:05 ?        00:00:00 perl-fcgi
  1000      3536  2000  0 00:09 pts/5    00:00:00 egrep --color=auto cgi|famweb|lighttpd


2. Start lighttpd
=================

Similarly, starting lighttpd is done through monit:

  pfamadmin@ubuntu:~$ sudo monit start lighttpd

and again, the progress can be seen using "server":

  pfamadmin@ubuntu:~$ server
  www-data  3488     1  1 00:05 ?        00:00:04 perl-fcgi-pm [PfamWeb]
  www-data  3499  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3500  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3503  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3504  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3505  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3506  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3507  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3508  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3509  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3510  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3551     1  0 00:10 ?        00:00:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
  1000      3553  2000  0 00:10 pts/5    00:00:00 egrep --color=auto cgi|famweb|lighttpd

Now a lighttpd process (3551) can be seen in the list.

You can check that the server is working correctly using "curl":

  pfamadmin@ubuntu:~$ curl http://localhost:8000/
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

  <html>

    <head>
            <title>Pfam: Home page</title>
      <meta name="verify-v1" content="GjV+z5lf7mSCShhAOJZh1UW8J+iiCgWmbxIFg2GkG0Q=" />
  <meta name="verify-v1" content="FA9AR+bh3BmS05vcSp0mbiAB80DgELEAkFvu4q9ViC8=" />

  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  ...
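
If you would rather not read the HTML by eye, the same check can be
reduced to the HTTP status code. The following is only a sketch: the
function names "http_status" and "is_up" are made up for this example
and are not part of the VM, and it assumes that any 2xx or 3xx response
counts as "working".

```shell
# Print the HTTP status code returned for a URL.
# curl prints "000" if it could not connect at all; "|| true" stops
# that failure from aborting a calling script that uses "set -e".
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1" || true
}

# Succeed only for 2xx and 3xx status codes (an assumption about what
# should count as a healthy response).
is_up() {
  case "$1" in
    2??|3??) return 0 ;;
    *)       return 1 ;;
  esac
}
```

For example, is_up "$(http_status http://localhost:8000/)" && echo "site is up"
prints the message only when the site answers with a success or redirect code.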


3. Stopping the website
=======================

If you need to shut down the website, you need to stop both the lighttpd
and PfamWeb processes. It's slightly cleaner to shut down lighttpd first,
since shutting down PfamWeb but leaving lighttpd running will cause the
web server to serve error messages to users. Shutting down lighttpd
simply means that users will not be able to connect to the site.

To stop lighttpd:

  pfamadmin@ubuntu:~$ sudo monit stop lighttpd

and to stop PfamWeb:

  pfamadmin@ubuntu:~$ sudo monit stop pfamweb

Although lighttpd responds very quickly to a "stop" command, PfamWeb can
be much slower. Check with "server" and wait for all "perl-fcgi"
processes to disappear.
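
Rather than re-running "server" by hand, the wait can be scripted. This
is a rough sketch, not something shipped with the VM: the function
names are invented for the example, and it assumes that the worker
processes show up in "ps" with a command line beginning "perl-fcgi", as
they do in the listings above.

```shell
# Count processes whose command line starts with the given prefix.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_procs() {
  ps -eo args= | grep -c "^$1" || true
}

# Poll once a second until no matching processes remain.
# Gives up and returns 1 after the timeout (default 60 seconds).
wait_for_exit() {
  prefix=$1
  timeout=${2:-60}
  waited=0
  while [ "$(count_procs "$prefix")" -gt 0 ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "still running after ${timeout}s: $prefix" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

After "sudo monit stop pfamweb", running "wait_for_exit perl-fcgi 120"
would return once the master and worker processes have gone, or fail
after two minutes if some are still hanging around.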


4. Check process statuses using monit
=====================================

You can see what monit thinks is the current state of the server
processes using:

  pfamadmin@ubuntu:~$ sudo monit status
  The Monit daemon 5.2.1 uptime: 55m

  Process 'pfamweb'
    status                            running
    monitoring status                 monitored
    pid                               3488
    parent pid                        1
    uptime                            11m
    children                          10
    memory kilobytes                  108232
    memory kilobytes total            1147132
    memory percent                    10.5%
    memory percent total              112.1%
    cpu percent                       0.0%
    cpu percent total                 0.0%
    unix socket response time         5.005s to /tmp/pfamweb_fastcgi.socket [generic]
    data collected                    Thu Oct 13 00:16:56 2011

  Process 'lighttpd'
    status                            running
    monitoring status                 monitored
    pid                               3590
    parent pid                        1
    uptime                            0m
    children                          0
    memory kilobytes                  1388
    memory kilobytes total            1388
    memory percent                    0.1%
    memory percent total              0.1%
    cpu percent                       0.0%
    cpu percent total                 0.0%
    port response time                0.000s to localhost:8000 [DEFAULT via TCP]
    data collected                    Thu Oct 13 00:16:56 2011

  System 'system_ubuntu'
    status                            running
    monitoring status                 monitored
    load average                      [0.00] [0.02] [0.05]
    cpu                               0.1%us 0.7%sy 0.0%wa
    memory usage                      314188 kB [30.7%]
    swap usage                        1052 kB [0.1%]
    data collected                    Thu Oct 13 00:16:56 2011

The important lines are the "status" rows, which should be showing "running"
if all processes are up and working.
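
To pick out just those rows, the output can be filtered with awk. The
sketch below assumes the layout shown above (a "Process" or "System"
line followed by an indented "status" row); the function name
"unhealthy_services" is hypothetical, and the exact status strings that
monit prints for services that are not running are not listed in this
document, so the filter simply reports anything other than "running".

```shell
# Read "monit status" output on stdin and print one line for each
# service whose "status" row is not "running".
# The "monitoring status" rows are skipped because their first field
# is "monitoring", not "status".
unhealthy_services() {
  awk '
    $1 == "Process" || $1 == "System" { name = $2 }
    $1 == "status" && $2 != "running" {
      status = $0
      sub(/^[ \t]*status[ \t]+/, "", status)
      print name ": " status
    }
  '
}
```

Used as "sudo monit status | unhealthy_services", this prints nothing
when every service is reported as running.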


5. Check the server logs
========================

The web server keeps copious logs of usage and errors. You can see basic
access information in the lighttpd "access.log" while "error.log" gives a
detailed log of what the PfamWeb application is doing:

  └── var/
      └── log/
          └── lighttpd/
              ├── access.log
              └── error.log

The access log shows the IP address of incoming requests, the time of the
request and the resource that was requested, along with several other pieces
of information that can be useful for debugging:

  pfamadmin@ubuntu:~$ sudo tail -f /var/log/lighttpd/access.log
  172.16.100.1 172.16.100.130:8000 - [13/Oct/2011:17:16:39 +0100] "GET /static/images/box_darker.gif HTTP/1.1" 200 2894 "http://172.16.100.130:8000/static/css/cb.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
  172.16.100.1 172.16.100.130:8000 - [13/Oct/2011:17:16:39 +0100] "GET /static/images/borders_darker.gif HTTP/1.1" 200 178 "http://172.16.100.130:8000/static/css/cb.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
  172.16.100.1 172.16.100.130:8000 - [13/Oct/2011:17:16:43 +0100] "GET /shared/images/__utm.gif?utmwv=1&utmn=1594200320&utmcs=UTF-8&utmsr=1920x1200&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.3%20r183&utmdt=Pfam%3A%20Search%20Pfam&utmhn=172.16.100.130&utmr=-&utmp=/tab/search 
  ...
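
For a quick overview of how requests are being answered, the status
codes in the access log can be tallied. This is a sketch that assumes
the log format shown above, where the status code is the ninth
whitespace-separated field, and that requested paths contain no
unescaped spaces. The function name "count_status_codes" is made up for
the example.

```shell
# Read access log lines on stdin and print "<status code> <count>"
# for each HTTP status code seen, sorted by code.
count_status_codes() {
  awk '{ count[$9]++ } END { for (code in count) print code, count[code] }' | sort
}
```

For example, "sudo cat /var/log/lighttpd/access.log | count_status_codes"
gives one line per status code, which makes a sudden run of 500
responses easy to spot.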

The error log shows, by default, a detailed debug log for every request
that the PfamWeb application handles:

  pfamadmin@ubuntu:~$ sudo tail -f /var/log/lighttpd/error.log
  2011-10-14 11:35:55: (mod_fastcgi.c.2701) FastCGI-stderr: [info] *** Request 1 (0.000/s) [3112] [Fri Oct 14 11:35:53 2011] ***
  [debug] "GET" request for "/" from "172.16.100.1"
  [debug] PfamWeb::index: generating site index
  [debug] Rendering template "pages/index.tt"
  [debug] Response Code: 200; Content-Type: text/html; charset=utf-8; Content-Length: 26289
  [info] Request took 1.709306s (0.585/s)
  .------------------------------------------------------------+-----------.
  | Action                                                     | Time |
  +-------------- 
  2011-10-14 11:35:55: (mod_fastcgi.c.2701) FastCGI-stderr:
  ----------------------------------------------+-----------+
  | /auto                                                      | 1.596192s |
  | /index                                                     | 0.000794s |
  | /end                                                       | 0.074495s |
  |  -> PfamWeb::View::TT->process                             | 0.070278s |
  '------------------------------------------------------------+-----------'

The occasional "FastCGI-stderr" lines interspersed with the debug
information are an annoying side-effect of the interaction between the
perl application and the web-server and can be safely ignored.

Once the server is up and running properly, you may want to turn off the
verbose debugging information in the error log, which can be done by
editing the start up script for the PfamWeb application:

  └── etc/
      └── init.d/
          └── pfamweb

Edit the "pfamweb" script, using the "sudo" command, because this file
is considered to be a system file:

  pfamadmin@ubuntu:~$ sudo vi /etc/init.d/pfamweb

Change the value of "PFAMWEB_DEBUG" to 0 to disable debug messages:

  export PFAMWEB_DEBUG=0

The error log will now show only true errors and warnings from the perl
back-end.
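
If you prefer not to edit the file by hand, the same change can be made
with sed. This is a sketch under some assumptions: that the setting
appears on a line of its own in the form "export PFAMWEB_DEBUG=<value>",
and that GNU sed (as found on Ubuntu) is available for the "-i" option.
The function name "set_pfamweb_debug" is invented for the example.

```shell
# Set PFAMWEB_DEBUG to the given value (0 or 1) in the named file,
# keeping any leading indentation and leaving other lines untouched.
set_pfamweb_debug() {
  file=$1
  value=$2
  sed -i "s/^\([[:space:]]*\)export PFAMWEB_DEBUG=.*/\1export PFAMWEB_DEBUG=${value}/" "$file"
}
```

On the VM the function would need to run with root privileges to modify
/etc/init.d/pfamweb, for example from a root shell. Since the variable
is exported by the start-up script, the new value presumably only takes
effect once PfamWeb has been stopped and started again.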


6. Cleaning up processes
========================

It's possible for the PfamWeb server processes to get into various
strange states. In many cases, it may be simplest to reboot the virtual
machine; the server processes will be started as soon as the machine is
rebooted, so the downtime may be less than a minute:

  pfamadmin@ubuntu:~$ sudo shutdown -r now

If you want to try to clean up the server without rebooting, first stop
the server processes using monit:

  pfamadmin@ubuntu:~$ sudo monit stop lighttpd
  pfamadmin@ubuntu:~$ sudo monit stop pfamweb

If PfamWeb processes ("perl-fcgi") are left running after monit thinks
that the service is stopped, you can kill them manually:

  pfamadmin@ubuntu:~$ server
  www-data   736     1  0 11:04 ?        00:00:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
  www-data   855     1  0 11:04 ?        00:00:06 perl-fcgi-pm [PfamWeb]
  www-data  1355   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1356   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1357   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1358   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1359   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1360   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1361   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1362   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1363   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1364   855  0 11:04 ?        00:00:00 perl-fcgi
  1000      1586  1372  0 12:56 pts/2    00:00:00 egrep --color=auto cgi|famweb|lighttpd
  pfamadmin@ubuntu:~$ sudo kill -9 855 1355 1356 1357 1358 ....


7. Enable and start monit
=========================

Monit is initially disabled on the VM, so that the website is not started
on first boot. Before it can be started, monit needs to be enabled.
Edit the configuration file for the monit init script:

  └── etc/
      └── default/
          └── monit

Change the value of "startup" to 1:

  startup=1

Start the monit daemon using:

  pfamadmin@ubuntu:~$ sudo /etc/init.d/monit start

Monit should now start the PfamWeb and lighttpd services automatically
when the machine boots.
 

Useful links
============

http://mmonit.com/monit/

Configure PfamWeb application


These are the files and directories that are used by the Pfam website and
related processes on this VM.

jt6 20120109 WTSI

/
├── etc/
│   ├── aliases
│   ├── apt/
│   │   └── apt.conf.d/
│   │       └── 70debconf
│   ├── default/
│   │   └── monit
│   ├── init.d/
│   │   ├── lighttpd
│   │   ├── monit
│   │   └── pfamweb
│   ├── lighttpd/
│   │   ├── conf-available/
│   │   │   └── 20-pfamweb.conf
│   │   ├── conf-enabled/
│   │   │   └── 20-pfamweb.conf -> ../conf-available/20-pfamweb.conf
│   │   └── lighttpd.conf
│   ├── monit/
│   │   ├── conf.d/
│   │   │   ├── general
│   │   │   ├── lighttpd
│   │   │   └── pfamweb
│   │   └── monitrc
│   └── timezone
├── home/
│   └── pfamadmin/
│       ├── mail/
│       ├── .bashrc
│       ├── .bash_aliases
│       ├── 00README.txt
│       ├── 01configure_vm.txt
│       ├── 02install_databases.txt
│       ├── 03configure_pfamweb.txt
│       ├── 04configure_crons.txt
│       ├── 05start_website.txt
│       ├── 06files.txt
│       └── create_web_user.sql
├── opt/
│   └── www/
│       ├── PfamBackend/
│       │   └── scripts/
│       │       └── update_das_sources.pl
│       ├── PfamBase/
│       ├── PfamLib/
│       ├── PfamSchemata/
│       ├── PfamScripts/
│       │   └── wiki/
│       │       ├── approvals.cgi
│       │       ├── mapping.cgi
│       │       ├── mapping_cron.pl
│       │       ├── rescrape.pl
│       │       ├── revisions.cgi
│       │       ├── scrape_cron.pl
│       │       ├── sync_articles_cron.pl
│       │       └── update_cron.pl
│       ├── PfamWeb/
│       │   ├── bin/
│       │   ├── inc/
│       │   ├── instructions.txt
│       │   ├── lib/
│       │   ├── LICENSE
│       │   ├── Makefile.PL
│       │   ├── root/
│       │   ├── script/
│       │   │   ├── pfamweb_cgi.pl
│       │   │   ├── pfamweb_create.pl
│       │   │   ├── pfamweb_fastcgi.pl
│       │   │   ├── pfamweb_server.pl
│       │   │   └── pfamweb_test.pl
│       │   └── t/
│       └── conf/
│           ├── changelog.conf
│           ├── crons.conf
│           ├── pfamweb.conf
│           ├── pfamweb_local.conf
│           └── robots/
│               ├── inra.conf
│               ├── janelia.conf
│               ├── sbc.conf
│               └── wtsi.conf
├── tmp/
│   └── pfamweb_fastcgi.socket
└── var/
    ├── log/
    │   └── lighttpd/
    │       ├── access.log
    │       └── error.log
    ├── run/
    │   ├── lighttpd.pid
    │   └── monit.pid
    └── tmp/
        └── opt/