Developer Documentation Page

Most of this document will assume that you have been granted access to the project repository, hosted at https://bitbucket.org/EddieCarlson/know, and that you have set up your SSH keys for use with BitBucket. If this is not the case, please contact the Project Manager, Kimberly Nguyen, at kimberlyyh.nguyen@gmail.com.

The following instructions assume that you are starting from a new copy of the UW CSE Linux VM (available at http://www.cs.washington.edu/lab/labVMs/linuxVM.shtml; requires CSE NetID) and have completed steps 1 through 4 of the "Install & Setup" instructions at http://www.cs.washington.edu/lab/labVMs/linuxVM.shtml.

Obtaining the Source Code

To obtain a copy of the repository:

  1. Open up a terminal window.
  2. cd to a directory of your choice.
  3. Run the following command: git clone git@bitbucket.org:EddieCarlson/know.git
The repository will appear under know/.

Contents of the Repository

The repository is mainly structured around individual features; each feature is contained within its subdirectory. The repository contains the following files and folders at top level:

Each feature's directory contains the files and scripts required to create the feature, as well as any associated tests. Documentation can be found under documentation/; the directory is further subdivided into user/ and developer/, which respectively contain user and developer documentation. The repository may have multiple branches at any given time. The branch labelled "master" is the main branch, and contains the code that constitutes the publicly-visible web page. Other branches are likely works-in-progress by other developers; contact the individual developers working on those branches for more details.

Project Architecture

Our project is based around a single main page that users see when they first access the KNOW Search Visualizations web site. The main page is split up into three major sections: the search bar, the tab bar, and the view pane.

The search bar is where users specify what terms they want to search the article database for. The tab bar controls which visualization is displayed in the view pane. When the page is initially loaded, the active visualization defaults to a heatmap displaying all articles within the KNOW database. A typical user search would go something like this:
  1. The user enters their terms in the search bar, and the search bar notifies the main page of the changes.
  2. The main page reloads the visualization in the view pane, providing the visualization scripts with the user's search terms.
  3. The user interacts with the visualization for a little while, and then clicks on another tab in the tab bar.
  4. The main page loads that visualization into the view pane, providing the visualization script with the user's current search terms.
  5. The above steps repeat as necessary.
Most code necessary for generating individual visualizations is contained within a single subdirectory; individual scripts within the directories are commented with appropriate documentation.

Setting Up the Development Environment

A new CSE Linux Home VM will require some configuration before it can function as a development and testing server. To configure the VM:

  1. Boot the VM and log in.
  2. In the upper toolbar, click on Applications Menu > Administration > Add/Remove Software
  3. (Optionally, when the Add/Remove Software window opens, you may want to click on Filters > Hide Subpackages from the menu bar. This will make it easier to find the required packages.)
  4. Click on the search box icon, and select "Search by name". Search for a package named "Apache HTTP Server", and double-click on the package listed. (There may be one or more copies; any one is fine.) This should select it and display an open box with a blue plus sign.
  5. Search for the "PHP scripting language for creating dynamic web sites" package, and select that as well.
  6. Click on the "Apply" button to install these packages. You may be prompted to install other additional packages and to provide your password.
  7. Wait for the packages to be installed.
  8. Apache will require some configuration to recognize and properly process PHP scripts. By default, Apache files are under /etc/httpd, and the configuration file is at /etc/httpd/conf/httpd.conf. Open this file up in a text editor. (You may want to open the text editor as root, so that you can later save changes.)
  9. Search for the block of lines beginning with "LoadModule". Add the following line to that block: LoadModule php5_module modules/libphp5.so. This points Apache towards the PHP libraries it needs.
  10. Search for DocumentRoot, and record the directory listed. By default this is /var/www/html. This is where Apache will look to find files to display.
  11. Open up a terminal window, and enter the command php --ini. Record the file listed on the Loaded Configuration File: line (/etc/php.ini by default). This is the PHP configuration file.
  12. Back in the Apache configuration file, add the following lines to the end of the file:
    <IfModule php5_module>
    AddType application/x-httpd-php .php
    AddType application/x-httpd-php-source .phps
    PHPIniDir "/etc/php.ini"
    <IfModule dir_module>
    DirectoryIndex index.html index.php
    </IfModule>
    </IfModule>
    These lines instruct Apache on how to process files with the .php and .phps extensions, and where to find the configuration file to pass to PHP. (If the PHP configuration file is located elsewhere on your system, replace /etc/php.ini with the path recorded in the previous step.)
  13. Save and close the Apache configuration file.
  14. Open up a terminal, and run the following command: sudo apachectl -k start. This starts the Apache web server service.
The virtual machine should now be configured as an Apache server, with PHP functionality attached. To test whether it works:
  1. Open up a web browser and navigate to localhost. You should see a "Fedora Test Page", which indicates that the Apache server has been properly configured and started.
  2. In the Apache content directory (DocumentRoot above), create a file called phpinfo.php with the line <?php phpinfo(); ?>.
  3. Navigate to localhost/phpinfo.php in your web browser; you should see a PHP information page, providing you with configuration information and indicating that PHP has been configured properly.
Note that you may need to manually start the Apache service each time the VM is booted, if the service is not automatically started.
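As a rough aid for the configuration steps above, the shell sketch below reads the DocumentRoot directive from an Apache configuration file and writes the phpinfo.php test page. The function names document_root and write_phpinfo are hypothetical helpers and not part of the project; the default path assumes the /etc/httpd/conf/httpd.conf location mentioned above, and the parsing assumes the directory name contains no spaces.

```shell
# Print the value of the first uncommented DocumentRoot directive.
# The config path defaults to the location given in this document.
document_root() {
  conf="${1:-/etc/httpd/conf/httpd.conf}"
  # Commented lines start with "#", so their first field is never exactly
  # "DocumentRoot". Surrounding double quotes are stripped from the value.
  awk '$1 == "DocumentRoot" { gsub(/"/, "", $2); print $2; exit }' "$conf"
}

# Create the phpinfo.php test page inside the directory given as $1.
write_phpinfo() {
  printf '%s\n' '<?php phpinfo(); ?>' > "$1/phpinfo.php"
}
```

For example, write_phpinfo "$(document_root)" would create the test page in the configured directory. On the VM this probably has to be run from a root shell, since the content directory may not be writable by your own account.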

You can now place the files for the webpage under the DocumentRoot directory; navigating to localhost/insert_file_name_here will display the webpage. Contact the project manager in order to obtain a copy of the MySQL KNOW database; additional instructions for configuring the MySQL database will be provided at that point.

Displaying the Project Web Page

Due to the web-based nature of our project and the fact that we use PHP, an interpreted language, there is no requirement to compile our sources into another form. To display the project web page on your machine:

  1. Open up a terminal.
  2. Enter the following command: ln -s /path/to/repo/know /var/www/html/know, where /path/to/repo/know/ is the location of the project repository, and /var/www/html is replaced with the actual DocumentRoot location on your machine if necessary. (sudo may be necessary to create this link.)
  3. Open up a web browser and navigate to localhost/know/.
You should now see the project's main page. (If you receive an "HTTP 403 Forbidden" error, ensure that every folder along the path to the repository grants read and execute permissions to all users; Apache needs execute permission on a directory in order to traverse it.)
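If the 403 error appears, a small shell sketch like the following may help locate the offending folder. It walks from the repository directory up to / and reports every directory that other users cannot traverse. The function name check_path_perms is hypothetical, and the sketch only inspects the execute (search) bit for "other" users; read permissions on the files themselves still need to be checked separately.

```shell
# Report each directory between $1 and / that lacks the execute (search)
# bit for "other" users. Returns 0 if none are found, 1 if at least one is
# reported, and 2 if the starting directory cannot be entered.
check_path_perms() {
  # Resolve the starting directory to an absolute, symlink-free path.
  dir=$(cd "$1" 2>/dev/null && pwd -P) || { echo "cannot enter: $1" >&2; return 2; }
  status=0
  while :; do
    # First ten characters of the ls listing, e.g. drwxr-xr-x.
    perms=$(ls -ld "$dir" | cut -c1-10)
    case "$perms" in
      # Tenth character is x, or t when the sticky bit is also set.
      ?????????[xt]) ;;
      *) echo "missing o+x: $dir"; status=1 ;;
    esac
    [ "$dir" = "/" ] && break
    dir=$(dirname "$dir")
  done
  return "$status"
}
```

Running check_path_perms /path/to/repo/know prints one line per directory that would block Apache; a reported directory can then be fixed with chmod o+x on that directory.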

Working on the Project

Ideally, the master branch should never contain broken code and should always be ready to be pushed to the publicly-viewable server. This means that if you want to work on some feature, it is best if you create a new branch and work off of that branch to insulate the master branch from intermediate changes. To do this:

  1. cd to the repository.
  2. git branch BranchName to create a new branch, where BranchName is the name you want to give the branch.
  3. Finally, enter git checkout BranchName to switch to the new branch.
Changes to the repository now affect this new branch. If you have configured Apache and PHP on your development machine as detailed above, then you can view your changes at http://localhost/know/. When work is complete on the new feature or bugfix, merge the branch back into the master branch:
  1. git pull origin master to merge the latest changes from the remote master branch into your branch.
  2. git checkout master to switch to the master branch.
  3. git merge BranchName to merge BranchName into the master branch.
  4. git push origin master to update the remote repository.
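The branch-and-merge cycle above can be tried out safely with the following shell sketch. It is only an illustration: a local bare repository in a temporary directory stands in for the BitBucket remote, and the branch name my-feature, the file index.php, and the commit identity are all made up for the example.

```shell
# Scratch area; nothing here touches the real project repository.
work=$(mktemp -d)

# Stand-in for the BitBucket remote (assumption: a local bare repository).
git init --quiet --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/master

# Clone it and publish an initial commit on master.
git clone --quiet "$work/origin.git" "$work/know"
cd "$work/know"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Developer"
git config user.email "dev@example.com"
echo "initial" > index.php
git add index.php
git commit --quiet -m "Initial commit"
git push --quiet origin master

# Create a feature branch and do some work on it.
git branch my-feature
git checkout --quiet my-feature
echo "feature" >> index.php
git commit --quiet -am "Add feature"

# Merge the finished work back into master and publish it.
git pull --quiet origin master      # bring in the latest remote changes
git checkout --quiet master
git merge --quiet my-feature
git push --quiet origin master
```

In this sketch nobody else has pushed to master in the meantime, so the pull finds nothing new and the merge is a simple fast-forward. On the real repository the pull step is where any conflicts with other developers' work would surface.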
Once the master branch has been updated, you may want to update the publicly-accessible webserver to reflect this change. The public website (http://depts.washington.edu/knowcse1/) is hosted on ovid.u.washington.edu. To update the webserver:
  1. Log on to ovid. (To request credentials, please contact Kimberly Nguyen at kimberlyyh.nguyen@gmail.com.)
  2. cd public_html
  3. git pull origin master
This will update the contents of the public webserver to those of the master branch, and changes will be visible immediately at http://depts.washington.edu/knowcse1/.
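The update steps could also be wrapped in a single command, sketched below under a few assumptions: that ovid accepts SSH logins, that your account name is passed as the first argument, and that public_html sits directly in the login directory as in the steps above. The function name deploy_know and the DRY_RUN variable are hypothetical.

```shell
# Usage: deploy_know USERNAME
# Set DRY_RUN=1 to print the command instead of running it.
deploy_know() {
  user="$1"
  host="ovid.u.washington.edu"
  # Same two commands as the manual steps, run in one remote session.
  remote_cmd="cd public_html && git pull origin master"
  if [ -z "$user" ]; then
    echo "usage: deploy_know USERNAME" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "ssh $user@$host '$remote_cmd'"
  else
    ssh "$user@$host" "$remote_cmd"
  fi
}
```

Running it once with DRY_RUN=1 before the first real deployment shows exactly which command would be sent to the server.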

Running Project Tests

Given the user-based nature of our project, we have decided to dispense with automated testing of the generated graphics and word clouds in favor of semi-rigorous human testing of the website. Having people test the website will also make it easier to detect slight visual inconsistencies due to problems with different browsers or platforms being used to view the website.

The following tests also serve to check whether the KNOW database has been successfully set up for use by the website.

  1. Navigate to the KNOW Visualizations webpage (by default, depts.washington.edu/knowcse1/). The heatmap tab should be the first to open, and it should display an empty rectangular map of the Earth, courtesy of Google Maps.
  2. On the left-hand side of the page, in the "Countries" box, select "Afghanistan". The heatmap should update to show a red circle over Afghanistan. Note that when you hover your mouse cursor over the circle, a pop-up window displays the country name as well as how many articles can be found in that country. (You may need to manually resize the browser screen if the heatmap displays incorrectly.)
  3. Now click on the "Timeline" tab. It should now display a timeline with articles regarding Afghanistan listed horizontally along the bottom, where the timeline itself is displayed. Clicking on one of the grey article boxes will shift the focus of the timeline to that article and display a preview of the article in the top pane. (You may need to manually resize the browser screen if the timeline displays incorrectly.)
  4. Now click on the "Article Cloud" tab. You should see a grey background with "The article cloud" printed in the upper-left corner of the tab area. Given the random nature of article selection for display in the article cloud, the exact article at the center of the "cloud" and which articles surround it may differ each time the article cloud is remade. However, what should be consistent is the format of each box. Each box should be a blue rounded rectangle with black outline. The center box will have an article title and a "Go Take A Look" link which you can click to view the article. Surrounding boxes, if any, should contain a title of an article, a common subject or theme, and the "Go Take A Look" link. You can click the "refresh" button in the lower-left of the tab area to regenerate the article cloud.
  5. Now click on the "Word Cloud" tab. You will see a list of words in different colors. The first row should begin "Afghanistan", "War", "NATO", "ISAF", "Justice", and "Terrorism". "Justice" will be in pink, while the other words will be in black. Some words in the word cloud may be colored green or blue as well. Click on the word "Afghanistan". It should display a list of articles with "Afghanistan" or "Afghan" in the title; the word "Afghanistan" will be at the top of the list and a "back" button will appear below the list. Clicking on the "back" button will return you to the word cloud.
  6. Now click on the "Article List" tab. It will display a list of all articles returned by the search, which should begin with the items "Afghanistan: Bomber kills Kunar elder Malik Zarin" and "Maria Bashir: Afghanistan's fearless female prosecutor". Clicking on the underlined article titles will take you to the article's webpage. In addition, the country for each article should be displayed under the title; in this case, the only country listed should be Afghanistan.
  7. Now change the country searched for to "Albania". Note that the article list updates immediately and that only three articles appear. Select the theme "Media and Communication". Note that now only one article appears, titled "Internet penetration in Albania up to 43.5% in 2010". This demonstrates filtering by theme.
  8. Remove the "Media and Communication" theme, and in Courses, select "SIS 201 Sp11". Note that only two of the three articles appear, titled "Libyan oil minister Shokri Ghanem 'defects'" and "Albania welcomes Chinese big business". This demonstrates filtering by courses.
  9. Remove the course filter, and in the "Published After:" field, put June 2nd, 2011. (You may have to type in "2011-06-02".) Note that only the "Albania welcomes Chinese big business" article appears; this demonstrates filtering for articles published after a certain date.
  10. Clear the "Published After:" field, and in the "Published Before:" field, put June 1st, 2011. (You may have to type in "2011-06-01".) Note that the other two articles now show up; this demonstrates filtering for articles published before a certain date.
  11. Now put May 1st, 2011 in the "Published After:" field. Only "Libyan oil minister Shokri Ghanem 'defects'" should now appear. This test shows that multiple combinations of fields can be used to search for articles.
  12. Keeping the same search terms, click back through the other tabs. You should see that they have updated, and now generate content based on this single article. (Word cloud, for example, will only contain the words "Politics" and "Nato".) This shows that changing the search while one tab is open will cause the other tabs to update once they are opened.
This is a basic run-through of the capabilities of our visualizations. As with most GUI applications, using the interface for longer may expose more bugs, but this series of tests should expose any critical flaw in the visualizations and ensure that the backend database has been set up properly and that the visualizations can display content.

Bug Tracking

Our team uses a bug tracker integrated into BitBucket; the bug tracker may be found by accessing the project repository (https://bitbucket.org/EddieCarlson/know), and clicking on the "Issues" tab.

User Documentation

Are you a poor, smelly unfortunate? Click here.