Lead Image © mopic, 123RF.com

Data Analysis with R and Python

On Track

Article from ADMIN 25/2015
The statistical programming language R excels at dissecting data, and you can embed R in Python using the Rpy2 interface.

Large volumes of data are most useful if you can study them with intensive data analysis. The open source language R [1] is a powerful tool for evaluating an existing database. The R language offers a variety of statistical functions, but R can do more.

This article shows how to use R for a sample application that evaluates comet data. Python creates the statistical reports in combination with R and the MongoDB [4] database, and an Apache web server delivers the results to the browser, which displays them with the help of web technologies such as HTML, JavaScript, jQuery [3], and CSS3.

Comet Rising

Figure 1 shows how the report generator of the sample application displays the comet data in Firefox. The selection list at the top lets the user select a report variant. If you click on Send, JavaScript sends an HTTP request to the Apache web server, which then generates the report.

Figure 1: The comet data application in Firefox.

The Python scripts first save the comet data in a MongoDB database. Scripts then parse the data and draw on R to create the report in the form of a graphic, which ends up as a PNG file in a public directory on the web server. The server sends the URL back to the browser as a response to the HTTP request.

Web Server

Using the instructions in Listing 1, the developer first prepares the Apache web server on Ubuntu 12.04 for running the sample application. The developer copies the configuration from the listing (with root privileges) to the /etc/apache2/sites-available/rconf path. The command sudo a2ensite rconf binds it into the web server's configuration; sudo service apache2 restart enables the extended configuration by restarting the web server.

Listing 1


01   Listen 8080
02   <VirtualHost *:8080>
03    DocumentRoot /home/pa/www
04    <Directory /home/pa/www>
05     Options +ExecCGI
06     AddHandler cgi-script .py
07    </Directory>
08   </VirtualHost>

Users can then access the sample application via the URL http://localhost:8080. Line 1 tells Apache to listen on port 8080; lines 2 to 8 define the virtual host that handles the incoming HTTP requests. Lines 5 and 6 allow Python scripts to execute via the web server's CGI interface, assuming they reside in the /home/pa/www root directory.
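To check that the CGI configuration works before adding the real application, a small test script can help. The following is a minimal sketch, not part of the sample application: the file name hello.py and the function build_response() are assumptions made for illustration, and the code is written for Python 3.

```python
#!/usr/bin/env python3
# Hypothetical test script, for example /home/pa/www/hello.py
# (the file name is an assumption, not part of the sample application).
import sys


def build_response(body):
    """Return a complete CGI response: header, empty line, then the body."""
    # A CGI script must emit at least a Content-Type header,
    # followed by an empty line, before any content.
    return "Content-Type: text/plain; charset=utf-8\n\n" + body + "\n"


if __name__ == "__main__":
    sys.stdout.write(build_response("CGI is working"))
```

Apache only runs the script if the file is executable (chmod +x); a request to http://localhost:8080/hello.py should then return the text instead of the source code.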

Data Store

The sample application uses the free NoSQL database system MongoDB [4] as its data repository. The commands from Listing 2 install the current version 2.6.2 on Ubuntu 12.04. Line 1 imports the public key for the external repository, and line 2 adds the repository to the package sources. Line 3 updates the package list. The last two lines install mongodb-org and the current version of the Python PyMongo interface.

Listing 2

Installing MongoDB and PyMongo

01 sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
02 echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list
03 sudo apt-get update
04 sudo apt-get install -y mongodb-org
05 sudo easy_install pymongo

To import the sample data into MongoDB, you need the Python import_comets.py script from Listing 3.

Listing 3


01 from pymongo import MongoClient
02 import csv
03 import sys
04
05 def pfl(val):
06   try:
07     return float(val)
08   except:
09     return None
10
11 with open(sys.argv[1]) as csvfile:
12   collec = MongoClient()["galaxy"]["comets"]
13   for row in csv.reader(csvfile, delimiter="\t"):
14     try:
15       collec.insert({"name":row[0],"observer":row[1],"type":row[2],"period":\
         pfl(row[3]), "ecc":pfl(row[4]),"semaj_axs":pfl(row[5]), \
         "perih_dist":pfl(row[6]), "incl":pfl(row[7]), "abs_mag":pfl(row[8])})
16     except:
17       print "Error: could not import: ", row

The python import_comets.py data/comets.csv command starts the import at the command line. The script then parses the sample data from the CSV file, data/comets.csv.

Line 1 imports MongoClient from the Python pymongo package; the next two lines import the csv and sys modules. Line 11 reads the CSV file path from the sys.argv list of command-line arguments, opens the file, and stores the resulting file object in the csvfile variable.

If they do not already exist, the Python script then creates the MongoDB database galaxy and the comets data collection in line 12. The csv.reader() function in line 13 then parses the CSV file and splits each line into columns based on the tab character.

The for loop fetches the next line from the reader object and stores it in the row variable. Line 15 then stores the record from the row in the MongoDB database in the form of a Python dictionary with key/value pairs. The pfl() function converts numeric values to floating-point numbers. If the conversion fails, the function returns None.
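The parsing and conversion steps can be tried out without a running MongoDB instance. The following sketch, written for Python 3, mirrors the logic of Listing 3 but only builds the dictionaries instead of inserting them. The helper row_to_doc() and the sample line are assumptions made for illustration and do not come from the original script or data. Unlike Listing 3, this version silently omits keys if a line has fewer than nine columns.

```python
import csv
import io

# Field names in the column order used by Listing 3.
TEXT_KEYS = ("name", "observer", "type")
NUMERIC_KEYS = ("period", "ecc", "semaj_axs", "perih_dist", "incl", "abs_mag")


def pfl(val):
    """Convert a string to float; return None if it is not a number."""
    try:
        return float(val)
    except ValueError:
        return None


def row_to_doc(row):
    """Build the dictionary that would be stored for one CSV row."""
    doc = dict(zip(TEXT_KEYS, row[:3]))
    for key, val in zip(NUMERIC_KEYS, row[3:9]):
        doc[key] = pfl(val)
    return doc


# Made-up sample line for illustration only, not real comet data.
# The last column is empty, so abs_mag becomes None.
sample = io.StringIO("Example\tJ. Doe\tRP\t5.5\t0.5\t3.1\t1.55\t12.0\t\n")
docs = [row_to_doc(row) for row in csv.reader(sample, delimiter="\t")]
print(docs[0])
```

Each resulting dictionary could then be passed to the collection in place of the literal dictionary in line 15.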

The keys match the attributes in Table 1. The sample data provides characteristic parameters for known comets. Comets differ primarily in the shape of their orbits. Like planets, periodic comets travel on recurring elliptical orbits (Figure 2).

Table 1

Overview of Comet Data

Attribute     Meaning
name          Comet name
observer      First observed by
type          Comet type: RP = recurring periodically; NP = non-periodic
period        Orbit time in years
ecc           Numerical eccentricity ε of the orbit
semaj_axs     Semi-major axis in astronomical units; 1 AU = 1.4960 x 10^11 m
perih_dist    Perihelion distance in AU
incl          Inclination of the orbit in degrees
abs_mag       Brightness (absolute magnitude)

Figure 2: Typical characteristic data for an ellipse: a is the semi-major axis; b is the semi-minor axis; ε represents the eccentricity, which is zero for a circle.
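
Several of the attributes in Table 1 are linked by the geometry of the ellipse. As a rough sketch of these relationships (standard formulas, not code from the sample application), the following Python 3 functions derive the semi-minor axis and the perihelion distance from the semi-major axis and the eccentricity, and estimate the orbital period using Kepler's third law for bodies orbiting the Sun. The function names and the input values are assumptions chosen for illustration.

```python
import math


def semi_minor_axis(a, ecc):
    """Semi-minor axis b of an ellipse: b = a * sqrt(1 - ecc^2)."""
    return a * math.sqrt(1.0 - ecc ** 2)


def perihelion_distance(a, ecc):
    """Closest distance to the Sun: q = a * (1 - ecc)."""
    return a * (1.0 - ecc)


def period_years(a):
    """Kepler's third law for solar orbits: P^2 = a^3 (P in years, a in AU)."""
    return a ** 1.5


# Illustrative values only, not taken from the sample data.
a, ecc = 4.0, 0.5
print(semi_minor_axis(a, ecc), perihelion_distance(a, ecc), period_years(a))
```

Relationships like these can serve as a plausibility check on imported records, for example by comparing perih_dist against the value computed from semaj_axs and ecc.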
