:tocdepth: 2
NumPy record arrays
-------------------
In this example, we want to
* get access to an online catalog with photometric data,
* make a plot of all the stars in that catalog
* select the bright (visual magnitude smaller than 3, and blue ones (brighter in B than in V).
* look up the official names and spectral types of the 10 brightest stars in the selection
The catalog we use is the Geneva Photometry Catalog from Rufener (1988).
It is accessible through the ViZieR website via its designation ``II/169/main``.
First we will download the catalog to the local disk:
.. sourcecode:: python
import urllib, pyfits, os
vizier_url = 'http://vizier.u-strasbg.fr/viz-bin/asu-fits/VizieR?' # website of ViZieR
cat_url = '&-source=II/169/main' # name of the catalog
cat_options = '&-oc=deg,eq=J2000&-out.all' # retrieve all columns, coordinates in J2000 and degrees
if not os.path.isfile('gen.fits'): # don't download if the file already exists
urllib.URLopener().retrieve(vizier_url+cat_url+cat_options,filename='gen.fits')
What's in this FITS file? Let's have a look:
.. sourcecode:: python
hdus = pyfits.open('gen.fits')
len(hdus) # 2 HDUs, an empty primary and a Table extension
print(hdus[1].header) # information on the columns names, contents and type
That's a lot to digest! We have *many* columns, and some of them contain strings,
other contain floats etc... A normal numpy array doesn't seem fit to hold this
kind of information: a normal array cannot contain columns of different types,
and functions like ``sum`` or ``transpose`` loose their meaning. Also, putting
all the data in columns in a matrix is not optimal, because we want to access
columns by their name (easy to remember), not by a number (difficult to remember).
That's easy enough with the FITS table extension:
.. sourcecode:: python
print(hdus[1].columns.names)
print(hdus[1].data.field('Vmag'))
Still, we want to use the powerful indexing mechanisms of numpy arrays to select
for bright and blue stars, and plot the coordinates.
For this particular problem, numpy subclasses the basic array type into ``record
arrays``, which are very similar to FITS table extensions, but add the power of
numpy. In fact, when you use pyfits, the pyFITS record behaves almost like a numpy
recarray.
.. raw:: html
Click to Show/Hide Introduction to recarray
Let's start with a small example. We make a record array with 3 columns: an index
array (integer type), a column with magnitude values, and a column with star
names:
.. sourcecode:: python
col1 = np.arange(4)
col2 = [0.03,-1.47,1.816,0.42]
col3 = ['Vega', 'Sirius', 'Mirfak', 'Betelgeuse']
recarr = np.rec.fromarrays([col1,col2,col3],names=['index','vmag','name'])
recarr
The default string representation is quite difficult to read. Luckily, ``matplotlib``
provides us with a nice alternative:
.. sourcecode:: python
import pylab as plt
print(plt.mlab.rec2txt(recarr)) # pretty print
recarr['vmag'] # extract one column
recarr[0] # extract first row (or record)
recarr.dtype.names # access column names
recarr2 = recarr[ (recarr['vmag']<0) ] # slicing is possible as usual
print(plt.mlab.rec2txt(recarr2)) # pretty print
.. raw:: html
Though a pyFITS record has almost the same behaviour as
we require, a numpy record array is more general in its use. Therefore, we convert
the FITS record to a numpy record array:
.. sourcecode:: python
names = hdus[1].columns.names # we need the column names
cols = [hdus[1].data.field(col) for col in names] # and their content
cat = np.rec.fromarrays(cols,names=names)
print(cat.dtype.names) # similar to hdus[1].columns.names
cat.shape # and we have access to numpy commands
Next, we make a new catalog, with only the bright and blue stars.
.. sourcecode:: python
bright_blue = (cat['Vmag']<3) & (cat['V-B']>0)
len(bright_blue),sum(bright_blue) # how many stars do we have, and how many or bright and blue?
cat_select = cat[bright_blue] # make the selection
cat_select = cat_select[np.argsort(cat_select['Vmag'])] # sort according to magnitude
.. raw:: html
Click to Show/Hide Alternative with dictionaries
How would the above look when we would use dictionaries instead of record arrays?
First, we make a dictionary of the catalog:
.. sourcecode:: python
cat2 = {name:cat[name] for name in cat.dtype.names}
cat2_select = {key:cat2[key][bright_blue] for key in cat2}
sortarray = np.argsort(cat2_select['Vmag'])
cat2_select = {key:cat2_select[key][sa] for key in cat2_select}
Compare [2] with [31] and [4] with [32]. Though we can the same with dictionary
with an equal amount of lines, record arrays provide a much more logical interface,
and does not require us to cycle over the columns in the catalog manually.
.. raw:: html
Now we can make the plots that we want. For fun, we first make a histogram of
the magnitudes of all stars in the catalog. Then, we make a plot of the location
of all stars (grey dots) and the selected one (blue dots). Also, we scale the
size of the dots with the brightness of star.
.. sourcecode:: python
plt.hist(cat['Vmag'],bins=50)
.. sourcecode:: python
size = ((cat['Vmag'].max()-cat['Vmag'])/5.)**3 # size the dots with mag
size_select = ((cat['Vmag'].max()-cat_select['Vmag'])/5.)**3
plt.scatter(cat['_RAJ2000'],cat['_DEJ2000'],color='0.5',s=size,edgecolors='none')
plt.scatter(cat_select['_RAJ2000'],cat_select['_DEJ2000'],color='b',s=size_select,edgecolors='none')
+------------------------------------+-----------------------------------+
| .. image:: genhist.png | .. image:: gencat.png |
| :scale: 50 | :scale: 50 |
+------------------------------------+-----------------------------------+
Finally, we are interested in the names 10 brightest blue stars in this catalog.
We need to convert the HD number to the official star name. We use
the ``sesame`` online database (i.e. SIMBAD), download the HTML files with the
info on the stars, and extract their name and spectral types. We
can use the non-standard package BeautifulSoup:
.. sourcecode:: python
import BeautifulSoup as bs
import urllib2
for hd in cat_select['HD'][:10]:
html = "".join(urllib2.urlopen('http://cdsweb.u-strasbg.fr/cgi-bin/nph-sesame/-oxpsIF/S?HD{0}'.format(hd)).readlines())
page = bs.BeautifulSoup(html)
name = page('oname')[0].text
sptype = page('sptype')[0].text
print(name,sptype)
There seems to be a solar-like star in there!
.. raw:: html
Click to Show/Hide Solution without BeatifulSoup
What would the previous script look like when BeautifulSoup is not installed? In
this case, we'll have to read in the and extract the information manually. We
could write the following Python code:
.. sourcecode:: python
for hd in cat_select['HD'][:10]:
lines = urllib2.urlopen('http://cdsweb.u-strasbg.fr/cgi-bin/nph-sesame/-oxpsIF/S?HD{0}'.format(hd)).readlines()
for line in lines:
if 'spType' in line: sptype = line.split('>')[1].split('<')[0]
if 'oname' in line: name = line.split('>')[1].split('<')[0]
print(name,sptype)
.. raw:: html