Parallel Computing for Everyone
Abstract
We request 12 two-processor, dual-core nodes for multiple uses. The processors will double the throughput of the astronomy department Condor distributed computing system. The computers will also be linked with high-performance networking to enable the nodes to be used together as a supercomputer. The dual functionality of a high-performance computer cluster will enhance the distributed computing power and provide students with a parallel computer.
We created a <a href="http://www.astro.washington.edu/stinson/techfee/">webpage</a> that demonstrates how students will use these computers. The page includes a list of the accomplishments of students who have used STF approved computing resources in the past.
Background
Gone are the days when all astronomers spent countless nights in front of an eyepiece staring at their favorite celestial object. Today, many astronomers are finding themselves more familiar with high-performance computing than high-precision optics. Computing methods have evolved drastically, and typical applications have moved beyond single-processor desktop machines, requiring specialized hardware and software. Two intrinsically different approaches exist in high-performance computing. Distributed computing refers to multiple executions of the same code on different data on multiple machines at the same time with no interprocessor communication. Distributed computing provides a simple way to greatly reduce the overall time required to complete work. By contrast, in parallel computing one instance of a program is broken into different pieces and run on several processors at once while all the processors communicate with one another. Students in the Astronomy Department at the University of Washington employ both of these approaches in their daily research tasks. However, these modern methods require complex hardware, and our students have recently found their research endeavors stifled by limited computing resources. The cluster we are requesting in this proposal will enable our students to greatly expand their research avenues by enhancing both types of computing.
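As a rough illustration of the distinction drawn above, the following Python sketch contrasts the two approaches on a toy problem. It is only a sketch under stated assumptions: everything runs serially on one machine, the "workers" are plain list slices, and the function names (`analyze`, `smooth_step`, `parallel_run`) are hypothetical and do not come from any software used in the department.

```python
# Toy contrast between distributed and parallel computing.
# Assumption: the "workers" are simulated by list slices and run serially.

def analyze(chunk):
    """Distributed-style task: depends only on its own chunk of data."""
    return sum(chunk) / len(chunk)

def distributed_run(chunks):
    # Each chunk could be sent to a different machine; no task needs
    # data held by any other task, so no communication is required.
    return [analyze(chunk) for chunk in chunks]

def smooth_step(values, left, right):
    """One averaging step on a piece; needs one boundary value per neighbor."""
    padded = [left] + values + [right]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)]

def parallel_run(pieces, steps):
    # Pieces of ONE problem: before every step, each piece must receive
    # boundary values from its neighbors (the interprocessor communication).
    last = len(pieces) - 1
    for _ in range(steps):
        lefts = [pieces[i - 1][-1] if i > 0 else pieces[i][0]
                 for i in range(len(pieces))]
        rights = [pieces[i + 1][0] if i < last else pieces[i][-1]
                  for i in range(len(pieces))]
        pieces = [smooth_step(piece, lo, hi)
                  for piece, lo, hi in zip(pieces, lefts, rights)]
    return pieces

data = [float(x) for x in range(12)]
chunks = [data[0:4], data[4:8], data[8:12]]

means = distributed_run(chunks)
split_result = [v for piece in parallel_run(chunks, steps=3) for v in piece]
whole_result = parallel_run([data], steps=3)[0]
```

In the distributed case each chunk is processed in isolation. In the parallel case the three pieces reproduce the single-domain result only because boundary values are exchanged before every step, which is why the speed of the network between nodes matters.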
Astronomy is entering the era of "survey science", where large astronomical surveys such as the Sloan Digital Sky Survey produce terabytes of data on millions of celestial objects from galaxies to comets. One of the next planned major surveys, the Large Synoptic Survey Telescope (LSST), will collect roughly 30 terabytes of data per day. Realizing that this is the future of astronomy research, one of the major components of our Department's vision is to train our students through research to effectively use these large data sets. They cannot possibly be analyzed on a single machine, and therefore the use of distributed computing is crucial. Another very important aspect of cutting-edge research involves the use of the world's most advanced telescopes. Students in the department work with high resolution imagery from the Hubble and Spitzer Space Telescopes, Chandra X-Ray Telescope, and GALEX, among others. The detailed images comprise enormous data sets that require a powerful computational infrastructure for analysis. Using a single computer processor, typical tasks leading to scientific publications could take years. Thus, such astronomical research requires a distributed computing system to divide the workload across a large number of machines and reduce real computation time to days or even seconds.
Some astronomical problems involve datasets that cannot be broken down in this way, so parallel computing must be employed. For example, theoretical astronomy involves ever-larger simulations that would never be completed within an astronomer's lifetime running on a single processor. Many students work in the Astronomy Theory group running such simulations, but they currently do not have a readily accessible parallel machine to pursue their research efficiently.
Students have joined with the Astronomy Department to develop a strong distributed computing infrastructure (see STF proposals <a href="http://techfee.washington.edu/proposals/2002-405-1">2002-405</a>, <a href="http://techfee.washington.edu/proposals/2003-062-1">2003-062</a>, <a href="http://techfee.washington.edu/proposals/2004-017-1">2004-017</a>). The department provided a fast network backbone and many professors' PCs to combine with 26 Linux PCs and high-capacity disk storage funded through the STF. Condor scheduling software was installed on all of these computers to distribute jobs across the network. Students submit jobs to Condor, and the system searches for idle machines on which to run the program. If there are more users than available machines, the Condor system ensures that each user receives an equal amount of processor time. With this system, Condor users simultaneously execute many single-processor jobs by harvesting otherwise idle CPUs. Over the past year, more than 200,000 node hours have been harvested from such CPUs. That is the equivalent of 24 computers running around the clock for one full year. However, this is not enough; the Condor queues are constantly overloaded. When not being used for parallel processing, the additional 48 state-of-the-art processors of the proposed cluster would double the computational throughput of our Condor pool and help satisfy the saturated demand that we currently experience. Examples of research that Condor enables and a publication list of completed work are displayed at this <a href="http://www.astro.washington.edu/stinson/techfee">webpage</a>.
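The node-hour figures quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes a 365-day year; the helper names are hypothetical, and the count of 48 assumes each dual-core processor contributes two processing cores.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming a non-leap year

def node_hours(machines, hours=HOURS_PER_YEAR):
    """Node hours delivered by machines running around the clock."""
    return machines * hours

def equivalent_machines(hours_harvested, hours=HOURS_PER_YEAR):
    """Number of always-on machines that would deliver the same node hours."""
    return hours_harvested / hours

full_year_24 = node_hours(24)                     # 24 machines for one year
machines_for_200k = equivalent_machines(200_000)  # machines matching 200,000 hours

# Requested cluster: 12 nodes x 2 processors per node x 2 cores per processor.
new_cores = 12 * 2 * 2

print(full_year_24, round(machines_for_200k, 1), new_cores)
```

Twenty-four machines running for a full year deliver 210,240 node hours, which is consistent with the "more than 200,000" quoted above.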
However, until now, the pressing need for a parallel computing resource has yet to be addressed. A modern, parallel machine is unavailable for students' use. While desktop systems can be configured to run parallel jobs, the communication between these processors is too slow to gain significant speed from parallelization. See this <a href="http://www.astro.washington.edu/roskar/benchmark.html">webpage</a> for an example. Parallel applications require a high-speed network to spend less time communicating and more time calculating. Therefore, the network equipment requested is an essential component of what makes the cluster a special computer. The cluster will be a unique resource, specialized for parallel tasks, but equally available for simple distributed computing.
Successful research in observational and theoretical astronomy necessitates extensive computing resources, both parallel and serial. Providing our graduate and undergraduate students with the necessary means to hone their skills in this important aspect of research is therefore of paramount importance to their future success. In our current situation, however, our Condor pool has been pushed to its limits, and easy access to a modern parallel computer is non-existent. The dual functionality of a high-performance computer cluster would alleviate both of these problems at the same time, better equipping our students for successful careers in astronomy.
Benefits
The chief benefit of more computer power is continuing the <a href="http://www.astro.washington.edu/stinson/techfee/">exciting research</a> of the Astronomy department. The <a href="http://www.astro.washington.edu/stinson/techfee/">webpage</a> shows how Condor running on STF-funded equipment has enabled students to publish many papers in refereed journals and driven the research of six Ph.D. theses. Condor is so useful and has become so popular that the job queues are filled beyond capacity. This upgrade will help meet this increased demand, and significantly improve students' productivity.
In addition to enhancing the Condor pool, the proposal will provide students with a much-needed parallel computing environment. They will be able to run intricate simulations using existing parallel programs. The clusters currently available to students are out-of-date or overloaded. Faculty and postdocs constantly use the 12 processors on the modern machine, leaving students with a cluster that was purchased in 2000. Unfortunately, this cluster in total does not run much faster than one typical desktop machine.
The new machine will also enable more efficient use of the small, valuable amounts of time allocated to the department on supercomputers at national facilities. These supercomputers allow researchers to run simulations concurrently on hundreds or thousands of processors. Running on so many machines means that one small mistake in a complicated configuration wastes thousands of node hours of valuable computing time (not to mention power) on an incorrect calculation. A medium-sized local cluster would allow students to ensure that their complicated configuration is correct before moving the simulation to the large national facility.
Additionally, huge datasets from both observations and simulations can be analyzed in parallel. When a dataset is larger than the memory of one processor, it needs to be analyzed on a cluster that can use the memory of many machines. Real-time access to large machines at national super-computing centers is not possible. Because of this, students are not presently equipped to deal with the output from state-of-the-art simulations or the enormous amounts of data projected to be collected by upcoming surveys.
Knowledge and familiarity with efficient parallel processing is becoming an invaluable skill for all branches of astronomy. The cluster will give graduate students the necessary experience to be competitive in the job market after graduation and will help undergraduate students gain acceptance into more competitive graduate programs. Top graduate programs expect undergraduates applying for admission to possess research experience. Serious research projects need to be squeezed into the final short years of an undergraduate education between classes and other activities. Fast, cutting-edge computational tools like the new cluster will make their research possible.
Finally, each and every time research opportunities of our students improve, the University benefits. Student research is a prime component of the University; its prestige in the wider academic world is directly derived from the success of its students. Successful research projects look good not only on the student's CV, but also on the grant applications of that student's advisor. Since roughly 52% of each grant successfully secured by a faculty member goes straight to the University, the wider community reaps significant direct benefit from our research success. Modern computing resources also make the department more attractive in the eyes of incoming graduate students. The breadth of potential research is an important consideration for every student choosing a graduate program. The computer resources requested in this proposal would therefore help attract the best students to our program and, by extension, maintain the University of Washington's high profile.
Student Access
This system will be accessible to anyone with an Astronomy Department user account. These accounts are given to anyone who chooses to do research in the Astronomy Department, undergraduate or graduate. We note that many Physics graduate students choose to work in the Astronomy Department. There is no distinction in the accounts of undergraduates, graduates, or physicists, so all of these students will receive equal access to the cluster resources. Access will be possible through two different avenues:
<ul>
<li> Command line access will be available through remote login programs such as ssh. In this way, individual processors can be used in the same manner as a normal desktop machine.
<li> Processor access via Condor. Students will be able to submit jobs from their desktop, lab, or personal machines using the simple Condor submission procedure that is outlined on Astronomy Department <a href="http://www.astro.washington.edu/reschke/SystemDocs/Condor/">web pages</a>.
</ul>
The department's <a href="http://www.astro.washington.edu/reschke/SystemDocs/Condor/">Condor webpage</a> contains detailed instructions and examples compiled by students, so that anyone with minimal computer experience can quickly start using the Condor system. The nature of the Condor scheduling system promotes equal usage among all students running jobs on our network. As one user builds up usage time on the network, his or her priority rating decreases, and jobs submitted by another user with a higher priority rating will cause the first user's jobs to be suspended until both priority ratings are comparable. This system ensures that these resources will not be monopolized by a few select students. Additionally, the Condor system will be configured to manage both parallel and serial jobs, so that neither type of work will take precedence over the other. Programs compiled for current Condor machines will work immediately since the operating system and processor architecture are identical.
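The usage-based priority idea described above can be sketched with a toy scheduler in Python. This is an illustrative assumption only and not Condor's actual priority algorithm: the function `fair_share`, the user names, and the rule "the least-used users get the next slots" are all hypothetical simplifications.

```python
def fair_share(users, slots, ticks):
    """Toy scheduler: each tick, hand the slots to the users with least usage.

    Assumption: this only mimics the idea that accumulated usage lowers a
    user's priority; it is not how Condor computes priorities.
    """
    usage = {user: 0 for user in users}
    for _ in range(ticks):
        # Users with the least accumulated time have the best priority.
        ranked = sorted(users, key=lambda user: usage[user])
        for user in ranked[:slots]:
            usage[user] += 1
    return usage

# Three hypothetical users competing for two processors over 30 time slices.
usage = fair_share(["alice", "bob", "carol"], slots=2, ticks=30)
print(usage)
```

Even though only two of the three users can run at any moment, the accumulated time evens out, so no single user monopolizes the processors.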
We are also interested in linking the new cluster into a new campus-wide grid of computing clusters. In this way, various campus units will be able to share their valuable computing power. Such a grid is not yet in place, but we have begun to explore the possibility of using the advanced features of Condor to enable such a technological leap forward. The linking of the physics and astronomy department Condor pools would be one large step toward this goal. We have already discussed such a plan with the Physics Department (see Physics STF proposal <a href="http://techfee.washington.edu/proposals/2005-038-2">2005-038</a>). While physics students without astronomy user IDs will not be able to access the cluster directly, their jobs will be able to execute on it via Condor. Therefore, the immediate accessibility of the requested cluster will not be limited to our department only.
Available Resources
<h2>Personnel</h2>
A computing cluster is a complicated system to install. Fortunately, our Astrophysics Theory group employs one of the leading experts on cluster design and installation, Chance Reschke, who was involved in creating the first Beowulf cluster in the early 1990s and has been working with clusters ever since. Since this cluster will be of paramount importance to the work of our students in the Astrophysics Theory group, Chance will be available to assist us with the installation and configuration. Our department's computing webpages include an <a href="http://www.astro.washington.edu/reschke/SystemDocs/Condor/">introduction to Condor</a>, as well as a <a href="http://www.astro.washington.edu/condor/Condor_expls/CondorIDL.html">page with simple examples</a> to assist first-time users. Several graduate students (Greg Stinson, Rok Roskar, Nathan Kaib, Peter Yoachim) will also provide assistance to Condor users as well as those wishing to utilize the parallel capabilities of the new cluster.
Chance does not run the Astronomy and Physics department network, however. That task is left to the Physics and Astronomy Computing Support (PACS). PACS has agreed to support the head node of the cluster that will talk to the astronomy network. Such collaboration means that all astronomers will have easy access to this machine using their current user IDs and passwords.
<h2>Computing Resources</h2>
<h3>81 processors</h3>
<ul>
<li><b>Undergraduate Teaching Lab:</b> 18 processors purchased with STF funds in 2005 worth $35k
<li><b>Graduate PCs:</b> 26 processors purchased with STF funds in 2003 worth $59,800
<li><b>Faculty, Staff Desktops:</b> 37 processors obtained using a variety of faculty grants worth $75k
</ul>
<h3>Current Computer Clusters</h3>
<ul>
<li><b>Beowulf Cluster:</b> 64 out-of-date processors obtained from Intel grant in 2000 worth $280k
<li><b>Shared Memory Clusters:</b> two 12 processor SGI machines obtained using the Intel grant and worth $250k, but rarely available to students
<li><b>Time at National Supercomputing Facilities:</b> 250,000 node hours on NSF computers; 200,000 hours at Arctic Regional Supercomputing Center available to students in the Astrophysics Theory group
</ul>
<h3>Disk Storage:</h3>
<ul>
<li><b>Student RAID arrays:</b> 15.2 TB Array purchased with STF funds from 2004 worth $60k will be connected via GBit ethernet to the cluster so that data will flow quickly from the disks to the processors and back.
<li><b>Faculty Disks:</b> Faculty have spent $100k on 8 TB of various disks around the department that support their work and the work of some of their grad students.
</ul>
<h3>Network:</h3>
<ul>
<li>Gigabit ethernet between servers
<li>100 Mbit ethernet to all desktops
</ul>
Installation Timeline
Because this proposal requests an expensive item, our experience is that the most time-consuming part of the project will be procurement, during which the University solicits bids from various vendors.
Once the equipment is ordered, most vendors can ship the items within a couple of weeks. Installation will take another couple of weeks. Once all the machines are plugged in and working, the astronomy/physics network is standardized to the point that the computers can be integrated into the Condor pool within a week. We estimate that the cluster can be fully integrated within 4 months after the funds are approved.
Departmental Endorsement
The department has committed itself to pursuing survey science. This will be impossible without sufficient computing resources.
Prof. Tom Quinn cites the lack of cluster computing power as the major current impediment to his students moving their research forward.
Department Chair Bruce Balick stresses that computing power is the engine that drives astronomical research. Without it, the department loses prestige as research suffers.
Please see the comments from faculty and students below.
Student Endorsement
Students look forward to continuing <a href="http://www.astro.washington.edu/stinson/techfee">their work</a>. The <a href="http://www.astro.washington.edu/stinson/techfee">webpage</a> provides individual examples of how students see the new computer helping their research.
Items
Below are the items making up the current proposal. An asterisk (*) beside an item signifies that it was approved by the committee. This, however, was not implemented correctly in our database before 2005, so earlier years may not show it.
All items are located in the Physics / Astronomy Bldg.

| Title | Type | Price | Qty | Subtotal | Description | Justification |
|---|---|---|---|---|---|---|
| Cluster Node | server | $6,757.00 | 12 | $81,084.00 | 2 dual core 2.2 GHz AMD Opteron processors | computing power able to talk with other nodes quickly |
| 26-Port 4x Fabric Copper Switch | network-equipment | $5,007.00 | 1 | $5,007.00 | Low latency network switch | Enables nodes to work together as high performance cluster |
| 3m 4x Fabric Copper cables | network-equipment | $114.00 | 12 | $1,368.00 | Myrinet cables | so nodes can communicate quickly |
| Switch | network-equipment | $2,815.00 | 1 | $2,815.00 | standard network switch | administrative network |
| Head node | server | $2,799.00 | 1 | $2,799.00 | HP ProLiant DL385 (2u) | administrative node for storage and connection to internet |
| Hard Disk | Hardware | $899.00 | 5 | $4,495.00 | 300 GB SCSI hard disk | fast access disks for simulations run on cluster |
| KVM switch | monitor | $2,000.00 | 1 | $2,000.00 | monitor used to configure cluster | |
| Redundant Power Supply | Hardware | $299.00 | 1 | $299.00 | power supply | makes hardware more failsafe |
| Redundant Fan Kit | Hardware | $194.00 | 1 | $194.00 | Extra fans | |
| Tax | tax/shipping | $9,000.00 | 1 | $9,000.00 | tax @9% | tax |
| shipping | tax/shipping | $300.00 | 1 | $300.00 | shipping | get stuff here |
| Requested Total: | | | | $109,361.00 | | |
| Approved Total: | | | | $104,546.00 | | |
| Funding Status: | | | | Partially Funded | | |
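The requested total can be reproduced from the line items with a short Python check. This is a sketch for verification only: the prices and quantities are copied from the item list above, and the variable names are hypothetical.

```python
# Line items copied from the item list: (title, unit price in dollars, quantity).
hardware = [
    ("Cluster Node", 6757, 12),
    ("26-Port 4x Fabric Copper Switch", 5007, 1),
    ("3m 4x Fabric Copper cables", 114, 12),
    ("Switch", 2815, 1),
    ("Head node", 2799, 1),
    ("Hard Disk", 899, 5),
    ("KVM switch", 2000, 1),
    ("Redundant Power Supply", 299, 1),
    ("Redundant Fan Kit", 194, 1),
]
LISTED_TAX = 9000       # tax line as listed in the proposal
LISTED_SHIPPING = 300   # shipping line as listed in the proposal

hardware_subtotal = sum(price * qty for _, price, qty in hardware)
requested_total = hardware_subtotal + LISTED_TAX + LISTED_SHIPPING
exact_nine_percent = hardware_subtotal * 9 / 100

print(hardware_subtotal, requested_total, exact_nine_percent)
```

The hardware sums to $100,061, and adding the listed tax and shipping gives the requested $109,361. An exact 9% of the hardware subtotal would be $9,005.49, so the listed $9,000 tax appears to be a rounded figure.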
Comments (currently disabled)
Astronomy will only continue to become a more computing-intensive field. The experience in parallel computing that will be gained by our students from easy access to this cluster will be a unique and highly valued skill that few other astronomy programs can offer, and this will serve them well in their future careers. Additionally, out of all astronomy programs, ours possesses one of the premier distributed computing networks, and our students' research has flourished because of this. The huge gains resulting from the purchase of this cluster will ensure that our Condor network will remain a state-of-the-art research tool unique to our program.
The Condor cluster is one of the most valuable research tools available to astronomy students. It ensures that no time is wasted in producing exciting results that would otherwise take years to produce. Simply put, I could not complete my thesis without having this capability. I use three surveys of the entire sky comprising over 5 terabytes of data to determine how millions of stars are distributed. Without the capabilities of parallel processing, this work would take years to perform. Please help us in maintaining this critical resource for our students.
Additional computing power is sorely needed by the grad students, as the department is involved in many computationally intensive projects. The state-of-the-art research done by the grad students often requires thousands of CPU hours for each individual project. National supercomputers would not be the answer to the students' needs, as the flexibility given by local resources and up-to-date CPUs is required when developing new and innovative software.
The proposal is extremely well thought out; all the major aspects have been carefully planned. Costs are realistic. Most importantly, the persons involved in it, starting with Chance Reschke and the grad students at our department, have a long and outstanding record. This will make sure that the most suitable equipment will be acquired and then used to the best of its capabilities, giving UW the best (cosmic) bang for the buck.
Twenty years ago astronomy emerged from the era of photographic plates into the era of electronic detector arrays. The sensitivity, linearity, and field of view of these detectors plus simultaneous advances in telescope technologies created a stunning revolution in how astronomical data are obtained and interpreted. This plus the launch of one novel satellite after another (Hubble Space Telescope, x-ray and gamma-ray telescopes, etc.) has profoundly and precipitously changed the face of research astronomy: what we do and how we do it.
Not only has the data volume increased (catalogues of data are 30 terabytes now, and 20 petabytes in a decade), but the sophistication and versatility of numerical modelling has led to a doubling of our computing capacity needs every two years (much like Moore's law).
Five years ago we had about one CPU per desk, or about forty workstations in the department. Today we have two hundred CPUs, all of which are connected to serve as a parallel processor for huge compute jobs. But our needs are growing even faster than the number and speed of our CPUs. Only the newest and most modern data storage systems, servers, networking software, and flexible operating systems come close to serving our research needs. Our key research problems, such as dark energy and planet detections, continue to drive the need for computer hardware even above new telescopes in the list of our infrastructure priorities. (I know that this sounds bizarre!)
The research toolkit of our graduate students simply must be constantly upgraded if they are to seek and find winning postdoctoral positions in a highly competitive field. Our grants and state funds can't bear the entire financial load. The sorts of tools proposed in STF 2006-010-1 enable the sorts of world-recognized research of our students that can successfully launch their careers in research.
We are a successful research department. The last time that we did a study, 85% of our PhD graduates were still actively engaged in astronomical research and teaching. Statistics like these help to explain why our graduate program was ranked first in the U.S. in a survey of graduate students (with Caltech in the next spot). We need your investment in our infrastructure in order to continue to build the international reputation of the Department as one of the premier centers of graduate studies in astrophysics.
--Bruce Balick, Chair, Astronomy Department
This proposal represents an exciting and unique opportunity to move astronomical research at the University of Washington to a new level. Many research projects are currently limited solely by our inability to obtain sufficient computing power and/or storage to complete them--a problem that is only getting worse as we move to producing the more and more detailed computer simulations needed to match the information provided by larger and larger astronomical surveys.
I write in strong support of this proposal. It will help prepare undergraduate and graduate students for the reality of the job market. Individuals who know how to set up, maintain, and use parallel and distributed computing are and will continue to be more desirable job candidates in the rapidly changing world of astronomy. Astronomical simulations are increasingly complex, and students who learn how to create code that runs in modern environments will be at the forefront of astronomical research in the decades to come.
Having had to analyse reams of data from the Sloan Digital Sky Survey (SDSS), I can attest to the fact that our current computing resources are simply not sufficient in this age of survey-based astronomy. It currently takes weeks to sift through the millions upon millions of spectra and images in the survey. A faster Condor pool would greatly alleviate this problem.
The Condor pool has made it possible for several of our past and present graduate students to base their theses on the kind of cutting-edge research that is only possible with strong computing resources. Access to distributed computing is what makes it possible for students here to construct "Universes in a bottle", detailed models that describe astrophysical phenomena in ways impossible to observe, even with the best telescopes. Our computing resources are the heart of the widely respected theoretical work being done in this department, and are key to maintaining our desirability as a graduate program. Without improving the Condor pool, it will be impossible for us to keep pace with the rest of the astronomical community.
In my time as a grad student in the Astronomy Department, I have seen the computing network evolve from a barely adequate group of individual computers to a sophisticated pool of machines that's necessary for the majority of our research. Additionally, each time that STF has funded the expansion of the Astronomy Computing Network, the new resources were quickly and fully utilized. The Condor network has become such an important part of the research for this department, both theoretical and observational, that most grads (and motivated undergrads too!) use it for their research at some point.
In addition to continuing all the great research we are doing, it's natural for the network to evolve into two more specialized groups of computers. The parallel cluster will allow grads easy access to a sophisticated and powerful computing environment, and much like when we introduced Condor 5 years ago, it will create an explosion of research that takes advantage of that equipment.
If you build it, science will get done.
As a graduate student in the Astronomy department, I have had the great pleasure of using the Condor distributed computing system. I have published research that was done using Condor, and plan to have Condor computations feature prominently in my thesis. I don't think there can be any doubt that Astronomy students have made great use of Condor in the past, and will continue to think up great projects in the future which can only be done with massive amounts of computing power.
As an astronomy graduate student, I will use Condor as a major tool in my thesis research. I must use Condor to analyze our multi-million particle galaxy simulations. Because there is so much data, a single analysis program run on my own desktop would take days in most instances, and up to a month in some cases (not exaggerating!). Clearly, running these programs on a single desktop would cripple my desktop for a month. With Condor, I can separate the data and run programs on any available computer, generally reducing the time required to a matter of hours rather than a matter of days (or weeks). Expanding the Condor resources is an excellent idea.
Even though the next large survey telescope is a few years away, all preparations are being done now. The pipeline needs to be in place and operating efficiently. These astronomical survey projects have resulted in a wave of new discoveries that cross all fields, from solar system objects to distant galaxies. The UW Astronomy Department is one of the leaders, and the computing capacity must keep pace. The research opportunities that the increase in computing power would offer are nearly without limit. Definitely fund this proposal!
I am one member of the Astronomy department who is actively involved in the software design and development for the LSST Project. One of the Project's most difficult challenges will be data management: how to reduce, analyze, and interpret petabytes of astronomical data. To be a successful end-user of LSST, a scientist will have to be versed in the language of distributed computing and be comfortable (and responsible!) in formatting and executing complex and computationally intensive queries. Those future scientists are now our current grad students. This is an optimal time to gain this necessary experience, and this proposal directly addresses the resources needed to do it. It will also help us to gain the institutional knowledge and infrastructure needed to take advantage of our position as founding partners in LSST, being prepared to immediately extract science from the LSST data stream (which will be more like a firehose!). It's also very clear that the current computational resources are invaluable to the successful graduation of a large number of our graduate students, and yet are oversubscribed. This suggests any investment in resources will certainly be utilized.