The vision of the Scriptome project is to help biologists
manipulate and explore their data. This page describes design principles for a
solution that is effective, easy to learn and remember, and inexpensive to
develop and maintain. It also discusses more generally the theory of equipping
non-programmers with programming tools. (The principles are even more
general if you replace "biologists" below with "scientists".) If you just want
to know how to use the Scriptome, you'll probably be more interested in the
Help or Overview pages.
- Help biologists manipulate data
Biologists may know exactly which small pieces of data they want, but
may not be able to find them in numerous, large data files.
Or they may want to switch data between two known formats, but editing
the files by hand is too time-consuming. Help biologists perform
these clearly defined data manipulations.
- Help biologists explore data
With the vast amount of raw, uninterpreted data available, sometimes the
best way to learn about data is to play with it. Help biologists explore
their data, quickly assessing their results, trying different avenues
of analysis, and finding connections between data sets.
- Present an interface appropriate for occasional, non-programmer users
Provide an interface that stresses simplicity over power, enabling
biologists to find and use the right tools. The interface should be easy to
learn - and easy to remember after a month working in the lab. (For example,
leverage biologists' familiarity with web or other computer tools, rather
than building a fancy, "intuitive" interface. If possible, do not require
them to remember command names or parameters.) Simplifying installation
and upgrade will also make the system more attractive to non-programmers.
- Solve "easy" problems
Many tools exist to perform complex bioinformatics algorithms, but there are
fewer tools available for reformatting or filtering data. Although data
manipulation may be trivial for experienced programmers, it can present a
major obstacle for non-programming scientists.
- Do a little at a time
A large program is unlikely to solve every biologist's need, even if it has
many complex parameters. Exploring data is also easier when each step is
small: intermediate results can be checked, and it is easier to retrace steps
if the biologist goes down the wrong path. Borrow the Unix philosophy of
creating many small tools, each of which performs a small task (but provide
only one or two parameters, to avoid confusion).
- Keep the biologist in the loop
Don't try to provide complete solutions for everything a biologist will want to
do. Biologists know their problems better than anyone else, so give them
tools to work on these problems by themselves. If tools are small and
flexible, biologists can string them together to solve their diverse,
- Take advantage of existing tools
Complement existing tools rather than replacing them.
For example, Bioperl has a huge amount of functionality with an accessible API.
Building on such tools will be much simpler than creating a huge set of extra
tools from scratch.
(On the other hand, use only "standard" tools, so scientists don't
need to install packages.)
- Encourage development of programming skills
Biologists who do not program should still be able to reformat or filter
their data. But many problems can realistically be solved only by using
full-fledged programming techniques, such as looping, problem decomposition,
and (process) debugging. Create a high-level and domain-specific language,
to help biologists learn these skills while (mostly) avoiding ugly details of
- Catalyze the process of learning programming
One of the most effective ways of learning a language is to read and then
tweak working code - especially when it solves problems that interest the
student. Provide scientists with short, working, documented source code.
This will help them learn programming without taking time off from their
research - all while obtaining useful results on their own data.
- Create a system that can evolve
New tools will constantly be needed to handle the diverse and changing
array of questions biologists are asking (not to mention new data formats).
Developing new tools and making them available to users must be simple and
fast. Soliciting code from the bioinformatics community will speed development;
soliciting input from the greater biology community will guide development of
The Frequently Asked Questions list describes how the particular
implementation chosen for the Scriptome meets these design principles.
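
As an illustration of the "Do a little at a time" and "Keep the biologist in the loop" principles, here is a minimal sketch of two small, single-purpose tools strung together. It is written in Python purely for illustration and is not the Scriptome's own implementation. The function names (keep_rows, pick_columns), the sample data, and the tab-delimited layout (gene name, chromosome, score) are all hypothetical. Each tool takes one or two parameters, and the intermediate result can be inspected before moving on.

```python
import csv
import io

# Hypothetical tab-delimited data: gene name, chromosome, expression score.
SAMPLE = "geneA\tchr1\t0.2\ngeneB\tchr2\t3.5\ngeneC\tchr1\t7.1\n"


def read_tab(text):
    """Parse tab-delimited text into rows (lists of strings)."""
    return csv.reader(io.StringIO(text), delimiter="\t")


def keep_rows(rows, column, minimum):
    """Small tool 1: keep rows whose numeric value in `column` is >= `minimum`."""
    for row in rows:
        if float(row[column]) >= minimum:
            yield row


def pick_columns(rows, columns):
    """Small tool 2: keep only the listed columns, in the order given."""
    for row in rows:
        yield [row[i] for i in columns]


def write_tab(rows):
    """Join rows back into tab-delimited text."""
    return "\n".join("\t".join(row) for row in rows)


def demo():
    """String the two tools together: filter by score, then keep name and score."""
    rows = read_tab(SAMPLE)
    high = keep_rows(rows, column=2, minimum=1.0)
    names_and_scores = pick_columns(high, columns=[0, 2])
    return write_tab(names_and_scores)


if __name__ == "__main__":
    print(demo())
```

Because each step is a separate tool, a user who is unsure about the filter can look at the output of keep_rows alone before choosing columns, and can retrace or swap a single step without touching the rest.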