GridPP DIRAC
QMUL – GridPP PMB
9th December 2014
Jeremy Coles
j.coles@rl.ac.uk
Why this talk/discussion?
• Need to formally notify our user communities about our plans to support them, our strategy, the timescales … and what level of service we can deliver.
• It is extra work to develop our DIRAC capability. We could (for now) keep using the LFC/WMS as they are still supported.
• Expectations surrounding the Imperial prototype have gone beyond early testing.
• There is a background policy question about funding services purely for non-LHC VOs. Is there going to be associated funding in GridPP5 if needed?
Thanks to several people for input including: Daniela Bauer, Duncan Rand, Simon Fayer,
Janusz Martyniak, Tom Whyntie, Andrew McNab and Ewan MacMahon
Background: Where does it fit?
By using DIRAC, our users can transparently access whatever Grid, Cloud and Vac resources we put into the system. If/when sites start migrating capacity away from CREAM/ARC, the users do not need to do anything differently.
What does it offer users above WMS/LFC?
[Screenshot: the DIRAC web portal, from “DIRAC as a Service” (ISGC 2014, Taipei), showing the Job & Pilot Monitor and Browser, Proxy Upload, the job Launchpad, Accounting, Configuration Browsing, Admin views and Certificate Authentication.]
• DIRAC seems to offer alternatives to all the services our VOs currently use in the WMS/LFC (illustrated in the sketch below).
• BUT the DIRAC services on the Imperial instance have not been tested at large scale.
• There are some “advanced” LFC features, such as the web interface, but they are not used.
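As a rough illustration of what replaces glite-wms-job-submit for users, the sketch below submits and monitors a trivial job through the DIRAC Python API. This is a minimal sketch assuming the standard DIRAC v6 client; the job name and executable are illustrative, and the exact method names (submitJob vs. submit) vary slightly between releases.

# Minimal sketch of job submission via the DIRAC Python API (illustrative only).
from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)         # initialise the DIRAC client environment

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName("gridpp-dirac-test")                   # illustrative job name
job.setExecutable("/bin/echo", arguments="Hello from GridPP DIRAC")
job.setOutputSandbox(["StdOut", "StdErr"])

dirac = Dirac()
result = dirac.submitJob(job)                      # older v6 releases use dirac.submit(job)
if result["OK"]:
    job_id = result["Value"]
    print("Submitted job %s" % job_id)
    print(dirac.status([job_id]))                  # same information as the web Job Monitor
else:
    print("Submission failed: %s" % result["Message"])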
Where are we with usage (=risk)?
1) NA62 - users J. Martyniak (IC) and Dan Protopopescu (Glasgow) - to be used for MC production; a job submission portal has been created and can be used in parallel with an old WMS-based portal.
2) COMET - a user from IC, interested only in data movement and the catalogue.
3) landslides - active in the past; the VO is now in hibernation.
4) t2k.org - testing. They use the LFC as the catalogue.
5) snoplus - testing - M. Mottram.
6) londongrid - mainly for testing. Two people from the proteomics group at QMUL wish to try DIRAC and are using londongrid.
7) pheno - the theory group from Durham; this is enabled, but they have yet to submit a job.
8) cernatschool - production - great work from Tom!
9) northgrid - enabled.
10) gridpp - enabled (running Andrew McNab's test jobs).
What do we have at Imperial?
• Running DIRAC v6r11p24 (the latest r11 release) – software as stable as DIRAC gets.
• There is a large amount of development work in progress and no clear upgrade path from this current version to any future versions. This implies that any future upgrades will require downtime while the services are upgraded and tested.
• Current setup: a VM with 2GB of RAM and 50GB of disk space on the production VM cluster; this is currently about 80% utilised.
• We will shortly have a pair of brand new physical servers for hosting an upgraded version of the service. These are Dell PowerEdge R420 nodes with 12-core CPUs and 96GB of RAM each; there are two pairs of 10K SAS disks for the storage.
• We will host in a standard clustered configuration with redundant pairs of DIRAC services where possible. Some of the services (such as the database) will probably only run on one node, but have the option to transparently move between the physical nodes in the event of a failure. We believe this is the most reliable configuration possible given the constraints of the DIRAC software.
Development – current, planned and issues
• We are primarily adapting things for "multi-VO" usage of DIRAC. The configuration was designed for a single closed VO, so the IC team have been adding code for automatic discovery of multiple VOs (such as automatically finding sites and users) – see the sketch after this list.
• Working with the upstream developers has not always been smooth: they seem keen to implement things themselves but do not necessarily meet our timescales. We suffer from there being no MoU.
• Due to the lack of test code, it is common for us to find new bugs when moving to new versions with the features we have requested. IC can develop these changes, but they thought that going via the developers would be politically advantageous; so far this has only delayed the work.
• There is no stable upgrade path. v7 may solve many functionality needs but it is a long way from being production ready.
• Once the new hardware is in place, communities will need to start moving to the DIRAC File Catalogue (depending on the LFC's future).
• Once the service is ready, we will require users to re-install their UIs to point at the new service; this will be a significant changeover point. This should be done by January 2015, and it may be ready in December for initial testing (working on this now).
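For context, the multi-VO discovery described in the first bullet can be pictured along the lines of the sketch below, which queries the configuration Registry and Resources helpers for the VOs, users and sites known to the service. This is a hedged illustration assuming the mainline DIRAC v6 helper functions (getVOs, getUsersInVO, getSites); it is not the IC team's actual code.

# Hedged sketch: enumerate the VOs, users and sites a multi-VO DIRAC instance knows about.
from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)           # initialise the DIRAC client environment

from DIRAC.ConfigurationSystem.Client.Helpers import Registry
from DIRAC.ConfigurationSystem.Client.Helpers.Resources import getSites

vo_result = Registry.getVOs()                        # VOs defined in the configuration Registry
if vo_result["OK"]:
    for vo in vo_result["Value"]:
        users = Registry.getUsersInVO(vo)            # usernames registered for this VO
        print("%s: %d registered users" % (vo, len(users)))

site_result = getSites()                             # sites defined under /Resources/Sites
if site_result["OK"]:
    print("Known sites: %s" % ", ".join(site_result["Value"]))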
Other considerations
• A low risk alternative: the WMS is just Condor, GridSite and some C++ daemons, and the LFC is built with the same libraries as DPM. So we could opt to maintain these!
• We could direct our other/smaller VOs to the EGI instance of DIRAC hosted by France.
• Is GridPP putting any funding behind a DIRAC service? Running national services ought to be recognised somehow by the collaboration as additional work.
But:
• A view held by many is that the future of the WMS and LFC is limited – they are supported for now, but it is better to build/find an alternative now.
• DIRAC is currently the best alternative to the WMS, and the catalogue is improving. We would probably have to build something similar if we do not go with it!
• We host the WMS and LFC in the UK for pretty much the same set of other VOs, and we have seen other services (e.g. the WMS) withdrawn by other providers.
• DIRAC is not completely unknown: it has already seen usage at scale by LHCb, Belle, ILC, BES and Biomed.
• LHCb's DIRAC work includes input from Raja, Andrew, and Rob at Liverpool; and the Imperial group has a longstanding involvement via Ganga and directly as part of LHCb. So the UK has expertise already.
• The stability issues tend to come with new features (as they did with the grid). There are workarounds.
• DIRAC may improve take-up of the “other 10%” of resources such that new policies on access are needed!
• There is potential DIRAC Horizon 2020 effort: Imperial (leading) and Glasgow are in the NA62 part, and Manchester is leading the LHCb part. STFC is mainly participating on the space science side, but Linda is leading part of the core security work in the project.
User perspectives
• Andrew has reported finding good overall stability.
• Tom has been quite vocal in favour of what DIRAC provides for CERN@school. Landslides were keen too.
• Using software deployed via the RAL CVMFS instance, C@S can and have run full GEANT4 simulations of the satellite-based LUCID experiment and Timepix detectors. They use the DIRAC File Catalogue (DFC) to manage Logical File Names (LFNs) for their data.
• Tom has tested and implemented experimental metadata (e.g. start times, pixel hits, etc.) functionality for CERN@school datasets using DIRAC’s Python API (see the sketch after this list). Such functionality was either unavailable or not documented for small VOs; up until now most users have used local SQL servers (e.g. NA62) and custom scripting to manage their data.
• There is some good documentation, but there was nothing with a full-chain example of data upload, processing, management and analysis with DIRAC. Tom has produced some.
• Previously the Grid allowed too much flexibility for people with little time. DIRAC solves this problem by providing all of the necessary functionality – job submission, data and replica management, and metadata capabilities – in one framework.
• The fact that the code is open source and available on GitHub bodes well for software maintainability.
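The metadata workflow mentioned above can be pictured roughly as in the sketch below, which uses DIRAC's FileCatalogClient to declare a metadata field, tag a dataset directory and query the DFC. It is a hedged illustration rather than Tom's actual code: the LFN path, field name and values are invented for the example.

# Hedged sketch of DFC metadata handling via the DIRAC Python API (illustrative paths/fields).
from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)              # initialise the DIRAC client environment

from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

fc = FileCatalogClient()

# One-off: declare a searchable directory-level metadata field.
fc.addMetadataField("start_time", "INT")

# Tag a dataset directory that is already registered in the DFC (illustrative LFN).
lfn_dir = "/cernatschool.org/data/run0001"
fc.setMetadata(lfn_dir, {"start_time": 1418083200})

# Query the catalogue instead of a local SQL server or custom scripts.
result = fc.findFilesByMetadata({"start_time": {">": 1417392000}}, path="/cernatschool.org")
if result["OK"]:
    for lfn in result["Value"]:
        print(lfn)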
To start discussion: A proposal
January 2015:
• Engage DIRAC developers in discussion about code updates (MoU?)
• Complete the move of the current prototype service to physical hardware.
• Ensure regular test job stability.
• Review and update documentation.
• Migrate CERN@School to the new setup (they already use it in production).
February 2015:
• Move the proteomics work, T2K, SNO+ and NA62 over to scale-test DIRAC at levels beyond current WMS/LFC usage.
• Gather user feedback and undertake a gap analysis.
• Review DIRAC hosting from a resilience perspective (e.g. out-of-hours cover).
• Explore options for catalogue migration (if needed).
March 2015:
• Review outstanding development issues, including upgrade paths and the suitability of external hosting.
• Install the latest available release (assume at Imperial given the expertise).
• 4 weeks of intensive testing with VOs.
• Carry out a readiness review with the 4 largest VOs.
• If appropriate, start the decommissioning discussion for the WMS.
April 2015:
• Notify VOs and the community of formal switchover timelines.
May 2015:
• If the readiness reviews are passed then DIRAC is declared a production service.