Kindura

Funded by the: JISC Flexible Service Delivery programme.

Lead Institution: King's College London.

Partner Organisations: The Science & Technology Facilities Council (STFC) and DuraSpace.

Key Words: Cloud and shared services.

 

This case study was produced along with the 'Improving Organisational Efficiency' infoKit.

 

Background

 

Aims and Objectives

 

Kindura sought to pilot the use of a hybrid cloud, shared service and in-house model for providing repository-focused services to researchers across its partner institutions. These were to include services for:

 

  • Managing research outputs and information (built on top of storage services), including intelligent data placement and management, following pre-defined policies on replication, metadata extraction, conversion, etc.
  • Processing research outputs (built on top of compute services, including the ability to interface to National Grid Service (NGS) cloud services).
  • Data storage services accessed like clouds, but less elastic, so as to achieve more cost-effective archiving than a public cloud could provide.

 

We carried out our pilot using DuraCloud, which has been developed by DuraSpace. We used DuraCloud to broker between storage or compute resources supplied by external cloud services, shared services, or in-house services. The Fedora repository, which interoperates easily with DuraCloud, provided the researcher front-end for managing research outputs and information. The data management infrastructure was based on the iRODS grid-based storage system to leverage its facilities for automatic replication and server-side data processing workflows.

 

All services were built to appear “cloud-like”, even internal ones—it is thus a hybrid approach that combines the advantages of the commercial/external cloud with an institutional/consortium cloud.

 

The pilot aimed to provide cloud or cloud-like services at several levels. It provided Infrastructure as a Service (IaaS) components, via storage and compute services, but more importantly it aimed to combine these, using DuraCloud and the Fedora repository as enabling technologies, to provide an integrated Software as a Service (SaaS) package of repository-centric services.

 

Project partners

 

The Kindura project was a collaboration between the Centre for e-Research (CeRch) at King's College London, the Science and Technology Facilities Council (STFC) and DuraSpace.

 

CeRch has a focus on developing Information and Communication Technology (ICT) solutions for supporting research activities across the College, including digital repository infrastructures and other research information systems.

 

STFC provides access to world-class experimental facilities, e.g. ISIS and the Diamond Light Source. They also have extensive expertise in managing scientific research data and information, in particular using grid storage systems such as iRODS, and data replication and migration (e.g. for the Large Hadron Collider).

 

CeRch and STFC previously collaborated on the ASPiS project, which worked on using iRODS for storage and management of research data.

 

DuraSpace is a not-for-profit organisation that specialises in open source technologies in the fields of digital repositories and clouds, with a particular focus on the needs of HE/FE, research centres, libraries and cultural heritage. The organisation was created in 2009 from a merger of Fedora Commons and the DSpace Foundation. DuraSpace provided technical consultancy on this project.

 

Context

 

King’s College London is a large research and teaching institution in the centre of London. Research is carried out across a wide range of disciplines including science, humanities and medicine. The Information Systems and Services (ISS) organisation provides centralised IT services to the University's departments. Since King’s College London was formed from an amalgamation of previously autonomous institutions, there has been an ongoing process of integrating IT functions from the member organisations.

 

For the purposes of this project we identified a number of target groups of researchers who make use of large datasets and would be particularly suited to the use of cloud computing. They came from a mix of subject areas including Environmental Science, Financial Mathematics, Biomedical sciences and Humanities. Once the technical and legal issues are fully understood and overcome, cloud storage is likely to be applied to a much wider group of researchers requiring the storage of large datasets.

 
Researchers are notoriously reluctant to spend time curating and archiving their research data. We drew upon our experiences in related JISC-funded projects such as BRIL and FISH.Net to help us consider how the process of archiving research data could be automated and integrated into research workflows. 

 

The scope of the project was to work with sample datasets from individual research teams in the target disciplines to evaluate the functionality and usability of the system as well as its technical capability in dealing with large collections of data. We planned to test the system's capability to archive both individual large files and large collections of small files. Such collections would typically be in the size range of 100 GB to 5 TB. It was not within the scope of this pilot to move the system into service within a given department. On the other hand, we intended to make the system as robust as possible, using production quality software components, in order to enable a rollout in a subsequent project.

 

The business case

 

Cloud technology is increasingly being used for flexible service provision; the large number of projects focusing on cloud technology, as well as the commercial service providers, show that this is a viable business strategy. There is a clear opportunity in the academic environment to increase flexibility in data management by storing data “in the cloud.” This combines elastic commercial or academic public clouds with internal archival storage in Kindura services. In turn, a Kindura service will enable an institution to make use of its own unused storage; replication of data within Kindura itself between federated iRODS services will enable some level of dynamic service provision.


As for the cost, all commercial cloud providers publish prices, and some (e.g. Microsoft Azure) provide “calculators” to compare the cost effectiveness of their service provision with that of in-house services.


Without picking any particular provider, a typical storage rate is about £0.06 per Gigabyte (GB) per month (down from about £0.10 in 2010), which for 1 Terabyte (TB) for a year is £720. On top of that, add the same amount in transfer costs (at least).
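The arithmetic behind these figures can be checked with a short sketch. It assumes 1 TB = 1000 GB and treats transfer charges as equal to storage charges, which the text gives only as a lower bound. The function name and its parameters are illustrative and do not come from the project.

```python
def annual_cloud_cost_gbp(size_gb, rate_per_gb_month, transfer_multiplier=1.0):
    """Return (storage, transfer, total) cost in GBP for one year.

    transfer_multiplier is an assumption: the text says transfer costs are
    at least equal to the storage cost, so 1.0 is a lower bound.
    """
    storage = size_gb * rate_per_gb_month * 12
    transfer = storage * transfer_multiplier
    return storage, transfer, storage + transfer


# 1 TB (taken as 1000 GB) at the quoted rate of 0.06 GBP per GB per month.
storage, transfer, total = annual_cloud_cost_gbp(1000, 0.06)
print(f"storage £{storage:.2f}, transfer £{transfer:.2f}, total £{total:.2f}")
```

On these assumptions, storage comes to £720 for the year and the total is at least double that once transfer is included.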


Meanwhile, the cost of provisioning in-house services (at the same availability as the cloud prices quoted above) is often underestimated—it is far more than “just buying a 1 TB disk for £75.” Depending on the infrastructure already available at the institution, professionally-run services with backups can be run by IT services, or by central data centres. The staffing, power, hardware, software, maintenance, backup and facilities costs all need to be taken into account in making a realistic comparison between in-house and cloud storage.

 

The elasticity provided by public clouds is difficult to replicate using in-house services. Provisioning of in-house or data centre-hosted storage often takes place over a period of months, requiring financial, administrative and technical processes to be completed. Cloud storage can be provisioned in a matter of hours to cope with sudden increases in demand, and decommissioned if it is no longer required.  


There is thus a good business case for being able to mix and match services, and an opportunity for cloud services to complement internal institutional resources and traditional data centres, to the benefit of all researchers in the UK.

 

Key drivers

 

Your Mission/Vision—why this issue is important to your institution

 

King’s College London has grown from the amalgamation of previously autonomous institutions. As a result, the IT infrastructure is fragmented. Cloud provision and shared services provide a unified structure. Firstly, they enable existing resources spread across a number of geographic locations to be combined into a single resource with a “cloud-like” interface. This will enable more efficient use of storage and computing resources, where previously systems might have been either overloaded or underutilised. Secondly, they provide a mechanism to extend storage and computing resources flexibly to meet varying demands by migrating data to external cloud infrastructure.

 

STFC’s mission statement calls for providing resources to support UK-funded research. As researchers increasingly explore new avenues for making use of cloud-based computation, and STFC already provides data archives for many areas of research, the outcome of Kindura fits with STFC’s strategic aims.

 

Departmental and/or institutional strategies (i.e. policy issues that you considered to be relevant)


King’s College London has identified the strategic goal of enabling the preservation of research outputs, including both documents and data, through the provision of repository services. Research publications and research data represent some of the key outputs of a research institution. Existing systems and processes that were primarily designed to manage the preservation of paper-based outputs are no longer fit for purpose. Many journals now require the research data supporting publications to be retained, so that results can subsequently be verified and the data can serve as a resource for other researchers.

 

Funding bodies such as the Engineering and Physical Sciences Research Council (EPSRC) are increasingly mandating that research data is retained for a period of 10 years or more as a condition of funding. The costs of retaining data outputs beyond the lifetime of research projects need to be met by the host institution, resulting in a critical requirement for reliable and cost-effective repository storage.

 

STFC’s e-Science centre runs the datastore as well as large scale computation resources.

 

Financial considerations


King's College London is in the second phase of a £1bn redevelopment programme which is transforming its estate. The strategy runs over a period of ten years. The pilot provided an opportunity to demonstrate a model for future infrastructure that makes best use of internal and external infrastructure in order to provide repository services to researchers.

 

Within the HE sector, there is growing pressure to increase fees charged to students in order to meet the funding shortfall from central government. As a consequence, students are becoming demanding customers of higher education services. Provision of a modern IT infrastructure to support both research and teaching is a key expectation, and is likely to influence the recruitment of high quality applicants.


Public cloud storage providers, such as Amazon, provide resources which, for small volumes of data (a few gigabytes), are a cost-effective alternative to in-house storage. For larger volumes of data, or “hot” data (data which is accessed or transferred frequently), traditional data centres become more cost-effective, at the cost of losing the elasticity provided by the clouds. Enabling researchers to mix and match in-house, commercial, and data centre providers according to their needs and financial constraints will provide new flexibility in research data management. The use of third-party cloud services raises many technical, legal and governance issues that need to be addressed to make this a viable solution.

 

Technical considerations


Continued expansion of IT infrastructure at King’s College London is difficult due to the nature of the buildings (many are listed) and the space, cost and power supply constraints of the central London location. A more cost-effective solution is therefore likely to be reliance on offsite data centres and pooled resources. King’s College London has already transferred some servers to the University of London Computer Centre (ULCC), a resource that is shared with other institutions in London.


Confidentiality and security of data is a major concern, particularly when outsourcing both data storage and computing to third parties. This is particularly the case for data such as personal data, medical records, and information covered as part of Non-Disclosure Agreements (NDA).


Continuity and reliability of third party services is an issue, since there would be large costs associated with a sustained IT outage.  Existing Service Level Agreements (SLAs) provided by individual cloud resource suppliers may not guarantee the quality of service required by a Higher Education Institution (HEI).


When flexible services are provided, interoperation and standards become extremely important; standards promote interoperation, and interoperation enables flexibility and prevents “lock-in” to a single provider. Although standards exist (or are emerging), e.g. OCCI for computing and resources, and CDMI for storage, these are not yet universally supported. Kindura, therefore, chose the pragmatic approach of having a single layer which knows how to talk to different providers, and to interface it to the data storage layer via a widely implemented interface.
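The idea of a single layer that knows how to talk to different providers can be sketched as follows. This is not the DuraCloud API. The class and method names (StorageProvider, InMemoryProvider, Broker) are hypothetical, and an in-memory dictionary stands in for each real backend.

```python
from abc import ABC, abstractmethod


class StorageProvider(ABC):
    """Common interface that the brokering layer expects from every backend."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryProvider(StorageProvider):
    """Stand-in for a real backend (public cloud, iRODS, in-house storage)."""

    def __init__(self, name: str):
        self.name = name
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]  # raises KeyError if the object is absent


class Broker:
    """Single layer that hides provider differences from the repository."""

    def __init__(self, providers):
        self._providers = {p.name: p for p in providers}

    def store(self, key, data, targets):
        # Write the object to every named target provider.
        for name in targets:
            self._providers[name].put(key, data)

    def fetch(self, key):
        # Return the object from the first provider that holds it.
        for provider in self._providers.values():
            try:
                return provider.get(key)
            except KeyError:
                continue
        raise KeyError(key)


broker = Broker([InMemoryProvider("public_cloud"), InMemoryProvider("in_house")])
broker.store("dataset-1", b"example bytes", targets=["in_house"])
print(broker.fetch("dataset-1"))
```

Because callers only see the Broker, a provider can be added or swapped without changing the repository front-end, which is the protection against lock-in described above.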

 

Other factors

 

There is existing demand from researchers for the provision of cloud services to support their work. Indeed, many researchers are independently making use of cloud resources. Accounting for this usage is difficult since it is typically paid for by credit card. Case-by-case use of cloud resources by researchers does not represent the most efficient use of funding, since unit costs of cloud resources fall as the volume of data and compute resources increases. Independent use of cloud resources also makes it difficult to provide governance for usage of such resources, and may result in unacceptable data loss or in the security of personal data such as medical records being compromised.


Nonetheless, the take-up of cloud-based services by individual researchers needs to be considered. Despite the hype, clouds are not easily accessible to some researchers, who are likely to be more interested in research than in configuring and deploying IaaS. In Kindura, two development activities improve usability. First, the brokered approach to storage and repository services provides a friendly front-end, which insulates the user from having to deal directly with storage infrastructure providers. Secondly, to the extent that we can automate the data management behind the scenes (e.g. collecting files which have not been accessed for a while for archiving), usability is improved because researchers do not need to micro-manage their individual datasets.
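As an illustration of the kind of behind-the-scenes automation mentioned here, the sketch below lists files that have not been accessed for a given number of days. It is not Kindura's implementation: the function name and the idle-time threshold are assumptions, and it relies on file access times, which some filesystems do not update reliably (for example when mounted with noatime).

```python
import time
from pathlib import Path


def archive_candidates(root, max_idle_days, now=None):
    """Return files under root whose last access is older than max_idle_days.

    Uses st_atime, so the result is only as good as the filesystem's
    access-time tracking.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds in a day
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_atime < cutoff
    )


# Example: list files under the current directory untouched for 90 days.
for path in archive_candidates(".", 90):
    print(path)
```

A real service would hand the resulting list to the archiving workflow rather than print it, and would probably combine access time with the policy metadata attached to each dataset.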


Flexibility and scalability of storage and computing resources are an issue for research disciplines requiring intermittent use of high powered computing resources or the storage of very large datasets. These cannot be easily planned for or supported by existing IT infrastructure. Kindura planned to deliver the archival storage based on iRODS services. Since iRODS servers can be federated and can manage their own data policies, we planned to replicate data between service providers. This provides some level of dynamic service provision, in that the data is available as long as at least one provider that holds it is present in the infrastructure. Within Kindura, we provide automatic replication between King's College London and STFC (and the STFC provider will have backups on tape). So even for a pilot service, we are already providing good assurance of data availability.


This replication is similar to that provided by commercial cloud storage providers, with the notable exception that we know where the data is located. A commercial provider may have to make use of services elsewhere, e.g. in other countries, and cannot usually guarantee that sensitive data does not leave the country or region. Strict control over data placement is another advantage of Kindura-based services over many commercial providers.
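The availability argument for replication (data remains reachable while at least one provider holding it is present) can be made concrete with a small calculation. The sketch assumes that provider outages are independent, which real deployments may not satisfy, and the availability figures are purely illustrative rather than measured values for any Kindura partner.

```python
from math import prod


def combined_availability(availabilities):
    """Probability that at least one replica is reachable.

    Assumes independent failures: the data is unavailable only when
    every provider holding a copy is down at the same time.
    """
    return 1 - prod(1 - a for a in availabilities)


# Two replicas, each assumed to be reachable 99% of the time.
print(f"{combined_availability([0.99, 0.99]):.4%}")
```

Under these assumptions a second replica cuts the expected unavailability from 1% to 0.01%. Correlated failures, such as a shared network link, would reduce the benefit.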

 

Kindura did not expect integration with the NGS to be difficult, but knew it would require some thought. The integration would allow data held in Kindura to be migrated to NGS cloud resources for analysis. We planned to demonstrate this interaction through this project.

 

Establishing and maintaining senior management buy-in

 

At King's College London, Kindura worked closely with an initiative in ISS that was working on the specification and development of private cloud infrastructure.  The Kindura project provided a testing ground for investigating how private clouds, external clouds, grid and internal resources could be used to provide integrated storage services. In particular we considered user requirements, architecture, technology solutions, cost-benefit analysis and user evaluation. The outputs of the project were regularly fed back to the ISS leadership team.

 

STFC runs a multi-petabyte (PB) datastore, with currently 20-40 PB tape capacity and 10 PB disk. This datastore manages data for space science, the Diamond synchrotron, STFC’s instruments and facilities, other research councils, as well as the CERN Large Hadron Collider. Cost effective management and analysis of data and metadata is an essential part of STFC’s services. Kindura worked closely with target groups of researchers who have a requirement for flexible storage and computing facilities. At King’s College London, the researchers included groups in biophysics and financial mathematics. At STFC, we gathered requirements from environmental researchers working with the US-based Earth System Grid. Cloud technology and virtualisation technology are increasingly being used to provide resources for researchers, including by the NGS, although the NGS currently has no cloud storage activity. By gaining a clear understanding of the needs and concerns raised by researchers at all stages through the project and gaining buy-in of the relevant departments, we can build a strong case to senior management.

 

Technologies used

 

The Kindura project made use of DuraCloud software, an open source Java application being developed by DuraSpace. DuraCloud is being used to provide a common “cloud-like” interface to storage and computing facilities.


We made use of the iRODS storage system available at STFC and King’s College London that was developed during the JISC-funded ASPiS project to provide a pilot infrastructure. We also investigated the use of Eucalyptus for the creation of a private cloud infrastructure.


We built a policy management layer on DuraCloud that enabled us to perform a brokerage across the available storage providers.


The Fedora Commons repository was integrated with DuraCloud to provide services for the archival of research data.

 

Outcomes

 

Achievements

 

Pilot system

 

  • Provided a pilot hybrid cloud-based repository solution for research data in HE institutions addressing the requirements of researchers and funders for data preservation, access and reuse.
  • Designed and implemented a hybrid cloud storage platform including commercial cloud providers, iRODS and internal storage infrastructure. 
  • Evaluated the pilot system obtaining the feedback of researchers and archivists and documented the lessons learned.

 

Specific outputs

 

  • Developed business rules for storing, replicating and migrating content between different cloud storage providers and storage tiers to optimise the use of storage, maintain data security and integrity and minimise cost.
  • Defined a metadata schema for classifying the relevant storage attributes of data to be stored in the repository including ownership, protective marking, access requirements, provenance and content type (e.g. text, video, images).
  • Documented requirements for repository storage of research data, including issues specifically relating to use of cloud storage (security, cost, privacy, Service Level Agreements), based on interviews with researchers, archivists and cloud service providers.
  • Evaluated the use of the DuraCloud open source software, which is generating interest in US institutions and libraries, as a potential solution for use in repository systems in UK HE institutions.
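To show how business rules and storage metadata of the kind listed above might interact, here is a minimal placement sketch. It is not the project's rule set. The attribute values ("public", "restricted"), the provider names and the prices are hypothetical, and the single rule shown (restricted content stays on internal providers, cheapest eligible providers first) is only one plausible policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    internal: bool             # True for in-house or partner-run storage
    cost_per_gb_month: float   # illustrative figure, not a real price


@dataclass(frozen=True)
class ContentMetadata:
    owner: str
    protective_marking: str    # hypothetical values: "public" or "restricted"
    content_type: str
    copies: int = 2            # number of replicas requested


def place(meta, providers):
    """Choose providers for a dataset: restricted data stays internal,
    then the cheapest eligible providers are used for the replicas."""
    eligible = [
        p for p in providers
        if p.internal or meta.protective_marking == "public"
    ]
    if len(eligible) < meta.copies:
        raise ValueError("not enough eligible providers for requested copies")
    cheapest_first = sorted(eligible, key=lambda p: p.cost_per_gb_month)
    return [p.name for p in cheapest_first[:meta.copies]]


providers = [
    Provider("public_cloud", internal=False, cost_per_gb_month=0.06),
    Provider("kcl_irods", internal=True, cost_per_gb_month=0.08),
    Provider("stfc_irods", internal=True, cost_per_gb_month=0.07),
]
scan = ContentMetadata("lab-a", "restricted", "images")
print(place(scan, providers))
```

Keeping the rule as a pure function of metadata and a provider list makes it straightforward to test policies before they move any data.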

 

Cost and efficiency

 

  • Documented the cost issues of combining pay-per-use commercial infrastructure with internal storage and the extent to which researchers are prepared to manage their own storage budgets.
  • Produced a methodology for costing internal storage infrastructure to enable comparison with commercial cloud storage costs.

 

Benefits

 

Tangible

 

Cost and efficiency benefits

 

  • Expected tangible benefits:
    • Cost savings through using internal and external storage resources more efficiently.
    • Reduction of redundancy in internal storage by provisioning in-house storage as a private cloud.
  • Actual tangible benefits:
    • Kindura has demonstrated how cost savings and efficiencies can be achieved in the provision of repository storage for researchers. We have produced models for costing internal storage that enable direct comparison with commercial storage providers. We have defined business rules that optimise the storage and migration of data across storage providers and storage tiers. Savings can also be achieved through centralised purchasing and provisioning of storage with the resulting economies of scale.  These savings and efficiencies benefit the IT service departments as well as academic departments that are currently funding their own storage infrastructure.
    • The pilot has demonstrated how repository and storage services can be centralised and provisioned through a central brokerage service. Linking internal and external storage in this way enables more effective use of internal storage and the avoidance of underused storage islands within institutions.  

 

Management and governance

 

  • Expected tangible benefits:
    • Improved management of storage resources and generation of business intelligence information through the provision of a centralised service for allocation of storage resources, both in-house and external.
    • Improved governance for use of internal and external storage through centralised storage management services.
  • Actual tangible benefits:
    • The pilot enables improved governance of repository storage. The centralisation of storage and the mandatory entry of metadata enable improved monitoring of the use of storage resources and reduce the risk of data being compromised or personal data being released into the public domain. The centralised storage approach also enables more effective management of storage costs that were previously spread across departmental and project budgets.

 

New and improved capabilities

 

  • Expected tangible benefits:
    • Prototype of a cross-institutional hybrid cloud platform to enable research collaboration. 
    • The capability to provide highly elastic storage to researchers by seamlessly augmenting internal storage resources with external cloud storage.
    • User requirements for the provision of storage and computing resources to support research involving large data sets.  
    • Increased data security by providing data replication across multiple cloud providers where appropriate, and providing researchers access to backed up storage services.
    • Improved access to data preservation and curation tools that enable the rapid ingest of large volumes of data making use of elastic cloud computing resources.
    • Evaluation of DuraCloud as a platform for the provision of cloud storage, computing and preservation services.
    • Investigation and evaluation of emerging standards for cloud computing and storage.
  • Actual tangible benefits:
    • We have demonstrated improvements in preservation processes of research data and their uptake through the provision of flexible and elastic repository services for researchers. Use of commercial cloud means that deployment times can be much reduced compared to the use of internal storage that can take months to provision. Researchers benefit from repository services through the ability to easily share and reuse research data and the resulting efficiencies. Replication of data across multiple providers has increased the security of the intellectual assets of the partner institutions that were previously at risk from loss due to poor archival practices. The Kindura brokerage system allows researchers to enter metadata associated with their content that enables them to retain control of the types of storage used to store the data and ensure that this is compliant with legal, licensing and security requirements.
    • Kindura adds additional functionality to traditional repositories by providing shared computational services that can be used to rapidly transform and ingest large quantities of data. Use of elastic cloud compute services enables preservation actions to be performed on large volumes of data that would have previously been impractical. As the sizes of datasets are increasing continuously, this overcomes a major challenge for repository provision.
    • Kindura gave us the opportunity to upgrade and test iRODS as a production service. In particular, the work on the Castor tapestore interface should enable us to use it realistically as a data service. 
    • The metadata schemas used for classification of content and storage providers are outputs that can be used by other institutions and service providers implementing cloud-based repository systems.
    • The Kindura system provides a valuable resource for preservation of data from PhD student projects. Student projects may generate large amounts of data, and it may be difficult for departments to provision suitable storage to preserve their outputs. Students are likely to leave the institution after their project has ended, resulting in a potential loss of knowledge, which can be overcome using Kindura.

 

New skills

 

  • The project team has learned a great deal about the technical, security and legal issues surrounding the use of cloud storage and compute resources. We have worked with commercial cloud storage services as well as with the iRODS grid based storage system.
  • We have investigated the use of the DuraCloud software, and have gained a thorough understanding of the code, its potential use as a cloud storage interface and the interaction of DuraCloud with the Fedora repository.
  • We have investigated emerging standards relating to cloud such as OCCI and CDMI and their potential application to providing interoperable cloud services.
  • We have collected a large number of requirements and a great deal of feedback from researchers regarding their research data, their storage and preservation requirements, and their interest in and concerns about cloud storage.
  • We have found general information about cloud resources through attending cloud-related conferences and exhibitions.
  • We have collaborated closely with the DuraSpace organisation on configuration of the DuraCloud software.
  • Through internal project collaborations at King’s College London, we have learned a great deal about the legal and security issues surrounding data storage, particularly as the College already provides some existing services through the cloud.
  • The cross-institutional collaboration during the project has enabled us to share information and skills. In particular, the extensive experience and involvement of STFC in grid computing and standards has assisted greatly in understanding the potential evolution of cloud services, particularly related to research computing.
  • Our ongoing engagement with research communities within the College and at STFC has provided many new insights.

 

Intangible

 

The main anticipated benefit was that this project would contribute to improved archival practices of researchers at the partner institutions by the provision of central repository infrastructure. The project also aimed to demystify the use of commodity cloud infrastructure by providing a practical and usable solution.

 

  • The project provides experience and intelligence for other institutions embarking on the use of cloud storage infrastructure, particularly to support storage and preservation activities.
  • The project contributes to a change in culture at the partner institutions in archival practices of researchers from use of desktop PCs or portable storage devices to centrally managed, secure and resilient repository storage.
  • The project has demonstrated how commercial commodity cloud storage infrastructure can be used effectively to provide an elastic extension to existing internal storage infrastructure.

 

Drawbacks

 

The concept of providing a centralised repository and storage services has great potential benefits for improving efficiency, standardising processes and realising cost savings. Replicating and migrating data across different storage locations has the disadvantage of generating additional network traffic. Many migration operations can be performed overnight to reduce the impact on users. When moving larger datasets, it is necessary to consider how this can be scheduled to avoid causing network issues and to ensure that data is available at the required location at a specific time. As the size of datasets increases, improvements and upgrades to storage capacity should be considered in conjunction with network infrastructure upgrades. Further cost-benefit analysis may be required to determine the trade-offs between network infrastructure upgrades and storage flexibility.

 

The choice of a hybrid cloud solution is inherently more complex than a purely cloud-based model, where all storage is outsourced. The hybrid model results in additional costs in managing both internal and external resources and in determining which content may be moved to external cloud providers. Given the current state of the cloud infrastructure market, we believe that these costs are justified in the short to medium term. However, in the longer term, it may prove feasible to adopt a purely outsourced storage model.

 

Key Lessons

 

  • Cloud storage provides many potential benefits for HE institutions including rapid provisioning, replication and transparent costing models.
  • Hybrid cloud enables flexibility and reduces risk by enabling institutions and individual users to determine whether content should be stored internally or in external storage infrastructure.
  • Use of cloud infrastructure for storing valuable assets for long periods requires careful consideration and planning. In particular it is important to provide appropriate metadata with content including both descriptive metadata as well as legal and technical criteria, and make provision for data migration between storage providers. Costing models are required to make the most effective use of internal and commercial storage resources in a rapidly evolving market.
  • Pay-per-use charging models for cloud storage may ultimately impact the way in which departments pay for storage within HE institutions, with a move from top slicing to a model based more heavily on usage.
  • We developed the concept of a "limited-elasticity cloud", in which storage budgets are limited but a mechanism is provided to increase storage allocation rapidly where necessary. Building cost-effective storage services on top of existing allocations, in a way which is cloud-like for small customers, turns out to have advantages not only for the customer (who gets cheaper services) but also for the service provider (who gets easier customer management and better resource utilisation due to multi-tenancy).

 

Looking Ahead

 

Kindura was a pilot for hybrid cloud repository storage. We expect the project to be taken forward in a number of ways.

 

  • King’s College London is currently reviewing future storage requirements through the internal ODM project, to which Kindura is also contributing. One outcome of this joint work is a set of recommendations to the IT executive team. The College already uses commercial cloud storage, and some College IT services are provided via SaaS. We therefore expect use of cloud storage to increase. The recent research council policies on long-term storage of research data are likely to provide a major impetus for College investment.
  • STFC is continuing work to move iRODS and the Castor tape store into a production environment for data services. The work on costings will increase understanding of the costs of moving data and enable improved storage strategies.
  • The Centre for e-Research at King’s College London is working on the development and deployment of repositories to support specific research communities in disciplines such as freshwater biology. This work has been supported through the JISC-funded FISHNet and FISH.link projects. Repositories that support researchers in the same field across multiple institutions are ideal candidates for cloud hosting, and we anticipate that this trend will grow.
  • HEFCE and JISC recently launched the University Modernisation Fund (UMF) programme which is supporting institutions in negotiations and brokerage between commercial cloud providers as well as providing cloud-based services through JANET. The Kindura project team have already been in discussions with the UMF programme staff regarding the transfer of knowledge and technology.

 

In its current form, Kindura provides a pilot repository platform for investigating and evaluating technologies. Further steps are required to move from a pilot to a production environment. These include:

 

  • Integration of Kindura with a production repository front-end. This could be based on existing open source offerings such as Islandora or Hydra, or an in-house solution. The modular approach taken in Kindura means that institutions can plug their existing repository infrastructure into the cloud storage with a small amount of integration work.
  • To move to a production service offering, SLAs would need to be agreed with cloud providers.
  • Kindura uses a pre-production release of DuraCloud. The first production release will be available in Q4 of 2011. We expect that this will only require minor upgrades to the existing beta releases.
  • To test the system fully under load and move from pilots with small research teams to an institutional solution, we would anticipate carrying out larger-scale pilot testing with research departments as part of a staged roll-out programme.

 

Kindura was conceived as a pilot for increasing the capacity and flexibility of in-house storage using a hybrid cloud approach. Deploying Kindura as a shared service presents a number of challenges. There are outstanding technical issues with running DuraCloud and the Fedora repository in the cloud, which DuraSpace are currently addressing. Further, a common approach to classification of content and storage brokerage across institutions would be needed, as well as a pooling of SLAs, to make this feasible.

 

The Centre for e-Research at King’s College London plans to carry out further projects on cloud computing infrastructure for use in research as well as digital repositories. This includes work on future pilots and on moving Kindura technologies into production environments. The knowledge gained in Kindura therefore contributes to our ongoing programmes. The use cases built up during the Kindura project, and the community of researchers actively involved in computationally intensive and data-intensive research, are likely to be of ongoing value.

 

The Kindura project has generated a great deal of interest amongst STFC researchers and collaborators, mainly in the technology. There are opportunities for collaboration with King's College London on further development as well as with other data centres who are running iRODS.

 

Sustainability

 

The existing infrastructure will be made available as open source for other institutions to download and evaluate. Kindura is built from a set of open source components that need to be downloaded and installed. Each institution needs to integrate DuraCloud with its own storage providers. Plug-ins for the large public cloud providers (Amazon, Azure, Rackspace) are available out of the box, as is a plug-in for iRODS, which is itself available as an open source download. Integration of other storage providers may require additional development. Each institution needs to configure its own business rules for storage brokerage, depending on its specific internal policies. Sample rules developed in the project are provided as a guide.
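As a rough indication of what a business rule for storage brokerage might look like, the following sketch picks storage targets for an item from its metadata. It is not one of the project's sample rules and does not use the DuraCloud or iRODS APIs; the field names, the provider labels and the rule itself (sensitive content stays in-house, other content is replicated internally first and then externally) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    """Hypothetical metadata record used to drive placement decisions."""
    name: str
    size_gb: float
    contains_personal_data: bool  # stands in for legal criteria
    replicas_required: int


def choose_providers(item,
                     internal="internal-irods",
                     external=("public-cloud-a", "public-cloud-b")):
    """Return the list of storage targets for an item under the assumed rule."""
    if item.contains_personal_data:
        # Assumed institutional policy: sensitive content never leaves in-house storage.
        return [internal]
    # Otherwise fill replicas from the internal store first, then external providers.
    targets = [internal] + list(external)
    return targets[: max(1, item.replicas_required)]
```

For example, an item flagged as containing personal data is placed only on the internal store regardless of the number of replicas requested, while a non-sensitive item needing two replicas is placed on the internal store and the first external provider. An institution would substitute its own criteria and provider list.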

 

Summary and reflection

 

Kindura has demonstrated that a hybrid cloud is a viable solution for HE institutions wishing to increase the flexibility of their repository storage provision by integrating existing storage with commercial cloud and grid-based technologies such as iRODS.

 

Appendix

 

Project Website

 

http://kindura.cerch.kcl.ac.uk/