Legally compliant blockchain archiving

All Together Now

Compliance in the Cloud

Revision security (compliance) is a complicated matter, and a segment of technologies and providers has grown up over the years around correspondingly certified IT infrastructures that can only be mastered with considerable investment in hardware, expertise, and operations. Conventional technologies dominate this segment, and a few monopolists with highly certified solutions dictate the conditions. Particularly painful: When scaling becomes necessary, the costs for systems, licenses, and project services do not rise gently, because of the conventional system architecture. Yet in an age of permanently exploding data volumes and throughput requirements, the need to scale is not a special case but the challenging normal state.

Under this pressure, a changing of the guard has long been underway among enterprise content management (ECM) and enterprise information management (EIM) platforms: The classic ECM systems have already lost the race, and innovative cloud and big data systems are beginning to assert themselves, raising the organization of archive data, and the possibilities it offers, to a new level thanks to NoSQL architectures. Scaling in the system core also becomes much cheaper and more flexible. With this change on the platform side, it becomes even clearer who the real cost driver is when scaling: the storage infrastructure. Switching to cloud storage systems would reduce storage costs significantly, because the great variety of suppliers in this segment creates a genuine buyer's market with extremely low prices. Scaling is unrestricted, costs grow linearly, and high availability and an always-on infrastructure are included.

Why, then, do write once, read many (WORM) storage systems continue to hold their ground in the market? Because, on the one hand, the cloud has so far lacked viable compliance concepts and, on the other hand, a cloud storage provider cannot simply be swapped out quickly. With large amounts of data, the dependence on a cloud provider is even more blatant than with the use of older WORM media. Unloading data from the cloud once it has been stored and entrusting it to another provider is simply not part of a cloud provider's business model, even though exactly this can become necessary, for example, for tax-relevant data. Technically, the customer is on their own and depends on the provider's standard API, and in this context, large-volume migrations can quickly become a nightmare in terms of run times and quality assurance.

Blockchain Archiving

Deepshore [1], a German company that develops solutions in the uncharted territory of compliance in distributed and virtual infrastructures, solves the challenges described here with blockchain technology and has thus equipped cloud storage with full compliance. The path there leads through the establishment of a private permissioned blockchain, in which hashing and timestamping functions take over the safeguarding of the data, while storage on a distributed filesystem protects the data and documents themselves from deletion. Compliance functions and storage locations remain completely separate: No raw data is managed in the blockchain itself, only technical information about the respective document to prove its integrity.
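A few lines of code illustrate this division of labor. The following sketch is not Deepshore's implementation, but shows the principle under simple assumptions: a SHA-256 hash and a UTC timestamp form the technical record that the blockchain anchors, while the document itself stays on the distributed filesystem (the storage URI shown is purely hypothetical).

# Minimal sketch of the hashing/timestamping principle: only the fingerprint
# of a document is anchored; the raw data never enters the blockchain.
import hashlib
import json
from datetime import datetime, timezone

def integrity_record(path, doc_id):
    """Build the technical metadata that a blockchain entry could hold."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha256.update(chunk)
    return {
        "doc_id": doc_id,                                 # archived object ID
        "sha256": sha256.hexdigest(),                     # integrity fingerprint
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "storage": "xtreemfs://archive/" + doc_id,        # hypothetical location
    }

print(json.dumps(integrity_record("invoice-4711.pdf", "4711"), indent=2))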

During development, care was taken not to use components marketed under commercial licenses; only open source products were considered. The core of the architecture is a microservice infrastructure that meets all the requirements outlined above without becoming too complex (Figure 1). Functions such as managing different storage classes and geolocations on the basis of the cloud infrastructures in use are completely new. Accordingly, a policy-based implementation of individual sets of rules becomes possible (e.g., domestic storage only, limits on the number of copies in certain georegions, copies on different storage systems), as the sketch following Figure 1 illustrates.

Figure 1: The blockchain-based archive implements the data service infrastructure as microservices.
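The policy-based rules mentioned above might look like the following sketch. The field names and the choose_targets() helper are illustrative assumptions, not the actual configuration format of the product.

# Illustrative storage policy: keep data in Germany, at most two copies per
# georegion, and spread copies across different storage providers.
POLICY = {
    "allowed_regions": ["eu-de"],            # domestic storage only
    "max_copies_per_region": 2,
    "require_distinct_providers": True,
}

NODES = [  # hypothetical storage nodes known to the system
    {"id": "n1", "region": "eu-de", "provider": "azure"},
    {"id": "n2", "region": "eu-de", "provider": "aws"},
    {"id": "n3", "region": "us-east", "provider": "gcp"},
]

def choose_targets(policy, nodes, copies=2):
    """Pick replication targets that satisfy the policy."""
    targets, used_providers, per_region = [], set(), {}
    for node in nodes:
        if node["region"] not in policy["allowed_regions"]:
            continue
        if per_region.get(node["region"], 0) >= policy["max_copies_per_region"]:
            continue
        if policy["require_distinct_providers"] and node["provider"] in used_providers:
            continue
        targets.append(node["id"])
        used_providers.add(node["provider"])
        per_region[node["region"]] = per_region.get(node["region"], 0) + 1
        if len(targets) == copies:
            break
    return targets

print(choose_targets(POLICY, NODES))  # -> ['n1', 'n2']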

Thanks to the complete abstraction of the physical storage layer, migrations in distributed cloud storage are easier than ever before; it is sufficient to register new nodes and let the data replicate automatically. By using these native distribution mechanisms, the system can replicate its data across multiple cloud providers without running into the legal proof problems of a classic archive migration, which is a novelty. The documentation procedure otherwise usual for a migration is omitted because, technically, the data does not leave the system at any time, even if the underlying infrastructure changes. This setup eliminates the dependence on individual cloud providers and gives the system a previously nonexistent independence. The infrastructure and each of its components do not care whether they run under Microsoft Azure, Amazon Web Services (AWS), Google Cloud, or on premises. This independence is achieved through the use of container technology and represents a completely new dimension of technical flexibility.
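One way to picture this abstraction is a thin adapter interface that every storage back end implements, so replication targets can be exchanged without the data logically leaving the system. The class and method names below are assumptions for illustration, not the system's real API.

# Sketch of a storage abstraction: the archive only ever talks to the
# StorageAdapter interface, so the back end (Azure, AWS, Google Cloud, or
# on-premises) can be swapped or replicated transparently.
from abc import ABC, abstractmethod

class StorageAdapter(ABC):
    @abstractmethod
    def put(self, key, data): ...
    @abstractmethod
    def get(self, key): ...

class LocalAdapter(StorageAdapter):
    """On-premises adapter backed by the local filesystem."""
    def __init__(self, root):
        self.root = root
    def put(self, key, data):
        with open(f"{self.root}/{key}", "wb") as f:
            f.write(data)
    def get(self, key):
        with open(f"{self.root}/{key}", "rb") as f:
            return f.read()

def replicate(key, data, adapters):
    """Write the same object to every configured back end."""
    for adapter in adapters:
        adapter.put(key, data)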

Last but not least, data does not have to pass through multiple complicated and complex enterprise application integration (EAI) processes, because applications and clients can talk directly to the new infrastructure. Of course, this applies equally to write and read processes and means, for example, that an existing data warehouse (DWH) system can potentially pull any required information from the database.

Technical Details

The system comprises five main services: Access Service, Analytics Service, Indexing Service, Verification Service, and Storage Service.

A client never talks directly to the distributed components themselves, but always to the Access Service; therefore, an external system or client does not have to worry about the asynchronous processing in the infrastructure behind it. The Access Service is arbitrarily scalable and can run on different cloud systems, coordinating requests in the distributed system complex and merging the information from the various services. A major advantage of this approach is that applications can submit their data directly to the data service, making complex EAI/enterprise service bus (ESB) scenarios superfluous. The Access Service works the same way in the other direction.

Enterprise applications of any kind can therefore use the data pool of the new service flexibly and at will. The potential infrastructure cost savings alone are likely to be significant in large enterprises and large-volume processing scenarios. The service itself was developed in JavaScript as a technical hybrid of REST and GraphQL, and content parsing of data can easily be realized as a service within the processing routine.
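What direct access by a client could look like is sketched below. The endpoint paths, field names, and GraphQL schema are assumptions made for this example; only the REST/GraphQL hybrid character is taken from the description above.

# Hypothetical client interaction with the Access Service: archive a document
# via REST, then query its metadata via GraphQL.
import requests

ACCESS_SERVICE = "https://access-service.example.com"   # assumed endpoint

def archive_document(path, doc_id):
    with open(path, "rb") as f:
        resp = requests.post(
            f"{ACCESS_SERVICE}/documents",
            files={"file": f},
            data={"docId": doc_id},
        )
    resp.raise_for_status()
    return resp.json()          # e.g., hash and blockchain transaction ID

def query_document(doc_id):
    query = """
    query($id: ID!) {
      document(id: $id) { docId sha256 timestamp storageLocations }
    }"""
    resp = requests.post(
        f"{ACCESS_SERVICE}/graphql",
        json={"query": query, "variables": {"id": doc_id}},
    )
    resp.raise_for_status()
    return resp.json()["data"]["document"]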

The Analytics Service is not an essential part of the installation. Technically, it is Apache Spark, used for the more complex computations on Indexing Service data. Any other (already existing) analytical system could be used instead, extracting information from the system by way of the Access Service. In this respect, the Analytics Service is an optional component, but one with the potential to take over tasks from the classic DWH as well.
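As a rough idea of such an optional analytics job, the following PySpark sketch aggregates archive metadata; the export path and column names are assumptions for illustration.

# Optional analytics sketch: load exported index metadata into Apache Spark
# and aggregate the archived volume per month and storage region.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("archive-analytics").getOrCreate()

# Assumption: index metadata has been exported as JSON, e.g., to an object store.
docs = spark.read.json("s3a://archive-export/index/*.json")

docs.groupBy("month", "region").sum("size_bytes").show()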

The Indexing Service is provided by a NoSQL-like database, in this case CockroachDB, although MongoDB and Cassandra were also part of the evaluation; each database has its own advantages and disadvantages. CockroachDB is a distributed key-value store that can be queried with SQL. By using the Raft consensus algorithm [2], the system offers a high degree of consistency in the processing of its transactions. A specially developed procedure can also be used to secure the state and consistency of this data within the Verification Service (more on this later).
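Because CockroachDB speaks the PostgreSQL wire protocol, such index metadata can be written and queried with standard SQL tooling. The table layout below is an assumption for illustration, not the product's actual schema.

# Sketch of storing and querying document metadata in CockroachDB via the
# PostgreSQL-compatible driver psycopg2 (default port 26257, insecure local node).
import psycopg2

conn = psycopg2.connect("postgresql://root@localhost:26257/archive?sslmode=disable")
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            doc_id    STRING PRIMARY KEY,
            sha256    STRING NOT NULL,
            archived  TIMESTAMPTZ NOT NULL DEFAULT now(),
            location  STRING
        )""")
    cur.execute(
        "INSERT INTO documents (doc_id, sha256, location) VALUES (%s, %s, %s)",
        ("4711", "9f86d08...", "xtreemfs://archive/4711"),
    )
    cur.execute("SELECT doc_id, sha256, archived FROM documents WHERE doc_id = %s",
                ("4711",))
    print(cur.fetchone())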

Technically, the Verification Service is a blockchain based on MultiChain. This tamper-proof service documents the integrity and original condition of the raw data; interventions in the Storage Service are also stored as audit trails within the blockchain.
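MultiChain organizes such entries in streams that are written via its JSON-RPC API. The following sketch publishes an integrity record to a hypothetical "audit" stream; credentials, port, and stream name are assumptions, whereas publish and liststreamkeyitems are standard MultiChain API calls.

# Sketch of anchoring a document hash and audit entry in a MultiChain stream
# through the node's JSON-RPC interface.
import json
import requests

RPC_URL = "http://localhost:8570"                     # hypothetical RPC port
AUTH = ("multichainrpc", "secret")                    # rpcuser / rpcpassword

def rpc(method, params):
    payload = {"jsonrpc": "1.0", "id": "archive", "method": method, "params": params}
    resp = requests.post(RPC_URL, auth=AUTH, data=json.dumps(payload))
    resp.raise_for_status()
    return resp.json()["result"]

# Publish the integrity record: key = document ID, data = hex-encoded JSON.
record = {"doc_id": "4711", "sha256": "9f86d08..."}
rpc("publish", ["audit", "4711", json.dumps(record).encode().hex()])

# Later, an auditor can retrieve all entries recorded for this document.
print(rpc("liststreamkeyitems", ["audit", "4711"]))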

Finally, the Storage Service persists the raw data on any storage back end. As part of cooperative research with the Zuse Institute Berlin, the XtreemFS distributed filesystem was further developed and equipped with different storage adapters and a blockchain link.
