Cloudera has been granted a patent for scalable architectures, systems, and services that allow for the creation of manifest-based snapshots in distributed computing environments. The technology enables the creation of snapshots of data objects stored in a cloud-computing platform without disrupting I/O operations. The patent also discloses a log roll approach to creating snapshots, which reduces the probability of causal inconsistency in the snapshot. The method involves accessing a snapshot manifest representing a data object and cloning a table based on the snapshot by creating a copy of table metadata and copying relevant partitions of the data object into a new directory on each associated data node. GlobalData’s report on Cloudera gives a 360-degree view of the company, including its patenting strategy.
According to GlobalData’s company profile on Cloudera, virtual data center was a key innovation area identified from patents. Cloudera's grant share as of September 2023 was 72%. Grant share is the ratio of the number of grants to the total number of patents.
Creating manifest-based snapshots in distributed computing environments
A recently granted patent (Publication Number: US11768739B2) describes a computer-implemented method for operating a distributed computing platform. The platform consists of a master node and multiple slave nodes, each with a region server associated with a data node. The method involves accessing a snapshot manifest that represents a snapshot of a data object stored in the platform. Each data node stores a partition of the data object. The method then clones a table based on the snapshot by creating a copy of the table metadata and copying relevant partitions of the data object on each associated data node into a new directory on the node. The copy of a relevant partition includes a link to the partition but not the actual data, and the link remains operational even when the relevant partitions are moved.
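The cloning step described above can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the names DataNode, SnapshotManifest, and clone_table are hypothetical, and the assumption that a link stores a partition identifier (resolved through a per-node location table) is just one way to keep links valid after a partition is moved.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical data node holding partitions and per-table directories."""
    name: str
    # partition id -> current physical path of the partition's data
    locations: dict = field(default_factory=dict)
    # directory name -> {partition id: link}; a link holds an id, not data
    directories: dict = field(default_factory=dict)

    def move_partition(self, partition_id, new_path):
        # Moving data only updates the location table; links are untouched.
        self.locations[partition_id] = new_path

    def resolve(self, directory, partition_id):
        # Follow the link stored in the directory to the current data path.
        link = self.directories[directory][partition_id]
        return self.locations[link]


@dataclass
class SnapshotManifest:
    """Hypothetical manifest: table metadata plus partitions per data node."""
    table_metadata: dict
    partitions: dict  # node name -> list of partition ids in the snapshot


def clone_table(manifest, nodes, new_table):
    """Clone a table from a snapshot without copying any partition data."""
    # Step 1: copy the table metadata.
    metadata = copy.deepcopy(manifest.table_metadata)
    metadata["name"] = new_table
    # Step 2: on each data node, create a new directory that contains
    # only links to the relevant partitions.
    for node_name, partition_ids in manifest.partitions.items():
        nodes[node_name].directories[new_table] = {
            pid: pid for pid in partition_ids
        }
    return metadata


nodes = {
    "dn1": DataNode("dn1", {"p1": "/data/orders/p1"}),
    "dn2": DataNode("dn2", {"p2": "/data/orders/p2"}),
}
manifest = SnapshotManifest(
    {"name": "orders", "columns": ["id", "total"]},
    {"dn1": ["p1"], "dn2": ["p2"]},
)
clone_meta = clone_table(manifest, nodes, "orders_clone")

# The link in the clone still resolves after the partition is moved.
nodes["dn1"].move_partition("p1", "/archive/orders/p1")
print(nodes["dn1"].resolve("orders_clone", "p1"))  # /archive/orders/p1
```

Because the clone holds only links, its cost is proportional to the number of partitions rather than to the volume of data they contain.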
The patent also includes additional claims and methods. These include combining responses from region servers to create the snapshot manifest, allowing the platform to accept input/output operations from clients while the snapshot manifest is being created, creating an archived copy of a response received from a region server before combining it with other responses, updating reference information in the snapshot to point to the location of the archived copy, rolling back or restoring the table based on the snapshot, detecting and resolving causal inconsistencies in the platform, backing up data in the platform using a MapReduce job based on the snapshot manifest, and configuring the platform not to modify a partition of the data object stored on a data node except to merge or split it.
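A rough sketch of how region server responses might be archived and then combined into a single manifest is shown below. The article does not describe the actual data layout, so RegionResponse, archive, build_manifest, and the "/archive" prefix are all hypothetical, and archiving is simulated by rewriting a path rather than copying files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionResponse:
    """Hypothetical response from one region server for one partition."""
    region_server: str
    partition_id: str
    file_path: str


def archive(response, archive_root="/archive"):
    # Simulated archiving: return a copy of the response whose path
    # points at the archived location (no real files are copied here).
    return RegionResponse(
        response.region_server,
        response.partition_id,
        archive_root + response.file_path,
    )


def build_manifest(table_name, responses):
    """Combine per-region responses into one snapshot manifest."""
    manifest = {"table": table_name, "partitions": {}}
    for response in responses:
        # Archive each response before combining it with the others,
        # then record reference information for the archived copy.
        archived = archive(response)
        manifest["partitions"][archived.partition_id] = {
            "region_server": archived.region_server,
            "path": archived.file_path,
        }
    return manifest


responses = [
    RegionResponse("rs1", "p1", "/data/orders/p1"),
    RegionResponse("rs2", "p2", "/data/orders/p2"),
]
snapshot = build_manifest("orders", responses)
print(snapshot["partitions"]["p1"]["path"])  # /archive/data/orders/p1
```

In this sketch the manifest is only a listing of partitions and their reference information, which is why building it does not need to block client reads and writes.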
The patent also describes a computer system that includes multiple slave nodes with region servers and a master node. The master node is responsible for accessing the snapshot manifest and cloning the table based on the snapshot.
Furthermore, the patent includes a non-transitory computer-readable storage medium that stores instructions for a computer system to access the snapshot manifest and clone the table based on the snapshot.
Overall, this patent presents a method and system for operating a distributed computing platform, specifically focusing on cloning tables based on a snapshot of a data object. The patent covers various aspects of the method, including combining responses, resolving inconsistencies, and backing up data. The system described includes a master node and multiple slave nodes with region servers.