Download Probabilistic Data Structures
If you are interested in probabilistic data structures, you might want to read my recently published book, Probabilistic Data Structures and Algorithms for Big Data Applications.
Despite the complex-sounding name, probabilistic data structures are relatively simple data structures that can be very useful for solving streaming analytics problems at scale. But before we dive into CM Sketch, it is important to understand why you'd use any probabilistic data structure. "Probabilistic data structures" is a common name for data structures based on different hashing techniques.
CM Sketch has been a Redis module for several years and was recently rewritten as part of the RedisBloom module v2.0. Probabilistic data structures make heavy use of concepts from information theory, such as hashing, to give you an approximate view of some characteristics of your data. The book Probabilistic Data Structures and Algorithms for Big Data Applications (ISBN 9783748190486) is now available at Amazon and from local bookstores; in it I have explained many such space-efficient data structures and fast algorithms that are extremely useful in modern big data applications.
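To make the idea of an "approximate view" through hashing concrete, here is a minimal Count-Min sketch in plain Python. It is only an illustrative sketch, not the RedisBloom implementation: the class name, the default `width` and `depth`, and the choice of salted BLAKE2b as the hash family are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch: `depth` rows of `width` counters each."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Assumption: one hash function per row, obtained by salting
        # BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over the
        # rows is the tightest available estimate. It never undercounts.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


sketch = CountMinSketch()
for word in ["apple", "apple", "pear", "apple"]:
    sketch.add(word)
print(sketch.estimate("apple"))  # at least 3; may overestimate on collisions
```

Memory use depends only on `width * depth`, not on how many distinct items pass through, which is why the structure suits high-volume streams.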
There are also probabilistic data structures for Java with a Redis store. These data structures allow checking membership, counts, or distinct counts efficiently (in both speed and space) at the cost of pinpoint accuracy. They are often based on hashing and have many other useful features.
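The membership case can be illustrated with a Bloom filter. The following is a minimal sketch under stated assumptions: the class name, method names, default sizes, and the salted BLAKE2b hashing are chosen for this example and do not come from any particular library.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter backed by a fixed-size bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: independent hash functions are simulated by salting
        # BLAKE2b with a small seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=seed.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not added"; True means "probably added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("alice@example.com")
print(seen.might_contain("alice@example.com"))  # True
```

The loss of "pinpoint accuracy" shows up as false positives only: an item that was added is always reported as present, while an item that was never added is occasionally reported as present too.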
Probabilistic data structures have many uses in modern web and data applications, where the data arrives in a streaming fashion and needs to be processed on the fly using limited memory. Redis backing helps in implementing the data structures over parallel/distributed environments. Probabilistic data structures and algorithms (PDSA) are a family of advanced approaches that are optimized to use fixed or sublinear memory and constant execution time.
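As one illustration of processing a stream in fixed memory, the sketch below estimates the number of distinct items with a K-minimum-values (KMV) estimator. KMV is used here only because it fits in a few lines; production systems more commonly use HyperLogLog. The function name and the default `k` are assumptions for this example.

```python
import hashlib
import heapq


def estimate_distinct(stream, k=256):
    """Estimate the distinct count of `stream` while storing at most k hashes."""
    heap = []        # negated hashes, so heap[0] is the largest one kept
    kept = set()     # the same hashes, for duplicate checks
    for item in stream:
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes seen: the count is exact,
        # barring 64-bit hash collisions.
        return len(heap)
    kth_smallest = -heap[0]
    # If hashes are uniform on [0, 2**64), the k-th smallest sits near
    # k / n of the range, which gives the estimate below.
    return (k - 1) * 2**64 // kth_smallest


events = [f"user-{i % 5000}" for i in range(20000)]
print(estimate_distinct(events))  # close to 5000, not exact
```

Memory stays bounded by `k` regardless of stream length, and the relative error shrinks roughly as one over the square root of `k`.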
Probabilistic Data Structures and Algorithms for Big Data Applications, by Andrii Gakhov. This is where probabilistic data structures come to the rescue. In the triangle of speed, space, and accuracy, probabilistic data structures sacrifice some accuracy to gain space, potentially a lot of space.
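The size of that trade can be made concrete with the standard Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, for n items and a target false-positive rate p. The helper below is a small sketch; its name and return format are assumptions for this example.

```python
import math


def bloom_filter_size(n_items, false_positive_rate):
    """Return (bits, hash_count) for a Bloom filter with the given targets."""
    bits = math.ceil(
        -n_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    hash_count = max(1, round(bits / n_items * math.log(2)))
    return bits, hash_count


bits, hash_count = bloom_filter_size(1_000_000, 0.01)
print(f"{bits / 8 / 1e6:.2f} MB, {hash_count} hash functions")
```

For one million items at a 1% false-positive rate this comes to roughly 9.6 bits per item, or about 1.2 MB in total, independent of how long the items themselves are. Storing the same items exactly would need at least the full length of every key.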