Deduplicating archiver with compression and authenticated encryption.
Updated Jul 23, 2024 - Python
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining different data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
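As a rough illustration of the idea, the sketch below links records from two sources that share no common identifier by normalizing names and comparing them with the standard library's `difflib.SequenceMatcher`. The function names, the sample records, and the 0.85 threshold are all assumptions made for this example; the tools listed on this page use their own, more sophisticated methods.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left: dict, right: dict, threshold: float = 0.85) -> list:
    """Pair each left record with its best-scoring right record.

    Only pairs whose score reaches the (assumed) threshold are kept.
    """
    links = []
    for left_id, left_name in left.items():
        best_id, best_score = None, 0.0
        for right_id, right_name in right.items():
            score = similarity(left_name, right_name)
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            links.append((left_id, best_id, round(best_score, 2)))
    return links


# Hypothetical records from two systems with unrelated keys.
crm = {"c1": "Acme Corp.", "c2": "Globex Corporation"}
billing = {"b7": "ACME corp", "b9": "Initech LLC"}

links = link_records(crm, billing)
print(links)
```

Comparing every pair is quadratic in the number of records, so this only suits small inputs; larger data sets usually need a blocking or hashing step first to limit the candidate pairs.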
Simple, configuration-driven backup software for servers and workstations
A powerful and modular toolkit for record linkage and duplicate detection in Python
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
Fast block-level out-of-band BTRFS deduplication tool.
CLI utility to find near duplicate images and remove all but the best copy.
A secure and efficient file backup solution that fits both system administrators (CLI) and end users (GUI)
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.
Record Linkage ToolKit (Find and link entities)
Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation.
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
Image duplicate detection and image deduplication: find/delete duplicated images
Dedupe/batch geocode addresses and venues around the world with libpostal
Python package for deduplication/entity resolution using active learning
The Dropbox for IPFS (without the icky stuff)
Created by Halbert L. Dunn
Released 1946