Rsync incremental without hard-link of unchanged files

NChewie

Rsync incremental without hard-link of unchanged files

Post by NChewie »

Hi,

I currently back up using MakeDirDiff, through a script that I hacked together. It compares the content of my data directories with the previous full backup directory and produces a new directory structure containing only the incrementals, under a top-level directory named with the date and time of the backup. As it runs, the full-backup directory is updated to reflect the newest changed files (nothing is ever deleted), while each incremental directory structure holds only the changed files. Each incremental is therefore small enough to encrypt and back up to cloud storage or burn to CDs as needed, along with a monthly copy of the latest full-backup directory.

I am interested in replacing this with an rsync script. There is a very good tutorial showing an example at:
http://webgnuru.com/linux/rsync_incremental.php
It compares a source directory (my working directory) with a target directory (initially empty and built up each day as an incremental) and a link-destination directory (the previous day's incremental backup), such that anything newer in the source than in the link-dir is written to the new target.
However, my issue is that this design does not isolate the changed files into a separate directory tree. The description in that tutorial states that:
"Additionally, unchanged files are hard linked from the link destination directory to the target directory."
This implies that each incremental directory structure contains the changed files plus a hard link to every unchanged file. That leaves each incremental directory looking like the equivalent of my previous full backup, i.e. with the full file list from the master directory, even if the hard links are really only pointers to the same physical file space as the original files. When it comes to copying this structure to permanent storage (e.g. CD), is there a way to select only the genuinely changed files, i.e. those without hard links?
Or, as a better alternative, is there a flag/option to run rsync in incremental mode such that it still compares against the link-dir content but writes only the changed files to the target-dir? This could be an extra step: one pass writes an incremental-only directory, and a second pass (with slightly different options) produces the changes-plus-hard-links directory to be used as the link-dir on the following day.

Are there any rsync gurus who know how to do this?

[Mods - sorry... I think this should be in Scripts & Bash :oops: ]
mintybits

Re: Rsync incremental without hard-link of unchanged files

Post by mintybits »

Could you use diff to identify the changed files and filepaths and feed that list into rsync?
NChewie

Re: Rsync incremental without hard-link of unchanged files

Post by NChewie »

@mintybits

I think I have misinterpreted the proper use of rsync...
it really is a remote synchronisation tool, rather than a tool for creating incrementals. (The clue is in the name :oops: )
rsync appears to be aimed more at enterprise-level setups with large storage capacity, where the desire is to have fully synchronised file structures between the source machine and a remote backup server, with highly efficient mechanisms for updating the files on the backup without having to transmit each full file across the link.

The MakeDirDiff script that I currently use (manually) is designed to do exactly what I need: create a (local or remote) file structure made up of only the differences between the last full backup and the current working directory. This keeps the backup structure small when only a few files have changed between runs. My time-stamped incremental directory structures contain only the files that have changed since the script last ran. As long as I have a secure copy of the original master directory, I can track each change to each file, as I have a chronological copy of each changed file.

So instead of trying to re-invent the wheel, I think I'll just add a cron job to run my current script at a regular time.
BlackVeils

Rsync incremental without hard-link of unchanged files

Post by BlackVeils »

fair enough if you want to not switch to rsync, but you might like to know that rsync can be used for lots of things. it does do incremental backups (by default I think), I use it for my backups to a USB flash drive.
NChewie

Re: Rsync incremental without hard-link of unchanged files

Post by NChewie »

@BlackVeils

I agree that there probably is a way to do what I want with rsync, but I just don't have the knowledge/experience with it to produce the desired result.
My testing kept creating incremental backup directories which contained
(a) copies of any file changed since the last run [as desired]
(b) hard-linked copies of all the unchanged files [not wanted]

(b) was unwanted, because I wished to back up that incremental directory to external storage by encrypting it and writing it to CD and to cloud storage, but my copies kept 'following the hardlinks' and backing up both the changed files and the unchanged files.
In my testing, rsync took about 15 minutes to run, compared to about 30 seconds for the equivalent makedirdiff run :?
This would be because I was scanning/relinking about 8 GB of files.

I suppose I could run rsync and then try a copy command which skips hardlink files and only takes 'regular files', something like:

Code:

find dir1 -maxdepth 1 -type f | xargs -I {} cp {} dir2
(lifted without testing from http://unix.stackexchange.com/questions ... other?rq=1)

to end up with only the changed files in dir2...

Cheers,
C.
BlackVeils

Re: Rsync incremental without hard-link of unchanged files

Post by BlackVeils »

there are various options for rsync, I'm sure there is one for not including hardlinks. and you don't need to actually test the file copying, you can run it in test mode, a dry-run.

manual:

Code:

man rsync
primolarry

Re: Rsync incremental without hard-link of unchanged files

Post by primolarry »

Hi there,

In case you are still interested, I coded a script some months ago to do something similar:

https://github.com/alvaroreig/varios/bl ... _remote.sh

It performs a complete backup every N backups, the rest are incremental. I use it with anacron in all my computers/servers.

Regards,
NChewie

Re: Rsync incremental without hard-link of unchanged files

Post by NChewie »

Thanks primolarry

Sorry it took so long to look at it... work got in the way.

Your example shows rsync used properly, to remotely sync. My earlier fumbled attempts were trying to use it on a single machine to make a local list of incrementals. I think makedirdiff is more suitable for what I want.

Your code is interesting, I may learn a few tricks from it :)
One thing... you stop Apache(2) whether or not it is a full backup, but only log it as stopped for a full backup.

Cheers,
NC