OT: Computer File Synchronization/Backup

  • OT: Computer File Synchronization/Backup

    I am looking for a way to set up a computer file synchronization/backup system that would have all of my files on two or three different computers in my home network and they would all be updated whenever any changes were made to any file by any machine. Or at least the updates would be automatically made when the other machines were turned on. The different computers will be used for different purposes in different places, but I only envision one set of files for all of them. Oh, they are operating with Win 7 and Win XP.

    Has anybody here done anything like that and, if so, what software or system did you use?
    Last edited by Paul Alciatore; 12-31-2013, 09:07 PM.
    Paul A.
    SE Texas

    Make it fit.
    You can't win and there IS a penalty for trying!

  • #2
    http://rsync.samba.org/

    Has a Windows binary if you need it;
    http://www.itefix.no/i2/cwrsync

    Does everything you could ever want --and it's free.

    On edit: the *Nix version is free; I'm not used to doing anything 'Doze, so I just copied the link. The Windows version may cost something (about 35 bucks, I think), but it's likely still a bargain.
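    For the curious, a minimal rsync run looks something like this (local demo paths so it can be tried on one machine; put user@host: in front of either path to go over SSH):

    ```shell
    #!/bin/sh
    # Minimal rsync mirror between two directories (demo paths).
    # -a = archive mode (recursive, preserves times and permissions)
    # --delete = make the destination an exact mirror of the source
    rm -rf /tmp/rsync_demo
    mkdir -p /tmp/rsync_demo/src /tmp/rsync_demo/dst
    echo "hello" > /tmp/rsync_demo/src/notes.txt
    rsync -a --delete /tmp/rsync_demo/src/ /tmp/rsync_demo/dst/
    cat /tmp/rsync_demo/dst/notes.txt
    ```

    Note the trailing slashes: with them, rsync copies the *contents* of src into dst rather than creating a src subdirectory there.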

    Zero.
    Last edited by + or - Zero; 12-31-2013, 09:14 PM.

    Comment


    • #3
      AJC Sync might be worth checking out. (ajcsoft.com/file-sync.htm) I still have older versions of AJC Active Backup and AJC Grep that work just fine on Win7.

      Comment


      • #4
        The rsync tool has a number of modes that allow backups, archiving, and synchronizing. It has absolutely no intelligence and will create complete crap and destroy your backups if you don't take time to understand it. The greatest problem is that it has no way to know if a file is open, although it does sometimes know a file has changed while it was running. Take your email inbox, for example. Many of us have incoming filters that pick up a message and, based on some trigger (subject line, sender, whatever), put it into another folder. Now suppose you have two systems both doing this and the email client is running during an rsync session. The target system will suffer file corruption in this scenario.

        Rsync also has a very valuable feature: if you are syncing the same files again and again, it will only copy over those parts of the source that differ from the target. At the end of the process the source and target are identical, but perhaps only a few hundred bytes needed to be transferred to accomplish that. This is dangerous if you cannot guarantee that neither the source nor the target changes during the process.
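        A minimal sketch of that incremental behavior, with made-up demo paths:

        ```shell
        #!/bin/sh
        # A second rsync run re-sends only what changed, not the whole tree.
        rm -rf /tmp/delta_demo
        mkdir -p /tmp/delta_demo/src /tmp/delta_demo/dst
        echo "line1" > /tmp/delta_demo/src/log.txt
        echo "static" > /tmp/delta_demo/src/unchanged.txt
        rsync -a /tmp/delta_demo/src/ /tmp/delta_demo/dst/
        echo "line2" >> /tmp/delta_demo/src/log.txt
        # -i (itemize changes) prints only the items rsync had to update:
        # just log.txt here; unchanged.txt is skipped entirely.
        rsync -a -i /tmp/delta_demo/src/ /tmp/delta_demo/dst/
        cat /tmp/delta_demo/dst/log.txt
        ```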

        This is mirroring, and it is really not backup or archiving of data. The distinction is that a backup is a full or synthetic copy of the original data, while an archive is a full or synthetic copy of the original data from a point in time in the past.

        If you need mirroring and backup/archiving then you can use rsync and a server-based tool such as Amanda to create full copies and archives to online, near-line, and offline repositories.
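        As a rough illustration of combining a mirror with a crude point-in-time archive, rsync's own --backup-dir option can park replaced files instead of discarding them (demo paths; a real setup would date-stamp the archive directory):

        ```shell
        #!/bin/sh
        # Mirror src -> dst, but move overwritten/deleted destination files
        # into an archive directory instead of discarding them.
        rm -rf /tmp/arch_demo
        mkdir -p /tmp/arch_demo/src /tmp/arch_demo/dst
        echo "first version" > /tmp/arch_demo/src/doc.txt
        rsync -a /tmp/arch_demo/src/ /tmp/arch_demo/dst/
        echo "second, longer version" > /tmp/arch_demo/src/doc.txt
        rsync -a --delete --backup --backup-dir=/tmp/arch_demo/archive \
              /tmp/arch_demo/src/ /tmp/arch_demo/dst/
        cat /tmp/arch_demo/dst/doc.txt       # the current mirror
        cat /tmp/arch_demo/archive/doc.txt   # the prior copy, preserved
        ```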

        Snapshotting is another option but this is getting long and in truth the entire topic is full of choices and cost concerns.

        If you don't care who looks at your junk you can use cloud storage services to do all this, too, and they love to do it for you.

        In the middle of all this mirroring, backup, and archiving there is de-duplication, which is a very synthetic process intended to create backups and point-in-time archives while never storing more than a single copy of any piece of data. Data in this case can be as small as 8KB of binary data. De-dupe systems don't store data as files, but as unique blocks. The files are synthetic in that a database keeps track of which blocks are needed and in what order they must be assembled to restore a file. You can save 20X space with de-dupe when combined with LZW data compression. The data blocks are stored on extremely high-reliability, fault-tolerant hardware, which is also pretty damn expensive.

        rsync runs fine in Windows - the hard part is shutting down sensitive apps so files don't change during the mirroring window. This open files problem is not limited to rsync, btw. Expensive backup/archive systems know how to put applications to sleep or to write to shadow files during the backup to prevent corruption.
        Last edited by dp; 01-01-2014, 02:16 AM.

        Comment


        • #5
          Yes, it may well be that rsync is a bit too difficult for the average 'Doze user --and some *Nix users too. I've been using it for a long time without any major kerfuffles, but I use it almost entirely in-house (I'm currently at around 3/4 of a PB of live storage with doubled backups, so lots of spindles). It's easy to lock files and use crond for stuff like I'm doing --once set, it's forgotten unless error messages start showing up. As rsync relies heavily on file time stamps, it pays not to be messing with active files, which in my application is an easy thing to control --I back up email, but never when the agent is running, for example. This does create a small window for loss, but as the failover agent is operational (though not being synced), the chance of actual data loss is very small for me in that operation.
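          A scheduled job of that kind is usually just a crontab entry; this one is purely illustrative (the host name and paths are made up), with the run time chosen for a window when the mail agent is known to be stopped:

          ```shell
          # min hour dom mon dow  command
          15 3 * * * rsync -a --delete /home/user/mail/ backuphost:/srv/mail-mirror/
          ```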

          Having read your excellent post a couple of times, I would agree that rsync just isn't for everyone... you might say I grew up with it, so it seems simple to me, but that's also on all *Nix systems so it all plays nice, and I don't deal with any other users --which makes it even easier. I have a couple of off-site servers, but they are simple hosts that don't see any heavy use, and I can (and do) just keep a 100% copy of them so I could do a bare-metal restore rather easily if it were ever required.

          I had thought maybe the paid graphical interface version might work out well for a Windows application, but I know nothing about it or how well it works --I last used anything Windows back in late 1995 and really just no longer have much of a clue what goes on on that side of things.

          BTW, did you know they no longer require the hacked zlib that comes with rsync? You can now use an unmodified zlib at compile time --this really helps various things and will soon be the default package build.

          So what would you recommend for Paul?

          Zero.
          Last edited by + or - Zero; 01-01-2014, 03:39 AM. Reason: The typo devil made me do it

          Comment


          • #6
            Just store all the files on a NAS with RAID.

            Comment


            • #7
              Originally posted by macona View Post
              Just store all the files on a NAS with RAID.
              Easy enough, OK, but it really isn't backup as such. The problem with Network Attached Storage is that it is attached to the network, so it really isn't a backup system --it's part of the system. RAID is OK, but again it's not backup; it can mean several possible things, but in this case I'll assume you're thinking of mirrored drives in case one dies or gets corrupted, rather than making one big fast disk out of two or more. This use of RAID is simply redundancy --but by its nature it's still part of the system and therefore not actually a backup. A virus will infect both RAID drives, for example, as will data corruption --one drive is just a mirror of the other, so corrupt data on one and you've corrupted both.

              Now, NAS and RAID may match some requirements, but backups must be separate entities, one of which is a true copy of the other, arranged so that no possible damage to the active system can in any way damage or corrupt the backup unit.

              Now, actual serious backup may not be the required goal here --many people are quite pleased with NAS (with or without RAID)-- but there are a goodly number of things that can pretty much wipe out a NAS system. And if that NAS isn't backed up... well, you know, it should have been.

              Zero.

              Comment


              • #8
                NAS with RAID is great to the point of being able to hot-swap disks in case of a disk failure. However, a RAID controller failure pretty much wipes out all those carefully tended backups.

                Mark

                Comment


                • #9
                  The real problem with "sync" is a sort of "how and when" problem.....

                  "Synch" will take OFF files that exist on one if they do not exist on the "source" drive...it will *erase* them. So if you are *adding files on two machines*, and you use "synch", you may automatically delete new, wanted, useful files in some cases.

                  It works best when you work on one machine mostly, and only edit, but do not create new on the other. Then it will add the files, and it will update both to the latest version (if settings are correct).

                  If you create new files on both, you WILL end up with the problem of "who is the master?"..... which one decides what is deleted from the other? You then have to remember to change modes and run in a "contribute" mode to get all the new files spread around correctly so they are not deleted.

                  My system is to use a pair of stick memories.... one is at home and one off-site. I use "synctoy", set to "contribute" mode, so it adds *but never deletes* files. I put files on the at home stick from machine "A", and from it to machine "B". I also put onto it from "B" and from it onto "A". This ensures a common set of data on both, NEVER deletes anything unexpectedly.
                  From time to time I swap the stick memories, so the at-home one is updated and then taken off-site, and the off-site one is brought home for use.

                  This will work with almost any number of machines, used in nearly any manner.
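                  For anyone on the rsync side, the same add-but-never-delete behavior falls out of simply omitting --delete and running it in both directions (demo paths):

                  ```shell
                  #!/bin/sh
                  # "Contribute"-style merge: each side gains the other's new
                  # files, and nothing is ever deleted (no --delete anywhere).
                  rm -rf /tmp/contrib_demo
                  mkdir -p /tmp/contrib_demo/a /tmp/contrib_demo/b
                  echo "from A" > /tmp/contrib_demo/a/a.txt
                  echo "from B" > /tmp/contrib_demo/b/b.txt
                  rsync -a /tmp/contrib_demo/a/ /tmp/contrib_demo/b/
                  rsync -a /tmp/contrib_demo/b/ /tmp/contrib_demo/a/
                  ls /tmp/contrib_demo/a   # both a.txt and b.txt are now present
                  ```

                  The same caveat applies as with synctoy: nothing is ever removed, so deletions have to be done by hand on every copy.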

                  I also image all the machines every so often on a removable multi-terabyte drive.
                  Last edited by J Tiers; 01-01-2014, 11:35 AM.
                  1601

                  Keep eye on ball.
                  Hashim Khan

                  Comment


                  • #10
                    Macona is right. The OP is clearly not looking for a backup system even though the word is used. Obviously any two-or-more-way synchronization would mean a single bad alteration on one machine corrupts everything everywhere, unless the synchronization software also includes version control.

                    The proper solution, the one used successfully in business all over the world every day, is to store all your data on a centralized server, be it a NAS or SAN or whatnot, then give access to those who need it. That data, if it's at all important, is regularly backed up to an off-site location. If the data is complicated and/or regularly modified, a version control system should be used. If multiple parties may be working on it together, you may need collaboration software.

                    RAID is not a backup but it can significantly help prevent data loss if properly implemented and monitored as it effectively increases MTBF vs using single drives, thereby decreasing the likelihood of catastrophic failure between backups. Off-site backup is still critical though.

                    If loss of your RAID controller resulted in major data loss you may want to look into better hardware. That should never happen.

                    And while rsync works well for implementing the backups of the data from the storage system to the off-site backup, it's silly to have recommended it for this situation if for no other reason than the fact that you know it's 99% likely [ON EDIT: on re-read the OP says Windows in the post!] the OP has a Windows PC (sorry, that's not condescending enough, I meant 'Doze) and there are about a thousand more user-friendly and well supported options available than rsync. And you haven't even used rsync on Windows! It would be like me coming in here and saying "what's wrong with Time Machine? Did I mention I only use Macs, not Pee-Cheese?" Besides that, if you're using rsync to keep your documents synchronized between your computers like the OP asked about You're Doing It Wrong.
                    Last edited by JoeBean; 01-01-2014, 11:55 AM.

                    Comment


                    • #11
                      Originally posted by J Tiers View Post
                      The real problem with "sync" is a sort of "how and when" problem.....

                      "Synch" will take OFF files that exist on one if they do not exist on the "source" drive...it will *erase* them. So if you are *adding files on two machines*, and you use "synch", you may automatically delete new, wanted, useful files in some cases.

                      There is directionality in the process. Sync to master means all the clients become clones of a central system to what is called "managed state". There is a tool for Windows, for example, that is used to build client systems, or image them, to a managed state. Computer training sites use this method to start each day with the lab systems in a known state. It is a destructive process in that it is a scorched earth method that comes with a policy that the clients never contain useful changeable data.

                      Another managed state has the policy that certain data that exists on a master trumps similar data on clients, so if the client has a file different than the master, the client file is replaced. If the client has a file the master doesn't recognize that file is ignored, or if guided by policy, backed up to the master system with a time tag and other meta data. This is somewhat similar to what cloud storage does. The difference is the state and policy system is central and makes the decisions for the clients.

                      The least useful method for the home user is true bi-directional mirroring which technically is impossible. The goal here is to create multiple identical systems by merging all the ad hoc file systems. The only way this can work is if all the peers are turned off and the disk files are merged by an external process. This is how kiosk systems work - all the storage is on a central server and any local storage is considered temporary. One example of this is a bootable CD or flash drive, but network booted systems also fall into this category. The boot process can include downloading to local storage/memory the kernel, OS devices, and create temporary files locally while storing new data remotely in a NAS, SAN, or entirely by 4-gen process management (data base server where all new data is the result of transactional processing).

                      Web server farms use uni-directional mirroring and frequently go to the next step of using virtual machines where everything is stored remotely except swap/temp files that have no global context. It is a critical and essential need for an ecommerce site to have consistency across the entire customer facing server farm, because transactions frequently happen over extended periods of time and can involve random members of the farm to complete that transaction. Out of sync errors might include tax tables or discounts on featured items and are not allowed.

                      Your synctoy method is a near mirror. Acceptable discrepancies exist and it is assumed there is no guarantee of data consistency between the storage devices and the concept of master is undefined. It is not considered either a backup or an archive, but only non-authoritative views of the current state of the peers. Rsync can do this so you don't have to go back and forth to the systems to keep things aligned.

                      Another system is called convergence and this is one of the more fascinating methods as it is entirely policy driven. Convergence assumes you have a policy in place for all the client systems, and defined actions to deal with out of policy issues. It is iterative because policies can and do create conflicts that must be resolved. It begins with a definition of a managed state and rules to maintain that state, and actions to follow to converge an out of compliance system to an acceptable state. A simple example is a policy that says business systems will have no local data storage. All office files will be placed on network drives where they can be backed up/archived by the network backup system. John Doe, new employee, is given his laptop and he immediately parks the contents of his memory stick on C:\Personal. The convergence system sees this and removes the offending data and C:\Personal. JD figgers he's done something wrong so does it again with the same result so he calls the help desk who tells him to re-read the employee manual. As a Unix nazi with 30+ years of system management behind me, I love these tools. See http://www.slideshare.net/mindbat/cf...vs-puppet-chef for examples of CFengine, Puppet, and Chef.
                      Last edited by dp; 01-01-2014, 12:25 PM.

                      Comment


                      • #12
                        Originally posted by JoeBean View Post

                        And while rsync works well for implementing the backups of the data from the storage system to the off-site backup, it's silly to have recommended it for this situation if for no other reason than the fact that you know it's 99% likely [ON EDIT: on re-read the OP says Windows in the post!] the OP has a Windows PC (sorry, that's not condescending enough, I meant 'Doze) and there are about a thousand more user-friendly and well supported options available than rsync. And you haven't even used rsync on Windows! It would be like me coming in here and saying "what's wrong with Time Machine? Did I mention I only use Macs, not Pee-Cheese?" Besides that, if you're using rsync to keep your documents synchronized between your computers like the OP asked about You're Doing It Wrong.
                        rsync works the same in Windows as Unix, and there is nothing wrong with using rsync in this way. Using rsync between Windows and Unix systems presents some interesting problems because Windows does not treat upper/lower case the same as Unix.

                        Comment


                        • #13
                          Paul - you might want to look at a document management system. An example though not a recommendation is http://www.tortoisecvs.org/

                          These are versioning and revision control tools that allow you to maintain a central copy of your files, and requires you check them in and out to prevent multiple versions from getting into the wild.
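                          As a sketch of that check-in cycle, here it is with git, shown only because it's widely available --TortoiseCVS wraps CVS in a similar check-out/commit workflow:

                          ```shell
                          #!/bin/sh
                          # Keep a file's revision history in a repository
                          # instead of loose copies scattered across machines.
                          rm -rf /tmp/vc_demo
                          mkdir -p /tmp/vc_demo
                          cd /tmp/vc_demo
                          git init -q
                          git config user.email demo@example.com
                          git config user.name demo
                          echo "revision 1" > part.nc
                          git add part.nc
                          git commit -qm "first revision"
                          echo "revision 2" > part.nc
                          git commit -qam "second revision"
                          git log --oneline   # one line per recorded revision
                          ```

                          Any earlier revision can then be pulled back out of the repository, which is exactly the rollback safety a plain mirror can't give you.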

                          Comment


                          • #14
                            Originally posted by JoeBean View Post
                            Macona is right. The OP is clearly not looking for a backup system even though the word is used. Obviously any two-or-more-way synchronization would mean a single bad alteration on one machine corrupts everything everywhere, unless the synchronization software also includes version control.

                            The proper solution, the one used successfully in business all over the world every day, is to store all your data on a centralized server, be it a NAS or SAN or whatnot, then give access to those who need it. That data, if it's at all important, is regularly backed up to an off-site location. If the data is complicated and/or regularly modified, a version control system should be used. If multiple parties may be working on it together, you may need collaboration software.

                            RAID is not a backup but it can significantly help prevent data loss if properly implemented and monitored as it effectively increases MTBF vs using single drives, thereby decreasing the likelihood of catastrophic failure between backups. Off-site backup is still critical though.

                            If loss of your RAID controller resulted in major data loss you may want to look into better hardware. That should never happen.

                            And while rsync works well for implementing the backups of the data from the storage system to the off-site backup, it's silly to have recommended it for this situation if for no other reason than the fact that you know it's 99% likely [ON EDIT: on re-read the OP says Windows in the post!] the OP has a Windows PC (sorry, that's not condescending enough, I meant 'Doze) and there are about a thousand more user-friendly and well supported options available than rsync. And you haven't even used rsync on Windows! It would be like me coming in here and saying "what's wrong with Time Machine? Did I mention I only use Macs, not Pee-Cheese?" Besides that, if you're using rsync to keep your documents synchronized between your computers like the OP asked about You're Doing It Wrong.
                            Thanks for your well thought out and masterful opine. I'll take it for exactly what it's worth to me.

                            Zero

                            Comment


                            • #15
                              Originally posted by JoeBean View Post

                              If loss of your RAID controller resulted in major data loss you may want to look into better hardware. That should never happen.
                              Sure, it "should never happen", but like anything else man-made, it does. I've seen it happen on high-end as well as low-end RAID controllers. That's why we (where I work) don't completely depend on RAID storage to ensure complete reliability for our backup systems.

                              The RAID controller is still a single point of failure.

                              Mark

                              Comment
