OT: Computer File Synchronization/Backup



Paul Alciatore
12-31-2013, 09:02 PM
I am looking for a way to set up a computer file synchronization/backup system that would keep all of my files on two or three different computers in my home network, with every machine updated whenever any file is changed on any of them. Or at least the updates would be made automatically when the other machines were turned on. The different computers will be used for different purposes in different places, but I only envision one set of files for all of them. Oh, and they are running Win 7 and Win XP.

Has anybody here done anything like that and, if so, what software or system did you use?

+ or - Zero
12-31-2013, 09:10 PM
http://rsync.samba.org/

Has a Windows binary if you need it:
http://www.itefix.no/i2/cwrsync

Does everything you could ever want --and it's free.

On edit: the *Nix version is free; I'm not used to doing anything 'Doze so I just copied the link. But it's likely still a bargain --about 35 bucks, I think.

Zero.

caveBob
12-31-2013, 11:38 PM
AJC Sync might be worth checking out. (ajcsoft.com/file-sync.htm) I still have older versions of AJC Active Backup and AJC Grep that work just fine on Win7.

dp
01-01-2014, 02:12 AM
The rsync tool has a number of modes that allow backups, archiving, and synchronizing. It has absolutely no intelligence and will create complete crap and destroy your backups if you don't take time to understand it. The greatest problem is it has no way to know if a file is open, although it does sometimes know a file has changed during the time it has run. Take your email inbox, for example. Many of us have incoming filters that pick up a message and, based on some trigger (subject line, sender, whatever), put it into another folder. Now suppose you have two systems both doing this and the email is running during an rsync session. The target system will suffer file corruption in this scenario.

Rsync also has a very valuable feature - if you are syncing the same files again and again it will only copy over those parts of the source that have changed from the target. At the end of the process the source and target are identical, but perhaps only a few hundred bytes needed to be transferred to accomplish that. This is dangerous if you cannot guarantee that neither the source nor the target changes during the process.
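
To make that concrete, a minimal one-way mirror over the network looks something like this (the host name and paths are just examples):

# -a preserves times/permissions/etc., --delete makes the target an exact copy.
# On repeat runs over a network link, only the changed parts of changed files are sent.
rsync -a --delete /home/paul/documents/ shopbox:/srv/mirror/documents/

Note the trailing slash on the source - without it rsync copies the directory itself into the target rather than its contents.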

This is mirroring and is really not backup or archiving of data. The distinction: a backup is a full or synthetic copy of the original data; an archive is a full or synthetic copy of the original data from a point in time in the past.

If you need mirroring and backup/archiving then you can use rsync and a server-based tool such as Amanda to create full copies and archives to online, near-line, and offline repositories.

Snapshotting is another option but this is getting long and in truth the entire topic is full of choices and cost concerns.

If you don't care who looks at your junk you can use cloud storage services to do all this, too, and they love to do it for you.

In the middle of all this mirroring, backup, and archiving there is de-duplication, which is a very synthetic process intended to create backups and archives to a point in time, but without holding more than a single copy of any data. Data in this case can be as small as 8KB of binary data. De-dupe systems don't store data as files, but as unique blocks. The files are synthetic in that a database keeps track of which blocks are needed and what the order of the blocks is to restore a file. You can save 20X space with de-dupe when combined with LZW data compression. The data blocks are stored on extremely high-reliability, fault-tolerant hardware which is also pretty damn expensive.

rsync runs fine in Windows - the hard part is shutting down sensitive apps so files don't change during the mirroring window. This open files problem is not limited to rsync, btw. Expensive backup/archive systems know how to put applications to sleep or to write to shadow files during the backup to prevent corruption.
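
A crude way to handle the open-file problem at home is to quiesce the sensitive application around the copy; something along these lines (the service name and paths are hypothetical, just to show the shape of it):

#!/bin/sh
# Stop the app whose files must not change mid-copy, mirror, then restart it.
service dovecot stop
rsync -a --delete /var/mail/ /backup/mail/
service dovecot start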

+ or - Zero
01-01-2014, 03:38 AM
Yes, it may well be that rsync is a bit too difficult for the average 'doze user --and some *Nix users too. I've been using it for a long time without any major kerfuffles, but I use it almost entirely in house (I'm currently at around 3/4 of a PB live storage with doubled backups, so lots of spindles). But then it's easy to lock files and use crond for stuff like I'm doing --once set, it's set-and-forget unless error messages start showing up. As rsync uses file time stamps extensively it does pay to not be messing with active files, which in my application is an easy thing to control --I back up email, but never when the agent is running, for example. This does create a small window for loss, but as the failover agent is operational (but not being synced), the chance of actual data loss is very small for me in that operation.
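
For reference, the cron side of that is a single line; something like this (the script path is just a placeholder for whatever does the locking and rsync calls):

# run the nightly mirror at 03:15, while the mail agent is stopped
15 3 * * * /usr/local/bin/mirror.sh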

Having read your excellent post a couple of times I would agree that rsync just isn't for everyone... you might say I grew up with it so it seems simple to me, but that's also all *Nix systems so it all plays nice, and I don't deal with any other users --makes it even easier. I have a couple of off-site servers, but they are simple hosts that don't see any heavy use, and I can (and do) just keep a 100% copy of them so I could do a bare-metal restore rather easily if it was ever required.

I had thought maybe the paid graphical interface version might work out well for a Windows application, but I know nothing about it or how well it works --I last used anything Windows back in late 1995 and really just no longer have much of a clue what goes on on that side of things.

BTW, did you know rsync no longer requires the hacked zlib that ships with it? You can now use an unmodified zlib at compile time --this really helps various things and will soon be the default package build.

So what would you recommend for Paul?

Zero.

macona
01-01-2014, 03:43 AM
Just store all the files on a NAS with RAID.

+ or - Zero
01-01-2014, 05:11 AM
Just store all the files on a NAS with RAID.

Easy enough, OK, but it really isn't backup as such. The problem with Network Attached Storage is that it is attached to the network, so it really isn't a backup system --it's part of the system. RAID is OK, but again it's not backup; it's several possible things, but in this case I'll assume you're thinking of mirrored drives in case one dies or gets corrupted, rather than making one big fast disk out of 2 or more. This use of RAID is simply redundancy --but by its nature it's still part of the system and therefore not actually a backup. A virus will infect both RAID drives, for example, as will data corruption --one is just a mirror, so corrupt data on one and you've corrupted both.

Now NAS and RAID may match some requirements, but a backup must be a separate entity: a true copy of the original, kept so that no possible damage to the active system can in any way damage or corrupt the backup unit.

Now actual serious backup may not be the required goal here --many people are quite pleased with NAS (with or without RAID), but there are a goodly number of things that can pretty much wipe out a NAS system. And if that NAS isn't backed up... well, you know, it should have been.

Zero.

wendtmk
01-01-2014, 11:28 AM
NAS with RAID is great to the point of being able to hot-swap disks in case of a disk failure. However, a RAID controller failure pretty much wipes out all those carefully tended backups.

Mark

J Tiers
01-01-2014, 11:32 AM
The real problem with "sync" is a sort of "how and when" problem.....

"Synch" will take OFF files that exist on one if they do not exist on the "source" drive...it will *erase* them. So if you are *adding files on two machines*, and you use "synch", you may automatically delete new, wanted, useful files in some cases.

It works best when you work mostly on one machine, and only edit, but do not create new files, on the other. Then it will add the files, and it will update both to the latest version (if settings are correct).

If you create new files on both, you WILL end up with the problem of "who is the master?"..... which one decides what is deleted from the other one? You have to then remember to change modes and run in a "contribute" mode to get all the new files spread around correctly so they are not deleted.

My system is to use a pair of stick memories.... one is at home and one off-site. I use "SyncToy", set to "contribute" mode, so it adds *but never deletes* files. I put files onto the at-home stick from machine "A", and from it onto machine "B". I also put onto it from "B" and from it onto "A". This ensures a common set of data on both and NEVER deletes anything unexpectedly.
From time to time I swap the stick memories, so the at-home one is updated and then taken off-site, and the off-site one is brought home for use.

This will work with almost any number of machines, used in nearly any manner.

I also image all the machines every so often on a removable multi-terabyte drive.
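
For what it's worth, robocopy (built into Win 7, and available for XP from the Resource Kit) can do the same "add but never delete" style of copy if you don't want a separate tool; something along these lines (the drive letters and folder name are just examples):

:: /E copies all subfolders; with no /MIR or /PURGE, nothing already on the stick is ever deleted.
robocopy "C:\AllMyData" "F:\AllMyData" /E /FFT /R:2 /W:5
:: /FFT tolerates the coarser timestamps on FAT-formatted sticks; /R and /W limit retries on locked files.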

JoeBean
01-01-2014, 11:52 AM
Macona is right. The OP is clearly not looking for a backup system even though the word is used. Obviously any 2-or-more-way synchronization would mean a single bad alteration on one machine would corrupt everything everywhere unless the synchronization software also included version control.

The proper solution, the one used successfully in business all over the world every day, is to store all your data on a centralized server, be it a NAS or SAN or whatnot, then give access to those who need it. That data, if it's at all important, is regularly backed up to an off-site location. If the data is complicated and/or regularly modified, a version control system should be used. If multiple parties may be working on it together you may need collaboration software.

RAID is not a backup but it can significantly help prevent data loss if properly implemented and monitored as it effectively increases MTBF vs using single drives, thereby decreasing the likelihood of catastrophic failure between backups. Off-site backup is still critical though.

If loss of your RAID controller resulted in major data loss you may want to look into better hardware. That should never happen.

And while rsync works well for implementing the backups of the data from the storage system to the off-site backup, it's silly to have recommended it for this situation if for no other reason than the fact that you know it's 99% likely [ON EDIT: on re-read the OP says Windows in the post!] the OP has a Windows PC (sorry, that's not condescending enough, I meant 'Doze) and there are about a thousand more user-friendly and well supported options available than rsync. And you haven't even used rsync on Windows! It would be like me coming in here and saying "what's wrong with Time Machine? Did I mention I only use Macs, not Pee-Cheese?" Besides that, if you're using rsync to keep your documents synchronized between your computers like the OP asked about You're Doing It Wrong.

dp
01-01-2014, 12:21 PM
The real problem with "sync" is a sort of "how and when" problem.....

"Synch" will take OFF files that exist on one if they do not exist on the "source" drive...it will *erase* them. So if you are *adding files on two machines*, and you use "synch", you may automatically delete new, wanted, useful files in some cases.


There is directionality in the process. Sync to master means all the clients become clones of a central system to what is called "managed state". There is a tool for Windows, for example, that is used to build client systems, or image them, to a managed state. Computer training sites use this method to start each day with the lab systems in a known state. It is a destructive process in that it is a scorched earth method that comes with a policy that the clients never contain useful changeable data.

Another managed state has the policy that certain data that exists on a master trumps similar data on clients, so if the client has a file different from the master, the client file is replaced. If the client has a file the master doesn't recognize, that file is ignored or, if guided by policy, backed up to the master system with a time tag and other metadata. This is somewhat similar to what cloud storage does. The difference is that the state and policy system is central and makes the decisions for the clients.

The least useful method for the home user is true bi-directional mirroring, which technically is impossible. The goal here is to create multiple identical systems by merging all the ad hoc file systems. The only way this can work is if all the peers are turned off and the disk files are merged by an external process. This is how kiosk systems work - all the storage is on a central server and any local storage is considered temporary. One example of this is a bootable CD or flash drive, but network-booted systems also fall into this category. The boot process can include downloading the kernel and OS devices to local storage/memory, creating temporary files locally while storing new data remotely in a NAS, SAN, or entirely by 4-gen process management (a database server where all new data is the result of transactional processing).

Web server farms use uni-directional mirroring and frequently go to the next step of using virtual machines where everything is stored remotely except swap/temp files that have no global context. It is a critical and essential need for an ecommerce site to have consistency across the entire customer facing server farm, because transactions frequently happen over extended periods of time and can involve random members of the farm to complete that transaction. Out of sync errors might include tax tables or discounts on featured items and are not allowed.

Your synctoy method is a near mirror. Acceptable discrepancies exist and it is assumed there is no guarantee of data consistency between the storage devices and the concept of master is undefined. It is not considered either a backup or an archive, but only non-authoritative views of the current state of the peers. Rsync can do this so you don't have to go back and forth to the systems to keep things aligned.

Another system is called convergence and this is one of the more fascinating methods as it is entirely policy driven. Convergence assumes you have a policy in place for all the client systems, and defined actions to deal with out of policy issues. It is iterative because policies can and do create conflicts that must be resolved. It begins with a definition of a managed state and rules to maintain that state, and actions to follow to converge an out of compliance system to an acceptable state. A simple example is a policy that says business systems will have no local data storage. All office files will be placed on network drives where they can be backed up/archived by the network backup system. John Doe, new employee, is given his laptop and he immediately parks the contents of his memory stick on C:\Personal. The convergence system sees this and removes the offending data and C:\Personal. JD figgers he's done something wrong so does it again with the same result so he calls the help desk who tells him to re-read the employee manual. As a Unix nazi with 30+ years of system management behind me, I love these tools. See http://www.slideshare.net/mindbat/cfengine-vs-puppet-chef for examples of CFengine, Puppet, and Chef.
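
As a toy illustration of the convergence idea (detect drift from policy, then remediate), here is roughly what the "no local data" rule above boils down to, written as a crude shell stand-in for what CFengine/Puppet/Chef do declaratively (the path here is just the example policy):

#!/bin/sh
# Convergence sketch: find out-of-policy local data directories and remove them.
for d in /home/*/Personal; do
    [ -d "$d" ] || continue
    logger "convergence: removing out-of-policy directory $d"
    rm -rf "$d"
done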

dp
01-01-2014, 12:49 PM
And while rsync works well for implementing the backups of the data from the storage system to the off-site backup, it's silly to have recommended it for this situation if for no other reason than the fact that you know it's 99% likely [ON EDIT: on re-read the OP says Windows in the post!] the OP has a Windows PC (sorry, that's not condescending enough, I meant 'Doze) and there are about a thousand more user-friendly and well supported options available than rsync. And you haven't even used rsync on Windows! It would be like me coming in here and saying "what's wrong with Time Machine? Did I mention I only use Macs, not Pee-Cheese?" Besides that, if you're using rsync to keep your documents synchronized between your computers like the OP asked about You're Doing It Wrong.

rsync works the same in Windows as Unix, and there is nothing wrong with using rsync in this way. Using rsync between Windows and Unix systems presents some interesting problems because Windows does not treat upper/lower case the same as Unix.

dp
01-01-2014, 12:55 PM
Paul - you might want to look at a document management system. An example though not a recommendation is http://www.tortoisecvs.org/

These are versioning and revision control tools that allow you to maintain a central copy of your files, and require you to check them in and out to prevent multiple versions from getting into the wild.
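
The day-to-day cycle with CVS (what TortoiseCVS puts a GUI on) looks something like this from the command line - the repository path and module name here are made up for the example:

cvs -d /srv/cvsroot checkout documents    # get a working copy of the "documents" module
cvs update                                # pull in any changes checked in elsewhere
cvs commit -m "revised lathe drawing"     # check your edits back in as a new revision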

+ or - Zero
01-01-2014, 12:59 PM
Macona is right. The OP is clearly not looking for a backup system even though the word is used. Obviously any 2-or-more-way synchronization would mean a single bad alteration on one machine would corrupt everything everywhere unless the synchronization software also included version control.

The proper solution, the one used successfully in business all over the world every day, is to store all your data on a centralized server, be it a NAS or SAN or whatnot, then give access to those who need it. That data, if it's at all important, is regularly backed up to an off-site location. If the data is complicated and/or regularly modified, a version control system should be used. If multiple parties may be working on it together you may need collaboration software.

RAID is not a backup but it can significantly help prevent data loss if properly implemented and monitored as it effectively increases MTBF vs using single drives, thereby decreasing the likelihood of catastrophic failure between backups. Off-site backup is still critical though.

If loss of your RAID controller resulted in major data loss you may want to look into better hardware. That should never happen.

And while rsync works well for implementing the backups of the data from the storage system to the off-site backup, it's silly to have recommended it for this situation if for no other reason than the fact that you know it's 99% likely [ON EDIT: on re-read the OP says Windows in the post!] the OP has a Windows PC (sorry, that's not condescending enough, I meant 'Doze) and there are about a thousand more user-friendly and well supported options available than rsync. And you haven't even used rsync on Windows! It would be like me coming in here and saying "what's wrong with Time Machine? Did I mention I only use Macs, not Pee-Cheese?" Besides that, if you're using rsync to keep your documents synchronized between your computers like the OP asked about You're Doing It Wrong.

Thanks for your well thought out and masterful opine. I'll take it for exactly what it's worth to me.

Zero

wendtmk
01-01-2014, 01:15 PM
If loss of your RAID controller resulted in major data loss you may want to look into better hardware. That should never happen.



Sure, it "should never happen", but like anything else man-made, it does. I've seen it happen on high-end as well as low-end RAID controllers. That's why we (where I work) don't completely depend on RAID storage to ensure complete reliability for our backup systems.

The RAID controller is still a single point of failure.

Mark

millwrong
01-01-2014, 01:27 PM
OK... So if the average computer Joe (me) wants to back up the hard-drive contents on a Windows OS, what is an appropriate method? I seem to remember some software entitled "Ghost" which copied the precious family photos, etc., to another drive. Is this still a viable option? I'm reading this thread with great interest, and I'm finding it difficult because of my computer illiteracy and the need for those who are literate to excessively use acronyms. I'm diving back in....

dp
01-01-2014, 01:42 PM
Ghost is no more. http://us.norton.com/ghost

Edit: Here's a link to cloud backup services comparisons. This is probably one of the better contemporary methods for archiving and backing up data. But you can be sure they are using the power of Google to crawl your data looking for ways to improve your online experience.

http://www.top-10-online-backups.com/best-cloud-storage.php

J Tiers
01-01-2014, 01:58 PM
"RAID" is not a system

it is a NUMBER OF DIFFERENT systems. It depends entirely on how many drives you have.

We have a "nominal RAID" at work.... I call it "RAID 0.5", but it really is a "RAID 1" system.... There are several numbered levels of RAID system. Three drives, but one is the server OS only. The other two hold the data.

With two active drives, the only option is to mirror. With three, any two combine to hold all the data, so one can fail without loss. *BUT*... once one fails, you must shut down to preserve data.... a second failure, which is not unlikely since the drives are probably the same age, will lose about half your data irretrievably.

If you have 4, then you can keep operating with any single failure, IF you have 2 drives worth of data, since 2 failures still keeps data OK. Any more and you can lose data with 2 failures. 5 can tolerate 3 failures, again if there is only 2 drives of data.

other levels of RAID differ.

So a RAID system may be only partly protective, and depends on having relatively massive overkill on storage capacity relative to actual data volume (what does not?). And of course, it is indeed part of the system, and so it gets all the viruses embedded in the data as it comes in.

We have also had problems with files disappearing off our system.... they used to be there, but they are not any more. That has not been solved yet. It may be an error by some user, since often it is a whole directory that is gone.

dp
01-01-2014, 02:09 PM
There is also ZFS, which allows multiple drive failures through the use of pools. It requires that all the drives be locally attached vs NAS file systems. If money is no object then you can have mirrors of RAID 5 arrays. Pools of disks are also available as rackable hardware NAS storage, which relieves the OS of managing the storage. With pools of disks every disk has a piece of the data and also a piece of the CRC value. This allows some pretty cool options and failure recovery modes.

ZFS was invented by Sun Microsystems which is now Oracle. Apple stopped supporting ZFS when that happened, but it was fun while it lasted. I have several Sun servers and ZFS storage on aging SCSI storage. http://www.youtube.com/watch?v=QGIwg6ye1gE
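
If anyone wants to see what that looks like in practice, the basic commands are short; the device and pool names below are hypothetical:

# Create a two-way mirrored pool, carve out a dataset, and take a point-in-time snapshot.
zpool create tank mirror /dev/sdb /dev/sdc
zfs create tank/documents
zfs snapshot tank/documents@2014-01-01
zpool status tank    # health, errors, and resilver progress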

+ or - Zero
01-01-2014, 04:28 PM
There is also ZFS, which allows multiple drive failures through the use of pools. It requires that all the drives be locally attached vs NAS file systems. If money is no object then you can have mirrors of RAID 5 arrays. Pools of disks are also available as rackable hardware NAS storage, which relieves the OS of managing the storage. With pools of disks every disk has a piece of the data and also a piece of the CRC value. This allows some pretty cool options and failure recovery modes.

ZFS was invented by Sun Microsystems which is now Oracle. Apple stopped supporting ZFS when that happened, but it was fun while it lasted. I have several Sun servers and ZFS storage on aging SCSI storage. http://www.youtube.com/watch?v=QGIwg6ye1gE

There is a native Linux kernel port of the ZFS filesystem; it works well on Debian and Arch, from personal experience. But it is ZFS, so 64-bit only and 2.6.26 - 3.11 kernels --and I wouldn't trust my only data repository to it during the build and learning curve --but it is bloody impressive once up and running.

Zero.

macona
01-01-2014, 04:33 PM
I did not specify a RAID level since there are many options that would work.

RAID can be used as a backup: use mirroring, pull one of the drives occasionally and store it remotely, then stick in a new drive and let the system fill it back up.

You could do tape, but the OP really isn't looking for true backup and hard drives are incredibly cheap.

J Tiers
01-01-2014, 05:11 PM
I did not specify a RAID level since there are many options that would work.

RAID can be used as a backup: use mirroring, pull one of the drives occasionally and store it remotely, then stick in a new drive and let the system fill it back up.



That might not work with other than a 2 drive raid1. You cannot assume that two drives from different times will hold the whole set of files. In fact you can be pretty certain that they do NOT.

macona
01-01-2014, 05:17 PM
That might not work with other than a 2 drive raid1. You cannot assume that two drives from different times will hold the whole set of files. In fact you can be pretty certain that they do NOT.

As long as the new drive is equal or bigger it will work fine. I have done it.

dp
01-01-2014, 08:53 PM
As long as the new drive is equal or bigger it will work fine. I have done it.

It just has to be larger than the file system. In test/dev environments we frequently make the file system smaller than the allocated storage size because it is easier to grow the FS than the physical storage (this is less of an issue with disk pools than discrete disk arrays), and it lets us commit whole disks to projects. It is a bit of a head game to play, as it silently forces economy of usage on the developers, who tend to use all available space immediately ;)
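
Growing the file system later is a couple of commands on a typical Linux/LVM setup; the volume group and volume names here are hypothetical:

# Enlarge the logical volume by 50 GB, then grow the ext3/ext4 filesystem to fill it.
lvextend -L +50G /dev/vg0/projects
resize2fs /dev/vg0/projects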

J Tiers
01-01-2014, 09:56 PM
As actually STATED, yes, if you do "mirroring" (as I said: "That might not work with other than a 2 drive raid1"), then the drive is always equal to the other drive.

But you can't generically do it with multi-drive systems because the data is spread out over more than one drive.

OTOH, there is nothing wrong with a regular "backup" on ANY RAID system.... just have it back up the data in its entirety onto an external terabyte drive.

Paul Alciatore
01-01-2014, 11:31 PM
I am picking my way through the responses and I am on #10 now. I can see that a little more input is in order.

This is my home system which I use for a number of purposes including household business, design work, network browsing, etc. I have finally got a new laptop for my wife's use so she no longer uses my machines. I am going to need a back-up scheme for her also, as she writes poetry. I currently have three Windows machines that I use: a new one I have built, a laptop that is several years old, and a Gateway tower that is about 10 years old. I have had to go through a panic-style reconstruction several times in the past when my primary computer suffered a system failure, but never a disk failure. My purpose is to be able to continue working when such a failure occurs, so a second, working computer seems to be a must as I have had more computer failures than disk failures. But, of course, I would want to survive either.

I am the only one who would be working on the files in my end of the system, at least for now. So the only way a file would be open on two machines would be if I was stupid enough to do so. And, of course, I am that stupid so it could happen.

I am not making highly technical distinctions with words like "back-up", "mirror", or "synchronization". Basically I want the files stored in two or more computers and I want them to be available in all of those machines which will be located in different locations (office, shop, electronic shop, and perhaps more in the future).

Another detail: since I originally started this thread last night, I turned the Gateway computer off and back on again. It now refuses to start Windows (XP). My current files are presently on the Gateway so I am, once again, in a panic situation to get them out and somewhere where I can use them. AAAaaggggghhh!!!!! I can boot to a DOS command line and may be able to use that to copy them to an external drive. I have no idea how long that would take.

macona
01-01-2014, 11:42 PM
I still say a NAS is what you want; LaCie makes some nice ones.

J Tiers
01-01-2014, 11:52 PM
And I suggest the simple "contribute" system to keep the same data on two sticks and two computers..... it works quite well.

You do get a problem if you separately work on two versions of the same file on two different machines..... but that is not the fault of the backup, it is a fault of any system without a single copy on a server which every machine works from.

Paul Alciatore
01-02-2014, 01:00 AM
For the benefit of those who are not familiar with the term RAID (Redundant Array of Independent Disks), Wikipedia has a description of the various kinds of RAID here:

http://en.wikipedia.org/wiki/RAID

Some of the types of RAIDs do not have any redundancy so the term is somewhat improperly applied in those cases. But such is life.

dp
01-02-2014, 01:13 AM
1) If you get a NAS you'll also need some way to back it up and something to back it up to.
2) A NAS is an underpowered PC with a lot of storage that does only one thing.
3) A powerful PC/Server with a lot of storage can function as a powerful NAS and still be a hell of a game system or provide other useful functionality.
4) Any NAS is a place to put things but does nothing to manage or organize what is there.
5) A NAS is less intrusive if it has its own physical network.
6) If you get a NAS then disable Windows file system indexing or all the connected systems will constantly churn the NAS device.
7) All Windows systems already have built-in file sharing that will likely solve the problem Paul has.
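
On point 7, the built-in sharing is literally a couple of commands (the share name, folder, and machine name below are just placeholders):

:: On the machine that holds the files:
net share Docs=C:\Users\Paul\Documents
:: On the other machines, map it to a drive letter:
net use P: \\OFFICE-PC\Docs /persistent:yes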

Paul Alciatore
01-02-2014, 01:16 AM
I have used RAID systems before. If I understand you , I think you are saying that a RAID 1 system with two or more drives would provide identical copies on each of the drives and they would be automatically synchronized. This seems like it would do what I want.

Would it be in one of the computers or in a separate piece of equipment? Perhaps an external drive stack that is connected to one of the computers or to the network itself.




I did not specify a RAID level since there are many options that would work.

RAID can be used as a backup: use mirroring, pull one of the drives occasionally and store it remotely, then stick in a new drive and let the system fill it back up.

You could do tape, but the OP really isn't looking for true backup and hard drives are incredibly cheap.

dp
01-02-2014, 01:41 AM
A stack of drives connected to the network is a NAS. Disks in a RAID configuration are attached to one system, regardless of the RAID type. Depending on the RAID type each disk may have a complete copy or only a partial copy.

RAID 0 is a stripe, or barber pole array, and each disk has only part of the data. If any disk fails you lose all the data. The storage capacity is the sum of the devices. Write performance is improved because parallel writes are possible.

RAID 1 is a simple mirror and each disk has a full copy. Only one disk in a two-way mirror can fail without loss of data. The capacity is that of the smallest device in the mirror. Write performance is less than that of RAID 0 because both disks need to be written to before a commit is acknowledged. Read performance is better than RAID 0 because it can be read from either disk, hence latency is reduced.

RAID 1+0, or RAID 10, is a combination of stripes and mirrors such that you can lose two disks, one from each mirror. Not that Murphy would allow that. The storage capacity is the sum of the mirrors. This is the best combination of safety and performance. It is also costly, as storage capacity is 50% that of a RAID 0 array for the same number of disks.

RAID 5 is the best compromise of cost, capacity, and performance. It is a stripe across all the devices but pays a 1/N penalty for storing parity data, so the yield is (N-1)/N of raw capacity. It takes 5 terabytes of storage devices to get 4 TB of storage. You can lose one drive without losing data. Lose another drive and your data is gone. Performance takes a hit when a device is offline.

Meta disks are arrays of arrays. They are a way of increasing capacity when there are no free disks available to configure any other way. A stripe of three 5-disk RAID 5 arrays is incredibly fault tolerant and reasonably fast. You need to lose an entire RAID 5 disk group to lose data from a meta disk. The hardware that supports this kind of RAID generally has at least one hot spare for every 14 disks, so total failure is unlikely.

There are other RAID types that are seldom used. EMC uses RAID 6 in the de-duplication storage systems it sells. These have a capacity of about 20T of raw data per tray, and many times that after de-duping.
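
If you want to experiment with these levels without buying a controller, Linux software RAID shows the same trade-offs; the device names below are hypothetical:

# A two-disk RAID 1 mirror and a five-disk RAID 5 set (~4/5 of raw capacity usable).
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
mdadm --create /dev/md1 --level=5 --raid-devices=5 /dev/sd[d-h]
cat /proc/mdstat    # watch the arrays build and check their health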

macona
01-02-2014, 02:47 AM
I have used RAID systems before. If I understand you , I think you are saying that a RAID 1 system with two or more drives would provide identical copies on each of the drives and they would be automatically synchronized. This seems like it would do what I want.

Would it be in one of the computers or in a separate piece of equipment? Perhaps an external drive stack that is connected to one of the computers or to the network itself.

You can do it a couple of different ways. On my Mac mini I have an external FireWire/USB/eSATA case that holds 2 drives and they are set to RAID 1. If one drive fails it will keep going; insert a new drive and it will copy everything over. You can then set up file sharing on that machine and share that volume over the network.

or get something like this:

http://www.lacie.com/us/products/product.htm?id=10584

Holds two drives and you can set them for mirroring, get another tray for a 3rd drive to swap out. It has hot swap so you can pull the drives while it is still running. For $179 without drives I say it is pretty cheap.

Next step up is one of these:

http://www.lacie.com/us/products/product.htm?id=10604

Pretty cool unit, kind of expensive. I have one of the boards out of the insides of it: a custom dual-core Atom mini-ITX motherboard with a built-in RAID controller.

J Tiers
01-02-2014, 08:36 AM
Another fact is that a drive can be either a bootable master, or not..... or may be able to be told not to be one.

So if the systems use compatible drive types, or can be made to connect disks by an external device (for instance, I have a gizmo to attach several types of bare disk drives via USB, making them look like a stick memory), you can use one system to read data off the drive out of a different one, IF the problem is basically with the machine having the drive you need to read (bad PSU, etc), or the OS is corrupted but data is OK. If the file structure is messed up, it becomes a more difficult recovery process.

I'd advise keeping it simple.... sneaker-netting stick drives is a good, simple way to keep two or three machines with the same data. You can get 64 and 128 GB sticks, which should do fine for the data most have.

This does assume you have "data discipline". Windows does NOT have this by default.... some programs default to storing their data in their own default directory, which may be fairly deeply buried in the "program files" area of XP, for instance.

I find it MUCH better to set up my own "all my data" directory, and force all programs to put their data, and to the largest degree possible their setup files (program customization) as well, in that directory (call it whatever you like). Then you can back up just one overarching directory, and avoid the risk of forgetting one somewhere. If you restore that, you are back in business.
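
With everything under one directory, the periodic copy to an external drive is one command; the drive letters and folder name below are only examples:

:: /MIR makes the backup folder an exact copy of the data directory (it WILL delete
:: extras on the backup side, so point it only at a dedicated backup folder).
robocopy "C:\AllMyData" "E:\Backups\AllMyData" /MIR /FFT /R:2 /W:5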

A periodic disk image to a terabyte drive is good also...of course.

John Stevenson
01-02-2014, 08:44 AM
Thanks for this thread. Although we run RAID 4 drives on the server, not all files live on there.
They should do, but you get lax.

This thread gave me the chance yesterday to organise the design and drawing files better and also sign up for a cloud account.

After looking around a bit I chose Just Cloud. Later on I will organise some more files and back them up online.