MooseFS

Software Details:
Version: 1.6.27
Upload Date: 20 Feb 15
Developer: Gemius SA
Distribution Type: Freeware

MooseFS is a fault-tolerant, network distributed file system that spreads data over several physical servers, which are visible to the user as one resource. For standard file operations MooseFS behaves like other Unix-like file systems:

 * A hierarchical structure (directory tree)
 * POSIX file attributes (permissions, last access and modification times)
 * Special files (block and character devices, pipes and sockets)
 * Symbolic links (file names pointing to target files, not necessarily on MooseFS) and hard links (different names for files which refer to the same data on MooseFS)
 * Access to the file system can be limited by IP address and/or password
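The IP/password restrictions are configured on the master server. A hypothetical excerpt from its exports file (the file name, field layout and option names here are assumptions; check your installation's documentation) might look like:

```
# allow one subnet read-write access to the whole tree, protected by a password
192.168.10.0/24    /    rw,alldirs,password=SECRET

# read-only access for everyone else
*                  /    ro
```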

Distinctive features of MooseFS are:

 * High reliability (several copies of the data can be stored across separate computers)
 * Capacity is dynamically expandable by attaching new computers/disks
 * Deleted files are retained for a configurable period of time (a file system level "trash bin")
 * Coherent snapshots of files, even while the file is being written/accessed
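The trash bin and snapshot features above are driven from a client with the MooseFS command-line tools. A hedged sketch (the paths are examples, and these commands require a mounted MooseFS):

```shell
# keep deleted files recoverable for 7 days (604800 seconds) under this tree
mfssettrashtime -r 604800 /mnt/mfs/projects

# take a coherent snapshot of a directory, even while files in it are being written
mfsmakesnapshot /mnt/mfs/projects /mnt/mfs/projects-snapshot
```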

MooseFS consists of four components:

 * Managing server (master server) – a single machine managing the whole filesystem, storing metadata for every file (information on size, attributes and file location(s), including all information about non-regular files, i.e. directories, sockets, pipes and devices).
 * Data servers (chunk servers) - any number of commodity servers storing files data and synchronizing it among themselves (if a certain file is supposed to exist in more than one copy).
 * Metadata backup server(s) (metalogger servers) - any number of servers, each of which stores the metadata changelogs and periodically downloads the main metadata file, so that it can be promoted to the role of managing server when the primary master stops working.
 * Client computers that access (mount) the files in MooseFS - any number of machines using mfsmount process to communicate with the managing server (to receive and modify file metadata) and with chunkservers (to exchange actual file data).
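On a client, attaching to the cluster is a single mfsmount invocation. A minimal sketch (the master hostname and mount point are assumptions):

```shell
# mount MooseFS at /mnt/mfs, pointing at the managing server
mfsmount /mnt/mfs -H mfsmaster

# detach like any other FUSE filesystem
fusermount -u /mnt/mfs
```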

mfsmount is based on the FUSE mechanism (Filesystem in USErspace), so MooseFS is available on every operating system with a working FUSE implementation (Linux, FreeBSD, MacOS X, etc.).

Metadata is stored in the memory of the managing server and simultaneously saved to disk (as a periodically updated binary file and immediately updated incremental logs). The main binary file as well as the logs are synchronized to the metaloggers (if present).

File data is divided into fragments (chunks) of up to 64 MiB each. Each chunk is itself a file on selected disks of the data servers (chunkservers).
High reliability is achieved by configuring as many data servers as needed to realize the "goal" value (the number of copies to keep) set for a given file.
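The mapping of a file onto chunks is plain arithmetic. A minimal sketch in Python (illustrative only, not MooseFS code):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # maximum chunk size: 64 MiB

def chunk_count(file_size: int) -> int:
    """Number of chunks needed to hold file_size bytes (ceiling division)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE

def chunk_index(offset: int) -> int:
    """Index of the chunk that contains the given byte offset."""
    return offset // CHUNK_SIZE
```

A 100 MiB file therefore occupies two chunks: one full 64 MiB chunk and one 36 MiB chunk.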

HOW THE SYSTEM WORKS

All file operations on a client computer that has mounted MooseFS are exactly the same as they would be with other file systems. The operating system kernel transfers all file operations to the FUSE module, which communicates with the mfsmount process. The mfsmount process then communicates over the network with the managing server and the data servers (chunk servers). This entire process is fully transparent to the user.


mfsmount communicates with the managing server every time an operation on file metadata is required:

 * creating files
 * deleting files
 * reading directories
 * reading and changing attributes
 * changing file sizes
 * at the start of reading or writing data
 * on any access to special files on MFSMETA

mfsmount opens a direct connection to the data server (chunk server) that stores the relevant chunk of a file. When a write finishes, mfsmount informs the managing server so that it can update the file's length and last modification time.
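The chunk placement described above can be inspected from a client. A hedged example (the path is an assumption, and the tool requires a mounted MooseFS):

```shell
# list, for every chunk of the file, which chunk servers hold a copy
mfsfileinfo /mnt/mfs/projects/data.bin
```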

Furthermore, data servers (chunk servers) communicate with each other to replicate data in order to achieve the appropriate number of copies of a file on different machines.
 
FAULT TOLERANCE
 
Administrative commands allow the system administrator to specify the "goal", or number of copies that should be maintained, on a per-directory or per-file level. Setting the goal to more than one and having more than one data server will provide fault tolerance. When the file data is stored in many copies (on more than one data server), the system is resistant to failures or temporary network outages of a single data server.

This of course does not refer to files with the "goal" set to 1, in which case the file will only exist on a single data server irrespective of how many data servers are deployed in the system.

Exceptionally important files may have their goal set to a number higher than two, which will allow these files to be resistant to a breakdown of more than one server at once.

In general, the number of copies should be set to one more than the number of servers expected to be inaccessible or out of service at any one time.
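Goals are set per file or per directory with the administrative tools. A sketch (the paths are examples; the commands run against a mounted MooseFS):

```shell
# keep two copies of everything under this directory (-r applies recursively)
mfssetgoal -r 2 /mnt/mfs/projects

# keep three copies of exceptionally important data
mfssetgoal -r 3 /mnt/mfs/projects/critical
```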

In the case where a single data server experiences a failure or disconnection from the network, the files stored on it that had at least two copies will remain accessible from another data server. Data that is now 'under its goal' will be replicated to another accessible data server to restore the required number of copies.

It should be noted that if the number of available servers is lower than the "goal" set for a given file, the required number of copies cannot be maintained. Similarly, if the number of servers equals the current goal and one data server reaches 100% of its capacity, it will be unable to hold a copy of a file that fell below its goal when another data server went offline. In either case a new data server should be connected to the system as soon as possible to restore the desired number of copies.

A new data server can be connected to the system at any time. The new capacity will immediately become available for use to store new files or to hold replicated copies of files from other data servers.

Administrative utilities exist to query the status of the files within the file system and determine whether any of them are currently below their goal (set number of copies). These utilities can also be used to alter the goal setting as required.
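For instance (file names assumed; these tools run against a mounted MooseFS):

```shell
# report how many valid copies of each chunk of a file currently exist
mfscheckfile /mnt/mfs/projects/data.bin

# show the goal currently in effect for a file or directory
mfsgetgoal /mnt/mfs/projects/data.bin
```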

The data fragments stored in the chunks are versioned, so re-connecting a data server with an older copy of the data (for example, after it had been offline for a period of time) will not cause the files to become incoherent. The data server synchronizes itself to hold the current versions of the chunks; obsolete chunks are removed and the freed space is reused for new chunks.

Failures of a client machine (one that runs the mfsmount process) have no influence on the coherence of the file system or on other clients' operations. In the worst case, data that had not yet been sent from the failed client computer may be lost.
 
PLATFORMS

MooseFS is available on every operating system with a working FUSE implementation:

 * Linux (Linux 2.6.14 and up have FUSE support included in the official kernel)
 * FreeBSD
 * OpenSolaris
 * MacOS X

The master server, metalogger server and chunkservers can also be run on Solaris, or on Windows with Cygwin. However, without FUSE it is not possible to mount the filesystem on these operating systems.

What is new in this release:

  • The most important changes include fixed signal handling in multithreaded modules, goal and trashtime limits in mfsexport.cfg, and a simple check for downloaded metadata files.

What is new in version 1.6.19:

  • Substantial changes were introduced to the metalogger machine and metarestore tool for better integrity of the metadata.
  • A scanning progress bar was added to the chunk server.
  • The master name is now resolved when a connection fails.
  • A new session is created when the previous one is lost.
  • Lots of other bug fixes and improvements were made.

What is new in version 1.6.17:

  • This release introduces automatic data cache management.
  • It is enough to upgrade only the master server (no changes were made to the chunk servers' or clients' code).
  • The kernel cache mechanism has always existed, but until now the cache was cleared whenever a file was opened. Now MooseFS decides whether to clear it by checking whether the file was modified by another client. Consider the following scenarios.
  • First scenario:
    1. Computer A reads file X.
    2. Computer B reads file X.
    3. Computer A wants to read file X again - the cache is kept (the file was not changed).
  • Second scenario:
    1. Computer A reads file X.
    2. Computer A writes to file X.
    3. Computer A wants to read file X again - the cache is kept (the file was changed, but Computer A knows about these changes).
  • Third scenario:
    1. Computer A reads file X.
    2. Computer B writes to file X.
    3. Computer A wants to read file X again - here the cache must be emptied (the changes were made by Computer B and Computer A does not know about them).
  • In real environments the first and second scenarios happen far more often than the third, which is why it is reasonable to keep the contents of the cache and gain overall system performance.
  • Of course there are some tricky scenarios (but they also existed before), like this one:
    1. Computer A opens file X and reads it completely (the file stays in the cache).
    2. Computer B modifies file X.
    3. Computer A reads file X again (without closing or reopening it - just seeking to position 0 and rereading it).
  • In this situation Computer A gets the same data as in step 1, but the same happened in MooseFS before this change.
  • A nodatacache extra attribute was also introduced, which prevents a file from being cached. From version 1.6.17 on, files with the nodatacache attribute behave as all files did in older versions of MooseFS. The flag can be managed with the mfsseteattr, mfsdeleattr and mfsgeteattr tools.
  • The flag was added as a precaution and you probably won't need it. If in time it proves to be unnecessary it will be removed, but if you find a case or scenario that requires disabling the automatic cache mechanism, please share it with us.
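Based on the tool names given in these notes, disabling the cache for a single file might look like this (the exact option syntax is an assumption; consult the man pages):

```shell
# set the nodatacache extra attribute on one file
mfsseteattr -f nodatacache /mnt/mfs/shared/live.log

# inspect the extra attributes, and later remove the flag
mfsgeteattr /mnt/mfs/shared/live.log
mfsdeleattr -f nodatacache /mnt/mfs/shared/live.log
```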

Similar Software

 * Linux NTFS (3 Jun 15)
 * SVFS (20 Feb 15)
 * Magma (3 Jun 15)
 * genromfs (3 Jun 15)
