Moving to the cloud is, in essence, like moving to a new house. Depending on how much inventory you have and how fast you need to move it all, you decide which moving company works best for you.
Let’s dive into what kind of “companies” you can leverage when you move to the cloud.
Move to the cloud
Of course, there are several ways to move that data to the cloud, such as Robocopy or rsync, depending on whether SMB or NFS is used, to name a few. For a Lift & Shift, particularly a file storage migration, I would like to focus on two free solutions that NetApp offers its customers in combination with Azure NetApp Files: XCP and Cloud Sync.
In my opinion, XCP and Cloud Sync are the two most popular ones. In this blog I want to compare the two, look at their main differences, and see which use cases work best for which tool.
Let’s start with XCP.
NetApp XCP is client software that enables fast and reliable any-to-NetApp and NetApp-to-NetApp data migrations. In addition, XCP file analytics provides visibility into the file system.
With XCP you can fully utilize available CPU, network, and storage resources to scan, scope, copy and verify large file trees at maximum speed. With logging, reporting, subdirectory granularity plus three levels of verification (stats, structure and full data), XCP offers unique capabilities to accelerate and improve file tree processing and data migration.
In the NFS world, traditional tools like rsync/find/dd/du use a simple back-and-forth style of serial IO operations. When there are millions of files to process, waiting for all the round-trips to finish can take weeks.
On the other hand, all XCP features benefit from a built-in NFS client engine that generates parallel streams of asynchronous IO requests.
The XCP NFS engine significantly mitigates the effects of latency by keeping servers and networks busy all the time, which makes millions or even billions of files much easier to manage. In the CIFS version of XCP, we have implemented the same parallelization algorithms to provide file discovery and transfer at maximum speed.
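To see why keeping many requests in flight matters, here is a small Python simulation. It is not XCP code: `fake_round_trip` is a hypothetical stand-in for one NFS request with a fixed latency, and the 8 parallel streams are an arbitrary choice. With 200 files and a 5 ms round trip, the serial loop waits for 200 round trips back to back (roughly 1 s), while the parallel version keeps 8 requests outstanding and needs only about 25 rounds of 5 ms.

```python
import asyncio
import time


async def fake_round_trip(path, latency, stats):
    """Hypothetical stand-in for one NFS request (e.g. a GETATTR) taking `latency` seconds."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(latency)  # waiting on the network and the server
    stats["in_flight"] -= 1
    return path


async def serial_scan(paths, latency):
    """Traditional style: one request at a time, each waits for the previous one."""
    stats = {"in_flight": 0, "peak": 0}
    results = [await fake_round_trip(p, latency, stats) for p in paths]
    return results, stats["peak"]


async def parallel_scan(paths, latency, streams=8):
    """Keep up to `streams` asynchronous requests outstanding at all times."""
    stats = {"in_flight": 0, "peak": 0}
    gate = asyncio.Semaphore(streams)

    async def bounded(p):
        async with gate:
            return await fake_round_trip(p, latency, stats)

    results = await asyncio.gather(*(bounded(p) for p in paths))
    return results, stats["peak"]


async def main():
    paths = [f"/export/dir/file{i}" for i in range(200)]  # hypothetical paths
    for name, scan in (("serial", serial_scan), ("parallel", parallel_scan)):
        start = time.perf_counter()
        _, peak = await scan(paths, 0.005)  # 5 ms simulated round trip
        print(f"{name}: {time.perf_counter() - start:.2f}s, peak in flight {peak}")


if __name__ == "__main__":
    asyncio.run(main())
```

The latency per request is identical in both cases; the parallel version simply overlaps the waiting, which is the effect described above.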
XCP sync finds all the changes that happened on the source and then performs the necessary operations to update the target and make it exactly match the source.
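A minimal sketch of what a sync pass has to work out, assuming a change can be detected from file size and modification time. That detection rule is my assumption for illustration, not XCP’s documented internals, and the sketch ignores ACLs and other attributes. `snapshot` and `find_changes` are hypothetical names.

```python
import os


def snapshot(root):
    """Map each file's path (relative to root) to (size, mtime in whole seconds)."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            snap[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return snap


def find_changes(source_root, target_root):
    """Work out what a sync must do so the target exactly matches the source."""
    src, dst = snapshot(source_root), snapshot(target_root)
    to_copy = sorted(p for p in src if p not in dst)                     # new on source
    to_update = sorted(p for p in src if p in dst and src[p] != dst[p])  # changed on source
    to_delete = sorted(p for p in dst if p not in src)                   # gone from source
    return to_copy, to_update, to_delete
```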
By default, XCP verify does a full comparison of target files and directories, including NTFS ACLs, attributes and every byte of data. It also has options for fast verification, selective verification, and incremental verification after a sync to minimize cutover times.
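XCP offers three verification levels (stats, structure and full data). The Python sketch below shows one plausible way such levels can build on each other. The exact checks XCP performs at each level are not spelled out here, so this mapping is my assumption, and unlike XCP the sketch ignores ACLs and other attributes. `verify`, `list_files` and `sha256_of` are hypothetical names.

```python
import hashlib
import os


def list_files(root):
    """Map relative path -> absolute path for every file under root."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            files[os.path.relpath(full, root)] = full
    return files


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(source, target, level="data"):
    """Compare two trees at increasing depth: 'stats', 'structure' or 'data'."""
    src, dst = list_files(source), list_files(target)
    size = os.path.getsize
    # Level 1, stats: same number of files and same total bytes.
    if len(src) != len(dst) or sum(map(size, src.values())) != sum(map(size, dst.values())):
        return False
    if level == "stats":
        return True
    # Level 2, structure: same relative paths, and each file has the same size.
    if src.keys() != dst.keys() or any(size(src[p]) != size(dst[p]) for p in src):
        return False
    if level == "structure":
        return True
    # Level 3, full data: every byte matches (checked via SHA-256 digests).
    return all(sha256_of(src[p]) == sha256_of(dst[p]) for p in src)
```

A same-size edit (for example "world" changed to "WORLD" on the target) passes the stats and structure levels but fails the full data level, which shows what the byte-level comparison adds.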
Ran out of space or inodes? Encountered an unexpected error? Just fix the problem and resume your operations. On NFS there is a dedicated resume command to run, while CIFS operations natively pick up from where they left off.
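A common way to make a long copy resumable is to checkpoint progress as you go. The sketch below is a generic illustration of that pattern, not how XCP stores its state: `resumable_copy` and its JSON journal are hypothetical, and `fail_after` exists only to simulate an error such as the target running out of space.

```python
import json
import os
import shutil


def resumable_copy(files, src, dst, journal, fail_after=None):
    """Copy `files` (relative paths) from src to dst, checkpointing progress in `journal`."""
    done = set()
    if os.path.exists(journal):
        with open(journal) as fh:
            done = set(json.load(fh))  # files finished by an earlier run
    copied = []
    for rel in files:
        if rel in done:
            continue  # already copied before the failure: skip on resume
        if fail_after is not None and len(copied) == fail_after:
            raise OSError("simulated failure: target out of space")
        target = os.path.join(dst, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(src, rel), target)
        done.add(rel)
        copied.append(rel)
        with open(journal, "w") as fh:
            json.dump(sorted(done), fh)  # checkpoint after every file
    return copied
```

Rewriting the whole journal after every file keeps the sketch short; a real tool would append to a log or checkpoint in batches.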
XCP is beneficial for:
- NFS and SMB workloads
- Copy and sync file systems
- Data sets with millions of files
XCP Migration Workflow
- XCP Show: Query the exports and shares on the file server
- XCP Scan: Provide details about the number of files, data set size and file distribution up to the directory level granularity for better migration planning
- XCP Copy: Create a baseline copy
- XCP Sync dry run: Estimate the changes for the cutover window
- XCP Sync: Sync the final changes in the cutover window
- XCP Verify: Compare the data between the source and target to validate that everything migrated
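As a planning aid, the sketch below builds the command line for each workflow step and prints it; it does not run anything. The syntax (`xcp show`, `xcp scan`, `xcp copy -newid`, `xcp sync dry-run -id`, `xcp sync -id`, `xcp verify`) reflects my understanding of XCP’s NFS commands and should be treated as an assumption. Check it against the XCP reference for your version, and note that the SMB (Windows) version uses different syntax. The IP addresses, export paths and job name are hypothetical.

```python
import shlex


def xcp_nfs_plan(server, source, target, job_id):
    """Return (phase, argv) pairs for the migration workflow; nothing is executed.

    Command syntax is an assumption based on XCP's NFS usage; verify per version.
    """
    return [
        ("show", ["xcp", "show", server]),                             # list exports
        ("scan", ["xcp", "scan", source]),                             # size up the data set
        ("copy", ["xcp", "copy", "-newid", job_id, source, target]),   # baseline copy
        ("sync dry run", ["xcp", "sync", "dry-run", "-id", job_id]),   # estimate changes
        ("sync", ["xcp", "sync", "-id", job_id]),                      # final cutover sync
        ("verify", ["xcp", "verify", source, target]),                 # compare source/target
    ]


if __name__ == "__main__":
    # Hypothetical addresses: an on-premises NFS server and an Azure NetApp Files volume.
    plan = xcp_nfs_plan("10.0.0.10", "10.0.0.10:/vol/projects",
                        "10.1.0.4:/anf-projects", "projects-migration")
    for phase, argv in plan:
        print(f"{phase:>12}: {shlex.join(argv)}")
```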
How do I get XCP?
NetApp XCP is free to use, and its license is free as well. Obtaining a license, installing and configuring XCP is a matter of minutes. Here are the steps:
- Register a NetApp NSS account, if you don’t have one yet*
- Download XCP to a host
- Login and download a license
- Deploy the license file to the XCP host
*If you are not a NetApp customer, please contact your local CSA.
XCP is CLI-based and doesn’t have a fancy GUI. When you download XCP 1.6, you get a Netapp_XCP_1.6.1.tgz file, which lets you install XCP on either a Windows or a Linux client.
- NetApp XCP product documentation
- TR-4863: Best Practice Guidelines For NetApp XCP Data Mover, File Migration and Analytics
Cloud Sync (CS) is NetApp’s service for rapid and secure data synchronization. Whether you need to transfer files between on-premises NFS or SMB file shares, Amazon S3 object format, Azure Blob, IBM Cloud Object Storage or NetApp StorageGRID® Webscale appliance, Cloud Sync moves the files where you need them quickly and securely.
One of the things I like about CS is that it is very user-friendly and has a GUI.
How does NetApp Cloud Sync work?
In a nutshell, Cloud Sync looks at the source and the target and then tries to make the target as identical as possible to the source.
It scans the source folder, looks at the target and compares the two. If a file is missing from the target, it is copied; if it is newer or has been modified on the source, the copy on the target is replaced. What about deleting files from the source after they have been copied? That is a move data action: Cloud Sync copies the files and, once the copy is done, deletes them from the source.
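The per-file decision described above can be sketched in Python as follows. This illustrates the logic and is not Cloud Sync’s implementation: the input dictionaries (path to modification time) and the `delete_on_target` / `delete_on_source` flags are hypothetical stand-ins for the relationship settings.

```python
def plan_actions(source, target, delete_on_target=False, delete_on_source=False):
    """Decide per-file actions for one sync pass.

    source/target: hypothetical maps of path -> last-modified time.
    delete_on_target: remove target files that are gone from the source.
    delete_on_source: "move data"; delete from the source once copied
    (in this sketch, only files copied in this pass).
    """
    actions = []
    for path, mtime in sorted(source.items()):
        if path not in target:
            actions.append(("copy", path))       # missing on the target
        elif mtime > target[path]:
            actions.append(("replace", path))    # newer or modified on the source
        else:
            continue                             # already in sync, nothing to do
        if delete_on_source:
            actions.append(("delete_source", path))
    if delete_on_target:
        for path in sorted(set(target) - set(source)):
            actions.append(("delete_target", path))
    return actions
```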
At the end of the day, while Cloud Sync is moving or copying files from source to target, it can potentially delete files on the target. Why? Suppose you have a data lake fed by 4 sources, add a 5th source, and enable a setting like “if file is gone from source, delete from target”. If the 5th source’s content is unique compared to the other sources, you will wipe out everything else in your data lake. Be advised of that: it cannot be undone, and the data might be hard to recover.
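The data lake scenario is easy to quantify with set arithmetic. The sketch below models 5 sources of 100 unique files each and computes what the 5th relationship would delete. `would_delete` follows the rule described above; `preflight_ok` and its 5% threshold are a hypothetical safeguard you might run yourself, not a Cloud Sync feature.

```python
def would_delete(source_files, target_files):
    """Files removed from the target when 'if file is gone from source,
    delete from target' is enabled for this source."""
    return set(target_files) - set(source_files)


def preflight_ok(source_files, target_files, max_fraction=0.05):
    """Hypothetical guard: refuse if more than max_fraction of the target would go."""
    doomed = would_delete(source_files, target_files)
    return len(doomed) <= max_fraction * len(set(target_files))


# A data lake fed by four sources, plus a fifth source with unique files.
sources = {n: {f"src{n}/file{i}" for i in range(100)} for n in range(1, 6)}
lake = set().union(*sources.values())  # after source 5's files are copied in
doomed = would_delete(sources[5], lake)
print(f"{len(doomed)} of {len(lake)} files would be deleted")  # 400 of 500
print("safe to enable deletes:", preflight_ok(sources[5], lake))  # False
```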
CS starts with a complete baseline copy while comparing data; subsequent sync events then handle the incremental changes. This might make you think of “SnapMirror, just with files”, and that file distinction is hugely important.
File-level replication doesn’t enjoy the benefits of block-level replication: block replication from source to target can be done in a deduplicated way.
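To make the difference concrete, here is a toy Python comparison. It is not how SnapMirror works internally; it simply assumes fixed 4 KiB blocks identified by SHA-256 and counts how many bytes each approach would send after a one-byte edit to a 1 MiB file.

```python
import hashlib
import random

BLOCK = 4096  # assumed block size, for illustration only


def file_level_bytes(old: bytes, new: bytes) -> int:
    """A file-level copy sends the whole file again if anything changed."""
    return 0 if old == new else len(new)


def block_level_bytes(old: bytes, new: bytes, block: int = BLOCK) -> int:
    """Send only blocks the target does not already hold, identified by hash.

    Because blocks are keyed by content, a block that appears twice is sent
    once: a simple form of deduplication.
    """
    have = {hashlib.sha256(old[i:i + block]).digest() for i in range(0, len(old), block)}
    sent = 0
    for i in range(0, len(new), block):
        chunk = new[i:i + block]
        digest = hashlib.sha256(chunk).digest()
        if digest not in have:
            sent += len(chunk)
            have.add(digest)  # an identical block later on is not resent
    return sent


old = random.Random(0).randbytes(1 << 20)  # 1 MiB of pseudo-random data
edited = bytearray(old)
edited[500_000] ^= 0xFF                     # change a single byte
edited = bytes(edited)
print(file_level_bytes(old, edited), block_level_bytes(old, edited))  # 1048576 4096
```

Appending a copy of data the target already holds costs the block-level approach nothing, while the file-level copy resends the whole file.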
- The Data Broker serves as a mid-point between the source and the target.
- It works through APIs with AWS S3, Azure Blob, GCS and ICOS
- In essence it is an agent for data transformation.
- It works natively with NFS (v3, v4.1, v4.2), SMB, EFS and CVS/ANF
How does the Cloud Sync Broker run?
The Broker runs two kinds of processes, each with 4 concurrent copies:
- Scanner: Roam the file structure and decide what will be copied
- Transferrer: The file copier
- Only the metadata goes to the queuing service: the files themselves do not leave the customer’s environment, whether on-premises or off-premises
The Scanners traverse the tree one folder at a time, compare the source and the target, and use the queuing service to mark files for copy by sending metadata markers.
The Transferrers pull the metadata from the queue and copy the files from source to target in the most efficient way.
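The scanner/transferrer split maps naturally onto a work queue. The Python sketch below is a single-machine illustration, not the broker’s code: an in-memory `queue.Queue` stands in for the queuing service, threads stand in for the broker processes, and the copy rule (missing, different size, or older on the target) is my assumption. It does show the key property: only relative paths (metadata) travel through the queue, while file data goes directly from source to target.

```python
import os
import queue
import shutil
import threading

SCANNERS = 4       # the post describes 4 concurrent copies of each process
TRANSFERRERS = 4


def needs_copy(src_stat, dst_path):
    """Assumed rule: copy if the target file is missing, a different size, or older."""
    if not os.path.exists(dst_path):
        return True
    dst_stat = os.stat(dst_path)
    return (dst_stat.st_size != src_stat.st_size
            or int(dst_stat.st_mtime) < int(src_stat.st_mtime))


def run_broker(source, target):
    folders = queue.Queue()   # folders still to scan, one folder per task
    markers = queue.Queue()   # stand-in for the queuing service: paths only
    copied = []               # list.append is thread-safe in CPython
    folders.put("")           # start at the source root

    def scanner():
        while True:
            rel_dir = folders.get()
            if rel_dir is None:              # shutdown signal
                folders.task_done()
                return
            with os.scandir(os.path.join(source, rel_dir)) as entries:
                for entry in entries:
                    rel = os.path.join(rel_dir, entry.name)
                    if entry.is_dir(follow_symlinks=False):
                        folders.put(rel)     # another folder for any scanner
                    elif entry.is_file(follow_symlinks=False):
                        if needs_copy(entry.stat(), os.path.join(target, rel)):
                            markers.put(rel) # metadata marker, not file data
            folders.task_done()

    def transferrer():
        while True:
            rel = markers.get()
            if rel is None:                  # shutdown signal
                return
            dst = os.path.join(target, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(os.path.join(source, rel), dst)  # data goes source -> target
            copied.append(rel)

    workers = ([threading.Thread(target=scanner) for _ in range(SCANNERS)]
               + [threading.Thread(target=transferrer) for _ in range(TRANSFERRERS)])
    for t in workers:
        t.start()
    folders.join()                           # every folder has been scanned
    for _ in range(SCANNERS):
        folders.put(None)
    for _ in range(TRANSFERRERS):
        markers.put(None)                    # queued after all real markers
    for t in workers:
        t.join()
    return sorted(copied)
```

Running it a second time copies nothing, because `shutil.copy2` preserves modification times and the scanners no longer mark those files.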
Recommendations from the experts…
- Cloud Sync is recommended for handling small-file data sets, scaling out by adding more data brokers.
- Cloud Sync is recommended if you need ongoing synchronization or object support, even if the file count is high or the data set is large.
- We don’t recommend XCP for data sets with large numbers of symbolic links (for example, created with ln -s my_file.txt my_link.txt): re-syncs may be slower than an initial copy would be.
- We don’t recommend Cloud Sync for workloads with open files (in flight) such as databases or mounted VMs.
- We don’t recommend either Cloud Sync or XCP when latency between source and target is >50 ms.
- We recommend XCP for tens of millions of small files because the engine is better equipped to handle those amounts of files.
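The recommendations above can be condensed into a small decision helper. This is a hedged encoding of the list, not an official NetApp sizing tool: the `Workload` fields, the 10-million-file threshold standing in for “tens of millions”, treating object storage as Cloud Sync-only, and the rule ordering are all my assumptions.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    file_count: int
    latency_ms: float              # source-to-target latency
    object_storage: bool = False   # S3, Blob, etc. on either side
    ongoing_sync: bool = False     # keep syncing after the migration
    open_files: bool = False       # databases, mounted VMs
    many_symlinks: bool = False


def recommend(w: Workload) -> list:
    """Return suitable tools, preferred first; an empty list means neither."""
    if w.latency_ms > 50:
        return []                                  # neither tool above 50 ms
    candidates = ["Cloud Sync", "XCP"]             # alphabetical: no preference yet
    if w.open_files:
        candidates.remove("Cloud Sync")            # open files: not Cloud Sync
    if w.many_symlinks or w.object_storage:
        candidates.remove("XCP")                   # slow re-syncs / XCP is file-only
    if w.ongoing_sync or w.object_storage:
        preferred = "Cloud Sync"                   # even for large data sets
    elif w.file_count >= 10_000_000:
        preferred = "XCP"                          # tens of millions of small files
    else:
        preferred = None
    if preferred in candidates:
        candidates.remove(preferred)
        candidates.insert(0, preferred)
    return candidates
```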
To summarize, no matter what kind of data you want to move to the cloud, we have great tools you can leverage to start your journey.