CentOS 7.x Owncloud upload files in parallel via webdav

From Notes_Wiki
Revision as of 01:49, 4 February 2021 by Saurabh (talk | contribs)


CentOS 7.x Owncloud upload files in parallel via webdav

Uploading files to owncloud sequentially using rsync or cp can be slow, especially when thousands of small files need to be uploaded. To upload many small files in parallel to an owncloud webdav folder mounted using davfs2 (/mnt/owncloud-dest in the examples below) use:

  1. Install GNU parallel (on CentOS 7 it is typically available from the EPEL repository) using:
    yum -y install parallel
  2. Create a list of files to be copied by comparing only size. Comparing timestamps is not useful, as owncloud creates its own timestamps and the davfs2 timestamps shown on the command line are not copied to the backend.
    rsync -nvr --size-only /mnt/source/ /mnt/owncloud-dest/ > /root/copy-list.txt 2>/root/error-list.txt &
    Here -n makes this a dry run, so nothing is copied. Wait for the background rsync to finish so that the file list is complete.
  3. Remove the first line, similar to:
    sending incremental file list
    and the last 3 lines (a blank line followed by two summary lines), similar to:
    sent 65 bytes received 19 bytes 168.00 bytes/sec
    total size is 0 speedup is 0.00 (DRY RUN)
    from the created file /root/copy-list.txt
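  4. Use parallel to copy the files in the above list to owncloud in parallel
    cd /mnt/source ##Very important, as the paths in the list are relative to /mnt/source
    cat /root/copy-list.txt | parallel --will-cite -j 5 cp -v --parents {} /mnt/owncloud-dest/ > /root/cp-output.txt 2>&1 &
    where -j 5 indicates 5 parallel copies at any time and --parents recreates the relative directory path of each file under /mnt/owncloud-dest/.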
  5. At any time, see the 5 running copy processes using:
    ps aux | grep "cp -v"
    Also
    ps aux | grep "cp -v" | wc -l
    will show 7 (2 more than the -j value), as the grep and parallel command lines also match the pattern
  6. To continuously monitor uploads use:
    watch "ifconfig br0; echo -n \"No of copy processes: \"; ps aux | grep 'cp -v' | wc -l; echo -n \"No of files copied: \"; grep -v 'cannot\|omitting' /root/cp-output.txt | wc -l; echo; df -h /; echo; du -sh /var/cache/davfs2; echo; tail /root/cp-output.txt"
    where br0 should be replaced with the name of the network interface. This monitors:
    1. Interface statistics, to get an idea of upload progress
    2. Number of parallel cp processes running. The count also includes other processes whose command line matches 'cp -v', such as grep and parallel.
    3. Number of files copied, based on the number of lines in /root/cp-output.txt that are not 'cannot' or 'omitting' messages
    4. Free space in "/". It is necessary to monitor this to ensure that the davfs2 cache does not fill up the "/" filesystem.
    5. Disk space used by the davfs2 cache folder
    6. Last 10 lines of the cp output, i.e. the most recently copied files
  7. If the davfs2 cache size keeps increasing, periodically pause and resume parallel using:
    ps aux | grep parallel
    while true; do sleep 7200; kill -19 <parallel-pid>; sleep 3600; kill -18 <parallel-pid>; done
    where <parallel-pid> is the PID of the parallel process as seen in the output of the ps command.
    This allows parallel to spawn processes for 2 hours, then pauses it for 1 hour, then lets it continue for another 2 hours and so on.
    A better option is to use the pause-unpause.sh erlang script specified below, which automatically pauses parallel when /var/cache/davfs2/ is more than 1000000 KB (approx 1GB) in size and automatically unpauses it when the size goes below 100000 KB (approx 100MB).
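
The manual cleanup of the file list in step 3 can also be scripted. The following is a minimal sketch, not a tested part of the original procedure: clean_copy_list is a hypothetical helper name, it assumes GNU head (for the negative line count) and it assumes that rsync prints directories with a trailing slash. Dropping the directory entries is an extra step that avoids the "omitting directory" messages from cp.

```shell
#!/bin/sh
# Sketch: turn the output of 'rsync -nvr --size-only' into a plain list of files.
# clean_copy_list is a hypothetical helper; it reads the rsync output on stdin.
clean_copy_list() {
    # sed '1d'     : drop the "sending incremental file list" header line
    # head -n -3   : drop the blank line and the two summary lines at the end (GNU head)
    # grep -v '/$' : drop directory entries, assumed to end with a slash
    sed '1d' | head -n -3 | grep -v '/$'
}

# Example usage with the paths used on this page (copy-list-clean.txt is an assumed name):
#   clean_copy_list < /root/copy-list.txt > /root/copy-list-clean.txt
```

If this is used, the cleaned file should be passed to parallel in step 4 instead of /root/copy-list.txt.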


Pause unpause script to ensure /var/cache/davfs2 size is under limits

It is possible to pause the parallel process using 'kill -19 <pid>' (SIGSTOP) and then unpause it using 'kill -18 <pid>' (SIGCONT) based on the space occupied by the /var/cache/davfs2 folder. This can be done using the following erlang script:

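#!/usr/bin/env escript

%% Pauses parallel with 'kill -19' (SIGSTOP) when /var/cache/davfs2 grows
%% beyond ?High and resumes it with 'kill -18' (SIGCONT) once the cache
%% shrinks below ?Low.  Sizes are in KB, as reported by 'du -s'.
-define(High, 1000000).  % approx 1 GB
-define(Low,   100000).  % approx 100 MB

main(_) ->
    %% List PIDs of perl processes (parallel runs as a perl process);
    %% tl/1 drops the "PID" header line.  Any other perl process running
    %% on the machine is matched too.
    Output1 = tl(string:tokens(os:cmd("ps -C perl -o pid"), "\n")),
    case Output1 of
        [] ->
            io:format("Can't get pid of parallel process.  Exiting.~n");
        _ ->
            io:format("Got pid of parallel process as ~p~n", [Output1]),
            pause(Output1)
    end.

%% Check the cache size every 60 seconds; once it exceeds ?High, pause
%% parallel and start waiting for the cache to shrink.
pause(Pid1) ->
    Space1 = list_to_integer(hd(string:tokens(os:cmd("du -s /var/cache/davfs2"), "\n\t"))),
    if
        Space1 > ?High ->
            io:format("Got space as ~p which is higher than ~p.  Pausing~n", [Space1, ?High]),
            Command1 = lists:flatten(io_lib:format("kill -19 ~s", [Pid1])),
            io:format("Will pause with command ~p~n", [Command1]),
            os:cmd(Command1),
            unpause(Pid1);
        true ->
            sleep(60),
            pause(Pid1)
    end.

%% Check the cache size every 60 seconds; once it drops below ?Low, resume
%% parallel and go back to watching for growth.
unpause(Pid1) ->
    Space1 = list_to_integer(hd(string:tokens(os:cmd("du -s /var/cache/davfs2"), "\n\t"))),
    if
        Space1 < ?Low ->
            io:format("Got space as ~p which is lower than ~p.  Unpausing~n", [Space1, ?Low]),
            Command1 = lists:flatten(io_lib:format("kill -18 ~s", [Pid1])),
            io:format("Will unpause with command ~p~n", [Command1]),
            os:cmd(Command1),
            pause(Pid1);
        true ->
            sleep(60),
            unpause(Pid1)
    end.

%% Sleep for N seconds.
sleep(N) ->
    receive
    after N*1000 ->
        ok
    end.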
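
To see what the two kill commands do before using them on a real upload, they can be tried on a harmless process. The following is a small illustrative sketch that assumes a Linux system with /proc mounted; proc_state is a hypothetical helper name, and the signal names STOP and CONT are used in place of the numbers 19 and 18, which is equivalent on Linux x86_64.

```shell
#!/bin/sh
# Sketch: show the effect of SIGSTOP / SIGCONT on a harmless background process.

# proc_state PID -> prints the one-letter process state from /proc (Linux only)
proc_state() {
    awk '/^State:/ {print $2}' "/proc/$1/status"
}

sleep 30 &
pid=$!

kill -STOP "$pid"                  # same as 'kill -19' on Linux x86_64
sleep 1
state_stopped=$(proc_state "$pid")
echo "after STOP: $state_stopped"  # T means stopped

kill -CONT "$pid"                  # same as 'kill -18' on Linux x86_64
sleep 1
state_resumed=$(proc_state "$pid")
echo "after CONT: $state_resumed"  # no longer T, typically S (sleeping)

# Clean up the test process
kill "$pid"
wait "$pid" 2>/dev/null || true
```

A stopped parallel process does not start new cp jobs, which gives davfs2 time to upload the files already in its cache.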

To use the above Erlang script:

  1. Enable epel repository using 'yum -y install epel-release'
  2. Install erlang and byobu using 'yum -y install erlang byobu'
  3. Copy above script as file 'pause-unpause.sh'
  4. Give execute permission to file using 'chmod +x pause-unpause.sh'
  5. Start byobu shell using 'byobu'
  6. Execute pause-unpause.sh using './pause-unpause.sh'
  7. (Optionally) Detach from the byobu session using the 'F6' key, leaving the script running in the background
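
On hosts where installing erlang is not desirable, the same high/low threshold logic can be sketched in plain shell. This is an untested alternative under stated assumptions, not the script used above: next_state and watch_cache are hypothetical names, HIGH_KB and LOW_KB mirror the High and Low values of the Erlang script, and the PID of parallel must be passed in by hand.

```shell
#!/bin/sh
# Sketch: pause/resume parallel based on the davfs2 cache size, in plain shell.
HIGH_KB=1000000   # approx 1 GB, same as High in the Erlang script
LOW_KB=100000     # approx 100 MB, same as Low in the Erlang script

# next_state CURRENT_STATE SIZE_KB -> prints "running" or "paused"
# Pause only above HIGH_KB, resume only below LOW_KB, otherwise keep the state.
next_state() {
    cur=$1
    kb=$2
    if [ "$cur" = "running" ] && [ "$kb" -gt "$HIGH_KB" ]; then
        echo paused
    elif [ "$cur" = "paused" ] && [ "$kb" -lt "$LOW_KB" ]; then
        echo running
    else
        echo "$cur"
    fi
}

# watch_cache PID: check the cache size every 60 seconds and signal PID on a state change.
watch_cache() {
    pid=$1
    state=running
    while true; do
        size_kb=$(du -s /var/cache/davfs2 | cut -f1)
        new_state=$(next_state "$state" "$size_kb")
        if [ "$new_state" != "$state" ]; then
            case $new_state in
                paused)  kill -STOP "$pid" ;;   # same as kill -19
                running) kill -CONT "$pid" ;;   # same as kill -18
            esac
            state=$new_state
        fi
        sleep 60
    done
}

# Example usage, with <parallel-pid> taken from 'ps aux | grep parallel':
#   watch_cache <parallel-pid>
```

Keeping the decision in next_state separate from the loop makes the thresholds easy to check by hand, for example 'next_state running 1500000' prints paused.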


