CentOS 7.x Owncloud: upload files in parallel via webdav

Latest revision as of 09:25, 25 August 2022


New approach by uploading files to data folder and running occ files:scan

To upload files to owncloud, use the following steps:

  1. Copy the files to the desired location in the data folder of the corresponding user. For example, to upload files to the projects/new/ folder of the admin user, copy them to '/opt/owncloud-<version>/apps/owncloud/data/admin/files/projects/new/', for instance using:
    rsync -vtrp ./new/ /opt/owncloud-<version>/apps/owncloud/data/admin/files/projects/new/
  2. Fix ownership of the uploaded files:
    chown -R daemon:daemon /opt/owncloud-<version>/apps/owncloud/data/admin/files/projects/new/
  3. Scan the particular folder for corresponding changes:
    cd /opt/owncloud-10.0.10-4/apps/owncloud/htdocs
    sudo -u daemon /opt/owncloud-10.0.10-4/php/bin/php occ files:scan --path "admin/files/projects/new/"
    This will add any new files in the projects/new folder to the owncloud database.
    If the installation is small, you can also consider scanning all folders using:
    sudo -u daemon /opt/owncloud-10.0.10-4/php/bin/php occ files:scan --all
    Refer to CentOS 7.x Owncloud file cache and sharing.
  4. If any filesystem permissions are changed after running the scan, run the scan again. Owncloud looks at file/folder permissions and caches that information in the MySQL DB. Hence the permissions cached in the DB must be refreshed whenever permissions change at the file-system level.
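The three steps above (copy, fix ownership, scan) can be sketched as one small shell helper. This is only an illustration under stated assumptions: the names OC_ROOT, OC_USER, DRY_RUN, run and upload_and_scan are hypothetical and not part of owncloud, and calling occ by its absolute path is assumed to behave the same as changing into the htdocs folder first. With DRY_RUN=1 (the default here) the commands are only printed, so they can be reviewed before anything is changed.

```shell
#!/bin/bash
# Sketch only: OC_ROOT, OC_USER, DRY_RUN and both function names are
# illustrative assumptions, not owncloud settings.
OC_ROOT="${OC_ROOT:-/opt/owncloud-10.0.10-4}"   # installation prefix used on this page
OC_USER="${OC_USER:-admin}"                     # owncloud user that receives the files
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# upload_and_scan <source-dir/> <path relative to the user's files folder>
upload_and_scan() {
    local src="$1"
    local rel="$2"
    local dest="$OC_ROOT/apps/owncloud/data/$OC_USER/files/$rel"

    # Step 1: copy the files into the user's data folder.
    run rsync -vtrp "$src" "$dest"
    # Step 2: hand the files over to the user the web server runs as.
    run chown -R daemon:daemon "$dest"
    # Step 3: register the new files in the owncloud database
    # (assumption: occ works when called by absolute path).
    run sudo -u daemon "$OC_ROOT/php/bin/php" \
        "$OC_ROOT/apps/owncloud/htdocs/occ" files:scan --path "$OC_USER/files/$rel"
}

# Example (prints the three commands without running them):
#   upload_and_scan ./new/ projects/new/
```

To actually perform the upload, the same call can be repeated with DRY_RUN=0 once the printed commands look right.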


Old approach using parallel and webdav

This is the older approach and is perhaps no longer required. It takes considerable time and expertise and is still slower than the approach explained above.

Uploading files to owncloud sequentially using rsync or cp can be slow. This can especially be an issue if you need to upload thousands of small files. To upload multiple small files in parallel use:

  1. Install parallel using 'yum -y install parallel'
  2. Create a list of files to be copied by comparing only size. This is required because owncloud creates its own timestamps, and the timestamps davfs2 shows on the command line are not copied to the backend.
    rsync -nvr --size-only /mnt/source/ /mnt/owncloud-dest/ > /root/copy-list.txt 2>/root/error-list.txt &
    and wait for the file list to be created. This can be very slow if /mnt/owncloud-dest is mounted using davfs2. To speed this up, build the list directly by comparing /mnt/source with the contents of /opt/owncloud-<version>/apps/owncloud/data/<user>/files/<path> on the remote owncloud machine over ssh or sshfs.
  3. Remove the first line, similar to:
    sending incremental file list
    and the last 3 lines (a blank line followed by two summary lines), similar to:
    sent 65 bytes received 19 bytes 168.00 bytes/sec
    total size is 0 speedup is 0.00 (DRY RUN)
    from the created file list.
  4. Use parallel to copy the files in the above list to owncloud in parallel:
    cd /mnt/source ##Very important
    cat /root/copy-list.txt | parallel --will-cite -j 5 cp -v --parents {} /mnt/owncloud-dest/ > /root/cp-output.txt 2>&1 &
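    where -j 5 indicates 5 parallel copies at any time. The cd is important because the list contains paths relative to /mnt/source, which cp --parents recreates under the destination.
  5. At any time, see the 5 copy processes running using: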
    ps aux | grep "cp -v"
    Also,
    ps aux | grep "cp -v" | wc -l
    will show 7 (2 more than the -j value) because the grep and parallel command lines also match the pattern.
  6. To continuously monitor uploads use:
    watch "ifconfig br0; echo -n 'No of copy processes: '; ps aux | grep 'cp -v' | wc -l; echo -n 'No of files copied: '; grep -v 'cannot\|omitting' /root/cp-output.txt | wc -l; echo; df -h /; echo; du -sh /var/cache/davfs2; echo; tail /root/cp-output.txt"
    where br0 should be replaced with the name of the network interface. This monitors:
    1. Interface statistics to get idea on uploads
    2. No of parallel cp processes running.
    3. No of files copied based on no. of lines in /root/cp-output.txt file
    4. Space in "/". Necessary to monitor this to ensure that cache space is not too large to accommodate in "/" filesystem.
    5. Disk space usage of davfs2 cache folder
    6. Last 10 copied files
  7. If the davfs2 cache keeps growing, pause and resume the above processes periodically using:
    ps aux | grep parallel
    while true; do sleep 7200; kill -19 <parallel-pid>; sleep 3600; kill -18 <parallel-pid>; done
    where <parallel-pid> is the PID of the parallel process as seen in the output of the ps command.
    This allows parallel to spawn processes for 2 hours, pauses it for 1 hour, continues it for another 2 hours, and so on.
    A better option is to use the pause-unpause.sh erlang script given below, which automatically pauses parallel when /var/cache/davfs2/ grows beyond 1000000 KB (approx 1GB) and automatically unpauses it when the size drops below 100000 KB (approx 100MB).
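The manual clean-up of the rsync dry-run output in step 3 can be sketched as a small filter. This is a minimal illustration, assuming the output has exactly one header line and three trailer lines (a blank line plus the two summary lines) as described above; the function name clean_copy_list and the output file name in the example are hypothetical.

```shell
#!/bin/bash
# Sketch only: clean_copy_list is a hypothetical helper name.
# Reads rsync -nvr output on stdin and prints it without the first line
# ("sending incremental file list") and without the last three lines
# (blank line, "sent ...", "total size is ...").
clean_copy_list() {
    awk '
        NR > 1 { buf[NR] = $0 }        # remember every line except the header
        END {
            for (i = 2; i <= NR - 3; i++)   # stop before the 3 trailer lines
                print buf[i]
        }
    '
}

# Example (the output file name is an assumption):
#   clean_copy_list < /root/copy-list.txt > /root/copy-list-clean.txt
```

The whole list is held in memory until the end of input, which should be acceptable for lists of a few hundred thousand paths; directory entries (lines ending in /) are left in place, matching the manual procedure.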


If you are instead trying to upload a small number (<10) of large files (>1GB), then perhaps have a look at https://unix.stackexchange.com/questions/354026/disable-davfs2-caching




Pause unpause script to ensure /var/cache/davfs2 size is under limits

It is possible to pause the parallel process using 'kill -19 <pid>' (SIGSTOP) and then unpause it using 'kill -18 <pid>' (SIGCONT), based on the space occupied by the /var/cache/davfs2 folder. This can be done using the following erlang script:

#!/usr/bin/env escript

-define(High, 1000000).  % pause above this size; du -s reports KB, so approx 1 GB
-define(Low,   100000).  % unpause below this size; approx 100 MB

main(_) ->
        Output1=tl(string:tokens(os:cmd("ps -C perl -o pid"),"\n")),
        case Output1 of
           [] ->
                 io:format("Can't get pid of parallel process.  Exiting.~n");
           _ ->
               io:format("Got pid of parallel process as ~p~n",[Output1]),
               pause(Output1)
        end. 
           

pause(Pid1) ->
       Space1=list_to_integer(hd(string:tokens(os:cmd("du -s /var/cache/davfs2"), "\n\t"))),
       if 
           Space1 > ?High ->
                io:format("Got space as ~p which is higher than ~p.  Pausing~n", [Space1, ?High]),
                Command1=lists:flatten(io_lib:format("kill -19 ~s", [Pid1])),
                io:format("Will pause with command ~p~n", [Command1]),
                os:cmd(Command1),
                unpause(Pid1);
           true ->
                sleep(60),
                pause(Pid1)
        end.

     
unpause(Pid1) ->
       Space1=list_to_integer(hd(string:tokens(os:cmd("du -s /var/cache/davfs2"), "\n\t"))),
       if 
           Space1 < ?Low ->
                io:format("Got space as ~p which is lower than ~p.  Unpausing~n", [Space1, ?Low]),
                Command1=lists:flatten(io_lib:format("kill -18 ~s", [Pid1])),
                io:format("Will unpause with command ~p~n", [Command1]),
                os:cmd(Command1),
                pause(Pid1);
           true ->
                sleep(60),
                unpause(Pid1)
        end.

    

sleep(N) ->
    receive
    after N*1000 ->
        ok
    end.

To use the above script:

  1. Enable epel repository using 'dnf -y install epel-release'
  2. Install erlang and byobu using 'dnf -y install erlang byobu'
  3. Copy above script as file 'pause-unpause.sh'
  4. Give execute permission to file using 'chmod +x pause-unpause.sh'
  5. Start byobu shell using 'byobu'
  6. Execute pause-unpause.sh using './pause-unpause.sh'
  7. (Optionally) Detach from the byobu session using the 'F6' key, leaving the script running in the background
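The pause/unpause decision made by the erlang script is a simple two-threshold (hysteresis) rule, which can also be sketched in shell for readers who do not want to install erlang. This is an illustrative alternative, not the script used on this page: the names HIGH_KB, LOW_KB, next_state and watch_cache are hypothetical, and the PID of the parallel process has to be passed in by hand.

```shell
#!/bin/bash
# Sketch only: variable and function names are illustrative assumptions.
HIGH_KB=1000000   # pause when the cache is larger than this (approx 1 GB)
LOW_KB=100000     # unpause when the cache is smaller than this (approx 100 MB)

# next_state <current state: running|paused> <cache size in KB>
# Prints the state the parallel process should be in next.
next_state() {
    local state="$1"
    local size_kb="$2"
    if [ "$state" = "running" ] && [ "$size_kb" -gt "$HIGH_KB" ]; then
        echo paused
    elif [ "$state" = "paused" ] && [ "$size_kb" -lt "$LOW_KB" ]; then
        echo running
    else
        echo "$state"      # between the thresholds: keep the current state
    fi
}

# watch_cache <parallel-pid>
# Checks the cache size once a minute until the process no longer exists.
watch_cache() {
    local pid="$1"
    local state=running
    local size_kb new
    while kill -0 "$pid" 2>/dev/null; do
        size_kb=$(du -s /var/cache/davfs2 | cut -f1)
        new=$(next_state "$state" "$size_kb")
        if [ "$new" != "$state" ]; then
            if [ "$new" = "paused" ]; then
                kill -STOP "$pid"    # same signal as kill -19 on Linux
            else
                kill -CONT "$pid"    # same signal as kill -18 on Linux
            fi
            state="$new"
        fi
        sleep 60
    done
}

# Example: watch_cache <parallel-pid>
```

Keeping the decision in next_state, separate from the loop, makes the thresholds easy to check by hand before the loop is left running inside byobu.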


