20 commits
3a1aa73
feat(populate-guids): add CLI/SDK functionality to populate a manifes…
Avantol13-machine-user Jun 9, 2022
9ed4c51
Apply automatic documentation changes
Avantol13 Jun 9, 2022
4f6b2e8
Merge branch 'master' into feat/populate-guids
Avantol13-machine-user Jun 10, 2022
d61920c
Apply automatic documentation changes
Avantol13 Jun 10, 2022
3ce8dfa
Merge branch 'master' into feat/populate-guids
Avantol13-machine-user Jun 10, 2022
90ca6d6
Apply automatic documentation changes
Avantol13 Jun 10, 2022
a8ab776
Merge branch 'master' into feat/populate-guids
Avantol13-machine-user Jun 13, 2022
bf608ea
Apply automatic documentation changes
Avantol13 Jun 13, 2022
4d9464a
Merge branch 'master' into feat/populate-guids
Avantol13-machine-user Aug 1, 2022
dbcc944
Apply automatic documentation changes
Avantol13 Aug 1, 2022
26f72c1
Merge branch 'master' into feat/populate-guids
Avantol13-machine-user Aug 10, 2022
3bc1788
Apply automatic documentation changes
Avantol13 Aug 10, 2022
30da48d
Merge branch 'master' into feat/populate-guids
Avantol13-machine-user Aug 15, 2022
ec67e36
Apply automatic documentation changes
Avantol13 Aug 15, 2022
f86df5c
Merge branch 'master' into feat/populate-guids
Avantol13-machine-user Aug 16, 2022
d565981
Apply automatic documentation changes
Avantol13 Aug 16, 2022
b9d2174
Merge branch 'master' into feat/populate-guids
Avantol13 Dec 6, 2023
d9c2b0c
Apply automatic documentation changes
Avantol13 Dec 6, 2023
8139d30
Merge branch 'master' into feat/populate-guids
Avantol13 Dec 7, 2023
232e98e
Apply automatic documentation changes
Avantol13 Dec 7, 2023
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/tools/indexing.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/tools/metadata.doctree
Binary file not shown.
66 changes: 66 additions & 0 deletions docs/_build/html/_modules/gen3/tools/indexing/index_manifest.html
@@ -532,6 +532,72 @@ <h1>Source code for gen3.tools.indexing.index_manifest</h1><div class="highlight



<div class="viewcode-block" id="populate_object_manifest_with_valid_guids">
<a class="viewcode-back" href="../../../../tools/indexing.html#gen3.tools.indexing.index_manifest.populate_object_manifest_with_valid_guids">[docs]</a>
<span class="k">def</span> <span class="nf">populate_object_manifest_with_valid_guids</span><span class="p">(</span>
<span class="n">commons_url</span><span class="p">,</span> <span class="n">manifest_file</span><span class="p">,</span> <span class="n">output_filename</span><span class="o">=</span><span class="kc">None</span>
<span class="p">):</span>
<span class="w"> </span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Given a minimal file object manifest, populate any missing GUIDs with valid GUIDs</span>
<span class="sd"> for the given commons.</span>

<span class="sd"> NOTE: This DOES NOT index anything, it only works client side to populate the manifest</span>
<span class="sd"> with valid GUIDs (which are obtained from the server). No records are created</span>
<span class="sd"> as part of this function call.</span>

<span class="sd"> Args:</span>
<span class="sd"> commons_url (str): root domain for commons where indexd lives</span>
<span class="sd"> manifest_file (str): file path for input manifest file to populate empty GUIDs</span>
<span class="sd"> output_filename (str): output file name for manifest</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">output_filename</span><span class="p">:</span>
<span class="n">file</span><span class="p">,</span> <span class="n">extension</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">splitext</span><span class="p">(</span><span class="n">manifest_file</span><span class="p">)</span>
<span class="n">output_filename</span> <span class="o">=</span> <span class="n">file</span> <span class="o">+</span> <span class="s2">&quot;_populated_guids&quot;</span> <span class="o">+</span> <span class="n">extension</span>

<span class="k">try</span><span class="p">:</span>
<span class="n">records</span><span class="p">,</span> <span class="n">headers</span> <span class="o">=</span> <span class="n">get_and_verify_fileinfos_from_manifest</span><span class="p">(</span>
<span class="n">manifest_file</span><span class="p">,</span> <span class="n">manifest_file_delimiter</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">include_additional_columns</span><span class="o">=</span><span class="kc">True</span>
<span class="p">)</span>
<span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">exc</span><span class="p">:</span>
<span class="n">logging</span><span class="o">.</span><span class="n">error</span><span class="p">(</span>
<span class="sa">f</span><span class="s2">&quot;Can not read records and headers from input manifest: </span><span class="si">{</span><span class="n">manifest_file</span><span class="si">}</span><span class="s2">.&quot;</span>
<span class="p">)</span>
<span class="k">raise</span>

<span class="c1"># ensure GUID column exists</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">headers</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="n">GUID_STANDARD_KEY</span><span class="p">)</span>
<span class="k">except</span> <span class="ne">ValueError</span><span class="p">:</span>
<span class="n">headers</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">GUID_STANDARD_KEY</span><span class="p">)</span>

<span class="n">index</span> <span class="o">=</span> <span class="n">Gen3Index</span><span class="p">(</span><span class="n">commons_url</span><span class="p">)</span>
<span class="n">valid_guids</span> <span class="o">=</span> <span class="n">index</span><span class="o">.</span><span class="n">get_valid_guids</span><span class="p">(</span><span class="n">count</span><span class="o">=</span><span class="mi">10000</span><span class="p">)</span>

<span class="c1"># modify records to include a valid GUID if it doesn&#39;t exist</span>
<span class="n">new_records</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">record</span> <span class="ow">in</span> <span class="n">records</span><span class="p">:</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">record</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">GUID_STANDARD_KEY</span><span class="p">):</span>
<span class="n">record</span><span class="p">[</span><span class="n">GUID_STANDARD_KEY</span><span class="p">]</span> <span class="o">=</span> <span class="n">valid_guids</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>

<span class="c1"># if we run out of valid GUIDs, get some more</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">valid_guids</span><span class="p">:</span>
<span class="n">valid_guids</span> <span class="o">=</span> <span class="n">index</span><span class="o">.</span><span class="n">get_valid_guids</span><span class="p">(</span><span class="n">count</span><span class="o">=</span><span class="mi">10000</span><span class="p">)</span>

<span class="n">new_records</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">record</span><span class="p">)</span>

<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">new_records</span><span class="p">)</span> <span class="o">==</span> <span class="nb">len</span><span class="p">(</span><span class="n">records</span><span class="p">)</span>

<span class="n">output_filename</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">abspath</span><span class="p">(</span><span class="n">output_filename</span><span class="p">)</span>
<span class="n">logging</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Writing output to </span><span class="si">{</span><span class="n">output_filename</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>

<span class="c1"># remove existing output if it exists</span>
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="n">output_filename</span><span class="p">):</span>
<span class="n">os</span><span class="o">.</span><span class="n">unlink</span><span class="p">(</span><span class="n">output_filename</span><span class="p">)</span>

<span class="n">_write_csv</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">CURRENT_DIR</span><span class="p">,</span> <span class="n">output_filename</span><span class="p">),</span> <span class="n">new_records</span><span class="p">,</span> <span class="n">headers</span><span class="p">)</span></div>
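
The GUID-filling loop in the highlighted source above can be sketched on its own. This is a minimal sketch, not the SDK's code: `fake_get_valid_guids` is a hypothetical stand-in for `Gen3Index.get_valid_guids` so that nothing touches the network, `GUID_KEY = "guid"` is an assumed value for `GUID_STANDARD_KEY`, and the batch size is shrunk from the 10000 used in the diff. Unlike the function in the diff, the sketch refills the pool lazily, just before a GUID is needed, and copies each row instead of mutating it.

```python
import uuid

GUID_KEY = "guid"  # assumption: stands in for the SDK's GUID_STANDARD_KEY


def fake_get_valid_guids(count):
    """Hypothetical replacement for Gen3Index.get_valid_guids (no network call)."""
    return [str(uuid.uuid4()) for _ in range(count)]


def fill_missing_guids(records, get_guids=fake_get_valid_guids, batch_size=3):
    """Return copies of the rows in which every row has a non-empty GUID.

    Rows that already carry a GUID are passed through unchanged. GUIDs are
    requested in batches and a new batch is fetched only when the pool is empty.
    """
    pool = []
    filled = []
    for record in records:
        row = dict(record)  # copy so the caller's rows are not modified
        if not row.get(GUID_KEY):
            if not pool:
                pool = get_guids(batch_size)
            row[GUID_KEY] = pool.pop()
        filled.append(row)
    return filled


if __name__ == "__main__":
    manifest_rows = [
        {"guid": "", "md5": "a"},
        {"guid": "dg.1234/abc", "md5": "b"},
        {"md5": "c"},
    ]
    for row in fill_missing_guids(manifest_rows):
        print(row)
```

Passing the fetch function as a parameter keeps the loop testable offline; in real use the callable would wrap the indexd client shown in the diff.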



<span class="nd">@click</span><span class="o">.</span><span class="n">command</span><span class="p">()</span>
<span class="nd">@click</span><span class="o">.</span><span class="n">option</span><span class="p">(</span>
<span class="s2">&quot;--commons-url&quot;</span><span class="p">,</span>
4 changes: 3 additions & 1 deletion docs/_build/html/genindex.html
@@ -527,10 +527,12 @@ <h2 id="O">O</h2>
<h2 id="P">P</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="tools/drs_pull.html#gen3.tools.download.drs_download.Downloadable.pprint">pprint() (gen3.tools.download.drs_download.Downloadable method)</a>
<li><a href="tools/indexing.html#gen3.tools.indexing.index_manifest.populate_object_manifest_with_valid_guids">populate_object_manifest_with_valid_guids() (in module gen3.tools.indexing.index_manifest)</a>
</li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="tools/drs_pull.html#gen3.tools.download.drs_download.Downloadable.pprint">pprint() (gen3.tools.download.drs_download.Downloadable method)</a>
</li>
<li><a href="tools/indexing.html#gen3.tools.indexing.index_manifest.PREV_GUID">PREV_GUID (in module gen3.tools.indexing.index_manifest)</a>
</li>
</ul></td>
1 change: 1 addition & 0 deletions docs/_build/html/index.html
@@ -250,6 +250,7 @@ <h1>Welcome to Gen3 SDK’s documentation!<a class="headerlink" href="#welcome-t
<li class="toctree-l4"><a class="reference internal" href="tools/indexing.html#gen3.tools.indexing.index_manifest.ThreadControl"><code class="docutils literal notranslate"><span class="pre">ThreadControl</span></code></a></li>
<li class="toctree-l4"><a class="reference internal" href="tools/indexing.html#gen3.tools.indexing.index_manifest.delete_all_guids"><code class="docutils literal notranslate"><span class="pre">delete_all_guids()</span></code></a></li>
<li class="toctree-l4"><a class="reference internal" href="tools/indexing.html#gen3.tools.indexing.index_manifest.index_object_manifest"><code class="docutils literal notranslate"><span class="pre">index_object_manifest()</span></code></a></li>
<li class="toctree-l4"><a class="reference internal" href="tools/indexing.html#gen3.tools.indexing.index_manifest.populate_object_manifest_with_valid_guids"><code class="docutils literal notranslate"><span class="pre">populate_object_manifest_with_valid_guids()</span></code></a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="tools/indexing.html#module-gen3.tools.indexing.verify_manifest">Verify</a><ul>
Binary file modified docs/_build/html/objects.inv
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/_build/html/searchindex.js

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/_build/html/tools.html
@@ -107,6 +107,7 @@ <h1>Gen3 Tools<a class="headerlink" href="#gen3-tools" title="Link to this headi
<li class="toctree-l3"><a class="reference internal" href="tools/indexing.html#gen3.tools.indexing.index_manifest.ThreadControl"><code class="docutils literal notranslate"><span class="pre">ThreadControl</span></code></a></li>
<li class="toctree-l3"><a class="reference internal" href="tools/indexing.html#gen3.tools.indexing.index_manifest.delete_all_guids"><code class="docutils literal notranslate"><span class="pre">delete_all_guids()</span></code></a></li>
<li class="toctree-l3"><a class="reference internal" href="tools/indexing.html#gen3.tools.indexing.index_manifest.index_object_manifest"><code class="docutils literal notranslate"><span class="pre">index_object_manifest()</span></code></a></li>
<li class="toctree-l3"><a class="reference internal" href="tools/indexing.html#gen3.tools.indexing.index_manifest.populate_object_manifest_with_valid_guids"><code class="docutils literal notranslate"><span class="pre">populate_object_manifest_with_valid_guids()</span></code></a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="tools/indexing.html#module-gen3.tools.indexing.verify_manifest">Verify</a><ul>
24 changes: 23 additions & 1 deletion docs/_build/html/tools/indexing.html
@@ -325,6 +325,27 @@ <h1>Indexing Tools<a class="headerlink" href="#indexing-tools" title="Link to th
</dl>
</dd></dl>

<dl class="py function">
<dt class="sig sig-object py" id="gen3.tools.indexing.index_manifest.populate_object_manifest_with_valid_guids">
<span class="sig-prename descclassname"><span class="pre">gen3.tools.indexing.index_manifest.</span></span><span class="sig-name descname"><span class="pre">populate_object_manifest_with_valid_guids</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">commons_url</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">manifest_file</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_filename</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/gen3/tools/indexing/index_manifest.html#populate_object_manifest_with_valid_guids"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#gen3.tools.indexing.index_manifest.populate_object_manifest_with_valid_guids" title="Link to this definition">¶</a></dt>
<dd><p>Given a minimal file object manifest, populate any missing GUIDs with valid GUIDs
for the given commons.</p>
<dl class="simple">
<dt>NOTE: This DOES NOT index anything, it only works client side to populate the manifest</dt><dd><p>with valid GUIDs (which are obtained from the server). No records are created
as part of this function call.</p>
</dd>
</dl>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>commons_url</strong> (<em>str</em>) – root domain for commons where indexd lives</p></li>
<li><p><strong>manifest_file</strong> (<em>str</em>) – file path for input manifest file to populate empty GUIDs</p></li>
<li><p><strong>output_filename</strong> (<em>str</em>) – output file name for manifest</p></li>
</ul>
</dd>
</dl>
</dd></dl>
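
The rendered parameter list does not say what happens when `output_filename` is omitted. Going by the source shown earlier in this diff, the name is derived from the input manifest, made absolute, and then joined onto a base directory. The sketch below mirrors those steps with standard-library calls only; `default_output_name` and `resolve_output_path` are hypothetical helper names, and `base_dir` stands in for the module's `CURRENT_DIR`.

```python
import os


def default_output_name(manifest_file, output_filename=None):
    """Derive the output name the way the diffed function does when none is given."""
    if not output_filename:
        stem, extension = os.path.splitext(manifest_file)
        output_filename = stem + "_populated_guids" + extension
    return output_filename


def resolve_output_path(base_dir, output_filename):
    """Make the name absolute, then join it onto base_dir as the diffed code does.

    os.path.join discards earlier components when a later one is absolute,
    so the result is simply the absolute output path.
    """
    absolute = os.path.abspath(output_filename)
    return os.path.join(base_dir, absolute)


if __name__ == "__main__":
    name = default_output_name("manifest.tsv")
    print(name)  # manifest_populated_guids.tsv
    print(resolve_output_path(os.getcwd(), name))
```

One consequence is that the join onto the base directory has no effect once the path is absolute, so the output lands relative to the caller's working directory rather than the package directory.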

</section>
<section id="module-gen3.tools.indexing.verify_manifest">
<span id="verify"></span><h2>Verify<a class="headerlink" href="#module-gen3.tools.indexing.verify_manifest" title="Link to this heading">¶</a></h2>
@@ -380,7 +401,7 @@ <h1>Indexing Tools<a class="headerlink" href="#indexing-tools" title="Link to th

<dl class="py function">
<dt class="sig sig-object py" id="gen3.tools.indexing.verify_manifest.async_verify_object_manifest">
<em class="property"><span class="k"><span class="pre">async</span></span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">gen3.tools.indexing.verify_manifest.</span></span><span class="sig-name descname"><span class="pre">async_verify_object_manifest</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">commons_url</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">manifest_file</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">max_concurrent_requests=24</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">manifest_row_parsers={'acl':</span> <span class="pre">&lt;function</span> <span class="pre">_get_acl_from_row&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">'authz':</span> <span class="pre">&lt;function</span> <span class="pre">_get_authz_from_row&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">'file_name':</span> <span class="pre">&lt;function</span> <span class="pre">_get_file_name_from_row&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">'file_size':</span> <span class="pre">&lt;function</span> <span class="pre">_get_file_size_from_row&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">'guid':</span> <span class="pre">&lt;function</span> <span class="pre">_get_guid_from_row&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">'md5':</span> <span class="pre">&lt;function</span> <span class="pre">_get_md5_from_row&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">'urls':</span> <span class="pre">&lt;function</span> <span class="pre">_get_urls_from_row&gt;}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">manifest_file_delimiter=None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">output_filename='verify-manifest-errors-1701900364.6706986.log'</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/gen3/tools/indexing/verify_manifest.html#async_verify_object_manifest"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#gen3.tools.indexing.verify_manifest.async_verify_object_manifest" title="Link to this definition">¶</a></dt>
<em class="property"><span class="k"><span class="pre">async</span></span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">gen3.tools.indexing.verify_manifest.</span></span><span class="sig-name descname"><span class="pre">async_verify_object_manifest</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">commons_url</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">manifest_file</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">max_concurrent_requests=24</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">manifest_row_parsers={'acl':</span> <span class="pre">&lt;function</span> <span class="pre">_get_acl_from_row&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">'authz':</span> <span class="pre">&lt;function</span> <span class="pre">_get_authz_from_row&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">'file_name':</span> <span class="pre">&lt;function</span> <span class="pre">_get_file_name_from_row&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">'file_size':</span> <span class="pre">&lt;function</span> <span class="pre">_get_file_size_from_row&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">'guid':</span> <span class="pre">&lt;function</span> <span class="pre">_get_guid_from_row&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">'md5':</span> <span class="pre">&lt;function</span> <span class="pre">_get_md5_from_row&gt;</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">'urls':</span> <span class="pre">&lt;function</span> <span class="pre">_get_urls_from_row&gt;}</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">manifest_file_delimiter=None</span></span></em>, <em class="sig-param"><span class="n"><span 
class="pre">output_filename='verify-manifest-errors-1701965244.9268556.log'</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/gen3/tools/indexing/verify_manifest.html#async_verify_object_manifest"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#gen3.tools.indexing.verify_manifest.async_verify_object_manifest" title="Link to this definition">¶</a></dt>
<dd><p>Verify all file object records into a manifest csv</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
@@ -455,6 +476,7 @@ <h3>Navigation</h3>
<li class="toctree-l4"><a class="reference internal" href="#gen3.tools.indexing.index_manifest.ThreadControl"><code class="docutils literal notranslate"><span class="pre">ThreadControl</span></code></a></li>
<li class="toctree-l4"><a class="reference internal" href="#gen3.tools.indexing.index_manifest.delete_all_guids"><code class="docutils literal notranslate"><span class="pre">delete_all_guids()</span></code></a></li>
<li class="toctree-l4"><a class="reference internal" href="#gen3.tools.indexing.index_manifest.index_object_manifest"><code class="docutils literal notranslate"><span class="pre">index_object_manifest()</span></code></a></li>
<li class="toctree-l4"><a class="reference internal" href="#gen3.tools.indexing.index_manifest.populate_object_manifest_with_valid_guids"><code class="docutils literal notranslate"><span class="pre">populate_object_manifest_with_valid_guids()</span></code></a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="#module-gen3.tools.indexing.verify_manifest">Verify</a><ul>