{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import mdf_toolbox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Globus Search Utilities\n", "The MDF Toolbox provides a few utilities to make integrating with Globus Search easier." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# format_gmeta\n", "`format_gmeta()` takes a dictionary of data you want to change into the Globus Search GMeta format and returns the `GMetaEntry` of that dictionary. It is required to provide the `acl` (Access Control List, or `[\"public\"]` for public data) and `identifier` (unique ID for this entry, or an existing ID to overwrite).\n", "\n", "To make a `GIngest` (the final form of Globus Search ingests), provide a list of `GMetaEntry` objects." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "my_data = {\n", " \"foo\": \"bar\",\n", " \"baz\": [1, 2, 3, 4]\n", "}\n", "gmeta_entry = mdf_toolbox.format_gmeta(my_data,\n", " acl=[\"public\"],\n", " identifier=\"abc123\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'@datatype': 'GMetaEntry',\n", " '@version': '2016-11-09',\n", " 'content': {'baz': [1, 2, 3, 4], 'foo': 'bar'},\n", " 'subject': 'abc123',\n", " 'visible_to': ['public']}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gmeta_entry" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "list_of_gmeta_entry = [gmeta_entry]\n", "g_ingest = mdf_toolbox.format_gmeta(list_of_gmeta_entry)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'@datatype': 'GIngest',\n", " '@version': '2016-11-09',\n", " 'ingest_data': {'@datatype': 'GMetaList',\n", " '@version': '2016-11-09',\n", " 'gmeta': [{'@datatype': 'GMetaEntry',\n", " '@version': '2016-11-09',\n", " 'content': {'baz': [1, 2, 3, 4], 'foo': 'bar'},\n", " 'subject': 'abc123',\n", " 'visible_to': ['public']}]},\n", " 'ingest_type': 'GMetaList'}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g_ingest" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# globus_sdk.SearchClient.ingest(index, g_ingest)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## gmeta_pop\n", "`gmeta_pop()` takes the results from a Globus Search query and unwraps them from the GMeta format. You can pass in a `GlobusHTTPResponse` from the `SearchClient`, a JSON-dumped string, or a dictionary." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "sample_search_result = { \n", " '@datatype': 'GSearchResult',\n", " '@version': '2016-11-09',\n", " 'count': 11, \n", " 'gmeta': [{\n", " '@datatype': 'GMetaResult',\n", " '@version': '2016-11-09',\n", " 'content': [{\n", " \"foo\": \"bar\",\n", " \"baz\": [1, 2, 3, 4, 5]\n", " }, {\n", " \"food\": \"bard\",\n", " \"bazd\": [\"d\"]\n", " }],\n", " 'subject': \"http://example.com/abc123\",\n", " }],\n", " 'offset': 0,\n", " 'total': 22\n", "}" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'baz': [1, 2, 3, 4, 5], 'foo': 'bar'}, {'bazd': ['d'], 'food': 'bard'}]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdf_toolbox.gmeta_pop(sample_search_result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want the metadata associated with your query (total number of query matches), you can use `info=True` to get a tuple of (results, metadata)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "([{'baz': [1, 2, 3, 4, 5], 'foo': 'bar'}, {'bazd': ['d'], 'food': 'bard'}],\n", " {'total_query_matches': 22})" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdf_toolbox.gmeta_pop(sample_search_result, info=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## translate_index\n", "Globus Search requires or strongly encourages users to query using an index's UUID instead of the index's name. `translate_index()` takes the index name and returns the UUID (if found, otherwise it returns the input back)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'1a57bbe5-5272-477f-9d31-343b8258b7a5'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdf_toolbox.translate_index(\"mdf\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }