SearchHelper

class mdf_toolbox.SearchHelper(index, **kwargs)[source]

Utility class for performing queries using a globus_sdk.SearchClient.

Notes

Query strings may end up wrapped in parentheses, which has no direct effect on the search. Avoid using the “private” methods to modify the query string directly; the low-level logic for query string generation is less user-friendly than the public helper methods.

__init__(index, **kwargs)[source]

Create a SearchHelper object.

Parameters:

index (str) – The Globus Search index to search on.

Keyword Arguments:
 
  • search_client (globus_sdk.SearchClient) – The Globus Search client to use for searching. If not provided, one will be created and the user may be asked to log in. Default: None.
  • anonymous (bool) –

    If True, will not authenticate with Globus Auth. If False, will require authentication (either a SearchClient or a user-interactive login). Default: False.

    Caution

    Authentication is required to view non-public data in Search. An anonymous SearchHelper will only return public results.

  • app_name (str) – The application name to use. Should be changed for subclassed clients, and left alone otherwise. Only used if performing login flow. Default: "SearchHelper_Client".
  • client_id (str) – The ID of a native client to use when authenticating. Only used if performing login flow. Default: The default SearchHelper client ID.
  • q (str) – A query string to initialize the SearchHelper with. Intended for internal use.
  • advanced (bool) – The initial advanced state for this SearchHelper. Intended for internal use.

add_sort(field, ascending=True)[source]

Sort the search results by a certain field.

If this method is called multiple times, later sort fields are given lower priority and are only considered when the earlier fields have the same value.

Parameters:
  • field (str) – The field to sort by. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
  • ascending (bool) – If True, the results will be sorted in ascending order. If False, the results will be sorted in descending order. Default: True.
Returns:

Self

Return type:

SearchHelper
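
The dot-namespaced field names and the sort priority rule can be illustrated locally with plain Python. This is only a sketch of the semantics described above, not the library's code and not a call to Globus Search; get_by_path, sort_records, and the sample records (including the scroll_id field) are hypothetical names chosen for the illustration.

```python
from functools import reduce


def get_by_path(record, dotted_field):
    """Resolve a dot-namespaced field such as "mdf.source_name" in a nested dict."""
    return reduce(lambda node, key: node[key], dotted_field.split("."), record)


def sort_records(records, sort_fields):
    """Sort by (field, ascending) pairs; the earliest pair has the highest priority."""
    ordered = list(records)
    # Python's sort is stable, so sorting by the lowest-priority field first
    # and the highest-priority field last gives the described precedence.
    for field, ascending in reversed(sort_fields):
        ordered.sort(key=lambda rec: get_by_path(rec, field), reverse=not ascending)
    return ordered


# Hypothetical sample records, for illustration only.
records = [
    {"mdf": {"source_name": "b_set", "scroll_id": 2}},
    {"mdf": {"source_name": "a_set", "scroll_id": 5}},
    {"mdf": {"source_name": "a_set", "scroll_id": 1}},
]

# First by source_name ascending, then (for ties) by scroll_id descending.
result = sort_records(records, [("mdf.source_name", True), ("mdf.scroll_id", False)])
scroll_ids = [get_by_path(rec, "mdf.scroll_id") for rec in result]
```

The second field only decides the order of the two "a_set" records, mirroring how a later add_sort() call only breaks ties left by an earlier one.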

current_query()[source]

Return the current query string.

Returns:

The current query.

Return type:

str

exclude_field(field, value, new_group=False)[source]

Exclude a field:value term from the query. Matches will NOT have the value in the field.

Parameters:
  • field (str) – The field to check for the value. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
  • value (str) – The value to exclude.
  • new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns:

Self

Return type:

SearchHelper

exclude_range(field, start='*', stop='*', inclusive=True, new_group=False)[source]

Exclude a field:[some range] term from the query. Matches will not have any value in the range in the field.

Parameters:
  • field (str) – The field to check for the value. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
  • start (str or int) – The starting value, or None for no lower bound. Default: "*" (no lower bound).
  • stop (str or int) – The ending value, or None for no upper bound. Default: "*" (no upper bound).
  • inclusive (bool) – If True, the start and stop values will be excluded from the search. If False, the start and stop values will not be excluded from the search. Default: True.
  • new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns:

Self

Return type:

SearchHelper

exclusive_match(field, value)[source]

Match exactly the given value(s), with no other data in the field.

Parameters:
  • field (str) – The field to check for the value. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
  • value (str or list of str) – The value(s) to match exactly.
Returns:

Self

Return type:

SearchHelper

initialized

Whether any valid term has been added to the query.

logout()[source]

Delete Globus Auth tokens.

match_exists(field, required=True, new_group=False)[source]

Require a field to exist in the results. Matches will have some value in field.

Parameters:
  • field (str) – The field to check. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
  • required (bool) – If True, will add term with AND. If False, will use OR. Default: True.
  • new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns:

Self

Return type:

SearchHelper

match_field(field, value, required=True, new_group=False)[source]

Add a field:value term to the query. Matches will have the value in the field.

Parameters:
  • field (str) – The field to check for the value. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
  • value (str) – The value to match.
  • required (bool) – If True, will add term with AND. If False, will use OR. Default: True.
  • new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns:

Self

Return type:

SearchHelper
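
Because each builder method returns the helper itself, calls can be chained. The sketch below shows the pattern using only method names documented here. RecordingHelper is a hypothetical stand-in that records calls so the example runs without Globus credentials or network access; the field names and values are illustrative assumptions, not guaranteed to exist in any index.

```python
class RecordingHelper:
    """Hypothetical stand-in that records calls instead of contacting Globus Search."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. the builder methods.
        def method(*args, **kwargs):
            self.calls.append((name, args, kwargs))
            return self  # mirror the documented "Returns: Self" behaviour

        return method


def build_dataset_query(helper, source_name, excluded_type):
    """Chain documented builder methods on any SearchHelper-like object."""
    return (
        helper.match_field("mdf.source_name", source_name)
        .exclude_field("mdf.resource_type", excluded_type)
        .add_sort("mdf.source_name", ascending=True)
    )


helper = RecordingHelper()
returned = build_dataset_query(helper, "example_source", "dataset")
call_names = [name for name, _, _ in helper.calls]
```

With a real SearchHelper in place of the stand-in, the same function would build up the query held by that helper, which could then be inspected with current_query() or run with search().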

match_not_exists(field, new_group=False)[source]

Require a field to not exist in the results. Matches will not have field present.

Parameters:
  • field (str) – The field to check. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
  • new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns:

Self

Return type:

SearchHelper

match_range(field, start=None, stop=None, inclusive=True, required=True, new_group=False)[source]

Add a field:[some range] term to the query. Matches will have a value in the range in the field.

Parameters:
  • field (str) – The field to check for the value. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
  • start (str or int) – The starting value, or None for no lower bound. Default: None.
  • stop (str or int) – The ending value, or None for no upper bound. Default: None.
  • inclusive (bool) – If True, the start and stop values will be included in the search. If False, the start and stop values will not be included in the search. Default: True.
  • required (bool) – If True, will add term with AND. If False, will use OR. Default: True.
  • new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns:

Self

Return type:

SearchHelper
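
To make the field:[some range] notation concrete, here is a minimal sketch of a range term in the Lucene-style query-string syntax used by Elasticsearch, where square brackets include the endpoints and curly braces exclude them. The exact string SearchHelper generates is an assumption here; range_term is a hypothetical helper, not part of mdf_toolbox, and the scroll_id field is illustrative.

```python
def range_term(field, start=None, stop=None, inclusive=True):
    """Build a Lucene-style range term; None means an open bound ("*")."""
    low = "*" if start is None else str(start)
    high = "*" if stop is None else str(stop)
    # Square brackets include the endpoints; curly braces exclude them.
    opener, closer = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{opener}{low} TO {high}{closer}"


inclusive_term = range_term("mdf.scroll_id", 10, 20)
open_ended_term = range_term("mdf.scroll_id", start=10, inclusive=False)
```

Here inclusive_term is "mdf.scroll_id:[10 TO 20]" and open_ended_term is "mdf.scroll_id:{10 TO *}".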

match_term(value, required=True, new_group=False)[source]

Add a fulltext search term to the query.

Warning

Do not use this method with any other query-building helpers. This method is only for building fulltext queries (in non-advanced mode). Using other helpers, such as match_field(), will cause the query to run in advanced mode. If a fulltext term query is run in advanced mode, it will have unexpected results.

Parameters:
  • value (str) – The term to match.
  • required (bool) – If True, will add term with AND. If False, will use OR. Default: True.
  • new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns:

Self

Return type:

SearchHelper

reset_query()[source]

Destroy the current query and create a fresh one. This method should not be chained.

Returns:

None

search(q=None, advanced=False, limit=None, info=False, reset_query=True)[source]

Execute a search and return the results, up to the SEARCH_LIMIT.

Parameters:
  • q (str) – The query to execute. Default: The current helper-formed query, if any. There must be some query to execute.
  • advanced (bool) – Whether to treat q as a basic or advanced query. Has no effect if a query is not supplied in q. Default: False.
  • limit (int) – The maximum number of results to return. The max for this argument is the SEARCH_LIMIT imposed by Globus Search. Default: SEARCH_LIMIT for advanced queries, 10 for basic queries.
  • info (bool) – If False, search will return a list of the results. If True, search will return a tuple containing the results list and other information about the query. Default: False.
  • reset_query (bool) – If True, will destroy the current query after execution and start a fresh one. If False, will keep the current query set. Has no effect if a query is supplied in q. Default: True.
Returns:

If info is False, the search results. If info is True, a tuple of the search results and a dictionary of query information.

Return type:

list if info is False, tuple if info is True

Note

If a query is specified in q, the current, helper-built query (if any) will not be used in the search or modified.
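
Since the return shape of search() depends on info, calling code may want to normalize it. The following is a small sketch under that assumption; split_search_output is a hypothetical helper, and the sample record and the "example_key" entry are placeholders rather than real Globus Search output.

```python
def split_search_output(output):
    """Return (results, info) whether or not search() was called with info=True."""
    if isinstance(output, tuple):
        # info=True: a tuple of the results list and a query-information dict.
        results, info = output
        return results, info
    # info=False: just the results list, so there is no query information.
    return output, {}


sample_results = [{"mdf": {"source_name": "example_source"}}]

plain = split_search_output(sample_results)
with_info = split_search_output((sample_results, {"example_key": 1}))
```

Both calls yield a (results, info) pair, so downstream code does not need to branch on how the search was run.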

show_fields(block=None)[source]

Retrieve and return the mapping for the given metadata block.

Parameters:
  • block (str) – The top-level field to fetch the mapping for (for example, "mdf"), or the special values None for everything or "top" for just the top-level fields. Default: None.
  • index (str) – The Search index to map. Default: The current index.
Returns:

field:datatype pairs.

Return type:

dict

Subclass Helpers

class mdf_toolbox.AggregateHelper(*args, **kwargs)[source]

Subclass to add the aggregate() functionality to the SearchHelper.

aggregate() is currently the only way to retrieve more than 10,000 entries from Globus Search, and requires a scroll_field index field.

__init__(*args, **kwargs)[source]

Add the AggregateHelper to a SearchHelper.

Parameters:
  • scroll_field (str) – The field on which to scroll. This should be a field that counts/indexes the entries.

aggregate(q=None, scroll_size=10000, reset_query=True, **kwargs)[source]

Perform an advanced query, and return all matching results. Will automatically perform multiple queries in order to retrieve all results.

Note

All aggregate queries run in advanced mode, and info is not available.

Parameters:
  • q (str) – The query to execute. Default: The current helper-formed query, if any. There must be some query to execute.
  • scroll_size (int) – Maximum number of records returned per query. Must be between one and the SEARCH_LIMIT (inclusive). Default: SEARCH_LIMIT.
  • reset_query (bool) – If True, will destroy the current query after execution and start a fresh one. If False, will keep the current query set. Default: True.
Keyword Arguments:

  • scroll_field (str) – The field on which to scroll. This should be a field that counts/indexes the entries. This should be set in self.scroll_field, but if your application requires separate scroll fields for a single client, it can be set in this way as well. Default: self.scroll_field.

Returns:

All matching records.

Return type:

list of dict
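
The documentation states only that aggregate() issues multiple queries and relies on a counting scroll field. One plausible way such windowed retrieval could work is sketched below; it is not the library's implementation. It assumes the scroll field is an integer that starts at 0 and has no gaps, and scroll_all, fetch_window, and fake_index are hypothetical names standing in for real queries.

```python
def scroll_all(fetch_window, scroll_size=10000):
    """Collect every record by querying consecutive windows of a counting field."""
    if scroll_size < 1:
        raise ValueError("scroll_size must be at least 1")
    collected = []
    start = 0
    while True:
        # Each window covers scroll_size consecutive values, endpoints inclusive.
        page = fetch_window(start, start + scroll_size - 1)
        if not page:
            break  # assumption: an empty window means no records remain
        collected.extend(page)
        start += scroll_size
    return collected


# Hypothetical in-memory "index" of 25 records numbered 0..24.
fake_index = [{"scroll_id": i} for i in range(25)]


def fetch_window(low, high):
    """Stand-in for one range query on the scroll field."""
    return [rec for rec in fake_index if low <= rec["scroll_id"] <= high]


all_records = scroll_all(fetch_window, scroll_size=10)
```

Each window stays within the per-query limit, which is how a total larger than that limit could be gathered across several queries.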