SearchHelper
-
class mdf_toolbox.SearchHelper(index, **kwargs)
Utility class for performing queries using a globus_sdk.SearchClient.
Notes
Query strings may end up wrapped in parentheses, which has no direct effect on the search. It is inadvisable to use the “private” methods to modify the query string directly, as the low-level logic for query string generation is not as user-friendly.
-
__init__(index, **kwargs)
Create a SearchHelper object.
Parameters: index (str) – The Globus Search index to search on.
Keyword Arguments:
- search_client (globus_sdk.SearchClient) – The Globus Search client to use for searching. If not provided, one will be created and the user may be asked to log in. Default: None.
- anonymous (bool) – If True, will not authenticate with Globus Auth. If False, will require authentication (either a SearchClient or a user-interactive login). Default: False.
Caution
Authentication is required to view non-public data in Search. An anonymous SearchHelper will only return public results.
- app_name (str) – The application name to use. Should be changed for subclassed clients, and left alone otherwise. Only used if performing login flow. Default: "SearchHelper_Client".
- client_id (str) – The ID of a native client to use when authenticating. Only used if performing login flow. Default: The default SearchHelper client ID.
- q (str) – A query string to initialize the SearchHelper with. Intended for internal use.
- advanced (bool) – The initial advanced state for this SearchHelper. Intended for internal use.
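As a rough illustration of the keyword arguments above, the following sketch builds a helper that never prompts for a login. It assumes mdf_toolbox is installed; the function name make_public_helper and the default index name "mdf" are assumptions made for this example, so substitute your own index.

```python
def make_public_helper(index="mdf"):
    """Return a SearchHelper that never prompts for a login.

    The default index name "mdf" is an assumption for illustration only.
    """
    # Imported lazily so that loading this sketch does not itself require
    # mdf_toolbox to be installed.
    from mdf_toolbox import SearchHelper

    # anonymous=True skips Globus Auth, so only public results are visible.
    return SearchHelper(index, anonymous=True)
```

Passing an existing globus_sdk.SearchClient through search_client instead would be the route to non-public data, per the caution above.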
-
add_sort(field, ascending=True)
Sort the search results by a certain field.
If this method is called multiple times, the later sort fields are given lower priority, and will only be considered when the earlier fields have the same value.
Parameters:
- field (str) – The field to sort by. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
- ascending (bool) – If True, the results will be sorted in ascending order. If False, the results will be sorted in descending order. Default: True.
Returns: Self
Return type: SearchHelper
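The priority rule for repeated add_sort calls can be illustrated locally with plain Python. This is only a sketch of the ordering semantics, not of how Globus Search sorts internally; the helpers get_nested and apply_sorts and the field "mdf.year" are hypothetical names invented for the example.

```python
def get_nested(record, dotted_field):
    """Follow a dot-namespaced field such as "mdf.source_name" into nested dicts."""
    value = record
    for key in dotted_field.split("."):
        value = value[key]
    return value


def apply_sorts(records, sorts):
    """Order records by (field, ascending) pairs; the first pair has top priority."""
    ordered = list(records)
    # Python's sort is stable, so sorting by the lowest-priority field first
    # and the highest-priority field last gives the documented tie-breaking.
    for field, ascending in reversed(sorts):
        ordered.sort(key=lambda rec: get_nested(rec, field), reverse=not ascending)
    return ordered


records = [
    {"mdf": {"source_name": "beta", "year": 2018}},
    {"mdf": {"source_name": "alpha", "year": 2018}},
    {"mdf": {"source_name": "alpha", "year": 2020}},
]
# Similar in spirit to add_sort("mdf.source_name").add_sort("mdf.year", ascending=False)
ordered = apply_sorts(records, [("mdf.source_name", True), ("mdf.year", False)])
summary = [(r["mdf"]["source_name"], r["mdf"]["year"]) for r in ordered]
```

Here the year only decides the order between the two "alpha" records, because the source name is compared first.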
-
current_query()
Return the current query string.
Returns: The current query.
Return type: str
-
exclude_field(field, value, new_group=False)
Exclude a field:value term from the query. Matches will NOT have the value in the field.
Parameters:
- field (str) – The field to check for the value. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
- value (str) – The value to exclude.
- new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns: Self
Return type: SearchHelper
-
exclude_range(field, start='*', stop='*', inclusive=True, new_group=False)
Exclude a field:[some range] term from the query. Matches will not have any value in the range in the field.
Parameters:
- field (str) – The field to check for the value. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
- start (str or int) – The starting value, or None for no lower bound. Default: '*' (unbounded), per the signature.
- stop (str or int) – The ending value, or None for no upper bound. Default: '*' (unbounded), per the signature.
- inclusive (bool) – If True, the start and stop values will be excluded from the search. If False, the start and stop values will not be excluded from the search. Default: True.
- new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns: Self
Return type: SearchHelper
-
exclusive_match(field, value)
Match exactly the given value(s), with no other data in the field.
Parameters:
- field (str) – The field to check for the value. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
- value (str or list of str) – The value(s) to match exactly.
Returns: Self
Return type: SearchHelper
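One reading of "no other data in the field" is that a record matches only when its field holds exactly the requested value(s). The predicate below illustrates that reading on local data; it is not library code, and the function name is_exclusive_match and the element symbols are hypothetical.

```python
def is_exclusive_match(field_values, wanted):
    """Return True when the field holds exactly the wanted value(s) and nothing else."""
    # Mirror the documented "str or list of str" argument type.
    if isinstance(wanted, str):
        wanted = [wanted]
    # Order does not matter; any extra or missing value breaks the match.
    return set(field_values) == set(wanted)


same_values = is_exclusive_match(["Al", "Cu"], ["Cu", "Al"])
extra_value = is_exclusive_match(["Al", "Cu", "O"], ["Al", "Cu"])
single_value = is_exclusive_match(["Al"], "Al")
```

Under this reading, a plain match_field on "Al" would accept all three records, whereas an exclusive match on "Al" alone would accept only the last.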
-
initialized
Whether any valid term has been added to the query.
-
match_exists(field, required=True, new_group=False)
Require a field to exist in the results. Matches will have some value in field.
Parameters:
- field (str) – The field to check. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
- required (bool) – If True, will add term with AND. If False, will use OR. Default: True.
- new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns: Self
Return type: SearchHelper
-
match_field(field, value, required=True, new_group=False)
Add a field:value term to the query. Matches will have the value in the field.
Parameters:
- field (str) – The field to check for the value. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
- value (str) – The value to match.
- required (bool) – If True, will add term with AND. If False, will use OR. Default: True.
- new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns: Self
Return type: SearchHelper
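To make the roles of required and new_group concrete, here is a toy builder that joins field:value terms with AND, OR and NOT and opens a new parenthetical group on request. It is a sketch under assumptions, not the mdf_toolbox implementation: the class TinyQuery, the field "mdf.elements" and the values are hypothetical, and the strings the real helper emits may differ.

```python
class TinyQuery:
    """Toy query builder for illustration; NOT the mdf_toolbox implementation."""

    def __init__(self):
        self._q = ""  # text accumulated so far, without the outer parentheses

    def _add(self, term, operator, new_group):
        if not self._q:
            self._q = term
        elif new_group:
            # Close the current group and open a new one holding this term.
            self._q += ") " + operator + " (" + term
        else:
            self._q += " " + operator + " " + term
        return self  # returning self is what allows method chaining

    def match_field(self, field, value, required=True, new_group=False):
        operator = "AND" if required else "OR"
        return self._add(field + ":" + value, operator, new_group)

    def exclude_field(self, field, value, new_group=False):
        return self._add("NOT " + field + ":" + value, "AND", new_group)

    def current_query(self):
        # The whole query ends up wrapped in parentheses.
        return "(" + self._q + ")" if self._q else ""


query = (
    TinyQuery()
    .match_field("mdf.source_name", "oqmd")
    .match_field("mdf.elements", "Al", required=False)
    .exclude_field("mdf.elements", "Cu", new_group=True)
    .current_query()
)
```

Because every builder method returns the object itself, terms can be chained in one expression, which is the pattern the "Returns: Self" entries describe.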
-
match_not_exists(field, new_group=False)
Require a field to not exist in the results. Matches will not have field present.
Parameters:
- field (str) – The field to check. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
- new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns: Self
Return type: SearchHelper
-
match_range(field, start=None, stop=None, inclusive=True, required=True, new_group=False)
Add a field:[some range] term to the query. Matches will have a value in the range in the field.
Parameters:
- field (str) – The field to check for the value. The field must be namespaced according to Elasticsearch rules using the dot syntax. For example, "mdf.source_name" is the source_name field of the mdf dictionary.
- start (str or int) – The starting value, or None for no lower bound. Default: None.
- stop (str or int) – The ending value, or None for no upper bound. Default: None.
- inclusive (bool) – If True, the start and stop values will be included in the search. If False, the start and stop values will not be included in the search. Default: True.
- required (bool) – If True, will add term with AND. If False, will use OR. Default: True.
- new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns: Self
Return type: SearchHelper
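The start, stop and inclusive arguments map naturally onto Lucene-style range syntax, in which square brackets include the endpoints, curly braces exclude them, and * leaves a bound open. The function below sketches that mapping; that mdf_toolbox produces exactly these strings is an assumption, and range_term and the field "mdf.year" are hypothetical names.

```python
def range_term(field, start=None, stop=None, inclusive=True):
    """Build a Lucene-style range term such as field:[1 TO 5]."""
    # None means "no bound", written as * in the range syntax.
    low = "*" if start is None else str(start)
    high = "*" if stop is None else str(stop)
    if inclusive:
        return f"{field}:[{low} TO {high}]"
    # Doubled braces in an f-string produce literal { and } characters.
    return f"{field}:{{{low} TO {high}}}"


open_ended = range_term("mdf.year", 2000)
exclusive = range_term("mdf.year", 2000, 2010, inclusive=False)
```

The same term, prefixed with a negation, would correspond to what exclude_range describes.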
-
match_term(value, required=True, new_group=False)
Add a fulltext search term to the query.
Warning
Do not use this method with any other query-building helpers. This method is only for building fulltext queries (in non-advanced mode). Using other helpers, such as match_field(), will cause the query to run in advanced mode. If a fulltext term query is run in advanced mode, it will have unexpected results.
Parameters:
- value (str) – The term to match.
- required (bool) – If True, will add term with AND. If False, will use OR. Default: True.
- new_group (bool) – If True, will separate the term into a new parenthetical group. If False, will not. Default: False.
Returns: Self
Return type: SearchHelper
-
reset_query()
Destroy the current query and create a fresh one. This method should not be chained.
Returns: None
-
search(q=None, advanced=False, limit=None, info=False, reset_query=True)
Execute a search and return the results, up to the SEARCH_LIMIT.
Parameters:
- q (str) – The query to execute. Default: The current helper-formed query, if any. There must be some query to execute.
- advanced (bool) – If True, q is treated as an advanced query; if False, as a basic query. Has no effect if a query is not supplied in q. Default: False.
- limit (int) – The maximum number of results to return. The max for this argument is the SEARCH_LIMIT imposed by Globus Search. Default: SEARCH_LIMIT for advanced queries, 10 for basic queries.
- info (bool) – If False, search will return a list of the results. If True, search will return a tuple containing the results list and other information about the query. Default: False.
- reset_query (bool) – If True, will destroy the current query after execution and start a fresh one. If False, will keep the current query set. Has no effect if a query is supplied in q. Default: True.
Returns: If info is False, list: The search results. If info is True, tuple: The search results, and a dictionary of query information.
Note
If a query is specified in q, the current, helper-built query (if any) will not be used in the search or modified.
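Because info changes the shape of the return value, calling code has to unpack accordingly. The sketch below shows one way to do that. FakeHelper is a hypothetical offline stand-in that copies the documented search() signature so the example runs without Globus access; its canned record and the "query" key in its info dictionary are invented, since the real dictionary's contents are not listed here.

```python
def search_with_count(helper, query):
    """Run an advanced query and return (results, info, number_of_results)."""
    # info=True switches the return value from a list to a (list, dict) tuple.
    results, info = helper.search(query, advanced=True, info=True)
    return results, info, len(results)


class FakeHelper:
    """Offline stand-in with the documented search() signature (hypothetical)."""

    def search(self, q=None, advanced=False, limit=None, info=False, reset_query=True):
        results = [{"mdf": {"source_name": "example"}}]  # canned record
        if info:
            # The keys of the real info dictionary are not shown in this reference.
            return results, {"query": q}
        return results


results, info, count = search_with_count(FakeHelper(), "mdf.source_name:example")
plain = FakeHelper().search("mdf.source_name:example")
```

With a real helper in place of FakeHelper, a query passed in q leaves any helper-built query untouched, as the note above states.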
-
show_fields(block=None)
Retrieve and return the mapping for the given metadata block.
Parameters:
- block (str) – The top-level field to fetch the mapping for (for example, "mdf"), or the special values None for everything or "top" for just the top-level fields. Default: None.
- index (str) – The Search index to map. Default: The current index.
Returns: field:datatype pairs.
Return type: dict
-
Subclass Helpers
-
class mdf_toolbox.AggregateHelper(*args, **kwargs)
Subclass to add the aggregate() functionality to the SearchHelper. aggregate() is currently the only way to retrieve more than 10,000 entries from Globus Search, and requires a scroll_field index field.
-
__init__(*args, **kwargs)
Add the AggregateHelper to a SearchHelper.
Parameters: scroll_field (str) – The field on which to scroll. This should be a field that counts/indexes the entries.
-
aggregate(q=None, scroll_size=10000, reset_query=True, **kwargs)
Perform an advanced query, and return all matching results. Will automatically perform multiple queries in order to retrieve all results.
Note
All aggregate queries run in advanced mode, and info is not available.
Parameters:
- q (str) – The query to execute. Default: The current helper-formed query, if any. There must be some query to execute.
- scroll_size (int) – Maximum number of records returned per query. Must be between one and the SEARCH_LIMIT (inclusive). Default: SEARCH_LIMIT.
- reset_query (bool) – If True, will destroy the current query after execution and start a fresh one. If False, will keep the current query set. Default: True.
Keyword Arguments: scroll_field (str) – The field on which to scroll. This should be a field that counts/indexes the entries. This should be set in self.scroll_field, but if your application requires separate scroll fields for a single client, it can be set in this way as well. Default: self.scroll_field.
Returns: All matching records.
Return type: list of dict
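The idea of issuing multiple queries over a counting field can be sketched as a loop over consecutive windows of that field. This is a conceptual illustration only and may not match how aggregate() is implemented. It assumes the counter runs contiguously from 0 to total - 1; fetch_all, fake_window and the "scroll_id" field are hypothetical names.

```python
def fetch_all(search_window, total, scroll_size):
    """Collect records by querying consecutive windows of a counter field.

    Assumes the counter values run contiguously from 0 to total - 1.
    """
    if scroll_size < 1:
        raise ValueError("scroll_size must be at least 1")
    collected = []
    for start in range(0, total, scroll_size):
        # Each call stands for one search restricted to
        # start <= counter < start + scroll_size.
        collected.extend(search_window(start, start + scroll_size))
    return collected


# Offline stand-in data: 25 records whose hypothetical "scroll_id" runs 0..24.
DATA = [{"scroll_id": i} for i in range(25)]


def fake_window(low, high):
    """Pretend search: return the records whose scroll_id falls in [low, high)."""
    return [rec for rec in DATA if low <= rec["scroll_id"] < high]


everything = fetch_all(fake_window, total=len(DATA), scroll_size=10)
```

Each window stays at or below the per-query limit, which is how a windowed approach can return more records in total than a single search allows.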
-