Download Latest GES-C01 Dumps Questions 2026 for Preparation
Enjoy 20% OFF on All Exams – Use Code: 2025
Boost Your Success with Updated & Verified Exam Dumps from CertSpots.com
https://www.certspots.com/exam/ges-c01/
© 2026 CertSpots.com – All Rights Reserved

Exam: GES-C01
Title: SnowPro® Specialty: Gen AI Certification Exam
Version: V8.02

1. A data application developer is tasked with building a multi-turn conversational AI application using Streamlit in Snowflake (SiS) that leverages the COMPLETE (SNOWFLAKE.CORTEX) LLM function. To ensure the conversation flows naturally and the LLM maintains context from previous interactions, which of the following is the most appropriate method for handling and passing the conversation history?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C

2. A Streamlit application developer wants to use AI_COMPLETE (the latest version of COMPLETE (SNOWFLAKE.CORTEX)) to process customer feedback. The goal is to extract structured information, such as the customer's sentiment, product mentioned, and any specific issues, into a predictable JSON format for immediate database ingestion. Which configuration of the AI_COMPLETE function call is essential for achieving this structured output requirement?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
AI_COMPLETE Structured Outputs (and its predecessor, COMPLETE Structured Outputs) specifically allows supplying a JSON schema as the 'response_format' argument to ensure completion responses follow a predefined structure. This significantly reduces the need for post-processing in AI data pipelines and enables seamless integration with systems requiring deterministic responses. The JSON schema object defines the structure, data types, and constraints, including required fields. While prompting the model to "Respond in JSON" can improve accuracy for complex tasks, the 'response_format' argument is the direct mechanism for enforcing the schema. Setting 'temperature' to 0 provides more consistent results for structured output tasks.
Option A is a form of prompt engineering, which can help but does not guarantee strict adherence as 'response_format' does. Option B controls randomness and length, not output structure. Option D is less efficient for extracting multiple related fields compared to a single structured output call. Option E's 'guardrails' option is for filtering unsafe or harmful content, not for enforcing output format.
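As a concrete illustration of the 'response_format' mechanism described above, the following is a minimal sketch of a structured-output call. The table and column names ('customer_feedback', 'feedback_text') and the model name are placeholders, and the exact AI_COMPLETE argument syntax should be checked against the current Snowflake documentation.

    -- Hypothetical example: extract sentiment, product, and issue as a JSON object
    -- whose shape is enforced by the supplied JSON schema.
    SELECT AI_COMPLETE(
        model  => 'mistral-large2',
        prompt => 'Extract the sentiment, the product mentioned, and the specific issue '
                  || 'from this customer feedback: ' || feedback_text,
        model_parameters => {'temperature': 0},   -- temperature 0 for more consistent structured output
        response_format  => {
            'type': 'json',
            'schema': {
                'type': 'object',
                'properties': {
                    'sentiment': {'type': 'string'},
                    'product':   {'type': 'string'},
                    'issue':     {'type': 'string'}
                },
                'required': ['sentiment', 'product', 'issue']
            }
        }
    ) AS structured_feedback
    FROM customer_feedback;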
3. A Snowflake developer, AI_ENGINEER, is creating a Streamlit in Snowflake (SiS) application that will utilize a range of Snowflake Cortex LLM functions, including SNOWFLAKE.CORTEX.COMPLETE, SNOWFLAKE.CORTEX.CLASSIFY_TEXT, and SNOWFLAKE.CORTEX.EMBED_TEXT_768. The application also needs to access data from tables within a specific database and schema. AI_ENGINEER has created a custom role, app_dev_role, for the application to operate under. Which of the following privileges or roles are absolutely necessary to grant to app_dev_role for the successful execution of these Cortex LLM functions and interaction with the specified database objects? (Select all that apply.)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A,C
Explanation:
To execute Snowflake Cortex AI functions such as SNOWFLAKE.CORTEX.COMPLETE, SNOWFLAKE.CORTEX.CLASSIFY_TEXT, and SNOWFLAKE.CORTEX.EMBED_TEXT_768 (or their AI_-prefixed counterparts), the role used by the application (app_dev_role in this case) must be granted the SNOWFLAKE.CORTEX_USER database role. Additionally, for the Streamlit application to access any database or schema objects (like tables for data input/output, or for the Streamlit app itself if it is stored as a database object), the USAGE privilege must be granted on those specific database and schema objects.
Option B, CREATE SNOWFLAKE.ML.DOCUMENT_INTELLIGENCE, is a privilege specific to creating Document AI model builds and is not required for general Cortex LLM functions. Option D, ACCOUNTADMIN, grants excessive privileges and is not a best practice for application roles. Option E, CREATE COMPUTE POOL, is a privilege related to Snowpark Container Services for creating compute pools, which is not directly required for running a Streamlit in Snowflake application that consumes Cortex LLM functions.

4. A data application developer is tasked with building a multi-turn conversational AI application using Streamlit in Snowflake (SiS) that leverages the COMPLETE (SNOWFLAKE.CORTEX) LLM function. To ensure the conversation flows naturally and the LLM maintains context from previous interactions, which of the following is the most appropriate method for handling and passing the conversation history?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
To provide a stateful, conversational experience with the COMPLETE (SNOWFLAKE.CORTEX) function (or its latest version, AI_COMPLETE), all previous user prompts and model responses must be explicitly passed as part of the 'prompt_or_history' argument. This argument expects an array of objects, where each object represents a turn and contains a 'role' ('system', 'user', or 'assistant') and a 'content' key, presented in chronological order. In Streamlit, 'st.session_state' is the standard and recommended mechanism for storing and managing data across reruns of the application, making it ideal for maintaining chat history: initialize 'st.session_state.messages = []' and append messages to it.
Option A is incorrect because COMPLETE does not inherently manage history from external tables. Option B is incorrect because COMPLETE does not retain state between calls; history must be explicitly managed. Option D is a less effective form of prompt engineering compared to passing structured history, as it loses the semantic role distinction and can be less accurate for LLMs. Option E describes a non-existent parameter for the COMPLETE function.
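To make the 'prompt_or_history' mechanism concrete, here is a minimal sketch of a multi-turn call from SQL. The model name and conversation content are placeholders; in a Streamlit in Snowflake app, the array would typically be rebuilt on each rerun from the messages stored in st.session_state, as described above.

    -- Hypothetical multi-turn call: prior turns are passed as an array of
    -- {'role', 'content'} objects in chronological order, followed by options.
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large2',
        [
            {'role': 'system',    'content': 'You are a concise Snowflake support assistant.'},
            {'role': 'user',      'content': 'How do I create a stream on a table?'},
            {'role': 'assistant', 'content': 'Use CREATE STREAM my_stream ON TABLE my_table;'},
            {'role': 'user',      'content': 'And how do I consume it from a task?'}
        ],
        {'temperature': 0.4, 'max_tokens': 300}
    ) AS assistant_reply;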
5. A Streamlit application developer wants to use AI_COMPLETE (the latest version of COMPLETE (SNOWFLAKE.CORTEX)) to process customer feedback. The goal is to extract structured information, such as the customer's sentiment, product mentioned, and any specific issues, into a predictable JSON format for immediate database ingestion. Which configuration of the AI_COMPLETE function call is essential for achieving this structured output requirement?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
AI_COMPLETE Structured Outputs (and its predecessor, COMPLETE Structured Outputs) specifically allows supplying a JSON schema as the 'response_format' argument to ensure completion responses follow a predefined structure. This significantly reduces the need for post-processing in AI data pipelines and enables seamless integration with systems requiring deterministic responses. The JSON schema object defines the structure, data types, and constraints, including required fields. For complex tasks, prompting the model to respond in JSON can improve accuracy, but the 'response_format' argument is the direct mechanism for enforcing the schema. Setting 'temperature' to 0 provides more consistent results for structured output tasks.
Option A is a form of prompt engineering, which can help but does not guarantee strict adherence as 'response_format' does. Option B controls randomness and length, not output structure. Option D: while AI_EXTRACT (or EXTRACT_ANSWER) can extract information, using it multiple times and then manually combining results is less efficient and less robust than a single AI_COMPLETE call with a structured output schema for multiple related fields. Option E's 'guardrails' option is for filtering unsafe or harmful content, not for enforcing output format.

6.
A.
B.
C. The USAGE privilege on the specific database and schema where the Streamlit application and its underlying data tables are located.
D. The ACCOUNTADMIN role to ensure unrestricted access to all Snowflake Cortex features.
E. The CREATE COMPUTE POOL privilege to provision resources for the Streamlit application.
Answer: A,C
Explanation:
To execute Snowflake Cortex AI functions such as SNOWFLAKE.CORTEX.COMPLETE, CLASSIFY_TEXT (SNOWFLAKE.CORTEX), and EMBED_TEXT_768 (SNOWFLAKE.CORTEX) (or their AI_-prefixed counterparts like AI_COMPLETE, AI_CLASSIFY, AI_EMBED), the role used by the application must be granted the SNOWFLAKE.CORTEX_USER database role. This role includes the privileges to call these functions. Additionally, for the Streamlit application to access any database or schema objects (like tables for data input/output, or for the Streamlit app itself if it is stored as a database object), the USAGE privilege must be granted on those specific database and schema objects.
Option B, CREATE SNOWFLAKE.ML.DOCUMENT_INTELLIGENCE, is a privilege specific to creating Document AI model builds and is not required for general Cortex LLM functions. Option D, ACCOUNTADMIN, grants excessive privileges and is not a best practice for application roles. Option E, CREATE COMPUTE POOL, is a privilege related to Snowpark Container Services for creating compute pools, which is generally not required for running a Streamlit in Snowflake application that consumes Cortex LLM functions via SQL, unless the LLMs themselves were deployed as services on compute pools using Model Serving in Snowpark Container Services, which is not stated as the method of LLM usage here.
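The two grants that the correct options describe can be sketched as follows. The database, schema, and role names are hypothetical placeholders; the extra SELECT grant is shown only because an application that reads tables will typically also need it.

    -- Allow the application role to call Cortex LLM functions.
    GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE app_dev_role;

    -- Allow the application role to resolve the database and schema that hold
    -- the Streamlit app and its data (hypothetical names).
    GRANT USAGE ON DATABASE app_db TO ROLE app_dev_role;
    GRANT USAGE ON SCHEMA app_db.app_schema TO ROLE app_dev_role;

    -- Typically also needed if the app reads from tables in that schema.
    GRANT SELECT ON ALL TABLES IN SCHEMA app_db.app_schema TO ROLE app_dev_role;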
7. A data engineer is building a Snowflake data pipeline to ingest customer reviews from a raw staging table into a processed table. For each review, they need to determine the overall sentiment (positive, neutral, negative) and store this as a distinct column. The pipeline is implemented using SQL with streams and tasks to process new data. Which Snowflake Cortex LLM function, when integrated into the SQL task, is best suited for this sentiment classification and ensures a structured, single-label output for each review?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B
Explanation:
To classify text into predefined categories, the CLASSIFY_TEXT function (or its updated version, AI_CLASSIFY) is purpose-built and directly returns the classification label. This approach is more direct and efficient than using SENTIMENT(), which returns a score; EXTRACT_ANSWER(), which extracts an answer to a question; or multiple calls to functions that return Boolean values. While COMPLETE() could be prompted for classification, CLASSIFY_TEXT is the more specific, task-specific function designed for this exact use case within the Cortex LLM functions.

8. A financial services company is developing an automated data pipeline in Snowflake to process Federal Reserve Meeting Minutes, which are initially loaded as PDF documents. The pipeline needs to extract specific entities like the FED's stance on interest rates ('hawkish', 'dovish', or 'neutral') and the reasoning behind it, storing these as structured JSON objects within a Snowflake table. The goal is to ensure the output is always a valid JSON object with predefined keys. Which AI_COMPLETE configuration, used within an in-line SQL statement in a task, is most effective for achieving this structured extraction directly in the pipeline?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
To ensure that LLM responses adhere to a predefined JSON structure, the AI_COMPLETE function's 'response_format' argument, which accepts a JSON schema, is the most effective and direct method. This mechanism enforces the structure, data types, and required fields, significantly reducing the need for post-processing and ensuring deterministic, high-quality output. The AI-Infused Data Pipelines with Snowflake Cortex blog highlights asking the LLM to create a JSON object for maximizing utility. While setting 'temperature' to 0 can improve consistency, it does not enforce a specific schema. Prompt engineering (Option A) can help but does not guarantee strict adherence. Using multiple extraction calls (Option D) is less efficient and robust for extracting multiple related fields than a single AI_COMPLETE call with a structured output schema. Snowflake Cortex does not automatically infer and enforce a JSON schema without explicit configuration (Option E).
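A pipeline-flavored sketch of this pattern is shown below: an in-line AI_COMPLETE call with a 'response_format' schema inside a scheduled task that drains a stream. The stream, table, column, warehouse, and model names are hypothetical, and the exact AI_COMPLETE argument syntax should be verified against the Snowflake documentation.

    -- Hypothetical task: for each newly ingested minutes document, extract the
    -- stance and reasoning as a schema-enforced JSON object.
    CREATE OR REPLACE TASK extract_fed_stance_task
      WAREHOUSE = pipeline_wh
      SCHEDULE  = '60 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('minutes_stream')
    AS
    INSERT INTO fed_minutes_structured (doc_id, stance_json)
    SELECT doc_id,
           AI_COMPLETE(
               model  => 'mistral-large2',
               prompt => 'State the FED''s stance on interest rates (hawkish, dovish, or neutral) '
                         || 'and the reasoning behind it, based on: ' || minutes_text,
               model_parameters => {'temperature': 0},
               response_format  => {
                   'type': 'json',
                   'schema': {'type': 'object',
                              'properties': {'stance':    {'type': 'string'},
                                             'reasoning': {'type': 'string'}},
                              'required': ['stance', 'reasoning']}
               })
    FROM minutes_stream
    WHERE METADATA$ACTION = 'INSERT';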
9. A data engineering team is building a pipeline in Snowflake that uses a SQL task to call various Snowflake Cortex LLM functions (e.g., AI_COMPLETE, AI_EMBED) on large datasets of customer interaction logs. The team observes fluctuating costs and occasional query failures, which sometimes halt the pipeline. To address these issues and ensure an efficient, robust, and monitorable pipeline, which of the following actions or considerations are essential? (Select all that apply.)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A,B,E
Explanation:
A. Correct. The TRY_COMPLETE function is designed to perform the same operation as COMPLETE but returns NULL instead of raising an error when the LLM operation cannot be performed. This is critical for building robust data pipelines, as it prevents pipeline halts due to transient or specific LLM failures, allowing for more resilient data processing.
B. Correct. The CORTEX_FUNCTIONS_USAGE_HISTORY view provides detailed information on token consumption and credit usage for Snowflake Cortex LLM functions. Monitoring this view is essential for understanding cost drivers and optimizing expenditure within AI pipelines.
C. Incorrect. Snowflake recommends executing queries that call Cortex AISQL functions with a smaller warehouse (no larger than MEDIUM), as larger warehouses do not necessarily increase performance but can lead to unnecessary costs. The LLM inference itself runs on Snowflake-managed compute, not solely on the user's virtual warehouse compute size.
D. Incorrect. Setting the 'temperature' parameter to 0 makes the LLM's output more deterministic and focused. While this can be beneficial for consistency in certain tasks, it does not directly minimize token usage. Token usage is primarily determined by the length of the input prompt and the length of the generated output, which can vary regardless of 'temperature'.
E. Correct. Encapsulating complex and potentially lengthy prompt logic within a UDF (USER DEFINED FUNCTION) makes the prompts more manageable, reusable, and easier to integrate programmatically into SQL statements within a data pipeline. This improves code organization and maintainability.

10. A data engineering team is setting up an automated pipeline in Snowflake to process call center transcripts. These transcripts, once loaded into a raw table, need to be enriched by extracting specific entities like the customer's name, the primary issue reported, and the proposed resolution. The extracted data must be stored in a structured JSON format in a processed table. The pipeline leverages a SQL task that processes new records from a stream. Which of the following SQL snippets and approaches, utilizing Snowflake Cortex LLM functions, would most effectively extract this information and guarantee a structured JSON output for each transcript?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
To guarantee a structured JSON output for entity extraction, AI_COMPLETE() (the updated version of COMPLETE()) with the 'response_format' argument and a specified JSON schema is the most effective approach. This mechanism enforces that the LLM's output strictly conforms to the predefined structure, including data types and required fields, significantly reducing the need for post-processing and improving data quality within the pipeline.
Option A requires multiple calls and manual JSON assembly, which is less efficient. Option B relies on the LLM's "natural ability" to generate JSON, which might not be consistently structured without an explicit 'response_format'. Option D uses SUMMARIZE(), which is for generating summaries, not structured entity extraction. Option E involves external LLM API calls and Python UDFs, which, while possible, is less direct than using native AI_COMPLETE structured outputs within a SQL pipeline in Snowflake Cortex for this specific goal.
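The robustness and cost-monitoring points raised in question 9 (and repeated in question 11 below) can be illustrated with the sketch below. The table and column names are hypothetical, and the column names of the ACCOUNT_USAGE view are quoted from memory, so they should be confirmed against the Snowflake documentation.

    -- Robustness: TRY_COMPLETE returns NULL on failure instead of raising an
    -- error, so a single bad row does not halt the task.
    SELECT log_id,
           SNOWFLAKE.CORTEX.TRY_COMPLETE('mistral-large2',
                                         'Summarize this customer interaction: ' || interaction_text) AS summary
    FROM customer_interaction_logs;

    -- Cost monitoring: aggregate token and credit consumption per function and model.
    SELECT function_name,
           model_name,
           SUM(tokens)        AS total_tokens,
           SUM(token_credits) AS total_credits
    FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY
    GROUP BY function_name, model_name
    ORDER BY total_credits DESC;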
11. A data team has implemented a Snowflake data pipeline using SQL tasks that process customer call transcripts daily. This pipeline relies heavily on SNOWFLAKE.CORTEX.COMPLETE() (or its updated alias) for various text analysis tasks, such as sentiment analysis and summary generation. Over time, they observe that the pipeline occasionally fails due to LLM-related errors, and the compute costs are higher than anticipated. What actions should the team take to improve the robustness and cost-efficiency of this data pipeline? (Select all that apply.)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A,C,D
Explanation:
A. Correct. TRY_COMPLETE() performs the same operation as COMPLETE() but returns NULL on failure instead of raising an error when the LLM operation cannot be performed. This is critical for building robust data pipelines, as it prevents pipeline halts due to transient or specific LLM failures.
B. Incorrect. Snowflake recommends executing queries that call Cortex AISQL functions with a smaller warehouse (no larger than MEDIUM). Larger warehouses do not necessarily increase performance but can lead to unnecessary costs, as the LLM inference itself runs on Snowflake-managed compute, not solely on the user's virtual warehouse compute size.
C. Correct. The SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY view provides detailed information on token consumption and credit usage for Snowflake Cortex LLM functions. Monitoring this view is essential for understanding cost drivers and optimizing expenditure within AI pipelines.
D. Correct. Snowflake Cortex AI functions incur compute costs based on the number of tokens processed (both input and output). Optimizing prompt engineering to be concise and effective directly contributes to reducing the number of tokens consumed and, therefore, the associated costs.
E. Incorrect. Setting the 'temperature' parameter to 1.0 makes the LLM's output more diverse and random. While useful for creativity, it does not guarantee a reduction in token usage or a lower error rate. For the most consistent results, setting 'temperature' to 0 is generally recommended.

12. A financial institution wants to develop a Snowflake-based pipeline to process call transcripts from their customer support. The pipeline needs to perform two main tasks: first, summarize very lengthy technical support calls (up to 20,000 tokens per transcript) into concise, actionable insights, and second, classify the sentiment of these calls as 'positive', 'neutral', or 'negative'. Given these requirements for integration into SQL data pipelines, which combination of Snowflake Cortex functions and prompt engineering considerations would be most appropriate?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B
Explanation:
For summarizing very lengthy technical support calls (up to 20,000 tokens), a model with a sufficiently large context window is essential. AI_COMPLETE (the updated version of COMPLETE) offers flexibility for detailed summarization with prompt engineering. A model like mistral-large2 has a context window of 128,000 tokens, making it suitable for such long inputs. Encapsulating complex prompt logic within a SQL User Defined Function (UDF) is a recommended practice for better management and reusability in data pipelines. For classifying sentiment into predefined categories ('positive', 'neutral', 'negative'), AI_CLASSIFY (the updated version of CLASSIFY_TEXT) is purpose-built and directly returns the classification label.
A. SUMMARIZE() is a generic summarization function, but AI_COMPLETE with a large model provides more control for "actionable insights". SENTIMENT() returns a numerical score, requiring additional logic for categorical output.
C. SNOWFLAKE.CORTEX.EXTRACT_ANSWER() is designed to extract specific answers to questions, not to summarize text. Using it multiple times for summarization would be inefficient and less effective. While COMPLETE() can perform classification, AI_CLASSIFY is the specialized function for this task.
D. gemma-7b has a context window of 8,000 tokens, which is insufficient for processing calls up to 20,000 tokens, potentially leading to truncation or incomplete results.
E. AI_AGG() and SUMMARIZE_AGG() are designed to aggregate insights or summaries across multiple rows or groups of text, not to summarize a single, lengthy document. A function that returns a Boolean result is less suitable for multi-category classification directly.
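The UDF-plus-large-context-model pattern endorsed in question 12 might look roughly like the sketch below. The function, table, and column names are hypothetical; the sentiment step is shown with CLASSIFY_TEXT (the older name of AI_CLASSIFY).

    -- Hypothetical SQL UDF that encapsulates the summarization prompt logic.
    CREATE OR REPLACE FUNCTION build_call_summary_prompt(transcript STRING)
    RETURNS STRING
    AS
    $$
        'Summarize the following technical support call into concise, actionable insights: ' || transcript
    $$;

    -- Summarize with a large-context model and classify sentiment per transcript.
    SELECT call_id,
           SNOWFLAKE.CORTEX.COMPLETE('mistral-large2',
                                     build_call_summary_prompt(transcript_text)) AS call_summary,
           SNOWFLAKE.CORTEX.CLASSIFY_TEXT(transcript_text,
                                          ['positive', 'neutral', 'negative'])   AS call_sentiment
    FROM call_transcripts;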
13. A data engineering team is designing a Snowflake data pipeline to automatically enrich a 'customer_issues' table with product names extracted from raw text-based 'issue_description' columns. They want to use a Snowflake Cortex function for this extraction and integrate it into a stream and task-based pipeline. Given the 'customer_issues' table with an 'issue_id' and an 'issue_description' (VARCHAR), which of the following SQL snippets correctly demonstrates the use of a Snowflake Cortex function for this data enrichment within a task, assuming 'issue_stream' is a stream on the 'customer_issues' table?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B
Explanation:
Option B correctly uses EXTRACT_ANSWER to pull specific information (the product name) from unstructured text, which is a common data enrichment task. It also integrates with a stream ('issue_stream') by filtering for METADATA$ACTION = 'INSERT' and uses a MERGE statement, which is suitable for incremental updates in a data pipeline by inserting new extracted data based on new records in the stream.
Option A uses COMPLETE for generating a response, not for specific entity extraction, and its prompt is less precise for this task than EXTRACT_ANSWER. Option C uses SNOWFLAKE.CORTEX.CLASSIFY_TEXT for classification, not direct entity extraction of a product name, and attempts to update the source table directly, which is not ideal for adding new columns based on stream data. Option D proposes a stored procedure and task, which is a valid pipeline structure; however, the EXTRACT_ANSWER call within the procedure only returns a result set and does not demonstrate the final insertion or merging step required to persist the extracted data into an 'enriched_issues' table. Option E uses an embedding function to generate vector embeddings, which is a form of data enrichment, but the scenario specifically asks for product names (a string value), not embeddings for similarity search.
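A sketch of the stream-driven MERGE described for option B follows. The target table 'enriched_issues', the column used to store the extracted answer, and the warehouse name are hypothetical.

    -- Hypothetical task: enrich newly inserted issues with the extracted product name.
    CREATE OR REPLACE TASK enrich_issues_task
      WAREHOUSE = pipeline_wh
      SCHEDULE  = '5 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('issue_stream')
    AS
    MERGE INTO enriched_issues t
    USING (
        SELECT issue_id,
               SNOWFLAKE.CORTEX.EXTRACT_ANSWER(issue_description,
                                               'Which product is being discussed?') AS product_answer
        FROM issue_stream
        WHERE METADATA$ACTION = 'INSERT'
    ) s
    ON t.issue_id = s.issue_id
    WHEN MATCHED THEN UPDATE SET t.product_answer = s.product_answer
    WHEN NOT MATCHED THEN INSERT (issue_id, product_answer)
                          VALUES (s.issue_id, s.product_answer);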
14. A retail company wants to implement an automated data pipeline in Snowflake to analyze daily customer reviews. The goal is to enrich a 'product_reviews_sentiment' table with sentiment categories (e.g., 'positive', 'neutral', 'negative') for each new review. They require the sentiment to be returned as a JSON object for downstream processing and need the pipeline to handle potential LLM errors gracefully without stopping. Assuming a stream 'new_reviews_stream' monitors a 'customer_reviews' table, which approach effectively uses a Snowflake Cortex function for this scenario?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
Option C is the most effective approach for this scenario. It correctly uses SNOWFLAKE.CORTEX.TRY_COMPLETE, which performs the same operation as COMPLETE but returns NULL instead of raising an error when the operation cannot be performed, making the pipeline more robust to LLM issues. The 'response_format' option ensures the output adheres to a specified JSON schema for structured sentiment categories, meeting the requirement for structured output. This is integrated within a MERGE statement in a task for incremental processing of new data from 'new_reviews_stream'.
Option A suggests a Python UDF with COMPLETE. While feasible, TRY_COMPLETE is explicitly designed for graceful error handling in pipelines, which COMPLETE lacks by default. Option B uses SNOWFLAKE.CORTEX.SENTIMENT, which returns a numeric score (e.g., 0.5424458), not a categorical JSON object, requiring additional post-processing logic for categorization. Option D uses AI_AGG for summarization and AI_CLASSIFY for classification. While AI_CLASSIFY can categorize, the request is for the sentiment of each review, and AI_AGG would aggregate before classifying, not fulfilling the individual review sentiment requirement. Option E suggests a dynamic table, but dynamic tables currently do not support incremental refresh with COMPLETE (or AI_COMPLETE) functions, making them unsuitable for continuous LLM-based processing in this manner. Furthermore, COMPLETE does not offer the graceful error handling of TRY_COMPLETE.
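Combining the two requirements, a hedged sketch of the shape of option C is shown below: TRY_COMPLETE with a 'response_format' schema inside a stream-driven MERGE. All object names are placeholders, and the exact placement of 'response_format' in the options argument should be verified against the COMPLETE/TRY_COMPLETE documentation.

    -- Hypothetical task body: categorize each new review, tolerating LLM failures.
    MERGE INTO product_reviews_sentiment t
    USING (
        SELECT review_id,
               SNOWFLAKE.CORTEX.TRY_COMPLETE(
                   'mistral-large2',
                   [{'role': 'user',
                     'content': 'Classify the sentiment of this review as positive, neutral, or negative: '
                                || review_text}],
                   {'temperature': 0,
                    'response_format': {
                        'type': 'json',
                        'schema': {'type': 'object',
                                   'properties': {'sentiment': {'type': 'string'}},
                                   'required': ['sentiment']}}}
               ) AS sentiment_json
        FROM new_reviews_stream
        WHERE METADATA$ACTION = 'INSERT'
    ) s
    ON t.review_id = s.review_id
    WHEN NOT MATCHED THEN INSERT (review_id, sentiment_json)
                          VALUES (s.review_id, s.sentiment_json);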
15. A data architect is integrating Snowflake Cortex LLM functions into various data enrichment pipelines. To ensure optimal performance, cost-efficiency, and accuracy, which of the following are valid best practices or considerations for these pipelines?
A. When extracting specific entities from documents using AI_EXTRACT or a Document AI model's !PREDICT method, it is often more effective to fine-tune a Document AI model for complex or varied document layouts rather than relying solely on extensive prompt engineering for zero-shot extraction.
B. For tasks requiring deterministic JSON outputs, explicitly specifying a JSON schema using the 'response_format' argument with AI_COMPLETE is crucial, and for OpenAI (GPT) models, including the 'required' field and setting 'additionalProperties' to false in every node of the schema is a mandatory requirement.
C. To manage costs effectively for LLM functions like AI_COMPLETE in a pipeline, always use the largest available warehouse size (e.g., 6XL Snowpark-optimized) to maximize throughput, as this directly reduces the overall token processing time and cost.
D. When performing sentiment analysis on customer feedback using AI_SENTIMENT, it's best practice to pass detailed, multi-turn conversation history to the function to enhance accuracy, similar to how AI_COMPLETE handles conversational context.
E. For data enrichment involving classification with AI_CLASSIFY, using descriptive and mutually exclusive categories in plain English, along with an optional clear task description, can significantly improve classification accuracy.
Answer: A,B,E
Explanation:
Option A is correct. For extracting information from documents with complex or varied layouts, fine-tuning a Document AI model can significantly improve results compared to relying solely on zero-shot extraction and extensive prompt engineering. Document AI provides both zero-shot extraction and fine-tuning capabilities, with fine-tuning recommended to improve results on specific document types.
Option B is correct. To ensure AI_COMPLETE (or COMPLETE) returns responses in a structured JSON format, it is essential to specify a JSON schema using the 'response_format' argument. For OpenAI (GPT) models, specific requirements include setting 'additionalProperties' to false in every node and ensuring the 'required' field lists all property names.
Option C is incorrect. Snowflake explicitly recommends executing queries that call Cortex AISQL functions (such as AI_COMPLETE) using a smaller warehouse, no larger than MEDIUM. Using larger warehouses does not increase performance for these functions but will incur unnecessary compute costs. The LLM inference itself is managed by Snowflake, and its performance isn't directly scaled by warehouse size in the same way as traditional SQL queries.
Option D is incorrect. AI_SENTIMENT (and SENTIMENT) is a task-specific function designed to return a sentiment score for a given English-language text. Unlike AI_COMPLETE (or COMPLETE), which supports multi-turn conversations by passing conversation history for a stateful experience, AI_SENTIMENT processes individual text inputs and is not designed to leverage multi-turn context in the same way for sentiment analysis.
Option E is correct. For classification tasks using AI_CLASSIFY (or CLASSIFY_TEXT), best practices include using plain English for the input text and categories, ensuring categories are descriptive and mutually exclusive, and adding a clear 'task_description' when the relationship between input and categories is ambiguous. These guidelines significantly improve classification accuracy.

16. A Gen AI Specialist is tasked with implementing a data pipeline to automatically enrich new customer feedback entries with sentiment scores using Snowflake Cortex functions. The new feedback arrives in a staging table, and the enrichment process must be automated and cost-effective. Given the following pipeline components, which combination of steps is most appropriate for setting up this continuous data augmentation process?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
Option C is the most direct and efficient approach for continuously augmenting data with sentiment scores in a Snowflake pipeline. SNOWFLAKE.CORTEX.SENTIMENT is a task-specific AI function designed for this purpose, returning an overall sentiment score for English-language text. Integrating it directly into a task that monitors a stream allows for automated, incremental processing of new data as it arrives in the staging table. The source explicitly mentions using Cortex functions in data pipelines via the SQL interface.
Option A is plausible, but calling SENTIMENT directly in SQL within a task (Option C) is simpler and avoids the overhead of a Python UDF if the function is directly available in SQL, which it is. Option B, using a dynamic table, is not supported for Snowflake Cortex functions. Option D, while powerful for custom LLMs, is an over-engineered solution and introduces more complexity (SPCS setup, custom service) than necessary for a direct sentiment function. Option E describes a manual, non-continuous process, which contradicts the requirement for an automated pipeline.
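A minimal sketch of the stream-plus-task pattern from question 16, with hypothetical object names:

    -- Stream over the staging table captures newly arrived feedback rows.
    CREATE OR REPLACE STREAM new_feedback_stream ON TABLE feedback_staging;

    -- Task scores each new row with SENTIMENT and lands it in the enriched table.
    CREATE OR REPLACE TASK score_feedback_task
      WAREHOUSE = pipeline_wh
      SCHEDULE  = '5 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('new_feedback_stream')
    AS
    INSERT INTO feedback_enriched (feedback_id, feedback_text, sentiment_score)
    SELECT feedback_id,
           feedback_text,
           SNOWFLAKE.CORTEX.SENTIMENT(feedback_text)   -- float between -1 and 1
    FROM new_feedback_stream
    WHERE METADATA$ACTION = 'INSERT';

    ALTER TASK score_feedback_task RESUME;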
17. A financial institution wants to automate the extraction of key entities (e.g., invoice number, total amount, list of invoice items) from incoming PDF financial statements into a structured JSON format within their Snowflake data pipeline. The extracted data must conform to a specified JSON schema for seamless downstream integration. Which Snowflake Cortex capabilities, when combined, can best achieve this data augmentation and ensure schema adherence in a continuous processing pipeline?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B,D
Explanation:
Option C is incorrect. TRY_COMPLETE performs the same operation as COMPLETE (or AI_COMPLETE) but returns NULL instead of raising an error when the operation cannot be performed. It does not return a structured error object for detailed debugging; rather, it handles the error by returning NULL, allowing a pipeline to continue.
Option D is correct. For the most consistent results and to optimize JSON adherence accuracy, it is recommended to set the 'temperature' option to 0 when calling COMPLETE (or AI_COMPLETE).
Option E is incorrect. The number of tokens processed (and billed) increases with schema complexity. A larger and more complex supplied schema generally consumes more input and output tokens, leading to higher compute costs.

338. A data engineering team is building an automated pipeline within Snowflake to process newly ingested documents. This pipeline needs to classify each document's sentiment (positive, neutral, negative) and summarise its content using Cortex LLM functions, then store the results in a table. The pipeline is orchestrated using Streams and Tasks. Which considerations are paramount for implementing and monitoring this AI-infused data pipeline?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A,B,C
Explanation:

339. A data application developer is using the Snowflake Cortex COMPLETE function to power a multi-turn conversational AI application. They want to ensure responses are creative but not excessively long, adhere to a specific JSON structure, and are filtered for safety. Given the following SQL query snippet, which statements accurately describe the impact of the specified options?
A. Setting 'temperature' to 0.8 will make the model's output highly deterministic and focused on the most probable tokens, reducing creativity.
B. The 'max_tokens': 50 setting ensures that the generated response will not exceed 50 tokens, potentially leading to truncated but concise answers.
C. The 'guardrails': TRUE option, powered by Cortex Guard, will filter out potentially unsafe or harmful content from the LLM's response, preventing it from being returned to the application.
D. Including a 'response_format' with a JSON schema will enforce the LLM to return a response that strictly conforms to the defined structure, and this functionality works for all models supported by AI_COMPLETE.
E. For a multi-turn conversation, previous user prompts and model responses should be passed in the 'prompt_or_history' array to maintain state, but this will not impact the cost per round.
Answer: B,C,D
Explanation:
Option A is incorrect because a higher temperature, such as 0.8, controls the randomness of the output by influencing which possible token is chosen at each step, resulting in more diverse and random output, not deterministic and focused. Option B is correct because the 'max_tokens' option sets the maximum number of output tokens in the response, and small values can result in truncated responses. Option C is correct because the 'guardrails': TRUE option enables Cortex Guard to filter potentially unsafe and harmful responses from a language model. Option D is correct because AI_COMPLETE Structured Outputs allows you to supply a JSON schema that completion responses must follow, and every model supported by AI_COMPLETE supports structured output. Option E is incorrect because, to provide a stateful conversational experience, all previous user prompts and model responses should be passed in the 'prompt_or_history' array, but the number of tokens processed increases for each round, and costs increase proportionally. The COMPLETE function is the older version of AI_COMPLETE.
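The original SQL snippet for question 339 is not reproduced in this dump, but a call combining the options the explanation discusses might look like the following hedged sketch (the model name and prompt are placeholders):

    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large2',
        [{'role': 'user', 'content': 'Suggest a friendly greeting for a returning customer.'}],
        {
            'temperature': 0.8,     -- more diverse, random output (not deterministic)
            'max_tokens': 50,       -- caps output length; may truncate the answer
            'guardrails': TRUE      -- Cortex Guard filters unsafe or harmful responses
        }
    ) AS guarded_reply;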
340. A data analyst is working with a table containing customer feedback text and needs to perform various text analysis tasks efficiently within Snowflake. They want to summarize the reviews, determine their sentiment, and extract specific pieces of information. Which of the following Snowflake Cortex LLM functions, when applied to a text column, will achieve the desired outcome and return the specified output type?
A. To get a concise overview of each review, the analyst should use SUMMARIZE, which returns a string containing a summary of the original text.
B. To determine the overall sentiment of each review, the analyst should use SENTIMENT, which returns an INTEGER from -1 to 1.
C. To extract a specific answer to a question from each review, the analyst can use EXTRACT_ANSWER, which returns a JSON object with the extracted answer.
D. To categorize reviews into predefined labels, the analyst should use CLASSIFY_TEXT, which returns an OBJECT value with a 'label' field specifying the category.
E. The AI_AGG function can be used to aggregate a text column and return insights across multiple rows, but it is subject to the model's context window limitations, requiring careful chunking of input.
Answer: A,D
Explanation:
Option A is correct because the SUMMARIZE function takes an English-language input text and returns a string containing a summary of the original text. Option B is incorrect because the SENTIMENT function returns a floating-point number from -1 to 1 (inclusive) indicating the level of negative or positive sentiment, not an INTEGER. Option C is incorrect because the EXTRACT_ANSWER function returns a string containing an answer to the given question, not a JSON object. Option D is correct because the CLASSIFY_TEXT function classifies free-form text into categories and returns an OBJECT value (VARIANT) with a 'label' field specifying the category. AI_CLASSIFY is the latest version of this function. Option E is incorrect because AI_AGG aggregates a text column and returns insights across multiple rows based on a user-defined prompt, and, importantly, it is not subject to context window limitations.
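For reference, the task-specific functions contrasted in question 340 can all be applied to a text column in a single query, roughly as sketched below (the table and column names are hypothetical; the return-type comments reflect the explanation above):

    SELECT
        SNOWFLAKE.CORTEX.SUMMARIZE(review_text)                               AS review_summary,   -- string
        SNOWFLAKE.CORTEX.SENTIMENT(review_text)                               AS sentiment_score,  -- float, -1 to 1
        SNOWFLAKE.CORTEX.CLASSIFY_TEXT(review_text,
                                       ['positive', 'neutral', 'negative'])   AS review_category,  -- OBJECT with 'label'
        SNOWFLAKE.CORTEX.EXTRACT_ANSWER(review_text,
                                        'Which product is mentioned?')        AS product_answer
    FROM customer_feedback;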
341. A machine learning engineer needs to fine-tune the 'mistral-7b' LLM using Snowflake Cortex for a specialized task. They have prepared training data in a Snowflake table. Which of the following statements correctly describe the process, requirements, and cost considerations for initiating this fine-tuning job?
A. To create the fine-tuning job, the engineer should use the SQL command:
B. The training data query result must explicitly include columns named 'prompt' and 'completion'; otherwise, the fine-tuning job will fail.
C. The fine-tuned model will automatically appear in the Snowflake Model Registry and is available for sharing with other accounts using Data Sharing, even if it contains user code.
D. The cost for fine-tuning is incurred based on the number of tokens used in training, calculated as: number of input tokens * number of epochs trained.
E. The 'max_epochs' option in the 'options' JSON object can be set to any positive integer to control the training duration, but it is capped at a maximum of 50 epochs.
Answer: A,B,D
Explanation:
Option A is correct because this SQL syntax accurately demonstrates how to create a fine-tuning job using the SNOWFLAKE.CORTEX.FINETUNE function, specifying 'CREATE', a name for the tuned model, the base model, and a SQL query for the training data with aliased columns for 'prompt' and 'completion'. Option B is correct because the training data query result must include columns named 'prompt' and 'completion' for the fine-tuning job to proceed successfully. Option C is incorrect. While Cortex Fine-Tuned LLMs appear in the Model Registry's Snowsight UI, they are explicitly noted as not being managed by the model registry API. Specifically, models generated with Cortex Fine-tuning ('CORTEX_FINETUNED') do not contain user code and can be shared using Data Sharing, but 'USER_MODEL' types (models containing user code) cannot currently be shared. The statement implies that models *with* user code are shareable, which is not currently the case for all model types in the registry. Option D is correct because the compute cost for the Snowflake Cortex Fine-tuning function is based on the number of tokens used in training, which is calculated as 'number of input tokens * number of epochs trained'. Option E is incorrect because the 'max_epochs' option in the 'options' object has a value range from 1 to 10 (inclusive), not up to 50.

342. A data scientist is tasked with improving the accuracy of an LLM-powered chatbot that answers user questions based on internal company documents stored in Snowflake. They decide to implement a Retrieval Augmented Generation (RAG) architecture using Snowflake Cortex Search. Which of the following statements correctly describe the features and considerations when leveraging Snowflake Cortex Search for this RAG application?
A. Cortex Search automatically handles text chunking and embedding generation for the source data, eliminating the need for manual ETL processes for these steps.
B. To create a Cortex Search Service, one must explicitly specify an embedding model and manually manage its underlying infrastructure, similar to deploying a custom model via Snowpark Container Services.
C. For optimal search results with Cortex Search, source text should be pre-split into chunks of no more than 512 tokens, even when using models with larger context windows.
D. The SNOWFLAKE.CORTEX.SEARCH_PREVIEW function can be used to test the search service with a query and optional filters before integrating it into a full application, for example:
E. Enabling change tracking on the source table for the Cortex Search Service is optional; the service will still refresh automatically even if change tracking is disabled.
Answer: A,C,D
Explanation:
Option A is correct because Cortex Search is a fully managed service that gets users started