Thursday, November 21, 2024

Understanding OpenAI

Let's start our journey of understanding OpenAI with some basic concepts around machine learning.

Machine learning (ML) is a subset of AI. ML is about implementing algorithms that let a system learn from data. Deep learning is a kind of ML based on brain-inspired algorithms called artificial neural networks. GPT-4 and ChatGPT are based on a deep learning architecture called the "Transformer".


AI - Any technique that allows a computer to mimic human behavior

ML - Ability to learn without explicit programming

Deep learning - Ability to extract patterns from data using Artificial neural networks

Transformer - the architecture/algorithm that GPT and ChatGPT are based on

The Transformer architecture is built around a "pay attention" mechanism. It has two flavours of attention - cross-attention and self-attention.


Cross-attention -> finds the relevance of different parts of the input text to the next prediction; e.g. in "I love sunny weather", "sunny" and "weather" are considered together to predict the next word. It is the attention mechanism in the Transformer architecture that mixes two different embedding sequences.

Self-attention -> gives different weights to the words in a sentence and uses them to work out its meaning. The goal is to learn the dependencies between the words in the sentence and use that information to capture the internal structure of the sentence (a minimal sketch follows).
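A minimal numpy sketch of the scaled dot-product self-attention idea, assuming each word has already been turned into a small embedding vector; it is illustrative only, a real Transformer uses learned projection matrices for the queries, keys and values.

import numpy as np

def self_attention(X):
    # X has one row per token; each row is that token's embedding vector
    d = X.shape[1]
    # A real Transformer computes Q, K and V with learned projections of X;
    # here we use X directly to keep the sketch minimal.
    Q, K, V = X, X, X
    scores = Q @ K.T / np.sqrt(d)                    # how much each token "pays attention" to the others
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax -> attention weights
    return weights @ V                               # each output mixes all token vectors by those weights

# Toy example: 4 tokens ("I", "love", "sunny", "weather") with 3-dimensional embeddings
X = np.random.rand(4, 3)
print(self_attention(X).shape)   # (4, 3): one context-aware vector per token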

Encoder -> processes the input text, identifies the valuable features and generates a meaningful representation of that text (called an embedding). Encoders are designed to learn embeddings that can be used for various predictive modeling tasks such as classification.

Decoder -> uses the embedding to produce output. To train a decoder model, a technique called "teacher forcing" is used, in which the true output/token (and not the predicted output/token) from the previous time step is fed as input to the current time step.

Generative pre-trained transformers (GPT) are designed to use only the decoder and rely only on the self-attention mechanism within it. There is no encoder in GPT, so there is no cross-attention.

GPT models, in other words, do not use an encoder; they have a decoder-only architecture. This means the input data is fed directly into the decoder without first being transformed into a higher, more abstract representation by an encoder.

A tokenizer splits the text into chunks called tokens. As a rule of thumb, 75 words is about 100 tokens. These tokens form the context, and that context is used to predict the next token. GPT-4 has a context window of 8,192 tokens, with a 32,768-token variant.
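To see tokens in practice, here is a small sketch using the tiktoken library (an assumption: tiktoken is installed separately, it is not part of the openai package).

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
tokens = enc.encode("I love sunny weather")
print(tokens)             # a short list of integer token ids
print(len(tokens))        # the token count - what the context window limit and billing are based on
print(enc.decode(tokens)) # decoding the ids gives back the original text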

Based on the context, the next token is predicted with a probability and appended to the text. The newly formed text then becomes the new input to the model, and the process repeats.
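Conceptually, that loop looks like the sketch below. predict_next_token and END_OF_TEXT are hypothetical stand-ins for the model and its stop token, not real API calls.

def generate(prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # The model assigns a probability to every possible next token and one is chosen.
        next_token = predict_next_token(tokens)   # hypothetical model call
        tokens.append(next_token)                 # the newly formed text becomes the next input
        if next_token == END_OF_TEXT:             # hypothetical stop token
            break
    return tokens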


Models - 

Models are different strategies by which the prompt will be processed. 

InstructGPT - text-ada-001, text-babbage-001, text-curie-001 and text-davinci-003 are different instruct models. Generally used for single-turn tasks.

ChatGPT - gpt-3.5-turbo, gpt-3.5-turbo-16k. Used for chat.

GPT-4 - gpt-4, gpt-4-32k. Used for both chat and single-turn tasks.


Both ChatGPT and GPT-4 use the same endpoint - openai.ChatCompletion.


While making the API call -

OPENAI_API_KEY -> the API key associated with your account. It is used to grant you access and then to charge you for usage.
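A minimal setup sketch, assuming the legacy openai Python library (0.x, as used in the examples below) and the key exported as an environment variable:

import os
import openai

# Never hard-code the key in source; export OPENAI_API_KEY in your shell instead.
openai.api_key = os.getenv("OPENAI_API_KEY")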

Access the endpoint - 

response = openai.ChatCompletion.create(

    model="gpt-3.5-turbo",

    messages=[{"role": "user", "content": "Hello World!"}],

)

# Extract the response

print(response["choices"][0]["message"]["content"])


{

    "choices": [

        {

            "finish_reason": "stop", ---> Status of the response. stop means it got completed successfully.

            "index": 0,

            "message": {

                "content": "Hello there! How may I assist you today?", --> text generated by model. Instead of content node, there could be function_call also possible.

                "role": "assistant", --> role will always be assistant.

            },

        }

    ],

    "created": 1681134595,

    "id": "chatcmpl-73mC3tbOlMNHGci3gyy9nAxIP2vsU",   ---> technical identifier used internally by openAI

    "model": "gpt-3.5-turbo", --> model used

    "object": "chat.completion",  ---> always chat.completion

    "usage": {"completion_tokens": 10, "prompt_tokens": 11, "total_tokens": 21}, ---> token used in the call, gives yoiu idea of the cost

}


In the API, two parameters are mandatory -

model

messages - a list of messages, each with a role (system, user, assistant) and content (the actual message)
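For illustration, a messages list can mix all three roles; the values below are made up. The system message sets the behaviour, user messages carry the questions, and assistant messages hold earlier model replies kept for context.

messages = [
    {"role": "system", "content": "You are a helpful assistant that answers briefly."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},        # earlier model reply, kept for context
    {"role": "user", "content": "And of Italy?"},      # follow-up that relies on that context
]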


How do we use functions instead of plain content?

You can define functions via the OpenAI API, but the OpenAI model does not call the function itself. It gives you the arguments you can use to call the function yourself.

The function definition object has -

name - name of the function

description - what the function is all about

parameters - the parameters the function accepts (described as a JSON schema)


# Example function  -----> this is the function you want to call once openai responds.

def find_product(sql_query):

    # Execute query here

    results = [

        {"name": "pen", "color": "blue", "price": 1.99},

        {"name": "pen", "color": "red", "price": 1.78},

    ]

    return results

# Function definition  ----> this is the definition passed in the openai call

functions = [

    {

        "name": "find_product",

        "description": "Get a list of products from a sql query",

        "parameters": {

            "type": "object",

            "properties": {

                "sql_query": {

                    "type": "string",

                    "description": "A SQL query",

                }

            },

            "required": ["sql_query"],

        },

    }

]


# Example question

user_question = "I need the top 2 products where the price is less than 2.00"

messages = [{"role": "user", "content": user_question}]

# Call the openai.ChatCompletion endpoint with the function definition

response = openai.ChatCompletion.create(

        model="gpt-3.5-turbo-0613", messages=messages, functions=functions  -----------> you pass the function defintion here.

)

response_message = response["choices"][0]["message"]

messages.append(response_message)

The response contains a function_call node instead of content, which looks like this -

"function_call": {

        "name": "find_product",

        "arguments": '{\n  "sql_query": "SELECT * FROM products \

    WHERE price < 2.00 ORDER BY price ASC LIMIT 2"\n}',

    }


Now we can use this response and call the actual function as - 

# Call the function
import json   ------> needed to parse the JSON arguments string

function_name = response_message["function_call"]["name"]   ------> "find_product" in this case

function_args = json.loads(  ------> load the arguments
    response_message["function_call"]["arguments"]
)

products = find_product(function_args.get("sql_query"))   --> call the function. Please note the caller has to call the function themselves; the API only returns the arguments based on the function definition passed.

# Append the function's response to the messages

messages.append(

    {

        "role": "function",

        "name": function_name,

        "content": json.dumps(products),       -------------> this is the output of our function call to DB.

    }

)

# Format the function's response into natural language

response = openai.ChatCompletion.create(

    model="gpt-3.5-turbo-0613",

    messages=messages,  ---> send the full conversation (including the function result) back to OpenAI in a second call, so it can produce a user-friendly reply for the user

)


Chat completion and text completion are two different things. Chat completion is conversation-oriented, whereas text completion simply continues the text you give it; it may not carry a conversation, it just completes your sentence.

In the response, instead of getting content, we receive "text". In the request, we send a prompt instead of messages.
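A minimal text-completion sketch with the same legacy library; the model name is one of the instruct models listed earlier.

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Once upon a time",
    max_tokens=20,
)
# The generated text is under "text", not under "message"/"content"
print(response["choices"][0]["text"])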


Embedding - 

An AI model is a mathematical function and works with numeric inputs. Embedding converts words/tokens into numerical vectors. When you call the embedding API, you receive vectors back, i.e. arrays of floats.

result = openai.Embedding.create(

    model="text-embedding-ada-002", input="your text"

)

result['data'][0]['embedding']  --> this is the vector (data is a list, one entry per input text)


Embedding is like an interpreter that translates words and sentences into numbers. Words with similar meanings are mapped closer together in this numerical space.
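A small sketch of what "closer together" means, comparing embeddings with cosine similarity; it assumes numpy and reuses the embedding call shown above.

import numpy as np

def embed(text):
    result = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(result["data"][0]["embedding"])

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Expect the first pair to score noticeably higher than the second
print(cosine_similarity(embed("cat"), embed("kitten")))
print(cosine_similarity(embed("cat"), embed("spreadsheet")))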


You can moderate the user input by calling the OpenAI Moderation API.

# Call the openai Moderation endpoint, with the text-moderation-latest model

response = openai.Moderation.create(

    model="text-moderation-latest",

    input="I want to kill my neighbor.",

)


{

    "id": "modr-7AftIJg7L5jqGIsbc7NutObH4j0Ig",

    "model": "text-moderation-004",

    "results": [

        {

            "categories": {

                "hate": false,

                "hate/threatening": false,

                "self-harm": false,

                "sexual": false,

                "sexual/minors": false,

                "violence": true,

                "violence/graphic": false,

            },

            "category_scores": {

                "hate": 0.0400671623647213,

                "hate/threatening": 3.671687863970874e-06,

                "self-harm": 1.3143378509994363e-06,

                "sexual": 5.508050548996835e-07,

                "sexual/minors": 1.1862029225540027e-07,

                "violence": 0.9461417198181152,

                "violence/graphic": 1.463699845771771e-06,

            },

            "flagged": true,

        }

    ],

}

=============================================

prompt engineering - 

An effective prompt typically combines three elements (see the sketch after this list) -

Role - who the model should act as

Context - the background information the model needs

Task - what you actually want it to do
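A sketch of a prompt assembled from those three pieces; the role, context and task strings here are just illustrative.

role = "You are an experienced Python developer."                           # Role
context = "The team uses Python 3.11 and writes unit tests with pytest."    # Context
task = "Write a function that validates an email address, plus one test."   # Task

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": role},
        {"role": "user", "content": f"{context}\n\n{task}"},
    ],
)
print(response["choices"][0]["message"]["content"])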



zero-shot-CoT strategy - 

CoT - chain of thought

The term zero-shot means the model does not rely on task-specific examples to perform this reasoning; it is ready to handle new tasks based on its general training.

The few-shot learning technique gives examples of inputs with the desired outputs.

One-shot learning - as its name indicates, in this case you provide only one example to help the model execute the task.
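Illustrative prompts for the two strategies; the zero-shot-CoT trick is simply appending "Let's think step by step" to the question.

# Zero-shot CoT: no examples, just nudge the model to reason step by step
zero_shot_cot = "How many minutes are there in two and a half days? Let's think step by step."

# Few-shot: show a couple of input/output pairs, then ask the real question
few_shot = """Convert the city to its country.
Paris -> France
Tokyo -> Japan
Toronto ->"""

for prompt in (zero_shot_cot, few_shot):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response["choices"][0]["message"]["content"])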





========================

Transcript - 

Slide 1 - This presentation introduces you to the basic concepts of OpenAI and generative AI. It covers a few code examples to give you an idea of how the API is invoked. The content is based on the O'Reilly book "Deep Dive into GPT-4 and ChatGPT API".


Slide 2 -

Before we start on understanding OpenAI and generative AI, let's get our terminology right. AI (artificial intelligence) is any technique that allows a computer to mimic human behaviour. It is powered by machine learning, which is implementing algorithms to make a system learn. Within machine learning there is a sub-category called "deep learning". One of the popular deep learning architectures is the "Transformer", and the Transformer is the fundamental algorithm used to implement GPT.


Slide 3 - The Transformer works on the concept of "paying attention", which means focusing on the different text elements passed in the stream to figure out the context or meaning of the sentence. There are two categories of attention in the Transformer -

1) Cross-attention - where two different streams of text elements are processed together to figure out the meaning/context.

2) Self-attention - where a single stream of words is used to figure out the meaning/context. In self-attention the different words in the text are given different weights, and based on that the text is processed and its meaning derived.

The reason we talk about this is to relate how GPT is connected with all of these.

AI -> ML -> Deep learning -> Transformer -> Self attention : GPT


Slide 4- 

Data processing is an important concept to understand when we discuss how your text is interpreted by the system. The Transformer has two mechanisms to process the data well.

Encoder - The encoder is responsible for "understanding" the data and creating a meaningful result out of it: an embedding. Embedding is different from the encoder itself; the encoder is more of a brain that understands the input and produces output that is a meaningful summary of it.

Decoder - The decoder is the step where output is produced. That output can be produced from previously encoded data combined with your input, or from two streams passed through an encoder and then processed by the decoder. GPT works with a "decoder-only" architecture; it has no encoder at all. The "pre-trained" in its name refers to the fact that the model has already been trained in advance on a huge corpus of text.

During GPT's training, a technique called "teacher forcing" is used: the true token from the previous time step (rather than the model's own prediction) is fed as input to the current time step. Giving examples and desired outputs inside your prompt is a different, related idea - few-shot prompting - which is covered later.


Slide 5- 

GPT, aka Generative Pre-trained Transformer, is a decoder-only transformer that has been trained in advance on a huge amount of text. Let's talk about tokens in the GPT space. You will hear many times that a particular GPT model has a limit of x tokens, and tokens are also used to calculate your cost. But what exactly is a token? Tokenisation breaks words/sentences into smaller chunks, and these chunks are fed into the model to extract the meaning. A model supporting a certain number of tokens means that is the window the model has visibility into and can process. For example, GPT-4 supports roughly 32K tokens, which means it can take roughly 20-25K words into account when figuring out the meaning. Anything we pass beyond this limit, the model won't be able to process.

If you have a bigger article/text that you would like to process, you need to divide it into chunks and make multiple trips to GPT, as in the sketch below.
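A sketch of that divide-and-conquer idea, chunking by tokens with tiktoken and summarising each chunk separately; the chunk size and the summarisation prompt are illustrative choices.

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def summarize_long_text(text, chunk_tokens=3000):
    tokens = enc.encode(text)
    summaries = []
    # Walk through the text one chunk at a time so each call stays inside the context window
    for start in range(0, len(tokens), chunk_tokens):
        chunk = enc.decode(tokens[start:start + chunk_tokens])
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Summarize this text:\n" + chunk}],
        )
        summaries.append(response["choices"][0]["message"]["content"])
    return "\n".join(summaries)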

Let's understand models in GPT. A model is the kind of algorithm that will be used to process the text and respond. There are three categories of these models.

1. InstructGPT - All the models in this category are text-based request/response, like text-davinci-003. They are meant for one question, one answer: you ask something, it responds with the answer, end of story. They are not meant to build context over multiple back-and-forth turns.

2. ChatGPT - This model is designed to handle conversations. It can consider your past conversation history, derive context out of it and then generate the output. gpt-3.5-turbo is an example of a chat/conversation-based model.

3. GPT-4 - This model is optimised to support both text and chat. It is more advanced and has better accuracy, though at a higher cost per token. Example: gpt-4-32k. It is a kind of combination of both the models above.


Slide 6 - 

Let's review the first API we use to make a connection with GPT. Both ChatGPT and GPT-4 use the same API endpoint; it is the model inside the payload which determines what algorithm will be used to serve the request. Sample code to make a request -

response = openai.ChatCompletion.create(

    model="gpt-3.5-turbo", --> model defines which algorithm to use to process the input.

    messages=[{"role": "user", "content": "Hello World!"}],  ---> This is where you can send the message to the api.

)

You can read the response provided by the API like this - print(response["choices"][0]["message"]["content"])


However, the actual response of the API looks like below -

{

    "choices": [  ---> Observe the array data structure here. There can be multiple choices because the optional n request parameter can ask for more than one completion; at this moment we will stick to reading the first object in this array,

        {

            "finish_reason": "stop", ---> Status of the response. stop means it got completed successfully.

            "index": 0,

            "message": {

                "content": "Hello there! How may I assist you today?", --> text generated by model. Instead of content node, there could be function_call also possible.

                "role": "assistant", --> role will always be assistant.

            },

        }

    ],

    "created": 1681134595,

    "id": "chatcmpl-73mC3tbOlMNHGci3gyy9nAxIP2vsU",   ---> technical identifier used internally by openAI

    "model": "gpt-3.5-turbo", --> model used

    "object": "chat.completion",  ---> always chat.completion

    "usage": {"completion_tokens": 10, "prompt_tokens": 11, "total_tokens": 21}, ---> token used in the call, gives you idea of the cost

}


Slide 7 - 

The ChatGPT API also supports an integration mechanism by which you can connect with other data sources to get information. You can use OpenAI to generate a payload based on the user query and then use that payload to fetch results from another data source. With this pattern you need two OpenAI calls: the first to create the query for the data source, and the second to format the result nicely for display to the user.

user query -> openAI -> result as parameters/DB query -> Use this to call DB/Service -> result -> openAI to format it in user friendly manner -> User

Please note the API itself won't execute any function for you; it will help you create the arguments for your function.


Slide 8 -

How do we call OpenAI for a function call? There are steps to follow -

1) Define the function in your code. 

def find_product(sql_query):

    # Execute query here

    results = [

        {"name": "pen", "color": "blue", "price": 1.99},

        {"name": "pen", "color": "red", "price": 1.78},

    ]

    return results


2) Define the function definition for the API - this is like educating OpenAI about your function.

functions = [

    {

        "name": "find_product",

        "description": "Get a list of products from a sql query",

        "parameters": {

            "type": "object",

            "properties": {

                "sql_query": {

                    "type": "string",

                    "description": "A SQL query",

                }

            },

            "required": ["sql_query"],

        },

    }

]


3) Call the OpenAI API with the function definition. Along with the user input you also pass the function definition, so that OpenAI understands your intention is to execute a function based on the response, and it will respond back in a format that is usable by your code.

user_question = "I need the top 2 products where the price is less than 2.00"

messages = [{"role": "user", "content": user_question}]

# Call the openai.ChatCompletion endpoint with the function definition

response = openai.ChatCompletion.create(

        model="gpt-3.5-turbo-0613", messages=messages, functions=functions  -----------> you pass the function definition here.

)



Slide 9 - 

Once OpenAI has processed your request, it will respond back in the same overall format as before; the only difference is the structure of the message itself. Instead of content, it has a function_call node whose arguments field contains the actual arguments you can use to call the function. You can use these arguments to make another HTTP call or run a DB query, but OpenAI won't call those itself; it just gives you the right information, which you can use to call the downstream systems.
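A condensed sketch of that step, reusing the find_product function and messages list from the earlier slides: read the function_call node, run your own function with the parsed arguments, and append the result for the follow-up formatting call.

import json

response_message = response["choices"][0]["message"]
call = response_message.get("function_call")
if call:                                    # the model wants us to run a function
    args = json.loads(call["arguments"])    # the arguments arrive as a JSON string
    products = find_product(args.get("sql_query"))    # we execute the function, not OpenAI
    messages.append(response_message)
    messages.append({"role": "function", "name": call["name"], "content": json.dumps(products)})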


Slide 10 - 

Embedding and encoding are two different things. Embedding is simply converting the input text into a numeric format. As we know, AI models are mathematical and require numbers to work with, hence before we pass any text to these models we need to convert it into numbers. Converting text into numbers is embedding. There are APIs available to create embeddings.


result = openai.Embedding.create(

    model="text-embedding-ada-002", input="your text"

)

result['data'][0]['embedding']  --> this is the vector (data is a list, one entry per input text)



Slide 11 - 

It's important to moderate input to your API to avoid misuse by consumers. OpenAI provides a moderation endpoint that analyses user input and figures out the intention of that text. Calling the Moderation API before processing anything further will protect your API.

response = openai.Moderation.create(

    model="text-moderation-latest",

    input="I want to hack my office security system.",

)

Response structure - 

{

    "id": "modr-7AftIJg7L5jqGIsbc7NutObH4j0Ig",

    "model": "text-moderation-004",

    "results": [

        {

            "categories": {

                "hate": false,

                "hate/threatening": true,

                "self-harm": false,

                "sexual": false,

                "sexual/minors": false,

                "violence": false,

                "violence/graphic": false,

            },

            "category_scores": {

                "hate": 0.0400671623647213,

                "hate/threatening": 3.671687863970874e-06,

                "self-harm": 1.3143378509994363e-06,

                "sexual": 5.508050548996835e-07,

                "sexual/minors": 1.1862029225540027e-07,

                "violence": 0.9461417198181152,

                "violence/graphic": 1.463699845771771e-06,

            },

            "flagged": true,

        }

    ],

}

As you can see, there are different categories and a corresponding score for each category. You can use the scores to understand the violation and give the user a response back, as in the sketch below.
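A small sketch of acting on the response; the 0.5 threshold is an arbitrary illustrative choice, and the simplest rule is to just respect the flagged field.

result = response["results"][0]

if result["flagged"]:
    # Block or rephrase before sending anything further to the chat model
    print("Sorry, this request cannot be processed.")
elif result["category_scores"]["violence"] > 0.5:
    # Optionally apply your own stricter threshold on individual categories
    print("Request needs human review.")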