Running open source LLMs on your workstation

So, you're completely new to the world of LLMs and want to start experimenting with one? You've come to the right post.

We will show you how to start your LLM journey.

Step 1: Join the Hugging Face community

The first step is to understand which models are available and how to use them. There are a great many LLMs out there, trained for different purposes. Hugging Face is a community that brings models, their documentation, and discussion together in one place. Creating an account is free, and many major models are published there directly by the organizations that trained them.

Step 2: Choose a model to play with

Next, pick a model that you want to play with. I personally prefer a text-based model such as Microsoft's phi-2.

You might ask: what is a model? For practical purposes, a model is a big file of numbers called "parameters". It takes some input and applies those parameters to it to produce an output. How this works in detail is beyond the scope of this article; all you need to know for now is that a model is a very large file.
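To make "a file of parameters" concrete, here is a toy sketch. It is not how phi-2 works internally: the two-parameter linear "model" and the helper names are illustrative only. It shows the same three steps at a tiny scale: parameters are saved to a file, loaded back, and applied to an input to produce an output.

```python
import json
import os
import tempfile

# A deliberately tiny "model" whose parameters are just two numbers: a weight and a bias.
# Real LLMs store billions of parameters, but the idea is the same: numbers are saved
# to a file, loaded back, and applied to an input to produce an output.

def save_parameters(path, params):
    """Write the model's parameters to disk as JSON."""
    with open(path, "w") as f:
        json.dump(params, f)

def load_parameters(path):
    """Read the parameters back from disk."""
    with open(path) as f:
        return json.load(f)

def run_model(params, x):
    """Apply the parameters to the input: output = weight * x + bias."""
    return params["weight"] * x + params["bias"]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tiny_model.json")
        save_parameters(path, {"weight": 2.0, "bias": 1.0})
        params = load_parameters(path)
        print(run_model(params, 3.0))  # 2.0 * 3.0 + 1.0 = 7.0
```

An LLM replaces "weight * x + bias" with billions of parameters arranged in layers, but loading a file and applying it to an input is the same basic idea.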

Step 3: Download the model and use it in a program

My recommendation is to start with Microsoft's phi-2. It is large enough for its output to be interesting, yet small enough to experiment with on a single workstation.
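phi-2 has about 2.7 billion parameters. A back-of-the-envelope way to see what "fairly large" means is to multiply the parameter count by the bytes each parameter takes. The sketch below is only an approximation of the weights themselves; it ignores activations, caches, and framework overhead, which all add to the memory you need at run time.

```python
# Rough size of a model's weights: number of parameters x bytes per parameter.
# Activations, caches and framework overhead are NOT included.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}

def weight_size_gb(num_params, dtype="float16"):
    """Approximate size of the weights in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

PHI2_PARAMS = 2.7e9  # phi-2's published parameter count (about 2.7 billion)

for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_size_gb(PHI2_PARAMS, dtype):.1f} GB")
```

Halving the bytes per parameter (float32 to float16) halves the size, which is why smaller number formats are popular for running models on local hardware.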

💡

The model's weights are split across two very large files. Don't worry, you don't have to download them by hand. The first time you run the program, the transformers library downloads them from Hugging Face and caches them locally, so later runs reuse the cached copy instead of downloading them again.
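If you want to see where those downloaded files end up, the sketch below mirrors the default cache layout described in the huggingface_hub documentation: HF_HUB_CACHE if it is set, otherwise $HF_HOME/hub, with HF_HOME defaulting to ~/.cache/huggingface. The function names are hypothetical and the precedence rules are our reading of the docs for recent library versions; the library itself remains the source of truth.

```python
import os
from pathlib import Path

# Approximates the default Hugging Face Hub cache location (recent huggingface_hub versions):
#   1. HF_HUB_CACHE, if set
#   2. otherwise $HF_HOME/hub
#   HF_HOME defaults to $XDG_CACHE_HOME/huggingface, i.e. ~/.cache/huggingface.

def hub_cache_dir(env=None):
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    xdg = env.get("XDG_CACHE_HOME", os.path.join("~", ".cache"))
    hf_home = env.get("HF_HOME", os.path.join(xdg, "huggingface"))
    return Path(hf_home).expanduser() / "hub"

def model_cache_dir(repo_id, env=None):
    """Folder where a model repo is cached, e.g. models--microsoft--phi-2."""
    return hub_cache_dir(env) / ("models--" + repo_id.replace("/", "--"))

if __name__ == "__main__":
    path = model_cache_dir("microsoft/phi-2")
    status = "found in cache" if path.exists() else "not downloaded yet"
    print(f"{path}: {status}")
```

Deleting that folder frees the disk space; the files are simply downloaded again the next time the program runs.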

The repository's model card includes example code. All you need to do is install Python and the required libraries, then run the program. For phi-2, the program looks something like this:

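import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Create tensors on the GPU by default (requires a CUDA-capable NVIDIA GPU).
torch.set_default_device("cuda")

# Load phi-2's weights and tokenizer (downloaded and cached on the first run).
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# The prompt: a function signature and docstring for the model to complete.
inputs = tokenizer('''def print_prime(n):
    """
    Print all primes between 1 and n
    """''', return_tensors="pt", return_attention_mask=False)

# Generate up to 200 tokens in total (prompt included), then decode them back to text.
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)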

💡

You will need to install PyTorch and the Hugging Face transformers library for this code to run, for example with pip install torch transformers.
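Before running the example, a quick preflight check can save some confusing errors. The sketch below is a hypothetical helper, not part of the model card: it uses only the standard library to see whether torch and transformers can be imported and, if torch is present, whether a CUDA GPU is visible (the example above sets the default device to "cuda"). If you need a GPU-enabled PyTorch build, the install selector on the PyTorch website gives the right command for your setup.

```python
import importlib.util

# Packages the phi-2 example imports.
REQUIRED = ["torch", "transformers"]

def missing_packages(names):
    """Return the names that cannot be imported in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing), "; try: pip install " + " ".join(missing))
    else:
        import torch
        # The example calls torch.set_default_device("cuda"), which needs a CUDA GPU.
        print("CUDA available:", torch.cuda.is_available())
```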

In the code above, you are asking the model to complete a function from its signature and docstring. It generates the following code:

def print_prime(n):
    """
    Print all primes between 1 and n
    """
    primes = []
    for num in range(2, n+1):
        is_prime = True
        for i in range(2, int(math.sqrt(num))+1):
            if num % i == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(num)
    print(primes)
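The completion looks plausible, but it is not runnable on its own: it calls math.sqrt without importing math, so calling print_prime raises a NameError. Here is a hand-corrected version (our edit, not model output). It adds the import, uses math.isqrt (Python 3.8+) for an exact integer square root, and splits out a function that returns the list so the result can be checked.

```python
import math

def primes_up_to(n):
    """Return all primes between 2 and n (inclusive), using trial division."""
    primes = []
    for num in range(2, n + 1):
        is_prime = True
        # A composite number always has a divisor no larger than its square root.
        for i in range(2, math.isqrt(num) + 1):
            if num % i == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(num)
    return primes

def print_prime(n):
    """Same behaviour as the generated function: print the list of primes."""
    print(primes_up_to(n))

print_prime(20)  # [2, 3, 5, 7, 11, 13, 17, 19]
```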

Keep in mind that phi-2 is intended for question answering (QA), chat, and code generation. It is, however, primarily a research model, and its output should not be relied upon for production code.
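For the QA and chat uses, the phi-2 model card suggests specific prompt shapes: a QA format ending in "Output:" and a chat format written as an "Alice:"/"Bob:" dialogue. The helpers below are a hypothetical sketch that builds such strings; the exact templates are our reading of the model card, so check it before relying on them. The resulting string can be passed to tokenizer(...) in place of the code prompt in the example above.

```python
# Hypothetical helpers that build phi-2-style prompts as plain strings.
# Templates follow the "Instruct:/Output:" QA format and the "Alice:/Bob:" chat
# style described on the phi-2 model card; verify against the card before use.

def qa_prompt(question):
    """QA format: the model is expected to continue after 'Output:'."""
    return f"Instruct: {question}\nOutput:"

def chat_prompt(turns, user="Alice", assistant="Bob"):
    """Chat format: turns alternate user/assistant, starting and ending with the user.
    The prompt ends with the assistant's name so the model writes the next reply."""
    if len(turns) % 2 == 0:
        raise ValueError("the last turn should be the user's")
    lines = []
    for i, text in enumerate(turns):
        speaker = user if i % 2 == 0 else assistant
        lines.append(f"{speaker}: {text}")
    lines.append(f"{assistant}:")
    return "\n".join(lines)

print(qa_prompt("What is the capital of France?"))
print(chat_prompt(["Any tips for staying focused while studying?"]))
```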

Step 4: Play with more models, understand licenses, and think about real-world use cases

Once you have run a few models this way, you can think more deeply about how to use them in real-world scenarios. You might want to integrate a model with your existing apps and websites, or apply it to novel problems.
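A common first step toward integrating a model into an existing app or website is to put it behind a small HTTP endpoint that the app can call. The sketch below uses only Python's standard library. generate_text is a hypothetical placeholder (swap in the tokenizer and model.generate calls from the phi-2 example), and the port and JSON shape ({"prompt": ...} in, {"completion": ...} out) are our own assumptions. For anything beyond local experiments, a proper web framework or a dedicated model-serving tool is a better fit.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def generate_text(prompt):
    """Hypothetical stand-in for the model. Replace with tokenizer + model.generate;
    here it just echoes so the sketch runs without a GPU."""
    return f"(completion for: {prompt})"

class CompletionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body such as {"prompt": "..."}.
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            prompt = payload["prompt"]
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected JSON with a 'prompt' field")
            return
        body = json.dumps({"completion": generate_text(prompt)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def make_server(host="127.0.0.1", port=8000):
    """Create (but do not start) a server bound to host:port."""
    return ThreadingHTTPServer((host, port), CompletionHandler)
```

To try it, call make_server().serve_forever() and POST a JSON body such as {"prompt": "Hello"} to http://127.0.0.1:8000/. Keeping the model behind a plain function like generate_text means the web layer does not change when you swap models.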

For this, you need to understand the license under which the model is released, since it determines whether and how you may use the model in your own products. Then ask whether you can fine-tune (retrain) the model for your narrow use case. We will cover how to do this in future articles; for now, just play with existing models.
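Hugging Face model cards (the repository's README.md) usually begin with a YAML metadata block that includes a license field, so that is a good first place to look. The sketch below is a hypothetical, deliberately minimal line-based reader for that field, not a full YAML parser. You could fetch the card text with huggingface_hub's hf_hub_download(repo_id=..., filename="README.md"); either way, read the actual license text before relying on it.

```python
# Hypothetical helper: read the top-level "license" field from a model card's
# YAML front matter, which looks like:
#   ---
#   license: mit
#   ---
# Minimal line-based parsing only; nested or multi-line YAML is not handled.

def card_license(readme_text):
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no front matter at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the metadata block
        key, sep, value = line.partition(":")
        # Only top-level keys (no indentation) count.
        if sep and key.rstrip() == "license":
            return value.strip().strip("'\"") or None
    return None

card = "---\nlicense: mit\ntags:\n- nlp\n---\n# Model card\n"
print(card_license(card))  # mit
```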

Conclusion

Don't let the complexity around LLMs put you off. For small use cases they are remarkably simple to use, and getting started is an important first step toward eventually mastering them.