This article walks through developing effective prompts for large language models (LLMs) and evaluating them with the HuggingFace datasets library. The benchmark is GLUE (General Language Understanding Evaluation), specifically the sentiment classification task on the Stanford Sentiment Treebank (sst2). Several few-shot prompts are proposed and tested with a BLOOM model of 1.7 billion parameters, and prompt performance is measured with the accuracy metric. The article concludes that the choice of prompt plays a crucial role in LLM text generation. It also compares BLOOM models by size and training objective, with instruction fine-tuned models outperforming pre-trained models in most cases.
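The workflow above can be sketched as follows. This is a minimal illustration, not the article's exact code: the example reviews are hardcoded stand-ins for GLUE sst2 rows (in practice one would call `datasets.load_dataset("glue", "sst2")` and generate completions with a BLOOM checkpoint), and the prompt template and helper names are assumptions.

```python
# Hardcoded stand-ins for sst2 training examples (text, label);
# in the article these would come from the HuggingFace datasets library.
few_shot_examples = [
    ("a gorgeous, witty, seductive movie.", "positive"),
    ("it's a charming and often affecting journey.", "positive"),
    ("unflinchingly bleak and desperate.", "negative"),
]

def build_prompt(review: str) -> str:
    """Format few-shot demonstrations followed by the query review.

    The model is expected to continue the text after the final
    "Sentiment:" with either "positive" or "negative".
    """
    lines = [f"Review: {text}\nSentiment: {label}\n"
             for text, label in few_shot_examples]
    lines.append(f"Review: {review}\nSentiment:")
    return "\n".join(lines)

def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions matching the gold labels (the sst2 metric)."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

print(build_prompt("the film is strictly routine."))
print(accuracy(["positive", "negative"], ["positive", "positive"]))  # 0.5
```

The same `accuracy` computation is what GLUE reports for sst2, so swapping in real model completions (mapped to "positive"/"negative") gives the evaluation loop the article describes.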