Getting Started with APISIX, a Cloud-Native API Gateway
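Imagine if we could use Generative AI to put into words what a photo means, and let AI narrate those memorable moments. You could publish the text online, share it with family and friends, or keep it as a personal journal.
Since this is a tool I personally wanted to use, I decided to build it as a creative developer rather than as a researcher, ML engineer, or data scientist. I was keen to use and combine a set of powerful Google APIs to get the job done.
This article comes with a Jupyter/Colab notebook that contains the detailed steps of the whole solution. It covers the full pipeline, from extracting EXIF photo metadata, to using the Google Maps APIs to find out where each photo was taken, to calling Generative AI APIs (Vertex Imagen for image captioning and the Vertex PaLM API for blog post generation).
The output of the pipeline is a generated blog post that describes the whole photo album. You can upload your own album to the Colab notebook and see how Generative AI puts into words the moments your camera captured.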
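The project relies on Google Cloud Platform (GCP) for access to the APIs. To run it on Colab, you can use an existing GCP account or sign up for a new one and get $300 in free credits.
If you want to run the notebook on Colab with the provided photos or your own, its setup guide walks you through the following steps: installing the required libraries, authenticating to GCP with your Google identity, obtaining a Google Maps Platform API key, and enabling the following APIs: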
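In this part of the notebook, you configure the path to the folder that contains the album photos. The notebook processes the photos with the Pillow imaging library to perform the following tasks: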
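As a rough sketch of the EXIF step, the snippet below reads the capture time and GPS position from one photo with Pillow and converts the GPS values to decimal degrees. The function names `dms_to_decimal` and `read_photo_metadata` are illustrative and not taken from the notebook, and the sketch assumes the photo stores its GPS data in the standard EXIF GPS sub-directory.

```python
from datetime import datetime

GPS_IFD = 0x8825           # EXIF pointer to the GPS sub-directory
EXIF_IFD = 0x8769          # EXIF pointer to the main Exif sub-directory
DATETIME_ORIGINAL = 36867  # tag holding the original capture time


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus N/S/E/W to signed decimal degrees."""
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


def read_photo_metadata(path):
    """Return capture time and coordinates for one photo (None where missing)."""
    # Pillow is imported here so dms_to_decimal can be used without it installed.
    from PIL import Image

    with Image.open(path) as img:
        exif = img.getexif()
        gps = exif.get_ifd(GPS_IFD)
        taken = exif.get_ifd(EXIF_IFD).get(DATETIME_ORIGINAL)

    lat = lng = None
    if 2 in gps and 4 in gps:  # GPS tags 2 and 4 hold latitude and longitude
        lat = dms_to_decimal(gps[2], gps.get(1, "N"))
        lng = dms_to_decimal(gps[4], gps.get(3, "E"))
    # EXIF timestamps use colons in the date part, e.g. "2023:08:04 07:53:13".
    taken_at = datetime.strptime(taken, "%Y:%m:%d %H:%M:%S") if taken else None
    return {"taken_at": taken_at, "lat": lat, "lng": lng}
```

The resulting `lat` and `lng` values are what the Maps calls later in the article expect as coordinates.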
Google Maps offers many specialized APIs for different tasks. Here we use the following APIs:
Once the Maps Platform API key is set, calling the Geocoding API and the Places API is straightforward.
import googlemaps
gmaps = googlemaps.Client(key=MAPS_API_KEY)
locations = gmaps.reverse_geocode(latlng=(lat,lng))
nearby_places = gmaps.places_nearby(location=(lat,lng), radius=radius)
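The two calls above return nested JSON-like structures, so some flattening is needed before the results can go into a prompt. The sketch below shows one possible way to do that. The helper names and the `max_items` cut-off are assumptions for illustration, not the notebook's code. It relies only on the `address_components` / `long_name` fields of Geocoding results and the `results` / `name` fields of a Places Nearby Search response.

```python
def summarize_locations(geocode_results, max_items=5):
    """Collect distinct address component names from reverse-geocoding results."""
    names = []
    for result in geocode_results:
        for component in result.get("address_components", []):
            name = component.get("long_name")
            if name and name not in names:
                names.append(name)
    return names[:max_items]


def summarize_nearby(places_response, max_items=5):
    """Collect the names of nearby places from a Places Nearby Search response."""
    places = places_response.get("results", [])
    names = [place["name"] for place in places if "name" in place]
    return names[:max_items]
```

With the variables from the snippet above, usage would look like `summarize_locations(locations)` and `summarize_nearby(nearby_places)`.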
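In this part of the notebook, we start using Generative AI. Vertex Imagen provides an API for image captioning, that is, describing the contents of a picture as text.
To do this, we first need to initialize the Vertex AI SDK with your GCP project.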
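import vertexai
from vertexai.vision_models import ImageTextModel, Image

# PROJECT_ID and LOCATION are assumed to be defined earlier in the notebook.
vertexai.init(project=PROJECT_ID, location=LOCATION)

model = ImageTextModel.from_pretrained("imagetext")
source_image = Image.load_from_file(location=path)
# Ask the captioning model to describe the image as text.
captions = model.get_captions(image=source_image)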
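LLMs are usually fine-tuned to follow the instructions in a prompt, which lets them perform tasks they were not explicitly trained on. Designing a good prompt usually takes trial and error: you interact with the LLM and check whether the output comes close to (or exceeds) what you expected.
This project needs a prompt that instructs the LLM to generate a post describing the moments captured in a set of photos. That means writing specific instructions for the LLM and defining the input format (a list with photo metadata), the rules to follow (for example, citing the <Photo id> when describing a photo), and the expected output format, which is text interleaved with photo placeholders.
Prompt engineering techniques are prompt design patterns discovered and proposed by researchers or the community to help LLMs produce better output. Few-shot prompting is one of them: it requires us to provide some examples of input and expected output, as in the prompt template below. In my tests with the Vertex PaLM API, this technique helped produce the desired output in most cases.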
def generate_prompt(context, pictures_infos):
    # Few-shot prompt: instructions, one worked example, then the real album data.
    prompt = f"""
You are a copywriter and journalist.
Can you help me to write a photo tour that describes the moments
registered in a photo album from a context and some information
I provide about the photos?
The items were already sorted by the date and time the photos were taken.
Pay attention to the dates and time to infer how many days were
covered by these photos and at which time of the day they were taken.
Please include descriptions of all the photos taken.
Only report places or experiences that are described by the
photo informations.
The photos information has the following structure:
- <Photo id> | Date the photo was taken | Time the photo was taken |
Photo Description generated by an LLM |
Approximate Locations where the photo was taken |
Approximate Nearby locations where photo was taken
Here is an example of photo information and how it should be generated
in plain text, interleaving photo descriptions and the <Photo id>.
Example photo information:
- <Photo 0> | Date: 08/04/2023 | Time: 07:53:13 |
Photo Description: a man stands in front of a sign that says
welcome to the united states |
Possible Photo Locations: BURBERRY LAX TERMINAL B, Los Angeles
International Airport, Terminal B, Los Angeles, Los Angeles County |
Possible Photo Nearby locations: Los Angeles, Star Alliance Lounge,
ICE International Currency Exchange, Relay, Bank of America.
Expected output:
I was happy finally arriving to my destination, Los Angeles.
While I went into US Customs my heart was filled of anxiety
to leave the airport and get to visit the city.
<Photo 0>
Photos album context: {context}
Photos description:
{pictures_infos}
"""
    return prompt
In this example, we provide the photo metadata and a short context paragraph to generate the prompt from the template above.
album_context = """I flew to Los Angeles for a short trip,
and the album contains the photos
from the day I arrived there.
The man in those photos is myself.
"""
blog_prompt = generate_prompt(album_context, photos_info_concat)
You are a copywriter and journalist.
Can you help me write a photo tour that describes the moments registered in a
photo album from a context and some information I provide about the photos?
The items are already sorted by the time the photos was taken.
Pay attention to the dates and time to infer how many days were
covered by these photos and in which time of the day they were taken.
Please include descriptions of all the photos taken.
Do not report any place or experience that is not described by the
photo informations.
The photos information has following structure:
- <Photo id> | Date the photo was taken | Time the photo was taken |
Photo Description generated by an LLM |
Approximate Locations where the photo was taken |
Approximate Nearby locations where photo was taken
Here is an example of photo information and how it should be generated in plain
text, interleaving photo descriptions and the <Photo id>.
Example photo information:
- <Photo 0> | Date: 08/04/2023 | Time: 07:53:13 |
Photo Description: a man stands in front of a sign that says welcome to the
united states |
Possible Photo Locations: BURBERRY LAX TERMINAL B, Los Angeles International
Airport, Terminal B, Los Angeles, Los Angeles County |
Possible Photo Nearby locations: Los Angeles, Star Alliance Lounge, ICE
International Currency Exchange, Relay, Bank of America
Expected output:
I was happy finally arriving to my destination, Los Angeles.
While I went into US Customs my heart was filled of anxiety to leave the airport
and get to visit the city.
<Photo 0>
Photos album context: I flew to Los Angeles for a short trip, and the album
contains the photos from the day I arrived there. The man in those photos is myself.
Photos description:
- <Photo 0> | Date and time: 08/04/2023 (Friday) 07:53 AM | Photo Description: a
man stands in front of a sign that says welcome to the united states | Locations: BURBERRY
LAX TERMINAL B, Los Angeles International Airport, Los Angeles, Los Angeles County,
California | Possible Nearby locations: Los Angeles, Star Alliance Lounge, ICE
International Currency Exchange, Relay, Bank of America
- <Photo 1> | Date and time: 08/04/2023 (Friday) 09:32 AM | Photo Description: a man in a
nasa shirt is sitting in a white car | Locations: Los Angeles International Airport, Los
Angeles, Los Angeles County, California, United States | Possible Nearby locations: Los
- <Photo 2> | Date and time: 08/04/2023 (Friday) 09:59 AM | Photo Description: a man in a
white shirt is driving a mustang | Locations: Westchester, Los Angeles, Los Angeles
County, California, United States | Possible Nearby locations: Plaza Towers OBGYN:
Lawrence Bruksch, MD, LA Fitness, Dr. Jitsen Chang, Obstetrician-gynecologist, Kinecta
Federal Credit Union - Westchester, Clarity Retirement
- <Photo 3> | Date and time: 08/04/2023 (Friday) 10:29 AM | Photo Description: a man
wearing a nasa shirt stands on a beach | Locations: Los Angeles, Los Angeles County,
California, United States | Possible Nearby locations: Los Angeles, Venice
- <Photo 4> | Date and time: 08/04/2023 (Friday) 11:29 AM | Photo Description: a man sits
on a bench in front of a subba gump shrimp restaurant | Locations: Santa Monica, Los
Angeles County, California, United States | Possible Nearby locations: Bubba Gump Shrimp
Co., Santa Monica Pier Rock Shop, Pier Burger, Santa Monica Police Pier Substation, 66-To-
- <Photo 5> | Date and time: 08/04/2023 (Friday) 11:43 AM | Photo Description: a man
stands on a pier with a ferris wheel in the background | Locations: Santa Monica, Los
Angeles County, California, United States | Possible Nearby locations: Santa Monica Pier,
The eCenter, Character Drawings, Santa Monica Pier, ビーチ・サインズ&モア
- <Photo 6> | Date and time: 08/04/2023 (Friday) 11:46 AM | Photo Description: a man
stands on a pier with a seagull sitting on the railing | Locations: Santa Monica, Los
Angeles County, California, United States | Possible Nearby locations: Santa Monica,
Pacific Plunge, Inkie’s Scrambler, Fun 'N' Games, Pacific Wheel
- <Photo 7> | Date and time: 08/04/2023 (Friday) 11:52 AM | Photo Description: a man with
a backpack that says o'neill on it | Locations: Santa Monica, Los Angeles County,
California, United States | Possible Nearby locations: Coffee Bean & Tea Leaf, Japadog (at
Santa Monica Pier), Santa Monica Trapeze School, Pacific Park on the Santa Monica Pier,
Funnel Cakes
- <Photo 8> | Date and time: 08/04/2023 (Friday) 12:10 PM | Photo Description: a man poses
in front of the cheesecake factory | Locations: Downtown, Santa Monica, Los Angeles
County, California, United States | Possible Nearby locations: Forever 21, Tiffany & Co.,
Louis Vuitton Santa Monica Place, Pandora Jewelry, Johnny Was
- <Photo 9> | Date and time: 08/04/2023 (Friday) 12:32 PM | Photo Description: a plate of
food with a napkin that says the cheesecake factory | Locations: Downtown, Santa Monica,
Los Angeles County, California, United States | Possible Nearby locations: Forever 21,
Tesla, Nike Santa Monica, Louis Vuitton Santa Monica Place, Pandora Jewelry
- <Photo 10> | Date and time: 08/04/2023 (Friday) 01:15 PM | Photo Description: a man
stands in front of a blue tesla model x | Locations: Downtown, Santa Monica, Los Angeles
County, California, United States | Possible Nearby locations: Forever 21, Tiffany & Co.,
Louis Vuitton Santa Monica Place, Pandora Jewelry, Johnny Was
- <Photo 11> | Date and time: 08/04/2023 (Friday) 05:03 PM | Photo Description: a green
trolley is parked in front of a gap store | Locations: La Brea, Central LA, Los Angeles,
Los Angeles County, California | Possible Nearby locations: Haagen-Dazs Ice Cream Shops,
Wetzel's Pretzels, Nike The Grove, Gap, Bar Verde
- <Photo 12> | Date and time: 08/04/2023 (Friday) 05:44 PM | Photo Description: a variety
of caramel apples are displayed in a store | Locations: La Brea, Central LA, Los Angeles,
Los Angeles County, California | Possible Nearby locations: Los Angeles, The Original
Farmers Market, The Dog Bakery - Fresh Baked Treats & Dog Birthday Cakes, Marconda's,
Littlejohn's English Toffee House & Fine Candies
- <Photo 13> | Date and time: 08/04/2023 (Friday) 06:01 PM | Photo Description: a man is
holding a scoop of ice cream in front of a sign that says " drinks " | Locations: Farmers
Market, La Brea, Central LA, Los Angeles, Los Angeles County | Possible Nearby locations:
Los Angeles, The Original Farmers Market, Littlejohn's English Toffee House & Fine
Candies, Hutchco Technologies, Marconda's
- <Photo 14> | Date and time: 08/04/2023 (Friday) 06:06 PM | Photo Description: cars are
parked in front of a ross store | Locations: 3rd / Ogden, La Brea, Central LA, Los
Angeles, Los Angeles County | Possible Nearby locations: A1 Locksmith & Keys, GapBody, 3rd
/ Ogden, 3rd & Ogden (Eastbound), Karsaz & Associates
- <Photo 15> | Date and time: 08/04/2023 (Friday) 09:33 PM | Photo Description: a hotel
room with a blue blanket on the bed | Locations: Eagle Rock, Northeast Los Angeles, Los
Angeles, Los Angeles County, California | Possible Nearby locations: Welcome Inn, North
East Los Angeles Hotel Owners Association, Kandoo Kitchen, Inland Faculty Medical Group
Inc, Pathway Healthcare
- <Photo 16> | Date and time: 08/04/2023 (Friday) 09:58 PM | Photo Description: two boxes
of food on a table with a fork | Locations: Eagle Rock, Northeast Los Angeles, Los
Angeles, Los Angeles County, California | Possible Nearby locations: Welcome Inn, North
East Los Angeles Hotel Owners Association, MV, Inland Faculty Medical Group Inc, Pathway
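The photo lines in the prompt above follow a fixed pipe-separated layout. A minimal sketch of how such lines could be assembled is shown below. The helper names and the dictionary keys (`taken_at`, `caption`, `locations`, `nearby`) are hypothetical, chosen for illustration rather than taken from the notebook.

```python
from datetime import datetime


def format_photo_info(photo_id, taken_at, caption, locations, nearby):
    """Render one photo as a pipe-separated line in the layout used by the prompt."""
    when = taken_at.strftime("%m/%d/%Y (%A) %I:%M %p")
    return (
        f"- <Photo {photo_id}> | Date and time: {when} | "
        f"Photo Description: {caption} | "
        f"Locations: {', '.join(locations)} | "
        f"Possible Nearby locations: {', '.join(nearby)}"
    )


def concat_photo_infos(photos):
    """Sort photos by capture time, number them, and join them one per line."""
    ordered = sorted(photos, key=lambda photo: photo["taken_at"])
    return "\n".join(
        format_photo_info(
            index, photo["taken_at"], photo["caption"],
            photo["locations"], photo["nearby"],
        )
        for index, photo in enumerate(ordered)
    )
```

Sorting before numbering keeps the `<Photo id>` values in chronological order, which matches the instruction in the prompt that the items are already sorted by time.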
Now we will use the TextGenerationModel from the Vertex PaLM API.
from vertexai.language_models import TextGenerationModel
generation_model = TextGenerationModel.from_pretrained("text-bison")
def generate_text(prompt, temperature=1.0,
                  top_p=0.4, top_k=40, max_output_tokens=1024):
    parameters = {
        # Temperature controls the degree of randomness in token selection.
        "temperature": temperature,
        # Tokens are selected from most probable to least until the sum
        # of their probabilities equals the top_p value.
        "top_p": top_p,
        # A top_k of 1 means the selected token is the most probable
        # among all tokens.
        "top_k": top_k,
        # Token limit determines the maximum amount of text output.
        "max_output_tokens": max_output_tokens,
    }
    generated_text = generation_model.predict(prompt=prompt, **parameters).text
    return generated_text
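The output of the PaLM API is a generated post in which <Photo id> placeholders are interleaved with the descriptions. The LLM decides where in the text to include the photos. Here is an example. Afterwards, I use a regular expression to find these photo placeholders and replace them with the actual photos.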
I was happy finally arriving to my destination, Los Angeles.
While I went into US Customs my heart was filled of anxiety
to leave the airport and get to visit the city.
<Photo 0>
I rented a car and drove to my hotel in Eagle Rock.
The hotel was nice and comfortable.
<Photo 14>
The next morning I went to Santa Monica Pier.
I had lunch at Bubba Gump Shrimp Co. and then walked around the pier.
<Photo 4>, <Photo 5>, <Photo 6>, <Photo 7>
In the afternoon I went to the Cheesecake Factory.
I had a delicious meal and then went shopping at the mall.
<Photo 8>, <Photo 9>, <Photo 10>
In the evening I went to Farmers Market.
I bought some caramel apples and ice cream.
<Photo 11>, <Photo 12>, <Photo 13>
It was a long day but I had a lot of fun.
I can't wait to explore more of Los Angeles tomorrow.
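The placeholder replacement mentioned above can be sketched with Python's `re` module. This is an illustrative version, not the notebook's code: the function names are made up, and it assumes the photos should be embedded as HTML `<img>` tags. The second helper lists photos the model never referenced, which can happen (the sample post above, for instance, does not mention every photo in the album).

```python
import re

# Matches placeholders such as "<Photo 12>" and captures the photo number.
PHOTO_PLACEHOLDER = re.compile(r"<Photo (\d+)>")


def embed_photos(post, photo_paths):
    """Replace each <Photo N> placeholder with an <img> tag for photo N."""
    def to_img(match):
        index = int(match.group(1))
        if index >= len(photo_paths):
            # Unknown photo id: leave the placeholder untouched.
            return match.group(0)
        return f'<img src="{photo_paths[index]}" alt="Photo {index}">'

    return PHOTO_PLACEHOLDER.sub(to_img, post)


def missing_photos(post, photo_count):
    """Return the ids of photos that the generated post never references."""
    used = {int(number) for number in PHOTO_PLACEHOLDER.findall(post)}
    return sorted(set(range(photo_count)) - used)
```

A non-empty result from `missing_photos` is a cheap signal that the post ignored part of the album and that a new generation request may be worthwhile.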
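Below you will see examples of posts that "Photo StoryTelling" generated for two of my trips. LLM output is non-deterministic and varies in quality and in fidelity to the facts described in the prompt. To generate different responses, you may want to try different settings for temperature, top_p, and top_k, or simply send a new request to the TextGenerationModel.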
album_context = """I flew to Los Angeles for a short trip,
and the album contains the photos
from the day I arrived there.
The man in those photos is myself.
"""
blog_prompt = generate_prompt(album_context, photos_info_concat)
generated_post = generate_text(prompt=blog_prompt)
album_context_sf = """Me and my wife travelled to San Francisco.
We spent a single day there. We rented a car in SF and visited
many places during that day.
The man in the pictures is myself and the woman is my wife.
"""
blog_prompt_sf = generate_prompt(album_context_sf, photos_info_concat_sf)
generated_post_sf = generate_text(prompt=blog_prompt_sf)
We drove to Fort Mason and walked around Ghirardelli Square. We had some delicious chocolate and ice cream.
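Since the output varies between requests, one way to explore different temperature, top_p, and top_k settings is to generate a post per configuration and compare them side by side. The sketch below is a hypothetical helper, not part of the notebook. It assumes a text generation function that accepts `prompt`, `temperature`, `top_p`, and `top_k` as keyword arguments, as `generate_text` does.

```python
from itertools import product


def sampling_grid(temperatures, top_ps, top_ks):
    """Build every combination of the given sampling settings."""
    return [
        {"temperature": t, "top_p": p, "top_k": k}
        for t, p, k in product(temperatures, top_ps, top_ks)
    ]


def generate_variants(generate_fn, prompt, configs):
    """Call generate_fn once per configuration and pair each config with its output."""
    return [(config, generate_fn(prompt=prompt, **config)) for config in configs]
```

For example, `generate_variants(generate_text, blog_prompt, sampling_grid([0.2, 1.0], [0.4, 0.8], [40]))` would send four requests and return four candidate posts to choose from.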
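If you have read this far, you can appreciate how powerful it is to combine data extraction (such as reading EXIF metadata from images), data enrichment (such as using the Google Maps APIs to resolve locations from geographic coordinates), prompt engineering (such as few-shot prompting), and Generative AI (such as Vertex Imagen and the PaLM API). In this case, these techniques together generated engaging blog posts that describe photo albums.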