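Photo StoryTelling: Creating with Your Photo Albums Using Generative AI and Google APIs
We take a huge number of photos with our smartphones these days, and share many of them on social networks or messaging apps. Sometimes, though, images alone cannot fully convey the moments we capture in everyday life, in time spent with family, or on a memorable trip.
What if we could use Generative AI to put into words what the photos mean, and let the AI tell the story of those moments? You could publish the text online, share it with friends and family, or keep it as a personal journal.
Because this is a tool I personally wanted to use, I decided to build it as a creative developer rather than as a researcher, ML engineer, or data scientist. What interested me was using and integrating a set of powerful Google APIs to get the job done.
This article comes with a Jupyter/Colab notebook that walks through the whole solution step by step: extracting EXIF metadata from the photos, using Google Maps APIs to get information about where each photo was taken, and calling Generative AI APIs (Vertex Imagen for image captioning and the Vertex PaLM API for generating the blog post).
The output of the pipeline is a generated blog post describing the whole photo album. You can upload your own album to the Colab notebook and see how Generative AI puts into words the moments your camera recorded.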
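Setup
The project relies on Google Cloud Platform (GCP) to access the APIs. To run it on Colab, you can use an existing GCP account or sign up for a new one, which comes with US$300 in free credits.
To run the notebook on Colab with the provided photos or your own, the notebook's setup guide walks you through installing the required libraries, authenticating to GCP with your Google account, obtaining a Google Maps Platform API key, and enabling the following APIs: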
- Geocoding API
- Places API
- Vertex AI APIs
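The last setup step downloads the sample photos I provide from my trips to Los Angeles and San Francisco.
Processing the photos
In this section of the notebook, you configure the path of the folder that holds the album photos. The notebook then uses the Pillow imaging library to process the photos as follows:
- Extract EXIF metadata. Digital photos usually carry metadata, which you can inspect by viewing the file's properties. For this project we care about the date and time the photo was taken and its geolocation (latitude/longitude).
- Resize to a smaller size (longest side, width or height, at most 800 pixels) to minimize network traffic in API requests.
- Convert to JPG, because the API does not support the HEIF format.
- Save the converted photos to a subfolder named "photos_converted".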
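Two of the steps above involve a little arithmetic: turning EXIF GPS values into decimal latitude/longitude, and working out the reduced size. The following is a minimal sketch in plain Python, not the notebook's code. The helper names `dms_to_decimal` and `fit_within` are hypothetical, the coordinates are made up, and the sketch assumes the GPS values have already been read from the EXIF data as (degrees, minutes, seconds) plus a hemisphere letter, which is how EXIF stores them.

```python
def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) to signed decimal degrees."""
    degrees, minutes, seconds = (float(v) for v in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative.
    return -decimal if ref in ("S", "W") else decimal


def fit_within(width, height, max_side=800):
    """Return (width, height) scaled so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never upscale small photos
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# Illustrative values only (not taken from the sample album).
lat = dms_to_decimal((33, 56, 33.0), "N")
lng = dms_to_decimal((118, 24, 29.0), "W")
print(round(lat, 4), round(lng, 4))
print(fit_within(4032, 3024))
```

With Pillow, `Image.thumbnail((800, 800))` applies the same aspect-preserving shrink in place, so the size calculation is shown here only to make the rule explicit.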
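Extracting the location and nearby places with Google Maps APIs
Google Maps offers many specialized APIs for different tasks. Here we use the following:
- Reverse geocoding: returns likely location information for a pair of geographic coordinates.
- Nearby places: returns places near the coordinates (for example landmarks, buildings, restaurants).
Once the Maps Platform API key is set up, calling the Geocoding API and the Places API is straightforward.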
import googlemaps
gmaps = googlemaps.Client(key=MAPS_API_KEY)
locations = gmaps.reverse_geocode(latlng=(lat,lng))
nearby_places = gmaps.places_nearby(location=(lat,lng), radius=radius)
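Both calls return plain Python data: `reverse_geocode` gives a list of result dicts, each with a `formatted_address`, and `places_nearby` gives a dict whose `results` list holds places with a `name`. As a rough sketch of how those responses might be reduced to short strings for the prompt, the helpers below (hypothetical names, not from the notebook) run on hand-written sample data shaped like the responses rather than on real API output.

```python
def summarize_locations(geocode_results, limit=1):
    """Return the first `limit` formatted addresses from reverse_geocode output."""
    addresses = [r["formatted_address"] for r in geocode_results
                 if "formatted_address" in r]
    return addresses[:limit]


def summarize_nearby(places_response, limit=5):
    """Return up to `limit` place names from a places_nearby response."""
    names = [p["name"] for p in places_response.get("results", [])
             if "name" in p]
    return names[:limit]


# Hand-written sample data shaped like the API responses (not real output).
sample_geocode = [
    {"formatted_address": "Santa Monica Pier, Santa Monica, CA 90401, USA"},
    {"formatted_address": "Santa Monica, CA, USA"},
]
sample_places = {"results": [{"name": "Pacific Park"}, {"name": "Pier Burger"}]}

locations_text = ", ".join(summarize_locations(sample_geocode))
nearby_text = ", ".join(summarize_nearby(sample_places))
print(locations_text)
print(nearby_text)
```

Capping the number of nearby places keeps the prompt short; the limit of five is an assumption that happens to match the sample prompt later in the article.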
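Photo captioning with Generative AI and Vertex Imagen
In this section of the notebook we start using Generative AI. Vertex Imagen provides an image captioning API, that is, one that describes the content of a picture as text.
To use it, we first need to initialize the Vertex AI SDK with your GCP project.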
import vertexai
from vertexai.vision_models import ImageTextModel, Image
vertexai.init(project=PROJECT_ID)
model = ImageTextModel.from_pretrained("imagetext")
Getting a caption from an image is then simple.
source_image = Image.load_from_file(location=path)
captions = model.get_captions(
    image=source_image,
    number_of_results=1,
    language="en",
)
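Designing the prompt
In the context of large language models (LLMs), a prompt is the input or query given to the model to guide it toward a response. The quality and specificity of the prompt are critical in shaping the model's output.
LLMs are usually fine-tuned to follow the instructions in a prompt, which lets them perform tasks they were not specifically trained on. Designing a good prompt usually takes trial and error: interacting with the LLM and checking whether the output is close to (or better than) what you expected.
This project needs a prompt that instructs the LLM to generate a post describing the moments captured in a set of photos. That means writing specific instructions for the LLM that spell out the input format (a list of photo metadata), the rules to follow (for example, referencing <Photo id> when describing a photo), and the expected output format, namely text interleaved with photo placeholders.
You can refer to the prompt template I provide, which is wrapped in the function below. Note that the template has placeholders for the photo descriptions and for a context paragraph, since you may want to give the LLM more background on the circumstances in which the photos were taken.
Prompt engineering
Prompt engineering techniques are a set of prompt design patterns, discovered and proposed by researchers or the community, that help LLMs produce better output. Few-shot prompting is one of them: it asks us to supply a few examples of input and expected output, as the prompt template below does. In my tests with the Vertex PaLM API, this technique helped get the desired output in most cases.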
def generate_prompt(context, pictures_infos):
    prompt = f"""
You are a copywriter and journalist.
Can you help me to write a photo tour that describes the moments
registered in a photo album from a context and some information
I provide about the photos?
The items were already sorted by the date and time the photos were taken.
Pay attention to the dates and time to infer how many days were
covered by these photos and at which time of the day they were taken.
Please include descriptions of all the photos taken.
Only report places or experiences that are described by the
photo informations.
The photos information has the following structure:
- <Photo id> | Date the photo was taken | Time the photo was taken |
Photo Description generated by an LLM |
Approximate Locations where the photo was taken |
Approximate Nearby locations where photo was taken
Here is an example of photo information and how it should be generated
in plain text, interleaving photo descriptions and the <Photo id>.
Example photo information:
- <Photo 0> | Date: 08/04/2023 | Time: 07:53:13 |
Photo Description: a man stands in front of a sign that says
welcome to the united states |
Possible Photo Locations: BURBERRY LAX TERMINAL B, Los Angeles
International Airport, Terminal B, Los Angeles, Los Angeles County |
Possible Photo Nearby locations: Los Angeles, Star Alliance Lounge,
ICE International Currency Exchange, Relay, Bank of America.
Expected output:
I was happy finally arriving to my destination, Los Angeles.
While I went into US Customs my heart was filled of anxiety
to leave the airport and get to visit the city.
<Photo 0>
```
Photos album context: {context}
Photos description:
{pictures_infos}
```
"""
    return prompt
Generating the prompt
In this example we supply the photo metadata and a short context paragraph to generate the prompt from the template above.
album_context = """I flew to Los Angeles for a short trip,
and the album contains the photos
from the day I arrived there.
The man in those photos is myself.
"""
blog_prompt = generate_prompt(album_context, photos_info_concat)
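The `photos_info_concat` string passed in above is built earlier in the notebook and is not shown in this article. As a hedged sketch of how such a string might be assembled, the hypothetical helper `format_photo_info` below produces one line per photo in the same layout as the generated prompt that follows. The field order and the sample values are taken from that prompt; the seconds in the timestamps are illustrative.

```python
from datetime import datetime


def format_photo_info(photo_id, exif_datetime, caption, locations, nearby):
    """Build one '- <Photo N> | ...' line in the layout of the generated prompt."""
    # EXIF stores timestamps as 'YYYY:MM:DD HH:MM:SS'.
    taken = datetime.strptime(exif_datetime, "%Y:%m:%d %H:%M:%S")
    return (
        f"- <Photo {photo_id}> | "
        f"Date and time: {taken.strftime('%m/%d/%Y (%A) %I:%M %p')} | "
        f"Photo Description: {caption} | "
        f"Locations: {', '.join(locations)} | "
        f"Possible Nearby locations: {', '.join(nearby)}"
    )


# (exif_datetime, caption, locations, nearby places), deliberately out of order.
photos = [
    ("2023:08:04 09:32:00", "a man in a nasa shirt is sitting in a white car",
     ["Los Angeles International Airport", "Los Angeles"], ["Los Angeles"]),
    ("2023:08:04 07:53:13",
     "a man stands in front of a sign that says welcome to the united states",
     ["BURBERRY LAX TERMINAL B", "Los Angeles International Airport"],
     ["Star Alliance Lounge", "Relay"]),
]

# EXIF timestamp strings sort chronologically, so a plain sort orders the album.
photos.sort(key=lambda p: p[0])
photos_info_concat = "\n".join(
    format_photo_info(i, *p) for i, p in enumerate(photos)
)
print(photos_info_concat)
```

Assigning the photo ids after sorting keeps them in the order the prompt promises ("already sorted by the date and time the photos were taken").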
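Now give it a try. Just copy the prompt below, produced by this process, and paste it into a consumer LLM chat system (for example, Bard).
You might be as impressed with the results as I was 😀!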
You are a copywriter and journalist.
Can you help me write a photo tour that describes the moments registered in a
photo album from a context and some information I provide about the photos?
The items are already sorted by the time the photos was taken.
Pay attention to the dates and time to infer how many days were
covered by these photos and in which time of the day they were taken.
Please include descriptions of all the photos taken.
Do not report any place or experience that is not described by the
photo informations.
The photos information has following structure:
- <Photo id> | Date the photo was taken | Time the photo was taken |
Photo Description generated by an LLM |
Approximate Locations where the photo was taken |
Approximate Nearby locations where photo was taken
Here is an example of photo information and how it should be generated in plain
text,
interleaving photo descriptions and the <Photo id>.
Example photo information:
- <Photo 0> | Date: 08/04/2023 | Time: 07:53:13 |
Photo Description: a man stands in front of a sign that says welcome to the
united states |
Possible Photo Locations: BURBERRY LAX TERMINAL B, Los Angeles International
Airport, Terminal B, Los Angeles, Los Angeles County |
Possible Photo Nearby locations: Los Angeles, Star Alliance Lounge, ICE
International Currency Exchange, Relay, Bank of America
Expected output:
I was happy finally arriving to my destination, Los Angeles.
While I went into US Customs my heart was filled of anxiety to leave the airport
and get to visit the city.
<Photo 0>
```
Photos album context: I flew to Los Angeles for a short trip, and the album
contains the photos from the day I arrived there. The man in those photos is myself.
Photos description:
- <Photo 0> | Date and time: 08/04/2023 (Friday) 07:53 AM | Photo Description: a
man stands in front of a sign that says welcome to the united states | Locations: BURBERRY
LAX TERMINAL B, Los Angeles International Airport, Los Angeles, Los Angeles County,
California | Possible Nearby locations: Los Angeles, Star Alliance Lounge, ICE
International Currency Exchange, Relay, Bank of America
- <Photo 1> | Date and time: 08/04/2023 (Friday) 09:32 AM | Photo Description: a man in a
nasa shirt is sitting in a white car | Locations: Los Angeles International Airport, Los
Angeles, Los Angeles County, California, United States | Possible Nearby locations: Los
Angeles
- <Photo 2> | Date and time: 08/04/2023 (Friday) 09:59 AM | Photo Description: a man in a
white shirt is driving a mustang | Locations: Westchester, Los Angeles, Los Angeles
County, California, United States | Possible Nearby locations: Plaza Towers OBGYN:
Lawrence Bruksch, MD, LA Fitness, Dr. Jitsen Chang, Obstetrician-gynecologist, Kinecta
Federal Credit Union - Westchester, Clarity Retirement
- <Photo 3> | Date and time: 08/04/2023 (Friday) 10:29 AM | Photo Description: a man
wearing a nasa shirt stands on a beach | Locations: Los Angeles, Los Angeles County,
California, United States | Possible Nearby locations: Los Angeles, Venice
- <Photo 4> | Date and time: 08/04/2023 (Friday) 11:29 AM | Photo Description: a man sits
on a bench in front of a subba gump shrimp restaurant | Locations: Santa Monica, Los
Angeles County, California, United States | Possible Nearby locations: Bubba Gump Shrimp
Co., Santa Monica Pier Rock Shop, Pier Burger, Santa Monica Police Pier Substation, 66-To-
Cali
- <Photo 5> | Date and time: 08/04/2023 (Friday) 11:43 AM | Photo Description: a man
stands on a pier with a ferris wheel in the background | Locations: Santa Monica, Los
Angeles County, California, United States | Possible Nearby locations: Santa Monica Pier,
The eCenter, Character Drawings, Santa Monica Pier, ビーチ・サインズ&モア
- <Photo 6> | Date and time: 08/04/2023 (Friday) 11:46 AM | Photo Description: a man
stands on a pier with a seagull sitting on the railing | Locations: Santa Monica, Los
Angeles County, California, United States | Possible Nearby locations: Santa Monica,
Pacific Plunge, Inkie’s Scrambler, Fun 'N' Games, Pacific Wheel
- <Photo 7> | Date and time: 08/04/2023 (Friday) 11:52 AM | Photo Description: a man with
a backpack that says o'neill on it | Locations: Santa Monica, Los Angeles County,
California, United States | Possible Nearby locations: Coffee Bean & Tea Leaf, Japadog (at
Santa Monica Pier), Santa Monica Trapeze School, Pacific Park on the Santa Monica Pier,
Funnel Cakes
- <Photo 8> | Date and time: 08/04/2023 (Friday) 12:10 PM | Photo Description: a man poses
in front of the cheesecake factory | Locations: Downtown, Santa Monica, Los Angeles
County, California, United States | Possible Nearby locations: Forever 21, Tiffany & Co.,
Louis Vuitton Santa Monica Place, Pandora Jewelry, Johnny Was
- <Photo 9> | Date and time: 08/04/2023 (Friday) 12:32 PM | Photo Description: a plate of
food with a napkin that says the cheesecake factory | Locations: Downtown, Santa Monica,
Los Angeles County, California, United States | Possible Nearby locations: Forever 21,
Tesla, Nike Santa Monica, Louis Vuitton Santa Monica Place, Pandora Jewelry
- <Photo 10> | Date and time: 08/04/2023 (Friday) 01:15 PM | Photo Description: a man
stands in front of a blue tesla model x | Locations: Downtown, Santa Monica, Los Angeles
County, California, United States | Possible Nearby locations: Forever 21, Tiffany & Co.,
Louis Vuitton Santa Monica Place, Pandora Jewelry, Johnny Was
- <Photo 11> | Date and time: 08/04/2023 (Friday) 05:03 PM | Photo Description: a green
trolley is parked in front of a gap store | Locations: La Brea, Central LA, Los Angeles,
Los Angeles County, California | Possible Nearby locations: Haagen-Dazs Ice Cream Shops,
Wetzel's Pretzels, Nike The Grove, Gap, Bar Verde
- <Photo 12> | Date and time: 08/04/2023 (Friday) 05:44 PM | Photo Description: a variety
of caramel apples are displayed in a store | Locations: La Brea, Central LA, Los Angeles,
Los Angeles County, California | Possible Nearby locations: Los Angeles, The Original
Farmers Market, The Dog Bakery - Fresh Baked Treats & Dog Birthday Cakes, Marconda's,
Littlejohn's English Toffee House & Fine Candies
- <Photo 13> | Date and time: 08/04/2023 (Friday) 06:01 PM | Photo Description: a man is
holding a scoop of ice cream in front of a sign that says " drinks " | Locations: Farmers
Market, La Brea, Central LA, Los Angeles, Los Angeles County | Possible Nearby locations:
Los Angeles, The Original Farmers Market, Littlejohn's English Toffee House & Fine
Candies, Hutchco Technologies, Marconda's
- <Photo 14> | Date and time: 08/04/2023 (Friday) 06:06 PM | Photo Description: cars are
parked in front of a ross store | Locations: 3rd / Ogden, La Brea, Central LA, Los
Angeles, Los Angeles County | Possible Nearby locations: A1 Locksmith & Keys, GapBody, 3rd
/ Ogden, 3rd & Ogden (Eastbound), Karsaz & Associates
- <Photo 15> | Date and time: 08/04/2023 (Friday) 09:33 PM | Photo Description: a hotel
room with a blue blanket on the bed | Locations: Eagle Rock, Northeast Los Angeles, Los
Angeles, Los Angeles County, California | Possible Nearby locations: Welcome Inn, North
East Los Angeles Hotel Owners Association, Kandoo Kitchen, Inland Faculty Medical Group
Inc, Pathway Healthcare
- <Photo 16> | Date and time: 08/04/2023 (Friday) 09:58 PM | Photo Description: two boxes
of food on a table with a fork | Locations: Eagle Rock, Northeast Los Angeles, Los
Angeles, Los Angeles County, California | Possible Nearby locations: Welcome Inn, North
East Los Angeles Hotel Owners Association, MV, Inland Faculty Medical Group Inc, Pathway
Healthcare
```
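Generating the post
Now we use the TextGenerationModel from the Vertex PaLM API to submit the prompt designed earlier and get the generated post. You can tune parameters such as temperature, top_k, and top_p to control how random or creative the generated text is, as described in the code comments and the API documentation.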
from vertexai.language_models import TextGenerationModel
generation_model = TextGenerationModel.from_pretrained("text-bison")
def generate_text(prompt, temperature=1.0,
                  top_p=0.4, top_k=40, max_output_tokens=1024):
    parameters = {
        # Temperature controls the degree of randomness in token selection.
        "temperature": temperature,
        # Tokens are selected from most probable to least until the sum
        # of their probabilities equals the top_p value.
        "top_p": top_p,
        # A top_k of 1 means the selected token is the most probable
        # among all tokens.
        "top_k": top_k,
        # Token limit determines the maximum amount of text output.
        "max_output_tokens": max_output_tokens,
    }
    generated_text = generation_model.predict(prompt=prompt, **parameters).text
    return generated_text
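The output of the PaLM API is a generated post in which <Photo id> placeholders are interleaved with the descriptions. The LLM decides where in the text to place the photos. Here is an example. Afterwards, I use a regular expression to find the photo placeholders and replace them with the actual photos.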
I was happy finally arriving to my destination, Los Angeles.
While I went into US Customs my heart was filled of anxiety
to leave the airport and get to visit the city.
<Photo 0>
I rented a car and drove to my hotel in Eagle Rock.
The hotel was nice and comfortable.
<Photo 14>
The next morning I went to Santa Monica Pier.
I had lunch at Bubba Gump Shrimp Co. and then walked around the pier.
<Photo 4>, <Photo 5>, <Photo 6>, <Photo 7>
In the afternoon I went to the Cheesecake Factory.
I had a delicious meal and then went shopping at the mall.
<Photo 8>, <Photo 9>, <Photo 10>
In the evening I went to Farmers Market.
I bought some caramel apples and ice cream.
<Photo 11>, <Photo 12>, <Photo 13>
It was a long day but I had a lot of fun.
I can't wait to explore more of Los Angeles tomorrow.
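The placeholder replacement mentioned above can be sketched with Python's `re` module. This is one possible approach, not the notebook's code: the helper name `embed_photos`, the HTML `<img>` output, and the file names are all assumptions made for illustration.

```python
import re

# Matches placeholders such as <Photo 0> or <Photo 14> and captures the id.
PHOTO_TAG = re.compile(r"<Photo (\d+)>")


def embed_photos(post, photo_paths):
    """Replace each <Photo N> placeholder with an HTML <img> tag."""
    def to_img(match):
        index = int(match.group(1))
        if 0 <= index < len(photo_paths):
            return f'<img src="{photo_paths[index]}" alt="Photo {index}">'
        return match.group(0)  # leave ids without a matching photo untouched

    return PHOTO_TAG.sub(to_img, post)


# Hypothetical file names; only two photos are available in this example.
paths = ["photos_converted/img0.jpg", "photos_converted/img1.jpg"]
post = "I arrived in Los Angeles.\n<Photo 0>\nThen the pier.\n<Photo 1>, <Photo 7>"
html = embed_photos(post, paths)
print(html)
```

Leaving unknown ids in place, rather than raising an error, makes it easy to spot a placeholder the model invented.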
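Below you will see sample posts that "Photo StoryTelling" generated for two of my trips. LLM output is non-deterministic and varies in quality and in fidelity to the facts described in the prompt. To get a different response, you may want to try different settings for temperature, top_p, and top_k, or simply send a new request to the TextGenerationModel.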
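Because the output varies from run to run, one cheap sanity check before accepting a post is whether it references every photo, as the prompt requests. The sketch below is an assumption about how that could be done, using a hypothetical helper `missing_photo_ids`. The sample string carries the same placeholders as the example output shown earlier, generated from a 17-photo album.

```python
import re


def missing_photo_ids(post, photo_count):
    """Return the ids in range(photo_count) that the post never references."""
    referenced = {int(n) for n in re.findall(r"<Photo (\d+)>", post)}
    return [i for i in range(photo_count) if i not in referenced]


# Placeholders copied from the example output above; prose omitted for brevity.
sample_post = (
    "<Photo 0>\n<Photo 14>\n"
    "<Photo 4>, <Photo 5>, <Photo 6>, <Photo 7>\n"
    "<Photo 8>, <Photo 9>, <Photo 10>\n"
    "<Photo 11>, <Photo 12>, <Photo 13>"
)
missing = missing_photo_ids(sample_post, photo_count=17)
print(missing)
```

A non-empty result could be used as a signal to send a new request, possibly with different sampling parameters.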
AI-generated post about my trip to Los Angeles
album_context = """I flew to Los Angeles for a short trip,
and the album contains the photos
from the day I arrived there.
The man in those photos is myself.
"""
blog_prompt = generate_prompt(album_context, photos_info_concat)
generated_post = generate_text(prompt=blog_prompt)
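I was happy to finally arrive at my destination, Los Angeles.
As I went through US Customs, I was anxious to leave the airport and explore the city.
I took a taxi to my hotel in Eagle Rock. The room was small but comfortable.
After dropping off my luggage, I went out to explore the city. I first drove to the Santa Monica Pier.
I walked along the pier and took in the sights and sounds of the ocean. I even rode the Ferris wheel!
After spending some time at the pier, I went to The Cheesecake Factory for lunch. The food was delicious and the service was excellent.
In the afternoon, I went to the Farmers Market in La Brea. I bought some fresh produce and flowers. I also had some ice cream.
I went back to my hotel room to relax and wrap up the day.
I was tired from all the exploring, but I was also excited to see what the next day would bring.
AI-generated post about my trip to San Francisco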
album_context_sf = """Me and my wife travelled to San Francisco.
We spent a single day there. We rented a car in SF and visited
many places during that day.
The man in the pictures is myself and the woman is my wife.
"""
blog_prompt_sf = generate_prompt(album_context_sf, photos_info_concat_sf)
generated_post_sf = generate_text(prompt=blog_prompt_sf)
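My wife and I went to San Francisco. We spent a day there. We rented a car in San Francisco and visited many places that day.
We started the day at San Francisco International Airport. We were excited to finally be in San Francisco and ready to explore the city.
We drove to Russian Hill and found a parking spot. We walked around the neighborhood and took in the sights and sounds.
We walked onto the Golden Gate Bridge and took some photos. It was a sunny day and the bridge was stunning.
We sat on a bench and watched the boats go by. It was so peaceful and relaxing.
We walked back to the car and drove to the Marina District. We strolled along the lake and enjoyed the view.
We stopped at a restaurant in the Marina District for dinner. The food was delicious and the atmosphere was lively.
After dinner we walked around the Marina District and took some more photos.
We drove to Fisherman's Wharf and walked around the shops and restaurants. We had some delicious seafood for dinner.
After dinner, we walked around Fisherman's Wharf and took some more photos. We really enjoyed our time in this neighborhood.
We drove to Fort Mason and walked around Ghirardelli Square. We had some delicious chocolate and ice cream.
After dinner we walked around Fort Mason and took some more photos.
We had a wonderful time in San Francisco and can't wait to come back soon.
Conclusion
If you have read this far, you can appreciate how powerful it is to combine data extraction (such as reading EXIF metadata from images), data enrichment (such as using Google Maps APIs to resolve locations from geographic coordinates), prompt engineering (such as few-shot prompting), and Generative AI (such as Vertex Imagen and the PaLM API). In this case, these techniques together produced an engaging blog post describing a photo album.
I hope you enjoy this project and feel like trying it yourself, perhaps with your own photos, to see what kind of blog post it generates about your own special moments!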