Introduction
I tried analyzing a remote image with the Image Analysis client SDK.
This post follows Microsoft's Quickstart: Image Analysis 4.0.
Implementation
Creating a Computer Vision resource
Create a Computer Vision resource in the Azure Portal.
Setting environment variables
Create a .env file and set the Computer Vision API key and endpoint in it.
.env
VISION_KEY = "<YOUR KEY>"
VISION_ENDPOINT = "<YOUR ENDPOINT>"
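You can find the API key and endpoint on the page of the Computer Vision resource you just created.
Running the Python file
Run the code from Quickstart: Image Analysis 4.0. (I have added brief explanatory comments to the code below.)
The image to analyze is the same one used in the quickstart.
test.py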
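Before calling the service, it can help to confirm that both values are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_vars` is hypothetical and not part of the quickstart. It assumes the variables have already been loaded into the process environment (for example by `load_dotenv()`).

```python
import os

# The two names used in the .env file above
REQUIRED_VARS = ("VISION_KEY", "VISION_ENDPOINT")


def missing_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty.

    `env` is any mapping; it defaults to the process environment.
    (Hypothetical helper for illustration, not part of the SDK.)
    """
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All settings found.")
```

Running this check first gives a readable message instead of the `KeyError` that `os.environ["VISION_KEY"]` would raise when a value is missing.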
# Import the dotenv library and load the environment variables
from dotenv import load_dotenv
load_dotenv()

# Import the required libraries
import os
import azure.ai.vision as sdk

# Get the Azure Vision service credentials from the environment variables
service_options = sdk.VisionServiceOptions(os.environ["VISION_ENDPOINT"],
                                           os.environ["VISION_KEY"])

# Specify the URL of the image to analyze
vision_source = sdk.VisionSource(
    url="https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png")

# Set up the image analysis options
analysis_options = sdk.ImageAnalysisOptions()

# Specify the features to use in the analysis (caption and text)
analysis_options.features = (
    sdk.ImageAnalysisFeature.CAPTION |
    sdk.ImageAnalysisFeature.TEXT
)

# Specify the language for the analysis (English here)
analysis_options.language = "en"

# Specify whether to use gender-neutral captions
analysis_options.gender_neutral_caption = True

# Create the image analyzer object
image_analyzer = sdk.ImageAnalyzer(service_options, vision_source, analysis_options)

# Run the image analysis
result = image_analyzer.analyze()

# Check whether the analysis succeeded
if result.reason == sdk.ImageAnalysisResultReason.ANALYZED:

    # Print the caption if there is one
    if result.caption is not None:
        print(" Caption:")
        print("   '{}', Confidence {:.4f}".format(result.caption.content, result.caption.confidence))

    # Print the text if there is any
    if result.text is not None:
        print(" Text:")
        for line in result.text.lines:
            # Convert the bounding polygon coordinates to a string and print them
            points_string = "{" + ", ".join([str(int(point)) for point in line.bounding_polygon]) + "}"
            print("   Line: '{}', Bounding polygon {}".format(line.content, points_string))
            for word in line.words:
                points_string = "{" + ", ".join([str(int(point)) for point in word.bounding_polygon]) + "}"
                print("     Word: '{}', Bounding polygon {}, Confidence {:.4f}"
                      .format(word.content, points_string, word.confidence))

else:
    # If the analysis failed, print the error details
    error_details = sdk.ImageAnalysisErrorDetails.from_result(result)
    print(" Analysis failed.")
    print("   Error reason: {}".format(error_details.reason))
    print("   Error code: {}".format(error_details.error_code))
    print("   Error message: {}".format(error_details.message))
The output is as follows.
 Caption:
   'a person pointing at a screen', Confidence 0.7768
 Text:
   Line: '9:35 AM', Bounding polygon {130, 129, 215, 130, 215, 149, 130, 148}
     Word: '9:35', Bounding polygon {131, 130, 171, 130, 171, 149, 130, 149}, Confidence 0.9930
     Word: 'AM', Bounding polygon {179, 130, 204, 130, 203, 149, 178, 149}, Confidence 0.9980
   Line: 'E Conference room 154584354', Bounding polygon {130, 153, 224, 154, 224, 161, 130, 161}
     Word: 'E', Bounding polygon {131, 154, 135, 154, 135, 161, 131, 161}, Confidence 0.1040
     Word: 'Conference', Bounding polygon {142, 154, 174, 154, 173, 161, 141, 161}, Confidence 0.9020
     Word: 'room', Bounding polygon {175, 154, 189, 155, 188, 161, 175, 161}, Confidence 0.7960
     Word: '154584354', Bounding polygon {192, 155, 224, 154, 223, 162, 191, 161}, Confidence 0.8640
   Line: '#: 555-173-4547', Bounding polygon {130, 163, 182, 164, 181, 171, 130, 170}
     Word: '#:', Bounding polygon {131, 163, 139, 164, 139, 171, 131, 171}, Confidence 0.0360
     Word: '555-173-4547', Bounding polygon {142, 164, 182, 165, 181, 171, 142, 171}, Confidence 0.5970
   Line: 'Town Hall', Bounding polygon {546, 180, 590, 180, 590, 190, 546, 190}
     Word: 'Town', Bounding polygon {547, 181, 568, 181, 568, 190, 546, 191}, Confidence 0.9810
     Word: 'Hall', Bounding polygon {570, 181, 590, 181, 590, 191, 570, 190}, Confidence 0.9910
   Line: '9:00 AM - 10:00 AM', Bounding polygon {546, 191, 596, 192, 596, 200, 546, 199}
     Word: '9:00', Bounding polygon {546, 192, 555, 192, 555, 200, 546, 200}, Confidence 0.0900
     Word: 'AM', Bounding polygon {557, 192, 565, 192, 565, 200, 557, 200}, Confidence 0.9910
     Word: '-', Bounding polygon {567, 192, 569, 192, 569, 200, 567, 200}, Confidence 0.6910
     Word: '10:00', Bounding polygon {570, 192, 585, 193, 584, 200, 570, 200}, Confidence 0.8850
     Word: 'AM', Bounding polygon {586, 193, 593, 194, 593, 200, 586, 200}, Confidence 0.9910
   Line: 'Aaron Buaion', Bounding polygon {543, 201, 581, 201, 581, 208, 543, 208}
     Word: 'Aaron', Bounding polygon {545, 202, 560, 202, 559, 208, 544, 208}, Confidence 0.6020
     Word: 'Buaion', Bounding polygon {561, 202, 580, 202, 579, 208, 560, 208}, Confidence 0.2910
   Line: 'Daily SCRUM', Bounding polygon {537, 259, 575, 260, 575, 266, 537, 265}
     Word: 'Daily', Bounding polygon {538, 259, 551, 260, 550, 266, 538, 265}, Confidence 0.1750
     Word: 'SCRUM', Bounding polygon {552, 260, 570, 260, 570, 266, 551, 266}, Confidence 0.1140
   Line: '10:00 AM 11:00 AM', Bounding polygon {536, 266, 590, 266, 590, 272, 536, 272}
     Word: '10:00', Bounding polygon {539, 267, 553, 267, 552, 273, 538, 272}, Confidence 0.8570
     Word: 'AM', Bounding polygon {554, 267, 561, 267, 560, 273, 553, 273}, Confidence 0.9980
     Word: '11:00', Bounding polygon {564, 267, 578, 267, 577, 273, 563, 273}, Confidence 0.4790
     Word: 'AM', Bounding polygon {579, 267, 586, 267, 585, 273, 578, 273}, Confidence 0.9940
   Line: 'Churlette de Crum', Bounding polygon {538, 273, 584, 273, 585, 279, 538, 279}
     Word: 'Churlette', Bounding polygon {539, 274, 562, 274, 561, 279, 538, 279}, Confidence 0.4640
     Word: 'de', Bounding polygon {563, 274, 569, 274, 568, 279, 562, 279}, Confidence 0.8100
     Word: 'Crum', Bounding polygon {570, 274, 582, 273, 581, 279, 569, 279}, Confidence 0.8850
   Line: 'Quarterly NI Hands', Bounding polygon {538, 295, 588, 295, 588, 301, 538, 302}
     Word: 'Quarterly', Bounding polygon {540, 296, 562, 296, 562, 302, 539, 302}, Confidence 0.5230
     Word: 'NI', Bounding polygon {563, 296, 570, 296, 570, 302, 563, 302}, Confidence 0.3030
     Word: 'Hands', Bounding polygon {572, 296, 588, 296, 588, 302, 571, 302}, Confidence 0.6130
   Line: '11.00 AM-12:00 PM', Bounding polygon {536, 304, 588, 303, 588, 309, 536, 310}
     Word: '11.00', Bounding polygon {538, 304, 552, 304, 552, 310, 538, 310}, Confidence 0.6180
     Word: 'AM-12:00', Bounding polygon {554, 304, 578, 304, 577, 310, 553, 310}, Confidence 0.2700
     Word: 'PM', Bounding polygon {579, 304, 586, 304, 586, 309, 578, 310}, Confidence 0.6620
   Line: 'Bebek Shaman', Bounding polygon {538, 310, 577, 310, 577, 316, 538, 316}
     Word: 'Bebek', Bounding polygon {539, 310, 554, 310, 554, 317, 539, 316}, Confidence 0.6110
     Word: 'Shaman', Bounding polygon {555, 310, 576, 311, 576, 317, 555, 317}, Confidence 0.6050
   Line: 'Weekly stand up', Bounding polygon {537, 332, 582, 333, 582, 339, 537, 338}
     Word: 'Weekly', Bounding polygon {538, 332, 557, 333, 556, 339, 538, 338}, Confidence 0.6060
     Word: 'stand', Bounding polygon {558, 333, 572, 334, 571, 340, 557, 339}, Confidence 0.4890
     Word: 'up', Bounding polygon {574, 334, 580, 334, 580, 340, 573, 340}, Confidence 0.8150
   Line: '12:00 PM-1:00 PM', Bounding polygon {537, 340, 583, 340, 583, 347, 536, 346}
     Word: '12:00', Bounding polygon {539, 341, 553, 341, 552, 347, 538, 347}, Confidence 0.8260
     Word: 'PM-1:00', Bounding polygon {554, 341, 575, 341, 574, 347, 553, 347}, Confidence 0.2090
     Word: 'PM', Bounding polygon {576, 341, 583, 341, 582, 347, 575, 347}, Confidence 0.0390
   Line: 'Delle Marckre', Bounding polygon {538, 347, 582, 347, 582, 352, 538, 353}
     Word: 'Delle', Bounding polygon {540, 348, 559, 347, 558, 353, 539, 353}, Confidence 0.5800
     Word: 'Marckre', Bounding polygon {560, 347, 582, 348, 582, 353, 559, 353}, Confidence 0.2750
   Line: 'Product review', Bounding polygon {538, 370, 577, 370, 577, 376, 538, 375}
     Word: 'Product', Bounding polygon {539, 370, 559, 371, 558, 376, 539, 376}, Confidence 0.6150
     Word: 'review', Bounding polygon {560, 371, 576, 371, 575, 376, 559, 376}, Confidence 0.0400
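The Caption section shows the caption generated for the image. The Text section shows the text found in the image, detected both line by line and word by word. Each line and word comes with a Bounding polygon, the corner coordinates of the quadrilateral that encloses it. The caption and each word also carry a Confidence score indicating how certain the service is about the result.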
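To make the Bounding polygon and Confidence values easier to work with, here is a small sketch in plain Python. It assumes, based on the eight numbers printed per polygon, that the values are a flat list of alternating x and y coordinates for the four corners. The helper names and the 0.5 threshold are hypothetical choices for illustration, not part of the SDK.

```python
def polygon_points(flat):
    """Group a flat [x1, y1, x2, y2, ...] list into (x, y) tuples.

    Assumes the coordinates alternate x, y (an assumption drawn from
    the printed output, not a documented guarantee).
    """
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    return list(zip(flat[0::2], flat[1::2]))


def bounding_box(flat):
    """Return (left, top, right, bottom) of the axis-aligned box around the polygon."""
    points = polygon_points(flat)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def low_confidence(words, threshold=0.5):
    """Return the (text, confidence) pairs whose confidence is below the threshold."""
    return [(text, conf) for text, conf in words if conf < threshold]


# Values copied from the output above (line '9:35 AM' and the words of the second line)
line_polygon = [130, 129, 215, 130, 215, 149, 130, 148]
words = [("E", 0.1040), ("Conference", 0.9020), ("room", 0.7960), ("154584354", 0.8640)]

print(polygon_points(line_polygon))
print(bounding_box(line_polygon))
print(low_confidence(words))
```

With the sample values, only the word 'E' falls below the threshold, which matches the impression that stray single characters are the least reliable detections in this output.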
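Conclusion
I tried analyzing a remote image with the Image Analysis client SDK. I was surprised that the text was extracted more accurately than I had expected. I would like to compare it with other image analysis AI services.
Thank you for reading to the end!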