When I run Cobalt, I can see the user agent in the log:
[0101/000230:INFO:application.cc(690)] User Agent: Mozilla/5.0 (DirectFB; Linux x86_64) Cobalt/4.13031-qa (unlike Gecko) Starboard/1
So where does it come from? Is there a way to change it?
The default user agent is built in the following file; you can check it there:
https://cobalt.googlesource.com/cobalt/+/e9b4b99dab6e774b8b6e63add74c352cc5dd395a/src/cobalt/network/user_agent_string_factory.cc
std::string UserAgentStringFactory::CreateUserAgentString() {
  // Cobalt's user agent contains the following sections:
  //   Mozilla/5.0 (ChromiumStylePlatform)
  //   Cobalt/Version.BuildNumber-BuildConfiguration (unlike Gecko)
  //   Starboard/APIVersion,
  //   Device/FirmwareVersion (Brand, Model, ConnectionType)
  // Mozilla/5.0 (ChromiumStylePlatform)
  std::string user_agent =
      base::StringPrintf("Mozilla/5.0 (%s)", CreatePlatformString().c_str());
  // Cobalt/Version.BuildNumber-BuildConfiguration (unlike Gecko)
  base::StringAppendF(&user_agent, " Cobalt/%s.%s-%s (unlike Gecko)",
                      COBALT_VERSION, COBALT_BUILD_VERSION_NUMBER,
                      kBuildConfiguration);
  // Starboard/APIVersion,
  if (!starboard_version_.empty()) {
    base::StringAppendF(&user_agent, " %s", starboard_version_.c_str());
  }
  // Device/FirmwareVersion (Brand, Model, ConnectionType)
  if (youtube_tv_info_) {
    base::StringAppendF(
        &user_agent, ", %s_%s_%s/%s (%s, %s, %s)",
        youtube_tv_info_->network_operator.value_or("").c_str(),
        CreateDeviceTypeString().c_str(),
        youtube_tv_info_->chipset_model_number.value_or("").c_str(),
        youtube_tv_info_->firmware_version.value_or("").c_str(),
        youtube_tv_info_->brand.c_str(), youtube_tv_info_->model.c_str(),
        CreateConnectionTypeString().c_str());
  }
  return user_agent;
}
If SystemDeviceTypeIsTv() (in user_agent_string_factory_starboard.cc) returns true for the value reported by your SbSystemGetDeviceType(), you can customize the UA by implementing some of the SbSystemGetProperty() properties plus a few other SbSystemGet*() functions.
This is a typical example:
Mozilla/5.0 (1) Cobalt/11.119147-gold (unlike Gecko) Starboard/8, 2_8_6/5 (3, 4, 7)
where:
1. kSbSystemPropertyPlatformName
2. kSbSystemPropertyNetworkOperatorName
3. kSbSystemPropertyManufacturerName
4. kSbSystemPropertyModelName
5. kSbSystemPropertyFirmwareVersion
6. kSbSystemPropertyChipsetModelNumber
7. SbSystemGetConnectionType()
8. SbSystemGetDeviceType()
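To double-check the numbered mapping, here is a small Python sketch (not Cobalt code) that mirrors the string formatting in CreateUserAgentString(). The function name and the dict keys are hypothetical; optional fields fall back to an empty string the way value_or("") does in the C++ code. Feeding it the placeholders 1 to 8 should reproduce the example string above.

```python
def cobalt_user_agent(platform, cobalt_version, build_number, build_config,
                      starboard_version="", tv_info=None):
    """Mirror CreateUserAgentString(): same sections, same separators."""
    # Mozilla/5.0 (ChromiumStylePlatform)
    ua = f"Mozilla/5.0 ({platform})"
    # Cobalt/Version.BuildNumber-BuildConfiguration (unlike Gecko)
    ua += f" Cobalt/{cobalt_version}.{build_number}-{build_config} (unlike Gecko)"
    # Starboard/APIVersion: judging from the log line, the value already
    # includes the "Starboard/" prefix, so it is appended as-is
    if starboard_version:
        ua += f" {starboard_version}"
    # Device section, only when device info is available
    if tv_info is not None:
        ua += ", {}_{}_{}/{} ({}, {}, {})".format(
            tv_info.get("network_operator", ""),      # optional, like value_or("")
            tv_info["device_type"],
            tv_info.get("chipset_model_number", ""),  # optional
            tv_info.get("firmware_version", ""),      # optional
            tv_info["brand"],
            tv_info["model"],
            tv_info["connection_type"],
        )
    return ua

example = cobalt_user_agent(
    "1", "11", "119147", "gold", "Starboard/8",
    {"network_operator": "2", "device_type": "8", "chipset_model_number": "6",
     "firmware_version": "5", "brand": "3", "model": "4", "connection_type": "7"},
)
print(example)  # Mozilla/5.0 (1) Cobalt/11.119147-gold (unlike Gecko) Starboard/8, 2_8_6/5 (3, 4, 7)
```

Without the device section, the same function yields the shape of the log line from the question.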
I'm sending hits to the GA Measurement Protocol, and some of them never reach GA. I've noticed that all of them have one thing in common: the user agent is Firefox, varying only in version and device. Some examples:
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0
Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:103.0) Gecko/20100101 Firefox/103.0
Mozilla/5.0 (Android 10; Mobile; rv:103.0) Gecko/103.0 Firefox/103.0
The GA validator accepts these examples when I check them in debug mode like this:
https://www.google-analytics.com/debug/collect?v=1&tid=UA-XXXXXXXX-1&t=event&ec=Ecommerce&ea=purchase&pa=purchase&cid=1234567890.1234567890&ni=1&ti=184242&tr=1060&uip=X.X.X.X&ua=Mozilla%2F5.0+%28Windows+NT+10.0%3B+Win64%3B+x64%3B+rv%3A103.0%29+Gecko%2F20100101+Firefox%2F103.0&pr1id=test_1&pr1pr=530&pr1qt=1&pr1ps=1
I get this response:
{
"hitParsingResult": [ {
"valid": true,
"parserMessage": [ ],
"hit": "/debug/collect?v=1..."
} ],
"parserMessage": [ {
"messageType": "INFO",
"description": "Found 1 hit in the request."
} ]
}
But in production, GA responds to the same requests with a 400 Bad Request error and no details: "Your client has issued a malformed or illegal request. That’s all we know."
So what might be wrong with the Firefox UA?
UPD: I managed to make this work by unsetting the 'User-Agent' request header when it contains 'Firefox'; the corresponding 'ua' parameter in the payload is then accepted.
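// Guard with isset() so a missing header does not trigger a warning
if (isset($requestHeaders['User-Agent']) && strpos($requestHeaders['User-Agent'], 'Firefox') !== false) {
    unset($requestHeaders['User-Agent']);
}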
But it's still unclear what was wrong with such headers in the first place.
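A minimal Python sketch of the same workaround, assuming the hit is forwarded by a server that receives the browser's request headers. The helper name build_mp_request and the dict-based header handling are hypothetical, and the sketch only builds the request body and headers; it does not send anything or explain why GA rejects the header. The browser UA still travels in the ua payload parameter, URL-encoded the same way as in the debug URL above.

```python
from urllib.parse import urlencode

def build_mp_request(params, incoming_headers):
    """Build (body, headers) for a Measurement Protocol hit.

    Hypothetical helper mirroring the PHP workaround: the forwarded
    User-Agent header is dropped when it contains 'Firefox', while the
    browser UA is still sent in the 'ua' payload parameter.
    """
    headers = dict(incoming_headers)  # copy, so the caller's dict is untouched
    if "Firefox" in headers.get("User-Agent", ""):
        del headers["User-Agent"]
    # urlencode() percent-encodes '/', '(', ';' and ':' and turns spaces into '+'
    body = urlencode(params)
    return body, headers

firefox_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0"
body, headers = build_mp_request(
    {"v": "1", "tid": "UA-XXXXXXXX-1", "t": "event",
     "cid": "1234567890.1234567890", "ua": firefox_ua},
    {"User-Agent": firefox_ua, "Accept": "*/*"},
)
print(body)
print(headers)
```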
I am running a scraping bot in headless mode. As you know, in headless mode the user agent contains a "Headless" string. To avoid that, I changed the user agent, but the website detects the fake user agent and blocks the bot. How can I prevent this detection?
I am using Selenium with ChromeDriver.
Please add these options:
# Pick the UA that matches the platform you spoof (see navigator.platform below)
windows_useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
linux_useragent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
options = Selenium::WebDriver::Chrome::Options.new
# Disable the Blink feature that marks the browser as automation-controlled
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--no-sandbox")
options.add_argument("--user-agent=#{linux_useragent}")
options.add_argument("--disable-web-security")
options.add_argument("--disable-xss-auditor")
# Do not pass ChromeDriver's default enable-automation / load-extension switches
options.add_option("excludeSwitches", ["enable-automation", "load-extension"])
navigator.platform and navigator.userAgent must be consistent:
If the userAgent is for Windows, navigator.platform should be "Win32".
If the userAgent is for Linux, navigator.platform should be "Linux x86_64".
You can set it like this:
platform = {
  windows: "Win32",
  linux: "Linux x86_64"
}
# The script runs before any page script on every new document.
# execute_cdp takes the CDP parameters as keyword arguments.
driver.execute_cdp(
  "Page.addScriptToEvaluateOnNewDocument",
  source: <<~JS
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
    Object.defineProperty(navigator, 'platform', { get: () => "#{platform[:linux]}" });
  JS
)
And, of course, you need to set navigator.webdriver to undefined, as the first override in that script does.
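For readers using Python rather than Ruby, a rough sketch of the same idea: pick a navigator.platform value that agrees with the spoofed UA and build the injected script from it. The helper names are hypothetical, and the UA-to-platform mapping only covers the two cases from the answer. The Selenium wiring is left as comments (webdriver.ChromeOptions, add_argument and execute_cdp_cmd are the Python Selenium counterparts), so the helpers run without a browser; this does not guarantee a site will not detect the bot by other means.

```python
import json

# navigator.platform values from the answer; they must agree with the spoofed UA.
PLATFORMS = {"windows": "Win32", "linux": "Linux x86_64"}

def platform_for_user_agent(user_agent):
    """Return a navigator.platform value consistent with a Chrome UA string."""
    if "Windows NT" in user_agent:
        return PLATFORMS["windows"]
    if "Linux x86_64" in user_agent:
        return PLATFORMS["linux"]
    raise ValueError(f"no platform mapping for UA: {user_agent!r}")

def stealth_script(user_agent):
    """Build the same three navigator overrides as the Ruby answer."""
    platform = json.dumps(platform_for_user_agent(user_agent))  # quoted JS string literal
    return (
        "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});\n"
        "Object.defineProperty(navigator, 'languages', {get: () => ['en-US', 'en']});\n"
        f"Object.defineProperty(navigator, 'platform', {{get: () => {platform}}});\n"
    )

linux_ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36")
print(stealth_script(linux_ua))

# With Selenium installed, the script would be wired up roughly like this:
#   options = webdriver.ChromeOptions()
#   options.add_argument(f"--user-agent={linux_ua}")
#   driver = webdriver.Chrome(options=options)
#   driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument",
#                          {"source": stealth_script(linux_ua)})
```

Deriving the platform from the UA string, instead of setting both by hand, keeps the two values from drifting apart when the UA is changed.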
I need to scrape a web page (its URL is in the code below).
This page has a Cross Reference section that I want to scrape, but when I use Python requests to fetch the page content with this code:
import requests
from bs4 import BeautifulSoup
url = 'https://www.arrow.com/en/products/lmk107bbj475mklt/taiyo-yuden'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")
The resulting content does not include the Cross Reference part, maybe because it has not been loaded yet. I can scrape the rest of the HTML content, but not the Cross Reference part. When I did the same thing with Selenium, it worked fine, which means Selenium can find this element once it has loaded.
Can someone guide me on how to get this done with Python requests and BeautifulSoup instead of Selenium?
The data is loaded through JavaScript, but it is also embedded in the page as escaped JSON (in the element with id arrow-state), so you can extract it with requests, BeautifulSoup and the json module:
import json
import requests
from bs4 import BeautifulSoup
url = 'https://www.arrow.com/en/products/lmk107bbj475mklt/taiyo-yuden'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
}
soup = BeautifulSoup(requests.get(url, headers=headers).text, 'lxml')
# The page state is embedded as escaped JSON in the #arrow-state element
t = soup.select_one('#arrow-state').text
# Undo the escaping; '&a;' is decoded last so it cannot create new escapes
t = t.replace('&q;', '"').replace('&g;', '>').replace('&l;', '<').replace('&a;', '&')
data = json.loads(t)
d = None
for item in data['jss']['sitecore']['route']['placeholders']['arrow-main']:
    if item['componentName'] == 'PdpWrapper':
        d = item
        break
if d:
    # 'crossReverenceProductTiles' is spelled this way in the page data
    cross_reference_product_tiles = d['placeholders']['product-details'][0]['fields']['crossReferenceProductTilesCollection']['crossReverenceProductTiles']['productTiles']
    print(json.dumps(cross_reference_product_tiles, indent=4))
Prints:
[
{
"partId": "16571604",
"partNumber": "CGB3B1X5R1A475M055AC",
"productDetailUrl": "/en/products/cgb3b1x5r1a475m055ac/tdk",
"productDetailShareUrl": "/en/products/cgb3b1x5r1a475m055ac/tdk",
"productImage": "https://static5.arrow.com/pdfs/2017/4/18/7/26/14/813/tdk_/manual/010101_lowprofile_pi0402.jpg",
"manufacturerName": "TDK",
"productLineTitle": "Capacitor Ceramic Multilayer",
"productDescription": "Cap Ceramic 4.7uF 10V X5R 20% Pad SMD 0603 85\u00b0C T/R",
"datasheetUrl": "",
"lowestPrice": 0.0645,
"lowestPriceFormatted": "$0.0645",
"highestPrice": 0.3133,
"highestPriceFormatted": "$0.3133",
"stockFormatted": "1,875",
"stock": 1875,
"attributes": [],
"buyingOptionType": "AddToCart",
"numberOfAttributesToShow": 1,
"rrClickTrackingUrl": null,
"pricingDataPopulated": true,
"sourcePartId": "V72:2272_06586404",
"sourceCode": "ACNA",
"packagingType": "Cut Strip",
"unitOfMeasure": "",
"isDiscontinued": false,
"productTileHint": null,
"tileSize": 1,
"tileType": "1x1",
"suplementaryClasses": "u-height"
},
...and so on.
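The decoding and lookup steps from this answer can be factored into two small helpers that are easy to test offline. This is a sketch, assuming the #arrow-state text uses the escapes shown in the answer; the extra &s; (single quote) entry is an assumption based on Angular-style TransferState escaping and is not shown in the answer. The helper names and the sample string are made up for illustration.

```python
import json

# Escapes from the answer; "&s;" for a single quote is an ASSUMPTION.
# "&a;" must come last so that an encoded literal "&q;" (written "&a;q;")
# is not decoded twice.
ESCAPES = [("&q;", '"'), ("&s;", "'"), ("&l;", "<"), ("&g;", ">"), ("&a;", "&")]

def decode_state(text):
    """Undo the HTML-safe escaping and parse the embedded JSON."""
    for escaped, char in ESCAPES:
        text = text.replace(escaped, char)
    return json.loads(text)

def find_component(state, name):
    """Return the first 'arrow-main' placeholder item with this componentName, or None."""
    items = state["jss"]["sitecore"]["route"]["placeholders"]["arrow-main"]
    return next((item for item in items if item.get("componentName") == name), None)

# Made-up sample in the same shape as the page data
raw = ("{&q;jss&q;:{&q;sitecore&q;:{&q;route&q;:{&q;placeholders&q;:{&q;arrow-main&q;:["
       "{&q;componentName&q;:&q;Header&q;},"
       "{&q;componentName&q;:&q;PdpWrapper&q;,&q;note&q;:&q;a &l; b &a;q; c&q;}"
       "]}}}}}")
pdp = find_component(decode_state(raw), "PdpWrapper")
print(pdp["note"])
```

The sample's note field shows why the order matters: "&a;q;" decodes to the literal text "&q;" rather than to a double quote.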
Selenium alone is enough to scrape the Cross References section: induce a WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:
Using CSS_SELECTOR:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.WideSidebarProductList-list h4")))])
Using XPATH:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='WideSidebarProductList-list']//h4")))])
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console Output:
['CGB3B1X5R1A475M055AC', 'CL10A475MP8NNNC', 'GRM185R61A475ME11D', 'C0603C475M8PACTU']
I have the code:
ClickableTextCell imageCell = new ClickableTextCell() {
    @Override
    public void render(Context context, SafeHtml data, SafeHtmlBuilder sb) {
        if (data != null) {
            String imagePath = "contact.jpg";
            //sb.appendEscaped(imagePath);
            sb.appendHtmlConstant("<img width=\"20\" src=\"" + imagePath + "\">");
        }
    }
};
Column<List<String>, String> imageColumn = new Column<List<String>, String>(imageCell) {
    @Override
    public String getValue(List<String> object) {
        return "";
    }
};
imageColumn.setFieldUpdater(new FieldUpdater<List<String>, String>() {
    @Override
    public void update(int index, List<String> object, String value) {
        //Window.alert("You clicked " + object.get(index));
    }
});
table.addColumn(imageColumn, columnName);
But when I run it in Eclipse, no image is shown. The web server logged this error:
Request headers
Host: 127.0.0.1:8888
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://127.0.0.1:8888/abc.html?gwt.codesvr=127.0.0.1:9997
Cookie: MYCOOKIE=htzjk7g2pva9; JSESSIONID=1xjqdxl3kuuxw; PRODUCTSERVICECOOKIE=1xjqdxl3kuuxw
Connection: keep-alive
Response headers
Set-Cookie: PRODUCTSERVICECOOKIE=1xjqdxl3kuuxw;Path=/
Content-Length: 1397
Content-Type: text/html; charset=iso-8859-1
[WARN] 404 - GET /contact.jpg (127.0.0.1) 1397 bytes
Request headers
Host: 127.0.0.1:8888
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
My second question: where should I put the image so that it is easy to manage when we deploy the final web app? I'm confused by the folder structure in Eclipse.
In MyProject, I have:
war\myproject
war\WEB-INF
war\MyProjet.css
war\MyProjet.html
MyProject also has a src folder that contains all the Java files:
src\myproject\client\ (I put "contact.jpg" in the client folder)
src\myproject\server
I am not sure whether I put contact.jpg in the correct folder. Also, when we deploy our web app, will Eclipse copy all the image files into this folder:
war\myproject
If you place your image files in the src\myproject\public\img\ folder (e.g. src\myproject\public\img\contact.jpg), GWT copies them into war\gwtmodulename\img\ during compilation, and you would reference the image with GWT.getModuleBaseURL() + "img/" + "contact.jpg", because that gives you its location relative to http://domain/gwtmodulename/ (the war folder is the web root).
If you place your image files in the war\img folder, you would reference them with GWT.getHostPageBaseURL() + "img/" + "contact.jpg", because that gives you the location of your image relative to the host page's URL (http://domain/ when the host page sits directly in war).
Either:
Put your image in a public path. Static resources (such as images) in the public path are automatically copied into the compiler output directory. To reference them in client code, build their URLs relative to GWT.getModuleBaseURL().
or
Use an ImageResource within a ClientBundle and use either:
imageResource.getSafeUri().asString() chain, to obtain its resolved (safe) URI; or
AbstractImagePrototype.create(imageResource).getHTML() for the whole <img> snippet.
I've excluded the new Image(imageResource).getUrl() option as it is useless for your use case (no need for an Image widget).
Also, if you want to decorate a cell with an icon, take a look at IconCellDecorator; it may be useful.
I want to run tests that change the user agent in the HTTP requests sent from the browser (as the Firefox add-on User Agent Switcher does). I saw that you can do it by modifying the Firefox profile (http://seleniumhq.org/docs/09_webdriver.html).
Is there a way to do it within a test? Something like the function addCustomRequestHeader(), but one that sets a header rather than adding it?
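You could inject a function like this to change the user agent on the fly. Note that it only overrides navigator.userAgent as seen by page scripts; the User-Agent header of the HTTP requests the browser sends is not affected: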
function changeuserAgent() {
    var altuserAgentGetter = function () {
        return "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2) Gecko/20100115 <choose your string>";
    };
    if (Object.defineProperty) {
        Object.defineProperty(navigator, "userAgent", {
            get: altuserAgentGetter
        });
    }
    else if (Object.prototype.__defineGetter__) {
        navigator.__defineGetter__("userAgent", altuserAgentGetter);
    }
}
If you're using Selenium 2 WebDriver in Java, you can create a Firefox profile and set the user agent string as a preference in the profile, then use the profile to create the WebDriver object:
FirefoxProfile profile = new FirefoxProfile();
profile.setPreference("general.useragent.override", "Mozilla/5.0 (iPad; U; CPU OS 4_3 like Mac OS X; de-de) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8F191 Safari/6533.18.5");
WebDriver driver = new FirefoxDriver(profile);
For more information and source code examples, see the Selenium WebDriver documentation for the Firefox driver at http://seleniumhq.org/docs/03_webdriver.html#firefox-driver.